About Me

Epidemiologist, data scientist, developer. Master’s in Epidemiology from McGill University. Master’s in Computer and Information Technology from the University of Pennsylvania. Co-author of many peer-reviewed publications. Experienced in epidemiologic and biostatistical methods, developing autonomous AI agent systems, and building inferential and predictive models for clinical applications. Expert in diverse real-world data sources (EMR, claims, next-gen sequencing), clinical trial data and standards, and multi-language development (R, Python, SQL, Stan, C++).

Contact

secrmatt@gmail.com

linkedin.com/in/matthewsecrest

github.com/mattsecrest

Technical Skills

Domain Expertise: Epidemiology, real-world evidence (RWE), observational study design, oncology drug development, biostatistics, causal inference

Generative AI:

  • Frameworks: LangGraph, LangChain
  • Context: retrieval-augmented generation (RAG), LLM fine-tuning, Model Context Protocol (MCP) deployment
  • Inference: API integration (Portkey, Azure, AWS, OpenAI), local model deployment (Ollama, Jan, LM Studio)
  • Preferred LLMs and tools: Claude Opus 4.6 + GitHub Copilot or Claude Code (mostly in-line edits, moderate use of agent mode), Gemini 3 Pro (deep research tools)

Programming and Modeling:

  • Data science / Statistics: R (expert; base R, tidyverse, data.table, R6), Python (expert; pandas, scikit-learn, PyTorch)
  • Bayesian modeling: Stan, JAGS
  • Database: SQL
  • Compiled / low-level: Java, C++

Interactive Applications & Visualization:

  • Dashboards: Quarto, R Shiny, Python Streamlit, Python Chainlit
  • Visualization: ggplot2, plotly, matplotlib

MLOps & Scientific Reproducibility:

  • Environments: Docker/Podman, renv/virtualenv, uv
  • Version Control: Git, GitHub / GitLab

Education

Master of Computer and Information Technology | University of Pennsylvania | 2026

Master of Science, Epidemiology | McGill University | 2016

Bachelor of Science, Chemistry | Wake Forest University | 2010

Experience

Principal Data Innovation Specialist | Genentech | 10/2025 - Present

Design and scale innovative solutions to support real-world and clinical data strategy. Lead the global real-world data analytics community (knowledge sharing, networking). Prototype autonomous agentic systems (RAG agents) to automate complex epidemiological workflows, and drive adoption of agentic frameworks (Claude Code, GitHub Copilot + MCP agents, Aider).

Principal Data Scientist | Genentech | 8/2024 - 10/2025

Designed and executed real-world/observational epidemiologic studies using EMR, claims, and NGS data to support product development in oncology. Developed dashboards, R packages, and subject-matter expert Docker images to support drug development and improve research reproducibility.

Senior Data Scientist | Genentech | 10/2021 - 8/2024

Data Scientist | Genentech | 3/2020 - 10/2021

Clinical Data Scientist | Verana Health | 11/2019 - 3/2020

Designed and executed real-world data epidemiologic studies in ophthalmology and neurology using proprietary EMR data. Mapped unstructured EMR data to structured clinical features.

Fellow | The Data Incubator | 9/2019 - 11/2019

Participated in an advanced, 8-week data science program designed to transition academic researchers to industry research. Created a Heroku app to predict the likelihood of drug approval from a clinical trial abstract using natural language processing.

Consultant | IQVIA | 8/2017 - 9/2019

Designed and managed studies of drug safety and effectiveness in secondary datasets for market access (single arm, historical comparator), label expansion, and post-marketing surveillance. Researched rare disease prevalence through literature reviews and steady-state disease modeling. Evaluated risk evaluation and mitigation strategy effectiveness.

Research Assistant | Lady Davis Institute | 7/2016 - 7/2017

Developed a method for a unique missing data problem in distributed data and evaluated its effectiveness in >1,000 GB of simulated patient-level data on a supercomputer and in real EMR data from 59,957 patients in the UK. Designed a study advocating for increased study population restriction to reduce bias.

Research Assistant | Jill Baumgartner’s Group | 8/2014 - 6/2016

Conducted household air pollution measurements in China, creating an R program at the field site to convert raw sensor data to climate modeling inputs. Used factor analysis on the composition of 20 air pollution samples to identify pollution sources that generate high oxidative potential.

Senior Media Analyst | iCrossing | 11/2012 - 7/2014

Analyzed marketing data using proprietary software.

Research Fellow | New York University | 8/2011 - 8/2012

Developed methods for chemical synthesis and instructed classes and laboratories.

Fulbright Fellow | US Department of State | 9/2010 - 5/2011

Taught English to French high schoolers and conducted sociological research informed by primary data collection.