Projects

Our projects fall into three groups: Drug Safety Discovery; Advancing Cancer Research & Care; and Data, Information & Knowledge Extraction. You can find more details about each group below, or you can browse our publicly available code here and our publicly available resources here.

Drug Safety Discovery

Drug safety is an essential component of modern healthcare, aiming to maximize therapeutic benefits while minimizing harm to patients through robust pharmacovigilance practices. Adverse drug events (ADEs) are the fourth leading cause of death in the US and add billions of dollars to healthcare costs each year. At the Tatonetti Lab, we use advanced data science techniques, including artificial intelligence and machine learning, to investigate drug safety. By leveraging emerging resources such as electronic health records (EHRs) and genomics databases, we aim to drive innovation and improve our understanding of medication risks and benefits.

In this effort, we’ve released and maintain OnSIDES, the world’s most comprehensive side effect resource. OnSIDES includes over 3.6 million drug–ADE pairs for 2,793 drug ingredients extracted from 46,686 publicly available drug labels. Additionally, we support research into sex-specific drug effects and provide access to Sex-ADE, a curated dataset of side effects that differ between men and women. We also maintain the OffSIDES, TwoSIDES, and KidSIDES databases. OffSIDES identifies unexpected ADEs by analyzing large-scale observational data (e.g., from EHRs). TwoSIDES extends this approach to adverse drug–drug interactions, while KidSIDES focuses specifically on pediatric drug safety signals during childhood developmental phases. Full details about these databases and how to access them are available here.

In active research, we are investigating how the intermediate layers of large language models (LLMs) trained for ADE prediction can be interpreted to better understand the relationships between drugs and their associated ADEs. This includes leveraging molecular structure embeddings as well as patient data from EHRs, particularly unstructured clinical notes. We are also examining the adverse effects associated with immune checkpoint inhibitor (ICI) treatments by combining clinical notes, electronic health records, and a mechanistic understanding of immune-related adverse drug reactions. This work will build the foundation for predictive models that can identify patients at risk of ADEs prior to treatment.

Advancing Cancer Research & Care

Cancer remains the second leading cause of death in the United States—and the leading cause among individuals under 85. Decades of intensive research have significantly improved cancer outcomes. Notably, cancer mortality has continued to decline through 2021, largely due to earlier detection and advances in treatment.

At the Tatonetti Lab, we dedicate another key area of our research to advancing cancer care by leveraging AI, particularly through the application of LLMs to both structured and unstructured EHR data. Our cancer-focused projects fall into two main categories: clinical applications and investigations into the underlying biological mechanisms of cancer.

Real-world clinical application projects can be grouped into the following areas:

Beyond clinical applications, we are also investigating the biological mechanisms that drive cancer. One of our ongoing studies examines the role of Y chromosome loss in male cancer patients. This age-related mutation appears to help cancer cells evade the immune system, contributing to aggressive bladder cancer. Paradoxically, it also makes the disease more responsive to immune checkpoint inhibitors, a standard form of treatment.

Data, Information & Knowledge Extraction

The third key area of our research focuses on advancing data, information, and knowledge extraction from both structured and unstructured EHRs.

As part of this effort, we developed Chappy, a secure chatbot designed for use within the Cedars-Sinai ecosystem. Chappy enables users to interact with various large language models (LLMs) in a way that is compliant with PHI regulations. These models are mainly deployed through Azure, the cloud infrastructure built on the Cedars-Sinai–Microsoft platform partnership. This secure environment ensures that data privacy is maintained while allowing users to modify and experiment with LLM capabilities safely.

We are also studying the impact of accent-related bias in state-of-the-art automatic speech recognition (ASR) systems and LLMs. This project evaluates the transcription accuracy of voice recordings from participants with diverse linguistic backgrounds and accents, assessing how ASR and LLM technologies handle this variability.
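As an illustration of how transcription accuracy might be compared across accent groups, the sketch below computes word error rate (WER), a common ASR metric, and averages it per group. The text above does not say which metric the project uses, so WER is an assumption here, and the function names and group labels are hypothetical.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(samples):
    """samples: iterable of (group_label, reference, hypothesis) tuples.

    Group labels are hypothetical placeholders for accent groups.
    """
    scores = defaultdict(list)
    for group, reference, hypothesis in samples:
        scores[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


if __name__ == "__main__":
    samples = [
        ("group_a", "the patient takes aspirin daily", "the patient takes aspirin daily"),
        ("group_b", "the patient takes aspirin daily", "the patient take aspirin"),
    ]
    for group, wer in mean_wer_by_group(samples).items():
        print(group, wer)
```

A gap in mean WER between groups would be one simple signal of accent-related bias, although a real evaluation would also need to control for recording quality and content.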

We are leveraging the previously developed RIFTEHR tool to infer familial relationships from emergency contact data in Cedars-Sinai's EHR system, enabling large-scale heritability studies. Using this approach, we’ve estimated the genetic contribution to nearly 500 diseases, providing a scalable alternative to traditional genetic testing.
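The general idea of inferring relationships from emergency contact data can be sketched as follows: match each listed contact to another patient record, then record the stated relationship and its inverse. This is a simplified illustration and not the RIFTEHR implementation. The record fields, the matching rule (exact normalized name plus phone number), and the inverse-relationship table are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    name: str
    phone: str


@dataclass(frozen=True)
class EmergencyContact:
    patient_id: str      # the patient who listed this contact
    contact_name: str
    contact_phone: str
    relationship: str    # as stated by the patient, e.g. "mother"


# Hypothetical inverse mapping, used to add the reverse direction of each link.
INVERSE = {"mother": "child", "father": "child", "child": "parent",
           "sibling": "sibling", "spouse": "spouse"}


def normalize(name: str, phone: str) -> tuple:
    """Lowercase and collapse whitespace in the name; keep only digits in the phone."""
    return (" ".join(name.lower().split()),
            "".join(ch for ch in phone if ch.isdigit()))


def infer_relationships(patients, contacts):
    """Return a set of (patient_id, relationship, relative_id) triples,

    meaning "relative_id is patient_id's <relationship>".
    """
    index = {}
    for p in patients:
        index.setdefault(normalize(p.name, p.phone), []).append(p.patient_id)

    triples = set()
    for c in contacts:
        matches = index.get(normalize(c.contact_name, c.contact_phone), [])
        if len(matches) != 1:
            continue  # no match, or ambiguous match: skip rather than guess
        relative_id = matches[0]
        if relative_id == c.patient_id:
            continue  # a patient listed as their own contact
        triples.add((c.patient_id, c.relationship, relative_id))
        inverse = INVERSE.get(c.relationship)
        if inverse is not None:
            triples.add((relative_id, inverse, c.patient_id))
    return triples


if __name__ == "__main__":
    patients = [Patient("P1", "Ana Diaz", "310-555-0101"),
                Patient("P2", "Maria Diaz", "(310) 555-0199")]
    contacts = [EmergencyContact("P1", "maria  diaz", "3105550199", "mother")]
    for triple in sorted(infer_relationships(patients, contacts)):
        print(triple)
```

Relationship pairs of this kind are what make it possible to assemble families from routine records, which is the input a heritability analysis needs.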