A restless night often leads to fatigue the next day, but it can also indicate health problems that may not arise until much later. Scientists at Stanford Medicine and their collaborators have developed an artificial intelligence system that can examine the body signals from a single night of sleep and assess a person’s risk for more than 100 different diseases. The system, called SleepFM, was trained using nearly 600,000 hours of sleep recordings from 65,000 people. These recordings come from polysomnography, an in-depth sleep test that uses multiple sensors to track brain activity, heart function, breathing patterns, eye movements, leg movements, and other physical signals during sleep.
Sleep Studies Contain Untapped Health Data
Polysomnography is considered the gold standard for assessing sleep and is typically performed overnight in a laboratory. While it is often used to diagnose sleep disorders, researchers have recognized that it also captures a wealth of physiological information that has not been fully analyzed. "We record an astonishing number of signals when we study sleep," said Emmanuel Mignot, MD, PhD, the Craig Reynolds Professor of Sleep Medicine and co-senior author of the new study published in Nature Medicine. "It's a kind of general physiology that we study for eight hours in a person who is completely in our care. The data is very extensive."

Teaching AI Sleep Patterns
To extract insights from the data, the researchers developed a foundation model, a type of AI designed to learn general patterns from very large datasets and then apply that knowledge to many different tasks. Large language models such as ChatGPT use a similar approach, but they are trained on text rather than biological signals. SleepFM was trained on 585,000 hours of polysomnography data collected from patients examined in sleep clinics. Each sleep monitoring session was divided into five-second segments, which function much like the words used to train language-based AI systems. SleepFM essentially learns the language of sleep, said co-senior author James Zou, PhD.
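As a rough illustration (not the authors' code), slicing a recording into five-second "words" can be sketched as follows; the 128 Hz sample rate and the helper name `segment` are assumptions for this example:

```python
def segment(signal, sample_rate=128, seconds=5):
    """Split a 1-D signal into fixed-length, non-overlapping segments.

    Each segment plays the role of a 'word' in the model's vocabulary.
    """
    step = sample_rate * seconds          # samples per segment (640 here)
    n_full = len(signal) // step          # drop any trailing partial segment
    return [signal[i * step:(i + 1) * step] for i in range(n_full)]

one_minute = list(range(128 * 60))        # 60 s of dummy samples
tokens = segment(one_minute)
print(len(tokens), len(tokens[0]))        # 12 segments of 640 samples each
```

In a full polysomnogram this slicing would be applied to every channel (EEG, ECG, airflow, and so on) in parallel, so each five-second window carries a snapshot of the whole body.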
The model integrates multiple streams of information, including brain signals, heart rhythms, muscle activity, pulse measurements, and airflow during breathing, and learns how these signals interact with each other. To help the system understand these relationships, the researchers developed a training method called "leave-one-out contrastive learning": one type of signal is removed at a time, and the model is asked to reconstruct it using the remaining data. A key technical innovation, the researchers noted, was harmonizing all of these different data modalities so that they could be brought together to learn a shared language.
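One plausible reading of this idea, sketched minimally and not taken from the paper: each modality is held out in turn, and its embedding must match the combined embedding of the remaining modalities from the same recording more closely than those from other recordings in the batch (an InfoNCE-style objective). All names, shapes, and the temperature value are illustrative assumptions:

```python
import numpy as np

def loo_contrastive_loss(embeddings, temperature=0.1):
    """Leave-one-out contrastive loss over a batch (illustrative sketch).

    embeddings: array of shape (n_modalities, batch, dim), one row per
    signal type (e.g. brain, heart, breathing), L2-normalized.
    """
    n_mod, batch, dim = embeddings.shape
    losses = []
    for m in range(n_mod):
        held_out = embeddings[m]                                  # (batch, dim)
        rest = embeddings[[i for i in range(n_mod) if i != m]].mean(axis=0)
        rest /= np.linalg.norm(rest, axis=1, keepdims=True)
        logits = held_out @ rest.T / temperature                  # (batch, batch)
        # cross-entropy with the diagonal (same recording) as the positive pair
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        losses.append(-np.mean(np.diag(log_prob)))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4, 8))          # 3 modalities, batch of 4 recordings
emb /= np.linalg.norm(emb, axis=2, keepdims=True)
print(loo_contrastive_loss(emb))
```

Minimizing a loss of this shape pushes the model to encode whatever information the modalities share about the same sleeper, which is what lets them "learn the same language."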
Predicting Future Diseases Based on Sleep
After training, the researchers adapted the model for specific tasks. First, they tested it in standard sleep studies, such as identifying sleep stages and assessing the severity of sleep apnea. In these tests, SleepFM matched or even surpassed the performance of the current leading models. The team then pursued a more ambitious goal: to find out whether sleep data can predict future diseases. To do this, they linked polysomnography recordings with the long-term health outcomes of the same individuals. This was possible because the researchers had access to decades of medical records from a single sleep clinic.
The Stanford Sleep Medicine Center was founded in 1970 by the late William Dement, MD, PhD, widely considered the father of sleep medicine. The largest group used to train SleepFM included about 35,000 patients aged 2 to 96. Their sleep studies were recorded at the clinic between 1999 and 2024 and matched with electronic health records that followed some patients for up to 25 years. The clinic's polysomnography records go back even further, but only in paper form, according to Mignot, who headed the sleep center from 2010 to 2019. Using this combined dataset, SleepFM reviewed more than 1,000 disease categories and identified 130 conditions that could be predicted with reasonable accuracy based on sleep data alone. The best results were achieved for cancer, pregnancy complications, circulatory diseases, and mental disorders, with C-index values above 0.8.
How Prediction Accuracy is Measured
The C-index, or concordance index, measures how well a model can rank individuals according to their risk. It indicates how often the model correctly predicts which of two individuals will experience a health event first. “For all possible pairs of individuals, the model ranks who is more likely to experience an event—such as a heart attack—earlier. A C-index of 0.8 means that the model’s prediction matches the actual event in 80% of cases,” explained Zou. SleepFM performed particularly well in predicting Parkinson’s disease (C-index 0.89), dementia (0.85), hypertensive heart disease (0.84), heart attack (0.81), prostate cancer (0.89), breast cancer (0.87), and death (0.84).
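Zou's description translates directly into a pairwise computation. Here is a minimal Python sketch of the concordance index (ignoring censored observations, which real survival models must handle); the function name and example data are hypothetical:

```python
from itertools import combinations

def concordance_index(event_times, risk_scores):
    """Fraction of comparable pairs the model ranks correctly.

    A pair is concordant when the individual with the higher predicted
    risk actually experiences the event earlier. Ties in risk count half.
    """
    concordant = 0
    tied = 0
    comparable = 0
    for i, j in combinations(range(len(event_times)), 2):
        if event_times[i] == event_times[j]:
            continue  # simplified: pairs with identical times are skipped
        comparable += 1
        # the individual with the earlier event should carry the higher risk
        early, late = (i, j) if event_times[i] < event_times[j] else (j, i)
        if risk_scores[early] > risk_scores[late]:
            concordant += 1
        elif risk_scores[early] == risk_scores[late]:
            tied += 1
    return (concordant + 0.5 * tied) / comparable

# Example: 4 hypothetical patients, years until an event vs. predicted risk
times = [2.0, 5.0, 3.0, 8.0]
risks = [0.9, 0.4, 0.7, 0.1]
print(concordance_index(times, risks))  # every pair correctly ranked → 1.0
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which is why C-index values in the 0.8 to 0.9 range reported for SleepFM are considered strong.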
The researchers were pleasantly surprised that the model can make meaningful predictions for a wide range of different diseases. Zou also pointed out that models with lower accuracy, often with a C-index of around 0.7, are already being used in medical practice, for example as a tool to predict how patients might respond to certain cancer treatments.
Understanding What AI Sees
The researchers are now working on improving SleepFM’s predictions and better understanding how the system arrives at its conclusions. Future versions could incorporate data from wearable devices to expand the range of physiological signals. “It doesn’t explain it to us in words,” Zou said. “But we’ve developed various interpretation techniques to figure out what the model is looking at when it makes a particular disease prediction.”
The team found that while heart-related signals had a greater impact on predicting cardiovascular disease and brain-related signals played a greater role in predicting mental health, the most accurate results came from combining all types of data. "Most of the information we obtained for predicting disease came from comparing the different channels," Mignot said. Physiological systems that were out of sync—such as a brain that is asleep but a heart that is awake—seemed to herald problems. Rahul Thapa, a doctoral student in biomedical data science, and Magnus Ruud Kjaer, a doctoral student at the Technical University of Denmark, are co-lead authors of the study. Researchers from the Technical University of Denmark, Copenhagen University Hospital Rigshospitalet, BioSerenity, the University of Copenhagen, and Harvard Medical School contributed to this work.