Demystifying the MIMIC3 Dataset of Neonatal Patients

May 22, 2017 | Articles

MIMIC3 is a publicly available database of de-identified health-related data. It covers 46,520 patients who stayed in the ICUs of the Beth Israel Deaconess Medical Center between 2001 and 2012 (2).

In keeping with our vision of predicting future complications in neonates and preventing neonatal mortality, we used the MIMIC3 (1) data as our learning model to understand a few patterns relating diagnoses to vitals (2).

The dataset consists of numerous tables containing patient data such as demographic, clinical, and diagnostic information. It also covers procedural events and the medications given to the patient during the ICU stay.
MIMIC3 Dataset

Different tables in the dataset are listed below

Data for 7,863 neonates were filtered from the 46,520 patients by selecting the rows of the ADMISSIONS table whose admission_type attribute is ‘NEWBORN’ (4). Using the unique identifier of each neonate, we extracted a list of the top 10 neonatal diseases from the table D_ICD_DIAGNOSES. We then mapped them to vitals such as heart rate and respiratory rate, which are recorded in the table CHARTEVENTS.
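
The filtering and ranking steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used for the analysis: the rows are made-up stand-ins for the MIMIC3 tables, the `diagnosis` field and the labels `DISEASE_B` and `DISEASE_C` are placeholders, and the helper names `neonate_ids` and `top_diagnoses` are hypothetical.

```python
from collections import Counter

# Toy rows standing in for MIMIC3 tables; every value here is made up.
admissions = [
    {"subject_id": 1, "admission_type": "NEWBORN"},
    {"subject_id": 2, "admission_type": "EMERGENCY"},
    {"subject_id": 3, "admission_type": "NEWBORN"},
    {"subject_id": 4, "admission_type": "NEWBORN"},
]
# One row per diagnosis recorded for a patient (placeholder disease labels).
diagnoses = [
    {"subject_id": 1, "diagnosis": "RDS"},
    {"subject_id": 1, "diagnosis": "DISEASE_B"},
    {"subject_id": 2, "diagnosis": "DISEASE_C"},
    {"subject_id": 3, "diagnosis": "RDS"},
    {"subject_id": 4, "diagnosis": "RDS"},
    {"subject_id": 4, "diagnosis": "DISEASE_B"},
]


def neonate_ids(admission_rows):
    """Return the subject_ids whose admission_type is 'NEWBORN'."""
    return {row["subject_id"] for row in admission_rows
            if row["admission_type"] == "NEWBORN"}


def top_diagnoses(diagnosis_rows, subject_ids, n=10):
    """Count distinct neonates per diagnosis and return the n most common."""
    # De-duplicate (patient, diagnosis) pairs so repeat rows count once.
    seen = {(row["subject_id"], row["diagnosis"])
            for row in diagnosis_rows if row["subject_id"] in subject_ids}
    counts = Counter(diagnosis for _, diagnosis in seen)
    return counts.most_common(n)


neonates = neonate_ids(admissions)
ranking = top_diagnoses(diagnoses, neonates)
print(sorted(neonates))
print(ranking)
```

Counting distinct neonates per diagnosis, rather than raw rows, is a design choice made here so that a diagnosis recorded twice for the same baby is not double-counted.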

CHARTEVENTS contains all patients’ routine vital signs, along with additional information such as ventilator settings, lab values, code status, and mental status (3,5).

Top 10 neonatal diseases

Domain experts have observed that a neonate rarely presents with a single disease in isolation. There is evidence that comorbidity of several diseases is common in neonates.

Let’s take RDS as an example, as it is one of the most prevalent diseases in the NICU and one of the leading causes of neonatal death. Below is the co-morbidity matrix for the MIMIC3 patient data. Diagonal cells represent patients having only that specific disease. For example, 16 babies had RDS alone.
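
One way such a matrix could be computed is sketched below. The diagonal follows the definition given here (patients with only that disease); treating each off-diagonal cell as the number of patients diagnosed with both diseases is an assumption of this sketch. The patient data, the labels `DISEASE_B` and `DISEASE_C`, and the function name `comorbidity_matrix` are illustrative only.

```python
from itertools import combinations

# Made-up mapping from neonate id to the set of diagnosed diseases.
patient_diseases = {
    1: {"RDS"},
    2: {"RDS", "DISEASE_B"},
    3: {"RDS", "DISEASE_B", "DISEASE_C"},
    4: {"DISEASE_B"},
    5: {"RDS"},
}


def comorbidity_matrix(patient_to_diseases):
    """Build a symmetric count matrix keyed by (disease, disease).

    Diagonal cell (d, d): patients whose only diagnosis is d.
    Off-diagonal cell (a, b): patients diagnosed with both a and b
    (an assumption of this sketch).
    """
    diseases = sorted(set().union(*patient_to_diseases.values()))
    matrix = {(a, b): 0 for a in diseases for b in diseases}
    for found in patient_to_diseases.values():
        if len(found) == 1:
            (only,) = found
            matrix[(only, only)] += 1
        # Every unordered pair of co-occurring diseases fills two mirrored cells.
        for a, b in combinations(sorted(found), 2):
            matrix[(a, b)] += 1
            matrix[(b, a)] += 1
    return diseases, matrix


diseases, matrix = comorbidity_matrix(patient_diseases)
for a in diseases:
    print(a, [matrix[(a, b)] for b in diseases])
```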

Until now, most large-scale studies of preterm infants have focused on cross-sectional data, and there is no single longitudinal study containing a complete record of the health, physiological parameters, genetics, and environmental profiles of preterm infants from birth through adolescence. This information is critical for predicting overall mortality and morbidity in neonates. This project aims to acquire big data in the neonatal domain and define disease-specific phenotypes. These phenotypes will be measured longitudinally over a large Indian population to help reduce mortality and improve interventions in the neonatal population.

Going forward, we tried to map physiological parameters to each of these diseases. Using the CHARTEVENTS table, we identified the most common vital signs and mapped them to the above-mentioned conditions.
To link diseases with measured vitals, we looked at disease vs. frequency count of each measured vital (a subset of the complete analysis is shown below). There are 1,576 neonate-specific vitals in the MIMIC3 dataset.
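
The disease vs. frequency count idea can be illustrated with a small sketch: for every charted measurement, the count for that vital is incremented under each disease the patient has. The inputs are made up, the `vital` field, the labels `DISEASE_B` and `VITAL_C`, and the function name `vital_frequency_by_disease` are hypothetical, and the real analysis may have counted differently.

```python
from collections import Counter, defaultdict

# Made-up inputs: diagnoses per neonate and charted vitals (one row per measurement).
patient_diseases = {1: {"RDS"}, 2: {"RDS", "DISEASE_B"}, 3: {"DISEASE_B"}}
chart_events = [
    {"subject_id": 1, "vital": "Heart Rate"},
    {"subject_id": 1, "vital": "Heart Rate"},
    {"subject_id": 1, "vital": "Respiratory Rate"},
    {"subject_id": 2, "vital": "Heart Rate"},
    {"subject_id": 2, "vital": "VITAL_C"},
    {"subject_id": 3, "vital": "Respiratory Rate"},
]


def vital_frequency_by_disease(patient_to_diseases, events):
    """Count how often each vital was charted for patients with each disease."""
    table = defaultdict(Counter)
    for event in events:
        # A measurement counts once for every disease its patient has.
        for disease in patient_to_diseases.get(event["subject_id"], ()):
            table[disease][event["vital"]] += 1
    return table


table = vital_frequency_by_disease(patient_diseases, chart_events)
for disease in sorted(table):
    print(disease, dict(table[disease]))
```

Because a patient with several diseases contributes the same measurement to each of them, the counts describe association with a disease, not measurements exclusive to it.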

The table below shows only a subset of the analysis:

From the domain literature, the vitals listed below are known to be associated with RDS:

iNICU also captures and stores all physiological and clinical parameters. We will use this data to identify different patterns based on what we learned from the above model.
iNICU can easily capture the frequency of measured vitals and help validate domain-based hypotheses. This will allow doctors to carry out statistical research into the changes in vitals that can lead to different diseases.
iNICU will predict these conditions from the warning signs and notify health care providers at an early stage to protect the lives of our fragile patients.



