Paul-Corbalan 850684d213 Upload of code 2023-01-07 07:30:24 +01:00
original_data
visualisation
.gitignore
README.md
compare_data.py
compare_original_synthetic.ipynb
create_fake_data.ipynb
data_treatment.py
discriminator.py
environment.yml
fake_data_analysis.ipynb
format_data.py
generator.py
train_generator_discriminator.py
utils.py

README.md

Use of "Generative Adversarial Networks" for the generation of virtual patients

Abstract

Generative Adversarial Networks (GANs) are a promising approach for generating virtual patients in healthcare applications. GANs are a type of deep learning model that generates new, realistic data resembling existing data. In this report, we review the current state of the art in GANs and their application to the Pima Indians Diabetes Database. We compare this work with the copula-based approach previously applied to this dataset in the article Agent-based modeling in medical research. Example in health economics[1]. As a practical case, we train a Generative Adversarial Network in several possible configurations, taking into account what is done in similar work (Data augmentation using GANs[2]). We then generate data with both the GAN and the copulas for comparison. This comparison shows that the distribution of the GAN-generated data differs from that of the original data, whereas the copula-generated data are much closer to the original in terms of spread. We also conclude that both methods are currently limited in generating atypical patient data, but remain effective at generating more conventional data.

[1] Philippe Saint-Pierre, Romain Demeulemeester, Nadège Costa, and Nicolas Savy. Agent-based modeling in medical research. Example in health economics. arXiv preprint arXiv:2205.10131, 2022.
[2] Fabio Henrique Kiyoiti dos Santos Tanaka and Claus Aranha. Data augmentation using GANs. arXiv preprint arXiv:1904.09135, 2019.

Keywords: GAN, copula, PIMA, synthetic data, virtual patients, ABM, healthcare, Artificial Intelligence (AI).
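
The abstract compares original and synthetic data in terms of distribution and spread. As a rough illustration of that kind of check, the sketch below computes, for each column, the gap between means and the ratio of standard deviations. It is not the repository's comparison code: the function name spread_report, the column name and the numbers are all hypothetical, and only the Python standard library is used.

```python
from statistics import mean, stdev


def spread_report(original, synthetic):
    """Compare per-column mean and standard deviation of two tables.

    Both arguments map a column name to a list of numeric values.
    A std_ratio below 1 means the synthetic column is narrower than
    the original one; above 1 means it is wider.
    """
    report = {}
    for column, real_values in original.items():
        fake_values = synthetic[column]
        report[column] = {
            "mean_gap": mean(fake_values) - mean(real_values),
            "std_ratio": stdev(fake_values) / stdev(real_values),
        }
    return report


# Made-up values for illustration only; not taken from the Pima dataset.
original = {"glucose": [85.0, 110.0, 140.0, 165.0]}
synthetic = {"glucose": [105.0, 120.0, 130.0, 145.0]}

report = spread_report(original, synthetic)
print(report["glucose"])
```

In this toy case the two columns share the same mean, but the synthetic one has a much smaller standard deviation, which is the kind of mismatch in spread that a summary of means alone would hide.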

Dependencies

Python libraries

To run the Python scripts and Jupyter notebooks, create the conda environment with the following command in a terminal once Anaconda is installed:

conda env create --file environment.yml

R libraries

To run the code Prog_R_IJB_2022 and compile it to PDF format, first install the required packages with the following command in an R console:

install.packages(c("rvinecopulib", "e1071", "GGally", "caret", "MASS", "tidyverse", "corrr", "lsr", "cowplot", "EnvStats", "ggraph", "fitdistrplus", "truncdist", "truncnorm"))