-
Notifications
You must be signed in to change notification settings - Fork 9
/
README.Rmd
93 lines (70 loc) · 2.45 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
Sys.setenv("RETICULATE_REMAP_OUTPUT_STREAMS" = 1)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# ctgan
<!-- badges: start -->
[![Codecov test coverage](https://codecov.io/gh/kasaai/ctgan/branch/master/graph/badge.svg)](https://codecov.io/gh/kasaai/ctgan?branch=master)
[![R build status](https://github.com/kasaai/ctgan/workflows/R-CMD-check/badge.svg)](https://github.com/kasaai/ctgan/actions)
<!-- badges: end -->
The ctgan package provides an R interface to [CTGAN](https://github.com/DAI-Lab/CTGAN),
a GAN-based data synthesizer. The package enables one to create synthetic samples
of confidential or proprietary datasets for sharing. For more details and use cases,
see the papers in the [References](#references) section.
## Installation
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("remotes")
remotes::install_github("kasaai/ctgan")
```
## Example
A quick example:
```{r example}
library(ctgan)
# Install dependencies before first usage
# install_ctgan()
synthesizer <- ctgan()
synthesizer %>%
fit(iris, epochs = 20)
synthesizer %>%
ctgan_sample() %>%
# Dataset-specific post-processing
dplyr::mutate_if(is.numeric, ~ pmax(.x, 0.1))
```
This generated dataset can then be shared, but one can also serialize and share the trained synthesizer:
```{r}
model_dir <- tempdir()
synthesizer %>%
ctgan_save(model_dir)
ctgan_load(model_dir)
```
## References
If you use ctgan, please consider citing the original work,
- *Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni.* **Modeling Tabular data using Conditional GAN**. NeurIPS, 2019.
and the work implementing the R package,
- *Kevin Kuo.* **Generative Synthesis of Insurance Datasets**. [arXiv:1912.02423](https://arxiv.org/abs/1912.02423), 2019.
```LaTeX
@inproceedings{xu2019modeling,
title={Modeling Tabular data using Conditional GAN},
author={Xu, Lei and Skoularidou, Maria and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},
booktitle={Advances in Neural Information Processing Systems},
year={2019}
}
@misc{kuo2019generative,
title={Generative Synthesis of Insurance Datasets},
author={Kevin Kuo},
year={2019},
eprint={1912.02423},
archivePrefix={arXiv},
primaryClass={stat.AP}
}
```