1. Experiments:
sbatch
dir: src/Experiments/
5 different sbatch scripts: "bootstrapt_sbatch_i", where i = 0,1,2,3,4.
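The five submissions above can be scripted. The following is a minimal sketch, not part of the repository. It assumes the scripts are named exactly "bootstrapt_sbatch_<i>" with no file extension and that Slurm's sbatch command is on the PATH. The helper names build_sbatch_commands and submit_all are hypothetical.

```python
import subprocess

EXPERIMENT_DIR = "src/Experiments"  # directory named in this README
N_SCRIPTS = 5                       # i = 0,1,2,3,4


def build_sbatch_commands(experiment_dir=EXPERIMENT_DIR, n_scripts=N_SCRIPTS):
    """Return one ['sbatch', <script path>] command per bootstrap script."""
    # Assumption: file names follow "bootstrapt_sbatch_<i>" with no extension;
    # adjust the pattern if the files on disk are named differently.
    return [
        ["sbatch", f"{experiment_dir}/bootstrapt_sbatch_{i}"]
        for i in range(n_scripts)
    ]


def submit_all(dry_run=True):
    """Print each command; submit to Slurm only when dry_run is False."""
    commands = build_sbatch_commands()
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling submit_all() with the default dry_run=True only prints the commands, which is a cheap way to check the paths before submitting anything.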
2. Source code:
inference
dir: src/EM_exact_death_times_hierarchical_shared.py
This script takes the following arguments:
name: name of the experiment
min_age: minimum age, default = 16
dataset: data specification, default = updated_data
age_inv: age interval specification, default = inv4, which is [16,23), [23,30), [30, 60), [60, inf)
n_patients_per_proc: number of patients per process, default = 100.
max_steps_em: number of EM iterations, default = 100
max_steps_optim: maximum number of optimization iterations in the M-step, default = 5.
model: model type, either discrete (Markov chain) or continuous (Markov process), default = continuous.
test: boolean flag for testing (store_true, so it is off unless the flag is passed).
autograd_optim: boolean flag for using automatic differentiation in the M-step optimization.
Z_prior: prior probability of Model 1 (prior proportion of potential patients).
bootstrap_seed: seed number for the bootstrap, default = 0.
bootstrap_total: total number of seeds, default = 5000
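For orientation, the argument list above could be expressed with argparse roughly as follows. This is a sketch under assumptions, not the script's actual parser: the "--" option spelling, the argument types, treating autograd_optim as a store_true flag, and leaving Z_prior without a default are all guesses, since the list does not state them.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="EM inference (illustrative sketch)")
    parser.add_argument("--name", type=str, help="name of the experiment")
    parser.add_argument("--min_age", type=int, default=16)
    parser.add_argument("--dataset", type=str, default="updated_data")
    parser.add_argument("--age_inv", type=str, default="inv4")
    parser.add_argument("--n_patients_per_proc", type=int, default=100)
    parser.add_argument("--max_steps_em", type=int, default=100)
    parser.add_argument("--max_steps_optim", type=int, default=5)
    parser.add_argument("--model", type=str, default="continuous",
                        choices=["discrete", "continuous"])
    # store_true flags are False unless given on the command line.
    parser.add_argument("--test", action="store_true")
    # Assumption: autograd_optim is also a store_true flag.
    parser.add_argument("--autograd_optim", action="store_true")
    # Assumption: no default is documented for Z_prior, so None is used here.
    parser.add_argument("--Z_prior", type=float, default=None)
    parser.add_argument("--bootstrap_seed", type=int, default=0)
    parser.add_argument("--bootstrap_total", type=int, default=5000)
    return parser


# Parsing an empty argument list shows the documented defaults.
args = build_parser().parse_args([])
print(args.min_age, args.age_inv, args.model, args.bootstrap_total)
```

With this sketch, a bootstrap run with a given seed would be selected by passing something like "--bootstrap_seed 3" on the command line.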
return:
counter: the number of iterations in EM
currZ_pos_list: [n_patients]
currStates_list: [currStates_0, currStates_1]
currAlpha: [currAlpha_0, currAlpha_1]: [[nStates_i, 4], [nStates_i,4], [nStates_i,2]]
currEta: [currEta_0, currEta_1]: [nStates_i, 3]
currW: [currW_0, currW_1]: [4, n_inv], [9, n_inv]
currC: [currC_0, currC_1]: [nStates_i, n_inv]
currNegLogLik: scalar
inv: age segmentation
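The inv4 age segmentation, [16,23), [23,30), [30,60), [60,inf), gives n_inv = 4 intervals. As a small sketch of how an age maps to an interval index (the names INV4_EDGES and age_to_interval are hypothetical and not taken from the source code):

```python
from bisect import bisect_right
import math

# Left edges of the inv4 intervals, plus inf as the open right end.
INV4_EDGES = [16, 23, 30, 60, math.inf]


def age_to_interval(age, edges=INV4_EDGES):
    """Return the index k such that edges[k] <= age < edges[k + 1]."""
    if age < edges[0]:
        raise ValueError(f"age {age} is below the minimum age {edges[0]}")
    # bisect_right counts the edges that are <= age; subtracting 1 gives the
    # index of the half-open interval. The min() keeps age == inf in the last bin.
    return min(bisect_right(edges, age) - 1, len(edges) - 2)


n_inv = len(INV4_EDGES) - 1  # number of age intervals in inv4
```

Because the intervals are half-open, an age of exactly 23 falls in the second interval, not the first.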
validation
dir: src/model_validation/model_validation.py
name: name of the experiment, default = Model_Validation
min_age: minimum age, default = 16
dataset: data specification, default = updated_data
age_inv: age interval specification, default = inv4, which is [16,23), [23,30), [30, 60), [60, inf)
n_patients_per_proc: number of patients per process, default = 100.
test: boolean flag for testing (store_true, so it is off unless the flag is passed).
data_process
dir: src/combine_distributed_data.py
It combines all or part of the distributed data (distributed_updated_nonzero_data, distributed_updated_data) into a single dataset, saved in data_full (14601 distributed files, each containing data for 100 patients), data_1000, or data_nonzero_1000.
The saved structure has the following fields:
1. mcmcPatientTestTypes: testTypes[p] patient_tests, patient_tests[j] patient_test ([3,], 3 categories)
2. mcmcPatientObservations: observations[p] patient_observations, patient_observations[j] patient_observation ([3,4], count)
3. mcmcPatientAges: ages[p] patient_ages, patient_ages[j] patient_age (scalar)
4. mcmcPatientTreatmentIndx: treatment_indx[p] patient_treatment_indx (array)
5. mcmcPatientCensorDates: censor_ages[p] censor_age (scalar)
6. mcmcPatientDeathStates: death_states[p] death_state ([0,1])
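To make the nesting concrete, here is a toy sketch of the layout described above, together with a way to concatenate per-file chunks. It assumes each chunk is a dict keyed by the six field names and uses plain Python lists; the actual on-disk format and container types are not stated here, and all values are placeholders, not real data. The helper names are hypothetical.

```python
FIELDS = [
    "mcmcPatientTestTypes",
    "mcmcPatientObservations",
    "mcmcPatientAges",
    "mcmcPatientTreatmentIndx",
    "mcmcPatientCensorDates",
    "mcmcPatientDeathStates",
]


def make_toy_chunk(n_patients, n_visits=2):
    """Build a placeholder chunk with the documented layout (dummy values)."""
    return {
        # testTypes[p][j] has shape [3] (3 categories)
        "mcmcPatientTestTypes": [[[0, 0, 0] for _ in range(n_visits)]
                                 for _ in range(n_patients)],
        # observations[p][j] has shape [3, 4] (counts)
        "mcmcPatientObservations": [[[[0] * 4 for _ in range(3)]
                                     for _ in range(n_visits)]
                                    for _ in range(n_patients)],
        # ages[p][j] is a scalar
        "mcmcPatientAges": [[20.0 + j for j in range(n_visits)]
                            for _ in range(n_patients)],
        # treatment_indx[p] is an array (empty here)
        "mcmcPatientTreatmentIndx": [[] for _ in range(n_patients)],
        # censor_ages[p] is a scalar
        "mcmcPatientCensorDates": [30.0 for _ in range(n_patients)],
        # death_states[p] is 0 or 1
        "mcmcPatientDeathStates": [0 for _ in range(n_patients)],
    }


def combine_chunks(chunks):
    """Concatenate per-file chunks patient by patient into one dataset."""
    combined = {key: [] for key in FIELDS}
    for chunk in chunks:
        for key in FIELDS:
            combined[key].extend(chunk[key])
    return combined


def check_layout(data):
    """Check the documented shapes and return the number of patients."""
    n = len(data["mcmcPatientAges"])
    assert all(len(data[key]) == n for key in FIELDS)
    for p in range(n):
        tests = data["mcmcPatientTestTypes"][p]
        obs = data["mcmcPatientObservations"][p]
        ages = data["mcmcPatientAges"][p]
        assert len(tests) == len(obs) == len(ages)
        assert all(len(t) == 3 for t in tests)
        assert all(len(o) == 3 and all(len(row) == 4 for row in o) for o in obs)
        assert data["mcmcPatientDeathStates"][p] in (0, 1)
    return n


combined = combine_chunks([make_toy_chunk(2), make_toy_chunk(3)])
print(check_layout(combined))
```

The per-patient index p runs over the outer list of every field, so combining chunks amounts to extending each field's outer list in the same order.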