This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

Datalab is not responding after machine learning model training #2161

Open
OrielResearchCure opened this issue Mar 22, 2020 · 0 comments

Hello,

I am using a Datalab machine with 32 or 64 cores to train a model.

The code looks like this (a Keras model run):

model_history = model.fit_generator(train_gen,
                                    steps_per_epoch=30,
                                    epochs=EPOCHS,
                                    validation_data=validation_gen,
                                    validation_steps=20,
                                    callbacks=[early_stopping, tensorboard_callback],
                                    class_weight=class_weight)

The training runs fine, and early stopping is triggered:

30/30 [==============================] - 71s 2s/step - loss: 0.0784 - tp: 863.0000 - fp: 484.0000 - tn: 7045.0000 - fn: 60.0000 - accuracy: 0.9356 - precision: 0.6407 - recall: 0.9350 - auc: 0.9835 - val_loss: 0.0501 - val_tp: 500.0000 - val_fp: 20.0000 - val_tn: 4060.0000 - val_fn: 0.0000e+00 - val_accuracy: 0.9956 - val_precision: 0.9615 - val_recall: 1.0000 - val_auc: 1.0000
Epoch 00010: early stopping

Once training completes, the machine disconnects. The only way for me to regain access is to restart it.
Connection attempts fail with a 504 Gateway Timeout error.
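
To narrow down the symptom, it may help to check from the client side whether the Datalab endpoint answers at all or only returns the 504. Below is a minimal sketch using only the Python standard library. The `probe` helper and the URL `http://localhost:8081/` are assumptions for illustration (8081 is assumed to be the local port the connection is forwarded to); adjust them to your setup.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return a short description of how the endpoint responds.

    Hypothetical helper: it only distinguishes "got an HTTP status"
    from "could not reach the server at all".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server (or a proxy in front of it) answered with an error
        # status, for example 504 Gateway Timeout.
        return f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # No HTTP answer at all: connection refused, DNS failure, timeout.
        return f"unreachable: {err}"


if __name__ == "__main__":
    # Assumed local address of the Datalab tunnel; change as needed.
    print(probe("http://localhost:8081/"))
```

An `HTTP 504` result would suggest that something in front of the notebook server is still answering while the server behind it is not, whereas `unreachable` would point at the connection itself. This only describes the symptom; it does not identify the cause.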

My questions are:

  1. What might be causing the disconnect? If possible, I would rather keep using Datalab for model training.
  2. I run the following installations at the beginning of the execution:
!pip install tensorflow==2.0.0b0 -q
!conda install -y -c anaconda numpy
!conda install -y -c anaconda seaborn

What is the right way to include them so that these libraries are already installed when I create the machine or connect to it?
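
Until there is a way to bake the libraries into the machine, one workaround is to make the install cell idempotent so that it only installs what is missing. The sketch below is an assumption-laden illustration, not a Datalab feature: the names `REQUIRED`, `missing_packages`, and `ensure_installed` are hypothetical, it uses `pip` for all three packages (the commands above use `conda` for numpy and seaborn), and it checks only whether a module is importable, not which version is installed.

```python
import importlib.util
import subprocess
import sys

# Importable module name -> pip requirement (taken from the commands above).
REQUIRED = {
    "tensorflow": "tensorflow==2.0.0b0",
    "numpy": "numpy",
    "seaborn": "seaborn",
}


def missing_packages(required):
    """Return the pip requirements whose module cannot be found."""
    return [
        requirement
        for module, requirement in required.items()
        if importlib.util.find_spec(module) is None
    ]


def ensure_installed(required):
    """Install only the missing requirements with pip; return what was installed."""
    missing = missing_packages(required)
    if missing:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-q", *missing]
        )
    return missing


# Usage (first cell of the notebook):
# ensure_installed(REQUIRED)
```

Because the check is by module name only, an existing TensorFlow of a different version would count as installed; pinning would need an explicit version comparison.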

Many thanks for any advice,
eilalan
