# Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars

This repository contains a deep learning approach that unlocks the potential of event cameras for the prediction of a vehicle's steering angle.

#### Citing

If you use this code in an academic context, please cite the following publication:

Paper: [Event-based vision meets deep learning on steering prediction for self-driving cars](http://rpg.ifi.uzh.ch/docs/CVPR18_Maqueda.pdf)

Video: [YouTube](https://www.youtube.com/watch?v=_r_bsjkJTHA&feature=youtu.be)

```
@inproceedings{maqueda_2018,
  title={Event-based vision meets deep learning on steering prediction for self-driving cars},
  author={Maqueda, Ana I and Loquercio, Antonio and Gallego, Guillermo and Garc{\'\i}a, Narciso and Scaramuzza, Davide},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={5419--5427},
  year={2018}
}
```

## Introduction

Steering angle prediction with standard cameras is not robust to scenes characterized by high dynamic range (HDR), motion blur, and low light. Event cameras, however, are bio-inspired sensors that can handle all three problems at once. They output a stream of asynchronous events generated by moving edges in the scene. Their natural response to motion, together with their advantages over traditional cameras (very high temporal resolution, very high dynamic range, and low latency), makes them a natural fit for the steering prediction task, which is addressed here with a deep-learning-based regression approach.

### Model

A series of ResNet architectures, namely ResNet18 and ResNet50, have been deployed to carry out the steering prediction task. They are used as feature extractors, considering only their convolutional layers. A global average pooling (GAP) layer then encodes the image features into a vectorized descriptor that feeds a fully connected (FC) layer (256-dimensional for ResNet18, 1024-dimensional for ResNet50). This FC layer is followed by a ReLU non-linearity and a final 1-dimensional FC layer that outputs the predicted steering angle.

![architecture](images/architecture.png)
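
For concreteness, here is a minimal Keras sketch of the ResNet50 variant of this regression architecture. It only illustrates the description above: the repository's actual model builders live in ```cnn_models.py```, and the input resolution, weight initialization, and function name used here are assumptions.

```python
# Illustrative sketch only -- the repository builds its models in cnn_models.py.
from keras.applications.resnet50 import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model


def resnet50_steering(img_height=224, img_width=224, img_channels=3):
    # Convolutional layers of ResNet50 act as the feature extractor.
    base = ResNet50(weights='imagenet', include_top=False,
                    input_shape=(img_height, img_width, img_channels))
    x = GlobalAveragePooling2D()(base.output)  # vectorized descriptor
    x = Dense(1024, activation='relu')(x)      # 1024-d FC + ReLU (ResNet50 variant)
    steering = Dense(1)(x)                     # 1-d FC: predicted steering angle
    return Model(inputs=base.input, outputs=steering)
```

The ResNet18 variant is analogous, with a 256-dimensional FC layer after the GAP.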

### Data

In order to learn steering angles from event images, the publicly available [DAVIS Driving Dataset 2017 (DDD17)](https://docs.google.com/document/d/1HM0CSmjO8nOpUeTvmPjopcBcVCk7KXvLUuiZFS6TWSg/pub) has been used. It provides approximately 12 hours of annotated driving recordings collected by a car under different road, weather, and illumination conditions. The dataset includes asynchronous events as well as synchronous grayscale frames.

![input data](images/input_data.png)

## Running the code

### Software requirements

This code has been tested on Ubuntu 14.04 with Python 3.4.

Dependencies (an example install command is given below):
- TensorFlow
- Keras 2.1.4
- NumPy
- OpenCV
- scikit-learn
- Python gflags
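
One possible way to install the dependencies with pip is shown below. The package names (```opencv-python```, ```python-gflags```) and the unpinned TensorFlow version are assumptions; the original only pins Keras, and the code targets the TensorFlow 1.x / Keras 2.1 era.

```
pip install tensorflow keras==2.1.4 numpy opencv-python scikit-learn python-gflags
```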

### Data preparation

Please follow the instructions on the [DDD17 site](https://docs.google.com/document/d/1HM0CSmjO8nOpUeTvmPjopcBcVCk7KXvLUuiZFS6TWSg/pub) to download the dataset and visualize the HDF5 file contents. Afterwards, you should have the following structure:

```
DDD17/
    run1_test/
    run2/
    run3/
    run4/
    run5/
```

The authors also provide some [code](https://code.ini.uzh.ch/jbinas/ddd17-utils) for viewing and exporting the data. Download that repository and copy its files into the ```data_preprocessing``` directory.

Asynchronous events are converted into synchronous event frames by pixel-wise accumulation over a constant time interval, using separate channels for positive and negative events (a minimal sketch of this accumulation is given at the end of step 1). To prepare the data in the format required by our implementation, follow these steps:

#### 1. Accumulate events

Run ```data_preprocessing/reduce_to_frame_based.py``` to reduce the data to a frame-based representation. The output is another HDF5 file containing the frame-based data obtained by accumulating the events over consecutive ```binsize```-second intervals. The created HDF5 file will contain two new fields:
- **dvs_frame**: event frames (a 4-tensor, with number_of_frames x width x height x 2 elements).
- **aps_frame**: grayscale frames (a 3-tensor, with number_of_frames x width x height elements).

```
python data_preprocessing/reduce_to_frame_based.py --binsize 0.050 --update_prog_every 10 --keep_events 1 --source_folder ../DDD17 --dest_folder ../DDD17/DAVIS_50ms
```

Note: the ```reduce_to_frame_based.py``` script is the original ```export.py``` provided by the authors, modified to process several HDF5 files from a source directory and to save positive and negative event frames separately.
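
The pixel-wise accumulation can be pictured with the following NumPy sketch. It is an illustration, not the logic of ```reduce_to_frame_based.py```; the event-array layout, the +1/-1 polarity encoding, and the (height, width, 2) ordering are assumptions.

```python
import numpy as np


def accumulate_events(x, y, polarity, height, width):
    """Accumulate one binsize window of events into a 2-channel event frame.

    x, y, polarity are 1-D arrays describing the events inside the window.
    Channel 0 counts positive events, channel 1 counts negative events.
    """
    frame = np.zeros((height, width, 2), dtype=np.uint16)
    pos = polarity > 0
    np.add.at(frame[..., 0], (y[pos], x[pos]), 1)    # positive events
    np.add.at(frame[..., 1], (y[~pos], x[~pos]), 1)  # negative events
    return frame
```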

#### 2. Split recordings

Run ```data_preprocessing/split_recordings.py``` to split the recordings into consecutive, non-overlapping short sequences of a few seconds each. Subsets of these sequences are used for training and testing, respectively. In particular, we set training sequences to 40 s and testing sequences to 20 s (an illustrative sketch is given below).

```
python data_preprocessing/split_recordings.py --source_folder ./DDD17/DAVIS_50ms --rewrite 1 --train_per 40 --test_per 20
```

Note: the ```split_recordings.py``` script is the original ```prepare_cnn_data.py``` provided by the authors, modified to process several HDF5 files from a source directory and to avoid frame pre-processing.
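
For illustration only, one way to realize such a split is to assign every frame according to its offset inside a repeating 60-second window (40 s of training followed by 20 s of testing). This alternating scheme is an assumption of the sketch; the exact bookkeeping in ```split_recordings.py``` may differ.

```python
import numpy as np


def assign_split(timestamps, train_sec=40.0, test_sec=20.0):
    """Label each frame 'train' or 'test' from its offset in a repeating
    (train_sec + test_sec) window, measured from the start of the recording."""
    t = np.asarray(timestamps, dtype=np.float64)
    phase = (t - t[0]) % (train_sec + test_sec)
    return np.where(phase < train_sec, 'train', 'test')
```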

#### 3. Compute percentiles

Run ```data_preprocessing/compute_percentiles.py``` to compute percentiles of the DVS/event frames, which are used to remove outliers and to normalize the frames.

```
python data_preprocessing/compute_percentiles.py --source_folder ./DDD17/DAVIS_50ms --inf_pos_percentile 0.0 --sup_pos_percentile 0.9998 --inf_neg_percentile 0.0 --sup_neg_percentile 0.9998
```
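
The sketch below shows how such percentiles (given as fractions, e.g. 0.9998 for the 99.98th percentile) could be used to clip outliers and rescale an event-frame channel to [0, 1]. It is illustrative only; ```compute_percentiles.py``` may store or apply the statistics differently.

```python
import numpy as np


def clip_and_normalize(channel, inf_percentile=0.0, sup_percentile=0.9998):
    """Clip a stack of event-frame channels at the given percentiles and
    rescale the result to [0, 1]."""
    lo = np.percentile(channel, 100.0 * inf_percentile)
    hi = np.percentile(channel, 100.0 * sup_percentile)
    clipped = np.clip(channel.astype(np.float32), lo, hi)
    return (clipped - lo) / max(hi - lo, 1e-6)
```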

#### 4. Export CNN data

Run ```data_preprocessing/export_cnn_data.py``` to export the DVS/event frames, the APS/grayscale frames, and the difference of grayscale frames (APS diff) in PNG format, as well as text files with the steering angles, from the HDF5 files, so that they can be used by the network.

```
python export_cnn_data.py --source_folder ./DDD17/DAVIS_50ms
```
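
The APS-diff frames mentioned above are log-intensity differences of consecutive grayscale frames (log(I_1) - log(I_0), as noted in the training script below). A minimal sketch follows; the epsilon guard is an assumption of this illustration.

```python
import numpy as np


def aps_diff(prev_frame, curr_frame, eps=1e-3):
    """Log-intensity difference between two consecutive APS (grayscale) frames."""
    i0 = prev_frame.astype(np.float32) + eps  # avoid log(0)
    i1 = curr_frame.astype(np.float32) + eps
    return np.log(i1) - np.log(i0)
```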
import tensorflow as tf
import numpy as np
import os
import sys
import gflags

from keras.callbacks import ModelCheckpoint
from keras import backend as K
import keras

import logz
import cnn_models
import utils
import log_utils
from common_flags import FLAGS
from constants import TRAIN_PHASE

def getModel(img_width, img_height, img_channels, output_dim, weights_path):
    """
    Initialize model.
    # Arguments
       img_width: Target image width.
       img_height: Target image height.
       img_channels: Target image channels.
       output_dim: Dimension of model output.
       weights_path: Path to pre-trained model.
    # Returns
       model: A Model instance.
    """
    if FLAGS.imagenet_init:
        model = cnn_models.resnet50(img_width, img_height,
                                    img_channels, output_dim)
    else:
        model = cnn_models.resnet50_random_init(img_width, img_height,
                                                img_channels, output_dim)

    if weights_path:
        # try:
        model.load_weights(weights_path)
        print("Loaded model from {}".format(weights_path))
        # except:
        #     print("Impossible to find weight path. Returning untrained model")

    return model

def trainModel(train_data_generator, val_data_generator, model, initial_epoch):
    """
    Model training.
    # Arguments
       train_data_generator: Training data generated batch by batch.
       val_data_generator: Validation data generated batch by batch.
       model: A Model instance to be trained.
       initial_epoch: Epoch at which training starts (non-zero when resuming
          from saved weights).
    """

    # Initialize number of samples for hard-mining
    model.k_mse = tf.Variable(FLAGS.batch_size, trainable=False,
                              name='k_mse', dtype=tf.int32)

    # Configure training process
    optimizer = keras.optimizers.Adam(lr=FLAGS.initial_lr, decay=1e-4)
    model.compile(loss=[utils.hard_mining_mse(model.k_mse)], optimizer=optimizer,
                  metrics=[utils.steering_loss, utils.pred_std])

    # Save model with the lowest validation loss
    weights_path = os.path.join(FLAGS.experiment_rootdir, 'weights_{epoch:03d}.h5')
    writeBestModel = ModelCheckpoint(filepath=weights_path, monitor='val_steering_loss',
                                     save_best_only=True, save_weights_only=True)

    # Save model every 'log_rate' epochs.
    # Save training and validation losses.
    logz.configure_output_dir(FLAGS.experiment_rootdir)
    saveModelAndLoss = log_utils.MyCallback(filepath=FLAGS.experiment_rootdir,
                                            period=FLAGS.log_rate,
                                            batch_size=FLAGS.batch_size,
                                            factor=FLAGS.lr_scale_factor)

    # Train model
    steps_per_epoch = np.minimum(int(np.ceil(
        train_data_generator.samples / FLAGS.batch_size)), 2000)
    validation_steps = int(np.ceil(val_data_generator.samples / FLAGS.batch_size)) - 1

    model.fit_generator(train_data_generator,
                        epochs=FLAGS.epochs, steps_per_epoch=steps_per_epoch,
                        callbacks=[writeBestModel, saveModelAndLoss],
                        validation_data=val_data_generator,
                        validation_steps=validation_steps,
                        initial_epoch=initial_epoch)

def _main():

    # Set random seed
    if FLAGS.random_seed:
        seed = np.random.randint(0, 2**31 - 1)
    else:
        seed = 5
    np.random.seed(seed)
    tf.set_random_seed(seed)

    K.set_learning_phase(TRAIN_PHASE)

    # Create the experiment rootdir if not already there
    if not os.path.exists(FLAGS.experiment_rootdir):
        os.makedirs(FLAGS.experiment_rootdir)

    # Input image dimensions
    img_width, img_height = FLAGS.img_width, FLAGS.img_height

    # Cropped image dimensions
    crop_img_width, crop_img_height = FLAGS.crop_img_width, FLAGS.crop_img_height

    # Output dimension (one for steering)
    output_dim = 1

    # Input image channels
    # - DVS frames: 2 channels (first one for positive events, second one for negative events)
    # - APS frames: 1 channel (grayscale images)
    # - APS DIFF frames: 1 channel (log(I_1) - log(I_0))
    # All frame modes are fed to the network as 3-channel inputs.
    img_channels = 3

    # Generate training data with real-time augmentation
    if FLAGS.frame_mode == 'dvs':
        train_datagen = utils.DroneDataGenerator()
    elif FLAGS.frame_mode == 'aps':
        train_datagen = utils.DroneDataGenerator(rotation_range=0.2,
                                                 rescale=1./255,
                                                 width_shift_range=0.2,
                                                 height_shift_range=0.2)
    else:
        train_datagen = utils.DroneDataGenerator(rotation_range=0.2,
                                                 width_shift_range=0.2,
                                                 height_shift_range=0.2)

    train_generator = train_datagen.flow_from_directory(FLAGS.train_dir,
                                                        is_training=True,
                                                        shuffle=True,
                                                        frame_mode=FLAGS.frame_mode,
                                                        target_size=(img_height, img_width),
                                                        crop_size=(crop_img_height, crop_img_width),
                                                        batch_size=FLAGS.batch_size)

    # Generate validation data with real-time augmentation
    if FLAGS.frame_mode == 'dvs' or FLAGS.frame_mode == 'aps_diff':
        val_datagen = utils.DroneDataGenerator()
    else:
        val_datagen = utils.DroneDataGenerator(rescale=1./255)

    val_generator = val_datagen.flow_from_directory(FLAGS.val_dir,
                                                    shuffle=False,
                                                    frame_mode=FLAGS.frame_mode,
                                                    target_size=(img_height, img_width),
                                                    crop_size=(crop_img_height, crop_img_width),
                                                    batch_size=FLAGS.batch_size)

    # Output dimensions of training and validation data must agree
    assert train_generator.output_dim == val_generator.output_dim, \
        "Output dimensions do not match."
    output_dim = train_generator.output_dim

    # Weights to restore
    weights_path = os.path.join(FLAGS.experiment_rootdir, FLAGS.weights_fname)
    initial_epoch = 0
    if not FLAGS.restore_model:
        # In this case weights will start from random
        weights_path = None
    else:
        # In this case weights will start from the specified model
        initial_epoch = FLAGS.initial_epoch

    # Define model
    model = getModel(img_width, img_height, img_channels,
                     output_dim, weights_path)

    # Serialize model into json
    json_model_path = os.path.join(FLAGS.experiment_rootdir, FLAGS.json_model_fname)
    utils.modelToJson(model, json_model_path)

    # Train model
    trainModel(train_generator, val_generator, model, initial_epoch)

def main(argv):
    # Utility main to load flags
    try:
        argv = FLAGS(argv)  # parse flags
    except gflags.FlagsError:
        print('Usage: %s ARGS\n%s' % (sys.argv[0], FLAGS))
        sys.exit(1)

    _main()


if __name__ == "__main__":
    main(sys.argv)