Depth map creation for monocular depth estimation #146

Open
wants to merge 26 commits into master

Conversation

TilakD

@TilakD TilakD commented Jul 6, 2020

Hi,

I have created a tutorial notebook which shows how to create depth maps for frames from the ring cameras using the API. This will help researchers train and test their monocular depth estimation models.

@James-Hays
Contributor

Cool! Thanks for your contribution. We'll take a look at this.

@TilakD
Author

TilakD commented Jul 6, 2020

Thanks for the quick reply! This was a quick implementation; let me know if you are interested in integrating it, and I can clean it up and follow CONTRIBUTING.md to submit a proper pull request.

@johnwlambert
Contributor

Hi @TilakD, thanks very much for this contribution. This looks like something we would like to integrate.

Do you mind adding comments and type hints, and also following black formatting for the Python code (you could copy the code into a pure Python file, run black on that file, then copy it back into the notebook)? Could you also add explicit variables, with instructions for the user to enter the input data and output directories?

I'm also curious about the visualization format. Do you think there is a clearer way to show the depth map? It would also be nice to make the RGB and depth images identical in size.
[Screenshot: current RGB image and depth map visualization]

@johnwlambert
Contributor

johnwlambert commented Jul 9, 2020

Hi @TilakD, a few other small comments:

Thanks very much @TilakD.

@TilakD
Author

TilakD commented Jul 11, 2020

Hello @johnwlambert @James-Hays, I have made some improvements and cleanup and addressed a few of your code comments; please take a look at the notebook.

@johnwlambert I am trying to follow the KITTI dataset format and have replicated how they save their depth images and RGB images. To your comment on precision, I am multiplying the depth value (meters) by 256 (pred_depth_scaled = blank_img * 256.0) and saving as uint32.
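
For reference, a minimal sketch of that KITTI-style storage scheme (not the notebook's exact code; the helper names are only illustrative, and it uses uint16 rather than uint32, which the discussion below settles on):

    import cv2
    import numpy as np

    def save_depth_png(depth_m: np.ndarray, out_path: str) -> None:
        """Store depth (meters, 0 = no ground truth) as depth*256 in a 16-bit PNG."""
        depth_scaled = np.round(depth_m * 256.0).astype(np.uint16)
        cv2.imwrite(out_path, depth_scaled)

    def load_depth_png(in_path: str) -> np.ndarray:
        """Recover depth in meters; pixels equal to 0 still mean 'no ground truth'."""
        raw = cv2.imread(in_path, cv2.IMREAD_UNCHANGED).astype(np.float32)
        return raw / 256.0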

@johnwlambert
Contributor

Hi @TilakD, thanks very much for these improvements. A few more small comments:

  • Imports should be grouped in the following order:
    1. Standard library imports.
    2. Related third party imports.
    3. Local application/library specific imports.
    You should put a blank line between each group of imports.
  • A few typos: "Mupltiple to maintain precision" -> "Multiple ...", "extract all neccessary data into" -> "necessary", "and create depth map" -> "and create depth maps".
  • Could we add a small note about the on-disk format of the depth images? I.e. "0" denotes a null value, rather than actually zero depth.
  • Right after the "On sample data" cell, could you add an html or markdown text cell instructing the user to insert the path to the dataset in a certain variable name, along with an output path? And then under Visualize, we can reference those same paths, instead of hard-coding them.
  • I didn't quite catch how the "Genererate rgb to depth mapping for model training" section differs from the previous sections -- could you explain that in more detail?

Thanks again for this contribution!

Fixed typos
Pep8 import standard
Few more comments
@TilakD
Author

TilakD commented Jul 11, 2020

Hi @johnwlambert, I have fixed 1 through 4 of your suggestions.

Visualize - I intended to keep it as an independent entity, so that the user can provide the image name and log ID of interest and see the depth maps.

Genererate rgb to depth mapping for model training - In monocular depth estimation, people normally use a text file for model training and validation. Say your depth data is at /x/y/z/depth/log_id/xxx.png and the rgb image is at /x/y/z/rgb/log_id/xxx.png. While training, you'll use /x/y/z/ as a reference path and iterate through the txt file to get the rgb and depth image paths along with the focal value. As I type, I feel like this is unnecessary 😅. Let me know, and I can remove it.
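
(For context, a hypothetical sketch of how such a txt file is typically consumed by a PyTorch dataloader -- assuming each line holds an rgb path, a depth path, and a focal value, which may differ from the notebook's exact format:)

    from typing import Tuple
    import cv2
    import numpy as np
    from torch.utils.data import Dataset

    class DepthSplitDataset(Dataset):
        """Reads 'rgb_path depth_path focal' lines from a split txt file."""

        def __init__(self, split_txt: str) -> None:
            with open(split_txt) as f:
                self.entries = [line.split() for line in f if line.strip()]

        def __len__(self) -> int:
            return len(self.entries)

        def __getitem__(self, idx: int) -> Tuple[np.ndarray, np.ndarray, float]:
            rgb_path, depth_path, focal = self.entries[idx]
            rgb = cv2.imread(rgb_path)
            depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED) / 256.0  # back to meters
            return rgb, depth, float(focal)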

Updated paths
@johnwlambert
Contributor

johnwlambert commented Jul 11, 2020

Thanks @TilakD, looking good 👍! Ok I understand now about the last cell. Having a txt file with the paths is certainly useful for a Pytorch dataloader. Maybe we can just clarify the language a bit further:

For your training dataloader, you will likely find it helpful to read image paths from a .txt file. In this final cell, we explicitly write to a .txt file all rgb image paths that have a corresponding sparse ground truth depth file.
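
For instance, a minimal sketch of generating such a file (the helper name and the .jpg/.png directory layout are assumptions, not the notebook's exact structure):

    import glob
    import os

    def write_split_txt(rgb_dir: str, depth_dir: str, out_txt: str) -> None:
        """Write one 'rgb_path depth_path' line per frame that has sparse depth ground truth."""
        with open(out_txt, "w") as f:
            for rgb_path in sorted(glob.glob(os.path.join(rgb_dir, "*.jpg"))):
                depth_path = os.path.join(
                    depth_dir, os.path.basename(rgb_path).replace(".jpg", ".png")
                )
                if os.path.exists(depth_path):
                    f.write(f"{rgb_path} {depth_path}\n")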

Just a few more miscellaneous things:

  • Can we remove the hard-coded focal length in the last cell's code (1396)? The focal lengths actually differ slightly between logs, as you will see in the vehicle_calibration_info.json files, e.g.
fx=1408.61, fy=1408.61
fx=1397.99, fy=1397.99
fx=1399.49, fy=1399.49
fx=1403.58, fy=1403.58
...

  • I noticed the variable idx_ is not used; can that be removed?
  • Do you mind adding return type hints to all the functions? For most of them, myfunc() -> None: seems to apply, but a few return np.ndarray.
  • Do you mind adding type hints to the constructor of Lidar2Depth?
    ~~- Also, before introducing that code, do you mind adding a brief explanation of what it does? For example, in a text cell, "We'll define and use the Lidar2Depth class to project all Argoverse LiDAR sweeps onto corresponding ring camera images to obtain sparse depth ground truth. We'll instantiate this class once for each log. Ground truth could be stored in a variety of formats, but we'll store integer-discretized depth as depth*256, to increase the range from ~0.001 meters to int_max/256 meters". 0 meters is a privileged value that represents null ground truth. Unlike KITTI, we store depth, as opposed to disparity maps.~~
  • We try to use pydocstyle notation for comments, e.g. as we do here. Args: and Returns: are listed, with indented items afterwards.
  • Instead of indexing into the full camera list as follows:

        # 0-1 are ring cameras, 7 and 8 are stereo left and right
        # we are processing just the ring cameras
        self.ring_camera_list = self.argoverse_loader.CAMERA_LIST[0:6]

    you can just pull out the ring camera list with from argoverse.utils.camera_stats import RING_CAMERA_LIST.
  • Do you prefer uint32 or uint16? I saw we are mixing and matching, and we should probably decide on just one.
  • I think I would prefer to make lidaridx and cameraID arguments to extract_lidar_image_pair and to save_images, rather than class state. I think this makes the semantics of those functions clearer (see the signature sketch after this list).
  • Can we rename the method save_images() to save_image_pair(lidar_frame_idx, camera_id)? lidaridx is a bit confusing to me, since it could be a point in a LiDAR sweep or a LiDAR beam index, but the extra term frame clarifies that.
  • I also saw we are using the variable names camera_id and cameraID in different places in the notebook; can we just use one globally?
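
To make the last few points concrete, here is a rough signature-level sketch (method bodies elided; the names and types are suggestions only, not the final API):

    from typing import Tuple
    import numpy as np
    from argoverse.utils.camera_stats import RING_CAMERA_LIST

    class Lidar2Depth:
        def extract_lidar_image_pair(
            self, camera_id: str, lidar_frame_idx: int
        ) -> Tuple[np.ndarray, np.ndarray]:
            """Return the RGB image and its in-FoV LiDAR projection for one frame."""
            ...

        def save_image_pair(self, camera_id: str, lidar_frame_idx: int) -> None:
            """Write the RGB image and its sparse depth map for one frame to disk."""
            ...

    # e.g. loop over ring cameras instead of slicing CAMERA_LIST[0:6]:
    # for camera_id in RING_CAMERA_LIST:
    #     extractor.save_image_pair(camera_id, lidar_frame_idx)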

Thanks again. 👍

@johnwlambert
Contributor

johnwlambert commented Jul 11, 2020

A few more tweaks for the text:
"See github page for instructions on how to install the package." -> "Please see our Github page for..."

"Argoverse dataset can be download at https://www.argoverse.org This tutorial assumes that you already download and extract all necessary data into a specific folder"
->
"The Argoverse 3D-Tracking dataset can be downloaded at https://www.argoverse.org. This tutorial assumes that you have already downloaded and extracted all necessary data into a specific folder . "

@johnwlambert
Contributor

Another text tweak:
"Genererate rgb to depth mapping for model training" -> "Generate rgb..."

@johnwlambert
Contributor

johnwlambert commented Jul 11, 2020

Do you mind using cv2.imread(fpath, cv2.IMREAD_UNCHANGED) instead of cv2.imread(fpath, -1)? I think it makes things a bit more readable for those who aren't familiar with what the -1 flag means in OpenCV.
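
(For context: with the default flag, cv2.imread would load the 16-bit depth PNG as an 8-bit, 3-channel image and silently destroy the encoded depth values. A small illustrative check, with a hypothetical filename:)

    import cv2

    depth_raw = cv2.imread("sparse_depth.png", cv2.IMREAD_UNCHANGED)  # stays uint16, single channel
    print(depth_raw.dtype, depth_raw.shape)
    depth_m = depth_raw / 256.0  # back to meters; 0 still marks missing ground truth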

@johnwlambert
Contributor

I noticed we are not catching self.lidar_image_projection_points as a return value; could we make it an explicit return argument and an input argument to save_image_pair, so that the semantics are a bit clearer?

@johnwlambert
Contributor

I slightly prefer to make closer objects more "hot" in the colormap, i.e. I would prefer the "inferno" colormap in matplotlib instead of "jet", and to show inverse depth after 30 iterations of dilation:

    import cv2
    import numpy as np
    from matplotlib import pyplot as plt

    dilated_depth_image = cv2.dilate(
        depth_image, kernel=np.ones((2, 2), np.uint8), iterations=30
    )
    ...
    plt.imshow(1 / dilated_depth_image, cmap="inferno")

[Screenshots: dilated inverse-depth visualizations with the "inferno" colormap]

@TilakD
Author

TilakD commented Jul 13, 2020

This is an amazing set of suggestions! I have modified the code accordingly and improved on top of it. I'll commit the changes in some time.

inferno does look good!

Thanks again for the amazing review!

@TilakD
Author

TilakD commented Jul 13, 2020

@johnwlambert Updated notebook based on your suggestions, please take a look...

Note: I have integrated txt generation in Lidar2depth class.

@johnwlambert
Contributor

johnwlambert commented Jul 13, 2020

Thanks very much @TilakD. Looking good. Inferno visualizations look great on that log, much clearer.

  • Could we compute the actual ranges for the data (min vs. max)? We state "the range from ~0.001 meters to int_max/256 meters.", but I think we can clarify that a bit more. The min range would be 1/256, i.e. 0.004 meters, and the max range 255.996 meters (see note below).
  • I noticed we are still defining the depth array as uint16, but then converting to uint32 before we save it. Can we save it directly as uint16? LiDAR returns are unlikely to be over 250 meters away, and 250*256 = 64,000 while np.iinfo(np.uint16).max = 65535, so I think uint16 is sufficient.
  • You were mentioning that some logs were repeated, e.g. included in both train3 and in val for 00c561b9-2057-358d-82c6-5b06d76cebcf. I don't see that when I extract the logs? Could you have accidentally duplicated those during extraction?
  • I think I wasn't too clear about the type hints: We follow the Google style guide here, that states:
These sections can be omitted in cases where the function’s name and signature are informative enough that it can be aptly described using a one-line docstring.

Instead of:

    def extract_lidar_image_pair(self):
        """
        For the provided cameraID and LiDAR ply file, 
        extract rgb image and corresponding LiDAR points in the fov.
        
        Args:
            None
            
        Returns:
            None
        """

the following is preferred (with the type hint in the function signature):

    def extract_lidar_image_pair(self) -> None:
        """Extract RGB image and corresponding LiDAR points in the FoV."""
  • I think it's clearer to make cameraID and lidar_frame_idx input arguments to extract_lidar_image_pair, rather than class state, since they aren't general to the class itself.

@TilakD
Author

TilakD commented Jul 14, 2020

Hi @johnwlambert, regarding the repeated logs: I verified and re-verified (extracted each one of them into totally different directories). The logs below are still repeated; I don't know what the issue is.

train4/49d66e75-3ce6-316b-b589-f659c7ef5e6d
train3/49d66e75-3ce6-316b-b589-f659c7ef5e6d


train3/02cf0ce1-699a-373b-86c0-eb6fd5f4697a
train2/02cf0ce1-699a-373b-86c0-eb6fd5f4697a


train3/00c561b9-2057-358d-82c6-5b06d76cebcf
val/00c561b9-2057-358d-82c6-5b06d76cebcf

@alliecc
Contributor

alliecc commented Jul 15, 2020

Hey @TilakD, the log splits of Argoverse 1.0 and Argoverse 1.1 are different. We didn't see the issue in the Argoverse 1.1 data. Maybe you extracted 1.0 and 1.1 into the same directory?

@TilakD
Author

TilakD commented Jul 15, 2020

Hi @alliecc you are absolutely right! I had downloaded 1.0 Training part 3 instead of 1.1 🤦‍♂️

Sorry for the trouble @johnwlambert @alliecc

@johnwlambert
Contributor

No worries @TilakD. Thanks for looking into this @alliecc!

@TilakD
Author

TilakD commented Jul 17, 2020

Hi @johnwlambert, sorry for the delay. I was a little busy with work.

Changes and fixes:

  1. Stupid mistake of mixing the v1.0 and v1.1 datasets 😂
  2. Made camera_ID and lidar_frame_idx input arguments to save_image_pair, frame2depth_mapping, and extract_lidar_image_pair.
  3. The Google style guide is now followed, plus code cleanup based on pylint.
  4. I validated the max range on the val dataset (to save time; I expect train to be similar) and found that 218 meters is the max range.
  5. Changed uint32 to uint16.
  6. You had mentioned earlier that we were losing a lot of precision and had recommended using (np.round(uv1[2],1)*10).astype(np.uint16). I didn't do that; instead, I changed blank_img from uint16 to float, so it now holds the full LiDAR value. For example, if a particular LiDAR point value is 216.7196932007291, the blank image pixel value will also be 216.7196932007291; this is multiplied by 256 to get 55480.24145938665, which we then convert to uint16 and save (55480). A small round-trip sketch follows below.
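
(For reference, a tiny round-trip sketch of the numbers in point 6; the values are only illustrative:)

    import numpy as np

    depth_m = 216.7196932007291          # raw LiDAR range in meters
    stored = np.uint16(depth_m * 256.0)  # 55480 on disk (the cast truncates the float)
    recovered = stored / 256.0           # 216.71875 m, i.e. about 1 mm of quantization error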

Let me know your thoughts.
Thanks!

removed depth_vis
@TilakD
Author

TilakD commented Jul 20, 2020

@johnwlambert I was experimenting with the above-created dataset on a sample monocular depth estimation model. I stitched together results from the val and test datasets and created videos; you can check out the front center camera and the other ring cameras. It is amazing!

Front center:
[video snapshot: front center camera predictions]

Other ring cameras:
[video snapshot: other ring camera predictions]

PS: There seems to be a halo effect on nearby car rooftops. I don't know if this is a dataset issue or a model issue; I need to investigate it...

@johnwlambert
Contributor

johnwlambert commented Jul 21, 2020

Hi @TilakD, those videos look great, thanks for sharing. No worries about the dataset version mixup.

The code is looking really good. Thanks for your patience here, and for checking the max range in the val set. Just a few more small changes before we merge:

can you sort the imports alphabetically?

import os
import fnmatch
import glob

to

import fnmatch
import glob
import os

and the same for:

import cv2
from matplotlib import pyplot as plt
import numpy as np
from PIL import Image
from tqdm import tqdm

Could you replace "0.004 meters to int_max/256 meters." to the actual maximum, e.g. saying something like, "0.004 meters to 255.996 meters (uint16_max/256)".
- I noticed we are not making self.img and self.lidar_image_projection_points return values, but rather putting them in class variables -- could we make them both explicit return arguments and input arguments to save_image_pair, so that the semantics are a bit clearer? (Since lidar_image_projection_points and img aren't general to the class, I would prefer that they not be class variables.)
- Could you add type hints here:

    def __init__(self, input_log_dir, output_save_path):

to

    def __init__(self, input_log_dir: str, output_save_path: str) -> None:

- Could we rename blank_img to sparse_depth_img to be more informative? Good idea to keep it as float before the final conversion.
- Could you remove the commented line " # print(gt_string)"? Just curious, why are you saving the focal lengths in the dataloader names?
- Could we come up with a more descriptive variable name than pyl_counter? perhaps lidar_frame_counter?
- Could we list your local path just once, so that

# Modify paths here,
folders = [
    "./Argoverse/full_data/extracted/argoverse-tracking/train1/",
    "./Argoverse/full_data/extracted/argoverse-tracking/train2/",
    "./Argoverse/full_data/extracted/argoverse-tracking/train3/",
    "./Argoverse/full_data/extracted/argoverse-tracking/train4/",
    "./Argoverse/full_data/extracted/argoverse-tracking/val/",
]
output_save_path = (
    "./Argoverse/depth_dataset/"
)

becomes

# Modify paths here:
local_path_to_argoverse_splits = "./Argoverse/full_data/extracted/argoverse-tracking"
output_save_path = "./Argoverse/depth_dataset/"

folders = [
    f"{local_path_to_argoverse_splits}/train1/",
    f"{local_path_to_argoverse_splits}/train2/",
    f"{local_path_to_argoverse_splits}/train3/",
    f"{local_path_to_argoverse_splits}/train4/",
    f"{local_path_to_argoverse_splits}/val/",
    f"{local_path_to_argoverse_splits}/test/",
]

then later in visualize, can you use:

data_path = f"{output_save_path}/train/"

instead of your local path again:

data_path = (
    "./Argoverse/depth_dataset/train/"
)

I didn't see you include the test set in the paths; was there any particular reason for that? It might be helpful to dump the test set as well, perhaps in a different directory.

Many thanks @TilakD.

@TilakD
Author

TilakD commented Jul 21, 2020

@johnwlambert Done😊

I'm learning a lot through these suggestions! Thanks a lot! Thanks for your patience!!!

Regarding saving the focal length: I thought it would be helpful to have the accurate focal length corresponding to each image, in case it is needed during training (there are a few papers which use focal length for model training). If the user doesn't find it useful, they can edit that function.

@johnwlambert
Contributor

johnwlambert commented Jul 21, 2020

Fantastic, looks great @TilakD . Ok, good to know about the focal length being useful for mono-depth methods.

I will admit I liked the previous images in the notebook a bit more (the ones you put in images/lidar_to_depth_map_tutorial.png) -- could we make those the default?

A few more type hint suggestions:

    def extract_lidar_image_pair(
        self, camera_ID: int, lidar_frame_idx: str
    ) -> Tuple[np.ndarray, np.ndarray]:
        ...
        if lidar_image_projection_points is None:
            print("No point image projection")
            return
        else:
            return np.array(img), lidar_image_projection_points

I would return None in the "no point image projection" case instead of a bare return; the return type should then be Optional[Tuple[np.ndarray, np.ndarray]], and the caller can continue if the result is None.
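
Something along these lines (a sketch only, reusing the names from the snippet above):

    from typing import Optional, Tuple
    import numpy as np

    def extract_lidar_image_pair(
        self, camera_ID: int, lidar_frame_idx: str
    ) -> Optional[Tuple[np.ndarray, np.ndarray]]:
        ...
        if lidar_image_projection_points is None:
            print("No point image projection")
            return None
        return np.array(img), lidar_image_projection_points

    # caller side:
    # pair = extractor.extract_lidar_image_pair(camera_ID, lidar_frame_idx)
    # if pair is None:
    #     continue  # skip frames with no projected points
    # img, lidar_image_projection_points = pair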

I think this type should be:

  def save_image_pair(
        self, camera_ID: int, img, lidar_frame_idx: str, lidar_image_projection_points: np.ndarray
    ) -> None:

instead of:

  def save_image_pair(
        self, camera_ID: int, img, lidar_frame_idx: str, lidar_image_projection_points
    ) -> None:

Can you do one last check with black to make sure the format still passes?

@TilakD
Author

TilakD commented Jul 22, 2020

@johnwlambert Done.

@johnwlambert
Contributor

johnwlambert commented Jul 27, 2020

Hi @TilakD, thanks very much. Ok I ran the tutorial by a few others and we have just a few more requests for revision:

  1. Can we rename the notebook to argoverse_monocular_depth_map_tutorial.ipynb, instead of argoverse_depth_map_tutorial.ipynb? I'll be adding a stereo tutorial shortly, so I would like to differentiate between the two.
  2. Could we rename the title from "Introduction to Argoverse-depth map" to "Introduction to Argoverse Monocular Depth Map Extraction"?
  3. Could we rename the later section from "Start depth map extraction" to "Start monocular depth extraction"?
  4. Could we also rename the new directories to "./Argoverse/monocular_depth_dataset/" instead of "./Argoverse/depth_dataset/"? We'll be adding a /stereo_depth_dataset in our next notebook.
  5. In the NOTE section, 3rd bullet point, can you please replace "0" with "zero"? I.e. "3. Zero ("0") meters is a privileged value that represents null ground truth. Unlike KITTI, we store depth, as opposed to disparity maps."

Thanks again for this great contribution!

TilakD added 2 commits July 27, 2020 20:14
Will be replacing this with an updated notebook (argoverse_monocular_depth_map_tutorial.ipynb).
Modification to accommodate for stereo depth in another notebook.
@TilakD
Author

TilakD commented Jul 27, 2020

@johnwlambert Done!

Looking forward to the stereo tutorial!

Thanks😀.

Updated notebook path
@isht7

isht7 commented Jun 1, 2021

This looks great! I was wondering why it has not been merged to master. Is this no longer a feature of interest? It seems that this would be useful to the community.

@jhonykaesemodel
Contributor

Hi @TilakD! Thanks again for the great contribution and sorry for the slow response.
Here are a couple of final suggestions we believe would improve the tutorial that you might want to consider:

  • Notebook filename: monocular_depth_dataset_tutorial.ipynb
  • Notebook title: Monocular Depth Dataset Tutorial
  • Intro text:
Here is a simple tutorial to create a sparse depth map dataset for monocular depth estimation. The ground truth depth maps are created from lidar point clouds. 
This tutorial requires the [Argoverse 3D Tracking v1.1](https://www.argoverse.org/data.html) dataset and the [Argoverse API](https://github.com/argoai/argoverse-api). Please check [here](https://github.com/argoai/argoverse-api) for instructions on how to install it.
  • Other text:
    Remove the link from "We'll instantiate this class once for each log."
  • Code modifications:
    Change the depth map representation to inverse depth to guarantee more accuracy at the near range (a decoding sketch follows this list). For example:

    inv_depth_map = np.divide(1.0, depth_map, where=depth_map != 0)
    inv_depth_map_scaled = np.uint16(inv_depth_map * 256.0)

    Change the text to accommodate the inverse depth representation.
    Change # Load Argo data to # Load Argoverse 3D Tracking dataset.
    Change camera_ID to camera_name.
    Change lidar_frame_idx to timestamp, and instead of using a counter to name the new files, name them with the corresponding timestamps.
    Change frame2depth_mapping to something like write_paths_to_txt.
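
    A minimal decoding sketch for that representation (illustrative only; the filename is hypothetical, and passing out=np.zeros_like(...) to np.divide keeps the null pixels at zero, since where= alone leaves them uninitialized):

    import cv2
    import numpy as np

    raw = cv2.imread("sparse_inv_depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32)
    inv_depth = raw / 256.0
    depth_m = np.divide(1.0, inv_depth, out=np.zeros_like(inv_depth), where=inv_depth != 0)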

Also added some comments for the README file.

Please let us know if you have any questions or suggestions :).

@TilakD
Author

TilakD commented Sep 28, 2021

Hi @jhonykaesemodel I have provided changes to the notebook and the readme. Please take a look...
