Inquiry About Feature Usage in Octree-Based Point Cloud Processing #114

kankanzheli opened this issue Dec 20, 2024 · 1 comment

@kankanzheli

I have been reviewing your work on converting camera point clouds into world coordinate point clouds and subsequently transforming them into octrees. I am particularly interested in understanding the feature representations used within the octree structure.

Could you please clarify whether the point coordinates are utilized as features within the octrees? Additionally, do you rely solely on RGB information to guide the robotic arm towards the target, or are there other features involved?

Your insights on this matter would be greatly appreciated.

Thank you for your time and assistance.

@AndrejOrsula (Owner)

Hello @kankanzheli,

Thank you for the questions!

> Could you please clarify whether the point coordinates are utilized as features within the octrees?

Yes, each finest leaf octant (smallest possible subdivision at the maximum octree depth) contains features from all 3D points of the original point cloud that occupy the same volume in space. Any number of features can be embedded into a single octant. This is in addition to the inherent spatial structure included in the octree, where each octant has static 3D coordinates (similar to how 2D images have a structure where each pixel has static 2D coordinates).
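
To make the aggregation step concrete, below is a minimal sketch (not the repository's implementation) of how per-point features could be averaged into the finest leaf octants. It approximates the octree by a regular voxel grid at the maximum depth, and names such as `aggregate_leaf_features` are hypothetical.

```python
import numpy as np

def aggregate_leaf_features(points, features, depth=5, bounds=(-1.0, 1.0)):
    """Average per-point features inside each occupied finest-depth octant.

    points   -- (N, 3) array of XYZ coordinates inside `bounds`
    features -- (N, C) array of per-point features (e.g. normals, colour)
    Returns a dict mapping octant index (ix, iy, iz) -> mean feature vector.
    """
    lo, hi = bounds
    cells = 2 ** depth                          # octants per axis at this depth
    cell_size = (hi - lo) / cells
    idx = np.clip(np.floor((points - lo) / cell_size).astype(int), 0, cells - 1)

    leaves = {}
    for key, feat in zip(map(tuple, idx), features):
        acc = leaves.setdefault(key, [np.zeros(features.shape[1]), 0])
        acc[0] += feat
        acc[1] += 1
    # Empty octants are simply absent, mirroring the sparsity of an octree.
    return {key: total / count for key, (total, count) in leaves.items()}
```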

When it comes to spatial features based on point coordinates, each finest leaf octant contains 1) the average distance d from all original points to the centre of the octant cell [normalized to the range 0-1]; and 2) the average unit normal vector n estimated from the original points. In a sense, these two attributes can be used to reconstruct a shifted and oriented plane within each octant (shown below), which hopefully provides the agent with a higher degree of spatial resolution than would otherwise be possible at any given octree depth. However, they are not strictly necessary because the octree already has a known spatial structure.
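
As an illustration only, the following sketch computes these two spatial features for a single octant. The normalization of d by half of the octant diagonal is an assumption made for the sketch, per-point normals are assumed to have been estimated beforehand (e.g. with Open3D), and `octant_spatial_features` is a hypothetical name.

```python
import numpy as np

def octant_spatial_features(points_in_octant, normals_in_octant,
                            octant_centre, octant_size):
    """Return (d, n) for one finest leaf octant."""
    # 1) Average distance from the contained points to the octant centre,
    #    normalized here by half of the octant diagonal so that d is in [0, 1].
    offsets = points_in_octant - octant_centre
    max_dist = 0.5 * octant_size * np.sqrt(3.0)
    d = np.linalg.norm(offsets, axis=1).mean() / max_dist

    # 2) Average unit normal, re-normalized after averaging.
    n = normals_in_octant.mean(axis=0)
    n = n / (np.linalg.norm(n) + 1e-12)
    return d, n
```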

For an even better representation, d could potentially be replaced with an average offset vector instead of a single scalar, which I believe could be advantageous for applications in robot learning. However, I have not experimented with this yet. The spatial features used here are based on the work of Peng-Shuai Wang.
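
A sketch of that untested alternative, replacing the scalar d with an average 3D offset vector (again an assumption for illustration, not what the repository currently does):

```python
import numpy as np

def octant_offset_feature(points_in_octant, octant_centre, octant_size):
    # Mean offset of the contained points from the octant centre, normalized by
    # half of the octant edge length so each component is roughly in [-1, 1].
    offsets = (points_in_octant - octant_centre) / (0.5 * octant_size)
    return offsets.mean(axis=0)  # a 3-vector instead of a single scalar d
```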

> Additionally, do you rely solely on RGB information to guide the robotic arm towards the target, or are there other features involved?

In addition to the spatial features, each octant also contains the average intensity of the points from the original point cloud, either as 3 RGB channels or as a single monochromatic channel (as used in the lunar environments).

Any other feature could be included for each point and treated the same way you would treat channels in an RGB image. For example, semantic labels for robot/terrain/object could be added as part of the "privileged" information. However, I did not try any other features in order to keep the observation space realistic in terms of sim-to-real transfer.
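
Purely as a hypothetical illustration of the channel analogy, the sketch below stacks the per-octant features (normal, distance, colour, and an optional one-hot semantic label) into a single channel vector; all names and shapes are assumptions.

```python
import numpy as np

def octant_feature_vector(n, d, colour, semantic_onehot=None):
    """Stack per-octant features into one channel vector.

    n               -- (3,) average unit normal
    d               -- scalar normalized distance to the octant centre
    colour          -- (3,) average RGB or (1,) monochromatic intensity
    semantic_onehot -- optional (K,) one-hot label, e.g. robot/terrain/object
    """
    channels = [np.asarray(n, dtype=float),
                np.atleast_1d(float(d)),
                np.atleast_1d(colour).astype(float)]
    if semantic_onehot is not None:
        channels.append(np.atleast_1d(semantic_onehot).astype(float))
    return np.concatenate(channels)
```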


The images above are from my Master's thesis, which contains further details; the IROS paper also presents all relevant information in a concise manner.
Otherwise, feel free to let me know if you have any further questions.
