Inquiry About Feature Usage in Octree-Based Point Cloud Processing #114

kankanzheli opened this issue Dec 20, 2024 · 1 comment

@kankanzheli

I have been reviewing your work on converting camera point clouds into world coordinate point clouds and subsequently transforming them into octrees. I am particularly interested in understanding the feature representations used within the octree structure.

Could you please clarify whether the point coordinates are utilized as features within the octrees? Additionally, do you rely solely on RGB information to guide the robotic arm towards the target, or are there other features involved?

Your insights on this matter would be greatly appreciated.

Thank you for your time and assistance.

@AndrejOrsula (Owner)

Hello @kankanzheli,

Thank you for the questions!

> Could you please clarify whether the point coordinates are utilized as features within the octrees?

Yes, each finest leaf octant (smallest possible subdivision at the maximum octree depth) contains features from all 3D points of the original point cloud that occupy the same volume in space. Any number of features can be embedded into a single octant. This is in addition to the inherent spatial structure included in the octree, where each octant has static 3D coordinates (similar to how 2D images have a structure where each pixel has static 2D coordinates).
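
To make the aggregation step concrete, below is a minimal sketch (not the repository's implementation) of how per-point features could be averaged into the finest leaf octants. It approximates the octree by a regular voxel grid at the maximum depth, and names such as `aggregate_leaf_features` are hypothetical.

```python
import numpy as np

def aggregate_leaf_features(points, features, depth=5, bounds=(-1.0, 1.0)):
    """Average per-point features inside each occupied finest-depth octant.

    points   -- (N, 3) array of XYZ coordinates inside `bounds`
    features -- (N, C) array of per-point features (e.g. normals, colour)
    Returns a dict mapping octant index (ix, iy, iz) -> mean feature vector.
    """
    lo, hi = bounds
    cells = 2 ** depth                          # octants per axis at this depth
    cell_size = (hi - lo) / cells
    idx = np.clip(np.floor((points - lo) / cell_size).astype(int), 0, cells - 1)

    leaves = {}
    for key, feat in zip(map(tuple, idx), features):
        acc = leaves.setdefault(key, [np.zeros(features.shape[1]), 0])
        acc[0] += feat
        acc[1] += 1
    # Empty octants are simply absent, mirroring the sparsity of an octree.
    return {key: total / count for key, (total, count) in leaves.items()}
```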

When it comes to spatial features based on point coordinates, each finest leaf octant contains 1) the average distance d from all original points to the centre of the octant cell [normalized to the range 0-1]; and 2) the average unit normal vector n estimated from the original points. In a sense, these two attributes can be used to reconstruct a shifted and oriented plane within each octant (shown below), which hopefully provides the agent with a higher degree of spatial resolution than would otherwise be possible at any given octree depth. However, they are not strictly necessary because the octree already has a known spatial structure.
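
As an illustration only, the following sketch computes these two spatial features for a single octant. The normalization of d by half of the octant diagonal is an assumption made for the sketch, per-point normals are assumed to have been estimated beforehand (e.g. with Open3D), and `octant_spatial_features` is a hypothetical name.

```python
import numpy as np

def octant_spatial_features(points_in_octant, normals_in_octant,
                            octant_centre, octant_size):
    """Return (d, n) for one finest leaf octant."""
    # 1) Average distance from the contained points to the octant centre,
    #    normalized here by half of the octant diagonal so that d is in [0, 1].
    offsets = points_in_octant - octant_centre
    max_dist = 0.5 * octant_size * np.sqrt(3.0)
    d = np.linalg.norm(offsets, axis=1).mean() / max_dist

    # 2) Average unit normal, re-normalized after averaging.
    n = normals_in_octant.mean(axis=0)
    n = n / (np.linalg.norm(n) + 1e-12)
    return d, n
```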

For an even better representation, d could potentially be replaced with an average offset vector instead of a single scalar, which I believe could be advantageous for applications in robot learning. However, I have not experimented with this yet. The spatial features used here are based on the work of Peng-Shuai Wang.
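
A sketch of that untested alternative, replacing the scalar d with an average 3D offset vector (again an assumption for illustration, not what the repository currently does):

```python
import numpy as np

def octant_offset_feature(points_in_octant, octant_centre, octant_size):
    # Mean offset of the contained points from the octant centre, normalized by
    # half of the octant edge length so each component is roughly in [-1, 1].
    offsets = (points_in_octant - octant_centre) / (0.5 * octant_size)
    return offsets.mean(axis=0)  # a 3-vector instead of a single scalar d
```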

> Additionally, do you rely solely on RGB information to guide the robotic arm towards the target, or are there other features involved?

In addition to the spatial features, each octant also contains the average intensity of the points from the original point cloud, either as 3 RGB channels or as a single monochromatic channel (as used in the lunar environments).

Any other feature could be included for each point and treated the same way you would treat channels in an RGB image. For example, semantic labels for robot/terrain/object could be added as part of the "privileged" information. However, I did not try any other features in order to keep the observation space realistic in terms of sim-to-real transfer.
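
Purely as a hypothetical illustration of the channel analogy, the sketch below stacks the per-octant features (normal, distance, colour, and an optional one-hot semantic label) into a single channel vector; all names and shapes are assumptions.

```python
import numpy as np

def octant_feature_vector(n, d, colour, semantic_onehot=None):
    """Stack per-octant features into one channel vector.

    n               -- (3,) average unit normal
    d               -- scalar normalized distance to the octant centre
    colour          -- (3,) average RGB or (1,) monochromatic intensity
    semantic_onehot -- optional (K,) one-hot label, e.g. robot/terrain/object
    """
    channels = [np.asarray(n, dtype=float),
                np.atleast_1d(float(d)),
                np.atleast_1d(colour).astype(float)]
    if semantic_onehot is not None:
        channels.append(np.atleast_1d(semantic_onehot).astype(float))
    return np.concatenate(channels)
```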


The images above are from my Master's thesis, which contains further details; the IROS paper also presents all relevant information in a concise manner.
Otherwise, feel free to let me know if you have any further questions.
