Inquiry About Feature Usage in Octree-Based Point Cloud Processing #114
Hello @kankanzheli, thank you for the questions!
Yes, each finest leaf octant (the smallest possible subdivision at the maximum octree depth) contains features from all 3D points of the original point cloud that occupy the same volume in space, and any number of features can be embedded into a single octant. This is in addition to the inherent spatial structure of the octree, where each octant has static 3D coordinates (similar to how each pixel in a 2D image has static 2D coordinates).

As for spatial features based on point coordinates, each finest leaf octant contains 1) the average distance d from all original points to the centre of the octant cell (normalized to the range 0-1); and 2) the average unit normal vector n estimated from the original points. In a sense, these two attributes can be used to reconstruct a shifted and oriented plane within each octant (shown below), hopefully providing the agent with a higher degree of spatial resolution than would otherwise be possible at any specified octree depth. However, they are not strictly necessary because the octree already has a known spatial structure. For an even better representation, d could potentially be replaced with an average offset vector instead of a single scalar, which I believe could provide advantages for applications within robot learning; I have not experimented with this yet, though. The utilized spatial features are based on the work of Peng-Shuai Wang.
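For concreteness, here is a minimal NumPy sketch of how these two per-octant spatial features could be computed. The function name, the choice of normalizing d by the half-diagonal of the cell, and the assumption that per-point normals are already estimated are all illustrative, not the repository's actual implementation:

```python
import numpy as np

def leaf_octant_spatial_features(points, normals, octant_center, octant_size):
    """Illustrative computation of (d, n) for one finest leaf octant.

    points        -- (N, 3) xyz coordinates of the original points inside the octant
    normals       -- (N, 3) per-point unit normals (assumed estimated beforehand)
    octant_center -- (3,) centre of the octant cell
    octant_size   -- edge length of the cubic octant cell
    """
    # Average point-to-centre distance, normalized to [0, 1]. The half diagonal
    # of the cell is the largest distance a contained point can have from the
    # centre (this normalization choice is an assumption for the example).
    half_diagonal = 0.5 * octant_size * np.sqrt(3.0)
    d = float(np.mean(np.linalg.norm(points - octant_center, axis=1))) / half_diagonal

    # Average unit normal: mean of the per-point unit normals, re-normalized.
    n = normals.mean(axis=0)
    n = n / np.linalg.norm(n)
    return d, n
```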
In addition to the spatial features, each octant indeed also contains the average intensity of the points from the original point cloud, either with 3 RGB channels or a single monochromatic channel (as used in the lunar environments). Any other feature could be included for each point and treated the same way you would treat channels in an RGB image. For example, semantic labels for robot/terrain/object could be added as part of the "privileged" information. However, I did not try any other features in order to keep the observation space realistic with respect to sim-to-real transfer. The images above are from my Master's thesis, which contains further details, but the IROS paper also presents all relevant information in a concise manner.
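As a hedged sketch (hypothetical helper, not part of the repository), additional per-point channels such as RGB/monochromatic intensity or a one-hot semantic label could be averaged per octant and concatenated with the spatial features described above, exactly like extra image channels:

```python
import numpy as np

def leaf_octant_channels(point_channels, spatial_features):
    """point_channels   -- (N, C) per-point channels, e.g. RGB or monochromatic
                           intensity, or a one-hot robot/terrain/object label
       spatial_features -- (4,) vector [d, nx, ny, nz] already computed for the octant
    """
    averaged = point_channels.mean(axis=0)                # (C,) mean over the contained points
    return np.concatenate((spatial_features, averaged))   # (4 + C,) per-octant feature vector
```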
I have been reviewing your work on converting camera point clouds into world-coordinate point clouds and subsequently transforming them into octrees. I am particularly interested in understanding the feature representations used within the octree structure.
Could you please clarify whether the point coordinates are utilized as features within the octrees? Additionally, do you rely solely on RGB information to guide the robotic arm towards the target, or are there other features involved?
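(For reference, a minimal illustration of the camera-to-world step mentioned above could look like the following; the transform T_world_camera and the function name are assumptions for the example, not the project's actual code.)

```python
import numpy as np

def transform_points_to_world(points_camera, T_world_camera):
    """points_camera  -- (N, 3) xyz points expressed in the camera frame
       T_world_camera -- (4, 4) homogeneous transform from camera frame to world frame
    """
    R, t = T_world_camera[:3, :3], T_world_camera[:3, 3]
    # Rotate and translate each point into the world frame before octree insertion.
    return points_camera @ R.T + t
```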
Your insights on this matter would be greatly appreciated.
Thank you for your time and assistance.