[{"title":"A Survey For LLM-Based Works In Robotics","date":"2023-05-12T00:48:05.000Z","url":"/LLM/LLM-robotics-works.html","tags":[["LLM","/tags/LLM/"]],"categories":[["LLM","/categories/LLM/"]],"content":" Robot Manipulation Text2Motion: From Natural Language Instructions to Feasible Plans (arxiv.org) microsoft/ChatGPT-Robot-Manipulation-Prompts (github.com) This repository provides a set of prompts that can be used with OpenAI's ChatGPT to enable natural language communication between humans and robots for executing tasks. The prompts are designed to allow ChatGPT to convert natural language instructions into a sequence of executable robot actions, with a focus on robot manipulation tasks. The prompts are easy to customize and integrate with existing robot control and visual recognition systems. For more information, please see our blog post and our paper, ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application. MultiModal Perception facebookresearch/ImageBind: ImageBind One Embedding Space to Bind Them All (github.com) ImageBind learns a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. It enables novel emergent applications ‘out-of-the-box’ including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation. "},{"title":"【Reading】Interactive Gibson Benchmark:A Benchmark for Interactive Navigation in Cluttered Environments","date":"2023-05-07T23:17:02.000Z","url":"/true/gibson-benchmark.html","tags":[["navigation","/tags/navigation/"],["SLAM","/tags/SLAM/"],["planning","/tags/planning/"]],"categories":[["true","/categories/true/"]],"content":"Abstract: This paper presents Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation solutions. Interactive Navigation tasks are robot navigation problems where physical interaction with objects (e.g., pushing) is allowed and even encouraged to reach the goal. "},{"title":"【Presentation】NeRF-based SLAM","date":"2023-05-04T00:06:32.000Z","url":"/NeRF/slam-pre.html","tags":[["NeRF","/tags/NeRF/"],["interview","/tags/interview/"],["SLAM","/tags/SLAM/"],["Robotics","/tags/Robotics/"]],"categories":[["NeRF","/categories/NeRF/"]],"content":"This is the presentation for THU DISCOVER lab. "},{"title":"【Presentation】UAV Planning Based on City Scale NeRF","date":"2023-05-04T00:01:13.000Z","url":"/NeRF/planning-pre.html","tags":[["NeRF","/tags/NeRF/"],["interview","/tags/interview/"],["planning","/tags/planning/"],["Robotics","/tags/Robotics/"]],"categories":[["NeRF","/categories/NeRF/"]],"content":"This the presentation slide for THU DISCOVER lab given on Apr. 28. "},{"title":"【Reading】LATITUDE:Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF","date":"2023-04-19T19:22:28.000Z","url":"/NeRF/LATITUDE.html","tags":[["robotics","/tags/robotics/"],["NeRF","/tags/NeRF/"],["UAV","/tags/UAV/"],["localization","/tags/localization/"],["optimization","/tags/optimization/"]],"categories":[["NeRF","/categories/NeRF/"]],"content":"This paper proposes a two-stage localization mechanism in city-scale NeRF. Abstract Neural Radiance Fields (NeRFs) have made great success in representing complex 3D scenes with high-resolution details and efficient memory. Nevertheless, current NeRF-based pose estimators have no initial pose prediction and are prone to local optima during optimization. 
In this paper, we present LATITUDE: Global Localization with Truncated Dynamic Low-pass Filter, which introduces a two-stage localization mechanism in city-scale NeRF. In the place recognition stage, we train a regressor through images generated from trained NeRFs, which provides an initial value for global localization. In the pose optimization stage, we minimize the residual between the observed image and the rendered image by directly optimizing the pose on the tangent plane. To avoid falling into local optima, we introduce a Truncated Dynamic Low-pass Filter (TDLF) for coarse-to-fine pose registration. We evaluate our method on both synthetic and real-world data and show its potential applications for high-precision navigation in large-scale city scenes.
System Design
Place Recognition
Original poses, accompanied by additional poses sampled around them, are used. Each pose vector is passed through the trained and frozen Mega-NeRF with shuffled appearance embeddings. Initial poses of the input images are predicted by a pose regressor network.
Pose Optimization
The initial poses are passed through a positional encoding filter. The pose vector is passed through the trained and frozen Mega-NeRF to generate a rendered image. The photometric error between the rendered image and the observed image is calculated and backpropagated to obtain a more accurate pose under the TDLF.
Implementation
Place Recognition
Data Augmentation: a machine learning technique for reducing overfitting by training the model on several slightly modified copies of existing data. First, several positions are uniformly sampled in a horizontal rectangular area around each original pose; then random perturbations, drawn uniformly on each axis up to a maximum amplitude, are added to form the sampled poses. These poses are fed to Mega-NeRF to generate the rendered observations. To avoid memory explosion, we generate the poses with the method above and render the images with Mega-NeRF only during specific epochs of pose-regression training. Additionally, Mega-NeRF's appearance embeddings are selected by randomly interpolating those of the training set, which can be considered a data augmentation technique that improves the robustness of the APR model under different lighting conditions.
Pose Regressor: absolute pose regressor (APR) networks are trained to estimate the pose of the camera given a captured image.
Architecture: built on top of VGG16's light network structure, with 4 fully connected layers that learn pose information from image sequences.
Input: the observed image and the rendered observations.
Output: the corresponding estimated poses.
Loss Function: (in general, the model should place more trust in real-world data and learn more from it.)
Pose Optimization
MAP Estimation Problem[A] Formulation: the posterior combines the place-recognition prediction with the trained Mega-NeRF; we optimize it by minimizing the photometric error between the observed image and the image rendered by Mega-NeRF.
Optimization on the Tangent Plane: we optimize the pose on the tangent plane to ensure a smoother convergence. [1]
TODO I know nothing about :(
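The pose optimization stage is essentially iNeRF-style photometric pose refinement. Below is a minimal PyTorch sketch of the idea; `render_fn`, the tangent-space parameterization, and all hyper-parameters are my own illustrative assumptions, and the TDLF's coarse-to-fine truncation of positional-encoding frequencies is omitted:

```python
import torch

def optimize_pose(render_fn, observed, initial_pose, steps=200, lr=1e-2):
    """Refine a pose estimate against a frozen NeRF by photometric error.

    render_fn:    differentiable renderer mapping (initial_pose, delta) to an
                  image, where delta is a 6-DoF correction on the tangent
                  plane of the initial pose (hypothetical interface)
    observed:     captured image, shape (H, W, 3)
    initial_pose: the place-recognition prediction
    """
    delta = torch.zeros(6, requires_grad=True)      # tangent-plane correction
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        rendered = render_fn(initial_pose, delta)   # NeRF weights stay frozen
        loss = ((rendered - observed) ** 2).mean()  # photometric residual
        loss.backward()                             # gradients reach delta only
        optimizer.step()
    return delta.detach()
```

Optimizing a small correction around the initial estimate, rather than the pose itself, is what "optimization on the tangent plane" amounts to in spirit here.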
Explanations &amp; References
[1] Adamkiewicz, M., Chen, T., Caccavale, A., Gardner, R., Culbertson, P., Bohg, J., &amp; Schwager, M. (2022). Vision-only robot navigation in a neural radiance world. IEEE Robotics and Automation Letters, 7(2), 4606-4613.
Turki, H., Ramanan, D., &amp; Satyanarayanan, M. (2022). Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12922-12931).
Yen-Chen, L., Florence, P., Barron, J. T., Rodriguez, A., Isola, P., &amp; Lin, T. Y. (2021, September). iNeRF: Inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1323-1330). IEEE.
[A] Maximum A Posteriori (MAP) Estimation: a method of statistical inference that uses Bayes' theorem to find the most likely estimate of a parameter given some observed data. Maximum a posteriori estimation - Wikipedia
"},{"title":"Reading: \"NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis\"","date":"2023-04-17T19:21:56.000Z","url":"/NeRF/NeRF-startup.html","tags":[["NeRF","/tags/NeRF/"],["papers","/tags/papers/"],["Computer-Vision","/tags/Computer-Vision/"],["Deep-Learning","/tags/Deep-Learning/"]],"categories":[["NeRF","/categories/NeRF/"]],"content":"This is a summary of the paper \"NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis\".
Keywords: scene representation, view synthesis, image-based rendering, volume rendering, 3D deep learning
A brief understanding: how to train a network for NeRF
Training a neural network for NeRF (Neural Radiance Fields) involves several steps, including data preparation, network architecture design, training, and evaluation.
Data preparation: The first step is to prepare the data that will be used to train the neural network. This typically involves capturing a set of 3D scans of the object or environment being represented, and labeling the data with the corresponding colors that should be associated with each point in the 3D space.
Network architecture design: The next step is to design the architecture of the neural network that will be used to represent the object or environment. This typically involves defining the number and types of layers in the network, as well as the size and shape of the network.
Training: Once the network architecture has been designed, the next step is to train the network using the prepared data. This involves feeding the data into the network and adjusting the weights of the network over multiple iterations, or epochs, to optimize the performance of the network.
Evaluation: After the network has been trained, it is typically evaluated on a separate set of data to measure its performance and ensure that it is generating accurate results. This can involve comparing the output of the network to the ground truth data, as well as using visualization techniques to compare the rendered images produced by the network to actual photographs of the object or environment.
Overall, the process of training a neural network for NeRF involves a combination of data preparation, network architecture design, training, and evaluation to produce a highly accurate and efficient 3D representation of an object or environment.
By Vicuna-13b
Contribution
An approach for representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks.
A differentiable rendering procedure based on classical volume rendering techniques, which we use to optimize these representations from standard RGB images. This includes a hierarchical sampling strategy to allocate the MLP's capacity towards space with visible scene content.
A positional encoding to map each input 5D coordinate into a higher-dimensional space, which enables us to successfully optimize neural radiance fields to represent high-frequency scene content.
An overview of our neural radiance field scene representation and differentiable rendering procedure. Here g.t. represents the \"ground truth\", i.e., the real scene.
Overview of the Rendering Process
March camera rays through the scene to generate a sampled set of 3D points.
Use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities.
Use classical volume rendering techniques to accumulate those colors and densities into a 2D image.
We can then use gradient descent to optimize this model by minimizing the error between each observed image and the corresponding views rendered from our representation.
Neural Radiance Field Scene Representation
This is a method for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network $F_\Theta : (\mathbf{x}, \mathbf{d}) \to (\mathbf{c}, \sigma)$, whose input is a single continuous 5D coordinate and whose output is the volume density and view-dependent emitted radiance at that spatial location.
$\mathbf{x} = (x, y, z)$: 3D location
$\mathbf{d} = (\theta, \phi)$: 2D viewing direction
$\mathbf{c} = (r, g, b)$: emitted color
$\sigma$: volume density
From Object to Scene: Volume Rendering with Radiance Fields
Our 5D neural radiance field represents a scene as the volume density and directional emitted radiance at any point in space. We render the color of any ray passing through the scene using principles from classical volume rendering[1]. The volume density $\sigma(\mathbf{x})$ can be interpreted as the differential probability[2] of a ray terminating at a particle at location $\mathbf{x}$. The expected color $C(\mathbf{r})$ of camera ray $\mathbf{r}(t)$ with near bound $t_n$ and far bound $t_f$ is
$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt, \qquad T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)$$
$\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$: camera ray, where $\mathbf{o}$ is the position of the camera, $t$ parameterizes the position of the point in the 3D space being rendered, and $\mathbf{d}$ is the direction of the camera ray.
$T(t)$: the accumulated transmittance along the ray from $t_n$ to $t$, i.e., the probability that the ray travels from $t_n$ to $t$ without hitting any other particle.
Example: from a 3D object to the hemisphere of viewing directions. In (a) and (b), we show the appearance of two fixed 3D points from two different camera positions: one on the side of the ship (orange insets) and one on the surface of the water (blue insets). Our method predicts the changing appearance of these two 3D points with respect to the direction of observation $\mathbf{d}$, and in (c) we show how this behavior generalizes continuously across the whole hemisphere of viewing directions (this hemisphere can be viewed as a plot over the unit direction vectors in the spherical coordinate frame, colored by the predicted color).
Discrete Sampling
Rendering a view from our continuous neural radiance field requires estimating this integral for a camera ray traced through each pixel of the desired virtual camera. However, the MLP can only be queried at a discrete set of locations, so we use deterministic quadrature[3] to numerically estimate this continuous integral. We partition $[t_n, t_f]$ into $N$ evenly-spaced bins and then draw one sample uniformly at random from within each bin:
$$t_i \sim \mathcal{U}\left[t_n + \frac{i-1}{N}(t_f - t_n),\; t_n + \frac{i}{N}(t_f - t_n)\right]$$
From Scene to Object: estimation of $C(\mathbf{r})$:
$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i\,\bigl(1 - \exp(-\sigma_i \delta_i)\bigr)\,\mathbf{c}_i, \qquad T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$$
$\delta_i = t_{i+1} - t_i$: the distance between adjacent samples.
This function for calculating $\hat{C}(\mathbf{r})$ from the set of $(\mathbf{c}_i, \sigma_i)$ values is trivially differentiable and reduces to traditional alpha compositing[4] with alpha values $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$.
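The quadrature above maps almost line-for-line onto code. Below is a minimal PyTorch sketch of the compositing step (the function and variable names are mine, not the paper's, and details such as the noise regularization on $\sigma$ in the official implementation are omitted):

```python
import torch

def composite_rays(sigmas, colors, t_vals):
    """Estimate the volume rendering integral with the quadrature rule above.

    sigmas: (num_rays, num_samples)     volume densities at the samples
    colors: (num_rays, num_samples, 3)  emitted RGB at the samples
    t_vals: (num_rays, num_samples)     sample distances along each ray
    """
    # delta_i = t_{i+1} - t_i; the last interval is treated as unbounded.
    deltas = t_vals[..., 1:] - t_vals[..., :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[..., :1])], dim=-1)

    # alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - torch.exp(-sigmas * deltas)

    # T_i = prod_{j < i} (1 - alpha_j): accumulated transmittance.
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)

    weights = trans * alphas                         # w_i = T_i * alpha_i
    rgb = (weights[..., None] * colors).sum(dim=-2)  # weighted sum of colors
    return rgb, weights
```

The returned weights $w_i = T_i \alpha_i$ are exactly the quantities reused for the hierarchical sampling described below.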
Implementation Details
Network Architecture
First 8 layers (ReLU): Input: the 3D coordinate $\mathbf{x}$ processed by the positional encoding $\gamma(\mathbf{x})$. Output: the volume density $\sigma$ and a 256-dimensional feature vector.
Final layer: Input: the 256-dimensional feature vector concatenated with the Cartesian viewing direction unit vector $\mathbf{d}$ processed by $\gamma(\mathbf{d})$. Output: the view-dependent RGB color $\mathbf{c}$.
Details of the variables are in Improving Scenes of High Frequency.
Training
Datasets: captured RGB images of the scene, the corresponding camera poses and intrinsic parameters, and scene bounds (we use ground-truth camera poses, intrinsics, and bounds for synthetic data, and use the COLMAP structure-from-motion package to estimate these parameters for real data).
Iteration: randomly sample a batch of camera rays from the set of all pixels in the dataset, following the hierarchical sampling.
Loss: the total squared error between the rendered and true pixel colors for both the coarse and fine renderings:
$$\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \left[ \bigl\| \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \bigr\|_2^2 + \bigl\| \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \bigr\|_2^2 \right]$$
In our experiments, we use a batch size of 4096 rays, each sampled at $N_c = 64$ coordinates in the coarse volume and $N_f = 128$ additional coordinates in the fine volume. We use the Adam optimizer with a learning rate that begins at $5 \times 10^{-4}$ and decays exponentially to $5 \times 10^{-5}$ over the course of optimization (other Adam hyper-parameters are left at default values of $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-7}$). The optimization for a single scene typically takes around 100-300k iterations to converge on a single NVIDIA V100 GPU (about 1-2 days).
Notable Tricks
(Figure: enhancements brought by the tricks.)
Improving Scenes of High Frequency
Deep networks are biased towards learning lower-frequency functions. Following these findings in the context of neural scene representations, we reformulate $F_\Theta$ as a composition of two functions $F_\Theta = F'_\Theta \circ \gamma$, where $\gamma$ is fixed. It maps each scalar input from $\mathbb{R}$ into a higher-dimensional space:
$$\gamma(p) = \left( \sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p) \right)$$
This function is applied separately to each of the three coordinate values in $\mathbf{x}$ (which are normalized to lie in $[-1, 1]$) and to the three components of the Cartesian viewing direction unit vector $\mathbf{d}$ (which by construction lie in $[-1, 1]$). In the experiments, we set $L = 10$ for $\gamma(\mathbf{x})$ and $L = 4$ for $\gamma(\mathbf{d})$.
Reducing the Cost with Hierarchical Sampling
Our rendering strategy of densely evaluating the neural radiance field network at $N$ query points along each camera ray is inefficient: free space and occluded regions that do not contribute to the rendered image are still sampled repeatedly. Instead of using a single network to represent the scene, we simultaneously optimize two networks: one \"coarse\" and one \"fine\".
The coarse network: rewrite the alpha-composited color as a weighted sum of all sampled colors along the ray:
$$\hat{C}_c(\mathbf{r}) = \sum_{i=1}^{N_c} w_i c_i, \qquad w_i = T_i \bigl(1 - \exp(-\sigma_i \delta_i)\bigr)$$
$N_c$: the number of sampling points for the coarse network.
The fine network: normalizing the weights as $\hat{w}_i = w_i / \sum_{j=1}^{N_c} w_j$ produces a piecewise-constant PDF along the ray. We then sample $N_f$ locations from this distribution using inverse transform sampling[5], and evaluate the fine network using all $N_c + N_f$ samples.
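A minimal sketch of that inverse-transform-sampling step, assuming per-interval weights and bin edges along each ray (the interface and names are my own illustration of the idea, not the paper's code):

```python
import torch

def sample_fine_locations(bins, weights, num_fine):
    """Inverse transform sampling from a piecewise-constant PDF.

    bins:    (num_rays, n + 1)  edges of the n intervals along each ray
    weights: (num_rays, n)      unnormalized probability mass per interval
    returns: (num_rays, num_fine) new sample locations
    """
    # Normalize to a PDF over the intervals, then integrate into a CDF.
    pdf = weights / (weights.sum(dim=-1, keepdim=True) + 1e-10)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)

    # Draw u ~ U[0, 1) and locate the CDF bin containing each u.
    u = torch.rand(cdf.shape[0], num_fine)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

    # Invert the CDF by linear interpolation inside the chosen bin.
    cdf_lo = torch.gather(cdf, -1, idx - 1)
    cdf_hi = torch.gather(cdf, -1, idx)
    bin_lo = torch.gather(bins, -1, idx - 1)
    bin_hi = torch.gather(bins, -1, idx)
    frac = (u - cdf_lo) / (cdf_hi - cdf_lo + 1e-10)
    return bin_lo + frac * (bin_hi - bin_lo)
```

Because the CDF is piecewise linear, inverting it only requires a binary search (`searchsorted`) plus interpolation, which is why this step is cheap relative to the network evaluations.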
Conclusion
TODO
Explanations
[1] Volume rendering is a technique used in computer graphics and computer vision to visualize 3D data sets as 2D images. It works by slicing the 3D data set into a series of thin layers, rendering each layer as a 2D image from a specific viewpoint, and compositing these 2D images together to form the final volume rendering.
[2] If a distribution (here, in 3D space) has a density, that means that for (almost) any volume in that space you can assign a probability to it by integrating the density (here \"density\" means probability per unit volume, very similar to, say, the concentration of salt in a solution).
[3] Deterministic quadrature is a mathematical method used to estimate the definite integral of a function. The basic idea is to divide the area under the curve into smaller regions and approximate the definite integral by summing the areas of those regions. There are several types of deterministic quadrature methods, including the trapezoidal rule, Simpson's rule, and Gaussian quadrature.
[4] Alpha compositing is a technique used in computer graphics and image processing to combine two or more images or video frames by blending them together using an alpha channel. The alpha channel is a mask that defines the transparency or opacity of each pixel in the image; it can be used to create effects such as blending, fading, and layering. Alpha compositing - Wikipedia
[5] Inverse transform sampling is a technique for generating random samples from a probability distribution given its cumulative distribution function (CDF): draw a uniform random number $u \in [0, 1]$ and return the value whose CDF equals $u$. Inverse transform sampling - Wikipedia
"},{"title":"An interview with AI","date":"2023-04-12T16:07:53.000Z","url":"/Grow-with-AI/ai-interview.html","tags":[["AIGC","/tags/AIGC/"],["interview","/tags/interview/"],["prompt-engineering","/tags/prompt-engineering/"]],"categories":[["Grow with AI","/categories/Grow-with-AI/"]],"content":"This is a record of a mock interview with an LLM. The model I use is Vicuna-13b-v0 loaded in 8-bit.
Record
1st Round
Prompts: I want you to act as an interviewer. I will be the candidate and you will ask me the interview questions for the parallel computing Engineer position. This is a position which requires interviewers to be familiar with parallel computing libraries like C++, CUDA, OpenMP and MPI. I want you to only reply as the interviewer. Do not write all the conversation at once. I want you to only do the interview with me. Ask me the questions and wait for my answers. Do not write explanations. Ask me the questions one by one like an interviewer does and wait for my answers. My first sentence is \"Hi\".
Full Text: interview1.txt
Summary
This mock interview was based on my project experience and mainly examined my algorithm design, optimization, and debugging skills. However, it did not examine the project itself in depth, e.g., with questions about the principles of particle simulation or about my code structure. You can also see a lack of processing of my answers, hence the repetition in the last few questions. What was enlightening about this interview was that it guided me to review the relevant projects and notes.
2nd Round
Prompts: I want you to act as an interviewer. I will be the candidate and you will ask me the interview questions for the Software engineer position. This is a position which requires interviewers to be familiar with professional knowledge in writing regular C++ algorithms, and be familiar with the features of tools like CMAKE and GCC. I want you to only reply as the interviewer. Do not write all the conversation at once. I want you to only do the interview with me. Ask me the questions and wait for my answers. Do not write explanations. Ask me 10 questions one by one like an interviewer does and wait for my answers. My first sentence is \"Hi\".
Full text: interview2.txt
Summary
This round focused mainly on professional knowledge, and the main content was based on my experience in modeling drones. Perhaps because of the \"professional\" prompt, there were questions about professional knowledge, but they were largely unrelated to C++ and the other topics mentioned; this may be because I mentioned Matlab modeling. In this interview, we can see that the model has problems accommodating human users.
3rd Round (not finished)
I want you to act as an interviewer. I will be the candidate and you will ask me the interview questions for the New-energy automobile engineer position. This is a position which requires interviewers to have a deep understanding of the development of new energy automobiles, and have a clear recognition of its future. I want you to only reply as the interviewer. Do not write all the conversation at once. I want you to only do the interview with me. Ask me the questions and wait for my answers. Do not write explanations. Ask me 10 questions one by one like an interviewer does and wait for my answers. My first sentence is \"Hi\"
Summary
In this mock interview, I tried to use the AI to examine how well an applicant fits the position. Although the AI can be seen answering from its pre-training data in conversations outside the interview, it still tends to examine my own project experience based on what I presented, which is similar to the 2nd round.
Conclusion
This interview experience shows the ability of LLMs to understand academic language. Their ability to find a focus in the candidate's self-presentation (though sometimes incorrectly) and to ask in-depth questions based on that focus makes them most suitable for candidates who have no interview experience or who need inspiration. However, the experience also revealed that the model easily goes too deep into a particular technical detail and is less reliable in general-knowledge scenarios such as job-matching assessments. To achieve a better mock-interview effect, the model may need to first decompose the requirements of the job, divide the assessment into different dimensions, and ask questions at an appropriate depth. Also, for general-knowledge scenarios, the model may need newer knowledge injected through a llama-index-style approach.
TODO
I will try to improve the quality based on commonly used interview competency models; this shall be developed into a task-driven agent.
How to evaluate a candidate's competency model in an interview - 知乎 (zhihu.com)"},{"title":"lec01","date":"2022-12-25T11:49:53.000Z","url":"/Artillery/lec01.html","tags":[["Artillery","/tags/Artillery/"],["Robotics","/tags/Robotics/"]],"categories":[["Artillery","/categories/Artillery/"]],"content":"
Content
Commonly used coordinate systems and their transformations
Soviet-style &amp; British-style coordinate systems
Coordinate transformations
Soviet-style &amp; British-style coordinate systems
Soviet-style coordinate systems
The Soviet-style systems mainly comprise the ground frame, the body frame, the trajectory frame, and the velocity frame.
Ground frame:
Origin: usually the launch point (the vehicle's center of mass at the instant of launch).
Ox axis: usually the tangent to the Earth great circle between the origin and the target, positive toward the target.
Oy axis: perpendicular to the Ox axis, positive upward.
Oz axis: perpendicular to the Oxy plane, with the positive direction given by the right-hand rule.
Characteristics: fixed to the Earth's surface and rotating with the Earth.
Approximation: for short-range flight dynamics problems, the Earth's rotation and revolution can be ignored, the Earth's surface can be treated as a plane, gravity can be taken as parallel to the Oy axis, and the ground frame can be regarded as an inertial frame.
Body frame:
Origin: at the vehicle's center of mass.
Ox1 axis: coincides with the vehicle's longitudinal axis, positive toward the nose.
Oy1 axis (normal axis): lies in the vehicle's longitudinal plane of symmetry, perpendicular to the Ox1 axis, positive upward.
Oz1 axis: perpendicular to the Ox1y1 plane, right-hand rule.
Fixed to the body and motionless relative to it; a moving frame.
Trajectory frame:
Origin O: at the vehicle's center of mass.
Ox2 axis: coincides with the vehicle's velocity vector V.
Oy2 axis: lies in the vertical plane containing the velocity vector V, perpendicular to the Ox2 axis, positive upward.
Oz2 axis: perpendicular to the Ox2y2 plane, right-hand rule.
Characteristics: fixed to the vehicle's velocity vector; a moving frame.
Use: projecting the dynamic equations of the vehicle's center-of-mass motion onto this frame yields a simple form with clear meaning.
Velocity frame:
Origin: at the vehicle's center of mass.
Ox3 axis: coincides with the vehicle's velocity vector V.
Oy3 axis: lies in the vehicle's longitudinal plane of symmetry, perpendicular to the Ox3 axis, positive upward.
Oz3 axis: perpendicular to the Ox3y3 plane, right-hand rule.
Characteristics: fixed to the vehicle's velocity vector; a moving frame.
Use: defines the vehicle's attitude relative to the airflow; the projections of the aerodynamic force onto its three axes are defined as drag, lift, and side force.
British-style coordinate systems
Relations between the common coordinate frames
Reminder: the Soviet-style definitions are used here.
Ground &amp; body frames
Pitch angle: the angle between the vehicle's longitudinal axis (Ox1) and the horizontal plane; positive when the longitudinal axis points above the horizontal plane, negative otherwise.
Yaw angle: the angle between the projection of the longitudinal axis onto the horizontal plane and the Ox axis of the ground frame; viewed from above along the Oy axis, positive if obtained by rotating counterclockwise from the Ox axis, negative otherwise.
Roll angle: the angle between the Oy1 axis of the body frame and the vertical plane containing the vehicle's longitudinal axis; looking forward from the tail along the longitudinal axis, positive if the Oy1 axis lies to the right of that vertical plane.
Ground &amp; trajectory frames
Trajectory inclination angle: the angle between the vehicle's velocity vector and the horizontal plane; positive when the velocity vector points above the horizontal plane, negative otherwise.
Trajectory deflection angle: the angle between the projection of the velocity vector onto the horizontal plane and the Ox axis of the ground frame; viewed from above along the Oy axis, positive if obtained by rotating counterclockwise from the Ox axis, negative otherwise.
Velocity &amp; body frames
Angle of attack: the angle between the projection of the velocity vector onto the body's longitudinal plane of symmetry and the Ox1 axis; positive if the Ox1 axis lies above the projection of the velocity vector, negative otherwise.
Sideslip angle: the angle between the velocity vector and the longitudinal plane of symmetry; looking along the flight direction, positive if the oncoming flow comes from the right side of the vehicle, negative otherwise.
Coordinate transformations
These mainly involve rigid-body motion and are described by rotation matrices, projections, Euler angles, and quaternions (see the sketch below).
Rotation matrices
Projections
Euler angles
Quaternions
Create a quaternion array - MATLAB (mathworks.com)
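As a concrete companion to the list above, here is a small Python/NumPy sketch, my own illustrative code rather than part of the lecture, that builds a rotation matrix and the equivalent quaternion from yaw, pitch, and roll. It assumes the common aerospace z-y-x (yaw-pitch-roll) rotation sequence; under the Soviet-style convention above, yaw is defined about the vertical Oy axis, so the elementary rotations would need to be reassigned to match.

```python
import numpy as np

def euler_to_rotation(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Rotation matrix for a z-y-x (yaw -> pitch -> roll) sequence.

    Angles are in radians. Note: this is one of several conventions;
    the axis order must be adapted to the frame definitions in use.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)

    # Elementary rotations about the z-, y-, and x-axes.
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def euler_to_quaternion(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Unit quaternion (w, x, y, z) for the same z-y-x sequence."""
    cy, sy = np.cos(yaw / 2), np.sin(yaw / 2)
    cp, sp = np.cos(pitch / 2), np.sin(pitch / 2)
    cr, sr = np.cos(roll / 2), np.sin(roll / 2)
    return np.array([
        cr * cp * cy + sr * sp * sy,  # w
        sr * cp * cy - cr * sp * sy,  # x
        cr * sp * cy + sr * cp * sy,  # y
        cr * cp * sy - sr * sp * cy,  # z
    ])
```

For production use, library routines such as MATLAB's quaternion (linked above) or SciPy's Rotation.from_euler implement the same conversions with an explicit sequence argument, which avoids convention mistakes.
"}]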