diff --git a/README.md b/README.md
index e34b9ef..b6bbe77 100755
--- a/README.md
+++ b/README.md
@@ -36,11 +36,21 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
- [Multimodal Regression](https://towardsdatascience.com/anchors-and-multi-bin-loss-for-multi-modal-target-regression-647ea1974617)
- [Paper Reading in 2019](https://towardsdatascience.com/the-200-deep-learning-papers-i-read-in-2019-7fb7034f05f7?source=friends_link&sk=7628c5be39f876b2c05e43c13d0b48a3)
-## 2023-09 (1)
-- [RetNet: Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621) [[Notes](paper_notes/retnet.md)] [MSRA]
-- [Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention](https://arxiv.org/abs/2006.16236) [[Notes](paper_notes/transformers_are_rnns.md)] ICML 2020 [Linear attention]
+
+- [Task and Motion Planning with Large Language Models for Object Rearrangement](https://arxiv.org/abs/2303.06247) IROS 2023
+
+## 2023-12 (1)
+- [ChatGPT for Robotics: Design Principles and Model Abilities](https://arxiv.org/abs/2306.17582) [[Notes](paper_notes/prompt_craft.md)] [Microsoft]
+- [RoboVQA: Multimodal Long-Horizon Reasoning for Robotics](https://arxiv.org/abs/2311.00899) [Google DeepMind]
+- [Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents](https://arxiv.org/abs/2201.07207) ICML 2022
+- [ProgPrompt: Generating Situated Robot Task Plans using Large Language Models](https://arxiv.org/abs/2209.11302) ICRA 2023
+- [CLIPort: What and Where Pathways for Robotic Manipulation](https://arxiv.org/abs/2109.12098) CoRL 2021
+- [Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation](https://arxiv.org/abs/2209.05451) CoRL 2022
+- [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale](https://arxiv.org/abs/2208.07339) NeurIPS 2022 [LLM Quant]
+- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978) [Song Han, LLM Quant]
- [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864)
-- [AFT: An Attention Free Transformer](https://arxiv.org/abs/2105.14103) [[Notes](paper_notes/aft.md)] [Apple]
+- [CoDi: Any-to-Any Generation via Composable Diffusion](https://arxiv.org/abs/2305.11846) NeurIPS 2023
+- [What if a Vacuum Robot has an Arm?](https://ieeexplore.ieee.org/abstract/document/10202493) UR 2023
- [FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness](https://arxiv.org/abs/2205.14135)
- [GPT in 60 Lines of NumPy](https://jaykmody.com/blog/gpt-from-scratch/)
- [Speeding up the GPT - KV cache](https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-kvcache/)
@@ -54,12 +64,9 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
- [CLIPort: What and Where Pathways for Robotic Manipulation](https://arxiv.org/abs/2109.12098) CoRL 2021 [Nvidia, end-to-end visuomotor]
- [GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers](https://arxiv.org/abs/2210.17323) ICLR 2023
- [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438) ICML 2023 [Song Han, LLM Quant]
-- [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale](https://arxiv.org/abs/2208.07339) NeurIPS 2022 [LLM Quant]
-- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978) [Song Han, LLM Quant]
- [SAPIEN: A SimulAted Part-based Interactive ENvironment](https://arxiv.org/abs/2003.08515) CVPR 2020
- [FiLM: Visual Reasoning with a General Conditioning Layer](https://arxiv.org/abs/1709.07871) AAAI 2018
- [TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?](https://arxiv.org/abs/2106.11297) NeurIPS 2021
-- [ChatGPT for Robotics: Design Principles and Model Abilities](https://arxiv.org/abs/2306.17582) [Microsoft]
- [MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge](https://arxiv.org/abs/2206.08853) NeurIPS 2022 [Outstanding paper award]
- [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- [OVO: Open-Vocabulary Occupancy](https://arxiv.org/abs/2305.16133)
@@ -73,6 +80,12 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
- [An Attention Free Transformer](https://arxiv.org/abs/2105.14103) [Apple]
- [PDDL Planning with Pretrained Large Language Models]() [MIT Leslie Kaelbling]
+## 2023-09 (3)
+- [RetNet: Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621) [[Notes](paper_notes/retnet.md)] [MSRA]
+- [Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention](https://arxiv.org/abs/2006.16236) [[Notes](paper_notes/transformers_are_rnns.md)] ICML 2020 [Linear attention]
+- [AFT: An Attention Free Transformer](https://arxiv.org/abs/2105.14103) [[Notes](paper_notes/aft.md)] [Apple]
+
+
## 2023-08 (3)
- [RT-1: Robotics Transformer for Real-World Control at Scale](https://arxiv.org/abs/2212.06817) [[Notes](paper_notes/rt1.md)] [DeepMind]
- [RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control](https://robotics-transformer2.github.io/assets/rt2.pdf) [[Notes](paper_notes/rt2.md)] [DeepMind, end-to-end visuomotor]
@@ -1396,7 +1409,6 @@ Feature Extraction](https://arxiv.org/abs/2010.02893) [monodepth, semantics, Nav
- [MAGVIT: Masked Generative Video Transformer](https://arxiv.org/abs/2212.05199) CVPR 2023 highlight [Video prediction]
- [Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2304.08818) CVPR 2023 [Video prediction]
- [Runway Gen-1: Structure and Content-Guided Video Synthesis with Diffusion Models](https://arxiv.org/abs/2302.03011)
-- [Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents](https://arxiv.org/abs/2201.07207) ICML 2022
- [Learning to drive from a world on rails](https://arxiv.org/abs/2105.00636) ICCV 2021 oral [Philipp Krähenbühl]
- [Learning from All Vehicles](https://arxiv.org/abs/2203.11934) CVPR 2022 [Philipp Krähenbühl]
- [End-to-End Urban Driving by Imitating a Reinforcement Learning Coach](https://arxiv.org/abs/2108.08265) ICCV 2021
diff --git a/paper_notes/_template.md b/paper_notes/_template.md
index e0f793a..16472f8 100644
--- a/paper_notes/_template.md
+++ b/paper_notes/_template.md
@@ -1,6 +1,6 @@
# [Paper Title](link_to_paper)
-_September 2023_
+_December 2023_
tl;dr: Summary of the main idea.
diff --git a/paper_notes/prompt_craft.md b/paper_notes/prompt_craft.md
new file mode 100644
index 0000000..b65cabc
--- /dev/null
+++ b/paper_notes/prompt_craft.md
@@ -0,0 +1,32 @@
+# [ChatGPT for Robotics: Design Principles and Model Abilities](https://arxiv.org/abs/2306.17582)
+
+_December 2023_
+
+tl;dr: A pipeline to use ChatGPT for robotics tasks via prompt engineering, and writing high level code for execution. Similar to [CaP (code as policies)](cap.md).
+
+#### Overall impression
+Robotics systems, unlike text-only apps, require deep understanding of real-world **physics**, environmental **context**, and the **ability** to perform physical actions.
+
+LLM's out-of-the-box understanding of basic concepts (control, camera geometry, physical form factors) makes it an excellent choice for building generalizable and user-friendly robotics pipelines.
+
+PromptCraft replaces a specialized engineer-in-the-loop with a user-on-the-loop. --> Polishing the interaction between the user and the robot, or automating as much of it as possible, is the key to real-world application (productization).
+
+PromptCraft is NOT a fully automated process; it needs a human on the loop to monitor and intervene in case of unexpected behavior generated by the LLM, especially for safety-critical applications.
+
+PromptCraft does not use a VLM, only an LLM.
+
+#### Key ideas
+- Pipeline to construct a ChatGPT-based robotics app
+    - Define a high-level robot function library.
+    - Prompt with objectives and allowed functions.
+    - The user stays on the loop to evaluate.
+    - Deploy onto the robot.
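The pipeline above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual code: the function names (`move_to`, `grasp`), the prompt wording, and the toy approval check are all assumptions, and a canned string stands in for the LLM response.

```python
# Hypothetical sketch of the PromptCraft-style pipeline; all names are illustrative.

def move_to(x: float, y: float) -> None:
    # High-level primitive the robot stack would implement.
    print(f"move_to({x}, {y})")

def grasp(obj: str) -> None:
    print(f"grasp({obj!r})")

# Step 1: define the high-level robot function library.
FUNCTION_LIBRARY = {"move_to": move_to, "grasp": grasp}

def build_prompt(objective: str) -> str:
    # Step 2: prompt with the objective and the allowed functions.
    # Listing the functions bounds the answer space (no free-form text).
    allowed = ", ".join(FUNCTION_LIBRARY)
    return (f"You may only call these functions: {allowed}. "
            f"Write Python code to achieve: {objective}")

def user_approves(code: str) -> bool:
    # Step 3: user-on-the-loop. A human would inspect the generated code;
    # this toy check merely rejects code that imports os.
    return "import os" not in code

prompt = build_prompt("pick up the red block at (0.3, 0.1)")
generated = "move_to(0.3, 0.1)\ngrasp('red block')"  # stand-in for the LLM's reply
if user_approves(generated):
    # Step 4: "deploy" by executing against the function library (mock robot).
    exec(generated, dict(FUNCTION_LIBRARY))
```

The key design choice the sketch mirrors is that the LLM only ever composes calls from a fixed, human-vetted library, so the human reviewer audits a small surface rather than arbitrary generated code.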
+
+#### Technical details
+- Creating a high-level function library and listing it in the prompt is the key concept that unlocks the ability to solve robotics apps with ChatGPT. This avoids unbounded text-based answers and API under-specification.
+- The capability to write new functions confers flexibility and robustness to LLMs.
+- The dialog/conversation ability of ChatGPT is a surprisingly effective vehicle for interactive behavior correction.
+- The use of simulators can be particularly useful to evaluate the model's performance before deployment in the real world. --> Simulation (Habitat, AirSim, etc.) is the right vehicle to evaluate closed-loop high-level task planning.
+
+#### Notes
+- Applications of LLMs in robotics include visual-language navigation, language-based human-robot interaction, and visual-language manipulation control (PerAct, CLIPort by Dieter Fox)
\ No newline at end of file