OpenThought - System 2 Research Links

Here you find a collection of material (books, papers, blog-posts etc.) related to reasoning and cognition in AI systems. Specifically we want to cover agents, cognitive architectures, general problem solving strategies and self-improvement.

The term "System 2" in the page title refers to the slower, more deliberative, and more logical mode of thought as described by Daniel Kahneman in his book Thinking, Fast and Slow.

You know a great resource we should add? Please see How to contribute.

Cognitive Architectures

(looking for additional links & articles and summaries)

SOAR (State, Operator, And Result) by John Laird, Allen Newell, and Paul Rosenbloom
ACT-R (Adaptive Control of Thought-Rational) by John Anderson at CMU
SPAUN (Semantic Pointer Architecture Unified Network) by Chris Eliasmith at Waterloo, SPAUN 2.0 by Feng-Xuan Choo
ART (Adaptive resonance theory) by Stephen Grossberg and Gail Carpenter
CLARION (Connectionist Learning with Adaptive Rule Induction ON-line) by Ron Sun
EPIC (Executive Process/Interactive Control) by David Kieras and David Meyer
LIDA (Learning Intelligent Distribution Agent) by Stan Franklin, 2015 Paper
Sigma by Paul Rosenbloom
OpenCog by Ben Goertzel
NARS (Non-Axiomatic Reasoning System) by Pei Wang
Icarus by Pat Langley
MicroPsi by Joscha Bach
Thousand Brains Theory & HTM (Hierarchical Temporal Memory) by Jeff Hawkins
SPH (Sparse Predictive Hierarchie) by Eric Laukien
Leabra (Local, Error-driven and Associative, Biologically Realistic Algorithm), 2016 Paper by Randall O'Reilly
CogNGen (COGnitive Neural GENerative system) by Alexander Ororbia and Mary Alexandria Kelly, see also here and here
KIX (KIX: A Metacognitive Generalization Framework) by A. Kumar and Paul Schrater
ACE (Autonomous Cognitive Entity) by David Shapiro et al., gh: daveshap/ACE_Framework
Iterative Updating of Working Memory by Jared Reser, website, Video

Agent Papers

LLM Based

MCTS

Minecraft Agents

Massive Sampling / Generate-and-Test

World Models

GameNGen: Diffusion Models Are Real-Time Game Engines, project page
A Path Towards Autonomous Machine Intelligence
GAIA-1: A Generative World Model for Autonomous Driving
Latent space world-models: Dreamer, V2, V3, DayDreamer
World Models, web: project page
Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

Neuro-Symbolic Approaches

Math

Active Inference

From pixels to planning: scale-free active inference

Prompting Techniques

Surveys:
- (Jul 2024) The Prompt Report: A Systematic Survey of Prompting Techniques
- (Feb 2024) A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
- Prompt Engineering Guide Prompting Techniques
- Prompting Fundamentals and How to Apply them Effectively by Eugene Yan
Tools:
- priompt: A JSX-based prompting library, Blog
Chain-of-Thoughts (COT): Paper
Tree-of-Thoughts (ToT): Paper, impl: Strategic Debate
Graph-of-Thoughts (GoT): Paper, code
Algorithm of Thoughts (AoT): Paper
Chain-of-Verification (CoVe/CoV): Paper
Mixture-of-Agents (MoA): Paper
Tool-Integrated Reasoning (ToRA / TIR): Paper
Program of Thoughts (PoT): Paper
Buffer of Thoughts (BoT): Paper
Chain of Code (CoC): Paper
Thought of Search (ToS): Paper
Re-Reading the question as input (RE2): Paper
Self-Harmonized Chain of Thought (ECHO): Paper, code
Divergent CoT (DCoT), Paper
Iteration of Thought (IoT), Paper
Logic-of-Thought (LoT) Paper
Forest-of-Thought (FoT) Paper

Negative results

Chain of Thoughtlessness? An Analysis of CoT in Planning

Mechanistic Interpretability

Blog Posts / Presentations

08 Jan 2025 ML CMU: Optimizing LLM Test-Time Compute Involves Solving a Meta-RL Problem
09 Jan 2025 Notebook: Agentic RAG with Hugging Face smolagents vs Vanilla RAG
07 Jan 2025 Chip Huyen: Agents
HF: Scaling Test Time Compute with Open Models
Nebius: Leveraging training and search for better software engineering agents
DeepMind AlphaProof and AlphaGeometry 2
Getting 50% (SoTA) on ARC-AGI with GPT-4o, code: rgreenblatt/arc_draw_more_samples_pub
Schmidhuber: Artificial Curiosity & Creativity
synthesis.ai: Do Androids Dream? World Models in Modern AI
Our Transformers Code Agent beats the GAIA benchmark!
Lil'Log LLM Powered Autonomous Agents (Jun 2023 )
BAIR Blog: The Shift from Models to Compound AI Systems
Microsoft Research Tracing the path to self-adapting AI agents
LLMs develop their own understanding of reality as their language abilities improve, Emergent Representations Paper
LessWrong post: LLM Generality is a Timeline Crux
Three levels of self-building autonomous agents (Tweet thread by Yohei )
Don't Sleep on Single-agent Systems
Video: Improving LLM Reasoning using self-generated data: RL and Verifiers, Slides by Rishabh Agarwal (DeepMind)
Slides: Reasoning with inference-time compute by Sean Welleck, tweet

Graph Neural Networks

Complex Logical Query Answering (CQLA)

Answering logical queries over Incomplete Knowledge Graphs. Aspirationally this requires combining sparse symbolic index collation (SQL, SPARQL, etc) and dense vector search, preferably in a differentiable manner.

Inductive Reasoning over Heterogeneous Graphs

Similar to the regular CQLA, but with the emphasis on the "Inductive Setting" - i.e. querying over new, unseen during training nodes, edge types or even entire graphs. The latter part is interesting as it relies on the higher order "relations between relations" structure, connecting KG inference to Category Theory.

Neural Algorithmic Reasoning (NAR)

Initially attempted back in 2014 with general-purpose but unstable Neural Turing Machines, modern NAR approaches limit their scope to making GNN-based "Algorithmic Processor Networks" which learn to mimic classical algorithms on synthetic data and can be deployed on noisy real-world problems by sandwiching their frozen instances inside Encoder-Processor-Decoder architecture.

Grokking

Open-Source Agents & Agent Frameworks

QwenLM/Qwen-Agent
meta-llama/llama-agentic-system
gpt-researcher, docs
open-interpreter, docs
ADAS (Automated Design of Agentic Systems)
AI-Scientist
Ollama_Agents
AgentK
Storm, Paper
crewAI, docs
AutoGPT, docs
AutoGen, docs, AutoGen Studio Paper
Trace, docs, Paper
motleycrew, docs
langflow, docs
show-me: A Visual and Transparent Reasoning Agent

Algorithms

Weak Search Methods

Weak methods are general but don't use knowledge (heuristics) to guide the search process.

depth-first-search (DFS)
breadth-first-search (BFS)
depth-limited-search, iterative-deepening-depth-first-search (IDDFS)
generate-and-test
hill-climbing (borderline case between weak and strong methods)

Strong Search Methods

Books

The Soar Cognitive Architecture, John E. Laird, MIT Press, 2019
How to Build a Brain: A Neural Architecture for Biological Cognition Chris Eliasmith, Oxford Series on Cognitive Models and Architectures, 2013
Active Inference: The Free Energy Principle in Mind, Brain, and Behavior, Thomas Parr, Giovanni Pezzulo, Karl J. Friston, MIT Press, 2022, MLST Interview with Thomas Parr
Principles of Synthetic Intelligence PSI: An Architecture of Motivated Cognition, Joscha Bach, Oxford Series on Cognitive Models and Architectures Book 4, 2009
Conscious Mind, Resonant Brain: How Each Brain Makes a Mind, Stephen Grossberg, Oxford University Press, 2021
The Society of Mind, Marvin Minsky, Simon & Schuster, 1986
Reinforcement Learning: An Introduction 2nd Edition, Sutton & Barto, MIT Press, 2018
Reinforcement Learning: An Overview, Dec 2024, Kevin Murphy
Mathematical Foundations of Reinforcement Learning, Shiyu Zhao, open course on github + video lectures
Natural Language Cognitive Architecture, David Shapiro, 2022, open source copy
An Introduction to Universal Artificial Intelligence, Marcus Hutter, David Quarel, Elliot Catt, CRC Press, 2024 - AIXI, Slides, Video

Biologically Inspired Approaches

Diverse approaches some of which tap into classical PDE systems of biological NNs, some concentrate on Distibuted Sparse Representations (by default non-differentiable), others draw inspiration from Hippocampal Grid Cells, Place Cells, etc. Biological systems surpass most ML methods for Continual and Online Learning, but are hard to implement efficienly on GPU.

Ogma Sparse Predictive Hierarchies (SPH): whitepaper
The Tolman-Eichenbaum Machine: Unifying space and relational memory through generalisation in the hippocampal formation (TEM), TEM-t
Arousal as a universal embedding for spatiotemporal brain dynamics
Sparse Distributed Memory is a Continual Learner
Computation with Sequences of Assemblies in a Model of the Brain

Dense Associative Memory

Dense Associative Memory is mainly represented by Modern Hopfield Networks (MHN), which can be viewed as a generalized Transformers capable of storing queries, keys and values explicitly (as in Vector Databases) and running recurrent retrival by energy minimization (relating them to Diffusion models). Application for Continual Learning is possible when combined with uncertainty quantification and differentiable top-k selection.

Continual Learning

MagMax

Software Tools & Libraries

paul-gauthier/aider
OpenRLHF
PRIME-RL/PRIME
claude-engineer
continuedev/continue
OpenHands (formerly OpenDevin)
princeton-nlp/SWE-agent, documentation
stanfordnlp/dspy, DSPy awesome list: ganarajpr/awesome-dspy, paper
InternLM/lagent - lightweight framework for building LLM-based agents

Commercial Offerings

Software Engineering
- aide.dev + codestoryai/sidecar
- Devin
- Cursor
- Windsurf by Codeium
- GitHub Copilot & copilot-workspace
- lovable.dev
- textgrad
- Cosine Genie
- v0.dev by Vercel
- Replit AI
- bolt
- continue.dev
- Amazon Q Developer
- Codeyby Sourcegraph
AWS Automated Reasoning checks

Competitions & Benchmarks

DevAI: Agent-as-a-Judge: Evaluate Agents with Agents
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents, web: project page, gh: stonybrooknlp/appworld, Leaderboard
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents, gh: camel-ai/crab
WebArena: A Realistic Web Environment for Building Autonomous Agents, web: project page, Leaderboard
ARC-AGI: Leaderboard, On the Measure of Intelligence
PlanBench: Paper, gh: karthikv792/LLMs-Planning
GAIA: a benchmark for General AI Assistants: Leaderboard
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents, gh: stream-bench/stream-bench
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
ZebraLogic, Leaderboard
Omni-MATH, gh: KbsdJames/Omni-MATH
BatsResearch/planetarium - Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL

Code

SWE-bench, SWE-bench Lite
BigCodeBench: The Next Generation of HumanEval, Leaderboard
SciCode: A Research Coding Benchmark Curated by Scientists, web: https://scicode-bench.github.io/
commit-0 The challenge is to rebuild Python core libraries and pass their unit tests, Leaderboard

Related Projects

Awesome LLM Strawberry (OpenAI o1)
awesome-o1 literature list by Sasha Rush
awesome-ai-agents
Nous Research Open Reasoning Tasks, a list of reasoning tasks, gh: NousResearch/Open-Reasoning-Tasks
ARC-AGI Resources Google table paper list by ARC price

Youtube Content

Sasha Rush: Speculations on Test-Time Scaling (o1)
François Chollet: It's Not About Scale, It's About Abstraction
Evaluating, Understanding and Improving Approaches for Machine Reasoning
Channel: David Shapiro
Artem Kirsanov: Engrams, Building Blocks of Memory in the Brain
Channel: Edan Meyer on AI, ML & RL, Discrete vs. Continuous RL + Paper
MIT AGI: Cognitive Architecture (Nate Derbinsky)
Channel: Thinking About Thinking (Mathematics of Neuroscience and AI)
Invariance and equivariance in brains and machines
code_your_own_AI: The CORE IDEA of AI Agents Explained

Best LLM APIs

Open-weights Reasoning Models

DeepSeek-R1
NovaSky-AI/Sky-T1-32B-Preview, Blog, gh: NovaSky-AI/SkyThought
ngxson/MiniThinky-v2-1B-Llama-3.2
SmallThinker-3B-Preview (small model trained on PowerInfer/QWQ-LONGCOT-500K)
QwQ-32B-Preview, Blog post
ruliad/deepthought-8b-llama-v0.01-alpha JSON format: 1. Problem understanding, 2. Data gathering, 3. Analysis, 4. Calculation (when applicable), 5. Verification, 6. Conclusion drawing, 7. Implementation
migtissera/Tess-R1-Limerick-Llama-3.1-70B xml tags:
1. <thinking> tag to indicate when the model is performing CoT.
2. <contemplation> tag when the model contemplate on its answers.
3. <alternatively> tag for alternate suggestions.
4. <output> for the final output

Novel model architectures

20 Jan 2025 Kimi k1.5: Scaling Reinforcement Learning with LLMs
20 Jan 2025 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL
09 Jan 2025 Transformer^2: Self-adaptive LLMs
31 Dec 2024 Titans: Learning to Memorize at Test Time
27 Dec 2025 Xmodel-2 Technical Report - Deep-and-Thin Architecture (1.2B, 48 layers)
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
Memory3: Language Modeling with Explicit Memory
TTT: Learning to (Learn at Test Time): RNNs with Expressive Hidden States, Video
TransformerFAM: Feedback attention is working memory

Philosophy: Nature of Intelligence & Consciousness

A High Level Theory on the Nature of Intelligence and Consciousness

Joscha Bach

Machine Consciousness
Consciousness as a coherence-inducing operator Talk by Josha Bach at the Models of Consciousness Conferences

Biology / Neuroscience

Workshops

https://s2r-at-scale-workshop.github.io (NeurIPS 2024)

Tutorials

Neurips 2024 Tutorial: Beyond Decoding: Meta-Generation Algorithms for Large Language Models

How to contribute

To share a link related to reasoning in AI systems that is missing here please create a pull request for this file. See editing files in the github documentation.

Name		Name	Last commit message	Last commit date
Latest commit History 160 Commits
LICENSE		LICENSE
README.md		README.md

License

open-thought/system-2-research

Folders and files

Latest commit

History

Repository files navigation