This is the official repository for the paper "HLFS: A Framework for Human-Like Forgetting in Long-Term Memory Systems". In this paper, we introduce the Human-Like Forgetting System (HLFS), a comprehensive forgetting system based on a variant of the Ebbinghaus forgetting curve. Building upon this, we have developed the General Chat Box (GCB). The overall architecture of the model is illustrated in the following diagram:
- [2024-06-08] We released the open-source repository for HLFS for the first time.
HLFS is a comprehensive forgetting system with a range of functions including memory storage, data backup and recovery, and a recall mechanism. Powered by HLFS, GCB integrates gpt-3.5-turbo and boasts excellent dialogue processing capabilities.
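For intuition, the classic Ebbinghaus forgetting curve models retention as R(t) = e^(-t/S), where t is elapsed time and S is memory strength; HLFS is built on a variant of this curve (see the paper for the exact form). A minimal sketch of the classic curve only:

```python
import math

def ebbinghaus_retention(t_hours: float, strength: float) -> float:
    """Classic Ebbinghaus retention R(t) = exp(-t / S).

    t_hours: time elapsed since the memory was stored or last recalled.
    strength: memory strength S (larger S means slower forgetting).
    Note: this is the textbook curve, not the HLFS variant from the paper.
    """
    return math.exp(-t_hours / strength)

# Example: retention after 24 hours for a memory with strength S = 10
print(ebbinghaus_retention(24, 10))  # ~0.09
```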
The detailed project structure tree is as follows:
HLFS/
├── box/ -- GCB
│ ├── box_demo.py
│ └── box_functions.py
│
├── config/
│ └── config.py
│
├── core/
│ ├── forget/ -- Forgetting curve
│ ├── retrieve/ -- Memory retrieve methods
│ │ ├── con_sim.py
│ │ └── faiss.py
│ └── mem_struct.py -- Data structure for HLFS and GCB
│
├── data/ -- Datasets
│
├── eval/
│ ├── data/ -- Datasets at critical dialogue retention rates of 30%, 56%, and 82%
│ │ ├── retention_30
│ │ ├── retention_56
│ │ └── retention_82
│ ├── image/ -- Results
│ ├── cal_metrics.py -- Benchmark
│ ├── cal_sim.py -- Similarity calculation
│ ├── curve.py -- Curve plotting
│ └── data_loader.py -- Loading and constructing the datasets
│
├── history/ -- Memory storage
│ └── bake/ -- Dataset of Retention=100%
│
├── logs/
│
├── misc/
│
├── prompts/
│ └── prompts.py -- The prompts and correlation functions
│
├── utils/
│ ├── api.py
│ ├── log.py
│ └── tools.py
│
├── requirements.txt
│
├── second_derivative_integral.py -- Calculation of the second-derivative integral in the appendix
│
└── README.md
config/config.py contains the parameters for HLFS and GCB. In addition, you need to put all of your API keys into openai_key (especially if your API keys are restricted).
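For illustration only, a minimal sketch of what the openai_key setting could look like (the exact structure of config/config.py may differ; treat the layout below as an assumption):

```python
# config/config.py -- illustrative sketch, not the repository's exact contents.
# Keeping several keys lets GCB fall back to another key when one is restricted.
openai_key = [
    "sk-your-first-key",
    "sk-your-second-key",
]
```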
The key requirements are as below:
- python 3.9+
- openai 0.27.0+
- llama-index==0.5.17.post1
- transformers==4.28.0
Step 1: Create a new conda environment:
conda create -n hlfs python=3.9 -y
conda activate hlfs
Step 2: Install relevant packages
pip install -r requirements.txt
We strive to make trying out the code as simple as possible.
We provide a method for loading and constructing datasets in eval/data_loader.py; you can change the variables data_dir and data_file to select one.
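For example, a hypothetical way these variables might be used (the path and file name below are illustrative assumptions, not the repository's defaults):

```python
# Illustrative only: selecting and loading a dataset in eval/data_loader.py.
import json
import os

data_dir = "../data"               # directory holding the raw dialogue datasets (assumed path)
data_file = "dialogues_en.json"    # dataset file to load and construct from (assumed name)

with open(os.path.join(data_dir, data_file), "r", encoding="utf-8") as f:
    dialogues = json.load(f)
```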
We have already completed this step for you. The constructed datasets under the different retention rates are as follows:
- R=100%: history/bake/
- R=82%: eval/data/retention_82
- R=56%: eval/data/retention_56
- R=30%: eval/data/retention_30
You can transfer a constructed dataset directly into history/ so that GCB can read from it:
mv history/bake/en/exercise/* history/
or:
mv eval/data/retention_82/en/exercise/* history/
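If mv is not available (for example on Windows), a Python equivalent of the transfer might look like this (paths taken from the commands above):

```python
# Illustrative Python equivalent of the mv commands above.
import shutil
from pathlib import Path

src = Path("eval/data/retention_82/en/exercise")  # or: history/bake/en/exercise
dst = Path("history")
dst.mkdir(exist_ok=True)
for item in src.iterdir():
    shutil.move(str(item), str(dst / item.name))
```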
You can run this command to directly communicate with GCB:
cd box/
python box_demo.py \
--language en \
--history_path ../history \
--test_model False \
--ltm_box hlfs
We provide a quick overview of the arguments:
- --language: Choose the language, one of ['cn', 'en'].
- --history_path: Set the location for saving conversation history, user information, etc. We strongly recommend keeping the default ../history.
- --test_model: Test mode. If you want to use the dataset provided in our paper for result validation, set it to True; in this case the Global Memory Table (GMT) will not be updated and your conversation will not be saved. If you want to experience the complete GCB conversation service, set it to False to get the full functionality.
- --ltm_box: Choose the long-term memory system; we inherit two baselines, ['hlfs', 'scm']. For MemoryBank, please visit MemoryBank-SiliconFriend.
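The flags above could be wired up with argparse roughly as follows; this is a sketch consistent with the flags listed in this README, not the repository's exact parsing code:

```python
# Illustrative sketch of box_demo.py argument parsing.
import argparse

def str2bool(v: str) -> bool:
    # The examples above pass booleans as the strings "True"/"False".
    return str(v).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(description="Chat with GCB")
parser.add_argument("--language", choices=["cn", "en"], default="en")
parser.add_argument("--history_path", default="../history")
parser.add_argument("--test_model", type=str2bool, default=False)
parser.add_argument("--ltm_box", choices=["hlfs", "scm"], default="hlfs")
args = parser.parse_args()
print(args)
```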
The memory update step below requires /history to be non-empty.
cd core/
python timer.py \
--test_model False \
--time_gap 1 \
--recall_times 1 \
--forgetting_cycle 1
Here:
- --test_model: If True, you need to set --time_gap and --recall_times yourself; if False, you need to set --forgetting_cycle.
- --time_gap: The time gap between the last recall and the present (test_model = True).
- --recall_times: The number of recall times (test_model = True).
- --forgetting_cycle: The length of the forgetting cycle (test_model = False).
- eval/cal_metrics.py: Benchmark comparative experiments.
- eval/cal_sim.py: Contextual coherence and topic similarity.
- second_derivative_integral.py: Calculates the second-derivative integral, as detailed in the appendix of the paper.
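As an illustration of the kind of computation behind topic similarity, here is a minimal cosine-similarity sketch over two embedding vectors (eval/cal_sim.py may use a different embedding model and metric):

```python
# Illustrative cosine similarity between two utterance embeddings.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

u = np.array([0.1, 0.7, 0.2])
v = np.array([0.2, 0.6, 0.1])
print(cosine_similarity(u, v))  # ~0.98
```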
📈 We have already run the program in advance, and all results are accurately labeled in the dataset. Please refer to the code for specific usage details.
This project is released under the MIT license. Please see the LICENSE file for more information.