We introduce InternLM2.5-7B-Chat-1M, a model developed to support long inputs of up to 1M tokens. This significantly improves the model's ability to handle ultra-long text applications. See the model zoo for downloads and the model cards for more details.
During pre-training, we utilized natural language corpora with text lengths of 256K tokens. To address the potential domain shift caused by such homogeneous data, we supplemented the corpora with synthetic data, preserving the model's capabilities while extending its context.
We employed the "needle in a haystack" approach to evaluate the model's ability to retrieve information from long texts. The results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens in length.
We also used the LongBench benchmark to assess long-document comprehension. Our model achieved the best performance in these evaluations.
This section provides a brief overview of how to chat with InternLM2.5-7B-Chat-1M using an input document. For the best experience, especially with extremely long inputs, we highly recommend using LMDeploy for model serving.
Currently, we support PDF, TXT, and Markdown files, with more file types to be supported soon!
- TXT and Markdown files: These can be processed directly without any conversions.
- PDF files: We have developed Magic-Doc, a lightweight open-source tool, to convert multiple file types to Markdown (see the conversion sketch after the installation steps below).
To get started, install the required packages:
pip install "fairy-doc[cpu]"
pip install streamlit
pip install lmdeploy
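As a quick illustration of the PDF route, here is a minimal sketch that converts a PDF to Markdown with Magic-Doc before feeding it to the demo. The `DocConverter` interface and the file paths below are assumptions based on Magic-Doc's documented usage; please check the Magic-Doc README for the authoritative API.

```python
# Minimal sketch: convert a PDF to Markdown with Magic-Doc (fairy-doc).
# Assumption: Magic-Doc exposes DocConverter as shown in its README; paths are placeholders.
from magic_doc.docconv import DocConverter

converter = DocConverter(s3_config=None)  # local files only, no S3 storage

# conv_timeout is the per-file conversion timeout in seconds.
markdown_content, time_cost = converter.convert("example.pdf", conv_timeout=300)

with open("example.md", "w", encoding="utf-8") as f:
    f.write(markdown_content)

print(f"Converted in {time_cost:.1f}s; produced {len(markdown_content)} characters of Markdown")
```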
Download our model from the model zoo.
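If you prefer to fetch the weights programmatically, a sketch with `huggingface_hub` is shown below; the repo id is an assumption, so confirm the exact name in the model zoo.

```python
# Sketch: download the weights with huggingface_hub.
# Assumption: the Hugging Face repo id is internlm/internlm2_5-7b-chat-1m; verify it in the model zoo.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="internlm/internlm2_5-7b-chat-1m",
    local_dir="./internlm2_5-7b-chat-1m",
)
print(f"Model weights downloaded to {local_path}")
```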
Deploy the model using the following command. You can specify the `session-len` (sequence length) and `server-port`.
```bash
lmdeploy serve api_server {path_to_hf_model} \
--model-name internlm2-chat \
--session-len 65536 \
--server-port 8000
```
To further enlarge the sequence length, we suggest adding the following arguments:
```bash
--max-batch-size 1 --cache-max-entry-count 0.7 --tp {num_of_gpus}
```
Then, launch the demo frontend with Streamlit:

```bash
streamlit run long_context/doc_chat_demo.py \
-- --base_url http://0.0.0.0:8000/v1
```
You can specify the port as needed. If running the demo locally, the URL could be `http://0.0.0.0:{your_port}/v1` or `http://localhost:{your_port}/v1`. For virtual cloud machines, we recommend using VSCode for seamless port forwarding.
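Before launching the demo, you can optionally confirm that the API server is reachable at your base URL. The sketch below queries the OpenAI-compatible `/v1/models` endpoint; the host and port simply mirror the values used above.

```python
# Sketch: sanity-check that the LMDeploy API server is reachable at the demo's base URL.
import requests

base_url = "http://0.0.0.0:8000/v1"  # replace with your host and port
resp = requests.get(f"{base_url}/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])  # should list the served model, e.g. internlm2-chat
```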
For long inputs, we suggest the following parameters:
- Temperature: 0.05
- Repetition penalty: 1.02
Of course, you can tweak these settings yourself in the web UI to get the best results.
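If you call the server directly instead of going through the web UI, the same settings can be passed via the OpenAI-compatible API. In this minimal sketch, `repetition_penalty` is sent through `extra_body` on the assumption that LMDeploy accepts it as an extension of the OpenAI schema; the document path is a placeholder.

```python
# Sketch: apply the suggested sampling parameters when querying the server directly.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="none")  # no API key is required by default

with open("example.md", "r", encoding="utf-8") as f:  # placeholder document converted earlier
    document = f.read()

response = client.chat.completions.create(
    model="internlm2-chat",  # matches the --model-name used when serving
    messages=[
        {"role": "user", "content": f"{document}\n\nPlease summarize the document above."},
    ],
    temperature=0.05,
    extra_body={"repetition_penalty": 1.02},  # assumption: LMDeploy extension to the OpenAI schema
)
print(response.choices[0].message.content)
```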
The effect is demonstrated in the demo video (doc-chat-demo.mp4).
We are continuously enhancing our models to better understand and reason over extremely long inputs. Expect new features, improved performance, and expanded capabilities in upcoming updates!