
InternLM with Long Context

English | 简体中文

InternLM2.5 with 1M Context Length

We introduce InternLM2.5-7B-Chat-1M, a model that supports inputs of up to 1M tokens. This significantly improves the model's ability to handle ultra-long text applications. See model zoo for downloads or model cards for more details.

During pre-training, we utilized natural language corpora with text lengths of 256K tokens. To address the potential domain shift caused by homogeneous data, we supplemented the training data with synthetic data to maintain the model's capabilities while expanding its context.

We employed the "needle in a haystack" approach to evaluate the model's ability to retrieve information from long texts. Results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens in length.

[Figure: needle-in-a-haystack evaluation results for InternLM2.5-7B-Chat-1M]
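
For intuition, a needle-in-a-haystack test buries a short "needle" sentence at a chosen depth inside long filler text and asks the model to retrieve it. The sketch below is purely illustrative; the filler text, needle, and question are placeholders, not the actual evaluation data.

# Illustrative sketch of how a needle-in-a-haystack prompt is constructed.
def build_needle_prompt(haystack: str, needle: str, depth: float) -> str:
    # Insert the needle at a relative depth (0.0 = start of the text, 1.0 = end).
    pos = int(len(haystack) * depth)
    context = haystack[:pos] + " " + needle + " " + haystack[pos:]
    question = "What is the special passcode mentioned in the document?"
    return context + "\n\n" + question

filler = "The sky was clear and the city was quiet. " * 2000   # placeholder filler text
needle = "The special passcode is 7421."                       # placeholder needle
prompt = build_needle_prompt(filler, needle, depth=0.5)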

We also used the LongBench benchmark to assess long-document comprehension capabilities. Our model achieved optimal performance in these tests.

[Figure: LongBench evaluation results]

File Chat with InternLM2.5-1M

This section provides a brief overview of how to chat with InternLM2.5-7B-Chat-1M using an input document. For optimal performance, especially with very long inputs, we highly recommend using LMDeploy for model serving.

Supported Document Types

Currently, we support PDF, TXT, and Markdown files, with more file types to be supported soon!

  • TXT and Markdown files: These can be processed directly, without any conversion.
  • PDF files: We have developed Magic-Doc, a lightweight open-source tool, to convert multiple file types to Markdown (see the sketch after this list).
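
For example, converting a PDF to Markdown before chatting might look like the following minimal sketch. It assumes the fairy-doc package (installed in the next section) exposes a DocConverter class under magic_doc.docconv, and the file name is a placeholder; check the Magic-Doc documentation for the exact API of your installed version.

# Hedged sketch: convert a PDF to Markdown with Magic-Doc (fairy-doc).
# Assumes the magic_doc.docconv.DocConverter interface; "example.pdf" is a placeholder.
from magic_doc.docconv import DocConverter

converter = DocConverter(s3_config=None)                 # local files only, no S3 needed
markdown, time_cost = converter.convert("example.pdf", conv_timeout=300)
print(f"Converted in {time_cost:.1f}s: {len(markdown)} characters of Markdown")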

Installation

To get started, install the required packages:

pip install "fairy-doc[cpu]"
pip install streamlit
pip install lmdeploy

Deploy the Model

Download our model from model zoo.

Deploy the model using the following command. You can specify the session-len (sequence length) and server-port.

lmdeploy serve api_server {path_to_hf_model} \
--model-name internlm2-chat \
--session-len 65536 \
--server-port 8000

To further increase the sequence length, we suggest adding the following arguments: --max-batch-size 1 --cache-max-entry-count 0.7 --tp {num_of_gpus}
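
Besides driving the Streamlit demo below, the server exposes an OpenAI-compatible endpoint that you can call directly. The following is a minimal sketch using the openai Python client; the base URL and model name match the serving command above, and the document text is a placeholder.

# Minimal sketch: query the LMDeploy OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="none")   # dummy key, the server does not check it

document = "..."   # placeholder: paste your (long) document text here
response = client.chat.completions.create(
    model="internlm2-chat",   # matches --model-name in the serving command
    messages=[{"role": "user", "content": document + "\n\nPlease summarize the document above."}],
)
print(response.choices[0].message.content)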

Launch the Streamlit Demo

streamlit run long_context/doc_chat_demo.py \
-- --base_url http://0.0.0.0:8000/v1

You can specify the port as needed. If running the demo locally, the URL could be http://0.0.0.0:{your_port}/v1 or http://localhost:{your_port}/v1. For virtual cloud machines, we recommend using VSCode for seamless port forwarding.

For long inputs, we suggest the following parameters:

  • Temperature: 0.05
  • Repetition penalty: 1.02

Of course, you can tweak these settings in the web UI to find what works best for your use case.
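
If you call the server directly rather than through the web UI, these settings can be passed with the request. The sketch below reuses the OpenAI-compatible endpoint from the deployment section; passing repetition_penalty through extra_body assumes the LMDeploy server accepts it as an extension field, so verify against the LMDeploy documentation.

# Hedged sketch: apply the suggested long-input settings when calling the API directly.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="internlm2-chat",
    messages=[{"role": "user", "content": "placeholder: long document followed by a question"}],
    temperature=0.05,                                  # suggested temperature for long inputs
    extra_body={"repetition_penalty": 1.02},           # assumption: LMDeploy accepts this extension field
)
print(response.choices[0].message.content)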

The effect is demonstrated in the video below.

doc-chat-demo.mp4

🔜 Stay Tuned for More

We are continuously enhancing our models to better understand and reason with extensive long inputs. Expect new features, improved performance, and expanded capabilities in upcoming updates!