
An end-to-end example: quantize a Hugging Face model with Neural Solution

In this example, we show how to quantize a Hugging Face model with Neural Solution.

Objective

  • Demonstrate how to start the Neural Solution Service.
  • Demonstrate how to prepare an optimization task request and submit it to Neural Solution Service.
  • Demonstrate how to query the status of the task and fetch the optimization result.
  • Demonstrate how to query and manage the cluster resources.

Start the Neural Solution Service

# Activate your environment
conda activate ENV

# Start the Neural Solution service with the default configuration; logs are saved in the "serve_log" folder.
neural_solution start

# Start the Neural Solution service with a custom configuration
neural_solution start --task_monitor_port=22222 --result_monitor_port=33333 --restful_api_port=8001

# Stop the Neural Solution service
neural_solution stop

# Help Manual
neural_solution -h
# Help output

usage: neural_solution {start,stop,cluster} [-h] [--hostfile HOSTFILE] [--restful_api_port RESTFUL_API_PORT] [--grpc_api_port GRPC_API_PORT]
                   [--result_monitor_port RESULT_MONITOR_PORT] [--task_monitor_port TASK_MONITOR_PORT] [--api_type API_TYPE]
                   [--workspace WORKSPACE] [--conda_env CONDA_ENV] [--upload_path UPLOAD_PATH] [--query] [--join JOIN] [--remove REMOVE]

Neural Solution

positional arguments:
  {start,stop,cluster}  start/stop/management service

optional arguments:
  -h, --help            show this help message and exit
  --hostfile HOSTFILE   start backend serve host file which contains all available nodes
  --restful_api_port RESTFUL_API_PORT
                        start restful serve with {restful_api_port}, default 8000
  --grpc_api_port GRPC_API_PORT
                        start gRPC serve with {grpc_api_port}, default 8000
  --result_monitor_port RESULT_MONITOR_PORT
                        start serve for result monitor at {result_monitor_port}, default 3333
  --task_monitor_port TASK_MONITOR_PORT
                        start serve for task monitor at {task_monitor_port}, default 2222
  --api_type API_TYPE   start web serve with all/grpc/restful, default all
  --workspace WORKSPACE
                        neural solution workspace, default "./ns_workspace"
  --conda_env CONDA_ENV
                        specify the running environment for the task
  --upload_path UPLOAD_PATH
                        specify the file path for the tasks
  --query               [cluster parameter] query cluster information
  --join JOIN           [cluster parameter] add new node into cluster
  --remove REMOVE       [cluster parameter] remove <node-id> from cluster
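
The --hostfile option points to a plain-text file listing the nodes available to the service. A minimal sketch, assuming each line follows the same <hostname> <number_of_sockets> <number_of_threads> pattern that cluster --join uses below:

# hostfile (assumed format: <hostname> <number_of_sockets> <number_of_threads>)
host1 2 20
host2 5 20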

Submit optimization task

  • Step 1: Prepare a JSON file that contains the request content. In this example, we have created a request that quantizes a text-classification model from Hugging Face; a dynamic-quantization variant is sketched after the file listing.
[user@server hf_model]$ cd path/to/neural_solution/examples/hf_model
[user@server hf_model]$ cat task_request.json
{
    "script_url": "https://github.com/huggingface/transformers/blob/v4.21-release/examples/pytorch/text-classification/run_glue.py",
    "optimized": "False",
    "arguments": [
        "--model_name_or_path bert-base-cased --task_name mrpc --do_eval --output_dir result"
    ],
    "approach": "static",
    "requirements": [],
    "workers": 1
}
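
The approach field selects the quantization approach. As a sketch only, a dynamic-quantization variant of the same request would differ in just that field (assuming the service also accepts "dynamic" here, mirroring Neural Compressor's quantization approaches):

{
    "script_url": "https://github.com/huggingface/transformers/blob/v4.21-release/examples/pytorch/text-classification/run_glue.py",
    "optimized": "False",
    "arguments": [
        "--model_name_or_path bert-base-cased --task_name mrpc --do_eval --output_dir result"
    ],
    "approach": "dynamic",
    "requirements": [],
    "workers": 1
}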
  • Step 2: Submit the task request to the service; it returns the submission status and a task id for later queries. A scripted variant that captures the task id is sketched after the sample response.
[user@server hf_model]$ curl -H "Content-Type: application/json" --data @./task_request.json http://localhost:8000/task/submit/

# response if the task is submitted successfully
{
    "status": "successfully",
    "task_id": "cdf419910f9b4d2a8320d0e420ac1d0a",
    "msg": "Task submitted successfully"
}
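
When submitting from a script, the returned task id can be captured directly from the response. A minimal sketch, assuming the jq utility is installed on the client:

# submit the request and keep the task id for the follow-up queries
TASK_ID=$(curl -s -H "Content-Type: application/json" --data @./task_request.json http://localhost:8000/task/submit/ | jq -r '.task_id')
echo "submitted task ${TASK_ID}"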

Query optimization result

  • Query the task status and result according to the task_id; a polling sketch follows the sample response.
[user@server hf_model]$ curl -X GET http://localhost:8000/task/status/{task_id}

# returns the task status
{
    "status": "done",
    "optimized_result": {
        "optimization time (seconds)": "58.15",
        "accuracy": "0.3162",
        "duration (seconds)": "4.6488"
    },
    "result_path": "http://localhost:8000/download/7602cd63d4c849e7a686a8165a77f69d"
    }
}
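
For unattended runs, the status endpoint can be polled until the task completes. A minimal sketch, reusing the TASK_ID captured above (the jq filter and the 30-second interval are illustrative choices, not part of Neural Solution):

# poll the status endpoint until the task reports "done"
while true; do
    STATUS=$(curl -s http://localhost:8000/task/status/${TASK_ID} | jq -r '.status')
    echo "task status: ${STATUS}"
    [ "${STATUS}" = "done" ] && break
    sleep 30
done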

Download optimized model

  • Download the optimized model according to the task_id.
[user@server hf_model]$ curl -X GET http://localhost:8000/download/{task_id} --output quantized_model.zip
# download quantized_model.zip
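
The archive can then be unpacked like any zip file (the target directory name is illustrative):

# extract the quantized model into a local folder
unzip quantized_model.zip -d ./quantized_model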

Manage resource

# query cluster information
neural_solution cluster --query

# add new node into cluster
# parameter: "<node1> <number_of_sockets> <number_of_threads>;<node2> <number_of_sockets> <number_of_threads>"
neural_solution cluster --join "host1 2 20; host2 5 20"

# remove a node from the cluster by its id
neural_solution cluster --remove <node-id>

Stop the service

neural_solution stop