
server : (experimental) vision support via libmtmd #12898

Draft · wants to merge 23 commits into master

Conversation

@ngxson (Collaborator) commented Apr 11, 2025

Continuation of #12849

This is my first attempt at bringing libmtmd into server.cpp.

For the list of supported models, see: #13012

The current goals of this PR are:

  • To see how libmtmd can be used in a different context than the CLI, so that I can adapt it progressively in upcoming PRs
  • To provide a place to test the integration of other vision models

There are still quite a lot of problems:

  • Many features are hard to make compatible, like speculative decoding, context shifting, and slot cache save/load
  • Prompt caching is only half working for now:
    • Missing image hash comparison (to know whether we should remove the cached tokens of an image)
    • Sometimes we get a batch with 0 tokens (for example, when entering the same prompt twice)
  • Batched decoding is disabled for image embedding batches, which will degrade performance when multiple slots are used

Implementation

The core idea of this implementation is to migrate the input from a std::vector<llama_token> to a std::vector<server_inp_chunk>.

An API called mtmd_input_chunk was introduced in #12849. The difference between mtmd_input_chunk and server_inp_chunk is that server_inp_chunk stores only a single token in the text case; in the image case, it stores a pointer to the mtmd_image_tokens:

struct server_inp_chunk {
    llama_token tok_text; // one single token, not a list of tokens
    mtmd_image_tokens_ptr tok_image;
};

The reason I did this is that keeping track of the KV cache this way seems easier (i.e. the code is easier to write). Here we mostly care about the individual tokens; we never need to look into the individual image tokens anyway. If an image is different from the last one in the cache, its embeddings will be completely different, so we simply throw away the whole image.

As mtmd_image_tokens_ptr uses unique_ptr under the hood, a side effect of this change is that we now eliminate some copies when passing the task from one function to another; hence, many std::move calls were added in this change.
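
To make the cache-reuse idea concrete, here is a minimal sketch of how a common prefix between the cached chunks and an incoming prompt could be computed. The server_inp_chunk_ex type, its img_hash field, and the helper name are assumptions made for illustration; they are not this PR's code.

```cpp
#include <string>
#include <vector>

#include "llama.h" // llama_token
#include "mtmd.h"  // mtmd_image_tokens_ptr

// Hypothetical variant of server_inp_chunk with an image hash attached.
// The hash field is an assumption used only for this sketch.
struct server_inp_chunk_ex {
    llama_token           tok_text;  // valid only when tok_image == nullptr
    mtmd_image_tokens_ptr tok_image; // non-null => this chunk is an image
    std::string           img_hash;  // hash of the image bitmap (assumed)
};

// Returns how many leading chunks of the incoming prompt match the cache.
// Text chunks compare by token id; image chunks compare by hash and are
// treated as atomic: the first mismatch invalidates everything after it.
static size_t common_chunk_prefix(const std::vector<server_inp_chunk_ex> & cached,
                                  const std::vector<server_inp_chunk_ex> & incoming) {
    size_t i = 0;
    for (; i < cached.size() && i < incoming.size(); ++i) {
        const bool a_img = (bool) cached[i].tok_image;
        const bool b_img = (bool) incoming[i].tok_image;
        if (a_img != b_img) {
            break; // text vs image mismatch
        }
        if (a_img ? (cached[i].img_hash != incoming[i].img_hash)
                  : (cached[i].tok_text != incoming[i].tok_text)) {
            break;
        }
    }
    return i;
}
```

Everything in the cache after the returned index (including any partially matching image) would then be discarded, matching the "throw away the whole image" behavior described above.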

TODOs

  • automatically deactivate certain features if vision is enabled; we will work on these features later
  • implement a hash function for images (to keep track of the cache)
  • fix detokenize(server_inp_chunk)
  • add more error handling
  • maybe support remote image_url in addition to base64

Demo

The server can be run with this command:

llama-server -hf ggml-org/gemma-3-4b-it-GGUF

Client code (ONLY base64 input is supported atm):

import json
import base64
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="sk-test", timeout=9999)

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Path to your image
image_path = "../models/bliss.png"

# Getting the Base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.1,
    stream=True,
    messages=[
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "describe what you see in details" },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

for chunk in response:
    print(chunk.choices[0].delta.content, end="")

print("\n\n")

With the image: [bliss.png]

This will output: [streamed description of the image; screenshot omitted]

@qnixsynapse (Collaborator)

Awesome work. However, I noticed the model usually ignores the text prompt when it comes first in the conversation!
[screenshot omitted]

@ngxson (Collaborator, Author) commented Apr 12, 2025

@qnixsynapse can you capture the raw HTTP request? If the JSON payload is big, you can share it via a gist.

@qnixsynapse (Collaborator)

@ngxson (Collaborator, Author) commented Apr 12, 2025

At a minimum I'm asking for this: https://wiki.wireshark.org/hyper_text_transfer_protocol

not the raw IP packet

@qnixsynapse (Collaborator) commented Apr 12, 2025

I don't have Wireshark installed, unfortunately. But you can still inspect it; for example:

POST /v1/chat/completions HTTP/1.1
Host: localhost:8080
Authorization: Bearer -key
Content-Type: application/json
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Python/3.11 aiohttp/3.11.11
Content-Length: 615117

{"stream": true, "model": "Gemma", "messages": [{"role": "user", "content": [{"type": "text", "text": "Fact check the content in this image please."}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,<base64 png data from line 88>"}}]}], "stream_options": {"include_usage": true}, "temperature": 1.0, "top_p": 0.9}

HTTP/1.1 200 OK
Keep-Alive: timeout=5, max=100
Content-Type: text/event-stream
Server: llama.cpp
Transfer-Encoding: chunked
Access-Control-Allow-Origin: 

@ngxson (Collaborator, Author) commented Apr 13, 2025

@qnixsynapse I had a problem with my logic, which made it discard a text batch that comes before an image batch.

It should be fixed now, could you give it a try?

@ngxson (Collaborator, Author) commented Apr 13, 2025

Btw @ggerganov I'm noting this here for visibility: while working on this PR, I realized there are 2 refactorings which could be done in their own dedicated PRs:

  • The first one is quite simple: currently server_task is passed by copy in some places; we need to add some std::move
  • The second one is a bit more tricky. Currently, we track everything using a std::vector<llama_token>. However, for multimodal, I introduced the notion of "input chunks" along with libmtmd. The server needs to be adapted to work with chunks of tokens / embeddings instead of a simple list of tokens.
    In the current PR, I'm kind of hacking around this by having server_inp_chunk wrap a single text token (so most of the text-related logic is unchanged). But obviously this brings some complications when dealing with both text and image chunks. Do you have any better ideas to handle this?

And I also have a question regarding the logic around batch_view. IIRC, this exists because sometimes the batch is too large for llama_decode to process, so we may want to reduce the input batch size (dynamically). However, we also internally split the batch into ubatches, so I'm wondering if this logic is now obsolete.


Edit: optionally one more refactoring: we should split llama-server into different compilation units; currently it can take up to 20s to compile.

@qnixsynapse (Collaborator) commented Apr 14, 2025

@ngxson Can you please refresh this branch with master?

Nvm. Ended up using your fork .. working great!!! 👍

On further testing, it seems that the llama_batch size is sometimes exceeded on successive requests:

common/common.cpp:1161: GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded") failed

@ggerganov (Member)

And I also have a question regarding the logic around batch_view. IIRC, this is because sometimes the batch is too large for llama_decode to process, so we may want to reduce the input batch size (dynamically). However, we also internally split the batch into ubatch, so I'm wondering if this logic is now obsolete.

This was useful mainly before defragmentation support was added. The reason is that over time the KV cache can become highly fragmented, and even if it has capacity for n_tokens it won't be able to find a contiguous slot, so attempting to split the batch into smaller chunks was a way to work around this. With defragmentation enabled by default this is now rarely necessary. So yes, this should be simplified in a separate PR.

I'll think about the input chunk question today and let you know if I have any thoughts.
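
For readers following along, the batch_view logic being discussed looks roughly like the sketch below: a simplified version of the splitting pattern used in server.cpp at the time (not a verbatim copy; details may differ). The full batch is decoded in windows, and on failure the window size is halved and the failed window is retried.

```cpp
#include <algorithm>

#include "llama.h"

// Simplified sketch of the batch_view retry loop (not verbatim server.cpp code).
static bool decode_in_windows(llama_context * ctx, const llama_batch & batch) {
    int32_t n_batch = (int32_t) llama_n_batch(ctx);

    for (int32_t i = 0; i < batch.n_tokens; i += n_batch) {
        const int32_t n_tokens = std::min(n_batch, batch.n_tokens - i);

        llama_batch batch_view = {
            n_tokens,
            batch.token    + i,
            nullptr,                 // token batch, no embeddings
            batch.pos      + i,
            batch.n_seq_id + i,
            batch.seq_id   + i,
            batch.logits   + i,
        };

        const int ret = llama_decode(ctx, batch_view);
        if (ret != 0) {
            if (n_batch == 1 || ret < 0) {
                return false; // unrecoverable error
            }
            // likely no contiguous KV slot: retry this window with half the size
            n_batch /= 2;
            i -= n_batch;
        }
    }
    return true;
}
```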

common/arg.cpp Outdated
params.mmproj.hf_repo = params.model.hf_repo;
}
// TODO @ngxson : this will break non-vision model with -hf, need to fix before merging
common_params_handle_model(params.mmproj, params.hf_token, "", true);
Collaborator

@ngxson Is it possible to add a --no-offload-mmproj param here to keep the mmproj model on the CPU and the larger text model on GPU?

We can use mtmd_context_params with use_gpu=false to keep the projector model on the CPU:

struct mtmd_context_params {
    bool use_gpu = true;
    bool print_timings = true;
    int n_threads = 4;
    enum ggml_log_level verbosity = GGML_LOG_LEVEL_INFO;
    const char * image_marker = "<__image__>";
};

It would be useful where GPU VRAM is limited.
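
For illustration, keeping the projector on the CPU would look something like the sketch below. It assumes the mtmd_init_from_file() entry point from mtmd.h and uses a placeholder mmproj filename; check the header for the exact signature before relying on it.

```cpp
#include "mtmd.h"

// Sketch: load the mmproj on the CPU while the text model stays on the GPU.
// Assumes mtmd_init_from_file() from mtmd.h; "mmproj-model.gguf" is a
// placeholder path, and text_model is a llama_model loaded elsewhere.
static mtmd_context * load_projector_on_cpu(const llama_model * text_model) {
    mtmd_context_params mparams; // defaults as shown in the struct above
    mparams.use_gpu = false;     // keep the projector (CLIP) weights on the CPU

    return mtmd_init_from_file("mmproj-model.gguf", text_model, mparams);
}
```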

@ngxson (Collaborator, Author) Apr 25, 2025

It's added in #13093.

The flag is --no-mmproj-offload instead of --no-offload-mmproj, to align with the existing --no-kv-offload.

@Beinsezii

Seems like the batch decoding dies when you send a variety of longer requests.

common/common.cpp:1159: GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded") failed

The easiest way to trigger it is to just wiggle the sequence length around, like with this variation of the example code:

import json
import base64
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="sk-test", timeout=9999)

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Path to your image
image_path = "../models/bliss.png"

# Getting the Base64 string
base64_image = encode_image(image_path)

for mult in [100, 200]:  # (beinsezii) make sure it has to rebuild some cache the 2nd time
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        stream=True,
        messages=[
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "describe what you see in details\n" * mult },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}",
                        },
                    },
                ],
            }
        ],
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, end="")

    print("\n\n")


void reserve_embd_batch(float * embd, int32_t n_tokens, llama_pos pos_0, llama_seq_id seq_id) {
    GGML_ASSERT(n_tokens <= (int32_t)pos.size());
    seq_ids[n_tokens] = nullptr;
@ngxson (Collaborator, Author)

@Beinsezii yes, it seems to be due to this single line, which shouldn't be here. I'm removing it and will push a commit.

@ngxson (Collaborator, Author)

Should be fixed in f8bc466

And btw, now that #13012 is merged, we can test other models (except for qwen2vl, which is not yet supported).

@Beinsezii Apr 21, 2025

Oh nice, so the server will automatically have everything libmtmd does? Like #13050 post-merge?

I'll try to break the decode again here in a bit.

@ngxson (Collaborator, Author)

Yes, everything supported by libmtmd will be available in the server.

@Beinsezii

Using Gemma 27B I've run multiple large images interspersed with varying text sequences, and it seems to be stable now. I'm up to about 5 images in one prompt, so that should all be good now.

I know from experience GLM is broken on ROCm, but I am curious about MiniCPM and SmolVLM so I'll try those at some point.

@Beinsezii Apr 21, 2025

ibm-research/granite-vision-3.2-2b-GGUF fails to encode when fed a 3840x2160 image, but otherwise it's been a positive experience.

@ngxson (Collaborator, Author)

Yes, some models require a larger --batch for now. This will be fixed soon.

@ngxson (Collaborator, Author)

A larger --batch is no longer required. For some models you may want to bump the context size to 8k or 16k, because they use a lot more tokens for bigger images.

@ngxson (Collaborator, Author) commented Apr 23, 2025

A hash can be an arbitrary byte sequence, right? It's not necessarily a valid string.

Yes, but storing it as a hex string is easier for debugging, so it must be converted to a hex string to prevent potential problems with null bytes. This conversion is currently missing in the code.
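
A hypothetical helper for that conversion could look like this (not the PR's code, just an illustration of the hex encoding):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Render raw hash bytes as a lowercase hex string so the resulting cache key
// never contains embedded null bytes.
static std::string bytes_to_hex(const std::vector<uint8_t> & bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```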

@ngxson (Collaborator, Author) commented Apr 23, 2025

Significant changes in the latest commits:

  • bump to latest master; we now support Pixtral 12B
  • using an FNV hash of the image bitmap (NOT the raw file data); see the sketch below
  • support large image batches, so models like granite-vision or minicpm-v won't crash
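
For reference, a minimal sketch of a 64-bit FNV-1a hash over the decoded bitmap bytes; the hash actually used in the PR may differ in width or details.

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a over a byte buffer, e.g. the decoded RGB bitmap of an image.
static uint64_t fnv1a_64(const unsigned char * data, size_t len) {
    uint64_t hash = 0xcbf29ce484222325ULL;   // FNV offset basis
    const uint64_t prime = 0x100000001b3ULL; // FNV prime
    for (size_t i = 0; i < len; ++i) {
        hash ^= data[i];
        hash *= prime;
    }
    return hash;
}
```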

@Beinsezii commented Apr 23, 2025

bump to latest master, we're now supporting Pixtral 12B

Curious if Mistral Small 3.1 uses the same vision mechanism or if that will need more work as well.

Update: seems like Pixtral is broken. It thinks bliss.png is a "blue and green grid" and other images it just interprets as corrupted or noise.

@ngxson (Collaborator, Author) commented Apr 23, 2025

Update: seems like Pixtral is broken

Which backend are you using? Does it give the same result when running via llama-mtmd-cli?

@Beinsezii commented Apr 24, 2025

Which backend are you using? Does it give the same result when running via llama-mtmd-cli?

ROCm, and it seems to be temperature dependent?

At temp 0.1 it will reply:
"It seems we're starting with an image of a serene landscape featuring a clear blue sky transitioning into lush green fields below."
whereas at temp 1.0 it's:
"it seems that the image you've shared contains a pattern of repeating colors and shapes that might be difficult to describe precisely without more context."

Meanwhile on CPU it always recognizes it as a landscape even at temp 2.0. ROCm at 2.0 claims there isn't an image at all lol. I imagine something is wrong because I don't think temp should swing the results that hard for such a simple prompt.

Haven't tried Vulkan yet. Identical behavior with mtmd-cli. Shall I open an issue?

Slight update: even purely textually, the model just seems really bad on ROCm with a moderate or high temp. I wonder if this is just fp16 vs fp32 compute? Alright, even with CUDA_F16 off and an f32 k/v cache, the whole model is completely unusable on ROCm with even a mild temp lol.

@HAV0X1014

I'm getting wildly incorrect outputs with Pixtral. I'm using the server API and llama-mtmd-cli; the server seems to completely ignore that I've sent an image, while the CLI outputs garbage, mentioning either a mosaic of colors or just outputting complete nonsense. This image in particular made it go nuts, counting up from 2013 until generation stopped.
[image attachment: IMG_20240224_194019]

I'm using a 7900xtx, compiled with ROCm. Running it on CPU and GPU produced different, but still incorrect, results.

@Beinsezii

@HAV0X1014 if you're trying CPU, try a clean CPU-only build without HIP compiled at all. For some reason, compiling with HIP but using --ngl 0 can still break some models. GLM-4 is the same way.

@ngxson (Collaborator, Author) commented Apr 24, 2025

For the problem with pixtral, please follow: #13065 (comment)

@Beinsezii

Is there a way to pass images via the non-chat completion endpoint yet? I see in the server readme that at one point /completion could substitute images like:

http post http://127.0.0.1:8080/completion --content-type application/json {
    prompt: 'What is in this image?[img-12]',
    "image_data": [{"data": (open /tmp/bliss.png | encode base64), "id": 12}]
}

but I don't believe that's functional anymore.

@ngxson (Collaborator, Author) commented Apr 24, 2025

@Beinsezii I'm not spending time on /completions because this PR has already taken me a lot of time.

@ngxson (Collaborator, Author) commented Apr 24, 2025

Putting Pixtral aside (because that bug is not related to server support), @Beinsezii would you be interested in testing other models?

I think the current PR is in a "working" state, meaning it can already handle the most important use case, the /chat/completions endpoint. That's the first phase of my plan. I'm thinking about moving to the second phase: iterating on the current idea and fixing bugs.

So it would be nice to know if the current approach works well with these cases:

  • Chat completion, text-only
  • Chat completion, text and image
  • Chat completion, text and image, but multiple texts and multiple images interleaved in the same message
  • Chat completion, with KV reuse

Things that will not be supported for now:

  • context shift and prompt truncation
  • speculative decoding
  • slot save/load
  • raw /completions (non-chat)
  • /embeddings, /rerank, /infill
  • n_cache_reuse

@Beinsezii

So would be nice to know if the current approach works well with these cases:

  • Chat completion, text-only

  • Chat completion, text and image

  • Chat completion, text and image, but multiple texts and multiple images interleaved in the same message

For Gemma 3 QAT I've done basically all of this quite a bit as of the sha1 commit, by setting up a SillyTavern instance pointed at /v1/chat/completions and having Gemma iteratively write short stories using groups of images as inputs. Silly is kind of a pain to configure for this, but it's easier than directly writing OAI requests imo. Pretty fun once you get it going; the most images I had at once was 5, spread out over a few instructions.

The only real annoyance I've had while monkeying around so far is the mmproj being locked to the GPU, eating half of the memory I normally use for the k/v cache. With checksumming now, I think it would make sense to add an -nglv (or -nglm?) at some point, since the mmproj isn't part of the autoregressive model, as far as I'm aware.

Basically all of the other models that currently work are glorified captioners, so I haven't played with them extensively since I don't train t2i LoRAs.

Otherwise I'm not sure what else to monkey with besides having it yap for 20k tokens to see if something eventually breaks.

Chat completion, with KV reuse

Is this different from a cache hit, where the first X tokens of a new request match so they don't get re-processed?

@Beinsezii

Ah, it seems like -hf might not work properly with models that don't have an mmproj? I did bin/llama-server -hf bartowski/THUDM_GLM-4-32B-0414-GGUF:Q4_K_L and it 404'd repeatedly until I switched to the non-mtmd server branch.

@ngxson (Collaborator, Author) commented Apr 25, 2025

The only real annoyance I've had while monkeying around so far is the mmproj being locked to the GPU, eating half of the memory I normally use for the k/v cache.

There is a recently added --no-mmproj-offload. I didn't add fine control for offloading layers because the mmproj is usually very small compared to the text model, and it would also take me more time to actually implement.

a cache hit, where the first X tokens of a new request match so they don't get re-processed?

yes it is

Ah, it seems like -hf might not work properly with models that don't have an mmproj?

It should be fixed in #13082, but I haven't tested it yet. Are you running the latest commit in the current PR? (2df8c1a)

Edit: ok, currently it does not work because the tokenizer expects a multimodal model, but that will be an easy fix.

@Beinsezii commented Apr 25, 2025

There is a recently added --no-mmproj-offload. I didn't add fine control for offloading layers because the mmproj is usually very small compared to the text model, and it would also take me more time to actually implement.

Ah, that's perfect. I missed that in the new commit.

Update: VRAM usage is exactly 0 MiB lower. Given it's processing images in milliseconds rather than seconds, I'm going to assume the flag isn't working, at least on ROCm.

yes it is

It seemed to be working well last time I tried. Prompt processing is slow on my card, so I'd notice if it was missing a lot.

@ngxson (Collaborator, Author) commented Apr 25, 2025

ok thanks for reporting, --no-mmproj-offload should be fixed in the last commit

@Beinsezii

ok thanks for reporting, --no-mmproj-offload should be fixed in the last commit

Confirmed, it works now. I got all my VRAM back, at the cost of waiting a whole minute for all 7 images to encode lol.

Currently it seems like the image hashing doesn't actually save the tokens; rather, it's just a global KV-reuse check. I'm on a 7900X, but I can imagine someone with a quad core waiting 30 seconds for their image, realizing their prompt had a typo, then waiting 30 more seconds because the history changed and the cache was evicted. Not sure if it's possible to also cache the image tokens themselves, but it would probably help some people.

@ngxson (Collaborator, Author) commented Apr 25, 2025

Yeah, I was thinking about caching image embeddings too, but it can be a bit tricky because we don't yet have a cache eviction strategy. This can be added later on.
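
To illustrate one possible eviction strategy (purely hypothetical; nothing like this exists in the PR), a small least-recently-used cache keyed by the image hash could look like this:

```cpp
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical LRU cache for image embeddings, keyed by the image hash.
// Capacity and names are illustrative only.
struct embd_cache {
    size_t capacity = 8; // max number of cached images

    std::list<std::string> order; // most recently used at the front
    std::unordered_map<std::string,
        std::pair<std::list<std::string>::iterator, std::vector<float>>> entries;

    const std::vector<float> * get(const std::string & hash) {
        auto it = entries.find(hash);
        if (it == entries.end()) return nullptr;
        order.splice(order.begin(), order, it->second.first); // mark as recently used
        return &it->second.second;
    }

    void put(const std::string & hash, std::vector<float> embd) {
        if (auto it = entries.find(hash); it != entries.end()) {
            it->second.second = std::move(embd);
            order.splice(order.begin(), order, it->second.first);
            return;
        }
        if (entries.size() >= capacity) {
            entries.erase(order.back()); // evict the least recently used image
            order.pop_back();
        }
        order.push_front(hash);
        entries.emplace(hash, std::make_pair(order.begin(), std::move(embd)));
    }
};
```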

Comment on lines +966 to +970
// each chunk can contain either one SINGLE text token or pointer to image
// this is to simplify the logic of KV cache management
struct server_token {
    llama_token txt;
    std::shared_ptr<mtmd_image_tokens> img;
@ngxson (Collaborator, Author) Apr 25, 2025

In order not to affect the existing functionality too much, I ended up using this struct. My idea is that each instance of this struct corresponds to exactly 1 token in the KV cache.

For functionality that relies on std::vector<llama_token>, like context shift or speculative decoding, it can continue to function with minimal changes. Of course, we make sure to disable these features if mtmd is used, because they cannot (yet?) handle image tokens.

Also, here I have to use shared_ptr because an image can be represented by multiple tokens, so all of these tokens point to one single image. I do it this way so that tokens.size() automatically reflects the token count in the prompt. It may sound ugly, but I think this is the best interim solution until we come up with something cleaner.

@ggerganov If you have a bit of time, could you do a quick review just to see if my global implementation is ok?
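
To make the shared_ptr idea above concrete, a rough sketch of how an image chunk could be expanded into per-position tokens that all share one image object; the input_chunk type, its fields, and the helper name are illustrative assumptions, not the PR's exact code.

```cpp
#include <memory>
#include <vector>

#include "llama.h"

// Illustrative sketch: one server_token per KV cache position. All positions
// belonging to the same image hold a copy of the same shared_ptr, so
// tokens.size() equals the number of KV positions the prompt will occupy.
// `input_chunk` and its fields are hypothetical.
struct input_chunk {
    llama_token txt;
    std::shared_ptr<mtmd_image_tokens> img; // non-null => image chunk
    size_t      img_n_pos;                  // KV positions used by the image (assumed)
};

static std::vector<server_token> expand_chunks(const std::vector<input_chunk> & chunks) {
    std::vector<server_token> tokens;
    for (const auto & chunk : chunks) {
        if (chunk.img) {
            for (size_t i = 0; i < chunk.img_n_pos; ++i) {
                tokens.push_back({ LLAMA_TOKEN_NULL, chunk.img }); // placeholder text token
            }
        } else {
            tokens.push_back({ chunk.txt, nullptr });
        }
    }
    return tokens;
}
```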


7 participants