Replies: 11 comments 3 replies
-
Take a look at the question answering and document question answering tasks for this use case instead of just a generic text generation model. There are only a couple of question answering models available for transformers.js currently. However, the most popular document-QA model, layoutlm-document-qa, seems small enough that, quantized, it should easily fit in the browser. So we can upload it to HF in ONNX weights for transformers.js.
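For context, a minimal sketch of what the extractive question-answering route looks like in transformers.js (the model id and the example strings here are illustrative assumptions, not a tested setup):

```js
import { pipeline } from '@xenova/transformers';

// Extractive QA: the model picks an answer span out of the provided context.
const answerer = await pipeline(
  'question-answering',
  'Xenova/distilbert-base-cased-distilled-squad'
);

const question = 'Where does SemanticFinder run?';
const context = 'SemanticFinder performs semantic search fully client-side in the browser.';

const { answer, score } = await answerer(question, context);
console.log(answer, score);
```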
-
A quick heads-up on some models already integrated in transformers.js. The result is not great, but also not terrible: I guess users would expect a more detailed and nuanced answer, but the model is trained to provide short answers. Maybe the context is also too long. The question answering model might be easy to integrate, but I'm not sure about the quality; it also highly depends on the quality of the search results. Summarization might instead be nice to integrate and seems to work ok-ish. I'll continue testing with other models.
-
Very cool! To reduce the context window, one idea is to first use SemanticFinder to get the k most relevant excerpts from the text, then ask the question about those.
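As a rough sketch of that idea (`semanticSearch` is a hypothetical placeholder for whatever SemanticFinder exposes internally, and the QA model id is an assumption):

```js
import { pipeline } from '@xenova/transformers';

const question = 'What does the text say about sea level rise?';

// Hypothetical helper standing in for SemanticFinder's own search:
// returns the k chunks most similar to the question.
const excerpts = await semanticSearch(question, 3);

// Only the top-k excerpts go into the QA context, keeping the input short.
const answerer = await pipeline(
  'question-answering',
  'Xenova/distilbert-base-cased-distilled-squad'
);
const { answer } = await answerer(question, excerpts.join('\n\n'));
console.log(answer);
```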
-
By the way, I just found the company axilla, which is building a frontend for LLMs and the necessary settings (top-k documents, chunk retrieval, etc.). There is a screenshot in the repo.
-
FYI: Llama2 support just landed. I might find some time next week to try some things.
-
Google just announced AI summaries in Chrome, but there are lots of open questions about privacy, quality, etc.
-
I think the summary function of distilbart-cnn-6-6 is pretty good, at least for non-fictional text. Example from the IPCC summary: query, top 3 results, and generated summary (not reproduced here).
I'll take a stab at it and think about how to best (optionally) integrate it in SemanticFinder.
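In transformers.js terms, that would roughly look like the following sketch (the model id Xenova/distilbart-cnn-6-6 and the generation options are assumptions, and `topResults` is a placeholder for the search hits):

```js
import { pipeline } from '@xenova/transformers';

// Load the summarization pipeline once; the weights are cached by the browser.
const summarizer = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');

// Placeholder: the top search results from SemanticFinder, joined into one input.
const topResults = ['excerpt 1 ...', 'excerpt 2 ...', 'excerpt 3 ...'];
const [summary] = await summarizer(topResults.join('\n'), { max_new_tokens: 120 });
console.log(summary.summary_text);
```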
-
Added the functionality in 9c0b66c. It works quite well with longer, non-fictional texts. Each run takes 1-2 minutes, so it would be great to look for an alternative model. Integration with the progress bar is still missing.
-
I tested https://github.com/Mozilla-Ocho/llamafile last week and it's pretty cool! One binary file running on localhost and that's it.
-
Alright, so I just tested llamafile and the good thing is that it provides a super convenient API with a ReadableStream response. If you have the binary running locally, you can call the API from JS with this code, logging every token to the console:

```js
// Your fetch request
const url = "http://127.0.0.1:8080/completion";
const headers = {
  "accept": "text/event-stream",
  "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7",
  "cache-control": "no-cache",
  "content-type": "application/json",
  "pragma": "no-cache",
  "sec-ch-ua": "\"Not_A Brand\";v=\"8\", \"Chromium\";v=\"120\", \"Google Chrome\";v=\"120\"",
  "sec-ch-ua-mobile": "?0",
  "sec-ch-ua-platform": "\"Windows\"",
  "sec-fetch-dest": "empty",
  "sec-fetch-mode": "cors",
  "sec-fetch-site": "same-origin",
  // ... other headers
};
const body = {
  "stream": true,
  "n_predict": 400,
  "temperature": 0.7,
  "stop": ["</s>", "Llama:", "User:"],
  "repeat_last_n": 256,
  "repeat_penalty": 1.18,
  "top_k": 40,
  "top_p": 0.5,
  "tfs_z": 1,
  "typical_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "mirostat": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "grammar": "",
  "n_probs": 0,
  "image_data": [],
  "cache_prompt": true,
  "slot_id": 0,
  "prompt": "This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\n\nUser:What is the meaning of life?\nLlama:"
};

fetch(url, {
  method: "POST",
  headers: headers,
  body: JSON.stringify(body),
  referrer: "http://127.0.0.1:8080/",
  //referrerPolicy: "strict-origin-when-cross-origin",
  mode: "cors",
  credentials: "omit",
  // ... other options
})
  .then(response => {
    const reader = response.body.getReader();
    // Read the stream and process data as it comes
    return reader.read().then(function processText({ done, value }) {
      if (done) {
        console.log("Stream completed");
        return;
      }
      // The value is a Uint8Array; convert it to text
      const text = new TextDecoder().decode(value);
      // Check if the chunk is a server-sent event starting with "data: "
      if (text.startsWith("data: ")) {
        // Remove the "data: " prefix
        const jsonDataString = text.substring("data: ".length);
        try {
          // Parse the event payload as JSON
          const jsonData = JSON.parse(jsonDataString);
          // Log the generated token(s) contained in this event
          console.log(jsonData.content);
        } catch (error) {
          console.error("Error parsing JSON:", error);
        }
      }
      // Continue reading the stream
      return reader.read().then(processText);
    });
  })
  .catch(error => {
    console.error("Error:", error);
  });
```

However, the only thing stopping me from connecting it to SemanticFinder is CORS. I guess this is something that should be optional as a flag on the llamafile side. I will ask the folks over there whether it would be in scope to offer CORS policy modification as an option for web experiments.
-
Just integrated Ollama. It has a huge community and provides exactly what I had in mind!
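For anyone who wants to try the same, here's a minimal sketch of calling Ollama's local REST API from JS (the model name and prompt are placeholders; check the Ollama API docs for the exact fields):

```js
// Ollama serves a local REST API (default: http://localhost:11434).
const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama2",          // any model pulled with `ollama pull`
    prompt: "Summarize: ...", // placeholder prompt
    stream: true,             // tokens arrive as newline-delimited JSON
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let output = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Simplified: assumes each chunk contains complete JSON lines.
  for (const line of decoder.decode(value).split("\n").filter(Boolean)) {
    output += JSON.parse(line).response ?? "";
  }
}
console.log(output);
```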
-
Maybe with the recent efforts in bringing LLMs to the browser we could think about a POC text generation demo based on page content.
Here's the link to a working Llama 2 of ggerganov's implementation running in the browser: https://twitter.com/ggerganov/status/1683174252990660610?t=SghA57AGQTQ4n660HuJ9lg&s=19
There are a few things to test:
- How many results should be piped into the prompt, and how should this number be determined? It would be hard-coded in the beginning; later there could be some heuristic.
- What is the most effective prompt for Llama 2? How can the context be provided efficiently? How should the user's question be wrapped? (See the sketch after this list.)
- How can hallucinations be avoided if the semantic search results don't provide the right context for the user's question?
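On the prompting question, one possible sketch (the Llama 2 chat template is the one documented by Meta; `topResults` and `question` are placeholders for whatever SemanticFinder would pass in):

```js
// Placeholder inputs: top-k excerpts from SemanticFinder and the user's question.
const topResults = ["excerpt 1 ...", "excerpt 2 ...", "excerpt 3 ..."];
const question = "What does the text say about sea level rise?";

// Llama 2 chat format: system prompt inside <<SYS>> tags, user turn wrapped in [INST].
// Telling the model to answer only from the context is one way to limit hallucinations.
const prompt = `<s>[INST] <<SYS>>
You are a helpful assistant. Answer only based on the context below.
If the context does not contain the answer, say so instead of guessing.
<</SYS>>

Context:
${topResults.join("\n\n")}

Question: ${question} [/INST]`;
```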
@VarunNSrivastava this would be very interesting for the browser plugin #15 too! "Chat with the web"-style, it would enable chatting with any (single-page) website.
@lizozom did you follow up on the use of WebGPU so far? That would be a great addition (even though at the moment the C web implementation can't use it). However, I think we should probably wait for transformers.js to add the feature.
Update: I just saw that distilgpt2 for text generation is already integrated in transformers.js at ~122 MB. I'll play with it next week and try to figure out whether the results are satisfying.
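As a starting point, a minimal sketch of the transformers.js text-generation API (the ONNX model id `Xenova/distilgpt2` and the generation options are assumptions for illustration):

```js
import { pipeline } from '@xenova/transformers';

// ~122 MB quantized; downloaded once and then served from the browser cache.
const generator = await pipeline('text-generation', 'Xenova/distilgpt2');

const [result] = await generator('The IPCC report states that', {
  max_new_tokens: 50,
});
console.log(result.generated_text);
```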