Allow Attaching/Drag-and-Dropping more file types #715

humphd · 2024-11-08T23:44:32Z

This updates our "Attach Files..." menu option and Drag-and-Drop to support the following file types:

images: works same as before
PDF files: converts to markdown with jina.ai, then creates a new message with the contents
text files: tries to syntax highlight when type is known and creates a new message with the contents
Word DOCX: converts to HTML then to Markdown (the last bit isn't working yet)

You can attach or drag-and-drop a bunch of files and it will process them all.

A few things could be improved, which I ran out of time to work on (can happen in follow ups, or others can push fixes):

I want proper CSS to happen when the user drags over the prompt form so you know it's a drop target. I tried to do something with a green border, but it's not working.
~~There is a bunch of code duplication here that could be tamed~~
I want to be able to add a jina.ai provider key in the settings modal so you can have LLM Providers and "Document Providers" (e.g., services that allow you to process file types into text for LLMs). This isn't critical, since it works now.

@menghif, @Amnish04: can you see what I'm doing wrong with my CSS on the drop target?

humphd · 2024-11-08T23:46:28Z

~~It would be cool to add DOCX support next by passing the file through https://github.com/mwilliamson/mammoth.js/ and then sending the HTML to jina.ai to get Markdown.~~ Done!

cloudflare-workers-and-pages · 2024-11-09T22:14:18Z

Deploying chatcraft-org with Cloudflare Pages

Latest commit:	`8aa52ae`
Status:	✅ Deploy successful!
Preview URL:	https://0e9e0de8.console-overthinker-dev.pages.dev
Branch Preview URL:	https://humphd-import-non-image-file.console-overthinker-dev.pages.dev

View logs

humphd · 2024-11-10T01:18:29Z

Refactored to remove code duplication, simplified things, added support for Word DOCX (I can't get Jina.ai to do the right thing when I POST an HTML file to it, so for now I'm leaving the Word DOCX in HTML format, which still works).

humphd · 2024-11-10T01:27:07Z

I'd say this is good to land and we can do follow-ups. Anyone want to test before I do?

Amnish04

@humphd Looks and works great!

Left some comments on why CSS and html to markdown might not be working.

Amnish04 · 2024-11-10T05:47:50Z

src/lib/ai.ts

@@ -510,3 +510,109 @@ export function isChatModel(model: string): boolean {
    !(isTextToSpeechModel(model) || isSpeechToTextModel(model) || isTextToImageModel(model))
  );
 }
+
+export type JinjaReaderResponse = {


Isn't this Jina?

You can't believe how many times I've made this mistake. For whatever reason, my brain can't do jina vs. jinja...

Amnish04 · 2024-11-10T06:22:38Z

src/lib/ai.ts

+  // TODO: jina.ai is not returning the expected result for HTML, not sure why...
+  return convertToMarkdownWithJina(html, "html");


Is this because you return the input content as is, when file is a string and type is html?

chatcraft.org/src/lib/ai.ts

Lines 566 to 568 in c454e07

} else if (type === "html" && typeof file === "string") {

// HTML file is already a string

contents = file;

No, I want that behaviour: I only want to read the file into a base64 string if it's a PDF, but if it's already parsed HTML content, I want that bare string.

However, the bug seems to be with Jina.ai requiring a URL when you post your local HTML file, which makes no sense to me. Even when I add it, it seems to only return the first line of text.

I think I'll give up on this bit and rip it out.

Amnish04 · 2024-11-10T06:38:44Z

src/components/PromptForm/DesktopPromptForm.tsx

+            _hover={{ borderColor: activeBorder }}
+            borderColor={isDragActive ? activeBorder : "inherit"}


Suggested change

_hover={{ borderColor: activeBorder }}

borderColor={isDragActive ? activeBorder : "inherit"}

border={"2px solid"}

borderColor={isDragActive ? activeBorder : "inherit"}

The reason is there is no border here to start with :)

Amnish04 · 2024-11-10T06:40:36Z

src/components/PromptForm/DesktopPromptForm.tsx

@@ -291,19 +304,23 @@ function DesktopPromptForm({
    // Otherwise, let the default paste handling happen
  };

+  const activeBorder = useColorModeValue("green.100", "green.600");


Suggested change

const activeBorder = useColorModeValue("green.100", "green.600");

const activeBorder = useColorModeValue("green.200", "green.600");

I would change the lighter version to green.200. Current shade is too light.

Before:

After:

Thanks, I've done a version of this. I still think it could be made better than what I can do in a follow-up, but now it works.

Amnish04 · 2024-11-10T06:53:28Z

src/hooks/use-file-import.tsx

+        const document = (contents as JinjaReaderResponse).data;
+        chat.addMessage(new ChatCraftHumanMessage({ text: `${document.content}\n` }));


While testing, I ran into a case where my pdf content was not text and jina returned the following content:

{ "text": "\n" }

leading to an empty message like this:

Should we have an additional check here so we only add a user message when we have some content.trim()?

Also, should we be paying attention the chat's size by rejecting large files and showing error alerts, so we don't exceed model context lengths? Could probably be done in a follow up if we consider this an issue.

Added an info warning when a file is empty instead of importing.

For now, I think we should just try to import all files and let the browser/LLM complain if things are too large.

Allow Attaching/Drag-and-Dropping more file types

3ef820d

Refactor duplication to extract common hook

9d119da

Refactor, support Word DOCX

c454e07

Amnish04 reviewed Nov 10, 2024

View reviewed changes

humphd added 4 commits November 10, 2024 07:52

s/jinja/jina/

16ed1ff

Fix drag-drop CSS

bede921

Fix DOCX processing

55a99c1

Deal with empty files on import

8aa52ae

humphd merged commit ebb9258 into main Nov 10, 2024
4 checks passed

humphd deleted the humphd/import-non-image-files branch November 10, 2024 14:50

humphd mentioned this pull request Nov 10, 2024

Add support for non-LLM AI provider API keys #718

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow Attaching/Drag-and-Dropping more file types #715

Allow Attaching/Drag-and-Dropping more file types #715

humphd commented Nov 8, 2024 •

edited

Loading

humphd commented Nov 8, 2024 •

edited

Loading

cloudflare-workers-and-pages bot commented Nov 9, 2024 •

edited

Loading

humphd commented Nov 10, 2024

humphd commented Nov 10, 2024

Amnish04 left a comment

Amnish04 Nov 10, 2024

humphd Nov 10, 2024

Amnish04 Nov 10, 2024 •

edited

Loading

humphd Nov 10, 2024

Amnish04 Nov 10, 2024

Amnish04 Nov 10, 2024

humphd Nov 10, 2024

Amnish04 Nov 10, 2024

humphd Nov 10, 2024

		// TODO: jina.ai is not returning the expected result for HTML, not sure why...
		return convertToMarkdownWithJina(html, "html");

	} else if (type === "html" && typeof file === "string") {
	// HTML file is already a string
	contents = file;

		_hover={{ borderColor: activeBorder }}
		borderColor={isDragActive ? activeBorder : "inherit"}

	const activeBorder = useColorModeValue("green.100", "green.600");
	const activeBorder = useColorModeValue("green.200", "green.600");

		const document = (contents as JinjaReaderResponse).data;
		chat.addMessage(new ChatCraftHumanMessage({ text: `${document.content}\n` }));

Allow Attaching/Drag-and-Dropping more file types #715

Allow Attaching/Drag-and-Dropping more file types #715

Conversation

humphd commented Nov 8, 2024 • edited Loading

humphd commented Nov 8, 2024 • edited Loading

cloudflare-workers-and-pages bot commented Nov 9, 2024 • edited Loading

Deploying chatcraft-org with Cloudflare Pages

humphd commented Nov 10, 2024

humphd commented Nov 10, 2024

Amnish04 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Amnish04 Nov 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

humphd commented Nov 8, 2024 •

edited

Loading

humphd commented Nov 8, 2024 •

edited

Loading

cloudflare-workers-and-pages bot commented Nov 9, 2024 •

edited

Loading

Amnish04 Nov 10, 2024 •

edited

Loading