CSV support? #97

Open
wesinator opened this issue Dec 4, 2024 · 2 comments

Comments

@wesinator

I realise this might be too much scope creep, but I think it would be nice if this tool were adapted to work on CSV files.

  • the extension could handle a CSV URL, identified by its MIME type
  • a user could upload a CSV file to a local instance for discovery

This could be implemented as a "wrapper" that converts CSV content to a list of JSON objects and then passes that JSON to the discovery tool. A maximum size limit could be applied.
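
To make the wrapper idea concrete, here is a minimal sketch in Python using only the standard library. It is not the tool's actual API: `csv_to_records`, `csv_to_json`, and the `MAX_CSV_BYTES` value are hypothetical names and numbers chosen for illustration, and the sketch assumes the first row is a header.

```python
import csv
import io
import json

# Hypothetical limit for illustration only; not a value taken from the project.
MAX_CSV_BYTES = 10 * 1024 * 1024


def csv_to_records(text, delimiter=",", max_bytes=MAX_CSV_BYTES):
    """Convert CSV text into a list of dicts, one per row, keyed by the header row."""
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"CSV input is {size} bytes, limit is {max_bytes}")
    # newline="" lets the csv module handle line endings inside quoted fields.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [dict(row) for row in reader]


def csv_to_json(text, **kwargs):
    """Serialize the converted rows as a JSON array of objects."""
    return json.dumps(csv_to_records(text, **kwargs))
```

Every value comes out as a string (`"36"`, not `36`), so any type inference would have to be a separate step layered on top.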

@lahmatiy
Member

lahmatiy commented Dec 6, 2024

I had similar thoughts and even looked into this possibility. The main problem with CSV is that it’s not a strict format, making it extremely difficult to distinguish CSV from other content automatically without false positives. Additionally, the format is highly variable in terms of delimiters, character escaping methods, and so forth.

It might be implemented someday, but it’s far from our current priorities. Most likely, this feature will be opt-in, enabled in the settings at the user’s own risk. In the near future, our plans include adding support for JSONL.

@wesinator
Author

> The main problem with CSV is that it’s not a strict format, making it extremely difficult to distinguish CSV from other content automatically without false positives.

I bet 99% of the time the file has a .csv (or .tsv) extension. Detecting the delimiter should be relatively easy in most cases: make a best guess of comma or tab based on the extension, then, if that doesn't match, fall back to a non-word symbol character.
I see what you mean, but it seems doable.
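
A rough sketch of that heuristic in Python, under stated assumptions: `guess_delimiter`, `EXTENSION_DELIMITERS`, and the 20-line sample size are hypothetical choices made for illustration, and "matches" is taken to mean that the character appears the same non-zero number of times on every sampled line.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumed mapping from file extension to the delimiter to try first.
EXTENSION_DELIMITERS = {".csv": ",", ".tsv": "\t"}


def _is_consistent(lines, delimiter):
    """True if the delimiter occurs the same non-zero number of times on every line."""
    counts = {line.count(delimiter) for line in lines}
    return len(counts) == 1 and counts.pop() > 0


def guess_delimiter(filename, sample):
    """Guess a delimiter from the extension, falling back to symbol characters."""
    lines = [line for line in sample.splitlines() if line][:20]
    if not lines:
        return None
    preferred = EXTENSION_DELIMITERS.get(PurePosixPath(filename).suffix.lower())
    if preferred and _is_consistent(lines, preferred):
        return preferred
    # Fallback: non-word, non-space, non-quote symbols from the first line,
    # tried in order of how often they appear there.
    candidates = Counter(re.findall(r"[^\w\s\"']", lines[0]))
    for char, _ in candidates.most_common():
        if _is_consistent(lines, char):
            return char
    return None
```

For example, `guess_delimiter("data.csv", "a;b;c\n1;2;3\n")` falls through to the symbol scan and returns `";"`. The sketch deliberately ignores quoting, so a quoted field containing the delimiter would throw off the counts, which is the kind of variability raised earlier in the thread. Python's standard library also ships `csv.Sniffer`, which takes a comparable best-guess approach.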
