Match only first half of the mime type in FileTypeRouter #6191

Closed
ZanSara opened this issue Oct 29, 2023 · 2 comments · Fixed by #7303
Labels
good first issue Good for newcomers P3 Low priority, leave it in the backlog

Comments

@ZanSara
Contributor

ZanSara commented Oct 29, 2023

In many cases, converter components can handle an entire category of mimetypes, not only a specific one. For example, audio transcribers could handle all audio mimetypes, the text converter could handle all text mimetypes, and so on.

It would be convenient if FileTypeRouter could match a mime type by its first half only: audio rather than audio/mpeg or audio/x-wav.
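As a rough illustration of what "first half" means here, the sketch below uses only the standard-library mimetypes module. The helper name top_level_type is hypothetical and is not part of FileTypeRouter; it only shows how the category could be derived from a file path.

```python
import mimetypes
from typing import Optional


def top_level_type(path: str) -> Optional[str]:
    """Return the first half of the guessed MIME type, e.g. 'audio'.

    Hypothetical helper for illustration only, not a FileTypeRouter API.
    """
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        # The extension is unknown to the mimetypes module.
        return None
    # 'audio/mpeg' -> 'audio'
    return mime_type.split("/", 1)[0]
```

With this, a path such as song.mp3 would fall into the audio category regardless of whether the full type is audio/mpeg or something else under audio.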

@masci masci added the P2 Medium priority, add to the next sprint if no P1 available label Jan 11, 2024
@masci
Contributor

masci commented Jan 11, 2024

This could be done by accepting regexes in the mime_types: List[str] parameter. This way it would be backward compatible.
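A minimal sketch of how regex entries might be matched, assuming each entry is tried as a full-match pattern. The function match_mime_type is hypothetical and is not the actual FileTypeRouter implementation. One caveat worth noting: some literal MIME types contain regex metacharacters (the + in image/svg+xml, for instance), so the sketch checks plain equality first to keep existing exact entries working.

```python
import re
from typing import List, Optional


def match_mime_type(mime_type: str, patterns: List[str]) -> Optional[str]:
    """Return the first entry in patterns that matches mime_type, else None.

    Hypothetical helper for illustration only, not a FileTypeRouter API.
    """
    for pattern in patterns:
        # Exact equality first, so literal entries such as 'image/svg+xml'
        # still match even though '+' is a regex quantifier.
        if pattern == mime_type or re.fullmatch(pattern, mime_type):
            return pattern
    return None
```

Under these assumptions, an entry like audio/.* would cover audio/mpeg and audio/x-wav, while an existing entry like text/plain keeps matching only text/plain.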

@masci masci added the good first issue Good for newcomers label Jan 11, 2024
@Sgvkamalakar

Hey @ZanSara, @masci,

The following Python script uses the mimetypes module to read its dictionary of known file extensions and their corresponding MIME types, then prints each MIME type alongside its category (the first half of the type).

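import mimetypes

# types_map maps file extensions (e.g. '.mp3') to MIME types (e.g. 'audio/mpeg')
all_mime_types = mimetypes.types_map
pairs = set()
for extension, mime_type in all_mime_types.items():
    category = mime_type.split('/')[0]  # first half of the MIME type
    pairs.add((mime_type, category))
# Print the unique (MIME type, category) pairs in alphabetical order
for pair in sorted(pairs):
    print(pair)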

Output:

...
...
...
('audio/3gpp', 'audio')
('audio/3gpp2', 'audio')
('audio/aac', 'audio')
('audio/basic', 'audio')
('audio/mpeg', 'audio')
('audio/opus', 'audio')
('audio/x-aiff', 'audio')
('audio/x-pn-realaudio', 'audio')
('audio/x-wav', 'audio')
('image/gif', 'image')
('image/heic', 'image')
('image/heif', 'image')
('image/ief', 'image')
('image/jpeg', 'image')
('image/png', 'image')
('image/svg+xml', 'image')
...
...
...

@masci masci added P3 Low priority, leave it in the backlog and removed P2 Medium priority, add to the next sprint if no P1 available labels Feb 22, 2024