Directly get image from indices #377
Hi, you can replace the URLs in the metadata with URLs pointing to an HTTP
service (e.g. nginx) where you host the images you have downloaded.
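A minimal sketch of that suggestion, using only the Python standard library. It is not clip-retrieval code: `BASE_URL`, `local_url` and `rewrite_urls` are hypothetical names, and the `<shard>/<key>.jpg` layout (shard = first five characters of `image_path`) is an assumption about how the downloaded files are stored, so adjust it to the actual folder structure.

```python
from urllib.parse import quote

# Hypothetical base URL of the local HTTP server (e.g. nginx) that serves the images.
BASE_URL = "http://localhost:8080/images"


def local_url(image_path, base_url=BASE_URL, shard_digits=5, ext="jpg"):
    """Build the URL of a locally hosted image from its image_path key.

    Assumption: files are stored as <shard>/<key>.<ext>, where the shard is
    the first `shard_digits` characters of the key.
    """
    key = str(image_path)
    shard = key[:shard_digits]
    return f"{base_url.rstrip('/')}/{quote(shard)}/{quote(key)}.{ext}"


def rewrite_urls(records, base_url=BASE_URL):
    """Return copies of metadata records whose 'url' points at the local server."""
    return [{**r, "url": local_url(r["image_path"], base_url)} for r in records]


# Example metadata row (values are illustrative only).
rows = [{"image_path": "000000007", "url": "https://example.com/a.jpg", "caption": "..."}]
print(rewrite_urls(rows)[0]["url"])
```

The same mapping would have to be applied to the `url` column of the metadata files before the service loads them, and it only works if each row still carries its `image_path`.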
On Mon, Mar 18, 2024, 12:43 PM Maxwells_Ayakashi wrote:
Hi, Rom. I have downloaded laion400m and launched KnnService with the
following arguments:
indices_paths="indices_paths_ViTL14.json"
clip_model="ViT-L/14"
enable_hdf5=False
enable_faiss_memory_mapping=True
columns_to_return = ["url", "image_path", "caption", "NSFW"]
reorder_metadata_by_ivf_index=False
enable_mclip_option=True
use_jit=False
use_arrow=True
provide_safety_model=False
provide_violence_detector=False
provide_aesthetic_embeddings=True
clip_resources = load_clip_indices(
    indices_paths=indices_paths,
    clip_options=ClipOptions(
        indice_folder="",
        clip_model=clip_model,
        enable_hdf5=enable_hdf5,
        enable_faiss_memory_mapping=enable_faiss_memory_mapping,
        columns_to_return=columns_to_return,
        reorder_metadata_by_ivf_index=reorder_metadata_by_ivf_index,
        enable_mclip_option=enable_mclip_option,
        use_jit=use_jit,
        use_arrow=use_arrow,
        provide_safety_model=provide_safety_model,
        provide_violence_detector=provide_violence_detector,
        provide_aesthetic_embeddings=provide_aesthetic_embeddings,
    ),
)
knnservice = KnnService(clip_resources=clip_resources)
In clip_retrieval/clip_back.py, KnnService.query, line 466:
results = self.map_to_metadata(
    indices, distances, num_images, clip_resource.metadata_provider, clip_resource.columns_to_return
)
the results only contain 'url' and 'caption', for example:
[{'url': 'https://s3.us-west-2.amazonaws.com/prod.retreat.guru/images/16212/medium/photo%20%280000000E%29.JPG',
'caption': 'Soul Safari Holistic Retreats'}]
But I noticed that the indices are a list like:
[193396883, 169693704, 226852546, 94594796, 10774506, 139003161, 3917167, 217605597, 191966779, 197146260]
As you mentioned in other issues before, the URLs are gradually
becoming unavailable. I think it would be more reliable to read the images
from the files I have downloaded. So my question is: how can I get the
image data directly from such indices?
Can you explain from the beginning what you are trying to do? I don't
follow.
On Mon, Mar 18, 2024, 3:09 PM Maxwells_Ayakashi wrote:
Sorry, I'm not very familiar with this. Could you please explain in more
detail what you mean by "urls pointing to an http service where you host
the images"?
BTW, I noticed that in the dataframe of a parquet file, the key image_path
exists:
   image_path  caption                                            NSFW      ...  original_height  exif  sha256
0  000000007   Ben Affleck Could Be Latest Addition To <em>Th...  UNLIKELY  ...  320              {}    6561021576f886c0334b06955cea13e973101f296e0280...
1  000000015   60 Pcs Table Decorations Supplies Moana Themed...  UNLIKELY  ...  200              {}    2432d4ca862e078d911e9becdd7aa7bd85e5832ec5e44f...
2  000000001   Silverline Air Framing Nailer 90mm 10 - 12 Gau...  UNLIKELY  ...  225              {}    b453f327a45b2b734772d8b38d12c1a441b0d69ceb458e...
3  000000049   Mini girls green crochet floral top                UNLIKELY  ...  300              {}    0ba5c4d3842b670ec67a95227121c84944d73436b95fcf...
4  000000075   HARRY CHAPIN - Soundstage: An Evening With Har...  UNLIKELY  ...  200              {}    1cc2add844cdab60decf867ba4242e88fa95b814e6799b...
...
But it does not appear in the retrieved metadata (only caption and url are
returned) when I start KnnService with the Arrow file. If I start KnnService
directly from the parquet files (use_arrow=False), image_path is present in
the retrieved metadata, but it does not match the indices:
>>> metas[0]['image_path'], indices[0]
('194406653', 193396883)
First, I downloaded laion400m and launched KnnService with … So my question is: is it possible to access image data locally from an index number? If so, which is the correct index?
Thank you so much for taking the time to answer my question. I apologize for any misunderstanding caused.
Where exactly would you like the JPEG bytes to appear after querying the index? In the browser or somewhere else?
In a Python script, since I'm going to do retrieval-augmented generation.
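For that use case, one possible approach is to read the bytes straight from disk using the `image_path` value in each retrieved metadata record. The sketch below is not part of clip-retrieval. `local_image_file` and `load_image_bytes` are hypothetical helpers, and the `<root>/<shard>/<key>.jpg` layout is an assumption about how the images were downloaded. It also assumes the records contain `image_path`, which in this thread was only the case with use_arrow=False.

```python
import tempfile
from pathlib import Path
from typing import Optional


def local_image_file(root, image_path, shard_digits=5, ext="jpg") -> Path:
    """Map an image_path key to a file path.

    Assumption: files live at <root>/<shard>/<key>.<ext>, where the shard is
    the first `shard_digits` characters of the key.
    """
    key = str(image_path)
    return Path(root) / key[:shard_digits] / f"{key}.{ext}"


def load_image_bytes(root, meta) -> Optional[bytes]:
    """Return the raw image bytes for one metadata record, or None if the file is missing."""
    path = local_image_file(root, meta["image_path"])
    return path.read_bytes() if path.is_file() else None


# Self-contained demo: a temporary folder stands in for the download directory.
with tempfile.TemporaryDirectory() as root:
    target = local_image_file(root, "194406653")
    target.parent.mkdir(parents=True)
    target.write_bytes(b"\xff\xd8fake-jpeg")  # placeholder bytes, not a real image

    results = [
        {"image_path": "194406653", "caption": "found"},
        {"image_path": "000000001", "caption": "missing"},
    ]
    for meta in results:
        data = load_image_bytes(root, meta)
        print(meta["caption"], None if data is None else len(data))
```

Whether the returned `image_path` really identifies the file for a given search result still needs to be checked against the downloaded data, since the thread shows that `image_path` and the index number are different values.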