issue with inference #345

Open
wisebreadloaf opened this issue Jan 22, 2024 · 13 comments

@wisebreadloaf

~/clip-retriever  master [!?] +93 -97  98% ............................................................................................................................via 🐍 v3.11.6 (env)
❯ clip-retrieval inference --input_dataset ./source_images/ --output_folder ./output_folder/
The number of samples has been estimated to be 22
Starting the worker
dataset is 16
Starting work on task 0
warming up with batch size 256 on cuda
done warming up in 24.880407333374023s
Traceback (most recent call last):
  File "/home/bored/clip-retriever/env/bin/clip-retrieval", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/cli.py", line 18, in main
    fire.Fire(
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/main.py", line 154, in main
    distributor()
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/distributor.py", line 17, in __call__
    worker(
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/worker.py", line 127, in worker
    runner(task)
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/runner.py", line 39, in __call__
    batch = iterator.__next__()
            ^^^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/reader.py", line 225, in __iter__
    for batch in self.dataloader:
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/home/bored/clip-retriever/env/lib/python3.11/site-packages/clip_retrieval/clip_inference/reader.py", line 99, in __getitem__
    image_file = self.image_files[key]
                 ~~~~~~~~~~~~~~~~^^^^^
KeyError: 'image1.txt'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 115, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory

@hknguyen20

I'm encountering the same problem with Python 3.10 on a MacBook M1, running on CPU.

@rom1504
Owner

rom1504 commented Jan 28, 2024

Is the source folder empty? Did you try with an absolute path?

@rom1504
Owner

rom1504 commented Jan 28, 2024

You may need to pass --enable_text False if you don't have any captions.
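
The KeyError above names a .txt key, which suggests the reader looked for a caption file belonging to an image and did not find one. Below is a minimal sketch for checking a folder before running inference. It assumes captions are expected as a .txt file with the same stem next to each image, which is inferred from the error message and not confirmed against the clip-retrieval source. The helper name `find_images_without_captions` and the extension list are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def find_images_without_captions(folder):
    """Return sorted image paths under `folder` that have no sibling .txt file."""
    missing = []
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # Caption is assumed to share the image's stem, e.g. image1.jpg -> image1.txt
            if not path.with_suffix(".txt").exists():
                missing.append(path)
    return sorted(missing)
```

If this returns a non-empty list for a folder without captions, that is consistent with the suggestion to disable text inference; it does not by itself explain the failure.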

@hknguyen20

Thanks for answering. The source folder is not empty, and I think it did read the folder, since the output printed "The number of samples has been estimated to be ...". There are no captions, but --enable_text False still gives the same error.

@rom1504
Owner

rom1504 commented Jan 28, 2024 via email

@hknguyen20

I'm using macOS Sonoma. The image_folder was created using img2dataset, and contains:
00000 00000.parquet 00000_stats.json
Using an absolute path gives the same error, except that the reported dataset size is different:

clip-retrieval inference --input_dataset image_folder --output_folder embeddings_folder --enable_text False
The number of samples has been estimated to be 124
Starting the worker
dataset is 12
Starting work on task 0
warming up with batch size 256 on cpu
done warming up in 17.178229808807373s
Traceback (most recent call last):
  File "/opt/homebrew/bin/clip-retrieval", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/cli.py", line 18, in main
    fire.Fire(
  File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/main.py", line 154, in main
    distributor()
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/distributor.py", line 17, in __call__
    worker(
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/worker.py", line 125, in worker
    runner(task)
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/runner.py", line 39, in __call__
    batch = iterator.__next__()
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/reader.py", line 222, in __iter__
    for batch in self.dataloader:
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 435, in __iter__
    return self._get_iterator()
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1034, in __init__
    w.start()
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_image_dataset.<locals>.ImageDataset'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

With absolute path:

clip-retrieval inference --input_dataset /Users/hknguyen20/image_folder --output_folder embeddings_folder --enable_text False
The number of samples has been estimated to be 124
Starting the worker
dataset is 30
Starting work on task 0
warming up with batch size 256 on cpu
done warming up in 16.550618886947632s
Traceback (most recent call last):
  File "/opt/homebrew/bin/clip-retrieval", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/cli.py", line 18, in main
    fire.Fire(
  File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/homebrew/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/main.py", line 154, in main
    distributor()
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/distributor.py", line 17, in __call__
    worker(
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/worker.py", line 125, in worker
    runner(task)
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/runner.py", line 39, in __call__
    batch = iterator.__next__()
  File "/opt/homebrew/lib/python3.10/site-packages/clip_retrieval/clip_inference/reader.py", line 222, in __iter__
    for batch in self.dataloader:
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 435, in __iter__
    return self._get_iterator()
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1034, in __init__
    w.start()
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_image_dataset.<locals>.ImageDataset'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

@rom1504
Owner

rom1504 commented Jan 28, 2024 via email

@hknguyen20

I see. It's the same issue as #142.

@rom1504
Owner

rom1504 commented Jan 29, 2024

Can you try making class ImageDataset(Dataset) top level (defined at module scope instead of inside get_image_dataset) and rerun?
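
For context on why moving the class matters: pickle stores classes by their qualified name, so an instance of a class defined inside a function cannot be pickled, while an instance of a module-level class can. The following is a small standalone sketch of that behavior; the class and function names are illustrative and are not taken from clip-retrieval.

```python
import pickle


class TopLevelDataset:
    """Defined at module level, so pickle can locate it by qualified name."""

    def __init__(self, items):
        self.items = items


def make_local_dataset(items):
    # Mirrors the shape of the failing code: the class is created inside a function,
    # so its qualified name contains "<locals>" and pickle cannot look it up.
    class LocalDataset:
        def __init__(self, items):
            self.items = items

    return LocalDataset(items)


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError, TypeError):
        return False
```

When run as a script, `can_pickle(TopLevelDataset([1, 2]))` should succeed, while `can_pickle(make_local_dataset([1, 2]))` fails for the same reason as the AttributeError in the traceback above.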

@rohun-tripathi

I see a similar error when trying to use this:

KeyError: '00000/000000000.txt'

It happens after I place captions with the same name as the image next to the image. How do the captions have to be placed?

@rom1504
Owner

rom1504 commented Jan 30, 2024

@rohun-tripathi can you provide more information (command, environment, ...)?

This is not expected.

@hknguyen20

I tried making it top level, but when running the end2end inference test I encountered AttributeError: Can't pickle local object 'create_webdataset_filter.<locals> and could not resolve it. In the end, I solved the initial error without modifying the code by disabling multiprocessing, as pointed out in #220.
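
The idea behind that workaround can be sketched generically: if the dataset object cannot be pickled, worker processes started with the spawn method cannot receive it, so loading has to stay in the main process. `choose_num_workers` below is a hypothetical helper, not part of clip-retrieval or PyTorch, and it does not show how #220 applies the change. With a PyTorch DataLoader, the returned value would be passed as `num_workers`, where 0 means data is loaded in the main process.

```python
import pickle


def choose_num_workers(dataset, requested_workers):
    """Return `requested_workers`, or 0 if `dataset` cannot be pickled.

    This check is conservative: it assumes workers are spawned (as in the
    tracebacks above), where the dataset must be pickled to reach each worker.
    """
    if requested_workers <= 0:
        return 0
    try:
        pickle.dumps(dataset)
    except (AttributeError, pickle.PicklingError, TypeError):
        return 0  # fall back to single-process loading
    return requested_workers
```

Falling back to zero workers trades loading throughput for robustness, which is usually acceptable for small folders like the ones in this thread.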

@ShuxunoO

If you hit the same errors, try referring to #352.
