[BUG] Inconsistent results from search_datasets #884

Open
1 task done
giswqs opened this issue Nov 29, 2024 · 1 comment

giswqs commented Nov 29, 2024

Is this issue already tracked somewhere, or is this a new report?

  • I've reviewed existing issues and couldn't find a duplicate for this problem.

Current Behavior

I am trying to search the NASA OPERA data products using earthaccess. Searching datasets with the OPERA keyword correctly returns all eight OPERA data products. However, searching with the * keyword returns only seven of them: the OPERA_L3_DSWX-S1_V1 data product is missing from the results. This is problematic because I rely on the * keyword to retrieve the entire NASA Earth Data catalog as a CSV file for the NASA-Earth-Data repo.

Relevant repo: OPERA_Applications @alhandwerger

Expected Behavior

Using the * keyword to search should return all eight OPERA data products.
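
As a rough sketch, this expectation can be written as a set comparison: every OPERA short name returned by the targeted search should also appear in the * results. The helper names `short_names`, `missing_from_wildcard`, and `check_against_cmr` are hypothetical; only `earthaccess.search_datasets` and the `["umm"]["ShortName"]` keys come from the snippets in this report, and `check_against_cmr` assumes network access.

```python
def short_names(datasets):
    """Collect the ShortName of every dataset record into a set."""
    return {d["umm"]["ShortName"] for d in datasets}


def missing_from_wildcard(targeted, wildcard, marker="OPERA"):
    """Short names found by the targeted search but absent from the wildcard search."""
    expected = {name for name in short_names(targeted) if marker in name}
    return sorted(expected - short_names(wildcard))


def check_against_cmr():
    # Hypothetical driver: needs network access, so it is not called automatically.
    import earthaccess

    targeted = earthaccess.search_datasets(keyword="OPERA")
    wildcard = earthaccess.search_datasets(keyword="*")
    # The expected behavior corresponds to an empty list here.
    return missing_from_wildcard(targeted, wildcard)
```

Calling `check_against_cmr()` should return an empty list if the * search is complete; any names it returns are the products the wildcard search dropped.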

Steps To Reproduce

This code snippet correctly retrieves all eight OPERA data products.

import earthaccess

# Targeted search: only datasets matching the OPERA keyword
datasets = earthaccess.search_datasets(keyword="OPERA")
for dataset in datasets:
    print(dataset["umm"]["ShortName"])

This code snippet returns only seven OPERA data products. The OPERA_L3_DSWX-S1_V1 data product is missing.

import earthaccess

# Wildcard search: fetch every dataset, then filter by short name
datasets = earthaccess.search_datasets(keyword="*")
for dataset in datasets:
    if "OPERA" in dataset["umm"]["ShortName"]:
        print(dataset["umm"]["ShortName"])

Environment

- OS: Manjaro Linux
- Python: 3.12

Additional Context

No response

Collaborator

mfisher87 commented Nov 29, 2024

Thanks for the report, @giswqs! I need to look more deeply, but on my first pass I don't feel good about this:

>>> import earthaccess
>>> from earthaccess import DataCollections
>>> q = DataCollections().parameters(keyword="*")
>>> q.hits()
9400
>>> len(q.get_all())
8542
>>> datasets = earthaccess.search_datasets(keyword="*")
>>> len(datasets)
8544

Is there a bug in paging behavior?
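
One way to narrow this down, as a sketch only: compare the reported hit count with the number of records actually returned, and check whether any concept IDs repeat, since a paging problem could plausibly show up as dropped records, duplicated records, or both. The helpers `paging_report` and `report_for_keyword` are hypothetical, and the code assumes each returned record exposes `["meta"]["concept-id"]` as in CMR's UMM-JSON responses.

```python
from collections import Counter


def paging_report(hits, records):
    """Summarize how a fetched result list compares with the reported hit count."""
    # Assumption: every record carries its CMR concept ID under meta/concept-id.
    ids = [r["meta"]["concept-id"] for r in records]
    counts = Counter(ids)
    duplicates = sorted(cid for cid, n in counts.items() if n > 1)
    return {
        "hits": hits,                     # count reported by the query
        "returned": len(ids),             # records actually fetched
        "unique": len(counts),            # distinct concept IDs among them
        "shortfall": hits - len(counts),  # distinct records apparently missing
        "duplicates": duplicates,         # concept IDs seen more than once
    }


def report_for_keyword(keyword="*"):
    # Hypothetical driver: needs network access, so it is not called automatically.
    from earthaccess import DataCollections

    q = DataCollections().parameters(keyword=keyword)
    return paging_report(q.hits(), q.get_all())
```

A non-empty `duplicates` list together with a positive `shortfall` would point toward pages overlapping or shifting between requests; a shortfall with no duplicates would suggest records being skipped or filtered somewhere else.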
