Allow full downloads on large repos #79

Open
pbsladek opened this issue Dec 5, 2019 · 7 comments
Labels
bug Something isn't working
Milestone

Comments

@pbsladek

pbsladek commented Dec 5, 2019

Thanks for writing this. It has eased a lot of the work of transferring repos between instances.

Added this around line 176 of nexus_client.py

try:
    content = response.json()
except json.decoder.JSONDecodeError:
    raise exception.NexusClientAPIError(response.content)

and got the following:

nexuscli.exception.NexusClientAPIError: b'ERROR: (ID 40b7714e-41ac-416a-8cbc-7fe8b7a0b639) 
Failed to execute phase [query], all shards failed;
shardFailures {[pUq2BG92Raedvpiju7ztMg][215bfafbc6963d8c9f7ec9a57f88c6223b00dfa6][0]:
RemoteTransportException[[8A042D9D-8A28015E-AF29C30B-EC2AE858-7B66110B][local[1]][indices:data/read/search[phase/query]]]; nested: 
QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. 
See the scroll api for a more efficient way to request large data sets.
This limit can be set by changing the [index.max_result_window]
index level parameter.]; }
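The numbers in the error are consistent with a paged search whose offset has walked past the result window. A minimal sketch of that arithmetic, assuming a page size of 50 (inferred from 10050 = 10000 + 50, not confirmed anywhere in this thread); `first_failing_page` is a hypothetical helper, not part of nexuscli:

```python
MAX_RESULT_WINDOW = 10_000  # limit quoted in the error message


def first_failing_page(page_size, max_window=MAX_RESULT_WINDOW):
    """Return (zero-based page index, from offset) of the first page
    whose from + size exceeds the result window."""
    page = 0
    # A page is allowed while from + size <= max_window.
    while page * page_size + page_size <= max_window:
        page += 1
    return page, page * page_size


# Assumed page size of 50: the first rejected request starts at offset 10000.
page, offset = first_failing_page(50)
print(page, offset, offset + 50)
```

Under that assumption, the first 10,000 results are served and the next page is the one rejected.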

I think this is an Elasticsearch issue within Nexus, and I doubt they will fix it anytime soon.

Wrote a quick wrapper to use the cli to download every sub directory. Would be cool if you could implement a way to do this within the cli.

e.g. grab the directory structure and download each sub directory 1 by 1.
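A rough sketch of what such a wrapper could look like, assuming the `nexus3 dl <source> <destination>` command form and assuming the list of subdirectories is already known (how to obtain it is not covered here). `build_download_commands` is a hypothetical helper, not part of nexuscli:

```python
import shlex


def build_download_commands(repository, subdirectories, destination="."):
    """Return one `nexus3 dl` argument list per subdirectory."""
    commands = []
    for subdir in subdirectories:
        # Normalise to "<repository>/<subdir>/" regardless of stray slashes.
        source = f"{repository}/{subdir.strip('/')}/"
        commands.append(["nexus3", "dl", source, destination])
    return commands


# Hypothetical repository and subdirectory names, for illustration only.
for cmd in build_download_commands("raw", ["a", "/b/c/"]):
    print(shlex.join(cmd))
    # To actually run each download: subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to review the commands before starting a long transfer.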

@bt-thiago
Contributor

Thanks for the bug report @pbsladek. I'm looking to reproduce this locally - how big is too big? I'm assuming 10,001 files.

I have a feeling that the directory iteration strategy might still fail for directories with a number of files above 10k.

@bt-thiago bt-thiago added the bug Something isn't working label Dec 6, 2019
@bt-thiago
Contributor

bt-thiago commented Dec 6, 2019

To reproduce:

Create a raw repository (shell):

nexus3 repository create raw raw

Upload 10,001 empty files (Python):

from nexuscli import nexus_client, nexus_config

config = nexus_config.NexusConfig()
config.load()
c = nexus_client.NexusClient(config)
for i in range(10001):
    c.upload('/dev/null', f'raw/a{i}')

Then download the repository (shell):

nexus3 dl raw/ .

@bt-thiago
Contributor

@pbsladek the error can happen on a single directory with more than 10k files, so the strategy of breaking up downloads per directory won't always work, although it would probably cover most cases.
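To check ahead of time whether a per-directory strategy would be enough, one could count files per directory and flag those above the limit. A minimal sketch, assuming a flat list of file paths is already available; `oversized_directories` and the sample paths are hypothetical:

```python
from collections import Counter
import posixpath

LIMIT = 10_000  # result window quoted in the error message


def oversized_directories(paths, limit=LIMIT):
    """Map each directory holding more than `limit` files to its file count."""
    counts = Counter(posixpath.dirname(path) for path in paths)
    return {directory: n for directory, n in counts.items() if n > limit}


# Hypothetical listing: one directory with 10,001 files, one with a single file.
paths = [f"big/a{i}" for i in range(10_001)] + ["small/x"]
print(oversized_directories(paths))
```

Any directory this reports would still need to be split further, for example by downloading its files in smaller batches.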

I might implement your suggestion but I'd like to think about this for a bit.

@pbsladek
Author

pbsladek commented Dec 6, 2019

Hey, no problem. Thanks for taking a look.

I ran into the same issue you mentioned on our repos. I generated a list of file names and downloaded them individually. It probably wouldn't make sense for the CLI to manage that, though.

@thiagofigueiro thiagofigueiro added this to the 3.1.0 milestone Feb 18, 2020
@forgondolin

Hi, could this issue also affect migrating repositories from a server A to a server B, for example when doing a full download of all your components for a migration?
Thanks

@thiagofigueiro
Owner

Hi, @forgondolin. Yes, you would probably see this in an operation like the one you describe. I just looked at the [upstream issue](https://issues.sonatype.org/browse/NEXUS-16917) and Sonatype won't be fixing this any time soon. It might help if everyone who sees this issue goes there and upvotes the bug.

Meanwhile, use the workaround that @pbsladek suggested: break up your downloads into chunks of up to 10,000 files.
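The chunking itself can be sketched as follows, assuming the full list of file paths has already been generated some other way. `chunked` is a hypothetical helper, and whether each batch stays under the server-side window depends on how the downloads are then issued:

```python
def chunked(items, size=10_000):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical listing of 25,000 file names, split into three batches.
file_names = [f"raw/a{i}" for i in range(25_000)]
for batch in chunked(file_names):
    print(len(batch), batch[0], batch[-1])
```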

I'm happy to review PR contributions for a workaround, but I'm unlikely to implement one myself.
