Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ls/ls-url: introduce --size #9854

Merged
merged 1 commit into from
Aug 17, 2023
Merged

ls/ls-url: introduce --size #9854

merged 1 commit into from
Aug 17, 2023

Conversation

efiop
Copy link
Contributor

@efiop efiop commented Aug 17, 2023

I really need it only for studio right now, but this is really nice to have in a daily life as well.

Caveat: we just throw whatever we have in size field, which means that for dvcfs, you might get a disk-usage size for a directory instead of its physical size that usually is completely useless (see your ls -la). I think that's acceptable for now, but we will need dvc du for du purposes in the future.

E.g.

$ dvc ls-url . --size
108    Metadata-and-Results.dvc
97     AudioMP3.dvc
97     AudioWAV.dvc
12181  README.md
54     .gitignore
168    .gitattributes
99     VideoFlash.dvc
312    LICENSE.txt
139    .dvcignore
96     AudioMP3
96     Asset
192    .dvc
64     Metadata-and-Results
64     AudioWAV
192    Scripts
64     VideoFlash
384    .git
96     .vscode
$ dvc list . --size
139    .dvcignore
168    .gitattributes
54     .gitignore
96     .vscode
96     Asset
96     AudioMP3
97     AudioMP3.dvc
64     AudioWAV
97     AudioWAV.dvc
312    LICENSE.txt
64     Metadata-and-Results
108    Metadata-and-Results.dvc
12181  README.md
192    Scripts
64     VideoFlash
99     VideoFlash.dvc

Related https://github.com/iterative/studio/pull/6541

entries = _prettify(entries, with_color=True)
ui.write("\n".join(entries))
entries = _prettify(entries, with_color=True, with_size=self.args.size)
ui.table(entries)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kudos to @skshetry for ui.table, feels really nice to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that ui.table() can be slow when we are printing a million lines. 😄
So we were using raw prints and "\n".join() to print everything at once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Good point. I guess we can keep that optimization in the simplest case. I'll take a look.

@skshetry
Copy link
Member

I thought we had --humanize but looks like not. :(

@skshetry
Copy link
Member

@efiop, do you mind humanizing the sizes by default?

@efiop
Copy link
Contributor Author

efiop commented Aug 17, 2023

@skshetry I wanted to avoid doing that, because I really need byte sizes at this point, and if we support both, then we'll need to support some kind of size formatting option. Do you have particualr suggestions that you'd like?

@skshetry
Copy link
Member

skshetry commented Aug 17, 2023

@skshetry I wanted to avoid doing that, because I really need byte sizes at this point, and if we support both, then we'll need to support some kind of size formatting option. Do you have particualr suggestions that you'd like?

I think it could be just --humanize or --human-readable similar to du -h output. We should not provide any other size formatting options.

@efiop
Copy link
Contributor Author

efiop commented Aug 17, 2023

@skshetry Actually, I don't need bytes in CLI, so we can indeed just humanize by default and stop there (maybe introduce some fmt option in the future if we need it, otherwise there is --json that has byte sizes). Thanks!

@skshetry
Copy link
Member

Btw, how are we getting sizes here? Would it be slow?

@efiop
Copy link
Contributor Author

efiop commented Aug 17, 2023

@skshetry We are getting them from info which is already there. In dvcfs case we just show what we have recorded in size field in dvcfile. And if there is none - we show nothing. No remote access or anything like that, plain and simple.

Btw, do you have a particular humanize library to recommend? I also thought we had it somewhere already, but indeed couldn't find it.

@skshetry
Copy link
Member

skshetry commented Aug 17, 2023

Btw, do you have a particular humanize library to recommend? I also thought we had it somewhere already, but indeed couldn't find it.

I don' know a library for this, but maybe something like this would be enough for us?
https://stackoverflow.com/a/1094933

@skshetry
Copy link
Member

skshetry commented Aug 17, 2023

@efiop, I just remembered that you can use tqdm as well.

from tqdm import tqdm

tqdm.format_sizeof(1000 * 1024, divisor=1024)

That's what I used in dvc-data du/count-objects.

https://github.com/iterative/dvc-data/blob/50c3f63b8b621e5194ccaae86268a5538d01fc46/src/dvc_data/cli.py#L352

@efiop efiop force-pushed the ls-sizes branch 2 times, most recently from e8b011e to bf1d0ee Compare August 17, 2023 17:16
@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Patch coverage: 60.00% and project coverage change: -0.02% ⚠️

Comparison is base (3081c60) 90.79% compared to head (9b4ee99) 90.77%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9854      +/-   ##
==========================================
- Coverage   90.79%   90.77%   -0.02%     
==========================================
  Files         471      471              
  Lines       35891    35900       +9     
  Branches     5185     5188       +3     
==========================================
+ Hits        32587    32588       +1     
- Misses       2713     2720       +7     
- Partials      591      592       +1     
Files Changed Coverage Δ
dvc/repo/ls.py 94.87% <ø> (ø)
dvc/repo/ls_url.py 100.00% <ø> (ø)
dvc/commands/ls/__init__.py 77.35% <42.85%> (-13.34%) ⬇️
dvc/commands/ls_url.py 95.00% <100.00%> (-0.24%) ⬇️
tests/func/test_ls.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@efiop efiop force-pushed the ls-sizes branch 3 times, most recently from f442dfa to bc53d96 Compare August 17, 2023 19:15
@efiop efiop added the feature is a feature label Aug 17, 2023
@efiop efiop self-assigned this Aug 17, 2023
if size is None:
size = ""
else:
size = tqdm.format_sizeof(size, divisor=1024)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to introduce some humanize util at some point when we'll need again.

I really need it only for studio right now, but this is really nice to have in a daily life
as well.

E.g.

```
$ dvc ls-url . --size
108    Metadata-and-Results.dvc
97     AudioMP3.dvc
97     AudioWAV.dvc
12181  README.md
54     .gitignore
168    .gitattributes
99     VideoFlash.dvc
312    LICENSE.txt
139    .dvcignore
96     AudioMP3
96     Asset
192    .dvc
64     Metadata-and-Results
64     AudioWAV
192    Scripts
64     VideoFlash
384    .git
96     .vscode
```

```
$ dvc list . --size
139    .dvcignore
168    .gitattributes
54     .gitignore
96     .vscode
96     Asset
96     AudioMP3
97     AudioMP3.dvc
64     AudioWAV
97     AudioWAV.dvc
312    LICENSE.txt
64     Metadata-and-Results
108    Metadata-and-Results.dvc
12181  README.md
192    Scripts
64     VideoFlash
99     VideoFlash.dvc
```
@efiop efiop merged commit 1723abc into iterative:main Aug 17, 2023
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature is a feature
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants