[1.0.3.1] Problem: Overwrite only if different in size not working correctly anymore #18
Comments
"Content-Length Filesize: -1" means the application was unable to get the file size from the server, so it cannot compare sizes. The default action in that case is to download the file anyway. Is the downloaded file the same as the existing one? If yes, I can add a post-download check.
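A minimal sketch of what such a post-download check could look like, assuming the download is first written to a temporary file. The function name `post_check` and the "discard"/"replace" return values are hypothetical and are not taken from the application's code.

```python
import filecmp
import os


def post_check(existing_path: str, downloaded_path: str) -> str:
    """Compare a freshly downloaded temp file with the file already on disk.

    Returns "discard" when both files are identical (no backup needed),
    otherwise "replace". Both labels are illustrative only.
    """
    if not os.path.exists(existing_path):
        return "replace"
    # Cheap test first: different sizes always mean different files.
    if os.path.getsize(existing_path) != os.path.getsize(downloaded_path):
        return "replace"
    # Same size: compare the actual bytes, not just the stat() data.
    if filecmp.cmp(existing_path, downloaded_path, shallow=False):
        return "discard"
    return "replace"
```

With a check like this, an unknown remote size still costs a download, but an identical file would no longer produce a backup copy.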
Yes, they are exactly the same. As I've said elsewhere, the website doesn't allow members to modify uploaded pictures (afaik), so avatars are the only case where a file can change. A post-check would do the job, I guess; it would probably need to redo the online filesize check if the returned value is 0 or less. Until this problem is addressed, the better option for me seems to be disabling overwriting completely, since I don't care about updated avatars.

FYI, from the way avatar changing works, you can see that an updated image gets a different filename on the server. The added "13xxxxxxxx" part is a timestamp in hex format. Even though the scheme is different for uploaded images, you can see it serves the same purpose: a timestamp. This means it's possible to find changed/updated files by comparing filenames rather than filesizes. If a user chose to use {serverFilename}, you could monitor changes this way without using the database. The scheme used on the server may change in the future, but it can still give valid results if used properly.

Another possibility is to write the actual modified time from the corresponding server response as the creation time of each downloaded file, but this may cause issues if the timezone on the user's PC changes after the file was saved. It may also be more complex to implement.
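The filename comparison suggested above can be sketched as follows. This is only an illustration: the helper names and the example.com URLs are made up, and the sketch treats the server filename as an opaque string rather than trying to parse the timestamp out of it.

```python
import posixpath
from urllib.parse import urlsplit


def server_filename(url: str) -> str:
    """Return the last path component of an image URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


def has_changed(known_server_filename: str, current_url: str) -> bool:
    """True when the server now serves the image under a different filename."""
    return known_server_filename != server_filename(current_url)


# Hypothetical URLs: the same image before and after an update.
old_url = "https://img.example.com/img/123_13aaaaaaaa.jpg"
new_url = "https://img.example.com/img/123_13bbbbbbbb.jpg?download=1"
print(has_changed(server_filename(old_url), new_url))
```

Because the names are compared as whole strings, this keeps working if the server changes its naming scheme, as long as an updated file still gets a new name.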
Thanks, I will keep an eye on the log with this version and overwriting enabled. While it's still running, I've noticed the following things:
So far it looks like it doesn't create any redundant dupes. By the way, I've noticed that my daily log size decreased from ~20 MB to ~7 MB after disabling overwriting. I think it would really help to switch from the filesize check to something else for regular images. Right now, if an image is changed/updated under the same image id, the app won't even detect it as already downloaded, since the image file will have a different name; it will be downloaded as a new file, and the old file under the same id will never be touched again.
For the file size check, I need to depend on the filename and the file size. Since the filename format is customizable, I cannot depend on the server filename, unless I save the image id to the DB (single images only, no manga), record the downloaded filesize (not done yet), and then refer to those for the check.
I forgot to mention that I was referring to the case where the user-defined filename contains {serverFilename}. Since it's not included by default, this would indeed need to use the DB.
I was trying to say that the filesize check is useless if you take serverFilename into account, since with the current scheme the filesize will always be the same for the same serverFilename.
DB updated: it now records both the filesize and the filename on the server (extracted from the URL). There is no detection logic yet. Technically, we can compare the URL/server filename stored in the DB with the actual URL parsed from the image page.
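One way the comparison against the DB could work, as a rough sketch using Python's built-in sqlite3 module. The table name, column names, and function names are assumptions made for illustration; they are not the application's actual schema or detection logic.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) image table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image ("
        "image_id INTEGER PRIMARY KEY, server_filename TEXT, filesize INTEGER)"
    )
    return conn


def record_download(conn, image_id: int, server_filename: str, filesize: int) -> None:
    """Store what was downloaded for this image id, replacing any older row."""
    conn.execute(
        "INSERT OR REPLACE INTO image (image_id, server_filename, filesize) "
        "VALUES (?, ?, ?)",
        (image_id, server_filename, filesize),
    )
    conn.commit()


def needs_download(conn, image_id: int, current_server_filename: str) -> bool:
    """True if the image is unknown or its filename on the server has changed."""
    row = conn.execute(
        "SELECT server_filename FROM image WHERE image_id = ?", (image_id,)
    ).fetchone()
    return row is None or row[0] != current_server_filename
```

Keying on the image id means the check does not depend on the user-defined local filename format.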
As of yesterday, April 13, 2014, the app can no longer detect the filesizes of many images, so it re-downloads and rewrites tons of files that are already downloaded. Up until yesterday I never had this problem.
Log file says:
Content-Length Filesize: -1 File Exists: [xxx].jpg, different size: 489199 vs -1, backing up to: [xxx].jpg.1397402216,98304
For some reason I can't reproduce this with a test batchjob.xml containing a single member job. It either has something to do with too many concurrent jobs (I haven't increased the number though, it's still 6) or with temporary server issues.
It would be great if you could add some kind of protection against incorrect server responses about filesizes.
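A sketch of the kind of protection meant here, assuming the size lookup can be retried and that an unknown size should never trigger an overwrite. Both function names are hypothetical, and `probe` stands in for whatever call the app uses to read the remote size.

```python
from typing import Callable, Optional


def probe_size(probe: Callable[[], int], attempts: int = 3) -> Optional[int]:
    """Call probe() up to `attempts` times; return the first positive size.

    Returns None when every attempt reports 0 or less (size unknown).
    """
    for _ in range(attempts):
        size = probe()
        if size > 0:
            return size
    return None


def should_overwrite(local_size: int, remote_size: Optional[int]) -> bool:
    """Overwrite only when the remote size is known and really differs."""
    if remote_size is None or remote_size <= 0:
        # Unknown remote size: keep the existing file (assumed policy).
        return False
    return local_size != remote_size
```

With this policy, a log entry like "different size: 489199 vs -1" would result in the existing file being kept instead of backed up and downloaded again.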
EDIT:
I've now deleted all dupes after some fiddling with logs and cmd.exe. There were 12018 dupes after 2 runs (yesterday and today). My current download folder contains ~21000 files, so as you can guess the app created dupes for over 50% of downloaded files.