dumpgenerator.py stuck with files requiring authentication #89

Open
GoogleCodeExporter opened this issue Apr 24, 2016 · 2 comments

Comments

@GoogleCodeExporter

Example: the http://qed.princeton.edu wiki. The script reaches the first file 
requiring authentication and asks for a username and password. Entering bogus 
credentials doesn't help: it just keeps prompting. I also have no idea how to 
deal with such a case, because the web server apparently returns HTTP 302, not 
HTTP 401. This needs some debugging/research on how to detect these cases and 
skip the files in question.

$ wget -S http://qed.princeton.edu/images/1/10/JR_HMG050409.tif
--2014-01-31 16:03:26--  http://qed.princeton.edu/images/1/10/JR_HMG050409.tif
Resolving qed.princeton.edu (qed.princeton.edu)... 128.112.131.223
Connecting to qed.princeton.edu (qed.princeton.edu)|128.112.131.223|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Found
  Date: Fri, 31 Jan 2014 15:03:26 GMT
  Server: Apache/2.2.3 (Oracle)
  Location: http://qed.princeton.edu/getfile.php?f=JR_HMG050409.tif
  Content-Length: 321
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1
Location: http://qed.princeton.edu/getfile.php?f=JR_HMG050409.tif [following]
--2014-01-31 16:03:26--  http://qed.princeton.edu/getfile.php?f=JR_HMG050409.tif
Reusing existing connection to qed.princeton.edu:80.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Found
  Date: Fri, 31 Jan 2014 15:03:26 GMT
  Server: Apache/2.2.3 (Oracle)
  X-Powered-By: PHP/5.1.6
  Location: /index.php/QED:Restricted_File
  Content-Length: 0
  Keep-Alive: timeout=15, max=99
  Connection: Keep-Alive
  Content-Type: text/html; charset=ISO-8859-1
Location: /index.php/QED:Restricted_File [following]
--2014-01-31 16:03:27--  http://qed.princeton.edu/index.php/QED:Restricted_File
Reusing existing connection to qed.princeton.edu:80.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 31 Jan 2014 15:03:27 GMT
  Server: Apache/2.2.3 (Oracle)
  X-Powered-By: PHP/5.1.6
  Content-language: en
  Vary: Accept-Encoding,Cookie
  X-Vary-Options: Cookie;string-contains=TigerWeb_tw_UserID;string-contains=TigerWeb_tw__session,Accept-Encoding;list-contains=gzip
  Expires: Thu, 01 Jan 1970 00:00:00 GMT
  Cache-Control: private, must-revalidate, max-age=0
  Last-modified: Wed, 27 Aug 2008 18:14:18 GMT
  Keep-Alive: timeout=15, max=98
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=utf-8
Length: unspecified [text/html]
Saving to: "JR_HMG050409.tif"
$ file JR_HMG050409.tif
JR_HMG050409.tif: HTML document, ASCII text, with very long lines
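
One possible way to detect this case, going only by the trace above: the request for a `.tif` ends on a different path and comes back as `text/html`. The sketch below is a heuristic under that assumption, not what dumpgenerator.py currently does; the function name `looks_like_restricted` is hypothetical.

```python
from urllib.parse import urlsplit


def looks_like_restricted(requested_url, final_url, content_type):
    """Heuristic: the download was redirected and returned an HTML page
    even though the requested file is not an HTML file."""
    requested_path = urlsplit(requested_url).path
    final_path = urlsplit(final_url).path

    # A redirect chain that ends on a different path than the one requested.
    redirected = requested_path != final_path

    # Compare only the media type, ignoring parameters such as charset.
    is_html = content_type.split(";")[0].strip().lower() == "text/html"

    # A wiki could legitimately host uploaded .html files; do not flag those.
    wanted_html = requested_path.lower().endswith((".html", ".htm"))

    return redirected and is_html and not wanted_html
```

With `urllib.request.urlopen`, the final URL and media type should be available from `response.geturl()` and `response.headers.get("Content-Type")`. A file flagged this way could be logged and skipped instead of being saved as a bogus image.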

Original issue reported on code.google.com by [email protected] on 31 Jan 2014 at 3:07

@GoogleCodeExporter

Original comment by [email protected] on 31 Jan 2014 at 3:08

  • Changed title: dumpgenerator.py stuck with files requiring authentication

@GoogleCodeExporter

That is:

Checking index.php... http://qed.princeton.edu/index.php
index.php is OK
Analysing http://qed.princeton.edu/index.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
2224 images were found in the directory from a previous session
Retrieving images from "MG©Colonial Distribution of the World 1914.jpg"
Enter username for access to restricted images (Princeton University netid) at 
qed.princeton.edu:
Enter password for  in access to restricted images (Princeton University netid) 
at qed.princeton.edu:
Enter username for access to restricted images (Princeton University netid) at 
qed.princeton.edu:
Enter password for  in access to restricted images (Princeton University netid) 
at qed.princeton.edu:
Enter username for access to restricted images (Princeton University netid) at 
qed.princeton.edu:
Enter password for  in access to restricted images (Princeton University netid) 
at qed.princeton.edu:

...
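
The prompt text above includes what looks like an HTTP authentication realm, which would suggest that a 401 challenge reaches the downloader at some point, although that is not confirmed here. Assuming so, a non-interactive fetch that treats 401/403 as "skip this file" would avoid the endless prompt. This is only a sketch: `fetch_or_skip` and its `open_url` parameter are hypothetical names, not part of dumpgenerator.py.

```python
import urllib.error
import urllib.request


def fetch_or_skip(url, open_url=urllib.request.urlopen):
    """Return the response body, or None if the server demands credentials.

    The default urlopen installs no authentication handler, so a 401 is
    raised as HTTPError instead of triggering an interactive prompt.
    """
    try:
        with open_url(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            # Restricted file: let the caller log it and move on.
            return None
        # Any other HTTP error is still reported to the caller.
        raise
```

The caller would check for `None`, record the file name as restricted, and continue with the next image.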

Original comment by [email protected] on 2 Feb 2014 at 10:48
