# dataminingchallenge

Before writing any code, I started by inspecting the requests made from this link, where I came across the URL from which the data was being fetched.

From that URL, I ran some experiments with the "limit" and "offset" parameters and found that a single request can return at most 400 items (limit=400). I also found that the response reports the total number of products matched by the query, which let me compute the number of pages beforehand: with 418 products and a limit of 400, that came to 2 pages.
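The page calculation described above can be sketched as follows. The function name and signature are my own; only the parameter names ("limit", "offset"), the 400-item cap, and the 418-product total come from the text:

```python
import math

def page_offsets(total_products: int, limit: int = 400) -> list[int]:
    """Compute the "offset" value for each request needed to cover
    all products, given the per-request "limit" cap of 400."""
    pages = math.ceil(total_products / limit)
    return [page * limit for page in range(pages)]

# With 418 products and limit=400 this yields two pages:
# offsets 0 and 400.
```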

Finally, I also inspected the robots.txt file, looking for blocked pages and especially for a crawl-delay directive. Since none was found, I did not apply any delay logic in my code.
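This robots.txt check can be reproduced with Python's standard-library robot parser; `crawl_delay()` returns `None` when no Crawl-delay directive is present, which is the situation described above. The robots.txt content below is a made-up example, not the site's actual file:

```python
from urllib import robotparser

# Made-up robots.txt content for illustration; the real file would be
# fetched from the site's /robots.txt.
robots_txt = """\
User-agent: *
Disallow: /checkout/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# No Crawl-delay directive -> None, so no delay needs to be applied.
print(rp.crawl_delay("*"))                  # None
print(rp.can_fetch("*", "/products"))       # True
print(rp.can_fetch("*", "/checkout/cart"))  # False
```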

I tested my code in Docker using the following commands (the volume mount writes the output to the host machine rather than keeping it inside the container):

```shell
docker build -t zalora_scraping .

docker run --rm -it -v %cd%:/code/ zalora_scraping
```


To run this project, simply execute the following command:

```shell
docker-compose up
```
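For reference, a minimal `docker-compose.yml` consistent with the commands above might look like the following. The service name and mount path are assumptions, not taken from the repository:

```yaml
# Hypothetical compose file; service name is an assumption.
services:
  scraper:
    build: .
    # Mirrors the -v %cd%:/code/ mount used in the docker run command,
    # so output files land on the host machine.
    volumes:
      - .:/code/
```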