
dataminingchallenge

Before writing any code, I started by inspecting the network requests from this link, where I came across the URL from which the data was being fetched.

From that URL, I ran some experiments with the "limit" and "offset" parameters and found that a single request can return at most 400 items (limit 400). The response also includes the total number of products matched by the query, which let me compute the number of pages in advance. In this case it was 2 pages, since there were 418 products at a limit of 400 per request.
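The page-count arithmetic described above can be sketched as follows. The parameter names (`limit`, `offset`) come from the description; the helper name is my own:

```python
import math

def plan_requests(total_products: int, limit: int = 400):
    """Compute the paged requests needed to cover all products.

    total_products comes from the product count in the first response;
    limit is the maximum page size the endpoint accepts (400 here).
    Returns one {"limit", "offset"} parameter dict per page.
    """
    pages = math.ceil(total_products / limit)
    return [{"limit": limit, "offset": page * limit} for page in range(pages)]

# 418 products at limit=400 -> 2 requests, with offsets 0 and 400
params = plan_requests(418)
```

Each dict can then be passed as query parameters to the data URL, one request per page.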

Finally, I also inspected the robots.txt file, looking for blocked pages and especially for a Crawl-delay directive. No crawl-delay was specified, so I did not apply any delay logic in my code.
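A check like this can be done with Python's standard-library `urllib.robotparser`; the robots.txt body and path below are illustrative, not the site's actual file:

```python
from urllib.robotparser import RobotFileParser

def check_robots(robots_txt: str, path: str, agent: str = "*"):
    """Parse a robots.txt body and report access and crawl-delay for a path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": parser.can_fetch(agent, path),
        # crawl_delay() returns None when no Crawl-delay directive exists --
        # the case that justified skipping any delay logic here
        "crawl_delay": parser.crawl_delay(agent),
    }

result = check_robots("User-agent: *\nDisallow: /private/", "/products")
```

In production the file would be fetched from the site (e.g. with `RobotFileParser.set_url(...)` and `read()`) rather than passed as a string.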

I tested my code in Docker using the following commands (these write the output to the host machine rather than keeping it inside the container; note that `%cd%` is Windows cmd syntax, so on Linux/macOS use `$(pwd)` instead):
docker build -t zalora_scraping .

docker run --rm -it -v %cd%:/code/ zalora_scraping


To run this project, simply execute the following command:

docker-compose up
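For `docker-compose up` to work, a compose file along these lines is assumed; the service name is a placeholder, while the image tag and mount path mirror the docker commands above, so the actual file in the repository may differ:

```yaml
# docker-compose.yml (sketch -- the repository's real file may differ)
services:
  scraper:
    build: .
    image: zalora_scraping
    # mount the project directory so scraped output lands on the host,
    # matching the -v %cd%:/code/ flag in the docker run command above
    volumes:
      - .:/code
```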
