
dataminingchallenge

Before writing any code, I started by inspecting the network requests from this link, where I came across the URL from which the data was being fetched.

From that URL, I ran some experiments with the "limit" and "offset" parameters and found that a single request can return at most 400 items (limit 400). The response also includes the total number of products matched by the query, which let me compute the number of pages in advance. In this case it was 2 pages, since there were 418 products at a limit of 400 per request.
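The page-count arithmetic described above can be sketched as follows. The parameter names (`limit`, `offset`) come from the description; the helper name is my own:

```python
import math

def plan_requests(total_products: int, limit: int = 400):
    """Compute the paged requests needed to cover all products.

    total_products comes from the product count in the first response;
    limit is the maximum page size the endpoint accepts (400 here).
    Returns one {"limit", "offset"} parameter dict per page.
    """
    pages = math.ceil(total_products / limit)
    return [{"limit": limit, "offset": page * limit} for page in range(pages)]

# 418 products at limit=400 -> 2 requests, with offsets 0 and 400
params = plan_requests(418)
```

Each dict can then be passed as query parameters to the data URL, one request per page.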

Finally, I also inspected the robots.txt file, looking for blocked pages and especially for a Crawl-delay directive. No crawl-delay was specified, so I did not apply any delay logic in my code.
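A check like this can be done with Python's standard-library `urllib.robotparser`; the robots.txt body and path below are illustrative, not the site's actual file:

```python
from urllib.robotparser import RobotFileParser

def check_robots(robots_txt: str, path: str, agent: str = "*"):
    """Parse a robots.txt body and report access and crawl-delay for a path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": parser.can_fetch(agent, path),
        # crawl_delay() returns None when no Crawl-delay directive exists --
        # the case that justified skipping any delay logic here
        "crawl_delay": parser.crawl_delay(agent),
    }

result = check_robots("User-agent: *\nDisallow: /private/", "/products")
```

In production the file would be fetched from the site (e.g. with `RobotFileParser.set_url(...)` and `read()`) rather than passed as a string.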

I tested my code in Docker using the following commands (these write the output to the host machine rather than keeping it inside the container; note that `%cd%` is Windows cmd syntax, so on Linux/macOS use `$(pwd)` instead):
docker build -t zalora_scraping .

docker run --rm -it -v %cd%:/code/ zalora_scraping


To run this project, simply execute the following command:

docker-compose up
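For `docker-compose up` to work, a compose file along these lines is assumed; the service name is a placeholder, while the image tag and mount path mirror the docker commands above, so the actual file in the repository may differ:

```yaml
# docker-compose.yml (sketch -- the repository's real file may differ)
services:
  scraper:
    build: .
    image: zalora_scraping
    # mount the project directory so scraped output lands on the host,
    # matching the -v %cd%:/code/ flag in the docker run command above
    volumes:
      - .:/code
```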
