nodejs: add a nodejs version (with: future, stream, ...).
Lionel Atty committed Jan 20, 2020
1 parent e28bff9 commit 7af6fd4
Showing 5 changed files with 213 additions and 11 deletions.
109 changes: 109 additions & 0 deletions .gitignore
@@ -1,5 +1,114 @@
cats/

# https://raw.githubusercontent.com/github/gitignore/master/Node.gitignore
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*

# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json

# Runtime data
pids
*.pid
*.seed
*.pid.lock

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage
*.lcov

# nyc test coverage
.nyc_output

# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# Bower dependency directory (https://bower.io/)
bower_components

# node-waf configuration
.lock-wscript

# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release

# Dependency directories
node_modules/
jspm_packages/

# TypeScript v1 declaration files
typings/

# TypeScript cache
*.tsbuildinfo

# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache

# Microbundle cache
.rpt2_cache/
.rts2_cache_cjs/
.rts2_cache_es/
.rts2_cache_umd/

# Optional REPL history
.node_repl_history

# Output of 'npm pack'
*.tgz

# Yarn Integrity file
.yarn-integrity

# dotenv environment variables file
.env
.env.test

# parcel-bundler cache (https://parceljs.org/)
.cache

# Next.js build output
.next

# Nuxt.js build / generate output
.nuxt
dist

# Gatsby files
.cache/
# Comment in the public line in if your project uses Gatsby and not Next.js
# https://nextjs.org/blog/next-9-1#public-directory-support
# public

# vuepress build output
.vuepress/dist

# Serverless directories
.serverless/

# FuseBox cache
.fusebox/

# DynamoDB Local files
.dynamodb/

# TernJS port file
.tern-port

# Stores VSCode versions used for testing VSCode extensions
.vscode-test

# Created by https://www.gitignore.io/api/python
# Edit at https://www.gitignore.io/?templates=python

3 changes: 3 additions & 0 deletions Makefile
@@ -36,6 +36,9 @@ image_downloader_mp: clean-img ## download images with multiprocessing version
	@python $(PYTHON_ROOTDIR)/multiprocessing/image_downloader.py \
		$(URL_IMG)

nodejs_image_downloader: clean-img ## download images with node-js version
	@node src/nodejs/image_downloader.js

clean: clean-img clean-pyc ## remove all venv, build, coverage and Python artifacts

img-export-dir: ## create images export directory
60 changes: 49 additions & 11 deletions README.md
@@ -81,30 +81,31 @@ Command being timed: "make image_downloader_mp"
Page size (bytes): 4096
```

```sh
```bash
╰─ /usr/bin/time -v make image_downloader_aio
find cats -name '*.jpg' -exec rm -f {} +
Nb url images: 1183
Downloading: https://cdn.pixabay.com/photo/2017/06/12/19/02/cat-2396473__480.jpg
[...]
Download complete: https://cdn.pixabay.com/photo/2014/10/29/22/12/cat-508665__480.jpg
Command being timed: "make image_downloader_aio"
User time (seconds): 3.95
System time (seconds): 0.98
Percent of CPU this job got: 34%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.51
Command being timed: "make image_downloader_aio"
User time (seconds): 6.26
System time (seconds): 1.43
Percent of CPU this job got: 54%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.24
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 59300
Maximum resident set size (kbytes): 60476
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 10
Minor (reclaiming a frame) page faults: 14560
Voluntary context switches: 37895
Involuntary context switches: 653
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 21267
Voluntary context switches: 38882
Involuntary context switches: 185
Swaps: 0
File system inputs: 7096
File system inputs: 0
File system outputs: 132632
Socket messages sent: 0
Socket messages received: 0
@@ -113,6 +114,43 @@ Command being timed: "make image_downloader_aio"
Exit status: 0
```

```bash
╰─ /usr/bin/time -v make nodejs_image_downloader
find cats -name '*.jpg' -exec rm -f {} +
0
1
2
3
[...]
52
51
50
End.
Command being timed: "make nodejs_image_downloader"
User time (seconds): 6.77
System time (seconds): 2.02
Percent of CPU this job got: 54%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.19
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 95616
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 23830
Voluntary context switches: 35553
Involuntary context switches: 192
Swaps: 0
File system inputs: 0
File system outputs: 132936
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

### Images folder sample

![cats](https://snipboard.io/VzlD78.jpg)
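
The two wall-clock figures in the README output above can be compared directly. The following is a minimal sketch, assuming a hypothetical helper named `parseElapsed`, that converts the `h:mm:ss or m:ss` value printed by `/usr/bin/time -v` into seconds and computes the relative difference. Both timings come from a single run each, so the result is indicative at best.

```javascript
// parseElapsed is an illustrative helper (not part of the commit). It turns
// the "h:mm:ss" or "m:ss.cc" value from `/usr/bin/time -v` into seconds.
function parseElapsed(value) {
  return value
    .split(':')
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Elapsed times copied from the README output above.
const aio = parseElapsed('0:14.24');    // make image_downloader_aio
const nodejs = parseElapsed('0:16.19'); // make nodejs_image_downloader

// Relative slowdown of the Node.js run compared with the aio run.
const slowdownPercent = (nodejs / aio - 1) * 100;
console.log(`aio: ${aio}s, nodejs: ${nodejs}s, +${slowdownPercent.toFixed(1)}%`);
```

On these numbers the Node.js run took roughly 14% longer, with a larger maximum resident set size (95616 kB against 60476 kB).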
36 changes: 36 additions & 0 deletions src/nodejs/image_downloader.js
@@ -0,0 +1,36 @@
const request = require('request').defaults({
  pool: {maxSockets: Infinity},
  timeout: 10 * 1000, // milliseconds
});
const bluebird = require('bluebird');
const fs = require('fs-extra');
const CONCURRENCY_LEVEL = 50;

// Download one URL to cats/<index>.jpg. Errors are logged and swallowed so
// that a single failed download does not abort the whole batch.
async function download(url, index) {
  console.log(index);
  if (!url) {
    // Skip empty entries, such as the one produced by a trailing newline.
    return Promise.resolve();
  }
  return new Promise((resolve, reject) => {
    const r = request(url);
    const f = fs.createWriteStream(`cats/${index}.jpg`);
    r.on('error', reject);
    r.pipe(f);
    f.on('finish', resolve);
    f.on('error', reject);
  })
    .catch((err) => {
      console.log(`Skipping error ${err.message} for ${index}`)
    })
}

// Read one URL per line from cats.txt and run at most CONCURRENCY_LEVEL
// downloads at a time.
async function main() {
  const catsBuffer = await fs.readFile('cats.txt');
  const cats = catsBuffer.toString().split('\n');
  await bluebird.map(cats, download, {concurrency: CONCURRENCY_LEVEL})
}

main()
  .then(() => console.log('End.'))
  .catch((err) => console.log(`Error: ${err.toString()}\n${err.stack}`))
;
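
The `download` function above wires `pipe`, `finish` and `error` handlers by hand. As a possible alternative, here is a minimal sketch using Node's built-in `stream.pipeline`, which reports the first error from any stream in the chain and destroys the others. The names `saveStream` and `demo` are hypothetical, and an in-memory stream stands in for the HTTP response so the example runs without network access. This is not what the commit does.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline, Readable } = require('stream');
const { promisify } = require('util');

// The promisified pipeline resolves once the last stream has finished and
// rejects on the first error raised by any stream in the chain.
const pipelineAsync = promisify(pipeline);

// saveStream is a hypothetical helper: `source` is any readable stream
// (an HTTP response body, for example) and `destination` is a file path.
async function saveStream(source, destination) {
  await pipelineAsync(source, fs.createWriteStream(destination));
}

// demo writes a small in-memory stream to a temporary file and reads it back.
async function demo() {
  const target = path.join(os.tmpdir(), `pipeline-demo-${process.pid}.bin`);
  await saveStream(Readable.from([Buffer.from('fake image bytes')]), target);
  return fs.readFileSync(target, 'utf8');
}
```

Calling `demo().then(console.log)` prints the bytes that were written. In a downloader, the source would be the response stream and a rejected promise would replace the separate `error` listeners.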
16 changes: 16 additions & 0 deletions src/nodejs/package.json
@@ -0,0 +1,16 @@
{
  "name": "image_downloader",
  "version": "1.0.0",
  "description": "Images Downloader",
  "main": "image_downloader.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "bluebird": "^3.7.2",
    "fs-extra": "^8.1.0",
    "request": "^2.88.0"
  }
}
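
In this commit `bluebird` is only used for `bluebird.map` with a `concurrency` option. For illustration, a bounded-concurrency map can be sketched with plain promises. The function name `mapWithConcurrency` is hypothetical, and this is a simplified stand-in rather than a reimplementation of bluebird's behaviour.

```javascript
// mapWithConcurrency is a hypothetical stand-in for
// bluebird.map(items, worker, {concurrency}).
async function mapWithConcurrency(items, worker, concurrency) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner pulls items until none are left, so at most
  // `concurrency` workers are in flight at any moment.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}

// demo doubles six numbers with two workers at a time and records the
// highest number of workers that were active simultaneously.
async function demo() {
  let active = 0;
  let peak = 0;
  const doubled = await mapWithConcurrency([1, 2, 3, 4, 5, 6], async (n) => {
    active++;
    peak = Math.max(peak, active);
    await new Promise((resolve) => setTimeout(resolve, 10));
    active--;
    return n * 2;
  }, 2);
  return { doubled, peak };
}
```

Results keep the order of the input array. If a worker rejects, the returned promise rejects as well, which is why a downloader built on this would still catch errors per item, as `download` does.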
