PTT Image Downloader (Python) for Windows and Linux

An image crawler for the web version of PTT

Demo Video NEW - Windows - 2017/4/23 update
Demo Video - Linux
Demo Video - Windows

Tutorial

2018/12/18

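The code has been refactored into an object-oriented design. For the previous version, see commit 567482ba6e.

First, make sure Python 3.6.6 is installed on your machine.

Next, install the required packages.

Change to the project directory, then run the following in your command prompt (cmd):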

pip install -r requirements.txt

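The installation should generally complete without problems.

Features

Downloads images from PTT posts (including images linked in comments)
Lets you choose the board to crawl and the minimum number of pushes a post must have

Output format

Each folder is named after the post title followed by its push count, and contains that post's images.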
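As a rough illustration of this layout, the sketch below builds a folder name from a title and a push count. The function names, the underscore separator, and the set of stripped characters are assumptions made for the example, not taken from the project's code.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names (assumed rule for this sketch).
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def folder_name(title, push_count):
    """Return '<title>_<push_count>' with unsafe characters removed (hypothetical format)."""
    safe_title = _INVALID_CHARS.sub("", title).strip()
    return f"{safe_title}_{push_count}"


def make_post_folder(base_dir, title, push_count):
    """Create the folder for one post under base_dir and return its path."""
    folder = Path(base_dir) / folder_name(title, push_count)
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Stripping the characters that Windows rejects keeps the same name usable on both Windows and Linux.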

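Performance optimization

Python offers both Multiprocessing and Threading. A simple rule of thumb for choosing between them:

For CPU-bound work (heavy computation), use Multiprocessing.

For I/O-bound work (lots of waiting on I/O), use Threading.

Optimizing performance with concurrent.futures

This project downloads a large number of images, so Threading is the right fit, yet earlier versions used Multiprocessing.

With a large number of downloads, the speed difference can reach a factor of two. For I/O-bound workloads like this one, Threading is the right choice.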
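The effect can be sketched without touching the network. In the example below, `fake_download` is a hypothetical stand-in that sleeps instead of fetching an image, and the URLs are placeholders; it is not the project's download code, only an illustration of running I/O-bound tasks through a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Stand-in for an image download: the sleep simulates waiting on the network."""
    time.sleep(0.1)
    return url


def download_all(urls, max_workers=8):
    """Run fake_download over all URLs in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_download, urls))


def compare(urls):
    """Print the wall-clock time of a sequential loop versus the thread pool."""
    start = time.perf_counter()
    for url in urls:
        fake_download(url)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    download_all(urls)
    print(f"thread pool: {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    compare([f"https://example.com/{i}.jpg" for i in range(8)])
```

Because each task spends its time waiting rather than computing, the threads overlap their waits and the pool finishes much sooner than the sequential loop.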

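Python 3.5 or later is recommended, because max_workers can then be left unspecified. In Python 3.5 through 3.7 the default is the number of CPUs multiplied by 5; Python 3.8 changed the default to min(32, os.cpu_count() + 4). The signature is: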

concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')

Reference: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor
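For reference, the documented default can be written out as a small helper. `default_thread_workers` is a hypothetical name used only here; it mirrors the rules stated in the Python documentation (CPUs multiplied by 5 for Python 3.5 to 3.7, and min(32, CPUs + 4) from Python 3.8) and does not read the value from an executor. Python 3.13 counts CPUs with os.process_cpu_count() rather than os.cpu_count(), which this sketch does not model.

```python
import os
import sys


def default_thread_workers(cpu_count=None, version=None):
    """Return the documented default max_workers of ThreadPoolExecutor.

    cpu_count and version can be passed explicitly to see what another
    machine or Python version would use.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    ver = version if version is not None else sys.version_info[:2]
    if ver >= (3, 8):
        # Python 3.8+: capped so that many-core machines do not spawn too many threads.
        return min(32, cpus + 4)
    # Python 3.5 to 3.7: number of processors multiplied by 5.
    return cpus * 5


if __name__ == "__main__":
    print(default_thread_workers())
```

On a 4-core machine this gives 20 workers under Python 3.6 and 8 under Python 3.8, so passing max_workers explicitly is worth considering if a consistent level of concurrency matters.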

Usage

Method 1 (download images from a specified board)

python beauty_spider2.py [board name] [number of pages to crawl] [minimum number of pushes]
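The three positional arguments could be read along the following lines. This is a minimal sketch using argparse; the argument names and the `parse_args` helper are assumptions for illustration, and the actual script may parse its command line differently.

```python
import argparse


def parse_args(argv=None):
    """Parse board name, page count, and push threshold (hypothetical argument names)."""
    parser = argparse.ArgumentParser(
        description="Download images from a PTT board (illustrative sketch)."
    )
    parser.add_argument("board", help="board name, e.g. beauty or AKB48")
    parser.add_argument("pages", type=int, help="number of index pages to crawl")
    parser.add_argument(
        "min_pushes", type=int, help="only keep posts with at least this many pushes"
    )
    return parser.parse_args(argv)


# Passing a list makes the parser easy to try without a real command line.
args = parse_args(["beauty", "3", "10"])
print(args.board, args.pages, args.min_pushes)
```

Declaring the numeric arguments with type=int means a non-numeric page count or threshold is rejected with a usage message before any crawling starts.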

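Method 2 (download images from specified URLs)

python download_beauty.py [input file.txt]

The crawler works against the web version of PTT, so pages are counted as they appear in the web version.

See: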

https://www.ptt.cc/bbs/AKB48/index.html
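A board's index URL follows the pattern shown above, so it can be built from the board name. In this sketch, `board_index_url` is a hypothetical helper, and the numbered form `index<N>.html` for older pages is an assumption about the site's URL scheme rather than something stated in this README.

```python
PTT_URL = "https://www.ptt.cc"


def board_index_url(board, page=None):
    """Return the web URL of a board's index page.

    With page=None this is the latest index page; a page number yields
    index<N>.html (assumed scheme for older pages).
    """
    suffix = "" if page is None else str(page)
    return f"{PTT_URL}/bbs/{board}/index{suffix}.html"


print(board_index_url("AKB48"))
```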

Examples

Example 1 (download images from a specified board)

python beauty_spider2.py beauty 3 10

Crawls 3 pages of posts on the PTT beauty board, then downloads images only from posts with a push count >= 10.
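The push-count threshold amounts to a simple filter over the crawled posts. The sketch below assumes each post is a dict with a `push` label as text, and that the web page shows "爆" for 100 or more pushes, an "X" prefix for net-negative posts, and an empty label for zero; these label conventions and all names here are assumptions for illustration.

```python
def parse_push_count(label):
    """Convert a push label from the index page into an integer (assumed conventions)."""
    label = label.strip()
    if not label:
        return 0  # no label: treated as zero pushes
    if label == "爆":
        return 100  # assumed to mean 100 or more pushes
    if label.startswith("X"):
        return -1  # assumed net-negative; the exact value is not needed for filtering
    try:
        return int(label)
    except ValueError:
        return 0  # unknown label: treat as zero rather than fail


def filter_posts(posts, min_pushes):
    """Keep only posts whose push count is at least min_pushes."""
    return [post for post in posts if parse_push_count(post["push"]) >= min_pushes]


posts = [
    {"title": "a", "push": "12"},
    {"title": "b", "push": "3"},
    {"title": "c", "push": "爆"},
    {"title": "d", "push": ""},
    {"title": "e", "push": "X1"},
]
print([post["title"] for post in filter_posts(posts, 10)])
```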

Execution screenshot - 1

Output screenshot - 1

You can also specify other boards, for example:

python beauty_spider2.py AKB48 3 10

Example 2 (download images from specified URLs)

python download_beauty.py input.txt

Downloads the images from the PTT post links listed in input.txt (see the input.txt file in the repository).
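Reading such a file might look like the sketch below. It assumes input.txt holds one PTT post URL per line, which this README does not spell out, and the helper names are hypothetical.

```python
PTT_POST_PREFIX = "https://www.ptt.cc/bbs/"


def extract_post_urls(lines):
    """Return the PTT post URLs found in an iterable of lines, skipping anything else."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line.startswith(PTT_POST_PREFIX)]


def read_post_urls(path):
    """Read post URLs from a text file, assuming one URL per line."""
    with open(path, encoding="utf-8") as handle:
        return extract_post_urls(handle)
```

Filtering by prefix means blank lines and stray text in the file are ignored rather than passed on to the downloader.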

Execution screenshot - 2

Output screenshot - 2

Environment

Python 3.6.6

Donation

If this project helped you and you would like to encourage me, feel free to buy me a coffee :laughing:

Sponsor payment

License

MIT license
