traspider


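Introduction

traspider is a lightweight crawler framework that works out of the box.

If you need to write a small crawler, traspider lets you get it done with far less effort.

GitHub: https://github.com/Ntrashh/traspider

Documentation: https://ntrashh.github.io/traspider/

Requirements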

  • Python 3.7.0+
  • Works on Linux, Windows, macOS
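
The Python requirement above can be checked from a script before installing. This is a minimal sketch using only the standard library; `meets_requirement` is a made-up helper name and is not part of traspider.

```python
import sys

MINIMUM = (3, 7, 0)  # minimum version taken from the requirement listed above


def meets_requirement(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is >= minimum."""
    if version is None:
        version = sys.version_info[:3]
    # Tuples compare element by element: (3, 7, 0) >= (3, 7, 0) is True.
    return tuple(version[:3]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_requirement():
        raise SystemExit("Python 3.7.0 or newer is required")
```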

Installation

pip3 install traspider

Usage

Create a spider

traspider create -s demo_spider

This generates the code below. Add the URL you want to crawl, here http://httpbin.org/:

from loguru import logger
from traspider import Spider


class DemoSpider(Spider):

    def __init__(self):
        # URLs the spider will crawl
        self.urls = ["http://httpbin.org/"]

    def parser(self, response, request):
        # Handle a response together with the request that produced it
        logger.info(response)

    async def download_middleware(self, request):
        # Set the request headers, then return the modified request
        request.headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
        }
        return request


if __name__ == "__main__":
    demo_spider = DemoSpider()
    demo_spider.start()
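
To make the roles of the three hooks in the generated code easier to see, here is a standalone sketch of one plausible flow: each URL becomes a request, `download_middleware` adjusts it, a download step produces a response, and `parser` receives both. This is an assumption about the general pattern, not traspider's actual implementation. `SketchRequest`, `SketchSpider`, `fake_download`, and the `sketch-agent/0.1` header value are hypothetical names, and no network access takes place.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchRequest:
    """Hypothetical stand-in for a request object; not traspider's class."""
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


class SketchSpider:
    """Toy spider that reuses the hook names from the generated code."""

    def __init__(self, urls: List[str]):
        self.urls = urls
        self.results: List[str] = []

    async def download_middleware(self, request: SketchRequest) -> SketchRequest:
        # Runs before the download: attach headers, then hand the request back.
        request.headers = {"user-agent": "sketch-agent/0.1"}
        return request

    async def fake_download(self, request: SketchRequest) -> str:
        # Placeholder for a real HTTP call; it only yields to the event loop.
        await asyncio.sleep(0)
        return f"fetched {request.url} as {request.headers['user-agent']}"

    def parser(self, response: str, request: SketchRequest) -> None:
        # Receives the response together with the request that produced it.
        self.results.append(response)

    async def _crawl_one(self, url: str) -> None:
        request = await self.download_middleware(SketchRequest(url))
        response = await self.fake_download(request)
        self.parser(response, request)

    def start(self) -> None:
        async def _run_all() -> None:
            # Crawl all URLs concurrently on a single event loop.
            await asyncio.gather(*(self._crawl_one(u) for u in self.urls))

        asyncio.run(_run_all())


if __name__ == "__main__":
    spider = SketchSpider(["http://httpbin.org/"])
    spider.start()
    print(spider.results)
```

Because the middleware returns the request it was given, any change made there (headers in this sketch) is visible to the download step and later to `parser`.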

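traspider was started to make simple crawling projects lighter and faster to develop, so its support for large projects is still limited. If you are building a large crawler, feapder or scrapy is recommended instead.

If you have any questions or suggestions about traspider, feel free to contact me.

WeChat: [image: wechat]

Email: [email protected]

Acknowledgements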

hoopa

feapder

scrapy

huangjin

PyCharm

About

A lightweight asynchronous crawler framework that works out of the box
