
amoblin/Obsidian


Obsidian - Web Page Crawler Driven by JSON Config

Built on Scrapy. Instead of writing spider code, you just write a JSON config like this:

{
    "name": "myspider",
    "prefix": "https://www.specialized.com/",
    "allowed_domains": ["specialized.com"],
    "start_urls": ["https://www.specialized.com/cn/zh/115443"],
    "entry": "",
    "link_array_pipline": [],
    "main_content_pipline": [{"type": "css", "value": ".js-body"}],
    "item_pipline": {
        "image": [{
            "type": "css", "value": ".carousel-hero__slide-image.js-lazy-slide-image::attr(data-src)"
        }]
    }
}
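To make the role of each key more concrete, here is a minimal, hypothetical sketch of how a config like the one above could be loaded and walked in Python. It is not Obsidian's own code. The class and function names (`Step`, `SpiderConfig`, `load_config`, `run_steps`, `extract_item`) are invented for illustration, and the sketch assumes two things: the steps in a `*_pipline` list are applied in order, each one narrowing the previous result, and every `item_pipline` entry is evaluated inside the node selected by `main_content_pipline`. The selector engine is passed in as `apply_step`. In a real Scrapy spider it could be something like `lambda sel, s: sel.css(s.value)` on a Scrapy `Selector`, which understands `::attr(...)` natively. `prefix` and `entry` are loaded but not used here.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    type: str   # selector kind, e.g. "css"
    value: str  # selector expression, e.g. ".js-body"


@dataclass
class SpiderConfig:
    # Field names mirror the JSON keys, including the "pipline" spelling.
    name: str
    prefix: str
    allowed_domains: List[str]
    start_urls: List[str]
    entry: str = ""
    link_array_pipline: List[Step] = field(default_factory=list)
    main_content_pipline: List[Step] = field(default_factory=list)
    item_pipline: Dict[str, List[Step]] = field(default_factory=dict)


def _steps(raw: List[dict]) -> List[Step]:
    """Turn a list of {"type": ..., "value": ...} dicts into Step objects."""
    return [Step(s["type"], s["value"]) for s in raw]


def load_config(text: str) -> SpiderConfig:
    """Parse a JSON config string; optional keys fall back to empty values."""
    data = json.loads(text)
    return SpiderConfig(
        name=data["name"],
        prefix=data["prefix"],
        allowed_domains=data["allowed_domains"],
        start_urls=data["start_urls"],
        entry=data.get("entry", ""),
        link_array_pipline=_steps(data.get("link_array_pipline", [])),
        main_content_pipline=_steps(data.get("main_content_pipline", [])),
        item_pipline={key: _steps(steps)
                      for key, steps in data.get("item_pipline", {}).items()},
    )


ApplyStep = Callable[[Any, Step], Any]


def run_steps(node: Any, steps: List[Step], apply_step: ApplyStep) -> Any:
    """Apply the steps in order, feeding each result into the next step."""
    for step in steps:
        node = apply_step(node, step)
    return node


def extract_item(page: Any, cfg: SpiderConfig, apply_step: ApplyStep) -> Dict[str, Any]:
    """Narrow to the main content, then evaluate every item field inside it."""
    main = run_steps(page, cfg.main_content_pipline, apply_step)
    return {key: run_steps(main, steps, apply_step)
            for key, steps in cfg.item_pipline.items()}
```

Because the selector engine is injected, the chaining logic can be checked without a network or Scrapy. A stand-in such as `lambda node, step: node + [step.value]` simply records which selectors would run, and in what order, for each item field.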

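No development is required: if you know how to use your browser's "Inspect Element" tool, you can write a JSON config file and scrape specific data from web pages.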

For use cases that work out of the box today, see: https://github.com/amoblin/xiaozhua

About

Web page data picker driven by JSON config
