This repository contains practical web scraping examples using Playwright in both Python and Node.js. It’s organized to help you learn everything — from basics to advanced techniques.
- Python 3.7+ or Node.js 18+
- Playwright installed:

Python:

```bash
pip install playwright
playwright install
```

Node.js:

```bash
npm install playwright
```
The project is organized into two main folders: `Python/` for the Python scripts and `NodeJS/` for the Node.js scripts. Both have identical folder structures and provide the same functionality, implemented in the respective language.
```
.
├── Python/
│   ├── basics/
│   │   ├── launch_browser.py
│   │   ├── headless_vs_headful.py
│   │   └── open_multiple_tabs.py
│   ├── scraping/
│   │   ├── extract_text_title.py
│   │   ├── extract_links.py
│   │   ├── extract_images.py
│   │   ├── scrape_shadow_dom.py
│   │   ├── wait_for_element.py
│   │   ├── scrape_products_amazon.py
│   │   └── scrape_woocommerce.py
│   ├── selectors/
│   │   ├── select_by_css.py
│   │   ├── select_by_xpath.py
│   │   ├── select_by_role.py
│   │   └── select_by_text.py
│   ├── interactions/
│   │   ├── click_button.py
│   │   ├── fill_form.py
│   │   ├── select_dropdown.py
│   │   ├── hover_element.py
│   │   ├── click_pagination.py
│   │   └── infinite_scroll.py
│   ├── save_data/
│   │   ├── save_json.py
│   │   ├── save_csv.py
│   │   ├── save_pdf.py
│   │   ├── download_files.py
│   │   └── screenshot_element.py
│   ├── auth/
│   │   ├── basic_auth.py
│   │   ├── save_cookies.py
│   │   └── reuse_cookies.py
│   ├── browser/
│   │   ├── set_user_agent.py
│   │   ├── use_proxy.py
│   │   └── emulate_device.py
│   ├── errors/
│   │   └── retry_failed_requests.py
│   └── debug/
│       ├── record_video.py
│       ├── record_trace.py
│       ├── pause_script.py
│       └── debug_console.py
└── NodeJS/
    └── [Same folder structure with .js equivalents]
```
Each script demonstrates a specific feature of Playwright and can be run independently.
This section has basic scripts that show how to launch a browser with Playwright, switch between headless and headful modes, and open multiple tabs in the same browser.
The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| Launch browser | `browser = await playwright.chromium.launch()` | `const browser = await chromium.launch();` |
| Headful mode (visible browser) | `launch(headless=False)` | `launch({ headless: false })` |
| Open multiple tabs | `context.new_page()` | `context.newPage()` |
You can check out the full scripts in the project folder.
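For example, here is a minimal async Python sketch that launches Chromium in headful mode and opens two tabs in the same browser context (the URLs are placeholders):

```python
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)  # headful: a window opens
        context = await browser.new_context()
        page1 = await context.new_page()   # first tab
        await page1.goto("https://example.com")
        page2 = await context.new_page()   # second tab in the same context
        await page2.goto("https://example.org")
        await browser.close()

asyncio.run(main())
```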
This section contains scripts for extracting text, links, images, and working with complex elements like Shadow DOM or delayed content.
| Description | Python | Node.js |
| --- | --- | --- |
| Get page title | `title = await page.title()` | `const title = await page.title();` |
| Extract links | `page.eval_on_selector_all("a", "...")` | `page.$$eval('a', ...)` |
| Get image URLs | `img.get_attribute("src")` | `img.getAttribute("src")` |
| Scrape Shadow DOM | `page.locator('css=shadow-root-selector')` | `page.locator('css=shadow-root-selector')` |
| Wait for element | `await page.wait_for_selector(".class")` | `await page.waitForSelector('.class')` |
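As a quick illustration, this sketch combines the title, link, and image commands from the table above on a placeholder page:

```python
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")  # placeholder URL
        title = await page.title()
        # Map every <a> to its href in a single evaluate call
        links = await page.eval_on_selector_all("a", "els => els.map(el => el.href)")
        # Same pattern for image sources
        images = await page.eval_on_selector_all("img", "els => els.map(el => el.src)")
        print(title, links, images)
        await browser.close()

asyncio.run(main())
```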
This part of the project includes two ready-to-use scrapers implemented in both Python and Node.js:
- `scrape_products_amazon.py` / `scrape_products_amazon.js`
- `scrape_woocommerce.py` / `scrape_woocommerce.js`
If you want to learn how to build similar scrapers step by step, check out the detailed guides:
Alternatively, you can use the no-code scrapers and APIs to quickly extract structured data from Amazon:
- Amazon Search Results
- Amazon Product Info
- Amazon Reviews
- Amazon Bestsellers
- Amazon Customer FAQs
- Amazon Price Tracker
This section demonstrates different ways to select elements on a page using CSS selectors, XPath, roles, and text content.
The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| CSS selector | `page.locator("div > span")` | `page.locator("div > span")` |
| XPath selector | `page.locator('//h1')` | `page.locator('//h1')` |
| Select by role | `page.get_by_role("button")` | `page.getByRole('button')` |
| Select by text | `page.get_by_text("Login")` | `page.getByText('Login')` |
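A short sketch tying the four selector styles together. The commented role and text lookups assume a page that actually has a "Submit" button and a "Login" link, which example.com does not:

```python
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.locator("div > p").first.inner_text())  # CSS
        print(await page.locator("//h1").inner_text())           # XPath
        # Role- and text-based locators (element names are hypothetical):
        # await page.get_by_role("button", name="Submit").click()
        # await page.get_by_text("Login").click()
        await browser.close()

asyncio.run(main())
```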
This section covers scripts that simulate user actions like clicking buttons, filling out forms, selecting from dropdowns, hovering, and handling pagination or infinite scrolling. The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| Click button | `await page.click("button")` | `await page.click('button')` |
| Fill input | `await page.fill("#email", "[email protected]")` | `await page.fill('#email', '[email protected]')` |
| Select dropdown | `await page.select_option("select", "value")` | `await page.selectOption('select', 'value')` |
| Hover element | `await page.hover(".menu")` | `await page.hover('.menu')` |
| Pagination | `await page.click("text=Next")` | `await page.click('text=Next')` |
| Infinite scroll | `await page.evaluate("window.scrollBy(...)")` | `await page.evaluate(() => window.scrollBy(...))` |
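Putting a few of these together, here is a minimal sketch of filling and submitting a form. The URL and all selectors are hypothetical stand-ins for whatever page you are automating:

```python
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com/signup")     # hypothetical page
        await page.fill("#email", "[email protected]")       # type into an input
        await page.select_option("select#country", "US")  # choose a dropdown value
        await page.hover(".menu")                         # reveal a hover menu
        await page.click("button[type=submit]")           # submit the form
        await browser.close()

asyncio.run(main())
```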
Avoid hardcoded delays — they’re unreliable and make your scraper brittle.
Don't do this:

```python
await asyncio.sleep(5)  # Python
```
```javascript
await new Promise(r => setTimeout(r, 5000)); // Node.js
```

Do this instead:

```python
await page.wait_for_selector(".product-thumb")  # Python
```
```javascript
await page.waitForSelector(".product-thumb");  // Node.js
```
Waiting for the actual element is always better than guessing how long the page needs to load.
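The same principle applies to infinite scrolling: scroll, then wait for a concrete signal instead of sleeping. A sketch, assuming a hypothetical feed page that loads more items as you scroll:

```python
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com/feed")  # hypothetical listing page
        previous_height = 0
        while True:
            height = await page.evaluate("document.body.scrollHeight")
            if height == previous_height:
                break  # nothing new was loaded; we reached the end
            previous_height = height
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_load_state("networkidle")  # let requests settle
        await browser.close()

asyncio.run(main())
```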
This section includes examples for saving scraped data in various formats like JSON, CSV, or PDF, and for downloading files or capturing screenshots. The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| Save JSON | `json.dump(data, open("file.json", "w"))` | `fs.writeFileSync('file.json', JSON.stringify(data))` |
| Save CSV | `csv.writer(open("file.csv", "w")).writerows(data)` | `fs.writeFileSync('file.csv', csvString)` |
| Download files | `await page.click("a[download]")` | `await page.click('a[download]')` |
| Screenshot element | `await locator.screenshot(path="element.png")` | `await locator.screenshot({ path: 'element.png' })` |
| Save PDF | `await page.pdf(path="output.pdf")` | `await page.pdf({ path: 'output.pdf' })` |
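As a minimal sketch, here is how some hypothetical scraped rows could be written to both JSON and CSV with Python's standard library:

```python
import csv
import json

data = [  # hypothetical scraped rows
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": "19.99"},
]

with open("products.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(data)
```

Note that `page.pdf()` is only supported in headless Chromium.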
This section provides scripts for handling basic authentication, managing cookies, and reusing them across sessions. The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| Basic auth | `context = browser.new_context(http_credentials={...})` | `browser.newContext({ httpCredentials: {...} })` |
| Save cookies | `context.cookies()` → save to file | `context.cookies()` → save to file |
| Load cookies | `context.add_cookies(cookies)` | `context.addCookies(cookies)` |
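A sketch of the save/reuse round trip. The login flow itself is elided, and the URL and file name are arbitrary:

```python
import asyncio
import json
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto("https://example.com/login")  # hypothetical login page
        # ... perform the login here, then persist the session:
        with open("cookies.json", "w") as f:
            json.dump(await context.cookies(), f)

        # Later, in a fresh context, restore the session:
        context2 = await browser.new_context()
        with open("cookies.json") as f:
            await context2.add_cookies(json.load(f))
        await browser.close()

asyncio.run(main())
```

Playwright can also snapshot cookies and local storage in one call with `context.storage_state(path="state.json")` and restore them via `browser.new_context(storage_state="state.json")`.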
This section focuses on controlling browser behavior with custom user agents, proxies, and device emulation. The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| Set user agent | `browser.new_context(user_agent="...")` | `browser.newContext({ userAgent: "..." })` |
| Use proxy | `launch(proxy={"server": "http://..."})` | `launch({ proxy: { server: "http://..." } })` |
| Emulate device | `playwright.devices["iPhone 12"]` | `devices['iPhone 12']` |
If you want to check available devices:

Python:

```python
# p is the Playwright instance from sync_playwright() / async_playwright()
print(p.devices.keys())
```

This will output something like:

```
dict_keys(['iPhone 12', 'Pixel 5', 'Galaxy S9+', ...])
```

Node.js:

```javascript
const { devices } = require('playwright');
console.log(Object.keys(devices));
```

Example output:

```
[
  'Blackberry PlayBook',
  'iPhone 12',
  'Galaxy S9+',
  'Pixel 5',
  ...
]
```
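To actually apply a device descriptor, unpack it into the context. The commented lines show where a proxy or a manual user agent would plug in; the proxy address is a placeholder:

```python
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        iphone = p.devices["iPhone 12"]  # viewport, user agent, touch, scale factor
        browser = await p.chromium.launch(
            # proxy={"server": "http://myproxy:3128"},  # placeholder proxy server
        )
        # A device descriptor already sets the user agent; pass
        # user_agent="..." only when not using a descriptor.
        context = await browser.new_context(**iphone)
        page = await context.new_page()
        await page.goto("https://example.com")
        await browser.close()

asyncio.run(main())
```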
This section contains examples for retrying failed requests and handling timeouts or unexpected responses. The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| Retry logic | `for i in range(retries):` with `try/except` | `for (let i = 0; i < retries; i++)` with `try/catch` |
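Fleshed out, the retry loop might look like this sketch. The timeouts, the backoff schedule, and the URL are placeholder choices:

```python
import asyncio
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError

async def goto_with_retries(page, url, retries=3):
    for attempt in range(1, retries + 1):
        try:
            await page.goto(url, timeout=15_000)
            await page.wait_for_selector("h1", timeout=15_000)
            return  # success
        except PlaywrightTimeoutError:
            if attempt == retries:
                raise  # out of attempts; let the caller handle it
            # Backoff between attempts, not a page-load wait
            await asyncio.sleep(2 * attempt)

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await goto_with_retries(page, "https://example.com")  # placeholder URL
        await browser.close()

asyncio.run(main())
```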
This section provides tools for debugging: recording videos and traces, pausing scripts, and inspecting with console logs. The table below lists the main commands for that.
| Description | Python | Node.js |
| --- | --- | --- |
| Record video | `record_video_dir="videos/"` | `recordVideo: { dir: 'videos/' }` |
| Record trace | `context.tracing.start()` / `tracing.stop(path="trace.zip")` | `context.tracing.start()` / `tracing.stop({ path: 'trace.zip' })` |
| Pause script | `await page.pause()` | `await page.pause()` |
| Console logs | `page.on("console", ...)` | `page.on('console', ...)` |
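A combined sketch: record a video and a trace, mirror console output, and leave a commented `page.pause()` for interactive debugging. The output paths are arbitrary:

```python
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context(record_video_dir="videos/")
        await context.tracing.start(screenshots=True, snapshots=True)
        page = await context.new_page()
        page.on("console", lambda msg: print("console:", msg.text))
        await page.goto("https://example.com")
        # await page.pause()  # opens the Playwright Inspector (needs headful mode)
        await context.tracing.stop(path="trace.zip")
        await context.close()  # finalizes the video file
        await browser.close()

asyncio.run(main())
```

View the saved trace with `playwright show-trace trace.zip`.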
You can check out the full scripts in the project folder.