This guide explains how to perform web scraping using Laravel.
Laravel is a powerful PHP framework with an elegant syntax, making it ideal for building APIs for web scraping. It supports various scraping libraries, simplifying data extraction.
Laravel’s scalability, easy integration, and strong MVC architecture keep scraping logic well-organized, making it great for complex or large-scale projects. For more details, see our guide on web scraping in PHP.
Here are some top libraries for web scraping in Laravel:
- BrowserKit – A Symfony component that simulates the API of a web browser for interacting with static HTML documents. It works with DomCrawler for efficient navigation and scraping.
- HttpClient – A Symfony HTTP client that integrates seamlessly with BrowserKit for sending requests.
- Guzzle – A powerful HTTP client for making web requests and handling responses. Useful for retrieving HTML documents. Learn how to set up a proxy in Guzzle.
- Panther – A headless browser for scraping dynamic sites that require JavaScript rendering or interaction.
To follow this tutorial for web scraping in Laravel, you need to meet the following prerequisites:
- PHP 8+ installed on your machine
- Composer installed on your machine
An IDE to code in PHP is also recommended.
This section walks you through creating a Laravel web scraping API using the Quotes scraping sandbox site. The scraping endpoint will:
- Select quote HTML elements from the page
- Extract data from them
- Return the scraped data in JSON format
Here’s what the target site looks like:
Step 1: Set up a Laravel project
Open the terminal and launch the Composer create-project command below to initialize your Laravel web scraping application:
composer create-project laravel/laravel laravel-scraper
The laravel-scraper folder will now contain a blank Laravel project. Load it in your favorite PHP IDE.
This is the file structure of your current backend:
Step 2: Initialize your scraping API
Launch the Artisan command below in the project directory to add a new Laravel controller:
php artisan make:controller ScrapingController
This will create the following ScrapingController.php file in the /app/Http/Controllers directory:
<?php
namespace App\Http\Controllers;
use Illuminate\Http\Request;
class ScrapingController extends Controller
{
    //
}
In the ScrapingController class, add the following scrapeQuotes() method:
public function scrapeQuotes(): JsonResponse
{
    // scraping logic...
    return response()->json('Hello, World!');
}
For now, the method returns a placeholder 'Hello, World!' JSON message.
Add the following import:
use Illuminate\Http\JsonResponse;
Associate the scrapeQuotes() method with a dedicated endpoint by adding the following lines to routes/api.php:
use App\Http\Controllers\ScrapingController;
Route::get('/v1/scraping/scrape-quotes', [ScrapingController::class, 'scrapeQuotes']);
Let's verify that the Laravel scraping API works as expected. Since Laravel APIs are available under the /api path, the complete API endpoint is /api/v1/scraping/scrape-quotes.
Launch your Laravel application:
php artisan serve
Your server should now be listening locally on port 8000.
Use cURL to make a GET request to the /api/v1/scraping/scrape-quotes endpoint:
curl -X GET 'http://localhost:8000/api/v1/scraping/scrape-quotes'
You should get the following response:
"Hello, World!"
Step 3: Install the scraping libraries
Before installing any packages, determine which Laravel web scraping libraries suit your needs. Open the target site, inspect it using Developer Tools, and check the Network → Fetch/XHR section:
Since the site does not make AJAX requests, it is a static page with data embedded in the HTML. A headless browser is unnecessary, as it would add overhead.
For efficient scraping, use Symfony's BrowserKit and HttpClient. Install them with:
composer require symfony/browser-kit symfony/http-client
Step 4: Download the target page
Import BrowserKit and HttpClient in ScrapingController:
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
In scrapeQuotes(), initialize a new HttpBrowser object:
$browser = new HttpBrowser(HttpClient::create());
HttpBrowser allows you to make HTTP requests while mimicking browser behavior, including cookie and session handling. However, it does not execute requests in a real browser.
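Note that HttpClient::create() also accepts an array of default options, so you can tune how requests are sent. Here is a minimal, optional sketch that sets a custom User-Agent header and a request timeout (both values are illustrative assumptions, not requirements of this tutorial):
// optional: configure the underlying HTTP client with default options
$client = HttpClient::create([
    // an illustrative custom User-Agent string
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (compatible; LaravelScraper/1.0)',
    ],
    // give up on requests that take longer than 30 seconds
    'timeout' => 30,
]);
$browser = new HttpBrowser($client);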
Use the request() method to perform an HTTP GET request to the target URL:
$crawler = $browser->request('GET', 'https://quotes.toscrape.com/');
The result will be a Crawler object, which automatically parses the HTML document returned by the server. This class also provides node selection and data extraction capabilities.
You can verify that the above logic works by extracting the HTML of the page from the crawler:
$html = $crawler->outerHtml();
For testing, make your API return this data.
Your scrapeQuotes() function will now look like this:
public function scrapeQuotes(): JsonResponse
{
    // initialize a browser-like HTTP client
    $browser = new HttpBrowser(HttpClient::create());
    // download and parse the HTML of the target page
    $crawler = $browser->request('GET', 'https://quotes.toscrape.com/');
    // get the page outer HTML and return it
    $html = $crawler->outerHtml();
    return response()->json($html);
}
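Before returning the HTML, you may also want to guard against failed downloads. This is optional; a minimal sketch using BrowserKit's response object:
// optional: verify that the request succeeded before parsing
$status = $browser->getResponse()->getStatusCode();
if ($status !== 200) {
    return response()->json(['error' => "Request failed with status $status"], 500);
}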
Your API will now return:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Quotes to Scrape</title>
<link rel="stylesheet" href="/static/bootstrap.min.css">
<link rel="stylesheet" href="/static/main.css">
</head>
<!-- omitted for brevity ... -->
Step 5: Inspect the page content
To define the extraction logic, inspect the HTML structure of the target page.
- Open Quotes To Scrape.
- Right-click a quote element and select Inspect in DevTools.
- Expand the HTML and examine its structure:
Each .quote element contains:
- A .text node for the quote text
- An .author node for the author's name
- Multiple .tag nodes for associated tags
With these CSS selectors, you can now extract the desired data in Laravel.
Step 6: Get ready to perform web scraping
Create a data structure where to store the scraped data. Use an array for that:
quotes = []
Now use the filter() method from the Crawler class to select all quote elements:
$quote_html_elements = $crawler->filter('.quote');
This returns all DOM nodes on the page that match the specified .quote CSS selector.
Next, iterate over them and apply the data extraction logic to each one:
foreach ($quote_html_elements as $quote_html_element) {
    // create a new quote crawler
    $quote_crawler = new Crawler($quote_html_element);
    // scraping logic...
}
The DOMNode objects returned by filter() lack node selection methods. To work around this, create a local Crawler instance scoped to the specific HTML quote element.
For the code to function correctly, add the following import:
use Symfony\Component\DomCrawler\Crawler;
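As a side note, the Crawler class also exposes an each() method that hands your callback a ready-to-use Crawler per matched node, so you do not have to wrap each DOMNode yourself. A small, optional sketch of this alternative style:
// alternative: each() yields a Crawler instance per matched .quote node
$texts = $crawler->filter('.quote')->each(function (Crawler $quote_crawler) {
    // return whatever you extract from the current quote element
    return $quote_crawler->filter('.text')->text();
});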
Step 7: Implement data scraping
Inside the foreach loop:
- Extract the data of interest from the .text, .author, and .tag elements
- Populate a new $quote object with them
- Add the new $quote object to $quotes
First, select the .text element inside the HTML quote element. Then, use the text() method to extract the inner text from it:
$text_html_element = $quote_crawler->filter('.text');
$raw_text = $text_html_element->text();
Each quote is enclosed in the “ and ” special characters (Unicode code points U+201C and U+201D). You can remove them using the PHP str_replace() function as follows:
$text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);
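Equivalently, a Unicode-aware regular expression removes the same two code points in a single pattern; an optional alternative:
// alternative: strip the curly quotes with a Unicode-aware regex
$text = preg_replace('/[\x{201C}\x{201D}]/u', '', $raw_text);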
Similarly, scrape the author info with this code:
$author_html_element = $quote_crawler->filter('.author');
$author = $author_html_element->text();
Scraping the tags can be challenging. Since a single quote can have multiple tags, you need to define an array and scrape each tag individually:
$tag_html_elements = $quote_crawler->filter('.tag');
$tags = [];
foreach ($tag_html_elements as $tag_html_element) {
    $tag = $tag_html_element->textContent;
    $tags[] = $tag;
}
Note that the DOMNode elements returned by filter() do not expose the text() method. Instead, they provide the textContent attribute.
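If you prefer to avoid the manual loop, DomCrawler's extract() method can pull the text of every matched node in one call via the special _text attribute; an optional one-line alternative:
// one-line alternative: extract the text of all matched .tag nodes
$tags = $quote_crawler->filter('.tag')->extract(['_text']);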
Here is the entire Laravel data scraping logic:
// create a new quote crawler
$quote_crawler = new Crawler($quote_html_element);
// perform the data extraction logic
$text_html_element = $quote_crawler->filter('.text');
$raw_text = $text_html_element->text();
// remove special characters from the raw text information
$text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);
$author_html_element = $quote_crawler->filter('.author');
$author = $author_html_element->text();
$tag_html_elements = $quote_crawler->filter('.tag');
$tags = [];
foreach ($tag_html_elements as $tag_html_element) {
    $tag = $tag_html_element->textContent;
    $tags[] = $tag;
}
Step 8: Return the scraped data
Create a $quote object with the scraped data and add it to $quotes:
$quote = [
    'text' => $text,
    'author' => $author,
    'tags' => $tags
];
$quotes[] = $quote;
Next, update the API response data with the $quotes list:
return response()->json(['quotes' => $quotes]);
At the end of the scraping loop, $quotes will contain:
array(10) {
  [0]=>
  array(3) {
    ["text"]=>
    string(113) "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
    ["author"]=>
    string(15) "Albert Einstein"
    ["tags"]=>
    array(4) {
      [0]=>
      string(6) "change"
      [1]=>
      string(13) "deep-thoughts"
      [2]=>
      string(8) "thinking"
      [3]=>
      string(5) "world"
    }
  }
  // omitted for brevity...
  [9]=>
  array(3) {
    ["text"]=>
    string(48) "A day without sunshine is like, you know, night."
    ["author"]=>
    string(12) "Steve Martin"
    ["tags"]=>
    array(3) {
      [0]=>
      string(5) "humor"
      [1]=>
      string(7) "obvious"
      [2]=>
      string(6) "simile"
    }
  }
}
This data will then be serialized into JSON and returned by the Laravel scraping API.
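By default, response()->json() escapes non-ASCII characters in the payload. If you prefer readable UTF-8 output, you can pass PHP's JSON_UNESCAPED_UNICODE flag as the encoder options argument; an optional tweak:
// optional: keep non-ASCII characters readable in the JSON output
return response()->json(['quotes' => $quotes], 200, [], JSON_UNESCAPED_UNICODE);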
Step 9: Put it all together
Here is the final code of the ScrapingController file in Laravel:
<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Http\JsonResponse;
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\DomCrawler\Crawler;

class ScrapingController extends Controller
{
    public function scrapeQuotes(): JsonResponse
    {
        // initialize a browser-like HTTP client
        $browser = new HttpBrowser(HttpClient::create());

        // download and parse the HTML of the target page
        $crawler = $browser->request('GET', 'https://quotes.toscrape.com/');

        // where to store the scraped data
        $quotes = [];

        // select all quote HTML elements on the page
        $quote_html_elements = $crawler->filter('.quote');

        // iterate over each quote HTML element and apply
        // the scraping logic
        foreach ($quote_html_elements as $quote_html_element) {
            // create a new quote crawler
            $quote_crawler = new Crawler($quote_html_element);

            // perform the data extraction logic
            $text_html_element = $quote_crawler->filter('.text');
            $raw_text = $text_html_element->text();
            // remove special characters from the raw text information
            $text = str_replace(["\u{201c}", "\u{201d}"], '', $raw_text);

            $author_html_element = $quote_crawler->filter('.author');
            $author = $author_html_element->text();

            $tag_html_elements = $quote_crawler->filter('.tag');
            $tags = [];
            foreach ($tag_html_elements as $tag_html_element) {
                $tag = $tag_html_element->textContent;
                $tags[] = $tag;
            }

            // create a new quote object
            // with the scraped data
            $quote = [
                'text' => $text,
                'author' => $author,
                'tags' => $tags
            ];

            // add the quote object to the quotes array
            $quotes[] = $quote;
        }

        return response()->json(['quotes' => $quotes]);
    }
}
Let's test it. Start your Laravel server:
php artisan serve
Make a GET request to the /api/v1/scraping/scrape-quotes endpoint:
curl -X GET 'http://localhost:8000/api/v1/scraping/scrape-quotes'
You will get the following result:
{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein",
      "tags": [
        "change",
        "deep-thoughts",
        "thinking",
        "world"
      ]
    },
    // omitted for brevity...
    {
      "text": "A day without sunshine is like, you know, night.",
      "author": "Steve Martin",
      "tags": [
        "humor",
        "obvious",
        "simile"
      ]
    }
  ]
}
This API is a basic example of Laravel's web scraping capabilities. To improve and scale your project, consider these enhancements:
- Implement web crawling – The target site spans multiple pages. Use web crawling to retrieve all quotes efficiently (a minimal pagination sketch follows this list).
- Schedule scraping tasks – Automate data collection by scheduling API calls, storing data in a database, and keeping it up to date.
- Integrate proxies – Avoid IP bans by using residential proxies to distribute requests and bypass anti-scraping measures.
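Here is a minimal, hypothetical sketch of the crawling idea above. It assumes the target site's pager exposes a li.next > a link, as quotes.toscrape.com does, and reuses the per-quote extraction logic from scrapeQuotes():
// follow the "Next" pagination link until it disappears
$browser = new HttpBrowser(HttpClient::create());
$url = 'https://quotes.toscrape.com/';
$quotes = [];
while ($url !== null) {
    $crawler = $browser->request('GET', $url);
    foreach ($crawler->filter('.quote') as $quote_html_element) {
        // ... same per-quote extraction logic as in scrapeQuotes() ...
    }
    // look for a "Next" link and stop when there is none
    $next_link = $crawler->filter('li.next > a');
    $url = $next_link->count() > 0
        ? 'https://quotes.toscrape.com' . $next_link->attr('href')
        : null;
}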
Web scraping is a powerful tool for data collection, but it must be done ethically and responsibly. Follow these best practices to ensure compliance and avoid harming target sites:
- Review the site’s terms of service – Check for guidelines on data usage, copyright, and intellectual property before scraping.
- Respect robots.txt rules – Follow the site's crawling instructions to maintain ethical scraping practices.
- Scrape only public data – Avoid restricted content requiring authentication, as scraping private data may have legal consequences.
- Limit request frequency – Prevent server overload and rate limiting by pacing requests and adding random delays (see the sketch after this list).
- Use reputable scraping tools – Choose well-maintained tools that follow ethical scraping guidelines.
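As a concrete example of pacing, you can sleep for a random interval between consecutive page downloads. A minimal sketch, assuming $urls holds the pages you plan to visit:
// pace requests with a randomized 1-3 second delay between pages
foreach ($urls as $url) {
    $crawler = $browser->request('GET', $url);
    // ... scraping logic ...
    // wait between 1,000,000 and 3,000,000 microseconds (1-3 s)
    usleep(random_int(1000000, 3000000));
}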
Web scraping with Laravel is simple and takes only a few lines of code. However, most sites protect their data with anti-bot and anti-scraping solutions. To work around that, you can use Web Unlocker, our unlocking API that can seamlessly return the clean HTML of any page, circumventing any anti-scraping measures.
Sign up now and start your free trial.