This project is a Python-based web scraping script designed to extract flight details from the Yatra travel website. The extracted data includes flight pricing, origin, destination, departure and arrival times, and duration. The data is then saved into a CSV file for further analysis.

vishal815/Python-Based-Flight-Data-Scraping-Automating-Data-Collection-for-Analysis

Python-Based-Flight-Data-Scraping-Automating-Data-Collection-for-Analysis

This project is a Python-based web scraping script designed to extract flight details from the Yatra travel website. The extracted data includes flight pricing, origin, destination, departure and arrival times, and duration. The data is then saved into a CSV file for further analysis or reporting.


Project Overview

Objective

The main objective of this assignment is to:

  • Scrape data from a travel website.
  • Extract relevant flight details such as:
    • Airline name
    • Departure time and airport
    • Arrival time and airport
    • Flight duration
    • Stops
    • Price
  • Save the extracted data into a CSV file.

Website Link

Yatra Travel Website



Features

  • Automated Web Scraping: Uses Selenium and BeautifulSoup to navigate and parse the website's HTML content.
  • Flight Details Extraction: Extracts key information such as airline names, times, airports, flight duration, stops, and prices.
  • Data Cleaning: Handles formatting and cleaning of data, such as removing unnecessary characters (e.g., +1 Day).
  • CSV Export: Saves the extracted data into a CSV file for easy access and further analysis.
  • Data Visualization: Plots the extracted data with Matplotlib and Seaborn to show price and flight duration trends.

Requirements

Prerequisites

Make sure you have the following installed:

  • Python 3.8+
  • Google Chrome browser
  • ChromeDriver (compatible with your Chrome version)

Python Libraries

Install the required Python libraries using the following command:

pip install selenium beautifulsoup4 pandas matplotlib seaborn

How It Works

Script Workflow

  1. Set Up Selenium WebDriver:
    • The script initializes the Selenium WebDriver to open and navigate the website.
  2. Load Website:
    • Navigates to the Yatra flight search page.
  3. Parse HTML Content:
    • Uses BeautifulSoup to parse the loaded webpage.
  4. Extract Flight Details:
    • Scrapes flight details such as airline names, times, prices, and other relevant information.
  5. Data Cleaning:
    • Cleans and formats the extracted data.
  6. Save to CSV:
    • Stores the extracted data in a CSV file for further use.
  7. Visualize Data:
    • Uses Python libraries like Matplotlib and Seaborn to create visual representations of the data, such as:
      • Bar charts for price comparison between airlines.
      • Line plots for flight duration trends.
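Steps 1 to 4 of the workflow can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the names `FlightRow`, `fetch_rendered_html`, and `parse_flights` are hypothetical, and the CSS selectors are passed in by the caller because the real Yatra markup is not documented here and is likely to change over time.

```python
from dataclasses import dataclass


@dataclass
class FlightRow:
    """One scraped flight; field names are illustrative, not from the original script."""
    airline: str
    departure_time: str
    departure_airport: str
    arrival_time: str
    arrival_airport: str
    duration: str
    stops: str
    price: str


def fetch_rendered_html(url: str, css_ready: str, timeout: int = 20) -> str:
    """Open `url` in Chrome and return the HTML once `css_ready` matches an element."""
    # Imported lazily so the rest of the sketch can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait for the JavaScript-rendered results instead of sleeping a fixed time.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_ready))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_flights(html: str, selectors: dict) -> list:
    """Parse flight cards from `html`.

    `selectors` maps "card" plus every FlightRow field name to a CSS selector.
    These selectors are assumptions the caller must supply after inspecting the page.
    """
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(selectors["card"]):
        fields = {}
        for name in FlightRow.__dataclass_fields__:
            node = card.select_one(selectors[name])
            # A missing element becomes an empty string rather than raising.
            fields[name] = node.get_text(strip=True) if node else ""
        rows.append(FlightRow(**fields))
    return rows
```

A caller would pass the search URL and a selector for the first flight card to `fetch_rendered_html`, then hand the returned HTML to `parse_flights`. Keeping the selectors in one dictionary means a site redesign only requires updating that mapping.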


Output

The script generates a CSV file (flight_data.csv) with the following columns:

  • Airline
  • Departure Time
  • Departure Airport
  • Arrival Time
  • Arrival Airport
  • Flight Duration
  • Stops
  • Price

Example Output

| Airline | Departure Time | Departure Airport | Arrival Time | Arrival Airport | Flight Duration | Stops | Price (INR) |
|---|---|---|---|---|---|---|---|
| Srilankan Airlines | 18:35 | DEL | 21:50 | DXB | 28h 45m | 1 Stop | 25,662 |
| IndiGo | 17:30 | DEL | 14:35 | SHJ | 22h 35m | 1 Stop | 25,677 |
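Writing rows with these columns could look like the sketch below. It uses the standard-library `csv` module to stay self-contained; the project lists pandas as a dependency, so the original script may well use `DataFrame.to_csv` instead. The helper name `write_flights` is hypothetical.

```python
import csv
import io

COLUMNS = [
    "Airline", "Departure Time", "Departure Airport", "Arrival Time",
    "Arrival Airport", "Flight Duration", "Stops", "Price",
]


def write_flights(rows, fileobj):
    """Write an iterable of dicts keyed by COLUMNS to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# The first example row from the table above.
sample = [{
    "Airline": "Srilankan Airlines",
    "Departure Time": "18:35",
    "Departure Airport": "DEL",
    "Arrival Time": "21:50",
    "Arrival Airport": "DXB",
    "Flight Duration": "28h 45m",
    "Stops": "1 Stop",
    "Price": "25,662",
}]

buffer = io.StringIO()
write_flights(sample, buffer)
csv_text = buffer.getvalue()
```

To write to disk, open the target with `open("flight_data.csv", "w", newline="", encoding="utf-8")` and pass that file object instead of the buffer. Prices that contain a thousands separator are quoted automatically by the `csv` writer, so they stay in a single column.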

Data Visualization

  • Visualization makes it easier to identify trends and patterns in the flight data.
  • Provides actionable insights for travel planning, such as:
    • Identifying the most cost-effective airline.
    • Observing peak travel times.

Example Visualizations

  1. Price Distribution:
    • A histogram to display the frequency of flight prices.
  2. Airline Comparison:
    • A bar chart comparing average prices across airlines.
  3. Flight Duration Trends:
    • A line plot to observe how flight durations vary with stops.
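As an illustration of the airline comparison chart, the sketch below averages prices per airline and draws a bar chart. The function names and the output file name are assumptions; the aggregation uses only the standard library, and Matplotlib is imported inside the plotting function so the averaging step works without it.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_airline(rows):
    """Return {airline: mean price} from dicts with "Airline" and "Price" keys.

    Prices are expected as strings such as "25,662" (whole rupees).
    """
    prices = defaultdict(list)
    for row in rows:
        prices[row["Airline"]].append(int(row["Price"].replace(",", "")))
    return {airline: mean(values) for airline, values in prices.items()}


def plot_average_prices(averages, out_path="average_price_by_airline.png"):
    """Save a bar chart of average price per airline to `out_path`."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, works without a display
    import matplotlib.pyplot as plt

    airlines = sorted(averages, key=averages.get)  # cheapest first
    fig, ax = plt.subplots()
    ax.bar(airlines, [averages[a] for a in airlines])
    ax.set_xlabel("Airline")
    ax.set_ylabel("Average price (INR)")
    ax.set_title("Average price by airline")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Calling `plot_average_prices(average_price_by_airline(rows))` on the rows read back from `flight_data.csv` would produce the chart. Sorting the bars by price makes the cheapest airline easy to spot.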



Challenges

  1. Dynamic Loading:
    • The website uses JavaScript to dynamically load flight details, requiring Selenium for rendering.
  2. Data Cleaning:
    • Formatting time strings and handling cases like +1 Day or +0 Day.
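The cleaning challenge can be illustrated with a few small helpers. This is a sketch under the assumption that raw values look like the examples in this README (`21:50 +1 Day`, `28h 45m`, `25,662`); the function names are hypothetical and the real site may format values differently.

```python
import re

# Matches suffixes such as " +1 Day", "+0 Day", or "+2 Days".
DAY_OFFSET = re.compile(r"\s*\+\s*(\d+)\s*Days?", re.IGNORECASE)


def split_day_offset(raw):
    """Return (time, day_offset) from a string like '21:50 +1 Day'."""
    match = DAY_OFFSET.search(raw)
    offset = int(match.group(1)) if match else 0
    return DAY_OFFSET.sub("", raw).strip(), offset


def duration_to_minutes(raw):
    """Convert a duration like '28h 45m', '3h', or '45m' to total minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", raw)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {raw!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def price_to_int(raw):
    """Strip currency symbols and separators from a whole-rupee price string."""
    digits = re.sub(r"[^\d]", "", raw)
    if not digits:
        raise ValueError(f"no digits in price: {raw!r}")
    return int(digits)
```

Returning the day offset instead of discarding it keeps the information needed to sanity-check durations: an arrival time earlier than the departure time only makes sense when the offset is at least 1.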

Future Enhancements

  • Add error handling for network or browser-related issues.
  • Automate handling of dynamic date and destination inputs.
  • Enhance scraping to include additional data such as baggage allowance.
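The first enhancement, error handling for network or browser issues, could start from a small retry wrapper such as the one below. It is a sketch only: `with_retries` and its parameters are hypothetical names, and the exceptions worth retrying (for example Selenium's `TimeoutException` or `WebDriverException`) would need to be chosen for the actual script.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` is exhausted.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, ... between tries.
    The last failure is re-raised so the caller can still log or handle it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A page load could then be wrapped as `with_retries(lambda: load_page(url), retry_on=(TimeoutException,))`, where `load_page` stands for whatever function drives the browser. The `sleep` parameter exists so the delay can be replaced in tests.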

Vishal Lazrus
