VV-MANOJ/Udacity-Log-Analysis

Log-Analysis-Udacity-Project

An internal reporting tool that queries a large web server database and draws business conclusions from it. (Project from the Udacity Full Stack Web Development Nanodegree.)

Introduction

This is a Python module that queries a large web server database and draws business conclusions from it. The database contains newspaper articles as well as the web server log for the site. The log has one row for each time a reader loaded a web page. The database includes three tables:

  • The authors table includes information about the authors of articles.
  • The articles table includes the articles themselves.
  • The log table includes one entry for each time a user has accessed the site.

The project draws the following conclusions:

  • The three most popular articles of all time.
  • The most popular article authors of all time.
  • The days on which more than 1% of requests led to errors.

Functions in log.py:

  • connect(): Connects to the PostgreSQL database and returns a database connection.
  • popular_article(): Prints the three most popular articles of all time.
  • popular_authors(): Prints the most popular article authors of all time.
  • log_status(): Prints the days on which more than 1% of requests led to errors.
  • view_popular_articles(): Creates the view popular_articles, which backs the first conclusion.
  • view_popular_authors(): Creates the view popular_authors, which backs the second conclusion.
  • view_log_status(): Creates the view log_status, which backs the third conclusion.
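
The source of log.py is not reproduced here, so the following is only a minimal sketch of how connect() and popular_article() might look, assuming the psycopg2 driver and the popular_articles view described in this README. The helpers fetch_rows() and format_views(), the `conn` parameter, and the output format are hypothetical names and choices made for illustration; the real module may differ.

```python
def connect(dbname="news"):
    """Open a connection to the PostgreSQL database.

    Assumes the psycopg2 driver is installed; it is imported lazily so the
    formatting helper below can be used without a database.
    """
    import psycopg2
    return psycopg2.connect(dbname=dbname)


def fetch_rows(conn, query):
    """Hypothetical helper: run a read-only query and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


def format_views(rows):
    """Hypothetical helper: turn (name, views) rows into report lines."""
    return ['"{}" - {} views'.format(name, views) for name, views in rows]


def popular_article(conn):
    """Print the three most viewed articles from the popular_articles view."""
    rows = fetch_rows(conn, "select title, views from popular_articles limit 3;")
    for line in format_views(rows):
        print(line)


# Possible usage inside the VM, once the views exist:
#   conn = connect()
#   popular_article(conn)
#   conn.close()
```

Keeping the formatting separate from the query makes the report text easy to check without a running database.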

Views Made:

  • popular_articles

    create or replace view popular_articles as
    select title, count(title) as views from articles, log
    where log.path = concat('/article/', articles.slug)
    group by title order by views desc;

  • popular_authors

    create or replace view popular_authors as
    select authors.name, count(articles.author) as views from articles, log, authors
    where log.path = concat('/article/', articles.slug) and articles.author = authors.id
    group by authors.name order by views desc;

  • log_status

    create or replace view log_status as
    select Date, Total, Error, (Error::float*100)/Total::float as Percent from
    (select time::timestamp::date as Date, count(status) as Total,
    sum(case when status = '404 NOT FOUND' then 1 else 0 end) as Error from log
    group by time::timestamp::date) as result
    where (Error::float*100)/Total::float > 1.0 order by Percent desc;
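
To make the arithmetic in the log_status view concrete, here is a small pure-Python sketch of the same calculation: count requests and '404 NOT FOUND' responses per day, compute the error percentage, and keep only days above 1%. The function name error_days() and the sample rows are made up for illustration and are not taken from the real database.

```python
from collections import defaultdict


def error_days(log_rows, threshold=1.0):
    """Return (day, total, errors, percent) for days whose error rate exceeds threshold.

    log_rows is an iterable of (day, status) pairs, mirroring the date and
    status columns the view reads from the log table.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for day, status in log_rows:
        totals[day] += 1
        if status == "404 NOT FOUND":
            errors[day] += 1

    result = []
    for day, total in totals.items():
        percent = errors[day] * 100.0 / total
        if percent > threshold:
            result.append((day, total, errors[day], percent))
    # Highest error rate first, like "order by Percent desc" in the view.
    return sorted(result, key=lambda row: row[3], reverse=True)


# Made-up sample: day A has 5 errors in 200 requests (2.5%),
# day B has 1 error in 200 requests (0.5%), so only day A is reported.
sample = (
    [("day A", "404 NOT FOUND")] * 5 + [("day A", "200 OK")] * 195
    + [("day B", "404 NOT FOUND")] * 1 + [("day B", "200 OK")] * 199
)
print(error_days(sample))
```

As in the view, only the status '404 NOT FOUND' is counted as an error in this sketch.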

Instructions

  • Install Vagrant and VirtualBox.

  • Clone the repository to your local machine:

    git clone https://github.com/visheshbanga/Log-Analysis-Udacity-Project
  • Start the virtual machine

    From your terminal, inside the project directory, run `vagrant up`. Vagrant will download the Linux operating system and install it. When `vagrant up` finishes, you get your shell prompt back and can run `vagrant ssh` to log in to the newly installed Linux VM.
  • Download the data

    Unzip the data file after downloading it. The file inside is called newsdata.sql. Put this file into the vagrant directory, which is shared with your virtual machine.
  • Set Up the Database

    To load the data into the news database, run the following command:
    psql -d news -f newsdata.sql
  • Make Views

    Create the views either by running the queries listed under "Views Made" on the psql command line or by uncommenting the view-creation code in the Python module.
  • Run Module

    python log.py
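
For the "Make Views" step, the view definitions can also be executed from Python instead of the psql prompt. The following is a rough sketch under the assumption that psycopg2 is used; the helper name create_view() and the constant POPULAR_ARTICLES_VIEW are hypothetical and are not claimed to match the view_* functions in log.py.

```python
# View definition copied from the "Views Made" section of this README.
POPULAR_ARTICLES_VIEW = """
create or replace view popular_articles as
select title, count(title) as views from articles, log
where log.path = concat('/article/', articles.slug)
group by title order by views desc;
"""


def create_view(conn, statement):
    """Hypothetical helper: execute one view definition and commit it.

    conn is expected to be a DB-API connection such as the one returned by
    psycopg2.connect(dbname="news").
    """
    cur = conn.cursor()
    try:
        cur.execute(statement)
        conn.commit()  # make the view visible to later sessions
    finally:
        cur.close()


# Possible usage inside the VM (assumes psycopg2 is installed):
#   import psycopg2
#   conn = psycopg2.connect(dbname="news")
#   create_view(conn, POPULAR_ARTICLES_VIEW)
#   conn.close()
```

Because the statement uses `create or replace view`, running it more than once is harmless.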

Output:

Screenshot.jpg
