
Shreya #38
Open
wants to merge 43 commits into main
bc38da0
addition to csv
antgad Nov 1, 2021
b73d730
print filename
antgad Nov 1, 2021
9bfb07d
Merge pull request #1 from SEProjGrp5/Anant
antgad Nov 1, 2021
8341caa
Added Etsy scraper
AnmolikaGoyal Nov 1, 2021
4c6be3a
Added product23
AnmolikaGoyal Nov 1, 2021
0b13c7f
Update formatter.py
shubhangij12 Nov 1, 2021
0669ba7
Update scraper.py
shubhangij12 Nov 1, 2021
0e91c77
Update slash.py
shubhangij12 Nov 1, 2021
45d51bf
Update scraper.py
shubhangij12 Nov 1, 2021
26c74c0
Merge pull request #3 from SEProjGrp5/Anmolika
AnmolikaGoyal Nov 1, 2021
d292ea0
Merge branch 'main' into Shubhangi
shubhangij12 Nov 1, 2021
1a8e0cb
Merge pull request #5 from SEProjGrp5/Shubhangi
shubhangij12 Nov 1, 2021
8d97a69
Update slash.py
antgad Nov 1, 2021
44bdf04
Added functionality to save user name and email
antgad Nov 1, 2021
b05afe3
added functionality to choose what to do
antgad Nov 1, 2021
48fde02
Formatting updates
antgad Nov 1, 2021
935e4cc
displaying all available ptoducts on 3 websites
antgad Nov 2, 2021
42f9996
Merge pull request #10 from SEProjGrp5/Anant
antgad Nov 2, 2021
6ab7590
consolidated the scraping
antgad Nov 2, 2021
99723ce
Merge pull request #12 from SEProjGrp5/Anant
antgad Nov 2, 2021
78db3c7
updated to save favourite product to local csv
antgad Nov 2, 2021
3abe084
Merge pull request #13 from SEProjGrp5/Anant
antgad Nov 2, 2021
2311be9
Update .gitignore
antgad Nov 2, 2021
85d9ff5
Merge pull request #15 from SEProjGrp5/Anant
antgad Nov 2, 2021
15fede3
functionality to view saved products
antgad Nov 2, 2021
a762d24
Merge pull request #17 from SEProjGrp5/Anant
antgad Nov 2, 2021
5631a81
saving csv in quick mode in the csv folder by default
antgad Nov 2, 2021
bc5b62a
Merge pull request #18 from SEProjGrp5/Anant
antgad Nov 2, 2021
1134ef2
Update requirements.txt
AnmolikaGoyal Nov 2, 2021
405d532
Update requirements.txt
AnmolikaGoyal Nov 2, 2021
3fb20aa
Update README.md
AnmolikaGoyal Nov 2, 2021
821d8ef
Update README.md
AnmolikaGoyal Nov 2, 2021
e0b9830
Update README.md
AnmolikaGoyal Nov 2, 2021
a467a14
minor updates
antgad Nov 2, 2021
1282bde
Merge pull request #30 from SEProjGrp5/Anant
antgad Nov 2, 2021
f13ed04
Update slash.py
srujanarao Nov 3, 2021
b5509e8
changes to currency function
srujanarao Nov 3, 2021
dc8e99e
currency conversion debugging
antgad Nov 3, 2021
1243bc7
Update formatter.py
antgad Nov 3, 2021
d14ed17
Merge pull request #31 from SEProjGrp5/srujana
antgad Nov 3, 2021
6fd12c8
use dataframes
sskarra1234 Nov 3, 2021
4555170
add Docstrings for the functions
sskarra1234 Nov 3, 2021
48e5195
add docstrings
sskarra1234 Nov 4, 2021
4 changes: 4 additions & 0 deletions .gitignore
@@ -127,3 +127,7 @@ dmypy.json

# Pyre type checker
.pyre/


*.csv
user_data.json
3 changes: 3 additions & 0 deletions .idea/.gitignore


14 changes: 14 additions & 0 deletions .idea/inspectionProfiles/Project_Default.xml


6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml


4 changes: 4 additions & 0 deletions .idea/misc.xml


8 changes: 8 additions & 0 deletions .idea/modules.xml


14 changes: 14 additions & 0 deletions .idea/slash_new.iml


6 changes: 6 additions & 0 deletions .idea/vcs.xml


10 changes: 5 additions & 5 deletions README.md
@@ -183,11 +183,11 @@ python slash.py --search "philips hue" --num 5

<table>
<tr>
<td align="center"><a href="http://www.shubhammankar.com/"><img src="https://avatars.githubusercontent.com/u/29366125?v=4" width="75px;" alt=""/><br /><sub><b>Shubham Mankar</b></sub></a></td>
<td align="center"><a href="https://github.com/pratikdevnani"><img src="https://avatars.githubusercontent.com/u/43350493?v=4" width="75px;" alt=""/><br /><sub><b>Pratik Devnani</b></sub></a><br /></td>
<td align="center"><a href="https://github.com/moksh98"><img src="https://avatars.githubusercontent.com/u/29693765?v=4" width="75px;" alt=""/><br /><sub><b>Moksh Jain</b></sub></a><br /></td>
<td align="center"><a href="https://rahilsarvaiya.tech/"><img src="https://avatars0.githubusercontent.com/u/32304956?v=4" width="75px;" alt=""/><br /><sub><b>Rahil Sarvaiya</b></sub></a><br /></td>
<td align="center"><a href="https://github.com/annie0467"><img src="https://avatars.githubusercontent.com/u/17164255?v=4" width="75px;" alt=""/><br /><sub><b>Anushi Keswani</b></sub></a><br /></td>
<td align="center"><a href="https://github.com/antgad"><img src="https://avatars.githubusercontent.com/u/37169203?v=4" width="75px;" alt=""/><br /><sub><b>Anant Gadodia</b></sub></a></td>
<td align="center"><a href="https://github.com/AnmolikaGoyal"><img src="https://avatars.githubusercontent.com/u/68813421?v=4" width="75px;" alt=""/><br /><sub><b>Anmolika Goyal</b></sub></a><br /></td>
<td align="center"><a href="https://github.com/shubhangij12"><img src="https://avatars.githubusercontent.com/u/48826459?v=4" width="75px;" alt=""/><br /><sub><b>Shubhangi Jain</b></sub></a><br /></td>
<td align="center"><a href="https://github.com/shreyakarra"><img src="https://avatars0.githubusercontent.com/u/89954066?v=4" width="75px;" alt=""/><br /><sub><b>Shreya Karra</b></sub></a><br /></td>
<td align="center"><a href="https://github.com/srujanarao"><img src="https://avatars.githubusercontent.com/u/6882921?v=4" width="75px;" alt=""/><br /><sub><b>Srujana Rao</b></sub></a><br /></td>
</tr>
</table>

4 changes: 3 additions & 1 deletion requirements.txt
@@ -34,4 +34,6 @@ urllib3==1.26.6
Werkzeug==1.0.1
wheel==0.37.0
zipp==3.5.0
DateTime==4.3
DateTime==4.3
lxml==4.6.3
requests-oauthlib==1.3.0
19 changes: 19 additions & 0 deletions src/csv_writer.py
@@ -0,0 +1,19 @@
import csv
from datetime import datetime
import os


def write_csv(arr, product, file_path):
    '''Writes the scraped results to a CSV file named 'ProductDate_Time.csv'.
    Parameters- arr: list of scraped product dicts, product: product entered by the user,
    file_path: directory where the csv needs to be stored
    Returns- file_name: name of the CSV file'''
    os.chdir(file_path)
    keys = arr[0].keys()
    now = datetime.now()
    file_name = product + now.strftime("%m%d%y_%H%M") + '.csv'
    with open(file_name, "w", newline='') as a_file:
        dict_writer = csv.DictWriter(a_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(arr)
    return file_name
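A minimal, self-contained sketch of the same `csv.DictWriter` pattern `write_csv` uses; the sample products and the temporary output directory are illustrative (in slash these dicts come from the scrapers):

```python
import csv
import os
import tempfile
from datetime import datetime

# Illustrative data; each scraper produces dicts with a shared set of keys.
products = [
    {"title": "philips hue", "price": "$19.99", "website": "amazon"},
    {"title": "philips hue 2-pack", "price": "$34.99", "website": "walmart"},
]

out_dir = tempfile.mkdtemp()
file_name = "philips hue" + datetime.now().strftime("%m%d%y_%H%M") + ".csv"
path = os.path.join(out_dir, file_name)

# Header row comes from the first dict's keys, matching write_csv above.
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=products[0].keys())
    writer.writeheader()
    writer.writerows(products)

# Read it back to confirm the round trip.
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))
print(len(rows))  # 2
```

Using `os.path.join` instead of `os.chdir` keeps the caller's working directory untouched, which is one reason to prefer it in library code.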
35 changes: 29 additions & 6 deletions src/formatter.py
@@ -15,25 +15,36 @@
from datetime import datetime
import math

def formatResult(website, titles, prices, links):
def formatResult(website, titles, prices, links, ratings, df_flag, currency):
"""
The formatResult function takes the scraped HTML as input, and extracts the
necessary values from the HTML code. Ex. extracting a price '$19.99' from
a paragraph tag.
"""
title, price, link = '', '', ''

title, price, link, rating, converted_cur = '', '', '', '', ''
if titles: title = titles[0].get_text().strip()
if prices: price = prices[0].get_text().strip()
if '$' not in price:
price='$'+price
if links: link = links[0]['href']
if ratings: rating = ratings[0].get_text().strip().split()[0]
if df_flag == 0:
    title = formatTitle(title)
    link = formatTitle(link)  # note: formatTitle also truncates links over 40 chars
if currency: converted_cur = getCurrency(currency, price)
product = {
'timestamp': datetime.now().strftime("%d/%m/%Y %H:%M:%S"),
"title": formatTitle(title),
"title": title,
"price": price,
# "link":f'www.{website}.com{link}',
"link":f'www.{website}.com{link}',
"website": website,
"rating" : rating,
"converted price": converted_cur
}

return product


def sortList(arr, sortBy, reverse):
"""
The sortList function is used to sort the products list based on the
@@ -43,7 +54,7 @@ def sortList(arr, sortBy, reverse):
return sorted(arr, key=lambda x: getNumbers(x["price"]), reverse=reverse)
# To-do: sort by rating
elif sortBy == "ra":
# return sorted(arr, key=lambda x: getNumbers(x.price), reverse=reverse)
return sorted(arr, key=lambda x: getNumbers(x["rating"]), reverse=reverse)
pass
return arr
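A sketch of the `sortBy == "pr"` path: products sorted by the float parsed out of their `"price"` string, mirroring what `sortList`/`getNumbers` do above (this uses a regex for brevity where `getNumbers` walks characters; missing prices sort last via `math.inf`):

```python
import math
import re

def get_number(st):
    # Pull the first numeric run (with optional thousands separators) from a
    # price string like "$19.99"; unparsable prices sort to the end.
    m = re.search(r"[\d,]+\.?\d*", st)
    return float(m.group().replace(",", "")) if m else math.inf

arr = [{"price": "$19.99"}, {"price": "$5.49"}, {"price": ""}]
arr_sorted = sorted(arr, key=lambda x: get_number(x["price"]))
print([p["price"] for p in arr_sorted])  # ['$5.49', '$19.99', '']
```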

@@ -62,6 +73,7 @@ def formatTitle(title):
return title[:40] + "..."
return title


def getNumbers(st):
"""
The getNumbers function extracts float values (price) from a string.
@@ -75,4 +87,15 @@ def getNumbers(st):
ans = float(ans)
except:
ans = math.inf
return ans
return ans

def getCurrency(currency, price):
    """
    The getCurrency function converts a dollar price string into INR or EUR
    using fixed rates (75 INR / 1.16 EUR per USD). Cents are dropped by int().
    """
    converted_cur = 0.0
    if len(price) > 1:
        # take the digits between '$' and '.', dropping thousands separators
        dollars = int(price[(price.index("$") + 1):price.index(".")].replace(",", ""))
        if currency == "inr":
            converted_cur = 75 * dollars
        elif currency == "euro":
            converted_cur = 1.16 * dollars
        converted_cur = currency.upper() + ' ' + str(converted_cur)
    return converted_cur
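A worked sketch of the same slicing logic, assuming a well-formed price such as `"$1,299.99"` (the hard-coded rates match the diff; note the slice requires both a `$` and a `.` to be present, and drops cents):

```python
def convert(currency, price):
    # Digits between '$' and '.', thousands separators removed -> whole dollars.
    dollars = int(price[price.index("$") + 1:price.index(".")].replace(",", ""))
    rate = {"inr": 75, "euro": 1.16}[currency]  # fixed rates from the diff
    return currency.upper() + " " + str(rate * dollars)

print(convert("inr", "$1,299.99"))  # INR 97425
```

A price like `"$20"` (no decimal point) would raise `ValueError` from `price.index(".")`, which is worth guarding against in the real function.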
105 changes: 105 additions & 0 deletions src/full_version.py
@@ -0,0 +1,105 @@
import json
import os
import pandas as pd
import scraper

class full_version:
def __init__(self):
self.data={}
self.name=""
self.email=""
self.user_data = os.path.join(
os.path.dirname(
os.path.dirname(
os.path.abspath(__file__))),
"json",
"user_data.json"
)
self.user_list = os.path.join(
os.path.dirname(
os.path.dirname(
os.path.abspath(__file__))),
"csvs",
"user_list.csv"
)
self.df=pd.DataFrame()
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 40)


def login(self):
if not os.path.exists(self.user_data):
print("Welcome to Slash!")
print("Please enter the following information: ")
name=input("Name: ")
email=input("Email: ")
self.data['name']=name
self.data['email']=email
with open(self.user_data, 'w') as outfile:
json.dump(self.data, outfile)
self.name=name
self.email=email
else:
with open(self.user_data) as json_file:
data = json.load(json_file)
self.name=data['name']
self.email=data['email']
return self.name, self.email

def search_fn(self):
prod=input("Enter name of product to Search: ")
self.scrape(prod)
ch=int(input("\n\nEnter 1 to save product to list \nelse enter any other key to continue"))
if ch==1:
indx=int(input("Enter row number of product to save: "))
if indx<len(self.df):
if os.path.exists(self.user_list):
old_data=pd.read_csv(self.user_list)
else:
old_data=pd.DataFrame()
if self.df.title[indx] not in old_data:
old_data=pd.concat([old_data,self.df.iloc[[indx]]])
print(self.df.iloc[[indx]])
old_data.to_csv(self.user_list, index=False,header=self.df.columns)

pass
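A sketch of the save-to-list step in `search_fn`: append the selected row to the saved set unless its title is already there. One caveat with the diff as written: `self.df.title[indx] not in old_data` tests membership against the DataFrame's column labels, not its values; checking against `old_data["title"].values`, as below, is a safer alternative:

```python
import pandas as pd

# old: previously saved products; new_row: the row the user selected.
old = pd.DataFrame({"title": ["philips hue"], "price": ["$19.99"]})
new_row = pd.DataFrame({"title": ["echo dot"], "price": ["$29.99"]})

# Compare against the column's values, not the frame's column labels.
if new_row.title[0] not in old["title"].values:
    old = pd.concat([old, new_row], ignore_index=True)

print(len(old))  # 2
```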

def extract_list(self):
if os.path.exists(self.user_list):
old_data=pd.read_csv(self.user_list)
print(old_data)
else:
print("No saved data found.")
pass

def scrape(self,prod):
products_1 = scraper.searchAmazon(prod,1)
products_2 = scraper.searchWalmart(prod,1)
products_3 = scraper.searchEtsy(prod,1)
results=scraper.driver(prod,df_flag=1)
# results = formatter.sortList(results, "ra", True)
self.df=pd.DataFrame.from_dict(results, orient='columns')
print(self.df)



def driver(self):
self.login()
flag_loop=1
print("Welcome ",self.name)
while flag_loop==1:
print("Select from following:")
print("1. Search new product\n2. See existing list\n3. Exit")
choice=int(input())
if choice==1:
self.search_fn()
elif choice==2:
self.extract_list()
elif choice==3:
print("Thank You for Using Slash")
flag_loop = 0
else:
print("Incorrect Option")
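The menu loop above calls `int(input())`, which raises `ValueError` on non-numeric input. A hedged sketch of the same loop with string comparison instead (the `read`/`write` parameters are illustrative, added here only to make the loop testable; the real driver dispatches to `search_fn` and `extract_list`):

```python
def menu_loop(read=input, write=print):
    # Same menu as full_version.driver, but comparing raw strings so a
    # non-numeric choice falls through to "Incorrect Option" instead of crashing.
    while True:
        write("Select from following:")
        write("1. Search new product\n2. See existing list\n3. Exit")
        choice = read()
        if choice == "3":
            write("Thank You for Using Slash")
            return
        elif choice not in ("1", "2"):
            write("Incorrect Option")
        # choices "1" and "2" would dispatch to search_fn / extract_list here

# Drive the loop with scripted input: one bad choice, then exit.
outputs = []
inputs = iter(["x", "3"])
menu_loop(read=lambda: next(inputs), write=outputs.append)
```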
