TextScrapper
TextScrapper is a tool for extracting and processing text data from web pages. It lets you collect and structure large amounts of information from the web without manual copying and pasting.
To use it, import the module:
from Webtrench import TextScrapper
The text_from_url function takes a URL as an argument and returns the text from the HTML content of that URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the text. Here's an example of how to use this function:
url = "https://www.example.com"
text = TextScrapper.text_from_url(url)
print(text)
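To make the parsing step easy to see in isolation, here is a minimal, dependency-free sketch of the extraction idea using only Python's built-in html.parser. In the real library, requests fetches the page and BeautifulSoup does this work; the helper name extract_text below is illustrative and not part of the TextScrapper API.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collect the text content of every node, ignoring the tags themselves."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    # Roughly what BeautifulSoup's get_text() does for a fetched page.
    parser = _TextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)

print(extract_text("<html><body><p>Hello <b>world</b></p></body></html>"))
# → Hello world
```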
The text_from_file function takes a file path as an argument and returns the text from the file. Here's an example of how to use this function:
file = "example.txt"
text = TextScrapper.text_from_file(file)
print(text)
The paragraph_from_url function takes a URL as an argument and returns the text of all paragraph (p) elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the paragraphs. Here's an example of how to use this function:
url = "https://www.example.com"
paragraphs = TextScrapper.paragraph_from_url(url)
print(paragraphs)
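The underlying idea, collecting the text of each p element, can be sketched with the standard library alone. The name extract_paragraphs is illustrative; in TextScrapper itself, BeautifulSoup's tag search does this after requests fetches the page.

```python
from html.parser import HTMLParser

class _ParagraphCollector(HTMLParser):
    """Collect the text inside each <p>...</p> pair."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.paragraphs.append("".join(self.current).strip())
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.current.append(data)

def extract_paragraphs(html: str) -> list:
    parser = _ParagraphCollector()
    parser.feed(html)
    return parser.paragraphs

print(extract_paragraphs("<p>One</p><div>skipped</div><p>Two</p>"))
# → ['One', 'Two']
```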
The link_from_url function takes a URL as an argument and returns all link (a) elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the links. Here's an example of how to use this function:
url = "https://www.example.com"
links = TextScrapper.link_from_url(url)
print(links)
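Link extraction boils down to reading the href attribute of each a tag. Here is a standard-library sketch of that idea; extract_links is an illustrative name, not part of the TextScrapper API.

```python
from html.parser import HTMLParser

class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def extract_links(html: str) -> list:
    parser = _LinkCollector()
    parser.feed(html)
    return parser.links

print(extract_links('<a href="/about">About</a> <a href="https://example.com">Home</a>'))
# → ['/about', 'https://example.com']
```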
The text_from_class function takes a URL and a class name as arguments and returns the text of the first HTML element with the specified class from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the text. Here's an example of how to use this function:
url = "https://www.example.com"
class_name = "example-class"
text = TextScrapper.text_from_class(url, class_name)
print(text)
The text_from_id function takes a URL and an id as arguments and returns the text of the first HTML element with the specified id from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the text. Here's an example of how to use this function:
url = "https://www.example.com"
id_name = "example-id"
text = TextScrapper.text_from_id(url, id_name)
print(text)
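Both class- and id-based lookup follow the same pattern: find the first element whose attribute matches, then collect the text inside it. A minimal standard-library sketch of that pattern (the name text_by_attr is illustrative, and void tags such as br are not handled):

```python
from html.parser import HTMLParser

class _AttrMatcher(HTMLParser):
    """Collect text inside the first element whose attribute equals a value."""
    def __init__(self, attr, value):
        super().__init__()
        self.attr, self.value = attr, value
        self.depth = 0        # > 0 while inside the matched element
        self.done = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1   # nested tag inside the match
        elif not self.done and dict(attrs).get(self.attr) == self.value:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)

def text_by_attr(html: str, attr: str, value: str) -> str:
    parser = _AttrMatcher(attr, value)
    parser.feed(html)
    return "".join(parser.text).strip()

print(text_by_attr('<div class="example-class">Hi <b>there</b></div>', "class", "example-class"))
# → Hi there
```

The same helper covers ids: text_by_attr(html, "id", "example-id").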
The all_headings_from_url function takes a URL as an argument and returns a list of all headings (h1, h2, h3, h4, h5, h6) from the HTML content of the URL, along with their respective tags. It uses the requests library to send a GET request to the URL, the BeautifulSoup library to parse the HTML content and extract the text, and the strip method to remove surrounding whitespace. If no headings are found, the function returns an empty list. In case of an exception, an error message is printed and the function returns None.
Here's an example of how to use this function:
url = "https://www.example.com"
headings = TextScrapper.all_headings_from_url(url)
print(headings)
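The described behavior, pairing each heading's tag with its stripped text and returning an empty list when no headings exist, can be sketched with the standard library. The name extract_headings is illustrative, not part of the TextScrapper API.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class _HeadingCollector(HTMLParser):
    """Collect (tag, text) pairs for every h1-h6 element."""
    def __init__(self):
        super().__init__()
        self.current_tag = None
        self.buf = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.current_tag, self.buf = tag, []

    def handle_endtag(self, tag):
        if tag == self.current_tag:
            # strip() mirrors the whitespace cleanup described above
            self.headings.append((tag, "".join(self.buf).strip()))
            self.current_tag = None

    def handle_data(self, data):
        if self.current_tag:
            self.buf.append(data)

def extract_headings(html: str) -> list:
    parser = _HeadingCollector()
    parser.feed(html)
    return parser.headings  # empty list when the page has no headings

print(extract_headings("<h1> Title </h1><p>body</p><h2>Section</h2>"))
# → [('h1', 'Title'), ('h2', 'Section')]
```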
The list_from_url function takes a URL as an argument and returns a list of all ul and ol elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL and the BeautifulSoup library to parse the HTML content and extract the elements. If no ul or ol elements are found, the function returns an empty list. In case of an exception, an error message is printed and the function returns None.
Here's an example of how to use this function:
url = "https://www.example.com"
lists = TextScrapper.list_from_url(url)
print(lists)
The list_item_from_url function takes a URL as an argument and returns a list of all li elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL and the BeautifulSoup library to parse the HTML content and extract the elements. If no li elements are found, the function returns an empty list. In case of an exception, an error message is printed and the function returns None.
Here's an example of how to use this function:
url = "https://www.example.com"
list_items = TextScrapper.list_item_from_url(url)
print(list_items)
The table_from_url function takes a URL as an argument and returns a list of all table elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the elements. Here's an example of how to use this function:
url = "https://www.example.com"
tables = TextScrapper.table_from_url(url)
print(tables)
The table_row_from_url function takes a URL as an argument and returns a list of all tr elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the elements. Here's an example of how to use this function:
url = "https://www.example.com"
table_rows = TextScrapper.table_row_from_url(url)
print(table_rows)
The table_data_from_url function takes a URL as an argument and returns a list of table data (td or th) elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the table data elements. Here's an example of how to use this function:
url = "https://www.example.com"
table_data = TextScrapper.table_data_from_url(url)
print(table_data)
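Cell extraction treats td and th the same way: open a cell, buffer its text, close it on the matching end tag. A standard-library sketch of that idea (extract_table_data is an illustrative name, not part of the TextScrapper API):

```python
from html.parser import HTMLParser

class _CellCollector(HTMLParser):
    """Collect the text of every <td> and <th> cell, in document order."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.buf = []
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell, self.buf = True, []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.in_cell:
            self.cells.append("".join(self.buf).strip())
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.buf.append(data)

def extract_table_data(html: str) -> list:
    parser = _CellCollector()
    parser.feed(html)
    return parser.cells

print(extract_table_data("<table><tr><th>Name</th><td>Ada</td></tr></table>"))
# → ['Name', 'Ada']
```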