TextScrapper

Nuhman Pk edited this page Feb 12, 2023 · 1 revision

Text Scraper

Text Scraper is a tool for extracting and processing text data from web pages. It allows you to easily collect and structure large amounts of information from the web, without the need for manual copying and pasting.

from Webtrench import TextScrapper

1. Text from URL :

This function takes a URL as an argument and returns the text from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the text. Here's an example of how to use this function:

url = "https://www.example.com"
text = TextScrapper.text_from_url(url)
print(text)
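
To make the parsing step concrete, here is a minimal sketch of pulling the visible text out of an HTML document. It is not Webtrench's implementation: the library uses requests and BeautifulSoup as described above, while this sketch swaps in the standard library's html.parser and an inline HTML string so it runs with no installs and no network access. The names TextCollector and text_from_html are hypothetical, and skipping script and style contents is a choice made for this sketch only.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_from_html(html):
    """Hypothetical helper: return the text nodes of html, one per line."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Inline HTML stands in for the body of an HTTP response.
sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><p>First paragraph.</p></body></html>"
)
print(text_from_html(sample))
```

In real use, the HTML string would come from the response body of the GET request rather than from a literal.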

2. Text from File :

This function takes a file path as an argument and returns the text from the file. Here's an example of how to use this function:

file = "example.txt"
text = TextScrapper.text_from_file(file)
print(text)

3. Paragraph from URL :

This function takes a URL as an argument and returns the text of all p (paragraph) elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the paragraphs. Here's an example of how to use this function:

url = "https://www.example.com"
paragraphs = TextScrapper.paragraph_from_url(url)
print(paragraphs)
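
The paragraph extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses the standard library's html.parser in place of BeautifulSoup so it runs offline, the names ParagraphCollector and paragraphs_from_html are hypothetical, and returning a list of strings is an assumption about the result shape.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Gathers the text inside each <p> element, including nested inline tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # list of text pieces while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly ends any paragraph that was left open.
            self._flush()
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def _flush(self):
        if self._current is not None:
            # Join the pieces and collapse runs of whitespace.
            self.paragraphs.append(" ".join("".join(self._current).split()))
            self._current = None


def paragraphs_from_html(html):
    """Hypothetical helper: return the text of every <p> as a list of strings."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()  # handle a trailing <p> with no closing tag
    return parser.paragraphs


sample = (
    "<body><h1>Title</h1><p>First <b>bold</b> paragraph.</p>"
    "<p>Second\n paragraph.</p></body>"
)
print(paragraphs_from_html(sample))
```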

4. Link from URL :

This function takes a URL as an argument and returns all a (anchor) elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the links. Here's an example of how to use this function:

url = "https://www.example.com"
links = TextScrapper.link_from_url(url)
print(links)
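
A common next step after collecting links is to read each href and resolve relative paths against the page URL. The sketch below shows one way to do that; it is an assumption about how you might post-process links, not a description of what link_from_url returns. It uses the standard library's html.parser and urllib.parse.urljoin, and the names LinkCollector and links_from_html are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Records the href of every <a> element, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # anchors without an href are skipped
            self.links.append(urljoin(self.base_url, href))


def links_from_html(html, base_url):
    """Hypothetical helper: return absolute URLs for all links in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<a href="/about">About</a> '
    '<a name="top">no href</a> '
    '<a href="https://other.example/x">X</a>'
)
print(links_from_html(sample, "https://www.example.com/docs/"))
```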

5. Text from Class :

This function takes a URL and a class name as arguments and returns the text of the first HTML element with the specified class from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the text. Here's an example of how to use this function:

url = "https://www.example.com"
class_name = "example-class"
text = TextScrapper.text_from_class(url, class_name)
print(text)
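
The idea of "first element with the specified class" can be sketched as below. This is a simplified stand-in, not Webtrench's implementation: it uses the standard library's html.parser instead of BeautifulSoup, assumes well-formed HTML with explicit closing tags, and returns None when nothing matches, which is a choice made for this sketch (the page does not say what text_from_class returns in that case). The names FirstByClass and text_from_class_in_html are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FirstByClass(HTMLParser):
    """Captures the text of the first element whose class list has class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.pieces = []
        self.found = False
        self._depth = 0  # > 0 while inside the matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif not self.found:
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.found = True
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.pieces.append(data)


def text_from_class_in_html(html, class_name):
    """Hypothetical helper: text of the first matching element, or None."""
    parser = FirstByClass(class_name)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.pieces).split())


sample = (
    '<div class="intro lead">Hello <em>there</em><img src="wave.png"> friend</div>'
    '<div class="intro">Second</div>'
)
print(text_from_class_in_html(sample, "intro"))
```

Matching on the split class list rather than the raw attribute string means an element with class="intro lead" matches "intro" but not "int". Looking up an element by id follows the same pattern with the id attribute compared as a whole.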

6. Text from ID :

This function takes a URL and an id name as arguments and returns the text of the first HTML element with the specified id from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the text. Here's an example of how to use this function:

url = "https://www.example.com"
id_name = "example-id"
text = TextScrapper.text_from_id(url, id_name)
print(text)

7. All Headings from URL :

This function takes a URL as an argument and returns a list of all headings (h1, h2, h3, h4, h5, h6) from the HTML content of the URL, each paired with its tag. It uses the requests library to send a GET request to the URL, the BeautifulSoup library to parse the HTML content and extract the text, and the strip method to remove leading and trailing whitespace. If no headings are found, the function returns an empty list. If an exception occurs, an error message is printed and the function returns None.

Here's an example of how to use this function:

url = "https://www.example.com"
headings = TextScrapper.all_headings_from_url(url)
print(headings)
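
The return contract described above (a list of headings, an empty list when there are none, None after a printed error) can be sketched like this. It is an illustration only: the parser is the standard library's html.parser rather than BeautifulSoup, the (tag, text) tuple format is an assumption about what "along with their respective tags" means, and fetch_html is a hypothetical callable standing in for the GET request so the example runs offline.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeadingCollector(HTMLParser):
    """Records (tag, text) for each heading, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._tag = None     # heading tag currently open, if any
        self._pieces = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._tag = tag
            self._pieces = []

    def handle_endtag(self, tag):
        if tag == self._tag:
            # strip() removes leading and trailing whitespace from the text
            self.headings.append((tag, "".join(self._pieces).strip()))
            self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._pieces.append(data)


def all_headings(fetch_html, url):
    """Return (tag, text) pairs, [] if there are none, or None on error."""
    try:
        parser = HeadingCollector()
        parser.feed(fetch_html(url))
        parser.close()
        return parser.headings
    except Exception as error:
        print(f"Error while extracting headings: {error}")
        return None


def fake_fetch(url):
    # Stand-in for a successful HTTP response body.
    return "<h1> Welcome </h1><p>Body</p><h2>Details</h2>"


def failing_fetch(url):
    # Stand-in for a request that fails.
    raise ConnectionError("network unreachable")


print(all_headings(fake_fetch, "https://www.example.com"))
print(all_headings(failing_fetch, "https://www.example.com"))
```

Because None and an empty list mean different things here, calling code should test the result with `is None` before iterating rather than relying on truthiness.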

8. List from URL :

This function takes a URL as an argument and returns a list of all ul and ol elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL and the BeautifulSoup library to parse the HTML content and extract the elements. If no ul or ol elements are found, the function returns an empty list. In case of any exception, an error message is printed and the function returns None.

Here's an example of how to use this function:

url = "https://www.example.com"
lists = TextScrapper.list_from_url(url)
print(lists)

9. List Item from URL :

This function takes a URL as an argument and returns a list of all li elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL and the BeautifulSoup library to parse the HTML content and extract the elements. If no li elements are found, the function returns an empty list. In case of any exception, an error message is printed and the function returns None.

Here's an example of how to use this function:

url = "https://www.example.com"
list_items = TextScrapper.list_item_from_url(url)
print(list_items)

10. Table from URL :

This function takes a URL as an argument and returns a list of all table elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the elements. Here's an example of how to use this function:

url = "https://www.example.com"
tables = TextScrapper.table_from_url(url)
print(tables)

11. Table Row from URL :

This function takes a URL as an argument and returns a list of all tr elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the elements. Here's an example of how to use this function:

url = "https://www.example.com"
table_rows = TextScrapper.table_row_from_url(url)
print(table_rows)

12. Table Data from URL :

This function takes a URL as an argument and returns a list of table data (td or th) elements from the HTML content of the URL. It uses the requests library to send a GET request to the URL, and the BeautifulSoup library to parse the HTML content and extract the table data elements. Here's an example of how to use this function:

url = "https://www.example.com"
table_data = TextScrapper.table_data_from_url(url)
print(table_data)
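
Table, row, and cell elements are usually most useful once regrouped into rows of cell text. The sketch below shows one way to do that regrouping; it is not what table_data_from_url returns, which per the description above is a list of td or th elements. It uses the standard library's html.parser so it runs offline, does not handle nested tables, and the names TableCollector and table_rows_from_html are hypothetical.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Builds one list of cell texts per <tr>; nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open
        self._cell = None  # text pieces of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows_from_html(html):
    """Hypothetical helper: return the table as a list of rows of strings."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = """
<table>
  <tr><th>Name</th><th>Stars</th></tr>
  <tr><td>alpha</td><td> 12 </td></tr>
  <tr><td>beta</td><td>7</td></tr>
</table>
"""
rows = table_rows_from_html(sample)

# Treat the first row as the header and turn the rest into dictionaries.
header, *body = rows
records = [dict(zip(header, row)) for row in body]
print(records)
```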