HTML Text Extractor

This application uses Google's Gemini AI to extract and clean English text from HTML sources. It accepts both URLs and local HTML files, which makes it suited to extracting content and preprocessing it before it is passed to other APIs.

Features

- Extract English text from HTML content using Gemini AI
- Process URLs directly
- Process local HTML files
- Intelligent text cleaning and formatting
- Easy integration with other APIs

Setup

Clone the repository
Install dependencies:
```
pip install -r requirements.txt
```
Copy .env.example to .env:
```
cp .env.example .env
```
Add your API keys to the .env file:
- Add your Google Gemini API key (get one from https://makersuite.google.com/app/apikey)
- Add any other API keys needed for your target API
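
Before running the extractor, it can help to confirm that the .env file actually contains the keys you expect. The following is a minimal sketch using only the standard library. The variable name `GEMINI_API_KEY` is an assumption for illustration; check .env.example for the names the project really uses.

```python
from pathlib import Path

# Hypothetical key name; .env.example lists the names the project actually expects.
REQUIRED_KEYS = ["GEMINI_API_KEY"]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="abc" and KEY=abc are treated alike.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(env_text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing keys:", missing_keys(text) or "none")
```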

Usage

Basic Usage

```
from html_text_extractor import HTMLTextExtractor

extractor = HTMLTextExtractor()

# Process a URL
text = extractor.process_url("https://example.com")
print(text)

# Process an HTML file
text = extractor.process_html_file("path/to/file.html")
print(text)
```

Integration with Other APIs

The extractor returns the extracted text, so its output can be passed directly as input to your target API.
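
As a rough sketch of that hand-off, the helper below turns extracted text into a JSON request body. The function name `build_payload`, the field names `text` and `truncated`, and the `max_chars` limit are all hypothetical; adapt them to whatever your target API expects.

```python
import json


def build_payload(text, max_chars=4000):
    """Collapse whitespace, truncate to max_chars, and wrap the text in a JSON body."""
    cleaned = " ".join(text.split())
    truncated = len(cleaned) > max_chars
    # Field names are placeholders for illustration, not a real API schema.
    return json.dumps({"text": cleaned[:max_chars], "truncated": truncated})
```

With the extractor from Basic Usage, this would be called as `build_payload(extractor.process_url("https://example.com"))`, and the resulting string sent with the HTTP client of your choice.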

Requirements

- Python 3.7+
- Google Gemini API key
- Internet connection for URL processing
- Additional API keys as needed for your target APIs

Error Handling

The application includes built-in error handling for:

- Invalid URLs
- File reading errors
- API communication issues
- HTML parsing errors
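
How the extractor reports these failures to the caller (a raised exception, an empty result, or a logged message) is not stated here, so calling code may still want its own guard. The wrapper below is a minimal sketch under that assumption; `safe_extract` is a hypothetical helper, not part of the project.

```python
def safe_extract(extract, source):
    """Call extract(source) and return (text, None), or (None, message) on failure."""
    try:
        return extract(source), None
    except Exception as exc:
        # Broad on purpose: the exception types the extractor may raise are not documented here.
        return None, f"{type(exc).__name__}: {exc}"
```

For example, `text, error = safe_extract(extractor.process_url, "https://example.com")` leaves `error` as `None` on success and holds a short description otherwise.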

License

This project is licensed under the MIT License - see the LICENSE file for details.
