Releases: revanthkalagudi/pdf-to-text-python
AI-Generated Content Detection v1.0
We are excited to announce the release of AI-Generated Content Detection v1.0. This tool analyzes PDF documents and estimates the percentage of AI-generated content in their text, so you can make informed decisions about the authenticity of that text.
Features
- Extract text from PDF: The tool uses the PyPDF2 library to extract text from each page of the PDF document.
- AI-generated content detection: Using the NLTK library, the tool compares the extracted text against the NLTK words corpus and reports the percentage of the text it classifies as AI-generated.
- Output to text file: The tool writes the extracted text to a separate text file, so you can review the processed content and analyze it further if desired.
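The extraction and output features above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names `extract_pages` and `write_pages` are hypothetical, and the code assumes PyPDF2 3.x, where `PdfReader` and `page.extract_text()` are available.

```python
from pathlib import Path


def extract_pages(pdf_path):
    """Return the extracted text of each page, in page order."""
    # Imported here so write_pages() below stays usable without PyPDF2.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 3.x API

    reader = PdfReader(pdf_path)
    # Fall back to an empty string for pages that yield no text.
    return [page.extract_text() or "" for page in reader.pages]


def write_pages(page_texts, out_path):
    """Write the page texts to one UTF-8 file, separated by blank lines."""
    text = "\n\n".join(page_texts)
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

With placeholder file names, `write_pages(extract_pages("input.pdf"), "output.txt")` would produce the text file described above.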
Getting Started
To get started with AI-Generated Content Detection v1.0, follow these steps:
- Install Python: Ensure that Python 3.x is installed on your system. If it is not, download and install the latest version from the official Python website (https://www.python.org/).
- Install dependencies: Open a command prompt or terminal and install the required libraries by running the following commands:
    pip install PyPDF2
    pip install nltk
- Download NLTK corpus: When you run the tool for the first time, NLTK will prompt you to download the words corpus. Follow the on-screen instructions to complete the download. The tool needs this corpus to perform AI-generated content detection.
- Prepare your PDF file: Place the PDF file you want to analyze in the same directory as the tool.
- Run the tool: Run the following command in the command prompt or terminal, replacing ai_content_detection.py with the actual filename of the tool:
    python ai_content_detection.py
- Review the results: The tool extracts the text from each page of the PDF, removes non-alphanumeric characters, converts the text to lowercase, and calculates the percentage of AI-generated content. The results, including the AI-generated content percentage, are displayed in the console.
- Analyze the output: The tool saves the extracted text to a text file. You can review this file to validate the detection results.
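The cleaning and scoring steps above might look something like the sketch below. The release notes do not state how the percentage is computed, so the scoring rule here (the share of words not found in the corpus) is an assumption, as are the function names `tokenize`, `load_vocabulary`, and `percent_flagged`. Replacing non-alphanumeric characters with spaces, rather than deleting them, is also a choice made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text, replace non-alphanumerics with spaces, split into words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return cleaned.split()


def load_vocabulary():
    """Load the NLTK words corpus as a lowercase set, downloading it if needed."""
    import nltk  # imported lazily; requires `pip install nltk`
    from nltk.corpus import words

    nltk.download("words", quiet=True)
    return {w.lower() for w in words.words()}


def percent_flagged(text, vocabulary):
    """Percentage of words NOT found in the vocabulary (assumed scoring rule)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    flagged = sum(1 for token in tokens if token not in vocabulary)
    return 100.0 * flagged / len(tokens)
```

For example, `percent_flagged("Hello, World! zzzq", {"hello", "world"})` flags one of three words, giving roughly 33.3. In practice the second argument would come from `load_vocabulary()`.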
Limitations
- Scanned documents: The tool assumes that the PDF file contains readable text. It may not provide accurate results for scanned documents or files consisting solely of images.
- Corpus accuracy: The accuracy of AI-generated content detection depends on the quality of the NLTK word corpus. Please note that false positives or false negatives may occur due to the limitations of the corpus.
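One way to notice the scanned-document case before trusting a result is to check which pages produced no extractable text. The helper below is a hypothetical sketch, not part of the tool: `pages_without_text` and its `min_chars` threshold are names chosen for this example, and it expects a list of per-page strings such as the one a PyPDF2-based extraction step would produce.

```python
def pages_without_text(page_texts, min_chars=1):
    """Return 1-based numbers of pages with fewer than min_chars visible characters."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Strip all whitespace so pages containing only blanks count as empty.
        visible = "".join((text or "").split())
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged
```

If every page is flagged, the PDF is probably image-only, and the reported percentage should not be relied on.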
Future Enhancements
In future releases, we plan to enhance AI-Generated Content Detection with the following features:
- Improved AI-generated content detection algorithms: We aim to refine the AI-generated content detection algorithms to increase accuracy and reduce false positives and false negatives.
- Support for additional file formats: While the current version supports PDF files, we plan to extend support to other popular file formats, such as Microsoft Word documents and plain text files.
- User-friendly interface: We intend to develop a graphical user interface (GUI) to make the tool more accessible and user-friendly, allowing users to analyze PDF documents without the need for command-line interaction.
Feedback and Contributions
We appreciate any feedback you have on AI-Generated Content Detection. If you encounter issues or have suggestions for improvement, please don't hesitate to reach out to our support team at support