Releases: jcassady/groq-qa-generator

v1.2.1

12 Oct 22:48

🚀 groq-qa-generator v1.2.1 Released 🚀

Announcing the release of groq-qa-generator v1.2.1, an update focused on enhancing the reliability of the QA generation process. This version addresses key issues and introduces improved logging for better clarity in managing train/test datasets. 🎯

Summary of Changes

🐛 Fixes:

  • Malformed QA Pair Handling:

    • QA pairs missing either a question or an answer are now properly filtered out during dataset creation, ensuring only valid pairs are included in the final output. This fix eliminates unexpected behavior related to malformed entries.
  • Accurate Dataset Saving:

    • Both train and test datasets are now consistently saved in either JSON or plain text format based on the selected configuration. This ensures train/test splits are correctly handled regardless of output format.
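The filtering rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `filter_valid_pairs` and the `question`/`answer` dictionary keys are assumptions made for the example.

```python
from typing import Dict, List


def filter_valid_pairs(pairs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only pairs that have both a non-empty question and a non-empty answer.

    Hypothetical helper: the key names "question" and "answer" are assumed here.
    """
    valid = []
    for pair in pairs:
        question = pair.get("question")
        answer = pair.get("answer")
        # A pair is malformed if either side is missing, not a string, or blank.
        if not isinstance(question, str) or not isinstance(answer, str):
            continue
        if not question.strip() or not answer.strip():
            continue
        valid.append(pair)
    return valid


if __name__ == "__main__":
    sample = [
        {"question": "Who did Alice follow?", "answer": "The White Rabbit."},
        {"question": "Missing answer?"},  # dropped: no answer key
        {"question": "", "answer": "Orphaned answer."},  # dropped: blank question
    ]
    print(len(filter_valid_pairs(sample)))
```

Filtering before the train/test split, rather than after, keeps the split sizes consistent with the number of pairs that are actually written to disk.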

🔧 Improvements:

  • Enhanced Logging:

    • Logging has been improved to provide clear visibility into file paths for the train and test datasets. Users can now easily trace where their datasets are saved, whether in JSON or text formats.
  • Refined Documentation:

    • Docstrings across the codebase have been updated to improve clarity and ease of understanding for developers working with the source code.

🔬 Testing Enhancements:

  • Test coverage has been strengthened to ensure malformed QA pairs are filtered out as expected, and the correct number of valid pairs is included in the saved datasets.

Update Instructions

To update to the latest version, run:

pip install groq-qa-generator --upgrade

For additional details on usage and features, please refer to the official repository.

Feedback, issue reports, and contributions to the project are appreciated.

v1.2.0

12 Oct 20:05

v1.2.0: Dataset Splitting, Enhanced CLI, and Hugging Face Integration 🚀

I'm thrilled to announce the release of v1.2.0 of the groq-qa-generator project! This update brings significant new features and improvements to enhance your QA pair processing, dataset creation, and model fine-tuning workflows. For additional usage details, check out the README.

New Features 🎉

Dataset Splitting ✂️

  • Flexible Dataset Ratios: You can now split your QA datasets at custom ratios beyond the default 80% train and 20% test split. Tailor your dataset splits to suit your specific needs with ease using the new --split CLI argument.
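To make the idea of a custom ratio concrete, here is a minimal sketch of a train/test split. It is an illustration only: `split_dataset`, its `train_ratio` parameter, and the optional `seed` are hypothetical names, and the project may compute its split differently.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    train_ratio: float = 0.8,
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T]]:
    """Split items into (train, test) lists at the given ratio.

    Hypothetical helper: when a seed is given the items are shuffled
    reproducibly first; otherwise the original order is kept.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    pool = list(items)
    if seed is not None:
        random.Random(seed).shuffle(pool)
    # round() avoids off-by-one results from floating-point products.
    cut = round(len(pool) * train_ratio)
    return pool[:cut], pool[cut:]


if __name__ == "__main__":
    train, test = split_dataset(list(range(10)), train_ratio=0.8)
    print(len(train), len(test))
```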

Command Line Interface Enhancements 🖥️

  • --split Argument: Specify your desired train/test split ratios directly from the CLI.
  • --upload Argument: Seamlessly upload your datasets to Hugging Face with the new --upload argument.

Enhanced Output Formatting ✨

  • Formatted QA Pair Display: The script now outputs QA pairs enclosed in neat ASCII boxes for better readability.
  2024-10-12 15:20:24 - root - INFO - Question #1:
  2024-10-12 15:20:24 - root - INFO - +------------------------------------------------------------------------------------------------------+
  2024-10-12 15:20:24 - root - INFO - | Q: What was Alice's initial reaction when she saw the White Rabbit take a watch out of its           |
  2024-10-12 15:20:24 - root - INFO - | waistcoat-pocket and hurry on?                                                                       |
  2024-10-12 15:20:24 - root - INFO - | ---------------------------------------------------------------------------------------------------- |
  2024-10-12 15:20:24 - root - INFO - | A: She was startled and her curiosity was piqued, prompting her to follow the Rabbit.                |
  2024-10-12 15:20:24 - root - INFO - +------------------------------------------------------------------------------------------------------+
  • ASCII Tables for QA Summary: When QA generation finishes, the script now prints an ASCII table summarizing the generated QA pairs, making the output easier to review.
2024-10-12 15:20:25 - root - INFO - +-------------------+
2024-10-12 15:20:25 - root - INFO - | Training QA Pairs |
2024-10-12 15:20:25 - root - INFO - +-------------------+
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
2024-10-12 15:20:25 - root - INFO - | #   | Question                                 | Answer                                   |
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
2024-10-12 15:20:25 - root - INFO - | 1   | What did Alice find on the three-legged  | A tiny golden key that might unlock one  |
2024-10-12 15:20:25 - root - INFO - |     | glass table that gave her hope of        | of the doors in the hall.                |
2024-10-12 15:20:25 - root - INFO - |     | escaping the hall?                       |                                          |
2024-10-12 15:20:25 - root - INFO - | --- | ---------------------------------------- | ---------------------------------------- |
2024-10-12 15:20:25 - root - INFO - | 2   | What was Alice's cautious approach to    | Alice decided to examine the bottle      |
2024-10-12 15:20:25 - root - INFO - |     | the mysterious bottle with the "DRINK    | carefully to ensure it wasn't marked     |
2024-10-12 15:20:25 - root - INFO - |     | ME" label?                               | "poison" before tasting its contents.    |
2024-10-12 15:20:25 - root - INFO - +-----+------------------------------------------+------------------------------------------+
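A boxed display like the one shown for individual QA pairs can be produced with the standard library alone. The sketch below is an assumption about how such a box might be built, not the project's implementation; `boxed_qa` and its `width` parameter are hypothetical.

```python
import textwrap


def boxed_qa(question: str, answer: str, width: int = 100) -> str:
    """Render a question and answer inside an ASCII box.

    Hypothetical helper: text is wrapped to `width` columns and padded so
    every row has the same length.
    """
    q_lines = textwrap.wrap(f"Q: {question}", width)
    a_lines = textwrap.wrap(f"A: {answer}", width)
    border = "+" + "-" * (width + 2) + "+"
    divider = "| " + "-" * width + " |"
    rows = [border]
    rows += [f"| {line.ljust(width)} |" for line in q_lines]
    rows.append(divider)
    rows += [f"| {line.ljust(width)} |" for line in a_lines]
    rows.append(border)
    return "\n".join(rows)


if __name__ == "__main__":
    print(boxed_qa("What did Alice find on the table?", "A tiny golden key.", width=40))
```

Each row could then be passed to a logger line by line, which would yield timestamped output similar to the log excerpts above.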

Hugging Face Integration 🤗

  • Upload to Hugging Face: Directly upload your processed datasets to Hugging Face, making sharing and collaborating on datasets easier than ever before.
  2024-10-12 15:20:25 - root - INFO - Uploading QA dataset to Hugging Face Hub.
  Creating parquet from Arrow format: 100%|██████████████████████████████████████████| 1/1 [00:00<00:00, 1283.05ba/s]
  Uploading the dataset shards: 100%|████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.31it/s]
  README.md: 100%|█████████████████████████████████████████████████████████████████| 398/398 [00:00<00:00, 2.58MB/s]
  2024-10-12 15:20:27 - root - INFO - Dataset uploaded to Hugging Face hub at https://huggingface.co/datasets/jcassady/test-dataset

Wrapping Up 🎉

I'm super excited about the new features in v1.2.0! With custom dataset splitting, an expanded CLI, cleaner output formatting, and seamless Hugging Face integration, generating and sharing QA datasets just got way cooler.

Don't wait—upgrade to the latest version and explore all the new perks. I'd love to hear what you think! Feel free to drop an issue or swing by with a pull request if you've got ideas or want to contribute.

Thanks for being awesome and supporting me on this journey!

Catch you later,

Jordan

jordan.cassady.me



v1.1.0

09 Oct 17:47

🎉 What's New in v1.1.0!

🚀 New Features

  • CLI Enhancement: The CLI now supports the brand-new --questions argument! 🎯
    You can now specify the exact number of question-answer pairs to generate per chunk of text, offering greater control over output. Whether you're generating questions for demos or fine-tuning, this new feature helps you tailor the output to your needs.
    PR by @jcassady in #3

Example usage:

groq-qa --questions 1
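For readers curious how a flag like this is typically wired up, here is a small argparse sketch. It is purely illustrative and does not reproduce the project's parser: `build_parser`, `positive_int`, and the `None` default are assumptions made for the example.

```python
import argparse
from typing import Optional, Sequence


def positive_int(value: str) -> int:
    """Parse a command-line value as an integer of at least 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be at least 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser exposing only the --questions flag."""
    parser = argparse.ArgumentParser(prog="groq-qa")
    parser.add_argument(
        "--questions",
        type=positive_int,
        default=None,  # assumption: fall back to a configured value when omitted
        help="number of question-answer pairs to generate per chunk of text",
    )
    return parser


def parse_questions(argv: Optional[Sequence[str]] = None) -> Optional[int]:
    """Return the requested number of questions, or None if the flag is absent."""
    return build_parser().parse_args(argv).questions


if __name__ == "__main__":
    print(parse_questions(["--questions", "1"]))
```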

🔧 Full Changelog:

This version now makes it easier to generate precise, customizable question-answer pairs right from the command line. Enjoy the new flexibility! ✨

v1.0.1

09 Oct 00:16

📦 Version 1.0.1

🔧 Improvements:

  • 🧹 Removed redundant logging handler cleanup code from config.py for a cleaner setup.
  • 🗑️ Removed unnecessary logging dependency from pyproject.toml to reduce complexity.
  • 🛠️ Moved the include option under [tool.poetry] in pyproject.toml to ensure the necessary files are properly packaged.

v1.0.0

07 Oct 03:46
1deaa08

Release Notes for groq-qa-generator v1.0.0

Overview

This is the initial public release of Groq QA Generator, a Python library for automating the creation of question-answer pairs from text. Designed to streamline the process of fine-tuning large language models (LLMs) such as LLaMA 3, this tool is ideal for generating high-quality QA datasets with minimal manual effort. It can be run from the command line (CLI) or imported directly into Python projects.

✨ Features

  • 🖥️ CLI and Python Library: Use groq-qa directly from the command line or as a library in your Python projects.
  • 🤖 Automated QA Generation: Automatically generate question-answer pairs from input text using powerful LLMs.
  • 📄 Prompt Templates: Flexible question generation enabled through customizable prompt templates.
  • 📊 Model Support: Integration with advanced models like LLaMA 3.1 70B via the Groq API for high-quality output.
  • ⚙️ Customizable Configuration: Configure the generator through a config.json file or programmatically for customized QA creation.

🚀 Installation

Install the package via PyPI:

pip install groq-qa-generator

📌 Usage Examples

  • CLI: Use the groq-qa command to generate QA pairs from text files with default or custom settings.
  • Python Library: Import groq_qa_generator in your Python code and call generate() to generate QA pairs programmatically.

📝 Notes

This release marks the beginning of Groq QA Generator's journey, providing essential features to help developers streamline the dataset creation process for model fine-tuning. Contributions and feedback are welcome.