TEGD_thirdai_hackathon

A NeuralDB-based Google Drive search engine

Installation

Whisper

!git clone https://github.com/ggerganov/whisper.cpp.git  
%cd /content/whisper.cpp/models  
!bash download-ggml-model.sh base.en  
%cd ..  
!make

ffmpeg (apt)

Requirements

pydub 
thirdai
thirdai[neural_db]
langchain
openai
paper-qa
requests
bs4
pandas

Directory Structure

- book_web_Scrapping: downloads the books from the web (NCERT)
- Thirdai_gdrive_engine_TEGD.ipynb: main script to run the search engine

Usage

To run the search engine:

Open the notebook in Google Colab.
Mount the drive by executing the very first cell:

from google.colab import drive
drive.mount('/content/drive')

NOTE:

The notebook asks for user input, because extracting and training on audio files is time-consuming.

Enter y in the cell if you want to train on the audio files in your Google Drive.

mp3_train = False
print("Do you want to fetch and train mp3 files too?")
print("It will take a lot of time. You can skip it for now")
if 'y' in input("Press 'y' to extract audio files else press 'n'").lower():
  mp3_train = True

Running the notebook Thirdai_gdrive_engine_TEGD.ipynb performs the following steps:

Mount the drive to the notebook
Get all the files in your G-Drive

Currently the engine handles pdf, docx, mp3 and wav files. Other file formats, such as source code files and mp4 files, will be added soon.
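As a rough illustration of the file-gathering step, the sketch below walks a folder and groups the supported formats by extension. The function name `collect_supported_files` and the `SUPPORTED` set are hypothetical and are not taken from the notebook.

```python
from pathlib import Path

# Extensions this README lists as currently supported.
SUPPORTED = {".pdf", ".docx", ".mp3", ".wav"}


def collect_supported_files(root):
    """Group every supported file under `root` by its lowercase extension."""
    grouped = {ext: [] for ext in SUPPORTED}
    for path in sorted(Path(root).rglob("*")):
        ext = path.suffix.lower()
        if path.is_file() and ext in grouped:
            grouped[ext].append(path)
    return grouped
```

On Colab, with the drive mounted at /content/drive as shown earlier, this could be called as `collect_supported_files("/content/drive/MyDrive")`.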

For processing audio files
- Download ggml-whisper which is a really fast version of whisper written in C/C++
- Convert an audio file with any format to wav (the only format currently supported by ggml-whisper)
- Convert the transcription to csv and save for model training later
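The audio steps above can be sketched as two command lines driven from Python. This is a minimal sketch, not the notebook's code: the helper names are hypothetical, the whisper.cpp binary is assumed to be called `main` inside /content/whisper.cpp (the name differs between whisper.cpp versions), and the transcript is assumed to be written next to the wav file with a `.csv` suffix. The ffmpeg options produce the 16 kHz mono 16-bit wav that whisper.cpp expects.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src, dst):
    """Build the ffmpeg command: any audio format -> 16 kHz mono 16-bit PCM wav."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(dst)]


def whisper_csv_cmd(wav, whisper_dir="/content/whisper.cpp"):
    """Build the whisper.cpp command that transcribes `wav` and writes a csv."""
    root = Path(whisper_dir)
    return [str(root / "main"),  # assumption: binary name after `make`
            "-m", str(root / "models" / "ggml-base.en.bin"),
            "-f", str(wav),
            "-ocsv"]


def transcribe(src, work_dir):
    """Convert `src` to wav, transcribe it, and return the assumed csv path."""
    wav = Path(work_dir) / (Path(src).stem + ".wav")
    subprocess.run(ffmpeg_to_wav_cmd(src, wav), check=True)
    subprocess.run(whisper_csv_cmd(wav), check=True)
    return Path(str(wav) + ".csv")  # assumption: output is <wav name>.csv
```

Keeping the command builders separate from `transcribe` makes it easy to print and inspect the exact commands before running them on a large drive.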
We store a map from each audio_csv file to its original location in the drive, so that when we query the engine, we can redirect the output to the original location.
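One simple way to keep such a map is a JSON file whose keys are transcript csv paths and whose values are the original audio paths. The sketch below assumes that layout; the function names and the JSON format are illustrative choices, not the notebook's actual storage.

```python
import json
from pathlib import Path


def save_audio_map(mapping, map_path):
    """Persist {transcript csv path: original audio path} as JSON."""
    Path(map_path).write_text(json.dumps(mapping, indent=2))


def load_audio_map(map_path):
    """Load the mapping back; return an empty dict if the file is missing."""
    path = Path(map_path)
    return json.loads(path.read_text()) if path.exists() else {}


def original_location(csv_path, mapping):
    """Return the original audio path for a transcript, or the path itself."""
    return mapping.get(str(csv_path), str(csv_path))
```

Falling back to the input path means that pdf and docx results, which need no redirection, pass through unchanged.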
Get the General_QnA model from the bazaar
Prepare insertable docs
- from pdfs
- docx
- mp3 (created earlier)
Search the query (from user input) to get the exact location of the file in the drive.
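The final lookup can be sketched as follows. It assumes a `db` object with a `search(query, top_k=...)` method whose results expose a `source` attribute holding the indexed file's path, which is how the thirdai NeuralDB interface is understood to behave; treat that as an assumption and check it against the installed version. The `locate` helper and the `audio_map` dictionary (transcript csv path to original audio path) are hypothetical.

```python
def locate(db, query, audio_map, top_k=3):
    """Return the original Drive paths of the top hits, without duplicates."""
    hits = []
    for result in db.search(query, top_k=top_k):
        source = str(result.source)
        # Transcript csv files are redirected to the audio file they came from.
        location = audio_map.get(source, source)
        if location not in hits:
            hits.append(location)
    return hits
```

Because `locate` only relies on `search` and `source`, it can be tried with a small stand-in object before a real model is trained.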

DATA Link

https://drive.google.com/drive/folders/1tNJ-r6-URDAcU0VOZJ_h50hs4PrO5EkN?usp=sharing

Try Our Demo

DEMO

Note: We are not submitting the model, as our main motto is to keep data private. You can train the model on your own personal data.

#POCKETLLM

git-siddhesh/TEGD_thirdai_hackathon
