Add comments Argilla - David Berenstein #119

davidberenstein1957 · 2024-08-01T05:11:50Z

No description provided.

dagshub · 2024-08-01T05:11:53Z

Join the discussion on DagsHub!

review-notebook-app · 2024-08-01T05:11:55Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

davidberenstein1957 · 2024-08-01T07:36:21Z

@alex-zenml

* add embeddings finetuning quick and dirty code
* formatting
* add embeddings finetuning pipeline
* refactor
* works for multi-gpu setup
* update gitignore
* scratch test in notebook
* pipeline is working again
* no need for custom class
* visualize performance comparison
* compare apples to apples
* fix viz bug and strip out code / logs examples
* update print statement
* mini updates
* add dummy embedding pipeline to compare against
* updates to collate
* add dummy pipeline and refactor
* new pipelines
* run locally, fix attribute naming error
* full run
* next experiment
* Add ollama ignores to .gitignore
* add retries to question generation
* generate 3 sets of questions
* handle multiple question generation
* add retries for question generation function
* more robustness around failures
* formatting
* fix dataloader splitting (on newline)
* inspect test dataloader construction WIP
* split chunks by headers
* strip out boilerplate at top of markdown docs
* refactor & fix chunking logic
* add artifact metadata logging
* fix annotations and artifact metadata logging
* update metadata
* update logging statements
* add time estimates to logging
* log estimated completion time
* update log statements
* add rate for logging
* Update generation rate message
* fix eval
* limited run + limited eval
* full eval
* distilroberta as base model
* smaller batch size for a larger base model
* fix eval mismatch and chunking
* chore: Update .gitignore to exclude .flashrank_cache file
* chore: Add notebook for argilla
* chore: Add .gitignore entry for bge-base-financial-matryoshka
* Update sentence-transformers to version 3 and add transformers dependency
* add data and get finetuning working
* chore: Add .gitignore entry for embeddings
* Update zenml dependency to version 0.58.2 with server support
* chore: Update dependencies and add .gitignore entry for embeddings
* Add comments Argilla - David Berenstein (#119)
* Add introduction to argilla_embeddings
* Update context for data and synthetic query generation
* Added argilla data section
* embeddings updates
* update gitignore
* format
* add push to HF imports correctly
* remove extra connection cell
* add final step to pipeline as active
* fix typo
* updates
* Feature/finetune embeddings (#120)
* Add introduction to argilla_embeddings
* Update context for data and synthetic query generation
* Added argilla data section
* Update Argilla code
* Remove cell outputs
* Remove secrets
* Remove duplicate Argilla section
* Update dataset name
* Update phrasing Argilla
* Updatge merge conflicts
* fixes for dataset generation
* Update vector naming (#121)
* Add introduction to argilla_embeddings
* Update context for data and synthetic query generation
* Added argilla data section
* Update Argilla code
* Remove cell outputs
* Remove secrets
* Remove duplicate Argilla section
* Update dataset name
* Update phrasing Argilla
* Updatge merge conflicts
* Update vector naming
* working argilla upload
* fix argilla upload
* add wandb requirement
* add finetuning step to pipeline
* ignore local model files
* working finetune step
* finetune step pushes to the hub now
* update gitignore
* update pipeline
* remove wandb tracking
* broken version
* remove automatic wandb tracking
* return results
* try with larger base model
* add new model to gitignore
* use snowflake model
* add model registration
* remove constant
* add model throughout both pipelines
* finetuning steps in logical order now
* add new constant
* update model version
* use constant for model name
* various improvements :)
* add argilla flywheel functionality
* refactor out a constant for the dataset name
* add license
* more small changes
* Various improvements (#123)
* Resolve argilla client default workspace warning
* Add constant for other pipeline steps
* Add constant for finetune_embeddings
* Remove step embeddings constants
* import contants in pipeline directly
* Remove constants from distilabel_generation
* Add workflow for basic dataset filtering
* Add updated plots

---------

Co-authored-by: Alex Strick van Linschoten <[email protected]>

* format and remove unused imports
* fix push step
* fixed finetuning step
* use github repo to install for latest changes
* update README
* Delete llm-complete-guide/__init__.py
* Update llm-complete-guide/requirements.txt
* update README
* format and add results visualization
* remove extra comma
* add visualization step
* visualization corrections
* make distilabel imports clearer
* update README docs
* address TODO comment
* credit phil
* fix typo
* add actual values to chart
* add link to LLMOps guide
* make visualization a bit nicer
* update main README

---------

Co-authored-by: David Berenstein <[email protected]>

Add introduction to argilla_embeddings

225b310

davidberenstein1957 changed the title ~~Add comment Argilla - David Berenstein~~ Add comments Argilla - David Berenstein Aug 1, 2024

davidberenstein1957 added 2 commits August 1, 2024 08:00

Update context for data and synthetic query generation

431dfd3

Added argilla data section

7066d6e

strickvl merged commit caf0dca into zenml-io:feature/finetune-embeddings Aug 1, 2024
2 of 3 checks passed
