From a5d453110f730d7b48f41e4dd433e75d9150a430 Mon Sep 17 00:00:00 2001 From: Safoine El Khabich <34200873+safoinme@users.noreply.github.com> Date: Fri, 8 Nov 2024 14:45:57 +0100 Subject: [PATCH] Fix spelling errors (#3181) * Fix spelling errors in documentation and code comments * Auto-update of Starter template --------- Co-authored-by: GitHub Actions --- .../data-validators/data-validators.md | 2 +- .../data-validators/deepchecks.md | 2 +- .../data-validators/evidently.md | 4 ++-- .../deploying-finetuned-models.md | 24 +++++++++---------- .../evaluation-for-finetuning.md | 24 +++++++++---------- .../finetuning-with-accelerate.md | 8 +++---- .../starter-choices-for-finetuning-llms.md | 4 ++-- .../data_validators/base_data_validator.py | 4 ++-- .../deepchecks_data_validator.py | 2 +- 9 files changed, 37 insertions(+), 37 deletions(-) diff --git a/docs/book/component-guide/data-validators/data-validators.md b/docs/book/component-guide/data-validators/data-validators.md index 70ba1de73b0..80192f4ae72 100644 --- a/docs/book/component-guide/data-validators/data-validators.md +++ b/docs/book/component-guide/data-validators/data-validators.md @@ -23,7 +23,7 @@ Related concepts: * early on, even if it's just to keep a log of the quality state of your data and the performance of your models at different stages of development. * if you have pipelines that regularly ingest new data, you should use data validation to run regular data integrity checks to signal problems before they are propagated downstream. * in continuous training pipelines, you should use data validation techniques to compare new training data against a data reference and to compare the performance of newly trained models against previous ones. -* when you have pipelines that automate batch inference or if you regularly collect data used as input in online inference, you should use data validation to run data drift analyses and detect training-serving skew, data drift and model drift. +* when you have pipelines that automate batch inference or if you regularly collect data used as input in online inference, you should use data validation to run data drift analyses and detect training-serving skew, data drift and model drift. #### Data Validator Flavors diff --git a/docs/book/component-guide/data-validators/deepchecks.md b/docs/book/component-guide/data-validators/deepchecks.md index 2fd2914c6e6..b24d827f0b5 100644 --- a/docs/book/component-guide/data-validators/deepchecks.md +++ b/docs/book/component-guide/data-validators/deepchecks.md @@ -10,7 +10,7 @@ The Deepchecks [Data Validator](./data-validators.md) flavor provided with the Z ### When would you want to use it? -[Deepchecks](https://deepchecks.com/) is an open-source library that you can use to run a variety of data and model validation tests, from data integrity tests that work with a single dataset to model evaluation tests to data drift analyses and model performance comparison tests. All this can be done with minimal configuration input from the user, or customized with specialized conditions that the validation tests should perform. +[Deepchecks](https://deepchecks.com/) is an open-source library that you can use to run a variety of data and model validation tests, from data integrity tests that work with a single dataset to model evaluation tests to data drift analyses and model performance comparison tests. All this can be done with minimal configuration input from the user, or customized with specialized conditions that the validation tests should perform.
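To make the Deepchecks description above concrete, here is a minimal sketch of running the library's tabular data integrity suite on its own, outside of any ZenML step; the DataFrame contents and the `label` column name are illustrative placeholders rather than anything taken from the docs in this patch.

```python
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import data_integrity

# Illustrative data; in practice this would be your own DataFrame.
df = pd.DataFrame(
    {"age": [25, 32, 47, 51], "income": [40_000, 55_000, 80_000, 62_000], "label": [0, 1, 1, 0]}
)

# Wrap the DataFrame so Deepchecks knows which column is the label.
dataset = Dataset(df, label="label", cat_features=[])

# Run the built-in data integrity suite and export the results for inspection.
result = data_integrity().run(dataset)
result.save_as_html("data_integrity_report.html")
```

The Data Validator flavor described in these docs is what lets checks like this run as part of a pipeline instead of an ad-hoc script.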
Deepchecks works with both tabular data and computer vision data. For tabular, the supported dataset format is `pandas.DataFrame` and the supported model format is `sklearn.base.ClassifierMixin`. For computer vision, the supported dataset format is `torch.utils.data.dataloader.DataLoader` and supported model format is `torch.nn.Module`. diff --git a/docs/book/component-guide/data-validators/evidently.md b/docs/book/component-guide/data-validators/evidently.md index c80e048ef11..f48f70edb50 100644 --- a/docs/book/component-guide/data-validators/evidently.md +++ b/docs/book/component-guide/data-validators/evidently.md @@ -6,7 +6,7 @@ description: >- # Evidently -The Evidently [Data Validator](./data-validators.md) flavor provided with the ZenML integration uses [Evidently](https://evidentlyai.com/) to perform data quality, data drift, model drift and model performance analyses, to generate reports and run checks. The reports and check results can be used to implement automated corrective actions in your pipelines or to render interactive representations for further visual interpretation, evaluation and documentation. +The Evidently [Data Validator](./data-validators.md) flavor provided with the ZenML integration uses [Evidently](https://evidentlyai.com/) to perform data quality, data drift, model drift and model performance analyses, to generate reports and run checks. The reports and check results can be used to implement automated corrective actions in your pipelines or to render interactive representations for further visual interpretation, evaluation and documentation. ### When would you want to use it? @@ -47,7 +47,7 @@ zenml stack register custom_stack -dv evidently_data_validator ... --set Evidently's profiling functions take in a `pandas.DataFrame` dataset or a pair of datasets and generate results in the form of a `Report` object. -One of Evidently's notable characteristics is that it only requires datasets as input. Even when running model performance comparison analyses, no model needs to be present. However, that does mean that the input data needs to include additional `target` and `prediction` columns for some profiling reports and, you have to include additional information about the dataset columns in the form of [column mappings](https://docs.evidentlyai.com/user-guide/tests-and-reports/column-mapping). Depending on how your data is structured, you may also need to include additional steps in your pipeline before the data validation step to insert the additional `target` and `prediction` columns into your data. This may also require interacting with one or more models. +One of Evidently's notable characteristics is that it only requires datasets as input. Even when running model performance comparison analyses, no model needs to be present. However, that does mean that the input data needs to include additional `target` and `prediction` columns for some profiling reports, and you have to include additional information about the dataset columns in the form of [column mappings](https://docs.evidentlyai.com/user-guide/tests-and-reports/column-mapping). Depending on how your data is structured, you may also need to include additional steps in your pipeline before the data validation step to insert the additional `target` and `prediction` columns into your data. This may also require interacting with one or more models.
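As a rough illustration of the preparatory step mentioned above, the sketch below appends `target` and `prediction` columns to a DataFrame before it reaches the Evidently report step; the step name, label column, and scikit-learn classifier are assumptions made for the example, not part of the Evidently or ZenML APIs being documented here.

```python
import pandas as pd
from sklearn.base import ClassifierMixin
from zenml import step


@step
def add_target_and_prediction_columns(
    df: pd.DataFrame,
    model: ClassifierMixin,
    label_column: str = "label",  # assumed column name; adjust to your dataset
) -> pd.DataFrame:
    """Append the `target` and `prediction` columns that some Evidently reports expect."""
    enriched = df.copy()
    # Keep only feature columns for scoring; the label becomes the `target` column.
    features = enriched.drop(columns=[label_column])
    enriched["target"] = enriched[label_column]
    enriched["prediction"] = model.predict(features)
    return enriched
```

A step like this would sit between your data loading step and the Evidently report step, together with a column mapping that points Evidently at the new columns.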
There are three ways you can use Evidently to generate data reports in your ZenML pipelines that allow different levels of flexibility: diff --git a/docs/book/user-guide/llmops-guide/finetuning-llms/deploying-finetuned-models.md b/docs/book/user-guide/llmops-guide/finetuning-llms/deploying-finetuned-models.md index 6afa88d46c4..c16b515cfd7 100644 --- a/docs/book/user-guide/llmops-guide/finetuning-llms/deploying-finetuned-models.md +++ b/docs/book/user-guide/llmops-guide/finetuning-llms/deploying-finetuned-models.md @@ -14,23 +14,23 @@ various factors that influence the deployment process. One of the primary considerations is the memory and machine requirements for your finetuned model. LLMs are typically resource-intensive, requiring substantial RAM, processing power -and specialised hardware. This choice of hardware can significantly impact both +and specialized hardware. This choice of hardware can significantly impact both performance and cost, so it's crucial to strike the right balance based on your specific use case. Real-time considerations play a vital role in deployment planning, especially for applications that require immediate responses. This includes preparing for potential failover scenarios if your finetuned model encounters issues, -conducting thorough benchmarks and load testing, and modelling expected user +conducting thorough benchmarks and load testing, and modeling expected user load and usage patterns. Additionally, you'll need to decide between streaming and non-streaming approaches, each with its own set of trade-offs in terms of -latency and resource utilisation. +latency and resource utilization. -Optimisation techniques, such as quantisation, can help reduce the resource -footprint of your model. However, these optimisations often come with additional +Optimization techniques, such as quantization, can help reduce the resource +footprint of your model. However, these optimizations often come with additional steps in your workflow and require careful evaluation to ensure they don't negatively impact model performance. [Rigorous evaluation](./evaluation-for-finetuning.md) -becomes crucial in quantifying the extent to which you can optimise without +becomes crucial in quantifying the extent to which you can optimize without compromising accuracy or functionality. ## Deployment Options and Trade-offs When it comes to deploying your finetuned LLM, several options are available, each with its own set of advantages and challenges: 1. **Roll Your Own**: This approach involves setting up and managing your own - infrastructure. While it offers the most control and customisation, it also + infrastructure. While it offers the most control and customization, it also requires expertise and resources to maintain. For this, you'd usually create some kind of Docker-based service (a FastAPI endpoint, for example) and deploy this on your infrastructure, with you taking care of all 2. **Serverless Options**: be aware of the "cold start" phenomenon, which can introduce latency for infrequently accessed models. 3. **Always-On Options**: These deployments keep your model constantly running - and ready to serve requests. While this approach minimises latency, it can be + and ready to serve requests. While this approach minimizes latency, it can be more costly as you're paying for resources even during idle periods. 4. **Fully Managed Solutions**: Many cloud providers and AI platforms offer managed services for deploying LLMs.
These solutions can simplify the @@ -177,14 +177,14 @@ crucial. Key areas to watch include: 2. **Latency Metrics**: Monitor response times to ensure they meet your application's requirements. 3. **Load and Usage Patterns**: Keep an eye on how users interact with your model - to inform scaling decisions and potential optimisations. -4. **Data Analysis**: Regularly analyse the inputs and outputs of your model to + to inform scaling decisions and potential optimizations. +4. **Data Analysis**: Regularly analyze the inputs and outputs of your model to identify trends, potential biases, or areas for improvement. It's also important to consider privacy and security when capturing and logging responses. Ensure that your logging practices comply with relevant data -protection regulations and your organisation's privacy policies. +protection regulations and your organization's privacy policies. By carefully considering these deployment options and maintaining vigilant monitoring practices, you can ensure that your finetuned LLM performs optimally -and continues to meet the needs of your users and organisation. +and continues to meet the needs of your users and organization. diff --git a/docs/book/user-guide/llmops-guide/finetuning-llms/evaluation-for-finetuning.md b/docs/book/user-guide/llmops-guide/finetuning-llms/evaluation-for-finetuning.md index e3c33dd1c82..c2fc7753b82 100644 --- a/docs/book/user-guide/llmops-guide/finetuning-llms/evaluation-for-finetuning.md +++ b/docs/book/user-guide/llmops-guide/finetuning-llms/evaluation-for-finetuning.md @@ -12,9 +12,9 @@ The motivation for implementing thorough evals is similar to that of unit tests 1. **Prevent Regressions**: Ensure that new iterations or changes don't negatively impact existing functionality. -2. **Track Improvements**: Quantify and visualise how your model improves with each iteration or finetuning session. +2. **Track Improvements**: Quantify and visualize how your model improves with each iteration or finetuning session. -3. **Ensure Safety and Robustness**: Given the complex nature of LLMs, comprehensive evals help identify and mitigate potential risks, biases, or unexpected behaviours. +3. **Ensure Safety and Robustness**: Given the complex nature of LLMs, comprehensive evals help identify and mitigate potential risks, biases, or unexpected behaviors. By implementing a robust evaluation strategy, you can develop more reliable, performant, and safe finetuned LLMs while maintaining a clear picture of your model's capabilities and limitations throughout the development process. @@ -38,12 +38,12 @@ finetuning use case. The main distinction here is that we are not looking to evaluate retrieval, but rather the performance of the finetuned model (i.e. [the generation part](../evaluation/generation.md)). -Custom evals are tailored to your specific use case and can be categorised into two main types: +Custom evals are tailored to your specific use case and can be categorized into two main types: 1. **Success Modes**: These evals focus on things you want to see in your model's output, such as: - Correct formatting - Appropriate responses to specific prompts - - Desired behaviour in edge cases + - Desired behavior in edge cases 2.
**Failure Modes**: These evals target things you don't want to see, including: - Hallucinations (generating false or nonsensical information) @@ -59,7 +59,7 @@ from my_library import query_llm good_responses = { "what are the best salads available at the food court?": ["caesar", "italian"], - "how late is the shopping centre open until?": ["10pm", "22:00", "ten"] + "how late is the shopping center open until?": ["10pm", "22:00", "ten"] } for question, answers in good_responses.items(): @@ -67,7 +67,7 @@ for question, answers in good_responses.items(): assert any(answer in llm_response for answer in answers), f"Response does not contain any of the expected answers: {answers}" bad_responses = { - "who is the manager of the shopping centre?": ["tom hanks", "spiderman"] + "who is the manager of the shopping center?": ["tom hanks", "spiderman"] } for question, answers in bad_responses.items(): @@ -77,15 +77,15 @@ for question, answers in bad_responses.items(): You can see how you might want to expand this out to cover more examples and more failure modes, but this is a good start. As you continue in the work of iterating on your model and performing more tests, you can update these cases with known failure modes (and/or with obvious success modes that your use case must always work for). -### Generalised Evals and Frameworks +### Generalized Evals and Frameworks -Generalised evals and frameworks provide a structured approach to evaluating your finetuned LLM. They offer: +Generalized evals and frameworks provide a structured approach to evaluating your finetuned LLM. They offer: -- Assistance in organising and structuring your evals -Standardised evaluation metrics for common tasks +- Assistance in organizing and structuring your evals - Standardized evaluation metrics for common tasks - Insights into the model's overall performance -When using generalised evals, it's important to consider their limitations and caveats. While they provide valuable insights, they should be complemented with custom evals tailored to your specific use case. Some possible options for you to check out include: +When using generalized evals, it's important to consider their limitations and caveats. While they provide valuable insights, they should be complemented with custom evals tailored to your specific use case. Some possible options for you to check out include: - [prodigy-evaluate](https://github.com/explosion/prodigy-evaluate?tab=readme-ov-file) - [ragas](https://docs.ragas.io/en/stable/getstarted/monitoring.html) @@ -112,7 +112,7 @@ As part of this, implementing comprehensive logging from the early stages of dev Alongside collecting the raw data and viewing it periodically, creating simple dashboards that display core metrics reflecting your model's performance is an -effective way to visualise and monitor progress. These metrics should align with +effective way to visualize and monitor progress. These metrics should align with your iteration goals and capture improvements over time, allowing you to quickly assess the impact of changes and identify areas that require attention.
Again, as with everything else, don't let perfect be the enemy of the good; a simple diff --git a/docs/book/user-guide/llmops-guide/finetuning-llms/finetuning-with-accelerate.md b/docs/book/user-guide/llmops-guide/finetuning-llms/finetuning-with-accelerate.md index 3ad07632ffb..6f995f7439d 100644 --- a/docs/book/user-guide/llmops-guide/finetuning-llms/finetuning-with-accelerate.md +++ b/docs/book/user-guide/llmops-guide/finetuning-llms/finetuning-with-accelerate.md @@ -46,7 +46,7 @@ smaller size (e.g. one of the Llama 3.1 family at the ~8B parameter mark) and then iterate on that. This will allow you to quickly run through a number of experiments and see how the model performs on your use case. -In this early stage, experimentation is important. Accordingly, any way you can maximise the number of experiments you can run will help increase the amount you can learn. So we want to minimise the amount of time it takes to iterate to a new experiment. Depending on the precise details of what you do, you might iterate on your data, on some hyperparameters of the finetuning process, or you might even try out different use case options. +In this early stage, experimentation is important. Accordingly, any way you can maximize the number of experiments you can run will help increase the amount you can learn. So we want to minimize the amount of time it takes to iterate to a new experiment. Depending on the precise details of what you do, you might iterate on your data, on some hyperparameters of the finetuning process, or you might even try out different use case options. ## Implementation details @@ -190,15 +190,15 @@ components for distributed training. For more details, see the [Accelerate docum ## Dataset iteration -While these stages offer lots of surface area for intervention and customisation, the most significant thing to be careful with is the data that you input into the model. If you find that your finetuned model offers worse performance than the base, or if you get garbled output post-fine tuning, this would be a strong indicator that you have not correctly formatted your input data, or something is mismatched with the tokeniser and so on. To combat this, be sure to inspect your data at all stages of the process! +While these stages offer lots of surface area for intervention and customization, the most significant thing to be careful with is the data that you input into the model. If you find that your finetuned model offers worse performance than the base, or if you get garbled output post-finetuning, this would be a strong indicator that you have not correctly formatted your input data, or something is mismatched with the tokenizer and so on. To combat this, be sure to inspect your data at all stages of the process! -The main behaviour and activity while using this notebook should be around being +The main behavior and activity while using this notebook should be around being more serious about your data. If you are finding that you're on the low end of the spectrum, consider ways to either supplement that data or to synthetically generate data that could be substituted in. You should also start to think about evaluations at this stage (see [the next guide](./evaluation-for-finetuning.md) for more) since the changes you will likely want to measure how well your model is doing, -especially when you make changes and customisations. Once you have some basic +especially when you make changes and customizations.
Once you have some basic evaluations up and running, you can then start thinking through all the optimal parameters and measuring whether these updates are actually doing what you think they will. diff --git a/docs/book/user-guide/llmops-guide/finetuning-llms/starter-choices-for-finetuning-llms.md b/docs/book/user-guide/llmops-guide/finetuning-llms/starter-choices-for-finetuning-llms.md index d23b3798cf2..b0e5de4ebc2 100644 --- a/docs/book/user-guide/llmops-guide/finetuning-llms/starter-choices-for-finetuning-llms.md +++ b/docs/book/user-guide/llmops-guide/finetuning-llms/starter-choices-for-finetuning-llms.md @@ -42,11 +42,11 @@ In general, try to pick something that is small and self-contained, ideally the For example, a general use case of "answer all customer support emails" is almost certainly too vague, whereas something like "triage incoming customer support queries and extract relevant information as per some pre-defined checklist or schema" is much more realistic. -It's also worth picking something where you can reach some sort of answer as to whether this the right approach in a short amount of time. If your use case depends on the generation or annotation of lots of data, or organisation and sorting of pre-existing data, this is less of an ideal starter project than if you have data that already exists within your organisation and that you can repurpose here. +It's also worth picking something where you can reach some sort of answer as to whether this is the right approach in a short amount of time. If your use case depends on the generation or annotation of lots of data, or organization and sorting of pre-existing data, this is less of an ideal starter project than if you have data that already exists within your organization and that you can repurpose here. ## Picking data for your use case -The data needed for your use case will follow directly from the specific use case you're choosing, but ideally it should be something that is already *mostly* in the direction of what you need. It will take time to annotate and manually transform data if it is too distinct from the specific use case you want to use, so try to minimise this as much as you possibly can. +The data needed for your use case will follow directly from the specific use case you're choosing, but ideally it should be something that is already *mostly* in the direction of what you need. It will take time to annotate and manually transform data if it is too distinct from the specific use case you want to use, so try to minimize this as much as you possibly can. A couple of examples of where you might be able to reuse pre-existing data: diff --git a/src/zenml/data_validators/base_data_validator.py b/src/zenml/data_validators/base_data_validator.py index 53a4ac669c4..0a62b2ed142 100644 --- a/src/zenml/data_validators/base_data_validator.py +++ b/src/zenml/data_validators/base_data_validator.py @@ -175,7 +175,7 @@ def model_validation( This method should be implemented by data validators that support running model validation checks (e.g. confusion matrix validation, - performance reports, model error analyses, etc). + performance reports, model error analyses, etc). Unlike `data_validation`, model validation checks require that a model be present as an active component during the validation process.
@@ -184,7 +184,7 @@ def model_validation( accommodate different categories of data validation tests, e.g.: * single dataset tests: confusion matrix validation, -performance reports, model error analyses, etc +performance reports, model error analyses, etc * model comparison tests: tests that identify changes in a model behavior by comparing how it performs on two different datasets. diff --git a/src/zenml/integrations/deepchecks/data_validators/deepchecks_data_validator.py b/src/zenml/integrations/deepchecks/data_validators/deepchecks_data_validator.py index 94aa67cf65f..60dfccc8d51 100644 --- a/src/zenml/integrations/deepchecks/data_validators/deepchecks_data_validator.py +++ b/src/zenml/integrations/deepchecks/data_validators/deepchecks_data_validator.py @@ -430,7 +430,7 @@ def model_validation( """Run one or more Deepchecks model validation checks. Call this method to perform model validation checks (e.g. confusion - matrix validation, performance reports, model error analyses, etc). + matrix validation, performance reports, model error analyses, etc). A second dataset is required for model performance comparison tests (i.e. tests that identify changes in a model behavior by comparing how it performs on two different datasets).
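For context on the docstrings above, here is a minimal sketch of the kind of two-dataset model validation that the underlying Deepchecks library performs; the toy DataFrames, `label` column, and random forest classifier are placeholders and not part of the ZenML integration code touched by this patch.

```python
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import model_evaluation
from sklearn.ensemble import RandomForestClassifier

# Placeholder data; substitute your own train/test splits and trained model.
train_df = pd.DataFrame(
    {"x1": [1, 2, 3, 4, 5, 6], "x2": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2], "label": [0, 0, 1, 1, 0, 1]}
)
test_df = pd.DataFrame({"x1": [2, 5, 3], "x2": [0.2, 0.8, 0.4], "label": [0, 1, 1]})

clf = RandomForestClassifier(random_state=42).fit(train_df[["x1", "x2"]], train_df["label"])

train_ds = Dataset(train_df, label="label", cat_features=[])
test_ds = Dataset(test_df, label="label", cat_features=[])

# Run the built-in model evaluation suite on both datasets to compare model behavior.
result = model_evaluation().run(train_dataset=train_ds, test_dataset=test_ds, model=clf)
result.save_as_html("model_evaluation_report.html")
```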