Fix spelling errors (#3181)
* Fix spelling errors in documentation and code comments

* Auto-update of Starter template

---------

Co-authored-by: GitHub Actions <[email protected]>
safoinme and actions-user authored Nov 8, 2024
1 parent cc9c3d8 commit a5d4531
Showing 9 changed files with 37 additions and 37 deletions.
@@ -23,7 +23,7 @@ Related concepts:
* early on, even if it's just to keep a log of the quality state of your data and the performance of your models at different stages of development.
* if you have pipelines that regularly ingest new data, you should use data validation to run regular data integrity checks to signal problems before they are propagated downstream.
* in continuous training pipelines, you should use data validation techniques to compare new training data against a data reference and to compare the performance of newly trained models against previous ones.
-* when you have pipelines that automate batch inference or if you regularly collect data used as input in online inference, you should use data validation to run data drift analyses and detect training-serving skew, data drift and model drift.
+* when you have pipelines that automate batch inference or if you regularly collect data used as input in online inference, you should use data validation to run data drift analyzes and detect training-serving skew, data drift and model drift.

#### Data Validator Flavors

2 changes: 1 addition & 1 deletion docs/book/component-guide/data-validators/deepchecks.md
@@ -10,7 +10,7 @@ The Deepchecks [Data Validator](./data-validators.md) flavor provided with the Z

### When would you want to use it?

-[Deepchecks](https://deepchecks.com/) is an open-source library that you can use to run a variety of data and model validation tests, from data integrity tests that work with a single dataset to model evaluation tests to data drift analyses and model performance comparison tests. All this can be done with minimal configuration input from the user, or customized with specialized conditions that the validation tests should perform.
+[Deepchecks](https://deepchecks.com/) is an open-source library that you can use to run a variety of data and model validation tests, from data integrity tests that work with a single dataset to model evaluation tests to data drift analyzes and model performance comparison tests. All this can be done with minimal configuration input from the user, or customized with specialized conditions that the validation tests should perform.

Deepchecks works with both tabular data and computer vision data. For tabular, the supported dataset format is `pandas.DataFrame` and the supported model format is `sklearn.base.ClassifierMixin`. For computer vision, the supported dataset format is `torch.utils.data.dataloader.DataLoader` and supported model format is `torch.nn.Module`.

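To make the tabular workflow described above concrete, here is a minimal, illustrative sketch (not taken from the changed files) of running a Deepchecks data integrity suite on a `pandas.DataFrame`; the `train.csv` path and `target` label column are assumed placeholders, and call patterns may differ slightly between Deepchecks releases.

```python
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import data_integrity

# Hypothetical tabular dataset with a label column named "target".
df = pd.read_csv("train.csv")

# Wrap the DataFrame so Deepchecks knows which column is the label.
dataset = Dataset(df, label="target")

# Run the built-in data integrity suite on the single dataset and
# save the rendered results for inspection.
result = data_integrity().run(dataset)
result.save_as_html("data_integrity_report.html")
```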
4 changes: 2 additions & 2 deletions docs/book/component-guide/data-validators/evidently.md
@@ -6,7 +6,7 @@ description: >-

# Evidently

-The Evidently [Data Validator](./data-validators.md) flavor provided with the ZenML integration uses [Evidently](https://evidentlyai.com/) to perform data quality, data drift, model drift and model performance analyses, to generate reports and run checks. The reports and check results can be used to implement automated corrective actions in your pipelines or to render interactive representations for further visual interpretation, evaluation and documentation.
+The Evidently [Data Validator](./data-validators.md) flavor provided with the ZenML integration uses [Evidently](https://evidentlyai.com/) to perform data quality, data drift, model drift and model performance analyzes, to generate reports and run checks. The reports and check results can be used to implement automated corrective actions in your pipelines or to render interactive representations for further visual interpretation, evaluation and documentation.

### When would you want to use it?

@@ -47,7 +47,7 @@ zenml stack register custom_stack -dv evidently_data_validator ... --set

Evidently's profiling functions take in a `pandas.DataFrame` dataset or a pair of datasets and generate results in the form of a `Report` object.

-One of Evidently's notable characteristics is that it only requires datasets as input. Even when running model performance comparison analyses, no model needs to be present. However, that does mean that the input data needs to include additional `target` and `prediction` columns for some profiling reports and, you have to include additional information about the dataset columns in the form of [column mappings](https://docs.evidentlyai.com/user-guide/tests-and-reports/column-mapping). Depending on how your data is structured, you may also need to include additional steps in your pipeline before the data validation step to insert the additional `target` and `prediction` columns into your data. This may also require interacting with one or more models.
+One of Evidently's notable characteristics is that it only requires datasets as input. Even when running model performance comparison analyzes, no model needs to be present. However, that does mean that the input data needs to include additional `target` and `prediction` columns for some profiling reports and, you have to include additional information about the dataset columns in the form of [column mappings](https://docs.evidentlyai.com/user-guide/tests-and-reports/column-mapping). Depending on how your data is structured, you may also need to include additional steps in your pipeline before the data validation step to insert the additional `target` and `prediction` columns into your data. This may also require interacting with one or more models.

There are three ways you can use Evidently to generate data reports in your ZenML pipelines that allow different levels of flexibility:

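As an illustrative sketch of the profiling workflow described above (again, not taken from the changed files), the following shows how an Evidently `Report` is typically run on a pair of `pandas.DataFrame` inputs with a column mapping; the file paths and column names are assumptions, and exact imports vary between Evidently releases.

```python
import pandas as pd
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Hypothetical reference and current datasets that already contain
# "target" and "prediction" columns.
reference_df = pd.read_csv("reference.csv")
current_df = pd.read_csv("current.csv")

# Tell Evidently which columns hold the ground truth and the model output.
column_mapping = ColumnMapping(target="target", prediction="prediction")

# Build and run a data drift report comparing the two datasets.
report = Report(metrics=[DataDriftPreset()])
report.run(
    reference_data=reference_df,
    current_data=current_df,
    column_mapping=column_mapping,
)
report.save_html("data_drift_report.html")
```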
@@ -14,23 +14,23 @@ various factors that influence the deployment process. One of the primary
considerations is the memory and machine requirements for your finetuned model.
LLMs
are typically resource-intensive, requiring substantial RAM, processing power
-and specialised hardware. This choice of hardware can significantly impact both
+and specialized hardware. This choice of hardware can significantly impact both
performance and cost, so it's crucial to strike the right balance based on your
specific use case.

Real-time considerations play a vital role in deployment planning, especially
for applications that require immediate responses. This includes preparing for
potential failover scenarios if your finetuned model encounters issues,
-conducting thorough benchmarks and load testing, and modelling expected user
+conducting thorough benchmarks and load testing, and modeling expected user
load and usage patterns. Additionally, you'll need to decide between streaming
and non-streaming approaches, each with its own set of trade-offs in terms of
-latency and resource utilisation.
+latency and resource utilization.

-Optimisation techniques, such as quantisation, can help reduce the resource
-footprint of your model. However, these optimisations often come with additional
+Optimization techniques, such as quantization, can help reduce the resource
+footprint of your model. However, these Optimizations often come with additional
steps in your workflow and require careful evaluation to ensure they don't
negatively impact model performance. [Rigorous evaluation](./evaluation-for-finetuning.md)
-becomes crucial in quantifying the extent to which you can optimise without
+becomes crucial in quantifying the extent to which you can optimize without
compromising accuracy or functionality.

## Deployment Options and Trade-offs
@@ -39,7 +39,7 @@ When it comes to deploying your finetuned LLM, several options are available,
each with its own set of advantages and challenges:

1. **Roll Your Own**: This approach involves setting up and managing your own
-infrastructure. While it offers the most control and customisation, it also
+infrastructure. While it offers the most control and customization, it also
requires expertise and resources to maintain. For this, you'd
usually create some kind of Docker-based service (a FastAPI endpoint, for
example) and deploy this on your infrastructure, with you taking care of all
@@ -49,7 +49,7 @@ each with its own set of advantages and challenges:
be aware of the "cold start" phenomenon, which can introduce latency for
infrequently accessed models.
3. **Always-On Options**: These deployments keep your model constantly running
-and ready to serve requests. While this approach minimises latency, it can be
+and ready to serve requests. While this approach minimizes latency, it can be
more costly as you're paying for resources even during idle periods.
4. **Fully Managed Solutions**: Many cloud providers and AI platforms offer
managed services for deploying LLMs. These solutions can simplify the
@@ -177,14 +177,14 @@ crucial. Key areas to watch include:
2. **Latency Metrics**: Monitor response times to ensure they meet your
application's requirements.
3. **Load and Usage Patterns**: Keep an eye on how users interact with your model
-to inform scaling decisions and potential optimisations.
-4. **Data Analysis**: Regularly analyse the inputs and outputs of your model to
+to inform scaling decisions and potential Optimizations.
+4. **Data Analysis**: Regularly analyze the inputs and outputs of your model to
identify trends, potential biases, or areas for improvement.

It's also important to consider privacy and security when capturing and logging
responses. Ensure that your logging practices comply with relevant data
-protection regulations and your organisation's privacy policies.
+protection regulations and your organization's privacy policies.

By carefully considering these deployment options and maintaining vigilant
monitoring practices, you can ensure that your finetuned LLM performs optimally
-and continues to meet the needs of your users and organisation.
+and continues to meet the needs of your users and organization.
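As a rough sketch of the "Roll Your Own" option mentioned above (not taken from the changed files), a Docker-deployable FastAPI service might look like the following; the `generate` function is a hypothetical placeholder for calls into your finetuned model rather than a working inference implementation.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class GenerationRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256


def generate(prompt: str, max_new_tokens: int) -> str:
    # Placeholder: call your finetuned model here, for example via a
    # local inference engine or an internal model-serving client.
    return f"[model output ({max_new_tokens} tokens max) for: {prompt[:50]}]"


@app.post("/generate")
def generate_endpoint(request: GenerationRequest) -> dict:
    # Delegate to the model call and return a JSON-serializable payload.
    return {"completion": generate(request.prompt, request.max_new_tokens)}
```

You would then package this service in a Docker image and handle scaling, monitoring, and failover yourself, in line with the trade-offs described above.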
@@ -12,9 +12,9 @@ The motivation for implementing thorough evals is similar to that of unit tests

1. **Prevent Regressions**: Ensure that new iterations or changes don't negatively impact existing functionality.

-2. **Track Improvements**: Quantify and visualise how your model improves with each iteration or finetuning session.
+2. **Track Improvements**: Quantify and visualize how your model improves with each iteration or finetuning session.

-3. **Ensure Safety and Robustness**: Given the complex nature of LLMs, comprehensive evals help identify and mitigate potential risks, biases, or unexpected behaviours.
+3. **Ensure Safety and Robustness**: Given the complex nature of LLMs, comprehensive evals help identify and mitigate potential risks, biases, or unexpected behaviors.

By implementing a robust evaluation strategy, you can develop more reliable, performant, and safe finetuned LLMs while maintaining a clear picture of your model's capabilities and limitations throughout the development process.

@@ -38,12 +38,12 @@ finetuning use case. The main distinction here is that we are not looking to
evaluate retrieval, but rather the performance of the finetuned model (i.e.
[the generation part](../evaluation/generation.md)).

-Custom evals are tailored to your specific use case and can be categorised into two main types:
+Custom evals are tailored to your specific use case and can be categorized into two main types:

1. **Success Modes**: These evals focus on things you want to see in your model's output, such as:
- Correct formatting
- Appropriate responses to specific prompts
-- Desired behaviour in edge cases
+- Desired behavior in edge cases

2. **Failure Modes**: These evals target things you don't want to see, including:
- Hallucinations (generating false or nonsensical information)
@@ -59,15 +59,15 @@ from my_library import query_llm

good_responses = {
"what are the best salads available at the food court?": ["caesar", "italian"],
"how late is the shopping centre open until?": ["10pm", "22:00", "ten"]
"how late is the shopping center open until?": ["10pm", "22:00", "ten"]
}

for question, answers in good_responses.items():
llm_response = query_llm(question)
assert any(answer in llm_response for answer in answers), f"Response does not contain any of the expected answers: {answers}"

bad_responses = {
"who is the manager of the shopping centre?": ["tom hanks", "spiderman"]
"who is the manager of the shopping center?": ["tom hanks", "spiderman"]
}

for question, answers in bad_responses.items():
@@ -77,15 +77,15 @@ for question, answers in bad_responses.items():

You can see how you might want to expand this out to cover more examples and more failure modes, but this is a good start. As you continue in the work of iterating on your model and performing more tests, you can update these cases with known failure modes (and/or with obvious success modes that your use case must always work for).

-### Generalised Evals and Frameworks
+### Generalized Evals and Frameworks

-Generalised evals and frameworks provide a structured approach to evaluating your finetuned LLM. They offer:
+Generalized evals and frameworks provide a structured approach to evaluating your finetuned LLM. They offer:

-- Assistance in organising and structuring your evals
-- Standardised evaluation metrics for common tasks
+- Assistance in organizing and structuring your evals
+- Standardized evaluation metrics for common tasks
- Insights into the model's overall performance

-When using generalised evals, it's important to consider their limitations and caveats. While they provide valuable insights, they should be complemented with custom evals tailored to your specific use case. Some possible options for you to check out include:
+When using Generalized evals, it's important to consider their limitations and caveats. While they provide valuable insights, they should be complemented with custom evals tailored to your specific use case. Some possible options for you to check out include:

- [prodigy-evaluate](https://github.com/explosion/prodigy-evaluate?tab=readme-ov-file)
- [ragas](https://docs.ragas.io/en/stable/getstarted/monitoring.html)
@@ -112,7 +112,7 @@ As part of this, implementing comprehensive logging from the early stages of dev

Alongside collecting the raw data and viewing it periodically, creating simple
dashboards that display core metrics reflecting your model's performance is an
-effective way to visualise and monitor progress. These metrics should align with
+effective way to visualize and monitor progress. These metrics should align with
your iteration goals and capture improvements over time, allowing you to quickly
assess the impact of changes and identify areas that require attention. Again,
as with everything else, don't let perfect be the enemy of the good; a simple
@@ -46,7 +46,7 @@ smaller size (e.g. one of the Llama 3.1 family at the ~8B parameter mark) and
then iterate on that. This will allow you to quickly run through a number of
experiments and see how the model performs on your use case.

-In this early stage, experimentation is important. Accordingly, any way you can maximise the number of experiments you can run will help increase the amount you can learn. So we want to minimise the amount of time it takes to iterate to a new experiment. Depending on the precise details of what you do, you might iterate on your data, on some hyperparameters of the finetuning process, or you might even try out different use case options.
+In this early stage, experimentation is important. Accordingly, any way you can maximize the number of experiments you can run will help increase the amount you can learn. So we want to minimize the amount of time it takes to iterate to a new experiment. Depending on the precise details of what you do, you might iterate on your data, on some hyperparameters of the finetuning process, or you might even try out different use case options.

## Implementation details

@@ -190,15 +190,15 @@ components for distributed training. For more details, see the [Accelerate docum

## Dataset iteration

-While these stages offer lots of surface area for intervention and customisation, the most significant thing to be careful with is the data that you input into the model. If you find that your finetuned model offers worse performance than the base, or if you get garbled output post-fine tuning, this would be a strong indicator that you have not correctly formatted your input data, or something is mismatched with the tokeniser and so on. To combat this, be sure to inspect your data at all stages of the process!
+While these stages offer lots of surface area for intervention and customization, the most significant thing to be careful with is the data that you input into the model. If you find that your finetuned model offers worse performance than the base, or if you get garbled output post-fine tuning, this would be a strong indicator that you have not correctly formatted your input data, or something is mismatched with the tokeniser and so on. To combat this, be sure to inspect your data at all stages of the process!

-The main behaviour and activity while using this notebook should be around being
+The main behavior and activity while using this notebook should be around being
more serious about your data. If you are finding that you're on the low end of
the spectrum, consider ways to either supplement that data or to synthetically
generate data that could be substituted in. You should also start to think about
evaluations at this stage (see [the next guide](./evaluation-for-finetuning.md) for more) since
the changes you will likely want to measure how well your model is doing,
-especially when you make changes and customisations. Once you have some basic
+especially when you make changes and customizations. Once you have some basic
evaluations up and running, you can then start thinking through all the optimal
parameters and measuring whether these updates are actually doing what you think
they will.
@@ -42,11 +42,11 @@ In general, try to pick something that is small and self-contained, ideally the

For example, a general use case of "answer all customer support emails" is almost certainly too vague, whereas something like "triage incoming customer support queries and extract relevant information as per some pre-defined checklist or schema" is much more realistic.

-It's also worth picking something where you can reach some sort of answer as to whether this the right approach in a short amount of time. If your use case depends on the generation or annotation of lots of data, or organisation and sorting of pre-existing data, this is less of an ideal starter project than if you have data that already exists within your organisation and that you can repurpose here.
+It's also worth picking something where you can reach some sort of answer as to whether this the right approach in a short amount of time. If your use case depends on the generation or annotation of lots of data, or organization and sorting of pre-existing data, this is less of an ideal starter project than if you have data that already exists within your organization and that you can repurpose here.

## Picking data for your use case

-The data needed for your use case will follow directly from the specific use case you're choosing, but ideally it should be something that is already *mostly* in the direction of what you need. It will take time to annotate and manually transform data if it is too distinct from the specific use case you want to use, so try to minimise this as much as you possibly can.
+The data needed for your use case will follow directly from the specific use case you're choosing, but ideally it should be something that is already *mostly* in the direction of what you need. It will take time to annotate and manually transform data if it is too distinct from the specific use case you want to use, so try to minimize this as much as you possibly can.

A couple of examples of where you might be able to reuse pre-existing data:
