
Nick/proofread #10


Open · wants to merge 6 commits into base: development
4 changes: 2 additions & 2 deletions about.md
@@ -36,7 +36,7 @@ SmartCore is developed and maintained by Smartcore developers. Our goal is to bu

### Version 0.1.0

-This is our first realease, enjoy! In this version you'll find:
+This is our first release, enjoy! In this version you'll find:
- KNN + distance metrics (Euclidian, Minkowski, Manhattan, Hamming, Mahalanobis)
- Linear Regression (OLS)
- Logistic Regression
@@ -53,4 +53,4 @@ This is our first realease, enjoy! In this version you'll find:
- LU, QR, SVD, EVD
- Evaluation Metrics

-Please let us know if you found a problem. The best way to report it is to [open an issue](https://github.com/smartcorelib/smartcore/issues) on GitHub.
\ No newline at end of file
+Please let us know if you found a problem. The best way to report it is to [open an issue](https://github.com/smartcorelib/smartcore/issues) on GitHub.
4 changes: 2 additions & 2 deletions user_guide/developer.md
@@ -26,7 +26,7 @@ If you found a bug or problem please do not hesitate to report it by [opening an

The best way to request a new feature is by [opening an issue](https://github.com/smartcorelib/smartcore/issues) in GitHub. When you submit your idea, please keep in mind these recommendations:

-* If you are requesting new algorithm, please add references to papers describing this algorithm. If you have a particular implementation in mind, feel free to share references to it as well. If not, we will do our best to find the best implementation available ourselves.
+* If you are requesting a new algorithm, please add references to papers describing this algorithm. If you have a particular implementation in mind, feel free to share references to it as well. If not, we will do our best to find the best implementation available ourselves.
* Please tell us why this feature is important to you.

## Contributing code
@@ -43,6 +43,6 @@ To make sure your PR is swiftly approved and merged, please make sure new featur

## Changes to documentation

-If you found a problem in documentation please do not hesitate to correct it and submit your proposed change as a [pull request](https://github.com/smartcorelib/smartcore/pulls) (PR) in GutHub. At this moment documentation is found in several places: [API](https://github.com/smartcorelib/smartcore), [website](https://github.com/smartcorelib/smartcorelib.org) and [examples](https://github.com/smartcorelib/smartcore-examples). Please submit your pull request to a corresponding repository. If your change is a minor correction (e.g. misspelling or grammar error) there is no need to open a separate issue describing what you've found, just correct it and submit your PR!
+If you found a problem in documentation please do not hesitate to correct it and submit your proposed change as a [pull request](https://github.com/smartcorelib/smartcore/pulls) (PR) in GitHub. At this moment documentation is found in several places: [API](https://github.com/smartcorelib/smartcore), [website](https://github.com/smartcorelib/smartcorelib.org) and [examples](https://github.com/smartcorelib/smartcore-examples). Please submit your pull request to a corresponding repository. If your change is a minor correction (e.g. misspelling or grammar error) there is no need to open a separate issue describing what you've found, just correct it and submit your PR!

Another way to make a change in documentation is to [open an issue](https://github.com/smartcorelib/smartcore/issues) in GitHub.
10 changes: 5 additions & 5 deletions user_guide/model_selection.md
@@ -8,7 +8,7 @@ description: Tools for model selection and evaluation. K-fold cross validation,

*SmartCore* comes with a lot of easy-to-use algorithms and it is straightforward to fit many different machine learning models to a given dataset. Once you have many algorithms to choose from the question becomes how to choose the best machine learning model among a range of different models that you can use for your data. The problem of choosing the right model becomes even harder if you consider many different combinations of hyperparameters for each algorithm.

-Model selection is the process of selecting one final machine learning model from among a collection of candidate models for you problem at hand. The process of assessing a model’s performance is known as model evaluation.
+Model selection is the process of selecting one final machine learning model from among a collection of candidate models for the problem at hand. The process of assessing a model’s performance is known as model evaluation.

[K-fold Cross-Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) (k-fold CV) is a commonly used technique for model selection and evaluation. Another alternative is to split your data into three separate sets: _training_, _validation_, _test_. You use the _training_ set to train your model and _validation_ set for model selection and hyperparameter tuning. The _test_ set can be used to get an unbiased estimate of model performance.

@@ -34,12 +34,12 @@ let y = boston_data.target;
let (x_train, x_test, y_train, y_test) = train_test_split(&x, &y, 0.2, true);
```
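The splitting step used in the snippet above can be sketched in plain Rust. This is a simplified, deterministic version for illustration only, not SmartCore's actual `train_test_split` (which, as shown above, also takes a flag to shuffle the data first):

```rust
/// Split `data` into (train, test), placing roughly `test_ratio` of the
/// samples in the test set. A deterministic sketch for illustration;
/// SmartCore's own `train_test_split` additionally supports shuffling.
fn split<T: Clone>(data: &[T], test_ratio: f64) -> (Vec<T>, Vec<T>) {
    let n_test = (data.len() as f64 * test_ratio).round() as usize;
    let n_train = data.len() - n_test;
    // First n_train samples go to training, the rest to test.
    (data[..n_train].to_vec(), data[n_train..].to_vec())
}
```

With `test_ratio = 0.2` and ten samples, eight land in the training set and two in the test set, mirroring the 80/20 split in the snippet above.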

-While a simple test/train split method is good for a very large dataset, the test score dependents on how the data is split into train and test sets. To get a better indication of how well your model performs on new data use k-fold CV.
+While a simple test/train split method is good for a very large dataset, the test score depends on how the data is split into train and test sets. To get a better indication of how well your model performs on new data use k-fold CV.

To evaluate performance of your model with k-fold CV use [`cross_validate`]({{site.api_base_url}}/model_selection/fn.cross_validate.html) function.
-This function splits datasets up into k groups. One of the groups is used as the test set and the rest are used as the training set. The model is trained on the training set and evaluated on the test set. Then the process is repeated until each unique group as been used as the test set.
+This function splits datasets up into k groups. One of the groups is used as the test set and the rest are used as the training set. The model is trained on the training set and evaluated on the test set. Then the process is repeated until each unique group has been used as the test set.

-For example, when you split your dataset into 3 folds, as in <nobr>Figure 1</nobr>, `cross_validate` will fit and evaluate your model 3 times. First, the function will use folds 2 and 3 to train your model and fold 1 to evaluate its performance. On the second run, the function will take folds 1 and 3 for trainig and fold 2 for evaluation.
+For example, when you split your dataset into 3 folds, as in <nobr>Figure 1</nobr>, `cross_validate` will fit and evaluate your model 3 times. First, the function will use folds 2 and 3 to train your model and fold 1 to evaluate its performance. On the second run, the function will take folds 1 and 3 for training and fold 2 for evaluation.
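The fold rotation described above can be sketched as plain index bookkeeping in Rust. This is an illustrative helper (hypothetical name, not SmartCore's actual `cross_validate` internals): each of the k folds takes one turn as the test set while the remaining indices form the training set:

```rust
/// Partition `n` sample indices into `k` folds and return, for each fold,
/// the pair (test_indices, train_indices). A simplified sketch of k-fold
/// CV index rotation, not SmartCore's actual implementation.
fn kfold_indices(n: usize, k: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    let fold_size = n / k;
    (0..k)
        .map(|fold| {
            let start = fold * fold_size;
            // The last fold absorbs the remainder when n is not divisible by k.
            let end = if fold == k - 1 { n } else { start + fold_size };
            let test: Vec<usize> = (start..end).collect();
            let train: Vec<usize> = (0..n).filter(|i| *i < start || *i >= end).collect();
            (test, train)
        })
        .collect()
}
```

With `n = 9` and `k = 3` this yields the rotation from Figure 1: fold 1 (`[0, 1, 2]`) is the test set on the first run while folds 2 and 3 train the model, and so on.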

<figure class="image" align="center">
<img src="{{site.baseurl}}/assets/imgs/kfold.svg" alt="k-fold CV" class="img-fluid">
@@ -95,7 +95,7 @@ We also keep toy datasets behind the `datasets` feature flag. Feature `datasets`
smartcore = { version = "0.1.0", default-features = false}
```

-When feature flag `datasets` is enabled you'l get these datasets:
+When feature flag `datasets` is enabled you'll get these datasets:

{:.table .table-striped .table-bordered}
| Dataset | Description | Samples | Attributes | Type |
12 changes: 6 additions & 6 deletions user_guide/quick_start.md
@@ -19,7 +19,7 @@ All of these algorithms are implemented in Rust.

Why another machine learning library for Rust, you might ask? While there are at least three [general-purpose ML libraries](http://www.arewelearningyet.com/) for Rust,
most of these libraries either do not support all of the algorithms that are implemented in *SmartCore* or aren't integrated with [nalgebra](https://nalgebra.org/) and [ndarray](https://github.com/rust-ndarray/ndarray).
-All algorithms in *SmartCore* works well with both libraries. You can also use standard Rust vectors with all of the algorithms implemented here if you prefer to have minimum number of dependencies in your code.
+All algorithms in *SmartCore* work well with both libraries. You can also use standard Rust vectors with all of the algorithms implemented here if you prefer to have minimum number of dependencies in your code.

We developed *SmartCore* to promote scientific computing in Rust. Our goal is to build an open-source library that has accurate, numerically stable, and well-documented implementations of the most well-known and widely used machine learning methods.

@@ -96,9 +96,9 @@ Our performance metric (accuracy) went up two percentage points! Nice work!

## High-level overview

-Majority of machine learning algorithms rely on linear algebra routines and optimization methods to fit a model to a dataset or to make a prediction from new data. There are many crates for linear algebra and optimization in Rust but SmartCore does not has a hard dependency on any of these crates. Instead, machine learning algorithms in *SmartCore* use an abstraction layer where operations on multidimensional arrays and maximization/minimization routines are defined. This approach allow us to quickly integrate with any new type of matrix or vector as long as it implements all abstract methods from this layer.
+Majority of machine learning algorithms rely on linear algebra routines and optimization methods to fit a model to a dataset or to make a prediction from new data. There are many crates for linear algebra and optimization in Rust but SmartCore does not have a hard dependency on any of these crates. Instead, machine learning algorithms in *SmartCore* use an abstraction layer where operations on multidimensional arrays and maximization/minimization routines are defined. This approach allows us to quickly integrate with any new type of matrix or vector as long as it implements all abstract methods from this layer.

-Functions from optimization module are not available directly but we plan to make optimization library public once it is mature enough.
+Functions from the optimization module are not available directly but we plan to make the optimization library public once it is mature enough.

While functions from [linear algebra module]({{site.api_base_url}}/linalg/index.html) are public you should not use them directly because this module is still unstable. We keep this interface open to let anyone add implementations of other types of matrices that are currently not supported by *SmartCore*. Please see [Developer's Guide]({{ site.baseurl }}/user_guide/developer.html) if you want to add your favourite matrix type to *SmartCore*.

@@ -111,9 +111,9 @@ Figure 1 shows 3 layers with abstract linear algebra and optimization functions

### API

-All algorithms in *SmartCore* implement the same inrefrace when it comes to fitting an algorithm to your dataset or making a prediction from new data. All core interfaces are defined in the [api module]({{site.api_base_url}}/api/index.html).
+All algorithms in *SmartCore* implement the same interface when it comes to fitting an algorithm to your dataset or making a prediction from new data. All core interfaces are defined in the [api module]({{site.api_base_url}}/api/index.html).

-There is a static function `fit` that fits an algorithm to your data. This function is defined in two places, [`SupervisedEstimator`]({{ site.api_base_url }}/api/trait.SupervisedEstimator.html) and [`UnsupervisedEstimator`]({{ site.api_base_url }}/api/trait.UnsupervisedEstimator.html), one is used for supervised learning and another for unsupervised learning. Both estimators takes you training data and hyperparameters for the algorithm and produce a fully trained instance of the estimator. The only difference between these two traits is that `SupervisedEstimator` requires training target values in addition to training predictors to fit an algorithm to your data.
+There is a static function `fit` that fits an algorithm to your data. This function is defined in two places, [`SupervisedEstimator`]({{ site.api_base_url }}/api/trait.SupervisedEstimator.html) and [`UnsupervisedEstimator`]({{ site.api_base_url }}/api/trait.UnsupervisedEstimator.html), one is used for supervised learning and another for unsupervised learning. Both estimators take your training data and hyperparameters for the algorithm and produce a fully trained instance of the estimator. The only difference between these two traits is that `SupervisedEstimator` requires training target values in addition to training predictors to fit an algorithm to your data.

A function `predict` is defined in the [`Predictor`]({{ site.api_base_url }}/api/trait.Predictor.html) trait and is used to predict labels or target values from new data. All mandatory parameters of the model are declared as parameters of function `fit`. All optional parameters are hidden behind `Default::default()`.
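The `fit`/`predict` split described above can be illustrated with a stripped-down trait sketch. The trait and type names here are simplified stand-ins, not SmartCore's actual signatures (which are generic over matrix and number types):

```rust
/// Simplified sketch of the estimator/predictor pattern described above.
/// Names and signatures are illustrative, not SmartCore's actual API.
trait SupervisedEstimator<P>: Sized {
    /// Fit an algorithm to predictors `x`, targets `y`, and hyperparameters.
    fn fit(x: &[Vec<f64>], y: &[f64], params: P) -> Self;
}

trait Predictor {
    /// Predict target values for new data.
    fn predict(&self, x: &[Vec<f64>]) -> Vec<f64>;
}

/// Trivial model that always predicts the mean of the training targets.
struct MeanModel {
    mean: f64,
}

impl SupervisedEstimator<()> for MeanModel {
    fn fit(_x: &[Vec<f64>], y: &[f64], _params: ()) -> Self {
        MeanModel { mean: y.iter().sum::<f64>() / y.len() as f64 }
    }
}

impl Predictor for MeanModel {
    fn predict(&self, x: &[Vec<f64>]) -> Vec<f64> {
        vec![self.mean; x.len()]
    }
}
```

The `()` hyperparameter type plays the role of "no optional parameters"; in SmartCore proper, optional parameters default via `Default::default()` as noted above.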

@@ -208,4 +208,4 @@ If you are done reading through this page we would recommend to go to a specific
* [Supervised Learning]({{ site.baseurl }}/user_guide/supervised.html), in this section you will find tree-based, linear and KNN models.
* [Unsupervised Learning]({{ site.baseurl }}/user_guide/unsupervised.html), unsupervised methods like clustering and matrix decomposition methods.
* [Model Selection]({{ site.baseurl }}/user_guide/model_selection.html), varios metrics for model evaluation.
-* [Developer's Guide]({{ site.baseurl }}/user_guide/developer.html), would you like to contribute? Here you will find useful guidelines and rubrics to consider.
\ No newline at end of file
+* [Developer's Guide]({{ site.baseurl }}/user_guide/developer.html), would you like to contribute? Here you will find useful guidelines and rubrics to consider.