# VSL Machine Learning (vsl.ml)

VSL aims to provide a robust set of tools for scientific computing, with an emphasis
on performance and ease of use. In the `vsl.ml` module, some machine learning
models are designed as observers of data: they re-train automatically when the
data changes. Other models do not need this behavior and are trained once on a
dataset.

## Key Features

- **Observers of Data**: Some machine learning models in VSL act as observers,
  re-training automatically when data changes.
- **High Performance**: Leverages V’s performance optimizations and can integrate
  with C and Fortran libraries such as OpenBLAS and LAPACK.
- **Versatile Algorithms**: Supports a variety of machine learning algorithms and
  models.

## Usage

### Loading Data

The `Data` struct in `vsl.ml` holds data in matrix format for machine learning
tasks. Here's a brief overview of how to use it:

#### Creating a Data Object

You can create a `Data` object using the following methods:

- `Data.new`: Creates a new `Data` object with specified dimensions.
- `Data.from_raw_x`: Creates a `Data` object from raw x values (without y values).
- `Data.from_raw_xy`: Creates a `Data` object from raw x and y values combined in a single matrix.
- `Data.from_raw_xy_sep`: Creates a `Data` object from separate x and y raw values.

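
The difference between the combined and the separate raw layouts can be illustrated with a small, language-neutral sketch in Python. This is not the VSL API: the helper `split_xy` is hypothetical, and it assumes that the y values sit in the last column of the combined matrix, which the list above does not specify.

```python
def split_xy(xy_raw):
    """Split a combined [n_samples][n_features + 1] table into x and y.

    Assumption: the last column of each row holds the y value.
    """
    if not xy_raw:
        raise ValueError("xy_raw must contain at least one row")
    width = len(xy_raw[0])
    if width < 2:
        raise ValueError("each row needs at least one x value and one y value")
    if any(len(row) != width for row in xy_raw):
        raise ValueError("all rows must have the same length")
    x = [list(row[:-1]) for row in xy_raw]  # every column except the last
    y = [row[-1] for row in xy_raw]         # the last column
    return x, y


# Combined layout: two features followed by the target value.
xy_raw = [
    [1.0, 2.0, 10.0],
    [3.0, 4.0, 20.0],
]
x, y = split_xy(xy_raw)
# x and y now correspond to the "separate" layout.
```
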
### Data Methods

The `Data` struct has several key methods to manage and manipulate data:

- `set(x, y)`: Sets the x matrix and y vector and notifies observers.
- `set_y(y)`: Sets the y vector and notifies observers.
- `set_x(x)`: Sets the x matrix and notifies observers.
- `split(ratio)`: Splits the data into two parts based on the given ratio.
- `clone()`: Returns a deep copy of the `Data` object without observers.
- `clone_with_same_x()`: Returns a deep copy of the `Data` object that shares the same x reference.
- `add_observer(obs)`: Adds an observer to the `Data` object.
- `notify_update()`: Notifies observers of data changes.

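
The copy and split semantics listed above can be sketched in Python as follows. The class `DataSketch` is a hypothetical stand-in, not the VSL `Data` struct, and the sketch assumes that `ratio` is the fraction of samples placed in the first part; the list above does not define it.

```python
import copy


class DataSketch:
    """Hypothetical stand-in for a data container; not the VSL `Data` struct."""

    def __init__(self, x, y):
        self.x = x  # list of feature rows
        self.y = y  # list of target values
        self.observers = []

    def clone(self):
        # Deep copy of x and y; observers are deliberately not carried over.
        return DataSketch(copy.deepcopy(self.x), list(self.y))

    def clone_with_same_x(self):
        # y is copied, while x is the very same object as in the original.
        return DataSketch(self.x, list(self.y))

    def split(self, ratio):
        # Assumption: `ratio` is the fraction of samples kept in the first part.
        if not 0.0 < ratio < 1.0:
            raise ValueError("ratio must be strictly between 0 and 1")
        cut = int(len(self.x) * ratio)
        first = DataSketch(copy.deepcopy(self.x[:cut]), list(self.y[:cut]))
        second = DataSketch(copy.deepcopy(self.x[cut:]), list(self.y[cut:]))
        return first, second


data = DataSketch([[1.0], [2.0], [3.0], [4.0]], [1.0, 0.0, 1.0, 0.0])
data.observers.append("some observer")

deep = data.clone()
shared = data.clone_with_same_x()
first_part, second_part = data.split(0.75)
```

Sharing x avoids duplicating a large feature matrix when only the targets differ, at the cost that a change to x is visible through both objects.
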
### Stat Observer

The `Stat` struct is an observer of `Data`, providing statistical analysis of the
data it observes. It automatically updates its statistics when the underlying data
changes.

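
The observer mechanism described here can be sketched in Python. Both classes are hypothetical illustrations of the pattern, not the VSL `Data` or `Stat` structs, and the only statistic tracked is the per-feature mean.

```python
class ObservableData:
    """Hypothetical observable container; not the VSL `Data` struct."""

    def __init__(self, x):
        self.x = x
        self._observers = []

    def add_observer(self, obs):
        self._observers.append(obs)

    def notify_update(self):
        # Tell every registered observer that the data changed.
        for obs in self._observers:
            obs.update()

    def set_x(self, x):
        self.x = x
        self.notify_update()


class StatSketch:
    """Hypothetical observer that keeps per-feature means current."""

    def __init__(self, data):
        self.data = data
        self.mean_x = []
        data.add_observer(self)
        self.update()  # compute statistics for the initial data

    def update(self):
        n = len(self.data.x)
        n_features = len(self.data.x[0]) if n else 0
        self.mean_x = [
            sum(row[j] for row in self.data.x) / n for j in range(n_features)
        ]


data = ObservableData([[1.0, 10.0], [3.0, 30.0]])
stat = StatSketch(data)
means_before = list(stat.mean_x)

data.set_x([[2.0, 4.0], [4.0, 8.0]])  # triggers StatSketch.update()
means_after = list(stat.mean_x)
```

The caller never invokes `update()` directly after changing the data; the setter does it through `notify_update()`.
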
## Observer Models

The following machine learning models in VSL follow the `Observer` pattern: they
observe a `Data` object and update themselves automatically when it changes.

### K-Means Clustering

K-Means Clustering is used for unsupervised learning to group data points into
clusters. As an observer model, it re-trains automatically when the data changes,
which is useful for dynamic datasets that require continuous updates.

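
For readers unfamiliar with the technique, the following Python sketch shows the standard K-means iteration (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its members. It illustrates the general algorithm only; the function names are hypothetical and it is not VSL's implementation.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def closest_centroid(point, centroids):
    # Index of the nearest centroid (ties go to the lowest index).
    return min(
        range(len(centroids)),
        key=lambda k: squared_distance(point, centroids[k]),
    )


def kmeans(points, initial_centroids, epochs=10):
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(epochs):
        # Assignment step: label each point with its nearest centroid.
        labels = [closest_centroid(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [[1.0, 1.0], [1.0, 3.0], [9.0, 9.0], [9.0, 11.0]]
centroids, labels = kmeans(
    points, initial_centroids=[[0.0, 0.0], [10.0, 10.0]], epochs=5
)
```
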
### K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is used for classification tasks where the target
variable is categorical. As an observer model, it re-trains automatically when the
data changes, which is beneficial for datasets that are frequently updated.

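
As a general illustration of the technique, the Python sketch below classifies a query point by majority vote among its k nearest training samples, using Euclidean distance. The function `knn_predict` is hypothetical and this is not VSL's implementation.

```python
import math
from collections import Counter


def knn_predict(x_train, y_train, query, k=3):
    if not 1 <= k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(x_train, y_train), key=lambda pair: math.dist(pair[0], query)
    )
    # Majority vote among the k nearest labels; on a tie, the label seen
    # first among the nearest neighbours wins.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


x_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y_train = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(x_train, y_train, [0.5, 0.5], k=3)
near_b = knn_predict(x_train, y_train, [5.5, 5.5], k=3)
```
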
## Non-Observer Models

The following machine learning models in VSL do not require the observer pattern
and are trained once on a dataset without continuous updates.

### Linear Regression

Linear Regression is used for predicting a continuous target variable based on one
or more predictor variables. It is typically trained once on a dataset and used to
make predictions without requiring continuous updates. Hence, it is not implemented
as an observer model.
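
The train-once, predict-many workflow can be sketched in Python for the single-predictor case, using the closed-form ordinary least squares solution. The functions `fit_line` and `predict` are hypothetical; this illustrates the technique and is not VSL's implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    return slope * x + intercept


# Train once on the dataset...
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# ...then reuse the fitted parameters for any number of predictions.
y_new = predict(4.0, slope, intercept)
```
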