Changed SIMPLISMA method to Windeg's original version. Added kernel density estimators, etc. Cutting 0.3.7 release
CaseyKneale committed Mar 10, 2019
1 parent e56a249 commit cf3e61c
Showing 24 changed files with 260 additions and 81 deletions.
22 changes: 10 additions & 12 deletions README.md
@@ -18,8 +18,8 @@ This package contains a collection of tools to perform fundamental and advanced
- [Regression](https://github.com/caseykneale/ChemometricsTools.jl/blob/master/shootouts/RegressionShootout.jl)
- [Fault Detection](https://github.com/caseykneale/ChemometricsTools.jl/blob/master/shootouts/AnomalyShootout.jl)

### Package Status => Public release! (v 0.2.3)
ChemometricsTools is brand new and was recently accepted to be registered as an official Julia package! Yep, so you can ```Pkg.add("ChemometricsTools")``` to install it. The git repo has the most stable version right now; I have fixed a lot of bugs since 0.2.3, and the registered package will be brought up to date soon. Almost all of the functionality available can reliably be used/abused, and the documentation is pretty fleshed out. Betchya anything there are a few bugs in the package still. This is an early release for constructive criticism and spreading awareness.
### Package Status => "Stable" Public release! (v 0.3.7)
ChemometricsTools is brand new and was recently accepted to be registered as an official Julia package! Yep, so you can ```Pkg.add("ChemometricsTools")``` to install it. The git repo has the most stable version right now; I have fixed a lot of bugs since 0.2.3, and the registered package will be brought up to date soon. In 0.3.7 almost all of the functionality available can reliably be used/abused, and the documentation is pretty fleshed out. Betchya anything there are a few bugs in the package still, so please file issues when you find them.

# Package Highlights
### Transforms/Pipelines
@@ -28,31 +28,29 @@ Two design choices introduced in this package are "Transformations" and "Pipelin
Multiple transformations can easily be chained together and stored using "Pipelines". Pipelines aren't "pipes" like those in Bash, R, and base Julia. They are flexible, yet immutable, convenience objects that allow sequential preprocessing and data transformations to be reused, chained, or automated for reliable analytic throughput.

### Model training
ChemometricsTools offers easy-to-use iterators for K-fold validation and moving window sampling. More advanced sampling methods, like Kennard-Stone, are just a function call away. Convenience functions for interval selection, weighting regression ensembles, etc. are also available. These allow ensemble models like SIPLS, P-DS, P-OSC, etc. to be built quickly.
ChemometricsTools offers easy-to-use iterators for K-fold validation and moving window sampling/training. More advanced sampling methods, like Kennard-Stone, are just a function call away. Convenience functions for interval selection, weighting regression ensembles, etc. are also available. These allow ensemble models like SIPLS, P-DS, P-OSC, etc. to be built quickly. With the tools included both in this package and Base Julia, nothing should stand in your way.
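
To give a feel for what K-fold splitting involves, here is a minimal sketch in plain Base Julia. It is not the package's iterator: `kfold_indices` is a hypothetical helper written only for illustration, and it assumes the observations are already in the order you want to split them.

```julia
# Hypothetical helper for illustration only; NOT part of ChemometricsTools.
# Returns a vector of (train, test) index pairs for k-fold validation.
function kfold_indices(n::Integer, k::Integer)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= n"))
    # Fold f takes every k-th observation starting at f,
    # so fold sizes differ by at most one.
    folds = [collect(f:k:n) for f in 1:k]
    # Everything outside the held-out fold is used for training.
    return [(setdiff(1:n, test), test) for test in folds]
end

for (train, test) in kfold_indices(6, 3)
    println("train = ", train, "  test = ", test)
end
```

Each observation lands in exactly one test fold, which is the property any K-fold scheme needs regardless of how the folds are assigned.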

### Regression Modeling
This package features dozens of regression performance metrics and includes a few built-in plots (Bland-Altman, QQ, Interval Overlays, etc.). The list of regression methods currently includes: CLS, Ridge, Kernel Ridge, LS-SVM, PCR, PLS(1/2), ELMs, Regression Trees, Random Forest... More to come. Chemometricians love regressions!

### Classification Modeling
In-house classification encodings (one cold/one hot) and multiclass performance statistics are provided. ChemometricsTools currently includes: LDA with Gaussian discriminants, logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, and more to come.
In-house classification encodings (one cold/one hot) and multiclass performance statistics are provided. ChemometricsTools currently includes: LDA/PCA with Gaussian discriminants, logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, and more to come.
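
For readers unfamiliar with the encodings mentioned above, the following is a rough Base Julia sketch of one hot and one cold encoding. The `onehot` function is a hypothetical stand-in, not the package's encoder, and the sorted class ordering is an assumption made here for reproducibility.

```julia
# Hypothetical helper for illustration only; NOT the ChemometricsTools encoder.
# Rows are observations, columns are classes (sorted for a stable order).
function onehot(labels::AbstractVector)
    classes = sort(unique(labels))
    # Entry (i, j) is true when observation i belongs to class j.
    encoded = [label == class for label in labels, class in classes]
    return encoded, classes
end

encoded, classes = onehot(["setosa", "virginica", "setosa"])
onecold = .!encoded  # one cold is the element-wise complement of one hot
```

Returning the class list alongside the matrix makes it possible to map predicted columns back to their labels later.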

## Specialized tools?
This package has tools for specialized fields of analysis. For instance, fractional derivatives for the electrochemists (and the adventurous), Savitzky-Golay smoothing for spectroscopists, curve resolution for forensics, process fault detection methods, etc. There are certainly plans for other tools for analyzing chemical data that packages in other languages have seemingly left out. Stay tuned.
This package has tools for specialized fields of analysis. For instance, fractional derivatives for the electrochemists (and the adventurous), a handful of smoothing methods for spectroscopists, curve resolution for forensics, process fault detection methods, etc. There are certainly plans for other tools for analyzing chemical data that packages in other languages have seemingly left out. Stay tuned.

## Where's the Data?
Right now I don't have rights to any data, but iris, the Tecator meat data, and a NASA fault detection dataset are included. I'd love for a collaborator to contribute some: spectra, mass spectra, chromatograms, etc. Please reach out to me if you wish to collaborate/contribute. There's a good chance in a week or so I'll be reaching out to the community for these sorts of things; in the meantime you can load in your own datasets using the Julia ecosystem.
Right now I don't have rights to provide much data, but the iris, Tecator meat, and NASA fault detection datasets are included. I'd love for a collaborator to contribute some: spectra, chromatograms, etc. Please reach out to me if you wish to collaborate/contribute. There's a good chance in a week or so I'll be reaching out to the community for these sorts of things; in the meantime you can load in your own datasets using the Julia ecosystem.

## What about Time Series? Cluster models?
Well, I'd love to hammer in some time series methods. That was originally part of the plan. Then I realized [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl) already has pretty much everything covered. Similarly, if you want clustering methods, just install [Clustering.jl](https://github.com/JuliaStats/Clustering.jl). I'll add a few supportive odds and ends in here, but really, some of the Julia ecosystem is very reliable and well made.
Well, I'd love to hammer in some time series methods. That was originally part of the plan. Then I realized [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl) already has pretty much everything covered. Similarly, if you want clustering methods, just install [Clustering.jl](https://github.com/JuliaStats/Clustering.jl). I may add a few supportive odds and ends in here (or contribute to the packages directly), but really, some of the Julia ecosystem is very reliable and well made.

## ToDo:
- SIMPLISMA return unique pure vars...
- Double check FNNLS
- Some more unit tests...
- Some more practical unit tests...
- Long-term: SIMCA, N-WAY PCA, and N-WAY PLS
- Hyperspectral data preprocessing methods that fit into pipelines/transforms.

## Maybes:
- Hyperspectral data preprocessing methods?
- Convenience fns for standard addition, propagation of error, multiequilibria, kinetics?
- Design of Experiment tools (Partial Factorial design, simplex, etc...)?
- Electrochemical simulations and optical simulations?
- Electrochemical simulations and optical simulations (maybe separate packages...)?