doc updates, minor bug fixes, Readme updates.
caseykneale committed Oct 26, 2019
1 parent a153fb9 commit 7fa602d
Showing 10 changed files with 103 additions and 61 deletions.
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name = "ChemometricsTools"
uuid = "a9718f02-dbee-5ae5-ad0e-dfbd07fa387b"
authors = ["caseykneale "]
version = "0.5.7"
version = "0.5.8"

[deps]
Arpack = "7d9fca2a-8960-54d3-9f78-7d1dccf2cb97"
14 changes: 9 additions & 5 deletions README.md
@@ -1,7 +1,7 @@
[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://caseykneale.github.io/ChemometricsTools.jl/dev/) [![Build Status](https://travis-ci.org/caseykneale/ChemometricsTools.jl.svg?branch=master)](https://travis-ci.org/caseykneale/ChemometricsTools.jl)

# ChemometricsTools.jl
This package contains a collection of tools to perform fundamental and advanced Chemometric analyses in Julia. It is currently richer and more fundamental than any single free chemometrics package available in any other language. If you are unfamiliar with what Chemometrics is: it could inelegantly be described as the marriage between data science and chemistry. Traditionally it is the symbiosis of applied linear algebra/statistics, disciplined by the physics and meaning of chemical measurements. This is somewhat orthogonal to most specializations of machine learning, where "add more layers" is the modus operandi. Sometimes chemometricians also get *desperate* and break out pure machine learning methods - so some of those techniques are in this package.
This package contains a collection of tools to perform fundamental and advanced Chemometric analyses in Julia. It is currently richer and more fundamental than any single free chemometrics package available in any other language. If you are unfamiliar with what Chemometrics is: it could inelegantly be described as the marriage between data science and chemistry. Traditionally it is the symbiosis of applied linear algebra/statistics, disciplined by the physics and meaning of chemical measurements. This is somewhat orthogonal to most specializations of machine learning, where "add more layers" is the modus operandi. Sometimes chemometricians also weigh the pros and cons of black box modelling and break out pure machine learning methods - so some of those techniques are in this package.

## Tutorials/Demonstrations:
- [Transforms](https://caseykneale.github.io/ChemometricsTools.jl/dev/Demos/Transforms/)
@@ -18,7 +18,7 @@ This package contains a collection of tools to perform fundamental and advanced
- [Regression](https://github.com/caseykneale/ChemometricsTools.jl/blob/master/shootouts/RegressionShootout.jl)
- [Fault Detection](https://github.com/caseykneale/ChemometricsTools.jl/blob/master/shootouts/AnomalyShootout.jl)

### Package Status => Fleshing Out (v 0.5.7)
### Package Status => Fleshing Out (v 0.5.8)
ChemometricsTools has been accepted as an official Julia package! Yep, so you can ```Pkg.add("ChemometricsTools")``` to install it. A lot of features have been added since the first public release (v 0.2.3). In 0.5.7 almost all of the functionality available can be used/abused. If you find a bug or want a new feature don't be shy - file an issue. In v0.5.1 Plots was removed as a dependency, new plot recipes were added, and now the package compiles much faster! Multilinear modeling, univariate modeling, and DOE functions are now available. Making headway into the release plan for v0.6.0. Convenience functions, documentation, bug fixes, refactoring, and clean up are in progress; bear with me. The git repo's master branch typically has the most advanced version, but the features on it may be less reliable because I like to do development on it.
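For reference, installing the registered package and loading it from the Julia REPL looks like:

```julia
# Install the registered package from the General registry, then load it.
using Pkg
Pkg.add("ChemometricsTools")
using ChemometricsTools
```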

### Seeking Collaborators
@@ -44,7 +44,7 @@ ChemometricsTools offers easy-to-use iterators for K-folds validations, and mov
This package features dozens of regression performance metrics, and a few built-in plots (Bland-Altman, QQ, interval overlays, etc.) are included. The list of regression methods currently includes: CLS, Ridge, Kernel Ridge, LS-SVM, PCR, PLS(1/2), ELM's, Regression Trees, Random Forest, Monotone Regression... More to come. Chemometricians love regressions! I've also added some convenience functions for univariate calibrations, standard addition experiments, and some automated plot functions for them.
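To give a flavor of the simplest of these methods, here is a minimal ridge regression in plain Julia. This is an illustration only, not the package's `RidgeRegression` API; `ridge_fit` is a hypothetical name:

```julia
using LinearAlgebra

# Closed-form ridge regression: beta = (X'X + lambda*I) \ (X'y).
# lambda = 0 reduces to ordinary least squares.
function ridge_fit(X::Matrix{Float64}, y::Vector{Float64}, lambda::Float64)
    p = size(X, 2)
    return (X' * X + lambda * I(p)) \ (X' * y)
end

X = [1.0 0.0; 0.0 1.0; 1.0 1.0]
y = [1.0, 2.0, 3.0]
beta = ridge_fit(X, y, 0.0)   # → [1.0, 2.0], the OLS solution
```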

### Classification Modeling
In-house classification encodings (one cold/one hot), and easy to retrieve global or multiclass performance statistics. ChemometricsTools currently includes: LDA/PCA with Gaussian discriminants, also Hierarchical LDA, multinomial softmax/logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, Probabilistic Neural Networks, LinearPerceptrons, and more to come. You can also conveniently dump classification statistics to LaTeX/CSV reports!
In-house classification encodings (one cold/one hot), and easy to retrieve global or multiclass performance statistics. ChemometricsTools currently includes: LDA/PCA with Gaussian discriminants, Hierarchical LDA, SIMCA, multinomial softmax/logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, Probabilistic Neural Networks, LinearPerceptrons, and more to come. You can also conveniently dump classification statistics to LaTeX/CSV reports!
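As a sketch of what a one-hot encoding produces — illustrative only; `onehot` here is a hypothetical helper, not the package's encoding API:

```julia
# Map each label to a row with a single 1 in its class column.
# Classes are taken in sorted order unless supplied explicitly.
function onehot(labels::Vector, classes = sort(unique(labels)))
    M = zeros(Int, length(labels), length(classes))
    for (i, l) in enumerate(labels)
        M[i, findfirst(==(l), classes)] = 1
    end
    return M
end

onehot(["red", "blue", "red"])   # classes sorted as ["blue", "red"]
# → [0 1; 1 0; 0 1]
```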

### Multiway/Multilinear Modeling
I've been working to fill an obvious gap in the available tooling. Standard
@@ -62,9 +62,13 @@ I'd love for a collaborator to contribute some: spectra, chromatograms, etc. Ple
Well, I'd love to hammer in some time series methods. That was originally part of the plan. Then I realized [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl) already has the essentials for online learning covered. Surely many are contemplating packages with time series as a focus. Similarly, if you want clustering methods, just install [Clustering.jl](https://github.com/JuliaStats/Clustering.jl). I may add a few supportive odds and ends in here (or contribute to the packages directly) but really, most of the Julia 1.0+ ecosystem is reliable, well made, and community supported.

## ToDo:
- Hyperspectral data preprocessing methods that fit into pipelines/transforms.
- Design of Experiment tools (Partial Factorial design, D/I-optimal, etc...)?
- Clean up.
- Performance improvements.
- Syntax improvements.
- Documentation improvements.
- Unit tests.

## Maybes:
- Design of Experiment tools (Partial Factorial design, D/I-optimal, etc...)?
- Convenience functions for propagation of error, multiequilibria, kinetics?
- Electrochemical simulations and optical simulations (maybe separate packages...)?
1 change: 1 addition & 0 deletions docs/make.jl
@@ -52,6 +52,7 @@ makedocs(
"Speciality Tools" => Any[
"Model Analysis" => "man/modelanaly.md",
"MultiWay" => "man/MultiWay.md",
"Hyperspectral" => "man/Hyperspectral.md",
"Anomaly Detection" => "man/AnomalyDetection.md",
"Curve Resolution" => "man/CurveResolution.md"
],
8 changes: 8 additions & 0 deletions docs/src/man/Hyperspectral.md
@@ -0,0 +1,8 @@
# Hyperspectral Modelling Functions API Reference

## Functions

```@autodocs
Modules = [ChemometricsTools]
Pages = ["Hyperspectral.jl"]
```
2 changes: 1 addition & 1 deletion shootouts/ClassificationShootout.jl
@@ -1,4 +1,4 @@
push!(LOAD_PATH, "/home/caseykneale/Desktop/ChemometricsTools/ChemometricsTools.jl/");
#push!(LOAD_PATH, "/home/caseykneale/Desktop/ChemometricsTools/ChemometricsTools.jl/");
using ChemometricsTools
#View the data in the package space
ChemometricsToolsDatasets()
43 changes: 22 additions & 21 deletions src/CurveResolution.jl
@@ -60,53 +60,54 @@ function NMF(X; Factors = 1, tolerance = 1e-7, maxiters = 200)
return (W, H)
end


"""
SIMPLISMA(X; Factors = 1, alpha = 0.03, includedvars = 1:size(X)[2], SecondDeriv = true)
Performs SIMPLISMA on Array `X` using either the raw spectra or the Second Derivative spectra.
alpha can be set to reduce contributions of baseline, and a list of included variables in the determination
of pure variables may also be provided.
Returns a tuple of the following form: (Concentration Profile, Pure Spectral Estimates, Pure Variables)
W. Windig, Spectral Data Files for Self-Modeling Curve Resolution with Examples Using the SIMPLISMA Approach, Chemometrics and Intelligent Laboratory Systems, 36, 1997, 3-16.
"""
function SIMPLISMA(X; Factors = 1, alpha = 0.05, includedvars = 1:size(X)[2], SecondDeriv = true)
@warn("Has not been tested for correctness.")
function SIMPLISMA(X; Factors = 1, alpha = 0.03, includedvars = 1:size(X)[2],
SecondDeriv = true)
@warn("SIMPLISMA has not been completely tested for correctness.")
Xcpy = deepcopy(X)
X = X[:,includedvars]
if SecondDeriv
X = map( x -> max( x, 0.0 ), -SecondDerivative( X ) )
end
purvarindex = []
(obs, vars) = size(X)
pureX = zeros( Factors, vars )
weights = zeros( vars )

Col_Std = Statistics.std(X, dims = 1) .* sqrt( (obs - 1) / obs);
Col_Mu = Statistics.mean(X, dims = 1);
Robust_Col_Mu = Col_Mu .+ (alpha * reduce(max, Col_Mu) );
Norm = sqrt.( ((Col_Std .+ Robust_Col_Mu).^ 2) .+ (Col_Mu .^ 2) )
Normed = X ./ Norm
normcov = (Normed' * Normed) ./ obs
Robust_Col_Mu = Col_Mu .+ (alpha .* reduce(max, Col_Mu) );
NormFactor = sqrt.( ((Col_Std .+ Robust_Col_Mu).^ 2) .+ (Col_Mu .^ 2) )
Xp = X ./ NormFactor

purity = Col_Std ./ Robust_Col_Mu
purvarindex = []
weights = zeros( vars )

for i in 1 : (Factors+1)
for j in 1 : vars
if i > 1
weights[j] = LinearAlgebra.det( normcov[ [ j; purvarindex] , [j; purvarindex ] ] )
else
weights[j] = LinearAlgebra.det( normcov[ j , j ] )
end
for i in 1 : (Factors)
for j in 1 : vars
purvarmatrix = Xp[ : , vcat( purvarindex, j) ] ;
O = (purvarmatrix' * purvarmatrix) ./ obs
weights[j] = det( O );
end
purity_Spec = weights .* purity'
push!(purvarindex, argmax(purity_Spec)[1])
pureX[i,:] = purity_Spec;
end

pureX = Xcpy[ : , includedvars[purvarindex[1:end]] ]
purespectra = pureX \ Xcpy
pureinX = Xcpy[ : , includedvars[purvarindex] ]
purespectra = pureinX \ Xcpy
pureabundance = Xcpy / purespectra

scale = LinearAlgebra.Diagonal(1.0 ./ sum(pureabundance, dims = 2))
pureabundance = pureabundance * scale
purespectra = Base.inv( scale ) * purespectra
return (pureabundance[:,2:end], purespectra[2:end,:], includedvars[purvarindex[2:end]])
return (pureabundance, purespectra, includedvars[purvarindex])
end
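The selection loop above can be read as: repeatedly add the candidate column that maximizes the determinant of the normalized cross-product of the already-chosen columns plus the candidate, scaled by a purity score. A standalone sketch of that single step (hypothetical names and simplified shapes — not the package API):

```julia
using LinearAlgebra

# Given normalized data Xp, the indices already chosen, and a per-column
# purity score, return the index of the next "pure" variable: the column j
# maximizing det( Xp[:, [chosen..., j]]' * Xp[:, [chosen..., j]] / obs ) * purity[j].
function next_pure_variable(Xp::Matrix{Float64}, chosen::Vector{Int}, purity::Vector{Float64})
    obs, vars = size(Xp)
    weights = zeros(vars)
    for j in 1:vars
        cols = vcat(chosen, j)
        O = (Xp[:, cols]' * Xp[:, cols]) ./ obs
        weights[j] = det(O)   # collinear candidates drive the determinant to ~0
    end
    return argmax(weights .* purity)
end

# Column 1 is already chosen; column 2 is orthogonal to it, column 3 is not.
Xp = [1.0 0.0 0.5; 0.0 1.0 0.5]
next_pure_variable(Xp, [1], ones(3))   # → 2
```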

"""
@@ -341,7 +342,7 @@ DOI: 10.1016/S0019-0578(99)00022-1
"""
function ITTFA(X; Factors = 1, Components = Factors, maxiters = 500,
threshold = 1e-8, nonnegativity = true)
@warn("Has not been tested for correctness.")
@warn("ITTFA has not been tested for correctness.")
rows, vars = size( X );
Result = zeros( Components, rows )
selectedneedles = zeros( Components )
29 changes: 20 additions & 9 deletions src/Hyperspectral.jl
@@ -1,8 +1,12 @@
#This is a workspace for hyperspectral imaging methods
#Maybe some NWAY stuff will fall in here...
#This shouldn't be on master, but I won't put anything here unless I think it'll work.

"""
ACE(Background, X, Target)
Untested
"""
function ACE(Background, X, Target)
@assert( length(size(Background)) < 4 )
if length(size(Background)) == 3
Background = reshape(Background, prod( size( Background )[1:2 ]), size( Background )[ 3 ] )
end
@@ -16,15 +20,22 @@ function ACE(Background, X, Target)
return (numerator * numerator) / denominator
end

#MF is always superior to CEM. Xiurui Geng, Luyan Ji, Weitun Yang, Fuxiang Wang, Yongchao Zhao
#https://arxiv.org/pdf/1612.00549.pdf
"""
MatchedFilter(X, Target)
Untested
MatchedFilter is always superior to CEM. Xiurui Geng, Luyan Ji, Weitun Yang, Fuxiang Wang, Yongchao Zhao
https://arxiv.org/pdf/1612.00549.pdf
"""
function MatchedFilter(X, Target)
if length(size(Background)) == 3
Background = reshape(Background, prod( size( Background )[1:2 ]), size( Background )[ 3 ] )
@assert( length(size(X)) < 4 )
if length(size(X)) == 3
X = reshape(X, prod( size( X )[1:2 ]), size( X )[ 3 ] )
end
mu = Statistics.mean(Background, dims = 1)
mcent = Background .- mu
covinv = Base.inv( ( 1.0 / size(Background)[1] ) .* (mcent' * mcent) )
mu = Statistics.mean(X, dims = 1)
mcent = X .- mu
covinv = Base.inv( ( 1.0 / size(X)[1] ) .* (mcent' * mcent) )
tmu = Target .- mu
xmu = X .- mu
numerator = covinv * tmu
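The truncated body above computes the classic matched-filter score against a background estimated from `X` itself. A self-contained sketch of the same detector (illustrative, not the package's exact implementation):

```julia
using Statistics, LinearAlgebra

# Matched-filter score: s(x) = (x - mu)' C^-1 (t - mu) / ((t - mu)' C^-1 (t - mu)),
# where mu and C are the mean and (biased) covariance of the rows of X.
# A pixel equal to the target scores exactly 1.
function matched_filter(X::Matrix{Float64}, target::Vector{Float64})
    mu = vec(mean(X, dims = 1))
    Xc = X .- mu'
    C  = (Xc' * Xc) ./ size(X, 1)
    w  = C \ (target .- mu)
    return (Xc * w) ./ ((target .- mu)' * w)
end

X = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 0.0]
scores = matched_filter(X, [1.0, 0.0])   # row 1 equals the target, so scores[1] ≈ 1.0
```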
24 changes: 11 additions & 13 deletions src/InHouseStats.jl
@@ -1,5 +1,4 @@
#This file will have 0 dependencies...

"""
rbinomial( p, size... )
@@ -144,17 +143,6 @@ function SampleSkewness(X)
return ( sqrt( N * (N - 1) ) / (N - 2) ) * Skewness( X )
end


#This was written for an algorithm and didn't fit in anywhere so for now it's kept
#but it may not have use...
struct PermutedVectorPair{A,B,C}
vec1::A
vec2::B
operation::C
i::Int
length::Int
end

"""
CorrelationMatrix(X; DOF_used = 0)
@@ -180,7 +168,7 @@ end
Returns the Pearson correlation of 2 vectors.
This is included because a legible implementation was hard for me
to find some years ago (kept here for the reader).
"""
function CorrelationVectors( A, B )
obs = length( A )
@@ -190,6 +178,16 @@ function CorrelationVectors( A, B )
return ( A' * B ) * ( 1 / ( (obs - 1) * std( A ) * std( B )) )
end

#This was written for an algorithm and didn't fit in anywhere so for now it's kept
#but it may not have use...
struct PermutedVectorPair{A,B,C}
vec1::A
vec2::B
operation::C
i::Int
length::Int
end

"""
PermutedVectorPair(vec1, vec2; op = +)
8 changes: 6 additions & 2 deletions src/RegressionModels.jl
@@ -1,7 +1,7 @@
abstract type RegressionModels end
#If only we could add methods to abstract types...
# (M::RegressionModel)(X) = RegressionOut(X, M)
# maybe in Julia 2.0? - Update: Julia 1.2 allows this!!!
# maybe in Julia 2.0? - Update: Julia 1.2+ allows this!!!

struct ClassicLeastSquares <: RegressionModels
Coefficients::Array
@@ -106,7 +106,10 @@ Returns a LSSVM Wrapper for a CLS object.
"""
function LSSVM( X, Y, Penalty; KernelParameter = 0.0, KernelType = "linear" )
Kern = Kernel( KernelParameter, KernelType, X )
return LSSVM(Kern, RidgeRegression( formatlssvminput( Kern( X ) ), vcat( 0.0, Y ), Penalty ) )
return LSSVM( Kern,
RidgeRegression( formatlssvminput( Kern( X ) ),
vcat( 0.0, Y ),
Penalty ) )
end

"""
@@ -273,6 +276,7 @@ end
Performs a monotone/isotonic regression on a vector x. This can be weighted
with a vector w.
Code was translated directly from:
Exceedingly Simple Monotone Regression. Jan de Leeuw. Version 02, March 30, 2017
"""
function MonotoneRegression(x::Array{Float64,1}, w = nothing)
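For context, the pool-adjacent-violators algorithm (PAVA) that underlies isotonic fits like this can be sketched in a few lines. This is an unweighted, illustrative version, not the package's translation of de Leeuw's code:

```julia
# Pool-adjacent-violators: scan left to right, and whenever a block mean
# exceeds the next one, merge the two blocks into their pooled mean.
# The result is the nondecreasing vector closest to y in least squares.
function pava(y::Vector{Float64})
    n = length(y)
    vals = copy(y); counts = ones(Float64, n); m = 0
    for i in 1:n
        m += 1; vals[m] = y[i]; counts[m] = 1.0
        while m > 1 && vals[m-1] > vals[m]
            pooled = (vals[m-1]*counts[m-1] + vals[m]*counts[m]) / (counts[m-1] + counts[m])
            counts[m-1] += counts[m]; vals[m-1] = pooled; m -= 1
        end
    end
    out = Float64[]
    for k in 1:m, _ in 1:Int(counts[k])
        push!(out, vals[k])
    end
    return out
end

pava([1.0, 3.0, 2.0, 4.0])   # → [1.0, 2.5, 2.5, 4.0]
```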