doc updates, minor bug fixes, Readme updates.
caseykneale committed Oct 26, 2019
1 parent a153fb9 commit 7fa602d
Showing 10 changed files with 103 additions and 61 deletions.
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name = "ChemometricsTools"
uuid = "a9718f02-dbee-5ae5-ad0e-dfbd07fa387b"
authors = ["caseykneale "]
version = "0.5.7"
version = "0.5.8"

[deps]
Arpack = "7d9fca2a-8960-54d3-9f78-7d1dccf2cb97"
14 changes: 9 additions & 5 deletions README.md
@@ -1,7 +1,7 @@
[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://caseykneale.github.io/ChemometricsTools.jl/dev/) [![Build Status](https://travis-ci.org/caseykneale/ChemometricsTools.jl.svg?branch=master)](https://travis-ci.org/caseykneale/ChemometricsTools.jl)

# ChemometricsTools.jl
This package contains a collection of tools to perform fundamental and advanced Chemometric analyses in Julia. It is currently richer and more fundamental than any single free chemometrics package available in any other language. If you are unfamiliar with what Chemometrics is: it could inelegantly be described as the marriage between data science and chemistry. Traditionally it is the symbiosis of applied linear algebra/statistics, disciplined by the physics and meaning of chemical measurements. This is somewhat orthogonal to most specializations of machine learning, where "add more layers" is the modus operandi. Sometimes chemometricians also get *desperate* and break out pure machine learning methods - so some of those techniques are in this package.
This package contains a collection of tools to perform fundamental and advanced Chemometric analyses in Julia. It is currently richer and more fundamental than any single free chemometrics package available in any other language. If you are unfamiliar with what Chemometrics is: it could inelegantly be described as the marriage between data science and chemistry. Traditionally it is the symbiosis of applied linear algebra/statistics, disciplined by the physics and meaning of chemical measurements. This is somewhat orthogonal to most specializations of machine learning, where "add more layers" is the modus operandi. Sometimes chemometricians also weigh the pros and cons of black box modelling and break out pure machine learning methods - so some of those techniques are in this package.

## Tutorials/Demonstrations:
- [Transforms](https://caseykneale.github.io/ChemometricsTools.jl/dev/Demos/Transforms/)
@@ -18,7 +18,7 @@ This package contains a collection of tools to perform fundamental and advanced
- [Regression](https://github.com/caseykneale/ChemometricsTools.jl/blob/master/shootouts/RegressionShootout.jl)
- [Fault Detection](https://github.com/caseykneale/ChemometricsTools.jl/blob/master/shootouts/AnomalyShootout.jl)

### Package Status => Fleshing Out (v 0.5.7)
### Package Status => Fleshing Out (v 0.5.8)
ChemometricsTools has been accepted as an official Julia package! Yep, so you can ```Pkg.add("ChemometricsTools")``` to install it. A lot of features have been added since the first public release (v 0.2.3). In 0.5.7 almost all of the functionality available can be used/abused. If you find a bug or want a new feature don't be shy - file an issue. In v0.5.1 Plots was removed as a dependency, new plot recipes were added, and now the package compiles much faster! Multilinear modeling, univariate modeling, and DOE functions are now available. Making headway into the release plan for v0.6.0. Convenience functions, documentation, bug fixes, refactoring, and clean up are in progress; bear with me. The git repo's master branch typically has the most advanced version, but the features on it may be less reliable because I like to do development on it.
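For reference, installing the registered package and loading it from the Julia REPL looks like:

```julia
# Install the registered package from the General registry, then load it.
using Pkg
Pkg.add("ChemometricsTools")
using ChemometricsTools
```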

### Seeking Collaborators
@@ -44,7 +44,7 @@ ChemometricsTools offers easy-to-use iterators for K-folds validations, and mov
This package features dozens of regression performance metrics, and a few built-in plots (Bland-Altman, QQ, interval overlays, etc.) are included. The list of regression methods currently includes: CLS, Ridge, Kernel Ridge, LS-SVM, PCR, PLS(1/2), ELM's, Regression Trees, Random Forest, Monotone Regression... More to come. Chemometricians love regressions! I've also added some convenience functions for univariate calibrations, standard addition experiments, and some automated plot functions for them.
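To give a flavor of the simplest of these methods, here is a minimal ridge regression in plain Julia. This is an illustration only, not the package's `RidgeRegression` API; `ridge_fit` is a hypothetical name:

```julia
using LinearAlgebra

# Closed-form ridge regression: beta = (X'X + lambda*I) \ (X'y).
# lambda = 0 reduces to ordinary least squares.
function ridge_fit(X::Matrix{Float64}, y::Vector{Float64}, lambda::Float64)
    p = size(X, 2)
    return (X' * X + lambda * I(p)) \ (X' * y)
end

X = [1.0 0.0; 0.0 1.0; 1.0 1.0]
y = [1.0, 2.0, 3.0]
beta = ridge_fit(X, y, 0.0)   # → [1.0, 2.0], the OLS solution
```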

### Classification Modeling
In-house classification encodings (one cold/one hot), and easy to retrieve global or multiclass performance statistics. ChemometricsTools currently includes: LDA/PCA with Gaussian discriminants, also Hierarchical LDA, multinomial softmax/logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, Probabilistic Neural Networks, LinearPerceptrons, and more to come. You can also conveniently dump classification statistics to LaTeX/CSV reports!
In-house classification encodings (one cold/one hot), and easy to retrieve global or multiclass performance statistics. ChemometricsTools currently includes: LDA/PCA with Gaussian discriminants, Hierarchical LDA, SIMCA, multinomial softmax/logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, Probabilistic Neural Networks, LinearPerceptrons, and more to come. You can also conveniently dump classification statistics to LaTeX/CSV reports!
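As a sketch of what a one-hot encoding produces — illustrative only; `onehot` here is a hypothetical helper, not the package's encoding API:

```julia
# Map each label to a row with a single 1 in its class column.
# Classes are taken in sorted order unless supplied explicitly.
function onehot(labels::Vector, classes = sort(unique(labels)))
    M = zeros(Int, length(labels), length(classes))
    for (i, l) in enumerate(labels)
        M[i, findfirst(==(l), classes)] = 1
    end
    return M
end

onehot(["red", "blue", "red"])   # classes sorted as ["blue", "red"]
# → [0 1; 1 0; 0 1]
```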

### Multiway/Multilinear Modeling
I've been working to fill an obvious gap in the available tooling. Standard
@@ -62,9 +62,13 @@ I'd love for a collaborator to contribute some: spectra, chromatograms, etc. Ple
Well, I'd love to hammer in some time series methods. That was originally part of the plan. Then I realized [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl) already has the essentials for online learning covered. Surely many are contemplating packages with time series as a focus. Similarly, if you want clustering methods, just install [Clustering.jl](https://github.com/JuliaStats/Clustering.jl). I may add a few supportive odds and ends in here (or contribute to the packages directly) but really, most of the Julia 1.0+ ecosystem is reliable, well made, and community supported.

## ToDo:
- Hyperspectral data preprocessing methods that fit into pipelines/transforms.
- Design of Experiment tools (Partial Factorial design, D/I-optimal, etc...)?
- Clean up.
- Performance improvements.
- Syntax improvements.
- Documentation improvements.
- Unit tests.

## Maybes:
- Design of Experiment tools (Partial Factorial design, D/I-optimal, etc...)?
- Convenience functions for propagation of error, multiequilibria, kinetics?
- Electrochemical simulations and optical simulations (maybe separate packages...)?
1 change: 1 addition & 0 deletions docs/make.jl
@@ -52,6 +52,7 @@ makedocs(
"Speciality Tools" => Any[
"Model Analysis" => "man/modelanaly.md",
"MultiWay" => "man/MultiWay.md",
"Hyperspectral" => "man/Hyperspectral.md",
"Anomaly Detection" => "man/AnomalyDetection.md",
"Curve Resolution" => "man/CurveResolution.md"
],
8 changes: 8 additions & 0 deletions docs/src/man/Hyperspectral.md
@@ -0,0 +1,8 @@
# Hyperspectral Modelling Functions API Reference

## Functions

```@autodocs
Modules = [ChemometricsTools]
Pages = ["Hyperspectral.jl"]
```
2 changes: 1 addition & 1 deletion shootouts/ClassificationShootout.jl
@@ -1,4 +1,4 @@
push!(LOAD_PATH, "/home/caseykneale/Desktop/ChemometricsTools/ChemometricsTools.jl/");
#push!(LOAD_PATH, "/home/caseykneale/Desktop/ChemometricsTools/ChemometricsTools.jl/");
using ChemometricsTools
#View the data in the package space
ChemometricsToolsDatasets()
43 changes: 22 additions & 21 deletions src/CurveResolution.jl
@@ -60,53 +60,54 @@ function NMF(X; Factors = 1, tolerance = 1e-7, maxiters = 200)
return (W, H)
end


"""
SIMPLISMA(X; Factors = 1, alpha = 0.03, includedvars = 1:size(X)[2], SecondDeriv = true)
Performs SIMPLISMA on Array `X` using either the raw spectra or the Second Derivative spectra.
alpha can be set to reduce contributions of baseline, and a list of included variables in the determination
of pure variables may also be provided.
Returns a tuple of the following form: (Concentration Profile, Pure Spectral Estimates, Pure Variables)
W. Windig, Spectral Data Files for Self-Modeling Curve Resolution with Examples Using the SIMPLISMA Approach, Chemometrics and Intelligent Laboratory Systems, 36, 1997, 3-16.
"""
function SIMPLISMA(X; Factors = 1, alpha = 0.05, includedvars = 1:size(X)[2], SecondDeriv = true)
@warn("Has not been tested for correctness.")
function SIMPLISMA(X; Factors = 1, alpha = 0.03, includedvars = 1:size(X)[2],
SecondDeriv = true)
@warn("SIMPLISMA has not been completely tested for correctness.")
Xcpy = deepcopy(X)
X = X[:,includedvars]
if SecondDeriv
X = map( x -> max( x, 0.0 ), -SecondDerivative( X ) )
end
purvarindex = []
(obs, vars) = size(X)
pureX = zeros( Factors, vars )
weights = zeros( vars )

Col_Std = Statistics.std(X, dims = 1) .* sqrt( (obs - 1) / obs);
Col_Mu = Statistics.mean(X, dims = 1);
Robust_Col_Mu = Col_Mu .+ (alpha * reduce(max, Col_Mu) );
Norm = sqrt.( ((Col_Std .+ Robust_Col_Mu).^ 2) .+ (Col_Mu .^ 2) )
Normed = X ./ Norm
normcov = (Normed' * Normed) ./ obs
Robust_Col_Mu = Col_Mu .+ (alpha .* reduce(max, Col_Mu) );
NormFactor = sqrt.( ((Col_Std .+ Robust_Col_Mu).^ 2) .+ (Col_Mu .^ 2) )
Xp = X ./ NormFactor

purity = Col_Std ./ Robust_Col_Mu
purvarindex = []
weights = zeros( vars )

for i in 1 : (Factors+1)
for j in 1 : vars
if i > 1
weights[j] = LinearAlgebra.det( normcov[ [ j; purvarindex] , [j; purvarindex ] ] )
else
weights[j] = LinearAlgebra.det( normcov[ j , j ] )
end
for i in 1 : (Factors)
for j in 1 : vars
purvarmatrix = Xp[ : , vcat( purvarindex, j) ] ;
O = (purvarmatrix' * purvarmatrix) ./ obs
weights[j] = det( O );
end
purity_Spec = weights .* purity'
push!(purvarindex, argmax(purity_Spec)[1])
pureX[i,:] = purity_Spec;
end

pureX = Xcpy[ : , includedvars[purvarindex[1:end]] ]
purespectra = pureX \ Xcpy
pureinX = Xcpy[ : , includedvars[purvarindex] ]
purespectra = pureinX \ Xcpy
pureabundance = Xcpy / purespectra

scale = LinearAlgebra.Diagonal(1.0 ./ sum(pureabundance, dims = 2))
pureabundance = pureabundance * scale
purespectra = Base.inv( scale ) * purespectra
return (pureabundance[:,2:end], purespectra[2:end,:], includedvars[purvarindex[2:end]])
return (pureabundance, purespectra, includedvars[purvarindex])
end
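The selection loop above can be read as: repeatedly add the candidate column that maximizes the determinant of the normalized cross-product of the already-chosen columns plus the candidate, scaled by a purity score. A standalone sketch of that single step (hypothetical names and simplified shapes — not the package API):

```julia
using LinearAlgebra

# Given normalized data Xp, the indices already chosen, and a per-column
# purity score, return the index of the next "pure" variable: the column j
# maximizing det( Xp[:, [chosen..., j]]' * Xp[:, [chosen..., j]] / obs ) * purity[j].
function next_pure_variable(Xp::Matrix{Float64}, chosen::Vector{Int}, purity::Vector{Float64})
    obs, vars = size(Xp)
    weights = zeros(vars)
    for j in 1:vars
        cols = vcat(chosen, j)
        O = (Xp[:, cols]' * Xp[:, cols]) ./ obs
        weights[j] = det(O)   # collinear candidates drive the determinant to ~0
    end
    return argmax(weights .* purity)
end

# Column 1 is already chosen; column 2 is orthogonal to it, column 3 is not.
Xp = [1.0 0.0 0.5; 0.0 1.0 0.5]
next_pure_variable(Xp, [1], ones(3))   # → 2
```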

"""
@@ -341,7 +342,7 @@ DOI: 10.1016/S0019-0578(99)00022-1
"""
function ITTFA(X; Factors = 1, Components = Factors, maxiters = 500,
threshold = 1e-8, nonnegativity = true)
@warn("Has not been tested for correctness.")
@warn("ITTFA has not been tested for correctness.")
rows, vars = size( X );
Result = zeros( Components, rows )
selectedneedles = zeros( Components )
29 changes: 20 additions & 9 deletions src/Hyperspectral.jl
@@ -1,8 +1,12 @@
#This is a workspace for hyperspectral imaging methods
#Maybe some NWAY stuff will fall in here...
#This shouldn't be on master, but I won't put anything here unless I think it'll work.

"""
ACE(Background, X, Target)
Untested
"""
function ACE(Background, X, Target)
@assert( length(size(Background)) < 4 )
if length(size(Background)) == 3
Background = reshape(Background, prod( size( Background )[1:2 ]), size( Background )[ 3 ] )
end
@@ -16,15 +20,22 @@ function ACE(Background, X, Target)
return (numerator * numerator) / denominator
end

#MF is always superior to CEM. Xiurui Geng, Luyan Ji, Weitun Yang, Fuxiang Wang, Yongchao Zhao
#https://arxiv.org/pdf/1612.00549.pdf
"""
MatchedFilter(X, Target)
Untested
MatchedFilter is always superior to CEM. Xiurui Geng, Luyan Ji, Weitun Yang, Fuxiang Wang, Yongchao Zhao
https://arxiv.org/pdf/1612.00549.pdf
"""
function MatchedFilter(X, Target)
if length(size(Background)) == 3
Background = reshape(Background, prod( size( Background )[1:2 ]), size( Background )[ 3 ] )
@assert( length(size(X)) < 4 )
if length(size(X)) == 3
X = reshape(X, prod( size( X )[1:2 ]), size( X )[ 3 ] )
end
mu = Statistics.mean(Background, dims = 1)
mcent = Background .- mu
covinv = Base.inv( ( 1.0 / size(Background)[1] ) .* (mcent' * mcent) )
mu = Statistics.mean(X, dims = 1)
mcent = X .- mu
covinv = Base.inv( ( 1.0 / size(X)[1] ) .* (mcent' * mcent) )
tmu = Target .- mu
xmu = X .- mu
numerator = covinv * tmu
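The truncated body above computes the classic matched-filter score against a background estimated from `X` itself. A self-contained sketch of the same detector (illustrative, not the package's exact implementation):

```julia
using Statistics, LinearAlgebra

# Matched-filter score: s(x) = (x - mu)' C^-1 (t - mu) / ((t - mu)' C^-1 (t - mu)),
# where mu and C are the mean and (biased) covariance of the rows of X.
# A pixel equal to the target scores exactly 1.
function matched_filter(X::Matrix{Float64}, target::Vector{Float64})
    mu = vec(mean(X, dims = 1))
    Xc = X .- mu'
    C  = (Xc' * Xc) ./ size(X, 1)
    w  = C \ (target .- mu)
    return (Xc * w) ./ ((target .- mu)' * w)
end

X = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 0.0]
scores = matched_filter(X, [1.0, 0.0])   # row 1 equals the target, so scores[1] ≈ 1.0
```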
24 changes: 11 additions & 13 deletions src/InHouseStats.jl
@@ -1,5 +1,4 @@
#This file will have 0 dependencies...

"""
rbinomial( p, size... )
@@ -144,17 +143,6 @@ function SampleSkewness(X)
return ( sqrt( N * (N - 1) ) / (N - 2) ) * Skewness( X )
end


#This was written for an algorithm and didn't fit in anywhere so for now it's kept
#but it may not have use...
struct PermutedVectorPair{A,B,C}
vec1::A
vec2::B
operation::C
i::Int
length::Int
end

"""
CorrelationMatrix(X; DOF_used = 0)
@@ -180,7 +168,7 @@ end
Returns the Pearson correlation of 2 vectors.
This is included because a legible implementation was hard for me
to find some years ago (kept here for the reader).
"""
function CorrelationVectors( A, B )
obs = length( A )
@@ -190,6 +178,16 @@ function CorrelationVectors( A, B )
return ( A' * B ) * ( 1 / ( (obs - 1) * std( A ) * std( B )) )
end

#This was written for an algorithm and didn't fit in anywhere so for now it's kept
#but it may not have use...
struct PermutedVectorPair{A,B,C}
vec1::A
vec2::B
operation::C
i::Int
length::Int
end

"""
PermutedVectorPair(vec1, vec2; op = +)
8 changes: 6 additions & 2 deletions src/RegressionModels.jl
@@ -1,7 +1,7 @@
abstract type RegressionModels end
#If only we could add methods to abstract types...
# (M::RegressionModel)(X) = RegressionOut(X, M)
# maybe in Julia 2.0? - Update: Julia 1.2 allows this!!!
# maybe in Julia 2.0? - Update: Julia 1.2+ allows this!!!

struct ClassicLeastSquares <: RegressionModels
Coefficients::Array
@@ -106,7 +106,10 @@ Returns a LSSVM Wrapper for a CLS object.
"""
function LSSVM( X, Y, Penalty; KernelParameter = 0.0, KernelType = "linear" )
Kern = Kernel( KernelParameter, KernelType, X )
return LSSVM(Kern, RidgeRegression( formatlssvminput( Kern( X ) ), vcat( 0.0, Y ), Penalty ) )
return LSSVM( Kern,
RidgeRegression( formatlssvminput( Kern( X ) ),
vcat( 0.0, Y ),
Penalty ) )
end

"""
@@ -273,6 +276,7 @@ end
Performs a monotone/isotonic regression on a vector x. This can be weighted
with a vector w.
Code was translated directly from:
Exceedingly Simple Monotone Regression. Jan de Leeuw. Version 02, March 30, 2017
"""
function MonotoneRegression(x::Array{Float64,1}, w = nothing)
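For context, the pool-adjacent-violators algorithm (PAVA) that underlies isotonic fits like this can be sketched in a few lines. This is an unweighted, illustrative version, not the package's translation of de Leeuw's code:

```julia
# Pool-adjacent-violators: scan left to right, and whenever a block mean
# exceeds the next one, merge the two blocks into their pooled mean.
# The result is the nondecreasing vector closest to y in least squares.
function pava(y::Vector{Float64})
    n = length(y)
    vals = copy(y); counts = ones(Float64, n); m = 0
    for i in 1:n
        m += 1; vals[m] = y[i]; counts[m] = 1.0
        while m > 1 && vals[m-1] > vals[m]
            pooled = (vals[m-1]*counts[m-1] + vals[m]*counts[m]) / (counts[m-1] + counts[m])
            counts[m-1] += counts[m]; vals[m-1] = pooled; m -= 1
        end
    end
    out = Float64[]
    for k in 1:m, _ in 1:Int(counts[k])
        push!(out, vals[k])
    end
    return out
end

pava([1.0, 3.0, 2.0, 4.0])   # → [1.0, 2.5, 2.5, 4.0]
```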