Smile — Statistical Machine Intelligence and Learning Engine

Goal

Smile is a fast and comprehensive machine learning framework in Java. Smile also provides APIs in Scala, Kotlin, and Clojure that follow each language's idioms. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. Smile covers a broad range of machine learning topics, including deep learning, large language models, classification, regression, clustering, association rule mining, feature selection and extraction, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, and efficient nearest neighbor search. Smile also provides advanced algorithms for graphs, linear algebra, numerical analysis, interpolation, and data visualization, as well as a computer algebra system for symbolic manipulation.

Features

Smile implements the following major machine learning algorithms:

GenAI: Native Java implementation of Llama 3.1, tiktoken tokenizer, high-performance LLM inference server with OpenAI-compatible APIs and SSE-based chat streaming, and a fully functional frontend. A free service is available for personal or test usage. No registration is required.
Deep Learning: Deep learning with CPU and GPU. EfficientNet model for image classification.
Classification: Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting, Random Forest, Logistic Regression, Neural Networks, RBF Networks, Maximum Entropy Classifier, KNN, Naïve Bayesian, Fisher/Linear/Quadratic/Regularized Discriminant Analysis.
Regression: Support Vector Regression, Gaussian Process, Regression Trees, Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, ElasticNet, Ridge Regression.
Feature Selection: Genetic Algorithm based Feature Selection, Ensemble Learning based Feature Selection, TreeSHAP, Signal-to-Noise Ratio, Sum of Squares Ratio.
Clustering: BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical Clustering, Sequential Information Bottleneck, Self-Organizing Maps, Spectral Clustering, Minimum Entropy Clustering.
Association Rule & Frequent Itemset Mining: FP-growth mining algorithm.
Manifold Learning: IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA.
Multi-Dimensional Scaling: Classical MDS, Isotonic MDS, Sammon Mapping.
Nearest Neighbor Search: BK-Tree, Cover Tree, KD-Tree, SimHash, LSH.
Sequence Learning: Hidden Markov Model, Conditional Random Field.
Natural Language Processing: Sentence Splitter and Tokenizer, Bigram Statistical Test, Phrase Extractor, Keyword Extractor, Stemmer, POS Tagging, Relevance Ranking.

License

SMILE employs a dual license model designed to meet the development and distribution needs of both commercial distributors (such as OEMs, ISVs and VARs) and open source projects. For details, please see LICENSE. To acquire a commercial license, please contact [email protected].

Issues/Discussions

Discussion/Questions: If you wish to ask questions about Smile, we're active on GitHub Discussions and Stack Overflow.
Docs: Smile is well documented, and our docs are available online, where you can find tutorials, programming guides, and more information. If you'd like to help improve the docs, they're part of this repository in the web/src directory. Java Docs, Scala Docs, Kotlin Docs, and Clojure Docs are also available.
Issues/Feature Requests: Please report bugs and feature requests to our issue tracker.

Installation

You can use the libraries from the Maven Central repository by adding the following to your project's pom.xml file.

    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-core</artifactId>
      <version>4.2.0</version>
    </dependency>

For deep learning and NLP, use the artifactIds smile-deep and smile-nlp, respectively.

For the Scala API, add the following to your sbt build file.

    libraryDependencies += "com.github.haifengl" %% "smile-scala" % "4.2.0"

For the Kotlin API, add the following to the dependencies section of your Gradle build script.

    implementation("com.github.haifengl:smile-kotlin:4.2.0")

For the Clojure API, add the following dependency to your project file:

    [org.clojars.haifengl/smile "4.2.0"]

Some algorithms rely on BLAS and LAPACK (e.g. manifold learning, some clustering algorithms, Gaussian Process regression, and MLP). To use these algorithms, you should include OpenBLAS for optimized matrix computation (shown here in sbt syntax):

    libraryDependencies ++= Seq(
      "org.bytedeco" % "javacpp"   % "1.5.11"        classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
      "org.bytedeco" % "openblas"  % "0.3.28-1.5.11" classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
      "org.bytedeco" % "arpack-ng" % "3.9.1-1.5.11"  classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64"
    )

In this example, we include all supported 64-bit platforms and filter out 32-bit platforms. Include only the platforms you need to save space.

If you prefer another BLAS implementation, you can use any library found on the "java.library.path" or on the class path by specifying it with the "org.bytedeco.openblas.load" system property. For example, to use the BLAS library from the Accelerate framework on macOS, pass an option such as -Dorg.bytedeco.openblas.load=blas.
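To check that the -D option actually reached the JVM, a small program can read the property back. This is only a sketch using the standard System.getProperty call; the class and method names are illustrative and not part of Smile or JavaCPP, and it reports the requested name without loading any native library.

```java
public class BlasPropertyCheck {
    // Property name taken from the paragraph above.
    static final String KEY = "org.bytedeco.openblas.load";

    /** Returns the library name requested with -Dorg.bytedeco.openblas.load, or "(not set)". */
    static String requestedBlas() {
        String name = System.getProperty(KEY);
        return name == null ? "(not set)" : name;
    }

    public static void main(String[] args) {
        System.out.println(KEY + " = " + requestedBlas());
    }
}
```

Running it with java -Dorg.bytedeco.openblas.load=blas BlasPropertyCheck.java should echo the value passed on the command line.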

Smile automatically switches to MKL if you have a default installation of MKL, or if you include the following modules, which bundle the full MKL binaries.

    libraryDependencies ++= {
      val version = "2025.0-1.5.11"
      Seq(
        "org.bytedeco" % "mkl-platform"        % version,
        "org.bytedeco" % "mkl-platform-redist" % version
      )
    }

Shell

Smile comes with interactive shells for Java, Scala and Kotlin. Download the pre-packaged Smile from the releases page. After unzipping the package, cd into the Smile home directory in a terminal and type

    ./bin/jshell.sh

to enter the Smile shell in Java, which pre-imports all major Smile packages. You can run any valid Java expression in the shell. In the simplest case, you can use it as a calculator.

To enter the shell in Scala, type

    ./bin/smile

As in the Java shell, all major Smile packages are pre-imported. In addition, all high-level Smile operators are predefined in the shell.

By default, the shell uses up to 75% of memory. If you need more memory to handle large data, use the -J-Xmx or -XX:MaxRAMPercentage option. For example,

    ./bin/smile -J-Xmx30G

You can also modify the configuration file ./conf/smile.ini for the memory and other JVM settings.
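To confirm the heap limit the JVM actually picked up, something like the following can be compiled on its own, or its method body pasted into the Java shell. It is only a sketch built on the standard Runtime API; the class and method names are illustrative and not part of Smile.

```java
public class HeapLimitCheck {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // The printed value depends on the machine and on options such as -Xmx.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```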

To use the Smile shell in Kotlin, type

    ./bin/kotlin.sh

Unfortunately, the Kotlin shell doesn't support pre-importing packages.

Model Serialization

Most models implement the Java Serializable interface (all classifiers do), so you can serialize a model and ship it to a production environment for inference. You may also use serialized models in other systems such as Spark.
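As a rough illustration of that round trip, the sketch below serializes an object to bytes and restores it with the standard java.io object streams. ThresholdModel is a hypothetical stand-in, not a Smile class; a trained Smile model that implements Serializable could be passed through the same two helper methods, which are also illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ModelSerializationSketch {
    /** Hypothetical stand-in for a trained model; not part of Smile. */
    static class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double threshold;

        ThresholdModel(double threshold) {
            this.threshold = threshold;
        }

        int predict(double x) {
            return x >= threshold ? 1 : 0;
        }
    }

    /** Writes any Serializable object to a byte array. */
    static byte[] serialize(Serializable model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        }
        return bytes.toByteArray();
    }

    /** Reads an object back; the caller chooses the expected type. */
    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ThresholdModel trained = new ThresholdModel(0.5);
        byte[] payload = serialize(trained);          // e.g. write to a file or send over the network
        ThresholdModel restored = deserialize(payload);
        System.out.println(restored.predict(0.7));    // prints 1
        System.out.println(restored.predict(0.2));    // prints 0
    }
}
```

In production the byte array would typically be replaced by a file or network stream, and the same class version must be on the class path of the process that deserializes the model.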

Visualization

A picture is worth a thousand words. In machine learning, we usually handle high-dimensional data, which cannot be drawn on a display directly. But a variety of statistical plots are tremendously valuable for grasping the characteristics of many data points. Smile provides data visualization tools such as plots and maps that help researchers understand information more easily and quickly. To use smile-plot, add the following to your dependencies:

    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-plot</artifactId>
      <version>4.2.0</version>
    </dependency>

On Swing-based systems, you can use the smile.plot.swing package to create a variety of plots such as scatter plot, line plot, staircase plot, bar plot, box plot, histogram, 3D histogram, dendrogram, heatmap, hexmap, QQ plot, contour plot, surface plot, and wireframe.

This library also supports declarative data visualization. With the smile.plot.vega package, we can create a specification that describes visualizations as mappings from data to properties of graphical marks (e.g., points or bars). The specification is based on Vega-Lite. In a web browser, the Vega-Lite compiler automatically produces visualization components including axes, legends, and scales. It then determines properties of these components based on a set of carefully designed rules.
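To make the idea of a specification concrete, the sketch below assembles a minimal Vega-Lite scatter plot as a JSON string in plain Java. It deliberately does not use the smile.plot.vega API; the class name, method name, and the three inline data points are made up for illustration, and only the JSON keys ($schema, data, mark, encoding) come from Vega-Lite itself.

```java
public class VegaLiteSpecSketch {
    /** Builds a minimal Vega-Lite scatter-plot specification as a JSON string. */
    static String scatterSpec(String xField, String yField) {
        // "mark" picks the graphical mark; "encoding" maps data fields to its properties.
        return """
            {
              "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
              "data": {"values": [
                {"x": 1.0, "y": 2.1},
                {"x": 2.0, "y": 3.9},
                {"x": 3.0, "y": 6.2}
              ]},
              "mark": "point",
              "encoding": {
                "x": {"field": "%s", "type": "quantitative"},
                "y": {"field": "%s", "type": "quantitative"}
              }
            }
            """.formatted(xField, yField);
    }

    public static void main(String[] args) {
        System.out.println(scatterSpec("x", "y"));
    }
}
```

The spec says nothing about axes, legends, or scales; those are the components the Vega-Lite compiler derives when the JSON is rendered in a browser.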

Contributing

Please read CONTRIBUTING.md to learn how to build and test Smile.

Maintainers

Haifeng Li (@haifengl)
Karl Li (@kklioss)

Gallery

Scatterplot Matrix
Scatter Plot	Line Plot	Surface Plot
Bar Plot	Box Plot	Histogram Heatmap
Rolling Average	Geo Map	UMAP
Text Plot	Heatmap with Contour	Hexmap
IsoMap	LLE	Kernel PCA
Neural Network	SVM	Hierarchical Clustering
SOM	DBSCAN	Neural Gas
Wavelet	Exponential Family Mixture	Teapot Wireframe
Grid Interpolation
