A Machine Learning Framework for Julia
MLJ is a machine learning framework for Julia that aims to provide a convenient way to use and combine the multitude of tools and models available in the Julia ML/Stats ecosystem. MLJ is released under the MIT license and is sponsored by the Alan Turing Institute.
Using MLJ • Models Available • MLJ Universe • Contributing • MLJ Cheatsheet • Citing MLJ
To deal with MKL errors encountered on macOS, see the Known issues section below.
Key goals
- Offer a consistent way to use, compose and tune machine learning models in Julia,
- Promote the improvement of the Julia ML/Stats ecosystem by making it easier to use models from a wide range of packages,
- Unlock performance gains by exploiting Julia's support for parallelism, automatic differentiation, GPUs, optimisation, etc.
Key features
- Data agnostic: train models on any data supported by the Tables.jl interface.
- Extensive support for model composition (pipelines and learning networks).
- Convenient syntax to tune and evaluate (composite) models (see the sketch after this list).
- Consistent interface for handling probabilistic predictions.
- Extensible tuning interface, supporting a growing number of optimization strategies and designed to play well with model composition.
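To illustrate the last three features, here is a minimal sketch of evaluating and tuning a probabilistic classifier. It assumes DecisionTree.jl is installed in your environment and that your installed MLJ/MLJTuning versions export @load_iris, evaluate, cross_entropy and TunedModel (exact keyword names may vary slightly between versions):

using MLJ

X, y = @load_iris                        # built-in demonstration dataset
@load DecisionTreeClassifier             # assumes DecisionTree.jl is installed
tree = DecisionTreeClassifier()          # a probabilistic classifier

# Evaluate: estimate out-of-sample performance by 6-fold cross-validation,
# using a measure that consumes probabilistic predictions:
evaluate(tree, X, y, resampling=CV(nfolds=6, shuffle=true), measure=cross_entropy)

# Tune: wrap the model in a self-tuning strategy over a hyperparameter range:
r = range(tree, :max_depth, lower=1, upper=5)
tuned_tree = TunedModel(model=tree, tuning=Grid(resolution=5),
                        resampling=CV(nfolds=3), range=r, measure=cross_entropy)
fit!(machine(tuned_tree, X, y))          # tunes, then retrains with the best settings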
Using MLJ
We recommend installing MLJ and associated packages in a new environment to avoid package conflicts. You can do this with
julia> using Pkg; Pkg.activate("My_MLJ_env", shared=true)
Installing MLJ is also done with the package manager:
julia> Pkg.add("MLJ")
Note that MLJ is essentially a wrapper providing unified access to model-providing packages, so you will also need to make sure these packages are available in your environment. For instance, if you want to use a Decision Tree Classifier, you need to have DecisionTree.jl installed:
julia> Pkg.add("DecisionTree");
julia> using MLJ;
julia> @load DecisionTreeClassifier
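Once loaded, a model is bound to data in a machine, which handles training and prediction. A minimal sketch using the built-in iris dataset (the max_depth value below is purely illustrative):

using MLJ
@load DecisionTreeClassifier               # assumes DecisionTree.jl is installed

X, y = @load_iris
tree = DecisionTreeClassifier(max_depth=3) # set hyperparameters at construction
mach = machine(tree, X, y)                 # bind the model to data
train, test = partition(eachindex(y), 0.7, shuffle=true)
fit!(mach, rows=train)                     # train on the training rows
yhat = predict(mach, rows=test)            # probabilistic predictions
predict_mode(mach, rows=test)              # corresponding point predictions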
For a list of models and their packages see the table below, or run
using MLJ
models()
We recommend you start with models marked as coming from mature packages such as DecisionTree, ScikitLearn or XGBoost.
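In recent MLJ versions you can also restrict the listing to models compatible with your particular data (a hedged sketch; matching may not be available in older versions):

using MLJ
X, y = @load_iris
models(matching(X, y))    # models whose input and target scientific types match X and y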
MLJ is supported by a number of satellite packages (MLJTuning, MLJModelInterface, etc.) which the general user is not required to install directly. Developers can learn more about these here.
Tutorials
The best place to get started with MLJ is the MLJ Tutorials website. Each tutorial can be downloaded as a notebook or Julia script to facilitate experimentation with the packages. For more comprehensive documentation, see the user manual.
You're also welcome to join the #mlj Julia Slack channel to ask questions and make suggestions.
Known issues using ScikitLearn models on macOS
For macOS users running Julia 1.3 or higher, using ScikitLearn models can lead to unexpected MKL errors due to an issue not related to MLJ. See this Julia Discourse discussion and this issue for context.
A temporary workaround for this issue is to force the installation of an older version of the OpenSpecFun_jll library. To install an appropriate version, activate your MLJ environment and run:

julia> using Pkg; Pkg.develop(PackageSpec(url="https://github.com/tlienart/OpenSpecFun_jll.jl"))
Available Models
MLJ provides access to a wide variety of machine learning models. We are always looking for help adding new models or testing existing ones. Currently available models are listed below; for the most up-to-date list, run using MLJ; models() (see also the sketch after the maturity key below).
- experimental: indicates the package is fairly new and/or is under active development; you can help by testing these packages and making them more robust,
- medium: indicates the package is fairly mature but may benefit from optimisations and/or extra features; you can help by suggesting either,
- high: indicates the package is very mature and functionalities are expected to have been fairly optimised and tested.
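As a hedged sketch of how to inspect a model's registry metadata programmatically (assuming your MLJ version exports info; the available metadata fields may vary):

using MLJ
info("DecisionTreeClassifier", pkg="DecisionTree")  # package, input/target types, docstring, etc.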
Package | Models | Maturity | Note |
---|---|---|---|
Clustering.jl | KMeans, KMedoids | high | † |
DecisionTree.jl | DecisionTreeClassifier, DecisionTreeRegressor, AdaBoostStumpClassifier | high | † |
EvoTrees.jl | EvoTreeRegressor, EvoTreeClassifier, EvoTreeCount, EvoTreeGaussian | medium | gradient boosting models |
GLM.jl | LinearRegressor, LinearBinaryClassifier, LinearCountRegressor | medium | † |
LightGBM.jl | LightGBMClassifier, LightGBMRegressor | high | |
LIBSVM.jl | LinearSVC, SVC, NuSVC, NuSVR, EpsilonSVR, OneClassSVM | high | also via ScikitLearn.jl |
MLJModels.jl (builtins) | StaticTransformer, FeatureSelector, FillImputer, UnivariateStandardizer, Standardizer, UnivariateBoxCoxTransformer, OneHotEncoder, ContinuousEncoder, ConstantRegressor, ConstantClassifier | medium | |
MLJLinearModels.jl | LinearRegressor, RidgeRegressor, LassoRegressor, ElasticNetRegressor, QuantileRegressor, HuberRegressor, RobustRegressor, LADRegressor, LogisticClassifier, MultinomialClassifier | experimental | |
MultivariateStats.jl | RidgeRegressor, PCA, KernelPCA, ICA, LDA, BayesianLDA, SubspaceLDA, BayesianSubspaceLDA | high | † |
NaiveBayes.jl | GaussianNBClassifier, MultinomialNBClassifier, HybridNBClassifier | experimental | |
NearestNeighbors.jl | KNNClassifier, KNNRegressor | high | |
ParallelKMeans.jl | KMeans | experimental | |
ScikitLearn.jl | ARDRegressor, AdaBoostClassifier, AdaBoostRegressor, AffinityPropagation, AgglomerativeClustering, BaggingClassifier, BaggingRegressor, BayesianLDA, BayesianQDA, BayesianRidgeRegressor, BernoulliNBClassifier, Birch, ComplementNBClassifier, DBSCAN, DummyClassifier, DummyRegressor, ElasticNetCVRegressor, ElasticNetRegressor, ExtraTreesClassifier, ExtraTreesRegressor, FeatureAgglomeration, GaussianNBClassifier, GaussianProcessClassifier, GaussianProcessRegressor, GradientBoostingClassifier, GradientBoostingRegressor, HuberRegressor, KMeans, KNeighborsClassifier, KNeighborsRegressor, LarsCVRegressor, LarsRegressor, LassoCVRegressor, LassoLarsCVRegressor, LassoLarsICRegressor, LassoLarsRegressor, LassoRegressor, LinearRegressor, LogisticCVClassifier, LogisticClassifier, MeanShift, MiniBatchKMeans, MultiTaskElasticNetCVRegressor, MultiTaskElasticNetRegressor, MultiTaskLassoCVRegressor, MultiTaskLassoRegressor, MultinomialNBClassifier, OPTICS, OrthogonalMatchingPursuitCVRegressor, OrthogonalMatchingPursuitRegressor, PassiveAggressiveClassifier, PassiveAggressiveRegressor, PerceptronClassifier, ProbabilisticSGDClassifier, RANSACRegressor, RandomForestClassifier, RandomForestRegressor, RidgeCVClassifier, RidgeCVRegressor, RidgeClassifier, RidgeRegressor, SGDClassifier, SGDRegressor, SVMClassifier, SVMLClassifier, SVMLRegressor, SVMNuClassifier, SVMNuRegressor, SVMRegressor, SpectralClustering, TheilSenRegressor | high | † |
XGBoost.jl | XGBoostRegressor, XGBoostClassifier, XGBoostCount | high | |
Note (†): some models are missing; your help is welcome to complete the interface. Get in touch with Thibaut Lienart on Slack if you would like to help. Thanks!
The MLJ Universe
The functionality of MLJ is distributed over a number of repositories illustrated in the dependency chart below. Click on the appropriate link for further information:
Code Organization • Road Map • Contributing
Dependency chart for MLJ repositories (MLJ, MLJBase, MLJModelInterface, MLJModels, MLJTuning, MLJLinearModels, MLJFlux, MLJTutorials, MLJScientificTypes, ScientificTypes). Repositories with dashed connections do not currently exist but are planned/proposed.
Citing MLJ
@software{anthony_blaom_2019_3541506,
author = {Anthony Blaom and
Franz Kiraly and
Thibaut Lienart and
Sebastian Vollmer},
title = {alan-turing-institute/MLJ.jl: v0.5.3},
month = nov,
year = 2019,
publisher = {Zenodo},
version = {v0.5.3},
doi = {10.5281/zenodo.3541506},
url = {https://doi.org/10.5281/zenodo.3541506}
}
Contributors
Core design: A. Blaom, F. Kiraly, S. Vollmer
Active maintainers: A. Blaom, T. Lienart, S. Okon
Active collaborators: D. Arenas, D. Buchaca, J. Hoffimann, S. Okon, J. Samaroo, S. Vollmer
Past collaborators: D. Aluthge, E. Barp, G. Bohner, M. K. Borregaard, V. Churavy, H. Devereux, M. Giordano, M. Innes, F. Kiraly, M. Nook, Z. Nugent, P. Oleśkiewicz, A. Shridar, Y. Simillides, A. Sengupta, A. Stechemesser.
License
MLJ is supported by the Alan Turing Institute and released under the MIT "Expat" License.