# OnlineLearning

An implementation of online mini-batch learning for prediction in Julia.

## Learners

A `Learner` is fit by repeated calls to `update!(l::Learner, x::DSMat{Float64}, y::Vector{Float64})` on mini-batches `(x, y)` of a dataset. Updating a learner incrementally optimizes some loss function; which loss depends on the concrete subtype of `Learner`. The actual optimization routine is implemented by an `AbstractSGD` object.

Values of the outcome are predicted with `predict(l::Learner, x::DSMat{Float64})`. The `predict!(obj::Learner, pr::Vector{Float64}, x::DSMat{Float64})` method calculates predictions in place.

Features (`x`) can be either a dense or a sparse matrix. (`DSMat{T}` is an alias for `DenseMatrix{T}` or `SparseMatrixCSC{T, Ti <: Integer}`.)
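
As a rough sketch of the loop this implies (the data, mini-batch slicing, and tuning values below are made up for illustration; the learner and optimizer constructors are described in the following sections):

```julia
using OnlineLearning

# Simulated data; a SparseMatrixCSC would work equally well for X.
n, p = 10000, 20
X = randn(n, p)
y = X * randn(p) + 0.1 * randn(n)

# GLMLearner, LinearModel, and SimpleSGD are described below;
# the tuning values here are arbitrary placeholders.
learner = GLMLearner(LinearModel(), SimpleSGD(1.0, 1.0e-4))

batchsize = 100
for start in 1:batchsize:n
    rows = start:min(start + batchsize - 1, n)
    update!(learner, X[rows, :], y[rows])    # one mini-batch update
end

pr = predict(learner, X)        # allocates the prediction vector
predict!(learner, pr, X)        # or fill an existing vector in place
```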

### Available learners

- `GLMLearner(m::GLMModel, optimizer::AbstractSGD)` - GLMs without regularization.
- `GLMNetLearner(m::GLMModel, optimizer::AbstractSGD, lambda1 = 0.0, lambda2 = 0.0)` - GLMs with l_1 and l_2 regularization.
- `SVMLearner` - support vector machine, not fully implemented.

The type of GLM is specified by `GLMModel`. Choices are:

- `LinearModel()` for least squares
- `LogisticModel()` for logistic regression
- `QuantileModel(tau=0.5)` for `tau`-quantile regression
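
For illustration, here is how the different GLM variants might be constructed (a rough sketch; the tuning values are placeholders, and each learner gets its own optimizer instance since an `AbstractSGD` carries step state):

```julia
using OnlineLearning

ols      = GLMLearner(LinearModel(),   SimpleSGD(1.0, 1.0e-4))  # least squares
logistic = GLMLearner(LogisticModel(), SimpleSGD(1.0, 1.0e-4))  # logistic regression
med      = GLMLearner(QuantileModel(), SimpleSGD(1.0, 1.0e-4))  # median (tau = 0.5) regression

# l_1 and/or l_2 regularization via GLMNetLearner (lambda1 = 0.1, lambda2 = 0.0 here)
lasso = GLMNetLearner(LinearModel(), SimpleSGD(1.0, 1.0e-4), 0.1, 0.0)
```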

## Optimization

All of the learners require an optimizer. Currently, stochastic gradient descent-type methods are provided by the `AbstractSGD` type.

An `AbstractSGD` implements an `update!(obj::AbstractSGD{Float64}, weights::Vector{Float64}, gr::Vector{Float64})` method. This takes the current values of the weight (coefficient) vector and the gradient, and updates the weight vector in place. The `AbstractSGD` instance stores tuning parameters and step information, and may hold additional storage if necessary.
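
As a sketch of what that interface looks like, here is a hypothetical optimizer with a constant step size (the type name, fields, and the parametric `AbstractSGD{Float64}` supertype are assumptions based on the signature above, not part of the package):

```julia
import OnlineLearning: AbstractSGD, update!

mutable struct ConstantSGD <: AbstractSGD{Float64}
    eta::Float64   # fixed step size
    t::Int         # number of updates performed so far
end
ConstantSGD(eta::Float64) = ConstantSGD(eta, 0)

function update!(obj::ConstantSGD, weights::Vector{Float64}, gr::Vector{Float64})
    obj.t += 1
    # step against the gradient, modifying the weight vector in place
    for j in eachindex(weights)
        weights[j] -= obj.eta * gr[j]
    end
    weights
end
```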

### Available optimizers

- `SimpleSGD(alpha1::Float64, alpha2::Float64)` - standard SGD with step size `alpha1/(1.0 + alpha1 * alpha2 * t)`.
- `AdaDelta(rho::Float64, eps::Float64)` - implementation of Algorithm 1 of the AdaDelta paper.
- `AdaGrad(eta::Float64)` - the step size for weight `j` is `eta / [sqrt(sum of grad_j^2 up to t) + 1.0e-8]` (see the AdaGrad paper).
- `AveragedSGD(alpha1::Float64, alpha2::Float64, t0::Int)` - described in section 5.3 of the referenced paper, with step size `alpha1/(1.0 + alpha1 * alpha2 * t)^(3/4)`.
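
For reference, the step-size schedules quoted above can be written out as plain functions (this mirrors the formulas only, not the package internals):

```julia
# t is the update count; sumsq_grad_j is the running sum of squared gradients for weight j.
sgd_step(alpha1, alpha2, t)     = alpha1 / (1.0 + alpha1 * alpha2 * t)
asgd_step(alpha1, alpha2, t)    = alpha1 / (1.0 + alpha1 * alpha2 * t)^(3/4)
adagrad_step(eta, sumsq_grad_j) = eta / (sqrt(sumsq_grad_j) + 1.0e-8)
```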

## Notes

This is a work in progress. Most testing has been in simulations rather than with real data. `GLMLearner` and `GLMNetLearner` with l_2 regularization seem to work pretty well. `GLMNetLearner` with l_1 regularization has not been thoroughly tested. Statistical performance tends to be quite sensitive to the choice of optimizer and tuning parameters.

### TODO

- Everything is implemented in terms of Float64. Should allow for Float32 as well.
- Finish the SVM implementation, perhaps add Pegasos implementation
- Automatic transformations of features
- More useful interfaces/DataFrames interface
- More checking of data
- Automatic bounding for predictions
- Remove GLMLearner in favor of GLMNetLearner
- Better docs
- Eliminate extra memory allocation