sklearn_extra.kernel_methods.EigenProRegressor

class sklearn_extra.kernel_methods.EigenProRegressor(batch_size='auto', n_epoch=2, n_components=1000, subsample_size='auto', kernel='rbf', gamma='scale', degree=3, coef0=1, kernel_params=None, random_state=None)[source]

Regression using EigenPro iteration.

Train a least-squares kernel regression model with mini-batch EigenPro iteration.

Parameters:
batch_size : int, default='auto'

Mini-batch size for gradient descent.

n_epoch : int, default=2

The number of passes over the training data.

n_components : int, default=1000

The maximum number of eigendirections used in modifying the kernel operator. The convergence-rate speedup over normal gradient descent is approximately the ratio of the largest eigenvalue to the n_components-th eigenvalue; however, computing eigenvalues can be slow for large n_components.

subsample_size : int, default='auto'

The number of subsamples used to estimate the largest n_components eigenvalues and eigenvectors. When set to 'auto', it is 4000 if there are fewer than 100,000 training samples, and 12,000 otherwise.

kernel : string or callable, default='rbf'

Kernel mapping used internally. A string can be any kernel name supported by scikit-learn; however, there is special support for the rbf, laplace, and cauchy kernels. If a callable is given, it should accept two arguments and return a floating point number (see the callable-kernel sketch in the Examples section below).

gamma : float, default='scale'

Kernel coefficient. If 'scale', gamma = 1/(n_features * X.var()). Interpretation of the default value is otherwise left to the kernel; see the documentation for sklearn.metrics.pairwise. For kernels that use a bandwidth, bandwidth = 1/sqrt(2*gamma); a short sketch of these relations follows the parameter list.

degree : float, default=3

Degree of the polynomial kernel. Ignored by other kernels.

coef0 : float, default=1

Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.

kernel_params : mapping of string to any, default=None

Additional parameters (keyword arguments) for the kernel function when it is passed as a callable object.

random_state : int, RandomState instance or None, default=None

The seed of the pseudo-random number generator used when shuffling the data. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
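
The two relations given for gamma above can be checked directly with NumPy. The snippet below is an illustrative sketch only (the array X and its shape are assumptions made for the example), not part of the estimator's API:

>>> import numpy as np
>>> X = np.random.RandomState(0).randn(100, 20)
>>> gamma_scale = 1.0 / (X.shape[1] * X.var())  # value used when gamma='scale'
>>> bandwidth = 1.0 / np.sqrt(2 * gamma_scale)  # equivalent kernel bandwidth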

References

  • Siyuan Ma, Mikhail Belkin, “Diving into the shallows: a computational perspective on large-scale machine learning”, NIPS 2017.

Examples

>>> from sklearn_extra.kernel_methods import EigenProRegressor
>>> import numpy as np
>>> n_samples, n_features, n_targets = 4000, 20, 3
>>> rng = np.random.RandomState(1)
>>> x_train = rng.randn(n_samples, n_features)
>>> y_train = rng.randn(n_samples, n_targets)
>>> rgs = EigenProRegressor(n_epoch=3, gamma=.5, subsample_size=50)
>>> rgs.fit(x_train, y_train)
EigenProRegressor(gamma=0.5, n_epoch=3, subsample_size=50)
>>> y_pred = rgs.predict(x_train)
>>> loss = np.mean(np.square(y_train - y_pred))
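
The kernel and kernel_params parameters also accept a user-defined callable, as described above. The sketch below is illustrative only and not part of the original documentation: my_kernel, its sigma parameter, and the reduced sample size are assumptions made for this example; the callable follows the documented contract of taking two sample vectors and returning a floating point number.

>>> def my_kernel(x, y, sigma=2.0):
...     # Gaussian kernel evaluated on a single pair of samples
...     return float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2)))
>>> x_small, y_small = x_train[:200], y_train[:200]
>>> rgs_cb = EigenProRegressor(n_epoch=2, kernel=my_kernel,
...                            kernel_params={"sigma": 2.0}, subsample_size=50)
>>> _ = rgs_cb.fit(x_small, y_small)
>>> y_small_pred = rgs_cb.predict(x_small)
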
__init__(batch_size='auto', n_epoch=2, n_components=1000, subsample_size='auto', kernel='rbf', gamma='scale', degree=3, coef0=1, kernel_params=None, random_state=None)[source]