sklearn_extra.kernel_methods.EigenProRegressor

class sklearn_extra.kernel_methods.EigenProRegressor(batch_size='auto', n_epoch=2, n_components=1000, subsample_size='auto', kernel='rbf', gamma='scale', degree=3, coef0=1, kernel_params=None, random_state=None)[source]

Regression using EigenPro iteration.

Train a least-squares kernel regression model with mini-batch EigenPro iteration.

Parameters:
batch_size : int, default='auto'

Mini-batch size for gradient descent.

n_epoch : int, default=2

The number of passes over the training data.

n_components : int, default=1000

The maximum number of eigendirections used in modifying the kernel operator. The convergence-rate speedup over normal gradient descent is approximately the ratio of the largest eigenvalue to the n_components-th eigenvalue; however, computing eigenvalues can be slow for large n_components.

subsample_size : int, default='auto'

The number of subsamples used to estimate the largest n_components eigenvalues and eigenvectors. When set to 'auto', it is 4000 if there are fewer than 100,000 training samples, and 12,000 otherwise.

kernel : string or callable, default='rbf'

Kernel mapping used internally. A string can be any kernel name supported by scikit-learn; however, there is special support for the rbf, laplace, and cauchy kernels. If a callable is given, it should accept two arguments and return a floating point number (see the callable-kernel sketch in the Examples section below).

gamma : float, default='scale'

Kernel coefficient. If 'scale', gamma = 1/(n_features * X.var()). Interpretation of the default value is otherwise left to the kernel; see the documentation for sklearn.metrics.pairwise. For kernels that use a bandwidth, bandwidth = 1/sqrt(2*gamma); a short sketch of these relations follows the parameter list.

degree : float, default=3

Degree of the polynomial kernel. Ignored by other kernels.

coef0 : float, default=1

Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.

kernel_params : mapping of string to any, default=None

Additional parameters (keyword arguments) for the kernel function when it is passed as a callable object.

random_state : int, RandomState instance or None, default=None

The seed of the pseudo-random number generator used when shuffling the data. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
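
The two relations given for gamma above can be checked directly with NumPy. The snippet below is an illustrative sketch only (the array X and its shape are assumptions made for the example), not part of the estimator's API:

>>> import numpy as np
>>> X = np.random.RandomState(0).randn(100, 20)
>>> gamma_scale = 1.0 / (X.shape[1] * X.var())  # value used when gamma='scale'
>>> bandwidth = 1.0 / np.sqrt(2 * gamma_scale)  # equivalent kernel bandwidth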

References

  • Siyuan Ma, Mikhail Belkin, “Diving into the shallows: a computational perspective on large-scale machine learning”, NIPS 2017.

Examples

>>> from sklearn_extra.kernel_methods import EigenProRegressor
>>> import numpy as np
>>> n_samples, n_features, n_targets = 4000, 20, 3
>>> rng = np.random.RandomState(1)
>>> x_train = rng.randn(n_samples, n_features)
>>> y_train = rng.randn(n_samples, n_targets)
>>> rgs = EigenProRegressor(n_epoch=3, gamma=.5, subsample_size=50)
>>> rgs.fit(x_train, y_train)
EigenProRegressor(gamma=0.5, n_epoch=3, subsample_size=50)
>>> y_pred = rgs.predict(x_train)
>>> loss = np.mean(np.square(y_train - y_pred))
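
The kernel and kernel_params parameters also accept a user-defined callable, as described above. The sketch below is illustrative only and not part of the original documentation: my_kernel, its sigma parameter, and the reduced sample size are assumptions made for this example; the callable follows the documented contract of taking two sample vectors and returning a floating point number.

>>> def my_kernel(x, y, sigma=2.0):
...     # Gaussian kernel evaluated on a single pair of samples
...     return float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2)))
>>> x_small, y_small = x_train[:200], y_train[:200]
>>> rgs_cb = EigenProRegressor(n_epoch=2, kernel=my_kernel,
...                            kernel_params={"sigma": 2.0}, subsample_size=50)
>>> _ = rgs_cb.fit(x_small, y_small)
>>> y_small_pred = rgs_cb.predict(x_small)
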
__init__(batch_size='auto', n_epoch=2, n_components=1000, subsample_size='auto', kernel='rbf', gamma='scale', degree=3, coef0=1, kernel_params=None, random_state=None)[source]