`rlassomodels`.Rlasso

class rlassomodels.Rlasso(*, post=True, sqrt=False, fit_intercept=True, cov_type='nonrobust', x_dependent=False, random_state=None, lasso_psi=False, prestd=False, n_corr=5, max_iter=2, conv_tol=0.0001, n_sim=5000, c=1.1, gamma=None, solver='cd', cd_max_iter=1000, cd_tol=1e-10, cvxpy_opts=None, zero_tol=0.0001)

Rigorous Lasso estimator with theoretically justified penalty levels and desirable convergence properties.

Parameters

post: bool, default=True: If True, post-lasso is used to estimate betas.
sqrt: bool, default=False: If True, the square-root lasso criterion is minimized: loss = ||y - X @ beta||_2 / sqrt(n). See Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791-806.
fit_intercept: bool, default=True: If True, unpenalized intercept is estimated.
cov_type: str, default="nonrobust": Type of covariance matrix. "nonrobust": nonrobust covariance matrix. "robust": robust covariance matrix.
x_dependent: bool, default=False: If True, the less conservative lambda is estimated by simulation using the conditional distribution of the design matrix.
n_sim: int, default=5000: Number of simulations to be performed for x-dependent lambda calculation.
random_state: int, default=None: Random seed used for simulations if x_dependent is True.
lasso_psi: bool, default=False: If True, post-lasso is not used to obtain the residuals during the iterative estimation procedure.
prestd: bool, default=False: If True, the data are standardized before estimation instead of on the fly through the penalty loadings. Currently only the homoscedastic case is supported.
n_corr: int, default=5: Number of correlated variables to be used in the initial calculation of the residuals.
c: float, default=1.1: Slack parameter used for lambda calculation. From Ahrens et al. (2020): "c needs to be greater than 1 for the regularization event to hold asymptotically, but not too high as the shrinkage bias is increasing in c."
gamma: float, default=None: Regularization parameter. If not provided, gamma is calculated as 0.1 / np.log(n_samples).
max_iter: int, default=2: Maximum number of iterations to perform in the iterative estimation procedure.
conv_tol: float, default=1e-4: Tolerance for the convergence of the iterative estimation procedure.
solver: str, default="cd": Solver to be used for the iterative estimation procedure. "cd": coordinate descent. "cvxpy": cvxpy solver.
cd_max_iter: int, default=1000: Maximum number of iterations to perform in the coordinate descent (shooting) algorithm.
cd_tol: float, default=1e-10: Tolerance for the coordinate descent algorithm.
cvxpy_opts: dict, default=None: Options to be passed to the cvxpy solver. See cvxpy documentation for more details: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
zero_tol: float, default=1e-4: Tolerance for the rounding of the coefficients to zero.
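The quantities named in the parameter list can be made concrete with a small standard-library sketch. The square-root lasso loss and the default gamma follow the formulas stated above. The function `homoskedastic_lambda` is an assumption: it uses the homoskedastic penalty level lambda = 2 * c * sigma * sqrt(n) * Phi^{-1}(1 - gamma / (2 * p)) as given in Ahrens et al. (2020), and this page does not confirm that `Rlasso` computes its penalty in exactly this way. All function names here are hypothetical helpers, not part of `rlassomodels`.

```python
import math
from statistics import NormalDist


def default_gamma(n_samples):
    """Default gamma as documented above: 0.1 / log(n_samples)."""
    return 0.1 / math.log(n_samples)


def sqrt_lasso_loss(y, X, beta):
    """Square-root lasso loss ||y - X @ beta||_2 / sqrt(n) for plain Python lists."""
    n = len(y)
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(xi, beta))
        for xi, yi in zip(X, y)
    ]
    return math.sqrt(sum(r * r for r in residuals)) / math.sqrt(n)


def homoskedastic_lambda(n_samples, n_features, sigma, c=1.1, gamma=None):
    """Assumed penalty level (Ahrens et al., 2020), not taken from this page."""
    if gamma is None:
        gamma = default_gamma(n_samples)
    quantile = NormalDist().inv_cdf(1 - gamma / (2 * n_features))
    return 2 * c * sigma * math.sqrt(n_samples) * quantile


# The penalty grows with the slack parameter c, which is why c should stay close to 1.
print(homoskedastic_lambda(100, 10, sigma=1.0, c=1.1))
print(homoskedastic_lambda(100, 10, sigma=1.0, c=1.5))
```

Under this assumed formula the penalty is linear in c, which matches the quoted remark that shrinkage bias increases with c.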

References

Belloni, A., & Chernozhukov, V. (2013). Least squares after model selection: in high-dimensional sparse models. Bernoulli, 19(2), 521-547.
Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791-806.
Ahrens, A., Hansen, C. B., & Schaffer, M. E. (2020). lassopack: Model selection and prediction with regularized regression in Stata. The Stata Journal, 20(1), 176-235.
Chernozhukov, V., Hansen, C., & Spindler, M. (2016). hdm: High-dimensional metrics. arXiv preprint arXiv:1608.00354.

Attributes

coef_: numpy.array, shape (n_features,): Estimated coefficients.
intercept_: float: Estimated intercept.
lambd_: float: Estimated lambda/overall penalty level.
psi_: numpy.array, shape (n_features, n_features): Estimated penalty loadings.
n_iter_: int: Number of iterations performed by the rlasso algorithm.
n_features_in_: int: Number of features in the input data.
n_samples_: int: Number of samples/observations in the input data.
feature_names_in_: str: Name of the endogenous variable. Only stored if the input data is a pandas dataframe.

fit(X, y)

Fit the model to the data.

Parameters

X: array-like, shape (n_samples, n_features): Design matrix.
y: array-like, shape (n_samples,): Target vector.

Returns

self: object: Returns self.

fit_formula(formula, data)

Fit the model to the data using formula language.

Parameters

formula: str: Formula to fit the model. Ex: "y ~ x1 + x2 + x3"
data: Union[pandas.DataFrame, numpy.recarray, dict]: Dataset to fit the model.

Returns

self: object: Returns self.
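A minimal sketch of calling `fit_formula` with a dict dataset, one of the three documented `data` types. The column names, the simulated values, and the programmatic construction of the formula string are illustrative assumptions. The import is guarded so the data preparation runs even when `rlassomodels` is not installed. How the formula's implicit intercept interacts with `fit_intercept` is not described on this page, so the printed coefficients are not annotated.

```python
import random

random.seed(0)
n = 100
names = ["x1", "x2", "x3"]  # hypothetical regressor names

# Build a dict dataset: y depends on x1 and x2 only, x3 is pure noise.
data = {name: [random.gauss(0.0, 1.0) for _ in range(n)] for name in names}
data["y"] = [
    2.0 * a - b + random.gauss(0.0, 1.0)
    for a, b in zip(data["x1"], data["x2"])
]

# Same shape as the documented example: "y ~ x1 + x2 + x3"
formula = "y ~ " + " + ".join(names)

try:
    from rlassomodels import Rlasso
except ImportError:
    Rlasso = None  # package not installed; skip the fit

if Rlasso is not None:
    model = Rlasso().fit_formula(formula, data)  # returns self
    print(model.coef_)
```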

predict(X)

Use fitted model to predict on new data.

Parameters

X: array-like, shape (n_samples, n_features): Design matrix.

Returns

y_pred: numpy.array, shape (n_samples,): Predicted target values.
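The methods and attributes documented above combine into a short fit-and-predict workflow, sketched here on simulated sparse data. The data-generating helper `make_sparse_data` and its coefficient values are assumptions made for illustration. Only the constructor arguments, `fit`, `predict`, and the attributes listed on this page are used from the library, and the import is guarded so the simulation runs without `rlassomodels` installed.

```python
import numpy as np


def make_sparse_data(n_samples=200, n_features=50, seed=0):
    """Simulate y = X @ beta + noise with only three nonzero coefficients."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    beta = np.zeros(n_features)
    beta[:3] = [2.0, -1.5, 1.0]
    y = X @ beta + rng.standard_normal(n_samples)
    return X, y, beta


X, y, beta = make_sparse_data()

try:
    from rlassomodels import Rlasso
except ImportError:
    Rlasso = None  # package not installed; skip the fit

if Rlasso is not None:
    # fit returns self, so construction and fitting can be chained.
    model = Rlasso(post=True, fit_intercept=True).fit(X, y)
    print("selected features:", np.flatnonzero(model.coef_))
    print("intercept:", model.intercept_)
    print("penalty level:", model.lambd_, "iterations:", model.n_iter_)
    print("predictions:", model.predict(X[:5]))
```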
