rlassomodels.RlassoIV
- class rlassomodels.RlassoIV(*, select_X=True, select_Z=True, post=True, sqrt=False, fit_intercept=True, cov_type='nonrobust', x_dependent=False, random_state=None, lasso_psi=False, prestd=False, n_corr=5, max_iter=2, conv_tol=0.0001, n_sim=5000, c=1.1, gamma=None, solver='cd', cd_max_iter=1000, cd_tol=1e-10, cvxpy_opts=None, zero_tol=0.0001)
Rigorous Lasso for instrumental-variable estimation in the presence of high-dimensional instruments and/or controls. Estimation uses the post-double-selection (PDS) and post-regularization (CHS) methods; see the references below.
- Parameters
- select_X: bool, default=True
Whether to use lasso/post-lasso for feature selection of high-dimensional controls.
- select_Z: bool, default=True
Whether to use lasso/post-lasso for feature selection of high-dimensional instruments.
- post: bool, default=True
If True, post-lasso is used to estimate betas. Note that post only affects the results of the post-regularization (CHS) method, not those of post-double-selection (PDS).
- sqrt: bool, default=False
If True, the square-root lasso criterion is minimized: loss = ||y - X @ beta||_2 / sqrt(n). See: Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791-806.
- fit_intercept: bool, default=True
If True, an unpenalized intercept is estimated.
- cov_type: str, default="nonrobust"
Type of covariance matrix:
"nonrobust" - nonrobust covariance matrix
"robust" - robust covariance matrix
"cluster" - cluster robust covariance matrix
- x_dependent: bool, default=False
If True, the less conservative lambda is estimated by simulation using the conditional distribution of the design matrix.
- n_sim: int, default=5000
Number of simulations to be performed for x-dependent lambda calculation.
- random_state: int, default=None
Random seed used for simulations if x_dependent is True.
- lasso_psi: bool, default=False
If True, post-lasso is not used to obtain the residuals during the iterative estimation procedure.
- prestd: bool, default=False
If True, the data are standardized before estimation instead of being standardized on the fly through the penalty loadings. Currently only the homoscedastic case is supported.
- n_corr: int, default=5
Number of correlated variables to be used for the initial calculation of the residuals.
- c: float, default=1.1
Slack parameter used for the lambda calculation. From Hansen et al. (2020): “c needs to be greater than 1 for the regularization event to hold asymptotically, but not too high as the shrinkage bias is increasing in c.”
- gamma: float, default=None
Regularization parameter. If not provided, gamma is calculated as 0.1 / np.log(n_samples).
- max_iter: int, default=2
Maximum number of iterations to perform in the iterative estimation procedure.
- conv_tol: float, default=1e-4
Tolerance for the convergence of the iterative estimation procedure.
- solver: str, default="cd"
Solver to be used for the iterative estimation procedure:
"cd" - coordinate descent
"cvxpy" - cvxpy solver
- cd_max_iter: int, default=1000
Maximum number of iterations to perform in the coordinate descent (shooting) algorithm.
- cd_tol: float, default=1e-10
Tolerance for the coordinate descent algorithm.
- cvxpy_opts: dict, default=None
Options to be passed to the cvxpy solver. See cvxpy documentation for more details: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
- zero_tol: float, default=1e-4
Tolerance for the rounding of the coefficients to zero.
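The following is a minimal sketch of how a few of the numerical parameters above could enter the computation. It is not taken from the rlassomodels source. The default for gamma and the square-root lasso loss follow the formulas quoted in the parameter descriptions. The function x_independent_lambda is a hypothetical helper that uses the textbook penalty level 2c * sqrt(n) * Phi^{-1}(1 - gamma / (2p)) from the rigorous-lasso literature, ignoring the noise scale and penalty loadings; whether the package uses exactly this form is an assumption. The strict inequality in round_to_zero is likewise an assumption.

```python
import math
from statistics import NormalDist

import numpy as np


def default_gamma(n_samples):
    """Default gamma as documented above: 0.1 / log(n_samples)."""
    return 0.1 / math.log(n_samples)


def sqrt_lasso_loss(X, y, beta):
    """Unpenalized square-root lasso loss: ||y - X @ beta||_2 / sqrt(n)."""
    n_samples = X.shape[0]
    return np.linalg.norm(y - X @ beta) / math.sqrt(n_samples)


def x_independent_lambda(n_samples, n_features, c=1.1, gamma=None):
    """Hypothetical helper (ASSUMPTION, not the package's code).

    Textbook penalty level: 2c * sqrt(n) * Phi^{-1}(1 - gamma / (2p)).
    """
    if gamma is None:
        gamma = default_gamma(n_samples)
    quantile = NormalDist().inv_cdf(1 - gamma / (2 * n_features))
    return 2 * c * math.sqrt(n_samples) * quantile


def round_to_zero(beta, zero_tol=1e-4):
    """Set coefficients whose magnitude is below zero_tol to exactly zero.

    ASSUMPTION: the comparison is strict (< zero_tol).
    """
    beta = np.asarray(beta, dtype=float).copy()
    beta[np.abs(beta) < zero_tol] = 0.0
    return beta
```

Under this form the penalty level grows linearly in c, which is consistent with the quoted remark that the shrinkage bias increases in c.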
References
- Chernozhukov, V., Hansen, C., & Spindler, M. (2015).
Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review, 105(5), 486-90.
- Belloni, A., Chernozhukov, V., & Hansen, C. (2014).
Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies, 81(2), 608-650.
- Ahrens, A., Hansen, C. B., & Schaffer, M. (2019).
PDSLASSO: Stata module for post-selection and post-regularization OLS or IV estimation and inference.
- Attributes
- results_: dict["PDS", "CHS"]
Dictionary containing the two-stage least-squares estimates. Values are linearmodels.iv.IV2SLS objects. See:
https://bashtage.github.io/linearmodels/iv/iv/linearmodels.iv.model.IV2SLS.html
https://bashtage.github.io/linearmodels/iv/examples/basic-examples.html
- X_selected_: dict[list[str]]
Lists of selected controls, one for each stage in the estimation.
- Z_selected_: list[str]
List of selected instruments.
- valid_vars_: list[str]
List of variables for which standard errors and test statistics are valid.
- fit(X, y, D_exog=None, D_endog=None, Z=None)
Fit the model.
- Parameters
- X: array-like, shape (n_samples, n_controls)
Control variables. Potentially high-dimensional.
- y: array-like, shape (n_samples,)
Outcome/dependent variable.
- D_exog: array-like, shape (n_samples, n_exog)
Low-dimensional exogenous regressors on which inference is performed.
- D_endog: array-like, shape (n_samples, n_endog)
Endogenous regressors on which inference is performed.
- Z: array-like, shape (n_samples, n_instruments)
Instruments. Potentially high-dimensional.
- Returns
- self: object
Returns the instance itself.
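A usage sketch for fit, assuming the signatures documented above. The simulated design (simulate_iv_data), its coefficients, and the wrapper fit_rlasso_iv are hypothetical and exist only for illustration. The import of rlassomodels happens inside the wrapper, so the data-generating part runs without the package installed. Passing plain NumPy arrays relies on the "array-like" wording above and has not been checked against the implementation.

```python
import numpy as np


def simulate_iv_data(n_samples=200, n_controls=20, n_instruments=30, seed=0):
    """Simulate a sparse IV design (hypothetical, for illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_controls))     # high-dimensional controls
    Z = rng.normal(size=(n_samples, n_instruments))  # high-dimensional instruments
    u = rng.normal(size=n_samples)                   # unobserved confounder
    # First stage: only two instruments and one control are relevant.
    d = Z[:, 0] + 0.5 * Z[:, 1] + X[:, 0] + u + rng.normal(size=n_samples)
    # Outcome: the effect of d is 1.0; u makes d endogenous.
    y = 1.0 * d + X[:, 0] - X[:, 1] + u + rng.normal(size=n_samples)
    return X, y, d.reshape(-1, 1), Z


def fit_rlasso_iv(X, y, D_endog, Z):
    """Fit RlassoIV with the documented keyword arguments.

    Requires rlassomodels; the import is deferred so that the rest of
    this sketch runs without it.
    """
    from rlassomodels import RlassoIV

    model = RlassoIV(select_X=True, select_Z=True, cov_type="robust")
    model.fit(X, y, D_endog=D_endog, Z=Z)
    return model


X, y, D_endog, Z = simulate_iv_data()
print(X.shape, y.shape, D_endog.shape, Z.shape)
```

With rlassomodels installed, model = fit_rlasso_iv(X, y, D_endog, Z) would fit the model. According to the Attributes section, model.results_["PDS"] and model.results_["CHS"] then hold the two sets of estimates, and model.Z_selected_ lists the selected instruments. Because the selected-variable attributes are lists of strings, inputs with named columns (for example pandas DataFrames) may be more convenient; that is an assumption rather than something stated above.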