Title: | Control Function Methods with Possibly Invalid Instrumental Variables |
---|---|
Description: | Inference with control function methods for nonlinear outcome models when the model is known ('Guo and Small' (2016) <arXiv:1602.01051>) and when unknown but semiparametric ('Li and Guo' (2021) <arXiv:2010.09922>). |
Authors: | Taehyeon Koo [aut], Sai Li [aut], Dylan Small [ctb], Zijian Guo [aut, cre, cph] |
Maintainer: | Zijian Guo <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-02-08 04:07:55 UTC |
Source: | https://github.com/zijguo/controlfunctioniv |
Implement the control function method for the inference of nonlinear treatment effects.
cf(formula, d1 = NULL, d2 = NULL)
formula |
A formula describing the model to be fitted. |
d1 |
The baseline treatment value. |
d2 |
The target treatment value. |
For example, the formula Y ~ D + I(D^2)+X|Z+I(Z^2)+X describes the models $Y = \alpha_0 + D\beta_1 + D^2\beta_2 + X\phi + u$ and $D = \gamma_0 + Z\gamma_1 + Z^2\gamma_2 + X\psi + v$. Here, the outcome is Y, the endogenous variable is D, the baseline covariates are X, and the instrumental variables are Z. The formula syntax follows that of the ivreg function in the AER package. The endogenous variable D must be the first term of the formula for the outcome model.
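The two-model setup can be illustrated with a minimal base-R sketch of a control function fit on simulated data. This is not the package's implementation: the data-generating values, the names `stage1`, `stage2`, and `v_hat`, and the use of plain `lm()` are assumptions made for illustration, and the second-stage `lm()` standard errors ignore that `v_hat` is itself estimated.

```r
# Simulate data following the two models (all numeric values are illustrative).
set.seed(1)
n <- 5000
Z <- rnorm(n)
X <- rnorm(n)
v <- rnorm(n)                    # first-stage error
u <- 0.5 * v + rnorm(n)          # outcome error, correlated with v (endogeneity)
D <- 1 + Z + 0.5 * Z^2 + 0.5 * X + v
Y <- 1 + 1.0 * D + 0.2 * D^2 + 0.5 * X + u

# Stage 1: regress the treatment on the instruments and covariates.
stage1 <- lm(D ~ Z + I(Z^2) + X)
v_hat <- resid(stage1)

# Stage 2: add the first-stage residual as a control variable.
stage2 <- lm(Y ~ D + I(D^2) + X + v_hat)
cf_coef <- coef(stage2)
print(round(cf_coef, 3))
```

Including `v_hat` absorbs the part of the outcome error that is correlated with the treatment, which is why the coefficients on D and D^2 can be recovered despite the endogeneity built into the simulation.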
If either d1 or d2 is missing or NULL, CausalEffect is calculated assuming that the baseline value d1 is the median of the treatment and the target value d2 is d1+1.
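As a rough illustration of the default rule for d1 and d2, the sketch below computes the effect of moving the treatment from d1 to d2 in a quadratic outcome model, with a delta-method standard error. The function `causal_effect` and its inputs `beta` and `V` are hypothetical names introduced here; the source does not state how the package computes CausalEffect.sd, so the delta method is an assumption.

```r
# Effect of moving the treatment from d1 to d2 in a quadratic outcome model.
# `beta` holds the coefficients on D and D^2 and `V` their 2 x 2 covariance
# matrix (both hypothetical inputs).
causal_effect <- function(beta, V, D, d1 = NULL, d2 = NULL) {
  if (is.null(d1) || is.null(d2)) {
    d1 <- median(D)   # default baseline: median of the treatment
    d2 <- d1 + 1      # default target: one unit above the baseline
  }
  g <- c(d2 - d1, d2^2 - d1^2)            # gradient of the effect in beta
  est <- sum(g * beta)
  se <- sqrt(drop(t(g) %*% V %*% g))      # delta-method standard error
  list(d1 = d1, d2 = d2, estimate = est, se = se,
       ci = est + c(-1, 1) * qnorm(0.975) * se)
}

D <- c(1, 2, 3)                           # median is 2, so d1 = 2 and d2 = 3
res <- causal_effect(beta = c(1, 0.2), V = diag(c(0.01, 0.0004)), D = D)
print(res$estimate)
```

With these inputs the effect is 1 * (3 - 2) + 0.2 * (3^2 - 2^2) = 2.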
cf returns an object of class "cf", which is a list containing the following components:
coefficients |
The estimate of the coefficients in the outcome model. |
vcov |
The estimated covariance matrix of coefficients. |
CausalEffect |
The causal effect when the treatment changes from d1 to d2. |
CausalEffect.sd |
The standard error of the causal effect estimator. |
CausalEffect.ci |
The 95% confidence interval of the causal effect. |
Guo, Z. and D. S. Small (2016), Control function instrumental variable estimation of nonlinear causal effect models, The Journal of Machine Learning Research 17(1), 3448–3482.
data("nonlineardata")
Y <- log(nonlineardata[,"insulin"])
D <- nonlineardata[,"bmi"]
Z <- as.matrix(nonlineardata[,c("Z.1","Z.2","Z.3","Z.4")])
X <- as.matrix(nonlineardata[,c("age","sex")])
cf.model <- cf(Y~D+I(D^2)+X|Z+I(Z^2)+X)
summary(cf.model)
Pseudo data provided by Youjin Lee, generated to mimic the structure of the Framingham Heart Study data.
data(nonlineardata)
A data.frame with 3733 observations on 9 variables:
Y: The incidence of cardiovascular diseases.
bmi: The BMI level.
insulin: The insulin level.
Z.1: SNP genotypes.
Z.2: SNP genotypes.
Z.3: SNP genotypes.
Z.4: SNP genotypes.
age: The age of the subject.
sex: The sex of the subject.
The Framingham Heart Study data supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University.
data(nonlineardata)
This function implements the pretest estimator by comparing the control function and the TSLS estimators.
pretest(formula, alpha = 0.05)
formula |
A formula describing the model to be fitted. |
alpha |
The significance level. (default = 0.05) |
For example, the formula Y ~ D + I(D^2)+X|Z+I(Z^2)+X describes the models $Y = \alpha_0 + D\beta_1 + D^2\beta_2 + X\phi + u$ and $D = \gamma_0 + Z\gamma_1 + Z^2\gamma_2 + X\psi + v$. Here, the outcome is Y, the endogenous variable is D, the baseline covariates are X, and the instrumental variables are Z. The formula syntax follows that of the ivreg function in the AER package. The endogenous variable D must be the first term of the formula for the outcome model.
pretest returns an object of class "pretest", which is a list containing the following components:
coefficients |
The estimate of the coefficients in the outcome model. |
vcov |
The estimated covariance matrix of coefficients. |
Hausman.stat |
The Hausman test statistic used to test the validity of the extra IV generated by the control function. |
p.value |
The p-value of the Hausman test. |
cf.check |
The indicator that the extra IV generated by the control function is valid. |
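The roles of Hausman.stat, p.value, and cf.check can be sketched with a generic Hausman-type comparison of two estimators. The function `hausman_check`, its argument names, and the choice of degrees of freedom are assumptions for illustration; the package's exact statistic may differ (for instance, in how it inverts the covariance difference).

```r
# Generic Hausman-type comparison of two estimators of the same coefficients.
# b_cf / V_cf: estimate and covariance that are efficient if the extra IV is valid.
# b_tsls / V_tsls: estimate and covariance that stay consistent either way.
hausman_check <- function(b_cf, V_cf, b_tsls, V_tsls, alpha = 0.05,
                          df = length(b_cf)) {
  d <- b_tsls - b_cf
  stat <- drop(t(d) %*% solve(V_tsls - V_cf) %*% d)
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  # A large p-value gives no evidence against the extra IV being valid.
  list(Hausman.stat = stat, p.value = p, cf.check = p > alpha)
}

out <- hausman_check(
  b_cf = c(1.0, 0.2),    V_cf = diag(c(0.01, 0.001)),
  b_tsls = c(1.1, 0.25), V_tsls = diag(c(0.02, 0.002))
)
print(out)
```

When the check passes, a pretest estimator of this kind would report the control function estimate; otherwise it would fall back to the TSLS estimate.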
Guo, Z. and D. S. Small (2016), Control function instrumental variable estimation of nonlinear causal effect models, The Journal of Machine Learning Research 17(1), 3448–3482.
data("nonlineardata")
Y <- log(nonlineardata[,"insulin"])
D <- nonlineardata[,"bmi"]
Z <- as.matrix(nonlineardata[,c("Z.1","Z.2","Z.3","Z.4")])
X <- as.matrix(nonlineardata[,c("age","sex")])
pretest.model <- pretest(Y~D+I(D^2)+X|Z+I(Z^2)+X)
summary(pretest.model)
Perform causal inference in the probit outcome model with possibly invalid IVs.
ProbitControl( Y, D, Z, X = NULL, intercept = TRUE, invalid = TRUE, d1 = NULL, d2 = NULL, w0 = NULL, bs.Niter = 40 )
Y |
The outcome observation, a vector of length n, where n is the sample size. |
D |
The treatment observation, a vector of length n. |
Z |
The instrument observation of dimension n by p_z, where p_z is the number of instruments. |
X |
The covariates observation of dimension n by p_x, where p_x is the number of baseline covariates. |
intercept |
Whether the intercept is included. (default = TRUE) |
invalid |
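If TRUE, the method is robust to the presence of possibly invalid IVs; if FALSE, the method assumes all IVs to be valid. (default = TRUE) |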
d1 |
A treatment value for computing CATE(d1,d2|w0). |
d2 |
A treatment value for computing CATE(d1,d2|w0). |
w0 |
A vector of the instruments and baseline covariates for computing CATE(d1,d2|w0). |
bs.Niter |
The bootstrap resampling size for constructing the confidence interval. |
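The layout of w0 can be sketched as follows: one value per instrument, followed by one value per baseline covariate, matching the ordering used in the package example (`c(rep(0,4), 30, 1)`). The simulated `Z` and `X` below are stand-ins for real data, and naming the entries of `w0` is only a readability aid assumed here, not something the source says the function requires.

```r
# Hypothetical design: 4 instruments and 2 baseline covariates, as in the
# package example; the random values below only stand in for real data.
set.seed(1)
n <- 10
Z <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("Z.", 1:4)))
X <- cbind(age = sample(30:60, n, replace = TRUE), sex = rbinom(n, 1, 0.5))

# w0 stacks one value per instrument, then one value per covariate.
w0 <- c(rep(0, ncol(Z)), age = 30, sex = 1)
names(w0)[seq_len(ncol(Z))] <- colnames(Z)

# The length of w0 must equal the number of instruments plus covariates.
stopifnot(length(w0) == ncol(Z) + ncol(X))
print(w0)
```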
ProbitControl returns an object of class "SpotIV", which is a list containing the following components:
betaHat |
The estimate of the model parameter in front of the treatment. |
beta.sdHat |
The estimated standard error of betaHat. |
cateHat |
The estimate of CATE(d1,d2|w0). |
cate.sdHat |
The estimated standard error of cateHat. |
SHat |
The estimated set of relevant IVs. |
VHat |
The estimated set of relevant and valid IVs. |
Maj.pass |
The indicator that the majority rule is satisfied. |
Li, S., Guo, Z. (2020), Causal Inference for Nonlinear Outcome Models with Possibly Invalid Instrumental Variables, Preprint arXiv:2010.09922.
data("nonlineardata")
Y <- nonlineardata[,"CVD"]
D <- nonlineardata[,"bmi"]
Z <- as.matrix(nonlineardata[,c("Z.1","Z.2","Z.3","Z.4")])
X <- as.matrix(nonlineardata[,c("age","sex")])
d1 <- median(D)+1
d2 <- median(D)
w0 <- c(rep(0,4), 30, 1)
Probit.model <- ProbitControl(Y,D,Z,X,invalid = TRUE,d1 =d1, d2 = d2,w0 = w0)
summary(Probit.model)
Perform causal inference in the semi-parametric outcome model with possibly invalid IVs.
SpotIV( Y, D, Z, X = NULL, intercept = TRUE, invalid = TRUE, d1, d2, w0, M.est = TRUE, M = 2, bs.Niter = 40, bw = NULL )
Y |
The outcome observation, a vector of length n, where n is the sample size. |
D |
The treatment observation, a vector of length n. |
Z |
The instrument observation of dimension n by p_z, where p_z is the number of instruments. |
X |
The covariates observation of dimension n by p_x, where p_x is the number of baseline covariates. |
intercept |
Whether the intercept is included. (default = TRUE) |
invalid |
If TRUE, the method is robust to the presence of possibly invalid IVs; if FALSE, the method assumes all IVs to be valid. (default = TRUE) |
d1 |
A treatment value for computing CATE(d1,d2|w0). |
d2 |
A treatment value for computing CATE(d1,d2|w0). |
w0 |
A vector of the instruments and baseline covariates for computing CATE(d1,d2|w0). |
M.est |
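If TRUE, M is estimated from the data; if FALSE, M is taken from the input value of M. (default = TRUE) |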
M |
The dimension of indices in the outcome model, from 1 to 3. (default = 2) |
bs.Niter |
The bootstrap resampling size for constructing the confidence interval. |
bw |
An (M+1) by 1 vector of bandwidths. (default = NULL) |
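To give a sense of what a bootstrap resampling size such as bs.Niter controls, here is a generic nonparametric bootstrap standard error in base R. The helper `boot_se`, the toy estimator `slope`, and the simulated data are hypothetical; the source does not describe the package's actual resampling scheme, so this is only a sketch of the general idea.

```r
# Nonparametric bootstrap standard error of a statistic.
# `estimator` is any function of a data frame; B plays the role of bs.Niter.
boot_se <- function(data, estimator, B = 40) {
  n <- nrow(data)
  reps <- vapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)   # resample rows with replacement
    estimator(data[idx, , drop = FALSE])
  }, numeric(1))
  sd(reps)                                    # spread of the B re-estimates
}

set.seed(1)
dat <- data.frame(D = rnorm(200))
dat$Y <- 1 + 2 * dat$D + rnorm(200)
slope <- function(d) coef(lm(Y ~ D, data = d))[["D"]]
se_hat <- boot_se(dat, slope, B = 40)
print(se_hat)
```

A larger B reduces the Monte Carlo noise in the standard error at the cost of proportionally more computation.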
SpotIV returns an object of class "SpotIV", which is a list containing the following components:
betaHat |
The estimate of the model parameter in front of the treatment. |
cateHat |
The estimate of CATE(d1,d2|w0). |
cate.sdHat |
The estimated standard error of cateHat. |
SHat |
The set of relevant IVs. |
VHat |
The set of relevant and valid IVs. |
Maj.pass |
The indicator that the majority rule is satisfied. |
Li, S., Guo, Z. (2020), Causal Inference for Nonlinear Outcome Models with Possibly Invalid Instrumental Variables, Preprint arXiv:2010.09922.
data("nonlineardata")
Y <- nonlineardata[,"CVD"]
D <- nonlineardata[,"bmi"]
Z <- as.matrix(nonlineardata[,c("Z.1","Z.2","Z.3","Z.4")])
X <- as.matrix(nonlineardata[,c("age","sex")])
d1 <- median(D)+1
d2 <- median(D)
w0 <- c(rep(0,4), 30, 1)
SpotIV.model <- SpotIV(Y,D,Z,X,invalid = TRUE,d1 =d1, d2 = d2,w0 = w0)
summary(SpotIV.model)