International Society for Data Science and Analytics, Data Science and Psychology - 2024 Meeting of ISDSA

Using machine learning methods in the presence of numerous measured confounders in mediation analysis
Milica Miocevic

Date: 2024-07-22 11:00 AM – 11:10 AM
Last modified: 2024-07-05

Abstract


In certain fields, such as epidemiology and health research, there are numerous measured confounding variables (e.g., demographic information and medical history) that need to be included in the statistical model to avoid biasing the effects of interest. In statistical mediation analysis, researchers are usually focused on accurately estimating the indirect effect. Previous work has shown that in mediation analysis, accounting for confounders and pure predictors of the outcome allows for unbiased estimates of the indirect and direct effects, whereas adjusting for pure predictors of the independent variable and mediator can increase the standard errors of the indirect and direct effects, thus lowering the power to detect these effects (Diop et al., 2021). This project aims to examine whether machine learning methods can be leveraged to select confounders of the paths constituting the indirect effect in mediation analysis from a large set of measured variables. A simulation study was conducted to examine whether ridge regression, lasso, and elastic net successfully select confounders of the relationships between the independent variable and mediator (a-path), the mediator and the outcome (b-path), and the independent variable and the outcome (c’-path) in the following scenarios: (1) 40 measured variables, none of which act as either a covariate or a confounder for variables in the mediation model, (2) small effects of 4 confounders for all three paths in the mediation model and 36 unrelated measured variables, (3) large effects of 4 confounders for all three paths in the mediation model and 36 unrelated measured variables, (4) 4 pure covariates predicting the outcome variable and 36 unrelated measured variables, (5) a mixture of pure predictors and confounders of the a-path in addition to unrelated variables that are also included in the model, and (6) a mixture of pure predictors and confounders of the b-path in addition to unrelated variables that are also included in the model. The indirect effect was large in all conditions and the sample size was 200. The bias of the point estimates and the coverage of the 95% confidence intervals for the indirect effect were evaluated over 1,000 iterations. Preliminary findings indicate that lasso tended to have the highest accuracy of the three procedures and that ridge regression tended to yield the highest standard errors of the indirect effect, except in scenario (4), where it was the most efficient method. Ridge regression also yielded intervals for the indirect effect with coverage below the nominal value in scenarios (3), (5), and (6), and all procedures had coverage below the nominal value in scenario (2). The presentation will discuss the pros and cons of each machine learning procedure for confounder selection across different scenarios.
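
As a rough illustration of one replication in a design resembling scenario (3), the sketch below simulates 4 confounders of all three paths plus 36 unrelated variables, uses a lasso penalty on the measured variables only, and then re-estimates the a- and b-paths by ordinary least squares with the selected variables. Everything specific here is an assumption made for illustration and is not taken from the study: the path values (0.59, 0.59, 0.14), the confounder effect (0.5), the penalty `lam`, the hand-written coordinate-descent lasso, and the post-selection refit. Function names such as `simulate` and `lasso_then_ols_indirect` are hypothetical.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_cd(design, y, lam, penalty_weights, n_sweeps=200):
    """Coordinate-descent lasso on centered data.

    Minimizes (1/2n)||y - design @ beta||^2 + lam * sum(w_j * |beta_j|).
    A weight of 0 leaves that coefficient unpenalized.
    """
    n, p = design.shape
    Xc = design - design.mean(axis=0)
    residual = y - y.mean()
    col_scale = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            residual = residual + Xc[:, j] * beta[j]  # partial residual without j
            rho = Xc[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam * penalty_weights[j]) / col_scale[j]
            residual = residual - Xc[:, j] * beta[j]
    return beta


def ols_slopes(design, y):
    """OLS slopes (intercept fitted, then dropped)."""
    full = np.column_stack([np.ones(len(y)), design])
    coefs, *_ = np.linalg.lstsq(full, y, rcond=None)
    return coefs[1:]


def simulate(n=200, n_measured=40, n_confounders=4,
             a=0.59, b=0.59, c_prime=0.14, confounder_effect=0.5, seed=0):
    """One data set: the first n_confounders columns of Z confound all paths."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, n_measured))
    conf = confounder_effect * Z[:, :n_confounders].sum(axis=1)
    X = conf + rng.standard_normal(n)
    M = a * X + conf + rng.standard_normal(n)
    Y = b * M + c_prime * X + conf + rng.standard_normal(n)
    return X, M, Y, Z


def lasso_then_ols_indirect(X, M, Y, Z, lam=0.1):
    """Select columns of Z with the lasso, then refit by OLS and return a*b."""
    k = Z.shape[1]
    # Mediator model: X is unpenalized, measured variables are penalized.
    beta_m = lasso_cd(np.column_stack([X, Z]), M, lam, np.r_[0.0, np.ones(k)])
    # Outcome model: M and X are unpenalized.
    beta_y = lasso_cd(np.column_stack([M, X, Z]), Y, lam, np.r_[0.0, 0.0, np.ones(k)])
    selected_m = np.flatnonzero(beta_m[1:])
    selected_y = np.flatnonzero(beta_y[2:])
    a_hat = ols_slopes(np.column_stack([X, Z[:, selected_m]]), M)[0]
    b_hat = ols_slopes(np.column_stack([M, X, Z[:, selected_y]]), Y)[0]
    return a_hat * b_hat, selected_m, selected_y


if __name__ == "__main__":
    X, M, Y, Z = simulate()
    indirect, selected_m, selected_y = lasso_then_ols_indirect(X, M, Y, Z)
    print("estimated indirect effect:", indirect)
    print("selected for mediator model:", selected_m)
    print("selected for outcome model:", selected_y)
```

Repeating this over many seeds and comparing the estimates with the assumed true value `a * b` would give a bias estimate of the kind described above. In practice the penalty would typically be tuned (for example by cross-validation) rather than fixed, and interval coverage would require an additional step such as bootstrapping, which this sketch omits.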


Keywords


Machine learning; confounders; mediation analysis
