Multiple Imputation by Chained Equations in Praxis: Guidelines and Review
Keywords:
Multiple imputation by chained equations, MICE, missing data, guidelines, review, R
Abstract
Multiple imputation by chained equations (MICE) is an effective tool for handling missing data, an almost unavoidable problem in quantitative data analysis. However, despite the empirical and theoretical evidence supporting the use of MICE, researchers in the social sciences often resort to inferior approaches, unnecessarily risking erroneous results. The complexity of the decision process when encountering missing data may be what discourages potential users from adopting the appropriate technique. In this article, we develop straightforward, step-by-step graphical guidelines on how to handle missing data, based on a comprehensive literature review. It is our hope that these guidelines can help improve current standards of handling missing data. The guidelines incorporate recent innovations in handling missing data, such as random forests and predictive mean matching. Thus, data analysts who already actively apply MICE may use them to review some of the newest developments. We demonstrate how the guidelines can be used in praxis using the statistical program R and data from the European Social Survey. We demonstrate central decisions, such as variable selection and the number of imputations, as well as how to handle typical challenges such as skewed distributions and data transformations. These guidelines will enable a social science researcher to go through the process of handling missing data while adhering to the newest developments in the field.
License
Open Access Publishing
The Electronic Journal of Business Research Methods operates an Open Access Policy. This means that users can read, download, copy, distribute, print, search, or link to the full texts of articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, is that authors control the integrity of their work, which should be properly acknowledged and cited.