AmeliaView GUI Guide

Below is a guide to the AmeliaView menus, with references back to the user’s guide. The same principles from the user’s guide apply to AmeliaView; the only difference is how you interact with the program. Whether you use the GUI or the command-line version, the same underlying code is called, so the command-line discussion above is worth reading even if you intend to use the GUI.

Loading AmeliaView

The easiest way to load AmeliaView is to open an R session and type the following two commands:

library(Amelia)
AmeliaView()

This will bring up the AmeliaView window on any platform.

AmeliaView welcome screen

Loading data into AmeliaView

AmeliaView loads with a welcome screen whose buttons load data in many common formats. Each button brings up a window for choosing your dataset. These buttons are only a subset of the ways to load data in AmeliaView. Under the File menu (shown below), you will find more options, including the datasets included in the package (africa and freetrade). You will also find import commands for Comma-Separated Values (.CSV), Tab-Delimited Text (.TXT), Stata v.5-10 (.DTA), SPSS (.DAT), and SAS Transport (.XPORT). Note that when you load a CSV file, Amelia assumes that the file has a header (that is, a row at the top of the data giving the variable names).

AmeliaView File and import menu.

You can also load data from an RData file. If the RData file contains more than one data.frame, a pop-up window will ask you to choose the dataset you would like to load. In the File menu, you can also change the working directory. This is where AmeliaView looks for data by default and where it saves imputed datasets.
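
For readers who move between the GUI and the command line, here is a minimal sketch of the same kind of import in plain R. The file and its contents are hypothetical and exist only for illustration; `header = TRUE` mirrors the GUI's assumption that the first row of a CSV file holds the variable names.

```r
# Create a small CSV file with a header row (hypothetical data for illustration).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,year,gdp",
             "A,2000,1.5",
             "A,2001,",
             "B,2000,2.1"), csv_path)

# header = TRUE treats the first row as variable names, as AmeliaView does.
dat <- read.csv(csv_path, header = TRUE)

# Variable names come from the header row; the empty gdp field is read as NA.
names(dat)

# Share of missing values per variable, similar in spirit to the
# missingness summary shown in the variable dashboard.
colMeans(is.na(dat))
```

If the file had no header row, the first observation would be consumed as variable names, so it is worth checking `names(dat)` after any import.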

Variable Dashboard

Main variable dashboard in AmeliaView

Once a dataset is loaded, AmeliaView will show the variable dashboard. In this mode, you will see a table of variables, with the current options for each of them shown, along with a few summary statistics. You can reorder this table by any of these columns by clicking on the column headings. This might be helpful to, say, order the variables by mean or amount of missingness.

Variable options via right-click menu on the variable dashboard

You can set options for individual variables through the right-click context menu or through the “Variables” menu. For instance, clicking “Set as Time-Series Variable” will set the currently selected variable in the dashboard as the time-series variable. Certain options are disabled until other options are enabled. For instance, you cannot add a lagged variable to the imputation until you have set the time-series variable. Note that any factor in the data is marked as an ID variable by default, since a factor cannot be included in the imputation without being set as an ID variable, a nominal variable, or the cross-section variable. If there is a factor that fails to meet one of these conditions, a red flag will appear next to the variable name.

Set as Time-Series Variable - Sets the currently selected variable to the time-series variable. Disabled when more than one variable is selected. Once this is set, you can add lags and leads and add splines of time. The time-series variable will have a clock icon next to it.
Set as Cross-Section Variable - Sets the currently selected variable to the cross-section variable. Disabled when more than one variable is selected. Once this is set, you can interact the splines of time with the cross-section. The cross-section variable will have a person icon next to it.
Unset as Time-Series Variable - Removes the time-series status of the variable. This will remove any lags, leads, or splines of time.
Unset as Cross-Section Variable - Removes the cross-section status of the variable. This will remove any interaction of the splines of time with the cross-section.
Add Lag/Lead - Adds versions of the selected variables either lagged back (“lag”) or forward (“lead”).
Remove Lag/Lead - Removes any lags or leads on the selected variables.
Plot Histogram of Selected - Plots a histogram of the selected variables. This command will attempt to put all of the histograms on one page, but if more than nine histograms are requested, they will appear on multiple pages.
Add Transformation… - Adds a transformation setting for the selected variables. Note that each variable can only have one transformation and the time-series and cross-section variables cannot be transformed.
Remove Transformation - Removes any transformation for the selected variables.
Add or Edit Bounds - Opens a dialog box to set logical bounds for the selected variable.
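
To make the lag and lead options concrete, the sketch below builds them by hand on a hypothetical two-country panel. It assumes rows are sorted by year within each country. It illustrates the idea only and is not Amelia's internal code; the variable names are made up for the example.

```r
# Hypothetical panel: two cross-section units observed over three years.
panel <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2000:2002, times = 2),
  tariff  = c(10, 12, 11, 30, 28, 27)
)

# Lag: the previous period's value within each country.
# The first period of each country has no predecessor, so it is NA.
panel$tariff_lag <- ave(panel$tariff, panel$country,
                        FUN = function(v) c(NA, head(v, -1)))

# Lead: the next period's value within each country.
# The last period of each country has no successor, so it is NA.
panel$tariff_lead <- ave(panel$tariff, panel$country,
                         FUN = function(v) c(tail(v, -1), NA))

print(panel)
```

Shifting within each country rather than over the whole column is the reason the time-series variable has to be set before lags and leads become available: without it there is no notion of "previous" or "next" period.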

Amelia Options

Options menu

The “Variables” menu and the variable dashboard are the places to set variable-level options, but global options are set in the “Options” menu. For more information on these options, see vignette("using-amelia").

Splines of Time with… - This option, if activated, will have Amelia use flexible trends of time with the specified number of knots in the imputation. The higher the number of knots, the greater the variation allowed in the trend structure, but the more degrees of freedom it takes to estimate.
Interact with Cross-Section? - Includes an interaction of the cross-section with the time trends. This interaction is a way of allowing the trend of time to vary across cases as well. Using a 0-level spline of time and interacting with the cross-section is equivalent to using fixed effects.
Add Observational Priors… - Brings up a dialog window to set prior beliefs about ranges for individual missing observations.
Numerical Options - Brings up a dialog window to set the tolerance of the EM algorithm, the seed of the random number generator, the ridge prior for numerical stability, and the maximum number of redraws for the logical bounds.
Draw Missingness Map - Draws a missingness map.
Output File Options - Brings up a dialog to set the prefix of the imputed data files and the number of imputations. If you set the prefix to mydata, your output files will be mydata1.csv, mydata2.csv, and so on.
Output File Type - Sets the format of imputed data. If you would rather not save any output datasets (if you wanted, for instance, to simply look at diagnostics), set this option to “(no save).” Currently, you can save the output data as Comma-Separated Values (.CSV), Tab-Delimited Text (.TXT), Stata (.DTA), or an R save object (.RData), or hold it in R memory. This last option will only work if you have called AmeliaView from an R session and want to return to the R command line to work with the output. Its name in the R workspace will be the file prefix. The stacked version of the Stata output will work with Stata’s built-in mi tools.
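
The output naming rule can be sketched in a few lines of R. The prefix and number of imputations below are illustrative values. The second half is optional and only runs if the Amelia package is installed; it assumes the `africa` data shipped with the package and writes to a temporary directory rather than the working directory.

```r
prefix <- "mydata"   # file prefix, as set in the Output File Options dialog
m <- 5               # number of imputations

# One file per imputed dataset: prefix, then imputation number, then extension.
out_files <- paste0(prefix, seq_len(m), ".csv")
print(out_files)

# Optional: with Amelia installed, write.amelia() produces files named this way.
if (requireNamespace("Amelia", quietly = TRUE)) {
  library(Amelia)
  data(africa)
  a.out <- amelia(africa, m = m, ts = "year", cs = "country", p2s = 0)
  write.amelia(a.out, file.stem = file.path(tempdir(), prefix), format = "csv")
  print(file.exists(file.path(tempdir(), out_files)))
}
```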

Numerical options

Numerical options menu

Seed - Sets the seed for the random number generator used by Amelia. Useful if you need to reproduce the same output.
Tolerance - Adjusts the tolerance that Amelia uses to check convergence of the EM algorithm. In very large datasets, if your imputation chains run a long time without converging, increasing the tolerance loosens the convergence criterion and ends chains after fewer iterations.
Empirical Prior - A prior that adds observations to your data in order to shrink the covariances. A useful place to start is around 0.5% of the total number of observations in the dataset.
Maximum Resample for Bounds - Amelia fits logical bounds by rejecting any draws that do not fall within the bounds. This value sets the number of times Amelia should attempt to resample to fit the bounds before setting the imputation to the bound.
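
The sketch below illustrates three of these settings in plain R. It is a simplified illustration, not Amelia's implementation: the function `draw_within_bounds` is hypothetical, it draws from a normal distribution for convenience, and it assumes that "setting the imputation to the bound" means clamping the last draw to the nearest bound.

```r
# Seed: fixing it makes random draws (and hence imputations) repeatable.
set.seed(1234); first  <- rnorm(3)
set.seed(1234); second <- rnorm(3)
identical(first, second)

# Empirical prior: the suggested starting point is about 0.5% of the rows.
n_obs <- 200                    # hypothetical number of observations
empri_start <- 0.005 * n_obs

# Bounds by rejection: redraw until the value falls inside the bounds,
# giving up after max_resample redraws and clamping to the nearest bound.
draw_within_bounds <- function(mean, sd, lower, upper, max_resample = 100) {
  x <- rnorm(1, mean, sd)
  tries <- 0
  while ((x < lower || x > upper) && tries < max_resample) {
    x <- rnorm(1, mean, sd)
    tries <- tries + 1
  }
  min(max(x, lower), upper)
}

# Bounds that a N(0, 1) draw will practically never satisfy: after the
# redraws are exhausted the result is clamped to the lower bound.
set.seed(1)
draw_within_bounds(mean = 0, sd = 1, lower = 5, upper = 6, max_resample = 10)
```

A larger maximum makes it more likely that a value inside the bounds is found by sampling rather than by clamping, at the cost of more computation when the bounds sit far from the bulk of the imputation distribution.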

Add Distributional Prior

Detail for Add Distributional Prior dialog

Current Priors - A table of current priors in distributional form, with the variable and case name. You can remove priors by selecting them and using the right-click context menu.
Case - Select the case name or number you wish to set the prior for. You can also choose to make the prior for the entire variable, which will set the prior for any missing cell in that variable. The case names are generated from the row name of the observation, the value of the cross-section variable of the observation, and the value of the time-series variable of the observation.
Variable - The variable associated with the prior you would like to specify. The list provided only shows the missing variables for the currently selected observation.
Mean - The mean value of the prior. The textbox will not accept letters or out-of-place punctuation.
Standard Deviation - The standard deviation of the prior. The textbox will only accept positive non-zero values.
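
At the command line, distributional priors are passed to amelia() through its `priors` argument as a four-column matrix: row of the cell, column of the cell, prior mean, and prior standard deviation, with a row value of 0 meaning the whole variable. The sketch below builds such matrices for a hypothetical data frame; the data and the prior values are made up for illustration.

```r
# Hypothetical data: tariff is missing in rows 2 and 4.
dat <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  tariff  = c(10, NA, 30, NA)
)
col_tariff <- which(names(dat) == "tariff")

# Prior for a single missing cell: row 2 of tariff, mean 12, sd 2.
pr_cell <- matrix(c(2, col_tariff, 12, 2), nrow = 1,
                  dimnames = list(NULL, c("row", "column", "mean", "sd")))

# Prior for every missing cell of tariff: a row value of 0 means
# "the entire variable".
pr_var <- matrix(c(0, col_tariff, 20, 10), nrow = 1,
                 dimnames = list(NULL, c("row", "column", "mean", "sd")))

# Sanity checks matching the dialog's rules: the cell must be missing
# and the standard deviation must be strictly positive.
is.na(dat[pr_cell[1, "row"], pr_cell[1, "column"]])
all(pr_cell[, "sd"] > 0, pr_var[, "sd"] > 0)
```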

Add Range Prior

Detail for Add Range Prior dialog

Case - Select the case name or number you wish to set the prior for. You can also choose to make the prior for the entire variable, which will set the prior for any missing cell in that variable. The case names are generated from the row name of the observation, the value of the cross-section variable of the observation, and the value of the time-series variable of the observation.
Variable - The variable associated with the prior you would like to specify. The list provided only shows the missing variables for the currently selected observation.
Minimum - The minimum value of the prior. The textbox will not accept letters or out of place punctuation.
Maximum - The maximum value of the prior. The textbox will not accept letters or out of place punctuation.
Confidence - The confidence level of the prior, which represents how certain you are that the value lies in the range. This must be strictly between 0 and 1; it cannot be 1, even if you are absolutely certain of a given range. It is used to convert the range into an appropriate distributional prior.
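
One natural way to turn a range and a confidence level into a distributional prior is sketched below. It assumes a normal prior centred on the midpoint of the range, with the standard deviation chosen so that the stated share of probability falls between the minimum and maximum. The helper `range_to_normal` is hypothetical, and this guide does not state the exact conversion Amelia applies.

```r
# Convert a (min, max, confidence) range prior into a normal (mean, sd) prior.
range_to_normal <- function(min, max, conf) {
  stopifnot(min < max, conf > 0, conf < 1)
  # z is the normal quantile that leaves (1 - conf) / 2 in each tail.
  z <- qnorm(1 - (1 - conf) / 2)
  c(mean = (min + max) / 2, sd = (max - min) / (2 * z))
}

p <- range_to_normal(min = 30, max = 40, conf = 0.95)
print(p)

# Check: the implied normal places `conf` of its mass inside the range.
mass <- pnorm(40, p["mean"], p["sd"]) - pnorm(30, p["mean"], p["sd"])
print(unname(mass))

# Why the confidence cannot be 1: the quantile is infinite, which would
# force a standard deviation of zero.
qnorm(1)
```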

Imputing and checking diagnostics

Output log showing Amelia output for a successful imputation.

Once you have set all the relevant options, you can impute your data by clicking the “Impute!” button in the toolbar. In the bottom right corner of the window, you will see a progress bar that indicates the progress of the imputations. For large datasets this could take some time. Once the imputations are complete, you should see a “Successful Imputation!” message appear where the progress bar was. You can click on this message to open the folder containing the imputed datasets.

If there was an error during the imputation, the output log will pop up and give you the error message along with some information about how to fix the problem. Once you have fixed the problem, simply click “Impute!” again. Even if there was no error, you may want to view the output log to see how Amelia ran. To do so, simply click the “Show Output Log” button. The log also shows the call to the amelia() function in R. You can use this code snippet to run the same imputation from the R command line. You will have to replace the x argument in the amelia() call with the name of your dataset in the R session.
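
As a rough illustration of running an imputation from the command line, the sketch below calls amelia() on the freetrade data shipped with the package. The argument values are illustrative assumptions and are not copied from an actual output log; in practice you would paste the call from the log and set `x` to your own data frame. The guard only keeps the sketch runnable where Amelia is not installed.

```r
has_amelia <- requireNamespace("Amelia", quietly = TRUE)

if (has_amelia) {
  library(Amelia)
  data(freetrade)

  # Fix the seed so the imputations can be reproduced.
  set.seed(1234)

  # x is the dataset; ts and cs name the time-series and cross-section
  # variables; m is the number of imputed datasets; p2s = 0 silences output.
  a.out <- amelia(x = freetrade, m = 5, ts = "year", cs = "country", p2s = 0)

  # The imputed datasets are stored as a list, one data frame per imputation.
  print(length(a.out$imputations))
  summary(a.out)
}
```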

Diagnostics Dialog

Detail for the Diagnostics dialog

Upon the successful completion of an imputation, the diagnostics menu will become available. Here you can use all of the diagnostics available at the command line.

Compare Plots - Displays the relative densities of the observed (red) and imputed (black) data. The density of the imputed values is computed from the imputations averaged across all of the imputed datasets.
Overimpute - Runs Amelia on the full data with one cell of the chosen variable artificially set to missing and then checks the result of that imputation against the truth. The resulting plot shows average imputations against true values along with 90% confidence intervals. These are plotted over a \(y=x\) line for visual inspection of the imputation model.
Number of overdispersions - When running the overdispersion diagnostic, you need to run the imputation algorithm from several overdispersed starting points in order to get a clear idea of how the chains are converging. Enter the number of chains to run here.
Number of dimensions - The overdispersion diagnostic must reduce the dimensionality of the paths of the imputation algorithm to either one or two dimensions due to graphical constraints.
Overdisperse - Runs the overdispersion diagnostic to visually inspect the convergence of the Amelia algorithm from multiple start values that are drawn randomly.
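
The command-line counterparts of these diagnostics are compare.density(), overimpute(), and disperse(). The sketch below runs each on the freetrade data; the choice of the `tariff` variable and the numbers of chains and dimensions are illustrative assumptions. The guard only keeps the sketch runnable where Amelia is not installed.

```r
has_amelia <- requireNamespace("Amelia", quietly = TRUE)

if (has_amelia) {
  library(Amelia)
  data(freetrade)
  a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country", p2s = 0)

  # Compare Plots: densities of observed and imputed values of one variable.
  compare.density(a.out, var = "tariff")

  # Overimpute: treat observed cells as missing and compare the
  # imputations with the true values.
  overimpute(a.out, var = "tariff")

  # Overdisperse: run EM chains from overdispersed starting values and
  # plot their paths in one dimension.
  disperse(a.out, m = 5, dims = 1)
}
```

In a non-interactive session the plots are written to the default graphics device rather than shown on screen.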