Setting up a moderation analysis with the HSB data

The program can fit three different moderation models. In this example, we show how to set up the simplest of the three moderation models, with a focal variable and a single moderator variable, using the well-known HSB data.

The following topics are covered in this document:

Description of the data and the model

Reading in the data

Model building

Defining outcome type and other options

Requesting graphs

Running the model

The data

The High School and Beyond (HSB) survey is a nationally representative sample of U.S. public and Catholic high schools.The data file used here contains information on 7185 students from 160 schools. Of these schools, 90 were public schools and 70 Catholic schools.

Here we fit a model related to those given in Hierarchical Linear Models, Raudenbush & Bryk (2002). We wish to examine the relationship between student socio-economic status (SES) and student mathematics achievement (MATHACH), while also evaluating the extent to which the sector a school originates from (public vs Catholic) moderates the relationship between SES and mathematics achievement.

Reading in the data

To start a new analysis, select the New Analysis option on the landing page.

The Data page will open. As a first step, select the data file HSB.csv and click Open to obtain the image shown below. Note that the data file may reside on a local hard drive, OneDrive, or Google Drive. A description of the variables contained in the data file are given below the image.

The variables are:

In the current example, we will only be using the variable MATHACH, SES and SECTOR, along with the ID variable.

The first step in data file specification is to identify the ID variable or variables. In this example, we are looking at a two-level model and need to specify only a level-2 ID. This ID is represented by the variable appropriately, if unimaginatively, named “id”. Check the box for this in the ID 2 row of the second table.

After doing so, select all variables that may be used in model building. Here, all the variables have been checked in the Variables row of the second table. Once all variables have been selected, click Update to request the program to automatically assign the selected variables to the different levels of the hierarchy.

The program has identified the variables MINORITY, FEMALE, SES, and MATHACH as level-1 variables, in other words, student characteristics. The variables SIZE, SECTOR, PRACAD, and DISCLIM are school level variables and are appropriately indicated as level-2 variables.

Model building

Data file specification is now complete, and model building can start. To begin, click on the Models link at the top of the page to open the Models page. When first opened, only one option is available in the Level-1 Model field: Set Response. Use the drop-down list associated with Set Response to select the outcome variable of choice, in this case MATHACH.

On selection of the outcome variable, the page is automatically updated and now displays a fully unconditional model with MATHACH as outcome.

To add predictors to the model, the Level-1 Variables and Level-2 Variables fields are used. Start by clicking on the Level-1 Variables field and selecting the variable SES, as shown below. Note that a variable can be entered as either uncentered, group centered (that is, centered around the group mean of the level-2 unit a level-1 record is nested in) or grand centered (in other words, centered around the mean of the variable over all level-1 records, irrespective of the unit a level-1 record is nested in). In this simple example, we enter SES as an uncentered predictor to the model by dragging the variable name SES from the Level-1 Variables field into the equation for MATHACH.

The page now displays a random intercept model, with MATHACH as outcome, a fixed and random intercept coefficient, and a fixed SES slope coefficient.

We now add the level-2 variable SECTOR to both of the level-2 equations by selecting it from the Level-2 Variables field and dragging it into both the level-2 equations. This model contains a fixed and random intercept term, fixed coefficients for SES and SECTOR, and a fixed coefficient for the interaction between SES and SECTOR. The latter is created by adding the variable SECTOR to the SES slope equation.

As a final step, check the Random check box for the second level-2 equation to also include a random SES slope coefficient. In other words, both intercept and SES slope are now allowed to vary randomly over the schools at level-2.

Defining the outcome type

Our Models page is now complete, and we can move on the specifying additional options such as the type of outcome variable on the Settings page. The Settings page is accessed by clicking on the Settings button.

Our outcome variable MATHACH is a continous normally distributed variable, and the program will by default attempt to guide the user to the most appropriate choice of outcome distribution, in this case Normal (HLM). Currently, moderation analyses can only be performed for a normal (HLM) model, so we need not change the distribution of the outcome.

At the bottom of the page, an Moderation Analysis section is displayed if the program picks up that interaction terms permitting moderation analysis have been specified on the Models page. Recall that we created a cross-level interaction between the level-1 variable SES and the level-2 variable SECTOR by adding SECTOR to the equation for the SES slope. As a result, we now can opt to select a focal variable and a moderator variable. As we want to evaluate the impact of the second level predictor SECTOR on the relationship between MATHACH and SES, we select SES as the focal variable and SECTOR as the moderator variable in the Interactions field.

Requesting graphs

For moderation analyses, two types of graphs may be made. By default, both simple slopes and confidence interval graphs will be produced with default graph settings. To modify the graph parameters, the Graphing page is used. Open the Graphing page by clicking on Graphing at the top of the page.

Note that you can request either a Simple slope graph, a Confidence interval graph, or both.

The color, style and line width may be changed, along with the color a line is to be displayed in. One can opt to show or suppress a legend, and change graph tile and the axes labels. In the case of a confidence graph, an additional option is available to set the fill color for the area between the two confidence intervals lines.

Assuming that a black and white graph would be best for publication purposes, we opted for the settings shown below.

Running the model

After specifying the appearance of the moderation graphs, click on the Run option on the top of the page to run the analysis and graphs. Click the Run Syntax button to start the analysis.

The Progress window opens. Error messages concerning the analysis, if applicable, will also appear here. The information in this window can be copied to a clipboard to by using the Copy to Clipboard button at bottom left. Close the Progress window by clicking Close to return to the Run page. The Run page will display links to all output files.

For example, clicking the link to the HTML output will display this output in a window on the Run Page:

The graphs requested are also displayed on the Run page in the browser. From here, they can be copied and pasted into any word processing program. The two graphs are shown below.

The first of the available graphs is a plot of the conditional regression line(s) describing the relationship between the outcome and the focal predictor as a function of the moderator. The plot will automatically show a line at each of three values of the moderator variable: mean – 1 standard deviation, mean, and mean + 1 standard deviation. In other words, the value of the moderator variable is held constant at three specific values. Values of the focal variable are used to define the x-axis, and the plot is confined to the area (mean of focal variable – 2 standard deviations, mean of focal variable + 2 standard deviations).

The second graph shows the regression line describing the relationship between the outcome and the focal predictor as a function of the moderator, along with a 95% confidence interval. It also shows the so-called region of significance, provided that the boundaries of this region fall within the scale set by the values of the moderator variable, which again defines the x-axis. The region between the lower and upper bound of the region of significance indicates the values of the moderator for which the slope of the regression of outcome on focal variable transitions from non-significance to significance. In the current example, only the lower bound, denoted by “L” on the graph, falls within the scale set by the moderator variable values.