Two-level centered model for the HSB data

In this example, we use data from the High School and Beyond Study of 1982 and set up a model closely following some of the models in Chapter 4 of the well-known Raudenbush & Bryk (Sage, 2nd Edition) text. Here we concentrate on how to set up the analysis, but readers are strongly urged to also read the relevant chapter to gain more insight into the model design decisions and the interpretation of the results obtained.

Description of the data

The model

Reading in the data

Model building

Specifying outcome type and other options

Running the model

Description of the data

Data were available for a subsample of students and schools surveyed in 1982. The sample includes information on 160 schools, with a total of 7185 students nested within these. At a school level, we have the following information:

Type of school, as represented by the variable SECTOR. This variable assumes values of either 0 or 1, indicating whether the school is a public or Catholic school.
A measure of the average socio-economic status of students within each school, represented by the variable MEANSES.

For each student, we have information on

A standardized measure of mathematical achievement (MATHACH)
The student’s socio-economic status (SES). This measure is a composite of parental education and occupation and the income of the household.
With the students nested within a school, we define the lowest level of the hierarchy as the student level, and the second level as the school level. The focus here is to determine to what extent schools differ in the mean mathematics achievement, taking both socio-economic status and school sector (SECTOR) into account.

The data are stored in the file example.csv. The first few lines of this comma-separated values file are shown below.

The first line of the file contains the variable names. Each subsequent line contains information for one student. The file contains additional variables as well, such as FEMALE, representing the gender of a student. The first column, ID, contains the ID number of the school a student belongs to. This is followed by all student level information and all school level information for the school in question. Here we are looking at data from the school with ID = 1224. School level information for this school, for example the information on SECTOR and MEANSES, is appended to the record for each student in the school.
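The file layout described above can be inspected outside the program as well. The following is a minimal sketch using only the Python standard library; the function name read_head is hypothetical, and the only assumption about the file is the one stated above, namely that the first line holds the variable names.

```python
import csv
from itertools import islice


def read_head(path, n=5):
    """Return the variable names and the first n student records of a CSV file.

    Assumes the first line of the file contains the variable names,
    as described for example.csv. Values are returned as strings.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)   # header line becomes the field names
        rows = list(islice(reader, n))    # read at most n student records
        return reader.fieldnames, rows
```

Calling read_head("example.csv") would return the list of variable names together with the first five student records, each as a dictionary keyed by variable name.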

It is easy to see that while student level variables such as MATHACH change from student to student, values for school level variables such as MEANSES stay the same over all students within the school.

Descriptive statistics for variables of interest in this example are shown below.
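For readers who want to reproduce such a summary themselves, the sketch below computes a few common descriptive statistics for one variable. It is an illustration only: the function name describe is hypothetical, and the choice of statistics (count, mean, sample standard deviation, minimum, maximum) is an assumption rather than a description of what the program reports.

```python
import statistics


def describe(values):
    """Summarise one numeric variable: count, mean, sample SD, minimum, maximum."""
    values = [float(v) for v in values]   # CSV readers typically yield strings
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample SD, n - 1 in the denominator
        "min": min(values),
        "max": max(values),
    }
```

Applied to the MATHACH column, for example, this gives the number of students together with the mean, spread, and range of their mathematics scores.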

The model

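The model we want to fit is the following:

$$\text{MATHACH}_{ij} = \beta_{0j} + \beta_{1j}\,(\text{SES}_{ij} - \overline{\text{SES}}_{.j}) + r_{ij}$$

and

$$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{SECTOR}_{j} + \gamma_{02}\,(\text{MEANSES}_{j} - \overline{\text{MEANSES}}) + u_{0j}$$

$$\beta_{1j} = \gamma_{10} + \gamma_{11}\,\text{SECTOR}_{j} + \gamma_{12}\,(\text{MEANSES}_{j} - \overline{\text{MEANSES}}) + u_{1j}$$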

A variable with a horizontal bar on top represents the grand mean unless it contains a subscript such as .j in which case it represents a group mean.
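The two kinds of centering can be illustrated with a short sketch. The function names are hypothetical, and one assumption should be noted: the grand mean of a school level variable is taken here over schools (one value per school) rather than over student records. The program may compute it differently.

```python
from collections import defaultdict
from statistics import mean


def group_center(values, groups):
    """Group centering: subtract each group's own mean from its members' values."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]


def grand_center_level2(values, groups):
    """Grand mean centering of a group-level variable.

    Assumption: the grand mean is computed over groups, using one value
    per group, not over individual student records.
    """
    per_group = {}
    for value, group in zip(values, groups):
        per_group.setdefault(group, value)   # keep one value per group
    grand_mean = mean(list(per_group.values()))
    return [value - grand_mean for value in values]
```

In the notation used here, group_center applied to SES yields SES minus the school mean of SES, while grand_center_level2 applied to MEANSES yields MEANSES minus its grand mean.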

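The model can also be written as a mixed model of the form

$$\begin{aligned}
\text{MATHACH}_{ij} ={}& \gamma_{00} + \gamma_{01}\,\text{SECTOR}_{j} + \gamma_{02}\,(\text{MEANSES}_{j} - \overline{\text{MEANSES}}) \\
&+ \gamma_{10}\,(\text{SES}_{ij} - \overline{\text{SES}}_{.j}) + \gamma_{11}\,\text{SECTOR}_{j}\,(\text{SES}_{ij} - \overline{\text{SES}}_{.j}) \\
&+ \gamma_{12}\,(\text{MEANSES}_{j} - \overline{\text{MEANSES}})\,(\text{SES}_{ij} - \overline{\text{SES}}_{.j}) \\
&+ u_{0j} + u_{1j}\,(\text{SES}_{ij} - \overline{\text{SES}}_{.j}) + r_{ij}
\end{aligned}$$

The $\gamma$ coefficients represent the fixed effects in the model, while $u_{0j}$ and $u_{1j}$ represent the random intercept and random slope effects. Residual variation at level-1 is represented by $r_{ij}$. It is assumed that

$$r_{ij} \sim N(0, \sigma^2)$$

and

$$\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix} \right)$$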
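That the two-level form and the mixed form describe the same model can be checked numerically. The sketch below assumes the conventional notation in which gamma denotes the fixed effects, u0 and u1 the random intercept and slope effects for a school, and r the level-1 residual. The function names are hypothetical, and any numbers passed in are arbitrary illustrations, not estimates from the HSB data.

```python
def two_level_form(ses_c, sector, meanses_c, gamma, u0, u1, r):
    """Evaluate the model by first forming the school-specific coefficients.

    ses_c     : group centered SES of the student
    meanses_c : grand mean centered MEANSES of the school
    gamma     : dict of fixed effects keyed "00", "01", "02", "10", "11", "12"
    """
    b0 = gamma["00"] + gamma["01"] * sector + gamma["02"] * meanses_c + u0
    b1 = gamma["10"] + gamma["11"] * sector + gamma["12"] * meanses_c + u1
    return b0 + b1 * ses_c + r


def mixed_form(ses_c, sector, meanses_c, gamma, u0, u1, r):
    """Evaluate the same model written as fixed part plus random part."""
    fixed = (
        gamma["00"]
        + gamma["01"] * sector
        + gamma["02"] * meanses_c
        + gamma["10"] * ses_c
        + gamma["11"] * sector * ses_c        # cross-level interaction
        + gamma["12"] * meanses_c * ses_c     # cross-level interaction
    )
    random_part = u0 + u1 * ses_c + r
    return fixed + random_part
```

For any choice of inputs the two functions return the same value, which shows that substituting the level-2 equations into the level-1 equation produces the two cross-level interaction terms and the composite random part.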

Reading in the data

The first step is to read the data into the program. This can be done in one of two ways:

Clicking New Analysis on the landing page, which will take you to the Data page.
Clicking on the Data link at the top of the window.

When first accessed, the Data page shows the following fields:

Select file: used to provide the name of the data file
Type of file: by default, it is assumed that comma-separated value files (CSV) will be used. In subsequent versions of the program, this menu will be extended to allow other types of data files as well.
Location of file: by default, it is assumed that the file is stored on a local hard disk drive (HDD). However, files stored on OneDrive or Google Drive can be accessed too.

The data set used here is a CSV file stored on the local hard disk drive, so it is only necessary to click in the Select file field to start the process of data specification. Clicking in this field will open a standard Windows Open dialog box, allowing the user to browse for the file. After selection, the Data page is updated to

Clicking the Open button requests the program to open the file and display its contents:

The contents of the data file are displayed in the first of the two tables on the page. Note that all the data are accessible – simply use the scroll bars and > and >> buttons to access any part of the data set.

The second table needs to be completed before model specification can take place. Information required includes the following:

ID variable(s): The second and third lines of check boxes are used to identify these. Note that only the ID2 line is active at the start, prompting the user to first select the level-2 ID. Once that is done, selection of the level-3 ID (if present) will be available too.
Weight variable(s): Any weight variables, if available, should be selected in lines 4 through 6 of this table.
Other Variables: The variables that are candidates for inclusion in the model are selected by checking the boxes in the first line of this table.

The first step is to indicate the ID variable, in this case simply named ID.

As there are no level-3 ID or weight variables in the current example, the variables of interest are selected next.

Clicking the Update button prompts the program to check the data and perform an automatic assignment of variables to the appropriate (student or school) levels.

The variables SECTOR and MEANSES are correctly indicated as school level variables, while the outcome of interest here, MATHACH, and the individual student SES are indicated as student level variables.
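The rule the program uses for this automatic assignment is not described here. One plausible check, sketched below as an assumption and not as the program's actual method, is to treat a variable as school level when it never varies within any school ID, and as student level otherwise. The function name classify_levels is hypothetical.

```python
def classify_levels(rows, id_name):
    """Label each variable 'school' if constant within every ID, else 'student'.

    rows    : list of dicts, one per student record
    id_name : name of the level-2 ID variable
    """
    first_seen = {}   # (school id, variable name) -> first value encountered
    varies = set()    # variables that change within at least one school
    for row in rows:
        school = row[id_name]
        for name, value in row.items():
            if name == id_name:
                continue
            key = (school, name)
            if key in first_seen and first_seen[key] != value:
                varies.add(name)
            first_seen.setdefault(key, value)
    names = [name for name in rows[0] if name != id_name]
    return {name: ("student" if name in varies else "school") for name in names}
```

A check of this kind can mislabel a student level variable that happens to be constant within every school in a small sample, so the assignment shown on the Data page should always be reviewed.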

Having completed the Data File Specification, we are ready to start building the model. To do so, we move on to the Models page by clicking on the Models link at the top of the window.

Note that if data file specification is not completed, the Models page will simply display the message

prompting the user to first complete the file specification step. Model building only becomes possible once data file specification has been completed.

Model building

When the Models page is opened, the only active field displayed is the Set Response option in the Level-1 Model field. The program assumes the outcome variable of interest to be at the lowest level of the hierarchy, and selecting this variable is the starting point for model specification.

In the current example, we are interested in the mathematical scores of students, so we select the variable MATHACH from this drop-down list as shown below

after which the program automatically updates the contents of the Models page to display an unconditional two-level model with MATHACH as outcome.

The Level-1 Variables and Level-2 Variables fields are also activated in the process, so the next step is to start selecting predictors from these lists.

Recall from the discussion of the model previously that we would like to include the level-1 predictor SES. Moreover, we would like to include it as a group centered level-1 predictor. This is done by first clicking on the variable name in the Level-1 Variables field and then selecting the type of centering required. By default, it is assumed that predictors are to be entered into the model uncentered. Here the radio button for Group centered is checked to indicate that we want a group centered predictor.

Adding this variable to the level-1 equation is accomplished by simply dragging the variable name into the equation while holding the mouse button down, releasing it only when the variable is in the equation field.

Once the mouse button is released, the model is updated to the following random intercept model.

This completes the level-1 specification, and we can now add the level-2 variables in a similar way using the Level-2 Variables field. While it is possible to drag and drop multiple variables into the model simultaneously, the current example calls for one uncentered and one grand mean centered variable to be added to both level-2 equations. As such, the two variables are entered one by one. The first variable, SECTOR, is selected as

while the grand mean centered MEANSES is entered as

to obtain the model specification as shown below.

When this model is compared to the one described previously, we notice that we still need to add a random slope for the level-1 predictor SES. To do so, we simply click the Random check box above the slope equation. By default, an intercept is always included in the higher-level equations. The intercept may be removed, provided it is not the only effect in the equation of interest.

The model is now complete

and we can move on to the Settings page to specify the type of outcome.

Specifying outcome type and other options

The Settings page is used to specify the type of outcome used in the analysis. Additional options for the type of outcome can also be selected on this page.

For the current model, we have a normally distributed continuous outcome variable and we wish to fit a standard 2-level HLM model. The program will automatically suggest the distribution type based on the data read in. Ensure the option selected by the program is correct (see below). For a Normal (HLM) model, no other options are enabled on this page. As this page is complete, we click Save before moving to the Run page, accessed by clicking the Run link at the top of the window.

Running the model

The analysis is initiated via the Run page, accessed via the Run link at the top of the window. The description of the data and all options specified on the previous pages are captured in a JSON syntax, which can be saved to a file with the MLCJSN extension. This file can be read back into the program, perhaps to serve as a departure point for a subsequent model.

To start the analysis, click the Run Syntax button.

A Progress window will appear, giving details on the iterative procedure. Links to all output files will automatically appear on the Run page. Here we show some of the HTML output at convergence. Note that a text version of the output is also produced, containing the same information.

While all fixed effects are highly significant, the table of Level-2 variance-covariance components shows that there is little evidence of significant variation in the SES slope across schools.