Using lavaan with semtree

This guide explains an exemplary usage of the semtree package step by step. We will create a latent growth curve model in lavaan, load some data, and infer a SEM Tree.

Step 1 – Load packages

Make sure that you have installed the package „semtree“ (also see Download) and lavaan. Open up R and import all necessary packages:

require("semtree");
require("lavaan");

STEP 2 – Load dataset

Download the example file: example1.txt (1100).
Load the dataset with the following command (add a local path to the file, if necessary):

dataset <- data.frame(read.csv("example1.txt",sep="\t", header=T)) 

STEP 3 – Create a model

Let’s create a linear latent growth curve model with a latent intercept and a latent slope, five occasions of measurement, and independent residual errors with equal variance. The following Figure shows a graphical representation of the model that we will define:

The following code is an lavaan model specification of the above model:

growthCurveModel <- '
    inter =~ 1*X1 + 1*X2 + 1*X3 + 1*X4 + 1*X5;
    slope =~ 0*X1 + 1*X2 + 2*X3 + 3*X4 + 4*X5;
    inter ~~ vari*inter; inter ~ meani*1;
    slope ~~ vars*slope; slope ~ means*1;
    inter ~~ cov*slope;
    X1 ~~ residual*X1; X1 ~ 0*1;
    X2 ~~ residual*X2; X2 ~ 0*1;
    X3 ~~ residual*X3; X3 ~ 0*1;
    X4 ~~ residual*X4; X4 ~ 0*1;
    X5 ~~ residual*X5; X5 ~ 0*1;
'

We can estimate parameters from the full parameter set by fitting the model in lavaan:


run <- lavaan(growthCurveModel,dataset)
summary(run);

Step 4 - Grow a tree

A lavaan model is then created without fitting data to the model:

mymodel <- lavaan(growthCurveModel, dataset,do.fit=FALSE)

The following command will start the tree growing process:

mytree <- semtree(mymodel, dataset)

All columns in the dataset with names not corresponding to manifest variables in the model are considered covariates and, thus, are included in the list of split candidates. In this example, the columns "o1" to "o5" correspond to manifest variables. The remaining columns "age", "training", and "noise" are therefore split candidates.
Please be patient. A large number of models is fitted during the inference of a tree.

Step 5 - Plot the tree

This command will visually display the SEM Tree:

plot(mytree);

The graphical output should look similar to the following Figure:

Let's draw the expected growth trajectories for each resulting leaf with the following code

# create expected trajectories from parameters
expected.growth <- matrix(
    rep(t(parameters(mytree))[, "meani"], each=5)+
    rep(t(parameters(mytree))[, "means"], each=5)*(0:4), nrow=3, byrow=T)
# plot expected trajectories for each leaf
plot(c(0,4), c(-3,5), xlab="time", ylab="score", type="n")
lines(0:4, expected.growth[1,], col="red", type="b", lw=3)
lines(0:4, expected.growth[2,], col="orange", type="b", lw=3)
lines(0:4, expected.growth[3,], col="blue", type="b", lw=3)
legend("bottomleft", c("left leaf", "middle leaf", "right leaf"),
col=c("red", "orange", "blue"), lw=3)

This should look similar to this:

Modify hyper-parameters of the tree

Hyper-parameters determine the way a SEM Tree is grown. For example, different variable selection strategies
can be employed, e.g., likelihood-ratio tests or cross-vadlidated likelihood-ratios. Other parameters affect
the maximum tree height or the type-I-error rate if a statistical selection criterion is used to select
variables. These hyper-parameters can be controlled with a semtree.control object. To obtain the default
object that is used in the semtree package, type:

my.control <- semtree.control();
semtree(mymodel, control=my.control);

See the documentation for a full description of parameters. I mention only a few examples here:

  • Set the significance level of a statistical selection criterion to 5%:
    my.control$alpha <- 0.05
  • Activate a conservative Bonferroni correction for multiple testing
    my.control$bonferroni <- TRUE
  • Set the minimum number of observations in a leaf node to 20:
    my.control$min.N <- 20;
  • Choose the fair (no variable selection bias) or crossvalidated likelihood-ratio as selection criterion:
    my.control$method <- "fair";

    or

    my.control$method <- "cv";
  • Exclude leafs with Heywood cases:
    my.control$exclude.heywood <- TRUE;

Variations on the graphical output

Fallen leaves

If you prefer all leaf nodes to be at the same bottom line, add the command "fallen=TRUE" to the plotting command:

plot(mytree, fallen=TRUE);

In case you prefer the sample size to be reflected in the width of the branches, use

plot(mytree, branch.type=5);