F.A.Q.

  • Q: How can I retrieve the rows of a dataset that are associated with a particular leaf in a tree?
    A: Find the leaf's internal node ID by inspecting the tree, e.g. in the output of plot(tree) or print(tree). Suppose the node ID is 314. Then run

     leaf <- getNodeById(tree, 314); print(leaf$ids)

    This yields the row indices of your dataset that are associated with this leaf.
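
With these indices you can pull the matching rows out of the data frame that was used to grow the tree. The sketch below uses base R only. The data frame `mydata` and the vector `leaf_ids` are hypothetical stand-ins for your own data and for the value of `leaf$ids`.

```r
# Hypothetical data frame standing in for the data used to grow the tree
mydata <- data.frame(
  y   = c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3),
  age = c(25, 41, 33, 58, 47, 29)
)

# Hypothetical row indices, standing in for leaf$ids
leaf_ids <- c(2, 4, 5)

# Rows of the dataset that fall into this leaf
leaf_rows <- mydata[leaf_ids, , drop = FALSE]
print(leaf_rows)

# Logical leaf-membership indicator for every row of the dataset
in_leaf <- seq_len(nrow(mydata)) %in% leaf_ids
print(in_leaf)
```

The logical indicator is convenient if you want to add leaf membership as a new column, e.g. mydata$in_leaf <- in_leaf.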

  • Q: What types of covariates can I use?
    A: Any kind. The candidate splits of each covariate are represented as binary dummy variables, which are constructed according to whether the covariate is ordinal, continuous, or categorical.
  • Q: How can I specify the type of my covariates?
    A: Numeric variables are treated as continuous covariates, unordered factors as categorical covariates, and ordered factors as ordinal covariates. See also the R commands

    is.factor(x), as.factor(x),
    is.ordered(x), as.ordered(x),
    is.numeric(x), as.numeric(x),
    class(x).
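
The following base-R sketch shows how each covariate type can be created and checked before the data are passed to semtree. The variable names and values are hypothetical; the mapping from class to covariate type follows the answer above.

```r
# Hypothetical covariates, one of each type
age       <- c(25, 41, 33, 58)                  # numeric -> continuous
gender    <- factor(c("f", "m", "m", "f"))      # unordered factor -> categorical
education <- factor(c("low", "high", "mid", "mid"),
                    levels  = c("low", "mid", "high"),
                    ordered = TRUE)             # ordered factor -> ordinal

# The class determines how each covariate is treated
class(age)        # "numeric"
class(gender)     # "factor"
class(education)  # "ordered" "factor"

# Character vectors are not factors; convert them explicitly
region <- as.factor(c("north", "south", "south", "east"))

# Dropping the ordering turns an ordinal covariate into a categorical one
education_cat <- factor(education, ordered = FALSE)
is.ordered(education_cat)  # FALSE
```

Declaring the level order explicitly matters for ordinal covariates, since the ordering of the levels is what distinguishes them from categorical ones.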

  • Q: How can I obtain all parameter estimates at once?
    A: Obtain all parameter estimates with parameters(tree) and all standard errors of the estimates with se(tree). The resulting matrices have leaf IDs as row names and parameter names as column names. To obtain a matrix of Z-statistics in the same layout, run:

    parameters(tree)/se(tree);
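
The division works element-wise because both matrices share the same layout. The base-R sketch below illustrates this with small hypothetical matrices standing in for parameters(tree) and se(tree); the leaf IDs ("2", "3") and parameter names ("mu", "beta") are made up. It also adds two-sided p-values, assuming a standard normal reference distribution for the Z-statistics (a Wald-type test against zero).

```r
# Hypothetical estimates: rows are leaves, columns are parameters
est <- matrix(c(1.2, 0.8,
                0.5, 1.1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("2", "3"), c("mu", "beta")))

# Hypothetical standard errors with the same layout
se_est <- matrix(c(0.3,  0.2,
                   0.25, 0.55),
                 nrow = 2, byrow = TRUE,
                 dimnames = dimnames(est))

# Element-wise division: one Z-statistic per leaf and parameter
z <- est / se_est

# Two-sided p-values, assuming a standard normal reference distribution
p <- 2 * pnorm(-abs(z))

print(round(z, 2))
print(signif(p, 3))
```
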
  • Q: What is the correct spelling of SEM Trees?
    A: We refer to the R package as "semtree" because R package names are usually written without spaces and in lower case. We spell the methodology "SEM Tree". Note that there are also "SEM-trees" (Sequence Embedding Multiset trees) and "SemTrees" (an ontology-based decision tree algorithm for recommender systems), which are unrelated to our approach.