Title: | Model Based Phylogenetic Analysis |
---|---|
Description: | A collection of functions to do model-based phylogenetic analysis. It includes functions to calculate community phylogenetic diversity, to estimate correlations among functional traits while accounting for phylogenetic relationships, and to fit phylogenetic generalized linear mixed models. The Bayesian phylogenetic generalized linear mixed models are fitted with the 'INLA' package (<https://www.r-inla.org>). |
Authors: | Anthony Ives [aut], Russell Dinnage [aut] , Lucas A. Nell [aut] , Matthew Helmus [aut], Daijiang Li [aut, cre] |
Maintainer: | Daijiang Li <[email protected]> |
License: | GPL-3 |
Version: | 1.1.2 |
Built: | 2024-11-17 04:58:06 UTC |
Source: | https://github.com/daijiang/phyr |
This function will return elements of x not in y
x %nin% y
x %nin% y
x |
A vector. |
y |
A vector. |
A vector.
This function will remove species from community data that are not in the phylogeny. It will also remove tips from the phylogeny that are not in the community data. And then convert the phylogeny to a Var-cov matrix.
align_comm_V(comm, tree, prune.tree = FALSE, scale.vcv = TRUE)
align_comm_V(comm, tree, prune.tree = FALSE, scale.vcv = TRUE)
comm |
A site by species data frame, with site names as row names. |
tree |
A phylogeny with "phylo" as class; or a phylogenetic var-covar matrix. |
prune.tree |
Whether to prune the tree first then use vcv.phylo function. Default is FALSE: use vcv.phylo first then subsetting the matrix. |
scale.vcv |
Whether to scale vcv to a correlation matrix. |
A list of the community data and the phylogenetic var-cov matrix.
Implemented only for cor_phylo
objects thus far.
boot_ci(mod, ...)
boot_ci(mod, ...)
mod |
A |
... |
Additional arguments. |
A list of confidence intervals.
A data frame with site names as row names, species names as column names, cells are the abundance of each species at each site.
comm_a
comm_a
A data frame with 15 sites and 15 species.
A data frame with site names as row names, species names as column names, cells are the abundance of each species at each site.
comm_b
comm_b
A data frame with 15 sites and 9 species.
This function calculates Pearson correlation coefficients for multiple continuous
variates that may have phylogenetic signal, allowing users to specify measurement
error as the standard error of variate values at the tips of the phylogenetic tree.
Phylogenetic signal for each variate is estimated from the data assuming that variate
evolution is given by a Ornstein-Uhlenbeck process. Thus, the function allows the
estimation of phylogenetic signal in multiple variates while incorporating
correlations among variates. It is also possible to include independent variables
(covariates) for each variate to remove possible confounding effects.
cor_phylo
returns the correlation matrix for variate values, estimates
of phylogenetic signal for each variate, and regression coefficients for
independent variables affecting each variate.
cor_phylo(variates, species, phy, covariates = NULL, meas_errors = NULL, data = sys.frame(sys.parent()), REML = TRUE, method = c("nelder-mead-r", "bobyqa", "subplex", "nelder-mead-nlopt", "sann"), no_corr = FALSE, constrain_d = FALSE, lower_d = 1e-7, rel_tol = 1e-6, max_iter = 1000, sann_options = NULL, verbose = FALSE, rcond_threshold = 1e-10, boot = 0, keep_boots = c("fail", "none", "all")) ## S3 method for class 'cor_phylo' boot_ci(mod, refits = NULL, alpha = 0.05, ...) ## S3 method for class 'cor_phylo' print(x, digits = max(3, getOption("digits") - 3), ...)
cor_phylo(variates, species, phy, covariates = NULL, meas_errors = NULL, data = sys.frame(sys.parent()), REML = TRUE, method = c("nelder-mead-r", "bobyqa", "subplex", "nelder-mead-nlopt", "sann"), no_corr = FALSE, constrain_d = FALSE, lower_d = 1e-7, rel_tol = 1e-6, max_iter = 1000, sann_options = NULL, verbose = FALSE, rcond_threshold = 1e-10, boot = 0, keep_boots = c("fail", "none", "all")) ## S3 method for class 'cor_phylo' boot_ci(mod, refits = NULL, alpha = 0.05, ...) ## S3 method for class 'cor_phylo' print(x, digits = max(3, getOption("digits") - 3), ...)
variates |
A formula or a matrix specifying variates between which correlations
are being calculated.
The formula should be one-sided of the form |
species |
A one-sided formula implicating the variable inside |
phy |
Either a phylogeny of class |
covariates |
A list specifying covariate(s) for each variate.
The list can contain only two-sided formulas or matrices.
Formulas should be of the typical form: |
meas_errors |
A list or matrix containing standard errors for each variate.
If a list, it must contain only two-sided formulas like those for |
data |
An optional data frame, list, or environment that contains the
variables in the model. By default, variables are taken from the environment
from which |
REML |
Whether REML (versus ML) should be used for model fitting.
Defaults to |
method |
Method of optimization using |
no_corr |
A single logical for whether to make all correlations zero.
Running |
constrain_d |
If |
lower_d |
Lower bound on the phylogenetic signal parameter.
Defaults to |
rel_tol |
A control parameter dictating the relative tolerance for convergence
in the optimization. Defaults to |
max_iter |
A control parameter dictating the maximum number of iterations
in the optimization. Defaults to |
sann_options |
A named list containing the control parameters for SANN
minimization.
This is only relevant if |
verbose |
If |
rcond_threshold |
Threshold for the reciprocal condition number of two
matrices inside the log likelihood function.
Increasing this threshold makes the optimization process more strongly
"bounce away" from badly conditioned matrices and can help with convergence
and with estimates that are nonsensical.
Defaults to |
boot |
Number of parametric bootstrap replicates.
Bootstrapping can be run in parallel if |
keep_boots |
Character specifying when to output data (convergence codes
and simulated variate data) from bootstrap replicates.
This is useful for troubleshooting when one or more bootstrap replicates
fails to converge or outputs ridiculous results.
Setting this to |
mod |
|
refits |
One or more |
alpha |
Alpha used for the confidence intervals. Defaults to |
... |
arguments passed to and from other methods. |
x |
an object of class |
digits |
the number of digits to be printed. |
cor_phylo
returns an object of class cor_phylo
:
call |
The matched call. |
corrs |
The |
d |
Values of |
B |
A matrix of regression-coefficient estimates, SE, Z-scores, and P-values, respectively. Rownames indicate which coefficient it refers to. |
B_cov |
Covariance matrix for regression coefficients. |
logLik |
The log likelihood for either the restricted likelihood
( |
AIC |
AIC for either the restricted likelihood ( |
BIC |
BIC for either the restricted likelihood ( |
niter |
Number of iterations the optimizer used. |
convcode |
Conversion code for the optimizer.
This number is
For more information on the nlopt return codes, see https://nlopt.readthedocs.io/en/latest/NLopt_Reference/#return-values. |
rcond_vals |
Reciprocal condition numbers for two matrices inside
the log likelihood function. These are provided to potentially help guide
the changing of the |
bootstrap |
A list of bootstrap output, which is simply |
boot_ci
returns a list of confidence intervals with the following fields:
corrs
Estimates of correlations. This is a matrix the values above the diagonal being the upper limits and values below being the lower limits.
d
Phylogenetic signals.
B0
Coefficient estimates.
B_cov
Coefficient covariances.
boot_ci(cor_phylo)
: returns bootstrapped confidence intervals from a cor_phylo
object
print(cor_phylo)
: prints cor_phylo
objects
For the case of two variables, the function estimates parameters for the model of the form, for example,
where ,
,
, and
are regression
coefficients, and
is a variance-covariance matrix containing the correlation
coefficient r, parameters of the OU process
and
, and diagonal
matrices
and
of measurement standard errors for
and
. The matrix
is
, with
blocks given by
where are derived from
phy
under the assumption of joint
OU evolutionary processes for each variate (see Zheng et al. 2009). This formulation
extends in the obvious way to more than two variates.
Anthony R. Ives, Lucas A. Nell
Zheng, L., A. R. Ives, T. Garland, B. R. Larget, Y. Yu, and K. F. Cao. 2009. New multivariate tests for phylogenetic signal and trait correlations applied to ecophysiological phenotypes of nine Manglietia species. Functional Ecology 23:1059–1069.
## Simple example using data without correlations or phylogenetic ## signal. This illustrates the structure of the input data. set.seed(10) phy <- ape::rcoal(10, tip.label = 1:10) data_df <- data.frame( species = phy$tip.label, # variates: par1 = rnorm(10), par2 = rnorm(10), par3 = rnorm(10), # covariate for par2: cov2 = rnorm(10, mean = 10, sd = 4), # measurement error for par1 and par2, respectively: se1 = 0.2, se2 = 0.4 ) data_df$par2 <- data_df$par2 + 0.5 * data_df$cov2 cp <- cor_phylo(variates = ~ par1 + par2 + par3, covariates = list(par2 ~ cov2), meas_errors = list(par1 ~ se1, par2 ~ se2), species = ~ species, phy = phy, data = data_df) # If you've already created matrices/lists... X <- as.matrix(data_df[,c("par1", "par2", "par3")]) U <- list(par2 = cbind(cov2 = data_df$cov2)) M <- cbind(par1 = data_df$se1, par2 = data_df$se2) # ... you can also use those directly # (notice that I'm inputting an object for `species` # bc I ommitted `data`): cp2 <- cor_phylo(variates = X, species = data_df$species, phy = phy, covariates = U, meas_errors = M) # # # ## Simulation example for the correlation between two variables. The example # ## compares the estimates of the correlation coefficients from cor_phylo when # ## measurement error is incorporated into the analyses with three other cases: # ## (i) when measurement error is excluded, (ii) when phylogenetic signal is # ## ignored (assuming a "star" phylogeny), and (iii) neither measurement error # ## nor phylogenetic signal are included. # # # In the simulations, variable 2 is associated with a single independent variable. # # library(ape) # # set.seed(1) # # Set up parameter values for simulating data # n <- 50 # phy <- rcoal(n, tip.label = 1:n) # trt_names <- paste0("par", 1:2) # # R <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2) # d <- c(0.3, 0.95) # B2 <- 1 # # Se <- c(0.2, 1) # M <- matrix(Se, nrow = n, ncol = 2, byrow = TRUE) # colnames(M) <- trt_names # # # Set up needed matrices for the simulations # p <- length(d) # # star <- stree(n) # star$edge.length <- array(1, dim = c(n, 1)) # star$tip.label <- phy$tip.label # # Vphy <- vcv(phy) # Vphy <- Vphy/max(Vphy) # Vphy <- Vphy/exp(determinant(Vphy)$modulus[1]/n) # # tau <- matrix(1, nrow = n, ncol = 1) %*% diag(Vphy) - Vphy # C <- matrix(0, nrow = p * n, ncol = p * n) # for (i in 1:p) for (j in 1:p) { # Cd <- (d[i]^tau * (d[j]^t(tau)) * (1 - (d[i] * d[j])^Vphy))/(1 - d[i] * d[j]) # C[(n * (i - 1) + 1):(i * n), (n * (j - 1) + 1):(j * n)] <- R[i, j] * Cd # } # MM <- matrix(M^2, ncol = 1) # V <- C + diag(as.numeric(MM)) # # # Perform a Cholesky decomposition of Vphy. This is used to generate phylogenetic # # signal: a vector of independent normal random variables, when multiplied by the # # transpose of the Cholesky deposition of Vphy will have covariance matrix # # equal to Vphy. # iD <- t(chol(V)) # # # Perform Nrep simulations and collect the results # Nrep <- 100 # cor.list <- matrix(0, nrow = Nrep, ncol = 1) # cor.noM.list <- matrix(0, nrow = Nrep, ncol = 1) # cor.noP.list <- matrix(0, nrow = Nrep, ncol = 1) # cor.noMP.list <- matrix(0, nrow = Nrep, ncol = 1) # d.list <- matrix(0, nrow = Nrep, ncol = 2) # d.noM.list <- matrix(0, nrow = Nrep, ncol = 2) # B.list <- matrix(0, nrow = Nrep, ncol = 3) # B.noM.list <- matrix(0, nrow = Nrep, ncol = 3) # B.noP.list <- matrix(0, nrow = Nrep, ncol = 3) # # # set.seed(2) # for (rep in 1:Nrep) { # # XX <- iD %*% rnorm(2 * n) # X <- matrix(XX, n, p) # colnames(X) <- trt_names # # U <- list(cbind(rnorm(n, mean = 2, sd = 10))) # names(U) <- trt_names[2] # # X[,2] <- X[,2] + B2[1] * U[[1]][,1] - B2[1] * mean(U[[1]][,1]) # # # Call cor_phylo with (i) phylogeny and measurement error, # # (ii) just phylogeny, # # and (iii) just measurement error # z <- cor_phylo(variates = X, # covariates = U, # meas_errors = M, # phy = phy, # species = phy$tip.label) # z.noM <- cor_phylo(variates = X, # covariates = U, # phy = phy, # species = phy$tip.label) # z.noP <- cor_phylo(variates = X, # covariates = U, # meas_errors = M, # phy = star, # species = phy$tip.label) # # cor.list[rep] <- z$corrs[1, 2] # cor.noM.list[rep] <- z.noM$corrs[1, 2] # cor.noP.list[rep] <- z.noP$corrs[1, 2] # cor.noMP.list[rep] <- cor(cbind( # lm(X[,1] ~ 1)$residuals, # lm(X[,2] ~ U[[1]])$residuals))[1,2] # # d.list[rep, ] <- z$d # d.noM.list[rep, ] <- z.noM$d # # B.list[rep, ] <- z$B[,1] # B.noM.list[rep, ] <- z.noM$B[,1] # B.noP.list[rep, ] <- z.noP$B[,1] # } # # correlation <- rbind(R[1, 2], mean(cor.list), mean(cor.noM.list), # mean(cor.noP.list), mean(cor.noMP.list)) # rownames(correlation) <- c("True", "With M and Phy", "Without M", # "Without Phy", "Without Phy or M") # # signal.d <- rbind(d, colMeans(d.list), colMeans(d.noM.list)) # rownames(signal.d) <- c("True", "With M and Phy", "Without M") # # est.B <- rbind(c(0, 0, B2), colMeans(B.list), # colMeans(B.noM.list[-39,]), # 39th rep didn't converge # colMeans(B.noP.list)) # rownames(est.B) <- c("True", "With M and Phy", "Without M", "Without Phy") # colnames(est.B) <- rownames(z$B) # # # Example simulation output: # # correlation # # [,1] # # True 0.7000000 # # With M and Phy 0.6943712 # # Without M 0.2974162 # # Without Phy 0.3715406 # # Without Phy or M 0.3291473 # # signal.d # # [,1] [,2] # # True 0.3000000 0.9500000 # # With M and Phy 0.3025853 0.9422067 # # Without M 0.2304527 0.4180208 # # est.B # # par1_0 par2_0 par2_cov_1 # # True 0.000000000 0.0000000 1.0000000 # # With M and Phy -0.008838245 0.1093819 0.9995058 # # Without M -0.008240453 0.1142330 0.9995625 # # Without Phy 0.002933341 0.1096578 1.0028474
## Simple example using data without correlations or phylogenetic ## signal. This illustrates the structure of the input data. set.seed(10) phy <- ape::rcoal(10, tip.label = 1:10) data_df <- data.frame( species = phy$tip.label, # variates: par1 = rnorm(10), par2 = rnorm(10), par3 = rnorm(10), # covariate for par2: cov2 = rnorm(10, mean = 10, sd = 4), # measurement error for par1 and par2, respectively: se1 = 0.2, se2 = 0.4 ) data_df$par2 <- data_df$par2 + 0.5 * data_df$cov2 cp <- cor_phylo(variates = ~ par1 + par2 + par3, covariates = list(par2 ~ cov2), meas_errors = list(par1 ~ se1, par2 ~ se2), species = ~ species, phy = phy, data = data_df) # If you've already created matrices/lists... X <- as.matrix(data_df[,c("par1", "par2", "par3")]) U <- list(par2 = cbind(cov2 = data_df$cov2)) M <- cbind(par1 = data_df$se1, par2 = data_df$se2) # ... you can also use those directly # (notice that I'm inputting an object for `species` # bc I ommitted `data`): cp2 <- cor_phylo(variates = X, species = data_df$species, phy = phy, covariates = U, meas_errors = M) # # # ## Simulation example for the correlation between two variables. The example # ## compares the estimates of the correlation coefficients from cor_phylo when # ## measurement error is incorporated into the analyses with three other cases: # ## (i) when measurement error is excluded, (ii) when phylogenetic signal is # ## ignored (assuming a "star" phylogeny), and (iii) neither measurement error # ## nor phylogenetic signal are included. # # # In the simulations, variable 2 is associated with a single independent variable. # # library(ape) # # set.seed(1) # # Set up parameter values for simulating data # n <- 50 # phy <- rcoal(n, tip.label = 1:n) # trt_names <- paste0("par", 1:2) # # R <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2) # d <- c(0.3, 0.95) # B2 <- 1 # # Se <- c(0.2, 1) # M <- matrix(Se, nrow = n, ncol = 2, byrow = TRUE) # colnames(M) <- trt_names # # # Set up needed matrices for the simulations # p <- length(d) # # star <- stree(n) # star$edge.length <- array(1, dim = c(n, 1)) # star$tip.label <- phy$tip.label # # Vphy <- vcv(phy) # Vphy <- Vphy/max(Vphy) # Vphy <- Vphy/exp(determinant(Vphy)$modulus[1]/n) # # tau <- matrix(1, nrow = n, ncol = 1) %*% diag(Vphy) - Vphy # C <- matrix(0, nrow = p * n, ncol = p * n) # for (i in 1:p) for (j in 1:p) { # Cd <- (d[i]^tau * (d[j]^t(tau)) * (1 - (d[i] * d[j])^Vphy))/(1 - d[i] * d[j]) # C[(n * (i - 1) + 1):(i * n), (n * (j - 1) + 1):(j * n)] <- R[i, j] * Cd # } # MM <- matrix(M^2, ncol = 1) # V <- C + diag(as.numeric(MM)) # # # Perform a Cholesky decomposition of Vphy. This is used to generate phylogenetic # # signal: a vector of independent normal random variables, when multiplied by the # # transpose of the Cholesky deposition of Vphy will have covariance matrix # # equal to Vphy. # iD <- t(chol(V)) # # # Perform Nrep simulations and collect the results # Nrep <- 100 # cor.list <- matrix(0, nrow = Nrep, ncol = 1) # cor.noM.list <- matrix(0, nrow = Nrep, ncol = 1) # cor.noP.list <- matrix(0, nrow = Nrep, ncol = 1) # cor.noMP.list <- matrix(0, nrow = Nrep, ncol = 1) # d.list <- matrix(0, nrow = Nrep, ncol = 2) # d.noM.list <- matrix(0, nrow = Nrep, ncol = 2) # B.list <- matrix(0, nrow = Nrep, ncol = 3) # B.noM.list <- matrix(0, nrow = Nrep, ncol = 3) # B.noP.list <- matrix(0, nrow = Nrep, ncol = 3) # # # set.seed(2) # for (rep in 1:Nrep) { # # XX <- iD %*% rnorm(2 * n) # X <- matrix(XX, n, p) # colnames(X) <- trt_names # # U <- list(cbind(rnorm(n, mean = 2, sd = 10))) # names(U) <- trt_names[2] # # X[,2] <- X[,2] + B2[1] * U[[1]][,1] - B2[1] * mean(U[[1]][,1]) # # # Call cor_phylo with (i) phylogeny and measurement error, # # (ii) just phylogeny, # # and (iii) just measurement error # z <- cor_phylo(variates = X, # covariates = U, # meas_errors = M, # phy = phy, # species = phy$tip.label) # z.noM <- cor_phylo(variates = X, # covariates = U, # phy = phy, # species = phy$tip.label) # z.noP <- cor_phylo(variates = X, # covariates = U, # meas_errors = M, # phy = star, # species = phy$tip.label) # # cor.list[rep] <- z$corrs[1, 2] # cor.noM.list[rep] <- z.noM$corrs[1, 2] # cor.noP.list[rep] <- z.noP$corrs[1, 2] # cor.noMP.list[rep] <- cor(cbind( # lm(X[,1] ~ 1)$residuals, # lm(X[,2] ~ U[[1]])$residuals))[1,2] # # d.list[rep, ] <- z$d # d.noM.list[rep, ] <- z.noM$d # # B.list[rep, ] <- z$B[,1] # B.noM.list[rep, ] <- z.noM$B[,1] # B.noP.list[rep, ] <- z.noP$B[,1] # } # # correlation <- rbind(R[1, 2], mean(cor.list), mean(cor.noM.list), # mean(cor.noP.list), mean(cor.noMP.list)) # rownames(correlation) <- c("True", "With M and Phy", "Without M", # "Without Phy", "Without Phy or M") # # signal.d <- rbind(d, colMeans(d.list), colMeans(d.noM.list)) # rownames(signal.d) <- c("True", "With M and Phy", "Without M") # # est.B <- rbind(c(0, 0, B2), colMeans(B.list), # colMeans(B.noM.list[-39,]), # 39th rep didn't converge # colMeans(B.noP.list)) # rownames(est.B) <- c("True", "With M and Phy", "Without M", "Without Phy") # colnames(est.B) <- rownames(z$B) # # # Example simulation output: # # correlation # # [,1] # # True 0.7000000 # # With M and Phy 0.6943712 # # Without M 0.2974162 # # Without Phy 0.3715406 # # Without Phy or M 0.3291473 # # signal.d # # [,1] [,2] # # True 0.3000000 0.9500000 # # With M and Phy 0.3025853 0.9422067 # # Without M 0.2304527 0.4180208 # # est.B # # par1_0 par2_0 par2_cov_1 # # True 0.000000000 0.0000000 1.0000000 # # With M and Phy -0.008838245 0.1093819 0.9995058 # # Without M -0.008240453 0.1142330 0.9995625 # # Without Phy 0.002933341 0.1096578 1.0028474
A data frame of site environmental variables.
envi
envi
A data frame with 15 sites and 4 variables: sand proportion, canopy shade proportion, precipitation, and minimum temperature.
Family Objects for communityPGLMM objects
## S3 method for class 'communityPGLMM' family(object, ...)
## S3 method for class 'communityPGLMM' family(object, ...)
object |
the function |
... |
further arguments passed to methods. |
Fitted values for communityPGLMM
## S3 method for class 'communityPGLMM' fitted(object, ...)
## S3 method for class 'communityPGLMM' fitted(object, ...)
object |
A fitted model with class communityPGLMM. |
... |
Additional arguments, ignored for method compatibility. |
Fitted values. For binomial and poisson PGLMMs, this is equal to mu.
Extract the fixed-effects estimates
## S3 method for class 'communityPGLMM' fixef(object, ...)
## S3 method for class 'communityPGLMM' fixef(object, ...)
object |
A fitted model with class communityPGLMM. |
... |
Ignored. |
Extract the estimates of the fixed-effects parameters from a fitted model. For bayesian models, the p-values are simply to indicate whether the credible intervals include 0 (p = 0.04) or not (p = 0.6).
A dataframe of fixed-effects estimates.
get_design_matrix
gets design matrix for gaussian, binomial, and poisson modelsget_design_matrix
gets design matrix for gaussian, binomial, and poisson models
get_design_matrix(formula, data, random.effects, na.action = NULL)
get_design_matrix(formula, data, random.effects, na.action = NULL)
formula |
A two-sided linear formula object describing the mixed effects of the model. To specify that a random term should have phylogenetic covariance matrix along
with non-phylogenetic one, add Note that correlated random terms are not allowed. For example,
|
data |
A |
random.effects |
Optional pre-build list of random effects. If |
na.action |
What to do with NAs? |
A list of design matrices.
This function will remove species from community data that are not in the phylogeny. It will also remove tips from the phylogeny that are not in the community data.
match_comm_tree(comm, tree, comm_2 = NULL)
match_comm_tree(comm, tree, comm_2 = NULL)
comm |
A site by species data frame, with site names as row names. |
tree |
A phylogeny with "phylo" as class. |
comm_2 |
Another optional site by species data frame, if presented, both community data and the phylogeny will have the same set of species. This can be useful for PCD with custom species pool. |
A list of the community data and the phylogeny.
Extracting the Model Frame from a communityPGLMM Model object
## S3 method for class 'communityPGLMM' model.frame(formula, ...)
## S3 method for class 'communityPGLMM' model.frame(formula, ...)
formula |
|
... |
for For |
Number of Observation in a communityPGLMM Model
## S3 method for class 'communityPGLMM' nobs(object, use.fallback = FALSE, ...)
## S3 method for class 'communityPGLMM' nobs(object, use.fallback = FALSE, ...)
object |
A fitted model object. |
use.fallback |
logical: should fallback methods be used to try to guess the value? |
... |
Further arguments to be passed to methods. |
A list containing a phylogeny for XX species of Oldfield forbs, as well as a presence / absence dataset for their occurrence across several locations in Southern Ontario see Dinnage (2009) for details. Sites each had two plots which experienced a different treatment each; either they has been disturbed (ploughed 1 or 2 years previously), or they were a control plot (undisturbed in recent records).
oldfield
oldfield
A list with two elements:
phy
A phylogeny in ape
's phy
format
data
A data.frame containing data on the occurrence of the species in phy
oldfield$data is a data.frame with 1786 rows, and the following 7 columns:
site_orig
integer. Site ID number.
habitat_type
character. Plot treatment: disturbed or undisturbed.
sp
character. Species name using underscore to separate binomial names (to match phylogeny).
abundance
integer. Recorded abundance of species in plot.
disturbance
integer. Whether the plot was disturbed or not. 0 or 1. 0 for undisturbed, 1 for disturbed
site_orig
character. A unique site descriptor concatenating the site number with the disturbance treatment.
pres
integer. Species presence or absence in plot. 0 or 1. 0 for absent, 1 for present
Calculate pairwise site PCD, users can specify expected values from pcd_pred()
.
pcd(comm, tree, expectation = NULL, cpp = TRUE, verbose = TRUE, ...)
pcd(comm, tree, expectation = NULL, cpp = TRUE, verbose = TRUE, ...)
comm |
A site by species data frame or matrix, sites as rows. |
tree |
A phylogeny for species. |
expectation |
nsp_pool, psv_bar, psv_pool, and nsr calculated from |
cpp |
Whether to use loops written with c++, default is TRUE. |
verbose |
Do you want to see the progress? |
... |
Other arguments. |
A list of a variety of pairwise dissimilarities.
Ives, A. R., & Helmus, M. R. 2010. Phylogenetic metrics of community similarity. The American Naturalist, 176(5), E128-E142.
x1 = pcd_pred(comm_1 = comm_a, comm_2 = comm_b, tree = phylotree, reps = 100) pcd(comm = comm_a, tree = phylotree, expectation = x1)
x1 = pcd_pred(comm_1 = comm_a, comm_2 = comm_b, tree = phylotree, reps = 100) pcd(comm = comm_a, tree = phylotree, expectation = x1)
This function will calculate expected PCD from one or two sets of communities (depends on the species pool)
pcd_pred(comm_1, comm_2 = NULL, tree, reps = 10^3, cpp = TRUE)
pcd_pred(comm_1, comm_2 = NULL, tree, reps = 10^3, cpp = TRUE)
comm_1 |
A site by species dataframe or matrix, with sites as rows and species as columns. |
comm_2 |
An optional second site by species data frame. It should have the same number of rows as comm_1. This can be useful if we want to calculate temporal beta diversity, i.e. changes of the same site over time. Because data of the same site are not independent, setting comm_2 will use both communities as species pool to calculate expected PCD. |
tree |
The phylogeny for all species, with "phylo" as class; or a var-cov matrix. |
reps |
Number of random draws, default is 1000 times. |
cpp |
Whether to use loops written with c++, default is TRUE. If you came across with errors, try to set cpp = FALSE. This normally will run without errors, but slower. |
A list with species richness of the pool, expected PSV, PSV of the pool, and unique number of species richness across sites.
This function performs Generalized Linear Mixed Models for binary, count,
and continuous data, estimating regression coefficients with
approximate standard errors. It is specifically designed for community data
in which species occur within multiple sites (locations).
A Bayesian version of PGLMM uses the package INLA
,
which is not available on CRAN yet. If you wish to use this option,
you must first install INLA
from https://www.r-inla.org/ by running
install.packages('INLA', repos='https://www.math.ntnu.no/inla/R/stable')
in R.
pglmm( formula, data = NULL, family = "gaussian", cov_ranef = NULL, random.effects = NULL, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = TRUE, calc.WAIC = TRUE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL, tree_site = NULL, sp = NULL, site = NULL, bayes_options = NULL, bayes_nested_matrix_as_list = FALSE ) communityPGLMM( formula, data = NULL, family = "gaussian", cov_ranef = NULL, random.effects = NULL, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = TRUE, calc.WAIC = TRUE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL, tree_site = NULL, sp = NULL, site = NULL, bayes_options = NULL, bayes_nested_matrix_as_list = FALSE )
pglmm( formula, data = NULL, family = "gaussian", cov_ranef = NULL, random.effects = NULL, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = TRUE, calc.WAIC = TRUE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL, tree_site = NULL, sp = NULL, site = NULL, bayes_options = NULL, bayes_nested_matrix_as_list = FALSE ) communityPGLMM( formula, data = NULL, family = "gaussian", cov_ranef = NULL, random.effects = NULL, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = TRUE, calc.WAIC = TRUE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL, tree_site = NULL, sp = NULL, site = NULL, bayes_options = NULL, bayes_nested_matrix_as_list = FALSE )
formula |
A two-sided linear formula object describing the mixed effects of the model. To specify that a random term should have phylogenetic covariance matrix along
with non-phylogenetic one, add Note that correlated random terms are not allowed. For example,
|
data |
A |
family |
Either "gaussian" for a Linear Mixed Model, or
"binomial" or "poisson" for Generalized Linear Mixed Models.
"family" should be specified as a character string (i.e., quoted). For binomial and
Poisson data, we use the canonical logit and log link functions, respectively.
Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.
For both binomial and Poisson data, we add an observation-level
random term by default via |
cov_ranef |
A named list of covariance matrices of random terms. The names should be the
group variables that are used as random terms with specified covariance matrices
(without the two underscores, e.g. |
random.effects |
Optional pre-build list of random effects. If |
REML |
Whether REML or ML is used for model fitting the random effects. Ignored if
|
optimizer |
nelder-mead-nlopt (default), bobyqa, Nelder-Mead, or subplex.
Nelder-Mead is from the stats package and the other optimizers are from the nloptr package.
Ignored if |
repulsion |
When there are nested random terms specified, |
add.obs.re |
Whether to add an observation-level random term for binomial or Poisson
distributions. Normally it would be a good idea to add this to account for overdispersion,
so |
verbose |
If |
cpp |
Whether to use C++ function for optim. Default is TRUE. Ignored if |
bayes |
Whether to fit a Bayesian version of the PGLMM using |
s2.init |
An array of initial estimates of s2 for each random
effect that scales the variance. If s2.init is not provided for
|
B.init |
Initial estimates of |
reltol |
A control parameter dictating the relative tolerance
for convergence in the optimization; see |
maxit |
A control parameter dictating the maximum number of
iterations in the optimization; see |
tol.pql |
A control parameter dictating the tolerance for
convergence in the PQL estimates of the mean components of the
GLMM. Ignored if |
maxit.pql |
A control parameter dictating the maximum number
of iterations in the PQL estimates of the mean components of the
GLMM. Ignored if |
marginal.summ |
Summary statistic to use for the estimate of coefficients when
doing a Bayesian PGLMM (when |
calc.DIC |
Should the Deviance Information Criterion be calculated and returned
when doing a Bayesian PGLMM? Ignored if |
calc.WAIC |
Should the WAIC be calculated and returned
when doing a Bayesian PGLMM? Ignored if |
prior |
Which type of default prior should be used by |
prior_alpha |
Only used if |
prior_mu |
Only used if |
ML.init |
Only relevant if |
tree |
A phylogeny for column sp, with "phylo" class, or a covariance matrix for sp.
Make sure to have all species in the matrix; if the matrix is not standardized,
(i.e., det(tree) != 1), |
tree_site |
A second phylogeny for "site". This is required only if the site column contains species instead of sites. This can be used for bipartitie questions; tree_site can also be a covariance matrix. Make sure to have all sites in the matrix; if the matrix is not standardized (i.e., det(tree_site) != 1), pglmm' will try to standardize it for you. No longer used: keep here for compatibility. |
sp |
No longer used: keep here for compatibility. |
site |
No longer used: keep here for compatibility. |
bayes_options |
Additional options to pass to INLA for if |
bayes_nested_matrix_as_list |
For |
For Gaussian data, pglmm
analyzes the phylogenetic linear mixed model
where and
are fixed
effects, and
is a variance-covariance matrix
derived from a phylogeny (typically under the assumption of
Brownian motion evolution). Here, the variation in the mean
(intercept) for each species is given by the random effect
that is assumed to be independent among
species. Variation in species' responses to predictor variable
is given by a random effect
that is
assumed to depend on the phylogenetic relatedness among species
given by
; if species are closely related,
their specific responses to
will be similar. This
particular model would be specified as
z <- pglmm(Y ~ X + (1|sp__), data = data, family = "gaussian", cov_ranef = list(sp = phy))
Or you can prepare the random terms manually (not recommended for simple models but may be necessary for complex models):
re.1 <- list(1, sp = dat$sp, covar = diag(nspp))
re.2 <- list(dat$X, sp = dat$sp, covar = Vsp)
z <- pglmm(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))
The covariance matrix covar is standardized to have its determinant
equal to 1. This in effect standardizes the interpretation of the
scalar . Although mathematically this is
not required, it is a very good idea to standardize the predictor
(independent) variables to have mean 0 and variance 1. This will
make the function more robust and improve the interpretation of the
regression coefficients. For categorical (factor) predictor
variables, you will need to construct 0-1 dummy variables, and
these should not be standardized (for obvious reasons).
For binary generalized linear mixed models (family =
'binomial'
), the function estimates parameters for the model of
the form, for example,
where and
are fixed
effects, and
is a variance-covariance matrix
derived from a phylogeny (typically under the assumption of
Brownian motion evolution).
z <- pglmm(Y ~ X + (1|sp__), data = data, family = "binomial", cov_ranef = list(sp = phy))
As with the linear mixed model, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients.
An object (list) of class communityPGLMM
with the following elements:
formula |
the formula for fixed effects |
formula_original |
the formula for both fixed effects and random effects |
data |
the dataset |
family |
|
random.effects |
the list of random effects |
B |
estimates of the regression coefficients |
B.se |
approximate standard errors of the fixed effects regression coefficients.
This is set to NULL if |
B.ci |
approximate Bayesian credible interval of the fixed effects regression coefficients.
This is set to NULL if |
B.cov |
approximate covariance matrix for the fixed effects regression coefficients |
B.zscore |
approximate Z scores for the fixed effects regression coefficients. This is set to NULL if |
B.pvalue |
approximate tests for the fixed effects regression coefficients being different from zero. This is set to NULL if |
ss |
standard deviations of the random effects for the covariance matrix |
s2r |
random effects variances for non-nested random effects |
s2n |
random effects variances for nested random effects |
s2resid |
for linear mixed models, the residual variance |
s2r.ci |
Bayesian credible interval for random effects variances for non-nested random effects.
This is set to NULL if |
s2n.ci |
Bayesian credible interval for random effects variances for nested random effects.
This is set to NULL if |
s2resid.ci |
Bayesian credible interval for linear mixed models, the residual variance.
This is set to NULL if |
logLik |
for linear mixed models, the log-likelihood for either the restricted likelihood ( |
AIC |
for linear mixed models, the AIC for either the restricted likelihood ( |
BIC |
for linear mixed models, the BIC for either the restricted likelihood ( |
DIC |
for Bayesian PGLMM, this is the Deviance Information Criterion metric of model fit. This is set to NULL if |
REML |
whether or not REML is used ( |
bayes |
whether or not a Bayesian model was fit. |
marginal.summ |
The specified summary statistic used to summarize the Bayesian marginal distributions.
Only present if |
s2.init |
the user-provided initial estimates of |
B.init |
the user-provided initial estimates of |
Y |
the response (dependent) variable returned in matrix form |
X |
the predictor (independent) variables returned in matrix form (including 1s in the first column) |
H |
the residuals. For linear mixed models, this does not account for random terms,
To get residuals after accounting for both fixed and random terms, use |
iV |
the inverse of the covariance matrix for the entire system (of dimension ( |
mu |
predicted mean values for the generalized linear mixed model (i.e., similar to |
nested |
matrices used to construct the nested design matrix. This is set to NULL if |
Zt |
the design matrix for random effects. This is set to NULL if |
St |
diagonal matrix that maps the random effects variances onto the design matrix |
convcode |
the convergence code provided by |
niter |
number of iterations performed by |
inla.model |
Model object fit by underlying |
Anthony R. Ives, Daijiang Li, Russell Dinnage
Ives, A. R. and M. R. Helmus. 2011. Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs 81:511-525.
Ives A. R. 2018. Mixed and phylogenetic models: a conceptual introduction to correlated data. https://leanpub.com/correlateddata.
Rafferty, N. E., and A. R. Ives. 2013. Phylogenetic trait-based analyses of ecological networks. Ecology 94:2321-2333.
Simpson, Daniel, et al. 2017. Penalising model component complexity: A principled, practical approach to constructing priors. Statistical science 32(1): 1-28.
Li, D., Ives, A. R., & Waller, D. M. 2017. Can functional traits account for phylogenetic signal in community composition? New Phytologist, 214(2), 607-618.
## Structure of examples: # First, a (brief) description of model types, and how they are specified # - these are *not* to be run 'as-is'; they show how models should be organised # Second, a run-through of how to simulate, and then analyse, data # - these *are* to be run 'as-is'; they show how to format and work with data ############################################# ### Brief summary of models and their use ### ############################################# ## Model structures from Ives & Helmus (2011) if(FALSE){ # dat = data set for regression (note: must have a column "sp" and a column "site") # phy = phylogeny of class "phylo" # repulsion = to test phylogenetic repulsion or not # Model 1 (Eq. 1) z <- pglmm(freq ~ sp + (1|site) + (1|sp__@site), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) # Model 2 (Eq. 2) z <- pglmm(freq ~ sp + X + (1|site) + (X|sp__), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) # Model 3 (Eq. 3) z <- pglmm(freq ~ sp*X + (1|site) + (1|sp__@site), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) ## Model structure from Rafferty & Ives (2013) (Eq. 3) # dat = data set # phyPol = phylogeny for pollinators (pol) # phyPlt = phylogeny for plants (plt) z <- pglmm(freq ~ pol * X + (1|pol__) + (1|plt__) + (1|pol__@plt) + (1|pol@plt__) + (1|pol__@plt__), data = dat, family = "binomial", cov_ranef = list(pol = phyPol, plt = phyPlt), REML = TRUE, verbose = TRUE, s2.init = .1) } ##################################################### ### Detailed analysis showing covariance matrices ### ##################################################### # This is the example from section 4.3 in Ives, A. R. (2018) Mixed # and phylogenetic models: a conceptual introduction to correlated data. library(ape) library(mvtnorm) # Investigating covariance matrices for different types of model structure nspp <- 6 nsite <- 4 # Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3) phy <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3) # Simulate species means sd.sp <- 1 mean.sp <- rTraitCont(phy, model = "BM", sigma=sd.sp^2) # Replicate values of mean.sp over sites Y.sp <- rep(mean.sp, times=nsite) # Simulate site means sd.site <- 1 mean.site <- rnorm(nsite, sd=sd.site) # Replicate values of mean.site over sp Y.site <- rep(mean.site, each=nspp) # Compute a covariance matrix for phylogenetic attraction sd.attract <- 1 Vphy <- vcv(phy) # Standardize the phylogenetic covariance matrix to have determinant = 1. # (For an explanation of this standardization, see subsection 4.3.1 in Ives (2018)) Vphy <- Vphy/(det(Vphy)^(1/nspp)) # Construct the overall covariance matrix for phylogenetic attraction. # (For an explanation of Kronecker products, see subsection 4.3.1 in the book) V <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy) Y.attract <- array(t(rmvnorm(n = 1, sigma = sd.attract^2*V))) # Simulate residual errors sd.e <- 1 Y.e <- rnorm(nspp*nsite, sd = sd.e) # Construct the dataset d <- data.frame(sp = rep(phy$tip.label, times = nsite), site = rep(1:nsite, each = nspp)) # Simulate abundance data d$Y <- Y.sp + Y.site + Y.attract + Y.e # Analyze the model pglmm(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, cov_ranef = list(sp = phy)) # Display random effects: the function `pglmm_plot_ranef()` does what # the name implies. You can set `show.image = TRUE` and `show.sim.image = TRUE` # to see the matrices and simulations. re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, cov_ranef = list(sp = phy), show.image = FALSE, show.sim.image = FALSE) ################################################# ### Example of a bipartite phylogenetic model ### ################################################# # Investigating covariance matrices for different types of model structure nspp <- 20 nsite <- 15 # Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3) phy.sp <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3) phy.site <- compute.brlen(rtree(n = nsite), method = "Grafen", power = 1.3) # Simulate species means mean.sp <- rTraitCont(phy.sp, model = "BM", sigma = 1) # Replicate values of mean.sp over sites Y.sp <- rep(mean.sp, times = nsite) # Simulate site means mean.site <- rTraitCont(phy.site, model = "BM", sigma = 1) # Replicate values of mean.site over sp Y.site <- rep(mean.site, each = nspp) # Generate covariance matrix for phylogenetic attraction among species sd.sp.attract <- 1 Vphy.sp <- vcv(phy.sp) Vphy.sp <- Vphy.sp/(det(Vphy.sp)^(1/nspp)) V.sp <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy.sp) Y.sp.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.attract^2*V.sp))) # Generate covariance matrix for phylogenetic attraction among sites sd.site.attract <- 1 Vphy.site <- vcv(phy.site) Vphy.site <- Vphy.site/(det(Vphy.site)^(1/nsite)) V.site <- kronecker(Vphy.site, diag(nrow = nspp, ncol = nspp)) Y.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.site.attract^2*V.site))) # Generate covariance matrix for phylogenetic attraction of species:site interaction sd.sp.site.attract <- 1 V.sp.site <- kronecker(Vphy.site, Vphy.sp) Y.sp.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.site.attract^2*V.sp.site))) # Simulate residual error sd.e <- 0.5 Y.e <- rnorm(nspp*nsite, sd = sd.e) # Construct the dataset d <- data.frame(sp = rep(phy.sp$tip.label, times = nsite), site = rep(phy.site$tip.label, each = nspp)) # Simulate abundance data d$Y <- Y.sp + Y.site + Y.sp.attract + Y.site.attract + Y.sp.site.attract + Y.e # Plot random effects covariance matrices and then add phylogenies # Note that, if show.image and show.sim are not specified, pglmm_plot_ranef() shows # the covariance matrices if nspp * nsite < 200 and shows simulations # if nspp * nsite > 100 re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__) + (1|sp__@site__), data=d, cov_ranef = list(sp = phy.sp, site = phy.site)) # This flips the phylogeny to match to covariance matrices rot.phy.site <- phy.site for(i in (nsite+1):(nsite+Nnode(phy.site))) rot.phy.site <- rotate(rot.phy.site, node = i) plot(phy.sp, main = "Species", direction = "upward") plot(rot.phy.site, main = "Site") # Analyze the simulated data and compute a P-value for the (1|sp__@site__) # random effect using a LRT. It is often better to fit the reduced model before # the full model, because it s numerically easier to fit the reduced model, # and then the parameter estimates from the reduced model can be given to the # full model. In this case, I have used the estimates of the random effects # from the reduce model, mod.r$ss, as the initial estimates for the same # parameters in the full model in the statement s2.init=c(mod.r$ss, 0.01)^2. # The final 0.01 is for the last random effect in the full model, (1|sp__@site__). # Note also that the output of the random effects from communityPGLMM(), mod.r$ss, # are the standard deviations, so they have to be squared for use as initial # values of variances in mod.f. mod.r <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__), data = d, cov_ranef = list(sp = phy.sp, site = phy.site)) mod.f <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__) + (1|sp__@site__), data = d, cov_ranef = list(sp = phy.sp, site = phy.site), s2.init = c(mod.r$ss, 0.01)^2) mod.f pvalue <- pchisq(2*(mod.f$logLik - mod.r$logLik), df = 1, lower.tail = FALSE) pvalue
## Structure of examples: # First, a (brief) description of model types, and how they are specified # - these are *not* to be run 'as-is'; they show how models should be organised # Second, a run-through of how to simulate, and then analyse, data # - these *are* to be run 'as-is'; they show how to format and work with data ############################################# ### Brief summary of models and their use ### ############################################# ## Model structures from Ives & Helmus (2011) if(FALSE){ # dat = data set for regression (note: must have a column "sp" and a column "site") # phy = phylogeny of class "phylo" # repulsion = to test phylogenetic repulsion or not # Model 1 (Eq. 1) z <- pglmm(freq ~ sp + (1|site) + (1|sp__@site), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) # Model 2 (Eq. 2) z <- pglmm(freq ~ sp + X + (1|site) + (X|sp__), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) # Model 3 (Eq. 3) z <- pglmm(freq ~ sp*X + (1|site) + (1|sp__@site), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) ## Model structure from Rafferty & Ives (2013) (Eq. 3) # dat = data set # phyPol = phylogeny for pollinators (pol) # phyPlt = phylogeny for plants (plt) z <- pglmm(freq ~ pol * X + (1|pol__) + (1|plt__) + (1|pol__@plt) + (1|pol@plt__) + (1|pol__@plt__), data = dat, family = "binomial", cov_ranef = list(pol = phyPol, plt = phyPlt), REML = TRUE, verbose = TRUE, s2.init = .1) } ##################################################### ### Detailed analysis showing covariance matrices ### ##################################################### # This is the example from section 4.3 in Ives, A. R. (2018) Mixed # and phylogenetic models: a conceptual introduction to correlated data. library(ape) library(mvtnorm) # Investigating covariance matrices for different types of model structure nspp <- 6 nsite <- 4 # Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3) phy <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3) # Simulate species means sd.sp <- 1 mean.sp <- rTraitCont(phy, model = "BM", sigma=sd.sp^2) # Replicate values of mean.sp over sites Y.sp <- rep(mean.sp, times=nsite) # Simulate site means sd.site <- 1 mean.site <- rnorm(nsite, sd=sd.site) # Replicate values of mean.site over sp Y.site <- rep(mean.site, each=nspp) # Compute a covariance matrix for phylogenetic attraction sd.attract <- 1 Vphy <- vcv(phy) # Standardize the phylogenetic covariance matrix to have determinant = 1. # (For an explanation of this standardization, see subsection 4.3.1 in Ives (2018)) Vphy <- Vphy/(det(Vphy)^(1/nspp)) # Construct the overall covariance matrix for phylogenetic attraction. # (For an explanation of Kronecker products, see subsection 4.3.1 in the book) V <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy) Y.attract <- array(t(rmvnorm(n = 1, sigma = sd.attract^2*V))) # Simulate residual errors sd.e <- 1 Y.e <- rnorm(nspp*nsite, sd = sd.e) # Construct the dataset d <- data.frame(sp = rep(phy$tip.label, times = nsite), site = rep(1:nsite, each = nspp)) # Simulate abundance data d$Y <- Y.sp + Y.site + Y.attract + Y.e # Analyze the model pglmm(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, cov_ranef = list(sp = phy)) # Display random effects: the function `pglmm_plot_ranef()` does what # the name implies. You can set `show.image = TRUE` and `show.sim.image = TRUE` # to see the matrices and simulations. re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, cov_ranef = list(sp = phy), show.image = FALSE, show.sim.image = FALSE) ################################################# ### Example of a bipartite phylogenetic model ### ################################################# # Investigating covariance matrices for different types of model structure nspp <- 20 nsite <- 15 # Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3) phy.sp <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3) phy.site <- compute.brlen(rtree(n = nsite), method = "Grafen", power = 1.3) # Simulate species means mean.sp <- rTraitCont(phy.sp, model = "BM", sigma = 1) # Replicate values of mean.sp over sites Y.sp <- rep(mean.sp, times = nsite) # Simulate site means mean.site <- rTraitCont(phy.site, model = "BM", sigma = 1) # Replicate values of mean.site over sp Y.site <- rep(mean.site, each = nspp) # Generate covariance matrix for phylogenetic attraction among species sd.sp.attract <- 1 Vphy.sp <- vcv(phy.sp) Vphy.sp <- Vphy.sp/(det(Vphy.sp)^(1/nspp)) V.sp <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy.sp) Y.sp.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.attract^2*V.sp))) # Generate covariance matrix for phylogenetic attraction among sites sd.site.attract <- 1 Vphy.site <- vcv(phy.site) Vphy.site <- Vphy.site/(det(Vphy.site)^(1/nsite)) V.site <- kronecker(Vphy.site, diag(nrow = nspp, ncol = nspp)) Y.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.site.attract^2*V.site))) # Generate covariance matrix for phylogenetic attraction of species:site interaction sd.sp.site.attract <- 1 V.sp.site <- kronecker(Vphy.site, Vphy.sp) Y.sp.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.site.attract^2*V.sp.site))) # Simulate residual error sd.e <- 0.5 Y.e <- rnorm(nspp*nsite, sd = sd.e) # Construct the dataset d <- data.frame(sp = rep(phy.sp$tip.label, times = nsite), site = rep(phy.site$tip.label, each = nspp)) # Simulate abundance data d$Y <- Y.sp + Y.site + Y.sp.attract + Y.site.attract + Y.sp.site.attract + Y.e # Plot random effects covariance matrices and then add phylogenies # Note that, if show.image and show.sim are not specified, pglmm_plot_ranef() shows # the covariance matrices if nspp * nsite < 200 and shows simulations # if nspp * nsite > 100 re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__) + (1|sp__@site__), data=d, cov_ranef = list(sp = phy.sp, site = phy.site)) # This flips the phylogeny to match to covariance matrices rot.phy.site <- phy.site for(i in (nsite+1):(nsite+Nnode(phy.site))) rot.phy.site <- rotate(rot.phy.site, node = i) plot(phy.sp, main = "Species", direction = "upward") plot(rot.phy.site, main = "Site") # Analyze the simulated data and compute a P-value for the (1|sp__@site__) # random effect using a LRT. It is often better to fit the reduced model before # the full model, because it s numerically easier to fit the reduced model, # and then the parameter estimates from the reduced model can be given to the # full model. In this case, I have used the estimates of the random effects # from the reduce model, mod.r$ss, as the initial estimates for the same # parameters in the full model in the statement s2.init=c(mod.r$ss, 0.01)^2. # The final 0.01 is for the last random effect in the full model, (1|sp__@site__). # Note also that the output of the random effects from communityPGLMM(), mod.r$ss, # are the standard deviations, so they have to be squared for use as initial # values of variances in mod.f. mod.r <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__), data = d, cov_ranef = list(sp = phy.sp, site = phy.site)) mod.f <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__) + (1|sp__@site__), data = d, cov_ranef = list(sp = phy.sp, site = phy.site), s2.init = c(mod.r$ss, 0.01)^2) mod.f pvalue <- pchisq(2*(mod.f$logLik - mod.r$logLik), df = 1, lower.tail = FALSE) pvalue
pglmm_compare
performs linear regression for Gaussian, binomial and Poisson
phylogenetic data, estimating regression coefficients with approximate standard
errors. It simultaneously estimates the strength of phylogenetic signal in the
residuals and gives an approximate conditional likelihood ratio test for the
hypothesis that there is no signal. Therefore, when applied without
predictor (independent) variables, it gives a test for phylogenetic signal.
pglmm_compare
is a wrapper for pglmm
tailored for comparative data in
which each value of the response (dependent) variable corresponds to a single tip
on a phylogenetic tree. If there are multiple measures for each species, pglmm
will be helpful.
pglmm_compare( formula, family = "gaussian", data = list(), phy, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = FALSE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, s2.init = 1, B.init = NULL )
pglmm_compare( formula, family = "gaussian", data = list(), phy, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = FALSE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, s2.init = 1, B.init = NULL )
formula |
A two-sided linear formula object describing the
fixed-effects of the model; for example, Y ~ X. Binomial data can be either
presence/absence, or a two-column array of 'successes' and 'failures'.
For both binomial and Poisson data, we add an observation-level
random term by default via |
family |
Either "gaussian" for a Linear Mixed Model, or
"binomial" or "poisson" for Generalized Linear Mixed Models.
|
data |
A data frame containing the variables named in formula. It must has the tip labels of the phylogeny as row names; if they are not in the same order, the data frame will be arranged so that row names match the order of tip labels. |
phy |
A phylogenetic tree as an object of class "phylo". |
REML |
Whether REML or ML is used for model fitting the random effects. Ignored if
|
optimizer |
nelder-mead-nlopt (default), bobyqa, Nelder-Mead, or subplex.
Nelder-Mead is from the stats package and the other optimizers are from the nloptr package.
Ignored if |
add.obs.re |
Whether to add observation-level random term for binomial and Poisson
families. Normally it would be a good idea to add this to account for overdispersion,
so |
verbose |
If |
cpp |
Whether to use C++ function for optim. Default is TRUE. Ignored if |
bayes |
Whether to fit a Bayesian version of the PGLMM using |
reltol |
A control parameter dictating the relative tolerance
for convergence in the optimization; see |
maxit |
A control parameter dictating the maximum number of
iterations in the optimization; see |
tol.pql |
A control parameter dictating the tolerance for
convergence in the PQL estimates of the mean components of the
GLMM. Ignored if |
maxit.pql |
A control parameter dictating the maximum number
of iterations in the PQL estimates of the mean components of the
GLMM. Ignored if |
marginal.summ |
Summary statistic to use for the estimate of coefficients when
doing a Bayesian PGLMM (when |
calc.DIC |
Should the Deviance Information Criterion be calculated and returned,
when doing a Bayesian PGLMM? Ignored if |
prior |
Which type of default prior should be used by |
prior_alpha |
Only used if |
prior_mu |
Only used if |
ML.init |
Only relevant if |
s2.init |
An array of initial estimates of s2. If s2.init is not provided for
|
B.init |
Initial estimates of |
pglmm_compare
in the package phyr
is similar to binaryPGLMM
in
the package ape
, although it has much broader functionality, including
accepting more than just binary data, implementing Bayesian analyses, etc.
For non-Gaussian data, the function estimates parameters for the model
where is a covariance matrix derived from a phylogeny
(typically under the assumption of Brownian motion evolution). Although
mathematically there is no requirement for
to be ultrametric,
forcing
into ultrametric form can aide in the interpretation of the
model. This is especially true for binary data, because in regression for
binary dependent variables, only the off-diagonal elements (i.e., covariances)
of matrix
are biologically meaningful (see Ives & Garland 2014).
The function converts a phylo tree object into a covariance matrix,
and further standardizes this matrix to have determinant = 1. This in effect
standardizes the interpretation of the scalar
s2
. Although mathematically
not required, it is a very good idea to standardize the predictor
(independent) variables to have mean 0 and variance 1. This will make the
function more robust and improve the interpretation of the regression
coefficients.
For Gaussian data, the function estimates parameters for the model
where gives the non-phylogenetic residual variance. Note that this
is equivalent to a model with Pagel's lambda transformation.
An object (list) of class pglmm_compare
with the following elements:
formula |
the formula for fixed effects |
formula_original |
the formula for both fixed effects and random effects |
data |
the dataset |
family |
either |
B |
estimates of the regression coefficients |
B.se |
approximate standard errors of the fixed effects regression coefficients.
This is set to NULL if |
B.ci |
approximate bayesian credible interval of the fixed effects regression coefficients.
This is set to NULL if |
B.cov |
approximate covariance matrix for the fixed effects regression coefficients |
B.zscore |
approximate Z scores for the fixed effects regression coefficients. This is set to NULL if |
B.pvalue |
approximate tests for the fixed effects regression coefficients being different from zero. This is set to NULL if |
ss |
random effects' standard deviations for the covariance matrix |
s2r |
random effects variances for non-nested random effects |
s2n |
random effects variances for nested random effects |
s2resid |
for linear mixed models, the residual variance |
s2r.ci |
Bayesian credible interval for random effects variances for non-nested random effects.
This is set to NULL if |
s2n.ci |
Bayesian credible interval for random effects variances for nested random effects.
This is set to NULL if |
s2resid.ci |
Bayesian credible interval for linear mixed models, the residual variance.
This is set to NULL if |
logLik |
for linear mixed models, the log-likelihood for either the restricted likelihood ( |
AIC |
for linear mixed models, the AIC for either the restricted likelihood ( |
BIC |
for linear mixed models, the BIC for either the restricted likelihood ( |
DIC |
for bayesian PGLMM, this is the Deviance Information Criterion metric of model fit. This is set to NULL if |
REML |
whether or not REML is used ( |
bayes |
whether or not a Bayesian model was fit. |
marginal.summ |
The specified summary statistic used to summarise the Bayesian marginal distributions.
Only present if |
s2.init |
the user-provided initial estimates of |
B.init |
the user-provided initial estimates of |
Y |
the response (dependent) variable returned in matrix form |
X |
the predictor (independent) variables returned in matrix form (including 1s in the first column) |
H |
the residuals. For linear mixed models, this does not account for random terms,
To get residuals after accounting for both fixed and random terms, use |
iV |
the inverse of the covariance matrix. This is NULL if |
mu |
predicted mean values for the generalized linear mixed model (i.e. similar to |
Zt |
the design matrix for random effects. This is set to NULL if |
St |
diagonal matrix that maps the random effects variances onto the design matrix |
convcode |
the convergence code provided by |
niter |
number of iterations performed by |
inla.model |
Model object fit by underlying |
Anthony R. Ives
Ives, A. R. and Helmus, M. R. (2011) Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs, 81, 511–525.
Ives, A. R. and Garland, T., Jr. (2014) Phylogenetic regression for binary dependent variables. Pages 231–261 in L. Z. Garamszegi, editor. Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer-Verlag, Berlin Heidelberg.
pglmm
; package ape and its function binaryPGLMM
;
package phylolm and its function phyloglm
; package MCMCglmm
## Illustration of `pglmm_compare` with simulated data # Generate random phylogeny library(ape) n <- 100 phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1) # Generate random data and standardize to have mean 0 and variance 1 X1 <- rTraitCont(phy, model = "BM", sigma = 1) X1 <- (X1 - mean(X1))/var(X1) # Simulate binary Y sim.dat <- data.frame(Y = array(0, dim = n), X1 = X1, row.names = phy$tip.label) sim.dat$Y <- ape::binaryPGLMM.sim(Y ~ X1, phy = phy, data=sim.dat, s2 = 1, B = matrix(c(0, .25), nrow = 2, ncol = 1), nrep = 1)$Y # Fit model pglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat) # Compare with `binaryPGLMM` ape::binaryPGLMM(Y ~ X1, phy = phy, data = sim.dat) # Compare with `phyloglm` summary(phylolm::phyloglm(Y ~ X1, phy = phy, data = sim.dat)) # Compare with `glm` that does not account for phylogeny summary(glm(Y ~ X1, data = sim.dat, family = "binomial")) # Compare with logistf() that does not account # for phylogeny but is less biased than glm() logistf::logistf(Y ~ X1, data = sim.dat) ## Fit model with bayes = TRUE # pglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat, # bayes = TRUE, calc.DIC = TRUE) # Compare with `MCMCglmm` V <- vcv(phy) V <- V/max(V) detV <- exp(determinant(V)$modulus[1]) V <- V/detV^(1/n) invV <- Matrix::Matrix(solve(V),sparse = TRUE) sim.dat$species <- phy$tip.label rownames(invV) <- sim.dat$species nitt <- 43000 thin <- 10 burnin <- 3000 prior <- list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1000, alpha.mu=0, alpha.V=1))) # commented out to save time # summary(MCMCglmm::MCMCglmm(Y ~ X1, random = ~species, ginvers = list(species = invV), # data = sim.dat, slice = TRUE, nitt = nitt, thin = thin, burnin = burnin, # family = "categorical", prior = prior, verbose = FALSE))
## Illustration of `pglmm_compare` with simulated data # Generate random phylogeny library(ape) n <- 100 phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1) # Generate random data and standardize to have mean 0 and variance 1 X1 <- rTraitCont(phy, model = "BM", sigma = 1) X1 <- (X1 - mean(X1))/var(X1) # Simulate binary Y sim.dat <- data.frame(Y = array(0, dim = n), X1 = X1, row.names = phy$tip.label) sim.dat$Y <- ape::binaryPGLMM.sim(Y ~ X1, phy = phy, data=sim.dat, s2 = 1, B = matrix(c(0, .25), nrow = 2, ncol = 1), nrep = 1)$Y # Fit model pglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat) # Compare with `binaryPGLMM` ape::binaryPGLMM(Y ~ X1, phy = phy, data = sim.dat) # Compare with `phyloglm` summary(phylolm::phyloglm(Y ~ X1, phy = phy, data = sim.dat)) # Compare with `glm` that does not account for phylogeny summary(glm(Y ~ X1, data = sim.dat, family = "binomial")) # Compare with logistf() that does not account # for phylogeny but is less biased than glm() logistf::logistf(Y ~ X1, data = sim.dat) ## Fit model with bayes = TRUE # pglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat, # bayes = TRUE, calc.DIC = TRUE) # Compare with `MCMCglmm` V <- vcv(phy) V <- V/max(V) detV <- exp(determinant(V)$modulus[1]) V <- V/detV^(1/n) invV <- Matrix::Matrix(solve(V),sparse = TRUE) sim.dat$species <- phy$tip.label rownames(invV) <- sim.dat$species nitt <- 43000 thin <- 10 burnin <- 3000 prior <- list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1000, alpha.mu=0, alpha.V=1))) # commented out to save time # summary(MCMCglmm::MCMCglmm(Y ~ X1, random = ~species, ginvers = list(species = invV), # data = sim.dat, slice = TRUE, nitt = nitt, thin = thin, burnin = burnin, # family = "categorical", prior = prior, verbose = FALSE))
pglmm_matrix_structure
produces the entire
covariance matrix structure (V) when you specify random effects.pglmm_matrix_structure
produces the entire
covariance matrix structure (V) when you specify random effects.
pglmm_matrix_structure( formula, data = list(), family = "binomial", cov_ranef, repulsion = FALSE, ss = 1, cpp = TRUE ) communityPGLMM.matrix.structure( formula, data = list(), family = "binomial", cov_ranef, repulsion = FALSE, ss = 1, cpp = TRUE )
pglmm_matrix_structure( formula, data = list(), family = "binomial", cov_ranef, repulsion = FALSE, ss = 1, cpp = TRUE ) communityPGLMM.matrix.structure( formula, data = list(), family = "binomial", cov_ranef, repulsion = FALSE, ss = 1, cpp = TRUE )
formula |
A two-sided linear formula object describing the mixed effects of the model. To specify that a random term should have phylogenetic covariance matrix along
with non-phylogenetic one, add Note that correlated random terms are not allowed. For example,
|
data |
A |
family |
Either "gaussian" for a Linear Mixed Model, or
"binomial" or "poisson" for Generalized Linear Mixed Models.
"family" should be specified as a character string (i.e., quoted). For binomial and
Poisson data, we use the canonical logit and log link functions, respectively.
Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.
For both binomial and Poisson data, we add an observation-level
random term by default via |
cov_ranef |
A named list of covariance matrices of random terms. The names should be the
group variables that are used as random terms with specified covariance matrices
(without the two underscores, e.g. |
repulsion |
When there are nested random terms specified, |
ss |
Which of the |
cpp |
Whether to use C++ function for optim. Default is TRUE. Ignored if |
A design matrix.
Plot variance-cov matrix of random terms; also it is optional to simulate and
visualize data based on these var-cov matrices. The input can be a communityPGLMM
model (by setting argument x
). If no model has been fitted, you can also specify
data, formula, and family, etc. without actually fitting the model, which will
save time.
pglmm_plot_ranef( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... ) communityPGLMM.show.re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... ) pglmm_plot_re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... ) communityPGLMM.plot.re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... )
pglmm_plot_ranef( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... ) communityPGLMM.show.re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... ) pglmm_plot_re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... ) communityPGLMM.plot.re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ... )
formula |
A two-sided linear formula object describing the mixed effects of the model. To specify that a random term should have phylogenetic covariance matrix along
with non-phylogenetic one, add Note that correlated random terms are not allowed. For example,
|
data |
A |
family |
Either "gaussian" for a Linear Mixed Model, or
"binomial" or "poisson" for Generalized Linear Mixed Models.
"family" should be specified as a character string (i.e., quoted). For binomial and
Poisson data, we use the canonical logit and log link functions, respectively.
Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.
For both binomial and Poisson data, we add an observation-level
random term by default via |
sp.var |
The variable name of "species"; y-axis of the image. |
site.var |
The variable name of "site"; x-axis of the image. |
tree |
A phylogeny for column sp, with "phylo" class, or a covariance matrix for sp.
Make sure to have all species in the matrix; if the matrix is not standardized,
(i.e., det(tree) != 1), |
tree_site |
A second phylogeny for "site". This is required only if the site column contains species instead of sites. This can be used for bipartitie questions; tree_site can also be a covariance matrix. Make sure to have all sites in the matrix; if the matrix is not standardized (i.e., det(tree_site) != 1), pglmm' will try to standardize it for you. No longer used: keep here for compatibility. |
repulsion |
When there are nested random terms specified, |
x |
A fitted model with class communityPGLMM. |
show.image |
Whether to show the images of random effects. |
show.sim.image |
Whether to show the images of simulated site by sp matrix. This can be useful to see how the phylogenetic information were included. |
random.effects |
Optional pre-build list of random effects. If |
add.tree.sp |
Whether to add a phylogeny of species at the top of the simulated site by sp matrix plot, default is TRUE. |
add.tree.site |
Whether to add a phylogeny of sites at the right of the simulated site by sp matrix plot, default is FALSE. |
cov_ranef |
A named list of covariance matrices of random terms. The names should be the
group variables that are used as random terms with specified covariance matrices
(without the two underscores, e.g. |
tree.panel.space |
The number of lines between the phylogeny and the matrix plot, if add.tree is TRUE. |
title.space |
The number of lines between the title and the matrix plot, if add.tree is TRUE. |
tree.size |
The height of the phylogeny to be plotted (number of lines), if add.tree is TRUE. |
... |
Additional arguments for
|
A hidden list, including the covariance matrices and simulated site by species matrices.
Individual plots are saved as plt_re_list
and plt_sim_list
. If show.image
or
show.sim.image
is TRUE, the corresponding final plot (plt_re_all_in_one
or
plt_sim_all_in_one
) can be saved as external file using ggplot2::ggsave
as
it is a grid object.
pglmm_predicted_values
calculates the predicted
values of Y; for the generalized linear mixed model (family %in%
c("binomial","poisson"), these values are in the transformed space.
pglmm_predicted_values( x, cpp = TRUE, gaussian.pred = c("nearest_node", "tip_rm"), re.form = NULL, type = c("link", "response"), ... ) communityPGLMM.predicted.values( x, cpp = TRUE, gaussian.pred = c("nearest_node", "tip_rm") )
pglmm_predicted_values( x, cpp = TRUE, gaussian.pred = c("nearest_node", "tip_rm"), re.form = NULL, type = c("link", "response"), ... ) communityPGLMM.predicted.values( x, cpp = TRUE, gaussian.pred = c("nearest_node", "tip_rm") )
x |
A fitted model with class communityPGLMM. |
cpp |
Whether to use c++ code. Default is TRUE. |
gaussian.pred |
When family is gaussian, which type of prediction to calculate? Option nearest_node will predict values to the nearest node, which is same as lme4::predict or fitted. Option tip_rm will remove the point then predict the value of this point with remaining ones. |
re.form |
(formula, |
type |
character string - either |
... |
Optional additional parameters. None are used at present. |
A data frame with column Y_hat (predicted values accounting for both fixed and random terms).
pglmm_profile_LRT
tests statistical significance of the
phylogenetic random effect using profile likelihoods when bayes = F.
The resulting p-values are conditional on the fixed
estimates of the other parameters, in contrast to a standard
likelihood ratio test.
pglmm_profile_LRT(x, re.number = 0, cpp = TRUE) communityPGLMM.profile.LRT(x, re.number = 0, cpp = TRUE)
pglmm_profile_LRT(x, re.number = 0, cpp = TRUE) communityPGLMM.profile.LRT(x, re.number = 0, cpp = TRUE)
x |
A fitted model with class communityPGLMM with |
re.number |
Which random term to test? Can be a vector with length >1. |
cpp |
Whether to use C++ function for optim. Default is TRUE. Ignored if |
A list of likelihood, df, and p-value.
A phylogeny with more species than the community data.
phylotree
phylotree
Newick format.
plot_bayes generic
plot_bayes(x, ...)
plot_bayes(x, ...)
x |
A communityPGLMM object fit with |
... |
Further arguments to pass to or from other methods. |
A ggplot object
Plots a representation of the marginal posterior distribution of model parameters. Note this
function requires the packages ggplot2
and ggridges
to be installed.
plot_data( x, sp.var = "sp", site.var = "site", show.sp.names = FALSE, show.site.names = FALSE, digits = max(3, getOption("digits") - 3), predicted = FALSE, ... ) ## S3 method for class 'communityPGLMM' plot_bayes(x, n_samp = 1000, sort = TRUE, ...)
plot_data( x, sp.var = "sp", site.var = "site", show.sp.names = FALSE, show.site.names = FALSE, digits = max(3, getOption("digits") - 3), predicted = FALSE, ... ) ## S3 method for class 'communityPGLMM' plot_bayes(x, n_samp = 1000, sort = TRUE, ...)
x |
A communityPGLMM object fit with |
sp.var |
The variable name of "species"; y-axis of the image. |
site.var |
The variable name of "site"; x-axis of the image. |
show.sp.names |
Whether to print species names as y-axis labels. |
show.site.names |
Whether to print site names as x-axis labels. |
digits |
Not used. |
predicted |
Whether to plot predicted values side by side with observed ones. |
... |
Further arguments to pass to or from other methods. |
n_samp |
Number of sample from the marginal posterior to take in order to estimate the posterior density. |
sort |
Whether to plot different terms in the order of their estimations. Default is 'TRUE'. |
A ggplot object
The underlying plot grid object is returned but invisible. It can be saved for later uses.
Predict Function for communityPGLMM Model Objects
## S3 method for class 'communityPGLMM' predict(object, newdata = NULL, ...)
## S3 method for class 'communityPGLMM' predict(object, newdata = NULL, ...)
object |
Object of class inheriting from |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
... |
further arguments passed to or from other methods. |
The form of the value returned by predict
depends on the
class of its argument. See the documentation of the
particular methods for details of what is produced by that method.
pglmm
This function is mainly used within pglmm
but can also be used independently to
prepare a list of random effects, which then can be updated by users for more complex models.
prep_dat_pglmm( formula, data, cov_ranef = NULL, repulsion = FALSE, prep.re.effects = TRUE, family = "gaussian", add.obs.re = TRUE, bayes = FALSE, bayes_nested_matrix_as_list = FALSE )
prep_dat_pglmm( formula, data, cov_ranef = NULL, repulsion = FALSE, prep.re.effects = TRUE, family = "gaussian", add.obs.re = TRUE, bayes = FALSE, bayes_nested_matrix_as_list = FALSE )
formula |
A two-sided linear formula object describing the mixed effects of the model. To specify that a random term should have phylogenetic covariance matrix along
with non-phylogenetic one, add Note that correlated random terms are not allowed. For example,
|
data |
A |
cov_ranef |
A named list of covariance matrices of random terms. The names should be the
group variables that are used as random terms with specified covariance matrices
(without the two underscores, e.g. |
repulsion |
When there are nested random terms specified, |
prep.re.effects |
Whether to prepare random effects for users. |
family |
Either "gaussian" for a Linear Mixed Model, or
"binomial" or "poisson" for Generalized Linear Mixed Models.
"family" should be specified as a character string (i.e., quoted). For binomial and
Poisson data, we use the canonical logit and log link functions, respectively.
Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.
For both binomial and Poisson data, we add an observation-level
random term by default via |
add.obs.re |
Whether to add an observation-level random term for binomial or Poisson
distributions. Normally it would be a good idea to add this to account for overdispersion,
so |
bayes |
Whether to fit a Bayesian version of the PGLMM using |
bayes_nested_matrix_as_list |
For |
A list with updated formula, random.effects, and updated cov_ranef.
Print summary information of fitted model
## S3 method for class 'communityPGLMM' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'communityPGLMM' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
A fitted communityPGLMM model. |
digits |
Minimal number of significant digits for printing, as in |
... |
Additional arguments, currently ignored. |
Print summary information of fitted model
## S3 method for class 'pglmm_compare' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'pglmm_compare' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
A fitted pglmm_compare. |
digits |
Minimal number of significant digits for printing, as in |
... |
Additional arguments, currently ignored. |
Calculate the bounded phylogenetic biodiversity metrics: phylogenetic species variability, richness, evenness and clustering for one or multiple communities.
psv( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE ) psr( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE ) pse(comm, tree, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE) psc(comm, tree, scale.vcv = TRUE, prune.tree = FALSE) psv.spp(comm, tree, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE) psd( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE )
psv( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE ) psr( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE ) pse(comm, tree, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE) psc(comm, tree, scale.vcv = TRUE, prune.tree = FALSE) psv.spp(comm, tree, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE) psd( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE )
comm |
Community data matrix, site as rows and species as columns, site names as row names. |
tree |
A phylo tree object with class "phylo" or a phylogenetic covariance matrix. |
compute.var |
Logical, default is TRUE, computes the expected variances for PSV and PSR for each community. |
scale.vcv |
Logical, default is TRUE, scale the phylogenetic covariance matrix to bound the metric between 0 and 1 (i.e. correlations). |
prune.tree |
Logical, default is FALSE, prune the phylogeny before converting to var-cov matrix? Pruning and then converting VS converting then subsetting may have different var-cov matrix resulted. |
cpp |
Logical, default is TRUE, whether to use cpp for internal calculations. |
Phylogenetic species variability (PSV) quantifies how
phylogenetic relatedness decreases the variance of a hypothetical
unselected/neutral trait shared by all species in a community.
The expected value of PSV is statistically independent of species richness,
is one when all species in a community are unrelated (i.e., a star phylogeny)
and approaches zero as species become more related. PSV is directly related
to mean phylogenetic distance, except except calculated on a scaled phylogenetic
covariance matrix. The expected variance around PSV for any community of a particular
species richness can be approximated. To address how individual species
contribute to the mean PSV of a data set, the function psv.spp
gives
signed proportions of the total deviation from the mean PSV that occurs when
all species are removed from the data set one at a time. The absolute values
of these “species effects” tend to positively correlate with species prevalence.
Returns a dataframe of the respective phylogenetic species diversity metric values
These metrics are bounded either between zero and one (PSV, PSE, PSC) or zero and species richness (PSR); but the metrics asymptotically approach zero as relatedness increases. Zero can be assigned to communities with less than two species, but conclusions drawn from assigning communities zero values need be carefully explored for any data set. The data sets need not be species-community data sets but may be any community data set with an associated phylogeny.
Matthew Helmus [email protected]
Helmus M.R., Bland T.J., Williams C.K. & Ives A.R. 2007. Phylogenetic measures of biodiversity. American Naturalist, 169, E68-E83
psv(comm = comm_a, tree = phylotree)
psv(comm = comm_a, tree = phylotree)
Extract the random-effects estimates
## S3 method for class 'communityPGLMM' ranef(object, ...)
## S3 method for class 'communityPGLMM' ranef(object, ...)
object |
A fitted model with class communityPGLMM. |
... |
Ignored. |
Extract the estimates of the random-effects parameters from a fitted model.
A dataframe of random-effects estimates.
cor_phylo
This function is to be called on a cor_phylo
object if one or more bootstrap
replicates fail to converge.
It allows the user to change parameters for the optimizer to get it to converge.
One or more of the resulting cp_refits
object(s) can be supplied to
boot_ci
along with the original cor_phylo
object to calculate confidence
intervals from only bootstrap replicates that converged.
refit_boots(cp_obj, inds = NULL, ...) ## S3 method for class 'cp_refits' print(x, digits = max(3, getOption("digits") - 3), ...)
refit_boots(cp_obj, inds = NULL, ...) ## S3 method for class 'cp_refits' print(x, digits = max(3, getOption("digits") - 3), ...)
cp_obj |
The original |
inds |
Vector of indices indicating the bootstraps you want to refit.
This is useful if you want to try refitting only a portion of bootstrap
replicates.
By passing |
... |
Arguments that should be changed from the original call to |
x |
an object of class |
digits |
the number of digits to be printed. |
A cp_refits
object, which is a list of cor_phylo
objects
corresponding to each matrix in <original cor_phylo object>$bootstrap$mats
.
print(cp_refits)
: prints cp_refits
objects
Getting different types of residuals for communityPGLMM objects.
## S3 method for class 'communityPGLMM' residuals( object, type = if (object$family %in% c("binomial", "poisson")) "deviance" else "response", scaled = FALSE, ... )
## S3 method for class 'communityPGLMM' residuals( object, type = if (object$family %in% c("binomial", "poisson")) "deviance" else "response", scaled = FALSE, ... )
object |
A fitted model with class communityPGLMM. |
type |
Type of residuals, currently only "response" for gaussian pglmm; "deviance" (default) and "response" for binomial and poisson pglmm. |
scaled |
Scale residuals by residual standard deviation for gaussian pglmm. |
... |
Additional arguments, ignored for method compatibility. |
A vector of residuals.
This function will remove site that has no observations in a site by species data frame.
rm_site_noobs(df, warn = FALSE)
rm_site_noobs(df, warn = FALSE)
df |
A data frame in wide form, i.e. site by species data frame, with site names as row name. |
warn |
Whether to warn when any site has no species? Default is |
A site by species data frame.
Daijiang Li
Remove species that not observed in any site
rm_sp_noobs(df, warn = FALSE)
rm_sp_noobs(df, warn = FALSE)
df |
A data frame in wide form, i.e. site by species data frame, with site names as row name. |
warn |
Whether to warn when any species does not occur in at least one site? Default is |
A site by species data frame.
Daijiang Li
This function will remove species that has no observations in any site.
Simulate from a communityPGLMM object
## S3 method for class 'communityPGLMM' simulate( object, nsim = 1, seed = NULL, re.form = NULL, ntry = 5, full = FALSE, ... )
## S3 method for class 'communityPGLMM' simulate( object, nsim = 1, seed = NULL, re.form = NULL, ntry = 5, full = FALSE, ... )
object |
A fitted model object with class 'communityPGLMM'. |
nsim |
positive integer scalar - the number of responses to simulate. |
seed |
an optional seed to be used in |
re.form |
(formula, |
ntry |
Number of times to retry simulation in the case of |
full |
If |
... |
optional additional arguments (none are used in
|
Summary information of fitted model
## S3 method for class 'communityPGLMM' summary(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'communityPGLMM' summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
A fitted model with class communityPGLMM. |
digits |
Minimal number of significant digits for printing, as in |
... |
Additional arguments, currently ignored. |
Summary information of fitted pglmm_compare model
## S3 method for class 'pglmm_compare' summary(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'pglmm_compare' summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
A fitted model with class pglmm_compare. |
digits |
Minimal number of significant digits for printing, as in |
... |
Additional arguments, currently ignored. |
A data frame of species functional traits.
traits
traits
A data frame with 18 species and 3 variables: sla, height, and seed dispersal mode.
This function will convert a phylogeny to a Var-cov matrix.
vcv2(phy, corr = FALSE)
vcv2(phy, corr = FALSE)
phy |
A phylogeny with "phylo" as class. |
corr |
Whether to return a correlation matrix instead of Var-cov matrix. Default is FALSE. |
A phylogenetic var-cov matrix.