Title: | Deriving Phylogenies from Synthesis Trees |
---|---|
Description: | To facilitate generating phylogenies from synthesis trees. |
Authors: | Daijiang Li [aut, cre] |
Maintainer: | Daijiang Li |
License: | GPL-3 |
Version: | 1.0.3 |
Built: | 2025-02-14 02:56:26 UTC |
Source: | https://github.com/daijiang/rtrees |
Based on the classification of tips, find the basal and root nodes of each genus and each family. This information can later be used to graft new tips onto the phylogeny. This function can also be used to process a user-provided tree.
add_root_info( tree, classification, process_all_tips = TRUE, genus_list = NULL, family_list = NULL, show_warning = FALSE )
Argument | Description |
---|---|
tree | A phylogeny of class "phylo". |
classification | A data frame with 2 columns, genus and family. It should include every genus that the tips of the tree belong to. |
process_all_tips | Whether to find basal nodes for all tips. Default is TRUE. |
genus_list | An optional subset of genera for which to find root information. |
family_list | An optional subset of families for which to find root information. This should be used for species that have no congeners in the tree. |
show_warning | Whether to print warnings about non-monophyletic clades. Default is FALSE. |
A phylogeny with basal-node information attached.
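The basal node of a genus is, in effect, the most recent common ancestor (MRCA) of its tips. The base-R sketch below illustrates that idea on a hand-built four-tip tree laid out like ape's "phylo" objects. It assumes tip labels of the form Genus_species, and basal_node_by_genus() is a hypothetical helper, not how add_root_info() is implemented (it ignores families and non-monophyletic genera, for example). With ape installed, ape::getMRCA() performs the same MRCA lookup.

```r
# Toy tree in ape's "phylo" layout, built by hand so no packages are needed.
# Tips are 1-4; internal nodes are 5 (root), 6 (Quercus clade), 7 (Pinus clade).
toy_tree <- list(
  edge = matrix(c(5L, 6L,
                  6L, 1L,
                  6L, 2L,
                  5L, 7L,
                  7L, 3L,
                  7L, 4L), ncol = 2, byrow = TRUE),
  tip.label = c("Quercus_alba", "Quercus_rubra", "Pinus_taeda", "Pinus_strobus"),
  Nnode = 3L
)

# Nodes from `node` up to the root, starting with `node` itself.
path_to_root <- function(edge, node) {
  path <- node
  repeat {
    parent <- edge[edge[, 2] == node, 1]
    if (length(parent) == 0) break  # the root has no parent
    node <- parent
    path <- c(path, node)
  }
  path
}

# MRCA: the first node on one tip's root path that all other root paths share.
mrca_of_tips <- function(edge, tips) {
  paths <- lapply(tips, function(tip) path_to_root(edge, tip))
  Reduce(intersect, paths)[1]
}

# Hypothetical helper: basal node of each genus, taken as the MRCA of its tips.
# Assumes tip labels look like "Genus_species".
basal_node_by_genus <- function(tree) {
  genus <- sub("_.*$", "", tree$tip.label)
  sapply(split(seq_along(genus), genus),
         function(tips) mrca_of_tips(tree$edge, tips))
}

basal_node_by_genus(toy_tree)  # Pinus = 7, Quercus = 6
```

A genus with a single tip simply returns that tip's own node id here; how rtrees treats such genera is not covered by this sketch.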
Graft a tip onto a phylogeny at a specified location.
bind_tip( tree = NULL, where, tip_label, frac = 0.5, new_node_above = FALSE, node_label = NULL, return_tree = TRUE, tree_tbl = NULL, node_heights = NULL, use_castor = TRUE, sequential = TRUE )
Argument | Description |
---|---|
tree | A phylogeny of class "phylo". |
where | Location where the tip is inserted. It can be either a tip label or a node label, but must be a character string. If the location does not have a name, assign one first. |
tip_label | Name of the new tip. |
frac | The fraction of the branch length; must be between 0 and 1. This only applies when the location is a tip or new_node_above = TRUE. Default is 0.5. |
new_node_above | Whether to insert a new node above the location when the location is a node. Default is FALSE. |
node_label | Name of the new node created. This only applies when the location is a tip or new_node_above = TRUE. |
return_tree | Whether to return a phylogeny of class "phylo". Default is TRUE. |
tree_tbl | A tibble version of the tree (optional). |
node_heights | A named numeric vector of node heights of the tree, generated by ape::branching.times(). |
use_castor | Whether to use the castor package. Default is TRUE. |
sequential | Whether to add the tip with sequential node numbers in the edge matrix. Default is TRUE. For example, if we want to bind a tip to a clade whose tips are numbered 101 to 150, we can give the new tip the id 151 and shift every remaining node id up by 1. This requires finding the node ids of all tips that descend from the node where the new tip is bound, which can be time-consuming, and I am still not sure whether it is necessary. Normally, the node ids of the … |
Either a phylogeny or a data frame that can later be converted to a phylogeny.
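To make the sequential option concrete, the sketch below replays its example at a smaller scale: a new tip joins a clade whose tips are numbered 1 and 2, so the new tip takes id 3 and every id above 2 moves up by one. shift_ids() is a hypothetical helper written for illustration, not part of rtrees. The point it shows is that every row of the edge matrix holding a larger id changes, which is why this bookkeeping can be costly on a large tree.

```r
# Hypothetical helper: shift every id greater than `after` up by one,
# freeing the id `after + 1` for a new tip. Works on vectors and matrices.
shift_ids <- function(ids, after) {
  ids[ids > after] <- ids[ids > after] + 1L
  ids
}

# Toy 4-tip tree (tips 1-4, root 5); node 6 is the parent of tips 1 and 2.
edge <- matrix(c(5L, 6L, 6L, 1L, 6L, 2L, 5L, 7L, 7L, 3L, 7L, 4L),
               ncol = 2, byrow = TRUE)
tip_label <- c("Quercus_alba", "Quercus_rubra", "Pinus_taeda", "Pinus_strobus")

# Bind "Quercus_new" to node 6, numbering it right after the clade's last tip (2).
last_tip_in_clade <- 2L
new_edge <- shift_ids(edge, last_tip_in_clade)  # old ids 3..7 become 4..8
target <- shift_ids(6L, last_tip_in_clade)      # node 6 is now node 7
new_edge <- rbind(new_edge, c(target, last_tip_in_clade + 1L))
new_tip_label <- append(tip_label, "Quercus_new", after = last_tip_in_clade)
# The root moves from 5 to 6, i.e. still (number of tips + 1), as ape expects.
```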
## Not run:
library(rtrees)
# Graft "test_sp" onto the node labelled "N70407" of the plant mega-tree
bind_tip(tree_plant_otl, "N70407", tip_label = "test_sp")
# Same graft, reusing a precomputed tibble of the tree and its node heights
tree_plant_otl_df = tidytree::as_tibble(tree_plant_otl)
node_heights = ape::branching.times(tree_plant_otl)
bind_tip(tree_tbl = tree_plant_otl_df, where = "N70407", tip_label = "test_sp",
         node_heights = node_heights)
## End(Not run)
Graft a tip onto a phylogeny at a specified location.
bind_tip_df( tree = NULL, where, tip_label, frac = 0.5, new_node_above = FALSE, node_label = NULL, return_tree = TRUE, tree_tbl = NULL, node_heights = NULL, use_castor = FALSE )
Argument | Description |
---|---|
tree | A phylogeny of class "phylo". |
where | Location where the tip is inserted. It can be either a tip label or a node label, but must be a character string. If the location does not have a name, assign one first. |
tip_label | Name of the new tip. |
frac | The fraction of the branch length; must be between 0 and 1. This only applies when the location is a tip or new_node_above = TRUE. Default is 0.5. |
new_node_above | Whether to insert a new node above the location when the location is a node. Default is FALSE. |
node_label | Name of the new node created. This only applies when the location is a tip or new_node_above = TRUE. |
return_tree | Whether to return a phylogeny of class "phylo". Default is TRUE. |
tree_tbl | A tibble version of the tree (optional). |
node_heights | A named numeric vector of node heights of the tree, generated by ape::branching.times(). |
use_castor | Whether to use the castor package. Default is FALSE. |
Either a phylogeny or a data frame that can later be converted to a phylogeny.
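Because bind_tip_df() can return a data frame, a graft can be expressed as edits to a table of parent/child rows. The base-R sketch below shows one way this can work when where is a tip: the tip's branch is split by a new node, and the new tip hangs from that node. It assumes the parent/node/branch.length/label columns produced by tidytree::as_tibble(), and it assumes frac is the share of the original branch kept below the new node, with the new tip given that same length so the tree stays ultrametric; rtrees may measure frac differently. bind_tip_on_branch() is hypothetical, and its new ids are simply appended, so they would still need renumbering before conversion to "phylo".

```r
# Toy tree as a data frame with the columns used by tidytree::as_tibble():
# parent, node, branch.length, label. Node 3 is the root (its own parent).
tree_df <- data.frame(
  parent = c(3L, 3L, 3L),
  node = c(1L, 2L, 3L),
  branch.length = c(2, 2, NA),
  label = c("Quercus_alba", "Pinus_taeda", "root"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: graft `tip_label` onto the branch above the row
# labelled `where`, splitting that branch at `frac`.
bind_tip_on_branch <- function(df, where, tip_label, frac = 0.5) {
  stopifnot(frac > 0, frac < 1)
  i <- which(df$label == where)
  stopifnot(length(i) == 1)
  len <- df$branch.length[i]
  new_node <- max(df$node) + 1L
  new_tip <- new_node + 1L
  new_rows <- data.frame(
    parent = c(df$parent[i], new_node),
    node = c(new_node, new_tip),
    branch.length = c(len * (1 - frac), len * frac),  # upper part, new tip
    label = c(NA, tip_label),
    stringsAsFactors = FALSE
  )
  df$parent[i] <- new_node          # `where` now hangs from the new node
  df$branch.length[i] <- len * frac # lower part of the split branch
  rbind(df, new_rows)
}

bind_tip_on_branch(tree_df, "Quercus_alba", "Quercus_rubra", frac = 0.25)
```

With frac = 0.25, both Quercus tips end up 1.5 + 0.5 = 2 units from the root, the same depth as Pinus_taeda.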
## Not run:
library(rtrees)
# Graft "test_sp" onto the node labelled "N70407" of the plant mega-tree
bind_tip(tree_plant_otl, "N70407", tip_label = "test_sp")
# Same graft, reusing a precomputed tibble of the tree and its node heights
tree_plant_otl_df = tidytree::as_tibble(tree_plant_otl)
node_heights = ape::branching.times(tree_plant_otl)
bind_tip(tree_tbl = tree_plant_otl_df, where = "N70407", tip_label = "test_sp",
         node_heights = node_heights)
## End(Not run)
Genus and family information for different taxonomic groups.
Plant classification information. Its sources include:
+ based on V.PhyloMaker::nodes.info.1
+ based on The Plant List
+ taxonlookup
+ Plants of the World online
Fish classification information was based on FishBase. There are 4,825 genera in this file: https://fishtreeoflife.org/downloads/PFC_taxonomy.csv.xz
Bee classification information was from the Bee Tree of Life. Note that we used 'Subfamily' in their nomenclature file as "family" here. If a genus's Subfamily is missing, we used its Family.
Bird classification information was based on BirdLife, which yielded 2,391 genera: http://datazone.birdlife.org/species/taxonomy However, the taxonomy file of the Jetz et al. 2012 phylogeny contains 117 additional genera that are not in the BirdLife file. Both are combined here, giving 2,508 genera.
Mammal classification information was based on PHYLACINE, which has 1,400 genera: https://github.com/MegaPast2Future/PHYLACINE_1.2/blob/master/Data/Taxonomy/Synonymy_table_valid_species_only.csv Additional genera from Vertlife were added as well. When PHYLACINE and Vertlife assign the same genus to different families, I used the family from Vertlife, as I found it to be more accurate in most cases.
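Both the bird merge (117 Jetz et al. genera added to the 2,391 BirdLife genera, 2,391 + 117 = 2,508) and the mammal rule (keep Vertlife's family when the two sources disagree) amount to a union of two genus/family tables in which one source takes precedence. A base-R sketch of that rule, using a hypothetical helper and made-up placeholder rows rather than the package's actual data:

```r
# Hypothetical helper: union of two genus/family tables in which `preferred`
# wins whenever both tables list the same genus.
combine_classifications <- function(preferred, other) {
  extra <- other[!other$genus %in% preferred$genus, , drop = FALSE]
  out <- rbind(preferred, extra)
  rownames(out) <- NULL
  out
}

# Made-up placeholder rows, not real taxonomy.
vertlife <- data.frame(genus = c("GenusA", "GenusB"),
                       family = c("FamilyX", "FamilyY"),
                       stringsAsFactors = FALSE)
phylacine <- data.frame(genus = c("GenusB", "GenusC"),
                        family = c("FamilyZ", "FamilyZ"),
                        stringsAsFactors = FALSE)

combined <- combine_classifications(vertlife, phylacine)
# GenusB keeps FamilyY from `vertlife`; GenusC is added from `phylacine`.
```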
Amphibian classification information was from VertLife.
Reptile classification information was largely from Wikipedia.
Shark and Ray classification information was largely from NCBI.
Butterfly classification information was from Kawahara et al. 2023, using the tip labels of their phylogeny.
classifications
A data frame with three columns: genus, family, and taxon (plant, fish, bird, mammal, amphibian, reptile, shark_ray, bee, butterfly).
Extract grafting status information as a data frame
get_graft_status(tree)
Argument | Description |
---|---|
tree | A phylogeny generated by get_tree(). |
A tibble with three columns: tip_label, species, and status.
For a list of species, generate a phylogeny from a provided mega-tree. If a species is not in the mega-tree, it will be grafted onto the mega-tree under one of two scenarios.
get_one_tree( sp_list, tree, taxon, scenario = c("at_basal_node", "random_below_basal"), show_grafted = FALSE, tree_by_user = FALSE, .progress = "text", dt = TRUE )
Argument | Description |
---|---|
sp_list | A character vector or a data frame with at least three columns: species, genus, and family. The species column holds the species for which we want a phylogeny. The data frame can also have two optional columns, close_sp and close_genus, which specify the closest species or genus of a species based on expert knowledge; if specified, the new species will be grafted to that particular location. sp_list can be a string vector if taxon is specified. |
tree | A mega-tree with class "phylo". |
taxon | The taxon of the species in sp_list. |
scenario | How to insert a species into the mega-tree: either "at_basal_node" or "random_below_basal". |
show_grafted | Whether to indicate which species were grafted onto the mega-tree. Default is FALSE. |
tree_by_user | Whether the mega-tree is provided by the user. Default is FALSE. |
.progress | Form of the progress bar. Default is "text". |
dt | Whether to use the data.table version of bind_tip() to bind tips. Default is TRUE. |
A phylogeny of class "phylo" for the requested species.
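The two scenarios differ only in where the new species attaches relative to the basal node of its genus or family. The base-R sketch below assumes that "random_below_basal" picks uniformly among the nodes (internal nodes and tips) descending from the basal node; that reading, along with the helpers descendants() and choose_graft_node(), is an illustrative assumption rather than the rtrees algorithm.

```r
# Toy 4-tip tree: root 5; node 6 holds tips 1-2, node 7 holds tips 3-4.
toy_edge <- matrix(c(5L, 6L, 6L, 1L, 6L, 2L, 5L, 7L, 7L, 3L, 7L, 4L),
                   ncol = 2, byrow = TRUE)

# All nodes (internal and tips) below `node`.
descendants <- function(edge, node) {
  kids <- edge[edge[, 1] == node, 2]
  c(kids, unlist(lapply(kids, function(k) descendants(edge, k))))
}

# Hypothetical helper: pick the node a new species is attached to.
choose_graft_node <- function(edge, basal,
                              scenario = c("at_basal_node", "random_below_basal")) {
  scenario <- match.arg(scenario)
  if (scenario == "at_basal_node") return(basal)
  below <- descendants(edge, basal)
  if (length(below) == 0) return(basal)  # the basal node is itself a tip
  below[sample.int(length(below), 1L)]   # assumed: uniform choice below the basal node
}

set.seed(1)
choose_graft_node(toy_edge, basal = 6L)                                    # 6
choose_graft_node(toy_edge, basal = 6L, scenario = "random_below_basal")   # 1 or 2
```

Indexing with sample.int() avoids the base-R pitfall where sample(x, 1) draws from 1:x when x is a single number.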
For some taxonomic groups, there are multiple posterior mega-trees, and a common task is to derive a phylogeny from each of these mega-trees (or from a random subset of them).
get_tree( sp_list, tree, taxon = NULL, scenario = c("at_basal_node", "random_below_basal"), show_grafted = FALSE, tree_by_user = FALSE, mc_cores = future::availableCores() - 2, .progress = "text", fish_tree = c("timetree", "all-taxon"), mammal_tree = c("vertlife", "phylacine"), bee_tree = c("maximum-likelihood", "bootstrap"), dt = TRUE )
Argument | Description |
---|---|
sp_list | A character vector or a data frame with at least three columns: species, genus, and family. The species column holds the species for which we want a phylogeny. The data frame can also have two optional columns, close_sp and close_genus, which specify the closest species or genus of a species based on expert knowledge; if specified, the new species will be grafted to that particular location. sp_list can be a string vector if taxon is specified. |
tree | A mega-tree with class "phylo", or a list of mega-trees with class "multiPhylo". |
taxon | The taxon of the species in sp_list. |
scenario | How to insert a species into the mega-tree: either "at_basal_node" or "random_below_basal". |
show_grafted | Whether to indicate which species were grafted onto the mega-tree. Default is FALSE. |
tree_by_user | Whether the mega-tree is provided by the user. Default is FALSE. |
mc_cores | Number of cores to use for parallel processing. Default is future::availableCores() - 2. |
.progress | Form of the progress bar. Default is "text". |
fish_tree | Which fish tree to use. If "timetree" (default), the smaller time tree of 11,638 species, all with sequence data, is used; if "all-taxon", the 100 larger posterior phylogenies with 31,516 species are used. |
mammal_tree | Which set of mammal trees to use. If "vertlife" (default), 100 randomly selected posterior phylogenies provided by Vertlife are used; if "phylacine", 100 randomly selected posterior phylogenies provided by PHYLACINE are used. |
bee_tree | Which bee tree to use. If "maximum-likelihood" (default), a single maximum-likelihood tree is used; if "bootstrap", a set of 100 randomly selected posterior phylogenies is used. All trees are provided by the Bee Tree of Life. |
dt | Whether to use the data.table version of bind_tip() to bind tips. Default is TRUE. |
Derive a phylogeny from a mega-tree
For a list of species, generate one phylogeny or multiple phylogenies from the provided mega-tree or mega-trees. If a species is not in the mega-tree, it will be grafted onto the mega-tree under one of two scenarios.
A phylogeny of class "phylo" for the requested species, or a list of phylogenies of class "multiPhylo", depending on the input tree. Within each phylogeny, the grafting status of all species is saved as a data frame named "graft_status".
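Deriving one phylogeny per posterior mega-tree is an embarrassingly parallel map over a list of trees. The sketch below shows that pattern with parallel::mclapply(); it is not how get_tree() is implemented (its signature only tells us that mc_cores defaults to future::availableCores() - 2), and derive_from_each() plus the stand-in trees and per-tree function are hypothetical. Here the core count mirrors that default by leaving two cores free, via parallel::detectCores().

```r
library(parallel)

# Hypothetical helper: apply a per-tree derivation to every mega-tree.
# mclapply() forks processes, which Windows does not support, so use one core there.
derive_from_each <- function(megatrees, derive_one, mc_cores = 1L) {
  mc_cores <- max(1L, as.integer(mc_cores), na.rm = TRUE)
  if (.Platform$OS.type == "windows") mc_cores <- 1L
  mclapply(megatrees, derive_one, mc.cores = mc_cores)
}

# Stand-ins for posterior mega-trees: only their tip labels matter here.
megatrees <- list(
  list(tip.label = c("Pinus_taeda", "Quercus_alba")),
  list(tip.label = c("Pinus_taeda", "Quercus_alba", "Quercus_rubra"))
)
sp_list <- c("Quercus_alba", "Quercus_rubra")

# Stand-in per-tree step: which requested species each tree already contains.
cores <- max(1L, parallel::detectCores() - 2L, na.rm = TRUE)
present <- derive_from_each(megatrees,
                            function(tr) intersect(sp_list, tr$tip.label),
                            mc_cores = cores)
```

mclapply() returns results in the same order as the input list, so the i-th result still corresponds to the i-th mega-tree.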
library(rtrees)
# Twelve fish species; those missing from the fish mega-tree are grafted,
# and show_grafted = TRUE indicates which ones were grafted.
test_sp = c("Serrasalmus_geryi", "Careproctus_reinhardti", "Gobiomorphus_coxii",
            "Periophthalmus_barbarus", "Prognichthys_glaphyrae", "Barathronus_bicolor",
            "Knipowitschia_croatica", "Rhamphochromis_lucius", "Neolissochilus_tweediei",
            "Haplochromis_nyanzae", "Astronesthes_micropogon", "Sanopus_reticulatus")
test_tree = get_tree(sp_list = test_sp, taxon = "fish", show_grafted = TRUE)
Remove trailing *
rm_stars(tree)
Argument | Description |
---|---|
tree | A phylogeny generated by get_tree(). |
A phylogeny after removing trailing stars.
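Removing trailing stars is a one-line regular-expression substitution on the tip labels. The base-R sketch below assumes the stars are one or more trailing "*" characters appended to tip labels (for example, to flag grafted species when show_grafted = TRUE); strip_trailing_stars() and the toy labels are hypothetical and only illustrate the idea behind rm_stars().

```r
# Hypothetical sketch: strip one or more trailing "*" characters
# from every tip label of a phylo-like list.
strip_trailing_stars <- function(tree) {
  tree$tip.label <- sub("\\*+$", "", tree$tip.label)
  tree
}

toy <- list(tip.label = c("Pinus_taeda", "Quercus_new*", "Quercus_other**"))
strip_trailing_stars(toy)$tip.label
# "Pinus_taeda" "Quercus_new" "Quercus_other"
```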
Convert a vector of species names to a data frame
sp_list_df(sp_list, taxon)
Argument | Description |
---|---|
sp_list | A string vector or a data frame with at least one column named "species". |
taxon | The taxon group of this species list. If not specified, only species and genus will be returned. |
A data frame with columns: species, genus, and family (if taxon is specified).
sp_list_df(sp_list = c("Serrasalmus_geryi", "Careproctus_reinhardti", "Gobiomorphus_coxii"), taxon = "fish")
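A base-R sketch of this conversion, assuming names of the form Genus_species or "Genus species": the genus is everything before the first underscore or space, and the family is looked up in a genus/family table when one is supplied. species_to_df() and the three-row fish table are illustrative stand-ins; the real function takes families from the package's classification data for the given taxon, which may differ from the families used here.

```r
# Hypothetical sketch of a species-vector-to-data-frame conversion.
species_to_df <- function(sp_list, classification = NULL) {
  out <- data.frame(species = sp_list,
                    genus = sub("[_ ].*$", "", sp_list),  # text before first "_" or space
                    stringsAsFactors = FALSE)
  if (!is.null(classification)) {
    out$family <- classification$family[match(out$genus, classification$genus)]
  }
  out
}

# Small hand-made fish table standing in for the package's classification data.
fish_classes <- data.frame(genus = c("Serrasalmus", "Careproctus", "Gobiomorphus"),
                           family = c("Serrasalmidae", "Liparidae", "Eleotridae"),
                           stringsAsFactors = FALSE)

species_to_df(c("Serrasalmus_geryi", "Careproctus_reinhardti", "Gobiomorphus_coxii"),
              fish_classes)
```

Without a classification table, only the species and genus columns are returned, mirroring the documented behaviour when taxon is not specified.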
Supported taxonomic groups with mega-trees provided in the megatrees package.
taxa_supported
An object of class character of length 9.