Package 'mmod'

Title: Modern Measures of Population Differentiation
Description: Provides functions for measuring population divergence from genotypic data.
Authors: David Winter [aut, cre], Peter Green [ctb], Zhian Kamvar [ctb], Thierry Gosselin [ctb]
Maintainer: David Winter <[email protected]>
License: MIT + file LICENSE
Version: 1.3.3
Built: 2024-11-04 03:49:12 UTC
Source: https://github.com/dwinter/mmod

Help Index


as.genind.DNAbin

Description

Convert a DNAbin object into a genind object

Usage

as.genind.DNAbin(x, pops)

Arguments

x

object of class DNAbin

pops

vector of population assignemnts for each sequence

Value

genind

Examples

library(pegas)
data(woodmouse)
wm <- as.genind.DNAbin(woodmouse, rep(c("A", "B", "C"), each=5))
diff_stats(wm)

Produce bootstrap samples from each subpopulation of a genind object

Description

This function produces bootstrap samples from a genind object, with each subpopulation resampled according to its size. Because there are many statistics that you may wish to calculte from these samples, this function returns a list of genind objects representing bootsrap samples that can then be futher processed (see examples).

Usage

chao_bootstrap(x, nreps = 1000)

Arguments

x

genind object (from package adegenet)

nreps

numeric number of bootstrap replicates to perform (default 1000)

Details

You should note, this is a standard (frequentist) approach to quantifying uncertainty - effectively asking "if the population was exactly like our sample, and we repeatedly took samples like this from it, how much would those samples vary?" The confidence intervals don't include uncertainty produced from any biases in the way you collected your data. Additionally, this boostrapping procedure displays a slight upward bias for some datasets. If you plan or reporting a confidence interval for your statistic, it is probably a good idea to subtract the difference between the point estimate of the statistic and the mean of the boostrap distribution from the extremes of the interval (as demonstrated in the expample below)

Value

A list of genind objects

References

Chao, A. et al. (2008). A Two-Stage probabilistic approach to Multiple-Community similarity indices. Biometrics, 64:1178-1186

See Also

Other resample: jacknife_populations, summarise_bootstrap

Examples

## Not run:   
data(nancycats)
obs.D <- D_Jost(nancycats)
bs <- chao_bootstrap(nancycats)
bs_D <- summarise_bootstrap(bs, D_Jost)
bias <- bs.D$summary.global.het[1] - obs.D$global.het
bs.D$summary.global.het - bias

## End(Not run)

Calculate Jost's D

Description

This function calculates Jost's D from a genind object

Usage

D_Jost(x, hsht_mean = "arithmetic")

Arguments

x

genind object (from package adegenet)

hsht_mean

The type of mean to use to calculate values of Hs and Ht for a global estimate. (Default is teh airthmetic mean, can also be set to the harmonic mean).

Details

Takes a genind object with population information and calculates Jost's D Returns a list with values for each locus as well as two global estimates. 'global.het' uses the averages of Hs and Ht across all loci while 'global.harm_mean' takes the harmonic mean of all loci.

Because estimators of Hs and Ht are used, its possible to have negative estimates of D. You should treat these as numbers close to zero.

Value

per.locus values for each D for each locus in the dataset

global estimtes for D based on overall heterozygosity or the harmonic mean of values for each locus

References

Jost, L. (2008), GST and its relatives do not measure differentiation. Molecular Ecology, 17: 4015-4026.

See Also

Other diffstat: Gst_Hedrick, Gst_Nei, Phi_st_Meirmans, diff_stats

Other D: pairwise_D

Examples

data(nancycats)
D_Jost(nancycats)
D_Jost(nancycats, hsht_mean= "arithmetic")

Calculate differentiation statistics for a genind object

Description

By default this function calculates three different statistics of differentiation for a genetic dataset. Nei's Gst, Hedrick's G”st and Jost's D. Optionally, it can also calculate Phi'st, which is not calculated by default as it can take somewhat more time to run.

Usage

diff_stats(x, phi_st = FALSE)

Arguments

x

genind object (from package adegenet)

phi_st

Boolean Calculate Phi_st (default is FALSE)

Details

See individual functions (listed below) for more details.

Value

per.locus values for each statistic for each locus in the dataset

global estimtes for these statistics across all loci in the dataset

References

Hedrick, PW. (2005), A Standardized Genetic Differentiation Measure. Evolution 59: 1633-1638.

Jost, L. (2008), GST and its relatives do not measure differentiation. Molecular Ecology, 17: 4015-4026.

Meirmans PG, Hedrick PW (2011), Assessing population structure: FST and related measures. Molecular Ecology Resources, 11:5-18

Nei M. (1973) Analysis of gene diversity in subdivided populations. PNAS: 3321-3323.

Nei M, Chesser RK. (1983). Estimation of fixation indices and gene diversities. Annals of Human Genetics. 47: 253-259.

Meirmans, PW. (2005), Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60: 2399-402.

Excoffier, L., Smouse, P., Quattro, J. (1992), Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131: 479-91

See Also

Other diffstat: D_Jost, Gst_Hedrick, Gst_Nei, Phi_st_Meirmans

Examples

data(nancycats)
diff_stats(nancycats)

An exact test of population differentiation for genind objects

Description

This function uses Fisher's exact test to determine if alleles in sub-populations are drawn randomly from a larger population (i.e. a significance test for allelic differentiation among sub-populations).

Usage

diff_test(x, sim = TRUE, nreps = 2000)

Arguments

x

a genind object (from package adegenet)

sim

boolean: if TRUE simulate p-value by using an MCMC sample of those tables that have the same marginal totals as the observed data (required for all but the smallest datasets)

nreps

number of steps used to simulate p-value (default 2000)

Details

Note, this test returns p-values for each locus in a dataset _not_ estimates of effect size. Since most populations have some degree of population differentiation, very large samples are almost guaranteed to return significant results. Refer to estimates of the various differentiation statistics (D, G”ST and Phi'ST)to ascertain how meaningful such results might be.

Value

named vector of p-values testing the null hypothesis these samples where drawn from a panmictic population.

See Also

fisher.test, which this function wraps

Examples

data(nancycats)
diff_test(seploc(nancycats)[[2]], nreps=100)

Calculate distance between individual for co-dominant locus

Description

This function calculates the distance between individuals in a genind object based on their genotypes. Specifically, the simple metric of Kosman and Leonard (2005) in which distance is calculated as a propotion of shared alleles at each locus.

Usage

dist.codom(x, matrix = TRUE, global = TRUE, na.rm = TRUE)

Arguments

x

genind object (from package adegenet)

matrix

boolean: if TRUE return matrix (dist object if FALSE)

global

boolean: if TRUE, return a single global estimate based on all loci. If FALSE return a list of matrices for each locus. if FALSE

na.rm

boolean: if TRUE remove individuals with NAs

Value

either a list of distance matrices, one for each locus or a single matrix containing the mean distance between individuals across all loci

Dropped for each distance matrix and object of class "na.action" containing indices to those indivudals in the genind object which where omitted due to having NAs

References

Kosman E., Leonard, K.J. Similarity coefficients for molecular markers in studies of genetic relationships between individuals for haploid diploid, and polyploid species. Molecular Ecology. 14: 415-424

Examples

data(nancycats)
dm <- dist.codom(nancycats[40:45], matrix=FALSE)
head(dm)

Calculate Nei's Gst using estimators for Hs and Ht

Description

This function calculates Hedrick's G'st from a genind object

Usage

Gst_Hedrick(x)

Arguments

x

genind object (from package adegenet)

Details

Takes a genind object with population information and calculates Hedrick's G”st.

Because estimators of Hs and Ht are used, it's possible to have negative estimates of G”st. You should treat such results as zeros (or an attempt to estimate a very low number with some error which might push it below zero)

Value

per.locus values for each G”st for each locus in the dataset

global estimtes for G”st based on overall heterozygosity

References

Hedrick, PW. (2005), A Standardized Genetic Differentiation Measure. Evolution 59: 1633-1638.

Meirmans PG, Hedrick PW (2011), Assessing population structure: FST and related measures. Molecular Ecology Resources, 11:5-18

See Also

Other diffstat: D_Jost, Gst_Nei, Phi_st_Meirmans, diff_stats

Other Hedrick: pairwise_Gst_Hedrick

Examples

data(nancycats) 
Gst_Hedrick(nancycats)

Calculate Nei's Gst using estimators for Hs and Ht

Description

This function calculates Gst following Nei's method and using Nei and Chesser's estimators for Hs and Ht

Usage

Gst_Nei(x)

Arguments

x

genind object (from package adegenet)

Value

per.locus estimates of Gst for each locus in the dataset

per.locus estimates of Gst for across all loci

References

Nei M. (1973) Analysis of gene diversity in subdivided populations. PNAS: 3321-3323.

Nei M, Chesser RK. (1983). Estimation of fixation indices and gene diversities. Annals of Human Genetics. 47: 253-259.

See Also

Other diffstat: D_Jost, Gst_Hedrick, Phi_st_Meirmans, diff_stats

Other Nei: pairwise_Gst_Nei

Examples

data(nancycats)
Gst_Nei(nancycats)

Harmonic mean

Description

Calculate the harmonic mean of a numeric vector (will return NA if there are any negative numbers in the vector)

Usage

harmonic_mean(x, na.rm = TRUE)

Arguments

x

numeric vector

na.rm

logical remove NAs prior or calculation

Value

harmonic mean of vector

Examples

data(nancycats)
pop.sizes <- table(pop(nancycats))
harmonic_mean(pop.sizes)

Create jacknife samples of a genind object by population

Description

Makes a series of jacknife samples across populations from a genind object. This function returns a list of genind objects that can then be further processed (see examples below).

Usage

jacknife_populations(x, sample_frac = 0.5, nreps = 1000)

Arguments

x

genind object (from package adegenet)

sample_frac

fraction of pops to sample in each replication (default 0.5)

nreps

number of jacknife replicates to run (default 1000)

Value

a list of genind objects to be further processed

See Also

Other resample: chao_bootstrap, summarise_bootstrap

Examples

## Not run:   
data(nancycats)
obs <- diff_stats(nancycats)
jn <- jacknife_populations(nancycats)
jn.D <- summarise_bootstrap(jn, D_Jost)

## End(Not run)

Modern Measures of Differentiation

Description

Population geneticists have traditionally used Nei's Gst (often confusingly called Fst...) to measure divergence between populations. Recently, it has become clear that simple intereptations of the value of Gst can be misleading. For this reason several new measures differntiation have been developed. mmod is a package that brings some of these measures to R.

Details

The vignette for this package ( avaliable using vignette("demo", package="mmod") from within R) contains an introduction to these methods and and example usage for this package. I strongly suggest new users start by reading this documentation.


Calculates pairwise values of Jost's D

Description

This function calculates Jost's D, a measure of genetic differentiation, between all combinations of populaitons in a genind object.

Usage

pairwise_D(x, linearized = FALSE, hsht_mean = "arithmetic")

Arguments

x

genind object (from package adegenet)

linearized

logical, if TRUE will turned linearized D (1/1-D)

hsht_mean

type of mean to use for the global estimates of Hs and Ht default it "arithmetic", can also be set to "harmonic".

Value

A distance matrix with between-population values of D

References

Jost, L. (2008), GST and its relatives do not measure differentiation. Molecular Ecology, 17: 4015-4026.

See Also

Other pairwise: pairwise_Gst_Hedrick, pairwise_Gst_Nei

Other D: D_Jost

Examples

data(nancycats)
pairwise_D(nancycats[1:26,])

Calculates pairwise values of Hedrick's G'st

Description

This function calculates Hedrick's G'st, a measure of genetic differentiation, between all combinations of populaitons in a genind object.

Usage

pairwise_Gst_Hedrick(x, linearized = FALSE)

Arguments

x

genind object (from package adegenet)

linearized

logical, if TRUE will turned linearized G'st (1/()1-G'st))

Value

A distance matrix with between-population values of G”st

References

Hedrick, PW. (2005), A Standardized Genetic Differentiation Measure. Evolution 59: 1633-1638.

See Also

Other pairwise: pairwise_D, pairwise_Gst_Nei

Other Hedrick: Gst_Hedrick

Examples

data(nancycats)
pairwise_Gst_Hedrick(nancycats[1:26,])

Calculates pairwise values of Nei's Gst

Description

This function calculates Nei's Gst, a measure of genetic differentiation, between all combinations of populaitons in a genind object.

Usage

pairwise_Gst_Nei(x, linearized = FALSE)

Arguments

x

genind object (from package adegenet)

linearized

logical, if TRUE will turned linearized Gst (1/(1-Gst))

Value

dist A distance matrix with between-population values of Gst

References

Nei M. (1973) Analysis of gene diversity in subdivided populations. PNAS: 3321-3323.

Nei M, Chesser RK. (1983). Estimation of fixation indices and gene diversities. Annals of Human Genetics. 47: 253-259.

See Also

Other pairwise: pairwise_D, pairwise_Gst_Hedrick

Other Nei: Gst_Nei

Examples

data(nancycats)
pairwise_Gst_Nei(nancycats[1:26,])

Calculate Phi_st from a genind object

Description

This function calculates Meirmans' corrected version of Phi_st, an Fst analog produced using the AMOVA framework. Note, the global estimate produced by this function is calculated as the mean distance between individuals across all loci, and this exlcuded individuals with one or more missing value.

Usage

Phi_st_Meirmans(x)

Arguments

x

genind object (from package adegenet)

Value

per.locus Phi_st estimate for each locus

global Phi_st estimate across all loci

References

Meirmans, PW. (2005), Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60: 2399-402.

Excoffier, L., Smouse, P., Quattro, J. (1992), Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131: 479-91

See Also

Other diffstat: D_Jost, Gst_Hedrick, Gst_Nei, diff_stats

Examples

data(nancycats)
Phi_st_Meirmans(nancycats[1:26,])

Randomly create genotypes

Description

Use the multinomial distribution to randomly create genotpes for individuals for given allele frequences. By default this function returns a matrix of with alleles in rows and individuals in columns. There is an option to return a genind object representing the same data (see examples).

Usage

rgenotypes(n, ploidy, probs, genind = FALSE, pop_name = "A",
  loc_name = "L1")

Arguments

n

integer number of indviduals.

ploidy

integer number of alleles to asign to each individual.

probs

vector of probabilies corresponding to allele frequences.

genind

boolean if TRUE return a genind object

pop_name

charcter Name for population defined in genind object (not required if genind is not TRUE)

loc_name

character name to five locus in genind object

Details

Used in chao_bootstrap, also exported as it may come in handy for other simulations.

Value

Either a matrix with individuals in columns, alleles in rows or, if genind is TRUE a genind object for one population and locus.

See Also

rmultinom which this function wraps.

Examples

data(nancycats)
obs_allele_freqs <- apply(nancycats$tab[,1:16], 2,mean, na.rm=TRUE)
rgenotypes(10, 2, obs_allele_freqs)

Apply a differentiation statistic to a bootstrap sample

Description

This function applies a differentiation statistic (eg, D_Jost, Gst_Hedrick or Gst_Nei) to a list of genind objects, possibly produced with chao_bootsrap or jacknife_populations.

Usage

summarise_bootstrap(bs, statistic)

Arguments

bs

list of genind objects

statistic

differentiation statistic to apply (the function itself, as with apply family functions)

Details

Two different approaches are used for calculating confidence intervals in the results. The estimates given by lower.percentile and upper.percentile are simply the 2.5th and 97.5th precentile of the statistic across bootstrap samples. Note, the presence or rare alleles in some populations can bias bootstrapping procedures such that these intervals are not centered on the observed value. The mean of statistic across samples is returned as mean.bs and can be used to correct biased bootsrap samples. Alternatively, lower.normal and upper.normal form a confidence interval centered on the observed value of the statistic and using the standard deviation of the statistic across replicates to generate limits (sometimes called the normal-method of obtaining a confidence interval). The print function for objects returned by this function displays the normal-method confidence intervals.

Value

per.locus: matirx of statistics calculated for each locus (column) and each bootstrap replicate (row).

global.het: vector of global estimates calculated from overall heterozygosity

global.het: vector of global estimates calculated from harmonic mean of statistic (only applied to D_Jost)

summary.loci: data.frame summarising the distribution of the chosen statistic across replicates. Details of the different confidence intervals are given in details

summary.global_het: A vector containing the same measures as summary.loci but for a global value of the statistic calculated from all loci

summary.global_harm: As with summary.global_het but calculated from the harmonic mean of the statistic across loci (only applies to D_Jost)

See Also

Other resample: chao_bootstrap, jacknife_populations

Examples

## Not run:   
data(nancycats)
bs <- chao_bootstrap(nancycats)
summarise_bootstrap(bs, D_Jost)

## End(Not run)