Welcome to pypgen’s documentation!
Pypgen provides utilities for estimating standard measures of genetic differentiation, including Gst, G’st, G’’st, and Jost’s D, from large genomic datasets (Hedrick, 2005; Jost, 2008; Nei, 1973; Nei & Chesser, 1983). Pypgen operates both on individual SNPs and on user-defined regions (e.g., five-kilobase windows tiled across each chromosome). For windowed analyses, pypgen estimates the multilocus version of each estimator.
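The following sketch shows what the four estimators measure. It computes their single-locus forms directly from known allele frequencies in Python 3. H_S is the mean within-population expected heterozygosity, and H_T is the expected heterozygosity of the unweighted mean allele frequencies. The sketch deliberately omits the Nei & Chesser (1983) sample-size corrections and the multilocus combination pypgen uses for windows, so it will not reproduce pypgen's numbers exactly. The function name `differentiation_stats` and its input layout are hypothetical and are not part of pypgen's API.

```python
from typing import Dict, Sequence


def differentiation_stats(freqs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Single-locus Gst, G'st, G''st and Jost's D from allele frequencies.

    Hypothetical helper (not pypgen's API). freqs[j][i] is the frequency of
    allele i in population j; each row should sum to 1. No sample-size
    correction is applied.
    """
    k = len(freqs)
    if k < 2:
        raise ValueError("need at least two populations")
    n_alleles = len(freqs[0])

    # H_S: expected heterozygosity within each population, averaged over populations
    hs = sum(1.0 - sum(p * p for p in pop) for pop in freqs) / k
    # H_T: expected heterozygosity of the unweighted mean allele frequencies
    pbar = [sum(pop[i] for pop in freqs) / k for i in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in pbar)

    if ht == 0.0:
        # Same single allele everywhere: there is no differentiation to measure
        return {"Gst": 0.0, "G'st": 0.0, "G''st": 0.0, "D": 0.0}

    gst = (ht - hs) / ht                                           # Nei's Gst
    gst_prime = gst * (k - 1 + hs) / ((k - 1) * (1 - hs))          # Hedrick's G'st
    gst_double_prime = k * (ht - hs) / ((k * ht - hs) * (1 - hs))  # G''st
    jost_d = (ht - hs) / (1 - hs) * k / (k - 1)                    # Jost's D
    return {"Gst": gst, "G'st": gst_prime, "G''st": gst_double_prime, "D": jost_d}
```

For example, `differentiation_stats([[1.0, 0.0], [0.0, 1.0]])` describes two populations fixed for different alleles and returns 1.0 for all four statistics. Identical allele frequencies in every population return 0.0 for all four.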
- Handles multiallelic SNP calls
- Allows a single VCF file to contain multiple populations
- Operates on SNP calls in standard VCF (Variant Call Format)
- Uses bgzipped input for fast random access
- Takes advantage of multiple processor cores
- Calculates additional metrics:
  - SNP count per window
  - mean read depth (± standard deviation) per window
  - populations with fixed alleles at each SNP
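The following sketch illustrates how one VCF file can hold several populations and multiallelic sites. It is written in plain Python 3 and uses neither pysam nor samtools. It turns one tab-separated VCF data line into per-population allele counts, lists the populations fixed for a single allele, and summarizes a window's record count and mean INFO/DP depth. It assumes a hypothetical population mapping, given as one label per sample column, and takes depth from the INFO `DP` tag. All function names here are hypothetical and do not describe pypgen's internals.

```python
import statistics
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def allele_counts_by_population(vcf_line: str, sample_pops: Sequence[str]) -> Dict[str, Counter]:
    """Count called alleles per population for one VCF data line.

    sample_pops[i] is the (hypothetical) population label of the i-th sample
    column; extra sample columns without a label are ignored by zip().
    Raises ValueError if the FORMAT column has no GT key.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pop, sample in zip(sample_pops, fields[9:]):
        parts = sample.split(":")
        gt = parts[gt_index] if gt_index < len(parts) else "."
        # GT may be unphased ("1/2") or phased ("1|2"); "." marks a missing allele
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[pop][int(allele)] += 1  # 0 = REF, 1, 2, ... = ALT alleles
    return dict(counts)


def fixed_populations(counts: Dict[str, Counter]) -> List[str]:
    """Populations in which every called allele is identical."""
    return sorted(pop for pop, c in counts.items() if len(c) == 1)


def window_summary(vcf_lines: Sequence[str]) -> Dict[str, float]:
    """Record count and mean INFO/DP (+/- sample standard deviation) for one window.

    Every line passed in is counted; filtering to SNPs is left to the caller.
    """
    depths = []
    for line in vcf_lines:
        for entry in line.split("\t")[7].split(";"):
            if entry.startswith("DP="):
                depths.append(int(entry[3:]))
    mean = statistics.mean(depths) if depths else float("nan")
    sd = statistics.stdev(depths) if len(depths) > 1 else float("nan")
    return {"snp_count": len(vcf_lines), "mean_depth": mean, "depth_sd": sd}
```

For a triallelic site where `popA` carries only `0/0` calls and `popB` carries `1/2` and `2|2`, `fixed_populations` reports `["popA"]`. Parsing the text directly keeps the sketch dependency-free. For large bgzipped files, fetching regions through pysam (as pypgen does) avoids reading the whole file.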
Pypgen is written in Python 2.7. It may run under Python 2.6, but I haven’t tested it. It doesn’t run under Python 3. To read bgzipped files, it requires samtools and pysam to be installed.
If you already have a working install of pysam, pypgen can be installed from PyPI using pip or setuptools (easy_install):
pip install pypgen
easy_install -U pypgen
However, at least in these early days of pypgen, it’s recommended to install it directly from the GitHub repository:
pip install -e git+https://github.com/ngcrawford/pypgen.git#egg=pypgen
If you have general questions about pypgen, post them on Biostar and tag them pypgen. If you think you’ve found a bug in pypgen, you can open an issue in the pypgen GitHub repo.