Welcome to pypgen’s documentation!

Pypgen provides utilities for estimating standard genetic diversity measures, including Gst, G'st, G''st, and Jost's D, from large genomic datasets (Hedrick, 2005; Jost, 2008; Nei, 1973; Nei & Chesser, 1983). Pypgen operates both on individual SNPs and on user-defined regions (e.g., five-kilobase windows tiled across each chromosome). For the windowed analyses, pypgen estimates the multi-locus version of each estimator.
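As a rough illustration of what these estimators measure at a single SNP, the sketch below computes Nei's Gst and Jost's D from per-population allele frequencies. It is a simplified textbook version that assumes equal population weights and applies no sample-size corrections, and it is not pypgen's own code; the function names and the input layout (one dict of allele frequencies per population) are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def diversity_stats(pops):
    """Uncorrected Gst and Jost's D for one locus.

    pops is a list with one dict per population, mapping allele -> frequency.
    Assumption: populations are weighted equally and no sample-size
    correction (e.g. Nei & Chesser, 1983) is applied.
    """
    n = len(pops)
    alleles = set().union(*pops)

    # Hs: mean within-population expected heterozygosity.
    hs = sum(expected_heterozygosity(p) for p in pops) / n

    # Ht: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = {a: sum(p.get(a, 0.0) for p in pops) / n for a in alleles}
    ht = expected_heterozygosity(mean_freqs)

    # Gst is undefined at a monomorphic locus; report 0.0 in that case.
    gst = (ht - hs) / ht if ht > 0 else 0.0
    d = ((ht - hs) / (1.0 - hs)) * (n / (n - 1.0))
    return {"Hs": hs, "Ht": ht, "Gst": gst, "D": d}


# Two populations fixed for different alleles are completely differentiated.
fixed = diversity_stats([{"A": 1.0}, {"B": 1.0}])

# Two populations with identical frequencies show no differentiation.
same = diversity_stats([{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}])
```

With more than two alleles the same functions apply unchanged, since heterozygosity is computed over every allele observed in any population.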


  • Handles multiallelic SNP calls
  • Allows a single VCF file to contain multiple populations
  • Operates on standard VCF (Variant Call Format) formatted SNP calls
  • Uses bgzipped input for fast random access
  • Takes advantage of multiple processor cores
  • Calculates additional metrics:
    • SNP count per window
    • mean read depth (+/- STDEV) per window
    • populations with fixed alleles per SNP
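The windowed mode and the per-window SNP count can be pictured with the following sketch. It only illustrates tiling a chromosome into fixed-size windows and binning SNP positions; the function names, the half-open 0-based window convention, and the 5,000 bp default are assumptions for illustration, not pypgen's interface.

```python
from collections import Counter


def tile_windows(chrom_length, window_size=5000):
    """Yield (start, end) windows, 0-based and half-open, covering a chromosome.

    The final window is truncated at the chromosome end.
    """
    for start in range(0, chrom_length, window_size):
        yield start, min(start + window_size, chrom_length)


def snp_counts_per_window(positions, window_size=5000):
    """Count SNPs per window index.

    positions are 1-based coordinates, as in the POS column of a VCF file.
    """
    return Counter((pos - 1) // window_size for pos in positions)


windows = list(tile_windows(12000))
counts = snp_counts_per_window([1, 5000, 5001, 10001])
```

Here position 5000 falls in the first window and position 5001 in the second, because VCF positions are 1-based while the windows are 0-based and half-open.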


Pypgen is written in Python 2.7. It may run under Python 2.6, but I haven't tested it. It doesn't run under Python 3. To read bgzipped files, it requires samtools and pysam to be installed.

Quick Installation:

If you already have a working install of pysam, pypgen can be installed from PyPI using pip or setuptools:

pip install pypgen


easy_install -U pypgen

However, at least in these early days of pypgen, it's recommended to install it directly from the GitHub repository:

pip install -e git+https://github.com/ngcrawford/pypgen.git#egg=Package

Reporting Problems:

If you have general questions about pypgen, post them on Biostar and tag them pypgen. If you think you've found a bug in pypgen, you can open an issue in the pypgen GitHub repo.
