Code Quality Rank: L1
Programming language: C++
License: MIT License
Tags: Biology    
Latest version: v1.0.4

README

vcflib

A C++ library for parsing and manipulating VCF files.

Github-CI AnacondaBadge DL BrewBadge GuixBadge DebianBadge C++0x Chat on Matrix

Vcflib and related tools are the workhorses in bioinformatics for processing the Variant Call Format (VCF). See:

Vcflib and tools for processing the VCF variant call format; Erik Garrison, Zev N. Kronenberg, Eric T. Dawson, Brent S. Pedersen, Pjotr Prins; doi: https://doi.org/10.1101/2021.05.21.445151

news

May 2022: the vcflib paper has been published in PLOS Computational Biology!

See below for the citation.

April 2022: vcflib has just gone pangenome!

By introducing the wavefront algorithm we can now realign long sequences and reduce the call complexity (and false positives!) introduced by pangenome variant callers, using the new [vcfwave](./doc/vcfwave.md) tool.

See also [RELEASE_NOTES.md](./RELEASE_NOTES.md)

overview

The Variant Call Format (VCF) is a flat-file, tab-delimited textual format that describes reference-indexed variations between individuals. VCF provides a common interchange format for the description of variation in individuals and populations of samples, and has become the de facto standard reporting format for a wide array of genomic variant detectors.
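To make the layout concrete, here is a minimal sketch in plain Python of how one tab-delimited data line maps onto the eight fixed VCF columns. It is not vcflib's API: `VcfRecord`, `parse_vcf_line` and the sample record are hypothetical, and only the standard library is used.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                 # 1-based position on the reference
    id: str
    ref: str
    alts: List[str]          # ALT alleles, comma-separated in the file
    qual: Optional[float]    # None when the file holds the missing value "."
    filter: str
    info: Dict[str, Union[str, bool]]  # key=value pairs; bare keys are flags
    samples: List[str]       # FORMAT and per-sample columns, left unparsed


def parse_vcf_line(line: str) -> VcfRecord:
    """Split one VCF data line (not a '#' header line) into its fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, id_, ref, alt, qual, filt, info = fields[:8]

    info_map: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_map[key] = value if sep else True  # flag entries carry no value

    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=id_,
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=filt,
        info=info_map,
        samples=fields[8:],
    )


record = parse_vcf_line(
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\tGT\t0|0\t1|0"
)
print(record.chrom, record.pos, record.ref, record.alts, record.info["DP"])
```

The sketch keeps INFO values as strings and leaves the sample columns untouched; a real parser would use the header definitions to type them.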

vcflib provides methods to manipulate and interpret sequence variation described by VCF. It is both:

  • an API for parsing and operating on records of genomic variation as it can be described by the VCF format
  • a collection of command-line utilities for executing complex manipulations on VCF files

The API provides a quick and extremely permissive method to read and write VCF files. Extensions and applications of the library, provided in the included utilities (*.cpp), comprise the vast bulk of vcflib's utility.

We have also added infrastructure to write Python bindings. See below.


INSTALL

For latest updates see [RELEASE NOTES](./RELEASE_NOTES.md).

Bioconda

Conda installs in user land without root access

conda install -c bioconda vcflib

Homebrew

Homebrew installs on Linux and macOS

brew install brewsci/bio/vcflib

Debian

For Debian and Ubuntu

apt-get install libvcflib-tools libvcflib-dev

GNU Guix

We develop against GNU Guix, where vcflib is packaged and can be installed with

guix package -i vcflib

See also the Guix shell below.

USAGE

Users are encouraged to drive the utilities in the library in a streaming fashion, using Unix pipes to fully utilize resources on multi-core systems. Piping provides a convenient method to interface with other libraries (vcf-tools, BedTools, GATK, htslib, bio-vcf, bcftools, freebayes) which interface via VCF files, allowing the composition of an immense variety of processing functions. Examples can be found in the scripts, e.g. [script](./scripts/vcfgtcompare.sh).
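The streaming style described above can be sketched as a small filter stage. The example below is a plain-Python illustration, not one of the vcflib tools: `filter_min_qual` is a hypothetical name, and it handles one line at a time, so memory use stays constant however large the input is.

```python
from typing import Iterable, Iterator


def filter_min_qual(lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Yield header lines unchanged and only records with QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header pass straight through
            continue
        qual = line.split("\t")[5]  # QUAL is the sixth tab-separated column
        if qual != "." and float(qual) >= min_qual:
            yield line


# A tiny in-memory VCF stands in for a real input stream.
vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tT\t50\tPASS\t.\n",
    "1\t200\t.\tG\tC\t3\tPASS\t.\n",
    "1\t300\t.\tC\tG\t.\tPASS\t.\n",
]
kept = list(filter_min_qual(vcf_lines, min_qual=20))
print("".join(kept), end="")
```

Passing `sys.stdin` as `lines` and writing the result with `sys.stdout.writelines` turns the function into a stage that can sit between other tools in a pipe.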

TOOLS

<!--

>>> from pytest.rtest import run_stdout, head, cat

-->

<!-- Created with ./scripts/bin2md.rb --index -->

filter

filter command description
[vcfuniq](./doc/vcfuniq.md)
[vcfuniqalleles](./doc/vcfuniqalleles.md)
[vcffilter](./doc/vcffilter.md)

metrics

metrics command description
[vcfcheck](./doc/vcfcheck.md)
[vcfhethomratio](./doc/vcfhethomratio.md)
[vcfhetcount](./doc/vcfhetcount.md)
[vcfdistance](./doc/vcfdistance.md)
[vcfentropy](./doc/vcfentropy.md)

phenotype

phenotype command description
[permuteGPAT++](./doc/permuteGPAT++.md)

genotype

genotype command description
[normalize-iHS](./doc/normalize-iHS.md)
[hapLrt](./doc/hapLrt.md)
[abba-baba](./doc/abba-baba.md)

transformation

transformation command description
[vcfinfo2qual](./doc/vcfinfo2qual.md)
[vcfsamplediff](./doc/vcfsamplediff.md)
[vcfaddinfo](./doc/vcfaddinfo.md)
[vcfremoveaberrantgenotypes](./doc/vcfremoveaberrantgenotypes.md)
[vcfglxgt](./doc/vcfglxgt.md)
[dumpContigsFromHeader](./doc/dumpContigsFromHeader.md)
[vcfevenregions](./doc/vcfevenregions.md)
[vcfcat](./doc/vcfcat.md)
[vcfannotategenotypes](./doc/vcfannotategenotypes.md)
[vcfafpath](./doc/vcfafpath.md)
[vcfclassify](./doc/vcfclassify.md)
[vcfallelicprimitives](./doc/vcfallelicprimitives.md)
[vcfqual2info](./doc/vcfqual2info.md)
[vcfcreatemulti](./doc/vcfcreatemulti.md)
[vcfgeno2alleles](./doc/vcfgeno2alleles.md)
[vcfsample2info](./doc/vcfsample2info.md)
[vcfld](./doc/vcfld.md)
[vcfnumalt](./doc/vcfnumalt.md)
[vcfstreamsort](./doc/vcfstreamsort.md)
[vcfinfosummarize](./doc/vcfinfosummarize.md)
[vcflength](./doc/vcflength.md)
[vcfkeepgeno](./doc/vcfkeepgeno.md)
[vcfcombine](./doc/vcfcombine.md)
[vcfprimers](./doc/vcfprimers.md)
[vcfflatten](./doc/vcfflatten.md)
[vcf2dag](./doc/vcf2dag.md)
[vcfcleancomplex](./doc/vcfcleancomplex.md)
[vcfbreakmulti](./doc/vcfbreakmulti.md)
[vcfindex](./doc/vcfindex.md)
[vcfkeepinfo](./doc/vcfkeepinfo.md)
[vcfgeno2haplo](./doc/vcfgeno2haplo.md)
[vcfintersect](./doc/vcfintersect.md)
[vcfannotate](./doc/vcfannotate.md)
[smoother](./doc/smoother.md)
[vcf2fasta](./doc/vcf2fasta.md)
[vcfsamplenames](./doc/vcfsamplenames.md)
[vcfleftalign](./doc/vcfleftalign.md)
[vcfglbound](./doc/vcfglbound.md)
[vcfcommonsamples](./doc/vcfcommonsamples.md)
[vcfecho](./doc/vcfecho.md)
[vcfkeepsamples](./doc/vcfkeepsamples.md)
[vcf2tsv](./doc/vcf2tsv.md)
[vcfoverlay](./doc/vcfoverlay.md)
[vcfgenosamplenames](./doc/vcfgenosamplenames.md)
[vcfremovesamples](./doc/vcfremovesamples.md)
[vcfremap](./doc/vcfremap.md)
[vcffixup](./doc/vcffixup.md)

statistics

statistics command description
[vcfgenosummarize](./doc/vcfgenosummarize.md)
[vcfcountalleles](./doc/vcfcountalleles.md)
[meltEHH](./doc/meltEHH.md)
[genotypeSummary](./doc/genotypeSummary.md)
[vcfrandomsample](./doc/vcfrandomsample.md)
[pVst](./doc/pVst.md)
[vcfrandom](./doc/vcfrandom.md)
[segmentFst](./doc/segmentFst.md)
[sequenceDiversity](./doc/sequenceDiversity.md)
[segmentIhs](./doc/segmentIhs.md)
[vcfgenotypes](./doc/vcfgenotypes.md)
[vcfaltcount](./doc/vcfaltcount.md)
[plotHaps](./doc/plotHaps.md)
[vcfsitesummarize](./doc/vcfsitesummarize.md)
[vcfgenotypecompare](./doc/vcfgenotypecompare.md)
[vcfstats](./doc/vcfstats.md)
[wcFst](./doc/wcFst.md)
[permuteSmooth](./doc/permuteSmooth.md)
[bFst](./doc/bFst.md)
[vcfroc](./doc/vcfroc.md)
[vcfparsealts](./doc/vcfparsealts.md)
[pFst](./doc/pFst.md)
[iHS](./doc/iHS.md)
[popStats](./doc/popStats.md)

See also [vcflib.md](./doc/vcflib.md).

scripts

The vcflib source repository contains a number of additional scripts. Click on the link to see the source code.

| script | description |
| :-- | :--- |
| [vcfclearinfo](./scripts/vcfclearinfo) | clear INFO field |
| [vcfqualfilter](./scripts/vcfqualfilter) | quality filter |
| [vcfnulldotslashdot](./scripts/vcfnulldotslashdot) | rewrite null genotypes to ./. |
| [vcfprintaltdiscrepancy.r](./scripts/vcfprintaltdiscrepancy.r) | show ALT discrepancies in a table |
| [vcfremovenonATGC](./scripts/vcfremovenonATGC) | remove non-nucleotides in REF or ALT |
| [plotSmoothed.R](./scripts/plotSmoothed.R) | smooth plot of wcFst, pFst or abba-baba |
| [vcf_strip_extra_headers](./scripts/vcf_strip_extra_headers) | strip headers |
| [plotHapLrt.R](./scripts/plotHapLrt.R) | plot results of pFst |
| [vcfbiallelic](./scripts/vcfbiallelic) | remove anything that is not biallelic |
| [vcfsort](./scripts/vcfsort) | sort VCF using shell script |
| [vcfnosnps](./scripts/vcfnosnps) | remove SNPs |
| [vcfmultiwayscripts](./scripts/vcfmultiwayscripts) | more multiway comparisons |
| [vcfgtcompare.sh](./scripts/vcfgtcompare.sh) | annotates records in the first file with genotypes and sites from the second |
| [plotPfst.R](./scripts/plotPfst.R) | plot pFst |
| [vcfregionreduce_and_cut](./scripts/vcfregionreduce_and_cut) | reduce, gzip, and tabix |
| [plotBfst.R](./scripts/plotBfst.R) | plot results of pFst |
| [vcfnobiallelicsnps](./scripts/vcfnobiallelicsnps) | remove biallelic SNPs |
| [vcfindels](./scripts/vcfindels) | show INDELs |
| [vcfmultiway](./scripts/vcfmultiway) | multiway comparison |
| [vcfregionreduce](./scripts/vcfregionreduce) | reduce VCFs using a BED file, gzip them up and create tabix index |
| [vcfprintaltdiscrepancy.sh](./scripts/vcfprintaltdiscrepancy.sh) | runner |
| [vcfclearid](./scripts/vcfclearid) | clear ID field |
| [vcfcomplex](./scripts/vcfcomplex) | remove all SNPs but keep SVs |
| [vcffirstheader](./scripts/vcffirstheader) | show first header |
| [plotXPEHH.R](./scripts/plotXPEHH.R) | plot XPEHH |
| [vcfregionreduce_pipe](./scripts/vcfregionreduce_pipe) | reduce, gzip and tabix in a pipe |
| [vcfplotaltdiscrepancy.sh](./scripts/vcfplotaltdiscrepancy.sh) | plot ALT discrepancy runner |
| [vcfplottstv.sh](./scripts/vcfplottstv.sh) | runner |
| [vcfnoindels](./scripts/vcfnoindels) | remove INDELs |
| [bgziptabix](./scripts/bgziptabix) | runs bgzip on the input and tabix indexes the result |
| [plotHaplotypes.R](./scripts/plotHaplotypes.R) | plot results |
| [vcfplotsitediscrepancy.r](./scripts/vcfplotsitediscrepancy.r) | plot site discrepancy |
| [vcfindelproximity](./scripts/vcfindelproximity) | show SNPs around an INDEL |
| [bed2region](./scripts/bed2region) | convert VCF CHROM column in VCF file to region |
| [vcfplotaltdiscrepancy.r](./scripts/vcfplotaltdiscrepancy.r) | plot ALT discrepancies |
| [plot_roc.r](./scripts/plot_roc.r) | plot ROC |
| [vcfmultiallelic](./scripts/vcfmultiallelic) | remove anything that is not multiallelic |
| [vcfsnps](./scripts/vcfsnps) | show SNPs |
| [vcfvarstats](./scripts/vcfvarstats) | use fastahack to get stats |
| [vcfregionreduce_uncompressed](./scripts/vcfregionreduce_uncompressed) | reduce, gzip and tabix |
| [plotWCfst.R](./scripts/plotWCfst.R) | plot wcFst |
| [vcf2bed.py](./scripts/vcf2bed.py) | transform VCF to BED file |
| [vcfjoincalls](./scripts/vcfjoincalls) | overlay files using QUAL and GT from a second VCF |
| [vcf2sqlite.py](./scripts/vcf2sqlite.py) | push VCF file into SQLite3 database using dbname |
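Many of these scripts apply a single small transformation per record. As an illustration, the coordinate shift involved in turning a VCF record into a BED interval can be sketched as follows. This is not the code of `vcf2bed.py`: `vcf_line_to_bed` is a hypothetical helper, and covering the REF allele is just one possible choice of interval. VCF positions are 1-based, whereas BED intervals are 0-based and half-open.

```python
from typing import Optional


def vcf_line_to_bed(line: str) -> Optional[str]:
    """Convert one VCF line to a BED line covering the REF allele; skip headers."""
    if line.startswith("#"):
        return None
    chrom, pos, id_, ref = line.rstrip("\n").split("\t")[:4]
    start = int(pos) - 1      # 1-based VCF POS becomes a 0-based BED start
    end = start + len(ref)    # half-open end, one past the last REF base
    return "\t".join([chrom, str(start), str(end), id_])


print(vcf_line_to_bed("chr1\t100\trs1\tACG\tA\t50\tPASS\t."))
```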

python

vcflib has rudimentary Python bindings, but they are easy to build upon. See [pyvcflib](./test/pytest/pyvcflib.md).

Development

build from source

vcflib uses the CMake build system. After a recursive checkout of the sources, build the files in the ./build directory with:

git clone --recursive https://github.com/vcflib/vcflib.git
cd vcflib
mkdir -p build && cd build
cmake  -DCMAKE_BUILD_TYPE=Debug -DZIG=OFF -DOPENMP=OFF ..
cmake --build .
cmake --install .

To run the tests:

ctest --verbose

Executables are built into the ./build directory in the repository.

Note: if you have an existing repo, update the submodules and clean the build directory with

git submodule update --init --recursive --progress
cd build
cmake --build . --target clean

Build dependencies can be viewed in the github-CI scripts (see badges above), as well as [guix.scm](./guix.scm) used by us to create the build environment (for instructions see the header of guix.scm). Essentially:

  • cmake
  • C++ compiler
  • htslib
  • tabixpp
  • WFA2
  • pybind11 (for testing)

For include files add

  • libhts-dev
  • libtabixpp-dev
  • libtabixpp0

And for some of the VCF executables

  • python
  • perl

Using a different htslib

Check out htslib in tabixpp (recursively) and run

cmake -DHTSLIB_LOCAL:STRING=./htslib/ ..
cmake --build .

link library

The standard build creates build/vcflib.a. To link against it, take a hint from the [cmake](./CMakeLists.txt) file, which builds all the vcflib tools.

source code

See [vcfecho.cpp](./src/vcfecho.cpp) for basic usage. [Variant.h](./src/Variant.h) and [Variant.cpp](./src/Variant.cpp) describe methods available in the API. vcflib is incorporated into several projects, such as freebayes, which may provide a point of reference for prospective developers. Note vcflib contains submodules (git repositories) comprising some dependencies. A full Guix development environment we use is defined [here](./guix.scm).

adding tests

vcflib uses different test systems. The most important one is the doctest because it doubles as documentation. For an example see [vcf2tsv.md](./test/pytest/vcf2tsv.md) which can be run from the command line with

cd test
python3 -m doctest -o NORMALIZE_WHITESPACE -o REPORT_UDIFF pytest/vcf2tsv.md
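The doctest approach works because `doctest` treats any text file as prose interleaved with interpreter sessions. The sketch below drives the same mechanism from Python, with the two options used on the command line above. The Markdown snippet and `run_markdown_doctest` are hypothetical stand-ins; the real test documents exercise vcflib itself rather than plain string methods.

```python
import doctest

# A hypothetical Markdown test document: prose mixed with interpreter examples.
MARKDOWN_DOC = """
# upper-casing a sample name

>>> "na12878".upper()
'NA12878'

Column spacing may differ between expected and actual output:

>>> print("chr1    100    A    T")
chr1 100 A T
"""


def run_markdown_doctest(text, name="example.md"):
    """Run the examples embedded in `text`; return (failed, attempted)."""
    flags = doctest.NORMALIZE_WHITESPACE | doctest.REPORT_UDIFF
    test = doctest.DocTestParser().get_doctest(
        text, globs={}, name=name, filename=name, lineno=0
    )
    runner = doctest.DocTestRunner(optionflags=flags, verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_markdown_doctest(MARKDOWN_DOC)
print(f"{attempted} examples run, {failed} failed")
```

`NORMALIZE_WHITESPACE` is what lets the second example pass even though the spacing differs, which is useful for tab-delimited VCF output; `REPORT_UDIFF` only changes how a mismatch is displayed.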

We also added support for python bindings and unit tests. See [realign.py](./test/tests/realign.py) for an example.

Support

The developers are on the vcflib matrix channel. Please do not use the github issue tracker for support issues!

Contributing

To contribute code to vcflib send a github pull request. We may ask you to add a working test case as described in 'adding tests'.

LICENSE

This software is distributed under the free software [MIT LICENSE](./LICENSE).

CREDIT

Citations are the bread and butter of Science. If you are using this software in your research and want to support our future work, please cite the following publication:

Vcflib and tools for processing the VCF variant call format; Erik Garrison, Zev N. Kronenberg, Eric T. Dawson, Brent S. Pedersen, Pjotr Prins; doi: https://doi.org/10.1101/2021.05.21.445151

Bibtex reference

Please cite: A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar.

@article{10.1371/journal.pcbi.1009123,
    doi = {10.1371/journal.pcbi.1009123},
    author = {Garrison, Erik AND Kronenberg, Zev N. AND Dawson, Eric T. AND Pedersen, Brent S. AND Prins, Pjotr},
    journal = {PLOS Computational Biology},
    publisher = {Public Library of Science},
    title = {A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar},
    year = {2022},
    month = {05},
    volume = {18},
    url = {https://doi.org/10.1371/journal.pcbi.1009123},
    pages = {1-15}
}

Below is the preprint version of our paper:

@article {Garrison2021.05.21.445151,
    author = {Garrison, Erik and Kronenberg, Zev N. and Dawson, Eric T. and Pedersen, Brent S. and Prins, Pjotr},
    title = {Vcflib and tools for processing the VCF variant call format},
    elocation-id = {2021.05.21.445151},
    year = {2021},
    doi = {10.1101/2021.05.21.445151},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/05/23/2021.05.21.445151},
    eprint = {https://www.biorxiv.org/content/early/2021/05/23/2021.05.21.445151.full.pdf},
    journal = {bioRxiv}
}
