vcflib
A C++ library for parsing and manipulating VCF files.
Vcflib and related tools are the workhorses in bioinformatics for processing the VCF variant calling format. See
Vcflib and tools for processing the VCF variant call format; Erik Garrison, Zev N. Kronenberg, Eric T. Dawson, Brent S. Pedersen, Pjotr Prins; doi: https://doi.org/10.1101/2021.05.21.445151
news
May 2022: the vcflib paper has been published in PLOS Computational Biology!
See below for the citation.
April 2022: vcflib has just gone pangenome!
By introducing the wavefront algorithm, we can now realign long sequences and reduce the call complexity (and false positives!) introduced by pangenome variant callers, using the new [vcfwave](./doc/vcfwave.md) tool.
See also [RELEASE_NOTES.md](./RELEASE_NOTES.md)
overview
The Variant Call Format (VCF) is a flat-file, tab-delimited textual format that describes reference-indexed variations between individuals. VCF provides a common interchange format for the description of variation in individuals and populations of samples, and has become the de facto standard reporting format for a wide array of genomic variant detectors.
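To make the format concrete, the following minimal Python sketch splits one VCF data line into the eight fixed columns defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is not part of vcflib: the function name `parse_vcf_line` and the sample record are made up for illustration, and FORMAT and per-sample columns are kept as raw strings.

```python
# Illustrative only: a tiny stand-alone parser, not the vcflib API.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into a dict of fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(FIXED_COLUMNS):
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    record = dict(zip(FIXED_COLUMNS, fields))
    record["POS"] = int(record["POS"])        # position on the reference sequence
    record["ALT"] = record["ALT"].split(",")  # alternate alleles are comma-separated
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True
    record["INFO"] = info
    # FORMAT and per-sample columns, left unparsed in this sketch
    record["SAMPLES"] = fields[len(FIXED_COLUMNS):]
    return record


# Hypothetical record, made up for this example
example = "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20;DB\tGT\t0/1"
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["ALT"], rec["INFO"])
```

A real parser also has to interpret the header (`##` meta lines and the `#CHROM` column line) to know the type of each INFO and FORMAT key; that is deliberately left out here.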
vcflib provides methods to manipulate and interpret sequence variation described by VCF. It is both:
- an API for parsing and operating on records of genomic variation as it can be described by the VCF format
- a collection of command-line utilities for executing complex manipulations on VCF files
vcflib is both a library (with an API) and a collection of useful tools. The API provides a quick and extremely permissive method to read and write VCF files. Extensions and applications of the library provided in the included utilities (*.cpp) comprise the vast bulk of the library's utility.
We have also added infrastructure to write Python bindings. See below.
INSTALL
For latest updates see [RELEASE NOTES](./RELEASE_NOTES.md).
Bioconda
Conda installs in user land without root access
conda install -c bioconda vcflib
Homebrew
Homebrew installs on Linux and macOS
brew install brewsci/bio/vcflib
Debian
For Debian and Ubuntu
apt-get install libvcflib-tools libvcflib-dev
GNU Guix
We develop against guix and vcflib is packaged as
guix package -i vcflib
See also the Guix shell below.
USAGE
Users are encouraged to drive the utilities in the library in a streaming fashion, using Unix pipes to fully utilize resources on multi-core systems. Piping provides a convenient method to interface with other libraries (vcf-tools, BedTools, GATK, htslib, bio-vcf, bcftools, freebayes) which interface via VCF files, allowing the composition of an immense variety of processing functions. Examples can be found in the scripts, e.g. [script](./scripts/vcfgtcompare.sh).
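As a rough illustration of the streaming style described above, the Python sketch below behaves like one stage in a Unix pipe: it consumes VCF text line by line, passes header lines through, and keeps only records at or above a QUAL threshold. The function name `filter_by_qual`, the threshold, and the sample records are assumptions made for this example; it is not one of the vcflib tools.

```python
def filter_by_qual(lines, min_qual):
    """Yield header lines unchanged and data lines whose QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line            # meta lines and the column header pass through
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue              # skip blank or malformed lines
        qual = fields[5]          # QUAL is the sixth column
        if qual != "." and float(qual) >= min_qual:
            yield line


# Made-up input standing in for a stream arriving on stdin
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t20\t.\tC\tT\t5\tPASS\t.\n",
    "chr1\t30\t.\tG\tA\t.\tPASS\t.\n",
]
kept = list(filter_by_qual(demo, min_qual=30.0))
for line in kept:
    print(line, end="")
```

Because the function is a generator, it never holds more than one record in memory. To use it as an actual pipe stage, one would pass `sys.stdin` as `lines` and write each yielded line to `sys.stdout`.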
TOOLS
filter
filter command | description |
---|---|
[vcfuniq](./doc/vcfuniq.md) | |
[vcfuniqalleles](./doc/vcfuniqalleles.md) | |
[vcffilter](./doc/vcffilter.md) | |
metrics
metrics command | description |
---|---|
[vcfcheck](./doc/vcfcheck.md) | |
[vcfhethomratio](./doc/vcfhethomratio.md) | |
[vcfhetcount](./doc/vcfhetcount.md) | |
[vcfdistance](./doc/vcfdistance.md) | |
[vcfentropy](./doc/vcfentropy.md) | |
phenotype
phenotype command | description |
---|---|
[permuteGPAT++](./doc/permuteGPAT++.md) | |
genotype
genotype command | description |
---|---|
[normalize-iHS](./doc/normalize-iHS.md) | |
[hapLrt](./doc/hapLrt.md) | |
[abba-baba](./doc/abba-baba.md) | |
transformation
transformation command | description |
---|---|
[vcfinfo2qual](./doc/vcfinfo2qual.md) | |
[vcfsamplediff](./doc/vcfsamplediff.md) | |
[vcfaddinfo](./doc/vcfaddinfo.md) | |
[vcfremoveaberrantgenotypes](./doc/vcfremoveaberrantgenotypes.md) | |
[vcfglxgt](./doc/vcfglxgt.md) | |
[dumpContigsFromHeader](./doc/dumpContigsFromHeader.md) | |
[vcfevenregions](./doc/vcfevenregions.md) | |
[vcfcat](./doc/vcfcat.md) | |
[vcfannotategenotypes](./doc/vcfannotategenotypes.md) | |
[vcfafpath](./doc/vcfafpath.md) | |
[vcfclassify](./doc/vcfclassify.md) | |
[vcfallelicprimitives](./doc/vcfallelicprimitives.md) | |
[vcfqual2info](./doc/vcfqual2info.md) | |
[vcfcreatemulti](./doc/vcfcreatemulti.md) | |
[vcfgeno2alleles](./doc/vcfgeno2alleles.md) | |
[vcfsample2info](./doc/vcfsample2info.md) | |
[vcfld](./doc/vcfld.md) | |
[vcfnumalt](./doc/vcfnumalt.md) | |
[vcfstreamsort](./doc/vcfstreamsort.md) | |
[vcfinfosummarize](./doc/vcfinfosummarize.md) | |
[vcflength](./doc/vcflength.md) | |
[vcfkeepgeno](./doc/vcfkeepgeno.md) | |
[vcfcombine](./doc/vcfcombine.md) | |
[vcfprimers](./doc/vcfprimers.md) | |
[vcfflatten](./doc/vcfflatten.md) | |
[vcf2dag](./doc/vcf2dag.md) | |
[vcfcleancomplex](./doc/vcfcleancomplex.md) | |
[vcfbreakmulti](./doc/vcfbreakmulti.md) | |
[vcfindex](./doc/vcfindex.md) | |
[vcfkeepinfo](./doc/vcfkeepinfo.md) | |
[vcfgeno2haplo](./doc/vcfgeno2haplo.md) | |
[vcfintersect](./doc/vcfintersect.md) | |
[vcfannotate](./doc/vcfannotate.md) | |
[smoother](./doc/smoother.md) | |
[vcf2fasta](./doc/vcf2fasta.md) | |
[vcfsamplenames](./doc/vcfsamplenames.md) | |
[vcfleftalign](./doc/vcfleftalign.md) | |
[vcfglbound](./doc/vcfglbound.md) | |
[vcfcommonsamples](./doc/vcfcommonsamples.md) | |
[vcfecho](./doc/vcfecho.md) | |
[vcfkeepsamples](./doc/vcfkeepsamples.md) | |
[vcf2tsv](./doc/vcf2tsv.md) | |
[vcfoverlay](./doc/vcfoverlay.md) | |
[vcfgenosamplenames](./doc/vcfgenosamplenames.md) | |
[vcfremovesamples](./doc/vcfremovesamples.md) | |
[vcfremap](./doc/vcfremap.md) | |
[vcffixup](./doc/vcffixup.md) | |
statistics
statistics command | description |
---|---|
[vcfgenosummarize](./doc/vcfgenosummarize.md) | |
[vcfcountalleles](./doc/vcfcountalleles.md) | |
[meltEHH](./doc/meltEHH.md) | |
[genotypeSummary](./doc/genotypeSummary.md) | |
[vcfrandomsample](./doc/vcfrandomsample.md) | |
[pVst](./doc/pVst.md) | |
[vcfrandom](./doc/vcfrandom.md) | |
[segmentFst](./doc/segmentFst.md) | |
[sequenceDiversity](./doc/sequenceDiversity.md) | |
[segmentIhs](./doc/segmentIhs.md) | |
[vcfgenotypes](./doc/vcfgenotypes.md) | |
[vcfaltcount](./doc/vcfaltcount.md) | |
[plotHaps](./doc/plotHaps.md) | |
[vcfsitesummarize](./doc/vcfsitesummarize.md) | |
[vcfgenotypecompare](./doc/vcfgenotypecompare.md) | |
[vcfstats](./doc/vcfstats.md) | |
[wcFst](./doc/wcFst.md) | |
[permuteSmooth](./doc/permuteSmooth.md) | |
[bFst](./doc/bFst.md) | |
[vcfroc](./doc/vcfroc.md) | |
[vcfparsealts](./doc/vcfparsealts.md) | |
[pFst](./doc/pFst.md) | |
[iHS](./doc/iHS.md) | |
[popStats](./doc/popStats.md) | |
See also [vcflib.md](./doc/vcflib.md).
scripts
The vcflib source repository contains a number of additional scripts. Click on the link to see the source code.
script | description |
---|---|
[vcfclearinfo](./scripts/vcfclearinfo) | clear INFO field |
[vcfqualfilter](./scripts/vcfqualfilter) | quality filter |
[vcfnulldotslashdot](./scripts/vcfnulldotslashdot) | rewrite null genotypes to ./. |
[vcfprintaltdiscrepancy.r](./scripts/vcfprintaltdiscrepancy.r) | show ALT discrepancies in a table |
[vcfremovenonATGC](./scripts/vcfremovenonATGC) | remove non-nucleotides in REF or ALT |
[plotSmoothed.R](./scripts/plotSmoothed.R) | smooth plot of wcFst, pFst or abba-baba |
[vcf_strip_extra_headers](./scripts/vcf_strip_extra_headers) | strip headers |
[plotHapLrt.R](./scripts/plotHapLrt.R) | plot results of pFst |
[vcfbiallelic](./scripts/vcfbiallelic) | remove anything that is not biallelic |
[vcfsort](./scripts/vcfsort) | sort VCF using shell script |
[vcfnosnps](./scripts/vcfnosnps) | remove SNPs |
[vcfmultiwayscripts](./scripts/vcfmultiwayscripts) | more multiway comparisons |
[vcfgtcompare.sh](./scripts/vcfgtcompare.sh) | annotates records in the first file with genotypes and sites from the second |
[plotPfst.R](./scripts/plotPfst.R) | plot pFst |
[vcfregionreduce_and_cut](./scripts/vcfregionreduce_and_cut) | reduce, gzip, and tabix |
[plotBfst.R](./scripts/plotBfst.R) | plot results of pFst |
[vcfnobiallelicsnps](./scripts/vcfnobiallelicsnps) | remove biallelic SNPs |
[vcfindels](./scripts/vcfindels) | show INDELS |
[vcfmultiway](./scripts/vcfmultiway) | multiway comparison |
[vcfregionreduce](./scripts/vcfregionreduce) | reduce VCFs using a BED File, gzip them up and create tabix index |
[vcfprintaltdiscrepancy.sh](./scripts/vcfprintaltdiscrepancy.sh) | runner |
[vcfclearid](./scripts/vcfclearid) | clear ID field |
[vcfcomplex](./scripts/vcfcomplex) | remove all SNPs but keep SVs |
[vcffirstheader](./scripts/vcffirstheader) | show first header |
[plotXPEHH.R](./scripts/plotXPEHH.R) | plot XPEHH |
[vcfregionreduce_pipe](./scripts/vcfregionreduce_pipe) | reduce, gzip and tabix in a pipe |
[vcfplotaltdiscrepancy.sh](./scripts/vcfplotaltdiscrepancy.sh) | plot ALT discrepancy runner |
[vcfplottstv.sh](./scripts/vcfplottstv.sh) | runner |
[vcfnoindels](./scripts/vcfnoindels) | remove INDELs |
[bgziptabix](./scripts/bgziptabix) | runs bgzip on the input and tabix indexes the result |
[plotHaplotypes.R](./scripts/plotHaplotypes.R) | plot results |
[vcfplotsitediscrepancy.r](./scripts/vcfplotsitediscrepancy.r) | plot site discrepancy |
[vcfindelproximity](./scripts/vcfindelproximity) | show SNPs around an INDEL |
[bed2region](./scripts/bed2region) | convert VCF CHROM column in VCF file to region |
[vcfplotaltdiscrepancy.r](./scripts/vcfplotaltdiscrepancy.r) | plot ALT discrepancies |
[plot_roc.r](./scripts/plot_roc.r) | plot ROC |
[vcfmultiallelic](./scripts/vcfmultiallelic) | remove anything that is not multiallelic |
[vcfsnps](./scripts/vcfsnps) | show SNPs |
[vcfvarstats](./scripts/vcfvarstats) | use fastahack to get stats |
[vcfregionreduce_uncompressed](./scripts/vcfregionreduce_uncompressed) | reduce, gzip and tabix |
[plotWCfst.R](./scripts/plotWCfst.R) | plot wcFst |
[vcf2bed.py](./scripts/vcf2bed.py) | transform VCF to BED file |
[vcfjoincalls](./scripts/vcfjoincalls) | overlay files using QUAL and GT from a second VCF |
[vcf2sqlite.py](./scripts/vcf2sqlite.py) | push VCF file into SQLite3 database using dbname |
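Several of these scripts are thin coordinate or field transformations. As an example of the idea behind a VCF-to-BED conversion, the sketch below maps a 1-based VCF position to BED's 0-based, half-open interval, using the length of REF as the span. This is a generic illustration and not necessarily how `vcf2bed.py` is implemented; the function name `vcf_to_bed_interval` is hypothetical.

```python
def vcf_to_bed_interval(chrom, pos, ref):
    """Convert a 1-based VCF position to a 0-based, half-open BED interval."""
    start = pos - 1          # BED starts counting at 0
    end = start + len(ref)   # half-open end covers every reference base in REF
    return chrom, start, end


# Made-up records: a SNP at position 100 and a deletion whose REF spans 4 bases
print(vcf_to_bed_interval("chr1", 100, "A"))     # ('chr1', 99, 100)
print(vcf_to_bed_interval("chr1", 200, "ACGT"))  # ('chr1', 199, 203)
```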
python
vcflib has rudimentary Python bindings, but they are easy to build upon. See [pyvcflib](./test/pytest/pyvcflib.md).
Development
build from source
vcflib uses the CMake build system. After a recursive checkout of the sources, build the files in the ./build directory with:
git clone --recursive https://github.com/vcflib/vcflib.git
cd vcflib
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DZIG=OFF -DOPENMP=OFF ..
cmake --build .
cmake --install .
and to run the tests
ctest --verbose
Executables are built into the ./build directory in the repository.
Note: if you have an existing repo, update the submodules with
git submodule update --init --recursive --progress
cd build
cmake --build . --target clean
Build dependencies can be viewed in the github-CI scripts (see badges above), as well as [guix.scm](./guix.scm) used by us to create the build environment (for instructions see the header of guix.scm). Essentially:
- cmake
- C++ compiler
- htslib
- tabixpp
- WFA2
- pybind11 (for testing)
For include files add
- libhts-dev
- libtabixpp-dev
- libtabixpp0
And for some of the VCF executables
- python
- perl
Using a different htslib
Check out htslib in tabixpp (recursively) and
cmake -DHTSLIB_LOCAL:STRING=./htslib/ ..
cmake --build .
link library
The standard build creates build/vcflib.a. Take a hint from the [cmake](./CMakeLists.txt) file that builds all the vcflib tools.
source code
See [vcfecho.cpp](./src/vcfecho.cpp) for basic usage. [Variant.h](./src/Variant.h) and [Variant.cpp](./src/Variant.cpp) describe methods available in the API. vcflib is incorporated into several projects, such as freebayes, which may provide a point of reference for prospective developers. Note vcflib contains submodules (git repositories) comprising some dependencies. A full Guix development environment we use is defined [here](./guix.scm).
adding tests
vcflib uses different test systems. The most important one is the doctest because it doubles as documentation. For an example see [vcf2tsv.md](./test/pytest/vcf2tsv.md) which can be run from the command line with
cd test
python3 -m doctest -o NORMALIZE_WHITESPACE -o REPORT_UDIFF pytest/vcf2tsv.md
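The mechanism behind such a Markdown doctest can be sketched with Python's standard `doctest` module: every `>>>` prompt found in the text is executed and its output is compared with the lines that follow. The sketch below runs a made-up document string through `DocTestParser` and `DocTestRunner` with the same `NORMALIZE_WHITESPACE` option. The document text and the name `run_markdown_doctest` are hypothetical and are not taken from the vcflib test suite.

```python
import doctest

# Made-up Markdown text with an embedded interactive example
DOC = """
# Example page

Column counts can be checked inline:

>>> len("chr1\\t100\\t.\\tA\\tG".split("\\t"))
5
"""


def run_markdown_doctest(text, name="example.md"):
    """Run all >>> examples found in text and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(
        verbose=False, optionflags=doctest.NORMALIZE_WHITESPACE
    )
    runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


print(run_markdown_doctest(DOC))
```

Prose between the examples is ignored by the parser, which is what lets one file serve as both documentation and test.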
We also added support for python bindings and unit tests. See [realign.py](./test/tests/realign.py) for an example.
Support
The developers are on the vcflib matrix channel. Please do not use the github issue tracker for support issues!
Contributing
To contribute code to vcflib, send a GitHub pull request. We may ask you to add a working test case as described in 'adding tests'.
LICENSE
This software is distributed under the free software [MIT LICENSE](./LICENSE).
CREDIT
Citations are the bread and butter of Science. If you are using this software in your research and want to support our future work, please cite the following publication:
Vcflib and tools for processing the VCF variant call format; Erik Garrison, Zev N. Kronenberg, Eric T. Dawson, Brent S. Pedersen, Pjotr Prins; doi: https://doi.org/10.1101/2021.05.21.445151
Bibtex reference
Please cite: A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar.
@article{10.1371/journal.pcbi.1009123,
doi = {10.1371/journal.pcbi.1009123},
author = {Garrison, Erik AND Kronenberg, Zev N. AND Dawson, Eric T. AND Pedersen, Brent S. AND Prins, Pjotr},
journal = {PLOS Computational Biology},
publisher = {Public Library of Science},
title = {A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar},
year = {2022},
month = {05},
volume = {18},
url = {https://doi.org/10.1371/journal.pcbi.1009123},
pages = {1-15}
}
Below is the preprint version of our paper:
@article {Garrison2021.05.21.445151,
author = {Garrison, Erik and Kronenberg, Zev N. and Dawson, Eric T. and Pedersen, Brent S. and Prins, Pjotr},
title = {Vcflib and tools for processing the VCF variant call format},
elocation-id = {2021.05.21.445151},
year = {2021},
doi = {10.1101/2021.05.21.445151},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2021/05/23/2021.05.21.445151},
eprint = {https://www.biorxiv.org/content/early/2021/05/23/2021.05.21.445151.full.pdf},
journal = {bioRxiv}
}