Programming language: C++
License: MIT License
Tags: Biology    
Latest version: v1.0.2


vcflib

A C++ library for parsing and manipulating VCF files.

Github-CI Travis-CI AnacondaBadge DL BrewBadge GuixBadge DebianBadge C++0x Gitter

overview

The Variant Call Format (VCF) is a flat-file, tab-delimited textual format that describes reference-indexed variations between individuals. VCF provides a common interchange format for the description of variation in individuals and populations of samples, and has become the de facto standard reporting format for a wide array of genomic variant detectors.
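To make the tab-delimited layout concrete, here is a minimal C++ sketch that splits one VCF data line into the eight fixed columns of the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and keeps any remaining columns as-is. The `VcfRecord` struct and the `split` and `parse_record` functions are hypothetical names used for illustration only; they are not part of the vcflib API, which covers far more cases.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for illustration; this is NOT vcflib's Variant class.
struct VcfRecord {
    std::string chrom;
    long pos = 0;
    std::string id;
    std::string ref;
    std::string alt;
    std::string qual;    // kept as text because "." means "missing"
    std::string filter;
    std::string info;
    std::vector<std::string> rest;  // FORMAT and per-sample columns, if present
};

// Split a line on a single-character delimiter, keeping empty fields.
std::vector<std::string> split(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = line.find(delim, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}

// Parse one VCF data line (not a '#' header line) into its fixed columns.
VcfRecord parse_record(const std::string& line) {
    const std::vector<std::string> f = split(line, '\t');
    if (f.size() < 8) {
        throw std::runtime_error("expected at least 8 tab-separated columns");
    }
    VcfRecord r;
    r.chrom  = f[0];
    r.pos    = std::stol(f[1]);  // throws if POS is not numeric
    r.id     = f[2];
    r.ref    = f[3];
    r.alt    = f[4];
    r.qual   = f[5];
    r.filter = f[6];
    r.info   = f[7];
    r.rest.assign(f.begin() + 8, f.end());
    return r;
}
```

Keeping QUAL as text is a deliberate choice in this sketch: the value may legitimately be `.`, and a permissive reader should not reject the record because of it.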

vcflib provides methods to manipulate and interpret sequence variation described by VCF. It is both:

  • an API for parsing and operating on records of genomic variation as it can be described by the VCF format
  • a collection of command-line utilities for executing complex manipulations on VCF files

vcflib is both a library (with an API) and a collection of useful tools. The API provides a quick and extremely permissive way to read and write VCF files. Most of the library's usefulness, however, comes from the included utilities (*.cpp), which extend and apply the API.


INSTALL

Bioconda

Conda installs in user land without root access

conda install -c bioconda vcflib

Homebrew

Homebrew installs on Linux and macOS

brew install brewsci/bio/vcflib

Debian

For Debian and Ubuntu

apt-get install libvcflib-tools libvcflib-dev

GNU Guix

We develop against guix

guix package -i vcflib

USAGE

Users are encouraged to drive the utilities in the library in a streaming fashion, using Unix pipes to fully utilize resources on multi-core systems. Piping also provides a convenient way to combine vcflib with other tools (vcf-tools, BedTools, GATK, htslib, bio-vcf, bcftools, freebayes) that exchange data as VCF, allowing the composition of an immense variety of processing functions. Examples can be found in the scripts, e.g. [vcfgtcompare.sh](./scripts/vcfgtcompare.sh).
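As a rough illustration of the streaming style, the following C++ sketch reads VCF text line by line, passes header lines through unchanged, and keeps only records whose QUAL column meets a threshold. The function names `column` and `filter_by_qual` and the threshold semantics are assumptions made for this example; this is not how vcffilter or any other vcflib tool is implemented.

```cpp
#include <cstddef>
#include <exception>
#include <iostream>
#include <sstream>
#include <string>

// Return the zero-based tab-separated column `index` of `line`, or "" if absent.
std::string column(const std::string& line, std::size_t index) {
    std::istringstream fields(line);
    std::string field;
    for (std::size_t i = 0; std::getline(fields, field, '\t'); ++i) {
        if (i == index) return field;
    }
    return "";
}

// Copy header lines and records with QUAL >= min_qual from `in` to `out`.
// Records with a missing (".") or non-numeric QUAL are dropped.
// Returns the number of records written.
std::size_t filter_by_qual(std::istream& in, std::ostream& out, double min_qual) {
    std::size_t kept = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == '#') {          // meta-information and column header lines
            out << line << '\n';
            continue;
        }
        const std::string qual = column(line, 5);  // QUAL is the sixth column
        if (qual.empty() || qual == ".") continue;
        try {
            if (std::stod(qual) >= min_qual) {
                out << line << '\n';
                ++kept;
            }
        } catch (const std::exception&) {
            // Non-numeric QUAL: skip the record rather than abort the stream.
        }
    }
    return kept;
}
```

Calling `filter_by_qual(std::cin, std::cout, 30.0)` from `main` would turn this into a filter that can sit between other commands in a pipeline. Because it never holds more than one line in memory, each stage of such a pipeline can run concurrently with the others.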

TOOLS

<!--

>>> from pytest.rtest import run_stdout, head, cat

-->

<!-- Created with ./scripts/bin2md.rb --index -->

filter

filter command description
[vcfuniq](./doc/vcfuniq.md)
[vcfuniqalleles](./doc/vcfuniqalleles.md)
[vcffilter](./doc/vcffilter.md)

metrics

metrics command description
[vcfcheck](./doc/vcfcheck.md)
[vcfhethomratio](./doc/vcfhethomratio.md)
[vcfhetcount](./doc/vcfhetcount.md)
[vcfdistance](./doc/vcfdistance.md)
[vcfentropy](./doc/vcfentropy.md)

phenotype

phenotype command description
[permuteGPAT++](./doc/permuteGPAT++.md)

genotype

genotype command description
[normalize-iHS](./doc/normalize-iHS.md)
[hapLrt](./doc/hapLrt.md)
[abba-baba](./doc/abba-baba.md)

transformation

transformation command description
[vcfinfo2qual](./doc/vcfinfo2qual.md)
[vcfsamplediff](./doc/vcfsamplediff.md)
[vcfaddinfo](./doc/vcfaddinfo.md)
[vcfremoveaberrantgenotypes](./doc/vcfremoveaberrantgenotypes.md)
[vcfglxgt](./doc/vcfglxgt.md)
[dumpContigsFromHeader](./doc/dumpContigsFromHeader.md)
[vcfevenregions](./doc/vcfevenregions.md)
[vcfcat](./doc/vcfcat.md)
[vcfannotategenotypes](./doc/vcfannotategenotypes.md)
[vcfafpath](./doc/vcfafpath.md)
[vcfclassify](./doc/vcfclassify.md)
[vcfallelicprimitives](./doc/vcfallelicprimitives.md)
[vcfqual2info](./doc/vcfqual2info.md)
[vcfcreatemulti](./doc/vcfcreatemulti.md)
[vcfgeno2alleles](./doc/vcfgeno2alleles.md)
[vcfsample2info](./doc/vcfsample2info.md)
[vcfld](./doc/vcfld.md)
[vcfnumalt](./doc/vcfnumalt.md)
[vcfstreamsort](./doc/vcfstreamsort.md)
[vcfinfosummarize](./doc/vcfinfosummarize.md)
[vcflength](./doc/vcflength.md)
[vcfkeepgeno](./doc/vcfkeepgeno.md)
[vcfcombine](./doc/vcfcombine.md)
[vcfprimers](./doc/vcfprimers.md)
[vcfflatten](./doc/vcfflatten.md)
[vcf2dag](./doc/vcf2dag.md)
[vcfcleancomplex](./doc/vcfcleancomplex.md)
[vcfbreakmulti](./doc/vcfbreakmulti.md)
[vcfindex](./doc/vcfindex.md)
[vcfkeepinfo](./doc/vcfkeepinfo.md)
[vcfgeno2haplo](./doc/vcfgeno2haplo.md)
[vcfintersect](./doc/vcfintersect.md)
[vcfannotate](./doc/vcfannotate.md)
[smoother](./doc/smoother.md)
[vcf2fasta](./doc/vcf2fasta.md)
[vcfsamplenames](./doc/vcfsamplenames.md)
[vcfleftalign](./doc/vcfleftalign.md)
[vcfglbound](./doc/vcfglbound.md)
[vcfcommonsamples](./doc/vcfcommonsamples.md)
[vcfecho](./doc/vcfecho.md)
[vcfkeepsamples](./doc/vcfkeepsamples.md)
[vcf2tsv](./doc/vcf2tsv.md)
[vcfoverlay](./doc/vcfoverlay.md)
[vcfgenosamplenames](./doc/vcfgenosamplenames.md)
[vcfremovesamples](./doc/vcfremovesamples.md)
[vcfremap](./doc/vcfremap.md)
[vcffixup](./doc/vcffixup.md)

statistics

statistics command description
[vcfgenosummarize](./doc/vcfgenosummarize.md)
[vcfcountalleles](./doc/vcfcountalleles.md)
[meltEHH](./doc/meltEHH.md)
[genotypeSummary](./doc/genotypeSummary.md)
[vcfrandomsample](./doc/vcfrandomsample.md)
[pVst](./doc/pVst.md)
[vcfrandom](./doc/vcfrandom.md)
[segmentFst](./doc/segmentFst.md)
[sequenceDiversity](./doc/sequenceDiversity.md)
[segmentIhs](./doc/segmentIhs.md)
[vcfgenotypes](./doc/vcfgenotypes.md)
[vcfaltcount](./doc/vcfaltcount.md)
[plotHaps](./doc/plotHaps.md)
[vcfsitesummarize](./doc/vcfsitesummarize.md)
[vcfgenotypecompare](./doc/vcfgenotypecompare.md)
[vcfstats](./doc/vcfstats.md)
[wcFst](./doc/wcFst.md)
[permuteSmooth](./doc/permuteSmooth.md)
[bFst](./doc/bFst.md)
[vcfroc](./doc/vcfroc.md)
[vcfparsealts](./doc/vcfparsealts.md)
[pFst](./doc/pFst.md)
[iHS](./doc/iHS.md)
[popStats](./doc/popStats.md)

See also [vcflib.md](./doc/vcflib.md).

scripts

The vcflib source repository contains a number of additional scripts. Click on the link to see the source code.

| script | description |
| :-- | :-- |
| [vcfclearinfo](./scripts/vcfclearinfo) | clear INFO field |
| [vcfqualfilter](./scripts/vcfqualfilter) | quality filter |
| [vcfnulldotslashdot](./scripts/vcfnulldotslashdot) | rewrite null genotypes to ./. |
| [vcfprintaltdiscrepancy.r](./scripts/vcfprintaltdiscrepancy.r) | show ALT discrepancies in a table |
| [vcfremovenonATGC](./scripts/vcfremovenonATGC) | remove non-nucleotides in REF or ALT |
| [plotSmoothed.R](./scripts/plotSmoothed.R) | smooth plot of wcFst, pFst or abba-baba |
| [vcf_strip_extra_headers](./scripts/vcf_strip_extra_headers) | strip headers |
| [plotHapLrt.R](./scripts/plotHapLrt.R) | plot results of pFst |
| [vcfbiallelic](./scripts/vcfbiallelic) | remove anything that is not biallelic |
| [vcfsort](./scripts/vcfsort) | sort VCF using shell script |
| [vcfnosnps](./scripts/vcfnosnps) | remove SNPs |
| [vcfmultiwayscripts](./scripts/vcfmultiwayscripts) | more multiway comparisons |
| [vcfgtcompare.sh](./scripts/vcfgtcompare.sh) | annotates records in the first file with genotypes and sites from the second |
| [plotPfst.R](./scripts/plotPfst.R) | plot pFst |
| [vcfregionreduce_and_cut](./scripts/vcfregionreduce_and_cut) | reduce, gzip, and tabix |
| [plotBfst.R](./scripts/plotBfst.R) | plot results of pFst |
| [vcfnobiallelicsnps](./scripts/vcfnobiallelicsnps) | remove biallelic SNPs |
| [vcfindels](./scripts/vcfindels) | show INDELS |
| [vcfmultiway](./scripts/vcfmultiway) | multiway comparison |
| [vcfregionreduce](./scripts/vcfregionreduce) | reduce VCFs using a BED File, gzip them up and create tabix index |
| [vcfprintaltdiscrepancy.sh](./scripts/vcfprintaltdiscrepancy.sh) | runner |
| [vcfclearid](./scripts/vcfclearid) | clear ID field |
| [vcfcomplex](./scripts/vcfcomplex) | remove all SNPs but keep SVs |
| [vcffirstheader](./scripts/vcffirstheader) | show first header |
| [plotXPEHH.R](./scripts/plotXPEHH.R) | plot XPEHH |
| [vcfregionreduce_pipe](./scripts/vcfregionreduce_pipe) | reduce, gzip and tabix in a pipe |
| [vcfplotaltdiscrepancy.sh](./scripts/vcfplotaltdiscrepancy.sh) | plot ALT discrepancy runner |
| [vcfplottstv.sh](./scripts/vcfplottstv.sh) | runner |
| [vcfnoindels](./scripts/vcfnoindels) | remove INDELs |
| [bgziptabix](./scripts/bgziptabix) | runs bgzip on the input and tabix indexes the result |
| [plotHaplotypes.R](./scripts/plotHaplotypes.R) | plot results |
| [vcfplotsitediscrepancy.r](./scripts/vcfplotsitediscrepancy.r) | plot site discrepancy |
| [vcfindelproximity](./scripts/vcfindelproximity) | show SNPs around an INDEL |
| [bed2region](./scripts/bed2region) | convert VCF CHROM column in VCF file to region |
| [vcfplotaltdiscrepancy.r](./scripts/vcfplotaltdiscrepancy.r) | plot ALT discrepancies |
| [plot_roc.r](./scripts/plot_roc.r) | plot ROC |
| [vcfmultiallelic](./scripts/vcfmultiallelic) | remove anything that is not multiallelic |
| [vcfsnps](./scripts/vcfsnps) | show SNPs |
| [vcfvarstats](./scripts/vcfvarstats) | use fastahack to get stats |
| [vcfregionreduce_uncompressed](./scripts/vcfregionreduce_uncompressed) | reduce, gzip and tabix |
| [plotWCfst.R](./scripts/plotWCfst.R) | plot wcFst |
| [vcf2bed.py](./scripts/vcf2bed.py) | transform VCF to BED file |
| [vcfjoincalls](./scripts/vcfjoincalls) | overlay files using QUAL and GT from a second VCF |
| [vcf2sqlite.py](./scripts/vcf2sqlite.py) | push VCF file into SQLite3 database using dbname |

Development

build from source

vcflib uses the CMake build system. After a recursive checkout of the sources, build the files in the ./build directory with:

git clone --recursive https://github.com/vcflib/vcflib.git
cd vcflib
mkdir -p build && cd build
cmake ..
cmake --build .
cmake --install .

To run the tests:

ctest --verbose

Executables are built into the ./build directory in the repository.

Build dependencies can be viewed in the Travis-CI and github-CI scripts (see badges above), as well as [guix.scm](./guix.scm) used by us to create the build environment (for instructions see the header of guix.scm). Essentially:

  • C++ compiler
  • htslib
  • tabixpp

For include files add

  • libhts-dev
  • libtabixpp-dev
  • libtabixpp0

And for some of the VCF executables

  • python
  • perl

Using a different htslib

Check out htslib in tabixpp (recursively) and point the build at it:

cmake -DHTSLIB_LOCAL:STRING=./tabixpp/htslib/ ..
cmake --build .

link library

The standard build creates build/vcflib.a. To link against it, take a hint from the [CMakeLists.txt](./CMakeLists.txt) file that builds all the vcflib tools.

source code

See [vcfecho.cpp](./src/vcfecho.cpp) for basic usage. [Variant.h](./src/Variant.h) and [Variant.cpp](./src/Variant.cpp) describe methods available in the API. vcflib is incorporated into several projects, such as freebayes, which may provide a point of reference for prospective developers. Note vcflib contains submodules (git repositories) comprising some dependencies. A full Guix development environment we use is defined [here](./guix.scm).

adding tests

vcflib uses several test systems. The most important one is the doctest because it doubles as documentation. For an example see [vcf2tsv.md](./test/pytest/vcf2tsv.md), which can be run from the command line with

cd test
python3 -m doctest -o NORMALIZE_WHITESPACE -o REPORT_UDIFF pytest/vcf2tsv.md

Contributing

To contribute code to vcflib, send a GitHub pull request. We may ask you to add a working test case as described in 'adding tests'.

LICENSE

This software is distributed under the free software [MIT LICENSE](./LICENSE).

