VCFtools

Description

From the VCFtools Home Page: VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.

This toolset can be used to perform the following operations on VCF files:

  • Filter out specific variants
  • Compare files
  • Summarize variants
  • Convert to different file types
  • Validate and merge files
  • Create intersections and subsets of variants

VCFtools consists of two parts: a Perl module and a binary executable. The Perl module is a general Perl API for manipulating VCF files, whereas the binary executable provides general analysis routines.

Version

  • 0.1.14

Authorized Users

  • CIRCE account holders
  • RRA account holders
  • SC account holders

Platforms

  • CIRCE cluster
  • RRA cluster
  • SC cluster

Modules

VCFtools requires the following module file to run:

  • apps/vcftools/0.1.14

Running VCFtools on CIRCE/SC

The VCFtools user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your analyses. Please refer to the Documentation section for a link to the guide.

  • Note on CIRCE: Make sure to run your jobs from your $WORK directory!
  • Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

Interactive Mode

Use the following commands to open an interactive session with srun, load the VCFtools module, and run the vcftools binary:

[user@login0 ~]$ srun --time=48:00:00 --nodes=1 --cpus-per-task=1 --pty /bin/bash
[user@wh-520-4-1 ~]$ module load apps/vcftools/0.1.14
[user@wh-520-4-1 ~]$ vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000
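Run as shown, the command above only reports how many sites pass the region filter (vcftools writes this summary to its log); it does not write a filtered VCF file. The following is a minimal sketch of one way to save the retained variants, using the `--recode`, `--recode-INFO-all`, and `--out` options of vcftools. The output prefix `chr1_1mb_2mb` is an assumed name chosen for illustration, and the guard around the vcftools call is only there so the snippet does nothing in a shell where vcftools or the input file is missing.

```shell
# Assumed names for illustration; replace with your own files.
input="input_data.vcf"
out_prefix="chr1_1mb_2mb"

# Same region filter as above, plus output options:
#   --recode           write the retained sites to a new VCF file
#   --recode-INFO-all  keep all INFO fields in the recoded file
#   --out              prefix for output files (vcftools defaults to "out")
if command -v vcftools >/dev/null 2>&1 && [ -f "$input" ]; then
    vcftools --vcf "$input" --chr 1 --from-bp 1000000 --to-bp 2000000 \
        --recode --recode-INFO-all --out "$out_prefix"
fi

# With --recode, the filtered variants are expected in this file:
recoded_vcf="${out_prefix}.recode.vcf"
echo "Filtered VCF: ${recoded_vcf}"
```

The same options can be added to the vcftools line of a batch script. Check the VCFtools manual for the exact behavior of each option in version 0.1.14.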

Batch Job submission

Jobs that would take more than 30 minutes to run on a standard PC must be submitted to the scheduling environment as batch jobs on the CIRCE/SC cluster.

If, for example, you wish to keep only the variants in a region of chromosome 1 (here, positions 1,000,000 to 2,000,000), you would set up a submit script that uses vcftools like this:

#!/bin/bash
#
#SBATCH --job-name=vcftools-test
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=output.%j.vcftools-test

#### Slurm 1-processor vcftools test to run for 1 hour.

# Load the vcftools module:
module load apps/vcftools/0.1.14

# Start vcftools
vcftools --vcf input_data.vcf --chr 1 --from-bp 1000000 --to-bp 2000000
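When the same filter has to be applied to several chromosomes, a SLURM job array can run one chromosome per task. The sketch below is an assumption about how such a script might look, not a tested site recipe: the array range `1-22`, the job name, and the output prefix are illustrative, and it presumes the chromosomes are named `1` to `22` in the VCF file. `SLURM_ARRAY_TASK_ID` is set by SLURM for each array task; the fallback to `1` and the guards only let the script be tried outside the scheduler.

```shell
#!/bin/bash
#
#SBATCH --job-name=vcftools-array
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --array=1-22
#SBATCH --output=output.%A_%a.vcftools-array

# Chromosome handled by this task (falls back to 1 outside a job array).
chrom="${SLURM_ARRAY_TASK_ID:-1}"
out_prefix="chr${chrom}_filtered"

# Load the vcftools module when the module command is available.
if command -v module >/dev/null 2>&1; then
    module load apps/vcftools/0.1.14
fi

# Keep only the sites on this chromosome and write them to
# ${out_prefix}.recode.vcf.
if command -v vcftools >/dev/null 2>&1 && [ -f input_data.vcf ]; then
    vcftools --vcf input_data.vcf --chr "$chrom" \
        --recode --recode-INFO-all --out "$out_prefix"
fi
echo "Task for chromosome ${chrom} uses output prefix ${out_prefix}"
```

Each task writes its own log file (`%A` is the array job ID, `%a` the task index) and its own output prefix, so the tasks do not overwrite each other's results.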

 
Save the script as vcftools-test.sh in your job’s directory, then change to that directory and run the sbatch command to submit the job:

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./vcftools-test.sh
  • You can view the status of your job with the “squeue -u <username>” command.
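
To follow a job after submission, it can help to capture the job ID that sbatch prints. The sketch below assumes the submit script above, with its `--output=output.%j.vcftools-test` line; `job_log_name` is a hypothetical helper written for this example, not part of SLURM. The `--parsable` option makes sbatch print the job ID without the usual "Submitted batch job" text.

```shell
# Hypothetical helper: name of the log file SLURM writes for a job ID,
# following the --output=output.%j.vcftools-test line in the submit script.
job_log_name() {
    echo "output.$1.vcftools-test"
}

if command -v sbatch >/dev/null 2>&1 && [ -f ./vcftools-test.sh ]; then
    # --parsable prints the job ID (followed by ";cluster" on some setups).
    jobid=$(sbatch --parsable ./vcftools-test.sh)
    jobid="${jobid%%;*}"

    # Show the job while it is pending or running.
    squeue -j "$jobid"

    echo "Job output will be written to $(job_log_name "$jobid")"
fi
```

Once the job has finished, squeue no longer lists it; the log file named by the helper should then contain the messages vcftools printed during the run.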

Documentation

Home Page, User Guides, and Manuals

More Job Information

See the following for more detailed job submission information:

Reporting Bugs

Report bugs with VCFtools to the IT Help Desk: rc-help@usf.edu