Htslib

Description

From the Htslib Home Page: HTSlib is an implementation of a unified C library for accessing common file formats, such as SAM, CRAM and VCF, used for high-throughput sequencing data, and is the core library used by samtools and bcftools. HTSlib only depends on zlib. It is known to be compatible with gcc, g++ and clang.

HTSlib implements a generalized BAM index, with file extension .csi (coordinate-sorted index). The HTSlib file reader first looks for the new index and then for the old if the new index is absent.
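The lookup order described above can be mimicked in a few lines of shell, which may help when checking which index a reader is likely to pick up. This is only a sketch: find_index is a hypothetical helper, not part of HTSlib, and it assumes the index sits next to the data file and is named by appending .csi, .tbi, or .bai to the data file name.

```shell
# Print the index file expected to be used for a data file, trying the
# new .csi index first and then the older formats (.tbi, .bai).
# find_index is a hypothetical helper for illustration, not an htslib tool.
find_index() {
    for ext in csi tbi bai; do
        if [ -f "$1.$ext" ]; then
            printf '%s\n' "$1.$ext"
            return 0
        fi
    done
    return 1    # no index found next to the data file
}
```

For example, "find_index sorted.gff.gz" prints sorted.gff.gz.csi if that file exists, otherwise sorted.gff.gz.tbi if present, and returns a non-zero status when no index is found.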

This project also includes the popular tabix indexer, which indexes both .tbi and .csi formats, and the bgzip compression utility.

Version

  • 1.2.1

Authorized Users

  • CIRCE account holders
  • RRA account holders
  • SC account holders

Platforms

  • CIRCE cluster
  • RRA cluster
  • SC cluster

Modules

Htslib requires the following module file to run:

  • apps/htslib/1.2.1

Running Htslib on CIRCE/SC

The Htslib user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your analyses. Please refer to the Documentation section for a link to the guide.

  • Note on CIRCE: Make sure to run your jobs from your $WORK directory!
  • Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

Interactive Mode

Use the following commands to open an SRUN interactive session, load the Htslib module, and run the tabix binary included with Htslib:

[user@login0 ~]$ srun --time=48:00:00 --nodes=1 --cpus-per-task=1 --pty /bin/bash
[user@wh-520-4-1 ~]$ module load apps/htslib/1.2.1
[user@wh-520-4-1 ~]$ tabix sorted.gff.gz chr1:10,000,000-20,000,000
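
The tabix query above only works if sorted.gff.gz is position-sorted, compressed with bgzip, and already indexed. A minimal sketch of that preparation follows; the input name in.gff and the helper sort_gff are assumptions made for illustration, and the compression and indexing steps run only when bgzip and tabix are on your PATH (that is, after the Htslib module is loaded).

```shell
# Sort a GFF file for tabix: header lines first, then records ordered by
# sequence name (column 1) and numeric start position (column 4).
# sort_gff is a hypothetical helper name used for this sketch.
sort_gff() {
    grep '^#' "$1" || true    # a file without header lines is not an error
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Compress and index only if the htslib tools and the (hypothetical)
# input file in.gff are available.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1 && [ -f in.gff ]; then
    sort_gff in.gff | bgzip > sorted.gff.gz
    tabix -p gff sorted.gff.gz    # writes sorted.gff.gz.tbi
fi
```

Once the index exists, region queries such as the one shown above can be repeated without re-sorting or re-indexing the file.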

Batch Job submission

Jobs that would take more than 30 minutes to run on a standard PC must be submitted to the scheduling environment as batch jobs on the CIRCE/SC cluster.

If, for example, you wish to generate a tab-delimited genome position file, you would set up a submit script to use Htslib like this:

#!/bin/bash
#
#SBATCH --job-name=htslib-test
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=output.%j.htslib-test

#### Slurm 1 processor htslib test to run for 1 hour.

# Load the htslib module:
module load apps/htslib/1.2.1

# Start the tabix binary from htslib
tabix sorted.gff.gz chr1:10,000,000-20,000,000

 
Next, you can change to your job’s directory, and run the sbatch command to submit the job:

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./htslib-test.sh
  • You can view the status of your job with the “squeue -u <username>” command

Documentation

Home Page, User Guides, and Manuals

More Job Information

See our SLURM User’s Guide for more detailed job submission information.

Reporting Bugs

Report bugs with Htslib to the IT Help Desk: rc-help@usf.edu