Genome Analysis ToolKit (GATK)
Description
From the GATK Home Page: The Genome Analysis Toolkit or GATK is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Version
- 3.5
Authorized Users
- CIRCE account holders
- RRA account holders
- SC account holders
Platforms
- CIRCE cluster
- RRA cluster
- SC cluster
Modules
Genome Analysis ToolKit (GATK) requires the following module file to run:
apps/gatk/3.5
- See Modules for more information.
Running Genome Analysis ToolKit (GATK) on CIRCE/SC
The Genome Analysis ToolKit (GATK) user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your analyses. Please refer to the Documentation section for a link to the guide.
- Note on CIRCE: Make sure to run your jobs from your $WORK directory!
- Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.
Interactive Mode
Use the following commands to open an srun interactive session, load the module for Genome Analysis ToolKit (GATK), and run GATK:
[user@login0 ~]$ srun --time=48:00:00 --nodes=1 --ntasks-per-node=1 --pty /bin/bash
[user@wh-520-4-1 ~]$ module load apps/gatk/3.5
[user@wh-520-4-1 ~]$ java -jar $GATKJAR -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
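Before starting Java in an interactive session, it can help to confirm that the module has been loaded and that the input files are readable. The following is a minimal sketch, not part of GATK or the cluster configuration: the function name check_gatk_inputs is hypothetical, and it assumes that GATKJAR is set by loading apps/gatk/3.5, as the commands above suggest.

```shell
#!/bin/bash
# Hypothetical helper: verify the GATK environment and the input files
# named on the command line before launching Java.
check_gatk_inputs() {
    ref="$1"
    bam="$2"
    # Assumption: "module load apps/gatk/3.5" sets GATKJAR.
    if [ -z "${GATKJAR:-}" ]; then
        echo "GATKJAR is not set; load the GATK module first" >&2
        return 1
    fi
    # Each input must exist and be readable by the current user.
    for f in "$ref" "$bam"; do
        if [ ! -r "$f" ]; then
            echo "Cannot read input file: $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage with the example file names from this page:
# check_gatk_inputs exampleFASTA.fasta exampleBAM.bam && \
#     java -jar "$GATKJAR" -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
```

The check only tests for the files themselves; whether GATK needs additional index files next to them is covered in the GATK guide.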
Batch Job submission
Jobs that would take more than 20 minutes to run on a standard PC must be submitted as batch jobs to the scheduling environment on the CIRCE/SC cluster.
- If, for example, you have a FASTA file exampleFASTA.fasta and a BAM file exampleBAM.bam that you wish to perform the CountLoci operation on, you would set up a submit script to use GATK like this:
#!/bin/bash
#
#SBATCH --job-name=gatk-test
#SBATCH --time=48:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=output.%j.gatk-test

#### SLURM 1 processor GATK test to run for 48 hours.

# Load the GATK module:
module load apps/gatk/3.5

# Start GATK
java -jar $GATKJAR -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
Next, save the script (here as gatk-test.sh), change to your job’s directory, and run the sbatch command to submit the job:
[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./gatk-test.sh
- You can view the status of your job with the “squeue -u <username>” command
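To follow one particular job rather than everything you have queued, you can capture the job ID that sbatch prints on submission. The sketch below is illustrative only: the function name job_id_from_sbatch_output is hypothetical, and it assumes sbatch prints its usual confirmation line of the form "Submitted batch job <id>".

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's
# confirmation line, assumed to read "Submitted batch job <id>".
job_id_from_sbatch_output() {
    # Print the last whitespace-separated field only if it is all digits;
    # print nothing for error messages or unexpected text.
    printf '%s\n' "$1" | awk '{ if ($NF ~ /^[0-9]+$/) print $NF }'
}

# Usage on the cluster (requires SLURM, so shown as comments):
# out=$(sbatch ./gatk-test.sh)
# jobid=$(job_id_from_sbatch_output "$out")
# squeue -j "$jobid"                  # status of this job only
# cat "output.${jobid}.gatk-test"     # log file named by --output=output.%j.gatk-test
```

Because the example submit script sets --output=output.%j.gatk-test, the job ID is also what identifies the job's log file in the job directory.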
Documentation
Home Page, User Guides, and Manuals
- GATK Home Page:
- GATK Guide:
Benchmarks, Known Tests, Examples, Tutorials, and Other Resources
More Job Information
See the following for more detailed job submission information:
Reporting Bugs
Report bugs with Genome Analysis ToolKit (GATK) to the IT Help Desk: rc-help@usf.edu