Gemini

Description

From the Gemini Home Page: GEMINI (GEnome MINIng) is a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample phenotypes and genotypes, as well as genome annotations into an integrated database framework, GEMINI provides a simple, flexible, and powerful system for exploring genetic variation for disease and population genetics.

Using the GEMINI framework begins by loading a VCF file (and an optional PED file) into a database. Each variant is automatically annotated by comparing it to several genome annotations from sources such as ENCODE tracks, UCSC tracks, OMIM, dbSNP, KEGG, and HPRD. All of this information is stored in a portable SQLite database that allows one to explore and interpret both coding and non-coding variation using “off-the-shelf” tools or an enhanced SQL engine.

Version

  • 0.18.0

Authorized Users

  • CIRCE account holders
  • RRA account holders
  • SC account holders

Platforms

  • CIRCE cluster
  • RRA cluster
  • SC cluster

Modules

Gemini requires the following module file to run:

  • apps/gemini/0.18.0

Running Gemini on CIRCE/SC

The Gemini user guide is essential to understanding the application and making the most of it. The guide and this page should help you get started with your analyses. Please refer to the Documentation section for a link to the guide.

  • Note on CIRCE: Make sure to run your jobs from your $WORK directory!
  • Note: Scripts are provided as examples only. Your SLURM executables, tools, and options may vary from the example below. For help on submitting jobs to the queue, see our SLURM User’s Guide.

Interactive Mode

Use the following commands to open an interactive SRUN session, load the Gemini module, and run Gemini:

[user@login0 ~]$ srun --time=48:00:00 --nodes=1 --cpus-per-task=4 --pty /bin/bash
[user@wh-520-4-1 ~]$ module load apps/gemini/0.18.0
[user@wh-520-4-1 ~]$ gemini load -v chr22.VEP.vcf -p trio.ped -t VEP --cores 4 --skip-gene-tables chr22.db
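
Once the load finishes, the resulting database can be inspected with Gemini's query subcommand. The following is a minimal sketch, assuming the load above produced chr22.db in the current directory; the column list and the row limit are illustrative choices, not a recommended analysis. The guard around the call is only there so the snippet does nothing harmful on a machine without Gemini.

```shell
#!/bin/bash
# Illustrative query: first five variants with their basic coordinates.
# The column selection is an assumption made for this example.
QUERY='select chrom, start, end, ref, alt from variants limit 5'

# Run the query only if gemini is on the PATH and the database exists.
if command -v gemini >/dev/null 2>&1 && [ -f chr22.db ]; then
    gemini query --header -q "$QUERY" chr22.db
else
    echo "gemini or chr22.db not available; skipping query" >&2
fi
```

The --header option prints the column names as the first row, which makes the tab-separated output easier to read or pass to other tools.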

Batch Job submission

Jobs that would take more than 30 minutes to run on a standard PC must be submitted as batch jobs to the scheduling environment on the CIRCE/SC clusters.

  • If, for example, you wish to load a VCF file into a Gemini database, you would set up a submit script to use Gemini like this:
#!/bin/bash
#
#SBATCH --job-name=gemini-test
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --output=output.%j.gemini-test

#### SLURM 4 processor gemini test to run for 1 hour.

# Load the gemini module:
module load apps/gemini/0.18.0

# Start Gemini
gemini load -v chr22.VEP.vcf -p trio.ped -t VEP --cores 4 --skip-gene-tables chr22.db
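
The script above states the core count twice, once in --ntasks-per-node and once in --cores, and the two can drift apart when one is edited. One way to keep them in step is sketched below. It assumes the job was submitted with --ntasks-per-node, in which case SLURM sets SLURM_NTASKS_PER_NODE inside the job; the helper name gemini_cores is hypothetical, and the command is only printed so it can be reviewed before use.

```shell
#!/bin/bash
# Hypothetical helper: derive the value for "gemini load --cores" from the
# SLURM allocation. SLURM_NTASKS_PER_NODE is only set when the job was
# submitted with --ntasks-per-node; fall back to 1 core otherwise.
gemini_cores() {
    echo "${SLURM_NTASKS_PER_NODE:-1}"
}

# Print the load command instead of running it (dry run).
echo "gemini load -v chr22.VEP.vcf -p trio.ped -t VEP --cores $(gemini_cores) --skip-gene-tables chr22.db"
```

In a real submit script, the echo line would be replaced by the gemini load command itself.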

 
Next, you can change to your job’s directory, and run the sbatch command to submit the job:

[user@login0 ~]$ cd my/jobdir
[user@login0 jobdir]$ sbatch ./gemini-test.sh
  • You can view the status of your job with the “squeue -u <username>” command
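
Submission and status checking can also be combined in a short script. This is a sketch only, assuming the submit script is saved as gemini-test.sh and keeps the --output=output.%j.gemini-test line shown earlier; the helper name job_output_file is hypothetical. The guard makes the snippet a no-op on machines without SLURM.

```shell
#!/bin/bash
# Hypothetical helper: name of the log file SLURM writes for a given job ID,
# following the "--output=output.%j.gemini-test" pattern in the submit script.
job_output_file() {
    echo "output.$1.gemini-test"
}

if command -v sbatch >/dev/null 2>&1 && [ -f ./gemini-test.sh ]; then
    # --parsable makes sbatch print the job ID (optionally "id;cluster").
    jobid=$(sbatch --parsable ./gemini-test.sh)
    jobid=${jobid%%;*}          # drop the cluster name if one is present
    squeue -j "$jobid"          # show the job's current state
    echo "Job log will be written to $(job_output_file "$jobid")"
fi
```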

Documentation

Benchmarks, Known Tests, Examples, Tutorials, and Other Resources

More Job Information

See the following for more detailed job submission information:

Reporting Bugs

Report bugs with Gemini to the IT Help Desk: rc-help@usf.edu