FAQ

Revision as of 17:17, 24 June 2016 by Botto (talk | contribs) (Created page with "= CIRCE Frequently Asked Questions = This page contains a list of questions that we receive quite often from our users. We’ll attempt to answer these questions as best as p...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

CIRCE Frequently Asked Questions

This page contains a list of questions that we receive quite often from our users. We’ll attempt to answer these questions as best as possible so that you can come here for quick answers. If there are other questions you’d like to see added to this FAQ, please send them to help@usf.edu. Make sure you mention Research Computing

Q. Help! I can’t log in!

A. Here are some possibilities:

  1. Have you changed your NetID password lately (E-Mail, Blackboard, etc.)? If you have, make sure you are using the same password when you log into CIRCE
  2. Have you forgotten your password? You can reset it at https://netid.usf.edu
  3. Are you no longer a student? Send an E-Mail to help@usf.edu saying that you’re a former student and that you’d like access to your Research Computing data. Please provide your USF NetID that you used to use to access your account.

Q. What is this “queue” thing I keep hearing about and why do I have to use it?

A. The queue is like any other queue: it’s simply a waiting line for people that are wanting access to limited resources. Whether its a bank teller, the nice person at the DMV, or a 32 processor server with 512GB or RAM, resources must be managed. You’ll want to read our Guide to SLURM for a better description.

Q. How do I run graphical applications like Matlab, Comsol, p4vasp, PyMol, or CUBIT?

A. We recommend that you follow the supported CIRCE Desktop instructions for connections to CIRCE for running graphical applications.

Q. I’ve submitted a job to the cluster and I’m getting errors about “Command not found” or “Syntax Error”. The script looks fine when I edit it. What gives?

A. A number of different problems can cause these messages. The most frequent are listed below:

  1. Did you create your submit script from within the Windows editor Notepad? If so, please ensure that you convert your submit script with the dos2unix command by running:
[user@host ~]$ dos2unix  

where <script> is the name of your job script. This will convert the file to the proper UNIX file format that SLURM recognizes.

  1. Does your submit script contain the appropriate module lines? Please see the documentation for your respective application for the appropriate module lines to add.
  2. Are you actually referencing a file that exists? You can make sure that the file exists by doing:
[user@host ~]$ ls <path_to_file>

where <path_to_file> cannot find the file, then chances are it doesn’t exist on the nodes either.

Q. I just deleted an extremely important file from my home directory and I’m defending my thesis next week! Please tell me you have a backup copy!

A. Yes, we have at least 14 days of incremental backups to rely on. Please provide the path and file name and send a request to help@usf.edu in order to request a restore.

Q. I try to run Ansys/Matlab/Mathematica/Gambit on CIRCE and it fails, saying “This model requires more scratch space than available” or “Out of memory”. Why is this happening?

A. There are a couple reasons why this happens:

  1. This happens if you are trying to run the command directly from a login machine (your prompt will look like this):
 [user@login# ~]$

To be fair to other users, we have resource restrictions on our login machines limiting the amount of RAM and CPU time you can use. We recommend running these applications from an srun session:

[user@host ~]$ srun —time=HH:MM:SS —pty /bin/bash

Where --time= is the hours, minutes, and seconds that you believe you will need to finish working with an application. With srun, you will be guaranteed the resources needed to complete your work.

2. This can also happen if you are within a srun session and you did not specify a memory request. Try again with a reasonable estimate with how much memory (in megabytes) you think you need (for example, 4096 Megabytes, or 4 Gigabytes):

[user@host ~]$ srun —time=HH:MM:SS —mem=4096 —pty /bin/bash

Q. I submitted my job and it has been in the ‘PD’ (pending) state for a long time. Why isn’t my job being run?

A. There are a few reasons why this might happen:

1. Did you specify a queue? Don’t do this! If all of the slots in the queue are occupied, you have to wait until enough are freed to start your job. Removing the queue definition in your submission script will allow your job to use many more resources and will increase your overall job throughput.

2. Did you make a reasonable resource request? If your submission script calls for 4096 processors and 1TB of RAM, you’ll be hard pressed to find enough resources on CIRCE to meet that request. To see our available hardware pool in order to make good decisions about resource requests, see the following guides:

Q. My job keeps terminating with no indication of anything wrong. What gives?

A. By default, all SLURM jobs have a runtime of 1 hour, after which the scheduler will send a termination signal. Have you specified a runtime on your job?

—time=08:00:00

This option would request a runtime of eight hours. The —time or -t option is required on both submitted jobs and interactive jobs. It is also possible that you did not specify enough time when making your request. Please see this guide on determining job run-time for help: Job Runtime Guide

Q. Why do I get redirected while using Internet Explorer to download software from the RC isos site?

A. The version of Internet Explorer you are using has a bug and is unable to download large files. Because of this, we have redirected users of Internet Explorer to this FAQ to explain the issue. You will be able to access the ISOS site and download the files but you must users browser, like Firefox, Mozilla, Chrome, Opera, or Safari. Here is a link to a Microsoft KB article discussing the issue in more detail: http://support.microsoft.com/kb/298618.

Q. I’ve read all of this stuff and I’m still having trouble. Who do I contact for help?

A. Send an e-mail to help@usf.edu with a detailed description of your problem. Include any input or output files, any error messages or warnings. Make sure you include Research Computing in the subject for the quickest possible response.