News

Recent News

Posted on: Details
11/07/2025 17:56 EST Update 1: SVC Power Outage


A quick update: most power to the SVC data center has been restored, and normal usage may resume for most partitions. Full service has been restored for the RRA Cluster, Student Cluster, network filesystems, and web hosts. Three partitions of the main CIRCE cluster (general, himem, and hii02), as well as a few individual nodes (mostly high-power-draw GPU nodes), remain offline while additional work is performed on the electrical systems in the data center. Users may still submit jobs to the "Down" partitions for placement in the queue, but jobs will not start until full power is restored. We anticipate that all services and partitions will be restored by Monday (Nov 10) afternoon.
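To check whether the affected partitions are still down, and whether your own jobs are waiting in the queue, something like the following could be run on a login node. This is a minimal sketch rather than an official Research Computing script: the function names are illustrative, and it assumes the standard SLURM sinfo and squeue commands are available on the PATH.

```shell
#!/bin/bash
# Partitions listed in the announcement as still offline.
DOWN_PARTITIONS="general,himem,hii02"

# Show the current state of the affected partitions.
show_partition_state() {
    if command -v sinfo >/dev/null 2>&1; then
        # -p limits the report to the listed partitions.
        sinfo -p "$DOWN_PARTITIONS"
    else
        echo "sinfo not found; run this on a CIRCE login node"
    fi
}

# List your own jobs that are queued but not yet running.
show_pending_jobs() {
    if command -v squeue >/dev/null 2>&1; then
        # -u filters by user, -t filters by job state.
        squeue -u "${USER:-$(id -un)}" -t PENDING
    else
        echo "squeue not found; run this on a CIRCE login node"
    fi
}

show_partition_state
show_pending_jobs
```

Jobs submitted to a partition that is down should simply stay in the pending list until the partition is resumed.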

In summary: Service to most Research Computing resources has been restored, with the exception of the general, himem, and hii02 partitions, as well as a few GPU nodes. The remaining services should be restored by Monday afternoon.

If you have any questions about the above information, please contact Research Computing at rc-help@usf.edu

11/07/2025 15:26 EST SVC Power Outage


Research Computing has been notified of a power outage within the SVC data center. As a result, all access to the CIRCE HPC cluster has been halted until power is restored. This affects all Research Computing services, including the CIRCE cluster, RRA cluster, network-mounted filesystems, and web services.

Further messages will be sent once we have received updates regarding the campus electrical utilities.

10/30/2025 16:04 EDT Update: CIRCE Maintenance Window on Oct. 30


The networking maintenance for the CIRCE cluster has been completed successfully, and all access has been restored. Normal operations may now be resumed.

If you have any questions or run into problems, please contact Research Computing at rc-help@usf.edu

10/28/2025 13:40 EDT CIRCE Maintenance Window on Oct. 30


A brief maintenance window has been scheduled for Thursday October 30th from 1pm to 5pm EDT in order to update and expand the HDR Infiniband network in SVC. As this network is used for the primary networked filesystem, a downtime of approximately 2-4 hours for the CIRCE cluster and partition nodes will be necessary.

During this time, all CIRCE login and compute nodes will be unavailable. Remote filesystem access and hosted web pages will also be offline.

Access to the RRA cluster, SC cluster, and PGS filesystem should remain available and unaffected by the maintenance window.

In summary: The CIRCE cluster and some RC web services will be unavailable starting at 1pm (EDT) on Thursday, October 30 for network upgrades in SVC. Service is anticipated to be restored by 5pm (EDT) that same day.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu

08/20/2025 15:17 EDT CIRCE general partition and work directories - Update 1


The maintenance period for the CIRCE cluster has been completed and all services have been restored.

Old user work directories are now mounted at /work_gpfs/ on the login nodes. If you previously had data stored in /work, please check /work_gpfs for any files that you may need. We advise transferring any data that you may need later to another location. Note also that directories prefixed with "work" are not included in any backup routines. They are temporary, volatile storage mounts for use during jobs run on cluster partitions. Important files should not be stored in "work" directories long term.
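Copying files out of the old work area could look like the sketch below. The function name copy_out is hypothetical, and the example paths in the final comment are assumptions for illustration only (the directory layout under /work_gpfs is not described in this notice), so substitute your own directories.

```shell
#!/bin/bash
# Copy a directory tree to another location, preserving timestamps and modes.
# Usage: copy_out SOURCE_DIR DEST_DIR
copy_out() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -R copies recursively; -p preserves modification times and permissions.
    # The trailing "/." copies the contents of src rather than src itself.
    cp -Rp "$src/." "$dest/"
}

# Hypothetical paths; adjust to your own directories before running.
# copy_out "/work_gpfs/$USER/results" "$HOME/results_from_work"
```

The destination should be a location that is included in backups, since "work" directories are not.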

The partition previously named "circe" has been replaced with one named "general". Its nodes are CPU-only and intended for general research usage. It is now the default partition if no partition name is specified in a SLURM submit script. The work directory on this partition is located in /work_gpfs, the same /work_gpfs that you see on the login nodes.

The "MOTD" that you see when you log in has been updated with the names and types of other partitions that are available for general research usage without preemption.

All user directories with at least one login since January 1, 2025 have now been restored from either backup or the old filesystem. If you log in and do not see your home directory, please submit a ticket to rc-help@usf.edu.

All partitions are now active, and job submissions can resume as normal.

If you have any further questions, please contact Research Computing at rc-help@usf.edu.

Information is also posted on the News page of the RC wiki site as well as at the other links below.

https://wiki.rc.usf.edu/index.php/News

https://wiki.rc.usf.edu/index.php/CIRCE_Hardware

https://wiki.rc.usf.edu/index.php/SLURM_Partitions

08/15/2025 12:18 EDT CIRCE general partition and work directories


This is an update regarding the old /work directories and the default compute partition on CIRCE. The GPFS filesystem, which was formerly the main filesystem and is the only high-speed parallel filesystem available for the general usage partition, has been fully repaired and can be returned to service. All files, including /work directories, will be available once it has been mounted. To do this, we need to schedule a short maintenance window for Wednesday, August 20, beginning at 9am. The scheduled work should be completed the same day, and all services should be restored by 4pm. Login access and filesystem mounts via cifs.rc.usf.edu will be unavailable during this time. Jobs submitted to the cluster should continue to run in the background.

Services restored will include access to the old /work directories on login nodes and cifs, as well as the general usage circe compute partition for job submissions with SLURM.

The himem partition has been upgraded with new network hardware, and has already been migrated to the new filesystem and re-enabled for users that need access to high memory nodes.

If you have any further questions, please contact Research Computing at rc-help@usf.edu.

07/21/2025 16:35 EDT Update 6: SVC datacenter maintenance - July 2025


Access to the CIRCE cluster has now been restored. The /home and /shares directories have been recovered from backup and are currently hosted on a secondary filesystem. Users may now log in and submit jobs; however, please review the following important updates, as several aspects of the CIRCE environment have changed due to the filesystem migration.

  • Partition Availability: The circe and himem partitions remain offline, as the legacy hardware supporting them is incompatible with the new filesystem. All users may submit jobs to the snsm_itn19 partition using the openaccess QOS with the following SLURM flags:
    • #SBATCH --partition=snsm_itn19 --qos=openaccess
    • Please note that job wait times may be longer than usual due to reduced partition availability.
  • Work Directory Information: The /work directories are volatile, high-performance scratch spaces intended for temporary use during SLURM scheduled jobs. These directories are not backed up, and any data previously stored there will not be available when you log in.
  • Quota Adjustments: User storage quotas have been revised to accommodate the smaller capacity of the secondary filesystem:
    • /home: 200 GB per user (unchanged)
    • /shares and /work: 1 TB per user
  • Restoration Status: Not all user directories have been fully restored. If you accessed CIRCE within the past three months, your directory has likely already been recovered. Restoration is ongoing and prioritized by most recent login date. We appreciate your patience as we complete this process.
  • File Recovery: Data has been restored from a snapshot taken at midnight on July 8. Files created or modified after July 7 may not be present. However, nightly incremental backups may contain additional data. If you are missing important files, please contact us at rc-help@usf.edu so we can investigate further recovery options.
  • Job Resubmission: Any jobs that were active prior to the recent power event will need to be reviewed for data integrity and manually resubmitted if necessary.
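Put together, the flags from the Partition Availability item would sit at the top of a batch script roughly as follows. This is a minimal sketch: only the partition and QOS values come from this notice. The job name, task count, and time limit are placeholder assumptions, and the describe_job helper is hypothetical.

```shell
#!/bin/bash
# Placeholder job name (assumption).
#SBATCH --job-name=openaccess_test
# Partition and QOS named in this notice.
#SBATCH --partition=snsm_itn19
#SBATCH --qos=openaccess
# Placeholder resource request and time limit (assumptions).
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Print where the job landed. SLURM sets these variables inside a job;
# outside a job the fallback values are printed instead.
describe_job() {
    echo "partition=${SLURM_JOB_PARTITION:-unknown} job_id=${SLURM_JOB_ID:-none}"
}

describe_job
```

Saved under a name of your choosing (job.sh, for example), the script would be submitted with "sbatch job.sh". Replace the describe_job call with the actual workload.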

Efforts are ongoing to repair the original filesystem. While data recovery remains a priority, users should be prepared for the possibility that restoration may not be feasible. The affected hardware, in service since 2016, was scheduled for replacement this fall. New hardware, including additional cluster nodes and a more robust, next-generation filesystem, has already been purchased, and work is underway to deploy these new resources during the Fall 2025 semester.

If additional resources are recovered in the meantime, we will post further updates to the mailing list and on this News page. Thank you for your continued support and understanding.

Details about hardware in the available CIRCE partitions can be found at the following links.

https://wiki.rc.usf.edu/index.php/CIRCE_Hardware

https://wiki.rc.usf.edu/index.php/SLURM_Partitions

If you have any further questions, please contact Research Computing at rc-help@usf.edu.

07/17/2025 14:45 EDT Update 5: SVC datacenter maintenance - July 2025


At present, we are actively working to restore the primary filesystem used by the CIRCE cluster. While the rebuild and verification processes are underway, progress has been slower than anticipated. In parallel, we are transferring data from a recent backup snapshot to a secondary filesystem. This secondary environment is being prepared as a contingency should recovery of the primary filesystem prove unsuccessful. Given the time required for either path to complete, we currently anticipate that the cluster will remain unavailable for user logins and job submissions until Monday afternoon. We sincerely appreciate your continued patience as the Research Computing team works diligently to complete the restoration process.

Additional information will be posted here as it becomes available.

If you have any further questions, please contact Research Computing at rc-help@usf.edu.

07/14/2025 16:05 EDT Update 4: SVC datacenter maintenance - July 2025


Brief update regarding CIRCE restart following the SVC building maintenance.

Power has been restored to the main CIRCE filesystem components. Due to the unexpected nature of the initial shutdown, there are currently multiple rebuild processes running on disks in both the metadata and object data storage pools. As there are over 1000 disks comprising both major systems, this has been running for several hours. The rebuild process may take over 48 hours to complete. The filesystem cannot be mounted until the rebuild of the metadata has completed.

Once the rebuild processes are complete, we can then start the process of remounting the filesystem across the cluster. Additional email notifications will be sent when full functionality is restored.

At this time, the RRA and Student Clusters are fully operational, as well as the license servers.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

07/09/2025 18:35 EDT Update 3: SVC datacenter maintenance - July 2025


Update regarding CIRCE access during the SVC building maintenance.

Unfortunately, the reduced power supplied to the datacenter for the maintenance period was not able to support the remaining systems, and all CIRCE systems in SVC had to be shut down. The contractors are scheduled to continue working on the electrical systems in SVC through this weekend, meaning the CIRCE cluster and associated services will remain powered down through Monday, July 14th. We apologize for any inconvenience that this may cause.

The RRA cluster and license servers will remain active, as those services are housed in a separate datacenter that is unaffected by the maintenance in SVC.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

07/03/2025 14:55 EDT Update 1: SVC datacenter maintenance beginning July 2025


UPDATE - Revised Start Date! Research Computing has been notified of a date change to the planned facilities maintenance for the electrical systems within the SVC data center. Work will now begin on July 9th.

The planned work is required maintenance for the continued operation of the power backup systems in the SVC data center. During this scheduled work, available power will be reduced while circuits are bypassed during the maintenance on each system. This will affect computational resources in the following CIRCE partitions.

circe, amd_2021, cool2022, himem, snsm_itn19, amdwoods_2022, bfbsm_2019, chbme_2018, hchg, margres_2020, muma_2021, qcg_gales_2022, simmons_itn18 and hii02

Beginning at 9am on July 9, new job submissions to the above queues will be suspended. Any running jobs that remain will be stopped by 5pm on July 9. Users should still be able to log in to CIRCE to transfer files. Any new jobs submitted to the above partitions will not start running until the maintenance has been completed. Currently, no downtime is planned for the RRA cluster, the Student Cluster, license servers, or filesystems.

The contractor conducting the work plans to complete the project as quickly as possible. The electrical maintenance is scheduled to be completed by Monday morning, July 14 at 9 am. There is a possibility that it may be finished sooner; however, please plan for the full downtime of the partitions until the CoB on Monday. Once Research Computing has received the all-clear that work has been completed, the affected partitions will be powered on and resumed. While we do not anticipate a full loss of power to the data center during this time, there is always a risk, and extended downtime may be necessary if additional facilities maintenance is required. Please be aware that any unscheduled maintenance may result in additional CIRCE and other RC resources becoming unavailable.

In summary: The above listed CIRCE partitions will be unavailable starting at 9am (EDT) on Wednesday, July 9 for facilities maintenance related to building electrical infrastructure in SVC. Service is anticipated to be restored by 5pm (EDT) on Monday, July 14.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

06/30/2025 16:45 EDT SVC datacenter maintenance beginning July-10-2025


Research Computing has been notified of planned facilities maintenance for the electrical systems within the SVC data center.

Important - please check the https://wiki.rc.usf.edu/index.php/News page for updates, as this mailing list may not be available during the planned outage

The planned work is required maintenance for the continued operation of the power backup systems in the SVC data center. During this scheduled work, available power will be reduced while circuits are bypassed during the maintenance on each system. This will affect computational resources in the following CIRCE partitions.

circe, amd_2021, cool2022, himem, snsm_itn19, amdwoods_2022, bfbsm_2019, chbme_2018, hchg, margres_2020, muma_2021, qcg_gales_2022, simmons_itn18 and hii02

Beginning at 9am on July 10, new job submissions to the above queues will be suspended. Any running jobs that remain will be stopped by 12 noon on July 10. Users should still be able to log in to CIRCE to transfer files. Any new jobs submitted to the above partitions will not start running until the maintenance has been completed. Currently, no downtime is planned for the RRA cluster, the Student Cluster, license servers, or filesystems.

The contractor conducting the work plans to complete the project as quickly as possible. The electrical maintenance is scheduled to be completed by Monday morning, July 14 at 9 am. There is a possibility that it may be finished sooner; however, please plan for the full downtime of the partitions until the CoB on Monday. Once Research Computing has received the all-clear that work has been completed, the affected partitions will be powered on and resumed. While we do not anticipate a full loss of power to the data center during this time, there is always a risk, and extended downtime may be necessary if additional facilities maintenance is required. Please be aware that any unscheduled maintenance may result in additional CIRCE and other RC resources becoming unavailable.

In summary: The above listed CIRCE partitions will be unavailable starting at 9am (EDT) on Thursday, July 10 for facilities maintenance related to building electrical infrastructure in SVC. Service is anticipated to be restored by 5pm (EDT) on Monday, July 14.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

03/14/2025 05:30 EDT Update 2: Upcoming scheduled maintenance for CIRCE


All systems and partitions have been fully restored to service. Normal activity may resume at this time.

As a reminder, the license0.rc.usf.edu network license server is scheduled for maintenance on Tuesday, March 18 beginning at 9am. All licenses that are hosted on license0.rc.usf.edu will be unavailable for the duration. The work is expected to be completed by 5pm.

More information can be found in the email below, or in the "News" section of the Research Computing website: https://wiki.rc.usf.edu/index.php/News

03/13/2025 16:43 EDT Update 1: Upcoming scheduled maintenance for CIRCE


Quick update regarding the GPFS filesystem maintenance. All of the faulty components have been replaced and the filesystem is not reporting any errors at this time. There are currently several rebuild processes running on disks that were attached to the previously faulty disk array. This is the expected behavior, and ALL data appears to be fully intact. The rebuild may take a few more hours, as there are 45 disks rebuilding in the array, as opposed to the usual 1 or 2 from a routine disk replacement. As there are a large number of disks rebuilding, it is best to not yet enable access to a full load from computational jobs.

Once the rebuild processes are nearly complete, login access to CIRCE will be restored. When this happens, users may stage files for jobs or submit new jobs to the queue. However, we ask that users refrain from transferring large files or large numbers of files until the rebuild processes are fully complete. The CIRCE partitions will also remain down (submitted jobs will pend in the queue) until rebuilding is complete. Another email notification will be sent when full functionality is restored.

At this time, the RRA and Student Clusters are fully operational.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

03/12/2025 10:43 EDT Reminder 1: Upcoming scheduled maintenance for CIRCE


This is a reminder that the CIRCE cluster will be taken offline at 9am tomorrow (March 13) for GPFS filesystem maintenance.

There is also a planned maintenance window for the license0.rc.usf.edu license server on Tuesday, March 18.

More information can be found in the email below, or in the "News" section of the Research Computing website: https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

03/03/2025 13:02 EST Upcoming scheduled maintenance for CIRCE


This message is to inform users of the CIRCE HPC cluster of upcoming scheduled maintenance for the main home and shares filesystem, as well as the license0.rc.usf.edu license server.

First, on Thursday, March 13, the GPFS filesystem that houses home and shares on CIRCE will be taken offline to repair/replace failed components. The filesystem is currently still fully redundant, but the components need to be replaced in order to ensure the stability of the filesystem in the event of further hardware failures. In order to replace the components, the system must be taken offline so that one of the enclosures that houses hard drives can be removed and disassembled.

The above work will affect all CIRCE attached partitions, login nodes, network mountable filesystems, and all RC hosted web servers as well. Beginning at 9am on March 13, all jobs will be stopped and no further access to any CIRCE partition, login node, or filesystem will be available until after the maintenance has been completed. Access is planned to be restored by 6pm on the same day. However, please plan for the possibility that services may not be fully restored until later in the week should additional issues manifest.

While the RRA and SC clusters may remain online in the MDC data center, there is a possibility that the loss of communication with resources housed in SVC may cause instability on those systems. Therefore we recommend not relying on the availability of the RRA or SC clusters during the planned GPFS filesystem outage.


The second scheduled maintenance is for the license0.rc.usf.edu network license server. The old server housing license0 is being replaced with a new VM. This work is planned for Tuesday, March 18. Beginning at 9am that day, all licenses that are hosted on license0.rc.usf.edu will be unavailable for the duration. The work is expected to be completed by 5pm. The following software packages will be affected during the maintenance window.

  • Matlab
  • Mathematica
  • COMSOL
  • Ansys EDT (HFSS)
  • Ansys Multiphysics
  • Fluent
  • IDL
  • Maple
  • Synopsys

In summary: All CIRCE partitions, as well as any RC hosted filesystems and web servers, will be offline starting at 9am (EDT) on March 13 for primary filesystem maintenance. Other RRA and Student Cluster services may be affected as well. Service is anticipated to be restored by 6pm (EDT) on March 13, but may extend until later in that week. A second maintenance event is planned for 9am (EDT) on March 18 for the license0.rc.usf.edu license server. The old server is planned to be replaced, and may be offline for 1 full business day. The software outlined above will be affected.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

02/04/2025 16:51 EST Update 3: SVC datacenter maintenance beginning 01/27/2025


All CIRCE nodes are being resumed at this time. Users may log in and submit jobs normally. It may take a few hours until all compute nodes are fully functional, so please be aware that queue times may be longer than normal until the systems are all back to nominal operations.

Any jobs that were running during this morning's power event will need to be checked for any data loss and resubmitted.

The file system itself has been checked for errors, and we do not believe that any existing data has been corrupted or lost. However, any jobs running at the time that were writing out data may have been stopped mid-stream. Their output will need to be checked for incomplete files.

Going forward, we have been told that the data center is back on fully redundant power with battery backup. Any further work on the electrical facilities has been postponed until next week. We have been informed that steps are being taken to prevent downtime, but have not been given a specific date for the rescheduled time. Currently, there are no plans to shut down any queues or filesystems over the next 2 weeks, but any changes to the current course of action will be sent to this mailing list.

Additional details can be found in the previous email below, or in the "News" section of the RC wiki site https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

02/04/2025 06:19 EST Update 2: SVC datacenter maintenance beginning 01/27/2025


At 04:42 AM EST, an unexpected issue occurred with the remaining electrical power supplying the CIRCE compute nodes and filesystems, and all access to the CIRCE cluster has been suspended. All compute nodes in SVC are currently powered down, and I/O to the main filesystem is being restricted to protect the health of the disks. The filesystem may be fully spun down later today, if deemed necessary. RC staff are currently monitoring the situation and are in contact with facilities management. No ETA for a return to service can be given at this time, but further updates will be sent to this mailing list as soon as information is available.

Additional details can be found in the previous email below, or in the "News" section of the RC wiki site https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

01/30/2025 15:48 EST Update 1: SVC datacenter maintenance beginning 01/27/2025


This is an update regarding the electrical facilities maintenance in the SVC Datacenter. We have been notified that part of the work has been completed, but additional parts needed to be ordered. Currently, work has been halted, but the room is still on reduced power. At this time, most systems are being returned to service if power is available. Job submissions to all queues can be resumed, but several nodes remain without power in the previously listed partitions. Once we receive additional details for when the work will be resumed, further updates will be sent to the mailing list.

Additional details can be found in the previous email below, or in the "News" section of the RC wiki site https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

01/17/2025 14:46 EST SVC datacenter maintenance beginning 01/27/2025


Research Computing has been notified of planned facilities maintenance for the electrical systems within the SVC data center.

The planned work is required maintenance for the continued operation of the power backup systems in the SVC data center. During this scheduled work, available power will be reduced while circuits are bypassed during the maintenance on each system. This will affect only compute nodes in the following CIRCE partitions that are housed in SVC.

circe, amd_2021, cool2022, himem, mri2016, and hii02

Beginning at 5pm on Jan. 27, new job submissions to the above queues will be suspended. Any running jobs that remain will be stopped by 8am on Jan. 28. Users should still be able to log in to CIRCE to monitor and submit new jobs to partitions that remain operational. Any new jobs submitted to the above partitions will not start running until the maintenance has been completed. Currently, no downtime is planned for RRA, the Student Cluster, license servers, or filesystems.

The contractor conducting the work plans to complete the project as quickly as possible. The electrical maintenance is scheduled to be completed by Friday morning, Jan. 31 at 9 am. There is a possibility that it may be finished sooner; however, please plan for the full downtime of the partitions until the CoB on Friday. Once Research Computing has received the all-clear that work has been completed, the affected partitions will be powered on and resumed. While we do not anticipate a full loss of power to the data center during this time, there is always a risk, and extended downtime may be necessary if additional facilities maintenance is required. Please be aware that any unscheduled maintenance may result in additional CIRCE and other RC resources becoming unavailable.

In summary: The above listed CIRCE partitions will be unavailable starting at 5pm (EST) on Monday, Jan. 27 for facilities maintenance related to building electrical infrastructure in SVC. Service is anticipated to be restored by 9am (EST) on Jan. 31.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

10/11/2024 10:25 EDT Update 2: Hurricane Milton


Quick update. Hope that you all are doing well and made it through the storm without too many problems. While crews assess and repair the damage on campus, Research Computing remains in emergency operations mode. Generators and backup chillers did kick in during the storm, and the filesystems and essential system servers were able to stay on.

For the time being, critical infrastructure including the login nodes, storage systems, web servers, and CIFS servers will continue to operate and will be accessible.

Currently, we are unable to provide an exact estimate for when the compute nodes can be powered back on. We will begin the process as soon as we are given the all-clear from Emergency Management that it is safe to return to campus and resume normal operations. It may be another day or two before this happens, however.

Should any change in this decision be made, we will make our best efforts to keep this Research Computing mailing list and our News page (https://wiki.rc.usf.edu/index.php/News) as up-to-date as possible.

Please contact rc-help@usf.edu with any questions or concerns.

10/08/2024 11:35 EDT Update 1: Hurricane Milton


In response to the current University guidance and based upon current tracking data with potential effects of the storm on the Tampa Bay area, Research Computing has entered emergency operations mode. RC staff will begin suspending queues on the CIRCE, RRA, and Student clusters this afternoon. This action will reduce heat within the data centers, in addition to conserving power draw in the event that emergency generators are needed. Currently running jobs may be able to finish, but all jobs will be stopped so that compute nodes can be powered down by 12 noon on Wednesday.

For the time being, critical infrastructure including the CIRCE and RRA login nodes, storage systems, and CIFS servers will continue to operate and will be accessible. Depending on the effects of the storm, these assets may need to be powered down as well.

Should any change in this decision be made, we will make our best efforts to keep this Research Computing mailing list and our News page (https://wiki.rc.usf.edu/index.php/News) as up-to-date as possible.

Please contact rc-help@usf.edu with any questions or concerns.

10/07/2024 09:48 EDT Hurricane Milton


Research Computing staff are closely monitoring the progress of Hurricane Milton. At this time, all systems are functional and remain under normal operations.

While no outages are scheduled at this time, please plan for the likely event that we may see some impact to operations of the Research Computing clusters depending on the path of the storm. The National Hurricane Center has issued a hurricane watch for the Tampa Bay area, and the storm could bring heavy rains and wind by midweek. In the event of disruption to power or cooling in the data center, we may need to execute a shutdown of systems in the following order:

1.) A shutdown of all computational systems.
2.) A shutdown of all login nodes.
3.) A shutdown of all storage systems.

We will post updates as needed and when possible. Please contact rc-help@usf.edu with any questions regarding Research Computing services.

09/27/2024 12:18 EDT Final Update: Hurricane Helene


Research Computing has been given the all-clear notice from USF facilities that the campus infrastructure is ready for the compute nodes to be returned to service.

As of 11:50 am today, all Research Computing computational resources are now back online. Users may log in and submit jobs to the CIRCE, RRA, or SC clusters as normal.

If you experience any issues, please contact the Research Computing Help Desk at rc-help@usf.edu

09/25/2024 12:16 EDT Update 1: Hurricane Helene


In response to the current University guidance and based upon current tracking data with potential effects of the storm on the Tampa Bay area, Research Computing has entered emergency operations mode. RC staff will be partially suspending queues on the CIRCE and RRA clusters later this afternoon. This action will reduce heat within the data centers, in addition to conserving power draw in the event that emergency generators are needed. Job submissions will still be accepted and queued, but wait times may increase until full operations resume. A full shutdown is NOT expected at this time.

Currently, critical infrastructure including the CIRCE and RRA login nodes, storage systems, and CIFS servers will continue to operate and will be accessible. Depending on the severity of the storm effects and advisement from campus facilities operations, cluster assets may need to be powered down during the day on Thursday.

Should any change in this decision be made, we will make our best efforts to keep this Research Computing mailing list and our News page (https://wiki.rc.usf.edu/index.php/News) as up-to-date as possible.

Please contact rc-help@usf.edu with any questions or concerns.

09/24/2024 11:41 EDT Potential Tropical Storm Helene


Research Computing staff are closely monitoring the development of Potential Tropical Cyclone Nine. At this time, all systems are functional and remain under normal operations.

While no outages are planned at this time, we may see some impact to operations of the Research Computing clusters depending on the path of the storm. The National Hurricane Center has issued a Hurricane watch for the Tampa Bay area, and the storm could bring heavy rains and wind toward the end of this week. In the event of disruption to power or cooling in the data center, we may need to execute a shutdown of systems in the following order:

1.) A shutdown of all computational systems.
2.) A shutdown of all login nodes.
3.) A shutdown of all storage systems.

We will post updates as needed and when possible. Please contact rc-help@usf.edu with any questions regarding Research Computing services.

08/05/2024 15:07 EDT Reminder 2: Network maintenance beginning August 6th, 2024


This is a reminder for the service outage related to the network maintenance beginning tomorrow (Aug 6) at 8am (EDT), and extending until Wednesday morning (Aug 7). Please plan for the possibility that services may not be fully restored until later in the week, in the event that unexpected complications arise.

Additional details can be found in the previous email below, or in the "News" section of the RC wiki site https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

07/31/2024 12:15 EDT Reminder 1: Network maintenance beginning August 6th, 2024


This is a reminder for the service outage related to the network maintenance beginning next Tuesday (Aug 6) at 8am (EDT), and extending until Wednesday morning (Aug 7). Please plan for the possibility that services may not be fully restored until later in the week, in the event that unexpected complications arise.

During the scheduled work, the network switches, login nodes, and several attached systems will be offline for the duration. This will affect access to ALL resources attached to the CIRCE, RRA, and student clusters, as well as remote file systems and RC hosted web sites. Any jobs running after the maintenance starts will be stopped. Jobs will need to be resubmitted after the resources are back online.

Additional details can be found in the previous email below, or in the "News" section of the RC wiki site https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

07/17/2024 12:49 EDT Network maintenance beginning August 6th, 2024


The Ethernet networking hardware used by several Research Computing systems will be replaced and/or upgraded on Tuesday, Aug 6th.

During the scheduled work, the network switches, login nodes, and several attached systems will be offline for the duration. This will affect access to ALL resources attached to the CIRCE, RRA, and student clusters, as well as remote file systems and RC hosted web sites. During the network outage, all CIRCE and RRA partitions will be unavailable. Beginning at 8am (EDT) on Tuesday, Aug 6th, all jobs will be stopped and no further access to any partition, login node, or filesystem will be available until after the new network gear has been installed and put into service.

This outage should NOT affect the license servers.

We are currently projecting a return to services by 9am (EDT) on Wednesday, Aug. 7th. However, please plan for the possibility that services may not be fully restored until later in the week.

In summary: All CIRCE, RRA, and student cluster partitions will be offline starting at 8am (EDT) on Tuesday Aug. 6 for network upgrades and maintenance. All RC hosted file systems and web sites will also be affected. Service is anticipated to be restored by 9am (EDT) on Aug. 7, but may extend until later in that week.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

06/03/2024 15:35 EDT PGS storage returned to service


The PGS storage system has been returned to service. Network mounts via cifs-pgs can be resumed.

The PGS filesystem had been powered down during the planned electrical maintenance that started on May 31. When the system was powered back up, the array exhibited cascading drive failures. RC staff have been able to replace the faulty systems and rebuild the filesystem structure. Any directories that were hosted on PGS that had backups have been restored from the most recent snapshot taken before the May 31st electrical maintenance. For most directories, the backup would have been from approximately 6am EDT on Friday May 31. Files placed in PGS mounted directories after the 6am May 31st backup will need to be re-transferred or recreated. If any older files are missing, they may still exist in an older daily incremental. Please contact rc-help@usf.edu if you have any questions.

06/05/2024 16:08 EDT PGS storage currently down


The PGS filesystem is currently unavailable due to issues with the filesystem related to last weekend's power maintenance in MDC. RC is working to fix the issue. Additional updates will follow when more information is available.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

06/04/2024 15:53 EDT Final Update: MDC datacenter maintenance beginning 05/31/2024


This is the final update regarding the May 31 data center maintenance. The electrical work in the MDC building has been completed. Login access for the RRA and SC clusters is now available. All partitions and file systems are available on the CIRCE cluster.

All queues across the 3 clusters are operational. A few nodes remain offline due to various hardware issues, but RC staff will continue working with these nodes to repair or replace them. Job submissions can resume as normal at this time.

No further updates will be mailed regarding this maintenance window. Additional details can be found in the previous email below, or in the "News" section of the RC wiki site https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

05/23/2024 15:01 EDT Reminder 1: MDC datacenter maintenance beginning 05/31/2024


This is a reminder for the service outage related to the MDC Data Center facilities maintenance beginning next Friday (May 31), and extending until at least Wednesday evening (June 5). The CIRCE partitions listed below, as well as the RRA cluster, Student Cluster, BGFS filesystem and PGS filesystem will be unavailable.

  • amd_2021
  • amdwoods_2022
  • bfbsm_2019
  • cbcs
  • charbonnier_2022
  • chbme_2018
  • cms_ocg
  • margres_2020
  • muma_2021
  • qcg_gayles_2022
  • simmons_itn18
  • snsm_itn19


This outage will NOT affect the main circe and himem partitions, or the services hosting websites and license servers. Users will still be able to log in to the CIRCE cluster to access files and submit jobs. Jobs submitted to any of the offline partitions will remain in a pending state until the resources are back online.

Additional details can be found in the previous email below, or in the "News" section of the RC wiki site https://wiki.rc.usf.edu/index.php/News

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

05/06/2024 13:43 EDT MDC datacenter maintenance beginning 05/31/2024


Research Computing has been notified of planned building facilities maintenance within the MDC data center. During the maintenance period, the CIRCE partitions in SVC will remain online. Please continue to read the full message below for details about which services will be offline, and which will remain available.

The planned work consists of several upgrades to the electrical infrastructure in the MDC building. Similar work was previously done in the SVC datacenter. During this scheduled work, all power to servers and filesystems within the MDC data center will be offline for the duration. This will affect several Research Computing services. The partitions listed below will be unavailable for the duration. In addition, all RRA and Student Cluster resources will be offline, as well as the BGFS and PGS filesystems.

  • amd_2021
  • amdwoods_2022
  • bfbsm_2019
  • cbcs
  • charbonnier_2022
  • chbme_2018
  • cms_ocg
  • margres_2020
  • muma_2021
  • qcg_gayles_2022
  • simmons_itn18
  • snsm_itn19


Beginning at 5pm on May 31, jobs running on the partitions listed above will be stopped, and those partitions will not accept further jobs until after power to the building has been restored.

This outage will NOT affect the main circe and himem partitions, or the services hosting websites and license servers. Users will still be able to log in to the CIRCE cluster to access files and submit jobs. However, jobs submitted to any of the offline partitions will remain in a pending state until the resources are back online.
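
For users who script their job submissions, a small pre-check can flag jobs that would sit pending during the outage. The Python sketch below is only an illustration and not an RC-provided tool: the function name is hypothetical, and it simply compares a partition name against the list of offline partitions given above.

```python
# Partitions listed in this announcement as offline during the MDC maintenance.
OFFLINE_PARTITIONS = frozenset({
    "amd_2021", "amdwoods_2022", "bfbsm_2019", "cbcs",
    "charbonnier_2022", "chbme_2018", "cms_ocg", "margres_2020",
    "muma_2021", "qcg_gayles_2022", "simmons_itn18", "snsm_itn19",
})


def will_stay_pending(partition):
    """Return True if a job sent to this partition would wait until it is back online."""
    # Hypothetical helper: exact, case-sensitive match against the announced list.
    return partition.strip() in OFFLINE_PARTITIONS
```

A submission script could call will_stay_pending() with its target partition and print a warning, or choose a partition that remains online, before submitting.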

The building maintenance is scheduled to be complete by Tuesday morning, June 4th. Once Research Computing has received the all-clear that work has been completed, the process of rebooting all systems will begin. This process normally takes several hours, and while we do not anticipate any additional power disruption, it may take several days to fully restore services should any unexpected issues arise. We are currently projecting a return to services by 6pm (EDT) on June 5th. However, please plan for the possibility that services may not be fully restored until later in the week.

In summary: The CIRCE partitions listed above, as well as the RRA cluster, Student Cluster, BGFS filesystem and PGS filesystem will be powered down starting at 5pm (EDT) on May 31 for facilities maintenance related to building electrical infrastructure in MDC. The main "circe" and himem partitions, login nodes, web sites and license servers will remain online. Service is anticipated to be restored by 6pm (EDT) on June 5th, but may extend until later in that week.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.

03/08/2024 10:36 EST CIRCE resources to require USF VPN beginning March 14


On March 14, access to all Research Computing resources, including the CIRCE login nodes and mountable filesystems, will start to be moved behind the USF VPN. Access from machines on campus using either Wi-Fi or a wired Ethernet connection should remain unaffected. All off-campus users will need to be connected to the USF VPN prior to accessing CIRCE using X2Go, ssh, Filezilla, WinSCP, or other remote connection tools starting on that date. At some point after 12:00 AM EST on March 14, CIRCE and other Research Computing resources will no longer be accessible without first connecting to the USF VPN. We recommend that all users start the habit of connecting to the VPN from off campus as soon as possible. Even if some resources are available after that date, they would only remain so for a short time until all networking changes have been committed.

Please test any automated scripts and workflows that utilize CIRCE connections with the VPN prior to March 14, in order for RC staff to provide any necessary assistance in a timely manner.
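
One way to test an automated workflow ahead of the change is to have it check connectivity before it tries to transfer files or submit jobs. The Python sketch below is only an illustration: the hostname is a placeholder rather than an actual CIRCE address, and port 22 assumes the workflow connects over ssh. It reports whether a TCP connection can be opened, which from off campus would be expected to fail after the change unless the VPN is connected.

```python
import socket

# Placeholder only: substitute the login host your workflow actually uses.
LOGIN_HOST = "login.example.edu"
SSH_PORT = 22  # assumes the workflow connects over ssh


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

A script could call can_reach(LOGIN_HOST, SSH_PORT) at startup and exit early with a reminder to connect to the USF VPN when it returns False, rather than failing partway through a transfer.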

Hosted web sites will NOT need to use the VPN, and will still be accessible from a web browser as normal.

The PDFs in the links below from the Office of Research & Innovation provide details about connecting to the VPN.

https://www.usf.edu/research-innovation/documents/globalprotect-windows.pdf

https://www.usf.edu/research-innovation/documents/globalprotect-macos.pdf

If you have not used the USF VPN before, please use the following link to set up the VPN on your device.

https://vpn.usf.edu/

Requests for assistance with connecting to the USF VPN should be directed to the main USF IT Help Desk: help@usf.edu

If you have any questions about CIRCE access, please contact Research Computing at rc-help@usf.edu.

02/29/2024 16:25 EST SVC datacenter maintenance beginning March 25, 2024


Research Computing has been notified of scheduled maintenance to upgrade an electrical breaker panel within the SVC data center.

During the scheduled work, power to the login nodes and core network switches within the SVC data center will be offline for the duration. This will affect access to all resources attached to the CIRCE cluster. During the power outage, all CIRCE partitions will be unavailable. Beginning at 6am on Monday, Mar 25, all jobs will be stopped and no further access to any CIRCE partition, login node, or filesystem will be available until after power to the breaker panel has been restored.

The RRA and SC clusters will remain online in the MDC data center, but there is a possibility that the loss of communication with resources housed in SVC may cause instability on those systems. Therefore we recommend not relying on the availability of the RRA or SC clusters during the planned SVC outage.

This outage should NOT affect the license servers or RC websites.

The electrical maintenance is scheduled to be complete by Tuesday afternoon, Mar. 26th. Once Research Computing has received the all-clear that work has been completed, the process of rebooting all systems will begin. This process normally takes several hours. We are currently projecting a return to services by 6pm (EDT) on Tuesday, Mar. 26th. However, please plan for the possibility that services may not be fully restored until later in the week.

In summary: All CIRCE partitions will be powered down starting at 6am (EDT) on Mar. 25 for facilities maintenance related to electrical upgrades in the SVC datacenter. Other RRA and SC services may be affected. Service is anticipated to be restored by 6pm (EDT) on Mar. 26, but may extend until later in that week.

If you have any questions about this maintenance window, please contact Research Computing at rc-help@usf.edu.