ILRI Research Computing News

Here we will post news and announcements about the ILRI research-computing infrastructure, including important updates, breaking changes, etc.


May 2, 2023: planned maintenance

ILRI ICT is planning some major maintenance on our Nairobi infrastructure during the upcoming weekend.

Due to the invasive nature of the maintenance, the HPC will be inaccessible for several hours on the mornings of Saturday, May 6th, and Sunday, May 7th, beginning at 8:00 AM. Please refrain from submitting long-running jobs, as they may be suspended or interrupted unexpectedly.

We will notify you when the maintenance is finished, after which time you can re-submit your jobs.

March 23, 2023: planned maintenance/upgrade

I am planning to upgrade the HPC's backend storage systems to a new version of their operating system (CentOS)¹ on Monday, March 27th at 9AM (EAT). Due to the invasive nature of these upgrades I will need to stop existing SLURM jobs and restrict access to the cluster. The work will probably take the entire morning.

¹ Technically speaking, we are upgrading from CentOS 7 to CentOS Stream 8, which is the same upgrade I've performed on several compute nodes over the past few months. This is a major upgrade that we only perform every five years or so, and it will modernize the HPC and improve its security.

November 15, 2022: planned maintenance/upgrade

We are planning to upgrade the operating system on the HPC head node. This will bring the OS from CentOS 7 to CentOS Stream 8, which is the same upgrade we did on all compute nodes several months ago. The maintenance will take place on:

                       Friday, November 25th @ 2–5PM

Any SLURM jobs that are still running will be re-queued and held until after the maintenance is over. The cluster will be inaccessible during this time.

Please bear with us for this important update to our infrastructure.

Thank you!

August 28, 2022: power issues in Nairobi

Unfortunately we had major power issues at the ILRI Nairobi campus this weekend and the entire cluster was powered off several times. ILRI ICT is doing emergency maintenance on the UPS power backup system to help mitigate issues like this in the future.

If you had any jobs running they may have crashed or been resubmitted. Please check carefully to see the state of your jobs and let me know if you have any questions.

  • 2022-08-29: Power went off again at 8:00AM.
  • 2022-08-30: Power went off again at 3:30PM.
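
One way to check the state of your jobs after an outage is sketched below. It uses the standard SLURM commands squeue and sacct. The function name check_my_jobs is hypothetical, the default start date is an assumption (the Saturday before this announcement), and the sacct part assumes job accounting is enabled on the cluster.

```shell
#!/usr/bin/bash
# Sketch only: list your own jobs after an outage.
# check_my_jobs is a hypothetical helper name, not a SLURM command.
check_my_jobs() {
    # Assumed default: the Saturday before the announcement.
    local since="${1:-2022-08-27}"

    # Jobs that are currently pending or running.
    squeue -u "$USER"

    # Accounting records from the given date onward, with state and exit code.
    sacct -u "$USER" --starttime "$since" \
        --format=JobID,JobName%20,State,ExitCode,Start,End
}
```

After defining the function in your shell, run check_my_jobs, or pass another date such as check_my_jobs 2022-08-29. Jobs that sacct lists with a state such as FAILED or NODE_FAIL most likely need to be resubmitted.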

August 22, 2022: updated syntax for SBATCH scripts

Due to a change in the behavior of our SLURM resource scheduler, the syntax for SBATCH scripts must change slightly to fix an issue with environment modules. If you are seeing errors such as this:

/var/spool/slurmd/job747929/slurm_script: line 8: module: command not found

Then you need to make sure the first line of your SBATCH script starts like this:

#!/usr/bin/bash -l

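The new syntax requires -l after the bash command. The -l option starts bash as a login shell, which reads the system profile scripts; these are typically what make the module command available.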
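
If you have many SBATCH scripts, a small check along these lines may help find the ones that still need updating. This is only a sketch: has_login_shebang is a hypothetical helper, not part of SLURM, and it only inspects the first line of each file.

```shell
#!/usr/bin/bash
# Hypothetical helper (not part of SLURM): succeed if the first line of the
# script given as $1 invokes bash as a login shell ("-l" or "--login").
has_login_shebang() {
    local first_line=""
    # Read only the first line; tolerate a file with no trailing newline.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 2
    case "$first_line" in
        '#!'*/bash' -l'|'#!'*/bash' --login') return 0 ;;
        *) return 1 ;;
    esac
}

# Report on every script passed on the command line.
for script in "$@"; do
    if has_login_shebang "$script"; then
        echo "OK:  $script"
    else
        echo "FIX: $script (first line should be '#!/usr/bin/bash -l')"
    fi
done
```

Saved under a name of your choice, for example check_shebang.sh, it could be run as bash check_shebang.sh *.sbatch to list the scripts whose first line still lacks -l.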


news.txt · Last modified: 2023/05/02 12:47 by aorth