Quick PBS Commands for GEOFrame Simulations

Recently, I needed to run several jobs on a High-Performance Computing (HPC) system. Specifically, I wanted to perform multiple kriging simulations with the GEOFrame hydrological modeling system. GEOFrame already supports parallel processing, so each job can run multiple simulations simultaneously using a topology file (typically a synthetic one, to avoid unnecessary waiting at the leaf nodes). To speed up the workflow further, I split my topology into multiple files and submitted each one as a subjob of an array job.

Below, I share some useful commands, starting with the PBS file for submitting array jobs. Each simulation file is named graph${PBS_ARRAY_INDEX}.sim, where ${PBS_ARRAY_INDEX} is the subjob index set by PBS and ranges from 1 to 90.

PBS Array Job Submission Example:


#!/bin/bash

#PBS -l select=1:ncpus=20:mem=30GB
#PBS -N kriging_array_1
#PBS -m abe
#PBS -M my_mail@mail.com
#PBS -l walltime=03:00:00
#PBS -J 1-90

#PBS -q myQueue


module load jdk-11.0.1

cd /home/daniele.andreis/articolo_kriging/

SIM_FILE="simulation/kriging/temp/grid/graph${PBS_ARRAY_INDEX}.sim"

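# Use $HOME rather than ~: the shell does not expand ~ inside quotes or in the middle
# of a word such as -Doms3.work=~/..., and Java does not expand it either.
# lib/* and dist/* stay quoted so that Java, not the shell, expands the classpath wildcards.
java -Xms10G -Xmx28G -XX:+UseG1GC \
        -Doms3.work="$HOME/kriging/" \
        -cp ".:$HOME/oms-3.6.28-console/lib/oms-all.jar:lib/*:dist/*" oms3.CLI \
        -r "$SIM_FILE" \
        > /dev/null 2>&1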
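Since the model output is discarded, a missing simulation file only becomes apparent after the subjob has already run. A minimal pre-submission check, assuming it is run from the project directory used in the cd line above: it tests every index in the -J range against the same path pattern as SIM_FILE. The function name check_sim_files and the script name kriging_array.pbs are hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-submission check: report array indices whose .sim file is missing.
# Usage: check_sim_files FIRST LAST [BASE_DIR]
check_sim_files() {
    local first=$1 last=$2 base=${3:-.}
    local missing=0 i sim_file
    for ((i = first; i <= last; i++)); do
        # Same relative path as SIM_FILE in the PBS script, with i standing in for PBS_ARRAY_INDEX
        sim_file="$base/simulation/kriging/temp/grid/graph${i}.sim"
        if [[ ! -f $sim_file ]]; then
            echo "missing: $sim_file"
            missing=$((missing + 1))
        fi
    done
    echo "$missing missing of $((last - first + 1))"
    # Exit status 0 only if every file exists, so the check can gate qsub
    [[ $missing -eq 0 ]]
}
```

For example, check_sim_files 1 90 && qsub kriging_array.pbs submits the array only when all 90 files are present.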


Other Useful PBS Commands:

These commands can be useful for monitoring and managing both simple jobs and array jobs:

qstat (Job Status and Queue Information)

  • qstat -Q: Displays information about all available queues.
  • qstat -u $USER: Shows the status of your jobs and job arrays.
  • qstat -atu $USER: Lists all your jobs in the alternate, more detailed format (-a), including each subjob of your job arrays (-t).
  • qstat -Qf queue_name: Shows the full configuration of a specific queue (e.g., the maximum number of jobs).
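With 90 subjobs, scrolling through the qstat output gets tedious. A minimal sketch that tallies subjobs by state, assuming PBS Pro's full output format (qstat -f), where each job record contains a line such as "job_state = R". The function name count_states is hypothetical, and the parent array record (state B) is counted along with the subjobs.

```shell
#!/bin/bash
# Hypothetical helper: tally job states from `qstat -f` style output on stdin.
# Assumes each job record contains a line such as "    job_state = R".
count_states() {
    awk -F' = ' '$1 ~ /^[ \t]*job_state$/ {
            gsub(/[ \t\r]/, "", $2)   # drop stray whitespace around the state letter
            n[$2]++
        }
        END { for (s in n) print s, n[s] }' | sort
}

# Usage (assumption: 1234[] is the array job ID printed by qsub):
#   qstat -f -t '1234[]' | count_states
```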

qdel (Cancel Jobs)

  • qdel job_id: Cancels a specific job.
  • qdel job_id[]: Cancels the whole job array, including all of its subjobs.
  • qdel job_id[2]: Cancels only the subjob with array index 2.
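To cancel several non-contiguous subjobs at once, the IDs can be built in a loop and passed to a single qdel call, since qdel accepts multiple job IDs. Quoting matters here: in bash, an unquoted 1234[2] is also a glob pattern and is replaced by the name of a file called 12342 if one exists in the current directory. A minimal sketch; the function name cancel_subjobs and the QDEL override (handy for a dry run with QDEL=echo) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: cancel selected subjobs of one PBS array job.
# Usage: cancel_subjobs ARRAY_ID INDEX...
#   ARRAY_ID is the ID printed by qsub, e.g. 1234[] or 1234[].server
cancel_subjobs() {
    local array_id=$1
    shift
    local ids=() i
    for i in "$@"; do
        # Replace the empty brackets with the subjob index: 1234[] -> 1234[7]
        ids+=("${array_id/\[\]/[$i]}")
    done
    # Quoted expansion keeps each ID intact and prevents glob expansion of [...]
    "${QDEL:-qdel}" "${ids[@]}"
}

# Dry run: prints the qdel arguments instead of cancelling anything
#   QDEL=echo cancel_subjobs '1234[]' 2 7 15
```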

Checking Job Output Files:

  • To check that the number of output files matches expectations (run it in the output directory; -A includes hidden files but not . and ..):
    ls -1A | wc -l
  • To delete empty error logs, so that only the logs of subjobs that wrote to stderr (and therefore possibly failed) remain:
    find ./ -type f -size 0 -name 'kriging*.e*' -delete
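The non-empty error logs point to the subjobs worth inspecting or rerunning. A minimal sketch that prints their array indices, assuming PBS Pro's default naming for subjob error files, <job name>.e<sequence number>.<index> (e.g. kriging_array_1.e1234.17). The function name failed_indices is hypothetical. It skips empty files itself, so it works before or after the find cleanup.

```shell
#!/bin/bash
# Hypothetical helper: print the array indices of non-empty subjob error logs.
# Assumes PBS Pro-style names: <job name>.e<sequence number>.<index>
# Usage: failed_indices [DIR] [NAME_PREFIX]
failed_indices() {
    local dir=${1:-.} prefix=${2:-kriging} f idx
    for f in "$dir/$prefix"*.e*; do
        # Skip empty logs (and the unexpanded pattern if nothing matched)
        [[ -s $f ]] || continue
        idx=${f##*.}    # text after the last dot is the array index
        if [[ $idx =~ ^[0-9]+$ ]]; then
            echo "$idx"
        fi
    done | sort -n
}
```

For example, failed_indices . kriging_array_1 | wc -l gives the number of subjobs that reported errors.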
