Quick PBS Commands for GEOFrame Simulations
Recently, I needed to run several jobs on a High-Performance Computing (HPC) system. Specifically, I wanted to perform multiple kriging simulations with the GEOFrame hydrological modeling system. GEOFrame already supports parallel processing, so each job can run multiple simulations simultaneously using a topology file, typically a synthetic one to avoid unnecessary waits at leaf nodes. To further speed up the workflow, I split my topology into multiple files and submitted each as part of an array job.
Below, I share some useful commands, starting with the PBS file for submitting array jobs. Each simulation file is named graph${PBS_ARRAY_INDEX}.sim, where ${PBS_ARRAY_INDEX} varies from 1 to 90.
PBS Array Job Submission Example:
#!/bin/bash
#PBS -l select=1:ncpus=20:mem=30GB
#PBS -N kriging_array_1
#PBS -m abe
#PBS -M my_mail@mail.com
#PBS -l walltime=03:00:00
#PBS -J 1-90
#PBS -q myQueue
module load jdk-11.0.1
cd /home/daniele.andreis/articolo_kriging/
SIM_FILE="simulation/kriging/temp/grid/graph${PBS_ARRAY_INDEX}.sim"
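# $HOME is used instead of ~, which the shell does not expand inside quotes or mid-word
# stdout and stderr of the Java process are discarded
java -Xms10G -Xmx28G -XX:+UseG1GC \
-Doms3.work="$HOME/kriging/" \
-cp ".:$HOME/oms-3.6.28-console/lib/oms-all.jar:lib/*:dist/*" oms3.CLI \
-r "$SIM_FILE" \
> /dev/null 2>&1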
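Because each subjob builds its own file name from ${PBS_ARRAY_INDEX}, a missing .sim file only shows up once that subjob starts. A quick check before submitting can help. The following is a minimal sketch, not part of GEOFrame or PBS: the function name missing_sim_files is hypothetical, and it assumes the graph<index>.sim naming used in the script above.

```shell
# Print every array index in [first, last] whose .sim file is missing.
# Usage: missing_sim_files <dir> <first> <last>
# (hypothetical helper; the body runs in a subshell so its variables stay local)
missing_sim_files() (
    dir=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        # Same naming scheme as SIM_FILE in the PBS script
        [ -f "$dir/graph$i.sim" ] || echo "$i"
        i=$((i + 1))
    done
)
```

For the range used by `#PBS -J 1-90`, it could be called from the project directory as `missing_sim_files simulation/kriging/temp/grid 1 90`. An empty result means every index has its file.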
Other Useful PBS Commands:
These commands can be useful for monitoring and managing both simple jobs and array jobs:
qstat (Job Status and Queue Information)
- qstat -Q: Displays information about all available queues.
- qstat -u $USER: Shows the status of your jobs and job arrays.
- qstat -atu $USER: Provides detailed information about all your jobs, including individual tasks within job arrays.
- qstat -Qf queue_name: Shows detailed information about a specific queue (e.g., maximum number of jobs).
qdel (Cancel Jobs)
- qdel job_id: Cancels a specific job.
- qdel job_id[]: Cancels all jobs in an array.
- qdel job_id[2]: Cancels the subjob with index 2 within a job array.
Checking Job Output Files:
- To verify that the number of generated output files matches expectations:
ls -1A | wc -l
- To quickly delete empty error logs, so that only the non-empty ones (from jobs that possibly failed) remain:
find ./ -type f -size 0 -name 'kriging*.e*' -delete
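As a complement to the commands above, the non-empty error logs can be turned into a list of array indices worth inspecting. This is a rough sketch under one assumption: that the error files are named <job_name>.e<sequence_number>.<index>, the usual PBS Professional pattern for subjobs. Check the names on your system first. The function name failed_indices is hypothetical.

```shell
# List the array indices that have a non-empty error log in <dir>.
# Usage: failed_indices <dir>
# Assumes file names ending in ".<index>", e.g. kriging_array_1.e1234.7
failed_indices() (
    dir=$1
    # "! -size 0" keeps only non-empty files; sed strips everything
    # up to the last dot, leaving the trailing array index.
    find "$dir" -type f ! -size 0 -name 'kriging*.e*' |
        sed 's/.*\.//' |
        sort -n
)
```

Running `failed_indices .` in the directory holding the logs prints one index per line, in numeric order. A non-empty log does not prove a failure, so it is only a starting point for inspection.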