
File systems

Permanent file systems

Global /home file system

Each user has 20 GB in their /home directory, independent of the capacity granted to the project. Hints for using the global home directory:
  • If you need distinct .bashrc files for each machine, you should create separate files for them, named .bashrc_<machine_name>
  • If you use various machines frequently, it might be useful to set the environment variable HISTFILE in .bashrc_deimos and .bashrc_mars to $HOME/.bash_history_<machine_name>. Setting HISTSIZE and HISTFILESIZE to 10000 helps as well (see the sketch after this list).
  • Further, you may use private module files to simplify the process of loading the right installation directories, see private modules.
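
A minimal sketch of how such a per-machine setup could look; the hostname-based dispatch is an assumption, adjust the file names to the machines you actually use:

  # in ~/.bashrc: source the machine-specific file, if one exists
  if [ -f "$HOME/.bashrc_$(hostname -s)" ]; then
      . "$HOME/.bashrc_$(hostname -s)"
  fi

  # in ~/.bashrc_<machine_name>: keep a separate, larger history per machine
  export HISTFILE=$HOME/.bash_history_$(hostname -s)
  export HISTSIZE=10000
  export HISTFILESIZE=10000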

Global /projects file system

For project data, we have a global project directory that allows better collaboration between the members of an HPC project. However, on the compute nodes /projects is mounted read-only, because it is not a file system for parallel I/O. See below, and also check the HPC introduction for more details.

Backup and snapshots of the file system

  • Backup is only available in the /home and the /projects file systems!
  • Files are backed up using snapshots of the NFS server and can be restored by the user
  • A changed file can always be recovered as it was at the time of the snapshot
  • Snapshots are taken:
    • from Monday through Saturday between 06:00 and 18:00 every two hours and kept for one day (7 snapshots)
    • from Monday through Saturday at 23:30 and kept for two weeks (12 snapshots)
    • every Sunday at 23:45 and kept for 26 weeks
  • To restore a previous version of a file:
    • go into the directory of the file you want to restore
    • run cd .snapshot (this subdirectory exists in every directory on the /home file system although it is not visible with ls -a)
    • the .snapshot directory lists all available snapshots
    • simply cd into the directory of the point in time you wish to restore from and copy the file you want back to wherever you want it
    • Attention! The .snapshot directory is not only hidden from normal view (ls -a), it is also embedded in a different directory structure. An ls ../.. will not list the directory you came from. Thus, we recommend copying the file from the location where it originally resided:
      % pwd
      /home/username/directory_a
      % cp .snapshot/timestamp/lostfile lostfile.backup
  • /home and /projects are definitely NOT meant as work directories: since all files are kept in the snapshots and on the backup tapes for a long time, they
    • needlessly fill the disks and
    • prevent the backup process from working efficiently due to their sheer number and volume.

Group quotas for the file system

The quotas of the home file system are meant to help users keep track of their data. Especially in HPC, millions of temporary files can be created within hours; this is the main reason for performance degradation of the file system. If a project exceeds its quota (total size OR total number of files), it cannot submit jobs into the batch system. The following commands can be used for monitoring:
  • quota -s -g <groupname> shows the project's usage of the file system.
  • quota -s -f /home shows the user's usage of the file system.
In case a project is above its limits, please...
  • remove core dumps, temporary data
  • talk with your colleagues to identify the hotspots (see the sketch after this list),
  • check your workflow and use /tmp or the scratch file systems for temporary files
  • systematically handle your important data.
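
One possible way to locate such hotspots with standard tools (the paths are only examples):

  # largest subdirectories of your home directory, sorted by size
  du -sh $HOME/* | sort -h | tail -20

  # directories containing the most files
  find $HOME -xdev -type f | sed 's|/[^/]*$||' | sort | uniq -c | sort -n | tail -20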

Work directories

File system | Usable directory | Capacity | Availability | Backup | Remarks
Lustre      | /scratch/        | 4 PB     | global       | No     | Only accessible via workspaces. Not made for billions of files!
Lustre      | /lustre/ssd      | 40 TB    | global       | No     | Only accessible via workspaces. Fastest available file system; only for large parallel applications with millions of small I/O operations.
ext4        | /tmp             | 95.0 GB  | local        | No     | Cleaned up automatically after the job.
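
Since /scratch and /lustre/ssd are only accessible via workspaces, working there typically starts with the workspace tools. A sketch, assuming the usual ws_* commands (name and duration are illustrative; see the workspace documentation for the exact options):

  # allocate a workspace named "mydata" on /scratch for 30 days
  ws_allocate -F scratch mydata 30
  # list your workspaces and their expiration dates
  ws_list
  # release the workspace once it is no longer needed
  ws_release -F scratch mydata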

Large files in /scratch

The data containers in Lustre are called object storage targets (OSTs). The capacity of one OST is about 21 TB. All files are striped over a certain number of these OSTs. For small and medium files, the default number is 2. As soon as a file grows above ~1 TB, it makes sense to spread it over a higher number of OSTs, e.g. 16. Once the file system is more than 75% full, the average free space per OST is only about 5 TB. So it is essential to stripe your larger files so that the chunks can be stored!

Let's assume you have a directory where you tar your results, e.g. /scratch/ws/mark-stripe20/tar. Now, simply set the stripe count to a higher number in this directory with:
lfs setstripe -c 20  /scratch/ws/mark-stripe20/tar

Note: This does not affect existing files, but all files created in this directory afterwards will be distributed over 20 OSTs.
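
To check which striping a directory or file actually uses, query the current settings (the path is the example from above):

  lfs getstripe /scratch/ws/mark-stripe20/tar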

Warm archive

Recommendations for file system usage

To work as efficiently as possible, consider the following points:
  • Save source code etc. in /home or /projects/...
  • Store checkpoints and other temporary data in /scratch/ws/...
  • Compile in /dev/shm or /tmp (see the sketch below)
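
A minimal sketch of such a compilation in the node-local RAM disk; the project name and the make-based build are assumptions, not part of the compendium:

  # copy the sources to the node-local RAM disk and build there
  cp -r $HOME/myproject /dev/shm/
  cd /dev/shm/myproject
  make -j "$(nproc)"
  # copy the resulting binary back to a permanent file system
  cp myprogram $HOME/myproject/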

Getting high I/O bandwidth
  • Use many clients
  • Use many processes (writing in the same file at the same time is possible)
  • Use large I/O transfer blocks (see the example below)
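
As a rough illustration of the last point (file name and sizes are only examples): writing the same amount of data in large blocks issues far fewer I/O operations than writing it in small blocks.

  # 4 GiB written in 16 MiB blocks: 256 write operations
  dd if=/dev/zero of=/scratch/ws/mark-stripe20/testfile bs=16M count=256
  # 4 GiB written in 4 KiB blocks: over a million write operations
  dd if=/dev/zero of=/scratch/ws/mark-stripe20/testfile bs=4K count=1048576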