File systems

Permanent file systems

Global /home file system

Each user has 20 GB in their /home directory, independent of the capacity granted to the project. Hints for using the global home directory:
  • If you need distinct .bashrc files for each machine, you should create separate files for them, named .bashrc_<machine_name>
  • If you use various machines frequently, it might be useful to set the environment variable HISTFILE in .bashrc_deimos and .bashrc_mars to $HOME/.bash_history_<machine_name>. Setting HISTSIZE and HISTFILESIZE to 10000 helps as well.
  • Further, you may use private module files to simplify the process of loading the right installation directories, see private modules.
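The per-machine setup from the hints above could be sketched as follows, placed in the shared ~/.bashrc. This is only a sketch: it assumes that the short host name (the output of uname -n up to the first dot) matches the <machine_name> suffix used in your file names, which may not hold on login nodes with numbered names.

```shell
# Sketch for the shared ~/.bashrc (assumption: the short host name
# equals the <machine_name> suffix of your per-machine files).
machine=$(uname -n)
machine=${machine%%.*}          # strip the domain part, if any

# Keep a separate, larger shell history per machine.
export HISTFILE="$HOME/.bash_history_${machine}"
export HISTSIZE=10000
export HISTFILESIZE=10000

# Load machine-specific settings if such a file exists.
if [ -f "$HOME/.bashrc_${machine}" ]; then
    . "$HOME/.bashrc_${machine}"
fi
```

Deriving the name at login keeps a single .bashrc in the global /home while each machine still gets its own history file and settings.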

Global /projects file system

For project data, there is a global project directory that allows better collaboration between the members of an HPC project.

Backup and snapshots of the file system

  • Backup is only available in the /home and the /projects file systems!
  • Files are backed up using snapshots of the NFS server and can be restored by the user
  • A changed file can always be recovered as it was at the time of the snapshot
  • Snapshots are taken:
    • from Monday through Saturday between 06:00 and 18:00 every two hours and kept for one day (7 snapshots)
    • from Monday through Saturday at 23:30 and kept for two weeks (12 snapshots)
    • every Sunday at 23:45 and kept for 26 weeks
  • to restore a previous version of a file:
    • go into the directory of the file you want to restore
    • run cd .snapshot (this subdirectory exists in every directory on the /home file system although it is not visible with ls -a)
    • all available snapshots are listed in the .snapshot directory
    • cd into the directory for the point in time you want to restore and copy the file to wherever you need it
    • Attention! The .snapshot directory is not only hidden from normal view (ls -a), it is also embedded in a different directory structure. An ls ../.. will not list the directory you came from. We therefore recommend copying the file from the location where it originally resided:
      % pwd
      /home/username/directory_a
      % cp .snapshot/timestamp/lostfile lostfile.backup
  • /home and /projects are definitely NOT meant to be used as work directories: since all files are kept in the snapshots and on the backup tapes for a long time, temporary files
    • needlessly fill the disks and
    • keep the backup process from working efficiently through their sheer number and volume.
  • The backup automatically skips directories named NOBACKUP.ZIH.
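The restore steps above can be wrapped in a small shell function. This is a sketch, not a ZIH-provided tool: the function name restore_from_snapshot is made up for illustration, and it assumes the .snapshot/<timestamp>/<file> layout described above. It is meant to be run from the directory where the file originally resided, and it writes the copy to <file>.backup rather than over the current file.

```shell
# restore_from_snapshot <timestamp> <file>   (hypothetical helper)
# Copies <file> from the given snapshot of the current directory
# to <file>.backup, leaving the current version untouched.
restore_from_snapshot() {
    snap_name=$1
    snap_file=$2
    snap_src=".snapshot/$snap_name/$snap_file"

    if [ ! -f "$snap_src" ]; then
        echo "not found in snapshot: $snap_src" >&2
        return 1
    fi

    # -p keeps the original modification time and permissions.
    cp -p "$snap_src" "$snap_file.backup"
}

# Usage, following the example above:
#   cd /home/username/directory_a
#   ls .snapshot                      # list the available snapshots
#   restore_from_snapshot timestamp lostfile
```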

Group quotas for the file system

The quotas of the home file system are meant to help users keep track of their data. Especially in HPC, millions of temporary files can be created within hours, and this is the main cause of file system performance degradation. If a project exceeds its quota (total size OR total number of files), it cannot submit jobs to the batch system. The following commands can be used for monitoring:
  • quota -s -g <groupname> shows the project's usage of the file system.
  • quota -s -f /home shows the user's usage of the file system. Please note: there are no quotas for individual accounts, only for the project as a whole!
If a project is above its limits, please...
  • remove core dumps, temporary data
  • talk with your colleagues to identify the hotspots,
  • check your workflow and use /fastfs for temporary files
  • systematically handle your important data:
    • For later use (weeks...months) at the HPC systems, build tar archives with meaningful names or IDs and store them in the DMF system. Avoid using this system (/hpc_fastfs) for files < 1 MB!
    • refer to the hints for long term preservation for research data.
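Two of the clean-up steps above, finding hotspots and building named tar archives, could be sketched in shell like this. Both function names are made up for illustration, and the date-stamped archive name is just one possible convention for a "meaningful name". The sketch only builds the archive in the current directory; moving it into the DMF system is left out because the target path is not given here.

```shell
# count_files_per_dir [top]   (hypothetical helper)
# Prints the number of regular files below each subdirectory of
# <top>, largest first, to show where the file count comes from.
count_files_per_dir() {
    cf_top=${1:-.}
    for cf_dir in "$cf_top"/*/; do
        [ -d "$cf_dir" ] || continue
        cf_n=$(find "$cf_dir" -type f | wc -l | tr -d ' ')
        printf '%8d %s\n' "$cf_n" "$cf_dir"
    done | sort -rn
}

# archive_dir <dir> <label>   (hypothetical helper)
# Packs <dir> into <label>_<YYYYMMDD>.tar.gz in the current
# directory and prints the archive name.
archive_dir() {
    ar_dir=$1
    ar_label=$2
    ar_name="${ar_label}_$(date +%Y%m%d).tar.gz"
    tar -czf "$ar_name" "$ar_dir" && echo "$ar_name"
}

# Usage:
#   count_files_per_dir /projects/<project>
#   archive_dir run_042 myproject_run042
```

One archive replaces many small files with a single large one, which addresses both the file-count quota and the advice against storing files smaller than 1 MB.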

Work directories

Taurus

| File system | Usable directory | Capacity | Availability | Backup | Remarks |
| Lustre | /scratch | 2.8 PB | global | No | automounter (dynamic mount of project and user directories on access) |
| Lustre | /lustre/ssd | 40 TB | global | No | fastest available file system, only for large parallel applications running with millions of small I/O operations |
| ext4 | /tmp | 62.0 GB | local | No | |

SGI Ultraviolet (Venus)

| File system | Usable directory | Capacity | Availability | Backup | Remarks |
| Lustre | /scratch | 2.8 PB | global | No | same as /fasts on Atlas |
| Lustre | /lustre/ssd | 40 TB | global | No | fastest available file system, only for large parallel applications running with millions of small I/O operations |

Recommendations for file system usage

To work as efficiently as possible, consider the following points:
  • Save source code etc. only in /home
  • Store checkpoints and other temporary data in /scratch
  • Compile in /scratch or /tmp
  • To retrieve all files from tape, use the much more efficient pipe form:
     dmfind . -state OFL | dmget 
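As an illustration of keeping temporary data out of /home, a job script could create its own working directory like this. It is a sketch under assumptions: the function name make_workdir and the /scratch/$USER default are hypothetical (the actual directory layout under /scratch is not described here), and the fallback to the local /tmp applies only when the chosen base is missing or not writable.

```shell
# make_workdir [base]   (hypothetical helper)
# Creates a unique working directory below <base> and prints its
# path. Falls back to $TMPDIR or /tmp if <base> is not usable.
make_workdir() {
    # Assumption: per-user directories live at /scratch/<user>.
    mw_base=${1:-/scratch/${USER:-$(id -un)}}
    if [ ! -d "$mw_base" ] || [ ! -w "$mw_base" ]; then
        mw_base=${TMPDIR:-/tmp}
    fi
    mktemp -d "$mw_base/job.XXXXXX"
}

# Usage inside a job script:
#   workdir=$(make_workdir) || exit 1
#   cd "$workdir"
#   ... compile, run, write checkpoints here ...
#   cp results.dat "$HOME/myproject/"   # keep only what matters
```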

Getting high I/O bandwidth
  • Use many clients
  • Use many processes (writing to the same file at the same time is possible)
  • Use large I/O transfer blocks
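The effect of the transfer block size can be observed roughly with dd. The helper below is a sketch with a made-up name (write_blocks); it writes the same amount of data with a caller-chosen block size so that the two runs can be timed against each other. Timings from such a test depend heavily on caching and on the current load, so treat them as an indication only.

```shell
# write_blocks <file> <block_size_bytes> <count>   (hypothetical helper)
# Writes <count> blocks of <block_size_bytes> zero bytes to <file>.
write_blocks() {
    wb_out=$1
    wb_bs=$2
    wb_count=$3
    dd if=/dev/zero of="$wb_out" bs="$wb_bs" count="$wb_count" 2>/dev/null
}

# Usage: the same 4 MiB, once in 4 KiB blocks and once in 1 MiB blocks
#   time write_blocks small_blocks.dat 4096 1024
#   time write_blocks large_blocks.dat 1048576 4
```

Applications control this through the buffer size they pass to their write calls or I/O library, so the dd test mainly shows what the file system rewards.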

-- Main.UlfMarkwardt - 2012-04-12