HPSS for GPFS
HPSS:
The High Performance Storage System (HPSS) is IBM's highly scalable Hierarchical Storage Management (HSM) system. HPSS is intended for IBM's high-end HPC customers, whose storage requirements run to tens of millions, hundreds of millions, and even billions of files. HPSS can access hundreds of tapes concurrently for extremely high aggregate data transfer rates, and can meet total storage bandwidth and capacity requirements that are otherwise unachievable. HPSS can stripe files across multiple tapes, which ensures high-bandwidth data transfers of huge files, and it provides stewardship of, and access to, many petabytes of data stored on robotic tape libraries.
HPSS Scales
GPFS:
The IBM General Parallel File System (GPFS) is a true distributed, clustered file system. Multiple servers manage the data and metadata of a single file system. Individual files are broken into multiple blocks and striped across multiple disks and multiple servers, which eliminates bottlenecks. Information Lifecycle Management (ILM) policy scans are also distributed across multiple servers, which allows GPFS to quickly scan an entire file system and identify files that match specific criteria. Shortly after we showed the Billion File Demo at the international conference on high performance computing (SC07), the Almaden Research Center showed that a pre-GA version of GPFS could scan a single GPFS file system containing a billion files in less than 15 minutes!
GPFS Scales
GPFS/HPSS Interface (GHI):
HPSS can now be used to HSM-manage GPFS disk resources automatically. GPFS customers can store petabytes of data on a file system backed by terabytes of high-performance disk. HPSS can also be used to capture a point-in-time backup of your GPFS cluster and file systems and, in the event of a catastrophic failure, to restore them. The GPFS high-performance ILM policy scans are used to (a short sketch of this selection follows the list):
  • Identify new files, or files that changed, so the data can be migrated to tape;
  • Identify older, unused files that no longer need to remain on disk;
  • Identify files that users need to bulk-stage back to GPFS for future processing;
  • Capture GPFS cluster information; and
  • Capture GPFS file system structure and file attributes.
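To make the first two items concrete, here is a minimal Python sketch of the kind of classification such a scan produces. It is not GPFS's actual policy engine; the thresholds and the use of mtime/atime are illustrative assumptions only.

    # Illustrative only: classify files roughly the way an ILM-style scan might,
    # using modification time to find migration candidates and access time to
    # find purge candidates. Thresholds are arbitrary example values.
    import os
    import time

    MIGRATE_IF_CHANGED_WITHIN = 60 * 60        # changed within the last hour
    PURGE_IF_UNUSED_FOR = 30 * 24 * 60 * 60    # not read in 30 days

    def classify(root):
        migrate, purge = [], []
        now = time.time()
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                if now - st.st_mtime < MIGRATE_IF_CHANGED_WITHIN:
                    migrate.append(path)       # new or changed: copy to tape
                if now - st.st_atime > PURGE_IF_UNUSED_FOR:
                    purge.append(path)         # cold: candidate to free from disk
        return migrate, purge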
GPFS + HPSS:
The ILM policy scan results are sent to the Scheduler. The Scheduler distributes the work to the I/O Managers (IOMs), and the GPFS data are copied to HPSS in parallel. For files that are no longer active, holes are then punched in the GPFS files to free up GPFS disk resources. The continuous movement of GPFS files to HPSS tape and the freeing of GPFS disk resources are an automated process that is transparent to the GPFS user. If a GPFS user accesses a file whose data reside only on HPSS tape, the file is automatically staged back to GPFS so the user can work with it.
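As a rough picture of that Scheduler-to-IOM fan-out, the sketch below distributes candidate files to a pool of workers and copies them in parallel. copy_to_hpss is a hypothetical placeholder, not a real GHI or HPSS API, and the worker count simply stands in for the number of I/O Managers.

    # Rough sketch of parallel migration; copy_to_hpss is a hypothetical stand-in
    # for the actual GHI data movement and hole punching.
    from concurrent.futures import ProcessPoolExecutor

    def copy_to_hpss(path):
        # ...copy the file's data to HPSS, then free the GPFS blocks...
        return path

    def migrate_in_parallel(candidates, io_managers=4):
        with ProcessPoolExecutor(max_workers=io_managers) as pool:
            for path in pool.map(copy_to_hpss, candidates):
                print("migrated", path)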

The GPFS/HPSS Interface continuously copies GPFS file data to HPSS. When the time comes to perform a backup, only those files that have not yet been copied to HPSS tape are migrated. Therefore, it is NOT necessary to recapture all of the file data at each backup.
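A minimal sketch of that incremental idea, assuming a simple set of files already recorded as copied (in reality GHI tracks this state itself):

    # Only files not already copied to HPSS need to move at backup time.
    # 'already_copied' is an assumed stand-in for GHI's own bookkeeping.
    def backup_delta(all_files, already_copied):
        return [f for f in all_files if f not in already_copied]

    # e.g. backup_delta(["a", "b", "c"], {"a", "c"}) -> ["b"]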

GHI to HPSS
Small File Aggregation:
Most GPFS file systems are made up of small files: roughly 90% of the files use only 10% of the disk resources. Traditionally, copying small files to tape diminishes tape drive performance. The GPFS/HPSS Interface copies small files from GPFS to HPSS by grouping many small files into much larger aggregates. Small file aggregation is completely configurable; at the SC07 Billion File Demo, 10,000 GPFS files were placed into each HPSS aggregate. Large aggregates allow data to stream to the tape drive, which yields higher tape transfer rates.
Aggregation
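Here is a minimal sketch of the batching idea, using Python's standard tarfile module; the real GHI aggregate format is not tar, and the 10,000-files-per-aggregate figure simply mirrors the SC07 demo configuration.

    # Bundle many small files into large archives so the data streams to tape.
    import tarfile

    FILES_PER_AGGREGATE = 10000   # configurable; value used in the SC07 demo

    def build_aggregates(small_files, prefix="aggregate"):
        for i in range(0, len(small_files), FILES_PER_AGGREGATE):
            name = f"{prefix}-{i // FILES_PER_AGGREGATE}.tar"
            with tarfile.open(name, "w") as tar:
                for path in small_files[i:i + FILES_PER_AGGREGATE]:
                    tar.add(path)
            yield name   # each large aggregate can then stream to HPSS tape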


That's why we say...
GPFS + HPSS = Extreme Storage Scalability!


What's New?
HPSS at SC13 - SC13 is the 2013 international conference for high performance computing, networking, storage, and analysis. SC13 will be held in Denver, Colorado, from November 18th through 21st - Learn More. Come visit the HPSS folks at the IBM booth and schedule an HPSS briefing at the IBM Executive Briefing Center.

2013 HPSS Users Forum - The 2013 HPSS Users Forum (HUF) will be hosted by the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. The conference will run from November 4th through 7th. More information and registration are available online.

HPSS @ SEG13 - SEG13 is the 2013 annual meeting and international exposition of the Society of Exploration Geophysicists. SEG13 will bring together all of the major companies involved in oil, gas, and mineral exploration, as well as newer areas in civil engineering, the environment, and archaeology. The conference will be held at the George R. Brown Convention Center in Houston, Texas, from September 22nd through 25th - Learn More. Come visit the HPSS folks at the IBM booth and schedule an HPSS briefing.

2013 HPSS Training - The next HPSS System Administration course runs from September 23rd through 27th. More information and registration are available online.

NCSA in production with RAIT - A massive 380 petabyte HPSS system, the world's largest automated near-line data repository for open science, was successfully deployed. Learn more from NCSA and HPCwire. The new HPSS system went into production using HPSS Redundant Array of Independent Tapes (RAIT) tiers, which, similar to RAID, provide redundancy across a tape stripe. RAIT allows HPSS customers to meet their performance and redundancy requirements without doubling their tape cost. Learn more about RAIT.
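To picture what redundancy across a tape stripe means, here is a toy Python sketch of single-parity striping in the spirit of RAID. HPSS RAIT's actual stripe widths and encoding are not described here, so treat this purely as an illustration of the concept.

    # Toy single-parity stripe: data blocks (equal length) go to N tapes and an
    # XOR parity block goes to one more, so any single lost tape is recoverable.
    def make_stripe(data_blocks):
        parity = bytes(len(data_blocks[0]))
        for block in data_blocks:
            parity = bytes(a ^ b for a, b in zip(parity, block))
        return data_blocks + [parity]

    def recover(stripe, missing_index):
        rebuilt = bytes(len(stripe[0]))
        for i, block in enumerate(stripe):
            if i != missing_index:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, block))
        return rebuilt   # equals the block that was on the missing tape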

HPSS VFS for RHEL 6 is available - HPSS Virtual File System (VFS) allows a computer to use the UNIX mount command to mount the HPSS name space, much like NFS is mounted. Learn more about VFS.

