Storage Tiering & How It Works
Storage tiering is based on pools of storage in which files are stored and accessed. These pools could be DAS (Direct Attached Storage), NAS (Network Attached Storage) or a SAN (Storage Area Network) running iSCSI or Fibre Channel. Typically a company has a tier 1 storage platform offering the fastest access to data, supplemented by lower tiers such as a large pool of SATA disk and a tape library. Tier 1 storage is also the most expensive to purchase and maintain, so it is this storage that typically consumes the largest share of the budget as data volumes grow.
Many organisations today are struggling with the sheer volume of data being created, and with how to organise and store it without constantly having to back up, copy, move and buy new storage. The usual approach has been to keep all data, current and legacy, on spinning disk and back it up. Historically, the difficulty has been deciding what to keep on tier 1 disk and moving the rest to other storage such as SATA RAID, tape and optical. It has been widely reported for many years that up to 80% of an organisation’s data has not been accessed in the last 90 days and that at least 60% of it will not be required in the future.
The traditional way of archiving data was HSM (Hierarchical Storage Management), which evolved on the mainframe, where all data in a given area was archived. During the 1990s, HSM was really the only way of archiving information to a drive letter or device. The main problems with HSM are:
1. Data has to be pointed to a physical location.
2. No migration or policy rules can be applied to the data.
3. It is primarily a manual task.
4. Restores from archive are slow.
5. Users are not aware of archived data and cannot recover it themselves; these operations must be carried out by system administrators.
Due to increasingly complex data compliance regulations, simply archiving and storing everything is not ideal:
1. Users need to be able to find and recover files.
2. Compliance officers require discovery tools and audit trails.
3. Storing everything only increases the amount of storage space required.
Keep It on Spinning Disk
Companies’ reluctance to archive data is based on a lack of knowledge and the fear of losing valuable information. The mindset of keeping everything on RAID arrays and backing it up every night is still one we encounter daily as a business.
As a consequence, online data volumes are spiralling out of control and storage management has become an ever-increasing challenge:
- Server performance and data access speeds are declining
- Spiralling energy consumption and predicted power outages over the next 5-10 years
- Business legislation and user demands are requiring companies to increase disk space to alleviate the problem
- Data management puts high overheads on networks and backup windows
- Disaster recovery (DR) takes longer, as all data needs to be restored rather than only the most essential
- The cost of managing this data over its lifetime can be more than five times the initial purchase price of the equipment
- Storing labelled offline media on a shelf, the traditional approach, is problematic because organisations today require information to be accessible at all times
Active Data Archiving
For these reasons, a better and more efficient way of managing data needs to be found. Active Data Archiving is a method of managing data by type according to policy, controlling the movement of this information through the different storage tiers to the most suitable storage location. The ideal solution is a file system that natively stores data where it belongs according to the storage policies. The alternative is file migration, which leaves a 1–4 KB stub in place of each moved file; the stub records where the file is located so that it can be restored.
An active archive ensures that data is always available and accessible to users, albeit with a slight delay in restoring the information depending on where and how it is stored.
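The file-migration approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works. The JSON stub format, the `archive-stub-v1` marker and the `migrate`, `is_stub` and `recall` helpers are all hypothetical. Real products typically rely on file-system features such as NTFS reparse points or DMAPI rather than ordinary files.

```python
import json
import shutil
from pathlib import Path

# Hypothetical stub format: a tiny JSON file recording where the real data went.
STUB_MARKER = "archive-stub-v1"
MAX_STUB_BYTES = 4096  # upper end of the 1-4 KB stub size


def migrate(src: Path, archive_dir: Path) -> Path:
    """Move src into archive_dir and leave a small stub file at its old path."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / src.name  # simplification: assumes unique file names
    size = src.stat().st_size
    shutil.move(str(src), str(target))
    stub = {"marker": STUB_MARKER, "location": str(target.resolve()), "size": size}
    src.write_text(json.dumps(stub))
    return target


def is_stub(path: Path) -> bool:
    """Return True if path is one of our stub files rather than real data."""
    if path.stat().st_size > MAX_STUB_BYTES:
        return False
    try:
        data = json.loads(path.read_text())
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(data, dict) and data.get("marker") == STUB_MARKER


def recall(stub_path: Path) -> None:
    """Restore the archived file over its stub."""
    location = json.loads(stub_path.read_text())["location"]
    stub_path.unlink()  # remove the stub first so the move never has to overwrite
    shutil.move(location, str(stub_path))
```

Here recall is triggered explicitly. In an active archive it would happen automatically when a user opens the file, and that recall is the source of the slight restore delay.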
Active Data Archiving overcomes many of these issues and enables companies to adopt an archiving strategy that can evolve and develop as business conditions dictate.
By deploying an Active Data Archiving strategy, a business will soon reap the rewards and wonder why it hadn’t acted sooner.
Existing Investment in IT
Many organisations make a large, ongoing investment in data storage systems, and this investment grows year on year as the demand to store more information increases. The equipment is then typically replaced after 3–5 years.
In addition, migrating data sets is a cumbersome and time-consuming task in every organisation, whether the reason is obsolete hardware, a full disk array or something else. A tiered storage solution can migrate data from existing storage to new storage gradually over time.
By implementing a tiered storage solution, we can extend the life of this investment by moving data to a secure active archive. This frees up valuable space on high-performance storage and slows the ongoing need to buy more capacity, giving a substantial return on investment. An additional benefit of active archiving is that existing older storage systems may be reused to hold archived data.
As the need for corporate governance grows, companies need to know what data they hold, and to retain and delete it according to policies. These policies contain rules that gather statistics and carry out instructions based on file attributes such as:
- File size
- File type
- Date created
- Date modified
- Last access date
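Rule criteria like these can be expressed as simple predicates over file metadata. The Python sketch below assumes a hypothetical `FileInfo` record and a hypothetical `Rule` class; neither comes from a specific product. Each rule field is optional, and `gather_statistics` reports how many files, and how many bytes, a rule would select before anything is moved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePath
from typing import Iterable, Optional, Set, Tuple


@dataclass
class FileInfo:
    """Metadata a policy engine might collect for each file (illustrative)."""
    path: str
    size_bytes: int
    created: datetime
    modified: datetime
    accessed: datetime


@dataclass
class Rule:
    """Hypothetical policy rule; a None field means 'do not filter on this'."""
    min_size_bytes: Optional[int] = None
    file_types: Optional[Set[str]] = None  # lower-case extensions, e.g. {".pdf"}
    min_age_created_days: Optional[int] = None
    min_age_modified_days: Optional[int] = None
    min_age_accessed_days: Optional[int] = None

    def matches(self, f: FileInfo, now: datetime) -> bool:
        if self.min_size_bytes is not None and f.size_bytes < self.min_size_bytes:
            return False
        if self.file_types is not None:
            if PurePath(f.path).suffix.lower() not in self.file_types:
                return False
        age_checks = (
            (self.min_age_created_days, f.created),
            (self.min_age_modified_days, f.modified),
            (self.min_age_accessed_days, f.accessed),
        )
        for min_days, timestamp in age_checks:
            if min_days is not None and now - timestamp < timedelta(days=min_days):
                return False
        return True


def gather_statistics(files: Iterable[FileInfo], rule: Rule, now: datetime) -> Tuple[int, int]:
    """Count the files a rule would select and their total size in bytes."""
    selected = [f for f in files if rule.matches(f, now)]
    return len(selected), sum(f.size_bytes for f in selected)
```

Running `gather_statistics` before attaching a migrate or delete action to a rule is one way to check a policy's impact safely.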
More organisations are adopting a DR strategy to meet corporate governance and compliance requirements. A DR strategy can be a cluster of servers, remote data replication to a DR site or backup tapes to restore from. An Active Data Archive can significantly help in a DR environment by reducing the amount of information that is backed up or replicated to a DR site. This greatly speeds up DR restores because, for archived files, the only data that needs to be restored to the file server is the archive pointers (stubs), which are between 1 KB and 4 KB in size.
In addition to the above, aged data can be automatically replicated to your DR site(s).
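The effect of the stubs can be put into numbers with a short calculation. The server size, the file count and the 4 KiB stub size below are hypothetical figures chosen for illustration, not measurements. The active (unarchived) data still has to be restored in addition to the stubs.

```python
KIB = 1024
GIB = 1024 ** 3


def dr_restore_bytes(active_bytes: int, archived_files: int, stub_bytes: int = 4 * KIB) -> int:
    """Data to restore to a file server: active data plus one stub per archived file."""
    return active_bytes + archived_files * stub_bytes


# Hypothetical server: 2,000 GiB in total, of which 1,600 GiB spread over
# 400,000 files has been archived, leaving 400 GiB of active data.
full_restore = 2000 * GIB
tiered_restore = dr_restore_bytes(active_bytes=400 * GIB, archived_files=400_000)

print(f"Full restore:   {full_restore / GIB:,.0f} GiB")
print(f"Tiered restore: {tiered_restore / GIB:,.1f} GiB")
```

With these assumed figures, the restore shrinks from 2,000 GiB to about 401.5 GiB, and the stubs account for only about 1.5 GiB of that. The archived data itself still needs protecting on the archive tier, for example by replicating it to the DR site.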
All data needs to be managed, and how companies control and police the lifecycle of their information is a continually growing problem. Many data compliance and corporate governance regulations require information to be kept for a minimum number of years and deleted thereafter.
A tiered storage solution can alleviate these issues by creating policy rules based on data type. For example, an insurer might issue a new policy to a customer, and this information may need to be accessed over the following 60 days for accuracy and formal checks; thereafter it can be archived. The information therefore remains on primary storage for the first 60 days, moves to SATA after this time and finally migrates to a Blu-ray optical disc after 120 days for long-term archiving.
All this can be achieved with very little human intervention once the policy rules have been created, because the whole information lifecycle process is automated.
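The insurer example maps onto an age-based lifecycle table. In the Python sketch below, the tier names are illustrative labels. Treating day 60 as the first day on SATA, and day 120 as the first day on optical, is an assumption about how the boundaries are counted.

```python
from datetime import date

# Thresholds from the insurer example (tier names are hypothetical labels).
LIFECYCLE = (
    (60, "primary"),   # age < 60 days
    (120, "sata"),     # 60 <= age < 120 days
)
FINAL_TIER = "bluray-optical"  # age >= 120 days


def target_tier(created: date, today: date) -> str:
    """Return the tier a policy document should live on, given its age in days."""
    age_days = (today - created).days
    for limit_days, tier in LIFECYCLE:
        if age_days < limit_days:
            return tier
    return FINAL_TIER
```

Keeping the thresholds in a table, rather than in nested if statements, means a different document type can be given its own lifecycle without changing the function.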
Backing Up the Data
Backing up data does not move it through the storage tiers; a backup only ensures that the data can be restored in the event of a disaster. As data volumes rise, it becomes harder to complete a full backup within the available backup window, which again makes backup a selective process rather than a back-up-everything policy. A backup could also be used to keep a copy of migrated data, performed monthly or quarterly.
Corporate Governance and Data Compliance
The need for corporate governance and data compliance is becoming increasingly common in business. An Active Data Archive can become an essential part of a corporate governance and data compliance strategy. By setting policies on data types, we can determine where data needs to be stored, how long it must be retained, and when it should be moved or deleted. In addition, reporting and data discovery are needed to find and track file movements within the archive. Discovery can also identify file types that are not allowed on the corporate network and automatically delete them or move them to another location for processing. This is an essential part of any compliant archiving solution.
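A compliance policy of this kind can be sketched as a function that decides, for each file, whether to quarantine, delete or retain it. The function also records every decision for an audit trail. The `CompliancePolicy` class, the action names and the seven-year retention period in the test are hypothetical. Retention is expressed in days to avoid leap-year ambiguity.

```python
from dataclasses import dataclass, field
from datetime import date
from pathlib import PurePath
from typing import FrozenSet, List, Tuple


@dataclass
class CompliancePolicy:
    """Hypothetical compliance policy: banned file types and a retention period."""
    disallowed_types: FrozenSet[str]  # lower-case extensions, e.g. frozenset({".mp3"})
    retention_days: int               # keep records at least this long, then delete
    audit_log: List[Tuple[date, str, str]] = field(default_factory=list)

    def decide(self, path: str, created: date, today: date) -> str:
        """Return 'quarantine', 'delete' or 'retain', logging the decision for audit."""
        if PurePath(path).suffix.lower() in self.disallowed_types:
            action = "quarantine"  # move elsewhere for processing
        elif (today - created).days >= self.retention_days:
            action = "delete"      # minimum retention period has passed
        else:
            action = "retain"
        self.audit_log.append((today, path, action))
        return action
```

Checking for disallowed file types before retention is a design choice: such files are quarantined regardless of age. A real deployment would also need to respect legal holds before deleting anything.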
Identifying The Data
The task many IT managers face is distinguishing current data from historical data. We can provide this as a service and supply a detailed report outlining your storage I/O (IOPS) usage and who accessed which files, where, and when they were last accessed. Once the files have been identified, we create simple policy rules to move the data based on age, file size, user/group, file extension, last accessed/modified date, project folder and so on. These rules can then be scheduled to run out of hours, and the data is moved through the storage tiers transparently, without users noticing. When a user opens a migrated file, it is automatically recalled to primary storage.
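A first pass at separating current from historical data can be made by walking a file share and grouping files by how long ago they were last accessed. The `stale_file_report` function below is a hypothetical sketch, not the reporting service described above. It relies on each file's last-access time (`st_atime`), which may be unreliable on systems where last-access updates are disabled, for example on Linux volumes mounted with `noatime`.

```python
import os
import time
from collections import defaultdict
from pathlib import Path
from typing import Dict, Optional, Tuple

SECONDS_PER_DAY = 86_400


def stale_file_report(root: str, days: int, now: Optional[float] = None) -> Dict[str, Tuple[int, int]]:
    """Summarise files under root not accessed for `days` days: extension -> (count, bytes)."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read during the scan
            if st.st_atime < cutoff:
                ext = Path(name).suffix.lower() or "(none)"
                totals[ext][0] += 1
                totals[ext][1] += st.st_size
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Grouping by extension is only one option. The same walk could key the totals on owner or top-level project folder to match the other rule criteria.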
Once a tiered storage solution has been implemented, the financial investment can be repaid in a relatively short time. As mentioned, it frees up disk space on tier 1 storage, saves energy by keeping data on lower-cost storage platforms and shortens backup times.