Posts Tagged ‘tiering’

Why Healthcare IT Should Abandon Data Storage Islands and Take the Plunge into Data Lakes

One of the most significant technology-related challenges of the modern era is managing data growth. As healthcare organizations adopt new data-generating technology, and as medical record retention requirements evolve, the exponential rise in data (already growing at 48 percent each year, according to the Dell EMC Digital Universe Study) could continue for decades.
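To see why 48 percent annual growth is so daunting, it helps to compound that rate over typical retention horizons. The sketch below uses a hypothetical starting capacity; only the growth rate comes from the study cited above:

```python
# Back-of-the-envelope illustration of 48% annual data growth.
# The 100 TB starting capacity is a hypothetical example, not a
# figure from the Digital Universe Study.

def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a yearly growth rate over a number of years."""
    return start_tb * (1 + annual_growth) ** years

start = 100.0  # hypothetical: a 100 TB imaging archive today
for years in (5, 10, 20):
    print(f"After {years:2d} years: {projected_capacity(start, 0.48, years):,.0f} TB")
```

At 48 percent per year, data roughly grows fifty-fold in a decade; over a patient-lifetime retention window the multiplier runs into the thousands.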

Let’s start by first examining the factors contributing to the healthcare data deluge:

  • Longer legal retention times for medical records – in some cases up to the lifetime of the patient.
  • Digitization of healthcare and new digitized diagnostics workflows such as digital pathology, clinical next-generation sequencing, digital breast tomosynthesis, surgical documentation and sleep study videos.
  • With more digital images to store and manage, there is also an increased need for bigger picture archive and communication system (PACS) or vendor-neutral archive (VNA) deployments.
  • Finally, more people are having these digitized medical tests (especially given the large aging population), resulting in a higher number of yearly studies with larger data sizes.

Healthcare organizations also face frequent and complex storage migrations, rising operational costs, storage inefficiencies, limited scalability, increasing management complexity and storage tiering issues caused by storage silo sprawl.

Another challenge is the growing demand to understand and utilize unstructured clinical data. Mining this data requires a storage infrastructure that supports in-place analytics, enabling better patient insights and the evolution of healthcare toward precision medicine.

Isolated Islands Aren’t Always Idyllic When It Comes to Data

The way that healthcare IT has approached data storage infrastructure historically hasn’t been ideal to begin with, and it certainly doesn’t set up healthcare organizations for success in the future.

Traditionally, when adding new digital diagnostic tools, healthcare organizations provided a dedicated storage infrastructure for each application or diagnostic discipline. For example, to deal with the growing storage requirements of digitized X-rays, an organization would create a new storage system solely for the radiology department. As a result, isolated storage silos, or data islands, must be managed individually, making processes and infrastructure complicated and expensive to operate and scale.

Isolated silos further undermine IT goals by increasing the cost of data management and compounding the complexity of analytics, which may require copying large amounts of data into yet another dedicated storage infrastructure that can't be shared with other workflows. Even maintaining these silos is involved and expensive, because tech refreshes require migrating medical data to new storage. Each migration, typically performed every three to five years, is labor-intensive and complicated. Frequent migrations not only strain resources but also pull IT staff away from projects aimed at modernizing the organization, improving patient care and increasing revenue.

Further, silos make it difficult for healthcare providers to search data and analyze information, preventing them from gaining the insights they need for better patient care. Healthcare providers are also looking to tap potentially important medical data from Internet-connected medical devices or personal technologies such as wireless activity trackers. If healthcare organizations are to remain successful in a highly regulated and increasingly competitive, consolidated and patient-centered market, they need a simplified, scalable data management strategy.

Simplify and Consolidate Healthcare Data Management with Data Lakes

The key to modern healthcare data management is to employ a strategy that simplifies storage infrastructure and storage management and supports multiple current and future workflows simultaneously. A Dell EMC healthcare data lake, for example, leverages scale-out storage to house data for clinical and non-clinical workloads across departmental boundaries. Such healthcare data lakes reduce the number of storage silos a hospital uses and eliminate the need for data migrations. This type of storage scales on the fly without downtime, addressing IT scalability and performance issues and providing native file and next-generation access methods.

Healthcare data lake storage can also:

  • Eliminate storage inefficiencies and reduce costs by automatically moving data that can be archived to denser, more cost-effective storage tiers.
  • Allow healthcare IT to expand into private, hybrid or public clouds, enabling IT to leverage cloud economies by creating storage pools for object storage.
  • Offer long-term data retention without the public cloud’s security risks or loss of data sovereignty; the same cloud expansion can be utilized for next-generation use cases such as healthcare IoT.
  • Enable precision medicine and better patient insights by fostering advanced analytics across all unstructured data, such as digitized pathology, radiology, cardiology and genomics data.
  • Reduce data management costs and complexities through automation, and scale capacity and performance on demand without downtime.
  • Eliminate storage migration projects.


The greatest technical challenge facing today’s healthcare organizations is effectively leveraging and managing data. By employing a healthcare data management strategy that replaces siloed storage with a Dell EMC healthcare data lake, healthcare organizations will be better prepared to meet the requirements of today’s and tomorrow’s next-generation infrastructure and to usher in advanced analytics and new storage access methods.


Get your fill of news, resources and videos on the Dell EMC Emerging Technologies Healthcare Resource Page



Tiers without tears: Discover easier tiering with ECS

Bob Williamsen

Sr. Business Development Manager at EMC


The new features in EMC’s Elastic Cloud Storage (ECS) bring intelligent cloud tiering to your archive.

Let’s break that down – first, what do we mean by an “intelligent” archive? Obviously it makes sense to move older inactive data off your expensive primary storage to a lower-cost repository – and your archive solution takes care of this. With an intelligent archive solution, you can further reduce your archive costs by setting policies in your storage to automatically move inactive older data (say after 30, 60 or 90 days) to even cheaper cloud-based storage.
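The age-based policy described above can be sketched as a simple filter over object metadata. This is an illustrative sketch only; the class and function names are hypothetical, not an actual ECS or archive-product API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ArchiveObject:
    name: str
    last_accessed: datetime

def select_for_cloud_tier(objects, max_idle_days=90, now=None):
    """Return objects whose idle time exceeds the policy threshold
    (e.g. 30, 60 or 90 days) and are candidates for cheaper cloud storage."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_idle_days)
    return [o for o in objects if o.last_accessed < cutoff]

now = datetime(2016, 1, 1)
objs = [
    ArchiveObject("scan-001.dcm", now - timedelta(days=400)),
    ArchiveObject("scan-002.dcm", now - timedelta(days=10)),
]
stale = select_for_cloud_tier(objs, max_idle_days=90, now=now)
print([o.name for o in stale])  # only the 400-day-old object qualifies
```

In a real deployment this evaluation runs continuously in the storage layer; the point is simply that the policy is a threshold you set once, not a migration you plan.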

This is archive made easier, which is absolutely essential in the face of today’s data growth. It makes even more sense in today’s big data world, where you want your archived data to be readily available for analytics or eDiscovery.

The “cloud” part comes when organizations think of storage costs – cents per gigabyte per month. Cloud archive storage could be public cloud storage like Amazon S3 or Microsoft Azure, a private cloud hosted by a service provider – or your own on-premise cloud based on EMC ECS.

Why would you need an on-premise object storage solution like ECS over public cloud? One good reason is ownership. You own your cloud, you control it and can do whatever you wish with it. With a public cloud, the storage provider is ultimately in control of your data – and if you stop paying them, your data is gone.

Another reason is the cost of monetizing data. Data is the raw material of 21st-century business. When it comes to performing analytics on your archive, public cloud platforms would usually require you to move data out of cold archive first – incurring time and data-retrieval costs.

With ECS as your cloud storage platform, you own your data – whether in your own data center or hosted by an enterprise service provider. You can manage and interact with all your cold archive data, in any file format, with multi-protocol support – including HDFS for in-place Hadoop analytics.

So, we’ve covered “intelligent” and “cloud” – what about the “tiering” part? Data can first be moved to a ‘warm’ archive tier of higher-performance disk, where it can still be accessed quickly to meet RPO and RTO SLAs. As you retain archives for longer, older data can then be moved to a ‘cold’ archive tier with better economics. (This is similar to the tiered storage cost/performance model offered by Amazon S3 with its “warm” Standard tier, “cold” Infrequent Access tier and “frozen” Glacier tier.)
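That warm-to-cold progression amounts to a mapping from idle time to tier. A minimal sketch, with hypothetical day thresholds – the real cutoffs are policy decisions, not values prescribed by ECS or S3:

```python
# Illustrative mapping from data age to the warm/cold/frozen tiers
# described above. Thresholds are hypothetical policy choices.

def tier_for_age(idle_days: int, warm_cutoff: int = 30, cold_cutoff: int = 365) -> str:
    """Pick an archive tier based on how long data has been idle."""
    if idle_days < warm_cutoff:
        return "warm"    # higher-performance disk, fast recall (tight RPO/RTO)
    if idle_days < cold_cutoff:
        return "cold"    # denser, cheaper storage
    return "frozen"      # deep archive, best economics, slowest recall

for days in (7, 90, 1000):
    print(days, tier_for_age(days))  # 7 warm, 90 cold, 1000 frozen
```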

Consider a typical scenario with EMC Isilon as your primary storage. You can leverage the new CloudPools feature to add ‘pools’ of remote cloud storage to your Isilon namespace – perfect for archiving. And that cloud storage can now be ECS – for an easy tiered archive solution, optimized for cost, capacity and durability.

[Image: Tiered archive to EMC ECS with EMC Isilon CloudPools]

The biggest advantage to this tiered archive setup with ECS is automatic geo-distribution, to protect your data against entire site failures. Of course, ECS is vastly scalable from petabytes to exabytes, to future-proof your growing archive as you store more data for longer.

To ensure compatibility of ECS with your preferred archive solution – and to make tiered archiving even easier – ECS tiered archive works seamlessly with EMC Documentum, SourceOne and InfoArchive.

So, in summary, ECS now makes it easy and stress-free to create an intelligent tiered cloud archive – in other words, “tiers without tears”.

Explore the Top Reasons to Choose EMC ECS for Archive.

Learn more about ECS with a free download.

Keep it Simple, Stupid

I can’t help but notice a continued war of words between the caching and tiering camps.

Come on folks, let’s not insult our readers and our customers. Neither option is a panacea – and even together, there are clearly edge cases in which both are ineffective.

There is a simple computer science principle at work here – locality of reference. Caches exploit temporal locality; pre-fetch algorithms exploit spatial locality. The principle of locality (and the larger one of predictability) is key toward building any automation.

Automated tiering works using the principle of temporal locality too, although the time window is much larger. Rather than a block being hot over a period of seconds, it may be hot over a period of hours – or there may be a repeated pattern of that particular block being hot relative to its peers.

Can your cache always be as large as your storage system? No, that would be absurd. So clearly the cache hit rate cannot be 100%. Hence, automated tiering. Can automation ensure that a hot block (or cold block) is always on the correct tier when it is accessed (or not)? No, access patterns are not 100% reproducible. So clearly the automated tier hit ratio cannot be 100% either.
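A toy LRU cache simulation makes the argument concrete: with strong temporal locality, even a cache two orders of magnitude smaller than the block population achieves a high hit rate – but never 100%. The trace parameters below are invented for illustration:

```python
import random
from collections import OrderedDict

def lru_hit_rate(accesses, cache_size):
    """Replay a block-access trace against an LRU cache and report the hit rate."""
    cache = OrderedDict()  # keys are block IDs, most recently used last
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # refresh recency on a hit
        else:
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(accesses)

# Synthetic trace: 90% of accesses go to 5 hot blocks,
# 10% scan a population of 1000 cold blocks.
random.seed(0)
trace = [random.choice(range(5)) if random.random() < 0.9
         else random.randrange(5, 1005) for _ in range(10_000)]
print(f"hit rate with 10-block cache: {lru_hit_rate(trace, 10):.0%}")
```

The hit rate lands near 90% because the hot blocks exhibit temporal locality; the residual misses from the cold scan are exactly why a second, cheaper tier still matters.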

If either of these methods fails you, the end result is lower performance and higher cost. In both cases, it is prudent to consider what happens when the storage system doesn’t do everything automatically … because ultimately it is your workflow, your data and your organization that need to function optimally.

I loved Jon Toigo’s recent post – an informed organization cannot avoid data classification, because it is part of understanding its own workflow. Purely automated methods without any user input (both caching and fully automated tiering) are not sufficient to maximize performance and minimize cost – it is simple mathematics. A storage system has to give the user simple controls over how the system interacts with their data based on straightforward business rules, and then use those hints to automate to the hilt.

The computer science principle that most people tend to remember and forget at the same time is the best one of all: KISS

Keep It Simple, Stupid

I want maximum performance and minimal cost without sacrificing an ounce of simplicity. Don’t you?

The Tracks of my Tiers

Ever since Tom Georgens mentioned on NetApp’s earnings call that “I think the entire concept of tiering is dying,” there have been a host of articles discussing what he meant by that and weighing in with dissent or agreement.

Chris Mellor also wrote an article about Avere recently – their claim to fame is a scale-out caching layer (though they’re leveraging automated tiering as well).

This is a hot topic, no doubt about that. I’ve spent the last 12+ months talking to a wide variety of our customers, and tiering – or, more appropriately, maximizing performance and reducing cost while maintaining simplicity – is invariably top of mind.

I won’t speak to the effectiveness of block-based automated tiering strategies, nor enter the fray as to whether a pure caching approach is effective enough to capture every possible performance-oriented scenario. What customers have been communicating to me, in many different ways, is that there is a multitude of different environments out there – all of which demand a slightly different approach to maximizing performance while simultaneously reducing cost. The other key point customers have made is that they don’t want additional complexity in their environments. They don’t want to give up their enterprise features, their ability to support NFS and CIFS, or their ability to scale a single system to large amounts of performance and data without adding management overhead.

The ideal storage system will provide customers the ability to maximize their particular workflow, on a per-application, per-LUN, per-file, and per-directory basis. Storage systems need to be flexible enough to provide not only automated options, but also manual/specific workflow oriented options as well.

A great example of this (although slightly orthogonal) is our recent ability to place metadata purely on SSD, spilling it over as necessary. This is not something that can be accomplished with either caching or automated tiering – since it ultimately cannot be predicted a priori by the system – rather, it is input that the storage administrator and application architect must uniquely provide.

There are a host of other potential options – file types that have specific access patterns, directories that contain virtual machines, LUNs owned by Exchange 2010, and so on. Clearly, any such system would minimize the amount of work required, not only in defining and applying such capabilities, but also in presenting a single file system, single namespace and single point of management to the users.
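The kinds of rules just listed could be expressed as a first-match placement table that the administrator supplies and the system consults before its automated policies take over. The rule syntax, paths and tier names below are hypothetical illustrations, not any vendor's actual configuration language:

```python
from fnmatch import fnmatch

# Hypothetical administrator-supplied placement hints, evaluated in order.
# First matching rule wins; anything unmatched falls through to
# fully automated tiering.
PLACEMENT_RULES = [
    ("metadata/*",          "ssd"),       # pin filesystem metadata to SSD
    ("vmdir/*.vmdk",        "ssd"),       # directories containing virtual machines
    ("exchange2010/*.edb",  "ssd"),       # Exchange 2010 database files
    ("*.mp4",               "nearline"),  # large sequential video, rarely re-read
]

def place(path: str, default: str = "auto") -> str:
    """Return the tier hint for a path, or defer to automated tiering."""
    for pattern, tier in PLACEMENT_RULES:
        if fnmatch(path, pattern):
            return tier
    return default

print(place("vmdir/web01.vmdk"))   # ssd
print(place("scans/ct-0042.dcm"))  # auto
```

The design point is the division of labor: the user states a handful of business rules they already know to be true, and the system automates everything underneath them.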

Is that tiering? I’m not sure that it is. That might just be the definition of a next-generation filesystem.


