At the Speed of Light

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

For the last year, a clear trend in analytics has been emerging: batch analytics are getting bigger, and real-time analytics are getting faster. This divergence has never been more apparent than it is now.

Batch Analytics

Batch analytics primarily encompass descriptive analytics, massive-scale analytics, and the development of models that are later deployed online. Descriptive analytics remain the main purview of data warehouses, but Hadoop has expanded the capability to ask “What If” questions across far more data types and with richer analytic tooling. Some Hadoop descriptive analytics installations have reached rather massive scale.

The successes of massive-scale analytics are well documented. Cross-data analytics (such as disease detection across multiple data sets), time-series modeling, and anomaly detection rank as particularly impressive due to their depth of adoption across several verticals. The instances of healthcare analytics with Hadoop in the past year alone are numerous and show the potential of this use case to provide amazing insights into caring for our aging population as well as treating rare diseases.

Model development is an application that effectively highlights the groundbreaking potential unlocked by Hadoop’s newest capabilities and analytics. Building real-time models from trillions of transactions for a hybrid architecture is a good example of this category. Because only a tiny fraction of daily transactions are actually fraudulent, trillions of transactions are required before a fraud model can be certified as effective. The model is then deployed into production, which is often a real-time system.
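To make the batch-train, real-time-score pattern concrete, here is a minimal sketch in Python: a classifier is fit on a historical batch of transactions and then applied to individual transactions as they arrive. The synthetic data, features, and model choice are purely illustrative; a production fraud pipeline would train on far larger volumes in Hadoop or Spark and serve the model from a dedicated low-latency system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# --- Batch phase: train on a historical (synthetic) transaction set. ---
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100_000, 5))              # e.g. amount, velocity, geo features
y_train = (rng.random(100_000) < 0.001).astype(int)  # fraud is rare (~0.1% of records)

model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# --- Real-time phase: score each incoming transaction as it arrives. ---
def score_transaction(features):
    """Return the model's fraud probability for a single transaction."""
    return float(model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1])

incoming = rng.normal(size=(3, 5))                   # stand-in for a live feed
for txn in incoming:
    print(f"fraud probability: {score_transaction(txn):.4f}")
```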

One data point behind my belief that batch is getting “bigger” is that I have been engaged with no fewer than 10 Hadoop clusters that have crossed the 50 PB threshold this year alone. In each case, the cluster has hit a logical pause point, causing the customer to re-evaluate its architecture and operations. The catalyst may be cost, scale, limitations, or something else, and these are often the moments when I am brought in. Not every customer reaches these catalysts at a consistent size or time, so it is striking that 10 clusters larger than 50 PB hit this point in 2017 alone. Nonetheless, customers continue to set new all-time records for Hadoop cluster size.

Real-Time Analytics

While hybrid analytics were certainly in vogue last year, real-time or streaming analytics appear to be the hottest trend as of late. Real-time analytics, such as efforts to combat fraudulent authorizations, are not new endeavors. So why is streaming analytics suddenly the “new hot thing”? There are several factors at play.

Data is growing at an ever-increasing rate. One contributing factor can be summed up as “to store or not to store.” This decision usually happens in conjunction with more complex processes, but one aspect is clearly apparent: some form of analytics is needed to decide whether the data is useful. Not every piece of data is valuable, and an enormous amount of data is being generated. Determining whether a particular artifact of data is worth committing to batch storage is one use for real-time analytics.
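As a toy illustration of that store-or-discard decision, the sketch below applies a trivial rule to an in-memory event stream and keeps only the records deemed worth persisting. The event schema, threshold, and scoring rule are hypothetical; in practice this logic would run in a streaming engine sitting in front of the batch store.

```python
from typing import Iterable, Iterator

def worth_storing(event: dict, threshold: float = 0.5) -> bool:
    """Toy rule: keep events whose reading deviates noticeably from the baseline."""
    return abs(event["reading"] - event["baseline"]) > threshold

def filter_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only the events that justify the cost of batch storage."""
    for event in events:
        if worth_storing(event):
            yield event

# Stand-in for a live sensor feed.
feed = [
    {"sensor": "pump-1", "reading": 10.2, "baseline": 10.0},
    {"sensor": "pump-1", "reading": 14.9, "baseline": 10.0},
    {"sensor": "pump-2", "reading": 7.1, "baseline": 7.0},
]

for kept in filter_stream(feed):
    print("persist to batch storage:", kept)
```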

Moving up the value chain, the more significant factor at play is that the value proposition of real time often far outweighs the value proposition of batch. However, this doesn’t mean that batch and real-time are de-coupled or no longer symbiotic. In high-frequency trading, fraud authorization detection, cyber security, and other streaming use cases, gaining insights in real time rather than days later can be especially critical. Real-time systems have historically not relied upon Hadoop for their architectures, which has not gone unnoticed by traditional Hadoop ecosystem tools such as Spark. The University of California, Berkeley recently shifted the focus of its AMPLab to create RISELab, greenlighting projects such as Drizzle that aim to bring low-latency streaming capabilities to Spark. The ultimate goal of Drizzle and RISELab is to increase the viability of Spark for real-time, non-Hadoop workloads. The emphasis on lower-latency tools will certainly escalate the usage of streaming analytics as real time continues to get “faster.”

The last factor is the “Internet of Everything,” often referred to as “IoT” or “M2M.” While sensors are top of mind, most companies are still finding their way in this new world of streaming sensor data. Highly advanced use cases and designs are already in place, but the installations remain bespoke and limited in nature, and mass adoption is still a work in progress. The theoretical value of this data for governance analytics or for improving business operations is massive. Given the sheer volume of data, storing all of it in batch is not a feasible alternative at scale. As such, most IoT analytics are streaming-based capabilities. The value proposition is still largely unrealized and IoT analytics remain in the hype phase, yet the furor and spending are in full-scale deployment regardless.

In closing, the divergence between batch and real-time analytics is growing. The symbiotic relationship remains strong, but the architectures are quickly separating. Predictions from IDC, Gartner, and Forrester indicate that streaming analytics will grow at a far greater rate than batch analytics, largely for the reasons above. It will be interesting to see how this trend continues to manifest itself. Dell EMC is always interested in learning more about specific use cases, and we welcome your stories on how these trends are affecting your business.

Overcoming the Exabyte-Sized Obstacles to Precision Medicine

Wolfgang Mertz

CTO of Healthcare, Life Sciences and High Performance Computing

As we make strides towards a future that includes autonomous cars and grocery stores sans checkout lines, concepts that once seemed reserved only for utopian fiction, it seems there’s no limit to what science and technology can accomplish. It’s an especially exciting time for those in the life sciences and healthcare fields, with 2016 seeing breakthroughs such as a potential “universal” flu vaccine and CRISPR, a promising gene editing technology that may help treat cancer.

Several of Dell EMC’s customers are also making significant advances in precision medicine, the medical model that focuses on using an individual’s specific genetic makeup to customize and prescribe treatments.

Currently, physicians and scientists are in the research phase of a myriad of applications for precision medicine, including oncology, diabetes, and cardiology. Before we can realize the vision President Obama shared in his 2015 Precision Medicine Initiative of “the right treatments at the right time, every time, to the right person,” there are significant challenges to overcome.

Accessibility

For precision medicine to become available to the masses, researchers and doctors will need not only the technical infrastructure to support genomic sequencing, but also the storage capacity and resources to access, view, and share additional relevant data. They will need visibility into patients’ electronic health records (EHR), along with information on environmental conditions, lifestyle behaviors, and biological samples. While increased data sharing may sound simple enough, the reality is that much work remains on the storage infrastructure side to make this possible. Much of this data is typically siloed, which impedes healthcare providers’ ability to collaborate and review critical information that could affect a patient’s diagnosis and treatment. To take full advantage of the potential life-saving insights precision medicine offers, organizations must implement a storage solution that enables high-speed access anytime, anywhere.

Volume

Another issue to confront is the storage capacity needed to house and preserve the petabytes of genomic data, medical imaging, EHR, and other data. Thanks to the decreased cost of genomic sequencing and the growing number of genomes being analyzed, the sheer volume of genomic data alone is quickly eclipsing the storage available in most legacy systems. According to a scientific report by Stephens et al. published in PLOS Biology, between 100 million and two billion human genomes may be sequenced by 2025. This may lead to storage demands of 2-40 exabytes, since storage requirements must account for the accuracy of the data collected. The paper states that, “For every 3 billion bases of human genome sequence, 30-fold more data (~100 gigabases) must be collected because of errors in sequencing, base calling and genome alignment.” With this exponential projected growth, scale-out storage that can simultaneously manage multiple current and future workflows is more necessary than ever.
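To put those figures in perspective, a rough back-of-the-envelope calculation shows how quickly genome counts translate into exabytes. The bytes-per-base figure and the assumption that all raw sequence is retained are mine, not the paper’s; real requirements depend heavily on compression and retention policy.

```python
BASES_PER_GENOME = 100e9   # ~100 gigabases of raw sequence per genome (per the paper)
BYTES_PER_BASE = 1         # assumption: roughly 1 byte per base after modest compression
EXABYTE = 1e18

for genomes in (100e6, 2e9):   # the paper's low and high 2025 estimates
    raw_bytes = genomes * BASES_PER_GENOME * BYTES_PER_BASE
    print(f"{genomes:,.0f} genomes -> ~{raw_bytes / EXABYTE:,.0f} EB of raw sequence")

# Under these assumptions: 100 million genomes -> ~10 EB; 2 billion -> ~200 EB.
# Aggressive compression and selective retention are what pull published
# storage estimates down into the 2-40 exabyte range cited above.
```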

Early Stages 

Finally, while it’s easy to get caught up in the excitement of the advances made thus far in precision medicine, we have to remember that this remains a young discipline. At the IT level, there’s still much to be done around network and storage infrastructure and workflows to develop the solutions that will make this ground-breaking research readily available to the public, the physician community, and healthcare professionals. Third-generation platform applications need to be built to make this more mainstream. Fortunately, major healthcare technology players such as GE and Philips have undertaken initiatives to attract independent software vendor (ISV) applications. The more that high-profile companies devote time and resources to supporting ISV applications, the sooner scientists will have access to more sophisticated tools.

More cohort analyses such as Genomics England’s 100,000 Genomes Project must be put in place to ensure researchers have sufficient data to develop new forms of screening and treatment, and these efforts will also necessitate additional storage capabilities.

Conclusion

Despite these barriers, the future remains promising for precision medicine. With the proper infrastructure in place to provide reliable shared access and massive scalability, clinicians and researchers will have the freedom to focus on discovering the breakthroughs of tomorrow.

Get first access to our Life Sciences Solutions

Examining TCO for Object Storage in the Media and Entertainment Industry

The cloud has changed everything for the media and entertainment industry when it comes to storage. The economies of scale that cloud-based storage can support have transformed the way media organizations archive multi-petabyte volumes of media.

Tape-based multi-petabyte archives present a number of challenges, including a host of implementation and maintenance issues. Data stored on tape is not accessible until the specific tape is located, loaded into a tape drive, and positioned to the proper location on the tape. Then there is the physical footprint of the library frame and the real estate required for frame expansions – tape libraries are huge. This becomes all the more problematic in densely populated, major media hubs such as Hollywood, Vancouver, and New York.

At first, the public cloud seemed like a good alternative to tape, promising lower storage costs. But while it’s cheaper to store content in the public cloud, you must also factor in the high costs associated with data retrieval, which can be prohibitive given data egress fees. The public cloud also requires moving your entire media archive library to the cloud and giving up the freedom to use the applications of your choice. Suddenly the lower initial costs of the public cloud can add up to a significantly larger price to pay.

Object storage is emerging as a viable option that offers media companies a number of benefits and efficiencies that the public cloud and tape-based archives simply cannot provide. In fact, object storage is rapidly becoming mandatory for applications that must manage large, constantly growing repositories of media for long-term retention.

Dell EMC Elastic Cloud Storage (ECS) blends next-generation object storage with traditional storage features, offering the media and entertainment world an on-premises cloud storage platform that is cost-competitive with multi-petabyte tape libraries. ECS not only simplifies the archive infrastructure, it also enables critical new cloud-enabled workflows that are not possible with a legacy tape library.

Instant Availability of Content

The greatest benefit of object storage for media and entertainment companies is the instant availability of their media content – you can’t access media on tape without a planned and scheduled retrieval from a robotic tape library. For a broadcast company, the delay in data availability could result in a missed air date, advertiser revenue loss, and legal fees.

With instant access to their entire archives, a whole new world of possibilities opens up for content creators. Archives aren’t often considered when it comes to content creation – accessing archived media has historically been difficult, and obtaining the data often takes far too long. With instant access to archived media, however, archives can effectively be monetized rather than just sitting on tape in a dark closet gathering dust. Being able to access all of your media content at any time allows rapid deployment of new workflows and new revenue opportunities. Further, with object storage, engineering resources that were focused on tape library maintenance can be refocused on new projects.
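As a simple illustration of that instant access, the sketch below pulls an archived clip directly out of an S3-compatible object store such as ECS using standard S3 tooling. The endpoint, bucket, object key, and credentials are placeholders; any S3-compatible client would work the same way.

```python
import boto3

# Hypothetical endpoint and credentials; ECS exposes an S3-compatible API,
# so standard tooling such as boto3 can address the archive directly.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com",     # assumed on-premises ECS endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Retrieve an archived clip immediately -- no tape robot, no staging delay.
s3.download_file("media-archive", "promos/2016/launch_spot.mxf", "launch_spot.mxf")
```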

Operational Efficiencies

Object storage can also offer increased operational efficiencies – eliminating annual maintenance costs, as one example. One of the biggest, and least predictable, expenses of operating a tape library is maintenance. Errors on a tape library are commonplace; drive failures and the downtime needed to fix them can impact deadlines, create data availability issues, consume valuable engineering time, and result in lost revenue.

Going Hot and Cold: Consolidation and Prioritization

Public cloud storage services can enable users to move cold or inactive content off of tier 1 storage for archiving, but concerns around security, compliance, vendor lock-in, and unpredictable costs still remain. Cold content can still deliver value, and ECS allows organizations to monetize this data in an active archive with the same scalability and low-cost benefits, but without the loss of IT agility or the reliability concerns.

ECS allows organizations to consolidate their backup and archive storage requirements into a single platform. It can replace tape archives for long-term retention and near-line purposes, and surpass public cloud services for backup.

In the video below, Dell EMC’s Tom Burns and Manuvir Das offer some additional perspective on how the media and entertainment industry can benefit from object storage: 

Stay current with Media & Entertainment industry trends

Dispelling Common Misperceptions About Cloud-Based Storage Architectures

As the media and entertainment industry moves to 4K resolution and virtual/augmented content formats, the storage and archive requirements for media content have grown exponentially. But while storage requirements continue to skyrocket, industry revenue has not grown accordingly, and M&E organizations are finding themselves challenged to “do more with less.” More organizations are looking to leverage the cost efficiencies, scalability, and flexibility that cloud storage can offer, but many remain apprehensive about taking the plunge.

To be clear, when we talk about “the cloud” in this post, we mean cloud architectures, as opposed to the public clouds provided by vendors such as Microsoft, AWS, and Google. Unlike public clouds, cloud architectures can be used completely within your facility if desired, and they are designed with infinite scalability and ease of access in mind.

There are a number of misperceptions about moving data to cloud architectures that are (wait for it) clouding people’s judgment. It’s time we busted some of the bigger myths and misperceptions out there about cloud storage.

Myth #1: I’ll have to learn a whole new interface – false! Dell EMC’s Elastic Cloud Storage (ECS) employs a tiered system in which it sits under a file system – in our case, Isilon. For organizations already deploying Isilon scale-out NAS storage platforms, the workflows stay exactly as they were, as does users’ interface to the file system.

This tiered approach helps companies to “do more with less” by allowing them to free up primary storage and consolidate resources. By tiering down “cold,” inactive data to ECS, you can better optimize your tier-one higher performance storage and drive down costs.
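Conceptually, tiering down cold data looks something like the sketch below: files that haven’t been accessed within a given window are copied to the object tier. In practice Isilon and ECS handle this transparently through policy, so the script, threshold, mount point, and bucket name here are purely illustrative.

```python
import os
import time
import boto3

COLD_AFTER_DAYS = 90                              # illustrative tiering threshold
ECS_ENDPOINT = "https://ecs.example.com"          # assumed on-premises ECS endpoint
BUCKET = "cold-tier"                              # hypothetical bucket name

# Credentials are picked up from the environment in this sketch.
s3 = boto3.client("s3", endpoint_url=ECS_ENDPOINT)

def tier_cold_files(root: str) -> None:
    """Copy files not accessed for COLD_AFTER_DAYS to the object tier."""
    cutoff = time.time() - COLD_AFTER_DAYS * 86400
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getatime(path) < cutoff:
                key = os.path.relpath(path, root)
                s3.upload_file(path, BUCKET, key)
                print(f"tiered {path} -> {BUCKET}/{key}")

tier_cold_files("/mnt/isilon/projects")           # hypothetical mount point
```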

Myth #2: My data won’t be safe in the cloud – false! ECS features a geo-efficient architecture that stores, distributes and protects data both locally and geographically, eliminating any single point of failure and providing a seamless failover from site to site with no impact to business. Further, even though the data within ECS is distributed, it’s still a secure, private environment so users won’t run into scenarios where anyone can access information without the right credentials.

Myth #3: Collaboration and access is going to be negatively impacted – false! If you look at the VFX industry, for example, teams are frequently spread across the world and working across time zones on a 24/7 basis. ECS enables global teams to work on the same piece of data at the same time from one system – it’s true collaboration. ECS’s multi-site, active-active architecture and universal accessibility enables anywhere access to content from any application or device.

Myth #4: Moving to the cloud is an all-or-nothing approach – false! ECS can be deployed when your organization is ready for it – whether that’s in a month, six months, or a year. We realize a lot of operations personnel like to “see” their data and know first-hand that it’s there. We get that. But as things evolve, it’s likely that organizations will face pressure to take at least some of their data offsite. With ECS, you can keep your data in the data center and, when the time is right to take it off-site, Dell EMC can work with your organization to move your infrastructure to a hosted facility or colocation facility where you can continue to access your data just as you did when it was on-premises. ECS is available in a variety of form factors that can be deployed and expanded incrementally, so you can choose the right size for your immediate needs and projected growth.

Because it is designed with “limitless scale” in mind, ECS eliminates concerns about running out of storage. It can meet the needs of today’s M&E organizations, as well as those of the future, simply by adding additional storage, just as you used to do with tapes.

Hopefully we’ve been able to bust a few of the myths around adopting a cloud-based storage architecture. This video featuring Dell EMC’s Tom Burns and Manuvir Das can offer additional insight into ECS’s tiered approach and how media organizations can begin seeing benefits from day one.

Stay current with Media & Entertainment industry trends here, or listen to the Broadcast Workflows webcast recording.

TGen Cures Storage Needs with Dell EMC to Advance Precision Medicine

Sasha Paegle

Sr. Business Development Manager, Life Sciences

As the gap between theoretical treatment and clinical application for precision medicine continues to shrink, we’re inching closer to the day when doctors routinely use an individual’s genome to prescribe specific care strategies.

Organizations such as the Translational Genomics Research Institute (TGen), a leading biomedical research institute, are on the forefront of enabling a new generation of life-saving treatments. With innovations from TGen, breakthroughs in genetic sequencing are unraveling mysteries of complex diseases like cancer.

To help achieve its goal of successfully using -omics to prevent, diagnose, and treat disease, the Phoenix-based non-profit research institute selected Dell EMC to enhance the IT systems and infrastructure that manage its petabyte-scale sequencing cluster.

Data Tsunami 

The time and cost of genomic sequencing for a single person has dropped dramatically since the Human Genome Project, which spanned 13 years and cost $1 billion. Today, sequencing can be completed in roughly one day for approximately $1,000. Furthermore, technological advances in sequencing and on the IT front have enabled TGen to increase the number of patients being sequenced from the hundreds to the thousands annually. To handle the storage output from current sequencing technologies and emerging single molecule real-time (SMRT) sequencing, TGen required an infrastructure with the storage capacity and performance to support big data repositories produced by genetic sequencing—even as they grow exponentially.

“When you get more sequencers that go faster and run cheaper, and the more people are being sequenced, you’re going to need more resources in order to process this tsunami of data,” said James Lowey, TGen’s CIO.

TGen stores vast amounts of data generated by precision medicine, such as genetic data and data from wearables, including glucose monitors and pain management devices, as well as clinical records and population health statistics. Scientists must then correlate and analyze this information to develop a complete picture of an individual’s illness and potential treatment. This involves TGen’s sequencing cluster churning through one million CPU hours per month, and it calls for a storage solution that can also maintain high availability, which is critical to the around-the-clock research environment.
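For a sense of scale, one million CPU-hours per month works out to roughly 1,400 cores running flat out around the clock, a back-of-envelope figure that ignores scheduling overhead and idle time:

```python
cpu_hours_per_month = 1_000_000
hours_per_month = 30 * 24                            # ~720 hours in a month

cores_fully_busy = cpu_hours_per_month / hours_per_month
print(f"~{cores_fully_busy:,.0f} cores busy 24/7")   # ~1,389 cores
```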

Benefits for Researchers

In the coming years, researchers can expect the number of genetic sequences to keep increasing, with SMRT sequencing paving the way for even larger data volumes.

Lowey notes, “As genetic data continues to grow exponentially, it’s even more important to have an extremely reliable infrastructure to manage that data and make it accessible to the scientists 24/7.”

Having a robust storage infrastructure in place allows researchers to devote their time and attention fully to the core business of science without worrying whether there is enough disk space or processing capacity. It also helps scientists get more precise treatments to patients faster, enabling breakthroughs that lead to life-saving and life-changing medical treatments – the ultimate goal of TGen and like-minded research institutes.

Looking Ahead

With the likelihood of sequencing clusters growing to exabyte-scale, TGen and its peers must continue to seek out an enterprise approach that emphasizes reliability and scalability and ensures high availability of critical data for 24/7 operations.

Lowey summarizes the future of precision medicine and IT by saying, “The possibilities are endless, but the real trick is to build all of that backend infrastructure to support it.”

To learn more about Dell EMC’s work with TGen, check out our video below.

 

Get first access to our Life Sciences Solutions

The Next Element for IT Service Providers in the Digital Age

Diana Gao

Senior Product Marketing Manager at EMC² ECS

Digital technology has disrupted large swaths of the economy and is generating huge amounts of data, to the point where the average backup hovers at around a petabyte. Not all organizations can cope with this data deluge, and many look to service providers for storage and protection. Many service providers offer tape-based backup and archiving services, but despite their best efforts to innovate, data volumes always seem to grow faster, pushing the boundaries of tape capacity.

Today, companies of all sizes still use tape to store business information, but now it is used more for cold storage than for data that needs to be accessed frequently. While tape is a low-cost, reliable option that is ideal for data not accessed often, maintaining multiple versions of software and legacy infrastructure can put a burden on already taxed resources. These challenges come at a cost, including software licenses, maintenance, and technical resources that could be spent on more important initiatives to help drive business innovation. As a service provider, you need a secure and compliant data storage option that enables you to sell more value-added services.

As reported by TechTarget, a Storage magazine Purchasing Intentions survey showed that the trend away from tape continues – 76% of IT professionals see their use of tape as a backup format either declining or staying the same.

Some service providers are considering offering cloud-based backup-as-a-service that does not raise security concerns for their customers. Others are looking for a solution that combines the benefits of faster data access with the cost advantages of tape.

More than a few service providers have discovered a solution that delivers all of these benefits: the Elastic Cloud Storage (ECS) object storage platform. As a highly scalable, multi-tenant, multi-protocol object storage system, ECS helps service providers better meet their service-level agreement (SLA) commitments to customers by offering highly resilient, reliable, and low-cost storage services with enterprise-class security.

Iron Mountain® Incorporated (NYSE: IRM), a leading provider of storage and information management services, is one of those who have discovered this solution. In addition to its traditional tape-based storage-as-a-service, it partnered with Dell EMC to provide a cost-effective, scalable, and modern Cloud Archive as part of its services portfolio. Designed to scale as data volumes grow, with ECS as the back-end storage platform, the Cloud Archive solution is ideal for organizations needing offsite, pay-as-you-use archival storage with near-infinite scalability.

“Our customers trust that we know where the data is by having those cloud-based solutions in our datacenters. It gives them a peace of mind where they know where their data is at rest,” said Eileen Sweeney, SVP Data Management at Iron Mountain.

Watch the video below to hear more about how Iron Mountain uses ECS to modernize its storage management services for 95% of Fortune 1000 companies. 

You’ll find the full rundown of the Iron Mountain Cloud Archive solution with ECS here.

Planning on getting away to Barcelona for Mobile World Congress (MWC) 2017? Stop by the VMware stand in Hall 3, Stand 3K10, to meet with Dell EMC experts!
