
At the Speed of Light

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

Over the last year, a clear trend in analytics has emerged: batch analytics are getting bigger and real-time analytics are getting faster. This divergence has never been more apparent than it is now.

Batch Analytics

Batch analytics primarily cover descriptive analytics, massive-scale analytics, and the development of models destined for online systems. Descriptive analytics remain the main purview of data warehouses, but Hadoop has expanded what is possible, enabling "what if" questions across far more data types and with far more analytic capability. Some Hadoop descriptive analytics installations have reached truly massive scale.

The documented successes of massive-scale analytics are well-trodden ground. Cross-data analytics (such as disease detection across multiple data sets), time-series modeling, and anomaly detection are particularly impressive given their depth of adoption in several verticals. The past year's healthcare analytics deployments on Hadoop alone are numerous, and they show this use case's potential to yield remarkable insights into caring for our aging population and treating rare diseases.

Model development is an application that highlights the groundbreaking potential unlocked by Hadoop's newest capabilities and analytics. Creating real-time models from trillions of transactions for a hybrid architecture is a good example of this category. Because only a tiny percentage of daily records are actual fraud, trillions of transactions are required before a fraud model can be certified as effective. The model is then deployed into production, which is often a real-time system.
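
To make that batch-train, real-time-score split concrete, here is a minimal sketch in Python. It uses scikit-learn and synthetic data; the features, labels, and threshold logic are purely illustrative, not a description of any production fraud pipeline.

```python
# Minimal sketch of the batch-train / real-time-score split described above.
# Synthetic data and feature names are illustrative, not a real fraud model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Offline (batch) phase: fit a model on historical transactions.
# Real systems train on vastly larger, heavily imbalanced datasets.
X_train = rng.normal(size=(100_000, 3))  # e.g. amount, velocity, geo-distance
y_train = (X_train[:, 0] + X_train[:, 1] > 2.5).astype(int)  # stand-in label
model = LogisticRegression().fit(X_train, y_train)

# Online (real-time) phase: score each transaction as it arrives.
def score_transaction(features):
    """Return the model's fraud probability for one incoming transaction."""
    return model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]

print(score_transaction([2.0, 1.5, 0.3]))  # flag if above a chosen threshold
```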

One data point behind my belief that batch is getting "bigger": I have been engaged with no fewer than 10 Hadoop clusters that have crossed the 50 PB threshold this year alone. In each case, the cluster hit a logical pause point, prompting the customer to re-evaluate its architecture and operations, whether due to cost, scale, limitations, or other catalysts. These are often the moments when I get involved. Not every client reaches these catalysts at a consistent size or time, so it is striking that 10 clusters above 50 PB hit this point in 2017 alone. Meanwhile, customers continue to set all-time records for Hadoop cluster size.

Real-Time Analytics

While hybrid analytics were certainly in vogue last year, real-time (or streaming) analytics appear to be the hottest trend of late. Real-time analytics, such as fraud authorization screening, are not new endeavors. So why is streaming suddenly the "new hot thing"? Several factors are at play.

Data is growing at an ever-increasing rate, and one contributing factor can be categorized as "to store or not to store." Although this step usually happens alongside more complex processing, it clearly involves some form of analytics to decide whether the data is useful. Not every piece of data is valuable, and an enormous amount of data is being generated. Deciding whether a particular artifact of data is worth committing to batch storage is one use for real-time analytics.
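
As a rough illustration of that "store or not store" gate, the sketch below keeps a sensor reading only when it differs materially from the last value stored. The 0.5% threshold and the change-based rule are assumptions chosen for the example, not a recommendation.

```python
# Sketch of a "store or not store" gate on a stream of readings.
# The 0.5% change threshold is an illustrative choice.
def worth_keeping(reading, last_kept):
    """Keep a reading only if it moved materially since the last stored value."""
    if last_kept is None:
        return True
    return abs(reading - last_kept) / max(abs(last_kept), 1e-9) > 0.005

def filter_stream(readings):
    """Yield only the readings worth persisting to batch storage."""
    last_kept = None
    for reading in readings:
        if worth_keeping(reading, last_kept):
            last_kept = reading
            yield reading  # hand off to the batch store

print(list(filter_stream([100.0, 100.1, 103.0, 103.1, 110.0])))
# -> [100.0, 103.0, 110.0]; the near-duplicates are never stored
```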

Moving up the value chain, the more significant factor is that the value proposition of real time often far outweighs that of batch. This doesn't mean batch and real time are decoupled; in many ways they remain symbiotic. In high-frequency trading, fraud authorization detection, cyber security, and other streaming use cases, gaining insight in real time rather than in days can be critical. Real-time systems have historically not relied on Hadoop for their architectures, which has not gone unnoticed by traditional Hadoop ecosystem tools like Spark. The University of California, Berkeley recently shifted the focus of its AMPLab to create RISELab, greenlighting projects such as Drizzle that aim to bring low-latency streaming capabilities to Spark. The ultimate goal of Drizzle and RISELab is to make Spark viable for real-time, non-Hadoop workloads. This emphasis on lower-latency tools will only escalate the use of streaming analytics as real time keeps getting "faster."
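
For a flavor of what micro-batch streaming in Spark looks like today, here is a minimal Structured Streaming job. It uses only the stock Spark API and the built-in rate source so it is self-contained; Drizzle targets the scheduling overhead inside exactly this kind of job.

```python
# Stock Spark Structured Streaming (micro-batch) windowed counter.
# The built-in "rate" source generates synthetic timestamped rows so the
# example runs without any external data feed.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("low-latency-sketch").getOrCreate()

# Synthetic event stream: one timestamped row per generated event.
events = spark.readStream.format("rate").option("rowsPerSecond", 1000).load()

# Count events in 10-second windows, emitting updated results every second.
counts = events.groupBy(window(col("timestamp"), "10 seconds")).count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="1 second")  # micro-batch cadence
         .start())
query.awaitTermination()
```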

The last factor is the "Internet of Everything," often referred to as "IoT" or "M2M." While sensors are top of mind, most companies are still finding their way in this new world of streaming sensor data. Highly advanced use cases and designs are already in place, but the installations remain bespoke and limited; mass adoption is still a work in progress. The theoretical value of this data for governance analytics, or for analytics that improve business operations, is massive. Given the deluge of data, storing everything in batch is not feasible at scale, so most IoT analytics are streaming-based. The value proposition remains largely unrealized and IoT analytics are still in the hype phase, yet the furor and spending are already at full-scale-deployment levels.

In closing, the divergence between batch and online analytics is growing. The symbiotic relationship remains strong, but the architectures are quickly separating. Predictions from IDC, Gartner, and Forrester indicate streaming analytics will grow at a far greater rate than batch analytics, for most of the reasons above. It will be interesting to watch this trend continue to unfold. Dell EMC is always interested in learning more about specific use cases, and we welcome your stories on how these trends are affecting your business.

Overcoming the Exabyte-Sized Obstacles to Precision Medicine

Wolfgang Mertz

CTO of Healthcare, Life Sciences and High Performance Computing

As we make strides toward a future of autonomous cars and checkout-free grocery stores, concepts once reserved for utopian fiction, it seems there's no limit to what science and technology can accomplish. It's an especially exciting time for the life sciences and healthcare fields: 2016 saw breakthroughs such as a potential "universal" flu vaccine and CRISPR, a promising gene-editing technology that may help treat cancer.

Several of Dell EMC’s customers are also making significant advances in precision medicine, the medical model that focuses on using an individual’s specific genetic makeup to customize and prescribe treatments.

Physicians and scientists are currently researching a myriad of applications for precision medicine, including oncology, diabetes, and cardiology. But before we can realize the vision President Obama described in his 2015 Precision Medicine Initiative, "the right treatments at the right time, every time, to the right person," there are significant challenges to overcome.

Accessibility

For precision medicine to become available to the masses, researchers and doctors will need not only the technical infrastructure to support genomic sequencing, but also the storage capacity and resources to access, view, and share other relevant data. They will need visibility into patients' electronic health records (EHRs), along with information on environmental conditions, lifestyle behaviors, and biological samples. While increased data sharing may sound simple enough, much work remains on the storage infrastructure side to make it possible. Much of this data is siloed today, which impedes healthcare providers' ability to collaborate and review critical information that could affect a patient's diagnosis and treatment. To take full advantage of the potentially life-saving insights precision medicine offers, organizations must implement a storage solution that enables high-speed access anytime, anywhere.

Volume

Another issue to confront is the storage capacity needed to house and preserve the petabytes of genomic data, medical imaging, EHR data, and more. Thanks to the decreased cost of genomic sequencing and the growing number of genomes being analyzed, the sheer volume of genomic data alone is quickly eclipsing the storage available in most legacy systems. According to a scientific report by Stephens et al. published in PLOS Biology, between 100 million and 2 billion human genomes may be sequenced by 2025. That could drive storage demands of 2 to 40 exabytes, since storage requirements must account for the accuracy of the data collected. As the paper states, "For every 3 billion bases of human genome sequence, 30-fold more data (~100 gigabases) must be collected because of errors in sequencing, base calling and genome alignment." With this exponential projected growth, scale-out storage that can simultaneously manage multiple current and future workflows is more necessary than ever.
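
The arithmetic behind that range is easy to reproduce. In the back-of-envelope sketch below, the ~20 GB of retained data per genome is my round-number assumption, chosen only to show how the 2-40 exabyte bounds fall out of the paper's genome-count projections.

```python
# Back-of-envelope reproduction of the 2-40 exabyte range cited above.
# The ~20 GB retained per genome is an assumed round number, not a figure
# from the paper; it is here only to make the arithmetic visible.
GB, EB = 1e9, 1e18
retained_per_genome = 20 * GB

for genomes in (100e6, 2e9):  # Stephens et al. 2025 projections
    total_bytes = genomes * retained_per_genome
    print(f"{genomes:.0e} genomes -> {total_bytes / EB:.0f} EB")
# 1e+08 genomes -> 2 EB
# 2e+09 genomes -> 40 EB
```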

Early Stages 

Finally, while it's easy to get caught up in the excitement of the advances made thus far in precision medicine, we have to remember that this remains a young discipline. At the IT level, much remains to be done around network and storage infrastructure and workflows to develop the solutions that will make this groundbreaking research readily available to the public, the physician community, and healthcare professionals. Third-platform applications need to be built to bring precision medicine into the mainstream. Fortunately, major healthcare technology players such as GE and Philips have undertaken initiatives to attract independent software vendor (ISV) applications. The more that high-profile companies devote time and resources to supporting ISV applications, the sooner scientists will have access to more sophisticated tools.

More cohort analyses such as Genomics England's 100,000 Genomes Project must be put in place to ensure researchers have sufficient data to develop new forms of screening and treatment, and these efforts will also demand additional storage capabilities.

Conclusion

Despite these barriers, the future remains promising for precision medicine. With the proper infrastructure in place to provide reliable shared access and massive scalability, clinicians and researchers will have the freedom to focus on discovering the breakthroughs of tomorrow.


The Next Element for IT Service Providers in the Digital Age

Diana Gao

Senior Product Marketing Manager at EMC ECS

Digital technology has disrupted large swaths of the economy and is generating huge amounts of data; the average backup now hovers at around a petabyte. Not every organization can cope with this data deluge, so many look to service providers for storage and protection. Many service providers offer tape-based backup and archiving services, but despite their best efforts to innovate, data volumes always seem to grow faster, pushing the boundaries of tape's capacity.

Today, companies of all sizes still use tape to store business information, but now more for cold storage than for data that needs frequent access. While tape is a low-cost, reliable option for rarely accessed data, maintaining multiple versions of software and legacy infrastructure can burden already taxed resources. These challenges carry costs: software licenses, maintenance, and technical staff time that could be spent on more important initiatives that drive business innovation. As a service provider, you need a secure, compliant data storage option that lets you sell more value-added services.

As reported by TechTarget, a Storage magazine purchasing intentions survey showed that the trend away from tape continues: 76% of IT professionals see their use of tape as a backup format either declining or staying the same.

Some service providers are considering cloud-based backup-as-a-service offerings that won't raise security concerns for their customers. Others are looking for a solution that combines faster data access with the cost advantages of tape.

More than a few service providers have discovered a solution that delivers all of these benefits: the Elastic Cloud Storage (ECS) object storage platform. As a highly scalable, multi-tenant, multi-protocol object storage system, ECS helps service providers better meet their service-level agreement (SLA) commitments by offering highly resilient, reliable, low-cost storage services with enterprise-class security.

Iron Mountain® Incorporated (NYSE: IRM), a leading provider of storage and information management services, is one of them. To complement its traditional tape-based storage-as-a-service, it partnered with Dell EMC to add a cost-effective, scalable, modern Cloud Archive to its services portfolio. With ECS as the backend storage platform, the Cloud Archive solution is designed to scale as data volumes grow, making it ideal for organizations that need offsite, pay-as-you-use archival storage with near-infinite scalability.

“Our customers trust that we know where the data is by having those cloud-based solutions in our datacenters. It gives them a peace of mind where they know where their data is at rest,” said Eileen Sweeney, SVP of Data Management at Iron Mountain.

Watch the video to hear more about how Iron Mountain uses ECS to modernize its storage management services for 95% of Fortune 1000 companies.

You’ll find the full rundown of the Iron Mountain Cloud Archive solution with ECS here.

Planning on getting away to Barcelona for Mobile World Congress (MWC) 2017? Stop by the VMware stand (Hall 3, Stand 3K10) to meet with Dell EMC experts!

Digital Transformation with Radical Simplicity

Corey O'Connor

Senior Product Marketing Manager at Dell EMC ETD


Welcome to another edition of the Emerging Technologies ECS blog series, where we discuss topics related to cloud storage and ECS (Elastic Cloud Storage), Dell EMC’s cloud-scale storage platform.

The Inflection Point

It's no surprise that unstructured data continues to grow exponentially year over year and shows no signs of slowing down. Some organizations are left managing this data on traditional storage infrastructure, which is not only expensive but also cannot scale at the rate the data is growing. IT budgets remain flat or grow at an ungenerous rate of about 5% annually, while capital expenses tend to double almost every year for most organizations. The other pressing issue is the requirement to maintain the same (if not better) level of service with fewer resources as data growth continues to strain storage infrastructure. This trend is not sustainable; organizations that do not transform their business will struggle, without question. We know what you're thinking: wouldn't it be great if the world's largest provider of data storage systems created a cost-effective, cloud-scale solution to this enterprise-level challenge?

Dell EMC’s Elastic Cloud Storage (ECS)

Elastic Cloud Storage (ECS) is Dell EMC’s 3rd generation object-based storage system that provides the ability to:

  • Consolidate primary storage resources and elevate ROI
  • Modernize traditional and legacy applications for better storage utilization
  • Accelerate cloud native applications to deliver new business value

ECS delivers a multipurpose platform that satisfies a variety of use cases and plugs in neatly alongside almost any existing Dell EMC investment. ECS single-handedly simplifies management, increases agility, and, most importantly, lowers costs. At scale, ECS is undoubtedly one of the most cost-effective solutions on the market today. In fact, analyst firm Enterprise Strategy Group (ESG) recently conducted a survey showing that ECS provides a 60% or greater cost advantage over other leading public cloud providers.

ECS extends the cloud to primary storage, letting you free up your infrastructure through Dell EMC cloud-enabled solutions (e.g., CloudPools, CloudBoost, CloudArray). Customers can seamlessly tier colder, inactive workloads from existing primary storage investments (e.g., Isilon, VMAX Series, VPLEX, VNX Series, Vx Series, Data Domain, Data Protection Suite) to ECS. This resource consolidation eliminates the need to purchase additional, more expensive platforms and makes better use of the infrastructure already in your storage environment.

An object-based platform like ECS can drastically increase responsiveness and better secure data compared with a traditional NAS system. Data is protected using erasure coding, and the resulting chunks are geo-distributed across all nodes in the system, providing read/write access from any location. Strong consistency semantics ensure that only the most recent copy of data is accessed, simplifying application development. A geo-caching capability further improves responsiveness by intelligently recognizing access patterns, minimizing WAN traffic and reducing latency.
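
A quick sketch shows why erasure coding matters at this scale. Dell EMC has described ECS as using a 12+4 erasure coding scheme by default; treat the numbers below as an illustration of the arithmetic rather than a statement about any particular deployment.

```python
# Raw capacity cost of erasure coding versus replication. The 12+4 scheme
# reflects ECS's documented default; the figures are a sketch of the math,
# not a spec for any specific system.
def raw_overhead(data_fragments, coding_fragments):
    """Raw bytes stored per byte of user data."""
    return (data_fragments + coding_fragments) / data_fragments

print(f"12+4 erasure coding: {raw_overhead(12, 4):.2f}x raw capacity")  # 1.33x
print(f"3-way replication:   {raw_overhead(1, 2):.2f}x raw capacity")   # 3.00x
# A 12+4 layout tolerates the loss of any 4 of its 16 fragments;
# 3-way replication tolerates 2 lost copies, but at more than twice the cost.
```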

ECS provides simple, easy application access through a single global namespace. Developers no longer have to wrestle with complex NFS file systems; they can focus on app development rather than the operational and implementation details beneath it. By modernizing traditional applications onto an object store, users get fast and easy provisioning, direct access to content over the web via HTTP, global accessibility through a single namespace, and the best possible utilization of storage resources in the datacenter.
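
Because ECS speaks the S3 protocol, "direct access over the web" can be as simple as pointing a standard S3 client at an ECS endpoint. In this sketch, the endpoint URL, credentials, and bucket name are placeholders rather than real ECS defaults.

```python
# Reading and writing objects on an S3-compatible endpoint such as ECS.
# The endpoint URL, credentials, and bucket name below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # hypothetical ECS endpoint
    aws_access_key_id="MY_OBJECT_USER",
    aws_secret_access_key="MY_SECRET_KEY",
)

# Store content with a plain HTTP PUT under the hood...
s3.put_object(Bucket="app-content", Key="reports/q1.pdf", Body=b"%PDF-...")

# ...and read it back from anywhere the global namespace is visible.
obj = s3.get_object(Bucket="app-content", Key="reports/q1.pdf")
print(obj["Body"].read()[:8])
```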

Cloud-native applications take full advantage of a cloud system framework. The ECS architecture is completely software-defined, with total abstraction northbound and southbound, allowing compute and storage resources to scale independently of each other. Everything within ECS is containerized, and because ECS provides multi-protocol support there are no hardware dependencies and no need to re-code, re-tool, or reconfigure applications. This lets developers innovate and bring their applications to market much faster.

Bridging the Gap

Enterprises and cloud service providers alike can leverage ECS to fund their "digital transformation" as traditional line-of-business applications decline and cloud-native apps surge over the next decade. ECS bridges the gap between Platform 2 (traditional) and Platform 3 (next-gen) applications on a single storage system. Not only can ECS easily handle the extraordinary growth in unstructured data; as a multipurpose platform it can serve up the many different workloads you manage today and ready your organization for whatever the future throws at you.

Big Data Analysis for the Greater Good: Dell EMC & the 100,000 Genomes Project

Wolfgang Mertz

CTO of Healthcare, Life Sciences and High Performance Computing

It might seem far-reaching to say that big data analysis can fundamentally impact patient outcomes around cancer and other illnesses, and that it has the power to ultimately transform health services and indeed society at large, but that's precisely the goal behind the 100,000 Genomes Project from Genomics England.

For background, Genomics England is a wholly owned company of the Department of Health, set up to deliver the 100,000 Genomes Project. This exciting endeavor will sequence and collect 100,000 whole genomes from 70,000 NHS patients and their families (with their full consent), focusing on patients with rare diseases as well as those with common cancers.

The program is designed to create a lasting legacy for patients as well as the NHS and the broader UK economy, while encouraging innovation in the UK’s bioscience sector. The genetic sequences will be anonymized and shared with approved academic researchers to help develop new treatments and diagnostic testing methods targeted at the genetic characteristics of individual patients.

Dell EMC provides the platform for large-scale analytics in a hybrid cloud model for Genomics England, which leverages our VCE vScale with EMC Isilon and EMC XtremIO solutions. The Project has been using EMC storage for its genomic sequence library, and it will now leverage an Isilon data lake to securely store data during the sequencing process. Backup services are provided by EMC Data Domain and EMC NetWorker.

The Genomics England IT environment uses both on-prem servers and IaaS provided by cloud service providers on G-Cloud. According to an article from Government Computing, “one of Genomics England’s key legacies is expected to be an ecosystem of cloud service providers providing low cost, elastic compute on demand through G-Cloud, bringing the benefits of scale to smaller research groups.”

There are two main IT considerations around genome and DNA sequencing projects such as those being run by Genomics England and others: data management and speed. Vast amounts of research data must be stored and retrieved, and this large-scale biological data has to be processed quickly to yield meaningful insights.

Scale is another key factor. Sequencing and storing genomic information digitally is a data-intensive endeavor, to say the least. Sequencing a single genome creates hundreds of gigabytes of data, and the Project has sequenced over 13,000 genomes to date; it is expected to generate ten times more data over the next two years. The data lake used by Genomics England allows 17 petabytes of data to be stored and made available for multi-protocol analytics (including Hadoop).
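
A rough consistency check on those figures, taking "hundreds of gigabytes" as roughly 200 GB per genome (an assumption for the sketch): today's sequencing output fits comfortably within a 17 PB lake, while the projected tenfold growth shows why scale-out headroom matters.

```python
# Rough consistency check; ~200 GB per genome is an assumption standing
# in for the "hundreds of gigabytes" figure cited above.
GB, PB = 1e9, 1e15
genomes_to_date = 13_000
per_genome = 200 * GB

so_far = genomes_to_date * per_genome
print(f"sequenced to date: {so_far / PB:.1f} PB")       # ~2.6 PB
print(f"after 10x growth:  {so_far * 10 / PB:.0f} PB")  # ~26 PB: growth of this
# order is exactly why a scale-out data lake, not fixed capacity, is needed.
```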

For perspective, 1 PB is a quadrillion bytes. Think of that as 20 million four-drawer filing cabinets filled with text. Or, given that the Milky Way holds roughly two hundred billion stars, if you count each star as a single byte, it would take 5,000 Milky Way galaxies to reach 1 PB of data. It's staggering.

The potential to help eradicate disease and identify exciting new treatments is truly awe-inspiring. And the immense scale of the data involved (5,000 galaxies!) gives new context to reaching for the stars.


Embrace Digital Transformation with Elastic Cloud Storage (ECS) 3.0

Sam Grocott

Senior Vice President, Marketing & Product Management at EMC ETD

Digital Transformation is drastically changing the business landscape, and the effects are being felt in every industry and every region of the world. For some, the goal of this transformation is to use technology to leapfrog the competition with innovative products and services. For others, the focus is on avoiding disruption by new market entrants. Whatever your situation, it's clear that you can't ignore the change. In a recent study by Dell Technologies, 78% of global businesses surveyed believe that digital start-ups will pose a threat to their organization, and almost half (45%) fear they may become obsolete in the next three to five years due to competition from digital-born start-ups. These numbers are a stark indication of the pressure business leaders feel to adapt or fall by the wayside.

But for IT leaders, this raises an uncomfortable question: Where will you find the money to make this transformation? You’re already under constant pressure to lower IT costs. How can you invest in new technologies while still doing this?

Elastic Cloud Storage (ECS), Dell EMC's object storage platform, was built to help organizations with precisely this challenge. ECS has been in market for just under two years, and its latest release, ECS 3.0, is being announced at Dell EMC World today. ECS is a next-generation storage platform that simplifies the storage and management of your unstructured data, increases your agility, and, most importantly, lowers your costs. Let's look at some of the ways ECS can help modernize your datacenter, clearing the way for you to embrace Digital Transformation.

Simplify and Accelerate Cloud-Native Development

The success of companies like Uber and Airbnb has highlighted the transformative power of "cloud-native" mobile and web apps. Enterprises everywhere are taking note: in the previously mentioned Dell Technologies survey, 72% of companies indicated that they are expanding their software development capabilities. Often these efforts are directed toward cloud-native applications designed for the web and mobile devices.

ECS is designed for cloud-native applications that use the S3 protocol or other REST-based APIs such as OpenStack Swift. ECS natively handles functions like geo-distribution, strong data consistency, and data protection, freeing application developers to focus on what moves their business forward. This greatly increases developer productivity and reduces time to market for new applications that can unlock greater customer satisfaction as well as new sources of revenue.

Reduce storage TCO and complexity

Legacy storage systems in most enterprise datacenters are struggling to keep up with the explosion in unstructured data. Primary storage platforms are constantly running out of capacity, and storing infrequently accessed data on them is expensive. Additionally, as many businesses operate on a global scale, data coming in from different corners of the world ends up in silos, which increases management complexity and lowers agility in responding to business needs.

ECS is compatible with a wide range of cloud-enabled tiering solutions for Dell EMC primary storage platforms like VMAX, VNX, Isilon, and Data Domain. ECS is also certified with many third-party tiering solutions, enabling it to act as a low-cost global cloud tier for third-party storage platforms. These solutions drive up primary storage efficiency and drive down cost by offloading data to a lower-cost ECS tier. Tiering to ECS is friction-free: apps and users accessing primary storage don't have to change their behavior at all.


Tape Replacement

The new ECS D-Series dense compute rack increases storage density by more than 60%, making it an ideal replacement for tape archives. The D-Series is an eight-node system that provides the highest-density ECS configurations: 4.5 PB (D-4500) and 6.2 PB (D-6200) in a single rack.

These new configurations provide the low cost and scalability of traditional tape solutions without the lack of agility, poor reliability, and operational difficulties associated with storing data on tape. Additionally, ECS makes business data available to business units on demand. This allows organizations to fully embrace Digital Transformation, which relies on insights mined from business data to create more compelling customer experiences.

Legacy application modernization

ECS can serve as an ideal storage platform for organizations looking to modernize legacy line-of-business (LoB) applications that use or generate large amounts of unstructured data. Modifying legacy apps to point to ECS via the S3 protocol (or other REST-based APIs like OpenStack Swift) can reduce costs, simplify application maintenance, and allow those apps to scale to handle massive amounts of data.

Take the Next Step

Learn more about how ECS can enable your transformation, follow @DellEMCECS on Twitter, or try it out – for free!

 

 
