Archive for the ‘Emerging Tech Blog’ Category

Digital Transformation with Radical Simplicity

Corey O'Connor

Senior Product Marketing Manager at Dell EMC ETD


Welcome to another edition of the Emerging Technologies ECS blog series, where we discuss topics related to cloud storage and ECS (Elastic Cloud Storage), Dell EMC’s cloud-scale storage platform.

The Inflection Point

It’s no surprise that unstructured data continues to grow exponentially year over year and shows no signs of slowing down anytime soon. Many organizations are left managing this data on traditional storage infrastructure that is not only very expensive but also cannot scale at the rate the data is growing. IT budgets remain flat or grow at a modest rate of about 5% annually, while capital expenses nearly double every year for most organizations. The other pressing issue is the requirement to maintain the same (if not better) level of service with fewer resources as data growth continues to strain storage infrastructure. This trend is not sustainable, and organizations that do not transform their business will struggle without question. We know what you’re thinking: wouldn’t it be great if the world’s largest provider of data storage systems created a cost-effective, cloud-scale solution to this enterprise-level challenge?

Dell EMC’s Elastic Cloud Storage (ECS)

Elastic Cloud Storage (ECS) is Dell EMC’s 3rd generation object-based storage system that provides the ability to:

  • Consolidate primary storage resources and elevate ROI
  • Modernize traditional and legacy applications for better storage utilization
  • Accelerate cloud native applications to deliver new business value

ECS delivers a multipurpose platform that satisfies a variety of use cases and plugs in to almost any existing Dell EMC investment. ECS simplifies management, increases agility, and, most importantly, lowers costs. At scale, ECS is one of the most cost-effective solutions available in the market today. In fact, analyst firm Enterprise Strategy Group (ESG) recently conducted a survey showing that ECS provides a 60% or greater cost advantage compared to other leading public cloud providers.

ECS extends the cloud to primary storage and frees up your infrastructure through Dell EMC cloud-enabled solutions (e.g., CloudPools, CloudBoost, CloudArray). Customers can seamlessly tier colder, inactive workloads from existing primary storage investments (e.g., Isilon, VMAX Series, VPLEX, VNX Series, Vx Series, Data Domain, Data Protection Suite) to ECS. This resource consolidation eliminates the need to purchase additional, more expensive platforms and makes better use of the infrastructure already in your storage environment today.

An object-based platform like ECS can dramatically increase responsiveness and better secure data compared to a traditional NAS system. Data is protected using erasure coding, and the resulting chunks are geo-distributed across all nodes in the system, providing read/write access from any location. Strong consistency semantics ensure that only the most recent copy of data is accessed, which simplifies application development. A geo-caching capability further enhances responsiveness by intelligently recognizing access patterns, minimizing WAN traffic and reducing latency.
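
To make the erasure-coding idea concrete, here is a deliberately simplified sketch. It uses a single XOR parity chunk purely as a toy illustration of how a lost chunk can be rebuilt from the survivors; ECS’s actual protection scheme (Reed-Solomon-style coding across many chunks and sites) is far more sophisticated, and the chunk contents below are made up.

```python
# Toy illustration of erasure-coded protection: one XOR parity chunk lets any
# single lost data chunk be rebuilt from the survivors.
# (Illustrative only -- production systems such as ECS use more advanced
# schemes that tolerate multiple failures across nodes and sites.)
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

data_chunks = [b"CHUNK-A1", b"CHUNK-B2", b"CHUNK-C3"]   # hypothetical data
parity = xor_chunks(data_chunks)                         # stored alongside the data

# Simulate losing one chunk, then rebuild it from the rest plus the parity.
lost_index = 1
survivors = [c for i, c in enumerate(data_chunks) if i != lost_index]
rebuilt = xor_chunks(survivors + [parity])

assert rebuilt == data_chunks[lost_index]
print("Recovered:", rebuilt)
```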

ECS provides simple access to applications through a single global namespace. Developers no longer have to deal with complex NFS file systems; they can focus on application development rather than the operations and implementation details behind it. By modernizing traditional applications onto an object store, users get fast and easy provisioning, direct access to content over the web via HTTP, global accessibility through a single namespace, and better utilization of storage resources in the data center.
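
Because ECS exposes industry-standard object APIs, including an S3-compatible API, that web access is just a few lines of code. The sketch below uses boto3 against an ECS bucket; the endpoint URL, credentials, bucket and key names are hypothetical placeholders, not values from any real deployment.

```python
# Minimal sketch: reading and writing objects over HTTP against an
# S3-compatible endpoint such as ECS. The endpoint, credentials, bucket and
# key below are placeholders, not real values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # hypothetical ECS S3 endpoint
    aws_access_key_id="MY_OBJECT_USER",
    aws_secret_access_key="MY_SECRET_KEY",
)

s3.put_object(Bucket="radiology-archive",
              Key="studies/2017/scan-001.dcm",
              Body=b"...image bytes...")

obj = s3.get_object(Bucket="radiology-archive", Key="studies/2017/scan-001.dcm")
print(len(obj["Body"].read()), "bytes retrieved over HTTP")
```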

Cloud-native applications take full advantage of a cloud system framework. The ECS architecture is completely software-defined, with its northbound and southbound interfaces fully abstracted so that compute and storage resources can scale independently of each other. Everything within ECS is containerized, there are no hardware dependencies, and because ECS provides multi-protocol support there is no need to re-code, re-tool or reconfigure applications. This allows developers to innovate and bring their applications to market much faster.

Bridging the Gap

Enterprises and cloud service providers alike can leverage ECS as a way to fund their ‘digital transformation’ as traditional, line-of-business applications go into decline and cloud-native apps surge over the next decade. ECS bridges the gap between Platform 2 (traditional) and Platform 3 (next-generation) applications on a single storage system. Not only can ECS easily handle the extraordinary growth in unstructured data, but as a multipurpose platform it can serve the many different workloads you manage today and ready your organization for whatever the future throws at you.

Why Healthcare IT Should Abandon Data Storage Islands and Take the Plunge into Data Lakes

One of the most significant technology-related challenges in the modern era is managing data growth. As healthcare organizations leverage new data-generating technology, and as medical record retention requirements evolve, the exponential rise in data (already growing at 48 percent each year according to the Dell EMC Digital Universe Study) could span decades.

Let’s start by first examining the factors contributing to the healthcare data deluge:

  • Longer legal retention times for medical records – in some cases up to the lifetime of the patient.
  • Digitization of healthcare and new digitized diagnostics workflows such as digital pathology, clinical next-generation sequencing, digital breast tomosynthesis, surgical documentation and sleep study videos.
  • With more digital images to store and manage, there is also an increased need for bigger picture archive and communication system (PACS) or vendor-neutral archive (VNA) deployments.
  • Finally, more people are having these digitized medical tests (especially given the large aging population), resulting in a higher number of yearly studies with larger data sizes.

Healthcare organizations also face frequent and complex storage migrations, rising operational costs, storage inefficiencies, limited scalability, increasing management complexity and storage tiering issues caused by storage silo sprawl.

Another challenge is the growing demand to understand and utilize unstructured clinical data. To mine this data, a storage infrastructure is necessary that supports the in-place analytics required for better patient insights and the evolution of healthcare that enables precision medicine.

Isolated Islands Aren’t Always Idyllic When It Comes to Data

The way that healthcare IT has approached data storage infrastructure historically hasn’t been ideal to begin with, and it certainly doesn’t set up healthcare organizations for success in the future.

Traditionally, when adding new digital diagnostic tools, healthcare organizations provided a dedicated storage infrastructure for each application or diagnostic discipline. For example, to deal with the growing storage requirements of digitized X-rays, an organization might create a new storage system solely for the radiology department. As a result, isolated storage silos, or data islands, must be individually managed, making processes and infrastructure complicated and expensive to operate and scale.

Isolated silos further undermine IT goals by increasing the cost of data management and compounding the complexity of performing analytics, which may require copying large amounts of data into yet another dedicated storage infrastructure that can’t be shared with other workflows. Even the process of creating these silos is involved and expensive because tech refreshes require migrating medical data to new storage. Each migration, typically performed every three to five years, is labor-intensive and complicated. Frequent migrations not only strain resources, but take IT staff away from projects aimed at modernizing the organization, improving patient care and increasing revenue.

Further, silos make it difficult for healthcare providers to search data and analyze information, preventing them from gaining the insights they need for better patient care. Healthcare providers are also looking to tap potentially important medical data from Internet-connected medical devices or personal technologies such as wireless activity trackers. If healthcare organizations are to remain successful in a highly regulated and increasingly competitive, consolidated and patient-centered market, they need a simplified, scalable data management strategy.

Simplify and Consolidate Healthcare Data Management with Data Lakes

The key to modern healthcare data management is to employ a strategy that simplifies storage infrastructure and storage management and supports multiple current and future workflows simultaneously. A Dell EMC healthcare data lake, for example, leverages scale-out storage to house data for clinical and non-clinical workloads across departmental boundaries. Such healthcare data lakes reduce the number of storage silos a hospital uses and eliminate the need for data migrations. This type of storage scales on the fly without downtime, addressing IT scalability and performance issues and providing native file and next-generation access methods.

Healthcare data lake storage can also:

  • Eliminate storage inefficiencies and reduce costs by automatically moving data that can be archived to denser, more cost-effective storage tiers.
  • Allow healthcare IT to expand into private, hybrid or public clouds, enabling IT to leverage cloud economies by creating storage pools for object storage.
  • Offer long-term data retention without the security risks or loss of data sovereignty associated with the public cloud; the same cloud expansion can be used for next-generation use cases such as healthcare IoT.
  • Enable precision medicine and better patient insights by fostering advanced analytics across all unstructured data, such as digitized pathology, radiology, cardiology and genomics data.
  • Reduce data management costs and complexities through automation, and scale capacity and performance on demand without downtime.
  • Eliminate storage migration projects.

 

The greatest technical challenge facing today’s healthcare organizations is the ability to effectively leverage and manage data. However, by employing a healthcare data management strategy that replaces siloed storage with a Dell EMC healthcare data lake, healthcare organizations will be better prepared to meet the requirements of today’s and tomorrow’s next-generation infrastructure and usher in advanced analytics and new storage access methods.

 

Get your fill of news, resources and videos on the Dell EMC Emerging Technologies Healthcare Resource Page


Converged Infrastructure + Isilon: Better Together

David Noy

VP Product Management, Emerging Technologies Division at EMC

You can’t beat Isilon for simplicity, scalability, performance and savings. We’re talking  world-class scale-out NAS that stores, manages, protects and analyzes your unstructured data with a powerful platform that stays simple, no matter how large your data environment. And Dell EMC already has the #1 converged infrastructure with blocks and racks. So bringing these two superstars together into one converged system is truly a case of one plus one equals three.

This convergence, pairing Vblock/VxBlock/VxRack systems with the Technology Extension for Isilon, creates an unmatched combination that flexibly supports a wide range of workloads with ultra-high-performance, multi-protocol NAS storage. And the benefits really add up, too (see the full infographic linked below).

As impressive as these numbers are, it all boils down to value and versatility. These converged solutions give you more value for your investment because, quite simply, they store more data for less. And their versatility allows you to optimally run both traditional and nontraditional workloads, including video surveillance, SAP/Oracle/Microsoft applications, mixed workloads that generate structured and unstructured data, Electronic Medical Records, Medical Imaging and more, all on infrastructure built and supported as one product.

With a Dell EMC Converged System, you’ll see better, faster business outcomes through simpler IT across a wide range of application workloads. For more information on modernizing your data center with the industry’s broadest converged portfolio, visit emc.com/ci or call your Dell EMC representative today.

 

Learn more about Converged Infrastructure and Isilon. Also, check out the full infographic.

Using a World Wide Herd (WWH) to Advance Disease Discovery and Treatment

Patricia Florissi

Vice President & Global Chief Technology Officer, Sales at Dell EMC
Patricia Florissi is Vice President and Global Chief Technology Officer (CTO) for Sales. As Global CTO for Sales, Patricia helps define mid- and long-term technology strategy, representing the needs of the broader EMC ecosystem in EMC strategic initiatives. Patricia is an EMC Distinguished Engineer, holds a Ph.D. in Computer Science from Columbia University in New York, graduated valedictorian with an MBA from the Stern School of Business at New York University, and has a Master's and a Bachelor's degree in Computer Science from the Universidade Federal de Pernambuco in Brazil. Patricia holds multiple patents and has published extensively in periodicals including Computer Networks and IEEE Proceedings.


Analysis of very large genomic datasets has the potential to radically alter the way we keep people healthy. Whether it is quickly identifying the cause of a new infectious outbreak to prevent its spread or personalizing a treatment based on a patient’s genetic variants to knock out a stubborn disease, modern Big Data analytics has a major role to play.

By leveraging cloud, Apache™ Hadoop®, next-generation sequencers, and other technologies, life scientists now have a powerful new way to conduct innovative, global-scale collaborative genomic analysis research that has not been possible before. With the right approach, great benefits can be realized.


To illustrate the possibilities and benefits of using coordinated worldwide genomic analysis, Dell EMC partnered with researchers at Ben-Gurion University of the Negev (BGU) to develop a global data analytics environment that spans across multiple clouds. This environment lets life sciences organizations analyze data from multiple heterogeneous sources while preserving privacy and security. The work conducted by this collaboration simulated a scenario that might be used by researchers and public health organizations to identify the early onset of outbreaks of infectious diseases. The approach could also help uncover new combinations of virulence factors that may characterize new diseases. Additionally, the methods used have applicability to new drug discovery and translational and personalized medicine.

 

Expanding on past accomplishments

In 2003, SARS (severe acute respiratory syndrome) was the first infectious outbreak where fast global collaborative genomic analysis was used to identify the cause of a disease. The effort was carried out by researchers in the U.S. and Canada who decoded the genome of the coronavirus to prove it was the cause of SARS.

The Dell EMC and BGU simulated disease detection and identification scenario makes use of technological developments (the much lower cost of sequencing, the availability of greater computing power, the use of cloud for data sharing, etc.) to address some of the shortcomings of past efforts and enhance the outcome.

Specifically, some diseases are caused by a combination of virulence factors. These factors may all be present in one pathogen or spread across several pathogens in the same biome. There can also be geographical variations. This makes it very hard to identify the root causes of a disease when pathogens are analyzed in isolation, as has been the case in the past.

Addressing these issues requires sequencing entire micro-biomes from many samples gathered worldwide. The computational requirements for such an approach are enormous. A single facility would need a compute and storage infrastructure on a par with major government research labs or national supercomputing centers.

Dell EMC and BGU simulated a scenario of distributed sequencing centers scattered worldwide, where each center sequences entire micro-biome samples. Each center analyzes the sequence reads generated against a set of known virulence factors. This is done to detect the combination of these factors causing diseases, allowing for near-real time diagnostic analysis and targeted treatment.

To carry out these operations in the different centers, Dell EMC extended the Hadoop framework to orchestrate distributed and parallel computation across clusters scattered worldwide. This pushed computation as close as possible to the source of data, leveraging the principle of data locality at world-wide scale, while preserving data privacy.

Since a single Hadoop instance is represented by an elephant, Dell EMC reasoned that a set of Hadoop instances scattered across the world but working in tandem forms a World Wide Herd, or WWH. This is the name Dell EMC has given to its Hadoop extensions.


Using WWH, Dell EMC wrote a distributed application in which each collaborating sequencing center calculates a profile of the virulence factors present in each microbiome it sequenced and sends just these profiles to a center selected to perform the global computation.

That center would then use bi-clustering to uncover common patterns of virulence factors among subsets of micro-biomes that could have been originally sampled in any part of the world.
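
To make the pattern concrete, here is a minimal illustrative sketch: each center reduces its raw reads to a small per-factor profile, only those profiles travel to the global site, and a bi-clustering step looks for co-occurring groups of virulence factors and samples. The data, function names, and the use of scikit-learn's SpectralCoclustering are assumptions made for the example; this is not Dell EMC's WWH code.

```python
# Illustrative sketch of the WWH-style pattern: local profiling at each
# sequencing center, global bi-clustering over the collected profiles.
# Data, names, and the choice of SpectralCoclustering are assumptions for
# illustration -- this is not Dell EMC's actual WWH implementation.
import numpy as np
from sklearn.cluster import SpectralCoclustering

VIRULENCE_FACTORS = ["vfA", "vfB", "vfC", "vfD"]  # hypothetical factor catalog

def local_profile(reads, factors=VIRULENCE_FACTORS):
    """Run at each center: reduce raw reads to per-factor hit counts."""
    return [sum(f in read for read in reads) for f in factors]

# Pretend each center sequenced one microbiome sample (toy 'reads').
center_reads = {
    "center_us": ["...vfA...", "...vfA...vfC...", "...vfC..."],
    "center_eu": ["...vfB...", "...vfB...vfD..."],
    "center_br": ["...vfA...", "...vfC...", "...vfA...vfC..."],
}

# Only the small profiles are shipped to the global site, never the raw reads.
profiles = np.array([local_profile(r) for r in center_reads.values()])

# Global step: bi-cluster samples x virulence factors to find co-occurring
# groups that may characterize an emerging disease.
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(profiles)
for sample, label in zip(center_reads, model.row_labels_):
    print(sample, "-> bicluster", label)
```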

This approach could allow researchers and public health organizations to potentially identify the early onset of outbreaks and also uncover new combinations of virulence factors that may characterize new diseases.

There are several biological advantages to this approach. The approach eliminates the time required to isolate a specific pathogen for analysis and for re-assembling the genomes of the individual microorganisms. Sequencing the entire biome lets researchers identify known and unknown combinations of virulence factors. And collecting samples independently world-wide helps ensure the detection of variants.

On the compute side, the approach uses local processing power to perform the biome sequence analysis, which reduces the need for a large centralized HPC environment. Additionally, the method overcomes the problem of data diversity: it can support any data source and any data format.

This investigative approach could be used as a next-generation outbreak surveillance system. It allows collaboration in which different geographically dispersed groups simultaneously investigate different variants of a new disease. In addition, the WWH architecture has great applicability to pharmaceutical industry R&D efforts, which increasingly rely on a multi-disciplinary approach in which geographically dispersed groups investigate different aspects of a disease or drug target using a wide variety of analysis algorithms on shared data.

 

Learn more about modern genomic Big Data analytics


Build vs. Buy for OpenStack Private Clouds

Doug Bowers

VP of Engineering, Infrastructure Solutions Group at Dell EMC


Over the past several months there have been some excellent posts on this blog highlighting Dell EMC’s build vs. buy options for OpenStack. Dell EMC offers a range of OpenStack solutions, starting with enabling technologies for customers who want a do-it-yourself (DIY) cloud and ending with turnkey solutions like VxRack System with Neutrino.

The goal for VxRack Neutrino is to bring the benefits of turnkey deployment and integrated lifecycle management to an open source software stack –  a stack that has its roots firmly planted in the DIY world.

OpenStack started life as a DIY alternative to public cloud offerings.  Its popularity has extended to customers that want the benefits of an open source platform without having to hire the expertise to assemble and operate the platform themselves (i.e. non-DIY) – hence VxRack Neutrino.  So what have we learned from customers using or considering VxRack Neutrino?

  • Customers want products that make it easier to deploy open source software stacks – products that pre-integrate disparate software components and ensure they will work on a stable hardware platform. This need is not limited to initial installation and deployment; it extends to day 2 and beyond, with the ability to successfully monitor and manage the system and a clear way to upgrade the various software components that must stay in sync (lifecycle management).
  • VxRack Neutrino is a turnkey solution – which means that the customer gives up a degree of flexibility to get the benefit of operational efficiency.  While in many cases this is a tradeoff customers are willing to make, early customer feedback indicates customers want more flexibility in hardware options than VxRack Neutrino – the turnkey solution – offers.
  • Customers also indicate that support and training on the OpenStack distribution itself is critical. Customers have expressed interest in getting these services from Dell EMC partner companies (e.g. Red Hat).

So what does all this mean? Dell EMC has made the strategic decision to meet this customer demand for OpenStack private clouds with our Reference Architecture and Validated System portfolio, and to end-of-life VxRack Neutrino.

Dell EMC has the following solutions for customers looking to build OpenStack private clouds:

  • Red Hat OpenStack Solution – A validated solution using Dell servers and switches delivered via our strategic partnership with Red Hat and jointly engineered by Dell EMC and Red Hat
  • ScaleIO OpenStack Reference Architectures – Validated building blocks of ScaleIO software-defined block storage and Dell servers. As a heterogeneous software-defined storage offering, ScaleIO supports Red Hat, Mirantis and Canonical OpenStack environments.

These options provide outstanding hardware flexibility. They also leverage partner relationships (e.g. Red Hat) to give customers the OpenStack support and training experience they are seeking, while using a combination of up-front engineering and validation along with services to provide a turnkey experience.
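
However the private cloud is built, it is consumed through the same OpenStack APIs, which keeps validation straightforward. As a rough sketch (the cloud name and the ‘scaleio’ volume type below are hypothetical placeholders defined by the operator, not part of any Dell EMC reference architecture), a quick smoke test with openstacksdk might look like this:

```python
# Hedged sketch: sanity-checking an OpenStack private cloud with openstacksdk.
# The cloud name 'dell-emc-rhosp' and the volume type 'scaleio' are hypothetical
# placeholders for whatever the deployment actually defines.
import openstack

conn = openstack.connect(cloud="dell-emc-rhosp")  # reads clouds.yaml

# Confirm compute services are up before handing the environment to app teams.
for svc in conn.compute.services():
    print(f"{svc.binary:20} {svc.host:25} {svc.state}")

# Provision a volume on the software-defined block storage backend
# (e.g. a ScaleIO-backed volume type created by the operator).
vol = conn.block_storage.create_volume(
    name="smoke-test-vol",
    size=8,                 # GiB
    volume_type="scaleio",  # hypothetical volume type name
)
conn.block_storage.wait_for_status(vol, status="available")
print("Volume ready:", vol.id)
```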

Dell EMC remains strongly committed to supporting the OpenStack ecosystem as demonstrated by the breadth of our offerings.   Some areas of particular focus:

  • OpenStack community engagement: This includes community participation and contributions to enhance OpenStack, development and support of plug-ins for all of our products, and development of reference architectures with multiple partners.
  • OpenStack committers: Steadily increasing levels of commits and committers release over release, and broad support for integrating Dell EMC storage products into an OpenStack-based cloud.

In summary, we remain committed to listening to our customers and offering choice across a broad range of OpenStack deployment options – from best-in-class components for those looking to “build”, to validated solutions and reference architectures for those looking for more.

Data Security: Are You Taking It For Granted?

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division


Despite the fact that the Wells Fargo fake account scandal first broke in September, the banking giant still finds itself the topic of national news headlines and facing public scrutiny months later. While it’s easy to assign blame, whether to the now-retired CEO, the company’s unrealistic sales goals and so forth, let’s take a moment to discuss a potential solution for Wells Fargo and its enterprise peers. I’m talking about data security and governance.

There’s no question that the data security and governance space is still evolving and maturing. Currently, the weakest link in the Hadoop ecosystem is data masking. As it stands at most enterprises using Hadoop, access to the Hadoop environment translates to uncensored access to information that can be highly sensitive. Fortunately, there are initiatives to change that: Hortonworks’ recent HDP 2.5 release, for example, ships a version of Apache Ranger that starts to add dynamic column masking. Shockingly enough, I can count on one hand the number of clients that understand they need this feature. In some cases, CIO- and CTO-level executives aren’t even aware of just how critical configurable row- and column-masking capabilities are to the security of their data.
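
Until policy-driven masking is in place everywhere, the same principle can be applied at ingest time. The sketch below is a generic illustration, not Apache Ranger or any vendor API; the column names and masking rules are assumptions for the example, showing how sensitive fields can be hashed or partially redacted before the data ever lands in Hadoop.

```python
# Generic illustration of column-level masking applied before data lands in
# Hadoop. This is not Apache Ranger or any product API -- just the principle:
# sensitive fields are hashed or partially redacted at ingest time so raw
# values never reach analysts who don't need them.
import csv
import hashlib
import io

def mask_ssn(value):
    """Show only the last four digits."""
    return "***-**-" + value[-4:]

def mask_hash(value):
    """One-way hash so the field can still be joined on, but not read."""
    return hashlib.sha256(value.encode()).hexdigest()[:16]

# Hypothetical masking rules keyed by column name.
MASK_RULES = {"ssn": mask_ssn, "account_number": mask_hash}

raw = io.StringIO(
    "customer,ssn,account_number,balance\n"
    "Alice,123-45-6789,9876543210,2500\n"
)

for row in csv.DictReader(raw):
    masked = {col: MASK_RULES.get(col, lambda v: v)(val) for col, val in row.items()}
    print(masked)   # in practice: write the masked record to HDFS / Hive
```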

Another aspect I find shocking is the lack of controls around data governance in many enterprises. Without data restrictions, it’s all too easy to envision Wells Fargo’s situation – which resulted in 5,300 employees being fired – repeating itself at other financial institutions. It’s also important to point out that entering unmasked sensitive and confidential healthcare and financial data into a Hadoop system is not only unwise and negligent; it’s a direct violation of mandated security and compliance regulations.

Identifying the Problem and Best Practices

Everyone from enterprise systems administrators to C-suite executives is guilty of taking data security for granted and of assuming that masking and encryption capabilities come by default with having a database. These executives fail to do their research, dig into the weeds and ask the more complex questions, oftentimes because of a professional background focused on analytics or IT rather than governance. Unless an executive’s background includes building data systems or setting up controls and governance around these types of systems, he or she may not know the right questions to ask.

Another common mistake is not strictly controlling access to sensitive data, putting it at risk of theft and loss. Should customer service representatives be able to pull every file in the system? Probably not. Even IT administrators’ access should be restricted to the specific actions and commands required to perform their jobs. Encryption provides some file-level protection against unauthorized users, but authorized users who have permission to unlock an encrypted file can often view fields that aren’t required for their jobs.

As more enterprises adopt Hadoop and other similar systems, they should consider the following:

Do your due diligence. When meeting with customers, I can tell they’ve done their homework if they ask questions that go beyond the buzzwords around Hadoop. These questions alone indicate they’re not simply regurgitating a sales pitch and have researched how to protect their environment. Be discerning and don’t assume the solution you’re purchasing off the shelf contains everything you need. Accepting what the salesperson says at face value, without probing further, is reckless and could land an organization in a very damaging and costly security scandal.

Accept there are gaps. Frequently, we engage with clients who are confident they have the most robust security and data governance available. However, when we start to poke and prod a bit more to understand what other controls they have in place, the astonishing answer is zero. Lest we forget, “core” Hadoop only gained built-in security, without third-party add-ons, in 2015, and governance around the software framework is still in its infancy in many ways. Without something as rudimentary in traditional IT security as a firewall in place, it’s difficult for enterprises to claim they are secure.

Have an independent plan. Before purchasing Hadoop or a similar platform, map out your exact business requirements, consider what controls your business needs and determine whether or not the product meets each of them. Research regulatory compliance standards to select the most secure configuration of your Hadoop environment and the tools you will need to supplement it.

To conclude, here is a seven-question checklist enterprises should be able to answer about their Hadoop ecosystem:

  • Do you know what’s in your Hadoop?
  • Is it meeting your business goals?
  • Do you really have the controls in place that you need to enable your business?
  • Do you have the governance?
  • Where are your gaps and how are you protecting them?
  • What are your augmented controls and supplemental procedures?
  • Have you reviewed the information the salesperson shared and mapped it to your actual business requirements to decide what you need?
