Converged Infrastructure + Isilon: Better Together

David Noy

VP Product Management, Emerging Technologies Division at EMC

You can’t beat Isilon for simplicity, scalability, performance and savings. We’re talking world-class scale-out NAS that stores, manages, protects and analyzes your unstructured data with a powerful platform that stays simple, no matter how large your data environment. And Dell EMC already has the #1 converged infrastructure with blocks and racks. So bringing these two superstars together into one converged system is truly a case of one plus one equals three.

This convergence—pairing Vblock/VxBlock/VxRack systems with the Technology Extension for Isilon—creates an unmatched combination that flexibly supports a wide range of workloads with ultra-high performance, multi-protocol NAS storage. And the benefits really add up, too.

As impressive as those benefits are, it all boils down to value and versatility. These converged solutions give you more value for your investment because, quite simply, they store more data for less. And their versatility allows you to optimally run both traditional and nontraditional workloads, including video surveillance, SAP/Oracle/Microsoft applications, mixed workloads that generate structured and unstructured data, Electronic Medical Records, Medical Imaging and more, all on infrastructure built and supported as one product.

With a Dell EMC Converged System, you’ll see better, faster business outcomes through simpler IT across a wide range of application workloads. For more information on modernizing your data center with the industry’s broadest converged portfolio, visit emc.com/ci or call your Dell EMC representative today.

 

Learn more about Converged Infrastructure and Isilon. Also, check out the full infographic.

Using a World Wide Herd (WWH) to Advance Disease Discovery and Treatment

Patricia Florissi

Vice President & Global Chief Technology Officer, Sales at Dell EMC
Patricia Florissi is Vice President and Global Chief Technology Officer (CTO) for Sales. As Global CTO for Sales, Patricia helps define mid- and long-term technology strategy, representing the needs of the broader EMC ecosystem in EMC strategic initiatives. Patricia is an EMC Distinguished Engineer, holds a Ph.D. in Computer Science from Columbia University in New York, graduated valedictorian with an MBA from the Stern School of Business at New York University, and has a Master's and a Bachelor's degree in Computer Science from the Universidade Federal de Pernambuco in Brazil. Patricia holds multiple patents and has published extensively in periodicals including Computer Networks and IEEE Proceedings.


Analysis of very large genomic datasets has the potential to radically alter the way we keep people healthy. Whether it is quickly identifying the cause of a new infectious outbreak to prevent its spread or personalizing a treatment based on a patient’s genetic variants to knock out a stubborn disease, modern Big Data analytics has a major role to play.

By leveraging cloud, Apache™ Hadoop®, next-generation sequencers, and other technologies, life scientists now have a powerful new way to conduct innovative, global-scale collaborative genomic analysis research that was not possible before. With the right approach, great benefits can be realized.


To illustrate the possibilities and benefits of using coordinated worldwide genomic analysis, Dell EMC partnered with researchers at Ben-Gurion University of the Negev (BGU) to develop a global data analytics environment that spans across multiple clouds. This environment lets life sciences organizations analyze data from multiple heterogeneous sources while preserving privacy and security. The work conducted by this collaboration simulated a scenario that might be used by researchers and public health organizations to identify the early onset of outbreaks of infectious diseases. The approach could also help uncover new combinations of virulence factors that may characterize new diseases. Additionally, the methods used have applicability to new drug discovery and translational and personalized medicine.

 

Expanding on past accomplishments

In 2003, SARS (severe acute respiratory syndrome) was the first infectious outbreak where fast global collaborative genomic analysis was used to identify the cause of a disease. The effort was carried out by researchers in the U.S. and Canada who decoded the genome of the coronavirus to prove it was the cause of SARS.

The Dell EMC and BGU simulated disease detection and identification scenario makes use of technological developments (the much lower cost of sequencing, the availability of greater computing power, the use of cloud for data sharing, etc.) to address some of the shortcomings of past efforts and enhance the outcome.

Specifically, some diseases are caused by a combination of virulence factors. These factors may all be present in one pathogen, or they may be spread across several pathogens in the same biome. There can also be geographical variations. This makes it very hard to identify the root causes of a disease when pathogens are analyzed in isolation, as has been the case in the past.

Addressing these issues requires sequencing entire micro-biomes from many samples gathered worldwide. The computational requirements for such an approach are enormous. A single facility would need a compute and storage infrastructure on a par with major government research labs or national supercomputing centers.

Dell EMC and BGU simulated a scenario of distributed sequencing centers scattered worldwide, where each center sequences entire micro-biome samples. Each center analyzes the sequence reads generated against a set of known virulence factors. This is done to detect the combination of these factors causing diseases, allowing for near-real time diagnostic analysis and targeted treatment.

To carry out these operations in the different centers, Dell EMC extended the Hadoop framework to orchestrate distributed and parallel computation across clusters scattered worldwide. This pushed computation as close as possible to the source of data, leveraging the principle of data locality at world-wide scale, while preserving data privacy.
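
To make the pattern concrete, here is a minimal sketch of the "push computation to the data" idea. It is purely illustrative: the site list, function names and dummy profiles are hypothetical placeholders, not the actual WWH implementation.

```python
# Illustrative sketch only: each site analyzes its own data locally and returns
# a small summary profile; only the summaries travel to the coordinating site.
from concurrent.futures import ThreadPoolExecutor

SEQUENCING_SITES = ["site-us", "site-eu", "site-apac"]  # hypothetical endpoints

def run_local_analysis(site: str) -> dict:
    """Stand-in for a job submitted to a site's own Hadoop cluster: analyze the
    locally held sequence reads against known virulence factors and return a
    compact profile, never the raw reads themselves."""
    return {"site": site, "profile": {"vf_A": 0.9, "vf_B": 0.1}}  # dummy result

def global_aggregation(profiles: list) -> dict:
    """Runs at the coordinating site: combine the per-site summary profiles."""
    return {p["site"]: p["profile"] for p in profiles}

with ThreadPoolExecutor() as pool:
    local_profiles = list(pool.map(run_local_analysis, SEQUENCING_SITES))

print(global_aggregation(local_profiles))
```

Raw data stays where it was generated; only compact, privacy-preserving summaries cross site boundaries.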

Since a single Hadoop instance is represented by a single elephant, Dell EMC concluded that a set of Hadoop instances scattered across the world but working in tandem forms a World Wide Herd, or WWH. This is the name Dell EMC has given to its Hadoop extensions.


Using WWH, Dell EMC wrote a distributed application in which each of a set of collaborating sequencing centers calculates a profile of the virulence factors present in each micro-biome it sequenced and sends just these profiles to a center selected to perform the global computation.

That center would then use bi-clustering to uncover common patterns of virulence factors among subsets of micro-biomes that could have been originally sampled in any part of the world.
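
The published scenario does not specify the bi-clustering algorithm, so the sketch below uses scikit-learn's spectral biclustering purely as a stand-in. The matrix is random dummy data, with rows as micro-biome samples and columns as known virulence factors.

```python
# Illustrative stand-in for the central bi-clustering step (not the actual
# algorithm used in the Dell EMC/BGU work).
import numpy as np
from sklearn.cluster import SpectralBiclustering

rng = np.random.default_rng(0)
profiles = rng.random((12, 8))   # 12 micro-biome samples x 8 virulence factors (dummy)

model = SpectralBiclustering(n_clusters=(3, 2), random_state=0)
model.fit(profiles)

# Samples grouped together share a pattern of virulence factors, even if they
# were originally collected on different continents.
print("sample clusters:", model.row_labels_)
print("factor clusters:", model.column_labels_)
```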

This approach could allow researchers and public health organizations to potentially identify the early onset of outbreaks and also uncover new combinations of virulence factors that may characterize new diseases.

There are several biological advantages to this approach. The approach eliminates the time required to isolate a specific pathogen for analysis and for re-assembling the genomes of the individual microorganisms. Sequencing the entire biome lets researchers identify known and unknown combinations of virulence factors. And collecting samples independently world-wide helps ensure the detection of variants.

On the compute side, the approach uses local processing power to perform the biome sequence analysis. This reduces the need for a large centralized HPC environment. Additionally, the method overcomes the matter of data diversity. It can support all data sources and any data formats.

This investigative approach could be used as a next-generation outbreak surveillance system. It allows collaboration in which different geographically dispersed groups simultaneously investigate different variants of a new disease. In addition, the WWH architecture has great applicability to pharmaceutical industry R&D efforts, which increasingly rely on a multi-disciplinary approach where geographically dispersed groups investigate different aspects of a disease or drug target using a wide variety of analysis algorithms on shared data.

 

Learn more about modern genomic Big Data analytics

 

 

Build vs. Buy for OpenStack Private Clouds

Doug Bowers

VP of Engineering, Infrastructure Solutions Group at Dell EMC


Over the past several months there have been some excellent posts on this blog that highlight Dell EMC build vs. buy options as it relates to OpenStack.  Dell EMC offers a range of OpenStack solutions starting with enabling technologies for customers who want a do-it-yourself (DIY) cloud and ending with turnkey solutions like VxRack System with Neutrino.

The goal for VxRack Neutrino is to bring the benefits of turnkey deployment and integrated lifecycle management to an open source software stack –  a stack that has its roots firmly planted in the DIY world.

OpenStack started life as a DIY alternative to public cloud offerings.  Its popularity has extended to customers that want the benefits of an open source platform without having to hire the expertise to assemble and operate the platform themselves (i.e. non-DIY) – hence VxRack Neutrino.  So what have we learned from customers using or considering VxRack Neutrino?

  • Customers want products that make it easier to deploy open source software stacks – products that pre-integrate disparate software components and ensure they will work on a stable hardware platform. This need is not limited to initial installation and deployment; it also includes support for day 2 and beyond, in order to successfully monitor and manage the system and establish a clear way to upgrade the various software components that must stay in sync (life cycle management).
  • VxRack Neutrino is a turnkey solution – which means that the customer gives up a degree of flexibility to get the benefit of operational efficiency.  While in many cases this is a tradeoff customers are willing to make, early customer feedback indicates customers want more flexibility in hardware options than VxRack Neutrino – the turnkey solution – offers.
  • Customers also indicate that support and training on the OpenStack distribution itself is critical. Customers have expressed interest in getting these services from Dell EMC partner companies (e.g. Red Hat).

So what does all this mean? Dell EMC has made the strategic decision to meet this customer demand for OpenStack private clouds with our Reference Architecture and Validated System portfolio, and to end-of-life VxRack Neutrino.

Dell EMC has the following solutions for customers looking to build OpenStack private clouds:

  • Red Hat OpenStack Solution – A validated solution using Dell servers and switches delivered via our strategic partnership with Red Hat and jointly engineered by Dell EMC and Red Hat
  • ScaleIO OpenStack Reference Architectures – Validated building blocks of ScaleIO software-defined block storage and Dell servers. As a heterogeneous software-defined storage offering, ScaleIO supports Red Hat, Mirantis and Canonical OpenStack environments.

These options provide outstanding hardware flexibility. They also leverage partner relationships (e.g. Red Hat) to provide customers the OpenStack support and training experience they are seeking, while using a combination of up-front engineering and validation, along with services, to provide a turnkey experience.

Dell EMC remains strongly committed to supporting the OpenStack ecosystem as demonstrated by the breadth of our offerings.   Some areas of particular focus:

  • OpenStack community engagement: This includes community participation and contributions to enhance OpenStack, development and support of plug-ins for all of our products, and development of reference architectures with multiple partners.
  • OpenStack committers: Steadily increasing levels of commits and committers release over release, and broad support for integrating Dell EMC storage products into an OpenStack-based cloud.

In summary, we remain committed to listening to our customers and offering choice across a broad range of OpenStack deployment options – from best-in-class components for those looking to “build” to validated solutions and reference architectures for those looking for more.

Data Security: Are You Taking It For Granted?

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division


Despite the fact that the Wells Fargo fake account scandal first broke in September, the banking giant still finds itself the topic of national news headlines and facing public scrutiny months later. While it’s easy to assign blame, whether to the now-retired CEO, the company’s unrealistic sales goals and so forth, let’s take a moment to discuss a potential solution for Wells Fargo and its enterprise peers. I’m talking about data security and governance.

There’s no question that the data security and governance space is still evolving and maturing. Currently, the weakest link in the Hadoop ecosystem is masking of data. As it stands at most enterprises using Hadoop, access to the Hadoop space translates to uncensored access to information that can be highly sensitive. Fortunately, there are some initiatives to change that. Hortonworks recently released Ranger 2.5, which starts to add allocated masking. Shockingly enough, I can count on one hand the number of clients that understand they need this feature. In some cases, CIO- and CTO-level executives aren’t even aware of just how critical configurable row and column masking capabilities are to the security of their data.
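
To make the idea of configurable masking concrete, here is a minimal, tool-agnostic sketch. The policy, field names and records are hypothetical, and this is not Ranger’s API – just an illustration of what column-level masking does before analysts ever see the data.

```python
# Hypothetical column-masking policy applied to records before they are exposed.
MASKING_POLICY = {
    "ssn": lambda v: "XXX-XX-" + v[-4:],        # show only the last four digits
    "account_number": lambda v: "*" * len(v),   # redact entirely
}

def apply_masking(record: dict, allowed_unmasked: set) -> dict:
    """Return a copy of the record with sensitive columns masked unless the
    caller's role is explicitly allowed to see the unmasked value."""
    masked = {}
    for column, value in record.items():
        rule = MASKING_POLICY.get(column)
        masked[column] = rule(value) if rule and column not in allowed_unmasked else value
    return masked

row = {"name": "J. Smith", "ssn": "123-45-6789", "account_number": "9876543210"}
print(apply_masking(row, allowed_unmasked=set()))       # general analyst view
print(apply_masking(row, allowed_unmasked={"ssn"}))     # e.g., fraud-team view
```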

Another aspect I find to be shocking is the lack of controls around data governance in many enterprises. Without data restrictions, it’s all too easy to envision Wells Fargo’s situation – which resulted in 5,300 employees being fired – repeating itself at other financial institutions. It’s also important to point out that entering unmasked sensitive and confidential healthcare and financial data into a Hadoop system is not only an unwise and negligent practice; it’s a direct violation of mandated security and compliance regulations.

Identifying the Problem and Best Practices

From enterprise systems administrators to C-suite executives, many are guilty of taking data security for granted, assuming that masking and encryption capabilities are guaranteed simply by virtue of having a database. These executives are failing to do their research, dig into the weeds and ask the more complex questions, oftentimes due to a professional background focused on analytics or IT rather than governance. Unless an executive’s background includes building data systems or setting up controls and governance around these types of systems, he or she may not know the right questions to ask.

Another common mistake is not strictly controlling access to sensitive data, putting it at risk of theft and loss. Should customer service representatives be able to pull every file in the system? Probably not. Even IT administrators’ access should be restricted to the specific actions and commands required to perform their jobs. Encryption provides some file-level protection against unauthorized users, but authorized users who have permission to unlock an encrypted file can often view fields that aren’t required for their jobs.

As more enterprises adopt Hadoop and other similar systems, they should consider the following:

Do your due diligence. When meeting with customers, I can tell they’ve done their homework if they ask questions about more than the “buzz words” around Hadoop. These questions alone indicate they’re not simply regurgitating a sales pitch and have researched how to protect their environment. Be discerning and don’t assume the solution you’re purchasing off the shelf contains everything you need. Accepting what the salesperson has to say at face value, without probing further, is reckless and could leave an organization facing a very damaging and costly security scandal.

Accept there are gaps. Frequently, we engage with clients who are confident they have the most robust security and data governance available.
However, when we start to poke and prod a bit more to understand what other controls they have in place, the astonishing answer is zero. Lest we forget, “core” Hadoop only gained security without third-party add-ons in 2015, and governance around the software framework is still in its infancy in many ways. Without something as inherently rudimentary in traditional IT security as a firewall in place, it’s difficult for enterprises to claim they are secure.

Have an independent plan. Before purchasing Hadoop or a similar platform, map out your exact business requirements, consider what controls your business needs and determine whether or not the product meets each of them. Research regulatory compliance standards to select the most secure configuration of your Hadoop environment and the tools you will need to supplement it.

To conclude, here is a seven-question checklist enterprises should be able to answer about their Hadoop ecosystem:

  • Do you know what’s in your Hadoop?
  • Is it meeting your business goals?
  • Do you really have the controls in place that you need to enable your business?
  • Do you have the governance?
  • Where are your gaps and how are you protecting them?
  • What are your augmented controls and supplemental procedures?
  • Have you reviewed the information the salesperson shared and mapped it to your actual business requirements to decide what you need?

Solving the Video Vortex at the Secured Cities Conference

Gary Buonacorsi

CTO of State and Local Government at Dell EMC


I’m in Houston today at the Secured Cities conference, the leading government security and public safety event, to participate on the “Video Vortex Drives Public Safety to the Datacenter” panel. I’ll be joined by Kenneth Baker, director of Infrastructure Support at the Metropolitan Transit Authority of Harris County (METRO), who recently helped implement a citywide video surveillance system for the bus and trolley service. I’m looking forward to hearing more about METRO’s specific architecture, the pain points and challenges the department faced and what problems it hopes to solve with the new system.

For those of you unable to join us in the “Space City” of Houston, here’s a glimpse of what I’ll be covering in the session:

 

What is driving the increase in data for state and local government? 

One key factor is the emergence of new surveillance technology, such as drones, body cameras, license plate trackers and audio/video recognition. In particular, drone usage in the public safety arena has seen significant growth for providing situational awareness in tactical events such as bank robberies or hostage situations. In addition to tactical operations, drones are also being used around the country for policing activities. Pilot programs are popping up in cities like Modesto, California, where law enforcement is using drones to assist with search warrants and surveying crime scenes. The sky’s the limit for drone usage in law enforcement, as evidenced by Amazon patenting a voice-activated shoulder-mounted drone earlier this month that officers can use to help assess dangerous situations.

Secondly, resolution requirements are increasing. Grainy pictures are ineffectual when it comes to facial recognition, analytics and post-evaluation, forcing the transition from standard definition to 4K. As new tools and analytics are introduced, resolution requirements climb even higher.

Perhaps the most common reason for the increase in data at public safety organizations is growing camera counts and longer video retention times. With the rise of citywide surveillance, cities such as London and New York City are moving toward having cameras on practically every street corner. Discovery activities in legal proceedings are extending retention periods and chain-of-evidence storage requirements.
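
A rough, back-of-the-envelope sizing example shows why those drivers compound. Every figure below is an assumption chosen for illustration, not a measurement from any real deployment.

```python
# Hypothetical sizing arithmetic: camera count x bitrate x retention period.
cameras = 500                # assumed citywide camera count
bitrate_mbps = 8             # assumed average bitrate for a high-resolution stream
retention_days = 90          # assumed retention policy

seconds = retention_days * 24 * 3600
tb_per_camera = (bitrate_mbps / 8) * seconds / 1e6   # Mbps -> MB/s -> TB
total_tb = tb_per_camera * cameras

print(f"~{tb_per_camera:.1f} TB per camera, ~{total_tb:,.0f} TB total")
# Doubling resolution-driven bitrate, camera count or retention roughly doubles the footprint.
```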

 

Given this exponential data growth, how is it impacting organizations and what do they need to focus on?

IT departments at these organizations should look for architectures that are open source, scalable and enterprise-ready to integrate with the system they currently have, in addition to any changes they may make in the future. Simply put, department heads should avoid spot solutions and instead adopt an integrated, strategic approach to help plan for the years ahead. I would counsel them to look for a solution that allows them to start small but grow big, and easily add more cameras and scale without disrupting the current environment.

The next major area to consider is life cycle management. Previously, video footage was kept for a week before it was written over or deleted. Now, long-term archiving is critical, given the potential for courts to mandate that digital assets, such as video evidence in a capital case, be maintained indefinitely.

Organizations must embrace the shift to an enterprise model. For police departments, having body cameras isn’t enough. They must consider how to integrate them into dashboard cameras, 911 call centers, etc., taking each of these point solutions to form an enterprise approach.

 

Which platform will support retention policies and what are the three different storage architectures? How can organizations escape the video vortex?

Early video surveillance solutions presented a host of challenges, including restricting departments to certain file and storage protocols, and communication channels. Combine those factors with non IP-based cameras, and modernizing existing systems became extremely difficult. The first step for organizations to solve the video vortex is to select an open platform that not only allows them to migrate and move data from system to system, but that enables them to shift providers easily. Open platforms also present more options in terms of analytics and security, enabling departments to apply more traditional security tools on top of their data storage and data transportation needs.

Compute and data storage are the key elements in eliminating the video vortex. Storage is the foundation layer of a sound architecture and must address the needs of the organization, including scalability, an enterprise approach and an open platform to avoid lock-in. Three storage architectures exist today: distributed, centralized and cloud. Police forces that are relatively small typically still rely on a distributed architecture, capturing the data from their cars and body cameras and physically transporting it back on a mobile storage device to a centralized repository where it can then be analyzed and managed. Distributed architectures can be folded into centralized architectures, allowing them to be part of the enterprise approach with a centralized location like police headquarters, schools, airports or the METRO. A centralized architecture makes it possible to gather all of these remote video surveillance feeds and bring them back to a centralized repository. In a case like this, the architecture must be efficient, storing only essential data to minimize utilization rates and costs. It must also be capable of supporting thousands of surveillance devices in order to scale to the multiple distributed architectures that are coming back to one location.

The third architecture to consider is cloud. Cloud presents a useful solution in that it is elastic, scalable, expands very easily and can ramp up very quickly. However, cloud storage can be very costly in light of potential retention policy changes, data sets and cloud size – all of a sudden, the portability of those cloud data sets becomes much more complex. From an architecture perspective, organizations must consider how to bridge that gap and determine the amount of data that can be returned to a more cost-effective on-premise solution without compromising the capabilities that cloud offers.
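
As a hedged illustration of that tradeoff, the arithmetic below uses assumed placeholder prices only; it is not a quote from any provider or product, and it ignores egress and retrieval fees, which would further favor keeping cold evidence on-premise.

```python
# Hypothetical cost comparison for rarely accessed archive footage.
archive_tb = 2000                  # assumed long-retention footprint
months = 36                        # assumed retention horizon

cloud_per_gb_month = 0.02          # assumed object-storage price per GB-month
cloud_cost = archive_tb * 1000 * cloud_per_gb_month * months

onprem_per_tb = 400                # assumed fully loaded on-premise cost per usable TB
onprem_cost = archive_tb * onprem_per_tb

print(f"cloud: ${cloud_cost:,.0f} over {months} months vs. on-prem: ${onprem_cost:,.0f}")
```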

Finally, distributed, centralized and cloud platforms all underlie the data lake architecture, which is really the foundation for evidence management and helps solve the video vortex public safety organizations are facing.

India’s Largest Search Engine Dials into Object Storage

Corey O'Connor

Senior Product Marketing Manager at Dell EMC² ETD

Welcome to another edition of the Emerging Technologies ECS blog series, where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), Dell EMC’s cloud-scale storage platform. 

Navigating the World Wide Websearch

The World Wide Web was invented by an independent contractor at a nuclear research facility in Switzerland back in the late ’80s (who knew!). In its early stages, the web was extremely clumsy and had to be completely indexed by hand. It didn’t take long for the computer geeks of the world to create a very rudimentary search engine tool comprising a searchable database of files that captured all public directory listings – the big problem was that the data they were able to ingest was limited, and searching through it was a very manual and tedious task. After a few years of development, “all text” search engines were established (which is what we still use today), providing users the ability to search for any word within the contents of any web page.
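
Conceptually, an “all text” engine rests on an inverted index: a map from every word to the pages that contain it. The toy sketch below, with made-up pages, shows the idea.

```python
# Toy inverted index: look up any word and get back the pages containing it.
from collections import defaultdict

pages = {
    "page1.html": "object storage scales out across sites",
    "page2.html": "search engines index the full text of every page",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

print(sorted(index["storage"]))   # -> ['page1.html']
print(sorted(index["text"]))      # -> ['page2.html']
```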

Up to this point, search engine tools had been developed mostly by university researchers and small startups, and although they showed lots of promise, their creators had a difficult time monetizing them. Then one day a spinoff from a startup shop came up with the brilliant idea of selling search terms: a ‘pay-for-placement’ service for businesses that made search engines one of the most lucrative tech businesses almost overnight.

Just Dial Limited

What Google is to the United States and Baidu Inc. is to China, Just Dial Limited is to India: the premier search engine provider. Just Dial also provides services to the US, UK, UAE and Canada, and satisfies over 1.5 billion daily customer requests that come in from around the world.

The challenge: Just Dial had a strict five-year retention policy for their customers’ data, most of it static and infrequently accessed. Their traditional SAN infrastructure was neither cost effective nor scalable, and like many other organizations, they had concerns about putting sensitive customer data into the public cloud. There was also constant demand for storage from their application developers and storage admins, as capacity always seemed to be running thin.

The solution: Just Dial was in the market for an in-house, cloud-based native object storage solution that provided universal access and multi-site support, and that integrated easily with their cloud services. They chose Dell EMC’s Elastic Cloud Storage (ECS) and went on to see an 80% reduction in their overall storage management costs. Just Dial was able to easily provision unlimited capacity to their end users, move all static archival data to ECS by policy, and experience true cloud-scale economics across their data centers. Watch the video below for the full story:
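
For a sense of what “moving static archival data to ECS by policy” can look like in practice, here is a minimal sketch against ECS’s S3-compatible API. The endpoint, credentials, bucket, paths and retention threshold are all hypothetical placeholders, not Just Dial’s actual integration.

```python
# Sketch of archiving cold data to an S3-compatible endpoint such as ECS.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.internal:9021",   # assumed ECS S3 endpoint
    aws_access_key_id="OBJECT_USER",                     # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

def archive_if_static(bucket: str, local_path: str, days_since_access: int) -> None:
    """Upload files that have gone cold under a simple age-based policy."""
    if days_since_access > 90:                           # assumed policy threshold
        key = "archive/" + local_path.lstrip("/")
        s3.upload_file(local_path, bucket, key)

# Example: a listing untouched for 400 days gets pushed to the archive bucket.
archive_if_static("justdial-archive", "/data/listings/2019/listing-001.json", 400)
```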

 

Want to start your Digital Transformation with ECS? Find out how by visiting us at www.dellemc.com/ecs or try the latest version of ECS for FREE for non-production use by visiting www.dellemc.com/getecs.
