As New Business Models Emerge, Enterprises Increasingly Seek to Leave the World of Siloed Data

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

As Bob Dylan famously wrote back in 1964, the times they are a-changin’. And while Dylan probably wasn’t writing about the Fortune 500’s shifting business models and their impact on enterprise storage infrastructure (as far as we know), his words hold true in this context.

Many of the world’s largest companies are attempting to reinvent themselves by abandoning their product- or manufacturing-focused business models in favor of a more service-oriented approach. Look at industrial giants such as GE, Caterpillar and Procter & Gamble, and consider how they take existing data about their products (for GE, say, a power plant) and apply it to a service model (in this example, services for utilities).

The evolution of a product-focused model into a service-oriented one can offer more value (and revenue) over time, but it also requires a more sophisticated analytic model and a holistic approach to data, a marked departure from the siloed way data has traditionally been managed.

Transformation

Financial services is another industry undergoing a transformation from a data storage perspective. Here you have a complex business with lots of traditionally siloed data, split between commercial, consumer and credit groups. But increasingly, banks and credit unions want a more holistic view of their business in order to understand how various divisions or teams could work together in new ways. Enabling consumer credit and residential mortgage units to securely share data could allow them to build better risk-score models across loans, for example, ultimately allowing a financial institution to provide better customer service and expand its product mix.

Early days of Hadoop: compromise was the norm

As with any revolution, it’s the small steps that matter most at first. Enterprises have traditionally started small when it comes to holistically governing their data and managing workflows with Hadoop. In the early days of Hadoop, five to seven years ago, enterprises accepted compromises around data availability and efficiency, as well as around how workflows could be governed and managed. Operational issues could arise, making it difficult to keep things running one to three years down the road. Security and availability were often best-effort; there was no expectation of five-nines reliability.

Data was secured by making it an island unto itself. The idea was to scale up as necessary and build a separate cluster for each additional department or use case. Individual groups or departments ran what they needed, and there wasn’t much integration with existing analytics environments.

With Hadoop’s broader acceptance, new business models can emerge

However, with Hadoop passing its 10-year anniversary last year, we’ve started to see broader acceptance of the platform, and as a result it’s becoming both easier and more practical to consolidate data company-wide. What’s changed is the realization that Hadoop has proven itself and is not just a science experiment. The number of Hadoop environments has grown, and users are realizing there is real power in combining data from different parts of the business and real business value in keeping historical data.

At best, the model of building different islands and running them independently is impractical; at worst it is potentially paralyzing for businesses. Consolidating data and workflows allows enterprises to focus on and implement better security, availability and reliability company-wide. In turn, they are also transforming their business models and expanding into new markets and offerings that weren’t possible even five years ago.

Analyst firm IDC evaluates EMC Isilon: Lab-validation of scale-out NAS file storage for your enterprise Data Lake

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

A Data Lake should now be a part of every big data workflow in your enterprise organization. By consolidating file storage for multiple workloads onto a single shared platform based on scale-out NAS, you can reduce costs and complexity in your IT environment, and make your big data efficient, agile and scalable.

That’s the expert opinion in analyst firm IDC’s recent Lab Validation Brief: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016. As the lab validation report concludes: “IDC believes that EMC Isilon is indeed an easy-to-operate, highly scalable and efficient Enterprise Data Lake Platform.”

The Data Lake Maximizes Information Value

The Data Lake model of storage represents a paradigm shift from the traditional linear enterprise data flow model. As data, and the insights gleaned from it, increase in value, enterprise-wide consolidated storage becomes a hub around which the ingestion and consumption systems work. This enables enterprises to bring analytics to data in place, avoiding the cost of multiple storage systems and the time spent on repeated ingestion and analysis.

But pouring all your data into a single shared Data Lake would put serious strain on traditional storage systems – even without the added challenges of data growth. That’s where the virtually limitless scalability of EMC Isilon scale-out NAS file storage makes all the difference…

The EMC Data Lake Difference

The EMC Isilon Scale-out Data Lake is an Enterprise Data Lake Platform (EDLP) based on Isilon scale-out NAS file storage and the OneFS distributed file system.

As well as meeting the growing storage needs of your modern datacenter with massive capacity, it enables big data accessibility using traditional and next-generation access methods – helping you manage data growth and gain business value through analytics. You can also enjoy seamless replication of data from the enterprise edge to your core datacenter, and tier inactive data to a public or private cloud.

We recently reached out to analyst firm IDC to lab-test our Isilon Data Lake solutions – here’s what they found in 4 key areas…

  1. Multi-Protocol Data Ingest Capabilities and Performance

Isilon is an ideal platform for enterprise-wide data storage, and provides a powerful centralized storage repository for analytics. With the multi-protocol capabilities of OneFS, you can ingest data via NFS, SMB and HDFS. This makes the Isilon Data Lake an ideal and user-friendly platform for big data workflows, where you need to ingest data quickly and reliably via protocols most suited to the workloads generating the information. Using native protocols enables in-place analytics, without the need for data migration, helping your business gain more rapid data insights.
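To make the in-place idea concrete, here is a minimal sketch (not from the IDC brief) of how a file landed over an NFS or SMB share might be read back over HDFS without a copy or migration step. The mount point, HDFS path, hostname and port below are hypothetical placeholders, and the sketch assumes an access zone that exposes the same directory over both protocols.

```python
# Hedged sketch of multi-protocol, in-place access: a file written through an
# NFS/SMB mount is read back over HDFS with no migration step.
# The mount point, HDFS path, hostname and port are hypothetical placeholders.
import pandas as pd
from pyarrow import fs

NFS_PATH = "/mnt/isilon/landing/events.csv"   # file landed via an NFS or SMB share
HDFS_PATH = "/landing/events.csv"             # same file, seen through the HDFS protocol

# 1) An upstream application drops the file onto the share.
pd.DataFrame({"user_id": [1, 2, 3], "clicks": [10, 4, 7]}).to_csv(NFS_PATH, index=False)

# 2) An analytics job reads the very same bytes over HDFS, in place.
hdfs = fs.HadoopFileSystem(host="isilon.example.com", port=8020)
with hdfs.open_input_file(HDFS_PATH) as f:
    events = pd.read_csv(f)

print(events.head())
```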


IDC validated that the Isilon Data Lake offers excellent read and write performance for Hadoop clusters accessing HDFS via OneFS, compared with clusters using direct-attached storage (DAS). In the lab tests, Isilon performed:

  • nearly 3x faster for data writes
  • over 1.5x faster for reads and read/writes.

As IDC says in its validation: “An Enterprise Data Lake platform should provide vastly improved Hadoop workload performance over a standard DAS configuration.”

  2. High Availability and Resilience

Policy-based high availability capabilities are needed for enterprise adoption of Data Lakes. The Isilon Data Lake is able to cope with multiple simultaneous component failures without interruption of service. If a drive or other component fails, it only has to recover the specific affected data (rather than recovering the entire volume).

IDC validated that a disk failure on a single Isilon node has no noticeable performance impact on the cluster. Replacing a failed drive is a seamless process and requires little administrative effort. (This is in contrast to traditional DAS, where the process of replacing a drive can be rather involved and time consuming.)

Isilon can even cope easily with node-level failures. IDC validated that a single-node failure has no noticeable performance impact on the Isilon cluster. Furthermore, the operation of removing a node from the cluster, or adding a node to the cluster, is a seamless process.

  3. Multi-tenant Data Security and Compliance

Strong multi-tenant data security and compliance features are essential for an enterprise-grade Data Lake. Access zones are a crucial part of the multi-tenancy capabilities of Isilon OneFS. In tests, IDC found that Isilon provides no-crossover isolation between Hadoop instances for multi-tenancy.

Another core component of secure multi-tenancy is the ability to provide a secure authentication and authorization mechanism for local and directory-based users and groups. IDC validated that the Isilon Data Lake provides multiple federated authentication and authorization schemes. User-level permissions are preserved across protocols, including NFS, SMB and HDFS.

Federated security is an essential attribute of an Enterprise Data Lake Platform, with the ability to maintain confidentiality and integrity of data irrespective of the protocols used. For this reason, another key security feature of the OneFS platform is SmartLock – specifically designed for deploying secure and compliant (SEC Rule 17a-4) Enterprise Data Lake Platforms.

In tests, IDC found that Isilon enables a federated security fabric for the Data Lake, with enterprise-grade governance, regulatory and compliance (GRC) features.

  4. Simplified Operations and Automated Storage Tiering

The Storage Pools feature of Isilon OneFS allows administrators to apply common file policies across the cluster locally – and extend them to the cloud.

Storage Pools consists of three components:

  • SmartPools: Data tiering within the cluster – essential for moving data between performance-optimized and capacity-optimized cluster nodes.
  • CloudPools: Data tiering between the cluster and the cloud – essential for implementing a hybrid cloud, and placing archive data on a low-cost cloud tier.
  • File Pool Policies: Policy engine for data management locally and externally – essential for automating data movement within the cluster and the cloud.

As IDC confirmed in testing, Isilon’s federated data tiering enables IT administrators to optimize their infrastructure by automating data placement onto the right storage tiers.

The expert verdict on the Isilon Data Lake

IDC concludes that: “EMC Isilon possesses the necessary attributes such as multi-protocol access, availability and security to provide the foundations to build an enterprise-grade Big Data Lake for most big data Hadoop workloads.”

Read the full IDC Lab Validation Brief for yourself: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016.

Learn more about building your Data Lake with EMC Isilon.

The Democratization of Data Science with the Arrival of Apache Spark

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

As an emerging field, data science has seen rapid growth over the span of just a few short years. With Harvard Business Review referring to the data scientist role as the “sexiest job of the 21st century” in 2012 and job postings for the role growing 57 percent in the first quarter of 2015, enterprises are increasingly seeking out talent to help bolster their organizations’ understanding of their most valuable assets: their data.

The growing demand for data scientists reflects a larger business trend – a shifting emphasis from the zeros and ones to the people who help manage the mounds of data on a daily basis. Enterprises are sitting on a wealth of information but are struggling to derive actionable insights from it, in part due to its sheer volume but also because they don’t have the right talent on board to help.

The problem enterprises now face isn’t capturing data – but finding and retaining top talent to help make sense of it in meaningful ways. Luckily, there’s a new technology on the horizon that can help democratize data science and increase accessibility to the insights it unearths.

Data Science Scarcity & Competition

The talent pool for data scientists is notoriously scarce. According to McKinsey & Company, by 2018, the United States alone may face a 50 to 60 percent gap between supply and demand for “deep analytic talent, i.e., people with advanced training in statistics or machine learning.” Data scientists possess an essential blend of business acumen, statistical knowledge and technological prowess, rendering them as difficult to train as they are invaluable to the modern enterprise.

Moreover, banks and insurance companies face an added struggle in hiring top analytics talent, with the allure of Silicon Valley beckoning top performers away from organizations perceived as less inclined to innovate. This perception issue hinders banks’ and insurance companies’ ability to remain competitive in hiring and retaining data scientists.

As automation and machine learning grow increasingly sophisticated, however, there’s an opportunity for banks and insurance companies to harness the power of data science, without hiring formally trained data scientists. One such technology that embodies these innovations in automation is Apache Spark, which is poised to shift the paradigm of data science, allowing more and more enterprises to tap into insights culled from their own data.

Spark Disrupts & Democratizes Data Science

Data science requires three pillars of knowledge: statistical analysis, business intelligence and technological expertise. Spark does the technological heavy lifting by understanding and processing data at a scale that most individuals aren’t comfortable working with. It handles the distribution and categorization of the data, removing the burden from individuals and automating the process. By allowing enterprises to load data into clusters and query it on an ongoing basis, the platform is particularly adept at machine learning and automation – a crucial component in any system intended to analyze mass quantities of data.
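As a rough illustration of that “load once, query on an ongoing basis” pattern, the PySpark sketch below loads a dataset into the cluster and runs a repeated aggregation against it; the file path and column names are hypothetical.

```python
# Minimal PySpark sketch: load data into the cluster once, then query it
# repeatedly. The HDFS path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("load-once-query-often").getOrCreate()

# Spark distributes the data across the cluster's executors on load.
transactions = (
    spark.read.option("header", True).option("inferSchema", True)
    .csv("hdfs:///data/transactions/*.csv")
    .cache()  # keep the working set resident for follow-up queries
)

# Query on an ongoing basis, without re-ingesting anything.
daily_totals = (
    transactions.groupBy("region", "txn_date")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("txn_count"))
)
daily_totals.show(10)
```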

Spark was created in the labs of UC Berkeley and has quickly taken the analytics world by storm, with two main business propositions: the freedom to model data without hiring data scientists, and the power to leverage analytics models that are already built and ready for use in Spark today. The combination of these two attributes allows enterprises to accelerate their analytics efforts with a modern, open-source technology.

The arrival of Spark signifies a world of possibility for companies that are hungry for the business value data science can provide but are finding it difficult to hire and keep deep analytic talent on board. The applications of Spark are seemingly endless, from cybersecurity and fraud detection to genomics modeling and actuarial analytics.
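To show what “models that are already built and ready for use” can look like in practice, here is a hedged sketch that trains one of Spark MLlib’s built-in algorithms (logistic regression) on labeled transactions and scores new ones. The paths, column names and features are hypothetical, and real feature engineering and validation are omitted.

```python
# Hedged sketch: using an algorithm that ships with Spark MLlib (logistic
# regression) to score transactions for fraud. Column names and input paths
# are hypothetical; feature engineering and validation are omitted.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("fraud-scoring-sketch").getOrCreate()

# Historical transactions with a 0/1 "is_fraud" label (hypothetical dataset).
labeled = spark.read.parquet("hdfs:///data/labeled_transactions")

features = VectorAssembler(
    inputCols=["amount", "merchant_risk", "hour_of_day"],  # hypothetical features
    outputCol="features",
)
model = Pipeline(stages=[
    features,
    LogisticRegression(labelCol="is_fraud", featuresCol="features"),
]).fit(labeled)

# Score new, unlabeled transactions with the trained pipeline.
scored = model.transform(spark.read.parquet("hdfs:///data/new_transactions"))
scored.select("transaction_id", "probability", "prediction").show(5)
```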

What Spark Means for Enterprises

Not only will Spark enable businesses to hire non-traditional data scientists, such as actuaries, to effectively perform the role, but it will also open a world of possibilities in terms of actual business strategy.

Banks, for example, have been clamoring for Spark from the get-go, in part because of Spark’s promise to help banks bring credit card authorizations back in-house. For over two decades, credit card authorizations have been outsourced, since it was more efficient and far less dicey to centralize the authorization process.

The incentive to bring this business back in-house is huge, however, with estimated cost savings of tens to hundreds of millions annually. With Spark, the authorization process could be automated in-house – a huge financial boon to banks. The adoption of Spark allows enterprises to effectively leverage data science and evolve their business strategies accordingly.

The Adoption of Spark & Hadoop

Moreover, Spark works seamlessly with the Hadoop distributions sitting on EMC’s storage platforms. As I noted in my last post, Hadoop adoption among enterprises has been remarkable, and Hadoop is quickly becoming the de facto standard for storing and processing terabytes or even petabytes of data.

By leveraging Spark and existing Hadoop platforms in tandem, enterprises are well-prepared to solve the ever-increasing data and analytics challenges ahead.

Open Source and the Modern Data Center: How {code} by Dell EMC enhances ScaleIO software-defined block storage

Joshua Bernstein

VP of Technology, Emerging Technology Team at Dell EMC

Why has open source become such a big deal, even in the enterprise data center? If you answered “to save money”, you wouldn’t be in the minority. But, despite what many may assume, it’s not principally about cost savings – although that may be one benefit. The attraction of open source is in its name – that is, its ‘open’ nature, both in terms of access to the code and to the developers who maintain and enhance it.

To boil it down, open source enables you to run data centers through software, with better and easier integration opportunities between diverse systems than has ever been possible before with proprietary offerings.

Open Source Advantages for the Modern Data Center

  • Access to Open Source Code & Project Developers
  • Freedom of Choice & Flexibility for Users
  • Easier Integrations Between Diverse Systems

Emerging open source infrastructure software thrives on freedom, flexibility, innovation and integration. Integration is particularly important because it enables discrete components to seamlessly work together as a system. This software also benefits from community involvement and the ability to integrate with both modern and existing processes and infrastructure, which leads to quicker adoption.

Enterprises are looking to data center IT transformations to help them meet the ever-growing and fluid expectations of their customers. Key to this is establishing a modern data center strategy, specifically one that is optimized for resource consumption. By embracing systems that are operated as software, organizations are more readily able to adapt to changing demands and opportunities.

{code} is Dell EMC’s open source initiative to deepen ties with the developer and open source communities. Through {code}, Dell EMC is enabling these communities to seamlessly fuse proprietary software with open source technologies.

Leveraging Container-Focused Solutions

Containerization is having its big moment in the world of enterprise IT – specifically with open source infrastructure and application platforms such as Docker, Mesos, Cloud Foundry and Kubernetes. Container-based infrastructure represents a major evolution in the way applications are deployed and managed. Not since the appearance of the virtual machine has a technology been so transformative. Containers give IT more choice of infrastructure, because they give teams greater control over application dependencies, which enables them to adopt more agile operational methods.

However, a big challenge in fully adopting container technology is that containers are not a one-to-one replacement for virtual machines. How can users run persistent applications inside these lightweight, ephemeral constructs? We believe this is a key challenge holding back wider adoption of container-based infrastructure.

REX-Ray: Meeting the Persistent Storage Challenge

To solve this challenge, {code} has spent more than a year developing REX-Ray to deliver persistent storage capabilities to container runtimes. It provides a simple and focused architecture for enabling advanced storage functionality across common storage, virtualization and cloud platforms. As an open source project, REX-Ray continues to gain new features and functionality aimed at setting the bar for providing persistence capabilities to containers.

Storage is a critical element of any IT environment. By focusing on storage within the context of open source and software, we’re able to offer users more functionality, choice and value from their deployments. One solution that works really well with REX-Ray is Dell EMC’s ScaleIO software-defined block storage.

REX-Ray and ScaleIO: Simpler Block Storage for Containerization

REX-Ray acts as the ‘glue’ between the container platform and ScaleIO – a software-defined storage solution that provides block level storage services on commodity hardware. This solution enables IT to move beyond purely stateless applications for containers, to confidently deploying critical stateful applications in containers as well.

ScaleIO is the gold standard for software-defined block storage platforms. It gives organizations the flexibility and freedom to provide storage through commodity servers in a range of deployment models – including hyper-converged architectures without a performance overhead. Through the seamless integration between REX-Ray and ScaleIO, the complete life cycle of storage is managed and consumed by container solutions such as Docker, Mesos, Cloud Foundry and Kubernetes.
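As a rough sketch of what this looks like from an application team’s point of view, the example below uses the Docker SDK for Python to request a persistent volume through the REX-Ray volume driver and attach it to a stateful container. It assumes a host where REX-Ray is already installed and configured against a ScaleIO backend; the volume and container names are hypothetical.

```python
# Hedged sketch using the Docker SDK for Python: create a persistent volume
# through the REX-Ray volume driver (assumed to be installed and configured
# for ScaleIO on this host) and attach it to a stateful container.
# Volume and container names are hypothetical.
import docker

client = docker.from_env()

# REX-Ray provisions the volume on ScaleIO and maps it to this host on demand.
client.volumes.create(name="orders-db-data", driver="rexray")

# The database writes to ScaleIO-backed block storage; if the container is
# rescheduled onto another host, REX-Ray can re-attach the same volume there.
db = client.containers.run(
    "postgres:9.6",
    name="orders-db",
    detach=True,
    environment={"POSTGRES_PASSWORD": "example"},
    volumes={"orders-db-data": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
)
print(db.status)
```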

Through {code}, Dell EMC has demonstrated its commitment to support the open source community. By ensuring that its software-defined storage solutions such as ScaleIO work seamlessly within a modern data center (which already integrates wide-ranging technologies such as virtualization, containerization, automation and cloud) and DevOps environment, we are making software-based storage technologies relevant in the open source community. Advanced integration, developer enablement and dynamic engagement all made possible by {code} are making ScaleIO an increasingly valued and attractive block storage option for the open source community.

Learn more about {code} by Dell EMC.

Join the {code} Community.

Want to get your hands a little dirty with the technology?

Download and test ScaleIO inside a VM environment.

Request a vLab demo:

Docker, Mesos, and ScaleIO for your persistent applications.


Taking AIMS at the IP Transition

Ryan Sayre

Ryan Sayre is the CTO-at-Large for EMC Isilon covering Europe, the Middle East and Africa. Ryan has hands-on experience in production technology across several types of production workflows. Before working in the storage industry, he was an IT infrastructure architect at a large animation studio in the United States. He has consulted across entertainment sectors, from content generation and production management to digital distribution and presentation. Ryan’s current role allows him to assist in enhancing the Isilon product for both current and future uses in media production, share his findings across similar industries and improve the overall landscape of how storage can be better leveraged for productivity. He holds an MBA from London Business School (UK) and a Bachelor of Science in Computer Science from the University of Portland (USA). In his free time, he is an infrastructure volunteer for the London Hackspace and an amateur radio enthusiast using the callsigns M0RYS and N0RYS.


Much has been made of the impending move to a largely IP and hybrid cloud infrastructure in the media and entertainment industry, and with good reason. Over the last decade, the shift from SDI to IP has been met with both cheers and jeers. Supporters of transitioning to IP speak of vast operating and financial benefits, while traditional broadcast facilities and operators are still struggling to reconcile these potential gains with their unease over emerging standards and interoperability concerns.

In an effort to assuage these concerns, EMC, alongside several of the industry’s leading vendors such as Cisco, Evertz, Imagine Communications and Sony, has joined the Alliance for IP Media Solutions (AIMS). AIMS, a non-profit trade alliance, is focused on helping broadcast and media companies move from bespoke legacy systems to a virtualized, IP-based future – quickly and economically. Believing that open, standards-based protocols are critically important to ensuring long-term interoperability, AIMS promotes the adoption of several standards: VSF TR-03 and TR-04, SMPTE 2022-6 and AES67.

It is important that organizations continue to advocate for AIMS’ roadmap for open standards in IP technology and do their part to educate each other, which is why we recently partnered with TV Technology and Broadcast Engineering to develop an e-book titled “The IP Transformation: What It Means for M&E Storage Strategies”. It examines how the combination of standard Ethernet/IP networking, virtualized workflows on commodity servers and clustered high-performance storage is influencing new video facility design and expanding business opportunities for media companies. The e-book takes a closer look at topics such as media exchange characteristics, the eventual fate of Fibre Channel, Quality of Service (QoS) and storage needs for evolving media workflows.

To learn more about the shift to IP, visit EMC at IBC 2016 at stand 7.H10, September 9-13. EMC’s media and entertainment experts will be onsite exhibiting an array of new products and media workflow solutions, including 4K content creation, IP-based hybrid-cloud broadcast operations, and cloud DVR on-demand content delivery. EMC will also be demonstrating a number of partner solutions at IBC, including:

Pixspan, Aspera and NVIDIA

Advances in full-resolution 4K workflows – EMC, Pixspan, Aspera, and NVIDIA are bringing full-resolution 4K workflows to IT infrastructures, advancing digital media workflows with bit-exact content over standard 10 GbE networks. Solution Overview

Imagine Communications

Integrated channel playout solution with Imagine Communications – EMC and Imagine Communications bring live channel playout with the Versio solution in an integrated offering with EMC’s converged VCE Vblock system and EMC’s Isilon scale-out NAS storage system. Solution Overview

MXFserver

Remote and collaborative editing solution with MXFserver – EMC and MXFserver are announcing an integrated disk-based archiving solution that allows immediate online retrieval of media files. The combined solution utilizes MXFserver software and EMC’s Isilon scale-out NAS to deliver storage as well as a platform for industry-leading editing applications. Solution Overview

Anevia

Cloud-based multi-platform content delivery with Anevia – The joint release from Anevia and EMC allows media organizations to deliver OTT services (live, timeshift, replay, catch-up, start-over, pause, cloud DVR and VOD) to all devices, enabling consumers to access and view content they have recorded on any device at any time. Solution Overview

Rohde & Schwarz

EMC and Rohde & Schwarz announce interoperability between Isilon storage and the Venice Ingest and Production platform. Venice is a real-time and file-based ingest and playout server from Rohde & Schwarz. Solution Overview

NLTek

EMC and NLTek bring a combined solution enabling integration with Avid Interplay. Working within the familiar Avid MC|UX toolset, users are able to store and restore Avid Assets to an EMC Isilon or ECS media repository—creating a unified Nearchive. Solution Overview

For more information and to schedule a meeting at IBC, please visit our website.


This summer, NBC captured history while setting standards for the future


Building on its history covering the Olympic Games, NBC provided viewers in the United States a front row seat to the Games of the XXXI Olympiad.

Projects such as covering the Games, a 17-day event with continuous concurrent live coverage, require the ultimate in scalable, reliable storage. NBC uses the EMC Isilon product line to store and stage video captured during these irreplaceable moments of sporting glory, as well as audio, stills and motion graphics.

Isilon’s 3-petabyte storage repository bridged the gap between Stamford and Rio, functioning as a single large Data Lake and enabling real-time, global collaborative production in support of the entire broadcast. Isilon nodes can be added without downtime, increasing storage capacity and network throughput while maintaining seamless access to a rock-solid platform.

NBC selected the EMC Isilon product line as a reliable, proven infrastructure to manage its storage.

