Posts Tagged ‘analytics’

Analyst firm IDC evaluates EMC Isilon: Lab-validation of scale-out NAS file storage for your enterprise Data Lake

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

A Data Lake should now be a part of every big data workflow in your enterprise organization. By consolidating file storage for multiple workloads onto a single shared platform based on scale-out NAS, you can reduce costs and complexity in your IT environment, and make your big data efficient, agile and scalable.

That’s the expert opinion in analyst firm IDC’s recent Lab Validation Brief: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016. As the lab validation report concludes: “IDC believes that EMC Isilon is indeed an easy-to-operate, highly scalable and efficient Enterprise Data Lake Platform.”

The Data Lake Maximizes Information Value

The Data Lake model of storage represents a paradigm shift from the traditional linear enterprise data flow model. As data and the insights gleaned from it increase in value, enterprise-wide consolidated storage is transformed into a hub around which the ingestion and consumption systems work. This enables enterprises to bring analytics to the data in place – avoiding the expense of multiple storage systems and the time required for repeated ingestion and analysis.

But pouring all your data into a single shared Data Lake would put serious strain on traditional storage systems – even without the added challenges of data growth. That’s where the virtually limitless scalability of EMC Isilon scale-out NAS file storage makes all the difference…

The EMC Data Lake Difference

The EMC Isilon Scale-out Data Lake is an Enterprise Data Lake Platform (EDLP) based on Isilon scale-out NAS file storage and the OneFS distributed file system.

As well as meeting the growing storage needs of your modern datacenter with massive capacity, it enables big data accessibility using traditional and next-generation access methods – helping you manage data growth and gain business value through analytics. You can also enjoy seamless replication of data from the enterprise edge to your core datacenter, and tier inactive data to a public or private cloud.

We recently reached out to analyst firm IDC to lab-test our Isilon Data Lake solutions – here’s what they found in 4 key areas…

  1. Multi-Protocol Data Ingest Capabilities and Performance

Isilon is an ideal platform for enterprise-wide data storage, and provides a powerful centralized storage repository for analytics. With the multi-protocol capabilities of OneFS, you can ingest data via NFS, SMB and HDFS. This makes the Isilon Data Lake an ideal and user-friendly platform for big data workflows, where you need to ingest data quickly and reliably via protocols most suited to the workloads generating the information. Using native protocols enables in-place analytics, without the need for data migration, helping your business gain more rapid data insights.
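
To make the in-place analytics point concrete, here is a minimal PySpark sketch of the workflow described above: a file that landed on the shared storage via NFS or SMB is analyzed directly over HDFS, with no copy or migration step. The cluster name, path and column names are hypothetical placeholders rather than details from the IDC brief.

```python
# Minimal sketch: analyze a file in place over HDFS after it was ingested via NFS/SMB.
# "datalake-cluster", the path and the column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("in-place-analytics-sketch")
         .getOrCreate())

# The same file that landed via an NFS/SMB share is visible to Hadoop jobs
# through the HDFS protocol endpoint of the shared storage.
df = spark.read.csv("hdfs://datalake-cluster:8020/sales/transactions.csv",
                    header=True, inferSchema=True)

# Run analytics directly against the shared storage -- no re-ingestion required.
df.groupBy("region").sum("amount").show()
```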


IDC validated that the Isilon Data Lake offers excellent read and write performance for Hadoop clusters accessing HDFS via OneFS, compared with HDFS on direct-attached storage (DAS). In the lab tests, Isilon performed:

  • nearly 3x faster for data writes
  • over 1.5x faster for reads and read/writes.

As IDC says in its validation: “An Enterprise Data Lake platform should provide vastly improved Hadoop workload performance over a standard DAS configuration.”

  2. High Availability and Resilience

Policy-based high availability capabilities are needed for enterprise adoption of Data Lakes. The Isilon Data Lake is able to cope with multiple simultaneous component failures without interruption of service. If a drive or other component fails, it only has to recover the specific affected data (rather than recovering the entire volume).

IDC validated that a disk failure on a single Isilon node has no noticeable performance impact on the cluster. Replacing a failed drive is a seamless process and requires little administrative effort. (This is in contrast to traditional DAS, where the process of replacing a drive can be rather involved and time consuming.)

Isilon can even cope easily with node-level failures. IDC validated that a single-node failure has no noticeable performance impact on the Isilon cluster. Furthermore, the operation of removing a node from the cluster, or adding a node to the cluster, is a seamless process.

  3. Multi-tenant Data Security and Compliance

Strong multi-tenant data security and compliance features are essential for an enterprise-grade Data Lake. Access zones are a crucial part of the multi-tenancy capabilities of Isilon OneFS. In tests, IDC found that Isilon provides no-crossover isolation between Hadoop instances for multi-tenancy.

Another core component of secure multi-tenancy is the ability to provide a secure authentication and authorization mechanism for local and directory-based users and groups. IDC validated that the Isilon Data Lake provides multiple federated authentication and authorization schemes. User-level permissions are preserved across protocols, including NFS, SMB and HDFS.

Federated security is an essential attribute of an Enterprise Data Lake Platform, with the ability to maintain confidentiality and integrity of data irrespective of the protocols used. For this reason, another key security feature of the OneFS platform is SmartLock – specifically designed for deploying secure and compliant (SEC Rule 17a-4) Enterprise Data Lake Platforms.

In tests, IDC found that Isilon enables a federated security fabric for the Data Lake, with enterprise-grade governance, regulatory and compliance (GRC) features.

  4. Simplified Operations and Automated Storage Tiering

The Storage Pools feature of Isilon OneFS allows administrators to apply common file policies across the cluster locally – and extend them to the cloud.

Storage Pools consists of three components:

  • SmartPools: Data tiering within the cluster – essential for moving data between performance-optimized and capacity-optimized cluster nodes.
  • CloudPools: Data tiering between the cluster and the cloud – essential for implementing a hybrid cloud, and placing archive data on a low-cost cloud tier.
  • File Pool Policies: Policy engine for data management locally and externally – essential for automating data movement within the cluster and the cloud.

As IDC confirmed in testing, Isilon’s federated data tiering enables IT administrators to optimize their infrastructure by automating data placement onto the right storage tiers.
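
File pool policies themselves are configured through OneFS administration tools, but the underlying idea of policy-driven placement is simple to illustrate. The sketch below is a purely hypothetical Python illustration of rules that map a file’s age to a target tier; the rule thresholds, tier names and helper function are invented for the example and are not the OneFS API.

```python
# Purely illustrative sketch of policy-driven tiering -- not the OneFS API.
# Tier names and rule thresholds are hypothetical.
import os
import time

POLICIES = [
    # (policy name, predicate, target tier)
    ("archive-cold-data", lambda path, age_days: age_days > 365, "cloud-archive"),
    ("demote-stale-data", lambda path, age_days: age_days > 90, "capacity-nodes"),
    ("keep-hot-data", lambda path, age_days: True, "performance-nodes"),
]

def choose_tier(path: str) -> str:
    """Return the first matching tier for a file, mimicking file pool policy ordering."""
    age_days = (time.time() - os.path.getmtime(path)) / 86400
    for name, rule, tier in POLICIES:
        if rule(path, age_days):
            return tier
    return "performance-nodes"
```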

The expert verdict on the Isilon Data Lake

IDC concludes that: “EMC Isilon possesses the necessary attributes such as multi-protocol access, availability and security to provide the foundations to build an enterprise-grade Big Data Lake for most big data Hadoop workloads.”

Read the full IDC Lab Validation Brief for yourself: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016.

Learn more about building your Data Lake with EMC Isilon.

The Democratization of Data Science with the Arrival of Apache Spark

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

As an emerging field, data science has seen rapid growth over the span of just a few short years. With Harvard Business Review referring to the data scientist role as the “sexiest job of the 21st century” in 2012 and job postings for the role growing 57 percent in the first quarter of 2015, enterprises are increasingly seeking out talent to help bolster their organizations’ understanding of their most valuable assets: their data.

The growing demand for data scientists reflects a larger business trend – a shifting emphasis from the zeros and ones to the people who help manage the mounds of data on a daily basis. Enterprises are sitting on a wealth of information but are struggling to derive actionable insights from it, in part due to its sheer volume but also because they don’t have the right talent on board to help.

The problem enterprises now face isn’t capturing data – but finding and retaining top talent to help make sense of it in meaningful ways. Luckily, there’s a new technology on the horizon that can help democratize data science and increase accessibility to the insights it unearths.

Data Science Scarcity & Competition

The talent pool for data scientists is notoriously scarce. According to McKinsey & Company, by 2018, the United States alone may face a 50 to 60 percent gap between supply and demand for “deep analytic talent, i.e., people with advanced training in statistics or machine learning.” Data scientists possess an essential blend of business acumen, statistical knowledge and technological prowess, rendering them as difficult to train as they are invaluable to the modern enterprise.

Moreover, banks and insurance companies face an added struggle in hiring top analytics talent, with the allure of Silicon Valley beckoning top performers away from organizations perceived as less inclined to innovate. This perception issue hinders banks’ and insurance companies’ ability to remain competitive in hiring and retaining data scientists.

As automation and machine learning grow increasingly sophisticated, however, there’s an opportunity for banks and insurance companies to harness the power of data science, without hiring formally trained data scientists. One such technology that embodies these innovations in automation is Apache Spark, which is poised to shift the paradigm of data science, allowing more and more enterprises to tap into insights culled from their own data.

Spark Disrupts & Democratizes Data Science

Data science requires three pillars of knowledge: statistical analysis, business intelligence and technological expertise. Spark does the technological heavy lifting by understanding and processing data at a scale most people aren’t comfortable working with. It handles the distribution and categorization of the data, removing the burden from individuals and automating the process. By allowing enterprises to load data into clusters and query it on an ongoing basis, the platform is particularly adept at machine learning and automation – a crucial component in any system intended to analyze mass quantities of data.

Spark was created in the labs of UC Berkeley and has quickly taken the analytics world by storm, with two main business propositions: the freedom to model data without hiring data scientists, and the power to leverage analytics models that are already built and ready-for-use in Spark today. The combination of these two attributes allows enterprises to gain speed on analytics endeavors with a modern, open-source technology.
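
As a hedged illustration of that second proposition, the short PySpark sketch below applies one of the models that ships with Spark (MLlib’s KMeans) to a toy dataset; the sample records, feature names and cluster count are assumptions for illustration only.

```python
# Minimal sketch of using a model that ships with Spark (MLlib KMeans).
# The sample data and feature names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("prebuilt-model-sketch").getOrCreate()

data = spark.createDataFrame(
    [(35.0, 4200.0), (52.0, 9800.0), (23.0, 1500.0), (44.0, 7600.0)],
    ["age", "annual_spend"])

features = VectorAssembler(inputCols=["age", "annual_spend"],
                           outputCol="features").transform(data)

# No model-building expertise required: fit the packaged algorithm and inspect segments.
model = KMeans(k=2, seed=42).fit(features)
model.transform(features).select("age", "annual_spend", "prediction").show()
```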

The arrival of Spark signifies a world of possibility for companies that are hungry for the business value data science can provide but are finding it difficult to hire and keep deep analytic talent on board. The applications of Spark are seemingly endless, from cybersecurity and fraud detection to genomics modeling and actuarial analytics.

What Spark Means for Enterprises

Not only will Spark enable businesses to hire non-traditional data scientists, such as actuaries, to effectively perform the role, but it will also open a world of possibilities in terms of actual business strategy.

Banks, for example, have been clamoring for Spark from the get-go, in part because of Spark’s promise to help banks bring credit card authorizations back in-house. For over two decades, credit card authorizations have been outsourced, since it was more efficient and far less dicey to centralize the authorization process.

The incentive to bring this business back in-house is huge, however, with estimated cost savings of tens to hundreds of millions annually. With Spark, the authorization process could be automated in-house – a huge financial boon to banks. The adoption of Spark allows enterprises to effectively leverage data science and evolve their business strategies accordingly.

The Adoption of Spark & Hadoop

Moreover, Spark works seamlessly with the Hadoop Distributions sitting on EMC’s storage platforms. As I noted in my last post, Hadoop adoption among enterprises has been incredible, and it is quickly becoming the de facto standard for storing and processing terabytes or even petabytes of data.

By leveraging Spark and existing Hadoop platforms in tandem, enterprises are well-prepared to solve the ever-increasing data and analytics challenges ahead.

Digital Strategies:  Are Analytics Disrupting the World?

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

“Software is eating the world”.  It is a phrase that we often see written, but sometimes do not fully understand.  More recently I have read derivations of that phrase positing that “analytics are disrupting the world”.  Both phrases hold a lot of truth.  But why?  Some of the major disruptions of the last 5 years can be attributed to analytics.  Most companies that serve as an intermediary, such as Uber or AirBNB, with a business model of making consumer and supplier “connections”, are driven by analytics.  Pricing surges, routing optimizations, available rentals, available drivers, etc. are all algorithms to these “connection” businesses that are disrupting the world.  It could be argued that analytics is their secret weapon.

It is normal for startups to make new and sometimes crazy & risky investments in new technologies like Hadoop and analytics.  The trend is carrying over into traditional industries and established businesses as well.  What are the analytics use cases in industries like Financial Services (aka FSI)?

Established Analytics Plays in FSI

Two use cases naturally come to mind when I think of “Analytics” and “Financial Services”: High Frequency Trading and Fraud, two traditional use cases that have long utilized analytics.  Both are fairly well respected and well documented with regard to their heavy use of analytics.  I myself blogged recently (From Kinetic to Synthetic) on behalf of Equifax regarding the market trends in Synthetic Fraud.  Beyond these obvious use cases, though, where are analytics impacting the Financial Services industry?  What use cases are relevant and impacting the industry in 2016, and why?

Telematics

The insurance industry has been experimenting for several years with opt-in programs that monitor driving behavior.  Telematics is the name given to this set of opt-in, usage-based insurance and driver-monitoring programs.  Insurance companies have varying opinions of its usefulness, but it is clear that driving behavior data (1) makes heavy use of unstructured data and (2) represents a dramatic leap from the traditional statistical approach built on financial data, actuarial tables, and demographics.  Telematics has fostered in insurance a belief long held in other verticals such as fraud detection: pin behavior down to an individual pattern rather than trying to predict broad swaths of patterns.  To be more precise, telematics looks to derive a “behavior of one” rather than a “generalized driving pattern for 1,000 individuals”.  To see why this differs from past insurance practice, compare the two methods directly: Method One uses historical actuarial tables of life expectancy along with demographic and financial data to denote risk; Method Two asks how one individual actually drives, based upon real driving data received from their car.  Which is more predictive of the expected rate of accidents is the question for analytics.  While this is a gross over-simplification of the entire process, it is a radical shift in the types of data and the analytical methods used to derive value from the data available to the industry.  Truly transformational.
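
For a rough illustration of the “behavior of one” idea, the hypothetical Python sketch below derives per-driver metrics from raw telematics records and contrasts them with a single generalized profile for the whole pool; the field names and sample trips are made up.

```python
# Illustrative sketch only: contrasting a "behavior of one" with a population average.
# The telematics fields (driver_id, speed_kph, hard_brake) are hypothetical.
import pandas as pd

trips = pd.DataFrame({
    "driver_id":  ["d1", "d1", "d2", "d2", "d3"],
    "speed_kph":  [92, 110, 64, 71, 88],
    "hard_brake": [1, 3, 0, 0, 1],
})

# Method Two: score each individual on their own observed driving data.
per_driver = trips.groupby("driver_id").agg(
    avg_speed=("speed_kph", "mean"),
    hard_brakes=("hard_brake", "sum"),
)

# Method One (roughly): one generalized profile for the whole pool.
population_profile = trips[["speed_kph", "hard_brake"]].mean()

print(per_driver)
print(population_profile)
```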

Labor Arbitrage

The insurance industry has also been experimenting with analytics based on past performance data.  The industry has years of predictive information (i.e., claim reviews along with their actual outcomes) from past claims.  By exploring this historical data, insurance companies can apply logistic regression algorithms to derive weighted scores for incoming claims, and then use those scores to determine a path forward.  For example, if claims scoring greater than 50 are almost always paid after evaluation, then all claims scoring above 50 should be immediately approved and paid.  The inverse also holds: low-scoring claims can be quickly rejected, as they are often not appealed, or are regularly turned down on review if they are appealed.  In effect, the analytics of the present case are compared against the corpus of past outcomes to derive the most likely outcome of the case.  The resulting business effect is that the workforce reviewing medical claims is only given those files that actually need to be worked, yielding better workforce productivity.  This is labor arbitrage, with data and analytics as the disruptor of workforce trends.
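
A minimal sketch of that scoring idea, using scikit-learn’s logistic regression on made-up claim features, is shown below; the features, training data and the 50-point cut-off are illustrative assumptions, not an insurer’s actual model.

```python
# Hedged sketch of claim scoring with logistic regression; data and threshold are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical claims: [claim_amount, claimant_tenure_years], label 1 = paid after review.
X_history = np.array([[500, 8], [12000, 1], [800, 5], [15000, 2], [300, 10], [9000, 1]])
y_history = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X_history, y_history)

# Score new claims on a 0-100 scale and route them, mirroring the "score > 50" rule.
new_claims = np.array([[700, 6], [11000, 1]])
scores = model.predict_proba(new_claims)[:, 1] * 100

for claim, score in zip(new_claims, scores):
    action = "auto-approve" if score > 50 else "route to reviewer / likely reject"
    print(f"claim {claim.tolist()}: score {score:.0f} -> {action}")
```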

Know Your Customer

Retail banking has turned to analytics in its focus on attracting and retaining customers.  After a large wave of acquisitions in the last decade, retail banks are working to integrate their various portfolios.  In some cases, resolving the identity of a client across all of their accounts is not as straightforward as it sounds.  This is especially hard with dormant accounts that might carry maiden names, mangled data attributes, or old addresses.  The ultimate goal of co-locating all customer data into an analytics environment is a customer 360, which is mainly focused on gaining full insight into each customer.  This can surface upsell opportunities by understanding a customer’s peer set and which products a similar demographic has a strong interest in.  For example, if individuals of a given demographic typically subscribe to 3 of a company’s 5 products, an individual matching that demographic who subscribes to only 1 product should be targeted for upsell on the additional products.  This uses large swathes of data and the company’s own product-adoption history to build upsell and marketing strategies for its own customers.  If someone was both a small business owner and a personal consumer of the retail bank, the bank may not previously have tied those accounts together.  A customer 360 gives the bank a whole new perspective on who its customer base really is.
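
As a hypothetical illustration of the upsell logic, the pandas sketch below compares each customer’s product holdings with the products commonly held by their peer segment; the customers, segments and products are invented for the example.

```python
# Illustrative customer-360 upsell sketch; account data and product names are hypothetical.
import pandas as pd

holdings = pd.DataFrame({
    "customer": ["alice", "alice", "bob", "carol", "carol", "carol"],
    "segment":  ["small_biz"] * 6,
    "product":  ["checking", "card", "checking", "checking", "card", "merchant_services"],
})

# Products commonly held by a customer's demographic/peer segment.
segment_products = holdings.groupby("segment")["product"].apply(set)

# Upsell candidates: peer-set products the individual customer does not yet hold.
for customer, owned in holdings.groupby("customer")["product"].apply(set).items():
    segment = holdings.loc[holdings["customer"] == customer, "segment"].iloc[0]
    gaps = segment_products[segment] - owned
    print(customer, "->", sorted(gaps))
```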

Wrap Up

Why are these trends interesting?  In most of the cases above, people are familiar with certain portions of the story.  The underlying why or what might often get missed.  It is important not only to understand the technology and capabilities involved in the transformation, but also the underlying shift that is being caused.  EMC has a long history of helping customers through these journeys, and we look forward to helping even more clients face them.


MLBAM Goes Over the Top: The Case for a DIY Approach to OTT

James Corrigan

Advisory Solutions Architect at EMC


When looking at the current media landscape, the definition of what constitutes a “broadcaster” is undergoing a serious overhaul. Traditional linear TV might not be dead just yet, but it’s clearly having to reinvent itself in order to stay competitive amid rapidly evolving media business models and increasingly diverse content distribution platforms.

The concept of “binge watching” a TV show, for example, was non-existent only a few years ago. Media consumption is shifting towards digital and online viewership on a myriad of devices such as smartphones, tablets and PCs. Subscription on-demand services are becoming the consumption method of choice, while broadcast-yourself platforms like Twitch and YouTube are fast becoming a popular cornerstone of millennials’ viewership habits. Horowitz Research found that over 70 percent of millennials have access to an OTT SVOD service, and they are three times as likely to have an OTT SVOD service without a pay TV subscription. PricewaterhouseCoopers (PwC) estimates that OTT video streaming will grow to be a $10.1 billion business by 2018, up from $3.3 billion in 2013.

As a result, broadcast operators are evolving into media aggregators, and content providers are transforming into “entertainment service providers,” expanding into platforms ranging from mobile to digital to even virtual theme parks.

Building Versus Buying

This change in media consumption requires media organizations to consider a more efficient storage, compute and network infrastructure. Media organizations need flexible and agile platforms, not only to expand their content libraries but also to meet the dynamic growth in the number of subscribers and in how they consume and experience media and entertainment.

Competing successfully in the OTT market depends upon the “uniqueness” of your service to the consumer. This uniqueness comes either from having unique or exclusive content, or from having a platform that is able to adapt and offer the customer more than just watching content. For the latter, how you deploy your solution – whether you (1) build your own (“DIY”), (2) buy a turn-key solution or (3) take a hybrid approach – is key to success.

MLBAM Hits a Home Run with a DIY Approach

A key advantage of the “DIY” approach is that it increases business agility, allowing media organizations to adapt and change as consumers demand more from their services. For some media organizations this allows them to leverage existing content assets, infrastructure and technology teams and keep deployment costs low. Further, layering OTT video delivery on top of regular playout enables organizations to incrementally add the new workflow to the existing content delivery ecosystem. For new entrants, the DIY approach enables new development methodologies, allowing these “new kids on the block” to develop micro-services unencumbered by legacy services.

One example of an organization taking the DIY approach is Isilon customer Major League Baseball Advanced Media (MLBAM), which has created a streaming media empire. MLBAM’s success underscores the voracious and rapid growth in consumer demand for streaming video; it streams sporting events, and also supports the streaming service HBO GO, as well as mobile, web and TV offerings for the NHL.

“The reality is that now we’re in a situation where digital distribution isn’t just a ‘nice to have’ strategy, it’s an essential strategy for any content company,” said Joe Inzerillo, CTO for MLBAM. “When I think about…how we’re going to be able to innovate, I often tell people ‘I don’t manage technology, I actually manage velocity.’ The ability to adapt and innovate and move forward is absolutely essential.”

Alternatively, the turn-key approach, which either outsources your media platform or gives you a pre-built video delivery infrastructure, can offer benefits such as increased speed-to-market. However, selecting the right outsourcing partner for this approach is critical; choose incorrectly and it can create vendor lock-in, loss of control and flexibility, and larger operational costs.

Making it Personal: Analytics’ Role

Being able to access content when and where consumers want – on the device they want – is one part of the challenge with the rise of digital and online content. Another key component is personalization of that content for viewers. Making content more relevant and tailored for subscribers is critical to the success of alternate broadcast business models – EMC and Pivotal are helping media companies extract insights on customers through the development and use of analytics, which should be key to any OTT strategy. Analyzing data on what consumers are watching should be used to help drive content acquisition and personalized recommendation engines. Targeted ad insertion adds the further benefit of personalized advertising, helping to increase revenue through tailored advertisements.

Scaling for the future

An infrastructure platform that scales is the final consideration for new-age media platforms. Being able to scale “apps” based on containers or virtual instances is key. To do that, you need a platform that scales compute, network and storage independently or together, such as EMC’s scale-out NAS with Isilon or scale-out compute with VCE or VXRail/Rack. MLBAM’s Inzerillo explains: “The ability to have a technology like Isilon that’s flexible, so that the size of the data lake can grow as we onboard clients, is increasingly important to us. That kind of flexibility allows you to really focus on total cost of ownership of the custodianship of the data.”

Inzerillo continues, “If you’re always worried about the sand that you’re standing on, because it’s shifting, you’re never going to be able to jump, and what we need to be able to do is sprint.”

It’s an exciting time to be in the ever-evolving media and entertainment space – the breadth of offerings that broadcasters and media companies are developing today, and the range of devices and distribution models to reach subscribers will only continue to grow.

Check out how MLBAM improves customer experience through OTT.

Infrastructure Convergence Takes Off at Melbourne Airport

Yasir Yousuff

Sr. Director, Global Geo Marketing at EMC Emerging Technologies Division


By air, by land, or by sea? Which do you reckon is the most demanding means of travel these days? In asking, I’d like to steer your thoughts to the institutions and businesses that provide transportation across these myriad segments.

Hands down, my pick would be aviation, where the heaviest burden falls on any international airport operating 24/7. Let’s take Melbourne Airport in Australia as an example. In a typical year, some 32 million passengers transit through its doors – almost a third more than Australia’s entire population. And if you think that’s a lot, that figure looks set to double to 64 million by 2033.

As the threat of terrorism grows, so will the need for stringent checks. And as travelers get more affluent, so will their expectations. Put the two together and you get something of a paradoxical dilemma that needs to be addressed.

So how does Australia’s only major 24/7 airport cope with these present and future demands?

First Class Security Challenges

Beginning with security, airports have come to terms with the fact that passport checks alone in the immigration process aren’t sufficient. Thanks to Hollywood movies and their depictions of how easy it is to get hold of “fake” passports – think Jason Bourne, but in the context of a “bad” guy out to harm innocents – a large majority of the public would agree that more detailed levels of screening are a necessity.

“Some of the things we need to look at are new technologies associated with biometrics, new methods of running through our security and our protocols. Biometrics will require significant compute power and significant storage ability,” says Paul Bunker, Melbourne Airport’s Business Systems & ICT Executive.

With biometrics, Bunker is referring to breakthroughs such as fingerprint and facial recognition. While these data-dense technologies are typically developed in silos, airports like Melbourne Airport need them to function coherently as part of an integrated security ecosystem, with data processed in near real-time to ensure authorities have ample time to respond to threats.

First Class Service Challenges

Then there are the all-important passengers who travel in and out for a plethora of reasons: some for business, some for leisure, and some on transit to other destinations.

Whichever the case, most, if not all, of them expect a seamless experience: freedom from the hassle of waiting long periods to clear immigration, luggage arriving at the belts almost immediately after, and so on.

With the airport’s IT systems increasingly strained in managing these operational outcomes, a more sustainable way forward is inevitable.

First Class Transformative Strategy

Melbourne Airport has historically been more reactive and focused heavily on maintenance, but that has changed in recent times. Terminal 4, which opened in August 2015, became the airport’s first terminal to embrace digital innovation, boasting Asia-Pacific’s first end-to-end self-service model from check-in kiosks to automated bag drop facilities.

This comes against the backdrop of a new charter that aims to enable IT to take on a more strategic role and drive greater business value through technology platforms.

“We wanted to create a new terminal that was effectively as much as possible a fully automated terminal where each passenger had more control over the environment,” Bunker explained. “Technical challenges associated with storing massive amounts of data generated not only by our core systems but particularly by our CCTV and access control solutions is a major problem we had.”

First Class Solution

In response, Melbourne Airport implemented two VCE Vblock System 340 converged infrastructure systems with VNX5600 storage, featuring 250 virtual servers and 2.5 petabytes of storage capacity. Two EMC Isilon NL series clusters were further deployed at two sites for production and disaster recovery.


The new converged infrastructure has allowed Melbourne Airport to dramatically simplify its IT operations, creating a comfortable buffer able to support future growth as the business matures. It has also guaranteed high availability for key applications like baggage handling and check-in, which was crucial in the development of Terminal 4 as a fully automated self-service terminal.

While key decision-makers may have a rational gauge on where technological trends are headed, their foresight is far from 100%. These sweeping reforms have effectively laid the foundations for flexibility in adopting new technologies across the board – biometrics for security and analytics for customer experience enhancement – whenever the need calls for it. Furthermore, the airport can now do away with separate IT vendors, reducing management complexity.

Yet all these pale in comparison to the long-term collaborative working relationship Melbourne Airport has forged with EMC to support its bid to become an industry-leading innovation driver of the future.

Read the Melbourne Airport Case Study to learn more.


Hadoop Grows Up: How Enterprises Can Successfully Navigate its Growing Pains

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

If you’d asked me 10 years ago whether enterprises would be migrating to Hadoop, I would’ve answered with an emphatic no. Slow to entice enterprise customers and named after a toy elephant, at first glance, the framework didn’t suggest it was ready for mass commercialization or adoption.

But the adoption of Hadoop among enterprises has been phenomenal. With its open-source software framework, Hadoop provides enterprises with the ability to process and store unprecedented volumes of data – a capability today’s enterprise sorely needs – effectively becoming today’s de facto standard for storing, processing and analyzing mass quantities of data, from hundreds of terabytes to petabytes.

While the adoption and commercialization of Hadoop is remarkable and an overall positive move for enterprises hungry for streamlined data storage and processing, enterprises are in for a significant challenge with the migration from Hadoop 2.0 to 3.X.

Most aren’t sure what to expect, and few experienced the earlier migration’s pain points. Though Hadoop has “grown up”, in that it is now used by some of the world’s largest enterprises, it still lacks a non-disruptive upgrade path when it jumps major releases.

Happening in just a few short years, this next migration will have dramatic implications for the storage capabilities of today’s insurance companies, banks and largest corporations. It’s imperative that these organizations begin planning for the change now to ensure that their most valuable asset—their data—remains intact and accessible in an “always on” culture that demands it.

Why the Migration Matters

First, let’s explore the significant benefits of the migration and why, despite the headaches, this conversion will ultimately be beneficial for enterprises.

One of the key benefits of Hadoop 3.X is erasure coding, which will dramatically decrease the amount of storage needed to protect data. In a more traditional system, files are replicated multiple times in order to protect against loss. If one file becomes lost or corrupted, its replica can easily be summoned in place of the original file or datum.

As you can imagine, replicating data shields against failure but requires significant volumes of storage, which is expensive. In fact, default replication requires an additional 200 percent in storage space, along with other resources such as network bandwidth when writing the data.

Hadoop 3.X’s move to erasure coding resolves the storage issue while maintaining the same level of fault tolerance. In other words, erasure coding helps protect data as effectively as replication but takes up far less storage. In fact, erasure coding is estimated to reduce the storage cost by 50 percent – a huge financial boon for enterprises moving to Hadoop 3.X. With Hadoop 3.X, enterprises will be able to store twice as much data on the same amount of raw storage hardware.
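
To put rough numbers on those percentages, the back-of-the-envelope calculation below compares triple replication with a 6+3 Reed-Solomon layout, one of the erasure coding policies available in Hadoop 3.x; the 100 TB workload is purely illustrative, and actual overheads vary by policy.

```python
# Back-of-the-envelope storage overhead comparison; the 100 TB workload is illustrative.
usable_tb = 100

# Triple replication: 3 copies of every block -> 200% overhead.
replicated_raw = usable_tb * 3

# Reed-Solomon 6+3 erasure coding: 3 parity blocks per 6 data blocks -> 50% overhead.
ec_raw = usable_tb * (6 + 3) / 6

print(f"3x replication:      {replicated_raw:.0f} TB raw ({replicated_raw/usable_tb - 1:.0%} overhead)")
print(f"RS(6,3) erasure code: {ec_raw:.0f} TB raw ({ec_raw/usable_tb - 1:.0%} overhead)")
print(f"raw storage saved:    {1 - ec_raw/replicated_raw:.0%}")
```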

That being said, enterprises updating to Hadoop 3.X will face significant roadblocks to ensure that their data remains accessible and intact during a complicated migration process.

Anticipating Challenges Ahead

For those of us who experienced the conversion from Hadoop 1.X to Hadoop 2.X, it was a harrowing one, requiring a complete unload of the Hadoop environment data and a complete re-load onto the new system. That meant long periods of data inaccessibility and, in some cases, data loss. Take a typical laptop upgrade and multiply the pain points thousand-fold.

Data loss is no longer a tolerable scenario for today’s enterprises and can have huge financial, not to mention reputational, implications. However, most enterprises adopted Hadoop after its last revamp and so avoided the headaches associated with major upgrades involving data storage and processing. These enterprises may not anticipate the challenges ahead.

The looming migration can have potentially dire implications for today’s enterprises. A complete unload and re-load of enterprises’ data will be expensive, painful and fraught with the risk of data loss. Without anticipating the headaches in store for the upcoming migration, enterprises may forgo the necessary measures to ensure the accessibility, security and protection of their data.

Navigating the Migration Successfully

The good news is that there is a simple, actionable step enterprises can take to manage migration and safeguard their data against loss, corruption and inaccessibility.

Enterprises need to ensure that their current system does not require a complete unload and reload of their data. Most systems do require a complete unload and reload, so it is crucial that enterprises understand their current system and its capabilities when it comes to the next Hadoop migration.

If the enterprise were on Isilon for Hadoop, for example, there would be no need to unload and re-load its data. The enterprise would simply point the newly upgraded computer nodes to Isilon, with limited downtime, no re-load time and no risk for data loss.

Isilon for Hadoop helps enterprises ensure the accessibility and protection of their data through the migration process to an even stronger, more efficient Hadoop 3.X. While I’m eager for the next revamp of Hadoop and its tremendous storage improvements, today’s enterprises need to take precautionary measures before the jump to protect their data and ensure the transition is as seamless as possible.
