Author Archive

Hadoop Grows Up: How Enterprises Can Successfully Navigate its Growing Pains

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

If you’d asked me 10 years ago whether enterprises would be migrating to Hadoop, I would’ve answered with an emphatic no. Slow to win over enterprise customers and named after a toy elephant, the framework did not, at first glance, look ready for mass commercialization or adoption.

But enterprise adoption of Hadoop has been phenomenal. As an open-source software framework, Hadoop gives enterprises the ability to store and process unprecedented volumes of data, a capability today’s enterprises sorely need. It has effectively become the default standard for storing, processing and analyzing mass quantities of data, from hundreds of terabytes to petabytes.

While the adoption and commercialization of Hadoop is remarkable and an overall positive move for enterprises hungry for streamlined data storage and processing, enterprises are in for a significant challenge with the migration from Hadoop 2.X to 3.X.

Most aren’t sure what to expect, and few lived through the pain points of the previous major migration. Hadoop has “grown up,” in that some of the world’s largest enterprises now rely on it, but it still lacks a non-disruptive path for jumping major releases.

This next migration, only a few short years away, will have dramatic implications for the storage capabilities of today’s insurance companies, banks and largest corporations. These organizations must begin planning for the change now to ensure that their most valuable asset, their data, remains intact and accessible in an “always on” culture that demands it.

Why the Migration Matters

First, let’s explore the significant benefits of the migration and why, despite the headaches, this conversion will ultimately be beneficial for enterprises.

One of the key benefits of Hadoop 3.X is erasure coding, which dramatically decreases the amount of storage needed to protect data. In a traditional Hadoop deployment, data is protected by replication: each block of a file is stored multiple times (three times by default) to guard against loss. If one copy becomes lost or corrupted, another replica can be read in its place.

As you can imagine, replication shields against data failure, but it is expensive because it multiplies the storage required. In fact, default triple replication requires an additional 200 percent of storage space, plus other resources such as network bandwidth when the data is written.

Hadoop 3.X’s move to erasure coding resolves the storage issue while maintaining the same level of fault tolerance. In other words, erasure coding protects data as effectively as traditional replication but takes up far less storage. In fact, erasure coding is estimated to reduce storage cost by 50 percent, a huge financial boon for enterprises moving to Hadoop 3.X: they will be able to store twice as much data on the same amount of raw storage hardware.
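The arithmetic behind these figures can be checked with a short sketch. It compares default triple replication with a Reed-Solomon layout of 6 data units and 3 parity units, assumed here because RS(6,3) is the default erasure coding policy in Hadoop 3 HDFS; the 100 TB dataset size is an arbitrary illustration, not a measured workload.

```python
def replication_raw(logical: float, factor: int = 3) -> float:
    """Raw storage consumed when every block is stored `factor` times."""
    return logical * factor


def erasure_coded_raw(logical: float, data_units: int = 6, parity_units: int = 3) -> float:
    """Raw storage consumed by a Reed-Solomon (data_units, parity_units) layout.

    Every `data_units` blocks of data carry `parity_units` extra parity blocks.
    """
    return logical * (data_units + parity_units) / data_units


def overhead_pct(raw: float, logical: float) -> float:
    """Extra storage beyond the logical data size, as a percentage."""
    return (raw - logical) / logical * 100


logical_tb = 100.0  # illustrative dataset size in TB
rep = replication_raw(logical_tb)
ec = erasure_coded_raw(logical_tb)

print(f"3x replication: {rep:.0f} TB raw, {overhead_pct(rep, logical_tb):.0f}% overhead")
print(f"RS(6,3):        {ec:.0f} TB raw, {overhead_pct(ec, logical_tb):.0f}% overhead")
print(f"Raw storage saved: {(1 - ec / rep) * 100:.0f}%")
print(f"Data stored on the same hardware: {rep / ec:.1f}x")
```

The 50 percent saving follows from the ratio (6 + 3) / 6 = 1.5 raw bytes per logical byte under RS(6,3), versus 3 raw bytes per logical byte under triple replication.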

That being said, enterprises updating to Hadoop 3.X will face significant roadblocks in ensuring that their data remains accessible and intact during a complicated migration process.

Anticipating Challenges Ahead

For those of us who experienced it, the conversion from Hadoop 1.X to Hadoop 2.X was harrowing, requiring a complete unload of the Hadoop environment’s data and a complete reload onto the new system. That meant long periods of data inaccessibility and, in some cases, data loss. Take a typical laptop upgrade and multiply the pain points a thousandfold.

Data loss is no longer a tolerable scenario for today’s enterprises and can have huge financial, not to mention reputational, implications. Yet most enterprises adopted Hadoop after its last major revamp, forgoing the headaches associated with major upgrades to data storage and processing. These enterprises may not anticipate the challenges ahead.

The looming migration could have dire implications for today’s enterprises. A complete unload and reload of enterprise data will be expensive, painful and fraught with the risk of data loss. Enterprises that do not anticipate the headaches of the upcoming migration may forgo the measures necessary to ensure the accessibility, security and protection of their data.

Navigating the Migration Successfully

The good news is that there is a simple, actionable step enterprises can take to manage migration and safeguard their data against loss, corruption and inaccessibility.

Enterprises need to ensure that their current system does not require a complete unload and reload of their data. Most systems do require a complete unload and reload, so it is crucial that enterprises understand their current system and its capabilities when it comes to the next Hadoop migration.

If the enterprise were on Isilon for Hadoop, for example, there would be no need to unload and reload its data. The enterprise would simply point the newly upgraded compute nodes to Isilon, with limited downtime, no reload time and no risk of data loss.
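In stock Hadoop, the setting that tells compute nodes which filesystem to use by default is the `fs.defaultFS` property in `core-site.xml`. The sketch below renders that one property to show what “pointing” upgraded nodes at shared storage amounts to. It assumes the nodes reach Isilon through an HDFS URI; the host name `isilon-zone.example.com` is hypothetical, and port 8020 is assumed because it is the conventional HDFS RPC port.

```python
import xml.etree.ElementTree as ET


def core_site_xml(default_fs: str) -> str:
    """Render a minimal core-site.xml containing only fs.defaultFS."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical host name and assumed port; substitute the cluster's real values.
print(core_site_xml("hdfs://isilon-zone.example.com:8020"))
```

Because the data stays where it is, the upgrade touches configuration on the compute side rather than the stored data itself.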

Isilon for Hadoop helps enterprises ensure the accessibility and protection of their data through the migration process to an even stronger, more efficient Hadoop 3.X. While I’m eager for the next revamp of Hadoop and its tremendous storage improvements, today’s enterprises need to take precautionary measures before the jump to protect their data and ensure the transition is as seamless as possible.

From Kinetic to Synthetic

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

Technology continues to evolve and drive disruption. By now, most of you have probably seen the meme being shared online that lists the biggest ride rental company as having no cars, the biggest accommodation company as having no property, and so on. The identity and fraud space has not escaped this trend either.

Several decades ago, identity fraud presented businesses with a challenge, but that challenge was very kinetic. A fraudster usually committed the fraud in person, often using forged documents, so fraud was a very physical, or kinetic, transaction.

Fast-forward to today, and kinetic fraud has greatly diminished in scope and impact; in its place, cyber fraud, committed through many different avenues, is burgeoning. After a number of recent cyber breaches, identity information and compromised payment methods such as credit cards are readily available on the dark web. These identity elements now sell for extremely low prices, but it is the sheer volume of this data that ultimately makes it financially rewarding for fraudsters.


Soon you won’t say Travel Safe, instead you’ll say Travel Smart!

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

As a frequent traveler myself, I can appreciate this situation. A lone traveler is enjoying a quiet evening in their hotel. As they unwind from the day, they peruse the local paper and are shocked to learn that transit strikes will dash their plans to return home the next day. All modes of public transportation will be shut down, stranding them at the worst possible moment. There are certainly other ways to reach the airport, but 5x surge pricing on their popular ride-sharing app makes it an expensive trip, and there is an expectation that ride-sharing drivers might face violence from striking transit workers.

All of this could have been avoided if the traveler’s company subscribed to alerts for pending situations. Situational awareness tools that monitor travel threats and pair them with traveler itineraries are an evolving field. They give the weary traveler advance warning to seek personal safety and adjust travel plans accordingly. In our traveler’s case, that warning would have allowed them to change plans in time to avoid this sticky situation.
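Pairing threats with itineraries is, at its core, a join on location plus a date-range overlap test. The following is a minimal sketch of that idea; the `TravelAlert` and `ItineraryStop` record types, the city-level matching and the sample data are all hypothetical, and a real situational awareness product would use richer location and timing data.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record types; real alert feeds and itinerary systems will differ.
@dataclass(frozen=True)
class TravelAlert:
    city: str
    start: date
    end: date
    description: str


@dataclass(frozen=True)
class ItineraryStop:
    traveler: str
    city: str
    arrive: date
    depart: date


def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """True when two inclusive date ranges share at least one day."""
    return a_start <= b_end and b_start <= a_end


def match_alerts(alerts: list[TravelAlert], stops: list[ItineraryStop]) -> list[tuple[str, str]]:
    """Pair each itinerary stop with every alert in the same city during the stay."""
    return [
        (stop.traveler, alert.description)
        for stop in stops
        for alert in alerts
        if stop.city.casefold() == alert.city.casefold()
        and overlaps(stop.arrive, stop.depart, alert.start, alert.end)
    ]


alerts = [TravelAlert("Lisbon", date(2024, 5, 10), date(2024, 5, 11),
                      "Transit strike: all public transport suspended")]
stops = [
    ItineraryStop("traveler-a", "Lisbon", date(2024, 5, 7), date(2024, 5, 10)),
    ItineraryStop("traveler-b", "Madrid", date(2024, 5, 9), date(2024, 5, 12)),
]
for traveler, warning in match_alerts(alerts, stops):
    print(f"{traveler}: {warning}")
```

Only the traveler whose stay overlaps the strike in the affected city is warned; the inclusive overlap test catches a strike that begins on the departure day, which is exactly the scenario above.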



