What’s Next for Hadoop? Examining its Evolution, and its Potential

John Mallory

CTO of Analytics at EMC Emerging Technologies Division

First, consider that data informs nearly every decision an organization makes today. Customers across virtually every industry expect to interact with businesses wherever they go, in real time, across a myriad of devices and applications. The result is a mass of information that must be culled, sorted and organized to find actionable data that drives businesses forward.

This evolution mirrors much of what’s taking place in the Apache Hadoop ecosystem as it continues to mature and find its place among a broader business audience.

The Origins & Evolution of Hadoop

Let’s look at the origins of Hadoop as a start. Hadoop started out as a framework for big batch processing, which is exactly what early adopters like Yahoo! needed – an engine that could crawl all of the content on the Internet to help build big search engines, then take the outputs and monetize them with targeted advertising. That type of use case is entirely predicated on batch processing at a very large scale.
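The batch pattern Hadoop popularized is map/shuffle/reduce. As a minimal sketch (plain Python, not Hadoop’s actual Java API), here is the classic word-count job: map each line to (word, 1) pairs, sort so equal keys are adjacent (Hadoop’s shuffle phase), then reduce each group:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in an input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum the counts for a single word.
    return (word, sum(counts))

def word_count(lines):
    # Shuffle phase: sort intermediate pairs so equal keys are adjacent,
    # mimicking Hadoop's sort-and-shuffle between map and reduce.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(key, (c for _, c in group))
                for key, group in groupby(pairs, key=itemgetter(0)))
```

In a real cluster the map and reduce functions run in parallel across many machines, but the data flow is exactly this.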

The next phase centered on how Hadoop would reach a broader customer base. The challenge was to make Hadoop easier for a wider audience to use. It’s possible to do very rich processing with Hadoop, but it also has to be programmed very specifically, which makes it difficult for enterprise users to apply to business intelligence or reporting. This drove the trend toward SQL on Hadoop, which was the big thing about two years ago, with companies like Cloudera, IBM, Pivotal and others entering the space.

The third major phase in Hadoop’s evolution, which emerged late last year, was making it enterprise-grade – ensuring data security and governance as well as data management. This gave enterprises confidence that a still-emerging framework like Hadoop offered at least as much security as existing enterprise analytics tools such as data warehouses.

Indeed, the workloads best suited for Hadoop are changing. Hadoop is at its best in a Hadoop-centric environment doing bulk processing of data. But another use case that is rapidly becoming a higher priority is processing massive amounts of data in close to real time. This will require near-real-time stream processing with real-time decision making.
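One core building block of that near-real-time style is the time-based sliding window: keep only recent events and answer aggregate queries over them as data arrives. A minimal sketch in plain Python (illustrative only – real systems like stream processors distribute this across nodes):

```python
from collections import deque
import time

class SlidingWindow:
    """Retain events from the last `window_s` seconds and answer
    aggregate queries over them as new events stream in."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp, value) pairs in arrival order

    def add(self, value, ts=None):
        # Record an event, then drop anything that has aged out.
        ts = time.time() if ts is None else ts
        self.events.append((ts, value))
        self._evict(ts)

    def _evict(self, now):
        # Events older than the window boundary no longer count.
        while self.events and self.events[0][0] < now - self.window_s:
            self.events.popleft()

    def total(self):
        # Aggregate over only the events still inside the window.
        return sum(v for _, v in self.events)
```

For example, with a 60-second window, an event at t=0 stops counting once an event arrives at t=90 – the decision logic always sees a fresh view of the stream.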

Why IoT + Hadoop is the Next Frontier

Much of this evolution, in my view, is being driven by movements such as the Internet of Things and the architecture necessary to support it. IoT is a huge development from an architectural framework perspective – there’s tremendous fundamental change that has to happen to support IoT from not only the technology side, but also the business factors for its adoption, utilization and deployment.

To get Hadoop to the next stage of its evolution and reduce many of the barriers to its widespread adoption, it’s also important to close the skills gap with users. There are two ways to approach this, in my opinion. The first is to make Hadoop much more like existing systems, minimizing the need for training (and retraining) for the users who will be adopting the technology.

The second is to leverage the tools and expertise of the open source development community, using intelligent modules and solution recipes to automate several functions within Hadoop. This falls into the area of machine learning – making the software more intelligent, almost self-training.

Overall, Hadoop’s evolution follows the same progression as software development as a whole. Originally you had to write your own low-level algorithms; then that got automated; then more object-oriented programming modules came into play.

It will be interesting to see how Hadoop continues to refine its framework. Certainly, as data plays a more strategic role in today’s businesses, the tools to manage it must also grow to accommodate a wider range of users in both technical and business fields.

A Modern All-IP Infrastructure for your Media Digital Transformation

Charles Sevior

Chief Technology Officer at EMC Emerging Technologies Division

Anybody who has attended one of the major media/broadcast trade shows over the last 6–12 months knows that most of the large solution vendors are now building solutions designed to take your media workflow into an “All-IP” future state – away from an infrastructure built on dedicated hardware like video/audio routers and industry-specific cabling such as SDI with embedded audio. The vendors are working hard to adopt standards from SMPTE, VSF and other bodies and to prove system interoperability. The task of replacing the reliable and rugged (but inflexible and single-purpose) SDI backbone in your facility with a robust and interoperable All-IP fabric has been taken up by AIMS – the Alliance for IP Media Solutions. This is an industry-led collaboration of some of the largest M&E vendors, including:

  • Grass Valley (Belden)
  • Imagine Communications
  • Snell Advanced Media
  • Evertz
  • EVS
  • Lawo
  • Sony
  • Cisco
  • Arista
  • Harmonic

But beyond replacing your SDI cable and video router with Cat-6 and a network router, this multi-vendor industry alliance is also focused on fully-virtualised, software-defined media solutions. What this means is that solution vendors are now re-engineering their applications to “behave nicely” in the virtualised and containerised environments commonly adopted in enterprise IT and cloud-scale operations today. This is different to the bespoke engineering of the past, which assumed full control over “bare metal” hardware. Whilst virtualisation overheads may be seen as introducing some performance constraints, the result is actually a more robust solution that can leverage the enormous IT-scale resources available today, with improved monitoring and control, graceful failure and rapid service replacement.


Breakfast with ECS: The Swiss Army Knife of Cloud Solutions

Corey O'Connor

Product Marketing Manager at EMC² ETD

Welcome to another edition of Breakfast with ECS, a series where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), EMC’s cloud-scale storage platform.

A Swiss army knife is a multi-layered tool equipped with a variety of attachments that serve many different functions. When first introduced in the late 1880s, it revolutionized the way soldiers performed their daily tasks – anything from disassembling service rifles to opening canned rations in the field. Fast forward to 2016: the uses of the Swiss army knife may have changed quite a bit, but the concept of consolidating various components into a single multi-purpose tool has certainly influenced organizations and industries across the world.

EMC’s Elastic Cloud Storage (ECS) is without question the Swiss army knife of cloud solutions. ECS revolutionizes storage management by consolidating varied workloads for object, file, and HDFS into a single, unified system. You can manage both traditional and ‘next-gen’ or ‘cloud-native’ applications on a platform that spans geographies and acts as a single logical resource. Just like a Swiss army knife, ECS maximizes capacity by packing a lot into a tiny space: the ECS Appliance can squeeze sixty 8TB drives into a standard 4U DAE, with up to 4PB of storage in a single rack – a highly dense platform with a very economical data center footprint.
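The density figures above are easy to sanity-check. An illustrative back-of-the-envelope calculation (using decimal terabytes and only the numbers quoted here, not a full rack layout with controller nodes and switches):

```python
# Illustrative capacity arithmetic for a dense storage rack, based on
# the figures quoted above: sixty 8 TB drives per 4U disk enclosure (DAE).
DRIVES_PER_DAE = 60
DRIVE_TB = 8
DAE_HEIGHT_U = 4

raw_tb_per_dae = DRIVES_PER_DAE * DRIVE_TB          # raw TB in one 4U DAE

# How many DAEs (and rack units) would it take to reach 4 PB raw?
target_pb = 4
daes_needed = -(-target_pb * 1000 // raw_tb_per_dae)  # ceiling division
rack_units = daes_needed * DAE_HEIGHT_U
```

That works out to 480 TB per 4U enclosure, so roughly nine enclosures (36U of disk shelves) reach the 4PB-per-rack figure – comfortably inside a standard rack.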

EMC provides a suite of cloud-enabling ‘tools and attachments’ to the ECS platform that deliver specialized functionality for cloud tiering, long-term retention, and archiving use cases. Here’s a quick review of these solutions:

  • CloudPools – extends the data lake to the cloud with transparent tiering of cold/inactive data from EMC Isilon clusters to low-cost ECS cloud storage
  • CloudBoost – moves EMC Data Protection Suite (Avamar & NetWorker) archives to the cloud for long-term retention
  • CloudArray – a cloud-integrated storage tier that extends cost-effective cloud capacity to high-performance storage arrays
  • DD Cloud Tier – enables simple cloud tiering of deduplicated data from EMC Data Domain to ECS for long-term retention

All of these solutions integrate seamlessly with existing EMC products and investments, with the same objective: leveraging cloud storage to lower cost and free up high-performance Tier 1 resources. They can also target public cloud vendors like Virtustream or other storage providers for added choice and flexibility. Whether you’re an enterprise or a cloud service provider with a private, public, or hybrid strategy – and whether you have a CapEx or OpEx priority – EMC has a flexible cloud solution that can manage your workloads and applications for today and tomorrow.

Want more on ECS? Download the latest fully containerized version of ECS for FREE for non-production use by visiting www.emc.com/getecs.

Soon you won’t say Travel Safe; instead, you’ll say Travel Smart!

Keith Manthey

CTO at EMC Emerging Technologies Division


As a frequent traveler myself, I can appreciate this situation. A lone traveler is enjoying a quiet evening in their hotel. As they unwind from the day, they peruse the local paper and are shocked to learn that their attempt at returning home the next day will be dashed by transit strikes. All modes of public transportation will be shut down, forcing an ill-timed exit from their current travel stop. There are certainly other ways for the traveler to reach the airport, but the 5x surge pricing on their popular ride-sharing application makes it an expensive trip. There is also an expectation that ride-sharing drivers might face violence from striking transit workers. All of this could have been avoided if the traveler’s company subscribed to travel alerts for pending situations. Situational awareness tools that monitor travel threats and pair them with traveler itineraries are an evolving field – an advance warning that lets the weary traveler seek personal safety and adjust their travel plans in time to avoid this sticky situation.


Breakfast with ECS: Doubling Down on Docker

Corey O'Connor

Product Marketing Manager at EMC² ETD

Welcome to another edition of Breakfast with ECS, a series where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), EMC’s cloud-scale storage platform.

Unless you’ve been living under a rock, I’m sure you’ve heard of Docker at this point. If you haven’t, it’s time to dust yourself off and understand that a Docker container wraps a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in. Genius, right?
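To make that concrete, here is a minimal, hypothetical Dockerfile for a small Python application (the file names are placeholders). The image declares its runtime, dependencies, and code, so the container runs identically on any Docker host:

```dockerfile
# Hypothetical example: bundle a Python app with its runtime and
# dependencies so it runs the same on any Docker host.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]
```

Build it once with `docker build -t myapp .` and run it anywhere with `docker run myapp` – the filesystem inside the container never changes between environments.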

Containers have been around for quite some time now, but the extra juice worth squeezing came from Docker’s ability to provide total isolation of resources, packaging and automating applications more effectively than ever before. Docker gives system administrators and developers the ability to package any kind of software, with all its dependencies, into a container. Simply put, this resource efficiency standardizes each container and promotes massive scalability – which plugs in very nicely for cloud-scale, geo-distributed systems such as EMC’s Elastic Cloud Storage (ECS). In the early stages of product development, EMC took a bet on Docker containers, and that bet has certainly paid off.


Examining the Internet of Things: What’s hype? What’s real?

John Mallory

CTO of Analytics at EMC Emerging Technologies Division

The Internet of Things is one of the biggest buzzwords in technology today, and indeed, it has the potential to be a truly transformational force in the way we live and work.

However, if you peel back the “potential” and excitable future-speak surrounding IoT and look at the reality of where it is today, the story is much, much different. Yes, Internet-enabled “things” ranging from phones to watches to cars are getting smarter by being able to access, share and interpret data in new ways. But in our enthusiasm to embrace a Jetsons-like future powered by IoT, we’re losing sight of the infrastructure required (at both the literal hardware and the organizational/institutional levels) to actually elevate this technology beyond buzzword status.

Consider, for example, the hype cycle over “big data” about three years ago, when it became the industry’s hot topic without much, well, data to back it up. Hadoop is another example – it too had early adopters, but even now it is only being rolled out in Fortune 1000/5000 companies, and organizations are still struggling with how to monetize it.

