Archive for the ‘Hadoop’ Category

From Kinetic to Synthetic

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

Technology is continuing to evolve and drive disruption. By now, most of you have probably viewed the meme being shared online that lists the biggest ride rental company as having no cars, the biggest accommodation company as having no property, etc. The identity and fraud space has not escaped this trend either.

Several decades ago, identity and fraud were presenting businesses with a challenge, but the challenge was very kinetic. A fraudster usually committed the fraud in person and often used forged documents to commit the crime: fraud was therefore a very physical or kinetic transaction.

Fast-forward to today and kinetic fraud has greatly reduced in scope and impact; in its place, cyber fraud (committed via many different avenues) is burgeoning. Following a number of recent cyber breaches, identity information and compromised payment methods like credit cards are readily available on the dark portions of the web. These identity elements sell for extremely low prices these days, but it’s the volume of this data that will ultimately be financially rewarding to the fraudsters.


What’s Next for Hadoop? Examining its Evolution, and its Potential

John Mallory

CTO of Analytics at EMC Emerging Technologies Division

In my last blog post, I talked about one of the most popular buzzwords in the IT space today – the Internet of Things – and offered some perspective in terms of what’s real and what’s hype, as well as which use cases make the most sense for IoT in the short-term.

Today I’d like to address the evolution of Apache Hadoop, and the factors that will drive Hadoop adoption to a wider audience beyond early use cases.

First, consider that data informs nearly every decision an organization makes today. Customers across virtually every industry expect to interact with businesses wherever they go, in real time, across a myriad of devices and applications. This results in piles and mounds of information that need to be culled, sorted and organized to find actionable data to drive businesses forward.

This evolution mirrors much of what’s taking place in the Apache Hadoop ecosystem as it continues to mature and find its place among a broader business audience.

The Origins & Evolution of Hadoop

Let’s look at the origins of Hadoop as a start. Hadoop originally started out as a framework for big batch processing, which is exactly what early adopters like Yahoo! needed – an algorithm that could crawl all of the content on the Internet to help build big search engines and then take the outputs and monetize them with targeted advertising. That type of use case is entirely predicated on batch processing on a very large scale.

The next phase centered on how Hadoop would reach a broader customer base. The challenge there was to make Hadoop easier to use by a wider audience. Sure, it’s possible to do very rich processing with Hadoop, but it also has to be programmed very specifically, which can make it difficult for enterprise users to use for business intelligence or reporting. This drove the trend around SQL on Hadoop, which was the big thing about two years ago with companies like Cloudera, IBM, Pivotal and others entering the space. (more…)

Soon you won’t say Travel Safe, instead you’ll say Travel Smart!

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

As a frequent traveler myself, I can appreciate this situation. A lone traveler is enjoying a quiet evening in their hotel. As they unwind from the day, they peruse the local paper. They are shocked to learn that their attempt at returning home the next day will be dashed by transit strikes. All modes of public transportation will be shut down, causing an ill-timed exit from their current travel stop. There are certainly other ways for the traveler to reach the airport, but the 5x surge pricing on their popular ride sharing application makes it an expensive trip. There is also an expectation that the ride sharing application’s drivers might face violence from striking transit workers. This all could have been avoided if their company subscribed to a travel alert service for pending situations. Situational awareness tools that can monitor travel threats and pair them with traveler itineraries are an evolving field. Such a tool gives the weary traveler advance warning to seek personal safety and adjust their travel plans accordingly. In the case of our weary traveler, an advance warning would allow them to change their travel plans in time to avoid this sticky situation.


Breakfast with ECS: ECS and Hadoop – The More You Know…

Priya Lakshminarayanan

Director of Product Management at ASD, EMC


Welcome to another edition of Breakfast with ECS, a series where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), EMC’s cloud-scale storage platform.

In a previous blog of this series, you heard from my colleague Nikhil about the challenges of storing and reasoning over immense amounts of globally generated data. The blog also touched upon how EMC’s Elastic Cloud Storage (ECS) system can provide a scale-out platform for global access, disaster recovery and analytics.

I recently sat down with Nikhil to talk more about ECS’s Analytics capabilities.

In the video below, we talk about major themes we hear from our customers about Hadoop storage. Given that Hadoop has evolved around HDFS as its primary filesystem, what would be the impetus for enterprises to choose ECS HDFS as their analytics backend instead?

We then proceeded to whiteboard a typical Hadoop-ECS deployment. Nikhil tells us about ECS’s multi-protocol access, its Java HCFS implementation, the deployment and configuration experience with Apache Ambari, and performance considerations.

Want more? Try ECS for FREE for non-production use by visiting

Ripple effects of global data: global Hadoop.

Ashvin Naik

Cloud Infrastructure Marketing at Dell EMC

Strata + Hadoop World Singapore left me pumping my fists, with proof that recipes like the Gartner value escalator or our very own transformation roadmap provide a simple, actionable plan to build hindsight, provide insight and move towards the foresight that enables data-driven transformation.

The example of transformation came from quite an unexpected keynote speaker, Rishi Malhotra, CEO and co-founder of Saavn, in what he termed data ripple effects.

Rishi’s talk strengthened my belief that data large and small will be created everywhere and consumed for purposes not yet imagined. Modern enterprises have to treat all data as raw material for future business expansion, if not industry disruption. You have to capture, protect and use it for its current purposes but also keep it available for future applications in its native, pristine form.

The two popular options for capturing and protecting data in geo-distributed Hadoop architectures are:

  • The public cloud and
  • The Hadoop HDFS storage

Public cloud storage is global, built on the web and caters to the new generation of applications at an attractive cost to store. However, the fine-print costs for every touch, withdrawal or move (egress fees, much like the charges of the banks or telephone companies of yesterday) can quickly add up.

Traditional Hadoop DAS storage is single-protocol (HDFS), single-site storage that needs edge extensions for transformation and conversion to cater to modern applications. Businesses end up having multiple silos of data stores as they add applications and uses to their existing data.

There was quite an interest in protecting data, with sessions on “Multi tenant Hadoop across geographically distributed data centers” and “Hadoop in the cloud: An architectural How-to” in addition to our very own EMC session on “Hadoop Everywhere: Geo-distributed storage for big data”. Together they indicate a growing interest in addressing the challenges posed by modern mobile applications and the Internet of Things.

Hadoop Geo-Distribution

Attendees, prospects and customers at Strata have been instrumental in validating the view that data is the asset of the future and needs to be captured, stored, protected and made available for future uses in a single shared global system. IT managers are looking to decouple storage from the application stack with a disaggregated stack that is geo-distributed for protection as well as local access, with strong data consistency.

EMC Elastic Cloud Storage provides storage technologies that are simple, easy to manage, protect and scale into the exabyte range with built-in multi-protocol access for modern applications including S3, OpenStack Swift and HDFS.

Click here to learn more about the ECS solution for Hadoop or join our vibrant community on Twitter.


What do Analytics and the Suez Canal have in common?

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

Suez Canal

1859: Egyptian workers under French engineers begin construction of the Suez Canal. A canal across the Isthmus of Suez would cut the ocean distance from Europe to Asia by up to 6,000 miles, and it could be built at sea level, without any locks. Circumventing the additional travel would reduce risk, the overhead of additional supplies, and the number of sailors required. Completed ten years later, the canal had an immediate effect on world trade. This wonder rapidly shrank the globe, not only geographically but also in the time it traditionally took to gain business and economic benefits.

The economic benefits of the Suez Canal and In-Place Hadoop analytics

My metaphor here is that, like sailing the old 12,000+ mile route around the coast of Africa, the traditional method of storing and moving data for analysis is a long and arduous journey that affects your business and economic benefits. Just as the pre-Suez Canal journey from Europe to Asia required significantly more time, larger ships, more crew and more provisions, the traditional route to analytics requires more time (copying and moving data), bigger ships (3x storage capacity), more crew (IT resources) and more provisions (overhead). Now, imagine taking the EMC data lake route that reduces overhead, takes much less time, and offers increased flexibility. The EMC Isilon data lake with its native Hadoop Distributed File System (HDFS) support is the modern route to actionable results. It effectively brings Hadoop to where your data exists today, as opposed to having to ship and replicate your data to a separate Hadoop stack for analysis.

The Open Data Platform Initiative (ODPi), IBM and EMC Isilon

The Isilon data lake’s shared storage architecture natively supports HDFS and the ODPi common platform. IBM, EMC, Pivotal and Hortonworks established the ODPi to create a standardized, common platform for Hadoop analytics that enables organizations to realize business results more quickly. Which brings us to the EMC and IBM analytics collaboration. Because IBM BigInsights is part of the ODPi, there is now another choice for in-place analytics with the EMC data lake. It quickly became evident to both EMC and IBM that there was strong customer demand for IBM BigInsights and EMC Isilon to align on a data lake approach to analytics. The EMC and IBM collaboration enables analytics on your data right where it is, within the EMC Isilon data lake, while IBM BigInsights provides the separate compute resources that analyze the data. Now you’re on the expedited route to business analytics with EMC Isilon and IBM.

Whether you are looking to gain a 360-degree view of your customers, attempting to prevent fraud in the financial markets, or making smarter infrastructure investments, the increased efficiencies of the partnership allow you to be nimble in understanding and reacting to what your data is telling you.

About 15,000 ships make the 11-hour journey through the canal each year. It’s estimated that the canal bears roughly 8 percent of the world’s shipping, and it is recognized as one of the most important waterways in the world. Forrester Research¹ predicts that big data analytics will be the number 2 priority of corporations, and states that Hadoop has already disrupted the economics of data. Just as the Suez Canal offers key business benefits for trade between Europe and Asia, so does in-place analytics. Here’s how:

  • No moving and copying of data
  • No 3X replication of data
  • Increased storage utilization efficiency (to an average of 80%)
  • Enterprise data resiliency and availability
  • Enterprise grade security features
  • Quicker time to business insight
  • Smarter infrastructure investments
  • Reduction of CAPEX and OPEX
  • Increased choice and flexibility
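
To put rough numbers on the replication and utilization points in the list above, here is a minimal back-of-the-envelope sketch. The 3x replication factor and the 80% utilization figure come from the list; the 100 TB dataset size and the function names are hypothetical, chosen only for illustration, and real capacity planning would involve more factors than these two.

```python
def raw_tb_with_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return usable_tb * copies


def raw_tb_at_utilization(usable_tb: float, utilization: float) -> float:
    """Raw capacity needed when only `utilization` (0-1) of raw disk holds data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return usable_tb / utilization


if __name__ == "__main__":
    dataset_tb = 100.0  # hypothetical dataset size, not from the article

    replicated = raw_tb_with_replication(dataset_tb, copies=3)    # 3x replication
    shared = raw_tb_at_utilization(dataset_tb, utilization=0.80)  # 80% utilization

    print(f"3x replication:  {replicated:.0f} TB raw")
    print(f"80% utilization: {shared:.0f} TB raw")
    print(f"Raw capacity avoided: {replicated - shared:.0f} TB")
```

Under these assumptions, the same 100 TB of data needs about 300 TB of raw disk with 3x replication versus about 125 TB at 80% utilization.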

In summary, back to the metaphor, the modern route to analytics saves on time to benefit, and can be achieved with smaller ships, less crew, and with fewer provisions required.

Where can I get more details?

The EMC Hadoop Starter Kit for IBM BigInsights is available and has instructions on how to build and deploy IBM BigInsights Open Platform with EMC Isilon. You can also learn more about the Hadoop enabled EMC Data Lake here.

1 Source: Forrester Predictions 2015: Hadoop Will Become a Cornerstone of Your Business Technology Agenda


