Ceph is free if your time is worth nothing!

Jyothi Swaroop

Director, Product Marketing at EMC

It is ironic that we grow up listening to our parents tell us “Nothing in life is free,” yet the moment someone claims to have or do something for “free,” we forget that simple truth.

Is anything ever really free?
There are offerings out there today that claim they are open source, free and always will be. However, if we remember what Mom and Dad said – then we need to look deeper into this. Time, overhead and hardware requirements to run these open source solutions are not free.

For this discussion, let us take a look at Ceph and Ceph “Enterprise” editions. Ceph is an open source distributed object store and file system which claims to be free. As an open source platform the code is indeed free but if Ceph is free, why would any company pay to acquire a “commercial wrapper” for it, such as Inktank? When it comes to open source, large companies make their money by selling “enhanced” versions of the software along with professional consulting, services, and support.

Enterprise versions are not free and are often expensive. Customers pay more for server hardware, server OS licenses and disk drives. Licensing and support can run as much as $4K per server.

Now, some will say, “I will go with the free version of the open source solution and not the ‘enhanced’ or ‘enterprise’ edition offered – it’s more cost effective and I can support it myself.” It is definitely an option, and in some instances it may make sense, but before you make that commitment, ask yourself:
• Can I get 24×7 worldwide support – the support I need, when I need it?
• Do I have to wait for a community to help solve my problem and even if a fix is suggested, will it work for me, in my environment?
• Will my customers wait till tomorrow or the next week for a fix?
• Does ‘ongoing support by committee’ really work?
• What am I willing to give up?

If it is free – do you get what you pay for?

When it comes to software-defined scale out block storage for high performing applications and / or delivering Infrastructure-as-a-Service (IaaS), “free” may not be better. Will you simply be getting what you pay for?

Starting with installation: as a software-defined offering, Ceph does not constrain or confine you the way current hyper-converged appliances do. However, installation is extremely complex. Architecting and running Ceph as a storage platform requires deep Linux and Ceph expertise and experience. It requires a multi-module, multi-step deployment process (with different processes for each OS), which complicates management and incurs a larger performance hit to the underlying hardware. Ceph also takes a ‘non-native’ layered approach to delivering block storage, where block is built on top of object. David Noy, VP Product Management at EMC, pointed out in his blog last month that with a layered approach “problems come when you have a system that is designed to optimize around the underlying abstraction and not the service layered on top”. This is evident in Ceph’s approach to block (RADOS Block Device – RBD), which has extreme overhead, resulting in high latency and an inability to exploit Flash media.

OK, so you know there will be a great deal of work to set up and manage Ceph. You still feel you are ready to deal with this cryptic approach, including compiling, decompiling and manually editing CRUSH maps; limited system visibility through a command line interface (CLI); and even manual placement group (PG) planning and repair. You believe the free approach, even with all of this, will meet your needs. Maybe, but let’s not forget what really matters. When delivering IaaS or high performance applications, delays in response are simply not acceptable to your customers or users. So how does Ceph measure up where it really counts: performance and scalability?
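To give a feel for the manual placement group (PG) planning mentioned above, here is a minimal Python sketch. It assumes a commonly cited rule of thumb that is not stated in this post: aim for roughly 100 PGs per OSD, divide by the replica count, and round up to the next power of two. The function name `suggested_pg_count` and the `target_pgs_per_osd` default are illustrative assumptions, not part of any Ceph tool.

```python
def suggested_pg_count(num_osds: int, replicas: int, target_pgs_per_osd: int = 100) -> int:
    """Estimate a total PG count using the assumed rule of thumb:
    (OSDs x target PGs per OSD) / replicas, rounded up to a power of two."""
    if num_osds <= 0 or replicas <= 0 or target_pgs_per_osd <= 0:
        raise ValueError("all inputs must be positive")

    raw = num_osds * target_pgs_per_osd / replicas

    # Round up to the next power of two (the heuristic's usual final step).
    power = 1
    while power < raw:
        power *= 2
    return power


if __name__ == "__main__":
    # Hypothetical cluster: 30 OSDs with 3-way replication.
    print(suggested_pg_count(num_osds=30, replicas=3))
```

Even this simplified estimate has to be revisited by hand whenever drives or replicas are added, which is the kind of ongoing planning effort the paragraph above is pointing at.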

The Proof is in the Numbers
We recently tested Ceph against EMC ScaleIO and the findings were clear as day. Both were tested on the same hardware and configuration with the following assumptions:
• Test “SSD only” using a small volume that will fit the SSD capacity
• Test “SSD+HDD” using a small+large volume spanning HDD capacity
• Test SSD as Cache for HDD using a small+large volume spanning HDD capacity
• Test a Mixed workload of 70% Reads, 30% Writes, 8KB IOs
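
For readers who want to picture the mixed workload in the last assumption, the sketch below generates a stream of 8 KB operations with a 70/30 read/write split. It is not the harness used in the test. The random access pattern, the function name `mixed_workload`, and the fixed seed are assumptions made for illustration only.

```python
import random

BLOCK_SIZE = 8 * 1024  # 8 KB I/Os, per the test assumptions above


def mixed_workload(num_ops, volume_bytes, read_fraction=0.7, seed=0):
    """Yield (op, offset, size) tuples for a mixed read/write workload.

    Offsets are block-aligned and chosen uniformly at random across the
    volume (random access is an assumption; the post does not say).
    """
    rng = random.Random(seed)
    num_blocks = volume_bytes // BLOCK_SIZE
    if num_blocks <= 0:
        raise ValueError("volume must hold at least one block")

    for _ in range(num_ops):
        op = "read" if rng.random() < read_fraction else "write"
        offset = rng.randrange(num_blocks) * BLOCK_SIZE
        yield op, offset, BLOCK_SIZE


if __name__ == "__main__":
    ops = list(mixed_workload(num_ops=10_000, volume_bytes=1 << 30))
    reads = sum(1 for op, _, _ in ops if op == "read")
    print(f"reads: {reads / len(ops):.1%}, writes: {1 - reads / len(ops):.1%}")
```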

• ScaleIO achieved ~7X better performance than the best Ceph IOPS value for a drive-limited configuration
• ScaleIO achieved ~15X better performance than Ceph, when the drives are not the limit
• ScaleIO has ~24X better Response Time with an SSD only configuration
• ScaleIO can support the IOPS at one-third the latency of Ceph; as a result, there is no need to second-guess performance for applications you run on ScaleIO.

Similar to Ceph, EMC ScaleIO is a software-only solution that uses the local disks and LAN of existing commodity servers to realize a software-defined, scale-out SAN. However, EMC ScaleIO delivers elastic and scalable performance and capacity on demand, beyond what Ceph is capable of for enterprise deployments. ScaleIO also does not require additional servers for providing storage and supports multiple platforms, including OpenStack via a Cinder plugin. It requires 1/5th to 1/10th the number of drives that Ceph needs to deliver the same performance, which results in significant floor space and power savings.

The evidence speaks for itself when it comes to performance, scale and enterprise grade capabilities – sometimes you just get what you pay for. But don’t just take our word for it. Here is a perfect example of the kind of issues a company can face, including the potential loss of data, when delivering services with Ceph. Also, Enterprise Strategy Group (ESG) recently published a Lab Spotlight demonstrating extremely high IOPS performance and near-linear scaling capabilities of ScaleIO software on commodity hardware.

If you STILL want to be able to use software for free BEFORE you make a long-term, strategic commitment, EMC provides you the same opportunity with ScaleIO. Starting May 2015, EMC is offering a free download of ScaleIO, for non-production use, for as much capacity and time as you want. You can experience all of the features and capabilities and see for yourself why enterprise grade software-defined scale-out SAN with EMC ScaleIO is better than “free” with Ceph. Virtual Geek is ready! Are you?

Introducing CoprHD. EMC Changes the Game for Software-defined Storage Automation and Management

George Hamilton

Sr. Product Marketing Manager

EMC is certainly no stranger to open source. EMC and Pivotal are both founding members of the CloudFoundry Foundation. And EMC recently announced a $10 million investment and its first CloudFoundry dojo, based in Cambridge, MA, that will attract developers and facilitate the creation of applications on CloudFoundry.  In November, EMC announced the EMC OpenStack Reference Architecture Partner Program and partnerships with Canonical Ubuntu, Mirantis and Red Hat.  EMC also recently launched EMC {code} – the Community Onramp for Developer Enablement, which provides both EMC and community contributions of open source code, drivers, tools, samples, and more. EMC supports and contributes to open source in a number of ways, yet EMC is still considered a proprietary vendor. Well, if none of the above proves EMC’s open source bona fides, perhaps this will: On May 5th, EMC is moving EMC ViPR Controller development into the open source community.

This is big news. For the first time, EMC is taking a commercial product and releasing it to community-driven development.  The open source project, named CoprHD, makes the code for ViPR Controller – all the storage automation and control functionality – available in the open source community. Customers, partners, developers and other storage vendors can download, expand and contribute to CoprHD. EMC will continue to sell EMC ViPR Controller as a commercial offering enhanced with service, support, training, and more to help organizations quickly adopt software-defined storage.

It’s been an amazing journey. Two years ago, EMC announced and subsequently launched EMC ViPR Software Defined Storage. Two years later, the ViPR Controller code, now open source project CoprHD, will be open and available for download on GitHub. This signifies a fundamental change to EMC’s development model. All development for ViPR Controller and CoprHD will be done in the open source community, with EMC and others contributing. CoprHD is licensed under the Mozilla Public License 2.0 (MPL2.0), which encourages community sharing and requires anyone who modifies the source code to share those modifications with the community. EMC is also establishing free and frictionless access to CoprHD to facilitate community-driven collaboration that will accelerate and expand functionality and support for third party storage.

Why is EMC taking this step? EMC fundamentally believes that software-defined storage is a strategy, not a product. The goal of software-defined storage is to give customers choice of storage services and hardware platforms, make it all simple and less costly to manage, and eliminate proprietary lock-in. Making the ViPR Controller source code available as open source project CoprHD will accelerate development and increase support for non-EMC storage arrays and data protection technologies. It also strengthens CoprHD as a single, vendor-neutral API control point for software-defined storage automation.

This open source model of open, collaborative development is crucial to the future success of software-defined storage and storage automation and management. CoprHD and ViPR Controller will give customers choice, flexibility, and transparency. Purpose-built storage platforms from EMC and others will always remain data center necessities. But customers increasingly value more plug and play architectures – driven by software-defined solutions and standardized infrastructure – and will often sacrifice some level of efficiency to obtain best-of-breed features, more flexibility and lower switching costs. In the modern data center, successful storage vendors will compete on the merits of their solutions and deliver compelling customer experiences. As CoprHD and ViPR Controller extend support to more and more storage platforms, EMC welcomes this new competitive playing field. EMC is ready to lead in this new software-defined world.

Are you a developer who has contributed to a product in the open source community before? Are you planning on contributing to CoprHD? Are you a storage administrator or architect looking to evaluate and deploy CoprHD? If so, tell us about your experience! We invite you to join us on this new journey and share your discoveries…let’s see where it takes us.

Demystifying Software-Defined Storage and Hyperconvergence

David Noy

VP Product Management, Emerging Technologies Division at EMC

If you read the storage news these days you simply can’t miss a story around hyper-converged storage or yet another vendor looking to release a software version of its platform. If you believe Gartner, by 2019 about 70% of existing storage array products will become available in “software-only” versions. The information industry is quickly waking up to the fact that the thing that turns a cheap white box server into a branded product that commands high margins is the software. Increasingly, end users are looking to standardize on low cost servers in order to reduce operational costs and obtain better purchasing leverage to get better pricing. Some web scale customers do this to the extreme and from that the Open Compute Project was born.

To participate in this market, data storage technology companies have adopted different strategies, and the borders between software-defined, hyper-converged, and commodity hardware have become blurred.

Before I delve into what’s out there, let’s define terms. Software-defined storage has been around for a long time. A software-defined storage solution provides a hardware agnostic solution to data management and provisioning based on storage virtualization. Said more plainly, software-defined storage takes a bunch of disks and processors and turns them into the functional equivalent of a storage appliance. This could be object, block or file based storage. Hyperconverged refers to the ability to run both your software-defined storage services (a virtualized storage appliance) and applications on the same servers. This could be a cluster of servers where direct attached hard disks and flash drives are virtualized and made available to applications running (potentially virtualized) on the same physical infrastructure.

“Commodity hardware” refers to servers that are built from commonly interchangeable, standards based, high volume components. Data storage companies are bringing all of these aspects together to build low cost, highly customizable alternatives to the legacy storage architectures of the past.

In EMC’s portfolio there are several unique and powerful software-defined storage offerings for object, block and (soon) file based storage. For today, I am focusing on the EMC® ScaleIO® product, which enables a “Software-defined Scale-out SAN” by virtualizing servers with DAS to provide block storage for applications running either hyper-converged or on separate sets of servers dedicated to storage and applications (the “two-layer” approach). The EMC ScaleIO product was designed from day one to be a software-defined storage offering that takes any server hardware and pools its storage in scale-out fashion. What does it mean to be scale-out? Scale-out (as opposed to scale-up) means that the design center for the product is optimized to incrementally add capacity and compute. Scale-out storage products allow end users to start small, oftentimes with only a few nodes (another term for servers), and grow incrementally as their business demands increase.

One of the advantages that EMC ScaleIO has over some of the other approaches to software-defined block storage is that it was designed for scale, performance, and flexibility out of the gate. ScaleIO is first and foremost a software product. As such, it can be easily applied to a wide variety of commodity servers, allowing customers to avoid vendor lock-in, maximize their existing server vendor relationships, and pick and choose the storage media that meets their performance requirements. The ScaleIO product was also designed exclusively as a high performance block storage virtualization product, so it does not suffer from the performance overhead that comes with trying to take on “multiple storage personalities”, which I will explain later. Finally, the ScaleIO team recognized the importance of platform choice and implemented support for a wide range of hypervisors and operating systems, including integration with cloud management products like OpenStack.

Why the SDS Approach for Hyperconverged Infrastructure Conquers All
With the recent shift in thinking towards taking advantage of commoditization and convergence, many vendors are now competing in the hyper-converged storage market. There are several approaches they have taken: an appliance model, a layered model, or a hypervisor model.

Appliance Model:
The first approach, where vendors have taken an appliance model to the solution, has had moderate success. However, in an effort to rush to market, these solutions have made rigid assumptions around hardware choices and rules. These rules help when you are trying to force a quick solution into a new market, but ultimately they lead to pain for the end users. Rigid rules around how to grow your hyper-converged appliances, which components you have to use, flash to spinning disk ratios, and other “non-solutions” to engineering rather than customer problems are forcing these product vendors to rethink their approach. Many of them are now looking at how to take their embedded software and reengineer it to run on a wider variety of hardware vendor platforms. Ultimately, what they are finding is that what customers really want is the software bits and not the vendor lock-in. Unfortunately, systems designed to take advantage of hardware choice shortcuts aren’t so easily repurposed for a hardware vendor neutral world. Fortunately, EMC ScaleIO was built as a software product from inception. This means it can easily be adapted to hardware delivered solutions later, but will never have to worry about struggling to become a software-only product.

Layered Model:
The second approach is to take a layered model to building software-defined block storage services on top of an object storage architecture. Now there is nothing wrong with using abstractions in any systems design – abstractions help to simplify things. The problem comes when you have a system that is designed to optimize around the underlying abstraction and not the service layered on top. It’s really hard to do a good job of building one data paradigm on top of another when the two are optimized for totally different parameters. For example, a block storage system should be optimized around maximum uptime, minimal resource utilization, and maximum performance, even if it means taking advantage of more expensive media like flash for persistence. On the other hand, an object store should be optimized for billions or even trillions of objects, geographic dispersion of data, and low cost, relatively static data. Layering a block storage system optimized for uptime and performance on top of a system optimized for object sprawl and low cost puts the two at odds with one another! That’s exactly what we see in practice: software-defined block storage built on object stores tends to be slow, consume a lot of resources, and require a lot of care and feeding of the underlying storage paradigm to keep operational. These offerings have been successful primarily because their business model is a freemium model that allows end-users to download and use the product without a support contract. The performance penalties and reliability issues have certainly not played in their favor. In order to make sure that end users have choices other than the current cumbersome freemium offerings, this summer EMC ScaleIO will be releasing the first “Free and Frictionless” versions of its product, designed to give anyone the ability to download and operate a software-defined SAN storage cluster, for an unlimited time and capacity, for non-production workloads.

Hypervisor Model:
Finally, hypervisor vendors (of which there are only a few) have also jumped on the commodity bandwagon. The advantage of these solutions is that they are generally built into the hypervisor platform, which means that if you have the hypervisor platform deployed, then you have a block storage virtualization product ready to go. Hypervisor clusters of servers tend to be small, though, so while this can provide a quick and easy way to get going with block storage, these solutions tend not to be scalable or high performance and, as with solutions designed for a specific hardware platform, come with a level of rigidity. End-users that have a mix of Windows and Linux platforms, or that may be looking to take advantage of less expensive virtualization platforms like KVM and OpenStack, will find themselves limited by solutions that are built into a single vendor’s hypervisor. Once again, EMC ScaleIO addresses the needs of these end-users looking for choice of platforms, high performance, and massive scale, while in some cases plugging directly into the hypervisor for optimal performance. While EMC ScaleIO can be deployed in conjunction with hypervisor platforms in a hyper-converged fashion, it is different from the hypervisor vendor solutions in that you aren’t forced to run hyper-converged. You can choose to deploy your storage servers and your virtualized application servers separately if that’s what suits your organization.

It’s no surprise given the rapid growth of the software-defined, commodity storage market that every large vendor and many more startups are introducing or tailoring their products for this new world. But the approach matters. Products designed with hardware constraints early on will have a real challenge trying to disentangle themselves from the assumptions that they have made. Products built with dual personalities that attempt to imitate one storage type on top of another will find themselves optimized for one thing while trying to deliver another, leaving end-users dissatisfied. And finally, hypervisor-based solutions, while simple to set up and integrated into the hypervisor, may work for some small deployments but will lack the flexibility and scale of a true software defined storage solution for the enterprise. Fortunately for end-users, the EMC ScaleIO software block storage solution avoids these limitations since it was born and raised in the software defined world.

Software-Defined Storage Marries Enterprise Storage with the Cloud

Rodger Burkley

Principal Product Marketing Manager

This blog focuses on software defined storage….the all caps version. SDS. Or….by another popular industry term….the Software Defined Data Center (SDDC) platform. SDDC platforms are transforming Data Centers because they can simultaneously marry (ah…some might say integrate) a variety of traditional hardware storage resources, data types and technologies into one “federated” and aggregated data center infrastructure. Moreover, SDDC platforms can control, manage and even monitor the entire data center storage (and compute and networking) operation from a “single pane of glass” console – outside the data path. Yes, I’m referring to ViPR Controller SDS.

ScaleIO is an excellent commodity hardware based software defined storage (sds) system for creating block data Server SANs. It’s hyper-converged, hyper-scalable, flexible and highly elastic. In fact, as its very name suggests, it creates a scalable, low-cost virtual SAN array from commodity servers.

As good as ScaleIO (sds) is, however, it’s not the whole SDS story. A good piece, to be sure. But it’s a virtual SAN solution that covers a portion of today’s contemporary enterprise data center’s needs. True, ScaleIO can completely cover some specific storage use cases extremely well and efficiently. Fact is, our competitors out there – particularly from the Server SAN appliance camp – may be overstating the extent to which Server SANs and “SAN-less” storage can be integrated with SMB and Enterprise Data Centers’ existing traditional storage infrastructure and arrays. In fact, many customers confuse ScaleIO (or VSAN) sds with ViPR Controller SDS. After all, they’re both software defined, right?

Many of you already know this. The principal cause for some of this confusion is that traditional external storage is quite alive and doing well in today’s storage market environment and Enterprise customer base. Most of these enterprises store the vast majority of data on their traditional hardware arrays. Fact is, they’ll probably be doing so for a long time. Just witness the “demise of tape” as a deep, ‘frozen’ archive medium. And according to IDC, more than 20,000 petabytes of new external storage system capacity was purchased in 2013 alone.

So rumors of traditional external storage’s impending death are a bit exaggerated…to borrow a line from one wise old humorist. Accordingly, it stands to reason that an SDS federated and unified management platform solution needs to work seamlessly with that traditional hardware resource layer…comprised of multiple vendors…and data types. And ViPR Controller does that….hands down.

On the other end of the data center spectrum…or relay station… however, is the need to manage external data storage resources and I/O traffic access that lie physically outside the data center’s local physical or logical resource control.   Specifically, the Cloud – whether it be Public or Enterprise Off-premise…

When it comes to Public Clouds, service providers like Amazon, Google and Microsoft will remain big players due to their sheer economies of scale and easy table stakes entry.  You hear a lot lately about ‘supplemental storage’ or remote DR/copy on these Public Clouds.  But those very same Public Clouds concern many Enterprise IT Directors too…. (i.e., service outages, SLAs, data security/integrity, multi-tenancy, etc.).   Not surprisingly, Private and Hybrid ‘on-premise’ Clouds are gaining increased popularity and momentum.  So a true, useful/attractive “federating” SDS platform needs to support, manage and control traditional external storage; Cloud storage and – yes – commodity based sds Server SAN arrays like ScaleIO and VSAN.

Bottom-line? Viable and highly productive/ROI software defined storage implementations should not be limited to lower case sds and/or strictly commodity hardware. With ViPR Controller and its unifying software abstraction layer, a broad platform can be created for managing, provisioning and automating federated storage that includes traditional enterprise storage platforms as well as the “Cloudsphere”, Server SANs, Data Lakes, OpenStack, REST APIs, on-line analytics and other data center resources, tools and access portals. Again, ViPR Controller shines as a true SDS/SDDC (or whatever the latest popular term happens to be).

Obviously, implementing a purpose-built sds (e.g., ScaleIO) or SDS (e.g., ViPR Controller or ECS) in an Enterprise Data Center needs to be well thought-out and phased in incrementally. Why? Storage systems invariably represent an Enterprise Data Center’s largest investment – and arrays typically have long operational life cycles. Plus, storage admins are used to and quite familiar with them. This is their bread and butter and ‘job security’. And use cases involving evolving Public, Hybrid or Private on or off premise storage clouds – married with legacy on-premise, metro or remote geo based traditional storage – must also be well considered when proposing and evaluating software defined storage at any level.

In closing, users don’t like to manage multiple storage stacks, architectures or interfaces. Simplicity and easy everything are highly valued. It’s all about delivering broad enough, all-encompassing storage strategies that help SMB and Enterprise customers get the most out of their current investments/ROI while adopting and/or migrating to new, next-gen storage technologies in an orderly, seamless and painless manner. Did I mention ViPR Controller and “marriage”? Maybe more like the “Presiding Official”…

Data Migrations: Seven10 and EMC Revolutionize the Industry Paradigm

Bobby Moulton

President & CEO of Seven10 Storage Software

*The following is a guest blog post by Bobby Moulton, President & CEO of Seven10 Storage Software, a leading developer of cloud-based information lifecycle management (ILM) and data migration software.

An assertive headline is essential for a bold undertaking that forever changes how data is moved from old storage to new. A few years ago, Seven10 set out to transform how users, vendors, and application providers consider file and storage migrations. It started with a customer challenge: move critical data off proprietary hardware over to new storage without interrupting the patient care process – and resulted in the Storfirst simple, trusted, data migration platform.

Seven10 searched the industry and was surprised at the lack of innovation.  Where was the automation?  Where was the vision?  Where was the hands-off, ‘we make it so easy you can do it yourself’ innovation?  It seemed that some were busy developing next-generation SaaS, Big Data, IoT, or cloud based offerings because they weren’t working on a data migration solution.

Seven10 Storfirst was Born.
So Seven10 stepped up to the plate. We focused on customer-driven migrations that were highly automated, supremely reliable, and ridiculously cost effective. We tossed the PS-led blueprint and created a new 100% software-driven model.

From day one, Seven10’s Storfirst software seamlessly transitions data from the widest range of legacy storage environments, including EMC Centera, NetApp StorageGrid, HP MAS, IBM GMAS, Oracle SAMFS – as well as any existing NAS platform, cloud gateway or file system.  In addition to data migration capabilities, Storfirst is the only solution offering a standard SMB/CIFS or NFS presentation layer for immediate access into EMC platforms such as ECS.

Why Migrate Data to EMC ECS?
EMC’s Elastic Cloud Storage (ECS) software-defined cloud storage platform combines the cost advantages of commodity infrastructure with the reliability, availability and serviceability of traditional storage arrays.  ECS delivers protocol support for Object and HDFS – all within a single storage platform. Seven10’s Storfirst Gateway allows EMC customers to quickly decommission legacy storage devices while simultaneously modernizing their infrastructure with the adoption of ECS.

How Seven10 Storfirst Gateway Works:
Seven10 offers migration PLUS go-forward data management – all without breaking the bank or interrupting day-to-day operations.  Seven10 changed the paradigm from a resource intensive, PS-led effort, to a repeatable, software-driven five-step migration process:

1. Inventory – Storfirst “ingests” existing file system as read-only and configures new storage or storage tiers under a single managed share.

2. Sync – While providing uninterrupted access to legacy data and writing to new storage, Storfirst copies all legacy data onto new storage or storage tiers.

3. Verify – Using an MD5 hashing algorithm for data verification, Storfirst delivers a migration with zero risk of data loss.

4. Audit – Storfirst provides a detailed logging capability in order to conduct a file-by-file comparison on the new storage to ensure all data has been copied without any loss.

5. Decommission – Once the migration is complete, the application communicates to the new storage platform while the legacy storage is decommissioned and removed from the environment.
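
The Verify and Audit steps above can be pictured with a short sketch. This is not Storfirst code. It is a minimal Python illustration of MD5-based, file-by-file comparison between a legacy tree and a new one, and the names `md5_of_file` and `audit_migration` as well as the "ok" / "missing" / "mismatch" statuses are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_migration(legacy_root, new_root):
    """Compare every file under legacy_root with its copy under new_root.

    Returns a dict mapping each relative path to "ok", "missing" or "mismatch".
    """
    legacy_root, new_root = Path(legacy_root), Path(new_root)
    report = {}
    for src in sorted(legacy_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(legacy_root)
        dst = new_root / rel
        if not dst.is_file():
            status = "missing"
        elif md5_of_file(src) != md5_of_file(dst):
            status = "mismatch"
        else:
            status = "ok"
        report[rel.as_posix()] = status
    return report
```

In a sketch like this, decommissioning the legacy storage would only be considered once every entry in the report reads "ok".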

Thanks to the longstanding Technology Connect Select Partnership of EMC and Seven10, organizations retire and/or refresh their storage architecture with software-driven, secure data migrations that guarantee zero data loss.  Storfirst meets compliance regulations by enforcing policies such as: encryption, file-locking, lifespan, auditing, and tiering.  Industries from healthcare to financial services and from manufacturing to government now have the answer to the data migration challenge.

Seven10 and EMC Ease Customer Stress with Safe, Trusted, Proven Migration Solutions

For Allegiance Health, Storfirst seamlessly migrates critical files to ECS.  Due to long-term reliability concerns with the existing NetApp StorageGRID, Allegiance selected Storfirst to migrate millions of electronic records off NetApp and over to ECS.  This all-in-one solution includes optimum storage with a built-in migration path and a 100% auditable transition to ECS – all while delivering Allegiance uninterrupted access to their patient files.

“The combined offering from EMC and Seven10 provides Allegiance Health with an easy and safe migration solution for moving and protecting our critical patient data.  Seven10’s Storfirst migration and management software is very robust, allowing us to quickly and easily adopt the EMC cloud storage platform,” said Allegiance Health’s Information Systems Vice President and Chief Information Officer Aaron Wootton.

It’s clear: the question is not if companies migrate their data, but rather how they complete the migration. Understanding the features, advantages and benefits of the options is essential. Through a well-defined, proven, best-of-breed technology partnership with real-world applications, Seven10 and EMC redefine the industry paradigm.

Sizing up Software Defined Storage

Rodger Burkley

Principal Product Marketing Manager

By now, you’ve all heard how Software Defined Storage (SDS) is reshaping the storage industry (and use case) landscape. The market gets it and increasingly users are embracing this new, “disruptive” technology by introducing it to their enterprise data centers or using it to create hyper-scalable virtualized infrastructures for cloud applications. After all, the appeal of installing software on individual commodity host application servers to create a virtual storage pool (i.e., “server SAN”) from each participating server’s excess direct attached storage (DAS) without requiring additional specialized storage/fabric hardware is alluring…and almost too good to be true.

Throw in the added synergy and side benefits like ‘on-the-fly’ elasticity; linear scalability of I/O performance and capacity; simplicity and ease of use; hardware/vendor agnosticism; and unparalleled storage platform flexibility, and you might think “SDS server SAN” technology can solve world hunger too. Well, if not world hunger, perhaps today’s data centers’ thirst for a simpler, less expensive, higher-performing and more flexible storage solution for block and/or object…

Yes. There’s a lot of excitement, market activity and hype out there around SDS and Server SANs in general. But don’t take my word for it. Though data compilations for CY 2014 aren’t available yet, Wikibon’s market TAM and SAM sizings for 2013 are revealing.

Figure 1. Hyperscale vs Enterprise

Figure 2.  Vendor SOM (share of market)

Figure 1 shows that (at least for 2013) “Hyperscale” Server SANs (i.e., petabyte scale) generated far greater revenue than “Enterprise” Server SANs.  Why?  New VSI (Virtual Server Infrastructure) and Cloud use cases are ideally suited for the hyper-converged, hyper-scalability attributes SDS Server SANs bring to the table.  This is, in fact, a primary targeted use case for ScaleIO.  Why is Enterprise Server SAN revenue so much lower than Hyperscale?  This is the domain of the mission critical apps, databases and use cases that keep IT Datacenter Directors and Administrators busy….and up at night.  It’s also the domain of traditional storage arrays and the Storage Admins, with lots of proprietary equipment (and bias) for vendors like EMC and our esteemed competitors.   These folks are wary and cautious when it comes to new technologies.  They’re not enthusiastic early adopters.   But as new technology and products mature and prove themselves, they end up being embraced by IT data center departments.  “Show me your value prop” or …”Show me the money.”  And simple, less complex storage solutions will get you in the door.  Growth is expected to be high in this hardware ‘open’ and ‘liberated’ segment over time.

Figure 2 shows the major players in the Server SAN arena.  Note that VMware’s SOM is tiny…but this is because VMware’s Virtual SAN (VSAN) wasn’t fully rolled out in the market place.   But also note the number of players.  Big players and small unknowns alike.  All vying for the coveted Enterprise Server SAN market, which is poised for growth along with the SMB and ROBO segments.

By now, you’re ready to call me out on my liberal use of SDS and Server SANs terminology.  After all, Figure 2 lists hyper-converged Server SAN hardware appliance vendors (like Nutanix & Simplivity) along with pure software SDS vendors and products (like Scality, ScaleIO, Scale Computing, etc.).  So what gives?

Hadoop is Ready for Primetime: Recap of Strata + Hadoop World San Jose

Ryan Peterson

Chief Solutions Strategist

Hadoop joins the ranks of Microsoft Windows and Apple iPhone as the next platform ready for applications.  The message is clear from Strata + Hadoop World San Jose 2015 that Hadoop is ready for primetime.  As we have all seen in the past from other successful platforms such as Windows and iPhone, it takes a well-constructed operating system and application development framework to prepare for success.  Windows 1.0 was a great glimpse into what would happen when the 2nd platform originally emerged, but it wasn’t successful until applications began to be created.  I remember playing with Windows 1.0 and thinking, I wish it had the ability to do X, and I wish it had Y.  And of course today, it has most any application you might need.  The same holds true with the advent of the mobile era as iPhone 1.0 built a platform with a handful of applications, but it wasn’t truly successful until the ecosystem began to build apps on top of the platform.

Enter the next generation data platform, Hadoop.  We’ve heard our customers say things over the last three years like “we’re experimenting with Hadoop” or “we have it in a lab” or “we have a few killer apps we’ve custom designed”.  But in 2015, we’re discovering trends in data and trends in data use by using the advanced toolsets the Hadoop framework brings to data.  In the financial industry, for example, we see fraud analytics and risk calculations as a common set of applications being built with the technology.  It’s now only a matter of time until an application is established that solves that challenge with fewer customizations than Hadoop has usually been known to require.

You can see Doug Cutting (The Father of Hadoop) and me speaking about this topic on O’Reilly TV:

The industry is full of change, advancement, and growth.  You could see growth in the form of attendees from last year (people are starting to get it).  You could see advancement from all of the new intellectual property brought out by the vendors (including some EMC competitors).  Good to see them joining the party.  And for change, well that was the story of the week with the announcement of the Open Data Platform.  There has been plenty said about the new Pivotal-led initiative both from supporters and adversaries.  Although I have heard a lot about the initiative this week, I’d say I am not qualified to comment on its merits.  I will instead state my opinions, which I’m known to do.  I believe in Big Data as something that will change the world.  I also believe Hadoop as a framework is still in need of an enterprise quality uplift as we transition to the application-ready nature I’ve just addressed.  I hope the ODP will be an organization that will not only provide that uplift, but will do so in a truly open way and in a way that gets all of the major Hadoop supporters on board.

At EMC, we support the industry, our customers, and we want to see the world truly made better through whichever vendor that customer chooses (we are Data Switzerland).  We hope we are delivering excellent products and solutions to that end, and believe customer choice is at the heart of those solutions.  With that in mind, we’ve augmented our Pivotal and Cloudera relationships to include Hortonworks.  After 6,172 tests required for certification of EMC Isilon against the Hortonworks distribution, I am happy to say Isilon has passed with just a handful of documented differences. This should put customers at ease when they decide to utilize Hortonworks HDP with our Data Lakes.

Shaun Connolly, VP of Strategy for Hortonworks, and I discussed the certification on theCube:

We announced the HD400 node, which is fantastic!  I have found that not many companies have moved more than 20PB into Hadoop.  Even the very large Web 2.0 companies run multiple clusters, none of which I have seen larger than 35PB.  This is usually a result of maxing out the namenode and, secondarily, of not wanting to have such a large fault domain.  I believe EMC Isilon’s 50PB will be PLENTY of capacity for 99.9999% of companies for many years to come.

See Sam Grocott discuss the Data Lake and our recent announcements related to the HD400 on theCube:

Finally, a big shout out to Raeanne Marks, who represented EMC at the Women of Big Data conference this week, Bill Schmarzo (Dean of Big Data), and the army of >50 EMC’ers who have joined the Hadoop revolution and made it to Strata + Hadoop World San Jose this year.

We are driving many innovations with the products, solutions and choices for our customers.  Follow @SGrocott, @NKirsch, @KorbusKarl, @AshvinNa, @EMCBigData and @EMCIsilon for the latest stories from the trenches.

Thank you!



All roads lead to … Hyperconvergence

Mark O'Connell

EMC Distinguished Engineer & Data Services Architect

There are a series of trends which have determined the overall direction of the IT industry over the past few decades.  By understanding these trends and projecting their continued effect on the data center, applications, software, and users, it is possible to capitalize on the overall direction of the industry and to make intelligent decisions about where IT dollars should be invested or spent.

This blog looks at the trends of increasing CPU power, memory size, and demands on storage scale, resiliency, and efficiency, and examines how the logical outcome of these trends is the hyperconverged architectures which are now emerging and which will come to dominate the industry.

The storage industry is born
In the 90s, computer environments started to specialize with the emergence of storage arrays such as CLARiiON and Symmetrix. This was driven by the demand for storage resiliency, as applications needed data availability levels beyond that offered by a single disk drive. As CPU power remained a constraining factor, moving the storage off the main application computer freed up computing power for more complex protection mechanisms, such as RAID 5, and meant that more specialized components could be integrated to enable features such as hot-pull and replace of drives, as well as specialized HW components to optimize compute intensive RAID operations.

Throughout the 90s and into the 2000s, as storage, networking, and computing capabilities continued to increase, there were a series of treadmill improvements in storage, including richer replication capabilities, dual-disk failure tolerant storage schemes, faster recovery times in case of an outage, and the like. Increasingly these features were implemented purely in software, as there was sufficient CPU capacity for these more advanced algorithms, and software features were typically quicker to market, easier to upgrade, and more easily fixed in the field.

A quantum leap forward
The next architectural advance in storage technologies came in the early 2000s with the rise of scale-out storage systems. In a scale-out system, rather than rely on a small number of high-performance, expensive components, the system is composed of many lower end, cheaper components, all of which cooperate in a distributed fashion to provide storage services to applications. For the vast majority of applications, even these lower end components are more than sufficient to satisfy the application’s needs, and load from multiple applications can be distributed across the scaled out elements, allowing a broader, more diverse application load than a traditional array can support. As there may be 100 or more such components clustered together, the overall system can be driven at 80-90% of maximum load and still be able to deliver consistent application throughput despite the failure of multiple internal components, as the failure of any individual component has only a small effect on the overall system capability. The benefits and validity of the scale-out approach were first demonstrated with object systems, with scale-out NAS and scale-out block offerings following shortly thereafter.
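
The headroom argument above is simple arithmetic, and a small sketch makes it concrete. The model below is deliberately simplified and rests on assumptions that are not in the original post: every node contributes an identical share of capacity, load rebalances perfectly across survivors, and the two-component comparison stands in for a conventional dual-controller design. The function names are hypothetical.

```python
import math


def surviving_capacity(total_nodes: int, failed_nodes: int) -> float:
    """Fraction of aggregate capacity left after failures (identical nodes assumed)."""
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return (total_nodes - failed_nodes) / total_nodes


def max_tolerable_failures(total_nodes: int, utilization: float) -> int:
    """Largest number of failed nodes for which the survivors can still carry the load.

    The offered load equals utilization * total_nodes node-equivalents, so at
    least that many nodes (rounded up) must stay healthy.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    required_nodes = math.ceil(utilization * total_nodes - 1e-9)
    return total_nodes - required_nodes


if __name__ == "__main__":
    for nodes, load in [(100, 0.8), (100, 0.9), (2, 0.8)]:
        print(nodes, "nodes at", load, "load tolerate",
              max_tolerable_failures(nodes, load), "failures")
```

Under these assumptions a 100-node cluster running at 80% load can lose 20 nodes before it saturates, while a two-component system at the same load cannot lose any.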

Icy Hot: Cold Storage is a Hot Market and Object Storage is Heating Up

George Hamilton

Sr. Product Marketing Manager

How do we measure the mission criticality of storage systems? What comes to mind when you hear or read the words, “mission critical”? Certainly, you’d think of reliability, resiliency, data protection, etc. But I’m willing to bet that you also, almost reflexively, think of performance – measured in millions of IOPS, transactions per second, or sub-millisecond latencies. To many, mission critical means fast. Think all flash arrays and high-end block storage. This is what the industry refers to as “Hot” storage.

“Cold” storage, on the other hand, gets no love.  When you think cold storage, you think of old data you don’t want but can’t get rid of. You think of tapes in caves or a $0.01 per GB/month cloud storage service. Think low cost, commodity and object storage. Cold storage has an image problem, thanks in no small part to Amazon Web Services introducing Glacier in 2012 as a cold archiving service. You don’t often hear the terms “mission critical” and “cold storage” in the same sentence (see what I did there?). You think cold storage isn’t important. And you’d be wrong.

You’d be wrong because the world of storage doesn’t bifurcate so neatly into just two storage categories. Cold storage, which is frequently delivered by an object storage platform, can actually be different temperatures – cool, chilled, cold, colder than cold, deep freeze, etc. Confused? IDC explains:
[Figure: cold storage categories by system type. Source: IDC Worldwide Cold Storage Ecosystem Taxonomy, 2014 #246732]

It all depends on the use case and how active the data is. Extreme or deep freeze archive is when the data is seldom, if ever, accessed. Amazon Glacier is an example. Access times can range from hours to more than a week depending on the service – and you pay for the retrieval. Deep archive makes up the bulk of the cold storage market. The data is also infrequently accessed but it remains online and accessible. IDC cites Facebook Open Vault as an example. Active archive is best for applications that may not modify data frequently, if at all, but can read data more frequently as in Write Once, Read Many (WORM). An example use case is email or file archiving; IDC cites EMC Centera as an example. EMC Atmos and EMC Isilon are also good examples.

Object storage, generally speaking, falls under the category of cold storage and is used for any temperature. But it should not be pigeonholed as an inactive, unimportant storage tier. Object storage is a critical storage tier in its own right and directly influences the judicious use of more expensive hot storage. With the explosion in the growth of unstructured content driven by Cloud, mobile and big data applications, cold secondary storage is the new primary storage. To the salesperson or insurance adjuster in a remote location on a mobile device, the object storage system that houses the data they need is certainly critical to their mission.

The importance of cold storage is best explained in the context of use cases. The EMC ECS appliance is a scale-out object storage platform that integrates commodity off-the-shelf (COTS) components with a patent-pending unstructured storage engine. The ECS Appliance is an enterprise-class alternative to open source object software and DIY COTS. ECS offers all the benefits of low cost commodity but saves the operational and support headache of racking and stacking gear and building a system that can scale to petabytes or exabytes and hundreds or thousands of apps. Organizations evaluating ECS appliance are generally pursuing a scale-out cloud storage platform for one or more of the following three use cases:

Global Content Repository

This is often an organization’s first strategic bet on object and cloud storage.  Object storage, due to its efficiency and linear scalability, makes an ideal low cost utility storage tier when paired with COTS components. The ECS appliance delivers the cost profile of commodity storage and features an unstructured storage engine that maintains global access to content at a lower storage overhead than open source or competing object platforms. This lowers cost and makes an organization’s hot storage more efficient and cost-effective by moving colder data to the object archive – without diminishing data access. But it’s more than that. A crucial aspect of a global content repository is that it acts as an active archive; the content is stored efficiently but is also always accessible – often globally.  And it’s accessible via standard object storage APIs. Consequently, the global content repository also supports additional uses such as next-generation file services like content publishing and sharing and enterprise file sync and share. And there is an ecosystem of ISV partners that build cloud gateways/connectors for the ECS appliance that extend the use case further.

Geo-scale Big Data Analytics

Geo-scale Big Data Analytics is how EMC refers to the additional use of a Global Content Repository for Big Data Analytics. The ECS Appliance features an HDFS data service that allows an organization to extend their existing analytics capabilities to their global content repository. As an example, one ECS customer uses their existing Hadoop implementation to perform metadata querying of a very large archive. The ECS appliance treats HDFS as an API head on the object storage engine. A drop-in client in the compute nodes of an existing Hadoop implementation lets organizations point their MapReduce tasks to their global archive – without having to move or transform the data. The ECS appliance can also be the data lake storage foundation for the EMC Federation Big Data solution. This can extend analytics scenarios to include Pig, Hive, etc. In addition, since ECS is a complete cloud storage platform with multi-tenancy, metering and self-service access, organizations can deliver active archive analytics or their data lake foundation as a multi-tenant cloud service.

The ECS appliance overcomes some of the limitations of traditional HDFS. ECS handles the ingestion and efficient storage of a high volume of small files, high availability/disaster recovery is built in, and distributed erasure coding provides lower storage overhead than the 3 copies of data required by traditional HDFS.
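
To see why erasure coding lowers storage overhead compared with three full copies, it helps to work the numbers. The sketch below is illustrative only: the post does not state which erasure coding scheme ECS uses, so the 12 data + 4 coding fragment layout is a hypothetical example, and the function names are made up for this illustration.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def erasure_coding_overhead(data_fragments: int, coding_fragments: int) -> float:
    """Raw bytes stored per byte of user data for a k + m erasure code."""
    if data_fragments < 1 or coding_fragments < 0:
        raise ValueError("need at least one data fragment and no negative coding fragments")
    return (data_fragments + coding_fragments) / data_fragments


def raw_capacity_needed(usable_tb: float, overhead: float) -> float:
    """Raw capacity (TB) required to hold usable_tb of user data."""
    return usable_tb * overhead


if __name__ == "__main__":
    three_copies = replication_overhead(3)            # traditional HDFS default
    example_ec = erasure_coding_overhead(12, 4)       # hypothetical 12 + 4 layout
    print("1 PB usable with 3 copies:", raw_capacity_needed(1000, three_copies), "TB raw")
    print("1 PB usable with 12+4 EC :", round(raw_capacity_needed(1000, example_ec), 1), "TB raw")
```

With those assumed parameters, a petabyte of user data needs 3,000 TB of raw disk under triple replication but only about 1,333 TB under the example erasure code.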

Modern Applications

Mainstream enterprises are discovering what Web-centric organizations have known for years. Object storage is the platform of choice to host modern, REST-based cloud, mobile and Big Data applications. In addition to being a very efficient platform, the semantics of object make it the best fit for Web, mobile and cloud applications.

I recommend viewing the webcast, “How REST & Object Storage Make Next Generation Application Development Simple” to get an in-depth look at object architecture and writing apps to REST based APIs. However, there are two features unique to ECS that facilitate the development and deployment of modern applications:

  • Broad API support. ECS supports Amazon S3, OpenStack Swift or EMC Atmos object storage APIs. If developing apps for Hadoop, ECS provides HDFS access.
  • Active-active, read/write architecture – ECS features a global index that enables applications to write to and read from any site in the infrastructure. ECS offers stronger consistency semantics than typically found in eventually consistent object storage. ECS ensures it retrieves the most recent copy of a file. This helps developers who previously had to contend with the possibility of a stale read or build conflict-resolution code into their applications.
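
The difference between a stale read and an index-backed read can be shown with a toy model. To be clear, this is not how ECS is implemented; it is a minimal in-memory sketch, and the class, its methods, and the site names are all hypothetical. It only illustrates the idea that consulting a shared index of "latest version and where it lives" lets any site return the newest copy even when its local replica is behind.

```python
class GlobalIndexStore:
    """Toy multi-site object store with a shared index of latest versions."""

    def __init__(self, site_names):
        # site name -> {key: (version, data)}
        self.sites = {name: {} for name in site_names}
        # key -> (latest version, site holding that version)
        self.index = {}

    def write(self, site, key, data):
        """Store data at a site and record it in the index as the latest version."""
        version = self.index[key][0] + 1 if key in self.index else 1
        self.sites[site][key] = (version, data)
        self.index[key] = (version, site)
        return version

    def replicate(self, key, from_site, to_site):
        """Copy whatever from_site currently holds for key to to_site."""
        self.sites[to_site][key] = self.sites[from_site][key]

    def read_local_only(self, site, key):
        """Read without consulting the index; may be stale or missing (None)."""
        entry = self.sites[site].get(key)
        return None if entry is None else entry[1]

    def read(self, site, key):
        """Read via the index: serve locally if current, else fetch from the owner."""
        latest_version, owner = self.index[key]
        local = self.sites[site].get(key)
        if local is not None and local[0] == latest_version:
            return local[1]
        return self.sites[owner][key][1]


if __name__ == "__main__":
    store = GlobalIndexStore(["boston", "london"])
    store.write("boston", "report.pdf", b"v1")
    store.replicate("report.pdf", "boston", "london")
    store.write("boston", "report.pdf", b"v2")  # london's replica is now behind
    print(store.read_local_only("london", "report.pdf"))  # stale local copy
    print(store.read("london", "report.pdf"))             # latest, via the index
```

In the purely local read, the application would have to detect and handle the stale copy itself; in the index-backed read, that burden moves into the storage layer.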

Noam Chomsky once said, “I like the cold weather. It means you get work done.” You can say the same for cold storage; it also means you get work done.  It’s become a workhorse storage platform. It doesn’t get the sexy headlines in trade rags. But I hope after reading this and understanding the actual use cases for ECS appliance and object storage, you have a better appreciation and some love for cold storage. There are lots of solutions for storing old data that just can’t be thrown away and most compete purely on price. But, if your applications and data fall into one or more of these use cases, then the ECS appliance should be at the top of your list.

How are you planning to meet the OTT Video Demand?

Jeff Grassinger

Sales Alliance Manager

There is a great deal of buzz in the Media industry about digital content delivery. At the recent CES trade show in Las Vegas, Dish Network announced Sling TV. Sling TV will deliver some of the most popular live cable channels “over-the-top” (OTT) of the Internet to consumers. This comes on the heels of Time Warner’s wave of industry press following their decision (which must have been a challenging one) to outsource video delivery and move away from an in-house build for their HBO GO OTT offering. Clearly there is a lot of effort being put into new video delivery platforms. Efficient business models and optimized infrastructure are still being assessed. EMC Isilon speaks with many media organizations that are weighing the same decisions on how to leverage their media assets and capitalize on the accelerated consumer demand for online video.

Demand Driving Decisions
It’s no secret that there is voracious and rapidly growing consumer demand for online video. With delivery platforms like smartphones, tablets, set top boxes, gaming devices and connected TVs enabling convenient access to online video, consumers can access content wherever they have internet broadband or mobile data access, and they’re willing to pay for the privilege. According to PricewaterhouseCoopers (PwC), OTT video streaming will grow to be a $10.1 billion business by 2018, up from just $3.3 billion in 2013.
Taking into account the shift in consumer demand, the rapidly growing associated market share opportunity, and advertising/subscription revenue, it’s clear that online video is a strategic, fast moving, and meaningful business opportunity for the media industry. So how do organizations take the next steps to delivering online video?
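
For a sense of how fast that is, the PwC figures quoted above can be converted into an implied compound annual growth rate. This is a back-of-the-envelope sketch, not a figure PwC published: it assumes 2013 to 2018 counts as five compounding periods, and the function name is hypothetical.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Constant yearly growth rate that takes start_value to end_value in `years` years."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # $3.3 billion in 2013 growing to $10.1 billion by 2018 (figures from the post).
    rate = compound_annual_growth_rate(3.3, 10.1, 5)
    print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the market would be growing at roughly 25% a year, which means it more than triples over the five-year span.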

Evolution of Media Delivery
As decision makers become increasingly focused on the strategic business value that online video provides, executing on this evolution of media delivery may not be easy for some organizations. Success in the decision to compete in the OTT markets often boils down to three options: build, buy, or a hybrid of both. Regardless of the choice, this decision has major implications for your organization. In essence, the majority of media organizations will be creating a hybrid infrastructure – as few will own their own CDN or the “last mile.” However, there are a few aspects and specific requirements you will want to consider for your deployment, so I reached out to one of our Isilon CTOs, Charles Sevior, to share his overview of the infrastructure models:

  • Build — For existing media organizations, you’re probably already considering requirements like integration with existing content playout infrastructure, digital rights management and advanced monetization such as targeted ad serving and VOD subscription models. Leveraging your existing content assets, infrastructure and technology team to create a new OTT workflow can result in lower deployment costs and an efficient long-term solution. A strategy of layering OTT video delivery on top of your regular playout enables your team to incrementally add the new workflow to your content delivery ecosystem. And you can learn the benefits of integrating advanced analytics technologies like Hadoop to extract valuable business insights and provide content recommendations for improved viewer engagement using an integrated Isilon Data Lake Foundation.
  • Buy — Aggregating content rights in your territory for the specific delivery mode is only the start. Setting up an operational infrastructure for reliable and “buffering free” media delivery is a large part of the equation for streaming success. For some businesses outsourcing the OTT video delivery infrastructure may be the best strategy. Development and operation of media infrastructure may not be one of your core business competencies, or time to market presents a need to launch today to get ahead of the competitors.
    Outsourcing has immediate benefits: speed to market is greatly increased; you have significant platform agility to dial in your business model; and the barrier to entry from a technical standpoint is low. Finally, your financial outlay is an operational expense. If the venture proves commercially non-viable, you can more readily shift strategies down the track.
    Choosing the right outsource partner becomes critical and an experienced media content delivery specialist can quickly accelerate your speed to market and help you navigate the challenges for your go-to-market. 
  • Hybrid — In reality, the best infrastructure for online video may be a hybrid model. With a hybrid model, you can leverage your current resources and talents against your “cash cow” business operations, while outsourcing parts of the video delivery infrastructure that have low revenue return or tight launch windows. A hybrid model gives your business the agility of rapid deployment with the flexibility to bring the workload back onto owned and managed infrastructure for reduced cost overheads and leveraging investments in staff, infrastructure and data centers.
    The EMC Isilon scale-out NAS has helped a lot of media organizations deliver content. In fact today, we are providing the origin storage solution to serve audio and video content to just under 2 billion subscribers worldwide in the cable, satellite, IPTV, OTT and streaming music industries.

EMC has a unique relationship with companies that have built an industry-leading infrastructure to deliver video to their customers worldwide. One of those companies leading the way is Kaltura, one of the top Online Video Platforms (OVP). They offer services and infrastructure to help you outsource, build your own (using their open source APIs), or develop hybrid content delivery solutions. Kaltura is not only helping media organizations, but also companies in education, enterprise and government sectors. Here is a short video that we created with Kaltura about their operations and infrastructure decisions:

As you consider your next step in video delivery, let us or Kaltura know how we can assist in your planning process. If you liked this post and video, please feel free to like, share, and tweet.