
Enterprise Strategy Group on ECS: Meet data growth challenges with software-defined object storage

“This new era of massive content storage environments requires a new storage architecture, object storage.” That’s the view of Scott Sinclair, Storage Analyst at Enterprise Strategy Group (ESG), in the firm’s October 2015 whitepaper, “EMC Elastic Cloud Storage Offers Resilient Scalability for the New Generation of Workloads”. Let’s explore why ESG believes object storage is an essential enterprise technology.

Identifying The Challenges Of Data Growth

Managing today’s ever-growing quantities of (mostly unstructured) data is a constant challenge for most enterprises – a fact that ESG underlines in its whitepaper. ESG asked 373 IT decision-makers: “What are your organization’s biggest challenges in terms of its storage environment?”


The rapid growth of data was cited as a top challenge by 26% of IT decision-makers, which ESG says should not be a surprise. However, what may be more surprising is that all the other top challenges – including increased hardware costs, data protection costs, and staffing costs – can also be considered symptoms of data growth.

Why The Industry Is Moving To Object Storage

ESG says that object-based storage offers a strong solution to meet the multiple challenges created by data growth. It represents an evolution in the ability to store unstructured data, with near limitless scalability. Object storage also offers automatic geo-dispersed data protection and the ability to leverage commodity hardware, making it an ideal storage environment for a new generation of IT workloads.

To discover more about what attracts IT decision-makers to object storage, ESG also asked: “Which factors are responsible for your organization’s initial deployment or consideration of object storage technology?”


As ESG reveals, across the wide variety of potential use cases for object storage, organizations are turning to the technology to help control TCO as digital content levels rise. But among those potential use cases, more and more organizations are looking to object storage as the foundation for the next generation of modern and cloud-based workloads.

Evaluating ECS Object Storage Advantages

ESG says that EMC’s object storage solution Elastic Cloud Storage (ECS) delivers key features for object storage, such as:

  • Software-defined architecture leveraging commodity components
  • A strongly consistent, global scale-out namespace
  • In-place Hadoop analytics with HDFS support
  • Support for S3 and OpenStack APIs (see the S3 sketch after this list)
  • Combined erasure-coded and replication-based protection
  • Built-in journaling, snapshots, and versioning
  • Multi-tenant architecture
  • Flexible deployment models
  • On-premises data security
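
Because ECS exposes a standard S3-compatible API, any S3 SDK can be pointed at an ECS endpoint. The following is a minimal sketch using Python and boto3; the endpoint URL, credentials, and bucket name are illustrative assumptions, not values from the ESG whitepaper.

```python
# Minimal sketch of S3-compatible access to ECS via boto3.
# The endpoint, credentials, and bucket name below are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",  # assumed ECS S3 endpoint
    aws_access_key_id="ECS_OBJECT_USER",          # assumed object user
    aws_secret_access_key="ECS_SECRET_KEY",       # assumed secret key
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"Hello, ECS!")
print(s3.get_object(Bucket="demo-bucket", Key="hello.txt")["Body"].read())
```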

Today’s Demanding Workloads Are Ripe For ECS

As ESG comments: “The next generation of workloads is looking to be even more dependent on larger levels of digital content than ever before, and therefore needs the next generation of storage.” Highlighted workload areas where ECS can help improve TCO and ROI for organizations include:

  • Foundation for a cloud storage solution: ECS supports your deployment of a cloud infrastructure with its ability to dynamically allocate resources – and to report on how they are being leveraged. Its potential for nearly infinite scalability, along with its ability to leverage commodity hardware, helps keep cloud infrastructure costs low.
  • Data lake for business analytics: ECS offers the ability to pool all your data into a single globally distributed repository for business analytics to derive new value, while its multi-protocol access reduces the need to move data back and forth for analysis.
  • Global content repository: Offering a single scalable, resilient, and cost-effective pool that can serve multiple application and content types, ECS enables all the content within the repository to be globally accessible by web, mobile, and cloud applications – at up to 65% lower cost than public cloud.
  • Universal object storage platform for Internet of Things (IoT) with in-place analytics: The geo-distributed strong consistency of ECS provides the ability to collect and store IoT sensor data in locations closer to the actual “things,” which improves performance and saves cost.
  • Modern application development: The scale and automatic geographic accessibility of ECS can provide significant benefits for developing your modern applications, especially those that require access to a large pool of read-only content.
  • Cold storage archive: ECS-based archiving controls the growth of unstructured data by migrating “cold” data off of high-performing and more expensive storage – while allowing for the data to remain online and accessible.

The Next-Generation Object Storage Platform

As ESG concludes: “Object storage is designed for and ideally suited for large content storage. EMC’s ECS solution, ultimately, offers a foundation upon which IT organizations can build out the next generation of applications and workloads. The next-generation datacenter will need a next-generation storage solution, like ECS.”

Read the full ESG Whitepaper for yourself: “EMC Elastic Cloud Storage Offers Resilient Scalability for the New Generation of Workloads”, October 2015.

Learn more about object storage with EMC ECS.

EMC ECS software is available to download now and try free.

 

Breakfast with ECS: Files Can’t Live in the Cloud? This Myth is BUSTED!

Welcome to another edition of Breakfast with ECS, a series where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), EMC’s cloud-scale storage platform.

The trends towards increasing digitization of content and towards cloud-based storage have been driving a rapid increase in the use of object storage throughout the IT industry. However, while it may seem that all applications now use Web-accessible REST interfaces on top of cloud-based object storage, in reality new applications are largely being designed with this model while file-based access models remain critical for a large proportion of existing IT workflows.

Given the shift in the IT industry towards object-based storage, why is file access still important? There are several contributing factors, but they boil down to two fundamental reasons:

  1. There exists a wealth of applications, both commercial and home-grown, that rely on file access, as it has been the dominant access paradigm for the past decade.
  2. It is not cost effective to update all of these applications and their workflows to use an object protocol. The data set managed by the application may not benefit from an object storage platform, or the file access semantics may be so deeply embedded in the application that the application would need a near rewrite to disentangle it from the file protocols.

What are the options?

The easiest option is to use a file-system protocol with an application that was designed with file access as its access paradigm.

ECS has supported file access natively since its inception, originally via its HDFS access method and most recently via the NFS access method. While HDFS lacks certain features of true file system interfaces, the NFS access method has full support for applications, and NFS clients are a standard part of any OS platform, making NFS the logical choice for file-based application access.

Via NFS, applications gain access to the many benefits of ECS, including its scale-out performance, the ability to massively multi-thread reads and writes, industry-leading storage efficiencies, and multi-protocol access. For example, data can be ingested from a legacy application via NFS while also being served over S3 to newer, mobile application clients, supporting next-generation workloads at a fraction of the cost of rearchitecting the complete application.
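
To make that multi-protocol scenario concrete, here is a minimal sketch. It assumes a hypothetical ECS bucket named "ingest" that is exported over NFS and mounted at /mnt/ecs/ingest; the mount point, endpoint, and credentials are illustrative assumptions.

```python
# Multi-protocol sketch: write via an NFS mount, read back via S3.
# The mount point, endpoint, and credentials below are hypothetical.
import boto3

# Legacy application path: the ECS bucket "ingest" is assumed to be
# exported over NFS and mounted at /mnt/ecs/ingest.
with open("/mnt/ecs/ingest/report.csv", "w") as f:
    f.write("id,value\n1,42\n")

# Modern application path: the same object read back over the S3 API.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",
    aws_access_key_id="ECS_OBJECT_USER",
    aws_secret_access_key="ECS_SECRET_KEY",
)
body = s3.get_object(Bucket="ingest", Key="report.csv")["Body"].read()
print(body.decode())
```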

Read the NFS on ECS Overview and Performance White Paper for a high-level summary of NFS version 3 with ECS.

An alternative is to use a gateway or tiering solution to provide file access, such as CIFS-ECS, Isilon CloudPools, or third-party products like Panzura or Seven10.  However, if ECS supports direct file-system access, why would an external gateway ever be useful?  There are several reasons why this might make sense:

  • An external solution will typically support a broader range of protocols, including things like CIFS, NFSv4, FTP, or other protocols that may be needed in the application environment.
  • The application may be running in an environment where the access to the ECS is over a slow WAN link. A gateway will typically cache files locally, thereby shielding the applications from WAN limitations or outages while preserving the storage benefits of ECS.
  • A gateway may implement features like compression, which reduces WAN traffic to the ECS and provides direct cost savings on WAN transfer fees, or encryption, which provides an additional level of security for the data transfers.
  • While HTTP ports are typically open across corporate or data center firewalls, network ports for NAS (NFS, CIFS) protocols are normally blocked for external traffic. Some environments, therefore, may not allow direct file access to an ECS which is not in the local data center, though a gateway which provides file services locally and accesses ECS over HTTP would satisfy the corporate network policies.

So what’s the right answer?

There is no one right answer; instead, the correct answer will depend on the specifics of the environment and the characteristics of the application.

  • How close is the application to the ECS? File system protocols work well over LANs and less well over WANs. For applications that are near the ECS, a gateway is an unnecessary additional hop on the data path, though gateways can give an application the experience of LAN-local traffic even for a remote ECS.
  • What are the application characteristics? For an application that makes many small changes to an individual file or a small set of files, a gateway can consolidate multiple such changes into a single write to ECS.  For applications that more generally write new files or update existing files with relatively large updates (e.g. rewriting a PowerPoint presentation), a gateway may not provide much benefit.
  • What is the future of the application? If the desire is to change the application architecture to a more modern paradigm, then files on ECS written via the file interface will continue to be accessible later as the application code is changed to use S3 or Swift.  Gateways, on the other hand, often write data to ECS in a proprietary format, thereby making the transition to direct ECS access via REST protocols more difficult.

As should be clear, there is no one right answer for all applications.  The flexibility of ECS, however, allows for some applications to use direct NFS access to ECS while other applications use a gateway, based on the characteristics of the individual applications.

If existing file-based workflows were the reason for not investigating the benefits of an ECS object-based solution, then rest assured that an ECS solution can address your file storage needs while still providing the many benefits of the industry’s premier object storage platform.

Want more ECS? Visit us at www.emc.com/ecs or try the latest version of ECS for FREE for non-production use by visiting www.emc.com/getecs.

Breakfast with ECS: ECS and Centera – Optimizing the Data Archive

Welcome to another edition of Breakfast with ECS, a series where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), EMC’s cloud-scale storage platform.

When Centera was introduced in 2002, it promised long-term storage of a company’s archives with an architecture that completely separated the application access method from the storage infrastructure. From the beginning, EMC has always promised to maintain the Centera data, the APIs that are used to access it, and the applications that have integrated with those APIs, with the full knowledge that the lifecycle of the Centera data would exceed the lifecycle of any individual HW or SW platform. The Centera SW architecture has fulfilled this promise across multiple hardware generations and multiple disk drive variants, and with ECS 2.2, the promise is being extended even across SW architectures.

Why a new architecture?

The state of the industry has changed dramatically from when Centera was originally designed and built.  Scale-out, distributed storage accessed via an IP network is no longer a novel concept, and increases in network speeds, CPU power, disk density, and memory sizes mean that the Centera model of a small number of disks per node is no longer economically feasible.

ECS is a new software architecture, designed from the ground up to meet the needs of the modern data center. With the ECS 2.2 release, it not only has full support for the Centera API, it also has built-in capabilities to automatically index and migrate content from a Centera cluster to ECS, all in a non-disruptive fashion that maintains full application access throughout the process.

But ECS does far more than just support the Centera use cases: it adds capabilities and efficiencies to the storage of Centera data and can apply these to enhance the value of existing Centera data as well as newly written data. These capabilities include:

  • Erasure coding protection for all data, including clips, in addition to a greater variety of choice in the protection scheme to best meet the needs of the individual situation. ECS can protect data with 20% or 33% overhead and can apply this protection equally for large and small objects, all the way down to four-node systems, for far better storage efficiencies than Centera (a quick calculation of these overheads follows this list).
  • Protection against multiple disk failures. While Centera offered protection only against single disk failures, ECS protects against the simultaneous failure of 2 disks (20% overhead) or 4 disks (33% overhead).
  • With the introduction of Data at Rest Encryption in ECS 2.2, ECS can automatically and transparently encrypt data before storing it, adding an additional level of protection for your most sensitive data.
  • ECS provides site failure protection for less than 2x the original data size in a three-or-more-site environment, unlike Centera, which would need a minimum of 2.3x (4x if using mirroring) the data size for a similar level of protection.
  • ECS offers far denser configurations, with up to 60 drives per node, for far more economical data storage than Centera.
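
To see where the overhead figures above come from, here is a quick sketch. It assumes erasure-coding layouts of 12 data + 4 coding fragments and 10 data + 2 coding fragments; those specific layouts are an assumption on our part, but the arithmetic matches the 33% and 20% figures quoted above.

```python
# Erasure-coding overhead = coding fragments / data fragments.
# The 12+4 and 10+2 layouts are assumed here as the schemes behind
# the 33% and 20% figures quoted above.
def ec_overhead(data_fragments: int, coding_fragments: int) -> float:
    """Extra storage consumed, as a fraction of the raw data size."""
    return coding_fragments / data_fragments

for data, coding in [(12, 4), (10, 2)]:
    print(f"{data}+{coding}: {ec_overhead(data, coding):.0%} overhead, "
          f"tolerates {coding} simultaneous fragment failures")

# Output:
# 12+4: 33% overhead, tolerates 4 simultaneous fragment failures
# 10+2: 20% overhead, tolerates 2 simultaneous fragment failures
```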

In addition, ECS preserves Centera’s compliance capabilities, including fixed retention periods and retention classes, allowing all Basic, GE, and CE+ data from Centera to be safely migrated to ECS.

Read the Top Reasons Handout on why you should choose EMC Elastic Cloud Storage (ECS) for archiving.

How do I move from Centera to ECS?

There are two ways for Centera customers to start leveraging the benefits of ECS: an external migration tool, of which there are several choices available now, or the built-in transformation capabilities of ECS 2.2.

The transformation capabilities of ECS 2.2 fully automate the migration process. The Centera is added to the ECS cluster and the application directs its reads and writes to the ECS; the ECS then enumerates the Centera content, creates an ECS-based index for that content, migrates the content, and finally performs a reconciliation to validate that all data has been correctly migrated. The entire process is fully automated, driven by ECS, and transparent to the application; the sketch below restates these phases.
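
As an outline only, the four phases just described might look like the following. Every function name here is a hypothetical placeholder for illustration, not an actual ECS or Centera API.

```python
# Illustrative outline of the ECS 2.2 transformation flow described above.
# All functions are hypothetical placeholders, not an actual ECS API.
def transform_centera_to_ecs(centera, ecs):
    clips = centera.enumerate_clips()      # 1. enumerate the Centera content
    for clip in clips:
        ecs.index(clip)                    # 2. build the ECS-based index
    for clip in clips:
        ecs.migrate(clip, source=centera)  # 3. copy content; app I/O continues
    assert ecs.reconcile(centera)          # 4. validate all data was migrated
```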

Of course, ECS is capable of supporting workloads far different than the typical Centera workload.  ECS is suited for modern S3 or Swift based applications, for analytics workloads, for web-based application storage, for Internet of Things applications, and much more.  ECS 2.2 brings the benefits of a modern software architecture to the existing Centera data while simultaneously supporting modern application workloads and allowing a single software/hardware environment to satisfy the full set of scalable storage needs.

A new application model requires a new toolbox

The rise of the internet and the prevalence of mobile devices, including smartphones and tablets, have driven a revolution in application design towards what is commonly known as a Platform 3 application. Platform 3 applications are characterized by a scope of use that spans potentially millions of users, a breadth of access that includes worldwide access 24 hours a day, a volume and variety of data storage and access needs that spans both traditional applications and big data analytic platforms, and a need to share data sets across multiple instances of a single application as well as across multiple independent applications. As businesses increasingly move towards such Platform 3 applications, they can take advantage of a number of tools available in the industry to help them create and deploy their applications. However, it is not sufficient to have a great set of tools in your box; knowing how to use them effectively is just as important.

The Modern Data Center: Greek sailors, knitting supplies, and a Spanish painter

What do Greek sailors, knitting supplies, a Spanish painter, thermal energy, and an obscure Greek word have in common, and why are they in modern data centers?

Kubernetes, YARN, Diego, Heat, Mesos – does it ever seem like the IT industry today is simply a confusing alphabet soup of names being bandied about? Ever wonder what these things are and why they are important?

Before answering the question of what these are, it is important to first understand the transitions taking place in the data center and what new challenges are arising with new technologies.

A modern data center

The advances in commodity (generally x86) processing power, the prevalence of cheaper SATA disks, and the emergence of faster networks have been the major elements transforming the modern data center. Standard hardware platforms have now become capable enough to run enterprise-grade workloads without the need for specialized hardware or purpose-built resiliency and serviceability features. As such, farms of these commodity hardware platforms have replaced the specialized systems that previously characterized the data center.

Applications designed for a modern data center are composed of a set of cooperating processes, where each process is typically encased within a schedulable virtualized environment (e.g. a VM, a container, a JVM, etc.), as described in methodologies such as the 12-factor application. The reasons behind this architectural paradigm are myriad, including improved fault isolation, improved fault recovery, the ability to scale the app quickly, and the ability to move the app seamlessly to newer/faster HW. Whichever form of virtualization technology is chosen as the basis for the application environment, it creates a layer of abstraction between the application environment and the hardware, opening up multiple options and strategies for scheduling application environments onto the farm of commodity hardware platforms.

With a farm of commodity hardware and a suite of disparate applications to run, the question becomes one of how best to map the applications onto the available resources in the processing farm. Having a system administrator manually monitor and rebalance applications across the processing farm would be inefficient, thus the need for an automated scheduler to perform these actions – and this is where Kubernetes, YARN, Diego, Heat, and Mesos enter into the picture.

There is a wide variance in the capabilities and scope across these different schedulers, but at a basic level, each provides:

  • The ability to select the most appropriate, currently available hardware resources for new application instances (a toy sketch of this follows below)
  • The ability to run multiple diverse applications on a single HW platform
  • Automated application failure detection and restart capabilities
  • Scaling the number of application instances up or down in response to bursts or dearths of app activity
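
As a toy illustration of that first capability, the sketch below greedily places each new application instance on the node with the most free CPU. The node names and resource model are invented for illustration; real schedulers such as Kubernetes or Mesos weigh far more factors.

```python
# Toy greedy placement: put each app instance on the node with the most
# free CPU that can satisfy the request. Invented for illustration only;
# real schedulers (Kubernetes, Mesos, ...) consider far more factors.
nodes = {"node-a": {"free_cpu": 8}, "node-b": {"free_cpu": 4}}

def schedule(app: str, cpu_needed: int) -> str:
    candidates = [n for n, r in nodes.items() if r["free_cpu"] >= cpu_needed]
    if not candidates:
        raise RuntimeError(f"no node has {cpu_needed} CPUs free for {app}")
    best = max(candidates, key=lambda n: nodes[n]["free_cpu"])
    nodes[best]["free_cpu"] -= cpu_needed
    return best

print(schedule("web-frontend", 2))   # -> node-a (8 CPUs free beats 4)
print(schedule("analytics-job", 6))  # -> node-a again (6 CPUs still free)
```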

However, because applications are no longer tied to individual machines, new issues arise that the scheduling framework must address to ensure that each application can execute properly.

  • Application state may no longer be stored local to a particular piece of HW, as applications are mobile within the environment.
      • The framework must provide a dynamic mechanism for an application to locate and access its shared state.
  • Application instances that need to communicate with each other cannot rely on a pre-configured set of IP addresses to identify their peers.
      • The framework must provide dynamic network capabilities to enable app-to-app connectivity, especially if the connection must be secure.
  • Applications may need local storage for scratch files or temporary storage of intermediate results.
      • The framework must provision this in a way that is appropriate on the physical HW where the application happens to be running.
      • The framework must provide the application a generic (i.e. not specific to the particular HW platform) channel to access the temporary local storage.
      • The framework must de-provision the temporary storage whenever the application instance stops running, be it a graceful or an unexpected shutdown.
  • Applications that are clients of other applications (e.g. one app is a client of a database) cannot rely on fixed addresses for where those services should be running.
      • The framework must provide a dynamic service to allow the app to discover what application services are available and to subscribe to those that are appropriate (a minimal sketch of one such discovery mechanism follows this list).
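
As one concrete example of the service-discovery point above, Kubernetes publishes the address of each service to application environments as environment variables (and via cluster DNS). A minimal sketch, assuming a hypothetical service named orders-db:

```python
# Minimal service-discovery sketch: Kubernetes exposes each service as
# <NAME>_SERVICE_HOST / <NAME>_SERVICE_PORT environment variables.
# The service name "orders-db" and the fallback values are hypothetical.
import os

host = os.environ.get("ORDERS_DB_SERVICE_HOST", "localhost")
port = int(os.environ.get("ORDERS_DB_SERVICE_PORT", "5432"))
print(f"connecting to orders-db at {host}:{port}")
```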

While there are many similarities between Kubernetes, YARN, Diego, Heat, and Mesos at a basic level, each of these frameworks also has differentiating features in how it works and how much of the overall application development and deployment lifecycle it captures beyond what is captured above.

  • Kubernetes has been designed and optimized for the coordination and scheduling of containerized Linux applications.
  • YARN is the next-generation Hadoop task scheduler, and has been generalized as an enterprise scheduling framework, especially for sets of applications accessing a common data set.
  • Diego is part of the larger Cloud Foundry project, which aims to provide a full cloud scale development, deployment, and operations environment.
  • Heat is part of the OpenStack project, which aims to provide a full cloud stack for private, hybrid, or public cloud environments.
  • Mesos aims to provide a basic level of resource allocation and scheduling, which can then be customized by various plugins for particular application workloads.

The modern data center has been transformed at all levels by the rise of commodity components. At the lowest level, storage products like ScaleIO and ECS provide software-defined and software-managed storage pools which isolate the applications from the details of the hardware, while at a higher level virtualization and containerization technologies isolate the application runtime environments from the details of the hardware. Frameworks such as Kubernetes, YARN, Diego, Heat, and Mesos fill the gap between the storage and the applications and complete the picture of an application environment that can adapt and change as the hardware environment is expanded or upgraded.

And now you know why Greek sailors, knitting supplies, a Spanish painter, thermal energy, and an obscure Greek word are finding a home in modern data centers.

 
