Posts Tagged ‘Software-Defined Storage’

The EMC Portfolio Approach to Next-generation Storage Technology

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

EMC is approaching the emerging revolution in storage technology with a portfolio of advanced tools and solutions. These storage solutions are engineered to adapt and grow with the next generation of big and unstructured data generated across diverse industries.

As data begins to inundate the marketplace, file types and sizes are becoming more extreme, mutative and mountainous than ever before. IT departments are tasked with managing constant streams of data from the cloud, social platforms, mobile devices and the web, as well as other sources never before realized by traditional BI and on-premise content management infrastructures. Software-defined storage (SDS) represents the leading edge of storage solutions, and EMC is positioned at the helm.

EMC’s depth of industry expertise and emerging-tech software development experience have informed the creation of a suite of solutions that can be tailored to meet the needs of organizations transitioning to SDS, those working to incorporate SDS technology into their existing storage infrastructure, or those looking to redefine how they use, gather, maintain and manage the constant influx of data on which their businesses depend.

What sets EMC apart is a grander vision for SDS as a driving concept that should span how storage will be delivered now and in the future. EMC is embracing, redefining and disrupting the industry standards to deliver vendor neutral, open-standard APIs that allow products to be used as a standalone platform or part of a full cloud deployment, like OpenStack. Open source community editions increase flexibility and reduce risk.

EMC software-defined storage products improve and provide new means of organization and delivery models for file, block, object, HDFS and hyper-converged storage, as well as next-gen rack scale, data center and hyper scale-out storage. These products include IsilonSD Edge (scale-out file storage), ScaleIO (scale-out block storage) and ECS (cloud-scale object storage). The flexibility of EMC SDS solutions makes them available as appliances or free and frictionless downloadable software that can be installed on industry-standard hardware.

The sophisticated nature of EMC’s broad selection of SDS solutions allows easy integration into existing content management schemes without massive hardware replacement investments, new staff integration or performance downtime. SDS allows complete control over protocols, access, interface and data management within the systems, removing extraneous nodes, access points and machinery from the content management experience. This achieves more streamlined IT processes with reduced hardware, software, training and staffing costs and the ability to grow and adapt as the advancements in data collection and formatting continue.

The portfolio of SDS solutions developed by EMC allows organizations to not only better collect, protect and store next-gen data types and proportions, but to provide the possibility of streamlined growth as technology transforms, providing multi-generational, cost-effective options for the future of IT sectors everywhere.

Learn more about how EMC SDS solutions can prepare your enterprise for the future of big data. Stay up to date on everything SDS at www.emc.com/sds.

EMC {code} shares how to deploy ECS with Five Ways of Docker

Kendrick Coleman

Developer Advocate at EMC

We’ve been in a lot of conversations about DevOps recently, with customers, partners, community members and EMC teams. Whether or not you believe in the buzz around DevOps, there is definitely a wave of new and open tools, processes, and operational models being used in IT. These trends can’t be ignored, and we’re continuously working to make sure EMC’s product strategy adapts with these changes.

EMC {code} is a developer evangelism team within EMC. We contribute to major open source projects, create our own tools and projects (entirely in the open; in fact, you can find them all on GitHub), and engage with developer communities. Several of our team members have worked with EMC SDS products over the past few years, and we’ve enjoyed seeing the progress made to make our SDS tools readily accessible for developers and IT/ops alike.

ECS has long been a developer-friendly product, with universal protocols like object and HDFS, and a software-only download option. The recently announced ECS 2.0 is chock full of updates that make the lives of dev/ops teams easier, including enhanced geo-caching, multi-tenancy capabilities, monitoring and reporting, and the ability to automatically and rapidly fail over and recover from outages.

ECS 2.0 further enhances the experience for developers with a containerized download, free for test-and-dev environments (non-production use today). This is another milestone in providing free and frictionless access to EMC software (along with the open-sourced CoprHD and the free download of ScaleIO).

Our team works with Docker containers on a regular basis, and when ECS was announced, we rolled up our sleeves to see exactly how we could integrate Docker tools into the ECS experience.

In addition to the containerized download, we wanted to explore ways of deploying ECS with broader Docker technology. Deploying multi-node ECS is fast and easy with Docker tools. Using Docker Machine, you can deploy Ubuntu hosts that will be a part of a Docker Swarm cluster. With Docker Compose, you can deploy ECS from Docker Hub to a Docker Engine container on each host in the Docker Swarm cluster. If you are interested in how to do this, you can check out the GitHub repo or watch this quick video.
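For the curious, the Docker Machine / Swarm / Compose workflow described above can be sketched in a few lines of Python driving those CLIs. The host names, discovery token, and node count here are illustrative assumptions, not values from the EMC {code} repo; see the repo itself for the real automation.

```python
# Sketch: provision Swarm hosts with docker-machine, then launch ECS via
# docker-compose. Names and the token are placeholders for illustration.
import subprocess

def machine_create_cmd(name, swarm_token, is_master=False):
    """Build the docker-machine command that provisions one Swarm host."""
    cmd = ["docker-machine", "create", "-d", "virtualbox"]
    if is_master:
        cmd.append("--swarm-master")
    cmd += ["--swarm", "--swarm-discovery", f"token://{swarm_token}", name]
    return cmd

def deploy(swarm_token, node_count=3):
    """Provision the hosts, then start ECS on the cluster with Compose."""
    for i in range(node_count):
        subprocess.run(
            machine_create_cmd(f"ecs-node-{i}", swarm_token, is_master=(i == 0)),
            check=True,
        )
    # Compose reads the ECS service definition from docker-compose.yml
    subprocess.run(["docker-compose", "up", "-d"], check=True)
```

The `deploy` helper is deliberately thin: the heavy lifting is done by the Docker tools themselves, which is what makes the multi-node setup fast.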

With this automated process, everything will be complete and configured in the span of 10-15 minutes. To get all of the details, please visit the EMC {code} blog post on the topic, or send us a tweet with your feedback to @EMCcode.

Configuring ECS: Start Uploading Files in 5 Minutes!

Many times after an ECS installation, the question that arises is, “Now, what do we do with this thing?” It’s understandable because an object-storage appliance is not like a SAN or a NAS array. You cannot just walk back to your cube and start archiving old documents to the ECS.

At least, not yet. ECS will have NFS support and some other tools in the near future that will enable just that.

But for now, a quick way to demo access to ECS is to use a free tool such as the “S3 Browser.”

To summarize the two-step process: Go through the ECS portal (GUI) to create a Namespace, and then use the S3 Browser to access ECS.

Step 1: Configure the ECS 2.0 Portal

There are a few easy steps to accomplish this.

  1. First, create a Storage Pool in the ECS portal by choosing the ECS nodes and giving that collection a name.
  2. Next, from the Virtual Data Center (VDC) tab on the left
      • Click on “Get VDC Access Key” and copy the displayed key.
      • Click on “New Virtual Data Center” and create a VDC using the above key and the IP addresses of all the ECS nodes.
  3. Then create a Replication Group using the newly created VDC and the Storage Pool.
  4. Now, create a Namespace using “root” and the Replication Group. Namespace is analogous to a “Tenant.” This is where buckets will be created.
      • At this point, you have an option of moving to the next step or creating a bucket. If you create a bucket, you will see that bucket after configuring the S3 browser. To create a bucket, go to the “Buckets” tab, choose the Namespace and the Replication Group that you just created, and give the bucket a name.
  5. Finally, go to Users, click on “Object Users” → “New Object users”; choose the Namespace for “root” and click on “Next to add Passwords”. On that next page, generate an S3 secret key for root and copy it to a text editor (to be used later within S3 Browser).

You’re done with ECS portal.

Step 2: Configure S3 Browser and Access ECS

Now, open the S3 Browser. Go to “Accounts” on the top left and choose “Add New Account”. There are five things that need to be filled out or chosen:

  1. Type in an Account name
  2. Under “Storage Type”, choose “S3 Compatible Storage”
  3. In “REST Endpoint”, type in the IP address of one of the ECS nodes along with port 9020 (9021 is for secure HTTPS). The format is <IP_address>:<port_number>. Note that, by default, S3 Browser uses HTTP; if you want to use HTTPS, go to Tools -> Options -> Connection and check the top option to use a secure connection.
  4. Under “Access Key Id”, put in “root”
  5. Copy and paste the secret key from Step 1, item 5 into the field for “Secret Access Key”.
  6. Save Changes and you should be good to go.
  7. Now you can click on “New bucket” in the main page and upload files to that bucket from your local drive. If you created a bucket from within the ECS portal at the end of Step 1, item 4, you will now see that bucket listed here as well.

Here is an image of the S3 Browser configuration with some annotations:

Hope you found that easy! Of course, after this demo, customers would use their real enterprise applications to access ECS via the REST API, with different users, namespaces, secret keys, etc., but the fundamentals are the same.
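The same two-step demo can also be scripted against ECS’s S3-compatible endpoint. The sketch below assumes the boto3 SDK and a reachable ECS node; the node IP, bucket name, and placeholder secret key are our illustrations, not product values.

```python
# Minimal sketch of scripted access to ECS's S3-compatible API.
def ecs_endpoint(node_ip, secure=False):
    """ECS exposes S3 on port 9020 (HTTP) and 9021 (HTTPS) by default."""
    scheme, port = ("https", 9021) if secure else ("http", 9020)
    return f"{scheme}://{node_ip}:{port}"

def upload_demo_file(node_ip, secret_key, bucket="demo-bucket"):
    """Create a bucket and upload one object, as in the S3 Browser demo."""
    import boto3  # deferred import so the helper above stays dependency-free

    s3 = boto3.client(
        "s3",
        endpoint_url=ecs_endpoint(node_ip),
        aws_access_key_id="root",          # the object user from Step 1
        aws_secret_access_key=secret_key,  # the generated S3 secret key
    )
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"hello ECS")
```

Any S3-compatible SDK would work the same way; the only ECS-specific pieces are the endpoint URL format and the credentials created in the portal.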

Introducing ECS 2.0

Vikram Bhambri

Vice President, Product Management, Emerging Technology Division at EMC


ECS (Elastic Cloud Storage) provides a hyperscale, geo-distributed, software-defined cloud storage platform that is perfect for unstructured data, Big Data applications and Data Lakes. Deployed as a software-only solution or as a turnkey appliance, ECS offers all the cost advantages of public cloud running on commodity infrastructure while featuring enterprise reliability, availability and serviceability.

The new version of ECS headlines many new features and enhancements. Here is a “Top Ten” list of the most significant improvements and changes in ECS, version 2.0:

1. Built-in Element Management
Any storage array requires strong element management capabilities to manage the infrastructure and its lifecycle. Until the last release, the element management functionality of ECS was delivered via ViPR Controller. This meant that each installation of ECS required a minimum of three additional VMs to run the ECS Appliance.

In ECS 2.0, all of the element management functionality is part of the ECS software itself, and there is no requirement for an additional ViPR Controller to manage ECS. The user interface for managing ECS runs on each node, and the administrator can hit any of the ECS nodes to get access to it. The improved interface is simple and easy to use, with a lot of enhanced functionality.

For customers who want a single pane of glass for management of their storage infrastructure, ECS will be able to plug into ViPR Controller like any other array that ViPR Controller manages and provide one single northbound interface for administrators for storage management.

2. Better UI, simplified management and operations
One of the fundamental changes in the 2.0 release is the introduction of self-service UI and management built into the ECS software itself. ECS offers a simpler and more intuitive UI that allows administrators to easily deploy and manage their Appliance’s lifecycle.

In addition, based on feedback from 1.x customers, a lot of the terminology has been simplified, renamed or removed. For example, a “tenant” is now the same as a “namespace”, so you don’t need two different terms. ECS 2.0 has simplified the multi-tenancy model and the number of roles involved in managing ECS. All of the operations managed by the UI consume a standard set of REST APIs, which can be called directly if the customer chooses to use another user experience for managing their storage environment.

3. Improved Monitoring and Diagnosis in ECS UI

The new UI has lots of new information that will assist in monitoring, diagnosis and performance analysis, with drill downs and charts to make the analysis easier to visualize.

To start with, administrators can get information on:

  • Capacity utilization for disks, nodes and storage pools.
  • Granular information on bandwidth, IOPs, latency and network utilization in different categories such as erasure coding, geo-caching, metadata, user data etc.

Other enhancements include detailed reports on replication groups:

  • Bandwidth for ingress/egress
  • Progress of replication, details about chunks that are being cached and replicated etc.

In case of site failure or disaster, administrators can get details on the duration of recovery and the amount of data to be recovered.

All this information is stored for 7 days. For a longer window of data, customers can use ViPR SRM (with a new Solution Pack) or their own software, as all of the monitoring and diagnostics data is available through REST APIs.

4. Rack-level Awareness for better HA
ECS 2.0 brings rack-level awareness to the software, so that when data has to be distributed across disks and nodes, ECS can spread the data across different racks for increased high availability and redundancy. Each disk, node, rack and data center is considered a fault domain in ECS, and data is distributed in a way that maximizes availability.

5. Geo-replication: Unsealed chunk replication means better RPO
Before ECS 2.0, for geo-replication, the ECS software wrote an object to chunks (of size 128MB), waited for a chunk to fill up, and then replicated the chunk to a remote site in an asynchronous process. Although this strategy is efficient, the drawback is that if an entire site or rack goes down, there could be many chunks with less than 128MB of data that have not been replicated. To reduce the risk of data loss, the software now starts streaming data to the remote data center as soon as a chunk receives new data, rather than waiting for it to be sealed. This feature delivers an improved RPO.
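A toy model makes the RPO difference concrete. Only the 128MB chunk size comes from the description above; everything else is an illustrative simplification, not ECS internals.

```python
# Contrast sealed-chunk replication (pre-2.0) with streaming replication:
# how much written data is still unreplicated ("at risk") at the moment
# the primary site fails mid-write.
CHUNK_SIZE = 128 * 1024 * 1024  # bytes, per the 128MB chunk size above

def at_risk_sealed(bytes_written):
    """Only full (sealed) chunks have shipped; the partial chunk is at risk."""
    return bytes_written % CHUNK_SIZE

def at_risk_streaming(bytes_written, lag_bytes=0):
    """Replication starts as data arrives; only the in-flight lag is at risk."""
    return min(bytes_written, lag_bytes)
```

Under the sealed model the exposure can approach a full chunk per open object; under streaming it is bounded by the replication lag, which is the improvement in RPO.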

6. Geo-caching for better performance in multi-site environments & overall performance improvements
In a multi-site environment, data was always accessed from the primary site where it was originally written. This meant that every time customers in other sites accessed the data, it incurred WAN bandwidth costs as well as slower performance caused by WAN latency.

ECS 2.0 solves this problem by geo-caching the data on local disk at the secondary site, so that customers can access the data locally without a WAN transfer. This is most applicable to scenarios where the number of sites is greater than two.

ECS 2.0 has also stepped up the game in object performance when compared to earlier object platforms. Comparing the ECS Appliance to Atmos:

  • Small objects: writes 6X faster, reads 2X faster
  • Large objects: writes 5X faster, reads 9X faster

7. Temporary Site Failover
Temporary site failures like network drops are pretty common in data centers. ECS 2.0 has smart temporary site failover and failback features that allow applications to access their data even when the primary site is unavailable or unreachable. Delta writes can go to the secondary site. The ECS software will automatically re-sync the sites and reconcile the data when all the sites are operational and connected to each other again. Any conflicts that arise get resolved using algorithms built into the software layer.

8. Metering and Auditing
One of the common requirements for running a large-scale multi-tenant distributed storage environment is very detailed metering. ECS 2.0 provides key statistics for individual buckets and tenants, including capacity, object count, objects created, objects deleted and bandwidth consumption (inbound as well as outbound). The design and implementation satisfy the requirements of large-scale service providers, whether in a single managed customer or a multi-tenant shared environment.

The new software also enables auditing for buckets, which allows administrators to view activities regarding creation, update, and deletion of buckets, and any changes in bucket ownership. This is especially important for environments that have to be governed by specific regulations. The events can be accessed through the UI or the REST API.

9. Quotas
ECS 2.0 now allows administrators to set soft or hard quotas for buckets and tenants. This lets administrators set guard rails on consumption per application, creating a sandbox without impacting other application users.

Alerts are raised, and writes can be blocked past the set limit if the administrator so chooses. The administrator can also set up policies that lock a specific bucket or user if their application workload is impacting other tenants.
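As a rough sketch, soft/hard quota enforcement of this kind boils down to a projection check on each write. The function below is our illustration of the idea, not an ECS API or its actual policy fields.

```python
# Illustrative soft/hard quota check: block writes past the hard limit,
# raise an alert past the soft limit.
def check_quota(used_bytes, write_bytes, soft_limit, hard_limit):
    """Return (allowed, alert) for a proposed write against the quotas."""
    projected = used_bytes + write_bytes
    if projected > hard_limit:
        return (False, True)            # write blocked, alert raised
    return (True, projected > soft_limit)  # allowed; alert if past soft limit
```

The soft limit gives tenants headroom and an early warning, while the hard limit is what actually sandboxes a runaway application.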

10. Free & Frictionless Download of ECS Software
Today, EMC announced the “Free and Frictionless” version of ECS: a free download of the ECS software for development and testing purposes, with unlimited capacity and perpetual licensing. ECS made a bet on Docker even before Docker was GA, and every single ECS Appliance ships with Docker in it. Now we are making the same software available as a container for a broader audience. This allows developers, partners and customers to use and develop for ECS while enjoying access to a large developer community. ECS is downloaded as a Docker container, and installation can be done manually or automated through Puppet or Vagrant to run the software on one or more VMs or bare-metal servers. Stay tuned for more information on the EMC Pulse blog, the ECS Twitter account, and the EMC Community Network.

That’s ECS 2.0 in a nutshell. Of course, there will be more blog posts and white papers coming out soon. Customers, partners and developers have helped shape the 2.0 version, and we will continue to solicit feedback on which additional features should be added to the platform. If you have any feature requests, please let us know and we will incorporate them into our roadmap.

Datadobi and EMC: Fast, Reliable Data Migrations

Michael Jack

Global Sales Director at Datadobi


*The following is a guest post by Michael Jack, Global Sales Director at Datadobi.

Over the past 13 years, EMC Centera has been, and continues to be, the most reliable object storage and compliant archiving platform in the industry. While EMC continues to sell and support Centera, customers are also looking at migration strategies to gain new cloud-scale features and automation services.

That’s where Datadobi can help. Datadobi is an EMC Technology Connect business partner that offers DobiMiner, our migration software that migrates Centera customers to other storage platforms. In this blog, we’ll discuss Datadobi’s unparalleled migration experience, and we’ll share considerations and strategies on how to migrate Centera data to EMC Elastic Cloud Storage (ECS).

Centera is one of the most stable platforms in EMC’s storage portfolio, providing six nines of availability. It also has an excellent feature set, including the coveted Compliance Edition Plus model used by financial institutions around the world to ensure the immutability of their data. Centera created a new paradigm in the storage industry and set a standard for object storage that no other vendor has yet achieved.

While many use cases for Centera remain strong, the move to next-generation distributed cloud, mobility, and Big Data applications is driving many Centera customers to take advantage of next-generation object storage. And with many companies having experienced the cost, risk, and time associated with moving from one platform to another, they are rightly asking, “How do we move all that data without all that pain?”

So how do you move CAS data? Many EMC customers have experience with NAS migrations but have no idea how to move CAS data. To answer that question, let’s take a brief look at how CAS works. CAS data sits in a flat file space without a classic file structure, and accessing it requires the key known on Centera as the Content Address, or CA. When an application writes data (a user file) to a Centera, Centera creates a Content Descriptor File (CDF) and places the CA of the user file in the CDF. It then creates a CA for the CDF and passes that to the application, which stores it in its database. When the application needs to retrieve the file again, it passes the CA of the CDF to the Centera, which looks in the CDF to find the CA of the user file. It’s all pretty complex, and that sophistication is part of what makes Centera such a great product.
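The CDF indirection can be modeled in a few lines. Here SHA-256 stands in for Centera’s content addressing, and the in-memory dictionary for the flat file space; both are assumptions for illustration only.

```python
# Stripped-down model of CAS: a store keyed by content address, where the
# application only ever holds the CA of the CDF, never a file path.
import hashlib
import json

store = {}  # content address -> bytes (stand-in for the flat file space)

def _write(blob):
    """Store a blob under its content address and return the CA."""
    ca = hashlib.sha256(blob).hexdigest()
    store[ca] = blob
    return ca

def archive(user_file):
    """Write the user file, wrap its CA in a CDF, return the CDF's CA."""
    user_ca = _write(user_file)
    cdf = json.dumps({"user_ca": user_ca, "size": len(user_file)}).encode()
    return _write(cdf)  # this CA is what the application stores in its database

def retrieve(cdf_ca):
    """Resolve the CDF first, then fetch the user file it points to."""
    cdf = json.loads(store[cdf_ca])
    return store[cdf["user_ca"]]
```

This double lookup is exactly why a path-based copy tool cannot migrate CAS data: without the CA chain there is no way to locate a file at all.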

To migrate a CAS file to another platform (be it CAS, NAS, RESTful, OpenStack, etc.) you can’t use a tool such as Robocopy, because you have to use the CA to locate the file. You can achieve this only by using:

  • The application that wrote the data or
  • A specialized migration software such as DobiMiner

You may think migrating the data using the application that wrote it would be the easiest way; however, this is not the case. Applications migrate data by re-reading it from the Centera and then re-writing it to the new platform. Most applications are not designed for high-speed reads and writes. As a result, migrations done this way tend to be very lengthy affairs, resulting in additional expenses such as professional services and extended support contracts for the old platform.

DobiMiner takes a different approach: it quickly collects the list of data to be migrated directly from the Centera by parsing the CDFs into the DobiMiner instance. Because the CDFs contain the CA of the user file and the file sizes, the entire scope of the migration is completely understood. This allows the migration team to make informed decisions and have a predictable finish date before actually migrating the data. Once the scope is agreed, the data can be rapidly pumped to the new platform at terabytes a day.
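Because every CDF records the user file’s size, the full scope can be totalled before any data moves, which is what makes the finish date predictable. The sketch below assumes a throughput figure of our choosing; “terabytes a day” is the only rate the post itself claims.

```python
# Estimate migration duration from the file sizes recorded in the CDFs.
def migration_days(cdf_sizes_bytes, rate_tb_per_day=2.0):
    """Estimate wall-clock days to pump the listed files to the new platform."""
    total_tb = sum(cdf_sizes_bytes) / 1e12  # bytes -> terabytes (decimal)
    return total_tb / rate_tb_per_day
```

The same pre-computed list also drives informed decisions about what to migrate at all, as discussed below.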

Once the data has been copied to the new platform, DobiMiner reads back each individual file and compares it with the same file on the old platform to validate its integrity before declaring it fully migrated.

Mining all the CDFs also gives you a complete understanding of your data before migrating. For example, you’ll know whether the data is still recognized by the application, whether data has expired but not been deleted, and whether the replicated environments are in sync. All this information enables you to migrate only the data (and all the data) with true business value, ensuring the best ROI on the target platform.

CAS-to-non-CAS – a slightly different kettle of fish

While all of the above applies to both CAS-to-CAS migrations (such as Centera-to-Centera and Centera-to-ECS) and CAS-to-non-CAS migrations (such as Centera-to-NAS or a RESTful interface), moving data from a CAS to a non-CAS platform can involve additional complexity.

Applications write data to a Centera in one of two ways:

  • Through a file-system gateway or
  • Natively through the Centera API

Some applications use a file-system gateway (such as the Centera Universal Archive) to write the data to the Centera, and it is the file-system gateway that stores the CAs, not the application. The application database stores a file path, making it straightforward to migrate that data to a non-CAS platform: the file path known to the application can be duplicated on the new platform, making the migration transparent to the application.

On the other hand, applications using the API store the list of CAs in their database and therefore require an additional step in the migration process. First, a file-path naming scheme must be agreed for the new platform; a combination of fields in the CDF can be used for this, or DobiMiner can create a path itself. Second, the list of CAs in the application database must be replaced with the new file paths so that the application can access the files after the migration. This database update can occur in one of two ways:

  • Fully automated – the migration software automatically connects to the application database and overwrites the CA with the new file path.
  • Handshake – the migration software creates a file that maps the old CA to the new file path and this mapping is manually used by the application vendor to update the application database.
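The handshake option above can be sketched as a pure mapping step. The directory fan-out path scheme below is our assumption; in practice the scheme is agreed per migration, for example from fields in the CDF.

```python
# Derive a file path for each content address and emit the CA-to-path
# mapping that the application vendor would load into the app database.
def derive_path(ca, prefix="/archive"):
    """Fan the flat CA namespace out into subdirectories to keep them small."""
    return f"{prefix}/{ca[:2]}/{ca[2:4]}/{ca}"

def handshake_mapping(content_addresses):
    """Old CA -> new file path, one entry per migrated object."""
    return {ca: derive_path(ca) for ca in content_addresses}
```

Whether applied automatically or via the handshake file, the end state is the same: the application database holds paths the new platform understands instead of CAs.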

Unfortunately, some application vendors do not allow either method and only permit migrations through the application itself (making for a lengthy migration).

In conclusion, migrating Centera data to another platform can seem like a pretty daunting process, but it really isn’t. With DobiMiner, most of the difficult tasks are automated to ensure a simple, low-risk, predictable, and fast migration.

Avoid the pain. Take advantage of Datadobi and our partners’ expertise to perform your next migration and let your team focus on your business.

Ceph is free if your time is worth nothing!

Jyothi Swaroop

Director, Product Marketing at EMC


It’s ironic how we grow up listening to our parents tell us “Nothing in life is free,” yet the moment someone claims they have or do something for “free”, we forget that simple truth.

Is anything ever really free?
There are offerings out there today that claim they are open source, free, and always will be. However, if we remember what Mom and Dad said, then we need to look deeper. The time, overhead and hardware required to run these open source solutions are not free.

For this discussion, let’s take a look at Ceph and the Ceph “Enterprise” editions. Ceph is an open source distributed object store and file system that claims to be free. As an open source platform the code is indeed free, but if Ceph is free, why would any company pay to acquire a “commercial wrapper” for it, such as Inktank? When it comes to open source, large companies make their money by selling “enhanced” versions of the software along with professional consulting, services, and support.

Enterprise versions are not free, and are often expensive. Customers pay more in server hardware, server OS licenses and disk drives, and licensing and support can run as much as $4K per server.

Now, some will say, “I will go with the free version of the open source solution and not the ‘enhanced’ or ‘enterprise’ edition offered; it’s more cost effective and I can support it myself.” It is definitely an option, and in some instances may make sense, but before you make that commitment, ask yourself:
• Can I get 24×7 worldwide support – the support I need, when I need it?
• Do I have to wait for a community to help solve my problem and even if a fix is suggested, will it work for me, in my environment?
• Will my customers wait till tomorrow or the next week for a fix?
• Does ‘ongoing support by committee’ really work?
• What am I willing to give up?

If it is free, do you get what you pay for?

When it comes to software-defined scale-out block storage for high-performance applications and/or delivering Infrastructure-as-a-Service (IaaS), “free” may not be better. Will you simply be getting what you pay for?

Starting with installation: as a software-defined offering, Ceph does not constrain or confine you the way current hyper-converged appliances do. However, installation is actually extremely complex. Architecting and running Ceph as a storage platform requires deep Linux and Ceph expertise and experience. It requires a multi-module, multi-step deployment process (with different processes for each OS), which complicates management and incurs a larger performance hit on the underlying hardware. Ceph also takes a “non-native” layered approach to delivering block storage, where block sits on top of object. As David Noy, VP of Product Management at EMC, pointed out in his blog last month, with a layered approach “problems come when you have a system that is designed to optimize around the underlying abstraction and not the service layered on top”. This is evident in Ceph’s approach to block (the RADOS Block Device, RBD), which carries heavy overhead, resulting in high latency and an inability to exploit flash media.

OK, so you know there will be a great deal of work to set up and manage Ceph. You still feel you are ready to deal with this cryptic approach, including compiling, decompiling and manually editing CRUSH maps; limited system visibility through a command-line interface (CLI); and manual placement group (PG) planning and repair. Yes, the free approach, even with all of this, will meet your needs. Maybe, but let’s not forget what really matters. When delivering IaaS or high-performance applications, delays in response are simply not acceptable to your customers or users. How does Ceph measure up where it really counts: performance and scalability?

The Proof is in the Numbers
We recently tested Ceph against EMC ScaleIO and the findings were clear as day. Both were tested on the same hardware and configuration with the following assumptions:
• Test “SSD only” using a small volume that will fit the SSD capacity
• Test “SSD+HDD” using a small+large volume spanning HDD capacity
• Test SSD as Cache for HDD using a small+large volume spanning HDD capacity
• Test a Mixed workload of 70% Reads, 30% Writes, 8KB IOs

Findings:
• ScaleIO achieved ~7X better performance than the best Ceph IOPs value for a drive limited configuration
• ScaleIO achieved ~15X better performance than Ceph, when the drives are not the limit
• ScaleIO has ~24X better Response Time with an SSD only configuration
• ScaleIO can support the IOPs at 1/3rd the latency of Ceph; as a result, there is no need to second-guess performance for applications you run on ScaleIO.

Similar to Ceph, EMC ScaleIO is a software-only solution that uses existing commodity servers’ local disks and LAN to create a software-defined, scale-out SAN. However, EMC ScaleIO delivers elastic and scalable performance and capacity on demand, beyond what Ceph is capable of for enterprise deployments. ScaleIO also does not require additional servers for providing storage, and it supports multiple platforms, including OpenStack via a Cinder plugin. It requires 1/5th to 1/10th the number of drives that Ceph needs to deliver the same performance, which results in significant floor-space and power savings.

The evidence speaks for itself when it comes to performance, scale and enterprise-grade capabilities: sometimes you just get what you pay for. But don’t just take our word for it. Here is a perfect example of the kind of issues a company can face, including the potential loss of data, when delivering services with Ceph. Also, Enterprise Strategy Group (ESG) recently published a Lab Spotlight demonstrating extremely high IOPS performance and near-linear scaling of ScaleIO software on commodity hardware.

If you STILL want to be able to use software for free BEFORE you make a long-term, strategic commitment, EMC provides you the same opportunity with ScaleIO. As of May 2015, EMC is offering a free download of ScaleIO for non-production use, with as much capacity and time as you want. You can experience all of the features and capabilities and see for yourself why an enterprise-grade software-defined scale-out SAN with EMC ScaleIO is better than “free” with Ceph. Virtual Geek is ready! Are you?
