
Analyst firm IDC evaluates EMC Isilon: Lab-validation of scale-out NAS file storage for your enterprise Data Lake

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

A Data Lake should now be a part of every big data workflow in your enterprise. By consolidating file storage for multiple workloads onto a single shared platform based on scale-out NAS, you can reduce costs and complexity in your IT environment and make your big data workflows efficient, agile and scalable.

That’s the expert opinion in analyst firm IDC’s recent Lab Validation Brief: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016. As the lab validation report concludes: “IDC believes that EMC Isilon is indeed an easy-to-operate, highly scalable and efficient Enterprise Data Lake Platform.”

The Data Lake Maximizes Information Value

The Data Lake model of storage represents a paradigm shift from the traditional linear enterprise data flow model. As data and the insights gleaned from it increase in value, enterprise-wide consolidated storage becomes a hub around which the ingestion and consumption systems work. This enables enterprises to bring analytics to the data in place, avoiding both the expense of multiple storage systems and the time spent on repeated ingestion and analysis.

But pouring all your data into a single shared Data Lake would put serious strain on traditional storage systems – even without the added challenges of data growth. That’s where the virtually limitless scalability of EMC Isilon scale-out NAS file storage makes all the difference…

The EMC Data Lake Difference

The EMC Isilon Scale-out Data Lake is an Enterprise Data Lake Platform (EDLP) based on Isilon scale-out NAS file storage and the OneFS distributed file system.

As well as meeting the growing storage needs of your modern datacenter with massive capacity, it enables big data accessibility using traditional and next-generation access methods – helping you manage data growth and gain business value through analytics. You can also enjoy seamless replication of data from the enterprise edge to your core datacenter, and tier inactive data to a public or private cloud.

We recently reached out to analyst firm IDC to lab-test our Isilon Data Lake solutions – here’s what they found in 4 key areas…

  1. Multi-Protocol Data Ingest Capabilities and Performance

Isilon is an ideal platform for enterprise-wide data storage and provides a powerful centralized storage repository for analytics. With the multi-protocol capabilities of OneFS, you can ingest data via NFS, SMB and HDFS. This makes the Isilon Data Lake a user-friendly platform for big data workflows, where you need to ingest data quickly and reliably via the protocols best suited to the workloads generating the information. Using native protocols enables in-place analytics without the need for data migration, helping your business gain data insights more rapidly.


IDC validated that the Isilon Data Lake offers excellent read and write performance for Hadoop clusters accessing HDFS via OneFS, compared with Hadoop clusters using direct-attached storage (DAS). In the lab tests, Isilon performed:

  • nearly 3x faster for data writes
  • over 1.5x faster for reads and read/writes.

As IDC says in its validation: “An Enterprise Data Lake platform should provide vastly improved Hadoop workload performance over a standard DAS configuration.”

  2. High Availability and Resilience

Policy-based high availability capabilities are needed for enterprise adoption of Data Lakes. The Isilon Data Lake is able to cope with multiple simultaneous component failures without interruption of service. If a drive or other component fails, it only has to recover the specific affected data (rather than recovering the entire volume).

IDC validated that a disk failure on a single Isilon node has no noticeable performance impact on the cluster. Replacing a failed drive is a seamless process and requires little administrative effort. (This is in contrast to traditional DAS, where the process of replacing a drive can be rather involved and time consuming.)

Isilon can even cope easily with node-level failures. IDC validated that a single-node failure has no noticeable performance impact on the Isilon cluster. Furthermore, the operation of removing a node from the cluster, or adding a node to the cluster, is a seamless process.

  3. Multi-tenant Data Security and Compliance

Strong multi-tenant data security and compliance features are essential for an enterprise-grade Data Lake. Access zones are a crucial part of the multi-tenancy capabilities of Isilon OneFS. In tests, IDC found that Isilon provides no-crossover isolation between Hadoop instances for multi-tenancy.

Another core component of secure multi-tenancy is the ability to provide a secure authentication and authorization mechanism for local and directory-based users and groups. IDC validated that the Isilon Data Lake provides multiple federated authentication and authorization schemes. User-level permissions are preserved across protocols, including NFS, SMB and HDFS.

Federated security is an essential attribute of an Enterprise Data Lake Platform, with the ability to maintain confidentiality and integrity of data irrespective of the protocols used. For this reason, another key security feature of the OneFS platform is SmartLock – specifically designed for deploying secure and compliant (SEC Rule 17a-4) Enterprise Data Lake Platforms.

In tests, IDC found that Isilon enables a federated security fabric for the Data Lake, with enterprise-grade governance, regulatory and compliance (GRC) features.

  4. Simplified Operations and Automated Storage Tiering

The Storage Pools feature of Isilon OneFS allows administrators to apply common file policies across the cluster locally – and extend them to the cloud.

Storage Pools consists of three components:

  • SmartPools: Data tiering within the cluster – essential for moving data between performance-optimized and capacity-optimized cluster nodes.
  • CloudPools: Data tiering between the cluster and the cloud – essential for implementing a hybrid cloud, and placing archive data on a low-cost cloud tier.
  • File Pool Policies: Policy engine for data management locally and externally – essential for automating data movement within the cluster and the cloud.
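To illustrate what a file pool policy does, here is a minimal Python sketch of an age-based placement rule. It is a hypothetical model, not OneFS configuration syntax or an Isilon API: the tier names, the 30- and 180-day thresholds, and the `FileRecord`, `choose_tier` and `plan_moves` names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tier names, for illustration only (not OneFS pool or tier names).
PERFORMANCE_TIER = "performance-nodes"  # on-cluster, performance-optimized (SmartPools-style)
CAPACITY_TIER = "capacity-nodes"        # on-cluster, capacity-optimized (SmartPools-style)
CLOUD_TIER = "cloud-archive"            # off-cluster cloud tier (CloudPools-style)


@dataclass
class FileRecord:
    """Minimal metadata an age-based policy needs about one file."""
    path: str
    last_accessed: datetime


def choose_tier(f: FileRecord, now: datetime,
                warm_after: timedelta = timedelta(days=30),
                cold_after: timedelta = timedelta(days=180)) -> str:
    """Pick a target tier from the time elapsed since the file was last accessed."""
    idle = now - f.last_accessed
    if idle >= cold_after:
        return CLOUD_TIER
    if idle >= warm_after:
        return CAPACITY_TIER
    return PERFORMANCE_TIER


def plan_moves(files, current_tiers, now):
    """List (path, from_tier, to_tier) for every file whose target tier differs from its current one."""
    moves = []
    for f in files:
        target = choose_tier(f, now)
        current = current_tiers.get(f.path)
        if current != target:
            moves.append((f.path, current, target))
    return moves


if __name__ == "__main__":
    now = datetime(2016, 3, 1)
    files = [
        FileRecord("/data/active.log", now - timedelta(days=2)),
        FileRecord("/data/q3_report.csv", now - timedelta(days=90)),
        FileRecord("/data/2014_archive.tar", now - timedelta(days=400)),
    ]
    # Assume everything currently sits on the performance tier.
    current = {f.path: PERFORMANCE_TIER for f in files}
    for path, src, dst in plan_moves(files, current, now):
        print(f"{path}: {src} -> {dst}")
```

Run as written, the sketch plans to move the 90-day-old report to the capacity tier and the 400-day-old archive to the cloud tier, leaving the active log where it is. It only computes a plan; a production policy engine could also match on attributes such as file name, path or size, and would carry out the data movement itself.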

As IDC confirmed in testing, Isilon’s federated data tiering enables IT administrators to optimize their infrastructure by automating data placement onto the right storage tiers.

The expert verdict on the Isilon Data Lake

IDC concludes that: “EMC Isilon possesses the necessary attributes such as multi-protocol access, availability and security to provide the foundations to build an enterprise-grade Big Data Lake for most big data Hadoop workloads.”

Read the full IDC Lab Validation Brief for yourself: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016.

Learn more about building your Data Lake with EMC Isilon.

The EMC Portfolio Approach to Next-generation Storage Technology

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

EMC is approaching the emerging revolution in storage technology development with a portfolio of advanced tools and solutions. These storage solutions are engineered to adapt to and grow with the next generation of big and unstructured data generated by diverse industries.

As data begins to inundate the marketplace, file types and sizes are becoming more varied, more changeable and larger than ever before. IT departments are tasked with managing constant streams of data from the cloud, social platforms, mobile devices and the web, as well as from other sources that traditional BI and on-premise content management infrastructures never anticipated. Software-defined storage (SDS) represents the leading edge of storage solutions, and EMC is positioned at the helm.

EMC’s depth of industry expertise and its experience in emerging-technology software development have informed the creation of a suite of solutions that can be tailored to meet the needs of organizations transitioning to SDS, those working to incorporate SDS technology into their existing storage infrastructure, or those looking to redefine how they use, gather, maintain and manage the constant influx of data on which their businesses depend.

What sets EMC apart is a grander vision for SDS as a driving concept that should span how storage will be delivered now and in the future. EMC is embracing, redefining and disrupting the industry standards to deliver vendor neutral, open-standard APIs that allow products to be used as a standalone platform or part of a full cloud deployment, like OpenStack. Open source community editions increase flexibility and reduce risk.

EMC software-defined storage products provide new organization and delivery models for file, block, object, HDFS and hyper-converged storage, as well as for next-gen rack-scale, data center and hyper scale-out storage. These products include IsilonSD Edge (scale-out file storage), ScaleIO (scale-out block storage) and ECS (cloud-scale object storage). Thanks to their flexibility, EMC SDS solutions are available either as appliances or as free and frictionless downloadable software that can be installed on industry-standard hardware.

The sophisticated nature of EMC’s broad selection of SDS solutions allows easy integration into existing content management schemes without massive hardware replacement investments, new staff integration or performance downtime. SDS allows complete control over protocols, access, interface and data management within the systems, removing extraneous nodes, access points and machinery from the content management experience. This achieves more streamlined IT processes with reduced hardware, software, training and staffing costs and the ability to grow and adapt as the advancements in data collection and formatting continue.

The portfolio of SDS solutions developed by EMC allows organizations not only to better collect, protect and store next-gen data types and volumes, but also to grow in a streamlined way as technology transforms, providing multi-generational, cost-effective options for the future of IT sectors everywhere.

Learn more about how EMC SDS solutions can prepare your enterprise for the future of big data. Stay up to date on everything SDS at www.emc.com/sds.

New EMC Isilon Products Now Available for Your Data Lake Journey

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

In November 2015, EMC announced the upcoming release of new Isilon products to help organizations expand their data lake to enterprise edge locations and the cloud while strengthening it at the core data center. We’re now very pleased to report the immediate availability of IsilonSD Edge, Isilon OneFS 8.0 (formerly OneFS.NEXT) and Isilon CloudPools. Together, these products can transform the way your organization stores and uses data, whether at the edge, the core or the cloud, by harnessing the power of the data lake.

A data lake based on EMC Isilon scale-out NAS offers a number of important advantages:  With it, organizations can consolidate file-based, unstructured data, eliminate costly storage silos, simplify management, increase data protection, and gain more value from their data assets.  Leveraging built-in multi-protocol capabilities, Isilon supports a wide range of applications and workloads on a single platform—including data analytics that can be used to gain better insight and identify new opportunities for organizations to accelerate their business.

Start at the Core
A great place to begin your journey to the data lake is at your core data center.  If your organization has not already done so, you’ll want to contact your EMC representative or authorized reseller about consolidating your unstructured data with Isilon storage. They’ll also be able to describe key Isilon advantages including:

  • Simplified management: Single file system, single volume, global namespace
  • Massively scalable: Scales from 16 TB to over 50 PB in a single cluster
  • Unmatched efficiency: Over 80% storage utilization with automated tiering and data deduplication options
  • Enterprise data protection: Efficient backup and disaster recovery, and N+1 through N+4 redundancy
  • Robust security and compliance options: RBAC, Access Zones, WORM data security, File System Auditing, Data At Rest Encryption with SEDs, STIG hardening, CAC/PIV Smartcard authentication, FIPS OpenSSL support
  • Operational flexibility: Multi-protocol support including NFS, SMB, HTTP, FTP and HDFS, plus object and cloud access including OpenStack Swift

If you’re already using Isilon, then you’ll want to take advantage of the new Isilon OneFS 8.0 operating system which extends these benefits by providing enterprise-grade, continuous data center service and non-disruptive upgrade capabilities. It also enables you to extend your data lake to edge locations and the cloud.  You may also download the new EMC Isilon OneFS 8.0 Simulator at no charge for non-production use so that you and your team can create a simulated environment and get a feel for the interface and administration tasks available in the latest Isilon software release.

Extend to the Edge
If your organization has a network of edge locations, including remote and branch offices, that produce and store data locally, your next destination on your data lake journey should probably be the edge. Our research shows that, for most organizations, data at edge locations is growing, and these locations are often inefficient islands of storage running with limited IT resources and inconsistent data protection practices. Data at the edge is also typically outside of the business data lake, making it difficult to incorporate into data analytics projects.

EMC IsilonSD Edge addresses these challenges with a Software-Defined Storage solution that combines the power of Isilon scale-out NAS with the economy of industry-standard hardware in a VMware ESX environment. IsilonSD Edge simplifies management at the edge while providing up to 36 TB of storage capacity per installation. It lets you consolidate edge data to the core, thereby extending the data lake to the edge, and it increases data protection by automatically replicating data to the core.

IsilonSD Edge is now available in two versions—a ‘free and frictionless’ download for non-production, trial use, and a licensed version for production use which can be obtained through your EMC representative, authorized reseller or the EMC Store.

Integrate with the Cloud
Nearly all businesses today want to leverage the cloud to cut costs, simplify IT management, and gain virtually limitless storage capacity. But the question for many is how to integrate their on-premise storage infrastructure with the cloud. Isilon CloudPools software lets you address rapid data growth and optimize data center storage resources by using the cloud as a highly economical storage tier with massive storage capacity for cold or frozen data that is rarely used or accessed. It also allows you to select from a number of public cloud services or use a private cloud based on EMC Elastic Cloud Storage (ECS) or other EMC alternatives.

Isilon CloudPools uses policy-based, automated tiering that enables you to seamlessly integrate with the cloud as an additional storage tier from the Isilon cluster at your core data center.  This enables more valuable on-premise storage resources to be used for more active data and applications. To secure data that is archived in the cloud, CloudPools encrypts data that is transmitted from the Isilon cluster at the core data center to the cloud storage service. This data remains encrypted in the cloud until it is retrieved and returned to the Isilon cluster at the data center.

We look forward to hearing about your own journey to the data lake!

What do Analytics and the Suez Canal have in common?

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC


1859: Egyptian workers under French engineers begin construction of the Suez Canal. A canal across the Isthmus of Suez would cut the ocean distance from Europe to Asia by up to 6,000 miles, and it could be built at sea level, without any locks. Avoiding the longer voyage would reduce risk, cut the need for extra supplies and require fewer sailors. When the canal was completed ten years later, its effect on world trade was immediate. This wonder not only shrank the globe geographically but also shortened the time it traditionally took to realize business and economic benefits.

The economic benefits of the Suez Canal and In-Place Hadoop analytics

My metaphor here is that, like sailing the old 12,000+ mile route around the coast of Africa, the traditional method of storing and moving data for analysis is a long and arduous journey that delays your business and economic benefits. Just as the pre-Suez Canal journey from Europe to Asia required significantly more time, larger ships, more crew and more provisions, the traditional route to analytics requires more time (copying and moving data), bigger ships (3x storage capacity), more crew (IT resources) and more provisions (overhead). Now, imagine taking the EMC data lake route that reduces overhead, takes much less time, and offers increased flexibility. The EMC Isilon data lake with its native Hadoop Distributed File System (HDFS) support is the modern route to actionable results. It effectively brings Hadoop to where your data exists today, rather than requiring you to ship and replicate your data to a separate Hadoop stack for analysis.

The Open Data Platform Initiative (ODPi), IBM and EMC Isilon

The Isilon data lake’s shared storage architecture natively supports HDFS and the ODPi common platform. IBM, EMC, Pivotal and Hortonworks established the ODPi to create a standardized, common platform for Hadoop analytics that enables organizations to realize business results more quickly. Which brings us to the EMC and IBM analytics collaboration. Because IBM BigInsights is part of the ODPi, there’s now another choice for in-place analytics with the EMC data lake. It quickly became evident to both EMC and IBM that there was strong customer demand for IBM BigInsights and EMC Isilon to align on a data lake approach to analytics. The EMC and IBM collaboration enables analytics on your data right where it is, within the EMC Isilon data lake, while IBM BigInsights provides the separate compute resources that analyze the data. Now you’re on the expedited route to business analytics with EMC Isilon and IBM.

Whether you are looking to gain a 360-degree view of your customers, attempting to prevent fraud in the financial markets, or making smarter infrastructure investments, the increased efficiencies of the partnership allow you to be nimble in understanding and reacting to what your data is telling you.

About 15,000 ships make the 11-hour journey through the canal each year. It’s estimated that the canal carries roughly 8 percent of the world’s shipping, and it is recognized as one of the most important waterways in the world. Forrester Research1 predicts that big data analytics will be the number 2 priority of corporations and states that Hadoop has already disrupted the economics of data. Just as the Suez Canal offers key business benefits for trade between Europe and Asia, so does in-place analytics. Here’s how:

  • No moving and copying of data
  • No 3X replication of data
  • Increased storage utilization efficiency (to an average of 80%)
  • Enterprise data resiliency and availability
  • Enterprise grade security features
  • Quicker time to business insight
  • Smarter infrastructure investments
  • Reduction of CAPEX and OPEX
  • Increased choice and flexibility
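The capacity arithmetic behind the “No 3X replication” and “80% utilization” points can be sketched in a few lines of Python. This is a back-of-the-envelope model only: it assumes a conventional DAS-based Hadoop cluster keeps three full copies of every block (the HDFS default replication factor), takes the 80% average utilization figure above at face value, and ignores scratch space, compression and other overheads. The function names are illustrative.

```python
def raw_capacity_for_replication(data_tb: float, copies: int = 3) -> float:
    """Raw capacity needed when every block is stored as `copies` full replicas."""
    return data_tb * copies


def raw_capacity_for_utilization(data_tb: float, utilization: float = 0.80) -> float:
    """Raw capacity needed when `utilization` of the raw capacity holds user data."""
    return data_tb / utilization


if __name__ == "__main__":
    data_tb = 100.0
    das = raw_capacity_for_replication(data_tb)   # 3 copies of 100 TB = 300 TB raw
    lake = raw_capacity_for_utilization(data_tb)  # 100 TB at 80% utilization = 125 TB raw
    print(f"3x replication:  {das:.0f} TB raw ({data_tb / das:.0%} usable)")
    print(f"80% utilization: {lake:.0f} TB raw ({data_tb / lake:.0%} usable)")
    print(f"Raw capacity saved: {das - lake:.0f} TB")
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk with three-way replication but about 125 TB at 80% utilization, a difference of 175 TB. Actual figures depend on the protection level, cluster size and workload.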

To return to the metaphor: the modern route to analytics shortens the time to benefit and can be achieved with smaller ships, fewer crew and fewer provisions.

Where can I get more details?

The EMC Hadoop Starter Kit for IBM BigInsights is available and has instructions on how to build and deploy IBM BigInsights Open Platform with EMC Isilon. You can also learn more about the Hadoop enabled EMC Data Lake here.

1 Source: Forrester Predictions 2015: Hadoop Will Become a Cornerstone of Your Business Technology Agenda

Data Lake 2.0: Edge to Core to Cloud

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

Welcome to Data Lake 2.0!

EMC has just announced three new products that together enable you to expand your data lake to enterprise edge locations (e.g., remote and branch offices) and the cloud while strengthening the data lake at the core data center: EMC IsilonSD Edge, Isilon OneFS.Next (the future version of the Isilon OneFS operating system), and Isilon CloudPools.

This new edge-to-core-cloud approach offers new opportunities for you to harness the power of a data lake to increase efficiency, cut costs, improve data protection, and get more value from your data assets.

Opportunities at the Edge
Businesses with networks of remote or branch offices often have challenges managing and protecting data generated at these locations. Data at edge locations is growing. A recent ESG study1 showed that 68% of organizations now have more than 10 TB of data per branch office as compared to just 23% in 2011. Edge locations are often inefficient islands of storage with inconsistent data protection practices. Compounding the problem, IT resources at these locations are typically very limited. Data at the edge is also generally outside the business data lake; this means it can’t easily be incorporated into data analytics projects.

EMC IsilonSD Edge is designed to address these challenges and offers key advantages for businesses with edge locations. EMC IsilonSD is a new family of Software-Defined Storage products that leverages the power of EMC Isilon scale-out NAS and the Isilon OneFS operating system. IsilonSD Edge is the first offering of the IsilonSD family and provides Software-Defined Storage solutions specifically for edge locations. IsilonSD Edge runs on commodity hardware (x86 servers) in a VMware ESX environment.

IsilonSD Edge will be offered in two versions: a ‘free and frictionless’ download for non-production use and a licensed version for production use. Designed to simplify data and storage management at the edge, IsilonSD Edge scales to 36 TB of storage capacity in a single installation. Importantly, IsilonSD Edge allows you to consolidate data to the core data center and thereby extend the data lake to include edge locations. It also increases data protection at the edge by automatically replicating data to the core.

Opportunities at the Core
A data lake offers many advantages for organizations across a wide range of industries. By consolidating file-based, unstructured data on EMC Isilon—the #1 scale-out NAS storage platform in the industry—you can eliminate costly storage silos, simplify management, increase data protection, and get more value from your data assets. With built-in multi-protocol capabilities, Isilon storage solutions can support a wide range of traditional and next-generation applications on a single platform. This includes powerful Big Data analytics that can help your organization gain insight and identify new opportunities to accelerate your business. The new Isilon OneFS.Next operating system extends these benefits by offering enterprise-grade continuous service for the data center with nondisruptive upgrade capabilities, enabling the new Data Lake 2.0 edge-to-core-cloud approach.

Opportunities at the Cloud
Nearly all businesses today are looking to leverage the cloud to cut costs, simplify IT management, and gain virtually limitless storage capacity. But the question for many is how to integrate their on-premise storage infrastructure with the cloud. The new Isilon CloudPools software addresses this need. CloudPools provides policy-based, automated tiering that lets you seamlessly integrate with the cloud as an additional storage tier from the Isilon cluster at your data center. This allows you to address rapid data growth and optimize data center storage resources by using the cloud as a highly economical storage tier with massive storage capacity for cold or frozen data that is rarely used or accessed. In this way, more valuable on-premise storage resources can be used for more active data and applications. Another key advantage is that the movement of this data is transparent to users and applications.

With CloudPools, you can select from a number of public cloud services, including Amazon AWS and Microsoft Azure, or use a private cloud based on EMC Elastic Cloud Storage. To secure data that is archived in the cloud, CloudPools encrypts data that is transmitted from the Isilon cluster at the core data center to the cloud storage service. This data remains encrypted in the cloud until it is retrieved and returned to the Isilon cluster at the data center.

Tips to Get Started
EMC IsilonSD Edge, Isilon CloudPools, and Isilon OneFS.Next will all be available in early 2016. In the meantime, you’ll want to have your team become more familiar with these new products and learn more about the advantages that Data Lake 2.0 can provide your organization. A great way to get started is to visit our website. If you have any questions, be sure to contact your local EMC sales representative or authorized EMC reseller.

We look forward to hearing about your success!

1 Source: ESG Research Report, Remote Office/Branch Office Technology Trends, May 2015.
