Posts Tagged ‘Data Lake’

Unwrapping Machine Learning

Ashvin Naik

Cloud Infrastructure Marketing at Dell EMC

In a recent IDC spending guide, the "Worldwide Cognitive Systems and Artificial Intelligence Spending Guide," some fantastic numbers were thrown out in terms of opportunity and growth: a 50+ percent CAGR, with verticals pouring billions of dollars into cognitive systems. One of the key components of cognitive systems is Machine Learning.

According to Wikipedia, Machine Learning is a subfield of computer science that gives computers the ability to learn without being explicitly programmed. Just these two pieces of information were enough to get me interested in the field.

After hours of daily searching, digging through inane babble and noise across the internet, the understanding of how machines can learn evaded me for weeks, until I hit the jackpot. A source that shall not be named pointed me to a "secure by obscurity" share that held exact and valuable insights on machine learning. It was so simple and elegant, and it completely made sense to me.

Machine Learning was not all noise; it works on a very simple principle. Imagine there is a pattern in this world that can be used to forecast or predict the behavior of some entity. There is no mathematical notation available to describe the pattern, but if you have the data that can be used to plot the pattern, you can use Machine Learning to model it. Now, this may sound like a whole lot of mumbo jumbo, but allow me to break it down in simple terms.

Machine learning can be used to understand patterns so you can forecast or predict anything, provided:

  • You are certain there is a pattern.
  • You do not have a mathematical model to describe the pattern.
  • You have the data to try to figure out the pattern.

Voilà, this makes so much sense already. If you have data and know there is a pattern but don't know what it is, you can use machine learning to find it. The applications for this are endless, from natural language processing and speech-to-text to predictive analytics. The most important is forecasting, something we do not give enough credit these days. The most critical component of Machine Learning is data: you should have the data. If you do not have data, you cannot find the pattern.
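
Those three conditions can be pictured with a toy sketch in plain Python. The data here is entirely invented, and fitting a straight line by least squares is just one of the simplest possible "learning" methods; the point is only that the model comes from the data, not from a known formula.

```python
import random

# Hypothetical data: we suspect hours studied relate to exam scores,
# but we have no formula -- only noisy observations (all numbers invented).
random.seed(42)
hours = [x * 0.2 for x in range(50)]
scores = [2.0 * h + 1.0 + random.gauss(0, 0.5) for h in hours]

# "Learning": estimate the pattern purely from the data via least squares.
n = len(hours)
mean_h = sum(hours) / n
mean_s = sum(scores) / n
slope = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores)) \
        / sum((h - mean_h) ** 2 for h in hours)
intercept = mean_s - slope * mean_h

# The fitted model can now forecast scores for inputs it has never seen.
def predict(h):
    return slope * h + intercept
```

With enough data, the recovered slope and intercept sit close to the hidden pattern that generated the observations, which is exactly the "find the pattern from the data" idea described above.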

As a cloud storage professional, this is a huge insight. You should have data. Pristine, raw data coming from the systems that generate it, sort of like a tip straight from the horse's mouth. I know exactly where my products fit in. We are able to ingest, store, protect and expose the data for any purpose in its native format, complete with the metadata, all through one system.

We have customers in the automobile industry leveraging our multi-protocol cloud storage across 2,300 locations in Europe to capture data from cars on the roads. They are using proprietary Machine Learning systems to look for patterns in how their customers, the car owners, use their products in the real world, informing the design of better, more reliable and more efficient cars. We have customers in the life-sciences business saving lives by looking at patterns of efficacy and effective therapies for terminal diseases. Our customers in retail are using Machine Learning to detect fraud and protect their customers. This goes on and on.

I personally do not know the details of how they make it happen, but this is the world of the third platform. There are so many possibilities and opportunities ahead if only we have the data. Talk to us and we can help you capture, store and secure your data so you can transform humanity for the better.

 

Learn more about how Dell EMC Elastic Cloud Storage can fit into your Machine Learning Infrastructure

 

 

Why Healthcare IT Should Abandon Data Storage Islands and Take the Plunge into Data Lakes

One of the most significant technology-related challenges of the modern era is managing data growth. As healthcare organizations adopt new data-generating technology, and as medical record retention requirements evolve to span decades or longer, data continues to rise exponentially (already growing at 48 percent each year, according to the Dell EMC Digital Universe Study).

Let’s start by first examining the factors contributing to the healthcare data deluge:

  • Longer legal retention times for medical records – in some cases up to the lifetime of the patient.
  • Digitization of healthcare and new digitized diagnostics workflows such as digital pathology, clinical next-generation sequencing, digital breast tomosynthesis, surgical documentation and sleep study videos.
  • With more digital images to store and manage, there is also an increased need for bigger picture archive and communication system (PACS) or vendor-neutral archive (VNA) deployments.
  • Finally, more people are having these digitized medical tests (especially given the large aging population), resulting in a higher number of yearly studies with increased data sizes.

Healthcare organizations also face frequent and complex storage migrations, rising operational costs, storage inefficiencies, limited scalability, increasing management complexity and storage tiering issues caused by storage silo sprawl.

Another challenge is the growing demand to understand and utilize unstructured clinical data. To mine this data, a storage infrastructure is necessary that supports the in-place analytics required for better patient insights and the evolution of healthcare that enables precision medicine.

Isolated Islands Aren’t Always Idyllic When It Comes to Data

The way that healthcare IT has approached data storage infrastructure historically hasn’t been ideal to begin with, and it certainly doesn’t set up healthcare organizations for success in the future.

Traditionally, when adding new digital diagnostic tools, healthcare organizations provided a dedicated storage infrastructure for each application or diagnostic discipline. For example, to deal with the growing storage requirements of digitized X-rays, an organization would create a new storage system solely for the radiology department. As a result, isolated storage silos, or data islands, must be individually managed, making processes and infrastructure complicated and expensive to operate and scale.

Isolated silos further undermine IT goals by increasing the cost of data management and compounding the complexity of performing analytics, which may require copying large amounts of data into another dedicated storage infrastructure that can't be shared with other workflows. Even maintaining these silos is involved and expensive because tech refreshes require migrating medical data to new storage. Each migration, typically performed every three to five years, is labor-intensive and complicated. Frequent migrations not only strain resources but take IT staff away from projects aimed at modernizing the organization, improving patient care and increasing revenue.

Further, silos make it difficult for healthcare providers to search data and analyze information, preventing them from gaining the insights they need for better patient care. Healthcare providers are also looking to tap potentially important medical data from Internet-connected medical devices or personal technologies such as wireless activity trackers. If healthcare organizations are to remain successful in a highly regulated and increasingly competitive, consolidated and patient-centered market, they need a simplified, scalable data management strategy.

Simplify and Consolidate Healthcare Data Management with Data Lakes

The key to modern healthcare data management is to employ a strategy that simplifies storage infrastructure and storage management and supports multiple current and future workflows simultaneously. A Dell EMC healthcare data lake, for example, leverages scale-out storage to house data for clinical and non-clinical workloads across departmental boundaries. Such healthcare data lakes reduce the number of storage silos a hospital uses and eliminate the need for data migrations. This type of storage scales on the fly without downtime, addressing IT scalability and performance issues and providing native file and next-generation access methods.

Healthcare data lake storage can also:

  • Eliminate storage inefficiencies and reduce costs by automatically moving data that can be archived to denser, more cost-effective storage tiers.
  • Allow healthcare IT to expand into private, hybrid or public clouds, enabling IT to leverage cloud economies by creating storage pools for object storage.
  • Offer long-term data retention without the security risks of the public cloud or the loss of data sovereignty; the same cloud expansion can be utilized for next-generation use cases such as healthcare IoT.
  • Enable precision medicine and better patient insights by fostering advanced analytics across all unstructured data, such as digitized pathology, radiology, cardiology and genomics data.
  • Reduce data management costs and complexities through automation, and scale capacity and performance on demand without downtime.
  • Eliminate storage migration projects.

 

The greatest technical challenge facing today’s healthcare organizations is the ability to effectively leverage and manage data. However, by employing a healthcare data management strategy that replaces siloed storage with a Dell EMC healthcare data lake, healthcare organizations will be better prepared to meet the requirements of today’s and tomorrow’s next-generation infrastructure and usher in advanced analytics and new storage access methods.

 

Get your fill of news, resources and videos on the Dell EMC Emerging Technologies Healthcare Resource Page

 

 

Your Data Lake Is More Powerful and Easier to Operate with New Dell EMC Isilon Products

Karthik Ramamurthy

Director Product Management
Isilon Storage Division at Dell EMC

Earlier this year Dell EMC released a suite of Isilon products designed to enable your company’s data lake journey. Together IsilonSD Edge, Isilon OneFS 8.0, and Isilon CloudPools transformed the way your organization stores and uses data by harnessing the power of the data lake. Today we are pleased to announce all three of these products have been updated and further enhanced to make your data lake even more powerful and easier to operate from edge to core to cloud.

Starting with the release of OneFS 8.0.1

OneFS 8.0.1 builds on the powerful platform provided by OneFS 8.0, released in February 2016. The intent of this newest release is to provide features important to unique customer datacenter workflows and to enhance the usability and manageability of OneFS clusters. In addition, OneFS 8.0.1 is the first release that takes full advantage of the non-disruptive upgrade and rollback framework introduced in OneFS 8.0.

Let’s review some of the most compelling features of this software release.

Improved Management, Monitoring, Security and Performance for Hadoop on Isilon

Expanding on the data lake, one focus area of this new release was increasing the scope and usefulness of our integration with leading Hadoop management tools. OneFS 8.0.1 delivers support for and integration with Apache Ambari 2.4 and Apache Ranger. A single management point now allows Ambari operators to seamlessly manage and monitor Hadoop clusters with OneFS as the HDFS storage layer, while Ranger provides important security management for Hadoop. These Ambari and Ranger integration features benefit all customers using Hortonworks and ODP-I compliant Hadoop distributions with OneFS.

Additionally, OneFS 8.0.1 adds new features including Kerberos encryption to secure and encrypt data in flight between HDFS clients and OneFS. In addition, DataNode load balancing avoids overloading nodes and increases cluster resilience. OneFS 8.0.1 also supports the following HDFS distributions: Hortonworks HDP 2.5, Cloudera CDH 5.8.0, and IBM Open Platform (IOP) 4.1.
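
As a rough illustration of what pointing a Hadoop client at an external HDFS storage layer involves, here is a hedged sketch. The hostname is hypothetical, and while `fs.defaultFS` and `hadoop.rpc.protection` are standard Hadoop configuration keys, consult the OneFS documentation for the actual values your cluster requires.

```python
# Illustrative Hadoop client settings for targeting a OneFS-backed HDFS layer.
# The hostname "isilon-sc.example.com" is hypothetical; the property names
# (fs.defaultFS, hadoop.rpc.protection) are standard Hadoop configuration keys.
core_site = {
    # HDFS clients resolve the storage endpoint through the cluster's DNS name.
    "fs.defaultFS": "hdfs://isilon-sc.example.com:8020",
    # "privacy" requests Kerberos-based encryption of client/server traffic.
    "hadoop.rpc.protection": "privacy",
}

def to_core_site_xml(props):
    """Render the settings as a core-site.xml-style fragment."""
    rows = "".join(
        f"  <property><name>{k}</name><value>{v}</value></property>\n"
        for k, v in props.items()
    )
    return f"<configuration>\n{rows}</configuration>"
```

The point of the sketch is simply that, from the Hadoop client's perspective, the storage layer is addressed like any other HDFS endpoint, with wire encryption requested through ordinary Hadoop properties.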

Introducing Scale-Out NAS with SEC Compliance and Asynchronous Replication for Disaster Recovery

With OneFS 8.0.1, Isilon becomes the first and only Scale-Out NAS vendor that offers SEC-17a4 compliance via SmartLock Compliance Mode combined with the asynchronous replication to secondary or standby clusters via SyncIQ. This powerful combination means companies that must comply with SEC-17a4 are no longer caught in a choice between compliance and data recovery – with OneFS 8.0.1 they have both!

Storage Efficiency Designed for Healthcare Diagnostic Imaging Needs

For many years, PACS (Picture Archiving and Communication System) applications stored diagnostic imaging data in large "container" files for maximum storage efficiency. In recent years, the way referring physicians access individual diagnostic images changed and, as a result, the methods used to store diagnostic imaging files had to change as well. OneFS 8.0.1 has a new storage efficiency feature specifically designed for the healthcare PACS archive market to provide significantly improved storage efficiency for diagnostic imaging files. Isilon customers can expect to see storage efficiency similar to OneFS's large-file storage efficiency for diagnostic imaging files when using this feature. If you leverage Isilon to store your PACS application data, you will want to talk with your sales representative to learn more about this new feature.

Upgrade with Confidence

OneFS 8.0, released in February 2016, provided the framework for non-disruptive upgrades for all supported upgrades going forward, along with release rollback. OneFS 8.0.1 is the first OneFS release that you will be able to test and validate and, if needed, roll back to the previously installed 8.0.x release. This means that you can non-disruptively upgrade to 8.0.1 without impacting users or applications! You will be able to upgrade sets of nodes or the entire cluster for your testing and validation and then, once complete, decide whether to commit the upgrade or roll back to the prior release. Once committed to OneFS 8.0.1, future upgrades will be even easier and more transparent, with the ability to view an estimate of how long an upgrade will take to complete and visibility into the upgrade process. The WebUI was also enhanced to make upgrade management easier than before.

Manage Performance Resources like Never Before

Even more exciting is the new Performance Resource Management framework introduced in OneFS 8.0.1. This framework is the start of a revolutionary scale-out NAS performance management system. In OneFS 8.0.1 you will be able to obtain and view statistics on the performance resources (CPU, operations, data read, data written, etc.) consumed by OneFS jobs and services. This will allow you to quickly identify whether a particular job or service may be the cause of performance issues. These statistics are available via the CLI and Platform API, and can be visualized with InsightIQ 4.1. In future releases these capabilities will be expanded to clients, IP addresses, users, protocols and more!
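
To make the Platform API idea concrete, here is a hedged sketch of building a statistics query against a cluster's REST API. The endpoint path, port and statistic key names below are illustrative assumptions only; check your OneFS API guide for the documented routes and keys.

```python
from urllib.parse import urlencode

def stats_url(host, keys, port=8080):
    """Build a (hypothetical) current-statistics query URL for the given keys.

    The /platform/1/statistics/current path and the key names are assumed
    for illustration; consult the official API reference for real values.
    """
    query = urlencode([("key", k) for k in keys])
    return f"https://{host}:{port}/platform/1/statistics/current?{query}"

# Hypothetical host and statistic keys.
url = stats_url("cluster.example.com",
                ["cluster.cpu.user.avg", "node.disk.bytes.out.rate"])
# An authenticated HTTP client would then GET this URL and receive per-key
# JSON values, the kind of data tools like InsightIQ visualize.
```
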

These are just some of the new features OneFS 8.0.1 has to offer. OneFS 8.0.1 also improves our support for Mac OS clients, SMB, audit, NDMP and data migrations, to name a few other areas. The white paper, "Technical Overview of New and Improved Features of EMC Isilon OneFS 8.0," provides additional details on these and other new and improved features in OneFS 8.0.1.

IsilonSD Edge Management Server Version 1.0.1

This July, EMC released a new version of the IsilonSD Edge Management Server. Version 1.0.1 provides support for VMware ESX 6.0 in addition to previously supported ESX versions. The management server also enables monitoring of IsilonSD Edge clusters via EMC's Secure Remote Support (ESRS) server and tools.

Isilon CloudPools Just Got Easier to Manage

OneFS 8.0.1 provides improved flexibility for CloudPools deployments in the enterprise with the introduction of proxy support. This allows administrators to specify one or more proxy servers between the Isilon cluster and your cloud provider of choice.

The Data Lake Journey is Just Beginning!

OneFS 8.0.1 is an important step on the data lake journey; however, you can rest assured we are not stopping here! Look forward to amazing new hardware and software features in coming releases as we build on the Performance Resource Management framework, provide more workload-specific enhancements to address our customers' needs and deliver new levels of supportability, serviceability, scale and performance. Don't wait, upgrade today. Click here to download OneFS 8.0.1.

Analyst firm IDC evaluates EMC Isilon: Lab-validation of scale-out NAS file storage for your enterprise Data Lake

Suresh Sathyamurthy

Sr. Director, Product Marketing & Communications at EMC

A Data Lake should now be a part of every big data workflow in your enterprise organization. By consolidating file storage for multiple workloads onto a single shared platform based on scale-out NAS, you can reduce costs and complexity in your IT environment, and make your big data efficient, agile and scalable.

That's the expert opinion in analyst firm IDC's recent Lab Validation Brief: "EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure," March 2016. As the lab validation report concludes: "IDC believes that EMC Isilon is indeed an easy-to-operate, highly scalable and efficient Enterprise Data Lake Platform."

The Data Lake Maximizes Information Value

The Data Lake model of storage represents a paradigm shift from the traditional linear enterprise data flow model. As data and the insights gleaned from it increase in value, enterprise-wide consolidated storage is transformed into a hub around which the ingestion and consumption systems work. This enables enterprises to bring analytics to data in place, avoiding the expense of multiple storage systems and the time lost to repeated ingestion and analysis.

But pouring all your data into a single shared Data Lake would put serious strain on traditional storage systems – even without the added challenges of data growth. That’s where the virtually limitless scalability of EMC Isilon scale-out NAS file storage makes all the difference…

The EMC Data Lake Difference

The EMC Isilon Scale-out Data Lake is an Enterprise Data Lake Platform (EDLP) based on Isilon scale-out NAS file storage and the OneFS distributed file system.

As well as meeting the growing storage needs of your modern datacenter with massive capacity, it enables big data accessibility using traditional and next-generation access methods – helping you manage data growth and gain business value through analytics. You can also enjoy seamless replication of data from the enterprise edge to your core datacenter, and tier inactive data to a public or private cloud.

We recently reached out to analyst firm IDC to lab-test our Isilon Data Lake solutions – here’s what they found in 4 key areas…

  1. Multi-Protocol Data Ingest Capabilities and Performance

Isilon is an ideal platform for enterprise-wide data storage, and provides a powerful centralized storage repository for analytics. With the multi-protocol capabilities of OneFS, you can ingest data via NFS, SMB and HDFS. This makes the Isilon Data Lake an ideal and user-friendly platform for big data workflows, where you need to ingest data quickly and reliably via protocols most suited to the workloads generating the information. Using native protocols enables in-place analytics, without the need for data migration, helping your business gain more rapid data insights.


IDC validated that the Isilon Data Lake offers excellent read and write performance for Hadoop clusters accessing HDFS via OneFS, compared against HDFS on direct-attached storage (DAS). In the lab tests, Isilon performed:

  • nearly 3x faster for data writes
  • over 1.5x faster for reads and read/writes.

As IDC says in its validation: “An Enterprise Data Lake platform should provide vastly improved Hadoop workload performance over a standard DAS configuration.”

  2. High Availability and Resilience

Policy-based high availability capabilities are needed for enterprise adoption of Data Lakes. The Isilon Data Lake is able to cope with multiple simultaneous component failures without interruption of service. If a drive or other component fails, it only has to recover the specific affected data (rather than recovering the entire volume).

IDC validated that a disk failure on a single Isilon node has no noticeable performance impact on the cluster. Replacing a failed drive is a seamless process and requires little administrative effort. (This is in contrast to traditional DAS, where the process of replacing a drive can be rather involved and time consuming.)

Isilon can even cope easily with node-level failures. IDC validated that a single-node failure has no noticeable performance impact on the Isilon cluster. Furthermore, the operation of removing a node from the cluster, or adding a node to the cluster, is a seamless process.

  3. Multi-tenant Data Security and Compliance

Strong multi-tenant data security and compliance features are essential for an enterprise-grade Data Lake. Access zones are a crucial part of the multi-tenancy capabilities of Isilon OneFS. In tests, IDC found that Isilon provides no-crossover isolation between Hadoop instances for multi-tenancy.

Another core component of secure multi-tenancy is the ability to provide a secure authentication and authorization mechanism for local and directory-based users and groups. IDC validated that the Isilon Data Lake provides multiple federated authentication and authorization schemes. User-level permissions are preserved across protocols, including NFS, SMB and HDFS.

Federated security is an essential attribute of an Enterprise Data Lake Platform, with the ability to maintain confidentiality and integrity of data irrespective of the protocols used. For this reason, another key security feature of the OneFS platform is SmartLock – specifically designed for deploying secure and compliant (SEC Rule 17a-4) Enterprise Data Lake Platforms.

In tests, IDC found that Isilon enables a federated security fabric for the Data Lake, with enterprise-grade governance, regulatory and compliance (GRC) features.

  4. Simplified Operations and Automated Storage Tiering

The Storage Pools feature of Isilon OneFS allows administrators to apply common file policies across the cluster locally – and extend them to the cloud.

Storage Pools consists of three components:

  • SmartPools: Data tiering within the cluster – essential for moving data between performance-optimized and capacity-optimized cluster nodes.
  • CloudPools: Data tiering between the cluster and the cloud – essential for implementing a hybrid cloud, and placing archive data on a low-cost cloud tier.
  • File Pool Policies: Policy engine for data management locally and externally – essential for automating data movement within the cluster and the cloud.
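
The three components above can be pictured as a tiny rule engine: each policy pairs a predicate over file attributes with a target tier, and the first matching rule decides where a file lives. The tier names and the 90-day threshold below are invented for illustration; real policies are configured in the storage OS, not written in Python.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    path: str
    age_days: int
    size_mb: int

ARCHIVE_AFTER_DAYS = 90  # invented threshold for illustration

policies = [
    # (predicate, target tier) -- evaluated in order, first match wins.
    (lambda f: f.age_days > ARCHIVE_AFTER_DAYS, "cloud-archive"),   # CloudPools-style rule
    (lambda f: f.size_mb > 1024, "capacity-tier"),                  # SmartPools-style rule
]

def place(f, default="performance-tier"):
    """Return the tier a file lands on under the (hypothetical) policies."""
    for predicate, tier in policies:
        if predicate(f):
            return tier
    return default
```

So a year-old study would match the archive rule and move to the cloud tier, a huge fresh video would land on the capacity tier, and everything else would stay on the performance tier; automating exactly this kind of placement is what the policy engine provides.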

As IDC confirmed in testing, Isilon’s federated data tiering enables IT administrators to optimize their infrastructure by automating data placement onto the right storage tiers.

The expert verdict on the Isilon Data Lake

IDC concludes that: “EMC Isilon possesses the necessary attributes such as multi-protocol access, availability and security to provide the foundations to build an enterprise-grade Big Data Lake for most big data Hadoop workloads.”

Read the full IDC Lab Validation Brief for yourself: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016.

Learn more about building your Data Lake with EMC Isilon.

Soon you won't say "Travel Safe"; instead you'll say "Travel Smart"!

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

As a frequent traveler myself, I can appreciate this situation. A lone traveler is enjoying a quiet evening in their hotel. As they unwind from the day, they peruse the local paper and are shocked to learn that their attempt at returning home the next day will be dashed by transit strikes. All modes of public transportation will be shut down, forcing an ill-timed exit from their current travel stop. There are certainly other ways for the traveler to reach the airport, but the 5x surge pricing on their popular ride-sharing application makes it an expensive trip. There is also an expectation that the ride-sharing drivers might face violence from striking transit workers. This all could have been avoided if the traveler's company subscribed to a travel alert service for pending situations. Situational awareness tools that monitor travel threats and pair them with traveler itineraries are an evolving field. In the case of our weary traveler, an advance warning would have allowed them to seek personal safety and adjust their travel plans in time to avoid this sticky situation.


Is the Media & Entertainment industry in the middle of a major transformation?

Tom "TV" Burns

CTO, Media & Entertainment at EMC

Media Companies in 2016 find themselves “in the middle of things”:

  • a “peak TV” environment in which cable channels are bursting with high-quality scripted episodic programming
  • an on-demand subscriber environment that has revolutionized the way consumers find and watch entertainment
  • immersive environments which have changed the grammar of visual storytelling

“We can’t really even call it television anymore…”
When I took this gig as CTO, Media & Entertainment for EMC, one thing that intrigued me about the role was how EMC thought differently about the media industry: they focused on helping media companies create and deliver great stories. We do this by leveraging the evolving concepts from traditional enterprise IT such as software-defined storage and converged infrastructure to provide adaptable carrier-grade solutions and a future-proof infrastructure to their media customers. This year is no different as we’re introducing a host of new features to our flagship Isilon product—features that expand a media organization’s reach from “Edge to Core to Cloud.” (I’ll talk about how this relates to media organizations in a sec…)

“The software-defined media facility”
Broadcasters, advertiser- and subscriber-supported OTT providers, content creators, aggregators and the entire ecosystem of partners and suppliers are really in the "Entertainment Services Delivery" market. Technology infrastructures like software-defined storage and software-defined networking will allow broadcasters to make changes to their workflows programmatically. This is the "new normal": adding or migrating new services in days rather than months.

As our media customers have responded to this re-definition of their business, we've introduced new features that let them do things they've never done before. For example, last year we introduced the concept of a "Media Lake"—essentially a single, high-performance shared resource that lets media organizations consolidate their data into a central repository. Rather than copying your data to some other silo, transforming it and copying it back, the idea is to leave your data in one place and execute the value-creation operations (whether encoding, archiving, creative editorial, analytics or what have you) in place. This saves you time, enables collaboration and reduces the possibility of error.

Isilon Media and Entertainment Data Lake
This year we’ve doubled our investment in the idea of a media data lake—announcing “Media Lake 2.0,” which gives our media customers new capabilities that allow productions to extend to “edge” (smaller or remote) locations, optimize their core media operations, and leverage public/private/hybrid clouds.

The new Isilon software upgrades and products being introduced include:

OneFS 8.0—improves availability failover reliability for Isilon storage clusters.  OneFS 8.0 includes multiple features, but the ones that are critical for Media & Entertainment include non-disruptive operating system upgrades, upgrade rollback, and non-disruptive operations.  In addition, OneFS 8.0 incorporates support for SMB3.0 Continuous Availability for increased reliability and availability for Windows clients.

One of our long-standing partners and a premier media industry vendor, Imagine Communications, recently tested OneFS 8.0 in their lab and at one of their customers' sites. A global leader in video and advertising solutions, Imagine has more than three million products deployed across 185 countries, and nearly half of the world's video channels traverse them.

That’s why we look to Imagine to give us feedback on our solutions as they definitely know how to stress new technology.  For OneFS 8.0 they were specifically interested in the non-disruptive upgrades and the improvements in SMB 3.0.

In testing, Imagine found the scale-out and failover from EMC Isilon makes it one of the “best” NAS solutions they’ve tested. The continuous availability options (the ability to upgrade or update on-the-fly) and improvements in the SMB protocol gave them functionality and bandwidth “they couldn’t get with other NAS products.”

IsilonSD Edge—provides a software-defined storage solution that allows media companies to quickly and seamlessly gain access to production and talent wherever great stories are created and captured. In addition, as a software-defined storage solution, IsilonSD Edge runs on commodity hardware in a virtual environment—substantially reducing costs and increasing agility. IsilonSD Edge extends the Media Data Lake across geographies, allowing “follow-the-sun” content creation and delivery from remote and branch facilities. Media data lake synchronization with remote assets reduces inefficiencies and accelerates production times by sharing critical production media assets.

EMC Isilon CloudPools—enables Isilon media customers to seamlessly archive or tier assets from their Isilon cluster to an in-house (private) cloud based on EMC ECS, or a choice of public cloud providers, or a combination of both. This gives a media organization the flexibility and simplicity to dynamically expand beyond their current capacity and archive storage into any cloud-based solution. With CloudPools, data that is transmitted to the cloud is encrypted for security purposes and compressed to optimize the network bandwidth usage.

“Ch-Ch-Ch-Ch-Changes…”
The industry continues to change at a dramatic pace, and we realize our customers need to not only adapt and transform but also profit from their efforts. That's why we continue to challenge the status quo for our media customers. Whether it's jointly developing end-to-end workflow solutions with great industry partners like Imagine Communications, or upgrading the Isilon platform with innovative features like CloudPools and IsilonSD Edge, our goal is to give our customers a foundation for future-proofing their business and workflow needs.

@TVBurns
