Archive for the ‘Analytics’ Category

Unwrapping Machine Learning

Ashvin Naik

Cloud Infrastructure Marketing at Dell EMC

In a recent IDC spending guide, titled Worldwide Cognitive Systems and Artificial Intelligence Spending Guide, some fantastic numbers were put forward in terms of opportunity and growth: a CAGR of more than 50 percent, with verticals pouring billions of dollars into cognitive systems. One of the key components of cognitive systems is Machine Learning.

According to Wikipedia, Machine Learning is a subfield of computer science that gives computers the ability to learn without being explicitly programmed. Just these two pieces of information were enough to get me interested in the field.

After hours of daily searching, digging through inane babble and noise across the internet, an understanding of how machines can learn eluded me for weeks, until I hit the jackpot. A source that shall not be named pointed me to a “secure by obscurity” share that held exact and valuable insights on machine learning. It was so simple and elegant, and it made complete sense to me.

Machine Learning, it turned out, was not all noise; it works on a very simple principle. Imagine there is a pattern in the world that can be used to forecast or predict the behavior of some entity. There is no mathematical notation available to describe the pattern, but if you have the data that traces that pattern, you can use Machine Learning to model it. Now, this may sound like a whole lot of mumbo jumbo, but allow me to break it down in simple terms.

Machine learning can be used to understand patterns so you can forecast or predict almost anything, provided:

  • You are certain there is a pattern
  • You do not have a mathematical model to describe the pattern
  • You have the data to try to figure out the pattern.

Voilà, this makes so much sense already. If you have data and know there is a pattern but don’t know what it is, you can use machine learning to find it. The applications for this are endless, from natural language processing and speech-to-text to predictive analytics. The most important is forecasting, something we do not give enough credit these days. The most critical component of Machine Learning is data: you must have the data. If you do not have data, you cannot find the pattern.
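To make that principle concrete, here is a minimal sketch of my own (not from the IDC guide or any Dell EMC product) that fits a model to data generated by a pattern the model is never told about. The “true” pattern, the feature layout and the choice of a random forest are all assumptions made purely for demonstration.

```python
# Minimal sketch: learn a hidden pattern from data alone.
# The "true" pattern below is an assumption for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend this pattern is unknown to us; we only ever see the data it produces.
X = rng.uniform(-3, 3, size=(2000, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model is handed only the data, never the formula above.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))
```

All three conditions from the list above hold here: a pattern exists, no formula describing it is given to the model, and there is data that reflects it.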

As a cloud storage professional, this is a huge insight. You must have data: pristine, raw data coming straight from the systems that generate it, sort of like a tip from the horse’s mouth. I know exactly where my products fit in. We are able to ingest, store, protect and expose the data for any purpose in its native format, complete with its metadata, all through one system.
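As a hedged sketch of what “store the raw data with its metadata and read it back through the same system” can look like against an S3-compatible object endpoint (ECS exposes an S3-compatible API), consider the snippet below. The endpoint URL, credentials, bucket, key and metadata fields are placeholders, not real product or customer values.

```python
# Sketch only: writing raw data plus metadata to an S3-compatible object
# store and reading the metadata back. Endpoint, bucket, key and
# credentials are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object.example.internal",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",                   # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

s3.put_object(
    Bucket="telemetry-raw",                 # hypothetical bucket
    Key="vehicle-0042/2017-03-01.json",     # hypothetical key
    Body=b'{"speed_kph": 87, "fuel_pct": 61}',
    Metadata={"source": "can-bus-gateway", "region": "eu-west"},
)

# The same object, and its metadata, stay available for any later use.
head = s3.head_object(Bucket="telemetry-raw", Key="vehicle-0042/2017-03-01.json")
print(head["Metadata"])
```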

We have customers in the automobile industry leveraging our multi-protocol cloud storage across 2,300 locations in Europe, capturing data from cars on the road. They use proprietary Machine Learning systems to look for patterns in how their customers, the car owners, use their products in the real world, in order to predict the parameters for designing better, more reliable and more efficient cars. We have customers in the life-sciences business saving lives by looking at patterns of efficacy and effective therapies for terminal diseases. Our customers in retail are using Machine Learning to detect fraud and protect their customers. The list goes on and on.

I personally do not know the details of how they make it happen, but this is the world of the third platform. There are so many possibilities and opportunities ahead if only we have the data. Talk to us and we can help you capture, store and secure your data so you can transform humanity for the better.

 

Learn more about how Dell EMC Elastic Cloud Storage can fit into your Machine Learning Infrastructure

 

 

When It Comes To Data, Isolation Is The Enemy Of Insights

Brandon Whitelaw

Senior Director of Global Sales Strategy for Emerging Technologies Division at Dell EMC


Within IT, data storage, servers and virtualization, there have always been ebbs and flows of consolidation and deconsolidation. You had the transition from terminals to PCs and now we’re going back to virtual desktops – it flows back and forth from centralized to decentralized. It’s also common to see IT trends repeat themselves.

In the mid-to-late 90s, the major trend was to consolidate structured data sources onto a single platform: to go from direct-attached storage with dedicated servers per application to a consolidated central storage layer, called a storage area network (SAN). SANs allowed organizations to go from a shared-nothing (SN) architecture to a shared-everything (SE) architecture, where you have a single point of control, allowing users to share available resources rather than have data trapped or siloed within independent direct-attached storage systems.

The benefit of consolidation is an ongoing IT trend that continues to repeat itself on a regular basis, whether it’s storage, servers or networking. What’s interesting is that once you consolidate all the data sources, IT is finally able to look at doing more with them. Consolidation onto a SAN enables cross analysis of data sources that were previously isolated from each other, which was simply infeasible before. Now that these sources are in one place, systems such as the enterprise data warehouse can emerge: the concept of ingesting and transforming all the data into a common schema to allow for reporting and analysis. Companies embracing this process drove growth in IT consumption because of the value gained from that data. It also led to new insights, with the result that most of the world’s finance, strategy, accounting, operations and sales groups rely on the data they get from these enterprise data warehouses.
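As a toy illustration of that “common schema” idea (my own sketch, not a description of any particular warehouse product), the snippet below transforms two differently shaped source records into one table and then runs a cross-source query that would be impossible while the sources stayed siloed. The source systems, field names and figures are hypothetical.

```python
# Toy sketch of warehouse-style consolidation: two differently shaped
# sources are transformed onto one common schema, then queried together.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# Source systems with different field names (hypothetical examples).
crm_rows = [{"territory": "EMEA", "sku": "A-100", "value": 1200.0}]
erp_rows = [{"geo": "EMEA", "item": "A-100", "net": 900.0}]

# Transform each source onto the common schema before loading.
for r in crm_rows:
    conn.execute("INSERT INTO sales VALUES (?, ?, ?)",
                 (r["territory"], r["sku"], r["value"]))
for r in erp_rows:
    conn.execute("INSERT INTO sales VALUES (?, ?, ?)",
                 (r["geo"], r["item"], r["net"]))

# Cross-source reporting only works because both feeds share one schema.
total = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product"
).fetchone()
print(total)  # ('EMEA', 'A-100', 2100.0)
```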

Next, companies started giving employees PCs, and what do you do on PCs? Create files. Naturally, the next step is to ask, “How do I share these files?” and “How do I collaborate on these files?” The end result is home directories and file shares. From an infrastructure perspective, there needed to be a shared common platform for this data to come together. Regular PCs can’t talk to a SAN without block-level access over Fibre Channel or a connection to a server in the data center, so unless you want everyone to physically sit in the data center, you run Ethernet.

Businesses ended up building Windows file servers to act as the middleman brokering data between the users on Ethernet and the back-end SAN. This method worked until the Windows file servers steadily grew into the dozens. Yet again, IT teams were left with complexity and inefficiency, facing the original problem of several isolated silos of data and multiple points of management.

So what’s the solution? Take the middleman out: move the file system that sat on top of the file servers directly onto the storage system and allow Ethernet clients to go straight to it. Thus network-attached storage (NAS) was born.

However, continuing the cycle, what started as a single NAS eventually became dozens for organizations. Each NAS device contained specific applications with different performance characteristics and protocol access. Also, each system could only store so much data before it didn’t have enough performance to keep up, so systems would continue expanding and replicating to accommodate.

This escalates until an administrator is startled to realize that 80 percent of the data his or her company creates is unstructured. The biggest challenge of unstructured data is that it’s not confined to the four walls of a data center. Once again, we find ourselves with silos that aren’t being shared (notice the trend repeating itself?). Ultimately, this creates the need for a scale-out architecture with multiprotocol data access that can combine and consolidate unstructured data sources to optimize collaboration.

Doubling every two years, unstructured data makes up the vast majority of all data being created. Traditionally, the approach to gaining insights from this data has involved building yet another silo, which prevents having a single source of truth with all your data in one place. Because of the associated cost and complexity, not all of the data goes into a data lake, for instance, only the sub-samples relevant to an individual query. One way to end this particular cycle is to invest in a storage system that not only has the protocol access and tiering capabilities to consolidate all your unstructured data sources, but can also serve as your analytics platform. Your primary storage, the single source of truth that comes with it and its ease of management then lend themselves to the next phase: unlocking its insights.
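As a hedged sketch of what “analyze the data where it already lives” can look like, the snippet below pulls an object straight from an S3-compatible store and runs a quick aggregation on it, with no separate analytics silo to copy it into first. The endpoint, bucket, object key and column names are all hypothetical.

```python
# Sketch only: analytics directly against an S3-compatible object store,
# without first copying data into a separate silo. All names are placeholders.
import io

import boto3
import pandas as pd

s3 = boto3.client("s3", endpoint_url="https://object.example.internal")  # placeholder endpoint

obj = s3.get_object(Bucket="clickstream", Key="2017/03/events.csv")  # hypothetical object
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# A quick aggregation over the consolidated data, run where the data lives.
print(df.groupby("store_id")["order_value"].sum().head())
```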

Storing data is typically viewed as a red-ink line item, but it can actually work to your benefit: not because regulation or policies dictate it, but because a deeper, wider set of data can provide better answers. Often, you may not know what questions to ask until you’re able to see data sets together. Consider the painting technique, pointillism. If you look too closely, it’s just a bunch of dots of paint. However, if you stand back, a landscape emerges, ladies with umbrellas materialize and suddenly you realize you’re staring at Georges Seurat’s famous painting, A Sunday Afternoon on the Island of La Grande Jatte. Similar to pointillism, with data analytics you never think of connecting the dots if you don’t even realize they’re next to one another.
