Posts Tagged ‘data’

At the Speed of Light

Keith Manthey

CTO of Analytics at EMC Emerging Technologies Division

Over the last year, an obvious trend in analytics has emerged: batch analytics are getting bigger and real-time analytics are getting faster. This divergence has never been more apparent than it is now.

Batch Analytics

Batch analytics primarily comprise descriptive analytics, massive-scale analytics, and online model development. Descriptive analytics are still the main purview of data warehouses, but Hadoop has expanded the ability to ask “What If” questions with far more data types and analytics capabilities. Some Hadoop descriptive analytics installations have reached rather massive scale.

The successes of massive-scale analytics are well documented. Cross-data analytics (like disease detection with multiple data sets), time-series modeling, and anomaly detection are particularly impressive due to their depth of adoption in several verticals. The instances in health care analytics with Hadoop alone in the past year are numerous and show the potential of this use case to provide amazing insights into caring for our aging population as well as healing rather bespoke diseases.

Model development is an application that effectively highlights the groundbreaking potential that can be unlocked through Hadoop’s newest capabilities and analytics. Creating real-time models based upon trillions of transactions for a hybrid architecture is a good example of this category. Due to the small percentage of records that actually occur as real fraud on a daily basis, trillions of transactions are required for a fraud model to be certified as effective. The model is then deployed into production, which is often a real-time system.

One data point behind my belief that batch is getting “bigger” is that I have been engaged with no fewer than 10 Hadoop clusters that have crossed the 50 PB threshold this year alone. In each case, the cluster has hit a logical pause point, causing customers to re-evaluate the architecture and operations. This may be due to cost, scale, limitations, or other catalysts. These are often the times when I am engaged with these customers. Not every client reaches these catalysts at consistent sizes or times, so it’s interesting that 10 clusters greater than 50 PB have hit this point in 2017 alone. Nonetheless, customers continue to set new records for Hadoop cluster size.

Real-Time Analytics

While hybrid analytics were certainly in vogue last year, real-time or streaming analytics appear to be the hottest trend as of late. Real-time analytics, such as efforts to combat fraud authorizations, are not new endeavors. Why is the latest big push for streaming analytics now the “new hot thing”? There are several factors at play.

Data is growing at an ever-increasing rate. One contributing factor can be summed up as “to store or not to store.” While this decision usually takes place in conjunction with more complex processes, it clearly requires some form of analytics to decide whether the data is useful. Not every piece of data is valuable, and an enormous amount of data is being generated. Determining whether a particular piece of data is worth keeping in batch storage is one use for real-time analytics.
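As a rough illustration of the “to store or not to store” decision, the sketch below filters a stream of readings and keeps only those that deviate from an expected baseline. The record layout, the baseline, and the tolerance are all hypothetical values chosen for the example; a real system would apply whatever rule or model fits its own data.

```python
from typing import Iterable, Iterator


def worth_storing(reading: dict, baseline: float, tolerance: float) -> bool:
    """Hypothetical rule: keep a reading only if it deviates from the baseline."""
    return abs(reading["value"] - baseline) > tolerance


def filter_stream(
    readings: Iterable[dict],
    baseline: float = 20.0,   # assumed "normal" value, for illustration only
    tolerance: float = 5.0,   # assumed allowed deviation, for illustration only
) -> Iterator[dict]:
    """Yield only the readings judged worth sending on to batch storage."""
    for reading in readings:
        if worth_storing(reading, baseline, tolerance):
            yield reading


# Made-up sample data standing in for a live stream.
sample = [
    {"sensor": "a", "value": 20.5},
    {"sensor": "a", "value": 31.0},
    {"sensor": "b", "value": 19.0},
    {"sensor": "b", "value": 12.0},
]

kept = list(filter_stream(sample))
print(f"kept {len(kept)} of {len(sample)} readings")
```

Because the filter is a generator, each reading is evaluated as it arrives and the routine values are never held in memory or written to disk.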

Moving up the value chain, the more significant factor at play is that the value proposition of real-time far outweighs the value proposition of batch. However, this doesn’t mean that batch and real-time are de-coupled or not symbiotic in some ways. In high-frequency trading, fraud authorization detection, cyber security, and other streaming use cases, gaining insights in real time rather than several days later can be especially critical. Real-time systems have historically not relied upon Hadoop for their architectures, which has not gone unnoticed by some traditional Hadoop ecosystem tools like Spark. The University of California, Berkeley recently shifted the focus of its AMPLab to create RISELab, greenlighting projects such as Drizzle that aim to bring low-latency streaming capabilities to Spark. The ultimate goal of Drizzle and RISELab is to increase the viability of Spark for real-time, non-Hadoop workloads. The emphasis on creating lower-latency tools will certainly escalate the usage of streaming analytics, as real time continues to get “faster.”

The last factor is the “Internet of Everything,” often referred to as “IoT” or “M2M.” While sensors are top of mind, most companies are still finding their way in this new world of streaming sensor data. Highly technologically advanced use cases and designs are already in place, but the installs are still very bespoke and limited in nature. The mass adoption is still a work in progress. The theoretical value of this data for use in governance analytics or the analytics of improving business operations is massive. Given the sheer volume of data, storage in batch is not a feasible alternative at scale. As such, most of the analytics of IoT are streaming-based capabilities. The value proposition is still truly outstanding and IoT analytics remain in the hype phase. The furor and spending are in full-scale deployment regardless.

In closing, the divergence of analytics is growing between batch and online analytics. The symbiotic relationship remains strong, but the architectures are quickly separating. Most predictions from IDC, Gartner, and Forrester indicate streaming analytics will grow at a far greater rate than batch analytics due to most of the factors above. It will be interesting to see how this trend continues to manifest itself.  Dell EMC is always interested in learning more about specific use cases, and we welcome your stories on how these trends are impacting your business.

Dispelling Common Misperceptions About Cloud-Based Storage Architectures

As the media and entertainment industry moves to 4K resolution and virtual/augmented content formats, the storage and archive requirements for media content have grown exponentially. But while storage requirements continue to skyrocket, industry revenue has not grown accordingly – and M&E organizations are finding themselves challenged to “do more with less.” More organizations are looking to leverage the cost efficiencies, scalability and flexibility that cloud storage can offer, but many remain apprehensive about taking the plunge.

To be clear, in this post when we talk about “the cloud,” we’re talking cloud architectures, versus the public cloud provided by vendors such as Microsoft, AWS and Google, among others. Unlike public clouds, cloud architectures can be used completely within your facility if desired and they are designed with infinite scalability and ease of access in mind.

There are a number of misperceptions about moving data to cloud architectures that are (wait for it) clouding people’s judgment. It’s time we busted some of the bigger myths and misperceptions out there about cloud storage.

Myth #1: I’ll have to learn a whole new interface – false! Dell EMC’s Elastic Cloud Storage (ECS) employs a tiered system, where it sits under a file system – in our case, Isilon. For organizations already deploying Isilon SAN or NAS storage platforms, the workflows stay exactly as they were, as does users’ interface to the file system.

This tiered approach helps companies to “do more with less” by allowing them to free up primary storage and consolidate resources. By tiering down “cold,” inactive data to ECS, you can better optimize your tier-one higher performance storage and drive down costs.
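To make the idea of tiering down “cold” data concrete, here is a minimal sketch that picks out files that have not been accessed for a given period and totals the primary storage they would free up. It is purely illustrative: the `FileRecord` type, the 90-day threshold, and the sample data are assumptions made for this example, and this is not how Isilon or ECS tiering policies are actually configured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata for one file on primary storage."""
    path: str
    last_accessed: datetime
    size_bytes: int


def select_cold(files, now, max_idle=timedelta(days=90)):
    """Return the files idle for longer than max_idle (an assumed threshold)."""
    return [f for f in files if now - f.last_accessed > max_idle]


# Made-up inventory for illustration.
now = datetime(2017, 6, 1)
inventory = [
    FileRecord("/projects/current/shot_010.exr", datetime(2017, 5, 20), 400_000_000),
    FileRecord("/projects/archive/shot_001.exr", datetime(2016, 12, 1), 350_000_000),
]

cold = select_cold(inventory, now)
freed = sum(f.size_bytes for f in cold)
print(f"{len(cold)} cold file(s), {freed} bytes could move to the lower tier")
```

The threshold is the main design choice: a shorter idle period frees more tier-one capacity but raises the chance that a file has to be recalled from the lower tier.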

Myth #2: My data won’t be safe in the cloud – false! ECS features a geo-efficient architecture that stores, distributes and protects data both locally and geographically, eliminating any single point of failure and providing a seamless failover from site to site with no impact to business. Further, even though the data within ECS is distributed, it’s still a secure, private environment so users won’t run into scenarios where anyone can access information without the right credentials.

Myth #3: Collaboration and access are going to be negatively impacted – false! If you look at the VFX industry, for example, teams are frequently spread across the world and working across time zones on a 24/7 basis. ECS enables global teams to work on the same piece of data at the same time from one system – it’s true collaboration. ECS’s multi-site, active-active architecture and universal accessibility enable anywhere access to content from any application or device.

Myth #4: Moving to the cloud is an all-or-nothing approach – false! ECS can be deployed when your organization is ready for it – whether that’s in a month, or six months, or a year. We realize a lot of operations personnel like to “see” their data and know first-hand that it’s there. We get that. But as things evolve, it’s likely that organizations will face pressure to take at least some of the data offsite. With ECS, you can still keep your data in the data center and, when the time is right to take your data off-site, Dell EMC can work with your organization to move your infrastructure to a hosted facility or a co-lo where you can continue to access your data just as you did when it was on-premises. ECS is available in a variety of form factors that can be deployed and expanded incrementally, so you can choose the right size for your immediate needs and projected growth.

Because it is designed with “limitless scale” in mind, ECS eliminates worries about running out of storage. It can meet the needs of today’s M&E organizations, as well as those of the future, simply by adding storage, just as you used to do with tapes.

Hopefully we’ve been able to bust a few of the myths around adopting a cloud-based storage architecture. This video featuring Dell EMC’s Tom Burns and Manuvir Das can offer additional insight into ECS’s tiered approach and how media organizations can begin seeing benefits from day one.

Stay current with Media & Entertainment industry trends here or listen to the Broadcast Workflows webcast recording.
