Archive for the ‘Object Storage’ Category

Galaxy: A Workflow Management System for Modern Life Sciences Research

Nathan Bott

Healthcare Solutions Architect at EMC

Am I a life scientist or an IT data manager? That’s the question many researchers are asking themselves in today’s data-driven life sciences organizations.

Whether it is a bench scientist analyzing a genomic sequence or an M.D. exploring biomarkers and a patient’s genomic variants to develop a personalized treatment, researchers are spending a great amount of time searching for, accessing, manipulating, analyzing, and visualizing data.

Organizations supporting such research efforts are trying to make it easier to perform these tasks without the user needing extensive IT expertise and skills. This mission is not easy.

Focus on the data

Modern life sciences data analysis requirements are vastly different than they were just a handful of years ago.

In the past, once data was created, it was stored, analyzed soon after, and then archived to tape or another long-term medium. Today, not only is more data being generated, but the need to re-analyze that data means it must be retained for longer periods where it can be easily accessed.

Additionally, today’s research is much more collaborative and multi-disciplinary. As a result, organizations must provide an easy way for researchers to access data, ensure that results are reproducible, and provide transparency to ensure best practices are used and that procedures adhere to regulatory mandates.

More analytics and more collaboration are areas where the Galaxy Project (also known simply as Galaxy) can help. Galaxy is a platform for scientific workflows, data integration, and the persistence and publishing of data and analyses, designed to make computational biology accessible to research scientists who do not have computer programming experience.

Galaxy is typically used as a general-purpose bioinformatics workflow management system that automatically tracks and manages data while capturing the context and intent of computational methods.

Organizations have several ways to make use of Galaxy. They include:

Free public instance: The Galaxy Main instance is available as a free public service at UseGalaxy.org. This is the Galaxy Project’s primary production Galaxy instance and is useful for sharing or publishing data and methods with colleagues for routine analysis or with the larger scientific community for publications.

Anyone can use the public servers, with or without an account. (With an account, data quotas are increased and full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy-defined objects.)

Publicly available instances: Many other Galaxy servers besides Main have been made publicly available by the Galaxy community. Specifically, a number of institutions have installed Galaxy and have made those installations either accessible to individual researchers or open to certain organizations or communities.

For example, the Centre de Bioinformatique de Bordeaux offers a general-purpose Galaxy instance that includes EMBOSS (a software analysis package for molecular biology) and fibronectin (diversity analysis of synthetic libraries of a Fibronectin domain). Biomina offers a general-purpose Galaxy instance that includes most standard tools for DNA/RNA sequencing, plus extra tools for panel resequencing, variant annotation, and Illumina SNP array analysis.

A list of the publicly available installations of Galaxy can be found here.

Do-it-yourself: Organizations also have the choice of deploying their own Galaxy installations. There are two options: an organization can install a local instance of Galaxy (more information on setting up a local instance of Galaxy can be found here), or Galaxy can be deployed to the cloud. The Galaxy Project supports CloudMan, a software package that provides a common interface to different cloud infrastructures.

How it works

Architecturally, Galaxy is a modular Python-based web application that provides a data abstraction layer for integrating with various storage platforms. This allows researchers to access data on a variety of storage back-ends, such as standard direct-attached storage, S3 object-based cloud storage, storage management systems like iRODS (the Integrated Rule-Oriented Data System), or a distributed file system.

For example, a Galaxy implementation might use object-based storage such as that provided by Dell EMC Elastic Cloud Storage (ECS). ECS is a software-defined, cloud-scale, object storage platform that combines the cost advantages of commodity infrastructure with the reliability, availability, and serviceability of traditional storage arrays.
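To make the idea of a data abstraction layer concrete, here is a minimal Python sketch. It is not Galaxy's actual object store code: the names `StorageBackend`, `DiskBackend`, `InMemoryObjectBackend`, and `store_and_read` are hypothetical, and the in-memory class only stands in for an S3-style bucket. The point is simply that calling code talks to one interface and does not care which back-end holds the bytes.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical common interface; not Galaxy's real API."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under the given key."""

    @abstractmethod
    def exists(self, key: str) -> bool:
        """Report whether the key is present."""


class DiskBackend(StorageBackend):
    """Keeps each object as a file below a root directory."""

    def __init__(self, root) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        return self.root / key

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def exists(self, key: str) -> bool:
        return self._path(key).is_file()


class InMemoryObjectBackend(StorageBackend):
    """Stand-in for an S3-style bucket: a flat key-to-bytes mapping."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def exists(self, key: str) -> bool:
        return key in self._objects


def store_and_read(backend: StorageBackend, key: str, data: bytes) -> bytes:
    """Calling code depends only on the interface, not on the back-end."""
    backend.put(key, data)
    return backend.get(key)


if __name__ == "__main__":
    sample = b">seq1\nACGT\n"
    with tempfile.TemporaryDirectory() as tmp:
        print(store_and_read(DiskBackend(tmp), "datasets/sample.fasta", sample))
    print(store_and_read(InMemoryObjectBackend(), "datasets/sample.fasta", sample))
```

In a real deployment the second class would wrap an S3-compatible client rather than a dictionary; the calling code would stay the same.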

With ECS, any organization can deliver scalable and simple public cloud services with the reliability and control of a private-cloud infrastructure.

ECS provides comprehensive protocol support, including S3 and Swift, for unstructured workloads on a single, cloud-scale storage platform. This would allow the user of a Galaxy implementation to easily access data stored on such cloud storage platforms.

With ECS, organizations can easily manage a globally distributed storage infrastructure under a single global namespace with anywhere access to content. ECS features a flexible software-defined architecture that is layered to promote limitless scalability. Each layer is completely abstracted and independently scalable with high availability and no single points of failure.


You can test drive Dell EMC ECS by registering for an account and getting access to our APIs by visiting https://portal.ecstestdrive.com/

Or you can download the Dell EMC ECS Community Edition here and try it for free in your own environment, with no time limit for non-production use.

Examining TCO for Object Storage in the Media and Entertainment Industry

The cloud has changed everything for the media and entertainment industry when it comes to storage. The economies of scale that cloud-based storage can support have transformed the way that media organizations archive multi-petabyte amounts of media.

Tape-based multi-petabyte archives present a number of challenges, including a host of implementation and maintenance issues. Data stored on tape is not accessible until the specific tape is located, loaded onto a tape drive, and then positioned to the proper location on the tape. Then there is the physical footprint of the library frame and the real estate required for frame expansions: tape libraries are huge. This becomes all the more problematic in densely populated major media hubs such as Hollywood, Vancouver and New York.

At first, the public cloud seemed like a good alternative to tape, providing lower storage costs. But while it’s cheaper to store content in the public cloud, you must also factor in the high costs associated with data retrieval, which can be prohibitive given data egress fees. The public cloud also requires moving your entire media archive library to the cloud and giving up the freedom to use the applications of your choice. Suddenly the lower initial costs of the public cloud come with a significantly higher overall price.
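The trade-off between cheap storage and expensive retrieval is easy to check with some back-of-the-envelope arithmetic. The Python sketch below is only an illustration: the function name `public_cloud_archive_cost` and every rate and volume in the example are hypothetical placeholders, not real pricing from any provider, and the model assumes flat per-TB rates with no tiering or request charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveCost:
    """Cost breakdown over the whole period, in the same currency as the rates."""

    storage: float
    egress: float

    @property
    def total(self) -> float:
        return self.storage + self.egress


def public_cloud_archive_cost(
    archive_tb: float,
    storage_rate_per_tb_month: float,
    retrieved_tb_per_month: float,
    egress_rate_per_tb: float,
    months: int,
) -> ArchiveCost:
    """Simplified model: flat storage rate plus flat egress rate per TB retrieved."""
    storage = archive_tb * storage_rate_per_tb_month * months
    egress = retrieved_tb_per_month * egress_rate_per_tb * months
    return ArchiveCost(storage=storage, egress=egress)


if __name__ == "__main__":
    # All numbers below are made up for illustration only.
    cost = public_cloud_archive_cost(
        archive_tb=1000,
        storage_rate_per_tb_month=10.0,
        retrieved_tb_per_month=50,
        egress_rate_per_tb=90.0,
        months=12,
    )
    print(f"storage: {cost.storage:.0f}")
    print(f"egress:  {cost.egress:.0f}")
    print(f"total:   {cost.total:.0f}")
```

Plugging in your own archive size, retrieval volume, and quoted rates shows how quickly egress can become a large share of the bill when archived content is pulled back regularly.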

Object storage is emerging as a viable option that offers media companies a number of benefits and efficiencies that the public cloud and tape-based archives simply cannot provide. In fact, object storage is rapidly becoming mandatory for applications that must manage large, constantly growing repositories of media for long-term retention.

Dell EMC Elastic Cloud Storage (ECS) blends next-generation object storage with traditional storage features to offer the media and entertainment world an on-premises cloud storage platform that is cost-competitive with multi-petabyte tape libraries. ECS not only simplifies the archive infrastructure, it also enables critical new cloud-enabled workflows not possible with a legacy tape library.

Instant Availability of Content

The greatest benefit of object storage for media and entertainment companies is the instant availability of their media content – you can’t access media on tape without a planned and scheduled retrieval from a robotic tape library. For a broadcast company, the delay in data availability could result in a missed air date, advertiser revenue loss, and legal fees.

With instant access to their entire archives, a whole new world of possibilities opens up for content creators. Archives aren’t often considered when it comes to content creation – the process of accessing media content has historically been difficult and the process of obtaining data often takes far too long. However, with instant access to archived media, archives can effectively become monetized, rather than just sitting around on tape in a dark closet gathering dust and being wasted. Being able to access all of your media content at any time allows rapid deployment of new workflows and new revenue opportunities. Further, with object storage, engineering resources that were focused on tape library maintenance can be re-focused on new projects.

Operational Efficiencies

Object storage can also offer increased operational efficiencies, such as eliminating annual maintenance costs. One of the biggest, and least predictable, expenses of operating a tape library is maintenance. Errors on a tape library are commonplace. Drive failures and the downtime needed to fix them can impact deadlines and cause data availability issues that consume valuable engineering time and result in lost revenue.

Going Hot and Cold: Consolidation and Prioritization

Public cloud storage services can enable users to move cold or inactive content off of tier 1 storage for archiving, but concerns around security, compliance, vendor lock-in and unpredictable costs remain. Cold content can still deliver value, and ECS allows organizations to monetize this data by providing an active archive with the same scalability and low-cost benefits, but without the loss of IT agility or the reliability concerns.

ECS allows organizations to consolidate their backup and archive storage requirements into a single platform. It can replace tape archives for long-term retention and near-line purposes, and surpass public cloud service for backup.

In the video below, Dell EMC’s Tom Burns and Manuvir Das offer some additional perspective on how the media and entertainment industry can benefit from object storage: 


The Next Element for IT Service Providers in the Digital Age

Diana Gao

Senior Product Marketing Manager at EMC² ECS

Digital technology has disrupted large swaths of the economy and is generating huge amounts of data, with the average backup hovering at around a petabyte. Not all organizations can cope with this data deluge, and many look to service providers for storage and protection. Many service providers offer tape-based backup and archiving services. Despite their best efforts to innovate, data volumes always seem to grow faster, pushing the boundaries of tape capacity.

Today, companies of all sizes still use tape to store business information, but now it is more for cold storage than for data that needs to be accessed frequently. While tape is a low-cost and reliable storage option that is ideal for data not accessed often, maintaining multiple versions of software and legacy infrastructure can put a burden on already taxed resources. These challenges come at a cost, including software licenses, maintenance, and technical resources that could be spent on other, more important initiatives to help drive business innovation. As a service provider, you need a secure and compliant data storage option that will enable you to sell more value-added services.

As reported in TechTarget, a Storage magazine Purchasing Intention survey showed that the trend away from tape continues: 76% of IT professionals see their use of tape as a backup format either declining or staying the same.

Some service providers are considering offering cloud-based backup-as-a-service without causing any security concerns for their customers. Others are looking for a solution that combines the benefits of faster data access along with the cost advantages of tape.

More than a few service providers have discovered a solution that covers all of these benefits: the Elastic Cloud Storage (ECS) object storage platform. As a highly scalable, multi-tenant, multi-protocol object storage system, ECS helps service providers better meet their service-level agreement (SLA) commitments to customers by offering highly resilient, reliable and low-cost storage services with enterprise-class security.

Iron Mountain® Incorporated (NYSE: IRM), a leading provider of storage and information management services, is one of the providers that have discovered this solution. In addition to its traditional tape-based storage-as-a-service, it partnered with Dell EMC to provide a cost-effective, scalable and modern Cloud Archive as a part of its services portfolio. Designed to scale as the volume of data grows, with ECS as the back-end storage platform, the Cloud Archive solution is ideal for organizations needing offsite, pay-as-you-use archival storage with near-infinite scalability.

“Our customers trust that we know where the data is by having those cloud-based solutions in our datacenters. It gives them a peace of mind where they know where their data is at rest,” said Eileen Sweeney, SVP Data Management at Iron Mountain.

Watch the video below to hear more about how Iron Mountain uses ECS to modernize its storage management services for 95% of Fortune 1000 companies. 

You’ll find the full rundown of the Iron Mountain Cloud Archive solution with ECS here.

Planning on getting away to Barcelona for Mobile World Congress (MWC) 2017? Stop by the VMware stand (Hall 3, Stand 3K10) to meet with Dell EMC experts!
