
Galaxy: A Workflow Management System for Modern Life Sciences Research

Nathan Bott

Healthcare Solutions Architect at EMC

Am I a life scientist or an IT data manager? That’s the question many researchers are asking themselves in today’s data-driven life sciences organizations.

Whether it is a bench scientist analyzing a genomic sequence or an M.D. exploring biomarkers and a patient’s genomic variants to develop a personalized treatment, researchers are spending a great amount of time searching for, accessing, manipulating, analyzing, and visualizing data.

Organizations supporting such research efforts are trying to make it easier to perform these tasks without the user needing extensive IT expertise and skills. This mission is not easy.

Focus on the data

Modern life sciences data analysis requirements are vastly different than they were just a handful of years ago.

In the past, once data was created, it was stored, analyzed soon after, and then archived to tape or another long-term medium. Today, not only is more data being generated, but the need to re-analyze that data means it must be retained for longer periods where it can be easily accessed.

Additionally, today’s research is much more collaborative and multi-disciplinary. As a result, organizations must provide an easy way for researchers to access data, ensure that results are reproducible, and provide transparency to ensure best practices are used and that procedures adhere to regulatory mandates.

Heavier analytics and closer collaboration are both areas where the Galaxy Project (also known simply as Galaxy) can help. Galaxy is a platform for scientific workflows, data integration, and data and analysis persistence and publishing, designed to make computational biology accessible to research scientists who do not have computer programming experience.

Galaxy is typically used as a general-purpose bioinformatics workflow management system that automatically tracks and manages data while providing support for capturing the context and intent of computational methods.

Organizations have several ways to make use of Galaxy. They include:

Free public instance: The Galaxy Main instance is available as a free public service at UseGalaxy.org. This is the Galaxy Project’s primary production Galaxy instance and is useful for sharing or publishing data and methods with colleagues for routine analysis or with the larger scientific community for publications.

Anyone can use the public servers, with or without an account. (With an account, data quotas are increased and full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy-defined objects.)

Publicly available instances: Many other Galaxy servers besides Main have been made publicly available by the Galaxy community. Specifically, a number of institutions have installed Galaxy and have made those installations either accessible to individual researchers or open to certain organizations or communities.

For example, the Centre de Bioinformatique de Bordeaux offers a general purpose Galaxy instance that includes EMBOSS (a software analysis package for molecular biology) and fibronectin (diversity analysis of synthetic libraries of a Fibronectin domain). Biomina offers a general purpose Galaxy instance that includes most standard tools for DNA/RNA sequencing, plus extra tools for panel resequencing, variant annotation, and some tools for Illumina SNP array analysis.

A list of the publicly available installations of Galaxy can be found here.

Do-it-yourself: Organizations also have the choice of deploying their own Galaxy installations. There are two options: an organization can install a local instance of Galaxy (more information on setting up a local instance of Galaxy can be found here), or Galaxy can be deployed to the cloud. The Galaxy Project supports CloudMan, a software package that provides a common interface to different cloud infrastructures.

How it works

Architecturally, Galaxy is a modular Python-based web application that provides a data abstraction layer to integrate with various storage platforms. This allows researchers to access data on a variety of storage back-ends, such as standard direct-attached storage, S3 object-based cloud storage, storage management systems like iRODS (the Integrated Rule-Oriented Data System), or a distributed file system.

For example, a Galaxy implementation might use object-based storage such as that provided by Dell EMC Elastic Cloud Storage (ECS). ECS is a software-defined, cloud-scale, object storage platform that combines the cost advantages of commodity infrastructure with the reliability, availability, and serviceability of traditional storage arrays.
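To make the idea of a data abstraction layer concrete, here is a minimal sketch in Python. It is not Galaxy's actual API: the names StorageBackend, DiskBackend, InMemoryBackend, and count_sequences are hypothetical and chosen only for illustration. The in-memory class stands in for a remote object store so the example runs without any cloud credentials.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical interface; NOT Galaxy's real object-store API."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under the given key."""


class DiskBackend(StorageBackend):
    """Keeps each dataset as a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryBackend(StorageBackend):
    """Stand-in for a remote object store (for example, an S3 bucket)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def count_sequences(store: StorageBackend, key: str) -> int:
    """Count FASTA records; the caller never sees which back-end is used."""
    text = store.get(key).decode()
    return sum(1 for line in text.splitlines() if line.startswith(">"))
```

The point of the design is that analysis code such as count_sequences depends only on the interface, so an administrator could swap local disk for object storage without researchers changing how they work.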

With ECS, any organization can deliver scalable and simple public cloud services with the reliability and control of a private-cloud infrastructure.

ECS provides comprehensive protocol support, like S3 or Swift, for unstructured workloads on a single, cloud-scale storage platform. This would allow the user of a Galaxy implementation to easily access data stored on such cloud storage platforms.

With ECS, organizations can easily manage a globally distributed storage infrastructure under a single global namespace with anywhere access to content. ECS features a flexible software-defined architecture that is layered to promote limitless scalability. Each layer is completely abstracted and independently scalable with high availability and no single points of failure.

Get first access to our Life Sciences Solutions

You can test drive Dell EMC ECS by registering for an account and getting access to our APIs by visiting https://portal.ecstestdrive.com/

Or you can download the Dell EMC ECS Community Edition here and try it for free in your own environment, with no time limit for non-production use.

Digital Health Strategies – An introduction to Elastic Cloud Storage (ECS)

Nathan Bott

Healthcare Solutions Architect at EMC

This past April, my father reached two important milestones – he turned 70 and retired from a 40-plus year career in food science.  He is now planning to head back to Spain to complete the Camino de Santiago – or the Way of St. James – a journey he started in 2014.  Unfortunately he had to stop 150 miles into the 500 mile trek because of severe back and hip pain due to the emergence of degenerative disc disease.  After working with his physician to manage this new condition, he started to prepare for the upcoming trip by walking between 5 and 10 miles three times a week.  Along with this training came other ailments that would be expected with anybody his age:  pulled muscles, strained knees, and “light-headedness.”  This last ailment can be attributed to another condition he happens to have – Type 2 Diabetes.  And so it goes, as he gets older and tries to maintain a high level of activity, he will suffer more ailments, and spend more time and money (via Medicare benefits) managing these chronic conditions.

And he will not be alone.  My father was born in 1946 and is thus a first-year baby boomer, part of the first wave of new Medicare beneficiaries, about 10,000 of whom enroll every day.  The Congressional Budget Office expects over 80 million Americans will be Medicare-eligible by 2035, an almost 50% increase in enrollment from 2015.  The cost per beneficiary is expected to increase even more as each patient will have multiple chronic conditions to manage; per the National Council on Aging:

  • About 68% of Medicare beneficiaries have two or more chronic diseases and 36% have four or more.
  • More than two-thirds of all health care costs are for treating chronic diseases.

The US government and the healthcare industry are well aware of the current “silver tsunami” and planning has been underway.

For the past 7 years, since the passage of the HITECH provision in the American Recovery and Reinvestment Act (ARRA) in 2009 and the Medicare Shared Savings Program (MSSP) in 2011, the groundwork has been laid to implement various programs and incentives to distribute the efforts to manage the cost of delivering healthcare to an ever-expanding beneficiary population.  The widespread adoption of electronic health records technology by healthcare providers and the reorganization of reimbursements to these providers – from a fee-for-service to an outcomes-based model – have combined to become a catalyst for a digital revolution in healthcare.

Government-led healthcare reform programs like Accountable Care Organizations (ACO), the Patient-Centered Medical Home, and the Precision Medicine Initiative are predicated on having a digital technology platform that can use the demographic, financial, clinical, and genetic data acquired from the vast population of patients to develop evidence-based plans of care that are specifically tailored to the genetic disposition and the disease(s) of a given patient.

Regardless of the industry, product or service, a disruptive technology that drives innovation through digitization requires a re-assessment of the infrastructure that supports it; the healthcare industry is no different.  As healthcare providers have implemented electronic medical records systems, deployed enterprise imaging solutions, piloted next generation sequencing programs, and developed clinical informatics capabilities, new infrastructure requirements and operating modes have emerged.  Furthermore, in response to the evolving markets and reimbursement models explained above, many healthcare entities – providers, payers, and pharmaceutical companies alike – have consolidated through mergers and acquisitions, which also necessitate re-evaluating infrastructure architectures in order to rationalize operational capabilities, drive utilization efficiency, and decrease both operational and capital costs.

Working directly with healthcare customers, collaborating with healthcare software vendors, and partnering with IT service providers, EMC has been on the front line to provide architectural guidance and infrastructure solutions to support this digital revolution and its emerging infrastructure requirements. A key infrastructure solution to support the digitization revolution in healthcare is a highly durable, geo-distributed, performant storage platform that will work with legacy monolithic systems using file system interfaces as well as cloud-native distributed applications using standard storage APIs like AWS S3 or OpenStack Swift.

EMC’s Elastic Cloud Storage system (ECS) is a modern object storage platform that does just that…and more.  Just as important, the ECS object platform can be used for a myriad of use cases specific to the healthcare industry, supporting:

  • Innovative technology platforms which enable coordinated and accessible medical services such as outlined by the Patient-Centered Medical Home program
  • Collaboration and data sharing as needed for programs such as the Accountable Care Organization initiative
  • An increase in IT operational agility using a storage platform that can be provisioned with cloud-based APIs
  • A decrease in costs through storage utilization efficiency at scale using modern data protection and replication methods

In my follow-up blog entries here, I will provide more details on the functional capabilities of ECS and map these capabilities to specific use cases.  These use cases are driving the digital revolution to take on the challenges of delivering collaborative and personalized healthcare services to an aging population with multiple complex chronic conditions, while driving down IT operational costs as well as the overall cost of the healthcare system.

Examples of the use cases I mentioned above include various new technology trends like the emerging Internet of Things (IoT) solutions that support remote patient monitoring, telehealth, and behavior modification tools to help manage chronic diseases; data lake functionality with the Hadoop ecosystem for population and precision health based analytics programs; and cloud-native development efforts to launch distributed mobile applications that can capture and access data from any location.

I look forward to exploring these use cases and examining how ECS’s unique capabilities will help our healthcare customers move towards meeting their technical, operational, and “digitized-mission” goals.
