Posts Tagged ‘Life Sciences’

Galaxy: A Workflow Management System for Modern Life Sciences Research

Nathan Bott

Healthcare Solutions Architect at EMC

Am I a life scientist or an IT data manager? That’s the question many researchers are asking themselves in today’s data-driven life sciences organizations.

Whether it is a bench scientist analyzing a genomic sequence or an M.D. exploring biomarkers and a patient’s genomic variants to develop a personalized treatment, researchers are spending a great amount of time searching for, accessing, manipulating, analyzing, and visualizing data.

Organizations supporting such research efforts are trying to make it easier to perform these tasks without the user needing extensive IT expertise and skills. This mission is not easy.

Focus on the data

Modern life sciences data analysis requirements are vastly different than they were just a handful of years ago.

In the past, once data was created, it was stored, analyzed soon after, and then archived to tape or another long-term medium. Today, not only is more data being generated, but the need to re-analyze that data means it must be retained for longer periods where it can be easily accessed.

Additionally, today’s research is much more collaborative and multi-disciplinary. As a result, organizations must provide an easy way for researchers to access data, ensure that results are reproducible, and provide transparency to ensure best practices are used and that procedures adhere to regulatory mandates.

More analytics and collaboration represent areas where The Galaxy Project (also known simply as Galaxy) can help. Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform designed to make computational biology accessible to research scientists who do not have computer programming experience.

Galaxy is most often used as a general-purpose bioinformatics workflow management system that automatically tracks and manages data while providing support for capturing the context and intent of computational methods.

Organizations have several ways to make use of Galaxy. They include:

Free public instance: The Galaxy Main instance is available as a free public service. It is the Galaxy Project’s primary production Galaxy instance and is useful for sharing or publishing data and methods with colleagues for routine analysis, or with the larger scientific community for publications.

Anyone can use the public servers, with or without an account. (With an account, data quotas are increased and full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy-defined objects).

Publicly available instances: Many other Galaxy servers besides Main have been made publicly available by the Galaxy community. Specifically, a number of institutions have installed Galaxy and have made those installations either accessible to individual researchers or open to certain organizations or communities.

For example, the Centre de Bioinformatique de Bordeaux offers a general purpose Galaxy instance that includes EMBOSS (a software analysis package for molecular biology) and fibronectin (diversity analysis of synthetic libraries of a Fibronectin domain). Biomina offers a general purpose Galaxy instance that includes most standard tools for DNA/RNA sequencing, plus extra tools for panel resequencing, variant annotation, and some tools for Illumina SNP array analysis.

A list of the publicly available installations of Galaxy can be found here.

Do-it-yourself: Organizations also have the choice of deploying their own Galaxy installations. There are two options: an organization can install a local instance of Galaxy (more information on setting up a local instance of Galaxy can be found here), or Galaxy can be deployed to the cloud. The Galaxy Project supports CloudMan, a software package that provides a common interface to different cloud infrastructures.

How it works

Architecturally, Galaxy is a modular Python-based web application that provides a data abstraction layer to integrate with various storage platforms. This allows researchers to access data on a variety of storage back-ends, such as standard direct-attached storage, S3 object-based cloud storage, storage management systems like iRODS (the Integrated Rule-Oriented Data System), or a distributed file system.
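To illustrate what a data abstraction layer buys you, here is a minimal sketch in Python of a pluggable object-store interface with a direct-attached-storage backend. This is a hypothetical simplification for illustration, not Galaxy's actual object-store code; an S3 or iRODS backend would implement the same three methods against its own API, leaving the rest of the application unchanged.

```python
import os
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Minimal storage-abstraction interface (hypothetical sketch,
    modeled loosely on Galaxy's object-store concept)."""

    @abstractmethod
    def exists(self, obj_id: str) -> bool: ...

    @abstractmethod
    def put(self, obj_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, obj_id: str) -> bytes: ...


class DiskObjectStore(ObjectStore):
    """Direct-attached storage backend: one file per dataset."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, obj_id: str) -> str:
        return os.path.join(self.root, obj_id)

    def exists(self, obj_id):
        return os.path.exists(self._path(obj_id))

    def put(self, obj_id, data):
        with open(self._path(obj_id), "wb") as f:
            f.write(data)

    def get(self, obj_id):
        with open(self._path(obj_id), "rb") as f:
            return f.read()
```

The application code talks only to `ObjectStore`, so swapping local disk for cloud object storage is a configuration change rather than a rewrite.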

For example, a Galaxy implementation might use object-based storage such as that provided by Dell EMC Elastic Cloud Storage (ECS). ECS is a software-defined, cloud-scale, object storage platform that combines the cost advantages of commodity infrastructure with the reliability, availability, and serviceability of traditional storage arrays.

With ECS, any organization can deliver scalable and simple public cloud services with the reliability and control of a private-cloud infrastructure.

ECS provides comprehensive protocol support, including S3 and Swift, for unstructured workloads on a single, cloud-scale storage platform. This allows the user of a Galaxy implementation to easily access data stored on such cloud storage platforms.

With ECS, organizations can easily manage a globally distributed storage infrastructure under a single global namespace with anywhere access to content. ECS features a flexible software-defined architecture that is layered to promote limitless scalability. Each layer is completely abstracted and independently scalable with high availability and no single points of failure.

Get first access to our Life Sciences Solutions

You can test drive Dell EMC ECS by registering for an account to get access to our APIs.

Or you can download the Dell EMC ECS Community Edition and try it for free in your own environment, with no time limit for non-production use.

Overcoming the Exabyte-Sized Obstacles to Precision Medicine

Wolfgang Mertz

CTO of Healthcare, Life Sciences and High Performance Computing

As we make strides towards a future that includes autonomous cars and grocery stores sans checkout lines, concepts that once seemed reserved only for utopian fiction, it seems there’s no limit to what science and technology can accomplish. It’s an especially exciting time for those in the life sciences and healthcare fields, with 2016 seeing breakthroughs such as a potential “universal” flu vaccine and CRISPR, a promising gene editing technology that may help treat cancer.

Several of Dell EMC’s customers are also making significant advances in precision medicine, the medical model that focuses on using an individual’s specific genetic makeup to customize and prescribe treatments.

Currently, physicians and scientists are in the research phase of a myriad of applications for precision medicine, including oncology, diabetes and cardiology. Before we are able to realize the vision President Obama shared of “the right treatments at the right time, every time, to the right person” from his 2015 Precision Medicine Initiative, there are significant challenges to overcome.


For precision medicine to become available to the masses, researchers and doctors will need not only the technical infrastructure to support genomic sequencing, but also the storage capacity and resources to access, view and share additional relevant data. They will need visibility into patients’ electronic health records (EHR), along with information on environmental conditions, lifestyle behaviors and biological samples. While increased data sharing may sound simple enough, the reality is that there is still much work to be done on the storage infrastructure side to make this possible. Much of this data is typically siloed, which impedes healthcare providers’ ability to collaborate and review critical information that could impact a patient’s diagnosis and treatment. To fully take advantage of the potential life-saving insights available from precision medicine, organizations must implement a storage solution that enables high-speed access anytime, anywhere.


Another issue to confront is the storage capacity needed to house and preserve the petabytes of genomic data, medical imaging, EHR and other data. Thanks to the decreased cost of genomic sequencing and the growing number of genomes being analyzed, the sheer volume of genomic data alone is quickly eclipsing the storage available in most legacy systems. According to a scientific report by Stephens et al. published in PLOS Biology, between 100 million and two billion human genomes may be sequenced by 2025. This may lead to storage demands of 2-40 exabytes, since storage requirements must take into consideration the accuracy of the data collected. The paper states that, “For every 3 billion bases of human genome sequence, 30-fold more data (~100 gigabases) must be collected because of errors in sequencing, base calling and genome alignment.” With this exponential projected growth, scale-out storage that can simultaneously manage multiple current and future workflows is needed now more than ever.
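A back-of-envelope check of those numbers shows how quickly the exabyte scale arrives. The 0.25 bytes-per-base encoding below is my own idealized, lossless assumption; real formats that carry quality scores are considerably larger:

```python
GENOME_BASES = 3_000_000_000   # ~3 billion bases per human genome
COVERAGE = 30                  # 30-fold oversampling, per the PLOS Biology paper

bases_collected = GENOME_BASES * COVERAGE        # 9e10, i.e. ~100 gigabases
bytes_per_genome = bases_collected * 0.25        # 2 bits/base lower bound: ~22.5 GB
total_bytes = bytes_per_genome * 2_000_000_000   # upper estimate of genomes by 2025

print(total_bytes / 1e18)  # tens of exabytes, even at this lossless lower bound
```

Even this idealized lower bound lands in the same order of magnitude as the paper's 2-40 exabyte range, which is the point: capacity planning has to assume scale-out from the start.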

Early Stages 

Finally, while it’s easy to get caught up in the excitement of the advances made thus far in precision medicine, we have to remember this remains a young discipline. At the IT level, there’s still much to be done around network and storage infrastructure and workflows in order to develop the solutions that will make this ground-breaking research readily available to the public, the physician community and healthcare professionals. Third-generation platform applications need to be built to make this more mainstream. Fortunately, major healthcare technology players such as GE and Philips have undertaken initiatives to attract independent software vendor (ISV) applications. The more that high-profile companies devote time and resources to supporting ISV applications, the more likely it is that scientists will gain access to sophisticated tools sooner.

More cohort analyses such as Genomics England’s 100,000 Genomes Project must be put in place to ensure researchers have sufficient data to develop new forms of screening and treatment, and these efforts will also necessitate additional storage capabilities.


Despite these barriers, the future remains promising for precision medicine. With the proper infrastructure in place to provide reliable shared access and massive scalability, clinicians and researchers will have the freedom to focus on discovering the breakthroughs of tomorrow.


Metalnx: Making iRODS Easy

Stephen Worth

Stephen Worth is a director of Global Innovation Operations at Dell EMC. He manages development and university research projects in Brazil, serves as a technical liaison helping to improve innovation across our global engineering labs, and works in digital asset management leveraging user-defined metadata. Steve is based out of Dell EMC’s RTP Software Development Center, which focuses on data protection, core storage products, and cloud storage virtualization. Steve started with Data General in 1985, which was acquired by EMC in 1999; EMC in turn joined Dell Technologies in 2016. He has led many product development efforts involving operating systems, diagnostics, UI, database, and applications porting. His background includes vendor and program management, performance engineering, engineering services, manufacturing, and test engineering. Steve, an alumnus of North Carolina State University, received a B.S. degree in Chemistry in 1981 and an M.S. degree in Computer Studies in 1985. He served as an adjunct faculty member of the Computer Science department from 1987-1999. He is an emeritus member of the Computer Science Department’s Strategic Advisory Board and is currently chairperson of the Technical Advisory Board for the James B. Hunt Jr. Library on Centennial Campus.



Advances in sequencing, spectroscopy, and microscopy are driving life sciences organizations to produce vast amounts of data. Most organizations are dedicating significant resources to the storage and management of that data. However, until recently, their primary efforts have focused on how to host the data for high performance, rapid analysis, and moving it to more economical disks for longer-term storage.

The nature of life sciences work demands better data organization. The data produced by today’s next-generation lab equipment is rich in information, making it of interest to different research groups and individuals at varying points in time. Examples include:

  • Raw experimental and analyzed data may be needed as new drug candidates move through research and development, clinical trials, FDA approval, and production
  • A team interested in new indications for an existing chemical compound would want to leverage work already done by others in the organization on the compound in the past
  • In the realm of personalized medicine, clinicians may need to evaluate not only a person’s health history, but correlate that information with genome sequences and phenotype data throughout the individual’s life.

The great challenge is how to make data more generally available and useful throughout an organization. Researchers need to know what data exists and have a way to access it. For this to happen, data must be properly categorized, searchable, and easy to find.

To get help in this area, many research organizations and government agencies worldwide are using the Integrated Rule-Oriented Data System (iRODS), which is open source data management software developed by the iRODS Consortium. iRODS enables data discovery using a data/metadata catalog that can retain machine and user-defined metadata describing every file, collection, and object in a data grid.

Additionally, iRODS automates data workflows with a rule engine that permits any action to be initiated by any trigger on any server or client in the grid. iRODS also enables secure collaboration, so users only need to log in to their home grid to access data hosted on a remote, federated grid.
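Conceptually, the iRODS catalog (the ICAT) associates every file with attribute-value-unit (AVU) triples that can then be queried for discovery. The toy class below sketches that idea in plain Python; it is not the iRODS client API, just an illustration of metadata-driven search, and the paths and attribute names are invented:

```python
from collections import defaultdict


class MetadataCatalog:
    """Toy stand-in for the iRODS ICAT: maps file paths to
    attribute-value-unit (AVU) triples and supports attribute queries.
    (Conceptual sketch only -- not the iRODS client API.)"""

    def __init__(self):
        self._avus = defaultdict(list)

    def add(self, path, attribute, value, unit=""):
        """Attach one AVU triple to a file."""
        self._avus[path].append((attribute, value, unit))

    def query(self, attribute, value):
        """Return every path carrying the given attribute=value pair."""
        return [
            p for p, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in avus)
        ]


catalog = MetadataCatalog()
catalog.add("/zone/home/alice/run42.fastq", "sequencer", "HiSeq X Ten")
catalog.add("/zone/home/alice/run42.fastq", "project", "wheat-genotyping")
catalog.add("/zone/home/bob/run7.fastq", "project", "apple-resequencing")
```

A researcher who tags data this way at ingest time can later answer "show me everything from the wheat-genotyping project" without knowing where the files physically live, which is exactly the discovery problem described above.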

Leveraging iRODS can be simplified and its benefits enhanced when used with Metalnx, an administrative and metadata management user interface (UI) for iRODS. Metalnx was developed by Dell EMC through its efforts as a corporate member of the iRODS Consortium. The intuitive Metalnx UI helps both the IT administrators charged with managing metadata and the end-users / researchers who need to find and access relevant data based upon metadata descriptions.

Making use of metadata via the easy-to-use UI that Metalnx provides on top of iRODS can help:

  • Maximize storage assets
  • Find what’s valuable, no matter where the data is located
  • Automate movement and processing of data
  • Securely share data with collaborators

Real world example: Putting the issues into perspective

A simple example illustrates why iRODS and Metalnx are needed. Plant & Food Research, a New Zealand-based science company providing research and development that adds value to fruit, vegetable, crop and food products, makes great use of next-generation sequencing and genotyping. The work generates a lot of mixed data types.

“In the past, we were good at storing data, but not good at categorizing the data or using metadata,” said Ben Warren, bioinformatician, at Plant & Food Research. “We tried to get ahead of this by looking at what other institutions were doing.”

iRODS seemed a good fit. It was the only decent open source solution available. However, there were some limitations. “We were okay with the rule engine, but not the interface,” said Warren.

A system administrator working with EMC on hardware for the organization’s compute cluster had heard of Metalnx and mentioned this to Warren. “We were impressed off the bat with its ease of use,” said Warren. “Not only would it be useful for bioinformaticians, coders, and statisticians, but also for the scientists.”

The reason: Metalnx makes it easier to categorize the organization’s data, to control the metadata used to categorize the data, and to use the metadata to find and access any data.

Benefits abound

At Plant & Food Research, metadata is an essential element of a scientist’s workflow. The metadata makes it easier to find data at any stage of a research project. When a project is conceived, scientists will start by determining all metadata required for the project using Metalnx and cataloging data using iRODS. With this approach, everything associated with a project including the samples used, sample descriptions, experimental design, NGS data, and other information are searchable.

One immediate benefit is that someone undertaking a new project can quickly determine if similar work has already been done. This is increasingly important in life science organizations as research becomes more multidisciplinary in nature.

Furthermore, the more an organization knows about its data, the more valuable the data becomes. Researchers can connect with other work done across the organization. Being able to find the right raw data of a past effort means an experiment does not have to be redone. This saves time and resources.

Warren notes that there are other organizational benefits using iRODS and Metalnx. When it comes to collaborating with others, the data is simply easier to share. Scientists can put the data in any format and it is easier to publish the data.

Learn more

Metalnx is available as an open source tool. It can be found at Dell EMC Code or on GitHub. EMC has also made binary versions available on Bintray and posted a Docker image on Docker Hub.

A broader discussion of the use of Metalnx and iRODS in the life sciences can be found in an on-demand video of a recent web seminar “Expanding the Face of Meta Data in Next Generation Sequencing.” The video can be viewed on the EMC Emerging Tech Solutions site.



Isilon – A Storage Solution that evolves with Next-Generation Sequencing

Biswajit Mishra

Sr. Product Marketing Manager, Life Sciences at EMC²

The creation and understanding of the binary code was the cornerstone of the development of the modern digital technology age. The language of 0’s and 1’s opened the door to many new technologies including the internet, modern media platforms, telecommunications, smartphone applications and more. For Life Sciences, something similar happened when we unlocked the genetic code. The language of A’s, T’s, C’s and G’s could explain all life forms and led to an explosion in the development of products and services in the areas of pharmaceuticals, healthcare, and agriculture.

The Beginning

In 1976, “Bacteriophage MS2,” a single-stranded RNA virus that infects E. coli, became the first genome to be completely sequenced, at 3,569 bases. Then in 1995, the Institute for Genomic Research completed the genome sequencing of “Haemophilus influenzae” – the first full genome sequencing of a living organism, with 1.8 million base pairs (Mb). But when the Human Genome Project sequenced the entire human genome, with 3.2 billion base pairs, in 2001, it opened a whole new world of possibilities.

Demands on IT divisions aka “The NGS Data Challenge”

The field of Life Sciences continues to remain at the cutting edge of scientific research with improvements in technologies such as next-generation sequencing (NGS), proteomics, and more. NGS instruments such as the Illumina HiSeq X Ten system can now sequence over 18,000 human genomes per year, which is remarkable when you consider that it took more than a decade to sequence the first human genome as part of the Human Genome Project. And keep in mind that each human genome needs about 100 GB of storage, so an Illumina HiSeq X Ten system needs about 1.8 PB of storage per year. The improvements, particularly in NGS technology, and the growing applications of NGS mean that more and more organizations are using NGS services and generating data at an unprecedented rate. And in this day and age of flattening IT budgets, the IT departments in these organizations cannot keep up with the pace of data generation.
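The 1.8 PB figure is straightforward arithmetic, worth writing out because it is the kind of calculation every NGS capacity plan starts from:

```python
GENOMES_PER_YEAR = 18_000   # HiSeq X Ten throughput cited above
GB_PER_GENOME = 100         # approximate storage per sequenced human genome

tb_per_year = GENOMES_PER_YEAR * GB_PER_GENOME / 1_000  # 1,800 TB
pb_per_year = tb_per_year / 1_000                        # 1.8 PB per year

print(pb_per_year)  # 1.8
```

Note this covers only the raw output of one instrument fleet; analysis intermediates and archives multiply it further.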

For most organizations performing NGS, the IT department has to deliver on the following fronts:

  • Meet the performance and capacity needs: Most Life Sciences workflows containing NGS include data generation, analysis, and archiving stages. Each of these stages has unique capacity and performance needs. Companies performing NGS usually require sequencing to run 24/7 and it is left to the IT department to ensure that the right production performance and storage tiers are available as needed.
  • Plan for the future: For IT departments with a longer IT procurement cycle, it is a struggle to ensure that the IT purchase process keeps up with the storage needs of the exponentially growing data produced from NGS processes. IT divisions have to ensure that they plan for today as well as the future to meet the demanding customer SLAs.
  • Cost efficient solutions: IT budget growth typically is nowhere close to the rate of the data growth. This means that IT departments have to ensure that they are smart and cost efficient with their storage solutions.

Isilon, a trusted partner for Life Sciences organizations – now and in the future

EMC Isilon is a leader and trusted partner for hundreds of Life Sciences organizations worldwide, including leading genome centers, pharmaceutical companies, and academic research centers. Isilon provides a highly available and reliable single file system, OneFS, for Life Sciences workflows and consolidates all the workflows into a single, scalable volume for ease of management. Isilon can host multiple types of nodes – S, X, NL and HD – in a single cluster. S and X nodes are ideal for high-performance workflows such as mapping and alignment of NGS data, while NL and HD nodes provide high-density, low-cost storage of raw data and analysis results for archive needs. Isilon SmartFail and AutoBalance features also ensure that data is protected across the entire cluster.

Isilon gets better

Recently EMC Isilon announced Data Lake 2.0 – including OneFS 8.0, IsilonSD Edge and Isilon CloudPools – to extend the data lake from the data center to edge locations, including remote and branch offices, and to both public and private clouds.

Data Lake 2.0 has massive benefits for Life Science organizations using Isilon. With OneFS 8.0, organizations can now achieve non-disruptive upgrade and rollback capabilities to ensure continuous operations at all times. IsilonSD Edge is a software-defined storage solution that can be deployed on commodity hardware at remote research facilities. This can provide Life Science organizations with the ease of management to connect geographically dispersed research facilities in a cost-efficient manner.

Isilon CloudPools is a game changer for organizations performing NGS, providing a cost-efficient way of archiving data and allowing the IT department to make the most of its budget. Isilon CloudPools offers transparent, policy-based, automated storage tiering of data from an on-premises Isilon cluster to a private cloud based on EMC Elastic Cloud Storage (ECS), to another Isilon cluster, or to a public cloud like AWS or Azure for long-term storage of Life Science research data and other archiving needs. By moving cold archive data to either a public or private cloud, CloudPools helps IT departments free up space on production environments and ensures that NGS analysis pipelines are not affected by a lack of high-performance storage. This also allows IT departments to deliver on customer SLAs while at the same time reducing costs.
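Conceptually, a policy-based tiering rule is a predicate over file attributes (age, path, size) that selects what moves to the cloud tier. The sketch below expresses an age-based rule in Python purely for illustration; real CloudPools policies are configured in OneFS, not in application code, and the 180-day threshold is a hypothetical choice:

```python
import os
import time

ARCHIVE_AFTER_DAYS = 180  # hypothetical policy threshold


def select_for_archive(paths, now=None):
    """Return files whose last access time is older than the threshold --
    the kind of predicate an automated tiering policy expresses.
    (Conceptual sketch only; not OneFS configuration.)"""
    if now is None:
        now = time.time()
    cutoff = now - ARCHIVE_AFTER_DAYS * 86400
    return [p for p in paths if os.path.getatime(p) < cutoff]
```

Once such a rule runs automatically, cold raw sequencing runs drift to cheap storage while hot analysis data stays on the performance tier, with no change visible to the researcher's file paths.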

Improve Life Sciences Data Management

We understand the need of Life Science organizations for a versatile storage solution to balance their capacity and performance requirements. That’s why we continue to evolve our Isilon platform to meet the needs of our Life Science customers. Innovative new features like EMC Isilon OneFS 8.0, IsilonSD Edge and Isilon CloudPools ensure that our Life Sciences customers have a proven solution to run NGS non-stop and achieve faster time to insights. With EMC Isilon, Life Sciences organizations have a trusted partner to handle their NGS workloads, not only today but in the future too.

For more information on how the EMC Emerging Technology Division can help your Life Science organization, click here.



Telemedicine Part 1: TeleRadiology as the growth medium of Precision Medicine

Sanjay Joshi

CTO, Healthcare & Life-Sciences at EMC
Sanjay Joshi is the Isilon CTO of Healthcare and Life Sciences at the EMC Emerging Technologies Division. Based in Seattle, Sanjay's 28+ year career has spanned the entire gamut of life-sciences and healthcare from clinical and biotechnology research to healthcare informatics to medical devices. His current focus is a systems view of Healthcare, Genomics and Proteomics for infrastructures and informatics. Recent experience has included information and instrument systems in Electronic Medical Records; Proteomics and Flow Cytometry; FDA and HIPAA validations; Lab Information Management Systems (LIMS); Translational Genomics research and Imaging. Sanjay holds a patent in multi-dimensional flow cytometry analytics. He began his career developing and building X-Ray machines. Sanjay was the recipient of a National Institutes of Health (NIH) Small Business Innovation Research (SBIR) grant and has been a consultant or co-Principal-Investigator on several NIH grants. He is actively involved in non-profit biotech networking and educational organizations in the Seattle area and beyond. Sanjay holds a Master of Biomedical Engineering from the University of New South Wales, Sydney and a Bachelor of Instrumentation Technology from Bangalore University. He has completed several medical school and PhD level courses.

Real “health care” happens when telemedicine is closely joined to a connected-care delivery model that has prevention and continuity-of-care at its core. This model has been defined well, but only sparsely adopted. As John Hockenberry, host of the morning show “The Takeaway” on National Public Radio, eloquently puts it: “health is not episodic.” We need a continuous care system.

Telemedicine makes it possible for you to see a specialist like me without driving hundreds of miles

Image source: Chest. 2013,143 (2):295-295. doi:10.1378/chest.143.2.295

How do we get the “right care to the right patient at the right time”? Schleidgen et al. define Precision Medicine, also known as Personalized Medicine (1), as seeking “to improve stratification and timing of health care by utilizing biological information and biomarkers on the level of molecular disease pathways, genetics, proteomics as well as metabolomics.” Precision Medicine (2) is an orthogonal, multimodal view of the patient, from her/his cells to pathways to organs to health and disease. There are several devices and transducers that could catalyze telemedicine: Radiology, Pathology, and Wearables. I will focus on Radiology for this part of my three-part series, since all of these modalities use multi-spectral imaging.

Where first?
The world is still mostly rural. According to World Bank statistics, 19% of the USA is rural, but the worldwide average is about 30%, on a spectrum that ranges from 0% rural (Hong Kong) to 74% rural (Afghanistan). With the recent consolidations (since 2010 in the US) of hospitals into larger organizations (3), it is this 30% to 70% of the world with sparse network connectivity that needs telemedicine sooner than the well-off “worried well” folks who live in dense urban areas with close access to healthcare. China has the world’s largest number of hospitals at around 60,000, followed by India at around 15,000. The US tally is approximately 5,700 hospitals. The counter-arguments to the rural needs in the US are the risk of shrinking physician numbers (4) and the growing numbers of the urban poor and the elderly. Then there is the plight of poor health among the world’s millions of refugees, who are usually stuck in no-man’s-lands, fleeing conflicts that never seem to wane. All these use-cases are valid, but need prioritization.

Connected Health and the “Saver App”
Many a fortune has been made by devising and selling “killer apps” on mobile platforms. In healthcare, what we need is a “saver app.” Using the psycho-social keys to the success of these “sticky” technologies, Dr. Joseph C. Kvedar succinctly builds the case for connected health in his recent book “The Internet of Healthy Things” with three strategies and three tactics:

Strategies: (1) Make It about Life; (2) Make It Personal; and (3) Reinforce Social Connections.

Tactics: (1) Employ Messaging; (2) Use Unpredictable Rewards; and (3) Use the Sentinel Effect.

Dr. Kvedar calls this “digital therapies.”

The Vendor Neutral Archive (VNA) and Virtual Radiology
The Western Roentgen Society, a predecessor of the Radiological Society of North America (RSNA), was founded in 1915 in St. Louis, Missouri (soon after the invention of the X-Ray tube in Bavaria in 1895). An interactive timeline of Radiology events can be seen here. Innovations in Radiology have always accelerated the innovations in healthcare.

The Radiology value chain is in its images and clinical reporting, as summarized in the diagram below (5):

Radiology value chain

To scale this value-chain for telemedicine, we need much larger adoption of VNA, which is an “Enterprise Class” data management system. A VNA consolidates multiple Imaging Departments into:

  • a master directory,
  • associated storage and
  • lifecycle management of data

The difference between PACS (Picture Archiving and Communication System) (6) and VNA lies in the Image Display and Image Manager layers, respectively.

The Image Display layer is a PACS vendor or a cloud-based “image program”. All Admit, Discharge and Transfer (ADT) information must reside with the image. This means that DICOM standards and HL7/X12N interoperability (using service protocols like FHIR) are critical. The Image Manager for VNA is the “storage layer of images”, either local or cloud-based. For telemedicine to be successful, VNA must “scale out” exponentially and in a distributed manner within a privacy and security context.
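The requirement that ADT context travel with the image can be pictured as one small structured record. The sketch below is loosely modeled on a FHIR ImagingStudy resource but is illustrative only, not a validated FHIR payload; the field choices, identifiers, and the `vna://` storage reference are assumptions:

```python
import json


def imaging_record(patient_id, accession, modality, storage_url):
    """Bundle the ADT context that must stay attached to an image.
    (Illustrative structure only -- loosely modeled on a FHIR
    ImagingStudy resource, not a validated one.)"""
    return {
        "resourceType": "ImagingStudy",
        "subject": {"reference": f"Patient/{patient_id}"},
        "identifier": [{"system": "urn:accession", "value": accession}],
        "modality": [{"code": modality}],
        "endpoint": [{"reference": storage_url}],
    }


record = imaging_record("12345", "ACC-2016-0042", "CT", "vna://archive/study/42")
payload = json.dumps(record)
```

Because the patient reference and storage endpoint ride inside the same serialized record, any federated VNA node that receives the image also receives the context needed to display and route it correctly.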

VNA’s largest players (alphabetically) are: Agfa, CareStream, FujiFilm (TeraMedica), IBM (Merge), Perceptive Software (Acuo), Philips and Siemens. The merger of NightHawk Radiology with vRad which was then acquired by MedNax and IBM’s acquisition of Merge Healthcare (in Aug 2015) are important landmarks in this trend.

One of the most interesting journal articles in 2015 was on “Imaging Genomics” (or Radiomics) of glioblastoma, a brain cancer. By bidirectionally linking imaging features to the underlying molecular features, the authors (7) have created a new field of non-invasive genomic biomarkers.

Imagine this “virtual connected hive” of patients on one side and physicians, radiologists and pathologists on the other, constantly monitoring and improving the care of a population in health and disease at the individual and personal level. Telemedicine needs to be the anchor architecture for Precision Medicine. Without Telemedicine (and VNA), there is no Precision Medicine.

Postscript: Telepresence in mythology
Let me end this tale of distance and care with a little echo from my namesake, Sanjaya, who is mentioned in the first verse of the first chapter of the Bhagavad Gita (literally translated as the “Song of the Lord”) – an existential dialog between the warrior Arjuna and his charioteer, Krishna. The Gita, as it is commonly known, is set within the longest Big Data poem of all, the Mahabharata, with over 100,000 verses (and 1.8 million words), estimated to have first been written down around 400 BCE.

Dhritarashtra, the blind king, starts this great book-within-book by enquiring: “O Sanjaya, what did my sons and the sons of Pandu decide about battle after assembling at the holy land of righteousness Kurukshetra?”

Sanjaya starts the Gita by peering into the great yonder. He is bestowed with the divine gift of seeing events afar (divya-drishti); he is the king’s tele-vision – and Dhritarashtra’s advisor and charioteer (just like Krishna in the Gita). The other great religions and mythologies also mention telepresence in their seminal books.

My tagline for the “trickle down” in technology innovation flow is “from Defense to Life Sciences to Pornography to Finance to Commerce to Healthcare.” One interpretation of the Mahabharata is that it did not have any gods – all miracles were added later. Perhaps telepresence, already proven in war, has now reached the pivot point to “trickle down” into population-scale healthcare without divine intervention or miracles!


  1. Schleidgen et al, “What is personalized medicine: sharpening a vague term based on a systematic literature review”, BMC Medical Ethics, Dec 2013, 14:55
  2. “Toward Precision Medicine”, Natl. Acad. Press, June 2012
  3. McCue MJ, et al, “Hospital Acquisitions Before Healthcare Reform”, Journal of Healthcare Management, 2015 May-Jun; 60(3):186-203.
  4. Petterson SM, et al, “Estimating the residency expansion required to avoid projected primary care physician shortages by 2035”, Annals of Family Medicine 2015 Mar; 13(2):107-14. doi: 10.1370/afm.1760
  5. Enzmann DR, “Radiology’s Value Chain”, Radiology: Volume 263: Number 1, April 2012, pp 243-252
  6. Huang HK, “PACS and Imaging Informatics: Basic Principles and Applications”, Wiley-Blackwell; 2 edition (January 12, 2010)
  7. Moton S, et al, “Imaging genomics of glioblastoma: biology, biomarkers, and breakthroughs”, Topics in Magnetic Resonance Imaging. 2015



Making Trust and Collaboration a Unified Force in Science

Sanjay Joshi

CTO, Healthcare & Life-Sciences at EMC
Sanjay Joshi is the Isilon CTO of Healthcare and Life Sciences at the EMC Emerging Technologies Division. Based in Seattle, Sanjay's 28+ year career has spanned the entire gamut of life-sciences and healthcare from clinical and biotechnology research to healthcare informatics to medical devices. His current focus is a systems view of Healthcare, Genomics and Proteomics for infrastructures and informatics. Recent experience has included information and instrument systems in Electronic Medical Records; Proteomics and Flow Cytometry; FDA and HIPAA validations; Lab Information Management Systems (LIMS); Translational Genomics research and Imaging. Sanjay holds a patent in multi-dimensional flow cytometry analytics. He began his career developing and building X-Ray machines. Sanjay was the recipient of a National Institutes of Health (NIH) Small Business Innovation Research (SBIR) grant and has been a consultant or co-Principal-Investigator on several NIH grants. He is actively involved in non-profit biotech networking and educational organizations in the Seattle area and beyond. Sanjay holds a Master of Biomedical Engineering from the University of New South Wales, Sydney and a Bachelor of Instrumentation Technology from Bangalore University. He has completed several medical school and PhD level courses.

Try to recall all the superhero movies you have watched. Many of us would agree that the most captivating films are those where superheroes collaborate as a team to defeat a near-invincible villain – as in The Avengers. Where there is collaboration, there is trust. Dr. Douglas Fridsma, President and CEO of AMIA (the American Medical Informatics Association), used a phrase in a 2012 panel discussion we were on that has stuck with me: “Information moves at the speed of trust.” Trust is at the heart of any collaboration. New forms of trust and collaboration networks have been forming since 2008, and Bitcoin is a great example. The “blockchain” method behind Bitcoin, discussed in an article published by The Economist and illustrated in the figure below, is a new approach to trust and collaboration.

This figure illustrates the "BlockChain" method behind bitcoin.
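The core idea in that figure – each block commits to the hash of its predecessor, so tampering with any past block breaks every link after it – can be sketched in a few lines of Python. This is a toy illustration of the hash-chain principle, not Bitcoin's actual block format or consensus protocol:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents (which include the previous block's hash)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain: list, data: str) -> None:
    """Append a block that commits to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})

def chain_is_valid(chain: list) -> bool:
    """Verify each block still matches the hash its successor recorded."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain: list = []
add_block(chain, "Alice pays Bob 1 BTC")
add_block(chain, "Bob pays Carol 0.5 BTC")
print(chain_is_valid(chain))                 # True: the chain is intact
chain[0]["data"] = "Alice pays Bob 100 BTC"  # tamper with history
print(chain_is_valid(chain))                 # False: the link to block 0 breaks
```

That tamper-evidence is what makes the structure a trust mechanism: no participant has to trust any other participant's copy of the ledger, only the mathematics of the hash chain.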

Scientists are our Modern-day X-Men

Take this parallel to the sphere of scientific research, notably the biomedical sciences: breakthroughs there can deliver better health outcomes for mankind – just as superheroes do – but only if scientists have the means to work together. All humans are continuously mutating, and I’d like to think that scientists are our modern-day X-Men (and Women)!

The two most exciting recent disruptions in science are advances in Synthetic Biology and the CRISPR gene-editing enzyme system. Both innovations have enormous implications for the biomedical sciences and the future of healthcare.

Recently, a young girl in London with leukemia was treated with gene-editing therapy. Her case used a gene-editing enzyme system called TALEN. This is but one of the many leaps made possible through collaborative scientific research.

Fueling and Steering Scientific Research

Research is fueled by data. Discoveries are steered by the management of data. Even superheroes like Iron Man and The Incredible Hulk need some form of cognitive direction to focus their superhuman powers in order to achieve a common desired outcome. To build trust and collaboration frameworks, we need a single logical container of data. And this is why EMC has the Data Lake concept: a multi-user, multi-protocol, multi-application container for data which is geo-aware and secure.

We know that research is ultra-data-intensive. To implement Precision Medicine at population-health scale, there are two pivots: Collaboration and Asia. The Malaysia Genome Institute (MGI) engages in national and international collaborative projects in comparative genomics and genetics, structural and synthetic biology, computational and systems biology, and metabolic engineering. When MGI does DNA sequencing – whole genome, whole transcriptome, and targeted sequencing – a single run generates 13 terabytes of data. That’s equivalent to over 2.6 million songs in your iPod.
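A quick back-of-the-envelope check of that comparison (the ~5 MB average size of a compressed song is my assumption for the estimate, not a figure from MGI):

```python
# Rough check: how many ~5 MB songs fit into one 13 TB sequencing run?
run_bytes = 13 * 10**12      # one sequencing run: 13 terabytes
song_bytes = 5 * 10**6       # assumed size of a typical compressed song
songs = run_bytes // song_bytes
print(f"{songs:,} songs")    # 2,600,000 songs
```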

Being able to discover insights in large volumes of data is what differentiates progress from stalemate for the institution and its partners. But MGI had a problem: as it increased its storage capacity to cope with the influx of research data, data processing speed decreased, which slowed down analysis work.

That was before MGI adopted EMC Isilon’s scalable, on-demand storage solution with its fast next-generation sequencing architecture. Because data access is now provided directly to users, the platform has also curbed workflow bottlenecks and made collaboration easier.

Read the MGI Case Study to learn more.

Tools for Teamwork in Research

Singapore’s Agency for Science, Technology and Research (A*STAR) is a single agency that oversees 14 biomedical sciences, physical sciences, and engineering institutes as well as six consortia and centers.

So how does A*STAR encourage collaboration amongst scientists housed in different institutions?

There were two key issues A*STAR needed to address. One, sharing of data between institutions was done manually by researchers, who had to make a copy to transfer it to another party. This was both time-consuming and wasteful of storage, because data was duplicated across localized machines.

Two, long procurement periods – three to nine months – meant A*STAR didn’t have the means to scale up storage when the demand called for it. The opportunity cost was great.

Following the deployment of a comprehensive EMC Isilon platform, all that changed. On top of the increase in usable capacity, with the option to scale on demand, researchers can now commit their data to central storage that can be shared within and across research institutes.

Says Lai Loong Fong, Director of the Computational Resource Centre at A*STAR: “Users have been receptive to the new model. They are looking forward to the new features we can offer them to provide greater flexibility in accessing research data through their mobiles or laptops when they are working and meeting outside of the labs. It’s another way we can support innovation and collaboration across all of our research disciplines.”

Read the A*STAR Case Study to learn more.

Subject Data Protection

According to the DOE Human Subjects Resources, the use of humans as research subjects has aided significant scientific discoveries such as the Human Genome Project. That said, because a person’s genome contains health and other private information, measures must be in place to protect each subject’s privacy and prevent the loss of information. There are Ethical, Legal and Social Implication (ELSI) issues that can be resolved by trust and collaboration, as published by the Genome Law Review.

Looking at A*STAR as an example again, the agency has incorporated EMC Isilon SnapshotIQ into its platform, which offers data protection through secure, point-in-time snapshots and near-immediate, on-demand snapshot restores.
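The snapshot idea itself is easy to picture. The sketch below is a toy, in-memory model of point-in-time snapshots with on-demand restore – a conceptual illustration of the general technique only, not how Isilon SnapshotIQ is actually implemented:

```python
import copy

class SnapshotStore:
    """Toy key-value store with point-in-time snapshots and restores."""

    def __init__(self):
        self.data: dict = {}
        self.snapshots: list = []

    def snapshot(self) -> int:
        """Record a point-in-time copy of the data; returns a snapshot id."""
        self.snapshots.append(copy.deepcopy(self.data))
        return len(self.snapshots) - 1

    def restore(self, snap_id: int) -> None:
        """Roll the live data back to an earlier snapshot on demand."""
        self.data = copy.deepcopy(self.snapshots[snap_id])

store = SnapshotStore()
store.data["sample_001.fastq"] = "v1"
snap = store.snapshot()                    # protect the current state
store.data["sample_001.fastq"] = "corrupted"
store.restore(snap)                        # on-demand rollback
print(store.data["sample_001.fastq"])      # v1
```

Production systems avoid the full deep copy by sharing unchanged blocks between snapshots (copy-on-write), which is what makes real snapshots near-immediate; the restore semantics are the same.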

Many great minds working together can achieve far more than any one alone. Greater still, scalable data storage in the cloud now makes it possible for great minds to work together regardless of where they are. We can only begin to imagine what our modern-day science X-Men could actualize in these new dynamic and secure collaborative environments.

