Archive for the ‘Life Sciences’ Category

Using a World Wide Herd (WWH) to Advance Disease Discovery and Treatment

Patricia Florissi

Vice President & Global Chief Technology Officer, Sales at Dell EMC
Patricia Florissi is Vice President and Global Chief Technology Officer (CTO) for Sales. As Global CTO for Sales, Patricia helps define mid- and long-term technology strategy, representing the needs of the broader EMC ecosystem in EMC strategic initiatives. Patricia is an EMC Distinguished Engineer, holds a Ph.D. in Computer Science from Columbia University in New York, graduated as valedictorian with an MBA from the Stern School of Business at New York University, and has a Master's and a Bachelor's degree in Computer Science from the Universidade Federal de Pernambuco in Brazil. Patricia holds multiple patents and has published extensively in periodicals including Computer Networks and IEEE Proceedings.


Analysis of very large genomic datasets has the potential to radically alter the way we keep people healthy. Whether it is quickly identifying the cause of a new infectious outbreak to prevent its spread or personalizing a treatment based on a patient’s genetic variants to knock out a stubborn disease, modern Big Data analytics has a major role to play.

By leveraging cloud computing, Apache™ Hadoop®, next-generation sequencers, and other technologies, life scientists now have a powerful new way to conduct global-scale collaborative genomic analysis that was not possible before. With the right approach, great benefits can be realized.


To illustrate the possibilities and benefits of coordinated worldwide genomic analysis, Dell EMC partnered with researchers at Ben-Gurion University of the Negev (BGU) to develop a global data analytics environment that spans multiple clouds. This environment lets life sciences organizations analyze data from multiple heterogeneous sources while preserving privacy and security. The collaboration simulated a scenario that might be used by researchers and public health organizations to identify the early onset of outbreaks of infectious diseases. The approach could also help uncover new combinations of virulence factors that may characterize new diseases. Additionally, the methods used have applicability to new drug discovery and to translational and personalized medicine.

 

Expanding on past accomplishments

In 2003, SARS (severe acute respiratory syndrome) was the first infectious outbreak where fast global collaborative genomic analysis was used to identify the cause of a disease. The effort was carried out by researchers in the U.S. and Canada who decoded the genome of the coronavirus to prove it was the cause of SARS.

The Dell EMC and BGU simulated disease detection and identification scenario makes use of technological developments (the much lower cost of sequencing, the availability of greater computing power, the use of cloud for data sharing, etc.) to address some of the shortcomings of past efforts and enhance the outcome.

Specifically, some diseases are caused by a combination of virulence factors. The factors may all be present in one pathogen or spread across several pathogens in the same biome, and there can also be geographical variations. This makes it very hard to identify the root causes of a disease when pathogens are analyzed in isolation, as has been the case in the past.

Addressing these issues requires sequencing entire microbiomes from many samples gathered worldwide. The computational requirements for such an approach are enormous: a single facility would need a compute and storage infrastructure on a par with major government research labs or national supercomputing centers.

Dell EMC and BGU simulated a scenario of distributed sequencing centers scattered worldwide, where each center sequences entire microbiome samples and analyzes the sequence reads it generates against a set of known virulence factors. This is done to detect the combinations of factors causing disease, allowing for near-real-time diagnostic analysis and targeted treatment.

To carry out these operations in the different centers, Dell EMC extended the Hadoop framework to orchestrate distributed and parallel computation across clusters scattered worldwide. This pushes computation as close as possible to the source of the data, leveraging the principle of data locality at worldwide scale while preserving data privacy.

Since a single Hadoop instance is represented by an elephant, Dell EMC reasoned that a set of Hadoop instances scattered across the world but working in tandem forms a World Wide Herd, or WWH. This is the name Dell EMC has given to its Hadoop extensions.


Using WWH, Dell EMC wrote a distributed application in which each of a set of collaborating sequencing centers calculates a profile of the virulence factors present in each of the microbiomes it sequenced, and sends just these profiles to a center selected to perform the global computation.

That center then uses bi-clustering to uncover common patterns of virulence factors among subsets of microbiomes that could have been sampled in any part of the world.
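To make the two-phase pattern concrete, here is a minimal Python sketch under stated assumptions; it is not Dell EMC's actual WWH code. The virulence-factor names and the read matcher are hypothetical placeholders, and scikit-learn's SpectralCoclustering stands in for whatever bi-clustering algorithm the researchers used. Each center reduces its raw reads to a small profile vector, and only those vectors travel to the coordinating center.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

VIRULENCE_FACTORS = ["vf_adhesion", "vf_toxinA", "vf_capsule", "vf_siderophore"]  # hypothetical

def local_profile(reads):
    """Runs at each sequencing center: reduce raw microbiome reads to a
    per-factor hit-count profile. Only this small vector leaves the site."""
    profile = np.zeros(len(VIRULENCE_FACTORS))
    for read in reads:
        for i, factor in enumerate(VIRULENCE_FACTORS):
            if factor_signature_matches(factor, read):
                profile[i] += 1
    return profile

def factor_signature_matches(factor, read):
    # Placeholder: a real matcher would align reads against factor sequences.
    return hash((factor, read)) % 7 == 0

# Coordinating center: stack the profiles received from all sites and
# bi-cluster to find subsets of microbiomes sharing subsets of factors.
profiles = np.vstack([local_profile([f"read{i}_{j}" for j in range(50)])
                      for i in range(6)])
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(profiles + 1e-9)
print(model.row_labels_)     # groups of microbiome samples
print(model.column_labels_)  # groups of virulence factors
```

The key design point survives the simplification: raw reads never leave their center, so privacy and data-locality are preserved while the global pattern analysis still sees every site's profile.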

This approach could allow researchers and public health organizations to potentially identify the early onset of outbreaks and also uncover new combinations of virulence factors that may characterize new diseases.

There are several biological advantages to this approach. It eliminates the time required to isolate a specific pathogen for analysis and to re-assemble the genomes of the individual microorganisms. Sequencing the entire biome lets researchers identify known and unknown combinations of virulence factors. And collecting samples independently worldwide helps ensure the detection of variants.

On the compute side, the approach uses local processing power to perform the biome sequence analysis. This reduces the need for a large centralized HPC environment. Additionally, the method overcomes the matter of data diversity. It can support all data sources and any data formats.

This investigative approach could be used as a next-generation outbreak surveillance system, enabling collaboration in which different geographically dispersed groups simultaneously investigate different variants of a new disease. In addition, the WWH architecture has great applicability to pharmaceutical industry R&D efforts, which increasingly rely on a multi-disciplinary approach in which geographically dispersed groups investigate different aspects of a disease or drug target using a wide variety of analysis algorithms on shared data.

 

Learn more about modern genomic Big Data analytics

 

 

Big Data Analysis for the Greater Good: Dell EMC & the 100,000 Genomes Project

Wolfgang Mertz

CTO of Healthcare, Life Sciences and High Performance Computing

It might seem far-reaching to say that big data analysis can fundamentally impact patient outcomes for cancer and other illnesses, and that it has the power to ultimately transform health services and indeed society at large, but that is precisely the goal behind the 100,000 Genomes Project from Genomics England.

For background, Genomics England is a wholly owned company of the Department of Health, set up to deliver the 100,000 Genomes Project. This exciting endeavor will sequence and collect 100,000 whole genomes from 70,000 NHS patients and their families (with their full consent), focusing on patients with rare diseases as well as those with common cancers.

The program is designed to create a lasting legacy for patients as well as the NHS and the broader UK economy, while encouraging innovation in the UK’s bioscience sector. The genetic sequences will be anonymized and shared with approved academic researchers to help develop new treatments and diagnostic testing methods targeted at the genetic characteristics of individual patients.

Dell EMC provides the platform for large-scale analytics in a hybrid cloud model for Genomics England, leveraging VCE vScale with EMC Isilon and EMC XtremIO solutions. The Project has been using EMC storage for its genomic sequence library, and it will now leverage an Isilon data lake to securely store data during the sequencing process. Backup services are provided by EMC Data Domain and EMC NetWorker.

The Genomics England IT environment uses both on-prem servers and IaaS provided by cloud service providers on G-Cloud. According to an article from Government Computing, “one of Genomics England’s key legacies is expected to be an ecosystem of cloud service providers providing low cost, elastic compute on demand through G-Cloud, bringing the benefits of scale to smaller research groups.”

There are two main considerations from an IT perspective for genome and DNA sequencing projects such as those being undertaken by Genomics England and others: data management and speed. Vast amounts of research data have to be stored and retrieved, and this large-scale biological data has to be processed quickly in order to gain meaningful insights.

Scale is another key factor. Sequencing and storing genomic information digitally is a data-intensive endeavor, to say the least. Sequencing a single genome creates hundreds of gigabytes of data, and the Project has sequenced over 13,000 genomes to date; that volume is expected to grow tenfold over the next two years. The data lake being used by Genomics England allows 17 petabytes of data to be stored and made available for multi-protocol analytics (including Hadoop).

For perspective, 1 PB is a quadrillion bytes – think of that as 20 million four-drawer filing cabinets filled with text. Or, considering the Milky Way has roughly two hundred billion stars, if you count each star as a single byte, it would take 5,000 Milky Way galaxies to reach 1 PB of data. It’s staggering.
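For readers who like to check the arithmetic, here is a quick back-of-the-envelope sketch in Python; the ~200 GB per genome figure is an assumption standing in for the "hundreds of gigabytes" quoted above.

```python
# Rough capacity arithmetic using the figures quoted in this post.
PB = 10**15                      # 1 petabyte = a quadrillion bytes
genome_bytes = 200 * 10**9       # assumption: ~200 GB per whole genome
genomes_to_date = 13_000

data_so_far = genomes_to_date * genome_bytes
print(data_so_far / PB)          # ~2.6 PB sequenced so far
print(data_so_far * 10 / PB)     # ~26 PB if volume grows tenfold

milky_way_stars = 200 * 10**9    # ~two hundred billion stars, one byte each
print(PB / milky_way_stars)      # = 5,000 galaxies per petabyte
```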

The potential to contribute to eradicating disease and to identify exciting new treatments is truly awe-inspiring. And considering the immense scale of the data involved – 5,000 galaxies! – provides new context around reaching for the stars.


 

Metalnx: Making iRODS Easy

Stephen Worth

Stephen Worth is a director of Global Innovation Operations at Dell EMC. He manages development and university research projects in Brazil, serves as a technical liaison helping to improve innovation across our global engineering labs, and works in digital asset management leveraging user-defined metadata. Steve is based out of Dell EMC's RTP Software Development Center, which focuses on data protection, core storage products, and cloud storage virtualization. Steve started with Data General in 1985, which was acquired by EMC in 1999 and became part of Dell Technologies in 2016. He has led many product development efforts involving operating systems, diagnostics, UI, database, and applications porting. His background includes vendor and program management, performance engineering, engineering services, manufacturing, and test engineering. Steve, an alumnus of North Carolina State University, received a B.S. degree in Chemistry in 1981 and an M.S. degree in Computer Studies in 1985. He served as an adjunct faculty member of the Computer Science department from 1987-1999. Steve is an emeritus member of the Computer Science Department's Strategic Advisory Board and is currently chairperson of the Technical Advisory Board for the James B. Hunt Jr. Library on Centennial Campus.



Advances in sequencing, spectroscopy, and microscopy are driving life sciences organizations to produce vast amounts of data, and most organizations are dedicating significant resources to storing and managing that data. Until recently, however, their primary efforts have focused on hosting the data for high-performance, rapid analysis and on moving it to more economical disks for longer-term storage.

The nature of life sciences work demands better data organization. The data produced by today’s next-generation lab equipment is rich in information, making it of interest to different research groups and individuals at varying points in time. Examples include:

  • Raw experimental and analyzed data may be needed as new drug candidates move through research and development, clinical trials, FDA approval, and production.
  • A team interested in new indications for an existing chemical compound will want to leverage work already done on that compound by others in the organization.
  • In the realm of personalized medicine, clinicians may need to evaluate not only a person’s health history, but also correlate that information with genome sequences and phenotype data throughout the individual’s life.

The great challenge is how to make data more generally available and useful throughout an organization. Researchers need to know what data exists and have a way to access it. For this to happen, data must be properly categorized, searchable, and easy to find.

To get help in this area, many research organizations and government agencies worldwide are using the Integrated Rule-Oriented Data System (iRODS), open source data management software developed by the iRODS Consortium. iRODS enables data discovery using a data/metadata catalog that retains machine- and user-defined metadata describing every file, collection, and object in a data grid.

Additionally, iRODS automates data workflows with a rule engine that permits any action to be initiated by any trigger on any server or client in the grid. iRODS enables secure collaboration, so users only need to log in to their home grid to access data hosted on a remote, federated grid.
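As a flavor of what this looks like in practice, here is a minimal sketch using the open source python-irodsclient; the host, zone, credentials, file path, and metadata attribute names are all hypothetical.

```python
from irods.session import iRODSSession

# Connection details are hypothetical; substitute your grid's host, zone,
# and credentials.
with iRODSSession(host="irods.example.org", port=1247,
                  user="researcher", password="secret",
                  zone="labZone") as session:
    # Attach user-defined metadata to a sequencing run already in the grid.
    obj = session.data_objects.get("/labZone/home/researcher/run42.fastq")
    obj.metadata.add("organism", "Haemophilus influenzae")
    obj.metadata.add("instrument", "HiSeq X Ten")
    obj.metadata.add("project", "outbreak-surveillance")
```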

Leveraging iRODS is simpler, and its benefits greater, when it is used with Metalnx, an administrative and metadata management user interface (UI) for iRODS. Metalnx was developed by Dell EMC through its efforts as a corporate member of the iRODS Consortium. The intuitive Metalnx UI helps both the IT administrators charged with managing metadata and the end users and researchers who need to find and access relevant data based on metadata descriptions.

Making use of metadata via the easy-to-use UI that Metalnx provides on top of iRODS can help:

  • Maximize storage assets
  • Find what’s valuable, no matter where the data is located
  • Automate movement and processing of data
  • Securely share data with collaborators

Real world example: Putting the issues into perspective

A simple example illustrates why iRODS and Metalnx are needed. Plant & Food Research, a New Zealand-based science company providing research and development that adds value to fruit, vegetable, crop and food products, makes great use of next-generation sequencing and genotyping. The work generates a lot of mixed data types.

“In the past, we were good at storing data, but not good at categorizing the data or using metadata,” said Ben Warren, bioinformatician at Plant & Food Research. “We tried to get ahead of this by looking at what other institutions were doing.”

iRODS seemed a good fit. It was the only decent open source solution available. However, there were some limitations. “We were okay with the rule engine, but not the interface,” said Warren.

A system administrator working with EMC on hardware for the organization’s compute cluster had heard of Metalnx and mentioned this to Warren. “We were impressed off the bat with its ease of use,” said Warren. “Not only would it be useful for bioinformaticians, coders, and statisticians, but also for the scientists.”

The reason: Metalnx makes it easier to categorize the organization’s data, to control the metadata used to categorize the data, and to use the metadata to find and access any data.

Benefits abound

At Plant & Food Research, metadata is an essential element of a scientist’s workflow, making it easier to find data at any stage of a research project. When a project is conceived, scientists start by determining all metadata required for the project using Metalnx and cataloging the data using iRODS. With this approach, everything associated with a project, including the samples used, sample descriptions, experimental design, NGS data, and other information, is searchable.
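The payoff of that up-front cataloging can be sketched with python-irodsclient as well (attribute names and connection details are again hypothetical): any scientist can query the iRODS catalog by metadata rather than by file path.

```python
from irods.models import DataObject, DataObjectMeta
from irods.session import iRODSSession

with iRODSSession(host="irods.example.org", port=1247,
                  user="researcher", password="secret",
                  zone="labZone") as session:
    # Find every data object tagged with the project of interest,
    # wherever it lives in the federated grid.
    results = session.query(DataObject.name).filter(
        DataObjectMeta.name == "project",
        DataObjectMeta.value == "outbreak-surveillance")
    for row in results:
        print(row[DataObject.name])
```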

One immediate benefit is that someone undertaking a new project can quickly determine whether similar work has already been done. This is increasingly important in life science organizations as research becomes more multidisciplinary in nature.

Furthermore, the more an organization knows about its data, the more valuable the data becomes. Researchers can connect with other work done across the organization. Being able to find the right raw data of a past effort means an experiment does not have to be redone. This saves time and resources.

Warren notes that there are other organizational benefits to using iRODS and Metalnx. When it comes to collaborating with others, the data is simply easier to share: scientists can put the data in any format, and it is easier to publish.

Learn more

Metalnx is available as an open source tool. It can be found at Dell EMC Code www.codedellemc.com or on GitHub at www.github.com/Metalnx . EMC has also made binary versions available on Bintray at www.bintray.com/metalnx , and a Docker image is posted on Docker Hub at https://hub.docker.com/r/metalnx/metalnx-web/

A broader discussion of the use of Metalnx and iRODS in the life sciences can be found in an on-demand video of a recent web seminar “Expanding the Face of Meta Data in Next Generation Sequencing.” The video can be viewed on the EMC Emerging Tech Solutions site.

 


Isilon – A Storage Solution that evolves with Next-Generation Sequencing

Biswajit Mishra

Sr. Product Marketing Manager, Life Sciences at EMC²

The creation and understanding of binary code was the cornerstone of the modern digital technology age. The language of 0s and 1s opened the door to many new technologies, including the internet, modern media platforms, telecommunications, smartphone applications, and more. For the Life Sciences, something similar happened when we unlocked the genetic code. The language of A’s, T’s, C’s and G’s could explain all life forms, and it led to an explosion in the development of products and services in pharmaceuticals, healthcare, and agriculture.

The Beginning

In 1976, bacteriophage MS2, a single-stranded RNA virus that infects E. coli, became the first genome to be completely sequenced, at 3,569 bases. Then in 1995, The Institute for Genomic Research completed the genome sequencing of Haemophilus influenzae – the first full genome sequencing of a living organism, with 1.8 million base pairs (Mb). But when the Human Genome Project sequenced the entire human genome, with 3.2 billion base pairs, in 2001, it opened a whole new world of possibilities.

Demands on IT divisions aka “The NGS Data Challenge”

The field of Life Sciences continues to remain at the cutting edge of scientific research, with improvements in technologies such as next-generation sequencing (NGS), proteomics, and more. NGS instruments such as the Illumina HiSeq X Ten system can now sequence over 18,000 human genomes per year, which is remarkable when you consider that it took more than a decade to sequence the first human genome as part of the Human Genome Project. And keep in mind that each human genome needs about 100 GB of storage, so a HiSeq X Ten system needs about 1.8 PB of storage per year. The improvements, particularly in NGS technology, and the growing applications of NGS mean that more and more organizations are using NGS services and generating data at an unprecedented rate. In this age of flattening IT budgets, the IT departments in these organizations struggle to keep up with the pace of data generation.

For most organizations performing NGS, the IT department has to deliver on the following fronts:

  • Meet performance and capacity needs: Most Life Sciences workflows that include NGS have data generation, analysis, and archiving stages, each with unique capacity and performance needs. Companies performing NGS usually require sequencing to run 24/7, and it is left to the IT department to ensure that the right production performance and storage tiers are available as needed.
  • Plan for the future: For IT departments with a long procurement cycle, it is a struggle to ensure that the purchase process keeps up with the storage needs of the exponentially growing data produced by NGS pipelines. IT divisions have to plan for today as well as the future to meet demanding customer SLAs.
  • Deliver cost-efficient solutions: IT budget growth is typically nowhere close to the rate of data growth, so IT departments have to be smart and cost-efficient with their storage solutions.

Isilon, a trusted partner for Life Sciences organizations – now and in the future

EMC Isilon is a leader and trusted partner for hundreds of Life Sciences organizations worldwide, including leading genome centers, pharmaceutical companies, and academic research centers. Isilon provides a highly available and reliable single file system, OneFS, for Life Sciences workflows, and it consolidates all of those workflows into a single, scalable volume for ease of management. Isilon can host multiple types of nodes – S, X, NL and HD – in a single cluster. S and X nodes are ideal for high-performance workflows such as mapping and alignment of NGS data, while NL and HD nodes provide high-density, low-cost storage of raw data and analysis results for archive needs. Isilon’s SmartFail and AutoBalance features also ensure that data is protected across the entire cluster.

Isilon gets better

Recently EMC Isilon announced Data Lake 2.0 – including OneFS 8.0, IsilonSD Edge and Isilon CloudPools – to extend the data lake from the data center to edge locations, including remote and branch offices, and to both public and private clouds.

Data Lake 2.0 has massive benefits for Life Science organizations using Isilon. With OneFS 8.0, organizations gain non-disruptive upgrade and rollback capabilities to ensure continuous operations at all times. IsilonSD Edge is a software-defined storage solution that can be deployed on commodity hardware at remote research facilities, giving Life Science organizations an easy, cost-efficient way to connect geographically dispersed sites.

Isilon CloudPools is a game changer for organizations performing NGS, providing a cost-efficient way of archiving data that lets the IT department make the most of its budget. CloudPools offers transparent, policy-based, automated tiering of data from an on-premises Isilon cluster to a private cloud based on EMC Elastic Cloud Storage (ECS), to another Isilon cluster, or to a public cloud like AWS or Azure for long-term storage of Life Science research data and other archiving needs. By moving cold archive data to a public or private cloud, CloudPools helps IT departments free up space on production environments and ensures that NGS analysis pipelines are not starved of high-performance storage. This also allows IT departments to deliver on customer SLAs while reducing costs.
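CloudPools itself is configured through OneFS policies rather than code, but the underlying idea is straightforward age-based tiering. Here is a hypothetical Python sketch of the logic only – not the CloudPools API – that selects cold files the way such a policy would.

```python
import os
import time

COLD_AFTER_DAYS = 180  # hypothetical policy: archive data untouched for ~6 months

def select_cold_files(root):
    """Walk a production directory and yield files whose last access time
    makes them candidates for tiering to cheaper archive storage."""
    cutoff = time.time() - COLD_AFTER_DAYS * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                yield path

for path in select_cold_files("/ifs/ngs/results"):  # path is illustrative
    print("tier to archive:", path)
    # A real policy engine would move the file to the cloud tier and leave
    # a transparent stub behind, which is what CloudPools does for users.
```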

Improve Life Sciences Data Management

We understand the need of Life Science organizations for a versatile storage solution that balances their capacity and performance requirements. That’s why we continue to evolve the Isilon platform to meet the needs of our Life Science customers. Innovative new features like EMC Isilon OneFS 8.0, IsilonSD Edge, and Isilon CloudPools ensure that our Life Sciences customers have a proven solution to run NGS non-stop and achieve faster time to insight. With EMC Isilon, Life Sciences organizations have a trusted partner to handle their NGS workloads, not only today but in the future too.

For more information on how the EMC Emerging Technology Division can help your Life Science organization, click here.

 


Telemedicine Part 1: TeleRadiology as the growth medium of Precision Medicine

Sanjay Joshi

CTO, Healthcare & Life-Sciences at EMC
Sanjay Joshi is the Isilon CTO of Healthcare and Life Sciences at the EMC Emerging Technologies Division. Based in Seattle, Sanjay's 28+ year career has spanned the entire gamut of life-sciences and healthcare from clinical and biotechnology research to healthcare informatics to medical devices. His current focus is a systems view of Healthcare, Genomics and Proteomics for infrastructures and informatics. Recent experience has included information and instrument systems in Electronic Medical Records; Proteomics and Flow Cytometry; FDA and HIPAA validations; Lab Information Management Systems (LIMS); Translational Genomics research and Imaging. Sanjay holds a patent in multi-dimensional flow cytometry analytics. He began his career developing and building X-Ray machines. Sanjay was the recipient of a National Institutes of Health (NIH) Small Business Innovation Research (SBIR) grant and has been a consultant or co-Principal-Investigator on several NIH grants. He is actively involved in non-profit biotech networking and educational organizations in the Seattle area and beyond. Sanjay holds a Master of Biomedical Engineering from the University of New South Wales, Sydney and a Bachelor of Instrumentation Technology from Bangalore University. He has completed several medical school and PhD level courses.

Real “health care” happens when telemedicine is closely joined to a connected-care delivery model that has prevention and continuity of care at its core. This model has been well defined, but only sparsely adopted. As John Hockenberry, host of the morning show “The Takeaway” on National Public Radio, eloquently puts it: “health is not episodic.” We need a continuous care system.

Telemedicine makes it possible for you to see a specialist like me without driving hundreds of miles

Image source: Chest. 2013,143 (2):295-295. doi:10.1378/chest.143.2.295

How do we get the “right care to the right patient at the right time”? Schleidgen et al define Precision Medicine, also known as Personalized Medicine (1), as seeking “to improve stratification and timing of health care by utilizing biological information and biomarkers on the level of molecular disease pathways, genetics, proteomics as well as metabolomics.” Precision Medicine (2) is an orthogonal, multimodal view of the patient, from her/his cells to pathways to organs to health and disease. Several classes of devices and transducers could catalyze telemedicine: Radiology, Pathology, and Wearables. I will focus on Radiology in this part of my three-part series, since all of these modalities use multi-spectral imaging.

Where first?
The world is still mostly rural. According to World Bank statistics, 19% of the USA is rural, but the worldwide average is about 30%, on a spectrum from 0% rural (Hong Kong) to 74% rural (Afghanistan). With the recent consolidations of hospitals into larger organizations (since 2010 in the US) (3), it is this 30% to 70% of the world with sparse network connectivity that needs telemedicine sooner than the well-off “worried well” who live in dense urban areas with close access to healthcare. China has the world’s largest number of hospitals at around 60,000, followed by India at around 15,000; the US tally is approximately 5,700 hospitals. The counter-arguments to the rural needs in the US are the risk of shrinking physician numbers (4) and the growing numbers of the urban poor and the elderly. Then there is the plight of poor health amongst the world’s millions of refugees, usually stuck in no-man’s-lands, fleeing conflicts that never seem to wane. All these use-cases are valid, but they need prioritization.

Connected Health and the “Saver App”
Many a fortune has been made by devising and selling “killer apps” on mobile platforms. In healthcare, what we need is a “saver app.” Using the psycho-social keys to the success of these “sticky” technologies, Dr. Joseph C. Kvedar succinctly builds the case for connected health in his recent book “The Internet of Healthy Things” with three strategies and three tactics:

Strategies: (1) Make It about Life; (2) Make It Personal; and (3) Reinforce Social Connections.

Tactics: (1) Employ Messaging; (2) Use Unpredictable Rewards; and (3) Use the Sentinel Effect.

Dr. Kvedar calls this “digital therapies.”

The Vendor Neutral Archive (VNA) and Virtual Radiology
The Western Roentgen Society, a predecessor of the Radiological Society of North America (RSNA), was founded in 1915 in St. Louis, Missouri (soon after the invention of the X-Ray tube in Bavaria in 1895). An interactive timeline of Radiology events can be seen here. Innovations in Radiology have always accelerated the innovations in healthcare.

The Radiology value chain is in its images and clinical reporting, as summarized in the diagram below (5):

Radiology value chain

To scale this value-chain for telemedicine, we need much larger adoption of VNA, which is an “Enterprise Class” data management system. A VNA consolidates multiple Imaging Departments into:

  • a master directory,
  • associated storage and
  • lifecycle management of data

The difference between PACS (Picture Archiving and Communications System) (6) and VNA is the Image Display and the Image Manager layers respectively.

The Image Display layer is a PACS vendor or cloud-based “image program.” All Admit, Discharge and Transfer (ADT) information must reside with the image, which means DICOM standards and HL7 X.12N interoperability (using service protocols like FHIR) are critical. The Image Manager for a VNA is the “storage layer of images,” either local or cloud-based. For telemedicine to be successful, the VNA must “scale out” exponentially and in a distributed manner, within a privacy and security context.
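To see how ADT information can travel with the image itself, here is a minimal sketch using the open source pydicom library to read the patient and study tags that a VNA’s Image Manager would index; the file path is hypothetical.

```python
import pydicom

# Read a DICOM study. Demographic and study metadata travel inside the
# file itself, so any federated VNA node can index the image without a
# separate side channel back to the originating PACS.
ds = pydicom.dcmread("/vna/incoming/chest_ct_001.dcm")

print(ds.PatientID)          # links the image to the patient's ADT record
print(ds.StudyDate, ds.Modality)
print(ds.StudyInstanceUID)   # the key a VNA master directory indexes on
```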

The largest VNA players (alphabetically) are: Agfa, CareStream, FujiFilm (TeraMedica), IBM (Merge), Perceptive Software (Acuo), Philips and Siemens. The merger of NightHawk Radiology with vRad, which was then acquired by MedNax, and IBM’s acquisition of Merge Healthcare (in Aug 2015) are important landmarks in this trend.

One of the most interesting journal articles in 2015 was on “Imaging Genomics” (or Radiomics) of glioblastoma, a brain cancer. By bidirectionally linking imaging features to the underlying molecular features, the authors (7) have created a new field of non-invasive genomic biomarkers.

Imagine this “virtual connected hive” of patients on one side and physicians, radiologists and pathologists on the other, constantly monitoring and improving the care of a population in health and disease at the individual and personal level. Telemedicine needs to be the anchor architecture for Precision Medicine. Without Telemedicine (and VNA), there is no Precision Medicine.

Postscript: Telepresence in mythology
Let me end this tale of distance and care with a little echo from my namesake, Sanjaya, who is mentioned in the first verse of the first chapter of the Bhagavad Gita (literally, the “Song of the Lord”) – an existential dialog between the warrior Arjuna and his charioteer, Krishna. The Gita, as it is commonly known, is set within the longest Big Data poem of all, the Mahabharata, with over 100,000 verses (and 1.8 million words), estimated to have first been written down around 400 BCE.

Dhritarashtra, the blind king, starts this great book-within-book by enquiring: “O Sanjaya, what did my sons and the sons of Pandu decide about battle after assembling at the holy land of righteousness Kurukshetra?”

Sanjaya starts the Gita by peering into the great yonder. He is bestowed with the divine gift of seeing events afar (divya-drishti); he is the king’s tele-vision – and Dhritarashtra’s advisor and charioteer (just like Krishna in the Gita). The other great religions and mythologies also mention telepresence in their seminal books.

My tagline for the “trickle down” in technology innovation flow is “from Defense to Life Sciences to Pornography to Finance to Commerce to Healthcare.” One interpretation of the Mahabharata is that it did not have any gods – all miracles were added later. Perhaps we have now reached the pivot point for telepresence which has happened in war to “trickle down” into population scale healthcare without divine intervention or miracles!

References:

  1. Schleidgen et al, “What is personalized medicine: sharpening a vague term based on a systematic literature review”, BMC Medical Ethics, Dec 2013, 14:55
  2. “Toward Precision Medicine”, Natl. Acad. Press, June 2012
  3. McCue MJ, et al, “Hospital Acquisitions Before Healthcare Reform”, Journal of Healthcare Management, 2015 May-Jun; 60(3):186-203.
  4. Petterson SM, et al, “Estimating the residency expansion required to avoid projected primary care physician shortages by 2035”, Annals of Family Medicine 2015 Mar; 13(2):107-14. doi: 10.1370/afm.1760
  5. Enzmann DR, “Radiology’s Value Chain”, Radiology: Volume 263: Number 1, April 2012, pp 243-252
  6. Huang HK, “PACS and Imaging Informatics: Basic Principles and Applications”, Wiley-Blackwell; 2 edition (January 12, 2010)
  7. Moton S, et al, “Imaging genomics of glioblastoma: biology, biomarkers, and breakthroughs”, Topics in Magnetic Resonance Imaging. 2015

 


Making Trust and Collaboration a Unified Force in Science

Sanjay Joshi

CTO, Healthcare & Life-Sciences at EMC

Try to recall all the superhero movies you have watched. Many of us would agree that the most captivating films are those where superheroes collaborate as a team to defeat a near-invincible villain – as in The Avengers. Where there is collaboration, there is trust. Dr. Douglas Fridsma, President and CEO of AMIA (the American Medical Informatics Association), used a phrase in a panel discussion we were on in 2012 that stuck with me: “Information moves at the speed of trust.” And trust is at the heart of any collaboration. New forms of trust and collaboration networks have been forming since 2008, and bitcoin is a great example. The “blockchain” method behind bitcoin, discussed in an article published by The Economist and illustrated in the figure below, is a new approach to trust and collaboration.

This figure illustrates the "BlockChain" method behind bitcoin.
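The core of the method is easy to sketch: each block carries the hash of its predecessor, so tampering with any earlier record breaks every later link. Here is a minimal toy illustration in Python – not Bitcoin’s actual data structures.

```python
import hashlib
import json

def make_block(data, prev_hash):
    """Chain a record to its predecessor by hashing (data + previous hash)."""
    payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return {"data": data, "prev": prev_hash, "hash": digest}

chain = [make_block("genesis", "0" * 64)]
for record in ["dataset A shared", "dataset B shared"]:
    chain.append(make_block(record, chain[-1]["hash"]))

# Any edit to an earlier block invalidates every hash after it.
chain[1]["data"] = "tampered"
recomputed = make_block(chain[1]["data"], chain[1]["prev"])["hash"]
print(recomputed == chain[2]["prev"])  # False: the chain detects the change
```

This tamper-evidence, rather than any currency, is what makes the structure interesting as a trust mechanism for collaborating institutions.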

Scientists are our Modern-day X-Men

Carrying this parallel into the sphere of scientific research, notably the biomedical sciences: this is an area where breakthroughs can deliver better health outcomes for mankind, as superheroes do, but only if scientists have the means to work together. All humans are continuously mutating, and I’d like to think that scientists are our modern-day X-Men (and Women)!

The two most exciting recent disruptions in science are advances in Synthetic Biology and the CRISPR gene-editing enzyme system. Both innovations have enormous implications for the biomedical sciences and the future of healthcare.

Recently, a young girl in London with leukemia was treated with gene-editing therapy, in this case using a gene-editing enzyme system called TALEN. This is but one of many leaps made possible through collaborative scientific research.

Fueling and Steering Scientific Research

Research is fueled by data, and discoveries are steered by the management of data. Even superheroes like Iron Man and The Incredible Hulk need some form of cognitive direction to focus their superhuman powers toward a common desired outcome. To build trust and collaboration frameworks, we need a single logical container for data. This is why EMC has the Data Lake concept: a multi-user, multi-protocol, multi-application container for data that is geo-aware and secure.

We know that research is ultra-data-intensive. To implement Precision Medicine at population-health scale, there are two pivots: Collaboration and Asia. The Malaysia Genome Institute (MGI) engages in national and international collaborative projects in comparative genomics and genetics, structural and synthetic biology, computational and systems biology, and metabolic engineering. When MGI does whole genome sequencing, whole transcriptome sequencing, and targeted sequencing, a single run generates 13 terabytes of data – equivalent to over 2.6 million songs on your iPod.

The ability to discover insights in large volumes of data is what differentiates progress from stalemate for the institution and its partners. But MGI had a problem: as it increased its storage capacity to cope with the influx of research data, data processing speed decreased, which slowed down analysis work.

That was before MGI adopted EMC Isilon’s scalable, on-demand storage solution with its fast next-generation sequencing architecture. With the added benefit of giving users direct access to data, this has also curbed the problem of bottlenecks within workflows and ensured ease of collaboration.

Read the MGI Case Study to learn more.

Tools for Teamwork in Research

Singapore’s Agency for Science, Technology and Research (A*STAR) is a single agency that oversees 14 biomedical sciences, physical sciences, and engineering institutes as well as six consortia and centers.

So how does A*STAR encourage collaboration amongst scientists housed in different institutions?

There were two key issues A*STAR needed to address. First, sharing of data between institutions was done manually by researchers, who had to make a copy to transfer it to another party – both time-consuming and wasteful of storage, due to the duplication of data on localized machines.

Second, long procurement periods – three to nine months – meant A*STAR didn’t have the means to scale up storage when demand called for it. The opportunity cost was great.

Following the deployment of a comprehensive EMC Isilon platform, all that changed. On top of the increase in usable capacity, with an option to scale on demand, researchers could now place their data in central storage that could be shared within and across research institutes.

Says Lai Loong Fong, Director of the Computational Resource Centre at A*STAR: “Users have been receptive to the new model. They are looking forward to the new features we can offer them to provide greater flexibility in accessing research data through their mobiles or laptops when they are working and meeting outside of the labs. It’s another way we can support innovation and collaboration across all of our research disciplines.”

Read the A*STAR Case Study to learn more.

Subject Data Protection

According to DOE Human Subjects Resources, the use of humans as research subjects has aided significant scientific discoveries such as the Human Genome Project. That said, given that one’s genome contains personal health and other private information, there need to be measures in place to protect each subject’s privacy and prevent the loss of information. There are Ethical, Legal and Social Implication (ELSI) issues that can be resolved by trust and collaboration, as published by the Genome Law Review.

Looking at A*STAR as an example again, the agency has incorporated EMC Isilon SnapshotIQ into its platform, which offers data protection through secure snapshots and near-immediate, on-demand snapshot restores.

The sum of many great minds can achieve much more than any one alone. Greater still, scalable data storage in the cloud now makes it possible for great minds to work together regardless of where they are. We can only begin to imagine what our modern-day science X-Men will be able to accomplish in these new, dynamic, and secure collaborative environments.

 

