Big Data comprises extremely large and complex data sets of structured and unstructured data. It has become one of the core study areas in recent years because traditional software, algorithms, and data repositories are inadequate for extracting information from it. Information extraction means collecting, processing, analyzing, and storing the data. Another critical point is that, with the development of the Internet, the Internet of Things, the mobile Internet, finance, social media, biology, and digital medicine, the volume of data has increased dramatically. Big Data not only describes the large size of data but also implies rapid data processing ability and innovative technologies and approaches for handling the data.
Big Data evolved in different phases, and the required software was developed accordingly. So, we can say Big Data has expanded not only in size but also in data technology. Big Data analysis typically requires:
- Correlation analysis
- Clustering analysis
- Modeling
- Prediction
- Hypothesis verification
We need advanced hardware and software for data acquisition, extraction, processing, analysis, and storage. Currently, infrastructure for Big Data includes servers, storage systems, cloud services, and networking equipment. Software for Big Data includes parallel and distributed file systems, retrieval software, and data-mining software.
In this blog, we will discuss the recent growth of Big Data in healthcare. Why has Big Data been introduced in healthcare? With the significant growth of the clinical sector, related technologies have also been integrated into healthcare, resulting in a massive growth of data. Big Data techniques are being used in the healthcare sector to handle, store, and analyze these massive amounts of data.
In addition, Big Data in health care has its own features, such as heterogeneity, incompleteness, timeliness and longevity, privacy, and ownership. These features create a series of challenges for the data storage, mining, and sharing needed to promote health-related research. To deal with these challenges, analysis approaches focused on Big Data in health care need to be developed, and laws and regulations governing its use need to be enacted.
Major Types and Sources of Big Data in Health Care
Big Data in health care can be classified into four main types based on the data sources:
- Big Data in medicine, or medical/clinical Big Data
- Big Data in public health and behavior
- Big Data in medical experiments
- Big Data in the medical literature.
Big Data in medicine and clinics
Clinical data is Big Data generated in hospitals and clinics, for example by medical imaging, and it is closely associated with doctors and patients. It also includes Big Data in medicine generated by historical clinical activities, which has significant effects on the medical industry. Such Big Data can be used to plan treatment paths for patients, provide clinical decision support (CDS), and improve healthcare technology and systems. Sources of Big Data in medicine include hospital information resources, surgical work, anesthesia activities, physical examinations, radiography, magnetic resonance imaging (MRI), computed tomography (CT), patient information, pharmacy, treatment, medical imaging, and imaging reports. These activities generate a large number of records, including:
- patient information
- diagnoses
- medication plans
- physicians' notes
- sensor data
The data generated through these activities take the following forms:
- electronic health record (EHR)/ electronic medical record (EMR)
- personal health record (PHR)
- medical images.
Big Data in public health and behavior
Big Data in public health and behavior refers to users' physiological data collected by various portable devices, such as electrocardiogram monitors, thermometers and other vital-sign monitors, and blood pressure (BP) machines. There are also smart devices such as smartphones from Google, Apple, and Samsung. Observations of daily living (ODLs) also play a crucial role in recording personal daily health and behavior. In addition, Google has used search query data to predict infectious disease activity as part of public health.
Big Data in medical experiments
This type of Big Data covers molecular biology data, human body data sets, clinical trials, gene sequences, biological samples, and clinical and medical research laboratory tests. Molecular biology data focuses on the interaction and regulation of biological activities within cells, such as interactions among DNA, RNA, and proteins, and biosynthesis; it is closely related to the fields of biochemistry and genetics. Human body data sets include samples of cells, tissues, and organs and photographs of human anatomy. Clinical trials are clinical research studies that evaluate the effectiveness of new medical treatments in human volunteers. Gene sequencing, which mainly refers to DNA sequencing, is the process of determining the precise order of nucleotides within DNA. This process produces a large amount of data recording DNA sequences.
Big Data in the medical literature
With the rapid expansion of the medical and clinical field, research articles and structured knowledge are produced at high speed. The medical literature also includes a large body of older material from the medical and clinical field. Together, this literature makes a significant contribution to Big Data in health care.
Features of Big Data in Health Care
In addition to the "5V" features of Big Data (volume, velocity, variety, veracity, and value), Big Data in health care has its own unique features, such as:
- heterogeneity
- incompleteness
- timeliness
- longevity
- data privacy
- ownership.
Heterogeneity
Big Data in health care comes in either structured or unstructured formats. For example, some EHR systems collect data in structured formats; however, the majority of Big Data in health care is unstructured, including data from CT, MRI, X-ray, Holter monitoring, angiography, and laboratories. As mentioned earlier, healthcare Big Data comes from heterogeneous sources, and there is often a shortage of tools to analyze it.
Incompleteness
Because monitoring devices generate data as a continuous stream, it is often difficult to store the data at the same rate. Electrocardiogram records are one example: storing all of the data is too expensive. Additionally, EHRs rely on doctors or nurses to record patients' disease information, such as medications and allergies, and this manual process may also lead to incomplete data.
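Incompleteness can at least be measured. The sketch below is a minimal illustration, not a feature of any particular EHR product: it counts how many records are missing fields a clinician was expected to fill in. The field names in `REQUIRED_FIELDS` are hypothetical, and the sketch treats `None`, an empty string, and an empty list alike as "not recorded"; a real system would need to distinguish "no known allergies" from "allergies never asked about".

```python
from typing import Any

# Hypothetical set of fields every EHR record is expected to carry;
# real EHR systems define their own schemas.
REQUIRED_FIELDS = ("patient_id", "diagnosis", "medications", "allergies")


def missing_fields(record: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent or empty in one record."""
    # Assumption: None, "" and [] all mean "not recorded".
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", [])]


def completeness_rate(records: list[dict[str, Any]]) -> float:
    """Fraction of records that have every required field filled in."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if not missing_fields(r))
    return complete / len(records)
```

For example, a record with an empty `medications` list and `allergies` set to `None` is reported as missing both fields, and `completeness_rate` gives the share of fully documented records in a batch.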
Timeliness and longevity
In hospital information systems (HIS), every medical operation involves some delay before data become available, whether EHR entries, medical signals such as electrocardiogram (ECG) and electroencephalogram (EEG) recordings, or MRI and Single Photon Emission Computed Tomography (SPECT) images, so timeliness is vital. Getting real-time medical and health information is therefore a major challenge for Big Data analytics in health care, and HIS need to maximize data timeliness.
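Timeliness can be tracked as the gap between when a value is measured and when the HIS makes it available. A minimal sketch, assuming each event carries both timestamps; the event names and the delay target passed to `too_late` are hypothetical and would differ between, say, a bedside ECG stream and an imaging report.

```python
from datetime import datetime, timedelta


def latency(measured_at: datetime, available_at: datetime) -> timedelta:
    """Delay between when a value was measured and when the HIS can serve it."""
    if available_at < measured_at:
        raise ValueError("data cannot be available before it is measured")
    return available_at - measured_at


def too_late(
    events: list[tuple[str, datetime, datetime]], max_delay: timedelta
) -> list[str]:
    """Names of events whose latency exceeds a (hypothetical) delay target."""
    return [
        name
        for name, measured, available in events
        if latency(measured, available) > max_delay
    ]
```

With a five-minute target, an ECG sample available two seconds after measurement passes, while a report that appears three hours later is flagged.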
Data privacy
HIS data must be kept strictly private because they are sensitive and contain patient and hospital information. Privacy concerns limit the ability to link external data to data on individual insured people, to improve consumers' health-related experience, and to personalize services and care.
Ownership
Consumer medical records are stored and controlled by medical bodies such as hospitals, laboratories, clinics, pharmacies, and government agencies, in innumerable data silos. As a result, consumers may lack access to these data or have little control over their own health data. One approach to this problem is the cooperative, an old and successful form of organization that is entirely owned by its members (here, citizens). Each consumer has one account that stores and manages all of their health care data, and they can share a subset or all of the data for research purposes.
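The one-account-per-member model can be pictured as a small data structure in which the member, not the institution, decides which categories of data each research project may see. This is a hypothetical sketch: `MemberAccount`, its methods, and the category names are illustrative only, and a real cooperative would also need authentication, audit logs, and secure storage.

```python
from dataclasses import dataclass, field


@dataclass
class MemberAccount:
    """One account per member in a hypothetical health data cooperative."""

    member_id: str
    # category name -> data for that category (e.g. lab results)
    records: dict[str, dict] = field(default_factory=dict)
    # research project -> categories the member agreed to share with it
    shared_with: dict[str, set[str]] = field(default_factory=dict)

    def add_record(self, category: str, data: dict) -> None:
        self.records[category] = data

    def share(self, project: str, categories: set[str]) -> None:
        """Grant a project access to a subset (or all) of the member's categories."""
        unknown = categories - set(self.records)
        if unknown:
            raise KeyError(f"no records for: {sorted(unknown)}")
        self.shared_with.setdefault(project, set()).update(categories)

    def revoke(self, project: str) -> None:
        """Withdraw all access previously granted to a project."""
        self.shared_with.pop(project, None)

    def view_for(self, project: str) -> dict[str, dict]:
        """Return only the categories the member chose to share with this project."""
        allowed = self.shared_with.get(project, set())
        return {c: d for c, d in self.records.items() if c in allowed}
```

A member holding both lab results and genome data could share only the lab results with one study; `view_for` then returns just that category, and `revoke` withdraws it again.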
Importance of Big Data in Health Care
So far, we have discussed various forms and sources of Big Data in healthcare. Next comes Big Data analysis, which is important because it extracts valuable information from the collected Big Data and discards useless fragments. Big Data analytics (BDA) is widely used for clinical diagnosis, medical research, hospital management, and meeting fundamental needs in medicine. Such analysis can enable personalized medicine and patient-centric care. The benefits can be considered from three perspectives: that of research institutions and hospitals, that of the public and government, and that of patients and their relatives.
- Research institutions could better understand the mechanisms and effects of newly developed drugs through BDA. For example, researchers could reprocess existing cancer data to search for new cancer drugs. By using statistical tools and algorithms, researchers could improve clinical trial design and reduce trial failures.
- Big Data analytics could reduce costs in the medical domain by analyzing the cost efficiency and timeliness of treatments.
- Combined with smartphone technology, Big Data analytics could help governments prevent the spread of infectious diseases. Such methods have also been used to detect outbreaks of epidemics such as flu, so governments can respond more quickly to epidemics and help people avoid disease. Big Data analytics also has the potential to reveal regional health problems.
Applications
Applications in public health: By mining web-based and social media data, Big Data analytics can provide new solutions in public health, such as predicting outbreaks, tracking disease trends, and monitoring drug safety. It is also often used to monitor disease activity across social networks, and Google search queries play a significant role here. Research shows that one-third of consumers currently use social networking for health care purposes (Facebook, YouTube, blogs, Google, Twitter).
Disease pattern analysis and personalized medicine: Social data combined with environmental information are used to create dynamic, real-time global infectious disease maps. Based on these risk maps, people can deepen their knowledge of infectious diseases, improve spatial triage, and issue infectious disease outbreak alerts. Personalized healthcare is a data-driven, patient-centered medical model that assesses the relationships among patients exposed to similar risk, lifestyle, and environmental factors.
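To illustrate how query volumes might feed outbreak detection, the sketch below flags weeks in which the count of disease-related searches rises well above a recent baseline. This is a simple mean-plus-k-standard-deviations rule chosen for illustration, not the model Google used; the window length and `k` are assumed parameters, and the weekly counts would come from whatever query data is available.

```python
from statistics import mean, stdev


def flag_spikes(weekly_counts: list[int], window: int = 8, k: float = 2.0) -> list[int]:
    """Return indices of weeks whose count exceeds mean + k * stdev of the
    preceding `window` weeks (a simple baseline rule, not Google's model)."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        threshold = mean(baseline) + k * stdev(baseline)
        if weekly_counts[i] > threshold:
            flagged.append(i)
    return flagged
```

For a series that hovers around 100 searches per week and then jumps to 180, only the jump is flagged; the first `window` weeks are never flagged because they serve as the initial baseline.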
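Grouping patients exposed to similar risk, lifestyle, and environmental factors is a clustering problem (the clustering analysis listed at the start of this post). Below is a minimal k-means sketch, assuming each patient has already been reduced to a few numeric, comparably scaled factor scores; the features are hypothetical, and k-means is one common choice rather than the method any particular personalized-medicine system uses.

```python
import math


def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means: return a cluster label for each point.

    Centroids start at the first k points so results are reproducible.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

With hypothetical (risk score, exposure score) pairs, patients with low scores on both factors end up in one group and those with high scores in the other; in practice the number of groups and the choice of features are the hard part.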
Challenges associated with Big data in healthcare
Data mining
Integrating, analyzing, and storing clinical Big Data is difficult because it consists of huge amounts of unstructured data, such as natural language text and handwritten notes. Discovering how to analyze large sets of unstructured data effectively remains a significant challenge. One of the fundamental characteristics of Big Data is its variety of data sources. Processing speed in the medical industry is exceptionally challenging when a patient is in critical condition. Additionally, protecting patient data privacy and security is demanding when real-time applications such as cloud computing are used to access and analyze data.
Cloud computing now provides new capabilities for mining and sharing medical Big Data, but certain obstacles must be overcome before it can become much more efficient. First, even though cloud computing provides a simple and versatile way to mine resources, it increases the risk of privacy disclosure in fields such as clinical and public health informatics. Second, network bandwidth constraints limit the speed of data transmission and increase the cost of cloud computing, since the healthcare industry often needs to import or export huge amounts of data to and from the cloud.
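The bandwidth constraint is easy to quantify: transfer time is data size divided by effective link throughput. A minimal sketch; the `efficiency` factor is an assumed allowance for protocol overhead and contention, not a measured value.

```python
def transfer_hours(
    size_bytes: float, bandwidth_bits_per_s: float, efficiency: float = 1.0
) -> float:
    """Idealized time in hours to move `size_bytes` over a network link.

    `efficiency` in (0, 1] scales the nominal bandwidth to account for
    overhead and contention (an assumed value, not a measurement).
    """
    if bandwidth_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (bandwidth_bits_per_s * efficiency)
    return seconds / 3600
```

With hypothetical figures, `transfer_hours(10e12, 1e9)` estimates that moving a 10 TB imaging archive over a fully used 1 Gbit/s link takes about 22 hours (8e13 bits / 1e9 bits per second = 80,000 s), and halving the efficiency doubles that.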
Data storage
Today, the main difficulty in data storage is its high cost, and the huge amount of data is one of the reasons storage costs are rising. The medical industry continuously produces large amounts of data as medical information systems develop. For example, regional medical data usually come from a region with millions of people and hundreds of medical institutions, and the amount of data keeps expanding. According to the relevant provisions of the medical industry, a patient's data should be retained for about 50 years. These data include online or real-time data and various types of data such as diagnoses and medication recommendations, structured data tables, unstructured or semi-structured text documents, medical images, and other information. These exceedingly large amounts of data undoubtedly increase the cost and difficulty of storage, and moving and analyzing the data also incurs considerable cost. Medical data include numerical data that record disease tests as well as unstructured data such as diagnostic images, records, speech, and video produced by doctors and nurses. Unstructured data are more difficult to analyze, manipulate, and store, and hence increase storage costs to a certain extent. Maintaining safety and privacy while storing, extracting, and downloading patient-related data is also challenging.
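The 50-year retention requirement means the store grows for decades before any data can be deleted. The sketch below computes how much data is held at the end of each year when each year's intake must be kept for a fixed retention period; the yearly intake figures are inputs you supply, and none are assumed here.

```python
def stored_bytes_by_year(yearly_new_bytes: list[int], retention_years: int = 50) -> list[int]:
    """Bytes held at the end of each year when every year's intake is kept
    for `retention_years` years and then deleted."""
    if retention_years < 1:
        raise ValueError("retention_years must be at least 1")
    totals = []
    for year in range(len(yearly_new_bytes)):
        # Only intakes from the last `retention_years` years are still held.
        start = max(0, year - retention_years + 1)
        totals.append(sum(yearly_new_bytes[start:year + 1]))
    return totals
```

For example, with a 2-year retention period and yearly intakes of 10, 20, and 30 units, the holdings are 10, 30, and 50 units. Under the 50-year rule nothing is deleted during the first 50 years, so holdings simply accumulate, and any growth in yearly intake compounds the total.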
Limited data standardization and interoperability
Current standards and technologies cannot satisfy the needs of integrative Big Data applications in the healthcare sector. The difficulties have two aspects. The first is the lack of uniform data standards, consistent description formats, and presentation methods. The second is the difficulty of integrating structured, semi-structured, and unstructured data. Each database uses different software and data formats, and unstructured data make comparison, analysis, transfer, sharing, and other processes more complicated. Data integration can also reduce costs.
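Integration usually means writing one adapter per source that maps its format onto a shared schema. The two source layouts below are hypothetical, as is the target schema (`patient_id`, `birth_date`, `sex`); the point is that date formats and code values differ between databases and must be normalized before records can be compared or shared.

```python
from datetime import date, datetime


def from_source_a(row: dict) -> dict:
    """Hypothetical source A: {'pid': ..., 'dob': 'YYYY-MM-DD', 'sex': 'M' or 'F'}."""
    return {
        "patient_id": str(row["pid"]),
        "birth_date": date.fromisoformat(row["dob"]),
        "sex": {"M": "male", "F": "female"}.get(row["sex"], "unknown"),
    }


def from_source_b(row: dict) -> dict:
    """Hypothetical source B: {'PatientNo': ..., 'BirthDate': 'DD/MM/YYYY', 'Gender': 1 or 2}."""
    return {
        "patient_id": str(row["PatientNo"]),
        "birth_date": datetime.strptime(row["BirthDate"], "%d/%m/%Y").date(),
        "sex": {1: "male", 2: "female"}.get(row["Gender"], "unknown"),
    }


def merge(a_rows: list[dict], b_rows: list[dict]) -> list[dict]:
    """Normalize both sources into one list of records with the same fields."""
    return [from_source_a(r) for r in a_rows] + [from_source_b(r) for r in b_rows]
```

Once both sources share field names, date types, and code values, a birth date stored as "1980-05-02" in one system and "02/05/1980" in the other compare as equal.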
Information barriers
The users of Big Data in the medical sector span a wide range, including hospital clinics, regional medical centres, medical insurance companies, drug management analysis units, and medical equipment monitoring centres. The interrelated data resources are scattered across several data pools, including hospital medical records, settlement and cost data, medical firms' records, academic medical research data, residents' health records collected by regional health information platforms, and population and public health data from state surveys. There is little connection among these data sets. At the same time, data-sharing mechanisms are imperfect because of information barriers among hospitals, research institutes, and other institutions. For instance, in China, medical institutions have limited communication and data sharing overall. With the globalization of data, Big Data in healthcare will also face language, terminology, and standardization barriers.
Volume of data
The huge volume of healthcare Big Data, at the terabyte and even petabyte level, is now beyond the capacity of personal computers and network file-sharing programs. Hence, new sharing mechanisms are critically needed.
Data privacy
Data in the healthcare sector are more sensitive and centralized than other sorts of Big Data, so confidentiality is a significant concern. However, there is no definitive solution to the problem of protecting patient data privacy. There have been many real cases of patient data leakage around the world, with unpredictable consequences, and Big Data technology puts personal medical data at even greater risk. Some people believe that protecting personal privacy is impossible in the era of Big Data, but these difficulties can be mitigated with appropriate methods such as de-identification and digital identity encryption, even though these methods still require people or applications to process the data.
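One common de-identification step is pseudonymization: drop direct identifiers and replace the patient ID with a keyed hash, so records from the same patient can still be linked without revealing who the patient is. Below is a minimal sketch using Python's standard `hmac` and `hashlib` modules. The set of direct identifiers is a hypothetical example, and pseudonymized data can still be re-identified through remaining fields such as dates or rare diagnoses, so this is only one layer of protection.

```python
import hashlib
import hmac

# Fields treated as direct identifiers in this sketch (hypothetical list;
# real de-identification rules define their own).
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` without direct identifiers, with
    `patient_id` replaced by an HMAC-SHA256 pseudonym.

    The same patient_id always maps to the same pseudonym under one key,
    so records stay linkable, but the pseudonym cannot be reproduced
    by anyone who does not hold the key.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(secret_key, str(record["patient_id"]).encode(), hashlib.sha256)
    out["patient_id"] = digest.hexdigest()
    return out
```

The secret key must be stored separately from the pseudonymized data; anyone who holds both can re-link pseudonyms to real IDs by hashing candidate IDs.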