Progress Towards and Challenges in Biological Big Data

Biological big data refers to the vast amount of data generated in bioinformatics, and it is transforming research into a large-scale endeavour. In medical research, large volumes of data are produced by tools such as genomic sequencing machines. The availability of advanced tools and modern technology is the main driver of this rapid expansion of biological data. Such immense data should be used efficiently so that the valuable information they contain can be shared. At the same time, storing and handling these big data has become a great challenge, as data generation is increasing tremendously year by year. Moreover, the explosion of data in healthcare systems and biomedical research calls for an immediate solution, since health care requires tight integration of biomedical data. Researchers should therefore analyse the big data that is already available, rather than keep creating new data, as current advanced bioinformatics tools can extract meaningful information from it.


Introduction
Big data is a term used to describe very large datasets characterized by volume, variety (data type), velocity (speed of generation), variability (unpredictability) and veracity (quality) (Greene et al., 2014, Yang et al., 2017). Generating the big data needed to evaluate complex biological systems has become feasible with modern technologies such as next-generation sequencing (Nekrutenko and Taylor, 2012, Schuster, 2008). Big data grows rapidly because data are generated from diverse sources and because of the quick adoption of digital technologies in the modern era (Kashyap et al., 2015, Yang et al., 2017, Stephens et al., 2015). In addition, terabytes of data are deposited into large repository databases every day through modern information systems (Marx, 2013). Hence, big data analysis has become a recent area of interest in research and development (Acharjya and Ahmed, 2016). Managing big data and connecting datasets is critical, particularly in biological and biomedical research (Marx, 2013). Moreover, the accessible data are not fully exploited, as doing so requires specific skills and up-to-date solutions (Wang et al., 2017). The aim of this short review is to discuss the progress towards, and challenges in, biomedical and biological big data.

The progress
The explosion of data in healthcare systems, biotechnology and biomedical research calls for an immediate solution, as health care requires tight integration of biomedical data (Luo et al., 2016, Liang and Kelemen, 2016). This is important for advancing personalized medicine as well as better treatments (Alyass et al., 2015). Nevertheless, up-to-date solutions for engaging with this large volume of data are significant, as they are directly linked to such progress (Merelli et al., 2014, Kozubek, 2018). Apart from that, sequencing prices are continuously declining, which leads to a rapid increase in sequence data (Wetterstrand, 2013). Hence, new methods are required for storing and analyzing these data (Levy and Myers, 2016). The rise of cloud computing therefore promises a means of dealing with the generation of large amounts of sequence data (Zhao et al., 2017). Cloud computing is the use of network servers for storing, managing and processing large-scale data (Hashem et al., 2015). However, variability in the pricing of the latest computing trends will strongly affect the response of funding agencies. It also affects how researchers approach data analysis, since accessing cloud computing is costly (Muir et al., 2016, Wall et al., 2010).

The challenges
Genomics is projected to produce between 2 and 40 exabytes of data per year (Check Hayden, 2015), since the volume of data generated in genomics is doubling roughly every seven months (Check Hayden, 2015). However, the sheer size of the data is not the only issue the field needs to overcome. According to one study, collecting data from many locations and in various formats makes it difficult to use the datasets together (Gebelhoff, 2015). On top of that, most of the data produced nowadays are not properly structured, standardized and organized (Kho et al., 2013, Lathe et al., 2008), which leads to many technical issues and inappropriate interpretation (Simon, 2008). There is a view that, between now and 2020, big data might cause more problems than it can solve (Anderson and Rainie, July 20, 2012). Besides that, processing big data in a timely manner and ultimately analyzing it to provide meaningful inferences is highly challenging (McAfee et al., 2012). Hadoop, open-source software developed from Google's MapReduce programming model, provides a vital platform for big data analysis: it offers immense storage and facilitates rapid data processing (Ware et al., 2017). However, a high level of Java expertise is needed to develop the parallelized programs with which Hadoop is programmed, which can be another hurdle (O'Driscoll et al., 2013).
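To make the MapReduce idea concrete, the following is a minimal sketch in plain Python (not Hadoop itself) of the map and reduce phases applied to a toy genomics task, counting k-mers across sequencing reads. The function names and example reads are illustrative assumptions; in a real Hadoop job, the map calls would be distributed across many nodes and the framework would shuffle the key-value pairs to the reducers.

```python
from collections import defaultdict

def map_phase(read, k=3):
    """Map: emit (k-mer, 1) pairs from one sequencing read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def reduce_phase(pairs):
    """Reduce: sum the counts for each k-mer key."""
    counts = defaultdict(int)
    for kmer, n in pairs:
        counts[kmer] += n
    return dict(counts)

# Hypothetical reads; on a cluster, map_phase would run in parallel.
reads = ["ATCGAT", "TCGATC"]
pairs = [p for read in reads for p in map_phase(read)]
print(reduce_phase(pairs))  # each 3-mer counted across all reads
```

The appeal of this model for genomic big data is that the map step is embarrassingly parallel over reads, so the same two functions scale from a laptop to a cluster.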
Scalability and validation of data are the general concerns in the analysis of bioinformatics genomic big data. The former can be handled by conceptually related methods such as divide-and-conquer, which increases scalability and allows multiple executions for better validation. Besides that, an effective bioinformatics approach with quality assurance can be implemented using metamorphic testing (Yang et al., 2017). Metamorphic testing is a technique for alleviating the oracle problem and validating machine learning programs in scientific computing (Xie et al., 2009). The way research is conducted in medical science should also change in this big data era, from individual academic investigation to more collaborative research using well-organized techniques. Researchers should focus on crucial life and health networks, including the dynamics of big data rather than only its static and statistical features (Li and Chen, 2014).
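As an illustration of metamorphic testing, the sketch below (a hypothetical Python example, not taken from the cited works) checks a simple bioinformatics function against a metamorphic relation: the GC content of a DNA sequence must equal that of its reverse complement, because reverse-complementing swaps G with C and A with T. This lets us test the implementation even when no oracle supplies the "correct" value for an arbitrary input.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def reverse_complement(seq):
    """Reverse the sequence and complement each base."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def metamorphic_check(seq):
    """Metamorphic relation: reverse-complementing a sequence
    must leave its GC content unchanged, so any violation
    signals a bug without needing a known-correct answer."""
    return abs(gc_content(seq) - gc_content(reverse_complement(seq))) < 1e-12

print(metamorphic_check("ATGCCGTA"))  # prints True
```

Running such a check over many inputs (e.g. randomly generated sequences) validates the pipeline at scale, which is exactly where conventional output-by-output verification of big data analyses breaks down.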

Conclusion
In a nutshell, it is clear that researchers are now capable of producing huge amounts of data rapidly and cost-efficiently, thanks to the advanced technology behind current big data. However, this big data brings distinctive challenges in storing, managing, transferring and analyzing it. The available tools and skills can help researchers use the technology to interpret the data efficiently. Hence, they should use big data to build networks of knowledge instead of focusing on a single area, as there is no point in continually multiplying data without analyzing the data that already exist.