With the rapid development of Internet technologies, data has been dubbed the “new oil”. Indeed, on the one hand, Internet technologies have enabled gathering and storing virtually unlimited amounts of data in data centers and, on the other, the Cloud computing ecosystem has provided virtually unlimited computing capacity for processing and analyzing that data. One naturally asks why data is the new oil. The premise is that useful information can be found in huge amounts of data (aka the data deluge), much as oil is found in vast petroleum fields.
The Data Science and Engineering disciplines have emerged from the need to extract the useful information contained in vast amounts of data in order to support all fields of science, engineering, business, and life. Data Science aims to study and understand fundamental issues behind data sets: their interrelations and complexity, analysis, fusion, privacy, anonymization, security, quality, etc., across the full data life-cycle. Data Engineering addresses the building and deployment of distributed, scalable and reliable data infrastructures and their integration into the knowledge and business pipelines of institutions, organizations, businesses and the like. Therefore, Data Science and Engineering can be seen as the tandem that bridges data sources with real-life data platforms, also referred to as data lakes.
Admittedly, data science and engineering are not new. Since the 1950s, data sources and analysis (data analytics) have been used to uncover the useful information found in data. There is, however, a big leap from then to today’s data science and engineering. That leap comes in response to the need to handle ever-increasing amounts of data, generated at high speed and in great variety. The nature of today’s data science and engineering is very different not only from the early data science and engineering but even from that of the past two decades. A shift has taken place from processing simple sensor data to data-centric systems, and from simple business analytics to data-driven business models!
Data Science and Engineering have to overcome a number of challenges in order to leverage the many opportunities of the data-driven era. These challenges range from new data analysis and processing models to building data lake infrastructures at any scale. While the current state of the art in data science and engineering makes it possible to leverage the value in the data, doing so in a sustained way over the mid and long term remains a major hurdle. As a result, small and medium-sized businesses and enterprises struggle to integrate data pipelines into their business workflows and to benefit from the value in the data. Overcoming this and other challenges has prompted and propelled an intensive and fruitful collaboration among various scientific and engineering communities, drastically shortening the time from conception to implementation and deployment of data-driven systems. Similarly, in the academic field, many new publication outlets have emerged to address the latest research and development findings in data science and engineering. Springer offers a number of such publication outlets, including the series “Lecture Notes on Data Engineering and Communications Technologies”, which successfully serves as a platform for disseminating the latest research and development findings of data science and engineering in an interdisciplinary setting.
Internet-based technologies start with hype, but only some of them reach the “Slope of Enlightenment” and even fewer enter the “Plateau of Productivity”. Data-driven technologies are precisely such a positive case, made possible by the significant contribution of data science and engineering in bringing data technologies into the mainstream and delivering payoff to all fields of science, knowledge, business and the like.
To quote Roy Charles Amara: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run”. In that spirit, we can state that data science and engineering are indeed drivers for the long haul!
About the author
Fatos Xhafa, PhD in Computer Science, is a Full Professor at the Technical University of Catalonia (UPC), Barcelona, Spain. He has held various tenured and visiting professorship positions.
Prof. Xhafa has published widely in peer-reviewed international journals, conferences/workshops, book chapters, edited books and proceedings in the field (H-index 60). He is a founder and Editor-in-Chief of the Springer Lecture Notes on Data Engineering and Communications Technologies. His research interests include IoT and Cloud-to-thing continuum computing, massive data processing and collective intelligence, optimization, security and trustworthy computing, and machine learning, among others.