Analytics at Scale: Evolution at Infrastructure and Algorithmic Levels
Author
Abstract

Data Analytics is at the core of almost all modern ap-plications ranging from science and finance to healthcare and web applications. The evolution of data analytics over the last decade has been dramatic - new methods, new tools and new platforms - with no slowdown in sight. This rapid evolution has pushed the boundaries of data analytics along several axis including scalability especially with the rise of distributed infrastructures and the Big Data era, and interoperability with diverse data management systems such as relational databases, Hadoop and Spark. However, many analytic application developers struggle with the challenge of production deployment. Recent experience suggests that it is difficult to deliver modern data analytics with the level of reliability, security and manageability that has been a feature of traditional SQL DBMSs. In this tutorial, we discuss the advances and innovations introduced at both the infrastructure and algorithmic levels, directed at making analytic workloads scale, while paying close attention to the kind of quality of service guarantees different technology provide. We start with an overview of the classical centralized analytical techniques, describing the shift towards distributed analytics over non-SQL infrastructures. We contrast such approaches with systems that integrate analytic functionality inside, above or adjacent to SQL engines. We also explore how Cloud platforms' virtualization capabilities make it easier - and cheaper - for end users to apply these new analytic techniques to their data. Finally, we conclude with the learned lessons and a vision for the near future.

Year of Publication
2022
Conference Name
2022 IEEE 38th International Conference on Data Engineering (ICDE)
Google Scholar | BibTeX