Course Description


Introduction

The term "big data" now commonly refers to data whose volume, velocity, variety, and veracity are growing at such an unprecedented scale that traditional database management systems can no longer handle it properly. New technologies, in particular artificial intelligence (AI), machine learning (ML), and the Internet of Things (IoT), rely heavily on processing huge data sets. Online services (ChatGPT, YouTube, Meta, IG, …) must handle hundreds of millions of users issuing billions of requests at the same time. We therefore need new technologies (big data processing and analysis) and new tools (big data systems) to deal with extremely large data sets and volumes of service requests. This is an introductory course on big data concepts, processing, analytics, and systems. You will learn about the latest developments in big data technologies and get hands-on experience with popular open-source big data systems such as Hadoop, Spark, HBase, MongoDB, Neo4j, Kafka, and Flink. The objectives of this course can be summarized as follows.

This is a lecture-oriented course. The system part of the course will be carried out through in-class example discussions, homework assignments, and a term project. Due to time constraints, the lectures will focus mostly on the technological innovations of each system rather than on how to use it. After a brief introduction to the basic operations of each big data system, students are expected to learn to use these systems on their own.

Regular Topics

The topics to be covered in the lectures are listed as follows (**: topics to be covered depending on time and progress):

Special Topic(s)

Based on current practices and emerging trends, we will select one or two special topics and give a brief overview of each (time permitting). This semester, the planned special topic is big data and AI (artificial intelligence). Visit the syllabus page for detailed information about the lecture schedule.

Administrative Information

Online Class Information: (if needed)