Course Description
Introduction
The term "big data" is now commonly used to describe data whose volume, velocity, variety, and veracity are growing at such an unprecedented scale that traditional database management systems can no longer handle it properly. New technologies, particularly artificial intelligence (AI), machine learning (ML), and the Internet of Things (IoT), rely heavily on processing huge data sets. Online services (ChatGPT, YouTube, Meta, Instagram, …) must handle hundreds of millions of users issuing billions of requests concurrently. We therefore need new technologies (big data processing and analysis) and new tools (big data systems) to deal with extremely large data sets and service request volumes. This is an introductory course on big data concepts, processing, analytics, and systems. You will learn about the latest developments in big data technologies and gain hands-on experience using popular open source big data systems such as Hadoop, Spark, HBase, MongoDB, Neo4j, Kafka, and Flink.
The objectives of this course can be summarized as follows.
- Understand big data concepts, challenges and trends.
- Learn the technological foundations of big data science and engineering.
- Learn the principles and practices behind popular open source big data systems.
- Gain hands-on experience using open source big data systems to solve big data problems.
This is a lecture-oriented course. The systems part of the course will be covered through in-class discussion of examples, homework assignments, and a term project. Due to time limits, the lectures will focus mostly on the technological innovations of each system rather than on how to use it. After a brief introduction to the basic operations of each big data system, students are expected to learn to use these systems on their own.
Regular Topics
The topics to be covered in the lectures are listed below (**: topics covered depending on time and progress):
- Introduction
- General purpose big data platforms
- Big data storage architecture and systems
- Big data systems for structured/semi-structured data
- Big graph processing
- RDF processing systems**
- Big stream processing
- Big data pipelining tools
- Big data ETL (extract, transform, and load) tools
- Big data analytics, other systems and trends**
- Open data**
- Big data system landscape**
Special Topic(s)
Based on current practices and emerging trends, we will select one or two special topics for a brief overview (time permitting).
This semester, the special topic we plan to cover is big data and AI:
- The relationship between big data and AI/ML (artificial intelligence and machine learning)
- How ChatGPT and other large language models (LLMs) work
- LLaMA, the open source LLM from Meta
- The future of big data and generative AI
Visit the syllabus page for detailed information about the lecture schedule.
Administrative Information
Online Class Information: (if needed)