Course Description

Introduction

The term "big data" is now commonly used to mean that data is growing in volume, velocity, variety and veracity on such an unprecedented scale that traditional database management systems can no longer handle it properly. New technologies, artificial intelligence (AI), machine learning (ML) and the Internet of Things (IoT) in particular, rely heavily on the processing of huge data sets. Online services (ChatGPT, YouTube, Meta, IG, …) need to handle hundreds of millions of users issuing billions of requests at the same time. We therefore need new technologies (big data processing/analysis) and new tools (big data systems) to deal with extremely large data sets and service request loads. This is an introductory course on big data concepts, processing, analytics and systems. You will learn about the latest developments in big data technologies and get hands-on experience with popular open source big data systems such as Hadoop, Spark, HBase, MongoDB, Neo4j, Kafka and Flink. The objectives of this course can be summarized as follows.

Understand big data concepts, challenges and trends.
Learn the technological foundations of big data science and engineering.
Learn the principles and practices behind popular open source big data systems.
Gain hands-on experience using open source big data systems to solve big data problems.

This is a lecture-oriented course. The systems part of the course will be covered through in-class discussion of examples, homework assignments and a term project. Because class time is limited, the lectures will focus mostly on the technological innovations of each system rather than on how to use it. After a brief introduction to the basic operations of the various big data systems, students are expected to learn to use them on their own.

Regular Topics

The topics to be covered in the lectures are listed below (** marks topics that will be covered only if time and progress permit):

Introduction
General purpose big data platforms
Big data storage architecture and systems
Big data systems for structured/semi-structured data
Big graph processing
RDF processing systems**
Big stream processing
Big data pipelining tools
Big data ETL (extract, transform, and load) tools
Big data analytics, other systems and trends**
Open data**
Big data system landscape**

Special Topic(s)

Based on current practices and emerging trends, we will select one or two special topics and give a brief overview of each, if time permits. This semester, the planned special topic is big data and AI (Artificial Intelligence):

Relationship between big data and AI/ML (Artificial Intelligence and Machine Learning)
How ChatGPT and similar Large Language Models (LLMs) work
LLaMA, the open source LLM from Meta
The future of big data and generative AI

Visit the syllabus page for detailed information about the lecture schedule.

Administrative Information

Course Title: Big Data Systems
Course Number: CSIE59830/CSIEM0410
Meeting Time: Tue 14:10~17:00
Classroom: Engineering Building C305 (工C305)
Office Hours: Tue 17:00~18:00
Grading Policy:

Homework/Programming Assignments 35%
Independent Study and Presentation 15%
Final Exam 25%
Term Project 25%

Course Homepage: http://web.csie.ndhu.edu.tw/showyang/BigDataSys2023f/index.html
Instructor's Homepage: http://web.csie.ndhu.edu.tw/showyang/index.html

Online Class Information (if needed):

Teams Link (for students with NDHU accounts): https://teams.microsoft.com/l/team/19%3aNK6KBeeGD2Nz_tWgIHy4FLpKOYGZBK4JzgHhswSS_Jo1%40thread.tacv2/conversations?groupId=40213531-a422-491c-a5c1-b8b3d71f72f6&tenantId=edba3211-8174-4411-b089-357c588fa127
Join Code: z18wcz5