Course Description


Introduction

The term "big data" is now commonly used to describe data whose growth in volume, velocity, variety and veracity has reached such an unprecedented scale that traditional database management systems can no longer handle it properly. Walmart, for example, the world's biggest retailer with over 20,000 stores in 28 countries, is building the world's biggest private cloud to process 2.5 petabytes of data every hour. Facebook, the world's most popular social media network, needs to process data from more than 2 billion monthly active users worldwide. Every 60 seconds, 136,000 photos are uploaded, and 510,000 comments and 293,000 status updates are posted, which amounts to more than 1,000 terabytes of data generated per day. Particles collide within the Large Hadron Collider (LHC) at CERN approximately 600 million times per second. Merely recording these events takes up 500 EB (1 EB = 1024 PB) of storage per day, let alone analyzing them. We therefore need new technologies (big data processing) and new tools (big data systems) for these jobs.

This is an introductory course on big data concepts, processing, analytics and systems. You will learn the latest developments in big data technologies and get hands-on experience with popular open source big data systems such as Hadoop, HBase, Spark and Hama. The objectives of this course can be summarized as follows.

This is a lecture-oriented course. The systems part of the course will be carried out through in-class discussion of examples, homework assignments and a term project. Because of limited time, the lectures will focus mostly on the technological innovations of each system rather than on how to use it. After a brief introduction to the basic operations of the various big data systems, students are expected to learn to use them on their own.

The topics to be covered in the lectures are listed as follows (**: topics to be covered depending on time and progress):

Visit the syllabus page for detailed information about the lecture schedule.

Administrative Information