Syllabus

The lecture topics are divided into the following parts (coverage depends on time and progress):

Part I: Introduction

What is Big Data?
Why Big Data?
Examples of Big Data
The opportunities and challenges of Big Data

Part II: General-purpose big data systems

Distributed and cluster computing
MapReduce and Apache Hadoop
In-memory computation & Apache Spark

Part III: Big data storage

Distributed filesystems and big data storage
Google File System (GFS)
Apache HDFS
Google Bigtable

Part IV: Big structured data processing

SQL or NoSQL
Apache HBase
Cassandra and MongoDB
Data Warehousing, Google BigQuery and Apache Hive

Part V: Big graph processing

The challenges of big graphs
Pregel family of systems
GraphLab family of systems

Part VI: Big stream processing

The challenges of distributed big stream processing
Apache Flink
Apache Storm
Spark Streaming

Part VII: Other systems and trends**

Google Dremel, Apache Drill and Apache Impala
Google Cloud Platform (GCP) vs Amazon Web Services (AWS)
Open Data
Beyond Hadoop

**: Topics to be covered depending on time and progress.