Syllabus
The lecture topics are divided into the following parts (coverage depends on time and progress):
- Part I: Introduction
- What is Big Data?
- Why Big Data?
- Examples of Big Data
- The opportunities and challenges for Big Data
- Part II: General purpose big data systems
- Distributed and cluster computing
- MapReduce and Apache Hadoop
- In-memory computation & Apache Spark
- Part III: Big data storage
- Distributed filesystems and big data storage
- Google GFS
- Apache HDFS
- Google BigTable system
- Part IV: Big structured data processing
- SQL or NoSQL
- Apache HBase
- Cassandra and MongoDB
- Data Warehousing, Google BigQuery and Apache Hive
- Part V: Big graph processing
- The challenges of big graphs
- Pregel family of systems
- GraphLab family of systems
- Part VI: Big stream processing
- The challenges of distributed big stream processing
- Apache Flink
- Apache Storm
- Spark Streaming
- Part VII: Other systems and trends**
- Google Dremel, Apache Drill and Apache Impala
- Google Cloud Platform (GCP) vs Amazon Web Services (AWS)
- Open Data
- Beyond Hadoop
**: Topics to be covered depending on the time and progress.