Syllabus
The lecture topics are divided into the following parts (coverage depends on time and progress):
- Introduction
- Data -> Knowledge -> Intelligence
- What is Big Data?
- Why Big Data?
- Examples of Big Data
- The opportunities and challenges for Big Data
- General purpose big data platforms
- Distributed and cluster computing
- MapReduce and Apache Hadoop
- In-memory computation and Apache Spark
- Cloudera (CDH, Cloudera Distribution for Hadoop)
- High Performance Computing Cluster (HPCC), also referred to as DAS (Data Analytics Supercomputer)
- Big data storage architecture
- Distributed nodes
- Scale-out NAS
- All-solid-state drive (SSD) arrays
- Object-based storage
- DNA storage
- Big data storage systems
- Distributed filesystems and big data storage
- Google GFS and Apache HDFS
- Cloud Storage
- Data lake
- Big data storage security
- Big data systems for structured/semi-structured data
- SQL, NoSQL, NewSQL
- Apache HBase, Cassandra, CouchDB, Drill, Impala, Hive
- Cassandra
- Spark SQL, DataFrames, Datasets
- MongoDB
- Google BigQuery, Spanner, F1
- Presto
- Big graph processing
- The challenges of big graphs
- Pregel family of systems (BSP, Pregel, Giraph)
- GraphLab family of systems (GraphLab, PowerGraph, GraphChi)
- Spark GraphX, GraphFrames
- Neo4j graph database
- Titan distributed graph database
- RDF processing systems**
- Big stream processing
- The challenges of distributed big stream processing
- Spark Streaming, Structured Streaming
- Apache Storm, Samza, Flink
- Apache SAMOA
- Big data pipelining tools**
- Big data ETL (extract, transform, and load) tools
- Apache Airflow
- Apache Kafka
- Big data analytics, other systems and trends**
- Google Cloud Platform (GCP) vs Amazon Web Services (AWS)
- RapidMiner
- KNIME
- Tableau
- R Language and RStudio
- Open Data
- Beyond Hadoop and Spark
- Big data landscape
- Special topic (2023): Big data and AI
- The relationship between big data and AI
- How do big data and AI work together?
- Powerful synergy of big data and AI
- How do ChatGPT and other similar Large Language Models (LLMs) work?
- LLaMA, the open-source LLM from Meta
- The future of big data and generative AI
- Special topic (2021): IoT (Internet of Things) big data platforms
- ThingsBoard
- DeviceHive
- Kaa
- DSA (Distributed Services Architecture)
- SiteWhere
- Node-RED
- OpenRemote
- Zetta
- ThingSpeak
**: Topics to be covered depending on the time and progress.