Big Data and Cloud Computing

Georgetown University
Fall 2023

Published: Tuesday, Nov 28, 2023 at 7:37 pm

Leverage the cloud to scale your work up and out and to work with datasets too large to fit on a single machine.

Sections and meeting locations

Section | Instructor(s) | Day/Time | Location
1 | Anderson Monken | Tuesdays 6:30 - 9:00 PM | Car Barn 204
2 | Amit Arora | Wednesdays 6:30 - 9:00 PM | Car Barn 309
3 | Marck Vaisman, Abhijit Dasgupta | Thursdays 6:30 - 9:00 PM | Walsh 499

Topics and Class Meetings

Notes:

  • The first day of class will be held virtually on Thursday, August 24, 6:30 - 9:00 PM.
  • Section 1 (Tuesday section) will meet on Tuesday, September 5, even though the university is on a Monday schedule that week. If students in Section 1 have a conflict with another class, please let Prof. Anderson know.
Wk | Topics | Lab | Readings | Section 1 Date | Section 1 Faculty | Section 2 Date | Section 2 Faculty | Section 3 Date | Section 3 Faculty
1 | Course overview, evolution of cloud technologies, background skills | Background Linux skills practice | | Thu Aug-24 | Virtual | Thu Aug-24 | Virtual | Thu Aug-24 | Virtual
2 | Getting started with cloud analytical tools in an extended lab: hands-on with SageMaker (AWS) and AzureML (Azure) | Accessing cloud services | | Tue Aug-29 | Anderson | Wed Aug-30 | Amit | Thu Aug-31 | Abhijit/Marck
3 | Python parallelization, map/reduce, file systems/distributed/S3, cluster architecture | Parallelization with Python and multiprocessing | TBD | Tue Sep-05 | Anderson | Wed Sep-06 | Amit | Thu Sep-07 | Abhijit/Marck
4 | Event-driven cloud processing, containers, and code portability | Containers and Lambda (AWS) | TBD | Tue Sep-12 | Anderson | Wed Sep-13 | Amit | Thu Sep-14 | Abhijit/Marck
5 | DuckDB, Polars, file performance | DuckDB/Polars lab | TBD | Tue Sep-19 | Anderson | Wed Sep-20 | Amit | Thu Sep-21 | Abhijit/Marck
6 | Spark 1: Introduction to Spark, Spark RDDs, Spark DataFrames, SparkSQL | Spark RDD and DataFrames lab | TBD | Tue Sep-26 | Anderson | Wed Sep-27 | Amit | Thu Sep-28 | Abhijit/Marck
7 | Spark 2: Project Introduction, Spark UDFs | Spark Advanced DataFrames lab | TBD | Tue Oct-03 | Anderson | Wed Oct-04 | Amit | Thu Oct-05 | Abhijit/Marck
8 | Spark 3: Machine Learning with SparkML | Spark ML lab | TBD | Tue Oct-10 | Anderson | Wed Oct-11 | Amit | Thu Oct-12 | Abhijit/Marck
9 | Spark 4: SparkNLP | Spark NLP lab | TBD | Tue Oct-17 | Anderson | Wed Oct-18 | Amit | Thu Oct-19 | Abhijit/Marck
10 | Spark Streaming (tentative) | | TBD | Tue Oct-24 | Anderson | Wed Oct-25 | Amit | Thu Oct-26 | Abhijit/Marck
11 | TBD | | TBD | Tue Oct-31 | Anderson | Wed Nov-01 | Amit | Thu Nov-02 | Abhijit/Marck
12 | TBD | | TBD | Tue Nov-07 | Anderson | Wed Nov-08 | Amit | Thu Nov-09 | Abhijit/Marck
13 | Dask & Ray (tentative) | | TBD | Tue Nov-14 | Anderson | Wed Nov-15 | Amit | Thu Nov-16 | Abhijit/Marck
 | NO CLASS - Thanksgiving Break | | TBD | | | | | |
14 | Cloud Virtual Machines and Hardware (tentative) | | TBD | Tue Nov-28 | Anderson | Wed Nov-29 | Amit | Thu Nov-30 | Abhijit/Marck
15 | Wrap-up. Ask me anything. | | TBD | Tue Dec-05 | Virtual | Tue Dec-05 | Virtual | Tue Dec-05 | Virtual

Deliverables

Deliverables (assignments, lab completions, and project milestones) are generally due on Mondays at 11:59pm; the table below gives the exact due date for each deliverable, and some fall on other days.

  • Refer to the Canvas site for up-to-date details on due dates and deliverables.

Deliverable | Date Released | Due Date
A0: Background Skills | 2023-08-20 6:00pm | 2023-08-30 11:59pm
A1: Python Skills | 2023-08-25 12:00am | 2023-09-05 11:59pm
A2: Shell & Linux | 2023-09-01 12:00pm | 2023-09-11 11:59pm
A3: Parallelization | 2023-09-08 12:00pm | 2023-09-18 11:59pm
A4: Containers | 2023-09-15 12:00pm | 2023-09-25 11:59pm
A5: DuckDB & Polars | 2023-09-22 12:00pm | 2023-09-29 11:59pm
A6: PySpark (Multi-Part) | 2023-09-29 12:00pm | 2023-10-06 11:59pm
Labs (weekly, 14 total) | Weekly, during class | Weekly, Tuesdays at 6pm
Project Proposal | TBD | TBD
Project Intermediate Deliverable | TBD | TBD
Project Peer Feedback Deliverable | TBD | TBD
Final Project Deliverable | TBD | 2023-12-08 11:59pm