Big Data and Cloud Computing
Georgetown University
Fall 2023
Leverage the cloud to scale your work up and out and to work with datasets too large to fit on a single machine.
Sections and meeting locations
Section | Instructor(s) | Day/Time | Location |
---|---|---|---|
1 | Anderson Monken | Tuesdays 6:30 - 9:00 PM | Car Barn 204 |
2 | Amit Arora | Wednesdays 6:30 - 9:00 PM | Car Barn 309 |
3 | Marck Vaisman, Abhijit Dasgupta | Thursdays 6:30 - 9:00 PM | Walsh 499 |
Topics and Class Meetings
Notes:
- The first class meeting will be virtual, on Thursday, August 24, 6:30 - 9:00 PM.
- Section 1 (the Tuesday section) will meet on Tuesday, September 5, even though a Monday schedule applies that week. Students in Section 1 who have a class conflict should let Prof. Anderson know.
Wk | Topics | Lab | Readings | Section 1 Date | Section 1 Faculty | Section 2 Date | Section 2 Faculty | Section 3 Date | Section 3 Faculty |
---|---|---|---|---|---|---|---|---|---|
1 | Course overview, evolution of cloud technologies, background skills | Background Linux skills practice | | Thu Aug-24 | Virtual | Thu Aug-24 | Virtual | Thu Aug-24 | Virtual |
2 | Getting started with cloud analytical tools in an extended lab: hands-on with SageMaker (AWS) and AzureML (Azure) | Accessing cloud services | | Tue Aug-29 | Anderson | Wed Aug-30 | Amit | Thu Aug-31 | Abhijit/Marck |
3 | Python parallelization, map/reduce, file systems/distributed/S3, cluster architecture | Parallelization with Python and multiprocessing | TBD | Tue Sep-05 | Anderson | Wed Sep-06 | Amit | Thu Sep-07 | Abhijit/Marck |
4 | Event-driven cloud processing, containers, and code portability | Containers and Lambda (AWS) | TBD | Tue Sep-12 | Anderson | Wed Sep-13 | Amit | Thu Sep-14 | Abhijit/Marck |
5 | DuckDB, Polars, file performance | DuckDB/Polars lab | TBD | Tue Sep-19 | Anderson | Wed Sep-20 | Amit | Thu Sep-21 | Abhijit/Marck |
6 | Spark 1: Introduction to Spark, Spark RDDs, Spark DataFrames, SparkSQL | Spark RDD and DataFrames lab | TBD | Tue Sep-26 | Anderson | Wed Sep-27 | Amit | Thu Sep-28 | Abhijit/Marck |
7 | Spark 2: Project Introduction, Spark UDFs | Spark Advanced DataFrames lab | TBD | Tue Oct-03 | Anderson | Wed Oct-04 | Amit | Thu Oct-05 | Abhijit/Marck |
8 | Spark 3: Machine Learning with SparkML | Spark ML lab | TBD | Tue Oct-10 | Anderson | Wed Oct-11 | Amit | Thu Oct-12 | Abhijit/Marck |
9 | Spark 4: SparkNLP | Spark NLP lab | TBD | Tue Oct-17 | Anderson | Wed Oct-18 | Amit | Thu Oct-19 | Abhijit/Marck |
10 | Spark Streaming (tentative) | TBD | | Tue Oct-24 | Anderson | Wed Oct-25 | Amit | Thu Oct-26 | Abhijit/Marck |
11 | TBD | TBD | | Tue Oct-31 | Anderson | Wed Nov-01 | Amit | Thu Nov-02 | Abhijit/Marck |
12 | TBD | TBD | | Tue Nov-07 | Anderson | Wed Nov-08 | Amit | Thu Nov-09 | Abhijit/Marck |
13 | Dask & Ray (tentative) | TBD | | Tue Nov-14 | Anderson | Wed Nov-15 | Amit | Thu Nov-16 | Abhijit/Marck |
NO CLASS - Thanksgiving Break | TBD | | | | | | | | |
14 | Cloud Virtual Machines and Hardware (tentative) | TBD | | Tue Nov-28 | Anderson | Wed Nov-29 | Amit | Thu Nov-30 | Abhijit/Marck |
15 | Wrap-up. Ask me anything. | TBD | | Tue Dec-05 | Virtual | Tue Dec-05 | Virtual | Tue Dec-05 | Virtual |
Deliverables
Unless the table below lists a different date, all deliverables (assignments, lab completions, and project milestones) are due on Mondays at 11:59pm.
Refer to the Canvas site for up-to-date details on due dates and deliverables.
Deliverable | Date Released | Due Date |
---|---|---|
A0: Background Skills | 2023-08-20 6:00pm | 2023-08-30 11:59pm |
A1: Python skills | 2023-08-25 12:00am | 2023-09-05 11:59pm |
A2: Shell & Linux | 2023-09-01 12:00pm | 2023-09-11 11:59pm |
A3: Parallelization | 2023-09-08 12:00pm | 2023-09-18 11:59pm |
A4: Containers | 2023-09-15 12:00pm | 2023-09-25 11:59pm |
A5: DuckDB & Polars | 2023-09-22 12:00pm | 2023-09-29 11:59pm |
A6: PySpark (Multi-Part) | 2023-09-29 12:00pm | 2023-10-06 11:59pm |
Labs (weekly) - 14 total | weekly during class | weekly on Tuesday at 6pm |
Project Proposal | TBD | TBD |
Project Intermediate Deliverable | TBD | TBD |
Project Peer Feedback Deliverable | TBD | TBD |
Final Project Deliverable | TBD | 2023-12-08 11:59pm |