Syllabus

Published

Tuesday Nov 28, 2023 at 7:37 pm

Important

You must familiarize yourself with this document. It is thorough and covers our expectations and policies.

Instructors

TAs

Communication

Important

The primary mode of communication will be Slack. The course’s Slack workspace is https://dsan6000-fall2023.slack.com. You will receive an invitation to join.

Instructional team e-mail: dsan6000-instructors@georgetown.edu. This is the preferred way of contacting the professors privately. Please read the communication and Slack rules.

Course description

Data is everywhere! In today’s data-driven world, you will often find yourself with a dataset that is just too big to be analyzed with traditional programming libraries on your laptop or workstation. That is where modern open-source projects, cloud providers, and distributed computation/processing save the day.

This is a practical, workshop-style course about using cloud computing to analyze and manipulate datasets that are too large to fit on a single machine or to be analyzed with traditional tools. You will have the opportunity to use the tools and techniques discussed in class. Although this is not a programming course per se, there is programming involved.

You will understand how to ingest the data, and then massage, clean, transform, analyze, and model it within the context of big data analytics. You will be able to think more programmatically and logically about your big data needs, tools, and issues.

Learning objectives

  • Select, configure, and use the appropriate tools and cloud infrastructure to work with datasets
  • Process and analyze large datasets using scalable approaches
  • Learn the services offered by Microsoft Azure and Amazon Web Services
  • Use big data processing skills, including git and the Linux command line
  • Execute a big data analytics project from start to finish: process, analyze, model, and communicate results through written and verbal methods
  • Understand the steps required to scale from interactive scripting to unattended jobs

Pre-requisites

  • Experience with Python and SQL. Note: The primary language is Python
  • Experience with git and GitHub

Some tutorials to brush up on these skills:

Required resources

Computer

You should have a laptop (no Chromebooks, please). Windows, Mac or Linux machines are acceptable. Please bring your machine to class.

Cloud accounts

Note

You will be provided with and use cloud resources on Microsoft Azure and Amazon Web Services.

We will discuss how to set up your account(s) and environment(s) in class and lab within the first couple of weeks. You will get credits that will be enough to support your coursework throughout the semester.

Warning

IT IS YOUR RESPONSIBILITY TO MANAGE THE CREDITS AND CLOUD RESOURCES PROVIDED TO YOU. YOU MUST SHUT DOWN YOUR CLOUD RESOURCES WHEN NOT IN USE.

Learning activities

Class format

The course meetings follow a split lecture/lab format. All class meetings will have a lecture portion, and most sessions will have an in-class lab portion.

During the lecture portion, we will discuss concepts, techniques, cloud services, and open-source tools, and explore the tools’ history and development.

During the lab portion, we will usually perform a short demonstration, and then you will complete exercises and follow examples that are designed to show you how to implement the ideas and concepts with various tools.

Important
  • Lectures and labs will not be recorded.
  • Lectures may not cover all the material and some topics will be introduced in the lab or through readings/assignments.
  • You will start the labs in class but you will most likely not finish. It is your responsibility to complete the labs to enable your learning. Completing the labs successfully is also part of your grade.

Readings

On certain weeks, readings will be assigned to prepare you for the lecture material being presented. These readings should take an hour or less per week. Reading materials will be provided in PDF format via Canvas.

Important

You must read assigned readings prior to the lectures.

Online Quizzes

There will be unannounced quizzes a few times during the semester, at random intervals and times. The quizzes ensure you are keeping up with the material presented in the class. The material for the quizzes will be drawn from lectures, labs, and readings.

Important

Missed quizzes cannot be made up.

Lab completions

Most labs will have a deliverable. Completing the labs is essential for you to learn the skills presented in class.

The lab deliverables can sometimes be completed during lab time. However, it is your responsibility to complete the deliverable as part of your work outside of lecture/lab time.

Homework assignments

There will be several homework assignments. The goal of these problem sets is to hone your big data skills by answering some questions about large datasets. The problem sets will build on the labs and will be much more in-depth. Deliverables from the assignment will usually include code written for your programs and the output produced.

Warning

Please start assignments as soon as they are posted. These assignments can take several hours to complete depending on your familiarity with the material. You will not complete the assignments on time if you start the day they are due.

Note

We reuse problem set questions, so we expect students not to copy, refer to, or look at the solutions in preparing their answers. Since this is a graduate-level class, we expect students to want to learn and not search online for answers. See the Academic Integrity section for more details.

Big Data analytics project

You will assemble into groups of 3 to 4 students in any section. You will perform and write up an analysis of a big dataset using the tools learned in class. Big is defined as “a dataset that is so large that you cannot work with it on a laptop.”

The details for the project will be provided within the first few weeks of the term.

Evaluation

  • Assignments : 30%
  • Lab completions : 20%
  • Quizzes & attendance : 10%
  • Group project : 40%

Important

The project will have several milestones that are cumulative in nature. Therefore, we will grade the project after the final submission with a holistic project rubric. We will grade the milestones in a qualitative way, and we will provide feedback and a trending grade with each milestone. It is up to you to incorporate the feedback provided. If your milestone trending grade is lower than you expected, and you do not incorporate the feedback we provide for improvement, do not expect your final project grade to improve.

In addition, each team member will complete a peer evaluation for their group and provide feedback on everyone’s contribution. Every team member is expected to contribute equally to their project. If peer evaluations indicate that students within a team are not contributing equally, those students will receive a grade penalty and a lower grade than the rest of their team.

Total is 100%. There is no plan to curve the final grade, and the final letter grade will be:

  • A: >= 92.5
  • A-: >= 89.5, < 92.5
  • B+: >= 87.99, < 89.5
  • B: >= 81.5, < 87.99
  • B-: >= 79.5, < 81.5
  • C: >= 70, < 79.5
  • F: < 70
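As an illustration of how the weights and cutoffs above combine, here is a minimal Python sketch. The function names and the sample scores are hypothetical, and the sketch assumes each component is scored on a 0 to 100 scale; the actual gradebook may be computed differently.

```python
# Component weights from the Evaluation list (they sum to 1.0).
WEIGHTS = {
    "assignments": 0.30,
    "labs": 0.20,
    "quizzes_attendance": 0.10,
    "project": 0.40,
}

# Lower bound for each letter grade, highest first, as listed above.
CUTOFFS = [
    ("A", 92.5),
    ("A-", 89.5),
    ("B+", 87.99),
    ("B", 81.5),
    ("B-", 79.5),
    ("C", 70.0),
]


def weighted_total(scores):
    """Combine component scores (each assumed to be 0-100) into a course total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a course total to a letter grade using the cutoffs above."""
    for letter, lower_bound in CUTOFFS:
        if total >= lower_bound:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
example = {"assignments": 90, "labs": 100, "quizzes_attendance": 80, "project": 92}
total = weighted_total(example)
print(round(total, 2), letter_grade(total))
```

Because the project carries 40% of the total, a change in the project score moves the final grade twice as much as the same change in the lab score.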
Important

Failing this course is highly unlikely but definitely possible. Reasons for failing include but are not limited to:

  • Consistently delivering work that is significantly below expectations
  • Consistently missing deliverables
  • Consistently missing class
  • Being found in violation of academic integrity

Grading philosophy

Some of the assignments you will work on are open-ended and some are not (i.e. specific tasks). Grading is generally holistic, meaning that there may not always be specific point value for individual elements of a deliverable. Each deliverable submission is unique and will be compared to all other submissions.

Deliverables that:

  • Exceed the requirements and expectations are typically considered A level work.
  • Just meet the requirements and expectations are typically considered A-/B+ level work.
  • Do not meet the requirements are typically considered B or lesser level work.

Partial credit will be given where appropriate.

All deliverables must meet general quality requirements that are expected from students at the graduate school level as well as specific requirements that will be provided for each deliverable. Points will be deducted for any of the following reasons:

  • You did not follow any direct and specific instructions
  • Your deliverable has missing sections
  • Your overall presentation and/or writing is sloppy
  • Your code does not follow best coding practices
  • Your code has no comments (including the areas where GAI was used)
  • Your repository has either more or fewer files than those requested
  • You use absolute references (file paths, URLs, etc.) in your scripts
  • You alter the repository structure in any way
  • You do not use GitHub Classroom
  • You do not use git effectively
  • You manually upload files to GitHub through the web and do not use
  • You use incorrect file names (wrong extensions, wrong case, etc.)
  • Your technical approach is fundamentally flawed
  • Your analytical decisions are unjustified
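One deduction above is for absolute references in scripts. The following is a minimal sketch of one way to avoid hard-coded absolute file paths in Python. The helper name `path_from_script` and the `data/input.csv` layout are hypothetical and used only for illustration; follow the structure given in each deliverable's instructions.

```python
from pathlib import Path


def path_from_script(script_file, *parts):
    """Return a path built relative to the folder that contains script_file."""
    return Path(script_file).resolve().parent.joinpath(*parts)


# __file__ is the running script; the fallback name covers interactive
# sessions where __file__ is not defined.
this_script = globals().get("__file__", "analysis.py")

# Hypothetical layout: data/input.csv lives next to the script.
input_path = path_from_script(this_script, "data", "input.csv")
print(input_path)
```

The script never contains a machine-specific path such as a home directory, so it runs unchanged wherever the repository is cloned.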

Submitting your work

GitHub classroom

We use GitHub Classroom for all class deliverables: assignments, labs, and the final project. Submitting your work is the process of committing your files and results to your local repository and then pushing it to GitHub.

Important

You must submit everything through GitHub!

Use the final-submission commit message

When you are ready for your work to be evaluated, you MUST use the commit message final-submission. If you do not use the commit message final-submission we will assume that you are still working in the repository and we will only grade what is present. By submitting that commit message, you are stating that you are finished with the assignment and are ready for feedback.

Important

Make sure you understand the difference between a git commit and a push, and that you push your repository successfully to GitHub.

If you need to make a correction after your final-submission commit and the submission deadline has not yet passed, you can amend your previous commit. See amending a commit for instructions. Do not change the commit message; it should continue to say “final-submission” after the amend.

Warning

No further edits to your GitHub repository are allowed after using the final-submission commit message.

Important

We will use commit datetime and commit message to assess lateness.
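To make the difference between committing and pushing concrete, here is a minimal Python sketch of a self-check you could run inside your repository before the deadline. It is not a tool provided by the course: the function names are hypothetical, it assumes the commit subject must match final-submission exactly, and it assumes your branch has an upstream branch configured on GitHub.

```python
import subprocess

FINAL_MESSAGE = "final-submission"


def run_git(*args):
    """Run a git command in the current repository and return its trimmed output."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def is_final_message(subject):
    """True if a commit subject line is exactly the required message."""
    return subject.strip() == FINAL_MESSAGE


def submission_status():
    """Report whether the latest commit is marked final and has been pushed."""
    # Subject line of the most recent local commit.
    subject = run_git("log", "-1", "--pretty=%s")
    # Number of commits that exist locally but not on the upstream branch.
    unpushed = int(run_git("rev-list", "--count", "@{u}..HEAD"))
    return {"final_message": is_final_message(subject), "pushed": unpushed == 0}


# Usage, from inside your repository after committing and pushing:
#     print(submission_status())
```

If `pushed` comes back false, the commit exists only on your machine and has not reached GitHub.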

Late policy

In lieu of extensions, there is a tiered deduction scale if a deliverable is late. Late penalties only apply to labs and assignments.

We will assess exceptional circumstances on a case-by-case basis, and only if we are made aware before a deliverable’s deadline, not after.

  • A late penalty of 10% per day, up to 4 days, will be assessed for assignments and labs that are submitted with a final-submission commit message after the deadline. You may still submit a missed lab or assignment up until the last day of class (May 2) with a maximum possible grade of 60%.
  • Missed in-class quizzes cannot be made up and will receive a grade of zero.
  • Project deadlines are fixed and have no extensions or late penalty. A missed project deliverable will receive a grade of zero.
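The tiered deduction for labs and assignments can be sketched as follows. This is an illustration only: the function name is hypothetical, it assumes that any part of a day counts as a full day late, and it does not model the last-day-of-class cutoff.

```python
import math

PENALTY_PER_DAY = 0.10  # 10% per day late
MAX_PENALTY_DAYS = 4    # the penalty stops growing after 4 days


def max_possible_fraction(hours_late):
    """Fraction of full credit still available for a late lab or assignment.

    Assumption: any part of a day counts as a full day late.
    """
    if hours_late <= 0:
        return 1.0
    days_late = min(math.ceil(hours_late / 24), MAX_PENALTY_DAYS)
    return round(1.0 - PENALTY_PER_DAY * days_late, 2)


for hours in (0, 5, 30, 200):
    print(hours, max_possible_fraction(hours))
```

Under these assumptions, a submission that is 30 hours late counts as two days late, and anything four or more days late is capped at 60% of full credit.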

Other course policies

Attendance and punctuality

Attendance is mandatory and will be taken. Given the technical nature of this course, and the breadth of topics discussed, you are expected to attend each class, to complete all readings, and to participate actively in lectures, discussions and exercises. We understand there may be times you need to miss class. Please inform us in advance if you are not able to attend class for any reason. However, it is up to you to keep up.

Participation

We love participation. Read. Raise your hand. Ask questions. Make comments. Challenge us. Acknowledge us. If we speak for three hours to a silent classroom, it is a lot more boring and tiring for everyone.

Laptop and phone use

You must bring your laptop to class to work on labs. No phone use is allowed during lecture. You may use your laptop during lecture to take notes, but please refrain from other activities. We reserve the right to ask you to put your phones and laptops away. You may not use your computer or phone while your peers or guest speakers are presenting.

Communication and Slack Rules

  • All announcements will be posted on Canvas and Slack
  • Use Slack for any question you may have about the course, about assignments or any technical issue. This way everyone can learn from each other’s questions. We will be monitoring and providing answers on a regular basis. Make sure you understand what is allowed in Slack.
  • Individual emails containing any course question that is not personal will not be answered
  • Slack DMs are not to be used unless we DM you first, in which case you may respond to our message. Students may not initiate DMs.
  • Keep an eye on the questions posted in Slack. Use the search function. It’s very possible that we have already answered a question, and we reserve the right to point you to the syllabus, previous Slack messages, or another document containing the information requested
  • Assignment, lab and project questions will only be answered on Slack up to 12 hours before something is due

Open Door Policy

Please approach or get in touch with us if something is not working for you regarding the class, methods, etc. Our pledge to you is to provide the best learning experience possible. If you have any issue please do not wait until the last minute to speak with us. You will find that we are fair, reasonable, and flexible and we care deeply about your learning and success.

Academic Integrity

As a Jesuit, Catholic university, committed to the education of the whole person, Georgetown expects all members of the academic community, students and faculty, to strive for excellence in scholarship and in character. The University spells out the specific minimum standards for academic integrity in its Honor Code, as well as the procedures to be followed if academic dishonesty is suspected.

Over and above the honor code, in this course we will seek to create an engaged and passionate learning environment, characterized by respect and courtesy in both our discourse and our ways of paying attention to one another.

The code of academic integrity applies to all courses at Georgetown University. Please become familiar with the code. All students are expected to maintain the highest level of academic integrity throughout the course of the semester. Please note that acts of academic dishonesty during the course will be prosecuted and harsh penalties may be sought for such acts. Students are responsible for knowing what acts constitute academic dishonesty. The code may be found at https://bulletin.georgetown.edu/regulations/honor/.

Danger

We have a ZERO TOLERANCE POLICY and students found to be in violation will be reported and penalized. The consequences of any violation may include: additional points penalty, getting a grade of zero, automatically failing the course, and suspension or expulsion from the program.

Definition of collaboration

In the spirit of fostering a collective and inclusive learning environment, we acknowledge that you will work and study with your peers. We also acknowledge that you use web resources (code examples specifically), and that in writing a program many of you will most likely use the same libraries, functions and other similar instructions in your scripts. However:

  • You must write your own code. This will be verified for every assignment against every submission, and any similarity greater than 60% between students on a given assignment will be considered to be unauthorized collaboration.
  • You must do your individual work in your own cloud resources. This will be verified for every assignment. We know the fingerprint of your cloud account and subscriptions and we can tell.

What is allowed

  • Collaborating with other students during in-class labs to facilitate collective learning
  • Using Slack for helping one-another as long as:
    • You do not provide answers directly but only discuss potential approaches
    • You only share up to a few lines of code for everyone’s benefit for the resolution of a specific question or issue
  • Using anything (code, resources, tips, approaches, etc.) provided by the instructional team

What is forbidden

The following actions are not permitted in any way and are considered a violation of academic integrity:

  • Copying and sharing code between students in individual assignments or across groups in the group project
  • Sharing anything on any individual assignment
  • Using code snippets found online (Stack Overflow, etc.) without citing the source in a comment
  • Plagiarism of any kind
  • Using any Generative Artificial Intelligence tool without acknowledging it
  • Using someone else’s cloud resources
  • Making your private GitHub repos public
  • Sharing or posting any course materials anywhere
  • Faking or tampering with git commit dates or messages

Use of Generative AI tools

We recognize the recent availability of very powerful generative AI tools like ChatGPT, GitHub Copilot, and others. These tools can help us be more effective and we embrace their use.

Important

You are allowed to use GAI tools in a non-substantial way.

What does non-substantial mean?

It means that whatever is generated by GAI must not make up the majority of the work you do.

Any use of these tools must abide by the following rules:

  • You must comment which code blocks were generated by GAI
  • You must note which written sections were generated by GAI
  • If you used a prompt to ask the GAI tool to do something, you must include it
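One possible way to satisfy these rules in a Python script is sketched below. The comment markers, the placeholder tool name, and the example prompt are assumptions made for illustration; they are not a required format.

```python
# --- BEGIN GAI-generated block ---
# Tool: <name of the GAI tool you used>
# Prompt: "Write a Python function that counts how many times each word
#          appears in a string."
from collections import Counter


def word_counts(text):
    """Count occurrences of each lowercased, whitespace-separated word."""
    return Counter(text.lower().split())
# --- END GAI-generated block ---


# Written by the student: apply the function to a small sample.
print(word_counts("big data is big"))
```

Marking both the start and the end of the generated block makes it clear which lines you wrote yourself.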
Warning

Any deviation from these rules is considered a violation of academic integrity and will be acted on.

Georgetown University resources and policies

Georgetown University’s Plagiarism Policy

Plagiarism or academic dishonesty in any form will not be tolerated and may result in a failing grade. All Honor Code violations will be submitted to the Honor Council.

Academic integrity is central to the learning and teaching process. Students are expected to conduct themselves in a manner that will contribute to the maintenance of academic integrity by making all reasonable efforts to prevent the occurrence of academic dishonesty. Academic dishonesty includes (but is not limited to) obtaining or giving aid on an examination, having unauthorized prior knowledge of an examination, doing work for another student, and plagiarism of all types, including copying code.

Plagiarism is the intentional or unintentional presentation of another person’s idea or product as one’s own. Plagiarism includes, but is not limited to the following: copying verbatim all or part of another’s written work; using phrases, charts, figures, illustrations, code, or mathematical/scientific solutions without citing the source; paraphrasing ideas, conclusions, or research without citing the source; and using all or part of a literary plot, poem, film, musical score, or other artistic product without attributing the work to its creator. Students can avoid unintentional plagiarism by following carefully accepted scholarly practices. Notes taken for papers and research projects should accurately record sources cited, quoted, paraphrased, or summarized sources and articles should be acknowledged in footnotes.

Honor System

All students are expected to maintain the highest standards of academic and personal integrity in pursuit of their education at Georgetown. Academic dishonesty, including plagiarism, in any form, is a serious offense, and students found in violation are subject to academic penalties that include, but are not limited to, failure of the course, termination from the program, and revocation of degrees already conferred. All students are held to the Georgetown University Honor Code. For more information about the Honor Code, see http://gervaseprograms.georgetown.edu/honor/

Academic Integrity and Courtesy

As a Jesuit, Catholic university committed to the education of the whole person, Georgetown expects all members of the academic community, students and faculty, to strive for excellence in scholarship and in character. The University spells out the specific minimum standards for academic integrity in its Honor Code and the procedures to be followed if academic dishonesty is suspected. Over and above the honor code, in this course, we will seek to create an engaged and passionate learning environment characterized by respect and courtesy in both our discourse and our ways of paying attention to one another.

Academic Resource Center

The Academic Resource Center (ARC) is the campus office responsible for reviewing medical documentation and determining reasonable accommodations for students with disabilities. You can reach the ARC via email at arc@georgetown.edu.

Counseling and Psychiatric Services (CAPS)

As Georgetown faculty, you are among the most important individuals in some of the students’ lives. They may turn to you when they are struggling and in times of need, or you may be one of the first to notice when they are distressed.

The CAPS website has tips for faculty on how to deal with struggling or distressed students. To reach CAPS, call 202.687.6985; after hours, call (833) 960-3006 to reach Fonemed, a telehealth service; individuals may ask for the on-call CAPS clinician.

Emergency Preparedness and HOYAlert

We encourage all faculty to become familiar with Georgetown’s Office of Emergency Management and sign up for HOYAlert to receive important safety and University operating status updates. Faculty teaching at the Georgetown Downtown campus might also want to sign up for AlertDC to obtain safety and traffic updates.

Office of Institutional Compliance and Ethics

The Office of Institutional Compliance and Ethics supports and coordinates many compliance-related activities the University undertakes. With the endorsement and assistance of the University’s senior leadership, this Office is responsible for leading the development, implementation, and operation of the Georgetown Institutional Compliance and Ethics Program.

Office of Institutional Diversity, Equity and Affirmative Action (IDEAA)

The mission of IDEAA is to promote a deep understanding and appreciation among the diverse members of the University community to result in justice and equality in educational, employment, and contracting opportunities, as well as to lead efforts to create an inclusive academic and work environment.

Title IX/Sexual Misconduct

Georgetown University and its faculty are committed to supporting survivors and those impacted by sexual misconduct, which includes sexual assault, sexual harassment, relationship violence, and stalking. Georgetown requires faculty members, unless otherwise designated as confidential, to report all disclosures of sexual misconduct to the University Title IX Coordinator or a Deputy Title IX Coordinator. If you disclose an incident of sexual misconduct to a professor in or outside of the classroom (except disclosures in papers), that faculty member must report the incident to the Title IX Coordinator or Deputy Title IX Coordinator. The coordinator will, in turn, reach out to the student to provide support, resources, and the option to meet. Please note that the student is not required to meet with the Title IX Coordinator. More information about reporting options and resources can be found on the Sexual Misconduct Website.

If you would prefer to speak to someone confidentially, Georgetown has a number of fully confidential professional resources that can provide support and assistance. These resources include:

  • Health Education Services for Sexual Assault Response and Prevention: confidential email sarp@georgetown.edu
  • Counseling and Psychiatric Services (CAPS): 202.687.6985 or after hours, call (833) 960-3006 to reach Fonemed, a telehealth service; individuals may ask for the on-call CAPS clinician

Title IX Sexual Misconduct Statement

Please know that as faculty members, we are committed to supporting survivors of sexual misconduct, including relationship violence and sexual assault. However, university policy also requires us to report any disclosures about sexual misconduct to the Title IX Coordinator, whose role is to coordinate the University’s response to sexual misconduct.

Georgetown has a number of fully confidential professional resources who can provide support and assistance to survivors of sexual assault and other forms of sexual misconduct. These resources include:

  • Getting Help
  • Jen Schweer, MA, LPC
    Associate Director of Health Education Services for Sexual Assault Response and Prevention
    (202) 687-032
    jls242@georgetown.edu
  • Erica Shirley, Trauma Specialist
    Counseling and Psychiatric Services (CAPS)
    (202) 687-6985
    els54@georgetown.edu

Threat Assessment

Georgetown University established its Threat Assessment program as part of an extensive emergency planning initiative. The program at Georgetown has been developed and implemented to meet current best practices and national standards for hazard planning in higher education institutions and workplace violence prevention.

Special Accommodations

If you believe that you have a disability that will affect your performance in this class, don’t hesitate to get in touch with the Academic Resource Center for further information. The center is located in the Leavey Center, Suite 335. The Academic Resource Center is the campus office responsible for reviewing documentation provided by students with disabilities and determining reasonable accommodations according to the Americans with Disabilities Act (ADA) and University policies.