Lecture 2

Course overview. Introduction to big data concepts. The Cloud.

Amit Arora, Jeff Jacobs

Georgetown University

Fall 2025

Look back

  • Great use of Slack
  • Big data definition
  • Used the shell in Linux on a virtual machine through Codespaces

Agenda and Goals for Today

  • Quick tour of the cloud services that are used in the course
  • Extended Lab:
    • Setting up AWS accounts
    • Starting VMs in the cloud and connecting to them

Glossary

Term | Definition
Local | Your current workstation (laptop, desktop, etc.), wherever you start the terminal/console application.
Remote | Any machine you connect to via ssh or other means.

Working on a single machine

You are most likely using traditional data analysis tools, which are single-threaded and run on a single machine.

The BIG DATA problem

Is Moore’s Law Dead?

New Hardware

Need

  • The demand for data processing will not be met by relying on the same technology.
  • The key to modern data processing is new semiconductors
    • Not just squeezing more transistors per area
    • Need new compute architectures that are built and optimized for specialized functions
  • Specialized edge hardware for Edge Computing
  • Many declare Moore’s Law to be broken or no longer valid, but in reality the law is not broken; the limiting factor is heat.

What

  • Graphics Processing Units (GPUs)

  • Field-Programmable Gate Arrays (FPGAs)

  • Data Processing Units (DPUs)

  • Photonic computing

So we can’t store or process the data on a single machine. What do we do?

We distribute

More CPUs, more memory, more storage!
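
To make the idea of distributing concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the course's method: worker processes on one machine stand in for separate machines, and the function names (`count_words`, `split_into_chunks`, `distributed_word_count`) and the sample data are hypothetical. The pattern is to split the data into chunks, process each chunk independently, and then combine the partial results.

```python
from concurrent.futures import ProcessPoolExecutor


def count_words(chunk):
    """Count the words in one chunk (a list of text lines)."""
    return sum(len(line.split()) for line in chunk)


def split_into_chunks(lines, n_chunks):
    """Split a list of lines into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def distributed_word_count(lines, n_workers=4):
    """Count words by giving each worker process its own chunk."""
    chunks = split_into_chunks(lines, n_workers)
    # Each worker handles one chunk; here the workers are local processes,
    # standing in for separate machines (an assumption of this sketch).
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_counts)


if __name__ == "__main__":
    lines = ["big data needs more than one machine"] * 1000  # made-up data
    single = count_words(lines)               # one process does all the work
    parallel = distributed_word_count(lines)  # work is split across workers
    print(single, parallel)
```

Both approaches return the same count; the difference is that the second one can use more CPUs at once. Real distributed systems add the harder parts that this sketch leaves out, such as moving data between machines and recovering from failures.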

How do we do that?

Simple: we use the cloud

Cloud computing is a big deal!

Benefits

  • Provides access to low-cost computing

  • Costs are decreasing every year

  • Elastic

  • PaaS (Platform as a Service) works!

  • Many other benefits…

What is the claaaaaaawd (the cloud)

What is the cloud?

\kloud\ noun

the practice of storing regularly used computer data on multiple servers that can be accessed through the Internet

Using someone else’s computer(s)

NIST Definition

Service Models

The evolution of the Cloud

Yesterday | Today | Tomorrow
Limited number of tools and vendors | Many tools and vendors to work with | Integrated tools and vendors
One platform - few devices | Multiple platforms - many devices | Connected platforms and devices
Data is scarce but manageable | Overabundance of data | Data is used for important business decisions
IT has major influence and control | IT has limited influence and control | IT is strategic to the business
People only work when they are at work | People work wherever they want | People have access to what they need, wherever they are

What does the cloud look like?

Virtual Visit to a Microsoft Azure Data Center

Microsoft Azure Data Center in Boydton, VA

Loudoun County, VA is called “CLoudoun”

  • How data centers power VA’s Loudoun County: https://gcn.com/articles/2018/10/12/loudoun-county-data-centers.aspx

  • The heart of “The Cloud” is in Virginia: https://www.cbsnews.com/news/cloud-computing-loudoun-county-virginia/

  • CBS Sunday Morning Visits the Home of the Internet in Loudoun County: https://biz.loudoun.gov/2017/10/30/cbs-sunday-morning-visits-loudoun/

By a widely cited estimate, 70% of the world’s internet traffic passes through Loudoun County, VA

Time for Lab!