Module 7: Storage
Core AWS Services RECAP
- Networks
- Compute
- Database
- Directory Services
There are many services out there; the public cloud has to support all kinds of use cases and organizations. Don't be afraid of using the cloud: chances are there is a service that will help you.
The cloud is powerful because it has lots of options.
- Amazon Virtual Private Cloud (VPC) - Networks
- Amazon EC2 - Compute
- VMs → highest amount of control and responsibility, hardware virtualization
- Containers → OS virtualization, lightweight, focused on software applications and deployment of applications
- Serverless → kind of like PaaS, you focus only on your own code (no more thinking about infra), API calls, cheapest option (you only pay each time an API call runs)
- PaaS/Orchestration → focus on your code/application/website and stop thinking about infra; the platform deploys resources for you
- Storage
- Amazon S3 → object storage like GDrive
- great for static files (images, videos, things that don't change)
- each file is a full object; if you have to update something, you upload a full new copy
- Amazon EBS → block storage
- each file is chopped into blocks; for example, if each block is 512 KB, a 2 MB file will consume 4 blocks
- great for changing files
- VM/CT
- HDD/SSD
- has OS
- Example: the boot/root volume of an EC2 instance (the disk the OS is installed on)
- Amazon EFS → Network File System (NFS)
- great for high-scaling storage, file sharing with multiple devices
- non-root disk (not the one holding the OS)
- Example: a load balancer connected to 3 VMs; one VM breaks, but the user can just use another VM and it's like nothing happened, because all the VMs are connected to the same NFS
- Amazon S3 Glacier → Archival Storage
- mostly for archives, long-term storage
- cheapest price per GB
- not instant retrieval (takes a few minutes to a few hours), like a cassette tape
- sample use cases: logs and old records (e.g. 1-2 years after a student graduates, their info is moved to archival storage)
- Database
- Amazon Relational Database Service (Amazon RDS) → managed SQL DB
- managed service (focus on your data, the cloud manages the DB for you)
- SQL-type
- if you know how to manage your own DB, you can just go for compute
- good for vertical scaling (many entries, same columns)
- Amazon DynamoDB → NoSQL service
- think MongoDB, XML, JSON, key-value pairs
- good for horizontal scaling (many columns, entries have different key-value pairs)
- Amazon Redshift → Data Warehousing
- normally for big data, large volume
- analytics
- Amazon Aurora → Enterprise SQL
- SQL
- high performance
- clustering
- aggregation
- AWS Identity and Access Management (IAM) → Directory Services
- users, groups, roles, policies
- authentication and authorization
AWS Storage Options: Block Storage vs. Object Storage
- Block storage → you can change just one piece (block) of a file
- Object storage → the whole object has to be re-uploaded when it changes; object storage is for static files, like in GDrive
Amazon EBS (Elastic Block Store)
- enables you to create individual storage volumes and attach them to an Amazon EC2 Instance
- Amazon EBS offers block-level storage
- volumes are automatically replicated within its Availability Zone
- can be backed up automatically to Amazon S3 (object storage) through Snapshots
- uses:
- boot volumes and storage for Amazon Elastic Compute Cloud (Amazon EC2) instances (VMs)
- data storage with a file system (NTFS, exFAT, FAT32)
- database hosts
- Enterprise applications
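As a concrete illustration of creating a volume and attaching it to an instance, here is a minimal sketch using boto3 (the AWS SDK for Python); the region, Availability Zone, and instance ID are placeholders you would replace with your own:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Create a 20 GiB General Purpose SSD (gp3) volume in the instance's AZ.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=20,
    VolumeType="gp3",
)

# Wait until the volume is ready, then attach it to an existing instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",                 # non-root device name
)
```

After attaching, the guest OS still has to create a file system (e.g. NTFS or ext4) on the new device before it can store files.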
Amazon EBS Volume Types and Use Cases
Type of Storage | Solid State Drives (SSD) | Solid State Drives (SSD) | Hard Disk Drives (HDD) | Hard Disk Drives (HDD)
---|---|---|---|---
Volume Type | General Purpose | Provisioned IOPS | Throughput Optimized | Cold
Max Volume Size | 16 TiB | 16 TiB | 16 TiB | 16 TiB
Max IOPS/Volume | 16,000 | 64,000 | 500 | 250
Max Throughput/Volume | 250 MiB/s | 1,000 MiB/s | 500 MiB/s | 250 MiB/s
General Purpose (SSD) | Provisioned IOPS (SSD) | Throughput Optimized (HDD) | Cold (HDD)
---|---|---|---
recommended for most workloads | critical business applications that require sustained IOPS performance, or more than 16,000 IOPS or 250 MiB/s of throughput per volume | streaming workloads that require consistent, fast throughput at a low price | throughput-oriented storage for large volumes of infrequently accessed data
system boot volumes (holds the OS) | large database workloads | big data | scenarios where the lowest storage cost is important
virtual desktops | more for enterprise use | data warehouses | cannot be a boot volume (no OS)
low-latency interactive applications | high data transfer | log processing | -
development and test environments | - | cannot be a boot volume (no OS) | -
- boot volumes can only be SSD
Amazon EBS Features
- Snapshots - point-in-time snapshots, recreate a new volume at any time, like Time Machine or a save state
- Encryption - encrypted Amazon EBS Volumes, no additional cost
- expect that this may consume a bit more space
- may be a bit slower when you have encryption on
- there's no additional cost; enabling it is your choice and responsibility
- Elasticity - increase capacity, change to different types
- scale up
Regarding encryption responsibility:
- data at rest → customer is responsible
- data in transit → customer is responsible
- encryption is a feature and a choice to activate
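A minimal boto3 sketch tying the snapshot and encryption features together: take a point-in-time snapshot of a volume, then restore it into a new volume with encryption turned on. The volume ID and Availability Zone are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Point-in-time snapshot of an existing volume (placeholder volume ID).
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="backup before patching",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# Recreate the data as a new volume, this time encrypted (your choice, no extra charge).
restored = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    SnapshotId=snapshot["SnapshotId"],
    Encrypted=True,
    VolumeType="gp3",
)
print(restored["VolumeId"])
```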
Amazon Simple Storage Service (Amazon S3) - Object Storage
- data is stored as objects in buckets
- virtually unlimited storage, and a single object is limited to 5TB
- designed for 11 9s of durability
- 99.999999999% annual durability basically means that for every 10,000,000 objects stored, you can (on average) expect to lose a single object once every 10,000 years (i.e., a 99.999999999% chance your data will NOT be lost in a given year)
- basically, data loss due to the cloud provider's infrastructure is a non-concern because the chance is so low; it's more likely you'll lose a file to user error, application bugs, or malicious attacks than to the underlying cloud storage failing
- granular access to bucket and objects
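As an example of that granular access, here is a small boto3 sketch that attaches a bucket policy granting one IAM user read-only access to a bucket's objects; the bucket name, account ID, and user name are all made up for illustration:

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder bucket name

# Allow a single IAM user to GET objects; everything else stays denied by default.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadOnlyForOneUser",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:user/report-reader"},
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```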
Amazon S3 (object storage) Classes
Amazon S3 offers a range of object-level storage classes that are designed for different use cases:
- Standard → kind of like Google Drive
- Intelligent-Tiering
- Standard-Infrequent Access (Amazon S3 Standard-IA)
- One Zone-Infrequent Access (Amazon S3 One Zone-IA)
- Glacier
- Deep Archive
(as you move down the list, storage becomes cheaper but access becomes slower)
the idea is that the longer you don't use a file, the lower (cheaper) a class it can move to
infrequent access: you don't really use the file as much
one zone: data is kept in a single Availability Zone instead of multiple
Amazon S3 (object storage) bucket URLs (2 styles)
- virtual-hosted-style: https://bucket-name.s3.region.amazonaws.com/key-name
- path-style: https://s3.region.amazonaws.com/bucket-name/key-name
To upload your data:
- create a bucket in an AWS region
- upload almost any number of objects to the bucket
Data is redundantly stored in the region
- high durability: if one copy fails, you can just use another copy stored elsewhere
- backups, redundancy
Access the data anywhere
because it's offered as a service over the internet, you can reach it through:
- the AWS Management Console (web GUI)
- the AWS Command Line Interface (AWS CLI)
- the SDKs (see the sketch below)
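Putting the steps above together, a minimal boto3 sketch of the SDK route; the region, bucket name, and file names are placeholders (bucket names must be globally unique):

```python
import boto3

region = "ap-southeast-1"  # region is an assumption
s3 = boto3.client("s3", region_name=region)
bucket = "my-example-bucket-20250525"  # placeholder, globally unique

# 1. Create a bucket in an AWS Region.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": region},
)

# 2. Upload an object (the whole file becomes one object).
s3.upload_file("report.pdf", bucket, "reports/report.pdf")

# 3. Access it from anywhere with credentials, e.g. download it back.
s3.download_file(bucket, "reports/report.pdf", "report-copy.pdf")
```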
Common Use Cases and Scenarios
use cases
- storing application assets
- static web hosting
- backup and disaster recovery (DR)
- staging area for big data
- and so on...
mostly static data
- CCTV Footage
- application assets
- documents in the cloud
Amazon S3 Storage Pricing
to estimate the costs, consider the following:
- Storage class type
- Standard storage is designed for 11 9s of durability and 4 9s of availability (99.99% available; for scale: 90% availability means about 36.5 days of downtime per year, 99% about 3.6 days, 99.9% about 0.36 of a day, and 99.99% roughly 50 minutes per year; see the quick calculation after this list)
- S3 Standard-Infrequent Access (S3 Standard-IA) is designed for 11 9s of durability and 3 9s of availability
- Amount of storage
- the number and size of objects
- Requests
- the number and type of requests (GET/PUT/COPY)
- type of requests: different rates for GET requests than other requests
- when you request, you also do a data transfer
- Data Transfer
- pricing is based on the amount of data that is transferred out of the Amazon S3 Region
- data transfer into Amazon S3 is free, but you incur charges for data that is transferred out
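To make the availability figures above concrete, a quick back-of-the-envelope calculation in plain Python (numbers rounded):

```python
# Convert an availability percentage into a rough yearly downtime budget.
HOURS_PER_YEAR = 365 * 24  # 8,760

for availability in (0.90, 0.99, 0.999, 0.9999):
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{availability:.2%} available -> ~{downtime_hours:,.1f} hours of downtime per year")

# 90.00% available -> ~876.0 hours  (~36.5 days)
# 99.00% available -> ~87.6 hours   (~3.6 days)
# 99.90% available -> ~8.8 hours    (~0.37 of a day)
# 99.99% available -> ~0.9 hours    (~53 minutes)
```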
Amazon Elastic File System (Amazon EFS)
- file storage in the AWS cloud
- works well for big data and analytics, media processing workflows, content management, web serving, and home directories
- petabyte-scale, low-latency file system
- shared storage ← main purpose of using NFS
- elastic capacity → able to expand
- supports Network File System (NFS) ver 4.0 and 4.1 (NFSv4)
- NFS is good for file sharing or VMs with load balancing
- compatible with all Linux-based AMIs for Amazon EC2
Amazon EFS architecture
the VMs just mount onto the file system
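A rough boto3 sketch of setting that up: create the file system, then add a mount target in a VPC subnet so the VMs can NFS-mount it. The subnet and security group IDs are placeholders, and the mount command in the final comment is only an illustration of what each instance would run.

```python
import boto3

efs = boto3.client("efs", region_name="us-east-1")  # region is an assumption

# Create an elastic, shared file system.
fs = efs.create_file_system(
    CreationToken="shared-web-content",  # idempotency token
    PerformanceMode="generalPurpose",
)

# Expose it inside a subnet so EC2 instances there can mount it over NFSv4.
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",      # placeholder subnet
    SecurityGroups=["sg-0123456789abcdef0"],  # placeholder security group
)

# On each instance, the shared file system is then mounted like any NFS export, e.g.:
#   sudo mount -t nfs4 <file-system-id>.efs.us-east-1.amazonaws.com:/ /mnt/efs
```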
Amazon S3 Glacier (Archival)
representative of magnetic tape archival, long-term cold storage, and non-instant data retrieval
- is a data archiving service that is designed for security, durability, and an extremely low cost
- Amazon S3 Glacier is designed to provide 11 9s of durability for objects (the likelihood of data loss is close to zero)
- supports the encryption of data in transit and at rest through Secure Sockets Layer (SSL) or Transport Layer Security (TLS)
- the Vault Lock feature enforces compliance through a policy
- extremely low-cost design works well for long-term archiving
- provides 3 options for access to archives - expedited, standard, and bulk
- retrieval times range from a few minutes to several hours
- storage service for low-cost data archiving and long-term backup
- you can configure lifecycle archiving of Amazon S3 content to Amazon S3 Glacier
- retrieval options
- standard: 3-5 hours
- bulk: 5-12 hours
- expedited: 1-5 min
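A hedged boto3 example of asking for a retrieval of an object that was archived from S3 into Glacier; the bucket and key are placeholders, and the Tier field picks one of the three options listed above:

```python
import boto3

s3 = boto3.client("s3")

# Restore a temporary, readable copy of an archived object for 2 days.
s3.restore_object(
    Bucket="my-example-bucket",       # placeholder bucket
    Key="archives/2023-logs.tar.gz",  # placeholder key
    RestoreRequest={
        "Days": 2,
        "GlacierJobParameters": {"Tier": "Standard"},  # or "Expedited" / "Bulk"
    },
)

# The copy becomes readable only after the retrieval window for the chosen tier
# (minutes for Expedited, hours for Standard/Bulk), then expires after 2 days.
```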
Amazon S3 Glacier Use Cases
- media asset archiving
- healthcare information archiving
- regulatory and compliance archiving
- scientific data archiving
- digital preservation
- magnetic tape replacement
Using Amazon S3 Glacier
- you can move objects in a bucket to Glacier (e.g. through lifecycle policies)
- you can use the SDKs to make it programmable (see the sketch after this list)
- also use REST APIs
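A minimal sketch of the SDK route with boto3: create a vault and upload a single archive into it. The vault name, file name, and description are made up for illustration; "-" for accountId means the caller's own account.

```python
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")  # region is an assumption

# A vault is the container that holds your archives.
glacier.create_vault(accountId="-", vaultName="compliance-archive")

# Upload one archive; keep the returned archive ID, it is the only handle
# you get for retrieving or deleting the archive later.
with open("2023-records.zip", "rb") as f:
    archive = glacier.upload_archive(
        accountId="-",
        vaultName="compliance-archive",
        archiveDescription="2023 records",  # placeholder description
        body=f,
    )
print(archive["archiveId"])
```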
Amazon S3 Glacier Lifecycle Policies
- Amazon S3 lifecycle policies enable you to delete or move objects based on age
- it can be in standard, then can move to infrequent access, then glacier, etc.
- you can configure it to be deleted
- lifecycle → data from its birth to its death based on the policy you set up
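A sketch of such a lifecycle policy with boto3; the bucket name, prefix, and day counts are illustrative only:

```python
import boto3

s3 = boto3.client("s3")

# Age out objects under logs/: Standard -> Standard-IA -> Glacier -> deleted.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 1825},  # delete roughly 5 years after creation
            }
        ]
    },
)
```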
Comparison: Amazon S3 vs Glacier
- both are object storages
- if you're doing a lot of upload and download don't put it in Glacier
 | Amazon S3 | Amazon S3 Glacier
---|---|---
Data Volume | no limit | no limit
Average Latency | ms | minutes to hours
Max Item Size | 5 TB | 40 TB (per archive)
Cost per GB per month | higher | lower
Billed requests | PUT, COPY, POST, LIST, GET | UPLOAD and retrieval (note: you can't get the file immediately; it is restored to S3 first)
Retrieval pricing | ¢ per request | ¢¢ per request and per GB retrieved
Server-side encryption for Amazon S3 and Glacier
- for Amazon S3, the application must enable (request) server-side encryption; Amazon S3 Glacier encrypts data at rest by default
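A small boto3 sketch of the two usual ways to turn this on for S3, per object and as a bucket default; bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder bucket name

# Option 1: request server-side encryption (SSE-S3) for a single upload.
with open("confidential.pdf", "rb") as f:
    s3.put_object(
        Bucket=bucket,
        Key="reports/confidential.pdf",
        Body=f,
        ServerSideEncryption="AES256",  # S3 manages the keys
    )

# Option 2: make AES-256 the default for everything written to the bucket.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)
```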
Security with Amazon S3 Glacier
how you secure your storage
- control access with IAM
- Amazon S3 Glacier encrypts your data with AES-256
- Amazon S3 Glacier manages your keys for you