

Simple Storage Service - S3

Definition

  • Storage service - a backbone service of AWS that provides secure, durable, highly scalable object storage.
  • Easy to use object storage with a simple web service interface
    • Object-based storage
    • A safe place to store your files
    • The data is spread across multiple devices and facilities
    • File size: 0 bytes to 5 TB
    • Unlimited storage
    • Files are saved in buckets
    • Bucket names are global (like domain names)

Buckets

  • Web folder (a flat folder) - containers for objects, root namespace for AWS S3.
  • The name must be unique across all AWS accounts (like domains).

Basics

Object-based storage

  • Objects are the entities for files stored in S3
  • Size: up to 5TB
  • Store data as binary
  • An object has:
    • key: a unique identifier within the bucket (like a file name)
    • data: binary data
    • metadata: information about the object
      • system metadata: auto-generated by S3
      • user metadata (optional): supplied by the user
    • versionId: for versioning
    • subresources:
      • ACL: access control list
      • torrent: BitTorrent support information for the object

Data Consistency

  • Data is accessed through a REST API using URLs
  • No locking for concurrent writes to the same key → the request with the latest timestamp wins
  • Original model: read-after-write consistency for PUTs of new objects, eventual consistency for PUTs to an existing object and for DELETEs
  • Since December 2020: strong read-after-write consistency for all PUTs and DELETEs

Access Control

  • S3 is secure by default (public access is turned off)
  • Access Control Lists (ACLs), mainly used for:
    • Logging for buckets
    • Static website hosting
  • Bucket policies can grant or restrict access by:
    • IP range
    • AWS account
    • Objects with prefixes
  • IAM
  • Bucket policies are the recommended access control mechanism
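
As an illustration of the bucket policy conditions listed above (IP range and object prefix), here is a minimal sketch that builds a policy document as a Python dictionary. The bucket name, prefix, CIDR range, and the helper `build_read_policy` are all hypothetical; only the JSON shape follows the standard IAM policy grammar.

```python
import json


def build_read_policy(bucket, prefix, cidr):
    """Build a bucket policy that allows GetObject on keys under `prefix`,
    but only for requests coming from the `cidr` IP range."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromIpRange",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # Restrict to objects whose key starts with the prefix.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
                # Restrict to callers inside the given IP range.
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }


# Hypothetical values, chosen only for the example.
policy = build_read_policy("example-bucket", "public/", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```

The printed JSON is what would be attached to the bucket; restricting by AWS account instead would mean replacing `"Principal": "*"` with a specific account principal.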

Advanced Features

Prefixes and Delimiters

  • S3 uses a flat structure in a bucket
  • Use prefixes and delimiters in key names to emulate a file and folder hierarchy
  • Not a file system
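
To make the idea concrete, the sketch below emulates one "folder level" over a flat list of keys, in the spirit of listing with a prefix and a delimiter. It is a local, in-memory illustration and not the S3 API; the function `list_level` and the sample keys are assumptions made for the example.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) for one level of a flat key space.

    Keys that contain the delimiter after the prefix are rolled up into a
    single "common prefix", which is what looks like a sub-folder.
    """
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something deeper exists: report only the "folder" name.
            common.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


keys = ["logs/2024/app.log", "logs/2024/db.log", "logs/readme.txt", "index.html"]
print(list_level(keys))           # (['index.html'], ['logs/'])
print(list_level(keys, "logs/"))  # (['logs/readme.txt'], ['logs/2024/'])
```

The keys themselves never change; the hierarchy exists only in how they are grouped at listing time.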

Storage Classes

| Name | Durability | Availability | Use cases |
| --- | --- | --- | --- |
| STANDARD | 99.999999999% | 99.99% | Short-term or long-term storage of frequently accessed data |
| STANDARD_IA | 99.999999999% | 99.9% | Long-lived, infrequently accessed data that is stored for longer than 30 days |
| RRS | 99.99% | 99.99% | Data that can be easily reproduced (thumbnails, ...) |
| GLACIER | 99.999999999% | 99.99% (after restore) | Archives. Not available for real-time access: you must first restore archived objects before you can access them. |

Object Lifecycle

  • Automated tiering: lifecycle rules move objects to cheaper storage classes as they age, or expire them
  • Should be used to reduce cost
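
A lifecycle configuration is a small declarative document. The sketch below builds one as a Python dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration` (as its `LifecycleConfiguration` argument). The helper name, the prefix, and the day thresholds are hypothetical values chosen for illustration.

```python
def build_lifecycle_config(prefix, ia_after_days=30, glacier_after_days=90,
                           expire_after_days=365):
    """Build a lifecycle configuration that tiers objects under `prefix`
    to STANDARD_IA, then GLACIER, and finally expires them."""
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical rule: logs go to IA after 30 days, Glacier after 90,
# and are deleted after one year.
config = build_lifecycle_config("logs/")
print(config["Rules"][0]["Transitions"])
```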

Encryption

Server-side encryption

  • SSE-S3 (AWS-Managed Keys): "check-box-style" encryption; AWS handles the management and protection of the key. A new key is issued monthly.
  • SSE-KMS (AWS KMS Keys): AWS handles management and protection of the keys you control in KMS, and provides auditing (a log of who used which key)
  • SSE-C (Customer-Provided Keys): S3 encrypts/decrypts data using keys that you provide and manage yourself

Client-side encryption

  • AWS KMS key
  • Client-side master key
For maximum simplicity and ease of use, use SSE-S3 or SSE-KMS

Versioning

  • Protects data against accidental or malicious deletion
  • Once enabled, versioning cannot be removed; it can only be suspended
  • Bucket-level setting
  • Each version of an object is saved under its own version ID
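
The way versioning protects against deletion can be sketched with a toy in-memory model. This is not the S3 API: the class `VersionedBucket`, its methods, and the `v1`, `v2`, ... IDs are hypothetical and only illustrate that a plain delete adds a delete marker instead of destroying earlier versions.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (illustration only)."""

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, data)
        self._ids = itertools.count(1)

    def put(self, key, data):
        """Store a new version and return its version ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        """A plain delete only adds a delete marker (data is None)."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if an ID is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)      # missing, or hidden by a delete marker
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))

    def delete_version(self, key, version_id):
        """Permanently remove one version (or one delete marker)."""
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v[0] != version_id
        ]


bucket = VersionedBucket()
v1 = bucket.put("report.txt", b"draft")
v2 = bucket.put("report.txt", b"final")
marker = bucket.delete("report.txt")       # object now appears deleted
print(bucket.get("report.txt", v1))        # older version is still readable
bucket.delete_version("report.txt", marker)
print(bucket.get("report.txt"))            # removing the marker restores it
```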

MFA Delete

  • Additional authentication prevents accidental or malicious deletion
  • Can only be enabled by the root account

Pre-Signed URLs

  • Use the owner's security credentials to grant time-limited permission to access objects
  • Protect against "content scraping."

Multi-part upload

  • Used for large files
  • Three-step process
    • Initiation
    • Uploading the parts
    • Completion
  • Should be used for objects larger than 100 MB
  • Must be used for objects larger than 5 GB
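
The "uploading the parts" step starts with deciding how to cut the file. The sketch below plans the byte ranges for each part; it does no network I/O. The function `plan_parts` and the 8 MiB default are assumptions for the example, while the 5 MiB minimum part size (for all parts except the last) and the 10,000-part maximum are S3 limits.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # every part except the last must be >= 5 MiB
MAX_PARTS = 10_000                # a multipart upload has at most 10,000 parts


def plan_parts(total_size, part_size=8 * 1024 * 1024):
    """Return a list of (part_number, offset, length) covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    number = 1                    # part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts


MIB = 1024 * 1024
for part in plan_parts(20 * MIB):
    print(part)   # three parts: 8 MiB, 8 MiB, and a final 4 MiB part
```

Each tuple corresponds to one part upload; the completion step then lists the parts in order so S3 can assemble the object.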

Cross-Region Replication

  • Asynchronously replicate to a target bucket in another region
  • Versioning must be turned on
  • An IAM policy must give Amazon S3 permission to replicate objects on your behalf
  • Only new objects (created after activation) are replicated; delete markers are replicated too
  • Deleting a delete marker is not replicated between buckets

Events

  • S3 events can trigger actions through:
    • SNS
    • SQS
    • Lambda
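
For the Lambda case, the function receives a JSON event with one record per S3 action. The sketch below pulls out the event name, bucket, and key from such an event. The handler body and the sample event are illustrative assumptions (the sample is trimmed to the fields used here); object keys arrive URL-encoded, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Summarize each S3 record in the event as a small dictionary."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        results.append({
            "event": record["eventName"],
            "bucket": s3["bucket"]["name"],
            # Keys are URL-encoded in notifications ("+" stands for a space).
            "key": unquote_plus(s3["object"]["key"]),
        })
    return results


# Trimmed, hypothetical event for local experimentation.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "uploads/my+photo.jpg", "size": 1024},
            },
        }
    ]
}
print(handler(sample_event))
```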

Logging

  • Off by default
  • Should be used with a prefix for the log objects (/logs, bucketname/logs)
  • Information
    • Requester account
    • Bucket name
    • Request time
    • Action (GET, PUT, LIST)
    • Response status, error code

Best practices

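  • Use for hybrid IT environments and applications (data in on-premises file systems, database backups, ...)
  • Use as bulk "blob" storage for data
  • If the request rate exceeds 100 requests per second, add a hash prefix to key names to spread the load (older guidance; S3 now supports at least 3,500 write and 5,500 read requests per second per prefix)
  • Static website hosting should use CloudFront as a cache layer
  • Website endpoint domain: bucket-name.s3-website-region.amazonaws.com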
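
The hash-prefix advice above can be sketched as follows: a short hash of the key is prepended so that keys that would otherwise sort together (for example, date-based names) are spread across the key space. The function `hashed_key`, the choice of SHA-256, and the four-character length are assumptions for illustration, not an S3 requirement.

```python
import hashlib


def hashed_key(key, length=4):
    """Prefix `key` with the first `length` hex characters of its hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:length]}/{key}"


# Sequential, date-based names end up under unrelated prefixes.
for name in ["2024-01-01/photo1.jpg", "2024-01-01/photo2.jpg"]:
    print(hashed_key(name))
```

Because the prefix is derived from the key itself, a reader can recompute it from the original name and does not need a lookup table.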

Glacier

  • Extremely low-cost storage service that provides durable, secure, and flexible storage for data archiving and online backup.
  • Retrieval takes three to five hours
  • 99.999999999% durability

Archives

  • Data is stored in archives
  • An archive can hold up to 40 TB, and the number of archives is unlimited
  • Assigned a unique archive ID at the time of creation
  • Automatically encrypted and immutable

Vaults

  • Vaults are containers for archives
  • Each AWS account can have up to 1,000 vaults

Vault Locks

  • Enforce compliance controls with a vault lock policy
  • Once locked, the policy can no longer be changed

Data Retrieval

  • Up to 5% of stored data can be retrieved for free each month

S3 vs. Glacier

| | S3 | Glacier |
| --- | --- | --- |
| Data size | 5 TB object | 40 TB archive |
| Identifier | friendly key names | system-generated IDs |
| Encryption | encrypted at rest | automatically encrypted |

Review questions: 90% → PASSED!

Mindmap

AWS Storage Mindmap