S3

Simple Storage Service - S3

Definition

  • Storage service - a backbone service of AWS that provides secure, durable, highly scalable storage.

  • Easy-to-use object storage with a simple web service interface

    • Object-based storage

    • A safe place to store your files

    • The data is spread across multiple devices and facilities

    • File size: 0 bytes to 5 TB

    • Unlimited storage

    • Files are stored in buckets

    • Bucket names share a global namespace (like domain names)

Buckets

  • A flat web folder - a container for objects and the root namespace in S3.

  • The name must be unique across all AWS accounts (like domains).

Basics

Object-based storage

  • Objects are the entities for files stored in S3

  • Size: up to 5 TB

  • Data is stored as binary

  • Object has:

    • key: a unique identifier within the bucket (like a file name)

    • data: binary data

    • metadata: information about the object

      • system metadata: auto-generated by S3

      • user metadata (optional): generated by the user

    • versionId: for versioning

    • subresources:

      • ACL: access control list

      • torrent: BitTorrent information for the object

Data Consistency

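  • Data can be accessed through the REST API using URLs:

    • No object locking: when concurrent requests write to the same key, the request with the latest timestamp wins

    • Read-after-write consistency for PUTs of new objects

    • Eventual consistency for PUTs that overwrite an existing object and for DELETEs (the original model; since December 2020, S3 is strongly consistent for all PUTs and DELETEs)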

Access Control

  • S3 is secure by default (public access is turned off)

  • Access Control Lists (ACLs), used for:

    • Logging for buckets

    • Hosting static websites

  • Bucket policies, which can scope access by:

    • IP range

    • AWS account

    • Objects with prefixes

  • IAM

  • Bucket policies are the recommended access control mechanism
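A bucket policy combining two of the scopes listed above (an IP range and an object prefix) might look like the following minimal sketch. The bucket name, prefix, CIDR block, and `Sid` are hypothetical placeholders. The JSON layout follows the standard IAM policy grammar.

```python
import json


def read_only_policy(bucket, prefix, cidr):
    """Build a bucket policy allowing GetObject on one prefix from one IP range."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromIpRange",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # Objects whose key starts with the prefix
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
                # Only requests coming from this IP range
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }


# Hypothetical values, for illustration only
policy = read_only_policy("example-bucket", "public/", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```

To scope by AWS account instead, the `Principal` would name that account's ARN rather than `"*"`.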

Advanced Features

Prefixes and Delimiters

  • S3 uses a flat structure in a bucket

  • Use prefixes and delimiters in key names to emulate a file and folder hierarchy

  • Not a file system
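To show how a flat key space can still look like folders, here is a small in-memory sketch of a prefix + delimiter listing. It imitates how an S3 list request rolls keys up into "common prefixes", but it is not the S3 API. The function name and sample keys are made up for illustration.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Emulate a prefix + delimiter listing over a flat key space.

    Returns (objects, common_prefixes): the keys directly "inside" the
    prefix, and the rolled-up "sub-folder" prefixes below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Group everything up to and including the first delimiter
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = ["logs/2017/01.txt", "logs/2017/02.txt", "logs/readme.txt", "photo.jpg"]
print(list_keys(keys))                  # (['photo.jpg'], ['logs/'])
print(list_keys(keys, prefix="logs/"))  # (['logs/readme.txt'], ['logs/2017/'])
```

The bucket only ever stores the four full keys. The "folders" exist only in how the listing is grouped.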

Storage Classes

| Name | Durability | Availability | Use cases |
|---|---|---|---|
| STANDARD | 99.999999999% (11 nines) | 99.99% | Short-term or long-term storage of frequently accessed data |
| STANDARD_IA | 99.999999999% (11 nines) | 99.9% | Long-lived, less frequently accessed data (stored for longer than 30 days) |
| RRS | 99.99% | 99.99% | Data that can be easily reproduced (thumbnails, ...) |
| GLACIER | 99.999999999% (11 nines) | 99.99% (after restore) | Archives. Not available for real-time access: you must first restore archived objects before you can access them. |
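To make the durability figures concrete, the sketch below treats "N nines" of durability as an annual per-object loss probability of 10^-N. That reading is an assumption about what the percentages mean. The helper names are invented for this example.

```python
from fractions import Fraction


def annual_loss_probability(nines):
    """Per-object annual loss probability for a durability of `nines` nines."""
    return Fraction(1, 10 ** nines)


def years_per_lost_object(object_count, nines):
    """Average number of years between single-object losses."""
    expected_losses_per_year = object_count * annual_loss_probability(nines)
    return 1 / expected_losses_per_year


print(years_per_lost_object(10_000, 11))  # 10000000 (11 nines)
print(years_per_lost_object(10_000, 4))   # 1 (4 nines, e.g. RRS)
```

With 10,000 objects, eleven nines means roughly one lost object every ten million years. Four nines means roughly one lost object per year.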

Object Lifecycle

  • Automated tiering: objects move to cheaper storage classes as they age

  • Should be used to reduce cost
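A lifecycle rule is a list of age-based transitions plus an optional expiration. The sketch below uses the field layout of an S3 lifecycle rule. The rule ID, prefix, and day counts are hypothetical. `storage_class_at` is a made-up helper that only evaluates the rule locally.

```python
# Hypothetical rule: the ID, prefix, and day counts are illustrative only
lifecycle_rule = {
    "ID": "archive-logs",
    "Filter": {"Prefix": "logs/"},
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 365},
}


def storage_class_at(age_days, rule):
    """Return the storage class for an object of this age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    # Apply transitions in order of age; the last one reached wins
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (0, 45, 120, 400):
    print(age, storage_class_at(age, lifecycle_rule))
```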

Encryption

Server-side encryption

  • SSE-S3 (AWS-Managed Keys): "check-box-style" encryption in which AWS handles the management and protection of the keys. A new master key is issued at least monthly.

  • SSE-KMS (AWS KMS Keys): AWS handles key management and protection while you control the keys in KMS; also provides auditing of key usage (access logging)

  • SSE-C (Customer-Provided Keys): S3 encrypts and decrypts data using keys that you supply with each request

Client-side encryption

  • AWS KMS key

  • Client-side master key

For maximum simplicity and ease of use, use SSE-S3 or SSE-KMS
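On the wire, the three server-side modes differ mainly in the request headers sent with a PUT. The sketch below builds those headers. The `sse_headers` helper and the all-zero key are illustrative assumptions. The header names are the documented `x-amz-server-side-encryption*` headers.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the request headers that select a server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_md5 = hashlib.md5(customer_key).digest()
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(key_md5).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-S3"))
print(sorted(sse_headers("SSE-C", customer_key=bytes(32))))  # placeholder key
```

Only SSE-C sends key material with the request, which is why the first two modes are simpler to operate.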

Versioning

  • Protects data against accidental or malicious deletion

  • Once enabled, versioning cannot be removed; it can only be suspended

  • Bucket-level setting

  • Each version of an object is identified by a version ID
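Versioning protects against deletion because a plain delete only adds a delete marker on top of the version stack. The toy in-memory model below shows that behaviour. It is not the S3 API. The class name and the `v1`, `v2` style IDs are invented for this sketch.

```python
class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        # key -> list of (version_id, data); data None means a delete marker
        self._versions = {}
        self._counter = 0

    def put(self, key, data):
        self._counter += 1
        version_id = f"v{self._counter}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        # A plain delete only adds a delete marker; older versions remain
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            return versions[-1][1] if versions else None
        for vid, data in versions:
            if vid == version_id:
                return data
        return None


bucket = VersionedBucket()
v1 = bucket.put("report.txt", b"draft")
v2 = bucket.put("report.txt", b"final")
bucket.delete("report.txt")
print(bucket.get("report.txt"))      # None: the latest version is a delete marker
print(bucket.get("report.txt", v1))  # b'draft'
```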

MFA Delete

  • Additional authentication prevents accidental or malicious deletion

  • Can only be enabled by the root account

Pre-Signed URLs

  • Use the object owner's security credentials to grant time-limited permission to access objects

  • Protects against "content scraping"
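The idea behind a pre-signed URL is that the owner signs the request together with an expiry time, so the URL can be neither altered nor reused after it expires. The sketch below illustrates this with a plain HMAC. It is a simplified stand-in, not the AWS Signature Version 4 scheme that S3 uses. The secret, URL, and function names are hypothetical. In practice an SDK helper such as boto3's `generate_presigned_url` produces the real URL.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"owner-secret-key"  # hypothetical owner credential


def _signature(path, expires):
    message = f"GET\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url, expires_in, now=None):
    """Append an expiry time and a signature covering the path and expiry."""
    now = int(time.time()) if now is None else now
    expires = now + expires_in
    signature = _signature(urlparse(base_url).path, expires)
    return f"{base_url}?{urlencode({'Expires': expires, 'Signature': signature})}"


def is_valid(url, now=None):
    """Accept the URL only if it is unexpired and the signature matches."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["Expires"][0])
        signature = query["Signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires)
    return now <= expires and hmac.compare_digest(expected, signature)


url = sign_url("https://example-bucket.s3.amazonaws.com/report.pdf", 60, now=1000)
print(is_valid(url, now=1030))  # True: within the 60-second window
print(is_valid(url, now=1061))  # False: expired
```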

Multi-part upload

  • Used for large files

  • Three-step process

    • Initiation

    • Uploading the parts

    • Completion

  • Should be used for objects larger than 100 MB

  • Must be used for objects larger than 5 GB
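The "uploading the parts" step comes down to splitting the object into numbered byte ranges. The sketch below plans those ranges. The 100 MiB default part size is an illustrative choice. The 5 MiB minimum part size and 10,000-part maximum are S3 limits. `plan_parts` is a hypothetical helper, not an SDK function.

```python
import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def plan_parts(object_size, part_size=100 * MIB):
    """Return a list of (part_number, offset, length) covering the object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MiB")
    # Grow the part size if the object would otherwise need too many parts
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts, offset, number = [], 0, 1
    while offset < object_size:
        length = min(part_size, object_size - offset)  # last part may be smaller
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```

Each tuple corresponds to one part upload between initiation and completion. Parts are independent, so they can be uploaded in parallel or retried individually.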

Cross-Region Replication

  • Asynchronously replicate to a target bucket in another region

  • Versioning must be turned on

  • An IAM policy must give Amazon S3 permission to replicate objects on your behalf

  • Only new objects (created after replication is enabled) are replicated; delete markers are replicated too

  • Deleting a delete marker is not replicated between buckets

Events

  • Trigger action by S3 events:

    • SNS

    • SQS

    • Lambda
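A bucket's notification configuration maps event types to one of the three targets above. The sketch below builds such a configuration in the layout S3 expects. All ARNs, the account ID, and the key prefixes are hypothetical placeholders, and `prefix_filter` is a helper invented for this example.

```python
import json


def prefix_filter(prefix):
    """Build the key filter that limits a notification to one key prefix."""
    return {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}}


# Hypothetical ARNs and prefixes, for illustration only
notification_configuration = {
    "TopicConfigurations": [
        {
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:example-topic",
            "Events": ["s3:ObjectRemoved:*"],
        }
    ],
    "QueueConfigurations": [
        {
            "QueueArn": "arn:aws:sqs:us-east-1:123456789012:example-queue",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": prefix_filter("incoming/"),
        }
    ],
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:example-fn",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": prefix_filter("uploads/"),
        }
    ],
}

print(json.dumps(notification_configuration, indent=2))
```

The two `ObjectCreated` rules use different prefixes so that no single object matches both.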

Logging

  • Off by default

  • Should be used with a prefix (/logs, bucketname/logs)

  • Information

    • Requester account

    • Bucket name

    • Request time

    • Action (GET, PUT, LIST)

    • Response status, error code

Best practices

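  • Use for hybrid IT environments and applications (data in on-premises file systems, database backups, ...)

  • Use as a bulk "blob" store for data

  • If the request rate exceeds 100 requests per second, use a hash prefix to spread keys across partitions (older guidance; S3 now supports at least 3,500 write and 5,500 read requests per second per prefix)

  • Static website hosting should use CloudFront as a caching layer

  • domain: bucket-name.s3-website-region.amazonaws.com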
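The hash-prefix practice mentioned above can be sketched as follows: a few characters of a hash are prepended to each key, so keys that would otherwise share a prefix (dates, sequential IDs) are spread evenly. MD5 and the four-character length are illustrative assumptions. `hashed_key` is a hypothetical helper.

```python
import hashlib


def hashed_key(key, length=4):
    """Prefix the key with the first `length` hex characters of its MD5 hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:length]}/{key}"


for name in ("2017/01/01/photo1.jpg", "2017/01/01/photo2.jpg"):
    print(hashed_key(name))
```

The prefix is derived from the key itself, so a reader can recompute it from the original name without a lookup table.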

Glacier

  • An extremely low-cost storage service that provides durable, secure, and flexible storage for data archiving and online backup.

  • Retrieval time of three to five hours

  • 99.999999999% (11 nines) durability

Archives

  • Data is stored in archives

  • Each archive can hold up to 40 TB, and the number of archives is unlimited

  • Assigned a unique archive ID at the time of creation

  • Automatically encrypted and immutable

Vaults

  • Vaults are containers for archives

  • Each AWS account can have up to 1,000 vaults

Vault Locks

  • Enforce compliance controls with a vault lock policy

  • Once locked, the policy can no longer be changed

Data Retrieval

  • Up to 5% of stored data can be retrieved for free each month

S3 vs. Glacier

| | S3 | Glacier |
|---|---|---|
| Data size | 5 TB object | 40 TB archive |
| Identifier | friendly key names | system-generated IDs |
| Encryption | encrypted at rest | automatically encrypted |

Review questions: 90% → PASSED!

Mindmap

AWS Storage Mindmap