S3
Simple Storage Service - S3
Definition
Storage service - a backbone service of AWS that provides secure, durable, highly scalable object storage.
Easy to use, with a simple web service interface
Object-based storage
A safe place to store your files
The data is spread across multiple devices and facilities
File size: 0 bytes to 5TB
Unlimited storage
Files are saved in buckets
Bucket names are global (like domains)
Buckets
Web folder (a flat folder) - containers for objects, root namespace for AWS S3.
The name must be unique across all AWS accounts (like domains).
Basics
Object-based storage
Objects are the entities for files stored in S3
Size: up to 5TB
Store data as binary
Object has:
key: a unique identifier (like a file name)
data: binary data
metadata: information about the object
system metadata: auto-generated by S3
user metadata (optional): generated by the user
versionId: for versioning
subresources:
ACL: access control list
torrent: torrent supported info
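As a rough mental model, the parts of an object listed above can be sketched as a small Python dataclass. This is not an AWS SDK type: the class name `S3Object`, its field names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class S3Object:
    """Illustrative model of an S3 object; not a real SDK type."""
    key: str                                   # unique identifier within the bucket
    data: bytes                                # opaque binary payload
    system_metadata: Dict[str, str] = field(default_factory=dict)  # generated by S3
    user_metadata: Dict[str, str] = field(default_factory=dict)    # optional, set by the user
    version_id: Optional[str] = None           # only set when versioning is enabled


# Hypothetical sample object.
obj = S3Object(
    key="photos/2024/cat.jpg",
    data=b"\xff\xd8\xff",
    system_metadata={"Content-Length": "3"},
    user_metadata={"x-amz-meta-owner": "alice"},
)
print(obj.key, len(obj.data), obj.version_id)
```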
Data Consistency
Data can be accessed by REST API using URL:
No object locking for concurrent writes → the request with the latest timestamp wins
Read-after-write consistency for PUTs of new objects
Eventual consistency for overwrite PUTs and DELETEs (historical behavior: since December 2020, S3 is strongly consistent for all PUTs and DELETEs)
Access Control
S3 is secure by default (public access is turned off)
Access Control Lists (ACLs)
Logging for buckets
Hosting static websites
Bucket policies
IP range
AWS account
Objects with prefixes
IAM
Bucket policies are the recommended access control mechanism
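To make the bucket-policy options above concrete, here is a minimal sketch of a policy document that combines an IP-range condition with an object prefix. The bucket name `example-bucket`, the `public/` prefix, and the CIDR range are hypothetical; only the overall JSON policy structure follows the standard AWS policy grammar.

```python
import json

# Hypothetical bucket name and IP range, used only for illustration.
BUCKET = "example-bucket"
ALLOWED_CIDR = "203.0.113.0/24"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadFromOneIpRange",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            # Only objects whose keys start with "public/" are covered.
            "Resource": f"arn:aws:s3:::{BUCKET}/public/*",
            # Requests must originate from the allowed IP range.
            "Condition": {"IpAddress": {"aws:SourceIp": ALLOWED_CIDR}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON string is what would be attached to the bucket as its policy.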
Advanced Features
Prefixes and Delimiters
S3 uses a flat structure in a bucket
Use prefixes and delimiters to emulate a file and folder hierarchy
Not a file system
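The way a prefix and a delimiter turn a flat key space into something that looks like folders can be sketched locally. The function below is a simplified imitation of a list request, not the S3 API itself; `list_keys` and the sample keys are hypothetical.

```python
from typing import Iterable, List, Tuple


def list_keys(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Return (objects, common_prefixes) found directly under `prefix`."""
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter acts like a "folder".
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


keys = ["readme.txt", "logs/2024/jan.log", "logs/2024/feb.log", "logs/summary.txt"]
print(list_keys(keys))                  # top level: one object, one "folder"
print(list_keys(keys, prefix="logs/"))  # inside the "logs/" pseudo-folder
```

The bucket still holds four flat keys; the hierarchy exists only in how the keys are grouped when listing.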
Storage Classes
Name | Durability | Availability | Use cases |
STANDARD | 99.999999999% (11 nines) | 99.99% | Short-term or long-term storage of frequently accessed data |
STANDARD_IA | 99.999999999% (11 nines) | 99.9% | Long-lived, less frequently accessed data (stored for longer than 30 days) |
RRS | 99.99% | 99.99% | Easily reproduced data (thumbnails, ...) |
GLACIER | 99.999999999% (11 nines) | 99.99% (after restore) | Archives; not available for real-time access. You must first restore archived objects before you can access them. |
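To get a feel for what the "nines" in the table mean, the sketch below converts a percentage into expected downtime per year and expected object loss per year. The function names are hypothetical, and the figures are plain arithmetic on the design targets, not a guarantee.

```python
from decimal import Decimal

MINUTES_PER_YEAR = Decimal(365 * 24 * 60)


def downtime_minutes_per_year(availability_percent: str) -> Decimal:
    """Expected unavailable minutes per year for a given availability percentage."""
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable * MINUTES_PER_YEAR


def expected_annual_loss(durability_percent: str, object_count: int) -> Decimal:
    """Expected number of objects lost per year for a given durability percentage."""
    loss = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    return loss * Decimal(object_count)


print(downtime_minutes_per_year("99.99"))                  # four nines of availability
print(downtime_minutes_per_year("99.9"))                   # three nines of availability
print(expected_annual_loss("99.999999999", 10_000_000))    # eleven nines of durability
```

Four nines of availability works out to roughly 52.56 minutes of downtime per year, and eleven nines of durability over ten million objects to about 0.0001 objects lost per year.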
Object Lifecycle
Automated tiering
Should be used to reduce cost
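Automated tiering is driven by lifecycle rules. The dictionary below follows the general shape of an S3 lifecycle configuration; the rule ID, the `logs/` prefix, and the day counts are assumptions. The helper `storage_class_at` is a hypothetical local simulation of what such a rule implies, not something S3 provides.

```python
from typing import Optional

# Illustrative lifecycle configuration: tier down after 30 and 90 days, expire after a year.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> Optional[str]:
    """Return the storage class an object would have at `age_days`, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, storage_class_at(age, rule))
```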
Encryption
Server-side encryption
SSE-S3 (AWS-Managed Keys): "check-box-style" encryption; AWS handles the management and protection of the keys and rotates the master key regularly.
SSE-KMS (AWS KMS Keys): AWS KMS manages and protects your master keys, with separate permissions for key use and an audit trail of who used a key and when
SSE-C (Customer-Provided Keys): S3 encrypts/decrypts data with keys you supply on each request; AWS does not store them
Client-side encryption
AWS KMS key
Client-side master key
For maximum simplicity and ease of use, use SSE-S3 or SSE-KMS
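With SSE-C the key travels with every request as HTTP headers. A minimal sketch of building those headers with the standard library is shown below; the helper name `sse_c_headers` is hypothetical, and sending the request itself is left out.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the three SSE-C request headers for a 256-bit customer-provided key."""
    if len(key) != 32:
        raise ValueError("SSE-C with AES256 expects a 32-byte key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # The MD5 digest is an integrity check on the transmitted key.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


customer_key = os.urandom(32)  # keep this safe: without it the object cannot be decrypted
headers = sse_c_headers(customer_key)
print(sorted(headers))
```

The same key must be supplied again on every later GET of that object, which is the main operational cost compared with SSE-S3 or SSE-KMS.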
Versioning
Protects data against accidental or malicious deletion
Once enabled, versioning cannot be removed; it can only be suspended
Bucket-level setting
Each version of an object is stored under its own version ID
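How versioning protects against accidental deletion can be illustrated with a toy in-memory model. This is not an AWS API: the class `VersionedBucket` and the use of random hex strings as version IDs are assumptions made for the sketch.

```python
import uuid
from typing import Dict, List, Optional, Tuple

# Each key maps to a list of (version_id, data); data=None marks a delete marker.
Versions = Dict[str, List[Tuple[str, Optional[bytes]]]]


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket."""

    def __init__(self) -> None:
        self._versions: Versions = {}

    def put(self, key: str, data: bytes) -> str:
        version_id = uuid.uuid4().hex
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key: str) -> str:
        # A plain DELETE only adds a delete marker; older versions are kept.
        version_id = uuid.uuid4().hex
        self._versions.setdefault(key, []).append((version_id, None))
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[bytes]:
        history = self._versions.get(key, [])
        if version_id is None:
            return history[-1][1] if history else None
        for vid, data in history:
            if vid == version_id:
                return data
        return None


bucket = VersionedBucket()
v1 = bucket.put("a.txt", b"one")
v2 = bucket.put("a.txt", b"two")
bucket.delete("a.txt")
print(bucket.get("a.txt"))      # latest version is a delete marker
print(bucket.get("a.txt", v1))  # the first version is still retrievable by ID
```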
MFA Delete
Additional authentication prevents accidental or malicious deletion
Can only be enabled by the root account
Pre-Signed URLs
Use the owner's security credentials to grant time-limited permission to access objects
Protect against "content scraping."
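The idea behind a pre-signed URL (a signature made with the owner's secret plus an expiry time) can be shown with a generic HMAC sketch. This is deliberately not the real AWS Signature Version 4 algorithm; the secret, the query parameter names, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"owner-secret-key"  # hypothetical credential, for illustration only


def sign_url(base_url: str, expires_in: int, now: Optional[float] = None) -> str:
    """Append an expiry timestamp and an HMAC signature to `base_url`."""
    expires = int((time.time() if now is None else now) + expires_in)
    message = f"{base_url}\n{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'Expires': expires, 'Signature': signature})}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["Expires"][0])
        signature = query["Signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = hmac.new(SECRET, f"{base_url}\n{expires}".encode(), hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    return hmac.compare_digest(signature, expected) and current <= expires


url = sign_url("https://example-bucket.s3.amazonaws.com/report.pdf", 60, now=1000)
print(verify_url(url, now=1030))  # inside the validity window
print(verify_url(url, now=2000))  # after expiry
```

Because the signature covers both the path and the expiry, a holder of the URL cannot extend its lifetime or point it at a different object.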
Multi-part upload
Used for large files
Three-step process
Initiation
Uploading the parts
Completion
Should be used for objects larger than 100MB
Must be used for objects larger than 5GB
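The "uploading the parts" step requires splitting the object into byte ranges first. The sketch below only plans those ranges and performs no upload; `plan_parts` and the 100 MiB default part size are assumptions, while the 5 MiB minimum part size (for every part except the last) and the 10,000-part maximum are S3 limits.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> List[Tuple[int, int, int]]:
    """Return (part_number, start_byte, end_byte_inclusive) for each part."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    part_count = -(-object_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    parts = []
    for number in range(1, part_count + 1):
        start = (number - 1) * part_size
        end = min(start + part_size, object_size) - 1
        parts.append((number, start, end))
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```

Each planned range would be sent as one part between initiation and completion, and parts can be uploaded in parallel or retried independently.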
Cross-Region Replication
Asynchronously replicate to a target bucket in another region
Versioning must be turned on
An IAM policy must give Amazon S3 permission to replicate objects on your behalf
Only new objects (created after activation) are replicated; delete markers are replicated too
Deleting a delete marker is not replicated between buckets
Events
Trigger action by S3 events:
SNS
SQS
Lambda
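When the target is Lambda, the function receives a JSON event describing what happened. A minimal handler sketch is shown below; the sample event is trimmed to the fields used here, and the bucket and key values are hypothetical.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return (bucket, key, event_name) for each record in an S3 notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key, record["eventName"]))
    return results


# Trimmed-down sample event with hypothetical values.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "uploads/my+photo.jpg", "size": 1024},
            },
        }
    ]
}
print(handler(sample_event))
```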
Logging
Off by default
Should be used with a prefix (/logs, bucketname/logs)
Information
Requester account
Bucket name
Request time
Action (GET, PUT, LIST)
Response status, error code
Best practices
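Use for hybrid IT environments and applications (data in on-premises file systems, database backups, ...)
Use as a bulk blob store for data
If the request rate is above 100 requests per second, use a hash prefix (older guidance: since July 2018, S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so this is rarely needed)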
Static web hosting should use CloudFront as a cache layer
domain: bucket-name.s3-website-region.amazonaws.com
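The hash-prefix practice from the list above can be sketched in a few lines. The helper name `hashed_key`, the choice of MD5, and the 4-character prefix length are assumptions; the point is only that a short hash in front of the key spreads otherwise sequential keys across many prefixes.

```python
import hashlib


def hashed_key(key: str, prefix_length: int = 4) -> str:
    """Prepend a short hex hash so keys spread across many prefixes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_length]}-{key}"


# Sequential names end up under unrelated prefixes.
for name in ("2024-01-01-0001.log", "2024-01-01-0002.log", "2024-01-01-0003.log"):
    print(hashed_key(name))
```

The trade-off is that listing keys in their natural order becomes harder, since the original prefix no longer leads the key.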
Glacier
Extremely low-cost storage service that provides durable, secure, and flexible storage for data archiving and online backup.
Retrieval time of three to five hours
99.999999999% (11 nines) durability
Archives
Data is stored in archives
Each archive can hold up to 40TB; the number of archives is unlimited
Assigned a unique archive ID at the time of creation
Automatically encrypted and immutable
Vaults
Vaults are containers for archives
Each AWS account can have up to 1,000 vaults per region
Vault Locks
Enforce compliance controls with a vault lock policy
Once locked, the policy can no longer be changed
Data Retrieval
Up to 5% of stored data can be retrieved for free each month
S3 vs. Glacier
| S3 | Glacier |
Data size | 5TB object | 40TB archive |
Identifier | friendly key names | system-generated IDs |
Encryption | encrypted at rest | automatically encrypted |
Review questions: 90% → PASSED!