AWS Solutions architect 5 - Storage and Content Delivery
S3 architecture
Permissions
- Bucket authorization within S3 is controlled using
- identity policies on AWS identities
- bucket policies in the form of resource policies on the bucket
- bucket or object ACLs
- Final authorization is a combination of all applicable policies
- Priority order
- explicit deny
- explicit allow
- implicit deny
Transferring data to S3
- Uploads to S3 are generally done using the S3 console, the CLI or the APIs
- Uploads either use a single operation (known as a single PUT upload) or multipart upload
- Single PUT upload
- Object is uploaded in a single stream of data
- Limit of 5 GB -> large single PUTs can cause performance issues, and if the upload fails the whole upload must be restarted
- Multipart upload
- Object is broken up into parts (up to 10,000)
- Each part is 5MB-5GB, and the last part can be smaller (the remaining data)
- Faster (parallel uploads), and the individual parts can fail and be retried individually
- AWS recommends multipart for anything over 100MB, but it is required for anything beyond 5GB
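The limits above can be sketched as a part-size calculation. This is a hypothetical helper, not an AWS API call; the constants are the ones from the notes (10,000 parts, 5MB-5GB parts, 5TiB object maximum):

```python
import math

# Assumed constants from the multipart upload limits described above.
MIB = 1024 * 1024
GIB = 1024 * MIB
TIB = 1024 * GIB
MIN_PART, MAX_PART = 5 * MIB, 5 * GIB
MAX_PARTS, MAX_OBJECT = 10_000, 5 * TIB

def plan_multipart(object_size: int, part_size: int = 100 * MIB):
    """Pick a (part_size, part_count) that satisfies the multipart limits."""
    if object_size > MAX_OBJECT:
        raise ValueError("object exceeds the 5TiB S3 maximum")
    part_size = max(part_size, MIN_PART)
    # If the chosen part size would need more than 10,000 parts, grow it.
    if math.ceil(object_size / part_size) > MAX_PARTS:
        part_size = math.ceil(object_size / MAX_PARTS)
    return part_size, math.ceil(object_size / part_size)
```

For a 1 GiB object the default 100 MiB parts give 11 parts; for a 2 TiB object the part size grows so that exactly 10,000 parts suffice.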
Serve content
Static Websites
- Amazon S3 buckets can be configured to host websites -> content can be uploaded to the bucket and, when enabled, static web hosting provides a unique endpoint URL that can be accessed by any web browser
- S3 can be used to host front-end code for serverless applications or as an offload location for static content
- CloudFront can also be added to improve the speed and efficiency of content delivery for global users or to add SSL for custom domains
- Route53 and alias records can also be used to add human-friendly names to buckets
- bucket policy example (the trailing /* applies the policy to all objects in the bucket)

```json
{
  "Version":"2012-10-17",
  "Statement":[{
    "Sid":"PublicReadGetObject",
    "Effect":"Allow",
    "Principal": "*",
    "Action":["s3:GetObject"],
    "Resource":["arn:aws:s3:::YOUR_BUCKET_NAME/*"]
  }]
}
```
Cross Origin Resource Sharing (CORS)
- CORS is a security measure allowing a web application running in one domain to reference resources in another
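As a sketch, a CORS configuration of the kind S3 accepts on a bucket (the origin and values here are placeholders) lets a page served from one domain fetch objects from the bucket:

```json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://www.example.com"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
```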
Encryption
- Data can be encrypted in transit and at rest (on a per-object basis)
- Client side
- Server side with customer-provided keys (SSE-C)
- Server side with S3-managed keys (SSE-S3)
- Server side with AWS KMS-managed keys (SSE-KMS)
- Bucket default encryption
- objects, not buckets, are encrypted
- each PUT operation needs to specify encryption (and type), or not
- a bucket default captures any PUT operations where no encryption method/directive is specified
- it doesn’t enforce what type can and can’t be used; bucket policies can enforce that
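For example, a bucket policy can enforce an encryption type by denying PUTs that don't request it. A sketch, assuming SSE-KMS should be required (bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyUnencryptedPuts",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
    "Condition": {
      "StringNotEquals": {
        "s3:x-amz-server-side-encryption": "aws:kms"
      }
    }
  }]
}
```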
Versioning
- Object versioning: enabled at the bucket level
- once enabled, any operation that would otherwise modify an object generates a new version of that original object
- once a bucket is version-enabled, it can never be switched off, only suspended
- with versioning enabled, an AWS account is billed for all versions of all objects
- object deletion by default does not delete the object - a delete marker is added instead
- older versions of an object can be accessed using name + versionID
- specific versions can be deleted
- MFA delete: feature designed to prevent accidental deletion of objects
- a one-time password is required to delete an object version or to change the versioning state of a bucket
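The versioning behaviour above can be illustrated with a toy model (not an AWS API, just a sketch of the semantics): every write creates a new version, a plain delete only adds a delete marker, and specific versions can still be read or permanently deleted by key + versionID:

```python
import itertools

class VersionedBucket:
    """Toy model of S3 versioning semantics - illustrative only."""

    def __init__(self):
        self._versions = {}            # key -> list of (version_id, body)
        self._ids = itertools.count(1)

    def put(self, key, body):
        vid = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key):
        # Default delete: add a delete marker, keep all old versions.
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:  # delete marker on top
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id:
                return body
        raise KeyError(version_id)

    def delete_version(self, key, version_id):
        # Deleting a specific version really removes that version.
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
```

After a plain delete, a versionless GET fails (the delete marker hides the object) while older versions remain readable by ID.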
Presigned URLs
Presigned URL: can be created by an identity in AWS, providing access to an object using the creator’s permissions
- When a presigned URL is used, AWS verifies the creator’s access to the object, not yours
- The URL is encoded with authentication built in and has an expiry time
- Presigned URLs can be used to download or upload objects
Any identity can create a presigned URL - even if that identity doesn’t have access to the object
When using presigned URLs, you may get an error. Some common situations include
- the URL has expired - 7 days maximum
- the creator’s permissions for the URL have changed
- the URL was created using a role (36-hour max) and the role’s temporary credentials have expired (aim to never create presigned URLs using roles)
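The idea can be sketched with a simplified signing scheme. This is NOT the real SigV4 algorithm S3 uses, just an illustration of the principle: the creator's secret signs the object key plus an expiry time, so the server can check the creator's permission and reject expired URLs (host name and secret below are placeholders):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"creator-secret-key"   # stands in for the creator's credentials

def presign(key: str, expires_in: int, now: int = None) -> str:
    """Build a URL whose query string carries an expiry and a signature."""
    now = int(time.time()) if now is None else now
    expires = now + expires_in
    sig = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"https://example-bucket.s3.amazonaws.com/{key}?" + urlencode(
        {"Expires": expires, "Signature": sig})

def verify(key: str, expires: int, signature: str, now: int = None) -> bool:
    """Server side: reject expired URLs, then check the signature."""
    now = int(time.time()) if now is None else now
    if now > expires:
        return False
    expected = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The key point the notes make survives in the sketch: the signature encodes the creator's identity and an expiry, and verification happens against the creator's permissions at access time.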
S3 performance and resilience
Resilience = High Availability (HA)
Storage tiers/classes
Tier | Use case | Availability | Minimums |
---|---|---|---|
Standard | default, all purpose | 99.99% (11 9s durability) | ≥3 AZs, no min size |
Standard Intelligent-Tiering | unknown or changing access patterns | 99.9% | 30 days |
Standard-IA | real-time but infrequent access | 99.9% | 30 days, 128KB |
One Zone-IA | non-critical, reproducible data | 99.5% | single AZ, cheaper than Standard-IA |
Glacier | long-term (warm backup), retrieval in minutes-hours | __ | 3 AZs, 90 days, 40KB |
Glacier Deep Archive | long-term (cold backup), retrieval in hours-days | __ | 180 days |
Lifecycle policies and intelligent-tiering
- Lifecycle rules control storage classes, allowing the automated transition of objects between storage classes, or the expiration of objects that are no longer required
- rules are added at the bucket level
- rules can be enabled or disabled based on business rules
- objects can be archived using lifecycle configurations
- objects can be restored into S3 for temporary periods of time, after which they are deleted
- encrypted objects remain encrypted during transitions
- INTELLIGENT_TIERING
- Objects smaller than 128KB cannot be transitioned into it
- Objects must be in the current storage class for a minimum of 30 days before transitioning
- At the point of expiry, objects are deleted from the bucket
```mermaid
graph LR
  subgraph S3
    A[Standard]
    B[Standard IA]
    C[One Zone IA]
  end
  D[Glacier]
  A --> B
  B --> C
  C --> D
```
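A transition chain like the one above can be expressed as a lifecycle configuration. A sketch in the format S3 accepts, with placeholder prefix and day counts:

```json
{
  "Rules": [
    {
      "ID": "archive-logs",
      "Filter": {"Prefix": "logs/"},
      "Status": "Enabled",
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }
  ]
}
```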
CRR
- S3 cross region replication (CRR): enabled on buckets, allows one-way replication of data from a source bucket to a destination bucket in another region
- Replicas keep
- storage class
- object name (key)
- owner
- object permissions
- Replication configuration
- applied to the source bucket
- versioning must be enabled on both the origin and destination buckets
- requires an IAM role with permissions.
- excluded from replication
- system actions (lifecycle events)
- any existing objects from before replication is enabled
- SSE-C encrypted objects - only SSE-S3 and (if enabled) KMS encrypted objects are supported
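A minimal replication configuration sketch, applied to the source bucket (ACCOUNT_ID, role name and destination bucket are placeholders; the role must carry the replication permissions):

```json
{
  "Role": "arn:aws:iam::ACCOUNT_ID:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-everything",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": {"Status": "Disabled"},
      "Destination": {"Bucket": "arn:aws:s3:::DESTINATION_BUCKET"}
    }
  ]
}
```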
CloudFront
CloudFront is a content delivery network (CDN): global cache that stores copies of your data on edge caches, which are positioned as close to your customers as possible
- lower latency
- higher transfer speeds
- reduced load on the content server
CloudFront architecture
- CloudFront components
- origin: server/service to host content
- distribution: configuration entity in CloudFront (CloudFront implementation)
- edge location: local infrastructure (150 locations over 30 countries)
- regional edge caches: larger versions of edge locations (more capacity, serving larger areas)
- Caching process
- create a distribution and point it at one or more origins. A distribution has a DNS address that is used to access it
- the DNS address directs clients to the closest available edge location
- if the edge location has a cached copy of your data, it’s delivered locally from that edge location
- if it’s not cached, the edge location attempts to download it from either a regional cache or from the origin (known as an origin fetch)
- as the edge location receives the data, it immediately begins forwarding it and caches it for the next visitor
```mermaid
graph TD
  A(customer)
  subgraph distribution
    B[Edge location]
    C[Regional cache]
    D[S3]
  end
  A -- object delivery --> B
  B --> C
  D -. origin fetch .-> B
  D -. transfer to regional cache .-> C
```
OAI
- CloudFront is publicly accessible by default (anyone with the DNS endpoint address can access it)
- A distribution can be configured to be private (access requires a signed URL or signed cookie) via trusted signers on the distribution -> this can be bypassed by going straight to the origin
- Origin Access Identity (OAI) is a virtual identity that can be associated with a distribution
- S3 bucket can then be restricted to only allow this OAI to access it, all other identities can be denied
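A sketch of such a bucket policy (the OAI ID and bucket name are placeholders): only the distribution's OAI may read objects, so requests that bypass CloudFront are denied by the implicit deny:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowCloudFrontOAIOnly",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity EXAMPLE_OAI_ID"
    },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
  }]
}
```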
Network File Systems
- Amazon EFS: implementation of the Network File System (NFSv4) delivered as a service. File systems can be created and mounted on multiple Linux instances at the same time
- base entity of a file system
- accessed via mount targets
- the file system is mounted on Linux instances (Linux is the only supported OS)
- file systems are accessible from a VPC or from on-premises locations via VPN or Direct Connect
```mermaid
graph TD
  subgraph VPC
    A[EFS]
    B(POSIX permissions)
    C[AZ-1 instances]
    D[mount target]
  end
  E[Direct Connect]
  subgraph corpo. datacenter
    F[Server]
  end
  A -.- B
  A --> C
  C --> D
  E --> D
  F --> E
```
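Mounting sketch from a Linux instance: the file-system ID, region and mount point below are placeholders, and the NFS options are the ones commonly used for EFS over NFSv4.1:

```shell
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-12345678.efs.eu-west-1.amazonaws.com:/ /mnt/efs
```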
Performance modes
- General purpose (default, 99% of needs)
- Max I/O (larger number of instances (>100) need to access the file system)
Throughput modes
- Bursting throughput: 100MiB/s base burst
- Provisioned throughput: allows control over throughput independently of file system size
Security groups are used to control access to NFS mount targets
EFS supports 2 storage classes (with lifecycle management)
- Standard
- Infrequent access (IA)