AWS Solutions architect 5 - Storage and Content Delivery

S3 architecture

Permissions

  • Bucket authorization within S3 is controlled using
    • identity policies on AWS identities
    • bucket policies in the form of resource policies on the bucket
    • bucket or object ACLs
  • Final authorization is a combination of all applicable policies
    • Priority order
      1. explicit deny
      2. explicit allow
      3. implicit deny
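
The priority order above can be sketched as a small evaluation function. This is a simplified model and not the real IAM policy engine: the `Statement` shape and the `evaluate` helper are hypothetical names introduced for illustration, and matching is by exact string only (no wildcards or conditions).

```python
from typing import Iterable, NamedTuple


class Statement(NamedTuple):
    """Hypothetical, simplified policy statement (no wildcards, no conditions)."""
    effect: str    # "Allow" or "Deny"
    action: str    # e.g. "s3:GetObject"
    resource: str  # e.g. "arn:aws:s3:::my-bucket/report.csv"


def evaluate(statements: Iterable[Statement], action: str, resource: str) -> str:
    """Combine all applicable statements using the priority order above."""
    applicable = [s for s in statements if s.action == action and s.resource == resource]
    if any(s.effect == "Deny" for s in applicable):
        return "explicit deny"      # 1. an explicit deny always wins
    if any(s.effect == "Allow" for s in applicable):
        return "explicit allow"     # 2. otherwise an explicit allow grants access
    return "implicit deny"          # 3. nothing matched, so access is denied


if __name__ == "__main__":
    obj = "arn:aws:s3:::my-bucket/report.csv"
    identity_policy = [Statement("Allow", "s3:GetObject", obj)]
    bucket_policy = [Statement("Deny", "s3:GetObject", obj)]
    print(evaluate(identity_policy, "s3:GetObject", obj))
    print(evaluate(identity_policy + bucket_policy, "s3:GetObject", obj))
    print(evaluate([], "s3:GetObject", obj))
```

Statements from identity policies and bucket policies are simply concatenated here, which mirrors the idea that the final decision comes from all applicable policies together.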

Transferring data to S3

  • Uploads to S3 are generally done using the S3 console, the CLI or the APIs
  • Uploads either use a single operation (known as a single PUT upload) or multipart upload
  • Single PUT upload
    • Object is uploaded in a single stream of data
    • Limit of 5 GB; a large single stream can be slow, and if it fails the whole upload fails
  • Multipart upload
    • Object is broken up into parts (up to 10,000)
    • Each part is 5 MB-5 GB, and the last part can be smaller (the remaining data)
    • Faster (parallel uploads), and individual parts can fail and be retried individually
    • AWS recommends multipart for anything over 100 MB, and it is required for anything beyond 5 GB
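
The limits above can be turned into a small planning helper. This is a sketch under stated assumptions: `plan_upload` and its default part size of 100 MB are hypothetical choices for illustration, and sizes use binary units (1 MB = 1024² bytes).

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

MIN_PART = 5 * MB                  # minimum part size (the last part may be smaller)
MAX_PART = 5 * GB                  # maximum part size
MAX_PARTS = 10_000                 # maximum number of parts per upload
RECOMMENDED_THRESHOLD = 100 * MB   # above this, multipart is recommended


def plan_upload(size: int, part_size: int = 100 * MB) -> dict:
    """Return a hypothetical upload plan for an object of `size` bytes."""
    if size <= RECOMMENDED_THRESHOLD:
        return {"method": "single PUT", "parts": 1, "part_size": size}
    # Grow the part size if the preferred one would need more than 10,000 parts.
    part_size = max(part_size, MIN_PART, math.ceil(size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object does not fit in 10,000 parts of at most 5 GB")
    parts = math.ceil(size / part_size)
    return {"method": "multipart", "parts": parts, "part_size": part_size}


if __name__ == "__main__":
    print(plan_upload(50 * MB))
    print(plan_upload(1 * GB))
```

Each part in such a plan can be uploaded in parallel and retried on its own, which is where the speed and resilience benefits come from.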

Serve content

  • Static Websites

    • Amazon S3 buckets can be configured to host websites: content is uploaded to the bucket and, once static website hosting is enabled, S3 provides a unique endpoint URL that any web browser can access
    • S3 can be used to host front-end code for serverless applications or as an offload location for static content
    • CloudFront can also be added to improve the speed and efficiency of content delivery for global users or to add SSL for custom domains
    • Route 53 alias records can also be used to add human-friendly names to buckets
    • bucket policy example (trailing /* -> applies policy to all objects in bucket)
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::YOUR_BUCKET_NAME/*"]
          }
        ]
      }
  • Cross Origin Resource Sharing (CORS)

    • CORS is a security measure allowing a web application running in one domain to reference resources in another
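
A CORS configuration on a bucket is a list of rules. The sketch below models how such a rule could be checked in plain Python. The dictionary keys follow the shape of an S3 CORS configuration (`AllowedOrigins`, `AllowedMethods`, and so on), but the `is_allowed` helper and the example origin are assumptions for illustration, and partial wildcard origins are not handled.

```python
# Example rule set; the origin and values are placeholders.
CORS_RULES = [
    {
        "AllowedOrigins": ["https://www.example.com"],
        "AllowedMethods": ["GET", "HEAD"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000,
    }
]


def is_allowed(rules: list, origin: str, method: str) -> bool:
    """Return True if any rule permits `method` for requests from `origin`."""
    for rule in rules:
        origins = rule["AllowedOrigins"]
        origin_ok = "*" in origins or origin in origins
        if origin_ok and method in rule["AllowedMethods"]:
            return True
    return False


if __name__ == "__main__":
    print(is_allowed(CORS_RULES, "https://www.example.com", "GET"))
    print(is_allowed(CORS_RULES, "https://other.example.org", "GET"))
```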

Encryption

  • Data is encrypted in transit and at rest (on a per-object basis)
    • Client side
    • Server side with customer-provided keys (SSE-C)
    • Server side with S3-managed keys (SSE-S3)
    • Server side with AWS KMS-managed keys (SSE-KMS)
  • Bucket default encryption
    • objects, not buckets, are encrypted
    • each PUT operation either specifies encryption (and its type) or does not
    • a bucket default captures any PUT operations where no encryption method/directive is specified
    • it doesn’t enforce which type can and can’t be used; bucket policies can enforce that
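
Since a bucket default does not restrict the encryption type, a bucket policy is the usual place to enforce one. The following is a minimal sketch that builds such a policy as JSON; the `require_sse_policy` helper, the `Sid`, and the bucket name are hypothetical, while `s3:x-amz-server-side-encryption` is the condition key for the encryption header on a PUT.

```python
import json


def require_sse_policy(bucket: str, sse: str = "aws:kms") -> str:
    """Build a bucket policy that denies PUTs not requesting the `sse` type."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyWrongEncryptionHeader",  # illustrative name
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": sse}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(require_sse_policy("YOUR_BUCKET_NAME"))
```

Note that a condition written this way also rejects PUTs that omit the header entirely, so it may need adjusting if you want the bucket default to apply to such requests.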

Versioning

  • Object versioning: enabled on a bucket
    • once enabled, any operation that would otherwise modify an object generates a new version of that object
    • once versioning is enabled on a bucket, it can never be switched off, only suspended
    • with versioning enabled, an AWS account is billed for all versions of all objects
      • deleting an object does not remove it by default; a delete marker is added instead
      • older versions of an object can be accessed using name + version ID
      • specific versions can be deleted
  • MFA delete: feature designed to prevent accidental deletion of objects
    • a one-time password is required to delete an object version or to change the versioning state of a bucket
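
The versioning behaviour above (new versions on write, delete markers on delete, access by version ID) can be illustrated with a toy in-memory model. `VersionedBucket` is a hypothetical class for illustration only and is not the S3 API; version IDs here are simple counters.

```python
import itertools
from typing import Optional


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        self._versions: dict = {}          # key -> list of (version_id, body)
        self._ids = itertools.count(1)

    def put(self, key: str, body: bytes) -> str:
        """Writing never overwrites: it appends a new version."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key: str, version_id: Optional[str] = None) -> None:
        """Without a version ID add a delete marker; with one, remove that version."""
        if version_id is None:
            marker = (f"v{next(self._ids)}", None)   # body None = delete marker
            self._versions.setdefault(key, []).append(marker)
        else:
            kept = [v for v in self._versions.get(key, []) if v[0] != version_id]
            self._versions[key] = kept

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[bytes]:
        """Return the latest version or a specific one; None if deleted or missing."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            return next((body for vid, body in versions if vid == version_id), None)
        return versions[-1][1] if versions else None


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("notes.txt", b"draft")
    bucket.put("notes.txt", b"final")
    bucket.delete("notes.txt")               # adds a delete marker
    print(bucket.get("notes.txt"))           # latest is the delete marker
    print(bucket.get("notes.txt", first))    # older version is still reachable
```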

Presigned URLs

  • Presigned URL: can be created by an identity in AWS, providing access to an object using the creator’s permissions

    • When a presigned URL is used, AWS verifies the creator’s access to the object, not yours
    • The URL is encoded with authentication built in and has an expiry time
    • Presigned URLs can be used to download or upload objects
  • Any identity can create a presigned URL, even if that identity doesn’t have access to the object

  • When using presigned URLs, you may get an error. Some common situations include

    • URL has expired (7 days maximum)
    • the permissions of the URL’s creator have changed
    • URL was created using a role (36 hours maximum) and the role’s temporary credentials have expired (aim to never create presigned URLs using roles)
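
The expiry rules above can be modelled as a small calculation. This is a sketch under assumptions: `effective_expiry` is a hypothetical helper, and it clamps an over-long request to the 7-day maximum purely to keep the model simple, which is not necessarily how the service responds to such a request.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_LIFETIME = timedelta(days=7)  # upper bound noted above


def effective_expiry(
    created: datetime,
    requested: timedelta,
    credentials_expire: Optional[datetime] = None,
) -> datetime:
    """Return when a presigned URL stops working under the rules above.

    `credentials_expire` is None for long-term credentials, or the expiry
    time of the temporary credentials when the URL was signed using a role.
    """
    expiry = created + min(requested, MAX_LIFETIME)
    if credentials_expire is not None:
        # The URL cannot outlive the credentials that signed it.
        expiry = min(expiry, credentials_expire)
    return expiry


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Signed by a role whose credentials last one hour: the URL dies with them.
    print(effective_expiry(now, timedelta(hours=12), now + timedelta(hours=1)))
```

The second case is why creating presigned URLs from roles is discouraged: the requested lifetime is silently cut short by the credential expiry.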

S3 performance and resilience

Resilience = High Availability (HA)

Storage tiers/classes

Tier                 | Value                               | Availability              | Min
---------------------|-------------------------------------|---------------------------|----------------------
Standard             | default, all purpose                | 99.99% (11 9s durability) | in AZs, no min size
Intelligent-Tiering  | unknown or changing access patterns | -                         | -
Standard-IA          | real time, infrequent               | 99.9%                     | 30 days, 128 KB
One Zone-IA          | non-critical                        | 99.5%                     | cheaper than Standard-IA
Glacier              | long-term (warm backup)             | retrieval in minutes      | 3 AZs, 90 days, 40 KB
Glacier Deep Archive | long-term (cold backup)             | retrieval in days         | -

Lifecycle policies and intelligent-tiering

  • Lifecycle rules control storage classes: they allow the automated transition of objects between storage classes, or the expiration of objects that are no longer required
    • rules are added at bucket level
    • rules can be enabled or disabled based on business rules
    • objects can be archived using lifecycle configurations
    • objects can be restored into S3 for temporary periods of time, after which the restored copy is deleted
    • encrypted objects remain encrypted during transitions
  • INTELLIGENT_TIERING
    • Objects smaller than 128 KB cannot be transitioned into it
    • Objects must be in the original storage class for a minimum of 30 days before transitioning
    • At the point of expiry, objects are deleted from the bucket
graph LR

subgraph S3
  A[Standard]
  B[Standard IA]
  C[One Zone IA]
end

D[Glacier]

A --> B;
B --> C;
C --> D;
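
The transition chain in the diagram above (Standard to Standard-IA to One Zone-IA to Glacier, then expiry) can be sketched as an age-based rule table. The day thresholds and the expiration period below are illustrative assumptions, not values from these notes or AWS defaults; `storage_class` is a hypothetical helper.

```python
from typing import Optional

# (minimum age in days, target storage class), in ascending order of age.
# The thresholds are hypothetical example values.
TRANSITIONS = [
    (30, "STANDARD_IA"),
    (60, "ONEZONE_IA"),
    (90, "GLACIER"),
]
EXPIRE_AFTER_DAYS = 365  # hypothetical expiration rule


def storage_class(age_days: int) -> Optional[str]:
    """Return the class an object of this age would be in, or None once expired."""
    if age_days >= EXPIRE_AFTER_DAYS:
        return None  # expired objects are deleted from the bucket
    current = "STANDARD"
    for min_age, target in TRANSITIONS:
        if age_days >= min_age:
            current = target  # the latest rule whose threshold is met wins
    return current


if __name__ == "__main__":
    for age in (10, 45, 75, 200, 400):
        print(age, storage_class(age))
```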

CRR

  • S3 cross-region replication (CRR): configured on buckets, it allows one-way replication of data from a source bucket to a destination bucket in another region
    • Replicas keep
      • storage class
      • object name (key)
      • owner
      • object permissions
    • Replication configuration
      • applied to the source bucket
      • versioning must be enabled on both the source and destination buckets
      • requires an IAM role with permissions
    • excluded from replication
      • system actions (lifecycle events)
      • any existing objects from before replication is enabled
      • SSE-C encrypted objects: only SSE-S3 and (if enabled) KMS encrypted objects are supported
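
The exclusion list above can be expressed as a simple eligibility check. This is a minimal sketch: `ObjectEvent`, its fields, and `is_replicated` are hypothetical names for illustration, and the real replication behaviour depends on the full replication configuration.

```python
from dataclasses import dataclass


@dataclass
class ObjectEvent:
    """Hypothetical description of a change in the source bucket."""
    created_after_replication_enabled: bool
    caused_by_lifecycle: bool
    encryption: str  # "none", "SSE-S3", "SSE-KMS" or "SSE-C"


def is_replicated(event: ObjectEvent, kms_replication_enabled: bool = False) -> bool:
    """Apply the exclusion list above to decide whether CRR copies the object."""
    if not event.created_after_replication_enabled:
        return False  # objects that existed before replication was enabled
    if event.caused_by_lifecycle:
        return False  # system actions such as lifecycle events
    if event.encryption == "SSE-C":
        return False  # SSE-C objects are excluded
    if event.encryption == "SSE-KMS":
        return kms_replication_enabled  # only when explicitly enabled
    return True


if __name__ == "__main__":
    print(is_replicated(ObjectEvent(True, False, "SSE-S3")))
    print(is_replicated(ObjectEvent(True, False, "SSE-C")))
```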

CloudFront

CloudFront is a content delivery network (CDN): global cache that stores copies of your data on edge caches, which are positioned as close to your customers as possible

  • lower latency
  • higher transfer speeds
  • reduced load on the content server

CloudFront architecture

  • CloudFront components
    • origin: server/service to host content
    • distribution: configuration entity in CloudFront (CloudFront implementation)
    • edge location: local infrastructure (150 locations over 30 countries)
    • regional edge caches: larger versions of edge locations (more capacity, serving larger areas)
  • Caching process
    1. create a distribution and point it at one or more origins. A distribution has a DNS address that is used to access it
    2. the DNS address directs clients to the closest available edge location
    3. if the edge location has a cached copy of your data, it’s delivered locally from the edge location
    4. if it’s not cached, the edge location attempts to download it from either a regional cache or from the origin (known as an origin fetch)
    5. as the edge location receives the data, it immediately begins forwarding it and caches it for the next visitor
graph TD

A(customer)

subgraph distribution
  B[Edge location]
  C[Regional cache]
  D[S3]
end

A -- object delivery --> B;
B --> C;
D -. origin fetch .-> B;
D -. transfer to regional cache .-> C;
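
The caching process above can be simulated with two cache layers in front of an origin. This is a toy model with hypothetical class names (`Origin`, `Cache`); it ignores TTLs, invalidation, and streaming, and only shows how repeated requests avoid origin fetches.

```python
from typing import Dict


class Origin:
    """Stand-in for the origin (for example an S3 bucket)."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects
        self.fetches = 0  # counts origin fetches

    def fetch(self, path: str) -> bytes:
        self.fetches += 1
        return self.objects[path]


class Cache:
    """A cache layer (edge location or regional cache) with an upstream fallback."""

    def __init__(self, upstream) -> None:
        self.upstream = upstream
        self.store: Dict[str, bytes] = {}

    def fetch(self, path: str) -> bytes:
        if path not in self.store:
            # Cache miss: pull from the next layer and keep a copy.
            self.store[path] = self.upstream.fetch(path)
        return self.store[path]


if __name__ == "__main__":
    origin = Origin({"/index.html": b"<h1>hello</h1>"})
    regional = Cache(origin)
    edge_a, edge_b = Cache(regional), Cache(regional)

    edge_a.fetch("/index.html")   # miss at edge and regional: origin fetch
    edge_a.fetch("/index.html")   # served from edge A
    edge_b.fetch("/index.html")   # miss at edge B, hit at the regional cache
    print(origin.fetches)
```

Three client requests across two edge locations result in a single origin fetch in this model, which is the reduced-load effect described earlier.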

OAI

  • CloudFront is publicly accessible by default (anyone with the DNS endpoint address can access it)
    • A distribution can be configured to be private (access requires a signed URL or cookie) via trusted signers on the distribution; this can be bypassed by going straight to the origin
  • Origin Access Identity (OAI) is a virtual identity that can be associated with a distribution
    • S3 bucket can then be restricted to only allow this OAI to access it, all other identities can be denied
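
Restricting a bucket to an OAI is done with a bucket policy whose principal is the OAI. The sketch below builds such a policy; the `oai_bucket_policy` helper, the `Sid`, and the example OAI ID are placeholders for illustration, and any public-read statements on the bucket would need to be removed separately for the restriction to be effective.

```python
import json


def oai_bucket_policy(bucket: str, oai_id: str) -> str:
    """Build a bucket policy that lets a CloudFront OAI read objects."""
    principal = (
        "arn:aws:iam::cloudfront:user/"
        f"CloudFront Origin Access Identity {oai_id}"
    )
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",  # illustrative name
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # "EXAMPLEOAIID" is a placeholder, not a real identity.
    print(oai_bucket_policy("YOUR_BUCKET_NAME", "EXAMPLEOAIID"))
```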

Network File Systems

  • Amazon EFS: implementation of the Network File System (NFSv4) delivered as a service. File systems can be created and mounted on multiple Linux instances at the same time
    • the file system is the base entity
    • accessed via mount targets
    • file systems are mounted on Linux instances (Linux is the only supported OS)
    • file systems are accessible from a VPC or from on-premises locations via VPN or Direct Connect
graph TD

subgraph VPC
  A[EFS]
  B(POSIX permissions)
  C[AZ-1 instances]
  D[mount target]
end

E[Direct Connect]

subgraph corpo. datacenter
  F[Server]
end

A -.- B;
A --> C;
C --> D;
E --> D;
F --> E;
  • Performance modes

    • General purpose (default, 99% of needs)
    • Max I/O (for when a large number of instances (>100) need to access the file system)
  • Throughput modes

    • Bursting throughput: 100MiB/s base burst
    • Provisioned throughput: allows control over throughput independently of file system size
  • Security groups are used to control access to NFS mount targets

  • EFS supports 2 storage classes (with lifecycle management)

    • Standard
    • Infrequent access (IA)