AWS Solutions architect 6 - Database

Models

  • RDBMS
    • Relational database management system (RDBMS): data has formal, fixed relationships
      • data stored in rows -> individual attributes
      • Tables have schemas (define row layout)
    • RDBMS conform to ACID (Atomicity, Consistency, Isolation, Durability)
      • High performance
      • Low scalability
    • Structured Query Language (SQL) is used for RDBMS
  • Non-relational: NoSQL (e.g. for social media, data warehousing, analytics)
    • Elements
      • Key/value: fast queries, no relationships
      • Document: structure of key/value pairs. Operations are highly performant
      • Column: data is stored in columns rather than rows (Amazon Redshift)
      • Graph: designed for dynamic relationships. Data = nodes (Neo4j)

SQL — RDS

Definition

  • RDS = Database as a Service (DBaaS) -> fully functional DB without admin overhead
    • performs at scale
    • can be publicly accessible
    • can be configured for demanding availability and durability scenarios
  • Engines
    • MySQL
    • PostgreSQL
    • MariaDB
    • Oracle
    • Microsoft SQL Server
graph LR

A(DB instance CNAME)

subgraph VPC1-region1
  B[Standby - DB storage -AZ1]
  C[Primary- DB storage -AZ2]
end

D[S3]

subgraph VPC2-region2
  E[Read replica - DB -AZ1]
end

A --> C;
B --> C;
C --> B;
B --> D;
C --> E;
  • 1 or more AZs for resilience
  • Instance types
    • general purpose (DB.M4, DB.M5)
    • memory optimized (DB.R4, DB.R5, Oracle DB.X1 and DB.X1e)
    • burstable (DB.T2 and DB.T3)
  • Storage types
    • general purpose SSD (gp2)
    • provisioned IOPS SSD (io1): IOPS are configured independently of storage size
  • Billing based on
    • instance size
    • provisioned storage (the amount provisioned, not the amount used)
    • IOPS if using io1
    • Data transferred out
    • Any backups/snapshots beyond the 100% of provisioned storage that is free with each DB instance
  • RDS supports encryption with limitations
    • configured when creating DB instances
    • can be added to an existing instance by taking a snapshot, making an encrypted copy of it, and creating a new instance from the encrypted snapshot
    • encryption cannot be removed
    • read replicas need to be in the same state as the primary instance (encrypted or not)
    • encrypted snapshots can be copied between regions, but a new destination region KMS CMK is used (as they are region specific)
  • Network access to an RDS instance is controlled by a security group (SG) associated with RDS instance
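
The billing components listed above can be sketched as a small cost function. This is only an illustration: the `RdsUsage` and `RdsPrices` names and every rate are hypothetical placeholders, not real AWS prices, and the sketch assumes free backup storage equals the provisioned storage.

```python
from dataclasses import dataclass


@dataclass
class RdsUsage:
    instance_hours: float   # hours the instance ran this month
    provisioned_gb: float   # storage provisioned (billed even if unused)
    provisioned_iops: int   # only billed when using io1
    data_out_gb: float      # data transferred out
    backup_gb: float        # total backup/snapshot storage
    uses_io1: bool = False


@dataclass
class RdsPrices:
    # Hypothetical rates for illustration only.
    per_instance_hour: float
    per_gb_month: float
    per_iops_month: float
    per_gb_out: float
    per_backup_gb_month: float


def monthly_cost(u: RdsUsage, p: RdsPrices) -> float:
    """Add up the billing components named in the notes."""
    cost = u.instance_hours * p.per_instance_hour
    cost += u.provisioned_gb * p.per_gb_month
    if u.uses_io1:
        cost += u.provisioned_iops * p.per_iops_month
    cost += u.data_out_gb * p.per_gb_out
    # Backup storage up to 100% of provisioned storage is assumed free.
    billable_backup = max(0.0, u.backup_gb - u.provisioned_gb)
    cost += billable_backup * p.per_backup_gb_month
    return cost
```

With 50 GB provisioned and 80 GB of backups, only the 30 GB above the provisioned size is charged in this sketch.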

Backups

  • Automated backups: taken daily to S3, retained for 0-35 days (0 disables automated backups)
  • Manual snapshots: exist until deleted
  • Point in time log-based backups: stored on S3

RDS supports manual snapshot-based backups as well as automatic point-in-time recovery-capable backups with a 1- to 35-day retention period.
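
A minimal sketch of what the retention period means for point-in-time recovery, assuming a simplified model in which any moment inside the retention window is restorable. The function name `can_restore_to` is hypothetical and this is not an AWS API.

```python
from datetime import datetime, timedelta


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """Return True if `target` falls inside the automated-backup window."""
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    if retention_days == 0:
        return False  # automated backups are disabled
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= now
```

Manual snapshots are outside this window logic: they exist until deleted.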

graph LR
A(Primary-A DB)
B(Standby DB)
C[S3]
D[S3]
E(Primary-B DB)

A -- synchronous data replication --> B;
B -- daily backup --> C;
B -- manual snapshot --> D;
D -- restore a new instance --> E;

Resiliency multi-AZ

  • RDS can be provisioned in single or multi-AZ mode (multi-AZ keeps a standby instance in a different AZ) -> recover from failure
  • Only primary can be accessed using instance CNAME
  • No performance benefit, but a better RTO than restoring from a snapshot
  • Replication of data is synchronous (copied in real time from primary to standby)
    • Backups are taken from standby, to ensure no performance impact
    • Maintenance is performed on the standby first, which is then promoted to minimize downtime

Read replicas

  • Read replicas are read-only copies of an RDS instance that can be created in the same or a different region from the primary instance
    • can be addressed independently (each having its own DNS name)
    • used for read workloads, allowing reads to scale
    • 1 RDS instance -> 5 read replicas (which can be created from other read replicas)
    • read replicas can be promoted to primary instances and can themselves be multi-AZ
  • Read replicas are eventually consistent (within seconds, but applications need to support it)
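
Because each replica has its own DNS name, the application decides where each statement goes. A minimal sketch of that routing, assuming a hypothetical `EndpointRouter` helper and made-up endpoint names:

```python
import itertools


class EndpointRouter:
    """Send writes to the primary and spread reads over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica exists.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


router = EndpointRouter("primary.example", ["replica-1.example", "replica-2.example"])
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("UPDATE orders SET paid = 1"))
```

Since replicas are eventually consistent, a read that must see a write made moments earlier should still go to the primary.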

SQL — Aurora

Essentials

  • Aurora: enhanced RDS engine built by Amazon, compatible with MySQL and PostgreSQL tools
    • base configuration = cluster
    • cluster contains a single primary instance and 0+ replicas
  • Cluster storage: all instances use the same storage
    • read/write uses cluster endpoint
    • reads can use reader endpoint (balances connections over replicas)
    • volume = SSD based, can scale automatically up to 64TB, billed only for consumed storage
    • replicates data 6 times, across 3 AZs -> improves availability, and replicas can be promoted to primary instance quickly
    • Aurora can tolerate 2 failures without writes being impacted and 3 failures without impacting reads
    • Aurora storage is auto-healing
    • Tier 0 has the highest priority in an Aurora failover
  • Backtrack feature
    • allows you to roll back a database for up to 72 hours
    • you don’t have to create a new cluster when using Aurora’s backtrack feature to restore a database
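
The failure tolerance of the cluster volume can be expressed as a quorum check. The sketch below assumes a write quorum of 4 and a read quorum of 3 out of the 6 copies, which matches the "2 failures for writes, 3 for reads" figures above. The function name is hypothetical.

```python
COPIES = 6         # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # healthy copies needed for writes (assumed from the notes)
READ_QUORUM = 3    # healthy copies needed for reads (assumed from the notes)


def storage_status(failed_copies):
    """Report whether reads and writes survive a number of failed copies."""
    if not 0 <= failed_copies <= COPIES:
        raise ValueError("failed_copies must be between 0 and 6")
    healthy = COPIES - failed_copies
    return {"writes": healthy >= WRITE_QUORUM, "reads": healthy >= READ_QUORUM}


for failed in range(5):
    print(failed, storage_status(failed))
```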

Parallel queries and global

  • Parallel queries: executed across all the nodes of the cluster at the same time
    • Enabled when creating the Aurora cluster
  • Global: Aurora provisioning option which adds resiliency by letting you place a secondary reader cluster in another AWS region
    • 1 primary region, 2nd region for read workloads -> low latency
    • Enabled when creating the Aurora cluster, only for some versions

Serverless

  • Aurora Serverless: based on the same DB engine as Aurora but without up-front resource allocation
    • specify a minimum and maximum number of Aurora Capacity Units (ACUs) - measurement for processing (compute) and memory in Aurora Serverless.
    • can use the Data API to connect to it
    • billing: charges are based on database resources used per second
    • master exists in one Availability Zone
    • capable of rapid scaling because it uses proxy fleets to route the workload to “warm” resources that are always ready to service requests.
    • maximum amount of replicas: 15
    • when to use it
      • should be used when workloads are intermittent and unpredictable
      • slower failover time than Aurora Provisioned
graph LR

A[Applications]
B(Proxy fleet)
C[Aurora Serverless cluster DBs]
D[Pool DBs]

A --> B;
B --> C;
C --> D;
  • Proxy Fleet: fleet of proxy instances that route an application’s queries to a group of automatically scalable resources
  • Query editor: web-based tool that allows you to log in to the Aurora Serverless cluster and execute queries

NoSQL: DynamoDB

Essentials

  • Elements

    • DynamoDB: NoSQL DB service, 3 replicas of data
    • Table: collection of items that share the same partition key (PK) or partition+sort key (SK), together with other configuration and performance settings
    • Item: collection of attributes (up to 400KB in size) inside a table that shares the same key structure as every other item in the table
    • Attribute: key-value pair
  • Query: selects items by PK, optionally narrowed by SK; efficient

  • Scan: checks all items, not efficient

    • Filters: applied to scan
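
The difference between Query and Scan can be sketched with an in-memory stand-in for a table. `ToyTable` is a hypothetical teaching model, not the DynamoDB API. It only shows that a query touches one partition while a scan reads everything and filters afterwards.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # pk -> {sk: item}

    def put(self, pk, sk, **attrs):
        self._partitions[pk][sk] = {"pk": pk, "sk": sk, **attrs}

    def query(self, pk, sk_prefix=""):
        """Touch a single partition; optionally narrow by sort-key prefix."""
        items = self._partitions.get(pk, {})
        return [items[sk] for sk in sorted(items) if sk.startswith(sk_prefix)]

    def scan(self, predicate=lambda item: True):
        """Read every item, then apply the filter to what was read."""
        examined = [i for part in self._partitions.values() for i in part.values()]
        return [i for i in examined if predicate(i)], len(examined)


table = ToyTable()
table.put("user#1", "order#001", total=10)
table.put("user#1", "order#002", total=25)
table.put("user#2", "order#001", total=5)
print(table.query("user#1", "order#"))
print(table.scan(lambda item: item["total"] > 8))
```

The second value returned by `scan` is the number of items examined, which is what makes scans expensive even when the filter keeps few results.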

Performance

  • 2 read/write capacity modes

    • provisioned throughput (default)
      • each table is configured with Read Capacity Units (RCU) and Write Capacity Units (WCU)
      • every operation on items consumes at least 1 RCU or WCU (partial RCU/WCU cannot be consumed)
    • on-demand mode (automatically scales to handle performance demands)
  • Consistency

    • 200 status code = write has been completed and is durable
    • strongly consistent reads ensure DynamoDB returns the most up-to-date copy of data
    • strongly consistent reads are served from the leader node
  • Capacity modes

    • On-demand
    • Provisioned
    • Provisioned with Auto Scaling
  • Capacity units (from the Linux Academy how-to guide “How to Calculate Read and Write Capacity for DynamoDB”)

    • Read Capacity Units (RCU)
      • RCU = (item size rounded up to the next multiple of 4 KB / 4 KB) * number of items read per second
      • 1 RCU is 4 KB of data read from a table per second in a strongly consistent way
      • Reading 2 KB consumes 1 RCU
      • if eventually consistent reads are OK, 1 RCU can allow for 2x4 KB of data reads per second
      • atomic transactions require 2x the RCU to complete
    • Write Capacity Units (WCU)
      • WCU = (item size rounded up to the next multiple of 1 KB / 1 KB) * number of items written per second
      • 1 WCU is 1 KB of data or less written to a table per second
      • Writing 200 bytes consumes 1 WCU
      • atomic transactions require 2x the WCU to complete
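
The capacity rules above can be turned into a small calculator. This is a sketch of the arithmetic in these notes only. The function names and the `consistency` parameter are illustrative choices, not part of any AWS SDK.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, consistency="strong"):
    """RCUs needed: 4 KB blocks per item, rounded up, times reads per second."""
    units_per_read = max(1, math.ceil(item_size_kb / 4))
    total = units_per_read * reads_per_second
    if consistency == "strong":
        return total
    if consistency == "eventual":
        return math.ceil(total / 2)   # 1 RCU covers 2 x 4 KB reads
    if consistency == "transactional":
        return total * 2              # atomic transactions cost double
    raise ValueError("consistency must be strong, eventual or transactional")


def write_capacity_units(item_size_kb, writes_per_second, transactional=False):
    """WCUs needed: 1 KB blocks per item, rounded up, times writes per second."""
    units_per_write = max(1, math.ceil(item_size_kb))
    total = units_per_write * writes_per_second
    return total * 2 if transactional else total


print(read_capacity_units(6, 10))               # 6 KB items, 10 reads/s
print(read_capacity_units(6, 10, "eventual"))
print(write_capacity_units(2.5, 4))             # 2.5 KB items, 4 writes/s
```

For example, a 6 KB item rounds up to two 4 KB blocks, so 10 strongly consistent reads per second need 20 RCU, and half that when eventual consistency is acceptable.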

Elements

  • Streams: provide an ordered list of changes that occur to items within a DynamoDB table

    • rolling 24-hour window of changes
    • enabled per table (only data from the point of being enabled)
    • has an ARN that identifies it globally across all tables, accounts and regions
    • View types
      • KEYS_ONLY: whenever an item is added, updated or deleted, the keys of the item are added to the stream
      • NEW_IMAGE: the entire item is added to the stream “post-change”
      • OLD_IMAGE: the entire item is added to the stream “pre-change”
      • NEW_AND_OLD_IMAGES: both the new and old versions of the item are added to the stream
  • Triggers:

    • streams can be integrated with AWS Lambda, invoking a function whenever items are changed in a DynamoDB table (a DB trigger)
graph LR

A[Terminal]
B[Dynamo DB table]
C[Dynamo DB Stream Records]
D((AWS lambda))

A --> B;
B --> C;
C --> D;
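
A minimal sketch of a Lambda function acting as such a trigger. It assumes the usual stream event shape (`Records`, `eventName`, and a `dynamodb` section with `Keys`, `NewImage`, `OldImage`). Which images are present depends on the view type chosen for the stream. The sample event at the bottom is made up for illustration.

```python
def handler(event, context):
    """Summarise each change delivered by a DynamoDB stream."""
    changes = []
    for record in event.get("Records", []):
        data = record["dynamodb"]
        changes.append({
            "action": record["eventName"],   # INSERT, MODIFY or REMOVE
            "keys": data["Keys"],            # present for every view type
            "old": data.get("OldImage"),     # OLD_IMAGE / NEW_AND_OLD_IMAGES
            "new": data.get("NewImage"),     # NEW_IMAGE / NEW_AND_OLD_IMAGES
        })
    return changes


# Hypothetical event, as a NEW_IMAGE stream might deliver it.
sample_event = {
    "Records": [{
        "eventName": "MODIFY",
        "dynamodb": {
            "Keys": {"pk": {"S": "user#1"}},
            "NewImage": {"pk": {"S": "user#1"}, "total": {"N": "25"}},
        },
    }]
}
print(handler(sample_event, None))
```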
  • Indexes: provide an alternative representation of data in a table, which is useful for applications with varying query demands
    • 2 forms
      • Local Secondary Indexes (LSI)
        • created at the same time as the table
        • same PK, an alternative SK
        • share the RCU and WCU values for the main table
        • maximum: 5
      • Global Secondary Indexes (GSI)
        • can be created after the table was created; data is replicated asynchronously from the table
        • different PK and SK
        • have their own RCU and WCU values
        • maximum number (without logging a support ticket) of GSIs per table: 20
    • interacted with as though they are tables (an alternative representation of the table)
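
The LSI/GSI differences show up directly in a table definition. The dictionary below follows the parameter names of the DynamoDB CreateTable request. The table, index and attribute names and the capacity numbers are hypothetical.

```python
table_definition = {
    "TableName": "Orders",  # hypothetical table
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "order_status", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # PK
        {"AttributeName": "order_id", "KeyType": "RANGE"},     # SK
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    # LSI: same PK, alternative SK, no throughput of its own.
    "LocalSecondaryIndexes": [{
        "IndexName": "by_date",
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # GSI: different PK and SK, with its own RCU and WCU values.
    "GlobalSecondaryIndexes": [{
        "IndexName": "by_status",
        "KeySchema": [
            {"AttributeName": "order_status", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 2, "WriteCapacityUnits": 2},
    }],
}
```

With boto3 installed and credentials configured, a definition like this could be passed as `boto3.client("dynamodb").create_table(**table_definition)`. That call is not run here.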

In-memory caching

  • DynamoDB Accelerator (DAX): in-memory cache designed specifically for DynamoDB

    • results delivered from DAX are available in microseconds rather than in the single-digit milliseconds available from DynamoDB
    • can use a cluster architecture, run inside VPC, applications use a DAX client
    • 2 distinct caches
      • item cache
        • stores results from GetItem and BatchGetItem
        • has a 5-minute default TTL
      • query cache
        • stores results from Query and Scan
        • caches based on the parameters specified
  • ElastiCache: managed in-memory data store supporting the Redis or Memcached engines, for large sets of data with repeated read patterns

    • 2 use cases
      • offloading database reads by caching responses
        • improving application speed and reducing costs
      • storing user session data
        • allowing for stateless compute instances (used for fault tolerant architectures)
    • typically used with key-value databases or to store simple session data, but it can also be used with SQL database engines
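
Both DAX and ElastiCache follow the same basic idea: answer repeated reads from memory and only go to the database on a miss. A minimal sketch of that pattern with a TTL, where `TtlCache` and `read_through` are hypothetical stand-ins for the real services and the 300-second default mirrors the 5-minute item cache TTL mentioned above:

```python
import time


class TtlCache:
    """Tiny in-memory cache standing in for DAX or ElastiCache."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] <= self.clock():
            self._store.pop(key, None)  # drop missing or expired entries
            return None
        return entry[1]

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def read_through(cache, key, load_from_db):
    """Return the cached value, loading from the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value
```

Every hit avoids a database read, which is where the speed and cost benefits come from. The TTL bounds how stale a cached value can get.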