AWS Solutions Architect 6 - Database
Models
- RDBMS
- Relational database management systems (RDBMS): data has formal, fixed relationships
- data is stored in rows -> each row is made up of individual attributes
- Tables have schemas (define row layout)
- RDBMS conform to the ACID model (Atomicity, Consistency, Isolation, Durability) - see the sketch at the end of this section
- High performance
- Low scalability
- Structured Query Language (SQL) is used for RDBMS
- Non-relational: NoSQL (e.g. for social media, data warehousing, analytics)
- Elements
- Key/value: fast queries, no relationships
- Document: structure of key/value pairs. Operations are highly performant
- Column: data is stored in columns rather than rows (Amazon Redshift)
- Graph: designed for dynamic relationships. Data = nodes (Neo4j)
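To make the ACID point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a toy stand-in for an RDBMS: a fixed schema, rows of individual attributes, and an atomic transaction (the table and values are illustrative only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

try:
    with conn:  # commits on success, rolls back on error (Atomicity)
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # if anything failed, neither UPDATE was applied

print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 70), (2, 80)]
```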
SQL — RDS
Definition
- RDS = Database as a Service (DBaaS) -> a fully functional DB without the admin overhead
- performs at scale
- can be publicly accessible
- can be configured for demanding availability and durability scenarios
- Engines
- MySQL
- PostgreSQL
- MariaDB
- Oracle
- Microsoft SQL
```mermaid
graph LR
    A(DB instance CNAME)
    subgraph VPC1-region1
        B[Standby - DB storage - AZ1]
        C[Primary - DB storage - AZ2]
    end
    D[S3]
    subgraph VPC2-region2
        E[Read replica - DB - AZ1]
    end
    A --> C
    B --> C
    C --> B
    B --> D
    C --> E
```
- 1 or more AZs for resilience
- Instance types
  - general purpose (DB.M4, DB.M5)
  - memory optimized (DB.R4, DB.R5, Oracle DB.X1 and DB.X1e)
  - burstable (DB.T2 and DB.T3)
- Storage types
- general purpose SSD (gp2)
- provisioned IOPS SSD (io1): IOPS configured independently of storage size
- Billing based on
- instance size
- provisioned storage (billed whether used or not)
- IOPS if using io1
- Data transferred out
- any backup/snapshot storage beyond the free allocation (100% of the DB instance's provisioned storage)
- RDS supports encryption with limitations
- configured when creating DB instances
- can be added later by taking a snapshot, copying it with encryption enabled, and creating a new instance from the encrypted snapshot
- encryption cannot be removed once enabled
- read replicas must be in the same encryption state as the primary instance (encrypted or not)
- encrypted snapshots can be copied between regions, but a new KMS CMK from the destination region must be used, as CMKs are region specific (see the sketch below)
- Network access to an RDS instance is controlled by a security group (SG) associated with the RDS instance
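A minimal boto3 sketch of the cross-region snapshot copy described above. The snapshot names, account ID and key ARN are placeholders; the point is that KmsKeyId must reference a CMK that lives in the destination region.

```python
import boto3

# Client in the destination region
rds = boto3.client("rds", region_name="eu-west-1")

rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:111122223333:snapshot:prod-snap",
    TargetDBSnapshotIdentifier="prod-snap-copy",
    KmsKeyId="arn:aws:kms:eu-west-1:111122223333:key/example-key-id",  # destination-region CMK
    SourceRegion="us-east-1",  # lets boto3 presign the cross-region copy request
)
```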
Backups
- Automated backups: stored in S3, occur daily, retained for 0-35 days
- Manual snapshots: exist until deleted
- Point in time log-based backups: stored on S3
In summary: RDS supports manual snapshot-based backups as well as automated backups that enable point-in-time recovery, with a 0-35 day retention period (0 disables automated backups).
```mermaid
graph LR
    A(Primary-A DB)
    B(Standby DB)
    C[S3]
    D[S3]
    E(Primary-B DB)
    A -- synchronous data replication --> B
    B -- daily backup --> C
    B -- manual snapshot --> D
    D -- restore a new instance --> E
```
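A minimal boto3 sketch of the manual snapshot and restore flow in the diagram; the instance and snapshot identifiers are hypothetical. Note that a restore always creates a new instance rather than overwriting the original.

```python
import boto3

rds = boto3.client("rds")

# Manual snapshot: kept until explicitly deleted
rds.create_db_snapshot(
    DBInstanceIdentifier="prod-db",
    DBSnapshotIdentifier="prod-db-manual-snap",
)

# Restoring creates a brand-new instance (Primary-B in the diagram)
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="prod-db-restored",
    DBSnapshotIdentifier="prod-db-manual-snap",
)
```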
Resiliency multi-AZ
- RDS can be provisioned in single-AZ or multi-AZ mode (multi-AZ adds a standby instance in a different AZ) -> recover from failure
- Only the primary can be accessed using the instance CNAME
- No performance benefit, but a better RTO than restoring from a snapshot
- Replication of data is synchronous (data is committed to the standby in real time as it is written to the primary)
- Backups are taken from standby, to ensure no performance impact
- Maintenance is performed on the standby first, which is then promoted, to minimize downtime
Read replicas
- Read replicas are read-only copies of an RDS instance that can be created in the same or a different region from the primary instance
- can be addressed independently (each has its own DNS name)
- used for read workloads, allowing scaling reads
- 1 RDS instance -> up to 5 read replicas (which can themselves be created from other read replicas)
- read replicas can be promoted to primary instances and can be themselves multi-AZ
- Read replicas are eventually consistent (usually within seconds, but applications need to support this)
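A minimal boto3 sketch of creating a cross-region read replica; the identifiers and instance class are placeholders. For a replica in another region, the source is referenced by ARN.

```python
import boto3

# Client in the region where the replica will live
rds = boto3.client("rds", region_name="eu-west-1")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="prod-db-replica-1",
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:111122223333:db:prod-db",
    DBInstanceClass="db.r5.large",
)
```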
SQL — Aurora
Essentials
- Aurora is an enhanced RDS engine from Amazon, compatible with MySQL and PostgreSQL tools
- base configuration = cluster
- cluster contains a single primary instance and 0+ replicas
- Cluster storage: all instances use the same storage
- read/write uses the cluster endpoint
- reads can use the reader endpoint (balances connections over the replicas)
- volume = SSD based, scales automatically up to 64TB; you are billed only for consumed data
- data is replicated 6 times across 3 AZs -> improves availability; a replica can be promoted to primary quickly
- Aurora can tolerate 2 failures without writes being impacted and 3 failures without impacting reads
- Aurora storage is auto-healing
- Tier 0 has the highest priority in an Aurora failover
- Backtrack feature
- allows you to roll back a database for up to 72 hours
- you don't have to make a new cluster when using Aurora's backtrack feature to restore a database (see the sketch below)
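A minimal boto3 sketch of backtracking a cluster in place; the cluster name is a placeholder, and backtrack must have been enabled when the (MySQL-compatible) cluster was created.

```python
from datetime import datetime, timedelta, timezone

import boto3

rds = boto3.client("rds")

# Rewind the existing cluster 30 minutes; no new cluster is created
rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",
    BacktrackTo=datetime.now(timezone.utc) - timedelta(minutes=30),
)
```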
Parallel queries and global
- Parallel queries: executed across all the nodes of the cluster at the same time
- Activated when creating the Aurora cluster
- Global: Aurora provisioning option which adds resiliency by allowing you to pick amongst all AWS regions as your secondary reader cluster
- 1 primary region, plus secondary region(s) for read workloads -> low latency
- Activated when creating the Aurora cluster; only available for some engine versions
Serverless
- Aurora Serverless: based on the same DB engine as Aurora but without fixed resource allocation
- specify a minimum and maximum number of Aurora Capacity Units (ACUs) - the unit of processing (compute) and memory in Aurora Serverless
- can use the Data API to connect to it (see the sketch after this section)
- billing: charges are based on database resources used per second
- master exists in one Availability Zone
- capable of rapid scaling because it uses proxy fleets to route the workload to “warm” resources that are always ready to service requests.
- maximum amount of replicas: 15
- when to use it
- should be used when workloads are intermittent and unpredictable
- slower failover time than Aurora Provisioned
```mermaid
graph LR
    A[Applications]
    B(Proxy fleet)
    C[Aurora Serverless cluster DBs]
    D[Pool DBs]
    A --> B
    B --> C
    C --> D
```
- Proxy fleet: a fleet of proxy instances that route an application's queries to a group of automatically scalable resources
- Query editor: web-based tool that allows you to log in to the Aurora Serverless cluster and execute queries
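A minimal boto3 sketch of the Data API mentioned above; the cluster ARN, Secrets Manager ARN, database and table names are all placeholders. No persistent DB connection is needed.

```python
import boto3

rds_data = boto3.client("rds-data")

response = rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:111122223333:cluster:my-serverless-cluster",
    secretArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:db-creds",
    database="app",
    sql="SELECT id, name FROM users WHERE id = :id",
    parameters=[{"name": "id", "value": {"longValue": 42}}],
)
print(response["records"])
```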
NoSQL: DynamoDB
Essentials
Elements
- DynamoDB: NoSQL DB service, 3 replicas of data
- Table: collection of items that share the same partition key (PK) or partition+sort key (SK), along with other configuration and performance settings
- Item: a collection of attributes (up to 400KB in size) inside a table that shares the same key structure as every other item in the table
- Attribute: key-value pair
Query: filters based on the PK (and optionally the SK), efficient
Scan: checks all items, not efficient
- Filters: applied to a scan's results after the items are read (see the sketch below)
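A short boto3 sketch contrasting the two operations; the table name, keys and attributes are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("orders")

# Query: targets a single partition via the key -> efficient
orders = table.query(KeyConditionExpression=Key("customer_id").eq("c-123"))

# Scan: reads every item; the filter is applied *after* the read,
# so capacity is consumed for all scanned items, not just the matches
expensive = table.scan(FilterExpression=Attr("total").gt(100))
```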
Performance
2 read/write capacity modes
- provisioned throughput (default)
- each table is configured with Read Capacity Units (RCU) and Write Capacity Units (WCU)
- capacity units are consumed by every operation on items
- on-demand mode: scales automatically to handle performance demands; each operation consumes at least 1 RCU or WCU (partial units cannot be consumed)
Consistency
- a 200 status code on a write = the write has been completed and is durable
- strongly consistent reads ensure DynamoDB returns the most up-to-date copy of the data
- reads are served from the leader node when strongly consistent reads are requested (see the sketch below)
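A minimal sketch of forcing a strongly consistent read with boto3; the table and key names are hypothetical.

```python
import boto3

table = boto3.resource("dynamodb").Table("orders")

# Default reads are eventually consistent; ConsistentRead=True reads from
# the leader node and costs a full RCU per 4 KB
item = table.get_item(
    Key={"customer_id": "c-123", "order_id": "o-1"},
    ConsistentRead=True,
)["Item"]
```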
Capacity modes
- On-demand
- Provisioned
- Provisioned with Auto Scaling
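A minimal boto3 sketch showing where the capacity mode is chosen, at table creation; the table definition is illustrative.

```python
import boto3

ddb = boto3.client("dynamodb")

ddb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "customer_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "customer_id", "KeyType": "HASH"}],
    # Provisioned mode with fixed RCU/WCU; use BillingMode="PAY_PER_REQUEST"
    # (and drop ProvisionedThroughput) for on-demand mode instead
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```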
Capacity unit calculations (see: "How to Calculate Read and Write Capacity for DynamoDB" - Linux Academy)
- Read Capacity Units (RCU)
RCUs = (item size, rounded up to the next 4KB multiple / 4KB) * number of items
- 1 RCU = 4 KB of data read from a table per second in a strongly consistent way
- reading a 2 KB item consumes 1 RCU (rounded up to 4 KB)
- if eventually consistent reads are OK, 1 RCU allows 2 x 4 KB of data reads per second
- atomic transactions require 2x the RCU to complete
- Write Capacity Units (WCU)
WCUs = (item size, rounded up to the next 1KB multiple / 1KB) * number of items
- 1 WCU = 1 KB of data (or less) written to a table per second
- writing a 200-byte item consumes 1 WCU (rounded up to 1 KB)
- atomic transactions require 2x the WCU to complete
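The two formulas above can be expressed as a small helper; a sketch assuming whole capacity units and per-second item rates.

```python
import math

def rcus(item_size_kb: float, items_per_second: int, eventually_consistent: bool = False) -> int:
    """RCUs needed: item size rounded up to the next 4 KB; halved for eventually consistent reads."""
    units = math.ceil(item_size_kb / 4) * items_per_second
    return math.ceil(units / 2) if eventually_consistent else units

def wcus(item_size_kb: float, items_per_second: int) -> int:
    """WCUs needed: item size rounded up to the next 1 KB."""
    return math.ceil(item_size_kb) * items_per_second

print(rcus(2, 1))   # 1 -> one strongly consistent read of a 2 KB item
print(wcus(0.2, 1)) # 1 -> writing 200 bytes still consumes a whole WCU
```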
Elements
Streams: provide an ordered list of changes that occur to items within a DynamoDB table
- a rolling 24-hour window of changes
- enabled per table (only data from the point of being enabled)
- has an ARN that identifies it globally across all tables, accounts and regions
- View types
- KEYS_ONLY: whenever an item is added, updated or deleted, the keys of the item are added to the stream
- NEW_IMAGE: the entire item is added to the stream “post-change”
- OLD_IMAGE: the entire item is added to the stream “pre-change”
- NEW_AND_OLD_IMAGES: both the new and old versions of the item are added to the stream
Triggers:
- streams can be integrated with AWS Lambda, invoking a function whenever items change in a DynamoDB table (a DB trigger) - see the sketch after the diagram
```mermaid
graph LR
    A[Terminal]
    B[DynamoDB table]
    C[DynamoDB stream records]
    D((AWS Lambda))
    A --> B
    B --> C
    C --> D
```
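A minimal sketch of the Lambda handler on the receiving end of such a trigger, assuming the stream uses the NEW_AND_OLD_IMAGES view type.

```python
# Hypothetical handler wired to a DynamoDB stream via an event source mapping;
# each invocation receives an ordered batch of change records
def lambda_handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "MODIFY":
            old = record["dynamodb"].get("OldImage", {})
            new = record["dynamodb"].get("NewImage", {})
            print(f"item changed: {old} -> {new}")
    return {"batchSize": len(event["Records"])}
```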
- Indexes: provide an alternative representation of the data in a table, which is useful for applications with varying query demands
- 2 forms
- Local Secondary Indexes (LSI)
- created at the same time as the table
- same PK, with an alternative SK
- share the RCU and WCU values for the main table
- maximum: 5
- Global Secondary Indexes (GSI)
- can be created after the table exists; data is replicated into the index asynchronously
- different PK and SK
- have their own RCU and WCU values
- maximum number of GSIs per table (without logging a support ticket): 30
- indexes are interacted with as though they are tables (an alternative representation of the table) - see the sketch below
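A minimal boto3 sketch of adding a GSI to an existing table, something an LSI cannot do; the table, attribute and index names are hypothetical.

```python
import boto3

ddb = boto3.client("dynamodb")

# GSIs can be created after the table exists and get their own throughput
ddb.update_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "status", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "status-index",
            "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)
```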
In-memory caching
DynamoDB Accelerator (DAX): in-memory cache designed specifically for DynamoDB
- results delivered from DAX are available in microseconds rather than in the single-digit milliseconds available from DynamoDB
- can use a cluster architecture, run inside VPC, applications use a DAX client
- 2 distinct caches
- item cache
  - stores results from GetItem and BatchGetItem
  - has a 5-minute default TTL
- query cache
  - stores results from Query and Scan
  - caches based on the parameters specified
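A minimal sketch using the amazondax Python client, which the DAX samples present as a drop-in replacement for the boto3 DynamoDB resource; the endpoint and table names are placeholders.

```python
import amazondax
from boto3.dynamodb.conditions import Key

dax = amazondax.AmazonDaxClient.resource(
    endpoint_url="dax://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("orders")

# GetItem results land in the item cache (5-minute default TTL)
item = table.get_item(Key={"customer_id": "c-123"})

# Query results land in the query cache, keyed on the query parameters
results = table.query(KeyConditionExpression=Key("customer_id").eq("c-123"))
```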
ElastiCache: managed in-memory data store supporting the Redis or Memcached engines, for large sets of data with repeated read patterns
- 2 use cases
  - offloading database reads by caching responses
    - improves application speed and reduces costs
  - storing user session data
    - allows for stateless compute instances (used for fault-tolerant architectures)
- typically used with key/value data or to store simple session data, but it can also sit in front of SQL database engines (see the sketch below)
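A minimal redis-py sketch of both use cases, session storage and cache-aside read offloading; the endpoint, keys and TTLs are illustrative.

```python
import json
import redis

# Placeholder endpoint for an ElastiCache Redis primary
r = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com",
                port=6379, decode_responses=True)

# Session storage: state lives outside the instance -> stateless compute
r.setex("session:abc-123", 3600, json.dumps({"user_id": 42, "cart": ["sku-1"]}))

# Cache-aside read offloading for a repeated DB query
cached = r.get("query:top-products")
if cached is None:
    rows = [{"sku": "sku-1", "sales": 900}]  # stand-in for a real SQL query
    r.setex("query:top-products", 300, json.dumps(rows))
else:
    rows = json.loads(cached)
```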