AWS Solutions Architect 6 - Databases
Models
- RDBMS
- Relational database management systems (RDBMS): data has formal and fixed relationships
- data stored in rows -> individual attributes
 - Tables have schemas (define row layout)
 
 - RDBMS conform to the ACID properties (Atomicity, Consistency, Isolation, Durability)
- High performance
 - Low scalability
 
 - Structured Query Language (SQL) is used for RDBMS
 
 - Non-relational: NoSQL (e.g. for social media, data warehousing, analytics)
- Elements
- Key/value: fast queries, no relationships
 - Document: structure of key/value pairs. Operations are highly performant
 - Column: data is stored in columns rather than rows (Amazon Redshift)
 - Graph: designed for dynamic relationships. Data = nodes (Neo4j)
 
 
 
SQL — RDS
Definition
- RDS = Database as a Service (DBaaS) -> fully functional DB without admin overhead
- performs at scale
 - can be publicly accessible
 - can be configured for demanding availability and durability scenarios
 
 - Engines
- MySQL
 - PostgreSQL
 - MariaDB
 - Oracle
 - Microsoft SQL Server
 
 
graph LR
  A(DB instance CNAME)
  subgraph VPC1-region1
    B[Standby - DB storage - AZ1]
    C[Primary - DB storage - AZ2]
  end
  D[S3]
  subgraph VPC2-region2
    E[Read replica - DB - AZ1]
  end
  A --> C
  B --> C
  C --> B
  B --> D
  C --> E
- 1 or more AZs for resilience
- Instance types
 - general purpose (DB.M4, DB.M5)
 - memory optimized (DB.R4, DB.R5, Oracle DB.X1 and DB.X1e)
 - burstable (DB.T2 and DB.T3)
 
 - Storage types
- general purpose SSD (gp2)
 - provisioned IOPS SSD (io1): IOPS configured independently of storage size
 
 - Billing based on
- instance size
 - provisioned storage (billed even if not used)
 - IOPS if using io1
 - Data transferred out
 - any backups/snapshots beyond the 100% of provisioned storage that is free with each DB instance
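As a rough illustration of how these line items combine, a back-of-the-envelope monthly estimate might look like this; all rates below are hypothetical placeholders, not real AWS prices.

```python
# Illustrative RDS monthly bill estimate.
# All rates are hypothetical placeholders, NOT real AWS prices.
HOURLY_INSTANCE_RATE = 0.17    # hypothetical db.m5-class rate, USD per hour
GB_MONTH_STORAGE_RATE = 0.115  # hypothetical gp2 rate, USD per GB-month
HOURS_PER_MONTH = 730

def estimate_monthly_cost(provisioned_gb: float) -> float:
    """Instance time + provisioned storage (billed whether used or not)."""
    instance_cost = HOURLY_INSTANCE_RATE * HOURS_PER_MONTH
    storage_cost = GB_MONTH_STORAGE_RATE * provisioned_gb
    return round(instance_cost + storage_cost, 2)

print(estimate_monthly_cost(100))
```

Note the storage term depends on what is provisioned, not what is consumed, which is why over-provisioning gp2 volumes shows up directly on the bill.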
 
 - RDS supports encryption with limitations
- configured when creating DB instances
 - added by taking snapshots, or creating new instance from encrypted snapshot
 - encryption cannot be removed
 - read replicas need to be in the same state as the primary instance (encrypted or not)
 - encrypted snapshots can be copied between regions, but a new KMS CMK in the destination region is used (as keys are region-specific)
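A hedged boto3 sketch of the cross-region copy point: the request must name a KMS key that lives in the destination region. All identifiers below are hypothetical placeholders.

```python
# Parameters for copying an encrypted RDS snapshot into another region.
# All ARNs and identifiers are hypothetical placeholders.
copy_params = {
    "SourceDBSnapshotIdentifier": "arn:aws:rds:us-east-1:111122223333:snapshot:prod-snap",
    "TargetDBSnapshotIdentifier": "prod-snap-eu-copy",
    # KMS CMKs are region-specific, so a key in the destination region is required:
    "KmsKeyId": "arn:aws:kms:eu-west-1:111122223333:key/0000-example",
    "SourceRegion": "us-east-1",
}

# The actual call would be made from a client in the destination region, e.g.:
# import boto3
# boto3.client("rds", region_name="eu-west-1").copy_db_snapshot(**copy_params)
```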
 
 - Network access to an RDS instance is controlled by a security group (SG) associated with the RDS instance
 
Backups
- Automated backups: to S3, occur daily, retained for 0-35 days
 - Manual snapshots: exist until deleted
 - Point in time log-based backups: stored on S3
 
RDS supports manual snapshot-based backups as well as automatic point-in-time recovery-capable backups with a 1- to 35-day retention period.
graph LR
  A(Primary-A DB)
  B(Standby DB)
  C[S3]
  D[S3]
  E(Primary-B DB)
  A -- synchronous data replication --> B
  B -- daily backup --> C
  B -- manual snapshot --> D
  D -- restore a new instance --> E
Resiliency multi-AZ
- RDS can be provisioned in single-AZ or multi-AZ mode (standby instance in the same or a different AZ) -> recover from failure
 - Only the primary can be accessed using the instance CNAME
 - No performance benefit, but a better RTO than restoring from a snapshot
 - Replication of data is synchronous (copied in real time from primary to standby)
- Backups are taken from the standby, to ensure no performance impact
 - Maintenance is performed on the standby first, which is then promoted to minimize downtime
 
 
Read replicas
- Read replicas are read-only copies of an RDS instance that can be created in the same or a different region from the primary instance
- can be addressed independently (each has its own DNS name)
 - used for read workloads, allowing reads to scale
 - 1 RDS instance -> 5 read replicas (which can be created from other read replicas)
 - read replicas can be promoted to primary instances and can themselves be multi-AZ
 
 - Read replicas are eventually consistent (within seconds, but applications need to support it)
 
SQL — Aurora
Essentials
- Aurora: enhanced RDS engine by Amazon, compatible with MySQL and PostgreSQL tools
- base configuration = cluster
 - cluster contains a single primary instance and 0+ replicas
 
 - Cluster storage: all instances use the same storage
- reads/writes use the cluster endpoint
 - reads can use the reader endpoint (balances connections over the replicas)
 - volume = SSD based, can scale automatically up to 64 TB, billed only for consumed data
 - replicates data 6 times, across 3 AZs -> improve availability, be promoted to primary instance quickly
 - Aurora can tolerate 2 failures without writes being impacted and 3 failures without impacting reads
 - Aurora storage is auto-healing
 - Tier 0 has the highest priority in an Aurora failover
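To make the endpoint distinction concrete, the two DNS names for a cluster look roughly like this (cluster name and identifiers are hypothetical):

```python
# Hypothetical Aurora endpoint names for a cluster called "app-db".
# The cluster endpoint always points at the primary (read/write); the
# reader endpoint load-balances read-only connections across replicas.
cluster_endpoint = "app-db.cluster-c1abc2def3gh.us-east-1.rds.amazonaws.com"
reader_endpoint = "app-db.cluster-ro-c1abc2def3gh.us-east-1.rds.amazonaws.com"

def endpoint_for(operation: str) -> str:
    """Route writes to the primary, reads to the replica pool."""
    return cluster_endpoint if operation == "write" else reader_endpoint
```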
 
 - Backtrack feature
- allows you to roll back a database by up to 72 hours
 - you don't have to create a new cluster when using Aurora's backtrack feature to restore a database
 
 
Parallel queries and global
- Parallel queries: executed across all the nodes of the cluster at the same time
- activated when creating the Aurora cluster
 
- Global: Aurora provisioning option which adds resiliency by letting you pick any AWS region for a secondary reader cluster
- 1 primary region, secondary regions for read workloads -> low latency
 - activated when creating the Aurora cluster, only for some versions
 
 
Serverless
- Aurora Serverless: based on the same DB engine as Aurora but without fixed resource allocation
- specify a minimum and maximum number of Aurora Capacity Units (ACUs) - the measurement of processing (compute) and memory in Aurora Serverless
 - can use the Data API to connect to it
 - billing: charges are based on database resources used per second
 - master exists in one Availability Zone
 - capable of rapid scaling because it uses proxy fleets to route the workload to “warm” resources that are always ready to service requests.
 - maximum number of replicas: 15
 - when to use it
- should be used when workloads are intermittent and unpredictable
 - slower failover time than Aurora Provisioned
 
 
 
graph LR
  A[Applications]
  B(Proxy fleet)
  C[Aurora Serverless cluster DBs]
  D[Pool DBs]
  A --> B
  B --> C
  C --> D
- Proxy fleet: fleet of proxy instances that route an application's queries to a group of automatically scalable resources
 - Query editor: web-based tool that allows you to log in to the Aurora Serverless cluster and execute queries
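Since charges accrue per second of consumed capacity, a back-of-the-envelope calculation looks like this; the ACU-hour rate is a hypothetical placeholder, not a real AWS price.

```python
# Illustrative Aurora Serverless charge from ACU-seconds consumed.
# The rate is a hypothetical placeholder, NOT a real AWS price.
ACU_HOUR_RATE = 0.06  # hypothetical USD per ACU-hour

def charge_usd(acu_seconds: float) -> float:
    """Billing is per second: convert ACU-seconds to ACU-hours, then price."""
    return round(ACU_HOUR_RATE * acu_seconds / 3600, 4)

# e.g. a cluster that ran at 8 ACUs for one hour, then scaled to zero:
print(charge_usd(8 * 3600))
```

This is why intermittent, unpredictable workloads suit Serverless: idle periods contribute zero ACU-seconds to the bill.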
 
NoSQL: DynamoDB
Essentials
Elements
- DynamoDB: NoSQL DB service, 3 replicas of data
 - Table: collection of items that share the same partition key (PK) or partition+sort key (PK+SK), with other configuration and performance settings
 - Item: a collection of attributes (up to 400 KB in size) inside a table that shares the same key structure as every other item in the table
 - Attribute: key-value pair
 
Query: filters based on PK or PK+SK; efficient
Scan: checks all items in the table; not efficient
- Filters: can be applied to a scan (after the items are read)
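A sketch of what the two request shapes look like at the API level (table and attribute names are hypothetical): a Query narrows by key before reading, while a Scan reads everything and only then applies the filter.

```python
# Hypothetical DynamoDB request parameters (low-level API shape).
# Query: the key condition limits which items are read -> efficient.
query_params = {
    "TableName": "Orders",
    "KeyConditionExpression": "pk = :p AND begins_with(sk, :prefix)",
    "ExpressionAttributeValues": {
        ":p": {"S": "user#42"},
        ":prefix": {"S": "2024-"},
    },
}

# Scan: every item is read (and billed); the filter only trims the response.
scan_params = {
    "TableName": "Orders",
    "FilterExpression": "amount > :min",
    "ExpressionAttributeValues": {":min": {"N": "100"}},
}
```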
 
Performance
2 read/write capacity modes
- provisioned throughput (default)
- each table is configured with Read Capacity Units (RCU) and Write Capacity Units (WCU)
 - every operation on items consumes capacity units
 
 - on-demand mode: scales automatically to handle performance demands; each operation consumes at least 1 RCU or WCU (partial units cannot be consumed)
 
 Consistency
- a 200 status code means the write has completed and is durable
 - strongly consistent reads ensure DynamoDB returns the most up-to-date copy of the data
 - strongly consistent reads are served from the leader node
 
Capacity modes
- On-demand
 - Provisioned
 - Provisioned with Auto Scaling
 
Capacity units (see "How to Calculate Read and Write Capacity for DynamoDB" – Linux Academy)
- Read Capacity Units (RCU)
(item size, rounded up to the next 4 KB multiple / 4 KB) * number of items
 - 1 RCU = 4 KB of data read from a table per second in a strongly consistent way
 - reading a 2 KB item consumes 1 RCU (size rounds up to 4 KB)
 - if eventually consistent reads are OK, 1 RCU can allow for 2x4 KB of data reads per second
 - atomic transactions require 2x the RCU to complete
 
 - Write Capacity Units (WCU)
(item size, rounded up to the next 1 KB multiple / 1 KB) * number of items
 - 1 WCU = 1 KB of data or less written to a table per second
 - writing 200 bytes consumes 1 WCU (size rounds up to 1 KB)
 - atomic transactions require 2x the WCU to complete
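The two formulas above can be wrapped in a small calculator; the part people usually get wrong is that the item size rounds up to the 4 KB / 1 KB multiple *before* multiplying.

```python
import math

def rcus(item_size_bytes: int, reads_per_second: int,
         mode: str = "strong") -> int:
    """RCUs needed: item size rounds UP to the next 4 KB multiple first."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    if mode == "eventual":         # eventually consistent reads cost half
        total = total / 2
    elif mode == "transactional":  # atomic transactions cost double
        total = total * 2
    return math.ceil(total)

def wcus(item_size_bytes: int, writes_per_second: int,
         transactional: bool = False) -> int:
    """WCUs needed: item size rounds UP to the next 1 KB multiple first."""
    units = math.ceil(item_size_bytes / 1024) * writes_per_second
    return units * 2 if transactional else units

print(rcus(2048, 1))   # a 2 KB strongly consistent read -> 1 RCU
print(wcus(200, 1))    # a 200-byte write -> 1 WCU
```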
 
 
 
Elements
Streams: provide an ordered list of changes that occur to items within a DynamoDB table
- rolling 24-hour window of changes
 - enabled per table (only captures data from the point of being enabled)
 - has an ARN that identifies it globally across all tables, accounts and regions
 - View types
- KEYS_ONLY: whenever an item is added, updated or deleted, the keys of the item are added to the stream
 - NEW_IMAGE: the entire item is added to the stream “post-change”
 - OLD_IMAGE: the entire item is added to the stream “pre-change”
 - NEW_AND_OLD_IMAGES: both the new and old versions of the item are added to the stream
 
 
Triggers:
- streams can be integrated with AWS Lambda, invoking a function whenever items are changed in a DynamoDB table (a DB trigger)
 
graph LR
  A[Terminal]
  B[DynamoDB table]
  C[DynamoDB stream records]
  D((AWS Lambda))
  A --> B
  B --> C
  C --> D
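A minimal sketch of the Lambda side of that trigger, assuming a stream event in the standard DynamoDB Streams shape (the field names follow that shape; the handler logic itself is illustrative):

```python
# Illustrative Lambda handler for DynamoDB stream records.
# event["Records"] follows the standard DynamoDB Streams event shape.
def handler(event, context):
    changes = []
    for record in event.get("Records", []):
        changes.append({
            "action": record["eventName"],           # INSERT / MODIFY / REMOVE
            "keys": record["dynamodb"].get("Keys"),  # present for all view types
        })
    return changes

# Example invocation with a hand-built INSERT record:
sample_event = {
    "Records": [{
        "eventName": "INSERT",
        "dynamodb": {"Keys": {"pk": {"S": "user#1"}}},
    }]
}
print(handler(sample_event, None))
```

With NEW_IMAGE / OLD_IMAGE view types the `dynamodb` map would also carry the full item versions for the handler to inspect.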
- Indexes: provide an alternative representation of data in a table, which is useful for applications with varying query demands
- 2 forms
- Local Secondary Indexes (LSI)
- created at the same time as the table
 - same PK, an alternative SK
 - share the RCU and WCU values of the main table
 - maximum: 5
 
 - Global Secondary Indexes (GSI)
- created after the table was created; data is replicated asynchronously from the table
 - can have a different PK and SK
 - have their own RCU and WCU values
 - maximum number (without logging a support ticket) of GSIs per table: 30
 
 
 - indexes are interacted with as though they are tables (an alternative representation of the table)
 
In-memory caching
DynamoDB Accelerator (DAX): in-memory cache designed specifically for DynamoDB
- results delivered from DAX are available in microseconds rather than in the single-digit milliseconds available from DynamoDB
 - can use a cluster architecture, run inside VPC, applications use a DAX client
 - 2 distinct caches
  - item cache
   - stores results from GetItem and BatchGetItem
   - has a 5-minute default TTL
  - query cache
   - stores results from Query and Scan
   - caches based on the parameters specified
 
ElastiCache: managed in-memory data store supporting the Redis or Memcached engines, for large sets of data with repeated read patterns
- 2 use cases
 - offloading database reads by caching responses
  - improves application speed and reduces costs
 - storing user session data
  - allows stateless compute instances (used for fault-tolerant architectures)
 
 - typically used with key-value data or to store simple session data, but it can be used with SQL database engines
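The read-offloading use case is the classic cache-aside pattern; here is a minimal sketch with a plain dict standing in for the ElastiCache cluster (the lookup function and TTL value are illustrative):

```python
import time

cache = {}          # stand-in for an ElastiCache (Redis/Memcached) cluster
TTL_SECONDS = 300   # illustrative expiry for cached responses

def slow_db_read(key: str) -> str:
    """Placeholder for the expensive database query being offloaded."""
    return f"row-for-{key}"

def get_with_cache(key: str) -> str:
    entry = cache.get(key)
    if entry is not None and entry["expires"] > time.time():
        return entry["value"]    # cache hit: no database read
    value = slow_db_read(key)    # cache miss: read through to the DB
    cache[key] = {"value": value, "expires": time.time() + TTL_SECONDS}
    return value
```

Repeated reads of the same key within the TTL never touch the database, which is exactly the "repeated read patterns" workload the notes describe.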
 