AWS Solutions Architect 6 - Database
Models
- RDBMS
- Relational database management systems (RDBMS): data has formal, fixed relationships
- data is stored in rows -> each row is made up of individual attributes
- Tables have schemas (define row layout)
- RDBMS conform to the ACID model (Atomicity, Consistency, Isolation, Durability) - see the sketch at the end of this section
- High performance
- Low scalability
- Structured Query Language (SQL) is used for RDBMS
- Non-relational: NoSQL (e.g. for social media, data warehousing, analytics)
- Elements
- Key/value: fast queries, no relationships
- Document: structure of key/value pairs. Operations are highly performant
- Column: data is stored in columns rather than rows (Amazon Redshift)
- Graph: designed for dynamic relationships. Data = nodes (Neo4j)
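To make the ACID point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a toy stand-in for an RDBMS: a fixed schema, rows of individual attributes, and an atomic transaction (the table and values are illustrative only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

try:
    with conn:  # commits on success, rolls back on error (Atomicity)
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # if anything failed, neither UPDATE was applied

print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 70), (2, 80)]
```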
SQL — RDS
Definition
- RDS = Database as a Service (DBaaS) -> a fully functional DB without the admin overhead
- performs at scale
- can be publicly accessible
- can be configured for demanding availability and durability scenarios
- Engines
- MySQL
- PostgreSQL
- MariaDB
- Oracle
- Microsoft SQL
```mermaid
graph LR
    A(DB instance CNAME)
    subgraph VPC1-region1
        B[Standby - DB storage - AZ1]
        C[Primary - DB storage - AZ2]
    end
    D[S3]
    subgraph VPC2-region2
        E[Read replica - DB - AZ1]
    end
    A --> C
    B --> C
    C --> B
    B --> D
    C --> E
```
- 1 or more AZs for resilience
- Instance types
  - general purpose (DB.M4, DB.M5)
  - memory optimized (DB.R4, DB.R5, Oracle DB.X1 and DB.X1e)
  - burstable (DB.T2 and DB.T3)
- Storage types
- general purpose SSD (gp2)
- provisioned IOPS SSD (io1): IOPS configured independently of storage size
- Billing based on
- instance size
- provisioned storage (billed whether used or not)
- IOPS if using io1
- Data transferred out
- any backup/snapshot storage beyond the free allocation (100% of the DB instance's provisioned storage)
- RDS supports encryption with limitations
- configured when creating DB instances
- can be added later by taking a snapshot, copying it with encryption enabled, and creating a new instance from the encrypted snapshot
- encryption cannot be removed once enabled
- read replicas must be in the same encryption state as the primary instance (encrypted or not)
- encrypted snapshots can be copied between regions, but a new KMS CMK from the destination region must be used, as CMKs are region specific (see the sketch below)
- Network access to an RDS instance is controlled by a security group (SG) associated with the RDS instance
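A minimal boto3 sketch of the cross-region snapshot copy described above. The snapshot names, account ID and key ARN are placeholders; the point is that KmsKeyId must reference a CMK that lives in the destination region.

```python
import boto3

# Client in the destination region
rds = boto3.client("rds", region_name="eu-west-1")

rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:111122223333:snapshot:prod-snap",
    TargetDBSnapshotIdentifier="prod-snap-copy",
    KmsKeyId="arn:aws:kms:eu-west-1:111122223333:key/example-key-id",  # destination-region CMK
    SourceRegion="us-east-1",  # lets boto3 presign the cross-region copy request
)
```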
Backups
- Automated backups: stored in S3, occur daily, retained for 0-35 days
- Manual snapshots: exist until deleted
- Point in time log-based backups: stored on S3
In summary: RDS supports manual snapshot-based backups as well as automated backups that enable point-in-time recovery, with a 0-35 day retention period (0 disables automated backups).
```mermaid
graph LR
    A(Primary-A DB)
    B(Standby DB)
    C[S3]
    D[S3]
    E(Primary-B DB)
    A -- synchronous data replication --> B
    B -- daily backup --> C
    B -- manual snapshot --> D
    D -- restore a new instance --> E
```
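A minimal boto3 sketch of the manual snapshot and restore flow in the diagram; the instance and snapshot identifiers are hypothetical. Note that a restore always creates a new instance rather than overwriting the original.

```python
import boto3

rds = boto3.client("rds")

# Manual snapshot: kept until explicitly deleted
rds.create_db_snapshot(
    DBInstanceIdentifier="prod-db",
    DBSnapshotIdentifier="prod-db-manual-snap",
)

# Restoring creates a brand-new instance (Primary-B in the diagram)
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="prod-db-restored",
    DBSnapshotIdentifier="prod-db-manual-snap",
)
```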
Resiliency multi-AZ
- RDS can be provisioned in single-AZ or multi-AZ mode (multi-AZ adds a standby instance in a different AZ) -> recover from failure
- Only the primary can be accessed using the instance CNAME
- No performance benefit, but a better RTO than restoring from a snapshot
- Replication of data is synchronous (data is committed to the standby in real time as it is written to the primary)
- Backups are taken from standby, to ensure no performance impact
- Maintenance is performed on the standby first, which is then promoted, to minimize downtime
Read replicas
- Read replicas are read-only copies of an RDS instance that can be created in the same or a different region from the primary instance
- can be addressed independently (each has its own DNS name)
- used for read workloads, allowing scaling reads
- 1 RDS instance -> up to 5 read replicas (which can themselves be created from other read replicas)
- read replicas can be promoted to primary instances and can be themselves multi-AZ
- Read replicas are eventually consistent (usually within seconds, but applications need to support this)
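A minimal boto3 sketch of creating a cross-region read replica; the identifiers and instance class are placeholders. For a replica in another region, the source is referenced by ARN.

```python
import boto3

# Client in the region where the replica will live
rds = boto3.client("rds", region_name="eu-west-1")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="prod-db-replica-1",
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:111122223333:db:prod-db",
    DBInstanceClass="db.r5.large",
)
```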
SQL — Aurora
Essentials
- Aurora is an enhanced RDS engine from Amazon, compatible with MySQL and PostgreSQL tools
- base configuration = cluster
- cluster contains a single primary instance and 0+ replicas
- Cluster storage: all instances use the same storage
- read/write uses the cluster endpoint
- reads can use the reader endpoint (balances connections over the replicas)
- volume = SSD based, scales automatically up to 64TB; you are billed only for consumed data
- data is replicated 6 times across 3 AZs -> improves availability; a replica can be promoted to primary quickly
- Aurora can tolerate 2 failures without writes being impacted and 3 failures without impacting reads
- Aurora storage is auto-healing
- Tier 0 has the highest priority in an Aurora failover
- Backtrack feature
- allows you to roll back a database for up to 72 hours
- you don't have to make a new cluster when using Aurora's backtrack feature to restore a database (see the sketch below)
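A minimal boto3 sketch of backtracking a cluster in place; the cluster name is a placeholder, and backtrack must have been enabled when the (MySQL-compatible) cluster was created.

```python
from datetime import datetime, timedelta, timezone

import boto3

rds = boto3.client("rds")

# Rewind the existing cluster 30 minutes; no new cluster is created
rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",
    BacktrackTo=datetime.now(timezone.utc) - timedelta(minutes=30),
)
```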
Parallel queries and global
- Parallel queries: executed across all the nodes of the cluster at the same time
- Activated when creating the Aurora cluster
- Global: Aurora provisioning option which adds resiliency by allowing you to pick amongst all AWS regions as your secondary reader cluster
- 1 primary region, plus secondary region(s) for read workloads -> low latency
- Activated when creating the Aurora cluster; only available for some engine versions
Serverless
- Aurora Serverless: based on the same DB engine as Aurora but without fixed resource allocation
- specify a minimum and maximum number of Aurora Capacity Units (ACUs) - the unit of processing (compute) and memory in Aurora Serverless
- can use the Data API to connect to it (see the sketch after this section)
- billing: charges are based on database resources used per second
- master exists in one Availability Zone
- capable of rapid scaling because it uses proxy fleets to route the workload to “warm” resources that are always ready to service requests.
- maximum amount of replicas: 15
- when to use it
- should be used when workloads are intermittent and unpredictable
- slower failover time than Aurora Provisioned
```mermaid
graph LR
    A[Applications]
    B(Proxy fleet)
    C[Aurora Serverless cluster DBs]
    D[Pool DBs]
    A --> B
    B --> C
    C --> D
```
- Proxy fleet: a fleet of proxy instances that route an application's queries to a group of automatically scalable resources
- Query editor: web-based tool that allows you to log in to the Aurora Serverless cluster and execute queries
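A minimal boto3 sketch of the Data API mentioned above; the cluster ARN, Secrets Manager ARN, database and table names are all placeholders. No persistent DB connection is needed.

```python
import boto3

rds_data = boto3.client("rds-data")

response = rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:111122223333:cluster:my-serverless-cluster",
    secretArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:db-creds",
    database="app",
    sql="SELECT id, name FROM users WHERE id = :id",
    parameters=[{"name": "id", "value": {"longValue": 42}}],
)
print(response["records"])
```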
NoSQL: DynamoDB
Essentials
Elements
- DynamoDB: NoSQL DB service, 3 replicas of data
- Table: collection of items that share the same partition key (PK) or partition+sort key (SK), along with other configuration and performance settings
- Item: a collection of attributes (up to 400KB in size) inside a table that shares the same key structure as every other item in the table
- Attribute: key-value pair
Query: filters based on the PK (and optionally the SK), efficient
Scan: checks all items, not efficient
- Filters: applied to a scan's results after the items are read (see the sketch below)
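A short boto3 sketch contrasting the two operations; the table name, keys and attributes are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("orders")

# Query: targets a single partition via the key -> efficient
orders = table.query(KeyConditionExpression=Key("customer_id").eq("c-123"))

# Scan: reads every item; the filter is applied *after* the read,
# so capacity is consumed for all scanned items, not just the matches
expensive = table.scan(FilterExpression=Attr("total").gt(100))
```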
Performance
2 read/write capacity modes
- provisioned throughput (default)
- each table is configured with Read Capacity Units (RCU) and Write Capacity Units (WCU)
- capacity units are consumed by every operation on items
- on-demand mode: scales automatically to handle performance demands; each operation consumes at least 1 RCU or WCU (partial units cannot be consumed)
Consistency
- a 200 status code on a write = the write has been completed and is durable
- strongly consistent reads ensure DynamoDB returns the most up-to-date copy of the data
- reads are served from the leader node when strongly consistent reads are requested (see the sketch below)
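A minimal sketch of forcing a strongly consistent read with boto3; the table and key names are hypothetical.

```python
import boto3

table = boto3.resource("dynamodb").Table("orders")

# Default reads are eventually consistent; ConsistentRead=True reads from
# the leader node and costs a full RCU per 4 KB
item = table.get_item(
    Key={"customer_id": "c-123", "order_id": "o-1"},
    ConsistentRead=True,
)["Item"]
```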
Capacity modes
- On-demand
- Provisioned
- Provisioned with Auto Scaling
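A minimal boto3 sketch showing where the capacity mode is chosen, at table creation; the table definition is illustrative.

```python
import boto3

ddb = boto3.client("dynamodb")

ddb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "customer_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "customer_id", "KeyType": "HASH"}],
    # Provisioned mode with fixed RCU/WCU; use BillingMode="PAY_PER_REQUEST"
    # (and drop ProvisionedThroughput) for on-demand mode instead
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```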
Capacity unit calculations (see: "How to Calculate Read and Write Capacity for DynamoDB" - Linux Academy)
- Read Capacity Units (RCU)
RCUs = (item size, rounded up to the next 4KB multiple / 4KB) * number of items
- 1 RCU = 4 KB of data read from a table per second in a strongly consistent way
- reading a 2 KB item consumes 1 RCU (rounded up to 4 KB)
- if eventually consistent reads are OK, 1 RCU allows 2 x 4 KB of data reads per second
- atomic transactions require 2x the RCU to complete
- Write Capacity Units (WCU)
WCUs = (item size, rounded up to the next 1KB multiple / 1KB) * number of items
- 1 WCU = 1 KB of data (or less) written to a table per second
- writing a 200-byte item consumes 1 WCU (rounded up to 1 KB)
- atomic transactions require 2x the WCU to complete
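The two formulas above can be expressed as a small helper; a sketch assuming whole capacity units and per-second item rates.

```python
import math

def rcus(item_size_kb: float, items_per_second: int, eventually_consistent: bool = False) -> int:
    """RCUs needed: item size rounded up to the next 4 KB; halved for eventually consistent reads."""
    units = math.ceil(item_size_kb / 4) * items_per_second
    return math.ceil(units / 2) if eventually_consistent else units

def wcus(item_size_kb: float, items_per_second: int) -> int:
    """WCUs needed: item size rounded up to the next 1 KB."""
    return math.ceil(item_size_kb) * items_per_second

print(rcus(2, 1))   # 1 -> one strongly consistent read of a 2 KB item
print(wcus(0.2, 1)) # 1 -> writing 200 bytes still consumes a whole WCU
```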
Elements
Streams: provide an ordered list of changes that occur to items within a DynamoDB table
- a rolling 24-hour window of changes
- enabled per table (only data from the point of being enabled)
- has an ARN that identifies it globally across all tables, accounts and regions
- View types
- KEYS_ONLY: whenever an item is added, updated or deleted, the keys of the item are added to the stream
- NEW_IMAGE: the entire item is added to the stream “post-change”
- OLD_IMAGE: the entire item is added to the stream “pre-change”
- NEW_AND_OLD_IMAGES: both the new and old versions of the item are added to the stream
Triggers:
- streams can be integrated with AWS Lambda, invoking a function whenever items change in a DynamoDB table (a DB trigger) - see the sketch after the diagram
```mermaid
graph LR
    A[Terminal]
    B[DynamoDB table]
    C[DynamoDB stream records]
    D((AWS Lambda))
    A --> B
    B --> C
    C --> D
```
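A minimal sketch of the Lambda handler on the receiving end of such a trigger, assuming the stream uses the NEW_AND_OLD_IMAGES view type.

```python
# Hypothetical handler wired to a DynamoDB stream via an event source mapping;
# each invocation receives an ordered batch of change records
def lambda_handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "MODIFY":
            old = record["dynamodb"].get("OldImage", {})
            new = record["dynamodb"].get("NewImage", {})
            print(f"item changed: {old} -> {new}")
    return {"batchSize": len(event["Records"])}
```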
- Indexes: provide an alternative representation of the data in a table, which is useful for applications with varying query demands
- 2 forms
- Local Secondary Indexes (LSI)
- created at the same time as the table
- same PK, with an alternative SK
- share the RCU and WCU values for the main table
- maximum: 5
- Global Secondary Indexes (GSI)
- can be created after the table exists; data is replicated into the index asynchronously
- different PK and SK
- have their own RCU and WCU values
- maximum number of GSIs per table (without logging a support ticket): 30
- indexes are interacted with as though they are tables (an alternative representation of the table) - see the sketch below
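A minimal boto3 sketch of adding a GSI to an existing table, something an LSI cannot do; the table, attribute and index names are hypothetical.

```python
import boto3

ddb = boto3.client("dynamodb")

# GSIs can be created after the table exists and get their own throughput
ddb.update_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "status", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "status-index",
            "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)
```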
In-memory caching
DynamoDB Accelerator (DAX): in-memory cache designed specifically for DynamoDB
- results delivered from DAX are available in microseconds rather than in the single-digit milliseconds available from DynamoDB
- can use a cluster architecture, run inside VPC, applications use a DAX client
- 2 distinct caches
- item cache
  - stores results from GetItem and BatchGetItem
  - has a 5-minute default TTL
- query cache
  - stores results from Query and Scan
  - caches based on the parameters specified
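A minimal sketch using the amazondax Python client, which the DAX samples present as a drop-in replacement for the boto3 DynamoDB resource; the endpoint and table names are placeholders.

```python
import amazondax
from boto3.dynamodb.conditions import Key

dax = amazondax.AmazonDaxClient.resource(
    endpoint_url="dax://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("orders")

# GetItem results land in the item cache (5-minute default TTL)
item = table.get_item(Key={"customer_id": "c-123"})

# Query results land in the query cache, keyed on the query parameters
results = table.query(KeyConditionExpression=Key("customer_id").eq("c-123"))
```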
ElastiCache: managed in-memory data store supporting the Redis or Memcached engines, for large sets of data with repeated read patterns
- 2 use cases
  - offloading database reads by caching responses
    - improves application speed and reduces costs
  - storing user session data
    - allows for stateless compute instances (used for fault-tolerant architectures)
- typically used with key/value data or to store simple session data, but it can also sit in front of SQL database engines (see the sketch below)
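A minimal redis-py sketch of both use cases, session storage and cache-aside read offloading; the endpoint, keys and TTLs are illustrative.

```python
import json
import redis

# Placeholder endpoint for an ElastiCache Redis primary
r = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com",
                port=6379, decode_responses=True)

# Session storage: state lives outside the instance -> stateless compute
r.setex("session:abc-123", 3600, json.dumps({"user_id": 42, "cart": ["sku-1"]}))

# Cache-aside read offloading for a repeated DB query
cached = r.get("query:top-products")
if cached is None:
    rows = [{"sku": "sku-1", "sales": 900}]  # stand-in for a real SQL query
    r.setex("query:top-products", 300, json.dumps(rows))
else:
    rows = json.loads(cached)
```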