DynamoDB – AWS Developer Certified Exam Notes

  • Amazon DynamoDB is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed database and supports both document and key-value data models.
  • Stored on SSD storage
  • Spread across 3 geographically distinct data centres
  • Two data consistency models:
    • Eventually Consistent Reads (Default)
      • Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data
    • Strongly Consistent Reads
      • A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read
  • The basics of DynamoDB:
    • Table
    • Item
    • Attributes
  • Pricing:
    • Provisioned Throughput Capacity
      • Write throughput: $0.0065 per hour for every 10 units
      • Read throughput: $0.0065 per hour for every 50 units
    • First 25 GB stored per month is free
    • Storage costs $0.25 per GB per month thereafter
  • Two types of Primary keys available
    • Single Attribute (think of unique ID) — Partition key composed of one attribute
    • Composite (think of unique ID and date range)  — Partition key and Sort key
  • Partition Key: DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (this is simply the physical location in which the data is stored)
  • Partition Key and Sort key: Two items can have the same partition key, but they must have a different sort key
  • All items with the same partition key are stored together, in sorted order by sort key value
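The two bullets above can be pictured with a toy model. This is only a sketch: DynamoDB's real hash function and partition count are internal, so SHA-256 and `NUM_PARTITIONS` are stand-ins, and the `pk`/`sk` attribute names are made up for the example.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption for the sketch; DynamoDB manages partitions itself


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition number using a stand-in hash (SHA-256)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def store(items):
    """Group items by partition and keep each partition sorted by (pk, sk)."""
    partitions = defaultdict(list)
    for item in items:
        partitions[partition_for(item["pk"])].append(item)
    for bucket in partitions.values():
        bucket.sort(key=lambda it: (it["pk"], it["sk"]))
    return partitions


items = [
    {"pk": "user#1", "sk": "2017-03-01"},
    {"pk": "user#2", "sk": "2017-01-15"},
    {"pk": "user#1", "sk": "2017-01-01"},
]
layout = store(items)
```

Items sharing `user#1` always land in the same bucket and come back ordered by their sort key.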
  • Local Secondary Index:
    • Has the SAME partition key as the table's primary key, but a different sort key.
    • Can only be created when you create the table. It cannot be removed or modified later
  • Global Secondary Index
    • Has a DIFFERENT partition key (different from the table's primary key) and a different sort key
    • Can be created at table creation or they can be added later
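As a rough illustration of the LSI/GSI difference, here is a parameter dictionary in the shape that boto3's `client.create_table()` accepts. The table, index and attribute names (`Orders`, `by_status`, `by_product`, and so on) and the capacity values are invented for the example.

```python
def build_create_table_params():
    """Build create_table parameters with one LSI and one GSI (hypothetical names)."""
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    return {
        "TableName": "Orders",
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "status", "AttributeType": "S"},
            {"AttributeName": "product_id", "AttributeType": "S"},
        ],
        # Table primary key: partition key + sort key
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key
        "LocalSecondaryIndexes": [{
            "IndexName": "by_status",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "status", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
        # GSI: different partition key, with its own provisioned throughput
        "GlobalSecondaryIndexes": [{
            "IndexName": "by_product",
            "KeySchema": [
                {"AttributeName": "product_id", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": throughput,
        }],
        "ProvisionedThroughput": throughput,
    }


params = build_create_table_params()
```

With boto3 installed and credentials configured, this could be passed as `boto3.client("dynamodb").create_table(**params)`.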
  • DynamoDB Streams: Used to capture any kind of modification to the items in a DynamoDB table
    • If a new item is added to the table, the stream captures an image of the entire item, including all of its attributes
    • If an item is updated, the stream captures the “before” and “after” image of any attributes that were modified in the item
    • If an item is deleted from the table, the stream captures an image of the entire item before it was deleted
  • Stream records are stored for 24 hours and are useful for triggering Lambda functions
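A minimal sketch of how a Lambda function might read the three kinds of stream records described above. It assumes the stream uses the NEW_AND_OLD_IMAGES view type; the function names `describe_change` and `handler` are illustrative.

```python
def describe_change(record):
    """Summarise one stream record (NEW_AND_OLD_IMAGES view type assumed)."""
    data = record["dynamodb"]
    old = data.get("OldImage", {})
    new = data.get("NewImage", {})
    event = record["eventName"]
    if event == "INSERT":
        # New item: the whole item is in NewImage
        return {"event": event, "item": new}
    if event == "REMOVE":
        # Deleted item: the item as it was before deletion is in OldImage
        return {"event": event, "item": old}
    # MODIFY: compare the before and after images to find what changed
    changed = sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))
    return {"event": event, "changed": changed}


def handler(event, context=None):
    """Lambda-style entry point: one summary per record in the batch."""
    return [describe_change(r) for r in event["Records"]]
```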
  • Possible to have 5 LSI and 5 GSI per table
  • By default, a Query returns all of the data attributes for items with the specified primary key(s); however, you can use the ProjectionExpression parameter so that the Query only returns some of the attributes, rather than all of them
  • Query results are always sorted by the sort key. To reverse the order, set the ScanIndexForward parameter to false
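The two Query options above could be combined as follows. This sketch builds parameters in the shape of boto3's low-level `client.query()`; the helper name and the example table and attributes are assumptions, and the partition key is assumed to be a string.

```python
def build_query_params(table, key_name, key_value, attributes, newest_first=False):
    """Build Query parameters that return only `attributes` for one partition key."""
    # Placeholders (#pk, #a0, ...) avoid clashes with DynamoDB reserved words
    names = {"#pk": key_name}
    projection = []
    for i, attr in enumerate(attributes):
        placeholder = f"#a{i}"
        names[placeholder] = attr
        projection.append(placeholder)
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": {":pk": {"S": key_value}},
        "ProjectionExpression": ", ".join(projection),
        # False reverses the default ascending sort-key order
        "ScanIndexForward": not newest_first,
    }


query = build_query_params("Orders", "customer_id", "c-42",
                           ["order_date", "total"], newest_first=True)
```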
  • A Scan operation examines every item in the table. By default, a Scan returns all of the data attributes for every item; however, you can use the ProjectionExpression parameter so that the Scan only returns some of the attributes, rather than all of them
  • Generally, a Query operation is more efficient than a Scan operation
  • A Scan operation always scans the entire table, then filters out values to provide the desired result, essentially adding the extra step of removing data from the result set. If possible, avoid using a Scan operation on a large table with a filter that removes many results. Also, as a table grows, the Scan operation slows.
  • DynamoDB Provisioned Throughput Calculations
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html

    • Unit of Read Provisioned Throughput
      • All reads are rounded up to increments of 4 KB
      • One read capacity unit gives 2 eventually consistent reads per second
      • One read capacity unit gives 1 strongly consistent read per second
    • Unit of Write Provisioned Throughput
      • All writes are rounded up to increments of 1 KB
      • One write capacity unit gives 1 write per second
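The usual exam-style arithmetic for these rules can be sketched like this (function names are illustrative): round the item size up to the next 4 KB for reads or 1 KB for writes, multiply by the request rate, and halve the read figure for eventually consistent reads.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Read capacity units needed for a given item size and read rate."""
    units_per_read = math.ceil(item_size_kb / 4)  # reads round up to 4 KB
    total = units_per_read * reads_per_second
    # Eventually consistent reads cost half as much
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb, writes_per_second):
    """Write capacity units needed for a given item size and write rate."""
    return math.ceil(item_size_kb) * writes_per_second  # writes round up to 1 KB
```

For example, 10 strongly consistent reads per second of 6 KB items need 2 x 10 = 20 units, and the same workload with eventually consistent reads needs 10.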
  • 400 HTTP Status Code: ProvisionedThroughputExceededException:
    • You exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes
  • You can authenticate users using Web Identity providers (such as Facebook or Google). This is done using the AssumeRoleWithWebIdentity API
    • User Authenticates with ID provider
    • They are passed a Token by their ID provider
    • Your code calls the AssumeRoleWithWebIdentity API, passing the provider's token and the ARN of the IAM role
    • The app can now access DynamoDB for between 15 minutes and 1 hour
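A sketch of the request from the third step, in the shape of boto3's `sts.assume_role_with_web_identity()`. The helper name, the example ARN and the session name are made up, and the duration check simply enforces the 15-minute to 1-hour window stated in these notes.

```python
def build_web_identity_request(role_arn, token, session_name, duration_seconds=3600):
    """Build AssumeRoleWithWebIdentity parameters for a token from the ID provider."""
    if not 900 <= duration_seconds <= 3600:
        raise ValueError("duration must be between 15 minutes and 1 hour")
    return {
        "RoleArn": role_arn,              # IAM role the app will assume
        "RoleSessionName": session_name,  # label for this session
        "WebIdentityToken": token,        # token issued by the ID provider
        "DurationSeconds": duration_seconds,
    }


request = build_web_identity_request(
    "arn:aws:iam::123456789012:role/ExampleDynamoDBRole",  # hypothetical ARN
    token="token-from-provider",
    session_name="mobile-user",
    duration_seconds=900,
)
```

The temporary credentials returned by the real call would then be used to create the DynamoDB client.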
  • Atomic counters: Not idempotent (a retried request increments the counter again), so they are useful when a margin of error is acceptable
  • Conditional writes are idempotent. This means that you can send the same conditional write request multiple times, but it will have no further effect on the item after the first time DynamoDB performs the specified update
    Conditional writes are safe for critical data
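The difference between the two can be seen in a small in-memory simulation. This is not DynamoDB code: plain dictionaries stand in for items, and the helper names are invented. It only shows what happens when the same request is sent twice.

```python
def atomic_increment(item, attribute, amount=1):
    """Unconditional increment: every call changes the value."""
    item[attribute] = item.get(attribute, 0) + amount
    return item[attribute]


def conditional_set(item, attribute, expected, new_value):
    """Write only if the attribute still holds the expected value."""
    if item.get(attribute) != expected:
        return False  # DynamoDB would reject this with ConditionalCheckFailedException
    item[attribute] = new_value
    return True


counter = {"views": 10}
atomic_increment(counter, "views")
atomic_increment(counter, "views")  # retry of the same request: counted twice

product = {"price": 10}
first = conditional_set(product, "price", expected=10, new_value=12)
retry = conditional_set(product, "price", expected=10, new_value=12)  # no effect
```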
  • Batch operations: If your application needs to read multiple items, you can use the BatchGetItem API. A single BatchGetItem request can retrieve up to 16 MB of data, which can contain as many as 100 items. In addition, a single BatchGetItem request can retrieve items from multiple tables
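Because one request is capped at 100 items, a longer key list has to be split first. A minimal sketch, with illustrative helper names, that produces request bodies in the shape of boto3's `client.batch_get_item()`:

```python
def chunk_keys(keys, batch_size=100):
    """Split a list of keys into batches no larger than the BatchGetItem item limit."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


def build_batch_requests(table, keys):
    """Build one BatchGetItem request body per batch of keys."""
    return [{"RequestItems": {table: {"Keys": batch}}} for batch in chunk_keys(keys)]


keys = [{"id": {"S": str(i)}} for i in range(250)]
requests = build_batch_requests("Orders", keys)  # "Orders" is a made-up table name
```

A real caller would also need to resend any `UnprocessedKeys` returned in a response, which this sketch leaves out.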
  • Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability
  • Amazon DynamoDB allows atomic increment and decrement operations on scalar values.
  • It does not have all the functionality of a relational database. It does not support complex relational queries (e.g. joins) or complex transactions
  • There is NO LIMIT to the number of attributes that an item can have
  • DynamoDB APIs: CreateTable, UpdateTable, DescribeTable, ListTables, DeleteTable, PutItem, BatchWriteItem, UpdateItem, DeleteItem, GetItem, BatchGetItem
  • A Scan operation on a table or secondary index has a limit of 1MB of data per operation. After the 1MB limit, it stops the operation and returns the matching values up to that point, and a LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off.
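Picking up where the previous page left off looks roughly like this. `scan_all` takes any callable with the shape of boto3's `client.scan`; `fake_scan` is a stand-in written for this example so the loop can be run without AWS.

```python
def scan_all(scan, **params):
    """Call `scan` repeatedly, following LastEvaluatedKey until the table is exhausted."""
    items = []
    while True:
        page = scan(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Resume the next request from where this page stopped
        params["ExclusiveStartKey"] = last_key


def fake_scan(TableName, ExclusiveStartKey=None, Limit=2):
    """Stand-in for client.scan: serves 5 items, at most `Limit` per page."""
    data = [{"id": i} for i in range(5)]
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
    page = data[start:start + Limit]
    response = {"Items": page}
    if start + Limit < len(data):
        response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
    return response


all_items = scan_all(fake_scan, TableName="Demo")
```

With a real client the call would be `scan_all(boto3.client("dynamodb").scan, TableName="...")`.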
  • You can use the AWS Management Console to view and edit JSON documents
  • Amazon DynamoDB is designed to scale its provisioned throughput up or down while still remaining available, whether managed by Auto Scaling or manually.
  • To achieve high uptime and durability, Amazon DynamoDB synchronously replicates data across three facilities within an AWS Region.
  • DynamoDB Auto Scaling is a fully managed feature that automatically scales up or down provisioned read and write capacity of a DynamoDB table or a global secondary index, as application requests increase or decrease.
  • There are three configurable settings for Auto Scaling: Target Utilization, Minimum capacity and Maximum capacity
  • An Auto Scaling policy can only be set on a single table or a single global secondary index within a single region
  • Scaling up instantly to maximum capacity or scaling down to minimum capacity IS NOT SUPPORTED
  • Auto-Scaling can be enabled for read, write or both
  • When you delete a table or global secondary index from the console, its Auto Scaling policy and supporting CloudWatch alarms are also deleted.
  • There is no additional cost for using Auto Scaling, beyond what you already pay for DynamoDB and CloudWatch alarms
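How the three Auto Scaling settings interact can be approximated with simple arithmetic. This is only a rough model with an invented function name: the real service reacts to CloudWatch alarms over time and does not jump straight to the computed value.

```python
def target_capacity(consumed_units, target_utilization_percent, minimum, maximum):
    """Capacity at which `consumed_units` equals the target utilization, within bounds."""
    if not 0 < target_utilization_percent <= 100:
        raise ValueError("target utilization must be in (0, 100]")
    # Integer ceiling of consumed / (target / 100)
    desired = -(-consumed_units * 100 // target_utilization_percent)
    # Never go below the minimum or above the maximum capacity
    return max(minimum, min(maximum, desired))
```

For instance, 70 consumed units at a 70% target points to 100 provisioned units, as long as 100 lies between the minimum and maximum.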
  • You can create one or more secondary indexes on a table, and issue Query requests against these indexes
  • You can create a maximum of 5 global secondary indexes per table
  • LSI is attached to a specific partition key value, whereas a GSI spans all partition key values
  • The data in a secondary index consists of attributes that are projected, or copied, from the table into the index
  • A GSI does not need to have a sort key element.
  • DynamoDB will ensure that the contents of the GSI are updated appropriately as items are added, removed or updated.
  • Attributes that are part of an item in a table, but not part of the GSI key, primary key of the table, or projected attributes are thus not returned on querying the GSI index. Applications that need additional data from the table after querying the GSI, can retrieve the primary key from the GSI and then use either the GetItem or BatchGetItem APIs to retrieve the desired attributes from the table
  • GSIs manage throughput independently of the table they are based on
  • Because some or all writes to a DynamoDB table result in writes to related GSIs, it is possible that a GSI’s provisioned throughput can be exhausted. In such a scenario, subsequent writes to the table will be throttled. This can occur even if the table has available write capacity units.
  • Can I specify which global secondary index should be used for a query? YES
  • For a global secondary index, with a partition-only key schema there is no ordering. For global secondary index with partition-sort key schema the ordering of the results for the same partition key is based on the sort key attribute
  • You can only add or delete one index per API call.
  • There can be ONLY ONE ACTIVE add or delete index operation on a table.
  • A new index becomes available for queries only after the index creation process is finished.
  • The length of time index creation takes depends on the size of the table and the amount of additional provisioned write throughput for Global Secondary Index creation
  • You can use the DynamoDB console or DescribeTable API to check the status of all indexes associated with the table
  • You are currently limited to 5 GSIs per table. An “Add” operation beyond that limit will fail with an error.
  • Once a Global Secondary Index has been deleted, its index name can be used again when a new index is added.
  • Once index creation starts, the index creation process cannot be canceled.
  • A query on a GSI can only return attributes that were specified to be included in the GSI at creation time
  • Scan on global secondary indexes will not support fetching of non-projected attributes.
  • The set of attributes that is copied into a local secondary index is called a projection
  • You need to create a LSI at the time of table creation. It can’t currently be added later on
  • Local secondary indexes can only be queried via the Query API.
  • Each table can have up to five local secondary indexes.
  • A local secondary index cannot be modified once it is created.
  • Local secondary indexes cannot be removed from a table
  • In Amazon DynamoDB, an item collection is any group of items that have the same partition key, across a table and all of its local secondary indexes
  • Every item collection in Amazon DynamoDB is subject to a maximum size limit of 10 gigabytes
  • The 10 GB limit for item collections does not apply to tables without local secondary indexes; only tables that have one or more local secondary indexes are affected.
  • Fine Grained Access Control (FGAC) gives a DynamoDB table owner a high degree of control over data in the table
  • There is no additional charge for using FGAC
  • AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you
  • Note that you are charged by the hour for the throughput capacity, whether or not you are sending requests to your table.
  • The free tier allows for 25 units of write capacity and 25 units of read capacity
  • Every Amazon DynamoDB table has pre-provisioned the resources it needs to achieve the throughput rate you asked for. You are billed at an hourly rate for as long as your table holds on to those resources
  • Keep in mind that you can’t change your provisioned throughput if your Amazon DynamoDB table is still in the process of responding to your last request to change provisioned throughput
  • In general, decreases in throughput will take anywhere from a few seconds to a few minutes, while increases in throughput will typically take anywhere from a few minutes to a few hours.
  • Reserved Capacity is a billing feature that allows you to obtain discounts on your provisioned throughput capacity
  • You cannot cancel your Reserved Capacity and the one-time payment is not refundable
  • The smallest Reserved Capacity offering is 100 capacity units (reads or writes).
  • No API to use with Reserved Capacity
  • Reserved Capacity is associated with a single Region.
  • Reserved Capacity is applied to the total provisioned capacity within the Region in which you purchased your Reserved Capacity. For example, if you purchased 5,000 write capacity units of Reserved Capacity, then you can apply that to one table with 5,000 write capacity units, or 100 tables with 50 write capacity units, or 1,000 tables with 5 write capacity units, etc.
  • The cross-region replication application is hosted in an Amazon EC2 instance in the same region where the cross-region replication application was originally launched.
  • After the replication has been created, any changes to the provisioned capacity on the master table will not result in an update in throughput capacity on the replica table.
  • As long as the replica table and the master table have different names, both tables can exist in the same region.
  • There is no limit on the number of triggers for a table.
  • You can create a trigger by creating an AWS Lambda function and associating the event-source for the function to a stream in DynamoDB Streams
  • Capacity for your stream is managed automatically in DynamoDB Streams
  • If you delete a DynamoDB table, its stream persists in DynamoDB Streams for 24 hours to give you a chance to read the last updates that were made to your table
  • If you turn DynamoDB Streams back on (after it was turned off), this will create a new stream in DynamoDB Streams that contains the changes made to your DynamoDB table starting from the time that the new stream was created
  • To choose what information is included in DynamoDB Streams, you need to specify the StreamViewType parameter
  • If you want to change the type of information stored in a stream after it has been created, you must disable the stream and create a new one using the UpdateTable API.
  • You can use the DescribeStream API to get the current status of the stream. Once the status changes to ENABLED, all updates to your table will be represented in the stream.
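Changing what a stream records therefore takes two UpdateTable calls. The sketch below builds both parameter sets in the shape of boto3's `client.update_table()`; in that API the view type is passed as `StreamViewType` inside `StreamSpecification`. The helper names and table name are assumptions.

```python
VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}


def build_enable_stream_params(table, view_type):
    """UpdateTable parameters that turn a stream on with the given view type."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")
    return {
        "TableName": table,
        "StreamSpecification": {"StreamEnabled": True, "StreamViewType": view_type},
    }


def build_disable_stream_params(table):
    """UpdateTable parameters that turn the current stream off."""
    return {"TableName": table, "StreamSpecification": {"StreamEnabled": False}}


# Switching view type: disable the old stream first, then enable a new one
steps = [
    build_disable_stream_params("Orders"),
    build_enable_stream_params("Orders", "NEW_AND_OLD_IMAGES"),
]
```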
  • Local Secondary Indexes and Global Secondary Indexes associated with the tagged tables are automatically tagged with the same tags
  • DynamoDB Streams usage cannot be tagged
  • You can add up to 50 tags to a single DynamoDB table. Tags with the prefix “aws:” cannot be manually created and do not count against your tags per resource limit.
  • DynamoDB Time-to-Live (TTL) is a mechanism that lets you set a specific timestamp to delete expired items from your table
  • To specify TTL, first enable the TTL setting on the table and specify the attribute to be used as the TTL value. As you add items to the table, you can specify a TTL attribute if you would like DynamoDB to automatically delete it after its expiration
  • You cannot delete an entire table by setting TTL on the whole table
  • If the value specified in the TTL attribute for an item is not in the right format, the value is ignored and the item won’t be deleted.
  • The TTL attribute behaves like any other item attribute, so you can create indexes on it the same as with other item attributes.
  • The exact duration within which an item truly gets deleted after expiration will be specific to the nature of the workload and the size of the table
  • Expired items are not backed up before deletion
  • Enabling TTL requires no additional fees.
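The "right format" for a TTL value is a Number holding a Unix epoch timestamp in seconds. A small sketch of stamping an item before writing it; the attribute name `expires_at` and the helper names are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def ttl_timestamp(created_at, keep_for):
    """Epoch seconds at which an item created at `created_at` should expire."""
    return int((created_at + keep_for).timestamp())


def with_ttl(item, created_at, keep_for, ttl_attribute="expires_at"):
    """Return a copy of `item` with a TTL attribute in DynamoDB Number format."""
    stamped = dict(item)
    stamped[ttl_attribute] = {"N": str(ttl_timestamp(created_at, keep_for))}
    return stamped


created = datetime(2020, 1, 1, tzinfo=timezone.utc)
session = with_ttl({"id": {"S": "abc"}}, created, timedelta(days=7))
```

The table's TTL setting would need to name the same attribute (`expires_at` here) for the deletion to happen.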
  • Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that enables you to benefit from fast in-memory performance for demanding applications
  • DAX suits applications that require the fastest possible response times for reads. Some examples include real-time bidding, social gaming, and trading applications. DAX delivers fast, in-memory read performance for these use cases.
  • DAX handles cache eviction in three different ways. First, it uses a Time-to-Live (TTL) value that denotes the absolute period of time that an item is available in the cache. Second, when the cache is full, a DAX cluster uses a Least Recently Used (LRU) algorithm to decide which items to evict. Third, with the write-through functionality, DAX evicts older values as new values are written through DAX.
  • The only way to connect to your DAX cluster from outside of your VPC is through a VPN connection.
  • DAX provides DAX SDKs for Java and Node.js
  • VPC Endpoint (VPCE) for DynamoDB is a logical entity within a VPC that creates a private connection between a VPC and DynamoDB without requiring access over the Internet, through a NAT device, or a VPN connection
  • VPC endpoints can only be created for DynamoDB tables in the same region as the VPC.
  • There is no additional cost for using VPCE for DynamoDB
  • Each VPCE supports one service
  • DynamoDB does not support resource based policies pertaining to individual tables, items, etc.
  • One strongly consistent read requires one read capacity unit
  • You CAN ONLY create one secondary index at a time: attempting to create more than one table with a secondary index at the same time will result in a “LimitExceededException” error
  • UpdateTable API does not consume capacity units, it is used to change provisioned throughput capacity
  • The min length of a sort key is 1 byte, and max length is 1024 bytes
  • The min length of a partition key is 1 byte, and max length is 2048 bytes
  • DynamoDB uses optimistic concurrency control
  • Data plane operations let you perform create, add, update or delete actions on data in a table
  • A single BatchGetItem can retrieve a max of 100 items. Total size of retrieved items cannot exceed 16 MB
  • PutItem, DeleteItem and UpdateItem allow conditional writes, where you specify an expression that must evaluate to true in order for the operation to succeed
  • A single DynamoDB table partition can support a maximum of 3000 read capacity units or 1000 write capacity units
  • Control Plane operations let you create and manage DynamoDB tables. And also work with indexes, streams and other objects that are dependent on tables
  • When defining primary keys, you should always use a many to few principle. An example of a bad design would be a primary key of product_id where there are few products but many users
  • 5 local and 5 global secondary indexes are allowed, which gives a maximum of 10 per table
  • DynamoDB DOES NOT ALLOW secondary index limit increase
  • Every Item Collection in DynamoDB is subject to a maximum size limit of 10 GB
  • All scalar data types (Number, String, Binary and Boolean) can be used for the sort key element of the local secondary index key. Set, list and map types cannot be indexed
  • For any AWS account, there is an initial limit of 256 tables per region
  • GetItem provides an eventually consistent read by default. If your app requires a strongly consistent read, set ConsistentRead to true
  • You can reduce the impact of the scan operation by setting a smaller page size (Limit param)
  • Optimistic concurrency depends on checking a value upon save to ensure that it has not changed.
  • Pessimistic concurrency prevents a value from changing by locking the item or row in the database. DynamoDB does not support item locking, and conditional writes are well suited to implementing optimistic concurrency
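One common way to apply this is a version-number check. The sketch below builds parameters in the shape of boto3's `client.update_item()`; the `version` attribute, the helper name and the example values are assumptions, not something DynamoDB provides by itself.

```python
def build_versioned_update(table, key, attribute, new_value, expected_version):
    """UpdateItem parameters that succeed only if the item's version is unchanged."""
    return {
        "TableName": table,
        "Key": key,
        # Write the new value and bump the version in the same request
        "UpdateExpression": "SET #attr = :new, #ver = :next",
        # Fails if another writer changed the version since we read the item
        "ConditionExpression": "#ver = :expected",
        "ExpressionAttributeNames": {"#attr": attribute, "#ver": "version"},
        "ExpressionAttributeValues": {
            ":new": new_value,
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }


update = build_versioned_update(
    "Products", {"id": {"S": "p-1"}}, "price", {"N": "12"}, expected_version=3
)
```

If the condition fails, the caller would typically re-read the item and retry with the newer version number.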