Thursday, May 25, 2023

AWS DynamoDB Local Secondary Index (LSI) - Demystified

Local Secondary Index -  AWS DynamoDB - All you need to know

In a previous post we saw Global Secondary Index (GSI) in detail. In this post let's look into the Local Secondary Index (LS) in detail

Local Secondary Index (LSI):

A Local Secondary Index (LSI) uses the same Partition Key as that of the base Table with different Sort Key. It contains some or all of the attributes of the base Table. 

Sample Table: The following Table captures Goals scored, Number of Matches played by different Countries in FIFA World Cup Soccer during different games (2022, 2018 and so on)
Partition Key: Country
Sort Key: Game



From this Table one can easily query Number of Goals scored by a Country, given the Country name and the Game (e.g. France, FIFA-2022). Without Sort Key in the Query one can find the Goals scored by "France" across all Games they played, however if we want to find out in which Game they scored the most number of Goals the application needs to process the Query result to find out that.

Create a LSI named GoalsIndex

Let's see how a LSI makes such a querying easier, consider the below LSI where the "Partition Key" remains the "Country" however the Sort key is "Goals"
Partition Key: Country
Sort Key: Goals
Additional attribute projected: Matches



Points to remember:

  • The data in LSI is organized by the same Partition Key of the base Table, however with different Sort Key
  • The Partition Key in the LSI must be same as that of the base Table (in example table 'Country')
  • Must specify one non key attribute of base Table as the Sort Key for a LSI, it must be Scalar attribute (String, Number or Binary)
  • The Sort Key, if any, of the base Table is automatically Projected in to the LSI as non key attribute (in example 'Game')
  • The Partition Key of the base Table is automatically projected
  • A maximum of 5 LSIs can be created on a Table
  • A Query can also retrieve attributes that are not Projected to the LSI, DynamoDB retrieves these from the base Table known as 'fetches', however at greater Latency and high provisioned throughput cost
  • One can store up to 10 GB of data per distinct Partition Key value, this includes all the Items in the base Table as well as across all the LSIs
  • Unlike the base Table, the Partition Key and Sort Key combination need not be unique in a LSI

Projecting attributes:

Projection is set of attributes copied from base Table to the Secondary Index. Primary Keys (Partition & Sort keys) are always projected to the Index. The following are the 3 possible attribute Projection options for a LSI
  1. KEYS_ONLY: Projects Primary Key (Partition and Sort keys) attributes from the base Table to the LSI in addition to the Primary Key attributes defined in the LSI. In the sample LSI the base Table's Sort Key "Game" is projected besides its own Sort Key "Goals". This is the smallest possible LSI. The smaller the Index, the less it costs to store and also less the write costs
  2. INCLUDE: Includes specific attributes besides the automatically projected base Table's Primary Key attributes. In the example Index above the 'Matches' non key attribute of base Table is projected
  3. ALL: The LSI includes all the attributes of the base Table. The LSI will have the attributes "Country","Goals", "Game","Matches" and "Venue"

Reading data from GSI:

  • Query and Scan operations are supported GetItem and BatchGetItem are not supported in a LSI. This is the same as GSI
  • If there are multiple Items for a given Partition/Sort Key combination then all the Items that matches given Key are returned but not in a particular order
  • Data can be read from a LSI either "Strongly consistent" or "Eventually consistent" fashion. This can be specified using 'ConsistentRead'Query parameter
  • If the "ScanIndexForward" parameter is set to "false" while querying, the results are returned in the descending order of Sort Key attribute, in the example Index above the first record will be the highest Goals

Data synchronization between base Table and GSI:

  • DynamoDB keeps all the LSIs synchronized against its respective base Tables
  • Applications can't write directly into a LSI
  • Every time a new Item is added to the base Table the data type of the attribute, which is used as Sort Key in a LSI, should be respected. The sample index uses "Goals" as Sort Key, its value should always be of type Number while inserting into base Table
  • The Items in the LSI is not a one-to-one relationship in base Table. If a LSIs Sort Key attribute doesn't have a value in base Table then that Item is not copied to LSI. This rule can be made use of effectively to copy specific sub set of Items to a specific LSI

Read/Write capacity units for LSI:

  • Strongly consistent read consumes one Read Capacity Unit (RCU) while eventually consistent read consumes half of it
  • When query reads only Projected attributes, provisioned RCU usage is calculated based on the size of the Item in the Index (Keys and Projected attributes) and not based on the Item size in the base Table
  • The number of RCU used is the sum of all projected attributes sized across all returned items, this is rounded to the next 4 KB boundary
  • When query reads attributes which are not Projected into a LSI, the RCU is calculated based on the size of the Item in the LSI and also the entire Item size in the base Table, not just the attributes fetched from the base Table. Fetching from base Table causes additional latency
  • The maximum size of the results returned by Query is 1 MB, this includes the sum of size of all items returned matched in LSI and also in the base Table, if any attribute which is not projected is queried
  • When an Item is added, updated or deleted from base Table the corresponding change is replicated in LSI which incur additional Write cost
  • The Provisioned throughput cost is the sum of Write Capacity Units (WCU) consumed for writing to base Table and as well as to LSIs
  • When new Item is written to base Table which has a value for Sort Key attribute of a LSI or an existing Item is updated to populate with value for the same attribute which is previously undefined then WCU is consumed for writing that Item to the LSI
  • When value of Sort key attribute of an Item is changed from X to Y in base Table, it result in one Delete and one Write in a LSI
  • When update on an Item in the base Table removes the Sort key attribute of a LSI a write is consumed to get the Item deleted from the LSI
  • When an Item is written to the base Table, the DynamoDB automatically copies the correct subset of attributes to LSIs. Storage cost is charged on this Item storage for both base Table and LSIs

Items Collections:

  • Item collection is group of Items that have the same Partition key value in base Table and any LSI. In the example Table it's 'Country'. 
  • Following operations can be performed on an Item collection that returns information about the Item collection. When 'ReturnItemCollectionMetrics' parameter is set to SIZE each of these operations return details of size of Items in Collection in Index
    • BatchWriteItem
    • DeleteItem
    • PutItem
    • UpdateItem
    • TransactWriteItems
  • The maximum size of any item collection for a table which has one or more local secondary indexes is 10 GB
  • If an item collection exceeds the 10 GB limit, DynamoDB returns an ItemCollectionSizeLimitExceededException, and you won't be able to add more items to the item collection or increase the sizes of items that are in the item collection
  • If the application expect the size of Item collection to exceed 10 GB then one should consider creating a GSI
  • Each Item collection is stored in a single partition whose size capability is 10 GB. One should choose Partition Key in such a way the data is evenly distributed across Partitions. For a Table with LSIs, applications should not create hot spots of read/write activity within a single Item collection which is in a single Partition

No comments:

Post a Comment