
AWS Amplify - Migrating Data to GraphQL + DynamoDB

I've been working on migrating an older app of mine to a new tech stack. My original app was written in AngularJS with an ASP.NET MVC + MS SQL Server backend. The new version is written in ReactJS backed by an AWS AppSync API built using AWS Amplify.

The Challenge

I found that using AWS Amplify to create a new frontend app + GraphQL backend was pretty straightforward by just following the Amplify docs. Migrating my existing data, on the other hand, proved to be more challenging and required a deeper understanding of the stack that AWS Amplify generates.

TLDR; General Approach

  • Dump the MS SQL data to JSON using the mssql npm package.
  • Import the data directly into DynamoDB using the AWS.DynamoDB namespace in the aws-sdk npm package.
  • Concatenate composite sort key values and set them on the appropriate attribute of each imported record.

Dumping MS SQL Data as JSON

The mssql package made this part easy. With a connection string and a query, you can JSON.stringify the results and write them to a file.

const fs = require('fs')
const sql = require('mssql')

async function dumpTable(jsonFilePath) {
  await sql.connect('mssql://xxxxxx')
  const result = await sql.query`select * from someTable`

  // result.recordset contains the returned rows
  fs.writeFileSync(
    jsonFilePath,
    JSON.stringify(result.recordset, undefined, 2)
  )

  await sql.close()
}

Importing JSON Data to DynamoDB

Importing the JSON data to DynamoDB was a little more involved and took me a while to figure out. Here are a few of the things I had to learn about to pull it off:

  • DynamoDB API
  • Local DynamoDB
  • Amplify's implementation of composite sort keys

DynamoDB API

I used the AWS.DynamoDB namespace of the aws-sdk npm module to import directly to DynamoDB. Amplify has an amplify mock command which will spin up a local DynamoDB instance, so this is where I started.

Targeting the local DynamoDB instance looks something like this:

const localConfig: AWS.DynamoDB.ClientConfiguration = {
  accessKeyId: 'fake',
  endpoint: 'http://localhost:62224',
  region: 'us-fake-1',
  secretAccessKey: 'fake'
}

const db = new AWS.DynamoDB(localConfig);

Note that the fake values are literal and not placeholders for real credentials (this was unclear to me when I first found sample code).

Bulk inserting items then looks something like this:

// Without a callback, aws-sdk returns a Request object here,
// so call .promise() (or .send()) to actually send the batch.
await db.batchWriteItem({
  RequestItems: {
    someTable: [
      {
        PutRequest: {
          Item: {
            someText: {
              S: 'test'
            },
            someNum: {
              N: '4'
            }
          }
        }
      },
      {
        PutRequest: {
          Item: {
            someText: {
              S: 'mock'
            },
            someNum: {
              N: '5'
            }
          }
        }
      }
    ]
  }
}).promise()

It's basically just items and attributes mapped to a table. Pretty straightforward and well documented.

One catch here is that there's a limit on how much data you can batch write at a time: each request can contain at most 25 items (plus overall size limits). I didn't have any issues with the size limits, but I did have to write batching logic to send 25 or fewer items per request since some of my tables have thousands of rows.

NOTE: I have since learned that there are other libraries that will handle the batching for you, but using aws-sdk seems to require you to handle it yourself.
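
Here's a minimal sketch of that batching logic, assuming the rows have already been mapped to DynamoDB attribute values (chunk and importItems are my own helper names, not part of aws-sdk or Amplify):

import * as AWS from 'aws-sdk'

const BATCH_SIZE = 25

// Split an array into chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Write items to a table, 25 at a time.
async function importItems(
  db: AWS.DynamoDB,
  tableName: string,
  items: AWS.DynamoDB.PutItemInputAttributeMap[]
) {
  for (const batch of chunk(items, BATCH_SIZE)) {
    await db
      .batchWriteItem({
        RequestItems: {
          [tableName]: batch.map(item => ({ PutRequest: { Item: item } }))
        }
      })
      .promise()
    // A more robust import would also retry any UnprocessedItems in the response.
  }
}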

Local DynamoDB

As I mentioned before, amplify mock will run a local DynamoDB instance. For my first migration attempt I just tried to map my JSON data to DynamoDB items + attributes. I ran a few queries against my local GraphQL API and noticed that some returned data, but queries that involved secondary indexes came back empty. Entries I created using the GraphQL API worked just fine, so I knew there was an issue with my direct DynamoDB import.

The local DynamoDB is backed by a SQLite database file, so I started looking at the underlying data using the SQLite VS Code extension. I noticed there were columns containing my model data as well as ones labeled indexKey_0, indexKey_1, etc. I discovered that items imported directly to DynamoDB were missing some index values, but the ones created through the GraphQL API were not.
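
You don't strictly need the SQLite extension to see this, by the way. Scanning the local table with the same DynamoDB client and comparing the attribute names on each item shows the same gap; a quick sketch, reusing the db client and placeholder someTable name from above (inside an async function):

const result = await db.scan({ TableName: 'someTable' }).promise()

for (const item of result.Items || []) {
  // Items imported directly were missing the index attributes that
  // items created through the GraphQL API had.
  console.log(Object.keys(item).sort().join(', '))
}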

Composite Sort Keys

DynamoDB indexes consist of a partition key and an optional sort key, and each key maps to a single attribute. Amplify, however, lets a sort key be composed from multiple model fields. Indexes are defined with the @key directive, and if the sort key includes more than one field, a composite attribute is added to the table.

For example, the following describes a secondary index with the role field as the partition key and name + age as the composite sort key.

@key(
  name: "byCategory",
  fields: ["role", "name", "age"]
)

This will add an attribute named name#age to the corresponding DynamoDB table, and values of that attribute look like john#31.

The composite attributes are defined in the CloudFormation templates that create the DynamoDB tables, and the resolver templates handle composing the data when querying or mutating data.

Manually Importing Composite Data

After understanding how Amplify handles composite sort keys, importing the index data wasn't too difficult. It's just a matter of adding an extra string attribute for any sort key that spans more than one field.

e.g.

{
  PutRequest: {
    Item: {
      name: {
        S: 'john'
      },
      age: {
        N: '31'
      },
      role: {
        S: 'manager'
      },
      'name#age': {
        S: 'john#31'
      }
    }
  }
}
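
Here's a small sketch of the mapping step that produces items like the one above, assuming the dumped rows look like { name, age, role }. The toItem helper and the PersonRow shape are just for this example, not anything Amplify generates, and the aws-sdk import is the same as earlier:

interface PersonRow {
  name: string
  age: number
  role: string
}

// Map a dumped SQL row to a DynamoDB item, adding the composite
// sort key attribute that the generated resolvers would normally
// build by joining the field values with '#'.
function toItem(row: PersonRow): AWS.DynamoDB.PutItemInputAttributeMap {
  return {
    name: { S: row.name },
    age: { N: String(row.age) },
    role: { S: row.role },
    'name#age': { S: [row.name, row.age].join('#') }
  }
}

The resulting items can then be passed straight to the batching logic from earlier.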

Summary

I've been enjoying using AWS Amplify. Overall it is an accelerator and automates a lot of work that would be required to deploy an app to AWS. Data migration still involves a fair amount of manual work, but it's not too bad once you understand some of the underlying mechanics of the stacks that Amplify creates. Till next time.