Unify data silos with AWS AppSync

Silos

Most organisations that process data will have experienced the concept of data in silos. This is where an application is built for a particular purpose and tied to a data store. While this may solve a particular business problem, as time passes developers and engineers may start to spend time extracting data from these silos for other purposes such as analytics and machine learning.

  • Join data from silos together
  • Provide a migration path for application modernisation by moving some data into DynamoDB while keeping some in a RDBMS.

What’s GraphQL?

Organizations choose to build APIs with GraphQL because it gives developers the ability to query multiple databases, microservices, and APIs with a single GraphQL endpoint.

What’s AppSync?

AWS AppSync is a fully managed service that makes it easy to develop GraphQL APIs. Out of the box it allows connections to data sources like AWS DynamoDB, Lambda, and more.

Data Sources

In this example we will provide a unified API that is able to query data from the following data stores:

  • RDS — Representing a traditional 3 tier app that has been migrated to the cloud.
  • Lambda — Representing a serverless application.

DynamoDB

Create a table named vehicle. The key is vehicle_id (string).

Lambda

We now create a quick Lambda that will mock returning some data for a vehicle_id.


import json
print(‘Loading function’)def lambda_handler(event, context):
print (json.dumps(event))
print (context)

vehicle_id={}
vehicle_id=event[‘source’][‘vehicle_id’]
print(vehicle_id)

vehicles = {
“123456” : { “vehicle_id” : “123456”, “fuel” : “electric”, “category”: “SUV” },
“987654321” : { “vehicle_id” : “987654321”, “fuel”: “hybrid”, “category”: “Saloon”}
}

print(vehicles[vehicle_id])
return (vehicles[vehicle_id])

RDS (Aurora PostgreSQL)

Out of the box AppSync supports Aurora Serverles RDS instances. Create an RDS Aurora PostgreSQL instance named vehicle-accident.


create table accident (
vehicle_id varchar,
accident_date date,
damage varchar,
cost integer);
insert into accident values (123456, ‘2020–11–23 18:00:00’, ‘windscreen smashed’, 100);
insert into accident values (987654321, ‘2020–11–24 18:00:00’, ‘dent in front passenger door’, 600);
commit;
{
“username”: “xxxxxxxxxxxxxx”,
“password”: “xxxxxxxxxxxxxx”
}

aws secretsmanager create-secret — name HttpRDSSecret — secret-string file://creds.json — region eu-west-1

Create The GraphQL API

From the AppSync console select build from scratch.

Schema

Click edit schema.


type Query {
#Get a single vehicle.
singleVehicle(vehicle_id: String): Vehicle
}
type Vehicle {
vehicle_id: String
model: String
year: String
colour: String
make: String
fuel: String
category: String
accident_date: String
accident_damage: String
accident_cost: String
}
schema {
query: Query
}

Data Sources

Next we define the three data sources. DynamoDB, RDS and Lambda. Click Data Sources and add them one by one.

Resolvers

DynamoDB

Back on the Schema screen select Attach for the resolver of “singleVehicle(…): Vehicle”


{
“version”: “2017–02–28”,
“operation”: “GetItem”,
“key”: {
“vehicle_id”: $util.dynamodb.toDynamoDBJson($ctx.args.vehicle_id),
}

## Pass back the result from DynamoDB. **
$util.toJson($ctx.result)

Lambda

On the schema definition screen scroll down to the fuel field and click attach.


$util.toJson($context.result.get(“fuel”))


$util.toJson($context.result.get(“category”))

RDS

On the schema definition screen scroll down to the accident_date field and click attach.


{
“version”: “2018–05–29”,
“statements”: [
$util.toJson(“select accident_date from accident WHERE vehicle_id = ‘$ctx.source.vehicle_id’”)
]
}

#if($ctx.error)
$util.error($ctx.error.message, $ctx.error.type)
#end
#set($output = $utils.rds.toJsonObject($ctx.result)[0])
## Make sure to handle instances where fields are null
## or don’t exist according to your business logic
#foreach( $item in $output )
#set($accident_date = $item.get(‘accident_date’))
#end
$util.toJson($accident_date)


{
“version”: “2018–05–29”,
“statements”: [
$util.toJson(“select damage from accident WHERE vehicle_id = ‘$ctx.source.vehicle_id’”)
]
}

#if($ctx.error)
$util.error($ctx.error.message, $ctx.error.type)
#end
#set($output = $utils.rds.toJsonObject($ctx.result)[0])
## Make sure to handle instances where fields are null
## or don’t exist according to your business logic
#foreach( $item in $output )
#set($damage = $item.get(‘damage’))
#end
$util.toJson($damage)

{
“version”: “2018–05–29”,
“statements”: [
$util.toJson(“select cost from accident WHERE vehicle_id = ‘$ctx.source.vehicle_id’”)
]
}

#if($ctx.error)
$util.error($ctx.error.message, $ctx.error.type)
#end
#set($output = $utils.rds.toJsonObject($ctx.result)[0])
## Make sure to handle instances where fields are null
## or don’t exist according to your business logic
#foreach( $item in $output )
#set($cost = $item.get(‘cost’))
#end
$util.toJson($cost)

Query

The three data sources are now in place to resolve all the fields for our API. Go back to the query screen and check that the fields all get populated when you run a query.

Tips

Turn on CloudWatch Logs so you can see details of any errors. You can do this under settings.

Data Architect @CDL_Software , AWS Community Builder, 12 x AWS Certified. GCP, Azure, SCRUM, DAMA CDMP, Hashicorp & Kafka Certified