Syncing JSON Files from AWS to Snowflake
Load JSON data from an AWS S3 bucket into Snowflake using Matillion, including table setup and Flatten Variant steps for nested structures.
Matillion can sync JSON files stored in an AWS S3 bucket into Snowflake. Because JSON structures contain nested objects and arrays, this workflow requires two Snowflake tables; one for raw source data with VARIANT columns and one for transformed data with typed columns, along with a Flatten Variant step to extract nested elements.
Before you begin, ensure that AWS cloud credentials are created and associated with the Matillion project environment. See
Setting Up Data Sync for credential setup instructions.
Perform the following steps to sync a JSON file from AWS S3 into Snowflake.
1. Create the source table in the customer PTC Orbit tenant. Open Max Designer and define a table with VARIANT columns for each top-level JSON property. For example, for a JSON file containing rows and dataShape properties:
CREATE OR REPLACE TABLE "Test_JSON_S3" ( "rows" VARIANT, "dataShape" VARIANT );
2. Create the stage table for transformed data. Define typed columns that match the structure of the flattened JSON elements. For example:
CREATE OR REPLACE TABLE "Test_JSON_Stage" ( "PTCCardModelAssetProperties" VARIANT, "name" VARCHAR(16777216), "tags" ARRAY );
3. In Matillion, create an orchestration pipeline. Add an S3 Load component. Configure the following properties:
◦ Authentication: Select Credentials.
◦ S3 Object Prefix: Enter the S3 bucket path where the JSON file is located.
◦ Target Table: Select the source table created in step 1.
◦ Match By Column Name: Select Case Sensitive or Case Insensitive to load semi-structured data into columns that match the corresponding JSON property names. The available options are:
▪ Case Insensitive: Match column names without regard to letter case.
▪ Case Sensitive: Match column names with exact letter case.
▪ None: Load semi-structured data into a single variant column, or transform it using a query in the COPY statement.
4. Add a transformation pipeline to flatten nested JSON arrays. Insert a Flatten Variant component after the Table Input component. Configure the following properties:
◦ Include Input Columns: Choose whether to carry forward the original input columns into the output table.
◦ Column Mapping: For each element in the source array, specify:
▪ Column: The source table column containing the raw semi-structured data.
▪ Property: The name of the element within the semi-structured data to extract.
▪ Type: The data type of the output column.
▪ Alias: The name of the output column to map the element to.
◦ Column Flattens: If a source element contains a nested array, use this property to extract the nested array elements into individual values before mapping them in Column Mapping.
5. Add a Table Output component at the end of the transformation pipeline. Set the Target Table to the stage table created in step 2.
6. Validate the orchestration pipeline by clicking Validate. If validation fails, click the component with the error icon to view the detailed error message in its property window. Fix the error and click Run to execute the pipeline.
The JSON data from AWS S3 is loaded into the source table, flattened into individual columns, and saved to the Snowflake stage table for downstream processing in PTC Orbit.
To create a custom connector, refer to
Matillion Documentation.