StreamSets to Snowflake Integration | WhoisXML API

WhoisXML API Blog

StreamSets to Snowflake Integration

This guide provides step-by-step instructions for StreamSets to Snowflake Integration.

Prerequisites

  • StreamSets account.
  • Data collector using Docker (as described in previous guide)
  • Snowflake account
  • WHOISXMLAPI.COM Real-Time Domain Registration subscription 

Design Overview

Design Overview

Creating a Database in Snowflake

Step 1: Set Up a Warehouse:

  • After logging in, navigate to the sidebar select "Admin" and choose "Warehouses," Click on Warehouse button, as illustrated in the screenshot below.
Set Up a Warehouse
  • Select the appropriate warehouse name and click on the “Create Warehouse” button.
Select the appropriate warehouse name and click on the “Create Warehouse” button.

Step 2: Setup a Database

  • Navigate to the sidebar select "Data" and choose "Databases" Click on Database button, as illustrated in the screenshot below.
Navigate to the sidebar select "Data" and choose "Databases" Click on Database button
  • Select the appropriate database name and click on “Create” button.
Select the appropriate database name and click on “Create” button.

Step 3: Create Schema:

  • Select database and click on “+ Schema”  button
Select database and click on “+ Schema”  button
  • Select the appropriate Schema name and click on “Create” button.
Select the appropriate Schema name and click on “Create” button.

Step 4: Create Stage:

  • Select Schema created in the previous step and click on “Create” button and choose “Stage” and click on “Snowflake Managed”.
create stage
  • Select the appropriate Stage name and click on “Create” button.
Select the appropriate Stage name and click on “Create” button.

Step 5: Create Table:

  • Select Schema created in the previous step and click on “Create” button and choose “Table” and click on “Standard”.
create table
  • Create Table by writing the SQL query as depicted below and click on the “Create Table” button.
Create Table by writing the SQL query as depicted below and click on the “Create Table” button.

Creating a Connection for Snowflake

Step 6: Setup a Connection:

  • In the StreamSets UI, go to the sidebar, select "Setup," and then choose "Connections." Create the connection by clicking on the plus button, as demonstrated in the screenshot below.
setup a connection
  • Customize the new Connection according to your specifications, and proceed by clicking on the "Save & Next" button. Make sure to select the Data collector and type.
Customize the new Connection according to your specifications, and proceed by clicking on the "Save & Next" button
  • Configure the new Connection according to your credentials, and proceed by clicking on the "Save" button. Make sure to Test the connection.
Configure the new Connection according to your credentials

Creating a pipeline

Step 7: Setup Pipeline:

  • In the StreamSets UI, go to the sidebar, select "Build," and then choose "Pipelines." Initiate the pipeline by clicking on the plus button, as demonstrated in the screenshot below.
setup pipeline
  • Customize the new pipeline according to your specifications, and proceed by clicking on the "Next" button.
Customize the new pipeline according to your specifications, and proceed by clicking on the "Next" button.
  • Adjust the pipeline configuration, choose the designated data collector, and click on the "Save & Open in Canvas" button.
adjust the pipeline configuration
  • Subsequently, the following user interface (UI) will be displayed. 
the following user interface (UI) will be displayed
  • Click on the "Add Stage" button, then search for "WebSocket" and choose the "WebSocket Client."
Click on the "Add Stage" button, then search for "WebSocket" and choose the "WebSocket Client."

Choose the WebSocket stage and configure it based on your preferences:

WebSocket Configuration: 

  • Add Resource URL 
  • Add request Data (It will contain the API Key)
  • Add Max Message Length (bytes) = 522184 minimum 
WebSocket Configuration

Data Format Configuration:

  • Add Data Format JSON.
  • Add Max Object Length (chars) = 9999999 (You can change it according to your requirements)
data format configuration
  • Add another stage by selecting the "Add Stage" button on the UI, and choose the "Snowflake (destinations)" stage.
Add another stage by selecting the "Add Stage"

Snowflake Configuration: 

  • Select the connection which we created in previous steps.
Snowflake configuration
  • Add the Snowflake configuration warehouse, database, schema & table. Check create new columns; it will automatically create columns in the table.
Add the Snowflake configuration warehouse
  • Add the staging configuration stage location, stage name , stage database and schema.
Add the staging configuration stage location
  • Add the data configuration:
Add the data configuration

Final Steps:

Once configured, validate the pipeline by clicking on the "Validate" button to identify and rectify any errors. The final state of the pipeline should resemble the provided example.

final steps

Running Pipeline:

Execute the pipeline by selecting "Draft Run" on the UI, then choose "Start Pipeline."

Running Pipeline

Following the completion of the aforementioned steps, upon starting the pipeline, you should observe the displayed UI as illustrated below.

upon starting the pipeline, you should observe the displayed UI

Within the Snowflake database, you should be able to preview the data on the table, containing data sourced from the WebSocket, as depicted below.

preview the data on the table, containing data sourced from the WebSocket

Conclusion

In conclusion, this comprehensive guide outlined the step-by-step process to set up and execute a pipeline in the StreamSets UI using Docker. From configuring the WebSocket Client to defining the Snowflake stage, each step contributed to building a functional pipeline for data processing. The validation step ensures the integrity of the pipeline, and upon successful execution, the snowflake UI displays the output. By following these instructions, users can successfully create, configure, and run a streamlined data processing pipeline, facilitating effective data integration and management.

Try our WhoisXML API for free
Get started