
Snowpipe on GCP

Purpose of the Article: In this blog, we explain how to implement Snowpipe on GCP Cloud.

Intended Audience: Developers working on Snowflake and GCP Cloud

Tools and Technology: Snowflake and GCP services

Keywords: implement Snowpipe on GCP cloud.

DATA LOADING TO SNOWFLAKE FROM GCP THROUGH SNOWPIPE

OBJECTIVE:

How to establish a Snowflake Snowpipe on GCP.

INTRODUCTION TO SNOWPIPE:

Snowpipe is Snowflake's continuous data loading service: it loads data as soon as files become available in a stage, and it can handle large data sets.

ARCHITECTURE OF THE SNOWPIPE FLOW FOR INGESTING DATA FROM GCP:

PROCESS OF SNOWPIPE CREATION TO LOAD DATA INTO SNOWFLAKE:

Step 1: Create a project in GCP:

  • Log in to the GCP account
  • Click on the navigation menu, then select IAM & ADMIN >> CREATE A PROJECT

  • Click on the Create a Project tab; it redirects to the project creation page. Enter the project name and then click the Create button to create the project

Step 2: Create a bucket in GCP:

  • Search for buckets in the search bar and click on Buckets

  • Click on the Create button

  • Enter a suitable name for the bucket and click on the Create button

Step 3: Create an IAM role in GCP:

  • Select IAM & ADMIN >> ROLES

  • Click on Create Role

  • Enter a unique name for the role and click on Add Permissions to add the required permissions to the role. The permissions for each use case are listed below

Data loading only:

  • storage.buckets.get
  • storage.objects.get
  • storage.objects.list

Data loading with purge option:

  • storage.buckets.get
  • storage.objects.delete
  • storage.objects.get
  • storage.objects.list

Data loading and unloading:

  • storage.buckets.get
  • storage.objects.create
  • storage.objects.delete
  • storage.objects.get
  • storage.objects.list

Data unloading only:

  • storage.buckets.get
  • storage.objects.create
  • storage.objects.delete
  • storage.objects.list

  • Now we add the permissions required for data loading only
  • After clicking on the Add Permissions button, search for the required permission, tick its checkbox, and finally click the Add button to add the permission
  • Following the above step, add the other two permissions as well

  • Below are the description and permissions assigned to the role

Step 4: Create a database:

  • Log in to the Snowflake account
  • Open a new worksheet
  • Create a database, as sketched below
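A minimal sketch, assuming the hypothetical database name SNOWPIPE_DEMO (any name works):

-- Create and switch to a database that will hold the Snowpipe objects
CREATE OR REPLACE DATABASE SNOWPIPE_DEMO;
USE DATABASE SNOWPIPE_DEMO;
USE SCHEMA PUBLIC;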

Step 5: Create a table:

  • Create a table into which the data from the stage will be copied, as sketched below
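A minimal sketch, assuming a JSON file will be loaded later (Step 11), so a single VARIANT column is used; the table name SNOWPIPE_JSON_TBL is hypothetical:

-- Landing table; a VARIANT column holds each JSON record copied from the stage
CREATE OR REPLACE TABLE SNOWPIPE_JSON_TBL (
  raw_data VARIANT
);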

Step 6: Create a storage integration:

  • Create a storage integration that uses the GCP bucket, as sketched below
  • Run DESC STORAGE INTEGRATION to get the storage integration's description and details
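A minimal sketch, assuming the hypothetical bucket name my_snowpipe_bucket and integration name gcp_int; STORAGE_ALLOWED_LOCATIONS must point at your own bucket:

-- Storage integration pointing at the GCS bucket
CREATE OR REPLACE STORAGE INTEGRATION gcp_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://my_snowpipe_bucket/');

-- The STORAGE_GCP_SERVICE_ACCOUNT value in this output is the account that is granted bucket access below
DESC STORAGE INTEGRATION gcp_int;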

  • Below is the output we get when we run the DESC command
  • Copy the storage_gcp_service_account property value

  • Go back to the GCP account and tick the checkbox of the bucket created earlier; the Permissions tab becomes enabled at the top
  • Click on the Permissions tab to add permissions

  • Click on the ADD PRINCIPAL button to add a principal

  • Enter the storage_gcp_service_account name which we copied earlier

  • Select the role Custom >> gcp_role (the custom role created in Step 3)

  • Click on the Save button

Step 7: Create a stage:

  • Create a stage using the GCP bucket and storage integration details, as sketched below

  • Run the SHOW STAGES command to get the details of all stages that were created
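A minimal sketch, assuming the hypothetical names used above (bucket my_snowpipe_bucket, integration gcp_int) and a hypothetical stage name gcs_stage:

-- External stage backed by the GCS bucket via the storage integration
CREATE OR REPLACE STAGE gcs_stage
  URL = 'gcs://my_snowpipe_bucket/'
  STORAGE_INTEGRATION = gcp_int;

-- List all stages, including the one just created
SHOW STAGES;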

Step 8: Create an event subscription:

  • Go back to the Roles page and click on Activate Cloud Shell

  • It redirects to the Cloud Shell terminal, which looks like the screenshot below

  • Click the My First Project dropdown and copy the ID of the project

  • Enter gcloud config set project <project_id> (the project ID copied above) and press Enter

  • Create the Pub/Sub topic and the bucket notification
  • Enter gsutil notification create -t <topic> -f json gs://<bucket_name>, where <topic> is the name of the topic and <bucket_name> is the name of the GCS bucket, and press Enter

  • Below is the output

  • Search for pub/sub in the search bar and select the Pub/Sub option

  • You are redirected to the page below; click on the topic created earlier

  • Click on the Create Subscription button to create a subscription

  • Enter the name of the subscription and click the Create button to create the subscription

  • Click on the subscription created and copy the subscription name

Step 9: Create a notification integration:

  • Create a notification integration using the subscription name, as sketched below

  • Run DESC INTEGRATION notification_gcp_snowpipe and copy the pubsub_service_account_name from the output.
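A minimal sketch, assuming the hypothetical project ID my-gcp-project and subscription name snowpipe_sub copied in the previous step; the integration name notification_gcp_snowpipe matches the one used above:

-- Notification integration backed by the Pub/Sub subscription
CREATE OR REPLACE NOTIFICATION INTEGRATION notification_gcp_snowpipe
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = GCP_PUBSUB
  ENABLED = TRUE
  GCP_PUBSUB_SUBSCRIPTION_NAME = 'projects/my-gcp-project/subscriptions/snowpipe_sub';

-- Copy the Pub/Sub service account shown in this output; it is granted access in the next step
DESC NOTIFICATION INTEGRATION notification_gcp_snowpipe;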

Step 10: Grant Snowflake Access to the Pub/Sub Subscription:

  • Click on the Add Principal button to provide access to the subscription

  • Enter the pubsub_service_account_name in the New Principal field and select the role Pub/Sub Subscriber
  • Click on Save

  • Go back to the dashboard and click on Add People to this Project

  • Add the pubsub_service_account_name you recorded earlier
  • From the Role dropdown, select Monitoring Viewer
  • Click on Save

Step 11: Create a pipe to auto-ingest:

  • Create a pipe using the notification integration, the stage, and a file format based on the type of file to be loaded

CREATE OR REPLACE PIPE <pipe_name>
  AUTO_INGEST = TRUE
  INTEGRATION = '<notification_integration_name>'
AS
<copy_statement>;
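A concrete sketch of the same template, assuming the hypothetical names used in the earlier sketches (table SNOWPIPE_JSON_TBL, stage gcs_stage, integration notification_gcp_snowpipe) and a JSON file format:

-- Pipe that auto-ingests JSON files landing in the stage into the landing table
CREATE OR REPLACE PIPE gcs_snowpipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'NOTIFICATION_GCP_SNOWPIPE'
AS
COPY INTO SNOWPIPE_JSON_TBL
FROM @gcs_stage
FILE_FORMAT = (TYPE = 'JSON');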

  • Upload a JSON file into the GCP bucket
  • Run the pipe (see the verification sketch below)
  • Run a SELECT query on the table to see the data loaded into it
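A minimal verification sketch, again using the hypothetical names above; ALTER PIPE ... REFRESH queues files that were already in the bucket before the pipe existed:

-- Queue any files already sitting in the stage
ALTER PIPE gcs_snowpipe REFRESH;

-- Check whether the pipe is running and whether files are pending
SELECT SYSTEM$PIPE_STATUS('gcs_snowpipe');

-- Verify that the rows have been loaded
SELECT * FROM SNOWPIPE_JSON_TBL;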

Author Bio:


Meena Karkambadi

Data Engineering - Analytics

I have been working as an Associate Software Engineer at Mouri Tech for the past 1.5 years. I have good experience in SQL, AWS, and Snowflake.
