AWS Step Functions – Automating Your ETL Processing

  • Purpose of the Article: This blog explains how to use AWS Step Functions to orchestrate ETL pipelines in the cloud.
  • Intended Audience: Developers working on AWS who are looking for a service that can automate ETL workflows at minimal cost.
  • Tools and Technology: AWS Services (Step Functions, S3, Lambda, Glue)
  • Keywords: Step Functions.

Objective:

  1. Discuss AWS Step Functions and its advantages
  2. Provide an example that shows how AWS Step Functions works

Introduction:

AWS Step Functions (SF) is a serverless service listed under the Application Integration category on AWS. It is an orchestration service that helps developers create, manage, and automate multi-step ETL workflows and pipelines in the AWS cloud.

Teams can combine different services and microservices into a single workflow and schedule it to run automatically at specific times.

At each step of a workflow, SF manages the input and output and applies the configured error handling and retries, so a developer can focus on the business logic and on making the solution as robust as possible.
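To make the error-handling point concrete, here is a minimal sketch of a single Task state with `Retry` and `Catch` fields, written as a Python dictionary that serializes to Amazon States Language JSON. The Lambda ARN and the state names (`NotifyFailure`, `CheckGlueJobStatus`) are placeholders assumed for illustration; they do not come from the original workflow.

```python
import json

# Hypothetical Task state: the ARN and the state names are placeholders.
run_glue_job = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:run-glue-job",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 30,  # wait before the first retry
            "MaxAttempts": 3,       # number of retries after the first failure
            "BackoffRate": 2.0,     # multiplier applied to the wait each time
        }
    ],
    # If retries are exhausted (or any other error occurs), go to a fallback state.
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
    "Next": "CheckGlueJobStatus",
}


def retry_delays(retrier):
    """Return the wait in seconds before each retry described by one retrier."""
    interval = retrier.get("IntervalSeconds", 1)
    rate = retrier.get("BackoffRate", 2.0)
    attempts = retrier.get("MaxAttempts", 3)
    return [interval * rate ** n for n in range(attempts)]


if __name__ == "__main__":
    print(json.dumps(run_glue_job, indent=2))
    print(retry_delays(run_glue_job["Retry"][0]))
```

The helper `retry_delays` is only a local illustration of how the interval grows with the backoff rate; the service itself performs the retries once the state is deployed.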

Components of Step Functions:

  1. State Machines: A workflow that combines multiple components into a single solution is known as a state machine. A state machine can be:
    • Triggered automatically
    • Triggered on a schedule
    • Triggered by an event
  2. Actions: A workflow is a combination of actions and flow components. Actions are the steps that perform work (for example, invoking a Lambda function), and they combine with flow components to form a state machine.
  3. Flow: These are the basic states that decide what happens next based on the results of the actions. They let you control the flow of the state machine.

 

Flow states available when designing a workflow: Choice, Parallel, Map, Pass, Wait, Success, and Fail.
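As a rough illustration of how these flow states appear in a state machine definition, the sketch below maps the names listed above to the `Type` values used in Amazon States Language and builds a tiny definition from Pass, Choice, Succeed, and Fail. The state names and the `dangling_targets` helper are hypothetical and exist only for this example.

```python
# Name as listed above -> "Type" value in the Amazon States Language definition.
FLOW_TYPES = {
    "Choice": "Choice",
    "Parallel": "Parallel",
    "Map": "Map",
    "Pass": "Pass",
    "Wait": "Wait",
    "Success": "Succeed",
    "Fail": "Fail",
}

# Hypothetical four-state definition using only flow states.
definition = {
    "StartAt": "Prepare",
    "States": {
        "Prepare": {"Type": "Pass", "Result": {"status": "SUCCEEDED"}, "Next": "Check"},
        "Check": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "Done"}
            ],
            "Default": "Failed",
        },
        "Done": {"Type": "Succeed"},
        "Failed": {"Type": "Fail", "Error": "EtlFailed", "Cause": "Status was not SUCCEEDED"},
    },
}


def dangling_targets(defn):
    """Return transition targets that do not name a state in the definition."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for rule in state.get("Choices", []):
            targets.add(rule["Next"])
    return sorted(t for t in targets if t not in states)


if __name__ == "__main__":
    print(dangling_targets(definition))
```

A check like this can catch a misspelled state name locally before the definition is uploaded.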

Architecture:

Steps to Create a Step Function:

  1. Create a state machine (serverless workflow).

  2. Create an AWS IAM role with access to S3, Lambda, and CloudWatch.

  3. Attach the IAM role to the state machine.
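The role for the state machine could look roughly like the following sketch. It shows a trust policy that lets the Step Functions service assume the role, plus a permissions policy covering Lambda invocation and S3 object access. The function ARNs and bucket name are placeholders, CloudWatch permissions are left out for brevity, and the exact permissions your pipeline needs are an assumption to verify against your own setup.

```python
import json

# Trust policy: allows the Step Functions service to assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "states.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def permissions_policy(lambda_arns, bucket):
    """Build a minimal permissions policy (hypothetical scope, adjust as needed)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the state machine invoke the workflow's Lambda functions.
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": list(lambda_arns),
            },
            {
                # Read and write objects in the ETL bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    arns = ["arn:aws:lambda:us-east-1:123456789012:function:run-source-crawler"]
    print(json.dumps(TRUST_POLICY, indent=2))
    print(json.dumps(permissions_policy(arns, "my-etl-bucket"), indent=2))
```

Note that the Lambda functions themselves run under their own execution role, which would need the Glue permissions they use.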

  4. Create your AWS Lambda functions.

Lambda Function 1: runs the source crawler

Lambda Function 2: checks the status of the source crawler

Lambda Function 3: runs the Glue job (only after the source crawler succeeds)

Lambda Function 4: checks the status of the Glue job

Lambda Function 5: runs the destination crawler (only after the Glue job succeeds)

Lambda Function 6: checks the status of the destination crawler
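The six functions above could be sketched along the following lines, using the boto3 Glue client. This is not the author's code: the event keys (`crawler_name`, `job_name`, `job_run_id`) and the simplified RUNNING/SUCCEEDED/FAILED status values are assumptions chosen for illustration. Functions 5 and 6 can reuse the crawler handlers with the destination crawler's name. The optional `glue` argument exists only so the handlers can be exercised with a stand-in client.

```python
def _glue_client():
    # Imported lazily so the module can be loaded without boto3 installed.
    import boto3
    return boto3.client("glue")


def start_crawler_handler(event, context, glue=None):
    """Functions 1 and 5: start the crawler named in the event."""
    glue = glue or _glue_client()
    glue.start_crawler(Name=event["crawler_name"])
    return {**event, "crawler_status": "RUNNING"}


def crawler_status_handler(event, context, glue=None):
    """Functions 2 and 6: report RUNNING, SUCCEEDED, or FAILED for the crawler."""
    glue = glue or _glue_client()
    crawler = glue.get_crawler(Name=event["crawler_name"])["Crawler"]
    last_status = crawler.get("LastCrawl", {}).get("Status")
    if crawler["State"] != "READY":      # still RUNNING or STOPPING
        status = "RUNNING"
    elif last_status == "SUCCEEDED":
        status = "SUCCEEDED"
    else:                                # CANCELLED, FAILED, or never crawled
        status = "FAILED"
    return {**event, "crawler_status": status}


def start_job_handler(event, context, glue=None):
    """Function 3: start the Glue job and keep its run ID for later polling."""
    glue = glue or _glue_client()
    run = glue.start_job_run(JobName=event["job_name"])
    return {**event, "job_run_id": run["JobRunId"], "job_status": "RUNNING"}


def job_status_handler(event, context, glue=None):
    """Function 4: report RUNNING, SUCCEEDED, or FAILED for the job run."""
    glue = glue or _glue_client()
    run = glue.get_job_run(JobName=event["job_name"], RunId=event["job_run_id"])
    state = run["JobRun"]["JobRunState"]
    if state == "SUCCEEDED":
        status = "SUCCEEDED"
    elif state in ("STARTING", "WAITING", "RUNNING", "STOPPING"):
        status = "RUNNING"
    else:                                # STOPPED, FAILED, TIMEOUT, ERROR, ...
        status = "FAILED"
    return {**event, "job_status": status}
```

Each handler returns the incoming event with one extra field, so the state machine can pass the same payload from step to step and branch on the status field.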

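  5. Use the Wait and Choice states to control the flow of the process.

Choice: routes the execution to the next step based on the status returned by the preceding status-check Lambda function.

Wait: pauses the execution for a fixed interval before the status is checked again.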
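One way to combine Wait and Choice is a polling loop: wait, call the status Lambda, then either move on, loop back, or fail. The sketch below generates the states for one such loop as a Python dictionary in Amazon States Language form. The state names, the status field, the 30-second interval, and the `route` helper (a local stand-in for Choice evaluation, limited to `StringEquals` on a top-level field) are all assumptions for illustration.

```python
def polling_loop(prefix, status_lambda_arn, status_field, next_state, wait_seconds=30):
    """Return the Wait -> Task -> Choice states that poll one long-running step."""
    check, decide, wait = f"Check{prefix}", f"Is{prefix}Done", f"WaitFor{prefix}"
    failed = f"{prefix}Failed"
    return {
        wait: {"Type": "Wait", "Seconds": wait_seconds, "Next": check},
        check: {"Type": "Task", "Resource": status_lambda_arn, "Next": decide},
        decide: {
            "Type": "Choice",
            "Choices": [
                # Finished: continue with the next stage of the pipeline.
                {"Variable": f"$.{status_field}", "StringEquals": "SUCCEEDED", "Next": next_state},
                # Still running: go back to the Wait state and poll again.
                {"Variable": f"$.{status_field}", "StringEquals": "RUNNING", "Next": wait},
            ],
            "Default": failed,  # anything else is treated as a failure
        },
        failed: {"Type": "Fail", "Error": failed},
    }


def route(choice_state, data):
    """Local stand-in for Choice evaluation (StringEquals on top-level fields only)."""
    for rule in choice_state["Choices"]:
        field = rule["Variable"][2:]  # strip the leading "$."
        if data.get(field) == rule["StringEquals"]:
            return rule["Next"]
    return choice_state["Default"]


if __name__ == "__main__":
    states = polling_loop(
        "SourceCrawler",
        "arn:aws:lambda:us-east-1:123456789012:function:source-crawler-status",
        "crawler_status",
        "StartGlueJob",
    )
    print(route(states["IsSourceCrawlerDone"], {"crawler_status": "RUNNING"}))
```

The same generator could be called three times (source crawler, Glue job, destination crawler) to assemble the full pipeline, with each loop's `next_state` pointing at the following stage.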

  6. Populate your workflow.

  7. Execute your workflow: start the state machine to run the Step Functions workflow. It can be triggered in the following ways:
    1. Manual/on-demand trigger (run manually from the console or CLI)
    2. Scheduled trigger (scheduled using a cron expression)
    3. Event-based trigger (started by an event or action)
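The first two trigger types can also be set up from code. The sketch below assumes boto3 clients are passed in (`boto3.client("stepfunctions")` and `boto3.client("events")`); the rule name, target ID, ARNs, and cron expression are placeholders. An event-based trigger would use a similar EventBridge rule with an event pattern instead of a schedule.

```python
import json


def start_etl(sfn, state_machine_arn, payload):
    """On-demand trigger: start one execution and return its ARN."""
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),  # the execution input must be a JSON string
    )
    return response["executionArn"]


def schedule_etl(events, rule_name, cron, state_machine_arn, role_arn):
    """Scheduled trigger: an EventBridge rule that starts the state machine."""
    events.put_rule(
        Name=rule_name,
        ScheduleExpression=f"cron({cron})",
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[
            {
                "Id": "etl-state-machine",  # hypothetical target ID
                "Arn": state_machine_arn,
                # Role that allows EventBridge to start the execution.
                "RoleArn": role_arn,
            }
        ],
    )


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   arn = "arn:aws:states:us-east-1:123456789012:stateMachine:etl-pipeline"
#   start_etl(boto3.client("stepfunctions"), arn, {"crawler_name": "source-crawler"})
#   schedule_etl(boto3.client("events"), "nightly-etl", "0 2 * * ? *", arn,
#                "arn:aws:iam::123456789012:role/eventbridge-start-etl")
```

In the example usage, the cron expression `0 2 * * ? *` would run the workflow daily at 02:00 UTC.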

Graph view execution:

Table view execution:

Event view execution:

Author Bio:

Aman Maheshwari

Data Engineering-Analytics

I am Aman Maheshwari, and I have been working as a Team Lead at Mouri Tech for the past 1.5 years. I have good experience in AWS, Big Data, SQL, Python, and Spark.
