1. Introduction
1.1 Kinesis Data Streams
Amazon Kinesis Data Streams is used to collect and process large streams of data records in real time. On top of it, we can build data-processing applications, known as Kinesis Data Streams applications.
A Kinesis Data Streams application reads data records from a data stream and pushes them to a target. These applications typically use the Kinesis Client Library (KCL), Amazon's consumer library, to process stream data, and they can run on Amazon EC2 instances.
2. EC2 Instance Creation
Step 1: In the AWS Management Console, go to EC2 and click Launch Instance.
Choose the Amazon Linux AMI 2018.03.0 because it comes with many AWS tools pre-installed.
Step 2: Choose the instance type.
Step 3: Launch the EC2 instance and download the .pem file (which contains the key pair), or use an existing one.
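For readers who prefer scripting over the console, the same launch can be done with boto3. The sketch below is illustrative only; the AMI ID and key-pair name are placeholders that must be replaced with real values for your account and region.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance; ImageId and KeyName are placeholders.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder Amazon Linux AMI ID
    InstanceType="t2.micro",
    KeyName="my-key-pair",             # placeholder key-pair name
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])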
2.1 PuTTY Configuration
Step 1: Convert the downloaded .pem file to a .ppk file using PuTTYgen.
Step 2: Launch PuTTY and load this .ppk file under Connection > SSH > Auth.
Step 3: Provide the EC2 host name and save the session for future reference.
Step 4: Click Open, and an SSH session to the EC2 Linux instance will open.
Attach an IAM role to the EC2 instance that grants the permissions it needs (here, access to Kinesis and S3).
Note: Attaching the AdministratorAccess policy is not good practice; prefer a least-privilege role, as sketched below.
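A minimal sketch of creating such a role with boto3 and attaching it to the instance; the role, profile, and instance names are hypothetical, and the two managed policies shown are one reasonable starting point for this walkthrough.

import json

import boto3

iam = boto3.client("iam")
ec2 = boto3.client("ec2")

# Trust policy allowing EC2 to assume the role.
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(RoleName="kpl-demo-role",  # hypothetical name
                AssumeRolePolicyDocument=json.dumps(trust))

# Grant only what this walkthrough needs: Kinesis and S3.
iam.attach_role_policy(RoleName="kpl-demo-role",
                       PolicyArn="arn:aws:iam::aws:policy/AmazonKinesisFullAccess")
iam.attach_role_policy(RoleName="kpl-demo-role",
                       PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess")

# Wrap the role in an instance profile and attach it to the instance.
iam.create_instance_profile(InstanceProfileName="kpl-demo-profile")
iam.add_role_to_instance_profile(InstanceProfileName="kpl-demo-profile",
                                 RoleName="kpl-demo-role")
ec2.associate_iam_instance_profile(
    IamInstanceProfile={"Name": "kpl-demo-profile"},
    InstanceId="i-0123456789abcdef0",      # placeholder instance ID
)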
3. Kinesis Producer Library
3.1 Data to Streams Using KPL-Python
An Amazon Kinesis Data Streams producer is an application that captures user data records from a source and puts them into a Kinesis data stream (also called data ingestion). The Kinesis Producer Library (KPL) simplifies producer application development and allows developers to achieve high write throughput to a Kinesis data stream.
Step 1: Data on the EC2 instance is captured by the KPL code and sent to a Kinesis stream. Files arriving in the "source" folder are read, pushed to the stream, copied to S3, and moved to an archive folder. The Python code runs on the EC2 instance in an infinite loop so that streaming is a continuous process (see the sketch below).
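One way this capture loop could look, assuming hypothetical folder and bucket names; the function takes the producer's send callable so it can be reused with any producer library:

import os
import shutil
import time

import boto3

SOURCE_DIR = "source"            # hypothetical input folder
ARCHIVE_DIR = "archive"          # hypothetical archive folder
BUCKET = "my-archive-bucket"     # hypothetical S3 bucket

s3 = boto3.client("s3")

def process_forever(send_record):
    """Poll the source folder forever; ship each file, then archive it."""
    while True:
        for name in os.listdir(SOURCE_DIR):
            path = os.path.join(SOURCE_DIR, name)
            with open(path, "rb") as f:
                data = f.read()
            send_record(data)                    # push to the Kinesis stream
            s3.upload_file(path, BUCKET, name)   # keep a copy in S3
            shutil.move(path, os.path.join(ARCHIVE_DIR, name))  # archive locally
        time.sleep(1)  # avoid a busy loop while the folder is empty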
Step 2: Python code
The Python code is developed using the Kinesis Producer Library to capture data continuously and write it to the Kinesis stream named in the code.
Note: The Kinesis stream must be created in AWS first, and its name must then be specified in the code.
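The stream can be created in the AWS console, or with a short boto3 call such as the sketch below; the stream name and shard count are illustrative.

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Create the stream and wait until it is ACTIVE before producing to it.
kinesis.create_stream(StreamName="my-data-stream", ShardCount=1)
kinesis.get_waiter("stream_exists").wait(StreamName="my-data-stream")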
Along with the producer library, the required JAR files are to be downloaded and placed alongside the Python code.
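A minimal producer sketch follows, based on the kinesis_producer package from the referenced repository; the configuration keys follow that project's README (verify against the repository), the values are illustrative, and process_forever is the capture loop sketched in Step 1.

from kinesis_producer import KinesisProducer

config = dict(
    aws_region="us-east-1",
    buffer_size_limit=100000,      # flush after this many buffered bytes...
    buffer_time_limit=0.2,         # ...or after this many seconds
    kinesis_concurrency=1,
    kinesis_max_retries=10,
    record_delimiter="\n",
    stream_name="my-data-stream",  # hypothetical; must already exist
)

producer = KinesisProducer(config=config)
try:
    process_forever(producer.send)  # capture loop from Step 1
finally:
    producer.close()
    producer.join()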
Now trigger the Python code, and data will be produced continuously to the Kinesis stream. This data can be consumed from the stream within its retention period (24 hours by default) using consumer libraries, and it can also be delivered to S3 by using a Kinesis Data Firehose delivery stream as a consumer.
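For illustration, records can also be read back directly with boto3, as in the sketch below (for production consumers the Kinesis Client Library is the usual choice); the stream name is the hypothetical one used above, and only the first shard is read.

import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
stream = "my-data-stream"

# Read from the first shard, starting at the oldest available record.
shard_id = kinesis.describe_stream(
    StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

while iterator:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        print(record["Data"])        # raw bytes of each record
    iterator = resp.get("NextShardIterator")
    time.sleep(1)                    # stay under per-shard read limits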
Data captured into S3: [screenshot]
- Purpose of the article – How to capture data using the Kinesis Producer Library.
- Intended audience – People working on AWS.
- References / Sources of the information referred – https://github.com/ludia/kinesis_producer
Contact for further details:
Rajya Lakshmi KUNA
Associate Trainee – Data Lakes & DWH – Analytics
MOURI Tech