
Transforming Natural Language Prompts into SQL Queries Using LLMs

Purpose of the article: To explain how Large Language Models (LLMs) can be used to convert natural language queries into SQL queries effectively, and to demonstrate the capabilities of LLMs in enhancing the process of querying databases.

Intended Audience: Data Analysts, Data Scientists, Database Administrators, Data Engineers, Software Engineers, and Developers.

Tools and Technologies: Python, Flask, SQLite, LLM, LangChain.

Keywords: NLP2SQL, Natural Language Understanding, Data Retrieval, Text-to-SQL, SQL Query Generation.

What is LLM?

LLM stands for Large Language Model. It refers to sophisticated Artificial Intelligence models trained on vast amounts of data. LLMs are pivotal in applications ranging from natural language understanding and generation to tasks like translation, summarization, and dialogue systems.

Examples of LLMs: OpenAI's GPT series, Google's BERT, Mistral, etc.

Why LLM?

LLMs stand out for their ability to handle complex natural language tasks with high accuracy. They enhance how users interact with applications, making them crucial for advancing AI-driven capabilities across diverse domains.

What is NLP to SQL?

NLP to SQL refers to the process of converting natural language queries into Structured Query Language (SQL) commands that can be executed on databases. This involves leveraging Natural Language Processing (NLP) techniques and often advanced models like Large Language Models (LLMs). NLP-to-SQL systems interpret the intent behind user queries, analyze the semantics of sentences, and generate corresponding SQL queries to retrieve relevant data from databases.

These systems aim to bridge the gap between human language understanding and database querying, enabling more intuitive and efficient interactions with data-driven applications and services.
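
For example, given the employee schema used later in this article, an NLP-to-SQL system might translate the question "List all employees in the Sales department" into a query such as the following (an illustration only; the exact SQL depends on the model and the prompt):

SELECT e.first_name || ' ' || e.last_name AS employee_name
FROM EMPLOYEES e
JOIN EMPLOYEE_DEPARTMENTS ed ON e.employee_id = ed.employee_id
JOIN DEPARTMENTS d ON ed.department_id = d.department_id
WHERE LOWER(d.department_name) = 'sales';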

Step-by-Step process of converting Natural Language to SQL queries:

  1. Install the required packages. Note that sqlite3 ships with Python's standard library, so it does not need to be installed separately via pip.

pip install flask

pip install pandas

pip install python-dotenv

pip install langchain-openai

pip install langchain

pip install langchain-community

  2. Import the packages, load the environment variables, and create the Flask app.

from flask import Flask, request
import sqlite3
import os
import pandas as pd
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_community.utilities import SQLDatabase

load_dotenv()
app = Flask(__name__)

  3. Passing user queries in natural language can be done via a frontend UI or, in our case, through the Postman API platform. (An equivalent request sent from Python is shown after the code below.)

# Getting the user query from the frontend UI (or an API client such as Postman)
@app.route("/v2/query", methods=['POST'])
def predict():
    data = request.json
    user_input = data.get('query')
    response = get_results(user_input)
    return response
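
For quick testing without a frontend, the same request can be sent from Python. This is a minimal sketch; it assumes the server described in this article is already running locally on port 9001.

import requests

# Post a natural-language question to the /v2/query endpoint
resp = requests.post(
    "http://127.0.0.1:9001/v2/query",
    json={"query": "Department wise total Salary?"},
)
print(resp.text)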
  4. Generating a prompt and passing it to the LLM for SQL query generation, based on the user's specifications and the provided table details. Once the SQL query is generated, it is executed at the database level to fetch the relevant records.

# Generating a prompt and passing it to the LLM for SQL query generation
def get_results(user_query: str) -> str:
    try:
        prompt = f'''Based on provided table and column details in an SQLite database, generate an SQL query to fetch records as per user specifications.
        Use the pipe symbol '||' with a single space for concatenating columns, avoiding the built-in CONCAT function. Employ joins as needed to link related tables. Provide the SQL query ending with ';' for execution. Ensure all string literals used in WHERE operations are converted to upper case or lower case.

        "EMPLOYEES": "employee_id", "first_name", "last_name", "email", "phone_number", "hire_date", "job_id", "salary", "manager_id",
        "DEPARTMENTS": "department_id", "department_name", "location",
        "JOBS": "job_id", "job_title", "min_salary", "max_salary", "department_id",
        "EMPLOYEE_DEPARTMENTS": "employee_id", "department_id"

        Given User Query: {user_query}'''

        sql_query = get_llm(prompt)
        print("SQL Query is:", sql_query)
        response = run_sql_query(sql_query)
        print("SQL Query Data Results ->", response)
        json_str = response.to_json(orient='records')
        return json_str
    except Exception as e:
        # Handle any errors that occur during query generation
        return f"Error: {str(e)}"
  5. Connecting to the deployed LLM model and passing it the prompt; the model generates the SQL query and returns it.

os.environ["AZURE_OPENAI_API_KEY"] = os.getenv("AZURE_OPENAI_API_KEY")
os.environ["AZURE_OPENAI_ENDPOINT"] = os.getenv("AZURE_OPENAI_ENDPOINT")

# Connecting to the deployed LLM model to generate an SQL query from the user prompt
def get_llm(prompt):
    llm = AzureChatOpenAI(
        azure_deployment="gpt-35-turbo",
        api_version="2024-02-01",
        temperature=0,
        max_tokens=None,
        timeout=None,
        max_retries=2,
        # other params...
    )

    messages = [
        (
            "system",
            "You are a helpful assistant that understands the user prompt and generates an accurate SQL query."
        ),
        ("human", prompt),
    ]
    ai_msg = llm.invoke(messages)
    return ai_msg.content
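
The two credentials above are read from a local .env file by load_dotenv(). A minimal .env could look like the following; the variable names come from the code above, and the values are placeholders to replace with your own:

AZURE_OPENAI_API_KEY=<your-azure-openai-api-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource-name>.openai.azure.com/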
  6. Executing the generated SQL query on the database to retrieve the relevant records.

# Connecting to the database and fetching the results
def run_sql_query(query):
    try:
        db = SQLDatabase.from_uri("sqlite:///employee.db")
        result = db.run(query, fetch="cursor")
        result_map = list(result.mappings())
        df = pd.DataFrame(result_map)
        return df
    except Exception as e:
        return {"Error": "Failed to connect with database", "Error code": 400}
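
To run the example end to end, an employee.db file containing the tables referenced in the prompt must exist. The sketch below creates and seeds a minimal version of that database; the column types and sample rows are assumptions for illustration only.

import sqlite3

# Create a small employee.db matching the tables named in the prompt
conn = sqlite3.connect("employee.db")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE IF NOT EXISTS EMPLOYEES (
    employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
    email TEXT, phone_number TEXT, hire_date TEXT, job_id INTEGER,
    salary REAL, manager_id INTEGER);
CREATE TABLE IF NOT EXISTS DEPARTMENTS (
    department_id INTEGER PRIMARY KEY, department_name TEXT, location TEXT);
CREATE TABLE IF NOT EXISTS JOBS (
    job_id INTEGER PRIMARY KEY, job_title TEXT, min_salary REAL,
    max_salary REAL, department_id INTEGER);
CREATE TABLE IF NOT EXISTS EMPLOYEE_DEPARTMENTS (
    employee_id INTEGER, department_id INTEGER);
""")
cur.execute("INSERT INTO DEPARTMENTS VALUES (1, 'Sales', 'New York')")
cur.execute("INSERT INTO JOBS VALUES (1, 'Sales Executive', 40000, 90000, 1)")
cur.execute("INSERT INTO EMPLOYEES VALUES (1, 'Jane', 'Doe', 'jane@example.com', '555-0100', '2021-01-15', 1, 60000, NULL)")
cur.execute("INSERT INTO EMPLOYEE_DEPARTMENTS VALUES (1, 1)")
conn.commit()
conn.close()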
  7. Initializing the server on port 9001.

if __name__ == '__main__':
    print("Server Initializing")
    print("Server Started....")
    app.run(host="0.0.0.0", debug=True, port=9001)

Output:

Server Started on port 9001

http://127.0.0.1:9001

Passing the user query using the Postman API platform.

Query: Department wise total Salary?
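
For this query, the SQL generated by the LLM would be expected to look roughly like the following; this is an illustration derived from the schema in the prompt, not captured output:

SELECT d.department_name, SUM(e.salary) AS total_salary
FROM EMPLOYEES e
JOIN EMPLOYEE_DEPARTMENTS ed ON e.employee_id = ed.employee_id
JOIN DEPARTMENTS d ON ed.department_id = d.department_id
GROUP BY d.department_name;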

Author Bio:


Vinod Kumar Reddy KOMMA

Analyst - Advanced Analytics

I'm Vinod Kumar Reddy KOMMA, and I've been an Analyst in Advanced Analytics at MOURI Tech for three years. I have solid experience with Python, SQL, Azure Cloud, and the basics of AI & ML.
