OpenAI Codex tutorial: Natural language to SQL query with OpenAI Codex

Wednesday, October 26, 2022 by ezzcodeezzlife

Unraveling the OpenAI Codex

OpenAI Codex, a marvel in the realm of artificial intelligence, is a brainchild of OpenAI. This AI model is a wizard at interpreting natural language and responding with generated code. It's the powerhouse behind GitHub Copilot, your virtual programming assistant. Codex is an offshoot of the renowned GPT-3 model by OpenAI, specifically fine-tuned for coding applications. OpenAI has unveiled an API for Codex, currently in closed beta.

Dive into the world of Codex via the OpenAI playground, your sandbox for AI exploration.

Embracing the GPT-3 Tutorial Concept

Imagine transforming natural language into a SQL query. That's the magic of GPT-3. Let's say you need to fetch all users above 25 years old. You simply state:

Fetch all users above 25 years old

And voila! The model generates the corresponding SQL query:

SELECT * FROM users WHERE age > 25

This capability allows SQL query generation without any SQL syntax knowledge. It's a game-changer for non-technical individuals needing to query a database. Dive into our GPT-3 tutorial and start your GPT online journey today!

Let's get started!

Make sure you have access to Codex. If you don't, you can join the Codex waitlist here: https://openai.com/blog/openai-codex/ (getting accepted usually takes a few days). Before we write any code, you can do some testing in the OpenAI playground: https://beta.openai.com/playground. I really recommend experimenting there first, because it will help you understand how the model works.

First, we need to install the OpenAI Python library:

!pip install openai

You can find the full documentation of this library here: https://github.com/openai/openai-python

Then, we need to import the library and set the API key:

import openai

openai.api_key = "YOUR_API_KEY"

Now, we can start writing some code. We will start by creating a function that generates a SQL query from natural language. We will use the openai.Completion.create function to generate the SQL query, passing the following parameters:

  • engine: The engine that will be used to generate the SQL query. We will use code-davinci-002 for this tutorial.
  • prompt: The natural language that will be used to generate the SQL query.
  • max_tokens: The maximum number of tokens that will be generated.
  • temperature: The temperature of the model. The higher the temperature, the more random the text will be. Lower temperature results in more predictable text.
  • top_p: The cumulative probability cutoff for nucleus (top-p) sampling. 1.0 means no restriction; lower values limit the model to the most likely tokens, making completions more focused.
  • frequency_penalty: Penalizes tokens in proportion to how often they have already appeared in the output, reducing verbatim repetition.
  • presence_penalty: Penalizes tokens that have already appeared at all, encouraging the model to introduce new tokens.
  • stop: The sequence of tokens that will stop the generation.

def generate_sql_query(prompt, max_tokens=100, temperature=0.5, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, stop=["\n"]):
    # Send the natural-language prompt to Codex and return the completion text.
    response = openai.Completion.create(
        engine="code-davinci-002",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        frequency_penalty=frequency_penalty,
        presence_penalty=presence_penalty,
        stop=stop  # stop at the first newline so we get back a single query
    )
    return response.choices[0].text

Now, we can test our function. We will use the following prompt:

Get all the users that are older than 25 years old

generate_sql_query("Get all the users that are older than 25 years old")

The model will generate the following SQL query:

SELECT * FROM users WHERE age > 25

Now we put it all together in one file and add input via the console.

import openai

openai.api_key = "YOUR_API_KEY"

def generate_sql_query(prompt, max_tokens=100, temperature=0.5, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0, stop=["\n"]):
    response = openai.Completion.create(
        engine="code-davinci-002",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        frequency_penalty=frequency_penalty,
        presence_penalty=presence_penalty,
        stop=stop
    )
    return response.choices[0].text

prompt = input("Enter your prompt - What data do you need?: ")
print(generate_sql_query(prompt))

Now run the file, enter your prompt, and get back the SQL query.
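Assuming the script is saved as nl_to_sql.py (the file name is just an example), a run might look roughly like this; the exact completion can vary between runs:

$ python nl_to_sql.py
Enter your prompt - What data do you need?: Get all the users that are older than 25 years old
SELECT * FROM users WHERE age > 25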

Wrapping Up the GPT-3 Tutorial Journey

In this enlightening journey, we've mastered the art of generating SQL queries from natural language using OpenAI Codex. We've also delved into the practical use of the OpenAI Python library.

The road to improvement is endless. You can enhance this project by integrating a database and a user-friendly web interface. To guide the model more accurately, consider adding more prompts to the request.

For precision, you could include your database schema in the prompt. For instance:

USER: ID (INT), NAME (VARCHAR), AGE (INT)
POST: ID (INT), TITLE (VARCHAR), BODY (VARCHAR), USER_ID (INT)
COMMENT: ID (INT), BODY (VARCHAR), POST_ID (INT), USER_ID (INT)
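One way to wire this up, sketched below under the assumption that you reuse the generate_sql_query function from above, is to prepend the schema to the user's request before sending it to the model. The build_prompt helper and the exact prompt wording here are illustrative rather than part of any official API, so adapt them to your own tables:

schema = """USER: ID (INT), NAME (VARCHAR), AGE (INT)
POST: ID (INT), TITLE (VARCHAR), BODY (VARCHAR), USER_ID (INT)
COMMENT: ID (INT), BODY (VARCHAR), POST_ID (INT), USER_ID (INT)"""

def build_prompt(request):
    # Show the model the table definitions first, then the request, and end
    # with "SQL query:" so the completion starts directly with the query.
    return f"### Database schema\n{schema}\n### Request\n{request}\nSQL query:"

print(generate_sql_query(build_prompt("Get the titles of all posts written by users older than 25")))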

And why not give what you've learned a spin during our AI Hackathons and build a GPT-3 app in 1, 2, or just 7 days!

Thank you for reading. If you enjoyed this tutorial, you can find more and continue reading on our tutorial page. - Fabian Stehle, Junior Data Scientist at New Native