Photo courtesy: pexels.com

SQL is typically a go-to tool when you need to get a quick look at some data and draw preliminary conclusions. However the data can be of various shapes and forms, stored across different tables, and not really ‘relational’. Additionally, if one is doing data science in python, they would want to be able to read data and write back their predictions and forecasts to DB quickly.

Until recently I had never come across an opportunity to use SQL in Python. When I started looking around for options, I found many — Sqlite, MySQL, SQLAlchemy.

In this article, I will give an overview how we can leverage SQLAlchemy and Pandas to perform database queries.

Step 1: Importing SQLAlchemy and Pandas

Lets start with importing the sqlalchemy library. We will be using the create_engine feature from the library.

!pip install sqlalchemy 
from sqlalchemy import create_engine

We also import pandas, a python library built for data analysis and manipulation

import pandas

Step 2: Creating a SQL engine

We create a SQL engine using the command which creates a new class ‘.engine’.

engine = create_engine(*args)

The argument is a string which indicates database dialect and connection arguments in the form of a url. It is typically what you would write in the SQL engine to connect to a DB.

dialect[+driver]://+ dsn_uid + ':' + dsn_pwd + '@'+dsn_hostname+':'+dsn_port+'/' + dsn_database

Here dialect is the database name such as mysql, oracle, postgresql etc. Each database has a corresponding DBAPI wrapper. All dialects require that an appropriate DBAPI driver is installed.

create_engine() builds a secure connection with DB so you can read and write to it.

Step 3 — Running queries using SQL statements
Coming to the interesting part in the blog, lets go through the steps one can take to submit data queries using sqlalchemy.
To submit data queries, the following steps are followed:
a.Wrap your SQL statements in a container

b. Send it to the database,

c. Receive the response back,

d. Put the response in a pandas dataframe.

Like any SQL query, the two primary clauses that must be present in every query here are SELECT, and FROM.

· SELECT allows you to select a subset of columns (or all of them) from a table,

· FROM specifies which table the column(s) are being pulled from.

For example, to return all columns from a ‘table1’, you can do the following:

sql = "SELECT * FROM table1 "
df = pd.read_sql_query(sql, engine)
df.head()

If you are dealing with multiple tables, you may need to specify which column from which table because the columns of interest may come from different tables in the database.

SELECT table1.column1, table1.column2
FROM table1

If you want to get certain rows from a column, you can use this query

SELECT DISTINCT column1
FROM table1

Step 4 — Writing to DB

Writing to DB in python using SQLAlchemy is similar to what you would do in a SQL environment.

Once you create_engine, and receive data, you can use to_sql to write to DB. Here the data should be placed inside a dataframe.

from sqlalchemy import create_engine
engine = create_engine(*args)

Now create a table with some rows

df = pd.DataFrame({‘name’ : [‘T1’, T2', T3']})
print(df)
>>> name
0 T1
1 T2
2 T3

Using to_sql we can write to the DB

df.to_sql(tableT, con=engine, if_exists=’append’)

Check if data was written in the table correctly

engine.execute(“SELECT * FROM tableT “).fetchall()
[(0, ‘T1’), (1, ‘T2’), (2, ‘T3’)]

One can append more rows to the table, or replace the rows with a new dataframe. As follows

df2 = pd.DataFrame({'name' : ['T6', 'T7']})
df2.to_sql(' tableT ', con=engine, if_exists='replace')

Step 5— Creating a Table in DB

To create a table in DB from python, we make use of Metadata. Metadata is a collection of Table objects and their associated schema constructs.

from sqlalchemy import MetaData
meta = MetaData()

Next, we define our table using the Table construct, which resembles regular SQL CREATE statement.

SQLAlchemy matches Python data to the best possible generic column data types defined in it. Some of the generic data types are −

  • Boolean
  • Date
  • DateTime
  • Float
  • Integer
  • Numeric
  • String
  • Text
  • Time

Lets take an example and create a students table

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
engine = create_engine(*args)
meta = MetaData()
students = Table('students', meta,
           Column('id', Integer, primary_key = True),
           Column('name', String),
           Column('lastname', String))
meta.create_all(engine)

Here create_all() function uses the engine object to create all the defined table objects and stores the information in metadata.

Questions or feedback? I’d love to hear from you — please feel free to leave out a comment, or connect with me on Twitter/Linkedin.

This is a primer to get started with SQL queries in Python. For more advanced sql queries, please refer to:

https://www.tutorialspoint.com/sqlalchemy/sqlalchemy_core_using_set_operations.htm

https://towardsdatascience.com/sql-in-python-for-beginners-b9a4f9293ecf


SQL queries in Python was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.