Servicios
Servicios
Sobre nosotros
Sobre nosotros
Casos de éxito
Casos de éxito
Blog
Blog
Empleo
Empleo
es
en
Contacto
Contacto
Blog

How to connect to ClickHouse from SQLAlchemy

Tutoriales
How to connect to ClickHouse from SQLAlchemy
¡Enlace copiado!

ClickHouse is one of the fastest opensource databases in the market and it claims to be faster than Spark. At WhiteBox we’ve tested this hypothesis with a +2 billion rows table and we can assure you it is! Our tests performed 3x faster for a complex aggregation with several filters.

Regarding this tutorial, all code and steps in this post have been tested in May 2021 and Ubuntu 20.04 OS, so please don’t be evil and don’t complain if the code does not work in September 2025 😅.

Requirements

The requirements for this integration are the following:

  • ClickHouse server: It can be installed quite easily following the official documentation. Current version (21.4.5.46).
  • Python libraries:
    • SQLAlchemy: It can be installed using pip install SQLAlchemy==1.3.24
    • clickhouse-sqlalchemy: It can be installed using pip install clickhouse-sqlalchemy==0.1.6. There is another library “sqlalchemy-clickhouse”, but it does not support most of SQLAlchemy magic.

Setup

ClickHouse installation

This tutorial can be tested against any ClickHouse database. However, in order to get a local ClickHouse database to test the integration, it can be easily installed following the steps below:

sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4

echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee \
   /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

sudo apt-get install -y clickhouse-server clickhouse-client

sudo service clickhouse-server start

Running clickhouse-client on the shell ensure you that your ClickHouse installation is properly working. Besides, it can help you debug the SQLAlchemy DDL.

Python environment

These are the Python libraries that are required to run the all the code in this tutorial:

pip install SQLAlchemy==1.3.24
pip install clickhouse-sqlalchemy==0.1.6

Integration

SQLAlchemy setup

The following lines of code perform the SQLAlchemy standard connection:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

conn_str = 'clickhouse://default:@localhost/default'

engine = create_engine(conn_str)
session = sessionmaker(bind=engine)()

DDL

Create a new database

from sqlalchemy import DDL

database = 'test'

engine.execute(DDL(f'CREATE DATABASE IF NOT EXISTS {database}'))

It is possible to test the current databases in ClickHouse from the command line connection using the sentence SHOW DATABASES. The following output should display on screen:

46a4c3ee-7053-464e-b965-d9884b9c0fb7.png

Create a new table

The following steps show how to create a MergeTree engine table in ClickHouse using the SQLAlchemy ORM model.

ORM model definition

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Date

from clickhouse_sqlalchemy import engines

Base = declarative_base()

class NewTable(Base):
   __tablename__ = 'new_table'
   __table_args__ = (
       engines.MergeTree(order_by=['id']),
       {'schema': database},
   )

   id = Column(Integer, primary_key=True)
   var1 = Column(String)
   var2 = Column(Date)

DDL

NewTable().__table__.create(engine)

A new table should appear in the new database:

9e5c1f5c-d4a9-472b-ae04-ca5639aae9a8.png

SQL

INSERT

from datetime import date

for i in range(1000):
   row = NewTable(id=i, var1=f'test_str_{i}', var2=date(2021, 5, 3))
   session.add(row)

session.commit()

SELECT

session.query(NewTable).filter(NewTable.id >= 500).first().id
session.query(NewTable).filter_by(var1='test_str_2').first().id

Conclusions

Should ClickHouse replace traditional databases like Postgres, MySQL, Oracle? Definitively no. These databases have a lot of features that ClickHouse doesn’t currently have nor it is intended to have in the future (primary key basic concepts, unique columns…). It can be considered an analytics database but not a fully functioning transactional one.

However, ClickHouse speed is so amazing that it should be definitively the GOTO when there is a huge amount of tabular data.

Artículos relacionados
"Do as you're told": How to make LLMs behave like you want
Tutoriales

"Do as you're told": How to make LLMs behave like you want

Discover how to navigate the complexities of using LLMs as components in interconnected systems, ensuring smooth operations and reliable outputs.
Quantum Machine Learning: from Zero to Hero
Tutoriales

Quantum Machine Learning: from Zero to Hero

Learn how to build a Quantum Machine Learning pipeline along with some Quantum ML vs Classical ML benchmarking.
The definitive guide to Python virtual environments with conda
Tutoriales

The definitive guide to Python virtual environments with conda

Master Python virtual environments with conda, once and for all.
Lidera el cambio ·
Lidera el cambio ·
Lidera el cambio ·
Contáctanos
Sobre WhiteBox
ServiciosEmpleoSobre nosotrosBlog
Contáctanos
ContactoLinkedInTwitterCómo llegar
WhiteBox

Cubrimos las necesidades de los proyectos de Inteligencia Artificial de principio a fin.

es
en
LegalCookiesPrivacidadCalidad