Breaking tests if uncommitted DB transactions are found in the Flask-SQLAlchemy project

Sep 07, 2023

"Testing leads to failure, and failure leads to understanding." - Burt Rutan

Hello, I’m Satyam Jadhav, I work as a Senior Member of Technical Staff at Fyle.

I started using the SQLAlchemy ORM for the first time when I joined Fyle. Having come from a Django background, my experience had been limited to using the Django ORM until then.

After working for 2 months I found that there is one major difference between Django ORM and SQLAlchemy ORM. Django ORM is not transaction first, it always uses atomic statements like insert, update, delete etc. by default when making DB queries. On the other hand, SQLAlchemy is transaction first, it wraps every operation in the DB transaction. Django uses transactions only when we explicitly ask it to do so.

Due to this behavior of SQLAlchemy, I encountered a problem in writing unit test cases in Flask-SQLAlchemy project. I wanted to roll back all the DB changes made during a particular test so that other tests do not get impacted. Doing this thing in Django is easy as it is not transaction first, pytest-django provides django_db mark which runs db in transaction mode and rolls back at the end of the test. Since SQLAlchemy is a transaction first, a similar type of solution does not work.

There is a solution on the internet for SQLAlchemy where commit calls are patched to not persist transactions, Transactions are rolled back at the end of the test. Patching commits calls creates a new problem. If the developer misses to write a commit call in code, the test for the code won’t fail. Ideally, we want that test code to catch such misses and result in failure.

To solve this problem I extended the solution on the internet. The idea is at the end of the test check if there are any uncommitted transactions, if found break the test so that developer becomes aware that he forgot to commit the transaction. Before getting into implementation let’s understand a few concepts of SQLAlchemy.

SQLAlchemy

As mentioned earlier SQLAlchemy is transactions first, It always creates a transaction to execute DB queries. Following python level objects are used to manage transactions in SQLAlchemy.

Session

An SQLAlchemy Session object is bound to a database connection. Whenever the query is made a new database transaction is created using a database connection. This transaction remains in progress until the session is rolled back, committed, or closed. The session will begin a new transaction if it is used again.

References:

https://docs.sqlalchemy.org/en/20/orm/session_basics.html#what-does-the-session-do https://docs.sqlalchemy.org/en/20/orm/session_basics.html#session-faq-whentocreate

Scoped Session

A scoped session returns the same session object each time it is initialized within a specific scope. In our case, we create a scoped session for the function-level test scope. This means that a single session object is created at the start of the test and used throughout its execution.

References:

https://docs.sqlalchemy.org/en/20/orm/contextual.html#contextual-thread-local-sessions

Outside Session Scope Transaction

If we begin a transaction on a database connection directly and then bind it to the session then such a transaction is considered an outside session scope transaction. This type of scoping allows SQLAlchemy to implement nested transactions using different session objects sharing the same database connection.

References: https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.params.join_transaction_mode

Patching commit calls

We want to patch commit calls made during test execution so that they are not persisted and we can roll back them at the end of the test. This helps to keep unit tests independent of each other.

We can create a session with ‘outside session scope transaction’ to achieve this. When the commit method is called on the Session object with ‘outside session scope transaction’ then the default behavior is to not commit the transaction if there is no savepoint in a transaction.

In the below snippet, we are creating a Session object having ‘outside session scope transaction’ and then running a test. After running the test we can roll back the ‘outside session scope transaction’.

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

# Create a database object
app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://%s:%s@%s:%s/%s"
db = SQLAlchemy(session_options={"autocommit": False, "autoflush": True})
db.init_app(app)

# Create a connection and start a transaction (before binding connection to session)
connection = db.engine.connect()
transaction = connection.begin()

# Create a scoped session and bind a connection to it
session = scoped_session(session_factory=sessionmaker(bind=connection))

# Set session on database object so that application code uses this session object
db.session = session

# Run test
test_function()

# Roll back transaction
transaction.rollback()

Detecting uncommitted transactions

Since we force the session to operate in 'rollback_only' mode, .commit() calls are not verified. If a developer forgets to write a .commit() call after making database modifications, the unit test that asserts those modifications will succeed without any errors, thus defeating the purpose of writing unit test cases.

To solve this problem, we can verify that at the end of the test, there are no uncommitted changes. We have two types of uncommitted changes.

Not flushed changes:

Database objects are modified but those are not flushed, these changes could easily be identified by inspecting the Session object.

if not (session.new or session.deleted or session.dirty):
    # No 'not flushed' changes

Uncommitted changes:

Database objects are modified, and changes are flushed but not committed, To identify such changes we need to track if flush calls are followed by commit call or not. To track the result we will be using a simple data class called BooleanState.

class BooleanState:
    """
    Data class to store a boolean state or flag. Useful when tracking a      single/global state that
    can be modified from multiple places.

    Parameters
    ----------
    name: str
        Human readable name for the state.
    initial_state: bool
        Initial value of the state.
    """

    def __init__(self, name: str, initial_state: bool):
        self.name = name
        self._state = initial_state

    def __repr__(self) -> str:
        return f"BooleanState<{self.name}: {self._state}>"

    def __bool__(self) -> bool:
        return self._state

    def mark_true(self) -> None:
        self._state = True

    def mark_false(self) -> None:
        self._state = False

To track flush and commit calls we can listen to events generated by SQLAlchemy.

all_flushed_changes_are_committed = BooleanState(
    "All flushed changes are committed", True
)
@event.listens_for(session, "after_flush_postexec")
def un_mark_all_flushed_changes_are_committed(session, flush_context):
    all_flushed_changes_are_committed.mark_false()

@event.listens_for(session, "after_commit")
def mark_all_flushed_changes_are_committed(session):
    all_flushed_changes_are_committed.mark_true()

Seeing everything in action

Now that we have all pieces ready, let’s stitch them together! We can create a pytest fixture called db_session which we can use in tests. This will take care of roll backing db modifications done during the test and also fail the test if there are any uncommitted transactions found. At the end of the test, we also remove the session object so that for every test new session object is created with ‘outside scope session transaction’.

@pytest.fixture(scope="function")
def db_session(db):
    """
    Fixture that sets up a scoped SQLAlchemy session and wraps it in a transaction for test
    execution with automatic rollback of changes.    
    """
    connection = db.engine.connect()
    transaction = connection.begin()
    session = scoped_session(session_factory=sessionmaker(bind=connection))
    db.session = session
    all_flushed_changes_are_committed = BooleanState(
        "All flushed changes are committed", True
    )

    @event.listens_for(session, "after_flush_postexec")
    def un_mark_all_flushed_changes_are_committed(session, flush_context):
        all_flushed_changes_are_committed.mark_false()

    @event.listens_for(session, "after_commit")
    def mark_all_flushed_changes_are_committed(session):
        all_flushed_changes_are_committed.mark_true()

    yield session

    try:
        assert not (session.new or session.deleted or session.dirty)
        assert all_flushed_changes_are_committed
    except AssertionError as e:
        # Rollback transaction for other tests to work, otherwise we run into a deadlock
        transaction.rollback()
        session.remove()
        logger.error(
            "Uncommitted instance states found, did you forget to call db.session.commit()?"
        )
        raise e
    transaction.rollback()
    session.remove()

Why not use unittest.mock to patch commit calls?

If we use unittest.mock to patch commit calls then SQLAlchemy events won’t work and we won’t be able to detect uncommitted transactions.

Conclusion

Roll backing DB changes in unit tests are not straightforward in SQLAlchemy. To roll back changes we need to run nested transaction so that the inner transaction’s commits gets patched and we can roll back the outer transaction at the end of unit tests.

To detect uncommitted changes in API code we can use SQLAlchemy session object state to know if there are changes pending for flush and SQLAlchemy events to know if flush changes are committed or not.

Fyle Stories

Discussion about this post