Databases — Part Two

Sanket Saxena
15 min read · Jul 10, 2023


Databases lie at the heart of virtually all modern applications, ensuring that data is stored, retrieved, and manipulated efficiently and reliably. One of the key principles guiding database design and operation is the ACID properties (Atomicity, Consistency, Isolation, and Durability). Furthermore, understanding read phenomena (Dirty Reads, Non-Repeatable Reads, Phantom Reads, and Lost Updates) is crucial in building robust applications. Let’s dive into these concepts.

Read Phenomena

Read phenomena relate to potential issues that can arise when multiple transactions operate concurrently on a database.

Dirty Read: This occurs when a transaction reads uncommitted changes made by another transaction. For instance, if Transaction A modifies a value, but hasn’t yet committed those changes, Transaction B reads this uncommitted data. If Transaction A rolls back its changes, Transaction B has read “dirty” data that no longer exists.

Non-Repeatable Read: A non-repeatable read happens when a single transaction reads the same row twice but gets different data each time. This is because another transaction has modified that row in the interim.

Phantom Read: This is a phenomenon where a transaction reads records with a particular condition, then another transaction inserts or modifies records that match the condition, and when the first transaction repeats its read, it gets a different set of records.

Lost Updates: This occurs when two transactions both read and update a record concurrently. One of the transactions’ updates is lost because the second overwrites it.
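
To make the lost-update case concrete, here is an illustrative sketch assuming a hypothetical accounts table, with the interleaving of two sessions shown in comments.

-- Account 1 starts with balance = 100.
-- Session A
BEGIN;
SELECT balance FROM accounts WHERE id = 1;              -- reads 100

-- Session B
BEGIN;
SELECT balance FROM accounts WHERE id = 1;              -- also reads 100
UPDATE accounts SET balance = 100 - 20 WHERE id = 1;
COMMIT;                                                 -- balance is now 80

-- Session A (continues, unaware of B's change)
UPDATE accounts SET balance = 100 + 10 WHERE id = 1;
COMMIT;                                                 -- balance is now 110; B's update is lost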

ACID

The ACID properties ensure reliable processing in a database system.

Atomicity: This property ensures that a transaction is treated as a single, indivisible unit, which either succeeds completely or fails completely. If any part of a transaction fails, the entire transaction fails, and the database state is left unchanged.

Consistency: This property ensures that a transaction brings a database from one valid state to another, maintaining database rules and integrity constraints.

Isolation: This property ensures that the concurrent execution of transactions leaves the database in the same state as if the transactions were executed sequentially. Isolation is commonly expressed through different “isolation levels”; a short example of selecting one follows the list below.

  • Read uncommitted: The lowest level of isolation. Transactions may read uncommitted changes made by others (dirty read), leading to inconsistencies.
  • Read committed: Guarantees that any data read is committed at the moment it is read. It can prevent dirty reads but not non-repeatable or phantom reads.
  • Repeatable read: Ensures that if a transaction reads data once and then again, it will see the same data (preventing non-repeatable reads). However, it does not prevent phantom reads.
  • Snapshot: Creates a consistent snapshot of the data at the start of a transaction. It allows high concurrency while preventing dirty reads, non-repeatable reads, and phantom reads.
  • Serializable: The highest level of isolation. It ensures total isolation of the transaction from others. It’s as if the transactions were executed one after the other.
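
In PostgreSQL, for example, the isolation level can be chosen per transaction; a minimal sketch (the orders table here is hypothetical):

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Every query in this transaction now sees a single snapshot of the data,
-- taken when the first query runs.
SELECT count(*) FROM orders;
COMMIT;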

Durability: This property guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure.

The most commonly used techniques to ensure durability include Write-Ahead Logging (WAL), Append-Only File (AOF), and asynchronous snapshotting. WAL writes changes to a log before they are applied to the database files. AOF keeps a log of every write operation received by the database, which is replayed at system startup to reconstruct the database state. Asynchronous snapshotting takes regular snapshots of the database and saves them to disk.

Locking and Concurrency Control

Concurrency control and locking mechanisms are a vital part of implementing the Isolation property of ACID. Here are some common types of locks; a short PostgreSQL sketch of explicit row locks follows the list:

  • Exclusive lock (Write lock): If a transaction has an exclusive lock on a data item, no other transaction can read or write to that data item.
  • Shared lock (Read lock): Multiple transactions can have a shared lock on the same data item, allowing them to read the item. However, no transaction can write to it.
  • Intention lock: A lock on a table indicating that a transaction intends to lock a row in that table in the future. It’s used for managing hierarchical locking efficiently.
  • Gap lock: A lock on a range of index records preventing phantom reads by blocking new insertions in the range.
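
For illustration, PostgreSQL can take shared and exclusive row locks explicitly (again using the hypothetical accounts table). The table-level lock modes it acquires automatically when rows are locked, such as ROW SHARE and ROW EXCLUSIVE, play the role of intention locks:

BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR SHARE;    -- shared (read) lock on the row
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;   -- exclusive (write) lock on the row
COMMIT;                                           -- both row locks are released here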

Pessimistic Locking

Pessimistic Locking is a strategy that assumes conflict is likely to occur. When a transaction begins, it locks the data it will work with. Other transactions cannot access this data until the lock is released, either when the transaction is completed or rolled back.

The primary benefit of Pessimistic Locking is that it prevents conflicts by not allowing any other transaction to access the data that is being modified. This ensures the safety of the transaction.

However, the downside is that it may significantly impact performance, especially in scenarios where read operations are far more common than write operations. Long-running transactions can hold locks for a long duration, causing other transactions to wait and leading to potential bottlenecks in your system.

To implement Pessimistic Locking, you would typically use a lock manager in your database. When a transaction wants to modify a piece of data, it asks the lock manager for an exclusive lock. If no other transaction holds a lock for that data, the lock is granted, and the data can be modified.
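
In PostgreSQL, the usual way to express this is SELECT … FOR UPDATE, which takes an exclusive row lock up front. A minimal sketch, again assuming a hypothetical accounts table:

BEGIN;
-- Lock the row before reading the value we intend to modify;
-- any other transaction trying to lock or update it will wait.
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 20 WHERE id = 1;
COMMIT;   -- the lock is released here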

Optimistic Locking

Optimistic Locking, on the other hand, assumes that conflict is unlikely to happen. When a transaction begins, it does not lock the data. Instead, at the time of commit, the transaction verifies if the data it read and changed has not been altered by other transactions in the meantime.

The advantage of Optimistic Locking is its high level of concurrency, which can result in higher throughput in scenarios where conflict is unlikely, such as in predominantly read-based applications.

The downside is that if a conflict does occur, the transaction will need to be restarted, which can result in overhead if conflicts are frequent.

To implement Optimistic Locking, versioning is commonly used. Each data record has a version number, and when a transaction reads data, it also reads the version number. When the transaction wants to commit, it checks whether the current version number in the database matches the one it read. If they match, no other transaction has changed the data, and the commit can proceed.
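
A minimal version-column sketch, assuming a hypothetical products table that carries a version column:

-- Read the row along with its current version.
SELECT id, price, version FROM products WHERE id = 42;   -- suppose this returns version = 7

-- Commit the change only if nobody else has touched the row in the meantime.
UPDATE products
SET price = 9.99, version = version + 1
WHERE id = 42 AND version = 7;
-- If 0 rows were updated, another transaction won the race:
-- the application re-reads the row and retries.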

Two-Phase Locking (2PL)

A system without 2PL might allow locks to be acquired and released in any order, without any particular structure. This means a transaction could acquire a lock, do some work, release the lock, acquire another lock, do some more work, release the second lock, and then reacquire the first lock if it needs to do more work on the first set of data.

In contrast, 2PL imposes a clear structure on when locks can be acquired and released:

  • In the first phase, the transaction can only acquire locks. This phase continues until the transaction has acquired all the locks it needs.
  • Once the transaction releases its first lock, it enters the second phase. In this phase, it can only release locks. It cannot acquire any new locks.

This structure has the important property that, if all transactions obey 2PL, any schedule (sequence of operations from multiple transactions) that the system produces will be serializable, which means it’s equivalent to some schedule in which the transactions run one at a time, in some order.

Deadlocks

A deadlock is a state where two or more transactions are unable to proceed because each is waiting for the other to release a lock. For example, if Transaction 1 locks Resource A and tries to access Resource B while Transaction 2 locks Resource B and tries to access Resource A, neither transaction can proceed, creating a deadlock.
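
A sketch of that scenario with two concurrent sessions (hypothetical accounts table):

-- Session 1
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- locks row 1

-- Session 2
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 2;  -- locks row 2
UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- blocks, waiting for Session 1

-- Session 1 (continues)
UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- blocks, waiting for Session 2: deadlock
-- PostgreSQL detects the cycle and aborts one of the two transactions with a deadlock error.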

Most modern databases handle deadlocks by implementing a Deadlock Detection and Prevention system. The most common strategies for deadlock prevention and avoidance include:

  • Wait-Die and Wound-Wait: These schemes use transaction timestamps to decide a waiting transaction’s fate based on its age. Wait-Die is non-preemptive (a younger transaction requesting a lock held by an older one is rolled back), while Wound-Wait is preemptive (an older transaction can force a younger lock holder to roll back).
  • Ostrich Algorithm: This algorithm assumes that deadlocks occur very rarely and hence, it’s cheaper to let them occur and then restart the transaction rather than prevent them from occurring.
  • Timeout: Transactions are allowed to wait for a lock only for a specified amount of time. If they cannot obtain the lock within this time, they are rolled back and restarted.

To detect deadlocks, modern databases often use a “wait-for” graph, which tracks which transactions are waiting for locks held by which other transactions. If the graph has a cycle, a deadlock exists. The database then typically chooses one of the transactions in the cycle to terminate and roll back, breaking the deadlock.

Let’s dive deep into various aspects of databases — from how they store data, the role of indexes, performance optimization techniques, to advanced concepts like partitioning and sharding.

Pages

At the core of any database is how it manages the storage and retrieval of data. This fundamental operation happens at the level of “pages” — fixed-size blocks of storage. Pages are the smallest unit that a database reads or writes to disk.
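
In PostgreSQL, for example, the default page (block) size is 8 kB; you can check the value on a running server with:

SHOW block_size;   -- typically returns 8192 (bytes)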

Clustered Index and Index Organised Table

A clustered index determines the physical order of data in a table. In other words, it sorts the table rows based on the index key, making key lookups and range scans faster. Databases such as SQL Server and MySQL (InnoDB) cluster a table around its primary key by default. PostgreSQL, by contrast, stores tables as unordered heaps; the CLUSTER command can physically reorder a table around an index once, but that ordering is not maintained as new rows are written.

An Index-Organized Table (IOT) is a type of table in which the primary key values determine the storage position of its rows. Here, the table data is stored within the primary key index itself.

The concept of a secondary index arises when a table is organised around a clustered index or is an IOT. A secondary index holds the indexed column values plus a reference to the main data (either a physical row address or a logical identifier such as the primary key), which is then used to locate the full row. In PostgreSQL, where tables are stored as plain heaps, every index, including the one backing the primary key, points to a row's physical location (its tuple ID).

Row vs Column Databases

Row-oriented databases (commonly used for OLTP workloads) store data row by row, which is ideal for transactional systems where operations typically touch a complete record.

Column-oriented databases (commonly used for OLAP workloads) store data column by column, which enhances data retrieval for analytics applications by speeding up aggregation and analysis.

Primary Key vs Secondary Key

The primary key is a unique identifier for a record in a table. There can only be one primary key in a table, and it cannot contain null values.

A secondary key is any column (or combination of columns) used to look up rows that is not the primary key; unlike the primary key, it can contain duplicate values and nulls. A foreign key is a specific kind of linking column: it establishes a link between the data in two tables by referencing another table’s primary key.

Clustered or Non-clustered Indexes

A clustered index determines the physical order of data inside a table, which affects the time it takes to read data. There’s only one clustered index per table (often aligned with the primary key).

Non-clustered indexes, on the other hand, do not alter the physical order of rows. They maintain a separate object within a table that points back to the data rows, allowing for more than one non-clustered index per table.

“Analyze” Command in SQL

The ANALYZE command in SQL is used to update statistics about the distribution of the key values in the indexes of a table, which in turn allows the query planner to make better decisions.
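
For example, assuming a table named orders (as in the next section):

ANALYZE orders;   -- refresh statistics for one table
ANALYZE;          -- or refresh statistics for every table in the current database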

Partial Index

A partial index can be extremely useful when you frequently query for a specific subset of data, and this subset represents a small portion of the entire table. There are several advantages:

  1. Less Disk Space: Since the partial index includes only a portion of your table’s data, it uses less disk space than a full index.
  2. Increased Performance: Querying the indexed subset of data is faster since the index is smaller and thus more efficient to traverse.
  3. Less Maintenance: Inserting, updating, or deleting rows that aren’t in the partial index doesn’t result in any index updates, reducing the write costs and making these operations faster.

Example

Suppose you have a table called orders with a status column. Most of the rows have a status of delivered, but a small fraction are pending.

If you frequently run queries to find the pending orders, you could improve the performance of these queries by creating a partial index on the status column where status = 'pending'.

CREATE INDEX orders_pending_status_idx
ON orders (status)
WHERE status = 'pending';

Now, when you run a query to find pending orders:

SELECT * FROM orders WHERE status = 'pending';

PostgreSQL will use the partial index to retrieve the rows faster. But note that if you query for delivered orders, PostgreSQL won't use this index, and the query might be slower as a result. This is why it's essential to use partial indexes judiciously based on your specific workload and query patterns.

“Explain” Command in SQL

The EXPLAIN command in SQL is used to display the execution plan that the PostgreSQL planner generates for a given SQL statement. It shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, by an index scan, etc., and if multiple tables are involved, what join algorithms will be used to bring together the required rows from each input table.

The keywords VERBOSE, COSTS, BUFFERS, TIMING, SUMMARY can be added to get more information.
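
Using the orders table from the partial index example above:

-- EXPLAIN alone shows the estimated plan; adding ANALYZE actually runs the query
-- and reports real row counts and timings, and BUFFERS adds I/O detail.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';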

Sequential Scan, Parallel Scan, Index Scan, Bitmap Scan, Bitmap Heap Scan, Index Only Scans

A Sequential Scan reads through all pages in a table to find the matching rows.

Parallel Scans are similar to sequential scans, but they distribute the work among several parallel workers, each scanning a portion of the table.

An Index Scan uses an index to find the needed rows, avoiding a scan of the entire table.

A Bitmap Scan also uses an index, but instead of immediately visiting the heap pages, the Bitmap Index Scan step first builds an in-memory bitmap of the pages that contain matching tuples.

A Bitmap Heap Scan is the second step of this two-phase operation: it takes the bitmap produced by one or more bitmap index scans and fetches only the required heap pages, visiting each page at most once.

An Index Only Scan fetches all needed columns from the index itself, avoiding heap access altogether.

Key vs Non-key Index

A key index is created on columns that carry a key constraint, such as a primary key or unique constraint; it enforces uniqueness in addition to speeding up lookups.

A non-key index, sometimes called a non-unique index, allows duplicate values in the indexed column(s). This type of index can be useful when queries frequently request data based on a column that doesn’t require unique values.

For example, consider a users table with the columns id, name, email, and country. id is the primary key, and email is a unique key.

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    country TEXT NOT NULL
);

Let’s say you frequently run queries to find users from a specific country. The country column is a non-unique field, meaning it contains duplicate values (multiple users can be from the same country).

Even though country is not unique, creating an index on this column can significantly speed up these queries.

CREATE INDEX idx_users_country ON users (country);

Now, when you query for users from a specific country, PostgreSQL can use this non-key index to quickly locate those users, instead of doing a full table scan.

SELECT * FROM users WHERE country = 'USA';

Here, idx_users_country is a non-key (non-unique) index. This index will contain entries for each row in users, and each entry will be the country value for that row and a reference to the row's location on disk. Because country is not unique, this index will contain duplicate values.

In this case, even though there might be many users from the ‘USA’, the query can quickly return all of them by using the idx_users_country index.

It’s worth noting that non-key indexes can also improve the speed of joins and GROUP BY operations on the indexed column.

Composite Index

A Composite Index is an index on two or more columns of a table. Its main advantage is improved performance for queries that constrain a leading (left-most) prefix of the indexed columns.

The right-most columns on their own are of little use: the index supports fast lookups only when the query constrains the first column(s) of the index. If the query does not constrain the leftmost portion of the index, the system generally cannot use the index for fast lookups.
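
For instance, using the users table from the earlier example, a composite index on (country, name) behaves like this:

CREATE INDEX idx_users_country_name ON users (country, name);

-- Can use the index: the leading column (country) is constrained.
SELECT * FROM users WHERE country = 'USA' AND name = 'Alice';
SELECT * FROM users WHERE country = 'USA';

-- Generally cannot use it for a fast lookup: the leading column is not constrained.
SELECT * FROM users WHERE name = 'Alice';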

“Create Index Including (column)” and its Uses

The “INCLUDE” clause of the CREATE INDEX command can be used to specify a list of non-key columns for the index. This is particularly useful for covering indexes in PostgreSQL.

The benefit of the INCLUDE columns is that they allow extra information to be stored in the index, reducing the need for the system to visit the actual table row. This can make certain “index only” scans faster.
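
A sketch against the users table from earlier (INCLUDE is available in PostgreSQL 11 and later):

-- country is the searchable key column; name is carried as extra payload.
CREATE INDEX idx_users_country_inc_name ON users (country) INCLUDE (name);

-- Can be satisfied by an index-only scan (no heap visit needed,
-- provided the visibility map allows it):
SELECT name FROM users WHERE country = 'USA';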

How the Optimizer Knows Which Index to Use

The query optimizer uses statistics about the distribution of the data and the selectivity of the query’s predicates to decide which index, if any, to use. These statistics are generated by the ANALYZE command.

Importance of Vacuum in PostgreSQL

The VACUUM command in PostgreSQL recovers space from deleted rows and keeps tables from growing infinitely. It also updates statistics used by the PostgreSQL query planner.

The PostgreSQL storage system holds a visibility map (VM), which is a bitmap with one bit per heap page. A set bit means that all tuples on the page are visible to all transactions, and thus the page doesn’t need to be vacuumed. This map significantly reduces I/O for the vacuum operation.
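
Autovacuum normally takes care of this in the background, but a table can also be vacuumed manually, for example:

VACUUM (VERBOSE, ANALYZE) users;   -- reclaim space and refresh planner statistics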

“Create Index Concurrently”

The “CREATE INDEX CONCURRENTLY” command allows building an index without locking out writes. It is slower than a standard index build, requires more total work, and consumes more I/O and CPU resources.
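
For example, building a hypothetical index on the users table from earlier (note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

CREATE INDEX CONCURRENTLY idx_users_name ON users (name);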

Shard vs Partition

Partitioning splits a table into smaller physical pieces. Partitioning can provide several benefits:

  • Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions.
  • Bulk loads and deletes can be accomplished by adding or removing partitions.

Sharding is the process of storing data records across multiple machines, a common approach to meeting the demand for data management and consistent performance at scale. Sharding reduces the amount of data stored on each node, and with it the index size, making queries faster.

How Does Partitioning Work?

Partitioning is the process of splitting a large table into smaller, more manageable pieces called partitions. Partitions are defined based on a specified range, list, or hash key. Each partition is stored separately, resulting in improved performance and easier management.

In PostgreSQL, table partitioning can be managed through the declarative partitioning feature. This feature allows you to specify how you want your table divided. You define a partition key, which can be a range, a list, or a hash, and PostgreSQL automatically routes the data to the correct partition based on the key.
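
A minimal declarative-partitioning sketch with a hypothetical measurements table, range-partitioned by date:

CREATE TABLE measurements (
    city_id  INT  NOT NULL,
    logdate  DATE NOT NULL,
    peaktemp INT
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2023 PARTITION OF measurements
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE measurements_2024 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Rows are routed to the right partition automatically on INSERT,
-- and queries that filter on logdate scan only the relevant partitions.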

Pros and Cons of Partitioning

Pros:

  • Improved query performance: Since partitions limit the amount of data that needs to be scanned, queries can return results faster.
  • Easier to manage: Because data is separated into smaller tables, operations such as backup, restoration, and deletion can be done on individual partitions rather than the entire table.

Cons:

  • Row transitions between partitions: If a row’s partition key is updated, it may need to be moved to a different partition. This can be a resource-intensive process.
  • Inefficient query can scan all partitions: If a query doesn’t include a clause on the partition key, PostgreSQL might need to scan all partitions, which could lead to poor performance.

Sharding: Consistent Hashing, Pros, and Cons

Sharding is a method of splitting and storing a single logical dataset in multiple databases. This is done by creating smaller parts, called shards, and distributing them across several locations.

Consistent Hashing is a strategy that allows for minimal reorganization when shards are added or removed, reducing the need for resharding. This is especially useful for distributed databases.

Pros of Sharding:

  • Scalability: By distributing the data among multiple machines, a system can scale horizontally to support larger datasets and higher load.
  • Reduction in index size: With smaller amounts of data in each shard, index sizes are kept smaller, leading to faster query execution.
  • Isolation of Data: Each shard is isolated from the others. This isolation level increases reliability and availability since a problem in one shard does not directly affect others.

Cons of Sharding:

  • Increased Complexity: Implementing shards adds complexity to the database architecture. Application logic becomes more complex as it now has to determine which shard to query.
  • Transactions and Joins: Sharding can make transactions and joins more complicated, as they may need to operate across multiple shards.
  • Data Distribution: If the data isn’t distributed evenly across shards, some shards may become hot spots, leading to performance issues.

B-tree vs B+ tree

In the context of databases, a B-tree and a B+ tree are types of data structures used to organize information for efficient retrieval.

A B-tree stores keys together with their associated records (or record pointers) in both internal and leaf nodes, and each key appears only once in the tree. In a B-tree, a node with k children contains k-1 keys. B-trees work well when records are accessed primarily by individual key.

A B+ tree is a variant of the B-tree in which internal nodes store only key values used for navigation, while all keys and their associated records (or record pointers) live in the leaf nodes, which are linked together in a list. This linked leaf level makes range scans and sequential access cheap, which is why B+ trees are ideal for systems that read and write large blocks of data, such as databases and file systems.

Why is an index on UUID bad?

UUID stands for Universally Unique Identifier. It is a 128-bit number used to uniquely identify information in computer systems. While UUIDs are great for uniquely identifying records, they are not ideal for being indexed, primarily due to their size and randomness.

Indexes work best when the indexed data is small and has some natural ordering. UUIDs, being large and random, provide no ordering, so each new value is inserted at a random position in the index. This causes frequent page splits, poor cache locality, and extra random I/O, slowing down index maintenance.

Scatter Index Scan

A scatter index scan is a term used in the context of sharded databases. This type of scan refers to an operation where a query gets executed across multiple, or all, shards in a database. It’s called a scatter scan because the query “scatters” to all shards, collects results, and then combines them into a final result set.

A scatter index scan can be costly in terms of performance as it could lead to high network latency, depending on the distribution of the shards. However, if data is sharded correctly, and the database is well-indexed, such scans can be optimized for performance.
