Implementing a Write-Ahead Log (WAL) in Go
Introduction
I love digging into how our everyday tools work under the hood! As a software engineer, I find it incredibly rewarding to peek behind the curtain. It builds my understanding of tools and patterns like nothing else could. Let me show you how to build a Write-Ahead Log (WAL) step-by-step - it's more than just code, it's a gateway to understanding the magic that powers our tools.
Think about the tools you use every day: databases, version control systems, message queues. They're all using similar patterns behind the scenes. A WAL is a fundamental building block for all of these tools. When you build this component yourself, you'll:
- Craft better system architectures with confidence
- Track down tricky bugs like a pro
- Master performance tuning for more complex systems that use a WAL
- Marvel at the elegant solutions in existing systems
While our WAL implementation isn't as complex as production systems, it will teach you the core principles that keep your data safe when things go wrong. You'll see firsthand how theoretical concepts transform into robust, working code.
Let me explain how a WAL keeps your data safe: it acts like a safety net for your storage systems. Before making any changes to your database, you first write them to a sequential log. If your system crashes, no problem! Just replay the log, and you'll recover your data perfectly.
Here's a real example: imagine you're building a key-value store. When someone saves a value, you first write it to your WAL. Even if your application crashes right after, you're covered - just restart, read the WAL, and rebuild your database state. Simple!
I've built this WAL implementation to be:
- Segmented for better performance
- Protected with CRC checks
- Ready for concurrent access
- Simple to recover from
The Role of Atomic Operations
Throughout this WAL implementation, atomic operations play a crucial role in ensuring thread safety and data consistency without excessive locking. These operations are essential for maintaining the WAL's integrity in a concurrent environment. Let's examine the key uses of atomic operations:
Position Tracking
The segment.position field uses atomic operations to safely track write positions:
currentPos := atomic.LoadInt64(&segment.position)
if !atomic.CompareAndSwapInt64(&segment.position, currentPos, newPos) {
continue
}
This atomic Compare-and-Swap (CAS) operation ensures that:
- Multiple goroutines can safely compete for write space
- No two writes will ever overlap or corrupt each other
- Failed CAS operations automatically trigger retries
Segment ID Management
The nextSegmentID field uses atomic operations to generate unique segment IDs:
newID := atomic.AddInt64(&w.nextSegmentID, 1)
This ensures that:
- Each segment gets a unique, sequential ID
- No two segments will ever have the same ID, even under heavy concurrency
- ID generation is lock-free and efficient
Why Atomic Operations Matter
The use of atomic operations provides several critical benefits:
- Better Performance: Atomic operations are generally faster than mutex locks for simple operations
- Reduced Contention: They allow for fine-grained concurrency control
- Data Consistency: They prevent race conditions and ensure operations complete correctly
- Lock-Free Algorithms: They enable certain operations to proceed without acquiring mutex locks
While the WAL still uses mutex locks for complex operations (like file I/O), atomic operations handle the high-contention scenarios where multiple goroutines frequently access the same values. This hybrid approach balances safety with performance.
Typical Client Usage
Here are the key operations you'll use when working with this WAL implementation. These patterns cover the entire lifecycle of WAL usage, from initialization through normal operations to proper shutdown.
On startup:
w, err := NewWAL("wal_dir", 1024*1024) // Open or create a WAL, which recovers previous state
On updates:
AppendRecord() // Call for each PUT or DELETE operation
On crash recovery:
ReadAll() // Replay all records to rebuild in-memory database
On shutdown:
Close() // Release resources and ensure data is safely on disk
Constants and Types
The WAL implementation uses several key types and constants to represent its data structures and operations. These types provide the foundation for storing and managing records, tracking positions, and maintaining data integrity across segments.
Let's examine each core type and constant, along with its purpose and usage patterns:
Record Types
These define the fundamental operations that can be stored in the WAL.
const (
PUT byte = 1
DELETE byte = 2
)
As a client, choose one of these types to indicate the kind of change you're recording in the WAL. Later, when you “replay” the log for recovery, you can rebuild your data structure (such as a B-tree or in-memory map) by applying each operation in turn.
The WAL Struct
The WAL struct serves as the central coordinator for all write-ahead logging operations. It manages segments, handles concurrent access, and provides the interface through which clients interact with the log. This struct encapsulates both the state and behavior needed for reliable, persistent storage of records.
Key responsibilities of the WAL struct include:
- Managing multiple log segments and their lifecycle
- Coordinating concurrent access through mutex locks
- Tracking active segments and maintaining size limits
- Providing atomic operations for record appending
The main entry point for client interactions:
type WAL struct {
mu sync.RWMutex
dir string
activeSegment *segment
segments []*segment
segmentSize int64
nextSegmentID int64
}
Client usage example:
// Create new or open existing WAL
w, err := NewWAL("wal_dir", 1024*1024)
if err != nil {
// handle error
}
The Segment Struct
Segments are a critical part of WAL design that solve several important problems:
- Size Management: Instead of one massive log file that could grow indefinitely, segments break the log into manageable chunks
- Cleanup and Rotation: Old segments can be safely removed once their data is persisted to the main database, without disrupting active writes
- Performance: Smaller files are faster to read and write, improving overall system performance
- Concurrent Access: Multiple readers can access different segments simultaneously while new writes occur
Think of segments like chapters in a book - they organize the content into digestible pieces while maintaining the overall narrative. Each segment has:
- A unique, sequential ID to maintain order
- Its own file handle for independent I/O operations
- An atomic position counter for thread-safe writes
- Size limits to prevent unbounded growth
Internal structure representing each physical log file:
type segment struct {
id int64
file *os.File
position int64 // atomic
mu sync.Mutex
}
Note: Clients don't interact with segments directly. The WAL handles segment management internally.
Position Type
The Position type is crucial for tracking exactly where records are stored in the WAL, enabling:
- Random access reads - quickly locate and retrieve specific records without scanning the entire log
- Crash recovery - maintain record order when rebuilding state after a system failure
- Record references - allow applications to maintain pointers to specific log entries
- Segment management - help identify which segment contains a particular record
Identifies the exact location of a record in the WAL:
type Position struct {
SegmentID int64
Offset int64
}
Example usage:
pos, err := w.AppendRecord(PUT, []byte("user:123"), []byte("Alice"))
if err != nil {
// handle error
}
// pos can be stored for later record lookup
Record Type
The Record type is a fundamental data structure that serves three key purposes:
- Data Storage: It combines the record type (PUT/DELETE), key, and value into a single unit, making it easy to store and retrieve complete operations
- Data Integrity: The CRC (Cyclic Redundancy Check) field stores a checksum of the record's contents, allowing the system to detect any corruption that might occur during storage or transmission
- Location Tracking: The Position field stores exactly where this record lives in the WAL (which segment and offset), enabling fast random access and proper sequencing during crash recovery
When writing a record, the WAL calculates its CRC and stores it alongside the data. Later, when reading the record, it recalculates the CRC and compares it to the stored value - if they don't match, we know the data has been corrupted.
Represents the data stored in the WAL:
type Record struct {
Type byte
Key []byte
Value []byte
CRC uint32
Position Position
}
Example of reading a record:
record, err := w.ReadRecord(pos)
if err != nil {
// handle error
}
fmt.Printf("Read record: Type=%d, Key=%s, Value=%s\n",
record.Type, record.Key, record.Value)
Core Functions
The core functions of this WAL implementation provide the essential operations needed for reliable, persistent logging. These functions handle everything from initialization and record management to crash recovery and cleanup. Let's examine each core function in detail, understanding its purpose, implementation, and usage patterns.
Each function is designed with careful attention to concurrency, data integrity, and error handling. We'll see how they work together to provide a robust foundation for building reliable storage systems.
NewWAL
Creates or opens a WAL from a directory:
func NewWAL(dir string, segmentSize int64) (*WAL, error) {
if err := os.MkdirAll(dir, 0755); err != nil {
return nil, fmt.Errorf("creating WAL directory: %w", err)
}
wal := &WAL{
dir: dir,
segmentSize: segmentSize,
}
if err := wal.recoverSegments(); err != nil {
return nil, fmt.Errorf("recovering segments: %w", err)
}
if len(wal.segments) == 0 && wal.activeSegment == nil {
if err := wal.createNewSegment(); err != nil {
return nil, fmt.Errorf("creating initial segment: %w", err)
}
}
return wal, nil
}
Client usage:
w, err := NewWAL("mywal", 10*1024*1024) // 10MB segments
if err != nil {
log.Fatal(err)
}
recoverSegments
The recoverSegments method runs during WAL initialization and is responsible for rebuilding the WAL's state from existing segment files on disk. It scans the WAL directory, identifies valid segment files, and reconstructs the internal segment structures needed for normal operation.
This function provides crucial durability guarantees by ensuring no data is lost between process restarts. It handles tasks such as:
- Discovering and validating existing segment files
- Rebuilding segment metadata and positions
- Establishing the active segment for new writes
- Maintaining segment ordering and continuity
Internal method that runs during WAL initialization:
func (w *WAL) recoverSegments() error {
files, err := os.ReadDir(w.dir)
if err != nil {
return fmt.Errorf("reading WAL directory: %w", err)
}
var maxID int64
segments := make(map[int64]*segment)
for _, file := range files {
if filepath.Ext(file.Name()) != ".wal" {
continue
}
var id int64
_, err := fmt.Sscanf(file.Name(), "segment-%d.wal", &id)
if err != nil {
continue
}
path := filepath.Join(w.dir, file.Name())
f, err := os.OpenFile(path, os.O_RDWR, 0666)
if err != nil {
return fmt.Errorf("opening segment file %s: %w", path, err)
}
info, err := f.Stat()
if err != nil {
f.Close()
return fmt.Errorf("stating segment file %s: %w", path, err)
}
seg := &segment{
id: id,
file: f,
}
atomic.StoreInt64(&seg.position, info.Size())
segments[id] = seg
if id > maxID {
maxID = id
}
}
w.segments = make([]*segment, 0, len(segments))
for i := int64(1); i <= maxID; i++ {
if seg, exists := segments[i]; exists {
w.segments = append(w.segments, seg)
}
}
if len(w.segments) > 0 {
w.activeSegment = w.segments[len(w.segments)-1]
w.segments = w.segments[:len(w.segments)-1]
atomic.StoreInt64(&w.nextSegmentID, maxID)
}
return nil
}
createNewSegment
The createNewSegment method creates and initializes a new log segment when the current active segment reaches its size limit. It handles the critical task of maintaining continuous write availability while ensuring proper segment file creation and management.
Key responsibilities of createNewSegment include:
- Atomic segment ID generation to prevent duplicates
- Safe file creation with appropriate permissions
- Proper segment rotation and management
- Thread-safe operation through careful locking
Internal method for creating new segments when needed:
func (w *WAL) createNewSegment() error {
newID := atomic.AddInt64(&w.nextSegmentID, 1)
path := filepath.Join(w.dir, fmt.Sprintf("segment-%d.wal", newID))
file, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0666)
if err != nil {
return fmt.Errorf("creating segment file: %w", err)
}
newSegment := &segment{
id: newID,
file: file,
}
atomic.StoreInt64(&newSegment.position, 0)
if w.activeSegment != nil {
w.segments = append(w.segments, w.activeSegment)
}
w.activeSegment = newSegment
return nil
}
AppendRecord
The AppendRecord function is the cornerstone of the WAL implementation, responsible for safely and atomically writing new records to disk. This function ensures durability by using CRC checksums and careful synchronization, while handling concurrent access through a combination of atomic operations and mutex locks. It automatically manages segment rotation when size limits are reached.
Key features of AppendRecord include:
- Atomic record writing with position tracking
- Built-in CRC32 checksum generation and storage
- Automatic segment rotation when size limits are reached
- Thread-safe operation through careful locking
Primary method for storing data:
func (w *WAL) AppendRecord(recordType byte, key, value []byte) (Position, error) {
if len(key) == 0 {
return Position{}, errors.New("key cannot be empty")
}
headerSize := 13 // 4-byte CRC + 1-byte type + 4-byte key length + 4-byte value length
recordSize := headerSize + len(key) + len(value)
for {
w.mu.RLock()
segment := w.activeSegment
w.mu.RUnlock()
if segment == nil {
return Position{}, errors.New("no active segment")
}
currentPos := atomic.LoadInt64(&segment.position)
newPos := currentPos + int64(recordSize)
if newPos > w.segmentSize {
w.mu.Lock()
// Re-check under the lock: another goroutine may have already rotated,
// so only create a new segment if ours is still the active one.
if w.activeSegment == segment {
if err := w.createNewSegment(); err != nil {
w.mu.Unlock()
return Position{}, fmt.Errorf("creating new segment: %w", err)
}
}
w.mu.Unlock()
continue
}
if !atomic.CompareAndSwapInt64(&segment.position, currentPos, newPos) {
continue
}
buf := make([]byte, recordSize)
offset := 4 // CRC space
buf[offset] = recordType
offset++
binary.LittleEndian.PutUint32(buf[offset:], uint32(len(key)))
offset += 4
binary.LittleEndian.PutUint32(buf[offset:], uint32(len(value)))
offset += 4
copy(buf[offset:], key)
offset += len(key)
copy(buf[offset:], value)
crc := crc32.ChecksumIEEE(buf[4:])
binary.LittleEndian.PutUint32(buf[0:], crc)
segment.mu.Lock()
pos := Position{
SegmentID: segment.id,
Offset: currentPos,
}
if _, err := segment.file.WriteAt(buf, currentPos); err != nil {
segment.mu.Unlock()
return Position{}, fmt.Errorf("writing record: %w", err)
}
if err := segment.file.Sync(); err != nil {
segment.mu.Unlock()
return Position{}, fmt.Errorf("syncing record: %w", err)
}
segment.mu.Unlock()
return pos, nil
}
}
Client usage:
pos, err := w.AppendRecord(PUT, []byte("session:abc"), []byte("data123"))
if err != nil {
// handle error
}
ReadRecord
The ReadRecord function is a critical component that enables random access to records stored in the WAL. It takes a Position parameter (obtained from a previous AppendRecord call) and returns the corresponding Record. This function ensures data integrity by validating CRC checksums and handles concurrent access through appropriate locking mechanisms.
Key aspects of ReadRecord include:
- Efficient segment lookup and record retrieval
- CRC validation to ensure data integrity
- Thread-safe operation through read locks
- Proper error handling for missing or corrupted records
Retrieves a specific record by position:
func (w *WAL) ReadRecord(pos Position) (*Record, error) {
w.mu.RLock()
var segment *segment
if w.activeSegment != nil && w.activeSegment.id == pos.SegmentID {
segment = w.activeSegment
} else {
for _, seg := range w.segments {
if seg.id == pos.SegmentID {
segment = seg
break
}
}
}
w.mu.RUnlock()
if segment == nil {
return nil, fmt.Errorf("segment %d not found", pos.SegmentID)
}
segment.mu.Lock()
defer segment.mu.Unlock()
headerBuf := make([]byte, 13)
_, err := segment.file.ReadAt(headerBuf, pos.Offset)
if err != nil {
return nil, fmt.Errorf("reading record header: %w", err)
}
storedCRC := binary.LittleEndian.Uint32(headerBuf[0:])
recordType := headerBuf[4]
keyLen := binary.LittleEndian.Uint32(headerBuf[5:9])
valueLen := binary.LittleEndian.Uint32(headerBuf[9:13])
dataBuf := make([]byte, keyLen+valueLen)
_, err = segment.file.ReadAt(dataBuf, pos.Offset+13)
if err != nil {
return nil, fmt.Errorf("reading record data: %w", err)
}
fullRecord := make([]byte, 1+4+4+len(dataBuf))
copy(fullRecord, headerBuf[4:])
copy(fullRecord[9:], dataBuf)
computedCRC := crc32.ChecksumIEEE(fullRecord)
if computedCRC != storedCRC {
return nil, errors.New("CRC mismatch")
}
return &Record{
Type: recordType,
Key: dataBuf[:keyLen],
Value: dataBuf[keyLen:],
CRC: storedCRC,
Position: Position{
SegmentID: pos.SegmentID,
Offset: pos.Offset,
},
}, nil
}
Close
The Close function ensures proper cleanup and resource management when shutting down the WAL. It's essential for maintaining data integrity by ensuring all files are properly closed and resources are released. This function should be called during graceful shutdowns to prevent file descriptor leaks and ensure all pending writes are completed.
Key responsibilities of Close include:
- Safely closing all segment files
- Ensuring proper cleanup of resources
- Maintaining thread safety during shutdown
- Handling both active and inactive segments
Properly shuts down the WAL:
func (w *WAL) Close() error {
w.mu.Lock()
defer w.mu.Unlock()
if w.activeSegment != nil {
if err := w.activeSegment.file.Close(); err != nil {
return err
}
}
for _, segment := range w.segments {
if err := segment.file.Close(); err != nil {
return err
}
}
return nil
}
ReadAll
The ReadAll function is a critical utility for crash recovery and data reconstruction. It sequentially reads all records from all segments, performing CRC validation to ensure data integrity. This function is particularly useful when rebuilding state after a system crash or when migrating data.
Key features of ReadAll include:
- Sequential scanning of all segments in order
- Full CRC validation of every record
- Atomic reading with proper locking
- Handling of partial or corrupted records
Reads all records for crash recovery:
func (w *WAL) ReadAll() ([]*Record, error) {
w.mu.RLock()
segments := append([]*segment(nil), w.segments...)
if w.activeSegment != nil && (len(segments) == 0 || segments[len(segments)-1].id != w.activeSegment.id) {
segments = append(segments, w.activeSegment)
}
w.mu.RUnlock()
var allRecords []*Record
for _, seg := range segments {
seg.mu.Lock()
currentPos := int64(0)
for {
headerBuf := make([]byte, 13)
n, err := seg.file.ReadAt(headerBuf, currentPos)
if err != nil {
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) || n == 0 {
break
}
seg.mu.Unlock()
return nil, fmt.Errorf("reading record header: %w", err)
}
if n < len(headerBuf) {
break
}
storedCRC := binary.LittleEndian.Uint32(headerBuf[0:])
recordType := headerBuf[4]
keyLen := binary.LittleEndian.Uint32(headerBuf[5:9])
valueLen := binary.LittleEndian.Uint32(headerBuf[9:13])
dataSize := keyLen + valueLen
dataBuf := make([]byte, dataSize)
n, err = seg.file.ReadAt(dataBuf, currentPos+13)
if err != nil {
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) || n == 0 {
break
}
seg.mu.Unlock()
return nil, fmt.Errorf("reading record data: %w", err)
}
if uint32(n) < dataSize {
break
}
fullRecord := make([]byte, 1+4+4+len(dataBuf))
copy(fullRecord, headerBuf[4:])
copy(fullRecord[9:], dataBuf)
computedCRC := crc32.ChecksumIEEE(fullRecord)
if computedCRC != storedCRC {
seg.mu.Unlock()
return nil, errors.New("CRC mismatch")
}
record := &Record{
Type: recordType,
Key: dataBuf[:keyLen],
Value: dataBuf[keyLen:],
CRC: storedCRC,
Position: Position{
SegmentID: seg.id,
Offset: currentPos,
},
}
allRecords = append(allRecords, record)
currentPos += 13 + int64(dataSize)
}
seg.mu.Unlock()
}
return allRecords, nil
}
Client usage:
records, err := w.ReadAll()
if err != nil {
// handle error
}
for _, r := range records {
// Reapply PUT or DELETE operations to reconstruct state
}
Concurrency, Integrity, and Extensions
The WAL implementation uses several mechanisms to ensure safety and reliability:
Concurrency Safety
- sync.Mutex and sync.RWMutex for thread-safe access
- Atomic operations for position tracking
- Mutex-guarded segment rotation
Data Integrity
- CRC32 checksums for all records
- Validation on every read operation
- Safe segment rotation
Potential Extensions
Future enhancements could include:
- Write Batching
- Group multiple records into single writes
- Improve throughput for high-volume scenarios
- Compression
- Compress records before writing
- Reduce disk space usage
- Indexing
- Build indexes for faster lookups
- Support time-range queries
Conclusion
This WAL implementation provides a robust foundation for building reliable storage systems. Key features:
- Simple client interface with clear semantics
- Automatic segment management
- Strong consistency guarantees
- Built-in crash recovery
- Thread-safe operations
By understanding both the internal mechanisms and client usage patterns, you can effectively integrate this WAL into your own applications or extend it to meet specific requirements.
Best Practices
When using this WAL:
- Always call Close() during graceful shutdown
- Handle CRC mismatch errors appropriately
- Use ReadAll() for complete recovery after crashes
- Consider batching multiple records in high-throughput scenarios
- Monitor segment sizes and rotation frequency
Remember that the WAL is designed to be a durable source of truth for your data operations. Use it accordingly in your architecture.