Improving Data Synchronization in Platform: Addressing Foreign Key Constraints and Memory Usage

Introduction

The Reimpact/platform project encountered challenges during data synchronization, specifically with foreign key constraints and memory exhaustion issues. This post outlines the problem, the solution implemented, and key takeaways for managing large-scale data transfers.

The Challenge

During the synchronization of a large dataset (1.45M rows) from a production MySQL database to a local PostgreSQL instance, the process was failing due to:

  • Foreign Key (FK) Ordering Issues: Tables were being truncated without considering dependencies, causing errors.
  • Memory Exhaustion: Large chunk sizes during data insertion led to excessive memory usage.

The Solution

To address these challenges, the following changes were implemented:

  • Reverse Dependency Truncation: All target tables are now truncated in reverse dependency order before any data insertion.
  • Removal of Per-Table Truncate: The individual table truncates, which were causing cascading deletions of parent data, were removed.
  • Reduced Chunk Size: The chunk size was reduced from 500 to 200 to minimize memory consumption during insertion.

The process now involves determining the correct order, then iterating through and truncating:

class SyncService
{
    public function syncData(array $tables):
void
    {
        $orderedTables = $this->getTablesOrderedByDependency($tables);

        foreach ($orderedTables as $table) {
            $this->truncateTable($table);
        }

        foreach ($tables as $table) {
            $this->insertData($table, 200); // Chunk size of 200
        }
    }

    private function truncateTable(string $table):
void
    {
        DB::table($table)->truncate();
    }

    private function insertData(string $table, int $chunkSize):
void
    {
        // Logic to fetch and insert data in chunks
    }

    private function getTablesOrderedByDependency(array $tables): array
    {
         //Logic to determine table truncation order
    }
}

Key Decisions

  1. Truncate All Upfront: Ensures a clean slate before data insertion, avoiding FK violations.
  2. Reverse Dependency Order: Prevents errors by truncating child tables before parent tables.
  3. Smaller Chunk Size: Reduces memory footprint during data insertion.

Results

The changes resulted in a successful synchronization of 1.45M rows, including sales, recipes, products, and other related data, from the production MySQL database to the local PostgreSQL instance. Furthermore, code analysis tools report zero errors.

Lessons Learned

When synchronizing data between databases, especially with foreign key constraints, it's crucial to manage table truncation order and memory usage. Always consider database dependencies and optimize chunk sizes to prevent memory exhaustion. If possible, avoid cascading deletes during truncation to prevent accidental data loss.

Improving Data Synchronization in Platform: Addressing Foreign Key Constraints and Memory Usage
GERARDO RUIZ

GERARDO RUIZ

Author

Share: