Streamlining Product Data Validation with a Modular PHP Service

Introduction

In the Reimpact/platform project, managing product data often involves importing information from various sources, such as spreadsheets, and ensuring its integrity. This process demands a robust validation system that can handle diverse data types, complex dependencies, and business rules. We recently focused on enhancing our product data validation capabilities to make this process more reliable and maintainable.

The Problem

Product data validation presents several challenges:

  1. Diverse Sources: Data can come from external sheets or be cross-referenced with existing database records.
  2. Complex Business Rules: Fields often have interdependencies (e.g., dimensions like length, width, height, and volume must either all be present or all be null).
  3. Uniqueness Constraints: Critical identifiers like SKUs and product names must be unique.
  4. Data Option Adherence: Certain fields, like units (length, volume, mass) or origins, must conform to a predefined set of options.
  5. Maintainability: Hardcoding validation logic or field names makes the system rigid and difficult to update as business requirements evolve.

The Solution: Building a Comprehensive Data Validation Service

To address these challenges, we implemented a modular approach centered around a ProductValuesValidator service and a SheetLoaderService. This architecture separates concerns, allowing for clearer, more testable validation logic.

The SheetLoaderService is responsible for parsing and providing data from various sheets, abstracting away the specifics of file handling. The ProductValuesValidator then orchestrates a series of specialized validation checks, often relying on data provided by the SheetLoaderService or direct database lookups.
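
Example: Orchestrating Checks (sketch)

To make the orchestration idea concrete, here is a minimal sketch of a validator that runs an ordered list of small checks against each row and collects the resulting messages. It is not the project's actual ProductValuesValidator: the class name RowValidationPipeline, the two checks, and the shape of the notifications are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: the names below are illustrative, not the project's API.
final class RowValidationPipeline
{
    /** @var list<callable(array): list<string>> */
    private array $checks;

    /** @param list<callable(array): list<string>> $checks */
    public function __construct(array $checks)
    {
        $this->checks = $checks;
    }

    /**
     * Runs every check against every row and collects the messages,
     * tagging each one with the row's 1-based position in the sheet.
     *
     * @param list<array<string, mixed>> $rows
     * @return list<array{row: int, message: string}>
     */
    public function validate(array $rows): array
    {
        $notifications = [];
        foreach ($rows as $index => $row) {
            foreach ($this->checks as $check) {
                foreach ($check($row) as $message) {
                    $notifications[] = ['row' => $index + 1, 'message' => $message];
                }
            }
        }

        return $notifications;
    }
}

// Two small checks; each returns a list of messages (empty when the row passes).
$requireSku = fn (array $row): array =>
    ($row['sku'] ?? '') === '' ? ['sku is required.'] : [];

$positiveLength = fn (array $row): array =>
    isset($row['length']) && (!is_numeric($row['length']) || $row['length'] <= 0)
        ? ['length must be a positive number.']
        : [];

$pipeline = new RowValidationPipeline([$requireSku, $positiveLength]);

$result = $pipeline->validate([
    ['sku' => 'A-1', 'length' => 10],
    ['sku' => '', 'length' => -2],
]);
```

Keeping each check as a small callable means a rule can be tested on its own and added or removed without touching the loop that applies it.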

A key aspect of this solution involved leveraging PHP enums (available since PHP 8.1) to define acceptable data options and field names. This improves maintainability by centralizing definitions, which makes the code more readable and less prone to typos.

Example: Enum-driven Fillable Properties

Instead of listing a model's $fillable attributes by hand, an enum can supply them, so the field names are defined in one place and stay consistent.

namespace App\Enums;

enum ProductField: string
{
    case SKU = 'sku';
    case NAME = 'name';
    case LENGTH = 'length';
    case WIDTH = 'width';
    case HEIGHT = 'height';
    case VOLUME = 'volume';
    // ... other fields

    public static function getFillableFields(): array
    {
        return array_column(self::cases(), 'value');
    }
}

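// In your Eloquent model (separate file; import App\Enums\ProductField and
// Illuminate\Database\Eloquent\Model). A property default must be a constant
// expression, so a static call there is a compile error; assign it in the constructor:
class Product extends Model
{
    public function __construct(array $attributes = [])
    {
        // Set before the parent constructor runs, because it fills $attributes.
        $this->fillable = ProductField::getFillableFields();
        parent::__construct($attributes);
    }
}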

Key Validation Scenarios

The ProductValuesValidator encapsulates various checks, including:

  • Grouped Field Validation: Ensuring that related fields (e.g., product dimensions) are either all null or all defined.
  • Uniqueness Checks: Verifying the uniqueness of SKUs and names, potentially across both the input sheet and the existing database.
  • Data Option Enforcement: Confirming that values for fields like origin or unit types (e.g., length_unit, volume_unit) adhere to predefined sets of options, often sourced from enums or configuration.
  • Numerical & Geometrical Constraints: Validating that quantities are positive real numbers and that calculated volumes respect geometrical rules.
  • Database Lookups: Dynamically fetching valid options from the database (e.g., registered warehouses or product types) to validate incoming data.
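
Example: Data Option Enforcement (sketch)

The option-enforcement check in the list above can be sketched with a backed enum and its built-in tryFrom() method. The LengthUnit enum, its unit values, and the validateLengthUnit() helper are hypothetical; the real set of options in the project may differ or come from configuration.

```php
<?php
declare(strict_types=1);

// Hypothetical enum: the unit values are assumed for illustration.
enum LengthUnit: string
{
    case Millimetre = 'mm';
    case Centimetre = 'cm';
    case Metre = 'm';

    /** @return list<string> */
    public static function values(): array
    {
        return array_column(self::cases(), 'value');
    }
}

/**
 * Returns an error message when $value is not one of the enum's backing
 * values, or null when the value is acceptable.
 */
function validateLengthUnit(mixed $value): ?string
{
    // tryFrom() returns null instead of throwing for an unknown backing value.
    if (is_string($value) && LengthUnit::tryFrom($value) !== null) {
        return null;
    }

    return sprintf(
        'length_unit must be one of: %s.',
        implode(', ', LengthUnit::values())
    );
}

$ok = validateLengthUnit('cm');
$bad = validateLengthUnit('inch');
```

Because the error message is built from the enum itself, adding a new unit updates both the check and the message in one place.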

Example: Grouped Fields Validation

class ProductValuesValidator
{
    // ... constructor and other methods

    protected function validateDimensions(array $rowData, array &$notifications): void
    {
        $dimensionFields = ['length', 'width', 'height', 'volume'];
        $definedDimensions = array_filter($dimensionFields, fn($field) => !is_null($rowData[$field] ?? null));

        if (count($definedDimensions) > 0 && count($definedDimensions) < count($dimensionFields)) {
            $notifications[] = ['message' => 'Dimension fields (length, width, height, volume) must either all be null or all be defined.'];
        }

        foreach ($definedDimensions as $field) {
            if (!is_numeric($rowData[$field]) || $rowData[$field] <= 0) {
                $notifications[] = ['message' => "{$field} must be a positive number."];
            }
        }
        // Further checks for units and geometrical constraints here
    }
}
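
Example: Uniqueness Within a Sheet (sketch)

The uniqueness check mentioned among the scenarios can be sketched in a similar way. The findDuplicates() helper below is hypothetical and only covers duplicates inside the incoming rows; it also assumes that values should be compared case-insensitively with surrounding whitespace ignored, which may not match the actual business rule. A check against existing records would need a database query and is not shown.

```php
<?php
declare(strict_types=1);

/**
 * Finds values of $field that appear in more than one row.
 * Hypothetical helper; the normalization (trim + lowercase) is an assumption.
 *
 * @param list<array<string, mixed>> $rows
 * @return array<int|string, list<int>> normalized value => 1-based row numbers
 */
function findDuplicates(array $rows, string $field): array
{
    $seen = [];
    foreach ($rows as $index => $row) {
        $value = $row[$field] ?? null;
        if ($value === null || $value === '') {
            continue; // Missing values belong to a separate "required" check.
        }
        $key = strtolower(trim((string) $value));
        $seen[$key][] = $index + 1;
    }

    // Keep only the values seen in two or more rows.
    return array_filter($seen, fn (array $rowNumbers): bool => count($rowNumbers) > 1);
}

$rows = [
    ['sku' => 'ABC-1', 'name' => 'Widget'],
    ['sku' => 'abc-1 ', 'name' => 'Gadget'],
    ['sku' => 'XYZ-9', 'name' => 'Widget'],
];

$duplicateSkus = findDuplicates($rows, 'sku');
$duplicateNames = findDuplicates($rows, 'name');
```

Returning the row numbers alongside each duplicated value makes it straightforward to build a notification that points the user at the offending rows.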

Refactoring for Maintainability

During code review, several points emerged to further enhance maintainability:

  • Centralizing Enums: Consolidating enums in a dedicated location for easier management.
  • Dependency Inversion: Moving towards using contracts (interfaces) instead of concrete class implementations (e.g., SheetLoaderService) in constructors to promote loose coupling and testability.
  • Helper Methods: Introducing private helper methods to reduce code duplication in validation logic, particularly in test scenarios involving multiple checks.
  • Naming Conventions: Consistent renaming of variables and methods for clarity.
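
Example: Depending on a Contract (sketch)

The dependency inversion point can be illustrated with a small sketch. The SheetLoader interface, its rows() method, the ArraySheetLoader test double, and the SkuPresenceValidator class are all hypothetical names; they only show the shape of the idea, not the project's real contracts.

```php
<?php
declare(strict_types=1);

// Hypothetical contract; the method name and row shape are assumptions.
interface SheetLoader
{
    /** @return list<array<string, mixed>> */
    public function rows(string $sheetName): array;
}

// In-memory implementation, useful as a test double.
final class ArraySheetLoader implements SheetLoader
{
    /** @param array<string, list<array<string, mixed>>> $sheets */
    public function __construct(private array $sheets)
    {
    }

    public function rows(string $sheetName): array
    {
        return $this->sheets[$sheetName] ?? [];
    }
}

// The validator depends on the contract, not on a concrete loader.
final class SkuPresenceValidator
{
    public function __construct(private SheetLoader $loader)
    {
    }

    /** @return list<string> */
    public function validate(string $sheetName): array
    {
        $messages = [];
        foreach ($this->loader->rows($sheetName) as $index => $row) {
            if (($row['sku'] ?? '') === '') {
                $messages[] = sprintf('Row %d: sku is required.', $index + 1);
            }
        }

        return $messages;
    }
}

$validator = new SkuPresenceValidator(new ArraySheetLoader([
    'products' => [['sku' => 'A-1'], ['sku' => '']],
]));
$messages = $validator->validate('products');
```

With this shape, tests can pass in the in-memory loader while production code supplies a file-backed one; in a Laravel application the interface would typically be bound to the concrete service in a service provider.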

Key Insight

Developing a robust data validation system requires more than checking individual fields; it demands a holistic approach that considers data sources, business logic, interdependencies, and maintainability. By combining dedicated services with enums and adhering to SOLID principles, we can build a validation pipeline that is both powerful and adaptable to future changes.

GERARDO RUIZ

Author
