PHP MySQL SQL

Enhancing Data Integrity and Performance in Reporting Queries

Introduction

Recent code reviews have highlighted several opportunities to improve the robustness, performance, and maintainability of our application's reporting queries. These changes focus on ensuring data consistency, optimizing query execution, and adhering to coding standards.

Addressing Potential Issues

Explicit Facade Imports

We addressed an issue where facades (like File and DB) were used without explicit use statements. Explicit imports improve code clarity and prevent potential issues if facade aliases are disabled. This ensures the code adheres to our coding guidelines:

<?php

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\File;

// ...

File::exists('example.txt');
DB::table('app_config')->get();

Optimizing Collation Usage

Applying collation to the column side of a WHERE clause can prevent index usage, leading to slower query performance. We corrected this by applying the COLLATE clause to the parameter side, allowing MySQL to leverage indexes effectively.

Instead of:

WHERE column_name COLLATE utf8mb4_unicode_ci = @parameter

We now use:

WHERE column_name = @parameter COLLATE utf8mb4_unicode_ci

Ensuring Deterministic Subqueries

A subquery with a LIMIT 1 clause but without an ORDER BY clause can return unpredictable results. To ensure consistent behavior, we added an ORDER BY clause to make the subquery deterministic.

For example, instead of:

SELECT id FROM table WHERE name = 'example' LIMIT 1

We now use:

SELECT id FROM table WHERE name = 'example' ORDER BY id ASC LIMIT 1

Tie-breakers for Deterministic Ordering

When using ORDER BY with LIMIT, ties in the ordering column can lead to non-deterministic results. We added a secondary sort column to ensure consistent ordering.

Instead of:

ORDER BY total_cost DESC

We now use:

ORDER BY total_cost DESC, product_id ASC

Coding Style and Exception Handling

We also addressed minor coding style issues, such as adding a space after the not operator (!) and using use statements for exception classes instead of fully qualified names. This improves code readability and maintainability.

For example, we changed:

if(!File::exists($file)) {
    throw new \Illuminate\Contracts\Filesystem\FileNotFoundException('File not found');
}

To:

use Illuminate\Contracts\Filesystem\FileNotFoundException;

if (! File::exists($file)) {
    throw new FileNotFoundException('File not found');
}

Rollback Strategy Documentation

For migrations that update stored procedures, we added comments to the down() method to document the rollback strategy, which typically involves manually executing a previous version of the SQL file.

public function down(): void
{
    // Procedure updates are not rolled back automatically.
    // To restore the previous version, manually execute the prior SQL file
    // or create a new migration with the old procedure definition.
}

Handling NULL Values in Comparisons

When comparing a nullable column to a value, we need to handle NULL values explicitly to avoid unexpected filtering. We updated conditions like column != 'value' to (column IS NULL OR column != 'value').

Instead of:

WHERE column != 'flexible'

We now use:

WHERE (column IS NULL OR column != 'flexible')

Preserving Product Identity in Aggregations

To prevent merging distinct products with identical names during aggregations, we included the product_id in the GROUP BY clause of relevant CTEs. This ensures that each product is treated as a unique entity.

Instead of:

GROUP BY product_name, category

We now use:

GROUP BY product_id, product_name, category

Accounting for Multi-Year Rates

We identified an issue where rates were joined based only on the start year of the report, leading to incorrect cost calculations for multi-year reports. We corrected this by joining rates based on the year of the transaction.

Instead of:

JOIN rates RT ON RT.year = YEAR(reportStartDate)

We now use:

JOIN rates RT ON RT.year = YEAR(date_of_transaction)

Preventing Recipe Double-Counting

We addressed an issue where recipes could be double-counted across the report window. To fix this, we joined recipes to sales data before aggregation, ensuring that each sale is linked to the correct recipe version based on its transaction date.

Handling Null Mass Values

To avoid null costs, we now handle null mass values by either filtering them out at the join or using COALESCE to replace them with 0 in calculations.

Conclusion

These improvements enhance the reliability and efficiency of our reporting queries. By addressing potential issues related to data consistency, query performance, and coding standards, we ensure that our application provides accurate and timely insights. Key takeaways include the importance of explicit imports, correct collation usage, deterministic queries, proper NULL handling, and accurate aggregation logic.

Gerardo Ruiz

Gerardo Ruiz

Author

Share: