Enhancing Data Integrity and Performance in Reporting Queries
Introduction
Recent code reviews have highlighted several opportunities to improve the robustness, performance, and maintainability of our application's reporting queries. These changes focus on ensuring data consistency, optimizing query execution, and adhering to coding standards.
Addressing Potential Issues
Explicit Facade Imports
We addressed an issue where facades (like File and DB) were used without explicit use statements. Explicit imports improve code clarity and prevent potential issues if facade aliases are disabled. This ensures the code adheres to our coding guidelines:
<?php
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\File;
// ...
File::exists('example.txt');
DB::table('app_config')->get();
Optimizing Collation Usage
Applying collation to the column side of a WHERE clause can prevent index usage, leading to slower query performance. We corrected this by applying the COLLATE clause to the parameter side, allowing MySQL to leverage indexes effectively.
Instead of:
WHERE column_name COLLATE utf8mb4_unicode_ci = @parameter
We now use:
WHERE column_name = @parameter COLLATE utf8mb4_unicode_ci
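In a fuller query this looks roughly as follows (the products table, its columns, and @searchName are illustrative stand-ins, not our actual report schema):
SELECT product_id, product_name
FROM products
-- The COLLATE clause sits on the parameter side, so an index on product_name remains usable.
WHERE product_name = @searchName COLLATE utf8mb4_unicode_ci;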
Ensuring Deterministic Subqueries
A subquery with a LIMIT 1 clause but without an ORDER BY clause can return unpredictable results. To ensure consistent behavior, we added an ORDER BY clause to make the subquery deterministic.
For example, instead of:
SELECT id FROM table WHERE name = 'example' LIMIT 1
We now use:
SELECT id FROM table WHERE name = 'example' ORDER BY id ASC LIMIT 1
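As a sketch of where this matters in practice (orders, products, and their columns are assumed names), the deterministic subquery can serve as a scalar lookup inside a larger report query:
SELECT o.id, o.quantity
FROM orders o
-- The ORDER BY guarantees the scalar subquery resolves to the same product row on every run.
WHERE o.product_id = (SELECT id FROM products WHERE name = 'example' ORDER BY id ASC LIMIT 1);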
Tie-breakers for Deterministic Ordering
When using ORDER BY with LIMIT, ties in the ordering column can lead to non-deterministic results. We added a secondary sort column to ensure consistent ordering.
Instead of:
ORDER BY total_cost DESC
We now use:
ORDER BY total_cost DESC, product_id ASC
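A minimal sketch of the full statement, with an assumed product_costs table standing in for the real one:
SELECT product_id, product_name, total_cost
FROM product_costs
-- product_id acts as the tie-breaker when several products share the same total_cost.
ORDER BY total_cost DESC, product_id ASC
LIMIT 10;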
Coding Style and Exception Handling
We also addressed minor coding style issues, such as adding a space after the not operator (!) and using use statements for exception classes instead of fully qualified names. This improves code readability and maintainability.
For example, we changed:
if(!File::exists($file)) {
throw new \Illuminate\Contracts\Filesystem\FileNotFoundException('File not found');
}
To:
use Illuminate\Contracts\Filesystem\FileNotFoundException;
if (! File::exists($file)) {
throw new FileNotFoundException('File not found');
}
Rollback Strategy Documentation
For migrations that update stored procedures, we added comments to the down() method to document the rollback strategy, which typically involves manually executing a previous version of the SQL file.
public function down(): void
{
// Procedure updates are not rolled back automatically.
// To restore the previous version, manually execute the prior SQL file
// or create a new migration with the old procedure definition.
}
Handling NULL Values in Comparisons
When comparing a nullable column to a value, we need to handle NULL values explicitly: in SQL, column != 'value' evaluates to NULL (not TRUE) when the column is NULL, so those rows are silently filtered out. We therefore updated conditions like column != 'value' to (column IS NULL OR column != 'value').
Instead of:
WHERE column != 'flexible'
We now use:
WHERE (column IS NULL OR column != 'flexible')
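Put into a complete query with an assumed packaging_type column (illustrative, not our actual schema):
SELECT product_id, packaging_type
FROM products
-- The IS NULL branch keeps rows whose packaging_type was never set;
-- a bare != 'flexible' evaluates to NULL for those rows and silently drops them.
WHERE (packaging_type IS NULL OR packaging_type != 'flexible');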
Preserving Product Identity in Aggregations
To prevent merging distinct products with identical names during aggregations, we included the product_id in the GROUP BY clause of relevant CTEs. This ensures that each product is treated as a unique entity.
Instead of:
GROUP BY product_name, category
We now use:
GROUP BY product_id, product_name, category
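An illustrative CTE (the sales table and its columns are assumptions) shows the effect: two distinct products that happen to share a name stay on separate rows:
WITH product_totals AS (
    SELECT product_id, product_name, category, SUM(cost) AS total_cost
    FROM sales
    -- Leading with product_id prevents identically named products from being merged.
    GROUP BY product_id, product_name, category
)
SELECT product_id, product_name, category, total_cost
FROM product_totals;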
Accounting for Multi-Year Rates
We identified an issue where rates were joined using only the report's start year, so every transaction in a multi-year report was costed at the first year's rate. We corrected this by joining rates on the year of each individual transaction.
Instead of:
JOIN rates RT ON RT.year = YEAR(reportStartDate)
We now use:
JOIN rates RT ON RT.year = YEAR(date_of_transaction)
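A minimal sketch, assuming illustrative sales and rates tables, shows each row picking up the rate for its own year:
SELECT S.sale_id,
       S.date_of_transaction,
       -- Each sale is costed with the rate of its own transaction year,
       -- so a report spanning several years applies the correct rate per row.
       S.quantity * RT.rate AS cost
FROM sales S
JOIN rates RT ON RT.year = YEAR(S.date_of_transaction);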
Preventing Recipe Double-Counting
We addressed an issue where recipes could be double-counted across the report window. To fix this, we joined recipes to sales data before aggregation, ensuring that each sale is linked to the correct recipe version based on its transaction date.
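A hedged sketch of the approach (the sales and recipes tables and their effective-date columns are assumptions): each sale is matched to the single recipe version in effect on its transaction date, and only then aggregated:
WITH sales_with_recipe AS (
    SELECT S.sale_id,
           S.product_id,
           S.date_of_transaction,
           R.recipe_id
    FROM sales S
    JOIN recipes R
      ON R.product_id = S.product_id
     -- Only the version effective on the transaction date matches,
     -- so a single sale can never join to two recipe versions.
     AND S.date_of_transaction >= R.effective_from
     AND (R.effective_to IS NULL OR S.date_of_transaction < R.effective_to)
)
SELECT product_id, recipe_id, COUNT(*) AS sales_count
FROM sales_with_recipe
GROUP BY product_id, recipe_id;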
Handling Null Mass Values
To avoid NULL costs (any arithmetic involving a NULL mass yields NULL), we now handle missing mass values by either filtering them out at the join or using COALESCE to replace them with 0 in calculations.
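Both options look roughly like this sketch (the ingredients table and mass_kg column are assumptions):
-- Option 1: COALESCE replaces a missing mass with 0 so the cost never becomes NULL.
SELECT S.sale_id,
       COALESCE(I.mass_kg, 0) * RT.rate AS ingredient_cost
FROM sales S
LEFT JOIN ingredients I ON I.product_id = S.product_id
JOIN rates RT ON RT.year = YEAR(S.date_of_transaction);

-- Option 2: filter rows with no recorded mass out at the join instead.
SELECT S.sale_id,
       I.mass_kg * RT.rate AS ingredient_cost
FROM sales S
JOIN ingredients I ON I.product_id = S.product_id AND I.mass_kg IS NOT NULL
JOIN rates RT ON RT.year = YEAR(S.date_of_transaction);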
Conclusion
These improvements enhance the reliability and efficiency of our reporting queries. By addressing potential issues related to data consistency, query performance, and coding standards, we ensure that our application provides accurate and timely insights. Key takeaways include the importance of explicit imports, correct collation usage, deterministic queries, proper NULL handling, and accurate aggregation logic.