Background
The Utilities company provides energy solutions to commercial, industrial, and retail customers. For large Commercial and Industrial (C&I) customers with complex energy needs, the company utilises a specialised pricing tool. This system generates customised pricing models and quotes for C&I customers based on factors like usage data, spot prices, and predefined pricing parameters. The tool was built using legacy technologies and ingested data from various sources, such as the CRM system and meter data, to generate pricing models. However, the tool was being deprecated and was reaching its end of service life.
Challenges
- Implementing complex ETL processes for diverse data objects (tables and views)
- Managing schema changes and data type conversions across multiple environments
- Orchestrating a multi-stage deployment pipeline (UAT, preproduction, production)
- Ensuring data consistency and integrity during transformations
- Optimizing performance for large-scale data processing
- Implementing Slowly Changing Dimension (SCD) type validation
- Integrating multiple data sources (Infoserver, EDS, BDS) into a unified system
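To make the SCD validation challenge above concrete, the following is a minimal sketch of the kind of check a Type 2 dimension might need. It assumes hypothetical column names (key, valid_from, valid_to, is_current) and half-open validity intervals; the project's actual rules and schema are not described in this case study.

```python
from collections import defaultdict
from datetime import date


def validate_scd2(rows):
    """Return a list of violations for SCD Type 2 rows.

    Each row is a dict with hypothetical keys: key, valid_from,
    valid_to (None for the open-ended version), and is_current.
    """
    violations = []

    # Group all versions of a record by its business key.
    by_key = defaultdict(list)
    for row in rows:
        by_key[row["key"]].append(row)

    for key, versions in by_key.items():
        # Rule 1: exactly one version may be flagged as current.
        current = [v for v in versions if v["is_current"]]
        if len(current) != 1:
            violations.append(
                f"{key}: expected 1 current row, found {len(current)}"
            )

        # Rule 2: validity intervals [valid_from, valid_to) must not overlap.
        ordered = sorted(versions, key=lambda v: v["valid_from"])
        for prev, nxt in zip(ordered, ordered[1:]):
            if prev["valid_to"] is None or prev["valid_to"] > nxt["valid_from"]:
                violations.append(
                    f"{key}: overlapping validity at {nxt['valid_from']}"
                )

    return violations


# Illustrative data only: one closed version followed by one current version.
history = [
    {"key": "C1", "valid_from": date(2023, 1, 1),
     "valid_to": date(2023, 6, 1), "is_current": False},
    {"key": "C1", "valid_from": date(2023, 6, 1),
     "valid_to": None, "is_current": True},
]
print(validate_scd2(history))
```

In a Databricks setting the same two rules would more likely be expressed as SQL or DataFrame aggregations; plain Python is used here only to keep the sketch self-contained.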
Solutions
- Developed modular transformation logic using SQL and Databricks
- Implemented a metadata-driven approach using YAML configuration files
- Utilized Azure DevOps for version control and CI/CD pipeline management
- Leveraged AWS S3 for artifact and metadata storage
- Employed Databricks for data processing and Delta Lake for versioned storage
- Implemented data validation and unit testing processes
- Developed custom DDL runners for environment-specific deployments
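As an illustration of how a metadata-driven approach and an environment-specific DDL runner could fit together, here is a minimal sketch. The table definition, the schema names, and the render_ddl function are all hypothetical; the metadata is shown as the dictionary a YAML parser would produce, so the sketch needs no third-party packages.

```python
# Hypothetical metadata for one data object, as it might look once a
# YAML configuration file has been parsed into a dictionary.
TABLE_CONFIG = {
    "name": "customer_usage",
    "columns": [
        {"name": "customer_id", "type": "STRING", "nullable": False},
        {"name": "usage_kwh", "type": "DECIMAL(18,4)"},
        {"name": "read_date", "type": "DATE"},
    ],
}

# Hypothetical mapping from deployment environment to target schema.
ENV_SCHEMAS = {
    "uat": "pricing_uat",
    "preprod": "pricing_preprod",
    "prod": "pricing",
}


def render_ddl(table, env):
    """Build a CREATE TABLE statement for the given environment."""
    if env not in ENV_SCHEMAS:
        raise ValueError(f"unknown environment: {env}")

    column_lines = []
    for col in table["columns"]:
        # Columns are nullable unless the metadata says otherwise.
        null_clause = "" if col.get("nullable", True) else " NOT NULL"
        column_lines.append(f"  {col['name']} {col['type']}{null_clause}")

    columns_sql = ",\n".join(column_lines)
    return (
        f"CREATE TABLE IF NOT EXISTS {ENV_SCHEMAS[env]}.{table['name']} (\n"
        f"{columns_sql}\n"
        f") USING DELTA"
    )


# Render the same object definition for each stage of the pipeline.
for env in ("uat", "preprod", "prod"):
    print(render_ddl(TABLE_CONFIG, env))
    print()
```

Keeping the object definition in metadata means one definition can be promoted through UAT, preproduction, and production, with only the environment mapping changing between stages.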
Results
- Successfully deployed 40+ data objects to the production environment
- Implemented efficient ETL processes for real-time and batch data ingestion
- Achieved data consistency across multiple systems (Infoserver, EDS, BDS)
- Enhanced data accessibility for business users with appropriate privileges
- Improved monitoring and alerting capabilities for data pipelines
- Optimized query performance through proper data type casting and indexing
- Implemented event-driven architecture for Thomson Reuters data ingestion