Adopting the Lakehouse as an accelerator for modernizing supply chain analytics.
Slalom worked with one customer that distributed thousands of everyday consumer products to over 100 countries globally, helping communities around the world lead healthier lives. Ensuring the right products reached those in need in a timely fashion required a robust data and analytics system capable of providing insights from the production line through delivery on store shelves.
When it came time to modernize their analytics stack, the supply chain analytics team enlisted the help of Slalom, seeking to partner with a global leader in strategy and technology consulting. Together, we adopted Microsoft Azure and the Databricks Lakehouse for Healthcare and Life Sciences as the foundation to modernize their technology infrastructure, resulting in both automated shipment tracking processes and more accurate forecasting.
Our customer’s performance team relies on On-Time In-Full (OTIF) and Proof of Delivery metrics as key pieces of their supply chain performance analytics; however, their primary shipping carrier only allowed automated data access through a complex API that returned heavily nested XML with an unpredictable schema. As a result, our customer was manually tracking shipments to feed OTIF and relying on a third party to deliver Proof of Delivery more than a month later, costing them valuable time to resolve shipment issues or improve supply chain performance while goods were still in transit.

Our solution was to build a data pipeline in Python on Databricks that interacts with the carrier's API, crawls through the XML structure, normalizes it into a flat and predictable schema, and outputs the data into a table that can be pulled into several key dashboards for stakeholder use. It is critical that this data is reliably updated throughout the day as shipments are delivered. Because of the complex nesting and often unpredictable XML structure (tags change, important data elements shift in the tree, API responses sometimes contain malformed XML), we built a number of pipeline components that handle and log exceptions, flatten the data into a reliable structure, and interact with supporting services such as data visualization tools. A key requirement of our solution was that each component could be developed independently (sometimes in different notebooks) and still interact seamlessly from the pipeline's main entry point.
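To make the flattening step concrete, here is a minimal sketch of how such a component could look; the carrier endpoint, XML tag names, secret scope, and table names are hypothetical placeholders, and the production pipeline includes far more extensive exception handling and logging.

```python
# Hypothetical sketch: pull tracking events from the carrier API, flatten the
# nested XML, and land the result in a table for downstream dashboards.
import logging
import requests
import xml.etree.ElementTree as ET

logger = logging.getLogger("shipment_tracking")

def flatten(element, prefix="", row=None):
    """Recursively walk an XML element and collapse it into a flat dict."""
    row = {} if row is None else row
    for child in element:
        key = f"{prefix}{child.tag}"
        if len(child):                      # nested element -> recurse
            flatten(child, prefix=f"{key}_", row=row)
        else:
            row[key] = (child.text or "").strip()
    return row

def fetch_shipments(api_url, api_key):
    """Call the carrier API and return one flat dict per shipment element."""
    response = requests.get(api_url, headers={"Authorization": api_key}, timeout=60)
    response.raise_for_status()
    try:
        root = ET.fromstring(response.text)
    except ET.ParseError as exc:            # malformed XML happens; log and move on
        logger.error("Malformed XML response: %s", exc)
        return []
    return [flatten(shipment) for shipment in root.iter("Shipment")]

# `spark` and `dbutils` are provided automatically in Databricks notebooks.
rows = fetch_shipments(
    "https://api.example-carrier.com/track",          # placeholder endpoint
    dbutils.secrets.get("supply-chain", "carrier-key") # placeholder secret
)
if rows:
    spark.createDataFrame(rows).write.mode("append").saveAsTable("supply_chain.shipment_events")
```

Keeping the fetch, flatten, and write steps as separate functions is what lets each piece live in its own notebook and be orchestrated from a single entry point.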
Databricks natively supports Python and allows us to easily install the many libraries our solution needs. Databricks notebooks integrate with CI/CD pipelines and let us import core code, custom libraries, and classes as part of our solution. The pipeline runs on Azure Databricks, is scheduled through Azure Data Factory, and outputs the data to Azure Synapse. Databricks acts as the ideal middle layer, retrieving and transforming the data quickly and reliably.
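For context, the final hop from Databricks to Azure Synapse can be expressed with the Azure Synapse connector roughly as follows; the JDBC URL, storage path, and table names are placeholders, not the customer's actual configuration.

```python
# Illustrative write from Databricks to Azure Synapse using the Synapse
# (formerly SQL DW) connector; all connection details are placeholders.
df = spark.table("supply_chain.shipment_events")

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.shipment_events")
   .option("tempDir", "abfss://<container>@<storage-account>.dfs.core.windows.net/tmp")
   .mode("overwrite")
   .save())
```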
Following the shipment tracking automation, we shifted our focus to improving forecasting with the data science functionality of Databricks. This is critical because contracts impose fines whenever a forecast deviates from actuals by more than 20%, representing millions of dollars of potential risk. The prior forecasting method involved significant manual effort that slowed the process considerably and missed the demand patterns and behavior that shifted during the pandemic.
Our solution was to build a statistical time series model that incorporated the actual, erratic behavior. Databricks' suite of ML tools and native support for R helped in developing and evaluating the time series models. The exponential smoothing (ETS) model was chosen over the alternatives (ARIMA, TBATS) because it performed best in time series cross-validation. The model successfully weighted recent erratic behavior: ETS prioritizes the most recent data points and lets the importance of older observations decay exponentially (hence "exponential smoothing") while also capturing seasonality and trend.
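The production models were built in R; purely as an illustration of the selection step, a comparable ETS-versus-ARIMA comparison on a holdout window could be sketched in Python with statsmodels as below. The data file, seasonal period, holdout length, and model orders are all placeholders.

```python
# Illustrative model comparison: fit ETS and a simple seasonal ARIMA baseline
# on weekly demand, then score both on a held-out quarter.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_percentage_error

weekly_demand = pd.read_csv("demand.csv", index_col="week", parse_dates=True)["units"]

train, test = weekly_demand[:-13], weekly_demand[-13:]   # hold out one quarter

ets = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=52).fit()
arima = ARIMA(train, order=(1, 1, 1), seasonal_order=(1, 0, 1, 52)).fit()

print("ETS MAPE:  ", mean_absolute_percentage_error(test, ets.forecast(len(test))))
print("ARIMA MAPE:", mean_absolute_percentage_error(test, arima.forecast(len(test))))
```

In practice the team used rolling-origin (time series) cross-validation rather than a single holdout, but the idea is the same: the model with the lowest out-of-sample error wins.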
The model runs on Azure Databricks as a weekly job, leveraging the built-in cron scheduling feature, and publishes its output to Azure Synapse using the Databricks Synapse connector. Running the process through Databricks reduced the manual work while harnessing the power of the Databricks cluster, which can be scaled with a single click.
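As a rough sketch of what that weekly schedule looks like, a job with a Quartz cron expression can be created through the Databricks Jobs API; the workspace URL, token, notebook path, cluster ID, and cron expression below are illustrative assumptions rather than the customer's actual settings.

```python
# Hypothetical sketch: create a weekly forecasting job via the Databricks Jobs API.
import requests

job_spec = {
    "name": "weekly-demand-forecast",
    "tasks": [{
        "task_key": "forecast",
        "notebook_task": {"notebook_path": "/Repos/supply-chain/forecast"},
        "existing_cluster_id": "<cluster-id>",
    }],
    # Quartz cron expression: 06:00 UTC every Monday
    "schedule": {"quartz_cron_expression": "0 0 6 ? * MON", "timezone_id": "UTC"},
}

resp = requests.post(
    "https://<workspace>.azuredatabricks.net/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```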
Supply chain organizations are modernizing their data and analytics capabilities to minimize the risk of lost sales, reduce carrying costs, and improve customer satisfaction. To help ensure your modernization effort is effective and produces the outcomes you want, start by identifying success measures and building an analytics roadmap that directly improves them.
Once the key performance measures have been identified, Slalom ‘works backward’ in partnership with our customers to define the user journey, data visualization design, requisite data, and AI, and to operationalize the end-to-end pipeline. Supply chain organizations are adopting a dedicated environment known as a Control Tower: a single source of data for the business that reduces manual data wrangling and analysis by bringing ingestion, governance, analysis, predictive modeling, visualization, and business insights together on one platform.