Optimizing the price of retail products

1. Introduction

Last Updated: 2021-09-15

The data required to drive pricing insight and optimization is disparate by nature (different systems, different local realities, etc.), so it is crucial to develop a well-structured, standardized, and clean common data model (CDM) table. This table includes the key attributes for pricing optimization, such as transactions, products, prices, and customers. In this document, we walk you through the steps outlined below, providing a quick start for pricing analytics that you can extend and customize for your own needs. The following diagram outlines the steps covered in this document.


  1. Assess data sources: First, you take an inventory of the data sources that will be used to create the CDM. In this step, Dataprep is also used to explore the input data and identify issues such as missing and mismatched values, inconsistent naming conventions, duplicates, data integrity problems, and outliers.
  2. Standardize data: Next, previously identified issues are fixed to ensure data accuracy, integrity, consistency, and completeness. This process can involve various transformations in Dataprep, such as date formatting, value standardization, unit conversion, filtering out unnecessary fields and values, and splitting, joining, or deduplicating the source data.
  3. Unify in one structure: The next stage of the pipeline joins each data source into a single, wide table in BigQuery containing all attributes at the finest granular level. This denormalized structure allows for efficient analytical queries that do not require joins.
  4. Deliver analytics & ML/AI: Once data is clean and formatted for analysis, analysts can explore historical data to understand the impact of prior pricing changes. Additionally, BigQuery ML can be used to create predictive models that estimate future sales. The output of these models can be incorporated into dashboards within Looker to create "what-if scenarios" where business users can analyze what sales may look like with certain price changes.
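The benefit of the single wide table in step 3 can be seen with a small, self-contained sketch. The column subset and values below are hypothetical, and SQLite stands in for BigQuery:

```python
import sqlite3

# Hypothetical mini version of the denormalized CDM_Pricing table: product,
# customer, price, and sales attributes all live in one wide row, so
# analytical queries need no joins.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table cdm_pricing (
        fiscal_date text, product_id text, client_id integer,
        invoiced_quantity_in_pieces real, net_sales real, list_price_converged real
    )""")
conn.executemany(
    "insert into cdm_pricing values (?, ?, ?, ?, ?, ?)",
    [("2020-01-01", "SKU-1", 15458, 100, 900.0, 10.0),
     ("2020-02-01", "SKU-1", 15458, 80, 760.0, 10.5),
     ("2020-01-01", "SKU-2", 16080, 50, 450.0, 9.0)])

# A join-free analytical query straight off the wide table.
rows = conn.execute("""
    select product_id, sum(net_sales)
    from cdm_pricing group by product_id order by product_id""").fetchall()
print(rows)  # [('SKU-1', 1660.0), ('SKU-2', 450.0)]
```

The same query against a normalized schema would need joins across transaction, product, and customer tables.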

The following diagram shows the Google Cloud components used to build the Pricing Optimization Analytics Pipeline.


What you'll build

Here we'll show you how to design a pricing optimization data warehouse, automate the data preparation over time, use machine learning to predict the impact of changes to product pricing, and develop reports to provide actionable insights to your team.

What you'll learn

  • How to connect Dataprep to data sources for pricing analytics, which can be stored in relational databases, flat files, Google Sheets, and other supported applications.
  • How to build a Dataprep flow to create a CDM table in your BigQuery data warehouse.
  • How to use BigQuery ML to predict future revenue.
  • How to build reports in Looker to analyze historical pricing and sales trends, and to understand the impact of future price changes.

What you'll need

2. Create the CDM in BigQuery

In this section, you create the Common Data Model (CDM), which provides a consolidated view of the information that you need to analyze and suggest pricing changes.

  1. Open the BigQuery console.
  2. Select the project you want to use to test this reference pattern.
  3. Use an existing dataset or create a BigQuery dataset. Name the dataset Pricing_CDM.
  4. Create the table:
create table `Pricing_CDM.CDM_Pricing` (
  Fiscal_Date DATETIME,
  Product_ID STRING,
  Client_ID INT64,
  Customer_Hierarchy STRING,
  Division STRING,
  Market STRING,
  Channel STRING,
  Customer_code INT64,
  Customer_Long_Description STRING,
  Key_Account_Manager INT64,
  Key_Account_Manager_Description STRING,
  Structure STRING,
  Invoiced_quantity_in_Pieces FLOAT64,
  Gross_Sales FLOAT64,
  Trade_Budget_Costs FLOAT64,
  Cash_Discounts_and_other_Sales_Deductions INT64,
  Net_Sales FLOAT64,
  Variable_Production_Costs_STD FLOAT64,
  Fixed_Production_Costs_STD FLOAT64,
  Other_Cost_of_Sales INT64,
  Standard_Gross_Margin FLOAT64,
  Transportation_STD FLOAT64,
  Warehouse_STD FLOAT64,
  Gross_Margin_After_Logistics FLOAT64,
  List_Price_Converged FLOAT64
);

3. Assess data sources

In this tutorial, you use sample data sources which are stored in Google Sheets and BigQuery.

  • The transactions Google Sheet, which contains one row for each transaction, with details like the quantity of each product sold, total gross sales, and associated costs.
  • The product pricing Google Sheet, which contains the price of each product for a given customer for each month.
  • The company_descriptions BigQuery table, which contains individual customer information.

This company_descriptions BigQuery table can be created using the following statement:

create table `Pricing_CDM.Company_Descriptions` (
 Customer_ID INT64,
 Customer_Long_Description STRING
);
insert into `Pricing_CDM.Company_Descriptions` values (15458, 'ENELTEN');
insert into `Pricing_CDM.Company_Descriptions` values (16080, 'NEW DEVICES CORP.');
insert into `Pricing_CDM.Company_Descriptions` values (19913, 'ENELTENGAS');
insert into `Pricing_CDM.Company_Descriptions` values (30108, 'CARTOON NT');
insert into `Pricing_CDM.Company_Descriptions` values (32492, 'Thomas Ed Automobiles');

4. Build the flow

In this step, you import a sample Dataprep flow, which you use to transform and unify the example datasets listed in the previous section. A flow is Dataprep's representation of a pipeline: an object that brings together datasets and the recipes used to transform and join them.

  1. Download the Pricing Optimization Pattern flow package from GitHub, but don't unzip it. This file contains the Pricing Optimization Design Pattern flow used to transform the sample data.
  2. In Dataprep, click the Flows icon in the left navigation bar. Then in the Flows view, select Import from the context menu. After you have imported the flow, you can select it to view and edit it.


  3. On the left side of the flow, the Product Pricing Google Sheet and each of the three Transactions Google Sheets must be connected as datasets. To do this, right-click each Google Sheets dataset object and select Replace. Then click the Import Datasets link, and click the "Edit path" pencil, as shown in the following diagram.


Replace the current value with the link pointing to the transactions and product pricing Google Sheets.

When a Google Sheet contains several tabs, you can select the tab you want to use from the menu. Click Edit and select the tabs that you want to use as the data source, then click Save and click Import & Add to Flow. When you are back in the modal, click Replace. In this flow, each sheet is represented as its own dataset to demonstrate unioning disparate sources in a later recipe.


  4. Define the BigQuery output table:

In this step, you associate the BigQuery CDM_Pricing output table with the flow so that the table is loaded every time you run the Dataprep job.

In the Flow View, click the Schema Mapping Output icon. In the Details Panel, click the Destinations tab. From there, edit both the Manual Destinations output, used for testing, and the Scheduled Destinations output, used when you want to automate your entire flow. Follow these instructions to do so:

  1. Edit the "Manual destinations": In the Details Panel, under the Manual Destinations section, click the Edit button. On the Publishing Settings page, under Publishing Actions, edit the existing publishing action if one exists; otherwise, click the Add Action button. From there, navigate the BigQuery datasets to the Pricing_CDM dataset you created in a previous step and select the CDM_Pricing table. Confirm that Append to this table every run is checked, click Add, and then click Save Settings.
  2. Edit the "Scheduled destinations": In the Details Panel, under the Scheduled Destinations section, click Edit. The settings are inherited from the manual destinations, so you don't need to make any changes. Click Save Settings.

5. Standardize data

The provided flow unions, formats, and cleans the transactions data, and then joins the result with the company descriptions and aggregated pricing data for reporting. The following sections walk through the components of the flow, which can be seen in the image below.


6. Explore Transactional Data Recipe

First, you will explore what happens within the Transactional Data Recipe, which is used to prepare the transactions data. Click the Transaction Data object in the Flow View, then in the Details Panel, click the Edit Recipe button.

The Transformer Page opens with the Recipe presented in the Details Panel. The Recipe contains all the transformation steps that are applied to the data. You can navigate within the Recipe by clicking between the steps to see the state of the data at that point in the Recipe.

You can also click the More menu for each Recipe step and select Go to Selected or Edit to explore how the transformation works.

  1. Union Transactions: The first step in the transactional data recipe unions the transactions stored in different sheets, each representing one month.
  2. Standardize Customer Descriptions: The next step in the recipe standardizes customer descriptions. Customer names that refer to the same company may differ slightly, and we want to normalize them to one name. The recipe demonstrates two potential approaches. First, it leverages the Standardization Algorithm, which can be configured with different standardization options such as "Similar strings", where values with characters in common are clustered together, or "Pronunciation", where values that sound alike are clustered together. Alternatively, you can look up the company description in the BigQuery table referenced above, using the company ID.

You can navigate further into the recipe to discover the various other techniques applied to clean and format the data: deleting rows, formatting based on patterns, enriching data with lookups, dealing with missing values, and replacing unwanted characters.
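As a rough illustration of the "Similar strings" idea (not Dataprep's actual algorithm), similar customer names can be clustered with a simple character-similarity threshold:

```python
from difflib import SequenceMatcher

# Toy version of "similar strings" standardization: each name is assigned to
# the first previously seen name whose similarity ratio clears a threshold.
# The 0.7 threshold is arbitrary; Dataprep's real algorithm differs.
def standardize(names, threshold=0.7):
    canonical = []   # first-seen representative of each cluster
    mapping = {}
    for name in names:
        for rep in canonical:
            if SequenceMatcher(None, name.lower(), rep.lower()).ratio() >= threshold:
                mapping[name] = rep
                break
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping

result = standardize(["ENELTEN", "Enelten Corp", "NEW DEVICES CORP.", "New Devices Corp"])
print(result["Enelten Corp"])      # ENELTEN
print(result["New Devices Corp"])  # NEW DEVICES CORP.
```

The lookup-based alternative mentioned above avoids the threshold tuning entirely when a reliable company ID exists.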

7. Explore Product Pricing Data Recipe

Next, you can explore what happens in the Product Pricing Data Recipe, which joins the prepared transactions data onto the aggregated pricing data.

Click PRICING OPTIMIZATION DESIGN PATTERN at the top of the page to close the Transformer Page and return to the Flow View. From there, click the Product Pricing Data object and edit the recipe.

  1. Unpivot monthly price columns: Click the recipe between steps 2 and 3 to see what the data looks like before the Unpivot step. You'll notice that the data contains the transaction value in a distinct column for each month: Jan, Feb, Mar. This format is not convenient for applying aggregation calculations (for example, sum or average transaction) in SQL. The data needs to be unpivoted so that each column becomes a row in the BigQuery table. The recipe leverages the unpivot function to transform the three columns into one row per month, which makes group calculations easier to apply later.
  2. Calculate the average transaction value by client, product, and date: We want to calculate the average transaction value for each client, product, and date. We can use the Aggregate function to generate a new table (the "Group by as a new table" option). In that case, the data is aggregated at the group level and we lose the details of each individual transaction. Alternatively, we can keep both the details and the aggregated values in the same dataset (the "Group by as a new column(s)" option), which is very convenient for computing ratios (for example, the % contribution of a product category to overall revenue). You can try this behavior by editing recipe step 7 and switching between the "Group by as a new table" and "Group by as a new column(s)" options to see the difference.
  3. Join pricing data: Finally, a join is used to combine multiple datasets into a larger one by adding columns to the initial dataset. In this step, the pricing data is joined with the output of the Transactional Data Recipe based on 'Pricing Data.Product Code' = 'Transaction Data.SKU' and 'Pricing Data.Price Date' = 'Transaction Data.Fiscal Date'.
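The unpivot and aggregation steps above can be sketched in a few lines of Python (month columns and values are hypothetical):

```python
from collections import defaultdict

# Unpivot: one row with a price column per month becomes one row per
# (product, month), ready for group-by calculations.
wide_rows = [
    {"product_id": "SKU-1", "Jan": 10.0, "Feb": 10.5, "Mar": 11.0},
    {"product_id": "SKU-2", "Jan": 9.0, "Feb": 9.0, "Mar": 9.5},
]
long_rows = [
    {"product_id": r["product_id"], "month": m, "price": r[m]}
    for r in wide_rows
    for m in ("Jan", "Feb", "Mar")
]
print(len(long_rows))  # 6 -> 2 products x 3 months

# Aggregation is now a plain group-by: average price per product.
groups = defaultdict(list)
for row in long_rows:
    groups[row["product_id"]].append(row["price"])
averages = {pid: sum(v) / len(v) for pid, v in groups.items()}
print(averages["SKU-1"])  # 10.5
```

Keeping the detail rows alongside the aggregates (the "new column(s)" option) is what lets you compute ratios like a row's share of its group total.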

To learn more about the transformations you can apply with Dataprep, see the Trifacta Data Wrangling Cheat Sheet.

8. Explore Schema Mapping Recipe

The last recipe, Schema Mapping, ensures that the resulting CDM table matches the existing BigQuery output table's schema. Here, the Rapid Target functionality is used to reformat the data structure to match the BigQuery table, using fuzzy matching to compare both schemas and apply automatic changes.
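As a rough sketch of what fuzzy schema matching does (Dataprep's real matcher is more sophisticated and also considers types and content), source columns can be mapped to the closest target column names. The column lists below are illustrative:

```python
from difflib import get_close_matches

# Toy fuzzy schema matcher: normalize each source column name, then pick the
# closest column name in the target CDM schema.
target_schema = ["Fiscal_Date", "Product_ID", "Client_ID", "Net_Sales"]
source_columns = ["fiscal date", "product id", "client", "net sales total"]

mapping = {}
for col in source_columns:
    candidate = col.replace(" ", "_").title()   # "fiscal date" -> "Fiscal_Date"
    matches = get_close_matches(candidate, target_schema, n=1, cutoff=0.6)
    mapping[col] = matches[0] if matches else None

print(mapping["client"])  # Client_ID
```

Columns with no match above the cutoff map to None, which is where a manual mapping step would take over.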

9. Unify in one structure

Now that sources and destinations have been configured, and the steps of the flows have been explored, you can run the flow to transform and load the CDM table into BigQuery.

  1. Run Schema Mapping output: In the flow view, select the Schema Mapping output object and click the "Run" button in the Details panel. Select "Trifacta Photon" Running Environment and uncheck Ignore Recipe Errors. Then click the Run button. If the specified BigQuery table exists, Dataprep will append new rows, otherwise, it will create a new table.
  2. View the job status: Dataprep automatically opens the Run Job page so you can monitor the job execution. It should take a few minutes to process and load the BigQuery table. When the job is completed, the pricing CDM output is loaded in BigQuery in a clean, structured, and normalized format ready for analysis.

10. Deliver analytics & ML/AI

Analytics Prerequisites

To run analytics and a predictive model with interesting outcomes, we have created a larger, more relevant dataset for discovering specific insights. You need to upload this data into your BigQuery dataset before continuing with this guide.

Click the + Create Table button and define these parameters:

  • Create table from: Upload, then select the CDM_Pricing_Large_Table.csv file
  • Schema: check Auto detect (Schema and input parameters)
  • Advanced options > Write preference: select Overwrite table
  • Click Create table

After the table is created and the data uploaded, in the Google Cloud console for BigQuery, you should see the details of the new table as shown below. With the pricing data in BigQuery, you can ask more comprehensive questions to analyze your pricing data at a deeper level.


11. View the effect of pricing changes

One example of something you may want to analyze is a change in order behavior when you have previously changed the price of an item.

  1. First, you create a temporary table that has one row for each time the price of a product changed, with information about that particular product's pricing, like how many items were ordered at each price and the total net sales associated with that price.
create temp table price_changes as (
    select *,
       lag(list_price_converged) over(partition by product_id order by first_price_date asc) as previous_list,
       lag(total_ordered_pieces) over(partition by product_id order by first_price_date asc) as previous_total_ordered_pieces,
       lag(total_net_sales) over(partition by product_id order by first_price_date asc) as previous_total_net_sales,
       lag(first_price_date) over(partition by product_id order by first_price_date asc) as previous_first_price_date
    from (
        select product_id, list_price_converged, sum(invoiced_quantity_in_pieces) as total_ordered_pieces, sum(net_sales) as total_net_sales, min(fiscal_date) as first_price_date
        from `{{my_project}}.{{my_dataset}}.CDM_Pricing` AS cdm_pricing
        group by 1,2
        order by 1, 2 asc
    )
);
select * from price_changes where previous_list is not null order by product_id, first_price_date desc;
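The lag() window pattern used by price_changes can be tried locally; SQLite (3.25+, bundled with recent Python versions) supports the same window function, shown here with made-up prices:

```python
import sqlite3

# For each product, pair every price with the previous price using lag()
# over a window ordered by date, mirroring the price_changes temp table.
conn = sqlite3.connect(":memory:")
conn.execute("create table prices (product_id text, first_price_date text, list_price real)")
conn.executemany("insert into prices values (?, ?, ?)", [
    ("SKU-1", "2020-01-01", 10.0),
    ("SKU-1", "2020-03-01", 11.0),
    ("SKU-1", "2020-06-01", 10.5),
])
rows = conn.execute("""
    select product_id, first_price_date, list_price,
           lag(list_price) over (partition by product_id
                                 order by first_price_date) as previous_list
    from prices order by first_price_date""").fetchall()
print(rows[0])  # ('SKU-1', '2020-01-01', 10.0, None) - no previous price yet
print(rows[1])  # ('SKU-1', '2020-03-01', 11.0, 10.0)
```

The first row of each partition has no predecessor, which is why the BigQuery query filters on `previous_list is not null`.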


  2. Next, with the temporary table in place, you can calculate the average price change across SKUs:
select avg((previous_list-list_price_converged)/nullif(previous_list,0))*100 as average_price_change from price_changes;
  3. Finally, you can analyze what happens after a price change by looking at the relationship between each price change and the total quantity of items ordered:
select
       product_id,
       first_price_date,
       previous_list,
       list_price_converged,
       previous_total_ordered_pieces,
       total_ordered_pieces
from price_changes
order by product_id, first_price_date;

12. Build a time series forecasting model

Next, with BigQuery's built-in machine learning capabilities, you can build an ARIMA time series forecasting model to predict the quantity of each item that will be sold.

  1. First, you create an ARIMA_PLUS model:
create or replace model `{{my_project}}.{{my_dataset}}.bqml_arima`
options
 (model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'fiscal_date',
  time_series_data_col = 'total_quantity',
  time_series_id_col = 'product_id',
  auto_arima = TRUE,
  data_frequency = 'AUTO_FREQUENCY',
  decompose_time_series = TRUE
 ) as
select fiscal_date, product_id,
 sum(invoiced_quantity_in_pieces) as total_quantity
from `{{my_project}}.{{my_dataset}}.CDM_Pricing`
group by 1,2;
  2. Next, you use the ML.FORECAST function to predict future sales for each product:
select *
from ML.FORECAST(model `{{my_project}}.{{my_dataset}}.bqml_arima`,
            struct(30 as horizon, 0.8 as confidence_level));
  3. With these predictions available, you can try to understand what might happen if you raise prices. For example, if you raise the price of every product by 15%, you could calculate the estimated total revenue for the next month with a query like this:
select
 sum(forecast_value * list_price * 1.15) as total_revenue
from ml.forecast(model `{{my_project}}.{{my_dataset}}.bqml_arima`,
            struct(30 as horizon, 0.8 as confidence_level)) forecasts
left join (select product_id,
                array_agg(list_price_converged
                       order by fiscal_date desc limit 1)[offset(0)] as list_price
                from `{{my_project}}.{{my_dataset}}.CDM_Pricing` group by 1) recent_prices
using (product_id);
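As a back-of-envelope check of that query's arithmetic, with hypothetical forecast quantities and prices:

```python
# Naive recreation of the query's logic: forecast quantity per product times
# its latest list price, raised by 15%. All numbers are made up, and demand
# elasticity (fewer orders at higher prices) is deliberately ignored.
forecast_qty = {"SKU-1": 120.0, "SKU-2": 45.0}   # stand-in for ML.FORECAST output
recent_price = {"SKU-1": 10.5, "SKU-2": 9.0}     # latest list_price_converged

total_revenue = sum(qty * recent_price[pid] * 1.15
                    for pid, qty in forecast_qty.items())
print(round(total_revenue, 2))  # 1914.75
```

A more realistic what-if model would also scale the forecast quantity down as prices rise, which is what the dashboard parameters in the next section let business users experiment with.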

13. Build a report

Now that your denormalized pricing data is centralized in BigQuery, and you understand how to run meaningful queries against this data, it's time to build a report that allows business users to explore and act on this information.

If you already have a Looker instance you can use the LookML in this GitHub repository to get started analyzing the pricing data for this pattern. Simply create a new Looker project, add the LookML, and replace the connection and table names in each of the view files to match your BigQuery configuration.

In this model, you will find the derived table (in this view file) that we showed earlier to examine price changes:

view: price_changes {
    derived_table: {
      sql: select *,
        lag(list_price_converged) over(partition by product_id order by first_price_date asc) as previous_list,
        lag(total_ordered_pieces) over(partition by product_id order by first_price_date asc) as previous_total_ordered_pieces,
        lag(total_net_sales) over(partition by product_id order by first_price_date asc) as previous_total_net_sales,
        lag(first_price_date) over(partition by product_id order by first_price_date asc) as previous_first_price_date
        from (
          select product_id, list_price_converged, sum(invoiced_quantity_in_pieces) as total_ordered_pieces, sum(net_sales) as total_net_sales, min(fiscal_date) as first_price_date
          from ${cdm_pricing.SQL_TABLE_NAME} AS cdm_pricing
          group by 1,2
          order by 1, 2 asc
        ) ;;
    }
  }

You will also find the BigQuery ML ARIMA model we showed earlier, used to predict future sales (in this view file):

view: arima_model {
  derived_table: {
    persist_for: "24 hours"
    sql_create:
      create or replace model ${sql_table_name}
      options
              (model_type = 'arima_plus',
               time_series_timestamp_col = 'fiscal_date',
               time_series_data_col = 'total_quantity',
               time_series_id_col = 'product_id',
               auto_arima = true,
               data_frequency = 'auto_frequency',
               decompose_time_series = true
              ) as
      select fiscal_date, product_id,
             sum(invoiced_quantity_in_pieces) as total_quantity
      from ${cdm_pricing.SQL_TABLE_NAME}
      group by 1,2 ;;
  }
}

The LookML also contains a sample dashboard. You can access a demo version of the dashboard here. The first part of the dashboard gives users high-level information about changes in sales, costs, pricing, and margins. As a business user, you may want to create an alert to know when sales have dropped below X%, as this may mean you should lower prices.


The next section, shown below, allows users to dig into trends surrounding pricing changes. Here, you can drill down into specific products to see the exact list price and what prices were changed to, which can be helpful for pinpointing specific products to research further.


Finally, at the bottom of the report, you have the results of our BigQuery ML model. Using the filters at the top of the Looker dashboard, you can easily enter parameters to simulate different scenarios, similar to those described above. For example, you can see what would happen if order volume went down to 75% of the predicted value and pricing across all products was raised by 25%, as shown below.


This is powered by parameters in LookML, which are then incorporated directly into the measure calculations found here. With this type of reporting, you can find the optimal pricing for all products or drill into specific products to determine where you should raise or discount prices and what the outcome on the gross and net revenue would be.
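The scenario math behind those parameters reduces to simple scaling of the forecast; a hypothetical sketch:

```python
# What-if scenario like the dashboard filters: volume at 75% of the forecast
# and all prices raised by 25%. The (quantity, price) pairs are made up.
forecast = [(120.0, 10.5), (45.0, 9.0)]  # (predicted quantity, list price)
volume_factor, price_factor = 0.75, 1.25

baseline_revenue = sum(q * p for q, p in forecast)
scenario_revenue = sum(q * volume_factor * p * price_factor for q, p in forecast)
print(baseline_revenue)  # 1665.0
print(scenario_revenue)  # 1560.9375
```

In the dashboard, LookML parameters feed these factors into the measure definitions, so business users can adjust them without writing SQL.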

14. Adapt to your pricing systems

While this tutorial transforms sample data sources, you'll face very similar data challenges with the pricing assets that live on your various platforms. Pricing assets come in different export formats (often XLS, Google Sheets, CSV, TXT, relational databases, or business applications) for summary and detailed results, each of which can be connected to Dataprep. We recommend that you begin by describing your transformation requirements in the same way as the examples provided above. After your specifications are clarified and you have identified the types of transformations needed, you can design them with Dataprep.

  1. Make a copy of the Dataprep flow that you'll customize (click the "..." more button at the right of the flow and select the Duplicate option), or start from scratch with a new Dataprep flow.
  2. Connect to your own pricing dataset. File formats like Excel, CSV, Google Sheets, JSON are natively supported by Dataprep. You can also connect to other systems using Dataprep connectors.
  3. Dispatch your data assets into the various transformation categories you identified. For each category, create one recipe. Take inspiration from the flow provided in this design pattern to transform the data and write your own recipes. If you get stuck, ask for help in the chat dialog at the bottom left of the Dataprep screen.
  4. Connect your recipe to your BigQuery instance. You don't need to worry about creating the tables manually in BigQuery; Dataprep takes care of that for you automatically. When you add the output to your flow, we suggest selecting a manual destination and dropping the table at each run. Test each recipe individually until it delivers the expected results. After your testing is done, convert the output to Append to the table at each run to avoid deleting the prior data.
  5. You can optionally run the flow on a schedule, which is useful if your process needs to run continuously. You could define a schedule to load the data every day or every hour, based on your required freshness. If you decide to run the flow on a schedule, you'll need to add a Scheduled Destination output in the flow for each recipe.

Modify the BigQuery Machine Learning Model

This tutorial provides a sample ARIMA model. However, there are additional parameters that you can control when developing the model to make sure it fits your data best. You can see more details in the example in our documentation here. You can also use the BigQuery ML.ARIMA_EVALUATE, ML.ARIMA_COEFFICIENTS, and ML.EXPLAIN_FORECAST functions to get more detail on your model and make optimization decisions.

Edit Looker reports

After importing the LookML into your own project as described above, you can make direct edits to add additional fields, modify calculations or user entered parameters, and change the visualizations on the dashboards to suit your business needs. You can find details on developing in LookML here, and visualizing data in Looker here.

15. Congratulations

You now know the key steps required to optimize your retail products' pricing!

What's next?

Explore other smart analytics reference patterns

Further reading