Read BigQuery ML predictions in SAP using ABAP SDK for Google Cloud

1. Introduction

In this codelab, you'll create a machine learning (ML) model in BigQuery and get predictions from this model using ABAP SDK for Google Cloud.

You'll leverage the following Google Cloud services:

  • BigQuery
  • Cloud Shell

What you'll build

You'll create the following:

  • A BigQuery Machine Learning (ML) model.
  • A service account with the BigQuery Job User and BigQuery Data Viewer roles to call the BigQuery API.
  • An ABAP program to call BigQuery API and get predictions from the ML model.

2. Requirements

  • A browser, such as Chrome or Firefox.
  • A Google Cloud project with billing enabled, or a 90-day free trial account for Google Cloud Platform.
  • SAP GUI (Windows or Java) installed on your system. If SAP GUI is already installed on your laptop, connect to SAP using the VM external IP address as the Application Server IP. If you are on a Mac, you can install SAP GUI for Java instead.

3. Before you begin


  • Run the following commands in Cloud Shell to authenticate your account and set the default project to abap-sdk-poc. Zone us-west4-b is used as an example; if needed, change the project and zone in the following commands based on your preference.
gcloud auth login
gcloud config set project abap-sdk-poc
gcloud config set compute/zone us-west4-b
  • You must have access to an SAP system with the ABAP SDK for Google Cloud installed.
  • You must complete codelab 1 (Install ABAP Platform Trial 1909 on Google Cloud Platform and Install ABAP SDK for Google Cloud) and codelab 2 (Configure ABAP SDK Authentication using tokens for SAP Hosted on Compute Engine VM) before proceeding with this codelab.
  • Completing codelab 1 and codelab 2 provisions an ABAP Platform Trial 1909 system on Google Cloud, along with the authentication and connectivity setup that this codelab requires. If you have not completed them, you will not have the required infrastructure and connectivity, so complete both before proceeding.

4. Enable BigQuery API V2 in your Google Cloud Project

  1. From the Cloud Console, click Activate Cloud Shell in the top-right corner:

6757b2fb50ddcc2d.png

  2. Run the following commands in Cloud Shell to enable the BigQuery API:
gcloud services enable bigquery.googleapis.com

On successful execution, you should see a message displayed as shown below:

b5f52859df2c2f56.png

You should now have the BigQuery API enabled in your Google Cloud Project.

5. Create a Service Account for Secure BigQuery Access

To securely get ML predictions from a BigQuery ML model, you need to create a service account with the BigQuery Job User and BigQuery Data Viewer roles, which allow your program to run queries (as jobs) within the project and read data from tables. These roles grant only the permissions needed to create jobs and read data, minimizing security risk.

Create a service account

To create a service account with the required roles, perform the following steps:

  1. Run the following command in the Cloud Shell terminal:
gcloud iam service-accounts create abap-sdk-bigquery-jobuser --display-name="Service Account for BigQuery Job user"
  2. Now add the required roles to the service account created in the previous step:
gcloud projects add-iam-policy-binding abap-sdk-poc --member='serviceAccount:abap-sdk-bigquery-jobuser@abap-sdk-poc.iam.gserviceaccount.com' --role='roles/bigquery.jobUser'

gcloud projects add-iam-policy-binding abap-sdk-poc --member='serviceAccount:abap-sdk-bigquery-jobuser@abap-sdk-poc.iam.gserviceaccount.com' --role='roles/bigquery.dataViewer'

The above commands use abap-sdk-poc as a placeholder for the Google Cloud project ID. Replace abap-sdk-poc with your project ID.

  3. To verify that the roles have been added, go to the IAM page. The service account you created should be listed along with the roles assigned to it.

6. Create a BigQuery Machine Learning model

In this codelab, we will create a k-means model to cluster the London Bicycle Hires dataset. The k-means algorithm groups your data into clusters. Unlike supervised machine learning, which is about predictive analytics, unsupervised learning is about descriptive analytics: understanding your data so that you can make data-driven decisions.

Create your dataset

To create a BigQuery dataset to store your ML model, perform the following steps:

  1. In the Google Cloud console, go to the BigQuery page.
  2. In the Explorer pane, click your project name.
  3. Click 5cf3b742649f1e2c.png View actions > Create dataset.

3fbc072041bfa313.png

  4. On the Create dataset page, do the following:
  • For Dataset ID, enter bqml_tutorial.
  • For Location type, select Multi-region, and then select EU (multiple regions in European Union). The London Bicycle Hires public dataset is stored in the EU multi-region. Your dataset must be in the same location.
  • Leave the remaining default settings as they are, and click Create dataset.

Create a k-means model

Now that your dataset is set up, the next step is to create a k-means model using the data. You can create and train a k-means model using the CREATE MODEL statement with the option model_type=kmeans.

To run the query and create a k-means model, perform the following steps:

  1. Go to the BigQuery page.
  2. In the editor pane, run the following SQL statement:
CREATE OR REPLACE MODEL `bqml_tutorial.london_station_clusters`
  OPTIONS(model_type='kmeans', num_clusters=4) AS
WITH
  hs AS (
  SELECT
    h.start_station_name AS station_name,
  IF
    (EXTRACT(DAYOFWEEK
      FROM
        h.start_date) = 1
      OR EXTRACT(DAYOFWEEK
      FROM
        h.start_date) = 7,
      "weekend",
      "weekday") AS isweekday,
    h.duration,
    ST_DISTANCE(ST_GEOGPOINT(s.longitude,
        s.latitude),
      ST_GEOGPOINT(-0.1,
        51.5))/1000 AS distance_from_city_center
  FROM
    `bigquery-public-data.london_bicycles.cycle_hire` AS h
  JOIN
    `bigquery-public-data.london_bicycles.cycle_stations` AS s
  ON
    h.start_station_id = s.id
  WHERE
    h.start_date BETWEEN CAST('2015-01-01 00:00:00' AS TIMESTAMP)
    AND CAST('2016-01-01 00:00:00' AS TIMESTAMP) ),
  stationstats AS (
  SELECT
    station_name,
    isweekday,
    AVG(duration) AS duration,
    COUNT(duration) AS num_trips,
    MAX(distance_from_city_center) AS distance_from_city_center
  FROM
    hs
  GROUP BY
    station_name, isweekday)
SELECT
  * EXCEPT(station_name, isweekday)
FROM
  stationstats
  3. In the navigation panel, in the Resources section, expand your project name, click bqml_tutorial, and then click london_station_clusters.
  4. Click the Schema tab. The model schema lists the three station attributes that BigQuery ML used to perform clustering. The schema should look like the following:

5f1feb313bd0f6a5.png

  5. Click the Evaluation tab. This tab displays visualizations of the clusters identified by the k-means model. Under Numeric features, bar graphs display up to 10 of the most important numeric feature values for each centroid. You can select which features to visualize from the drop-down menu.

8f9b53971e33dc08.png

7. Get BigQuery ML predictions using ABAP SDK for Google Cloud

Now that you have set up the prerequisites on the Google Cloud side, you are ready to complete the steps in your SAP system to get predictions from the ML model using the ABAP SDK for Google Cloud.

Create Client Key Configuration

For authentication and connectivity-related configuration, the ABAP SDK for Google Cloud uses the tables /GOOG/CLIENT_KEY and /GOOG/SERVIC_MAP.

To maintain the configuration in the /GOOG/CLIENT_KEY table, perform the following steps:

  1. In the SAP GUI, enter transaction code SPRO.
  2. Click SAP Reference IMG.
  3. Click ABAP SDK for Google Cloud > Basic Settings > Configure Client Key.

25871e639293b9ee.png

  4. Maintain the following values for the listed fields, and leave all other fields blank:

Field                                 Value
Google Cloud Key Name                 BIGQUERY_ML
Google Cloud Service Account Name     abap-sdk-bigquery-jobuser@abap-sdk-poc.iam.gserviceaccount.com
Google Cloud Scope                    https://www.googleapis.com/auth/cloud-platform
Project ID                            abap-sdk-poc
Authorization Class                   /GOOG/CL_AUTH_GOOGLE
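Before you build the full report, you can optionally verify that the client key configuration works. The following is a minimal sketch, assuming the BIGQUERY_ML key from the table above; the report name ZDEMO_BIGQUERY_CONN_CHECK is hypothetical. It only instantiates the SDK client, which reads the /GOOG/CLIENT_KEY entry, and prints the project ID derived from the configuration.

REPORT zdemo_bigquery_conn_check.

TRY.
    "Instantiate the BigQuery client using the client key name
    "maintained in table /GOOG/CLIENT_KEY
    DATA(lo_bq_check) = NEW /goog/cl_bigquery_v2( iv_key_name = 'BIGQUERY_ML' ).

    "gv_project_id is derived from the client key configuration
    cl_demo_output=>display( |Client key OK, project: { lo_bq_check->gv_project_id }| ).

    "Close the HTTP connection
    lo_bq_check->close( ).

  CATCH /goog/cx_sdk INTO DATA(lo_exc).
    "Raised when the key name, service account, or scope is misconfigured
    MESSAGE lo_exc->get_text( ) TYPE 'E'.
ENDTRY.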

Build an ABAP Report to get predictions from the BigQuery ML model

To build an ABAP Report, perform the following steps:

  1. In your SAP GUI, go to transaction code SE38 and create a Report Program with the name ZDEMO_BIGQUERY_ML_PREDICT.
  2. In the pop-up that opens, provide details as shown in the following image:

4cb32d50427df294.png

  3. In the next pop-up, either select Local Object or provide a package name as appropriate.
  4. In the ABAP Editor, add the following code:
REPORT zdemo_bigquery_ml_predict.

TYPES:
  BEGIN OF lty_query_result,
    centroid_id        TYPE i,
    station_name       TYPE string,
    isweekday          TYPE string,
    num_trips          TYPE i,
    distance_from_city TYPE string,
  END OF lty_query_result,
  ltt_query_result TYPE STANDARD TABLE OF lty_query_result WITH DEFAULT KEY.

DATA:
  lv_project_id TYPE string,
  ls_input      TYPE /goog/cl_bigquery_v2=>ty_103,
  ls_output     TYPE lty_query_result,
  lt_output     TYPE ltt_query_result.

CONSTANTS:
  lc_newline TYPE c VALUE cl_abap_char_utilities=>newline.

TRY.
    "Initialize Bigquery object, pass the client key name that you have configured in /GOOG/CLIENT_KEY table
    DATA(lo_bq) = NEW /goog/cl_bigquery_v2( iv_key_name = 'BIGQUERY_ML' ).

    "Populate relevant parameters
    lv_project_id = lo_bq->gv_project_id.

    ls_input-default_dataset-project_id = 'abap-sdk-poc'.
    ls_input-default_dataset-dataset_id = 'bqml_tutorial'.

    "This query gets predictions from
    ls_input-query =
                | WITH | && lc_newline &&
                | hs AS ( | && lc_newline &&
                | SELECT | && lc_newline &&
                | h.start_station_name AS station_name, | && lc_newline &&
                | IF | && lc_newline &&
                | (EXTRACT(DAYOFWEEK | && lc_newline &&
                | FROM | && lc_newline &&
                | h.start_date) = 1 | && lc_newline &&
                | OR EXTRACT(DAYOFWEEK | && lc_newline &&
                | FROM | && lc_newline &&
                | h.start_date) = 7, | && lc_newline &&
                | "weekend", | && lc_newline &&
                | "weekday") AS isweekday, | && lc_newline &&
                | h.duration, | && lc_newline &&
                | ST_DISTANCE(ST_GEOGPOINT(s.longitude, | && lc_newline &&
                | s.latitude), | && lc_newline &&
                | ST_GEOGPOINT(-0.1, | && lc_newline &&
                | 51.5))/1000 AS distance_from_city_center | && lc_newline &&
                | FROM | && lc_newline &&
                | `bigquery-public-data.london_bicycles.cycle_hire` AS h | && lc_newline &&
                | JOIN | && lc_newline &&
                | `bigquery-public-data.london_bicycles.cycle_stations` AS s | && lc_newline &&
                | ON | && lc_newline &&
                | h.start_station_id = s.id | && lc_newline &&
                | WHERE | && lc_newline &&
                | h.start_date BETWEEN CAST('2015-01-01 00:00:00' AS TIMESTAMP) | && lc_newline &&
                | AND CAST('2016-01-01 00:00:00' AS TIMESTAMP) ), | && lc_newline &&
                | stationstats AS ( | && lc_newline &&
                | SELECT | && lc_newline &&
                | station_name, | && lc_newline &&
                | isweekday, | && lc_newline &&
                | AVG(duration) AS duration, | && lc_newline &&
                | COUNT(duration) AS num_trips, | && lc_newline &&
                | MAX(distance_from_city_center) AS distance_from_city_center | && lc_newline &&
                | FROM | && lc_newline &&
                | hs | && lc_newline &&
                | GROUP BY | && lc_newline &&
                | station_name, isweekday ) | && lc_newline &&
                | SELECT | && lc_newline &&
                | * EXCEPT(nearest_centroids_distance) | && lc_newline &&
                | FROM | && lc_newline &&
                | ML.PREDICT( MODEL `bqml_tutorial.london_station_clusters`, | && lc_newline &&
                | ( | && lc_newline &&
                | SELECT | && lc_newline &&
                | * | && lc_newline &&
                | FROM | && lc_newline &&
                | stationstats | && lc_newline &&
                | WHERE | && lc_newline &&
                | REGEXP_CONTAINS(station_name, 'Kennington'))) |.

    "Call API method: bigquery.jobs.query
    CALL METHOD lo_bq->query_jobs
      EXPORTING
        iv_p_project_id = lv_project_id
        is_input        = ls_input
      IMPORTING
        es_output       = DATA(ls_response)
        ev_ret_code     = DATA(lv_ret_code)
        ev_err_text     = DATA(lv_err_text)
        es_err_resp     = DATA(ls_err_resp).

    IF lo_bq->is_success( lv_ret_code ).
      "API Call successful, loop through the data & display the result
      IF ls_response-job_complete = abap_true.
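        "Each result row holds a field list (f); every value (v) is a
        "data reference that is dereferenced and mapped by position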
        LOOP AT ls_response-rows ASSIGNING FIELD-SYMBOL(<ls_row>).
          LOOP AT <ls_row>-f ASSIGNING FIELD-SYMBOL(<ls_value>).
            ASSIGN <ls_value>-v->* TO FIELD-SYMBOL(<ls_field_value>).
            CASE sy-tabix.
              WHEN 1.
                ls_output-centroid_id = <ls_field_value>.
              WHEN 2.
                ls_output-station_name = <ls_field_value>.
              WHEN 3.
                ls_output-isweekday = <ls_field_value>.
              WHEN 4.
                ls_output-num_trips = <ls_field_value>.
              WHEN 5.
                ls_output-distance_from_city = <ls_field_value>.
            ENDCASE.
          ENDLOOP.
          APPEND ls_output TO lt_output.
          CLEAR ls_output.
        ENDLOOP.
        IF lt_output IS NOT INITIAL.
          cl_demo_output=>new( )->begin_section( 'ML.Predict Query Details'
                               )->write_text( ls_input-query
                               )->write_text( 'Dataset: bigquery-public-data.london_bicycles'
                               )->end_section(
                               )->begin_section( 'ML.Predict Query Results'
                               )->write_data( lt_output
                               )->end_section(
                               )->display( ).
        ENDIF.
      ENDIF.
    ELSE.
      "Display error message in case the API call fails
      MESSAGE lv_err_text TYPE 'E'.
    ENDIF.

    "Close HTTP Connection
    lo_bq->close( ).

  CATCH /goog/cx_sdk INTO DATA(lo_exception).
    MESSAGE lo_exception->get_text( ) TYPE 'E'.
ENDTRY.
  5. Save and activate the report.
  6. Execute the report (F8).

On successful execution, you should see a report output as shown below:

739e5685511fc9fc.png

6405542a597ed09f.png
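As an optional variation, the same query_jobs pattern can also read the model's centroids from SAP. The sketch below is an assumption, not part of the original codelab: it swaps in BigQuery ML's ML.CENTROIDS table function, which returns the feature values of each centroid of a k-means model, and reuses lo_bq, lv_project_id, ls_input, and lc_newline from the report above (run it before the close( ) call).

    "Sketch (assumption): reuse lo_bq, lv_project_id, ls_input and
    "lc_newline from the report above; only the query changes
    ls_input-query =
                | SELECT centroid_id, feature, numerical_value | && lc_newline &&
                | FROM ML.CENTROIDS(MODEL `bqml_tutorial.london_station_clusters`) | && lc_newline &&
                | ORDER BY centroid_id |.

    "Call API method: bigquery.jobs.query
    lo_bq->query_jobs(
      EXPORTING
        iv_p_project_id = lv_project_id
        is_input        = ls_input
      IMPORTING
        es_output       = DATA(ls_centroid_resp)
        ev_ret_code     = DATA(lv_code)
        ev_err_text     = DATA(lv_text) ).

    IF lo_bq->is_success( lv_code ).
      "ls_centroid_resp-rows holds one row per centroid feature; map the
      "values by position, as done in the report above
    ELSE.
      MESSAGE lv_text TYPE 'E'.
    ENDIF.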

8. Congratulations

Excellent work completing the "Read BigQuery ML predictions in SAP using ABAP SDK for Google Cloud" codelab!

You've successfully retrieved predictions from a BigQuery Machine Learning model, right from within your SAP system! You've unlocked a new level of integration between ABAP and Google Cloud services. Expand your horizons with other exciting ABAP SDK for Google Cloud codelabs:

  • Using the Translation API with ABAP SDK for Google Cloud
  • Upload a large object to a Cloud Storage bucket using chunking
  • Retrieving Credentials/Secrets from Secret Manager with ABAP SDK for Google Cloud
  • Call Vertex AI text-bison from ABAP

9. Clean up

If you do not wish to continue with the additional codelabs related to ABAP SDK for Google Cloud, please proceed with the cleanup.

Delete the project

  • Delete the Google Cloud project:
gcloud projects delete abap-sdk-poc

Delete individual resources

  1. Delete the compute instance:
gcloud compute instances delete abap-trial-docker
  2. Delete the firewall rules:
gcloud compute firewall-rules delete sapmachine
  3. Delete the service account:
gcloud iam service-accounts delete \
    abap-sdk-bigquery-jobuser@abap-sdk-poc.iam.gserviceaccount.com