In this CodeLab, you will transfer data from Google BigQuery into SAP HANA using two different methods: a manual transfer for static master data and an ODBC interface for transactional data based on a view.

You will use a storage bucket to transfer the master data from Google BigQuery to SAP HANA.

Go to your storage browser and create a bucket.

Note the name of the bucket, as you will need it in the next step.

Go to the open dataset noaa_gsod in Google BigQuery. Expand the dataset and scroll down until you see the stations table.

Double-click the table and choose Export table.

Use the name of the bucket you created before and enter a name for the file (this codelab assumes stations.csv, which is the file name used in the import step later).

You will find the exported dataset in your bucket. Choose to share it publicly and copy the public link for later use.
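
If you prefer the command line, a roughly equivalent export can be run from Google Cloud Shell with the bq and gsutil tools (YOUR_BUCKET and the file name stations.csv are placeholders for your own values):

# Export the stations table from the public dataset into your bucket as CSV
bq extract --destination_format CSV 'bigquery-public-data:noaa_gsod.stations' gs://YOUR_BUCKET/stations.csv

# Make the exported file publicly readable
gsutil acl ch -u AllUsers:R gs://YOUR_BUCKET/stations.csv

The public link will then have the form https://storage.googleapis.com/YOUR_BUCKET/stations.csv.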

Importing the data into SAP HANA

Log in to a web console on your SAP HANA virtual machine. Switch to the SIDadm user (hxeadm in SAP HANA, express edition) and use the following commands to download the master data file into a directory that is allowed for import by default.

Replace the SID, instance number (XX), and HOSTNAME accordingly; for SAP HANA, express edition, you can use the command cdwork instead.

Replace <<BUCKET_LINK>> with the public URL of the CSV file in your bucket.

sudo su - hxeadm        # switch to the administration user (hxeadm for SAP HANA, express edition)
cd /tmp
curl -O <<BUCKET_LINK>> # download the CSV file, keeping its original file name
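
You can quickly sanity-check the download before importing (this assumes the file is named stations.csv, matching the name used in the import step below):

head -3 /tmp/stations.csv   # show the header line and the first records
wc -l /tmp/stations.csv     # count the downloaded lines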

Open the SQL editor of your choice and log in to a tenant database with the user SYSTEM. In SAP HANA, express edition, the default tenant database is called HXE. We will refer to the tenant database as HXE in the rest of this codelab.

Use the following commands to create a new user with limited permissions, create a new schema called BIG, and grant the proper permissions on that schema to the new user.

Finally, allow the import of files from any directory.

CREATE USER SUPER PASSWORD "HanaRocks1!" NO FORCE_FIRST_PASSWORD_CHANGE;
CREATE SCHEMA BIG;
GRANT IMPORT TO SUPER;
GRANT SELECT ON SCHEMA "BIG" TO SUPER;
GRANT EXECUTE ON SCHEMA "BIG" TO SUPER;
GRANT CREATE ANY ON SCHEMA "BIG" TO SUPER WITH GRANT OPTION;
GRANT DELETE ON SCHEMA "BIG" TO SUPER;
GRANT UPDATE ON SCHEMA "BIG" TO SUPER;
GRANT INSERT ON SCHEMA "BIG" TO SUPER;
GRANT DROP ON SCHEMA "BIG" TO SUPER;
GRANT CREATE REMOTE SOURCE TO SUPER;
GRANT AFL__SYS_AFL_AFLPAL_EXECUTE_WITH_GRANT_OPTION TO SUPER;

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'database') SET ('import_export', 'enable_csv_import_path_filter') = 'false' WITH RECONFIGURE;
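
Optionally, you can verify that the parameter was applied by querying the M_INIFILE_CONTENTS monitoring view; a minimal check could look like this:

SELECT FILE_NAME, SECTION, KEY, VALUE
FROM M_INIFILE_CONTENTS
WHERE SECTION = 'import_export' AND KEY = 'enable_csv_import_path_filter';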

Log in to the HXE database as user SUPER. Use the following command to create a table:

CREATE TABLE BIG.STATIONS (
    USAF VARCHAR(8),
    WBAN VARCHAR(5),
    NAME VARCHAR(200),
    COUNTRY VARCHAR(2),
    STATE VARCHAR(2),
    CALL VARCHAR(6),
    LAT VARCHAR(12),
    LON VARCHAR(12),
    ELEV VARCHAR(12),
    BEGIN_DATE VARCHAR(8),
    END_DATE VARCHAR(8)
);

Use the following command to import the data into the table you have just created:

IMPORT FROM CSV FILE '/tmp/stations.csv' INTO BIG.STATIONS
WITH
   RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   OPTIONALLY ENCLOSED BY '"'
   SKIP FIRST 1 ROW
   FAIL ON INVALID DATA
   ERROR LOG '/tmp/error_log.err';

Verify the data has been loaded

Use the following command to check that the data has been loaded:

SELECT * FROM BIG.STATIONS;
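
If the import reported errors, check /tmp/error_log.err for details. As an additional check, you can compare the imported row count with the stations table in BigQuery:

SELECT COUNT(*) FROM BIG.STATIONS;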

Get your current location coordinates

You will simulate an application providing an address to identify the closest stations to a specific location. Open Google Maps and get the latitude and longitude of your current location, or any location of your choice.

Find the 10 stations closest to your location

Use the following SQL statement to find the 10 stations closest to your current location. Replace the latitude and longitude with the values you copied from Google Maps. Note that Google Maps shows coordinates as latitude, longitude, while the Point constructor in the statement expects longitude first.

SELECT TOP 10 USAF FROM (
    SELECT STATION_LOCATION.ST_Distance(ST_GeomFromText('Point( 11.540665 48.143882 )', 4326), 'meter') / 1000 AS DISTANCE_KM, USAF
    FROM (
        SELECT ST_GeomFromText('Point( '|| LON || ' ' || LAT || ' )', 4326) AS STATION_LOCATION, USAF
        FROM "BIG"."STATIONS" )
    ORDER BY DISTANCE_KM ASC )
WHERE DISTANCE_KM < 100
ORDER BY DISTANCE_KM ASC;

Keep the results handy, you will need them in the next step.

Create a dataset in your project

Call it HeatDeath

Double click on any of the tables for the GSOD dataset and then on Query table

Replace the query with the following, changing the "xxxxx" with the station IDs produced by the previous step.

SELECT * FROM `bigquery-public-data.noaa_gsod.*` where stn in ( "xxxxx", "xxxxx")   

Uncheck the Use Legacy SQL box.

Once the results are displayed, use Save view to save the query as a view in your project.

Enter a name for the view (this codelab uses RESULTS, which is referenced later when the virtual table is created) and click OK.
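
If you prefer the command line, the dataset and the view can also be created from Google Cloud Shell with the bq tool (the view name RESULTS and the station IDs "xxxxx" are placeholders):

# Create the dataset in your project
bq mk --dataset HeatDeath

# Save the query as a view inside the dataset
bq mk --use_legacy_sql=false --view 'SELECT * FROM `bigquery-public-data.noaa_gsod.*` WHERE stn IN ("xxxxx", "xxxxx")' HeatDeath.RESULTS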

Use the following command to create a new remote source linked to your project in Google BigQuery.

CREATE REMOTE SOURCE BIGQUERY ADAPTER "odbc" CONFIGURATION FILE 'property_bq.ini' CONFIGURATION 'DSN=GoogleBQ';
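
You can confirm that the source was registered by querying the REMOTE_SOURCES system view:

SELECT REMOTE_SOURCE_NAME, ADAPTER_NAME FROM REMOTE_SOURCES;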

You can use the following call to list the contents of your new remote source:

CALL "PUBLIC"."GET_REMOTE_SOURCE_OBJECT_TREE"('BIGQUERY', '', ?, ?)

You will see the result matches the name of your project.

Add the project name as the second parameter to the previous procedure call to see what is inside, as shown below.
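
For example, assuming PROJECT_ID stands for your own project ID:

CALL "PUBLIC"."GET_REMOTE_SOURCE_OBJECT_TREE"('BIGQUERY', 'PROJECT_ID', ?, ?);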

Alternatively, if you are using SAP Web IDE for SAP HANA, you can double-click the remote source and browse its objects in the graphical interface.

Use the following command to create a virtual table for your remote source. Replace PROJECT_ID with your project ID and RESULTS with the name of your view, if you chose a different one.

CREATE VIRTUAL TABLE "BIG"."BQRESULTS" AT "BIGQUERY"."PROJECT_ID"."HeatDeath"."RESULTS";

You can now query the table using a select statement. This will query the data in BigQuery.

SELECT * FROM "BIG"."BQRESULTS";

Use the following SQL script to create a table that aggregates data from the remote source, train an ARIMA model, and execute the predictive procedure.

CREATE COLUMN TABLE "BIG"."AGGR_TEMPS" AS
( SELECT CAST(CONCAT("year", "mo") AS INTEGER) AS "MOMENT", AVG("temp") AS "TEMP_AVERAGE"
  FROM "BIG"."BQRESULTS"
  GROUP BY "year", "mo"
  ORDER BY "MOMENT" ASC );

set schema "BIG";

/*Table type for the input time series (month key + average temperature)*/
DROP TYPE PAL_ARIMA_DATA_T;
CREATE TYPE PAL_ARIMA_DATA_T AS TABLE(
    "MOMENT" INTEGER,
    "TEMP_AVERAGE" DOUBLE);

/*Table type for the control (parameter) table*/
DROP TYPE PAL_ARIMA_CONTROL_T;
CREATE TYPE PAL_ARIMA_CONTROL_T AS TABLE(
    "NAME" VARCHAR (50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR (100) );

/*Table type for the generated model*/
DROP TYPE PAL_ARIMA_MODEL_T;
CREATE TYPE PAL_ARIMA_MODEL_T AS TABLE(
    "NAME" VARCHAR (50),
    "VALUE" VARCHAR (5000) );

DROP TABLE PAL_ARIMA_PDATA_TBL;
CREATE COLUMN TABLE PAL_ARIMA_PDATA_TBL("POSITION" INT, "SCHEMA_NAME" NVARCHAR(256), "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7));
INSERT INTO PAL_ARIMA_PDATA_TBL VALUES (1, 'BIG', 'PAL_ARIMA_DATA_T', 'IN');
INSERT INTO PAL_ARIMA_PDATA_TBL VALUES (2, 'BIG', 'PAL_ARIMA_CONTROL_T', 'IN');
INSERT INTO PAL_ARIMA_PDATA_TBL VALUES (3, 'BIG', 'PAL_ARIMA_MODEL_T', 'OUT');


CALL SYS.AFLLANG_WRAPPER_PROCEDURE_DROP('BIG', 'PAL_AUTOARIMA_PROC');


CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL','AUTOARIMA', 'BIG', 'PAL_AUTOARIMA_PROC', 'PAL_ARIMA_PDATA_TBL');                

DROP TABLE #PAL_CONTROL_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL ( "NAME" VARCHAR (50),"INTARGS" INTEGER,"DOUBLEARGS" DOUBLE,"STRINGARGS" VARCHAR (100));

/*INSERT INTO #PAL_CONTROL_TBL VALUES ('SEARCH_STRATEGY', 1,null,null);*/

/*Auto seasonal*/
INSERT INTO #PAL_CONTROL_TBL VALUES ('SEARCH_STRATEGY', 0,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MAX_ORDER', 10,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER', 20,null,null);

DROP TABLE PAL_ARIMA_MODEL_TBL;
CREATE COLUMN TABLE PAL_ARIMA_MODEL_TBL LIKE PAL_ARIMA_MODEL_T;

CALL BIG.PAL_AUTOARIMA_PROC(AGGR_TEMPS, "#PAL_CONTROL_TBL", PAL_ARIMA_MODEL_TBL) WITH OVERVIEW;


SELECT * FROM PAL_ARIMA_MODEL_TBL;

/*Training is done - call the forecast based on the trained model*/
set schema BIG;
DROP TABLE #PAL_PARAMETER_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_PARAMETER_TBL ( "NAME" VARCHAR (50),"INT_VALUE" INTEGER,"DOUBLE_VALUE" DOUBLE,"STRING_VALUE" VARCHAR (100));
INSERT INTO #PAL_PARAMETER_TBL VALUES ('FORECAST_METHOD', 1, NULL, NULL);
INSERT INTO #PAL_PARAMETER_TBL VALUES ('FORECAST_LENGTH', 10, NULL, NULL);

CALL _SYS_AFL.PAL_ARIMA_FORECAST (AGGR_TEMPS, PAL_ARIMA_MODEL_TBL, "#PAL_PARAMETER_TBL", ?);

For more information about the ARIMA model and its implementation in the Predictive Analysis Library (PAL) in SAP HANA, see the SAP Help documentation.

Use Google Cloud Functions to interact with SAP HANA in a serverless architecture.

Enable the Cloud Functions API

Navigate to the Cloud Functions API page and click Enable.
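
Alternatively, you can enable the API directly from Google Cloud Shell:

gcloud services enable cloudfunctions.googleapis.com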

Create Working Area for a Cloud Function

In Google Cloud Shell we will:

  1. Make a directory,
  2. Change into that directory,
  3. Create empty files for our project, and
  4. Open the Google Cloud Shell Code Editor
mkdir ~/gcf-query-hana
cd ~/gcf-query-hana
touch .npmrc .gcloudignore package.json index.js
edit .

Edit the NPM Configuration

Because the SAP HANA Node.js client library is hosted in SAP's private NPM registry, we need to provide an NPM configuration so that Google Cloud Functions knows how to download and compile it.

Edit the .npmrc file:

edit .npmrc

Copy the following into the file:

.npmrc

@sap:registry=https://npm.sap.com

Configure nodejs Package

Edit package.json to contain the following:

package.json

{
        "name": "gcf-query-hana-example",
        "version": "1.0.0",
        "description": "",
        "main": "index.js",
        "scripts": {
                "test": "echo \"Error: no test specified\" && exit 1"
        },
        "author": "",
        "license": "ISC",
        "dependencies": {
                "@sap/hana-client": "^2.3.21",
                "cloud-functions-runtime-config": "latest"
        }
}

Avoid uploading needless files to Google Cloud Functions

Edit .gcloudignore:

edit .gcloudignore

Set its content to the following:

.gcloudignore

.gcloudignore
.git
.gitignore
node_modules

Write a Cloud Function

Edit index.js to be the following:

index.js

'use strict';

const hanaClient = require('@sap/hana-client');
const runtimeConfig = require('cloud-functions-runtime-config');
const ENTRY_POINT = process.env.ENTRY_POINT;

// Read the connection settings from Runtime Config and open a SAP HANA connection
const hanaConnection = () => {
        return Promise.all([
                runtimeConfig.getVariable(ENTRY_POINT, 'HANA_EXPRESS_LOCATION'),
                runtimeConfig.getVariable(ENTRY_POINT, 'HANA_EXPRESS_USER'),
                runtimeConfig.getVariable(ENTRY_POINT, 'HANA_EXPRESS_PASSWORD')
        ]).then((connectionValues) => {
                return new Promise((resolve, reject) => {
                        var connection = {
                                serverNode:        connectionValues[0],
                                uid:                 connectionValues[1],
                                pwd:                 connectionValues[2]
                        };
                
                        var hana = hanaClient.createConnection();
                        hana.connect(connection, (err) => {
                                if (err) return reject(err);
                                return resolve(hana);
                        });
                });
        });
};

// Count the rows in the BIG.STATIONS table imported earlier
const countEverything = (hana) => {
        return new Promise((resolve, reject) => {
                hana.exec('SELECT COUNT(1) AS TOTAL FROM BIG.STATIONS', (err, result) => {
                        if (err) return reject(err);
                        return resolve(result[0].TOTAL);
                });
        });
};

// HTTP entry point: connect to SAP HANA, run the count and return it in the response
const hanaQuery = (req, res) => {
        var handleError = (err) => {
                console.log(err);
                res.status(500).send(`Error: ${err}`);
        };
        
        var reportTotal = (total) => {
                res.status(200).send(total.toString());
        };

        hanaConnection().then((hana) => {
                countEverything(hana)
                        .then(reportTotal, handleError);
                
        }, handleError);
};

exports.hanaQuery = hanaQuery;

Setup Runtime Config

We don't want to hardcode sensitive or environment-specific configuration into cloud functions. Instead, we will use the Runtime Configurator API (part of Deployment Manager) to store and manage the configuration for the function.

We need to configure three SAP HANA, express edition connection variables: HANA_EXPRESS_LOCATION, HANA_EXPRESS_USER, and HANA_EXPRESS_PASSWORD.

Run the following commands in Google Cloud Shell:

gcloud services enable runtimeconfig.googleapis.com
gcloud beta runtime-config configs create hanaQuery
gcloud beta runtime-config configs variables set HANA_EXPRESS_LOCATION '[YOUR_HANA_EXTERNAL_IP]:39015' --config-name hanaQuery
gcloud beta runtime-config configs variables set HANA_EXPRESS_USER 'SUPER' --config-name hanaQuery
gcloud beta runtime-config configs variables set HANA_EXPRESS_PASSWORD 'HanaRocks1!' --config-name hanaQuery
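
As an optional check, you can list the variables stored in the configuration:

gcloud beta runtime-config configs variables list --config-name hanaQuery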

Deploy the Google Cloud Function

Run the following in Google Cloud Shell:

npm install
gcloud beta functions deploy hanaQuery --trigger-http

Call the Google Cloud Function

Run this command in Google Cloud Shell:

gcloud beta functions call hanaQuery

This should produce output resembling the following:

executionId: xhykofcp8aq6
result: '30050'
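
Because the function was deployed with an HTTP trigger, you can also call it over HTTPS. The exact URL is printed in the output of the deploy command; the region and project below are placeholders:

curl https://us-central1-YOUR_PROJECT_ID.cloudfunctions.net/hanaQuery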