About this codelab
1. Introduction
In this lab, we will learn how to protect the BigQuery Data Transfer Service using VPC Service Controls while transferring data from Cloud Storage to a BigQuery dataset. We then protect Cloud Storage and repeat the process to transfer data from Cloud Storage to BigQuery. Protecting Cloud Storage causes a VPC Service Controls violation, which must be fixed for the transfer to succeed. In the end, we also protect BigQuery and then attempt to copy a dataset between projects, which also causes a violation that needs to be fixed.
Throughout this lab, we will see how to fix both ingress and egress violations using ingress and egress rules respectively. We will also use an access level to fix the BigQuery Data Transfer ingress violation. The goals of this codelab are:
- Understand how to fix ingress and egress violations using ingress and egress rules respectively on different services, notably Cloud Storage, BigQuery, and BigQuery Data Transfer Service.
- Understand why a specific violation occurred.
2. Resources Setup and Requirements
Before you begin
In this codelab, we assume that you already know:
- How to create a folder
- How to create a project in a folder or move an existing project into a folder
- How to create a scoped access policy
- How to create and configure a service perimeter from Google Cloud console
- How to find violations logs from audit logs
Setup
Our initial setup is designed as follows:
- A Google Cloud organization.
- A folder under the Organization. For this codelab, we will call it codelab-folder.
- Two Google Cloud projects in the folder codelab-folder. For this codelab, we call the projects project-1 and project-2.
  - If you do not have the folder and projects already created, in the Google Cloud console, create a folder under the Organization and create two new projects.
- The required permissions: IAM roles for managing folders, IAM roles for managing projects, IAM roles required to configure VPC Service Controls, IAM roles for managing BigQuery, and IAM roles for managing Cloud Storage.
- Billing account for both projects project-1 and project-2.
Create a scoped policy and a regular service perimeter
In this codelab, we will use a regular service perimeter protecting project-2.
- Create a scoped access policy, which is scoped at the folder codelab-folder level. For this codelab, we assume the created access policy has the ID 987654321.
- Create a regular perimeter, which we call perimeter-2, and add the project project-2 to it.
In the perimeter perimeter-2, restrict the BigQuery Data Transfer API.
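If you manage these resources from the command line rather than the console, the setup can be sketched with gcloud. This is a minimal sketch under stated assumptions: ORGANIZATION_ID, FOLDER_ID, and PROJECT2_NUMBER are placeholders, the returned access policy ID is assumed to be 987654321, and the perimeter ID must follow the perimeter naming rules (substitute your actual ID if it differs from perimeter-2).

# Create an access policy scoped to the folder codelab-folder.
gcloud access-context-manager policies create \
    --organization=ORGANIZATION_ID \
    --scopes=folders/FOLDER_ID \
    --title="codelab-folder policy"

# Create the regular perimeter protecting project-2 and restricting
# the BigQuery Data Transfer API.
gcloud access-context-manager perimeters create perimeter-2 \
    --policy=987654321 \
    --title="perimeter-2" \
    --resources=projects/PROJECT2_NUMBER \
    --restricted-services=bigquerydatatransfer.googleapis.com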
Creation of Cloud Storage bucket and BigQuery dataset
For the purpose of this codelab, any CSV file is enough, regardless of its content. The main limitation is the colocation requirement, which enforces the following:
- If your BigQuery dataset is in a multi-region, the Cloud Storage bucket containing the data you're transferring must be in the same multi-region or in a location that is contained within the multi-region.
- If your dataset is in a region, your Cloud Storage bucket must be in the same region.
Therefore, for this codelab, we will ensure that both the Cloud Storage bucket and the BigQuery dataset are in the same region or multi-region.
Create a new Cloud Storage bucket in project project-1
To create a new Cloud Storage bucket, follow the documented steps for creating a new bucket.
- For the name of the bucket, enter a name that meets the bucket name requirements. For this codelab, we will call the bucket codelab-bqtransfer-bucket.
- For the bucket location (where to store the data), select a Location type and Location where the bucket data will be permanently stored. For this codelab, we will use us (multiple regions in United States).
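As an alternative to the console steps above, the same bucket could be created from the command line; a minimal sketch, assuming you are authenticated with gcloud and have the required permissions on project-1:

# Create the bucket in project-1 in the US multi-region.
gcloud storage buckets create gs://codelab-bqtransfer-bucket \
    --project=project-1 \
    --location=us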
Create a CSV file
From your local machine or Cloud Shell, we can use the echo command to create a sample CSV file, codelab-test-file.csv, using the following commands:
echo "name,age" > codelab-test-file.csv; \
echo "Alice,10" >> codelab-test-file.csv; \
echo "Bob,20" >> codelab-test-file.csv; \
echo "Carol,30" >> codelab-test-file.csv; \
echo "Dan,40" >> codelab-test-file.csv; \
echo "Eve,50" >> codelab-test-file.csv; \
echo "Frank,60" >> codelab-test-file.csv; \
echo "Grace,70" >> codelab-test-file.csv; \
echo "Heidi,80" >> codelab-test-file.csv;
Upload CSV file to Cloud Storage bucket
Once the CSV file is created, run the following command to upload the file object to the created bucket:
gcloud storage cp codelab-test-file.csv gs://codelab-bqtransfer-bucket
You can verify that the file was uploaded to the created bucket by listing objects in the bucket or running the following command:
gcloud storage ls --recursive gs://codelab-bqtransfer-bucket/**
Create BigQuery dataset and table in project-2
- Create a BigQuery dataset in project project-2 by following these steps (or see the bq command-line sketch after the schema below).
  - For Dataset ID, enter a unique dataset name. For this codelab, we use codelab_bqtransfer_dataset.
  - For Location type, choose a geographic location for the dataset. For this codelab, we use the same location as the Cloud Storage bucket: US (multiple regions in the United States).
- Create a BigQuery table, under the created dataset codelab_bqtransfer_dataset, by following these steps.
  - In the Source section, select Empty table in the Create table from list.
  - In the Table field, enter the name of the table that you want to create. For this codelab, we use the name codelab-bqtransfer-table.
  - Verify that the Table type field is set to Native table.
  - In the Schema section, enter the schema definition. You can enter schema information by clicking Edit as text and entering the following schema, which conforms to the format of the created CSV file.
[{
"name": "name",
"type": "STRING",
"mode": "NULLABLE",
"description": "The name"
},
{
"name": "age",
"type": "INTEGER",
"mode": "NULLABLE",
"description": "The age"
}]
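Alternatively, the dataset and table can be created with the bq tool. This is a minimal sketch of the same setup; the inline schema mirrors the JSON above, except that the column descriptions are omitted because the inline schema format does not carry them.

# Create the dataset in project-2 in the US multi-region.
bq mk \
    --dataset \
    --location=us \
    project-2:codelab_bqtransfer_dataset

# Create the table with the two-column schema (name, age).
bq mk \
    --table \
    project-2:codelab_bqtransfer_dataset.codelab-bqtransfer-table \
    name:STRING,age:INTEGER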
Cost
You need to enable billing in the projects project-2 and project-1 to use Cloud resources/APIs. We advise shutting down used resources to avoid incurring billing beyond this codelab.
The resources that incur costs are BigQuery and Cloud Storage. An estimated cost can be found in the BigQuery pricing calculator and the Cloud Storage pricing calculator.
3. Configure Data Transfer from Cloud Storage Object to BigQuery Table
We will now try to create a data transfer (in project-2) to transfer data from Cloud Storage (located in project-1) to BigQuery (located in project-2), while VPC Service Controls protects the BigQuery Data Transfer Service in project-2. Protecting only the BigQuery Data Transfer Service (without also protecting BigQuery and Cloud Storage) restricts principals only for creating and managing data transfers (such as manually starting a data transfer).
Set up data transfer from Cloud Storage
To create a data transfer, follow these steps:
- Go to the BigQuery page in the Google Cloud console of project-2.
- Click Data transfers.
Investigate the violation while accessing the Data transfers page
In the Google Cloud console, we can see the VPC Service Controls unique identifier. Use the same identifier to filter logs and identify violation details (replace OBSERVED_VPCSC_DENIAL_UNIQUE_ID with the observed denial ID):
protoPayload.metadata.@type="type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"
protoPayload.metadata.vpcServiceControlsUniqueId="OBSERVED_VPCSC_DENIAL_UNIQUE_ID"
The observed violation is a NO_MATCHING_ACCESS_LEVEL, which is an ingress violation with details similar to the following:
ingressViolations: [
0: {
servicePerimeter: "accessPolicies/987654321/servicePerimeters/perimeter-2"
targetResource: "projects/[PROJECT2_NUMBER]"
}]
violationReason: "NO_MATCHING_ACCESS_LEVEL"
callerIp: "USER_PUBLIC_IP_ADDRESS"
resource: {
labels: {
method: "google.cloud.bigquery.datatransfer.v1.DataTransferService.ListTransferConfigs"
project_id: "project-2"
service: "bigquerydatatransfer.googleapis.com"
}
type: "audited_resource"
}
Accessing the Data transfers page attempts to list any configured data transfers; hence the violation on the ListTransferConfigs method.
Fix the violation for the bigquerydatatransfer.googleapis.com service
An access level or an ingress rule can be used to fix an ingress violation. In this codelab, let's use an ingress rule configured with the denied user identity, which allows access to the bigquerydatatransfer.googleapis.com service and all of its methods.
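If you manage the perimeter with gcloud instead of the console, such an ingress rule can be expressed as a YAML policy file. This is a minimal sketch under stated assumptions: user@example.com stands in for the denied user identity, and ingress.yaml is a hypothetical file name.

# ingress.yaml: allow the denied user, from any source, to call the
# BigQuery Data Transfer API (all methods) on resources in the perimeter.
- ingressFrom:
    identities:
    - user:user@example.com
    sources:
    - accessLevel: "*"
  ingressTo:
    operations:
    - serviceName: bigquerydatatransfer.googleapis.com
      methodSelectors:
      - method: "*"
    resources:
    - "*"

Apply the file to the perimeter:

gcloud access-context-manager perimeters update perimeter-2 \
    --policy=987654321 \
    --set-ingress-policies=ingress.yaml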
Once the ingress rule is in place, access to the Data transfers page should work with no issue.
Resume the setup of data transfer from Cloud Storage
From previous steps, while on the Data transfers page (after clicking Data transfers), continue with the following steps:
- Click + Create transfer.
- In the Source type section, for Source, choose Google Cloud Storage.
- In the Transfer config name section, for Display name, enter a name for the transfer such as Codelab Transfer.
- In the Schedule options section:
  - Select a Repeat frequency such as 15 minutes.
  - Make sure to select Start now; otherwise, the data transfer will start only after the configured Repeat frequency.
- In the Destination settings section, for Destination dataset, choose the dataset you created to store your data: codelab_bqtransfer_dataset.
- In the Data source details section:
  - For Destination table, enter the name of your destination table. The destination table must follow the table naming rules. For this codelab, we will use the table that we created earlier: codelab-bqtransfer-table.
  - For Cloud Storage URI, enter the Cloud Storage URI. For this codelab, we use the created bucket and file: codelab-bqtransfer-bucket/codelab-test-file.csv.
  - For Write preference, keep APPEND (or choose MIRROR).
  - DO NOT select the option to delete files after transfer (because we will reuse the same file multiple times; however, you can use multiple files and delete the source files after transfer).
  - For File format, select CSV.
  - Within Transfer Options, under CSV, enter a comma (",") as the Field delimiter.
- In the Service Account menu, select a service account from the service accounts associated with your Google Cloud project.
  - The selected service account must have the required permissions for Cloud Storage on the project hosting the storage bucket, project-1 in this codelab.
  - For this codelab, we will use a service account created in project-2: codelab-sa@project-2.iam.gserviceaccount.com.
- Click Save.
Since we selected Start now as the schedule option, the first transfer will start as soon as you click Save.
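For reference, a similar transfer configuration could also be created with the bq tool instead of the console. This is a hedged sketch, not the exact configuration the console produces: the parameter names are the ones documented for Cloud Storage transfers, skip_leading_rows is added here to skip the CSV header row, and the service account is the one created earlier.

bq mk \
    --transfer_config \
    --project_id=project-2 \
    --target_dataset=codelab_bqtransfer_dataset \
    --display_name='Codelab Transfer' \
    --data_source=google_cloud_storage \
    --schedule='every 15 minutes' \
    --service_account_name=codelab-sa@project-2.iam.gserviceaccount.com \
    --params='{
      "data_path_template":"gs://codelab-bqtransfer-bucket/codelab-test-file.csv",
      "destination_table_name_template":"codelab-bqtransfer-table",
      "file_format":"CSV",
      "field_delimiter":",",
      "skip_leading_rows":"1",
      "write_disposition":"APPEND"
    }'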
Verify data transfer service status
To verify the status of the configured data transfer:
- Go to the BigQuery page in the Google Cloud console
- Click Data transfers.
- The list of configured transfers is displayed.
Click on the Codelab Transfer (under Display name) and it will display a list of all runs performed so far.
The data transfer run should be successful, with no VPC Service Controls violation, for both manually triggered and scheduled data transfers. Note that only the manually triggered transfer needs the ingress rule to allow access to the principal that initiates the transfer manually.
4. IP Address Restrictions for Manually Triggered Data Transfers
The currently configured ingress rule allows the configured identity to trigger data transfers manually from any IP address.
By using an access level, VPC Service Controls provides the ability to limit allowed access by specific API request attributes, notably:
- IP subnetworks: checks if the request is coming from a specific IP address.
- Regions: checks if the request is coming from a specific region, which is determined by the geolocation of the IP address.
- Principals: checks if the request is coming from a specific account.
- Device policy: checks if the request is coming from a device that meets specific requirements.
To enforce the verification of these attributes along with the already configured ingress rule, we have to create an access level that allows the desired attributes, and then add the created access level as the source in the ingress rule.
This diagram illustrates access initiated by the two principals (user@example.com and user2@example.com) in three scenarios, demonstrating how VPC Service Controls evaluates sources (ingress access level) and identity attributes as an AND condition where both have to match.
- User user@example.com is allowed access when attempting access from an IP address allowed by the access level, because their IP address and user account match the configurations in the ingress rule.
- User user@example.com is blocked access when their IP address does not match the allowed IP address, despite their account being the one configured in the ingress rule.
- User user2@example.com is blocked access despite attempting access from an allowed IP address, because their account is not allowed by the ingress rule.
Create access level
To create an access level that limits access by IP address:
- Open the Access Context Manager page in the Google Cloud console.
  - If you are prompted, select the folder codelab-folder.
- At the top of the Access Context Manager page, click CREATE ACCESS LEVEL.
- In the New Access Level pane, give the new access level a Title. For this codelab, we will call it project_2_al.
- In the Conditions section, click + in front of IP subnetworks.
- In the IP Subnetworks box, select Public IP.
  - As an alternative, you can select Private IP to use internal IP addresses in access levels. However, for this codelab, we are using a public IP.
- Enter one or more IPv4 or IPv6 ranges formatted as CIDR blocks.
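The same access level can also be created with gcloud from a basic level spec file. This is a minimal sketch under stated assumptions: ip_spec.yaml is a hypothetical file name and 203.0.113.0/24 is a placeholder for your allowed public IP range.

# ip_spec.yaml: conditions for the access level (a single CIDR block here).
- ipSubnetworks:
  - 203.0.113.0/24

gcloud access-context-manager levels create project_2_al \
    --policy=987654321 \
    --title=project_2_al \
    --basic-level-spec=ip_spec.yaml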
Add access level in the ingress rule
Within an ingress rule, the access level is referenced under the sources field, which is a required field as documented in the ingress rule reference. To allow ingress to resources, VPC Service Controls evaluates the sources and identityType attributes as an AND condition. The ingress rule uses the identity of the principal triggering the data transfer manually, not the service account specified in the data transfer configuration.
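In YAML form, this amounts to replacing the wildcard source in the earlier ingress sketch with the created access level (again assuming user@example.com stands in for the allowed identity):

# ingress.yaml: same rule as before, but the source is now the access level,
# so both the identity AND the caller's IP address must match.
- ingressFrom:
    identities:
    - user:user@example.com
    sources:
    - accessLevel: accessPolicies/987654321/accessLevels/project_2_al
  ingressTo:
    operations:
    - serviceName: bigquerydatatransfer.googleapis.com
      methodSelectors:
      - method: "*"
    resources:
    - "*"

gcloud access-context-manager perimeters update perimeter-2 \
    --policy=987654321 \
    --set-ingress-policies=ingress.yaml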
Re-run transfer with the configurations limiting access by IP address
To evaluate the effectiveness of applied configurations, trigger the transfer again using the following scenarios:
- Using an IP address in the range allowed by the access level referenced in the ingress rule.
- Using an IP address not allowed by the configurations.
Access from the allowed IP address should succeed, while access from non-allowed IP addresses should fail and result in a VPC Service Controls violation.
One easy way to test with a different IP address is to allow the IP address assigned while using the Google Cloud console, and then test while using Cloud Shell.
In Cloud Shell, run the following command to manually trigger a transfer, replacing both RUN_TIME and RESOURCE_NAME:
bq mk \
--transfer_run \
--run_time='RUN_TIME' \
RESOURCE_NAME
For example, the following sample command runs immediately for the transfer configuration 12345678-90ab-cdef-ghij-klmnopqrstuv in the project 1234567890.
NOW=$(TZ=GMT date +"%Y-%m-%dT%H:%M:%SZ");
bq mk \
--transfer_run \
--run_time=$NOW \
projects/1234567890/locations/us/transferConfigs/12345678-90ab-cdef-ghij-klmnopqrstuv
The observed output shows a VPC Service Controls violation, as expected, since the IP address is not allowed.
The observed violation is on the DataTransferService.StartManualTransferRuns method.
ingressViolations: [
0: {
servicePerimeter: "accessPolicies/987654321/servicePerimeters/perimeter-2"
targetResource: "projects/[PROJECT2_NUMBER]"
targetResourcePermissions: [0: "vpcsc.permissions.unavailable"]
}]
violationReason: "RESOURCES_NOT_IN_SAME_SERVICE_PERIMETER"
resource: {
labels: {
method: "google.cloud.bigquery.datatransfer.v1.DataTransferService.StartManualTransferRuns"
project_id: "project-2"
service: "bigquerydatatransfer.googleapis.com"
}
type: "audited_resource"
}
severity: "ERROR"
5. Starting Data Transfer While Protecting Cloud Storage Service
Since we are performing a transfer from Cloud Storage to BigQuery, let's add Cloud Storage among the services protected by VPC Service Controls and see if the transfer remains successful.
In the perimeter-2 configuration, add the Cloud Storage API as one of the Restricted Services, along with the BigQuery Data Transfer API.
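If you maintain the perimeter with gcloud, the equivalent update would look something like the following (a sketch, assuming the perimeter and policy IDs used throughout this codelab):

gcloud access-context-manager perimeters update perimeter-2 \
    --policy=987654321 \
    --add-restricted-services=storage.googleapis.com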
After securing the Cloud Storage API, wait for the next scheduled data transfer, or manually trigger a transfer using the following steps:
- Go to the BigQuery page in the Google Cloud console.
- Click Data transfers.
- Select your transfer from the list; for this codelab, we are using Codelab Transfer.
- Click Run transfer now.
- Click OK.
Another transfer will be initiated. You may need to refresh the page to see it. This time the transfer will fail with a VPC Service Controls violation.
Investigate Cloud Storage VPC Service Controls violation
Filter audit logs using the vpcServiceControlsUniqueIdentifier as seen in the transfer Summary.
The observed violation is a RESOURCES_NOT_IN_SAME_SERVICE_PERIMETER egress violation with the following details:
- The principal is the service account configured in the Data Transfer Service (whether the transfer is triggered manually or runs on schedule, the denied principal is the same).
- The affected service is Cloud Storage.
- The source of the request is the project where the Data Transfer Service is configured: project-2.
- The target project is the project where the Cloud Storage object is located: project-1.
principalEmail: "codelab-sa@project-2.iam.gserviceaccount.com"
egressViolations: [
0: {
servicePerimeter: "accessPolicies/987654321/servicePerimeters/perimeter-2"
source: "projects/[PROJECT2_NUMBER]"
sourceType: "Resource"
targetResource: "projects/[PROJECT1_NUMBER]"
targetResourcePermissions: [0: "storage.objects.get"]
}]
labels: {
method: "google.storage.objects.get"
project_id: "project-2"
service: "storage.googleapis.com"
}
Fix the Cloud Storage egress violation
To fix the egress violation, we have to use an egress rule that allows traffic from the denied service account towards the project containing the Cloud Storage objects.
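A hedged YAML sketch of such an egress rule follows, assuming egress.yaml is a hypothetical file name and PROJECT1_NUMBER is the project number of project-1. Note that --set-egress-policies replaces the perimeter's entire egress policy, so keep all egress rules in the same file.

# egress.yaml: allow codelab-sa to reach Cloud Storage in project-1.
- egressFrom:
    identities:
    - serviceAccount:codelab-sa@project-2.iam.gserviceaccount.com
  egressTo:
    operations:
    - serviceName: storage.googleapis.com
      methodSelectors:
      - method: "*"
    resources:
    - projects/PROJECT1_NUMBER

gcloud access-context-manager perimeters update perimeter-2 \
    --policy=987654321 \
    --set-egress-policies=egress.yaml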
After modifying the service perimeter perimeter-2, repeat the process to trigger the transfer again. The transfer will not show an error.
6. Copy BigQuery Dataset from project-2 to project-1
After confirming that we can transfer data from the Cloud Storage bucket in project-1 to the BigQuery dataset in project-2, let's copy the BigQuery dataset from project-2 to project-1, while the BigQuery API is protected by VPC Service Controls.
To create and copy the dataset, we will use the bq mk command, which uses the bq tool.
Create destination dataset in project-1
Before copying the dataset, the destination dataset has to be created first. To create the destination dataset, run the following command, which creates a dataset named copied_dataset in project project-1 with us as the location.
bq mk \
--dataset \
--location=us \
project-1:copied_dataset
Protect the BigQuery service in project-2 with VPC Service Controls
Modify the configuration of the perimeter perimeter-2 and add the BigQuery API as a protected service, along with the BigQuery Data Transfer and Cloud Storage services.
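With gcloud, this would be along the lines of:

gcloud access-context-manager perimeters update perimeter-2 \
    --policy=987654321 \
    --add-restricted-services=bigquery.googleapis.com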
Initiate dataset copy
To copy the dataset, run the following bq mk command, which copies the dataset codelab_bqtransfer_dataset in project project-2 to the dataset copied_dataset in project-1, overwriting the dataset content, if any.
bq mk \
--transfer_config \
--project_id=project-1 \
--target_dataset=copied_dataset \
--data_source=cross_region_copy \
--display_name='Dataset from project-2 to project-1' \
--params='{
"source_dataset_id":"codelab_bqtransfer_dataset",
"source_project_id":"project-2",
"overwrite_destination_table":"true"
}'
The command itself will run successfully, and the transfer configuration that starts the dataset copy operation is created. However, copying the dataset will fail with a VPC Service Controls violation.
To find the corresponding VPC Service Controls violation details, check the logs in project-2 (the source dataset project) with the following log query. The log query filters logs on the BigQuery service and the resource name of the dataset being copied (codelab_bqtransfer_dataset).
resource.labels.service="bigquery.googleapis.com"
protoPayload.metadata.resourceNames:"datasets/codelab_bqtransfer_dataset"
The observed VPC Service Controls violation is an egress violation from project-2 to project-1.
egressViolations: [
0: {
servicePerimeter: "accessPolicies/987654321/servicePerimeters/perimeter-2"
source: "projects/[PROJECT-2-NUMBER]"
sourceType: "Resource"
targetResource: "projects/[PROJECT-1-NUMBER]"
targetResourcePermissions: [
0: "bigquery.transfers.update"
1: "bigquery.transfers.get"
2: "bigquery.jobs.create"
]
}
]
method: "bigquery.tables.getData"
service: "bigquery.googleapis.com"
Fix all BigQuery violations and start dataset copy again
To fix the egress violation, we need to create an egress rule that allows the denied principal. The denied principal is the one running the bq mk command.
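A hedged sketch of the additional egress rule, assuming user@example.com stands in for the denied principal and PROJECT1_NUMBER for the project number of project-1; append it to the egress policy file created earlier and re-apply the file with --set-egress-policies so the existing Cloud Storage rule is preserved.

# Allow the principal running the dataset copy to reach BigQuery in project-1.
- egressFrom:
    identities:
    - user:user@example.com
  egressTo:
    operations:
    - serviceName: bigquery.googleapis.com
      methodSelectors:
      - method: "*"
    resources:
    - projects/PROJECT1_NUMBER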
Once the egress rule is in place on the perimeter perimeter-2, run the same command to copy the dataset. This time, it should copy the dataset successfully with no VPC Service Controls violation.
7. Cleanup
While there is no separate charge for using VPC Service Controls, it's a best practice to clean up the setup used in this lab. You can also delete the Cloud projects to avoid incurring charges; deleting a Cloud project stops billing for all the resources used within that project. Several of these steps can also be done from the command line, as sketched after the list below.
- To delete the Cloud Storage bucket, complete the following steps:
- In the Google Cloud console, go to the Cloud Storage Buckets page.
- Select the checkbox of the bucket to delete, and then click Delete.
- In the overlay window that appears, confirm you want to delete the bucket and its contents.
- To delete the BigQuery dataset, complete the following steps:
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, expand your project and select a dataset.
- Expand the three-dot menu and click Delete.
- In the Delete dataset dialog, type delete into the field, and then click Delete.
- To delete the service perimeter, complete the following steps:
- In the Google Cloud console, select Security, and then VPC Service Controls at the level where the access policy is scoped; in this case, at the folder level.
- In the VPC Service Controls page, in the table row corresponding to the perimeter that you want to delete, select the Delete icon.
- To delete the Access Level, complete the following steps:
- In the Google Cloud console, open the Access Context Manager page at the folder scope.
- In the grid, identify the row for the access level that you want to delete, select the three-dot menu, and then select Delete.
- To shut down the projects, complete the following steps:
- In the Google Cloud console, go to the IAM & Admin Settings page of the project you want to delete.
- On the IAM & Admin Settings page, select Shutdown.
- Enter the project ID, and select Shutdown anyway.
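A minimal command-line sketch of several of these cleanup steps, using the resources created in this codelab:

# Delete the bucket and its objects.
gcloud storage rm --recursive gs://codelab-bqtransfer-bucket

# Delete the BigQuery datasets (and their tables) in both projects.
bq rm -r -f --dataset project-2:codelab_bqtransfer_dataset
bq rm -r -f --dataset project-1:copied_dataset

# Delete the service perimeter and the access level.
gcloud access-context-manager perimeters delete perimeter-2 --policy=987654321
gcloud access-context-manager levels delete project_2_al --policy=987654321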
8. Congratulations!
In this codelab, you created a VPC Service Controls perimeter, enforced it, and troubleshot the resulting violations.
Learn more
You can explore the following scenarios as well:
- Add project-1 to a different perimeter that also protects BigQuery, BigQuery Data Transfer Service, and Cloud Storage.
- Perform BigQuery data transfers from other supported sources.
- Restrict user access by other attributes, like location or device policy.
License
This work is licensed under a Creative Commons Attribution 2.0 Generic License.