VPC Service Controls - BigQuery Protection Codelab I

About this codelab

Last updated Aug 16, 2024
Written by Robert Basomingera, Juan Silva

1. Introduction

In this codelab, you will learn how to protect the BigQuery API using VPC Service Controls. The codelab starts with no API service protected by the service perimeter, allowing queries to be run on public datasets, and the results to be saved in a project table. The query runs in one project and the table (where results are saved) is created in another project, mimicking a setup where data can be stored in one project but needs to be accessed using a different project.

Next, we will introduce a service perimeter to protect the data project. You will learn how to fix observed violations using ingress rules and egress rules, and later add an access level to restrict access using internal IP addresses. The goals of this codelab are:

  • Understand how to fix ingress and egress violations using ingress and egress rules respectively.
  • Understand why a specific violation occurred.
  • Analyze the scope of the applied violation fix.
  • Modify the fix (ingress / egress rule) to change its scope by leveraging the option to allow traffic from internal IP addresses in a VPC network using access levels.

2. Resources Setup and Requirements

Before you begin

In this codelab, we assume that you already know:

Setup

Our initial setup is designed as follows:

The initial design with service perimeter protecting no API.

Create a Regular Service Perimeter

In this codelab, we will use a regular service perimeter protecting project-1.

Create Compute Engine VM

In this codelab, we will use one Compute Engine instance in project-2, located in us-central1 and using the default VPC network, named default.

Cost

You need to enable billing in the Google Cloud console to use cloud resources and APIs. We advise shutting down the resources you used to avoid incurring charges beyond this codelab. New Google Cloud users are eligible for the $300 USD Free Trial program.

The resources that incur cost are BigQuery and the Compute Engine instance. You can estimate the cost using the BigQuery pricing calculator and the Compute Engine pricing calculator.

3. Access to BigQuery without VPC Service Controls Restrictions

Query Public Dataset and Save Results in project-1

  1. Access project-2 and project-1 and verify that you can reach the BigQuery API by visiting the BigQuery Studio page. You should be able to, because even though project-1 is in a service perimeter, the perimeter is not yet protecting any service.
  2. From project-2, run the following query against a public dataset.

SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 10;

After running the query to the public dataset (while remaining in project-2):

  1. Click Save Results and select BigQuery table.
  2. Select project-1 as the destination project.
  3. Name the dataset codelab_dataset. (Select CREATE NEW DATASET, unless you are using an existing dataset.)
  4. Name the table codelab-table.
  5. Click Save.

The public dataset data has been successfully stored in project-1 as a result of executing the query from project-2.
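The same save operation can also be sketched from the command line. This is a minimal sketch, not part of the original codelab: it assumes the bq CLI is installed and authenticated, and that project-1 and project-2 are your actual project IDs. The helper name save_public_names is hypothetical, and it is only defined here, not run.

```shell
# Sketch only: project IDs are the codelab's placeholders.
JOB_PROJECT="project-2"                                # project where the query job runs
DEST_DATASET="project-1:codelab_dataset"               # project:dataset
DEST_TABLE="project-1:codelab_dataset.codelab-table"   # project:dataset.table

QUERY='SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 10;'

# Hypothetical helper: creates the dataset (skip this step if it already
# exists), then runs the query in project-2 and writes the results to
# the table in project-1.
save_public_names() {
  bq mk --dataset "${DEST_DATASET}"
  bq --project_id="${JOB_PROJECT}" query --nouse_legacy_sql \
    --destination_table="${DEST_TABLE}" "${QUERY}"
}
```

Calling save_public_names would run both commands; at this stage of the codelab nothing is restricted, so the call is expected to succeed given sufficient IAM permissions.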

Query Dataset saved in project-1 from project-2

While remaining in project-2 BigQuery Studio, run the following query to select data from:

  • Project: project-1
  • Dataset: codelab_dataset
  • Table: codelab-table

SELECT name, total
FROM `project-1.codelab_dataset.codelab-table`
ORDER BY total DESC
LIMIT 10;

The query should run successfully, because neither project-2 nor project-1 is restricted from using BigQuery. Access to BigQuery is allowed from and to anywhere, as long as the user has the appropriate IAM permissions.

Codelab Setup without VPC Service Controls service perimeters. This diagram illustrates the process when a principal queries a BigQuery dataset. Each BigQuery query initiates a BigQuery job, which then performs the actual operation, in this scenario, retrieving data. Principal access is demonstrated from a Compute Engine instance and from the internet, while querying from a public dataset and from a separate Google Cloud project. The process to query the data (GetData) is successful, without being blocked by VPC Service Controls.

4. Protect BigQuery API in Source Dataset Project

Modify the configuration of perimeter-1 so that it restricts the BigQuery API service, with project-1 as the protected resource.

Configuring service perimeter
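For readers who prefer the command line, the console change can be approximated with gcloud. This is a hedged sketch: the POLICY_ID value is a placeholder (the codelab redacts the real access policy ID), and the helper name restrict_bigquery is hypothetical. The function is defined but not run.

```shell
POLICY_ID="REDACTED"      # placeholder: numeric ID of your scoped access policy
PERIMETER="perimeter-1"   # perimeter name used throughout the codelab
SERVICE="bigquery.googleapis.com"

# Hypothetical helper: adds the BigQuery API to the perimeter's
# restricted services. project-1 is assumed to already be a
# protected resource of the perimeter.
restrict_bigquery() {
  gcloud access-context-manager perimeters update "${PERIMETER}" \
    --policy="${POLICY_ID}" \
    --add-restricted-services="${SERVICE}"
}
```

Perimeter changes can take some time to propagate, so a query run immediately after the update may still succeed.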

Verify Service Perimeter Enforcement

From project-2, run the following query in BigQuery Studio, as in the previous step:

SELECT name, total
FROM `project-1.codelab_dataset.codelab-table`
ORDER BY total DESC
LIMIT 10;

A VPC Service Controls RESOURCES_NOT_IN_SAME_SERVICE_PERIMETER violation will occur.

Egress VPC Service Controls violation

The violation audit log is located in project-1, because that is where the attempt to cross the perimeter occurred. Filter the logs by the observed vpcServiceControlsUniqueId (replace VPC_SC_DENIAL_UNIQUE_ID with the observed unique ID).

severity=ERROR
resource.type="audited_resource"
protoPayload.metadata.@type="type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"
protoPayload.metadata.vpcServiceControlsUniqueId="[VPC_SC_DENIAL_UNIQUE_ID]"
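
The same filter can be passed to gcloud instead of the Logs Explorer. This is a minimal sketch, assuming the gcloud CLI is authenticated with permission to read logs in project-1. UNIQUE_ID is a placeholder for the ID shown in the error message, and the helper name read_violation is hypothetical. The function is defined but not run.

```shell
UNIQUE_ID="VPC_SC_DENIAL_UNIQUE_ID"   # placeholder: replace with the observed ID

# Build the filter clause by clause, joined with AND.
FILTER='severity=ERROR'
FILTER="${FILTER} AND resource.type=\"audited_resource\""
FILTER="${FILTER} AND protoPayload.metadata.@type=\"type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata\""
FILTER="${FILTER} AND protoPayload.metadata.vpcServiceControlsUniqueId=\"${UNIQUE_ID}\""

# Hypothetical helper: reads the matching audit log entries from project-1.
read_violation() {
  gcloud logging read "${FILTER}" --project="project-1" --limit=5 --format=json
}
```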

The violation is an egressViolations with:

  • principalEmail: [user account running the query]
  • callerIp: [the IP address of the user agent running the query]

"egressViolations": [
  {
    "targetResource": "projects/project-2",
    "sourceType": "Resource",
    "source": "projects/project-1",
    "servicePerimeter": "accessPolicies/REDACTED/servicePerimeters/perimeter-1",
    "targetResourcePermissions": ["bigquery.jobs.create"]
  }
],

5. Fixing the Violation to Create BigQuery Job

Egress traffic fails for BigQuery job creation. This diagram illustrates what happens when a principal runs a query from project-2 against a dataset in project-1. The operation to create a BigQuery job from the dataset project (project-1) in the project where the query runs (project-2) fails with a VPC Service Controls egress violation, because service perimeter perimeter-1 protects the BigQuery API. With the perimeter in place, no BigQuery API request can be initiated from project-1 toward a destination outside the perimeter, or from outside the perimeter toward the protected project, unless the service perimeter configuration allows it.

An egress violation can be fixed by creating an egress rule which is based on the:

  • source (FROM): namely the user email address and context (e.g., caller IP address, device state, location, etc.)
  • destination (TO): namely the target resource, service, and method or permission.

To fix the observed egress violation, create an egress rule that allows traffic toward the targetResource (project-2) by the user account running the query (user@example.com) on the BigQuery service and the bigquery.jobs.create method/permission.

Egress violation Fix configurations.

Expected behavior from the configured egress rule:

  • FROM | Identities: only the specified identity user@example.com must be allowed to cross the perimeter boundary.
  • TO | projects: the specified identity can cross the perimeter boundaries only if the destination is the specified project project-2.
  • TO | Services: the specified identity can initiate traffic outside the perimeter, toward the specified project only if the API call is for the specified service and method. Otherwise, for example if they try a different service protected by the service perimeter, the operation will be blocked because other services are not allowed.
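
The egress rule described above can be sketched as a YAML policy file applied with gcloud. This is an illustration under stated assumptions, not the codelab's own procedure: POLICY_ID and PROJECT_2_NUMBER are placeholders (rules reference project numbers, which the codelab does not give), and the file name egress.yaml and helper name apply_egress_rule are hypothetical. The block only writes the file; the gcloud call is wrapped in a function that is not run.

```shell
POLICY_ID="REDACTED"              # placeholder: numeric access policy ID
PROJECT_2_NUMBER="000000000000"   # placeholder: project number of project-2

# FROM: only user@example.com. TO: project-2, BigQuery, jobs.create only.
cat > egress.yaml <<EOF
- egressFrom:
    identities:
    - user:user@example.com
  egressTo:
    operations:
    - serviceName: bigquery.googleapis.com
      methodSelectors:
      - permission: bigquery.jobs.create
    resources:
    - projects/${PROJECT_2_NUMBER}
EOF

# Hypothetical helper. Note that --set-egress-policies replaces ALL
# existing egress rules of the perimeter with the contents of the file.
apply_egress_rule() {
  gcloud access-context-manager perimeters update perimeter-1 \
    --policy="${POLICY_ID}" \
    --set-egress-policies=egress.yaml
}
```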

Test the Fix: Egress Rule

Once the egress rule is in place, run the same query.

SELECT name, total
FROM `project-1.codelab_dataset.codelab-table`
ORDER BY total DESC
LIMIT 10;

Another violation will occur, this time a NO_MATCHING_ACCESS_LEVEL ingress violation. The new violation differs from the first one in its target project and method.

Ingress VPC Service Controls violation

The new violation is an ingress violation with:

  • principalEmail: [user account running the query]
  • callerIp: [the IP address of the user agent running the query]

"ingressViolations": [
  {
    "servicePerimeter": "accessPolicies/REDACTED/servicePerimeters/perimeter-1",
    "targetResource": "projects/project-1",
    "targetResourcePermissions": ["bigquery.tables.getData"]
  }
]

The violation for the bigquery.tables.getData method is caused by an API call that the BigQuery job initiates when it tries to get data from the BigQuery table.

6. Fixing Violation to Get BigQuery Table Data

An ingress rule fixes an ingress violation while providing granular control over who is allowed to cross the service perimeter boundary, along with the context of the allowed access, such as the source/target project and the API method they can access.

An ingress violation is fixed by an ingress rule which is configured with:

  • source (FROM): namely the user email address and context (e.g., caller IP address, device state, location, etc.)
  • destination (TO): namely the target resource, service, and method or permission.

The ingress rule will permit traffic towards project-1 by the specified user on the specified service and method.

Ingress violation fix

Expected behavior from the configured ingress rule:

  • FROM | Identities: only the specified identity user@example.com must be allowed to cross the perimeter boundary.
  • TO | projects: the specified identity can cross perimeter boundaries only if the destination is the specified project project-1.
  • TO | Services: the specified identity can initiate traffic inside the perimeter only if the API call is for the BigQuery API and the specified method bigquery.tables.getData.
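
An ingress rule with this scope can be sketched the same way as a YAML policy file. The values POLICY_ID and PROJECT_1_NUMBER are placeholders, and the file name ingress.yaml and helper name apply_ingress_rule are hypothetical. The source is left open to any network origin here (accessLevel "*"), which matches the rule as described at this point in the codelab. Only the file is written; the gcloud call is not run.

```shell
POLICY_ID="REDACTED"              # placeholder: numeric access policy ID
PROJECT_1_NUMBER="111111111111"   # placeholder: project number of project-1

# FROM: user@example.com from any source. TO: project-1, BigQuery,
# tables.getData only.
cat > ingress.yaml <<EOF
- ingressFrom:
    identities:
    - user:user@example.com
    sources:
    - accessLevel: "*"
  ingressTo:
    operations:
    - serviceName: bigquery.googleapis.com
      methodSelectors:
      - permission: bigquery.tables.getData
    resources:
    - projects/${PROJECT_1_NUMBER}
EOF

# Hypothetical helper. --set-ingress-policies replaces ALL existing
# ingress rules of the perimeter with the contents of the file.
apply_ingress_rule() {
  gcloud access-context-manager perimeters update perimeter-1 \
    --policy="${POLICY_ID}" \
    --set-ingress-policies=ingress.yaml
}
```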

Running the same query should now succeed without VPC Service Controls violations.

We have successfully restricted the BigQuery API in project-1 so that it can be used only by user@example.com and not by user2@example.com.

VPC Service Controls perimeter protecting BigQuery API This diagram illustrates how two different principals attempt to query the same dataset. Access by user2@example.com (dotted blue lines) is denied by VPC Service Controls, because they are not allowed to run BigQuery operations from or toward project-1 by the service perimeter configuration. Access by user@example.com (green solid line) is successful, because they are allowed by VPC Service Controls configurations, to perform operations from and toward project-1.

7. Restrict Traffic Allowed by the Service Perimeter based on Internal IP Address

The current configuration lets the designated user run queries on BigQuery in project-1 from any location on the internet, as long as they use their account and have been granted the IAM permission to query the data. From a security perspective, this means that if the account is compromised, anyone who gains access to it can access the BigQuery data without any additional restrictions.

Further restrictions can be implemented by using access levels in ingress and egress rules to specify the user context. For instance, you can permit access based on source IP in conjunction with a previously configured ingress rule that authorizes access by caller identity. Access by source IP can be based either on public IP CIDR ranges, provided the user client has a public IP address assigned to it, or on an internal IP address if the user client operates from a Google Cloud project.

Create Access Level with an Internal IP Address Access Condition

Under the same folder to which the access policy is scoped, open the Access Context Manager page to create an access level.

  1. On the Access Context Manager page, select CREATE ACCESS LEVEL.
  2. In the New Access Level pane:
    1. Provide a title: you can use codelab-al.
    2. In the Conditions section, click IP subnetworks.
    3. Select Private IP tab and click SELECT VPC NETWORKS.
    4. From the Add VPC Networks pane, you can either browse and find the default network or manually enter the full network name in format of //compute.googleapis.com/projects/project-2/global/networks/default.
    5. Click ADD VPC Network.
    6. Click SELECT IP SUBNETs.
    7. Select the region where the VM instance is located. For this codelab, it is us-central1.
    8. Click SAVE.

We have created an access level, which is still not enforced on any perimeter or ingress/egress policy.

Access level configured with IP Subnetworks
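
An equivalent access level can be sketched as a basic level spec applied with gcloud. Several values here are assumptions rather than facts from the codelab: POLICY_ID is a placeholder, 10.128.0.0/20 is the range that an auto mode default network normally assigns to us-central1 (check your own subnet), and the file name codelab-al.yaml and helper name create_access_level are hypothetical. Only the file is written; the gcloud call is not run.

```shell
POLICY_ID="REDACTED"   # placeholder: numeric access policy ID

# One condition: requests must originate from the listed subnet range
# of the default VPC network in project-2.
cat > codelab-al.yaml <<'EOF'
- vpcNetworkSources:
  - vpcSubnetwork:
      network: //compute.googleapis.com/projects/project-2/global/networks/default
      vpcIpSubnetworks:
      - 10.128.0.0/20
EOF

# Hypothetical helper: creates the access level from the spec file.
create_access_level() {
  gcloud access-context-manager levels create codelab-al \
    --title="codelab-al" \
    --basic-level-spec=codelab-al.yaml \
    --policy="${POLICY_ID}"
}
```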

Add Access Level to the Ingress Rule

To enforce that the user permitted by the ingress rule is also verified against the access level, configure the access level in the ingress rule. The ingress rule that authorizes access to query data is in perimeter-1. Change the ingress rule so that its source is the access level codelab-al.

Access level with VPC network
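
In a YAML policy file, this change amounts to replacing the open source with the full resource name of the access level. The sketch below assumes the placeholders POLICY_ID and PROJECT_1_NUMBER; the file name ingress-al.yaml and helper name apply_ingress_with_level are hypothetical. Only the file is written; the gcloud call is not run.

```shell
POLICY_ID="REDACTED"              # placeholder: numeric access policy ID
PROJECT_1_NUMBER="111111111111"   # placeholder: project number of project-1
ACCESS_LEVEL="accessPolicies/${POLICY_ID}/accessLevels/codelab-al"

# Same identity and destination as before, but the request must now
# also satisfy the codelab-al access level.
cat > ingress-al.yaml <<EOF
- ingressFrom:
    identities:
    - user:user@example.com
    sources:
    - accessLevel: ${ACCESS_LEVEL}
  ingressTo:
    operations:
    - serviceName: bigquery.googleapis.com
      methodSelectors:
      - permission: bigquery.tables.getData
    resources:
    - projects/${PROJECT_1_NUMBER}
EOF

# Hypothetical helper. Replaces ALL existing ingress rules of the perimeter.
apply_ingress_with_level() {
  gcloud access-context-manager perimeters update perimeter-1 \
    --policy="${POLICY_ID}" \
    --set-ingress-policies=ingress-al.yaml
}
```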

Testing New Configurations

After the access level is added to the ingress rule, the same BigQuery query fails unless it is executed from a client in the VPC network default of project-2. To verify this behavior, execute the query from the Google Cloud console while the endpoint device is connected to the internet. The query fails and reports an ingress violation.

The same query can be run from the VPC network default, located in project-2. However, at this point, executing the query from a Compute Engine instance in project-2 that uses the VPC network default also fails. This is because the ingress rule still allows only the principal user@example.com, whereas the VM uses the Compute Engine default service account.

To successfully run the same command from the Compute Engine instance in project-2, ensure that:

  • The VM has access scope to use BigQuery API. This can be done by selecting Allow full access to all Cloud APIs as the VM access scope.
  • The service account attached to the VM needs the IAM permissions to:
    • Create BigQuery Jobs in project-2
    • Get BigQuery data from the BigQuery table located in project-1
  • The default Compute Engine service account needs to be allowed by the ingress and egress rules.
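
The IAM part of this checklist could be sketched with gcloud as follows. The role choices are assumptions, not taken from the codelab: roles/bigquery.jobUser includes bigquery.jobs.create and roles/bigquery.dataViewer includes bigquery.tables.getData, but a project-level data viewer grant is broader than strictly needed. PROJECT_2_NUMBER is a placeholder, and the helper name grant_sa_permissions is hypothetical. The function is defined but not run.

```shell
PROJECT_2_NUMBER="000000000000"   # placeholder: project number of project-2
# Default Compute Engine service account of project-2.
SA="${PROJECT_2_NUMBER}-compute@developer.gserviceaccount.com"

# Hypothetical helper: lets the service account create jobs in
# project-2 and read table data in project-1.
grant_sa_permissions() {
  gcloud projects add-iam-policy-binding project-2 \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.jobUser"
  gcloud projects add-iam-policy-binding project-1 \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.dataViewer"
}
```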

Now we need to add the Compute Engine default service account in the ingress rules (to allow getting data from BigQuery table) and to the egress rule (to allow creation of BigQuery jobs).

VPC Service Controls service perimeter configuration with access levels
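
The resulting rule set might look like the following pair of policy files. This is a sketch under the same assumptions as before: POLICY_ID and the two project numbers are placeholders, and the file names and the helper name apply_final_rules are hypothetical. Only the files are written; the gcloud call is not run.

```shell
POLICY_ID="REDACTED"              # placeholder: numeric access policy ID
PROJECT_1_NUMBER="111111111111"   # placeholder: project number of project-1
PROJECT_2_NUMBER="000000000000"   # placeholder: project number of project-2
SA="serviceAccount:${PROJECT_2_NUMBER}-compute@developer.gserviceaccount.com"

# Ingress: user or service account, only via the codelab-al access level.
cat > ingress-final.yaml <<EOF
- ingressFrom:
    identities:
    - user:user@example.com
    - ${SA}
    sources:
    - accessLevel: accessPolicies/${POLICY_ID}/accessLevels/codelab-al
  ingressTo:
    operations:
    - serviceName: bigquery.googleapis.com
      methodSelectors:
      - permission: bigquery.tables.getData
    resources:
    - projects/${PROJECT_1_NUMBER}
EOF

# Egress: user or service account may create BigQuery jobs in project-2.
cat > egress-final.yaml <<EOF
- egressFrom:
    identities:
    - user:user@example.com
    - ${SA}
  egressTo:
    operations:
    - serviceName: bigquery.googleapis.com
      methodSelectors:
      - permission: bigquery.jobs.create
    resources:
    - projects/${PROJECT_2_NUMBER}
EOF

# Hypothetical helper: replaces the perimeter's ingress and egress rules.
apply_final_rules() {
  gcloud access-context-manager perimeters update perimeter-1 \
    --policy="${POLICY_ID}" \
    --set-ingress-policies=ingress-final.yaml \
    --set-egress-policies=egress-final.yaml
}
```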

From a Compute Engine instance in project-2 on the default VPC network, run the following bq query command:

bq query --nouse_legacy_sql \
'SELECT name, total
FROM
`project-1.codelab_dataset.codelab-table`
ORDER BY total DESC
LIMIT 10;'

With the current configuration, the BigQuery command will succeed only if:

  • run on a VM using the default VPC network in project-2, and
  • located in the specified us-central1 region (IP subnetwork), and
  • run using the default Compute Engine service account configured in the service perimeter.

The bq query command will fail if run from anywhere else, including:

  • if run on a VM using the default VPC network in project-2 but located in a different region than the subnet added in access level, or
  • if run by the user user@example.com with a user client on the internet.

Service perimeter allowing access for the Compute Engine default service account. This diagram illustrates access initiated by the same principal, user@example.com, from two different locations: the internet and a Compute Engine instance. Access to BigQuery directly from the internet (blue dotted lines) is blocked by VPC Service Controls, while access from a VM (green solid lines), made while impersonating the Compute Engine default service account, is allowed. The access is allowed because the service perimeter is configured to allow access to protected resources from an internal IP address.

8. Cleanup

While there is no separate charge for using VPC Service Controls when the service is not in use, it's a best practice to clean up the setup used in this codelab. You can also delete the VM instance and BigQuery datasets, or the Google Cloud projects, to avoid incurring charges. Deleting a Cloud project stops billing for all the resources used within that project.

  • To delete the VM instance, complete the following steps:
    • In the Google Cloud console, go to the VM instances page.
    • Select the checkbox on the left side of the VM instance name, select Delete, and then click Delete again to confirm.
  • To delete the service perimeter, complete the following steps:
    • In the Google Cloud console, select Security, and then VPC Service Controls at the level where access policy is scoped, in this case, at the folder level.
    • In the VPC Service Controls page, in the table row corresponding to the perimeter that you want to delete, click Delete.
  • To delete the Access Level, complete the following steps:
    • In the Google Cloud console, open the Access Context Manager page at the folder scope.
    • In the grid, identify the row for the access level that you want to delete, select the three-dot menu, and then select Delete.
  • To shut down the projects, complete the following steps:
    • In the Google Cloud console, go to the IAM & Admin Settings page of the project you want to delete.
    • On the IAM & Admin Settings page, select Shutdown.
    • Enter the project ID, and select Shutdown anyway.
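
The console steps above have rough gcloud equivalents, sketched below. The VM name and zone are hypothetical (the codelab does not state them), POLICY_ID is a placeholder, and the helper name cleanup_codelab is hypothetical. The function is defined but not run, and each command will still ask for confirmation when called.

```shell
POLICY_ID="REDACTED"     # placeholder: numeric access policy ID
VM_NAME="codelab-vm"     # hypothetical: name of the VM you created
ZONE="us-central1-a"     # hypothetical: zone of the VM in us-central1

# Hypothetical helper. The perimeter is deleted before the access
# level, because the level is referenced by the perimeter's ingress rule.
cleanup_codelab() {
  gcloud compute instances delete "${VM_NAME}" --zone="${ZONE}" --project="project-2"
  gcloud access-context-manager perimeters delete perimeter-1 --policy="${POLICY_ID}"
  gcloud access-context-manager levels delete codelab-al --policy="${POLICY_ID}"
  gcloud projects delete project-1
  gcloud projects delete project-2
}
```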

9. Congratulations!

In this codelab, you created a VPC Service Controls perimeter, enforced it, and troubleshot it.

Learn More

You can explore the following scenarios as well:

  • Run the same query on the public dataset after the project is protected by VPC Service Controls.
  • Add project-2 in the same perimeter as project-1.
  • Add project-2 in its own perimeter and keep project-1 in the current perimeter.
  • Run queries to update data in the table, not just to retrieve data.

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.