1. Introduction
Spanner is a fully managed, horizontally scalable, globally distributed database service that is great for both relational and non-relational workloads.
Spanner's Cassandra interface lets you take advantage of Spanner's fully managed, scalable, and highly available infrastructure using familiar Cassandra tools and syntax.
What you'll learn
- How to set up a Spanner instance and database.
- How to convert your Cassandra schema and data model.
- How to deploy and configure dual writes for incoming data.
- How to bulk export your historical data from Cassandra to Spanner.
- How to validate data to ensure data integrity throughout the migration process.
- How to point your application to Spanner instead of Cassandra.
What you'll need
- A Google Cloud project that is connected to a billing account.
- Access to a machine with the gcloud CLI installed and configured, or use the Google Cloud Shell.
- A web browser, such as Chrome or Firefox.
2. Setup and requirements
Create a GCP project
Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.
- The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
- The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one. Alternatively, you can try your own and see if it's available. It can't be changed after this step and remains for the duration of the project.
- For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.
Billing setup
Next, follow the manage billing user guide and enable billing in the Cloud Console. New Google Cloud users are eligible for the $300 USD Free Trial program. To avoid incurring charges beyond this tutorial, you can shut down the Spanner instance at the end of the codelab by following "Step 10: Cleaning up".
Start Cloud Shell
While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.
From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar:
It should only take a few moments to provision and connect to the environment. When it is finished, you should see something like this:
This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.
Next up
Next, you will deploy a Cassandra cluster.
3. Deploy Cassandra cluster (Origin)
For this codelab, we'll set up a single-node Cassandra cluster on Compute Engine.
1. Create a GCE VM for Cassandra
To create an instance, use the gcloud compute instances create command.
gcloud compute instances create cassandra-origin \
--machine-type=e2-medium \
--image-family=ubuntu-2004-lts \
--image-project=ubuntu-os-cloud \
--tags=cassandra-migration \
--boot-disk-size=20GB
2. Install Cassandra
# Run these commands on the VM (for example, after connecting with: gcloud compute ssh cassandra-origin)

# Install Java (Cassandra dependency)
sudo apt-get update
sudo apt-get install -y openjdk-11-jre-headless

# Add Cassandra repository
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -

# Install Cassandra
sudo apt-get update
sudo apt-get install -y cassandra
3. Create a keyspace and table
We'll use a users table as the example and create a keyspace called "analytics".
cqlsh <your-localhost-ip> 9042  # starts the CQL shell (the Debian package puts cqlsh on the PATH)
Inside cqlsh:
-- Create keyspace (adjust replication for production)
CREATE KEYSPACE analytics WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};

-- Use the keyspace
USE analytics;

-- Create the users table
CREATE TABLE users (
  id int PRIMARY KEY,
  active boolean,
  username text
);

-- Exit cqlsh
EXIT;
Leave the SSH session open or note the IP address of this VM (hostname -I).
Next up
Next, you will set up a Spanner instance and database.
4. Create a Spanner instance and database (Target)
In Spanner, an instance is a cluster of computing and storage resources that hosts one or more Spanner databases. You will need at least 1 instance to host a Spanner database for this codelab.
Check gcloud SDK version
Before creating an instance, make sure that the gcloud SDK in Cloud Shell has been updated to the required version, gcloud SDK 493.0.0 or later. You can find your gcloud SDK version by running the following command.
$ gcloud version | grep Google
Here's an example output:
Google Cloud SDK 489.0.0
If the version you're using is earlier than the required 493.0.0 version (489.0.0 in the previous example), then you need to upgrade your Google Cloud SDK by running the following command:
sudo apt-get update \
&& sudo apt-get --only-upgrade install google-cloud-cli-anthoscli google-cloud-cli-cloud-run-proxy kubectl google-cloud-cli-skaffold google-cloud-cli-cbt google-cloud-cli-docker-credential-gcr google-cloud-cli-spanner-migration-tool google-cloud-cli-cloud-build-local google-cloud-cli-pubsub-emulator google-cloud-cli-app-engine-python google-cloud-cli-kpt google-cloud-cli-bigtable-emulator google-cloud-cli-datastore-emulator google-cloud-cli-spanner-emulator google-cloud-cli-app-engine-go google-cloud-cli-app-engine-python-extras google-cloud-cli-config-connector google-cloud-cli-package-go-module google-cloud-cli-istioctl google-cloud-cli-anthos-auth google-cloud-cli-gke-gcloud-auth-plugin google-cloud-cli-app-engine-grpc google-cloud-cli-kubectl-oidc google-cloud-cli-terraform-tools google-cloud-cli-nomos google-cloud-cli-local-extract google-cloud-cli-firestore-emulator google-cloud-cli-harbourbridge google-cloud-cli-log-streaming google-cloud-cli-minikube google-cloud-cli-app-engine-java google-cloud-cli-enterprise-certificate-proxy google-cloud-cli
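If you want to automate the version check described above, note that versions have to be compared component by component, not as plain strings. The following is a minimal sketch under that assumption. It is written in Java only to match the application code later in this codelab, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class GcloudVersionCheck {
    // Minimum gcloud SDK version named in this codelab.
    static final String REQUIRED = "493.0.0";

    // Splits a dotted version such as "489.0.0" into {489, 0, 0}.
    static int[] parse(String version) {
        return Arrays.stream(version.trim().split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Returns true when 'installed' is earlier than 'required'.
    // Components are compared left to right; missing components count as 0.
    static boolean needsUpgrade(String installed, String required) {
        int[] a = parse(installed);
        int[] b = parse(required);
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return x < y;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsUpgrade("489.0.0", REQUIRED)); // true
        System.out.println(needsUpgrade("493.0.0", REQUIRED)); // false
    }
}
```

A numeric comparison matters because a plain string comparison would rank "1000.0.0" before "493.0.0".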
Enable the Spanner API
Inside Cloud Shell, make sure that your project ID is set up. Use the first command below to find the currently configured project ID. If the result is not the one you expect, use the second command to set the right one.
gcloud config get-value project
gcloud config set project [YOUR-DESIRED-PROJECT-ID]
Configure your default region to us-central1. Feel free to change this to a different region supported by Spanner regional configurations.
gcloud config set compute/region us-central1
Enable the Spanner API:
gcloud services enable spanner.googleapis.com
Create the Spanner instance
In this section, you will create either a free trial instance or a provisioned instance. Throughout this codelab, the Spanner Cassandra Adapter instance ID is cassandra-adapter-demo, stored in the SPANNER_INSTANCE_ID variable with the export command. Optionally, you can pick your own instance ID.
Create a free-trial Spanner instance
A Spanner 90-day free trial instance is available to anyone with a Google Account who has Cloud Billing enabled in their project. You aren't charged unless you choose to upgrade your free trial instance to a paid instance. Spanner Cassandra Adapter is supported in the free trial instance. If eligible, create a free trial instance by opening up Cloud Shell and running this command:
export SPANNER_INSTANCE_ID=cassandra-adapter-demo
export SPANNER_REGION=regional-us-central1
gcloud spanner instances create $SPANNER_INSTANCE_ID \
--config=$SPANNER_REGION \
--instance-type=free-instance \
--description="Spanner Cassandra Adapter demo"
Command output:
$ gcloud spanner instances create $SPANNER_INSTANCE_ID \
--config=$SPANNER_REGION \
--instance-type=free-instance \
--description="Spanner Cassandra Adapter demo"
Creating instance...done.
Create the database
Once your instance is running, you can create the database. The database is where you define your schema. You can also control who has access to the database, set up custom encryption, configure the optimizer, and set the retention period.
The database will be created in the instance with ID SPANNER_INSTANCE_ID.
To create a database, use the gcloud command line tool:
export SPANNER_DATABASE=analytics
gcloud spanner databases create $SPANNER_DATABASE \
--instance=$SPANNER_INSTANCE_ID
Command output:
$ gcloud spanner databases create $SPANNER_DATABASE \
--instance=$SPANNER_INSTANCE_ID
Creating database...done.
5. Migrate Cassandra Schema and data model to Spanner
The first phase of moving data from a Cassandra database to Spanner is transforming the existing Cassandra schema to match Spanner's structural and data type requirements.
To streamline this schema migration, Spanner provides an open-source tool known as the Spanner Cassandra schema tool.
Spanner Cassandra schema tool
Spanner Cassandra schema tool is a stand-alone open source tool for Spanner evaluation and schema migration. Its primary function is to automatically construct a Spanner schema based on the definitions found in an existing Cassandra schema. By analyzing the Cassandra table structures, data types, and primary key configurations, the tool generates equivalent Spanner table definitions, significantly reducing the manual effort typically involved in schema translation.
Export Cassandra schema
Before using the Spanner Cassandra schema tool, extract the schema from your current Cassandra cluster. Connect to the cluster through cqlsh and export the schema:
cqlsh [IP] -e "DESC SCHEMA" > orig_schema.cql
In this command, replace [IP] with the IP address or hostname of one of the nodes in your Cassandra cluster. The -e "DESC SCHEMA" part of the command instructs cqlsh to describe the entire schema of the Cassandra cluster. The output of this command, which contains the CREATE KEYSPACE and CREATE TABLE statements, is then redirected to a file named orig_schema.cql.
The orig_schema.cql file is a textual blueprint of your Cassandra schema. Its content should look like this:
CREATE KEYSPACE analytics WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE analytics.users (
id int PRIMARY KEY,
active boolean,
username text
) WITH additional_write_policy = '99p'
AND allow_auto_snapshot = true
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND memtable = 'default'
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND incremental_backups = true
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
Clone the Repository
To utilize the Spanner Cassandra schema tool, the next step involves obtaining the tool's source code. This is done by cloning the repository hosted on GitHub. Clone the Spanner Cassandra schema tool from GitHub by typing the following command in Cloud Shell:
git clone https://github.com/cloudspannerecosystem/spanner-cassandra-schema-tool.git
Then change directory to the "spanner-cassandra-schema-tool" directory where you will run the command.
cd spanner-cassandra-schema-tool
Install Dependencies
The Spanner Cassandra schema tool is written in the Go programming language and relies on external Go modules (libraries). These dependencies need to be downloaded before you can run the tool. Within the spanner-cassandra-schema-tool directory, execute the following command:
go mod download
Set Up Google Cloud Credentials
This tool uses Application Default Credentials (ADC) as the credential source for connecting to Spanner databases. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
Replace /path/to/your/service-account-file.json with the actual path to your downloaded service account key file. Setting this environment variable lets the Spanner Cassandra schema tool authenticate with your Google Cloud project and Spanner instance.
Usage
Once the dependencies are installed and the Google Cloud credentials are configured, you are ready to run the Spanner Cassandra schema tool to generate the Spanner schema from the exported Cassandra schema file. The command below assumes that the PROJECT_ID environment variable holds your project ID and that orig_schema.cql is in the current directory. From the spanner-cassandra-schema-tool directory in your terminal or Cloud Shell, execute the following go run command:
go run schema_converter.go \
--project $PROJECT_ID \
--instance $SPANNER_INSTANCE_ID \
--database $SPANNER_DATABASE \
--cql orig_schema.cql \
--dry-run
Running with the --dry-run option only generates the schema without applying it. Review and refine the data type mapping and primary key columns generated by the tool. Ensure that the Spanner data types accurately represent the range, precision, and semantics of the corresponding Cassandra types.
This tool maps Cassandra types to Spanner types as documented in Supported Cassandra data types.
The command output would look something like this:
.....
[Converted Spanner statement]
CREATE TABLE users (
id INT64 NOT NULL OPTIONS (cassandra_type = 'int'),
active BOOL OPTIONS (cassandra_type = 'boolean'),
username STRING(MAX) OPTIONS (cassandra_type = 'text'),
) PRIMARY KEY (id)
----------------------------------------------
Writing converted Spanner schema to: schema.txt
Dry run enabled. Skipping schema execution.
Schema conversion completed!
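To make the mapping in the sample output concrete, here is a minimal sketch of how a Cassandra column could be turned into a Spanner column definition. It is not the schema tool's implementation (the tool is written in Go). It covers only the three types visible in the output above, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeMappingSketch {
    // Only the three mappings visible in the sample output above.
    // The full list is in "Supported Cassandra data types".
    static final Map<String, String> CASSANDRA_TO_SPANNER = Map.of(
            "int", "INT64",
            "boolean", "BOOL",
            "text", "STRING(MAX)");

    // Builds one Spanner column definition and records the original
    // Cassandra type in the cassandra_type column option.
    static String column(String name, String cassandraType, boolean primaryKey) {
        String spannerType = CASSANDRA_TO_SPANNER.get(cassandraType);
        if (spannerType == null) {
            throw new IllegalArgumentException(
                    "No mapping in this sketch for: " + cassandraType);
        }
        return name + " " + spannerType
                + (primaryKey ? " NOT NULL" : "")
                + " OPTIONS (cassandra_type = '" + cassandraType + "')";
    }

    public static void main(String[] args) {
        // Columns of analytics.users in declaration order.
        Map<String, String> users = new LinkedHashMap<>();
        users.put("id", "int");
        users.put("active", "boolean");
        users.put("username", "text");
        users.forEach((name, type) ->
                System.out.println(column(name, type, name.equals("id")) + ","));
    }
}
```

Running the sketch prints the same three column lines that appear inside the converted CREATE TABLE statement, which is a quick way to check what the tool produced for each column.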
If you also want the generated schema to be applied to Spanner automatically, run the CLI without the --dry-run option.
Verify in the Google Cloud Console that the tables and the metadata table exist in the Spanner database.
8. Validate your data
[TODO]
9. Point your application to Spanner (Cutover)
After validating the accuracy and integrity of the migrated data, the next step is to switch your application from the legacy Cassandra system to the newly populated Spanner database. This transition is commonly referred to as the "cutover".
During cutover, live application traffic is redirected away from the original Cassandra cluster and connected directly to Spanner.
With the Spanner Cassandra interface, cutover primarily involves configuring your client applications to use the native Spanner Cassandra Client for all data access. Instead of communicating with your Cassandra (origin) database, your applications read and write data directly to Spanner (target). The connection is typically established through the SpannerCqlSessionBuilder, a component of the Spanner Cassandra Client library that creates connections to your Spanner instance.
For Java applications already using the cassandra-java-driver library, integrating the Spanner Cassandra Java Client requires only minor changes to the CqlSession initialization.
Getting google-cloud-spanner-cassandra dependency
To begin using the Spanner Cassandra Client, first add its dependency to your project. The google-cloud-spanner-cassandra artifacts are published in Maven Central under the group ID com.google.cloud. Add the following dependency under the existing <dependencies> section of your Java project. Here's a simplified example of how to include the google-cloud-spanner-cassandra dependency:
<!-- native Spanner Cassandra Client -->
<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-spanner-cassandra</artifactId>
    <version>0.2.0</version>
  </dependency>
</dependencies>
Change connection configuration to connect to Spanner
Once you have added the necessary dependency, the next step is to change your connection configuration to connect to the Spanner database.
A typical application interacting with a Cassandra cluster often employs code similar to the following to establish a connection:
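import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

// Standard driver connection to the Cassandra (origin) cluster
CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
    .withLocalDatacenter("datacenter1")
    .withAuthCredentials("username", "password")
    .build();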
To redirect this connection to Spanner, modify your CqlSession creation logic. Instead of directly using the standard CqlSessionBuilder from the cassandra-java-driver, use the SpannerCqlSession.builder() provided by the Spanner Cassandra Client. Here's an illustrative example of how to modify your connection code:
String databaseUri = "projects/<your-gcp-project>/instances/<your-spanner-instance>/databases/<your-spanner-database>";
CqlSession session = SpannerCqlSession.builder()
    .setDatabaseUri(databaseUri)
    .addContactPoint(new InetSocketAddress("localhost", 9042))
    .withLocalDatacenter("datacenter1")
    .build();
By instantiating the CqlSession using SpannerCqlSession.builder() and providing the correct databaseUri, your application connects through the Spanner Cassandra Client to your target Spanner database. All subsequent read and write operations performed by your application are directed to and served by Spanner, which completes the initial cutover. At this point, your application should continue to function as expected, now backed by Spanner.
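Because the databaseUri has a fixed projects/.../instances/.../databases/... shape, it can help to build it from the values used earlier in this codelab instead of editing the string by hand. The following is a minimal sketch. The helper class is hypothetical and not part of the Spanner Cassandra Client. PROJECT_ID is assumed to be exported in your environment, and "my-project" is a placeholder default.

```java
public class DatabaseUriSketch {
    // Assembles the database URI format shown above, rejecting empty
    // components and components that would break the path structure.
    static String databaseUri(String project, String instance, String database) {
        for (String part : new String[] {project, instance, database}) {
            if (part == null || part.isBlank() || part.contains("/")) {
                throw new IllegalArgumentException("Invalid path component: " + part);
            }
        }
        return String.format(
                "projects/%s/instances/%s/databases/%s", project, instance, database);
    }

    public static void main(String[] args) {
        // Falls back to the codelab's instance and database names when the
        // environment variables are not set; "my-project" is a placeholder.
        String uri = databaseUri(
                System.getenv().getOrDefault("PROJECT_ID", "my-project"),
                System.getenv().getOrDefault("SPANNER_INSTANCE_ID", "cassandra-adapter-demo"),
                System.getenv().getOrDefault("SPANNER_DATABASE", "analytics"));
        System.out.println(uri);
    }
}
```

The resulting string is what you would pass to setDatabaseUri in the connection code above.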
Under the Hood: How the Spanner Cassandra Client Operates
The Spanner Cassandra Client acts as a local TCP proxy, intercepting the raw Cassandra protocol bytes sent by a driver or client tool. It then wraps these bytes along with necessary metadata into gRPC messages for communication with Spanner. Responses from Spanner are translated back into the Cassandra wire format and sent back to the originating driver or tool.
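To give a sense of what the "raw Cassandra protocol bytes" look like, the sketch below decodes the 9-byte frame header defined by the Cassandra native protocol v4. It only illustrates the wire format that a proxy of this kind receives. It is not how the Spanner Cassandra Client is implemented, and the class names are hypothetical.

```java
import java.nio.ByteBuffer;

public class CqlFrameHeaderSketch {
    // Fields of a native protocol v4 frame header.
    public static final class Header {
        public final int version;      // protocol version (low 7 bits of byte 0)
        public final boolean response; // high bit of byte 0: set on responses
        public final int flags;
        public final int streamId;     // matches a response to its request
        public final int opcode;       // e.g. 0x05 OPTIONS, 0x07 QUERY
        public final int bodyLength;   // number of body bytes that follow

        Header(int version, boolean response, int flags,
               int streamId, int opcode, int bodyLength) {
            this.version = version;
            this.response = response;
            this.flags = flags;
            this.streamId = streamId;
            this.opcode = opcode;
            this.bodyLength = bodyLength;
        }
    }

    // Decodes the first 9 bytes: version, flags, stream (2 bytes),
    // opcode, body length (4 bytes), all in network byte order.
    static Header parse(byte[] bytes) {
        if (bytes.length < 9) {
            throw new IllegalArgumentException("A v4 frame header is 9 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        int first = buf.get() & 0xFF;
        int flags = buf.get() & 0xFF;
        int streamId = buf.getShort();
        int opcode = buf.get() & 0xFF;
        int bodyLength = buf.getInt();
        return new Header(first & 0x7F, (first & 0x80) != 0,
                flags, streamId, opcode, bodyLength);
    }

    public static void main(String[] args) {
        // A v4 OPTIONS request on stream 1 with an empty body.
        byte[] frame = {0x04, 0x00, 0x00, 0x01, 0x05, 0x00, 0x00, 0x00, 0x00};
        Header h = parse(frame);
        System.out.println("version=" + h.version + " stream=" + h.streamId
                + " opcode=" + h.opcode + " bodyLength=" + h.bodyLength);
    }
}
```

The stream ID is what allows responses to be matched back to the requests that the driver sent.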
Once you are confident that Spanner is serving all traffic correctly, you can eventually:
- Stop dual writes.
- Decommission the original Cassandra cluster.
10. Cleaning up (optional)
To clean up, go to the Spanner section of the Cloud Console and delete the cassandra-adapter-demo instance created in this codelab.
Delete Cassandra database (if installed locally or persisted)
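If you created the cassandra-origin Compute Engine VM in step 3, delete it as well so that it stops incurring charges. If you installed Cassandra somewhere other than that VM, follow the appropriate steps to remove the data or uninstall Cassandra.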
11. Congratulations!
What's next?
- Learn more about Cloud Spanner.
- Learn more about Cassandra Interface.