Cloud Bigtable for Cassandra users

1. Introduction

This codelab is a guide for anyone who is migrating queries from Apache Cassandra to Google Cloud Bigtable.

In this codelab, you'll:

  • Use the Cloud Bigtable emulator
  • Explore a timeseries use case
  • Create a table and column family
  • Learn the Cloud Bigtable Java equivalents of CQL Insert, Update, Select, and Delete

2. Set up

You'll be looking at a sample dataset with just a few rows to let you get an understanding of the core concepts quickly.

Cassandra

Equivalent Cassandra queries are shown at each step, so feel free to follow along on a local cluster. Alternatively, you can quickly set up a click-to-deploy Cassandra cluster and SSH into it.

If you're following along, create a keyspace and use it. The replication strategy won't be important here.

cqlsh> create keyspace mykeyspace with replication = {'class':'SimpleStrategy','replication_factor' : 2};
cqlsh> use mykeyspace;

Cloud Bigtable

You'll need a Cloud Bigtable instance for your table. You can set up a local instance for free using the emulator. You won't need to create a keyspace in Cloud Bigtable, because replication is handled by your instance configuration.

Use the following command to start the emulator:

gcloud beta emulators bigtable start

Then in another shell window or tab set the emulator environment variable with this command:

$(gcloud beta emulators bigtable env-init) # Sets BIGTABLE_EMULATOR_HOST

Next, create a Java project to run the code examples, and add the Cloud Bigtable client as a dependency using Maven, Gradle, or SBT. Then, in a new Java file, create a connection to the data client.

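import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.BigtableDataSettings;

// projectId, instanceId, and dataClient are assumed to be declared elsewhere in your class.
BigtableDataSettings settings =
    BigtableDataSettings.newBuilder().setProjectId(projectId).setInstanceId(instanceId).build();

try {
  dataClient = BigtableDataClient.create(settings);
} catch (Exception e) {
  System.out.println("Error during data client connection: \n" + e.toString());
}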

3. Table creation

In both Cassandra and Cloud Bigtable tables, each row has a key associated with it. A Cassandra primary key is made up of a partition key and clustering columns, which can be separate or overlap. In Cloud Bigtable, the entire row key is used for both splits (partitions) and ordering. The two systems are very similar when it comes to primary/row key construction: both are essentially lexicographically sorted lists in which the key is the main means of distributing rows between nodes. In most cases you can reuse the same key for Cloud Bigtable.
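To make the ordering concrete, here is a minimal sketch that uses a sorted set as a stand-in for the row-key space. It is only an illustration, not Cloud Bigtable code: the class name `RowKeyOrderingDemo` is hypothetical, and Java's string ordering matches byte-wise ordering only because these keys are plain ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: a sorted set stands in for a lexicographically ordered key space.
final class RowKeyOrderingDemo {

  // Returns the given keys in lexicographic order.
  static List<String> sortedKeys(List<String> keys) {
    return new ArrayList<>(new TreeSet<>(keys));
  }

  public static void main(String[] args) {
    List<String> keys =
        List.of(
            "tablet#a0b81f74#20190502",
            "phone#4c410523#20190501",
            "tablet#a0b81f74#20190501");

    // Rows for the same device type and device ID sort next to each other, ordered by date.
    sortedKeys(keys).forEach(System.out::println);
  }
}
```

Because keys that share a prefix sort together, all rows for one device form a contiguous range that can be read with a single scan.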

In this codelab, you'll use Cloud Bigtable to store time series data about mobile phones and mobile tablets (not to be confused with Bigtable tablets). Below are the instructions for creating the table.

Cassandra

cqlsh:mykeyspace> create table mobileTimeSeries (
           deviceid text,
           devicetype text,
           date date,
           connected_cell map<timestamp,Boolean>, 
           os_build text, 
           os_name text,
           PRIMARY KEY((devicetype, deviceid), date));

Cloud Bigtable

You can create a table and column family using the Java client, but it's easiest to use the following command with the cbt tool:

cbt createtable mobile-time-series families=stats_summary

Cloud Bigtable is a NoSQL database, so you won't need to define a schema at table creation, but you'll want to think about the queries you're going to run and optimize the row key for them.

The most common strategy for key migration is simply to take all partition keys and clustering columns and join them into a string that becomes the Cloud Bigtable row key. We generally recommend string-based keys because they make it easier to debug key distribution with Key Visualizer. You can use a separator such as a hash (#) between the column values to help with readability.

In this example we'll use a rowkey of "[DEVICE_TYPE]#[DEVICE_ID]#[YYYYMMDD]"
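One way to build keys in this format consistently is a small helper like the sketch below. The class name `MobileRowKey` and its method are hypothetical, not part of the Cloud Bigtable client, and the sketch assumes that device types and device IDs never contain the `#` separator.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that joins the Cassandra key columns into one row key string.
final class MobileRowKey {
  private static final String SEPARATOR = "#";

  // Builds "[DEVICE_TYPE]#[DEVICE_ID]#[YYYYMMDD]". Assumes the inputs contain no '#'.
  static String of(String deviceType, String deviceId, LocalDate date) {
    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, for example 20190501.
    String day = date.format(DateTimeFormatter.BASIC_ISO_DATE);
    return String.join(SEPARATOR, deviceType, deviceId, day);
  }

  public static void main(String[] args) {
    // Prints phone#4c410523#20190501
    System.out.println(of("phone", "4c410523", LocalDate.of(2019, 5, 1)));
  }
}
```

Zero-padded dates keep the lexicographic order of the keys identical to chronological order, which is what makes date range scans possible later on.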

4. Inserts

Inserts are fairly similar between Cassandra and Cloud Bigtable. Here you'll insert one row and then multiple rows using a rowkey of "[DEVICE_TYPE]#[DEVICE_ID]#[YYYYMMDD]".

Cassandra

Single

cqlsh:mykeyspace> insert into mobileTimeSeries (deviceid, devicetype, date, connected_cell, os_build) values ('4c410523', 'phone',toDate(now()), {toTimeStamp(now()): true}, 'PQ2A.190405.003');

Batch

cqlsh:mykeyspace> BEGIN BATCH
insert into mobileTimeSeries (deviceid, devicetype, date, os_name, os_build) values ('a0b81f74', 'tablet', '2019-01-01', 'chromeos', '12155.0.0-rc1');
insert into mobileTimeSeries (deviceid, devicetype, date, os_name, os_build) values ('a0b81f74', 'tablet', '2019-01-02','chromeos', '12145.0.0-rc6');
APPLY BATCH;

Cloud Bigtable

Single

Create a mutation with the row key and data you'd like to use, then apply the mutation with the data client. You'll add a row of data for a phone with information about its cell connection and operating system.

try {
  // Cell timestamps are expressed in microseconds since the Unix epoch.
  long timestamp = 1556712000L * 1_000_000L; // May 1, 2019 12:00 UTC

  String rowKey = "phone#4c410523#20190501";
  ByteString one = ByteString.copyFrom(new byte[] {0, 0, 0, 0, 0, 0, 0, 1});

  RowMutation rowMutation =
      RowMutation.create(tableId, rowKey)
          .setCell(
              COLUMN_FAMILY_NAME,
              ByteString.copyFrom("connected_cell".getBytes()),
              timestamp,
              one)
          .setCell(COLUMN_FAMILY_NAME, "os_build", timestamp, "PQ2A.190405.003");

  dataClient.mutateRow(rowMutation);
} catch (Exception e) {
  System.out.println("Error during Write: \n" + e.toString());
}
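The `one` value in the snippet above is the number 1 written as eight big-endian bytes. If you prefer not to spell the bytes out by hand, a helper along these lines can produce and read them. This is a sketch using only the JDK; the class name `LongCodec` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for storing numbers as 8-byte big-endian cell values.
final class LongCodec {

  // Encodes a long as 8 bytes; ByteBuffer uses big-endian byte order by default.
  static byte[] encode(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  // Decodes an 8-byte big-endian value back into a long.
  static long decode(byte[] bytes) {
    if (bytes.length != Long.BYTES) {
      throw new IllegalArgumentException("Expected 8 bytes but got " + bytes.length);
    }
    return ByteBuffer.wrap(bytes).getLong();
  }

  public static void main(String[] args) {
    // Prints [0, 0, 0, 0, 0, 0, 0, 1]
    System.out.println(Arrays.toString(encode(1L)));
  }
}
```

You could then write `ByteString.copyFrom(LongCodec.encode(1L))` when building the mutation, and call `LongCodec.decode(cell.getValue().toByteArray())` when reading the value back.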

Batch

Define multiple mutations on a BulkMutation object, then use the data client to apply all the mutations with one API call. You'll add two days of data about a mobile tablet's operating system name and version.

try {
  // Cell timestamps are expressed in microseconds since the Unix epoch.
  long timestamp = 1556712000L * 1_000_000L; // May 1, 2019 12:00 UTC

  BulkMutation bulkMutation =
      BulkMutation.create(tableId)
          .add(
              "tablet#a0b81f74#20190501",
              Mutation.create()
                  .setCell(COLUMN_FAMILY_NAME, "os_name", timestamp, "chromeos")
                  .setCell(COLUMN_FAMILY_NAME, "os_build", timestamp, "12155.0.0-rc1"))
          .add(
              "tablet#a0b81f74#20190502",
              Mutation.create()
                  .setCell(COLUMN_FAMILY_NAME, "os_name", timestamp, "chromeos")
                  .setCell(COLUMN_FAMILY_NAME, "os_build", timestamp, "12155.0.0-rc6"));

  dataClient.bulkMutateRows(bulkMutation);
} catch (Exception e) {
  System.out.println("Error during WriteBatch: \n" + e.toString());
}
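The write examples hard-code an epoch value for the cell timestamp. Cloud Bigtable cell timestamps are expressed in microseconds since the Unix epoch, with millisecond granularity. The sketch below shows one way to derive such a value from a readable date using only the JDK; the class name `CellTimestamps` is hypothetical.

```java
import java.time.Instant;

// Hypothetical helper for turning an Instant into a microsecond cell timestamp.
final class CellTimestamps {

  // Converts to microseconds since the epoch, dropping any sub-millisecond precision.
  static long toMicros(Instant instant) {
    return Math.multiplyExact(instant.toEpochMilli(), 1_000L);
  }

  public static void main(String[] args) {
    // Prints 1556712000000000
    System.out.println(toMicros(Instant.parse("2019-05-01T12:00:00Z")));
  }
}
```

Going through `toEpochMilli()` keeps the result a whole number of milliseconds, and `Math.multiplyExact` fails loudly instead of overflowing silently.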

5. Updates

Here you'll update a cell that hasn't been written yet and then write a new value to a cell while keeping the previous versions too.

Cassandra

Adding cells

cqlsh:mykeyspace> UPDATE mobileTimeSeries SET os_name = 'android' WHERE devicetype='phone' AND deviceid = '4c410523' AND date = '2019-09-06';

Updating cells

cqlsh:mykeyspace> UPDATE mobileTimeSeries SET connected_cell = connected_cell +  {toTimeStamp(now()): false} WHERE devicetype='phone' AND deviceid = '4c410523' AND date = '2019-09-06';

Cloud Bigtable

In Cloud Bigtable, you can treat updates the same as writes.

Adding cells

This is the same as writing cells: just provide a column that you haven't written to previously. Here you'll add the operating system name to the phone's row.

try {
  // Cell timestamps are expressed in microseconds since the Unix epoch.
  long timestamp = 1556713800L * 1_000_000L; // May 1, 2019 12:30 UTC

  String rowKey = "phone#4c410523#20190501";

  RowMutation rowMutation =
      RowMutation.create(tableId, rowKey)
          .setCell(COLUMN_FAMILY_NAME, "os_name", timestamp, "android");

  dataClient.mutateRow(rowMutation);
} catch (Exception e) {
  System.out.println("Error during update: \n" + e.toString());
}

Updating cells

Here you'll add new data about the phone's cell connection status. You can use cell versions to store part of your time series data easily: provide a timestamp with your write and you'll add a new version of the cell. To clean up your data, you can use garbage collection to delete versions beyond a certain number or older than a certain age.

try {
  // Cell timestamps are expressed in microseconds since the Unix epoch.
  long timestamp = 1556713800L * 1_000_000L; // May 1, 2019 12:30 UTC

  String rowKey = "phone#4c410523#20190501";

  ByteString zero = ByteString.copyFrom(new byte[] {0, 0, 0, 0, 0, 0, 0, 0});

  RowMutation rowMutation =
      RowMutation.create(tableId, rowKey)
          .setCell(
              COLUMN_FAMILY_NAME,
              ByteString.copyFrom("connected_cell".getBytes()),
              timestamp,
              zero);

  dataClient.mutateRow(rowMutation);
} catch (Exception e) {
  System.out.println("Error during update2: \n" + e.toString());
}
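The two garbage-collection rules mentioned above, keeping a maximum number of versions or a maximum age, can be pictured with the small model below. It is a plain-Java illustration of which cell versions each rule would retain, not the Cloud Bigtable API; the class `VersionGcModel` and its methods are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative model only: shows which cell versions a garbage-collection rule would keep.
final class VersionGcModel {

  // Keeps the newest maxVersions timestamps, newest first.
  static List<Long> retainNewest(List<Long> timestamps, int maxVersions) {
    return timestamps.stream()
        .sorted(Comparator.reverseOrder())
        .limit(maxVersions)
        .collect(Collectors.toList());
  }

  // Keeps timestamps that are at most maxAgeMicros older than nowMicros, newest first.
  static List<Long> retainYoungerThan(List<Long> timestamps, long nowMicros, long maxAgeMicros) {
    return timestamps.stream()
        .filter(ts -> nowMicros - ts <= maxAgeMicros)
        .sorted(Comparator.reverseOrder())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Two versions of connected_cell, written at 12:00 and 12:30 (microseconds).
    List<Long> versions = List.of(1556712000000000L, 1556713800000000L);

    // With a limit of one version, only the 12:30 write survives.
    System.out.println(retainNewest(versions, 1));
  }
}
```

In a real table the policy is configured on the column family and applied by Cloud Bigtable in the background, so expired versions may remain readable for a while before they are removed.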

6. Selects

Now you'll retrieve the data you've written into the table. When migrating CQL select statements, you need to take several aspects into account, such as the selected columns, filtering through where clauses, limits, and aggregate functions such as group by. Here you'll look at just two simple select statements to get the basic idea; see the documentation for more information on selecting. In Cloud Bigtable there are two types of retrieve operations: Get and Scan. Get retrieves one row, while Scan retrieves a range of rows.

Cassandra

Single

cqlsh:mykeyspace> SELECT * FROM mobileTimeSeries WHERE devicetype='phone' AND deviceid = '4c410523' AND date = '2019-09-04';

Multiple

cqlsh:mykeyspace> SELECT * FROM mobileTimeSeries WHERE devicetype='tablet' AND deviceid = 'a0b81f74' AND date >= '2019-09-04';

Cloud Bigtable

Single

Use a row lookup to get the data for a specific phone on a specific date, all of which is stored in one row. This returns every timestamped version of each value, so you should see two lines for connected_cell at different timestamps.

try {
  String rowKey = "phone#4c410523#20190501";

  Row row = dataClient.readRow(tableId, rowKey);
  for (RowCell cell : row.getCells()) {

    System.out.printf(
        "Family: %s    Qualifier: %s    Value: %s    Timestamp: %s%n",
        cell.getFamily(),
        cell.getQualifier().toStringUtf8(),
        cell.getValue().toStringUtf8(),
        cell.getTimestamp());
  }
} catch (Exception e) {
  System.out.println("Error during lookup: \n" + e.toString());
}

Multiple

Use a range scan to view a month of data for a specified mobile tablet, which is spread across multiple rows. You can add a filter to the scan to return only certain versions of the data or to filter on values.

try {
  Query query = Query.create(tableId).range("tablet#a0b81f74#201905", "tablet#a0b81f74#201906");
  ServerStream<Row> rowStream = dataClient.readRows(query);
  for (Row row : rowStream) {
    System.out.println("Row Key: " + row.getKey().toStringUtf8());
    for (RowCell cell : row.getCells()) {

      System.out.printf(
          "Family: %s    Qualifier: %s    Value: %s    Timestamp: %s%n",
          cell.getFamily(),
          cell.getQualifier().toStringUtf8(),
          cell.getValue().toStringUtf8(),
          cell.getTimestamp());
    }
  }
} catch (Exception e) {
  System.out.println("Error during scan: \n" + e.toString());
}
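The start and end keys of the scan above cover one calendar month: the range begins at the month's prefix and stops just before the next month's prefix. As a sketch, the two bounds can be computed rather than typed by hand. The class `MonthRange` is hypothetical and assumes the row key format used in this codelab.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that computes scan bounds for one device and one calendar month.
final class MonthRange {
  private static final DateTimeFormatter YEAR_MONTH = DateTimeFormatter.ofPattern("uuuuMM");

  // Start of the range (inclusive): the key prefix for the given month.
  static String startKey(String deviceType, String deviceId, YearMonth month) {
    return deviceType + "#" + deviceId + "#" + month.format(YEAR_MONTH);
  }

  // End of the range (exclusive): the key prefix for the following month.
  static String endKey(String deviceType, String deviceId, YearMonth month) {
    return startKey(deviceType, deviceId, month.plusMonths(1));
  }

  public static void main(String[] args) {
    YearMonth may2019 = YearMonth.of(2019, 5);
    System.out.println(startKey("tablet", "a0b81f74", may2019)); // tablet#a0b81f74#201905
    System.out.println(endKey("tablet", "a0b81f74", may2019)); // tablet#a0b81f74#201906
  }
}
```

The two strings can be passed to `Query.create(tableId).range(start, end)`. Using `plusMonths(1)` also handles the December to January rollover.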

7. Deletes

Here you'll delete the data you put into your table. First you'll delete an individual row, then you'll delete multiple rows.

Cassandra's CQL allows single-row deletes as well as range removals when all the primary columns are specified. You can do the same with Cloud Bigtable by scanning a range and then performing row-level deletes. The result is the same, but it takes more operations because each delete is its own operation.

Cassandra

Single

cqlsh:mykeyspace> DELETE from mobileTimeSeries where devicetype='phone' and deviceid = '4c410523';

Multiple

cqlsh:mykeyspace> DELETE from mobileTimeSeries where devicetype='tablet' and deviceid = 'a0b81f74';

Cloud Bigtable

Single

Here you'll delete the data for a specific phone and date. Use the row key to delete one row at a time.

try {
  String rowKey = "phone#4c410523#20190501";

  RowMutation mutation = RowMutation.create(tableId, rowKey).deleteRow();

  dataClient.mutateRow(mutation);
} catch (Exception e) {
  System.out.println("Error during Delete: \n" + e.toString());
}

Multiple

Here you'll delete all the data for a specific mobile tablet. To migrate a CQL query that deletes multiple rows, you need to perform a scan and then delete each row using the resulting set of row keys.

try {
  // End the prefix with the separator so that only device a0b81f74 matches.
  Query query = Query.create(tableId).prefix("tablet#a0b81f74#");
  ServerStream<Row> rowStream = dataClient.readRows(query);
  BulkMutation bulkMutation = BulkMutation.create(tableId);
  for (Row row : rowStream) {
    bulkMutation.add(row.getKey(), Mutation.create().deleteRow());
  }

  dataClient.bulkMutateRows(bulkMutation);
} catch (Exception e) {
  System.out.println("Error during DeleteMultiple: \n" + e.toString());
}
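A prefix scan matches every row key that starts with the given string, so where the prefix ends matters when the scan drives a delete. The sketch below illustrates this with plain string matching instead of a real scan; the class `PrefixMatchDemo` and the second device ID `a0b81f75` are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: string prefix matching stands in for a prefix scan.
final class PrefixMatchDemo {

  // Returns the row keys that start with the given prefix.
  static List<String> matching(List<String> rowKeys, String prefix) {
    return rowKeys.stream().filter(key -> key.startsWith(prefix)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> rowKeys =
        List.of(
            "tablet#a0b81f74#20190501",
            "tablet#a0b81f74#20190502",
            "tablet#a0b81f75#20190501"); // a different, hypothetical device

    // A prefix cut short inside the device ID also matches the other device.
    System.out.println(matching(rowKeys, "tablet#a0b81f7").size()); // 3

    // Ending the prefix with the separator limits the match to one device.
    System.out.println(matching(rowKeys, "tablet#a0b81f74#").size()); // 2
  }
}
```

Including the full device ID and the trailing `#` keeps the delete from touching rows that belong to devices with similar IDs.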

8. Finishing up

Clean up

Cassandra

If you created a Cassandra cluster to follow along with this, feel free to delete it as you normally would.

Cloud Bigtable

If you created your table on an existing Cloud Bigtable instance, you can delete it with the following cbt command:

cbt deletetable mobile-time-series

If you used the emulator, you can clear out all your work by stopping it: press CTRL-C in the terminal where you started it.

Next steps