Create a video scene description service using Cloud Run, the Video Intelligence API, and Vertex AI

1. Introduction

Overview

In this codelab, you'll create a Cloud Run service written in Node.js that provides a visual description of every scene in a video. First, the service uses the Video Intelligence API to detect the timestamp of each scene change. Next, it uses a third-party binary called ffmpeg to capture a screenshot at each scene-change timestamp. Finally, Vertex AI visual captioning is used to provide a visual description of each screenshot.

This codelab also demonstrates how to use ffmpeg within a Cloud Run service to capture an image from a video at a given timestamp. Because ffmpeg needs to be installed separately, this codelab shows how to create a Dockerfile that installs ffmpeg as part of the Cloud Run service's container image.
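For reference, the screenshot step is conceptually the same as running ffmpeg by hand. As a minimal sketch (with placeholder filenames, not files from this codelab), capturing a single 320x240 frame at the 7-second mark would look like this:

# grab one frame at the 7-second mark, scaled to 320x240
ffmpeg -ss 7 -i input.mp4 -frames:v 1 -s 320x240 scene-7.png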

The following diagram illustrates how the Cloud Run service works:

Diagram of the Cloud Run video description service

What you'll learn

  • How to create a container image using a Dockerfile to install a third-party binary
  • How to follow the principle of least privilege by creating a service account that the Cloud Run service uses to call other Google Cloud services
  • How to use the Video Intelligence client library from a Cloud Run service
  • How to call Google APIs to get a visual description of each scene from Vertex AI

2. Setup and requirements

Prerequisites

Activate Cloud Shell

  1. From the Cloud Console, click Activate Cloud Shell.


If this is your first time starting Cloud Shell, you'll see an intermediate screen describing what it is. If you see an intermediate screen, click Continue.


It should only take a few moments to provision and connect to Cloud Shell.


This virtual machine is loaded with all the development tools you need. It offers a persistent 5 GB home directory and runs in Google Cloud, which greatly enhances network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

  2. Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
  3. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Enable APIs and set environment variables

Before you can start using this codelab, you need to enable several APIs. This codelab requires the following APIs, which you can enable by running this command:

gcloud services enable run.googleapis.com \
    storage.googleapis.com \
    cloudbuild.googleapis.com \
    videointelligence.googleapis.com \
    aiplatform.googleapis.com

Then you can set environment variables that will be used throughout this codelab.

REGION=<YOUR-REGION>
PROJECT_ID=<YOUR-PROJECT-ID>

PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
SERVICE_NAME=video-describer
export BUCKET_ID=$PROJECT_ID-video-describer
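For example, assuming you use us-central1 (the region this codelab's Vertex AI call is hardcoded to) and want to reuse your currently active gcloud project, the first two variables could be set as follows:

REGION=us-central1
PROJECT_ID=$(gcloud config get-value project)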

4. Create a Cloud Storage bucket

Using the following command, create a Cloud Storage bucket where you can upload videos to be processed by the Cloud Run service:

gsutil mb -l us-central1 gs://$BUCKET_ID/

[Optional] You can use this sample video by downloading it locally:

gsutil cp gs://cloud-samples-data/video/visionapi.mp4 testvideo.mp4

Now, upload your video file to your bucket.

FILENAME=<YOUR-VIDEO-FILENAME>
gsutil cp $FILENAME gs://$BUCKET_ID

5. Create the Node.js application

First, create a directory for the source code and cd into it.

mkdir video-describer && cd $_

Then, create a package.json file with the following content:

{
  "name": "video-describer",
  "version": "1.0.0",
  "private": true,
  "description": "describes the image in every scene for a given video",
  "main": "index.js",
  "author": "Google LLC",
  "license": "Apache-2.0",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "@google-cloud/storage": "^7.7.0",
    "@google-cloud/video-intelligence": "^5.0.1",
    "axios": "^1.6.2",
    "express": "^4.18.2",
    "fluent-ffmpeg": "^2.1.2",
    "google-auth-library": "^9.4.1"
  }
}

For readability, this application is split into multiple source files. To start, create an index.js source file with the content below. This file contains the entry point for the service and the main logic of the application.

const { captureImages } = require('./imageCapture.js');
const { detectSceneChanges } = require('./sceneDetector.js');
const transcribeScene = require('./imageDescriber.js');
const { Storage } = require('@google-cloud/storage');
const fs = require('fs').promises;
const path = require('path');
const express = require('express');
const app = express();

const bucketName = process.env.BUCKET_ID;

const port = parseInt(process.env.PORT) || 8080;
app.listen(port, () => {
  console.log(`video describer service ready: listening on port ${port}`);
});

// entry point for the service
app.get('/', async (req, res) => {

  try {

    // download the requested video from Cloud Storage
    let videoFilename = req.query.filename;
    console.log("processing file: " + videoFilename);

    // download the file locally to the Cloud Run instance
    let localFilename = await downloadVideoFile(videoFilename);

    // detect all the scenes in the video & save timestamps to an array
    let timestamps = await detectSceneChanges(localFilename);
    console.log("Detected scene changes at the following timestamps: ", timestamps);

    // create an image of each scene change
    // and save to a local directory called "output"
    await captureImages(localFilename, timestamps);

    // get an access token for the Service Account to call the Google APIs 
    let accessToken = await transcribeScene.getAccessToken();
    console.log("got an access token");

    let imageBaseName = path.parse(localFilename).name;

    // the data structure for storing the scene description and timestamp
    // e.g. an array of json objects {timestamp: 1, description: "..."}, etc.    
    let scenes = [];

    // for each timestamp, send the image to Vertex AI
    console.log("getting Vertex AI description all the timestamps");
    scenes = await Promise.all(
      timestamps.map(async (timestamp) => {

        let filepath = path.join("./output", imageBaseName + "-" + timestamp + ".png");

        // get the base64 encoded image
        const encodedFile = await fs.readFile(filepath, 'base64');

        // send each screenshot to Vertex AI for description
        let description = await transcribeScene.transcribeScene(accessToken, encodedFile);

        return { timestamp: timestamp, description: description };
      }));

    console.log("finished collecting all the scenes");
    //console.log(scenes);

    return res.json(scenes);

  } catch (error) {

    //return an error
    console.log("received error: ", error);
    return res.status(500).json("an internal error occurred");
  }

});

async function downloadVideoFile(videoFilename) {
  // Creates a client
  const storage = new Storage();

  // keep same name locally
  let localFilename = videoFilename;

  const options = {
    destination: localFilename
  };

  // Download the file
  await storage.bucket(bucketName).file(videoFilename).download(options);

  console.log(
    `gs://${bucketName}/${videoFilename} downloaded locally to ${localFilename}.`
  );

  return localFilename;
}
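If you'd like to smoke-test the service locally before containerizing it, a sketch like the following may work, assuming your machine has ffmpeg installed and application default credentials available (for example, via gcloud auth application-default login):

# install dependencies and start the service locally
npm install
export BUCKET_ID=$PROJECT_ID-video-describer
npm start

# in a second terminal, replace testvideo.mp4 with the filename you uploaded
curl "http://localhost:8080/?filename=testvideo.mp4"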

Next, create a sceneDetector.js file with the following content. This file uses the Video Intelligence API to detect when scenes change in the video.

const fs = require('fs');
const util = require('util');
const readFile = util.promisify(fs.readFile);
const ffmpeg = require('fluent-ffmpeg');

const Video = require('@google-cloud/video-intelligence');
const client = new Video.VideoIntelligenceServiceClient();

module.exports = {
    detectSceneChanges: async function (downloadedFile) {

        // Reads a local video file and converts it to base64       
        const file = await readFile(downloadedFile);
        const inputContent = file.toString('base64');

        // set up the request for shot change detection
        const request = {
            inputContent: inputContent,
            features: ['SHOT_CHANGE_DETECTION'],
        };

        // Detects camera shot changes
        const [operation] = await client.annotateVideo(request);
        console.log('Shot (scene) detection in progress...');
        const [operationResult] = await operation.promise();

        // Gets shot changes
        const shotChanges = operationResult.annotationResults[0].shotAnnotations;

        console.log("Shot (scene) changes detected: " + shotChanges.length);

        // data structure to be returned 
        let sceneChanges = [];

        // for the initial scene
        sceneChanges.push(1);

        // if only one scene, keep at 1 second
        if (shotChanges.length === 1) {
            return sceneChanges;
        }

        // get length of video
        const videoLength = await getVideoLength(downloadedFile);

        shotChanges.forEach((shot, shotIndex) => {
            if (shot.endTimeOffset === undefined) {
                shot.endTimeOffset = {};
            }
            if (shot.endTimeOffset.seconds === undefined) {
                shot.endTimeOffset.seconds = 0;
            }
            if (shot.endTimeOffset.nanos === undefined) {
                shot.endTimeOffset.nanos = 0;
            }

            // convert to a number
            let currentTimestampSecond = Number(shot.endTimeOffset.seconds);                  

            let sceneChangeTime = 0;
            // double-check no scenes were detected within the last second
            if (currentTimestampSecond + 1 > videoLength) {
                sceneChangeTime = currentTimestampSecond;                
            } else {
                // otherwise, for simplicity, just round up to the next second 
                sceneChangeTime = currentTimestampSecond + 1;
            }

            sceneChanges.push(sceneChangeTime);
        });

        return sceneChanges;
    }
}

async function getVideoLength(localFile) {
    let getLength = util.promisify(ffmpeg.ffprobe);
    let length = await getLength(localFile);

    console.log("video length: ", length.format.duration);
    return length.format.duration;
}
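The getVideoLength helper wraps ffprobe, which ships with ffmpeg. As an illustrative sketch with a placeholder filename, the equivalent command-line probe prints just the container duration in seconds:

# print only the duration (in seconds) of a local video file
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 testvideo.mp4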

Now, create a file called imageCapture.js with the following content. This file uses the node package fluent-ffmpeg to run ffmpeg commands from within the Node.js application.

const ffmpeg = require('fluent-ffmpeg');
const path = require('path');

module.exports = {
    captureImages: async function (localFile, scenes) {

        let imageBaseName = path.parse(localFile).name;

        try {
            for (const scene of scenes) {
                console.log("creating screenshot for scene: ", scene);
                await createScreenshot(localFile, imageBaseName, scene);
            }
        } catch (error) {
            console.log("error gathering screenshots: ", error);
        }

        console.log("finished gathering the screenshots");
    }
}

async function createScreenshot(localFile, imageBaseName, scene) {
    return new Promise((resolve, reject) => {
        ffmpeg(localFile)
            .screenshots({
                timestamps: [scene],
                filename: `${imageBaseName}-${scene}.png`,
                folder: 'output',
                size: '320x240'
            }).on("error", () => {
                console.log("Failed to create scene for timestamp: " + scene);
                return reject('Failed to create scene for timestamp: ' + scene);
            })
            .on("end", () => {
                return resolve();
            });
    })
}

Lastly, create a file called imageDescriber.js with the following content. This file uses Vertex AI to get a visual description of each scene image.

const axios = require("axios");
const { GoogleAuth } = require('google-auth-library');

const auth = new GoogleAuth({
    scopes: 'https://www.googleapis.com/auth/cloud-platform'
});

module.exports = {
    getAccessToken: async function () {

        return await auth.getAccessToken();
    }, 

    transcribeScene: async function(token, encodedFile) {

        let projectId = await auth.getProjectId();
    
        let config = {
            headers: {
                'Authorization': 'Bearer ' + token,
                'Content-Type': 'application/json; charset=utf-8'
            }
        }

        const json = {
            "instances": [
                {
                    "image": {
                        "bytesBase64Encoded": encodedFile
                    }
                }
            ],
            "parameters": {
                "sampleCount": 1,
                "language": "en"
            }
        }

        let response = await axios.post('https://us-central1-aiplatform.googleapis.com/v1/projects/' + projectId + '/locations/us-central1/publishers/google/models/imagetext:predict', json, config);

        return response.data.predictions[0];
    }
}
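The axios call above targets the Vertex AI imagetext predict REST endpoint directly. If you want to experiment with the model outside the service, a hand-rolled request might look like the sketch below; request.json is a placeholder file containing the same payload the service builds (instances with a bytesBase64Encoded image, plus the sampleCount and language parameters):

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/imagetext:predict"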

Create the Dockerfile and .dockerignore files

Because this service uses ffmpeg, you will need to create a Dockerfile that installs ffmpeg.

Create a file called Dockerfile with the following content:

# Copyright 2020 Google, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use the official lightweight Node.js image.
# https://hub.docker.com/_/node
FROM node:20.10.0-slim

# Create and change to the app directory.
WORKDIR /usr/src/app

RUN apt-get update && apt-get install -y ffmpeg

# Copy application dependency manifests to the container image.
# A wildcard is used to ensure both package.json AND package-lock.json are copied.
# Copying this separately prevents re-running npm install on every code change.
COPY package*.json ./

# Install dependencies.
# If you add a package-lock.json, speed your build by switching to 'npm ci'.
# RUN npm ci --only=production
RUN npm install --production

# Copy local code to the container image.
COPY . .

# Run the web service on container startup.
CMD [ "npm", "start" ]

And create a file called .dockerignore to exclude certain files from being containerized.

Dockerfile
.dockerignore
node_modules
npm-debug.log
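Before deploying, you can optionally confirm that the image builds and that ffmpeg was installed into it. This sketch assumes Docker is available locally; the image tag is arbitrary:

# build the image locally and check that ffmpeg is on the PATH inside it
docker build -t video-describer .
docker run --rm --entrypoint ffmpeg video-describer -version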

6. Create a service account

You will create a service account for the Cloud Run service to use to access Cloud Storage, Vertex AI, and the Video Intelligence API.

SERVICE_ACCOUNT="cloud-run-video-description"
SERVICE_ACCOUNT_ADDRESS=$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com

gcloud iam service-accounts create $SERVICE_ACCOUNT \
  --display-name="Cloud Run Video Scene Image Describer service account"
 
# to view & download storage bucket objects
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/storage.objectViewer

# to call the Vertex AI imagetext model
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/aiplatform.user
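To verify the bindings took effect, you can list the roles granted to the new service account (a read-only check using standard gcloud filtering):

# list every role granted to the service account on the project
gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:$SERVICE_ACCOUNT_ADDRESS" \
  --format="value(bindings.role)"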

7. Deploy the Cloud Run service

Now you can use source-based deployment to automatically containerize your Cloud Run service.

Note: The default request processing time of a Cloud Run service is 60 seconds. This codelab uses a 5-minute timeout because the suggested test video is 2 minutes long. You may need to adjust the timeout if you use a longer video.

gcloud run deploy $SERVICE_NAME \
  --region=$REGION \
  --set-env-vars BUCKET_ID=$BUCKET_ID \
  --no-allow-unauthenticated \
  --service-account $SERVICE_ACCOUNT_ADDRESS \
  --timeout=5m \
  --source=.

Once deployed, save the service URL in an environment variable.

SERVICE_URL=$(gcloud run services describe $SERVICE_NAME --platform managed --region $REGION --format 'value(status.url)')

8. Call the Cloud Run service

Now you can call your service by providing the name of the video you uploaded to Cloud Storage.

curl -X GET -H "Authorization: Bearer $(gcloud auth print-identity-token)" ${SERVICE_URL}?filename=${FILENAME}

Your results should look similar to the example output below:

[{"timestamp":1,"description":"an aerial view of a city with a bridge in the background"},{"timestamp":7,"description":"a man in a blue shirt sits in front of shelves of donuts"},{"timestamp":11,"description":"a black and white photo of people working in a bakery"},{"timestamp":12,"description":"a black and white photo of a man and woman working in a bakery"}]

9. Congratulations!

Congratulations on completing the codelab!

We recommend reviewing the documentation on the Video Intelligence API, Cloud Run, and Vertex AI visual captioning.

What we've covered

  • How to create a container image using a Dockerfile to install a third-party binary
  • How to follow the principle of least privilege by creating a service account that the Cloud Run service uses to call other Google Cloud services
  • How to use the Video Intelligence client library from a Cloud Run service
  • How to call Google APIs to get a visual description of each scene from Vertex AI

10. Clean up

To avoid inadvertent charges (for example, if this Cloud Run service is accidentally invoked more times than your monthly Cloud Run invocation allotment in the free tier), you can either delete the Cloud Run service or delete the project you created in step 2.

To delete the Cloud Run service, go to the Cloud Run console at https://console.cloud.google.com/run/ and delete the video-describer service (or $SERVICE_NAME, if you used a different name).

If you choose to delete the entire project, you can go to https://console.cloud.google.com/cloud-resource-manager, select the project you created in step 2, and choose Delete. If you delete the project, you'll need to change projects in your Cloud SDK. You can view the list of all available projects by running gcloud projects list.
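If you prefer the command line, the equivalent cleanup is sketched below; note that deleting a project is irreversible, so double-check the ID first.

# delete just the Cloud Run service
gcloud run services delete $SERVICE_NAME --region $REGION

# or delete the entire project
gcloud projects delete $PROJECT_ID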