Cloud Run, Video Intelligence API, Vertex AI를 사용하여 장면별 이미지 설명 서비스 만들기

이 Codelab 정보

최종 업데이트: 1월 10, 2024

작성자: Google 직원

1. 소개

개요

이 Codelab에서는 동영상의 모든 장면에 대한 시각적 설명을 제공하는 Node.js로 작성된 Cloud Run 서비스를 만듭니다. 먼저 서비스에서 Video Intelligence API를 사용하여 장면이 변경될 때마다 타임스탬프를 감지합니다. 다음으로 서비스에서 ffmpeg라는 서드 파티 바이너리를 사용하여 각 장면 변경 타임스탬프의 스크린샷을 캡처합니다. 마지막으로 Vertex AI 시각적 캡션을 사용하여 스크린샷에 대한 시각적 설명을 제공합니다.

또한 이 Codelab은 Cloud Run 서비스 내에서 ffmpeg를 사용하여 특정 타임스탬프에 동영상에서 이미지를 캡처하는 방법도 보여줍니다. ffmpeg는 독립적으로 설치해야 하므로 이 Codelab에서는 Dockerfile을 만들어 ffmpeg를 Cloud Run 서비스의 일부로 설치하는 방법을 보여줍니다.

다음은 Cloud Run 서비스의 작동 방식을 보여주는 그림입니다.

Cloud Run 동영상 설명 서비스 다이어그램

학습할 내용

Dockerfile을 사용하여 컨테이너 이미지를 만들어 서드 파티 바이너리를 설치하는 방법
다른 Google Cloud 서비스를 호출하는 Cloud Run 서비스의 서비스 계정을 만들어 최소 권한의 원칙을 따르는 방법
Cloud Run 서비스에서 Video Intelligence 클라이언트 라이브러리를 사용하는 방법을 알아봅니다.
Google API를 호출하여 Vertex AI에서 각 장면의 시각적 설명을 가져오는 방법

2. 설정 및 요구사항

기본 요건

Cloud 콘솔에 로그인했습니다.
이전에 Cloud Run 서비스를 배포했습니다. 예를 들어 소스 코드에서 웹 서비스 배포 빠른 시작의 안내에 따라 시작할 수 있습니다.

Cloud Shell 활성화

Cloud Console에서 Cloud Shell 활성화를 클릭합니다.

Cloud Shell을 처음 시작하는 경우에는 무엇이 있는지 설명하는 중간 화면이 표시됩니다. 중간 화면이 표시되면 계속을 클릭합니다.

Cloud Shell을 프로비저닝하고 연결하는 데 몇 분 정도만 걸립니다.

가상 머신에는 필요한 개발 도구가 모두 들어 있습니다. 영구적인 5GB 홈 디렉터리를 제공하고 Google Cloud에서 실행되므로 네트워크 성능과 인증이 크게 개선됩니다. 이 Codelab에서 대부분의 작업은 브라우저를 사용하여 수행할 수 있습니다.

Cloud Shell에 연결되면 인증이 완료되었고 프로젝트가 자신의 프로젝트 ID로 설정된 것을 확인할 수 있습니다.

Cloud Shell에서 다음 명령어를 실행하여 인증되었는지 확인합니다.

gcloud auth list

명령어 결과

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Cloud Shell에서 다음 명령어를 실행하여 gcloud 명령어가 프로젝트를 알고 있는지 확인합니다.

gcloud config list project

명령어 결과

[core]
project = <PROJECT_ID>

또는 다음 명령어로 설정할 수 있습니다.

gcloud config set project <PROJECT_ID>

명령어 결과

Updated property [core/project].

3. API 사용 설정 및 환경 변수 설정

이 Codelab을 사용하기 전에 먼저 사용 설정해야 하는 API가 몇 가지 있습니다. 이 Codelab에서는 다음 API를 사용해야 합니다. 다음 명령어를 실행하여 이러한 API를 사용 설정할 수 있습니다.

gcloud services enable run.googleapis.com \
    storage.googleapis.com \
    cloudbuild.googleapis.com \
    videointelligence.googleapis.com \
    aiplatform.googleapis.com

그런 다음 이 Codelab 전체에서 사용할 환경 변수를 설정할 수 있습니다.

REGION=<YOUR-REGION>
PROJECT_ID=<YOUR-PROJECT-ID>

PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
SERVICE_NAME=video-describer
export BUCKET_ID=$PROJECT_ID-video-describer

4. Cloud Storage 버킷 만들기

다음 명령어를 사용하여 Cloud Run 서비스에서 처리할 동영상을 업로드할 수 있는 Cloud Storage 버킷을 만듭니다.

gsutil mb -l us-central1 gs://$BUCKET_ID/

[선택사항] 이 샘플 동영상을 로컬에 다운로드하여 사용할 수 있습니다.

gsutil cp gs://cloud-samples-data/video/visionapi.mp4 testvideo.mp4

이제 동영상 파일을 저장소 버킷에 업로드합니다.

FILENAME=<YOUR-VIDEO-FILENAME>
gsutil cp $FILENAME gs://$BUCKET_ID

5. Node.js 앱 만들기

먼저 소스 코드용 디렉터리를 만들고 해당 디렉터리로 cd하세요.

mkdir video-describer && cd $_

그런 다음 다음 내용으로 package.json 파일을 만듭니다.

{
  "name": "video-describer",
  "version": "1.0.0",
  "private": true,
  "description": "describes the image in every scene for a given video",
  "main": "index.js",
  "author": "Google LLC",
  "license": "Apache-2.0",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "@google-cloud/storage": "^7.7.0",
    "@google-cloud/video-intelligence": "^5.0.1",
    "axios": "^1.6.2",
    "express": "^4.18.2",
    "fluent-ffmpeg": "^2.1.2",
    "google-auth-library": "^9.4.1"
  }
}

이 앱은 가독성을 높이기 위해 여러 소스 파일로 구성됩니다. 먼저 아래 내용으로 index.js 소스 파일을 만듭니다. 이 파일에는 서비스의 진입점과 앱의 기본 로직이 포함되어 있습니다.

const { captureImages } = require('./imageCapture.js');
const { detectSceneChanges } = require('./sceneDetector.js');
const transcribeScene = require('./imageDescriber.js');
const { Storage } = require('@google-cloud/storage');
const fs = require('fs').promises;
const path = require('path');
const express = require('express');
const app = express();

const bucketName = process.env.BUCKET_ID;

const port = parseInt(process.env.PORT) || 8080;
app.listen(port, () => {
  console.log(`video describer service ready: listening on port ${port}`);
});

// entry point for the service
app.get('/', async (req, res) => {

  try {

    // download the requested video from Cloud Storage
    let videoFilename =  req.query.filename; 
    console.log("processing file: " + videoFilename);

    // download the file to locally to the Cloud Run instance
    let localFilename = await downloadVideoFile(videoFilename);

    // detect all the scenes in the video & save timestamps to an array
    let timestamps = await detectSceneChanges(localFilename);
    console.log("Detected scene changes at the following timestamps: ", timestamps);

    // create an image of each scene change
    // and save to a local directory called "output"
    await captureImages(localFilename, timestamps);

    // get an access token for the Service Account to call the Google APIs 
    let accessToken = await transcribeScene.getAccessToken();
    console.log("got an access token");

    let imageBaseName = path.parse(localFilename).name;

    // the data structure for storing the scene description and timestamp
    // e.g. an array of json objects {timestamp: 1, description: "..."}, etc.    
    let scenes = []

    // for each timestamp, send the image to Vertex AI
    console.log("getting Vertex AI description all the timestamps");
    scenes = await Promise.all(
      timestamps.map(async (timestamp) => {

        let filepath = path.join("./output", imageBaseName + "-" + timestamp + ".png");

        // get the base64 encoded image
        const encodedFile = await fs.readFile(filepath, 'base64');

        // send each screenshot to Vertex AI for description
        let description = await transcribeScene.transcribeScene(accessToken, encodedFile)

        return { timestamp: timestamp, description: description };
      }));

    console.log("finished collecting all the scenes");
    //console.log(scenes);

    return res.json(scenes);

  } catch (error) {

    //return an error
    console.log("received error: ", error);
    return res.status(500).json("an internal error occurred");
  }

});

async function downloadVideoFile(videoFilename) {
  // Creates a client
  const storage = new Storage();

  // keep same name locally
  let localFilename = videoFilename;

  const options = {
    destination: localFilename
  };

  // Download the file
  await storage.bucket(bucketName).file(videoFilename).download(options);

  console.log(
    `gs://${bucketName}/${videoFilename} downloaded locally to ${localFilename}.`
  );

  return localFilename;
}

다음으로 다음 콘텐츠로 sceneDetector.js 파일을 만듭니다. 이 파일은 Video Intelligence API를 사용하여 동영상에서 장면이 변경되는 시점을 감지합니다.

const fs = require('fs');
const util = require('util');
const readFile = util.promisify(fs.readFile);
const ffmpeg = require('fluent-ffmpeg');

const Video = require('@google-cloud/video-intelligence');
const client = new Video.VideoIntelligenceServiceClient();

module.exports = {
    detectSceneChanges: async function (downloadedFile) {

        // Reads a local video file and converts it to base64       
        const file = await readFile(downloadedFile);
        const inputContent = file.toString('base64');

        // setup request for shot change detection
        const videoContext = {
            speechTranscriptionConfig: {
                languageCode: 'en-US',
                enableAutomaticPunctuation: true,
            },
        };

        const request = {
            inputContent: inputContent,
            features: ['SHOT_CHANGE_DETECTION'],
        };

        // Detects camera shot changes
        const [operation] = await client.annotateVideo(request);
        console.log('Shot (scene) detection in progress...');
        const [operationResult] = await operation.promise();

        // Gets shot changes
        const shotChanges = operationResult.annotationResults[0].shotAnnotations;

        console.log("Shot (scene) changes detected: " + shotChanges.length);

        // data structure to be returned 
        let sceneChanges = [];

        // for the initial scene
        sceneChanges.push(1);

        // if only one scene, keep at 1 second
        if (shotChanges.length === 1) {
            return sceneChanges;
        }

        // get length of video
        const videoLength = await getVideoLength(downloadedFile);

        shotChanges.forEach((shot, shotIndex) => {
            if (shot.endTimeOffset === undefined) {
                shot.endTimeOffset = {};
            }
            if (shot.endTimeOffset.seconds === undefined) {
                shot.endTimeOffset.seconds = 0;
            }
            if (shot.endTimeOffset.nanos === undefined) {
                shot.endTimeOffset.nanos = 0;
            }

            // convert to a number
            let currentTimestampSecond = Number(shot.endTimeOffset.seconds);                  

            let sceneChangeTime = 0;
            // double-check no scenes were detected within the last second
            if (currentTimestampSecond + 1 > videoLength) {
                sceneChangeTime = currentTimestampSecond;                
            } else {
                // otherwise, for simplicity, just round up to the next second 
                sceneChangeTime = currentTimestampSecond + 1;
            }

            sceneChanges.push(sceneChangeTime);
        });

        return sceneChanges;
    }
}

async function getVideoLength(localFile) {
    let getLength = util.promisify(ffmpeg.ffprobe);
    let length = await getLength(localFile);

    console.log("video length: ", length.format.duration);
    return length.format.duration;
}

이제 다음 내용으로 imageCapture.js라는 파일을 만듭니다. 이 파일은 fluent-ffmpeg 노드 패키지를 사용하여 노드 앱 내에서 ffmpeg 명령어를 실행합니다.

const ffmpeg = require('fluent-ffmpeg');
const path = require('path');
const util = require('util');


module.exports = {
    captureImages: async function (localFile, scenes) {


        let imageBaseName = path.parse(localFile).name;


        try {
            for (scene of scenes) {
                console.log("creating screenshot for scene: ", + scene);
                await createScreenshot(localFile, imageBaseName, scene);
            }


        } catch (error) {
            console.log("error gathering screenshots: ", error);
        }


        console.log("finished gathering the screenshots");
    }
}


async function createScreenshot(localFile, imageBaseName, scene) {
    return new Promise((resolve, reject) => {
        ffmpeg(localFile)
            .screenshots({
                timestamps: [scene],
                filename: `${imageBaseName}-${scene}.png`,
                folder: 'output',
                size: '320x240'
            }).on("error", () => {
                console.log("Failed to create scene for timestamp: " + scene);
                return reject('Failed to create scene for timestamp: ' + scene);
            })
            .on("end", () => {
                return resolve();
            });
    })
}

마지막으로 다음 내용으로 `imageExplainr.js`` 라는 파일을 만듭니다. 이 파일은 Vertex AI를 사용하여 각 장면 이미지의 시각적 설명을 가져옵니다.

const axios = require("axios");
const { GoogleAuth } = require('google-auth-library');

const auth = new GoogleAuth({
    scopes: 'https://www.googleapis.com/auth/cloud-platform'
});

module.exports = {
    getAccessToken: async function () {

        return await auth.getAccessToken();
    }, 

    transcribeScene: async function(token, encodedFile) {

        let projectId = await auth.getProjectId();
    
        let config = {
            headers: {
                'Authorization': 'Bearer ' + token,
                'Content-Type': 'application/json; charset=utf-8'
            }
        }

        const json = {
            "instances": [
                {
                    "image": {
                        "bytesBase64Encoded": encodedFile
                    }
                }
            ],
            "parameters": {
                "sampleCount": 1,
                "language": "en"
            }
        }

        let response = await axios.post('https://us-central1-aiplatform.googleapis.com/v1/projects/' + projectId + '/locations/us-central1/publishers/google/models/imagetext:predict', json, config);

        return response.data.predictions[0];
    }
}

Dockerfile 및 .dockerignore 파일 만들기

이 서비스는 ffmpeg를 사용하므로 ffmpeg를 설치하는 Dockerfile을 만들어야 합니다.

다음 콘텐츠가 포함된 Dockerfile라는 파일을 만듭니다.

# Copyright 2020 Google, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use the official lightweight Node.js image.
# https://hub.docker.com/_/node
FROM node:20.10.0-slim

# Create and change to the app directory.
WORKDIR /usr/src/app

RUN apt-get update && apt-get install -y ffmpeg

# Copy application dependency manifests to the container image.
# A wildcard is used to ensure both package.json AND package-lock.json are copied.
# Copying this separately prevents re-running npm install on every code change.
COPY package*.json ./

# Install dependencies.
# If you add a package-lock.json speed your build by switching to 'npm ci'.
# RUN npm ci --only=production
RUN npm install --production

# Copy local code to the container image.
COPY . .

# Run the web service on container startup.
CMD [ "npm", "start" ]

그런 다음 .dockerignore라는 파일을 만들어 특정 파일의 컨테이너화를 무시합니다.

Dockerfile
.dockerignore
node_modules
npm-debug.log

6. 서비스 계정 만들기

Cloud Storage, Vertex AI, Video Intelligence API에 액세스하는 데 사용할 Cloud Run 서비스의 서비스 계정을 만듭니다.

SERVICE_ACCOUNT="cloud-run-video-description"
SERVICE_ACCOUNT_ADDRESS=$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com

gcloud iam service-accounts create $SERVICE_ACCOUNT \
  --display-name="Cloud Run Video Scene Image Describer service account"
 
# to view & download storage bucket objects
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/storage.objectViewer

# to call the Vertex AI imagetext model
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/aiplatform.user

7. Cloud Run 서비스 배포

이제 소스 기반 배포를 사용하여 Cloud Run 서비스를 자동으로 컨테이너화할 수 있습니다.

참고: Cloud Run 서비스의 기본 처리 시간은 60초입니다. 이 Codelab에서는 추천 테스트 동영상의 길이가 2분이므로 제한 시간 5분을 사용합니다. 재생 시간이 더 긴 동영상을 사용하는 경우 시간을 수정해야 할 수도 있습니다.

gcloud run deploy $SERVICE_NAME \
  --region=$REGION \
  --set-env-vars BUCKET_ID=$BUCKET_ID \
  --no-allow-unauthenticated \
  --service-account $SERVICE_ACCOUNT_ADDRESS \
  --timeout=5m \
  --source=.

배포되면 환경 변수에 서비스 URL을 저장합니다.

SERVICE_URL=$(gcloud run services describe $SERVICE_NAME --platform managed --region $REGION --format 'value(status.url)')

8. Cloud Run 서비스 호출

이제 Cloud Storage에 업로드한 동영상의 이름을 제공하여 서비스를 호출할 수 있습니다.

curl -X GET -H "Authorization: Bearer $(gcloud auth print-identity-token)" ${SERVICE_URL}?filename=${FILENAME}

결과는 아래 출력 예와 유사합니다.

[{"timestamp":1,"description":"an aerial view of a city with a bridge in the background"},{"timestamp":7,"description":"a man in a blue shirt sits in front of shelves of donuts"},{"timestamp":11,"description":"a black and white photo of people working in a bakery"},{"timestamp":12,"description":"a black and white photo of a man and woman working in a bakery"}]

9. 축하합니다.

축하합니다. Codelab을 완료했습니다.

Video Intelligence API, Cloud Run, Vertex AI 시각적 캡션에 관한 문서를 검토하는 것이 좋습니다.

학습한 내용

Dockerfile을 사용하여 컨테이너 이미지를 만들어 서드 파티 바이너리를 설치하는 방법
다른 Google Cloud 서비스를 호출하는 Cloud Run 서비스의 서비스 계정을 만들어 최소 권한의 원칙을 따르는 방법
Cloud Run 서비스에서 Video Intelligence 클라이언트 라이브러리를 사용하는 방법을 알아봅니다.
Google API를 호출하여 Vertex AI에서 각 장면의 시각적 설명을 가져오는 방법

10. 삭제

실수로 인한 요금 청구를 방지하려면(예: 이 Cloud Run 서비스가 무료 등급의 월별 Cloud Run 호출 할당보다 실수로 더 많이 호출된 경우) Cloud Run 서비스를 삭제하거나 2단계에서 만든 프로젝트를 삭제합니다.

Cloud Run 서비스를 삭제하려면 Cloud Run Cloud 콘솔(https://console.cloud.google.com/run/)으로 이동하여 video-describer 함수(또는 다른 이름을 사용한 경우 $SERVICE_NAME)를 삭제합니다.

전체 프로젝트를 삭제하려면 https://console.cloud.google.com/cloud-resource-manager로 이동하여 2단계에서 만든 프로젝트를 선택한 후 삭제를 선택하면 됩니다. 프로젝트를 삭제하면 Cloud SDK에서 프로젝트를 변경해야 합니다. gcloud projects list를 실행하면 사용 가능한 모든 프로젝트의 목록을 볼 수 있습니다.

오류 신고