Analyzing YouTube videos for marketing with Gemini

About this codelab

Last updated: April 3, 2025
Written by: Jisub Lee, Kyungjune Shin

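1. Introduction

Last updated: March 12, 2025

Disclaimer

The following is sample code for analyzing videos with the YouTube Data API and Gemini. You are responsible for how you use it. Think carefully before using this code in a production environment. The authors accept no responsibility for any problems arising from its use. In addition, because of the nature of AI, results may differ from the facts, so check them carefully rather than trusting them blindly.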

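Goal of this project

The main goal is to analyze video content and sentiment in order to find YouTube videos and creators that are a good fit for brand promotion.

Overview

This project uses the YouTube Data API to retrieve video information and the GCP Vertex AI API with a Gemini model to analyze video content. The tool runs on Google Colab.

You can paste the code in the following sections into Colab and run it cell by cell.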

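What you'll learn

  • How to retrieve video information with the YouTube Data API.
  • How to analyze video content with the GCP Vertex AI API and a Gemini model.
  • How to run the code in Google Colab.
  • How to create a spreadsheet from the analysis results.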

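What you'll need

To implement this solution, you need:

  • A Google Cloud Platform project.
  • The YouTube Data API v3, Vertex AI API, Generative Language API, Google Drive API, and Google Sheets API enabled in the project.
  • An API key created on the Credentials tab and authorized to use the YouTube Data API v3.

This solution uses the YouTube Data API and the GCP Vertex AI API.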

2. Code and explanation

First, import the libraries we'll use. Then sign in with your Google account and grant access to Google Drive.

# library
# colab
import ipywidgets as widgets
from IPython.display import display
from google.colab import auth

# cloud
from google import genai
from google.genai.types import Part, GenerateContentConfig

# function, util
import requests, os, re, time
from pandas import DataFrame
from datetime import datetime, timedelta

auth.authenticate_user()

[Action required]

The values you typically need to change are the GCP API key and project ID. The cell below contains the GCP settings.

# GCP Setting
LANGUAGE_MODEL = 'gemini-1.5-pro' # @param {type:"string"}
API_KEY = 'Please write your API_KEY' # @param {type:"string"}
PROJECT_ID = 'Please write your GCP_ID' # @param {type:"string"}
LOCATION = 'us-central1' # @param {type:"string"}
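
Hard-coding an API key in a notebook makes it easy to share by accident. As a minimal sketch, the settings above could instead be read from environment variables. The helper name load_setting and the variable name YOUTUBE_API_KEY are assumptions made for illustration; they are not part of the original codelab.

```python
import os


def load_setting(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset or blank.

    Hypothetical helper: raises a clear error when a required value is missing.
    """
    value = os.environ.get(name, "").strip()
    if value:
        return value
    if default is not None:
        return default
    raise ValueError(f"Missing required setting: {name}")


# Example: fall back to the codelab's default region when LOCATION is not set.
location = load_setting("LOCATION", default="us-central1")
```

With this in place, API_KEY = load_setting("YOUTUBE_API_KEY") would fail early with a readable message rather than sending a placeholder key to the API.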

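[Action required]

Review the code under # Input below and change the variable values as needed.

As an example, this codelab uses the brand "Google" and searches YouTube for videos on a specific topic (such as "Google AI") while excluding videos from the brand's own channels.

Input variables for the YouTube video analysis

  • BRAND_NAME (required): The brand name to analyze (for example, Google).
  • MY_COMPANY_INFO (required): A short description of the brand and its background.
  • SEARCH_QUERY (required): The YouTube search term (for example, Google AI).
  • VIEWER_COUNTRY: The viewers' country code, as a two-letter ISO 3166-1 alpha-2 code (for example, KR).
  • GENERATION_LANGUAGE (required): The language of Gemini's output (for example, Korean).
  • EXCEPT_CHANNEL_IDS: A comma-separated list of channel IDs to exclude. You can find a channel ID on the YouTube channel's page.
  • VIDEO_TOPIC: A YouTube topic ID used to narrow the search. For the available topic values, see "Search: list | YouTube Data API | Google for Developers".
  • DATE_INPUT (required): The earliest publication date of videos to include (YYYY-MM-DD).
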
# Input
BRAND_NAME = "Google" # @param {type:"string"}
MY_COMPANY_INFO = "Google is a multinational technology company specializing in internet-related services and products." # @param {type:"string"}
SEARCH_QUERY = 'Google AI' # @param {type:"string"}
VIEWER_COUNTRY = 'KR' # @param {type:"string"}
GENERATION_LANGUAGE = 'Korean' # @param {type:"string"}
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw' # @param {type:"string"}
VIDEO_TOPIC = '/m/07c1v' # @param {type: "string"}
DATE_INPUT = '2025-01-01' # @param {type:"date"}

# Auth Scope
SCOPE = [
    'https://www.googleapis.com/auth/youtube.readonly',
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/cloud-platform'
]

# validation check
if not SEARCH_QUERY or not DATE_INPUT:
  raise ValueError("Search query and date input are required.")

EXCEPT_CHANNEL_IDS = [id.strip() for id in EXCEPT_CHANNEL_IDS.split(',')]
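
The validation above only checks that the search query and date are non-empty. A slightly stricter check could look like the sketch below. The function names validate_inputs and parse_channel_ids are hypothetical, and the country-code test only checks the two-letter shape, not whether the code is actually assigned.

```python
import re
from datetime import datetime


def validate_inputs(search_query, date_input, viewer_country=""):
    """Return a list of human-readable problems; an empty list means the inputs look valid."""
    problems = []
    if not search_query.strip():
        problems.append("SEARCH_QUERY is required.")
    try:
        # DATE_INPUT must be a real calendar date in YYYY-MM-DD form.
        datetime.strptime(date_input, "%Y-%m-%d")
    except ValueError:
        problems.append("DATE_INPUT must be a valid date in YYYY-MM-DD format.")
    # ISO 3166-1 alpha-2 codes are two letters; this checks the shape only.
    if viewer_country and not re.fullmatch(r"[A-Za-z]{2}", viewer_country):
        problems.append("VIEWER_COUNTRY must be a two-letter country code.")
    return problems


def parse_channel_ids(raw):
    """Split a comma-separated string into channel IDs, dropping empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```

Unlike a plain split(','), parse_channel_ids returns an empty list for an empty string, so an unset EXCEPT_CHANNEL_IDS does not end up as a list containing one empty ID.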

The following code defines the key functions for interacting with the YouTube Data API.

# YouTube API function

def get_youtube_videos(q, viewer_country_code, topic_str, start_period):

    page_token_number = 1
    next_page_token = ''
    merged_array = []

    published_after_date = f"{start_period}T00:00:00Z"

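    while page_token_number < 9 and len(merged_array) <= 75:
        result = search_youtube(q, topic_str, published_after_date, viewer_country_code, '', next_page_token, 50)
        if not result:
            break
        # Merge while dropping duplicates and keeping the order returned by the API.
        merged_array = list(dict.fromkeys(merged_array + result['items']))
        next_page_token = result['nextPageToken']
        page_token_number += 1
        # Stop when there are no further pages; otherwise the first page would be requested again.
        if not next_page_token:
            break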

    return merged_array

def search_youtube(query, topic_id, published_after, region_code, relevance_language, next_page_token, max_results=50):

    if not query:
        return None

    q = query

    url = f'https://www.googleapis.com/youtube/v3/search?key={API_KEY}&part=snippet&q={q}&publishedAfter={published_after}&regionCode={region_code}&type=video&topicId={topic_id}&maxResults={max_results}&pageToken={next_page_token}&gl={region_code.lower()}'

    response = requests.get(url)
    data = response.json()
    results = data.get('items', [])
    next_page_token = data.get('nextPageToken', '')
    return_results = [item['id']['videoId'] for item in results]

    # Mask the API key so it does not end up in the notebook output.
    print(url.replace(API_KEY, '***'))

    return {
        "nextPageToken": next_page_token,
        "items": return_results
    }

def get_date_string(days_ago):

    date = datetime.now() + timedelta(days=days_ago)
    return date.strftime('%Y-%m-%dT00:00:00Z')

def get_video_details(video_id):

    url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id={video_id}&key={API_KEY}'
    response = requests.get(url)
    data = response.json()

    if data.get('items'):
        video = data['items'][0]
        snippet = video['snippet']
        content_details = video['contentDetails']

        title = snippet.get('title', 'no title')
        description = snippet.get('description', 'no description')
        duration_iso = content_details.get('duration', None)
        channel_id = snippet.get('channelId', 'no channel id')
        channel_title = snippet.get('channelTitle', 'no channel title')
        return {'title': title, 'description': description, 'duration': duration_to_seconds(duration_iso), 'channel_id': channel_id, 'channel_title': channel_title}
    else:
        return None

def duration_to_seconds(duration_str):
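  # The API may omit the duration; treat a missing or unparseable value as unknown.
  if not duration_str:
    return None
  match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration_str)
  if not match:
    return None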

  hours, minutes, seconds = match.groups()

  total_seconds = 0
  if hours:
    total_seconds += int(hours) * 3600
  if minutes:
    total_seconds += int(minutes) * 60
  if seconds:
    total_seconds += int(seconds)

  return total_seconds
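
The duration_to_seconds function above only recognizes values that start with PT. The YouTube Data API documents that videos at least one day long use the form P#DT#H#M#S, which that pattern would not match. As a sketch under that assumption, a variant that also accepts a day component might look like this; the name iso8601_duration_to_seconds is hypothetical, and weeks, months, and years are deliberately not handled.

```python
import re

# ISO 8601 duration with optional day, hour, minute, and second components.
_DURATION_RE = re.compile(
    r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?"
)


def iso8601_duration_to_seconds(duration_str):
    """Convert a duration such as 'PT1H2M3S' or 'P1DT30M' to seconds.

    Returns None for a missing or unrecognized value.
    """
    if not duration_str or duration_str in ("P", "PT"):
        return None
    match = _DURATION_RE.fullmatch(duration_str)
    if not match:
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds
```

Returning None for unknown values lets the caller decide whether to skip the video rather than comparing a missing duration against a time limit.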

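The following code provides a prompt template that you can adjust as needed, along with the key functions for interacting with the GCP Vertex AI API.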

# GCP Vertex AI API

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
model = client.models

def request_gemini(prompt, video_link):
  video_extraction_json_generation_config = GenerateContentConfig(
    temperature=0.0,
    max_output_tokens=2048,
  )

  contents = [
      Part.from_uri(
          file_uri=video_link,
          mime_type="video/mp4",
      ),
      prompt
  ]

  response = model.generate_content(
      model=LANGUAGE_MODEL,
      contents=contents,
      config=video_extraction_json_generation_config
  )

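  # response.text can be empty (for example when the response is blocked).
  # Return '' so that parsing and validation fail cleanly and trigger a retry.
  try:
    return response.text or ''
  except Exception:
    return ''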

def create_prompt(yt_title, yt_description, yt_link):
  return f"""### Task: You are a highly specialized marketer and YouTube expert working for the brand or company, {BRAND_NAME}.
Your boss is wondering which video to use to promote the company's advertisements and which YouTuber to work with on future advertisements. You are the expert who can give your boss the most suitable suggestions.
Analyze the video according to the criteria below and solve your boss's worries.

### Criteria: Now you review the video.
If you evaluate it using the following criteria, you will be able to receive a better evaluation.

1. Whether the video mentions brand, {BRAND_NAME}.
2. Whether the video views {BRAND_NAME} positively or negatively.
3. Whether the video would be suitable for marketing purposes.

### Context and Contents:
Your Company Information:
- Company Description: {MY_COMPANY_INFO}
- Brand: {BRAND_NAME}

Analysis subject:
- YouTube title: {yt_title}
- YouTube description: {yt_description}
- YouTube link: {yt_link}

### Answer Format:
brand_relevance_score: (Integer between 0 and 100 - If this video is more relevant to the {BRAND_NAME}, it will score higher)
brand_positive_score: (Integer between 0 and 100 - If this video is positive about the {BRAND_NAME}, it will score higher)
brand_negative_score: (Integer between 0 and 100 - If this video is negative about the {BRAND_NAME}, it will score higher)
video_content_summary: (Summarize the content of the video like overview)
video_brand_summary: (Summarize the content about your brand, {BRAND_NAME})
opinion: (Why this video is suitable for promoting your company or product)

### Examples:
brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
video_brand_summary: The brand products mentioned in the video have their advantages well explained by the YouTuber.
opinion: Consumers are more likely to think positively about the advantages of the product.

### Caution:
DO NOT fabricate information.
DO NOT imagine things.
DO NOT use Markdown formatting.
DO analyze each video based on the criteria mentioned above.
DO analyze after watching the whole video.
DO write the summaries in {GENERATION_LANGUAGE}."""

def parse_response(response: str):
  brand_relevance_score_pattern = r"brand_relevance_score:\s*(\d{1,3})"
  brand_positive_score_pattern = r"brand_positive_score:\s*(\d{1,3})"
  brand_negative_score_pattern = r"brand_negative_score:\s*(\d{1,3})"
  video_content_summary_pattern = r"video_content_summary:\s*(.*)"
  video_brand_summary_pattern = r"video_brand_summary:\s*(.*)"
  opinion_pattern = r"opinion:\s*(.*)"
  brand_relevance_score_match = re.search( brand_relevance_score_pattern, response )
  brand_relevance_score = ( int(brand_relevance_score_match.group(1)) if brand_relevance_score_match else 0 )
  brand_positive_score_match = re.search( brand_positive_score_pattern, response )
  brand_positive_score = ( int(brand_positive_score_match.group(1)) if brand_positive_score_match else 0 )
  brand_negative_score_match = re.search( brand_negative_score_pattern, response )
  brand_negative_score = ( int(brand_negative_score_match.group(1)) if brand_negative_score_match else 0 )
  video_content_score_match = re.search( video_content_summary_pattern, response )
  video_content_summary = ( video_content_score_match.group(1) if video_content_score_match else '' )
  video_brand_summary_match = re.search( video_brand_summary_pattern, response )
  video_brand_summary = ( video_brand_summary_match.group(1) if video_brand_summary_match else '' )
  opinion_match = re.search( opinion_pattern, response )
  opinion = ( opinion_match.group(1) if opinion_match else '' )
  return ( brand_relevance_score, brand_positive_score, brand_negative_score, video_content_summary, video_brand_summary, opinion)

def request_gemini_with_retry(prompt, youtube_link='', max_retries=1):
  retries = 0
  while retries <= max_retries:
    try:
      response = request_gemini(prompt, youtube_link)
      ( brand_relevance_score,
        brand_positive_score,
        brand_negative_score,
        video_content_summary,
        video_brand_summary,
        opinion) = parse_response(response)
      if ( validate_score(brand_relevance_score) and
           validate_score(brand_positive_score) and
           validate_score(brand_negative_score) and
           validate_summary(video_content_summary) and
           validate_summary(video_brand_summary) ):

        return ( brand_relevance_score,
                 brand_positive_score,
                 brand_negative_score,
                 video_content_summary,
                 video_brand_summary,
                 opinion
              )
      else:
        # Raise so the except block below counts the attempt and retries.
        raise ValueError(
            "The value may be incorrect, there may be a range issue, a parsing"
            " issue, or a response issue with Gemini: score -"
            f" {brand_relevance_score}, {brand_positive_score},"
            f" {brand_negative_score} , summary - {video_content_summary},"
            f" {video_brand_summary}" )

    except Exception as e:
      print(f"Request failed: {e}")
      retries += 1
      if retries <= max_retries:
        print(f"retry ({retries}/{max_retries})...")
      else:
        print("Maximum number of retries exceeded")
        return 0, 0, 0, "", "", ""

def validate_score(score):
  return score >= 0 and score <= 100

def validate_summary(summary):
  return len(summary) > 0
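
The parse_response function above repeats the same search-and-default pattern for each field of the answer format. As an illustrative sketch, the same idea can be written as a loop over field names that returns a dict. The name parse_answer and the two field tuples are assumptions, not part of the codelab, and each pattern is anchored to the start of a line so that a field name appearing mid-sentence is not picked up.

```python
import re

SCORE_FIELDS = ("brand_relevance_score", "brand_positive_score", "brand_negative_score")
TEXT_FIELDS = ("video_content_summary", "video_brand_summary", "opinion")


def parse_answer(response):
    """Parse 'field: value' lines into a dict, using 0 or '' for missing fields."""
    parsed = {name: 0 for name in SCORE_FIELDS}
    parsed.update({name: "" for name in TEXT_FIELDS})
    for name in SCORE_FIELDS:
        # Scores are one to three digits at the start of a line.
        match = re.search(rf"^{name}:\s*(\d{{1,3}})", response, re.MULTILINE)
        if match:
            parsed[name] = int(match.group(1))
    for name in TEXT_FIELDS:
        # Text fields run to the end of their line.
        match = re.search(rf"^{name}:\s*(.*)$", response, re.MULTILINE)
        if match:
            parsed[name] = match.group(1).strip()
    return parsed
```

Like the original, this captures only the first line of each text field, so a summary that Gemini spreads over several lines would be truncated.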

This code block does three things: it builds the DataFrame, runs the Gemini analysis, and then updates the DataFrame with the results.

def df_youtube_videos():
  youtube_video_list = get_youtube_videos(SEARCH_QUERY, VIEWER_COUNTRY, VIDEO_TOPIC, DATE_INPUT)
  youtube_video_link_list = []
  youtube_video_title_list = []
  youtube_video_description_list = []
  youtube_video_channel_title_list = []
  youtube_video_duration_list = []

  for video_id in youtube_video_list:
    video_details = get_video_details(video_id)
    # Skip videos whose details or duration could not be retrieved;
    # without them the length and channel filters below cannot be applied.
    if not video_details or video_details['duration'] is None:
      continue
    # Keep only videos shorter than 50 minutes that are not from an excluded channel.
    # https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
    if video_details['duration'] < 50*60 and video_details['channel_id'] not in EXCEPT_CHANNEL_IDS:
      youtube_video_link_list.append(f'https://www.youtube.com/watch?v={video_id}')
      youtube_video_title_list.append(video_details['title'])
      youtube_video_description_list.append(video_details['description'])
      youtube_video_channel_title_list.append(video_details['channel_title'])
      duration_new_format = f"{video_details['duration'] // 3600:02d}:{(video_details['duration'] % 3600) // 60:02d}:{video_details['duration'] % 60:02d}" # HH:MM:SS
      youtube_video_duration_list.append(duration_new_format)

  df = DataFrame({
      'video_id': youtube_video_link_list,
      'title': youtube_video_title_list,
      'description': youtube_video_description_list,
      'channel_title': youtube_video_channel_title_list,
      'length': youtube_video_duration_list
  })
  return df

def run_gemini(df):
  for index, row in df.iterrows():
    video_title = row['title']
    video_description = row['description']
    video_link = row['video_id']
    prompt = create_prompt(video_title, video_description, video_link)
    ( brand_relevance_score,
      brand_positive_score,
      brand_negative_score,
      video_content_summary,
      video_brand_summary,
      opinion) = request_gemini_with_retry(prompt, video_link)
    df.at[index, 'gemini_brand_relevance_score'] = brand_relevance_score
    df.at[index, 'gemini_brand_positive_score'] = brand_positive_score
    df.at[index, 'gemini_brand_negative_score'] = brand_negative_score
    df.at[index, 'gemini_video_content_summary'] = video_content_summary
    df.at[index, 'gemini_video_brand_summary'] = video_brand_summary
    df.at[index, 'gemini_opinion'] = opinion
    # https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
    time.sleep(1)
    print(f"Processing: {index}/{len(df)}")
    print(f"video_title: {video_title}")
  return df
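
The selection rule used when building the DataFrame (shorter than 50 minutes, not from an excluded channel) and the HH:MM:SS formatting can each be expressed as a small function, which makes them easy to check without calling any API. This is a sketch only; the names format_duration and is_candidate are hypothetical.

```python
def format_duration(total_seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def is_candidate(video_details, excluded_channel_ids, max_seconds=50 * 60):
    """Return True when a video is short enough and not from an excluded channel."""
    # Missing details or an unknown duration cannot be checked, so reject them.
    if not video_details or video_details.get("duration") is None:
        return False
    if video_details["duration"] >= max_seconds:
        return False
    return video_details.get("channel_id") not in excluded_channel_ids
```

Keeping the 50-minute limit as a parameter makes it simple to adjust if the video length supported by the chosen model differs.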

This code block runs everything written so far: it retrieves data from YouTube, analyzes it with Gemini, and builds the final DataFrame.

# main
df = df_youtube_videos()
run_gemini(df)
df['gemini_brand_positive_score'] = df[ 'gemini_brand_positive_score' ].astype('int64')
df['gemini_brand_relevance_score'] = df[ 'gemini_brand_relevance_score' ].astype('int64')
df['gemini_brand_negative_score'] = df[ 'gemini_brand_negative_score' ].astype('int64')
df = df.sort_values( 'gemini_brand_positive_score', ascending=False )

df
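
The main cell sorts by positive score alone. If you also want to drop videos that barely mention the brand and break ties by negative score, the ranking could be sketched in plain Python as below. The function name rank_videos and the relevance threshold of 50 are assumptions chosen for illustration, and the rows are expected as dicts, such as those returned by df.to_dict('records').

```python
def rank_videos(rows, min_relevance=50):
    """Keep rows at or above `min_relevance` and sort them best-first.

    Rows are ordered by positive score (descending), then by negative
    score (ascending) so that less negative videos win ties.
    """
    relevant = [r for r in rows if r["gemini_brand_relevance_score"] >= min_relevance]
    return sorted(
        relevant,
        key=lambda r: (-r["gemini_brand_positive_score"], r["gemini_brand_negative_score"]),
    )
```

A suitable threshold depends on the brand and the search query, so treat the default as a starting point to tune.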

The last step is to create a spreadsheet from the DataFrame. To check the result, open the URL printed in the output.

import gspread
from google.auth import default

today_date = datetime.now().strftime('%Y-%m-%d')
my_spreadsheet_title = f"Partner's Video Finder, {BRAND_NAME}, {SEARCH_QUERY}, {VIEWER_COUNTRY} ({DATE_INPUT}~{today_date})"

creds, _ = default()
gc = gspread.authorize(creds)
sh = gc.create(my_spreadsheet_title)
# Use the spreadsheet that was just created; opening by title could return an older file with the same name.
worksheet = sh.sheet1
cell_list = df.values.tolist()
worksheet.update([df.columns.values.tolist()] + cell_list)

print("URL: ", sh.url)
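
Values sent to the Sheets API are serialized as JSON, and missing values such as NaN are not valid JSON, so a DataFrame with gaps may be rejected. As a hedged sketch, the rows could be cleaned before upload with a helper like the one below; the name to_sheet_values is hypothetical.

```python
import math


def to_sheet_values(columns, rows):
    """Return a header row followed by data rows, with None and NaN replaced by ''."""
    def clean(value):
        if value is None:
            return ""
        if isinstance(value, float) and math.isnan(value):
            return ""
        return value

    return [list(columns)] + [[clean(v) for v in row] for row in rows]
```

It would slot into the cell above as worksheet.update(to_sheet_values(df.columns, df.values.tolist())).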