关于此 Codelab
1. 简介
上次更新时间:2025 年 3 月 12 日
免责声明
以下示例代码使用 YouTube Data API 和 Gemini 分析视频。用户需对其使用负责。在实际环境中使用此代码时,应仔细考虑。作者对使用此代码而产生的任何问题概不负责。此外,由于人工智能的特性,结果始终有可能与实际情况不符。因此,请勿盲目相信结果,应仔细审核。
此项目的目标
其主要目标是通过分析视频内容和情绪,找出适合用于品牌宣传的 YouTube 视频和 YouTuber。
概览
该项目利用 YouTube Data API 提取视频信息,并使用 GCP Vertex AI API 和 Gemini 模型分析视频内容。它在 Google Colab 上运行。
您可以将日后提供的代码粘贴到 Colab 中,然后逐个运行这些代码。
学习内容
- 如何使用 YouTube Data API 提取视频信息。
- 如何将 GCP Vertex AI API 与 Gemini 模型搭配使用来分析视频内容。
- 如何使用 Google Colab 运行代码。
- 如何根据分析的数据创建电子表格。
所需条件
如需实现此解决方案,您需要具备以下几项:
- 一个 Google Cloud Platform 项目。
- 在项目中启用 YouTube Data API v3、Vertex AI API、Generative Language API、Google 云端硬盘 API 和 Google 表格 API。
- 在“凭据”标签页中创建一个 API 密钥,并为其授予 YouTube Data API v3 的使用权限。
此解决方案利用了 YouTube Data API 和 GCP Vertex AI API。
2. 代码和说明
首先,我们需要导入要使用的库。然后,使用您的 Google 账号登录并授予访问您的 Google 云端硬盘的权限。
# library
# colab
import ipywidgets as widgets
from IPython.display import display
from google.colab import auth
# cloud
from google import genai
from google.genai.types import Part, GenerateContentConfig
# function, util
import requests, os, re, time
from pandas import DataFrame
from datetime import datetime, timedelta
auth.authenticate_user()
[建议采取的行动]
GCP 中的 API 密钥和项目 ID 通常需要更改。以下单元格用于输入 GCP 设置值。
# GCP Setting
LANGUAGE_MODEL = 'gemini-1.5-pro' # @param {type:"string"}
API_KEY = 'Please write your API_KEY' # @param {type:"string"}
PROJECT_ID = 'Please write your GCP_ID' # @param {type:"string"}
LOCATION = 'us-central1' # @param {type:"string"}
[建议采取的行动]
请更改变量值,同时检查“输入”下方的代码。
本文将以品牌“Google”为例,演示如何在 YouTube 上搜索特定主题(例如“Google AI”)的视频,同时排除该品牌自家频道的视频。
YouTube 视频分析的输入变量
- BRAND_NAME(必需):要分析的品牌名称(例如,Google)。
- MY_COMPANY_INFO(必需):简要的品牌说明和背景信息。
- SEARCH_QUERY(必需):YouTube 视频的搜索字词(例如Google AI)。
- VIEWER_COUNTRY::观看者的国家/地区代码(双字母国家/地区代码:ISO 3166-1 alpha-2)(例如KR)。
- GENERATION_LANGUAGE(必需):Gemini 结果的语言(例如韩语)。
- EXCEPT_CHANNEL_IDS::要排除的频道 ID(以英文逗号分隔)。
您可以在 YouTube 频道中找到频道 ID。
- VIDEO_TOPIC::用于优化的 YouTube 主题 ID。
您可以在 Search: list | YouTube Data API | Google for Developers 中找到视频主题值。
- DATE_INPUT(必需):已发布视频的开始日期 (YYYY-MM-DD)。
# Input
BRAND_NAME = "Google" # @param {type:"string"}
MY_COMPANY_INFO = "Google is a multinational technology company specializing in internet-related services and products." # @param {type:"string"}
SEARCH_QUERY = 'Google AI' # @param {type:"string"}
VIEWER_COUNTRY = 'KR' # @param {type:"string"}
GENERATION_LANGUAGE = 'Korean' # @param {type:"string"}
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw' # @param {type:"string"}
VIDEO_TOPIC = '/m/07c1v' # @param {type: "string"}
DATE_INPUT = '2025-01-01' # @param {type:"date"}
# Auth Scope
SCOPE = [
'https://www.googleapis.com/auth/youtube.readonly',
'https://www.googleapis.com/auth/spreadsheets',
'https://www.googleapis.com/auth/drive',
'https://www.googleapis.com/auth/cloud-platform'
]
# validation check
if not SEARCH_QUERY or not DATE_INPUT:
raise ValueError("Search query and date input are required.")
EXCEPT_CHANNEL_IDS = [id.strip() for id in EXCEPT_CHANNEL_IDS.split(',')]
所提供的文本列出了与与 YouTube Data API 交互相关的主要函数。
# YouTube API function
def get_youtube_videos(q, viewer_country_code, topic_str, start_period):
page_token_number = 1
next_page_token = ''
merged_array = []
published_after_date = f"{start_period}T00:00:00Z"
while page_token_number < 9 and len(merged_array) <= 75:
result = search_youtube(q, topic_str, published_after_date, viewer_country_code, '', next_page_token, 50)
merged_array = list(set(merged_array + result['items']))
next_page_token = result['nextPageToken']
page_token_number += 1
return merged_array
def search_youtube(query, topic_id, published_after, region_code, relevance_language, next_page_token, max_results=50):
if not query:
return None
q = query
url = f'https://www.googleapis.com/youtube/v3/search?key={API_KEY}&part=snippet&q={q}&publishedAfter={published_after}®ionCode={region_code}&type=video&topicId={topic_id}&maxResults={max_results}&pageToken={next_page_token}&gl={region_code.lower()}'
response = requests.get(url)
data = response.json()
results = data.get('items', [])
next_page_token = data.get('nextPageToken', '')
return_results = [item['id']['videoId'] for item in results]
print(url)
return {
"nextPageToken": next_page_token,
"items": return_results
}
def get_date_string(days_ago):
date = datetime.now() + timedelta(days=days_ago)
return date.strftime('%Y-%m-%dT00:00:00Z')
def get_video_details(video_id):
url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id={video_id}&key={API_KEY}'
response = requests.get(url)
data = response.json()
if data.get('items'):
video = data['items'][0]
snippet = video['snippet']
content_details = video['contentDetails']
title = snippet.get('title', 'no title')
description = snippet.get('description', 'no description')
duration_iso = content_details.get('duration', None)
channel_id = snippet.get('channelId', 'no channel id')
channel_title = snippet.get('channelTitle', 'no channel title')
return {'title': title, 'description': description, 'duration': duration_to_seconds(duration_iso), 'channel_id': channel_id, 'channel_title': channel_title}
else:
return None
def duration_to_seconds(duration_str):
match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration_str)
if not match:
return None
hours, minutes, seconds = match.groups()
total_seconds = 0
if hours:
total_seconds += int(hours) * 3600
if minutes:
total_seconds += int(minutes) * 60
if seconds:
total_seconds += int(seconds)
return total_seconds
文本提供了可根据需要调整的提示模板,以及用于与 GCP Vertex AI API 交互的关键函数。
# GCP Vertex AI API
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
model = client.models
def request_gemini(prompt, video_link):
video_extraction_json_generation_config = GenerateContentConfig(
temperature=0.0,
max_output_tokens=2048,
)
contents = [
Part.from_uri(
file_uri=video_link,
mime_type="video/mp4",
),
prompt
]
response = model.generate_content(
model=LANGUAGE_MODEL,
contents=contents,
config=video_extraction_json_generation_config
)
try:
return response.text
except:
return response.GenerateContentResponse
def create_prompt(yt_title, yt_description, yt_link):
return f"""### Task: You are a highly specialized marketer and YouTube expert working for the brand or company, {BRAND_NAME}.
Your boss is wondering which a video to use to promote their company's advertisements and which a YouTuber to promote their advertisements with in the future. You are the expert who can give your boss the most suitable suggestions.
Analyze the video according to the criteria below and solve your boss's worries.
### Criteria: Now you review the video.
If you evaluate it using the following criteria, you will be able to receive a better evaluation.
1. Whether the video mentions brand, {BRAND_NAME}.
2. Whether the video views {BRAND_NAME} positively or negatively.
3. Whether the video would be suitable for marketing purposes.
### Context and Contents:
Your Company Information:
- Company Description: {MY_COMPANY_INFO}
- Brand: {BRAND_NAME}
Analysis subject:
- YouTube title: {yt_title}
- YouTube description: {yt_description}
- YouTube link: {yt_link}
### Answer Format:
brand_relevance_score: (Integer between 0 and 100 - If this video is more relative about the {BRAND_NAME}, it will score higher)
brand_positive_score: (Integer between 0 and 100 - If this video is positive about the {BRAND_NAME}, it will score higher)
brand_negative_score: (Integer between 0 and 100 - If this video is negative about the {BRAND_NAME}, it will score higher)
video_content_summary: (Summarize the content of the video like overview)
video_brand_summary: (Summarize the content about your brand, {BRAND_NAME})
opinion: (Why this video is suitable for promoting your company or product)
### Examples:
brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
video_brand_summary: The brand products mentioned in the video have their advantages well explained by the YouTuber.
opinion: Consumers are more likely to think positively about the advantages of the product.
### Caution:
DO NOT fabricate information.
DO NOT imagine things.
DO NOT Markdown format.
DO Analyze each video based on the criteria mentioned above.
DO Analyze after watching the whole video.
DO write the results for summary as {GENERATION_LANGUAGE}."""
def parse_response(response: str):
brand_relevance_score_pattern = r"brand_relevance_score:\s*(\d{1,3})"
brand_positive_score_pattern = r"brand_positive_score:\s*(\d{1,3})"
brand_negative_score_pattern = r"brand_negative_score:\s*(\d{1,3})"
video_content_summary_pattern = r"video_content_summary:\s*(.*)"
video_brand_summary_pattern = r"video_brand_summary:\s*(.*)"
opinion_pattern = r"opinion:\s*(.*)"
brand_relevance_score_match = re.search( brand_relevance_score_pattern, response )
brand_relevance_score = ( int(brand_relevance_score_match.group(1)) if brand_relevance_score_match else 0 )
brand_positive_score_match = re.search( brand_positive_score_pattern, response )
brand_positive_score = ( int(brand_positive_score_match.group(1)) if brand_positive_score_match else 0 )
brand_negative_score_match = re.search( brand_negative_score_pattern, response )
brand_negative_score = ( int(brand_negative_score_match.group(1)) if brand_negative_score_match else 0 )
video_content_score_match = re.search( video_content_summary_pattern, response )
video_content_summary = ( video_content_score_match.group(1) if video_content_score_match else '' )
video_brand_summary_match = re.search( video_brand_summary_pattern, response )
video_brand_summary = ( video_brand_summary_match.group(1) if video_brand_summary_match else '' )
opinion_match = re.search( opinion_pattern, response )
opinion = ( opinion_match.group(1) if opinion_match else '' )
return ( brand_relevance_score, brand_positive_score, brand_negative_score, video_content_summary, video_brand_summary, opinion)
def request_gemini_with_retry(prompt, youtube_link='', max_retries=1):
retries = 0
while retries <= max_retries:
try:
response = request_gemini(prompt, youtube_link)
( brand_relevance_score,
brand_positive_score,
brand_negative_score,
video_content_summary,
video_brand_summary,
opinion) = parse_response(response)
if ( validate_score(brand_relevance_score) and
validate_score(brand_positive_score) and
validate_score(brand_negative_score) and
validate_summary(video_content_summary) and
validate_summary(video_brand_summary) ):
return ( brand_relevance_score,
brand_positive_score,
brand_negative_score,
video_content_summary,
video_brand_summary,
opinion
)
else:
retries += 1
ValueError(
"The value may be incorrect, there may be a range issue, a parsing"
" issue, or a response issue with Gemini: score -"
f" {brand_relevance_score}, {brand_positive_score},"
f" {brand_negative_score} , summary - {video_content_summary},"
f" {video_brand_summary}" )
except Exception as e:
print(f"Request failed: {e}")
retries += 1
if retries <= max_retries:
print(f"retry ({retries}/{max_retries})...")
else:
print("Maximum number of retries exceeded")
return 0, 0, 0, "", "", ""
def validate_score(score):
return score >= 0 and score <= 100
def validate_summary(summary):
return len(summary) > 0
此代码块负责执行以下三个主要功能:创建 DataFrame、执行 Gemini 分析,以及随后更新 DataFrame。
def df_youtube_videos():
youtube_video_list = get_youtube_videos(SEARCH_QUERY, VIEWER_COUNTRY, VIDEO_TOPIC, DATE_INPUT)
youtube_video_link_list = []
youtube_video_title_list = []
youtube_video_description_list = []
youtube_video_channel_title_list = []
youtube_video_duration_list = []
for video_id in youtube_video_list:
video_details = get_video_details(video_id)
# https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
if video_details['duration'] < 50*60 and not video_details['channel_id'] in EXCEPT_CHANNEL_IDS:
youtube_video_link_list.append(f'https://www.youtube.com/watch?v={video_id}')
if video_details:
youtube_video_title_list.append(video_details['title'])
youtube_video_description_list.append(video_details['description'])
youtube_video_channel_title_list.append(video_details['channel_title'])
duration_new_format = f"{video_details['duration'] // 3600:02d}:{(video_details['duration'] % 3600) // 60:02d}:{video_details['duration'] % 60:02d}" # HH:MM:SS
youtube_video_duration_list.append(duration_new_format)
else:
youtube_video_title_list.append('')
youtube_video_description_list.append('')
youtube_video_channel_title_list.append(video_details['channel_title'])
youtube_video_duration_list.append('')
df = DataFrame({
'video_id': youtube_video_link_list,
'title': youtube_video_title_list,
'description': youtube_video_description_list,
'channel_title': youtube_video_channel_title_list,
'length': youtube_video_duration_list
})
return df
def run_gemini(df):
for index, row in df.iterrows():
video_title = row['title']
video_description = row['description']
video_link = row['video_id']
prompt = create_prompt(video_title, video_description, video_link)
( brand_relevance_score,
brand_positive_score,
brand_negative_score,
video_content_summary,
video_brand_summary,
opinion) = request_gemini_with_retry(prompt, video_link)
df.at[index, 'gemini_brand_relevance_score'] = brand_relevance_score
df.at[index, 'gemini_brand_positive_score'] = brand_positive_score
df.at[index, 'gemini_brand_negative_score'] = brand_negative_score
df.at[index, 'gemini_video_content_summary'] = video_content_summary
df.at[index, 'gemini_video_brand_summary'] = video_brand_summary
df.at[index, 'gemini_opinion'] = opinion
# https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
time.sleep(1)
print(f"Processing: {index}/{len(df)}")
print(f"video_title: {video_title}")
return df
这是一个代码块,用于执行到目前为止编写的所有代码。它会从 YouTube 提取数据,使用 Gemini 对其进行分析,最后创建 DataFrame。
# main
df = df_youtube_videos()
run_gemini(df)
df['gemini_brand_positive_score'] = df[ 'gemini_brand_positive_score' ].astype('int64')
df['gemini_brand_relevance_score'] = df[ 'gemini_brand_relevance_score' ].astype('int64')
df['gemini_brand_negative_score'] = df[ 'gemini_brand_negative_score' ].astype('int64')
df = df.sort_values( 'gemini_brand_positive_score', ascending=False )
df
最后一步是根据数据框创建电子表格。如需查看进度,请使用输出网址。
import gspread
from google.auth import default
today_date = datetime.now().strftime('%Y-%m-%d')
my_spreadsheet_title = f"Partner's Video Finder, {BRAND_NAME}, {SEARCH_QUERY}, {VIEWER_COUNTRY} ({DATE_INPUT}~{today_date})"
creds, _ = default()
gc = gspread.authorize(creds)
sh = gc.create(my_spreadsheet_title)
worksheet = gc.open(my_spreadsheet_title).sheet1
cell_list = df.values.tolist()
worksheet.update([df.columns.values.tolist()] + cell_list)
print("URL: ", sh.url)
3. 参考
在编写代码时,我参考了以下内容。如果您需要修改代码或想了解更详细的用法,请参阅以下链接。