About this codelab
1. Introduction
Last updated: March 12, 2025
Disclaimer
This is sample code that analyzes videos with the YouTube Data API and Gemini. You are responsible for how you use it, and you should evaluate it carefully before running it in a production environment. The author is not responsible for any problems that arise from using this code. In addition, because of the nature of artificial intelligence, the results may differ from the actual facts. Do not trust the results blindly; review them carefully.
Project goal
The main goal is to identify YouTube videos and YouTubers that are suitable for brand promotion by analyzing the content and sentiment of the videos.
Overview
The project uses the YouTube Data API to retrieve video information and the Vertex AI API on GCP with the Gemini model to analyze the video content. It runs in Google Colab.
You can paste the code blocks that follow into Colab and run them one at a time.
What you'll learn
- How to use the YouTube Data API to retrieve video information
- How to use the Vertex AI API on GCP with the Gemini model to analyze video content
- How to use Google Colab to run the code
- How to create a spreadsheet from the analyzed data
Requirements
To implement this solution, you need the following:
- A Google Cloud Platform project
- YouTube Data API v3, the Vertex AI API, the Generative Language API, the Google Drive API, and the Google Sheets API enabled in the project
- An API key, created in the Credentials tab, that is authorized for YouTube Data API v3
This solution uses the YouTube Data API and the Vertex AI API on GCP.
2. Code and explanation
First, import the libraries that the notebook uses. Then sign in with your Google Account and grant permission to access Google Drive.
# library
# colab
import ipywidgets as widgets
from IPython.display import display
from google.colab import auth
# cloud
from google import genai
from google.genai.types import Part, GenerateContentConfig
# function, util
import requests, os, re, time
from pandas import DataFrame
from datetime import datetime, timedelta
auth.authenticate_user()
[Action Required]
The API key and the GCP project ID are the values you will most likely need to change. The following cell holds the GCP configuration values.
# GCP Setting
LANGUAGE_MODEL = 'gemini-1.5-pro' # @param {type:"string"}
API_KEY = 'Please write your API_KEY' # @param {type:"string"}
PROJECT_ID = 'Please write your GCP_ID' # @param {type:"string"}
LOCATION = 'us-central1' # @param {type:"string"}
[Action Required]
Change the variable values as you review the code under Input.
This codelab uses the brand "Google" as an example to show how to search YouTube for videos about a specific topic (for example, "Google AI") while excluding videos from the brand's own channels.
Input variables for the YouTube video analysis
- BRAND_NAME (required): The brand name to analyze (for example, Google).
- MY_COMPANY_INFO (required): A short description of the brand and its context.
- SEARCH_QUERY (required): The search term for YouTube videos (for example, Google AI).
- VIEWER_COUNTRY: The viewer's country as a two-letter ISO 3166-1 alpha-2 code (for example, KR).
- GENERATION_LANGUAGE (required): The language of Gemini's output (for example, Korean).
- EXCEPT_CHANNEL_IDS: Comma-separated channel IDs to exclude.
You can find a channel ID on the channel's YouTube page.
- VIDEO_TOPIC: A YouTube topic ID that narrows the topic.
You can find topic ID values in Search: list | YouTube Data API | Google for Developers.
- DATE_INPUT (required): The earliest publication date for the videos (YYYY-MM-DD).
# Input
BRAND_NAME = "Google" # @param {type:"string"}
MY_COMPANY_INFO = "Google is a multinational technology company specializing in internet-related services and products." # @param {type:"string"}
SEARCH_QUERY = 'Google AI' # @param {type:"string"}
VIEWER_COUNTRY = 'KR' # @param {type:"string"}
GENERATION_LANGUAGE = 'Korean' # @param {type:"string"}
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw' # @param {type:"string"}
VIDEO_TOPIC = '/m/07c1v' # @param {type: "string"}
DATE_INPUT = '2025-01-01' # @param {type:"date"}
# Auth Scope
SCOPE = [
'https://www.googleapis.com/auth/youtube.readonly',
'https://www.googleapis.com/auth/spreadsheets',
'https://www.googleapis.com/auth/drive',
'https://www.googleapis.com/auth/cloud-platform'
]
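# validation check
if not SEARCH_QUERY or not DATE_INPUT:
    raise ValueError("Search query and date input are required.")
# Split the comma-separated string into a list, dropping empty entries
EXCEPT_CHANNEL_IDS = [channel_id.strip() for channel_id in EXCEPT_CHANNEL_IDS.split(',') if channel_id.strip()]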
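The validation check above covers only SEARCH_QUERY and DATE_INPUT, although the variable list marks other inputs as required too. As a minimal sketch, assuming you want to fail early on all of them, a helper such as the hypothetical `validate_inputs` below could also check the date format and the country code. The function name, its parameters, and its messages are illustrative assumptions, not part of the original notebook.

```python
from datetime import datetime

def validate_inputs(brand_name, company_info, search_query,
                    generation_language, date_input, viewer_country=''):
    """Hypothetical helper: return a list of problems found in the inputs."""
    problems = []
    required = {
        'BRAND_NAME': brand_name,
        'MY_COMPANY_INFO': company_info,
        'SEARCH_QUERY': search_query,
        'GENERATION_LANGUAGE': generation_language,
        'DATE_INPUT': date_input,
    }
    for name, value in required.items():
        if not value or not value.strip():
            problems.append(f"{name} is required.")
    # DATE_INPUT must parse as YYYY-MM-DD
    if date_input:
        try:
            datetime.strptime(date_input, '%Y-%m-%d')
        except ValueError:
            problems.append("DATE_INPUT must use the YYYY-MM-DD format.")
    # VIEWER_COUNTRY is optional, but if present it should be two letters
    if viewer_country and not (len(viewer_country) == 2 and viewer_country.isalpha()):
        problems.append("VIEWER_COUNTRY must be a two-letter country code.")
    return problems

print(validate_inputs('Google', 'A technology company.', 'Google AI', 'Korean', '2025-01-01', 'KR'))
```

An empty list means every check passed; otherwise each entry names one input to correct before running the remaining cells.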
The following code defines the key functions for interacting with the YouTube Data API.
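# YouTube API function
def get_youtube_videos(q, viewer_country_code, topic_str, start_period):
    page_token_number = 1
    next_page_token = ''
    merged_array = []
    published_after_date = f"{start_period}T00:00:00Z"
    # Fetch up to 8 pages, stopping once more than 75 unique video IDs are collected
    while page_token_number < 9 and len(merged_array) <= 75:
        result = search_youtube(q, topic_str, published_after_date, viewer_country_code, '', next_page_token, 50)
        if result is None:
            break
        # dict.fromkeys removes duplicates while keeping the order returned by the API
        merged_array = list(dict.fromkeys(merged_array + result['items']))
        next_page_token = result['nextPageToken']
        page_token_number += 1
        # An empty token means there are no more pages; stop instead of re-fetching page one
        if not next_page_token:
            break
    return merged_array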
def search_youtube(query, topic_id, published_after, region_code, relevance_language, next_page_token, max_results=50):
    if not query:
        return None
    q = query
    url = f'https://www.googleapis.com/youtube/v3/search?key={API_KEY}&part=snippet&q={q}&publishedAfter={published_after}&regionCode={region_code}&type=video&topicId={topic_id}&maxResults={max_results}&pageToken={next_page_token}&gl={region_code.lower()}'
    response = requests.get(url)
    data = response.json()
    results = data.get('items', [])
    next_page_token = data.get('nextPageToken', '')
    # Keep only the video IDs from the search results
    return_results = [item['id']['videoId'] for item in results]
    # Debug output; the URL contains API_KEY, so clear the cell output before sharing the notebook
    print(url)
    return {
        "nextPageToken": next_page_token,
        "items": return_results
    }
def get_date_string(days_ago):
    # days_ago is added to the current time, so pass a negative value to get a past date
    date = datetime.now() + timedelta(days=days_ago)
    return date.strftime('%Y-%m-%dT00:00:00Z')
def get_video_details(video_id):
    url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id={video_id}&key={API_KEY}'
    response = requests.get(url)
    data = response.json()
    if data.get('items'):
        video = data['items'][0]
        snippet = video['snippet']
        content_details = video['contentDetails']
        title = snippet.get('title', 'no title')
        description = snippet.get('description', 'no description')
        duration_iso = content_details.get('duration', None)
        channel_id = snippet.get('channelId', 'no channel id')
        channel_title = snippet.get('channelTitle', 'no channel title')
        return {'title': title, 'description': description, 'duration': duration_to_seconds(duration_iso), 'channel_id': channel_id, 'channel_title': channel_title}
    else:
        return None
def duration_to_seconds(duration_str):
    # duration_str is an ISO 8601 duration such as 'PT1H2M3S'; it is None when the API omits it
    if not duration_str:
        return None
    match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration_str)
    if not match:
        return None
    hours, minutes, seconds = match.groups()
    total_seconds = 0
    if hours:
        total_seconds += int(hours) * 3600
    if minutes:
        total_seconds += int(minutes) * 60
    if seconds:
        total_seconds += int(seconds)
    return total_seconds
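The search helper above follows `nextPageToken` from one page to the next. The sketch below isolates that loop so it can be tried without network access. It is an illustration under stated assumptions: `fetch_page` is a hypothetical stand-in for `search_youtube`, `FAKE_PAGES` is made-up data, and the page and item limits are parameters chosen to mirror the values in the notebook.

```python
def collect_video_ids(fetch_page, max_pages=8, max_items=75):
    """Follow nextPageToken until pages run out or a limit is reached.

    fetch_page is a hypothetical callable that takes a page token and returns
    a dict with 'items' (a list of video IDs) and 'nextPageToken'.
    """
    collected = []
    token = ''
    for _ in range(max_pages):
        page = fetch_page(token)
        # dict.fromkeys removes duplicates while keeping first-seen order
        collected = list(dict.fromkeys(collected + page['items']))
        token = page.get('nextPageToken', '')
        # Stop when there is no further page or enough IDs were gathered
        if not token or len(collected) > max_items:
            break
    return collected

# Made-up two-page result set standing in for the API
FAKE_PAGES = {
    '': {'items': ['a', 'b'], 'nextPageToken': 'p2'},
    'p2': {'items': ['b', 'c'], 'nextPageToken': ''},
}

video_ids = collect_video_ids(lambda token: FAKE_PAGES[token])
print(video_ids)
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, which makes the stop conditions easy to check on their own.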
The following code provides a prompt template that you can adjust as needed, along with the key functions for interacting with the Vertex AI API on GCP.
# GCP Vertex AI API
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
model = client.models
def request_gemini(prompt, video_link):
    video_extraction_json_generation_config = GenerateContentConfig(
        temperature=0.0,
        max_output_tokens=2048,
    )
    # Send the video (by URI) together with the text prompt
    contents = [
        Part.from_uri(
            file_uri=video_link,
            mime_type="video/mp4",
        ),
        prompt
    ]
    response = model.generate_content(
        model=LANGUAGE_MODEL,
        contents=contents,
        config=video_extraction_json_generation_config
    )
    # Fall back to an empty string when the response carries no text,
    # so the caller's validation fails cleanly and triggers a retry
    try:
        return response.text or ''
    except Exception:
        return ''
def create_prompt(yt_title, yt_description, yt_link):
    return f"""### Task: You are a highly specialized marketer and YouTube expert working for the brand or company, {BRAND_NAME}.
Your boss is wondering which video to use to promote the company's advertisements and which YouTuber to work with on future advertisements. You are the expert who can give your boss the most suitable suggestions.
Analyze the video according to the criteria below and solve your boss's worries.
### Criteria: Now you review the video.
If you evaluate it using the following criteria, you will be able to receive a better evaluation.
1. Whether the video mentions brand, {BRAND_NAME}.
2. Whether the video views {BRAND_NAME} positively or negatively.
3. Whether the video would be suitable for marketing purposes.
### Context and Contents:
Your Company Information:
- Company Description: {MY_COMPANY_INFO}
- Brand: {BRAND_NAME}
Analysis subject:
- YouTube title: {yt_title}
- YouTube description: {yt_description}
- YouTube link: {yt_link}
### Answer Format:
brand_relevance_score: (Integer between 0 and 100 - If this video is more relevant to {BRAND_NAME}, it will score higher)
brand_positive_score: (Integer between 0 and 100 - If this video is positive about the {BRAND_NAME}, it will score higher)
brand_negative_score: (Integer between 0 and 100 - If this video is negative about the {BRAND_NAME}, it will score higher)
video_content_summary: (Summarize the content of the video like overview)
video_brand_summary: (Summarize the content about your brand, {BRAND_NAME})
opinion: (Why this video is suitable for promoting your company or product)
### Examples:
brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
video_brand_summary: The brand products mentioned in the video have their advantages well explained by the YouTuber.
opinion: Consumers are more likely to think positively about the advantages of the product.
### Caution:
DO NOT fabricate information.
DO NOT imagine things.
DO NOT use Markdown format.
DO Analyze each video based on the criteria mentioned above.
DO Analyze after watching the whole video.
DO write the results for summary as {GENERATION_LANGUAGE}."""
def parse_response(response: str):
    # Each field is expected on its own line in "key: value" form
    brand_relevance_score_pattern = r"brand_relevance_score:\s*(\d{1,3})"
    brand_positive_score_pattern = r"brand_positive_score:\s*(\d{1,3})"
    brand_negative_score_pattern = r"brand_negative_score:\s*(\d{1,3})"
    video_content_summary_pattern = r"video_content_summary:\s*(.*)"
    video_brand_summary_pattern = r"video_brand_summary:\s*(.*)"
    opinion_pattern = r"opinion:\s*(.*)"
    brand_relevance_score_match = re.search(brand_relevance_score_pattern, response)
    brand_relevance_score = int(brand_relevance_score_match.group(1)) if brand_relevance_score_match else 0
    brand_positive_score_match = re.search(brand_positive_score_pattern, response)
    brand_positive_score = int(brand_positive_score_match.group(1)) if brand_positive_score_match else 0
    brand_negative_score_match = re.search(brand_negative_score_pattern, response)
    brand_negative_score = int(brand_negative_score_match.group(1)) if brand_negative_score_match else 0
    video_content_summary_match = re.search(video_content_summary_pattern, response)
    video_content_summary = video_content_summary_match.group(1) if video_content_summary_match else ''
    video_brand_summary_match = re.search(video_brand_summary_pattern, response)
    video_brand_summary = video_brand_summary_match.group(1) if video_brand_summary_match else ''
    opinion_match = re.search(opinion_pattern, response)
    opinion = opinion_match.group(1) if opinion_match else ''
    return (brand_relevance_score, brand_positive_score, brand_negative_score, video_content_summary, video_brand_summary, opinion)
def request_gemini_with_retry(prompt, youtube_link='', max_retries=1):
    retries = 0
    while retries <= max_retries:
        try:
            response = request_gemini(prompt, youtube_link)
            (brand_relevance_score,
             brand_positive_score,
             brand_negative_score,
             video_content_summary,
             video_brand_summary,
             opinion) = parse_response(response)
            if (validate_score(brand_relevance_score) and
                    validate_score(brand_positive_score) and
                    validate_score(brand_negative_score) and
                    validate_summary(video_content_summary) and
                    validate_summary(video_brand_summary)):
                return (brand_relevance_score,
                        brand_positive_score,
                        brand_negative_score,
                        video_content_summary,
                        video_brand_summary,
                        opinion)
            else:
                # Raise so the except block below logs the failure and counts the retry once
                raise ValueError(
                    "The value may be incorrect, there may be a range issue, a parsing"
                    " issue, or a response issue with Gemini: score -"
                    f" {brand_relevance_score}, {brand_positive_score},"
                    f" {brand_negative_score} , summary - {video_content_summary},"
                    f" {video_brand_summary}")
        except Exception as e:
            print(f"Request failed: {e}")
            retries += 1
            if retries <= max_retries:
                print(f"retry ({retries}/{max_retries})...")
            else:
                print("Maximum number of retries exceeded")
    # Every attempt failed: return neutral placeholder values
    return 0, 0, 0, "", "", ""
def validate_score(score):
    return 0 <= score <= 100
def validate_summary(summary):
    return len(summary) > 0
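The parser depends on Gemini following the line-oriented answer format requested in the prompt. The sketch below applies the same regular-expression idea to a short sample answer to show what gets extracted and what happens when a field is missing. `parse_fields` is a hypothetical condensed variant written for this illustration, not the notebook's `parse_response`, and the sample text is made up. Note that `(.*)` captures only up to the end of the line, so a summary that spans several lines would be truncated.

```python
import re

SCORE_KEYS = ['brand_relevance_score', 'brand_positive_score', 'brand_negative_score']
TEXT_KEYS = ['video_content_summary', 'video_brand_summary', 'opinion']

def parse_fields(response):
    """Hypothetical condensed parser for the 'key: value' answer format."""
    parsed = {}
    for key in SCORE_KEYS:
        match = re.search(rf"{key}:\s*(\d{{1,3}})", response)
        # A missing score falls back to 0, as in the notebook
        parsed[key] = int(match.group(1)) if match else 0
    for key in TEXT_KEYS:
        match = re.search(rf"{key}:\s*(.*)", response)
        # A missing text field falls back to an empty string
        parsed[key] = match.group(1).strip() if match else ''
    return parsed

# Sample answer with video_brand_summary deliberately left out
sample = """brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
opinion: Consumers are more likely to think positively about the product."""

result = parse_fields(sample)
print(result['brand_positive_score'], repr(result['video_brand_summary']))
```

Because the brand summary is absent from the sample, it comes back as an empty string, which is exactly the case that `validate_summary` rejects and that causes a retry.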
This code block does three main things: it creates a DataFrame, runs the Gemini analysis, and then updates the DataFrame.
def df_youtube_videos():
    youtube_video_list = get_youtube_videos(SEARCH_QUERY, VIEWER_COUNTRY, VIDEO_TOPIC, DATE_INPUT)
    youtube_video_link_list = []
    youtube_video_title_list = []
    youtube_video_description_list = []
    youtube_video_channel_title_list = []
    youtube_video_duration_list = []
    for video_id in youtube_video_list:
        video_details = get_video_details(video_id)
        # Skip videos whose details or duration could not be retrieved
        if not video_details or video_details['duration'] is None:
            continue
        duration = video_details['duration']
        # Keep videos shorter than 50 minutes that are not from an excluded channel
        # https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
        if duration < 50*60 and video_details['channel_id'] not in EXCEPT_CHANNEL_IDS:
            youtube_video_link_list.append(f'https://www.youtube.com/watch?v={video_id}')
            youtube_video_title_list.append(video_details['title'])
            youtube_video_description_list.append(video_details['description'])
            youtube_video_channel_title_list.append(video_details['channel_title'])
            duration_new_format = f"{duration // 3600:02d}:{(duration % 3600) // 60:02d}:{duration % 60:02d}"  # HH:MM:SS
            youtube_video_duration_list.append(duration_new_format)
    # Note: the 'video_id' column holds the full video URL
    df = DataFrame({
        'video_id': youtube_video_link_list,
        'title': youtube_video_title_list,
        'description': youtube_video_description_list,
        'channel_title': youtube_video_channel_title_list,
        'length': youtube_video_duration_list
    })
    return df
def run_gemini(df):
    for index, row in df.iterrows():
        video_title = row['title']
        video_description = row['description']
        video_link = row['video_id']
        prompt = create_prompt(video_title, video_description, video_link)
        (brand_relevance_score,
         brand_positive_score,
         brand_negative_score,
         video_content_summary,
         video_brand_summary,
         opinion) = request_gemini_with_retry(prompt, video_link)
        # Write the analysis results back into the row
        df.at[index, 'gemini_brand_relevance_score'] = brand_relevance_score
        df.at[index, 'gemini_brand_positive_score'] = brand_positive_score
        df.at[index, 'gemini_brand_negative_score'] = brand_negative_score
        df.at[index, 'gemini_video_content_summary'] = video_content_summary
        df.at[index, 'gemini_video_brand_summary'] = video_brand_summary
        df.at[index, 'gemini_opinion'] = opinion
        # Pause between requests to stay within quota
        # https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
        time.sleep(1)
        print(f"Processing: {index}/{len(df)}")
        print(f"video_title: {video_title}")
    return df
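The `length` column is built by converting a total number of seconds into HH:MM:SS. As a small sketch of the same arithmetic, the hypothetical helper `format_duration` below uses `divmod` to split the value into hours, minutes, and seconds. The helper name is an assumption for illustration; the notebook does the conversion inline.

```python
def format_duration(total_seconds):
    """Format a non-negative whole number of seconds as HH:MM:SS."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# 3723 seconds is 1 hour, 2 minutes, and 3 seconds
formatted = format_duration(3723)
print(formatted)
```

Pulling the formatting into a function like this would also make the 50-minute cutoff easy to express in the same unit, since both work on seconds.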
This block runs all of the code written so far. It retrieves data from YouTube, analyzes it with Gemini, and finally builds a DataFrame.
# main
df = df_youtube_videos()
run_gemini(df)
df['gemini_brand_positive_score'] = df[ 'gemini_brand_positive_score' ].astype('int64')
df['gemini_brand_relevance_score'] = df[ 'gemini_brand_relevance_score' ].astype('int64')
df['gemini_brand_negative_score'] = df[ 'gemini_brand_negative_score' ].astype('int64')
df = df.sort_values( 'gemini_brand_positive_score', ascending=False )
df
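The main block ranks videos by the positive score alone, so videos with equal positive scores appear in no particular order. As a sketch in plain Python, assuming you would like ties broken by the relevance score, the example below sorts on both values. The rows are made-up data, and the tie-breaking rule is an assumption, not part of the original notebook.

```python
# Made-up rows standing in for the analyzed DataFrame
rows = [
    {'title': 'Video A', 'gemini_brand_positive_score': 40, 'gemini_brand_relevance_score': 90},
    {'title': 'Video B', 'gemini_brand_positive_score': 80, 'gemini_brand_relevance_score': 60},
    {'title': 'Video C', 'gemini_brand_positive_score': 80, 'gemini_brand_relevance_score': 95},
]

def rank_rows(rows):
    """Sort by positive score, then by relevance score, both descending."""
    return sorted(
        rows,
        key=lambda row: (row['gemini_brand_positive_score'],
                         row['gemini_brand_relevance_score']),
        reverse=True,
    )

ranked_titles = [row['title'] for row in rank_rows(rows)]
print(ranked_titles)
```

With pandas, the same ordering can be requested by passing a list of column names to `sort_values`.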
The final step is to create a spreadsheet from the DataFrame. Use the URL printed at the end to check the result.
import gspread
from google.auth import default
today_date = datetime.now().strftime('%Y-%m-%d')
my_spreadsheet_title = f"Partner's Video Finder, {BRAND_NAME}, {SEARCH_QUERY}, {VIEWER_COUNTRY} ({DATE_INPUT}~{today_date})"
creds, _ = default()
gc = gspread.authorize(creds)
sh = gc.create(my_spreadsheet_title)
# Use the spreadsheet that was just created; opening by title could match an older file with the same name
worksheet = sh.sheet1
cell_list = df.values.tolist()
worksheet.update([df.columns.values.tolist()] + cell_list)
print("URL: ", sh.url)
3. Reference
I consulted the following when writing the code. If you need to modify the code or want more detail on its usage, see the following link.