Analyzing YouTube Videos for Marketing with Gemini


About this codelab

Last updated: Apr 3, 2025

Written by: Jisub Lee, Kyungjune Shin


1. Introduction

Last updated: March 12, 2025

Disclaimer

This is sample code that analyzes videos using the YouTube Data API and Gemini. Users are responsible for how they use it. Review this code carefully before using it in a production environment. The author is not responsible for any problems arising from the use of this code. In addition, because of the nature of AI, the results may differ from the actual facts. Do not trust the results blindly; review them carefully.

Goal of this project

The main goal is to identify YouTube videos and YouTubers that are a good fit for brand promotion by analyzing video content and sentiment.

Overview

The project uses the YouTube Data API to retrieve video information and the GCP Vertex AI API with the Gemini model to analyze video content. It runs on Google Colab.

You can paste each of the following code cells into Colab and run them one at a time.

What you'll learn

How to use the YouTube Data API to retrieve video information.
How to use the GCP Vertex AI API with the Gemini model to analyze video content.
How to use Google Colab to run the code.
How to create a spreadsheet from the analyzed data.

What you'll need

To implement this solution, you will need the following:

A Google Cloud Platform project.
Enable the YouTube Data API v3, Vertex AI API, Generative Language API, Google Drive API, and Google Sheets API in the project.
Create an API key on the Credentials tab and authorize it for the YouTube Data API v3.

This solution uses the YouTube Data API and the GCP Vertex AI API.

2. Code and explanation

First, import the libraries we will use. Then sign in with your Google account and grant permission to access your Google Drive.

# library
# colab
import ipywidgets as widgets
from IPython.display import display
from google.colab import auth

# cloud
from google import genai
from google.genai.types import Part, GenerateContentConfig

# function, util
import requests, os, re, time
from pandas import DataFrame
from datetime import datetime, timedelta

auth.authenticate_user()

[Action required]

The API KEY and PROJECT ID from GCP are the values you will usually need to change. The cell below holds the GCP settings.

# GCP Setting
LANGUAGE_MODEL = 'gemini-1.5-pro' # @param {type:"string"}
API_KEY = 'Please write your API_KEY' # @param {type:"string"}
PROJECT_ID = 'Please write your GCP_ID' # @param {type:"string"}
LOCATION = 'us-central1' # @param {type:"string"}
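Because the placeholder strings above are easy to overlook, a small guard cell can stop the notebook before any API call is made. The sketch below is not part of the original codelab: `check_gcp_settings` and `PLACEHOLDERS` are hypothetical names, and the check only confirms that values look set, not that the key or project is actually valid.

```python
# Hypothetical helper (not from the codelab): fail fast if the placeholder
# values in the GCP settings cell were left unchanged.
PLACEHOLDERS = {'Please write your API_KEY', 'Please write your GCP_ID', ''}

def check_gcp_settings(api_key: str, project_id: str, location: str) -> None:
    """Raise ValueError naming every setting that still looks unset."""
    problems = []
    if api_key.strip() in PLACEHOLDERS:
        problems.append('API_KEY')
    if project_id.strip() in PLACEHOLDERS:
        problems.append('PROJECT_ID')
    if not location.strip():
        problems.append('LOCATION')
    if problems:
        raise ValueError(f"Please set: {', '.join(problems)}")

# In the notebook this would run right after the settings cell:
# check_gcp_settings(API_KEY, PROJECT_ID, LOCATION)
```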

[Action required]

Review the code under "# Input" below and change the variable values as needed.

Using the Google brand as an example, this codelab shows how to search YouTube for videos on a specific topic (for example, "Google AI") while excluding videos from the brand's own channels.

Input variables for YouTube video analysis

BRAND_NAME (required): the name of the brand to analyze (for example, Google).
MY_COMPANY_INFO (required): a short description of the brand and its context.
SEARCH_QUERY (required): the search query for YouTube videos (for example, Google AI).
VIEWER_COUNTRY: the viewer's country code, as a two-letter ISO 3166-1 alpha-2 code (for example, KR).
GENERATION_LANGUAGE (required): the language Gemini should write its results in (for example, Korean).
EXCEPT_CHANNEL_IDS: comma-separated IDs of the channels to exclude.

You can find a channel's ID on its YouTube channel page.


VIDEO_TOPIC: a YouTube topic ID used to narrow the results.

You can find topic ID values in the Search: list | YouTube Data API | Google for Developers documentation. For example, /m/07c1v, used in the code below, is the Technology topic.

DATE_INPUT (required): the earliest publication date of the videos to include (YYYY-MM-DD).

# Input
BRAND_NAME = "Google" # @param {type:"string"}
MY_COMPANY_INFO = "Google is a multinational technology company specializing in internet-related services and products." # @param {type:"string"}
SEARCH_QUERY = 'Google AI' # @param {type:"string"}
VIEWER_COUNTRY = 'KR' # @param {type:"string"}
GENERATION_LANGUAGE = 'Korean' # @param {type:"string"}
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw' # @param {type:"string"}
VIDEO_TOPIC = '/m/07c1v' # @param {type: "string"}
DATE_INPUT = '2025-01-01' # @param {type:"date"}

# Auth Scope
SCOPE = [
    'https://www.googleapis.com/auth/youtube.readonly',
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/cloud-platform'
]

# validation check
if not SEARCH_QUERY or not DATE_INPUT:
  raise ValueError("Search query and date input are required.")

# Split on commas and drop blank entries (an empty input yields an empty list)
EXCEPT_CHANNEL_IDS = [cid.strip() for cid in EXCEPT_CHANNEL_IDS.split(',') if cid.strip()]
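The input cell only checks that SEARCH_QUERY and DATE_INPUT are non-empty. A stricter check, sketched here under the assumption that DATE_INPUT uses the YYYY-MM-DD form described above, can catch malformed values before they reach the YouTube API. `parse_channel_ids` and `validate_inputs` are hypothetical helpers; the country check only verifies that the code has two letters, not that it is an assigned ISO 3166-1 code.

```python
import re
from datetime import date

def parse_channel_ids(raw: str) -> list[str]:
    """Split a comma-separated string into channel IDs, dropping blank entries."""
    return [cid.strip() for cid in raw.split(',') if cid.strip()]

def validate_inputs(viewer_country: str, date_input: str) -> date:
    """Check VIEWER_COUNTRY's shape and parse DATE_INPUT; raise ValueError on bad input."""
    # Only the shape is checked (two letters), not whether the code is assigned.
    if viewer_country and not re.fullmatch(r'[A-Za-z]{2}', viewer_country):
        raise ValueError(f"VIEWER_COUNTRY must be two letters, got {viewer_country!r}")
    # Require the literal YYYY-MM-DD form, because the search code appends
    # "T00:00:00Z" to DATE_INPUT as-is.
    if not re.fullmatch(r'\d{4}-\d{2}-\d{2}', date_input):
        raise ValueError(f"DATE_INPUT must be YYYY-MM-DD, got {date_input!r}")
    start = date.fromisoformat(date_input)  # raises ValueError for e.g. month 13
    if start > date.today():
        raise ValueError("DATE_INPUT must not be in the future")
    return start

# Example with the values used in this codelab:
print(parse_channel_ids('UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw'))
print(validate_inputs('KR', '2025-01-01'))
```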

The following cell defines the key functions for working with the YouTube Data API.

# YouTube API function

def get_youtube_videos(q, viewer_country_code, topic_str, start_period):

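    # Fetch up to 8 pages of 50 results, stopping once more than 75 unique
    # video IDs are collected or the API returns no further page token.
    page_token_number = 1
    next_page_token = ''
    merged_array = []

    published_after_date = f"{start_period}T00:00:00Z"

    while page_token_number < 9 and len(merged_array) <= 75:
        result = search_youtube(q, topic_str, published_after_date, viewer_country_code, '', next_page_token, 50)
        if not result:
            break
        # Deduplicate while keeping the order returned by the API
        merged_array = list(dict.fromkeys(merged_array + result['items']))
        next_page_token = result['nextPageToken']
        page_token_number += 1
        if not next_page_token:
            break  # no more result pages

    return merged_array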

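def search_youtube(query, topic_id, published_after, region_code, relevance_language, next_page_token, max_results=50):

    if not query:
        return None

    # Passing parameters via `params` lets requests URL-encode them
    # (e.g. spaces or "&" in SEARCH_QUERY). relevance_language is currently unused.
    params = {
        'key': API_KEY,
        'part': 'snippet',
        'q': query,
        'publishedAfter': published_after,
        'regionCode': region_code,
        'type': 'video',
        'topicId': topic_id,
        'maxResults': max_results,
        'pageToken': next_page_token,
        'gl': region_code.lower(),
    }
    response = requests.get('https://www.googleapis.com/youtube/v3/search', params=params)
    data = response.json()
    if 'error' in data:
        print(f"YouTube API error: {data['error'].get('message', data['error'])}")
    results = data.get('items', [])
    next_page_token = data.get('nextPageToken', '')
    return_results = [item['id']['videoId'] for item in results]

    # Log progress without printing the request URL, which would expose API_KEY
    print(f"search: q={query!r}, items={len(return_results)}, next_page_token={next_page_token!r}")

    return {
        "nextPageToken": next_page_token,
        "items": return_results
    }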

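def get_date_string(days_ago):
    # Helper (not used by the main flow): RFC 3339 timestamp for midnight
    # `days_ago` days before today.
    date = datetime.now() - timedelta(days=days_ago)
    return date.strftime('%Y-%m-%dT00:00:00Z')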

def get_video_details(video_id):

    url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id={video_id}&key={API_KEY}'
    response = requests.get(url)
    data = response.json()

    if data.get('items'):
        video = data['items'][0]
        snippet = video['snippet']
        content_details = video['contentDetails']

        title = snippet.get('title', 'no title')
        description = snippet.get('description', 'no description')
        duration_iso = content_details.get('duration', None)
        channel_id = snippet.get('channelId', 'no channel id')
        channel_title = snippet.get('channelTitle', 'no channel title')
        return {'title': title, 'description': description, 'duration': duration_to_seconds(duration_iso), 'channel_id': channel_id, 'channel_title': channel_title}
    else:
        return None

def duration_to_seconds(duration_str):
  # Convert an ISO 8601 duration such as "PT1H2M3S" or "P1DT2H" to seconds.
  # Live broadcasts report "P0D", which yields 0; unparseable input yields None.
  if not duration_str:
    return None
  match = re.fullmatch(r'P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?', duration_str)
  if not match:
    return None

  days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
  return days * 86400 + hours * 3600 + minutes * 60 + seconds
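get_youtube_videos follows the `nextPageToken` returned by each search.list call until it has collected enough IDs or runs out of pages. The same pattern, isolated from the network, is sketched below so it can be tried offline. `collect_video_ids` is a hypothetical name, and the stub `FAKE_PAGES` stands in for `search_youtube`; the limits mirror the loop above (at most 8 pages, stopping once more than 75 unique IDs are collected).

```python
from typing import Callable, Dict, List

def collect_video_ids(fetch_page: Callable[[str], Dict],
                      max_pages: int = 8, max_items: int = 75) -> List[str]:
    """Follow nextPageToken until pages run out or a limit is reached.

    fetch_page(page_token) must return {"items": [...], "nextPageToken": str};
    an empty token means there are no further pages.
    """
    collected: List[str] = []
    token = ''
    for _ in range(max_pages):
        page = fetch_page(token)
        # dict.fromkeys deduplicates while keeping first-seen order
        collected = list(dict.fromkeys(collected + page['items']))
        token = page.get('nextPageToken', '')
        if not token or len(collected) > max_items:
            break
    return collected

# Offline stub standing in for search_youtube: three pages, then no token.
FAKE_PAGES = {
    '': {'items': ['a', 'b'], 'nextPageToken': 'p2'},
    'p2': {'items': ['b', 'c'], 'nextPageToken': 'p3'},
    'p3': {'items': ['d'], 'nextPageToken': ''},
}

print(collect_video_ids(lambda token: FAKE_PAGES[token]))  # ['a', 'b', 'c', 'd']
```

In the notebook, the stub could be swapped for a call such as `lambda token: search_youtube(SEARCH_QUERY, VIDEO_TOPIC, f"{DATE_INPUT}T00:00:00Z", VIEWER_COUNTRY, '', token, 50)`.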

The next cell contains a prompt template, which you can customize as needed, together with the key functions for working with the GCP Vertex AI API.

# GCP Vertex AI API

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
model = client.models

def request_gemini(prompt, video_link):
  video_extraction_json_generation_config = GenerateContentConfig(
    temperature=0.0,
    max_output_tokens=2048,
  )

  contents = [
      Part.from_uri(
          file_uri=video_link,
          mime_type="video/mp4",
      ),
      prompt
  ]

  response = model.generate_content(
      model=LANGUAGE_MODEL,
      contents=contents,
      config=video_extraction_json_generation_config
  )

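  # response.text can be None (e.g. blocked or empty output); return '' so
  # that parse_response always receives a string.
  try:
    return response.text or ''
  except ValueError:
    return ''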

def create_prompt(yt_title, yt_description, yt_link):
  return f"""### Task: You are a highly specialized marketer and YouTube expert working for the brand or company, {BRAND_NAME}.
Your boss is wondering which video to use to promote the company's advertisements and which YouTuber to partner with for future advertisements. You are the expert who can give your boss the most suitable suggestions.
Analyze the video according to the criteria below and solve your boss's worries.

### Criteria: Now you review the video.
If you evaluate it using the following criteria, you will be able to receive a better evaluation.

1. Whether the video mentions brand, {BRAND_NAME}.
2. Whether the video views {BRAND_NAME} positively or negatively.
3. Whether the video would be suitable for marketing purposes.

### Context and Contents:
Your Company Information:
- Company Description: {MY_COMPANY_INFO}
- Brand: {BRAND_NAME}

Analysis subject:
- YouTube title: {yt_title}
- YouTube description: {yt_description}
- YouTube link: {yt_link}

### Answer Format:
brand_relevance_score: (Integer between 0 and 100 - the more relevant this video is to {BRAND_NAME}, the higher the score)
brand_positive_score: (Integer between 0 and 100 - If this video is positive about the {BRAND_NAME}, it will score higher)
brand_negative_score: (Integer between 0 and 100 - If this video is negative about the {BRAND_NAME}, it will score higher)
video_content_summary: (Summarize the content of the video like overview)
video_brand_summary: (Summarize the content about your brand, {BRAND_NAME})
opinion: (Why this video is suitable for promoting your company or product)

### Examples:
brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
video_brand_summary: The brand products mentioned in the video have their advantages well explained by the YouTuber.
opinion: Consumers are more likely to think positively about the advantages of the product.

### Caution:
DO NOT fabricate information.
DO NOT imagine things.
DO NOT use Markdown formatting.
DO Analyze each video based on the criteria mentioned above.
DO Analyze after watching the whole video.
DO write the summary results in {GENERATION_LANGUAGE}."""

def parse_response(response: str):
  brand_relevance_score_pattern = r"brand_relevance_score:\s*(\d{1,3})"
  brand_positive_score_pattern = r"brand_positive_score:\s*(\d{1,3})"
  brand_negative_score_pattern = r"brand_negative_score:\s*(\d{1,3})"
  video_content_summary_pattern = r"video_content_summary:\s*(.*)"
  video_brand_summary_pattern = r"video_brand_summary:\s*(.*)"
  opinion_pattern = r"opinion:\s*(.*)"
  brand_relevance_score_match = re.search( brand_relevance_score_pattern, response )
  brand_relevance_score = ( int(brand_relevance_score_match.group(1)) if brand_relevance_score_match else 0 )
  brand_positive_score_match = re.search( brand_positive_score_pattern, response )
  brand_positive_score = ( int(brand_positive_score_match.group(1)) if brand_positive_score_match else 0 )
  brand_negative_score_match = re.search( brand_negative_score_pattern, response )
  brand_negative_score = ( int(brand_negative_score_match.group(1)) if brand_negative_score_match else 0 )
  video_content_score_match = re.search( video_content_summary_pattern, response )
  video_content_summary = ( video_content_score_match.group(1) if video_content_score_match else '' )
  video_brand_summary_match = re.search( video_brand_summary_pattern, response )
  video_brand_summary = ( video_brand_summary_match.group(1) if video_brand_summary_match else '' )
  opinion_match = re.search( opinion_pattern, response )
  opinion = ( opinion_match.group(1) if opinion_match else '' )
  return ( brand_relevance_score, brand_positive_score, brand_negative_score, video_content_summary, video_brand_summary, opinion)

def request_gemini_with_retry(prompt, youtube_link='', max_retries=1):
  # Make up to (max_retries + 1) attempts; if all fail, return zero scores and empty text.
  retries = 0
  while retries <= max_retries:
    try:
      response = request_gemini(prompt, youtube_link)
      ( brand_relevance_score,
        brand_positive_score,
        brand_negative_score,
        video_content_summary,
        video_brand_summary,
        opinion) = parse_response(response)
      if ( validate_score(brand_relevance_score) and
           validate_score(brand_positive_score) and
           validate_score(brand_negative_score) and
           validate_summary(video_content_summary) and
           validate_summary(video_brand_summary) ):

        return ( brand_relevance_score,
                 brand_positive_score,
                 brand_negative_score,
                 video_content_summary,
                 video_brand_summary,
                 opinion
              )
      # Raise (not just construct) the error so the except block below
      # counts this attempt and retries.
      raise ValueError(
          "The value may be incorrect, there may be a range issue, a parsing"
          " issue, or a response issue with Gemini: score -"
          f" {brand_relevance_score}, {brand_positive_score},"
          f" {brand_negative_score} , summary - {video_content_summary},"
          f" {video_brand_summary}" )

    except Exception as e:
      print(f"Request failed: {e}")
      retries += 1
      if retries <= max_retries:
        print(f"retry ({retries}/{max_retries})...")
      else:
        print("Maximum number of retries exceeded")
  return 0, 0, 0, "", "", ""

def validate_score(score):
  return score >= 0 and score <= 100

def validate_summary(summary):
  return len(summary) > 0
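parse_response returns 0 when a score is missing, and 0 passes validate_score, so a response that omits a score but includes both summaries is accepted as valid. The dict-based variant below keeps "missing" distinct from "zero". It is an alternative sketch, not the codelab's method; `parse_to_dict` and `is_valid` are hypothetical names, and the field names are taken from the Answer Format section of create_prompt.

```python
import re

# Field names match the "Answer Format" section of create_prompt.
SCORE_FIELDS = ('brand_relevance_score', 'brand_positive_score', 'brand_negative_score')
TEXT_FIELDS = ('video_content_summary', 'video_brand_summary', 'opinion')

def parse_to_dict(response: str) -> dict:
    """Extract the six answer fields; a missing score becomes None, missing text ''."""
    result = {}
    for field in SCORE_FIELDS:
        m = re.search(rf'{field}:\s*(\d{{1,3}})', response)
        result[field] = int(m.group(1)) if m else None
    for field in TEXT_FIELDS:
        m = re.search(rf'{field}:\s*(.*)', response)
        result[field] = m.group(1).strip() if m else ''
    return result

def is_valid(parsed: dict) -> bool:
    """Same range and non-empty checks as the codelab, but a missing score fails."""
    scores_ok = all(parsed[f] is not None and 0 <= parsed[f] <= 100 for f in SCORE_FIELDS)
    return scores_ok and bool(parsed['video_content_summary']) and bool(parsed['video_brand_summary'])

sample = """brand_relevance_score: 90
brand_positive_score: 75
brand_negative_score: 5
video_content_summary: A review of new AI features.
video_brand_summary: The host praises the brand's assistant.
opinion: Good fit for a product launch campaign."""

parsed = parse_to_dict(sample)
print(parsed['brand_positive_score'], is_valid(parsed))  # 75 True
```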

This code block handles three main tasks: building a DataFrame of candidate videos, running the Gemini analysis on each video, and writing the results back into the DataFrame.

def df_youtube_videos():
  youtube_video_list = get_youtube_videos(SEARCH_QUERY, VIEWER_COUNTRY, VIDEO_TOPIC, DATE_INPUT)
  youtube_video_link_list = []
  youtube_video_title_list = []
  youtube_video_description_list = []
  youtube_video_channel_title_list = []
  youtube_video_duration_list = []

  for video_id in youtube_video_list:
    video_details = get_video_details(video_id)
    # Skip videos whose details could not be fetched, and live broadcasts,
    # whose duration is typically reported as zero ("P0D").
    if not video_details or not video_details['duration']:
      continue
    # https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
    # Keep videos shorter than 50 minutes that are not from an excluded channel.
    if video_details['duration'] < 50*60 and video_details['channel_id'] not in EXCEPT_CHANNEL_IDS:
      duration = video_details['duration']
      youtube_video_link_list.append(f'https://www.youtube.com/watch?v={video_id}')
      youtube_video_title_list.append(video_details['title'])
      youtube_video_description_list.append(video_details['description'])
      youtube_video_channel_title_list.append(video_details['channel_title'])
      duration_new_format = f"{duration // 3600:02d}:{(duration % 3600) // 60:02d}:{duration % 60:02d}" # HH:MM:SS
      youtube_video_duration_list.append(duration_new_format)

  df = DataFrame({
      'video_id': youtube_video_link_list,
      'title': youtube_video_title_list,
      'description': youtube_video_description_list,
      'channel_title': youtube_video_channel_title_list,
      'length': youtube_video_duration_list
  })
  return df

def run_gemini(df):
  for index, row in df.iterrows():
    video_title = row['title']
    video_description = row['description']
    video_link = row['video_id']
    prompt = create_prompt(video_title, video_description, video_link)
    ( brand_relevance_score,
      brand_positive_score,
      brand_negative_score,
      video_content_summary,
      video_brand_summary,
      opinion) = request_gemini_with_retry(prompt, video_link)
    df.at[index, 'gemini_brand_relevance_score'] = brand_relevance_score
    df.at[index, 'gemini_brand_positive_score'] = brand_positive_score
    df.at[index, 'gemini_brand_negative_score'] = brand_negative_score
    df.at[index, 'gemini_video_content_summary'] = video_content_summary
    df.at[index, 'gemini_video_brand_summary'] = video_brand_summary
    df.at[index, 'gemini_opinion'] = opinion
    # https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
    time.sleep(1)
    print(f"Processing: {index}/{len(df)}")
    print(f"video_title: {video_title}")
  return df
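run_gemini spaces requests with a fixed `time.sleep(1)` to stay within Vertex AI quotas, and request_gemini_with_retry retries immediately. If quota errors still occur, exponential backoff with jitter is a common alternative. The helper below is a generic sketch, not part of the codelab or the google-genai SDK; `call_with_backoff` is a hypothetical name, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar('T')

def call_with_backoff(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); after each failure wait base_delay * 2**attempt plus up to
    base_delay of random jitter, then retry. Re-raises the last error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:  # a real notebook might retry only quota (429) errors
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

For example, `call_with_backoff(lambda: request_gemini(prompt, video_link))` would wrap a single Gemini call. The jitter spreads retries out so that several failing requests do not all retry at the same moment.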

This code block runs everything defined so far: it fetches candidate videos from YouTube into a DataFrame, analyzes each one with Gemini, converts the score columns to integers, and sorts the rows by gemini_brand_positive_score in descending order.

# main
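df = df_youtube_videos()
# Stop early if nothing matched; the score columns would not exist otherwise
if df.empty:
  raise ValueError("No videos matched the search criteria. Adjust the inputs and run again.")
df = run_gemini(df)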
df['gemini_brand_positive_score'] = df[ 'gemini_brand_positive_score' ].astype('int64')
df['gemini_brand_relevance_score'] = df[ 'gemini_brand_relevance_score' ].astype('int64')
df['gemini_brand_negative_score'] = df[ 'gemini_brand_negative_score' ].astype('int64')
df = df.sort_values( 'gemini_brand_positive_score', ascending=False )

df
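After sorting, you may want a shorter list of partner candidates. The sketch below filters by relevance and negativity and then ranks by positive score; `shortlist` and its thresholds (70, 20, top 10) are illustrative assumptions, not values from the codelab. It works on plain dicts, so in the notebook it could be called as `shortlist(df.to_dict('records'))`.

```python
from typing import Dict, List

def shortlist(rows: List[Dict], min_relevance: int = 70,
              max_negative: int = 20, top_n: int = 10) -> List[Dict]:
    """Keep relevant, mostly non-negative videos, ranked by positive score
    and then by relevance (both descending)."""
    candidates = [
        r for r in rows
        if r['gemini_brand_relevance_score'] >= min_relevance
        and r['gemini_brand_negative_score'] <= max_negative
    ]
    candidates.sort(key=lambda r: (r['gemini_brand_positive_score'],
                                   r['gemini_brand_relevance_score']),
                    reverse=True)
    return candidates[:top_n]
```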

The final step creates a Google Sheets spreadsheet from the DataFrame. Open the printed URL to check the results.

import gspread
from google.auth import default

today_date = datetime.now().strftime('%Y-%m-%d')
my_spreadsheet_title = f"Partner's Video Finder, {BRAND_NAME}, {SEARCH_QUERY}, {VIEWER_COUNTRY} ({DATE_INPUT}~{today_date})"

creds, _ = default()
gc = gspread.authorize(creds)
sh = gc.create(my_spreadsheet_title)
# Use the newly created spreadsheet directly; reopening it by title could
# pick up an older spreadsheet with the same name.
worksheet = sh.sheet1
cell_list = df.values.tolist()
worksheet.update([df.columns.values.tolist()] + cell_list)

print("URL: ", sh.url)

3. References

The code was written with reference to the following resources. If you need to modify the code or want more details on usage, follow the links below.
