Analyzing YouTube videos for marketing with Gemini


About this codelab

Last updated: Apr 3, 2025

Authors: Jisub Lee, Kyungjune Shin

1. Introduction

Last updated: March 12, 2025

Disclaimer

This is sample code that analyzes videos using the YouTube Data API and Gemini. You are responsible for how you use it. Any code used in a real environment should be reviewed carefully first. The author is not responsible for any problems resulting from the use of this code. In addition, because of the nature of AI, the results may differ from the actual facts, so do not trust them blindly; check them carefully.

Goal of this project

The main goal is to identify YouTube videos and channels that are suitable for promoting a brand, by analyzing each video's content and sentiment.

Overview

The project uses the YouTube Data API to retrieve video information, and the Vertex AI API on Google Cloud Platform with the Gemini model to analyze video content. It runs in Colab.

You can paste each of the code cells that follow into Colab and run them one at a time.

What you'll learn

How to retrieve video information with the YouTube Data API.
How to use the Vertex AI API on GCP with the Gemini model to analyze video content.
How to run the code in Google Colab.
How to create a spreadsheet from the analyzed data.

What you'll need

To deploy this solution, you need:

A Google Cloud Platform project.
The YouTube Data API v3, Vertex AI API, Generative Language API, Google Drive API, and Google Sheets API enabled in the project.
An API key, created on the Credentials tab, that is authorized for the YouTube Data API v3.

This solution uses the YouTube Data API and Vertex AI on GCP.

2. Code and explanation

First, import the libraries the code needs. Then sign in to your Google account and grant the requested permissions.

# library
# colab
import ipywidgets as widgets
from IPython.display import display
from google.colab import auth

# cloud
from google import genai
from google.genai.types import Part, GenerateContentConfig

# function, util
import requests, os, re, time
from pandas import DataFrame
from datetime import datetime, timedelta

auth.authenticate_user()

[Action required]

The values you will usually need to change are API_KEY and PROJECT_ID from GCP. The cell below contains the GCP settings.

# GCP Setting
LANGUAGE_MODEL = 'gemini-1.5-pro' # @param {type:"string"}
API_KEY = 'Please write your API_KEY' # @param {type:"string"}
PROJECT_ID = 'Please write your GCP_ID' # @param {type:"string"}
LOCATION = 'us-central1' # @param {type:"string"}
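Because the two placeholder strings above are easy to forget, a small guard can stop the notebook before any API call is made. The helper `check_gcp_settings` below is hypothetical (it is not part of the codelab); it only checks that each setting is non-empty and that the placeholder text was replaced.

```python
# Hypothetical guard, not part of the original codelab: fail fast if the
# GCP settings cell still contains empty values or its placeholder text.
PLACEHOLDERS = {'Please write your API_KEY', 'Please write your GCP_ID'}

def check_gcp_settings(api_key: str, project_id: str, location: str, model: str) -> None:
    """Raise ValueError if any setting is empty or still a placeholder."""
    settings = {
        'API_KEY': api_key,
        'PROJECT_ID': project_id,
        'LOCATION': location,
        'LANGUAGE_MODEL': model,
    }
    for name, value in settings.items():
        if not value or not value.strip():
            raise ValueError(f"{name} is empty.")
        if value.strip() in PLACEHOLDERS:
            raise ValueError(f"{name} still contains the placeholder text.")

# Usage in the notebook, right after the settings cell:
# check_gcp_settings(API_KEY, PROJECT_ID, LOCATION, LANGUAGE_MODEL)
```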

[Action required]

As you go through the code below, change the variable values in the "Input" cell.

Using the brand "Google" as an example, this section shows how to search YouTube for videos on a specific topic (e.g. "Google AI") while excluding videos from the brand's own channels.

Input variables for the YouTube video analysis

BRAND_NAME (required): the name of the brand to analyze (e.g. Google).
MY_COMPANY_INFO (required): a short description of the brand and its context.
SEARCH_QUERY (required): the search term for YouTube videos (e.g. Google AI).
VIEWER_COUNTRY: the viewer's country, as a two-letter ISO 3166-1 alpha-2 code (e.g. KR).
GENERATION_LANGUAGE (required): the language of Gemini's output (e.g. Korean).
EXCEPT_CHANNEL_IDS: comma-separated IDs of channels to exclude.

You can find a channel's ID on its YouTube channel page.

VIDEO_TOPIC: a YouTube topic ID.

You can find topic ID values in Search: list | YouTube Data API | Google for Developers.

DATE_INPUT (required): the earliest publication date of the videos to include (YYYY-MM-DD).

# Input
BRAND_NAME = "Google" # @param {type:"string"}
MY_COMPANY_INFO = "Google is a multinational technology company specializing in internet-related services and products." # @param {type:"string"}
SEARCH_QUERY = 'Google AI' # @param {type:"string"}
VIEWER_COUNTRY = 'KR' # @param {type:"string"}
GENERATION_LANGUAGE = 'Korean' # @param {type:"string"}
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw' # @param {type:"string"}
VIDEO_TOPIC = '/m/07c1v' # @param {type: "string"}
DATE_INPUT = '2025-01-01' # @param {type:"date"}

# Auth Scope
SCOPE = [
    'https://www.googleapis.com/auth/youtube.readonly',
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/cloud-platform'
]

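# validation check: the inputs marked (required) must not be empty
if not all([BRAND_NAME, MY_COMPANY_INFO, SEARCH_QUERY, GENERATION_LANGUAGE, DATE_INPUT]):
  raise ValueError("BRAND_NAME, MY_COMPANY_INFO, SEARCH_QUERY, GENERATION_LANGUAGE and DATE_INPUT are required.")

# split the comma-separated string into a list of channel IDs, skipping empty entries
EXCEPT_CHANNEL_IDS = [channel_id.strip() for channel_id in EXCEPT_CHANNEL_IDS.split(',') if channel_id.strip()]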
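The YouTube Data API expects `publishedAfter` as an RFC 3339 timestamp, and the codelab builds it by appending `T00:00:00Z` to `DATE_INPUT`. A stricter sketch, assuming the same YYYY-MM-DD input, rejects malformed dates before any request is sent; the function name `to_published_after` is hypothetical.

```python
from datetime import datetime

def to_published_after(date_input: str) -> str:
    """Validate a YYYY-MM-DD date and return an RFC 3339 timestamp at 00:00 UTC.

    Hypothetical helper; the codelab itself only appends 'T00:00:00Z' to the string.
    """
    # strptime raises ValueError for inputs such as '2025-13-01' or '01/01/2025'
    parsed = datetime.strptime(date_input.strip(), '%Y-%m-%d')
    return parsed.strftime('%Y-%m-%dT00:00:00Z')

# Possible use inside get_youtube_videos, in place of the f-string:
# published_after_date = to_published_after(start_period)
```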

The next cell contains the main functions for working with the YouTube Data API.

# YouTube API function

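def get_youtube_videos(q, viewer_country_code, topic_str, start_period):
    # Fetches up to 8 pages of 50 results and stops early once more than
    # 75 unique video IDs are collected or there is no next page.
    page_token_number = 1
    next_page_token = ''
    merged_array = []

    published_after_date = f"{start_period}T00:00:00Z"

    while page_token_number < 9 and len(merged_array) <= 75:
        result = search_youtube(q, topic_str, published_after_date, viewer_country_code, '', next_page_token, 50)
        if not result:
            break
        # remove duplicate IDs while keeping the order in which they were found
        merged_array = list(dict.fromkeys(merged_array + result['items']))
        next_page_token = result['nextPageToken']
        page_token_number += 1
        if not next_page_token:
            break  # last page reached; an empty token would restart from page 1

    return merged_array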

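def search_youtube(query, topic_id, published_after, region_code, relevance_language, next_page_token, max_results=50):

    if not query:
        return None

    url = 'https://www.googleapis.com/youtube/v3/search'
    # Passing the parameters as a dict lets requests URL-encode the query text;
    # empty values (no topic ID, no page token yet) are left out of the request.
    params = {
        'key': API_KEY,
        'part': 'snippet',
        'q': query,
        'type': 'video',
        'publishedAfter': published_after,
        'regionCode': region_code,
        'relevanceLanguage': relevance_language,
        'topicId': topic_id,
        'maxResults': max_results,
        'pageToken': next_page_token,
    }
    params = {name: value for name, value in params.items() if value not in ('', None)}

    response = requests.get(url, params=params)
    data = response.json()
    if 'error' in data:
        raise RuntimeError(f"YouTube Data API error: {data['error'].get('message')}")
    results = data.get('items', [])
    next_page_token = data.get('nextPageToken', '')
    return_results = [item['id']['videoId'] for item in results]

    return {
        "nextPageToken": next_page_token,
        "items": return_results
    }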

def get_date_string(days_ago):
    # RFC 3339 timestamp for midnight `days_ago` days before today (not used by the main flow)
    date = datetime.now() - timedelta(days=days_ago)
    return date.strftime('%Y-%m-%dT00:00:00Z')

def get_video_details(video_id):

    url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id={video_id}&key={API_KEY}'
    response = requests.get(url)
    data = response.json()

    if data.get('items'):
        video = data['items'][0]
        snippet = video['snippet']
        content_details = video['contentDetails']

        title = snippet.get('title', 'no title')
        description = snippet.get('description', 'no description')
        duration_iso = content_details.get('duration', None)
        channel_id = snippet.get('channelId', 'no channel id')
        channel_title = snippet.get('channelTitle', 'no channel title')
        return {'title': title, 'description': description, 'duration': duration_to_seconds(duration_iso), 'channel_id': channel_id, 'channel_title': channel_title}
    else:
        return None

def duration_to_seconds(duration_str):
  # Converts an ISO 8601 duration such as "PT1H2M3S" or "P1DT2H" to seconds;
  # returns None when the value is missing or in an unexpected format.
  if not duration_str:
    return None
  match = re.fullmatch(r'P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?', duration_str)
  if not match:
    return None

  days, hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())

  return days * 86400 + hours * 3600 + minutes * 60 + seconds
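The paging logic in `get_youtube_videos` follows `nextPageToken` until it runs out of pages, reaches 8 pages, or has collected more than 75 unique IDs. The sketch below isolates that loop so it can be tried without an API key; `collect_video_ids` and its `fetch_page` callback are hypothetical stand-ins for `get_youtube_videos` and `search_youtube`, with pages in the same `{"nextPageToken": ..., "items": [...]}` shape that `search_youtube` returns.

```python
from typing import Callable, Dict, List

def collect_video_ids(fetch_page: Callable[[str], Dict],
                      max_pages: int = 8, limit: int = 75) -> List[str]:
    """Page through search results, de-duplicating IDs in first-seen order."""
    ids: List[str] = []
    token = ''
    for _ in range(max_pages):
        page = fetch_page(token)
        for video_id in page['items']:
            if video_id not in ids:
                ids.append(video_id)
        token = page.get('nextPageToken', '')
        # stop when there is no next page or more than `limit` IDs were collected
        if not token or len(ids) > limit:
            break
    return ids

# Two fake pages standing in for YouTube Data API responses
pages = {
    '': {'nextPageToken': 'p2', 'items': ['a', 'b', 'c']},
    'p2': {'nextPageToken': '', 'items': ['c', 'd']},
}
print(collect_video_ids(lambda token: pages[token]))  # ['a', 'b', 'c', 'd']
```

Checking the empty token explicitly matters: passing an empty `pageToken` back to the API would simply return the first page again.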

The next cell contains a prompt template, which you can adjust as needed, and the main functions for working with the Vertex AI API on GCP.

# GCP Vertex AI API

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
model = client.models

def request_gemini(prompt, video_link):
  video_extraction_json_generation_config = GenerateContentConfig(
    temperature=0.0,
    max_output_tokens=2048,
  )

  contents = [
      Part.from_uri(
          file_uri=video_link,
          mime_type="video/mp4",
      ),
      prompt
  ]

  response = model.generate_content(
      model=LANGUAGE_MODEL,
      contents=contents,
      config=video_extraction_json_generation_config
  )

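  # response.text can be None (for example, when the output was blocked);
  # return an empty string so that parse_response falls back to its defaults
  try:
    return response.text or ''
  except Exception:
    return ''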

def create_prompt(yt_title, yt_description, yt_link):
  return f"""### Task: You are a highly specialized marketer and YouTube expert working for the brand or company, {BRAND_NAME}.
Your boss is wondering which video to use to promote the company's advertisements and which YouTuber to work with on future advertisements. You are the expert who can give your boss the most suitable suggestions.
Analyze the video according to the criteria below and solve your boss's worries.

### Criteria: Now you review the video.
Evaluate it using the following criteria.

1. Whether the video mentions the brand {BRAND_NAME}.
2. Whether the video views {BRAND_NAME} positively or negatively.
3. Whether the video would be suitable for marketing purposes.

### Context and Contents:
Your Company Information:
- Company Description: {MY_COMPANY_INFO}
- Brand: {BRAND_NAME}

Analysis subject:
- YouTube title: {yt_title}
- YouTube description: {yt_description}
- YouTube link: {yt_link}

### Answer Format:
brand_relevance_score: (Integer between 0 and 100 - the more relevant this video is to {BRAND_NAME}, the higher the score)
brand_positive_score: (Integer between 0 and 100 - the more positive this video is about {BRAND_NAME}, the higher the score)
brand_negative_score: (Integer between 0 and 100 - the more negative this video is about {BRAND_NAME}, the higher the score)
video_content_summary: (An overview of the video's content)
video_brand_summary: (Summarize the content about your brand, {BRAND_NAME})
opinion: (Why this video is suitable for promoting your company or product)

### Examples:
brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
video_brand_summary: The brand products mentioned in the video have their advantages well explained by the YouTuber.
opinion: Consumers are more likely to think positively about the advantages of the product.

### Caution:
DO NOT fabricate information.
DO NOT imagine things.
DO NOT use Markdown formatting.
DO analyze each video based on the criteria mentioned above.
DO analyze the video only after watching all of it.
DO write the summaries and opinion in {GENERATION_LANGUAGE}."""

def parse_response(response: str):
  brand_relevance_score_pattern = r"brand_relevance_score:\s*(\d{1,3})"
  brand_positive_score_pattern = r"brand_positive_score:\s*(\d{1,3})"
  brand_negative_score_pattern = r"brand_negative_score:\s*(\d{1,3})"
  video_content_summary_pattern = r"video_content_summary:\s*(.*)"
  video_brand_summary_pattern = r"video_brand_summary:\s*(.*)"
  opinion_pattern = r"opinion:\s*(.*)"
  brand_relevance_score_match = re.search( brand_relevance_score_pattern, response )
  brand_relevance_score = ( int(brand_relevance_score_match.group(1)) if brand_relevance_score_match else 0 )
  brand_positive_score_match = re.search( brand_positive_score_pattern, response )
  brand_positive_score = ( int(brand_positive_score_match.group(1)) if brand_positive_score_match else 0 )
  brand_negative_score_match = re.search( brand_negative_score_pattern, response )
  brand_negative_score = ( int(brand_negative_score_match.group(1)) if brand_negative_score_match else 0 )
  video_content_score_match = re.search( video_content_summary_pattern, response )
  video_content_summary = ( video_content_score_match.group(1) if video_content_score_match else '' )
  video_brand_summary_match = re.search( video_brand_summary_pattern, response )
  video_brand_summary = ( video_brand_summary_match.group(1) if video_brand_summary_match else '' )
  opinion_match = re.search( opinion_pattern, response )
  opinion = ( opinion_match.group(1) if opinion_match else '' )
  return ( brand_relevance_score, brand_positive_score, brand_negative_score, video_content_summary, video_brand_summary, opinion)

def request_gemini_with_retry(prompt, youtube_link='', max_retries=1):
  retries = 0
  while retries <= max_retries:
    try:
      response = request_gemini(prompt, youtube_link)
      ( brand_relevance_score,
        brand_positive_score,
        brand_negative_score,
        video_content_summary,
        video_brand_summary,
        opinion) = parse_response(response)
      if ( validate_score(brand_relevance_score) and
           validate_score(brand_positive_score) and
           validate_score(brand_negative_score) and
           validate_summary(video_content_summary) and
           validate_summary(video_brand_summary) ):

        return ( brand_relevance_score,
                 brand_positive_score,
                 brand_negative_score,
                 video_content_summary,
                 video_brand_summary,
                 opinion
              )
      else:
        # raise so that the except block below counts this as a failed attempt
        raise ValueError(
            "The value may be incorrect, there may be a range issue, a parsing"
            " issue, or a response issue with Gemini: score -"
            f" {brand_relevance_score}, {brand_positive_score},"
            f" {brand_negative_score} , summary - {video_content_summary},"
            f" {video_brand_summary}" )

    except Exception as e:
      print(f"Request failed: {e}")
      retries += 1
      if retries <= max_retries:
        print(f"retry ({retries}/{max_retries})...")
      else:
        print("Maximum number of retries exceeded")
        return 0, 0, 0, "", "", ""

def validate_score(score):
  return score >= 0 and score <= 100

def validate_summary(summary):
  return len(summary) > 0
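`parse_response` relies on the model answering in the `label: value` layout requested in the prompt's Answer Format section. A compact alternative, assuming the same field names, reads every field in one pass and treats a missing or out-of-range score as a failure; `parse_answer` is a hypothetical name, not part of the codelab.

```python
import re
from typing import Dict, Optional, Union

SCORE_FIELDS = ('brand_relevance_score', 'brand_positive_score', 'brand_negative_score')
TEXT_FIELDS = ('video_content_summary', 'video_brand_summary', 'opinion')

def parse_answer(text: str) -> Optional[Dict[str, Union[int, str]]]:
    """Parse the six 'label: value' lines; return None if any score is missing or outside 0-100."""
    result: Dict[str, Union[int, str]] = {}
    for field in SCORE_FIELDS:
        # \b stops a value such as '1234' from being read as '123'
        match = re.search(rf'{field}:[ \t]*(\d{{1,3}})\b', text or '')
        if not match or not 0 <= int(match.group(1)) <= 100:
            return None
        result[field] = int(match.group(1))
    for field in TEXT_FIELDS:
        # rest of the line after the label; empty string when the label is absent
        match = re.search(rf'{field}:[ \t]*(.*)', text or '')
        result[field] = match.group(1).strip() if match else ''
    return result
```

Returning `None` instead of 0 lets a caller such as `request_gemini_with_retry` tell a missing score apart from a genuine score of 0, which the original 0 default cannot do.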

This code block covers three main tasks: building the DataFrame, running the Gemini analysis, and updating the DataFrame with the results.

def df_youtube_videos():
  youtube_video_list = get_youtube_videos(SEARCH_QUERY, VIEWER_COUNTRY, VIDEO_TOPIC, DATE_INPUT)
  youtube_video_link_list = []
  youtube_video_title_list = []
  youtube_video_description_list = []
  youtube_video_channel_title_list = []
  youtube_video_duration_list = []

  for video_id in youtube_video_list:
    video_details = get_video_details(video_id)
    # skip videos whose details or duration could not be retrieved
    if not video_details or video_details['duration'] is None:
      continue
    # https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
    # keep videos shorter than 50 minutes that do not belong to an excluded channel
    if video_details['duration'] < 50*60 and video_details['channel_id'] not in EXCEPT_CHANNEL_IDS:
      youtube_video_link_list.append(f'https://www.youtube.com/watch?v={video_id}')
      youtube_video_title_list.append(video_details['title'])
      youtube_video_description_list.append(video_details['description'])
      youtube_video_channel_title_list.append(video_details['channel_title'])
      seconds = video_details['duration']
      duration_new_format = f"{seconds // 3600:02d}:{(seconds % 3600) // 60:02d}:{seconds % 60:02d}" # HH:MM:SS
      youtube_video_duration_list.append(duration_new_format)

  df = DataFrame({
      'video_id': youtube_video_link_list,
      'title': youtube_video_title_list,
      'description': youtube_video_description_list,
      'channel_title': youtube_video_channel_title_list,
      'length': youtube_video_duration_list
  })
  return df

def run_gemini(df):
  for index, row in df.iterrows():
    video_title = row['title']
    video_description = row['description']
    video_link = row['video_id']
    prompt = create_prompt(video_title, video_description, video_link)
    ( brand_relevance_score,
      brand_positive_score,
      brand_negative_score,
      video_content_summary,
      video_brand_summary,
      opinion) = request_gemini_with_retry(prompt, video_link)
    df.at[index, 'gemini_brand_relevance_score'] = brand_relevance_score
    df.at[index, 'gemini_brand_positive_score'] = brand_positive_score
    df.at[index, 'gemini_brand_negative_score'] = brand_negative_score
    df.at[index, 'gemini_video_content_summary'] = video_content_summary
    df.at[index, 'gemini_video_brand_summary'] = video_brand_summary
    df.at[index, 'gemini_opinion'] = opinion
    # https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
    time.sleep(1)
    print(f"Processing: {index}/{len(df)}")
    print(f"video_title: {video_title}")
  return df
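`df_youtube_videos` keeps only videos with a known duration under 50 minutes (a cutoff the code comment ties to the Vertex AI video-understanding documentation) that do not come from an excluded channel, and it formats the duration as HH:MM:SS. The same rules written as two small pure functions are easier to check in isolation; `is_candidate` and `format_hms` are hypothetical names.

```python
from typing import Iterable, Optional

MAX_SECONDS = 50 * 60  # the same 50-minute cutoff used in df_youtube_videos

def is_candidate(duration: Optional[int], channel_id: str,
                 excluded_channels: Iterable[str]) -> bool:
    """True if the duration is known and under the cutoff and the channel is not excluded."""
    return (duration is not None
            and duration < MAX_SECONDS
            and channel_id not in set(excluded_channels))

def format_hms(seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

print(format_hms(3723))  # 01:02:03
```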

This cell runs all the code written so far: it fetches data from YouTube, analyzes each video with Gemini, and then builds a DataFrame sorted by the positive brand score.

# main
df = df_youtube_videos()
run_gemini(df)
df['gemini_brand_positive_score'] = df[ 'gemini_brand_positive_score' ].astype('int64')
df['gemini_brand_relevance_score'] = df[ 'gemini_brand_relevance_score' ].astype('int64')
df['gemini_brand_negative_score'] = df[ 'gemini_brand_negative_score' ].astype('int64')
df = df.sort_values( 'gemini_brand_positive_score', ascending=False )

df
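Sorting by `gemini_brand_positive_score` ranks individual videos, while the stated goal also covers channels. One way to get a channel-level view is to average the scores per `channel_title`. The sketch below does this in plain Python, assuming each row is a dict with the DataFrame's column names (for example from `df.to_dict('records')`); `rank_channels` is hypothetical, and a simple average is only one possible aggregation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def rank_channels(rows: List[Dict]) -> List[Tuple[str, float, int]]:
    """Return (channel_title, average positive score, number of videos), best first."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        scores[row['channel_title']].append(row['gemini_brand_positive_score'])
    ranked = [(channel, sum(values) / len(values), len(values))
              for channel, values in scores.items()]
    # highest average first; ties are broken by the number of analyzed videos
    ranked.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return ranked

# Made-up rows in the same shape as df.to_dict('records')
rows = [
    {'channel_title': 'Channel A', 'gemini_brand_positive_score': 80},
    {'channel_title': 'Channel B', 'gemini_brand_positive_score': 90},
    {'channel_title': 'Channel A', 'gemini_brand_positive_score': 60},
]
print(rank_channels(rows))
```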

The last step is to create a Google Sheets spreadsheet from the DataFrame. Open the URL printed in the output to view the results.

import gspread
from google.auth import default

today_date = datetime.now().strftime('%Y-%m-%d')
my_spreadsheet_title = f"Partner's Video Finder, {BRAND_NAME}, {SEARCH_QUERY}, {VIEWER_COUNTRY} ({DATE_INPUT}~{today_date})"

creds, _ = default()
gc = gspread.authorize(creds)
sh = gc.create(my_spreadsheet_title)
# use the spreadsheet returned by create(); reopening by title could pick another file with the same name
worksheet = sh.sheet1
cell_list = df.values.tolist()
worksheet.update([df.columns.values.tolist()] + cell_list)

print("URL: ", sh.url)
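`worksheet.update` sends the cell values as JSON, and float `NaN` or infinity values (for example, a Gemini column that was never filled for some row) are not valid JSON, so the upload can fail. A minimal sketch, assuming the rows come from `df.values.tolist()`, replaces such values with empty cells before uploading; `to_sheet_rows` is a hypothetical helper.

```python
import math
from typing import Any, List

def to_sheet_rows(header: List[str], rows: List[List[Any]]) -> List[List[Any]]:
    """Prepend the header row and replace NaN/inf floats with empty cells."""
    def clean(value: Any) -> Any:
        if isinstance(value, float) and not math.isfinite(value):
            return ''
        return value
    return [list(header)] + [[clean(v) for v in row] for row in rows]

# Possible use in the notebook:
# worksheet.update(to_sheet_rows(df.columns.tolist(), df.values.tolist()))
```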

3. Documentation

These are the resources I used while writing the code. If you want to modify the code or learn more about how it is used, follow the links below.
