YouTube Video Analysis For Marketing with Gemini

About this codelab

Last updated Apr 3, 2025
Written by Jisub Lee, Kyungjune Shin

1. Introduction

Last Updated: 2025-03-12

This sample code analyzes videos using the YouTube Data API and Gemini. Users are responsible for how they use it, and the code should be reviewed carefully before being used in real-world environments. The authors are not responsible for any issues arising from its use. Additionally, because of the nature of generative AI, the results may differ from the actual facts; they should not be blindly trusted and should be carefully reviewed.

Goal of this Project

The primary objective is to identify suitable YouTube videos and YouTubers for brand promotion by analyzing video content and sentiment.

Overview

The project leverages the YouTube Data API to fetch video information and the GCP Vertex AI API with the Gemini model to analyze video content. It runs on Google Colab.

You can paste the code cells that follow into Colab and run them one by one.

What you'll learn

  • How to use the YouTube Data API to fetch video information.
  • How to use the GCP Vertex AI API with the Gemini model to analyze video content.
  • How to use Google Colab to run the code.
  • How to create a spreadsheet from the analyzed data.

What you'll need

To implement this solution, you will need the following:

  • A Google Cloud Platform project.
  • The YouTube Data API v3, Vertex AI API, Generative Language API, Google Drive API, and Google Sheets API enabled on the project.
  • An API key, created in the Credentials tab, with authorization for the YouTube Data API v3.

This solution utilizes the YouTube Data API and the GCP Vertex AI API.

2. Code and Explanation

The first thing we need to do is import the libraries we want to use. Then sign in with your Google Account and grant permission to access your Google Drive.

# library
# colab
import ipywidgets as widgets
from IPython.display import display
from google.colab import auth

# cloud
from google import genai
from google.genai.types import Part, GenerateContentConfig

# function, util
import requests, os, re, time
from pandas import DataFrame
from datetime import datetime, timedelta

auth.authenticate_user()

[Action Required]

The API key and project ID from GCP are the values that typically need to be changed. The cell below contains the GCP settings.

# GCP Setting
LANGUAGE_MODEL = 'gemini-1.5-pro' # @param {type:"string"}
API_KEY = 'Please write your API_KEY' # @param {type:"string"}
PROJECT_ID = 'Please write your GCP_ID' # @param {type:"string"}
LOCATION = 'us-central1' # @param {type:"string"}

[Action Required]

Please update the variable values in the Input cell below as you review them.

Using the brand "Google" as an example, this article will demonstrate how to search YouTube for videos on a specific topic (e.g. "Google AI") while excluding videos from the brand's own channel.

Input Variables for YouTube Video Analysis

  • BRAND_NAME (Required): Brand name for analysis (e.g., Google).
  • MY_COMPANY_INFO (Required): Brief brand description and context.
  • SEARCH_QUERY (Required): Search term for YouTube videos (e.g., Google AI).
  • VIEWER_COUNTRY: Viewer's country code (two-letter country code: ISO 3166-1 alpha-2) (e.g., KR).
  • GENERATION_LANGUAGE (Required): Language for Gemini's results (e.g., Korean).
  • EXCEPT_CHANNEL_IDS: Comma-separated channel IDs to exclude.

You can find the channel ID on a YouTube channel's page.


  • VIDEO_TOPIC: YouTube topic ID for refinement.

You can find topic ID values on the Search: list reference page of the YouTube Data API documentation.


  • DATE_INPUT (Required): Start date for published videos (YYYY-MM-DD).

# Input
BRAND_NAME = "Google" # @param {type:"string"}
MY_COMPANY_INFO = "Google is a multinational technology company specializing in internet-related services and products." # @param {type:"string"}
SEARCH_QUERY = 'Google AI' # @param {type:"string"}
VIEWER_COUNTRY = 'KR' # @param {type:"string"}
GENERATION_LANGUAGE = 'Korean' # @param {type:"string"}
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw' # @param {type:"string"}
VIDEO_TOPIC = '/m/07c1v' # @param {type: "string"}
DATE_INPUT = '2025-01-01' # @param {type:"date"}

# Auth Scope
SCOPE = [
    'https://www.googleapis.com/auth/youtube.readonly',
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/cloud-platform'
]

# validation check
if not SEARCH_QUERY or not DATE_INPUT:
  raise ValueError("Search query and date input are required.")

EXCEPT_CHANNEL_IDS = [id.strip() for id in EXCEPT_CHANNEL_IDS.split(',')]
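As a quick sanity check, the input handling above can be exercised on its own. The following sketch uses hypothetical sample values (not part of the codelab's cells) to show how the comma-separated channel IDs are split and how a malformed DATE_INPUT can be caught early:

```python
from datetime import datetime

# Hypothetical sample inputs mirroring the @param cells above.
EXCEPT_CHANNEL_IDS = 'UCK8sQmJBp8GCxrOtXWBpyEA, UCdc_SRhKUlH3grljQXA0skw'
DATE_INPUT = '2025-01-01'

# Split and strip the comma-separated channel IDs, as in the cell above.
channel_ids = [cid.strip() for cid in EXCEPT_CHANNEL_IDS.split(',')]
print(channel_ids)  # ['UCK8sQmJBp8GCxrOtXWBpyEA', 'UCdc_SRhKUlH3grljQXA0skw']

# Verify the date is in YYYY-MM-DD form before building the API query.
datetime.strptime(DATE_INPUT, '%Y-%m-%d')  # raises ValueError if malformed
```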

The following cell defines key helper functions for interacting with the YouTube Data API.

# YouTube API function

def get_youtube_videos(q, viewer_country_code, topic_str, start_period):

    page_token_number = 1
    next_page_token = ''
    merged_array = []

    published_after_date = f"{start_period}T00:00:00Z"

    while page_token_number < 9 and len(merged_array) <= 75:
        result = search_youtube(q, topic_str, published_after_date, viewer_country_code, '', next_page_token, 50)
        merged_array = list(set(merged_array + result['items']))
        next_page_token = result['nextPageToken']
        page_token_number += 1
        # Stop when there are no more result pages.
        if not next_page_token:
            break

    return merged_array

def search_youtube(query, topic_id, published_after, region_code, relevance_language, next_page_token, max_results=50):

    if not query:
        return None

    q = query

    url = f'https://www.googleapis.com/youtube/v3/search?key={API_KEY}&part=snippet&q={q}&publishedAfter={published_after}&regionCode={region_code}&type=video&topicId={topic_id}&maxResults={max_results}&pageToken={next_page_token}&gl={region_code.lower()}'

    response = requests.get(url)
    data = response.json()
    results = data.get('items', [])
    next_page_token = data.get('nextPageToken', '')
    return_results = [item['id']['videoId'] for item in results]

    print(url)

    return {
        "nextPageToken": next_page_token,
        "items": return_results
    }

def get_date_string(days_ago):

    date = datetime.now() - timedelta(days=days_ago)
    return date.strftime('%Y-%m-%dT00:00:00Z')

def get_video_details(video_id):

    url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails&id={video_id}&key={API_KEY}'
    response = requests.get(url)
    data = response.json()

    if data.get('items'):
        video = data['items'][0]
        snippet = video['snippet']
        content_details = video['contentDetails']

        title = snippet.get('title', 'no title')
        description = snippet.get('description', 'no description')
        duration_iso = content_details.get('duration', None)
        channel_id = snippet.get('channelId', 'no channel id')
        channel_title = snippet.get('channelTitle', 'no channel title')
        return {'title': title, 'description': description, 'duration': duration_to_seconds(duration_iso), 'channel_id': channel_id, 'channel_title': channel_title}
    else:
        return None

def duration_to_seconds(duration_str):
  match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration_str)
  if not match:
    return None

  hours, minutes, seconds = match.groups()

  total_seconds = 0
  if hours:
    total_seconds += int(hours) * 3600
  if minutes:
    total_seconds += int(minutes) * 60
  if seconds:
    total_seconds += int(seconds)

  return total_seconds
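For a quick check of the ISO 8601 duration parsing, here is a self-contained copy of the same logic with a couple of example inputs:

```python
import re

def duration_to_seconds(duration_str):
    # Same parsing as above: a PT#H#M#S string becomes total seconds.
    match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration_str)
    if not match:
        return None
    h, m, s = (int(g) if g else 0 for g in match.groups())
    return h * 3600 + m * 60 + s

print(duration_to_seconds('PT1H2M3S'))  # 3723
print(duration_to_seconds('PT15M33S'))  # 933
```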

The next cell provides a prompt template, which can be adjusted as needed, along with key functions for calling the Gemini model through the Vertex AI API.

# GCP Vertex AI API

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
model = client.models

def request_gemini(prompt, video_link):
  video_extraction_json_generation_config = GenerateContentConfig(
    temperature=0.0,
    max_output_tokens=2048,
  )

  contents = [
      Part.from_uri(
          file_uri=video_link,
          mime_type="video/mp4",
      ),
      prompt
  ]

  response = model.generate_content(
      model=LANGUAGE_MODEL,
      contents=contents,
      config=video_extraction_json_generation_config
  )

  try:
    return response.text
  except Exception:
    # Fall back to an empty string when the response has no text part;
    # parse_response will then return default values.
    return ''

def create_prompt(yt_title, yt_description, yt_link):
  return f"""### Task: You are a highly specialized marketer and YouTube expert working for the brand or company, {BRAND_NAME}.
Your boss is wondering which video to use to promote the company's advertisements and which YouTuber to partner with for future promotions. You are the expert who can give your boss the most suitable suggestions.
Analyze the video according to the criteria below and solve your boss's worries.

### Criteria: Now you review the video.
If you evaluate it using the following criteria, you will be able to receive a better evaluation.

1. Whether the video mentions brand, {BRAND_NAME}.
2. Whether the video views {BRAND_NAME} positively or negatively.
3. Whether the video would be suitable for marketing purposes.

### Context and Contents:
Your Company Information:
- Company Description: {MY_COMPANY_INFO}
- Brand: {BRAND_NAME}

Analysis subject:
- YouTube title: {yt_title}
- YouTube description: {yt_description}
- YouTube link: {yt_link}

### Answer Format:
brand_relevance_score: (Integer between 0 and 100 - If this video is more relevant to the {BRAND_NAME}, it will score higher)
brand_positive_score: (Integer between 0 and 100 - If this video is positive about the {BRAND_NAME}, it will score higher)
brand_negative_score: (Integer between 0 and 100 - If this video is negative about the {BRAND_NAME}, it will score higher)
video_content_summary: (Summarize the content of the video like overview)
video_brand_summary: (Summarize the content about your brand, {BRAND_NAME})
opinion: (Why this video is suitable for promoting your company or product)

### Examples:
brand_relevance_score: 100
brand_positive_score: 80
brand_negative_score: 0
video_content_summary: YouTubers introduce various electronic products in their videos.
video_brand_summary: The brand products mentioned in the video have their advantages well explained by the YouTuber.
opinion: Consumers are more likely to think positively about the advantages of the product.

### Caution:
DO NOT fabricate information.
DO NOT imagine things.
DO NOT use Markdown formatting.
DO Analyze each video based on the criteria mentioned above.
DO Analyze after watching the whole video.
DO write the results for summary as {GENERATION_LANGUAGE}."""

def parse_response(response: str):
  brand_relevance_score_pattern = r"brand_relevance_score:\s*(\d{1,3})"
  brand_positive_score_pattern = r"brand_positive_score:\s*(\d{1,3})"
  brand_negative_score_pattern = r"brand_negative_score:\s*(\d{1,3})"
  video_content_summary_pattern = r"video_content_summary:\s*(.*)"
  video_brand_summary_pattern = r"video_brand_summary:\s*(.*)"
  opinion_pattern = r"opinion:\s*(.*)"
  brand_relevance_score_match = re.search( brand_relevance_score_pattern, response )
  brand_relevance_score = ( int(brand_relevance_score_match.group(1)) if brand_relevance_score_match else 0 )
  brand_positive_score_match = re.search( brand_positive_score_pattern, response )
  brand_positive_score = ( int(brand_positive_score_match.group(1)) if brand_positive_score_match else 0 )
  brand_negative_score_match = re.search( brand_negative_score_pattern, response )
  brand_negative_score = ( int(brand_negative_score_match.group(1)) if brand_negative_score_match else 0 )
  video_content_score_match = re.search( video_content_summary_pattern, response )
  video_content_summary = ( video_content_score_match.group(1) if video_content_score_match else '' )
  video_brand_summary_match = re.search( video_brand_summary_pattern, response )
  video_brand_summary = ( video_brand_summary_match.group(1) if video_brand_summary_match else '' )
  opinion_match = re.search( opinion_pattern, response )
  opinion = ( opinion_match.group(1) if opinion_match else '' )
  return ( brand_relevance_score, brand_positive_score, brand_negative_score, video_content_summary, video_brand_summary, opinion)

def request_gemini_with_retry(prompt, youtube_link='', max_retries=1):
  retries = 0
  while retries <= max_retries:
    try:
      response = request_gemini(prompt, youtube_link)
      ( brand_relevance_score,
        brand_positive_score,
        brand_negative_score,
        video_content_summary,
        video_brand_summary,
        opinion) = parse_response(response)
      if ( validate_score(brand_relevance_score) and
           validate_score(brand_positive_score) and
           validate_score(brand_negative_score) and
           validate_summary(video_content_summary) and
           validate_summary(video_brand_summary) ):

        return ( brand_relevance_score,
                 brand_positive_score,
                 brand_negative_score,
                 video_content_summary,
                 video_brand_summary,
                 opinion
              )
      else:
        raise ValueError(
            "The value may be incorrect: there may be a range issue, a parsing"
            " issue, or a response issue with Gemini: score -"
            f" {brand_relevance_score}, {brand_positive_score},"
            f" {brand_negative_score}, summary - {video_content_summary},"
            f" {video_brand_summary}" )

    except Exception as e:
      print(f"Request failed: {e}")
      retries += 1
      if retries <= max_retries:
        print(f"retry ({retries}/{max_retries})...")
      else:
        print("Maximum number of retries exceeded")
        return 0, 0, 0, "", "", ""

def validate_score(score):
  return score >= 0 and score <= 100

def validate_summary(summary):
  return len(summary) > 0
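To see the shape of reply that parse_response expects, here is a self-contained sketch running the same regex extraction over a made-up Gemini reply (the sample text below is hypothetical, not real model output):

```python
import re

# A made-up reply following the plain-text answer format in the prompt.
sample = """brand_relevance_score: 95
brand_positive_score: 80
brand_negative_score: 5
video_content_summary: A review of recent AI features.
video_brand_summary: The video praises the brand's AI tools.
opinion: Positive tone makes it a good fit for promotion."""

# Extract one score and one summary, as parse_response does for each field.
relevance = int(re.search(r"brand_relevance_score:\s*(\d{1,3})", sample).group(1))
summary = re.search(r"video_content_summary:\s*(.*)", sample).group(1)
print(relevance)  # 95
print(summary)    # A review of recent AI features.
```

Because `.` does not match newlines by default, each `(.*)` pattern captures only to the end of its line, which is why the prompt insists on plain text without Markdown.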

This code block is responsible for three primary tasks: creating a dataframe, running the Gemini analysis, and updating the dataframe with the results.

def df_youtube_videos():
  youtube_video_list = get_youtube_videos(SEARCH_QUERY, VIEWER_COUNTRY, VIDEO_TOPIC, DATE_INPUT)
  youtube_video_link_list = []
  youtube_video_title_list = []
  youtube_video_description_list = []
  youtube_video_channel_title_list = []
  youtube_video_duration_list = []

  for video_id in youtube_video_list:
    video_details = get_video_details(video_id)

    # Skip videos with no details, videos over the supported length, and excluded channels.
    # https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
    if video_details and video_details['duration'] < 50*60 and video_details['channel_id'] not in EXCEPT_CHANNEL_IDS:
      youtube_video_link_list.append(f'https://www.youtube.com/watch?v={video_id}')
      youtube_video_title_list.append(video_details['title'])
      youtube_video_description_list.append(video_details['description'])
      youtube_video_channel_title_list.append(video_details['channel_title'])
      duration_new_format = f"{video_details['duration'] // 3600:02d}:{(video_details['duration'] % 3600) // 60:02d}:{video_details['duration'] % 60:02d}" # HH:MM:SS
      youtube_video_duration_list.append(duration_new_format)

  df = DataFrame({
      'video_id': youtube_video_link_list,
      'title': youtube_video_title_list,
      'description': youtube_video_description_list,
      'channel_title': youtube_video_channel_title_list,
      'length': youtube_video_duration_list
  })
  return df

def run_gemini(df):
  for index, row in df.iterrows():
    video_title = row['title']
    video_description = row['description']
    video_link = row['video_id']
    prompt = create_prompt(video_title, video_description, video_link)

    ( brand_relevance_score,
      brand_positive_score,
      brand_negative_score,
      video_content_summary,
      video_brand_summary,
      opinion ) = request_gemini_with_retry(prompt, video_link)

    df.at[index, 'gemini_brand_relevance_score'] = brand_relevance_score
    df.at[index, 'gemini_brand_positive_score'] = brand_positive_score
    df.at[index, 'gemini_brand_negative_score'] = brand_negative_score
    df.at[index, 'gemini_video_content_summary'] = video_content_summary
    df.at[index, 'gemini_video_brand_summary'] = video_brand_summary
    df.at[index, 'gemini_opinion'] = opinion

    # https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
    time.sleep(1)

    print(f"Processing: {index}/{len(df)}")
    print(f"video_title: {video_title}")

  return df

This code block runs everything written so far: it fetches data from YouTube, analyzes it with Gemini, and produces the final dataframe.

# main
df = df_youtube_videos()
run_gemini(df)
df['gemini_brand_positive_score'] = df['gemini_brand_positive_score'].astype('int64')
df['gemini_brand_relevance_score'] = df['gemini_brand_relevance_score'].astype('int64')
df['gemini_brand_negative_score'] = df['gemini_brand_negative_score'].astype('int64')
df = df.sort_values('gemini_brand_positive_score', ascending=False)

df

The last step is to create a spreadsheet from the dataframe. Use the URL printed at the end to check the result.

import gspread
from google.auth import default

today_date = datetime.now().strftime('%Y-%m-%d')
my_spreadsheet_title = f"Partner's Video Finder, {BRAND_NAME}, {SEARCH_QUERY}, {VIEWER_COUNTRY} ({DATE_INPUT}~{today_date})"

creds, _ = default()
gc = gspread.authorize(creds)
sh = gc.create(my_spreadsheet_title)
worksheet = gc.open(my_spreadsheet_title).sheet1
cell_list = df.values.tolist()
worksheet.update([df.columns.values.tolist()] + cell_list)

print("URL: ", sh.url)

3. Reference

The following references were used when writing this code. If you need to modify the code or want more detailed usage information, please refer to the links below.