Using the Video Intelligence API with Python


About this codelab

Last updated Apr 4, 2023

Written by Laurent Picard

1 Overview

The Video Intelligence API allows you to use Google video analysis technology as part of your applications.

In this lab, you will focus on using the Video Intelligence API with Python.

What you'll learn

How to set up your environment
How to set up Python
How to detect shot changes
How to detect labels
How to detect explicit content
How to transcribe speech
How to detect and track text
How to detect and track objects
How to detect and track logos

What you'll need

A Google Cloud project
A browser, such as Chrome or Firefox
Familiarity using Python


Self-paced environment setup

Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string not used by Google APIs, and you can update it at any time.
The Project ID is unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference the Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one, or you can try your own and see if it's available. It cannot be changed after this step and remains for the duration of the project.
For your information, there is a third value, the Project Number, which some APIs use. Learn more about all three of these values in the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab shouldn't cost much, if anything at all. To shut down resources and avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will use Cloud Shell, a command-line environment running in the Cloud.

Activate Cloud Shell

From the Cloud Console, click Activate Cloud Shell.

If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If you see this intermediate screen, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, which greatly enhances network performance and authentication. Most of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3 Environment setup

Before you can begin using the Video Intelligence API, run the following command in Cloud Shell to enable the API:

gcloud services enable videointelligence.googleapis.com

You should see something like this:

Operation "operations/..." finished successfully.

Now, you can use the Video Intelligence API!

Navigate to your home directory:

cd ~

Create a Python virtual environment to isolate the dependencies:

virtualenv venv-videointel

Activate the virtual environment:

source venv-videointel/bin/activate

Install IPython and the Video Intelligence API client library:

pip install ipython google-cloud-videointelligence

You should see something like this:

...
Installing collected packages: ..., ipython, google-cloud-videointelligence
Successfully installed ... google-cloud-videointelligence-2.11.0 ...

Now, you're ready to use the Video Intelligence API client library!

In the next steps, you'll use an interactive Python interpreter called IPython, which you installed in the previous step. Start a session by running ipython in Cloud Shell:

ipython

You should see something like this:

Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
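
As an optional sanity check, you can confirm from the IPython session that both packages were installed in the active virtual environment. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the codelab.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from the pip install command above
for package in ("ipython", "google-cloud-videointelligence"):
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

If either line reports "not installed", the virtual environment is probably not activated in the shell that started IPython.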

4 Sample video

You can use the Video Intelligence API to annotate videos stored in Cloud Storage or provided as data bytes.
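
These two input modes correspond to two request fields, `input_uri` and `input_content`. As a minimal sketch (the helper name `video_input_kwargs` is hypothetical, not part of the client library), the following picks the field based on where the video lives. The resulting keyword arguments are meant to be passed to `vi.AnnotateVideoRequest` together with `features`.

```python
from pathlib import Path


def video_input_kwargs(video: str) -> dict:
    """Return the request field matching where the video lives.

    Hypothetical helper: "gs://" URIs are passed by reference (input_uri);
    anything else is treated as a local file and read as bytes (input_content).
    """
    if video.startswith("gs://"):
        return {"input_uri": video}
    return {"input_content": Path(video).read_bytes()}


print(video_input_kwargs("gs://cloud-samples-data/video/JaneGoodall.mp4"))
```

Passing bytes sends the whole file with the request, so referencing a Cloud Storage URI is likely the better fit for larger videos.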

In the next steps, you will use a sample video stored in Cloud Storage. You can view the video in your browser.

Ready, steady, go!

5 Detect shot changes

You can use the Video Intelligence API to detect shot changes in a video. A shot is a segment of the video, a series of frames with visual continuity.

Copy the following code into your IPython session:

from typing import cast

from google.cloud import videointelligence_v1 as vi


def detect_shot_changes(video_uri: str) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.SHOT_CHANGE_DETECTION]
    request = vi.AnnotateVideoRequest(input_uri=video_uri, features=features)

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the SHOT_CHANGE_DETECTION parameter to analyze a video and detect shot changes.

Call the function to analyze the video:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"

results = detect_shot_changes(video_uri)

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the video shots:

def print_video_shots(results: vi.VideoAnnotationResults):
    shots = results.shot_annotations
    print(f" Video shots: {len(shots)} ".center(40, "-"))
    for i, shot in enumerate(shots):
        t1 = shot.start_time_offset.total_seconds()
        t2 = shot.end_time_offset.total_seconds()
        print(f"{i+1:>3} | {t1:7.3f} | {t2:7.3f}")

Call the function:

print_video_shots(results)

You should see something like this:

----------- Video shots: 34 ------------
  1 |   0.000 |  12.880
  2 |  12.920 |  21.680
  3 |  21.720 |  27.880
...
 32 | 135.160 | 138.320
 33 | 138.360 | 146.200
 34 | 146.240 | 162.520

If you extract the middle frame of each shot and arrange the frames in a wall, you can generate a visual summary of the video.
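
The first half of that idea, finding where the middle of each shot falls, can be sketched in a few lines. The helper `middle_offsets` is hypothetical and works on plain (start, end) pairs in seconds. With real results, the pairs would come from `start_time_offset.total_seconds()` and `end_time_offset.total_seconds()`, as in `print_video_shots`. Extracting the frames themselves requires a video tool and is not covered here.

```python
from datetime import timedelta
from typing import List, Sequence, Tuple


def middle_offsets(shots: Sequence[Tuple[float, float]]) -> List[timedelta]:
    """Return the midpoint of each (start, end) shot, given in seconds."""
    return [timedelta(seconds=(start + end) / 2) for start, end in shots]


# First three shots from the sample output above
shots = [(0.000, 12.880), (12.920, 21.680), (21.720, 27.880)]
for offset in middle_offsets(shots):
    print(offset)
```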

Summary

In this step, you were able to perform shot change detection on a video using the Video Intelligence API. You can read more about detecting shot changes.

6 Detect labels

You can use the Video Intelligence API to detect labels in a video. Labels describe the video based on its visual content.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_labels(
    video_uri: str,
    mode: vi.LabelDetectionMode,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.LABEL_DETECTION]
    config = vi.LabelDetectionConfig(label_detection_mode=mode)
    context = vi.VideoContext(segments=segments, label_detection_config=config)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the LABEL_DETECTION parameter to analyze a video and detect labels.

Call the function to analyze the first 37 seconds of the video:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
mode = vi.LabelDetectionMode.SHOT_MODE
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=0),
    end_time_offset=timedelta(seconds=37),
)

results = detect_labels(video_uri, mode, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the labels at the video level:

def print_video_labels(results: vi.VideoAnnotationResults):
    labels = sorted_by_first_segment_confidence(results.segment_label_annotations)

    print(f" Video labels: {len(labels)} ".center(80, "-"))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        for segment in label.segments:
            confidence = segment.confidence
            t1 = segment.segment.start_time_offset.total_seconds()
            t2 = segment.segment.end_time_offset.total_seconds()
            print(
                f"{confidence:4.0%}",
                f"{t1:7.3f}",
                f"{t2:7.3f}",
                f"{label.entity.description}{categories}",
                sep=" | ",
            )


def sorted_by_first_segment_confidence(
    labels: Sequence[vi.LabelAnnotation],
) -> Sequence[vi.LabelAnnotation]:
    def first_segment_confidence(label: vi.LabelAnnotation) -> float:
        return label.segments[0].confidence

    return sorted(labels, key=first_segment_confidence, reverse=True)


def category_entities_to_str(category_entities: Sequence[vi.Entity]) -> str:
    if not category_entities:
        return ""
    entities = ", ".join([e.description for e in category_entities])
    return f" ({entities})"

Call the function:

print_video_labels(results)

You should see something like this:

------------------------------- Video labels: 10 -------------------------------
 96% |   0.000 |  36.960 | nature
 74% |   0.000 |  36.960 | vegetation
 59% |   0.000 |  36.960 | tree (plant)
 56% |   0.000 |  36.960 | forest (geographical feature)
 49% |   0.000 |  36.960 | leaf (plant)
 43% |   0.000 |  36.960 | flora (plant)
 38% |   0.000 |  36.960 | nature reserve (geographical feature)
 38% |   0.000 |  36.960 | woodland (forest)
 35% |   0.000 |  36.960 | water resources (water)
 32% |   0.000 |  36.960 | sunlight (light)

With these video-level labels, you can understand that the beginning of the video is mostly about nature and vegetation.

Add this function to print out the labels at the shot level:

def print_shot_labels(results: vi.VideoAnnotationResults):
    labels = sorted_by_first_segment_start_and_confidence(
        results.shot_label_annotations
    )

    print(f" Shot labels: {len(labels)} ".center(80, "-"))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        print(f"{label.entity.description}{categories}")
        for segment in label.segments:
            confidence = segment.confidence
            t1 = segment.segment.start_time_offset.total_seconds()
            t2 = segment.segment.end_time_offset.total_seconds()
            print(f"{confidence:4.0%} | {t1:7.3f} | {t2:7.3f}")


def sorted_by_first_segment_start_and_confidence(
    labels: Sequence[vi.LabelAnnotation],
) -> Sequence[vi.LabelAnnotation]:
    def first_segment_start_and_confidence(label: vi.LabelAnnotation):
        first_segment = label.segments[0]
        # The offset is in seconds (total_seconds), not milliseconds
        start = first_segment.segment.start_time_offset.total_seconds()
        return (start, -first_segment.confidence)

    return sorted(labels, key=first_segment_start_and_confidence)

Call the function:

print_shot_labels(results)

You should see something like this:

------------------------------- Shot labels: 29 --------------------------------
planet (astronomical object)
 83% |   0.000 |  12.880
earth (planet)
 53% |   0.000 |  12.880
water resources (water)
 43% |   0.000 |  12.880
aerial photography (photography)
 43% |   0.000 |  12.880
vegetation
 32% |   0.000 |  12.880
 92% |  12.920 |  21.680
 83% |  21.720 |  27.880
 77% |  27.920 |  31.800
 76% |  31.840 |  34.720
...
butterfly (insect, animal)
 84% |  34.760 |  36.960
...

With these shot-level labels, you can understand that the video starts with a shot of a planet (likely Earth), that there's a butterfly in the 34.760-36.960s shot...
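
To read the video shot by shot rather than label by label, the same data can be regrouped. The following is a minimal sketch on plain tuples copied from the sample output above; the helper `labels_by_shot` is hypothetical. With real results, the rows would be built by iterating `label.segments` as in `print_shot_labels`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (label, confidence, shot start, shot end), in seconds
ShotLabel = Tuple[str, float, float, float]


def labels_by_shot(rows: List[ShotLabel]) -> Dict[Tuple[float, float], List[str]]:
    """Group labels by shot, most confident label first."""
    grouped = defaultdict(list)
    for label, confidence, start, end in rows:
        grouped[(start, end)].append((confidence, label))
    return {
        shot: [label for _, label in sorted(pairs, reverse=True)]
        for shot, pairs in sorted(grouped.items())
    }


# A few rows from the sample output above
rows = [
    ("planet", 0.83, 0.000, 12.880),
    ("earth", 0.53, 0.000, 12.880),
    ("vegetation", 0.32, 0.000, 12.880),
    ("vegetation", 0.92, 12.920, 21.680),
    ("butterfly", 0.84, 34.760, 36.960),
]
for (start, end), labels in labels_by_shot(rows).items():
    print(f"{start:7.3f} | {end:7.3f} | {', '.join(labels)}")
```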

Summary

In this step, you were able to perform label detection on a video using the Video Intelligence API. You can read more about detecting labels.

7 Detect explicit content

You can use the Video Intelligence API to detect explicit content in a video. Explicit content is adult content generally inappropriate for those under 18 years of age and includes, but is not limited to, nudity, sexual activities, and pornography. Detection is based on per-frame visual signals only (audio is not used). The response includes likelihood values ranging from VERY_UNLIKELY to VERY_LIKELY.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_explicit_content(
    video_uri: str,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.EXPLICIT_CONTENT_DETECTION]
    context = vi.VideoContext(segments=segments)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the EXPLICIT_CONTENT_DETECTION parameter to analyze a video and detect explicit content.

Call the function to analyze the first 10 seconds of the video:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=0),
    end_time_offset=timedelta(seconds=10),
)

results = detect_explicit_content(video_uri, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the different likelihood counts:

def print_explicit_content(results: vi.VideoAnnotationResults):
    from collections import Counter

    frames = results.explicit_annotation.frames
    likelihood_counts = Counter([f.pornography_likelihood for f in frames])

    print(f" Explicit content frames: {len(frames)} ".center(40, "-"))
    for likelihood in vi.Likelihood:
        print(f"{likelihood.name:<22}: {likelihood_counts[likelihood]:>3}")

Call the function:

print_explicit_content(results)

You should see something like this:

----- Explicit content frames: 10 ------
LIKELIHOOD_UNSPECIFIED:   0
VERY_UNLIKELY         :  10
UNLIKELY              :   0
POSSIBLE              :   0
LIKELY                :   0
VERY_LIKELY           :   0
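
Because the likelihood values are ordered, a simple threshold can flag frames for review. The sketch below uses a stand-in `Likelihood` enum that mirrors the names and order printed above. It is not the library's own enum, and the helper `flagged` and its default threshold are assumptions for illustration.

```python
from enum import IntEnum
from typing import Iterable, List


class Likelihood(IntEnum):
    """Stand-in mirroring the names printed above (not the library's enum)."""

    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


def flagged(
    likelihoods: Iterable[Likelihood],
    threshold: Likelihood = Likelihood.LIKELY,
) -> List[int]:
    """Return the indexes of frames at or above the threshold."""
    return [i for i, value in enumerate(likelihoods) if value >= threshold]


# The sample output above reports 10 VERY_UNLIKELY frames
frames = [Likelihood.VERY_UNLIKELY] * 10
print(flagged(frames))
```

Which threshold is appropriate depends on the application; a stricter review process might start at POSSIBLE instead.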

Add this function to print out frame details:

def print_frames(results: vi.VideoAnnotationResults, likelihood: vi.Likelihood):
    frames = results.explicit_annotation.frames
    frames = [f for f in frames if f.pornography_likelihood == likelihood]

    print(f" {likelihood.name} frames: {len(frames)} ".center(40, "-"))
    for frame in frames:
        print(frame.time_offset)

Call the function:

print_frames(results, vi.Likelihood.VERY_UNLIKELY)

You should see something like this:

------- VERY_UNLIKELY frames: 10 -------
0:00:00.365992
0:00:01.279206
0:00:02.268336
0:00:03.289253
0:00:04.400163
0:00:05.291547
0:00:06.449558
0:00:07.452751
0:00:08.577405
0:00:09.554514
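
In this sample output the analyzed frames are spaced roughly one second apart. A small sketch for measuring that spacing from a list of offsets follows; the helper `frame_gaps` is hypothetical, and the spacing seen here is an observation about this sample, not a documented guarantee.

```python
from datetime import timedelta
from statistics import mean
from typing import List, Sequence


def frame_gaps(offsets: Sequence[timedelta]) -> List[float]:
    """Return the gaps, in seconds, between consecutive frame offsets."""
    seconds = [offset.total_seconds() for offset in offsets]
    return [later - earlier for earlier, later in zip(seconds, seconds[1:])]


# First four offsets from the sample output above
offsets = [
    timedelta(seconds=0, microseconds=365992),
    timedelta(seconds=1, microseconds=279206),
    timedelta(seconds=2, microseconds=268336),
    timedelta(seconds=3, microseconds=289253),
]
gaps = frame_gaps(offsets)
print(f"mean gap: {mean(gaps):.3f}s")
```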

Summary

In this step, you were able to perform explicit content detection on a video using the Video Intelligence API. You can read more about detecting explicit content.

8 Transcribe speech

You can use the Video Intelligence API to transcribe speech in a video into text.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def transcribe_speech(
    video_uri: str,
    language_code: str,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.SPEECH_TRANSCRIPTION]
    config = vi.SpeechTranscriptionConfig(
        language_code=language_code,
        enable_automatic_punctuation=True,
    )
    context = vi.VideoContext(
        segments=segments,
        speech_transcription_config=config,
    )
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the SPEECH_TRANSCRIPTION parameter to analyze a video and transcribe speech.

Call the function to analyze the video from seconds 55 to 80:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
language_code = "en-GB"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=55),
    end_time_offset=timedelta(seconds=80),
)

results = transcribe_speech(video_uri, language_code, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out transcribed speech:

def print_video_speech(results: vi.VideoAnnotationResults, min_confidence: float = 0.8):
    def keep_transcription(transcription: vi.SpeechTranscription) -> bool:
        return min_confidence <= transcription.alternatives[0].confidence

    transcriptions = results.speech_transcriptions
    transcriptions = [t for t in transcriptions if keep_transcription(t)]

    print(f" Speech transcriptions: {len(transcriptions)} ".center(80, "-"))
    for transcription in transcriptions:
        first_alternative = transcription.alternatives[0]
        confidence = first_alternative.confidence
        transcript = first_alternative.transcript
        print(f" {confidence:4.0%} | {transcript.strip()}")

Call the function:

print_video_speech(results)

You should see something like this:

--------------------------- Speech transcriptions: 2 ---------------------------
  91% | I was keenly aware of secret movements in the trees.
  92% | I looked into his large and lustrous eyes. They seem somehow to express his entire personality.

Add this function to print out the list of detected words and their timestamps:

def print_word_timestamps(
    results: vi.VideoAnnotationResults,
    min_confidence: float = 0.8,
):
    def keep_transcription(transcription: vi.SpeechTranscription) -> bool:
        return min_confidence <= transcription.alternatives[0].confidence

    transcriptions = results.speech_transcriptions
    transcriptions = [t for t in transcriptions if keep_transcription(t)]

    print(" Word timestamps ".center(80, "-"))
    for transcription in transcriptions:
        first_alternative = transcription.alternatives[0]
        confidence = first_alternative.confidence
        for word in first_alternative.words:
            t1 = word.start_time.total_seconds()
            t2 = word.end_time.total_seconds()
            word = word.word
            print(f"{confidence:4.0%} | {t1:7.3f} | {t2:7.3f} | {word}")

Call the function:

print_word_timestamps(results)

You should see something like this:

------------------------------- Word timestamps --------------------------------
 93% |  55.000 |  55.700 | I
 93% |  55.700 |  55.900 | was
 93% |  55.900 |  56.300 | keenly
 93% |  56.300 |  56.700 | aware
 93% |  56.700 |  56.900 | of
...
 94% |  76.900 |  77.400 | express
 94% |  77.400 |  77.600 | his
 94% |  77.600 |  78.200 | entire
 94% |  78.200 |  78.500 | personality.
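
Word-level timestamps are enough to build simple subtitles. The following is a minimal sketch, not part of the codelab, that groups words into fixed-size cues and renders them in the SubRip (SRT) layout. The helpers `srt_time` and `to_srt` and the `max_words` grouping rule are assumptions for illustration; the input is a list of plain tuples like the rows printed above.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(words: List[Word], max_words: int = 5) -> str:
    """Group words into cues of at most max_words and render them as SRT."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        start, end = chunk[0][0], chunk[-1][1]
        text = " ".join(word for _, _, word in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# First five words from the sample output above
words = [
    (55.0, 55.7, "I"),
    (55.7, 55.9, "was"),
    (55.9, 56.3, "keenly"),
    (56.3, 56.7, "aware"),
    (56.7, 56.9, "of"),
]
print(to_srt(words))
```

Splitting on a fixed word count keeps the sketch short; splitting on punctuation or pauses would probably read better in practice.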

Summary

In this step, you were able to perform speech transcription on a video using the Video Intelligence API. You can read more about transcribing speech.

9 Detect and track text

You can use the Video Intelligence API to detect and track text in a video.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_text(
    video_uri: str,
    language_hints: Optional[Sequence[str]] = None,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.TEXT_DETECTION]
    config = vi.TextDetectionConfig(
        language_hints=language_hints,
    )
    context = vi.VideoContext(
        segments=segments,
        text_detection_config=config,
    )
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the TEXT_DETECTION parameter to analyze a video and detect text.

Call the function to analyze the video from seconds 13 to 27:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=13),
    end_time_offset=timedelta(seconds=27),
)

results = detect_text(video_uri, segments=[segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out detected text:

def print_video_text(results: vi.VideoAnnotationResults, min_frames: int = 15):
    annotations = sorted_by_first_segment_end(results.text_annotations)

    print(" Detected text ".center(80, "-"))
    for annotation in annotations:
        for text_segment in annotation.segments:
            frames = len(text_segment.frames)
            if frames < min_frames:
                continue
            text = annotation.text
            confidence = text_segment.confidence
            start = text_segment.segment.start_time_offset
            seconds = segment_seconds(text_segment.segment)
            print(text)
            print(f"  {confidence:4.0%} | {start} + {seconds:.1f}s | {frames} fr.")


def sorted_by_first_segment_end(
    annotations: Sequence[vi.TextAnnotation],
) -> Sequence[vi.TextAnnotation]:
    def first_segment_end(annotation: vi.TextAnnotation) -> float:
        return annotation.segments[0].segment.end_time_offset.total_seconds()

    return sorted(annotations, key=first_segment_end)


def segment_seconds(segment: vi.VideoSegment) -> float:
    t1 = segment.start_time_offset.total_seconds()
    t2 = segment.end_time_offset.total_seconds()
    return t2 - t1

Call the function:

print_video_text(results)

You should see something like this:

-------------------------------- Detected text ---------------------------------
GOMBE NATIONAL PARK
   99% | 0:00:15.760000 + 1.7s | 15 fr.
TANZANIA
  100% | 0:00:15.760000 + 4.8s | 39 fr.
With words and narration by
  100% | 0:00:23.200000 + 3.6s | 31 fr.
Jane Goodall
   99% | 0:00:23.080000 + 3.8s | 33 fr.

Add this function to print out the list of detected text frames and bounding boxes:

def print_text_frames(results: vi.VideoAnnotationResults, contained_text: str):
    # Vertex order: top-left, top-right, bottom-right, bottom-left
    def box_top_left(box: vi.NormalizedBoundingPoly) -> str:
        tl = box.vertices[0]
        return f"({tl.x:.5f}, {tl.y:.5f})"

    def box_bottom_right(box: vi.NormalizedBoundingPoly) -> str:
        br = box.vertices[2]
        return f"({br.x:.5f}, {br.y:.5f})"

    annotations = results.text_annotations
    annotations = [a for a in annotations if contained_text in a.text]
    for annotation in annotations:
        print(f" {annotation.text} ".center(80, "-"))
        for text_segment in annotation.segments:
            for frame in text_segment.frames:
                frame_ms = frame.time_offset.total_seconds()
                box = frame.rotated_bounding_box
                print(
                    f"{frame_ms:>7.3f}",
                    box_top_left(box),
                    box_bottom_right(box),
                    sep=" | ",
                )

Call the function to check which frames show the narrator's name:

contained_text = "Goodall"
print_text_frames(results, contained_text)

You should see something like this:

--------------------------------- Jane Goodall ---------------------------------
 23.080 | (0.39922, 0.49861) | (0.62752, 0.55888)
 23.200 | (0.38750, 0.49028) | (0.62692, 0.56306)
...
 26.800 | (0.36016, 0.49583) | (0.61094, 0.56048)
 26.920 | (0.45859, 0.49583) | (0.60365, 0.56174)

If you draw the bounding boxes on top of the corresponding frames, you can see where the text was detected in each frame.
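
The vertex coordinates are normalized to the range 0 to 1, so drawing them requires scaling by the frame size first. The sketch below shows that conversion only. The helper `to_pixels` is hypothetical, the 1280x720 frame size is an assumed value for illustration, and using just two vertices assumes the box is not rotated.

```python
from typing import Tuple


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Convert normalized [0, 1] coordinates to integer pixel coordinates."""
    return round(x * width), round(y * height)


# Assumed frame size for illustration; use the real resolution of your video
WIDTH, HEIGHT = 1280, 720

# Top-left and bottom-right vertices of the first "Jane Goodall" frame above
top_left = to_pixels(0.39922, 0.49861, WIDTH, HEIGHT)
bottom_right = to_pixels(0.62752, 0.55888, WIDTH, HEIGHT)
print(top_left, bottom_right)
```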

Summary

In this step, you were able to perform text detection and tracking on a video using the Video Intelligence API. You can read more about detecting and tracking text.

10 Detect and track objects

You can use the Video Intelligence API to detect and track objects in a video.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def track_objects(
    video_uri: str, segments: Optional[Sequence[vi.VideoSegment]] = None
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.OBJECT_TRACKING]
    context = vi.VideoContext(segments=segments)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the OBJECT_TRACKING parameter to analyze a video and detect objects.

Call the function to analyze the video from seconds 98 to 112:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=98),
    end_time_offset=timedelta(seconds=112),
)

results = track_objects(video_uri, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the list of detected objects:

def print_detected_objects(
    results: vi.VideoAnnotationResults,
    min_confidence: float = 0.7,
):
    annotations = results.object_annotations
    annotations = [a for a in annotations if min_confidence <= a.confidence]

    print(
        f" Detected objects: {len(annotations)}"
        f" ({min_confidence:.0%} <= confidence) ".center(80, "-")
    )
    for annotation in annotations:
        entity = annotation.entity
        description = entity.description
        entity_id = entity.entity_id
        confidence = annotation.confidence
        t1 = annotation.segment.start_time_offset.total_seconds()
        t2 = annotation.segment.end_time_offset.total_seconds()
        frames = len(annotation.frames)
        print(
            f"{description:<22}",
            f"{entity_id:<10}",
            f"{confidence:4.0%}",
            f"{t1:>7.3f}",
            f"{t2:>7.3f}",
            f"{frames:>2} fr.",
            sep=" | ",
        )

Call the function:

print_detected_objects(results)

You should see something like this:

------------------- Detected objects: 3 (70% <= confidence) --------------------
insect                 | /m/03vt0   |  87% |  98.840 | 101.720 | 25 fr.
insect                 | /m/03vt0   |  71% | 108.440 | 111.080 | 23 fr.
butterfly              | /m/0cyf8   |  91% | 111.200 | 111.920 |  7 fr.

Add this function to print out the list of detected object frames and bounding boxes:

def print_object_frames(
    results: vi.VideoAnnotationResults,
    entity_id: str,
    min_confidence: float = 0.7,
):
    def keep_annotation(annotation: vi.ObjectTrackingAnnotation) -> bool:
        return (
            annotation.entity.entity_id == entity_id
            and min_confidence <= annotation.confidence
        )

    annotations = results.object_annotations
    annotations = [a for a in annotations if keep_annotation(a)]
    for annotation in annotations:
        description = annotation.entity.description
        confidence = annotation.confidence
        print(
            f" {description},"
            f" confidence: {confidence:.0%},"
            f" frames: {len(annotation.frames)} ".center(80, "-")
        )
        for frame in annotation.frames:
            t = frame.time_offset.total_seconds()
            box = frame.normalized_bounding_box
            print(
                f"{t:>7.3f}",
                f"({box.left:.5f}, {box.top:.5f})",
                f"({box.right:.5f}, {box.bottom:.5f})",
                sep=" | ",
            )

Call the function with the entity ID for insects:

insect_entity_id = "/m/03vt0"
print_object_frames(results, insect_entity_id)

You should see something like this:

--------------------- insect, confidence: 87%, frames: 25 ----------------------
 98.840 | (0.49327, 0.19617) | (0.69905, 0.69633)
 98.960 | (0.49559, 0.19308) | (0.70631, 0.69671)
...
101.600 | (0.46668, 0.19776) | (0.76619, 0.69371)
101.720 | (0.46805, 0.20053) | (0.76447, 0.68703)
--------------------- insect, confidence: 71%, frames: 23 ----------------------
108.440 | (0.47343, 0.10694) | (0.63821, 0.98332)
108.560 | (0.46960, 0.10206) | (0.63033, 0.98285)
...
110.960 | (0.49466, 0.05102) | (0.65941, 0.99357)
111.080 | (0.49572, 0.04728) | (0.65762, 0.99868)

If you draw the bounding boxes on top of the corresponding frames, you can see how each object is tracked from frame to frame.
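
Even without drawing anything, the box values show how a tracked object moves and how much of the frame it fills. The sketch below uses a stand-in `Box` tuple with the same left/top/right/bottom layout as the values printed above; `Box`, `center`, and `coverage` are hypothetical names, not library types.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Normalized box edges, mirroring the left/top/right/bottom values above."""

    left: float
    top: float
    right: float
    bottom: float


def center(box: Box) -> Tuple[float, float]:
    """Return the normalized (x, y) center of the box."""
    return (box.left + box.right) / 2, (box.top + box.bottom) / 2


def coverage(box: Box) -> float:
    """Return the fraction of the frame area covered by the box."""
    return (box.right - box.left) * (box.bottom - box.top)


# First and last frames of the first insect track in the sample output above
first = Box(0.49327, 0.19617, 0.69905, 0.69633)
last = Box(0.46805, 0.20053, 0.76447, 0.68703)
for box in (first, last):
    x, y = center(box)
    print(f"center=({x:.3f}, {y:.3f}) coverage={coverage(box):.1%}")
```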

Summary

In this step, you were able to perform object detection and tracking on a video using the Video Intelligence API. You can read more about detecting and tracking objects.

11 Detect and track logos

You can use the Video Intelligence API to detect and track logos in a video. Over 100,000 brands and logos can be detected.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_logos(
    video_uri: str, segments: Optional[Sequence[vi.VideoSegment]] = None
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.LOGO_RECOGNITION]
    context = vi.VideoContext(segments=segments)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the LOGO_RECOGNITION parameter to analyze a video and detect logos.

Call the function to analyze a sequence near the end of the video (seconds 146 to 156):

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=146),
    end_time_offset=timedelta(seconds=156),
)

results = detect_logos(video_uri, [segment])

Wait for the video to be processed:

Processing video "gs://cloud-samples-data/video/JaneGoodall.mp4"...
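Since detect_logos accepts a sequence of segments, a longer range can also be passed as several shorter ones. The sketch below only builds the (start, end) offsets with the standard library; `split_into_ranges` is a hypothetical helper, and the 4-second chunk size is an arbitrary assumption. Each pair could then be wrapped in a vi.VideoSegment, as in the call above.

```python
from datetime import timedelta
from typing import List, Tuple


def split_into_ranges(
    start_s: int, end_s: int, chunk_s: int
) -> List[Tuple[timedelta, timedelta]]:
    """Split [start_s, end_s] into consecutive ranges of at most chunk_s seconds."""
    if chunk_s <= 0 or end_s <= start_s:
        raise ValueError("expected chunk_s > 0 and end_s > start_s")

    ranges = []
    current = start_s
    while current < end_s:
        upper = min(current + chunk_s, end_s)
        ranges.append((timedelta(seconds=current), timedelta(seconds=upper)))
        current = upper
    return ranges


# Same 146 s to 156 s range as above, in chunks of at most 4 seconds
ranges = split_into_ranges(146, 156, 4)
for start, end in ranges:
    # With the client library installed, each pair maps to:
    # vi.VideoSegment(start_time_offset=start, end_time_offset=end)
    print(start.total_seconds(), end.total_seconds())
```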

Add this function to print out the list of detected logos:

def print_detected_logos(results: vi.VideoAnnotationResults):
    annotations = results.logo_recognition_annotations

    print(f" Detected logos: {len(annotations)} ".center(80, "-"))
    for annotation in annotations:
        entity = annotation.entity
        entity_id = entity.entity_id
        description = entity.description
        for track in annotation.tracks:
            confidence = track.confidence
            t1 = track.segment.start_time_offset.total_seconds()
            t2 = track.segment.end_time_offset.total_seconds()
            logo_frames = len(track.timestamped_objects)
            print(
                f"{confidence:4.0%}",
                f"{t1:>7.3f}",
                f"{t2:>7.3f}",
                f"{logo_frames:>3} fr.",
                f"{entity_id:<15}",
                f"{description}",
                sep=" | ",
            )

Call the function:

print_detected_logos(results)

You should see something like this:

------------------------------ Detected logos: 1 -------------------------------
 92% | 150.680 | 155.720 |  43 fr. | /m/055t58       | Google Maps
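The track summary is enough to estimate how often the logo was sampled. As a rough sketch, assuming the frames of a track are evenly spaced (`average_frame_interval` is a hypothetical helper, not part of the client library):

```python
def average_frame_interval(start_s: float, end_s: float, frame_count: int) -> float:
    """Average time between consecutive frames of a track, in seconds."""
    if frame_count < 2:
        raise ValueError("at least two frames are needed to measure an interval")
    # N frames span N - 1 intervals
    return (end_s - start_s) / (frame_count - 1)


# Values from the "Google Maps" track above: 150.680 s to 155.720 s, 43 frames
interval = average_frame_interval(150.680, 155.720, 43)
print(f"{interval:.3f} s between frames")  # 0.120 s between frames
```

An interval of 0.12 s corresponds to roughly 8 sampled frames per second for this track.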

Add this function to print out the list of detected logo frames and bounding boxes:

def print_logo_frames(results: vi.VideoAnnotationResults, entity_id: str):
    def keep_annotation(annotation: vi.LogoRecognitionAnnotation) -> bool:
        return annotation.entity.entity_id == entity_id

    annotations = results.logo_recognition_annotations
    annotations = [a for a in annotations if keep_annotation(a)]
    for annotation in annotations:
        description = annotation.entity.description
        for track in annotation.tracks:
            confidence = track.confidence
            print(
                f" {description},"
                f" confidence: {confidence:.0%},"
                f" frames: {len(track.timestamped_objects)} ".center(80, "-")
            )
            for timestamped_object in track.timestamped_objects:
                t = timestamped_object.time_offset.total_seconds()
                box = timestamped_object.normalized_bounding_box
                print(
                    f"{t:>7.3f}",
                    f"({box.left:.5f}, {box.top:.5f})",
                    f"({box.right:.5f}, {box.bottom:.5f})",
                    sep=" | ",
                )

Call the function with the Google Maps logo entity ID:

maps_entity_id = "/m/055t58"
print_logo_frames(results, maps_entity_id)

You should see something like this:

------------------- Google Maps, confidence: 92%, frames: 43 -------------------
150.680 | (0.42024, 0.28633) | (0.58192, 0.64220)
150.800 | (0.41713, 0.27822) | (0.58318, 0.63556)
...
155.600 | (0.41775, 0.27701) | (0.58372, 0.63986)
155.720 | (0.41688, 0.28005) | (0.58335, 0.63954)

Drawing these bounding boxes on top of the corresponding frames shows where the logo appears in each frame.
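The boxes are only reported at sampled timestamps, so drawing on a frame that falls between two samples requires an estimate. One simple option is linear interpolation between the neighboring samples. This is an illustrative sketch, not something the API does for you; `interpolate_box` is a hypothetical helper and assumes the logo moves smoothly between samples.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), normalized


def interpolate_box(t: float, t0: float, box0: Box, t1: float, box1: Box) -> Box:
    """Linearly interpolate a bounding box at time t, with t0 <= t <= t1."""
    if t0 >= t1 or not t0 <= t <= t1:
        raise ValueError("expected t0 < t1 and t0 <= t <= t1")

    ratio = (t - t0) / (t1 - t0)
    left, top, right, bottom = (a + (b - a) * ratio for a, b in zip(box0, box1))
    return (left, top, right, bottom)


# First two "Google Maps" frames printed above
first = (0.42024, 0.28633, 0.58192, 0.64220)   # at 150.680 s
second = (0.41713, 0.27822, 0.58318, 0.63556)  # at 150.800 s

midway = interpolate_box(150.740, 150.680, first, 150.800, second)
print(tuple(round(v, 5) for v in midway))
```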

Summary

In this step, you detected and tracked logos in a video using the Video Intelligence API. You can read more about detecting and tracking logos in the Video Intelligence documentation.

12 Detect multiple features

Here is the kind of request you can make to get all insights at once:

from google.cloud import videointelligence_v1 as vi

video_client = vi.VideoIntelligenceServiceClient()
video_uri = "gs://..."
features = [
    vi.Feature.SHOT_CHANGE_DETECTION,
    vi.Feature.LABEL_DETECTION,
    vi.Feature.EXPLICIT_CONTENT_DETECTION,
    vi.Feature.SPEECH_TRANSCRIPTION,
    vi.Feature.TEXT_DETECTION,
    vi.Feature.OBJECT_TRACKING,
    vi.Feature.LOGO_RECOGNITION,
    vi.Feature.FACE_DETECTION,  # NEW
    vi.Feature.PERSON_DETECTION,  # NEW
]
context = vi.VideoContext(
    segments=...,
    shot_change_detection_config=...,
    label_detection_config=...,
    explicit_content_detection_config=...,
    speech_transcription_config=...,
    text_detection_config=...,
    object_tracking_config=...,
    face_detection_config=...,  # NEW
    person_detection_config=...,  # NEW
)
request = vi.AnnotateVideoRequest(
    input_uri=video_uri,
    features=features,
    video_context=context,
)

# video_client.annotate_video(request)
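When several features are requested at once, each one populates its own field of the returned results. The sketch below maps feature names to result fields and collects whichever were requested. The field names are assumptions based on the v1 VideoAnnotationResults message (only logo_recognition_annotations appears in this section), so check them against the API reference. `RESULT_FIELDS` and `collect_results` are hypothetical names, and a SimpleNamespace stands in for a real response so the example runs without credentials.

```python
from types import SimpleNamespace
from typing import Any, Dict, Sequence, Tuple

# Assumed mapping from feature name to result field(s); verify against the API reference.
# Label detection can also fill frame-level fields depending on its configured mode.
RESULT_FIELDS: Dict[str, Tuple[str, ...]] = {
    "SHOT_CHANGE_DETECTION": ("shot_annotations",),
    "LABEL_DETECTION": ("segment_label_annotations", "shot_label_annotations"),
    "EXPLICIT_CONTENT_DETECTION": ("explicit_annotation",),
    "SPEECH_TRANSCRIPTION": ("speech_transcriptions",),
    "TEXT_DETECTION": ("text_annotations",),
    "OBJECT_TRACKING": ("object_annotations",),
    "LOGO_RECOGNITION": ("logo_recognition_annotations",),
    "FACE_DETECTION": ("face_detection_annotations",),
    "PERSON_DETECTION": ("person_detection_annotations",),
}


def collect_results(results: Any, feature_names: Sequence[str]) -> Dict[str, Any]:
    """Return the result fields that correspond to the requested features."""
    collected = {}
    for name in feature_names:
        for field in RESULT_FIELDS[name]:
            collected[field] = getattr(results, field)
    return collected


# Stand-in for a real response object (placeholder values, not API output)
fake_results = SimpleNamespace(
    shot_annotations=["shot 1", "shot 2"],
    logo_recognition_annotations=["Google Maps"],
)
selected = collect_results(fake_results, ["SHOT_CHANGE_DETECTION", "LOGO_RECOGNITION"])
for field, value in selected.items():
    print(f"{field}: {len(value)} item(s)")
```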

13 Congratulations!

You learned how to use the Video Intelligence API with Python.

Clean up

To clean up your development environment, from Cloud Shell:

If you're still in your IPython session, go back to the shell: exit
Stop using the Python virtual environment: deactivate
Delete your virtual environment folder: cd ~ ; rm -rf ./venv-videointel

To delete your Google Cloud project, from Cloud Shell:

Retrieve your current project ID: PROJECT_ID=$(gcloud config get-value core/project)
Make sure this is the project you want to delete: echo $PROJECT_ID
Delete the project: gcloud projects delete $PROJECT_ID

Learn more

Test the demo in your browser: https://zackakil.github.io/video-intelligence-api-visualiser
Video Intelligence documentation: https://cloud.google.com/video-intelligence/docs
Beta features: https://cloud.google.com/video-intelligence/docs/beta
Python on Google Cloud: https://cloud.google.com/python
Cloud Client Libraries for Python: https://github.com/googleapis/google-cloud-python

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.
