Using the Video Intelligence API with Python

About this codelab

Last updated Apr 4, 2023

Written by Laurent Picard

1. Overview

The Video Intelligence API lets you use Google video analysis technology as part of your applications.

In this lab, you will focus on using the Video Intelligence API with Python.

What you'll learn

How to set up your environment
How to set up Python
How to detect shot changes
How to detect labels
How to detect explicit content
How to transcribe speech
How to detect and track text
How to detect and track objects
How to detect and track logos

What you'll need

A Google Cloud project
A browser, such as Chrome or Firefox
Familiarity with Python

Self-paced environment setup

Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string that is not used by Google APIs. You can update it at any time.
The Project ID is unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console automatically generates a unique string, and you usually don't need to care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one. Alternatively, you can try your own ID and see whether it is available. It cannot be changed after this step and remains for the duration of the project.
For your information, there is a third value, a Project Number, which some APIs use. You can learn more about all three of these values in the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources and APIs. Running through this codelab should not cost much. To avoid incurring charges beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD free trial.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will use Cloud Shell, a command-line environment running in the Cloud.

Activate Cloud Shell

In the Cloud Console, click the Activate Cloud Shell icon.

If this is your first time starting Cloud Shell, you are presented with an intermediate screen describing what it is. If so, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools you need. It offers a persistent 5 GB home directory and runs in Google Cloud, which greatly improves network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Environment setup

Before you can begin using the Video Intelligence API, run the following command in Cloud Shell to enable the API:

gcloud services enable videointelligence.googleapis.com

You should see something like this:

Operation "operations/..." finished successfully.

Now you can use the Video Intelligence API.

Navigate to your home directory:

cd ~

Create a Python virtual environment to isolate the dependencies:

virtualenv venv-videointel

Activate the virtual environment:

source venv-videointel/bin/activate

Install IPython and the Video Intelligence API client library:

pip install ipython google-cloud-videointelligence

You should see something like this:

...
Installing collected packages: ..., ipython, google-cloud-videointelligence
Successfully installed ... google-cloud-videointelligence-2.11.0 ...

Now you're ready to use the Video Intelligence API client library.
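
As an optional sanity check, the installed client library version can also be read from within Python. The following is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and not part of the codelab.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string reported by pip, or None if the package is not installed
print(installed_version("google-cloud-videointelligence"))
```

If the call prints None, the virtual environment is probably not active or the pip install step did not complete.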

In the next steps, you'll use an interactive Python interpreter called IPython, which you installed in the previous step. Start a session by running ipython in Cloud Shell:

ipython

You should see something like this:

Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

4. Sample video

You can use the Video Intelligence API to annotate videos stored in Cloud Storage or provided as data bytes.

In the next steps, you will use a sample video stored in Cloud Storage. You can view the video in your browser.

If you're ready, let's get started.

5. Detect shot changes

You can use the Video Intelligence API to detect shot changes in a video. A shot is a segment of the video: a series of frames with visual continuity.

Copy the following code into your IPython session:

from typing import cast

from google.cloud import videointelligence_v1 as vi


def detect_shot_changes(video_uri: str) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.SHOT_CHANGE_DETECTION]
    request = vi.AnnotateVideoRequest(input_uri=video_uri, features=features)

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the SHOT_CHANGE_DETECTION parameter to analyze a video and detect shot changes.

Call the function to analyze the video:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"

results = detect_shot_changes(video_uri)

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the video shots:

def print_video_shots(results: vi.VideoAnnotationResults):
    shots = results.shot_annotations
    print(f" Video shots: {len(shots)} ".center(40, "-"))
    for i, shot in enumerate(shots):
        t1 = shot.start_time_offset.total_seconds()
        t2 = shot.end_time_offset.total_seconds()
        print(f"{i+1:>3} | {t1:7.3f} | {t2:7.3f}")

Call the function:

print_video_shots(results)

You should see something like this:

----------- Video shots: 34 ------------
  1 |   0.000 |  12.880
  2 |  12.920 |  21.680
  3 |  21.720 |  27.880
...
 32 | 135.160 | 138.320
 33 | 138.360 | 146.200
 34 | 146.240 | 162.520

If you extract the middle frame of each shot and arrange the frames in a wall, you can generate a visual summary of the video.
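
Finding the middle of each shot is the first step toward such a summary. The following is a minimal sketch that works on plain (start, end) pairs of timedelta values rather than on API results; the Shot alias and the shot_midpoints helper are hypothetical names, and extracting the actual frames from the video is not covered here.

```python
from datetime import timedelta
from typing import List, Sequence, Tuple

# (start offset, end offset) of a shot; hypothetical alias for illustration
Shot = Tuple[timedelta, timedelta]


def shot_midpoints(shots: Sequence[Shot]) -> List[timedelta]:
    """Return the time offset halfway through each shot."""
    return [start + (end - start) / 2 for start, end in shots]


# First three shots from the sample output above
shots = [
    (timedelta(seconds=0), timedelta(milliseconds=12880)),
    (timedelta(milliseconds=12920), timedelta(milliseconds=21680)),
    (timedelta(milliseconds=21720), timedelta(milliseconds=27880)),
]
for midpoint in shot_midpoints(shots):
    print(f"{midpoint.total_seconds():7.3f}")
```

With real results, the pairs could be built from shot.start_time_offset and shot.end_time_offset, which print_video_shots already reads.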

Summary

In this step, you performed shot change detection on a video using the Video Intelligence API. You can read more about detecting shot changes.

6. Detect labels

You can use the Video Intelligence API to detect labels in a video. Labels describe the video based on its visual content.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_labels(
    video_uri: str,
    mode: vi.LabelDetectionMode,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.LABEL_DETECTION]
    config = vi.LabelDetectionConfig(label_detection_mode=mode)
    context = vi.VideoContext(segments=segments, label_detection_config=config)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the LABEL_DETECTION parameter to analyze a video and detect labels.

Call the function to analyze the first 37 seconds of the video:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
mode = vi.LabelDetectionMode.SHOT_MODE
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=0),
    end_time_offset=timedelta(seconds=37),
)

results = detect_labels(video_uri, mode, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the labels at the video level:

def print_video_labels(results: vi.VideoAnnotationResults):
    labels = sorted_by_first_segment_confidence(results.segment_label_annotations)

    print(f" Video labels: {len(labels)} ".center(80, "-"))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        for segment in label.segments:
            confidence = segment.confidence
            t1 = segment.segment.start_time_offset.total_seconds()
            t2 = segment.segment.end_time_offset.total_seconds()
            print(
                f"{confidence:4.0%}",
                f"{t1:7.3f}",
                f"{t2:7.3f}",
                f"{label.entity.description}{categories}",
                sep=" | ",
            )


def sorted_by_first_segment_confidence(
    labels: Sequence[vi.LabelAnnotation],
) -> Sequence[vi.LabelAnnotation]:
    def first_segment_confidence(label: vi.LabelAnnotation) -> float:
        return label.segments[0].confidence

    return sorted(labels, key=first_segment_confidence, reverse=True)


def category_entities_to_str(category_entities: Sequence[vi.Entity]) -> str:
    if not category_entities:
        return ""
    entities = ", ".join([e.description for e in category_entities])
    return f" ({entities})"

Call the function:

print_video_labels(results)

You should see something like this:

------------------------------- Video labels: 10 -------------------------------
 96% |   0.000 |  36.960 | nature
 74% |   0.000 |  36.960 | vegetation
 59% |   0.000 |  36.960 | tree (plant)
 56% |   0.000 |  36.960 | forest (geographical feature)
 49% |   0.000 |  36.960 | leaf (plant)
 43% |   0.000 |  36.960 | flora (plant)
 38% |   0.000 |  36.960 | nature reserve (geographical feature)
 38% |   0.000 |  36.960 | woodland (forest)
 35% |   0.000 |  36.960 | water resources (water)
 32% |   0.000 |  36.960 | sunlight (light)

Thanks to these video-level labels, you can tell that the beginning of the video is mostly about nature and vegetation.

Add this function to print out the labels at the shot level:

def print_shot_labels(results: vi.VideoAnnotationResults):
    labels = sorted_by_first_segment_start_and_confidence(
        results.shot_label_annotations
    )

    print(f" Shot labels: {len(labels)} ".center(80, "-"))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        print(f"{label.entity.description}{categories}")
        for segment in label.segments:
            confidence = segment.confidence
            t1 = segment.segment.start_time_offset.total_seconds()
            t2 = segment.segment.end_time_offset.total_seconds()
            print(f"{confidence:4.0%} | {t1:7.3f} | {t2:7.3f}")


def sorted_by_first_segment_start_and_confidence(
    labels: Sequence[vi.LabelAnnotation],
) -> Sequence[vi.LabelAnnotation]:
    def first_segment_start_and_confidence(label: vi.LabelAnnotation):
        first_segment = label.segments[0]
        start = first_segment.segment.start_time_offset.total_seconds()
        return (start, -first_segment.confidence)

    return sorted(labels, key=first_segment_start_and_confidence)

Call the function:

print_shot_labels(results)

You should see something like this:

------------------------------- Shot labels: 29 --------------------------------
planet (astronomical object)
 83% |   0.000 |  12.880
earth (planet)
 53% |   0.000 |  12.880
water resources (water)
 43% |   0.000 |  12.880
aerial photography (photography)
 43% |   0.000 |  12.880
vegetation
 32% |   0.000 |  12.880
 92% |  12.920 |  21.680
 83% |  21.720 |  27.880
 77% |  27.920 |  31.800
 76% |  31.840 |  34.720
...
butterfly (insect, animal)
 84% |  34.760 |  36.960
...

Thanks to these shot-level labels, you can tell that the video starts with a shot of a planet (likely Earth), that there is a butterfly in the 34.760-36.960s shot,...
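
The ordering in the output comes from the tuple sort key used by sorted_by_first_segment_start_and_confidence: earliest start first, then highest confidence first. The following is a minimal sketch of that key on plain tuples, with values taken from the sample output; the LabelRow alias and the start_then_confidence helper are hypothetical names.

```python
from typing import List, Tuple

# (description, first segment start in seconds, confidence); hypothetical alias
LabelRow = Tuple[str, float, float]

rows: List[LabelRow] = [
    ("vegetation", 0.000, 0.32),
    ("butterfly", 34.760, 0.84),
    ("earth", 0.000, 0.53),
    ("planet", 0.000, 0.83),
]


def start_then_confidence(row: LabelRow) -> Tuple[float, float]:
    """Sort key: earliest start first, then highest confidence first."""
    _, start, confidence = row
    # Negating the confidence reverses its order without reversing the start order
    return (start, -confidence)


ordered = sorted(rows, key=start_then_confidence)
print([description for description, _, _ in ordered])
```

Negating one component of the key is a common way to mix ascending and descending criteria in a single sorted call.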

Summary

In this step, you performed label detection on a video using the Video Intelligence API. You can read more about detecting labels.

7. Detect explicit content

You can use the Video Intelligence API to detect explicit content in a video. Explicit content is adult content that is generally inappropriate for those under 18 years of age and includes, but is not limited to, nudity, sexual activities, and pornography. Detection is performed based on per-frame visual signals only (audio is not used). The response includes likelihood values ranging from VERY_UNLIKELY to VERY_LIKELY.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_explicit_content(
    video_uri: str,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.EXPLICIT_CONTENT_DETECTION]
    context = vi.VideoContext(segments=segments)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the EXPLICIT_CONTENT_DETECTION parameter to analyze a video and detect explicit content.

Call the function to analyze the first 10 seconds of the video:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=0),
    end_time_offset=timedelta(seconds=10),
)

results = detect_explicit_content(video_uri, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the different likelihood counts:

def print_explicit_content(results: vi.VideoAnnotationResults):
    from collections import Counter

    frames = results.explicit_annotation.frames
    likelihood_counts = Counter([f.pornography_likelihood for f in frames])

    print(f" Explicit content frames: {len(frames)} ".center(40, "-"))
    for likelihood in vi.Likelihood:
        print(f"{likelihood.name:<22}: {likelihood_counts[likelihood]:>3}")

Call the function:

print_explicit_content(results)

You should see something like this:

----- Explicit content frames: 10 ------
LIKELIHOOD_UNSPECIFIED:   0
VERY_UNLIKELY         :  10
UNLIKELY              :   0
POSSIBLE              :   0
LIKELY                :   0
VERY_LIKELY           :   0

Add this function to print out frame details:

def print_frames(results: vi.VideoAnnotationResults, likelihood: vi.Likelihood):
    frames = results.explicit_annotation.frames
    frames = [f for f in frames if f.pornography_likelihood == likelihood]

    print(f" {likelihood.name} frames: {len(frames)} ".center(40, "-"))
    for frame in frames:
        print(frame.time_offset)

Call the function:

print_frames(results, vi.Likelihood.VERY_UNLIKELY)

You should see something like this:

------- VERY_UNLIKELY frames: 10 -------
0:00:00.365992
0:00:01.279206
0:00:02.268336
0:00:03.289253
0:00:04.400163
0:00:05.291547
0:00:06.449558
0:00:07.452751
0:00:08.577405
0:00:09.554514
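
Because the likelihood values are ordered, a common follow-up is to keep only the frames at or above a chosen level. The following is a minimal sketch that runs without the client library: the Likelihood class is a stand-in that mirrors the names and order shown in the output above, flagged_frames is a hypothetical helper, and the frame data is made up for illustration. It assumes the real enum compares the same way.

```python
from enum import IntEnum
from typing import List, Sequence, Tuple


class Likelihood(IntEnum):
    """Stand-in for vi.Likelihood so the sketch runs without the client library."""

    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


# (time offset in seconds, likelihood); hypothetical alias
Frame = Tuple[float, Likelihood]


def flagged_frames(
    frames: Sequence[Frame],
    threshold: Likelihood = Likelihood.LIKELY,
) -> List[Frame]:
    """Keep the frames whose likelihood is at or above the threshold."""
    return [frame for frame in frames if frame[1] >= threshold]


# Made-up frames for illustration only (not results from the sample video)
frames = [
    (0.4, Likelihood.VERY_UNLIKELY),
    (1.3, Likelihood.POSSIBLE),
    (2.3, Likelihood.LIKELY),
    (3.3, Likelihood.VERY_LIKELY),
]
for offset, likelihood in flagged_frames(frames):
    print(f"{offset:7.3f} | {likelihood.name}")
```

The threshold is a policy choice: a lower value such as POSSIBLE flags more frames for review, at the cost of more false positives.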

Summary

In this step, you performed explicit content detection on a video using the Video Intelligence API. You can read more about detecting explicit content.

8. Transcribe speech

You can use the Video Intelligence API to transcribe the speech in a video into text.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def transcribe_speech(
    video_uri: str,
    language_code: str,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.SPEECH_TRANSCRIPTION]
    config = vi.SpeechTranscriptionConfig(
        language_code=language_code,
        enable_automatic_punctuation=True,
    )
    context = vi.VideoContext(
        segments=segments,
        speech_transcription_config=config,
    )
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the SPEECH_TRANSCRIPTION parameter to analyze a video and transcribe speech.

Call the function to analyze the video from seconds 55 to 80:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
language_code = "en-GB"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=55),
    end_time_offset=timedelta(seconds=80),
)

results = transcribe_speech(video_uri, language_code, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the transcribed speech:

def print_video_speech(results: vi.VideoAnnotationResults, min_confidence: float = 0.8):
    def keep_transcription(transcription: vi.SpeechTranscription) -> bool:
        return min_confidence <= transcription.alternatives[0].confidence

    transcriptions = results.speech_transcriptions
    transcriptions = [t for t in transcriptions if keep_transcription(t)]

    print(f" Speech transcriptions: {len(transcriptions)} ".center(80, "-"))
    for transcription in transcriptions:
        first_alternative = transcription.alternatives[0]
        confidence = first_alternative.confidence
        transcript = first_alternative.transcript
        print(f" {confidence:4.0%} | {transcript.strip()}")

Call the function:

print_video_speech(results)

You should see something like this:

--------------------------- Speech transcriptions: 2 ---------------------------
  91% | I was keenly aware of secret movements in the trees.
  92% | I looked into his large and lustrous eyes. They seem somehow to express his entire personality.

Add this function to print out the list of detected words and their timestamps:

def print_word_timestamps(
    results: vi.VideoAnnotationResults,
    min_confidence: float = 0.8,
):
    def keep_transcription(transcription: vi.SpeechTranscription) -> bool:
        return min_confidence <= transcription.alternatives[0].confidence

    transcriptions = results.speech_transcriptions
    transcriptions = [t for t in transcriptions if keep_transcription(t)]

    print(" Word timestamps ".center(80, "-"))
    for transcription in transcriptions:
        first_alternative = transcription.alternatives[0]
        confidence = first_alternative.confidence
        for word in first_alternative.words:
            t1 = word.start_time.total_seconds()
            t2 = word.end_time.total_seconds()
            word = word.word
            print(f"{confidence:4.0%} | {t1:7.3f} | {t2:7.3f} | {word}")

Call the function:

print_word_timestamps(results)

You should see something like this:

------------------------------- Word timestamps --------------------------------
 93% |  55.000 |  55.700 | I
 93% |  55.700 |  55.900 | was
 93% |  55.900 |  56.300 | keenly
 93% |  56.300 |  56.700 | aware
 93% |  56.700 |  56.900 | of
...
 94% |  76.900 |  77.400 | express
 94% |  77.400 |  77.600 | his
 94% |  77.600 |  78.200 | entire
 94% |  78.200 |  78.500 | personality.
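
Word-level timestamps make it possible to group words into short caption cues. The following is a minimal sketch that works on plain (start, end, text) tuples instead of API results; the Word and Cue aliases, the words_to_cues and make_cue helpers, and the max_seconds limit are all hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)
Cue = Tuple[float, float, str]  # (start seconds, end seconds, joined text)


def make_cue(words: Sequence[Word]) -> Cue:
    """Merge consecutive words into one cue spanning first start to last end."""
    text = " ".join(word[2] for word in words)
    return (words[0][0], words[-1][1], text)


def words_to_cues(words: Sequence[Word], max_seconds: float = 2.0) -> List[Cue]:
    """Group consecutive words into cues that last at most max_seconds."""
    cues: List[Cue] = []
    current: List[Word] = []
    for word in words:
        # Close the current cue when adding this word would exceed the limit
        if current and word[1] - current[0][0] > max_seconds:
            cues.append(make_cue(current))
            current = []
        current.append(word)
    if current:
        cues.append(make_cue(current))
    return cues


# First words from the sample output above
words = [
    (55.0, 55.7, "I"),
    (55.7, 55.9, "was"),
    (55.9, 56.3, "keenly"),
    (56.3, 56.7, "aware"),
    (56.7, 56.9, "of"),
]
for start, end, text in words_to_cues(words, max_seconds=1.5):
    print(f"{start:7.3f} | {end:7.3f} | {text}")
```

With real results, the tuples could be built from word.start_time, word.end_time, and word.word, which print_word_timestamps already reads.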

Summary

In this step, you performed speech transcription on a video using the Video Intelligence API. You can read more about transcribing audio.

9. Detect and track text

You can use the Video Intelligence API to detect and track text in a video.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_text(
    video_uri: str,
    language_hints: Optional[Sequence[str]] = None,
    segments: Optional[Sequence[vi.VideoSegment]] = None,
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.TEXT_DETECTION]
    config = vi.TextDetectionConfig(
        language_hints=language_hints,
    )
    context = vi.VideoContext(
        segments=segments,
        text_detection_config=config,
    )
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the TEXT_DETECTION parameter to analyze a video and detect text.

Call the function to analyze the video from seconds 13 to 27:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=13),
    end_time_offset=timedelta(seconds=27),
)

results = detect_text(video_uri, segments=[segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the detected text:

def print_video_text(results: vi.VideoAnnotationResults, min_frames: int = 15):
    annotations = sorted_by_first_segment_end(results.text_annotations)

    print(" Detected text ".center(80, "-"))
    for annotation in annotations:
        for text_segment in annotation.segments:
            frames = len(text_segment.frames)
            if frames < min_frames:
                continue
            text = annotation.text
            confidence = text_segment.confidence
            start = text_segment.segment.start_time_offset
            seconds = segment_seconds(text_segment.segment)
            print(text)
            print(f"  {confidence:4.0%} | {start} + {seconds:.1f}s | {frames} fr.")


def sorted_by_first_segment_end(
    annotations: Sequence[vi.TextAnnotation],
) -> Sequence[vi.TextAnnotation]:
    def first_segment_end(annotation: vi.TextAnnotation) -> float:
        return annotation.segments[0].segment.end_time_offset.total_seconds()

    return sorted(annotations, key=first_segment_end)


def segment_seconds(segment: vi.VideoSegment) -> float:
    t1 = segment.start_time_offset.total_seconds()
    t2 = segment.end_time_offset.total_seconds()
    return t2 - t1

Call the function:

print_video_text(results)

You should see something like this:

-------------------------------- Detected text ---------------------------------
GOMBE NATIONAL PARK
   99% | 0:00:15.760000 + 1.7s | 15 fr.
TANZANIA
  100% | 0:00:15.760000 + 4.8s | 39 fr.
With words and narration by
  100% | 0:00:23.200000 + 3.6s | 31 fr.
Jane Goodall
   99% | 0:00:23.080000 + 3.8s | 33 fr.

Add this function to print out the list of detected text frames and bounding boxes:

def print_text_frames(results: vi.VideoAnnotationResults, contained_text: str):
    # Vertex order: top-left, top-right, bottom-right, bottom-left
    def box_top_left(box: vi.NormalizedBoundingPoly) -> str:
        tl = box.vertices[0]
        return f"({tl.x:.5f}, {tl.y:.5f})"

    def box_bottom_right(box: vi.NormalizedBoundingPoly) -> str:
        br = box.vertices[2]
        return f"({br.x:.5f}, {br.y:.5f})"

    annotations = results.text_annotations
    annotations = [a for a in annotations if contained_text in a.text]
    for annotation in annotations:
        print(f" {annotation.text} ".center(80, "-"))
        for text_segment in annotation.segments:
            for frame in text_segment.frames:
                frame_seconds = frame.time_offset.total_seconds()
                box = frame.rotated_bounding_box
                print(
                    f"{frame_seconds:>7.3f}",
                    box_top_left(box),
                    box_bottom_right(box),
                    sep=" | ",
                )

Call the function to check in which frames the narrator's name is shown:

contained_text = "Goodall"
print_text_frames(results, contained_text)

You should see something like this:

--------------------------------- Jane Goodall ---------------------------------
 23.080 | (0.39922, 0.49861) | (0.62752, 0.55888)
 23.200 | (0.38750, 0.49028) | (0.62692, 0.56306)
...
 26.800 | (0.36016, 0.49583) | (0.61094, 0.56048)
 26.920 | (0.45859, 0.49583) | (0.60365, 0.56174)

You can draw the bounding boxes on top of the corresponding frames to visualize how the text is tracked.
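
The vertices printed above are normalized to the [0, 1] range, so they need to be scaled to the frame size before anything can be drawn. The following is a minimal sketch of that conversion; the to_pixels helper is a hypothetical name and the 1280x720 frame size is an assumption, not a property of the sample video stated in this codelab.

```python
from typing import Tuple


def to_pixels(point: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Scale a normalized (x, y) point in [0, 1] to integer pixel coordinates."""
    x, y = point
    return (round(x * width), round(y * height))


# Top-left and bottom-right vertices of the first "Jane Goodall" frame above,
# scaled to an assumed 1280x720 frame
print(to_pixels((0.39922, 0.49861), 1280, 720))
print(to_pixels((0.62752, 0.55888), 1280, 720))
```

The resulting pixel pairs can then be passed to whichever drawing library you prefer.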

Summary

In this step, you performed text detection and tracking on a video using the Video Intelligence API. You can read more about detecting and tracking text.

10. Detect and track objects

You can use the Video Intelligence API to detect and track objects in a video.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def track_objects(
    video_uri: str, segments: Optional[Sequence[vi.VideoSegment]] = None
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.OBJECT_TRACKING]
    context = vi.VideoContext(segments=segments)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video: "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the OBJECT_TRACKING parameter to analyze a video and detect objects.

Call the function to analyze the video from seconds 98 to 112:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=98),
    end_time_offset=timedelta(seconds=112),
)

results = track_objects(video_uri, [segment])

Wait for the video to be processed:

Processing video: "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the list of detected objects:

def print_detected_objects(
    results: vi.VideoAnnotationResults,
    min_confidence: float = 0.7,
):
    annotations = results.object_annotations
    annotations = [a for a in annotations if min_confidence <= a.confidence]

    print(
        f" Detected objects: {len(annotations)}"
        f" ({min_confidence:.0%} <= confidence) ".center(80, "-")
    )
    for annotation in annotations:
        entity = annotation.entity
        description = entity.description
        entity_id = entity.entity_id
        confidence = annotation.confidence
        t1 = annotation.segment.start_time_offset.total_seconds()
        t2 = annotation.segment.end_time_offset.total_seconds()
        frames = len(annotation.frames)
        print(
            f"{description:<22}",
            f"{entity_id:<10}",
            f"{confidence:4.0%}",
            f"{t1:>7.3f}",
            f"{t2:>7.3f}",
            f"{frames:>2} fr.",
            sep=" | ",
        )

Call the function:

print_detected_objects(results)

You should see something like this:

------------------- Detected objects: 3 (70% <= confidence) --------------------
insect                 | /m/03vt0   |  87% |  98.840 | 101.720 | 25 fr.
insect                 | /m/03vt0   |  71% | 108.440 | 111.080 | 23 fr.
butterfly              | /m/0cyf8   |  91% | 111.200 | 111.920 |  7 fr.

Add this function to print out the list of detected object frames and bounding boxes:

def print_object_frames(
    results: vi.VideoAnnotationResults,
    entity_id: str,
    min_confidence: float = 0.7,
):
    def keep_annotation(annotation: vi.ObjectTrackingAnnotation) -> bool:
        return (
            annotation.entity.entity_id == entity_id
            and min_confidence <= annotation.confidence
        )

    annotations = results.object_annotations
    annotations = [a for a in annotations if keep_annotation(a)]
    for annotation in annotations:
        description = annotation.entity.description
        confidence = annotation.confidence
        print(
            f" {description},"
            f" confidence: {confidence:.0%},"
            f" frames: {len(annotation.frames)} ".center(80, "-")
        )
        for frame in annotation.frames:
            t = frame.time_offset.total_seconds()
            box = frame.normalized_bounding_box
            print(
                f"{t:>7.3f}",
                f"({box.left:.5f}, {box.top:.5f})",
                f"({box.right:.5f}, {box.bottom:.5f})",
                sep=" | ",
            )

Call the function with the entity ID for insects:

insect_entity_id = "/m/03vt0"
print_object_frames(results, insect_entity_id)

You should see something like this:

--------------------- insect, confidence: 87%, frames: 25 ----------------------
 98.840 | (0.49327, 0.19617) | (0.69905, 0.69633)
 98.960 | (0.49559, 0.19308) | (0.70631, 0.69671)
...
101.600 | (0.46668, 0.19776) | (0.76619, 0.69371)
101.720 | (0.46805, 0.20053) | (0.76447, 0.68703)
--------------------- insect, confidence: 71%, frames: 23 ----------------------
108.440 | (0.47343, 0.10694) | (0.63821, 0.98332)
108.560 | (0.46960, 0.10206) | (0.63033, 0.98285)
...
110.960 | (0.49466, 0.05102) | (0.65941, 0.99357)
111.080 | (0.49572, 0.04728) | (0.65762, 0.99868)
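
The box coordinates printed above are normalized to the [0, 1] range, so they have to be scaled by the frame size before they can be drawn. Here is a minimal sketch of that conversion. NormalizedBox, to_pixel_box, and the 1280x720 frame size are hypothetical stand-ins chosen for illustration; they are not part of the client library, and the actual resolution of the sample video is not stated in this codelab:

```python
from typing import NamedTuple, Tuple


class NormalizedBox(NamedTuple):
    """Hypothetical stand-in for a normalized bounding box (values in [0, 1])."""

    left: float
    top: float
    right: float
    bottom: float


def to_pixel_box(box, frame_width: int, frame_height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to integer pixel coordinates, clamped to the frame."""

    def scale(value: float, size: int) -> int:
        # Clamp to [0, 1] before scaling so the result always stays inside the frame.
        return round(min(max(value, 0.0), 1.0) * size)

    return (
        scale(box.left, frame_width),
        scale(box.top, frame_height),
        scale(box.right, frame_width),
        scale(box.bottom, frame_height),
    )


# First insect frame from the table above, with an assumed 1280x720 frame size.
box = NormalizedBox(0.49327, 0.19617, 0.69905, 0.69633)
pixel_box = to_pixel_box(box, 1280, 720)
print(pixel_box)
```

Because to_pixel_box only reads the left, top, right, and bottom attributes, a frame.normalized_bounding_box value from the results should work in place of the stand-in.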

If you draw the bounding boxes on top of the corresponding frames, you'll get this:

Summary

In this step, you performed object detection and tracking on a video using the Video Intelligence API. You can read more about detecting and tracking objects.

11. Detect and track logos

You can use the Video Intelligence API to detect and track logos in a video. Over 100,000 brands and logos can be detected.

Copy the following code into your IPython session:

from datetime import timedelta
from typing import Optional, Sequence, cast

from google.cloud import videointelligence_v1 as vi


def detect_logos(
    video_uri: str, segments: Optional[Sequence[vi.VideoSegment]] = None
) -> vi.VideoAnnotationResults:
    video_client = vi.VideoIntelligenceServiceClient()
    features = [vi.Feature.LOGO_RECOGNITION]
    context = vi.VideoContext(segments=segments)
    request = vi.AnnotateVideoRequest(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )

    print(f'Processing video "{video_uri}"...')
    operation = video_client.annotate_video(request)

    # Wait for operation to complete
    response = cast(vi.AnnotateVideoResponse, operation.result())
    # A single video is processed
    results = response.annotation_results[0]

    return results

Take a moment to study the code and see how it uses the annotate_video client library method with the LOGO_RECOGNITION parameter to analyze a video and detect logos.

Call the function to analyze the penultimate sequence of the video:

video_uri = "gs://cloud-samples-data/video/JaneGoodall.mp4"
segment = vi.VideoSegment(
    start_time_offset=timedelta(seconds=146),
    end_time_offset=timedelta(seconds=156),
)

results = detect_logos(video_uri, [segment])

Wait for the video to be processed:

Processing video "gs://cloud-samples-data/video/JaneGoodall.mp4"...

Add this function to print out the list of detected logos:

def print_detected_logos(results: vi.VideoAnnotationResults):
    annotations = results.logo_recognition_annotations

    print(f" Detected logos: {len(annotations)} ".center(80, "-"))
    for annotation in annotations:
        entity = annotation.entity
        entity_id = entity.entity_id
        description = entity.description
        for track in annotation.tracks:
            confidence = track.confidence
            t1 = track.segment.start_time_offset.total_seconds()
            t2 = track.segment.end_time_offset.total_seconds()
            logo_frames = len(track.timestamped_objects)
            print(
                f"{confidence:4.0%}",
                f"{t1:>7.3f}",
                f"{t2:>7.3f}",
                f"{logo_frames:>3} fr.",
                f"{entity_id:<15}",
                f"{description}",
                sep=" | ",
            )

Call the function:

print_detected_logos(results)

You should see something like this:

------------------------------ Detected logos: 1 -------------------------------
 92% | 150.680 | 155.720 |  43 fr. | /m/055t58       | Google Maps
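
Unlike print_object_frames, the logo function above prints every track regardless of its confidence. If you want a similar threshold for logos, one possible approach is sketched below. iter_confident_tracks and fake_annotation are hypothetical helpers, and the SimpleNamespace objects only mimic the entity, tracks, and confidence attributes used in this step, so the sketch runs without calling the API:

```python
from types import SimpleNamespace


def iter_confident_tracks(annotations, min_confidence: float = 0.7):
    """Yield (description, track) pairs whose track confidence meets the threshold."""
    for annotation in annotations:
        for track in annotation.tracks:
            if min_confidence <= track.confidence:
                yield annotation.entity.description, track


def fake_annotation(description: str, confidences):
    """Build a stand-in annotation exposing only the attributes read above."""
    return SimpleNamespace(
        entity=SimpleNamespace(description=description),
        tracks=[SimpleNamespace(confidence=c) for c in confidences],
    )


# Made-up data for illustration only; these are not real API results.
annotations = [
    fake_annotation("Google Maps", [0.92, 0.41]),
    fake_annotation("Other logo", [0.55]),
]
kept = [(d, t.confidence) for d, t in iter_confident_tracks(annotations)]
print(kept)
```

With real results, you would pass results.logo_recognition_annotations instead of the stand-in list.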

Add this function to print out the list of detected logo frames and bounding boxes:

def print_logo_frames(results: vi.VideoAnnotationResults, entity_id: str):
    def keep_annotation(annotation: vi.LogoRecognitionAnnotation) -> bool:
        return annotation.entity.entity_id == entity_id

    annotations = results.logo_recognition_annotations
    annotations = [a for a in annotations if keep_annotation(a)]
    for annotation in annotations:
        description = annotation.entity.description
        for track in annotation.tracks:
            confidence = track.confidence
            print(
                f" {description},"
                f" confidence: {confidence:.0%},"
                f" frames: {len(track.timestamped_objects)} ".center(80, "-")
            )
            for timestamped_object in track.timestamped_objects:
                t = timestamped_object.time_offset.total_seconds()
                box = timestamped_object.normalized_bounding_box
                print(
                    f"{t:>7.3f}",
                    f"({box.left:.5f}, {box.top:.5f})",
                    f"({box.right:.5f}, {box.bottom:.5f})",
                    sep=" | ",
                )

Call the function with the Google Maps logo entity ID:

maps_entity_id = "/m/055t58"
print_logo_frames(results, maps_entity_id)

You should see something like this:

------------------- Google Maps, confidence: 92%, frames: 43 -------------------
150.680 | (0.42024, 0.28633) | (0.58192, 0.64220)
150.800 | (0.41713, 0.27822) | (0.58318, 0.63556)
...
155.600 | (0.41775, 0.27701) | (0.58372, 0.63986)
155.720 | (0.41688, 0.28005) | (0.58335, 0.63954)
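
The time offsets in this table also tell you how densely the track was sampled. The following sketch computes the track duration and the mean interval between frames. summarize_track is a hypothetical helper, and the offsets are synthetic: they assume 43 evenly spaced frames, which is consistent with the first and last rows shown above but is not confirmed for the elided rows:

```python
from datetime import timedelta
from typing import Sequence, Tuple


def summarize_track(time_offsets: Sequence[timedelta]) -> Tuple[float, float]:
    """Return (duration, mean frame interval) in seconds for ordered offsets."""
    seconds = [t.total_seconds() for t in time_offsets]
    duration = seconds[-1] - seconds[0]
    intervals = len(seconds) - 1
    # A single-frame track has no interval to average.
    mean_interval = duration / intervals if intervals else 0.0
    return duration, mean_interval


# Synthetic offsets: 43 frames, 120 ms apart, starting at 150.680 s (assumption).
offsets = [timedelta(milliseconds=150_680 + 120 * i) for i in range(43)]
duration, mean_interval = summarize_track(offsets)
print(f"{duration:.3f} s, one frame every {mean_interval:.3f} s")
```

With real results, the offsets could be collected as [o.time_offset for o in track.timestamped_objects].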

If you draw the bounding boxes on top of the corresponding frames, you'll get this:

Summary

In this step, you performed logo detection and tracking on a video using the Video Intelligence API. You can read more about detecting and tracking logos.

12. Detect multiple features

Here is the kind of request you can make to get all insights at once:

from google.cloud import videointelligence_v1 as vi

video_client = vi.VideoIntelligenceServiceClient()
video_uri = "gs://..."
features = [
    vi.Feature.SHOT_CHANGE_DETECTION,
    vi.Feature.LABEL_DETECTION,
    vi.Feature.EXPLICIT_CONTENT_DETECTION,
    vi.Feature.SPEECH_TRANSCRIPTION,
    vi.Feature.TEXT_DETECTION,
    vi.Feature.OBJECT_TRACKING,
    vi.Feature.LOGO_RECOGNITION,
    vi.Feature.FACE_DETECTION,  # NEW
    vi.Feature.PERSON_DETECTION,  # NEW
]
context = vi.VideoContext(
    segments=...,
    shot_change_detection_config=...,
    label_detection_config=...,
    explicit_content_detection_config=...,
    speech_transcription_config=...,
    text_detection_config=...,
    object_tracking_config=...,
    face_detection_config=...,  # NEW
    person_detection_config=...,  # NEW
)
request = vi.AnnotateVideoRequest(
    input_uri=video_uri,
    features=features,
    video_context=context,
)

# video_client.annotate_video(request)
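
When several features are requested at once, each one fills its own field of the annotation results. The sketch below counts the entries per feature. RESULT_FIELDS and count_annotations are hypothetical names; object_annotations and logo_recognition_annotations appear in the steps above, while the other field names are assumptions based on the v1 client library and should be checked against its reference. Explicit content detection is left out because its result is a single message rather than a list:

```python
from types import SimpleNamespace
from typing import Dict

# Feature name -> repeated result field (partly assumed, see the note above).
RESULT_FIELDS = {
    "SHOT_CHANGE_DETECTION": "shot_annotations",
    "LABEL_DETECTION": "segment_label_annotations",
    "SPEECH_TRANSCRIPTION": "speech_transcriptions",
    "TEXT_DETECTION": "text_annotations",
    "OBJECT_TRACKING": "object_annotations",
    "LOGO_RECOGNITION": "logo_recognition_annotations",
    "FACE_DETECTION": "face_detection_annotations",
    "PERSON_DETECTION": "person_detection_annotations",
}


def count_annotations(results) -> Dict[str, int]:
    """Count the entries found in each known result field (0 if absent)."""
    return {
        feature: len(getattr(results, field, ()))
        for feature, field in RESULT_FIELDS.items()
    }


# Stand-in results object with made-up contents, so no API call is needed.
fake_results = SimpleNamespace(
    object_annotations=["a", "b", "c"],
    logo_recognition_annotations=["x"],
)
counts = count_annotations(fake_results)
for feature, count in counts.items():
    print(f"{feature:<22} | {count:>3}")
```

With a real response, you would pass response.annotation_results[0] as in the earlier steps.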

13. Congratulations!

You learned how to use the Video Intelligence API using Python.

Clean up

To clean up your development environment, from Cloud Shell:

If you're still in your IPython session, go back to the shell: exit
Stop using the Python virtual environment: deactivate
Delete your virtual environment folder: cd ~ ; rm -rf ./venv-videointel

To delete your Google Cloud project, from Cloud Shell:

Retrieve your current project ID: PROJECT_ID=$(gcloud config get-value core/project)
Make sure this is the project you want to delete: echo $PROJECT_ID
Delete the project: gcloud projects delete $PROJECT_ID

Learn more

Test the demo in your browser: https://zackakil.github.io/video-intelligence-api-visualiser
Video Intelligence documentation: https://cloud.google.com/video-intelligence/docs
Beta features: https://cloud.google.com/video-intelligence/docs/beta
Python on Google Cloud: https://cloud.google.com/python
Cloud Client Libraries for Python: https://github.com/googleapis/google-cloud-python

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.
