Using the Natural Language API with Python

About this codelab

Last updated: Sep 11, 2023

Written by Laurent Picard

This page was translated by the Cloud Translation API.

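1. Overview

The Natural Language API lets you extract information from unstructured text using Google machine learning. In this tutorial, you'll focus on using its Python client library.

What you'll learn

How to set up your environment
How to perform sentiment analysis
How to perform entity analysis
How to perform syntax analysis
How to perform content classification
How to perform text moderation

What you'll need

A Google Cloud project
A browser, such as Chrome or Firefox
Familiarity using Python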

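2. Setup and requirements

Self-paced environment setup

Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string that Google APIs do not use. You can update it at any time.
The Project ID is unique across all Google Cloud projects and cannot be changed once it has been set. The Cloud Console auto-generates a unique string, and usually you don't need to care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one. Alternatively, you can try your own and see whether it's available. The ID can't be changed after this step and remains for the duration of the project.
For your information, there is a third value, the Project Number, which some APIs use. To learn more about all three of these values, see the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources and avoid incurring charges beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.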

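Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will use Cloud Shell, a command-line environment running in the Cloud.

Activate Cloud Shell

From the Cloud Console, click Activate Cloud Shell.

If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If so, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools you need. It offers a persistent 5 GB home directory and runs in Google Cloud, which greatly improves network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

Run the following command in Cloud Shell to confirm that you are authenticated: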

gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Environment setup

Before you can begin using the Natural Language API, run the following command in Cloud Shell to enable the API:

gcloud services enable language.googleapis.com

You should see something like this:

Operation "operations/..." finished successfully.

Now, you can use the Natural Language API!

Navigate to your home directory:

cd ~

Create a Python virtual environment to isolate the dependencies:

virtualenv venv-language

Activate the virtual environment:

source venv-language/bin/activate

Install IPython, pandas, tabulate (which pandas needs for its to_markdown output), and the Natural Language API client library:

pip install ipython pandas tabulate google-cloud-language

You should see something like this:

...
Installing collected packages: ... pandas ... ipython ... google-cloud-language
Successfully installed ... google-cloud-language-2.11.0 ...

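Now, you're ready to use the Natural Language API client library!

In the next steps, you'll use an interactive Python interpreter called IPython, which you installed in the previous step. Start a session by running ipython in Cloud Shell: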

ipython

You should see something like this:

Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

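4. Sentiment analysis

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within it, determining whether the sentiment expressed is positive, negative, or neutral at both the sentence and the document levels. Each result carries a score from -1.0 (negative) to +1.0 (positive) and a non-negative magnitude indicating the overall strength of emotion. It is performed with the analyze_sentiment method, which returns an AnalyzeSentimentResponse.

Copy the following code into your IPython session: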

from google.cloud import language

def analyze_text_sentiment(text: str) -> language.AnalyzeSentimentResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_sentiment(document=document)

def show_text_sentiment(response: language.AnalyzeSentimentResponse):
    import pandas as pd

    columns = ["score", "sentence"]
    data = [(s.sentiment.score, s.text.content) for s in response.sentences]
    df_sentence = pd.DataFrame(columns=columns, data=data)

    sentiment = response.document_sentiment
    columns = ["score", "magnitude", "language"]
    data = [(sentiment.score, sentiment.magnitude, response.language)]
    df_document = pd.DataFrame(columns=columns, data=data)

    format_args = dict(index=False, tablefmt="presto", floatfmt="+.1f")
    print(f"At sentence level:\n{df_sentence.to_markdown(**format_args)}")
    print()
    print(f"At document level:\n{df_document.to_markdown(**format_args)}")

Perform an analysis:

# Input
text = """
Python is a very readable language, which makes it easy to understand and maintain code.
It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks.
One disadvantage is its speed: it's not as fast as some other programming languages.
"""

# Send a request to the API
analyze_sentiment_response = analyze_text_sentiment(text)

# Show the results
show_text_sentiment(analyze_sentiment_response)

You should see an output like the following:

At sentence level:
   score | sentence
---------+------------------------------------------------------------------------------------------
    +0.8 | Python is a very readable language, which makes it easy to understand and maintain code.
    +0.9 | It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks.
    -0.4 | One disadvantage is its speed: it's not as fast as some other programming languages.

At document level:
   score |   magnitude | language
---------+-------------+------------
    +0.4 |        +2.2 | en

Take a moment to test your own sentences.
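As a rough way to read these numbers, a score can be bucketed into a coarse label. The sketch below assumes the score lies between -1.0 and +1.0; the `label_sentiment` helper and its 0.25 threshold are illustrative choices, not part of the API, and the function ignores the magnitude entirely.

```python
def label_sentiment(score: float, threshold: float = 0.25) -> str:
    """Map a sentiment score in [-1.0, +1.0] to a coarse label.

    The threshold is an arbitrary illustration value, not an API constant.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Sentence-level scores taken from the sample output above
for score in (0.8, 0.9, -0.4):
    print(f"{score:+.1f} -> {label_sentiment(score)}")
```

With a real response, the same helper could be applied as `[label_sentiment(s.sentiment.score) for s in response.sentences]`.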

Summary

In this step, you were able to perform sentiment analysis on a string of text!

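5. Entity analysis

Entity analysis inspects the given text for known entities (proper nouns such as public figures or landmarks) and returns information about those entities. It is performed with the analyze_entities method, which returns an AnalyzeEntitiesResponse.

Copy the following code into your IPython session: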

from google.cloud import language

def analyze_text_entities(text: str) -> language.AnalyzeEntitiesResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_entities(document=document)

def show_text_entities(response: language.AnalyzeEntitiesResponse):
    import pandas as pd

    columns = ("name", "type", "salience", "mid", "wikipedia_url")
    data = (
        (
            entity.name,
            entity.type_.name,
            entity.salience,
            entity.metadata.get("mid", ""),
            entity.metadata.get("wikipedia_url", ""),
        )
        for entity in response.entities
    )
    df = pd.DataFrame(columns=columns, data=data)
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:

# Input
text = """Guido van Rossum is best known as the creator of Python,
which he named after the Monty Python comedy troupe.
He was born in Haarlem, Netherlands.
"""

# Send a request to the API
analyze_entities_response = analyze_text_entities(text)

# Show the results
show_text_entities(analyze_entities_response)

You should see an output like the following:

 name             | type         |   salience | mid       | wikipedia_url
------------------+--------------+------------+-----------+-------------------------------------------------------------
 Guido van Rossum | PERSON       |        50% | /m/01h05c | https://en.wikipedia.org/wiki/Guido_van_Rossum
 Python           | ORGANIZATION |        38% | /m/05z1_  | https://en.wikipedia.org/wiki/Python_(programming_language)
 creator          | PERSON       |         5% |           |
 Monty Python     | PERSON       |         3% | /m/04sd0  | https://en.wikipedia.org/wiki/Monty_Python
 comedy troupe    | PERSON       |         2% |           |
 Haarlem          | LOCATION     |         1% | /m/0h095  | https://en.wikipedia.org/wiki/Haarlem
 Netherlands      | LOCATION     |         1% | /m/059j2  | https://en.wikipedia.org/wiki/Netherlands

Take a moment to test your own sentences mentioning other entities.
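Since salience ranks how central each entity is to the text, one possible follow-up is to keep only the most salient entities. The sketch below works on plain tuples so it runs without calling the API; `EntityRow`, `salient_entities`, and the 10% cutoff are hypothetical names and values chosen for illustration.

```python
from typing import Iterable, NamedTuple


class EntityRow(NamedTuple):
    """Hypothetical plain-data stand-in for one row of the table above."""

    name: str
    type_name: str
    salience: float


def salient_entities(
    rows: Iterable[EntityRow], min_salience: float = 0.1
) -> list[EntityRow]:
    """Keep entities at or above min_salience, most salient first."""
    kept = [row for row in rows if row.salience >= min_salience]
    return sorted(kept, key=lambda row: row.salience, reverse=True)


# Values copied from the sample output above
rows = [
    EntityRow("Guido van Rossum", "PERSON", 0.50),
    EntityRow("Python", "ORGANIZATION", 0.38),
    EntityRow("creator", "PERSON", 0.05),
    EntityRow("Haarlem", "LOCATION", 0.01),
]
for row in salient_entities(rows):
    print(f"{row.name}: {row.salience:.0%}")
```

From a real response, the rows could be built with `EntityRow(e.name, e.type_.name, e.salience) for e in response.entities`.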

Summary

In this step, you were able to perform entity analysis!

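6. Syntax analysis

Syntax analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally based on word boundaries) and providing further analysis on those tokens. It is performed with the analyze_syntax method, which returns an AnalyzeSyntaxResponse.

Copy the following code into your IPython session: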

from typing import Optional
from google.cloud import language

def analyze_text_syntax(text: str) -> language.AnalyzeSyntaxResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_syntax(document=document)

def get_token_info(token: Optional[language.Token]) -> list[str]:
    parts = [
        "tag",
        "aspect",
        "case",
        "form",
        "gender",
        "mood",
        "number",
        "person",
        "proper",
        "reciprocity",
        "tense",
        "voice",
    ]
    if not token:
        return ["token", "lemma"] + parts

    text = token.text.content
    lemma = token.lemma if token.lemma != token.text.content else ""
    info = [text, lemma]
    for part in parts:
        pos = token.part_of_speech
        info.append(getattr(pos, part).name if part in pos else "")

    return info

def show_text_syntax(response: language.AnalyzeSyntaxResponse):
    import pandas as pd

    tokens = len(response.tokens)
    sentences = len(response.sentences)
    columns = get_token_info(None)
    data = (get_token_info(token) for token in response.tokens)
    df = pd.DataFrame(columns=columns, data=data)
    # Remove empty columns
    empty_columns = [col for col in df if df[col].eq("").all()]
    df.drop(empty_columns, axis=1, inplace=True)

    print(f"Analyzed {tokens} token(s) from {sentences} sentence(s):")
    print(df.to_markdown(index=False, tablefmt="presto"))

Perform an analysis:

# Input
text = """Guido van Rossum is best known as the creator of Python.
He was born in Haarlem, Netherlands.
"""

# Send a request to the API
analyze_syntax_response = analyze_text_syntax(text)

# Show the results
show_text_syntax(analyze_syntax_response)

You should see an output like the following:

Analyzed 20 token(s) from 2 sentence(s):
 token       | lemma   | tag   | case       | gender    | mood       | number   | person   | proper   | tense   | voice
-------------+---------+-------+------------+-----------+------------+----------+----------+----------+---------+---------
 Guido       |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 van         |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 Rossum      |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 is          | be      | VERB  |            |           | INDICATIVE | SINGULAR | THIRD    |          | PRESENT |
 best        | well    | ADV   |            |           |            |          |          |          |         |
 known       | know    | VERB  |            |           |            |          |          |          | PAST    |
 as          |         | ADP   |            |           |            |          |          |          |         |
 the         |         | DET   |            |           |            |          |          |          |         |
 creator     |         | NOUN  |            |           |            | SINGULAR |          |          |         |
 of          |         | ADP   |            |           |            |          |          |          |         |
 Python      |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 .           |         | PUNCT |            |           |            |          |          |          |         |
 He          |         | PRON  | NOMINATIVE | MASCULINE |            | SINGULAR | THIRD    |          |         |
 was         | be      | VERB  |            |           | INDICATIVE | SINGULAR | THIRD    |          | PAST    |
 born        | bear    | VERB  |            |           |            |          |          |          | PAST    | PASSIVE
 in          |         | ADP   |            |           |            |          |          |          |         |
 Haarlem     |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 ,           |         | PUNCT |            |           |            |          |          |          |         |
 Netherlands |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 .           |         | PUNCT |            |           |            |          |          |          |         |

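Take a moment to test your own sentences with other syntactic structures.

If you dive deeper into the response, you'll also find the relationships between the tokens. The online Natural Language demo can display a visual interpretation of the complete syntax analysis for this example.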
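The relationships between tokens can also be listed as text. The following is a minimal sketch, assuming each token exposes `dependency_edge.head_token_index` and `dependency_edge.label` as in the v1 Token message; `dependency_pairs` and `show_dependencies` are hypothetical helper names, not part of the client library.

```python
def dependency_pairs(tokens) -> list[tuple[str, str, str]]:
    """Return (token, relation label, head token) for each token.

    Assumes each token exposes text.content plus dependency_edge.label
    and dependency_edge.head_token_index, as in the v1 Token message.
    """
    tokens = list(tokens)
    pairs = []
    for token in tokens:
        edge = token.dependency_edge
        # head_token_index is a position in the same token list;
        # a root token points at itself
        head = tokens[edge.head_token_index]
        pairs.append((token.text.content, edge.label.name, head.text.content))
    return pairs


def show_dependencies(response) -> None:
    """Print one 'token --LABEL--> head' line per token."""
    for text, label, head in dependency_pairs(response.tokens):
        print(f"{text} --{label}--> {head}")
```

In the IPython session, this could be tried with `show_dependencies(analyze_syntax_response)`.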

Summary

In this step, you were able to perform syntax analysis!

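7. Content classification

Content classification analyzes a document and returns a list of content categories that apply to the text found in the document. It is performed with the classify_text method, which returns a ClassifyTextResponse.

Copy the following code into your IPython session: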

from google.cloud import language

def classify_text(text: str) -> language.ClassifyTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.classify_text(document=document)

def show_text_classification(text: str, response: language.ClassifyTextResponse):
    import pandas as pd

    columns = ["category", "confidence"]
    data = ((category.name, category.confidence) for category in response.categories)
    df = pd.DataFrame(columns=columns, data=data)

    print(f"Text analyzed:\n{text}")
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:

# Input
text = """Python is an interpreted, high-level, general-purpose programming language.
Created by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace.
"""

# Send a request to the API
classify_text_response = classify_text(text)

# Show the results
show_text_classification(text, classify_text_response)

You should see an output like the following:

Text analyzed:
Python is an interpreted, high-level, general-purpose programming language.
Created by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace.

 category                             |   confidence
--------------------------------------+--------------
 /Computers & Electronics/Programming |          99%
 /Science/Computer Science            |          99%

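Take a moment to test your own sentences relating to other categories. Note that you must supply a text block (document) with at least 20 tokens (words and punctuation marks).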
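To avoid sending text that is too short, a rough length check can be run locally first. The sketch below counts words and punctuation marks with a simple regular expression; it only approximates how the API tokenizes text, and `rough_token_count` and `long_enough` are hypothetical helpers.

```python
import re


def rough_token_count(text: str) -> int:
    """Count words and punctuation marks with a simple regex.

    This only approximates the API's own tokenization.
    """
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
    return len(re.findall(r"\w+|[^\w\s]", text))


def long_enough(text: str, min_tokens: int = 20) -> bool:
    """Return True if the text has at least min_tokens rough tokens."""
    return rough_token_count(text) >= min_tokens


print(rough_token_count("Python is great."))  # 4
print(long_enough("Python is great."))  # False
```

Because the count is approximate, text close to the 20-token limit may still be accepted or rejected differently by the API.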

Summary

In this step, you were able to perform content classification!

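8. Text moderation

Powered by Google's latest PaLM 2 foundation model, text moderation identifies a wide range of harmful content, including hate speech, bullying, and sexual harassment. It is performed with the moderate_text method, which returns a ModerateTextResponse.

Copy the following code into your IPython session: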

from google.cloud import language

def moderate_text(text: str) -> language.ModerateTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.moderate_text(document=document)

def show_text_moderation(text: str, response: language.ModerateTextResponse):
    import pandas as pd

    def confidence(category: language.ClassificationCategory) -> float:
        return category.confidence

    columns = ["category", "confidence"]
    categories = sorted(response.moderation_categories, key=confidence, reverse=True)
    data = ((category.name, category.confidence) for category in categories)
    df = pd.DataFrame(columns=columns, data=data)

    print(f"Text analyzed:\n{text}")
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:

# Input
text = """I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!
"""

# Send a request to the API
response = moderate_text(text)

# Show the results
show_text_moderation(text, response)

You should see an output like the following:

Text analyzed:
I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!

 category              |   confidence
-----------------------+--------------
 Toxic                 |          67%
 Insult                |          58%
 Profanity             |          53%
 Violent               |          48%
 Illicit Drugs         |          29%
 Religion & Belief     |          27%
 Politics              |          22%
 Death, Harm & Tragedy |          21%
 Finance               |          18%
 Derogatory            |          14%
 Firearms & Weapons    |          11%
 Health                |          10%
 Legal                 |          10%
 War & Conflict        |           7%
 Public Safety         |           5%
 Sexual                |           4%

Take a moment to test your own sentences.
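In practice, a moderation result is often reduced to the categories that exceed some confidence level. The sketch below shows one way to do that on plain (name, confidence) pairs; `flagged_categories` and the 0.5 threshold are assumptions for illustration, and a suitable threshold depends on the application.

```python
from typing import Iterable


def flagged_categories(
    categories: Iterable[tuple[str, float]], threshold: float = 0.5
) -> list[str]:
    """Return names of categories whose confidence meets the threshold."""
    return [name for name, confidence in categories if confidence >= threshold]


# A few (name, confidence) pairs copied from the sample output above
sample = [("Toxic", 0.67), ("Insult", 0.58), ("Profanity", 0.53), ("Violent", 0.48)]
print(flagged_categories(sample))  # ['Toxic', 'Insult', 'Profanity']
```

With a real response, the pairs could be produced with `(c.name, c.confidence) for c in response.moderation_categories`.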

Summary

In this step, you were able to perform text moderation!

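9. Congratulations!

You learned how to use the Natural Language API with Python!

Clean up

To clean up your development environment, from Cloud Shell:

If you're still in your IPython session, go back to the shell: exit
Stop using the Python virtual environment: deactivate
Delete your virtual environment folder: cd ~ ; rm -rf ./venv-language

To delete your Google Cloud project, from Cloud Shell:

Retrieve your current project ID: PROJECT_ID=$(gcloud config get-value core/project)
Make sure this is the project you want to delete: echo $PROJECT_ID
Delete the project: gcloud projects delete $PROJECT_ID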

Learn more

Test the demo in your browser: https://cloud.google.com/natural-language#natural-language-api-demo
Natural Language documentation: https://cloud.google.com/natural-language/docs
Python on Google Cloud: https://cloud.google.com/python
Cloud Client Libraries for Python: https://github.com/googleapis/google-cloud-python

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.
