1. 總覽
Natural Language API 能使用 Google 機器學習技術,從非結構化文字中擷取資訊。在這個教學課程中,您將專注於使用 Python 用戶端程式庫。
課程內容
- 如何設定環境
- 如何執行情緒分析
- 如何執行實體分析
- 如何執行語法分析
- 如何進行內容分類
- 如何執行文字審核
軟硬體需求
問卷調查
您會如何使用這個教學課程?
您對 Python 的使用體驗有何評價?
針對使用 Google Cloud 服務的經驗,您會給予什麼評價?
2. 設定和需求
自修環境設定
- 登入 Google Cloud 控制台,建立新專案或重複使用現有專案。如果您還沒有 Gmail 或 Google Workspace 帳戶,請先建立帳戶。
- 「專案名稱」是這項專案參與者的顯示名稱。這是 Google API 未使用的字元字串。您可以隨時更新付款方式。
- 所有 Google Cloud 專案的專案 ID 均不得重複,而且設定後即無法變更。Cloud 控制台會自動產生一個不重複的字串。但通常是在乎它何在在大部分的程式碼研究室中,您必須參照專案 ID (通常為
PROJECT_ID
)。如果您對產生的 ID 不滿意,可以隨機產生一個 ID。或者,您也可以自行嘗試,看看是否支援。在這個步驟後,這個名稱即無法變更,而且在專案期間內仍會保持有效。 - 資訊中的第三個值是專案編號,部分 API 會使用這個編號。如要進一步瞭解這三個值,請參閱說明文件。
- 接下來,您需要在 Cloud 控制台中啟用計費功能,才能使用 Cloud 資源/API。執行本程式碼研究室不會產生任何費用 (如果有的話)。如要關閉資源,以免產生本教學課程結束後產生的費用,您可以刪除自己建立的資源或刪除專案。新使用者符合 $300 美元免費試用計畫的資格。
啟動 Cloud Shell
雖然 Google Cloud 可以從筆記型電腦遠端操作,但在本程式碼研究室中,您將使用 Cloud Shell,這是一種在 Cloud 中執行的指令列環境。
啟用 Cloud Shell
- 在 Cloud 控制台中,按一下「啟用 Cloud Shell」圖示 。
如果您是第一次啟動 Cloud Shell,系統會顯示中繼畫面,說明這項服務的內容。如果系統顯示中繼畫面,請按一下「繼續」。
佈建並連線至 Cloud Shell 只需幾分鐘的時間。
這個虛擬機器已載入所有必要的開發工具。提供永久的 5 GB 主目錄,而且在 Google Cloud 中運作,大幅提高網路效能和驗證能力。在本程式碼研究室中,您的大部分作業都可透過瀏覽器完成。
連線至 Cloud Shell 後,您應會發現自己通過驗證,且專案已設為您的專案 ID。
- 在 Cloud Shell 中執行下列指令,確認您已通過驗證:
gcloud auth list
指令輸出
Credentialed Accounts ACTIVE ACCOUNT * <my_account>@<my_domain.com> To set the active account, run: $ gcloud config set account `ACCOUNT`
- 在 Cloud Shell 中執行下列指令,確認 gcloud 指令知道您的專案:
gcloud config list project
指令輸出
[core] project = <PROJECT_ID>
如果尚未設定,請使用下列指令進行設定:
gcloud config set project <PROJECT_ID>
指令輸出
Updated property [core/project].
3. 環境設定
在開始使用 Natural Language API 之前,請先在 Cloud Shell 中執行下列指令來啟用 API:
gcloud services enable language.googleapis.com
畫面應如下所示:
Operation "operations/..." finished successfully.
現在您可以使用 Natural Language API!
前往主目錄:
cd ~
建立 Python 虛擬環境來區隔依附元件:
virtualenv venv-language
啟用虛擬環境:
source venv-language/bin/activate
安裝 IPython、Pandas 和 Natural Language API 用戶端程式庫:
pip install ipython pandas tabulate google-cloud-language
畫面應如下所示:
... Installing collected packages: ... pandas ... ipython ... google-cloud-language Successfully installed ... google-cloud-language-2.11.0 ...
現在,您可以開始使用 Natural Language API 用戶端程式庫了!
在後續步驟中,您將使用名為 IPython 的互動式 Python 解譯器,此語言是在之前的步驟中安裝。在 Cloud Shell 中執行 ipython
即可啟動工作階段:
ipython
畫面應如下所示:
Python 3.9.2 (default, Feb 28 2021, 17:03:44) Type 'copyright', 'credits' or 'license' for more information IPython 8.15.0 -- An enhanced Interactive Python. Type '?' for help. In [1]:
4. 情緒分析
情緒分析會檢查指定的文字內容,進而識別文字內容的主要情緒觀點,特別是判斷語句和文件層級所表達的情緒為正面、負面或中立。這會透過傳回 AnalyzeSentimentResponse
的 analyze_sentiment
方法執行。
將下列程式碼複製到您的 IPython 工作階段:
from google.cloud import language
def analyze_text_sentiment(text: str) -> language.AnalyzeSentimentResponse:
client = language.LanguageServiceClient()
document = language.Document(
content=text,
type_=language.Document.Type.PLAIN_TEXT,
)
return client.analyze_sentiment(document=document)
def show_text_sentiment(response: language.AnalyzeSentimentResponse):
import pandas as pd
columns = ["score", "sentence"]
data = [(s.sentiment.score, s.text.content) for s in response.sentences]
df_sentence = pd.DataFrame(columns=columns, data=data)
sentiment = response.document_sentiment
columns = ["score", "magnitude", "language"]
data = [(sentiment.score, sentiment.magnitude, response.language)]
df_document = pd.DataFrame(columns=columns, data=data)
format_args = dict(index=False, tablefmt="presto", floatfmt="+.1f")
print(f"At sentence level:\n{df_sentence.to_markdown(**format_args)}")
print()
print(f"At document level:\n{df_document.to_markdown(**format_args)}")
執行分析:
# Input
text = """
Python is a very readable language, which makes it easy to understand and maintain code.
It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks.
One disadvantage is its speed: it's not as fast as some other programming languages.
"""
# Send a request to the API
analyze_sentiment_response = analyze_text_sentiment(text)
# Show the results
show_text_sentiment(analyze_sentiment_response)
輸出內容應如下所示:
At sentence level: score | sentence ---------+------------------------------------------------------------------------------------------ +0.8 | Python is a very readable language, which makes it easy to understand and maintain code. +0.9 | It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks. -0.4 | One disadvantage is its speed: it's not as fast as some other programming languages. At document level: score | magnitude | language ---------+-------------+------------ +0.4 | +2.2 | en
花一些時間測試自己的句子。
摘要
在這個步驟中,您可以對一串文字執行情緒分析!
5. 實體分析
實體分析會檢查指定文字中的已知實體 (公眾人物或地標等專有名詞),並傳回這些實體的相關資訊。這會透過傳回 AnalyzeEntitiesResponse
的 analyze_entities
方法執行。
將下列程式碼複製到您的 IPython 工作階段:
from google.cloud import language
def analyze_text_entities(text: str) -> language.AnalyzeEntitiesResponse:
client = language.LanguageServiceClient()
document = language.Document(
content=text,
type_=language.Document.Type.PLAIN_TEXT,
)
return client.analyze_entities(document=document)
def show_text_entities(response: language.AnalyzeEntitiesResponse):
import pandas as pd
columns = ("name", "type", "salience", "mid", "wikipedia_url")
data = (
(
entity.name,
entity.type_.name,
entity.salience,
entity.metadata.get("mid", ""),
entity.metadata.get("wikipedia_url", ""),
)
for entity in response.entities
)
df = pd.DataFrame(columns=columns, data=data)
print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))
執行分析:
# Input
text = """Guido van Rossum is best known as the creator of Python,
which he named after the Monty Python comedy troupe.
He was born in Haarlem, Netherlands.
"""
# Send a request to the API
analyze_entities_response = analyze_text_entities(text)
# Show the results
show_text_entities(analyze_entities_response)
輸出內容應如下所示:
name | type | salience | mid | wikipedia_url ------------------+--------------+------------+-----------+------------------------------------------------------------- Guido van Rossum | PERSON | 50% | /m/01h05c | https://en.wikipedia.org/wiki/Guido_van_Rossum Python | ORGANIZATION | 38% | /m/05z1_ | https://en.wikipedia.org/wiki/Python_(programming_language) creator | PERSON | 5% | | Monty Python | PERSON | 3% | /m/04sd0 | https://en.wikipedia.org/wiki/Monty_Python comedy troupe | PERSON | 2% | | Haarlem | LOCATION | 1% | /m/0h095 | https://en.wikipedia.org/wiki/Haarlem Netherlands | LOCATION | 1% | /m/059j2 | https://en.wikipedia.org/wiki/Netherlands
花點時間測試自己提及其他實體的句子。
摘要
在這個步驟中,您能夠執行實體分析!
6. 語法分析
語法分析會擷取語言資訊,將指定的文字內容拆解為一系列的語句和符記 (一般是以字詞邊界為基礎),並針對這些符記提供進一步的分析。這會透過傳回 AnalyzeSyntaxResponse
的 analyze_syntax
方法執行。
將下列程式碼複製到您的 IPython 工作階段:
from typing import Optional
from google.cloud import language
def analyze_text_syntax(text: str) -> language.AnalyzeSyntaxResponse:
client = language.LanguageServiceClient()
document = language.Document(
content=text,
type_=language.Document.Type.PLAIN_TEXT,
)
return client.analyze_syntax(document=document)
def get_token_info(token: Optional[language.Token]) -> list[str]:
parts = [
"tag",
"aspect",
"case",
"form",
"gender",
"mood",
"number",
"person",
"proper",
"reciprocity",
"tense",
"voice",
]
if not token:
return ["token", "lemma"] + parts
text = token.text.content
lemma = token.lemma if token.lemma != token.text.content else ""
info = [text, lemma]
for part in parts:
pos = token.part_of_speech
info.append(getattr(pos, part).name if part in pos else "")
return info
def show_text_syntax(response: language.AnalyzeSyntaxResponse):
import pandas as pd
tokens = len(response.tokens)
sentences = len(response.sentences)
columns = get_token_info(None)
data = (get_token_info(token) for token in response.tokens)
df = pd.DataFrame(columns=columns, data=data)
# Remove empty columns
empty_columns = [col for col in df if df[col].eq("").all()]
df.drop(empty_columns, axis=1, inplace=True)
print(f"Analyzed {tokens} token(s) from {sentences} sentence(s):")
print(df.to_markdown(index=False, tablefmt="presto"))
執行分析:
# Input
text = """Guido van Rossum is best known as the creator of Python.
He was born in Haarlem, Netherlands.
"""
# Send a request to the API
analyze_syntax_response = analyze_text_syntax(text)
# Show the results
show_text_syntax(analyze_syntax_response)
輸出內容應如下所示:
Analyzed 20 token(s) from 2 sentence(s): token | lemma | tag | case | gender | mood | number | person | proper | tense | voice -------------+---------+-------+------------+-----------+------------+----------+----------+----------+---------+--------- Guido | | NOUN | | | | SINGULAR | | PROPER | | van | | NOUN | | | | SINGULAR | | PROPER | | Rossum | | NOUN | | | | SINGULAR | | PROPER | | is | be | VERB | | | INDICATIVE | SINGULAR | THIRD | | PRESENT | best | well | ADV | | | | | | | | known | know | VERB | | | | | | | PAST | as | | ADP | | | | | | | | the | | DET | | | | | | | | creator | | NOUN | | | | SINGULAR | | | | of | | ADP | | | | | | | | Python | | NOUN | | | | SINGULAR | | PROPER | | . | | PUNCT | | | | | | | | He | | PRON | NOMINATIVE | MASCULINE | | SINGULAR | THIRD | | | was | be | VERB | | | INDICATIVE | SINGULAR | THIRD | | PAST | born | bear | VERB | | | | | | | PAST | PASSIVE in | | ADP | | | | | | | | Haarlem | | NOUN | | | | SINGULAR | | PROPER | | , | | PUNCT | | | | | | | | Netherlands | | NOUN | | | | SINGULAR | | PROPER | | . | | PUNCT | | | | | | | |
花點時間測試你的句子與其他語法結構。
如果更深入查看回應洞察,還能找出權杖之間的關係。以下是這個範例的完整語法分析結果,其中包含線上 Natural Language 示範的螢幕截圖:
摘要
在這個步驟中,您能夠執行語法分析了!
7. 內容分類
內容分類會分析文件並傳回符合文件文字內容的類別清單。這會透過傳回 ClassifyTextResponse
的 classify_text
方法執行。
將下列程式碼複製到您的 IPython 工作階段:
from google.cloud import language
def classify_text(text: str) -> language.ClassifyTextResponse:
client = language.LanguageServiceClient()
document = language.Document(
content=text,
type_=language.Document.Type.PLAIN_TEXT,
)
return client.classify_text(document=document)
def show_text_classification(text: str, response: language.ClassifyTextResponse):
import pandas as pd
columns = ["category", "confidence"]
data = ((category.name, category.confidence) for category in response.categories)
df = pd.DataFrame(columns=columns, data=data)
print(f"Text analyzed:\n{text}")
print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))
執行分析:
# Input
text = """Python is an interpreted, high-level, general-purpose programming language.
Created by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace.
"""
# Send a request to the API
classify_text_response = classify_text(text)
# Show the results
show_text_classification(text, classify_text_response)
輸出內容應如下所示:
Text analyzed: Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. category | confidence --------------------------------------+-------------- /Computers & Electronics/Programming | 99% /Science/Computer Science | 99%
花點時間測試您與其他類別相關的句子。請注意,您必須提供至少 20 個符記 (字詞和標點符號) 的文字區塊 (文件)。
摘要
這個步驟可讓您進行內容分類!
8. 文字管理
文字審核功能採用 Google 最新的 PaLM 2 基礎模型,能找出各種有害內容,包括仇恨言論、霸凌和性騷擾。這會透過傳回 ModerateTextResponse
的 moderate_text
方法執行。
將下列程式碼複製到您的 IPython 工作階段:
from google.cloud import language
def moderate_text(text: str) -> language.ModerateTextResponse:
client = language.LanguageServiceClient()
document = language.Document(
content=text,
type_=language.Document.Type.PLAIN_TEXT,
)
return client.moderate_text(document=document)
def show_text_moderation(text: str, response: language.ModerateTextResponse):
import pandas as pd
def confidence(category: language.ClassificationCategory) -> float:
return category.confidence
columns = ["category", "confidence"]
categories = sorted(response.moderation_categories, key=confidence, reverse=True)
data = ((category.name, category.confidence) for category in categories)
df = pd.DataFrame(columns=columns, data=data)
print(f"Text analyzed:\n{text}")
print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))
執行分析:
# Input
text = """I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!
"""
# Send a request to the API
response = moderate_text(text)
# Show the results
show_text_moderation(text, response)
輸出內容應如下所示:
Text analyzed: I have to read Ulysses by James Joyce. I'm a little over halfway through and I hate it. What a pile of garbage! category | confidence -----------------------+-------------- Toxic | 67% Insult | 58% Profanity | 53% Violent | 48% Illicit Drugs | 29% Religion & Belief | 27% Politics | 22% Death, Harm & Tragedy | 21% Finance | 18% Derogatory | 14% Firearms & Weapons | 11% Health | 10% Legal | 10% War & Conflict | 7% Public Safety | 5% Sexual | 4%
花一些時間測試自己的句子。
摘要
在這個步驟中,您可以執行文字審核!
9. 恭喜!
您已學會如何透過 Python 使用 Natural Language API!
清除所用資源
如要清除開發環境,請透過 Cloud Shell 執行下列操作:
- 如果您目前仍在 IPython 工作階段,請返回殼層:
exit
- 停止使用 Python 虛擬環境:
deactivate
- 刪除虛擬環境資料夾:
cd ~ ; rm -rf ./venv-language
如要刪除 Google Cloud 專案,請透過 Cloud Shell 進行:
- 擷取目前的專案 ID:
PROJECT_ID=$(gcloud config get-value core/project)
- 請確認這是要刪除的專案:
echo $PROJECT_ID
- 刪除專案:
gcloud projects delete $PROJECT_ID
瞭解詳情
- 在瀏覽器中測試示範內容:https://cloud.google.com/natural-language#natural-language-api-demo
- Natural Language 說明文件:https://cloud.google.com/natural-language/docs
- 在 Google Cloud 中使用 Python:https://cloud.google.com/python
- Python 適用的 Cloud 用戶端程式庫:https://github.com/googleapis/google-cloud-python
授權
這項內容採用的是創用 CC 姓名標示 2.0 通用授權。