
Using the Natural Language API with Python

1. Overview

The Natural Language API lets you extract information from unstructured text using Google machine learning. In this tutorial, you'll focus on using its Python client library.

What you'll learn

How to set up your environment
How to perform sentiment analysis
How to perform entity analysis
How to perform syntax analysis
How to perform content classification
How to perform text moderation

What you'll need

A Google Cloud project
A browser, such as Chrome or Firefox
Familiarity using Python

2. Setup and requirements

Self-paced environment setup

Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string not used by Google APIs, and you can update it at any time.
The Project ID is unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string, and usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one, or you can try your own and see whether it's available. It can't be changed after this step and remains for the duration of the project.
For your information, there is a third value, the Project Number, which some APIs use. Learn more about all three of these values in the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources and avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you'll use Cloud Shell, a command-line environment running in the Cloud.

Activate Cloud Shell

From the Cloud Console, click Activate Cloud Shell.

If this is your first time starting Cloud Shell, you'll see an intermediate screen describing what it is. If that screen appears, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools you need. It offers a persistent 5 GB home directory and runs in Google Cloud, which greatly enhances network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Note: The gcloud command-line tool is the powerful and unified command-line tool in Google Cloud. It comes preinstalled in Cloud Shell and supports tab completion. You may be prompted to authenticate the first time you run commands. For more information, see the gcloud command-line tool overview.

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Environment setup

Before you can begin using the Natural Language API, run the following command in Cloud Shell to enable the API:

gcloud services enable language.googleapis.com

You should see something like this:

Operation "operations/..." finished successfully.

Now you can use the Natural Language API!

Navigate to your home directory:

cd ~

Create a Python virtual environment to isolate the dependencies:

virtualenv venv-language

Activate the virtual environment:

source venv-language/bin/activate

Install IPython, pandas, tabulate (which pandas needs for to_markdown), and the Natural Language API client library:

pip install ipython pandas tabulate google-cloud-language

You should see something like this:

...
Installing collected packages: ... pandas ... ipython ... google-cloud-language
Successfully installed ... google-cloud-language-2.11.0 ...

Now you're ready to use the Natural Language API client library!

In the next steps, you'll use an interactive Python interpreter called IPython, which you installed in the previous step. Start a session by running ipython in Cloud Shell:

ipython

You should see something like this:

Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
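If you want to confirm from the IPython prompt which package versions ended up in the virtual environment, you can query the installed distribution metadata. This is a minimal sketch that uses only the standard library's importlib.metadata. The installed_version helper is a hypothetical name for illustration and is not part of the client library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


# Check the packages installed in the previous step.
for name in ("google-cloud-language", "pandas", "tabulate", "ipython"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package reports "not installed", it was probably installed into a different interpreter than the one running this session.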

4. Sentiment analysis

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion in it. In particular, it determines whether the expressed sentiment is positive, negative, or neutral, at both the sentence and the document level. It is performed with the analyze_sentiment method, which returns an AnalyzeSentimentResponse.

Copy the following code into your IPython session:

from google.cloud import language

def analyze_text_sentiment(text: str) -> language.AnalyzeSentimentResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_sentiment(document=document)

def show_text_sentiment(response: language.AnalyzeSentimentResponse):
    import pandas as pd

    columns = ["score", "sentence"]
    data = [(s.sentiment.score, s.text.content) for s in response.sentences]
    df_sentence = pd.DataFrame(columns=columns, data=data)

    sentiment = response.document_sentiment
    columns = ["score", "magnitude", "language"]
    data = [(sentiment.score, sentiment.magnitude, response.language)]
    df_document = pd.DataFrame(columns=columns, data=data)

    format_args = dict(index=False, tablefmt="presto", floatfmt="+.1f")
    print(f"At sentence level:\n{df_sentence.to_markdown(**format_args)}")
    print()
    print(f"At document level:\n{df_document.to_markdown(**format_args)}")

Perform an analysis:

# Input
text = """
Python is a very readable language, which makes it easy to understand and maintain code.
It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks.
One disadvantage is its speed: it's not as fast as some other programming languages.
"""

# Send a request to the API
analyze_sentiment_response = analyze_text_sentiment(text)

# Show the results
show_text_sentiment(analyze_sentiment_response)

You should see an output like the following:

At sentence level:
   score | sentence
---------+------------------------------------------------------------------------------------------
    +0.8 | Python is a very readable language, which makes it easy to understand and maintain code.
    +0.9 | It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks.
    -0.4 | One disadvantage is its speed: it's not as fast as some other programming languages.

At document level:
   score |   magnitude | language
---------+-------------+------------
    +0.4 |        +2.2 | en

Take a moment to test your own sentences.

For information on which languages are supported by the Natural Language API, see Language Support.
The sentiment score ranges from -1.0 (negative) to +1.0 (positive) and corresponds to the overall emotional leaning of the given text.
The sentiment magnitude ranges from 0.0 to +inf and indicates the overall strength of emotion in the given text, regardless of whether it is positive or negative. Longer texts with strongly expressed emotion tend to have higher values.
For more information on how to interpret the sentiment score and magnitude values included in the analysis, see Interpreting sentiment analysis values.
Each API response returns the automatically detected document language (as an ISO-639-1 code). It is shown here and skipped in the next analysis examples.
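Score and magnitude are most useful together. A score near zero can mean either neutral text or strong emotions that cancel out, and magnitude helps tell these apart. The sketch below maps a (score, magnitude) pair to a coarse label. The 0.25 and 1.0 thresholds are illustrative assumptions rather than values defined by the API, and you would tune them for your own data.

```python
def sentiment_label(score: float, magnitude: float,
                    score_threshold: float = 0.25,
                    mixed_magnitude: float = 1.0) -> str:
    """Map a sentiment score/magnitude pair to a coarse label.

    The default thresholds are hypothetical, chosen only for illustration.
    """
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Score close to zero: a lot of emotion overall suggests positive and
    # negative statements cancelling out ("mixed"); little emotion means "neutral".
    if magnitude >= mixed_magnitude:
        return "mixed"
    return "neutral"


# Document-level values from the table above (score +0.4, magnitude +2.2):
print(sentiment_label(0.4, 2.2))
```

In the IPython session, you could pass analyze_sentiment_response.document_sentiment.score and analyze_sentiment_response.document_sentiment.magnitude, or apply it to each sentence's sentiment in response.sentences.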

Summary

In this step, you were able to perform sentiment analysis on a string of text!

5. Entity analysis

Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities. It is performed with the analyze_entities method, which returns an AnalyzeEntitiesResponse.

Copy the following code into your IPython session:

from google.cloud import language

def analyze_text_entities(text: str) -> language.AnalyzeEntitiesResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_entities(document=document)

def show_text_entities(response: language.AnalyzeEntitiesResponse):
    import pandas as pd

    columns = ("name", "type", "salience", "mid", "wikipedia_url")
    data = (
        (
            entity.name,
            entity.type_.name,
            entity.salience,
            entity.metadata.get("mid", ""),
            entity.metadata.get("wikipedia_url", ""),
        )
        for entity in response.entities
    )
    df = pd.DataFrame(columns=columns, data=data)
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:

# Input
text = """Guido van Rossum is best known as the creator of Python,
which he named after the Monty Python comedy troupe.
He was born in Haarlem, Netherlands.
"""

# Send a request to the API
analyze_entities_response = analyze_text_entities(text)

# Show the results
show_text_entities(analyze_entities_response)

You should see an output like the following:

 name             | type         |   salience | mid       | wikipedia_url
------------------+--------------+------------+-----------+-------------------------------------------------------------
 Guido van Rossum | PERSON       |        50% | /m/01h05c | https://en.wikipedia.org/wiki/Guido_van_Rossum
 Python           | ORGANIZATION |        38% | /m/05z1_  | https://en.wikipedia.org/wiki/Python_(programming_language)
 creator          | PERSON       |         5% |           |
 Monty Python     | PERSON       |         3% | /m/04sd0  | https://en.wikipedia.org/wiki/Monty_Python
 comedy troupe    | PERSON       |         2% |           |
 Haarlem          | LOCATION     |         1% | /m/0h095  | https://en.wikipedia.org/wiki/Haarlem
 Netherlands      | LOCATION     |         1% | /m/059j2  | https://en.wikipedia.org/wiki/Netherlands

Take a moment to test your own sentences mentioning other entities.

For information on which languages are supported by this method, see Language Support.
The type of an entity is an enum that lets you classify or differentiate entities. For example, it can help distinguish the similarly named "T.E. Lawrence" (a PERSON) from "Lawrence of Arabia" (the film, tagged as a WORK_OF_ART). See Entity.Type.
The entity salience indicates how important or relevant the entity is to the entire document text. This score can assist information retrieval and summarization by prioritizing salient entities. Scores closer to 0.0 indicate less important entities, while scores closer to 1.0 indicate highly important ones.
For more information, see Entity analysis.
You can also combine entity analysis and sentiment analysis with the analyze_entity_sentiment method. See Entity sentiment analysis.
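Because salience ranks entities by relevance, a common follow-up is to keep only the most salient ones. The sketch below groups entity names by type and drops entities below a cut-off. It only reads name, type_.name, and salience, which each element of response.entities provides. The helper name and the 0.10 default cut-off are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Iterable


def salient_entities(entities: Iterable, min_salience: float = 0.10) -> dict[str, list[str]]:
    """Group entity names by type, keeping entities with salience >= min_salience.

    Each item is expected to behave like an element of
    AnalyzeEntitiesResponse.entities: it has .name, .type_.name and .salience.
    """
    grouped = defaultdict(list)
    # Visit entities from most to least salient so each group is ordered by relevance.
    for entity in sorted(entities, key=lambda e: e.salience, reverse=True):
        if entity.salience >= min_salience:
            grouped[entity.type_.name].append(entity.name)
    return dict(grouped)


# In the IPython session:
# salient_entities(analyze_entities_response.entities)
```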

Summary

In this step, you were able to perform entity analysis!

6. Syntax analysis

Syntax analysis extracts linguistic information by breaking up the given text into a series of sentences and tokens (generally based on word boundaries) and providing further analysis of those tokens. It is performed with the analyze_syntax method, which returns an AnalyzeSyntaxResponse.

Copy the following code into your IPython session:

from typing import Optional
from google.cloud import language

def analyze_text_syntax(text: str) -> language.AnalyzeSyntaxResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_syntax(document=document)

def get_token_info(token: Optional[language.Token]) -> list[str]:
    parts = [
        "tag",
        "aspect",
        "case",
        "form",
        "gender",
        "mood",
        "number",
        "person",
        "proper",
        "reciprocity",
        "tense",
        "voice",
    ]
    if not token:
        return ["token", "lemma"] + parts

    text = token.text.content
    lemma = token.lemma if token.lemma != token.text.content else ""
    info = [text, lemma]
    for part in parts:
        pos = token.part_of_speech
        info.append(getattr(pos, part).name if part in pos else "")

    return info

def show_text_syntax(response: language.AnalyzeSyntaxResponse):
    import pandas as pd

    tokens = len(response.tokens)
    sentences = len(response.sentences)
    columns = get_token_info(None)
    data = (get_token_info(token) for token in response.tokens)
    df = pd.DataFrame(columns=columns, data=data)
    # Remove empty columns
    empty_columns = [col for col in df if df[col].eq("").all()]
    df.drop(empty_columns, axis=1, inplace=True)

    print(f"Analyzed {tokens} token(s) from {sentences} sentence(s):")
    print(df.to_markdown(index=False, tablefmt="presto"))

Perform an analysis:

# Input
text = """Guido van Rossum is best known as the creator of Python.
He was born in Haarlem, Netherlands.
"""

# Send a request to the API
analyze_syntax_response = analyze_text_syntax(text)

# Show the results
show_text_syntax(analyze_syntax_response)

You should see an output like the following:

Analyzed 20 token(s) from 2 sentence(s):
 token       | lemma   | tag   | case       | gender    | mood       | number   | person   | proper   | tense   | voice
-------------+---------+-------+------------+-----------+------------+----------+----------+----------+---------+---------
 Guido       |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 van         |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 Rossum      |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 is          | be      | VERB  |            |           | INDICATIVE | SINGULAR | THIRD    |          | PRESENT |
 best        | well    | ADV   |            |           |            |          |          |          |         |
 known       | know    | VERB  |            |           |            |          |          |          | PAST    |
 as          |         | ADP   |            |           |            |          |          |          |         |
 the         |         | DET   |            |           |            |          |          |          |         |
 creator     |         | NOUN  |            |           |            | SINGULAR |          |          |         |
 of          |         | ADP   |            |           |            |          |          |          |         |
 Python      |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 .           |         | PUNCT |            |           |            |          |          |          |         |
 He          |         | PRON  | NOMINATIVE | MASCULINE |            | SINGULAR | THIRD    |          |         |
 was         | be      | VERB  |            |           | INDICATIVE | SINGULAR | THIRD    |          | PAST    |
 born        | bear    | VERB  |            |           |            |          |          |          | PAST    | PASSIVE
 in          |         | ADP   |            |           |            |          |          |          |         |
 Haarlem     |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 ,           |         | PUNCT |            |           |            |          |          |          |         |
 Netherlands |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 .           |         | PUNCT |            |           |            |          |          |          |         |

Take a moment to test your own sentences with other syntactic structures.

If you look deeper into the response, you'll also find the relationships between the tokens (each token's dependency_edge). The online Natural Language demo shows a visual interpretation of the complete syntax analysis for this example.
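In an AnalyzeSyntaxResponse, each token has a dependency_edge with a head_token_index and a label. The head_token_index is the position, in the returned token list, of the token this one depends on. According to the API reference, a root token's head index points to itself. The sketch below prints each token with its head and label, then returns the root tokens. The function name is hypothetical. Because it only reads these attributes, you can run it on response.tokens or on simple stand-in objects.

```python
from typing import Sequence


def show_dependencies(tokens: Sequence) -> list[str]:
    """Print token -> head relationships and return the text of the root token(s).

    Each item is expected to look like language.Token: it has .text.content and
    .dependency_edge with .head_token_index and .label.name.
    """
    roots = []
    for index, token in enumerate(tokens):
        edge = token.dependency_edge
        head = tokens[edge.head_token_index]
        if edge.head_token_index == index:
            # A token whose head is itself is the root of its sentence.
            roots.append(token.text.content)
        print(f"{token.text.content:>12} --{edge.label.name}--> {head.text.content}")
    return roots


# In the IPython session:
# show_dependencies(analyze_syntax_response.tokens)
```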

Summary

In this step, you were able to perform syntax analysis!

7. Content classification

Content classification analyzes a document and returns a list of content categories that apply to the text found in the document. It is performed with the classify_text method, which returns a ClassifyTextResponse.

Copy the following code into your IPython session:

from google.cloud import language

def classify_text(text: str) -> language.ClassifyTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.classify_text(document=document)

def show_text_classification(text: str, response: language.ClassifyTextResponse):
    import pandas as pd

    columns = ["category", "confidence"]
    data = ((category.name, category.confidence) for category in response.categories)
    df = pd.DataFrame(columns=columns, data=data)

    print(f"Text analyzed:\n{text}")
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:

# Input
text = """Python is an interpreted, high-level, general-purpose programming language.
Created by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace.
"""

# Send a request to the API
classify_text_response = classify_text(text)

# Show the results
show_text_classification(text, classify_text_response)

You should see an output like the following:

Text analyzed:
Python is an interpreted, high-level, general-purpose programming language.
Created by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace.

 category                             |   confidence
--------------------------------------+--------------
 /Computers & Electronics/Programming |          99%
 /Science/Computer Science            |          99%
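As the names above show, categories are slash-separated paths that run from general to specific. If you only need the top-level category, you can split the path. The sketch below keeps the highest confidence seen for each top-level category. The helper name is hypothetical, and it only reads name and confidence from each element of response.categories.

```python
def top_level_categories(categories) -> dict[str, float]:
    """Return the best confidence seen for each top-level category.

    Each item is expected to look like ClassificationCategory: .name and .confidence.
    """
    best: dict[str, float] = {}
    for category in categories:
        # "/Computers & Electronics/Programming" -> "Computers & Electronics"
        top = category.name.strip("/").split("/")[0]
        best[top] = max(best.get(top, 0.0), category.confidence)
    return best


# In the IPython session:
# top_level_categories(classify_text_response.categories)
```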

Take a moment to test your own sentences relating to other categories. Note that you must supply a text block (document) with at least twenty tokens (words and punctuation marks).
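Because classify_text needs at least twenty tokens, you may want a cheap local check before sending a request. The sketch below counts runs of word characters plus individual punctuation marks with a regular expression. This only approximates the API's own tokenizer, so treat counts near the threshold as uncertain. Both helper names are hypothetical.

```python
import re

# Approximate tokens: runs of word characters, or single non-space punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def approximate_token_count(text: str) -> int:
    """Roughly count words and punctuation marks in text."""
    return len(TOKEN_PATTERN.findall(text))


def long_enough_to_classify(text: str, minimum: int = 20) -> bool:
    """Return True if text likely meets the twenty-token minimum for classify_text."""
    return approximate_token_count(text) >= minimum


print(approximate_token_count("Hello, world!"))  # 4
```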

Summary

In this step, you were able to perform content classification!

8. Text moderation

Powered by Google's latest PaLM 2 foundation model, text moderation identifies a wide range of harmful content, including hate speech, bullying, and sexual harassment. It is performed with the moderate_text method, which returns a ModerateTextResponse.

Copy the following code into your IPython session:

from google.cloud import language

def moderate_text(text: str) -> language.ModerateTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.moderate_text(document=document)

def show_text_moderation(text: str, response: language.ModerateTextResponse):
    import pandas as pd

    def confidence(category: language.ClassificationCategory) -> float:
        return category.confidence

    columns = ["category", "confidence"]
    categories = sorted(response.moderation_categories, key=confidence, reverse=True)
    data = ((category.name, category.confidence) for category in categories)
    df = pd.DataFrame(columns=columns, data=data)

    print(f"Text analyzed:\n{text}")
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:

# Input
text = """I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!
"""

# Send a request to the API
response = moderate_text(text)

# Show the results
show_text_moderation(text, response)

You should see an output like the following:

Text analyzed:
I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!

 category              |   confidence
-----------------------+--------------
 Toxic                 |          67%
 Insult                |          58%
 Profanity             |          53%
 Violent               |          48%
 Illicit Drugs         |          29%
 Religion & Belief     |          27%
 Politics              |          22%
 Death, Harm & Tragedy |          21%
 Finance               |          18%
 Derogatory            |          14%
 Firearms & Weapons    |          11%
 Health                |          10%
 Legal                 |          10%
 War & Conflict        |           7%
 Public Safety         |           5%
 Sexual                |           4%

Take a moment to test your own sentences.
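The moderation response returns a confidence for every category, so in practice you usually choose a threshold and act only on the categories above it. In the sketch below, the 0.5 threshold is an illustrative assumption and not an API default. The helper name is hypothetical, and each item only needs name and confidence, as in response.moderation_categories.

```python
def flagged_categories(categories, threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (name, confidence) pairs at or above threshold, most confident first.

    The default threshold is a hypothetical value for illustration only.
    """
    hits = [(c.name, c.confidence) for c in categories if c.confidence >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# In the IPython session:
# flagged_categories(response.moderation_categories)
```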

Summary

In this step, you were able to perform text moderation!

9. Congratulations!

You learned how to use the Natural Language API with Python!

Clean up

To clean up your development environment, do the following from Cloud Shell:

If you're still in your IPython session, go back to the shell: exit
Stop using the Python virtual environment: deactivate
Delete your virtual environment folder: cd ~ ; rm -rf ./venv-language

To delete your Google Cloud project, do the following from Cloud Shell:

Retrieve your current project ID: PROJECT_ID=$(gcloud config get-value core/project)
Make sure this is the project you want to delete: echo $PROJECT_ID
Delete the project: gcloud projects delete $PROJECT_ID

Learn more

Test the demo in your browser: https://cloud.google.com/natural-language#natural-language-api-demo
Natural Language documentation: https://cloud.google.com/natural-language/docs
Python on Google Cloud: https://cloud.google.com/python
Cloud Client Libraries for Python: https://github.com/googleapis/google-cloud-python

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.