Building agile safety classifiers with Gemma

Building agile classifiers for unsafe content using Gemma

About this codelab

Last updated: Sep 27, 2024

Written by a Googler

1. Overview

In this codelab, you'll learn how to build a custom text classifier using parameter-efficient tuning (PET). Instead of fine-tuning the whole model, PET methods update only a small number of parameters, which makes training relatively easy and fast. It also makes it easier for the model to learn new behaviors from relatively little training data. The methodology is described in detail in Towards Agile Text Classifiers for Everyone, which shows how these techniques can be applied to a variety of safety tasks to reach state-of-the-art performance with only a few hundred training examples.

This codelab uses the LoRA PET method and the smaller Gemma model (gemma_instruct_2b_en), because it runs faster and more efficiently. The Colab covers how to ingest the data, format it for the LLM, train the LoRA weights, and evaluate the results. The codelab trains on the ETHOS dataset, a publicly available dataset for hate speech detection built from YouTube and Reddit comments. When trained on only 200 examples (one quarter of the 800-example training set), the model reaches an F1 of 0.80 and a ROC-AUC of 0.78, slightly above the SOTA currently reported on the leaderboard (at the time of writing, February 15, 2024). When trained on the full 800 examples, it reaches an F1 score of 83.74 and a ROC-AUC score of 88.17 (that is, 0.8374 and 0.8817 on the 0-to-1 scale used above). Larger models, such as gemma_instruct_7b_en, generally perform better, but they are also more expensive to train and run.

Warning: Because this codelab builds a safety classifier for detecting hate speech, the examples and the evaluation of the results contain offensive language.

2. Installation and setup

To download the base model, you'll need recent versions of keras (3) and keras-nlp (0.8.0), as well as a Kaggle account.

!pip install -q -U keras-nlp
!pip install -q -U keras

To log in to Kaggle, you can either store your kaggle.json credentials file at ~/.kaggle/kaggle.json, or run the following in a Colab environment:

import kagglehub

kagglehub.login()

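This codelab was tested using TensorFlow as the Keras backend, but you can use TensorFlow, PyTorch, or JAX. Note that the fine-tuning step later builds its input pipeline with tf.data, so TensorFlow must be installed whichever backend you choose. Set the backend as follows: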

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

3. Load the ETHOS dataset

In this section, you'll load the dataset on which to train the classifier and preprocess it into a training set and a test set. You'll use the popular research dataset ETHOS, which was collected to detect hate speech on social media. More information about how the dataset was collected is available in the paper ETHOS: an Online Hate Speech Detection Dataset.

import pandas as pd

gh_root = 'https://raw.githubusercontent.com'
gh_repo = 'intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset'
gh_path = 'master/ethos/ethos_data/Ethos_Dataset_Binary.csv'
data_url = f'{gh_root}/{gh_repo}/{gh_path}'

df = pd.read_csv(data_url, delimiter=';')
# Binarize the continuous isHate score: comments at or above the median are labeled hateful (1).
df['hateful'] = (df['isHate'] >= df['isHate'].median()).astype(int)

# Shuffle the dataset.
df = df.sample(frac=1, random_state=32)

# Split into train and test: the first 800 shuffled rows for training, the rest for testing.
df_train, df_test = df[:800], df[800:]

# Display a sample of the data.
df.head(5)[['hateful', 'comment']]

The output will look something like this:

	hateful	comment
0	0	You said he but still not convinced this is a ...
1	0	well, looks like its time to have another child.
2	0	What if we send every men to mars to start a n...
3	1	It doesn't matter if you're black or white, ...
4	0	Who ever disliked this video should be ashamed...
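The labeling and splitting steps in the loading code above can be sketched without pandas. The snippet below is a minimal illustration on hypothetical toy scores (the `scores` values are made up, not taken from ETHOS), and `binarize_at_median` and `shuffle_and_split` are illustrative helper names. It shows how thresholding at the median produces a roughly balanced binary label, and how a fixed-size positional split works after shuffling. Python's `random` shuffle will not reproduce the row order that pandas produces with `random_state=32`.

```python
import random
import statistics


def binarize_at_median(scores: list[float]) -> list[int]:
    """Label a score 1 if it is at or above the median, else 0 (mirrors the df['hateful'] step)."""
    threshold = statistics.median(scores)
    return [int(score >= threshold) for score in scores]


def shuffle_and_split(rows: list, n_train: int, seed: int) -> tuple[list, list]:
    """Shuffle a copy of rows with a fixed seed, then take the first n_train rows for training."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical isHate-style scores in [0, 1].
scores = [0.0, 0.1, 0.4, 0.6, 0.9, 1.0]
labels = binarize_at_median(scores)
print(labels)  # [0, 0, 0, 1, 1, 1]

train, test = shuffle_and_split(list(zip(scores, labels)), n_train=4, seed=32)
print(len(train), len(test))  # 4 2
```

Because the comparison uses `>=`, every score tied with the median lands in the positive class, so the classes are only approximately balanced when many scores equal the median.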

4. Download and instantiate the model

As described in the documentation, the Gemma model can be used in many ways. With Keras, this is what you need to do:

import keras
import keras_nlp

# For reproducibility purposes.
keras.utils.set_random_seed(1234)

# Download the model from Kaggle using Keras.
model = keras_nlp.models.GemmaCausalLM.from_preset('gemma_instruct_2b_en')

# Set the sequence length to a small enough value to fit in memory in Colab.
model.preprocessor.sequence_length = 128

To check that the model works, you can have it generate some text:

model.generate('Question: what is the capital of France? ', max_length=32)

5. Text preprocessing and separator tokens

To help the model better understand our intent, you can preprocess the text and use separator tokens. This makes it less likely that the model generates text that doesn't fit the expected format. For example, you might try requesting a sentiment classification from the model by writing a prompt like this:

Classify the following text into one of the following classes:[Positive,Negative]

Text: you look very nice today
Classification:

In this case, the model may or may not output what you're looking for. For example, if the text contains newline characters, that's likely to hurt the model's performance, because the newlines blur where the text ends and the rest of the prompt begins. A more robust approach is to use separator tokens. The prompt then becomes:

Classify the following text into one of the following classes:[Positive,Negative]
<separator>
Text: you look very nice today
<separator>
Prediction:

To abstract this, you can use a function that preprocesses the text:

def preprocess_text(
    text: str,
    labels: list[str],
    instructions: str,
    separator: str,
) -> str:
  prompt = f'{instructions}:[{",".join(labels)}]'
  return separator.join([prompt, f'Text:{text}', 'Prediction:'])

Now, if you run the function with the same prompt and text as before, you should get the same output:

text = 'you look very nice today'

prompt = preprocess_text(
    text=text,
    labels=['Positive', 'Negative'],
    instructions='Classify the following text into one of the following classes',
    separator='\n<separator>\n',
)

print(prompt)

The output should be:

Classify the following text into one of the following classes:[Positive,Negative]
<separator>
Text:you look very nice today
<separator>
Prediction:
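To see why separators make the format more robust, the sketch below repeats `preprocess_text` (copied from above so the snippet runs on its own) and feeds it a hypothetical comment that contains a newline. Splitting the prompt on the separator recovers the instruction, the text, and the `Prediction:` slot intact, which is not possible when newlines alone delimit the sections. This assumes the comment itself never contains the separator string.

```python
def preprocess_text(text: str, labels: list[str], instructions: str, separator: str) -> str:
    # Same logic as preprocess_text above, repeated so this snippet runs on its own.
    prompt = f'{instructions}:[{",".join(labels)}]'
    return separator.join([prompt, f'Text:{text}', 'Prediction:'])


separator = '\n<separator>\n'
# Hypothetical comment that contains its own newline.
text = 'first line\nsecond line'
prompt = preprocess_text(
    text=text,
    labels=['Positive', 'Negative'],
    instructions='Classify the following text into one of the following classes',
    separator=separator,
)

# Splitting on the separator recovers the three sections intact...
instructions_part, text_part, prediction_part = prompt.split(separator)
print(repr(text_part))  # 'Text:first line\nsecond line'
# ...whereas splitting on newlines alone yields 6 fragments instead of 3.
print(len(prompt.splitlines()))  # 6
```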

6. Output post-processing

The model outputs tokens with various probabilities. Normally, to generate text, you would select among the top few most probable tokens and build sentences, paragraphs, or even full documents. For classification, however, what actually matters is whether the model believes that Positive is more probable than Negative, or vice versa.

Given the model you instantiated earlier, this is how you can process its output into the relative probabilities of whether the next token is Positive or Negative:

import numpy as np


def compute_output_probability(
    model: keras_nlp.models.GemmaCausalLM,
    prompt: str,
    target_classes: list[str],
) -> dict[str, float]:
  # Shorthands.
  preprocessor = model.preprocessor
  tokenizer = preprocessor.tokenizer

  # NOTE: If a token is not found, it will be considered same as "<unk>".
  token_unk = tokenizer.token_to_id('<unk>')

  # Identify the token indices, which is the same as the ID for this tokenizer.
  token_ids = [tokenizer.token_to_id(word) for word in target_classes]

  # Throw an error if one of the classes maps to a token outside the vocabulary.
  if any(token_id == token_unk for token_id in token_ids):
    raise ValueError('One of the target classes is not in the vocabulary.')

  # Preprocess the prompt in a single batch. This is done one sample at a time
  # for illustration purposes, but it would be more efficient to batch prompts.
  preprocessed = model.preprocessor.generate_preprocess([prompt])

  # Identify output token offset.
  padding_mask = preprocessed["padding_mask"]
  token_offset = keras.ops.sum(padding_mask) - 1

  # Score outputs, extract only the next token's logits.
  vocab_logits = model.score(
      token_ids=preprocessed["token_ids"],
      padding_mask=padding_mask,
  )[0][token_offset]

  # Compute the relative probability of each of the requested tokens.
  token_logits = [vocab_logits[ix] for ix in token_ids]
  logits_tensor = keras.ops.convert_to_tensor(token_logits)
  probabilities = keras.activations.softmax(logits_tensor)

  # convert_to_numpy works with any Keras backend (TensorFlow, JAX, or PyTorch).
  return dict(zip(target_classes, keras.ops.convert_to_numpy(probabilities)))

You can test this function by running it with the prompt you created earlier:

compute_output_probability(
    model=model,
    prompt=prompt,
    target_classes=['Positive', 'Negative'],
)

The output will be similar to this:

{'Positive': 0.99994016, 'Negative': 5.984089e-05}
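The last step of `compute_output_probability` is a softmax over only the logits of the candidate tokens, so the returned values are probabilities relative to each other (they sum to 1), not probabilities over the full vocabulary. The pure-Python sketch below illustrates that step with a hypothetical 5-token vocabulary and made-up logits; `relative_class_probabilities` is an illustrative name, not part of KerasNLP.

```python
import math


def relative_class_probabilities(
    vocab_logits: list[float],
    token_ids: list[int],
    class_names: list[str],
) -> dict[str, float]:
    """Softmax restricted to the logits of the candidate class tokens."""
    selected = [vocab_logits[i] for i in token_ids]
    # Subtracting the max logit improves numerical stability and does not change the result.
    max_logit = max(selected)
    exps = [math.exp(logit - max_logit) for logit in selected]
    total = sum(exps)
    return {name: e / total for name, e in zip(class_names, exps)}


# Hypothetical vocabulary: suppose "Positive" has ID 1 and "Negative" has ID 3.
vocab_logits = [2.0, 3.0, -1.0, 1.0, 0.5]
probs = relative_class_probabilities(vocab_logits, [1, 3], ['Positive', 'Negative'])
print({k: round(v, 3) for k, v in probs.items()})  # {'Positive': 0.881, 'Negative': 0.119}
```

The logits of all other tokens are ignored, so the classifier always yields a Positive-versus-Negative decision even when the model would most likely generate some other token next.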

7. Wrapping it all up as a classifier

For ease of use, you can wrap all the functions you created into a single sklearn-like classifier with familiar, easy-to-use methods such as predict() and predict_score().

import dataclasses


@dataclasses.dataclass(frozen=True)
class AgileClassifier:
  """Agile classifier to be wrapped around a LLM."""

  # The classes whose probability will be predicted.
  labels: tuple[str, ...]

  # Provide default instructions and control tokens, can be overridden by user.
  instructions: str = 'Classify the following text into one of the following classes'
  separator_token: str = '<separator>'
  end_of_text_token: str = '<eos>'

  def encode_for_prediction(self, x_text: str) -> str:
    return preprocess_text(
        text=x_text,
        labels=self.labels,
        instructions=self.instructions,
        separator=self.separator_token,
    )

  def encode_for_training(self, x_text: str, y: int) -> str:
    return ''.join([
        self.encode_for_prediction(x_text),
        self.labels[y],
        self.end_of_text_token,
    ])

  def predict_score(
      self,
      model: keras_nlp.models.GemmaCausalLM,
      x_text: str,
  ) -> list[float]:
    prompt = self.encode_for_prediction(x_text)
    token_probabilities = compute_output_probability(
        model=model,
        prompt=prompt,
        target_classes=self.labels,
    )
    return [token_probabilities[token] for token in self.labels]

  def predict(
      self,
      model: keras_nlp.models.GemmaCausalLM,
      x_eval: str,
  ) -> int:
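    return int(np.argmax(self.predict_score(model, x_eval)))


# Label index 0 ('Positive') means not hateful and index 1 ('Negative') means
# hateful, matching the 0/1 values of df['hateful'].
agile_classifier = AgileClassifier(labels=('Positive', 'Negative'))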
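To make the train/predict encoding concrete, the sketch below reproduces the string that `encode_for_training` builds for one hypothetical comment. So that it runs without a model, it repeats `preprocess_text` and uses `EncodingSketch`, a hypothetical trimmed-down stand-in for `AgileClassifier` that keeps only the two encoding methods. Label index 1 maps to `'Negative'`, so hateful comments (`hateful == 1`) are trained to complete the prompt with `Negative`.

```python
import dataclasses


def preprocess_text(text, labels, instructions, separator):
    # Same logic as preprocess_text above, repeated so this snippet runs on its own.
    prompt = f'{instructions}:[{",".join(labels)}]'
    return separator.join([prompt, f'Text:{text}', 'Prediction:'])


@dataclasses.dataclass(frozen=True)
class EncodingSketch:
    """Hypothetical stand-in for AgileClassifier's encoding methods (no model needed)."""

    labels: tuple[str, ...]
    instructions: str = 'Classify the following text into one of the following classes'
    separator_token: str = '<separator>'
    end_of_text_token: str = '<eos>'

    def encode_for_prediction(self, x_text: str) -> str:
        return preprocess_text(x_text, self.labels, self.instructions, self.separator_token)

    def encode_for_training(self, x_text: str, y: int) -> str:
        # The training target is the label token followed by the end-of-text token.
        return self.encode_for_prediction(x_text) + self.labels[y] + self.end_of_text_token


sketch = EncodingSketch(labels=('Positive', 'Negative'))
print(sketch.encode_for_training('a hypothetical comment', 1))
```

Because prediction uses the same prefix without the label, the model only has to learn which label token follows `Prediction:`.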

8. Fine-tuning the model

LoRA stands for Low-Rank Adaptation. It's a fine-tuning technique that can be used to efficiently fine-tune large language models. You can read more about it in the paper LoRA: Low-Rank Adaptation of Large Language Models.
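The core idea of LoRA can be sketched in a few lines: a frozen weight matrix W0 (d_out × d_in) is augmented with the product of two small trainable matrices B (d_out × r) and A (r × d_in), so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. This is a simplified illustration, not how Keras implements `enable_lora()`: it omits the α/r scaling factor from the LoRA paper, and the 2048 × 2048 layer size is a hypothetical example. As in the paper, B starts at zero, so the adapted layer initially behaves exactly like the frozen one.

```python
import random


def matmul(left: list[list[float]], right: list[list[float]]) -> list[list[float]]:
    """Multiply an m x k matrix by a k x n matrix (both given as lists of rows)."""
    return [
        [sum(left[i][t] * right[t][j] for t in range(len(right))) for j in range(len(right[0]))]
        for i in range(len(left))
    ]


def lora_effective_weight(w0, a, b):
    """Return W0 + B @ A, the weight the adapted layer effectively applies (scaling omitted)."""
    delta = matmul(b, a)
    return [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))] for i in range(len(w0))]


def trainable_params(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Trainable parameters for full fine-tuning vs. a rank-`rank` LoRA adapter."""
    return d_in * d_out, rank * (d_in + d_out)


# Hypothetical 2048 x 2048 projection with rank 4 (the rank used in this codelab).
full, lora = trainable_params(d_in=2048, d_out=2048, rank=4)
print(full, lora, f'{lora / full:.2%}')  # 4194304 16384 0.39%

# Tiny example: with B initialized to zero, the adapted layer starts out identical to W0.
rng = random.Random(0)
d_in, d_out, r = 3, 2, 1
w0 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]  # A: r x d_in
b = [[0.0] * r for _ in range(d_out)]  # B: d_out x r
print(lora_effective_weight(w0, a, b) == w0)  # True
```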

The Keras implementation of Gemma provides an enable_lora() method that you can use for fine-tuning:

# Enable LoRA for the model and set the LoRA rank to 4.
model.backbone.enable_lora(rank=4)

With LoRA enabled, you can start the fine-tuning process. It takes approximately 5 minutes per epoch on Colab:

import tensorflow as tf

# Create dataset with preprocessed text + labels: one training string per (comment, hateful) pair.
x_train = [
    agile_classifier.encode_for_training(comment, hateful)
    for comment, hateful in df_train[['comment', 'hateful']].values
]
ds_train = tf.data.Dataset.from_tensor_slices(x_train).batch(2)

# Compile the model using the Adam optimizer and appropriate loss function.
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=0.0005),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Begin training.
model.fit(ds_train, epochs=4)

Training for more epochs increases training time, but tends to improve accuracy until the model starts to overfit.
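As a rough budgeting aid, the number of optimizer steps follows directly from the dataset size and batch size in the training code (800 training examples, `batch(2)`, `epochs=4`). The `training_budget` helper below is a hypothetical convenience, not part of Keras, and the 5-minutes-per-epoch figure is the approximate Colab timing quoted above; actual times vary with hardware.

```python
import math


def training_budget(num_examples: int, batch_size: int, epochs: int, minutes_per_epoch: float):
    """Return (steps per epoch, total steps, approximate total minutes)."""
    # tf.data's batch() keeps the final partial batch by default, hence the ceiling.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs, minutes_per_epoch * epochs


print(training_budget(num_examples=800, batch_size=2, epochs=4, minutes_per_epoch=5))
# (400, 1600, 20)
```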

9. Inspecting the results

Now you can inspect the output of the agile classifier you just trained. This code outputs the predicted class scores for a given piece of text:

text = 'you look really nice today'
scores = agile_classifier.predict_score(model, text)
dict(zip(agile_classifier.labels, scores))

{'Positive': 0.99899644, 'Negative': 0.0010035498}

10. Model evaluation

Finally, we'll evaluate the model's performance using two common metrics: the F1 score and AUC-ROC. The F1 score captures both false negative and false positive errors by computing the harmonic mean of precision and recall at a specific classification threshold. AUC-ROC, on the other hand, captures the tradeoff between the true positive rate and the false positive rate across a range of thresholds, and computes the area under that curve.

from sklearn.metrics import f1_score, roc_auc_score

y_true = df_test['hateful'].values
# Compute the scores (aka probabilities) for each of the labels.
y_score = [agile_classifier.predict_score(model, x) for x in df_test['comment']]
# The label with highest score is considered the predicted class.
y_pred = np.argmax(y_score, axis=1)
# Extract the probability of a comment being considered hateful.
y_prob = [x[agile_classifier.labels.index('Negative')] for x in y_score]

# Compute F1 and AUC-ROC scores.
print(f'F1: {f1_score(y_true, y_pred):.2f}')
print(f'AUC-ROC: {roc_auc_score(y_true, y_prob):.2f}')

F1: 0.84
AUC-ROC: 0.88
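To make the F1 definition concrete, here is a minimal pure-Python sketch that thresholds hypothetical hateful-class probabilities and computes precision, recall, and their harmonic mean. It only illustrates the formula; in the evaluation code, `sklearn.metrics.f1_score` does this work. With two classes whose scores sum to 1, taking the argmax as the evaluation code does is equivalent to predicting "hateful" when its probability exceeds 0.5, which is the default threshold here.

```python
def f1_at_threshold(y_true: list[int], y_prob: list[float], threshold: float = 0.5) -> float:
    """F1 for the positive (hateful) class when predicting 1 iff prob > threshold."""
    y_pred = [int(p > threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predicted probabilities of the hateful class.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
print(round(f1_at_threshold(y_true, y_prob), 3))  # 0.667
print(round(f1_at_threshold(y_true, y_prob, threshold=0.3), 3))  # 0.857
```

Lowering the threshold here trades a false negative for nothing worse than the existing false positive, so F1 rises; on other data the same change could lower it, which is why F1 is always reported at a specific threshold.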

Another useful way to assess the model's predictions is with a confusion matrix. A confusion matrix visually shows the different kinds of prediction errors.

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(
    confusion_matrix=cm,
    display_labels=agile_classifier.labels,
).plot()

[Figure: confusion matrix]
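For reference, the matrix that `confusion_matrix` plots can be tallied by hand. The sketch below follows the layout scikit-learn uses for labels 0 and 1 (rows are true classes, columns are predicted classes) and runs on hypothetical labels; `binary_confusion_matrix` is an illustrative name. With this codelab's label order, row/column 0 corresponds to `Positive` (not hateful) and 1 to `Negative` (hateful).

```python
def binary_confusion_matrix(y_true: list[int], y_pred: list[int]) -> list[list[int]]:
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predicted labels."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(binary_confusion_matrix(y_true, y_pred))  # [[2, 1], [1, 2]]
```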

Finally, you can also look at the ROC curve to get a sense of the potential prediction errors at different scoring thresholds.

from sklearn.metrics import RocCurveDisplay, roc_curve

fpr, tpr, _ = roc_curve(y_true, y_prob, pos_label=1)
RocCurveDisplay(fpr=fpr, tpr=tpr).plot()

[Figure: ROC curve]
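The ROC curve and its area can also be computed without scikit-learn, which makes the threshold sweep explicit. This sketch sorts examples by score, lowers the threshold one distinct score at a time, records (FPR, TPR) points, and integrates them with the trapezoidal rule. It assumes both classes are present and uses hypothetical data; `roc_points` and `auc_trapezoid` are illustrative names. scikit-learn's `roc_curve` additionally drops some collinear points by default (`drop_intermediate=True`), which does not change the area.

```python
def roc_points(y_true: list[int], y_score: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(y_score, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all examples sharing this score have been counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and hateful-class probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
points = roc_points(y_true, y_prob)
print(points)
print(round(auc_trapezoid(points), 3))  # 0.778
```

The result equals the fraction of (hateful, non-hateful) pairs that the scores rank correctly: 7 of the 9 pairs here.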
