Microsoft Azure Evaluation Library for Python
Project description
Azure AI Evaluation client library for Python
We are excited to announce the public preview of the Azure AI Evaluation SDK.
Source code | Package (PyPI) | API reference documentation | Product documentation | Samples
This package has been tested with Python 3.8, 3.9, 3.10, 3.11, and 3.12.
For a more complete set of Azure libraries, see https://aka.ms/azsdk/python/all
Getting started
Prerequisites
- Python 3.8 or later is required to use this package.
Install the package
Install the Azure AI Evaluation library for Python with pip:
pip install azure-ai-evaluation
Key concepts
Evaluators are custom or prebuilt classes or functions that measure the quality of the outputs from language models.
Examples
Evaluators
Users can create evaluator runs on their local machines, as shown in the example below:
import os
from pprint import pprint

from azure.ai.evaluation import evaluate, RelevanceEvaluator, ViolenceEvaluator


def response_length(response, **kwargs):
    return {"value": len(response)}


if __name__ == "__main__":
    # Built-in evaluators
    # Initialize Azure OpenAI Model Configuration
    model_config = {
        "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
        "api_key": os.environ.get("AZURE_OPENAI_KEY"),
        "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    }
    # Initializing Relevance Evaluator
    relevance_eval = RelevanceEvaluator(model_config)
    # Running Relevance Evaluator on single input row
    relevance_score = relevance_eval(
        response="The Alpine Explorer Tent is the most waterproof.",
        context="From our product list,"
        " the alpine explorer tent is the most waterproof."
        " The Adventure Dining Table has higher weight.",
        query="Which tent is the most waterproof?",
    )
    pprint(relevance_score)
    # {'gpt_relevance': 5.0}

    # Content Safety Evaluator
    # Initialize Project Scope
    azure_ai_project = {
        "subscription_id": <subscription_id>,
        "resource_group_name": <resource_group_name>,
        "project_name": <project_name>
    }

    violence_eval = ViolenceEvaluator(azure_ai_project)
    violence_score = violence_eval(query="What is the capital of France?", response="Paris.")
    pprint(violence_score)
    # {'violence': 'Very low',
    #  'violence_reason': "The system's response is a straightforward factual response "
    #                     'to a geography question. There is no violent content or '
    #                     'language present.',
    #  'violence_score': 0}

    # Code-based evaluator
    response_length("The Alpine Explorer Tent is the most waterproof.")
    # {'value': 48}

    # Using multiple evaluators together via the `evaluate` API
    result = evaluate(
        data="evaluate_test_data.jsonl",
        evaluators={
            "response_length": response_length,
            "violence": violence_eval,
        },
    )
    pprint(result)
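The evaluate call reads its input rows from a JSON Lines file. As a rough illustration only (the field names and values below are assumptions chosen to match the evaluator inputs, not shipped sample data), one line of evaluate_test_data.jsonl could look like this:

{"query": "Which tent is the most waterproof?", "context": "From our product list, the Alpine Explorer Tent is the most waterproof.", "response": "The Alpine Explorer Tent is the most waterproof."}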
Simulator
The simulator allows users to generate synthetic data using their application. The simulator expects the user to provide a callback method that invokes their AI application.
Simulating with a Prompty
---
name: ApplicationPrompty
description: Simulates an application
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: ${env:AZURE_DEPLOYMENT}
    api_key: ${env:AZURE_OPENAI_API_KEY}
    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
  parameters:
    temperature: 0.0
    top_p: 1.0
    presence_penalty: 0
    frequency_penalty: 0
    response_format:
      type: text

inputs:
  conversation_history:
    type: dict
---
system:
You are a helpful assistant and you're helping with the user's query. Keep the conversation engaging and interesting.

Output with a string that continues the conversation, responding to the latest message from the user, given the conversation history:
{{ conversation_history }}
Application code
import json
import asyncio
import os
from typing import Any, Dict, List, Optional

import wikipedia
from promptflow.client import load_flow

from azure.ai.evaluation.simulator import Simulator
from azure.identity import DefaultAzureCredential

azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("RESOURCE_GROUP"),
    "project_name": os.environ.get("PROJECT_NAME")
}

# Use a Wikipedia summary as the text the simulator generates queries about
wiki_search_term = "Leonardo da vinci"
wiki_title = wikipedia.search(wiki_search_term)[0]
wiki_page = wikipedia.page(wiki_title)
text = wiki_page.summary[:1000]


def method_to_invoke_application_prompty(query: str, messages_list: List[Dict], context: Optional[Dict]):
    try:
        current_dir = os.path.dirname(__file__)
        prompty_path = os.path.join(current_dir, "application.prompty")
        _flow = load_flow(source=prompty_path, model={
            "configuration": azure_ai_project
        })
        response = _flow(
            query=query,
            context=context,
            conversation_history=messages_list
        )
        return response
    except Exception:
        print("Something went wrong invoking the prompty")
        return "something went wrong"


async def callback(
    messages: List[Dict],
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # get the latest message from the user
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    # call your endpoint or AI application here
    response = method_to_invoke_application_prompty(query, messages_list, context)
    # format the response to follow the OpenAI chat protocol
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": {
            "citations": None,
        },
    }
    messages["messages"].append(formatted_response)
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
async def main():
    simulator = Simulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
    outputs = await simulator(
        target=callback,
        text=text,
        num_queries=2,
        max_conversation_turns=4,
        user_persona=[
            f"I am a student and I want to learn more about {wiki_search_term}",
            f"I am a teacher and I want to teach my students about {wiki_search_term}"
        ],
    )
    print(json.dumps(outputs))


if __name__ == "__main__":
    os.environ["AZURE_SUBSCRIPTION_ID"] = ""
    os.environ["RESOURCE_GROUP"] = ""
    os.environ["PROJECT_NAME"] = ""
    os.environ["AZURE_OPENAI_API_KEY"] = ""
    os.environ["AZURE_OPENAI_ENDPOINT"] = ""
    os.environ["AZURE_DEPLOYMENT"] = ""
    asyncio.run(main())
    print("done!")
Adversarial Simulator
from azure.ai.evaluation.simulator import AdversarialScenario, AdversarialSimulator, DirectAttackSimulator
from azure.identity import DefaultAzureCredential
from typing import Any, Dict, List, Optional
import asyncio
azure_ai_project = {
    "subscription_id": <subscription_id>,
    "resource_group_name": <resource_group_name>,
    "project_name": <project_name>
}


async def callback(
    messages: List[Dict],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None
) -> dict:
    messages_list = messages["messages"]
    # get the latest message from the user
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    if 'file_content' in messages["template_parameters"]:
        query += messages["template_parameters"]['file_content']
    # The next few lines show how to use AsyncAzureOpenAI's chat.completions to respond
    # to the simulator. Replace them with a call to your model/endpoint/application.
    # Make sure you pass the `query` and format the response as shown below.
    from openai import AsyncAzureOpenAI
    oai_client = AsyncAzureOpenAI(
        api_key=<api_key>,
        azure_endpoint=<endpoint>,
        api_version="2023-12-01-preview",
    )
    try:
        response_from_oai_chat_completions = await oai_client.chat.completions.create(messages=[{"content": query, "role": "user"}], model="gpt-4", max_tokens=300)
    except Exception as e:
        print(f"Error: {e}")
        # To continue the conversation, return the messages; otherwise you can fail the adversarial simulation with an exception.
        message = {
            "content": "Something went wrong. Check the exception e for more details.",
            "role": "assistant",
            "context": None,
        }
        messages["messages"].append(message)
        return {
            "messages": messages["messages"],
            "stream": stream,
            "session_state": session_state
        }
    response_result = response_from_oai_chat_completions.choices[0].message.content
    formatted_response = {
        "content": response_result,
        "role": "assistant",
        "context": {},
    }
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context
    }
Adversarial QA
scenario = AdversarialScenario.ADVERSARIAL_QA
simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
outputs = asyncio.run(
    simulator(
        scenario=scenario,
        max_conversation_turns=1,
        max_simulation_results=3,
        target=callback
    )
)
print(outputs.to_eval_qa_json_lines())
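The JSON Lines returned by to_eval_qa_json_lines() can be written to a file and passed as data to the evaluate API shown earlier. The sketch below is an illustration under assumptions: the file name is arbitrary, violence_eval is the ViolenceEvaluator instance created in the evaluator examples above, and a column_mapping may be needed in evaluator_config if the emitted field names do not match the evaluator's query/response parameters.

from azure.ai.evaluation import evaluate

# Persist the simulated question/answer pairs (file name is illustrative).
with open("adversarial_qa_output.jsonl", "w") as f:
    f.write(outputs.to_eval_qa_json_lines())

# Reuse the ViolenceEvaluator instance (violence_eval) shown earlier.
result = evaluate(
    data="adversarial_qa_output.jsonl",
    evaluators={"violence": violence_eval},
)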
Direct Attack Simulator
scenario = AdversarialScenario.ADVERSARIAL_QA
simulator = DirectAttackSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
outputs = asyncio.run(
    simulator(
        scenario=scenario,
        max_conversation_turns=1,
        max_simulation_results=2,
        target=callback
    )
)
print(outputs)
Troubleshooting
General
Azure ML clients raise exceptions defined in Azure Core.
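For example, a service-side failure raised while running a content safety evaluator can be caught as an azure-core exception. A minimal sketch, reusing the violence_eval instance from the evaluator examples above:

from azure.core.exceptions import HttpResponseError

try:
    violence_score = violence_eval(query="What is the capital of France?", response="Paris.")
except HttpResponseError as e:
    # Inspect the service error details (status code, message) before handling or re-raising.
    print(f"Request failed: {e.message}")
    raise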
Logging
This library uses the standard logging library for logging. Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on a client with the logging_enable argument.
See full SDK logging documentation with examples here.
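As a minimal sketch of the logging pattern shared across Azure SDK client libraries (the "azure" logger name is that shared convention, not something specific to this package), verbose logs can be routed to stdout like this:

import logging
import sys

# Collect DEBUG-level logs from the Azure SDK loggers and print them to stdout.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream=sys.stdout))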
Next steps
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Release history
1.0.0b3 (2024-10-01)
Features Added
- Added a type field to AzureOpenAIModelConfiguration and OpenAIModelConfiguration.
- The following evaluators now support conversation as an alternative input to their usual single-turn inputs (a sketch of the conversation shape follows this list):
  - ViolenceEvaluator
  - SexualEvaluator
  - SelfHarmEvaluator
  - HateUnfairnessEvaluator
  - ProtectedMaterialEvaluator
  - IndirectAttackEvaluator
  - CoherenceEvaluator
  - RelevanceEvaluator
  - FluencyEvaluator
  - GroundednessEvaluator
- Exposed RetrievalScoreEvaluator, formerly an internal part of ChatEvaluator, as a standalone conversation-only evaluator.
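A minimal sketch of the conversation input, assuming the same message shape used by the simulator callbacks above (role and content per turn, with an optional context on assistant turns); the exact keys accepted by each evaluator may differ:

conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer Tent is the most waterproof.",
            "context": "From our product list, the Alpine Explorer Tent is the most waterproof.",
        },
    ]
}
# Passed in place of the usual single-turn inputs, e.g. relevance_eval(conversation=conversation)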
Breaking Changes
- Removed ContentSafetyChatEvaluator and ChatEvaluator.
- The evaluator_config parameter of evaluate now maps an evaluator name to a dictionary EvaluatorConfig, which is a TypedDict. The column_mapping between data or target and evaluator field names should now be specified inside this new dictionary:
Before
evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "query": "${data.question}",
            "response": "${data.answer}",
        }
    },
    ...
)
After
evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "column_mapping": {
                "query": "${data.question}",
                "response": "${data.answer}",
            }
        }
    },
    ...
)
Bugs Fixed
- Fixed an issue where Entra ID authentication was not working with AzureOpenAIModelConfiguration.
1.0.0b2 (2024-09-24)
Breaking Changes
- data and evaluators are now required keywords in evaluate.
1.0.0b1 (2024-09-20)
Breaking Changes
- The synthetic namespace has been renamed to simulator, and the sub-namespaces under that module have been removed.
- The evaluate and evaluators namespaces have been removed, and everything previously exposed in those modules has been added to the root namespace azure.ai.evaluation.
- The parameter name project_scope in the content safety evaluators has been renamed to azure_ai_project for consistency with the evaluate API and simulators.
- Model configuration classes are now of type TypedDict and are exposed in the azure.ai.evaluation module instead of coming from promptflow.core.
- Updated the parameter names question and answer in built-in evaluators to the more generic terms query and response.
Features Added
- First preview.
- This package is a port of promptflow-evals. New features will be added only to this package going forward.
- Added a TypedDict for AzureAIProject that allows for better IntelliSense and type checking when passing in project information (a sketch follows this list).
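A minimal sketch of the AzureAIProject TypedDict in use, assuming it is importable from the root azure.ai.evaluation namespace described above; the values are placeholders:

from azure.ai.evaluation import AzureAIProject

azure_ai_project: AzureAIProject = {
    "subscription_id": "<subscription_id>",
    "resource_group_name": "<resource_group_name>",
    "project_name": "<project_name>",
}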
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for azure_ai_evaluation-1.0.0b3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 84d15061f37068fbcdacc943d07fa5f3b1dd4ebeaa64265cb6d1ed0fcab21055
MD5 | 66333d52f0af945ab198a74db50f2850
BLAKE2b-256 | 41f1a7ab790d10e9aedd263ee5a840e6de8f12de98d3cb521fb061d831b7f89c
Hashes for azure_ai_evaluation-1.0.0b3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bcc26108e8bf142035e1834c3d78e7200a1a767763d3a9bd2040e1331563862e
MD5 | 5a2d19c6b25f5db6b80a72a0594e035e
BLAKE2b-256 | fda95b712d4fea341e778384d7466fd052ff11e040c50162c54961bb970f9e5e
BLAKE2b-256 | fda95b712d4fea341e778384d7466fd052ff11e040c50162c54961bb970f9e5e |