Microsoft Azure Evaluation Library for Python

Project description

Azure AI Evaluation client library for Python

We are excited to announce the public preview of the Azure AI Evaluation SDK.

Source code | Package (PyPI) | API reference documentation | Product documentation | Samples

This package has been tested with Python 3.8, 3.9, 3.10, 3.11, and 3.12.

For a more complete set of Azure libraries, see https://aka.ms/azsdk/python/all

Getting started

Prerequisites

  • Python 3.8 or later is required to use this package.

Install the package

Install the Azure AI Evaluation library for Python with pip:

pip install azure-ai-evaluation

Key concepts

Evaluators are custom or prebuilt classes or functions designed to measure the quality of the outputs from language models.

Examples

Evaluators

Users can create evaluator runs on their local machines, as shown in the example below:

import os
from pprint import pprint

from azure.ai.evaluation import evaluate, RelevanceEvaluator, ViolenceEvaluator


def response_length(response, **kwargs):
    return {"value": len(response)}


if __name__ == "__main__":
    # Built-in evaluators
    # Initialize Azure OpenAI Model Configuration
    model_config = {
        "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
        "api_key": os.environ.get("AZURE_OPENAI_KEY"),
        "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    }

    # Initializing Relevance Evaluator
    relevance_eval = RelevanceEvaluator(model_config)

    # Running Relevance Evaluator on single input row
    relevance_score = relevance_eval(
        response="The Alpine Explorer Tent is the most waterproof.",
        context="From our product list,"
        " the alpine explorer tent is the most waterproof."
        " The Adventure Dining Table has higher weight.",
        query="Which tent is the most waterproof?",
    )

    pprint(relevance_score)
    # {'gpt_relevance': 5.0}

    # Content Safety Evaluator

    # Initialize Project Scope
    azure_ai_project = {
        "subscription_id": <subscription_id>,
        "resource_group_name": <resource_group_name>,
        "project_name": <project_name>
    }

    violence_eval = ViolenceEvaluator(azure_ai_project)
    violence_score = violence_eval(query="What is the capital of France?", response="Paris.")
    pprint(violence_score)
    # {'violence': 'Very low',
    # 'violence_reason': "The system's response is a straightforward factual response "
    #                    'to a geography question. There is no violent content or '
    #                    'language present.',
    # 'violence_score': 0}

    # Code based evaluator
    response_length("The Alpine Explorer Tent is the most waterproof.")
    # {'value': 48}

    # Using multiple evaluators together with the evaluate API

    result = evaluate(
        data="evaluate_test_data.jsonl",
        evaluators={
            "response_length": response_length,
            "violence": violence_eval,
        },
    )

    pprint(result)
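
The data argument points to a JSON Lines file in which each line is one input row. As a minimal illustration (the rows below are invented for this walkthrough), evaluate_test_data.jsonl could contain:

{"query": "Which tent is the most waterproof?", "context": "From our product list, the Alpine Explorer Tent is the most waterproof.", "response": "The Alpine Explorer Tent is the most waterproof."}
{"query": "Which camping table is the heaviest?", "context": "The Adventure Dining Table has a higher weight than the other tables.", "response": "The Adventure Dining Table is the heaviest."}

Each evaluator reads the columns it needs from every row; here, response_length uses response, and the violence evaluator uses query and response.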

Simulators

Simulators allow users to generate synthetic data using their application. The simulator expects the user to provide a callback method that invokes their AI application.

Simulating with a Prompty

---
name: ApplicationPrompty
description: Simulates an application
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: ${env:AZURE_DEPLOYMENT}
    api_key: ${env:AZURE_OPENAI_API_KEY}
    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
  parameters:
    temperature: 0.0
    top_p: 1.0
    presence_penalty: 0
    frequency_penalty: 0
    response_format:
      type: text

inputs:
  conversation_history:
    type: dict

---
system:
You are a helpful assistant and you're helping with the user's query. Keep the conversation engaging and interesting.

Output with a string that continues the conversation, responding to the latest message from the user, given the conversation history:
{{ conversation_history }}

Application code

import json
import asyncio
from typing import Any, Dict, List, Optional
from azure.ai.evaluation.simulator import Simulator
from promptflow.client import load_flow
from azure.identity import DefaultAzureCredential
import os

azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("RESOURCE_GROUP"),
    "project_name": os.environ.get("PROJECT_NAME")
}

import wikipedia
wiki_search_term = "Leonardo da vinci"
wiki_title = wikipedia.search(wiki_search_term)[0]
wiki_page = wikipedia.page(wiki_title)
text = wiki_page.summary[:1000]

def method_to_invoke_application_prompty(query: str, messages_list: List[Dict], context: Optional[Dict[str, Any]] = None):
    try:
        current_dir = os.path.dirname(__file__)
        prompty_path = os.path.join(current_dir, "application.prompty")
        _flow = load_flow(source=prompty_path, model={
            "configuration": azure_ai_project
        })
        response = _flow(
            query=query,
            context=context,
            conversation_history=messages_list
        )
        return response
    except Exception:
        print("Something went wrong invoking the prompty")
        return "something went wrong"

async def callback(
    messages: List[Dict],
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # get last message
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    # call your endpoint or ai application here
    response = method_to_invoke_application_prompty(query, messages_list, context)
    # we are formatting the response to follow the openAI chat protocol format
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": {
            "citations": None,
        },
    }
    messages["messages"].append(formatted_response)
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}



async def main():
    simulator = Simulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
    outputs = await simulator(
        target=callback,
        text=text,
        num_queries=2,
        max_conversation_turns=4,
        user_persona=[
            f"I am a student and I want to learn more about {wiki_search_term}",
            f"I am a teacher and I want to teach my students about {wiki_search_term}"
        ],
    )
    print(json.dumps(outputs))

if __name__ == "__main__":
    os.environ["AZURE_SUBSCRIPTION_ID"] = ""
    os.environ["RESOURCE_GROUP"] = ""
    os.environ["PROJECT_NAME"] = ""
    os.environ["AZURE_OPENAI_API_KEY"] = ""
    os.environ["AZURE_OPENAI_ENDPOINT"] = ""
    os.environ["AZURE_DEPLOYMENT"] = ""
    asyncio.run(main())
    print("done!")

Adversarial Simulator

from azure.ai.evaluation.simulator import AdversarialSimulator, AdversarialScenario, DirectAttackSimulator
from azure.identity import DefaultAzureCredential
from typing import Any, Dict, List, Optional
import asyncio


azure_ai_project = {
    "subscription_id": <subscription_id>,
    "resource_group_name": <resource_group_name>,
    "project_name": <project_name>
}

async def callback(
    messages: List[Dict],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None
) -> dict:
    messages_list = messages["messages"]
    # get last message
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    if 'file_content' in messages["template_parameters"]:
        query += messages["template_parameters"]['file_content']
    # the next few lines explain how to use AsyncAzureOpenAI's chat.completions
    # to respond to the simulator. You should replace it with a call to your model/endpoint/application
    # make sure you pass the `query` and format the response as we have shown below
    from openai import AsyncAzureOpenAI
    oai_client = AsyncAzureOpenAI(
        api_key=<api_key>,
        azure_endpoint=<endpoint>,
        api_version="2023-12-01-preview",
    )
    try:
        response_from_oai_chat_completions = await oai_client.chat.completions.create(messages=[{"content": query, "role": "user"}], model="gpt-4", max_tokens=300)
    except Exception as e:
        print(f"Error: {e}")
        # to continue the conversation, return the messages, else you can fail the adversarial with an exception
        message = {
            "content": "Something went wrong. Check the exception e for more details.",
            "role": "assistant",
            "context": None,
        }
        messages["messages"].append(message)
        return {
            "messages": messages["messages"],
            "stream": stream,
            "session_state": session_state
        }
    response_result = response_from_oai_chat_completions.choices[0].message.content
    formatted_response = {
        "content": response_result,
        "role": "assistant",
        "context": {},
    }
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context
    }

Adversarial QA

scenario = AdversarialScenario.ADVERSARIAL_QA
simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

outputs = asyncio.run(
    simulator(
        scenario=scenario,
        max_conversation_turns=1,
        max_simulation_results=3,
        target=callback
    )
)

print(outputs.to_eval_qa_json_lines())
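
The JSON-lines output can also be written to a file and passed back to the evaluate API shown earlier. A minimal sketch (the output file name is an illustrative assumption):

# Persist the simulated query/response pairs for a later evaluate(data=...) run.
with open("adversarial_qa_output.jsonl", "w") as f:
    f.write(outputs.to_eval_qa_json_lines())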

Direct Attack Simulator

scenario = AdversarialScenario.ADVERSARIAL_QA
simulator = DirectAttackSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

outputs = asyncio.run(
    simulator(
        scenario=scenario,
        max_conversation_turns=1,
        max_simulation_results=2,
        target=callback
    )
)

print(outputs)

Troubleshooting

General

Azure ML clients raise exceptions defined in Azure Core.
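
For example, a minimal sketch of catching a service failure around one of the evaluator calls above, assuming the underlying request surfaces an Azure Core HttpResponseError:

from azure.core.exceptions import HttpResponseError

try:
    violence_score = violence_eval(query="What is the capital of France?", response="Paris.")
except HttpResponseError as error:
    # The exception carries the HTTP status code and the service-provided message.
    print(f"Evaluation request failed: {error.status_code} {error.message}")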

Logging

This library uses the standard logging library for logging. Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.

Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on a client with the logging_enable argument.

See full SDK logging documentation with examples here.
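
For example, a minimal sketch using the standard logging module (the "azure" logger name is the conventional root logger shared by the Azure SDK libraries):

import logging
import sys

# Send Azure SDK log records to stdout and raise the level to DEBUG.
azure_logger = logging.getLogger("azure")
azure_logger.setLevel(logging.DEBUG)
azure_logger.addHandler(logging.StreamHandler(stream=sys.stdout))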

Next steps

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Release History

1.0.0b3 (2024-10-01)

Features Added

  • Added a type field to AzureOpenAIModelConfiguration and OpenAIModelConfiguration
  • The following evaluators now support conversation as an alternative input to their usual single-turn inputs (a minimal sketch follows this list):
    • ViolenceEvaluator
    • SexualEvaluator
    • SelfHarmEvaluator
    • HateUnfairnessEvaluator
    • ProtectedMaterialEvaluator
    • IndirectAttackEvaluator
    • CoherenceEvaluator
    • RelevanceEvaluator
    • FluencyEvaluator
    • GroundednessEvaluator
  • Exposed RetrievalScoreEvaluator, formerly an internal part of ChatEvaluator, as a standalone conversation-only evaluator.
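
A minimal sketch of the conversation input, assuming the chat-protocol message format used by the simulator examples above (the messages themselves are illustrative):

conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {"role": "assistant", "content": "The Alpine Explorer Tent is the most waterproof."},
    ]
}

# Conversation-mode evaluation, reusing the violence_eval instance from the earlier example.
violence_conversation_score = violence_eval(conversation=conversation)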

Breaking Changes

  • Removed ContentSafetyChatEvaluator and ChatEvaluator
  • The evaluator_config parameter of evaluate now maps evaluator names to a dictionary EvaluatorConfig, which is a TypedDict. The column_mapping between data or target and the evaluator field names should now be specified inside this new dictionary:

Before:

evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "query": "${data.question}",
            "response": "${data.answer}",
        }
    },
    ...
)

After:

evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "column_mapping": {
                "query": "${data.question}",
                "response": "${data.answer}",
             }
        }
    },
    ...
)

Bugs Fixed

  • Fixed an issue where Entra ID authentication did not work with AzureOpenAIModelConfiguration

1.0.0b2 (2024-09-24)

Breaking Changes

  • data and evaluators are now required keywords in evaluate.

1.0.0b1 (2024-09-20)

Breaking Changes

  • The synthetic namespace has been renamed to simulator, and sub-namespaces under that module have been removed.
  • The evaluate and evaluators namespaces have been removed, and everything previously exposed in those modules has been added to the root namespace azure.ai.evaluation.
  • The parameter name project_scope in content safety evaluators has been renamed to azure_ai_project for consistency with the evaluate API and simulators.
  • Model configuration classes are now of type TypedDict and are exposed in the azure.ai.evaluation module instead of coming from promptflow.core.
  • Updated the parameter names question and answer in built-in evaluators to the more generic terms: query and response.

Features Added

  • First preview
  • This package is a port of promptflow-evals. New features will only be added to this package going forward.
  • Added a TypedDict for AzureAIProject, which allows for better intellisense and type checking when passing in project information.

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

azure_ai_evaluation-1.0.0b3.tar.gz (142.0 kB)

Built Distribution

azure_ai_evaluation-1.0.0b3-py3-none-any.whl (144.8 kB)
