Microsoft Azure Evaluation Library for Python
Project description
Azure AI Evaluation client library for Python
We are excited to announce the public preview of the Azure AI Evaluation SDK.
Source code | Package (PyPI) | API reference documentation | Product documentation | Samples
This package has been tested with Python 3.8, 3.9, 3.10, 3.11, and 3.12.
For a more complete set of Azure libraries, see https://aka.ms/azsdk/python/all
Getting started
Prerequisites
- Python 3.8 or later is required to use this package.
Install the package
Install the Azure AI Evaluation library for Python with pip:
pip install azure-ai-evaluation
Key concepts
Evaluators are custom or prebuilt classes or functions that measure the quality of the outputs from language models.
Examples
Evaluators
Users can create evaluator runs on their local machines, as shown in the example below:
import os
from pprint import pprint

from azure.ai.evaluation import evaluate, RelevanceEvaluator, ViolenceEvaluator


def response_length(response, **kwargs):
    return {"value": len(response)}


if __name__ == "__main__":
    # Built-in evaluators
    # Initialize Azure OpenAI Model Configuration
    model_config = {
        "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
        "api_key": os.environ.get("AZURE_OPENAI_KEY"),
        "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    }
    # Initializing Relevance Evaluator
    relevance_eval = RelevanceEvaluator(model_config)
    # Running Relevance Evaluator on single input row
    relevance_score = relevance_eval(
        response="The Alpine Explorer Tent is the most waterproof.",
        context="From our product list,"
        " the alpine explorer tent is the most waterproof."
        " The Adventure Dining Table has higher weight.",
        query="Which tent is the most waterproof?",
    )
    pprint(relevance_score)
    # {'gpt_relevance': 5.0}

    # Content Safety Evaluator
    # Initialize Project Scope
    azure_ai_project = {
        "subscription_id": <subscription_id>,
        "resource_group_name": <resource_group_name>,
        "project_name": <project_name>
    }

    violence_eval = ViolenceEvaluator(azure_ai_project)
    violence_score = violence_eval(query="What is the capital of France?", response="Paris.")
    pprint(violence_score)
    # {'violence': 'Very low',
    #  'violence_reason': "The system's response is a straightforward factual response "
    #                     'to a geography question. There is no violent content or '
    #                     'language present.',
    #  'violence_score': 0}

    # Code-based evaluator
    response_length("The Alpine Explorer Tent is the most waterproof.")
    # {'value': 48}

    # Using multiple evaluators together via the `evaluate` API
    result = evaluate(
        data="evaluate_test_data.jsonl",
        evaluators={
            "response_length": response_length,
            "violence": violence_eval,
        },
    )
    pprint(result)
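The evaluate call reads its input rows from a JSON Lines file. As a rough illustration only (the field names and values below are assumptions chosen to match the evaluator inputs, not shipped sample data), one line of evaluate_test_data.jsonl could look like this:

{"query": "Which tent is the most waterproof?", "context": "From our product list, the Alpine Explorer Tent is the most waterproof.", "response": "The Alpine Explorer Tent is the most waterproof."}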
Simulator
The simulator allows users to generate synthetic data using their application. The simulator expects the user to provide a callback method that invokes their AI application.
Simulating with a Prompty
---
name: ApplicationPrompty
description: Simulates an application
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: ${env:AZURE_DEPLOYMENT}
    api_key: ${env:AZURE_OPENAI_API_KEY}
    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
  parameters:
    temperature: 0.0
    top_p: 1.0
    presence_penalty: 0
    frequency_penalty: 0
    response_format:
      type: text

inputs:
  conversation_history:
    type: dict
---
system:
You are a helpful assistant and you're helping with the user's query. Keep the conversation engaging and interesting.

Output with a string that continues the conversation, responding to the latest message from the user, given the conversation history:
{{ conversation_history }}
Application code
import json
import asyncio
import os
from typing import Any, Dict, List, Optional

import wikipedia
from promptflow.client import load_flow

from azure.ai.evaluation.simulator import Simulator
from azure.identity import DefaultAzureCredential

azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("RESOURCE_GROUP"),
    "project_name": os.environ.get("PROJECT_NAME")
}

# Use a Wikipedia summary as the text the simulator generates queries about
wiki_search_term = "Leonardo da vinci"
wiki_title = wikipedia.search(wiki_search_term)[0]
wiki_page = wikipedia.page(wiki_title)
text = wiki_page.summary[:1000]


def method_to_invoke_application_prompty(query: str, messages_list: List[Dict], context: Optional[Dict]):
    try:
        current_dir = os.path.dirname(__file__)
        prompty_path = os.path.join(current_dir, "application.prompty")
        _flow = load_flow(source=prompty_path, model={
            "configuration": azure_ai_project
        })
        response = _flow(
            query=query,
            context=context,
            conversation_history=messages_list
        )
        return response
    except Exception:
        print("Something went wrong invoking the prompty")
        return "something went wrong"


async def callback(
    messages: List[Dict],
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # get the latest message from the user
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    # call your endpoint or AI application here
    response = method_to_invoke_application_prompty(query, messages_list, context)
    # format the response to follow the OpenAI chat protocol
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": {
            "citations": None,
        },
    }
    messages["messages"].append(formatted_response)
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
async def main():
    simulator = Simulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
    outputs = await simulator(
        target=callback,
        text=text,
        num_queries=2,
        max_conversation_turns=4,
        user_persona=[
            f"I am a student and I want to learn more about {wiki_search_term}",
            f"I am a teacher and I want to teach my students about {wiki_search_term}"
        ],
    )
    print(json.dumps(outputs))


if __name__ == "__main__":
    os.environ["AZURE_SUBSCRIPTION_ID"] = ""
    os.environ["RESOURCE_GROUP"] = ""
    os.environ["PROJECT_NAME"] = ""
    os.environ["AZURE_OPENAI_API_KEY"] = ""
    os.environ["AZURE_OPENAI_ENDPOINT"] = ""
    os.environ["AZURE_DEPLOYMENT"] = ""
    asyncio.run(main())
    print("done!")
Adversarial Simulator
from azure.ai.evaluation.simulator import AdversarialScenario, AdversarialSimulator, DirectAttackSimulator
from azure.identity import DefaultAzureCredential
from typing import Any, Dict, List, Optional
import asyncio
azure_ai_project = {
    "subscription_id": <subscription_id>,
    "resource_group_name": <resource_group_name>,
    "project_name": <project_name>
}


async def callback(
    messages: List[Dict],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None
) -> dict:
    messages_list = messages["messages"]
    # get the latest message from the user
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    if 'file_content' in messages["template_parameters"]:
        query += messages["template_parameters"]['file_content']
    # The next few lines show how to use AsyncAzureOpenAI's chat.completions to respond
    # to the simulator. Replace them with a call to your model/endpoint/application.
    # Make sure you pass the `query` and format the response as shown below.
    from openai import AsyncAzureOpenAI
    oai_client = AsyncAzureOpenAI(
        api_key=<api_key>,
        azure_endpoint=<endpoint>,
        api_version="2023-12-01-preview",
    )
    try:
        response_from_oai_chat_completions = await oai_client.chat.completions.create(messages=[{"content": query, "role": "user"}], model="gpt-4", max_tokens=300)
    except Exception as e:
        print(f"Error: {e}")
        # To continue the conversation, return the messages; otherwise you can fail the adversarial simulation with an exception.
        message = {
            "content": "Something went wrong. Check the exception e for more details.",
            "role": "assistant",
            "context": None,
        }
        messages["messages"].append(message)
        return {
            "messages": messages["messages"],
            "stream": stream,
            "session_state": session_state
        }
    response_result = response_from_oai_chat_completions.choices[0].message.content
    formatted_response = {
        "content": response_result,
        "role": "assistant",
        "context": {},
    }
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context
    }
Adversarial QA
scenario = AdversarialScenario.ADVERSARIAL_QA
simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
outputs = asyncio.run(
    simulator(
        scenario=scenario,
        max_conversation_turns=1,
        max_simulation_results=3,
        target=callback
    )
)
print(outputs.to_eval_qa_json_lines())
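The JSON Lines returned by to_eval_qa_json_lines() can be written to a file and passed as data to the evaluate API shown earlier. The sketch below is an illustration under assumptions: the file name is arbitrary, violence_eval is the ViolenceEvaluator instance created in the evaluator examples above, and a column_mapping may be needed in evaluator_config if the emitted field names do not match the evaluator's query/response parameters.

from azure.ai.evaluation import evaluate

# Persist the simulated question/answer pairs (file name is illustrative).
with open("adversarial_qa_output.jsonl", "w") as f:
    f.write(outputs.to_eval_qa_json_lines())

# Reuse the ViolenceEvaluator instance (violence_eval) shown earlier.
result = evaluate(
    data="adversarial_qa_output.jsonl",
    evaluators={"violence": violence_eval},
)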
Direct Attack Simulator
scenario = AdversarialScenario.ADVERSARIAL_QA
simulator = DirectAttackSimulator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())
outputs = asyncio.run(
    simulator(
        scenario=scenario,
        max_conversation_turns=1,
        max_simulation_results=2,
        target=callback
    )
)
print(outputs)
Troubleshooting
General
Azure ML clients raise exceptions defined in Azure Core.
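For example, a service-side failure raised while running a content safety evaluator can be caught as an azure-core exception. A minimal sketch, reusing the violence_eval instance from the evaluator examples above:

from azure.core.exceptions import HttpResponseError

try:
    violence_score = violence_eval(query="What is the capital of France?", response="Paris.")
except HttpResponseError as e:
    # Inspect the service error details (status code, message) before handling or re-raising.
    print(f"Request failed: {e.message}")
    raise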
Logging
This library uses the standard logging library for logging. Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on a client with the logging_enable argument.
See full SDK logging documentation with examples here.
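As a minimal sketch of the logging pattern shared across Azure SDK client libraries (the "azure" logger name is that shared convention, not something specific to this package), verbose logs can be routed to stdout like this:

import logging
import sys

# Collect DEBUG-level logs from the Azure SDK loggers and print them to stdout.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream=sys.stdout))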
Next steps
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Release history
1.0.0b3 (2024-10-01)
Features Added
- Added a type field to AzureOpenAIModelConfiguration and OpenAIModelConfiguration.
- The following evaluators now support conversation as an alternative input to their usual single-turn inputs (a sketch of the conversation shape follows this list):
  - ViolenceEvaluator
  - SexualEvaluator
  - SelfHarmEvaluator
  - HateUnfairnessEvaluator
  - ProtectedMaterialEvaluator
  - IndirectAttackEvaluator
  - CoherenceEvaluator
  - RelevanceEvaluator
  - FluencyEvaluator
  - GroundednessEvaluator
- Exposed RetrievalScoreEvaluator, formerly an internal part of ChatEvaluator, as a standalone conversation-only evaluator.
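A minimal sketch of the conversation input, assuming the same message shape used by the simulator callbacks above (role and content per turn, with an optional context on assistant turns); the exact keys accepted by each evaluator may differ:

conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer Tent is the most waterproof.",
            "context": "From our product list, the Alpine Explorer Tent is the most waterproof.",
        },
    ]
}
# Passed in place of the usual single-turn inputs, e.g. relevance_eval(conversation=conversation)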
Breaking Changes
- Removed ContentSafetyChatEvaluator and ChatEvaluator.
- The evaluator_config parameter of evaluate now maps an evaluator name to a dictionary EvaluatorConfig, which is a TypedDict. The column_mapping between data or target and evaluator field names should now be specified inside this new dictionary:
Before
evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "query": "${data.question}",
            "response": "${data.answer}",
        }
    },
    ...
)
After
evaluate(
    ...,
    evaluator_config={
        "hate_unfairness": {
            "column_mapping": {
                "query": "${data.question}",
                "response": "${data.answer}",
            }
        }
    },
    ...
)
Bugs Fixed
- Fixed an issue where Entra ID authentication was not working with AzureOpenAIModelConfiguration.
1.0.0b2 (2024-09-24)
Breaking Changes
- data and evaluators are now required keywords in evaluate.
1.0.0b1 (2024-09-20)
Breaking Changes
- The synthetic namespace has been renamed to simulator, and the sub-namespaces under that module have been removed.
- The evaluate and evaluators namespaces have been removed, and everything previously exposed in those modules has been added to the root namespace azure.ai.evaluation.
- The parameter name project_scope in the content safety evaluators has been renamed to azure_ai_project for consistency with the evaluate API and simulators.
- Model configuration classes are now of type TypedDict and are exposed in the azure.ai.evaluation module instead of coming from promptflow.core.
- Updated the parameter names question and answer in built-in evaluators to the more generic terms query and response.
Features Added
- First preview.
- This package is a port of promptflow-evals. New features will be added only to this package going forward.
- Added a TypedDict for AzureAIProject that allows for better IntelliSense and type checking when passing in project information (a sketch follows this list).
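A minimal sketch of the AzureAIProject TypedDict in use, assuming it is importable from the root azure.ai.evaluation namespace described above; the values are placeholders:

from azure.ai.evaluation import AzureAIProject

azure_ai_project: AzureAIProject = {
    "subscription_id": "<subscription_id>",
    "resource_group_name": "<resource_group_name>",
    "project_name": "<project_name>",
}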
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for azure_ai_evaluation-1.0.0b3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 84d15061f37068fbcdacc943d07fa5f3b1dd4ebeaa64265cb6d1ed0fcab21055
MD5 | 66333d52f0af945ab198a74db50f2850
BLAKE2b-256 | 41f1a7ab790d10e9aedd263ee5a840e6de8f12de98d3cb521fb061d831b7f89c
Hashes for azure_ai_evaluation-1.0.0b3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bcc26108e8bf142035e1834c3d78e7200a1a767763d3a9bd2040e1331563862e
MD5 | 5a2d19c6b25f5db6b80a72a0594e035e
BLAKE2b-256 | fda95b712d4fea341e778384d7466fd052ff11e040c50162c54961bb970f9e5e
BLAKE2b-256 | fda95b712d4fea341e778384d7466fd052ff11e040c50162c54961bb970f9e5e |