A library for evaluating RAG using Nuclia's models
nuclia-eval: Evaluate your RAG with nuclia's models
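Its evaluation follows the RAG triad proposed by TruLens.
In summary, for a RAG experience involving a question, an answer, and N contexts, nuclia-eval provides the following metrics:
- Answer relevance: how directly and appropriately the answer addresses the specific question asked, providing accurate, complete, and contextually suitable information.
  - Score: a number between 0 and 5 indicating how relevant the answer is to the question.
  - Reason: a string explaining the reason for the score.
- For each of the N contexts:
  - Context relevance score: the relevance of the context to the question, on a scale of 0 to 5.
  - Groundedness score: the degree to which the information in the answer overlaps with the information in the context, that is, is substantially similar or identical to it. The score ranges from 0 to 5.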
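To make the shape of these metrics concrete, here is a minimal sketch of how one evaluation could be modeled. The `Score` and `TriadResult` classes are hypothetical stand-ins written for illustration; they are not the classes that nuclia-eval returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Hypothetical stand-in for one metric: an integer from 0 to 5."""
    value: int

    def __post_init__(self) -> None:
        if not 0 <= self.value <= 5:
            raise ValueError(f"score must be in [0, 5], got {self.value}")


@dataclass(frozen=True)
class TriadResult:
    """Hypothetical container: one answer-relevance score plus two scores per context."""
    answer_relevance: Score
    answer_relevance_reason: str
    context_relevances: list[Score]
    groundednesses: list[Score]

    def __post_init__(self) -> None:
        if len(self.context_relevances) != len(self.groundednesses):
            raise ValueError("expected one groundedness score per context")

    def metric_count(self) -> int:
        # 1 answer-relevance score + 2 scores for each of the N contexts
        return 1 + 2 * len(self.context_relevances)
```

With N = 3 contexts, an evaluation therefore carries 7 scores in total, plus the textual reason for the answer relevance.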
Installation
nuclia-eval is supported only on Linux-based systems.
Install the package
pip install nuclia-eval
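Requirements for downloading the models
To download the models, you must have a Hugging Face account and be logged in. You can create an account on the Hugging Face website. You also need to authenticate your session against the Hugging Face API by running huggingface-cli login or any of the other methods described in the Hugging Face documentation.
You then need access to both the base model and the adapter model. Once logged in, you can request access to each model by clicking the button on its model page. More information is available in the Hugging Face documentation on gated models.
For example, REMi-v0 requires access to the Mistral-7B-Instruct-v0.3 base model and the REMi-v0 adapter model.
If this authentication and authorization process has not been completed, you will see a message similar to the following the first time you try to instantiate the evaluator: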
Access to model __model_name__ is restricted. You must be authenticated to access it.
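Before instantiating the evaluator, it can help to check that a Hugging Face token is configured locally. The sketch below uses only the standard library and assumes the default huggingface_hub conventions: the `HF_TOKEN` environment variable, and a `token` file under `HF_HOME` (default `~/.cache/huggingface`). The function name `hf_token_available` is made up for this example. It does not verify that access to the gated models has actually been granted.

```python
import os
from pathlib import Path


def hf_token_available() -> bool:
    """Return True if a Hugging Face token appears to be configured locally."""
    # HF_TOKEN is the environment variable read by huggingface_hub.
    if os.environ.get("HF_TOKEN"):
        return True
    # Assumption: `huggingface-cli login` stores the token under HF_HOME
    # (default ~/.cache/huggingface) in a file named "token".
    hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    return token_file.is_file() and token_file.read_text().strip() != ""
```

If this returns False, run huggingface-cli login first; if it returns True and the restricted-access message still appears, the access request on the model pages is the likely missing step.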
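Available models
REMi-v0
REMi-v0 (RAG Evaluation Metrics) is a LoRA adapter for the Mistral-7B-Instruct-v0.3 model.
It was fine-tuned by the nuclia team to evaluate the quality of every part of the RAG experience.
Note: the REMi-v0 model requires a GPU with at least 24 GB of memory.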
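One way to check the GPU memory requirement up front is to query nvidia-smi. This is a minimal sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH; the function names and the threshold constant are illustrative and not part of nuclia-eval. Cards sold as 24 GB often report slightly less than 24576 MiB, so the threshold is approximate and adjustable.

```python
import shutil
import subprocess

# Approximate threshold in MiB for a "24 GB" card (assumption, adjust as needed).
REQUIRED_MIB = 24_000


def parse_memory_totals(smi_output: str) -> list[int]:
    """Parse one total-memory value (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_with_enough_memory(required_mib: int = REQUIRED_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory meets the requirement."""
    if shutil.which("nvidia-smi") is None:
        return []  # nvidia-smi not found, so no GPU can be confirmed
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    totals = parse_memory_totals(output)
    return [index for index, mib in enumerate(totals) if mib >= required_mib]
```

An empty list from `gpus_with_enough_memory()` suggests that loading REMi-v0 on this machine is likely to fail.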
Usage
from nuclia_eval import REMi
evaluator = REMi()
query = "By how many Octaves can I shift my OXYGEN PRO 49 keyboard?"
context1 = """\
* Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up.
* Oxygen Pro 61's keyboard can be shifted 3 octaves down or 3 octaves up.
To change the transposition of the keyboard, press and hold Shift, and then use the Key Octave –/+ buttons to lower or raise the keybed by one one, respectively.
The display will temporarily show TRANS and the current transposition (-12 to 12)."""
context2 = """\
To change the octave of the keyboard, use the Key Octave –/+ buttons to lower or raise the octave, respectively
The display will temporarily show OCT and the current octave shift.\n\nOxygen Pro 25's keyboard can be shifted 4 octaves down or 5 octaves up"""
context3 = """\
If your DAW does not automatically configure your Oxygen Pro series keyboard, please follow the setup steps listed in the Oxygen Pro DAW Setup Guides.
To set the keyboard to operate in Preset Mode, press the DAW/Preset Button (on the Oxygen Pro 25) or Preset Button (on the Oxygen Pro 49 and 61).
On the Oxygen Pro 25 the DAW/Preset button LED will be off to show that Preset Mode is selected.
On the Oxygen Pro 49 and 61 the Preset button LED will be lit to show that Preset Mode is selected."""
answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."
result = evaluator.evaluate_rag(query=query, answer=answer, contexts=[context1, context2, context3])
answer_relevance, context_relevances, groundednesses = result
print(f"{answer_relevance.score}, {answer_relevance.reason}")
# 5, The response directly answers the query by specifying the range of octave shifts for the Oxygen Pro 49 keyboard.
print([cr.score for cr in context_relevances]) # [5, 1, 0]
print([g.score for g in groundednesses]) # [2, 0, 0]
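The same call can be looped over several samples. The sketch below relies only on what the example above shows: `evaluate_rag(query=..., answer=..., contexts=...)` returns a triple whose items expose `.score`. The helper names `evaluate_samples` and `mean_answer_relevance` are hypothetical, and each sample is assumed to have at least one context.

```python
from statistics import mean


def evaluate_samples(evaluator, samples):
    """Run evaluate_rag over (query, answer, contexts) samples and collect scores.

    `evaluator` is any object exposing evaluate_rag(query=..., answer=..., contexts=...)
    that returns (answer_relevance, context_relevances, groundednesses).
    """
    rows = []
    for query, answer, contexts in samples:
        answer_relevance, context_relevances, groundednesses = evaluator.evaluate_rag(
            query=query, answer=answer, contexts=contexts
        )
        rows.append(
            {
                "query": query,
                "answer_relevance": answer_relevance.score,
                # Highest score across contexts: did at least one context help?
                "best_context_relevance": max(cr.score for cr in context_relevances),
                "best_groundedness": max(g.score for g in groundednesses),
            }
        )
    return rows


def mean_answer_relevance(rows):
    """Average the answer-relevance score over all evaluated samples."""
    return mean(row["answer_relevance"] for row in rows)
```

Taking the maximum per sample is one possible aggregation; depending on the use case, keeping the full per-context lists may be more informative.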
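Granularity
The REMi evaluator provides a fine-grained and strict evaluation of the RAG triad. For example, if we slightly modify the answer to the query: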
- answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."
+ answer = "Based on the context provided, the Oxygen Pro 49's keyboard can be shifted 4 octaves down or 4 octaves up."
...
print([g.score for g in groundednesses]) # [0, 0, 0]
Since the information given in the answer does not appear in any of the contexts, the groundedness score is 0 for all of them.
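A pattern like this can be turned into a simple check that flags answers no context supports. This is only a sketch: the function name and the default threshold are assumptions chosen for illustration, not values recommended by nuclia-eval.

```python
def is_ungrounded(groundedness_scores, threshold=1):
    """Return True when no context reaches the groundedness threshold."""
    return all(score < threshold for score in groundedness_scores)


print(is_ungrounded([2, 0, 0]))  # False: the first context supports the answer
print(is_ungrounded([0, 0, 0]))  # True: no context supports the answer
```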
What if the information in the answer does not answer the question?
- answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."
+ answer = "Based on the context provided, the Oxygen Pro 61's keyboard can be shifted 3 octaves down or 4 octaves up."
...
print(f"{answer_relevance.score}, {answer_relevance.reason}")
# 1, The response is relevant to the entire query but incorrectly mentions the Oxygen Pro 61 instead of the Oxygen Pro 49
Individual metrics
Each metric can also be computed separately:
...
answer_relevance = evaluator.answer_relevance(query=query, answer=answer)
context_relevances = evaluator.context_relevance(query=query, contexts=[context1, context2, context3])
groundednesses = evaluator.groundedness(answer=answer, contexts=[context1, context2, context3])
...
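Specifying the model download location
By default, the models are downloaded to the ~/.nuclia-model-cache/ directory. You can specify a different location in two ways:
- By setting the NUCLIA_MODEL_CACHE environment variable to the desired location.
- By overriding the default settings when instantiating the evaluator: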
from nuclia_eval import REMi
from nuclia_eval.settings import Settings
# Create custom settings
settings = Settings(
nuclia_model_cache="my_cache/",
)
# Instantiate the evaluator
evaluator = REMi(settings=settings)
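The environment-variable option can also be applied from Python. This is a sketch under one assumption: that the variable is read when the evaluator's settings are created, so it is set before anything from nuclia_eval is imported. The path is a placeholder.

```python
import os

# Assumption: NUCLIA_MODEL_CACHE is read when the settings are created,
# so it is set here before nuclia_eval is imported.
os.environ["NUCLIA_MODEL_CACHE"] = os.path.expanduser("~/models/nuclia")  # placeholder path

# The evaluator would then be created as usual:
# from nuclia_eval import REMi
# evaluator = REMi()
```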
Feedback and community
For feedback, questions, or to get in touch with the nuclia team, we are available on our community Slack channel.