
Probabilistic Generative Model Programming

Project description

Outlines 〰️


Robust (structured) text generation.

Made with ❤👷️ by the team at .txt.

pip install outlines

First time here? Go to our setup guide.

Features

Outlines 〰 has new releases and features coming every week. Make sure to ⭐ star and 👀 watch this repository, and follow @dottxtai to stay up to date!

Why should I use structured generation?

.txt company


We started a company to keep pushing the boundaries of structured generation. Learn more about .txt, and give our .json API a try if you need a hosted solution.

Structured generation

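The first step towards reliability of systems that include large language models is to ensure that there is a well-defined interface between their output and user-defined code. Outlines provides ways to control the generation of language models to make their output more predictable.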

Before using the Mistral models, request access to them on Hugging Face here.

# login to access mistral model
from huggingface_hub import login
login()

Multiple choices

You can reduce the completion to a choice between multiple possibilities:

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome!
"""

generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
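Conceptually, a choice constraint works by masking. At each step the model may only emit tokens that keep the text a prefix of one of the allowed answers. The sketch below illustrates that idea with a hypothetical toy vocabulary and a stand-in scoring function (toy_score) in place of a real model. It is an illustration of masking, not Outlines' implementation.

```python
# Toy vocabulary and scores: hypothetical stand-ins for a real tokenizer and model.
VOCAB = ["Pos", "Neg", "itive", "ative", "Neutral", "!", "Positive"]
SCORES = {"Neutral": 10.0, "Pos": 2.0, "Neg": 1.0, "itive": 1.0, "ative": 1.0, "Positive": 0.5}


def toy_score(text, token):
    """Pretend model score for appending `token` to `text` (this toy ignores `text`)."""
    return SCORES.get(token, 0.0)


def allowed_tokens(prefix, choices, vocab):
    """Tokens that keep `prefix + token` a prefix of at least one choice."""
    return [t for t in vocab if any(c.startswith(prefix + t) for c in choices)]


def constrained_choice(choices, vocab, score):
    """Greedy decoding in which disallowed tokens are masked out at every step."""
    text = ""
    while text not in choices:
        candidates = allowed_tokens(text, choices, vocab)
        if not candidates:
            raise ValueError(f"no token can extend {text!r} toward a choice")
        text += max(candidates, key=lambda t: score(text, t))
    return text


unconstrained = max(VOCAB, key=lambda t: toy_score("", t))
constrained = constrained_choice(["Positive", "Negative"], VOCAB, toy_score)
print(unconstrained)  # Neutral
print(constrained)    # Positive
```

Without the mask, the toy model's favourite token "Neutral" wins even though it is not one of the allowed answers. With the mask, decoding can only end on "Positive" or "Negative".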

Type constraint

You can instruct the model to only return integers or floats:

import outlines

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")

prompt = "<s>result of 9 + 9 = 18</s><s>result of 1 + 2 = "
answer = outlines.generate.format(model, int)(prompt)
print(answer)
# 3

prompt = "sqrt(2)="
generator = outlines.generate.format(model, float)
answer = generator(prompt, max_tokens=10)
print(answer)
# 1.41421356
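Because format is built on regex-structured generation, a type constraint amounts to a regular expression that every valid literal of that type matches. The patterns below are illustrative assumptions and are not necessarily the exact ones Outlines uses. The point is that any text that matches can always be converted with int() or float(). The parse_as helper is hypothetical.

```python
import re

# Assumed patterns for illustration; Outlines' own patterns may differ.
INTEGER = r"[+-]?(0|[1-9][0-9]*)"
FLOAT = r"[+-]?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?"

TYPE_PATTERNS = {int: INTEGER, float: FLOAT}


def parse_as(text, typ):
    """Convert `text` to `typ`, but only if it fully matches the type's pattern."""
    if re.fullmatch(TYPE_PATTERNS[typ], text) is None:
        raise ValueError(f"{text!r} is not a valid {typ.__name__} literal")
    return typ(text)


print(parse_as("3", int))             # 3
print(parse_as("1.41421356", float))  # 1.41421356
```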

Efficient regex-structured generation

Outlines also comes with fast regex-structured generation. In fact, the choice and format functions above both use regex-structured generation under the hood:

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

prompt = "What is the IP address of the Google DNS servers? "

generator = outlines.generate.text(model)
unstructured = generator(prompt, max_tokens=30)

generator = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)
structured = generator(prompt, max_tokens=30)

print(unstructured)
# What is the IP address of the Google DNS servers?
#
# Passive DNS servers are at DNS servers that are private.
# In other words, both IP servers are private. The database
# does not contain Chelsea Manning

print(structured)
# What is the IP address of the Google DNS servers?
# 2.2.6.1
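The regular expression in this example accepts dotted-quad IPv4 addresses whose four octets are each between 0 and 255. Checking it with Python's standard re module is a quick sanity check of the pattern itself, independent of Outlines. It shows which strings a generator constrained by this regex could and could not emit.

```python
import re

# The same pattern passed to outlines.generate.regex above.
IPV4 = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

for candidate in ["2.2.6.1", "8.8.8.8", "255.255.255.255", "256.1.1.1", "1.2.3", "8.8.8.8.8"]:
    ok = re.fullmatch(IPV4, candidate) is not None
    print(f"{candidate:>16} -> {'valid' if ok else 'rejected'}")
```

Note that the pattern also accepts zero-padded octets such as 010.0.0.1. A stricter pattern is needed if those must be excluded.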

Unlike other libraries, regex-structured generation in Outlines is almost as fast as non-structured generation.
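One way to make regex-constrained decoding this cheap follows roughly the approach described in the Willard and Louf paper cited at the end of this page. The regex is compiled to a finite-state machine once. Then, for every FSM state, you precompute which vocabulary tokens can be consumed from that state and where they lead. After that, decoding needs only a dictionary lookup per step. The sketch below hand-writes a DFA for the hypothetical pattern [0-9]+(\.[0-9]+)? and uses a toy vocabulary. Real implementations derive the FSM from the regex automatically and work over the model's full tokenizer vocabulary.

```python
# Hand-written DFA for the regex [0-9]+(\.[0-9]+)?
# States: 0 = start, 1 = integer part (accepting), 2 = just read ".", 3 = fraction (accepting)
ACCEPTING = {1, 3}
DIGITS = set("0123456789")


def step(state, ch):
    """One DFA transition; returns None when `ch` is not allowed in `state`."""
    if ch in DIGITS:
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if ch == "." and state == 1:
        return 2
    return None


def walk(state, token):
    """Feed a whole multi-character token through the DFA; None if it dies part-way."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def build_index(states, vocab):
    """For each state, map every token that can be consumed to the state it leads to."""
    index = {}
    for s in states:
        index[s] = {}
        for tok in vocab:
            end = walk(s, tok)
            if end is not None:
                index[s][tok] = end
    return index


VOCAB = ["1", "42", ".", ".5", "3.14", "a", "1a", ".."]
INDEX = build_index([0, 1, 2, 3], VOCAB)  # computed once, before any decoding

# During decoding, the allowed tokens for the current state are a single lookup away.
state = walk(0, "42")
print(sorted(INDEX[state]))  # ['.', '.5', '1', '3.14', '42']
```

Landing in an accepting state tells the decoder that it may also stop there. In a non-accepting state such as 2, stopping is not allowed.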

Efficient JSON generation following a Pydantic model

Outlines 〰 allows you to guide the generation process so that the output is guaranteed to follow a JSON schema or a Pydantic model:

from enum import Enum
from pydantic import BaseModel, constr

import outlines
import torch


class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    weapon: Weapon
    strength: int


model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Construct structured sequence generator
generator = outlines.generate.json(model, Character)

# Draw a sample
rng = torch.Generator(device="cuda")
rng.manual_seed(789001)

character = generator("Give me a character description", rng=rng)

print(repr(character))
# Character(name='Anderson', age=28, armor=<Armor.chainmail: 'chainmail'>, weapon=<Weapon.sword: 'sword'>, strength=8)

character = generator("Give me an interesting character description", rng=rng)

print(repr(character))
# Character(name='Vivian Thr', age=44, armor=<Armor.plate: 'plate'>, weapon=<Weapon.crossbow: 'crossbow'>, strength=125)

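The method works with union types, optional types, arrays, nested schemas, and more. Some field constraints are not supported yet, but everything else should work.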
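To see why the output can be guaranteed to match, note that a flat schema like Character can be written as a single regular expression. That regex can then be enforced by the regex-structured generation described earlier. The sketch below builds such a regex by hand. It assumes compact JSON (no whitespace, fields in declaration order, no escape sequences inside strings). These simplifications belong to the sketch, and Outlines' own schema-to-regex translation is more general. The helper names are hypothetical.

```python
import json
import re
from enum import Enum


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


# Simplification: JSON integers without fraction or exponent.
INTEGER = r"-?(0|[1-9][0-9]*)"


def string_pattern(max_length):
    # Simplification: strings without escape sequences, at most `max_length` characters.
    return r'"[^"\\\x00-\x1f]{0,%d}"' % max_length


def enum_pattern(enum_cls):
    # Alternation of the JSON-encoded enum values, e.g. ("leather"|"chainmail"|"plate").
    return "(" + "|".join(re.escape(json.dumps(m.value)) for m in enum_cls) + ")"


def object_pattern(fields):
    # Compact object: fields in the given order, separated by "," with no whitespace.
    members = [re.escape(json.dumps(name)) + ":" + pattern for name, pattern in fields]
    return r"\{" + ",".join(members) + r"\}"


CHARACTER = object_pattern([
    ("name", string_pattern(10)),  # mirrors constr(max_length=10)
    ("age", INTEGER),
    ("armor", enum_pattern(Armor)),
    ("weapon", enum_pattern(Weapon)),
    ("strength", INTEGER),
])

sample = {"name": "Vivian Thr", "age": 44, "armor": "plate", "weapon": "crossbow", "strength": 125}
print(re.fullmatch(CHARACTER, json.dumps(sample, separators=(",", ":"))) is not None)  # True
```

The ten-character limit on name is why a regex-guided model can only ever produce a name such as "Vivian Thr", never a longer one.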

Efficient JSON generation following a JSON Schema

Sometimes you just want to be able to pass a JSON Schema instead of a Pydantic model. We've got you covered:

import outlines

schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, schema)
character = generator("Give me a character description")
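In the schema above, armor and weapon do not spell out their allowed values. Instead, they use "$ref" JSON Pointers such as "#/definitions/Armor" to point at entries under definitions. Anything that turns such a schema into a generation constraint has to follow those references first. The sketch below resolves local references with the standard json module, using a trimmed-down copy of the schema. The resolve and describe helpers are hypothetical and do not handle remote references.

```python
import json

# Trimmed-down copy of the schema above.
SCHEMA = json.loads('''{
  "type": "object",
  "properties": {
    "name": {"type": "string", "maxLength": 10},
    "armor": {"$ref": "#/definitions/Armor"}
  },
  "required": ["name", "armor"],
  "definitions": {
    "Armor": {"enum": ["leather", "chainmail", "plate"], "type": "string"}
  }
}''')


def resolve(schema, node):
    """Follow a local JSON Pointer reference such as "#/definitions/Armor"."""
    ref = node.get("$ref")
    if ref is None:
        return node
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled: {ref}")
    target = schema
    for part in ref[2:].split("/"):
        # JSON Pointer unescaping: "~1" means "/", then "~0" means "~".
        part = part.replace("~1", "/").replace("~0", "~")
        target = target[part]
    return resolve(schema, target)


def describe(schema):
    """Map each property to its allowed enum values, or else to its JSON type."""
    out = {}
    for name, prop in schema["properties"].items():
        prop = resolve(schema, prop)
        out[name] = prop.get("enum", prop["type"])
    return out


print(describe(SCHEMA))
# {'name': 'string', 'armor': ['leather', 'chainmail', 'plate']}
```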

Using context-free grammars to guide generation

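Formal grammars rule the world, and Outlines makes them rule LLMs too. You can pass any context-free grammar in the EBNF format, and Outlines will generate an output that is valid according to this grammar: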

import outlines

arithmetic_grammar = """
    ?start: expression

    ?expression: term (("+" | "-") term)*

    ?term: factor (("*" | "/") factor)*

    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"

    %import common.NUMBER
"""

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
generator = outlines.generate.cfg(model, arithmetic_grammar)
sequence = generator("Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:")

print(sequence)
# (8-2)

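This was a very simple grammar. You can use outlines.generate.cfg to generate syntactically valid Python, SQL, and much more: any kind of structured text, really. All you have to do is search the web for "X EBNF grammar" and take a look at the Outlines grammars module.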
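To make "valid according to this grammar" concrete, here is a hand-written recursive-descent recognizer for the arithmetic grammar above, with one parsing method per rule. It only checks complete strings. A constrained generator additionally has to decide, token by token, whether a partial string can still be completed, and this sketch does not attempt that. NUMBER is simplified to unsigned integers and decimals, which is an assumption narrower than Lark's common.NUMBER. As in the grammar, which has no %ignore directive, whitespace is not accepted.

```python
import re

# Simplification of common.NUMBER: unsigned integers and decimals only.
NUMBER = re.compile(r"\d+(\.\d+)?")


class Recognizer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expression(self):
        # expression: term (("+" | "-") term)*
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):
        # term: factor (("*" | "/") factor)*
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):
        # factor: NUMBER | "-" factor | "(" expression ")"
        ch = self.peek()
        if ch == "-":
            self.pos += 1
            self.factor()
        elif ch == "(":
            self.pos += 1
            self.expression()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at position {self.pos}")
            self.pos += 1
        else:
            m = NUMBER.match(self.text, self.pos)
            if m is None:
                raise SyntaxError(f"expected a number at position {self.pos}")
            self.pos = m.end()


def is_valid(text):
    """True if the whole of `text` is derivable from the start rule."""
    r = Recognizer(text)
    try:
        r.expression()
    except SyntaxError:
        return False
    return r.pos == len(text)


print(is_valid("(8-2)"))  # True
print(is_valid("(4-2"))   # False
```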

Open functions

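Outlines can infer the structure of the output from the signature of a function. The result is a dictionary, which can be passed directly to the function using the usual dictionary expansion syntax **: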

import outlines


def add(a: int, b: int):
    return a + b

model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
generator = outlines.generate.json(model, add)
result = generator("Return json with two integers named a and b respectively. a is odd and b even.")

print(add(**result))
# 3

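A major advantage of passing a function directly to specify the structure is that the structure the LLM must follow changes along with the function's definition. No need to change the code in several places!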
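A function signature carries everything needed to describe the expected JSON: the parameter names, their annotated types, and which parameters have defaults. Below is a minimal sketch of that inference that uses only inspect and typing. The PY_TO_JSON table and the signature_schema helper are hypothetical and cover only scalar annotations. This is not how Outlines implements it internally.

```python
import inspect
import typing

# Assumed mapping from Python annotations to JSON Schema types (scalars only).
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}


def signature_schema(fn):
    """Build a JSON-Schema-like dict from a function's annotated parameters."""
    hints = typing.get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": PY_TO_JSON[hints[name]]}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def add(a: int, b: int):
    return a + b


print(signature_schema(add))
# {'type': 'object', 'properties': {'a': {'type': 'integer'}, 'b': {'type': 'integer'}}, 'required': ['a', 'b']}

# A dict that satisfies the schema can be expanded straight into the function call.
result = {"a": 1, "b": 2}
print(add(**result))  # 3
```

Adding a parameter to add, say c: int, changes the derived schema automatically, which is the point made above.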

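Prompting

Building prompts can get messy. Outlines makes it easier to write and manage prompts by encapsulating templates inside "template functions".

These functions make it possible to neatly separate the prompt logic from the general program logic, and they can be imported from other modules and libraries.

Template functions require no superfluous abstraction. They use the Jinja2 templating engine to help build complex prompts in a concise manner: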

import outlines

examples = [
    ("The food was disgusting", "Negative"),
    ("We had a fantastic night", "Positive"),
    ("Recommended", "Positive"),
    ("The waiter was rude", "Negative")
]

@outlines.prompt
def labelling(to_label, examples):
    """You are a sentiment-labelling assistant.

    {% for example in examples %}
    {{ example[0] }} // {{ example[1] }}
    {% endfor %}
    {{ to_label }} //
    """

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
prompt = labelling("Just awesome", examples)
answer = outlines.generate.text(model)(prompt, max_tokens=100)
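The core of a template function is small: the docstring serves as the template, and the call arguments are bound to the signature's parameter names before rendering. The sketch below mimics that with a hypothetical template_function decorator. It substitutes str.format for Jinja2, so it has no loops, and the few-shot examples are therefore pre-rendered in Python. It is an illustration, not the @outlines.prompt implementation.

```python
import functools
import inspect


def template_function(fn):
    """Hypothetical decorator: render fn's docstring with the call's arguments."""
    template = inspect.cleandoc(fn.__doc__)
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def render(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)  # raises TypeError on bad arguments
        bound.apply_defaults()
        return template.format(**bound.arguments)

    return render


@template_function
def labelling(to_label, examples):
    """You are a sentiment-labelling assistant.

    {examples}
    {to_label} //"""


def render_examples(pairs):
    # Pre-render the few-shot examples, since str.format has no {% for %} loop.
    return "\n".join(f"{text} // {label}" for text, label in pairs)


examples = [
    ("The food was disgusting", "Negative"),
    ("We had a fantastic night", "Positive"),
]
print(labelling("Just awesome", render_examples(examples)))
```

Binding through inspect.signature means a misspelled or missing argument fails loudly with a TypeError instead of silently producing a broken prompt.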

Join us

  • 💡 Have an idea? Come chat with us on Discord.
  • 🔨 Want to contribute? Consult our contribution guide.
  • 🐞 Found a bug? Open an issue.

Cite Outlines

@article{willard2023efficient,
  title={Efficient Guided Generation for LLMs},
  author={Willard, Brandon T and Louf, R{\'e}mi},
  journal={arXiv preprint arXiv:2307.09702},
  year={2023}
}

Download files

Source distribution: outlines-0.0.46.tar.gz (2.1 MB)

Built distribution: outlines-0.0.46-py3-none-any.whl (101.9 kB, Python 3)
