跳转到主要内容

Python接口到LLM。

项目描述

LlamaBot: 一个Python风格的LLM接口

LlamaBot实现了对LLM的Python接口,使得在Jupyter笔记本中实验LLM以及构建利用LLM的Python应用变得更加容易。LlamaBot支持LiteLLM支持的所有的模型。

安装LlamaBot

要安装LlamaBot

pip install llamabot

获取访问LLM的权限

选项1:使用Ollama本地模型

LlamaBot支持通过Ollama使用本地模型。为此,请访问Ollama网站并安装Ollama。然后按照以下说明操作。

选项2:使用API提供商

OpenAI

如果您有OpenAI API密钥,则可以通过运行以下命令配置LlamaBot使用该API密钥:

export OPENAI_API_KEY="sk-your1api2key3goes4here"

Mistral

如果您有Mistral API密钥,则可以通过运行以下命令配置LlamaBot使用该API密钥:

export MISTRAL_API_KEY="your-api-key-goes-here"

其他API提供商

其他API提供商通常指定一个环境变量来设置。如果您有API密钥,请相应地设置环境变量。

使用方法

SimpleBot

LlamaBot最简单的用法是创建一个SimpleBot,不记录聊天历史。这实际上等同于一个无状态的函数,您使用自然语言指令而不是代码来编程。这对于提示实验或创建简单的机器人非常有用,这些机器人根据指令预先配置以处理文本,然后可以重复使用不同的文本进行调用。

使用API提供商的SimpleBot

例如,创建一个像Richard Feynman那样解释给定文本的机器人

from llamabot import SimpleBot

system_prompt = "You are Richard Feynman. You will be given a difficult concept, and your task is to explain it back."
feynman = SimpleBot(
  system_prompt,
  model_name="gpt-3.5-turbo"
)

使用GPT,您需要配置OPENAI_API_KEY环境变量。如果您想使用本地Ollama模型的SimpleBot,请查看此示例

现在,feynman 可以在任意文本块上调用,并将以理查德·费曼的风格(或更准确地说,根据 system_prompt 指定的风格)重写该文本。例如

prompt = """
Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed.
However, most of these tools cannot accurately predict functional annotations,
such as enzyme commission (EC) number,
for less-studied proteins or those with previously uncharacterized functions or multiple activities.
We present a machine learning algorithm named CLEAN (contrastive learning–enabled enzyme annotation)
to assign EC numbers to enzymes with better accuracy, reliability,
and sensitivity compared with the state-of-the-art tool BLASTp.
The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes,
(ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers—functions
that we demonstrate by systematic in silico and in vitro experiments.
We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes,
thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.
"""
feynman(prompt)

这将返回类似以下内容

Alright, let's break this down.

Enzymes are like little biological machines that help speed up chemical reactions in our
bodies. Each enzyme has a specific job, or function, and we use something called an
Enzyme Commission (EC) number to categorize these functions.

Now, the problem is that we don't always know what function an enzyme has, especially if
it's a less-studied or new enzyme. This is where computational tools come in. They try
to predict the function of these enzymes, but they often struggle to do so accurately.

So, the folks here have developed a new tool called CLEAN, which stands for contrastive
learning–enabled enzyme annotation. This tool uses a machine learning algorithm, which
is a type of artificial intelligence that learns from data to make predictions or
decisions.

CLEAN uses a method called contrastive learning. Imagine you have a bunch of pictures of
cats and dogs, and you want to teach a machine to tell the difference. You'd show it
pairs of pictures, some of the same animal (two cats or two dogs) and some of different
animals (a cat and a dog). The machine would learn to tell the difference by contrasting
the features of the two pictures. That's the basic idea behind contrastive learning.

CLEAN uses this method to predict the EC numbers of enzymes more accurately than
previous tools. It can confidently annotate understudied enzymes, correct mislabeled
enzymes, and even identify enzymes that have more than one function.

The creators of CLEAN have tested it with both computer simulations and lab experiments,
and they believe it will be a valuable tool for predicting the functions of unknown
enzymes. This could have big implications for fields like genomics, synthetic biology,
and biocatalysis, which all rely on understanding how enzymes work.

使用 SimpleBot 和本地 Ollama 模型

如果您想使用本地托管的 Ollama 模型,则可以使用以下语法

from llamabot import SimpleBot

system_prompt = "You are Richard Feynman. You will be given a difficult concept, and your task is to explain it back."
bot = SimpleBot(
    system_prompt,
    model_name="ollama/llama2:13b"
)

只需指定 model_name 关键字参数,格式为 <provider>/<model name>。例如

您需要确保 Ollama 在本地运行;有关更多详细信息,请参阅 Ollama 文档。(同样,也可以对下面的 ChatBotQueryBot 类进行操作!)

model_name 参数是可选的。如果您不提供它,Llamabot 将尝试使用默认模型。您可以在 DEFAULT_LANGUAGE_MODEL 环境变量中配置它。

聊天机器人

为了在 Jupyter Notebook 中进行聊天机器人的实验,我们还提供了 ChatBot 接口。此接口会自动跟踪 Jupyter 会话生命周期内的聊天历史。这样做可以使您将本地的 Jupyter Notebook 用作聊天界面。

例如

from llamabot import ChatBot

system_prompt="You are Richard Feynman. You will be given a difficult concept, and your task is to explain it back."
feynman = ChatBot(
  system_prompt,
  session_name="feynman_chat",
  # Optional:
  # model_name="gpt-3.5-turbo"
  # or
  # model_name="ollama/mistral"
)

有关 model_name 的更多解释,请参阅 使用 SimpleBot 的示例

现在,您已经有一个 ChatBot 实例,您可以开始与之进行对话。

prompt = """
Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed.
However, most of these tools cannot accurately predict functional annotations,
such as enzyme commission (EC) number,
for less-studied proteins or those with previously uncharacterized functions or multiple activities.
We present a machine learning algorithm named CLEAN (contrastive learning–enabled enzyme annotation)
to assign EC numbers to enzymes with better accuracy, reliability,
and sensitivity compared with the state-of-the-art tool BLASTp.
The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes,
(ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers—functions
that we demonstrate by systematic in silico and in vitro experiments.
We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes,
thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.
"""
feynman(prompt)

在可用的聊天历史中,您可以提出后续问题

feynman("Is there a simpler way to rephrase the text such that a high schooler would understand it?")

并且您的机器人将利用聊天历史进行响应。

查询机器人

提供的最后一个机器人是查询机器人。此机器人允许您查询文档集合。要使用它,您有两种选择

  1. 传递一个包含文本文件路径的列表,让 Llamabot 为它们创建一个新的集合,或者
  2. 传递先前实例化的 QueryBot 模型的 collection_name。(这将加载先前计算好的文本索引到内存中。)

创建新集合的示例

from llamabot import QueryBot
from pathlib import Path

bot = QueryBot(
  system_prompt="You are an expert on Eric Ma's blog.",
  collection_name="eric_ma_blog",
  document_paths=[
    Path("/path/to/blog/post1.txt"),
    Path("/path/to/blog/post2.txt"),
    ...,
  ],
  # Optional:
  # model_name="gpt-3.5-turbo"
  # or
  # model_name="ollama/mistral"
) # This creates a new embedding for my blog text.
result = bot("Do you have any advice for me on career development?")

使用现有集合的示例

from llamabot import QueryBot

bot = QueryBot(
  system_prompt="You are an expert on Eric Ma's blog",
  collection_name="eric_ma_blog",
  # Optional:
  # model_name="gpt-3.5-turbo"
  # or
  # model_name="ollama/mistral"
)  # This loads my previously-embedded blog text.
result = bot("Do you have any advice for me on career development?")

有关 model_name 的更多解释,请参阅 使用 SimpleBot 的示例

图像机器人

随着 OpenAI API 更新的发布,只要您有 OpenAI API 密钥,您就可以使用 LlamaBot 生成图像

from llamabot import ImageBot

bot = ImageBot()
# Within a Jupyter notebook:
url = bot("A painting of a dog.")

# Or within a Python script
filepath = bot("A painting of a dog.")

# Now, you can do whatever you need with the url or file path.

如果您在 Jupyter Notebook 中,您还会看到图像神奇地作为输出单元格的一部分出现。

CLI 示例

Llamabot 包含 CLI 示例,展示了可以使用它构建的内容,以及一些辅助代码。

这里有一个示例,我在命令行中直接使用 llamabot chat 暴露聊天机器人

<script async id="asciicast-594332" src="https://asciinema.org/a/594332.js"></script>

还有另一个示例,其中 llamabot 被用作 CLI 应用程序的后端,用于使用 llamabot zotero chat 与 Zotero 库聊天

<script async id="asciicast-594326" src="https://asciinema.org/a/594326.js"></script>

最后,这里有一个示例,我使用 llamabotSimpleBot 创建了一个机器人,该机器人可以自动为我编写提交消息。

<script async id="asciicast-594334" src="https://asciinema.org/a/594334.js"></script>

缓存

LlamaBot 使用缓存机制来提高性能并减少不必要的 API 调用。默认情况下,所有缓存条目在 1 天后(86400 秒)过期。此行为是通过使用 diskcache 库实现的。

缓存配置

在您使用任何机器人类(SimpleBotChatBotQueryBot)时,缓存会自动配置。您无需手动设置缓存。

缓存位置

默认缓存目录位于

~/.llamabot/cache

缓存超时

缓存超时可以通过使用环境变量 LLAMABOT_CACHE_TIMEOUT 进行配置。默认情况下,缓存超时设置为1天(86400秒)。要自定义缓存超时,将环境变量 LLAMABOT_CACHE_TIMEOUT 设置为所需的秒数。例如

export LLAMABOT_CACHE_TIMEOUT=3600

这将设置缓存超时为1小时(3600秒)。

贡献

新功能

欢迎提出新功能!对于大型语言模型的用户来说,这是一个早期且令人兴奋的日子。我们的开发目标是尽可能保持项目简单。带有pull request的功能请求将被优先考虑;功能的实现越简单(就维护负担而言),越有可能被批准。

错误报告

请使用问题跟踪器提交错误报告。

问题/讨论

请使用GitHub上的问题跟踪器。

贡献者

Rena Lu
陆瑞娜

💻
andrew giessel
andrew giessel

🤔 🎨 💻
Aidan Brewis
艾登·布鲁伊斯

💻
Eric Ma
马修·艾瑞克

🤔 🎨 💻
Mark Harrison
马克·哈里森

🤔
reka
reka

📖 💻
anujsinha3
anujsinha3

💻 📖
Elliot Salisbury
埃利奥特·萨尔兹伯里

📖
Ethan Fricker, PhD
Ethan Fricker, PhD

📖
Ikko Eltociear Ashimine
Ikko Eltociear Ashimine

📖

项目详情


发布历史 发布通知 | RSS源

下载文件

下载适用于您平台的文件。如果您不确定选择哪个,请了解更多关于安装包的信息。

源代码分发

llamabot-0.8.1.tar.gz (63.5 kB 查看哈希值)

上传时间 源代码

构建分发

llamabot-0.8.1-py3-none-any.whl (69.5 kB 查看哈希值)

上传时间 Python 3

AWS AWS 云计算和安全赞助商 Datadog Datadog 监控 Fastly Fastly CDN Google Google 下载分析 Microsoft Microsoft PSF 赞助商 Pingdom Pingdom 监控 Sentry Sentry 错误记录 StatusPage StatusPage 状态页面