Python接口到LLM。
项目描述
LlamaBot: 一个Python风格的LLM接口
LlamaBot实现了对LLM的Python接口,使得在Jupyter笔记本中实验LLM以及构建利用LLM的Python应用变得更加容易。LlamaBot支持LiteLLM支持的所有的模型。
安装LlamaBot
要安装LlamaBot
pip install llamabot
获取访问LLM的权限
选项1:使用Ollama本地模型
LlamaBot支持通过Ollama使用本地模型。为此,请访问Ollama网站并安装Ollama。然后按照以下说明操作。
选项2:使用API提供商
OpenAI
如果您有OpenAI API密钥,则可以通过运行以下命令配置LlamaBot使用该API密钥:
export OPENAI_API_KEY="sk-your1api2key3goes4here"
Mistral
如果您有Mistral API密钥,则可以通过运行以下命令配置LlamaBot使用该API密钥:
export MISTRAL_API_KEY="your-api-key-goes-here"
其他API提供商
其他API提供商通常指定一个环境变量来设置。如果您有API密钥,请相应地设置环境变量。
使用方法
SimpleBot
LlamaBot最简单的用法是创建一个SimpleBot
,不记录聊天历史。这实际上等同于一个无状态的函数,您使用自然语言指令而不是代码来编程。这对于提示实验或创建简单的机器人非常有用,这些机器人根据指令预先配置以处理文本,然后可以重复使用不同的文本进行调用。
使用API提供商的SimpleBot
例如,创建一个像Richard Feynman那样解释给定文本的机器人
from llamabot import SimpleBot
system_prompt = "You are Richard Feynman. You will be given a difficult concept, and your task is to explain it back."
feynman = SimpleBot(
system_prompt,
model_name="gpt-3.5-turbo"
)
使用GPT,您需要配置OPENAI_API_KEY
环境变量。如果您想使用本地Ollama模型的SimpleBot
,请查看此示例
现在,feynman
可以在任意文本块上调用,并将以理查德·费曼的风格(或更准确地说,根据 system_prompt
指定的风格)重写该文本。例如
prompt = """
Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed.
However, most of these tools cannot accurately predict functional annotations,
such as enzyme commission (EC) number,
for less-studied proteins or those with previously uncharacterized functions or multiple activities.
We present a machine learning algorithm named CLEAN (contrastive learning–enabled enzyme annotation)
to assign EC numbers to enzymes with better accuracy, reliability,
and sensitivity compared with the state-of-the-art tool BLASTp.
The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes,
(ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers—functions
that we demonstrate by systematic in silico and in vitro experiments.
We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes,
thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.
"""
feynman(prompt)
这将返回类似以下内容
Alright, let's break this down.
Enzymes are like little biological machines that help speed up chemical reactions in our
bodies. Each enzyme has a specific job, or function, and we use something called an
Enzyme Commission (EC) number to categorize these functions.
Now, the problem is that we don't always know what function an enzyme has, especially if
it's a less-studied or new enzyme. This is where computational tools come in. They try
to predict the function of these enzymes, but they often struggle to do so accurately.
So, the folks here have developed a new tool called CLEAN, which stands for contrastive
learning–enabled enzyme annotation. This tool uses a machine learning algorithm, which
is a type of artificial intelligence that learns from data to make predictions or
decisions.
CLEAN uses a method called contrastive learning. Imagine you have a bunch of pictures of
cats and dogs, and you want to teach a machine to tell the difference. You'd show it
pairs of pictures, some of the same animal (two cats or two dogs) and some of different
animals (a cat and a dog). The machine would learn to tell the difference by contrasting
the features of the two pictures. That's the basic idea behind contrastive learning.
CLEAN uses this method to predict the EC numbers of enzymes more accurately than
previous tools. It can confidently annotate understudied enzymes, correct mislabeled
enzymes, and even identify enzymes that have more than one function.
The creators of CLEAN have tested it with both computer simulations and lab experiments,
and they believe it will be a valuable tool for predicting the functions of unknown
enzymes. This could have big implications for fields like genomics, synthetic biology,
and biocatalysis, which all rely on understanding how enzymes work.
使用 SimpleBot
和本地 Ollama 模型
如果您想使用本地托管的 Ollama 模型,则可以使用以下语法
from llamabot import SimpleBot
system_prompt = "You are Richard Feynman. You will be given a difficult concept, and your task is to explain it back."
bot = SimpleBot(
system_prompt,
model_name="ollama/llama2:13b"
)
只需指定 model_name
关键字参数,格式为 <provider>/<model name>
。例如
- 前缀为
ollama/
,以及 - 来自 Ollama 模型库 的模型名称
您需要确保 Ollama 在本地运行;有关更多详细信息,请参阅 Ollama 文档。(同样,也可以对下面的 ChatBot
和 QueryBot
类进行操作!)
model_name
参数是可选的。如果您不提供它,Llamabot 将尝试使用默认模型。您可以在 DEFAULT_LANGUAGE_MODEL
环境变量中配置它。
聊天机器人
为了在 Jupyter Notebook 中进行聊天机器人的实验,我们还提供了 ChatBot 接口。此接口会自动跟踪 Jupyter 会话生命周期内的聊天历史。这样做可以使您将本地的 Jupyter Notebook 用作聊天界面。
例如
from llamabot import ChatBot
system_prompt="You are Richard Feynman. You will be given a difficult concept, and your task is to explain it back."
feynman = ChatBot(
system_prompt,
session_name="feynman_chat",
# Optional:
# model_name="gpt-3.5-turbo"
# or
# model_name="ollama/mistral"
)
有关 model_name
的更多解释,请参阅 使用 SimpleBot
的示例。
现在,您已经有一个 ChatBot
实例,您可以开始与之进行对话。
prompt = """
Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed.
However, most of these tools cannot accurately predict functional annotations,
such as enzyme commission (EC) number,
for less-studied proteins or those with previously uncharacterized functions or multiple activities.
We present a machine learning algorithm named CLEAN (contrastive learning–enabled enzyme annotation)
to assign EC numbers to enzymes with better accuracy, reliability,
and sensitivity compared with the state-of-the-art tool BLASTp.
The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes,
(ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers—functions
that we demonstrate by systematic in silico and in vitro experiments.
We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes,
thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.
"""
feynman(prompt)
在可用的聊天历史中,您可以提出后续问题
feynman("Is there a simpler way to rephrase the text such that a high schooler would understand it?")
并且您的机器人将利用聊天历史进行响应。
查询机器人
提供的最后一个机器人是查询机器人。此机器人允许您查询文档集合。要使用它,您有两种选择
- 传递一个包含文本文件路径的列表,让 Llamabot 为它们创建一个新的集合,或者
- 传递先前实例化的
QueryBot
模型的collection_name
。(这将加载先前计算好的文本索引到内存中。)
创建新集合的示例
from llamabot import QueryBot
from pathlib import Path
bot = QueryBot(
system_prompt="You are an expert on Eric Ma's blog.",
collection_name="eric_ma_blog",
document_paths=[
Path("/path/to/blog/post1.txt"),
Path("/path/to/blog/post2.txt"),
...,
],
# Optional:
# model_name="gpt-3.5-turbo"
# or
# model_name="ollama/mistral"
) # This creates a new embedding for my blog text.
result = bot("Do you have any advice for me on career development?")
使用现有集合的示例
from llamabot import QueryBot
bot = QueryBot(
system_prompt="You are an expert on Eric Ma's blog",
collection_name="eric_ma_blog",
# Optional:
# model_name="gpt-3.5-turbo"
# or
# model_name="ollama/mistral"
) # This loads my previously-embedded blog text.
result = bot("Do you have any advice for me on career development?")
有关 model_name
的更多解释,请参阅 使用 SimpleBot
的示例。
图像机器人
随着 OpenAI API 更新的发布,只要您有 OpenAI API 密钥,您就可以使用 LlamaBot 生成图像
from llamabot import ImageBot
bot = ImageBot()
# Within a Jupyter notebook:
url = bot("A painting of a dog.")
# Or within a Python script
filepath = bot("A painting of a dog.")
# Now, you can do whatever you need with the url or file path.
如果您在 Jupyter Notebook 中,您还会看到图像神奇地作为输出单元格的一部分出现。
CLI 示例
Llamabot 包含 CLI 示例,展示了可以使用它构建的内容,以及一些辅助代码。
这里有一个示例,我在命令行中直接使用 llamabot chat
暴露聊天机器人
还有另一个示例,其中 llamabot
被用作 CLI 应用程序的后端,用于使用 llamabot zotero chat
与 Zotero 库聊天
最后,这里有一个示例,我使用 llamabot
的 SimpleBot
创建了一个机器人,该机器人可以自动为我编写提交消息。
缓存
LlamaBot 使用缓存机制来提高性能并减少不必要的 API 调用。默认情况下,所有缓存条目在 1 天后(86400 秒)过期。此行为是通过使用 diskcache
库实现的。
缓存配置
在您使用任何机器人类(SimpleBot
、ChatBot
或 QueryBot
)时,缓存会自动配置。您无需手动设置缓存。
缓存位置
默认缓存目录位于
~/.llamabot/cache
缓存超时
缓存超时可以通过使用环境变量 LLAMABOT_CACHE_TIMEOUT
进行配置。默认情况下,缓存超时设置为1天(86400秒)。要自定义缓存超时,将环境变量 LLAMABOT_CACHE_TIMEOUT
设置为所需的秒数。例如
export LLAMABOT_CACHE_TIMEOUT=3600
这将设置缓存超时为1小时(3600秒)。
贡献
新功能
欢迎提出新功能!对于大型语言模型的用户来说,这是一个早期且令人兴奋的日子。我们的开发目标是尽可能保持项目简单。带有pull request的功能请求将被优先考虑;功能的实现越简单(就维护负担而言),越有可能被批准。
错误报告
请使用问题跟踪器提交错误报告。
问题/讨论
请使用GitHub上的问题跟踪器。
贡献者
陆瑞娜 💻 |
andrew giessel 🤔 🎨 💻 |
艾登·布鲁伊斯 💻 |
马修·艾瑞克 🤔 🎨 💻 |
马克·哈里森 🤔 |
reka 📖 💻 |
anujsinha3 💻 📖 |
埃利奥特·萨尔兹伯里 📖 |
Ethan Fricker, PhD 📖 |
Ikko Eltociear Ashimine 📖 |
项目详情
下载文件
下载适用于您平台的文件。如果您不确定选择哪个,请了解更多关于安装包的信息。