使用OpenAI和DuckDB的自然语言查询查询本地或远程数据文件。
项目描述
qabot
使用OpenAI的gpt
和duckdb
🦆的自然语言查询查询本地或远程文件。
可查询本地和远程文件(CSV,parquet)
安装
使用pipx
、pip
等安装
pipx install qabot
安全风险
此程序允许LLM访问您的本地和网络可访问的文件,并允许它执行任意SQL查询,有关更多信息,请参阅Security.md
。
命令行使用
$ EXPORT OPENAI_API_KEY=sk-...
$ EXPORT QABOT_MODEL_NAME=gpt-4o
$ qabot -w -q "How many Hospitals are there located in Beijing"
Query: How many Hospitals are there located in Beijing
There are 39 hospitals located in Beijing.
Total tokens 1749 approximate cost in USD: 0.05562
Python使用
from qabot import ask_wikidata, ask_file, ask_database
print(ask_wikidata("How many hospitals are there in New Zealand?"))
print(ask_file("How many men were aboard the titanic?", 'data/titanic.csv'))
print(ask_database("How many product images are there?", 'postgresql://user:password@localhost:5432/dbname'))
输出
There are 54 hospitals in New Zealand.
There were 577 male passengers on the Titanic.
There are 6,225 product images.
功能
支持本地CSV和Excel文件
远程CSV文件
$ qabot -f https://duckdb.org/data/holdings.csv -q "Tell me how many Apple holdings I currently have"
🦆 Creating local DuckDB database...
🦆 Loading data...
create view 'holdings' as select * from 'https://duckdb.org/data/holdings.csv';
🚀 Sending query to LLM
🧑 Tell me how many Apple holdings I currently have
🤖 You currently have 32.23 shares of Apple.
This information was obtained by summing up all the Apple ('APPL') shares in the holdings table.
SELECT SUM(shares) as total_shares FROM holdings WHERE ticker = 'APPL'
甚至支持存储在S3中的(公共)数据
您甚至可以通过自然语言查询从磁盘/URL加载数据
将'./data/titanic.csv'文件加载到名为'raw_passengers'的表中。为仅包含男性乘客的原始乘客表创建一个视图。生存的男性乘客的平均票价是多少?
~/Dev/qabot> qabot -q "Load the file 'data/titanic.csv' into a table called 'raw_passengers'. Create a view of the raw passengers table for just the male passengers. What was the average fare for surviving male passengers?" -v
🦆 Creating local DuckDB database...
🤖 Using model: gpt-4-1106-preview. Max LLM/function iterations before answer 20
🚀 Sending query to LLM
🧑 Load the file 'data/titanic.csv' into a table called 'raw_passengers'. Create a view of the raw passengers table for just the male passengers. What was the
average fare for surviving male passengers?
🤖 load_data
{'files': ['data/titanic.csv']}
🦆 Imported with SQL:
["create table 'titanic' as select * from 'data/titanic.csv';"]
🤖 execute_sql
{'query': "CREATE VIEW male_passengers AS SELECT * FROM titanic WHERE Sex = 'male';"}
🦆 No output
🤖 execute_sql
{'query': 'SELECT AVG(Fare) as average_fare FROM male_passengers WHERE Survived = 1;'}
🦆 average_fare
40.82148440366974
🦆 {"summary": "The average fare for surviving male passengers was approximately $40.82.", "detail": "The average fare for surviving male passengers was
calculated by creating a view called `male_passengers` to filter only the male passengers from the `titanic` table, and then running a query to calculate the
average fare for male passengers who survived. The calculated average fare is approximately $40.82.", "query": "CREATE VIEW male_passengers AS SELECT * FROM
titanic WHERE Sex = 'male';\nSELECT AVG(Fare) as average_fare FROM male_passengers WHERE Survived = 1;"}
🚀 Question:
🧑 Load the file 'data/titanic.csv' into a table called 'raw_passengers'. Create a view of the raw passengers table for just the male passengers. What was the
average fare for surviving male passengers?
🤖 The average fare for surviving male passengers was approximately $40.82.
The average fare for surviving male passengers was calculated by creating a view called `male_passengers` to filter only the male passengers from the `titanic`
table, and then running a query to calculate the average fare for male passengers who survived. The calculated average fare is approximately $40.82.
CREATE VIEW male_passengers AS SELECT * FROM titanic WHERE Sex = 'male';
SELECT AVG(Fare) as average_fare FROM male_passengers WHERE Survived = 1;
快速入门
您需要将环境变量OPENAI_API_KEY
设置为您的OpenAI API密钥,您可以从此处获取。
使用pip/pipx安装qabot
命令行工具
$ pip install -U qabot
然后使用qabot
命令运行本地文件(-f my-file.csv
)或使用-w
查询wikidata。
使用qabot --help
查看所有选项
示例
本地CSV文件
$ qabot -q "how many passengers survived by gender?" -f data/titanic.csv
🦆 Loading data from files...
Loading data/titanic.csv into table titanic...
Query: how many passengers survived by gender?
Result:
There were 233 female passengers and 109 male passengers who survived.
🚀 any further questions? [y/n] (y): y
🚀 Query: what was the largest family who did not survive?
Query: what was the largest family who did not survive?
Result:
The largest family who did not survive was the Sage family, with 8 members.
🚀 any further questions? [y/n] (y): n
查询WikiData
使用-w
标志查询wikidata。为了获得最佳结果,请使用gpt-4
或类似模型。
$ EXPORT QABOT_MODEL_NAME=gpt-4
$ qabot -w -q "How many Hospitals are there located in Beijing"
中间步骤和数据库查询
使用-v
标志查看中间步骤和数据库查询。有时到达答案需要走很长的路,但看看它是如何到达的很有趣。
qabot -f data/titanic.csv -q "how many passengers survived by gender?" -v
通过http/s3访问的数据
使用-f <url>
标志从URL加载数据,例如S3上的csv文件
$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q "how many confirmed cases of covid are there?" -v
🦆 Loading data from files...
create table jhu_csse_covid_19_timeseries_merged as select * from 's3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv';
Result:
264308334 confirmed cases
Docker使用
您可以使用以下说明构建和运行qabot
的Docker镜像
构建Docker镜像
要构建Docker镜像,请在存储库的根目录中运行以下命令
docker build -t qabot .
运行Docker镜像
要运行Docker镜像,请使用以下命令
docker run --rm -e OPENAI_API_KEY=your_openai_api_key ghcr.io/hardbyte/qabot -w -q "How many Hospitals are there located in Beijing"
将 your_openai_api_key
替换为您的实际 OpenAI API 密钥。
想法
- 流式模式,以接收到的顺序输出结果
- token 限制和更好的成本报告
- 监督代理 - 评估查询是否“安全”运行,可能会要求用户确认运行任何被标记的查询。
- 我们通常可以零样本提问并得到单个查询 - 在 MKL 链之前我们或许可以尝试这种方法
- 单独测试每个零样本代理
- 生成并返回向用户做出的假设
- 将可选的“澄清”工具添加到链中,要求用户澄清问题
- 创建查询检查工具,检查查询是否有效和/或安全
- 将 AWS 凭据注入 duckdb 以访问 S3 中的私有资源
- 自动发布到 pypi,例如使用 可信发布者
项目详情
下载文件
下载您平台上的文件。如果您不确定选择哪个,请了解有关 安装包 的更多信息。
源分发
qabot-0.6.0.tar.gz (16.7 kB 查看哈希值)
构建分发
qabot-0.6.0-py3-none-any.whl (15.6 kB 查看哈希值)
关闭
qabot-0.6.0.tar.gz 的哈希值
算法 | 哈希摘要 | |
---|---|---|
SHA256 | febaba3f9dda6a34b97f754cb7a01df78f78cab1af1d029b884ea70ba6e37628 |
|
MD5 | e2a721af9ed3dd4c2954538344ccd7b0 |
|
BLAKE2b-256 | 14b018e0772dc722a75cde88592d827feae5c6754248f5c6b0644246a447fd99 |
关闭
qabot-0.6.0-py3-none-any.whl 的哈希值
算法 | 哈希摘要 | |
---|---|---|
SHA256 | 6c709afbdd4668bec10097421e8f76404665b9b4a52baa01e038b70f86d5698d |
|
MD5 | 54429191adc4ede9bf6b28cf57803f2d |
|
BLAKE2b-256 | c50a026ca2dd2d9fd863a71d1ad26d3813148145347a50c3af71c450829eaa70 |