跳转到主要内容

使用OpenAI和DuckDB的自然语言查询查询本地或远程数据文件。

项目描述

qabot

使用OpenAI的gptduckdb 🦆的自然语言查询查询本地或远程文件。

可查询本地和远程文件(CSV,parquet)

安装

使用pipxpip等安装

pipx install qabot

安全风险

此程序允许LLM访问您的本地和网络可访问的文件,并允许它执行任意SQL查询,有关更多信息,请参阅Security.md

命令行使用

$ EXPORT OPENAI_API_KEY=sk-...
$ EXPORT QABOT_MODEL_NAME=gpt-4o
$ qabot -w -q "How many Hospitals are there located in Beijing"
Query: How many Hospitals are there located in Beijing
There are 39 hospitals located in Beijing.
Total tokens 1749 approximate cost in USD: 0.05562

Python使用

from qabot import ask_wikidata, ask_file, ask_database

print(ask_wikidata("How many hospitals are there in New Zealand?"))
print(ask_file("How many men were aboard the titanic?", 'data/titanic.csv'))
print(ask_database("How many product images are there?", 'postgresql://user:password@localhost:5432/dbname'))

输出

There are 54 hospitals in New Zealand.
There were 577 male passengers on the Titanic.
There are 6,225 product images.

功能

支持本地CSV和Excel文件

远程CSV文件

$ qabot -f https://duckdb.org/data/holdings.csv -q "Tell me how many Apple holdings I currently have"
 🦆 Creating local DuckDB database...
 🦆 Loading data...
create view 'holdings' as select * from 'https://duckdb.org/data/holdings.csv';
 🚀 Sending query to LLM
 🧑 Tell me how many Apple holdings I currently have


 🤖 You currently have 32.23 shares of Apple.


This information was obtained by summing up all the Apple ('APPL') shares in the holdings table.

SELECT SUM(shares) as total_shares FROM holdings WHERE ticker = 'APPL'

甚至支持存储在S3中的(公共)数据

您甚至可以通过自然语言查询从磁盘/URL加载数据

将'./data/titanic.csv'文件加载到名为'raw_passengers'的表中。为仅包含男性乘客的原始乘客表创建一个视图。生存的男性乘客的平均票价是多少?

~/Dev/qabot> qabot -q "Load the file 'data/titanic.csv' into a table called 'raw_passengers'. Create a view of the raw passengers table for just the male passengers. What was the average fare for surviving male passengers?" -v
 🦆 Creating local DuckDB database...
 🤖 Using model: gpt-4-1106-preview. Max LLM/function iterations before answer 20
 🚀 Sending query to LLM
 🧑 Load the file 'data/titanic.csv' into a table called 'raw_passengers'. Create a view of the raw passengers table for just the male passengers. What was the    
average fare for surviving male passengers?
 🤖 load_data
{'files': ['data/titanic.csv']}
 🦆 Imported with SQL:
["create table 'titanic' as select * from 'data/titanic.csv';"]
 🤖 execute_sql
{'query': "CREATE VIEW male_passengers AS SELECT * FROM titanic WHERE Sex = 'male';"}
 🦆 No output
 🤖 execute_sql
{'query': 'SELECT AVG(Fare) as average_fare FROM male_passengers WHERE Survived = 1;'}
 🦆 average_fare
40.82148440366974
 🦆 {"summary": "The average fare for surviving male passengers was approximately $40.82.", "detail": "The average fare for surviving male passengers was
calculated by creating a view called `male_passengers` to filter only the male passengers from the `titanic` table, and then running a query to calculate the      
average fare for male passengers who survived. The calculated average fare is approximately $40.82.", "query": "CREATE VIEW male_passengers AS SELECT * FROM       
titanic WHERE Sex = 'male';\nSELECT AVG(Fare) as average_fare FROM male_passengers WHERE Survived = 1;"}


 🚀 Question:
 🧑 Load the file 'data/titanic.csv' into a table called 'raw_passengers'. Create a view of the raw passengers table for just the male passengers. What was the    
average fare for surviving male passengers?
 🤖 The average fare for surviving male passengers was approximately $40.82.


The average fare for surviving male passengers was calculated by creating a view called `male_passengers` to filter only the male passengers from the `titanic`    
table, and then running a query to calculate the average fare for male passengers who survived. The calculated average fare is approximately $40.82.

CREATE VIEW male_passengers AS SELECT * FROM titanic WHERE Sex = 'male';
SELECT AVG(Fare) as average_fare FROM male_passengers WHERE Survived = 1;

快速入门

您需要将环境变量OPENAI_API_KEY设置为您的OpenAI API密钥,您可以从此处获取。

使用pip/pipx安装qabot命令行工具

$ pip install -U qabot

然后使用qabot命令运行本地文件(-f my-file.csv)或使用-w查询wikidata。

使用qabot --help查看所有选项

示例

本地CSV文件

$ qabot -q "how many passengers survived by gender?" -f data/titanic.csv
🦆 Loading data from files...
Loading data/titanic.csv into table titanic...

Query: how many passengers survived by gender?
Result:
There were 233 female passengers and 109 male passengers who survived.


 🚀 any further questions? [y/n] (y): y

 🚀 Query: what was the largest family who did not survive? 
Query: what was the largest family who did not survive?
Result:
The largest family who did not survive was the Sage family, with 8 members.

 🚀 any further questions? [y/n] (y): n

查询WikiData

使用-w标志查询wikidata。为了获得最佳结果,请使用gpt-4或类似模型。

$ EXPORT QABOT_MODEL_NAME=gpt-4
$ qabot -w -q "How many Hospitals are there located in Beijing"

中间步骤和数据库查询

使用-v标志查看中间步骤和数据库查询。有时到达答案需要走很长的路,但看看它是如何到达的很有趣。

qabot -f data/titanic.csv -q "how many passengers survived by gender?" -v

通过http/s3访问的数据

使用-f <url>标志从URL加载数据,例如S3上的csv文件

$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q "how many confirmed cases of covid are there?" -v
🦆 Loading data from files...
create table jhu_csse_covid_19_timeseries_merged as select * from 's3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv';

Result:
264308334 confirmed cases

Docker使用

您可以使用以下说明构建和运行qabot的Docker镜像

构建Docker镜像

要构建Docker镜像,请在存储库的根目录中运行以下命令

docker build -t qabot .

运行Docker镜像

要运行Docker镜像,请使用以下命令

docker run --rm -e OPENAI_API_KEY=your_openai_api_key ghcr.io/hardbyte/qabot -w -q "How many Hospitals are there located in Beijing"

your_openai_api_key 替换为您的实际 OpenAI API 密钥。

想法

  • 流式模式,以接收到的顺序输出结果
  • token 限制和更好的成本报告
  • 监督代理 - 评估查询是否“安全”运行,可能会要求用户确认运行任何被标记的查询。
  • 我们通常可以零样本提问并得到单个查询 - 在 MKL 链之前我们或许可以尝试这种方法
  • 单独测试每个零样本代理
  • 生成并返回向用户做出的假设
  • 将可选的“澄清”工具添加到链中,要求用户澄清问题
  • 创建查询检查工具,检查查询是否有效和/或安全
  • 将 AWS 凭据注入 duckdb 以访问 S3 中的私有资源
  • 自动发布到 pypi,例如使用 可信发布者

项目详情


下载文件

下载您平台上的文件。如果您不确定选择哪个,请了解有关 安装包 的更多信息。

源分发

qabot-0.6.0.tar.gz (16.7 kB 查看哈希值)

上传时间

构建分发

qabot-0.6.0-py3-none-any.whl (15.6 kB 查看哈希值)

上传时间 Python 3

支持者

AWS AWS 云计算和安全赞助商 Datadog Datadog 监控 Fastly Fastly CDN Google Google 下载分析 Microsoft Microsoft PSF赞助商 Pingdom Pingdom 监控 Sentry Sentry 错误记录 StatusPage StatusPage 状态页面