NucliaDB SDK
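The NucliaDB SDK is a Python library designed as a thin wrapper around the NucliaDB HTTP API. It is aimed at developers who want to write low-level scripts that interact with NucliaDB.
Warning
⚠ If this is your first time using Nuclia, or you simply want to push unstructured data to Nuclia from a script or a CLI, we strongly recommend the Nuclia CLI/SDK instead, as it is easier to use and more focused on use cases. ⚠
Installation
To install it, simply use pip: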
pip install nucliadb-sdk
How to use it?
To connect to a NucliaDB instance hosted on Nuclia, use the NucliaDB constructor with an api_key:
from nucliadb_sdk import NucliaDB, Region
ndb = NucliaDB(region=Region.EUROPE1, api_key="my-api-key")
Alternatively, to connect to a local NucliaDB installation, use:
ndb = NucliaDB(region=Region.ON_PREM, url="http://localhost:8080/api")
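Every method of the NucliaDB class then maps to an HTTP endpoint of the NucliaDB API. The arguments a method accepts correspond to the Pydantic model of the endpoint's request body schema.
The mapping from SDK methods to endpoints is declared in code, in the _NucliaDBBase class.
For example, to create a resource in your Knowledge Box, you use the resource creation endpoint described in the NucliaDB API documentation.
It has a {kbid} path parameter and expects a JSON payload with some optional keys, such as slug or title, both of type string. With curl, the command would be: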
curl -XPOST http://localhost:8080/api/v1/kb/my-kbid/resources -H 'x-nucliadb-roles: WRITER' --data-binary '{"slug":"my-resource","title":"My Resource"}' -H "Content-Type: application/json"
{"uuid":"fbdb10a79abc45c0b13400f5697ea2ba","seqid":1}
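For comparison, the same request can be assembled with Python's standard library alone. This is only a sketch: build_create_resource_request is a hypothetical helper, not part of the SDK, and the request is built but never sent, so no running NucliaDB instance is needed to try it.

```python
import json
import urllib.request


def build_create_resource_request(base_url, kbid, payload):
    """Build (but do not send) the POST request from the curl command above.

    Hypothetical helper for illustration only; not part of nucliadb_sdk.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/kb/{kbid}/resources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-nucliadb-roles": "WRITER",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_resource_request(
    "http://localhost:8080/api",
    "my-kbid",
    {"slug": "my-resource", "title": "My Resource"},
)

# Sending the request requires a running NucliaDB instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Everything in this helper (URL assembly, JSON encoding, headers) is bookkeeping that the SDK takes care of for you.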
And with the NucliaDB SDK:
>>> from nucliadb_sdk import NucliaDB
>>>
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> ndb.create_resource(kbid="my-kbid", slug="my-resource", title="My Resource")
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
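Note that path parameters are mapped to required keyword arguments of the NucliaDB class methods, hence kbid="my-kbid". Any other keyword arguments passed to the method are sent together as the JSON body of the HTTP request.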
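The split between path parameters and body keys can be illustrated with a toy function. The build_request function below is hypothetical and is not how _NucliaDBBase is implemented; it only shows, under that assumption, how keyword arguments could be divided between the URL path and the JSON body.

```python
import string


def build_request(method, path_template, **kwargs):
    """Split keyword arguments into path parameters and a JSON body.

    Hypothetical illustration; the SDK's real implementation may differ.
    """
    # Names that appear as {placeholders} in the path template.
    path_params = [
        field
        for _, field, _, _ in string.Formatter().parse(path_template)
        if field
    ]
    # Path parameters are required, mirroring the behaviour described above.
    missing = [name for name in path_params if name not in kwargs]
    if missing:
        raise TypeError(f"missing required path parameters: {missing}")
    path = path_template.format(**{name: kwargs[name] for name in path_params})
    # Everything that is not a path parameter goes into the request body.
    body = {key: value for key, value in kwargs.items() if key not in path_params}
    return method, path, body


result = build_request(
    "POST",
    "/v1/kb/{kbid}/resources",
    kbid="my-kbid",
    slug="my-resource",
    title="My Resource",
)
print(result)
```

Here kbid ends up in the path, while slug and title form the body, which matches the curl command shown earlier.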
Alternatively, you can pass the content parameter with an instance of the Pydantic model that the endpoint expects:
>>> from nucliadb_sdk import NucliaDB
>>> from nucliadb_models.writer import CreateResourcePayload
>>>
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> content = CreateResourcePayload(slug="my-resource", title="My Resource")
>>> ndb.create_resource(kbid="my-kbid", content=content)
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
Query parameters can also be passed to any method through the query_params argument. For example:
>>> ndb.get_resource_by_id(kbid="my-kbid", rid="rid", query_params={"show": ["values"]})
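To make the effect of list-valued query parameters concrete, here is a standard-library sketch of how such a dictionary could end up in a URL. It assumes that lists are encoded as repeated keys, which is a common HTTP convention; this document does not show how the SDK encodes them. The with_query helper, the resource path, and the second value "basic" are illustrative assumptions.

```python
from urllib.parse import urlencode


def with_query(path, query_params):
    """Append query parameters to a path (hypothetical helper)."""
    # doseq=True expands list values into repeated keys: show=a&show=b
    query = urlencode(query_params, doseq=True)
    return f"{path}?{query}" if query else path


url = with_query(
    "/v1/kb/my-kbid/resource/rid",  # assumed path, for illustration only
    {"show": ["values", "basic"]},
)
print(url)
```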
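Example usage
The following example script fetches the HTML of a website, extracts all the links from it, and pushes them to NucliaDB so that they get processed by Nuclia's processing engine.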
from nucliadb_models.link import LinkField
from nucliadb_models.writer import CreateResourcePayload
import nucliadb_sdk
import requests
from bs4 import BeautifulSoup


def extract_links_from_url(url):
    """Return the set of href values found in the page's <a> tags."""
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    unique_links = set()
    for link in soup.find_all("a"):
        href = link.get("href")
        if href:  # skip <a> tags that have no href attribute
            unique_links.add(href)
    return unique_links


def upload_link_to_nuclia(ndb, *, kbid, link, tags):
    # Computed before the try block so the error messages can always use them
    title = link.replace("-", " ")
    slug = "-".join(tags) + "-" + link.split("/")[-1]
    try:
        content = CreateResourcePayload(
            title=title,
            slug=slug,
            links={
                "link": LinkField(
                    uri=link,
                    language="en",
                )
            },
        )
        ndb.create_resource(kbid=kbid, content=content)
        print(f"Resource created from {link}. Title={title} Slug={slug}")
    except nucliadb_sdk.exceptions.ConflictError:
        print(f"Resource already exists: {link} {slug}")
    except Exception as ex:
        print(f"Failed to create resource: {link} {slug}: {ex}")


def main(site):
    # Define the NucliaDB instance with region and URL
    # (the /api suffix matches the curl example above)
    ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080/api")
    # Loop through extracted links and upload to NucliaDB
    for link in extract_links_from_url(site):
        upload_link_to_nuclia(ndb, kbid="my-kb-id", link=link, tags=["news"])


if __name__ == "__main__":
    main(site="https://en.wikipedia.org/wiki/The_Lion_King")
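Pages such as the one used above often contain relative links (for example /wiki/...), in-page anchors, and non-HTTP schemes. As a possible refinement, the links could be normalized before uploading. The normalize_links helper below is a hypothetical addition that relies only on the standard library; it is not part of the original script or of the SDK.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep only http(s) links.

    Hypothetical helper for illustration; not part of nucliadb_sdk.
    """
    normalized = set()
    for href in hrefs:
        # Skip missing hrefs and in-page anchors such as "#cite_note-1".
        if not href or href.startswith("#"):
            continue
        absolute = urljoin(base_url, href)
        # Drop schemes such as mailto: or javascript:.
        if urlparse(absolute).scheme in ("http", "https"):
            normalized.add(absolute)
    return normalized


links = normalize_links(
    "https://en.wikipedia.org/wiki/The_Lion_King",
    ["/wiki/Simba", None, "#cite_note-1", "mailto:a@b.c", "https://example.com/x"],
)
print(sorted(links))
```

A helper like this could wrap the result of extract_links_from_url before the upload loop, so that every uri passed to LinkField is an absolute URL.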
Once the data has been pushed, the NucliaDB SDK can also be used to find answers over the extracted links.
>>> import nucliadb_sdk
>>>
>>> ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> resp = ndb.ask(kbid="my-kb-id", query="What does Hakuna Matata mean?")
>>> print(resp.answer)
'Hakuna matata is actually a phrase in the East African language of Swahili that literally means “no trouble” or “no problems”.'