Project description
GraphRAG-SDK
GraphRAG-SDK is a comprehensive solution for building Graph Retrieval-Augmented Generation (GraphRAG) applications, leveraging FalkorDB for optimal performance.
Features
- Ontology management: manage ontologies manually or automatically from unstructured data.
- Knowledge Graph (KG): build and query knowledge graphs for efficient data retrieval.
- LLM integration: supports OpenAI and Google Gemini models.
- Multi-agent system: KG-based multi-agent orchestrators.
Getting Started
Install
pip install graphrag_sdk
Prerequisites
Graph Database
GraphRAG-SDK relies on FalkorDB as its graph engine and works with OpenAI/Gemini.
Use FalkorDB Cloud to obtain credentials, or start FalkorDB locally:
docker run -p 6379:6379 -p 3000:3000 -it --rm -v ./data:/data falkordb/falkordb:latest
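As a quick sanity check, a minimal sketch using the falkordb Python client (which this SDK depends on and which appears in the ontology example further down) can confirm that the local instance is reachable before running the examples below:

from falkordb import FalkorDB

# Connect to the locally running FalkorDB instance (same port as in the docker command above).
db = FalkorDB(host="localhost", port=6379)

# A fresh instance should respond with an (initially empty) list of graphs.
print(db.list_graphs())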
LLM Models
Currently, this SDK supports the OpenAI and Google Gemini APIs. Make sure a .env file with all the required credentials is available:
.env
OPENAI_API_KEY="OPENAI_API_KEY"
GOOGLE_API_KEY="GOOGLE_API_KEY"
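As a minimal sketch (using the python-dotenv package already imported in the examples below), you can verify that the credentials are picked up before instantiating any model:

import os
from dotenv import load_dotenv

# Load OPENAI_API_KEY / GOOGLE_API_KEY from the .env file into the process environment.
load_dotenv()

# Fail early if no LLM credentials were found.
assert os.getenv("OPENAI_API_KEY") or os.getenv("GOOGLE_API_KEY"), "No LLM credentials found in .env"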
Basic Usage
The following example shows the basic usage of GraphRAG with an auto-detected ontology.
from dotenv import load_dotenv
from graphrag_sdk.source import URL
from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.models.openai import OpenAiGenerativeModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig
load_dotenv()
# Import Data
urls = [
    "https://www.rottentomatoes.com/m/side_by_side_2012",
    "https://www.rottentomatoes.com/m/matrix",
    "https://www.rottentomatoes.com/m/matrix_revolutions",
    "https://www.rottentomatoes.com/m/matrix_reloaded",
    "https://www.rottentomatoes.com/m/speed_1994",
    "https://www.rottentomatoes.com/m/john_wick_chapter_4",
]
sources = [URL(url) for url in urls]
# Model
model = OpenAiGenerativeModel(model_name="gpt-4o")
# Ontology Auto-Detection
ontology = Ontology.from_sources(
    sources=sources,
    model=model,
)
# Knowledge Graph
kg = KnowledgeGraph(
    name="movies",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
)
# GraphRAG System and Questioning
kg.process_sources(sources)
chat = kg.chat_session()
print(chat.send_message("Who is the director of the movie The Matrix?"))
print(chat.send_message("How this director connected to Keanu Reeves?"))
Tools
Import source data
The SDK supports the following file formats:
- TEXT
- JSONL
- URL
- HTML
- CSV
import os
from graphrag_sdk.source import Source
src_files = "data_folder"
sources = []
# Create a Source object.
for file in os.listdir(src_files):
    sources.append(Source(os.path.join(src_files, file)))
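For illustration, a hypothetical source list can also mix local files with the URL source type from the basic-usage example (the file paths and URL below are placeholders):

from graphrag_sdk.source import URL, Source

# Hypothetical example: combine local files with web pages in a single source list.
sources = [
    Source("data_folder/fighters.csv"),   # placeholder CSV file
    Source("data_folder/events.jsonl"),   # placeholder JSONL file
    URL("https://en.wikipedia.org/wiki/Mixed_martial_arts"),  # placeholder URL
]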
Ontology
You can auto-detect the ontology from your data or define it manually. In addition, you can set boundaries for the ontology auto-detection.
Once the ontology is created, you can review, modify, and update it as needed before using it to build the Knowledge Graph (KG).
import json
import random
from falkordb import FalkorDB
from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.models.openai import OpenAiGenerativeModel
# Define the percentage of files that will be used to auto-create the ontology.
percent = 0.1 # This represents 10%. You can adjust this value (e.g., 0.2 for 20%).
boundaries = """
Extract only the most relevant information about UFC fighters, fights, and events.
Avoid creating entities for details that can be expressed as attributes.
"""
# Define the model to be used for the ontology
model = OpenAiGenerativeModel(model_name="gpt-4o")
# Randomly select a percentage of files from sources.
sampled_sources = random.sample(sources, round(len(sources) * percent))
ontology = Ontology.from_sources(
    sources=sampled_sources,
    boundaries=boundaries,
    model=model,
)
# Save the ontology to disk as a JSON file.
with open("ontology.json", "w", encoding="utf-8") as file:
    file.write(json.dumps(ontology.to_json(), indent=2))
After the initial ontology is generated, you can review it and make any adjustments needed for your data and requirements, such as refining entity types or adjusting relationships.
Once you are satisfied with the ontology, you can proceed to use it to create and manage your Knowledge Graph (KG).
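For illustration, a quick way to review the detected ontology is to print the saved JSON; this sketch makes no assumptions about the exact schema beyond it being valid JSON:

import json

# Load the auto-detected ontology that was saved above.
with open("ontology.json", "r", encoding="utf-8") as file:
    ontology_json = json.load(file)

# Print each top-level section (truncated) to review the detected entities and relations.
for key, value in ontology_json.items():
    print(f"== {key} ==")
    print(json.dumps(value, indent=2)[:500])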
Knowledge Graph
You can now use the SDK to create a Knowledge Graph (KG) from your sources and ontology.
# After approving the ontology, load it from disk.
ontology_file = "ontology.json"
with open(ontology_file, "r", encoding="utf-8") as file:
    ontology = Ontology.from_json(json.loads(file.read()))

kg = KnowledgeGraph(
    name="kg_name",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
)
kg.process_sources(sources)
You can update the KG at any time by processing additional sources with the process_sources method, as sketched below.
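As a minimal sketch (the file name is a placeholder), updating the graph later is just another call on the same KnowledgeGraph instance:

from graphrag_sdk.source import Source

# Hypothetical follow-up ingestion: add a newly arrived document to the existing KG.
new_sources = [Source("data_folder/new_report.txt")]  # placeholder path
kg.process_sources(new_sources)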
Graph RAG
At this point, you have a Knowledge Graph that you can query with this SDK. Use the ask method for a single question, or chat_session for a conversation.
# Single question.
response = kg.ask("What were the last five fights? When were they? How many rounds did they have?")
print(response)
# Conversation.
chat = kg.chat_session()
response = chat.send_message("Who is Salsa Boy?")
print(response)
response = chat.send_message("Tell me about one of his fights?")
print(response)
Multi-Agent - Orchestrator
GraphRAG-SDK supports KG agents. Each agent is an expert on the data it has learned, and an orchestrator coordinates the agents.
Agents
See the Basic Usage section above for how to create a KG object for an agent.
# KGAgent and Orchestrator come from the SDK's agents and orchestrator modules.
from graphrag_sdk.agents.kg_agent import KGAgent
from graphrag_sdk.orchestrator import Orchestrator

# Define the model shared by both agents.
model = OpenAiGenerativeModel("gpt-4o")
# Create the KG from the predefined ontology.
# In this example, we will use the restaurants agent and the attractions agent.
restaurants_kg = KnowledgeGraph(
    name="restaurants",
    ontology=restaurants_ontology,
    model_config=KnowledgeGraphModelConfig.with_model(model),
)
attractions_kg = KnowledgeGraph(
    name="attractions",
    ontology=attractions_ontology,
    model_config=KnowledgeGraphModelConfig.with_model(model),
)
# The following agent is specialized in finding restaurants.
restaurants_agent = KGAgent(
    agent_id="restaurants_agent",
    kg=restaurants_kg,
    introduction="I'm a restaurant agent, specialized in finding the best restaurants for you.",
)
# The following agent is specialized in finding tourist attractions.
attractions_agent = KGAgent(
    agent_id="attractions_agent",
    kg=attractions_kg,
    introduction="I'm an attractions agent, specialized in finding the best tourist attractions for you.",
)
Orchestrator - Multi-Agent System
The orchestrator manages the use of the agents and handles the questions.
# Initialize the orchestrator while giving it the backstory.
orchestrator = Orchestrator(
    model,
    backstory="You are a trip planner, and you want to provide the best possible itinerary for your clients.",
)
# Register the agents that we created above.
orchestrator.register_agent(restaurants_agent)
orchestrator.register_agent(attractions_agent)
# Query the orchestrator.
runner = orchestrator.ask("Create a two-day itinerary for a trip to Rome. Please don't ask me any questions; just provide the best itinerary you can.")
print(runner.output)
Support
Connect with our community for support and discussions. If you have any questions, please reach out to us.
Project details
Hashes for graphrag_sdk-0.2.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 7329cba069a974bc2aec1fdc8acfd9918be3c6c1f193da654da319ca33346e83
MD5 | 51fdb1db88b694d99f90267e09c81ab3
BLAKE2b-256 | ef22e1e0548d27441e9de1f8736c0c73c109f9e52864bbc00d36042a9ee71cd5