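Automatically generate prediction problems and labels for supervised learning.
Project description
Trane is a software package that automatically generates prediction problems for temporal datasets and produces labels for supervised learning. Its goal is to streamline the process of formulating and solving machine learning problems.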
Installation
Install Trane using pip:
python -m pip install trane
Usage
The following example gives a quick look at how Trane works:
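import trane

# Load the sample Airbnb dataset and its metadata.
data, metadata = trane.load_airbnb()

# Generate prediction problems with "location" as the entity column.
problem_generator = trane.ProblemGenerator(
    metadata=metadata,
    entity_columns=["location"]
)
problems = problem_generator.generate()

# Print the first five generated problems.
for problem in problems[:5]:
    print(problem)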
Some of the generated problems:
==================================================
Generated 40 total problems
--------------------------------------------------
Classification problems: 5
Regression problems: 35
==================================================
For each <location> predict if there exists a record
For each <location> predict if there exists a record with <location> equal to <str>
For each <location> predict if there exists a record with <location> not equal to <str>
For each <location> predict if there exists a record with <rating> equal to <str>
For each <location> predict if there exists a record with <rating> not equal to <str>
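Each generated problem describes a label to compute for every entity. As a rough illustration of what a problem such as "For each <location> predict if there exists a record" could mean in practice, the sketch below computes such labels in plain Python. It does not use Trane and is not Trane's implementation: the sample records, the `label_exists` helper, and the idea of evaluating the label over a fixed time window are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical review records: (location, timestamp, rating).
records = [
    ("London", datetime(2021, 1, 3), 5),
    ("London", datetime(2021, 1, 20), 3),
    ("Paris", datetime(2021, 1, 10), 4),
    ("Paris", datetime(2021, 2, 2), 5),
]


def label_exists(records, window_start, window, predicate=lambda rec: True):
    """For each entity, return True if any record with a timestamp in
    [window_start, window_start + window) satisfies the predicate."""
    window_end = window_start + window
    labels = defaultdict(bool)  # entities with no match default to False
    for rec in records:
        entity, timestamp, _rating = rec
        in_window = window_start <= timestamp < window_end
        labels[entity] = labels[entity] or (in_window and predicate(rec))
    return dict(labels)


# "predict if there exists a record", evaluated over January 2021.
jan = label_exists(records, datetime(2021, 1, 1), timedelta(days=31))

# "predict if there exists a record with <rating> equal to 5",
# evaluated over February 2021.
feb_five_star = label_exists(
    records,
    datetime(2021, 2, 1),
    timedelta(days=28),
    predicate=lambda rec: rec[2] == 5,
)

print(jan)
print(feb_five_star)
```

The filter condition is passed in as a predicate so that the "equal to" and "not equal to" variants listed above differ only in the function supplied.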
With Trane's LLM add-on (pip install trane[llm]), we can use OpenAI to determine which of the generated problems are most relevant.
from trane.llm import analyze

instructions = "determine 5 most relevant problems about user's booking preferences. Do not include 'predict the first/last X' problems"
context = "Airbnb data listings in major cities, including information about hosts, pricing, location, and room type, along with over 5 million historical reviews."

# Ask the model to select the most relevant of the generated problems.
relevant_problems = analyze(
    problems=problems,
    instructions=instructions,
    context=context,
    model="gpt-3.5-turbo-16k"
)

# Print each selected problem together with the model's stated reasoning.
for problem in relevant_problems:
    print(problem)
    print(f'Reasoning: {problem.get_reasoning()}\n')
Output:
For each <location> predict if there exists a record
Reasoning: This problem can help identify locations with missing data or locations that have not been booked at all.
For each <location> predict the first <location> in all related records
Reasoning: Predicting the first location in all related records can provide insights into the most frequently booked locations for each city.
For each <location> predict the first <rating> in all related records
Reasoning: Predicting the first rating in all related records can provide insights into the average satisfaction level of guests for each location.
For each <location> predict the last <location> in all related records
Reasoning: Predicting the last location in all related records can provide insights into the most recent bookings for each city.
For each <location> predict the last <rating> in all related records
Reasoning: Predicting the last rating in all related records can provide insights into the recent satisfaction level of guests for each location.
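Note that four of the five problems in this output are "predict the first/last" problems, even though the instructions asked the model to exclude them. Model output is not guaranteed to follow instructions, so a deterministic post-filter may be worth adding. The sketch below is one possible approach, not part of Trane: it works on plain description strings copied from the output above (the text that `print(problem)` shows), and the `drop_first_last` helper and its regular expression are hypothetical.

```python
import re

# Problem descriptions copied from the output above, as plain strings.
# With real Trane objects, str(problem) could be used to obtain this text.
selected = [
    "For each <location> predict if there exists a record",
    "For each <location> predict the first <location> in all related records",
    "For each <location> predict the first <rating> in all related records",
    "For each <location> predict the last <location> in all related records",
    "For each <location> predict the last <rating> in all related records",
]

# Matches the "predict the first X" and "predict the last X" phrasing.
EXCLUDED = re.compile(r"predict the (first|last) ")


def drop_first_last(descriptions):
    """Return only the descriptions that are not first/last problems."""
    return [d for d in descriptions if not EXCLUDED.search(d)]


kept = drop_first_last(selected)
for description in kept:
    print(description)
```

Filtering after the model call keeps the exclusion rule enforceable regardless of how the model interprets the instructions.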
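Community
- Questions or issues? Create a GitHub issue.
- Want to chat? Join our Slack community.
Citing Trane
If you find Trane useful, please consider citing our paper:
Ben Schreck, Kalyan Veeramachaneni. What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems. IEEE DSAA 2016, 440-451.
BibTeX entry: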
@inproceedings{schreck2016would,
  title={What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems},
  author={Schreck, Benjamin and Veeramachaneni, Kalyan},
  booktitle={Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on},
  pages={440--451},
  year={2016},
  organization={IEEE}
}