Automatically generate prediction problems and labels for supervised learning.

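Trane is a software package that automatically generates prediction problems for temporal datasets and produces the labels needed for supervised learning. Its goal is to streamline the machine learning problem-solving process.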

Installation

Install Trane using pip:

python -m pip install trane

Usage

Below is a quick example of how to use Trane:

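import trane

# Load the bundled Airbnb example data and its metadata.
data, metadata = trane.load_airbnb()

# Generate problems with "location" as the entity to predict for.
problem_generator = trane.ProblemGenerator(
    metadata=metadata,
    entity_columns=["location"],
)
problems = problem_generator.generate()

# Show the first five generated problems.
for problem in problems[:5]:
    print(problem)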

A sample of the generated problems:

==================================================
Generated 40 total problems
--------------------------------------------------
Classification problems: 5
Regression problems: 35
==================================================
For each <location> predict if there exists a record
For each <location> predict if there exists a record with <location> equal to <str>
For each <location> predict if there exists a record with <location> not equal to <str>
For each <location> predict if there exists a record with <rating> equal to <str>
For each <location> predict if there exists a record with <rating> not equal to <str>
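To make the problem templates above more concrete, the following is a minimal sketch of how a problem such as "For each <location> predict if there exists a record with <rating> equal to <str>" could be turned into one label per entity. It is not Trane's implementation: the toy records and the helper `exists_record_equal` are hypothetical, and the sketch ignores the temporal aspect (cutoff times and windows) that a temporal dataset would involve.

```python
from collections import defaultdict

# Hypothetical toy records; the column names mirror the example above,
# but the values are made up for illustration.
records = [
    {"location": "London", "rating": "5"},
    {"location": "London", "rating": "3"},
    {"location": "Paris", "rating": "4"},
    {"location": "Rome", "rating": "5"},
]


def exists_record_equal(rows, entity_column, column, value):
    """Label each entity True if any of its records has column == value."""
    labels = defaultdict(bool)
    for row in rows:
        entity = row[entity_column]
        # Assigning on every row ensures entities without a match get False.
        labels[entity] = labels[entity] or row[column] == value
    return dict(labels)


labels = exists_record_equal(records, "location", "rating", "5")
print(labels)  # {'London': True, 'Paris': False, 'Rome': True}
```

Each entity gets a boolean label, which is why "predict if there exists" problems are classification problems.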

With Trane's LLM add-on (pip install trane[llm]), we can use OpenAI to determine which of the generated problems are most relevant:

from trane.llm import analyze

instructions = "determine 5 most relevant problems about user's booking preferences. Do not include 'predict the first/last X' problems"
context = "Airbnb data listings in major cities, including information about hosts, pricing, location, and room type, along with over 5 million historical reviews."
relevant_problems = analyze(
    problems=problems,
    instructions=instructions,
    context=context,
    model="gpt-3.5-turbo-16k"
)
for problem in relevant_problems:
    print(problem)
    print(f'Reasoning: {problem.get_reasoning()}\n')

Output:

For each <location> predict if there exists a record
Reasoning: This problem can help identify locations with missing data or locations that have not been booked at all.

For each <location> predict the first <location> in all related records
Reasoning: Predicting the first location in all related records can provide insights into the most frequently booked locations for each city.

For each <location> predict the first <rating> in all related records
Reasoning: Predicting the first rating in all related records can provide insights into the average satisfaction level of guests for each location.

For each <location> predict the last <location> in all related records
Reasoning: Predicting the last location in all related records can provide insights into the most recent bookings for each city.

For each <location> predict the last <rating> in all related records
Reasoning: Predicting the last rating in all related records can provide insights into the recent satisfaction level of guests for each location.
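The sample output above still contains "predict the first/last" problems even though the instructions asked to exclude them, so a plain string filter can act as a safeguard. The sketch below assumes that `str(problem)` yields the sentence that `print(problem)` displays; `drop_first_last` is a hypothetical helper and not part of Trane.

```python
def drop_first_last(problem_texts):
    """Keep only problems whose text does not ask for the first/last value."""
    banned = ("predict the first", "predict the last")
    return [
        text
        for text in problem_texts
        if not any(phrase in text for phrase in banned)
    ]


# Problem texts copied from the output above.
texts = [
    "For each <location> predict if there exists a record",
    "For each <location> predict the first <location> in all related records",
    "For each <location> predict the last <rating> in all related records",
]
kept = drop_first_last(texts)
print(kept)  # ['For each <location> predict if there exists a record']
```

With Trane objects, the same helper could be applied to `[str(p) for p in problems]` before or after the LLM step.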

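Citing Trane

If you find Trane useful, please consider citing our paper:

Ben Schreck, Kalyan Veeramachaneni. What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems. IEEE DSAA 2016, 440-451.

BibTeX entry: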

@inproceedings{schreck2016would,
  title={What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems},
  author={Schreck, Benjamin and Veeramachaneni, Kalyan},
  booktitle={Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on},
  pages={440--451},
  year={2016},
  organization={IEEE}
}

Download files

Source distribution: trane-0.8.0.tar.gz (4.4 MB)

Built distribution: trane-0.8.0-py3-none-any.whl (4.4 MB), Python 3
