timeseriesflattener

A package for converting time series data, e.g. from electronic health records, into wide-format data.

Project description

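Time series data from sources such as electronic health records often have a large number of variables, are sampled at irregular intervals, and tend to have many missing values. Before such data can be used for predictive modelling with machine learning methods like logistic regression or XGBoost, they need to be reshaped.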

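In essence, the time series need to be flattened so that each prediction time is represented by a set of predictor values and an outcome value. The predictor values can be constructed by aggregating the preceding values in the time series within a given time window.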
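
To make the idea of flattening concrete, here is a minimal plain-Python sketch that does not use timeseriesflattener at all. The helper `aggregate_lookbehind`, the sample values, and the choice of a window that is inclusive at both ends are assumptions made for illustration only.

```python
import datetime as dt
import math


def aggregate_lookbehind(samples, prediction_time, lookbehind, aggregator, fallback=math.nan):
    """Aggregate values sampled in [prediction_time - lookbehind, prediction_time].

    Returns `fallback` when no sample falls inside the window.
    """
    window_start = prediction_time - lookbehind
    in_window = [
        value
        for timestamp, value in samples
        if window_start <= timestamp <= prediction_time
    ]
    return aggregator(in_window) if in_window else fallback


# Hypothetical, irregularly sampled measurements for one patient: (timestamp, value)
samples = [
    (dt.datetime(2020, 1, 3), 5.0),
    (dt.datetime(2020, 1, 20), 7.0),
    (dt.datetime(2020, 1, 28), 6.0),
]
prediction_time = dt.datetime(2020, 2, 1)

# One flattened row: each (window, aggregator) pair becomes one predictor column
row = {
    "max_within_14_days": aggregate_lookbehind(samples, prediction_time, dt.timedelta(days=14), max),
    "min_within_14_days": aggregate_lookbehind(samples, prediction_time, dt.timedelta(days=14), min),
    "max_within_2_days": aggregate_lookbehind(samples, prediction_time, dt.timedelta(days=2), max),
}
print(row)
```

The 2-day window contains no samples, so that column takes the fallback value. This is how missing data typically surfaces in a flattened dataset.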

timeseriesflattener aims to simplify this process by providing an easy-to-use and fully specified pipeline for flattening complex time series.

🔧 Installation

To get started with timeseriesflattener, install it with pip by running the following line in your terminal:

pip install timeseriesflattener
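
To check that the installation succeeded, one option is to query the installed version from Python. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed in this environment, otherwise None
print(installed_version("timeseriesflattener"))
```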

⚡ Quick start

import datetime as dt

import numpy as np
import polars as pl

# Load a dataframe with times you wish to make a prediction
prediction_times_df = pl.DataFrame(
    {"id": [1, 1, 2], "date": ["2020-01-01", "2020-02-01", "2020-02-01"]}
)
# Load a dataframe with raw values you wish to aggregate as predictors
predictor_df = pl.DataFrame(
    {
        "id": [1, 1, 1, 2],
        "date": ["2020-01-15", "2019-12-10", "2019-12-15", "2020-01-02"],
        "predictor_value": [1, 2, 3, 4],
    }
)
# Load a dataframe specifying when the outcome occurs
outcome_df = pl.DataFrame({"id": [1], "date": ["2020-03-01"], "outcome_value": [1]})

# Specify how to aggregate the predictors and define the outcome
from timeseriesflattener import (
    MaxAggregator,
    MinAggregator,
    OutcomeSpec,
    PredictionTimeFrame,
    PredictorSpec,
    ValueFrame,
)

predictor_spec = PredictorSpec(
    value_frame=ValueFrame(
        init_df=predictor_df, entity_id_col_name="id", value_timestamp_col_name="date"
    ),
    lookbehind_distances=[dt.timedelta(days=1)],
    aggregators=[MaxAggregator(), MinAggregator()],
    fallback=np.nan,
    column_prefix="pred",
)

outcome_spec = OutcomeSpec(
    value_frame=ValueFrame(
        init_df=outcome_df, entity_id_col_name="id", value_timestamp_col_name="date"
    ),
    lookahead_distances=[dt.timedelta(days=1)],
    aggregators=[MaxAggregator(), MinAggregator()],
    fallback=np.nan,
    column_prefix="outc",
)

# Instantiate TimeseriesFlattener and add the specifications
from timeseriesflattener import Flattener

result = Flattener(
    predictiontime_frame=PredictionTimeFrame(
        init_df=prediction_times_df, entity_id_col_name="id", timestamp_col_name="date"
    )
).aggregate_timeseries(specs=[predictor_spec, outcome_spec])
result.df

Output

	id	date	prediction_time_uuid	pred_test_feature_within_30_days_mean_fallback_nan	outc_test_outcome_within_31_days_maximum_fallback_0_dichotomous
0	1	2020-01-01 00:00:00	1-2020-01-01-00-00-00	2.5	0
1	1	2020-02-01 00:00:00	1-2020-02-01-00-00-00	1	1
2	2	2020-02-01 00:00:00	2-2020-02-01-00-00-00	4	0
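
The column names in this table indicate a 30-day lookbehind mean for the predictor and a 31-day lookahead maximum with a fallback of 0 for the outcome. The following plain-Python sketch, which is independent of the library, reproduces the table's values from the quick-start data under those settings. It assumes windows that are inclusive at both ends; the helper `values_in_window` is hypothetical and is not part of timeseriesflattener.

```python
import datetime as dt
from statistics import mean

# Quick-start data as (id, timestamp) and (id, timestamp, value) tuples
prediction_times = [
    (1, dt.datetime(2020, 1, 1)),
    (1, dt.datetime(2020, 2, 1)),
    (2, dt.datetime(2020, 2, 1)),
]
predictors = [
    (1, dt.datetime(2020, 1, 15), 1),
    (1, dt.datetime(2019, 12, 10), 2),
    (1, dt.datetime(2019, 12, 15), 3),
    (2, dt.datetime(2020, 1, 2), 4),
]
outcomes = [(1, dt.datetime(2020, 3, 1), 1)]


def values_in_window(rows, entity_id, start, end):
    """Values for one entity with start <= timestamp <= end (inclusive bounds are an assumption)."""
    return [value for row_id, ts, value in rows if row_id == entity_id and start <= ts <= end]


flattened = []
for entity_id, pred_time in prediction_times:
    # Predictor: mean of values in the 30 days before the prediction time, NaN if none
    behind = values_in_window(predictors, entity_id, pred_time - dt.timedelta(days=30), pred_time)
    # Outcome: maximum of values in the 31 days after the prediction time, 0 if none
    ahead = values_in_window(outcomes, entity_id, pred_time, pred_time + dt.timedelta(days=31))
    flattened.append(
        (entity_id, pred_time, mean(behind) if behind else float("nan"), max(ahead) if ahead else 0)
    )

for row in flattened:
    print(row)
```

The third row depends on the boundary choice: the only predictor for id 2 lies exactly 30 days before the prediction time, so it is only counted if the window includes its start.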

📖 Documentation

🎓 Tutorial
📖 General docs

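💬 Where to ask questions

Type	Where to ask
🚨 Bug reports	GitHub issue tracker
🎁 Feature requests and ideas	GitHub issue tracker
👩‍💻 Usage questions	GitHub Discussions
🗯 General discussion	GitHub Discussions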

🎓 Projects

The PSYCOP projects use timeseriesflattener; see the monorepo for more details.

Release history

Version	Release date
2.4.0 (this version)	2024-09-27
2.3.0	2024-09-27
2.2.6	2024-05-23
2.2.5	2024-05-22
2.2.4	2024-05-17
2.2.3	2024-05-07
2.2.2	2024-05-03
2.2.1	2024-05-02
2.2.0	2024-04-30
2.1.2	2024-04-18
2.1.1	2024-04-18
2.1.0	2024-02-27
2.0.2	2024-02-27
2.0.1	2024-02-26
2.0.0	2024-02-26
1.36.2	2024-02-23
1.36.1	2024-02-22
1.36.0	2024-02-22
1.35.0	2024-02-22
1.34.0	2024-02-22
1.33.0	2024-02-22
1.32.0	2024-02-22
1.31.3	2024-02-22
1.31.2	2024-02-19
1.31.1	2024-02-19
1.31.0	2024-02-19
1.30.0	2024-02-19
1.29.0	2024-02-19
1.28.0	2024-02-19
1.27.0	2024-02-16
1.26.0	2024-02-16
1.25.1	2024-02-16
1.25.0	2024-02-16
1.24.0	2024-02-15
1.23.0	2024-02-14
1.22.0	2024-02-14
1.21.1	2024-02-14
1.21.0	2024-02-14
1.20.1	2024-02-13
1.20.0	2024-02-13
1.19.0	2024-02-13
1.18.1	2024-02-13
1.18.0	2024-02-13
1.17.0	2024-02-13
1.16.0	2024-02-12
1.15.0	2024-02-12
1.14.0	2024-02-12
1.13.0	2024-02-12
1.12.0	2024-02-12
1.11.0	2024-02-09
1.10.0	2024-01-25
1.9.1	2024-01-23
1.9.0	2024-01-18
1.8.0	2023-11-24
1.7.0	2023-10-20
1.6.1	2023-10-05
1.6.0	2023-08-09
1.5.2	2023-08-02
1.5.1	2023-08-01
1.5.0	2023-08-01
1.4.0	2023-07-12
1.3.1	2023-06-30
1.3.0	2023-06-29
1.2.1	2023-06-20
1.2.0	2023-06-20
1.0.0	2023-06-15
0.27.0	2023-05-19
0.26.0	2023-05-04
0.25.1	2023-05-02
0.25.0	2023-04-26
0.24.0	2023-04-20
0.23.11	2023-03-28
0.23.10	2023-03-28
0.23.9	2023-03-28
0.23.8	2023-03-28
0.23.7	2023-03-24
0.23.6	2023-03-20
0.23.5	2023-03-20
0.23.4	2023-03-20
0.23.3	2023-03-10
0.23.2	2023-03-01
0.23.1	2023-02-24
0.23.0	2023-02-09
0.22.1	2022-12-19
0.22.0	2022-12-15
0.21.0	2022-12-14
0.20.3	2022-12-13
0.20.2	2022-12-09
0.20.1	2022-12-09
0.20.0	2022-12-08
0.19.1	2022-12-08
0.19.0	2022-12-08
0.18.0	2022-12-08
0.17.0	2022-12-08
0.16.0	2022-12-07
0.15.0	2022-12-06
0.14.0	2022-12-06
0.13.0	2022-12-06
0.12.1	2022-12-02

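Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source distribution

timeseriesflattener-2.4.0.tar.gz (9.6 MB)

Uploaded 2024-09-27, source

Built distribution

timeseriesflattener-2.4.0-py3-none-any.whl (8.6 MB)

Uploaded 2024-09-27, Python 3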