High-performance Python GLMs with all the features!

Project description

glum

Documentation

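Generalized linear models (GLMs) are a core statistical tool that includes many common methods, such as least-squares regression, Poisson regression, and logistic regression, as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction, and more. We developed glum, a fast Python-first GLM library. The development was based on a fork of scikit-learn, so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!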

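The goal of glum is to be at least as feature-complete as existing GLM libraries such as glmnet or h2o. It supports:

  • Built-in cross-validation for optimal regularization, efficiently exploiting a "regularization path"
  • L1 regularization, which produces sparse and easily interpretable solutions
  • L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
  • Elastic net regularization
  • Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
  • Box constraints, linear inequality constraints, sample weights, and offsets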
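To make the regularization terms in the list above concrete, here is a minimal sketch of an elastic net penalty with an optional matrix-valued (Tikhonov) L2 term. The function name, its arguments, and the exact scaling of the two terms are assumptions chosen for illustration; the text above does not spell out the formula glum uses internally.

```python
from typing import Optional, Sequence


def elastic_net_penalty(
    coef: Sequence[float],
    alpha: float,
    l1_ratio: float,
    p2: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Illustrative elastic net penalty (hypothetical helper, not glum's API).

    Assumed form:
        alpha * l1_ratio * sum(|w_j|)
        + 0.5 * alpha * (1 - l1_ratio) * w^T P2 w
    where P2 defaults to the identity matrix (plain ridge penalty).
    """
    l1_term = sum(abs(w) for w in coef)
    if p2 is None:
        # Identity P2: the quadratic form reduces to the sum of squares.
        quad_term = sum(w * w for w in coef)
    else:
        # General matrix-valued penalty: w^T P2 w.
        n = len(coef)
        quad_term = sum(
            coef[i] * p2[i][j] * coef[j] for i in range(n) for j in range(n)
        )
    return alpha * l1_ratio * l1_term + 0.5 * alpha * (1.0 - l1_ratio) * quad_term


w = [1.0, -2.0]
lasso = elastic_net_penalty(w, alpha=0.1, l1_ratio=1.0)  # pure L1
ridge = elastic_net_penalty(w, alpha=0.1, l1_ratio=0.0)  # pure L2
# A difference penalty: P2 chosen so that w^T P2 w = (w_1 - w_2)^2,
# which shrinks the two coefficients toward each other.
diff = elastic_net_penalty(w, alpha=0.1, l1_ratio=0.0, p2=[[1, -1], [-1, 1]])
print(lasso, ridge, diff)
```

With l1_ratio=1.0 only the L1 term remains, which is the setting used for the L1-regularized logistic regression in the housing example.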

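This repository also includes tools for benchmarking GLM implementations in the glum_benchmarks module. For details on the benchmarking, see the project's benchmarking documentation. Although the performance of glum relative to glmnet and h2o depends on the specific problem, we find that when N >> K (far more observations than predictors), it is consistently much faster for a wide range of problems.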

[Figure: Performance benchmarks]

For more information on glum, including tutorials and the API reference, please see the documentation.

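Why did we choose the name glum? We wanted a name that had the letters GLM and wasn't easily confused with any existing implementation. We also thought glum sounded like a funny name (and not glum at all!). If you need a more professional-sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for "Generalized linear... um... modeling?"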

A classic example predicting housing prices

>>> from sklearn.datasets import fetch_openml
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # This dataset contains house sale prices for King County, which includes
>>> # Seattle. It includes homes sold between May 2014 and May 2015.
>>> house_data = fetch_openml(name="house_sales", version=3, as_frame=True)
>>>
>>> # Use only select features
>>> X = house_data.data[
...     [
...         "bedrooms",
...         "bathrooms",
...         "sqft_living",
...         "floors",
...         "waterfront",
...         "view",
...         "condition",
...         "grade",
...         "yr_built",
...         "yr_renovated",
...     ]
... ].copy()
>>>
>>>
>>> # Model whether a house had an above or below median price via a Binomial
>>> # distribution. We'll be doing L1-regularized logistic regression.
>>> price = house_data.target
>>> y = (price < price.median()).values.astype(int)
>>> model = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001
... )
>>>
>>> _ = model.fit(X=X, y=y)
>>>
>>> # get_formatted_diagnostics returns details about the steps taken by the iterative solver.
>>> diags = model.get_formatted_diagnostics(full_report=True)
>>> diags[['objective_fct']]
        objective_fct
n_iter               
0            0.693091
1            0.489500
2            0.449585
3            0.443681
4            0.443498
5            0.443497
>>>
>>> # Models can also be built with formulas from formulaic.
>>> model_formula = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001,
...     formula="bedrooms + np.log(bathrooms + 1) + bs(sqft_living, 3) + C(waterfront)"
... )
>>> _ = model_formula.fit(X=house_data.data, y=y)
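In the diagnostics above, the objective starts near 0.6931, which is close to ln 2. One plausible reading, which is an assumption and not something the example states, is that the objective is the mean log loss (half the mean binomial deviance) plus the penalty, and that the solver starts from an intercept-only model. For two nearly balanced classes, that baseline is about ln 2. The sketch below uses made-up labels and a hypothetical helper, mean_log_loss, to show the arithmetic:

```python
import math
from typing import Sequence


def mean_log_loss(y: Sequence[int], p: Sequence[float]) -> float:
    """Average negative Bernoulli log-likelihood for 0/1 outcomes."""
    total = 0.0
    for y_i, p_i in zip(y, p):
        total -= y_i * math.log(p_i) + (1 - y_i) * math.log(1.0 - p_i)
    return total / len(y)


# Made-up, perfectly balanced labels (not the housing data).
y = [0, 1, 1, 0, 1, 0]

# Intercept-only baseline: predict the observed share of ones for every row.
share = sum(y) / len(y)
baseline = mean_log_loss(y, [share] * len(y))

# Both values are approximately 0.6931.
print(baseline, math.log(2))
```

Under this reading, the drop from roughly 0.693 to roughly 0.443 over five iterations is the improvement the fitted coefficients bring over predicting the class share alone.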

Installation

Please install the package through conda-forge:

conda install glum -c conda-forge
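After installing, one way to confirm that the package is visible to the active Python environment is to query its metadata with the standard library. This is a small sketch; the helper name installed_version is made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if glum is installed in this environment, else None.
print(installed_version("glum"))
```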

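Performance

For optimal performance on an x86_64 architecture, we recommend using the MKL library (conda install mkl). By default, conda usually installs the openblas version, which is slower but supported on all major architectures and operating systems.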
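To see which BLAS backend an environment is likely using, one option is to inspect NumPy's build configuration. This sketch assumes that NumPy's linked BLAS is a reasonable proxy for what the rest of the numerical stack uses, which the text above does not state. The helper name numpy_build_info and the keyword check are illustrative only.

```python
import contextlib
import io


def numpy_build_info() -> str:
    """Capture the text that numpy.show_config() prints, if NumPy is available."""
    try:
        import numpy as np
    except ImportError:
        return "numpy is not installed"
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


info = numpy_build_info()

# Crude keyword scan; the exact layout of the report varies by NumPy version.
for backend in ("mkl", "openblas"):
    if backend in info.lower():
        print(f"NumPy build configuration mentions {backend}")
```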

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source distribution

  • glum-3.0.2.tar.gz (13.5 MB)

Built distributions

  • glum-3.0.2-cp312-cp312-win_amd64.whl (530.2 kB): CPython 3.12, Windows x86-64
  • glum-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.12, manylinux: glibc 2.17+ x86-64
  • glum-3.0.2-cp312-cp312-macosx_11_0_arm64.whl (623.9 kB): CPython 3.12, macOS 11.0+ ARM64
  • glum-3.0.2-cp312-cp312-macosx_10_13_x86_64.whl (974.3 kB): CPython 3.12, macOS 10.13+ x86-64
  • glum-3.0.2-cp311-cp311-win_amd64.whl (521.2 kB): CPython 3.11, Windows x86-64
  • glum-3.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • glum-3.0.2-cp311-cp311-macosx_11_0_arm64.whl (604.5 kB): CPython 3.11, macOS 11.0+ ARM64
  • glum-3.0.2-cp311-cp311-macosx_10_13_x86_64.whl (934.4 kB): CPython 3.11, macOS 10.13+ x86-64
  • glum-3.0.2-cp310-cp310-win_amd64.whl (520.5 kB): CPython 3.10, Windows x86-64
  • glum-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • glum-3.0.2-cp310-cp310-macosx_11_0_arm64.whl (609.0 kB): CPython 3.10, macOS 11.0+ ARM64
  • glum-3.0.2-cp310-cp310-macosx_10_13_x86_64.whl (934.7 kB): CPython 3.10, macOS 10.13+ x86-64
  • glum-3.0.2-cp39-cp39-win_amd64.whl (521.6 kB): CPython 3.9, Windows x86-64
  • glum-3.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • glum-3.0.2-cp39-cp39-macosx_11_0_arm64.whl (610.2 kB): CPython 3.9, macOS 11.0+ ARM64
  • glum-3.0.2-cp39-cp39-macosx_10_13_x86_64.whl (935.9 kB): CPython 3.9, macOS 10.13+ x86-64
