跳转到主要内容

快速将数据从postgres移动到numpy或pandas。

项目描述

快速将数据从postgres移动到numpy或pandas。

API

to_arrays(query, *, bind=None)

Run the query returning a the results as np.ndarrays.

Parameters
----------
query : sa.sql.Selectable
    The query to run. This can be a select or a table.
bind : sa.Engine, optional
    The engine used to create the connection. If not provided
    ``query.bind`` will be used.

Returns
-------
arrays : dict[str, (np.ndarray, np.ndarray)]
    A map from column name to the result arrays. The first array holds the
    values and the second array is a boolean mask for NULLs. The values
    where the mask is False are 0 interpreted by the type.

to_dataframe(query, *, bind=None, null_values=None)

Run the query returning a the results as a pd.DataFrame.

Parameters
----------
query : sa.sql.Selectable
    The query to run. This can be a select or a table.
bind : sa.Engine, optional
    The engine used to create the connection. If not provided
    ``query.bind`` will be used.
null_values : dict[str, any]
    The null values to use for each column. This falls back to
    ``warp_prism.null_values`` for columns that are not specified.

Returns
-------
df : pd.DataFrame
    A pandas DataFrame holding the results of the query. The columns
    of the DataFrame will be named the same and be in the same order as the
    query.

register_odo_dataframe_edge()

Register an odo edge for sqlalchemy selectable objects to dataframe.

This edge will have a lower cost that the default edge so it will be
selected as the fasted path.

If the selectable is not in a postgres database, it will fallback to the
default odo edge.

比较

warp_prismodopd.read_sql_table 之间快速比较。

在这个例子中,我们将使用 warp_prismodopd.read_sql_table 读取存储在本地的postgres数据库中quandl存储的VIX的实时数据。然后,我们将使用 odo 创建一个具有两个浮点列和1000000行的表,并再次使用树工具查询它。

In [1]: import warp_prism

In [2]: from odo import odo, resource

In [3]: import pandas as pd

In [4]: table = resource(
   ...:     'postgresql://localhost/bz::yahoo_index_vix',
   ...:     schema='quandl',
   ...: )

In [5]: warp_prism.to_dataframe(table).head()
Out[5]:
   asof_date      open_       high        low      close  volume  \
0 2016-01-08  22.959999  27.080000  22.480000  27.010000     0.0
1 2015-12-04  17.430000  17.650000  14.690000  14.810000     0.0
2 2015-10-29  14.800000  15.460000  14.330000  14.610000     0.0
3 2015-12-21  19.639999  20.209999  18.700001  18.700001     0.0
4 2015-10-26  14.760000  15.430000  14.680000  15.290000     0.0

   adjusted_close                  timestamp
0       27.010000 2016-01-11 23:14:54.682220
1       14.810000 2016-01-11 23:14:54.682220
2       14.610000 2016-01-11 23:14:54.682220
3       18.700001 2016-01-11 23:14:54.682220
4       15.290000 2016-01-11 23:14:54.682220

In [6]: odo(table, pd.DataFrame).head()
Out[6]:
   asof_date      open_       high        low      close  volume  \
0 2016-01-08  22.959999  27.080000  22.480000  27.010000     0.0
1 2015-12-04  17.430000  17.650000  14.690000  14.810000     0.0
2 2015-10-29  14.800000  15.460000  14.330000  14.610000     0.0
3 2015-12-21  19.639999  20.209999  18.700001  18.700001     0.0
4 2015-10-26  14.760000  15.430000  14.680000  15.290000     0.0

   adjusted_close                  timestamp
0       27.010000 2016-01-11 23:14:54.682220
1       14.810000 2016-01-11 23:14:54.682220
2       14.610000 2016-01-11 23:14:54.682220
3       18.700001 2016-01-11 23:14:54.682220
4       15.290000 2016-01-11 23:14:54.682220

In [7]: pd.read_sql_table(table.name, table.bind, table.schema).head()
Out[7]:
   asof_date      open_       high        low      close  volume  \
0 2016-01-08  22.959999  27.080000  22.480000  27.010000     0.0
1 2015-12-04  17.430000  17.650000  14.690000  14.810000     0.0
2 2015-10-29  14.800000  15.460000  14.330000  14.610000     0.0
3 2015-12-21  19.639999  20.209999  18.700001  18.700001     0.0
4 2015-10-26  14.760000  15.430000  14.680000  15.290000     0.0

   adjusted_close                  timestamp
0       27.010000 2016-01-11 23:14:54.682220
1       14.810000 2016-01-11 23:14:54.682220
2       14.610000 2016-01-11 23:14:54.682220
3       18.700001 2016-01-11 23:14:54.682220
4       15.290000 2016-01-11 23:14:54.682220

In [8]: len(warp_prism.to_dataframe(table))
Out[8]: 6565

In [9]: %timeit warp_prism.to_dataframe(table)
100 loops, best of 3: 7.55 ms per loop

In [10]: %timeit odo(table, pd.DataFrame)
10 loops, best of 3: 49.9 ms per loop

In [11]: %timeit pd.read_sql_table(table.name, table.bind, table.schema)
10 loops, best of 3: 61.8 ms per loop

In [12]: big_table = odo(
    ...:     pd.DataFrame({
    ...:         'a': np.random.rand(1000000),
    ...:         'b': np.random.rand(1000000)},
    ...:     ),
    ...:     'postgresql://localhost/test::largefloattest',
    ...: )

In [13]: %timeit warp_prism.to_dataframe(big_table)
1 loop, best of 3: 248 ms per loop

In [14]: %timeit odo(big_table, pd.DataFrame)
1 loop, best of 3: 1.51 s per loop

In [15]: %timeit pd.read_sql_table(big_table.name, big_table.bind)
1 loop, best of 3: 1.9 s per loop

安装

Warp Prism可以通过pip安装,但需要numpy构建其C扩展

$ pip install numpy
$ pip install warp_prism

许可证

Warp Prism根据Apache 2.0许可证。

Warp Prism由Quantopian赞助,Quantopian在该平台上使用它通过Pipeline API或通过BlazeZipline交互式获取数据。

项目详情


下载文件

下载您平台的文件。如果您不确定该选择哪个,请了解有关安装包的更多信息。

源分发

warp_prism-0.1.1.tar.gz (11.4 kB 查看哈希值)

上传时间:

支持者