快速将数据从postgres移动到numpy或pandas。
项目描述
快速将数据从postgres移动到numpy或pandas。
API
to_arrays(query, *, bind=None)
Run the query returning a the results as np.ndarrays.
Parameters
----------
query : sa.sql.Selectable
The query to run. This can be a select or a table.
bind : sa.Engine, optional
The engine used to create the connection. If not provided
``query.bind`` will be used.
Returns
-------
arrays : dict[str, (np.ndarray, np.ndarray)]
A map from column name to the result arrays. The first array holds the
values and the second array is a boolean mask for NULLs. The values
where the mask is False are 0 interpreted by the type.
to_dataframe(query, *, bind=None, null_values=None)
Run the query returning a the results as a pd.DataFrame.
Parameters
----------
query : sa.sql.Selectable
The query to run. This can be a select or a table.
bind : sa.Engine, optional
The engine used to create the connection. If not provided
``query.bind`` will be used.
null_values : dict[str, any]
The null values to use for each column. This falls back to
``warp_prism.null_values`` for columns that are not specified.
Returns
-------
df : pd.DataFrame
A pandas DataFrame holding the results of the query. The columns
of the DataFrame will be named the same and be in the same order as the
query.
register_odo_dataframe_edge()
Register an odo edge for sqlalchemy selectable objects to dataframe.
This edge will have a lower cost that the default edge so it will be
selected as the fasted path.
If the selectable is not in a postgres database, it will fallback to the
default odo edge.
比较
在 warp_prism,odo 和 pd.read_sql_table 之间快速比较。
在这个例子中,我们将使用 warp_prism,odo 和 pd.read_sql_table 读取存储在本地的postgres数据库中quandl存储的VIX的实时数据。然后,我们将使用 odo 创建一个具有两个浮点列和1000000行的表,并再次使用树工具查询它。
In [1]: import warp_prism
In [2]: from odo import odo, resource
In [3]: import pandas as pd
In [4]: table = resource(
...: 'postgresql://localhost/bz::yahoo_index_vix',
...: schema='quandl',
...: )
In [5]: warp_prism.to_dataframe(table).head()
Out[5]:
asof_date open_ high low close volume \
0 2016-01-08 22.959999 27.080000 22.480000 27.010000 0.0
1 2015-12-04 17.430000 17.650000 14.690000 14.810000 0.0
2 2015-10-29 14.800000 15.460000 14.330000 14.610000 0.0
3 2015-12-21 19.639999 20.209999 18.700001 18.700001 0.0
4 2015-10-26 14.760000 15.430000 14.680000 15.290000 0.0
adjusted_close timestamp
0 27.010000 2016-01-11 23:14:54.682220
1 14.810000 2016-01-11 23:14:54.682220
2 14.610000 2016-01-11 23:14:54.682220
3 18.700001 2016-01-11 23:14:54.682220
4 15.290000 2016-01-11 23:14:54.682220
In [6]: odo(table, pd.DataFrame).head()
Out[6]:
asof_date open_ high low close volume \
0 2016-01-08 22.959999 27.080000 22.480000 27.010000 0.0
1 2015-12-04 17.430000 17.650000 14.690000 14.810000 0.0
2 2015-10-29 14.800000 15.460000 14.330000 14.610000 0.0
3 2015-12-21 19.639999 20.209999 18.700001 18.700001 0.0
4 2015-10-26 14.760000 15.430000 14.680000 15.290000 0.0
adjusted_close timestamp
0 27.010000 2016-01-11 23:14:54.682220
1 14.810000 2016-01-11 23:14:54.682220
2 14.610000 2016-01-11 23:14:54.682220
3 18.700001 2016-01-11 23:14:54.682220
4 15.290000 2016-01-11 23:14:54.682220
In [7]: pd.read_sql_table(table.name, table.bind, table.schema).head()
Out[7]:
asof_date open_ high low close volume \
0 2016-01-08 22.959999 27.080000 22.480000 27.010000 0.0
1 2015-12-04 17.430000 17.650000 14.690000 14.810000 0.0
2 2015-10-29 14.800000 15.460000 14.330000 14.610000 0.0
3 2015-12-21 19.639999 20.209999 18.700001 18.700001 0.0
4 2015-10-26 14.760000 15.430000 14.680000 15.290000 0.0
adjusted_close timestamp
0 27.010000 2016-01-11 23:14:54.682220
1 14.810000 2016-01-11 23:14:54.682220
2 14.610000 2016-01-11 23:14:54.682220
3 18.700001 2016-01-11 23:14:54.682220
4 15.290000 2016-01-11 23:14:54.682220
In [8]: len(warp_prism.to_dataframe(table))
Out[8]: 6565
In [9]: %timeit warp_prism.to_dataframe(table)
100 loops, best of 3: 7.55 ms per loop
In [10]: %timeit odo(table, pd.DataFrame)
10 loops, best of 3: 49.9 ms per loop
In [11]: %timeit pd.read_sql_table(table.name, table.bind, table.schema)
10 loops, best of 3: 61.8 ms per loop
In [12]: big_table = odo(
...: pd.DataFrame({
...: 'a': np.random.rand(1000000),
...: 'b': np.random.rand(1000000)},
...: ),
...: 'postgresql://localhost/test::largefloattest',
...: )
In [13]: %timeit warp_prism.to_dataframe(big_table)
1 loop, best of 3: 248 ms per loop
In [14]: %timeit odo(big_table, pd.DataFrame)
1 loop, best of 3: 1.51 s per loop
In [15]: %timeit pd.read_sql_table(big_table.name, big_table.bind)
1 loop, best of 3: 1.9 s per loop
安装
Warp Prism可以通过pip安装,但需要numpy构建其C扩展
$ pip install numpy
$ pip install warp_prism
许可证
Warp Prism根据Apache 2.0许可证。
Warp Prism由Quantopian赞助,Quantopian在该平台上使用它通过Pipeline API或通过Blaze与Zipline交互式获取数据。
项目详情
关闭
warp_prism-0.1.1.tar.gz的哈希值
算法 | 哈希摘要 | |
---|---|---|
SHA256 | d57df5dbadf1d2764de27e2e3edb339cb296cbf1be71c5b9c9d317d5ca5d598c |
|
MD5 | fdf6efdebf4019c21e135588539d460f |
|
BLAKE2b-256 | 61efd40839723444d99ac54dc207ce130b2ca96d66c7dfda91a650fe48a0da3e |