线性内存CUDA时间扭曲编辑距离算法。
项目描述
cuTWED v2.0.2.
|用Python制作|
|DOI徽章| |GPLv3许可证| |GitHub发布| |开源之爱svg1|
这是一个解决时间扭曲编辑距离的线性内存CUDA算法。
关于
这是时间扭曲编辑距离的一种新颖并行实现。原始算法由Marteau在《时间扭曲编辑距离与刚度调整用于时间序列匹配(2009)》中描述。该论文中的代码变体包含在reference_implementation
中,并用于比较结果。还有一个wiki页面 <https://en.wikipedia.org/wiki/Time_Warp_Edit_Distance>
__。
原始TWED算法的时间和空间复杂度为O(n^2)
。此算法对于p(CUDA)核心来说,时间复杂度约为O(n * n/p)
。最重要的是,cuTWED在内存中是线性的,使用大约6*nA + 6*nB
个元素进行存储。
为了理解动态程序数据依赖关系以实现CUDA架构的并行化,我设计了一种方法,该方法具有改进的内存访问模式,并且能够在大型问题上大规模并行化。这是通过在动态程序矩阵中用三个对角带移动,共进行nA+nB-1
步来实现的。任何地方都不需要O(n^2)
矩阵。注意,这不是一个近似方法。完整的TWED动态程序计算发生在线性存储中。
代码包含在一个库中,该库具有用于double
和float
精度的方法。它接受输入时间序列作为N维数组的C顺序数组(时间是最慢的轴)。
对于原始TWED实现可以计算的典型问题,利用cuTWED和数千个CUDA核心可以实现极大的加速。对于常见问题,速度提高了1到2个数量级,在P100 GPU上能够实现200倍加速。更重要的是,线性内存占用允许计算以前无法处理的问题。现在可以更有效地计算大型问题和大型输入系统。
将后续提供一些速度比较和更正式的论文。
参考实现
Marteau's original code with a some minor changes has been included in
this package. It is built both as a C library and part of the Python
package ``cuTWED.ctwed``. The minor changes are an extra argument
``dimension`` to admit ``R^N`` inputs, and more common handling of norm
(nth-root). These modications were made to facilitate refence testing
cuTWED and also make the original code more general.
``ctwed`` is also included for users without a CUDA card who find other
implementations too slow.
Getting Started
---------------
For many users the prepackaged (linux) Python distribution is the
simplest way to get the code. If you have CUDA10/11 and a Python 3.6+
manylinux compatible installation you can try the prepackged wheels with:
`pip install cuTWED`
For other situations, or users seeking maximum performance, instructions
follow for building the core CUDA libray, installing, and building
the Python bindings from source.
Requirements
~~~~~~~~~~~~
For the CUDA code you will need NVCC, a CUDA capable card and CMake.
Otherwise, the CUDA code has no dependencies.
If you do not have CMake or it is too old, a lot of people just pip
install it ``pip install cmake>=3.11``. Otherwise you'll need to
refer to their (Kitware) docs for your situation.
For the Python binding ``pip`` manages the specific depends and
installation of the Python interface after you have built the main CUDA
C library. Generally requires ``numpy``, ``pycuda``, and ``cffi``. I
recommend you use virtualenv or conda to manage Python.
Building
~~~~~~~~
Building has two stages.
First the CUDA C library is built and installed.
Second the Python bindings (if desired) are built on top of that.
The CUDA C library may be permanently installed to your system,
in a standard fashion, with some customization via CMake.
Alternatively a manual local install option is described below.
Building the core CUDA C library
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note you may customize the call to ``cmake`` below with flags like
``-DCMAKE_INSTALL_PREFIX=/opt/``, or other flags you might require.
::
# git a copy of the source code
git clone https://github.com/garrettwrong/cuTWED
cd cuTWED
# setup a build area
mkdir build
cd build
cmake ../ # configures/generates makefiles
# make
make -j # builds the software
This should create several files in the ``build`` directory including
``libcuTWED.so``, ``*.h`` headers, and some other stuff. To install to
your system permanently (may require sudo):
::
make install
If you would rather just want to temporarily have the library
available on your linux machine you can just use the LD path.
This makes no changes to your system outside the repo and this shell process.
Assuming you are still in your `build` directory, add the library to your path.
Users may decide to add a similar line permanently to their `.bashrc` or equivalent.
::
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
Python
^^^^^^
Once you have the CUDA C library readied, we can use ``pip`` for the
Python bindings. From the root of the git repo:
::
pip install .
If you are planning to edit the code, you might prefer pip install in
local editable mode instead.
::
pip install -e .
Checking
~~~~~~~~
Assuming you have built both the CUDA library and the Python bindings,
you can run the unit test suite:
::
pytest
Development Testing
^^^^^^^^^^^^^^^^^^^
For developers there is a whole mess of configured tests which you can
run with:
::
tox --skip-missing-interpreters
I hope to improve this soon, but there are a *lot* of complication
running hybrid codes with free CI tools, and also packaging properly
with Python etc that need to be worked through. Some are most easily
addressed by using a managed CI host, but this is non free.... I suspect
this is largely why you do not see a plethera of free high performance
hybrid codes... perhaps a future project...
Using cuTWED in other programs
C/C++ ^^^^^
在C/C++中,您应该能够使用include "cuTWED.h"
并链接到共享库libcuTWED.so
。这是我在test.x
中做的事情。公开的方法是extern C mangled,应该可以从C和C++中无问题使用。
所有公开方法都包含在共享库中的浮点(32位)版本中。它们只是简单地添加了f
,例如,twedf
是twed
的浮点版本。您可以根据您的应用程序选择合适的版本。我在testf.x
中使用浮点数。
目前有两种主要方式可以调用cuTWED算法:twed
和twed_dev
。首先,twed
是最常见的方式,您在此处传递主机上的C数组,库会为您管理设备内存和传输。
或者,如果您已经在管理GPU内存,您可以使用twed_dev,它期望指向存在于GPU上的内存的指针。我还在cuTWED.h
中提供了malloc、copy和free辅助函数,以防您想重用内存。请参阅cuTWED.h
。所有这类高级案例的逻辑和大小检查都应由用户负责。
还有一个批处理方法。在我有机会写出更好的文档之前,您可以在test_batch
、test_batch_dev
以及test_synthetic_validation.py
中的一个小型但可敬的ML批处理问题集中找到示例用法。
我包含了一个Jupyter Notebook,它演示了使用UCI Pseudo Periodic Synthetic Time Series Data Set <http://archive.ics.uci.edu/ml/datasets/Pseudo+Periodic+Synthetic+Time+Series>
__进行验证。这是一个更大的数据集。
未来的计划包括对大型批次的优化和多GPU选项。
Python ^^^^^^
::
from cuTWED import twed
对于Python,我已经包含了一些基本的pip可安装的Python绑定。我在tests/test_basic.py
中使用它。如果您感兴趣,这些是通过cffi后端实现的,该后端解析C头文件,并由setuptools为您构建。主要的Python接口在cuTWED.py
中。这需要您已构建库,并且它已安装在一个系统已知的或通过LD_LIBRARY_PATH
可用的位置。
我还使用PyCUDA gpuarrays将仅GPU内存的方法封装在Python中。双精度和单精度的示例在tests/test_basic_dev.py
中。
::
from cuTWED import twed_dev
批处理接口分别是twed_batch
和twed_batch_dev
。目前它正在进行原始的同步。我有一个使用流和事件的分支,但在我推送之前需要验证它是否稳健。这将在批处理模式下再提高约20%的速度。
如果您想从Python运行Marteau的C代码,可以尝试ctwed
。对于(非常)小的问题,您可能会发现他的原始C代码更快。
故障排除和已知问题
此软件处于早期生命周期。以下为已知问题
- 便携性,我预计你现在有Linux系统。
- 我没有时间对其进行分析或优化,我知道有些地方可以进行改进。
如果你发现代码中存在问题或错误,请提交问题。更多关于此的信息可以在贡献文档中找到。
许可证
GPLv3
版权所有 2020 加雷特·赖特,Gestalt Group LLC
.. |DOI徽章| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3842261.svg :target: https://doi.org/10.5281/zenodo.3842261 .. |GPLv3许可证| image:: https://img.shields.io/badge/License-GPLv3-blue.svg :target: http://perso.crans.org/besson/LICENSE.html .. |GitHub版本| image:: https://img.shields.io/github/release/garrettwrong/cuTWED.svg :target: https://GitHub.com/garrettwrong/cuTWED/releases/ .. |开源之爱 svg1| image:: https://badges.frapsoft.com/os/v1/open-source.svg?v=103 :target: https://github.com/ellerbrock/open-source-badges/ .. |用Python制作| image:: http://ForTheBadge.com/images/badges/made-with-python.svg :target: https://pythonlang.cn/
项目详情
cuTWED-2.0.2.tar.gz的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | 68ecfbe759cf330cb0fc12bc91c3f0113a75d9f2fbb85523a2caa1cb782d6f9a |
|
MD5 | 32cd92dab9e5f5245b2fa8ec4847a7ac |
|
BLAKE2b-256 | 4e59a55fb13a2fddd80cea0a2638422798c40b8ee3fcaeea991010c7dc76595b |
cuTWED-2.0.2-cp39-cp39-manylinux2010_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | 56d88da87c9951393fcbe0300c32188706ac8657f1ff457f087f50908195274c |
|
MD5 | f00533df902d40db2201a10fb130b097 |
|
BLAKE2b-256 | 5eebe3b0062a7c890f3e40fbb6a8afd696aedfb3dc3cbf2bb1ef4ccae3bf620f |
cuTWED-2.0.2-cp39-cp39-manylinux1_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | 8d7cffe10377a13ba7aff6ad5c33f9e7bf80d4b7ea1078881671073c8e93b3a8 |
|
MD5 | 089b40ff6dedfaf039f361e4e37c8165 |
|
BLAKE2b-256 | 93d15bb09016f1c0b387623224b61eb43f1defefd611d601f21286e56f947a43 |
cuTWED-2.0.2-cp38-cp38-manylinux2010_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | ad8b2a8c7b74cf1d0826a514745f9b3945560b78326612767eeae496fef7b18b |
|
MD5 | 33f914a5e775feb78ad691526852421b |
|
BLAKE2b-256 | 4a084f599e8969790558c51412395ef71ed69acad45b35e8768e17ab537434f6 |
cuTWED-2.0.2-cp38-cp38-manylinux1_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | d5873f88c01a55584fb8ee13796936e036689ec68e9c8c62d9b81c338aaff913 |
|
MD5 | 693ab383f7a47d849fe7efa694b84096 |
|
BLAKE2b-256 | 68fd6a208af5cc96c0542d4a86dd48fd8af6cdfed20825b805c4d20c9000bfea |
cuTWED-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | 3d65f75e5ee1ebab63e75905d0474db0ae37b58511cdc2aac0dedf5ddc044141 |
|
MD5 | 39f62b9b9881ef9bf9376b67cb6062c3 |
|
BLAKE2b-256 | 2b68a67c1b54b4a8dd4bbf6e855b4f854ff986c1b7726a2087c690b04fdf0a23 |
cuTWED-2.0.2-cp37-cp37m-manylinux1_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | 16e40ef99c525e6e3f0b545086df734dc7be0569d981c823d1107df840bd8202 |
|
MD5 | 30a8f2d72dab764917a5f6500ee42031 |
|
BLAKE2b-256 | 9987442f1884d5fa6a4b5d6ce3aef64ed9c3839f4aff36bd33153b333d0a22f5 |
cuTWED-2.0.2-cp36-cp36m-manylinux2010_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | 4e739fc0649048eb4ced6d7c8a66ec6de6c448bd473768a3122a05be759fac2a |
|
MD5 | ffac9d6d461b6d05d1293fae966fc03e |
|
BLAKE2b-256 | 639ef52ade4c3bbad194ccccaa493dbf185353ef5e2efbd2dbfea59f0ce086c8 |
cuTWED-2.0.2-cp36-cp36m-manylinux1_x86_64.whl的散列
算法 | 散列摘要 | |
---|---|---|
SHA256 | bd5faf39ae38e6aadfeb8e0377beee8b8d184d62851c71bdf813d1e55715369a |
|
MD5 | cd388cb77a3ac48f5ea83203f8a46b75 |
|
BLAKE2b-256 | a0ced0c33e506ef067fc4fd5d318b1a3fc207493acf12e1d1954e11ce49e1e53 |