datafactory generates test data.
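Overview
datafactory creates flexible test data according to the rules you give it.
Its features are divided into fields, models, containers, and formatters. Compared with a database, a field is a column, a model is a record, and a container is a table.
The strength of datafactory is its flexibility in how types are specified. Containers can also be nested.
Formatters support data formatting and file output.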
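The column/record/table analogy can be illustrated without the library. The following is a minimal pure-Python sketch of the idea, not datafactory's implementation; `increment_field`, `cycle_field`, `render_model`, and `render_container` are hypothetical names introduced only for this illustration.

```python
import itertools


def increment_field(start=1, step=1):
    """Field (column): yields start, start + step, ..."""
    return itertools.count(start, step)


def cycle_field(values):
    """Field (column): repeats the given values forever."""
    return itertools.cycle(values)


def render_model(fields):
    """Model (record): take one value from every field."""
    return {name: next(source) for name, source in fields.items()}


def render_container(fields, size):
    """Container (table): a list of rendered records."""
    return [render_model(fields) for _ in range(size)]


fields = {"id": increment_field(), "x": cycle_field(["a", "b", "c"])}
table = render_container(fields, 5)
print(table)
```

Each field here is just an iterator, so a record is built by advancing every iterator once and a table by repeating that step.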
Requirements
Python 3.5 or later.
Installation
$ pip install datafactory
Usage
Basic example
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: 'id': datafactory.IncrementField(),
...: 'x': datafactory.CycleField(['a', 'b', 'c']),
...: # When BLANK is chosen, the field is omitted from the record.
...: 'option': datafactory.ChoiceField([True, False, datafactory.BLANK]),
...: })
In [3]: container = datafactory.Container(model, 5, render=True)
In [4]: container
Out[4]:
[{'id': 1, 'x': 'a'},
{'id': 2, 'x': 'b', 'option': False},
{'id': 3, 'x': 'c', 'option': True},
{'id': 4, 'x': 'a'},
{'id': 5, 'x': 'b'}]
# Specify rewrite=True if the file already exists.
In [5]: datafactory.JsonFormatter(container).write('/tmp/test.json', rewrite=True)
In [6]: !cat /tmp/test.json
[
{
"x": "a",
"id": 1
},
{
"x": "b",
"id": 2,
"option": false
},
{
"x": "c",
"id": 3,
"option": true
},
{
"x": "a",
"id": 4
},
{
"x": "b",
"id": 5
}
]
TSV example
In [1]: import datafactory
In [2]: model = datafactory.ListModel([
...: datafactory.IncrementField(start=10, step=5),
...: datafactory.HashOfField(2, 'md5'), # MD5 hash of the value in the third column (index 2).
...: datafactory.ChoiceField(['foo', 'bar', 'baz']),
...: datafactory.CycleField(range(0, 30, 10)),
...: ]).ordering(2) # render index 2 (the third column) first
# IterContainer saves memory because it generates one element at a time.
In [3]: container = datafactory.IterContainer(model, 10) # repeat 10 times.
In [4]: datafactory.CsvFormatter(
...: container,
...: delimiter='\t',
...: header=['id', 'hash-of-name', 'name', 'value']
...: ).write('/tmp/test.csv', rewrite=True)
In [5]: !cat /tmp/test.csv
id hash-of-name name value
10 acbd18db4cc2f85cedef654fccc4a4d8 foo 0
15 acbd18db4cc2f85cedef654fccc4a4d8 foo 10
20 73feffa4b7f6bb68e44cf984c85f6e88 baz 20
25 acbd18db4cc2f85cedef654fccc4a4d8 foo 0
30 acbd18db4cc2f85cedef654fccc4a4d8 foo 10
35 73feffa4b7f6bb68e44cf984c85f6e88 baz 20
40 73feffa4b7f6bb68e44cf984c85f6e88 baz 0
45 73feffa4b7f6bb68e44cf984c85f6e88 baz 10
50 37b51d194a7513e45b56f6524f2d51f2 bar 20
55 37b51d194a7513e45b56f6524f2d51f2 bar 0
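The hash column above can be cross-checked with the standard library alone. This is a small sketch, assuming the hash is the MD5 digest of the UTF-8 bytes of the name (which matches the digests shown in the output); it does not use datafactory, and the row values are chosen by hand rather than at random.

```python
import csv
import hashlib
import io

names = ["foo", "baz", "bar"]
rows = []
for index, name in enumerate(names):
    # Hash the name first, then place the digest in the column before it.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    rows.append([10 + 5 * index, digest, name, (index % 3) * 10])

# Write tab-separated text into memory instead of a file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["id", "hash-of-name", "name", "value"])
writer.writerows(rows)
tsv_text = buffer.getvalue()
print(tsv_text)
```

The name has to be known before its hash can be computed, which is why the model above renders the third column first.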
Custom example
If an object is callable, the result of calling it is stored.
Model
In [1]: import datafactory
In [2]: def repeat(k, i):  # k: key, i: index
...:     return k * i
...:
In [3]: container = datafactory.DictContainer(repeat)
In [4]: container(['a', 'b', 'c', 'd', 'e'])
Out[4]: {'a': '', 'b': 'b', 'c': 'cc', 'd': 'ddd', 'e': 'eeee'}
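The output above suggests that the callable receives each key together with its position. A minimal sketch of that behaviour in plain Python follows; `dict_container` is a hypothetical stand-in, and the `(key, index)` calling convention is inferred from the session, not taken from the library's source.

```python
def dict_container(func, keys):
    # Assumption: the callable is invoked as func(key, index).
    return {key: func(key, index) for index, key in enumerate(keys)}


def repeat_key(key, index):
    # A string multiplied by an integer repeats the string.
    return key * index


result = dict_container(repeat_key, ["a", "b", "c", "d", "e"])
print(result)
```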
Field
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: 'col1': (lambda r, i: i),
...: 'col2': (lambda r: r['col1'] + 1),
...: 'col3': (lambda r: r['col2'] * 2),
...: 'col4': 100, # fixed value
...: }).ordering('col1', 'col2', 'col3')
In [3]: container = datafactory.ListContainer(model)
In [4]: container(4)
Out[4]:
[{'col1': 0, 'col2': 1, 'col3': 2, 'col4': 100},
{'col1': 1, 'col2': 2, 'col3': 4, 'col4': 100},
{'col1': 2, 'col2': 3, 'col3': 6, 'col4': 100},
{'col1': 3, 'col2': 4, 'col3': 8, 'col4': 100}]
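Callable fields that read other columns only work if those columns are rendered first, which is what the ordering call is for. The sketch below imitates this in plain Python; `render_record` is a hypothetical helper, and dispatching on the number of parameters is an assumption made for the illustration.

```python
import inspect


def render_record(fields, order, index):
    """Evaluate the fields named in `order` first, then the rest."""
    record = {}
    remaining = [name for name in fields if name not in order]
    for name in list(order) + remaining:
        value = fields[name]
        if callable(value):
            # Call with (record, index) or just (record), depending on arity.
            arity = len(inspect.signature(value).parameters)
            value = value(record, index) if arity >= 2 else value(record)
        record[name] = value
    return record


fields = {
    "col1": lambda r, i: i,
    "col2": lambda r: r["col1"] + 1,
    "col3": lambda r: r["col2"] * 2,
    "col4": 100,  # fixed value
}
records = [render_record(fields, ("col1", "col2", "col3"), i) for i in range(4)]
print(records)
```

Rendering col2 before col1 would raise a KeyError in this sketch, because the record would not yet contain the value it depends on.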
Example of limiting the number of elements
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: # x: 'a' can be picked once, 'b' twice, and 'c' three times.
...: 'x': datafactory.PickoutField({'a': 1, 'b': 2, 'c': 3}, missing=None),
...: # y: 'a' can be picked twice; 'b' and 'c' once each.
...: 'y': datafactory.PickoutField(['a', 'a', 'b', 'c'], missing='*'),
...: # z: 'a' and 'b' cannot be picked; 'c' can be picked five times.
...: 'z': datafactory.PickoutField(['c']*5, missing=None),
...: })
In [3]: container = datafactory.ListContainer(model)
In [4]: container(6)
Out[4]:
[{'x': 'a', 'y': 'a', 'z': 'c'},
{'x': 'c', 'y': 'b', 'z': 'c'},
{'x': 'c', 'y': 'a', 'z': 'c'},
{'x': 'b', 'y': 'c', 'z': 'c'},
{'x': 'c', 'y': '*', 'z': 'c'},
{'x': 'b', 'y': '*', 'z': None}]
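The limiting behaviour can be pictured as drawing from a pool without replacement. The following is a rough sketch using `collections.Counter`; the `Pickout` class is hypothetical and is not the library's `PickoutField`, and weighting the draw by the remaining count is a design choice of this sketch.

```python
import random
from collections import Counter


class Pickout:
    """Random choice in which each value can be picked a limited number of times."""

    def __init__(self, limits, missing=None, rng=None):
        # `limits` is a {value: max_count} dict or a list with repeated values.
        self.remaining = Counter(limits)
        self.missing = missing
        self.rng = rng or random.Random()

    def pick(self):
        # elements() repeats each value by its remaining count.
        pool = list(self.remaining.elements())
        if not pool:
            return self.missing  # every value has reached its limit
        value = self.rng.choice(pool)
        self.remaining[value] -= 1
        return value


picker = Pickout(["a", "a", "b", "c"], missing="*", rng=random.Random(0))
picked = [picker.pick() for _ in range(6)]
print(picked)
```

Four values are available, so the fifth and sixth picks return the `missing` value, as in the 'y' column above.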
Combination example
To generate test data that combines multiple elements, use the repeat argument of CycleField and SequenceField.
In [1]: import datafactory
In [2]: l0 = ['a', 'b']
In [3]: l1 = ['a', 'b', 'c']
In [4]: l2 = ['a', 'b', 'c', 'd']
In [5]: model = datafactory.ListModel([
...: datafactory.SequenceField(l0, repeat=len(l1)*len(l2), missing=datafactory.ESCAPE),
...: datafactory.CycleField(l1, repeat=len(l2)),
...: datafactory.CycleField(l2),
...: ])
In [6]: container = datafactory.Container(model)
# Because ESCAPE is passed as the missing argument, the container detects
# that the elements are exhausted and stops before reaching 10000.
In [7]: container(10000)
Out[7]:
[['a', 'a', 'a'],
['a', 'a', 'b'],
['a', 'a', 'c'],
['a', 'a', 'd'],
['a', 'b', 'a'],
['a', 'b', 'b'],
['a', 'b', 'c'],
['a', 'b', 'd'],
['a', 'c', 'a'],
['a', 'c', 'b'],
['a', 'c', 'c'],
['a', 'c', 'd'],
['b', 'a', 'a'],
['b', 'a', 'b'],
['b', 'a', 'c'],
['b', 'a', 'd'],
['b', 'b', 'a'],
['b', 'b', 'b'],
['b', 'b', 'c'],
['b', 'b', 'd'],
['b', 'c', 'a'],
['b', 'c', 'b'],
['b', 'c', 'c'],
['b', 'c', 'd']]
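The 24 rows above are the Cartesian product of the three lists. For comparison, the same rows in the same order can be produced with `itertools.product` from the standard library. This only illustrates what the repeat arguments achieve and does not describe how datafactory computes it.

```python
import itertools

l0 = ["a", "b"]
l1 = ["a", "b", "c"]
l2 = ["a", "b", "c", "d"]

# Each value of l0 is held for len(l1) * len(l2) rows, each value of l1 for
# len(l2) rows, and l2 changes on every row: the order produced by product().
combos = [list(row) for row in itertools.product(l0, l1, l2)]
print(len(combos))  # 24
```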
Nested example
In [1]: import datafactory
In [2]: model = datafactory.Model({
...: 'a': datafactory.ListModel([
...: datafactory.CycleField(['b', 'c']),
...: datafactory.CycleField(['d', 'e']),
...: ]),
...: datafactory.ChoiceField(['f', 'g', 'h']): datafactory.DictContainer(lambda x: x * 2, 5)
...: })
In [3]: datafactory.Container(model, 10, render=True)
Out[3]:
[{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}}]
datetime utilities
choice
Picks a random datetime between start and end.
In [1]: from datafactory.utils.datetime import choice
In [2]: choice(1988, '2015-11-11T11:11:11.111111')
Out[2]: datetime.datetime(2009, 11, 30, 23, 25, 43, 240031)
# tuple: datetime(*tuple), dict: datetime(**dict)
In [3]: choice((1988, 5, 22), {'year': 2015, 'month': 11, 'day': 11})
Out[3]: datetime.datetime(1996, 7, 1, 11, 14, 59, 314809)
In [4]: from datetime import datetime, date
In [5]: choice(date(1988, 5, 22), datetime(2015, 11, 11, 11, 11, 11))
Out[5]: datetime.datetime(2011, 3, 23, 19, 39, 14, 476901)
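One way to pick a random moment between two datetimes is to choose a random offset within the span. The sketch below assumes a uniform distribution, which the text above does not state; `random_datetime` is a hypothetical name and takes datetime objects only.

```python
import random
from datetime import datetime, timedelta


def random_datetime(start, end, rng=random):
    """Return a random datetime between start and end (uniform over the span)."""
    span_seconds = (end - start).total_seconds()
    return start + timedelta(seconds=rng.uniform(0, span_seconds))


begin = datetime(1988, 5, 22)
finish = datetime(2015, 11, 11, 11, 11, 11)
picked = random_datetime(begin, finish)
print(picked)
```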
generator
A generator that yields datetime objects at regular intervals.
In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import generator
# If you omit the end argument, the generator yields objects indefinitely.
In [3]: g = generator(start=2015, interval=timedelta(days=1, hours=12))
In [4]: next(g)
Out[4]: datetime.datetime(2015, 1, 1, 0, 0)
In [5]: next(g)
Out[5]: datetime.datetime(2015, 1, 2, 12, 0)
In [6]: next(g)
Out[6]: datetime.datetime(2015, 1, 4, 0, 0)
In [7]: next(g)
Out[7]: datetime.datetime(2015, 1, 5, 12, 0)
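The stepping behaviour shown above can be approximated in a few lines. This is a sketch only: `datetime_generator` is a hypothetical name, it accepts datetime objects rather than the flexible argument types of the library, and treating `end` as inclusive is an assumption based on the range outputs in this document.

```python
import itertools
from datetime import datetime, timedelta


def datetime_generator(start, interval=timedelta(days=1), end=None):
    """Yield start, start + interval, ... up to and including end.

    With end=None the generator never stops.
    """
    if interval == timedelta(0):
        raise ValueError("interval must not be zero")
    forward = interval > timedelta(0)
    current = start
    while end is None or (current <= end if forward else current >= end):
        yield current
        current += interval


gen = datetime_generator(datetime(2015, 1, 1), timedelta(days=1, hours=12))
first_four = list(itertools.islice(gen, 4))
print(first_four)
```

A negative interval steps backwards, in which case the end must lie before the start for anything after the first value to be produced.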
range
Returns a list of datetime objects generated at regular intervals.
In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import range
In [3]: range(2015, '2015/2/1')
Out[3]:
[datetime.datetime(2015, 1, 1, 0, 0),
datetime.datetime(2015, 1, 2, 0, 0),
datetime.datetime(2015, 1, 3, 0, 0),
datetime.datetime(2015, 1, 4, 0, 0),
datetime.datetime(2015, 1, 5, 0, 0),
datetime.datetime(2015, 1, 6, 0, 0),
datetime.datetime(2015, 1, 7, 0, 0),
datetime.datetime(2015, 1, 8, 0, 0),
datetime.datetime(2015, 1, 9, 0, 0),
datetime.datetime(2015, 1, 10, 0, 0),
datetime.datetime(2015, 1, 11, 0, 0),
datetime.datetime(2015, 1, 12, 0, 0),
datetime.datetime(2015, 1, 13, 0, 0),
datetime.datetime(2015, 1, 14, 0, 0),
datetime.datetime(2015, 1, 15, 0, 0),
datetime.datetime(2015, 1, 16, 0, 0),
datetime.datetime(2015, 1, 17, 0, 0),
datetime.datetime(2015, 1, 18, 0, 0),
datetime.datetime(2015, 1, 19, 0, 0),
datetime.datetime(2015, 1, 20, 0, 0),
datetime.datetime(2015, 1, 21, 0, 0),
datetime.datetime(2015, 1, 22, 0, 0),
datetime.datetime(2015, 1, 23, 0, 0),
datetime.datetime(2015, 1, 24, 0, 0),
datetime.datetime(2015, 1, 25, 0, 0),
datetime.datetime(2015, 1, 26, 0, 0),
datetime.datetime(2015, 1, 27, 0, 0),
datetime.datetime(2015, 1, 28, 0, 0),
datetime.datetime(2015, 1, 29, 0, 0),
datetime.datetime(2015, 1, 30, 0, 0),
datetime.datetime(2015, 1, 31, 0, 0),
datetime.datetime(2015, 2, 1, 0, 0)]
# Noise of +/-3 hours and of 0 to +5 minutes.
In [4]: range(2015, '2015-01-15', hours=3, minutes=(0, 5))
Out[4]:
[datetime.datetime(2015, 1, 1, 3, 1),
datetime.datetime(2015, 1, 2, 0, 3),
datetime.datetime(2015, 1, 3, 2, 0),
datetime.datetime(2015, 1, 3, 22, 2),
datetime.datetime(2015, 1, 4, 22, 3),
datetime.datetime(2015, 1, 6, 0, 2),
datetime.datetime(2015, 1, 7, 0, 4),
datetime.datetime(2015, 1, 8, 0, 4),
datetime.datetime(2015, 1, 8, 21, 3),
datetime.datetime(2015, 1, 9, 22, 0),
datetime.datetime(2015, 1, 11, 0, 0),
datetime.datetime(2015, 1, 11, 22, 1),
datetime.datetime(2015, 1, 12, 22, 5),
datetime.datetime(2015, 1, 14, 3, 0),
datetime.datetime(2015, 1, 15, 2, 5)]
# A negative interval can be specified to step backwards.
In [5]: range(start='2015-5-22', end='2015-04-22', interval=timedelta(days=-1))
Out[5]:
[datetime.datetime(2015, 5, 22, 0, 0),
datetime.datetime(2015, 5, 21, 0, 0),
datetime.datetime(2015, 5, 20, 0, 0),
datetime.datetime(2015, 5, 19, 0, 0),
datetime.datetime(2015, 5, 18, 0, 0),
datetime.datetime(2015, 5, 17, 0, 0),
datetime.datetime(2015, 5, 16, 0, 0),
datetime.datetime(2015, 5, 15, 0, 0),
datetime.datetime(2015, 5, 14, 0, 0),
datetime.datetime(2015, 5, 13, 0, 0),
datetime.datetime(2015, 5, 12, 0, 0),
datetime.datetime(2015, 5, 11, 0, 0),
datetime.datetime(2015, 5, 10, 0, 0),
datetime.datetime(2015, 5, 9, 0, 0),
datetime.datetime(2015, 5, 8, 0, 0),
datetime.datetime(2015, 5, 7, 0, 0),
datetime.datetime(2015, 5, 6, 0, 0),
datetime.datetime(2015, 5, 5, 0, 0),
datetime.datetime(2015, 5, 4, 0, 0),
datetime.datetime(2015, 5, 3, 0, 0),
datetime.datetime(2015, 5, 2, 0, 0),
datetime.datetime(2015, 5, 1, 0, 0),
datetime.datetime(2015, 4, 30, 0, 0),
datetime.datetime(2015, 4, 29, 0, 0),
datetime.datetime(2015, 4, 28, 0, 0),
datetime.datetime(2015, 4, 27, 0, 0),
datetime.datetime(2015, 4, 26, 0, 0),
datetime.datetime(2015, 4, 25, 0, 0),
datetime.datetime(2015, 4, 24, 0, 0),
datetime.datetime(2015, 4, 23, 0, 0),
datetime.datetime(2015, 4, 22, 0, 0)]
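Common
noise
Noise arguments add random variation to the interval between the generated datetimes. The functions that accept noise arguments are generator and range in datafactory.utils.datetime.
Noise is specified as keyword arguments (**noise); none of them is required.
The available keys are named like the timedelta arguments:
days
hours
minutes
seconds
microseconds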
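How noise values might turn into a random offset can be sketched as follows. The interpretation is inferred from the range example in this document (`hours=3` gave deviations between -3 and +3 hours, `minutes=(0, 5)` gave 0 to 5 minutes) and is an assumption, as is the use of whole-number steps; `noise_delta` is a hypothetical helper, not part of datafactory.

```python
import random
from datetime import datetime, timedelta


def noise_delta(rng=random, **noise):
    """Build a random timedelta from noise keyword arguments.

    Assumption: an int n means a deviation in [-n, +n], and a (low, high)
    tuple means a deviation in [low, high], both in whole units.
    """
    parts = {}
    for unit, spec in noise.items():
        low, high = spec if isinstance(spec, tuple) else (-spec, spec)
        parts[unit] = rng.randint(low, high)
    return timedelta(**parts)


base = datetime(2015, 1, 1)
noisy = base + noise_delta(hours=3, minutes=(0, 5))
print(noisy)
```

Because the keys are passed straight to `timedelta`, any key that `timedelta` does not accept raises a TypeError in this sketch.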
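argtype
Besides datetime, the following argument types are accepted.
- int:
Interpreted as a year.
- str:
Parsed into a datetime from the numeric parts of the string.
- tuple:
Passed to datetime as positional arguments.
- dict:
Passed to datetime as keyword arguments.
- date:
Converted to a datetime.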
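The conversion rules listed above can be written down as a single function. This is a sketch under stated assumptions, not the library's code: `to_datetime` is a hypothetical name, and the string rule (read the runs of digits in order) is a guess that matches the examples in this document.

```python
import re
from datetime import date, datetime


def to_datetime(value):
    """Coerce the argument types listed above into a datetime."""
    if isinstance(value, datetime):  # check first: datetime subclasses date
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, int):
        return datetime(value, 1, 1)  # the int is the year
    if isinstance(value, str):
        # Assumed rule: year, month, day, ... are the digit runs in order.
        return datetime(*(int(part) for part in re.findall(r"\d+", value)))
    if isinstance(value, tuple):
        return datetime(*value)
    if isinstance(value, dict):
        return datetime(**value)
    raise TypeError("unsupported type: {!r}".format(type(value)))


print(to_datetime(1988))
print(to_datetime("2015/2/1"))
print(to_datetime((1988, 5, 22)))
```

The string rule in this sketch needs at least a year, a month, and a day, and it reads a fractional part such as ".5" as 5 microseconds, so it only suits strings like the ones shown in this document.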
History
1.0.x
Initial release.