smidjson的Python绑定,使用libpy
项目描述
libpy simdjson
要求
- 操作系统:macOS>10.15, Linux。
- 编译器:gcc>=9, clang >= 10(C++17代码)
- Python:libpy>=0.2.3, numpy。
安装
pip install libpy-simdjson
注意:安装libpy
(libpy_simdjson
所需)将使用python
可执行文件来获取有关您环境的信息。如果您没有使用虚拟环境或python
没有指向您想使用的Python安装(通过which python
和python --version
检查),您必须使用PYTHON
环境变量指向您的Python可执行文件,即PYTHON=python3 make
或PYTHON=python3 pip3 install libpy
。此外,确保您的CC
和CXX
环境变量指向正确的编译器。
使用方法
from pathlib import Path
import libpy_simdjson as json
doc = json.load(Path("twitter.json"))
# or json.load(b"twitter.json")
# or json.load("twitter.json")
# we also support `loads` for strings.
doc
是一个Object
。对象充当具有特殊方法的python字典。
isinstance(doc, json.Object)
True
我们可以获取键、获取长度、获取项,并访问特定的键
len(doc)
2
doc.keys()
[b'statuses', b'search_metadata']
doc[b'search_metadata'].items()
[(b'completed_in', 0.087),
(b'max_id', 505874924095815700),
(b'max_id_str', b'505874924095815681'),
(b'next_results',
b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1'),
(b'query', b'%E4%B8%80'),
(b'refresh_url',
b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1'),
(b'count', 100),
(b'since_id', 0),
(b'since_id_str', b'0')]
如果您需要实际的Python字典,请使用as_dict
doc[b'search_metadata'].as_dict()
{b'completed_in': 0.087,
b'max_id': 505874924095815700,
b'max_id_str': b'505874924095815681',
b'next_results': b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1',
b'query': b'%E4%B8%80',
b'refresh_url': b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1',
b'count': 100,
b'since_id': 0,
b'since_id_str': b'0'}
然而,我们也支持通过at_pointer
使用JSON Pointer语法。如果您知道您在寻找什么,这将更快。
doc.at_pointer(b"/statuses/50/created_at")
b'Sun Aug 31 00:29:04 +0000 2014'
doc.at_pointer(b"/statuses/50/text").decode()
'RT @Ang_Angel73: 逢坂「くっ…僕の秘められし右目が…!」\n一同「……………。」'
让我们看看statuses
statuses = doc[b'statuses']
statuses
是一个Array
。数组像Python列表一样工作,但有特殊方法。
注意:statuses
和doc
共享一个单一的解析器实例。当这些对象存在时,我们无法解析新的文档(尽管我们可以通过libpy_simdjson.Parser.load
创建新的解析器)。
isinstance(statuses, json.Array)
True
数组支持长度、索引和迭代。
len(statuses)
100
statuses[0][b'text'].decode()
'@aym0566x \n\n名前:前田あゆみ\n第一印象:なんか怖っ!\n今の印象:とりあえずキモい。噛み合わない\n好きなところ:ぶすでキモいとこ😋✨✨\n思い出:んーーー、ありすぎ😊❤️\nLINE交換できる?:あぁ……ごめん✋\nトプ画をみて:照れますがな😘✨\n一言:お前は一生もんのダチ💖'
for status in statuses:
# this is a bad example but you get the picture
if status[b'id'] % 2 == 0:
print(status[b"text"].decode())
break
else:
print("no even ids?")
@aym0566x
名前:前田あゆみ
第一印象:なんか怖っ!
今の印象:とりあえずキモい。噛み合わない
好きなところ:ぶすでキモいとこ😋✨✨
思い出:んーーー、ありすぎ😊❤️
LINE交換できる?:あぁ……ごめん✋
トプ画をみて:照れますがな😘✨
一言:お前は一生もんのダチ💖
如果您需要,可以使用as_list
将数组转换为列表。
statuses.as_list()[1][b'metadata']
{b'result_type': b'recent', b'iso_language_code': b'ja'}
但是,就像对象一样,我们也支持通过at_pointer
使用JSON Pointers,这将更快。
statuses.at_pointer(b"/33/created_at")
b'Sun Aug 31 00:29:06 +0000 2014'
基准测试
注意 - 与大多数其他Python JSON解析器不同,libpy_simdjson
会根据设计,尽可能晚地将数据转换为原生Python类型,为您提供Object
和Array
对象。libpy
允许您将这些代理对象作为实际的Python对象来处理,而无需承担对象转换的成本,直到真正需要。由于C++ simdjson
库非常高效,转换为Python对象是解析过程中最慢的部分,所以我们努力在尽可能晚和尽可能少的字段上进行转换。
请参阅(仍在进行的)“python dict访问开销”基准测试,以了解对象转换的开销。
为了比较,我们还在libpy_simdjson_as_py_obj
中进行了完整的Python对象转换的基准测试,尽管这并不是预期的使用场景。
------------------------------------------------------------ benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/canada.json': 8 tests -------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path0-libpy_simdjson-loads] 3.5037 (1.0) 10.9591 (1.0) 4.3319 (1.0) 0.6166 (1.0) 4.2607 (1.0) 0.2292 (1.0) 11;15 230.8454 (1.0) 162 1
test_benchmark_load[path0-pysimdjson-parse] 3.6885 (1.05) 11.1368 (1.02) 4.4029 (1.02) 0.7765 (1.26) 4.2611 (1.00) 0.5017 (2.19) 8;5 227.1254 (0.98) 118 1
test_benchmark_load[path0-pysimdjson_as_py_obj-loads] 13.5282 (3.86) 37.4092 (3.41) 24.2264 (5.59) 6.5177 (10.57) 27.0374 (6.35) 11.9384 (52.08) 16;0 41.2773 (0.18) 40 1
test_benchmark_load[path0-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 13.5544 (3.87) 44.5382 (4.06) 22.4503 (5.18) 7.1879 (11.66) 25.0067 (5.87) 12.0174 (52.43) 12;0 44.5427 (0.19) 35 1
test_benchmark_load[path0-orjson-loads] 16.1693 (4.61) 37.2228 (3.40) 25.2505 (5.83) 6.8427 (11.10) 27.4105 (6.43) 12.5579 (54.79) 19;0 39.6032 (0.17) 41 1
test_benchmark_load[path0-ujson-loads] 22.0310 (6.29) 45.6815 (4.17) 32.3445 (7.47) 7.0874 (11.49) 35.0020 (8.22) 12.6422 (55.15) 12;0 30.9171 (0.13) 27 1
test_benchmark_load[path0-python_json-loads] 49.6505 (14.17) 72.4977 (6.62) 62.0533 (14.32) 7.2277 (11.72) 64.0998 (15.04) 12.7639 (55.68) 8;0 16.1152 (0.07) 19 1
test_benchmark_load[path0-rapidjson-loads] 50.3836 (14.38) 76.2291 (6.96) 61.1768 (14.12) 7.7522 (12.57) 63.4637 (14.90) 12.2982 (53.65) 6;0 16.3461 (0.07) 17 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/citm_catalog.json': 8 tests ---------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path3-libpy_simdjson-loads] 869.1500 (1.0) 5,930.8430 (2.49) 1,123.0447 (1.0) 203.5337 (1.52) 1,112.6140 (1.0) 81.2050 (1.02) 29;35 890.4365 (1.0) 722 1
test_benchmark_load[path3-pysimdjson-parse] 875.8530 (1.01) 2,384.6440 (1.0) 1,127.0993 (1.00) 133.4820 (1.0) 1,119.4185 (1.01) 79.8040 (1.0) 72;38 887.2333 (1.00) 786 1
test_benchmark_load[path3-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 6,227.8560 (7.17) 23,751.5410 (9.96) 10,109.4607 (9.00) 5,149.2095 (38.58) 7,506.5770 (6.75) 1,209.3185 (15.15) 27;29 98.9172 (0.11) 120 1
test_benchmark_load[path3-orjson-loads] 6,445.2870 (7.42) 23,814.3440 (9.99) 10,414.0837 (9.27) 4,872.0612 (36.50) 7,897.2230 (7.10) 2,016.7280 (25.27) 26;26 96.0238 (0.11) 114 1
test_benchmark_load[path3-pysimdjson_as_py_obj-loads] 7,940.2930 (9.14) 24,984.5340 (10.48) 11,900.6726 (10.60) 5,518.1670 (41.34) 8,878.9950 (7.98) 3,686.7190 (46.20) 21;21 84.0289 (0.09) 90 1
test_benchmark_load[path3-ujson-loads] 8,323.1930 (9.58) 23,228.4760 (9.74) 12,104.0124 (10.78) 5,363.4790 (40.18) 9,344.6530 (8.40) 802.3722 (10.05) 19;20 82.6172 (0.09) 83 1
test_benchmark_load[path3-python_json-loads] 12,697.9430 (14.61) 31,279.0080 (13.12) 16,691.7376 (14.86) 5,163.8324 (38.69) 14,104.4160 (12.68) 2,390.4543 (29.95) 17;17 59.9099 (0.07) 75 1
test_benchmark_load[path3-rapidjson-loads] 14,025.9210 (16.14) 29,509.2470 (12.37) 17,785.5409 (15.84) 5,061.8217 (37.92) 15,456.7050 (13.89) 1,756.3833 (22.01) 7;7 56.2254 (0.06) 35 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/github_events.json': 8 tests ---------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path2-libpy_simdjson-loads] 25.4010 (1.0) 1,960.0230 (1.28) 36.2258 (1.0) 24.5956 (1.0) 34.6020 (1.0) 2.9000 (1.04) 72;1589 27.6046 (1.0) 12642 1
test_benchmark_load[path2-pysimdjson-parse] 25.8010 (1.02) 1,528.8010 (1.0) 37.9300 (1.05) 26.3332 (1.07) 37.3020 (1.08) 2.8010 (1.0) 189;1856 26.3644 (0.96) 16103 1
test_benchmark_load[path2-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 142.1090 (5.59) 3,436.0130 (2.25) 207.5540 (5.73) 88.5276 (3.60) 204.8130 (5.92) 17.6010 (6.28) 58;289 4.8180 (0.17) 4034 1
test_benchmark_load[path2-orjson-loads] 175.7120 (6.92) 5,736.6740 (3.75) 254.9315 (7.04) 162.0033 (6.59) 239.3160 (6.92) 20.8005 (7.43) 26;226 3.9226 (0.14) 2872 1
test_benchmark_load[path2-pysimdjson_as_py_obj-loads] 224.7150 (8.85) 2,283.7510 (1.49) 321.5699 (8.88) 72.4103 (2.94) 324.0220 (9.36) 26.0020 (9.28) 79;180 3.1097 (0.11) 2632 1
test_benchmark_load[path2-ujson-loads] 301.7190 (11.88) 7,409.5770 (4.85) 375.2140 (10.36) 180.4753 (7.34) 363.8240 (10.51) 29.0010 (10.35) 18;175 2.6651 (0.10) 2269 1
test_benchmark_load[path2-python_json-loads] 330.5200 (13.01) 2,521.5590 (1.65) 459.9910 (12.70) 86.1825 (3.50) 455.8290 (13.17) 38.7277 (13.83) 119;146 2.1740 (0.08) 1909 1
test_benchmark_load[path2-rapidjson-loads] 380.6250 (14.98) 2,082.2340 (1.36) 533.6130 (14.73) 91.0874 (3.70) 529.9340 (15.32) 43.7030 (15.60) 76;95 1.8740 (0.07) 1709 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/mesh.json': 8 tests ------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path4-libpy_simdjson-loads] 994.5520 (1.0) 4,240.8210 (1.06) 1,202.4742 (1.0) 194.6273 (1.04) 1,190.2620 (1.0) 93.9050 (1.31) 80;42 831.6187 (1.0) 772 1
test_benchmark_load[path4-pysimdjson-parse] 1,027.3570 (1.03) 4,001.3210 (1.0) 1,257.5495 (1.05) 187.9555 (1.0) 1,253.6190 (1.05) 71.9545 (1.0) 37;50 795.1973 (0.96) 804 1
test_benchmark_load[path4-pysimdjson_as_py_obj-loads] 2,782.4560 (2.80) 16,753.7350 (4.19) 3,907.4828 (3.25) 2,316.1314 (12.32) 3,456.5430 (2.90) 264.9645 (3.68) 9;18 255.9192 (0.31) 252 1
test_benchmark_load[path4-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 2,936.7520 (2.95) 19,099.6830 (4.77) 4,066.0482 (3.38) 2,579.4705 (13.72) 3,459.0780 (2.91) 291.3400 (4.05) 10;21 245.9390 (0.30) 207 1
test_benchmark_load[path4-orjson-loads] 3,848.3120 (3.87) 16,809.8280 (4.20) 4,853.2711 (4.04) 2,387.4255 (12.70) 4,348.8370 (3.65) 282.5400 (3.93) 8;13 206.0466 (0.25) 203 1
test_benchmark_load[path4-ujson-loads] 4,224.0310 (4.25) 27,094.9610 (6.77) 5,393.1933 (4.49) 2,651.7927 (14.11) 4,906.3150 (4.12) 368.6210 (5.12) 7;13 185.4189 (0.22) 196 1
test_benchmark_load[path4-python_json-loads] 7,946.7190 (7.99) 21,914.9540 (5.48) 9,216.2188 (7.66) 2,458.6174 (13.08) 8,729.1600 (7.33) 425.9220 (5.92) 4;8 108.5044 (0.13) 108 1
test_benchmark_load[path4-rapidjson-loads] 8,958.0800 (9.01) 26,303.9020 (6.57) 10,523.4513 (8.75) 2,727.4495 (14.51) 10,081.1370 (8.47) 748.3645 (10.40) 4;4 95.0259 (0.11) 101 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/twitter.json': 8 tests ----------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path1-libpy_simdjson-loads] 329.3220 (1.0) 3,357.5250 (2.67) 440.8892 (1.0) 84.7542 (1.65) 441.1300 (1.0) 37.4280 (1.22) 64;75 2,268.1435 (1.0) 1637 1
test_benchmark_load[path1-pysimdjson-parse] 348.7250 (1.06) 1,258.3910 (1.0) 451.6477 (1.02) 51.2426 (1.0) 450.8825 (1.02) 30.8030 (1.0) 374;155 2,214.1149 (0.98) 2006 1
test_benchmark_load[path1-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 2,206.2470 (6.70) 14,217.4510 (11.30) 2,759.9331 (6.26) 1,386.3121 (27.05) 2,551.0705 (5.78) 312.4220 (10.14) 2;3 362.3276 (0.16) 74 1
test_benchmark_load[path1-orjson-loads] 2,639.8850 (8.02) 15,075.4730 (11.98) 3,420.6474 (7.76) 1,657.3556 (32.34) 3,183.0750 (7.22) 215.1150 (6.98) 6;15 292.3423 (0.13) 270 1
test_benchmark_load[path1-ujson-loads] 3,304.1320 (10.03) 18,557.3880 (14.75) 4,286.1597 (9.72) 1,868.0725 (36.46) 4,021.4820 (9.12) 283.4200 (9.20) 5;14 233.3091 (0.10) 239 1
test_benchmark_load[path1-pysimdjson_as_py_obj-loads] 3,319.1400 (10.08) 16,355.6780 (13.00) 4,237.6133 (9.61) 1,642.9272 (32.06) 3,982.8870 (9.03) 306.1198 (9.94) 6;16 235.9819 (0.10) 259 1
test_benchmark_load[path1-python_json-loads] 4,154.6810 (12.62) 17,041.0680 (13.54) 5,474.9570 (12.42) 1,520.8511 (29.68) 5,262.9090 (11.93) 484.4330 (15.73) 6;8 182.6498 (0.08) 190 1
test_benchmark_load[path1-rapidjson-loads] 5,184.5590 (15.74) 19,217.2170 (15.27) 6,598.9679 (14.97) 1,949.8404 (38.05) 6,315.0330 (14.32) 682.2923 (22.15) 5;7 151.5388 (0.07) 149 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------- benchmark 'Random attribute access': 2 tests --------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_at[python_json-loads] 800.0000 (1.0) 1,560,881.0000 (6.79) 1,190.4413 (1.0) 6,373.7290 (3.79) 1,100.0000 (1.0) 100.0000 (1.0) 122;3966 840.0246 (1.0) 88496 1
test_benchmark_at[libpy_simdjson-loads] 1,300.0000 (1.63) 229,812.0000 (1.0) 1,848.6849 (1.55) 1,683.4805 (1.0) 1,800.0000 (1.64) 400.0000 (4.00) 143;234 540.9251 (0.64) 57801 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------- benchmark 'Random list access': 2 tests ----------------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_list_access[python_json-loads] 700.0000 (1.0) 258,213.0000 (1.0) 1,061.7637 (1.0) 1,165.4286 (1.0) 1,000.0000 (1.0) 200.0000 (1.0) 570;5297 941.8292 (1.0) 149232 1
test_benchmark_list_access[libpy_simdjson-loads] 700.0000 (1.00) 1,340,168.0000 (5.19) 10,658.8974 (10.04) 9,211.5111 (7.90) 10,301.0000 (10.30) 9,300.0000 (46.50) 4637;493 93.8183 (0.10) 81961 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
项目详情
libpy_simdjson-0.4.0.tar.gz的哈希值
算法 | 哈希摘要 | |
---|---|---|
SHA256 | e02b7681e9220eef36d62914c1e324737e12797c1de11bd3589db472fd740da5 |
|
MD5 | 3ade41780dbddbd7da8dbe570bc103ea |
|
BLAKE2b-256 | da7c52b4fa4d56e0e7e2d51c5870d6e45e0962c7620c985e7b3dd8ffb2028d38 |