Python bindings for simdjson using libpy

Project description

libpy simdjson

Python bindings for simdjson using libpy.

Requirements

  • OS: macOS>10.15, Linux.
  • Compiler: gcc>=9, clang >= 10 (C++17 code)
  • Python: libpy>=0.2.3, numpy.

Install

pip install libpy-simdjson

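Note: Installing libpy (required by libpy_simdjson) uses the python executable to gather information about your environment. If you are not using a virtual environment, or python does not point to the Python installation you want to use (check with which python and python --version), you must point to your Python executable with the PYTHON environment variable, i.e. PYTHON=python3 make or PYTHON=python3 pip3 install libpy. Also make sure that your CC and CXX environment variables point to the correct compilers.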
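
Before building, it can help to confirm which interpreter and compiler settings the build is likely to pick up. The following is a minimal sketch using only the standard library; the helper name describe_build_env is hypothetical and is not part of libpy or libpy_simdjson.

```python
import os
import shutil
import sys


def describe_build_env(environ=None):
    """Report the interpreter and compiler settings a build would likely see."""
    env = os.environ if environ is None else environ
    return {
        # Interpreter running this script.
        "current_interpreter": sys.executable,
        # What a bare `python` resolves to on PATH (None if not found).
        "python_on_path": shutil.which("python"),
        # Overrides mentioned in the note above; None means "not set".
        "PYTHON": env.get("PYTHON"),
        "CC": env.get("CC"),
        "CXX": env.get("CXX"),
    }


if __name__ == "__main__":
    for name, value in describe_build_env().items():
        print(f"{name}: {value}")
```

If python_on_path is None or differs from the interpreter you intend to use, setting PYTHON as described in the note is the likely fix.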

Usage

from pathlib import Path
import libpy_simdjson as json
doc = json.load(Path("twitter.json"))
# or json.load(b"twitter.json")
# or json.load("twitter.json")
# we also support `loads` for strings.

doc is an Object. Objects act like Python dictionaries with some special methods.

isinstance(doc, json.Object)
True

We can get the keys, get the length, get the items, and access a specific key.

len(doc)
2
doc.keys()
[b'statuses', b'search_metadata']
doc[b'search_metadata'].items()
[(b'completed_in', 0.087),
 (b'max_id', 505874924095815700),
 (b'max_id_str', b'505874924095815681'),
 (b'next_results',
  b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1'),
 (b'query', b'%E4%B8%80'),
 (b'refresh_url',
  b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1'),
 (b'count', 100),
 (b'since_id', 0),
 (b'since_id_str', b'0')]

If you need an actual Python dictionary, use as_dict.

doc[b'search_metadata'].as_dict()
{b'completed_in': 0.087,
 b'max_id': 505874924095815700,
 b'max_id_str': b'505874924095815681',
 b'next_results': b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1',
 b'query': b'%E4%B8%80',
 b'refresh_url': b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1',
 b'count': 100,
 b'since_id': 0,
 b'since_id_str': b'0'}
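
Keys and string values in the output above are bytes. If str is more convenient downstream, a small recursive helper can decode them. This is a sketch over plain Python containers; decode_bytes is a hypothetical helper, not a libpy_simdjson function, and it assumes the data is UTF-8.

```python
def decode_bytes(value, encoding="utf-8"):
    """Recursively convert bytes keys and values to str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {
            decode_bytes(key, encoding): decode_bytes(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [decode_bytes(item, encoding) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# Sample shaped like part of the as_dict() output shown above.
metadata = {b"query": b"%E4%B8%80", b"count": 100, b"since_id_str": b"0"}
print(decode_bytes(metadata))
# → {'query': '%E4%B8%80', 'count': 100, 'since_id_str': '0'}
```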

However, we also support JSON Pointer syntax via at_pointer. If you know what you are looking for, this is faster.

doc.at_pointer(b"/statuses/50/created_at")
b'Sun Aug 31 00:29:04 +0000 2014'
doc.at_pointer(b"/statuses/50/text").decode()
'RT @Ang_Angel73: 逢坂「くっ…僕の秘められし右目が…!」\n一同「……………。」'
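
For readers unfamiliar with JSON Pointer (RFC 6901), the sketch below shows how a pointer such as /statuses/50/created_at walks a document one token at a time. It operates on plain dicts and lists from the standard json module and is only an illustration of the syntax, not how at_pointer is implemented. resolve_pointer is a hypothetical name, and array index validation is simplified (for example, leading zeros are not rejected).

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against dicts and lists."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in this order: "~1" -> "/", then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not token.isdigit():
                raise KeyError(f"invalid array index: {token!r}")
            current = current[int(token)]
        else:
            current = current[token]
    return current


sample = json.loads('{"statuses": [{"created_at": "first"}, {"created_at": "second"}]}')
print(resolve_pointer(sample, "/statuses/1/created_at"))  # → second
```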

Let's look at statuses.

statuses = doc[b'statuses']

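statuses is an Array. Arrays act like Python lists with some special methods.

Note: statuses and doc share a single parser instance. We cannot parse a new document while these objects are alive (although we can create new parsers with libpy_simdjson.Parser.load).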

isinstance(statuses, json.Array)
True

Arrays support length, indexing, and iteration.

len(statuses)
100
statuses[0][b'text'].decode()
'@aym0566x \n\n名前:前田あゆみ\n第一印象:なんか怖っ!\n今の印象:とりあえずキモい。噛み合わない\n好きなところ:ぶすでキモいとこ😋✨✨\n思い出:んーーー、ありすぎ😊❤️\nLINE交換できる?:あぁ……ごめん✋\nトプ画をみて:照れますがな😘✨\n一言:お前は一生もんのダチ💖'
for status in statuses:
    # this is a bad example but you get the picture
    if status[b'id'] % 2 == 0:
        print(status[b"text"].decode())
        break
else:
    print("no even ids?")
@aym0566x

名前:前田あゆみ
第一印象:なんか怖っ!
今の印象:とりあえずキモい。噛み合わない
好きなところ:ぶすでキモいとこ😋✨✨
思い出:んーーー、ありすぎ😊❤️
LINE交換できる?:あぁ……ごめん✋
トプ画をみて:照れますがな😘✨
一言:お前は一生もんのダチ💖

If you need to, you can convert an array to a list with as_list.

statuses.as_list()[1][b'metadata']
{b'result_type': b'recent', b'iso_language_code': b'ja'}

However, as with objects, we also support JSON Pointers via at_pointer, which is faster.

statuses.at_pointer(b"/33/created_at")
b'Sun Aug 31 00:29:06 +0000 2014'

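Benchmarks

Note: Unlike most other Python JSON parsers, libpy_simdjson by design converts data to native Python types as late as possible, giving you Object and Array objects. libpy lets you work with these proxy objects as if they were actual Python objects without paying the cost of object conversion until it is really needed. Because the C++ simdjson library is so efficient, conversion to Python objects is the slowest part of parsing, so we try to convert as late as possible and on as few fields as possible.

See the (still in progress) "python dict access overhead" benchmark for the overhead of object conversion.

For comparison, we also benchmark full Python object conversion in libpy_simdjson_as_py_obj, although this is not the intended use case.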
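
The idea of deferring conversion can be illustrated with a small pure-Python analogy. This is not how libpy_simdjson works internally (it wraps the C++ simdjson library through libpy); LazyField is a hypothetical class that holds raw JSON text and converts it only on first access, with a counter that makes the deferral observable.

```python
import json


class LazyField:
    """Hold raw JSON text and convert it to a Python object on first use."""

    conversions = 0  # class-wide count of conversions actually performed

    def __init__(self, raw):
        self._raw = raw
        self._value = None
        self._converted = False

    def get(self):
        if not self._converted:
            self._value = json.loads(self._raw)
            self._converted = True
            LazyField.conversions += 1
        return self._value


# Three fields are held, but only the one we read is converted.
fields = {
    name: LazyField(raw)
    for name, raw in [("id", "1"), ("text", '"hello"'), ("tags", '["a", "b"]')]
}
print(fields["text"].get(), LazyField.conversions)  # → hello 1
```

Reading one field out of three triggers a single conversion, which is the behavior the as_py_obj benchmarks give up by converting everything up front.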


------------------------------------------------------------ benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/canada.json': 8 tests -------------------------------------------------------------
Name (time in ms)                                                                    Min                Max               Mean            StdDev             Median                IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path0-libpy_simdjson-loads]                                   3.5037 (1.0)      10.9591 (1.0)       4.3319 (1.0)      0.6166 (1.0)       4.2607 (1.0)       0.2292 (1.0)         11;15  230.8454 (1.0)         162           1
test_benchmark_load[path0-pysimdjson-parse]                                       3.6885 (1.05)     11.1368 (1.02)      4.4029 (1.02)     0.7765 (1.26)      4.2611 (1.00)      0.5017 (2.19)          8;5  227.1254 (0.98)        118           1
test_benchmark_load[path0-pysimdjson_as_py_obj-loads]                            13.5282 (3.86)     37.4092 (3.41)     24.2264 (5.59)     6.5177 (10.57)    27.0374 (6.35)     11.9384 (52.08)        16;0   41.2773 (0.18)         40           1
test_benchmark_load[path0-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj]     13.5544 (3.87)     44.5382 (4.06)     22.4503 (5.18)     7.1879 (11.66)    25.0067 (5.87)     12.0174 (52.43)        12;0   44.5427 (0.19)         35           1
test_benchmark_load[path0-orjson-loads]                                          16.1693 (4.61)     37.2228 (3.40)     25.2505 (5.83)     6.8427 (11.10)    27.4105 (6.43)     12.5579 (54.79)        19;0   39.6032 (0.17)         41           1
test_benchmark_load[path0-ujson-loads]                                           22.0310 (6.29)     45.6815 (4.17)     32.3445 (7.47)     7.0874 (11.49)    35.0020 (8.22)     12.6422 (55.15)        12;0   30.9171 (0.13)         27           1
test_benchmark_load[path0-python_json-loads]                                     49.6505 (14.17)    72.4977 (6.62)     62.0533 (14.32)    7.2277 (11.72)    64.0998 (15.04)    12.7639 (55.68)         8;0   16.1152 (0.07)         19           1
test_benchmark_load[path0-rapidjson-loads]                                       50.3836 (14.38)    76.2291 (6.96)     61.1768 (14.12)    7.7522 (12.57)    63.4637 (14.90)    12.2982 (53.65)         6;0   16.3461 (0.07)         17           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



--------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/citm_catalog.json': 8 tests ---------------------------------------------------------------------
Name (time in us)                                                                        Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path3-libpy_simdjson-loads]                                     869.1500 (1.0)       5,930.8430 (2.49)      1,123.0447 (1.0)        203.5337 (1.52)      1,112.6140 (1.0)         81.2050 (1.02)        29;35  890.4365 (1.0)         722           1
test_benchmark_load[path3-pysimdjson-parse]                                         875.8530 (1.01)      2,384.6440 (1.0)       1,127.0993 (1.00)       133.4820 (1.0)       1,119.4185 (1.01)        79.8040 (1.0)         72;38  887.2333 (1.00)        786           1
test_benchmark_load[path3-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj]      6,227.8560 (7.17)     23,751.5410 (9.96)     10,109.4607 (9.00)     5,149.2095 (38.58)     7,506.5770 (6.75)     1,209.3185 (15.15)       27;29   98.9172 (0.11)        120           1
test_benchmark_load[path3-orjson-loads]                                           6,445.2870 (7.42)     23,814.3440 (9.99)     10,414.0837 (9.27)     4,872.0612 (36.50)     7,897.2230 (7.10)     2,016.7280 (25.27)       26;26   96.0238 (0.11)        114           1
test_benchmark_load[path3-pysimdjson_as_py_obj-loads]                             7,940.2930 (9.14)     24,984.5340 (10.48)    11,900.6726 (10.60)    5,518.1670 (41.34)     8,878.9950 (7.98)     3,686.7190 (46.20)       21;21   84.0289 (0.09)         90           1
test_benchmark_load[path3-ujson-loads]                                            8,323.1930 (9.58)     23,228.4760 (9.74)     12,104.0124 (10.78)    5,363.4790 (40.18)     9,344.6530 (8.40)       802.3722 (10.05)       19;20   82.6172 (0.09)         83           1
test_benchmark_load[path3-python_json-loads]                                     12,697.9430 (14.61)    31,279.0080 (13.12)    16,691.7376 (14.86)    5,163.8324 (38.69)    14,104.4160 (12.68)    2,390.4543 (29.95)       17;17   59.9099 (0.07)         75           1
test_benchmark_load[path3-rapidjson-loads]                                       14,025.9210 (16.14)    29,509.2470 (12.37)    17,785.5409 (15.84)    5,061.8217 (37.92)    15,456.7050 (13.89)    1,756.3833 (22.01)         7;7   56.2254 (0.06)         35           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



--------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/github_events.json': 8 tests ---------------------------------------------------------------
Name (time in us)                                                                     Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path2-libpy_simdjson-loads]                                   25.4010 (1.0)      1,960.0230 (1.28)      36.2258 (1.0)       24.5956 (1.0)       34.6020 (1.0)       2.9000 (1.04)      72;1589       27.6046 (1.0)       12642           1
test_benchmark_load[path2-pysimdjson-parse]                                       25.8010 (1.02)     1,528.8010 (1.0)       37.9300 (1.05)      26.3332 (1.07)      37.3020 (1.08)      2.8010 (1.0)      189;1856       26.3644 (0.96)      16103           1
test_benchmark_load[path2-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj]     142.1090 (5.59)     3,436.0130 (2.25)     207.5540 (5.73)      88.5276 (3.60)     204.8130 (5.92)     17.6010 (6.28)       58;289        4.8180 (0.17)       4034           1
test_benchmark_load[path2-orjson-loads]                                          175.7120 (6.92)     5,736.6740 (3.75)     254.9315 (7.04)     162.0033 (6.59)     239.3160 (6.92)     20.8005 (7.43)       26;226        3.9226 (0.14)       2872           1
test_benchmark_load[path2-pysimdjson_as_py_obj-loads]                            224.7150 (8.85)     2,283.7510 (1.49)     321.5699 (8.88)      72.4103 (2.94)     324.0220 (9.36)     26.0020 (9.28)       79;180        3.1097 (0.11)       2632           1
test_benchmark_load[path2-ujson-loads]                                           301.7190 (11.88)    7,409.5770 (4.85)     375.2140 (10.36)    180.4753 (7.34)     363.8240 (10.51)    29.0010 (10.35)      18;175        2.6651 (0.10)       2269           1
test_benchmark_load[path2-python_json-loads]                                     330.5200 (13.01)    2,521.5590 (1.65)     459.9910 (12.70)     86.1825 (3.50)     455.8290 (13.17)    38.7277 (13.83)     119;146        2.1740 (0.08)       1909           1
test_benchmark_load[path2-rapidjson-loads]                                       380.6250 (14.98)    2,082.2340 (1.36)     533.6130 (14.73)     91.0874 (3.70)     529.9340 (15.32)    43.7030 (15.60)       76;95        1.8740 (0.07)       1709           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------




----------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/mesh.json': 8 tests ------------------------------------------------------------------------
Name (time in us)                                                                       Min                    Max                   Mean                StdDev                 Median                 IQR            Outliers       OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path4-libpy_simdjson-loads]                                    994.5520 (1.0)       4,240.8210 (1.06)      1,202.4742 (1.0)        194.6273 (1.04)      1,190.2620 (1.0)       93.9050 (1.31)        80;42  831.6187 (1.0)         772           1
test_benchmark_load[path4-pysimdjson-parse]                                      1,027.3570 (1.03)      4,001.3210 (1.0)       1,257.5495 (1.05)       187.9555 (1.0)       1,253.6190 (1.05)      71.9545 (1.0)         37;50  795.1973 (0.96)        804           1
test_benchmark_load[path4-pysimdjson_as_py_obj-loads]                            2,782.4560 (2.80)     16,753.7350 (4.19)      3,907.4828 (3.25)     2,316.1314 (12.32)     3,456.5430 (2.90)     264.9645 (3.68)         9;18  255.9192 (0.31)        252           1
test_benchmark_load[path4-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj]     2,936.7520 (2.95)     19,099.6830 (4.77)      4,066.0482 (3.38)     2,579.4705 (13.72)     3,459.0780 (2.91)     291.3400 (4.05)        10;21  245.9390 (0.30)        207           1
test_benchmark_load[path4-orjson-loads]                                          3,848.3120 (3.87)     16,809.8280 (4.20)      4,853.2711 (4.04)     2,387.4255 (12.70)     4,348.8370 (3.65)     282.5400 (3.93)         8;13  206.0466 (0.25)        203           1
test_benchmark_load[path4-ujson-loads]                                           4,224.0310 (4.25)     27,094.9610 (6.77)      5,393.1933 (4.49)     2,651.7927 (14.11)     4,906.3150 (4.12)     368.6210 (5.12)         7;13  185.4189 (0.22)        196           1
test_benchmark_load[path4-python_json-loads]                                     7,946.7190 (7.99)     21,914.9540 (5.48)      9,216.2188 (7.66)     2,458.6174 (13.08)     8,729.1600 (7.33)     425.9220 (5.92)          4;8  108.5044 (0.13)        108           1
test_benchmark_load[path4-rapidjson-loads]                                       8,958.0800 (9.01)     26,303.9020 (6.57)     10,523.4513 (8.75)     2,727.4495 (14.51)    10,081.1370 (8.47)     748.3645 (10.40)         4;4   95.0259 (0.11)        101           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


---------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/twitter.json': 8 tests ----------------------------------------------------------------------
Name (time in us)                                                                       Min                    Max                  Mean                StdDev                Median                 IQR            Outliers         OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path1-libpy_simdjson-loads]                                    329.3220 (1.0)       3,357.5250 (2.67)       440.8892 (1.0)         84.7542 (1.65)       441.1300 (1.0)       37.4280 (1.22)        64;75  2,268.1435 (1.0)        1637           1
test_benchmark_load[path1-pysimdjson-parse]                                        348.7250 (1.06)      1,258.3910 (1.0)        451.6477 (1.02)        51.2426 (1.0)        450.8825 (1.02)      30.8030 (1.0)       374;155  2,214.1149 (0.98)       2006           1
test_benchmark_load[path1-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj]     2,206.2470 (6.70)     14,217.4510 (11.30)    2,759.9331 (6.26)     1,386.3121 (27.05)    2,551.0705 (5.78)     312.4220 (10.14)         2;3    362.3276 (0.16)         74           1
test_benchmark_load[path1-orjson-loads]                                          2,639.8850 (8.02)     15,075.4730 (11.98)    3,420.6474 (7.76)     1,657.3556 (32.34)    3,183.0750 (7.22)     215.1150 (6.98)         6;15    292.3423 (0.13)        270           1
test_benchmark_load[path1-ujson-loads]                                           3,304.1320 (10.03)    18,557.3880 (14.75)    4,286.1597 (9.72)     1,868.0725 (36.46)    4,021.4820 (9.12)     283.4200 (9.20)         5;14    233.3091 (0.10)        239           1
test_benchmark_load[path1-pysimdjson_as_py_obj-loads]                            3,319.1400 (10.08)    16,355.6780 (13.00)    4,237.6133 (9.61)     1,642.9272 (32.06)    3,982.8870 (9.03)     306.1198 (9.94)         6;16    235.9819 (0.10)        259           1
test_benchmark_load[path1-python_json-loads]                                     4,154.6810 (12.62)    17,041.0680 (13.54)    5,474.9570 (12.42)    1,520.8511 (29.68)    5,262.9090 (11.93)    484.4330 (15.73)         6;8    182.6498 (0.08)        190           1
test_benchmark_load[path1-rapidjson-loads]                                       5,184.5590 (15.74)    19,217.2170 (15.27)    6,598.9679 (14.97)    1,949.8404 (38.05)    6,315.0330 (14.32)    682.2923 (22.15)         5;7    151.5388 (0.07)        149           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------------------------------------------- benchmark 'Random attribute access': 2 tests --------------------------------------------------------------------------------------------
Name (time in ns)                                  Min                       Max                  Mean                StdDev                Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_at[python_json-loads]          800.0000 (1.0)      1,560,881.0000 (6.79)     1,190.4413 (1.0)      6,373.7290 (3.79)     1,100.0000 (1.0)      100.0000 (1.0)      122;3966      840.0246 (1.0)       88496           1
test_benchmark_at[libpy_simdjson-loads]     1,300.0000 (1.63)       229,812.0000 (1.0)      1,848.6849 (1.55)     1,683.4805 (1.0)      1,800.0000 (1.64)     400.0000 (4.00)      143;234      540.9251 (0.64)      57801           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


---------------------------------------------------------------------------------------------------- benchmark 'Random list access': 2 tests ----------------------------------------------------------------------------------------------------
Name (time in ns)                                         Min                       Max                   Mean                StdDev                 Median                   IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_list_access[python_json-loads]        700.0000 (1.0)        258,213.0000 (1.0)       1,061.7637 (1.0)      1,165.4286 (1.0)       1,000.0000 (1.0)        200.0000 (1.0)      570;5297      941.8292 (1.0)      149232           1
test_benchmark_list_access[libpy_simdjson-loads]     700.0000 (1.00)     1,340,168.0000 (5.19)     10,658.8974 (10.04)    9,211.5111 (7.90)     10,301.0000 (10.30)    9,300.0000 (46.50)    4637;493       93.8183 (0.10)      81961           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
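
The OPS column and the parenthesized ratios can be recomputed from the Mean column. The sketch below does so for three rows of the canada.json table; the function names are hypothetical and the inputs are copied from that table.

```python
def ops_per_second(mean, unit="ms"):
    """Convert a mean time into operations per second (OPS = 1 / Mean)."""
    seconds_per_unit = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
    return 1.0 / (mean * seconds_per_unit[unit])


def relative_to_fastest(means):
    """Divide each mean by the smallest mean, as in the parenthesized ratios."""
    fastest = min(means.values())
    return {name: value / fastest for name, value in means.items()}


# Mean times (ms) copied from the canada.json table above.
canada_means = {"libpy_simdjson": 4.3319, "orjson": 25.2505, "python_json": 62.0533}

print(f"{ops_per_second(canada_means['libpy_simdjson']):.1f} ops/s")
for name, ratio in relative_to_fastest(canada_means).items():
    print(f"{name}: {ratio:.2f}x")
```

Pass unit="us" or unit="ns" for the tables whose times are reported in microseconds or nanoseconds.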
