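Machine learning, statistics, and utilities around developer, company, and project productivity
Project description
devml
Machine learning, statistics, and utilities around developer productivity
A few handy bits of functionality
Can check out all repositories on GitHub
Converts a tree of checked-out repositories on disk into a pandas DataFrame
Statistics on combined DataFrames
Installation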
pip install devml
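This pip install installs a command-line tool, dml (referenced in the documentation below), and the devml library, which is also referenced below.
Get environment set up
The code is written to support Python 3.6 or later. You can get it here: https://pythonlang.cn/downloads/release/python-360/.
An easy way to run the project locally is to check out the repo and, in the root of the repo, run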
make setup
This creates a virtualenv in ~/.devml
Next, source that virtualenv
source ~/.devml/bin/activate
Run Make All (installs, lints, and tests)
make all
# Example output
# (.devml) ➜ devml git:(master) make all
# pip install -r requirements.txt
# Requirement already satisfied: pytest in /Users/noahgift/.devml/lib/python3.6/site-packages (from -r requirements.txt (line #1)
---------- coverage: platform darwin, python 3.6.2-final-0 -----------
Name                       Stmts   Miss  Cover
----------------------------------------------
devml/__init__.py              1      0   100%
devml/author_stats.py          6      6     0%
devml/fetch_repo.py           54     42    22%
devml/mkdata.py               84     21    75%
devml/org_stats.py            76     55    28%
devml/post_processing.py      50     35    30%
devml/state.py                29      9    69%
devml/stats.py                55     43    22%
devml/ts.py                   29     14    52%
devml/util.py                 12      4    67%
dml.py                       111     66    41%
----------------------------------------------
TOTAL                        507    295    42%
....
If you don't use virtualenv, or don't want to, no problem: just run make all. It should probably work if you have Python 3.6 installed.
make all
Jupyter notebook exploring a GitHub organization
You can explore combined datasets using this example as a starting point:
https://github.com/noahgift/devml/blob/master/notebooks/github_data_exploration.ipynb
[Image: Pallets Project]
Jupyter notebook exploring repository churn
You can explore a file-metadata exploration example here:
https://github.com/noahgift/devml/blob/master/notebooks/repo_file_exploration.ipynb
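All files churned, by type
[Image: Pallets Project relative churn by file type]
Summary churn statistics by type
[Image: Pallets Project churn by file type]
Expected configuration
The command-line tool expects you to create a project directory containing a config.json file. In config.json you need to supply a GitHub personal access token (stored under the key oath). You can find information about how to create a token here: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/.
Alternatively, you can pass these values via the Python API or command-line options. They stand for the following: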
org: GitHub organization (used to clone the entire tree of repos)
checkout_dir: location to check the repos out to
oath: personal access token generated by GitHub
➜ devml git:(master) ✗ cat project/config.json
{
    "project" : {
        "org":"pallets",
        "checkout_dir": "/tmp/checkout",
        "oath": "<keygenerated from Github>"
    }
}
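As a rough illustration of how this file can be consumed, here is a minimal sketch that reads the three values with the standard library. The helper name read_project_config is hypothetical and is not part of devml; the library ships its own loader (state.get_project_metadata, used later in this document), whose internals are not shown here. The return order is chosen to mirror that call (checkout dir, token, org).

```python
import json
from pathlib import Path


def read_project_config(config_path):
    """Return (checkout_dir, oath, org) from a devml-style config.json.

    Hypothetical helper for illustration only; not devml's own loader.
    """
    data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    project = data["project"]  # all values live under the "project" key
    return project["checkout_dir"], project["oath"], project["org"]
```

A missing key raises KeyError, which is usually what you want for a required token.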
Basic command-line usage
You can get statistics for a checkout, or for a directory of checkouts, as follows:
dml gstats author --path ~/src/mycompanyrepo(s)
Top Commits By Author:
author_name commits
0 John Smith 3059
1 Sally Joe 2995
2 Greg Mathews 2194
3 Jim Mayflower 1448
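To make the idea behind this report concrete, the sketch below counts commits per author from the text that `git log --pretty=format:%an` prints (one author name per commit). It is an assumption-laden illustration using only the standard library, not devml's implementation; the function name count_commits_by_author is hypothetical.

```python
from collections import Counter


def count_commits_by_author(git_log_output):
    """Count commits per author from `git log --pretty=format:%an` output.

    Hypothetical helper: each non-empty line is treated as one commit's author.
    """
    names = (line.strip() for line in git_log_output.splitlines())
    # most_common() sorts by count, highest first
    return Counter(name for name in names if name).most_common()


sample = "John Smith\nSally Joe\nJohn Smith\n"
for author, commits in count_commits_by_author(sample):
    print(author, commits)
```

In practice the input would come from running git in the checkout, for example via subprocess.run with capture_output=True.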
Basic API usage (converting a tree of repos into a pandas DataFrame)
In [1]: from devml import (mkdata, stats)
In [2]: org_df = mkdata.create_org_df(path="/src/mycompanyrepo(s)")
In [3]: author_counts = stats.author_commit_count(org_df)
In [4]: author_counts.head()
Out[4]:
     author_name  commits
0     John Smith     3059
1      Sally Joe     2995
2   Greg Mathews     2194
3  Jim Mayflower     1448
4  Truck Pritter     1441
Clone all repos in a GitHub organization using the API
In [1]: from devml import (mkdata, stats, state, fetch_repo)
In [2]: dest, token, org = state.get_project_metadata("../project/config.json")
In [3]: fetch_repo.clone_org_repos(token, org, dest, branch="master")
2017-10-14 17:11:36,590 - devml - INFO - Creating Checkout Root: /tmp/checkout
2017-10-14 17:11:37,346 - devml - INFO - Found Repo # 1 REPO NAME: flask , URL: git@github.com:pallets/flask.git
2017-10-14 17:11:37,347 - devml - INFO - Found Repo # 2 REPO NAME: pallets-sphinx-themes , URL: git@github.com:pallets/pallets-sphinx-themes.git
2017-10-14 17:11:37,347 - devml - INFO - Found Repo # 3 REPO NAME: markupsafe , URL: git@github.com:pallets/markupsafe.git
2017-10-14 17:11:37,348 - devml - INFO - Found Repo # 4 REPO NAME: jinja , URL: git@github.com:pallets/jinja.git
2017-10-14 17:11:37,349 - devml - INFO - Found Repo # 5 REPO NAME: werkzeug , URL: git@githu
In [4]: !ls -l /tmp/checkout
total 0
drwxr-xr-x 21 noahgift wheel 672 Oct 14 17:11 click
drwxr-xr-x 25 noahgift wheel 800 Oct 14 17:11 flask
drwxr-xr-x 11 noahgift wheel 352 Oct 14 17:11 flask-docs
drwxr-xr-x 12 noahgift wheel 384 Oct 14 17:11 flask-ext-migrate
drwxr-xr-x 8 noahgift wheel 256 Oct 14 17:11 flask-snippets
drwxr-xr-x 14 noahgift wheel 448 Oct 14 17:11 flask-website
drwxr-xr-x 18 noahgift wheel 576 Oct 14 17:11 itsdangerous
drwxr-xr-x 23 noahgift wheel 736 Oct 14 17:11 jinja
drwxr-xr-x 18 noahgift wheel 576 Oct 14 17:11 markupsafe
drwxr-xr-x 4 noahgift wheel 128 Oct 14 17:11 meta
drwxr-xr-x 10 noahgift wheel 320 Oct 14 17:11 pallets-sphinx-themes
drwxr-xr-x 9 noahgift wheel 288 Oct 14 17:11 pocoo-sphinx-themes
drwxr-xr-x 15 noahgift wheel 480 Oct 14 17:11 website
drwxr-xr-x 25 noahgift wheel 800 Oct 14 17:11 werkzeug
Advanced CLI - churn: get churn by file type
Get the top ten files with the extension .py, sorted by churn count
✗ dml gstats churn --path /Users/noahgift/src/flask --limit 10 --ext .py
2017-10-15 12:10:55,783 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/flask]
                      files  churn_count  line_count extension  \
1           b'flask/app.py'          316      2183.0       .py
3       b'flask/helpers.py'          176      1019.0       .py
5   b'tests/flask_tests.py'          127         NaN       .py
7               b'flask.py'          104         NaN       .py
8               b'setup.py'           80       112.0       .py
10          b'flask/cli.py'           75       759.0       .py
11     b'flask/wrappers.py'           70       194.0       .py
12     b'flask/__init__.py'           65        49.0       .py
13          b'flask/ctx.py'           62       415.0       .py
14  b'tests/test_helpers.py'          62       888.0       .py

    relative_churn
1             0.14
3             0.17
5              NaN
7              NaN
8             0.71
10            0.10
11            0.36
12            1.33
13            0.15
14            0.07
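The log line above shows that the churn report is driven by `git log --name-only --pretty=format:`, which prints the paths touched by each commit. A minimal sketch of turning that text into churn counts follows. It is not devml's code: the function name churn_counts and its ext and limit parameters are hypothetical stand-ins that merely echo the --ext and --limit options.

```python
import os
from collections import Counter


def churn_counts(git_log_output, ext=None, limit=None):
    """Count how often each path appears in the git log output.

    Hypothetical helper: one appearance of a path equals one modification.
    """
    paths = (line.strip() for line in git_log_output.splitlines())
    counts = Counter(
        path
        for path in paths
        # skip the blank separator lines and, optionally, other extensions
        if path and (ext is None or os.path.splitext(path)[1] == ext)
    )
    return counts.most_common(limit)  # highest churn first


sample = "flask/app.py\nsetup.py\n\nflask/app.py\nREADME\n"
print(churn_counts(sample, ext=".py", limit=10))
```

Counting path occurrences treats a rename as a new file, so long-lived files that moved will show split counts.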
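Get descriptive statistics for files with the extension .py and compare them with another repository
In this example, flask, this repo, and cpython are compared to see how their median churn differs.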
(.devml) ➜ devml git:(master) dml gstats metachurn --path /Users/noahgift/src/flask --ext .py --statistic median
2017-10-15 12:39:44,781 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/flask]
MEDIAN Statistics:
           churn_count  line_count  relative_churn
extension
.py                  2        85.0            0.13

(.devml) ➜ devml git:(master) dml gstats metachurn --path /Users/noahgift/src/devml --ext .py --statistic median
2017-10-15 12:40:10,999 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/devml]
MEDIAN Statistics:
           churn_count  line_count  relative_churn
extension
.py                  1        62.5            0.02

(.devml) ➜ devml git:(master) dml gstats metachurn --path /Users/noahgift/src/cpython --ext .py --statistic median
2017-10-15 12:42:19,260 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/cpython]
MEDIAN Statistics:
           churn_count  line_count  relative_churn
extension
.py                  7       169.5             0.1
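The median figures above summarize per-file churn grouped by extension. As a small sketch of that grouping step, assuming a plain mapping from file path to churn count as input, the standard library is enough. The function name median_churn_by_extension is hypothetical, and devml itself presumably does this with pandas rather than with this code.

```python
import os
from collections import defaultdict
from statistics import median


def median_churn_by_extension(churn_by_file):
    """Map each file extension to the median churn count of its files.

    Hypothetical helper; `churn_by_file` maps a path to its churn count.
    """
    grouped = defaultdict(list)
    for path, churn in churn_by_file.items():
        grouped[os.path.splitext(path)[1]].append(churn)
    return {ext: median(values) for ext, values in grouped.items()}


print(median_churn_by_extension({"a.py": 1, "b.py": 3, "c.py": 8, "README.md": 2}))
```

The median is less sensitive than the mean to a handful of very hot files, which is why it is a reasonable statistic for comparing repositories of different sizes.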
Compare the CPython active ratio with the Linux active ratio
# Linux Development Active Ratio
dml gstats activity --path /Users/noahgift/src/linux --sort active_days
                 author_name  active_days  active_duration  active_ratio
14541           Takashi Iwai         1677        4590 days      0.370000
4382            Eric Dumazet         1460        4504 days      0.320000
3641         David S. Miller         1428        4513 days      0.320000
7216           Johannes Berg         1329        4328 days      0.310000
8717          Linus Torvalds         1281        4565 days      0.280000
275                  Al Viro         1249        4562 days      0.270000
9915   Mauro Carvalho Chehab         1227        4464 days      0.270000
9375              Mark Brown         1198        4187 days      0.290000
3172           Dan Carpenter         1158        3972 days      0.290000
12979           Russell King         1141        4602 days      0.250000
1683                Axel Lin         1040        2720 days      0.380000
400             Alex Deucher         1036        3497 days      0.300000

# CPython Development Active Ratio
            author_name  active_days  active_duration  active_ratio
146    Guido van Rossum         2256        9673 days      0.230000
301   Raymond Hettinger         1361        5635 days      0.240000
128          Fred Drake         1239        5335 days      0.230000
47    Benjamin Peterson         1234        3494 days      0.350000
132        Georg Brandl         1080        4091 days      0.260000
375      Victor Stinner          980        2818 days      0.350000
235    Martin v. Löwis           958        5266 days      0.180000
36       Antoine Pitrou          883        3376 days      0.260000
362          Tim Peters          869        5060 days      0.170000
164         Jack Jansen          800        4998 days      0.160000
24   Andrew M. Kuchling          743        4632 days      0.160000
330    Serhiy Storchaka          720        1759 days      0.410000
44         Barry Warsaw          696        8485 days      0.080000
52         Brett Cannon          681        5278 days      0.130000
262        Neal Norwitz          559        2573 days      0.220000

In this analysis, Guido van Rossum of CPython has a 23% probability of committing on a given day, and Linus Torvalds of Linux has a 28% probability.
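The figures in the tables are consistent with active_ratio being active_days divided by active_duration in days (for example, 2256 / 9673 ≈ 0.23). The sketch below computes the three columns for one author under that assumption; it is inferred from the output, not taken from devml's source, and the function name activity_stats is hypothetical.

```python
from datetime import date


def activity_stats(commit_dates):
    """Return (active_days, active_duration_days, active_ratio) for one author.

    Assumes active_ratio = distinct commit days / days between first and
    last commit, which matches the table values but is an inference.
    """
    days = sorted(set(commit_dates))  # several commits on one day count once
    active_days = len(days)
    duration = (days[-1] - days[0]).days
    # Guard against a single-day history, where the duration is zero.
    ratio = round(active_days / duration, 2) if duration else 1.0
    return active_days, duration, ratio


dates = [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 11)]
print(activity_stats(dates))
```

Read as a probability, the ratio only describes how often commits landed within the author's own active window; it says nothing about work that did not result in a commit.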
Deletion statistics
Find all files that have been deleted from a repository
dml gstats deleted --path /Users/noahgift/src/flask
DELETION STATISTICS
                           files  ext
0    b'tests/test_deprecations.py'  .py
1   b'scripts/flask-07-upgrade.py'  .py
2         b'flask/ext/__init__.py'  .py
3              b'flask/exthook.py'  .py
4    b'scripts/flaskext_compat.py'  .py
5             b'tests/test_ext.py'  .py
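One way to obtain a list like this from git is `git log --diff-filter=D --name-only --pretty=format:`, which prints only the paths removed by each commit. Whether devml uses that exact command is an assumption; the sketch below simply parses such output into (file, extension) pairs, and the function name deleted_files is hypothetical.

```python
import os


def deleted_files(git_log_output):
    """Parse deleted paths from git log output into (path, extension) pairs.

    Hypothetical helper; expects the output of
    `git log --diff-filter=D --name-only --pretty=format:`.
    """
    seen = []
    for line in git_log_output.splitlines():
        path = line.strip()
        # keep first-seen order and drop repeats (a file deleted twice)
        if path and path not in seen:
            seen.append(path)
    return [(path, os.path.splitext(path)[1]) for path in seen]


sample = "tests/test_ext.py\n\nflask/exthook.py\ntests/test_ext.py\n"
print(deleted_files(sample))
```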
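FAQ
What is churn and why do I care?
Code churn is the number of times a file has been modified. Relative churn is the number of times it has been modified relative to its lines of code. Research on software defects has shown that relative code churn is highly predictive of defects: the greater the relative churn, the greater the number of defects.
"Increase in relative code churn measures is accompanied by an increase in system defect density;"
You can read the entire study here: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/icse05churn.pdf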
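The definition of relative churn can be written as a one-line calculation. The sketch below assumes relative churn is the churn count divided by the file's current line count, rounded to two places; that matches the flask figures earlier in this document (316 changes over 2183 lines gives 0.14) but is inferred from the output rather than taken from devml's source. The function name relative_churn is hypothetical.

```python
def relative_churn(churn_count, line_count):
    """Churn count divided by line count, rounded to two decimal places.

    Hypothetical helper; returns None when the line count is missing or
    zero (for example a file that no longer exists), since the ratio is
    undefined in that case.
    """
    if not line_count:
        return None
    return round(churn_count / line_count, 2)


print(relative_churn(316, 2183))  # flask/app.py from the churn report
print(relative_churn(65, 49))     # flask/__init__.py: small file, many edits
```

A value above 1 means the file has been modified more times than it has lines, which is the kind of hotspot the study suggests inspecting first.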
Project details
Hashes for devml-0.5.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | fd5d92ffe31f01dde01626ca5703db999749e1d6e88e44d59b70c477ee6fa6f3
MD5 | 4d6fccd00648df0634310d6c87ab410a
BLAKE2b-256 | fcb77fc5c02b9741ba3ba5966347f1231fc0e6f384f3311a3a00f514ac282165