A Python API for inferring missing data in sparsely measured genotype-phenotype maps.
GPSeer
Simple software for inferring missing data in sparsely measured genotype-phenotype maps
Basic usage
Install gpseer using pip
pip install gpseer
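To use GPSeer from the command line, call gpseer on an input .csv file containing genotype-phenotype data.
API Demo.ipynb demonstrates how to use GPSeer in a Jupyter notebook.
Download examples
To get started, use GPSeer's fetch-example command to download the examples from its GitHub repository.
Download the gpseer examples and explore the example input data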
# fetch data from Github page.
> gpseer fetch-example
[GPSeer] Downloading files to /examples...
[GPSeer] └──>: 100%|██████████████████| 3/3 [00:00<00:00, 9.16it/s]
[GPSeer] └──> Done!
# Change into the example directory and check out the files that were downloaded
> cd examples/
> ls
API Demo.ipynb
example-full.csv
example-test.csv
example-train.csv
Generate Dataset.ipynb
genotypes.txt
pfcrt-raw-data.csv
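Before fitting anything, it can help to check what one of the downloaded input files contains. The sketch below uses only the Python standard library and makes no assumption about the column names GPSeer expects. `summarize_csv` is a hypothetical helper written for this illustration, not part of GPSeer.

```python
import csv


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Iterating the reader consumes the header first, then the data rows.
        n_rows = sum(1 for _ in reader)
        columns = reader.fieldnames or []
    return list(columns), n_rows


if __name__ == "__main__":
    # Assumes the script is run from inside the examples/ directory.
    columns, n_rows = summarize_csv("example-train.csv")
    print("columns:", columns)
    print("rows:", n_rows)
```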
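Predict missing data using a maximum likelihood (ML) model.
Estimate a maximum likelihood additive model on the training set and predict all missing genotypes. Predictions are written to a file named "example-train_predictions.csv".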
> gpseer estimate-ml example-train.csv
[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Constructing a model...
[GPSeer] └──> Done constructing model.
[GPSeer] Fitting data...
[GPSeer] └──> Done fitting data.
[GPSeer] Predicting missing data...
[GPSeer] └──> Done predicting.
[GPSeer] Calculating fit statistics...
[GPSeer]
Fit statistics:
---------------
parameter value
0 num_genotypes 128
1 num_unique_mutations 8
2 explained_variation 0.985186
3 num_parameters 9
4 num_obs_to_converge 2.82714
5 threshold None
6 spline_order None
7 spline_smoothness None
8 epistasis_order 1
[GPSeer]
Convergence:
------------
mutation num_obs num_obs_above fold_target converged
0 F0K 64 64 22.637735 True
1 S1Y 69 69 24.406308 True
2 Q2T 63 63 22.284020 True
3 R3V 70 70 24.760023 True
4 N4D 62 62 21.930306 True
5 A5C 69 69 24.406308 True
6 C6D 65 65 22.991450 True
7 C7A 64 64 22.637735 True
[GPSeer] └──> Done.
[GPSeer] Writing phenotypes to example-train_predictions.csv...
[GPSeer] └──> Done writing predictions!
[GPSeer] Writing plots...
[GPSeer] Writing example-train_correlation-plot.pdf...
[GPSeer] Writing example-train_phenotype-histograms.pdf...
[GPSeer] └──> Done plotting!
[GPSeer] GPSeer finished!
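To make the idea of an additive model concrete, here is a minimal sketch in NumPy. It is not GPSeer's implementation. It assumes each site has one wildtype state and at most one mutant state, and it fits the model by ordinary least squares, which coincides with the maximum likelihood estimate under independent Gaussian noise. The function names, the three-site genotypes, and the phenotype values are all invented for illustration. The run above reports 9 parameters for 8 unique mutations, which would be consistent with this kind of "intercept plus one effect per mutation" structure.

```python
import numpy as np


def design_matrix(genotypes, wildtype):
    """One row per genotype: an intercept column plus one 0/1 column per site."""
    rows = [[1.0] + [float(g != w) for g, w in zip(genotype, wildtype)]
            for genotype in genotypes]
    return np.array(rows)


def fit_additive(genotypes, phenotypes, wildtype):
    """Least-squares estimate of the intercept and per-site mutation effects."""
    X = design_matrix(genotypes, wildtype)
    y = np.asarray(phenotypes, dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


def predict_additive(genotypes, coefs, wildtype):
    """Predict phenotypes as the intercept plus the sum of mutation effects."""
    return design_matrix(genotypes, wildtype) @ coefs


if __name__ == "__main__":
    # Toy data (hypothetical): five measured genotypes, three unmeasured ones.
    wildtype = "AAA"
    observed = {"AAA": 1.0, "BAA": 1.5, "ABA": 0.8, "AAB": 1.3, "BBA": 1.3}
    missing = ["ABB", "BAB", "BBB"]
    coefs = fit_additive(list(observed), list(observed.values()), wildtype)
    for genotype, value in zip(missing, predict_additive(missing, coefs, wildtype)):
        print(genotype, round(float(value), 3))
```

Because the model has no interaction terms, a genotype that was never measured can still be predicted as long as each of its mutations appears in at least one measured genotype.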
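Compute the predictive power of the model by cross-validation
Use the cross-validate subcommand to evaluate how well your model predicts data. Try the example below, in which we generate 100 subsets from the data and compute the prediction scores.
> gpseer cross-validate example-train.csv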
[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Fitting all data data...
[GPSeer] └──> Done fitting data.
[GPSeer] Sampling the data...
[GPSeer] └──>: 100%|████████████████████| 100/100 [00:03<00:00, 25.90it/s]
[GPSeer] └──> Done sampling data.
[GPSeer] Plotting example-train_cross-validation-plot.pdf...
[GPSeer] └──> Done writing data.
[GPSeer] Writing scores to example-train_cross-validation-scores.csv...
[GPSeer] └──> Done writing data
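The resampling procedure described above can be sketched as repeated random train/test splits. This is an illustration under stated assumptions, not GPSeer's code: the 80% training fraction, the use of R² on the held-out genotypes as the "prediction score", and the synthetic additive data are all choices made for this example.

```python
import itertools

import numpy as np


def cross_validate(X, y, n_samples=100, train_fraction=0.8, seed=0):
    """Return one held-out R^2 score per random train/test split."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * len(y)))
    scores = []
    for _ in range(n_samples):
        order = rng.permutation(len(y))
        train, test = order[:n_train], order[n_train:]
        # Fit the additive model on the training subset only.
        coefs, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        predicted = X[test] @ coefs
        ss_res = np.sum((y[test] - predicted) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)


def synthetic_map(effects, noise_sd=0.05, seed=1):
    """Build all binary genotypes and noisy additive phenotypes (toy data)."""
    n_sites = len(effects) - 1  # the first entry is the intercept
    binary = np.array(list(itertools.product([0.0, 1.0], repeat=n_sites)))
    X = np.hstack([np.ones((len(binary), 1)), binary])
    rng = np.random.default_rng(seed)
    y = X @ np.asarray(effects) + rng.normal(0.0, noise_sd, size=len(X))
    return X, y


if __name__ == "__main__":
    X, y = synthetic_map([1.0, 0.5, -0.4, 0.3, 0.6, -0.2, 0.7])
    scores = cross_validate(X, y, n_samples=100)
    print("mean held-out R^2:", scores.mean())
```

Scores that stay high across the subsets suggest the model generalizes to unmeasured genotypes. A wide spread suggests the predictions depend heavily on which genotypes happened to be measured.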
Hashes for gpseer-0.3.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 352e6b9371666c331369091fa43296d95c72e8f3193f401c2844a221fadbdc8c
MD5 | 13d666e452b112a1ea20644d0c9f65c3
BLAKE2b-256 | 2930561d71db5333d7e158737e2b2e09a217a812bbffa2bdba57240330dffe92
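If you download the source archive manually, you can compare its digest against the SHA256 value listed above. The sketch below uses only `hashlib` from the standard library. The local file name and the helper `sha256_of` are assumptions made for this example.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Expected value copied from the hash table for gpseer-0.3.3.tar.gz.
    expected = "352e6b9371666c331369091fa43296d95c72e8f3193f401c2844a221fadbdc8c"
    # Assumes the archive was saved to the current directory under this name.
    actual = sha256_of("gpseer-0.3.3.tar.gz")
    print("match" if actual == expected else "MISMATCH")
```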