A Python package wrapping some of the ENCODE project APIs.
A short notebook with a tutorial is also available.
How do I install this package?
As usual, just install it with pip:
pip install encodeproject
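Usage examples
The package provides methods to query the ENCODE Project API and methods to filter the responses. Every available method comes with a complete docstring, so feel free to read the source code.
Queries
The library currently provides two query methods, each with a number of filtering attributes built in: one for experiments and one for biosamples.
To query experiments, you can run the following: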
from encodeproject import experiment
experiments = experiment()
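To give an idea of what such a query amounts to, here is a minimal sketch that builds a search URL for the public ENCODE portal using only the standard library. The helper `build_experiment_query` is hypothetical and is not part of this package; it is only assumed that the package performs a comparable request. It covers a small subset of filters and does not send any request.

```python
from urllib.parse import urlencode

# Search endpoint of the ENCODE portal.
ENCODE_SEARCH_URL = "https://www.encodeproject.org/search/"


def build_experiment_query(assembly=None, status="released",
                           search_term=None, limit="all"):
    """Build a search URL for ENCODE experiments (hypothetical helper).

    Filters left as None are omitted from the query string.
    """
    params = {
        "type": "Experiment",
        "format": "json",
        "limit": limit,
        "status": status,
        "assembly": assembly,
        "searchTerm": search_term,
    }
    # Keep only the filters the caller actually set.
    params = {key: value for key, value in params.items() if value is not None}
    return ENCODE_SEARCH_URL + "?" + urlencode(params)


url = build_experiment_query(
    assembly="hg19",
    search_term="fold change over control",
)
print(url)
```

Passing `None` for a filter simply leaves it out of the URL, which mirrors the way optional parameters act as additional filters in the package.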
Let's look at an in-depth example that shows all the available parameters:
from encodeproject import experiment
experiments = experiment(
    # The cell line we are interested in.
    # Example values are K562 or GM12878.
    # We use None to specify that we are not
    # interested in any particular cell line.
    cell_line = None,
    # The reference genomic assembly we want.
    # Example values are hg19 or GRCh38.
    # We use None to specify that we are not
    # interested in any particular genomic assembly.
    assembly = None,
    # The target (the genes coding for proteins in this context) we want.
    # Example values are CTCF or H3K27ac.
    # We use None to specify that we are not
    # interested in any particular target.
    target = None,
    # The status of the data we want.
    # We only want released data, meaning data that are
    # neither old (archived) nor with errors (revoked).
    status = 'released',
    # The organism we are considering.
    # Since we only want Homo sapiens data,
    # we specify that organism name.
    organism = 'Homo sapiens',
    # The format of the files we are interested in.
    file_type = 'bigWig',
    # We ask to consider only experiments with replicates.
    replicated = True,
    # We only want the signals
    # expressed as "fold change over control".
    searchTerm = "fold change over control",
    # We do not need to specify any other
    # additional parameters.
    parameters = None,
    # We want to download all the
    # available experiments.
    limit = 'all',
    # We want to drop all the experiments
    # that have been flagged with significant issues.
    drop_errors = (
        'extremely low read depth',
        'missing control alignments',
        'control extremely low read depth',
        'extremely low spot score',
        'extremely low coverage',
        'extremely low read length',
        'inconsistent control',
        'inconsistent read count'
    )
)
All parameters are optional; they only act as additional filters.
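To illustrate what a filter such as `drop_errors` could do, the following sketch drops experiments whose audit entries match one of the given categories. It assumes each experiment is a dictionary with an optional `audit` mapping from audit level to a list of entries carrying a `category` field; the helpers `has_flagged_audit` and `drop_flagged` are hypothetical, the accessions are placeholders, and the package's own implementation may differ.

```python
def has_flagged_audit(experiment, drop_errors):
    """Return True if any audit entry has a category listed in drop_errors."""
    # Assumed layout: {"audit": {"ERROR": [{"category": "..."}], ...}}
    audit = experiment.get("audit", {})
    return any(
        entry.get("category") in drop_errors
        for entries in audit.values()
        for entry in entries
    )


def drop_flagged(experiments, drop_errors):
    """Keep only the experiments without a flagged audit category."""
    return [
        experiment
        for experiment in experiments
        if not has_flagged_audit(experiment, drop_errors)
    ]


# Placeholder records, not real ENCODE data.
sample_experiments = [
    {"accession": "EXP_A",
     "audit": {"ERROR": [{"category": "extremely low read depth"}]}},
    {"accession": "EXP_B",
     "audit": {"WARNING": [{"category": "some other category"}]}},
    {"accession": "EXP_C"},
]

kept = drop_flagged(
    sample_experiments,
    ("extremely low read depth", "inconsistent control"),
)
print([experiment["accession"] for experiment in kept])
```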
To query a biosample, you can run the following:
from encodeproject import biosample
my_biosample_query_response = biosample(
    accession="ENCSR000EDP", # The accession code for the desired biosample
)
As with experiments, a number of filters are available:
from encodeproject import biosamples
# `accession_codes` is assumed to be a list of accession code strings defined earlier.
hg19_samples = biosamples(
    # The list of accessions to retrieve.
    accessions=accession_codes,
    # Whether to convert the results to a dataframe.
    # The following filters only apply if dataframes are used.
    to_dataframe = True,
    # The status of the data we want.
    # We only want released data, meaning data that are
    # neither old (archived) nor with errors (revoked).
    status = "released",
    # The organism we want.
    organism = "human",
    # The genomic assembly we want to use.
    assembly = "hg19",
    # The output type we want.
    output_type = "fold change over control",
    # And finally the minimum number
    # of biological replicates.
    min_biological_replicates = 2
)
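As a rough illustration of these filters, the sketch below applies the same criteria to a list of file records. It assumes each record is a dictionary with `status`, `assembly`, `output_type` and `biological_replicates` keys, and that the replicate threshold is checked per file; `filter_files` is a hypothetical helper and the records are placeholders, so the package may well apply the filters differently.

```python
def filter_files(files, status=None, assembly=None, output_type=None,
                 min_biological_replicates=0):
    """Keep the file records matching every filter that is not None."""
    wanted = {"status": status, "assembly": assembly, "output_type": output_type}
    kept = []
    for record in files:
        # Skip the record if any requested field does not match.
        if any(value is not None and record.get(key) != value
               for key, value in wanted.items()):
            continue
        # Assumption: replicates are counted per file record.
        if len(record.get("biological_replicates", [])) < min_biological_replicates:
            continue
        kept.append(record)
    return kept


# Placeholder records, not real ENCODE data.
sample_files = [
    {"accession": "FILE_A", "status": "released", "assembly": "hg19",
     "output_type": "fold change over control", "biological_replicates": [1, 2]},
    {"accession": "FILE_B", "status": "released", "assembly": "GRCh38",
     "output_type": "fold change over control", "biological_replicates": [1, 2]},
    {"accession": "FILE_C", "status": "released", "assembly": "hg19",
     "output_type": "fold change over control", "biological_replicates": [1]},
]

selected = filter_files(
    sample_files,
    status="released",
    assembly="hg19",
    output_type="fold change over control",
    min_biological_replicates=2,
)
print([record["accession"] for record in selected])
```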
To run multiple biosample queries at once, you can run the following:
from encodeproject import biosamples
responses = biosamples(
    accessions=["ENCSR000EDP", "ENCSR030EDP", "ENCSR067EDP"], # The accession codes for the desired biosamples
)
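Filters
Since the responses can be large and hard to read, I have also prepared some filter functions.
To extract the accession codes from an experiment response, you can use: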
from encodeproject import accessions
codes = accessions(my_experiment_query_response)
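For readers curious about what such a filter involves, here is a minimal sketch that pulls accession codes out of a search response. It assumes the response is a dictionary whose `@graph` key holds the list of results, each with an `accession` field; `extract_accessions` is a hypothetical name and not the package's own function.

```python
def extract_accessions(response):
    """Return the accession codes found in a search response."""
    # Assumed layout: {"@graph": [{"accession": "..."}, ...]}
    return [
        item["accession"]
        for item in response.get("@graph", [])
        if "accession" in item
    ]


# Hand-written response trimmed to the relevant fields.
sample_response = {
    "@graph": [
        {"accession": "ENCSR000EDP", "status": "released"},
        {"accession": "ENCSR030EDP", "status": "released"},
        {"status": "released"},  # Entries without an accession are skipped.
    ]
}

print(extract_accessions(sample_response))
```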
To extract the download URLs from a biosample response, you can use:
from encodeproject import download_urls
codes = download_urls(my_biosample_query_response)
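A comparable sketch for download URLs is shown below. It assumes the response carries a `files` list whose entries have a relative `href`, which is joined with the portal host; `extract_download_urls` is a hypothetical helper, the sample response is hand-written, and the URLs the package returns may point elsewhere (for example to the S3 mirror).

```python
from urllib.parse import urljoin

# Host of the ENCODE portal, used to resolve relative links.
ENCODE_HOST = "https://www.encodeproject.org"


def extract_download_urls(biosample_response):
    """Return absolute download URLs for every file that has an href."""
    return [
        urljoin(ENCODE_HOST, record["href"])
        for record in biosample_response.get("files", [])
        if "href" in record
    ]


# Hand-written response trimmed to the relevant fields.
sample_biosample = {
    "accession": "ENCSR000EDP",
    "files": [
        {"href": "/files/ENCFF000XOW/@@download/ENCFF000XOW.bigWig"},
        {"status": "released"},  # No href, so this entry is skipped.
    ],
}

urls = extract_download_urls(sample_biosample)
print(urls)
```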
Utilities
Download utility
I have also added a method that downloads from a given URL while displaying a loading bar, based on an answer from StackOverflow.
from encodeproject import download
download("https://encode-public.s3.amazonaws.com/2012/07/01/074e1b37-2be1-4f6a-aa42-6c512fd1834b/ENCFF000XOW.bigWig")
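The general idea behind a download with progress reporting can be sketched as a chunked copy between two file-like objects. This is not the package's implementation: `copy_with_progress` is a hypothetical helper, and the example copies an in-memory buffer so that it runs without network access. For a real download, the source could be the object returned by `urllib.request.urlopen` and the total size could come from the `Content-Length` header.

```python
import io


def copy_with_progress(source, target, total_size, chunk_size=8192, report=print):
    """Copy source to target in chunks, reporting progress after each chunk."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        target.write(chunk)
        copied += len(chunk)
        # Fall back to 100% when the total size is unknown (zero).
        percent = 100 * copied // total_size if total_size else 100
        report(f"{copied}/{total_size} bytes ({percent}%)")
    return copied


# In-memory stand-in for a remote file.
payload = b"x" * 20000
source, target = io.BytesIO(payload), io.BytesIO()
messages = []

copied = copy_with_progress(source, target, len(payload), report=messages.append)
print(messages[-1])
```

Reading in fixed-size chunks keeps memory usage flat regardless of file size, which matters for large bigWig files.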
Converting a biosample to a DataFrame
A utility to convert a biosample into a relatively simple pandas DataFrame.
from encodeproject import biosample_to_dataframe
df = biosample_to_dataframe(my_biosample_query_response)
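The kind of flattening involved can be sketched without pandas by turning the file records into a list of flat rows, which `pandas.DataFrame(rows)` accepts directly. The helper `biosample_to_rows`, the chosen columns and the sample response are assumptions made for illustration; the columns produced by the package may differ.

```python
def biosample_to_rows(biosample_response,
                      columns=("accession", "file_type", "assembly",
                               "output_type", "status")):
    """Flatten the file records of a response into a list of flat dictionaries."""
    return [
        # Missing fields become None so that every row has the same keys.
        {column: record.get(column) for column in columns}
        for record in biosample_response.get("files", [])
    ]


# Hand-written response trimmed to the relevant fields.
sample_biosample = {
    "accession": "ENCSR000EDP",
    "files": [
        {"accession": "ENCFF000XOW", "file_type": "bigWig",
         "assembly": "hg19", "status": "released", "extra": "ignored"},
    ],
}

rows = biosample_to_rows(sample_biosample)
print(rows)
```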
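Issues and feature requests
This library started out as a way to write a few queries against encodeproject. If you need a specific feature that the library does not yet provide, please open a pull request (the quickest way: add the feature yourself and push it to the library), or open an issue and I will look into it when I have time.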