
Python package that wraps some of the ENCODE project APIs.

Project description



There is a short notebook containing a tutorial.

How do I install this package?

As usual, just install it with pip:

pip install encodeproject

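Usage examples

The package contains methods for querying the ENCODE Project API and methods for filtering the responses. Every available method comes with a complete docstring, so feel free to read the source code.

Queries

The library currently provides query methods with some filtering attributes already built in: one for experiments and one for biosamples.

To query experiments, you can run the following: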

from encodeproject import experiment

experiments = experiment()

Let's look at an in-depth example that shows all the available parameters:

from encodeproject import experiment

experiments = experiment(
    # The cell line we are interested in.
    # For example values could be K562 or GM12878.
    # We use None to specify that we are not
    # interested in any particular cell line.
    cell_line = None,
    # The reference genomic assembly we want.
    # For example values could be hg19 or GRCh38
    # We use None to specify that we are not
    # interested in any particular genomic assembly.
    assembly = None,
    # The target we want (in this context, the protein or histone mark assayed).
    # For example values could be CTCF or H3K27ac
    # We use None to specify that we are not
    # interested in any particular target.
    target = None,
    # The status of the data we want.
    # We only want released data, meaning data that are
    # neither old (archived) nor erroneous (revoked).
    status = 'released',
    # The organism we are considering.
    # Since we only want Homo sapiens data,
    # we specify that organism name.
    organism = 'Homo sapiens',
    # The format of the files we are interested in
    file_type = 'bigWig',
    # We only consider experiments with replicates
    replicated = True,
    # We only want signals
    # expressed as "fold change over control"
    searchTerm = "fold change over control",
    # We do not need to specify any other specific
    # additional parameters
    parameters = None,
    # We want to download all the
    # available experiments
    limit = 'all',
    # We want to drop all the experiments
    # flagged with significant issues
    drop_errors = (
        'extremely low read depth',
        'missing control alignments',
        'control extremely low read depth',
        'extremely low spot score',
        'extremely low coverage',
        'extremely low read length',
        'inconsistent control',
        'inconsistent read count'
    )
)

All parameters are optional; they simply act as additional filters.
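To illustrate what "acting as filters" can mean, here is a minimal sketch that turns a few of the arguments above into an ENCODE portal search URL, skipping every argument left as None, and then drops experiments whose audits match drop_errors. It is not the library's implementation: the helper names (build_search_url, drop_flagged), the mapping from argument names to portal field names, and the assumed shape of the "audit" field (lists of dicts with a "category" key, grouped by level) are all hypothetical, and only a subset of the arguments is covered.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.encodeproject.org/search/"

# Hypothetical mapping from keyword arguments to portal search fields;
# these field names are assumptions, not taken from encodeproject.
FIELD_NAMES = {
    "cell_line": "biosample_ontology.term_name",
    "assembly": "assembly",
    "target": "target.label",
    "status": "status",
    "organism": "replicates.library.biosample.donor.organism.scientific_name",
    "file_type": "files.file_type",
    "searchTerm": "searchTerm",
}


def build_search_url(limit="all", **filters):
    """Hypothetical helper: build a search URL, ignoring filters set to None."""
    params = {"type": "Experiment", "format": "json", "limit": limit}
    for name, value in filters.items():
        if value is not None:  # None means "no constraint on this field"
            params[FIELD_NAMES[name]] = value
    return SEARCH_URL + "?" + urlencode(params)


def drop_flagged(experiments, drop_errors):
    """Hypothetical helper: keep experiments with no audit category in drop_errors."""
    blocked = set(drop_errors)
    kept = []
    for experiment in experiments:
        # Audits are assumed to be grouped by level (e.g. "ERROR"),
        # each level holding a list of dicts with a "category" string.
        categories = {
            audit.get("category")
            for audits in experiment.get("audit", {}).values()
            for audit in audits
        }
        if not categories & blocked:
            kept.append(experiment)
    return kept
```

For example, build_search_url(status="released", organism="Homo sapiens") adds only those two constraints to the query string, just as leaving cell_line=None in the call above imposes no constraint on the cell line.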

To query a biosample, you can run the following:

from encodeproject import biosample

my_biosample_query_response = biosample(
    accession="ENCSR000EDP", # The accession code for the desired biosample
)
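For reference, the ENCODE portal also serves individual objects as JSON at a URL built from the accession. The sketch below shows one way to fetch such an object with the standard library; the helper names object_url and fetch_object are hypothetical, and it is an assumption that biosample() does something comparable internally.

```python
import json
from urllib.request import Request, urlopen

PORTAL = "https://www.encodeproject.org"


def object_url(accession):
    """Hypothetical helper: URL of the JSON representation of one ENCODE object."""
    return f"{PORTAL}/{accession}/?format=json"


def fetch_object(accession, timeout=30):
    """Hypothetical helper: download and decode one object (needs network access)."""
    request = Request(object_url(accession), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_object("ENCSR000EDP") returns the decoded JSON as a plain dict.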

As with experiments, many filters are available:

from encodeproject import biosamples

# Example accession codes to retrieve (for instance obtained with accessions())
accession_codes = ["ENCSR000EDP", "ENCSR030EDP", "ENCSR067EDP"]

hg19_samples = biosamples(
    # The list of accessions to retrieve
    accessions=accession_codes,
    # Whether to convert the results into a DataFrame.
    # The following filters only apply if DataFrames are used
    to_dataframe = True,
    # The status of the data we want.
    # We only want released data, meaning data that are
    # neither old (archived) nor erroneous (revoked).
    status = "released",
    # The organism we want.
    organism = "human",
    # The genomic assembly we want to use
    assembly = "hg19",
    # The output type we want.
    output_type = "fold change over control",
    # And finally the minimum number
    # of biological replicates
    min_biological_replicates = 2
)
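Since these filters only apply once the results are tabular, a minimal sketch of the same filtering on plain row dicts may help. The helper filter_rows and the column names ("status", "organism", "assembly", "output_type", "biological_replicates" as a count) are hypothetical; the real DataFrame produced by the library may use different columns.

```python
def filter_rows(rows, status="released", organism="human", assembly="hg19",
                output_type="fold change over control", min_biological_replicates=2):
    """Hypothetical helper: keep rows matching every filter; None disables a filter."""
    wanted = {
        "status": status,
        "organism": organism,
        "assembly": assembly,
        "output_type": output_type,
    }
    kept = []
    for row in rows:
        # Reject the row if any active filter disagrees with its value.
        if any(value is not None and row.get(column) != value
               for column, value in wanted.items()):
            continue
        # Reject rows with too few biological replicates.
        if row.get("biological_replicates", 0) < min_biological_replicates:
            continue
        kept.append(row)
    return kept
```

With pandas, the same idea is usually expressed as boolean masks over the corresponding columns.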

To run several biosample queries at once, you can run the following:

from encodeproject import biosamples

responses = biosamples(
    accessions=["ENCSR000EDP", "ENCSR030EDP", "ENCSR067EDP"], # The accession codes of the desired biosamples
)
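The text does not say whether biosamples() runs its queries one after another or concurrently. As a general illustration only, independent queries can be issued concurrently with a thread pool; in this sketch query_many is a hypothetical helper and fetch stands for any function taking one accession (for example a single-object fetcher).

```python
from concurrent.futures import ThreadPoolExecutor


def query_many(fetch, accessions, max_workers=4):
    """Hypothetical helper: call fetch(accession) concurrently.

    Returns a dict mapping each accession to its result, in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = list(pool.map(fetch, accessions))
    return dict(zip(accessions, results))
```

Threads suit this case because each query spends most of its time waiting on the network.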

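Filters

Since response files can be large and hard to read, I have also written some filter functions.

To extract the accession codes from an experiment response, you can use: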

from encodeproject import accessions

codes = accessions(my_experiment_query_response)
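To show the idea behind such a filter, here is a minimal sketch assuming the response is the parsed JSON of an ENCODE portal search, whose hits sit in a list under the "@graph" key. The helper extract_accessions is hypothetical and not the library's accessions() implementation.

```python
def extract_accessions(response):
    """Hypothetical helper: accession of every hit in a parsed search response."""
    return [hit["accession"] for hit in response.get("@graph", []) if "accession" in hit]
```

Hits without an "accession" key are skipped rather than raising an error.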

To extract the download URLs from a biosample response, you can use:

from encodeproject import download_urls

urls = download_urls(my_biosample_query_response)
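A minimal sketch of how download URLs can be derived, assuming the response embeds a "files" list whose entries carry a portal-relative "href" (typically of the form /files/<accession>/@@download/<accession>.<ext>) and a "file_type". The helper extract_download_urls and its optional file_type argument are hypothetical, not the library's download_urls().

```python
from urllib.parse import urljoin

PORTAL = "https://www.encodeproject.org"


def extract_download_urls(obj, file_type=None):
    """Hypothetical helper: absolute download URLs of the embedded files."""
    urls = []
    for entry in obj.get("files", []):
        # Optionally keep only one file type, e.g. "bigWig".
        if file_type is not None and entry.get("file_type") != file_type:
            continue
        href = entry.get("href")
        if href:
            # Relative hrefs are resolved against the portal root.
            urls.append(urljoin(PORTAL, href))
    return urls
```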

Utilities

Download utility

I have also added a method that downloads from a given URL while showing a progress bar, based on this answer on StackOverflow:

from encodeproject import download

download("https://encode-public.s3.amazonaws.com/2012/07/01/074e1b37-2be1-4f6a-aa42-6c512fd1834b/ENCFF000XOW.bigWig")
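The general technique behind a download with a progress bar is to read the response in chunks and redraw a text bar after each one, using the Content-Length header as the total. The sketch below uses only the standard library; copy_with_progress and download_with_progress are hypothetical names, and this is not the StackOverflow-based code the library uses.

```python
import io
import sys
from urllib.request import urlopen


def copy_with_progress(src, dst, total=None, chunk_size=8192, width=40, out=sys.stderr):
    """Hypothetical helper: copy src to dst in chunks, drawing a bar; return bytes copied."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        if total:
            filled = min(width, int(width * copied / total))
            # "\r" moves back to the line start so the bar redraws in place.
            out.write("\r[" + "#" * filled + "." * (width - filled) + f"] {copied}/{total} B")
            out.flush()
    if total:
        out.write("\n")
    return copied


def download_with_progress(url, path):
    """Hypothetical helper: stream url into path (needs network access)."""
    with urlopen(url) as response, open(path, "wb") as handle:
        length = response.headers.get("Content-Length")
        return copy_with_progress(response, handle, int(length) if length else None)


if __name__ == "__main__":
    # Offline demo: copy 20 kB held in memory instead of a real download.
    copy_with_progress(io.BytesIO(b"x" * 20000), io.BytesIO(), total=20000)
```

When the server sends no Content-Length, the copy still works but no bar is drawn.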

Converting biosamples to a DataFrame

A utility to convert biosamples into a relatively simple pandas DataFrame:

from encodeproject import biosample_to_dataframe

df = biosample_to_dataframe(my_biosample_query_response)
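A common first step when turning nested JSON records into a table is to flatten nested dicts into dotted column names. The sketch below shows that step in plain Python; the helper flatten is hypothetical, and it is an assumption that biosample_to_dataframe does anything similar.

```python
def flatten(record, parent_key="", sep="."):
    """Hypothetical helper: flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key prefix.
            flat.update(flatten(value, name, sep))
        else:
            # Lists and scalars are kept as single cell values.
            flat[name] = value
    return flat
```

The resulting rows can be passed to pandas.DataFrame([flatten(r) for r in records]); pandas.json_normalize performs a similar flattening directly.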

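Issues and feature requests

This library was originally created to write a few queries against the ENCODE project. If you need a specific feature that the library does not yet provide, please open a pull request (the fastest way: add the feature yourself and push it to the library), or open an issue and I will work on it when I have time.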



Download files


Source distribution

encodeproject-1.0.28.tar.gz (8.5 kB)

