clear-html · PyPI · Python Package Index

Clean and normalize HTML.

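Project description

Clean and normalize HTML. Embedded content (e.g. Twitter, Instagram, etc.) is preserved.

Quick start

Installation

Install the library with pip: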

pip install clear-html

Usage

Example usage with lxml:

from lxml.html import fromstring
from clear_html import clean_node, cleaned_node_to_html

html="""
        <div style="color:blue" id="main_content">
            Some text to be
            <div>cleaned up!</div>
        </div>
     """
node = fromstring(html)
cleaned_node = clean_node(node)
cleaned_html = cleaned_node_to_html(cleaned_node)
print(cleaned_html)

Example usage with Parsel:

from parsel import Selector
from clear_html import clean_node, cleaned_node_to_html

selector = Selector(text="""<html>
                            <body>
                                <h1>Hello!</h1>
                                <div style="color:blue" id="main_content">
                                    Some text to be
                                    <div>cleaned up!</div>
                                </div>
                            </body>
                            </html>""")
selector = selector.css("#main_content")
cleaned_node = clean_node(selector[0].root)
cleaned_html = cleaned_node_to_html(cleaned_node)
print(cleaned_html)

Both approaches above print the following:

<article>

<p>Some text to be</p>

<p>cleaned up!</p>

</article>

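Other interesting functions

cleaned_node_to_text: converts a cleaned node to plain text
formatted_text.clean_doc: a low-level method that gives control over more aspects of the cleaning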

Release history

0.4.1 (this version)
April 30, 2024

0.4.0
August 29, 2023

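Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clear_html-0.4.1.tar.gz (23.9 kB)

Uploaded April 30, 2024, Source

Built Distribution

clear_html-0.4.1-py3-none-any.whl (24.7 kB)

Uploaded April 30, 2024, Python 3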

Hashes for clear_html-0.4.1.tar.gz

Algorithm	Hash digest
SHA256	`711957bb03b0729caa257679e15881f9e0eeea27236b5c18eac1e75b8af06b06`
MD5	`f9bcf9d2d62dc0724fab546af717b67d`
BLAKE2b-256	`7a28d08437394b1b28e46fd804a99b3ba2e6dc3a1103ac14b097f04ea442bb26`

Hashes for clear_html-0.4.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`a270ed4d78bda7f8d9e308c7c4fa5ebe2bdcf39280730a448064ad677a0a76cf`
MD5	`55f9c42f64099028b74c08742ed731da`
BLAKE2b-256	`d11c349aa7cf8ac99c27a9afd1b27f4c1e5a9a913ae0b6f3fdc988e60b56116c`
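
One way to check a downloaded file against the digests listed above is to recompute its SHA256 locally. The following is a minimal sketch using only the Python standard library; the helper names sha256_of_file and matches_published_digest are illustrative assumptions and are not part of clear-html or of any PyPI tooling.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for clear_html-0.4.1.tar.gz.
PUBLISHED_SHA256 = "711957bb03b0729caa257679e15881f9e0eeea27236b5c18eac1e75b8af06b06"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=PUBLISHED_SHA256):
    """Return True if the file's SHA256 equals `expected`, ignoring case."""
    return sha256_of_file(path) == expected.strip().lower()
```

Assuming the source distribution was saved locally under its original name, matches_published_digest("clear_html-0.4.1.tar.gz") should return True for an intact download. To check the wheel, pass its SHA256 digest from the table as the expected argument.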
