Inspect snapshots in web archives (such as the Internet Archive's Wayback Machine)

Project description

memento-cli

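A command-line tool for interacting with web archives that support Memento (RFC 7089), such as the Internet Archive's Wayback Machine.

For more background on why this tool was created, see: https://inkdroid.org/2023/09/14/memento-bisect/

Usage

Listing snapshots

To list all available snapshots (or Mementos) of the page captured in a given snapshot, use the list command: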

$ memento list https://web.archive.org/web/20230407140923/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2017-12-29 05:40:51 https://web.archive.org/web/20171229054051/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-03 20:03:00 https://web.archive.org/web/20180103200300/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-04 06:39:58 https://web.archive.org/web/20180104063958/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-06 16:08:07 https://web.archive.org/web/20180106160807/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 06:10:07 https://web.archive.org/web/20180112061007/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 17:40:16 https://web.archive.org/web/20180112174016/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 18:40:34 https://web.archive.org/web/20180112184034/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 19:11:48 https://web.archive.org/web/20180112191148/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:05:57 https://web.archive.org/web/20180120190557/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:19:20 https://web.archive.org/web/20180120191920/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
...
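
If you want to post-process this listing in a script, each line can be split into a timestamp and a snapshot URL. The following is a minimal sketch, assuming the output format is exactly "YYYY-MM-DD HH:MM:SS URL" as in the sample above; `parse_list_line` is a hypothetical helper and not part of memento-cli.

```python
from datetime import datetime


def parse_list_line(line):
    """Split one line of listing output into (datetime, url).

    Assumes the "YYYY-MM-DD HH:MM:SS <url>" layout shown in the sample output.
    """
    date_part, time_part, url = line.strip().split(" ", 2)
    when = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    return when, url


# Two lines copied from the sample listing above.
sample = """\
2017-12-29 05:40:51 https://web.archive.org/web/20171229054051/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-03 20:03:00 https://web.archive.org/web/20180103200300/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
"""

snapshots = [parse_list_line(line) for line in sample.splitlines() if line.strip()]
for when, url in snapshots:
    print(when.isoformat(), url)
```

In practice the text would come from running the list command and capturing its standard output; lines that do not follow the assumed layout would need to be skipped or handled separately.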

Because memento works with any archive that supports RFC 7089, you can also use it to list versions held in other web archives:

$ memento list https://www.webarchive.org.uk/wayback/archive/20130501020401/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-05-01 02:03:57 https://www.webarchive.org.uk/wayback/archive/20130501020357mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-05-01 02:04:01 https://www.webarchive.org.uk/wayback/archive/20130501020401mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-07-29 12:58:03 https://www.webarchive.org.uk/wayback/archive/20130729125803mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-07-29 12:58:06 https://www.webarchive.org.uk/wayback/archive/20130729125806mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2021-01-22 06:38:21 https://www.webarchive.org.uk/wayback/archive/20210122063821mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2022-03-14 16:36:16 https://www.webarchive.org.uk/wayback/archive/20220314163616mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
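
For readers curious about what RFC 7089 support means in practice: the specification defines a TimeMap, a document in link-format that lists every Memento of a resource together with its capture datetime. The sketch below parses such a document with the standard library only. It is an illustration of the format, not necessarily how memento-cli is implemented; `parse_timemap` is a hypothetical name, and the example.org/example.net URLs are placeholders in the style of the RFC's own examples.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry: <url> followed by any number of ;name="value" parameters.
ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Return a sorted list of (datetime, url) for every memento entry in a TimeMap."""
    mementos = []
    for url, raw_params in ENTRY.findall(text):
        params = dict(PARAM.findall(raw_params))
        rels = params.get("rel", "").split()
        # Entries such as rel="original" or rel="timegate" are not snapshots.
        if "memento" in rels and "datetime" in params:
            mementos.append((parsedate_to_datetime(params["datetime"]), url))
    return sorted(mementos)


# Placeholder TimeMap; the hosts and timestamps are illustrative only.
timemap = '''<http://a.example.org>;rel="original",
<http://arxiv.example.net/timegate/http://a.example.org>;rel="timegate",
<http://arxiv.example.net/web/20000620180259/http://a.example.org>;rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org>;rel="last memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT"'''

for when, url in parse_timemap(timemap):
    print(when.strftime("%Y-%m-%d %H:%M:%S"), url)
```

Note that the entries cannot simply be split on commas, because the HTTP-style datetime values contain commas themselves; matching each `<url>` with its parameters avoids that problem.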

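Searching for changes (bisect)

Suppose you know that Twitter's Hateful Conduct policy used to include language about

women, people of color, lesbian, gay, bisexual, transgender, queer, intersex, asexual individuals

That language appears in a 2019 snapshot in the Internet Archive's Wayback Machine but not in a 2023 snapshot of the same page. To determine when the change was introduced, pass the two snapshot URLs and the --text option to the bisect command. It runs a binary search over the snapshots between those two versions to find the one where the text went missing.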

$ memento bisect --missing --text "women, people of color, lesbian, gay" \
  https://web.archive.org/web/20190711134608/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy \
  https://web.archive.org/web/20230621094005/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
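
The idea behind the binary search can be sketched in a few lines. This is an illustration of the general technique under stated assumptions, not the tool's actual code: `bisect_change` and `text_present` are hypothetical names, the snapshot labels are invented, and the sketch assumes the text is present in the first snapshot, absent in the last, and changes state only once in between.

```python
def bisect_change(snapshots, text_present):
    """Return the index of the first snapshot in which the text is absent.

    Assumes text_present(snapshots[0]) is True, text_present(snapshots[-1])
    is False, and the text disappears exactly once in between.
    """
    lo, hi = 0, len(snapshots) - 1  # lo: text known present, hi: text known absent
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if text_present(snapshots[mid]):
            lo = mid  # the change happened after mid
        else:
            hi = mid  # the change happened at or before mid
    return hi


# Invented example: five yearly snapshots, with the text gone from 2021 onward.
snapshots = ["2019", "2020", "2021", "2022", "2023"]
first_missing = bisect_change(snapshots, lambda s: s < "2021")
print(snapshots[first_missing])
```

Each step halves the range, so a history of N snapshots needs roughly log2(N) page loads rather than N. If the text came and went more than once, a search like this would find one of the transitions, not necessarily the earliest.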

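The --text value can also be a regular expression if needed. If you supply only one snapshot URL, it is used as the starting point, and the most recent snapshot in the archive is used as the end point.

Behind the scenes, the bisect command drives a browser (via Selenium) so that each page is fully rendered. To find when some text appeared (rather than went missing), remove the --missing option from the command.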
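
The relationship between a regular-expression pattern and the --missing option can be pictured as a single yes/no test applied to each rendered page. The following is a minimal sketch of that test, assuming Python's `re.search` semantics; `found_target` is a hypothetical helper, and the two page strings are short stand-ins for rendered page text, not real page contents.

```python
import re


def found_target(page_text, pattern, missing=False):
    """Report whether a page is in the state being searched for.

    With missing=False the target state is "pattern matches";
    with missing=True the target state is "pattern does not match".
    """
    matched = re.search(pattern, page_text) is not None
    return (not matched) if missing else matched


# Stand-in strings for the rendered text of an older and a newer snapshot.
older_page = "... targeting women, people of color, lesbian, gay, bisexual ..."
newer_page = "... targeting other people on the basis of ..."

# \s+ tolerates line breaks or repeated spaces in the rendered text.
pattern = r"women,\s+people\s+of\s+color"

print(found_target(older_page, pattern, missing=True))
print(found_target(newer_page, pattern, missing=True))
```

A pattern that tolerates whitespace differences can be more robust than a literal string, since rendered pages may wrap or pad text differently between snapshots.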

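If you would rather inspect the intermediate pages manually, omit the --text option: memento will display the browser it is controlling and prompt you before continuing.

To see the browser while using --text, add the --show-browser option.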

Download files

Source distribution

memento_cli-0.0.4.tar.gz (4.7 kB)

Built distribution

memento_cli-0.0.4-py3-none-any.whl (5.7 kB, Python 3)
