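A client for scraping and parsing the Periodic Review Secretariat web pages for Guantanamo Bay detainees.
Project description
> From their website: "The Periodic Review Secretariat (PRS) develops and implements the periodic review process for eligible Guantanamo Bay detainees, including providing personal representatives to the detainees."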
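Usage
The PRB (Periodic Review Board) reviews Guantanamo detainees' files in three different ways: initial review, file review, and full review. Technically there is a fourth type, the subsequent full review, but no subsequent full reviews have been published yet.
Initial review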
initial_review --csv > initial_review.csv
initial_review --json > initial_review.json
initial_review --tsv > initial_review.tsv
File review
file_review --csv > file_review.csv
file_review --json > file_review.json
file_review --tsv > file_review.tsv
Full review
full_review --csv > full_review.csv
full_review --json > full_review.json
full_review --tsv > full_review.tsv
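The commands above can also be driven from Python. The following is a minimal sketch, not part of the package: it assumes the three commands are on your PATH after installation and that `--json` writes a JSON array to standard output, as the shell redirects above suggest. The helper names `build_command` and `fetch_documents` are hypothetical.

```python
import json
import subprocess

# Command names and format flags exactly as listed in the usage section.
REVIEW_COMMANDS = ("initial_review", "file_review", "full_review")
FORMAT_FLAGS = ("--csv", "--json", "--tsv")


def build_command(review, fmt="--json"):
    """Return the argv list for one scraper invocation (hypothetical helper)."""
    if review not in REVIEW_COMMANDS:
        raise ValueError(f"unknown review command: {review!r}")
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown format flag: {fmt!r}")
    return [review, fmt]


def fetch_documents(review, run=subprocess.run):
    """Run the scraper with --json and parse its standard output.

    `run` is injectable so the function can be tried without the
    package installed or network access.
    """
    result = run(
        build_command(review, "--json"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)
```

Calling `fetch_documents("full_review")` would then return a list of dictionaries, one per document, provided the assumptions above hold.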
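Schema
One row or one object is returned per document. Each document carries document-specific fields such as type_name, type_id, and url, along with detainee-specific fields such as name and isn. A unique id is built for each document from isn, type_id, and hearing_or_review_date; in the sample below the id also includes the review type (for example, 037-initial-review-1-2014-11-05).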
[
{
"review_type": "full-review",
"review_url": "http://www.prs.mil/Review-Information/Initial-Review/",
"hearing_or_review_date":"2014-11-05",
"denial":null,
"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi",
"type_id":"1",
"url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141105_U_ISN037_GOVERNMENT'S_UNCLASSIFIED_SUMMARY_PUBLIC.pdf",
"type_name":"Government's Unclassified Summary",
"id":"037-initial-review-1-2014-11-05",
"isn":"037",
"denied":false,
"notification_date":"2014-08-26",
"final_determination_date":"2014-12-05"
},
{
"review_type": "full-review",
"review_url": "http://www.prs.mil/Review-Information/Initial-Review/",
"hearing_or_review_date":"2014-11-05",
"denial":null,
"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi",
"type_id":"2",
"url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141105_U_ISN037_PR_STATEMENT_PRB.pdf",
"type_name":"Opening Statements of Detainee's Representatives",
"id":"037-initial-review-2-2014-11-05",
"isn":"037",
"denied":false,
"notification_date":"2014-08-26",
"final_determination_date":"2014-12-05"
},
{
"review_type": "full-review",
"review_url": "http://www.prs.mil/Review-Information/Initial-Review/",
"hearing_or_review_date":"2014-11-05",
"denial":null,
"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi",
"type_id":"3",
"url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141216_U_ISN037_DETAINEE_WRITTEN_SUBMISSION_PUBLIC.pdf",
"type_name":"Detainee's Written Submission",
"id":"037-initial-review-3-2014-11-05",
"isn":"037",
"denied":false,
"notification_date":"2014-08-26",
"final_determination_date":"2014-12-05"
},
{
"review_type": "full-review",
"review_url": "http://www.prs.mil/Review-Information/Initial-Review/",
"hearing_or_review_date":"2014-11-05",
"denial":null,
"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi",
"type_id":"4",
"url":"http:\/\/www.prs.mil\/LinkClick.aspx?fileticket=RFOMdQD69k4%3d&tabid=8447&portalid=60&mid=20067",
"type_name":"Transcript of Public Session",
"id":"037-initial-review-4-2014-11-05",
"isn":"037",
"denied":false,
"notification_date":"2014-08-26",
"final_determination_date":"2014-12-05"
},
{
"review_type": "full-review",
"review_url": "http://www.prs.mil/Review-Information/Initial-Review/",
"hearing_or_review_date":"2014-11-05",
"denial":null,
"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi",
"type_id":"5",
"url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141105_U_ISN037_TRANSCRIPT_OF_DETAINEE_SESSION_PUBLIC.pdf",
"type_name":"Transcript of Detainee Session",
"id":"037-initial-review-5-2014-11-05",
"isn":"037",
"denied":false,
"notification_date":"2014-08-26",
"final_determination_date":"2014-12-05"
},
{
"review_type": "full-review",
"review_url": "http://www.prs.mil/Review-Information/Initial-Review/",
"hearing_or_review_date":"2014-11-05",
"denial":null,
"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi",
"type_id":"6",
"url":"http:\/\/www.prs.mil\/LinkClick.aspx?fileticket=s0XT-7qYc94%3d&tabid=8447&portalid=60&mid=20067",
"type_name":"Unclassified Summary of Final Determination",
"id":"037-initial-review-6-2014-11-05",
"isn":"037",
"denied":false,
"notification_date":"2014-08-26",
"final_determination_date":"2014-12-05"
}
]
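Because the id is meant to be unique per document, a quick sanity check on the parsed output can be useful before loading it elsewhere. The sketch below is an illustration only: `check_documents` is a hypothetical helper, the field names are taken from the sample above, and treating null dates as acceptable is an assumption.

```python
from collections import Counter
from datetime import date

# Field names taken from the sample output above.
REQUIRED_FIELDS = ("id", "isn", "name", "type_id", "type_name", "url")
DATE_FIELDS = (
    "hearing_or_review_date",
    "notification_date",
    "final_determination_date",
)


def check_documents(documents):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Each id should appear exactly once across the whole result set.
    id_counts = Counter(doc.get("id") for doc in documents)
    for doc_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id {doc_id!r} appears {count} times")

    for index, doc in enumerate(documents):
        for field in REQUIRED_FIELDS:
            if not doc.get(field):
                problems.append(f"record {index}: missing {field}")
        for field in DATE_FIELDS:
            value = doc.get(field)
            if value is None:
                continue  # assumption: a date may be null for some reviews
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                problems.append(
                    f"record {index}: {field} is not YYYY-MM-DD: {value!r}"
                )
    return problems
```

Passing the parsed JSON list to `check_documents` returns an empty list when all ids are distinct, the required fields are present, and the dates parse as ISO dates.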
Output
The scraper can return CSV, JSON, or TSV. If no option is passed, it defaults to CSV.
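To illustrate how the three formats relate, here is a small sketch that renders the same flat records as CSV, TSV, or JSON using only the standard library. It is not the package's own implementation; the function name `serialize` and the choice to take column order from the first record are assumptions made for the example.

```python
import csv
import io
import json


def serialize(documents, fmt="csv"):
    """Render a list of flat dicts as CSV (the default), TSV, or JSON text."""
    if fmt == "json":
        return json.dumps(documents, indent=2)
    if fmt not in ("csv", "tsv"):
        raise ValueError(f"unsupported format: {fmt!r}")
    if not documents:
        return ""

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(documents[0].keys()),  # column order from first record
        delimiter="\t" if fmt == "tsv" else ",",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(documents)  # None values are written as empty cells
    return buffer.getvalue()
```

CSV and TSV differ only in the delimiter, while JSON keeps null and boolean values such as `denial` and `denied` typed rather than flattening them to text.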
Source distribution
nyt-prb-scraper-0.0.10.tar.gz (4.1 kB)
Built distribution
nyt_prb_scraper-0.0.10-py3-none-any.whl
Hashes for nyt-prb-scraper-0.0.10.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 94f9dcb065a2b48533bf4846750329e7d08bc4895cb3f4a9856d7c51956c4145
MD5 | 20dcca0da7e8130872fe7f691581bfcd
BLAKE2b-256 | 1386de45ddda8e209a1a0924d6b5ab52817df421cae5371620899d4c01e41b4a
Hashes for nyt_prb_scraper-0.0.10-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6dd198d58affd2f198dcc1c7ac472aec4842c8e2462f72b6f4e9b835f455e10f
MD5 | 40a3cff8691a78f4dac4e22b491be51b
BLAKE2b-256 | 51dd61050859fd34ad70f1aaa97bb63c65f8438456839fe1b329d2f49ed496a8