Get a list of Wikipedia articles that link to a particular website.
Project description
wikilinks
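wikilinks is a Python command-line tool and module that makes it easy to extract the external links in Wikipedia that point at a particular website. Knowing how Wikipedians cite content hosted on other websites can be both useful and interesting: the links show where a website appears in Wikipedia and which of its topics attract interest.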
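The code for wikilinks was originally part of a larger project called Linkypedia, which visualized how cultural heritage material is used on Wikipedia. Linkypedia has since been shut down, but perhaps this small piece of functionality is still useful to you.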
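Although Wikipedia's API lets you list the external links on a given page, it does not (as far as I know) offer an API call that retrieves the pages pointing at a particular website. Wikipedia does, however, provide an external links search page that lets you do this lookup in a browser. wikilinks simply scrapes the results of that page for every language Wikipedia.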
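The lookup described above amounts to requesting one external links search page per language edition. The following is a minimal sketch of how such a request URL could be built, not the code wikilinks actually uses: the function name `linksearch_url` is hypothetical, and the `Special:LinkSearch` title and the `target`, `limit`, and `offset` query parameters are assumptions about how that page is addressed.

```python
from urllib.parse import urlencode


def linksearch_url(lang, target, limit=500, offset=0):
    """Build a search-page URL for one language edition (hypothetical helper).

    The page title and the parameter names are assumptions, not taken
    from the wikilinks source.
    """
    query = urlencode({
        "title": "Special:LinkSearch",
        "target": target,   # the website whose inbound links we want
        "limit": limit,     # results per page
        "offset": offset,   # where this page of results starts
    })
    return f"https://{lang}.wikipedia.org/w/index.php?{query}"


print(linksearch_url("en", "http://mith.umd.edu"))
```

A scraper would fetch this URL, collect the listed links, and raise `offset` until no more results come back.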
Install
pip install wikilinks
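Usage
Find the English Wikipedia links that point at the mith.umd.edu website and print them as tab-delimited pairs of URLs (the Wikipedia page, then the link target):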
% wikilinks --lang en http://mith.umd.edu
https://en.wikipedia.org/wiki/User:Mastersplinter/Making_the_History_of_1989	http://mith.umd.edu/
https://en.wikipedia.org/wiki/University_of_Maryland_Libraries	http://mith.umd.edu/
https://en.wikipedia.org/wiki/User:Walker222	http://mith.umd.edu
https://en.wikipedia.org/wiki/Maryland_Institute_for_Technology_in_the_Humanities	http://mith.umd.edu/
https://en.wikipedia.org/wiki/User:Edsu	http://mith.umd.edu
https://en.wikipedia.org/wiki/University_of_Maryland_College_of_Information_Studies	http://mith.umd.edu/
https://en.wikipedia.org/wiki/Antonia%27s_Line	http://mith.umd.edu//WomensStudies/FilmReviews/antonias-line-mcalister
https://en.wikipedia.org/wiki/Rio_Nutrias	http://mith.umd.edu//eada/gateway/diario/diary.html
https://en.wikipedia.org/wiki/Rio_Nutria_(Zuni_River_tributary)	http://mith.umd.edu//eada/gateway/diario/diary.html
https://en.wikipedia.org/wiki/Nutrioso,_Arizona	http://mith.umd.edu//eada/gateway/diario/diary.html
https://en.wikipedia.org/wiki/Marilee_Lindemann	http://mith.umd.edu/2008/01/
https://en.wikipedia.org/wiki/User:Keilana/Women_scientist_resources	http://mith.umd.edu/WomensStudies/Bibliographies/ScienceBiblio/science-pt2
https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Maria_Ford	http://mith.umd.edu/WomensStudies/FilmReviews/S/some-nudity-burdette.html
https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2015_January_14	http://mith.umd.edu/WomensStudies/FilmReviews/S/some-nudity-burdette.html
https://en.wikipedia.org/wiki/Talk:Colors_of_the_Wind	http://mith.umd.edu/WomensStudies/FilmReviews/pocahontas-strong
...
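Because the command-line output is tab-delimited, it is easy to post-process. The sketch below parses such output and groups the Wikipedia pages by the URL they link to. The helper names `parse_pairs` and `group_by_target` are hypothetical and are not part of wikilinks; the sample rows are copied from the output above.

```python
from collections import defaultdict

# A few rows of tab-delimited output: Wikipedia page, then link target.
sample = (
    "https://en.wikipedia.org/wiki/Rio_Nutrias\t"
    "http://mith.umd.edu//eada/gateway/diario/diary.html\n"
    "https://en.wikipedia.org/wiki/Nutrioso,_Arizona\t"
    "http://mith.umd.edu//eada/gateway/diario/diary.html\n"
    "https://en.wikipedia.org/wiki/User:Edsu\t"
    "http://mith.umd.edu\n"
)


def parse_pairs(text):
    """Turn tab-delimited lines into (source, target) tuples."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 2:  # skip blank or malformed lines
            pairs.append((fields[0], fields[1]))
    return pairs


def group_by_target(pairs):
    """Map each target URL to the list of Wikipedia pages linking to it."""
    grouped = defaultdict(list)
    for source, target in pairs:
        grouped[target].append(source)
    return dict(grouped)


for target, sources in group_by_target(parse_pairs(sample)).items():
    print(target, len(sources))
```

In practice the text could come from a file produced by redirecting the command's output, for example `wikilinks --lang en http://mith.umd.edu > links.tsv`.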
Use as a library
This example retrieves every link from a Wikipedia article to the mith.umd.edu website as a (source, target) tuple:
from wikilinks import wikilinks
for link in wikilinks("http://mith.umd.edu"):
    print(link)
The output might look like this:
('https://ca.wikipedia.org/wiki/Amerigo_Vespucci', 'http://mith.umd.edu//eada/html/display.php?docs=vespucci_letters.xml')
('https://de.wikipedia.org/wiki/Lisa_Monaco', 'http://mith.umd.edu/WomensStudies/GenderIssues/Violence+Women/ResponsetoRape/introduction')
('https://de.wikipedia.org/wiki/Buenaventura_River', 'http://mith.umd.edu/eada/gateway/diario/')
('https://de.wikipedia.org/wiki/Sarah_Kemble_Knight', 'http://mith.umd.edu/eada/html/display.php?docs=knight_journal.xml')
('https://de.wikipedia.org/wiki/Klapperschlangen', 'http://mith.umd.edu/eada/html/display.php?docs=smith_map.xml')
('https://de.wikipedia.org/wiki/Theater_(Bauwerk)', 'http://mith.umd.edu/theatrefinder/')
('https://en.wikipedia.org/wiki/User:Mastersplinter/Making_the_History_of_1989', 'http://mith.umd.edu/')
...
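Since each result is a plain (source, target) tuple, ordinary Python tools can summarize them. As a minimal sketch, the hypothetical helper `count_by_language` below tallies links per language edition by reading the first label of the source hostname. It runs on sample tuples copied from the output above, so it needs neither wikilinks nor network access.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_language(links):
    """Count (source, target) tuples per Wikipedia language edition."""
    counts = Counter()
    for source, _target in links:
        host = urlparse(source).hostname or ""
        # 'de.wikipedia.org' -> 'de'
        counts[host.split(".")[0]] += 1
    return counts


# Sample tuples taken from the example output shown above.
sample_links = [
    ("https://ca.wikipedia.org/wiki/Amerigo_Vespucci",
     "http://mith.umd.edu//eada/html/display.php?docs=vespucci_letters.xml"),
    ("https://de.wikipedia.org/wiki/Buenaventura_River",
     "http://mith.umd.edu/eada/gateway/diario/"),
    ("https://de.wikipedia.org/wiki/Theater_(Bauwerk)",
     "http://mith.umd.edu/theatrefinder/"),
    ("https://en.wikipedia.org/wiki/User:Mastersplinter/Making_the_History_of_1989",
     "http://mith.umd.edu/"),
]

print(count_by_language(sample_links))
```

The same function should accept the generator returned by `wikilinks("http://mith.umd.edu")` directly, since it only iterates over the tuples once.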
By default, wikilinks searches every language Wikipedia. If you are only interested in links from particular languages, use the langs parameter:
for link in wikilinks("http://mith.umd.edu", langs=["de", "fr"]):
    print(link)