A set of tools (API + script) to read, write and transform captions in several formats
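Introduction
This package is a set of tools to convert captions from one format to another. For each format you will find a Writer and a Reader, plus a script if you want to use it from the command line.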
Supported formats
sbv Reader and Writer
srt Reader and Writer
ttml Reader and Writer
transcript Reader and Writer
How to use (API)
You can read the provided unit tests for complete examples.
    # Python 2 example: it uses the StringIO module and the unicode type.
    from StringIO import StringIO

    from captionstransformer.sbv import Reader
    from captionstransformer.ttml import Writer

    # Read SBV captions from a file-like object.
    test_content = StringIO(u"""
    0:00:03.490,0:00:07.430
    >> FISHER: All right. So, let's begin.
    This session is: Going Social

    0:00:07.430,0:00:11.600
    with the YouTube APIs. I am Jeff Fisher,

    0:00:11.600,0:00:14.009
    and this is Johann Hartmann, we're presenting today.

    0:00:14.009,0:00:15.889
    [pause]
    """)
    reader = Reader(test_content)
    captions = reader.read()
    assert len(captions) == 4
    first = captions[0]
    assert isinstance(first.text, unicode)
    assert first.text == u">> FISHER: All right. So, let's begin.\nThis session is: Going Social\n"

    # Next, get a TTML writer and serialize the same captions.
    filelike = StringIO()
    writer = Writer(filelike)
    writer.set_captions(captions)
    text = writer.captions_to_text()
    assert text.startswith(u"""<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"><body><div>""")
    writer.write()
    writer.close()
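About formats
Simple documentation of the existing caption formats is surprisingly hard to find, so here is a collection of the named caption formats: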
SubViewer (SUB)
    00:04:35.03,00:04:38.82
    Hello guys... please sit down...

    00:05:00.19,00:05:03.47
    M. Franklin,[br]are you crazy?
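To make the SubViewer layout concrete, here is a minimal standalone parser sketch (it does not use captionstransformer, whose formats list above does not include SUB). It assumes each cue is a `start,end` line with centisecond precision followed by text lines, and that `[br]` marks an in-cue line break, as in the sample. `parse_sub` and `TIMING` are hypothetical names; header lines, if present, are skipped because they do not match the timing pattern.

```python
import re
from datetime import timedelta

# Hypothetical timing pattern: "H:MM:SS.cc,H:MM:SS.cc" (cc = centiseconds).
TIMING = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{2}),(\d+):(\d{2}):(\d{2})\.(\d{2})$")


def _to_timedelta(h, m, s, cs):
    """Build a timedelta from hour, minute, second and centisecond strings."""
    return timedelta(hours=int(h), minutes=int(m), seconds=int(s), milliseconds=int(cs) * 10)


def parse_sub(text):
    """Return a list of (start, end, caption_text) tuples from SubViewer text."""
    cues = []
    lines = [line.strip() for line in text.strip().splitlines()]
    i = 0
    while i < len(lines):
        match = TIMING.match(lines[i])
        if not match:
            i += 1  # skip blank lines and anything that is not a timing line
            continue
        start = _to_timedelta(*match.groups()[:4])
        end = _to_timedelta(*match.groups()[4:])
        # Collect the cue's text lines until the next blank line.
        body = []
        i += 1
        while i < len(lines) and lines[i]:
            body.append(lines[i])
            i += 1
        # SubViewer uses [br] for a line break inside a cue.
        cues.append((start, end, "\n".join(body).replace("[br]", "\n")))
    return cues
```

Times come back as `timedelta` objects, which keeps the centisecond values exact instead of going through floats.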
Youtube (SBV)
    0:00:03.490,0:00:07.430
    FISHER: All right. So, let's begin.
    This session is: Going Social

    0:00:07.430,0:00:11.600
    with the YouTube APIs. I am Jeff Fisher,

    0:00:11.600,0:00:14.009
    and this is Johann Hartmann, we're presenting today.

    0:00:14.009,0:00:15.889
    [pause]
SubRip (SRT)
    1
    00:00:03,490 --> 00:00:07,430
    FISHER: All right. So, let's begin.
    This session is: Going Social

    2
    00:00:07,430 --> 00:00:11,600
    with the YouTube APIs. I am Jeff Fisher,

    3
    00:00:11,600 --> 00:00:14,009
    and this is Johann Hartmann, we're presenting today.

    4
    00:00:14,009 --> 00:00:15,889
    [pause]
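Compared with SBV, SRT adds a sequence number before each cue, uses `-->` between the times and a comma before the milliseconds. A minimal writer sketch, assuming cues are `(start_seconds, end_seconds, text)` tuples; `to_srt` and `srt_timestamp` are hypothetical helpers, not the captionstransformer `srt.Writer`.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start, end, text) tuples as SRT, numbering cues from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Working in integer milliseconds avoids float rounding artifacts such as `3.49 * 1000` not being exactly 3490.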
Timed Text Markup Language (TTML)
    <tt xml:lang="" xmlns="http://www.w3.org/ns/ttml">
      <body region="subtitleArea">
        <div>
          <p xml:id="subtitle1" begin="0.76s" end="3.45s">
            It seems a paradox, does it not,
          </p>
          <p xml:id="subtitle2" begin="5.0s" end="10.0s">
            that the image formed on<br/>
            the Retina should be inverted?
          </p>
        </div>
      </body>
    </tt>
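Because TTML is well-formed XML in the `http://www.w3.org/ns/ttml` namespace, the standard library's ElementTree is enough to read the sample. This sketch only handles offset times (`0.76s`, `...ms`) and plain `HH:MM:SS(.fff)` clock times, a subset of what the TTML specification allows; `parse_ttml` and its helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"


def ttml_time_to_seconds(value):
    """Handle offset times like "0.76s"/"500ms" and clock times like "00:00:03.490"."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def paragraph_text(p):
    """Join a <p>'s text, turning <br/> into newlines and collapsing layout whitespace."""
    segments, current = [], [p.text or ""]
    for child in p:
        if child.tag == TT_NS + "br":
            segments.append(current)
            current = []
        else:
            current.append("".join(child.itertext()))  # e.g. text inside <span>
        current.append(child.tail or "")
    segments.append(current)
    return "\n".join(" ".join("".join(seg).split()) for seg in segments)


def parse_ttml(xml_text):
    """Return (begin, end, text) tuples for every <p> element."""
    root = ET.fromstring(xml_text)
    return [
        (ttml_time_to_seconds(p.get("begin")), ttml_time_to_seconds(p.get("end")), paragraph_text(p))
        for p in root.iter(TT_NS + "p")
    ]
```

Collapsing whitespace per `<br/>` segment discards the pretty-printing indentation while keeping the explicit line break.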
Transcript
    <?xml version="1.0" encoding="utf-8" ?>
    <transcript>
      <text start="10" dur="2">Hi, I&#39;m Emily from Nomensa</text>
      <text start="12" dur="3">and today I&#39;m going to be talking about the order of content on your pages.</text>
      <text start="16" dur="6">Making sure the content on your web pages is presented logically is a really important part of web accessibility.</text>
      <text start="23" dur="2">Page content should be ordered so it makes sense</text>
    </transcript>
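In the transcript format each `<text>` element carries a `start` time and a duration `dur` rather than an end time. A minimal sketch with ElementTree, assuming both attributes are seconds; reading them as floats also covers fractional values such as `start="1.8"`. `parse_transcript` is a hypothetical name, not the captionstransformer transcript Reader.

```python
import xml.etree.ElementTree as ET


def parse_transcript(xml_text):
    """Return (start, end, text) tuples from a <transcript> document.

    The end time is computed as start + dur (dur defaults to 0 if missing).
    """
    root = ET.fromstring(xml_text.strip())
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start"))
        duration = float(node.get("dur", "0"))
        # Character references such as &#39; are decoded by the XML parser.
        cues.append((start, start + duration, node.text or ""))
    return cues
```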
Microsoft SAMI (SAMI, SMI)
    <SAMI>
    <Head>
      <Title>President John F. Kennedy Speech</Title>
      <SAMIParam>
        Copyright {(C)Copyright 1997, Microsoft Corporation}
        Media {JF Kennedy.wav}
        Metrics {time:ms; duration: 73000;}
        Spec {MSFT:1.0;}
      </SAMIParam>
    </Head>
    <Body>
      <SYNC Start=0>
        <P Class=ENUSCC ID=Source>Pres. John F. Kennedy
      <SYNC Start=10>
        <P Class=ENUSCC>Let the word go forth,
        from this time and place to friend and foe alike that the torch
    </Body>
    </SAMI>
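SAMI is HTML-like rather than well-formed XML (note the unquoted attribute values and the unclosed `<P>` and `<SYNC>` tags in the sample), so an XML parser would reject it. This sketch uses the lenient standard-library `html.parser.HTMLParser` instead and collects the text following each `<SYNC Start=...>`. It assumes, as the sample's `Metrics {time:ms; ...}` line states, that `Start` values are milliseconds; `SamiSyncParser` and `parse_sami` are hypothetical names.

```python
from html.parser import HTMLParser


class SamiSyncParser(HTMLParser):
    """Collect [start_ms, text_chunks] pairs from <SYNC Start=...> blocks."""

    def __init__(self):
        super().__init__()
        self.cues = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so SYNC/Start become sync/start.
        if tag == "sync":
            start = dict(attrs).get("start")
            if start is not None:
                self.cues.append([int(start), []])

    def handle_data(self, data):
        # Text before the first SYNC (Title, SAMIParam, ...) is ignored.
        if self.cues:
            self.cues[-1][1].append(data)


def parse_sami(text):
    """Return (start_ms, text) tuples with whitespace collapsed."""
    parser = SamiSyncParser()
    parser.feed(text)
    parser.close()
    return [(start, " ".join("".join(chunks).split())) for start, chunks in parser.cues]
```

Each cue implicitly lasts until the next `SYNC`, so end times would have to be derived from the following cue's start.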
Credits
Companies
Authors
JeanMichel FRANCOIS aka toutpt <toutpt@gmail.com>
Changelog
1.2.1 (2012-08-07)
Force conversion of rawcontent to unicode
1.2 (2012-08-06)
Add support for time formats such as start="1.8" in transcript
Add YouTube caption download: captionstransformer.youtube.get_captions & get_reader
1.1 (2012-08-05)
Add extension and MIME type information to REGISTRY
Add ID information to REGISTRY
1.0 (2012-07-23)
Initial release
Hashes for captionstransformer-1.2.1.zip
Algorithm | Hash digest
---|---
SHA256 | 6f5e0eb7ba49f0286dd183cbc082c5f2bffe790996ef3e749dbac5d28f8d86d8
MD5 | b5e30a19435c23dadbebe0ef24367cb3
BLAKE2b-256 | 3c5429c698d1690d7f35f387bba36bc135d5cf2dfce1c472b5d491e4529b3147