
A set of tools (API + script) to read, write, and transform captions in multiple formats

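Project description

Introduction

This package is a set of tools for transforming captions from one format to another. It provides a Reader and a Writer for each supported format, plus a script for use on the command line.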

Supported formats

  • sbv: Reader and Writer

  • srt: Reader and Writer

  • ttml: Reader and Writer

  • transcript: Reader and Writer

How to use (API)

The unit tests shipped with the package provide complete examples.

# Python 2 example: it relies on the StringIO module and the unicode type.
from captionstransformer.sbv import Reader
from captionstransformer.ttml import Writer
from StringIO import StringIO
test_content = StringIO(u"""
0:00:03.490,0:00:07.430
>> FISHER: All right. So, let's begin.
This session is: Going Social

0:00:07.430,0:00:11.600
with the YouTube APIs. I am
Jeff Fisher,

0:00:11.600,0:00:14.009
and this is Johann Hartmann,
we're presenting today.

0:00:14.009,0:00:15.889
[pause]
""")
reader = Reader(test_content)

# read() returns the list of captions parsed from the SBV content
captions = reader.read()
assert len(captions) == 4
first = captions[0]
assert type(first.text) == unicode
assert first.text == u">> FISHER: All right. So, let's begin.\nThis session is: Going Social\n"

# next, get a writer and render the same captions as TTML
filelike = StringIO()
writer = Writer(filelike)
writer.set_captions(captions)
text = writer.captions_to_text()
assert text.startswith(u"""<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"><body><div>""")
writer.write()
writer.close()

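About formats

Simple documentation about existing caption formats is very hard to find. Here is a collection of samples of existing named caption formats.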

SubViewer (SUB)

00:04:35.03,00:04:38.82
Hello guys... please sit down...

00:05:00.19,00:05:03.47
M. Franklin,[br]are you crazy?

YouTube (SBV)

0:00:03.490,0:00:07.430
FISHER: All right. So, let's begin.
This session is: Going Social

0:00:07.430,0:00:11.600
with the YouTube APIs. I am
Jeff Fisher,

0:00:11.600,0:00:14.009
and this is Johann Hartmann,
we're presenting today.

0:00:14.009,0:00:15.889
[pause]
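
As an illustration of the SBV layout above, here is a minimal sketch of a parser written for Python 3 with only the standard library. It is independent of captionstransformer and is not the package's implementation. The names `SBV_TIMING` and `parse_sbv` are hypothetical, and the sketch assumes cues are separated by blank lines and that each cue starts with a `start,end` timing line.

```python
import re
from datetime import timedelta

# Matches an SBV timing line such as "0:00:03.490,0:00:07.430".
SBV_TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


def parse_sbv(content):
    """Return a list of (start, end, text) tuples parsed from SBV text."""
    cues = []
    # Assumption: cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if not lines:
            continue  # empty input
        match = SBV_TIMING.match(lines[0].strip())
        if match is None:
            continue  # skip blocks that do not start with a timing line
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        cues.append((start, end, "\n".join(lines[1:])))
    return cues


SAMPLE = """0:00:03.490,0:00:07.430
FISHER: All right. So, let's begin.
This session is: Going Social

0:00:14.009,0:00:15.889
[pause]
"""

cues = parse_sbv(SAMPLE)
for start, end, text in cues:
    print(start, end, repr(text))
```

Storing times as `timedelta` values keeps the cue data independent of any one textual timestamp format, which makes it easier to write the same cues back out in another format.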

SubRip (SRT)

1
00:00:03,490 --> 00:00:07,430
FISHER: All right. So, let's begin.
This session is: Going Social

2
00:00:07,430 --> 00:00:11,600
with the YouTube APIs. I am
Jeff Fisher,

3
00:00:11,600 --> 00:00:14,009
and this is Johann Hartmann,
we're presenting today.

4
00:00:14,009 --> 00:00:15,889
[pause]
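
The SRT layout (a sequential counter, a timing line with a comma before the milliseconds, then the text) can be produced with a few lines of Python 3. This is a hedged sketch, not the package's Writer. `srt_timestamp` and `to_srt` are hypothetical names, and cues are assumed to be `(start, end, text)` tuples with `timedelta` times.

```python
from datetime import timedelta


def srt_timestamp(delta):
    """Format a timedelta as HH:MM:SS,mmm (comma before the milliseconds)."""
    total_ms = delta // timedelta(milliseconds=1)
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def to_srt(cues):
    """Render (start, end, text) tuples as SRT text with counters from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            "%d\n%s --> %s\n%s\n"
            % (index, srt_timestamp(start), srt_timestamp(end), text)
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


example = to_srt([
    (timedelta(seconds=3, milliseconds=490),
     timedelta(seconds=7, milliseconds=430),
     "FISHER: All right. So, let's begin."),
    (timedelta(seconds=14, milliseconds=9),
     timedelta(seconds=15, milliseconds=889),
     "[pause]"),
])
print(example)
```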

Timed Text Markup Language (TTML)

<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml">
  <body region="subtitleArea">
    <div>
      <p xml:id="subtitle1" begin="0.76s" end="3.45s">
        It seems a paradox, does it not,
      </p>
      <p xml:id="subtitle2" begin="5.0s" end="10.0s">
        that the image formed on<br/>
        the Retina should be inverted?
      </p>
    </div>
  </body>
</tt>
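
To show how the TTML sample above maps to cues, here is a minimal Python 3 sketch using `xml.etree.ElementTree` from the standard library. It is an illustration only, not the package's Reader. `parse_offset` and `parse_ttml` are hypothetical names, and the sketch assumes `begin` and `end` are plain second offsets such as "0.76s"; TTML allows other time expressions that are not handled here.

```python
import xml.etree.ElementTree as ET

TTML_NS = "{http://www.w3.org/ns/ttml}"


def parse_offset(value):
    """Convert a second offset such as "0.76s" to a float number of seconds."""
    if not value.endswith("s"):
        raise ValueError("only plain second offsets are handled: %r" % value)
    return float(value[:-1])


def parse_ttml(content):
    """Return (begin, end, text) tuples for every <p> element."""
    root = ET.fromstring(content)
    cues = []
    for p in root.iter(TTML_NS + "p"):
        parts = [p.text or ""]
        for child in p:
            if child.tag == TTML_NS + "br":
                parts.append("\n")  # <br/> marks a line break in the caption
            else:
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        # Collapse indentation and drop empty lines left by the XML layout.
        lines = [" ".join(line.split()) for line in "".join(parts).split("\n")]
        text = "\n".join(line for line in lines if line)
        cues.append((parse_offset(p.get("begin")), parse_offset(p.get("end")), text))
    return cues


SAMPLE = """<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml">
  <body region="subtitleArea">
    <div>
      <p xml:id="subtitle1" begin="0.76s" end="3.45s">
        It seems a paradox, does it not,
      </p>
      <p xml:id="subtitle2" begin="5.0s" end="10.0s">
        that the image formed on<br/>
        the Retina should be inverted?
      </p>
    </div>
  </body>
</tt>"""

cues = parse_ttml(SAMPLE)
for cue in cues:
    print(cue)
```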

Transcript

<?xml version="1.0" encoding="utf-8" ?>
<transcript>
    <text start="10" dur="2">Hi, I&amp;#39;m Emily from Nomensa</text>
    <text start="12" dur="3">and today I&amp;#39;m going to be talking about the order of content on your pages.</text>
    <text start="16" dur="6">Making sure the content on your web pages is presented logically is a really important part of web accessibility.</text>
    <text start="23" dur="2">Page content should be ordered so it makes sense</text>
</transcript>
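
In the transcript sample, each cue carries a start time and a duration rather than an end time, and the text still contains escaped entities (`&amp;#39;`) after XML parsing. The following Python 3 sketch shows one way to handle both. It is not the package's Reader, `parse_transcript` is a hypothetical name, and it assumes `start` and `dur` are numbers of seconds.

```python
import html
import xml.etree.ElementTree as ET


def parse_transcript(content):
    """Return (start, end, text) tuples, with times in seconds as floats."""
    # Encode first so the bytes agree with the utf-8 XML declaration.
    root = ET.fromstring(content.encode("utf-8"))
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start"))  # accepts "10" as well as "1.8"
        end = start + float(node.get("dur"))
        # XML parsing turns "&amp;#39;" into "&#39;"; unescape resolves that
        # second level of escaping into an apostrophe.
        cues.append((start, end, html.unescape(node.text or "")))
    return cues


SAMPLE = """<?xml version="1.0" encoding="utf-8" ?>
<transcript>
    <text start="10" dur="2">Hi, I&amp;#39;m Emily from Nomensa</text>
    <text start="23" dur="2">Page content should be ordered so it makes sense</text>
</transcript>"""

cues = parse_transcript(SAMPLE)
for cue in cues:
    print(cue)
```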

Microsoft SAMI (SAMI, SMI)

<SAMI>
<Head>
   <Title>President John F. Kennedy Speech</Title>
   <SAMIParam>
      Copyright {(C)Copyright 1997, Microsoft Corporation}
      Media {JF Kennedy.wav}
      Metrics {time:ms; duration: 73000;}
      Spec {MSFT:1.0;}
   </SAMIParam>
</Head>

<Body>
   <SYNC Start=0>
      <P Class=ENUSCC ID=Source>Pres. John F. Kennedy
   <SYNC Start=10>
      <P Class=ENUSCC>Let the word go forth,
         from this time and place to friend and foe
         alike that the torch
</Body>
</SAMI>
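
SAMI is not in the list of supported formats above, so the following is only a rough Python 3 sketch of how the sample could be read. It relies on regular expressions rather than a real SAMI parser, `parse_sami` is a hypothetical name, and it assumes every cue begins with a `<SYNC Start=N>` tag whose value is in milliseconds, as the `Metrics {time:ms; ...}` line in the sample suggests.

```python
import re

# Matches <SYNC Start=10> or <SYNC Start="10">, case-insensitively.
SYNC = re.compile(r"<SYNC\s+Start\s*=\s*\"?(\d+)\"?\s*>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def parse_sami(content):
    """Return (start_ms, text) tuples, one per <SYNC> tag."""
    # With one capture group, split() yields:
    # [preamble, start1, body1, start2, body2, ...]
    pieces = SYNC.split(content)
    cues = []
    for start, body in zip(pieces[1::2], pieces[2::2]):
        # Strip remaining markup and collapse runs of whitespace.
        text = " ".join(TAG.sub(" ", body).split())
        cues.append((int(start), text))
    return cues


SAMPLE = """<SAMI>
<Body>
   <SYNC Start=0>
      <P Class=ENUSCC ID=Source>Pres. John F. Kennedy
   <SYNC Start=10>
      <P Class=ENUSCC>Let the word go forth,
         from this time and place
</Body>
</SAMI>
"""

cues = parse_sami(SAMPLE)
for cue in cues:
    print(cue)
```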

Credits

Companies

CIRB / CIBG

makinacom

Authors

Changelog

1.2.1 (2012-08-07)

  • Force rawcontent to be Unicode

1.2 (2012-08-06)

  • Add support for time formats such as start="1.8" in transcript

  • Add downloading of captions from YouTube: captionstransformer.youtube.get_captions & get_reader

1.1 (2012-08-05)

  • Add extension and MIME type information to REGISTRY

  • Add ID information to REGISTRY

1.0 (2012-07-23)

  • Initial release
