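A tool to determine the content type of files with deep learning
Project description
Magika Python Package
Magika is an AI-powered file type detection tool that relies on recent advances in deep learning to provide accurate detection. Under the hood, Magika uses a custom, highly optimized Keras model that weighs only about 1 MB and enables precise file identification within milliseconds, even when running on a single CPU.
Use Magika as a command-line client or as a module in your Python code!
See Magika on GitHub for more information and documentation: https://github.com/google/magika
Installing Magika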
$ pip install magika
If you intend to use Magika only as a command-line tool, you may prefer to install it with pipx instead:
$ pipx install magika
Using Magika as a command-line tool
$ magika examples/*
code.asm: Assembly (code)
code.py: Python source (code)
doc.docx: Microsoft Word 2007+ document (document)
doc.ini: INI configuration file (text)
elf64.elf: ELF executable (executable)
flac.flac: FLAC audio bitstream data (audio)
image.bmp: BMP image data (image)
java.class: Java compiled bytecode (executable)
jpg.jpg: JPEG image data (image)
pdf.pdf: PDF document (document)
pe32.exe: PE executable (executable)
png.png: PNG image data (image)
README.md: Markdown document (text)
tar.tar: POSIX tar archive (archive)
webm.webm: WebM data (video)
$ magika --help
Usage: magika [OPTIONS] [FILE]...

  Magika - Determine type of FILEs with deep-learning.

Options:
  -r, --recursive                 When passing this option, magika scans every
                                  file within directories, instead of
                                  outputting "directory"
  --json                          Output in JSON format.
  --jsonl                         Output in JSONL format.
  -i, --mime-type                 Output the MIME type instead of a verbose
                                  content type description.
  -l, --label                     Output a simple label instead of a verbose
                                  content type description. Use
                                  --list-output-content-types for the list of
                                  supported output.
  -c, --compatibility-mode        Compatibility mode: output is as close as
                                  possible to `file` and colors are disabled.
  -s, --output-score              Output the prediction score in addition to
                                  the content type.
  -m, --prediction-mode [best-guess|medium-confidence|high-confidence]
  --batch-size INTEGER            How many files to process in one batch.
  --no-dereference                This option causes symlinks not to be
                                  followed. By default, symlinks are
                                  dereferenced.
  --colors / --no-colors          Enable/disable use of colors.
  -v, --verbose                   Enable more verbose output.
  -vv, --debug                    Enable debug logging.
  --generate-report               Generate report useful when reporting
                                  feedback.
  --version                       Print the version and exit.
  --list-output-content-types     Show a list of supported content types.
  --model-dir DIRECTORY           Use a custom model.
  -h, --help                      Show this message and exit.

Magika version: "0.5.0"
Default model: "standard_v1"
Send any feedback to magika-dev@google.com or via GitHub issues.
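The command-line options above can also be driven from a script. The following is a minimal sketch, not part of Magika itself: build_magika_command, parse_jsonl and run_magika_jsonl are hypothetical helper names, and the only flags used are ones that appear in the help text (--jsonl, --recursive, --prediction-mode). The layout of each JSON record is not described on this page, so the sketch returns the parsed objects as-is without assuming any field names.

```python
import json
import shutil
import subprocess

# Values accepted by -m/--prediction-mode according to the help text above.
PREDICTION_MODES = ("best-guess", "medium-confidence", "high-confidence")


def build_magika_command(paths, recursive=False, prediction_mode=None):
    """Build an argv list for the magika CLI with JSONL output (hypothetical helper)."""
    cmd = ["magika", "--jsonl"]
    if recursive:
        cmd.append("--recursive")
    if prediction_mode is not None:
        if prediction_mode not in PREDICTION_MODES:
            raise ValueError(f"unknown prediction mode: {prediction_mode!r}")
        cmd += ["--prediction-mode", prediction_mode]
    cmd.extend(str(p) for p in paths)
    return cmd


def parse_jsonl(text):
    """Parse JSONL text into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_magika_jsonl(paths, **options):
    """Run magika on the given paths and return the parsed JSONL records."""
    if shutil.which("magika") is None:
        raise RuntimeError("magika executable not found on PATH")
    completed = subprocess.run(
        build_magika_command(paths, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_jsonl(completed.stdout)
```

With Magika installed, run_magika_jsonl(["examples"], recursive=True) would scan the directory tree and return one parsed record per line of output.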
Using Magika as a Python module
from magika import Magika
magika = Magika()
result = magika.identify_bytes(b"# Example\nThis is an example of markdown!")
print(result.output.ct_label) # Output: "markdown"
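The same call can be applied to several in-memory buffers at once. The sketch below is an illustration under stated assumptions: label_blobs and demo are hypothetical helper names, and the code relies only on the identify_bytes method and the result.output.ct_label attribute shown in the example above. Any object exposing that interface can be passed in, which makes the helper easy to exercise without loading the model.

```python
from typing import Dict


def label_blobs(identifier, blobs: Dict[str, bytes]) -> Dict[str, str]:
    """Map each name to the content-type label reported for its bytes.

    `identifier` is expected to behave like a Magika instance, i.e. to offer
    identify_bytes(content) returning an object with .output.ct_label.
    """
    return {
        name: identifier.identify_bytes(content).output.ct_label
        for name, content in blobs.items()
    }


def demo():
    """Label two small samples; requires the magika package to be installed."""
    from magika import Magika

    samples = {
        "notes": b"# Example\nThis is an example of markdown!",
        "script": b"import os\n\nprint(os.getcwd())\n",
    }
    for name, label in label_blobs(Magika(), samples).items():
        print(f"{name}: {label}")
```

Calling demo() creates a single Magika instance and reuses it for every sample, which avoids loading the model more than once.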
Citation
If you use this software in your research, please cite it as follows:
@software{magika,
author = {Fratantonio, Yanick and Bursztein, Elie and Invernizzi, Luca and Zhang, Marina and Metitieri, Giancarlo and Kurt, Thomas and Galilee, Francois and Petit-Bianco, Alexandre and Farah, Loua and Albertini, Ange},
title = {{Magika content-type scanner}},
url = {https://github.com/google/magika}
}
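Project details
Download files
Download the file for your platform. If you are not sure which one to choose, see the Python packaging documentation on installing packages.
Source distribution
magika-0.5.1.tar.gz (1.0 MB)
Built distribution
magika-0.5.1-py3-none-any.whl (1.0 MB)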
Hashes for magika-0.5.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 43dc1153a1637327225a626a1550c0a395a1d45ea33ec1f5d46b9b080238bee0
MD5 | 278586fcc194faa4b2b3df09961c7654
BLAKE2b-256 | 1a58c1d8887354d0ff2256d4d78d08a69bcc55719a0189afa706c51da04390f2
Hashes for magika-0.5.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a4d1f64f71460f335841c13c3d16cfc2cb21e839c1898a1ae9bd5adc8d66cb2b
MD5 | b7198531cbbf7985862259bb10653a2f
BLAKE2b-256 | 6679e1c167ec35060692b70bfc4f2d0aa9314dd7e37ba8e30c1c27965e2f1daa
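A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; sha256_hex, matches_sha256 and demo are hypothetical helper names, and the local path used in demo is an assumption about where the file was saved.

```python
import hashlib

# SHA256 digest listed above for magika-0.5.1.tar.gz.
SDIST_SHA256 = "43dc1153a1637327225a626a1550c0a395a1d45ea33ec1f5d46b9b080238bee0"


def sha256_hex(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_hex(path) == expected_hex.strip().lower()


def demo():
    """Check a local copy of the source distribution (path is an assumption)."""
    ok = matches_sha256("magika-0.5.1.tar.gz", SDIST_SHA256)
    print("digest matches" if ok else "digest MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size. SHA256 is the digest worth checking here; MD5 is no longer considered collision-resistant.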