Production-ready LASER multilingual embeddings

LASER embeddings

Out-of-the-box multilingual sentence embeddings.

laserembeddings is a pip-packaged, production-ready port of Facebook Research's LASER (Language-Agnostic SEntence Representations) to compute multilingual sentence embeddings.

Have a look at the project's repo (master branch) for the full documentation.

Getting started

You'll need Python 3.6 or higher.

Installation

pip install laserembeddings

To install laserembeddings with extra dependencies:

# if you need Chinese support:
pip install laserembeddings[zh]

# if you need Japanese support (not available on Windows):
pip install laserembeddings[ja]

# or both:
pip install laserembeddings[zh,ja]

Downloading the pre-trained models

python -m laserembeddings download-models

This will download the models to the default data directory next to the source code of the package. Use python -m laserembeddings download-models path/to/model/directory to download the models to a specific location.
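If the download has to run from a setup script rather than a shell, the same command can be launched with the interpreter that has laserembeddings installed. A minimal sketch, relying only on the two command forms shown above; `build_download_command` and `download_models` are hypothetical helper names, not part of the package:

```python
import subprocess
import sys


def build_download_command(target_dir=None):
    """Return the argv list for `python -m laserembeddings download-models`."""
    # sys.executable points at the interpreter running this script,
    # so the package is looked up in the same environment
    command = [sys.executable, "-m", "laserembeddings", "download-models"]
    if target_dir is not None:
        # Optional positional argument: where to store the models
        command.append(str(target_dir))
    return command


def download_models(target_dir=None):
    """Run the download command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_download_command(target_dir), check=True)
```

Calling `download_models()` with no argument corresponds to the default data directory; passing a directory corresponds to the second form of the command.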

Usage

from laserembeddings import Laser

laser = Laser()

# if all sentences are in the same language:

embeddings = laser.embed_sentences(
    ['let your neural network be polyglot',
     'use multilingual embeddings!'],
    lang='en')  # lang is only used for tokenization

# embeddings is an N x 1024 NumPy array (N = number of sentences)

If the sentences are not all in the same language, you can pass a list of languages, one per sentence:

embeddings = laser.embed_sentences(
    ['I love pasta.',
     "J'adore les pâtes.",
     'Ich liebe Pasta.'],
    lang=['en', 'fr', 'de'])
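Because the embeddings are meant to be language-agnostic, a common sanity check is to compare the vectors of sentences that are translations of each other. The sketch below is not part of laserembeddings: `cosine_similarity` is a hypothetical helper written with the standard library only, and it accepts any two equal-length sequences of numbers, including rows of the array returned by embed_sentences.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# With the embeddings from the example above (requires the downloaded models):
# en_fr = cosine_similarity(embeddings[0], embeddings[1])
# en_de = cosine_similarity(embeddings[0], embeddings[2])
```

Values closer to 1 mean the two vectors point in nearly the same direction; the actual scores depend on the sentences and on the model.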

If you downloaded the models into a specific directory:

from laserembeddings import Laser

path_to_bpe_codes = ...
path_to_bpe_vocab = ...
path_to_encoder = ...

laser = Laser(path_to_bpe_codes, path_to_bpe_vocab, path_to_encoder)

# you can also supply file objects instead of file paths
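When the three files live in one directory, the paths can be built in one place and checked before constructing Laser. A minimal sketch: `resolve_model_paths` is a hypothetical helper, and the default file names are an assumption about what download-models writes, so check them against the contents of your directory and override them if they differ.

```python
from pathlib import Path


def resolve_model_paths(model_dir,
                        bpe_codes_name="93langs.fcodes",
                        bpe_vocab_name="93langs.fvocab",
                        encoder_name="bilstm.93langs.2018-12-26.pt"):
    """Return (bpe_codes, bpe_vocab, encoder) paths inside model_dir.

    The default file names are assumptions; pass your own if they differ.
    """
    model_dir = Path(model_dir)
    paths = tuple(model_dir / name
                  for name in (bpe_codes_name, bpe_vocab_name, encoder_name))
    # Fail early with a clear message instead of deep inside model loading
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing model files: " + ", ".join(missing))
    return paths


# Usage (requires laserembeddings and the downloaded models):
# from laserembeddings import Laser
# laser = Laser(*(str(p) for p in resolve_model_paths("path/to/model/directory")))
```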

If you want to pull the models from S3:

from io import BytesIO, StringIO
from laserembeddings import Laser
import boto3

s3 = boto3.resource('s3')
MODELS_BUCKET = ...

f_bpe_codes = StringIO(s3.Object(MODELS_BUCKET, 'path_to_bpe_codes.fcodes').get()['Body'].read().decode('utf-8'))
f_bpe_vocab = StringIO(s3.Object(MODELS_BUCKET, 'path_to_bpe_vocabulary.fvocab').get()['Body'].read().decode('utf-8'))
f_encoder = BytesIO(s3.Object(MODELS_BUCKET, 'path_to_encoder.pt').get()['Body'].read())

laser = Laser(f_bpe_codes, f_bpe_vocab, f_encoder)
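The S3 example follows a pattern that does not depend on the storage backend: the BPE codes and vocabulary are decoded to text and wrapped in StringIO, while the encoder stays binary in BytesIO. A sketch of that pattern with the byte source abstracted away; `open_model_streams` and `read_bytes` are hypothetical names, and `read_bytes` stands for any callable that returns the raw bytes stored under a key.

```python
from io import BytesIO, StringIO


def open_model_streams(read_bytes, bpe_codes_key, bpe_vocab_key, encoder_key):
    """Return (f_bpe_codes, f_bpe_vocab, f_encoder) file objects.

    read_bytes: callable mapping a key to the raw bytes stored under it.
    """
    # The BPE files are text, so decode them as UTF-8 like the S3 example does
    f_bpe_codes = StringIO(read_bytes(bpe_codes_key).decode("utf-8"))
    f_bpe_vocab = StringIO(read_bytes(bpe_vocab_key).decode("utf-8"))
    # The encoder file is binary and is passed through unchanged
    f_encoder = BytesIO(read_bytes(encoder_key))
    return f_bpe_codes, f_bpe_vocab, f_encoder


# With the boto3 objects from the example above, read_bytes could be:
# read_bytes = lambda key: s3.Object(MODELS_BUCKET, key).get()['Body'].read()
# laser = Laser(*open_model_streams(read_bytes,
#                                   'path_to_bpe_codes.fcodes',
#                                   'path_to_bpe_vocabulary.fvocab',
#                                   'path_to_encoder.pt'))
```

Keeping the byte source behind a callable also makes the loading step easy to exercise with an in-memory dictionary instead of a real bucket.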

laserembeddings 1.0.1a1
