Multilingual pretrained word embeddings

From National Research Council Canada

Alternative title: Plongements de mots pré-entraînés
Author affiliation: National Research Council of Canada. Digital Technologies
Format: Text, Dataset
Physical description: 14 .tgz files – approximately 65 GB total size
Subject: YiSi; embeddings; machine translation; bleu score; NRC portage
Files
Format  Size     Last updated
TGZ     3 GB     2019-05-31
TGZ     1 GB     2019-05-31
TGZ     2 GB     2019-05-31
TGZ     9 GB     2019-05-31
TGZ     7 GB     2019-06-03
TGZ     876 MB   2019-06-03
TGZ     4 GB     2019-06-03
TGZ     87 MB    2019-06-03
TGZ     691 MB   2019-06-03
TGZ     515 MB   2019-06-03
TGZ     248 MB   2019-06-03
TGZ     1 GB     2019-06-03
TGZ     492 MB   2019-06-03
TGZ     472 MB   2019-06-03
Publication date
Date created: 2018
Publisher: National Research Council of Canada
Related publication
Note: This dataset supplements the following article: Chi-kiu Lo, Michel Simard, Darlene Stewart, Samuel Larkin, Cyril Goutte and Patrick Littell (2018). “Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the Parallel Corpus Filtering task.” Third Conference on Machine Translation (WMT 2018). Brussels, Belgium: Nov 2018.
Collection: NRC Research Data
Record identifier: 41bc88cd-5362-4d43-b4fd-61ef661018c8
Record created: 2019-05-23
Record modified: 2020-06-04