Multilingual pretrained word embeddings

From National Research Council Canada

Alternative title: Plongements de mots pré-entraînés
Author: ORCID identifier: https://orcid.org/0000-0001-8714-7846 (affiliation 1)
Name affiliation
  1. National Research Council Canada. Digital Technologies
Format: Text
Type: Dataset
ISBN: 978-1-948087-81-0
Physical description: 14 .tgz files – approximately 65 GB total size
Subject: YiSi; embeddings; machine translation; BLEU score; NRC Portage
Abstract
Downloads
Format  Size    Last Updated
TGZ     3 GB    2019-05-31
TGZ     1 GB    2019-05-31
TGZ     2 GB    2019-05-31
TGZ     9 GB    2019-05-31
TGZ     7 GB    2019-06-03
TGZ     876 MB  2019-06-03
TGZ     4 GB    2019-06-03
TGZ     87 MB   2019-06-03
TGZ     691 MB  2019-06-03
TGZ     515 MB  2019-06-03
TGZ     248 MB  2019-06-03
TGZ     1 GB    2019-06-03
TGZ     492 MB  2019-06-03
TGZ     472 MB  2019-06-03
Publication date
Date created: 2018
Publisher: National Research Council of Canada
Licence
Related publication
Note: This dataset supplements the following article: Chi-kiu Lo, Michel Simard, Darlene Stewart, Samuel Larkin, Cyril Goutte and Patrick Littell (2018). “Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the Parallel Corpus Filtering task.” Third Conference on Machine Translation (WMT 2018), Brussels, Belgium, November 2018. https://nrc-publications.canada.ca/eng/view/object/?id=6a17ac14-d76b-4b32-9343-93b03c77ca0d
Language: English
Collection: NRC Research Data
Record identifier: 41bc88cd-5362-4d43-b4fd-61ef661018c8
Record created: 2019-05-23
Record modified: 2020-02-10