Coding for Language Communities

Open Edge Hardware and Software for Natural Language Translation and Understanding

The last half decade has seen a major increase in the accuracy of deep learning methods for natural language translation and understanding. However, many users still interact with these systems through proprietary models served on specialized cloud hardware. In this talk we discuss co-design efforts between researchers in natural language processing and computer architecture to develop an open-source software/hardware system for natural language translation and understanding across languages. With this system, users can access state-of-the-art models for translation, speech, and classification, and also run these models efficiently on open-hardware edge devices.

Our work combines two open-source development efforts, OpenNMT and FlexNLP. The OpenNMT project is a multi-year collaborative effort to create an ecosystem for neural machine translation and neural sequence learning. Started in December 2016 by the Harvard NLP group and SYSTRAN, the project has since been used in many research and industry applications. It includes highly configurable model architectures and training procedures, efficient model serving capabilities for use in real-world applications, and extensions to tasks such as text generation, tagging, summarization, image-to-text, and speech-to-text.

FlexNLP is an open-source, fully retargetable hardware accelerator for natural language processing. Its hardware design targets key NLP computational functions, such as attention mechanisms and layer normalization, that are often overlooked by today's CNN- or RNN-oriented hardware accelerators. FlexNLP's rich instruction set architecture and microarchitecture enable the diverse set of computations and operations that are paramount for end-to-end inference on state-of-the-art attention-based NLP models. Together, the two projects provide an open pipeline for both model training and edge-device deployment.
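
To illustrate the software side of this pipeline, the sketch below shows how a translation model trained with OpenNMT might be run for inference after conversion to CTranslate2, the OpenNMT ecosystem's lightweight inference engine. This is a minimal, hypothetical example, not the exact deployment path described in the talk: the model and tokenizer paths are placeholders, the API shape follows recent CTranslate2 and SentencePiece releases, and the FlexNLP hardware back end is not involved here.

```python
# Hypothetical inference sketch for an OpenNMT-trained model converted to CTranslate2.
# Paths are placeholders; a real setup would point at an exported model directory
# and the SentencePiece model used during training.
import ctranslate2
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")   # subword tokenizer
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")

def translate(sentence: str) -> str:
    tokens = sp.encode(sentence, out_type=str)        # tokenize into subword pieces
    results = translator.translate_batch([tokens])    # beam search on the local device
    best = results[0].hypotheses[0]                   # top-scoring hypothesis
    return sp.decode(best)                            # detokenize back to plain text

print(translate("The model runs entirely on the local device."))
```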
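
To make concrete which computations FlexNLP is built around, the following sketch implements scaled dot-product attention and layer normalization in plain NumPy. It is an illustrative reference for the math only, not FlexNLP's instruction set or the authors' implementation; the tensor shapes and epsilon value are assumptions chosen for the toy example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise similarity scores
    scores -= scores.max(axis=-1, keepdims=True)       # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of value vectors

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each vector to zero mean and unit variance, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Toy example: 4 query/key/value vectors of width 8 (sizes are arbitrary).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = layer_norm(scaled_dot_product_attention(Q, K, V),
                 gamma=np.ones(8), beta=np.zeros(8))
print(out.shape)  # (4, 8)
```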

Additional information

Type: devroom

More sessions

2/1/20
Coding for Language Communities
Alberto Massidda
AW1.120
We present: 1) a full pipeline for unsupervised machine translation training (making use of monolingual corpora) for languages with few available resources; 2) a translation server built on that unsupervised MT, with an HTTP API compatible with the Moses toolkit, a once-prominent MT system; 3) a Docker-packaged version of the EU-funded free Computer Aided Translation (CAT) tool MateCAT for ease of deployment. This full translation pipeline enables a non-technical user, speaking a non-FIGS ...
2/1/20
Coding for Language Communities
Lydia Pintscher
AW1.120
Wikidata, Wikimedia's knowledge base, has been collecting general-purpose data about the world for 7 years now. This data powers Wikipedia but also many applications outside Wikimedia, like your digital personal assistant. In recent years Wikidata's community has also started collecting lexicographical data in order to provide a large machine-readable data set about words in hundreds of languages. In this talk we will explore how Wikidata enables thousands of volunteers to describe their ...
2/1/20
Coding for Language Communities
Sander van Geloven
AW1.120
Nuspell version 3 is a FOSS spell checker written in pure C++17. It extensively supports character encodings, locales, compounding, affixing and complex morphology. Existing spell checking in web browsers, office suites, IDEs and other text editors can use it as a drop-in replacement. Nuspell supports 90 languages, suggestions and personal dictionaries.
2/1/20
Coding for Language Communities
Michal Čihař
AW1.120
Please note that this talk will now be given by Michal Čihař instead of Václav Zbránek. The presentation will show you how to localize your project easily and with little effort, the open-source way. Why did we start Weblate? We said no to repetitive work and no to manual handling of translation files. Weblate is unique for its tight integration with VCS. Set it up once and start engaging the community of translators. More languages translated means more happy users of your software. Be like ...
2/1/20
Coding for Language Communities
Peter Bouda
AW1.120
The Poio project develops language technologies to support communication in lesser-used and under-resourced languages on and with electronic devices. Within the Poio project we develop text input services with text prediction and transliteration for mobile devices and desktop users to allow conversation between individuals and in online communities.