Block III – Documentation, computer-assisted translation and machine translation tools applied to
technical-scientific translation. Software localisation.
Unit 5 – Documentation, CAT, TM and MT and localisation
1. Documentation.
1.1 Evaluating your sources (Shaheem et al. 2017):
a) Which quality criteria do you follow to do searches on the Internet?
b) What are scholarly and non-scholarly sources?
c) What are the pros and cons of these sources? Can they be useful?
- Google
- Google Scholar
- A blog
- A personal webpage
- A journal
- A popular magazine
- A glossary
- A company website
- Wikipedia
- An academic book
d) Number of documents per day: 500,000,000 tweets, 2,000,000 blog posts, 144,000 hours of video,
4,931 articles.
e) Review time:
- Social media: no review
- Blogs/personal pages: 1-2 reviewers; time: hours/days
- Academic books, articles, etc.: 3-4 reviewers; time: months/years
f) Number of outside sources:
- Social media: 0
- Blogs/personal pages: around 5
- Academic books, articles, etc.: 15-30.
1.2 Source evaluation criteria (Mayoral 1997):
a) Reliability: Authority - Information provided by specialists and specialised companies - Bodies
with a standardising role - Everyday practice
b) Accessibility: A source that is reliable but hard to access is not suitable.
c) Originality: Preferably written from scratch (ex novo), not translated.
d) Specificity: The more specialised the source, the more reliable it is. If it is bilingual, it should
contain definitions.
e) Exhaustiveness: A source that is not exhaustive enough wastes our time.
f) Corpus: Based on real texts.
1.3 Google searches:
- Exact phrase: “mill spindle”
- Excluding terms: “mill spindle” -linguee -reverso
- OR, NOT: metal OR wood; metal -wood (NOT is written as a minus sign)
- Synonyms: plumbing ~university -> includes colleges
- Site: site:boe.es, site:uk
- Between: chemical engineering articles 2010..2018
- Body, title, URL: inurl:boe, intext:“end mill”, intitle:review
- Related: related:boe.es
- File type: filetype:pdf
- Pictures
- Combined: ~HTML -HTML (only synonyms); site:intel.com related:microchip filetype:pdf
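The operators above are plain text, so a combined query can be assembled programmatically. The following is a minimal sketch, not an official Google interface: the helper names build_query and search_url are hypothetical, and the only assumption about Google is the public search address https://www.google.com/search?q=.

```python
from urllib.parse import quote_plus


def build_query(exact=None, words=(), exclude=(), site=None, filetype=None):
    """Assemble a search string from some of the operators listed above."""
    parts = []
    if exact:
        parts.append(f'"{exact}"')                # exact phrase
    parts.extend(words)                           # plain keywords
    parts.extend(f"-{term}" for term in exclude)  # excluded terms
    if site:
        parts.append(f"site:{site}")              # restrict to a site or domain
    if filetype:
        parts.append(f"filetype:{filetype}")      # restrict to a file type
    return " ".join(parts)


def search_url(query):
    """Encode the query so it can be pasted into a browser address bar."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_query(
    exact="mill spindle",
    exclude=["linguee", "reverso"],
    filetype="pdf",
)
print(query)
print(search_url(query))
```

The first line printed is the query as you would type it in the search box; the second is the same query with quotes, spaces and colons percent-encoded for a URL.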
2. CAT, TM and MT.
CAT (Spanish: TAO): computer-assisted translation.
TM (Spanish: MT): translation memory.
MT (Spanish: TA): machine translation.
- Also: text extractors, aligners, memory editors, project management, terminology modules
(MultiTerm and online), APIs for MT, quality control (Xbench and others), file conversion...
- Many people say CAT tools only help when a text has many internal repetitions and the databases
are already full. But: they are useful even when starting from zero, because they keep the translation consistent.
- Higher productivity, lower risk of breaking or deleting tags. But: sometimes less freedom to move
segments and paragraphs around, and maybe less brain activity.
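To illustrate why a translation memory helps even when it starts empty, here is a minimal sketch of a fuzzy-match lookup. It is not how any particular CAT tool works: the function best_match, the 0.75 threshold and the sample segments are assumptions made for illustration, and similarity is measured with Python's standard difflib module rather than a commercial matching algorithm.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.75):
    """Return (source, target, score) for the closest stored segment, or None."""
    best = None
    for source, target in memory.items():
        # Character-level similarity between 0.0 (different) and 1.0 (identical)
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


# The memory starts empty and grows as segments are confirmed.
memory = {}
memory["Tighten the spindle nut."] = "Apriete la tuerca del husillo."

# A later, similar segment retrieves the earlier translation as a fuzzy match.
hit = best_match("Tighten the spindle nut firmly.", memory)
if hit:
    source, target, score = hit
    print(f"{score:.0%} match: {source} -> {target}")
```

After a single confirmed segment the memory already suggests "tuerca del husillo" for the next similar sentence, which is the consistency benefit described above.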
2.1 Machine/automatic translation:
Definition: The automatic translation of a word, phrase, document or group of documents using online or
offline software, with or without linked glossaries or terminology databases.
History: First steps in the 1950s-60s > Systran used by the EC in 1976 > first commercial systems in the 1980s,
rule-based > first statistical and corpus-based systems in the 1990s > first consumer systems around 2006
(Google Translate) > first neural network-based systems in 2016.
Types:
a) Rules and/or corpora (grammar and lexical rules).
b) Statistics (phrase-based probability).
c) Neural networks: probability p(x | 'My flight is delayed', 'Mi vuelo lleva') -> 'retraso' (most likely x).
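The neural case in c) can be sketched as picking the candidate x with the highest conditional probability. The probabilities below are invented for illustration; a real system computes them with a neural network over its whole vocabulary, and the function name most_likely_next is hypothetical.

```python
def most_likely_next(distribution):
    """Return the candidate word with the highest conditional probability."""
    return max(distribution, key=distribution.get)


source = "My flight is delayed"
prefix = "Mi vuelo lleva"

# Hypothetical values of p(x | source, prefix), made up for this example
p_next = {
    "retraso": 0.81,
    "tiempo": 0.09,
    "mucho": 0.07,
    "equipaje": 0.03,
}

# A probability distribution must add up to 1
assert abs(sum(p_next.values()) - 1.0) < 1e-9

word = most_likely_next(p_next)
print(f"{prefix} {word}")  # prints "Mi vuelo lleva retraso"
```

Repeating this step, each time appending the chosen word to the prefix, builds the translation one word at a time.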
Translators:
- Google Translate: neural.
- DeepL: neural (> Linguee).
- Bing: neural.
- Systran online: hybrid neural-statistical.
- Apertium: rule-based.
- Moses: statistical.
- Yandex: hybrid neural-statistical.
- PROMT: hybrid rules-neural (> Reverso).