Hello everyone! I am currently working on a piece of software in which I need to align two sets of segments in different languages.
I've been considering different methods, such as tracking numbers, punctuation, segment length, special terms, DNT sets and so on. But now I am starting to feel that I would have to pre-translate the source into the target language to align the segments properly. This is where 'root dictionaries' (I am not sure about the term) would be extremely handy. For example, if we have the segment 'Abdominal pain' and need to align it with, say, the Russian 'боль в животе', it would be handy to have a dictionary where 'abdom' corresponds to 'жив' and 'брюш', and 'pain' to 'бол' and maybe 'страд'.
Has anyone heard of such dictionaries? I am especially interested in the English-Ukrainian pair.
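The root-matching idea in this post can be sketched roughly as follows. This is only an illustration, assuming a hand-made root dictionary: the `ROOTS` table and the `root_score` function are hypothetical names, and the four entries are simply the ones from the 'Abdominal pain' example. A word counts as matched when it starts with a known root, which is a crude stand-in for real stemming.

```python
# Hypothetical root dictionary: source-language root -> candidate target roots.
# The entries are taken from the example in the post, nothing more.
ROOTS = {
    "abdom": ["жив", "брюш"],
    "pain": ["бол", "страд"],
}


def root_score(source, target, roots):
    """Return the fraction of source words with a known root for which
    at least one translated root starts some word of the target segment."""
    src_words = source.lower().split()
    tgt_words = target.lower().split()
    known = 0
    matched = 0
    for word in src_words:
        # Collect the target roots of every dictionary root that prefixes this word.
        candidates = [stem
                      for root, stems in roots.items()
                      if word.startswith(root)
                      for stem in stems]
        if not candidates:
            continue  # no dictionary evidence for this word
        known += 1
        if any(t.startswith(stem) for stem in candidates for t in tgt_words):
            matched += 1
    return matched / known if known else 0.0


print(root_score("Abdominal pain", "боль в животе", ROOTS))
print(root_score("Abdominal pain", "головная боль", ROOTS))
```

A score like this would probably work best as one more signal next to numbers, punctuation and segment length, not as a replacement for them.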
Hi Mikhail,
I have been using the LF Aligner program for many years. There are such files in the "\scripts\hunalign\data\raw" folder. For example, en.txt contains 19277 lines:
===
@
1st
a bird in the hand is worth two in the bush
a cappella
a drop in the bucket
a friend in need is a friend indeed
a journey of a thousand miles begins with a single step
a little
a lot
...
zoology
zoonosis
zootechnics
Zoroastrianism
zucchini
Zürich
===
The script "dicmaker.pl" creates a bilingual dictionary, e.g. en-ru.dic:
===
1-ый @ 1st
а капелла @ a cappella
капля в море @ a drop in the bucket
друг познаётся в беде @ a friend in need is a friend indeed
путь в тысячу вёрст начинается с первого шага @ a journey of a thousand miles begins with a single step
немного @ a little
много @ a lot
===
You have to translate a list of stop words. Take three languages as sources and create one Ukrainian file. The system will then create a *.dic for any language pair for which you have a file in the \raw\ folder. You can use MT to translate from the three languages and edit the results into one uk.txt file. Each alignment program uses its own list of stop words.
Milan
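How the raw word lists might be combined into a dictionary can be sketched as below. This is not the actual dicmaker.pl logic, which is not shown in this thread. It is a minimal sketch assuming that the raw files are line-aligned (line n of en.txt corresponds to line n of the other language's file), that a missing translation is a blank line, and that each entry is written as "target @ source" as in the en-ru.dic sample. The function name `make_dic` is hypothetical.

```python
def make_dic(source_lines, target_lines):
    """Pair two line-aligned word lists into 'target @ source' entries.

    Assumption: both lists have one term per line in the same order,
    and a blank line means that no translation is available.
    """
    entries = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:  # skip terms that lack a counterpart
            entries.append(f"{tgt} @ {src}")
    return entries


# Tiny illustration using terms from the samples above.
en = ["1st", "a bird in the hand is worth two in the bush", "a cappella"]
ru = ["1-ый", "", "а капелла"]
for entry in make_dic(en, ru):
    print(entry)
```

Under these assumptions, a uk.txt file prepared in the same line order could be paired with en.txt in the same way.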
Samuel Murray (Netherlands)
@Mikhail
Aug 31, 2022
Mikhail Sergievskiy wrote: Has anyone heard anything about such dictionaries?
Related concepts are called "word stems", "lexemes" and "roots" (see Wikipedia).
Hunspell spelling dictionaries often contain roots along with codes that indicate potential prefixes and suffixes, but of course, Hunspell spelling dictionaries are monolingual.
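As a rough illustration of pulling roots out of a Hunspell word list: a .dic file starts with a line giving the approximate entry count, followed by one entry per line in the form "word/FLAGS", where the flags refer to affix rules in the accompanying .aff file. The sketch below is simplified and makes assumptions: the function name `hunspell_stems` is hypothetical, entries are assumed to be single tokens, and escaped slashes and the affix rules themselves are ignored.

```python
def hunspell_stems(dic_text):
    """Extract the bare entries (without affix flags) from Hunspell .dic text.

    Simplifying assumptions: the first line is the entry count, each entry
    is a single token, and anything after '/' or whitespace is discarded.
    """
    stems = []
    for line in dic_text.splitlines()[1:]:  # skip the count line
        line = line.strip()
        if not line:
            continue
        head = line.split("/", 1)[0].split()
        if head:
            stems.append(head[0])
    return stems


sample = "3\nabdomen/S\npain/SDG\nhello\n"
print(hunspell_stems(sample))
```

The result is still monolingual, so each entry would need to be paired with its counterpart in the other language by some other means.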