
Guidelines for maintainers of wordlists

Creation & Curation of a MediaGlyphs translation

Online editing system

To edit a wordlist, please use the MG database editing system (coming soon).

Guidelines for curators and maintainers of wordlists

When creating or curating (cleaning up) a translation of glyphs in a natural language, please follow these guidelines:

  1. attention to word class

    Many English/Chinese verbs can also function as nouns. The same happens in MG, with the noun meaning "an act or the action of ...ing". So most entries marked as verbs could also be translated as nouns in your language. The entries are marked with MORphology codes (reference table).

    In this case we'd like to have both the noun form and the verb form, possibly with a coherent comment '(N:)' to indicate the noun forms.

    The rule of thumb is that if the possible noun means "to be doing the action" expressed by the verb, then both forms should be inserted.

    e.g.:

      eng: run (continued rapid movement)
      MOR: verb
      ita: correre | (N:) corsa

    but not:
      eng: swim
      MOR: verb
      ita: nuotare | (N:) nuotata | nuoto
    (nuotata means "a swim" and is ok, but "nuoto" is the name of the sport and should be used for a separate specific entry in the composite list)

    also not:
      eng: slide
      ita: scivolare | (N:) scivolo

    ("Scivolo" does not mean the action of sliding but the object found in gardens on which kids can slide in play, which is one of the meanings of the eng word "slide" but not the one we want. "Scivolata" would be ok, since it means "sliding event")

    also not:
      eng: cook
      ita: cucinare | (N:) cuoco

    ("Cuoco" is "the cook", not "the cooking", i.e. the person who cooks. This is covered by an entry in the phrase wordlist. The correct word to add would be "cottura", which means the act of cooking.)

    If your language uses a noun to cover MG's adjective, then use the noun.
    The same applies for other cases of a different class.

    When in doubt, ask.

  2. different forms (for languages with high morphology)

    Apart from what is said in point 1), different forms of the same word should be avoided (the full range of inflections will be captured by rules in language-specific morphology plugins for the input system). Hence different forms for tense/mood inflections in verbs, or for number in nouns/adjectives, need not be specified.

    e.g.

      eng: run
    no need to add the forms "ran" or "runs" or "running"
      eng: shoe
    no need to add "shoes" or "the shoes"
      ita: leggere | (N:) lettura
    no need to add "lessi | leggo | leggiamo | leggesse....."

    Personal pronouns constitute a special case for which we'd like to specify all forms (in case of gender distinction). Note that the "phrases" wordlist contains specific pronouns (with gender and number distinctions and also possessive forms).

    e.g.

      eng: I | me
    for 1st person unspecified gender (MG @b)
      eng: she | her
    for 3rd person specified gender (MG [@v@m])

    Verbs are usually inserted in the infinitive form. For languages which have no infinitive, follow the convention of your language: for example, insert the 1st person singular of the present tense, as in ell (Greek) and bul (Bulgarian).

      e.g. ita: andare

    If pronouns, nouns or adjectives have gender forms and you feel like adding all of them, the suggested order and form is:

    neuter | feminine | masculine
    or
    feminine | masculine
    No need for plural forms.

      e.g. ita: allegra | allegro

    If instead you feel like inserting just one form, then it's ok to use the one appearing in the dictionary (sadly it's usually the masculine one).

    An alternative approach can be:

    neuter <-feminine_ending | -masculine_ending>
      e.g. ell: ένοχο <-χη | -χος>

  3. which words to use

    When multiple words in the eng definition (or in the definition you are using as basis for translation) seem to belong to different concepts, try to infer the common semantic area, the common concept shared by the various words and find the words in your language that best represent that concept.
    If you can identify one word that correctly represents the various concepts, use it. Otherwise, separately translate the different words in the eng definition. It is important not to diverge too much from the original meaning area that we wish to cover.

    e.g.

      eng: team | crew | squad
      ita: squadra | equipaggio
    but not
      ita: squadrone | commando

    Another example:

      eng: chief | leader | boss
      deu: Führer | Führerin | Chef | Chefin | Vorgesetzter | Vorgesetzte

    When there is more than one word corresponding to the same concept in your language, list all words.

      eng: race (group of people with similar characteristics)
      ita: razza | etnia

      eng: chest
      ita: busto | petto | torace

    Use common, general words and avoid peculiar or little-known ones. Don't use words which are too general, too ambiguous or too specific.

    Of course the mapping is never perfect. Two languages, even if similar, have different ways of expressing the various concepts. We are trying to have the best mapping, with the majority of the semantic area covered, without covering too much outside.

    Nevertheless, the use of synonyms is highly encouraged, because there are different ways in which people express the same thing and it's good to have them find the correct glyph when they are looking for it. Hence putting more choices is usually good, as long as they do not deviate from the meaning we are trying to express in the glyph.

    For example, using medical terms for parts of the body is accepted and encouraged, as long as they are placed together with (possibly after) the common words for their concepts.

  4. synopsis, how to write entries, use of () [] {} <>

    Words that shouldn't be used as keywords for typing or for looking up entries in the wordlist should be shielded by () or [].

    e.g.

      eng: vein | [blood] vessel
      eng: soft | malleable | yielding [to pressure]
      eng: seem | appear [to be] | [give the] impression (of)
      eng: approve (of)

    In these examples the words "blood, to, pressure, be, give, the, of" won't be used to index these concepts. If these words were not shielded the following example cases would happen:

    • the wordlist index would have an entry "of" with a lot of completely unrelated meanings
    • typing "blood" you'd be presented with vein, typing "give" you'd be presented with "appear".

    It is therefore very important to shield with () or [] those words unrelated to the concept.

    e.g.

      eng: fear
      ita: [avere] paura

      eng: succeed
      deu: Erfolg [haben] | gelingen

    What then is the difference between using [] and ()?

    The [] will be removed in the dictionary explanation entries, while the () will be kept. So when looking up the glyph for "vein", the translation "vein, blood vessel" will be written. "vessel" and "vein" will both be keywords in the wordlist index and will lead to the same glyph.

    On the other hand, an entry like "approve (of)" will be kept as it is, to indicate that both "approve" and "approve of" are valid meanings. The () shield optional words that are part of the meaning.
    The [] shield words which are needed by the meaning but should not be indexed. In the explanation the [] brackets themselves are removed, while the words inside them stay (as in "blood vessel" above).

    Also () is used for semantic comments. Hence the entry

      eng: believe (accept as true)

    will produce "believe" as keyword but the comment in ()-brackets will be left for the explanatory page.
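    The difference between [] and () on the explanatory page could be sketched as follows. This is only an illustration, not the actual MG software: the function name explanation is hypothetical, and the sketch assumes that synonyms are joined with ", " as in the "vein, blood vessel" example and that no " | " appears inside a shielded part. It does not cover the Japanese and Chinese wordlists, where [] has a different role.

```python
import re


def explanation(entry: str) -> str:
    """Render one wordlist entry for the explanatory page (hypothetical helper)."""
    # Drop the [] brackets themselves but keep the words inside them.
    text = re.sub(r"\[([^\[\]]*)\]", r"\1", entry)
    # () parts are left untouched; synonyms separated by " | " are
    # joined with ", " (assumption based on the "vein" example).
    return ", ".join(part.strip() for part in text.split(" | "))


print(explanation("vein | [blood] vessel"))
print(explanation("approve (of)"))
print(explanation("believe (accept as true)"))
```

    With these inputs the [] disappear around "blood", while "(of)" and "(accept as true)" are left in place.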

    Remember that all words separated by " " (blank space) or " | " will be used as keywords for the wordlist index and input programs, unless they are shielded between () [] <> or {}. Synonyms (multiple translations) are separated by " | " (note the format with a single blank on each side of the "pipe" ("|") character: "BLANK+PIPE+BLANK").
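    As a rough illustration of this rule, the keywords of an entry could be extracted like this. It is a minimal sketch, not the real indexing code: the names SHIELDED and keywords are made up for the example, and it assumes that brackets are balanced and never nested.

```python
import re

# One alternative per bracket type: () [] <> {}.
# Assumption: brackets are balanced and not nested.
SHIELDED = re.compile(r"\([^()]*\)|\[[^\[\]]*\]|<[^<>]*>|\{[^{}]*\}")


def keywords(entry: str) -> list[str]:
    """Return the unshielded words of one entry (hypothetical helper)."""
    # Remove every shielded part first, so that a "|" inside <> is ignored.
    unshielded = SHIELDED.sub(" ", entry)
    # Blanks and pipes both separate keywords.
    words = unshielded.replace("|", " ").split()
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(words))


print(keywords("seem | appear [to be] | [give the] impression (of)"))
print(keywords("vein | [blood] vessel"))
```

    Here "to", "be", "give", "the", "of" and "blood" never reach the index, because they are removed together with their brackets before the entry is split.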

    The Japanese and Chinese wordlists constitute a special case: the []-brackets enclose the romanization transliteration.

    e.g.

      eng: children
      cmn.t: 兒女 [er2 nu:3] | 孩子 [hai2 zi5]

    The <> ("less than", "greater than" signs) mark grammatical comments.

    e.g.

      eng: furniture
      deu: Möbel

      eng: hour
      fra: heure

    The {} (curly brackets) mark usage examples

    e.g.

      eng: deny (declare untrue) {they denied that they had robbed the house}

  5. more synopsis, general precautions

    Please observe the following guidelines when dealing with the wordlist:

    Don't nest [] () <> or {} (i.e. don't have cases like "[(abc) xa [ax] yz]").

    Don't leave [] () <> or {} unbalanced (e.g. "abc ]" or "abc [xyz"), close all the brackets that you open.

    Leave a blank space before and after [] () <> or {} (i.e. don't have "abc[lem]"), even if the shielded part is an affix (in this case it's best to use "()"):

      deu: Zwilling (-sbruder)

    If the entry is a suffix, type it as "-suffix", i.e. prepending a "-"

      ita: -issimo
      fin: -ille

    If the entry is a prefix, type it as "prefix-", i.e. appending a "-"

      eng: meta-

    If the entry is an infix, type it as "-infix-".
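    The hyphen convention makes it easy for a program to tell affixes from plain words. A minimal sketch (the function affix_kind is hypothetical, and the "-x-" infix is an invented placeholder):

```python
def affix_kind(entry: str) -> str:
    """Classify an entry by its leading/trailing hyphen (hypothetical helper)."""
    if not entry.strip("-"):
        # Nothing but hyphens (or empty): not a usable affix.
        return "word"
    starts = entry.startswith("-")
    ends = entry.endswith("-")
    if starts and ends:
        return "infix"
    if starts:
        return "suffix"
    if ends:
        return "prefix"
    return "word"


for sample in ["-issimo", "-ille", "meta-", "-x-", "run"]:
    print(sample, affix_kind(sample))
```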

    Use Unicode UTF-8 Encoding for all entries. Don't mix encodings.

    Avoid using the character " (double quote). If needed for quotations, use the single quote (') instead.
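    Most of these precautions can be checked automatically before an entry is submitted. The following is a sketch under stated assumptions, not the parser actually used by MG: check_entry and its messages are invented for the example, and the encoding rule is not checked because the entry is already a decoded string.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "<": ">", "{": "}"}
CLOSERS = set(OPEN_TO_CLOSE.values())


def check_entry(entry: str) -> list[str]:
    """Return a list of problems found in one entry (hypothetical helper)."""
    problems = []
    expected = None  # closing bracket we are waiting for, if any
    for pos, char in enumerate(entry):
        if char in OPEN_TO_CLOSE:
            if expected is not None:
                problems.append(f"nested bracket at {pos}")
            else:
                expected = OPEN_TO_CLOSE[char]
                if pos > 0 and entry[pos - 1] != " ":
                    problems.append(f"no blank before bracket at {pos}")
        elif char in CLOSERS:
            if char != expected:
                problems.append(f"unbalanced bracket at {pos}")
            else:
                expected = None
                if pos + 1 < len(entry) and entry[pos + 1] != " ":
                    problems.append(f"no blank after bracket at {pos}")
        elif char == '"':
            problems.append(f"double quote at {pos}")
    if expected is not None:
        problems.append("unclosed bracket")
    return problems


for sample in ["Zwilling (-sbruder)", "abc[lem]", "abc ]", "abc [xyz"]:
    print(repr(sample), check_entry(sample))
```

    An empty list means that none of the checked rules is violated.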


We thank you very much for your help. We are sorry to bother you with so many guidelines, but they are needed for a coherent result and easy automatic parsing.
 
 
First appearance: Wed Jun 19 17:52:26 BST 2002 - | - Last modified: Sun Jan 3 22:12:13 CET 2010
[MG: rain; rainfall | MG: fall; drop | MG: -ing; imperfect] rains; it's raining