
MG Encoding - Technical document v4.4

(specifications - rationale - parsing algorithm)

Contents

Instructions on how to encode and decode MediaGlyphs sentences are provided, together with specifications of the file locations used for displaying and linking images and explanation pages. An algorithm and sample Perl code are given at the bottom of the document.

Purpose of the encoding

Storage and transmission of codes representing glyphs, glyph-combinations and phonetic names.

Total Alphabet

[0-9] [a-z] [A-Z] [] {} @ ^ + = _

Alphabet explained

Glyph subset alphabet

[0-9] [a-z] [A-Z] {} @

Special symbols subset alphabet

[] ^ + =

Punctuation symbols that can appear in MG sentences

, . ; : ( ) ' " -

Additional symbols used

The " " (NO BREAK SPACE) can be used for human legibility but is squashed and ignored when parsing.

The "_" (LOW LINE) is used for compatibility with filesystems that do not differentiate between uppercase and lowercase letters. It is used in this way: all uppercase letters are followed by "_" to differentiate the filenames.
Hence "aa.png" is different from "A_A_.png" which is different from "aA_.png" and so on.
For transmission and storage of codes, the "_" is not needed, but for filenames (html pages, png files...) it is necessary.
Hence all MG encoded strings will be "escaped" by adding "_" and "unescaped" by removing it, as needed.

When parsing MG codes, " " and "_" are eliminated.
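The escaping rule above can be sketched in a few lines of Python. This is only an illustration, not the document's own sample code: the function names `escape` and `normalize` are hypothetical, and stripping the ordinary space in addition to NO BREAK SPACE is an assumption made for convenience.

```python
NBSP = "\u00a0"  # NO BREAK SPACE


def escape(code: str) -> str:
    """Return the filename form: every uppercase ASCII letter gets a trailing "_".

    Expects an unescaped code; escaping twice would double the underscores.
    """
    return "".join(ch + "_" if "A" <= ch <= "Z" else ch for ch in code)


def normalize(code: str) -> str:
    """Return the parsing form: drop NO BREAK SPACE and "_".

    Dropping the ordinary space as well is an assumption, not part of the spec.
    """
    return code.replace(NBSP, "").replace(" ", "").replace("_", "")


if __name__ == "__main__":
    print(escape("aA"))        # aA_
    print(normalize("A_A_"))   # AA
```

Because `normalize` removes every "_", it also serves as the "unescape" step: `normalize(escape(code))` gives back `code` for any unescaped input.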

Rationale

Two symbols from the "glyph subset alphabet" are required to specify a glyph.
E.g.: 7O qc w6 eN @f {j l{
all specify single glyphs.
NOTE: "@" will be used only as first symbol specifying a glyph, not appearing in second position.

There are hence 4160 (64*64 + 1*64) possible combinations of the "glyph subset alphabet" to encode a maximum of 4160 single glyphs. We don't expect to reach this maximum number, and instead we plan to keep the number of single glyphs around 2000.
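The count can be checked by enumeration. The sketch below is illustrative only; the names `FIRST`, `SECOND` and `is_glyph_code` are hypothetical. It builds the 64 symbols allowed in second position, adds "@" for the first position, and lists every pair.

```python
import string

# Symbols allowed in second position: [0-9] [a-z] [A-Z] { }  (64 symbols)
SECOND = string.digits + string.ascii_lowercase + string.ascii_uppercase + "{}"
# Symbols allowed in first position: the same 64 plus "@"  (65 symbols)
FIRST = SECOND + "@"


def is_glyph_code(code: str) -> bool:
    """True if `code` is two glyph-alphabet symbols with "@" only in first position."""
    return len(code) == 2 and code[0] in FIRST and code[1] in SECOND


all_codes = [a + b for a in FIRST for b in SECOND]

if __name__ == "__main__":
    print(len(SECOND), len(FIRST), len(all_codes))  # 64 65 4160
```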

The first symbol (of the two that specify a glyph) indicates the category that the glyph belongs to.
Hence "ja" and "ji" are glyphs in the same category ("numerals").
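Since the category is carried by the first symbol, grouping glyph codes by category needs only that symbol. A minimal sketch follows; the function name `group_by_category` is hypothetical, and the mapping from symbols to category names is not given here, so only the raw symbol is used as the key.

```python
from collections import defaultdict


def group_by_category(codes):
    """Group two-symbol glyph codes by their first (category) symbol."""
    groups = defaultdict(list)
    for code in codes:
        groups[code[0]].append(code)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_category(["ja", "ji", "qc"]))
    # {'j': ['ja', 'ji'], 'q': ['qc']}
```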

The symbols from the "special symbols subset alphabet" all have a meaning that affects parsing, because they are involved in specifying composites, glyphs shifted to another category, phrases...

The trivial case: an MG string containing only symbols from the "glyph subset alphabet" is easily parsed by splitting it into consecutive substrings of length 2; these are the codes that specify the glyphs and point directly to the image files (.png).

E.g.: "@baH@kQC@bbt" (MG)
(equivalent to "@b aH @k QC @b bt" and to "@baH_@kQ_C_@bbt")
would encode 6 consecutive glyphs (5 unique) whose images are located in the "l/" directory, with filenames: "@b.png" "aH_.png" "@k.png" "Q_C_.png" "bt.png"
('l' stands for 'library', short for 'image library').
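The trivial case can be sketched as follows. This is a hedged illustration that handles only strings made of glyph-alphabet symbols; the special symbols are not handled at all. The function names `split_glyphs` and `image_path` are hypothetical, and removing the ordinary space alongside NO BREAK SPACE is an assumption.

```python
def split_glyphs(code: str) -> list:
    """Split a trivial-case MG string into two-symbol glyph codes."""
    # Drop NO BREAK SPACE, ordinary space (assumption) and "_" before splitting.
    clean = code.replace("\u00a0", "").replace(" ", "").replace("_", "")
    if len(clean) % 2 != 0:
        raise ValueError("trivial-case MG string must have an even length")
    return [clean[i:i + 2] for i in range(0, len(clean), 2)]


def image_path(glyph: str) -> str:
    """Return the image file path in "l/", escaping uppercase letters with "_"."""
    escaped = "".join(ch + "_" if "A" <= ch <= "Z" else ch for ch in glyph)
    return "l/" + escaped + ".png"


if __name__ == "__main__":
    glyphs = split_glyphs("@baH@kQC@bbt")
    print(glyphs)
    # ['@b', 'aH', '@k', 'QC', '@b', 'bt']
    print([image_path(g) for g in glyphs])
    # ['l/@b.png', 'l/aH_.png', 'l/@k.png', 'l/Q_C_.png', 'l/@b.png', 'l/bt.png']
```

The three spellings of the example ("@baH@kQC@bbt", the spaced form and the escaped form) all yield the same six glyph codes, five of them unique.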

Things become slightly more complicated with the special symbols.

Explanation of special symbols and their syntax

More examples: sample sentences

The HTML and the encoded string of the following sample sentences can be compared:

Test and compare

The parsing result produced by a new program can be compared with that of the existing display system.

Parsing algorithm

Remove " " and "_" from the encoded string

For the whole length of the encoded string do:


 
 
First appearance: Wed Jan 22 19:32:22 GMT 2003 - | - Last modified: Mon Jul 20 23:32:30 CEST 2009
[MG: throw; tossMG: -er; -or]thrower