a2ps tries to support the encodings its users commonly need. This chapter explains what an encoding is and how encoding support is handled within a2ps, and presents some of the encodings it supports.
This section is actually taken from the web pages of Alis Technologies Inc.
Document encoding is the most important but also the most sensitive and explosive topic in Internet internationalization. It is an essential factor since most of the information distributed over the Internet is in text format. But the history of the Internet is such that the predominant - and in some cases the only possible - encoding is the very limited ASCII, which can represent only a handful of languages, only three of which are used to any great extent: English, Indonesian and Swahili.
All the other languages, spoken by more than 90% of the world's population, must fall back on other character sets. And there is a plethora of them, created over the years to satisfy writing constraints and constantly changing technological limitations. The ISO international character set registry contains only a small fraction; IBM's character registry is over three centimeters thick; Microsoft and Apple each have a bunch of their own, as do other software manufacturers and editors.
The problem is not that there are too few but rather too many choices, at least whenever Internet standards allow them. And the surplus is a real problem: if every Arabic user made his own choice among the three dozen or so codes available for this language, there is little likelihood that his "neighbor" would make the same choice, and thus little chance that they could understand each other. This example is rather extreme, but it does illustrate the importance of standards in the area of internationalization. For a group of users sharing the same language to be able to communicate, they must all use the same encoding.
Certain character sets stand out either because of their status as an official national or international standard, or simply because of their widespread use.
First off, there is the ISO 8859 standards series that standardize a dozen character sets that are useful for a large number of languages using the Latin, Cyrillic, Arabic, Greek and Hebrew alphabets. These standards have a limited range of application (8 bits per character, a maximum of 190 characters, no combining) but where they suffice (as they do for 10 of the 20 most widely used languages), they should be used on the Internet in preference to other codes. For all other languages, national standards should preferably be chosen or, if none are available, a well-known and widely-used code should be the second choice.
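Where does the maximum of 190 characters come from? The short Python check below assumes that ISO 8859 follows the usual ISO 2022 8-bit layout: a 94-character graphic set in the lower half (0x21 to 0x7E, so SPACE and DEL are not counted) plus a 96-character set in the upper half (0xA0 to 0xFF), with both control ranges left out. It is an illustration of that arithmetic, not a2ps code, and the helper name `assigned_upper` is made up for the example.

```python
# Assumption of this sketch: ISO 8859 uses the ISO 2022 8-bit layout,
# i.e. a 94-character set in the lower half and a 96-character set above it.
LOWER_GRAPHIC = range(0x21, 0x7F)   # '!' .. '~': 94 positions (SPACE, DEL excluded)
UPPER_GRAPHIC = range(0xA0, 0x100)  # 0xA0 .. 0xFF: 96 positions
# Not counted: C0 controls 0x00-0x1F, SPACE 0x20, DEL 0x7F, C1 controls 0x80-0x9F.

max_characters = len(LOWER_GRAPHIC) + len(UPPER_GRAPHIC)
print(max_characters)  # 190

def assigned_upper(encoding):
    """Count upper-half positions that Python's codec for `encoding` can decode."""
    count = 0
    for b in UPPER_GRAPHIC:
        try:
            bytes([b]).decode(encoding)
            count += 1
        except UnicodeDecodeError:
            pass  # position left unassigned by this part of ISO 8859
    return count

# 190 is a maximum: some parts leave upper positions empty.
print(assigned_upper("iso8859_1"), assigned_upper("iso8859_3"))
```

ISO-8859-1 fills all 96 upper positions, while ISO-8859-3 leaves a few of them unassigned, which is why the figure is only an upper bound.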
Even when we limit ourselves to the most widely used standards, the overabundance remains considerable, and this significantly complicates life for truly international software developers and users of several languages, especially when such languages can only be represented by a single code. It was to resolve this problem that both Unicode and the ISO 10646 International standard were created. Two standards? Oh no! Their designers soon realized the problem and were able to cooperate to the extent of making the character set repertoires and coding identical.
ISO 10646 (and Unicode) contain over 30,000 characters capable of representing most of the living languages within a single code. All of these characters, except for the Han (Chinese characters also used in Japanese and Korean), have a name. And there is still room to encode the missing languages as soon as enough of the necessary research is done. Unicode can be used to represent several languages, using different alphabets, within the same electronic document.
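As a small illustration of that last point, the Python sketch below (Python is used only for its built-in codecs; this is not a2ps code) checks whether one short text mixing Greek, Russian and French fits in a few single 8-bit sets and in UTF-8, one of the encoding forms of Unicode / ISO 10646. The sample text and the helper name `fits` are made up for the example.

```python
import unicodedata

# Made-up sample mixing Greek, Russian and French.
TEXT = "Ελληνικά, русский, français"

def fits(encoding):
    """True if every character of TEXT has a code in `encoding`."""
    try:
        TEXT.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Each single ISO 8859 part misses at least one of the three scripts,
# whereas UTF-8 can represent the whole text.
for enc in ("iso8859_1", "iso8859_5", "iso8859_7", "utf-8"):
    print(enc, fits(enc))

# Characters of the Unicode / ISO 10646 repertoire also carry names.
print(unicodedata.name("\u00e7"))  # LATIN SMALL LETTER C WITH CEDILLA
print(unicodedata.name("\u0440"))  # CYRILLIC SMALL LETTER ER
```

Only the UTF-8 line reports `True`: ISO-8859-1 lacks Greek and Cyrillic, ISO-8859-5 lacks Greek and `ç`, and ISO-8859-7 lacks Cyrillic and `ç`.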
Encoding support in a2ps lives entirely outside of the program's code. That is to say, adding, removing or changing anything in the support for an encoding requires no programming, nor even being a programmer.
See section 6.1 What is an Encoding, if you want to know more about this.
See section 5.2 Map Files, for a description of the map files.
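The meaningful lines of the `encoding.map' file have the form:

alias key

where

alias
    is a name under which the encoding may be asked for. Aliases are also used by the mail style sheet (support for MIME). When an encoding is asked for, the lower case version of its name must be equal to alias for this line to apply.

key
    is the name under which a2ps knows the encoding; it determines which encoding description file is used (see below).

For instance, the following lines make `iso-8859-1', `latin1' and `l1' all refer to the encoding whose key is `latin1':

iso-8859-1 latin1
latin1 latin1
l1 latin1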
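The lookup rule of `encoding.map' can be sketched in a few lines of Python. This is a hypothetical illustration, not a2ps's own implementation: the helper names `load_encoding_map` and `edf_file_for` are made up, and the sketch assumes that blank lines and lines starting with `#' carry no meaning in the file.

```python
# Hypothetical sketch of the encoding.map lookup; not a2ps's own code.

def load_encoding_map(text):
    """Map each alias to its key, one 'alias key' pair per meaningful line."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: blank lines, '#' comments and short lines are not meaningful.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        alias, key = fields[0], fields[1]
        table[alias] = key
    return table

def edf_file_for(table, requested):
    """Return 'key.edf' for a requested encoding name, or None if unknown."""
    key = table.get(requested.lower())  # the lower-cased name must equal an alias
    return None if key is None else key + ".edf"

SAMPLE = """\
iso-8859-1 latin1
latin1 latin1
l1 latin1
"""

table = load_encoding_map(SAMPLE)
print(edf_file_for(table, "ISO-8859-1"))  # latin1.edf
print(edf_file_for(table, "L1"))          # latin1.edf
print(edf_file_for(table, "koi8-r"))      # None
```

Because the requested name is lower-cased before the lookup, aliases in the file should themselves be written in lower case.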
The file describing the encoding key is named `key.edf'. It is subject to the same rules as any other a2ps file.
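The entries are:

Name: name
    Specifies the real name of the encoding. For instance:

        Name: ISO-8859-1

Documentation ... EndDocumentation
    Specifies the documentation of the encoding. For instance:

        Documentation
        Also known as ISO Latin 1, or Latin 1. It is a superset of ASCII,
        and covers most West-European languages.
        EndDocumentation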
Substitute: font1 font2
    Most fonts (e.g. Courier, Times-Roman...) do not support many encodings
    (for instance, Courier does not support Latin 2). So that Latin 2 users
    do not have to replace every call to Courier, a2ps lets you specify
    that whenever a font is called in this encoding, another font should
    be used instead.
    For instance in `iso2.edf' one can read:

        # Fonts from Ogonkify offer full support of ISO Latin 2
        Substitute: Courier Courier-Ogonki
        Substitute: Courier-Bold Courier-Bold-Ogonki
        Substitute: Courier-BoldOblique Courier-BoldOblique-Ogonki
        Substitute: Courier-Oblique Courier-Oblique-Ogonki
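Default: font
    Specifies the font to use when a font that this encoding does not
    support is called. A Courier equivalent is the best choice. For
    instance in `iso2.edf':

        Default: Courier-Ogonki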
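The file also gives the encoding vector, i.e., the PostScript name of the character at each position of the encoding. Non-printable characters (e.g. ^G) should not be named: the special name `.notdef' is to be used when the character is not printable.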
Warning: make sure to use real, official PostScript names. Names such as `c123' are probably a sign that you are using unusual names, whereas PostScript names such as `afii8879' are common.
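To make the roles of the Name, Documentation, Substitute and Default entries concrete, here is a hypothetical Python reading of an .edf file. Everything in it is an assumption of this sketch rather than a2ps's actual code: the helper names `parse_edf` and `font_for`, the trimmed sample (loosely modelled on the `iso2.edf' excerpt, with a made-up Name and Documentation), the assumption that Documentation and EndDocumentation sit on their own lines, and especially the order in which fonts are resolved. The encoding vector is not parsed.

```python
# Hypothetical .edf reader; a2ps's real parser is part of the program itself.

def parse_edf(text):
    """Collect Name, Documentation, Substitute and Default entries."""
    edf = {"name": None, "doc": [], "substitutes": {}, "default": None}
    in_doc = False
    for raw in text.splitlines():
        line = raw.strip()
        if in_doc:  # inside Documentation ... EndDocumentation
            if line == "EndDocumentation":
                in_doc = False
            else:
                edf["doc"].append(line)
            continue
        if not line or line.startswith("#"):  # blank line or comment
            continue
        if line == "Documentation":
            in_doc = True
        elif line.startswith("Name:"):
            edf["name"] = line[len("Name:"):].strip()
        elif line.startswith("Substitute:"):
            original, replacement = line[len("Substitute:"):].split()
            edf["substitutes"][original] = replacement
        elif line.startswith("Default:"):
            edf["default"] = line[len("Default:"):].strip()
    return edf

def font_for(edf, font, supported):
    """Assumed order: explicit substitute, then the font itself if it is in
    the (hypothetical) `supported` set, then the Default font."""
    if font in edf["substitutes"]:
        return edf["substitutes"][font]
    if font in supported:
        return font
    return edf["default"]

SAMPLE = """\
Name: ISO-8859-2
Documentation
Also known as ISO Latin 2.
EndDocumentation
# Fonts from Ogonkify offer full support of ISO Latin 2
Substitute: Courier Courier-Ogonki
Substitute: Courier-Bold Courier-Bold-Ogonki
Default: Courier-Ogonki
"""

edf = parse_edf(SAMPLE)
supported = {"Courier-Ogonki", "Courier-Bold-Ogonki"}
print(font_for(edf, "Courier", supported))      # Courier-Ogonki
print(font_for(edf, "Times-Roman", supported))  # Courier-Ogonki
```

In this sketch a call to Courier is redirected by its Substitute line, while Times-Roman, which has no substitute and is not in the supported set, falls back to the Default font.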
Most of the following information is courtesy of Alis Technologies Inc. and of Roman Czyborra's page about The ISO 8859 Alphabet Soup. See section 6.1 What is an Encoding, for an instructive presentation of the encodings.
The known encodings are:
The lack of the new Euro currency symbol U+20AC (which resembles a C crossed by two horizontal bars) has opened the discussion of a new Latin 0.
Support is provided thanks to Ogonkify.
Support is provided thanks to Ogonkify.
Support is provided thanks to Ogonkify.
The Cyrillic alphabet was created by St. Cyril in the 9th century from the upper case letters of the Greek alphabet. The more ancient Glagolitic alphabet (from the ancient Slav glagol, which means "word") was created for certain dialects from the lower case Greek letters. These characters are still used by Dalmatian Catholics in their liturgical books. The kings of France were sworn in at Reims using a Gospel in Glagolitic characters attributed to St. Jerome.
Note that Russians seem to prefer the KOI8-R character set to the ISO set for computer purposes. The lower half of KOI8-R (its first 128 characters) is identical to the American ASCII character set.
Support is provided thanks to Ogonkify.
Support is provided thanks to Ogonkify.
Very few fonts can print the Euro sign yet.
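Several relations stated in this chapter (ASCII as the lower half of both ISO-8859-1 and KOI8-R, Cyrillic sets that differ in their upper halves, and the absence of the Euro sign from ISO-8859-1) can be checked with Python's standard codecs. This is only an illustration built on Python's codec tables, not on a2ps's .edf files; the sample word and the helper name `encodable` are arbitrary.

```python
# ISO-8859-1 and KOI8-R both reuse ASCII as their lower half (codes 0-127).
LOWER_HALF = bytes(range(128))
for enc in ("iso8859_1", "koi8_r"):
    assert LOWER_HALF.decode(enc) == LOWER_HALF.decode("ascii"), enc

# ISO-8859-1 maps every byte 0-255 to the code point of the same value,
# which makes it a superset of ASCII.
assert bytes(range(256)).decode("iso8859_1") == "".join(map(chr, range(256)))

# The upper halves differ: the same Russian word gets different bytes in
# KOI8-R and in ISO-8859-5, the Cyrillic part of ISO 8859.
word = "\u043c\u0438\u0440"  # the Russian word "мир"
koi8_bytes = word.encode("koi8_r")
iso5_bytes = word.encode("iso8859_5")
print(koi8_bytes.hex(), iso5_bytes.hex())

# Reading KOI8-R bytes as ISO-8859-5 silently yields different text.
misread = koi8_bytes.decode("iso8859_5")
print(misread == word)  # False

def encodable(text, encoding):
    """True if `text` can be written in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(encodable("\u20ac", "iso8859_1"))  # False: no Euro sign in Latin 1
```

The silent misreading is exactly the "neighbor" problem from the introduction: both parties must agree on the encoding, since nothing in the bytes themselves says which Cyrillic set was used.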