Information about the purpose of characters found in various legacy
character sets, and other comments on the SECS/VSECS collections.


------------------------------------------------------------------------
From: tiro@tiro.com (Tiro Typeworks)
Date: Fri, 14 Aug 1998 23:18:37 GMT

>01B7 # LATIN CAPITAL LETTER EZH

Ezh (or Yogh) is found in Old and Middle English texts, and is a
letter in the orthographies of a number of African languages. The only
modern European language I associate it with is Skolt Saami (see
below). The number of speakers/writers of Skolt Saami is probably well
below the 10,000 minimum set in Markus' criteria.


>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ
>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ

These are digraphs which were separately encoded in ISO/IEC 10646 and
Unicode to facilitate compatible font mappings between Latin and
Cyrillic fonts for Serbo-Croatian. Language reform policies in the
former Yugoslav republics -- particularly in Croatia -- have greatly
reduced the need for such compatibility. I believe these digraph
characters may still be of use in Serbia, if transliteration to Latin
script is a requirement, but such specialised usage may fall beyond
the proposed scope of SECS.
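A minimal Python sketch of the one-to-one Latin/Cyrillic mapping described above, assuming the standard correspondence of Serbian Cyrillic Џ/џ, Љ/љ, Њ/њ to Latin dž, lj, nj. The table and helper names are hypothetical; only `unicodedata.normalize` is a real library call. It shows why one code point per digraph keeps the mapping reversible, and how NFKC folds each digraph back into two ordinary letters when that compatibility is no longer wanted.

```python
import unicodedata

# Hypothetical, partial table: only the Serbian Cyrillic letters whose Latin
# equivalents are digraphs. The titlecase forms U+01C5, U+01C8 and U+01CB
# (used when only the first letter of a word is capitalized) are not handled.
CYR_TO_LAT = {
    "\u040F": "\u01C4",  # CYRILLIC CAPITAL LETTER DZHE -> LATIN CAPITAL LETTER DZ WITH CARON
    "\u045F": "\u01C6",  # small dzhe -> small dz with caron
    "\u0409": "\u01C7",  # capital LJE -> capital LJ
    "\u0459": "\u01C9",  # small lje -> small lj
    "\u040A": "\u01CA",  # capital NJE -> capital NJ
    "\u045A": "\u01CC",  # small nje -> small nj
}
# One code point on each side, so the table inverts without ambiguity.
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}


def transliterate(text: str, table: dict) -> str:
    """Replace each character found in `table`; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def to_plain_latin(text: str) -> str:
    """NFKC replaces each digraph code point with two ordinary letters."""
    return unicodedata.normalize("NFKC", text)


cyrillic = "\u045A\u0459\u045F"
latin = transliterate(cyrillic, CYR_TO_LAT)
print(len(cyrillic), len(latin))                     # same length: one-to-one
print(transliterate(latin, LAT_TO_CYR) == cyrillic)  # lossless round trip
print(len(to_plain_latin(latin)))                    # two letters per digraph
```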


>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON

Unicode 2.0 identifies these characters as Lappish. In the first
place, Lappish is generally considered a derogatory term; in the
second, these characters do not appear in any of the Saami
orthographies I have collected. Note that I only have Latin
orthographies for five of the nine Saami languages.


>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE

These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.


>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON

I can find no reference for these characters. Their use in Turkish is
incorrect and an unacceptable substitute for the G breve diacritics.
Unicode 2.0 indicates 'Lappish', but they do not occur in any of the
Saami orthographies I have on file.


>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON
>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON

These characters are used to write Skolt Saami in the Latin script
(Skolt Saami is also written by some using the Cyrillic script). The
number of speakers/writers of Skolt Saami is probably well below the
10,000 minimum set in Markus' criteria.


>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ
>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE

Your guess is as good as mine. I believe these can be safely omitted.


>027C # LATIN SMALL LETTER R WITH LONG LEG

I know of no usage of this character outside of phonetic transcription
(strident apico-alveolar trill). I'm not even sure that it remains
part of the official IPA standard set.


>0292 # LATIN SMALL LETTER EZH

See note above for uppercase Ezh/Yogh. Of course, if it is decided to
include a basic IPA subset, this character would become necessary.


>0374 # GREEK NUMERAL SIGN
>0375 # GREEK LOWER NUMERAL SIGN

I believe these are archaic and only of use when Greek letters
are serving as numerals (as they did before the introduction of
'Arabic' numerals).


>037A # GREEK YPOGEGRAMMENI

This is the Greek subscript iota. It is not used in modern, monotonic
Greek, so it may be safely omitted from SECS.


>037E # GREEK QUESTION MARK

I'm unable to confirm, at this time, whether this punctuation mark is
still in use or not. I suspect not, and most readers would be unlikely
to distinguish it from a semicolon.
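Unicode's own data supports the suspicion that readers cannot tell U+037E from a semicolon: the Greek question mark has a canonical (singleton) decomposition to U+003B SEMICOLON, so NFC normalization silently replaces it. The numeral sign likewise decomposes canonically to U+02B9 MODIFIER LETTER PRIME, while the ypogegrammeni has only a compatibility decomposition and survives NFC. A short check with Python's standard `unicodedata` module (the helper name is illustrative):

```python
import unicodedata


def nfc_replacement(ch: str):
    """Return the NFC form of `ch`, or None if NFC leaves it unchanged."""
    out = unicodedata.normalize("NFC", ch)
    return None if out == ch else out


# The three Greek characters discussed in this message.
for cp in (0x0374, 0x037A, 0x037E):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {nfc_replacement(ch)!r}")
```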


John Hudson, Type Director

Tiro Typeworks
Vancouver, BC
tiro@tiro.com
www.tiro.com

------------------------------------------------------------------------
From: tiro@tiro.com (Tiro Typeworks)
Date: Fri, 14 Aug 1998 22:35:08 GMT

My browser finally finished downloading Markus' SECS website, and I
have prepared the following comments on some of the WGL4 characters he
has excluded from SECS. I believe that some of these characters should
be included in the SECS, in accordance with Markus' criteria, and have
marked my comments on these characters with an asterisk.

I have not bothered to comment on the heavy linedraw characters, etc.,
and have confined my comments to letters and diacritics.

[I am also concerned that Markus' recommended mathematical set may be
too extensive. Is this really a _basic_ mathematical subset, or
something more?]


0114    LATIN CAPITAL LETTER E WITH BREVE

0115    LATIN SMALL LETTER E WITH BREVE

012C    LATIN CAPITAL LETTER I WITH BREVE

012D    LATIN SMALL LETTER I WITH BREVE

These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.


0132    LATIN CAPITAL LIGATURE IJ

0133    LATIN SMALL LIGATURE IJ

These, of course, are the Dutch digraph characters. There is no need
for them to be separately encoded, as Dutch writers commonly type /I/
followed by /J/. These characters can, I believe, be safely omitted
from SECS.
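If the ligatures are omitted, existing text that contains U+0132/U+0133 can still be matched against text typed as two letters, because both ligatures have compatibility decompositions to plain I+J and i+j. A minimal sketch, assuming NFKC folding is acceptable for search or comparison (note that it also rewrites other compatibility characters); `fold_compat` is a hypothetical name:

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Replace compatibility characters, such as the IJ ligature, with plain letters."""
    return unicodedata.normalize("NFKC", text)


with_ligature = "\u0133sselmeer"   # U+0133 LATIN SMALL LIGATURE IJ + "sselmeer"
two_letters = "ijsselmeer"
print(with_ligature == two_letters)                           # different code points
print(fold_compat(with_ligature) == fold_compat(two_letters))  # equal after folding
```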


013F    LATIN CAPITAL LETTER L WITH MIDDLE DOT

0140    LATIN SMALL LETTER L WITH MIDDLE DOT

These are composite rendering forms for the Catalan lateral
approximant. They are not strictly necessary in a character set which
includes an appropriately sized, positioned and spaced midpoint
character (U+00B7). I am a little concerned that in a monospaced font,
of the kind referred to in Markus' SECS criteria, reliance on the
midpoint character will produce gaping holes in the middle of many
Catalan words. I am undecided about the possible inclusion of these
characters.
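The precomposed Catalan letters and the midpoint spelling are related by a compatibility decomposition (U+0140 to l + U+00B7, U+013F to L + U+00B7), so converting in either direction is simple. A sketch under the assumption that only lowercase "l·l" and uppercase "L·L" sequences need handling; `to_precomposed` and `to_midpoint` are hypothetical helper names:

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Replace l + U+00B7 + l with U+0140 + l, and L + U+00B7 + L with U+013F + L."""
    return text.replace("l\u00B7l", "\u0140l").replace("L\u00B7L", "\u013FL")


def to_midpoint(text: str) -> str:
    """NFKC decomposes U+0140/U+013F back into l/L followed by U+00B7."""
    return unicodedata.normalize("NFKC", text)


word = "paral\u00B7lel"                        # typed with the separate midpoint
print(to_precomposed(word))                    # uses U+0140 instead
print(to_midpoint(to_precomposed(word)) == word)
```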


0149    LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

This is a Hewlett-Packard character, apparently used by them for
Afrikaans. I've never heard a clear explanation of its purpose, or its
inclusion in WGL4 or other character sets (other than the fact that HP
wanted it to be included). In any case, Afrikaans is beyond the scope
of SECS, so this character may be safely omitted.


014E    LATIN CAPITAL LETTER O WITH BREVE

014F    LATIN SMALL LETTER O WITH BREVE

These characters are not required for the modern writing of any
European language. They are essential to much European prosody, and
are found in most Latin language textbooks and dictionaries. I believe
it would be sound to omit them from SECS if the basic, non-combining
IPA characters are also to be omitted. If the latter are included it
would make sense to include short and long vowel diacritics.


017F    LATIN SMALL LETTER LONG S

Archaic. This may be safely omitted.


01A0    LATIN CAPITAL LETTER O WITH HORN

01A1    LATIN SMALL LETTER O WITH HORN

01AF    LATIN CAPITAL LETTER U WITH HORN

01B0    LATIN SMALL LETTER U WITH HORN

Vietnamese. These characters may be safely omitted (although there are
sizeable Vietnamese-speaking populations in parts of Europe, notably
in the Netherlands).


01FA    LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE

01FB    LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE

01FC    LATIN CAPITAL LETTER AE WITH ACUTE

01FD    LATIN SMALL LETTER AE WITH ACUTE

01FE    LATIN CAPITAL LETTER O WITH STROKE AND ACUTE

01FF    LATIN SMALL LETTER O WITH STROKE AND ACUTE

* These characters are used in Danish and their inclusion in both
Unicode and the WGL4 set was at the request of the Danish standards
organization. My understanding is that there is some debate over the
status of these characters in modern Danish. Some sources claim that
they are archaic, others that they are orthographically correct and
that to omit them is a mistake. I believe they should not be omitted
from SECS without further research.
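Whatever SECS decides, these six characters are canonical compositions of Å/å, Æ/æ and Ø/ø with U+0301 COMBINING ACUTE ACCENT, so text typed as base letter plus combining accent normalizes to them under NFC. A short sketch using Python's `unicodedata` module:

```python
import unicodedata

# Base letter + U+0301 COMBINING ACUTE ACCENT for each of the six characters.
DECOMPOSED = ["\u00C5\u0301", "\u00E5\u0301",   # A/a with ring above + acute
              "\u00C6\u0301", "\u00E6\u0301",   # AE/ae + acute
              "\u00D8\u0301", "\u00F8\u0301"]   # O/o with stroke + acute

for seq in DECOMPOSED:
    composed = unicodedata.normalize("NFC", seq)   # one precomposed code point
    print(f"U+{ord(composed):04X} {unicodedata.name(composed)}")
```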


02D6    MODIFIER LETTER PLUS SIGN

This may be safely omitted.


1E80    LATIN CAPITAL LETTER W WITH GRAVE

1E81    LATIN SMALL LETTER W WITH GRAVE

1E82    LATIN CAPITAL LETTER W WITH ACUTE

1E83    LATIN SMALL LETTER W WITH ACUTE

1E84    LATIN CAPITAL LETTER W WITH DIAERESIS

1E85    LATIN SMALL LETTER W WITH DIAERESIS

1EF2    LATIN CAPITAL LETTER Y WITH GRAVE

1EF3    LATIN SMALL LETTER Y WITH GRAVE

* All these characters are used in modern Welsh and should _not_ be
omitted from SECS. Their use is less common than the W and Y
circumflex diacritics, but all are essential to semantic distinction
and/or pronunciation. My source for this information is Andrew Hawke
(ach@pophost.aber.ac.uk), assistant editor of the University of Wales
dictionary of the Welsh language. I can provide a Welsh word list if
required.



------------------------------------------------------------------------
From: sommar@algonet.se (Erland Sommarskog)
Date: Sat, 15 Aug 1998 21:27:41 GMT

tiro@tiro.com (Tiro Typeworks) writes:
>I am a little concerned that in a monospaced font, of the kind referred to in 
>Markus' SECS criteria, reliance on the midpoint character will produce gaping 
>holes in the middle of many Catalan words. I am undecided about the possible 
>inclusion of these characters.

If you take the Barcelona metro, you will find that there is a station
which appears to be named Paral-lel, so thick is the middle dot,
and this is not the only instance I've seen.


--
Erland Sommarskog, Stockholm, sommar@algonet.se


------------------------------------------------------------------------
From: pmbail01@slug.louisville.edu
Date: 15 Aug 1998 05:09:31 GMT

>The following are claimed to be used in Welsh, but Welsh
>native speakers who I asked claimed to have never seen them,
>so I suspect they are historic characters that are not in
>general use.
>
>1E80 # LATIN CAPITAL LETTER W WITH GRAVE
>1E81 # LATIN SMALL LETTER W WITH GRAVE
>1E82 # LATIN CAPITAL LETTER W WITH ACUTE
>1E83 # LATIN SMALL LETTER W WITH ACUTE
>1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
>1E85 # LATIN SMALL LETTER W WITH DIAERESIS
>1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
>1EF3 # LATIN SMALL LETTER Y WITH GRAVE

Actually, if one is doing a Welsh *pronunciation* guide these could be
potentially useful; I've also seen at least "w" and "y" acute in the past.
(The others, I'll admit, *are* weird--I'm not entirely sure where they'd
be used, save *maybe* in other languages in the same subfamily of Celtic
languages Cymraeg/Welsh is in [for example, Breton or Manx].  I'm rather
afraid I don't speak any Celtic tongue, so I can't be sure of this;
if memory serves, there is a Manx dictionary online, though.  IF my
memory of that serves at ALL well, Manx doesn't use "w" as a vowel but
*does* use "y"; I know exactly nothing on Breton.)

*POSSIBLY* w-diaeresis and y-diaeresis occur in *some* transcription schemes
for Native American languages (if they occur in this, it'd likely be for
Northwest languages that have vowels and consonants that literally cannot
be expressed in any other way without resorting to the International
Phonetic Alphabet).

Capital and small y-dieresis do occur in the standard character sets of most
English-language PostScript and TrueType fonts.

Offhand, as an aside--I expect some of the other oddish characters
(AE-grave, etc.) are also used mostly in pronunciation guides as well.

A-dieresis-grave, etc. *may* be used in Vietnamese, but I'm not sure.

>The purpose of the following characters is also
>unclear to me:
>
>201B # SINGLE HIGH-REVERSED-9 QUOTATION MARK
>203C # DOUBLE EXCLAMATION MARK
>203E # OVERLINE
>
>All these are in WGL4 but (so far) not in SECS.

Double-exclamation sounds more like a "typesetting character"; so does
"high reversed-9 quot mark" (maybe this is equivalent to leftquot?)

>There are also some mysterious characters in the MES-2
>proposal which I have not found anywhere else:
>
>01B7 # LATIN CAPITAL LETTER EZH
>01C4 # LATIN CAPITAL LETTER DZ WITH CARON
>01C6 # LATIN SMALL LETTER DZ WITH CARON
>01C7 # LATIN CAPITAL LETTER LJ
>01C9 # LATIN SMALL LETTER LJ

EZH I'm not sure on, but it *may* be used in some Turkic languages; DZ and
its variants, and LJ and its variants, occur in some Slavic languages and
also possibly in some Turkic languages (mostly those spoken in countries
that split off from the old USSR and are going back to Romanised
characters).

(In Cyrillic, separate letters *do* exist for each of these in regional
variants that were used before the USSR split up.  This is probably why
they carry over.)

LJ/lj is roughly equivalent to slash-l in Polish, BTW.

>01CA # LATIN CAPITAL LETTER NJ
>01CC # LATIN SMALL LETTER NJ

Used in some Slavic languages, and occasionally in various African
languages.  (In Slavic languages, it indicates a palatalised N, similar to
n-acute in some Slavic languages; the "j" essentially means the same as
the "soft sign" in Cyrillic.  In the African languages where this is an
actual character, it indicates exactly what it says--an "nj" sound, like "ng"
only one doesn't touch one's palate.) :)

>01DE # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
>01DF # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON

If memory serves, used in Vietnamese (in this case, the macron is a tone
character) and in some transcription schemes for Native American
languages.

>01E4 # LATIN CAPITAL LETTER G WITH STROKE
>01E5 # LATIN SMALL LETTER G WITH STROKE

I've only seen this offhand in *some* transcription schemes for Native
American languages [this indicates *roughly* the same as g-caron; see
below] but it may occur in Turkic languages that are converting to Roman
characters.

>01E6 # LATIN CAPITAL LETTER G WITH CARON
>01E7 # LATIN SMALL LETTER G WITH CARON

Commonly used in Turkish and some other Turkic languages to indicate a
"hard G" sound.  Also occurs, for the same sound, in some Native American
language transcription schemes.

>01E8 # LATIN CAPITAL LETTER K WITH CARON
>01E9 # LATIN SMALL LETTER K WITH CARON

Less common, but does occur in some Turkic languages; indicates a "hard K"
sound (like hard G--you say it in the back of your throat).  Occurs in
some transcription schemes for Native American languages as well.

(As a minor aside--you will find many, MANY standards for transcription
and, in some cases, transliteration of Native American languages.  These
vary from fitting the closest Roman equivalent, to using diacritical marks
for consonants that are "sort" of close [many languages have literally two
to four different ways you can pronounce a consonant sound where we might
have one in English, for example] to using unused characters to represent
sounds ["x" for "sh" and "c" for "soft ch" are rather common] to resorting
to the IPA when there's no really good way to represent it via Roman
characters.  Hence my notes on this. :)

>01EE # LATIN CAPITAL LETTER EZH WITH CARON
>01EF # LATIN SMALL LETTER EZH WITH CARON

Possibly used in some Slavic languages and Slavic transliteration schemes.
Possibly occurs in Turkic languages.  (Again, an "ezh-caron" equivalent
does occur in several local variants of Cyrillic used for "minority
languages" in the USSR.)

>01F1 # LATIN CAPITAL LETTER DZ
>01F3 # LATIN SMALL LETTER DZ

Commonly used in Slavic and Turkic languages; occurs in some Native
American languages as well (most notably the Na-Dene family, which
includes Dine' [Navaho]).

>01F4 # LATIN CAPITAL LETTER G WITH ACUTE
>01F5 # LATIN SMALL LETTER G WITH ACUTE

Fairly unusual, but does occur in some Native American and Slavic (and
possibly Turkic as well, depending on the country's Romanisation scheme)
languages.  Usually indicates a palatalised g sound in the few places
where I've seen it.

>027C # LATIN SMALL LETTER R WITH LONG LEG

Fairly unusual; used in some Native American languages as an R-variant.
This is borrowed from the IPA, offhand.  This also, occasionally, occurs
in transcription schemes for some African languages.

Some Turkic languages may use it, but I'm not sure (at least, I've not
*seen* any).

>Do you know a good reason why any of these characters should
>go into a simple European character set?

Some of them I'm sort of puzzled on m'self.  Some (like Y-dieresis and
Y-acute-dieresis, for example) I can see as they are used in languages
with a known, large audience on Usenet (for instance, Vietnamese-language
or Welsh-language newsgroups).

Some of them, I will frankly admit (namely, *all* the Greek characters
noted and, possibly, some of the other *unusual* letters like longleg-r
and k-caron, etc.), puzzle me as to why they're included.  As far as I know,
longleg-r only exists in a few Native American transcription schemes and
in some African-language transcription schemes; unless there is a large
Usenet population of folks wishing to type in Salish, I'm not sure why it
should be there.  If it is in there, we should go ahead and add upside-down
K/k, upside-down T/t, cedilla-H, Latin-omega-acute-dieresis,
Latin-chi, etc. and all the other IPA characters you *have* to import from
the IPA to write some of the languages of that area. :)  And, of course,
import Latin capital-schwa and Latin small-schwa for our friends in
Azerbaijan; hell, let's just import the entire IPA and be done with it. :)

Ah well...I'm sure the author will be glad to explain, in any case. :)

-moo


------------------------------------------------------------------------
From: tc31@cornell.edu (Thomas Chan)
Date: 15 Aug 1998 06:59:51 GMT

>If memory serves, used in Vietnamese (in this case, the macron is a tone
>character) and in some transcription schemes for Native American
>languages.

There are no diaereses or macrons in Vietnamese.

One needs:

one of <a>, <i>, <u>, <e>, <o>, <y>

with possibility of a circumflex on <a>,
or a horn on <o>,
or a horn on <u>,
or a circumflex on <o>,
or a circumflex on <e>,
or a breve on <a>

plus nothing,
or acute accent,
or grave accent,
or "curl" (sorry, I do not know the technical name for this),
or tilde,
or dot underneath (is there a technical name for this?)

(Not all of the above combinations will exist.)
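The combinations listed above can be enumerated mechanically. This sketch assumes the "curl" is U+0309 COMBINING HOOK ABOVE and the dot underneath is U+0323 COMBINING DOT BELOW, which are the Unicode names for these two marks. It checks that every vowel/tone pairing (lowercase only) composes to a single precomposed code point under NFC; it says nothing about which combinations actually occur in Vietnamese words.

```python
import unicodedata

# Six plain vowels plus the six modified forms listed above:
# a-circumflex, o-horn, u-horn, o-circumflex, e-circumflex, a-breve.
VOWELS = "aiueoy" + "\u00E2\u01A1\u01B0\u00F4\u00EA\u0103"

# Tone marks as combining characters; "" means no tone mark.
TONES = {
    "none": "",
    "acute": "\u0301",       # COMBINING ACUTE ACCENT
    "grave": "\u0300",       # COMBINING GRAVE ACCENT
    "hook above": "\u0309",  # COMBINING HOOK ABOVE (the "curl")
    "tilde": "\u0303",       # COMBINING TILDE
    "dot below": "\u0323",   # COMBINING DOT BELOW
}


def all_forms():
    """Every vowel/tone pairing, normalized to NFC."""
    return [unicodedata.normalize("NFC", v + mark)
            for v in VOWELS for mark in TONES.values()]


forms = all_forms()
print(len(forms))                          # 12 vowels x 6 tone options
print(all(len(f) == 1 for f in forms))     # each pairing is one code point
```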

(Optionally, a <2> or <z>-like hybrid of "curl" and tilde
may occur in the handwriting of southern Vietnamese
speakers who do not distinguish the two tones
marked by those diacritics.)


Thomas Chan
tc31@cornell.edu


------------------------------------------------------------------------
Date: 15 Aug 1998 09:21:12 -0500
From: cnahr@ibm.net (Christoph Nahr)

>The long s might be from German Fraktur fonts which is unused
>since ~1945. This letter has certainly no equivalent in modern
>German roman/antiqua fonts and is certainly not needed to
>write German:
>
>017F # LATIN SMALL LETTER LONG S

While I agree that this letter is not needed in a basic European
character set your reasoning is quite wrong.

The long s was actually used in *both* Fraktur and Antiqua (i.e.
non-Fraktur) typefaces for centuries, and is completely unrelated to
any "Germanness".  You should see lots of long s in any older English
(French, Italian, ...) book.  The only difference is that Antiqua (or
"Latin") typefaces eventually dropped the long s while Fraktur
typefaces kept it to this day.

As for Fraktur going out of fashion in Germany by 1945... well, the
connection between Nazis and Fraktur is a common misconception.
Actually, the Nazi government *discouraged* use of Fraktur in 1941
because Hitler thought it outdated and contrary to his plans to
"modernise" Germany according to Nazi ideology.

As for Fraktur being "unused" today... several new Fraktur typefaces
have been designed during the past few decades by German designers.
If you go to any newspaper stand you'll see plenty of Fraktur
headlines on newspapers of any nationality.  Station and street signs
are also frequently set in Fraktur.  But I agree that Fraktur
typefaces are only being used as decorative fonts these days, not as
text fonts, which is the important criterion for this discussion.


------------------------------------------------------------------------
Date: Sat, 15 Aug 1998 10:40:10 -0500
From: ehrich@minn.net (William Ehrich)

If we can afford to include just one letter for historical / sentimental
reasons I would like that to be:

> 017F # LATIN SMALL LETTER LONG S

It is useful for quoting most old English and German literature.
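Keeping U+017F for faithful quotation need not hurt searching, since long s has a compatibility decomposition to an ordinary s. A minimal sketch of a search key, assuming NFKC plus case folding is an acceptable comparison rule; `search_key` is a hypothetical name:

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical search key: NFKC folds long s to s, casefold ignores case."""
    return unicodedata.normalize("NFKC", text).casefold()


quoted = "Congre\u017Fs"   # long s in the middle, round s at the end
print(quoted)              # the quotation itself keeps U+017F
print(search_key(quoted) == search_key("Congress"))
```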

-- William Ehrich


------------------------------------------------------------------------
From: Constantine Stathopoulos <cstath@irismedia.gr>
Date: Mon, 17 Aug 1998 19:10:38 +0300

In the early 1980s, the Greek Parliament decided by law to adopt a
simplified accentuation system for use in state documents and the
general educational system. Many people, for a variety of reasons that
are not of interest here, did not adopt the simplified (monotonic)
accentuation system in their writings and continue to use the polytonic
system.

For example, the polytonic system is still used exclusively by the
Eastern Orthodox Church and by Greek scholars world-wide. Also, in
Athens alone, there are a couple of daily newspapers and half a dozen
mass-circulation journals that I know of that are still published in
polytonic. Moreover, all works in Greek before the '80s were printed
in polytonic, and most reprints use polytonic, too. Finally, there is a
growing tendency in Greece today to prefer the traditional polytonic
system as an artistic choice in quality publications, such as albums
or cultural publications.

Regrettably, up to now, even with the introduction of the Unicode Standard,
the Greek world has had no easy or standard way to use the complete set of
Greek characters. Beta Code, SMK and custom TrueType fonts are all isolated
attempts made by non-Greek scholars to satisfy their immediate and pressing
need for computerized publications. The Greek printing industry developed its
own custom specialized polytonic fonts and solutions, none of which was
meant for home computers, and the mass market in Greece just continued
to use pre-computer polytonic tools, such as handwriting or polytonic
typewriters. I, for example, use the traditional system in handwriting and
the monotonic system of ISO 8859-7 (MS Windows 1253) when it comes to
computers, due to the lack of an acceptable standard: Extended Greek is
still not implemented in MS Windows, although from what I hear Microsoft is
currently working on it.

I understand the complications that led to the proposal to remove the
Greek polytonic letters from MES-2, which may well result in exactly
that in the end. The only reason that urged me to send this
e-mail was to clarify where and how the polytonic system is still used,
and to explain that the absence of an accepted polytonic standard has
deprived many Greek-speaking people of the right to choose. I do not
have any data on the number of these people, but I believe that
they are enough to justify the inclusion of Greek Extended in the MES-2
standard, even if that means that in a number of applications Greek
Extended will be difficult to implement or will not be implemented at
all.

Thank you for your time,

Sincerely,
Constantine Stathopoulos.


------------------------------------------------------------------------
From: pla@sktb.demon.co.uk (Paul L. Allen)
Date: Mon, 17 Aug 1998 18:36:48 +0100

> 0x83    0x0192  #LATIN SMALL LETTER F WITH HOOK

Certainly the Dutch Guilder/Gulden/Florin symbol.


------------------------------------------------------------------------
Date: 17 Aug 1998 13:30:36 -0400
From: Chris Maden <crism@oreilly.com>

> 0x83    0x0192  #LATIN SMALL LETTER F WITH HOOK

This is the guilder sign.  Unicode, for whatever reason, doesn't
include an actual guilder/florin sign, but the small f with hook looks
right.  This mapping is an approximation.  Both the Windows and
Macintosh character sets include the character, so its omission from
Unicode was a surprise to me.

> 0x88    0x02C6  #MODIFIER LETTER CIRCUMFLEX ACCENT
> 0x98    0x02DC  #SMALL TILDE

These are to distinguish between the character and the accent.  The
circumflex (shift-6 on most US keyboards) is now used for the literal
character (for TeX superscript, regexp inversion...), and so a
distinct character is needed for the diacritic.  Similarly, the tilde
is now used for home directories or approximation; a smaller tilde is
needed for using as a diacritic.
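The CP1252 mappings discussed here can be confirmed with Python's built-in `cp1252` codec. This sketch decodes the three bytes, prints their Unicode names, and contrasts the two spacing diacritics with the ASCII characters they are meant to be kept distinct from:

```python
import unicodedata

# The three CP1252 bytes whose purpose was questioned.
for byte in (0x83, 0x88, 0x98):
    ch = bytes([byte]).decode("cp1252")
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# The spacing diacritics are separate code points from the ASCII characters
# used for regular expressions, TeX superscripts and home directories.
for ascii_ch, diacritic in (("^", "\u02C6"), ("~", "\u02DC")):
    print(unicodedata.name(ascii_ch), "!=", unicodedata.name(diacritic))
```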

-Chris

------------------------------------------------------------------------
From: Michel Suignard <michelsu@microsoft.com>
Date: Mon, 17 Aug 1998 12:28:41 -0700

> CP1252 is a rather straightforward extension of
> ISO 8859-1 with useful characters. There are only three characters in
> CP1252 for which the purpose is completely unclear to me:
>
> 0x83    0x0192  #LATIN SMALL LETTER F WITH HOOK
> 0x88    0x02C6  #MODIFIER LETTER CIRCUMFLEX ACCENT
> 0x98    0x02DC  #SMALL TILDE
> 
> Could you give me any idea what these characters were intended for?

All characters from 1252 have been in the Minimum European Subset
repertoires (except the recent 6937 transposition) created by TC304 since
the beginning. I was in fact part of the effort of creating the WGL4 and
have seen your recent posting on the subject. WGL4 has a long history and
its sources are multiple (original PC code pages, OS/2 UGL, Macintosh code
pages etc...), which makes it rather difficult to trace the rationale. I can
give some hints for the characters you mentioned.

The character at 0x83 originally comes from code page 437 and has sometimes
been used as the symbol for the Dutch florin. It is probably deprecated now,
but it was probably seen as useful for a while, and as you know, once
characters get in they can never be removed.

The modifier letters have been used to convey transient encoding for dead
key processing.

In my opinion it is rather futile to try to find a rationale for character set
content that has been in use for many, many years. It is now a legacy
matter, and the mere fact that these characters exist is a good enough reason
for them to be encoded in a larger repertoire. It is basically the same reason
that Unicode had to allow a one-to-one mapping for all widely used prior
encodings, whether or not the content of those character sets made
complete sense.

Finally, WGL4 was created to allow a single Unicode font to represent
all the popular code pages we had to support, while at the same time adding
better coverage of the Latin repertoire (all of U+0100-017F, for example).

Michel Suignard

------------------------------------------------------------------------
From: Andrew Hawke <ach@aber.ac.uk>
Date: Tue, 18 Aug 1998 17:10:15 +0100

> 1E80 # LATIN CAPITAL LETTER W WITH GRAVE
> 1E81 # LATIN SMALL LETTER W WITH GRAVE
> 1E82 # LATIN CAPITAL LETTER W WITH ACUTE
> 1E83 # LATIN SMALL LETTER W WITH ACUTE
> 1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
> 1E85 # LATIN SMALL LETTER W WITH DIAERESIS
> 1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
> 1EF3 # LATIN SMALL LETTER Y WITH GRAVE

Markus, 
       you e-mailed Geraint.Price@cl.cam.ac.uk regarding the frequency
of certain Welsh letter+accent combinations. He submitted your query
to the WELSH-L discussion list. I have replied to the list, but I also
felt that I should take the liberty of contacting you directly, as this
is something I have strong views on.

Some background:
I am Assistant Editor and Systems Manager for the University of Wales 
Dictionary of Welsh, the standard scholarly dictionary of the language. 
I also chair the Celtic Texts Specialist Group of the International
Association of Literary and Linguistic Computing. The University of
Wales has an orthography committee which publishes guidelines for
Welsh spelling which are accepted by all Welsh writers and
publishers. These notes are based on those guidelines. Welsh is now
legally one of the two official languages of Wales, on an equal
legal footing with English. The government has established a body
called the Welsh Language Board to promote the use of Welsh. The
language is now taught in every school in Wales (and is the main
language of instruction in many of them). Some 600 books and
many magazines and newspapers are published annually. The use of the
language in all spheres, and increasingly in business, public life,
the administration of justice, education, government and the media 
(there is a Welsh-language TV channel) is growing rapidly. Welsh is
spoken by approximately 500,000 people in Wales, and by several hundred
thousand outside Wales. The number of speakers showed a slight increase
at the last census, after nearly a century of continuous decline.

The availability of character sets to represent the language is
absolutely essential, and such character sets should be as complete
as possible. In the past, the lack of appropriate character sets
has been a considerable deterrent to using the language in print and
electronically. I would urge you to bear this in mind when considering
the following.

Johann van Wingen (of the Netherlands WG on ISO 10646) pushed hard for
the inclusion of all the possible Welsh letter/accent combinations,
which was eventually accepted by the ISO and subsequently Unicode.

Microsoft has also committed to including the 13 additional characters
in its OpenType fonts. I have communicated extensively on this point
with John Hudson of Tiro Typeworks in Vancouver (www.tiro.com) who has
been working on OpenType fonts for Microsoft and for academic purposes.
I reproduce below my main comments to him which may be of assistance
to you.
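The message does not say what the 13 additional characters are counted against. Assuming the baseline is ISO 8859-1 (Latin-1), the figure can be reproduced: combine the seven Welsh vowels with the four diacritics in both cases, and keep the results that Latin-1 cannot encode. A minimal Python sketch (helper names are hypothetical):

```python
import unicodedata

VOWELS = "aeiouwy"
# Circumflex, acute, grave and diaeresis as combining characters.
MARKS = "\u0302\u0301\u0300\u0308"


def welsh_accented():
    """All accented Welsh vowels, upper- and lower-case, as NFC strings."""
    letters = VOWELS + VOWELS.upper()
    return [unicodedata.normalize("NFC", v + m) for v in letters for m in MARKS]


def outside_latin1(chars):
    """Keep only the characters that ISO 8859-1 cannot encode."""
    missing = []
    for ch in chars:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


accented = welsh_accented()
extra = outside_latin1(accented)
print(len(accented), len(extra))
print(" ".join(f"U+{ord(c):04X}" for c in extra))
```

Under that assumption the 13 are W and Y with circumflex in both cases, capital Y with diaeresis, and the eight W/Y grave, acute and diaeresis letters (U+1E80 to U+1E85, U+1EF2, U+1EF3).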

===================== COPIED MATERIAL FOLLOWS ================

Modern usage of the diacritics in Welsh is as follows:

(All diacritics are shown following the vowel which is accented, e.g.
a^ represents a lower-case a with a circumflex accent.)

Welsh requires the circumflex (^), acute ('), grave (`), and diaeresis (")
on all vowels, i.e. a, e, i, o, u, w, y (w being used in Welsh both as a
vowel and a semi-vowel). The incidence of these combinations varies very
widely.

All diacritics (accents) in Modern standard Welsh are compulsory and are
used to differentiate between different pronunciations of otherwise
similar- or identical-looking words, either in terms of length (long vs.
short) or stress. The stress accent in Welsh always falls on the penultimate
syllable, unless an accent (or a hyphen or an inserted h) indicates otherwise.

BECAUSE OF THIS, ALL THE ACCENTED WELSH CHARACTERS ARE REQUIRED, IN BOTH
UPPER- AND LOWER-CASE FORMS.

The circumflex is used solely to indicate that a vowel is long in a context
in which it would normally be expected to be short, e.g.:

        gwa^n `he pierces'      vs.     gwan `weak'
        gwe^n `a smile'         vs.     gwen `white (fem.)'
        pi^n `pine (wood, tree)' vs.    pi`n `a pin'     
        co^r `a choir'          vs.     cor `a dwarf'
        bu^m `I was (perfect)'  vs.     bum `five (mutated)'
        tw^r `a tower'          vs.     twr `a group'
        y^m `we are'            vs.     ym `in (before m)'

The diaeresis is used to separate vowels, as in English:

        prosa"ig `prosaic', cre"wr `creator', copi"o `to copy',
        tro"edigaeth `conversion', du"wch `blackness', Rebacay"ddiaeth
        `Rebaccaism', cyw"res `concubine'

The acute accent is used to indicate unexpected stress (i.e. not on the 
penultimate):

        casa'u `to hate', case't `cassette', ricri'wt `a recruit'
        paraso'l `a parasol', rebu'wc `a rebuke', 
        caridy'ms `riff-raff', gw'raidd `manly' (this last is on the
        penult, but is to distinguish it from the word gwraidd `root',
        which is monosyllabic)

The grave accent is used to indicate that a vowel is short in a context
in which it would normally be expected to be long:

        pa`s `a pass, permit'   vs.     pas `a cough'
        sie`d `a shed'          vs.     sie^d/sied `escheat'
        sgi`l `a skill'         vs.     sgi^l/sgil `following'
        no`d `a nod'            vs.     nod `a target, an aim'
        cu`l `a hut'            vs.     cul `narrow'
        mw`g `a mug'            vs.     mwg `smoke (n.)'
        py`g `dirty'            vs.     pyg `pitch, tar'

Generally speaking, diacritics in Welsh cannot reasonably be omitted as they
are used either to show unusual stress, or to differentiate between pairs of 
otherwise identical words with different pronunciations. As such they are
equally necessary in upper- and lower-case forms.

The commonest diacritic is the circumflex, followed by the acute and diaeresis
probably about equally. The grave is rare, but as more and more words are
borrowed from English, and new compounds coined for technical terms, their
use will undoubtedly increase.

To give a very rough indication, according to the headwords in our (unfinished)
dictionary (which we estimate will contain about 84,500 entries), the
number of accented keywords (extrapolated to the expected finished size of the
dictionary) will be roughly: 

        circumflex: 2,000; diaeresis: 880; acute: 500; grave: 160

All the above remarks refer to Modern Welsh orthography.

=========================== COPIED MATERIAL ENDS ==========================

From the background information you supplied, I certainly feel that the
character set for publishing, etc., MUST include all the possible
combinations. The low-end set should ideally be as complete as possible,
but I do appreciate that this may cause problems. If a compromise HAS
to be made, w+" could be dispensed with first, followed by w+' and y+`.
The upper-case versions are less essential than the lower-case ones,
but W+^ and Y+^ MUST be retained, even at the expense of the more
dispensable lower-case combinations just mentioned (w+", w+`, y+`).
E-mail and Web applications are becoming increasingly important in
Wales, and the ability to write correctly is of great benefit.

Thank you for your interest in the language. Please do not hesitate
to contact me if you wish to discuss the matter further. If you feel
there is someone else to whom I should make representations (such
as a UK representative, perhaps), please send me contact details.

With best wishes

Andrew Hawke


------------------------------------------------------------------------
From: Jörg Knappen <knappen@springer.de>
Date: Wed, 19 Aug 1998 13:07:34 +0200

Markus,

the function sign probably comes from a legacy character set. It makes
no sense to declare a math italic f a character in its own right.

The Latin small letter f with hook is indeed used in several languages
of western sub-Saharan Africa, e.g. Ewe, where it contrasts with the
usual f.

The Unicode merger of this letter with the florin symbol (or in
Macintoshese: folder sign) is a misfeature of Unicode. The same
goes for the merger with the mythical `function sign'.

--Jörg Knappen


---------------------------------------------------------------------------

