Re: [indic] Indic Sorting/collation challenging Problem - CLDR 1.3-
data for Indic
hrpansari at yahoo.com
Tue Jan 18 23:48:41 IST 2005
--- Gora Mohanty <gora_mohanty at yahoo.co.in> wrote:
> I don't follow what you are saying here. To take a
> practical example with the letter "ka", as per my
> understanding the correct order should be
> ka + anusvara
> ka + visarga
> ka + candrabindu
> Are you saying that we should change this?
No. You have set it correctly. I also support/like this order.
As per the views of phonetics scientists (as
tested/measured with frequency meters), the right order is:
kr [k + vocalic R]
krr [k + vocalic RR]
kl [k + vocalic L]
kll [k + vocalic LL]
ka + candrabindu
ka + anusvara
ka + visarga
But traditional Hindi, Oriya and other Indic dictionaries
also follow "ka+VM < ka"
(VM = vowel modifier; [< = less than]):
anuswara < visarga < chandrabindu, as per traditional
Oriya dictionaries;
chandrabindu < anuswara < visarga, as per traditional
dictionaries of Hindi and other Indic languages.
All the literary experts/pandits press us to keep
that unscientific order. It is too difficult/complex
for a computer's defaults.
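
To make the two traditional orders concrete, here is a minimal sketch in Python. It is only an illustration, not glibc or CLDR syntax: the helper name vm_sort_key and the two tailoring lists are hypothetical, and Devanagari code points are used for both tailorings purely for convenience. It folds each vowel modifier into the preceding letter, so that "ka+VM < ka" holds and the order of the modifiers themselves is a parameter.

```python
CANDRABINDU, ANUSVARA, VISARGA = "\u0901", "\u0902", "\u0903"
KA = "\u0915"

# Hypothetical tailorings, lowest first, as described in the message.
# Devanagari code points are used for both, for illustration only.
ORIYA_TRADITIONAL = [ANUSVARA, VISARGA, CANDRABINDU]
HINDI_TRADITIONAL = [CANDRABINDU, ANUSVARA, VISARGA]


def vm_sort_key(word, vm_order):
    """Build a sort key in which letter+VM sorts before the bare letter."""
    bare_rank = len(vm_order)  # a bare letter ranks after every letter+VM
    key = []
    for ch in word:
        if ch in vm_order and key:
            # Fold the modifier into the preceding letter's rank.
            base, _ = key[-1]
            key[-1] = (base, vm_order.index(ch))
        else:
            key.append((ord(ch), bare_rank))
    return key


words = [KA, KA + VISARGA, KA + ANUSVARA, KA + CANDRABINDU]
hindi = sorted(words, key=lambda w: vm_sort_key(w, HINDI_TRADITIONAL))
oriya = sorted(words, key=lambda w: vm_sort_key(w, ORIYA_TRADITIONAL))
```

With the Hindi list the result is ka+candrabindu, ka+anusvara, ka+visarga, ka; with the Oriya list it is ka+anusvara, ka+visarga, ka+candrabindu, ka. Only the list changes, not the comparison logic.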
> If so, I
> don't think that will be possible as that is the
> accepted dictionary order, and we cannot confront
> computer users with a different order. If you agree
> with this order, the current Indic collation table
> implements this, regardless of where the combined
> letter lies in the word, i.e., it works correctly
> even at the end of the word.
Right. But the (old orthodox) literary pandits'
traditional order is not in line with a computer's
defaults. It somehow works where the VM appears at the
beginning/middle of a word, but does not work where the VM
appears at the end of a word (i.e. before a space).
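
The end-of-word problem can be reproduced with a small Python sketch. It assumes the simplest possible scheme, one weight per code point, with the anusvara given a weight below every letter; the function name naive_key is hypothetical and this is not how any particular collation table is actually built.

```python
ANUSVARA = "\u0902"
KA, TTHA = "\u0915", "\u0920"


def naive_key(word):
    """One weight per code point; anusvara weighs less than any letter."""
    return [0 if ch == ANUSVARA else ord(ch) for ch in word]


# Modifier in the middle of the word: ka+anusvara+ttha vs ka+ttha.
mid = sorted([KA + TTHA, KA + ANUSVARA + TTHA], key=naive_key)

# Modifier at the end of the word: ka+anusvara vs ka.
end = sorted([KA, KA + ANUSVARA], key=naive_key)
```

In the middle of the word the low weight of the anusvara is compared against the next letter, so ka+anusvara+ttha comes first, as the tradition wants. At the end of the word, bare ka is a prefix of ka+anusvara, and a shorter key that is a prefix of a longer one always sorts first, whatever weight the modifier has.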
> > (2) The character with a halant practically has a
> > lower value, how to set computer's direct sorting
> > order that character with halant should come first
> > character without halant should come after
> > even when it occurs at the end of a word?
> This is possible to do with the present POSIX
> used by glibc by defining what is called a collating
> element, e.g., one defines ka+halant as a single
> element, and orders it before ka in the collation
> table. The problem is that there is a glibc bug so
> the locale compiler crashes when defining all the
> collating elements required for all Indian
> I am looking for a workaround. The alternative would
> be to split up the comprehensive sorting table into
> one for each language. Doing this should not be a
> problem with the CLDR locale.
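
The collating-element idea can be sketched outside any locale file. The Python below is only an illustration under stated assumptions: the function name halant_key is hypothetical, and it does not reproduce glibc locale syntax. It treats X+halant as one element and ranks it before bare X.

```python
HALANT = "\u094D"
SA, TA = "\u0938", "\u0924"


def halant_key(word):
    """Treat letter+halant as a single element ranked before the bare letter."""
    key = []
    i = 0
    while i < len(word):
        ch = word[i]
        if i + 1 < len(word) and word[i + 1] == HALANT:
            key.append((ord(ch), 0))  # X+halant: one element, lower rank
            i += 2
        else:
            key.append((ord(ch), 1))  # bare X
            i += 1
    return key


words = [SA + TA, SA + TA + HALANT]
ordered = sorted(words, key=halant_key)
```

Because the halant is absorbed into the element for the preceding letter, sa+ta+halant sorts before sa+ta even though the halant is the last character of the word.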
Right. But this applies to all Indic languages.
An "X+halant < X" default setting is too complex for all
What would be the root-level solution to this? Does it not
lie in the root-level understanding and encodings of
My main aim is to create awareness among the literary
pandits, so that they give up the old unscientific/wrong
traditional methods and adopt the new correct/scientific
order, giving Indic languages global compatibility in
this era of the internet. Indic could then be simplified
and could be better/easier than English etc.