[Oriya-group] [indic] Re: [Indlinux-hindi] hindi dictionaries
Hariram Pansari
hrpansari at yahoo.com
Mon Jan 31 10:45:29 IST 2005
--- Gora Mohanty <gora_mohanty at yahoo.co.in> wrote:
> --- jitendras at ncst.ernet.in wrote:
> > It perplexes me to no end that issue of collation
> > order is (as it appears to me)
> > confused with sequence of encoding in ISCII or
> > UNICODE.
> > But it gives me a reason to check my perception.
> > Following questions must be answered succinctly.
> > 1> Is encoding responsible for collation order:
> > (my answer NO)
>
> I would agree, though with the comment that encoding
> should correspond to collation order where possible.
But Unicode.org states that collation is a separate
task from encoding.
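
To make the distinction concrete, here is a minimal sketch in
Python. It sorts the same three Devanagari strings once by code
point (the order fixed by the encoding) and once by a separate
weight table. The table HYPOTHETICAL_WEIGHTS and its values are
my own assumption for illustration only; they are not taken from
any standard or dictionary.

```python
# Characters are written as escapes so the code points are unambiguous.
KA = "\u0915"        # DEVANAGARI LETTER KA
MA = "\u092E"        # DEVANAGARI LETTER MA
ANUSVARA = "\u0902"  # DEVANAGARI SIGN ANUSVARA

words = [KA + MA, KA + ANUSVARA, KA + KA]


def code_points(word):
    """Return the code points of a string as 'U+XXXX' labels."""
    return [f"U+{ord(ch):04X}" for ch in word]


# 1. Encoding order: plain string comparison goes code point by code point.
by_code_point = sorted(words)

# 2. Collation order: a separate table decides, the stored text is unchanged.
#    Hypothetical weights: the anusvara is placed after the consonants here
#    only to show that the table need not follow the encoding.
HYPOTHETICAL_WEIGHTS = {KA: 10, MA: 20, ANUSVARA: 90}


def collation_key(word):
    """Map each character to its weight in the hypothetical table."""
    return [HYPOTHETICAL_WEIGHTS[ch] for ch in word]


by_collation = sorted(words, key=collation_key)

for label, ordering in (("code point", by_code_point),
                        ("collation", by_collation)):
    print(label, [code_points(w) for w in ordering])
```

The two lists differ although no string was re-encoded, which is
the point: the order comes from the key function, not from the
code points.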
> > 2> Who is responsible for collation order: (My
> > answer)
> a> locale at system level or ,
Yes. People, i.e. users who do not have any technical
knowledge, must get correct, standardised collation
by default from the OS.
The BIS ISCII standard (IS document) also accepts that
in some cases the traditional/illogical Indic sorting
order fails under the default behaviour of a computer OS.
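
As a rough sketch of what "collation by the default of the OS"
looks like from a program, Python's standard locale module can
hand the comparison over to the system locale. The function name
sort_with_system_locale is hypothetical, and a locale name such
as "hi_IN.UTF-8" is only an assumption about what is installed on
a given machine; the result depends entirely on the collation
table that the OS ships.

```python
import locale


def sort_with_system_locale(words, locale_name=""):
    """Sort words using the collation rules of a system locale.

    locale_name="" selects the user's default locale. If the requested
    locale is not installed, fall back to the "C" locale.
    """
    # Remember the current LC_COLLATE setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")
    try:
        # strxfrm turns a string into a key that compares per LC_COLLATE.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

A call such as sort_with_system_locale(my_words, "hi_IN.UTF-8")
would then give whatever order that locale defines, so a general
user gets it without any technical knowledge.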
> Some comments on the points made by Mr. Pansari
> below
> (we had a similar discussion recently regarding the
> collation order in Oriya):
> 1. I do not feel that we, as implementors, have the
> authority to diverge from the accepted dictionary
> sorting order. Such a change, no matter how logical,
> can only come through a linguistic consensus, or
> from an official body with the proper authority,
> e.g., the Sahitya Academy.
Right. I agree. The sorting order, i.e. collation, for
a language must be standardised by some international
body (like ISO) so that the same indexing order is
available by default to a general PC user, independent
of the OS.
> 2. It is entirely possible to have consonant + halant
> sorted before consonant. In fact, it would have
> been implemented in the present Indic collation
> table for glibc POSIX locales, but for a bug in
> localedef that has been fixed in the glibc CVS...
Could this technical feature be made available as a
default on all OSs, like Windows, Unix, Mac...?
> In any case, this point is hardly of great
> practical importance, as it is unlikely that the
> real-world ordering of two words will come down
> to comparing a consonant + halant to the
> consonant.
In NLP applications for Indic languages,
in indexing the moola_Dhatu and Dhatu_roopa of
Sanskrit,
in indexing the Astaadhyaayee of Panini...,
(mostly in Oriya and South Indian scripts the pure
consonants are used more, and in a phonetically more
scientific way, whereas in Hindi the final character of
a word is pronounced as a pure consonant but wrongly
written as a consonant with the 'a' vowel),
and in other NLP uses like STT (speech to text), the
"X+halant < X" and
"X < X+Chandrabindu/anuswar/visarga" default
collation feature is felt to be a bare necessity.
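
The two rules can be sketched as a sort key in Python. This is
only an illustration under my own assumptions: the name
halant_first_key and the rank values are hypothetical, the
relative order of chandrabindu, anuswar and visarga among
themselves is simply taken from their code points, and matras and
everything else are left in code point order.

```python
HALANT = "\u094D"       # DEVANAGARI SIGN VIRAMA
CHANDRABINDU = "\u0901"
ANUSWAR = "\u0902"
VISARGA = "\u0903"

# Hypothetical secondary ranks: halant sorts before the bare letter,
# the three modifiers sort after it.
BARE_RANK = 1
MARK_RANK = {HALANT: 0, CHANDRABINDU: 2, ANUSWAR: 3, VISARGA: 4}


def halant_first_key(word):
    """Build a key of (base code point, rank of the following mark) pairs."""
    key = []
    i = 0
    while i < len(word):
        base = word[i]
        following = word[i + 1] if i + 1 < len(word) else None
        if following in MARK_RANK:
            # The mark is folded into the unit of the preceding letter.
            key.append((ord(base), MARK_RANK[following]))
            i += 2
        else:
            key.append((ord(base), BARE_RANK))
            i += 1
    return key


KA = "\u0915"  # DEVANAGARI LETTER KA
samples = [KA, KA + VISARGA, KA + HALANT, KA + ANUSWAR, KA + CHANDRABINDU]

print(sorted(samples))                        # plain code point order
print(sorted(samples, key=halant_first_key))  # X+halant < X < X+modifier
```

With plain code point order the halant form comes last, because
U+094D is larger than the three modifiers; with the key it comes
first, before the bare consonant.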
With Regards,
Hariram Pansari