[Oriya-group] [indic] Re: [Indlinux-hindi] hindi dictionaries

Gora Mohanty gora_mohanty at yahoo.co.in
Mon Jan 31 16:15:33 IST 2005


 --- Hariram Pansari <hrpansari at yahoo.com> wrote: 
[...]
> But Unicode.org states that collation is a separate
> task from encoding.

Well, yes, it is. But if you notice, Unicode code
points follow collation order pretty closely, at
least in the languages that I am familiar with.
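A minimal Python illustration of this point, added here rather than taken from the thread. It assumes only the standard Unicode assignments for the first row (ka-varga) of Oriya consonants, U+0B15 KA through U+0B19 NGA, and shows that plain code-point sorting reproduces the traditional order for that row. Letters placed outside the main run, such as U+0B5F YYA and U+0B71 WA, are one reason code-point order is only an approximation of collation.

```python
import unicodedata

# First row (ka-varga) of Oriya consonants in traditional order:
# U+0B15 KA, U+0B16 KHA, U+0B17 GA, U+0B18 GHA, U+0B19 NGA.
KA_ROW = ["\u0B15", "\u0B16", "\u0B17", "\u0B18", "\u0B19"]


def codepoint_sorted(words):
    """Sort strings by raw code point, which is Python's default str order."""
    return sorted(words)


shuffled = ["\u0B18", "\u0B15", "\u0B19", "\u0B17", "\u0B16"]
for ch in codepoint_sorted(shuffled):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# Prints U+0B15 ORIYA LETTER KA through U+0B19 ORIYA LETTER NGA, in order.
```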

[...]
> Right. I agree. The sorting order, i.e. collation, for a
> language must be standardised by some international
> body (like ISO) so that, independent of the OS, the
> same indexing order is available by default to a
> general PC user.

There is no need for ISO to get involved. I do not
know that they would even be interested. The logical
place for these standardizations to happen is at the
state and national level. In India at least, TDIL is
interested in doing this, though they should take a
more active role. In our state, Orissa, OCAC has now
at least expressed interest in providing a forum for
such standardization efforts. I do not see a need for
international involvement here.

[...]
> Could this technical feature be made applicable
> (I mean, how could it be made available) to all
> OSs like Windows, Unix, Mac... as a default?

Any POSIX-compliant system should be able to do this,
since collation order is defined per locale through
LC_COLLATE. This includes Mac OS X, most Unices, BSD
and Linux. I am not competent to comment on Windows,
but it at least used to have a POSIX-compliant mode.
I suspect that all systems will move to comply with
the Unicode CLDR (Common Locale Data Repository).
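A minimal sketch of relying on the POSIX locale machinery from Python's standard `locale` module. The locale name `or_IN.UTF-8` is an assumption: glibc ships an Oriya locale under a name like this, but it may not be installed, so the sketch falls back to the POSIX "C" locale, which every conforming system provides. `sort_with_locale` is a hypothetical helper name, not an existing API.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the C library's LC_COLLATE rules for locale_name.

    Falls back to the POSIX "C" locale if locale_name is not installed.
    Returns (sorted_words, locale_actually_used).
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query, do not change
    try:
        try:
            used = locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            used = locale.setlocale(locale.LC_COLLATE, "C")
        # strxfrm maps each string to a key that compares like strcoll().
        return sorted(words, key=locale.strxfrm), used
    finally:
        # setlocale is process-wide state, so restore the previous setting.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    # KA + halant vs. bare KA; the result depends on the installed locale.
    words, used = sort_with_locale(["\u0B15\u0B4D", "\u0B15"], "or_IN.UTF-8")
    print(used, [w.encode("unicode_escape") for w in words])
```

Because `setlocale` changes process-wide state, the helper restores the previous setting afterwards; a multi-threaded program would be better served by a dedicated collation library such as ICU.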

[...]
> In NLP applications of Indic, in indexing the
> moola_Dhatu and Dhatu_roopa of Sanskrit, 
> in indexing the Astaadhyaayee of Panini...,

Unfortunately, it is beyond my linguistic capabilities
to follow this, but are you sure that actual words in
use differ only by a single halant at the end? This is
the only case in which I can see the relative order of
consonant + halant vs. bare consonant mattering. But maybe
there is something that I am missing.
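To make that case concrete, a small Python check (an illustration added here, using Oriya KA U+0B15 and the virama/halant U+0B4D) of how the two forms compare under plain code-point order: the bare consonant is a prefix of consonant + halant, so it always sorts first.

```python
KA = "\u0B15"      # ORIYA LETTER KA
HALANT = "\u0B4D"  # ORIYA SIGN VIRAMA, i.e. the halant

bare = KA           # consonant with the inherent 'a'
dead = KA + HALANT  # consonant + halant (pure consonant)

# Python compares str by code point, and a string sorts before any
# longer string that it is a prefix of, so the bare consonant comes first.
print(sorted([dead, bare]) == [bare, dead])  # True
```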

> mostly in Oriya and South Indian scripts, pure
> consonants are used more, and more scientifically in
> phonetic terms (whereas in Hindi the final character
> of a word is pronounced as a pure consonant but
> wrongly written as a consonant with the 'a' vowel),
>
> in other NLP uses like STT (speech to text), a default
> collation with "X+halant < X" and "X < X+Chandrabindu
> (/anuswar/visarga)" is felt
> as a bare necessity.

Now, you are referring to an issue separate from
collation: whether the encoding itself should
differentiate between consonant and consonant +
halant. This is desirable in many cases, but it is a
much more complicated task than changing the
collation order.
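Changing only the collation could look something like the following minimal Python sketch. `oriya_key` is a hypothetical key function, not an existing API; it leaves the encoding untouched and enforces the two quoted rules for Oriya: consonant + halant sorts before the bare consonant, and the bare consonant sorts before consonant + candrabindu/anusvara/visarga. Everything else falls back to code-point order, and the consonant test is simplified to the main U+0B15 to U+0B39 run.

```python
HALANT = "\u0B4D"  # ORIYA SIGN VIRAMA


def is_consonant(ch):
    # Simplification: main Oriya consonant run KA..HA only; letters outside
    # it, such as U+0B5F YYA and U+0B71 WA, are not treated as consonants.
    return "\u0B15" <= ch <= "\u0B39"


def oriya_key(word):
    """Sort key enforcing X+halant < X < X+candrabindu/anusvara/visarga."""
    key = []
    i = 0
    while i < len(word):
        ch = word[i]
        if is_consonant(ch) and i + 1 < len(word) and word[i + 1] == HALANT:
            key.append((ord(ch), 0))  # dead consonant: rank 0 sorts first
            i += 2                    # consume the halant as well
        else:
            key.append((ord(ch), 1))  # everything else: code-point order
            i += 1
    # The second rule needs no special case: candrabindu (U+0B01), anusvara
    # (U+0B02) and visarga (U+0B03) follow the consonant, so the bare
    # consonant's key is a prefix and sorts first.
    return key


KA = "\u0B15"
words = [KA + "\u0B03", KA, KA + HALANT, KA + "\u0B01", KA + "\u0B02"]
for w in sorted(words, key=oriya_key):
    print(w.encode("unicode_escape"))
```

With `key=oriya_key` the order becomes KA + halant, KA, then KA followed by candrabindu, anusvara and visarga, whereas plain `sorted(words)` puts KA + halant last.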

Regards,
Gora

More information about the Oriya-group mailing list