[Oriya-group] Re: Rebati Glossary

Gora Mohanty gora_mohanty at yahoo.co.in
Fri Dec 24 18:33:52 IST 2004


 --- hrpansari at vsnl.net wrote: 
[...]
> I downloaded it and read it thoroughly, but am unable
> to understand it. All Indic scripts are included in it,
> not Oriya separately.

Well, if you could install it on your Linux platform
(instructions are included in the distribution: all
you have to do is copy two files, and recompile your
Oriya locale) you can try actually using it to sort
UTF-8 text. A program to mindlessly generate all
possible combinations of 1-4 consonant conjuncts is
included, so that you can use it to test the sorting.
The alternative is if you generate a test file in
UTF-8 and mail it to me, I can sort it and return it
to you for comments.
  Yes, all Indian (not Indic) languages in Unicode are
included, but it is largely untested, except for
Devanagari and Oriya where I have followed dictionary
order and done some simple tests. It almost certainly
has problems for other scripts, and probably misses
complex problems even in Devanagari and Oriya. However,
this is the best I can do unless I get feedback from
people who know the language better.
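
For anyone who would rather build a test file locally, here is a minimal sketch in Python of the kind of generator mentioned above. It is not the program shipped in the distribution: the names `conjuncts` and `write_test_file`, and the short consonant list, are assumptions made for illustration. It joins consonants with the Oriya virama (U+0B4D) and writes one combination per line in UTF-8.

```python
from itertools import product

VIRAMA = "\u0B4D"  # ORIYA SIGN VIRAMA, joins consonants into a conjunct

# Illustrative subset only (KA, GA, TA, NA, RA); extend to the full
# consonant set for a real test.
SAMPLE_CONSONANTS = ["\u0B15", "\u0B17", "\u0B24", "\u0B28", "\u0B30"]


def conjuncts(consonants, max_len=4):
    """Yield every sequence of 1..max_len consonants joined by the virama."""
    for n in range(1, max_len + 1):
        for combo in product(consonants, repeat=n):
            yield VIRAMA.join(combo)


def write_test_file(path, consonants=SAMPLE_CONSONANTS, max_len=4):
    """Write one combination per line, UTF-8 encoded, and return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for item in conjuncts(consonants, max_len):
            out.write(item + "\n")
            count += 1
    return count
```

Like the original, this generates combinations mindlessly, so many of them will not be real conjuncts. That is fine for exercising the sort order; the resulting file can then be sorted under the recompiled locale and the output checked by eye.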

> Lot of problems/confusions are to be solved :
> 
> Whether the 0B56 and 0B57 are to be ignored?

Currently, they are placed after U0B4C (vowel sign au).
As these never appear in Oriya text, I do not see a
problem with this.

>                                               0B5F
> should appear just after 0B2F.

Yes, you are probably right. Currently it is not, but
I will fix that.

>                                0B71 should appear
> just after 0B35.

OK. Will fix that too.
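
The two corrections agreed above each amount to moving one character so that it sorts immediately after another. A small Python sketch of that idea follows. It is purely illustrative and unrelated to the actual locale source: `apply_moves` is a hypothetical helper, and starting from raw code point order is an assumption made only to keep the example short.

```python
def apply_moves(order, moves):
    """Return a copy of `order` with each (char, anchor) pair applied,
    placing char immediately after anchor."""
    result = list(order)
    for char, anchor in moves:
        result.remove(char)
        result.insert(result.index(anchor) + 1, char)
    return result


# Assumption: begin with plain code point order over the Oriya block
# (U+0B00..U+0B7F), unassigned positions included.
CODE_POINT_ORDER = [chr(cp) for cp in range(0x0B00, 0x0B80)]

MOVES = [
    ("\u0B5F", "\u0B2F"),  # 0B5F immediately after 0B2F
    ("\u0B71", "\u0B35"),  # 0B71 immediately after 0B35
]

NEW_ORDER = apply_moves(CODE_POINT_ORDER, MOVES)
```

Further placements, once agreed, would just be extra pairs in `MOVES`.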

>                  Where to place 0B3C, 0B3D, 0B70 or
> to be ignored? 

Yes, I was not too sure about this. Currently, I have
U0B3C, U0B3D immediately after the anusvara, visarga
and candrabindu which are at the beginning. U0B70 is
placed after the signs and vowels (but before the
consonants). No good reason for any of this, and I
would be willing to hear opinions.

> In general/practical/logical usage 0B01, 0B02, 0B03
> come after 0B14. But some traditional dictionaries
> use them at the beginning, wrongly/unscientifically/blindly
> following the SANSKRIT SLOKAs/Mantras where the 'OM'
> comes first. 0B01, 0B02 should not be dealt with as
> representatives/modifiers of 'OM'. No general Oriya
> words/sentences begin with them, and never could as
> per phonics.  This misconception paralysed all
> Indic computing. As we are the pioneers, we have to
> set/lay a correct path.

Pansaribabu, again we will have to agree to disagree.
As per dictionary order, it is U0B02 (anusvara),
U0B03 (visarga), and U0B01 (candrabindu). Logic has
nothing to do with it. I am, however, most willing
to change this upon a consensus from linguists.
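
To see what this ordering means in practice, here is a rough Python sketch. It is a simplification and not how a glibc locale computes collation: it compares strings code point by code point, ranking the three signs first in the dictionary order given above and everything else by code point. The names `SIGN_ORDER` and `dictionary_key` are made up for this example.

```python
# Dictionary order for the signs: anusvara, visarga, candrabindu.
SIGN_ORDER = ["\u0B02", "\u0B03", "\u0B01"]

_RANK = {ch: i for i, ch in enumerate(SIGN_ORDER)}
_OFFSET = len(SIGN_ORDER)


def dictionary_key(text):
    """Sort key: the three signs rank first, all other characters
    follow in code point order (an assumption for this sketch)."""
    return [_RANK.get(ch, _OFFSET + ord(ch)) for ch in text]


# KA alone, then KA followed by each of the three signs.
words = ["\u0B15\u0B01", "\u0B15\u0B03", "\u0B15", "\u0B15\u0B02"]
ordered = sorted(words, key=dictionary_key)
```

Reordering the three entries in `SIGN_ORDER` is enough to compare other conventions for the signs themselves.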

> I have sometimes seen a simple Oriya collation table at
> Unicode.org but cannot access it at present. I will get
> back with a simple table.

Unicode has released the CLDR, the draft version of
which had a sorting order for Oriya. Unfortunately,
in their wisdom, they have chosen to use XML locale
specifications, without providing a tool for conversion
to actually usable POSIX locales (I am told the next
CLDR release due soon will have such a tool). Thus, I
am not too keen to spend my time right now learning
about XML locales. The CLDR will become the Unicode
standard, and probably even be adopted eventually by
POSIX and glibc. However, while they are willing to
listen to outside parties, they officially take input
only from official government folk. Right now, from
what I understand, the input is from Dr. B.L. Mohanty,
Lecturer, Eastern Regional Language Centre,
Bhubaneswar, India. Is this someone from OCAC?

Regards,
Gora
