Talk: Mixed Devanagari & Telugu

Options to hint language to Google OCR

Started 1 year ago

akprasad · 1 year ago

Google OCR has an option to provide language hints during the OCR process. If you think this will be a major ongoing use-case, we can investigate multilingual OCR and see if it’s possible to improve the performance here.
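
For reference, here is a minimal sketch of what passing such hints could look like, assuming "Google OCR" here means the Cloud Vision API's `images:annotate` REST method. The helper name `build_ocr_request` and the placeholder image bytes are hypothetical; `sa` and `te` are the language codes for Sanskrit and Telugu. The sketch only builds the JSON request body and does not call the service.

```python
import base64
import json

# REST endpoint for Cloud Vision image annotation (the request is only built here, not sent).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes, language_hints):
    """Build the JSON body for a single OCR request with language hints.

    `build_ocr_request` is a hypothetical helper name used for illustration.
    """
    return {
        "requests": [
            {
                # Image content is sent as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # DOCUMENT_TEXT_DETECTION is the feature intended for dense text such as book pages.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # Language hints go in the image context.
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }


# Placeholder bytes stand in for a real page scan.
body = build_ocr_request(b"placeholder image bytes", ["sa", "te"])
print(json.dumps(body, indent=2))
```

Sending this body as an authenticated POST to `VISION_ENDPOINT` would be the next step; whether the hints actually improve results on mixed Devanagari and Telugu pages would need to be checked on real scans.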

Andhrabharati · 1 year ago

I guess Telugu, Kannada and probably Tamil/Grantha could be added along with Devanagari script.

The first two (languages & scripts) have worthy translations of quite a few Sanskrit works, and they could be added with this option.

Andhrabharati · 1 year ago

BTW, these pages are from the VERY FIRST alphabetic Sanskrit lexicon in any language, the शब्दार्थकल्पतरुः (composed by Māmiḍi Venkaṭārya, an Andhra Vaiśya who hailed from Machilipatnam in Andhra Pradesh, in the early 1800s).

This has been mentioned by HH Wilson in the introduction to the 1st ed. of his Sanskrit-English Dictionary (1819). It has even been quoted in शब्दकल्पद्रुमः, and from there it went into Böhtlingk’s Sanskrit-German Dictionary and then into the Monier-Williams Sanskrit-English Dictionary (through the SKD quotes).

This शब्दार्थकल्पतरुः work has come out in 3 editions so far: the 1st ed. had only Sanskrit entries with Sanskrit meanings (all in Telugu script); the 2nd ed. enhanced it by adding Telugu meanings (all in Telugu script); and the 3rd ed. enriched it with Devanagari script for the headwords (HWs) and the Sanskrit meanings, along with HWs and Telugu meanings in Telugu script.

We completed the first half a few years back (for, and the latter half is yet to be done for hosting.