Thousands of Sanskrit texts are available only in print. Join our global volunteer effort to digitize these texts and make them accessible to all.

Complete guide

Comprehensive guidelines for proofing a text. And when in doubt: don't change what the page says!

This document defines the proofreading guidelines we use on Ambuda. Generally, these guidelines apply to all of the projects we proof.

What is our goal?

We want to create machine-readable texts that we can display on Ambuda and share with others.

We should create these texts at the highest quality we are capable of. We should establish credibility by making clear where these texts come from (which books, editions, editors, publishers, …), how they were created, and who was involved.

The golden rule

Don't change what the page says. We follow the conventions of the printed book as closely as we can.

  • Follow the book's spelling conventions. If the book says कार्य्यते or सङ्गच्छति, then we write कार्य्यते (not कार्यते) and सङ्गच्छति (not संगच्छति). The same applies to text in Roman script: if the book says çru, then we write çru and not śru.

  • Follow the book's script conventions. If the book uses Devanagari, we should use Devanagari, not Roman script or some other convention.

  • Follow the book's word-splitting conventions. Different books have different conventions for splitting long blocks of Devanagari text or for adding and removing Sanskrit's sound changes (सन्धि). We follow the conventions of the book regardless of our personal preference.

  • Include everything meaningful on the page. If the book contains spurious verses and footnotes, we should include those as well.

That said, our goal is not to create a pixel-for-pixel perfect copy of the text. (The scanned book already does that.) Instead, we want to capture the information and the structure in the original book as accurately as possible. For details on what we can include or exclude, see the sections below.

Characters

Rules that start with [auto] can be fixed with software, so fixing them by hand is not a good use of your time.

Hyphens at the end of a line

If the printed line ends with a hyphen, we keep the hyphen. Our program uses these hyphens to stitch together different lines:

स भगवान्सृष्ट्वेदं जगत्तस्य च स्थितिं चिकी-
र्षुः ...

Exception: If the line-ending hyphen belongs to a word that is normally written with a hyphen (e.g. "front-end," "topsy-turvy"), join the two parts of the word on one line so that our program preserves the hyphen.
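
To illustrate why the line-ending hyphen matters, here is a minimal sketch of how a stitching step could work. It is not Ambuda's actual program: the function name stitch_lines is hypothetical, and the sketch assumes a single prose paragraph whose lines should be joined with spaces.

```python
def stitch_lines(lines):
    """Join the printed lines of one paragraph into a single string.

    Assumption: a line ending in a single "-" continues a word on the
    next line, so the hyphen is dropped and the two halves are glued.
    A line ending in "--" (an em dash) is left as it is.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if text.endswith("-") and not text.endswith("--"):
            # Word split across lines: drop the hyphen and glue.
            text = text[:-1] + line
        elif text:
            # Ordinary line break: join with a single space.
            text = text + " " + line
        else:
            text = line
    return text


printed = ["स भगवान्सृष्ट्वेदं जगत्तस्य च स्थितिं चिकी-", "र्षुः ..."]
print(stitch_lines(printed))
```

Because this sketch drops every line-ending hyphen, a word like "front-end" split across two lines would lose its hyphen, which is why the exception above asks you to join such words by hand.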

Hyphens and dashes

Books often use two kinds of horizontal lines:
  • Hyphens (-) appear at the end of lines and between words in a compound. Write these with a single dash (-).

  • Em dashes (—) separate different phrases, usually for emphasis. Write these with two dashes (--).

[auto] Spaces around punctuation characters

We don't leave any space between a dash and the words around it. For example, we write word--word, not word -- word.

We remove the space to the right of a "(" or "[" character and to the left of a ")" or "]" character. For example, we write (word), not ( word ).
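
Since this is an [auto] rule, a small script can apply it. The sketch below is one possible way to do so, not the tool Ambuda actually runs; the function name tidy_punctuation_spacing is hypothetical, and it only touches hyphens or dashes that sit between two words on the same line.

```python
import re


def tidy_punctuation_spacing(line):
    """Remove stray spaces around dashes and inside brackets on one line."""
    # Drop spaces on both sides of a hyphen or dash run between two words.
    # A hyphen at the very end of a line is not matched and stays as it is.
    line = re.sub(r"(?<=\S)[ \t]*(-+)[ \t]*(?=\S)", r"\1", line)
    # Drop spaces right after an opening "(" or "[".
    line = re.sub(r"([(\[])[ \t]+", r"\1", line)
    # Drop spaces right before a closing ")" or "]".
    line = re.sub(r"[ \t]+([)\]])", r"\1", line)
    return line


print(tidy_punctuation_spacing("foo -- bar ( baz )"))
```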

[auto] Quotation marks

We use "straight quotes" without any special formatting.
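
As an illustration of how software could handle this [auto] rule, the sketch below maps the four common curly quote characters to their straight equivalents. The name straighten_quotes is hypothetical, and the list of characters is an assumption: a book may use other quotation marks that would need their own entries.

```python
# Curly quotes mapped to straight quotes (assumed set; extend as needed).
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(text):
    """Replace curly quotation marks with straight ones."""
    return text.translate(QUOTE_MAP)


print(straighten_quotes("\u201cstraight quotes\u201d"))
```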

[auto] Spaces at the end of a line

We delete spaces at the end of a line.
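
This [auto] rule is simple to automate. A minimal sketch, assuming the page is held as one string with newline-separated lines (the function name strip_trailing_spaces is hypothetical):

```python
def strip_trailing_spaces(page_text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in page_text.split("\n"))


print(repr(strip_trailing_spaces("first line   \nsecond line\t\n")))
```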

Paragraphs

Line breaks and line-ending hyphens

Keep line breaks and line-ending hyphens.

Proofing is easier when we can quickly compare our digitized text to the original image. Line breaks and line-ending hyphens make our digitized text look more similar to the image, which means we can proofread more quickly. (A special program will remove these later.)

Spaces between paragraphs

Separate paragraphs and verses with blank lines.

Page headers and page footers

Page headers and page footers (but not footnotes!) are the small text and page numbers that appear on every page. They don't have any useful information, and you should delete them.

Footnotes

In the main text, mark the character that indicates the footnote by wrapping it in brackets and placing a ^ character after the first bracket.

At the bottom of the page, begin the footnote with the same convention. An example:

प्रपेदे पुनरुद्भेदः शुचिनां[^१] कच्छकेतकैः ।
उपक्रियायाः सदृशं नारेभे रविसूनुना ॥ ७ ॥

[^१] चीनां । शूचीनां ।
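
One benefit of a consistent marker is that software can pair each reference with its footnote. The sketch below shows one way this could be done; it is an illustration only, the names FOOTNOTE_MARK and split_footnotes are hypothetical, and it assumes every footnote sits on a single line that begins with its marker.

```python
import re

# A marker looks like [^X], where X is the footnote symbol (e.g. a digit).
FOOTNOTE_MARK = re.compile(r"\[\^([^\]\s]+)\]")


def split_footnotes(page_text):
    """Return (references found in the main text, footnote symbol -> text)."""
    body_lines = []
    notes = {}
    for line in page_text.split("\n"):
        mark = FOOTNOTE_MARK.match(line)
        if mark:
            # A line that starts with a marker is the footnote itself.
            notes[mark.group(1)] = line[mark.end():].strip()
        else:
            body_lines.append(line)
    refs = FOOTNOTE_MARK.findall("\n".join(body_lines))
    return refs, notes


page = "प्रपेदे पुनरुद्भेदः शुचिनां[^१] कच्छकेतकैः ।\n\n[^१] चीनां । शूचीनां ।"
refs, notes = split_footnotes(page)
print(refs, notes)
```

A reference whose symbol has no entry in the returned dictionary (or the reverse) is a sign that a marker was mistyped on the page.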

[auto] Indenting paragraphs

You don't need to indent paragraphs. But if the paragraphs are indented, you can leave that spacing in. (A special program will remove indents later.)

Pages

Skip any content that is not part of the original book. This includes handwritten notes, stamps, watermarks, dirt, stains, etc.

Annotations

Errors and corrections

If the text has a typographical error, wrap the error in the <error> tag. If needed, add a fix with the <fix> tag.

This is an <error>exmaple</error> <fix>example</fix>.

Notes

If you need to make a note in the text, use the <note> tag:

<note>This is a transpositional error.</note>

Uncertain text

If the text is confusing or uncertain, use the <flag> tag so that another proofer can notice and take a look.

This is a <flag>xople?</flag>.
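
Because these tags follow a regular pattern, a reviewer could list every annotation on a page with a short script. The following is a minimal sketch under the assumption that tags are never nested and are always closed; the names ANNOTATION and find_annotations are hypothetical and not part of Ambuda's tooling.

```python
import re

# Matches <error>, <fix>, <note>, and <flag> with their closing tags.
ANNOTATION = re.compile(r"<(error|fix|note|flag)>(.*?)</\1>", re.DOTALL)


def find_annotations(page_text):
    """Return a list of (tag name, tagged text) pairs in page order."""
    return ANNOTATION.findall(page_text)


for tag, content in find_annotations("This is a <flag>xople?</flag>."):
    print(tag, content)
```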

If your question isn't answered here

These guidelines are a work in progress. If you don't see a clear answer to your question here, come discuss it with our team:

  1. Join our Discord server.
  2. Join the #proofreading channel.
  3. Ask your question on the channel.