Comprehensive guidelines for proofing a text. And when in doubt: don't change what the page says!
This document defines the proofreading guidelines we use on Ambuda. Generally, these guidelines apply to all of the projects we proof.
- What is our goal?
- The golden rule
- If your question isn't answered here
What is our goal?
We want to create machine-readable texts that we can display on Ambuda and share with others.
We should create these texts at the highest quality we are capable of. We should establish credibility by making clear where these texts come from (which books, editions, editors, publishers, …), how they were created, and who was involved.
The golden rule
Don't change what the page says. We follow the conventions of the printed book as closely as we can.
Follow the book's spelling conventions. If the book says कार्य्यते or सङ्गच्छति, then we write कार्य्यते (not कार्यते) and सङ्गच्छति (not संगच्छति). A similar idea applies for text in Roman print; if the book says çru, then we should write çru and not śru.
Follow the book's script conventions. If the book uses Devanagari, we should use Devanagari, not Roman script or some other convention.
Follow the book's word-splitting conventions. Different books have different conventions for splitting long blocks of Devanagari text or for adding and removing Sanskrit's sound changes (सन्धि). We follow the conventions of the book regardless of our personal preference.
Include everything meaningful on the page. If the book contains spurious verses and footnotes, we should include those as well.
That said, our goal is not to create a pixel-for-pixel perfect copy of the text. (The scanned book already does that.) Instead, we want to capture the information and the structure in the original book as accurately as possible. For details on what we can include or exclude, see the sections below.
Hyphens at the end of a line
If the printed line ends with a hyphen, we keep the hyphen. Our program uses these hyphens to stitch together different lines:
स भगवान्सृष्ट्वेदं जगत्तस्य च स्थितिं चिकी-
र्षुः ...
Exception: If the hyphen is part of a word that would otherwise have a hyphen (e.g. "front-end," "topsy-turvy"), join the words together so that our program preserves the hyphen.
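The stitching step described above can be sketched as follows. This is only an illustration, not the program Ambuda actually runs: the function name `stitch_lines` is hypothetical, and the sketch assumes that a line ending in a single hyphen continues on the next line, while a line ending in two dashes (an em dash) does not.

```python
def stitch_lines(lines):
    """Join printed lines, merging words split by a line-ending hyphen.

    Hypothetical helper: the real stitching program may behave differently.
    """
    pieces = []
    for line in lines:
        line = line.rstrip()
        # A single trailing hyphen marks a word that continues on the next
        # line. A trailing "--" is an em dash, so we leave it alone.
        if line.endswith("-") and not line.endswith("--"):
            pieces.append(line[:-1])
        else:
            pieces.append(line + " ")
    return "".join(pieces).strip()


page = ["स भगवान्सृष्ट्वेदं जगत्तस्य च स्थितिं चिकी-", "र्षुः ..."]
print(stitch_lines(page))
```

Because a proofer joins a word like "front-end" onto one line, its hyphen never sits at the end of a line, so a sketch like this leaves it intact.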
Hyphens and dashes
Books often use two kinds of horizontal lines:
Hyphens (-) appear at the end of lines and between words in a compound. Write these with a single dash (-).
Em dashes (—) separate different phrases, usually for emphasis. Write these with two dashes (--).
[auto] Spaces around punctuation characters
We don't leave any space between a dash and the words around it. For example, write "a--b", not "a -- b".
We remove the space to the right of a "(" or "[" character and to the left of a ")" or "]" character. For example, write "(a)", not "( a )".
[auto] Quotation marks
We use "straight quotes" without any special formatting.
[auto] Spaces at the end of a line
We delete spaces at the end of a line.
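As a rough illustration of the [auto] rules above, here is a minimal sketch of a clean-up function. The name `normalize_line` is hypothetical, and the sketch makes one simplifying assumption: it tightens spaces only around two-dash em dashes and leaves single hyphens alone.

```python
import re

# Curly quotes mapped to their straight equivalents.
STRAIGHT_QUOTES = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})


def normalize_line(line):
    """Apply the [auto] spacing and quote rules to one line (illustrative only)."""
    line = line.translate(STRAIGHT_QUOTES)
    # No space between an em dash ("--") and the words around it.
    line = re.sub(r"[ \t]*--[ \t]*", "--", line)
    # No space to the right of "(" or "[" ...
    line = re.sub(r"([(\[])[ \t]+", r"\1", line)
    # ... and no space to the left of ")" or "]".
    line = re.sub(r"[ \t]+([)\]])", r"\1", line)
    # No spaces at the end of the line.
    return line.rstrip()


print(normalize_line("( क ) -- ख  "))
```

Since these rules are marked [auto], a proofer would not normally need to apply them by hand.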
Line breaks and line-ending hyphens
Keep line breaks and line-ending hyphens.
Proofing is easier when we can quickly compare our digitized text to the original image. Line breaks and line-ending hyphens make our digitized text look more similar to the image, which means we can proofread more quickly. (A special program will remove these later.)
Spaces between paragraphs
Separate paragraphs and verses with blank lines.
Page headers and page footers
Page headers and page footers (but not footnotes!) are the small text and page numbers that appear on every page. They don't have any useful information, and you should delete them.
Footnotes
In the main text, mark the character that indicates the footnote by wrapping it in brackets and placing a ^ character after the first bracket.
At the bottom of the page, begin the footnote with the same convention. An example:
प्रपेदे पुनरुद्भेदः शुचिनां[^१] कच्छकेतकैः ।
उपक्रियायाः सदृशं नारेभे रविसूनुना ॥ ७ ॥

[^१] चीनां । शूचीनां ।
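To show why the marker in the text and the marker at the bottom of the page should match, here is a small sketch that pairs them up. The function names are hypothetical, and the sketch assumes that any line starting with a `[^…]` marker is a footnote and every other line is main text.

```python
import re

# Matches a footnote marker such as [^१] and captures the character(s) inside.
FOOTNOTE = re.compile(r"\[\^([^\]]+)\]")


def split_footnotes(page_lines):
    """Return (body_lines, notes), where notes maps each marker to its text."""
    body, notes = [], {}
    for line in page_lines:
        match = FOOTNOTE.match(line)  # only matches at the start of the line
        if match:
            notes[match.group(1)] = line[match.end():].strip()
        else:
            body.append(line)
    return body, notes


def unmatched_markers(page_lines):
    """Markers used in the text without a footnote, or the other way around."""
    body, notes = split_footnotes(page_lines)
    used = {marker for line in body for marker in FOOTNOTE.findall(line)}
    return used ^ set(notes)


page = [
    "प्रपेदे पुनरुद्भेदः शुचिनां[^१] कच्छकेतकैः ।",
    "उपक्रियायाः सदृशं नारेभे रविसूनुना ॥ ७ ॥",
    "",
    "[^१] चीनां । शूचीनां ।",
]
print(unmatched_markers(page))
```

An empty result means every marker in the main text has a footnote with the same marker, and vice versa.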
[auto] Indenting paragraphs
You don't need to indent paragraphs. But if the paragraphs are indented, you can leave that spacing in. (A special program will remove indents later.)
Skip any content that is not part of the original book. This includes handwritten notes, stamps, watermarks, dirt, stains, etc.
Errors and corrections
If the text has a typographical error, wrap the error in the <error> tag. If needed, add a fix with the <fix> tag.
This is an <error>exmaple</error> <fix>example</fix>.
If you need to make a note in the text, use the <note> tag:
<note>This is a transpositional error.</note>.
If the text is confusing or uncertain, use the <flag> tag so that another proofer can notice and take a look.
This is a <flag>xople?</flag>.
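One way a later program could consume these tags is sketched below. The helper names and the sample sentence are hypothetical, and the sketch assumes the tags are never nested and that a `<fix>` directly follows the `<error>` it corrects.

```python
import re

# Any of the four annotation tags, with the tag name and its content captured.
TAG = re.compile(r"<(error|fix|note|flag)>([^<]*)</\1>")

# An <error> immediately followed by its <fix>.
ERROR_WITH_FIX = re.compile(r"<error>[^<]*</error>\s*<fix>([^<]*)</fix>")


def list_annotations(text):
    """Return (tag, content) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in TAG.finditer(text)]


def apply_fixes(text):
    """Replace each error that has a fix; keep errors without a fix as printed."""
    text = ERROR_WITH_FIX.sub(r"\1", text)
    return re.sub(r"</?error>", "", text)


sample = "This is an <error>exmaple</error> <fix>example</fix>."
print(list_annotations(sample))
print(apply_fixes(sample))
```

Keeping the printed error alongside the fix means the digitized text still records what the page says while offering a corrected reading.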
If your question isn't answered here
These guidelines are a work in progress. If you don't see a clear answer to your question here, come discuss it with our team:
- Join our Discord server.
- Join the #proofreading channel.
- Ask your question on the channel.