Proofing guidelines

(This page is a rough draft. Please report rough edges to the team.)

This page describes Ambuda's proofing guidelines. All proofers must know and apply these guidelines when proofing texts on Ambuda.

Please read this page carefully! The information here is critical to Ambuda's mission.

Core principles

Don't change what the page says. Follow the conventions of the text as exactly as possible. Corrections and cleanup will come later.

Examples:

  • Spelling. If the page says कार्य्यते or सङ्गच्छति, then we write कार्य्यते (not कार्यते) and सङ्गच्छति (not संगच्छति). If the page says çru, then we should write çru and not śru.

  • Typos. If the page has mistakes, include those mistakes as-is. We will clean them up later.

  • Line breaks. Preserve line breaks and hyphens exactly as they appear on the original page.

  • Script. If the page uses Devanagari, we should use Devanagari, not Roman script or some other convention.

  • Whitespace. Preserve the spaces between words. If the page doesn't have a space between two words, don't add one.

  • Word splitting. Different texts apply sandhi rules in different ways. Follow the conventions of the page regardless of personal preference.

  • Secondary content. Include footnotes, spurious verses, and all other meaningful content that appears on the page.

Exceptions:

  • Irrelevant content. Ignore page numbers, repetitive page headers and footers, stamps, handwritten notes, and any other content that is irrelevant to the text.

    Some pages are also irrelevant and can be marked as Not relevant to exclude them from the final text. Examples include index pages, prefaces, publication details, appendices, and so on.

  • Irrelevant whitespace. Whitespace between words or between paragraphs is important because it encodes semantic information. Whitespace around punctuation marks does not carry meaning and will be normalized when we publish the text. Do your best to match the page, but don't fuss about matching it exactly.

  • Special instructions. If the project has special instructions, such as "ignore this commentary" or "ignore footnotes," follow those instructions.

  • Glyph variants. Different typefaces may draw letters like अ in different ways. Unicode does not encode these variants separately, so we don't either.

  • Zero-width joiner. Unicode supports a "zero-width joiner" that lets us break apart certain conjunct characters. For example, we can use a zero-width joiner to display the conjunct क्ष as क्‍ष. This difference is tedious to represent and does not change the phonetics of a given word. If you see a word that requires a zero-width joiner, you do not need to insert it.
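It can help to see that the zero-width joiner is a real character in the data even though it only changes how a conjunct is drawn. The short Python sketch below is illustrative only and is not part of Ambuda's tooling. It compares क्ष typed normally with the same conjunct containing a zero-width joiner (U+200D) after the virama, prints the Unicode name of each code point, and checks that removing the joiner recovers the plain form.

```python
import unicodedata

# क्ष as usually typed: KA + VIRAMA + SSA
plain = "\u0915\u094D\u0937"

# The same conjunct with a zero-width joiner (U+200D) after the virama,
# which asks the renderer for a half-form KA instead of the ligature.
with_zwj = "\u0915\u094D\u200D\u0937"

for label, text in [("plain", plain), ("with ZWJ", with_zwj)]:
    # List the Unicode character names that make up each string.
    print(label, [unicodedata.name(ch) for ch in text])

# The two strings differ only by the joiner, so dropping it
# recovers the plain conjunct.
assert plain != with_zwj
assert with_zwj.replace("\u200D", "") == plain
```

Both strings spell the same sounds; the only difference is one invisible character, which is why proofers can safely leave it out.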

Structuring the page

Once we have an exact transcription of the page, we structure its content and correct any mistakes it contains. We describe structuring first.

Structuring has two parts. First, we split the text into blocks and describe the type of each block. Here are our basic block types:

  • Ignore — For irrelevant content (as defined above). This will be removed from the final output.

  • Paragraph — For all prose content. This is the default type for all text. Each paragraph must be in its own block.

  • Verse — For metrical content, including partial verses. Each verse must be in its own block.

  • Footnote — For footnotes. Each footnote must be in its own block. The first token in the block must be the footnote number, e.g. १. or (क).

  • Title — For the title of a text. Typically a text has at most one title.

  • Subtitle — For the subtitle of a text. Typically a subtitle appears just above or below a title. Each subtitle must be in its own block.

  • Heading — For headings within a text.

  • Trailer — For verses or sayings that end a text section.

  • Metadata — A specialized block used by Ambuda admins to prepare a text for publishing.

Second, we mark up text within each block. Apart from footnote numbers, markup is typically useful only for specialized texts like plays. All markup types are available in the editor under Edit → Mark as.

Basic markup types:

  • Footnote number — a reference to a specific footnote.

Specialized markup types for plays:

  • Speaker — the person speaking.

  • Stage — a stage direction.

  • Chaya — a Sanskrit translation of Prakrit speech.

  • Prakrit — Prakrit speech. If some text in a block is annotated as chaya, all other text in the block is treated as Prakrit. This annotation is needed only when a character switches between Sanskrit and Prakrit within a block, so that we can mark which text the chaya corresponds to.

Specialized options that we need to document better:

  • Merge next (enable View → Show advanced options first) — Indicates that a block continues onto the next page.

Correcting the page

We make corrections to the page through three different markup options:

  • Error — marks erroneous text. This will be removed from the final output.

  • Fix — marks a correction to the text.

  • Unclear — marks text that is unclear or hard to understand. This will be included in the final output but flagged for review by more experienced proofers.

Saving your changes

When you save your changes, you must choose a status to represent the state of the page.

Options:

  • Needs more work — The page needs additional work or review.
  • Proofed once — The page is correct to the best of your judgment.
  • Not relevant — The page is not relevant to the text. Examples: index pages, blank pages, publication details, forewords, prefaces, and appendices.