4.2 C. 11. Important additional information

              1)  Correct obvious scanning and formatting errors
              2)  Don't correct any errors made by the Author or Editor!
              3)  Only remove illegible text if doing so does not affect the Page Numbering
              4)  Don't add any text of your own to the book
              5)  Don't remove Blank Pages if the Page Numbering will be affected
              6)  Getting the page numbering that is in the original book

1)  Correct obvious scanning and formatting errors

      All scanned books will have some character recognition and formatting errors.  Volunteers are
      encouraged to correct these errors since these corrections improve the quality of books in the
      collection.  Bookshare staff understands that correcting errors without an original copy of the
      book can be an inexact science, so correction of all errors is not required.

      Software packages like Kurzweil 1000 and OpenBook provide tools which can be helpful in cleaning
      up books, such as Automatic OCR Correction, Automatic Hyphen Removal, and Ranked Spelling.
      The use of these tools is encouraged, but volunteers should be aware that automated tools can cause
      some errors at the same time that they are correcting many others.  The tools should only be used
      when the benefits outweigh any problems they can cause.  The best way to determine this is by
      making a backup copy of a book, running the tools, and checking the results.
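
      The backup-and-check routine described above can be sketched in a few lines of Python.  This is
      only an illustration, not a Bookshare tool: the function names back_up and changed_lines are
      hypothetical, and the sketch assumes the book has been exported as plain text both before and
      after the automated tools were run.

```python
import difflib
import shutil
from pathlib import Path


def back_up(book: Path) -> Path:
    """Copy the book to '<name>.bak<suffix>' beside the original and return the copy's path."""
    backup = book.with_name(book.stem + ".bak" + book.suffix)
    shutil.copy2(book, backup)
    return backup


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the lines that were removed ('- ') or added ('+ ') between two versions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

      Reading through the list returned by changed_lines shows each change a tool made, which makes it
      easier to judge whether the corrections outweigh any new errors.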

      Volunteers proofreading books should be aware that the submitter may have run these tools before
      submitting the book and corrected any errors which were introduced, so running the tools again may
      occasionally cause more errors than running them will fix.

      Note:  A library stamp, while not a scanning error, is also not a part of the copyrighted text and
      should be removed.

      On the other hand, the symbol "©", a copyright symbol, appears in many copyright notices.
      Volunteers need to be careful not to mistake this for a scanning error or to accidentally change it
      while proofreading on an electronic Braille device such as one of the Braille notetakers.

2)  Don't correct any errors made by the Author or Editor!

      These errors should not be changed since doing so is editing an author's work rather than cleaning
      up a scan.  Bookshare understands that mistakes can be made as part of the process for adding a
      book to the library, but volunteers should understand that correcting errors in a book when they know
      the error is not a scanning error is a violation of their volunteer agreement and could result in a
      violation of copyright law.

3)  Only remove illegible text if doing so does not affect the Page Numbering

      It is rare for an entire page of the main content of a book to be illegible.  While you are free to fix
      errors in a book's main content, simply removing illegible text does not improve the reader's
      experience and may in fact make the book harder to read, as well as disrupt pagination.  Therefore,
      you may fix illegible text within the main content, but you should not simply remove it.

      Illegible text may be removed in material which is not required for a book to be complete if the
      removal does not make the book harder to read or disrupt pagination.  For example, removing lines
      of illegible text in a Table of Contents would make a book harder to read since it would be obvious
      that lines were missing, while removing illegible text caused by scanning a picture would make a book
      easier to read.  In the event that an entire page of illegible text is removed, then the page should be
      left blank or contain a brief note about the contents which were removed in order to preserve page
      numbering.  See the following section for examples of notes which are permitted.

4)  Don't add any text of your own to the book

      Adding text is a violation of your volunteer agreement and could result in a violation of copyright law.
      An exception to this would be text entered to explain a picture, chart, or other graphic element rendered
      unreadable by character recognition.  You are not required to enter such explanatory text but may
      do so if you choose.

      Examples of brief notes which are permitted:

          *  Picture of Seabiscuit
          *  Map of Oz
          *  Diagram of the Starship Enterprise
          *  From the Book Jacket

      It is often helpful to put such notes in square brackets in order to signal to readers that they are
      notes from the submitter, e.g. [map of Oz].

5)  Don't remove Blank Pages if the Page Numbering will be affected

      Blank pages are often found in the following places in books:

          *  After the Table of Contents
          *  After the Dedication
          *  After the Acknowledgments
          *  After pages which identify a section of the book as Part I, Part II, etc.
          *  Between chapters
          *  Before the Glossary
          *  Before the Index
          *  Before the appendices

      Blank pages should only be removed if doing so does not affect a book's page numbering.  For
      example, it is sometimes obvious that a blank page is page 85 because the page before it is
      numbered 84 and the page after it is numbered 86.  In this instance, the blank page should not be
      removed.  It may also be helpful to add the page number to the blank page.  Doing so helps prevent
      the page from being lost when the file is converted to a different format, removed by a volunteer
      during proofreading, or mis-numbered by the tools which process page numbers for DAISY books
      and identify original page numbers in BRF books.

      Note that books often begin numbering pages with the first page of the main content, but page
      numbering can also begin with the first page of a book.  When numbering begins with the first page
      of the book, then blank pages should be included in the front of the book to preserve page numbering.

      Some books will have photos, advertisements, or other material in the middle of the book.  Remove
      these pages only if removing them does not disrupt the page numbers.  Many times the numbered
      pages on each side of the material will be sequential (for example, page 112, six unnumbered
      pages, page 113).  In this case, please remove the pages from the scan to preserve page numbering.
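
      The two cases above (the blank page between pages 84 and 86, and the six pages between pages
      112 and 113) follow the same arithmetic, which the sketch below spells out.  It is an illustration
      only and not part of any Bookshare tool.  The function name classify_unnumbered_runs is
      hypothetical, and the sketch assumes each page is represented by its printed page number, or by
      None when no number is printed on it.

```python
from typing import Optional


def classify_unnumbered_runs(pages: list[Optional[int]]) -> list[tuple[int, int, str]]:
    """Return (start_index, length, verdict) for each run of pages with no printed number.

    verdict is "keep" when the run exactly fills a gap in the numbering,
    "removable" when the numbered pages on each side are already consecutive,
    and "check" when the neighbouring numbers do not settle the question.
    """
    runs = []
    i = 0
    while i < len(pages):
        if pages[i] is not None:
            i += 1
            continue
        start = i
        while i < len(pages) and pages[i] is None:
            i += 1
        length = i - start
        before = pages[start - 1] if start > 0 else None
        after = pages[i] if i < len(pages) else None
        if before is None or after is None:
            verdict = "check"       # run touches the start or end of the book
        elif after - before == length + 1:
            verdict = "keep"        # e.g. 84, blank, 86: the blank page is page 85
        elif after - before == 1:
            verdict = "removable"   # e.g. 112, six pages, 113: the pages carry no numbers
        else:
            verdict = "check"
        runs.append((start, length, verdict))
    return runs
```

      A "check" verdict means the printed numbers alone are not enough, for example at the front of
      a book, and the proofreader has to decide by looking at the pages.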

6)  Getting the page numbering that is in the original book

      It's crucial that every book in the Bookshare library have the same page numbering as that in the
      original physical book.

      When a book is uploaded after proofreading, it's processed by a tool which attempts to determine the
      correct page numbering throughout the book.  This tool looks for a page number on each page.

      However, page numbers are usually on the same line of a page as a "running header" or "running
      footer" (text that recurs at the top or bottom of each page throughout the body of a book, such as the
      book title, the author, or the title of each chapter).  To identify a page number, the tool attempts to strip
      out all the text of each running header in the hopes that what remains is the page number.  Then it
      stores the page number for that page.

      But since OCR usually makes mistakes in some headers, the tool isn't always successful at
      stripping out all the running header text from each page.  The most reliable method is for a
      proofreader to remove the running header (or footer) from each page, leaving the page number as
      the only text on that line.
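
      To see why a single OCR mistake matters, consider the simplified sketch below.  It is not the
      actual code of the Bookshare tool, whose internals are not described here.  The function
      page_number_from_header is hypothetical, and it assumes the running header text is already
      known and that page numbers are printed as digits.

```python
import re
from typing import Optional


def page_number_from_header(line: str, running_header: str) -> Optional[int]:
    """Strip the known running-header text and return what is left if it is a number."""
    remainder = line.replace(running_header, "").strip()
    return int(remainder) if re.fullmatch(r"\d+", remainder) else None


print(page_number_from_header("THE WIZARD OF OZ 85", "THE WIZARD OF OZ"))  # 85
print(page_number_from_header("THE W1ZARD OF OZ 85", "THE WIZARD OF OZ"))  # None
print(page_number_from_header("85", "THE WIZARD OF OZ"))                   # 85
```

      In the second call, one misread character ("1" for "I") leaves header text behind, so no page
      number is found.  In the third call the header has been removed by hand, and the number is
      found without any stripping at all.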

      Long-time Bookshare engineer Jake Brownell has written about this tool, called the RTF Converter,
      saying it attempts to do four things:

          1)  Identify recurring text at the top and bottom of each page known as "running headers and footers"
          2)  Identify page numbers in the running headers and footers
          3)  Remove text from running headers and footers
          4)  Process the page numbers to allow easier navigation of DAISY books and include original
               page numbers in BRF books

      Volunteers can assist this tool by ensuring that headers and footers are consistent throughout the book:

          *  Capitalization should be used consistently throughout headers and footers
          *  Tabs and spaces should be used consistently throughout headers and footers
          *  Page numbers and text in each header or footer should be on the same line
          *  Page numbers should be separated from text in a header or footer by a space or tab

      Note that the text will be removed by the tool only if it is completely consistent.  So the easiest way
      to ensure that the tool can identify page numbers is to remove all text except for the page
      numbers themselves from running headers and footers.  This is not only much easier for
      volunteers, but makes the tool more effective as well.
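
      For volunteers who prefer to keep the header text, a check along the following lines could point
      out headers that differ only in capitalization or spacing.  This is a rough sketch under stated
      assumptions, not a Bookshare tool: the function names are hypothetical, the first line of each
      page is assumed to be its header, and page numbers are assumed to be digits at the start or end
      of that line.

```python
import re
from collections import Counter, defaultdict


def header_text(line: str) -> str:
    """Return the header line without a leading or trailing page number."""
    stripped = line.strip()
    if stripped.isdigit():
        return ""  # the line holds only a page number
    return re.sub(r"^\d+\s+|\s+\d+$", "", stripped)


def inconsistent_headers(first_lines: list[str]) -> list[int]:
    """Return indexes of pages whose header differs from its usual form only in case or spacing."""
    texts = [header_text(line) for line in first_lines]
    groups: dict[str, list[int]] = defaultdict(list)
    for index, text in enumerate(texts):
        if text:
            # Headers that match after ignoring case and spacing belong together.
            groups[" ".join(text.casefold().split())].append(index)
    flagged = []
    for indexes in groups.values():
        usual, _ = Counter(texts[i] for i in indexes).most_common(1)[0]
        flagged.extend(i for i in indexes if texts[i] != usual)
    return sorted(flagged)
```

      The check does not catch OCR misreadings inside the header text itself, which is another reason
      that removing the header text entirely is the simpler course.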

      Please do the following for each page containing a header or footer:

          *  If a page contains a header, make sure that the header is the first line of text on the page.
          *  If a page contains a footer, make sure that the footer is the last line of text on the page.

      Note that if a line contains no words of text, but does contain junk characters in the form of
      punctuation marks or symbols, such as hyphens and asterisks, that line will still be treated as
      a "line of text" by the tool.  The tool cannot tell that these characters are junk, so instead
      of removing such a useless line, the tool will keep it.  Proofreaders should remove these junk
      characters so that the line contains no characters at all.  The tool will then remove such "blank
