The page Scanning And Proofreading Manual does not exist.
1) Correct obvious scanning and formatting errors
2) Don't correct any errors made by the Author or Editor!
3) Only remove illegible text if doing so does not affect the Page Numbering
4) Don't add any text of your own to the book
5) Don't remove Blank Pages if the Page Numbering will be affected
6) Getting the page numbering that is in the original book
All scanned books will have some character recognition and formatting errors. Volunteers are
encouraged to correct these errors since these corrections improve the quality of books in the
collection. Bookshare staff understands that correcting errors without an original copy of the
book can be an inexact science, so correction of all errors is not required.
Software packages like Kurzweil 1000 and OpenBook provide tools which can be helpful in cleaning
up books, such as Automatic OCR Correction, Automatic Hyphen Removal, and Ranked Spelling.
The use of these tools is encouraged, but volunteers should be aware that automated tools can cause
some errors at the same time that they are correcting many others. The tools should only be used
when the benefits outweigh any problems they can cause. The best way to determine this is by
making a backup copy of a book, running the tools, and checking the results.
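The backup-and-check step described above can be sketched in a few lines of Python. This is only an illustration, not a Bookshare tool: it assumes the book has been exported as a plain-text file, and the function names `backup_book` and `changed_lines` are made up for this example.

```python
import difflib
import shutil
from pathlib import Path


def backup_book(book_path: str) -> Path:
    """Copy the book next to itself with a .bak suffix and return the copy's path."""
    source = Path(book_path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)
    return backup


def changed_lines(backup_path: Path, book_path: str) -> list:
    """Return the lines that differ between the backup and the processed book."""
    before = backup_path.read_text(encoding="utf-8").splitlines()
    after = Path(book_path).read_text(encoding="utf-8").splitlines()
    # ndiff marks removed lines with "- " and added lines with "+ ";
    # unchanged lines and "? " hint lines are skipped.
    return [line for line in difflib.ndiff(before, after)
            if line.startswith(("- ", "+ "))]
```

Make the backup first, run the automated tool on the original file, then read through the output of `changed_lines` to judge whether the tool fixed more errors than it introduced.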
Volunteers proofreading books should be aware that the submitter may have run these tools before
submitting the book and corrected any errors which were introduced, so running the tools again may
occasionally cause more errors than running them will fix.
Notes: A library stamp, while not a scanning error, is also not a part of the copyrighted text and
should be removed.
On the other hand, the symbol "©", a copyright symbol, appears in many copyright notices.
Volunteers need to be careful not to mistake this for a scanning error or to accidentally change it
while proofreading on an electronic Braille device such as one of the Braille notetakers.
Errors made by the author or editor should not be changed, since doing so is editing an author's work
rather than cleaning up a scan. Bookshare understands that mistakes can be made as part of the process
for adding a book to the library, but volunteers should understand that correcting an error in a book
when they know it is not a scanning error is a violation of their volunteer agreement and could result
in a violation of copyright law.
It is rare for an entire page of the main content of a book to be illegible. While you are free to fix
errors in a book's main content, simply removing illegible text does not improve the reader's
experience and may in fact make the book harder to read, as well as disrupt pagination. Therefore,
you may fix illegible text within the main content, but you should not simply remove it.
Illegible text may be removed in material which is not required for a book to be complete if the
removal does not make the book harder to read or disrupt pagination. For example, removing lines
of illegible text in a Table of Contents would make a book harder to read since it would be obvious
that lines were missing, while removing illegible text caused by scanning a picture would make a book
easier to read. In the event that an entire page of illegible text is removed, then the page should be
left blank or contain a brief note about the contents which were removed in order to preserve page
numbering. See the following section for examples of notes which are permitted.
Adding text is a violation of your volunteer agreement and could result in a violation of copyright law.
An exception to this would be text entered to explain a picture, chart, or other graphic element rendered
unreadable by character recognition. You are not required to enter such explanatory text but may
do so if you choose.
Examples of brief notes which are permitted:
* Picture of Seabiscuit
* Map of Oz
* Diagram of the Starship Enterprise
* From the Book Jacket
It is often helpful to put such notes in square brackets in order to signal to readers that they are a
note from the submitter, i.e. [map of Oz].
Blank pages are often found in the following places in books:
* After the Table of Contents
* After the Dedication
* After the Acknowledgments
* After pages which identify a section of the book as Part I, Part II, etc.
* Between chapters
* Before the Glossary
* Before the Index
* Before the appendices
Blank pages should only be removed if doing so does not affect a book's page numbering. For
example, it is sometimes obvious that a blank page is page 85 because the page before it is
numbered 84 and the page after it is numbered 86. In this instance, the blank page should not be
removed. It may be helpful to add the page number to the page so that it is not accidentally lost
when the file is converted to a different format, removed by a volunteer during proofreading, or
mis-numbered by the tool which processes page numbers for DAISY books or by the tool responsible
for identifying original page numbers in BRF books.
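One way to spot a blank page that should be kept is to list the page numbers found in the scan and look for gaps. The sketch below assumes the page numbers have already been collected into a list in reading order; the function name `missing_page_numbers` is hypothetical.

```python
def missing_page_numbers(page_numbers: list) -> list:
    """Return page numbers skipped between consecutive numbered pages."""
    missing = []
    for previous, current in zip(page_numbers, page_numbers[1:]):
        # A jump of more than one means the pages in between carry no number,
        # for example a blank page 85 between pages 84 and 86.
        missing.extend(range(previous + 1, current))
    return missing
```

For the example above, `missing_page_numbers([83, 84, 86])` returns `[85]`, which suggests that a blank page 85 belongs between pages 84 and 86.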
Note that books often begin numbering pages with the first page of the main content, but page
numbering can also begin with the first page of a book. When numbering begins with the first page
of the book, then blank pages should be included in the front of the book to preserve page numbering.
Some books will have photos, advertisements, or other material in the middle of the book. Remove
these pages only if doing so does not disrupt the page numbering. Many times the pages on each side of
the material will be sequential (for example, page 112, six pages, page 113). In this case, please
remove the pages from the scan to preserve page numbering.
It's crucial that every book in the Bookshare library have the same page numbering as that in the
original physical book.
When a book is uploaded after proofreading, it's processed by a tool which attempts to determine the
correct page numbering throughout the book. This tool looks for a page number on each page.
However, page numbers are usually on the same line of a page as a "running header" or "running
footer" (text that recurs at the top or bottom of each page throughout the body of a book, such as the
book title, the author, or the title of each chapter). To identify a page number, the tool attempts to strip
out all the text of each running header in the hopes that what remains is the page number. Then it
stores the page number for that page.
But since OCR will usually make mistakes in some headers, the tool isn't always successful at
stripping out all the running header text from each page. The best method is to have a proofreader
remove the running header (or footer) from each page, leaving the page number as the only text
on that line.
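The idea of stripping the running header to leave the page number can be illustrated with a short Python sketch. This is not the actual Bookshare tool, whose internals are not described here; it assumes the running header text is already known and that page numbers are Arabic numerals. The function name `page_number_from_line` and the sample header are made up for the example.

```python
import re
from typing import Optional


def page_number_from_line(line: str, running_header: str) -> Optional[int]:
    """Strip the known running header from a line and return what is left
    if it is a page number, otherwise None."""
    remainder = line.replace(running_header, "").strip()
    # Only digits may remain; leftover letters mean the header did not match.
    if re.fullmatch(r"\d+", remainder):
        return int(remainder)
    return None
```

With the header `"SEABISCUIT"`, the line `"84 SEABISCUIT"` yields 84, but the line `"84 SEABISCU1T"`, which contains a single OCR error, yields `None` because the header no longer matches. A line holding only `"84"` always works, which is why removing the running header by hand is the most reliable approach.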
Long-time Bookshare engineer Jake Brownell has written about this tool, called the RTF Converter,
saying it attempts to do four things:
1) Identify recurring text at the top and bottom of each page known as "running headers and footers"
2) Identify page numbers in the running headers and footers
3) Remove text from running headers and footers
4) Process the page numbers to allow easier navigation of DAISY books and include original
page numbers in BRF books
Volunteers can assist this tool by ensuring that headers and footers are consistent throughout the book:
* Capitalization should be used consistently throughout headers and footers
* Tabs and spaces should be used consistently throughout headers and footers
* Page numbers and text in each header or footer should be on the same line
* Page numbers should be separated from text in a header or footer by a space or tab
Note that the text will be removed by the tool only if it is completely consistent. So the easiest way
to ensure that the tool can identify page numbers is to remove all text except for the page
numbers themselves from running headers and footers. This also makes the tool more effective.
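A proofreader who prefers to keep the running headers can check their consistency with a sketch like the following. It assumes the header lines have been collected into a list and that the page number is a run of digits at the start or end of each line; the function name `header_variants` is hypothetical and this is not part of any Bookshare tool.

```python
import re
from collections import Counter


def header_variants(header_lines: list) -> Counter:
    """Count each distinct header text after the page number is removed."""
    variants = Counter()
    for line in header_lines:
        # Drop the page number (a run of digits at either end of the line).
        text = re.sub(r"^\d+\s*|\s*\d+$", "", line)
        variants[text] += 1
    return variants
```

A variant that appears only once or twice, such as `"SEABISCU1T"` or `"Seabiscuit"` among many occurrences of `"SEABISCUIT"`, is likely an OCR or capitalization inconsistency that would stop the header from being stripped.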
Please do the following for each page containing a header or footer:
* If a page contains a header, make sure that the header is the first line of text on the page.
* If a page contains a footer, make sure that the footer is the last line of text on the page.
Note that if a line contains no words of text, but does contain junk characters in the form of
punctuation marks or symbols, such as hyphens and asterisks, that line will still be considered as
a "line of text" by the tool. The tool cannot tell that these characters are junk characters, so instead
of removing such a useless line, the tool will keep this line. Proofreaders should remove these junk
characters so that the line contains no characters on it at all. The tool will then remove such "blank