39 The Difference Between an Accessible PDF and a Scanned Image of Text
Heather Caprette
Adobe Reader is a free program for viewing documents whose file names end with .pdf. It does not have the full functionality of Acrobat Professional, which is used to create accessible PDFs.
If you have a PDF you would like to use online, you can open it with Adobe Reader to check for some accessibility features. If a PDF is accessible, it will have tags and searchable text. Tags are the structural elements, similar to HTML elements, that provide semantic meaning to the content on the page. There are tags for headings, paragraphs, images, lists, tables, and table headers, for example. It is these tags, much like Word Styles and properly marked-up HTML, that help people using assistive technology and devices navigate the page and understand what the content is.
You can check to see if the PDF has been tagged by opening the document in Adobe Reader, going under the File menu, and selecting Properties. In the Document Properties pop-up window, look for the word “Yes” next to Tagged PDF at the bottom of the screen. Please be aware that even though a document may have tags, they may not be semantically correct. Problems arise when automatic tagging produces the wrong tags or the wrong parenting order, breaking the meaning that is conveyed visually. A common problem with automatic tagging is the breaking up of one list into multiple lists. Another is the addition of empty tags, produced when the return key was pressed multiple times to create space in a Word document before it was converted to PDF.
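If you need to screen many files, the same check can be roughly approximated in a script. The sketch below is a heuristic, not what Adobe Reader does internally: it assumes that a tagged PDF declares "/Marked true" in its catalog and contains a "/StructTreeRoot" entry, and it scans the raw bytes of the file for both. The function names are made up for illustration. It can miss tagged files whose catalog is stored in a compressed object stream, so treat the Document Properties window as the authority when the two disagree.

```python
import re
from pathlib import Path

# Assumption: a tagged PDF marks itself with "/Marked true" inside /MarkInfo
# and has a structure tree root. Both names come from the PDF specification.
MARKED_RE = re.compile(rb"/Marked\s+true")


def looks_tagged(data: bytes) -> bool:
    """Return True if the raw PDF bytes show both tagging markers."""
    return bool(MARKED_RE.search(data)) and b"/StructTreeRoot" in data


def file_looks_tagged(path: str) -> bool:
    """Convenience wrapper (hypothetical helper) that reads a file from disk."""
    return looks_tagged(Path(path).read_bytes())
```

A result of True only says that tags exist. As with the “Yes” in Document Properties, it says nothing about whether the tags are semantically correct.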
To see if a document has searchable text, you can also check it within Adobe Reader by pressing the CTRL key plus “F.” Then, in the “Find” search box that pops up, type a word that you see in the PDF and click the “Next” button. If the word becomes highlighted on the page, the document has searchable text, as opposed to being a scanned image of text.
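The same question can be asked in a script, again only as a rough screen. The sketch below assumes that searchable text shows up in a PDF's page content as text-showing operators ("Tj" or "TJ"), and that content streams are either uncompressed or compressed with the common Flate (zlib) method. It uses only the Python standard library, the function name is hypothetical, and it does not handle encrypted files or other compression filters, so the “Find” test in Adobe Reader remains the more reliable check.

```python
import re
import zlib

# Everything between a "stream" keyword and the next "endstream" keyword.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A string or array operand followed by a text-showing operator.
TEXT_OP_RE = re.compile(rb"[)\]>]\s*(?:Tj|TJ)\b")


def has_text_operators(data: bytes) -> bool:
    """Return True if any stream in the PDF bytes appears to draw text."""
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            content = raw  # assume the stream was not Flate-compressed
        if TEXT_OP_RE.search(content):
            return True
    return False
```

A scanned image of text normally contains only an image-drawing stream, so this returns False for it. A page that has been through optical character recognition usually returns True, because the recognized words are stored as text.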
You can check for correct read order within an accessible, tagged PDF, or one that has had optical character recognition (OCR) run on it, by going under the “View” menu and checking “Read Mode.” Then go under the “View” menu, select “Read Out Loud,” and choose “Activate Read Out Loud.” Go back under the “View” menu, select “Read Out Loud” again, and choose either “Read This Page Only” or “Read To End Of Document.” Listen to hear whether the order of presentation makes sense. Pay attention to areas where diagrams containing text have been inserted into pages. The text within these diagrams gets converted to searchable text that can be read, but it is often read in the middle of something else, breaking the logical read order.
Often, you will find “full text PDF” versions of publishers’ articles in library databases. These articles have had optical character recognition run on them and have searchable text, but they don’t necessarily have correct read order, especially when figures or tables are inserted within the body of the text or when the layout has multiple columns. Such articles are only partially accessible, but they are still better than a scanned image of an article.
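To see why multi-column layouts so often end up with the wrong read order, consider the toy sketch below. It is not how any particular OCR program works. The block positions, the column boundary, and the function names are all invented for illustration. It compares reading every block strictly from top to bottom with reading one column at a time.

```python
from typing import NamedTuple


class Block(NamedTuple):
    text: str
    x: float  # left edge of the block
    y: float  # top edge, measured downward from the top of the page


def naive_order(blocks):
    """Read top to bottom, then left to right, ignoring columns."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_order(blocks, column_split):
    """Read the whole left column first, then the whole right column."""
    left = sorted((b for b in blocks if b.x < column_split), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= column_split), key=lambda b: b.y)
    return [b.text for b in left + right]


# Hypothetical two-column page with two paragraphs in each column.
page = [
    Block("Left 1", 50, 100),
    Block("Right 1", 320, 100),
    Block("Left 2", 50, 200),
    Block("Right 2", 320, 200),
]

print(naive_order(page))        # interleaves the two columns
print(column_order(page, 300))  # keeps each column together
```

The first ordering jumps back and forth between columns, which is the kind of result you hear when Read Out Loud reads a sentence from one column in the middle of another.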