ABBYY FineReader 15 manual in Russian - All instructions and user guides

PDF documents have long been an essential part of office work. Digital archives are stored in this format, lawyers negotiate contracts in it, designers lay out brochures, and publishers release e-books. Until recently, the main strength of PDF documents, and at the same time their main weakness, was that the text in them could not be edited. As the technology has matured, ABBYY FineReader has learned to handle this and other tasks and has become a multifunctional editor for any kind of document. Hightech («Хайтек»), together with ABBYY, explains how PDF editing works under the hood in the new FineReader 15, how the program compares document versions, and how neural networks are used to recognize Chinese, Japanese and Korean characters.

The mass digitalization of document workflows began back in the second half of the 20th century. Many businesses were switching to electronic documents. Offices installed their first computers with dedicated software for processing and storing important information. That is when popular text editors appeared. Employees typed documents by hand and later, once PDF appeared in 1993, began exporting them to that format.

At first glance it seemed that once all document workflows became electronic, cabinets of paper files and cluttered desks could be forgotten. In practice, the more an organization uses computers for digital document workflows, the more documents it prints. 64% of large companies are confident that printing will remain a significant part of their business at least until 2025. On the other hand, when a paper document arrives at an office by regular mail today, it is immediately scanned and digitized. As a rule, scans are stored as PDF files.

A PDF document is more convenient to use: it can be emailed with confidence that the information will reach the recipient undistorted (unless, of course, someone decides to alter it by hand), and, unlike a DOC file, it is hard to change. This matters most for contracts and commercial proposals.

Office workers report growing use of PDF: every second respondent said they regularly work with documents in this format and need a specialized program for it. Over the past two years the number of such working files worldwide has tripled, according to IDC experts in the study "Addressing the document disconnect". PDF is popular in Russia as well. An ABBYY survey also found that the most common scenarios for working with PDF documents now include tasks that used to be entirely untypical for the format: 52% of respondents make small edits to PDF text or fix errors and typos, 62% often search for information in PDF text, and 60% copy text out of a document. Programs that work with PDF are therefore expected to offer new capabilities for editing, comparing and recognizing text. All of them are available in the new FineReader 15.

Why is it so hard to edit text in a PDF?

PDF was not originally designed to be modified in any way. That was both its advantage (security, identical rendering on any device, and a convenient way to exchange information) and its drawback (no way to make edits, search the text, or compare documents).

How text is represented in a PDF

Although PDF is a format for text, its letters, words and sentences do not really exist in digital form; they are "drawn". The content is stored as streams, which can hold text, images and vector graphics. PDF has none of the words, lines, paragraphs and tables that are typical of the DOC format. It has no letters as such either, only character codes. Codes with the same characteristics are grouped by font face and size. The font determines how a character is displayed in the document by mapping the character code to a glyph, that is, a set of drawing commands. Another difference from an ordinary text document is that objects in a PDF effectively exist in three dimensions. The third one, the Z order, indicates how deep an object sits on the page, because text can lie on top of an image or the other way round.

Text in a PDF document resembles a "bag of letters" that has to be rendered correctly at specific places in the document with the appropriate formatting.

PDF has been an open format since 2008, which allowed developers to build PDF readers, converters and other useful tools without obstacles or extra royalties. Advances in OCR then made the previously immutable PDF document editable: first line by line, and later within whole paragraphs.

How ABBYY FineReader helps edit PDFs

Before a PDF document can be edited, it has to be prepared. The main goal of this step is to understand and analyze the structure of the text. The key difficulty is that PDF has neither paragraphs nor any formatting at all. So as soon as the program has recognized the text, it starts reconstructing the paragraphs.

For a digital-born document (one originally created on a computer rather than scanned from paper; Hightech note), background processes start in editing mode and the program begins analyzing the document structure. This uses a technology that builds blocks from the data recorded in the PDF rather than from recognition. Within fractions of a second the technology has to go through the whole chain of determining the text parameters: where the headings, subheadings, individual paragraphs and other elements are. Then it has to distribute the "bags of letters" among these blocks and form lines.
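The step of turning a "bag of letters" into lines can be illustrated with a small sketch. It is not ABBYY's implementation: the `Glyph` record, the `group_into_lines` function and the baseline tolerance are all hypothetical, and the only assumption taken from PDF itself is that the vertical coordinate grows upward on the page.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str    # decoded character (hypothetical field names)
    x: float     # left edge of the glyph
    y: float     # baseline; larger values are higher on the page
    size: float  # font size, used to scale the tolerance


def group_into_lines(glyphs, tolerance=0.5):
    """Cluster glyphs with close baselines into lines, top to bottom."""
    lines = []
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        # Same line if the baseline differs by less than a fraction of the font size.
        if lines and abs(lines[-1][-1].y - g.y) <= tolerance * g.size:
            lines[-1].append(g)
        else:
            lines.append([g])
    # Inside each line, read the glyphs left to right.
    return ["".join(g.char for g in sorted(line, key=lambda g: g.x)) for line in lines]
```

A real analyzer would also have to detect word gaps, columns and paragraph boundaries; the sketch only shows the basic idea of recovering reading order from coordinates.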

The next stage is synthesis. Dedicated technologies determine the visual parameters of the text, such as indents and line spacing. As a result, a formatted text document re-emerges from the chaotic structure. It can then be edited: you can change words and whole paragraphs, fix formatting, save changes, and so on.

Line-by-line editing was already available in the previous version of FineReader (ABBYY FineReader 14 was released in January 2017; Hightech note). That was enough for small corrections such as replacing a few letters or digits. The new ABBYY FineReader 15 has become a general-purpose text editor in which whole paragraphs can be changed.

How to edit text in a scanned document

Editing a scanned copy of a paper document is a separate office task. Previously the user had to convert the file to an editable format or simply track down the original.

When a user edits a scan, ABBYY FineReader 15 first recognizes the document and creates a temporary text layer on the pages the user is viewing. In editing mode a text representation of the page is created, and this is what the user edits. The edits are then embedded into the page image in the scanned document.

How to find edits made to a PDF and avoid being deceived

Document comparison is an especially important category of office tasks for business, above all because unexpected edits can cost a great deal of money. Sometimes someone tries to slip them unnoticed into an already signed contract and exploit human inattention. Such documents are usually compared by lawyers, who carefully proofread printouts of the original created in Word against the counterparty's reply, which is a scanned copy.

Finding differences in text documents is also useful when several people work on them at once or when the same file is changed from time to time. It lets you quickly find the latest edits that colleagues have made. DOCX files have a Track Changes mode for this, which builds a third document from two versions with the differences highlighted. In the new ABBYY FineReader 15 the result of comparing any documents can be saved as such a DOCX with Track Changes, so that all the differences can be viewed in the familiar way.

ABBYY FineReader 15 can compare almost anything: PDFs, scans or images, DOC and DOCX files, and even Excel spreadsheets. Both documents are loaded into the program and, if necessary, recognized with OCR. Additional formatting elements, such as headers and footers or list numbering, are identified from the extracted text. The program uses a dedicated algorithm that quickly detects differences between document versions.

The diff algorithm takes two files as input. The first, usually the earlier one, is file A; the second is file B. The algorithm determines the number of insertions and deletions needed to turn one file into the other by finding the shortest path between them.
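The article does not name the diff algorithm, so the following is only a minimal sketch of the quantity it describes: the smallest number of insertions and deletions that turns sequence A into sequence B. It uses a plain dynamic-programming table for the longest common subsequence rather than any optimized shortest-path search, and the function name `indel_distance` is made up for illustration.

```python
def indel_distance(a, b):
    """Minimum number of insertions plus deletions that turn a into b."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the table
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    # Everything outside the common subsequence is deleted from a or inserted from b.
    return len(a) + len(b) - 2 * lcs
```

The function works on any sequences, so the same code can compare lists of paragraphs, lists of lines, or strings of characters.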

The comparison runs in three stages. First, the text obtained from recognition is split into paragraphs. The algorithm treats one paragraph as one unit of comparison. All fragments that do not match are processed in a second pass, this time line by line. The program determines which lines within a paragraph do not match completely.

The last pass works within the mismatched lines and compares individual characters. This step is somewhat more complex, because it additionally uses heuristics based on recognition variants. If two characters agree in their recognition variants and the recognition confidence for the element exceeds 50%, they are considered equivalent. Different styles of quotation marks and brackets, as well as list markers, are not counted as differences.
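The three passes described above (paragraphs, then lines, then characters) can be sketched with Python's standard `difflib.SequenceMatcher` standing in for the program's own diff. The helper names `changed_pairs` and `compare` are hypothetical, paragraphs are assumed to be separated by blank lines, and the recognition-variant heuristics and punctuation rules are left out.

```python
from difflib import SequenceMatcher


def changed_pairs(seq_a, seq_b):
    """Yield (old_part, new_part) for every region where the sequences differ."""
    matcher = SequenceMatcher(a=seq_a, b=seq_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield seq_a[i1:i2], seq_b[j1:j2]


def compare(text_a, text_b):
    """Return character-level differences, found by narrowing down in three passes."""
    diffs = []
    # Pass 1: whole paragraphs are the units of comparison.
    for paras_a, paras_b in changed_pairs(text_a.split("\n\n"), text_b.split("\n\n")):
        lines_a = "\n".join(paras_a).split("\n")
        lines_b = "\n".join(paras_b).split("\n")
        # Pass 2: only inside mismatched paragraphs, compare line by line.
        for part_a, part_b in changed_pairs(lines_a, lines_b):
            # Pass 3: only inside mismatched lines, compare characters.
            diffs.extend(changed_pairs("\n".join(part_a), "\n".join(part_b)))
    return diffs
```

Narrowing the search this way keeps the expensive character-level pass away from text that already matched at a coarser level.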

Each character has several recognition variants, sometimes as many as 20. Each variant has a confidence percentage that expresses how well, in the technology's estimate, the letter matches the scanned image. During document analysis some of the variants are discarded because they do not match the reference pattern or do not fit morphologically.

At the comparison stage the program checks whether a given letter matches the one in the other document. If the letter was obtained by recognition, the program checks how similar the characters in the two versions are and considers the recognition variants. An "A" in the paper document may have been recognized incorrectly, which can cause discrepancies in the comparison. In that case the program looks among the recognition variants for another letter that also has a high probability. If the probability is above 50%, the letter is replaced in the recognized document. This helps avoid errors caused by poor scan quality.
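A rough sketch of this check follows. The data shape is an assumption made for illustration: each recognized character is represented as a dictionary that maps candidate letters to confidence values between 0 and 1, and the names `matches_with_variants` and `count_mismatches` are hypothetical. Only the 50% threshold comes from the article.

```python
def matches_with_variants(expected, variants, threshold=0.5):
    """True if the expected letter is among the variants with confidence above the threshold."""
    return variants.get(expected, 0.0) > threshold


def count_mismatches(reference, ocr_cells, threshold=0.5):
    """Count positions where no sufficiently confident variant agrees with the reference text."""
    return sum(
        1
        for letter, variants in zip(reference, ocr_cells)
        if not matches_with_variants(letter, variants, threshold)
    )
```

For example, if the scan's best guess for a character is "l" at 0.6 but "1" is also a variant at 0.55, a reference text containing "1" is not reported as a difference.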

Finding differences in the text is only one stage of document comparison. The differences also have to be presented in a form that is comfortable for the user to work with. Suppose the word "mama" was replaced with "papa". Strictly speaking, only two letters changed. But it is clearer for the user to see one whole word replaced with another than two letters "m" replaced with "p". So the program refines the differences: it stretches and merges them to the end of the word, line or paragraph. It tries to reconstruct the logic of the person who made the edits, so that each difference looks natural and reads clearly.
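The word-replacement example can be reproduced with a short sketch. It uses `difflib.SequenceMatcher` for the character-level diff and then grows every changed range to the nearest whitespace on both sides; the function names `grow` and `word_level_changes` are hypothetical, and extension to whole lines or paragraphs is not shown.

```python
from difflib import SequenceMatcher


def grow(text, start, end):
    """Extend the range [start, end) to the surrounding word boundaries."""
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


def word_level_changes(old, new):
    """Report character differences as whole-word replacements."""
    changes = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        i1, i2 = grow(old, i1, i2)
        j1, j2 = grow(new, j1, j2)
        span = (old[i1:i2], new[j1:j2])
        # Neighbouring letter edits inside one word collapse into a single change.
        if not changes or changes[-1] != span:
            changes.append(span)
    return changes
```

With the strings "call mama now" and "call papa now", the two separate letter replacements are merged into one change, "mama" to "papa".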

At the end, the program groups the detected differences. This is needed, for example, to separate edits in the body text from those in headers, footers and list numbering. In most cases headers and footers are of no interest to the user for comparison purposes, apart from insertions. For example, you may have a list of 100 items in the middle of which one item was added or changed, so the numbering of the following items may shift. To make the document easier to work with, differences in numbering go into a separate group.

Finally, the user can review all the changes in whichever way is most convenient. There are several options: save a new version of the document as a DOCX in which all changes are already highlighted in Track Changes mode, get a PDF with comments at the places that changed, or create a table listing the edits in Word.

Features supported by ABBYY FineReader 15 include:

viewing PDF documents;
editing text in a PDF document within a paragraph;
removing confidential data;
comparing documents in different formats and written in different languages;
automating digitization and conversion tasks;
recognizing and converting documents;
commenting and approval;
protection and digital signatures.

How neural networks recognize CJK characters and Arabic script

Recognizing Chinese, Japanese and Korean characters is harder than recognizing European letters because they consist of a large number of strokes, bars and slants, yet they are about the same size as European letters. In low-resolution scans such characters can look like mere blots. A native speaker will work out the character from context. The program works in stages: it first analyzes the image of the whole document, identifies paragraphs, splits the recognized lines into words and the words into individual characters. At this stage the algorithms rely not on context, as a human does, but on the appearance of the character, so a lot depends on image quality. ABBYY introduced neural networks for recognizing Japanese, Chinese and Korean. They address two main tasks in working with these characters: improving recognition quality and "modernizing" the languages.

Quality and speed in fast and normal modes

Introducing neural networks significantly improved the recognition quality of Japanese and Chinese in fast mode, but at the early stage of development the speed dropped. For customers who process a large flow of documents, even a small loss of speed can seriously slow down data processing. It turned out that speed suffers on documents with many structurally simple characters, such as the Japanese kana (modern Japanese uses three main writing systems: kanji, which are characters of Chinese origin, and two syllabaries created in Japan, hiragana and katakana; Hightech note).

Katakana

Kanji

The problem was solved with a cache. When the program recognizes a page, the same letter may occur on it several times. Having encountered a letter "A" in a given font, ABBYY FineReader analyzes and remembers its features. This optimization avoids spending time on recognizing identical characters. The cache had not been used for Japanese and Chinese before, because the same character rarely occurs twice on a page of natural-language text. But for structurally simple characters it proved useful. Enabling the cache sped up both the normal and the fast recognition modes.
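The caching idea can be sketched as a wrapper around a slow recognizer. Everything here is hypothetical: `make_cached_recognizer` and the bitmap format (rows of 0/1 pixels) are invented for illustration, and the key is an exact pixel match, whereas a real OCR engine would need a tolerant similarity measure because two scans of the same letter are rarely pixel-identical.

```python
def make_cached_recognizer(recognize):
    """Wrap a slow recognizer so that identical glyph bitmaps are classified only once."""
    cache = {}

    def cached(bitmap):
        # Lists are not hashable, so the bitmap is frozen into nested tuples.
        key = tuple(tuple(row) for row in bitmap)
        if key not in cache:
            cache[key] = recognize(bitmap)
        return cache[key]

    return cached
```

The benefit depends on how often glyphs repeat on a page, which matches the article's observation that the cache pays off for kana but rarely for kanji.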

Why it is important to keep up with how a language evolves

In previous versions of FineReader, the Japanese language included characters that are no longer used in modern documents. Staff at ABBYY's Japanese office noticed this: from time to time the program inserted one or two obsolete characters during recognition. To an ordinary native speaker they look the way letters from the pre-revolutionary Russian alphabet look to a Russian reader. Fixing this required creating a "new language" in the program, Japanese Modern. It is easy to make the program stop outputting particular obsolete characters. But the task was not just to throw out what was unnecessary; it was also to keep everything that is needed and find a set of characters that reflects the full richness of modern Japanese.

Recognition results before and after the introduction of neural networks.

The new character set was built in several stages. Suitable sets of document images were created for testing. If even a single page with obsolete forms got into a set, the whole set became unusable. That page had to be taken out and a new set assembled. Eventually the team reached a point where the recognition results contained almost no obsolete characters while all modern characters were output correctly.

For Chinese, FineReader has always supported both Traditional and Simplified. Yet the two did not differ in their character sets. They could still produce different recognition results, because the program used different probability distributions for them. In the new version, experiments made it possible to single out the characters needed for recognizing Simplified Chinese. FineReader includes the ability to create a user language. Using this tool and changing the character set, the specialists compared recognition results on different document samples, and as a result Simplified Chinese was left with only the necessary set of characters.

The Korean script, Hangul, is something in between Chinese and European writing. Visually it consists of square symbols that resemble Chinese characters, and a single page of text can contain more than a hundred unique ones. On the other hand, it is a phonetic script, that is, based on writing down sounds. It has an alphabet of 24 letters (digraphs and diphthongs can be counted in addition). But unlike Latin or Cyrillic, the letters are not written in a line; they are combined into blocks. Each block may consist of two, three or four letters. A consonant always comes first, followed by one or two vowels, and another consonant may come at the end. A separate neural network was trained for Korean; besides Korean syllables it also recognizes some Chinese characters. Instead of recognizing symbols as a whole, the technology identifies the individual letters in them.
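The block structure of Hangul is also visible in Unicode, where every precomposed syllable can be split arithmetically into a leading consonant, a vowel and an optional final consonant. The sketch below shows that standard decomposition; it says nothing about how the FineReader network itself is built. Note that Unicode counts compound vowels and consonant clusters as single units, so it yields two or three parts per block rather than up to four letters.

```python
S_BASE = 0xAC00   # first precomposed syllable, "가"
L_BASE = 0x1100   # first leading consonant
V_BASE = 0x1161   # first vowel
T_BASE = 0x11A7   # one position before the first final consonant
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose(syllable):
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = [chr(L_BASE + lead), chr(V_BASE + vowel)]
    if tail:  # tail index 0 means the block has no final consonant
        jamo.append(chr(T_BASE + tail))
    return jamo
```

With 19 leading consonants, 21 vowels and 28 final options (including none), a few dozen components cover all 11,172 possible syllable blocks, which is one reason recognizing letters instead of whole blocks is attractive.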

How to cut Arabic script into fragments

Arabic differs from other languages in that it is very hard to find the cut lines between characters in its cursive script. Even the histogram looks different when recognizing Arabic: it resembles an endless series of humps and dips.
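Assuming the histogram mentioned here is a vertical projection profile (the amount of ink in each pixel column), a naive segmenter can be sketched as follows. The function names and the bitmap format (rows of 0/1 pixels) are hypothetical, and this is a textbook technique rather than a description of FineReader's internals.

```python
def column_profile(bitmap):
    """Count ink pixels in every column of a binary image given as rows of 0/1."""
    return [sum(column) for column in zip(*bitmap)]


def cut_segments(profile, max_ink=0):
    """Return (start, end) column ranges separated by columns with at most max_ink pixels."""
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink > max_ink and start is None:
            start = x                      # a new run of inked columns begins
        elif ink <= max_ink and start is not None:
            segments.append((start, x))    # the run ended at a gap
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

For printed Latin text the profile drops to zero between most letters, so the gaps give clean cuts. In connected Arabic script the baseline stroke keeps the profile above zero across a whole word, which is why the profile alone produces many ambiguous candidates.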

Candidate segmentations of text into characters are always generated, even for European languages. As it works, the program chooses the most probable recognition path. For Arabic there are very many such candidates, and this led to errors. So, to improve accuracy, the program was taught to see the whole word rather than a single letter. An end-to-end (e2e) network was developed for this. It is useful not only for Arabic but also for European languages, for example with decorative fonts, where it is hard to build a recognition path on the image.

Recognition results before and after the introduction of neural networks.

With the e2e approach, the neural network receives a set of images, fragments consisting of individual words. It outputs a sequence of graphemes, which then go through additional processing: dictionary analysis is performed and spaces are corrected.

Training used a set of several hundred thousand fragments: individual words from scanned newspapers, magazines and official documents. They were selected over several iterations. First a base of successfully recognized words was collected and the neural network was trained on that dataset. Then it was trained again, corrected, and checked for errors. The portion that could not be recognized was sent separately for additional labeling and correction of the fragments. In this way the training dataset was cleaned more and more, improving overall recognition quality.

In addition, part of the training data was created artificially. This was necessary for recognizing fonts for which few samples had been collected. In such cases a text corpus was used, to which various distortions typical of document scanning were added, such as noise and character blur. This was done automatically by a dedicated program, a synthetic data generator nicknamed the "spoiler" («портилка»).
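A toy version of such a degradation step might look like the sketch below. It is not ABBYY's generator: the function names, the 3x3 box blur and the uniform random noise are illustrative choices, and images are assumed to be lists of rows with grayscale values between 0 and 1.

```python
import random


def box_blur(image):
    """Replace each pixel with the mean of its 3x3 neighbourhood (clipped at the borders)."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def add_noise(image, amount, rng):
    """Overwrite roughly `amount` of the pixels with random gray values."""
    return [[rng.random() if rng.random() < amount else px for px in row] for row in image]


def degrade(image, noise=0.05, seed=0):
    """Blur a clean rendering, then sprinkle noise, imitating a poor scan."""
    rng = random.Random(seed)  # seeded so that generated samples are reproducible
    return add_noise(box_blur(image), noise, rng)
```

Applying `degrade` with different seeds and noise levels to clean renderings of corpus words gives many varied training samples for a font that has few real scans.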

At first, this approach meant that information about the bounding rectangles of characters, which has to be shown to the user at the verification stage, was lost. Having abandoned character-by-character recognition, the developers had to introduce an alternative mechanism that supplemented the recognition results with bounding-rectangle information and cut words into individual characters.

The combination of new machine learning algorithms made it possible to build a multifunctional text editor for working with PDFs, scans and digital-born documents. Editing, file comparison and recognition of complex languages let the user work fully with files regardless of their format. In effect, this covers the whole range of office tasks involving electronic and even paper documents, making employees' work as simple as possible and reducing the likelihood of errors caused by the human factor.

ABBYY FineReader 15 in Russian, free

Category:	Text recognition
Supported OS:	Windows 10/8/7
Architecture:	32-bit, 64-bit
For devices:	Computer
Interface language:	Russian
Version:	FineReader 15 (Corporate)
Developer:	ABBYY

ABBYY FineReader 15 is a program for working with paper and PDF documents in a digital environment. It is built on optical character recognition with artificial intelligence, which makes it possible to extract the needed information from a document and increases the user's productivity. The program creates, digitizes, converts and edits paper and PDF documents, with search and information protection for collaborative work. It converts text from scanned images and photographs to DOC format. Users can create a new PDF from several documents and add a digital signature and watermarks to it.

Capabilities of FineReader 15:

The updated FineReader 15 uses machine learning and artificial intelligence, which improves the handling of PDFs even when the encoding or the text layer is corrupted. The program detects tables and headers and footers better. The updated version recognizes Japanese and Korean, handles tables exported to Excel in which text is written right to left, and adds tags automatically when saving. FineReader has gained multi-line editing within a paragraph: text is automatically reflowed across lines when words and fragments are added or removed. In version 15 files open 40% faster. PDF was not originally intended to be edited; FineReader can now edit paragraphs.

Features of FineReader 15:

The program is designed for the Windows 10/8/7 operating systems.

It also supports Microsoft Windows Server 2019, 2016 and 2012 R2 (including use with Web Access), Citrix Workspace App 1808, and Citrix Virtual Apps and Desktops. To use a localized interface, the operating system must have the corresponding language support. A new feature of ABBYY FineReader PDF 15 is the ability to add fillable text fields, drop-down lists, questionnaires and buttons that trigger actions. So that proposed edits can be accepted or rejected, the program saves comparison results in Track Changes mode.

ABBYY® FineReader 15

User’s Guide

Information in this document is subject to change without notice and does not bear any commitment on the part of ABBYY.

The software described in this document is supplied under a license agreement. The software may only be used or copied in strict accordance with the terms of the agreement. It is a breach of the "On legal protection of software and databases" law of the Russian Federation and of international law to copy the software onto any medium unless specifically allowed in the license agreement or nondisclosure agreements.

No part of this document may be reproduced or transmitted in any form or by any means, electronic or other, for any purpose, without the express written permission of ABBYY.

Contents

Introducing ABBYY FineReader
  About ABBYY FineReader
  What's New in ABBYY FineReader 15
The New Task window
  Viewing and editing PDFs
  Quick conversion
    Creating PDF documents
    Creating Microsoft Word documents
    Creating Microsoft Excel spreadsheets
    Other formats
  Advanced conversion
  Comparing documents
  Scanning and saving documents
    Scanning to the OCR Editor
    Scanning to PDF
    Scanning to Microsoft Word
    Scanning to Microsoft Excel
    Scanning to image files
    Scanning to other formats
PDF Editor
  Viewing PDF documents
    Viewing modes
    Navigating PDF documents
    Background recognition
    Keyword search
    Copying content from PDF documents
    PDF security features
  Reviewing PDF documents
    Comments
    Marking up text
    Drawing shapes
    Adding a Text block annotation to a PDF document
    Collaborating on PDF documents
    Adding stamps
  Working with PDF content
    Inserting and editing text
    Inserting and editing pictures
    Inserting and editing hyperlinks
    Recognizing text
    Working with pages
    Adding bookmarks
    Adding headers and footers
    Adding watermarks
    Adding file attachments
    Viewing metadata
    Enhancing page images
  Filling out forms
  Signing PDF documents
    Digital signature
    Text signature
    Picture signature
  Protecting PDF documents with passwords
    Passwords and permissions
    Deleting confidential information from PDF documents
  Creating PDF documents
    Creating PDF documents from selected pages
    Using a virtual printer to create PDF documents
  Saving and exporting PDF documents
    Saving PDF documents
    Saving in PDF/A
    Saving in other formats
    Reducing the size of your PDF documents
    Sending PDF documents to the OCR Editor
    E-mailing PDF documents
    Printing PDF documents
OCR Editor
  Launching the OCR Editor
  OCR Editor interface
  Obtaining documents
    Opening images and PDFs
    Scanning paper documents
  Recognizing documents
    OCR projects
    Group work with OCR projects
  Improving OCR results
    If your document image has defects and OCR accuracy is low
    If areas are detected incorrectly
      Editing area properties
    If the complex structure of a paper document is not reproduced
    If you are processing a large number of documents with identical layouts
    If tables and pictures are not detected
    If a barcode is not detected
    If an incorrect font is used or some characters are replaced with "?" or "□"
    If your printed document contains non-standard fonts
    If your document contains many specialized terms
    If the program fails to recognize certain characters
    If vertical or inverted text was not recognized
  Checking and editing texts
    Checking recognized text
    Using styles
    Editing hyperlinks
    Editing tables
    Removing confidential information
  Copying content from documents
  Saving OCR results
    Saving in PDF
    Saving editable documents
    Saving tables
    Saving e-books
    Saving in HTML
    Saving images
    Sending OCR results to the PDF Editor
    E-mailing OCR results
    Sending OCR results to Kindle
Integration with other applications
Integration with Windows Explorer …………………………………………………………………………………………………………	174
Integration with Microsoft SharePoint ……………………………………………………………………………………………………	177
Automating and scheduling OCR …………………………………………………………………………………………..	178
Automating document processing with ABBYY FineReader ……………………………………………………………….	179
ABBYY Hot Folder ……………………………………………………………………………………………………………………………………….	182
ABBYY Compare Documents …………………………………………………………………………………………………	187
Launching ABBYY Compare Documents ………………………………………………………………………………………………..	188
Comparing documents ……………………………………………………………………………………………………………………………..	189
The main window ……………………………………………………………………………………………………………………………………….	191
Improving comparison results ………………………………………………………………………………………………………………….	192
Viewing comparison results ……………………………………………………………………………………………………………………..	194
Saving comparison results ………………………………………………………………………………………………………………………..	196

ABBYY® FineReader 15 User’s Guide
Contents
ABBYY Screenshot Reader …………………………………………………………………………………………………….	197
Reference ……………………………………………………………………………………………………………………………..	201
How to set ABBYY FineReader 15 as your default PDF viewer …………………………………………………………..	202
Types of PDF documents …………………………………………………………………………………………………………………………..	204
Scanning tips ……………………………………………………………………………………………………………………………………………….	206
Taking photos of documents ……………………………………………………………………………………………………………………	209
Options dialog box …………………………………………………………………………………………………………………………………….	212
Format settings …………………………………………………………………………………………………………………………………………..	215
PDF settings ……………………………………………………………………………………………………………………………………….	215
DOC(X)/RTF/ODT settings ……………………………………………………………………………………………………………….	220
XLS(X) settings …………………………………………………………………………………………………………………………………..	223
PPTX settings ……………………………………………………………………………………………………………………………………..	224
CSV settings ……………………………………………………………………………………………………………………………………….	224
TXT settings ………………………………………………………………………………………………………………………………………..	225
HTML settings ……………………………………………………………………………………………………………………………………	226
EPUB/FB2 settings …………………………………………………………………………………………………………………………….	227
DjVu settings ……………………………………………………………………………………………………………………………………..	228
Supported OCR and document comparison languages ……………………………………………………………………..	230
Supported document formats ………………………………………………………………………………………………………………….	237
Document features to consider prior to OCR ……………………………………………………………………………………….	240
Image processing options …………………………………………………………………………………………………………………………	243
OCR options ………………………………………………………………………………………………………………………………………………..	246
Working with complex-script languages ………………………………………………………………………………………………..	249
Supported interface languages ………………………………………………………………………………………………………………..	253
Current date and time on stamps and in headers and footers ………………………………………………………….	254
Fonts required for the correct display of texts in supported languages …………………………………………..	257
Regular expressions ……………………………………………………………………………………………………………………………………	259
Installing, activating, and registering ABBYY FineReader ………………………………………………………	261
System requirements …………………………………………………………………………………………………………………………………	262
Installing and starting ABBYY FineReader ………………………………………………………………………………………………	263
Activating ABBYY FineReader …………………………………………………………………………………………………………………..	264
Registering ABBYY FineReader …………………………………………………………………………………………………………………	266
Data privacy …………………………………………………………………………………………………………………………………………………	266
Appendix ……………………………………………………………………………………………………………………………..	266
Glossary ………………………………………………………………………………………………………………………………………………………..	267
Keyboard shortcuts …………………………………………………………………………………………………………………………………….	273
Technical support …………………………………………………………………………………………………………………	284

ABBYY® FineReader 15 User’s Guide

Contents

Third-party software …………………………………………………………………………………………………………….

284

Introducing ABBYY FineReader

This chapter provides an overview of ABBYY FineReader and its features.

Chapter contents

About ABBYY FineReader

What’s New in ABBYY FineReader 15

About ABBYY FineReader

ABBYY FineReader 15 is a universal PDF tool for managing documents in the digital workplace. Powered by ABBYY’s AI-based OCR and document-conversion technologies, FineReader unlocks the information contained within a document to increase business productivity. FineReader makes it easy and efficient to digitize, retrieve, edit, convert, protect, share, and collaborate on all kinds of PDF and paper documents in the modern working world.

What you can do with ABBYY FineReader 15:

Work with any type of PDF, including scanned documents
o Edit text (even whole paragraphs, also within table cells), hyperlinks, and pictures throughout a document.
o Search by keywords in the text, comments, bookmarks, and metadata within a document.
o Rearrange, add, delete, and enhance (rotate, crop, deskew) pages in PDFs.
o Copy text, tables, and pictures from PDFs, scans, or photos in a few clicks.
o Export PDFs into Microsoft Word, Excel, or another editable format.
o Add comments and annotations to documents.
o Add watermarks, headers and footers, Bates numbering, and stamps to PDFs.
o Apply and verify digital signatures.
o Protect PDFs with passwords and encryption.
o Remove sensitive information from documents through redaction.
o Create and combine PDF documents, including the industry standards PDF/A for long-term archiving and PDF/UA for accessibility.
o Fill out PDF forms.
o View and print PDFs.
o Identify differences in the text between two versions of the same document, whether the versions are PDFs, scans, images, Microsoft Word documents, or any combination of supported digital files.
o Save and share these differences as a Microsoft Word document in Track Changes mode.

Scan and convert documents
o Scan and convert PDF and paper documents into editable and searchable formats (including Microsoft Word, Microsoft Excel, searchable PDF, PDF/A, PDF/UA, and many more) to further edit, reuse, or store them.

o Convert paper documents, document images, and PDFs quickly and accurately, while retaining their original layout, formatting, and structure, with the advanced OCR Editor.
o Improve quality and correct distortions in the digital images of documents (scans/photographs) before converting them, either automatically or manually.
o Quickly check how recognized text matches up with the original document and make any necessary changes with a built-in text editor and verification tools before saving.
o Improve the accuracy of converting documents with advanced tools: adjust or specify document areas, train the program to recognize unusual or decorative fonts, and create user dictionaries and languages for specific terminology, abbreviations, codes, etc.

Compare texts to identify differences between two versions of the same document*
o ABBYY FineReader 15 can compare two versions of the same document even if they are in two different formats. For example, you can compare a scanned document and the same document in Microsoft Word (in either DOC or DOCX format).
o Save and share the differences as a simple list of changes, as an entire Microsoft Word document in Track Changes mode, or as an entire PDF with highlighted text mark-ups and comments.

Automate your personal document conversion routines with ABBYY Hot Folder*
o ABBYY Hot Folder is a conversion scheduling tool included with ABBYY FineReader 15 that watches for documents in user-defined folders and converts them on a schedule with pre-set parameters.

Take a snapshot of any part of the screen with ABBYY Screenshot Reader
o If a screenshot contains text or tables, they can also be extracted and saved in an editable format.**

*This feature is not available with all versions of ABBYY FineReader 15. Please visit our home page for more information.

**In order to use ABBYY Screenshot Reader, you must first register your copy of ABBYY FineReader 15.
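
The folder-watching idea behind ABBYY Hot Folder, mentioned in the list above, can be illustrated with a short Python sketch. This is not ABBYY's implementation and does not call any FineReader API: the function name `process_hot_folder`, the `convert` callback, and the list of watched extensions are all assumptions made for illustration. One pass over the folder hands each new matching file to the callback; a scheduler would call the function repeatedly.

```python
from pathlib import Path
from typing import Callable, List, Set

# Hypothetical set of file types to pick up; the real tool has its own settings.
WATCHED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def process_hot_folder(
    folder: Path,
    convert: Callable[[Path], None],
    seen: Set[Path],
) -> List[Path]:
    """Run one pass over `folder` and convert files not handled before.

    `convert` is a placeholder for whatever conversion step is wanted;
    `seen` remembers files from earlier passes so they are not repeated.
    """
    processed = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() not in WATCHED_EXTENSIONS:
            continue
        if path in seen:
            continue
        convert(path)
        seen.add(path)
        processed.append(path)
    return processed
```

Calling the function on a timer (for example once a minute) approximates "converting on a schedule"; keeping `seen` outside the function is what prevents the same document from being converted twice.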

What’s New in ABBYY FineReader 15

Editing PDFs

Editing text within paragraphs

Now, editing text in PDFs of any kind – including scanned documents – can be done within a

whole block of text. When you add or delete text, it automatically flows from line to line, which

provides you with a convenience and freedom similar to editing in a word processor like

Microsoft Word.

Reformatting text

You can also change the text formatting (font type and size, typeface, color, line spacing, text

alignment, and direction) either for the whole paragraph or for only a selection of text.

Editing page layout

You can even change the layout of any page in a PDF. Add or delete paragraphs, change their positioning or order, and make them wider, narrower, higher, or lower to align them with the rest of the page. Throughout the process, the text will automatically flow into the layout to fit with the changes you make.

Editing table cells

Each cell in a table can now be edited individually, as a separate paragraph, and it will not

affect content in the other cells in the same row.

Viewing PDFs

Faster viewer

FineReader’s PDF viewer has become 1.5x faster. Opening any kind of PDF is now as quick as

you would expect.

Converting PDFs

Detecting text-layer quality

Detect the quality of a text layer when working with digital PDFs. If the text layer in a page is

problematic (corrupted, encoding problems, etc.), FineReader applies OCR to convert the

whole page rather than extracting the text layer. This allows for the most accurate results when

converting digitally-created PDFs into editable formats.
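
The decision described above (use the existing text layer when it is sound, fall back to OCR when it is not) can be sketched in Python. The guide does not say how FineReader judges text-layer quality, so the heuristic below is purely an assumption for illustration: it treats a layer as problematic when it is empty or when too many characters are replacement, control, private-use, or unassigned code points. The names `text_layer_looks_broken`, `choose_text_source`, and the 10% threshold are hypothetical.

```python
import unicodedata


def text_layer_looks_broken(text: str, max_bad_ratio: float = 0.1) -> bool:
    """Guess whether an extracted text layer is empty or corrupted.

    Illustrative heuristic only, not FineReader's actual check.
    """
    stripped = text.strip()
    if not stripped:
        return True
    bad = 0
    for ch in stripped:
        if ch in "\n\r\t":
            continue  # ordinary whitespace controls are fine
        category = unicodedata.category(ch)
        # U+FFFD marks undecodable input; Cc/Co/Cn are control,
        # private-use, and unassigned code points.
        if ch == "\ufffd" or category in ("Cc", "Co", "Cn"):
            bad += 1
    return bad / len(stripped) > max_bad_ratio


def choose_text_source(text_layer: str) -> str:
    """Return "ocr" for a suspicious text layer, otherwise "text_layer"."""
    return "ocr" if text_layer_looks_broken(text_layer) else "text_layer"
```

The decision is made per page, matching the description above: a single damaged page is re-recognized while pages with a sound text layer are extracted directly.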

Detecting text in fields and annotations

When converting an interactive PDF form or a PDF with annotations into an editable format,

FineReader ensures that the text from fields and certain kinds of annotations (such as a Text

Box or Typewriter) is accurately and reliably extracted.

Improved layout retention

Reconstructing paragraphs when converting digital PDFs into editable formats has been

improved as well.

Comparing documents

Export in Track Changes mode

Now, you can export the comparison results as a Microsoft Word document highlighting the

differences in Track Changes mode, the mode commonly used in organizations, particularly in

the legal field.

Even more accurate comparisons

Thanks to the improvements in converting digital PDFs, you can compare such documents with

any other type of supported format even more precisely than before.

New comparison language

Comparing documents in Armenian is now possible, with 38 comparison languages in total.

Improved OCR

ABBYY’s latest OCR technology provides improvements to a variety of features in ABBYY FineReader 15:

more reliable detection of headers and footers; more accurate document conversion in Japanese and

Korean; improved retention of table structure when saving to Excel in languages written from right to

left; and better automatic tagging when saving to tagged PDFs (including PDF/UA).

Improvements for organizations

Remote User licenses

Based on access for named users, Remote User licenses allow organizations to use FineReader

with desktop and application virtualization solutions, such as Microsoft Remote Desktop

Services (RDS), Citrix XenApp, and Citrix Virtual Apps and Desktops. Please refer to

FineReader’s Administrator Guide for detailed information.

Improved product customization with GPO

The list of possibilities to customize FineReader for specific users/workstations using GPO

(Group Policy Objects) has increased to include the following options:

o Define the maximum number of workstation CPUs used by FineReader.

o Set a user inactivity timeout to force the release of licenses for workstations that use

concurrent licenses.

o Take advantage of ADMX/ADML templates.

We would like to extend our sincere appreciation to all the users who have contributed feedback and helped us broaden FineReader’s capabilities to make it more useful in daily work.

The New Task window

When you launch ABBYY FineReader, a New Task window opens, where you can easily open, scan, create, or compare documents. If you don’t see the New Task window (e.g. if you closed it or if you initiated an ABBYY FineReader task by right-clicking a file in Windows Explorer), you can always open it by clicking the New Task button on the main toolbar.

To start processing a document, select a task:

1. In the left-hand pane:

Click Open if you already have documents that you need to process.

Click Scan if you need to scan paper documents first.

Click Compare if you want to compare two versions of the same document.

Click Recent to resume work on a previously saved PDF document or OCR project.

2. In the right-hand pane, select the appropriate task.

For your convenience, when you hover the mouse cursor over a task, a pop-up window appears

listing the most common scenarios covered by that task.

The settings for all ABBYY FineReader tasks are specified in the Options dialog box. To open this dialog box, click Options at the bottom of the left-hand pane.

Chapter contents

Viewing and editing PDF documents
Quick conversion
Advanced conversion
Comparing documents
Scanning and saving documents

Viewing and editing PDFs

With ABBYY FineReader, you can easily view, edit, comment on, and search inside any type of PDF document, even one that was obtained by simply scanning a paper document and so does not contain any searchable or editable text.

Viewing PDF documents and adding your comments

On the New Task screen, click the Open tab and then click Open PDF Document. The selected document will be opened in the PDF Editor for viewing and commenting.

Use the Pages, Bookmarks, Search, and Comments buttons to navigate around the document.

ABBYY FineReader offers the following commenting tools:

Add Note
Highlight, Underline, Strikethrough, and Insert Text
Draw Shape, Line, or Arrow

If you don’t see the commenting tools, click the button.

Editing PDF documents

ABBYY FineReader offers the following editing tools:

See also: Editing text, Inserting and editing pictures.

Protecting PDF documents

With ABBYY FineReader, you can:

See also: Digital signatures, Removing confidential information from PDF documents, Passwords and permissions.

Filling out forms

ABBYY FineReader allows you to fill out, save, and print interactive forms.

When you open a PDF that contains an interactive form, the form fields are highlighted, inviting you to select a value from the drop-down list or type in some information.

If you encounter a form that cannot be filled out by simply typing text in the empty fields, use the Export tool to type the necessary information over the form. See also: Filling out forms.

For more information on working with PDF documents, see Working with PDF documents.

Quick conversion

You can use the built-in tasks on the Open tab of the New Task screen to convert PDF documents or

images or create a new PDF from files in various formats.

Converting one or more files

1. Click the Open tab and then click a desired task:

Convert to PDF creates PDF documents from *.docx, *.html, *.jpeg, and other files. You

can also use this task to combine multiple files into one PDF document.

Convert to Microsoft Word creates Word documents from PDF and image files. You can

also use this task to combine multiple files into one Microsoft Word document.

Convert to Microsoft Excel® creates Excel spreadsheets from PDF and image files. You can also use this task to combine multiple files into one Excel document.

Convert to Other Formats converts PDF and image files into popular formats, including

*.odt, *.pptx, *.epub, *.html, and many more.

2. In the dialog box that opens, select one or more files to convert.
3. Specify conversion settings. These settings determine the appearance and properties of the output document.
4. Add or remove files if necessary.
5. Click the Convert to <format> button.
6. Specify a destination folder for the output file. When the task is completed, the resulting file will be placed into the folder that you specified.

Combining files

1. Click the Open tab and then click a desired task.
2. In the dialog box that opens, select the files that you want to convert.
3. Specify conversion settings.
4. Add or remove files if necessary.
5. Arrange the files in the desired order and select the Combine all files into one document option.
6. Click the Convert to <format> button.
7. Specify a name and a destination folder for the output file. When the task is completed, the resulting file will be placed into the folder that you specified.

Use advanced conversion for large documents with complicated layouts.

Creating PDF documents

In the New Task window, you can:

Create PDF documents from files in various formats.

Convert multiple files to PDF.

Combine multiple files into one PDF.

Create searchable PDF documents.

Create PDF/A-compliant documents.

Converting one or more files

1. Click the Open tab and then click Convert to PDF.
2. In the dialog box that opens, select one or more files to convert.
3. Specify conversion settings. These settings determine the appearance and properties of the output document.
   3.1. Image quality: The quality of the pictures and the size of the resulting file can be adjusted using the options in the Image quality drop-down menu:
      Best quality: Select this option to retain the quality of the pictures and the page image. The original resolution will be preserved.
      Balanced: Select this option to reduce the size of the output PDF file without too much loss in picture quality.
      Compact size: Select this option to obtain a small-sized PDF file at the expense of picture quality.
      Custom…: Select this option to customize picture saving. In the Custom Settings dialog box, specify desired values and click OK.
   3.2. Full-text search: Use this drop-down menu to enable or disable full-text searches in the output document:
      As in original document: The text on the images will not be recognized. Users will be able to search inside the output document only if the original document has a text layer.
      Search inside text and images: The text on the images will be recognized. Users will be able to search inside the output document.
      Disable full-text search: The document will be converted to an image-only PDF. Users will not be able to search inside the output document.
   3.3. Create PDF/A documents: Select this option to create a PDF/A-compliant document. A PDF/A-2b document will be created by default. Click More options… to select another version of PDF/A.
   3.4. Use MRC compression: Select this option to apply Mixed Raster Content (MRC) compression to reduce file size without noticeable loss in image quality.
   3.5. OCR languages: Select the language(s) of your document. See also: OCR languages.
   3.6. Image preprocessing settings…: Here you can specify additional operations to be performed on your scans and image files to improve their appearance and the quality of conversion. See also: Image processing options.
   3.7. More options…: Opens the PDF tab of the Format Settings dialog box.
4. Add or remove files if necessary.
5. Click the Convert to PDF button.
6. Specify a destination folder for the output file. When the task is completed, the resulting PDF document will be placed into the folder that you specified.
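
The effect of the three Full-text search options described in the conversion settings above can be summarized as a small decision function. The Python sketch below only restates what the option descriptions say; the function name `output_is_searchable` is hypothetical and nothing here calls FineReader itself.

```python
def output_is_searchable(option: str, original_has_text_layer: bool) -> bool:
    """Tell whether the output PDF can be searched, per the option text.

    `option` is the label shown in the Full-text search drop-down menu.
    """
    if option == "As in original document":
        # No recognition is performed, so only an existing text layer helps.
        return original_has_text_layer
    if option == "Search inside text and images":
        # Text on images is recognized, so the output is searchable.
        return True
    if option == "Disable full-text search":
        # The output is an image-only PDF.
        return False
    raise ValueError(f"Unknown Full-text search option: {option!r}")
```

For a scanned document with no text layer, only Search inside text and images yields a searchable result, which is why that option is the one to choose for scans.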

Combining files

1. Click the Open tab and then click Convert to PDF.
2. In the dialog box that opens, select the files that you want to convert.
3. Specify conversion settings.
4. Add or remove files if necessary.
5. Arrange the files in the desired order and select the Combine all files into one document option.
6. Click the Convert to PDF button.
7. Specify a name and a destination folder for the output file. When the task is completed, the resulting PDF document will be placed into the folder that you specified.

Creating Microsoft Word documents

In the New Task window, you can create Microsoft Word documents from PDF documents and images and from files in any of the supported formats. You can also convert and combine multiple files into one Microsoft Word document.

Converting one or more files

1. Click the Open tab and then click Convert to Microsoft Word.
2. In the dialog box that opens, select one or more files to convert.
3. Specify conversion settings. These settings determine the appearance and properties of the output document.
   3.1. Keep formatting: Select the appropriate setting depending on how you plan to use the output document:
      Exact copy: The output document will look almost exactly like the original, but will offer limited editing options.
      Editable copy: The appearance of the output document may differ slightly from the original, but the document can be easily edited.
      Formatted text: The font types, font sizes, and paragraph formatting will be retained. The output text will be placed in one column.
      Plain text: Only the paragraph formatting will be retained. The output text will be placed in one column and a single font will be used throughout.
   3.2. OCR languages: Select the language(s) of your document. See also: OCR languages.
   3.3. Keep pictures: Select this option if you want to preserve the pictures in the output document.
   3.4. Keep headers, footers, and page numbers: Select this option to preserve the headers, footers, and page numbers.
   3.5. More options…: Opens the DOC(X)/RTF/ODT tab of the Format Settings dialog box.
4. Add or remove files if necessary.
5. Click the Convert to Word button.
6. Specify a destination folder for the output file. When the task is completed, the resulting Microsoft Word document will be placed into the folder that you specified.

Combining files

1. Click the Open tab and then click Convert to Microsoft Word.
2. In the dialog box that opens, select the files that you want to convert.
3. Specify conversion settings.
4. Add or remove files if necessary.
5. Arrange the files in the desired order and select the Combine all files into one document option.
6. Click the Convert to Word button.
7. Specify a name and a destination folder for the output file. When the task is completed, the resulting Microsoft Word document will be placed into the folder that you specified.

Creating Microsoft Excel spreadsheets

In the New Task window, you can create Microsoft Excel documents from PDF documents and images and from files in any of the supported formats. You can also convert and combine multiple files into one Excel document.

Converting one or more files

1. Click the Open tab and then click Convert to Microsoft Excel.
2. In the dialog box that opens, select one or more files to convert.
3. Specify conversion settings. These settings determine the appearance and properties of the output document.
   3.1. Keep formatting: Select the appropriate setting depending on how you plan to use the output document:
      Formatted text: The font types, font sizes, and paragraph formatting will be retained.
      Plain text: Only the paragraphs will be retained. A single font will be used throughout.
   3.2. OCR languages: Select the language(s) of your document. See also: OCR languages.
   3.3. Keep pictures (XLSX only): Select this option if you want to preserve the pictures in the output document.
   3.4. Create a separate sheet for each page (XLSX only): Select this option if you want to create a separate Microsoft Excel spreadsheet from each page of the original document(s).
   3.5. More options…: Opens the XLS(X) tab of the Format Settings dialog box.
4. Add or remove files if necessary.
5. Click the Convert to Excel button.
6. Specify a destination folder for the output file. When the task is completed, the resulting Microsoft Excel file will be placed into the folder that you specified.

Combining files

1. Click the Open tab and then click Convert to Microsoft Excel.
2. In the dialog box that opens, select the files that you want to convert.
3. Specify conversion settings.
4. Add or remove files if necessary.
5. Arrange the files in the desired order and select the Combine all files into one document option.
6. Click the Convert to Excel button.
7. Specify a name and a destination folder for the output file. When the task is completed, the resulting Microsoft Excel document will be placed into the folder that you specified.

Other formats

In the New Task window, you can convert PDF documents and images into popular formats (*.pptx,

*.odt, *.html, *.epub, *.fb2, *.rtf, *.txt, *.csv, *.djvu) and combine multiple files into one document.

Converting one or more files

1. Click the Open tab and then click Convert to Other Formats.
2. In the dialog box that opens, select one or more files to convert.
3. Specify conversion settings. These settings determine the appearance and properties of the output document.
   3.1. Select output format: Select a format into which to convert your file.
   3.2. OCR languages: Select the language(s) of your document. See also: OCR languages.
   3.3. More options…: Opens the corresponding tab of the Format Settings dialog box.
4. Add or remove files if necessary.
5. Click the Convert to <format> button.
6. Specify a destination folder for the output file. When the task is completed, the resulting file will be placed into the folder that you specified.

Combining files

1. Click the Open tab and then click Convert to Other Formats.
2. In the dialog box that opens, select the files that you want to convert.
3. Specify conversion settings.
4. Add or remove files if necessary.
5. Arrange the files in the desired order and select the Combine all files into one document option.
6. Click the Convert to <format> button.
7. Specify a name and a destination folder for the output file. When the task is completed, the resulting document will be placed into the folder that you specified.

Advanced conversion

ABBYY FineReader includes an OCR Editor, which provides advanced OCR and conversion features. The OCR Editor allows you to check recognition areas and verify recognized text, preprocess images in order to improve OCR accuracy, and much more.

The OCR Editor also offers powerful features for fine-tuning OCR and conversion to get the best possible results. For example, you can edit recognition areas, check recognized text, and train ABBYY FineReader to recognize non-standard characters and fonts.

1. There are several ways to open the OCR Editor:
   Open the New Task window by clicking File > New Task, click the Open tab, and then click the Open in OCR Editor task.
   Open the New Task window and click Tools > OCR Editor.
   Open the New Task window and click File > Open in OCR Editor….
2. In the Open Image dialog box, select the files you want to open. If you are using the default settings, ABBYY FineReader will automatically analyze and recognize the files you opened. You can change these settings on the Image Processing tab of the Options dialog box (click Tools > Options… to open this dialog box).
3. After you open a document, its image will be displayed in the Image pane, and text, picture, table, and barcode areas will be marked on the image. Check that the areas have been detected correctly and edit them if necessary.
   ABBYY FineReader analyzes documents to detect areas that contain text, pictures, tables, and barcodes. Sometimes, areas in complex documents may be detected incorrectly. In most cases it is easier to correct automatically detected areas than to draw all areas manually.
   You can find tools for drawing and editing areas on the toolbar above the Image pane and on the toolbars that appear above text, picture, background picture, and table areas when you select them. You can use these tools to:
      Add and delete areas
      Change the type of an area
      Adjust area borders and move entire areas
      Add rectangular parts to areas or delete them
      Change the order of areas
4. If you made any changes to areas, click the Recognize button on the main toolbar to recognize the document again.
5. Check the recognized text in the Text pane and correct it if necessary.
6. Save the recognized document. You can select the format in which to save your document from the drop-down list of the Save/Send button on the main toolbar (click the arrow next to the button to open the drop-down list).

For more information about the OCR Editor and its features, see Working with the OCR Editor.

Comparing documents

(This functionality is not available in some versions of ABBYY FineReader 15. See also:

http://www.ABBYY.com/FineReader.)

ABBYY FineReader includes ABBYY Compare Documents, an application that lets you compare two

versions of a document, even if these versions are in different formats. ABBYY FineReader’s document

comparison tool lets you detect significant inconsistencies in a text and, for example, prevent the

approval or publication of the wrong version of a document.

There are several ways to start ABBYY Compare Documents:

Open the New Task window, click the Compare tab, and then click Open ABBYY Compare Documents.

Click the Start button in Windows and click ABBYY FineReader 15 > ABBYY Compare Documents (in Windows 10, click the Start button, click the All Programs item on the start menu, and then click ABBYY FineReader 15 > ABBYY Compare Documents).

Click Compare Documents on the Tools menu.

Right-click a file in Windows Explorer and click Compare documents… on the shortcut menu.

Follow the instructions below to compare two documents.

1.Open ABBYY Compare Documents, open one of the versions that you want to compare in the

left-hand pane and the other one in the right-hand pane.

2.In the COMPARE pane, select the languages of the document from the drop-down list.

3. Click the Compare button to compare the documents.

4.Review the differences detected by ABBYY Compare Documents.

The differences between the two versions will be highlighted in each version and listed in the

right-hand pane, providing you with a clear picture of the changes made to the document.

This makes it easy to see which text was added, removed or edited in each version. Both

pages are scrolled simultaneously and the identical fragments are always displayed side by

side. Differences can be removed from the list or copied to the Clipboard.

Differences that were removed from the list will not be saved to the difference report.

You can save the comparison results:

As a Microsoft Word file in which the differences are shown using the Track Changes feature.

As a PDF document with comments.

As a Microsoft Word table containing the differences.

For more information on comparing two versions of the same document, see ABBYY Compare Documents.
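The kind of difference list described above can be approximated with a word-level diff. The sketch below is only an illustration built on Python's standard difflib module; it is not how ABBYY Compare Documents works internally, and the function name list_differences and the sample sentences are assumptions made for the example.

```python
import difflib


def list_differences(old_text: str, new_text: str) -> list[tuple[str, str, str]]:
    """Return (kind, old_fragment, new_fragment) for every word-level change."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical fragments are not part of the difference list
        differences.append(
            (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
        )
    return differences


# Hypothetical sample: two versions of one contract clause.
old = "The supplier shall deliver the goods within 30 days."
new = "The supplier shall deliver all goods within 45 calendar days."
for kind, before, after in list_differences(old, new):
    print(f"{kind:8} {before!r} -> {after!r}")
```

Each tuple corresponds to one entry in a difference list: the kind is replace, insert, or delete, and the two fragments show what the text looked like in each version.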

Scanning and saving documents

You can use the tasks on the Scan tab of the New Task window to create digital documents in various

formats. You will need a scanner or a digital camera to obtain document images.

1. Click the Scan tab and then click a task:

Scan to OCR Editor opens scans in the OCR Editor.

Scan to PDF creates PDF documents from images obtained from a scanner or digital

camera.

Scan to Microsoft Word creates Microsoft Word documents from images obtained from

a scanner or digital camera.

Scan to Microsoft Excel creates Microsoft Excel documents from images obtained from a

scanner or digital camera.

Scan to Image Files creates image-only documents from images obtained from a scanner

or digital camera.

Scan to Other Formats creates documents in popular formats, such as *.odt, *.pptx,

*.epub, and *.html, from images obtained from a scanner or digital camera.

2. Select a device and specify scanning settings.

3.Click the Preview button or click anywhere inside the image.

4.Review the image. If you are not satisfied with the quality of the image, change the scanning

settings and click the Preview button again.

5.Specify the settings specific to the selected format.

These settings determine the appearance and properties of the output document.

6.Click the Scan to <format> button.

7.When scanning starts, a dialog box with a progress bar and tips will be displayed.

8.After a page has been scanned, a dialog box prompting you to decide what to do next will be

displayed.

Click Scan Again to scan more pages using the current settings or click Finish Scanning to

close the dialog box.

9.Depending on the task you selected in step 1, the scanned images will be:

Processed and added to an OCR project in the OCR Editor.

Processed and converted to PDF. Specify the folder where you want to save the resulting

document. The document will remain open in the OCR Editor.

Processed and converted to the selected format. Specify the folder where you want to

save the resulting document. The document will remain open in the OCR Editor.

Scanning to the OCR Editor

You can open images from a scanner or camera in the OCR Editor, where you will be able to:

Draw and edit recognition areas manually

Check recognized text

Train ABBYY FineReader to recognize non-standard characters and fonts

Use other advanced tools to ensure the best possible OCR result.

1.Open the New Task window, click the Scan tab, and then click the Scan to OCR Editor task.

2. Select a device and specify scanning settings.

3.Click the Preview button or click anywhere inside the image.

4. Review the image. If you are not satisfied with the quality of the image, change the scanning settings and click the Preview button again.

5.Specify preprocessing and automation settings.

5.1.Automatically process page images as they are added

This option enables or disables automatic processing of newly added pages. If automatic

processing is enabled, you can select general document processing options and image

preprocessing settings to be used when scanning and opening images:

Recognize page images

Enable this option if you want FineReader to automatically preprocess newly added

images using the settings specified in the Preprocessing Settings dialog box (click the

Image preprocessing settings (apply to conversion and OCR) link below to open

this dialog box). Analysis and OCR will also be performed automatically.

Analyze page images

Performs image preprocessing and document analysis automatically, but OCR has to be

started manually.

Preprocess page images

Preprocesses images automatically. Analysis and OCR have to be started manually.
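The three options above are cumulative: each one runs every stage up to a certain point automatically and leaves the rest to be started manually. A minimal sketch of that relationship follows. The enum, the stage names, and the function split_stages are hypothetical names invented for this illustration and do not correspond to any FineReader API.

```python
from enum import Enum

# Processing stages in the order they run.
STAGES = ("preprocess", "analyze", "recognize")


class AutoProcessing(Enum):
    """Hypothetical names for the three options; the value is the last automatic stage."""

    PREPROCESS_PAGE_IMAGES = "preprocess"
    ANALYZE_PAGE_IMAGES = "analyze"
    RECOGNIZE_PAGE_IMAGES = "recognize"


def split_stages(option: AutoProcessing) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Return (stages run automatically, stages the user must start manually)."""
    last = STAGES.index(option.value) + 1
    return STAGES[:last], STAGES[last:]


for option in AutoProcessing:
    automatic, manual = split_stages(option)
    print(f"{option.name}: automatic={automatic}, manual={manual}")
```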

5.2.OCR languages

Use this option to specify the languages of the document. See also: OCR languages.

5.3.Image preprocessing settings…

Opens the Preprocessing Settings dialog box where you can specify image preprocessing

settings such as detection of page orientation and automatic preprocessing settings.

These settings can significantly improve source images, resulting in greater OCR accuracy.

See also: Image processing options.

5.4.More options…

Opens the Image Processing tab of the Options dialog box. You can also open this dialog box by clicking Options… on the Tools menu.

6.Click Scan.

7.A progress dialog box will be displayed, showing a progress bar and tips.

8.After the page has been scanned, a dialog box prompting you to decide what to do next will

appear.

Click Scan Again to scan subsequent pages using the current settings or Finish Scanning to

close the dialog box.

9.After the scanning process is completed, the scanned images will be added to an OCR project

in the OCR Editor and processed using the preprocessing and automation settings you

specified earlier.

For more information about the OCR Editor and its features, see Working with the OCR Editor.

Scanning to PDF

The Scan to PDF task in the New Task window lets you create PDF documents from images obtained

from a scanner or a digital camera.

1.Open the New Task window, click the Scan tab, and then click Scan to PDF.

2. Select a device and specify scanning settings.

3.Click the Preview button or click anywhere inside the image.

4.Review the image. If you are not satisfied with the quality of the image, change the scanning

settings and click the Preview button again.

5.Specify conversion settings. These settings determine the appearance and properties of the

output document.

5.1. Image quality

This option determines the quality of images and pictures, which affects the size of the resulting output file. The following quality settings are available:

Best quality

Select this option to retain the quality of the pictures and the page image. The original

resolution will be preserved.

Balanced

Select this option to reduce the size of the output PDF file without too much loss in

picture quality.

Compact size

Select this option to obtain a small-sized PDF file at the expense of picture quality.

Custom…

Select this option to customize picture saving. In the Custom Settings dialog box,

specify the desired values and click OK.

5.2.Create PDF/A documents

Select this option to create a PDF/A-compliant document.

5.3.Use MRC compression

Select this option to apply Mixed Raster Content (MRC) compression to reduce file size without noticeable loss in image quality.

5.4.Recognize text on images

Select this option if you want OCR to start automatically.

5.5.OCR languages

Use this option to specify the languages of the document. See also: OCR languages.

5.6.Image preprocessing settings…

Use this option to specify image preprocessing settings, such as detection of page orientation and automatic preprocessing settings. These settings can significantly improve source images, resulting in greater OCR accuracy. See also: Image processing options.

5.7.More options…

Opens the PDF section of the Format Settings tab of the Options dialog box, where you can specify additional settings (you can also open this dialog box by clicking Options… on the Tools menu).

6.Click Scan to PDF.

7.A dialog box will be displayed, showing a progress bar and tips.

8.After the page has been scanned, a dialog box prompting you to decide what to do next will

appear.

Click Scan Again to scan more pages using the current settings or click Finish Scanning to

close the dialog box.

9.After the scanning is completed, the scanned images will be processed using the settings you

specified, converted to PDF, and opened in the OCR Editor.

10.Specify the folder where you want to save the resulting PDF.

Scanning to Microsoft Word

The Scan to Microsoft Word task in the New Task window lets you create Microsoft Word documents

from images obtained from a scanner or a digital camera.

1.Open the New Task window, click the Scan tab, and then click the Scan to Microsoft Word

task.

2. Select a device and specify scanning settings.

3.Click the Preview button or click anywhere inside the image.

4.Review the image. If you are not satisfied with the quality of the image, change the scanning

settings and click the Preview button again.

5.Specify conversion settings. These settings determine the appearance and properties of the

output document.

5.1.Preserve formatting

Select the appropriate setting depending on how you plan to use the output document.

Exact copy

The output document will look almost exactly like the original, but will offer limited editing options.

Editable copy

The appearance of the output document may slightly differ from the original, but the

document can be easily edited.

Formatted text

The font types, font sizes, and paragraph formatting will be retained. The output text

will be placed in one column.

Plain text

Only the paragraph formatting will be retained. The output text will be placed in one

column and a single font will be used throughout.

5.2.OCR languages

Select the language(s) of your document. See also: OCR languages.

5.3.Keep pictures

Select this option if you want to preserve the pictures in the output document.

5.4.Keep headers, footers, and page numbers

Select this option to preserve the headers, footers, and page numbers.

5.5.Image preprocessing settings…

Specify image preprocessing settings, such as detection of page orientation and automatic preprocessing settings. These settings can significantly improve source images, resulting in greater OCR accuracy. See also: Image processing options.

5.6.More options…

Opens the DOC(X)/RTF/ODT section of the Format Settings tab of the Options dialog box, where you can specify additional settings (you can also open this dialog box by clicking Options… on the Tools menu).

6.Click Scan to Word.

7.A dialog box will be displayed, showing a progress bar and tips.

8.After the page has been scanned, a dialog box prompting you to decide what to do next will

appear.

Click Scan Again to scan more pages using the current settings or click Finish Scanning to

close the dialog box.

9.Specify the folder where you want to save your Microsoft Word document.

When the task is completed, a Microsoft Word document will be created in the folder that you

specified. All of the document’s pages will also be opened in the OCR Editor.

Scanning to Microsoft Excel

The Scan to Microsoft Excel task in the New Task window lets you create Microsoft Excel documents

from images obtained from a scanner or a digital camera.

1.Open the New Task window, click the Scan tab, and then click the Scan to Microsoft Excel

task.

2. Select a device and specify scanning settings.

3.Click the Preview button or click anywhere inside the image.

4.Review the image. If you are not satisfied with the quality of the image, change the scanning

settings and click the Preview button again.

5.Specify conversion settings. These settings determine the appearance and properties of the

output document.

5.1. Preserve formatting

Select the appropriate setting depending on how you plan to use the output document.

Formatted text

The font types, font sizes, and paragraph formatting will be retained.

Plain text

Only the paragraphs will be retained. A single font will be used throughout.

5.2.OCR languages

Select the language(s) of your document. See also: OCR languages.

5.3.XLSX settings:

Keep pictures

Select this option if you want to preserve the pictures in the output document.

Create a separate sheet for each page

Select this option if you want to create a separate Microsoft Excel spreadsheet from

each page of the original document(s).

5.4.Image preprocessing settings…

Use this option to specify image preprocessing settings, such as detection of page orientation and automatic preprocessing settings. These settings can significantly improve source images, resulting in greater OCR accuracy. See also: Image processing options.

5.5.More options…

Opens the XLS(X) section of the Format Settings tab of the Options dialog box, where you can specify additional settings (you can also open this dialog box by clicking Options… on the Tools menu).

6.Click Scan to Excel.

7.A dialog box will be displayed, showing a progress bar and tips.

8.After the page has been scanned, a dialog box prompting you to decide what to do next will

appear.

Click Scan Again to scan more pages using the current settings or click Finish Scanning to

close the dialog box.

9.Specify the folder where you want to save your Microsoft Excel document.

When the task is completed, a Microsoft Excel document will be created in the folder that you specified.

All of the document’s pages will also be opened in the OCR Editor.

Scanning to image files

The Scan to Image Files task in the New Task window lets you create image-only documents from

images obtained from a scanner or a digital camera.

1.Click the Scan tab and then click the Scan to Image Files task.

2. Select a device and specify scanning settings.

3.Click the Preview button or click anywhere inside the image.

4.Review the image. If you are not satisfied with the quality of the image, change the scanning

settings and click the Preview button again.

5.Specify conversion settings. These settings determine the appearance and properties of the

output document.

5.1.Select image format

Use this setting to select the desired image file format.

5.2.Compression

If you selected the TIFF format, you will be able to compress scanned images. Image

compression reduces file size.

Using different compression methods results in different data compression rates and may

result in data loss (loss of image quality). There are two factors you should consider when

choosing a compression method: the quality of images in the output file and its size.

ABBYY FineReader lets you use the following compression methods:

PACKBITS

Does not cause data loss and is well suited for compressing black-and-white scans.

JPEG (JFIF format)

This method is used to compress grayscale and color images such as photographs. It

compresses images significantly, but at the cost of some data loss. This leads to

reduced image quality (blurriness and loss of color saturation).

ZIP

Does not cause data loss and works best on images that contain large single-color

areas such as screenshots and black-and-white images.

LZW

Does not cause data loss and works best on images with vector graphics and grayscale

images.
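The trade-off between lossless methods and image content can be observed outside FineReader. The ZIP method in TIFF is based on the Deflate algorithm, which Python exposes through the standard zlib module. The sketch below compresses two synthetic 8-bit grayscale images, which are assumptions made up for the demonstration: a page with large single-color areas and a noise image standing in for a photograph. It does not write TIFF files and does not reproduce FineReader's own processing.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256

# Synthetic "screenshot": a white page with one solid black rectangle.
flat = bytearray([255]) * (WIDTH * HEIGHT)
for row in range(64, 128):
    start = row * WIDTH + 32
    flat[start : start + 192] = bytes(192)  # 192 black pixels in this row

# Synthetic "photograph": pseudo-random pixels (fixed seed for repeatability).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def deflate_ratio(pixels: bytes) -> float:
    """Compress with Deflate and return compressed size / original size."""
    raw = bytes(pixels)
    packed = zlib.compress(raw, level=9)
    # Lossless: decompression restores every byte of the original.
    assert zlib.decompress(packed) == raw
    return len(packed) / len(raw)


print(f"single-color areas: {deflate_ratio(flat):.4f}")
print(f"noise:              {deflate_ratio(noisy):.4f}")
```

The image with large uniform areas shrinks to a small fraction of its size, while the noise image barely shrinks at all, which is why lossy JPEG compression is the usual choice for photographs.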

5.3.Image preprocessing settings…

Specify image preprocessing settings, such as detection of page orientation and automatic preprocessing settings. These settings can significantly improve source images, resulting in greater OCR accuracy. See also: Image processing options.

6.Click Scan to <format>.

7.A dialog box will be displayed, showing a progress bar and tips.

8.After the page has been scanned, a dialog box prompting you to decide what to do next will

appear.

Click Scan Again to scan more pages using the current settings or click Finish Scanning to

close the dialog box.

9.Specify the folder where you want to save your output file.

When the task is completed, output files in the specified format will be created in the folder that you

specified. All of the document’s pages will also be opened in the OCR Editor.

Scanning to other formats

The Scan to Other Formats task in the New Task window lets you create documents in popular

formats (*.pptx, *.odt, *.html, *.epub, *.fb2, *.rtf, *.txt, *.csv, and *.djvu) from images obtained from a

scanner or a digital camera.

1.Open the New Task window, click the Scan tab, and then click the Scan to Other Formats

task.

2. Select a device and specify scanning settings.

3.Click the Preview button or click anywhere inside the image.

4.Review the image. If you are not satisfied with the quality of the image, change the scanning

settings and click the Preview button again.

5.Specify conversion settings. These settings determine the appearance and properties of the

output document.

5.1.Select output format

Use this option to select the desired format for the output file.

5.2.OCR languages

Select the language(s) of your document. See also: OCR languages.

5.3.Image preprocessing settings…

Specify image preprocessing settings, such as detection of page orientation and automatic preprocessing settings. These settings can significantly improve source images, resulting in greater OCR accuracy. See also: Image processing options.

5.4.More options…

Opens the section with the settings of the selected format on the Format Settings tab of the Options dialog box, where you can specify additional settings (you can also open this dialog box by clicking Options… on the Tools menu).

6.Click Scan to <format>.

7.A dialog box will be displayed, showing a progress bar and tips.

8.After the page has been scanned, a dialog box prompting you to decide what to do next will

appear.

Click Scan Again to scan more pages using the current settings or click Finish Scanning to

close the dialog box.

9.Specify the folder where you want to save your output file.

When the task is completed, output files in the specified format will be created in the folder that you

specified. All of the document’s images will also be opened in the OCR Editor.

PDF Editor

The PDF Editor is an easy-to-use tool that lets you view and search PDF documents, rearrange, add or

delete pages, copy text and pictures, edit text, and add comments to documents. You don’t need to

convert your PDF to an editable format, even if it only contains scans without a text layer.

Chapter contents

Viewing PDF documents

Reviewing PDF documents

Working with PDF content

Filling out forms

Signing PDF documents with a digital signature

Protecting PDF documents with passwords

Creating PDF documents

Saving and exporting PDF documents

Viewing PDF documents

The PDF Editor allows you to view and search PDF documents and copy text, pictures, and tables inside

them.

To open a PDF document in the PDF Editor:

Open the New Task window, click the Open tab, and then click the Open PDF Document task.

Open the New Task window and click File > Open PDF Document….

The document will be displayed in the PDF Editor.

To customize the way the document is displayed, use the following settings.

The viewing modes change the way pages are displayed and scrolled.

Viewing modes

The PDF Editor has four viewing modes that determine how document pages are displayed and

scrolled:

One-Page View displays one page and hides all the other pages.

One-Page Scrolling displays pages one after the other, so that when you get to the bottom of one page, the top of the next page is visible.

Two-Page View displays pages side-by-side, with odd-numbered pages on the left and even-numbered pages on the right.

Two-Page Scrolling displays two pages side-by-side with subsequent pages appearing as you scroll

down.

If you want to display even pages on the left and odd pages on the right in one of the two-page

viewing modes, click View > View mode > Odd Pages on the Right.
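The pairing of pages in the two-page modes can be sketched as follows. This is an illustration only: the function two_page_spreads is a hypothetical name, and the sketch assumes that with Odd Pages on the Right the first page appears alone on the right-hand side, which the guide does not state explicitly.

```python
from typing import Optional

Spread = tuple[Optional[int], Optional[int]]  # (left page, right page)


def two_page_spreads(page_count: int, odd_on_right: bool = False) -> list[Spread]:
    """Pair 1-based page numbers into (left, right) spreads; None is an empty slot."""
    pages: list[Optional[int]] = list(range(1, page_count + 1))
    if odd_on_right:
        pages.insert(0, None)  # assumption: page 1 starts on the right-hand side
    if len(pages) % 2:
        pages.append(None)  # pad the last spread
    return [(pages[i], pages[i + 1]) for i in range(0, len(pages), 2)]


print(two_page_spreads(5))
print(two_page_spreads(5, odd_on_right=True))
```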

There are several ways to change the viewing mode:

Click one of the buttons on the toolbar at the bottom of the screen:

Click View > View mode and select one of the viewing modes.

Use the following keyboard shortcuts: Ctrl+1, Ctrl+2, Ctrl+3, and Ctrl+4.

Full-screen mode

The PDF Editor has a full-screen viewing mode, in which the document takes up the entire screen space

and no panels or toolbars are visible.

To enter full-screen mode, do one of the following:

Click the button on the toolbar at the bottom of the screen.

Click View > Full Screen.

Press F11.

In the full-screen mode, you can:

Scroll the document.

View comments (place the mouse pointer over a commented area to display the comment).

Change viewing modes and scaling.

To display the bottom toolbar with viewing options, move the mouse pointer to the bottom

edge of the screen.

Go to specific pages in the document.

The challenge: understanding PDF

The content of each page in a PDF file is stored as streams of drawing commands that render text, images, or vector graphics. The structure of the file is defined by PDF objects such as a page, a picture, or a comment; paragraphs, lines of text, and letters are merely parts of an object. A character in a PDF is represented by a glyph, and the shape of each glyph is defined by its font. Every character is stored separately: it has a font, a character code within that font, and the coordinates of its position on the page. The command stream determines where the glyphs are placed. Letters are also grouped into text runs, but these runs carry no semantic meaning.

PDF has none of the lines or paragraphs found in text-document formats. Even the order of the text is not always defined. In other words, you see text, but the text as such does not exist: there is only a jumble of hard-to-read instructions that must be rendered at specific places in the document with the appropriate formatting.
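To make the "glyphs without lines" idea concrete, here is a minimal sketch of the data involved and of one simple way to rebuild lines from it. It is not ABBYY's algorithm: the Glyph class, the baseline tolerance, and the assumption that character codes equal Unicode code points are all simplifications introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    font: str   # name of the font resource
    code: int   # character code inside that font
    x: float    # horizontal position on the page
    y: float    # baseline position on the page


def group_into_lines(glyphs: list[Glyph], tolerance: float = 2.0) -> list[list[Glyph]]:
    """Cluster glyphs with nearly equal baselines, ordered top to bottom."""
    lines: list[list[Glyph]] = []
    # In default PDF user space the y axis points up, so larger y is higher.
    for glyph in sorted(glyphs, key=lambda g: -g.y):
        if lines and abs(lines[-1][0].y - glyph.y) <= tolerance:
            lines[-1].append(glyph)
        else:
            lines.append([glyph])
    return [sorted(line, key=lambda g: g.x) for line in lines]


def line_text(line: list[Glyph]) -> str:
    # Assumption: codes are Unicode code points; real fonts need an encoding map.
    return "".join(chr(g.code) for g in line)


# Glyphs arrive in arbitrary order, as they may in a content stream.
page = [
    Glyph("F1", ord("k"), 58.0, 686.0),
    Glyph("F1", ord("H"), 50.0, 700.0),
    Glyph("F1", ord("o"), 50.0, 686.5),
    Glyph("F1", ord("i"), 58.0, 700.4),
]
for line in group_into_lines(page):
    print(line_text(line))
```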

"So what about the text?" you may ask.

Text does exist in a PDF, and it can even be edited. To make that possible, we teach our technologies to understand the structure of the text, for example to detect and separate lines. Here is how, in more detail.

PDF libraries and how we changed them

To make it possible to edit whole paragraphs, we substantially reworked our internal subsystem (library), which we call PdfTools. It opens PDF files, parses the command streams (that is, works out where the text and the pictures are and reconstructs the document structure), and lets callers operate on this data: read it, change it, and save it back to PDF.

PdfTools contains all the tools needed to read the content and wrap it in objects (page, picture, comment) that are convenient for a program to work with. Our products, ABBYY FineReader PDF among them, then work with these objects.

How it used to work. In FineReader 14 we could edit text only within a single line. After an edit, the text had to be "rendered", that is, the glyphs had to be placed at their new positions.

Strictly speaking, rendering means visualization. We use the word differently: for us it means positioning the objects in a PDF. For PDF specialists this is a kind of visualization too, just one that nobody else sees. When we mean visualization in the usual sense, we say "rasterization".

This whole process lived inside PdfTools, which helped us assemble PDF content into lines and edit them. Suppose the glyph "A" has to be placed in the fifth position. FineReader told PdfTools to put a glyph "A" of a given size and font in the fifth position, and PdfTools inserted the "A" and moved every glyph that followed it to its new place in the line. Line-by-line editing is fairly simple: the text just shifts to the right, or to the left if it is written in Hebrew or Arabic. This was enough for small corrections such as fixing a typo, but not for larger changes to the text of a PDF document.
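The single-line edit described above can be sketched as: insert a glyph at a given index and shift everything after it by the new glyph's width. This is a simplified illustration for left-to-right text only. PlacedGlyph and insert_glyph are hypothetical names and are not part of PdfTools, whose interface is not documented here.

```python
from dataclasses import dataclass


@dataclass
class PlacedGlyph:
    char: str
    x: float      # left edge of the glyph on the line
    width: float  # advance width of the glyph


def insert_glyph(line: list[PlacedGlyph], index: int, char: str, width: float) -> None:
    """Insert `char` at `index` and shift all following glyphs to the right."""
    if index < len(line):
        x = line[index].x                    # take over the displaced glyph's position
    elif line:
        x = line[-1].x + line[-1].width      # append after the last glyph
    else:
        x = 0.0                              # first glyph on an empty line
    for glyph in line[index:]:
        glyph.x += width
    line.insert(index, PlacedGlyph(char, x, width))


line = [PlacedGlyph("C", 0.0, 7.0), PlacedGlyph("T", 7.0, 6.0)]
insert_glyph(line, 1, "A", 7.0)
print([(g.char, g.x) for g in line])
```

Because only positions on one line change, nothing needs to be known about paragraphs, which is also why this approach cannot move text onto the next line.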

What we decided to change. When the task of multi-line editing came up, we realized that it would be hard to solve within PdfTools alone. We needed to find larger fragments in PDF text automatically: to "see" paragraphs, understand where their boundaries are, what formatting the whole fragment should have, and what happens when text moves from one line to the next. To determine all of this, we brought in our other OCR technologies, Document Analysis (DA) and Synthesis, which can build the structure of a document.

Document Analysis and Synthesis

To detect blocks in the text, ABBYY FineReader PDF uses Document Analysis, which finds paragraphs, tables, and pictures. The program highlights the detected blocks with thin, pale frames so that it is easier for the user to make edits.

Next, we improved another subsystem of our program, Synthesis. We have already described on Habr what it is for. In short, it determines the structure and all the characteristics of recognized text: which fonts and sizes are used, which styles (bold, italic, underline), where the headings, lists, and indents are, and many other parameters that can be set in MS Word. We refined Synthesis so that it restores the original text parameters very accurately when a page is recognized and recreated.

Underlined text

PDF has no underline text attribute of the kind familiar to MS Word users. An underline in a PDF is vector graphics that is in no way linked to the text. Without extra work on the product, editing "underlined" text would move the characters as usual while the lines that represent the underlining stayed where they were. ABBYY FineReader PDF detects underlined text and edits it the way users expect.
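One plausible way to link an underline to its text is a geometric test: a horizontal stroke that lies just below a text run's baseline and overlaps it horizontally is treated as that run's underline and moves together with it. The sketch below illustrates this heuristic only. The article does not say how ABBYY FineReader PDF does it, and the class names, the max_gap threshold, and the coordinates are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRun:
    text: str
    x0: float
    x1: float
    baseline: float


@dataclass
class Stroke:
    x0: float
    x1: float
    y: float  # horizontal vector line at height y


def find_underline(run: TextRun, strokes: list[Stroke], max_gap: float = 3.0) -> Optional[Stroke]:
    """Return the first stroke lying just below the run and overlapping it, if any."""
    for stroke in strokes:
        just_below = 0.0 <= run.baseline - stroke.y <= max_gap
        overlaps = stroke.x0 < run.x1 and stroke.x1 > run.x0
        if just_below and overlaps:
            return stroke
    return None


def shift_run(run: TextRun, strokes: list[Stroke], dx: float) -> None:
    """Move the run horizontally and carry its underline along with it."""
    underline = find_underline(run, strokes)  # look it up before the run moves
    run.x0 += dx
    run.x1 += dx
    if underline is not None:
        underline.x0 += dx
        underline.x1 += dx


run = TextRun("total", 100.0, 140.0, 500.0)
strokes = [Stroke(50.0, 300.0, 650.0), Stroke(100.0, 140.0, 498.5)]
shift_run(run, strokes, 12.0)
print(strokes)
```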

Editing tables in PDF

Table editing has changed as well. Previously the program "saw" a table as separate lines and edited it accordingly. Now ABBYY FineReader PDF determines the content of each cell and can extract the text from it and work with it. This is convenient when you need to fix a wrong digit or change a period to a comma while keeping the table structure, quickly and without converting the PDF document to another format.

How do you edit a scan?

Multi-line editing is available for scans too. The user does not even need to think about whether the document is a scan: ABBYY FineReader PDF determines this itself and starts the appropriate mechanisms. For example, the contract date may contain a typo, or the counterparty's full name may have changed and become longer, so that it has to flow onto the next line.

The program first recognizes the scan and then prepares it for editing. After recognition, the text exists not in the original document but in its virtual "twin", and all editing operations take place there.

When the user finishes editing, the program automatically collects all the changes on the page and replaces the corresponding fragments in the original document. Our task is to embed the text back into the PDF document without damaging anything else in it.

Editing a scan directly saves the time otherwise spent converting the document to another format and back. This is convenient when a forgotten correction to a date or another piece of text has to be made quickly.

An example of multi-line editing: the text is automatically redistributed across lines as words and sentences are added inside a paragraph.
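The redistribution of text across lines can be illustrated with a greedy reflow: words are placed on the current line until the next one no longer fits, and then a new line is started. This is a deliberately simplified sketch that assumes every character has the same width. The function reflow and the sample sentence are made up for the example and do not describe the product's actual layout engine.

```python
def reflow(words: list[str], max_width: float,
           char_width: float = 1.0, space_width: float = 1.0) -> list[str]:
    """Greedily pack words into lines no wider than `max_width`."""
    lines: list[str] = []
    current: list[str] = []
    used = 0.0
    for word in words:
        word_width = len(word) * char_width
        needed = word_width if not current else used + space_width + word_width
        if current and needed > max_width:
            lines.append(" ".join(current))  # line is full, start a new one
            current, used = [word], word_width
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines


before = "the contract is signed by Ivanov".split()
after = "the contract is signed by Ivanov-Petrovsky".split()
print(reflow(before, max_width=16))
print(reflow(after, max_width=16))
```

When the name becomes longer, it no longer fits on the second line and flows onto a third one, while the lines before the edit are left unchanged.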

In place of a conclusion

Fixing a typo in a leaflet, swapping text blocks in a manual, changing a whole paragraph in a scanned contract or adding several new ones, adjusting the formatting of all the text: all of these tasks can now be done

quickly,
without converting the document,
in a single program.

You can try it right now by downloading the free trial version of ABBYY FineReader PDF.

In our next post, a week from now, we will describe another interesting feature we taught ABBYY FineReader PDF and what the new functionality can be used for.

ABBYY FineReader 15 User's Guide

Year: 2019
