java - How to map stripped text positions back to PDF document positions
I use PDFBox's PDFTextStripper to extract the plain text of two PDF files, which are then compared using an NLP algorithm. The algorithm returns the positions of the passages the two plain texts have in common.
What I want is to highlight those common passages in the PDFs. The problem is that I only have positions in the plain text, not the corresponding positions in the PDF: when using PDFTextStripper, this mapping is lost.
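PDFTextStripper does not actually discard the positions while it works: it hands each word's text together with its `TextPosition` objects to the protected hook `writeString(String, List<TextPosition>)`, and the mapping only disappears once `getText()` returns a plain `String`. Below is a minimal sketch, assuming PDFBox 2.x, of a subclass that builds its own copy of the text and remembers, for every character in that copy, the `TextPosition` and page it came from. The class name `PositionTrackingStripper` and its accessor methods are hypothetical, not part of PDFBox; separators the stripper inserts between words and lines are recorded with a `null` position.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/**
 * Hypothetical helper (not part of PDFBox): builds a tracked copy of the extracted text
 * and stores, per char of that copy, the TextPosition and page number it came from.
 */
class PositionTrackingStripper extends PDFTextStripper {
    private final StringBuilder text = new StringBuilder();
    private final List<TextPosition> charPositions = new ArrayList<>(); // null = inserted separator
    private final List<Integer> charPages = new ArrayList<>();

    PositionTrackingStripper() throws IOException {
        super();
    }

    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        // Build the tracked text from the glyphs themselves so every char maps to exactly one TextPosition
        for (TextPosition tp : textPositions) {
            String unicode = tp.getUnicode();
            if (unicode == null) {
                continue;
            }
            for (int i = 0; i < unicode.length(); i++) {
                append(unicode.charAt(i), tp);
            }
        }
        super.writeString(string, textPositions); // keep the normal getText() output working
    }

    @Override
    protected void writeWordSeparator() throws IOException {
        append(' ', null); // space between words: no glyph on the page
        super.writeWordSeparator();
    }

    @Override
    protected void writeLineSeparator() throws IOException {
        append('\n', null); // line break: no glyph on the page
        super.writeLineSeparator();
    }

    private void append(char c, TextPosition tp) {
        text.append(c);
        charPositions.add(tp);
        charPages.add(getCurrentPageNo()); // 1-based page currently being processed
    }

    /** The text whose offsets positionAt/pageAt refer to. */
    String getTrackedText() {
        return text.toString();
    }

    /** TextPosition of char {@code offset} in getTrackedText(), or null for a separator. */
    TextPosition positionAt(int offset) {
        return charPositions.get(offset);
    }

    /** 1-based page number of char {@code offset} in getTrackedText(). */
    int pageAt(int offset) {
        return charPages.get(offset);
    }
}
```

Call `getText(document)` as usual, then feed `getTrackedText()` (not the string `getText()` returns) to the comparison algorithm: the tracked copy is built from raw glyph Unicode without the word normalization and page separators `getText()` may add, so its offsets are the ones that line up with `positionAt` and `pageAt`.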
Are there solutions or common approaches for preserving the mapping between plain-text positions and PDF document positions while stripping text from PDFs? I would accept using a different PDF library if it supports this, but it has to be usable from Java.
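Once every character of the extracted text has a page number and a glyph box (for example taken from PDFBox `TextPosition` objects via `getXDirAdj()`, `getYDirAdj()`, `getWidthDirAdj()` and `getHeightDir()`), a common-passage range `[start, end)` can be turned into highlight rectangles by merging consecutive glyphs that sit on the same page and line. The sketch below is library-independent and purely illustrative: `Glyph`, `Box` and `RangeMapper` are hypothetical names, coordinates are assumed to use a top-left origin, and `null` entries stand for separator characters that have no glyph.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-character glyph box (top-left origin, y grows downwards). */
final class Glyph {
    final int page;
    final float x, top, width, height;

    Glyph(int page, float x, float top, float width, float height) {
        this.page = page;
        this.x = x;
        this.top = top;
        this.width = width;
        this.height = height;
    }
}

/** Hypothetical rectangle covering part of one text line on one page. */
final class Box {
    final int page;
    float left, top, right, bottom;

    Box(int page, float left, float top, float right, float bottom) {
        this.page = page;
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

final class RangeMapper {
    /**
     * Converts the plain-text range [start, end) into one box per text line.
     * glyphs.get(i) is the box of char i, or null for a separator without a glyph.
     */
    static List<Box> rangeToBoxes(List<Glyph> glyphs, int start, int end) {
        List<Box> boxes = new ArrayList<>();
        Box current = null;
        Glyph prev = null;
        for (int i = start; i < end; i++) {
            Glyph g = glyphs.get(i);
            if (g == null) {
                continue; // separator: nothing to highlight
            }
            // Same line = same page and vertical offset within half a glyph height of the previous glyph
            boolean sameLine = current != null
                    && prev.page == g.page
                    && Math.abs(prev.top - g.top) < Math.max(prev.height, g.height) / 2;
            if (sameLine) {
                current.left = Math.min(current.left, g.x);
                current.right = Math.max(current.right, g.x + g.width);
                current.top = Math.min(current.top, g.top);
                current.bottom = Math.max(current.bottom, g.top + g.height);
            } else {
                current = new Box(g.page, g.x, g.top, g.x + g.width, g.top + g.height);
                boxes.add(current);
            }
            prev = g;
        }
        return boxes;
    }
}
```

For actual highlighting, each box would still need converting from this top-left system into PDF user space (roughly `pdfY = pageHeight - y`, to be checked against the page's rotation and crop box) before being used as the rectangle and quad points of a highlight annotation, for example PDFBox 2.x's `PDAnnotationTextMarkup` with subtype `SUB_TYPE_HIGHLIGHT`.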