[R] Reading PDF files with German umlauts using tabulizer
grond @end|ng |rom number|@nd@de
Tue Sep 6 11:39:52 CEST 2022
I have some trouble reading PDF files in German.
I want to extract text and tables with the tabulizer package, and
everything goes well as long as I read English texts.
When I try the same code
text <- extract_text(file = "Pub_001.pdf")
on documents in German,
the umlauts are not recognized.
They are either replaced by a combination of characters (original first, then the result):
"Entmischung und Kristallisation in Gläsern des Systems"
"Entmischung und Kristallisation in GHisern des Systems"
or replaced by plain ASCII characters, like this:
"In Gläsern des Systems"
"In Glasern des Systems"
Opening the file with Adobe Reader tells me that the encoding is "Ansi".
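To narrow this down, here is a minimal base R sketch of how the extracted string could be inspected. The two strings are hard-coded stand-ins for the PDF text and for the result of extract_text(), and non_ascii() is a hypothetical helper name, not part of tabulizer. It only shows whether the umlaut is present in the string at all. If it is missing, converting the encoding afterwards cannot bring it back.

```r
# Stand-ins (assumptions): what the PDF shows vs. what was extracted
expected <- "In Gl\u00e4sern des Systems"
observed <- "In Glasern des Systems"

# Declared encoding mark of each string
Encoding(expected)
Encoding(observed)

# Hypothetical helper: Unicode code points of all non-ASCII characters
non_ascii <- function(x) {
  cp <- utf8ToInt(enc2utf8(x))
  cp[cp > 127L]
}

non_ascii(expected)  # contains the code point of "ä" (U+00E4)
non_ascii(observed)  # empty: the umlaut is not in the extracted string
```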
Is there a way to read this file correctly?
Thanks in advance for any idea.