Type: Package
Title: Extract Text from Microsoft Word Documents
Version: 1.3.4
Description: Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility only supports the old 'doc' format, not the newer XML-based 'docx' format. Use the 'xml2' package to read the latter.
Imports: sys (≥ 2.0)
URL: https://docs.ropensci.org/antiword/, https://ropensci.r-universe.dev/antiword
BugReports: https://github.com/ropensci/antiword/issues
License: GPL-2
Encoding: UTF-8
NeedsCompilation: yes
Packaged: 2024-10-03 14:12:08 UTC; jeroen
Author: Jeroen Ooms [aut, cre], Adri van Os [cph] (Author 'antiword' utility)
Maintainer: Jeroen Ooms <jeroenooms@gmail.com>
Repository: CRAN
Date/Publication: 2024-10-04 13:20:02 UTC

Antiword

Description

Wraps the antiword utility. Takes a path to a Word file and returns the text of the document.

Usage

antiword(file = NULL, format = FALSE)

Arguments

file

path or URL to your Word file

format

if TRUE, format the output text (the -f parameter of antiword)

Examples

text <- antiword("https://jeroen.github.io/files/UDHR-english.doc")
cat(text)
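
Because the utility only reads the old 'doc' format, it can help to check the file extension before calling antiword(). The following is a minimal sketch, not part of the package: read_doc_text() is a hypothetical helper name, and the check assumes the path or URL ends in the file extension (no query string).

```r
# Hypothetical helper (not part of the antiword package): reject anything
# that is not a legacy .doc file before handing it to antiword().
read_doc_text <- function(path, format = FALSE) {
  # file_ext() returns the text after the last dot, "" if there is none
  ext <- tolower(tools::file_ext(path))
  if (!identical(ext, "doc")) {
    stop("antiword only reads the legacy 'doc' format, got: '", ext, "'")
  }
  # 'file' and 'format' are the two arguments documented above
  antiword::antiword(path, format = format)
}

# Usage, assuming network access and the example document used above:
# text <- read_doc_text("https://jeroen.github.io/files/UDHR-english.doc",
#                       format = TRUE)
# cat(text)
```

A 'docx' path stops with an error before antiword() is called, which is the point at which a caller could switch to the 'xml2' package instead.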