I have a Microsoft Word document that contains bullets and nested sub-bullets, with up to three levels of nesting. I have been using the officer package in R to read the text from the document, which I then plan to insert into a database. I can extract all of the text successfully, but I can't figure out how to extract the bullets themselves. Each bullet and its nesting level provide important context for the text, but officer appears to strip or ignore them. Is there a way to use officer to extract the bullets and their levels along with the text, or is there another R package that can retrieve them?
I realize I could probably write a custom function to parse the XML structure of the Word document and obtain the bullets from there, but I would rather avoid digging into those details and reinventing a wheel that others may have already built.
Thanks.