I am curious about how a PDF stores text formatting in a sticky note. Adobe Reader lets me apply bold, underline, and italic formatting in my sticky note annotations. But when I extract that data with the GetContents method on the annotation using the PDFTron library, I get only the plain text.
The main reason I’m asking this is that we currently have our own document viewer (using the XOD format) and our viewer includes a custom feature to make annotations on the document. We store that metadata ourselves and currently it is in HTML. I would like to be able to convert between PDF sticky note annotations and our annotations, but don’t really see a way for the formatting to be converted.
Can you provide any insight into this?
I found the answer with a little more searching:
The short answer is it is in the “RC” field of the SDFObj (contents-richtext from the PDF spec).
For anyone that cares, this is the code (in PHP):
$comment = Text::Create($pdfdoc->GetSDFDoc(), new Point(155.84971098266, $page->GetPageHeight() - 127.20086705202));
$comment->GetSDFObj()->PutString("RC", $html);
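For anyone wondering what could go into $html, here is a minimal sketch. It assumes the RC value is a small XHTML document with a body root, which is what Acrobat typically writes, and that the Contents entry should carry a plain-text equivalent. The helper names wrapRichText and plainTextFallback are hypothetical and not part of PDFTron, and the XML declaration and namespace are assumptions rather than confirmed requirements.

```php
<?php
// Hypothetical helpers (not part of PDFTron): build an RC value and a
// matching plain-text fallback from already-clean paragraph markup.

// Wrap each paragraph in <p> and put everything under an XHTML <body> root.
function wrapRichText(array $paragraphs): string
{
    $body = '';
    foreach ($paragraphs as $p) {
        $body .= '<p>' . $p . '</p>';
    }
    // The declaration and xmlns value are assumptions based on typical Acrobat output.
    return '<?xml version="1.0"?>'
         . '<body xmlns="http://www.w3.org/1999/xhtml">' . $body . '</body>';
}

// One line per paragraph, with tags removed and XML entities decoded.
function plainTextFallback(array $paragraphs): string
{
    $lines = [];
    foreach ($paragraphs as $p) {
        $lines[] = html_entity_decode(strip_tags($p), ENT_QUOTES | ENT_XML1, 'UTF-8');
    }
    return implode("\n", $lines);
}

$paragraphs = ['Plain, <b>bold</b> and <i>italic</i>', 'Tom &amp; Jerry'];
$rc    = wrapRichText($paragraphs);
$plain = plainTextFallback($paragraphs);

// With PDFTron loaded, these values would then go onto the annotation, e.g.:
//   $comment->GetSDFObj()->PutString("RC", $rc);
//   $comment->SetContents($plain);
```

Keeping the plain text in sync means viewers that ignore RC still show something sensible.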
However, since we’re using the “contenteditable” attribute of HTML5, I found there are lots of quirks; for example, different browsers do different things when you hit the Enter key (e.g. Firefox adds a <br> tag, but Chrome adds a <div> tag around the second line), and cutting and pasting from a Word doc adds all sorts of markup. Adobe doesn’t correctly interpret the HTML that is created in those scenarios, so I’ve had to make a lot of manual changes to the HTML in order for it to show up in Adobe Reader.
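The kind of cleanup described above could be sketched roughly like this. It is only an illustration, not what I actually run: the function name normalizeEditableHtml is hypothetical, it assumes inline formatting never spans a line break, and it assumes that reducing everything to p, b, i and an underline span is enough for Adobe Reader.

```php
<?php
// Hypothetical helper (not part of PDFTron): reduce contenteditable output
// to <p> paragraphs containing only <b>, <i> and underline <span> markup.
// Assumption: inline formatting never spans a line break.
function normalizeEditableHtml(string $html): string
{
    // &nbsp; is not a predefined XML entity, so use the numeric form.
    $html = str_replace('&nbsp;', '&#160;', $html);

    // Treat <br> and <div>/<p> boundaries as line breaks.
    $html = preg_replace('~<br\b[^>]*>|</?(?:div|p)\b[^>]*>~i', "\n", $html);

    // Drop all other markup (Word spans, fonts, comments), keeping the text.
    $html = strip_tags($html, '<b><strong><i><em><u>');

    // Remove attributes and map the remaining tags onto b, i and span.
    $html = preg_replace('~<(/?)(?:b|strong)\b[^>]*>~i', '<$1b>', $html);
    $html = preg_replace('~<(/?)(?:i|em)\b[^>]*>~i', '<$1i>', $html);
    $html = preg_replace('~<u\b[^>]*>~i', '<span style="text-decoration:underline">', $html);
    $html = preg_replace('~</u\s*>~i', '</span>', $html);

    // Emit one <p> per non-empty line.
    $out = '';
    foreach (preg_split('~\n+~', $html) as $line) {
        $line = trim($line);
        if ($line !== '') {
            $out .= '<p>' . $line . '</p>';
        }
    }
    return $out;
}

// Chrome-style and Firefox-style input for the same two lines.
$chrome  = normalizeEditableHtml('first <strong>line</strong><div>second <u>line</u></div>');
$firefox = normalizeEditableHtml('first <b>line</b><br>second <u>line</u>');
$word    = normalizeEditableHtml('<p class="MsoNormal"><span style="font-family:Calibri">Hello <b style="x">world</b></span><o:p></o:p></p>');
```

Regex-based cleanup like this breaks down on nested or overlapping markup; a real implementation would probably want a proper HTML parser.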
Due to that, I’m going to look into switching to the XFDF way of storing annotations and hopefully be able to take advantage of some of the WebViewer libraries.
To add to your findings, only a specific subset of HTML is supported in an RC entry. See section 12.7.3.4, Rich Text Strings, of the PDF specification to see what is valid.
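As a quick sanity check before writing the RC entry, something like the following sketch could flag markup outside that subset. It assumes the allowed elements are body, p, i, b and span, and it looks only at element names, not at style attributes. The function name unsupportedRichTextTags is hypothetical and not part of PDFTron.

```php
<?php
// Hypothetical check (not part of PDFTron): list element names in a rich
// text string that fall outside body, p, i, b and span.
function unsupportedRichTextTags(string $rc): array
{
    $allowed = ['body', 'p', 'i', 'b', 'span'];

    // Capture the name of every opening tag. Closing tags and the XML
    // declaration are skipped because "<" must be followed by a letter.
    preg_match_all('~<([a-zA-Z][a-zA-Z0-9:]*)~', $rc, $matches);

    $found = array_unique(array_map('strtolower', $matches[1]));
    return array_values(array_diff($found, $allowed));
}

$bad  = unsupportedRichTextTags('<body><p>Hi <b>x</b> <div>y</div> <FONT>z</FONT></p></body>');
$good = unsupportedRichTextTags('<?xml version="1.0"?><body><p><span style="color:#ff0000">ok</span></p></body>');
```

An empty result only means the element names look fine; the style properties still need to be checked against the spec.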