How can I create a stream containing the content between
(pageStart, lineStart) and (pageEnd, lineEnd)?
How can I assemble a pdf from streams created this way?
Here is the context for what I am trying to do:
I am working for a company that sells pdf research documents online.
The customer can buy the whole report or sections out of the report.
When the pdf comes to us from the publisher, it is not necessarily
bookmarked in a way that corresponds directly to the sections of the
report that will be available for sale.
We need a way to tag/mark the pdf with the start and end points of
sections in a way that will allow us to quickly extract the section(s)
of a report the customer would like to buy.
We currently work with a third party that is doing some of this for
us. We would like to be able to do it ourselves. The current
solution seems to be that the original pdf is restructured so that
report sections are placed in separate streams. In addition, an xml file
functioning as an index is created and stored as XMP metadata. The index
maps each section name to a list of (page number, stream number) pairs,
which then allows looking up and grabbing the content.
Here is a small piece of a typical xml index that gets stored with the pdf:
<INDEX creation_date="12/30/2009 10:47:05 PM">
<CONTENT page="4" stream="1"/>
<CONTENT page="4" stream="2"/>
<CONTENT page="5" stream="1"/>
<CONTENT page="6" stream="1"/>
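
To make the lookup concrete, here is a minimal Python sketch of how an index like this could be read into a table of section name to (page, stream) pairs. The SECTION element and its name attribute are assumptions made for illustration, since the fragment above only shows INDEX and CONTENT. The helper names load_index and pages_for are hypothetical too.

```python
import xml.etree.ElementTree as ET

# Hypothetical index: the SECTION element and its "name" attribute are
# assumptions; the fragment in the post only shows INDEX and CONTENT.
SAMPLE_INDEX = """\
<INDEX creation_date="12/30/2009 10:47:05 PM">
  <SECTION name="Executive Summary">
    <CONTENT page="4" stream="1"/>
    <CONTENT page="4" stream="2"/>
    <CONTENT page="5" stream="1"/>
    <CONTENT page="6" stream="1"/>
  </SECTION>
</INDEX>
"""


def load_index(xml_text):
    """Map each section name to its ordered list of (page, stream) pairs."""
    root = ET.fromstring(xml_text)
    index = {}
    for section in root.iter("SECTION"):
        pairs = [
            (int(content.get("page")), int(content.get("stream")))
            for content in section.iter("CONTENT")
        ]
        index[section.get("name")] = pairs
    return index


def pages_for(index, name):
    """Return the sorted, distinct page numbers that a section touches."""
    return sorted({page for page, _stream in index[name]})


if __name__ == "__main__":
    idx = load_index(SAMPLE_INDEX)
    for section_name, pairs in idx.items():
        print(section_name, pairs, pages_for(idx, section_name))
```

This only covers reading the index. It says nothing about how the streams themselves get written into the pdf, which is the part I am asking about.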
I can do the splitting once the streams and index file are in place in
the pdf. I need help doing the restructuring of the document.