Splitting a PDF based on bookmarks.

Q: In my research, your SDK seems to be the most professional and best
supported product I could find. I have downloaded the trial version,
and through some research on your forum I found some sample code that
breaks the document up based on its bookmarks. I was able to make this
code work, but there is a problem.

Let us say, for instance, I have a bookmark that starts on page 4 and
also ends on page 4, with another bookmark that begins on that same
page. When I break the document up, I get all of page 4 in one file,
and the subsequent section is missing its top bookmark.

What I would like to know is whether I can somehow create a single
page that starts and ends based on the bookmark, with the next
bookmark being wholly contained in the next "document part" that I
create. Basically, I want to eliminate widows and orphans and also keep
each bookmark from being separated across the multiple PDF files that
get created. Obviously a user wouldn't be happy if they bought
section 5 and had to buy section 4 to get all of section 5.

Is there some sample code you could point out that I could look at?
Again, I have the sample code that breaks the document up based on
bookmarks, but it is only capable of creating files from entire pages
of the PDF. I need to take the whole region that the bookmark
encompasses and split that out into a new file, not the whole page, as
in many cases the region is less than a page.
----
A: I am not entirely clear on your requirement; however, the PDFNet
SDK API is general enough that you should be able to implement any
type of PDF split and merge operation.

If you would like each part to include the page where the next
section begins, you could simply add +1 to the end of the page range in
SplitByBookmarks() (see range_end + 1 in the code). For example:

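using System;
using System.Collections;
using pdftron;
using pdftron.Common;
using pdftron.PDF;
// 'Action' alone would be ambiguous with System.Action.
using Action = pdftron.PDF.Action;

class BookmarkSplit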
{
static int max_bookmark_levels = 3;
static string output_path = "../../../../TestFiles/Output/";

static void CreateSplitList(PDFDoc doc, Bookmark item,
                            SortedList slist, int level) {
    for (; item.IsValid(); item = item.GetNext())
    {
        Action action = item.GetAction();
        if (action.IsValid() &&
            action.GetType() == Action.Type.e_GoTo)
        {
            Destination dest = action.GetDest();
            if (dest.IsValid())
            {
                Page page = dest.GetPage();
                int page_idx = page.GetIndex();
                if (slist.ContainsKey(page_idx) == false)
                    slist.Add(page_idx, item.GetTitle());
            }
        }

        if (level < max_bookmark_levels && item.HasChildren())
        {
            CreateSplitList(doc, item.GetFirstChild(), slist,
                            level + 1);
        }
    }
}

static void SplitByBookmarks(PDFDoc doc, Bookmark item) {
    SortedList slist = new SortedList();
    CreateSplitList(doc, item, slist, 1);

    // Now we have a sorted list of page indexes
    // which can be used to split the source document.
    if (slist.Count == 0)
    {
        Console.WriteLine("No document bookmarks");
    }

    if (slist.ContainsKey(1) == false)
    {
        slist.Add(1, "First page");
    }

    int num_pages = doc.GetPageCount();
    if (slist.ContainsKey(num_pages) == false)
    {
        slist.Add(num_pages, "Last page");
    }

    for (int i = 1; i < slist.Count; ++i)
    {
        int range_start = (int)slist.GetKey(i - 1);
        int range_end = (int)slist.GetKey(i);
        string bookmark_title = (string)slist.GetByIndex(i - 1);
        Console.WriteLine("Page Range {0:d}-{1:d} Title: {2}",
            range_start, range_end, bookmark_title);

        PDFDoc new_doc = new PDFDoc();
        ArrayList copy_pages = new ArrayList();
        PageIterator itr = doc.GetPageIterator(range_start);
        PageIterator end = doc.GetPageIterator(range_end + 1);
        for (; itr != end; itr.Next())
        {
            copy_pages.Add(itr.Current());
        }

        ArrayList imported_pages = new_doc.ImportPages(copy_pages);
        for (int j = 0; j != imported_pages.Count; ++j)
        {
            new_doc.PagePushBack((Page)imported_pages[j]);
        }

        string outname = string.Format("split_{0:d}_{1:d}.pdf",
            range_start, range_end);
        //string outname = string.Format("split_{0:d}_{1:d}-{2}.pdf",
        //    range_start, range_end, bookmark_title);
        new_doc.Save(output_path + outname, 0);
        new_doc.Close();
    }
}

/// <summary>
/// The main entry point for the application.
/// </summary>
static void Main(string[] args)
{
    PDFNet.Initialize();
    try
    {
        PDFDoc doc = new PDFDoc(@"C:\PDFReference16.pdf");
        doc.InitSecurityHandler();

        // Get the root bookmark...
        Bookmark root = doc.GetFirstBookmark();
        if (!root.IsValid()) Console.WriteLine("No bookmarks");

        SplitByBookmarks(doc, root);
        doc.Close();
    }
    catch (PDFNetException e)
    {
        Console.WriteLine(e.Message);
    }

    PDFNet.Terminate();
    Console.WriteLine("Done.");
}
}

You could definitely extend this code to implement something smarter.
For example, instead of storing a split list of page numbers while
traversing bookmarks, you could store the start and end of each
bookmark's page range. This way you can perform more intelligent
splitting, which is probably what you are looking for.
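
As a rough sketch of that idea (not tested against your document), the
helper below turns a list of bookmark start pages into explicit
start/end ranges. PageRange and SectionRanges are hypothetical names
and are not part of PDFNet. The sketch assumes the start pages are
1-based, sorted in ascending order, and may contain repeats when
several bookmarks begin on the same page. It also assumes a section
always runs onto the page where the next bookmark starts, which gives
one page too many whenever the next bookmark sits at the very top of
its page.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration; neither is part of PDFNet.
struct PageRange
{
    public int Start;  // first page of the section (1-based)
    public int End;    // last page the section touches (1-based, inclusive)
}

static class SectionRanges
{
    // startPages: start page of each bookmark, ascending, repeats allowed.
    // pageCount:  total number of pages in the source document.
    public static List<PageRange> Build(IList<int> startPages, int pageCount)
    {
        var ranges = new List<PageRange>();
        for (int i = 0; i < startPages.Count; ++i)
        {
            // Assumption: a section may end partway down the page on which
            // the next bookmark starts, so that page belongs to both ranges.
            int end = (i + 1 < startPages.Count) ? startPages[i + 1] : pageCount;
            ranges.Add(new PageRange { Start = startPages[i], End = end });
        }
        return ranges;
    }
}
```

Unlike the SortedList keyed by page number, this keeps one entry per
bookmark, so two bookmarks on page 4 produce two ranges that both
contain page 4. Each range could then be copied with the same
ImportPages loop used in SplitByBookmarks().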

If required, you could also split individual PDF pages based on their
content or on positioning information (something along the lines of
the ElementEdit sample, but using the position information from
element.GetBBox() to decide which elements to remove from a page). I am
not sure whether this is relevant to your problem, though.
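
To make the positional idea a little more concrete, here is a minimal
sketch of the keep/remove decision only. It contains no PDFNet calls.
BandFilter and InBand are hypothetical names, and the sketch assumes
you already have each element's bounding box and the vertical band
(bottom and top y values) that a section occupies on the page. Whether
you can get that band from your bookmarks depends on their
destinations carrying a vertical position, which not every bookmark
does.

```csharp
static class BandFilter
{
    // Hypothetical helper, not part of PDFNet. All values are y coordinates
    // in PDF user space, where y grows upward from the bottom of the page.
    // Returns true when the vertical midpoint of the box lies in the band.
    public static bool InBand(double boxBottom, double boxTop,
                              double bandBottom, double bandTop)
    {
        double mid = (boxBottom + boxTop) / 2.0;
        return mid >= bandBottom && mid <= bandTop;
    }
}
```

Testing the midpoint rather than requiring the whole box to fit is a
design choice: an element that straddles the boundary is assigned to
exactly one section instead of being dropped from both.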