Migrating from Adobe/Datalogics API to PDFNet SDK for C/C++

Aaron_Gravesdale · September 10, 2009, 5:11pm

Q: We are in the process of migrating from Adobe PDF Library,
documented in the Acrobat Core API Reference (Version 5.0) to PDFNet
SDK (http://www.pdftron.com/pdfnet).

We use this library for Windows (WinXP, Vista) exclusively. We use
Visual Studio 2008. Our product is compiled with UNICODE defined, i.e.
all characters are 16bit. For our form viewer control we use straight
Win32 calls, i.e. we don’t use MFC or .Net.

Since we have a lot of code using this library, we thought it would be
easiest to emulate the part of the Adobe PDF library we use using
calls to the PDFNetC library from PDFTron.

Question 1: Is this a good idea? Have other people done the same
thing? Does PDFTron perhaps already have a compatibility layer like
this? In our code we intend to define something like this:

typedef int ASInt32;
typedef pdftron::PDF::PDFDoc* PDDoc;
typedef pdftron::SDF::SDFDoc* CosDoc;
typedef pdftron::PDF::Page* PDPage;
//struct CosObj {
// ASInt32 a, b;
//};
typedef pdftron::SDF::Obj* CosObj;
//struct PDAnnot {
// ASInt32 a, b;
//};
typedef pdftron::PDF::Annot* PDAnnot;

Question 2: SDF::Obj::Type does not distinguish between Int, Fixed
and Real. Is there maybe a workaround? Some of our code might need to
make this distinction, e.g.:

ASInt32 CosObjGetType(CosObj cosObj)
{
switch (cosObj->GetType()) {
case pdftron::SDF::Obj::e_null : return CosNull;
case pdftron::SDF::Obj::e_bool : return CosBoolean;
case pdftron::SDF::Obj::e_number: return CosReal; // or CosInteger, or CosFixed?
case pdftron::SDF::Obj::e_name : return CosName;
case pdftron::SDF::Obj::e_string: return CosString;
case pdftron::SDF::Obj::e_dict : return CosDict;
case pdftron::SDF::Obj::e_array : return CosArray;
case pdftron::SDF::Obj::e_stream: return CosStream;
default: Assert(false); return CosNull;
}
}

Question 3: How do we get the PDF::PDFDoc from a SDF::SDFDoc?

PDDoc PDDocFromCosDoc(CosDoc cosDoc)
{
return ???;
}

Question 4: PDF::Page::GetPage returns a reference to a page, not a
pointer to a page. Is the following code correct?

PDPage PDDocAcquirePage(PDDoc doc, ASInt32 pageNum)
{
pdftron::PDF::Page page=doc->GetPage(pageNum+1);
if (page.IsValid()) {
return new pdftron::PDF::Page(page);
}
else {
return NULL;
}
}

void PDPageRelease(PDPage page)
{
delete page;
}

Question 5: Coordinate system. I couldn’t find any documentation about
the coordinate system used in PDFNetC. Is the origin at the lower left
corner of the page? Are coordinates in points (1 pt = 1/72inch)? When
I tried to get the size of a page, I always got width==10.0 and
height==10.0 regardless of the actual page size. This is the code I
use:

typedef int ASFixed;
ASFixed ASFloatToFixed(double inFloat)
{
return (ASFixed)(inFloat*65536.0);
}

void PDPageGetCropBox(PDPage page, ASFixedRect* cropBoxP)
{
pdftron::PDF::Rect r=page->GetCropBox(); // The same result when using page->GetMediaBox()
double x1, x2, y1, y2;
x1 = r.GetX1(); // Now: x1==0.0
y1 = r.GetY1(); // Now: y1==0.0
x2 = r.GetX2(); // Now: x2==10.0
y2 = r.GetY2(); // Now: y2==10.0

// A temporary hack to adjust for this:
x1 *= 8.5*72.0/10.0;
x2 *= 8.5*72.0/10.0;
y1 *= 11.0*72.0/10.0;
y2 *= 11.0*72.0/10.0;

cropBoxP->left = ASFloatToFixed(x1);
cropBoxP->top = ASFloatToFixed(y2);
cropBoxP->right = ASFloatToFixed(x2);
cropBoxP->bottom = ASFloatToFixed(y1);
}

Question 6: What is the best way to draw a page to a Windows screen
device context? I found an undocumented function PDF::PDFDraw::
DrawInRect which seems to lack a few parameters, e.g. zoom level,
rotation, viewport origin, update rectangle, etc. In our current code
we use the following function:

void PDPageDrawContentsToWindowEx(PDPage page, void* window,
    void* displayContext, ASBool isDPS, ASFixedMatrix* matrix,
    ASUns32 flags, ASFixedRect* updateRect,
    CancelProc cancelProc, void* cancelProcClientData);

We call it like this:

PDPageDrawContentsToWindowEx(pdpage, NULL, hdc, false, &ctm, 0, &crop,
NULL, NULL);

Question 7: Exception handling. The PDFNetC library functions might
throw exceptions of the type Common::Exception. Are these the only
ones we have to catch or do we also have to look for std::exception or
others? Also, it looks like the Exception class doesn’t have an error
code member, only a message string. Is this correct? If yes, do we
need to compare message strings if we want to treat some errors
differently? Could we use macro definitions like these to mimic the
Adobe PDF library?

#define DURING try {
#define HANDLER } catch (pdftron::Common::Exception& ex) {
#define END_HANDLER }
#define ERRORCODE ((ASInt32)ASAtomFromString(ex.GetMessage()))

Question 8: Unicode strings. The PDFNet library uses a string class
called UString. Why not use std::wstring? Are there helper functions
to convert from UString to std::wstring and vice versa?

Question 9. How to compare Obj instances? Is this code correct?

ASBool CosObjEqual(CosObj cosObj, CosObj cosObj2)
{
if (cosObj==cosObj2) return true;
if (cosObj==NULL || cosObj2==NULL) return false;
if (cosObj->GetType()!=cosObj2->GetType()) return false;
if (cosObj->IsIndirect()!=cosObj2->IsIndirect()) return false;
if (cosObj->IsIndirect()) {
if (cosObj->GetGenNum()!=cosObj2->GetGenNum()) return false;
if (cosObj->GetObjNum()!=cosObj2->GetObjNum()) return false;
}
else {
if (!cosObj->IsEqual(*cosObj2)) return false;
}
return true;
}

Question 10. PDF::Rect::GetX1(), GetX2(), GetY1(), and GetY2() are not
const functions. Why? Code like this won’t compile:

static void SetARect(ASFixedRect& rect, const pdftron::PDF::Rect& r)
{
rect.left = ASFloatToFixed(r.GetX1());
rect.top = ASFloatToFixed(r.GetY2());
rect.right = ASFloatToFixed(r.GetX2());
rect.bottom = ASFloatToFixed(r.GetY1());
}

Does GetX1() perhaps normalize the rectangle first?

Question 11. Can I delete multiple pages from a document in one step?
Currently I do this:

void PDDocDeletePages(PDDoc doc, ASInt32 firstPage, ASInt32 lastPage,
ProgressMonitor progMon, void* progMonClientData)
{
pdftron::PDF::PageIterator it;
//
while (firstPage<=lastPage) {
it = doc->GetPageIterator(firstPage+1);
doc->PageRemove(it);
lastPage--;
}
}

Question 12. How can I copy multiple pages from one document into
another? How can I check if doc and doc2 point to different files? Do
I have to create a vector of source pages from doc2 and then call
doc.ImportPages?

void PDDocInsertPages(PDDoc doc, ASInt32 mergeAfterThisPage,
PDDoc doc2, ASInt32 startPage, ASInt32 numPages, ASUns16 insertFlags,
ProgressMonitor progMon, void* progMonClientData,
CancelProc cancelProc, void* cancelProcClientData)
{
???
}

When calling PDDocInsertPages, the last 4 parameters are always NULL
in our code.

Question 13. Is there a good way to create a temporary copy of a
currently opened document? Currently we do something like this:

PDDoc PDDocCreate()
{
return new pdftron::PDF::PDFDoc();
}

PDDoc pdoc_new=PDDocCreate();
PDDocInsertPages(pdoc_new, PDBeforeFirstPage, pddoc, 0, PDAllPages,
PDInsertAll, NULL, NULL, NULL, NULL);

Will this copy all embedded fonts and copy annotations correctly?

Question 14. How do I insert an empty new page?

double ASFixedToFloat(ASFixed f)
{
return ((double)f)/65536.0;
}

void SetPRect(pdftron::PDF::Rect& rect, const ASFixedRect& r)
{
rect.SetX1(ASFixedToFloat(r.left ));
rect.SetX2(ASFixedToFloat(r.right ));
rect.SetY1(ASFixedToFloat(r.bottom));
rect.SetY2(ASFixedToFloat(r.top ));
}

PDPage PDDocCreatePage(PDDoc doc, ASInt32 afterPageNum, ASFixedRect
mediaBox)
{
pdftron::PDF::PageIterator it=doc->GetPageIterator(afterPageNum+1);
pdftron::PDF::Page page;
pdftron::PDF::Rect rect;
//
SetPRect(rect, mediaBox);
page.SetMediaBox(rect);
doc->PageInsert(it, page);
return &doc->GetPage(afterPageNum+1);
}

A: You could probably implement a wrapper layer around PDFNet to
emulate Adobe’s API, but I am not sure how effective that approach
would be. We have a number of users who have migrated to PDFNet (from
Adobe and other similar APIs), and in most cases they decided to
refactor the existing code instead of writing a wrapper layer. The
main advantage of refactoring is that the code is easier to understand
and there is less code to maintain, which would most likely lower the
number of defects and decrease maintenance costs. On the other hand,
if you have a huge code base that would take too long to refactor, a
wrapper layer may be more cost effective. The following is a relevant
entry from the PDFNet KB that compares the SDF API in PDFNet SDK with
the COS API in Acrobat Professional SDK:

Besides the syntax differences, both the SDF (Structured Document
Format) API in PDFNet and the COS API in Acrobat Pro SDK offer the
same type of functionality. These APIs are used for low-level
manipulation of PDF documents. A useful analogy would be a comparison
between different DOM APIs in the world of XML. To put it technically,
both SDF and COS offer read, write, and edit operations on the same
‘infoset’.

Although both the SDF API in PDFNet and COS in Acrobat SDK can be used
to implement the same type of functionality, there are a number of
advantages to using the SDF API in PDFNet:

- The SDF layer, like the rest of PDFNet, follows an object-oriented
  paradigm and, as a result, is easier to use and more elegant.
- The user is relieved from having to manually acquire and release
  resources, which can otherwise lead to memory and resource leaks.
- The exception support is transparent and seamlessly integrates into
  any programming language (e.g. cross platform C++, C#, VB.Net, or
  any other .NET language).
- The identical API is available for both managed languages (C#,
  VB.NET, Java, Managed C++) and for native code (C/C++). As a result,
  it is very straightforward to port applications back and forth
  between managed and unmanaged mode.
- The SDF API seamlessly integrates with the high-level APIs. For
  example, a user can obtain a low-level object from any PDF object
  using the GetSDFObj() method and can create a high-level PDF object
  from an SDF object. As a result, the user is empowered with tools to
  implement custom high-level extensions.
- The number of lines of code required to implement the same
  functionality using the SDF API is significantly lower than the
  number of lines required using the Acrobat API.

Question 2: SDF::Obj::Type does not distinguish between Int, Fixed
and Real. Is there maybe a workaround? Some of our code might need to
make this distinction, e.g.:

PDFNet simplifies API usage by treating all numbers as ‘doubles’.
Depending on the context (which is defined in the PDF specification)
you can treat a given number as Int or Real (Fixed numbers are never
used directly). Also, the PDF specification itself does not explicitly
distinguish between Int and Real; the distinction comes from the
context. For example, because the PDF spec says that the Rotate
parameter in the page dictionary is an integer, you can safely cast
the number to an ‘int’:

int rot = (int) page.GetSDFObj().Get("Rotate").GetNumber();

If you would really like to distinguish between Int/Fixed/Real you
could write a utility function such as (or along the same lines):

bool IsInt(double num) { return int(num) == num; }
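
One caveat with the one-liner above: converting a double that lies outside the range of int is undefined behavior in C++, so a slightly more defensive variant may be worth having in a wrapper layer. The following is only a sketch. The enum and function names are hypothetical (they are not part of PDFNet), and std::isfinite requires a C++11 compiler. Note also that a real written as "90.0" in the file would still be reported as an integer, because only the value is inspected.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical wrapper-side type codes; the names mirror the question
// and are not part of PDFNet.
enum CosNumberKind { kCosInteger, kCosReal };

// Classifies a value such as the one returned by Obj::GetNumber().
// A number counts as an integer only if it is finite, fits in an int,
// and has no fractional part, so a later cast to int is well defined.
inline CosNumberKind ClassifyNumber(double num)
{
    if (!std::isfinite(num)) return kCosReal;
    if (num < std::numeric_limits<int>::min() ||
        num > std::numeric_limits<int>::max()) return kCosReal;
    return (std::floor(num) == num) ? kCosInteger : kCosReal;
}
```

In CosObjGetType() the e_number case could then return CosInteger or CosReal depending on ClassifyNumber(cosObj->GetNumber()).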

Question 3: How do we get the PDF::PDFDoc from a SDF::SDFDoc?

You could use PDFDoc(SDF::SDFDoc& sdfdoc) constructor but keep in mind
that this constructor takes the ownership of SDFDoc (which becomes
invalid).

Actually in PDFNet there is no real difference between TRN_SDFDoc and
TRN_PDFDoc (defined in Headers\C\Common\TRN_Types.h). As a result you
can do a simple type cast between these types. You are also free to
extend PDFNet headers to include your custom constructor. For example:

// Custom constructor added to the PDFDoc class in the PDFNet headers:
PDFDoc(TRN_SDFDoc d) : mp_doc((TRN_PDFDoc)d) {}

PDFDoc PDDocFromCosDoc(SDFDoc cosDoc) {
    return PDFDoc(cosDoc.mp_doc);
}

Question 4: PDF::Page::GetPage returns a reference to a page, not a
pointer to a page. Is the following code correct?

Actually, the Page object in PDFNet is a simple wrapper around
SDF::Obj (TRN_Page is the same as TRN_Obj). There is no need to
release the TRN_Page/TRN_Obj. So you could write your code as follows:

typedef TRN_Page PDPage;

PDPage PDDocAcquirePage(PDDoc doc, ASInt32 pageNum) {
pdftron::PDF::Page page=doc->GetPage(pageNum+1);
if (page.IsValid()) {
return page.mp_page;
}
else {
return 0;
}
}

void PDPageRelease(PDPage page)
{
// No op…
}

void DoSomethingOnPage(PDPage page)
{
pdftron::PDF::Page p((TRN_Page)page);
p.SetMediaBox(…);
…
}

Question 5: Coordinate system. I couldn’t find any documentation about
the coordinate system used in PDFNetC. Is the origin at the lower left
corner of the page? Are coordinates in points (1 pt = 1/72inch)?

PDFNet uses the PDF coordinate system described in the PDF Reference.
The origin is in the lower left corner of the page and the units are
points.

I always got width==10.0 and height==10.0 regardless of the actual
page size. This is the code I use:

PDFNet does not use fixed-point numbers (they are an artifact of the
Acrobat API and are not part of the PDF spec). The actual PDF units
(points) are returned directly by page.GetCropBox()/GetMediaBox() and
they don’t need any manipulation.

Question 6: What is the best way to draw a page to a Windows screen
device context? I found an undocumented function
PDF::PDFDraw::DrawInRect which seems to lack a few parameters, e.g.
zoom level, rotation, viewport origin, update rectangle, etc.

This member function is documented in ‘Headers\PDF\PDFDraw.h’:

/**
 * Draws the contents of the page to a given device context.
 *
 * @note this method is only supported on Windows platforms.
 * If your application is running on a Windows platform, you can
 * select GDI+ rasterizer with SetRasterizerType() and e_GDIPlus type.
 *
 * @param page The source PDF page.
 * @param hdc Device context (i.e. HDC structure).
 * @param rect The rectangle in the device context inside of which
 * the page will be drawn.
 */
void DrawInRect(Page& page, void* hdc, const Rect& rect);

PDFDraw is a utility class around PDFRasterizer. Since you would like
to use matrix, clip rectangle, and other advanced parameters you may
want to use PDFRasterizer::Rasterize() method instead.

/**
 * Draws the contents of the page to a given device context.
 * This method can be used to send raster or GDI output to arbitrary
 * Windows device context (e.g. memory bitmap device, printer device,
 * or EMF/WMF record).
 *
 * See also PDF::PDFDraw::DrawInRect()
 *
 * @note this method is only supported on Windows platforms.
 * If your application is running on a Windows platform, you can
 * select GDI+ rasterizer with SetRasterizerType() and e_GDIPlus type.
 *
 * @param page The page to rasterize.
 * @param hdc Device context (i.e. HDC structure).
 * @param device_mtx Device transformation matrix that maps PDF page
 * from PDF user space into device coordinate space (e.g. pixel space).
 * PDF user space is represented in page units, where one unit
 * corresponds to 1/72 of an inch.
 * @param clip Optional parameter defining the clip region for the
 * page. If the parameter is null or is not specified, PDFRasterizer
 * uses page’s crop box as a default clip region.
 * @param dpi Optional parameter used to specify the resolution of the
 * rasterized bitmap. This parameter is used only for built-in
 * rasterizer. If the parameter is not specified or is less than zero,
 * PDFNet will query the target device to determine the output
 * resolution.
 * @param cancel An optional variable that can be used to stop the
 * rendering thread.
 */
void PDFRasterizer::Rasterize(Page& page, void* hdc,
    const Common::Matrix2D& device_mtx, const Rect* clip = 0,
    int dpi = -1, volatile bool* cancel = 0);
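
To make the role of device_mtx more concrete, here is a rough sketch of how a zoom level and a device resolution could be folded into such a matrix for an unrotated page whose crop box starts at (0, 0). The Affine struct and both functions are stand-ins written for this illustration; they are not PDFNet types, and how the six coefficients map onto Common::Matrix2D is an assumption to verify against the header.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for a 2D affine matrix using the PDF convention:
//   x' = a*x + c*y + h
//   y' = b*x + d*y + v
struct Affine { double a, b, c, d, h, v; };

// Maps PDF user space (points, origin at lower left, y up) to device
// pixels (origin at upper left, y down) for an unrotated page.
// page_height_pt is the crop box height in points.
inline Affine MakeDeviceMatrix(double page_height_pt, double zoom, double dpi)
{
    assert(zoom > 0.0 && dpi > 0.0);
    const double s = zoom * dpi / 72.0;   // device pixels per point
    Affine m = { s, 0.0, 0.0, -s, 0.0, page_height_pt * s };
    return m;
}

// Transforms one point with the matrix.
inline void Apply(const Affine& m, double x, double y,
                  double* out_x, double* out_y)
{
    *out_x = m.a * x + m.c * y + m.h;
    *out_y = m.b * x + m.d * y + m.v;
}
```

With this convention the top-left corner of the page lands on device pixel (0, 0). A viewport origin (scrolling) would be an extra translation in h and v, and the update rectangle corresponds to the clip parameter.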

Question 7: Exception handling. The PDFNetC library functions might
throw exceptions of the type Common::Exception. Are these the only
ones we have to catch or do we also have to look for std::exception or
others?

Correct, PDFNet will throw only exceptions of type
pdftron::Common::Exception.

Also, it looks like the Exception class doesn’t have an error code
member, only a message string. Is this correct?

This is correct. You could use a message string to generate an error
code (e.g. using a message digest or similar).

#define DURING try {
#define HANDLER } catch (pdftron::Common::Exception& ex) {
#define END_HANDLER }
#define ERRORCODE ((ASInt32)ASAtomFromString(ex.GetMessage()))

This looks good.
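
If ASAtomFromString is not available in the emulation layer, a message digest as suggested above could look roughly like this. It is only a sketch: the function name is hypothetical, the hash is plain 32-bit FNV-1a, and it assumes the exception message is available as a NUL-terminated char string. Keep in mind that such a code is only as stable as the message text itself.

```cpp
#include <cassert>
#include <cstdint>

// Derives a numeric code from an error message with the 32-bit FNV-1a
// hash. Identical messages always give the same code; different
// messages can collide, although that is unlikely in practice.
inline std::uint32_t ErrorCodeFromMessage(const char* msg)
{
    assert(msg != 0);
    std::uint32_t h = 2166136261u;               // FNV offset basis
    for (; *msg; ++msg) {
        h ^= static_cast<unsigned char>(*msg);   // mix in one byte
        h *= 16777619u;                          // FNV prime
    }
    return h;
}
```

The ERRORCODE macro could then cast the result of ErrorCodeFromMessage(ex.GetMessage()) to ASInt32.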

Question 8: Unicode strings. The PDFNet library uses a string class
called UString. Why not use std::wstring? Are there helper functions
to convert from UString to std::wstring and vice versa?

We are not using std::wstring because of portability issues. Some
platforms/compilers use 2 bytes to represent wchar_t, some use 4
bytes. Also the memory layout for std::string is not consistent even
for different versions of the same compiler. UString provides some
utility methods to convert to std::string (e.g. std::string
ConvertToUtf8(); std::string ConvertToAscii()) - and, if required, you
can definitely add any additional conversions.
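
As one example of such an additional conversion, the UTF-8 string returned by ConvertToUtf8() could be turned into a std::wstring along the following lines. This is a sketch, not part of PDFNet: Utf8ToWide is a hypothetical helper, malformed input is replaced with U+FFFD, and overlong encodings are not rejected. It handles both 2-byte wchar_t (UTF-16, as on Windows) and 4-byte wchar_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts UTF-8 text to std::wstring. Malformed or truncated
// sequences become U+FFFD.
inline std::wstring Utf8ToWide(const std::string& in)
{
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ) {
        const unsigned char c = static_cast<unsigned char>(in[i++]);
        unsigned long cp;   // decoded code point
        int extra;          // number of continuation bytes expected
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }
        for (; extra > 0; --extra) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;   // truncated or invalid continuation byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }
        if (cp > 0x10FFFF) cp = 0xFFFD;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;     // encode as a UTF-16 surrogate pair
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

On Windows, MultiByteToWideChar with CP_UTF8 does the same job; the portable version is shown because wchar_t differs between platforms.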

Question 9. How to compare Obj instances? Is this code correct?

Your function is on the right track. Depending on the semantics of
CosObjEqual, you may need to comment out the IsIndirect() lines. For
example:

ASBool CosObjEqual(CosObj cosObj, CosObj cosObj2) {
    if (cosObj==cosObj2) return true;
    if (cosObj==NULL || cosObj2==NULL) return false;
    return cosObj->IsEqual(*cosObj2);
}

Question 11. Can I delete multiple pages from a document in one step?
Currently I do this:

Yes, you can delete multiple pages in one loop. Your code seems to be
roughly correct (you may want to check that the page iterator actually
exists, but the idea is ok).
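
The index handling in that loop, including the range check, can be illustrated without PDFNet. In the sketch below a std::vector<int> stands in for the document's page sequence and DeletePageRange is a hypothetical helper; with PDFNet each erase would instead be a PageRemove() call on a freshly obtained iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Removes the zero-based, inclusive page range [first, last] from a
// page list after clamping the range to the pages that really exist.
// Returns the number of pages actually removed.
inline int DeletePageRange(std::vector<int>& pages, int first, int last)
{
    const int count = static_cast<int>(pages.size());
    first = std::max(first, 0);
    last  = std::min(last, count - 1);
    int removed = 0;
    // After each removal the following pages shift down by one, so the
    // same index is removed repeatedly.
    for (int n = last - first + 1; n > 0; --n) {
        pages.erase(pages.begin() + first);
        ++removed;
    }
    assert(removed <= count);
    return removed;
}
```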

Question 12. How can I copy multiple pages from one document into
another?

You can copy multiple pages from one PDF document into another as
shown in the last code snippet in the PDFPage sample
(http://www.pdftron.com/pdfnet/samplecode.html#PDFPage):

// In C++ …
PDFDoc in_doc((input_path + "newsletter.pdf").c_str());
in_doc.InitSecurityHandler();
PDFDoc new_doc;

vector<Page> copy_pages;
for (PageIterator itr=in_doc.GetPageIterator(); itr.HasNext(); itr.Next()) {
    copy_pages.push_back(itr.Current());
}

vector<Page> imported_pages = new_doc.ImportPages(copy_pages);
vector<Page>::iterator i;
for (i=imported_pages.begin(); i!=imported_pages.end(); ++i) {
    new_doc.PagePushFront(*i); // Order pages in reverse order.
    // Use PagePushBack() if you would like to preserve the same order.
}

new_doc.Save((output_path + "newsletter_import_pages.pdf").c_str(), 0, NULL);
//--------------------

So you create a vector of source pages from the source document, then
call doc.ImportPages, and then place imported pages at the desired
location within the destination page sequence.

How can I check if doc and doc2 point to different files?

There are a couple of options. You could compare input file names. If
you need to compare two different files on the disk (or memory) you
would need to compare their data (e.g. using memcmp or similar).
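
For the data comparison, a minimal sketch could look like this. SameBytes is a hypothetical helper; it assumes both files (or memory buffers) have already been read into byte vectors, and it only tells you whether the bytes are identical, not whether two different byte streams describe the same document.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Returns true if both buffers hold exactly the same bytes.
inline bool SameBytes(const std::vector<unsigned char>& a,
                      const std::vector<unsigned char>& b)
{
    if (a.size() != b.size()) return false;   // different length
    if (a.empty()) return true;               // two empty buffers
    return std::memcmp(&a[0], &b[0], a.size()) == 0;
}
```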

Question 13. Is there a good way to create a temporary copy of a
currently opened document? Currently we do something like this:

PDDoc PDDocCreate() {
return new pdftron::PDF::PDFDoc();
}
PDDoc pdoc_new=PDDocCreate();
PDDocInsertPages(pdoc_new, PDBeforeFirstPage, pddoc, 0, PDAllPages,
PDInsertAll, NULL, NULL, NULL, NULL);

Will this copy all embedded fonts and copy annotations correctly?

This approach will copy all embedded fonts and annotations, but the
result is not guaranteed to be an exact replica of the original
document (e.g. some custom properties, unreferenced objects, and
metadata may not be copied).

Instead you could clone the original document in a memory buffer (or
you could use pdfdoc.Save(mem_buf) if the document is dynamic) and
then load another instance of that document using PDFDoc(mem_buf,
mem_buf_sz).

Question 14. How do I insert an empty new page?

To create a blank new page use pdfdoc.PageCreate(). For example:

pdftron::PDF::Rect media_box = …;
pdftron::PDF::Page page = pdfdoc.PageCreate(media_box);
pdfdoc.PagePushBack(page); // or pdfdoc.PageInsert(it, page) to insert before iterator 'it'

Aaron_Gravesdale · September 10, 2009, 5:25pm

Q: Is it possible to use the PDFNet SDK from just C code rather than
C++? As the vast majority of our existing code is in C, this could
make the porting much easier.
------
A: The C API is located under 'PDFNet\Headers\C'. You can use C API
directly, however C++ API is easier to use and there is less code to
write (e.g. there is no need to manually free allocated resource, to
check for error in each function call, etc). Because the C++ wrapper
is using C API you can take a look at its implementation (located
under 'Headers\Impl') as an example of how to use C API.

Aaron_Gravesdale · September 16, 2009, 10:55pm

Q: Could you provide a simple sample in C (not C++) that does
something like open a PDF and save it? I am struggling to see how to
get the equivalent of PDDocOpen(filename) from C working with your API
(although from C++ it is easy).

A: I didn’t find any ready-to-use C code snippet, but a direct
translation from the C++ sample gives the following calls:

#include <C/PDF/TRN_PDFNet.h>
#include <C/PDF/TRN_PDFDoc.h>
#include <assert.h>
#include <stdio.h>

// Open a PDF and print out its page count
int main(int argc, char *argv[])
{
    TRN_PDFDoc doc;
    int page_count = 0;

    // Every C API call returns a TRN_Exception; 0 means success.
    TRN_Exception ex=TRN_PDFNetInitialize(0);
    if(ex!=0) { assert(0); return 1; }

    // Note the escaped backslash in the path.
    ex=TRN_PDFDocCreateFromFilePath("c:\\my.pdf", &doc);
    if(ex!=0) { assert(0); return 1; }

    ex=TRN_PDFDocGetPageCount(doc, &page_count);
    if(ex!=0) { assert(0); return 1; }

    printf("%d\n", page_count);

    ex=TRN_PDFDocDestroy(doc);
    if(ex!=0) { assert(0); return 1; }

    TRN_PDFNetTerminate();
    return 0;
}

Aaron_Gravesdale · September 21, 2009, 10:42pm

Q: Thanks for the reply. I was having a problem opening an existing
file and now realise that it probably failed because I had an invalid
path to a file.

I have taken the code you sent and put it into a main.cpp in my test.
That works to read the page count. However, when saving I have now
found that my chosen test file (your Samples/TestFiles/newsletter.pdf)
does *not* get the Title field changed, whereas with a different PDF
(numbered.pdf) it does work. There is no security set in the original
newsletter.pdf, so why doesn't it work? I am attaching my code for you
to see and/or try out.
----------
A: We checked the output file using CosEdit
(http://www.pdftron.com/pdfcosedit) and the title is set properly.
The problem is that this file also contains an XMP metadata stream
which overrides the doc info properties. You could remove the XMP
stream as follows:

In C++:
doc.GetRoot().Erase("Metadata");

In pure C:
  TRN_Obj root;
  TRN_PDFDocGetRoot(doc, &root);
  TRN_ObjEraseFromKey(root, "Metadata");

Btw, when you are developing in "C" language you need to explicitly
destroy UStrings when they are no longer in use (otherwise you will
end up with memory leaks).
