How do I pick which PDF/A Conformance Level to convert to?

Question:

I am trying to figure out what PDF/A conformance level to use.

What if I wanted to convert a PDF file of version 1.7 to PDF/A-1 (level 1A or 1B) using the PDFACompliance class of the PDFNet SDK?

What happens?

  1. conversion will be unsuccessful?

  2. some information loss will occur during conversion?

  3. features that are not in PDF specification 1.4 will remain untouched?

Answer:

I take it you don’t have a requirement for a certain version of PDF/A?

If not, the following is a very simple breakdown.

PDF/A-1 : Corresponds to PDF version 1.4, but forbids, among other things:

  • Layers/Optional Content Groups

  • External Resources

  • Transparencies

  • Soft Masks

PDF/A-2 : Corresponds to PDF version 1.7 (ISO 32000-1).

PDF/A-3 : Essentially the same as PDF/A-2, but allows for the embedding of non-PDF/A files.
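As a rough illustration of the breakdown above, here is a minimal Python sketch that records only what this post says about each part. The names (`PdfaPart`, `PARTS`, `blocking_features`) and the feature labels are made up for the example and are not part of PDFNet. An empty `forbidden` set for PDF/A-2 and PDF/A-3 only means this post lists nothing for them, not that those standards forbid nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfaPart:
    name: str
    base_pdf_version: str
    forbidden: frozenset  # only the features named in this post
    allows_non_pdfa_attachments: bool


# Hypothetical lookup table built from the descriptions in this post.
PARTS = {
    1: PdfaPart(
        "PDF/A-1",
        "1.4",
        frozenset({"layers", "external_resources", "transparency", "soft_masks"}),
        False,
    ),
    2: PdfaPart("PDF/A-2", "1.7", frozenset(), False),
    3: PdfaPart("PDF/A-3", "1.7", frozenset(), True),
}


def blocking_features(part, used_features):
    """Return the features a document uses that this post lists as forbidden."""
    return sorted(PARTS[part].forbidden & set(used_features))
```

For example, `blocking_features(1, {"transparency", "forms"})` returns `["transparency"]`, which is the kind of feature a converter would have to change to reach PDF/A-1.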

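For each there are also different conformance levels.

A : Aims to ensure reliable rendering of the content as well as logical structure and textual information. The objective is that the document can be displayed reliably and consistently, and that its information is also accessible and can even be repurposed.

B : A relaxation of level A that aims only at reliable rendering. Does not require logical structure or valid textual (Unicode) information.

U : Introduced with PDF/A-2, level U is a middle ground between A and B. It requires valid Unicode for textual information, but does not require logical structure.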
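To make the relationship between the levels concrete, here is a small Python sketch of what each level requires according to this post, and which levels exist for each part. The requirement labels and function names are invented for illustration. Real conformance involves many more rules than these three labels.

```python
# What each level demands, as described in this post (labels are hypothetical).
LEVEL_REQUIREMENTS = {
    "A": {"reliable_rendering", "unicode_text", "logical_structure"},
    "U": {"reliable_rendering", "unicode_text"},
    "B": {"reliable_rendering"},
}

# Level U was introduced with PDF/A-2, so PDF/A-1 only has A and B.
# Levels are listed from strictest to most relaxed.
LEVELS_BY_PART = {1: ("A", "B"), 2: ("A", "U", "B"), 3: ("A", "U", "B")}


def is_valid_flavor(part, level):
    """True if the part/level combination exists, e.g. (2, "U")."""
    return level in LEVELS_BY_PART.get(part, ())


def strictest_level(part, satisfied):
    """Return the strictest level whose requirements are all satisfied, or None."""
    for level in LEVELS_BY_PART[part]:
        if LEVEL_REQUIREMENTS[level] <= set(satisfied):
            return level
    return None
```

A document with reliable rendering and trustworthy Unicode but no logical structure would land on level U for PDF/A-2, and on level B for PDF/A-1, where U does not exist.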

So what does this all mean?

Getting your typical PDF file to conform to PDF/A-1 will probably require the most changes, as transparency is pretty common. The file will also quite possibly grow in size, because PDF 1.4 lacks some of the compression features that later PDF versions added.

Reaching Level A conformance through automatic conversion is often not possible for the majority of PDF files out there, so automatic conversion will fail more often and will need extra (probably manual) pre-processing to pass. See this post for more info.
https://groups.google.com/d/msg/pdfnet-sdk/OgaZdEp1mtI/DRf0WtqJ2sMJ

So, given no other restrictions, I would recommend targeting PDF/A-2B. However, converting to PDF/A-1B is (almost) as likely to pass. If you are confident that the Unicode mapping of the text is correct, then try for PDF/A-2U.

With PDF/A-2 you will get smaller files and will most likely need the fewest changes, as it supports the newer PDF 1.7 spec (as well as the older 1.4). Targeting PDF/A-1 is more likely to force changes to the PDF and to produce a larger file, because fewer features for minimizing file size are available.
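The recommendation can be written down as a tiny decision helper. This is only a sketch of the reasoning in this post, not anything PDFNet provides. The function name and its parameters are hypothetical, and choosing PDF/A-3 when non-PDF/A attachments are needed is my reading of the PDF/A-3 description above.

```python
def recommend_flavor(required_part=None, unicode_trusted=False,
                     needs_non_pdfa_attachments=False):
    """Suggest a PDF/A flavor for automatic conversion of a typical PDF.

    Level A is never suggested here, because automatic conversion to
    Level A often fails without manual pre-processing.
    """
    if required_part is not None:
        part = required_part      # an external requirement wins
    elif needs_non_pdfa_attachments:
        part = 3                  # only PDF/A-3 allows embedding non-PDF/A files
    else:
        part = 2                  # default: fewest changes, smaller files
    # Level U only exists from PDF/A-2 onward.
    level = "U" if unicode_trusted and part != 1 else "B"
    return f"PDF/A-{part}{level}"
```

With no arguments this returns `"PDF/A-2B"`, and `recommend_flavor(unicode_trusted=True)` returns `"PDF/A-2U"`.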

As for the level, B is the easiest one to target, and it just ensures that rendering will be reliable (for example, by requiring fonts to be embedded). Level A requires everything in Level B plus a lot of semantic information that might not be present in the PDF. Given a typical PDF, converting to Level B is far more likely to succeed automatically than converting to Level A, which is more likely to fail and to need extra (probably manual) processing to pass. See this post for more info.
https://groups.google.com/d/msg/pdfnet-sdk/OgaZdEp1mtI/DRf0WtqJ2sMJ
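Since the stricter levels fail more often, one way to handle this in code is to try the flavors from strictest to most relaxed and keep the first one that passes. The sketch below assumes you supply a `try_convert(path, flavor)` callable that returns True on success. With PDFNet that callable would wrap the PDFACompliance class mentioned in the question, but the wrapper is not shown here and the flavor strings are just labels for the example.

```python
def convert_with_fallback(path, flavors, try_convert):
    """Try each flavor in order and stop at the first successful conversion.

    Returns (chosen_flavor, attempts), where chosen_flavor is None if every
    attempt failed and attempts is a list of (flavor, succeeded) pairs.
    """
    attempts = []
    for flavor in flavors:
        ok = bool(try_convert(path, flavor))
        attempts.append((flavor, ok))
        if ok:
            return flavor, attempts
    return None, attempts


def fake_try_convert(path, flavor):
    """Stand-in converter for the demo: pretend only Level B succeeds."""
    return flavor.endswith("B")


chosen, attempts = convert_with_fallback(
    "input.pdf", ["PDF/A-2A", "PDF/A-2U", "PDF/A-2B"], fake_try_convert
)
print(chosen)
```

Keeping the list of attempts makes it easy to log which stricter levels were tried before the converter settled on a more relaxed one.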


What if I wanted to convert a PDF file of version 1.7 to PDF/A-1 (level 1A or 1B) using the PDFACompliance class of the PDFNet SDK?

What happens?

  1. conversion will be unsuccessful?

Possibly, but PDFNet will do its best to make the file pass. 1B is the most likely to succeed.

  2. some information loss will occur during conversion?

Not that I am aware of, unless something would cause PDF/A-1 compliance to fail. Any such change would be listed in the conversion report.

  3. features that are not in PDF specification 1.4 will remain untouched?

If they don’t cause a PDF/A-1 failure, then yes, they remain untouched. Otherwise, the change would be logged in the report.

Please note that our PDF/A converter provides detailed messages explaining what it did.