How can I extract file attachments from a PDF document?


Apparently, file extraction has something to do with the FileSpec
class, but I couldn't work out how it relates to PDFDoc.

The code used to traverse low-level objects may look as follows
(most of it was taken from the JBIG2 sample project):

In C#:

Doc cos_doc = doc.GetSDFDoc();
int num_objs = cos_doc.XRefSize();

for (int i=1; i<num_objs; ++i) {
  Obj obj = cos_doc.GetObj(i);
  if (obj!=null && !obj.IsFree() && obj.IsDict()) {
    // Process only file specification objects.
    // The PDF specification spells the type name 'Filespec'.
    DictIterator itr = obj.Find("Type");
    if (itr == obj.DictEnd() || itr.Value().GetName() != "Filespec")
      continue;
    FileSpec file_spec = new FileSpec(obj);
    // ... use file_spec here
  }
}

To extract files from 'File Attachment Annotations' you could use the
following code (similar to the code in the Annotation sample project):

PageIterator end = doc.PageEnd();
for (PageIterator itr = doc.PageBegin(); itr!=end; itr.Next()) {
  Page page = itr.Current();
  int num_annots = page.GetNumAnnots();
  for (int i=0; i<num_annots; ++i) {
    Annot annot = page.GetAnnot(i);
    if (annot.GetType() == Annot.Type.e_FileAttachment) {
      FileSpec file_spec = annot.GetFileAttachment();
      // ... use file_spec here
    }
  }
}

Embedded file streams can also be associated with the document as a
whole through the EmbeddedFiles entry in the PDF document's name
dictionary. The associated name tree maps name strings to file
specifications that refer to embedded file streams through their 'EF'
entries.
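
To make that structure concrete, here is a minimal, self-contained C# sketch of such a name tree. It does not use PDFNet: NameTreeNode, EmbeddedFilesDemo and the other names are hypothetical stand-ins, and the embedded stream is modelled as a plain byte array. It only illustrates how names map to file specifications and how the 'EF' dictionary leads to the embedded data.

```csharp
using System;
using System.Collections.Generic;
using FileSpecDict = System.Collections.Generic.Dictionary<string, object>;

// Hypothetical stand-in for a name tree node (not a PDFNet class).
// Intermediate nodes hold Kids; leaf nodes hold Names, modelled here as
// (key, value) pairs instead of the flat [key value key value ...] array.
class NameTreeNode
{
    public List<NameTreeNode> Kids = new List<NameTreeNode>();
    public List<KeyValuePair<string, FileSpecDict>> Names =
        new List<KeyValuePair<string, FileSpecDict>>();
}

static class EmbeddedFilesDemo
{
    // Depth-first walk yielding every (name, file specification) pair.
    public static IEnumerable<KeyValuePair<string, FileSpecDict>> Walk(NameTreeNode node)
    {
        foreach (var entry in node.Names)
            yield return entry;
        foreach (var kid in node.Kids)
            foreach (var entry in Walk(kid))
                yield return entry;
    }

    // Follow the EF dictionary of a file specification to the embedded data.
    // Returns null if there is no EF/F entry (the spec refers to an external file).
    public static byte[] GetEmbeddedBytes(FileSpecDict fileSpec)
    {
        object ef = null, stream = null;
        if (!fileSpec.TryGetValue("EF", out ef)) return null;
        var efDict = ef as FileSpecDict;
        if (efDict == null || !efDict.TryGetValue("F", out stream)) return null;
        return stream as byte[];
    }

    // Build a file specification dictionary with an embedded stream.
    public static FileSpecDict MakeFileSpec(string fileName, byte[] data)
    {
        var spec = new FileSpecDict();
        spec["Type"] = "Filespec";
        spec["F"] = fileName;
        var ef = new FileSpecDict();
        ef["F"] = data;
        spec["EF"] = ef;
        return spec;
    }

    // Build a small sample tree: a root with two leaf nodes.
    public static NameTreeNode BuildSample()
    {
        var leaf1 = new NameTreeNode();
        leaf1.Names.Add(new KeyValuePair<string, FileSpecDict>(
            "notes", MakeFileSpec("notes.txt", new byte[] { 1, 2, 3 })));
        var leaf2 = new NameTreeNode();
        leaf2.Names.Add(new KeyValuePair<string, FileSpecDict>(
            "report", MakeFileSpec("report.csv", new byte[] { 4, 5 })));
        var root = new NameTreeNode();
        root.Kids.Add(leaf1);
        root.Kids.Add(leaf2);
        return root;
    }

    public static void Main()
    {
        foreach (var entry in Walk(BuildSample()))
        {
            byte[] data = GetEmbeddedBytes(entry.Value);
            Console.WriteLine("{0} -> {1} ({2} bytes)",
                entry.Key, entry.Value["F"], data == null ? 0 : data.Length);
        }
    }
}
```

Running the sketch lists the two sample entries, "notes" and "report", with the file name and byte count of each. In a real document the same roles are played by the name tree, the file specification dictionary and the embedded file stream.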

Using PDFNet, you can traverse the 'EmbeddedFiles' map with
pdftron.SDF.NameTree as follows:

PDFDoc doc = new PDFDoc("mypdf.pdf");

NameTree file_map = SDF.NameTree.Find(doc, "EmbeddedFiles");
if (!file_map.IsValid()) return;

// Traverse all entries in the NameTree...
SDF.NameTreeIterator end = file_map.End();
SDF.NameTreeIterator i = file_map.Begin();
for (; i!=end; ++i) {
  String key = i.Key().GetStr();
  Obj value = i.Value();
  FileSpec file_spec = new FileSpec(value);
  // ... use file_spec here
}

The code from the last part of the previous post is out of date. Below is the current way to parse file attachments.

NameTree file_map = NameTree.Find(doc, "EmbeddedFiles");
if (!file_map.IsValid()) return;

// Traverse all entries in the NameTree...
NameTreeIterator i = file_map.GetIterator();
for (; i.HasNext(); i.Next()) {
  String key = i.Key().GetAsPDFText();
  Obj value = i.Value();
  FileSpec file_spec = new FileSpec(value);
  Console.WriteLine(String.Format("{0} {1}", key, file_spec.GetFilePath()));
}