How do I extract file attachments from PDF?

Aaron_Gravesdale · November 16, 2007, 11:47pm

Q: How do I extract file attachments from PDF?
--------
A:
To extract data associated with 'File Attachment Annotations' you
could use the following code (similar to code from Annotation sample
project):

PageIterator end = doc.PageEnd();
for (PageIterator itr = doc.PageBegin(); itr != end; itr.Next()) {
  Page page = itr.Current();
  int num_annots = page.GetNumAnnots();
  for (int i = 0; i < num_annots; ++i) {
    Annot annot = page.GetAnnot(i);
    // Process only file attachment annotations...
    if (annot.GetType() == Annot.Type.e_FileAttachment) {
      FileSpec file_spec = annot.GetFileAttachment();
      Filter stm = file_spec.GetFileData();
      if (stm != null) {
        FilterReader reader = new FilterReader(stm);
        // use file_spec.GetFilePath() to get the filename...
        StdFile out_file = new StdFile("out.dat", StdFile.OpenMode.e_write_mode);
        FilterWriter writer = new FilterWriter(out_file);
        // Copy the embedded stream to the output file
        writer.WriteFilter(reader);
        writer.Flush();
        out_file.Close();
      }
    }
  }
}

Embedded file streams can also be associated with the document as a
whole through the EmbeddedFiles entry in the PDF document's name
dictionary. The associated name tree maps name strings to file
specifications that refer to embedded file streams through their 'EF'
entries.

Using PDFNet you can traverse the 'EmbeddedFiles' map using
pdftron.SDF.NameTree as follows:

PDFDoc doc = new PDFDoc("mypdf.pdf");
doc.InitSecurityHandler();

NameTree file_map = SDF.NameTree.Find(doc, "EmbeddedFiles");
if (!file_map.IsValid()) return;

// Traverse all entries in the NameTree...
SDF.NameTreeIterator end = file_map.End();
SDF.NameTreeIterator i = file_map.Begin();
for (; i!=end; ++i) {
  String key = i.Key().GetStr();
  Obj value = i.Value();
  FileSpec file_spec = new FileSpec(value);
  // ... same as above ...
}

Also, please see the following thread:

http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/21da509149023400

Ivanho · April 22, 2013, 9:27pm

Q:

In Adobe Reader, we can double-click the link to open the jpg file in Windows Photo Viewer. In our PDF Viewer (using PDFNet) and in your PDFViewer sample application, however, double-clicking the link only opens a small empty "Sticky Note" window. Are internal or external file links of this kind in a PDF document supported by PDFNet? Could we implement something using the PDFNet SDK so that the file link works?

A:

You can use the following code as a starting point to extract file attachments via PDFViewCtrl (WinForms):

protected override void OnMouseDoubleClick(System.Windows.Forms.MouseEventArgs e)
{
    // base.OnMouseDown(e); // First process the event in the base class
    int page_num = GetPageNumberFromScreenPt(e.X, e.Y);
    if (page_num < 1) return;
    // Find the click point in page coordinate system... is there a file attachment annotation at this point?
    double x = e.X, y = e.Y;
    ConvScreenPtToPagePt(ref x, ref y, page_num);
    Page page = GetDoc().GetPage(page_num);
    int annot_num = page.GetNumAnnots();
    for (int i = 0; i < annot_num; ++i)
    {
        Annot annot = page.GetAnnot(i);
        // Process only file attachment annotations...
        if (annot.IsValid() == false ||
            annot.GetType() != Annot.Type.e_FileAttachment) continue;
        Rect box = annot.GetRect();
        if (box.Contains(x, y)) {
            // Extract the file attachment ...
            // See https://groups.google.com/d/msg/pdfnet-sdk/gA8o_eKVG7c/kc0BsgEhif0J
            pdftron.PDF.Annots.FileAttachment fileAttachment = new pdftron.PDF.Annots.FileAttachment(annot);
            FileSpec file_spec = fileAttachment.GetFileSpec();
            using (Filter stm = file_spec.GetFileData()) {
                if (stm != null) {
                    FilterReader reader = new FilterReader(stm);
                    // use file_spec.GetFilePath() to get the filename...
                    using (StdFile out_file = new StdFile("c:/1.jpg", StdFile.OpenMode.e_write_mode)) {
                        FilterWriter writer = new pdftron.Filters.FilterWriter(out_file);
                        writer.WriteFilter(reader);
                        writer.Flush();
                    }
                    // Launch the attachment in an external viewer ... ?
                    try {
                        System.Diagnostics.Process.Start("c:/1.jpg");
                    }
                    catch (System.ComponentModel.Win32Exception noBrowser) {
                        if (noBrowser.ErrorCode == -2147467259)
                            MessageBox.Show(noBrowser.Message);
                    }
                    catch (System.Exception other) {
                        MessageBox.Show(other.Message);
                    }
                }
            }
        }
    }
}

Please keep in mind that extracting and executing file attachments from a PDF is a potentially dangerous operation (since embedded files may also contain executables, viruses, etc.).
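One way to reduce that risk when the output name is taken from file_spec.GetFilePath() is to strip any directory part and replace invalid characters before writing to disk. The following is only a minimal sketch using plain .NET (System.IO), not anything provided by PDFNet: the AttachmentNames class, the ToSafeFileName helper, the "attachment.dat" fallback and the "pdf-attachments" folder are all hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

static class AttachmentNames
{
    // Hypothetical helper (not part of PDFNet): reduce an embedded file path
    // to a bare file name that can be combined with a chosen output directory.
    public static string ToSafeFileName(string embeddedPath, string fallback = "attachment.dat")
    {
        if (string.IsNullOrWhiteSpace(embeddedPath)) return fallback;

        // Drop any directory part, treating both '/' and '\' as separators
        // so that "..\..\name" cannot escape the output directory.
        int cut = embeddedPath.LastIndexOfAny(new[] { '/', '\\' });
        string name = embeddedPath.Substring(cut + 1).Trim();

        // Replace characters the current platform does not allow in file names.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');

        // Reject empty names and the special "." / ".." entries.
        if (name.Length == 0 || name == "." || name == "..") return fallback;
        return name;
    }

    static void Main()
    {
        // Assumed output location; any directory you control would do.
        string outputDir = Path.Combine(Path.GetTempPath(), "pdf-attachments");
        Directory.CreateDirectory(outputDir);

        // In the handler above, the argument would come from file_spec.GetFilePath().
        string target = Path.Combine(outputDir, ToSafeFileName("..\\..\\photo.jpg"));
        Console.WriteLine(target);
    }
}
```

This only guards the file name. Whether the extracted file should then be opened with Process.Start is a separate decision, for example by allowing only a short list of known-safe extensions.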

Daniel_Lutz1 · November 5, 2019, 2:44pm

Hello support,

The StdFile class no longer seems to exist. What is the best way today to save a file attachment to disk?

Best regards
Daniel

Ryan · November 6, 2019, 1:16am

Yes, the StdFile class is gone. Instead the Filter class has a WriteToFile method.
https://www.pdftron.com/api/PDFTronSDK/dotnet/pdftron.Filters.Filter.html#pdftron_Filters_Filter_WriteToFile_System_String_System_Boolean_

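Filter stm = file_spec.GetFileData();
// The linked signature is WriteToFile(String, Boolean); the bool is assumed here to be an append flag.
stm.WriteToFile(path, false);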