Product: PDFNetPython3
Product Version: demo
PDFDraw()/PDFRasterizer() is very slow!?
I need to render PDF pages to images in my project, and I am currently evaluating several libraries (xpdf, PyMuPDF, PDFTron).
My use case is:
- convert each page of the original and the modified PDF to an image
- compare the two images and display only the regions that differ in a GUI.
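For the comparison step, one possible approach is to take the per-pixel difference with Pillow's `ImageChops.difference` and report the fixed-size tiles that contain any non-zero pixel. This is a sketch that assumes both pages are rendered at the same DPI, so the images have identical sizes, and it uses Pillow, which the scripts below already import. `diff_tiles` and the 64-pixel tile size are hypothetical choices, not part of any of the libraries being compared.

```python
from PIL import Image, ImageChops


def diff_tiles(img_a, img_b, tile=64):
    """Return (left, top, right, bottom) boxes of tiles where the images differ.

    Hypothetical helper: assumes both images were rendered at the same DPI,
    so they have identical pixel sizes.
    """
    if img_a.size != img_b.size:
        raise ValueError("images must have the same size")
    # Normalize both images to RGB so their modes match and alpha is ignored.
    diff = ImageChops.difference(img_a.convert("RGB"), img_b.convert("RGB"))
    width, height = diff.size
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            box = (left, top, min(left + tile, width), min(top + tile, height))
            # getbbox() returns None when every pixel in the tile is zero (identical).
            if diff.crop(box).getbbox() is not None:
                boxes.append(box)
    return boxes


if __name__ == "__main__":
    original = Image.new("RGB", (200, 100), "white")
    modified = original.copy()
    modified.putpixel((70, 10), (0, 0, 0))  # one changed pixel
    print(diff_tiles(original, modified))  # [(64, 0, 128, 64)]
```

With a 216 dpi render, a 64-pixel tile covers about 7.5 mm of the page; smaller tiles give tighter regions at the cost of more crops.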
Test content:
convert a 20-page PDF file (A4 size) to images at 216 dpi.
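For a sense of scale: PDF user space has 72 units per inch, so rendering at 216 dpi scales each page by 216 / 72 = 3. Assuming an A4 page box of 595 x 842 pt (the usual rounding of 210 x 297 mm), each page becomes 1785 x 2526 pixels. The hypothetical `raster_size` helper below only repeats the arithmetic used in the PDFRasterizer code in this post.

```python
def raster_size(width_pt, height_pt, dpi):
    """Pixel size of a page rendered at `dpi` (PDF user space is 72 units per inch)."""
    scale = dpi / 72.0
    # Truncate with int(), as the PDFRasterizer code in this post does.
    return int(scale * width_pt), int(scale * height_pt)


# Assumption: A4 page box of 595 x 842 pt.
w, h = raster_size(595, 842, 216)
page_bytes = w * h * 4  # 4 bytes per pixel (e.g. a BGRA buffer)
print(w, h)                          # 1785 2526
print(page_bytes)                    # 18035640
print(round(20 * page_bytes / 1e6))  # 361 (MB for 20 pages)
```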
Test environment:
Mac mini (2018)
OS: macOS Big Sur
CPU: Intel Core i3 3.6 GHz
Memory: 16 GB
Hard disk: 128 GB SSD
Test results (time taken, in seconds):
PyMuPDF: 0.25 s
Xpdf: 0.7 s
PDFTron (PDFDraw/PDFRasterizer): 10.5 s (about 40 times slower than PyMuPDF)
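The three loops below do not measure quite the same work: the PyMuPDF loop also wraps each pixmap in a PIL image, while the PDFTron loops leave that conversion commented out, and each figure comes from a single run that includes any one-time start-up cost. A small, library-agnostic timing helper such as the hypothetical `time_render` below (standard library only) could make the comparison more even: it does one warm-up render, times several full passes with `time.perf_counter`, and reports the median. It is a sketch, not a rigorous benchmark.

```python
import statistics
import time


def time_render(render_page, page_count, repeats=3):
    """Median wall-clock seconds to render `page_count` pages (hypothetical helper).

    `render_page(i)` is any callable that renders page i (0-based) and
    returns the finished image, so each library does the same amount of work.
    """
    render_page(0)  # warm-up call: keeps one-time start-up cost out of the timings
    totals = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        for i in range(page_count):
            render_page(i)
        totals.append(time.perf_counter() - start)
    return statistics.median(totals)


if __name__ == "__main__":
    # Stand-in renderer so the sketch runs without any PDF library installed.
    print(time_render(lambda i: bytes(1000), page_count=20))
```

For example, the PyMuPDF loop below could be timed as `time_render(lambda i: doc[i].get_pixmap(matrix=ppi), doc.page_count)`, with each library's callable doing the same steps (render plus, if needed, conversion to PIL).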
Below is my PDFTron code:
from datetime import datetime
from PDFNetPython3 import *
from PIL import Image
PDFNet.Initialize( "demo:******")
doc = PDFDoc("20P.pdf")
doc.InitSecurityHandler()
# ---------- pdftron draw
draw = PDFDraw()
draw.SetDPI(216)
begin = datetime.now()
for i in range(1, doc.GetPageCount() + 1):
    pg = doc.GetPage(i)
    bm = draw.GetBitmap(pg, PDFDraw.e_rgb)
    # a = Image.frombuffer('RGB', (bm.width, bm.height), bytes(bm.GetBuffer()))
print(datetime.now() - begin)
# a.show()
# ---------- pdftron Rasterize
box = Page.e_crop
begin = datetime.now()
for i in range(1, doc.GetPageCount() + 1):
    pg = doc.GetPage(i)
    mtx = pg.GetDefaultMatrix(True, box)
    pg_w = pg.GetPageWidth(box)
    pg_h = pg.GetPageHeight(box)
    # Scale matrix from PDF space to buffer space
    dpi = 216
    scale = dpi / 72.0  # PDF space is 72 dpi
    buf_w = int(scale * pg_w)
    buf_h = int(scale * pg_h)
    bytes_per_pixel = 4  # BGRA buffer
    mtx = Matrix2D(scale, 0, 0, scale, 0, 0).Multiply(mtx)
    rast = PDFRasterizer()
    buf = rast.Rasterize(pg, buf_w, buf_h, buf_w * bytes_per_pixel,
                         bytes_per_pixel, False, mtx)
    # 'raw', 'BGRA' tells PIL the byte order of the BGRA buffer
    # a = Image.frombuffer('RGBA', (buf_w, buf_h), bytes(buf), 'raw', 'BGRA', 0, 1)
print(datetime.now() - begin)
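The commented-out conversion lines matter once the images are actually used. If the PDFRasterizer buffer really is BGRA, as the comment in the code says, `Image.frombuffer('RGBA', ...)` with the default raw mode would swap red and blue; Pillow's raw decoder can instead be told the byte order and the row stride. The hypothetical helper below is a sketch of that. Whether a given PDFNet bitmap pads its rows (stride larger than width x bytes per pixel) is an assumption to check against the PDFNet documentation.

```python
from PIL import Image


def buffer_to_image(buf, width, height, stride, rawmode):
    """Wrap a raw pixel buffer in a PIL image (hypothetical helper).

    rawmode is the byte order in `buf`: "BGRA" or "RGBA" for 4-byte pixels,
    "RGB" or "BGR" for 3-byte pixels. stride is the number of bytes per row,
    which may exceed width * bytes_per_pixel if rows are padded.
    """
    mode = "RGBA" if len(rawmode) == 4 else "RGB"
    # Arguments after "raw" are (rawmode, stride, orientation); 1 = top row first.
    return Image.frombuffer(mode, (width, height), bytes(buf), "raw", rawmode, stride, 1)


if __name__ == "__main__":
    # Two BGRA pixels: (B, G, R, A) = (10, 20, 30, 255) and (40, 50, 60, 128).
    bgra = bytes([10, 20, 30, 255, 40, 50, 60, 128])
    img = buffer_to_image(bgra, 2, 1, 8, "BGRA")
    print(img.getpixel((0, 0)))  # (30, 20, 10, 255)
```

For the rasterizer loop this would be called as `buffer_to_image(buf, buf_w, buf_h, buf_w * bytes_per_pixel, "BGRA")`.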
PyMuPDF code:
from datetime import datetime
import fitz
from PIL import Image
doc = fitz.open("20P.pdf")
ppi = fitz.Matrix(3, 3) # 72*3=216ppi
begin = datetime.now()
for p in range(doc.page_count):
    pm = doc[p].get_pixmap(matrix=ppi)
    pil = Image.frombuffer('RGB', (pm.width, pm.height), pm.samples)
print(datetime.now() - begin)
Xpdf (pyxpdf) code:
from pyxpdf import Document
from pyxpdf.xpdf import RawImageOutput
from datetime import datetime
doc = Document("20P.pdf")
imgs = RawImageOutput(doc, mode="RGBA", resolution_x=216, resolution_y=216)
begin = datetime.now()
for i in range(doc.num_pages):
    pil = imgs.get(i)
print(datetime.now() - begin)
Sorry for my poor English. Am I testing this the wrong way?
Thanks very much.