ermansay.in

Extract images from PDF

21 Sep 2022

A while ago, I came across a small graphic novel in a PDF file, and I needed the pages as regular image files.

Here's a quick way to extract bitmap images from PDFs, using Python.

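# Install dependency (run in a shell)
pip install pymupdf

# Python: save every embedded image in its native format
import fitz  # PyMuPDF

doc = fitz.open("/path/to/file.pdf")
for i in range(len(doc)):
    for img in doc.get_page_images(i):
        xref = img[0]  # cross-reference number of the image object
        info = doc.extract_image(xref)  # dict with the raw bytes and metadata

        # info["ext"] is the real format (often "jpeg"), so don't hardcode ".png"
        with open("p%s-%s.%s" % (i, xref, info["ext"]), "wb") as imgout:
            imgout.write(info["image"])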