Extract images from PDF

    21 Sep 2022

    A while ago, I came across a small graphic novel in a PDF file, and I needed the pages as regular image files.

    Here's a quick way to extract the bitmap images embedded in a PDF, using Python and PyMuPDF.

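    # Install the dependency (run this in a shell, not in Python)
    pip install pymupdf

    # Python
    import fitz  # PyMuPDF

    doc = fitz.open("/path/to/file.pdf")
    for i in range(len(doc)):
        # get_page_images() lists the images referenced by page i
        for img in doc.get_page_images(i):
            xref = img[0]  # cross-reference number of the image object
            info = doc.extract_image(xref)  # dict with the raw bytes and the format

            # Use the image's real extension (png, jpeg, ...) instead of assuming PNG
            with open("p%s-%s.%s" % (i, xref, info["ext"]), "wb") as imgout:
                imgout.write(info["image"])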
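
    If the same image is reused on several pages (a logo or a background, say), the loop above writes it once per page. Here is a minimal sketch of a reusable variant that skips repeats. The function names, the `out_dir` parameter and the zero-padded, 1-based file names are my own choices for illustration and not part of PyMuPDF. The only PyMuPDF calls used are `fitz.open`, `get_page_images`, `extract_image` and `close`.

```python
import os


def extract_images(doc, out_dir="."):
    """Write each embedded image of `doc` once and return the written paths.

    `doc` only needs to behave like a PyMuPDF document: len(doc),
    doc.get_page_images(page_index) and doc.extract_image(xref).
    """
    os.makedirs(out_dir, exist_ok=True)
    seen = set()  # xrefs already written; one image can appear on many pages
    written = []
    for page_index in range(len(doc)):
        for img in doc.get_page_images(page_index):
            xref = img[0]
            if xref in seen:
                continue
            seen.add(xref)
            info = doc.extract_image(xref)
            # Hypothetical naming scheme: 1-based, zero-padded page number
            # so the files sort in reading order.
            name = "p%03d-%d.%s" % (page_index + 1, xref, info["ext"])
            path = os.path.join(out_dir, name)
            with open(path, "wb") as out:
                out.write(info["image"])
            written.append(path)
    return written


def extract_images_from_pdf(pdf_path, out_dir="."):
    """Open `pdf_path` with PyMuPDF and extract its images into `out_dir`."""
    import fitz  # PyMuPDF; imported here so extract_images() has no hard dependency

    doc = fitz.open(pdf_path)
    try:
        return extract_images(doc, out_dir)
    finally:
        doc.close()
```

    Calling `extract_images_from_pdf("/path/to/file.pdf", "pages")` would then write the files into a `pages` directory and return their paths. Each image is named after the first page it appears on.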