@jeremy-code
Created August 23, 2025 03:13
Extract all images from a PDF page using PDF.js
import { OPS, type PDFPageProxy, type ImageKind } from "pdfjs-dist/legacy/build/pdf.mjs";

// https://github.com/mozilla/pdf.js/blob/master/src/core/image.js#L698
type ImageObject = {
  width: number;
  height: number;
  interpolate: undefined;
  kind: (typeof ImageKind)[keyof typeof ImageKind];
  data: Uint8Array | Uint8ClampedArray;
  dataLen: number;
  ref: string;
};

/**
 * Extracts {@link ImageObject} instances from a PDF page from PDF.js.
 *
 * The result is keyed by the image's object ID. An entry is `null` when the
 * image is not resolved in either `page.objs` or `page.commonObjs`.
 */
const extractImagesFromPage = async (page: PDFPageProxy) => {
  const operatorList = await page.getOperatorList();

  const images = operatorList.fnArray.reduce<Record<string, ImageObject | null>>(
    (acc, fn, index) => {
      if (fn === OPS.paintImageXObject) {
        // The first argument of paintImageXObject is the image's object ID.
        const imageName: string = operatorList.argsArray[index][0];

        // has() is checked first because get() throws for unresolved objects.
        const image: ImageObject | null = page.objs.has(imageName)
          ? page.objs.get(imageName)
          : page.commonObjs.has(imageName)
            ? page.commonObjs.get(imageName)
            : null;

        acc[imageName] = image;
      }
      return acc;
    },
    {}
  );

  return images;
};

export { extractImagesFromPage };
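
The `data` field of an extracted image is laid out according to its `kind`, so it usually has to be expanded to RGBA before it can be drawn or encoded. The following is a minimal sketch of that step, not part of the gist. `toRgba` and `DecodedImage` are hypothetical names, and the local `ImageKind` object copies the numeric values of the `ImageKind` export from `pdfjs-dist` so the snippet has no dependencies. It assumes that in 1-bit images each row is padded to a whole byte and a set bit means white.

```typescript
// Local copy of the numeric values of pdfjs-dist's `ImageKind` (assumption:
// these match the installed version; import the real export where possible).
const ImageKind = {
  GRAYSCALE_1BPP: 1,
  RGB_24BPP: 2,
  RGBA_32BPP: 3,
} as const;

// Hypothetical subset of the fields extractImagesFromPage returns per image.
type DecodedImage = {
  width: number;
  height: number;
  kind: (typeof ImageKind)[keyof typeof ImageKind];
  data: Uint8Array | Uint8ClampedArray;
};

// Hypothetical helper: expands the pixel data to 4 bytes per pixel (RGBA).
const toRgba = ({ width, height, kind, data }: DecodedImage): Uint8ClampedArray => {
  const rgba = new Uint8ClampedArray(width * height * 4);

  switch (kind) {
    case ImageKind.RGBA_32BPP:
      // Already RGBA: copy the pixels as they are.
      rgba.set(data.subarray(0, rgba.length));
      break;
    case ImageKind.RGB_24BPP:
      // 3 bytes per pixel: copy R, G, B and add an opaque alpha channel.
      for (let src = 0, dest = 0; dest < rgba.length; src += 3, dest += 4) {
        rgba[dest] = data[src];
        rgba[dest + 1] = data[src + 1];
        rgba[dest + 2] = data[src + 2];
        rgba[dest + 3] = 255;
      }
      break;
    case ImageKind.GRAYSCALE_1BPP: {
      // 1 bit per pixel, most significant bit first, rows padded to a byte.
      const rowBytes = (width + 7) >> 3;
      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          const bit = (data[y * rowBytes + (x >> 3)] >> (7 - (x & 7))) & 1;
          const value = bit ? 255 : 0; // assumption: 1 = white, 0 = black
          const dest = (y * width + x) * 4;
          rgba[dest] = value;
          rgba[dest + 1] = value;
          rgba[dest + 2] = value;
          rgba[dest + 3] = 255;
        }
      }
      break;
    }
  }

  return rgba;
};
```

In a browser, the result could be wrapped with `new ImageData(rgba, width, height)` and drawn with `putImageData`. Entries that are `null` should be skipped before calling the helper.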