Convert PDF to Images using AWS Lambda with Node.js

andyyou
Jun 20, 2021 · 3 min read

The requirement was to convert a PDF to images, and I hoped it could be done with a simple solution. That is why my first thought was AWS Lambda. If you google it, you will find a lot of articles that implement this with Python. Unfortunately, I am not good at Python, which is what led to this post, or rather this note. I hope it can help you too.

WARNING: This is not a 100% perfect solution. It does NOT support huge files; you probably cannot handle files larger than 500MB. If you find a better way, please share it with me :)

Create an AWS Lambda Function

Log in to your AWS console and go to Lambda, then create a function with a name and the default settings.
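If you prefer the CLI, creating the function looks roughly like the command below. The function name, runtime, and execution role ARN are placeholders, and the zip file is the deployment package we build at the end of this post:

$ aws lambda create-function \
    --function-name pdf-to-images \
    --runtime nodejs14.x \
    --handler index.handler \
    --role arn:aws:iam::123456789012:role/your-lambda-execution-role \
    --zip-file fileb://function.zip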

Add AWS Layers

Before we start coding, we should finish the setup. We need to add two layers (ImageMagick and Ghostscript, which gm relies on to render PDFs) so that our libraries work without errors.

Deploy them to your region and keep the ARNs; we will use them later.

Go back to the function and add the layers by specifying their ARNs.
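For reference, attaching the layers from the CLI would look roughly like this; the layer ARNs below are only placeholders for the ones you deployed in the previous step:

$ aws lambda update-function-configuration \
    --function-name pdf-to-images \
    --layers arn:aws:lambda:us-east-1:123456789012:layer:image-magick:1 \
             arn:aws:lambda:us-east-1:123456789012:layer:ghostscript:1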

Create Project

$ mkdir pdf-to-images
$ cd pdf-to-images
$ npm init --yes
$ npm i pdf-lib gm aws-sdk
$ touch index.js

Code Snippet

const AWS = require('aws-sdk');
const gm = require('gm').subClass({ imageMagick: true });
const s3 = new AWS.S3();
const PDFDocument = require('pdf-lib').PDFDocument;

// Convert a single PDF page (by index) to a JPEG and upload it to S3.
const convert = (body, index, bucket, dist) => {
  return new Promise((resolve, reject) => {
    console.log(`gm process started: page ${index}.`);
    gm(body, `pdf.pdf[${index}]`)
      .resize(1536)
      .density(200)
      .quality(80)
      .setFormat('jpeg')
      .stream((error, stdout, stderr) => {
        if (error) {
          console.log('gm conversion process error::1');
          return reject(error);
        }
        // Collect the converted image from the gm output stream in memory,
        // so we never touch /tmp.
        const chunks = [];
        stdout.on('data', (chunk) => {
          chunks.push(chunk);
        });
        stdout.on('end', () => {
          console.log(`gm process complete: page ${index}.`);
          const buffer = Buffer.concat(chunks);
          s3.putObject({
            Bucket: bucket,
            Key: `${dist}/${index}.jpeg`,
            ContentType: 'image/jpeg',
            Body: buffer,
            ACL: 'public-read',
          }, (error, data) => {
            if (error) {
              console.log('gm conversion process error::2');
              return reject(error);
            }
            resolve();
          });
        });
        stderr.on('data', (data) => {
          console.log('stderr:', data);
        });
      });
  });
};

async function handler(event, context) {
  try {
    console.log('starting converting process...');
    console.log('downloading PDF...');
    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const pdf = await s3.getObject({
      Bucket: bucket,
      Key: key,
    }).promise();
    console.log('converting PDF to images...');
    // Use pdf-lib only to count the pages; gm does the actual rendering.
    const pdoc = await PDFDocument.load(pdf.Body);
    const pageCount = pdoc.getPageCount();
    const pages = Array.from({ length: pageCount }).map((_, i) => i);
    // Convert pages one by one to keep memory usage down.
    for (const page of pages) {
      await convert(pdf.Body, page, bucket, key);
    }
    return {
      statusCode: 200,
      message: 'Success',
    };
  } catch (error) {
    console.error(JSON.stringify(error));
    return {
      statusCode: 400,
      message: 'Failed',
    };
  }
}

exports.handler = handler;
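The handler expects an S3 put event, so remember to add an S3 trigger on your bucket, or invoke the function manually. A minimal test payload for the Lambda console could look like the following, where the bucket name and key are placeholders for your own:

{
  "Records": [
    {
      "s3": {
        "bucket": { "name": "your-bucket-name" },
        "object": { "key": "uploads/sample.pdf" }
      }
    }
  ]
}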

Finally, zip the contents of the project folder (including node_modules) and upload it to the function.
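Assuming the function name from earlier, the zip and upload can be done from inside the project folder like this:

$ zip -r function.zip .
$ aws lambda update-function-code \
    --function-name pdf-to-images \
    --zip-file fileb://function.zip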

Conclusion

As I said, it is not a perfect solution, but it avoids using /tmp, which has a 512MB limitation. Still, I have to warn you: if you try to handle huge files, you will end up with a lot of 0B files. I am also not an AWS expert, so if you have a better or simpler solution, you are welcome to share it.
