Using AWS Lambda for Image Processing

AWS Lambda is a very useful AWS service for tasks that require more computing power, such as image processing. Especially if you are a startup and do not want to spend too much money upfront reserving a high-capacity EC2 instance, Lambda is your best choice. If your company has grown and needs to process many images in a short time, Lambda still comes to the rescue with its automagic, demand-based scaling. With AWS Lambda, scaling is no longer a pain point because AWS takes care of it.

With the free compute time offered every month, we can start using Lambda at zero cost and keep using it until the Lambda bill exceeds the cost of owning an EC2 instance or your own server. In fact, we can squeeze more juice out of it with a few clever techniques (which I will describe towards the end of this article), but everything has a limit.

Before starting, I would like to reiterate that Lambda is not a solution to every problem. Flynn recently posted an article about Lambda not being ready for prime time yet, and I agree with his points. On the other hand, Lambda is capable of solving a handful of problems that Flynn did not mention; he covered only one side. I am sure Lambda will continue to evolve and solve many more categories of problems in the future.

Enough theory; let's get our hands dirty.

Let's take an example where we want to resize images stored in S3 to several dimensions and upload all the resized images back to S3. Let's assume the client will send a request JSON containing an array of images and, for each image, an array of dimensions to resize it to, as shown below.

{
  "bucket": "my-images-bucket",
  "image_list": [
    {
      "src_key": "images/headerimage.jpg",
      "resize_options": [
        {
          "dest_key": "images/headerimage_250X250_0.jpg",
          "width": 250,
          "height": 250
        },
        {
          "dest_key": "images/headerimage_500X500_0.jpg",
          "width": 500,
          "height": 500
        }
      ]
    },
    {
      "src_key": "images/cover.jpg",
      "resize_options": [
        {
          "dest_key": "images/cover_250X250_1.jpg",
          "width": 250,
          "height": 250
        },
        {
          "dest_key": "images/cover_500X500_1.jpg",
          "width": 500,
          "height": 500
        }
      ]
    }
  ]
}

I am passing the S3 bucket name and image_list (an array of objects) in the request JSON. Each entry in image_list contains a src_key, which is the key (filename) of the image to be resized in the S3 bucket, and an array of resize_options. Each resize_option specifies the width and height (in pixels) to which the image has to be resized, and a dest_key, which is the key under which the resized image is stored in the S3 bucket. In short, the client dictates the list of images to be resized, the dimensions, and the file names under which to store the results in S3. The Lambda function just takes care of the processing without worrying about the client or its business logic.
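To make the structure concrete, here is a minimal sketch that checks a request of this shape and flattens it into one job per (source image, target size) pair, which is the unit of work the Lambda function ends up processing. The helper `flattenRequest` and its validation rules are hypothetical and not part of the function below; the sketch assumes only the field names used in the request JSON.

```javascript
'use strict';

// Hypothetical helper: turns the nested request into a flat list of resize jobs.
// Throws if a required field is missing or a dimension is not a positive integer.
function flattenRequest(request) {
  if (typeof request.bucket !== 'string' || request.bucket === '') {
    throw new Error('bucket is required');
  }
  if (!Array.isArray(request.image_list)) {
    throw new Error('image_list must be an array');
  }
  const jobs = [];
  for (const image of request.image_list) {
    if (typeof image.src_key !== 'string') {
      throw new Error('src_key is required for every image');
    }
    for (const option of image.resize_options || []) {
      for (const dim of ['width', 'height']) {
        if (!Number.isInteger(option[dim]) || option[dim] <= 0) {
          throw new Error(`${dim} must be a positive integer for ${option.dest_key}`);
        }
      }
      jobs.push({
        bucket: request.bucket,
        src_key: image.src_key,
        dest_key: option.dest_key,
        width: option.width,
        height: option.height,
      });
    }
  }
  return jobs;
}

// Usage with a request of the same shape as above (one image, two sizes)
const jobs = flattenRequest({
  bucket: 'my-images-bucket',
  image_list: [
    {
      src_key: 'images/cover.jpg',
      resize_options: [
        { dest_key: 'images/cover_250X250_1.jpg', width: 250, height: 250 },
        { dest_key: 'images/cover_500X500_1.jpg', width: 500, height: 500 },
      ],
    },
  ],
});
console.log(jobs.length); // 2
```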

To implement a Lambda function, we currently have three language options: Python, Java and NodeJS. This list is really short, and hopefully developers will get more flexibility going forward. I selected NodeJS mainly for its non-blocking IO and event-driven processing. Lambda's billing is based on the combination of the time a function takes to complete and the memory size reserved for that function. Since I am downloading images from S3 and uploading the results back, blocking IO would result in billed seconds spent just waiting. If a single download or upload takes 1 second and I want to process 5 images, 10 seconds will be lost to IO alone. With non-blocking IO, the downloads overlap with each other, as do the uploads, so the same IO completes in a little over 2 seconds, which is a huge saving.
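The arithmetic above can be reproduced with a small, self-contained simulation. It is only a sketch: the S3 calls are replaced by a hypothetical timer-based `fakeIo` (100 ms here instead of 1 second), and `processImage`, `runSequential`, `runConcurrent` and `measure` are illustrative names. With five images, the blocking-style loop needs roughly ten transfer-times while the non-blocking version needs roughly two, because all downloads (and then all uploads) overlap.

```javascript
'use strict';

// Creates a simulated IO function (a stand-in for an S3 call) that records
// how many operations were in flight at the same time.
function makeTracker(ioMs) {
  const stats = { inFlight: 0, maxInFlight: 0 };
  const fakeIo = () => {
    stats.inFlight += 1;
    stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
    return new Promise((resolve) => setTimeout(() => {
      stats.inFlight -= 1;
      resolve();
    }, ioMs));
  };
  return { stats, fakeIo };
}

// One image = one simulated download followed by one simulated upload.
async function processImage(fakeIo) {
  await fakeIo(); // download
  await fakeIo(); // upload
}

// Blocking style: each image waits for the previous one to finish.
async function runSequential(count, fakeIo) {
  for (let i = 0; i < count; i++) await processImage(fakeIo);
}

// Non-blocking style: all images start immediately and are awaited together.
async function runConcurrent(count, fakeIo) {
  await Promise.all(Array.from({ length: count }, () => processImage(fakeIo)));
}

// Runs one strategy with a fresh tracker and reports wall time and peak concurrency.
async function measure(runner, count, ioMs) {
  const { stats, fakeIo } = makeTracker(ioMs);
  const start = Date.now();
  await runner(count, fakeIo);
  return { elapsedMs: Date.now() - start, maxInFlight: stats.maxInFlight };
}

(async () => {
  console.log('sequential', await measure(runSequential, 5, 100)); // about 1000 ms, maxInFlight 1
  console.log('concurrent', await measure(runConcurrent, 5, 100)); // about 200 ms, maxInFlight 5
})();
```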

Here is the code.

'use strict';
var AWS = require('aws-sdk');
var im = require('imagemagick');
var fs = require('fs');
var async = require('async');

// AWS configuration. Credentials are provided by the function's execution role,
// so access keys should never be hard-coded in the function.
AWS.config.update({region: 'us-east-1'});

// get reference to S3 client
var s3 = new AWS.S3();

// Resizes the downloaded image (response) for one resize_option, uploads the result
// to S3 and records it in output. done() is always called exactly once, so async
// never waits forever on a failed item.
const resize = (bucket, response, resize_option, output, done) => {
	console.log('Resize operation started for - ' + resize_option.dest_key);
	var filename = resize_option.dest_key.replace(/^.*[\\\/]/, '');
	var resizedFile = '/tmp/' + filename;
	var resize_req = {width: resize_option.width, height: resize_option.height, srcData: response.Body, dstPath: resizedFile};

	im.resize(resize_req, (err) => {
		// Throwing here would escape any surrounding try/catch (this callback runs
		// asynchronously), so failures are logged and reported through done().
		if (err) {
			console.log('Resize operation failed for - ' + resize_option.dest_key, err);
			return done();
		}
		console.log('Resize operation completed successfully for - ' + resize_option.dest_key);
		// readFileSync already returns a Buffer; the deprecated new Buffer() wrapper is unnecessary
		var resized_image_content = fs.readFileSync(resizedFile);
		var put_params = {Bucket: bucket, Key: resize_option.dest_key, Body: resized_image_content, ContentType: response.ContentType};

		// Upload the resized file to S3
		s3.putObject(put_params, (err) => {
			// Delete the temporary file whether or not the upload succeeded
			try {
				fs.unlinkSync(resizedFile);
			} catch (unlinkErr) {
				console.log('Failed to unlink the temporary file - ' + resizedFile);
			}

			if (err) {
				console.log('Failed to upload the file - ' + resize_option.dest_key);
				console.log(err, err.stack);
			} else {
				console.log('Successfully uploaded the file - ' + resize_option.dest_key);
				output.push({dest_key: resize_option.dest_key, status: true});
			}
			done();
		});
	});
};

// Downloads one source image from S3 and runs all of its resize_options in parallel.
const process_single_image = (bucket, single_image, output, callback) => {
	console.log('Started processing the image - ' + single_image.src_key);
	var src_key = single_image.src_key;
	var get_params = {Bucket: bucket, Key: src_key};

	s3.getObject(get_params, (err, response) => {
		if (err) {
			console.log('Failed to download the file - ' + src_key);
			console.log(err, err.stack);
			// Skip this image but let the remaining images finish
			return callback();
		}
		console.log('Successfully downloaded the file - ' + src_key);

		async.forEach(single_image.resize_options, (resize_option, resizeCallback) => {
			resize(bucket, response, resize_option, output, resizeCallback);
		}, (err) => {
			console.log('Image resizing completed for - ' + src_key);
			callback(err);
		});
	});
};

exports.handler = function(event, context, callback) {
	console.log('Entered handler function');
	// Lambda may reuse a container across invocations, so per-request state is created
	// here instead of at module level (a module-level array would keep growing).
	var output = [];

	async.forEach(event.image_list, (single_image, imageCallback) => {
		process_single_image(event.bucket, single_image, output, imageCallback);
	}, (err) => {
		if (err) {
			console.log('Image processing failed', err);
			return callback(err);
		}
		console.log('It\'s done, buddy!');
		callback(null, output);
	});
};

The above Lambda function returns a JSON response containing the list of all successfully resized images, in the order in which their uploads completed rather than in request order. You can modify the code to include failed images and so on, as per your needs. Here is a sample response.

[
  {
    "dest_key": "images/cover_250X250_1.jpg",
    "status": true
  },
  {
    "dest_key": "images/headerimage_250X250_0.jpg",
    "status": true
  },
  {
    "dest_key": "images/headerimage_500X500_0.jpg",
    "status": true
  },
  {
    "dest_key": "images/cover_500X500_1.jpg",
    "status": true
  }
]

The function would work without the Async module, but then you could not return the final status for all images. The Async module makes sure that a callback is triggered only after processing has completed for all images in the request. We aggregate the status for each image and return the complete status in that callback. Of course, we could use other workarounds (like checking from the client side whether each dest_key exists in the S3 bucket) to determine the status of the request, but why complicate simple things?

Please note that Async is not in the default list of modules provided by Lambda. So you will need to install the Async module on your computer manually, ZIP it along with your code, and upload that ZIP file to Lambda while creating your function. Given that Async (or a similar module) is required to do even very simple things like returning the status, Lambda could have added it to its default list of modules. Especially with API Gateway, Lambda is being promoted as a way to create web services without any server (Serverless Architecture), and there are not many web services I am aware of that don't return a final response to the caller. I hope Lambda will add the Async module in the near future.
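For comparison, the same fan-out/fan-in can be expressed with native Promises, which newer Node.js runtimes support without any extra module. This is a swapped-in technique rather than the approach used above, and `resizeOne` and `processAll` are hypothetical names; `resizeOne` stands in for the resize-and-upload step and, so that the sketch runs offline, simply fails for any `dest_key` containing "broken". Unlike the function above, it also reports failed images.

```javascript
'use strict';

// Hypothetical stand-in for "resize one option and upload it to S3".
// Resolves after a short delay, or rejects for dest_keys containing "broken".
function resizeOne(srcKey, option) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (option.dest_key.includes('broken')) reject(new Error('resize failed'));
      else resolve();
    }, 10);
  });
}

// Fan out over every (image, option) pair, wait for all of them to settle,
// then return one status entry per dest_key, including failures.
function processAll(imageList) {
  const tasks = [];
  for (const image of imageList) {
    for (const option of image.resize_options) {
      tasks.push(
        resizeOne(image.src_key, option)
          .then(() => ({ dest_key: option.dest_key, status: true }))
          .catch((err) => ({ dest_key: option.dest_key, status: false, error: err.message }))
      );
    }
  }
  // Each task catches its own error, so Promise.all resolves once every task has finished.
  return Promise.all(tasks);
}

processAll([
  {
    src_key: 'images/cover.jpg',
    resize_options: [
      { dest_key: 'images/cover_250X250_1.jpg' },
      { dest_key: 'images/broken_500X500_1.jpg' },
    ],
  },
]).then((output) => console.log(output));
```

Because `Promise.all` preserves input order, the statuses come back in request order rather than completion order. On Node.js 8.10 and later runtimes, a Lambda handler can be declared `async` and simply return this array instead of calling `callback`.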

A Few Suggestions
  1. Use a language that supports non-blocking IO to implement your Lambda function; that is why I chose NodeJS. Even if your Lambda function does not have any IO operations today, it is still better to go with non-blocking IO, since you may not know your future needs.
  2. Try to combine multiple images into a single request to get more juice out of Lambda. In my experiments, processing a single image of 250 KB took ~3 seconds, which puts the cost per image at 3 seconds. But if I combine, let's say, 20 images into a single request, processing all of them took just over 20 seconds, so the cost per image comes down to about 1 second. That's because of the non-blocking IO and event-driven processing built into NodeJS. Going from ~3 to ~1 billed second per image cuts your duration charges by about 67%, which is very significant.
  3. Make sure the client can also trigger requests to Lambda in parallel instead of calling the Lambda function synchronously in a loop. Otherwise you will be ignoring the scalability of Lambda.
  4. Test, test and test. You have to determine the limits of a Lambda function and stay within them. When you create a Lambda function, you are asked to choose the memory it requires, and the memory size plays a critical role in billing along with the number of seconds. Once you choose your memory size, you have to identify the number of images (or the total size of all images) that the function can process in one request. If you send a request that consumes more memory than you have reserved, the function execution will simply fail with an error message. In my case, I selected 512 MB as the memory size, and based on my experiments I was able to safely send 15 MB worth of images per request. So if I have a single image of 15 MB, I can send only that one image for processing, whereas if there are 60 images of 250 KB each, I can send all of them. I put this logic in the client and let Lambda concentrate only on image processing, but you can enhance the Lambda function to take care of it by itself. The important point is that you have to test, test and test to identify the limits.
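The saving in suggestion 2 is plain arithmetic on billed duration. The sketch below expresses duration cost per image in GB-seconds (configured memory in GB multiplied by seconds) and deliberately ignores per-request charges and rounding to the billing granularity; multiply by the current price per GB-second for your region to get dollars. The function names are illustrative, and the inputs are the figures from my experiments above.

```javascript
'use strict';

// Duration cost per image in GB-seconds, ignoring per-request charges and
// billing-granularity rounding. memoryMb is the configured function memory.
function gbSecondsPerImage(memoryMb, requestSeconds, imagesPerRequest) {
  return (memoryMb / 1024) * requestSeconds / imagesPerRequest;
}

// Fraction of duration cost saved by moving from the single-image case to the batched case.
function savings(single, batched) {
  return 1 - batched / single;
}

const single = gbSecondsPerImage(512, 3, 1);      // one 250 KB image in ~3 s
const batched = gbSecondsPerImage(512, 20, 20);   // 20 images in ~20 s
console.log(single, batched);                     // 1.5 0.5
console.log(savings(single, batched).toFixed(2)); // 0.67
```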
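Suggestions 3 and 4 can be combined on the client side. The sketch below greedily packs images into batches whose total size stays within a per-request budget, then sends all batches at once rather than awaiting each invocation in a loop. `planBatches`, `dispatchInParallel` and the `invoke` callback are hypothetical; in a real client, `invoke` would build the request JSON for one batch and call Lambda asynchronously (for example through the AWS SDK's Lambda `invoke` operation). The 15 MB budget is the figure I measured for a 512 MB function and should be replaced by your own tested limit.

```javascript
'use strict';

// Assumed per-request budget for a 512 MB function; measure your own limit.
const MAX_BATCH_BYTES = 15 * 1024 * 1024;

// Greedily pack images (each with a known size in bytes) into batches whose
// total size does not exceed maxBytes. An image larger than maxBytes gets a
// batch of its own.
function planBatches(images, maxBytes = MAX_BATCH_BYTES) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const image of images) {
    if (current.length > 0 && currentBytes + image.size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(image);
    currentBytes += image.size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Start every invocation at once and wait for all of them.
// `invoke` is a hypothetical function that sends one batch and returns a Promise.
function dispatchInParallel(batches, invoke) {
  return Promise.all(batches.map((batch) => invoke(batch)));
}

const KB = 1024;
const small = Array.from({ length: 60 }, (_, i) => ({ src_key: `images/img${i}.jpg`, size: 250 * KB }));
console.log(planBatches(small).length); // 1
const mixed = [{ src_key: 'images/big.jpg', size: 15 * 1024 * KB }, ...small.slice(0, 4)];
console.log(planBatches(mixed).length); // 2
dispatchInParallel(planBatches(mixed), async (batch) => batch.length)
  .then((counts) => console.log(counts)); // [ 1, 4 ]
```

With 1 KB taken as 1024 bytes, sixty 250 KB images total about 14.6 MB and fit in a single batch, while a 15 MB image fills a batch by itself, matching the limits described in suggestion 4.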

 

Have fun designing with Lambda. Please let me know your comments or suggestions below.

 
