Making Promises For a Better World

Do you think that we can make our world better by making and keeping our promises? If you agree, keep reading! Just kidding. Actually, in this post, we’re going to see how we can build better software by making promises in Node.js.

Node is built with the callback design pattern in mind: a large part of its APIs accept a callback function as the last parameter. Callbacks are easy to grasp, and they have become the default way of handling async data flows in JavaScript. But this simplicity comes at a price. While a callback function is fine for simple cases, it can cause a lot of issues in complex ones:

  • Callback hell
  • Inconsistent behavior (releasing Zalgo)
  • Misbehaving callbacks
    • Callbacks may be called many times
    • Callbacks may never be called
    • Callbacks may be called too early

And, Promises came to save us from callbacks. But, in Node, how can we make Promises from the callback world? By promisifying!

Almost all Promise libraries provide an extra API that lets us create a Promise-based version of a callback-based operation.

Such "promisifying" functions take an asynchronous operation that follows the error-first (Node-style) callback convention and wrap it in a Promise. Note that they can't convert a synchronous operation into a Promise-based one.
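
To see the idea, here is a minimal hand-rolled sketch of such a helper (not how any particular library implements it, just the pattern):

// Wrap an error-first (Node-style) callback API in a Promise.
function promisify(fn) {
    return function (...args) {
        return new Promise((resolve, reject) => {
            fn(...args, (err, result) => {
                if (err) {
                    return reject(err);
                }
                resolve(result);
            });
        });
    };
}

// Usage: const readFileAsync = promisify(require('fs').readFile);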

In the following examples, we're going to use bluebird to make Promises. By default, bluebird appends the suffix "Async" to the name of the callback-based operation when creating the Promise-based version: method1 -> method1Async.

before

const fs = require('fs')

fs.readFile('./data.zip', (err, content) => {
  if (err) {
    return errorHandler(err)
  }
  console.log(content)
})

after

const Promise = require("bluebird")
const fs = Promise.promisifyAll(require("fs"))

fs.readFileAsync("./data.zip")
  .then(content => console.log(content))
  .catch(errorHandler)

before

import mysql from "mysql";
const pool = mysql.createPool({..});

export function execute(statement, params) {
    pool.getConnection(function(e, connection) {
        if (e) {
            return errorHandler(e);
        } else {
            connection.query(statement, params, (e, results, fields) => {
                connection.release();
                if (e) {
                    return errorHandler(e);
                } else {
                    console.log(results);
                }
            });
        }
    });
}

after

import mysql from "mysql";
import Promise from "bluebird";

Promise.promisifyAll(require("mysql/lib/Connection").prototype);
Promise.promisifyAll(require("mysql/lib/Pool").prototype);

const pool = mysql.createPool(config.dbConnectionPool);

/**
 * API to execute a SQL statement against the db.
 */
export function execute (sql, params) {
    return Promise.using(getConnection(), (conn)=>{
        return conn.queryAsync(sql, params);
    });
}

function getConnection(){
    return pool.getConnectionAsync().disposer((conn)=>{      
        conn.release();
    });
}

With Promises, our program is clearer: it reads closer to synchronous code, has fewer nested blocks, and keeps track of less state.


Cross Origin Techniques

To understand the techniques, we need to understand the problem we're trying to solve: what are cross-origin requests?

But first, what is an origin?

An origin defines where a resource lives.

URLs that differ only in their path share the same origin: for example, any page under https://dzone.com has the origin https://dzone.com.

And the following URLs all have different origins:

  • https://canho.me/2016/06/18/awaiting-aws-resources -> origin: https://canho.me
  • http://canho.me -> origin: http://canho.me
  • https://api.canho.me -> origin: https://api.canho.me
  • https://canho.me:5000 -> origin: https://canho.me:5000
  • file:///D:/projects/node-sample/package.json -> origin: null

So, the origin is everything in the URL up until the path. In other words, the origin is a combination of the scheme, host, and port.

Same-origin vs Cross-origin

When we say same-origin and cross-origin, we're actually comparing the origins of two parties: the client and the server. When an origin refers to the client making the request, it's the client origin; when it refers to the server receiving the request, it's the server origin.

So, a request is a same-origin request when the client origin and the server origin are exactly the same.


Otherwise, the request is a cross-origin request.


But what is the problem anyway? Why should we care? The same-origin policy!
The same-origin policy restricts how a document or script loaded from one origin can interact with a resource from another origin. It is a critical security mechanism for isolating potentially malicious documents.

The same-origin policy is necessary, but it is restrictive enough to cause problems for sites that use multiple domains, and it makes it hard for servers to open up their APIs to a wider world of users.

Below are some techniques for dealing with cross-origin requests:

JSONP

JSONP (JSON with Padding) is the oldest technique. It relies on the fact that the browser doesn't impose the same-origin policy on script tags, so we use a script tag to make cross-origin requests:

<!DOCTYPE html>
<html>
    <head>
    </head>
    <body>
        <div id="users"></div>
        <script>
            // Existing callback that the padded response will invoke
            function loadUsers(users) {
                document.getElementById("users").textContent = JSON.stringify(users);
            }
        </script>
        <script src="https://api.github.com/users?jsoncallback=loadUsers"></script>
    </body>
</html>

The important part of the script tag is the jsoncallback parameter. Its value is the name of an existing function, loadUsers. When sending a response to the client, the server first pads the response with the name of that callback function, like this:

loadUsers([{"id": "user1",...}, {"id": "user2"}])

When the client receives the response, it calls the callback function with the actual data returned by the server.
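
On the server side, the padding itself is straightforward. A minimal sketch with Node's http module (the jsoncallback parameter name and the sample data are just for illustration):

const http = require('http');
const url = require('url');

http.createServer((req, res) => {
    const query = url.parse(req.url, true).query;
    const users = [{ id: 'user1' }, { id: 'user2' }];
    // Pad the JSON payload with the callback name supplied by the client.
    res.writeHead(200, { 'Content-Type': 'application/javascript' });
    res.end(`${query.jsoncallback}(${JSON.stringify(users)})`);
}).listen(3000);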

JSONP only supports GET requests, so ideally it is used for sharing public data.

Cross-origin messaging

HTML5’s postMessage method allows documents from different origins to communicate with each other.


The page wanting to send cross-origin requests embeds a document from the server via an iframe and uses postMessage to communicate with that iframe. Since the iframe and the server share the same origin, requests from the iframe to the server are same-origin requests.
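
A bare-bones sketch of that exchange (the origins, file name, and message format are made up for illustration):

// On the client page (https://client.example.com), once the embedded iframe
// https://api.example.com/bridge.html has loaded:
const frame = document.getElementById('apiFrame');
frame.contentWindow.postMessage({ action: 'getUsers' }, 'https://api.example.com');

// Inside bridge.html (same origin as the API server):
window.addEventListener('message', (event) => {
    if (event.origin !== 'https://client.example.com') return; // only trust known clients
    fetch('/users')                                            // same-origin request to the API
        .then((res) => res.json())
        .then((users) => event.source.postMessage(users, event.origin));
});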

As postMessage is a low-level API, we may want a library that acts as an abstraction layer and provides higher-level messaging semantics on top of it; several such libraries exist.

As postMessage is widely supported (see http://caniuse.com/#search=postmessage), this technique can be used in most cases.

Using a proxy server

The same-origin policy is imposed by the browser on JavaScript code running in the browser; there is no such policy on the server. So we can use our own server as a proxy that receives requests from the client and forwards them to the target server.


Using this technique enables almost any type of cross-origin requests.
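
A minimal sketch of such a proxy using Node's http module (the target host is just an example; a real proxy would also forward headers and handle errors):

const http = require('http');
const https = require('https');

// Same-origin endpoint on our own server that forwards requests to the remote API.
http.createServer((clientReq, clientRes) => {
    const proxyReq = https.request({
        hostname: 'api.example.com',   // the cross-origin server
        path: clientReq.url,
        method: clientReq.method
    }, (proxyRes) => {
        clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(clientRes);      // stream the remote response back to the browser
    });
    clientReq.pipe(proxyReq);          // forward the request body, if any
}).listen(3000);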

CORS

CORS (Cross-Origin Resource Sharing) is a W3C spec that allows cross-origin communication. CORS works by adding new HTTP headers that let servers describe the set of origins permitted to interact with them. Most of this technique lives in server configuration.

In the CORS flow with a preflight request, the browser first sends an OPTIONS request carrying Access-Control-Request-Method and Access-Control-Request-Headers; the actual request is sent only if the server's response permits it.

CORS headers are prefixed with Access-Control-:

  • Access-Control-Allow-Origin (required): This header must be included in all valid responses. Possible values: * or a specific origin
  • Access-Control-Allow-Methods: indicates the methods allowed when accessing the resource
  • Access-Control-Allow-Headers: used in response to a preflight request to indicate which HTTP headers can be used when making the actual request
  • Access-Control-Allow-Credentials: indicates if the server allows credentials during CORS requests
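
As a sketch, the server-side configuration could be an Express-style middleware along these lines (the allowed origin, methods, and headers here are only examples, and app is assumed to be an Express app):

// Set the CORS headers described above and answer preflight (OPTIONS) requests.
app.use((req, res, next) => {
    res.setHeader('Access-Control-Allow-Origin', 'https://canho.me');
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, PUT, DELETE');
    res.setHeader('Access-Control-Allow-Headers', 'Content-Type, Authorization');
    res.setHeader('Access-Control-Allow-Credentials', 'true');
    if (req.method === 'OPTIONS') {
        return res.sendStatus(204);   // preflight handled; the actual request follows separately
    }
    next();
});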

A great post about CORS can be found here

CORS can be used as a modern alternative to the JSONP pattern. While JSONP supports only the GET request method, CORS also supports other types of HTTP requests. Using CORS enables a web programmer to use regular XMLHttpRequest, which supports better error handling than JSONP.(wiki)

Summary

  • CORS is the standardized mechanism for making cross-origin requests. A large part of CORS lives in server configuration. Almost all browsers support CORS.
  • The first three techniques follow the same pattern: a proxy object receives the request from the client and sends it on to the server.
  • The first three techniques require custom code, which leads to additional maintenance cost.

Awaiting AWS resources

Normally, when we work with AWS using the AWS SDK, we need to wait for AWS resources to reach a specific status (an EC2 instance is running, a Kinesis stream is active, an OpsWorks deployment has succeeded, and so on) before we can continue. This can be done by continuously polling the AWS resources until they are in the desired status.

Below is sample code for polling a newly created Kinesis stream until it becomes active.

// `describeStream` is a promisified kinesis.describeStream (see the note at the end
// of this post); Promise.delay comes from bluebird.
function waitForStreamActive(streamName){
    let count = 0;
    const interval = 5000;  // milliseconds to wait between polls
    const maxTries = 15;    // give up after this many polls
    return (function wait(){
        return describeStream({StreamName : streamName}).then((data)=>{
            if(data.StreamDescription.StreamStatus === 'ACTIVE'){
                return Promise.resolve(streamName);
            } else {
                count++;
                logger.info(`Waiting for the stream ${streamName} active: ${count}`);
                //The stream is not active yet. Wait for some seconds
                if(count < maxTries){
                    return Promise.delay(interval).then(wait);
                } else {
                    return Promise.reject(`Max tries ${count} reached but the stream ${streamName} still not active`);
                }
            }
        });
    }());
}

We don't want to wait forever. In the code above, when one poll completes, we wait 5 seconds (interval) before the next poll, and we make at most 15 tries (maxTries). If the resource isn't in the desired status after maxTries, we terminate the polling process.

I kept writing this polling code myself (partly because I was in a rush) until I realized that the AWS SDK provides an API for exactly this need (see waitFor):

waitFor(state, params, callback) ⇒ void

As waitFor is defined on the abstract class AWS.Service, we need to consult the specific service class for the supported state names.

So, the code above can be rewritten using the AWS waitFor API as follows:

waitFor('streamExists', {StreamName: 'stream name'})
    .then(function(data){
        console.log(data);
    })
    .catch(function(err) {
        console.error(err);
    });

Sadly, the AWS SDK for Node doesn't seem to let us configure the interval and maxTries parameters. I assumed it would (because the AWS SDK for Ruby does allow this) until I read the documentation carefully and found the parameters hard-coded in kinesis-2013-12-02.waiters2.json:

{
  "version": 2,
  "waiters": {
    "StreamExists": {
      "delay": 10,
      "operation": "DescribeStream",
      "maxAttempts": 18,
      "acceptors": [
        {
          "expected": "ACTIVE",
          "matcher": "path",
          "state": "success",
          "argument": "StreamDescription.StreamStatus"
        }
      ]
    }
  }
}

Note: In the code samples above, AWS's callback-style APIs such as kinesis.describeStream and kinesis.waitFor are converted to Promise style using a Promise library like bluebird.
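
For reference, a minimal sketch of that conversion (the region and the helper names describeStream and waitFor are ours, not part of the SDK):

const AWS = require('aws-sdk');
const Promise = require('bluebird');

// bluebird adds *Async variants (describeStreamAsync, waitForAsync, ...) to the client.
const kinesis = Promise.promisifyAll(new AWS.Kinesis({ region: 'us-east-1' }));

// Helpers matching the names used in the snippets above.
const describeStream = (params) => kinesis.describeStreamAsync(params);
const waitFor = (state, params) => kinesis.waitForAsync(state, params);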

Understanding middleware pattern in express.js

The term middleware (middle-ware, literally the software in the middle) may cause confusion for the inexperienced, especially those coming from the enterprise programming world. That is because in enterprise architecture, middleware brings to mind software suites that shield developers from many low-level and difficult issues, allowing them to concentrate on business logic.

In express.js, a middleware function is defined as:

Middleware functions are functions that have access to the request object (req), the response object (res), and the next middleware function in the application’s request-response cycle. The next middleware function is commonly denoted by a variable named next.

A middleware function has the following signature:

function(req, res, next) { ... }

There is a special kind of middleware for error handling. This kind of middleware is special because it takes four arguments instead of three, which is how express.js recognizes it as error-handling middleware:

function(err, req, res, next) {...}

Middleware functions can perform the following tasks:

  • Logging requests
  • Authenticating / authorizing requests
  • Parsing the body of requests
  • Ending a request-response lifecycle
  • Calling the next middleware function in the stack

These tasks are not core concerns (business logic) of an application. Instead, they are cross cutting concerns applicable throughout the application and affecting the entire application.
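
As a quick illustration, a request-logging middleware followed by a route handler and an error-handling middleware might look like this (a minimal sketch, assuming a standard Express app):

const express = require('express');
const app = express();

// Cross-cutting concern: log every request, then pass control to the next middleware.
app.use((req, res, next) => {
    console.log(`${req.method} ${req.url}`);
    next();
});

// A regular route handler ends the request-response cycle.
app.get('/', (req, res) => res.send('hello'));

// Error-handling middleware: recognized by its four-argument signature.
app.use((err, req, res, next) => {
    console.error(err);
    res.status(500).send('Something went wrong');
});

app.listen(3000);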

The request-response lifecycle through the middleware pipeline is as follows:


  1. The first middleware function (A) in the pipeline will be invoked to process the request.
  2. Each middleware function may end the request by sending a response to the client, invoke the next middleware function (B) by calling next(), or hand the request over to an error-handling middleware by calling next(err) with an error argument.
  3. Each middleware function receives as input the result of the previous middleware function.
  4. If the request reaches the last middleware in the pipeline without being handled, we can assume a 404 error:
app.use((req, res) => {
    res.writeHead(404, { 'Content-Type': 'text/html' });
    res.end("Cannot " + req.method.toUpperCase() + " " + req.url);
});

As we can see, the idea behind the middleware pattern is not new. We can consider the middleware pattern in express.js a variant of the Chain of Responsibility pattern.

This pattern has some benefits:

  • It avoids coupling the sender of a request to the receiver by giving more than one object a chance to handle the request. Neither the receiver nor the sender has explicit knowledge of the other.
  • It gives flexibility in distributing responsibilities among objects: we can add or change responsibilities for handling a request by adding to or changing the chain at run-time.

Executing async tasks serially with Array#reduce

Suppose that we're assigned the task of writing a migration tool for a database, with the following requirements:

  • The tool will read a list of sql scripts and execute them serially one after another.
  • Each script will run once the previous script has completed.
  • If any script execution fails, no more scripts will execute.
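
Both snippets below assume two promise-returning helpers. A possible sketch of them (execDbAsync and db are hypothetical stand-ins for your database client):

const Promise = require('bluebird');
const fs = Promise.promisifyAll(require('fs'));

// Reads a script file and returns a Promise of its contents.
const readFileAsync = (file, options) => fs.readFileAsync(file, options);

// Hypothetical helper: executes a SQL string and returns a Promise of the result.
// `db` stands in for whatever promisified database client you use.
const execDbAsync = (query) => db.queryAsync(query);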

This can be done using a library such as async#reduce (or async#series):

async.reduce(files, Promise.resolve(), function(prevPromise, file, callback){
    prevPromise.then(function(){
        return readFileAsync(file, {encoding: 'utf-8'});
    }).then(function(query){
        return execDbAsync(query);
    }).then(function(data){
        // pass a resolved promise along as the accumulator for the next script
        callback(null, Promise.resolve(data));
    }).catch(callback);             // stop the iteration on the first failure
}, function(err, result){
    // called once all scripts have run, or as soon as one of them fails
});

Without any additional library, this problem of executing async tasks serially can be solved with the built-in Array#reduce method:

files.reduce(function(prevPromise, curr, curIdx, arr){
    return prevPromise.then(function(){
        return readFileAsync(curr, {encoding: 'utf-8'});
    })
    .then(function(query){
        return execDbAsync(query);
    });
}, Promise.resolve());

The code above returns a Promise which:

  • gets resolved when all the chained promises resolve.
  • gets rejected when any of the chained promises rejects.

Compared to async#reduce, using the built-in Array#reduce method has some benefits:

  • No additional library needed
  • No callback
  • Less code, as we don't need to explicitly call callback(null, data) to notify task completion

The execution order looks like this:

read (script 1) -> exec (script 1) -> read (script 2) -> exec (script 2) -> ... -> read (script n) -> exec (script n)

Understanding execFile, spawn, exec and fork in node

In Node, the child_process module provides four different methods for executing external applications:

1. execFile

2. spawn

3. exec

4. fork

All of these are asynchronous. Calling any of these methods returns an object which is an instance of the ChildProcess class.


The right method depends on what we need, so we'll take a look at each of them in detail.


1. execFile

what?

execFile executes an external application with optional arguments and calls a callback with the buffered output after the application exits. Below is the method signature (from the Node.js documentation):

child_process.execFile(file[, args][, options][, callback])

how?

In the example below, the node program is executed with the argument --version. When the external application exits, the callback function is called with the stdout and stderr output of the child process; the output of the external application is buffered internally.
Running the code below will print out the current node version.

const execFile = require('child_process').execFile;
const child = execFile('node', ['--version'], (error, stdout, stderr) => {
    if (error) {
        console.error('stderr', stderr);
        throw error;
    }
    console.log('stdout', stdout);
});

How does node know where to find the external application?

Through the PATH environment variable, which specifies the set of directories where executable programs are located. If the external application is on PATH, it can be located without an absolute or relative path to it.

when ?

execFile is used when we just need to execute an application and get its output. For example, we can use execFile to run an image-processing application like ImageMagick to convert an image from PNG to JPG when we only care whether it succeeds or not. execFile should not be used when the external application produces a large amount of data and we need to consume that data in real time.

2. spawn

what?

The spawn method spawns an external application in a new process and returns a streaming interface for I/O.

child_process.spawn(command[, args][, options])
  • command <String> The command to run
  • args <Array> List of string arguments
  • options <Object>
    • cwd <String> Current working directory of the child process
    • env <Object> Environment key-value pairs
    • stdio <Array> | <String> Child’s stdio configuration. (See options.stdio)
    • detached <Boolean> Prepare child to run independently of its parent process. Specific behavior depends on the platform (see options.detached).
    • uid <Number> Sets the user identity of the process. (See setuid(2).)
    • gid <Number> Sets the group identity of the process. (See setgid(2).)
    • shell <Boolean> | <String> If true, runs command inside of a shell. Uses ‘/bin/sh’ on UNIX, and ‘cmd.exe’ on Windows. A different shell can be specified as a string. The shell should understand the -c switch on UNIX, or /s /c on Windows. Defaults to false (no shell).
  • return: <ChildProcess>

how?

const spawn = require('child_process').spawn;
const fs = require('fs');
function resize(req, resp) {
    const args = [
        "-", // use stdin
        "-resize", "640x", // resize width to 640
        "-resize", "x360<", // resize height if it's smaller than 360
        "-gravity", "center", // sets the offset to the center
        "-crop", "640x360+0+0", // crop
        "-" // output to stdout
    ];

    const streamIn = fs.createReadStream('./path/to/an/image');
    const proc = spawn('convert', args);
    streamIn.pipe(proc.stdin);
    proc.stdout.pipe(resp);
}

In the Node function above (an express.js controller function), we read an image file using a stream. Then we use the spawn method to spawn the convert program (see imagemagick.org) and feed the ChildProcess proc with the image stream. As the proc object produces data, we write that data to resp (a writable stream), so users see the image immediately without having to wait for the whole image to be converted (resized).

when?

As spawn returns a stream-based object, it's great for handling applications that produce large amounts of data or for working with data as it is read in. And since it's stream based, all the usual stream benefits apply as well:

  • Low memory footprint
  • Automatically handle back-pressure
  • Lazily produce or consume data in buffered chunks.
  • Evented and non-blocking
  • Buffers allow you to work around the V8 heap memory limit

3. exec

what?

This method spawns a subshell, executes the command in that shell, and buffers the generated data. When the child process completes, the callback function is called with:

  • the buffered data when the command executes successfully
  • an error (an instance of Error) when the command fails

child_process.exec(command[, options][, callback])
  • command <String> The command to run, with space-separated arguments
  • options <Object>
    • cwd <String> Current working directory of the child process
    • env <Object> Environment key-value pairs
    • encoding <String> (Default: ‘utf8’)
    • shell <String> Shell to execute the command with (Default: '/bin/sh' on UNIX, 'cmd.exe' on Windows). The shell should understand the -c switch on UNIX or /s /c on Windows. On Windows, command-line parsing should be compatible with cmd.exe.
    • timeout <Number> (Default: 0)
    • maxBuffer <Number> largest amount of data (in bytes) allowed on stdout or stderr; if exceeded, the child process is killed (Default: 200*1024)
    • killSignal <String> (Default: ‘SIGTERM’)
    • uid <Number> Sets the user identity of the process. (See setuid(2).)
    • gid <Number> Sets the group identity of the process. (See setgid(2).)
  • callback <Function> called with the output when process terminates
  • Return: <ChildProcess>

Compared to execFile and spawn, exec doesn't have an args argument because exec allows us to execute more than one command in a shell. When using exec, if we need to pass arguments to the command, they should be part of the whole command string.

how?

The following code snippet will recursively print out all items under the current folder:

const exec = require('child_process').exec;
exec('for i in $( ls -LR ); do echo item: $i; done', (e, stdout, stderr)=> {
    if (e instanceof Error) {
        console.error(e);
        throw e;
    }
    console.log('stdout ', stdout);
    console.log('stderr ', stderr);
});

When running a command in a shell, we have access to all the functionality supported by that shell, such as pipes and redirection:

const exec = require('child_process').exec;
exec('netstat -aon | find "9000"', (e, stdout, stderr)=> {
    if (e instanceof Error) {
        console.error(e);
        throw e;
    }
    console.log('stdout ', stdout);
    console.log('stderr ', stderr);
});

In the example above, Node spawns a subshell and executes the command netstat -aon | find "9000" in that subshell. The command string includes two commands:

  • netstat -aon: netstat command with argument -aon
  • find “9000”: find command with argument 9000

The first command displays all active TCP connections (-a), the owning process id (-o), and ports and addresses in numerical form (-n). The output of this command is fed into the second command, which finds the line containing port 9000. On success, a line like the following is printed:

TCP    0.0.0.0:9000           0.0.0.0:0              LISTENING       11180

when?

exec should be used when we need to utilize shell functionality such as pipes, redirection, or backgrounding.

Notes
  • exec executes the command in a shell, which maps to /bin/sh (Linux) and cmd.exe (Windows)
  • Executing a command in a shell using exec is convenient. However, exec should be used with caution, as shell injection can be exploited. Whenever possible, execFile should be used instead, as invalid arguments passed to execFile will simply yield an error.

4. fork

what?

The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child.

The fork method will open an IPC channel allowing message passing between Node processes:

  • On the child process, process.on('message') and process.send('message to parent') can be used to receive and send data
  • On the parent process, child.on('message') and child.send('message to child') are used

Each process has its own memory and its own V8 instance, costing at least roughly 30 ms of startup time and 10 MB of memory each.

child_process.fork(modulePath[, args][, options])
  • modulePath <String> The module to run in the child
  • args <Array> List of string arguments
  • options <Object>
    • cwd <String> Current working directory of the child process
    • env <Object> Environment key-value pairs
    • execPath <String> Executable used to create the child process
    • execArgv <Array> List of string arguments passed to the executable (Default: process.execArgv)
    • silent <Boolean> If true, stdin, stdout, and stderr of the child will be piped to the parent, otherwise they will be inherited from the parent; see the 'pipe' and 'inherit' options for child_process.spawn()'s stdio for more details (Default: false)
    • uid <Number> Sets the user identity of the process. (See setuid(2).)
    • gid <Number> Sets the group identity of the process. (See setgid(2).)
  • Return: <ChildProcess>

how?

//parent.js
const cp = require('child_process');
const n = cp.fork(`${__dirname}/sub.js`);

n.on('message', (m) => {
  console.log('PARENT got message:', m);
});

n.send({ hello: 'world' });
//sub.js
process.on('message', (m) => {
  console.log('CHILD got message:', m);
});

process.send({ foo: 'bar' });

when?

Since the Node main process is single threaded, long-running tasks like heavy computation will tie up the main process. As a result, incoming requests can't be serviced and the application becomes unresponsive. Offloading long-running tasks from the main process by forking a new Node process lets the application keep serving incoming requests and stay responsive.
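
For example, a CPU-bound computation could be pushed into a forked worker (the file name and the message shape here are just for illustration):

// worker.js: runs in its own process, with its own event loop and V8 heap
process.on('message', (n) => {
    let sum = 0;
    for (let i = 0; i <= n; i++) sum += i;   // stand-in for an expensive computation
    process.send(sum);
});

// server.js: the main process stays free to serve incoming requests
const cp = require('child_process');
const worker = cp.fork(`${__dirname}/worker.js`);

worker.on('message', (result) => console.log('result:', result));
worker.send(1e9);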

summary

Which method to use for executing an external application can be summarized as follows: use execFile when you just need the buffered output of a command, spawn when you want to stream large amounts of data, exec when you need shell features such as pipes and redirection, and fork when you need to run another Node module and exchange messages with it.
