Understanding execFile, spawn, exec and fork in node

In Node, the child_process module provides four different methods for executing external applications:

1. execFile

2. spawn

3. exec

4. fork

All of these are asynchronous. Each call returns an object that is an instance of the ChildProcess class.
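
A minimal sketch of that claim, assuming it is acceptable to launch the running node binary itself (process.execPath) as the external application. The variable name child is only illustrative.

```javascript
const { spawn, ChildProcess } = require('child_process');

// process.execPath is the absolute path of the running node binary,
// so this sketch does not depend on PATH.
const child = spawn(process.execPath, ['--version']);

// The call returns immediately with a ChildProcess instance.
console.log(child instanceof ChildProcess); // true

// Results arrive later, through streams and events.
child.stdout.pipe(process.stdout);
child.on('exit', (code) => {
    console.log('child exited with code', code);
});
```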

The right method depends on what we need. Let's look at each of them in detail.


1. execFile

what?

Executes an external application with optional arguments, and calls the callback with the buffered output after the application exits. Below is the method signature (from the Node.js documentation):

child_process.execFile(file[, args][, options][, callback])

how?

In the example below, the node program is executed with the argument “--version”. When the external application exits, the callback function is called with the stdout and stderr output of the child process. The output of the external application is buffered internally until then.
Running the code below prints the current node version.

const execFile = require('child_process').execFile;
const child = execFile('node', ['--version'], (error, stdout, stderr) => {
    if (error) {
        console.error('stderr', stderr);
        throw error;
    }
    console.log('stdout', stdout);
});

How does node know where to find the external application?

The PATH environment variable specifies a set of directories where executable programs are located. If an external application is in one of those directories, it can be located without an absolute or relative path to the application.
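
The lookup can be observed with a small sketch. The helper run and the program name no-such-program-for-this-sketch are hypothetical and exist only for illustration; the sketch assumes no program with that name is on PATH.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: run a file and resolve with its trimmed stdout.
function run(file, args) {
    return new Promise((resolve, reject) => {
        execFile(file, args, (error, stdout) => {
            if (error) return reject(error);
            resolve(stdout.trim());
        });
    });
}

// An absolute path needs no PATH lookup at all.
run(process.execPath, ['--version'])
    .then((version) => console.log('absolute path:', version));

// A bare name that is in none of the PATH directories cannot be started;
// the callback receives an error whose code is 'ENOENT'.
run('no-such-program-for-this-sketch', [])
    .catch((error) => console.log('lookup failed:', error.code));
```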

when?

execFile is used when we just need to execute an application and get the output. For example, we can use execFile to run an image-processing application like ImageMagick to convert an image from PNG to JPG format when we only care whether it succeeds or not. execFile should not be used when the external application produces a large amount of data and we need to consume that data in real time.
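
The "only care whether it succeeds" case might look like the sketch below. The helper succeeds is hypothetical, and the node binary stands in for a tool such as ImageMagick so that the sketch runs anywhere; a non-zero exit code reaches the callback as an error.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: resolve with true when the program exits with
// code 0, and with false when it fails to start or exits with an error.
function succeeds(file, args) {
    return new Promise((resolve) => {
        execFile(file, args, (error) => resolve(!error));
    });
}

// The node binary is used here in place of a real conversion tool.
succeeds(process.execPath, ['-e', 'process.exit(0)'])
    .then((ok) => console.log('exit code 0:', ok)); // true
succeeds(process.execPath, ['-e', 'process.exit(3)'])
    .then((ok) => console.log('exit code 3:', ok)); // false
```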

2. spawn

what?

The spawn method spawns an external application in a new process and returns a streaming interface for I/O.

child_process.spawn(command[, args][, options])
  • command <String> The command to run
  • args <Array> List of string arguments
  • options <Object>
    • cwd <String> Current working directory of the child process
    • env <Object> Environment key-value pairs
    • stdio <Array> | <String> Child’s stdio configuration. (See options.stdio)
    • detached <Boolean> Prepare child to run independently of its parent process. Specific behavior depends on the platform. (See options.detached)
    • uid <Number> Sets the user identity of the process. (See setuid(2).)
    • gid <Number> Sets the group identity of the process. (See setgid(2).)
    • shell <Boolean> | <String> If true, runs command inside of a shell. Uses ‘/bin/sh’ on UNIX, and ‘cmd.exe’ on Windows. A different shell can be specified as a string. The shell should understand the -c switch on UNIX, or /s /c on Windows. Defaults to false (no shell).
  • return: <ChildProcess>

how?

const spawn = require('child_process').spawn;
const fs = require('fs');
function resize(req, resp) {
    const args = [
        "-", // use stdin
        "-resize", "640x", // resize width to 640
        "-resize", "x360<", // resize height if it's smaller than 360
        "-gravity", "center", // sets the offset to the center
        "-crop", "640x360+0+0", // crop
        "-" // output to stdout
    ];

    const streamIn = fs.createReadStream('./path/to/an/image');
    const proc = spawn('convert', args);
    streamIn.pipe(proc.stdin);
    proc.stdout.pipe(resp);
}

In the Node.js function above (an Express controller function), we read an image file using a stream. Then we use the spawn method to start the convert program (see imagemagick.org) and feed the image stream to the ChildProcess proc. As soon as proc produces data, we write that data to resp (which is a Writable stream), so users start receiving the image immediately without having to wait for the whole image to be converted (resized).
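
The same streaming behaviour can be tried without ImageMagick or Express. In this sketch the helper streamOutput and the inline child script are hypothetical; the child is simply the node binary printing three lines, and the parent handles each chunk as it arrives instead of waiting for the exit.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: start a program, pass every stdout chunk to
// onChunk as soon as it arrives, and resolve with the exit code.
function streamOutput(file, args, onChunk) {
    return new Promise((resolve, reject) => {
        const proc = spawn(file, args);
        proc.stdout.setEncoding('utf8');
        proc.stdout.on('data', onChunk);
        proc.on('error', reject);
        proc.on('close', (code) => resolve(code));
    });
}

// Child script used only for this sketch.
const script = 'for (let i = 0; i < 3; i++) console.log("line", i)';

streamOutput(process.execPath, ['-e', script], (chunk) => {
    process.stdout.write('chunk: ' + chunk);
}).then((code) => console.log('exit code', code));
```

How the output is split into chunks is up to the operating system, so the number of chunk: prefixes may vary between runs.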

when?

Because spawn returns a stream-based object, it’s great for handling applications that produce a large amount of data, or for working with data as it is read in. Because it’s stream based, all stream benefits apply as well:

  • Low memory footprint
  • Automatically handle back-pressure
  • Lazily produce or consume data in buffered chunks.
  • Evented and non-blocking
  • Buffers allow you to work around the V8 heap memory limit
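
As a rough illustration of the list above, the sketch below pipes a readable stream into a child's stdin and lets pipe() deal with back-pressure. The helper upperCaseViaChild and the inline filter script are hypothetical, and the result is collected into a string only to keep the demonstration short.

```javascript
const { spawn } = require('child_process');
const { Readable } = require('stream');

// Child script used only for this sketch: upper-case stdin to stdout.
const filter =
    'process.stdin.on("data", (d) => process.stdout.write(d.toString().toUpperCase()))';

// Hypothetical helper: send text through the child and collect the result.
function upperCaseViaChild(text) {
    return new Promise((resolve, reject) => {
        const proc = spawn(process.execPath, ['-e', filter]);
        let output = '';
        proc.stdout.setEncoding('utf8');
        proc.stdout.on('data', (chunk) => { output += chunk; });
        proc.on('error', reject);
        proc.on('close', () => resolve(output));
        // pipe() pauses the source when the child's stdin is full and
        // ends stdin when the source is exhausted.
        Readable.from([text]).pipe(proc.stdin);
    });
}

upperCaseViaChild('hello from the parent\n')
    .then((result) => process.stdout.write(result));
```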

3. exec

what?

This method spawns a subshell, executes the command in that shell, and buffers the generated data. When the child process completes, the callback function is called with:

  • buffered data when the command executes successfully
  • error (which is an instance of Error) when the command fails

child_process.exec(command[, options][, callback])
  • command <String> The command to run, with space-separated arguments
  • options <Object>
    • cwd <String> Current working directory of the child process
    • env <Object> Environment key-value pairs
    • encoding <String> (Default: ‘utf8’)
    • shell <String> Shell to execute the command with (Default: ‘/bin/sh’ on UNIX, ‘cmd.exe’ on Windows. The shell should understand the -c switch on UNIX or /s /c on Windows. On Windows, command line parsing should be compatible with cmd.exe.)
    • timeout <Number> (Default: 0)
    • maxBuffer <Number> largest amount of data (in bytes) allowed on stdout or stderr – if exceeded child process is killed (Default: 200*1024 in the documentation quoted here; newer Node.js releases default to 1024*1024)
    • killSignal <String> (Default: ‘SIGTERM’)
    • uid <Number> Sets the user identity of the process. (See setuid(2).)
    • gid <Number> Sets the group identity of the process. (See setgid(2).)
  • callback <Function> called with the output when process terminates
  • Return: <ChildProcess>
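
The effect of the maxBuffer option can be sketched as follows. The helper runWithLimit is hypothetical, and the command is just the node binary printing 5000 characters; the sketch assumes the path to that binary contains nothing the shell would treat specially inside double quotes.

```javascript
const { exec } = require('child_process');

// Quote the binary path so the shell treats it as a single word.
const command = `"${process.execPath}" -p "'x'.repeat(5000)"`;

// Hypothetical helper: run the command with the given maxBuffer and
// resolve with the callback's error argument (null on success).
function runWithLimit(maxBuffer) {
    return new Promise((resolve) => {
        exec(command, { maxBuffer }, (error) => resolve(error));
    });
}

// Enough room for the output: no error.
runWithLimit(1024 * 1024)
    .then((error) => console.log('large limit:', error)); // null

// Too little room: the child is killed and an error is reported.
runWithLimit(100)
    .then((error) => console.log('small limit:', error && error.message));
```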

Compared to execFile and spawn, exec doesn’t have an args argument, because exec takes a whole shell command line, which may contain more than one command. When using exec, any arguments to a command must be part of the command string.
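
The difference in how arguments are passed might be sketched like this. util.promisify is used only to keep the sketch short, and the node binary again stands in for an arbitrary program.

```javascript
const util = require('util');
const childProcess = require('child_process');

// Promise-returning variants; both resolve with { stdout, stderr }.
const exec = util.promisify(childProcess.exec);
const execFile = util.promisify(childProcess.execFile);

const node = process.execPath; // absolute path of the running node binary

// exec: the arguments are part of the command string; the shell splits them.
const viaExec = exec(`"${node}" -p "1 + 2"`);

// execFile: the arguments are a separate array; no shell is involved.
const viaExecFile = execFile(node, ['-p', '1 + 2']);

Promise.all([viaExec, viaExecFile]).then(([a, b]) => {
    console.log('exec:', a.stdout.trim());     // 3
    console.log('execFile:', b.stdout.trim()); // 3
});
```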

how?

The following code snippet recursively prints all items under the current folder:

const exec = require('child_process').exec;
exec('for i in $( ls -LR ); do echo item: $i; done', (e, stdout, stderr)=> {
    if (e instanceof Error) {
        console.error(e);
        throw e;
    }
    console.log('stdout ', stdout);
    console.log('stderr ', stderr);
});

When running a command in a shell, we have access to all functionality supported by that shell, such as pipes and redirects:

const exec = require('child_process').exec;
exec('netstat -aon | find "9000"', (e, stdout, stderr)=> {
    if (e instanceof Error) {
        console.error(e);
        throw e;
    }
    console.log('stdout ', stdout);
    console.log('stderr ', stderr);
});

In the above example, node spawns a subshell and executes the command “netstat -aon | find "9000"” in that subshell. The command string includes two commands (shown here in their Windows versions):

  • netstat -aon: netstat command with argument -aon
  • find "9000": find command with argument "9000"

The first command displays all active TCP connections and listening ports (-a), the owning process id (-o), and addresses and ports expressed numerically (-n). The output of this command is fed into the second command, which keeps only the lines containing 9000, such as the line for the process listening on port 9000. On success, the following line is printed:

TCP    0.0.0.0:9000           0.0.0.0:0              LISTENING       11180

when?

exec should be used when we need to utilize shell functionality such as pipes, redirects, backgrounding…

Notes
  • exec executes the command in a shell, which maps to /bin/sh (Linux) and cmd.exe (Windows)
  • Executing a command in a shell using exec is convenient. However, exec should be used with caution, as it is open to shell injection when the command string contains untrusted input. Whenever possible, execFile should be used instead: its arguments are passed as an array and are not interpreted by a shell.
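
A small sketch of the injection risk. The names viaExec, viaExecFile, printArg and userInput are all hypothetical; the child is the node binary printing its first argument, and the "untrusted" input uses &&, which both /bin/sh and cmd.exe treat as a command separator.

```javascript
const { exec, execFile } = require('child_process');

const node = process.execPath;
const printArg = 'console.log(process.argv[1])'; // child prints its first argument
const userInput = 'a && echo INJECTED';           // hypothetical untrusted input

function viaExec(input) {
    return new Promise((resolve, reject) => {
        // The shell parses the whole string, so "&& echo INJECTED"
        // runs as a second command.
        exec(`"${node}" -e "${printArg}" ${input}`, (error, stdout) =>
            (error ? reject(error) : resolve(stdout)));
    });
}

function viaExecFile(input) {
    return new Promise((resolve, reject) => {
        // No shell: the input reaches the child as one literal argument.
        execFile(node, ['-e', printArg, input], (error, stdout) =>
            (error ? reject(error) : resolve(stdout)));
    });
}

viaExec(userInput).then((out) => console.log('exec output:', JSON.stringify(out)));
viaExecFile(userInput).then((out) => console.log('execFile output:', JSON.stringify(out)));
```

With exec the output contains a separate INJECTED line produced by the shell, while with execFile the child simply prints the input back unchanged.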

4. fork

what?

The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child.

The fork method will open an IPC channel allowing message passing between Node processes:

  • On the child process, process.on('message') and process.send('message to parent') can be used to receive and send data
  • On the parent process, child.on('message') and child.send('message to child') are used

Each process has its own memory and its own V8 instance; assume at least 30ms of startup time and 10MB of memory for each.

child_process.fork(modulePath[, args][, options])
  • modulePath <String> The module to run in the child
  • args <Array> List of string arguments
  • options <Object>
    • cwd <String> Current working directory of the child process
    • env <Object> Environment key-value pairs
    • execPath <String> Executable used to create the child process
    • execArgv <Array> List of string arguments passed to the executable (Default: process.execArgv)
    • silent <Boolean> If true, stdin, stdout, and stderr of the child will be piped to the parent, otherwise they will be inherited from the parent, see the ‘pipe’ and ‘inherit’ options for child_process.spawn()’s stdio for more details (Default: false)
    • uid <Number> Sets the user identity of the process. (See setuid(2).)
    • gid <Number> Sets the group identity of the process. (See setgid(2).)
  • Return: <ChildProcess>

how?

//parent.js
const cp = require('child_process');
const n = cp.fork(`${__dirname}/sub.js`);

n.on('message', (m) => {
  console.log('PARENT got message:', m);
});

n.send({ hello: 'world' });

//sub.js
process.on('message', (m) => {
  console.log('CHILD got message:', m);
});

process.send({ foo: 'bar' });

when?

Since the Node main process is single threaded, long-running tasks like computation tie up the main process. As a result, incoming requests can’t be serviced and the application becomes unresponsive. Offloading long-running tasks from the main process by forking a new Node process allows the application to keep serving incoming requests and stay responsive.
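
A sketch of that offloading pattern, under a few assumptions: the worker source is written to a temporary file only so that the example fits in one file (normally it would be its own module), and the names workerSource, workerPath and sumInChild are hypothetical.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Worker: receives { n }, runs a CPU-bound loop, replies with { sum }.
const workerSource = `
process.on('message', ({ n }) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) sum += i;
    process.send({ sum });
});
`;
const workerPath = path.join(os.tmpdir(), `sum-worker-${process.pid}.js`);
fs.writeFileSync(workerPath, workerSource);

// Remove the temporary worker file when the parent exits.
process.on('exit', () => {
    try { fs.unlinkSync(workerPath); } catch (_) { /* already gone */ }
});

// Fork one child per task and resolve with its answer.
function sumInChild(n) {
    return new Promise((resolve, reject) => {
        const child = fork(workerPath);
        child.once('error', reject);
        child.once('message', ({ sum }) => {
            child.disconnect(); // close the IPC channel so the child can exit
            resolve(sum);
        });
        child.send({ n });
    });
}

// The parent's event loop stays free while the child computes.
sumInChild(1e6).then((sum) => console.log('sum from child:', sum)); // 500000500000
```

Forking a child per task keeps the sketch simple; given the startup cost of each process, a long-lived worker that handles many messages may be the better design in a real server.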

summary

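The choice of method for executing an external application can be summarized as follows:

  • execFile: run an application and get its buffered output once it exits
  • spawn: run an application and stream its input and output
  • exec: run a command in a shell and get its buffered output
  • fork: run a Node.js module in a new process with a built-in message channel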
