Skip to content

Commit

Permalink
Merge pull request #6 from Edujugon/dev
Browse files Browse the repository at this point in the history
Update readme file
  • Loading branch information
Edujugon authored Mar 19, 2023
2 parents 69c6b1e + 92bffa9 commit 6a2ef0b
Show file tree
Hide file tree
Showing 2 changed files with 32 additions and 14 deletions.
42 changes: 30 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,22 +1,21 @@
# NODE Parallelizer
A NodeJS package for running code in parallel, originally created to provide multiprocessing in an AWS Lambda function but can be used on any NodeJS environment.
# Node Parallelizer
A NodeJS package for running code in parallel. Initially created to provide multiprocessing in an AWS Lambda function, it can be used in any NodeJS environment.

## Supported parallelizers
- Child Process
- Worker threads [Coming soon]

### Child Process Parallelizer
This parallelizer is specifically designed to process hundreds or thousands of records in a single invocation when your code performs both CPU-intensive and I/O-intensive operations.
Behind the scenes, it uses the NodeJS [child process module](https://nodejs.org/api/child_process.html)
This parallelizer is specifically designed for processing hundreds or thousands of records in a single invocation when your code performs both CPU-intensive and I/O-intensive operations. It uses the NodeJS [child process module](https://nodejs.org/api/child_process.html) behind the scenes.

When you call `runBatch(records)` method in this parallelizer, the package will split the list of records you provide into smaller subsets, and your code will be used to execute each subset in parallel.
When you call the `runBatch(records)` method in this parallelizer, the package will split the list of records you provide into smaller subsets, and your code will be used to execute each subset in parallel.

## AWS Lambda & Child Process Parallelizer
The package can detect the number of vCPU cores allocated to your lambda function and maximize its utilization. By default, it generates one child process per vCPU core, but this setting can be customized to meet your specific requirements. Alternatively, you can manually specify the number of child processes the library creates, regardless of the number of vCPU cores available.
The package can detect the number of vCPU cores allocated to your Lambda function and maximize their utilization. By default, it generates one child process per vCPU core, but this setting can be customized to meet your specific requirements. Alternatively, you can manually specify the number of child processes the library creates, regardless of the number of vCPU cores available.

It uses the Lambda function environment `/tmp` folder to create the required module that runs in the child.

When you call the `createChildProcessFromFile` or the `createChildProcessFromCode` methods outside of the Lambda handler function, it will reuse the child processes across the different invocations within a Lambda instance, improving performance. Furthermore, if the package detects a disconnection of any of the child processes, it will recreate it automatically without affecting the execution.
When you call the `createChildProcessFromFile` method outside of the Lambda handler function, it will reuse the child processes across the different invocations within a Lambda instance, improving performance. Furthermore, if the package detects a disconnection of any of the child processes, it will recreate it automatically without affecting the execution.

## Installation
To add this package to your dependency list, run:
Expand All @@ -25,10 +24,29 @@ To add this package to your dependency list, run:
npm i node-parallelizer --save
```
## Usage

### Child Process Parallelizer
#### Class instantiation
`ChildProcess({ tmpPath = '/tmp', maxProcesses = false, processesPerCPU = 1, debug = false })`

**Parameters**
- `tmpPath` (String) (Default value: '/tmp'): The path where the module that runs in the child will be created.
- `maxProcesses` (Number|false) (Default value: false): The maximum number of child processes that will be created. If false, it is based on the CPU cores available.
- `processesPerCPU` (Number) (Default value: 1): If the `maxProcesses` is set to `false`, this parameter defines the amount of processes per CPU.
- `debug` (Boolean) (Default value: false): Enables the internal logs for debuggin purposes.
#### Main methods
`createChildProcessFromFile({ filePath, processBatchFunctionName })`

**Parameters**
- `filePath` (String): The absolute path to the file that contains the function that will be executed with the subset.
- `processBatchFunctionName` (String): The name of the function that will be executed with the subset.

`runBatch(batch)`

**Parameters**
- `batch` (Array): The records you want to process in parallel.

#### Example 1, using child process parallizer in AWS Lambda.
**Returns** (Array): The responses of the child processes.
#### Using child process parallizer in AWS Lambda.
In this example, the repository structure looks like this
```
src/
Expand Down Expand Up @@ -57,7 +75,7 @@ module.exports.handler = async(event) => {
};

```
> Ensure that the filePath parameter is given as an absolute path. In this sample, we've added '/var/task/' to our path child code path because Lambda deploys your code in that path folder.
> Make sure to provide the filePath parameter as an absolute path. In this example, we've included '/var/task/' in the path for the child code, as Lambda deploys your code within that folder.
The below snippet represents the code you want to run in parallel
```javascript
Expand All @@ -66,7 +84,7 @@ The below snippet represents the code you want to run in parallel
const batchProcessor = ({ batch }) => {

//
// HERE MY Business logic code
// HERE YOUR CODE
//

return { success: true, count: batch.length }
Expand All @@ -76,7 +94,7 @@ const batchProcessor = ({ batch }) => {
module.exports = { batchProcessor }

```
> Ensure the input signature of your function (this case: `batchProcessor`) has `batch` as a parameter. As this contains the subset of records that a child process will process.
> Verify that the input signature of your function (in this case, batchProcessor) includes batch as a parameter, as it contains the subset of records that a child process will handle.
## Contribution
We welcome contributions to this project. If you are interested in contributing, please feel free to submit a pull request.
4 changes: 2 additions & 2 deletions package.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "node-parallelizer",
"version": "1.1.1",
"description": "A Nodejs package for running code in parallel.",
"version": "1.1.2",
"description": "A NodeJS package for running code in parallel. Initially created to provide multiprocessing in an AWS Lambda function, it can be used in any NodeJS environment.",
"main": "src/index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
Expand Down

0 comments on commit 6a2ef0b

Please sign in to comment.