V0.1.3 #4

Merged
merged 3 commits into from
Jan 11, 2024
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "mgikit"
-version = "0.1.2"
+version = "0.1.3"
edition = "2021"
authors = ["Ziad Al Bkhetan <[email protected]>"]
repository = "https://github.com/sagc-bioinformatics/mgikit"
15 changes: 15 additions & 0 deletions README.md
@@ -38,10 +38,25 @@ This command is to merge demultiplexing and quality reports from multiple lanes

<hr/>

## Installation

You can use the static binary under `bins` directly. However, if you would like to build it from the source code:

You need to have Rust and Cargo installed first; see the Rust [documentation](https://doc.rust-lang.org/cargo/getting-started/installation.html).

```bash
git clone https://github.com/sagc-bioinformatics/mgikit.git
cd mgikit
cargo build --release
```



## User Guide

Please check out the [documentation](https://sagc-bioinformatics.github.io/mgikit/).


## Commercial Use

Please contact us if you want to use the software for commercial purposes.
Binary file added bins/mgikit-V0.1.3.zip
Binary file removed bins/mgikit.zip
13 changes: 13 additions & 0 deletions docs/index.md
@@ -41,6 +41,19 @@ This command is to merge demultiplexing and quality reports from multiple lanes

<hr/>

## Installation

You can use the static binary under `bins` directly. However, if you would like to build it from the source code:

You need to have Rust and Cargo installed first; see the Rust [documentation](https://doc.rust-lang.org/cargo/getting-started/installation.html).


```bash
git clone https://github.com/sagc-bioinformatics/mgikit.git
cd mgikit
cargo build --release
```

## User Guide Table of Contents

{% include section-navigation-tiles.html type="guides" %}
123 changes: 122 additions & 1 deletion docs/pages/demultiplex.md
@@ -129,7 +129,7 @@ the number of allowed mismatches is high.

+ **`--report-level`**: The level of reporting. [default: 2]

+ **`--compression-level`**: The level of compression (between 0 and 12). 0 is fast but no compression, 9 is slow but high compression. [default: 1]
+ **`--compression-level`**: The level of compression (between 0 and 12). 0 is the fastest with no compression; 12 is the slowest with the highest compression. [default: 1]

+ **`--force`**: This flag forces the run and overwrites the output directory if it already exists.

@@ -356,6 +356,127 @@ multiqc mgikit-examples/test/

```

### Performance evaluation

Execution time (in minutes) evaluated on different datasets for paired-end and single-end demultiplexing.
DS01 and DS04 use 10 bp dual indexes, DS02 and DS03 use 8 bp dual indexes, and DS05 uses an 8 bp single index.
For the single-end runs, only the R2 file of each dataset is used for demultiplexing.

<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-g7sd{border-color:inherit;font-weight:bold;text-align:left;vertical-align:middle}
.tg .tg-uzvj{border-color:inherit;font-weight:bold;text-align:center;vertical-align:middle}
.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
<tr>
<th class="tg-g7sd" rowspan="2">Dataset</th>
<th class="tg-uzvj" rowspan="2">Reads</th>
<th class="tg-uzvj" rowspan="2">Samples</th>
<th class="tg-uzvj" colspan="2">Length (bp)</th>
<th class="tg-uzvj" colspan="2">Size (GB)</th>
<th class="tg-uzvj" rowspan="2">Paired-end (min)</th>
<th class="tg-uzvj" rowspan="2">Single-end (min)</th>
</tr>
<tr>
<th class="tg-7btt">R1</th>
<th class="tg-7btt">R2</th>
<th class="tg-7btt">R1</th>
<th class="tg-7btt">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-fymr">DS01</td>
<td class="tg-c3ow">298303014</td>
<td class="tg-c3ow">102</td>
<td class="tg-c3ow">300</td>
<td class="tg-c3ow">320</td>
<td class="tg-c3ow">76</td>
<td class="tg-c3ow">85</td>
<td class="tg-c3ow">71.5</td>
<td class="tg-c3ow">37.2</td>
</tr>
<tr>
<td class="tg-fymr">DS02</td>
<td class="tg-c3ow">494667136</td>
<td class="tg-c3ow">39</td>
<td class="tg-c3ow">148</td>
<td class="tg-c3ow">172</td>
<td class="tg-c3ow">65</td>
<td class="tg-c3ow">75</td>
<td class="tg-c3ow">61.5</td>
<td class="tg-c3ow">31.8</td>
</tr>
<tr>
<td class="tg-fymr">DS03</td>
<td class="tg-c3ow">506600595</td>
<td class="tg-c3ow">29</td>
<td class="tg-c3ow">100</td>
<td class="tg-c3ow">124</td>
<td class="tg-c3ow">46</td>
<td class="tg-c3ow">55</td>
<td class="tg-c3ow">43.5</td>
<td class="tg-c3ow">30</td>
</tr>
<tr>
<td class="tg-fymr">DS04</td>
<td class="tg-c3ow">274567350</td>
<td class="tg-c3ow">5</td>
<td class="tg-c3ow">28</td>
<td class="tg-c3ow">70</td>
<td class="tg-c3ow">8.5</td>
<td class="tg-c3ow">19</td>
<td class="tg-c3ow">13</td>
<td class="tg-c3ow">11.9</td>
</tr>
<tr>
<td class="tg-fymr">DS05</td>
<td class="tg-c3ow">500612381</td>
<td class="tg-c3ow">64</td>
<td class="tg-c3ow">50</td>
<td class="tg-c3ow">8</td>
<td class="tg-c3ow">22</td>
<td class="tg-c3ow">5.5</td>
<td class="tg-c3ow">12</td>
<td class="tg-c3ow">-</td>
</tr>
</tbody>
</table>

### Memory utilisation

The default parameters of the tool are optimised for high performance. Most of the memory is allocated to output buffering, which reduces the number of disk write operations.

The expected memory usage is influenced by four main factors:

1. Number of samples in the sample sheet.
2. Writing buffer size (`--writing-buffer-size` parameter, default is `67108864` bytes, i.e. 64 MiB).
3. Compression buffer size (`--compression-buffer-size` parameter, default is `131072` bytes, i.e. 128 KiB).
4. Single end or paired end input data.

The expected allocated memory is:

+ **Single-end input**: `number of samples * (writing buffer size + 2 * compression buffer size)`.

+ **Paired-end input**: `2 * number of samples * (writing buffer size + 2 * compression buffer size)`.

When using the default parameters (64 MiB + 2 * 128 KiB = 64.25 MiB per output):

+ **Single-end input**: `number of samples * 64.25 MiB`.

+ **Paired-end input**: `2 * number of samples * 64.25 MiB`.

Reducing the writing buffer size reduces the required memory, but it also means more frequent disk writes, which can increase the execution time.
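The formulas above can be checked with a small calculator. The following is a minimal sketch, assuming both buffer sizes are given in bytes and that memory scales exactly as described in this section; the function names (`estimated_memory_bytes`, `max_writing_buffer`, `outputs_per_sample`) are hypothetical and not part of mgikit. It also inverts the formula to find the largest `--writing-buffer-size` that fits a given memory budget.

```rust
// Hypothetical memory estimator for mgikit output buffering (not part of mgikit).
// Assumes buffer sizes are in bytes and memory follows the documented formulas.

/// Default `--writing-buffer-size` in bytes (64 MiB).
const DEFAULT_WRITING_BUFFER: u64 = 67_108_864;
/// Default `--compression-buffer-size` in bytes (128 KiB).
const DEFAULT_COMPRESSION_BUFFER: u64 = 131_072;

/// Multiplier from the formulas: 1 for single-end input, 2 for paired-end input.
fn outputs_per_sample(paired_end: bool) -> u64 {
    if paired_end { 2 } else { 1 }
}

/// Estimated bytes: multiplier * samples * (writing buffer + 2 * compression buffer).
fn estimated_memory_bytes(
    samples: u64,
    writing_buffer: u64,
    compression_buffer: u64,
    paired_end: bool,
) -> u64 {
    outputs_per_sample(paired_end) * samples * (writing_buffer + 2 * compression_buffer)
}

/// Inverts the formula: the largest writing buffer whose estimate fits in `budget` bytes.
/// Returns None when `samples` is 0 or when the compression buffers alone exceed the budget.
fn max_writing_buffer(
    budget: u64,
    samples: u64,
    compression_buffer: u64,
    paired_end: bool,
) -> Option<u64> {
    let per_output = budget.checked_div(outputs_per_sample(paired_end) * samples)?;
    per_output.checked_sub(2 * compression_buffer)
}

/// Converts bytes to MiB for display.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

fn main() {
    // 102 samples with paired-end input (the sample count of DS01 above).
    let pe = estimated_memory_bytes(102, DEFAULT_WRITING_BUFFER, DEFAULT_COMPRESSION_BUFFER, true);
    println!("102 samples, paired-end: {:.2} MiB", to_mib(pe)); // 13107.00 MiB

    // 64 samples with single-end input.
    let se = estimated_memory_bytes(64, DEFAULT_WRITING_BUFFER, DEFAULT_COMPRESSION_BUFFER, false);
    println!("64 samples, single-end: {:.2} MiB", to_mib(se)); // 4112.00 MiB

    // Largest writing buffer that keeps 102 paired-end samples within 8 GiB.
    let budget = 8 * 1024 * 1024 * 1024;
    match max_writing_buffer(budget, 102, DEFAULT_COMPRESSION_BUFFER, true) {
        Some(size) => println!("--writing-buffer-size <= {size} bytes"), // 41845378
        None => println!("budget too small for the compression buffers alone"),
    }
}
```

For example, with the defaults, 102 samples of paired-end input need about 12.8 GiB of buffer memory; shrinking the writing buffer to roughly 40 MiB would bring the estimate under 8 GiB, at the cost of more frequent disk writes.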


### Execution examples

You can use the datasets in `testing_data` to run these tests.