feat: Additional interval joins implementations #169

Open, wants to merge 6 commits into base: master
18 changes: 17 additions & 1 deletion page/content/en/docs/Algorithms/join.md
@@ -73,4 +73,20 @@ parameter. There are 2 prerequisites:

2. The node class used for storing intervals must extend `org.biodatageeks.sequila.rangejoins.methods.base.BaseNode[V]`.

Both Scala and Java classes are supported. Please use the default interval tree implementation as a reference.

---
title: "Available custom interval structures"
linkTitle: "Available custom interval structures"
weight: 3
description: >
Custom interval structures
---
| Structure | `spark.biodatageeks.rangejoin.intervalHolderClass` | Reference |
|-----------|----------------------------------------------------|-----------|
| Nested Containment List | `org.biodatageeks.sequila.rangejoins.exp.nclist.NCList` | [link](https://academic.oup.com/bioinformatics/article/23/11/1386/199545) |
| Augmented Interval List | `org.biodatageeks.sequila.rangejoins.exp.ailist.AIList` | [link](https://academic.oup.com/bioinformatics/article/35/23/4907/5509521) |
| Implicit Interval Tree | `org.biodatageeks.sequila.rangejoins.exp.iit.IITree` | [link](https://pubmed.ncbi.nlm.nih.gov/32966548/) |
| Implicit Interval Tree With Interpolation Index | `org.biodatageeks.sequila.rangejoins.exp.iitii.ImplicitIntervalTreeWithInterpolationIndex` | [link](https://github.com/mlin/iitii) |
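
To select one of these structures, its class name is assigned to the `spark.biodatageeks.rangejoin.intervalHolderClass` parameter. The following is a minimal sketch in Scala, not SeQuiLa's own API: the `IntervalStructures` object, its short names, and `confEntry` are hypothetical helpers introduced for illustration, and the commented usage line assumes an active `SparkSession` named `spark`.

```scala
// Hypothetical helper, not part of SeQuiLa: maps short names to the
// interval structure classes listed in the table above.
object IntervalStructures {
  // Configuration key taken from the table header.
  val ConfKey: String = "spark.biodatageeks.rangejoin.intervalHolderClass"

  // Short names are assumptions; class names are copied from the table.
  val classNames: Map[String, String] = Map(
    "nclist" -> "org.biodatageeks.sequila.rangejoins.exp.nclist.NCList",
    "ailist" -> "org.biodatageeks.sequila.rangejoins.exp.ailist.AIList",
    "iitree" -> "org.biodatageeks.sequila.rangejoins.exp.iit.IITree",
    "iitii"  -> "org.biodatageeks.sequila.rangejoins.exp.iitii.ImplicitIntervalTreeWithInterpolationIndex"
  )

  // Returns the (key, value) pair to set, or None for an unknown short name.
  def confEntry(shortName: String): Option[(String, String)] =
    classNames.get(shortName.toLowerCase).map(cls => ConfKey -> cls)
}

// Usage with an active SparkSession `spark` (assumed, not created here):
//   IntervalStructures.confEntry("nclist")
//     .foreach { case (key, value) => spark.conf.set(key, value) }
```

Returning an `Option` keeps a mistyped short name from silently writing an invalid class name into the session configuration.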


3 changes: 2 additions & 1 deletion page/content/en/docs/Configuration/join.md
@@ -12,4 +12,5 @@ description: >
|spark.biodatageeks.rangejoin.maxBroadcastSize| 0.1*spark.driver.memory| The maximum allowed size of the broadcast interval structure, which SeQuiLa's optimizer uses to choose the interval join [algorithm](http://biodatageeks.ii.pw.edu.pl/sequila/architecture/architecture.html#optimizations).|
|spark.biodatageeks.rangejoin.maxGap| 0 | The maximum gap between regions |
|spark.biodatageeks.rangejoin.minOverlap| 1 | The minimal length of the overlap between regions |
|spark.biodatageeks.rangejoin.intervalHolderClass|[`IntervalTreeRedBlack`](https://github.com/biodatageeks/sequila/blob/master/src/main/scala/org/biodatageeks/sequila/rangejoins/methods/IntervalTree/IntervalTreeRedBlack.java)| [Pluggable](/sequila/docs/algorithms/join/#custom-interval-structure) mechanism for implementing custom interval structures.|
|spark.biodatageeks.rangejoin.iitii.domainsNum|floor(dataset.count/4880)| Explicitly sets the number of domains into which the `ImplicitIntervalTreeWithInterpolationIndex` structure divides the dataset. This is also the number of interpolation functions used during intersection. |
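
The default for `spark.biodatageeks.rangejoin.iitii.domainsNum` is stated above as floor(dataset.count/4880). A minimal sketch of that arithmetic in Scala follows; the `IitiiDefaults` object, the `RecordsPerDomain` constant name, and `defaultDomainsNum` are hypothetical names used for illustration, not SeQuiLa's implementation.

```scala
// Hypothetical illustration of the documented default, not SeQuiLa code.
object IitiiDefaults {
  // Divisor taken from the default stated in the configuration table.
  val RecordsPerDomain: Long = 4880L

  // For a non-negative count, Long division truncates toward zero,
  // which equals floor(datasetCount / 4880).
  def defaultDomainsNum(datasetCount: Long): Long = {
    require(datasetCount >= 0, "dataset count must be non-negative")
    datasetCount / RecordsPerDomain
  }
}
```

For example, a dataset of 1,000,000 intervals gives 204 domains under this formula. A dataset with fewer than 4880 intervals gives 0; how the structure handles that case is not stated here.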
@@ -200,7 +200,7 @@ case class AlignmentsRDD(rdd: RDD[SAMRecord]) {
.sqlContext
.setConf(InternalParams.AlignmentIntervals, "")
logger.info(s"Found ${boundsOverlappingReads.length} overlapping reads")
- val tree = new IntervalHolderChromosome[TruncRead](boundsOverlappingReads, "org.biodatageeks.sequila.rangejoins.methods.IntervalTree.IntervalTreeRedBlack")
+ val tree = new IntervalHolderChromosome[TruncRead](boundsOverlappingReads, "org.biodatageeks.sequila.rangejoins.methods.IntervalTree.IntervalTreeRedBlack", Map[String,String]())
PartitionUtils.getAdjustedPartitionBounds(lowerBounds, tree, conf, contigs)
}

@@ -236,4 +236,4 @@ case class AlignmentsRDD(rdd: RDD[SAMRecord]) {
}
(adjustedAlignments, normBroadcastBounds)
}
}