I am using Deckard with the following configuration:
```shell
MIN_TOKENS='20 30 50' # can be a sequence of integers
STRIDE='inf'          # can be a sequence of integers
SIMILARITY='1'        # can be a sequence of values <= 1
```
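Because each setting accepts a space-separated sequence, one configuration can describe several runs. The sketch below assumes (this is not confirmed here) that Deckard performs one clone-detection pass per combination of `MIN_TOKENS`, `STRIDE`, and `SIMILARITY`, and simply enumerates those combinations for the configuration above. The helper name `parameter_grid` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

def parameter_grid(config: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Enumerate (MIN_TOKENS, STRIDE, SIMILARITY) combinations.

    Assumption: each setting is a space-separated list of values and
    one clone-detection pass is run per combination.
    """
    return list(product(config["MIN_TOKENS"].split(),
                        config["STRIDE"].split(),
                        config["SIMILARITY"].split()))

# Values copied from the configuration above.
config = {"MIN_TOKENS": "20 30 50", "STRIDE": "inf", "SIMILARITY": "1"}
for min_tokens, stride, similarity in parameter_grid(config):
    print(f"MIN_TOKENS={min_tokens} STRIDE={stride} SIMILARITY={similarity}")
```

Under that assumption, this configuration yields three runs, one per `MIN_TOKENS` value.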
I set `STRIDE` to `inf` because:
- "The setting with an infinite stride means that vector merging was disabled." (reference)
- "If stride is set to infinity, only non-overlapping and syntactically complete pieces of code (e.g., a complete if statement or a complete for statement) are considered for clones." (reference)
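The quoted statements describe vector merging only at a high level. As a toy model (an assumption for illustration, not Deckard's implementation), the sketch below treats each sibling statement as a characteristic vector of node counts, sums consecutive siblings until a window counts at least `min_tokens` nodes, and starts a new window every `stride` siblings. With an infinite stride no merged vectors are produced, so only the vectors of individual, syntactically complete subtrees remain. The function `merge_siblings` and its window rule are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[int]

def add(a: Sequence[int], b: Sequence[int]) -> Vector:
    """Element-wise sum of two characteristic vectors."""
    return [x + y for x, y in zip(a, b)]

def merge_siblings(siblings: List[Vector], min_tokens: int, stride: float) -> List[Vector]:
    """Toy model of vector merging over a run of sibling statements (hypothetical).

    A window starts at every `stride`-th sibling and absorbs following siblings
    until its summed vector counts at least `min_tokens` nodes. A window that
    never spans more than one sibling adds nothing new, because that single
    subtree already has its own vector. With stride = math.inf, merging is off.
    """
    if math.isinf(stride):
        return []  # merging disabled: only per-subtree vectors are kept
    step = int(stride)
    merged: List[Vector] = []
    for start in range(0, len(siblings), step):
        acc = [0] * len(siblings[start])
        for end in range(start, len(siblings)):
            acc = add(acc, siblings[end])
            if sum(acc) >= min_tokens:
                if end > start:  # only multi-sibling windows are new vectors
                    merged.append(acc)
                break
    return merged

siblings = [[2, 1], [3, 0], [1, 4]]  # hypothetical per-statement node counts
print(merge_siblings(siblings, min_tokens=5, stride=1))         # [[5, 1], [4, 4]]
print(merge_siblings(siblings, min_tokens=5, stride=math.inf))  # []
```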
After running Deckard, I noticed that some clusters in `post_cluster_*` contain code blocks of vastly different lengths. For example, the following cluster includes one block with 1057 lines while the other has only 110 lines:
Since `dist:0.0` suggests the two blocks are considered identical clones, I would like to understand why blocks of such different lengths are grouped together.
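Deckard's design, as described in its paper, compares characteristic vectors (counts of relevant syntax-node kinds in a subtree) by Euclidean distance. The sketch below shows that metric on hypothetical vectors; whether the `dist` value in `post_cluster_*` is computed exactly this way is an assumption. The function names and the four node kinds are illustrative only.

```python
import math
from typing import Sequence

def euclidean(u: Sequence[int], v: Sequence[int]) -> float:
    """Euclidean distance between two characteristic vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def node_count(u: Sequence[int]) -> int:
    """Total number of counted nodes covered by a characteristic vector."""
    return sum(u)

# Hypothetical vectors: counts of (if, for, call, assignment) nodes.
small = [3, 1, 12, 20]
large = [30, 10, 120, 200]  # same proportions, ten times the size

print(euclidean(small, small))  # 0.0: identical counts
print(euclidean(small, large))  # positive: same "shape", very different size
print(node_count(small), node_count(large))  # 36 360
```

Under this metric, a distance of exactly zero means every node-kind count matches, so both vectors cover the same total number of counted nodes.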
Here are the original files in the above cluster:
files.zip
I also tried setting `STRIDE='2'`, but the large variation in block lengths within clusters persists.
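To compare runs such as `STRIDE='inf'` and `STRIDE='2'` more systematically, a small script can count clusters whose block lengths differ widely. This is a sketch under stated assumptions: it assumes clusters in a `post_cluster_*` file are separated by blank lines and that each entry contains a `LINE:<start>:<count>` field whose second number is the block length in lines. Both the format and the names `read_clusters` and `uneven_clusters` are hypothetical; adjust the regex to the actual output.

```python
import re
import sys
from typing import List

# Assumed entry format (hypothetical): "... FILE <path> LINE:<start>:<count> ..."
LINE_FIELD = re.compile(r"LINE:(\d+):(\d+)")

def read_clusters(text: str) -> List[List[int]]:
    """Split post_cluster text into clusters of block lengths (in lines).

    Assumes clusters are separated by blank lines and that the second
    number in each LINE field is the block length.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                clusters.append(current)
                current = []
            continue
        match = LINE_FIELD.search(line)
        if match:
            current.append(int(match.group(2)))
    if current:
        clusters.append(current)
    return clusters

def uneven_clusters(clusters: List[List[int]], ratio: float = 2.0) -> List[List[int]]:
    """Return clusters whose longest block is more than `ratio` times the shortest."""
    return [c for c in clusters if len(c) > 1 and max(c) > ratio * min(c)]

if __name__ == "__main__":
    # Usage: python find_uneven.py post_cluster_*   (script name is hypothetical)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            flagged = uneven_clusters(read_clusters(fh.read()))
        print(f"{path}: {len(flagged)} cluster(s) with uneven block lengths")
```

The 2x threshold is an arbitrary default; the 1057-line versus 110-line cluster described above would be flagged at any ratio below about 9.6.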
Questions