A toolbox of consistent weighted sampling algorithms for weighted Min-Hash.

drhash-cn/consistent-weighted-sampling


consistent-weighted-sampling

The weighted MinHash algorithms based on Consistent Weighted Sampling

The algorithms convert each weighted set into a hash code for similarity-based data mining and machine learning tasks, e.g., classification and retrieval. The similarity between two weighted sets is then estimated by the pairwise Hamming similarity between their hash codes.
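As a rough illustration of this pipeline, the sketch below hashes two weighted sets and compares their hash codes. It is not the code in this repository and it does not implement CCWS, PCWS, or I2CWS. It uses the earlier ICWS scheme (Ioffe, 2010) as a stand-in, and the function names `icws_hash` and `hamming_similarity`, the dense-vector input format, and the `seed` parameter are all assumptions made for the example.

```python
import numpy as np


def icws_hash(weights, num_hashes, seed=0):
    """Sketch of ICWS (Ioffe, 2010); NOT this repository's API.

    weights: 1-D array of non-negative weights, one per element of the universe.
    Returns an int array of shape (num_hashes, 2) whose rows are (k, t_k) pairs.
    """
    weights = np.asarray(weights, dtype=float)
    dim = weights.shape[0]

    # The same seed must be used for every set so that the random variables
    # attached to (hash index, element index) are shared across sets.
    rng = np.random.default_rng(seed)
    r = rng.gamma(2.0, 1.0, size=(num_hashes, dim))
    c = rng.gamma(2.0, 1.0, size=(num_hashes, dim))
    beta = rng.uniform(0.0, 1.0, size=(num_hashes, dim))

    active = np.flatnonzero(weights > 0)  # only elements with positive weight
    if active.size == 0:
        raise ValueError("the weighted set must contain a positive weight")

    r_a, c_a, beta_a = r[:, active], c[:, active], beta[:, active]
    t = np.floor(np.log(weights[active]) / r_a + beta_a)
    ln_y = r_a * (t - beta_a)            # ln y_k, where y_k <= weight of k
    ln_a = np.log(c_a) - ln_y - r_a      # ln a_k = ln(c_k / (y_k * exp(r_k)))

    best = np.argmin(ln_a, axis=1)       # element with the smallest a_k
    rows = np.arange(num_hashes)
    return np.stack([active[best], t[rows, best].astype(np.int64)], axis=1)


def hamming_similarity(code_a, code_b):
    """Fraction of hash positions at which both components (k, t_k) agree."""
    return float(np.mean(np.all(code_a == code_b, axis=1)))


if __name__ == "__main__":
    set_a = [1.0, 2.0, 0.0, 3.0]
    set_b = [1.0, 1.0, 1.0, 3.0]
    code_a = icws_hash(set_a, num_hashes=2000, seed=7)
    code_b = icws_hash(set_b, num_hashes=2000, seed=7)
    # The estimate should be close to the generalized Jaccard similarity
    # sum(min(a, b)) / sum(max(a, b)) of the two weighted sets.
    print(hamming_similarity(code_a, code_b))
```

Under these assumptions, the probability that two sets collide at a given hash position equals their generalized Jaccard similarity, so the Hamming similarity becomes a more accurate estimate as `num_hashes` grows.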

Here, we develop three algorithms:

  • CCWS. Wei Wu, Bin Li, Ling Chen, Chengqi Zhang. (2016). Canonical Consistent Weighted Sampling for Real-Value Weighted Min-Hash. Proceedings of the 16th International Conference on Data Mining. 1287-1292.
  • PCWS. Wei Wu, Bin Li, Ling Chen, Chengqi Zhang. (2017). Consistent Weighted Sampling Made More Practical. Proceedings of the 26th International World Wide Web Conference. 1035-1043.
  • I2CWS. Wei Wu, Bin Li, Ling Chen, Chengqi Zhang, Philip S. Yu. (2019). Improved Consistent Weighted Sampling Revisited. IEEE Transactions on Knowledge and Data Engineering. 31(12):2332-2345.

If you use our algorithms in your research, please cite the following papers in your publications:

@inproceedings{wu2016canonical,
  title={{C}anonical {C}onsistent {W}eighted {S}ampling for {R}eal-{V}alue {W}eighted {M}in-{H}ash},
  author={Wu, Wei and Li, Bin and Chen, Ling and Zhang, Chengqi},
  booktitle={ICDM},
  pages={1287--1292},
  year={2016}
}

@inproceedings{wu2017consistent,
  title={{C}onsistent {W}eighted {S}ampling {M}ade {M}ore {P}ractical},
  author={Wu, Wei and Li, Bin and Chen, Ling and Zhang, Chengqi},
  booktitle={WWW},
  pages={1035--1043},
  year={2017}
}

@article{wu2017improved,
  title={{I}mproved {C}onsistent {W}eighted {S}ampling {R}evisited},
  author={Wu, Wei and Li, Bin and Chen, Ling and Zhang, Chengqi and Yu, Philip S.},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  volume={31},
  number={12},
  pages={2332--2345},
  year={2019}
}

@article{wu2020review,
  title={{A} {R}eview for {W}eighted {M}in{H}ash {A}lgorithms},
  author={Wu, Wei and Li, Bin and Chen, Ling and Gao, Junbin and Zhang, Chengqi},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  volume={34},
  number={6},
  pages={2553--2573},
  year={2022}
}
