Skip to content
/ rdupes Public

Find duplicate files and show them in an interactive GUI

Notifications You must be signed in to change notification settings

rizsi/rdupes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 

Repository files navigation

rdupes

Find duplicate files and show them in an interactive GUI.

Typical usage is to find and remove duplicates in a backup folder or to check whether files in a machine are backed up already or not.

Download and Install

Download the latest version from: http://rizsi.com/programs/rdupes/

The program need not be installed but can be run as a user. Dependencies required:

  • java 8 JRE

  • JavaFX - which is a separate package on many Linux distros

  • xgd-open - just to lauch this help page from the about box

Install the dependencies on Ubuntu 16:04 or later:

$ sudo apt-get install default-jre openjfx

Or with this in case default jre is not version 8 yet: (On 14.04 there is no JRE 8 yet. It can be installed from PPA or by downloading from Oracle and installing manually.)

$ sudo apt-get install openjdk-8-jre openjfx

Usage

Run the program with:

$ java -jar rdupes.jar

Drag a folder from a file browser into the main area of the window to load it. Or use the "File/Add root folder…​" menu as an alternative.

Multiple folders can be dragged. They can be removed (from the view they are not deleted by the tool) by using the context menu on the root folders.

Edit the files to achieve your goals (Get rid of all duplicates, make sure the backup folder contains everything, etc). The UI is auomatically updated.

If you want to use the trash function then set up a trach folder for the root folder using right click on it. Then files and folders can be moved to the trash folder by selecting the "Trash" action in the context menu.

Files and folders can be omitted from the duplicate search by adding ".rdupesignore" files to folders. The sematics of these files is similar to gitignore. The folders are automatically re-loaded when such a file is updated.

Multiple windows can be opened onto the same in-memory model. It can be useful when two folders are edited at the same time. Use The menu: File/Create new window. The program finishes when all windows are closed.

rdupes00

Explanation of the UI:

  • Each node in the tree is a folder or a file.

  • fileName 5, 15 bytes means that there are 5 files that uses 15 bytes within the subtree.

  • [3, 10 bytes] means that there are 3 duplicate files that use 10 bytes storage together. If there are no duplicates the [] is omitted. When there is at least one duplicate in a subtree it is written with red fonts.

  • → filePath shows the path of the similar files

  • green background means that all files within that subtree have a copy in a different subtree. (Deleting that subtree results in no data loss.)

Use cases

Get rid of duplicates in a folder - for example in a photos or a music collection

Open rdupes on the folder. Wait until all files are cached then browse the tree for duplicates. Use extrnal tools to delete to files or the internal trash function to move them to a separate folder. The trash folder then can be deleted by hand.

In case there is an intentional duplicate then create a .rdupesignore file containing its name within the parent folder. (Only one of the duplicates need to be ignored so further duplicates will still be found.)

Make sure that a backup disk contains all files from your PC

Load the PC folder and the backup folder at the same time. Files and folders have to be copied into the backup until all folders (and also the root folder) are green in the source folder.

In case there are files or folders which need not be backed up then create an .rdupesignore file containing them.

Make sure that two backups are similar

Make sure that both are duplicate free. Then load them at once and edit them until both are green (without creating duplicates). Double check that there are no duplicates within any of the backups - by loading them and only them at once.

tweaks

The number of hashing threads can be changed by entering a number into that text area. The number of physical cores is a good guess. Hyperthreading cores are not very useful (according to my measurements) because hashing is very memory intensive.

Features

  • Visits recursively all files within the root folders and finds the ones with similar size and md5 sum.

  • Symlinks are ignored either for folders and files. (They are not visited and shown just as if they were not there.)

  • Shows the files in a tree view. The tree view is automatically refreshed when the files are updated in the file system.

  • Show sub-collisions in the root folder. Just like error markers are propagated upwards in tree-editors. Number of sub-collisions is marked.

  • Possibility to ignore files (can be used mark duplicates as accepted). .rdupesignore files are handled as gitignore files in git.

  • Add/remove folders dynamically without restart. This is important because restart means re-hashing all the files.

  • Opening multiple views is possible. Useful when two folders are reviewed simultaneously.

  • Multithreaded hash counting using effective java.nio API of MessageDigest

  • Folders visiting, hash counting, etc is done on background thread and progress feedback is shown in the main window.

  • Feedback of heap allocation and GC on request

  • Deleting duplicates is possible by moving to a "trash" folder which is set up on the root folder. Action is accessible as a context menu.

  • add folders by drag and drop - validate for not adding subfolders of each other

  • Warning popup screen for possible data loss using the program

  • Validation view to check if all files within a folder are present in the other folder. (Are all files backed up?) Such folders have green background. Dupe files also have a green background.

Planned features

  • foldable Settings view

  • Sane initial value for n threads of hash counting (Use https://github.com/veddan/java-physical-cores )

  • easy set up of trash folder

  • two pane view - easy move folders from one to the other

  • Also find similar folders - hash of folder is the hash of the files in alphabetic filename order

About

Find duplicate files and show them in an interactive GUI

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published