Add a remote command for batch duplicate finding. #1524

Open. Wants to merge 3 commits into base: master
2 changes: 1 addition & 1 deletion auto-complete/geeqie
@@ -4,7 +4,7 @@ file_types='@(3fr|ani|arw|avif|bmp|cr2|cr3|crw|cur|dds|djvu|dng|erf|exr|.fits|fi

actions='About AddMark0 AddMark1 AddMark2 AddMark3 AddMark4 AddMark5 AddMark6 AddMark7 AddMark8 AddMark9 AlterNone Animate Back ClearMarks CloseWindow ColorProfile0 ColorProfile1 ColorProfile2 ColorProfile3 ColorProfile4 ColorProfile5 ConnectZoom100 ConnectZoom200 ConnectZoom25 ConnectZoom300 ConnectZoom33 ConnectZoom400 ConnectZoom50 ConnectZoomFillHor ConnectZoomFillVert ConnectZoomFit ConnectZoomIn ConnectZoomOut Copy CopyImage CopyPath CopyPathUnquoted CropFourThree CropNone CropOneOne CropRectangle CropSixteenNine CropThreeTwo CutPath Delete DeleteWindow DrawRectangle Escape ExifRotate ExifWin FilterMark0 FilterMark1 FilterMark2 FilterMark3 FilterMark4 FilterMark5 FilterMark6 FilterMark7 FilterMark8 FilterMark9 FindDupes FirstImage FirstPage Flip FloatTools FolderTree Forward FullScreen Grayscale HelpChangeLog HelpContents HelpKbd HelpNotes HelpPdf HelpSearch HelpShortcuts HideBars HideSelectableToolbars HideTools HistogramChanB HistogramChanCycle HistogramChanG HistogramChanR HistogramChanRGB HistogramChanV HistogramModeCycle HistogramModeLin HistogramModeLog Home IgnoreAlpha ImageBack ImageForward ImageHistogram ImageOverlay ImageOverlayCycle IntMark0 IntMark1 IntMark2 IntMark3 IntMark4 IntMark5 IntMark6 IntMark7 IntMark8 IntMark9 KeywordAutocomplete LastImage LastPage LayoutConfig LogWindow Maintenance Mark0 Mark1 Mark2 Mark3 Mark4 Mark5 Mark6 Mark7 Mark8 Mark9 Mirror Move NewCollection NewFolder NewWindow NewWindowDefault NewWindowFromCurrent NextImage NextPage OpenArchive OpenCollection OpenRecent OpenWith OverUnderExposed PanView PermanentDelete Plugins Preferences PrevImage PrevPage Print Quit Rating0 Rating1 Rating2 Rating3 Rating4 Rating5 RatingM1 RectangularSelection Refresh Rename RenameWindow ResetMark0 ResetMark1 ResetMark2 ResetMark3 ResetMark4 ResetMark5 ResetMark6 ResetMark7 ResetMark8 ResetMark9 Rotate180 RotateCCW RotateCW SBar SBarSort SaveMetadata Search SearchAndRunCommand SelectAll SelectInvert SelectMark0 SelectMark1 SelectMark2 
SelectMark3 SelectMark4 SelectMark5 SelectMark6 SelectMark7 SelectMark8 SelectMark9 SelectNone SetMark0 SetMark1 SetMark2 SetMark3 SetMark4 SetMark5 SetMark6 SetMark7 SetMark8 SetMark9 ShowFileFilter ShowInfoPixel ShowMarks SlideShow SlideShowFaster SlideShowPause SlideShowSlower SplitDownPane SplitHorizontal SplitNextPane SplitPaneSync SplitPreviousPane SplitQuad SplitSingle SplitTriple SplitUpPane SplitVertical StereoAuto StereoCross StereoCycle StereoOff StereoSBS Thumbnails ToggleMark0 ToggleMark1 ToggleMark2 ToggleMark3 ToggleMark4 ToggleMark5 ToggleMark6 ToggleMark7 ToggleMark8 ToggleMark9 UnselMark0 UnselMark1 UnselMark2 UnselMark3 UnselMark4 UnselMark5 UnselMark6 UnselMark7 UnselMark8 UnselMark9 Up UseColorProfiles UseImageProfile ViewIcons ViewInNewWindow ViewList WriteRotation WriteRotationKeepDate Zoom100 Zoom200 Zoom25 Zoom300 Zoom33 Zoom400 Zoom50 ZoomFillHor ZoomFillVert ZoomFit ZoomIn ZoomOut ZoomToRectangle'

options='--action= --action-list --back --cache-metadata --cache-render= --cache-render-recurse= --cache-render-shared= --cache-render-shared-recurse= --cache-shared= --cache-thumbs= --close-window --config-load= --debug= --delay= --file= --File= --file-extensions --first --fullscreen --geometry= --get-collection= --get-collection-list --get-destination= --get-file-info --get-filelist= --get-filelist-recurse= --get-rectangle --get-render-intent --get-selection --get-sidecars= --get-window-list --grep= --id= --last --log-file= --lua= --new-window --next --pixel-info --print0 --quit --raise --selection-add= --selection-clear --selection-remove= --show-log-window --slideshow --slideshow-recurse= --tell --tools --view= --version'
options='--action= --action-list --back --cache-metadata --cache-render= --cache-render-recurse= --cache-render-shared= --cache-render-shared-recurse= --cache-shared= --cache-thumbs= --close-window --config-load= --debug= --delay= --duplicates-process --duplicates-program= --duplicates-threshold= --file= --File= --file-extensions --first --fullscreen --geometry= --get-collection= --get-collection-list --get-destination= --get-file-info --get-filelist= --get-filelist-recurse= --get-rectangle --get-render-intent --get-selection --get-sidecars= --get-window-list --grep= --id= --last --log-file= --lua= --new-window --next --pixel-info --print0 --quit --raise --selection-add= --selection-clear --selection-remove= --show-log-window --slideshow --slideshow-recurse= --tell --tools --view= --version'

_geeqie()
{
114 changes: 114 additions & 0 deletions src/command-line-handling.cc
@@ -20,6 +20,10 @@

#include "command-line-handling.h"

#include <sys/wait.h>

#include <cstring>
#include <map>
#include <vector>

#include "cache-maint.h"
@@ -40,6 +44,7 @@
#include "main-defines.h"
#include "main.h"
#include "misc.h"
#include "pic-equiv.h"
#include "pixbuf-renderer.h"
#include "rcfile.h"
#include "secure-save.h"
@@ -407,6 +412,112 @@ void gq_delay(GtkApplication *, GApplicationCommandLine *app_command_line, GVari
options->slideshow.delay = static_cast<gint>(n * 10.0 + 0.01);
}

void gq_duplicates_process(GtkApplication *, GApplicationCommandLine *, GVariantDict *, GList *file_list)
{
std::map<std::string, std::unique_ptr<pic_equiv>> pics;
Contributor:

unique_ptr seems redundant.

Author (@porridge, Jan 8, 2025):
Thanks for the review, @qarkai.
I addressed the other comments, but with my rudimentary C++ skills I cannot see how to address this one:

  • replacing with pic_equiv * would leak memory, right?
  • replacing with pic_equiv does not work because the assignment operator is deleted:

    ../src/command-line-handling.cc: In function ‘void {anonymous}::gq_duplicates_process(GtkApplication*, GApplicationCommandLine*, GVariantDict*, GList*)’:
    ../src/command-line-handling.cc:422:42: error: use of deleted function ‘pic_equiv& pic_equiv::operator=(const pic_equiv&)’
      422 |                 pics[name] = pic_equiv(fd);

and it's deleted because of the sim member.

It also seems to me like using a pointer is better than copying the sim structure around...

Contributor:

I see. ImageSimilarityData lacks constructors and move operators. OK, let's stay with unique_ptr for now.
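
The compile error quoted in this thread comes from `pics[name] = pic_equiv(fd)`, which needs an assignment operator on the mapped type. A minimal sketch of the two alternatives that do compile for a non-copyable type follows. `Resource` is a hypothetical stand-in for `pic_equiv`, and the helper names are invented for illustration. Option 1 mirrors what the PR does; option 2 uses `std::map::try_emplace` (C++17), and whether that would suit Geeqie's code base is an assumption the thread does not settle.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for pic_equiv: owns a raw pointer, so copying is deleted.
class Resource {
public:
    explicit Resource(const char *path) : name(path), data(new int(42)) {}
    ~Resource() { delete data; }
    Resource(const Resource &) = delete;
    Resource &operator=(const Resource &) = delete;

    std::string name;

private:
    int *data;
};

// Option 1 (the PR's approach): the map owns heap-allocated objects.
// operator[] default-constructs an empty unique_ptr, which is then assigned.
std::map<std::string, std::unique_ptr<Resource>> load_with_unique_ptr(const char *const *paths, int n)
{
    std::map<std::string, std::unique_ptr<Resource>> out;
    for (int i = 0; i < n; ++i)
        out[paths[i]] = std::make_unique<Resource>(paths[i]);
    return out;
}

// Option 2: construct the value directly inside the map node, so no copy,
// move, or default constructor is needed on Resource.
std::map<std::string, Resource> load_in_place(const char *const *paths, int n)
{
    std::map<std::string, Resource> out;
    for (int i = 0; i < n; ++i)
        out.try_emplace(paths[i], paths[i]);
    return out;
}
```

With option 2, lookups return `Resource &` rather than a pointer, so call sites would use `.` instead of `->`.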

for (GList *work = file_list; work; work = work->next)
{
const char *fd = static_cast<const char *>(work->data);
std::string name(fd);
pics[name] = std::make_unique<pic_equiv>(fd);
}
DEBUG_1("processing %zu files in set", pics.size());

// Compute similarity score for every pair, build equivalence sets.
for (auto outer_iter = pics.begin(); outer_iter != pics.end(); ++outer_iter)
{
for (auto inner_iter = std::next(outer_iter); inner_iter != pics.end(); ++inner_iter)
{
double similarity = outer_iter->second->compare(*inner_iter->second);
DEBUG_1("%s vs %s: %f", outer_iter->second->name.c_str(), inner_iter->second->name.c_str(), similarity);
if (similarity < options->duplicates_similarity_threshold)
continue;
outer_iter->second->equivalent.insert(inner_iter->second->equivalent.begin(), inner_iter->second->equivalent.end());
for (auto const &sibling : outer_iter->second->equivalent)
{
pics[sibling]->equivalent.insert(outer_iter->second->equivalent.begin(), outer_iter->second->equivalent.end());
}
}
}

std::set<std::string> processed;
for (auto const &pic : pics)
{
if (pic.second->equivalent.size() < 2)
// skip this pic if not similar to any other one but itself
continue;
if (processed.find(pic.second->name) != processed.end())
// skip this pic if it was already processed (when processing a similar image)
continue;
std::vector<const char *> cmd;
cmd.push_back(options->duplicates_program);
for (auto const &sibling : pic.second->equivalent)
{
cmd.push_back(sibling.c_str());
processed.insert(sibling);
}
cmd.push_back(nullptr);
pid_t pid = fork();
if (pid == -1)
{
log_printf("failed creating child process: %s\n", strerror(errno));
return;
}
if (pid == 0)
{
execvp(const_cast<char *>(cmd[0]), const_cast<char **>(cmd.data()));
perror("execvp");
exit(1);
}
else
{
int status;
wait(&status);
if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
{
log_printf("subprocess failed, aborting further duplicate processing\n");
return;
}
}
}
}
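
For each group of duplicates, the handler above forks, executes the configured program with the file names as arguments, waits for it, and stops at the first failure. The same pattern can be sketched as a standalone helper. `run_and_wait` is a hypothetical name, and the sketch departs from the PR in two ways that are assumptions on my part rather than anything the review asked for: it calls `waitpid` on the specific child instead of `wait`, and the child calls `_exit` after a failed exec so that the parent's `atexit` handlers and stdio buffers are not run a second time.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: runs `program` with `args` and returns true only if
// the child exited normally with status 0.
bool run_and_wait(const std::string &program, const std::vector<std::string> &args)
{
    // Build a NULL-terminated argv; the strings outlive the exec call.
    std::vector<char *> argv;
    argv.push_back(const_cast<char *>(program.c_str()));
    for (const auto &arg : args)
        argv.push_back(const_cast<char *>(arg.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid == -1)
    {
        perror("fork");
        return false;
    }
    if (pid == 0)
    {
        execvp(argv[0], argv.data());
        // Only reached if exec failed.
        perror("execvp");
        _exit(127);
    }

    int status = 0;
    // Wait for this child only; retry if interrupted by a signal.
    while (waitpid(pid, &status, 0) == -1)
    {
        if (errno != EINTR)
            return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A caller would invoke it once per group, for example `run_and_wait("echo", {"a.jpg", "b.jpg"})`, and stop iterating when it returns false.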

void gq_duplicates_program(GtkApplication *, GApplicationCommandLine *, GVariantDict *command_line_options_dict, GList *)
{
const gchar *text = nullptr;

g_variant_dict_lookup(command_line_options_dict, "duplicates-program", "&s", &text);

g_free(options->duplicates_program);
options->duplicates_program = g_strdup(text);
DEBUG_1("duplicates program set to \"%s\"", options->duplicates_program);
}

void gq_duplicates_threshold(GtkApplication *, GApplicationCommandLine *, GVariantDict *command_line_options_dict, GList *)
{
const gint thresh_min = 0;
const gint thresh_max = 100;
gint thresh = 0;
gboolean res;

res = g_variant_dict_lookup(command_line_options_dict, "duplicates-threshold", "i", &thresh);
if (res)
{
if (thresh < thresh_min || thresh > thresh_max)
{
printf_term(TRUE, "Image similarity threshold " BOLD_ON "%d" BOLD_OFF " out of range (%d to %d)\n", thresh, thresh_min, thresh_max);
return;
}
}
else
{
thresh = 99;
}

options->duplicates_similarity_threshold = static_cast<guint>(thresh);
DEBUG_1("threshold set to %d", options->duplicates_similarity_threshold);
}

void file_load_no_raise(const gchar *text, GApplicationCommandLine *app_command_line)
{
gchar *filename;
@@ -1487,6 +1598,9 @@ CommandLineOptionEntry command_line_options[] =
{ "debug", gq_debug, PRIMARY_REMOTE, GUI },
#endif
{ "delay", gq_delay, PRIMARY_REMOTE, GUI },
{ "duplicates-process", gq_duplicates_process, PRIMARY_REMOTE, TEXT },
{ "duplicates-program", gq_duplicates_program, PRIMARY_REMOTE, GUI },
{ "duplicates-threshold", gq_duplicates_threshold, PRIMARY_REMOTE, GUI },
{ "file", gq_file, PRIMARY_REMOTE, GUI },
{ "File", gq_File, PRIMARY_REMOTE, GUI },
{ "file-extensions", gq_file_extensions, PRIMARY_REMOTE, TEXT },
3 changes: 3 additions & 0 deletions src/main.cc
@@ -147,6 +147,9 @@ GOptionEntry command_line_options[] =
{ "debug" , 0, G_OPTION_FLAG_NONE, G_OPTION_ARG_INT , nullptr, _("turn on debug output") , "[level]" },
#endif
{ "delay" , 'd', G_OPTION_FLAG_NONE, G_OPTION_ARG_STRING, nullptr, _("set slide show delay to Hrs Mins N.M seconds,") , "<[H:][M:][N][.M]>" },
{ "duplicates-process" , 'p', G_OPTION_FLAG_NONE, G_OPTION_ARG_NONE , nullptr, _("group duplicate pictures in current collection and process them") , nullptr },
{ "duplicates-program" , 0, G_OPTION_FLAG_NONE, G_OPTION_ARG_STRING, nullptr, _("run program with each identified set of duplicate images, by default 'echo'") , "<PROGRAM>" },
{ "duplicates-threshold" , 0, G_OPTION_FLAG_NONE, G_OPTION_ARG_INT, nullptr, _("set similarity threshold (0-100) for what is considered a duplicate") , "<N>" },
{ "file" , 0, G_OPTION_FLAG_NONE, G_OPTION_ARG_STRING, nullptr, _("open FILE or URL bring Geeqie window to the top") , "<FILE>|<URL>" },
{ "File" , 0, G_OPTION_FLAG_NONE, G_OPTION_ARG_STRING, nullptr, _("open FILE or URL do not bring Geeqie window to the top") , "<FILE>|<URL>" },
{ "file-extensions" , 0, G_OPTION_FLAG_NONE, G_OPTION_ARG_NONE , nullptr, _("list known file extensions") , nullptr },
2 changes: 2 additions & 0 deletions src/meson.build
@@ -130,6 +130,8 @@ main_sources = files('advanced-exif.cc',
'osd.cc',
'osd.h',
'pan-view.h',
'pic-equiv.cc',
'pic-equiv.h',
'pixbuf-renderer.cc',
'pixbuf-renderer.h',
'pixbuf-util.cc',
1 change: 1 addition & 0 deletions src/options.cc
@@ -66,6 +66,7 @@ ConfOptions *init_options(ConfOptions *options)
options->dnd_icon_size = 48;
options->dnd_default_action = DND_ACTION_ASK;
options->duplicates_similarity_threshold = 99;
options->duplicates_program = g_strdup("echo");
options->rot_invariant_sim = TRUE;
options->sort_totals = FALSE;
options->rectangle_draw_aspect_ratio = RECTANGLE_DRAW_ASPECT_RATIO_NONE;
1 change: 1 addition & 0 deletions src/options.h
@@ -79,6 +79,7 @@ struct ConfOptions

guint duplicates_similarity_threshold;
guint duplicates_match;
gchar *duplicates_program;
gboolean duplicates_thumbnails;
guint duplicates_select_type;
gboolean rot_invariant_sim;
68 changes: 68 additions & 0 deletions src/pic-equiv.cc
@@ -0,0 +1,68 @@
/*
* Copyright (C) 2024 The Geeqie Team
*
* Author: Marcin Owsiany <[email protected]>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*
*
* Helper class for computing equivalence sets of pictures.
*
*/

#include "pic-equiv.h"

/**
* @param cname path to picture file to represent
*/
pic_equiv::pic_equiv(char const *cname) : name(cname), equivalent{name}, sim(load_image_sim(cname)) {}

ImageSimilarityData *pic_equiv::load_image_sim(const char *cname)
{
g_autoptr(GError) error = nullptr;
g_autoptr(GdkPixbuf) buf = gdk_pixbuf_new_from_file(cname, &error);
if (error)
{
fprintf(stderr, "Unable to read file %s: %s\n", cname, error->message);
return nullptr;
}
return image_sim_new_from_pixbuf(buf);
}

pic_equiv::~pic_equiv()
{
if (sim != nullptr)
image_sim_free(sim);
}

/**
* @brief orders two pic_equiv objects, according to the paths they represent
*/
int operator<(const pic_equiv &a, const pic_equiv &b)
{
return a.name < b.name;
}

/**
* @brief compares this pic_equiv object to another
* @param other object to compare to
* @returns a number between 0 and 100 which denotes visual similarity
*/
gdouble pic_equiv::compare(const pic_equiv &other)
{
if (sim == nullptr || other.sim == nullptr)
return 0.0;
return 100.0 * image_sim_compare(sim, other.sim);
}
50 changes: 50 additions & 0 deletions src/pic-equiv.h
@@ -0,0 +1,50 @@
/*
* Copyright (C) 2024 The Geeqie Team
*
* Author: Marcin Owsiany <[email protected]>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*
*
* Helper class for computing equivalence sets of pictures.
*
*/

#include <set>
#include <string>

#include <gdk-pixbuf/gdk-pixbuf.h>
#include <glib/gtypes.h>

#include "similar.h"

/**
* @class pic_equiv
* @brief holds a picture's similarity data, as well as its equivalence set, and allows comparing pictures.
*/
class pic_equiv {
public:
explicit pic_equiv(char const *cname);
~pic_equiv();
pic_equiv(const pic_equiv& other) = delete;
pic_equiv& operator=(const pic_equiv& other) = delete;
Contributor:

I guess the pic_equiv assignment operator could be restored by using a shared_ptr with a custom deleter for the sim member.

And one more thing: would you mind renaming the class to PicEquiv or something similar, for naming consistency?
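
The shared_ptr suggestion in this comment can be sketched roughly as follows. `Sim`, `sim_new`, `sim_free` and `Pic` are hypothetical stand-ins for `ImageSimilarityData`, its allocation and free functions, and `pic_equiv`; the `live_sims` counter exists only to make the ownership visible. Once the resource is held through a `std::shared_ptr` with a custom deleter, the compiler-generated copy constructor and assignment operator become safe, so they no longer need to be deleted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical C-style resource standing in for ImageSimilarityData.
struct Sim { int value; };
static int live_sims = 0;  // outstanding allocations, for illustration only
Sim *sim_new(int value) { ++live_sims; return new Sim{value}; }
void sim_free(Sim *sim) { --live_sims; delete sim; }

// Copyable wrapper: copies share one Sim, which is released through
// sim_free when the last owner is destroyed.
class Pic {
public:
    Pic() = default;
    Pic(std::string path, int value)
        : name(std::move(path)), sim(sim_new(value), sim_free) {}

    std::string name;
    std::shared_ptr<Sim> sim;
};
```

Copies share the similarity data instead of duplicating it, which also answers the author's concern about copying the sim structure around. Since `load_image_sim` in the PR can return `nullptr`, a real deleter would need to tolerate a null pointer.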

gdouble compare(const pic_equiv&);
std::string name;
std::set<std::string> equivalent;
private:
ImageSimilarityData *sim;
static ImageSimilarityData *load_image_sim(const char *cname);
friend int operator<(const pic_equiv &a, const pic_equiv &b);
};
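
The `equivalent` member above records which pictures were judged similar, and `gq_duplicates_process` fills it by comparing every pair against the threshold. A minimal sketch of that grouping step follows, under stated assumptions: it swaps in a union-find structure instead of the PR's in-place merging of `std::set`s, `group_similar` is a hypothetical name, and the score function is supplied by the caller rather than computed from image data.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Union-find "find" with path halving.
static std::size_t find_root(std::vector<std::size_t> &parent, std::size_t i)
{
    while (parent[i] != i)
    {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Hypothetical helper: returns groups of two or more names whose pairwise
// score (0-100) reaches `threshold`, directly or through a chain of matches.
std::vector<std::vector<std::string>> group_similar(
    const std::vector<std::string> &names,
    const std::function<double(const std::string &, const std::string &)> &score,
    double threshold)
{
    std::vector<std::size_t> parent(names.size());
    std::iota(parent.begin(), parent.end(), std::size_t{0});

    // Compare every unordered pair once and merge the sets of matching pairs.
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        for (std::size_t j = i + 1; j < names.size(); ++j)
        {
            if (score(names[i], names[j]) < threshold)
                continue;
            const std::size_t root_i = find_root(parent, i);
            const std::size_t root_j = find_root(parent, j);
            parent[root_i] = root_j;
        }
    }

    // Collect members by root; std::map keeps the output order deterministic.
    std::map<std::size_t, std::vector<std::string>> by_root;
    for (std::size_t i = 0; i < names.size(); ++i)
        by_root[find_root(parent, i)].push_back(names[i]);

    std::vector<std::vector<std::string>> groups;
    for (auto &entry : by_root)
        if (entry.second.size() >= 2)
            groups.push_back(std::move(entry.second));
    return groups;
}
```

Pictures that match nothing but themselves are dropped, which corresponds to the `equivalent.size() < 2` check in the handler.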