%\documentclass[10pt,twocolumn,letterpaper,draft]{article}
\documentclass[10pt,twocolumn,letterpaper]{article}
\usepackage{cvpr}
\usepackage{times}
\usepackage{epsfig}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{subfigure}
\usepackage{overpic}
\usepackage{enumitem}
\setenumerate[1]{itemsep=0pt,partopsep=0pt,parsep=\parskip,topsep=5pt}
\setitemize[1]{itemsep=0pt,partopsep=0pt,parsep=\parskip,topsep=5pt}
\setdescription{itemsep=0pt,partopsep=0pt,parsep=\parskip,topsep=5pt}
% Include other packages here, before hyperref.
% If you comment hyperref and then uncomment it, you should delete
% egpaper.aux before re-running latex. (Or just hit 'q' on the first latex
% run, let it finish, and you should be clear).
\usepackage[pagebackref=true,breaklinks=true,letterpaper=true,colorlinks,bookmarks=false]{hyperref}
\cvprfinalcopy % *** Uncomment this line for the final submission
\def\cvprPaperID{159} % *** Enter the CVPR Paper ID here
\def\httilde{\mbox{\tt\raisebox{-.5ex}{\symbol{126}}}}
\newcommand{\figref}[1]{Figure~\ref{#1}}
\newcommand{\tabref}[1]{Table~\ref{#1}}
\newcommand{\equref}[1]{Equation~\ref{#1}}
\newcommand{\secref}[1]{Section~\ref{#1}}
\newcommand{\cmm}[1]{\textcolor[rgb]{0,0.6,0}{CMM: #1}}
\newcommand{\todo}[1]{{\textcolor{red}{\bf [#1]}}}
\newcommand{\alert}[1]{\textcolor[rgb]{.6,0,0}{#1}}
\newcommand{\IT}{IT\cite{98pami/Itti}}
\newcommand{\MZ}{MZ\cite{03ACMMM/Ma_Contrast-based}}
\newcommand{\GB}{GB\cite{conf/nips/HarelKP06}}
\newcommand{\SR}{SR\cite{07cvpr/hou_SpectralResidual}}
\newcommand{\FT}{FT\cite{09cvpr/Achanta_FTSaliency}}
\newcommand{\CA}{CA\cite{10cvpr/goferman_context}}
\newcommand{\LC}{LC\cite{06acmmm/ZhaiS_spatiotemporal}}
\newcommand{\AC}{AC\cite{08cvs/achanta_salient}}
\newcommand{\HC}{HC-maps }
\newcommand{\RC}{RC-maps }
\newcommand{\Lab}{$L^*a^*b^*$}
\newcommand{\vnudge}{\vspace*{-.1in}}
%\newcommand{\vnudge}{\vspace*{-2pt}}
\newcommand{\mypara}[1]{\paragraph{#1.}}
%\renewcommand{\baselinestretch}{0.995}
\graphicspath{{figures/}}
% Pages are numbered in submission mode, and unnumbered in camera-ready
%\ifcvprfinal\pagestyle{empty}\fi
\setcounter{page}{409}
\begin{document}
%%%%%%%%% TITLE
\title{Global Contrast based Salient Region Detection}
\author{Ming-Ming Cheng$^{1}$\quad Guo-Xin Zhang$^{1}$ \quad Niloy J. Mitra$^{2}$
\quad Xiaolei Huang$^{3}$ \quad Shi-Min Hu$^{1}$ \\
$^{1}$ TNList, Tsinghua University \quad \quad
$^2$ KAUST \quad \quad $^3$ Lehigh University\\
%{\tt \small [chengmingvictor, zgx.net, niloym]@gmail.com \quad [email protected] \quad [email protected]}
{\tt \small [email protected]}
}
\maketitle
% \thispagestyle{empty}
%%%%%%%%% ABSTRACT
\begin{abstract}
%
Reliable estimation of visual saliency allows appropriate processing of images without prior
knowledge of their contents, and thus remains an important step in many computer vision tasks
including image segmentation, object recognition, and adaptive compression.
%
We propose a regional contrast based saliency extraction algorithm,
which simultaneously evaluates global contrast differences and spatial coherence.
%
The proposed algorithm is simple, efficient, and yields full resolution saliency maps.
%
Our algorithm consistently outperformed existing saliency detection methods, yielding
higher precision and better recall rates, when evaluated using one of the largest
publicly available data sets.
%
We also demonstrate how the extracted saliency map can be used to create high quality
segmentation masks for subsequent image processing.
\end{abstract}
%%%%%%%%% BODY TEXT %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{sec:Introduction}
Humans routinely and effortlessly judge the importance of image regions, and focus
attention on important parts.
%
Computationally detecting such salient image regions remains a significant goal,
as it allows preferential allocation of computational resources in subsequent image analysis and synthesis.
%
Extracted saliency maps are widely used in many computer vision applications including object-of-interest
image segmentation~\cite{06TCSVT/han_unsupervised,06josa/KoN_InterestSegmentation},
object recognition~\cite{04cvpr/RutishauserWWKP}, adaptive compression of images~\cite{00CE/christopoulos_jpeg},
content-aware image editing~\cite{TOG/Wang08,09cgf/ZhangC,wu-2010-resizing,10vc/Ding}, and image retrieval~\cite{tog09/ChenCT_Sketch2Photo}.
\begin{figure}[t!]
\begin{overpic}[width=\columnwidth]{teaser.pdf}
\end{overpic}
\caption{Given input images~(top), a global contrast analysis is used
to compute high resolution saliency maps~(middle), which can be used to
produce masks~(bottom) around regions of interest.
}\label{fig:teaser} \vnudge
\end{figure}
\begin{figure*}[t!]
\begin{overpic}[width=\textwidth]{all_comparisons.pdf} \small
\put(0.3,0){(a)~original}
\put(10,0){(b)~\IT}
\put(18.3,0){(c)~\MZ}
\put(27.5,0){(d)~\GB}
\put(37,0){(e)~\SR}
\put(46.5,0){(f)~\AC}
\put(55.5,0){(g)~\CA}
\put(65,0){(h)~\FT}
\put(73.5,0){(i)~\LC}
\put(84,0){(j)~HC}
\put(93,0){(k)~RC}
\end{overpic}
\caption{Saliency maps computed by different state-of-the-art methods~(b-i),
and with our proposed HC~(j) and RC methods~(k). Most results
highlight edges, or are of low resolution.
See also Figure~\ref{fig:VisualComparison} (and our project webpage).
}\label{fig:cmp1vAll} \vnudge
\end{figure*}
Saliency originates from visual uniqueness, unpredictability, rarity, or surprise, and is often
attributed to variations in image attributes like color, gradient, edges, and boundaries.
%
Visual saliency, being closely related to how we perceive and process visual stimuli, is
investigated by multiple disciplines including cognitive
psychology~\cite{55ARP/Teuber_physiological,04nature/Wolfe_attributesVisual},
neurobiology~\cite{95ARN/DesimoneNeuralMachanisms,09biology/eyeMovement}, and computer
vision~\cite{98pami/Itti,09cvpr/Achanta_FTSaliency}.
%
Theories of human attention hypothesize that the human vision system only processes parts of an image in detail,
while leaving others nearly unprocessed.
%
Early work by Treisman and Gelade~\cite{80cogSc/Treisman_featureIntegration}, Koch and
Ullman~\cite{85HN/KochVisualAttention}, and subsequent attention theories proposed by Itti, Wolfe
and others, suggest two stages of visual attention: fast, pre-attentive, bottom-up, data driven
saliency extraction; and slower, task dependent, top-down, goal driven saliency extraction.
We focus on bottom-up data driven saliency detection using image contrast.
%
It is widely believed that human cortical cells may be \emph{hard wired} to preferentially respond
to high contrast stimuli in their receptive fields~\cite{03neuron/Reynolds_attentionSaliency}.
%
We propose contrast analysis for extracting high-resolution, full-field saliency maps based on the
following observations:
\begin{itemize}
\item A global contrast based method, which separates a large-scale object from its surroundings,
is preferable to local contrast based methods, which tend to produce high saliency values only at or near object edges.
\item Global considerations enable assignment of comparable saliency values to similar image regions,
and can uniformly highlight entire objects.
\item Saliency of a region depends mainly on its contrast to the nearby regions,
while contrasts to distant regions are less significant.
\item Saliency maps should be fast and easy to generate to allow processing of
large image collections, and facilitate efficient image classification and retrieval.
\end{itemize}
We propose a \emph{histogram-based contrast method}~(HC) to measure saliency.
%
\HC assign pixel-wise saliency values based simply on color separation from all other image
pixels to produce full resolution saliency maps. We use a histogram-based approach for efficient processing,
while employing a smoothing procedure to control quantization artifacts.
%
Note that our algorithm is targeted towards natural scenes,
and may be suboptimal for extracting saliency in highly textured scenes (see \figref{fig:challenging_maps}).
As an improvement over HC-maps, we incorporate spatial relations to produce
\emph{region-based contrast}~(RC) maps where we first segment the input image into
regions, and then assign saliency values to them. The saliency value of a region is
now calculated using a global contrast score, measured by the region's
contrast and spatial distances to other regions in the image.
We have extensively evaluated our methods on publicly available benchmark data sets, and compared
our methods with (eight) state-of-the-art saliency methods
~\cite{98pami/Itti,03ACMMM/Ma_Contrast-based,06acmmm/ZhaiS_spatiotemporal,conf/nips/HarelKP06,07cvpr/hou_SpectralResidual,08cvs/achanta_salient,09cvpr/Achanta_FTSaliency,10cvpr/goferman_context}
as well as with manually produced ground truth annotations\footnote{Results for 1000 images and
prototype software are available at the project webpage:
\href{http://cg.cs.tsinghua.edu.cn/people/~cmm/saliency/}{http://cg.cs.tsinghua.edu.cn/people/\%7Ecmm/saliency/}}.
%
The experiments show significant improvements over previous methods both in precision and recall rates.
%
Overall, compared with HC-maps, \RC produce better precision and recall rates, but at the cost
of increased computation. Encouragingly, we observe that the saliency cuts extracted
using our saliency maps are, in most cases, comparable to manual annotations.
%
We also present applications of the extracted saliency maps to segmentation,
context aware resizing, and non-photorealistic rendering.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Related Work}
\label{sec:RelatedWorks}
We focus on relevant literature targeting pre-attentive bottom-up saliency detection,
which may be biologically motivated, or purely computational, or involve both aspects.
%
Such methods utilize low-level processing to determine the contrast of image regions to their surroundings,
using feature attributes such as intensity, color, and edges~\cite{09cvpr/Achanta_FTSaliency}.
%
We broadly classify the algorithms into local and global schemes.
Local contrast based methods investigate the rarity of image regions with respect to (small) local neighborhoods.
%
Based on the highly influential biologically inspired \emph{early representation} model introduced
by Koch and Ullman~\cite{85HN/KochVisualAttention}, Itti et al.~\cite{98pami/Itti} define image
saliency using central-surrounded differences across multi-scale image features.
%
Ma and Zhang~\cite{03ACMMM/Ma_Contrast-based} propose an alternative local contrast
analysis for generating saliency maps, which is then extended using a fuzzy growth model.
%
Harel et al.~\cite{conf/nips/HarelKP06} normalize the feature maps of Itti et al., to highlight
conspicuous parts and permit combination with other importance maps.
%
Liu et al.~\cite{10pami/Liu_Learning} find multi-scale contrast by linearly combining contrast
in a Gaussian image pyramid.
%
More recently, Goferman et al.~\cite{10cvpr/goferman_context} simultaneously model local
low-level clues, global considerations, visual organization rules, and high-level
features to highlight salient objects along with their contexts.
%
Such methods using local contrast tend to produce higher saliency values near edges instead of uniformly
highlighting salient objects (see \figref{fig:cmp1vAll}).
Global contrast based methods evaluate saliency of an image region using its contrast
with respect to the entire image.
%
Zhai and Shah~\cite{06acmmm/ZhaiS_spatiotemporal} define pixel-level saliency based on a pixel's contrast to all other pixels.
However, for efficiency they use only luminance information, thus ignoring distinctiveness clues in other channels.
%
Achanta et al.~\cite{09cvpr/Achanta_FTSaliency} propose a frequency tuned method that
directly defines pixel saliency using a pixel's color difference from the average image
color. This elegant approach, however, only considers the first order average color, which
can be insufficient to capture the complex variations common in natural images.
%
In Figures~\ref{fig:VisualComparison} and~\ref{fig:plots}, we show qualitative and
quantitative weaknesses of such approaches. Furthermore, these methods ignore spatial
relationships across image parts, which can be critical for reliable and coherent
saliency detection (see \secref{sec:Experiment}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Histogram Based Contrast}\label{sec:HC}
Based on the observation from biological vision that the vision system is sensitive to contrast in visual signals,
we propose a histogram-based contrast~(HC) method to define saliency values
for image pixels using color statistics of the input image.
%
Specifically, the saliency of a pixel is defined using its color contrast to all other pixels in the image,
i.e., the saliency value of a pixel $I_k$ in image $I$ is defined as,
\begin{equation}\label{equ:PixelColorContrast}
S(I_k) = \sum_{\forall I_i \in I} D(I_k, I_i),
\end{equation}
where $D(I_k, I_i)$ is the color distance metric between pixels $I_k$ and $I_i$ in the
\Lab space (see also~\cite{06acmmm/ZhaiS_spatiotemporal}).
%
%
\equref{equ:PixelColorContrast} can be expanded by pixel order to have the following
form,
\begin{equation}\label{equ:PixelCCPixelOrder}
S(I_k) = D(I_k, I_1) + D(I_k, I_2) + \cdots + D(I_k, I_N),
\end{equation}where $N$ is the number of pixels in image $I$.
%
It is easy to see that pixels with the same color value have the same saliency value
under this definition, since the measure is oblivious to spatial relations. Hence,
rearranging \equref{equ:PixelCCPixelOrder} such that the terms with the same color value
$c_j$ are grouped together, we get the saliency value for each color as,
\begin{equation}\label{equ:PixelCCColorOrder}
S(I_k) = S(c_l) = \sum_{j=1}^{n}{f_j D(c_l, c_j)},
\end{equation}
where $c_l$ is the color value of pixel $I_k$, $n$ is the number of distinct pixel colors, and
$f_j$ is the probability of pixel color $c_j$ in image $I$.
%
Note that in order to prevent salient region color statistics from being corrupted by
similar colors from other regions, one can develop a similar scheme using varying window
masks. However, given the strict efficiency requirement, we take the simple global
approach.
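For illustration, the per-color saliency of \equref{equ:PixelCCColorOrder} can be
sketched as follows (a simplified C++ sketch rather than our optimized implementation;
the \texttt{LabColor} struct and \texttt{labDist} helper are illustrative assumptions):
{\small
\begin{verbatim}
#include <cmath>
#include <vector>

struct LabColor { double L, a, b; };

// Euclidean distance in the L*a*b* space.
double labDist(const LabColor &x, const LabColor &y) {
  double dL = x.L - y.L, da = x.a - y.a, db = x.b - y.b;
  return std::sqrt(dL*dL + da*da + db*db);
}

// colors[j] and freq[j]: the j-th distinct color and its
// probability f_j.  Returns S(c_l) for every color index l.
std::vector<double> colorSaliency(
    const std::vector<LabColor> &colors,
    const std::vector<double> &freq) {
  size_t n = colors.size();
  std::vector<double> sal(n, 0.0);
  for (size_t l = 0; l < n; ++l)
    for (size_t j = 0; j < n; ++j)
      sal[l] += freq[j] * labDist(colors[l], colors[j]);
  return sal;
}
\end{verbatim}
}
Each pixel then simply inherits the saliency value of its color.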
\begin{figure}
\begin{overpic}[width=\columnwidth]{histogram.pdf}
\end{overpic}
\caption{Given an input image~(left), we compute its color histogram~(middle).
Corresponding histogram bin colors are shown in the lower bar.
The quantized image~(right) uses only $43$ histogram bin colors
and still retains sufficient visual quality for saliency detection.
}\label{fig:colorFre} \vnudge
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\mypara{Histogram based speed up} Naively evaluating the saliency value for each image
pixel using \equref{equ:PixelColorContrast} takes $O(N^2)$ time, which is
computationally too expensive even for medium sized images.
%
The equivalent representation in \equref{equ:PixelCCColorOrder}, however, takes $O(N) + O(n^2)$ time,
implying that computational efficiency can be improved to $O(N)$ if $O(n^2) \leq O(N)$.
%
Thus, the key to speed up is to reduce the number of pixel colors in the image.
However, the true-color space contains $256^3$ possible colors,
which is typically larger than the number of image pixels.
Zhai and Shah~\cite{06acmmm/ZhaiS_spatiotemporal} reduce the number of colors, $n$,
by only using luminance. In this way, $n^2=256^2$ (typically $256^2 \ll N$).
However, their method has the disadvantage that the distinctiveness of color information is ignored.
%
In this work, we use the full color space instead of luminance only.
To reduce the number of colors that need to be considered, we first quantize each color channel into 12 different values,
which reduces the number of colors to $12^3=1728$.
%
Considering that color in a natural image typically covers only a small portion of the full color space,
we further reduce the number of colors by ignoring less frequently occurring colors.
%
By choosing the most frequently occurring colors and ensuring that they cover the colors of more
than $95\%$ of the image pixels, we are typically left with around $n=85$ colors (see
\secref{sec:Experiment} for experimental details).
%
The colors of the remaining pixels, which comprise fewer than $5\%$ of the image pixels,
are replaced by the closest colors in the histogram. A typical example of such
quantization is shown in \figref{fig:colorFre}.
%
Note that, again due to efficiency requirements, we select simple histogram based quantization
instead of optimizing for an image-specific color palette.
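A minimal sketch of this quantization step is given below (an illustration only; the
\texttt{RGB} struct is an assumption, and the remapping of the remaining pixels to their
nearest kept colors is omitted for brevity):
{\small
\begin{verbatim}
#include <algorithm>
#include <map>
#include <vector>

struct RGB { unsigned char r, g, b; };

// Returns the most frequent quantized colors as
// (bin index, pixel count) pairs, covering >= 95% of the pixels.
std::vector<std::pair<int,int>> frequentColors(
    const std::vector<RGB> &pixels) {
  std::map<int,int> hist;               // bin index -> pixel count
  for (const RGB &p : pixels) {
    int ri = p.r * 12 / 256, gi = p.g * 12 / 256,
        bi = p.b * 12 / 256;            // 12 levels per channel
    ++hist[ri * 144 + gi * 12 + bi];    // at most 12^3 = 1728 bins
  }
  std::vector<std::pair<int,int>> bins(hist.begin(), hist.end());
  std::sort(bins.begin(), bins.end(),   // most frequent bins first
            [](const std::pair<int,int> &a,
               const std::pair<int,int> &b)
            { return a.second > b.second; });
  size_t covered = 0, keep = 0,
         target = pixels.size() * 95 / 100;
  while (keep < bins.size() && covered < target)
    covered += bins[keep++].second;
  bins.resize(keep);                    // typically around 85 colors
  return bins;
}
\end{verbatim}
}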
\begin{figure}
\centering
\includegraphics[width=\columnwidth]{histogram_saliency.pdf}
\caption{Saliency of each color, normalized to the range $[0,1]$,
before~(left) and after~(right) color space smoothing.
Corresponding saliency maps are shown in the respective insets.
}\label{fig:HCSmoothing} \vnudge
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\mypara{Color space smoothing}
Although we can efficiently compute color contrast by building a compact color histogram
using color quantization and choosing more frequent colors,
the quantization itself may introduce artifacts.
%
Some similar colors may be quantized to different values.
In order to reduce noisy saliency results caused by such randomness,
we use a smoothing procedure to refine the saliency value for each color.
%
We replace the saliency value of each color by the weighted average of the saliency
values of similar colors (measured by \Lab distance).
%
This is actually a smoothing process in the color feature space. Typically we choose
$m=n/4$ nearest colors to refine the saliency value of color $c$ by,
\begin{equation}\label{equ:smoothing}
S'(c) = \frac{1}{(m-1)T} \sum_{i=1}^{m} (T-D(c, c_i))S(c_i)
\end{equation}
where $T=\sum_{i=1}^{m} D(c, c_i)$ is the sum of distances between color $c$ and its
$m$ nearest neighbors $c_i$, and the normalization factor comes from
%\begin{equation*}\label{equ:smoothingN}
$\sum_{i=1}^{m} (T-D(c, c_i))=(m-1)T.$
%\end{equation*}
%
Note that we use a linearly-varying smoothing weight $(T-D(c, c_i))$ to assign
larger weights to colors closer to $c$ in the color feature space.
%
In our experiments, we found that such linearly-varying weights are better than Gaussian
weights, which fall off too sharply.
%
\figref{fig:HCSmoothing} shows the typical effect of color space smoothing with the
corresponding histograms sorted by decreasing saliency values.
%
Note that similar histogram bins are closer to each other after such smoothing,
indicating that similar colors have higher likelihood of being assigned similar saliency values,
thus reducing quantization artifacts (see \figref{fig:plots}).
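A sketch of this smoothing step, reusing the \texttt{labDist} helper and the per-color
saliency values from the earlier sketch (again an illustration rather than our optimized
implementation):
{\small
\begin{verbatim}
#include <algorithm>
#include <vector>

// Smoothed saliency S'(c); m = n/4 nearest colors (including the
// color itself) averaged with linearly falling weights.
std::vector<double> smoothSaliency(
    const std::vector<LabColor> &colors,
    const std::vector<double> &sal) {
  size_t n = colors.size(), m = std::max<size_t>(2, n / 4);
  std::vector<double> smoothed(n, 0.0);
  for (size_t c = 0; c < n; ++c) {
    std::vector<size_t> idx(n);
    for (size_t i = 0; i < n; ++i) idx[i] = i;
    // Sort color indices by their distance to color c.
    std::sort(idx.begin(), idx.end(), [&](size_t a, size_t b) {
      return labDist(colors[c], colors[a]) <
             labDist(colors[c], colors[b]);
    });
    double T = 0.0;        // sum of the m nearest distances
    for (size_t i = 0; i < m; ++i)
      T += labDist(colors[c], colors[idx[i]]);
    double acc = 0.0;      // weight (T - D) favors nearby colors
    for (size_t i = 0; i < m; ++i)
      acc += (T - labDist(colors[c], colors[idx[i]])) * sal[idx[i]];
    smoothed[c] = (T > 0) ? acc / ((m - 1) * T) : sal[c];
  }
  return smoothed;
}
\end{verbatim}
}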
\vnudge
\mypara{Implementation details}
To quantize the color space into $12^3$ different colors, we uniformly divide each color
channel into 12 different levels.
%
While the quantization of colors is performed in the RGB color space, we measure color
differences in the \Lab color space because of its perceptual accuracy.
%
However, we do not perform quantization directly in the \Lab color space since not all colors in
the range $L^*\in[0, 100]$, and $a^*, b^*\in[-127,127]$ necessarily correspond to real colors.
%
Experimentally we observed worse quantization artifacts using direct \Lab color space
quantization.
%
Best results were obtained by quantization in the RGB space while measuring distance in the \Lab
color space, as opposed to performing both quantization and distance calculation in a
single color space, either RGB or \Lab.
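For reference, one possible way to obtain the \Lab values used in the distance
computations from an 8-bit RGB color, assuming OpenCV is available (an illustrative
choice; any accurate RGB-to-\Lab conversion can be substituted):
{\small
\begin{verbatim}
#include <opencv2/opencv.hpp>

// Convert one 8-bit RGB color to L*a*b* for distance measurement.
LabColor rgbToLab(unsigned char r, unsigned char g,
                  unsigned char b) {
  cv::Mat rgb(1, 1, CV_8UC3, cv::Scalar(r, g, b)), lab;
  cv::cvtColor(rgb, lab, cv::COLOR_RGB2Lab);
  cv::Vec3b v = lab.at<cv::Vec3b>(0, 0);
  // OpenCV stores 8-bit Lab as L*255/100, a+128, b+128; undo it.
  return { v[0] * 100.0 / 255.0, v[1] - 128.0, v[2] - 128.0 };
}
\end{verbatim}
}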
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Region Based Contrast}
\begin{figure}
\centering
\includegraphics[width=\columnwidth]{region_contrast.pdf}\\
\caption{Image regions generated by Felzenszwalb and Huttenlocher's
segmentation method~\cite{04ijcv/felzenszwalb_efficient}~(left),
region contrast based segmentation with~(left-middle) and without~(right-middle) distance weighting.
Incorporating the spatial context, we get a high quality saliency cut~(right) comparable to
human labeled ground truth.
}\label{fig:regContrast} \vnudge
\end{figure}
Humans pay more attention to those image regions that contrast strongly with their
surroundings~\cite{03neuroscience/luminanceContrast}.
%
Besides contrast, spatial relationships play an important role in human attention.
%
High contrast of a region to its nearby surroundings is usually stronger evidence for its
saliency than high contrast to far-away regions.
%
Since directly introducing spatial relationships when computing pixel-level contrast is
computationally expensive, we introduce a contrast analysis method, \emph{region contrast} (RC),
so as to integrate spatial relationships into region-level contrast computation.
%
In RC, we first segment the input image into regions, then compute color contrast at the region level,
and define the saliency for each region as the weighted sum of the region's contrasts
to all other regions in the image.
%
The weights are set according to the spatial distances with farther regions being assigned smaller
weights.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\mypara{Region contrast by sparse histogram comparison}
We first segment the input image into regions using a graph-based image segmentation
method \cite{04ijcv/felzenszwalb_efficient}.
%
Then we build the color histogram for each region as in \secref{sec:HC}.
%
For a region $r_k$, we compute its saliency value by measuring its color contrast to all other
regions in the image,
\begin{equation}\label{equ:regContrastSaliency}
S(r_k) = \sum_{r_k \neq r_i} w(r_i) D_r(r_k, r_i),
\end{equation}
where $w(r_i)$ is the weight of region $r_i$ and $D_r(\cdot, \cdot)$ is the color distance metric
between the two regions.
%
Here we use the number of pixels in $r_i$ as $w(r_i)$ to emphasize color contrast to bigger regions.
%
The color distance between two regions $r_1$ and $r_2$ is defined as,
\begin{equation}\label{equ:regContrast}
D_r(r_1, r_2) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} f(c_{1,i}) f(c_{2,j}) D(c_{1,i}, c_{2,j})
\end{equation}
where $f(c_{k,i})$ is the probability of the $i$-th color $c_{k,i}$ among all $n_k$ colors in the
$k$-th
region $r_k$, $k=\{1,2\}$.
%
%Similarly we define $c_{2,j}$, $n_2$ and $f(c_{2,j})$ for region $r_2$.
%
Note that we use the probability of a color in the probability density function (i.e., the normalized color histogram) of the region as the weight for this color, in order to put more emphasis on differences between dominant colors.
Storing and evaluating a full, dense histogram for each region is inefficient, since
each region typically contains only a small fraction of the colors appearing in the histogram of the whole image.
%
Instead, we use a sparse histogram representation for efficient storage and computation.
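A sketch of the region contrast computation (Equations~\ref{equ:regContrastSaliency}
and~\ref{equ:regContrast}) over such sparse histograms is given below; the
\texttt{Region} struct is an illustrative assumption, and \texttt{LabColor} and
\texttt{labDist} are reused from the sketch in \secref{sec:HC}:
{\small
\begin{verbatim}
#include <utility>
#include <vector>

struct Region {
  int pixelCount;                      // used as the weight w(r_i)
  // sparse histogram: (Lab color, probability) pairs
  std::vector<std::pair<LabColor, double>> hist;
};

// Color distance D_r between two regions.
double regionDist(const Region &r1, const Region &r2) {
  double d = 0.0;
  for (const std::pair<LabColor, double> &c1 : r1.hist)
    for (const std::pair<LabColor, double> &c2 : r2.hist)
      d += c1.second * c2.second * labDist(c1.first, c2.first);
  return d;
}

// Region saliency: contrast to all other regions, weighted by
// their pixel counts.
std::vector<double> regionSaliency(
    const std::vector<Region> &regions) {
  size_t n = regions.size();
  std::vector<double> sal(n, 0.0);
  for (size_t k = 0; k < n; ++k)
    for (size_t i = 0; i < n; ++i)
      if (i != k)
        sal[k] += regions[i].pixelCount *
                  regionDist(regions[k], regions[i]);
  return sal;
}
\end{verbatim}
}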
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\mypara{Spatially weighted region contrast} We further incorporate spatial information by
introducing a spatial weighting term in \equref{equ:regContrastSaliency} to increase the effects
of closer regions and decrease the effects of farther regions.
%
Specifically, for any region $r_k$, the spatially weighted region contrast based saliency is
defined as:
\begin{equation}\label{equ:regContrastSpatial}
S(r_k) = \sum_{r_k \neq r_i} \exp({-D_s(r_k, r_i)/\sigma_s^2}) w(r_i) D_r(r_k, r_i)
\end{equation}
where $D_s(r_k, r_i)$ is the spatial distance between regions $r_k$ and $r_i$, and $\sigma_s$
controls the strength of spatial weighting.
%
Larger values of $\sigma_s$ reduce the effect of spatial weighting,
so that contrast to farther regions contributes more to the saliency of the current region.
%
The spatial distance between two regions is defined as the Euclidean distance between their
centroids. In our implementation, we use $\sigma_s^2 = 0.4$ with pixel
coordinates normalized to $[0, 1]$.
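The spatially weighted variant of \equref{equ:regContrastSpatial} then only adds the
exponential distance term, as in the following sketch (reusing \texttt{Region} and
\texttt{regionDist} from above; the \texttt{Point2D} centroid type is an assumption):
{\small
\begin{verbatim}
#include <cmath>
#include <vector>

struct Point2D { double x, y; };  // region centroid, coords in [0,1]

double spatialDist(const Point2D &a, const Point2D &b) {
  return std::sqrt((a.x - b.x) * (a.x - b.x) +
                   (a.y - b.y) * (a.y - b.y));
}

// Spatially weighted region saliency with sigma_s^2 = 0.4.
std::vector<double> spatialRegionSaliency(
    const std::vector<Region> &regions,
    const std::vector<Point2D> &centroids,
    double sigmaSqr = 0.4) {
  size_t n = regions.size();
  std::vector<double> sal(n, 0.0);
  for (size_t k = 0; k < n; ++k)
    for (size_t i = 0; i < n; ++i)
      if (i != k)
        sal[k] += std::exp(-spatialDist(centroids[k], centroids[i])
                           / sigmaSqr) *
                  regions[i].pixelCount *
                  regionDist(regions[k], regions[i]);
  return sal;
}
\end{verbatim}
}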
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Experimental Comparisons}\label{sec:Experiment}
\begin{figure*}[t!]
\begin{overpic}[width=\textwidth]{comparison.pdf} \small
\put(3,0){(a)~original}
\put(18.5,0){(b)~LC}
\put(33,0){(c)~CA}
\put(48,0){(d)~FT}
\put(60,0){(e)~\HC}
\put(74,0){(f)~\RC}
\put(90,0){(g)~RCC}
\end{overpic}
\caption{Visual comparison of saliency maps. (a)~original images, saliency maps produced using
(b)~Zhai and Shah~\cite{06acmmm/ZhaiS_spatiotemporal},
(c)~Goferman et al.~\cite{10cvpr/goferman_context},
(d) Achanta et al.~\cite{09cvpr/Achanta_FTSaliency},
(e) our HC and (f)~RC methods, and (g)~RC-based saliency cut results.
Our methods generate uniformly highlighted salient regions. (See our project webpage for all results on the full benchmark dataset.)
}\label{fig:VisualComparison} \vnudge
\end{figure*}
We have evaluated the results of our approach on the publicly available database provided
by Achanta et al. \cite{09cvpr/Achanta_FTSaliency}.
%
To the best of our knowledge, the database is the largest of its kind, and has ground truth in the
form of accurate human-marked labels for salient regions.
%
We compared the proposed global contrast based methods with $8$ state-of-the-art saliency detection methods.
%
Following~\cite{09cvpr/Achanta_FTSaliency}, we selected these methods according to:
number of citations (\IT ~and \SR), recency (\GB, SR, \AC, \FT ~and \CA), variety (IT is
biologically-motivated, \MZ~is purely computational, GB is hybrid, SR works in the frequency
domain, AC and FT output full resolution saliency maps), and being related to
our approach (\LC).
We used our methods and the others to compute saliency maps for all the 1000 images in
the database.
%
\tabref{tab:TimeEfficency} compares the average time taken by each method.
%
Our algorithms, HC and RC, are implemented in C++.
%
For the other methods, namely IT, GB, SR, FT, and CA, we used the authors' implementations,
while for LC we implemented the algorithm in C++ since we could not find the authors'
implementation.
%
For typical natural images, our HC method needs $O(N)$ computation time and is sufficiently
efficient for real-time applications. In contrast, our RC variant is slower as it requires image
segmentation~\cite{04ijcv/felzenszwalb_efficient}, but produces superior quality saliency maps.
\begin{table*}
\centering
\begin{tabular}{l|c|c|c|c|c|c|c|c|c|c} \hline\hline
Method & \IT & \MZ & \GB & \SR & \FT & \AC & \CA & \LC & HC & RC \\ \hline
Time(s) & 0.611 & 0.070 & 1.614 & 0.064 & 0.016 & 0.109 & 53.1 & 0.018 & 0.019 & 0.253 \\ \hline
Code & Matlab & C++ & Matlab & Matlab & C++ & C++ & Matlab & C++ & C++ & C++ \\ \hline\hline
\end{tabular}
\caption{Average time taken to compute a saliency map for images in the database
by Achanta et al.~\cite{09cvpr/Achanta_FTSaliency}. Most images in the
database~(see our project webpage) have resolution $400\times300$.
Algorithms were tested using a Dual Core 2.6 GHz machine with 2GB RAM.
} \label{tab:TimeEfficency}
\end{table*}
\begin{figure*}
\centering
\includegraphics[width=\textwidth]{plots.pdf}
\caption{Precision-recall curves for naive thresholding of saliency maps
using 1000 publicly available benchmark images. (Left, middle)~Different options
of our method compared with \GB, \MZ, \FT, \IT, \SR, \AC, \CA, and \LC.
NHC denotes a naive version of our HC method with color space smoothing disabled, and
NRC denotes our RC method with spatial weighting disabled.
%
(Right)~Precision-recall bars for our saliency cut algorithm using
different saliency maps as initialization. Our method RC shows high
precision, recall, and $F_{\beta}$ values over the 1000-image database. (Please refer to
our project webpage for the respective result images.)
} \label{fig:plots} \vnudge
\end{figure*}
In order to comprehensively evaluate the accuracy of our methods for salient object
segmentation, we performed two experiments using different objective comparison measures.
%
In the first experiment, to segment salient objects and calculate precision and recall
curves~\cite{07cvpr/hou_SpectralResidual}, we binarized the saliency map using every possible fixed
threshold, similar to the fixed thresholding experiment in~\cite{09cvpr/Achanta_FTSaliency}.
%
In the second experiment, we segment salient objects by iteratively applying
GrabCut~\cite{04tog/rother_grabcut} initialized using thresholded saliency maps, as we will describe later.
%
We also use the obtained saliency maps as importance weights for content aware image
resizing and non-photorealistic rendering.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\mypara{Segmentation by fixed thresholding}
The simplest way to get a binary segmentation of salient objects is to
threshold the saliency map with a threshold $T_f \in [0, 255]$.
%
To reliably compare how well various saliency detection methods highlight salient regions in images,
we vary the threshold $T_f$ from $0$ to $255$. \figref{fig:plots} shows the resulting precision vs. recall curves.
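For concreteness, precision and recall at a single threshold can be computed as in the
following sketch (the flat per-pixel array representation is an illustrative assumption):
{\small
\begin{verbatim}
#include <vector>

struct PR { double precision, recall; };

// Precision/recall of the binary map {saliency >= Tf} against a
// binary ground-truth mask of the same size.
PR precisionRecall(const std::vector<unsigned char> &saliency,
                   const std::vector<bool> &groundTruth, int Tf) {
  double tp = 0, fp = 0, fn = 0;
  for (size_t i = 0; i < saliency.size(); ++i) {
    bool salient = saliency[i] >= Tf;
    if (salient && groundTruth[i]) ++tp;
    else if (salient)              ++fp;
    else if (groundTruth[i])       ++fn;
  }
  return { tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9) };
}
\end{verbatim}
}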
%
We also present the benefits of adding the color space smoothing and spatial weighting
schemes, along with objective comparison with other saliency extraction methods.
%
Visual comparison of saliency maps obtained by the various methods can be seen in
Figures~\ref{fig:cmp1vAll} and \ref{fig:VisualComparison}. %(All test results on the whole database are included as
%supplementary material.)
The precision and recall curves clearly show that our methods outperform the other eight methods.
%
The extremities of the precision vs. recall curve are interesting: at maximum recall,
where $T_f = 0$, all pixels are retained as positives, i.e., considered to be
foreground, so all the methods have the same precision and recall values; the precision of
$0.2$ and recall of $1.0$ at this point indicate that, on average, $20\%$ of the image
pixels belong to the ground truth salient regions.
%
At the other end, the minimum recall values of our methods are higher than those of the other
methods, because the saliency maps computed by our methods are smoother and contain more
pixels with the saliency value $255$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\mypara{Saliency cut}
We now consider the use of the computed saliency map to assist in salient object segmentation.
%
Saliency maps have been previously employed for unsupervised object segmentation:
%
Ma and Zhang~\cite{03ACMMM/Ma_Contrast-based} find rectangular salient regions
by fuzzy region growing on their saliency maps.
%
Ko and Nam~\cite{06josa/KoN_InterestSegmentation} select salient regions using a support
vector machine trained on image segment features, and then cluster these
regions to extract salient objects.
%
Han et al.~\cite{06TCSVT/han_unsupervised} model color, texture, and edge features in a Markov
random field framework to grow salient object regions from seed values in the saliency maps.
%
More recently, Achanta et al.~\cite{09cvpr/Achanta_FTSaliency} average saliency values within
image segments produced by mean-shift segmentation, and then find salient objects by identifying
image segments that have average saliency above a threshold that is set to be twice the mean
saliency value of the entire image.
\begin{figure}[b!]
\includegraphics[width=\columnwidth]{saliency_cut.pdf}
\caption{Saliency Cut. (Left to right) Initial segmentation, trimap after first iteration,
trimap after second iteration, final segmentation, and manually labeled ground truth.
In the segmented images, blue is foreground, gray is background, while in
the trimaps, foreground is red, background is green, and unknown regions are left unchanged.
}\label{fig:AttCutSteps} \vnudge
\end{figure}
\begin{figure*}
\begin{overpic}[width=\linewidth]{cut_compare.pdf}\small
\put(3,0){(a)~original}
\put(17,0){(b)~\LC}
\put(32,0){(c)~\CA}
\put(46,0){(d)~\FT}
\put(60,0){(e)~\HC}
\put(74,0){(f)~\RC}
\put(87,0){(g)~ground truth}
\end{overpic}
\caption{Saliency cut using different saliency maps for initialization.
The respective saliency maps are shown in \figref{fig:VisualComparison}.
}\label{fig:cutCmp} \vnudge
\end{figure*}
In our approach, we iteratively apply GrabCut \cite{04tog/rother_grabcut} to refine the
segmentation result initially obtained by thresholding the saliency map (see
\figref{fig:AttCutSteps}).
%
Instead of manually selecting a rectangular region to initialize the process, as in classical GrabCut,
we automatically initialize GrabCut using a segmentation obtained by binarizing the saliency map using a fixed threshold; the threshold is chosen empirically to be the threshold that gives $95\%$ recall rate in our fixed thresholding experiments.
Once initialized, we iteratively run GrabCut to improve the saliency cut result. (At most $4$ iterations are run in our experiments.)
%
After each iteration, we use dilation and erosion operations on the current segmentation
result to get a new trimap for the next GrabCut iteration.
%
As shown in \figref{fig:AttCutSteps}, the region outside the dilated region is set to
background, the region inside the eroded region is set to foreground, and the remaining
areas are set to unknown in the trimap.
%
GrabCut, which by itself is an iterative process using Gaussian mixture models and graph cut,
helps to refine salient object regions at each step.
%
Regions closer to an initial salient object region are more likely to be part of that salient object than far-away regions.
%
Thus, our new initialization enables GrabCut to include nearby salient regions,
and exclude non-salient regions according to color feature dissimilarity.
%
In the implementation, we set a narrow image-border region (15 pixels wide) to be always in the background in order to
avoid slow convergence in the border region.
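The overall loop can be sketched as follows using OpenCV's GrabCut, dilation, and erosion
routines (an illustrative sketch rather than our exact implementation; the structuring
element size and the initial trimap labels are assumptions):
{\small
\begin{verbatim}
#include <opencv2/opencv.hpp>

// Returns a binary mask (255 = salient object).
cv::Mat saliencyCut(const cv::Mat &bgrImage,
                    const cv::Mat &saliency8U, int threshold) {
  // Initial trimap from the thresholded saliency map.
  cv::Mat mask(saliency8U.size(), CV_8UC1,
               cv::Scalar(cv::GC_PR_BGD));
  mask.setTo(cv::Scalar(cv::GC_PR_FGD), saliency8U >= threshold);

  // Region used to keep a 15-pixel border as definite background.
  cv::Mat inner(mask.size(), CV_8UC1, cv::Scalar(0));
  inner(cv::Rect(15, 15, mask.cols - 30, mask.rows - 30))
      .setTo(255);

  cv::Mat bgdModel, fgdModel, fg;
  cv::Mat kernel = cv::getStructuringElement(
      cv::MORPH_ELLIPSE, cv::Size(15, 15));
  for (int iter = 0; iter < 4; ++iter) {  // at most 4 iterations
    mask.setTo(cv::Scalar(cv::GC_BGD), inner == 0);
    cv::grabCut(bgrImage, mask, cv::Rect(), bgdModel, fgdModel,
                1, cv::GC_INIT_WITH_MASK);
    // Current segmentation: (probable) foreground labels.
    fg = (mask == cv::GC_FGD) | (mask == cv::GC_PR_FGD);

    // New trimap: inside the eroded result -> foreground, outside
    // the dilated result -> background, in between -> unknown.
    cv::Mat dilated, eroded;
    cv::dilate(fg, dilated, kernel);
    cv::erode(fg, eroded, kernel);
    mask.setTo(cv::Scalar(cv::GC_PR_FGD));
    mask.setTo(cv::Scalar(cv::GC_BGD), dilated == 0);
    mask.setTo(cv::Scalar(cv::GC_FGD), eroded == 255);
  }
  return fg;
}
\end{verbatim}
}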
\figref{fig:AttCutSteps} shows two examples of our saliency cut algorithm.
%
In the flag example, unwanted regions are correctly excluded during GrabCut iterations.
%
In the flower example, our saliency cut method successfully expanded the initial salient regions
(obtained directly from the saliency map) and converged to an accurate segmentation result.
To objectively evaluate our new saliency cut method using our RC-map as initialization,
we compare our results with results obtained by coupling iterative GrabCut with initializations
from saliency maps computed by other methods.
%
For consistency, we binarize each such saliency map using a threshold that gives $95\%$
recall rate in the corresponding fixed thresholding experiment (see \figref{fig:plots}). A visual comparison of the results is shown in \figref{fig:cutCmp}.
%
Average precision, recall, and $F$-Measure are compared over the entire ground-truth
database~\cite{09cvpr/Achanta_FTSaliency}, with the $F$-Measure defined as:
\begin{equation}\label{equ:FMeasure}
F_{\beta} = \frac{(1+\beta^2)Precision \times Recall}{\beta^2 \times Precision + Recall}.
\end{equation}
We use $\beta^2 = 0.3$ as in Achanta et al.~\cite{09cvpr/Achanta_FTSaliency} to weigh
precision more than recall.
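For concreteness, \equref{equ:FMeasure} in code form (with $\beta^2=0.3$; precision and
recall of $0.9$ each give $F_{\beta}=0.9$):
{\small
\begin{verbatim}
// F-measure; betaSqr = 0.3 weighs precision more than recall.
double fMeasure(double precision, double recall,
                double betaSqr = 0.3) {
  return (1.0 + betaSqr) * precision * recall /
         (betaSqr * precision + recall);
}
\end{verbatim}
}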
%
As can be seen from the comparison (see Figures~\ref{fig:plots}-right and \ref{fig:cutCmp}),
saliency cuts using our RC and HC saliency maps significantly outperform those obtained using the other methods' saliency maps.
%
Compared with the state-of-the-art results on this database by Achanta et al. (precision = $75\%$, recall = $83\%$), we achieved better accuracy (precision = $90\%$, recall = $90\%$). (Our demo software is available at our project webpage.)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\begin{overpic}[width=\columnwidth]{contentAware_application.pdf} \small
\put(3,0){original}
\put(21,0){CA}
\put(31,0){RC}
\put(48,0){original}
\put(75,0){CA}
\put(90,0){RC}
\end{overpic}
\caption{Comparison of content aware image resizing~\cite{09cgf/ZhangC} results
using \CA saliency maps and our RC saliency maps.
}\label{fig:Resizing} \vnudge
\end{figure}
\vnudge
\mypara{Content aware image resizing} In image re-targeting, saliency maps are usually used to
specify relative importance across image parts (see also~\cite{09_image_resize}).
%
We experimented with using our saliency maps in the image resizing method proposed by Zhang et al.~\cite{09cgf/ZhangC}\footnote{The authors' original implementation is publicly available.}, which
distributes distortion energy to relatively non-salient regions of an image while
preserving both global and local image features.
%
Figure~\ref{fig:Resizing} compares the resizing results using our \RC with the results using \CA ~saliency maps.
Our RC saliency maps help produce better resizing results since the saliency values in salient object regions are piece-wise smooth,
which is important for energy based resizing methods.
%
CA saliency maps, having higher saliency values at object boundaries, are less suitable for
applications like resizing, which require entire salient objects to be uniformly highlighted.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\mypara{Non-photorealistic rendering}
%
Artists often abstract images and highlight meaningful parts of an image while masking out
unimportant regions~\cite{99/zeki_innerVision}.
%
Inspired by this observation, a number of non-photorealistic rendering~(NPR) efforts use saliency
maps to generate interesting effects~\cite{02tog/decarlo_stylization}.
%
We experimentally compared our work with the most related, state-of-the-art saliency detection
algorithm~\cite{09cvpr/Achanta_FTSaliency} in the context of a recent NPR
technique~\cite{10pg/Huang_Zhang} (see \figref{fig:NPR}).
%
Our \RC give better saliency masks, which help the NPR method to better preserve details
in important image parts and region boundaries, while smoothing out others.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\begin{overpic}[width=\columnwidth]{npr_application2.pdf} \small
\end{overpic}
\caption{(Middle, right)~\FT$ $ and RC saliency maps are used respectively
for stylized rendering~\cite{10pg/Huang_Zhang} of an input image~(left).
Our method produces a better saliency map, see insets, resulting in
improved preservation of details, e.g., around the head and the fence regions.
}\label{fig:NPR} \vnudge
\end{figure}
\begin{figure}[t!]
\begin{overpic}[width=\columnwidth]{challenging_maps.pdf} \small
\end{overpic}
\caption{ Challenging examples for our histogram based methods involve non-salient regions with
colors similar to those of the salient parts~(top), or an image with a textured background~(bottom).
(Left to right)~Input image, HC-map, HC saliency cut, RC-map, RC saliency cut.
} \label{fig:challenging_maps} \vnudge
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vnudge
\section{Conclusions and Future Work}\label{sec:Conclusion}
We presented global contrast based saliency computation methods, namely Histogram based
Contrast~(HC) and spatial information-enhanced Region based Contrast~(RC).
%
While the HC method is fast and generates results with fine details, the RC method generates
spatially coherent high quality saliency maps at the cost of reduced computational efficiency.
%
We evaluated our methods on the largest publicly available data set and compared our scheme with
eight other state-of-the-art methods.
%
Experiments indicate the proposed schemes to be superior in terms of both precision and recall,
while still being simple and efficient.
In the future, we plan to investigate efficient algorithms that incorporate spatial relationships
in saliency computation while preserving fine details in the resulting saliency maps. Also, it is
desirable to develop saliency detection algorithms that handle cluttered and textured backgrounds,
which can introduce artifacts into our global histogram based approach.
%(although we did not encounter such images in the database).
%
Finally, it may be beneficial to incorporate high level factors, such as human faces and symmetry, into the saliency maps.
We believe the proposed saliency maps can be used for efficient object detection~\cite{06TCSVT/han_unsupervised},
reliable image classification, and robust image scene analysis~\cite{journal/tog/ChengZMHH10},
leading to improved image retrieval~\cite{tog09/ChenCT_Sketch2Photo}.
%\paragraph{Acknowledgements.} This research was supported by the
%National Basic Research Project of China (NO. 2011CB302205),
%the Natural Science Foundation of China (NO. U0735001)
%and the National High Technology Research and Development Program of China (NO. 2009AA01Z327).
\vnudge
\paragraph{Acknowledgements.} This research was supported by the
973 Program (2011CB302205), the 863 Program (2009AA01Z327), the Key Project of
S\&T (2011ZX01042-001-002), and NSFC (U0735001).
Ming-Ming Cheng was funded by a Google PhD fellowship, an IBM PhD fellowship,
and a New PhD Researcher Award (Ministry of Edu., CN).
{\small
\bibliographystyle{ieee}
\bibliography{Saliency}
}
\end{document}