<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description"
content="Auditing Gender Presentation Differences in Text-to-Image Models">
<meta name="keywords" content="Presentation Differences, Text-to-Image Models, Genders">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Auditing Gender Presentation Differences in Text-to-Image Models</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/gatech.png">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
<style>
table.customTable {
width: 50%;
background-color: #FFFFFF;
border-collapse: collapse;
border-width: 2px;
border-color: rgb(214, 236, 244);
border-style: solid;
color: #000000;
margin-left: auto;
margin-right: auto;
}
table.customTable td {
border-width: 2px;
border-color: rgb(214, 236, 244);
border-style: solid;
padding: 5px;
text-align: center;
vertical-align: middle;
}
table.customTable th {
border-width: 2px;
border-color: rgb(214, 236, 244);
border-style: solid;
padding: 5px;
}
table.customTable thead {
background-color: rgb(214, 236, 244);
}
</style>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Auditing Gender Presentation Differences in Text-to-Image Models</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://stevenyzzhang.github.io/website/">Yanzhe Zhang</a><sup>1</sup>,</span>
<span class="author-block">
<a href="http://www.lujiang.info/">Lu Jiang</a><sup>2</sup><sup>3</sup>,</span>
<span class="author-block">
<a href="https://faculty.cc.gatech.edu/~turk/">Greg Turk</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://cs.stanford.edu/~diyiy/">Diyi Yang</a><sup>4</sup>,
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Georgia Institute of Technology,</span>
<span class="author-block"><sup>2</sup>Google Research,</span>
<span class="author-block"><sup>3</sup>Carnegie Mellon University,</span>
<span class="author-block"><sup>4</sup>Stanford University</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block">
<img src="./static/images/GeorgiaTech_RGB.png" style="margin-right: 50px;" width="200" align="absmiddle"/>
</span><!---->
<span class="author-block">
<img src="./static/images/google_research.svg" width="300" align="absmiddle"/>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block">
<img src="./static/images/cmu-wordmark-horizontal-r.png" style="margin-right: 50px;" width="400" align="absmiddle"/>
</span><!---->
<span class="author-block">
<img src="./static/images/stanford-university-logo-2.png" width="200" align="absmiddle"/>
</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2302.03675.pdf"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2302.03675"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/SALT-NLP/GEP_data"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
<!-- Dataset Link. -->
<span class="link-block">
<a href="https://github.com/SALT-NLP/GEP_data"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="far fa-images"></i>
</span>
<span>Data</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<img src="./static/images/summary_1.png"
class="summary-image"
alt="Summary image."/>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools. Despite significantly affecting a wide range of downstream applications, the distributions of these generated images are still not fully understood, especially when it comes to the potential stereotypical attributes of different genders. In this work, we propose a paradigm (Gender Presentation Differences) that utilizes fine-grained self-presentation attributes to study how gender is presented differently in text-to-image models. By probing gender indicators in the input text (e.g., "a woman" or "a man"), we quantify the frequency differences of presentation-centric attributes (e.g., "a shirt" and "a dress") through human annotation and introduce a novel metric, GEP (GEnder Presentation Differences). Furthermore, we propose an automatic method to estimate such differences. The automatic GEP metric based on our approach yields a higher correlation with human annotations than that based on existing CLIP scores, consistently across three state-of-the-art text-to-image models. Finally, we demonstrate that our metrics can generalize to gender stereotypes related to occupations.
</p>
</div>
</div>
</div>
<!--/ Abstract. -->
<!-- Paper video. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h3 class="title is-4">Note</h3>
<div class="content has-text-justified">
<p>
This study uses GEP to refer specifically to attribute-level presentation differences between images generated from different gender indicators. Note that this definition of GEP is not built on the common usage of gender presentation (i.e., gender expression, as distinguished from gender identity). Also, we do not make any assumptions about the genders of the people depicted in the generated images.
</p>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column">
<div class="content">
<h2 class="title is-3">(Update!) Leaderboard</h2>
<p>
We report <strong>the automatic GEP scores (GEP<sub>CLS</sub>)</strong> of five Stable Diffusion checkpoints (from v1.2 to v2.1) and three popular fine-tuned checkpoints from the community. All checkpoints are tested in the default configuration (PNDMScheduler, 50 steps, guidance scale 7.5) using the explicit setting of our prompts (introduced below).
</p>
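<p>
For reference, below is a minimal sketch of this generation configuration using the Hugging Face <code>diffusers</code> library. The checkpoint name and prompt are illustrative only, and this is not the exact evaluation script.
</p>
<pre><code># Minimal sketch: generate an image with the leaderboard configuration
# (PNDMScheduler, 50 denoising steps, guidance scale 7.5) via diffusers.
# The checkpoint name and prompt below are illustrative only.
import torch
from diffusers import StableDiffusionPipeline, PNDMScheduler

model_id = "runwayml/stable-diffusion-v1-5"  # swap in any checkpoint from the leaderboard
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

prompt = "A woman in a tie holding an umbrella."  # an explicit-setting prompt
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("sample.png")
</code></pre>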
<p>
We urge the community to be aware of such fairness-related issues and to intentionally mitigate them while iterating on these models. Users of these models should also consider these factors when deciding which checkpoints to use.
</p>
<p>
(If you want to evaluate your model/report your scores, please email z_yanzhe AT gatech.edu)
</p>
</div>
<table class="customTable">
<thead>
<tr>
<th>Model</th>
<th>Automatic GEP score (GEP<sub>CLS</sub>)</th>
</tr>
</thead>
<tbody>
<tr>
<th>Stable Diffusion v1.2</th>
<td>0.054</td>
</tr>
<tr>
<th>Stable Diffusion v1.4</th>
<td>0.059</td>
</tr>
<tr>
<th>Stable Diffusion v1.5</th>
<td>0.064</td>
</tr>
<tr>
<th>Stable Diffusion v2.0 base</th>
<td>0.065</td>
</tr>
<tr>
<th>Stable Diffusion v2.1 base</th>
<td>0.078</td>
</tr>
<tr>
<th>prompthero/openjourney</th>
<td>0.073</td>
</tr>
<tr>
<th>hakurei/waifu-diffusion</th>
<td>0.067</td>
</tr>
<tr>
<th>Lykon/DreamShaper</th>
<td>0.034</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column">
<div class="content">
<h2 class="title is-3">Examples</h2>
<img src="./static/images/example_0_1.png"
class="example-image"
alt="Example image."/>
<img src="./static/images/example_1_1.png"
class="example-image"
alt="Example image."/>
<img src="./static/images/example_2_1.png"
class="example-image"
alt="Example image."/>
<img src="./static/images/example_3_1.png"
class="example-image"
alt="Example image."/>
<img src="./static/images/example_4_1.png"
class="example-image"
alt="Example image."/>
<img src="./static/images/example_5_1.png"
class="example-image"
alt="Example image."/>
<p>
Note: We consider two settings while constructing the prompts:
</p>
<p>
(1) <strong>Neutral</strong>: No attributes are specified in the prompt, e.g., "A woman holding an umbrella."
</p>
<p>(2) <strong>Explicit</strong>: An attribute is explicitly specified in the prompt, e.g., "A woman in a tie holding an umbrella."
</p>
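<p>
Below is a rough sketch of how prompts in the two settings could be assembled from gender indicators, attributes, and contexts. The attribute and context lists here are illustrative only; the full templates and word lists are in the released dataset.
</p>
<pre><code># Sketch of prompt construction for the neutral and explicit settings,
# inferred from the examples above. The attribute and context lists are
# illustrative; the full lists (15 attributes, 16 contexts) are in the
# released dataset.
genders = ["A woman", "A man"]
attributes = ["a tie", "a dress"]      # illustrative subset
contexts = ["holding an umbrella"]     # illustrative subset

neutral_prompts = [f"{g} {c}." for g in genders for c in contexts]
explicit_prompts = [f"{g} in {a} {c}." for g in genders for a in attributes for c in contexts]

print(neutral_prompts)   # ['A woman holding an umbrella.', 'A man holding an umbrella.']
print(explicit_prompts)  # ['A woman in a tie holding an umbrella.', ...]
</code></pre>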
</div>
</div>
</div>
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Why GEP?</h2>
<div class="content has-text-justified">
<p>
To study gender biases in text-to-image models, prior studies classify generated images into gender categories and measure biases using the relative gender frequencies.
</p>
<p>
In this work, we avoid appearance-based gender classification, which is subjective and raises ethical concerns. Instead, we examine concrete and objective <strong>attribute-wise differences</strong> between images generated by text-to-image models with different gender-specific prompts.
</p>
<p>
We aim to provide a <strong>neutral description</strong> of the attribute differences present in these generated images, and offer such differences as an objective lens through which practitioners can understand potential issues exhibited by text-to-image models, without any presuppositions about the genders of the people in these images.
</p>
</div>
</div>
</div>
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">The GEP Metric</h2>
<div class="content has-text-justified">
<center><img src="./static/images/attribute.png"
class="example-image"
style="margin-left: auto; margin-right: auto;"
width="800"
align="center"
alt="Attribute Table."/></center>
<p style="font-size: small" align="center">The GEP metric is based on 15 attributes, which are retrieved from ConceptNet.</p>
<p>
By probing gender indicators in the input text (e.g., "a woman" or "a man"), we quantify the <strong>frequency differences</strong> of various presentation-centric attributes (see the table above) as <strong>the GEP vector</strong> (a 15-dim vector for each model in one setting):
</p>
<center><img src="./static/images/GEP_vec.png"
class="example-image"
style="margin-left: auto; margin-right: auto;"
width="800"
align="center"
alt="Attribute Table."/></center>
<p style="font-size: small" align="center">The GEP vectors for three models in the neutral setting (up) and the explicit setting (bottom). The y axes are presentation differences ("woman" - "man") in symmetric log scaling.</p>
<p>
For instance, the frequency difference for "boots" is the frequency of "boots" in images generated from "A woman" minus the frequency of "boots" in images generated from "A man", matching the "woman" - "man" direction above.
</p>
<p>
<strong>The GEP score</strong> is the normalized l<sub>1</sub> norm of the GEP vector, which facilitates comparison between models (see the table below). By definition, the GEP score ranges from 0 to 1, and a lower GEP score indicates a smaller presentation difference across the predefined attributes.
</p>
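<p>
To make the definition concrete, here is a minimal sketch of how the GEP vector and GEP score could be computed from binary attribute annotations. It assumes the normalization is over the number of attributes, which keeps the score in [0, 1]; see the released code for the exact implementation.
</p>
<pre><code># Sketch: GEP vector and score from binary attribute annotations.
# ann[gender][attribute] is a list of 0/1 labels, one per generated image,
# indicating whether the attribute appears. Normalizing by the number of
# attributes is an assumption that keeps the score in [0, 1].
import numpy as np

ATTRIBUTES = ["shirt", "dress", "tie", "boots"]  # illustrative subset of the 15 attributes

def frequency(labels):
    return sum(labels) / len(labels)

def gep_vector(ann):
    # Presentation difference per attribute: freq("a woman" images) - freq("a man" images).
    return np.array([frequency(ann["woman"][a]) - frequency(ann["man"][a]) for a in ATTRIBUTES])

def gep_score(vec):
    # Normalized l1 norm of the GEP vector.
    return np.abs(vec).sum() / len(vec)

ann = {
    "woman": {"shirt": [0, 1, 0], "dress": [1, 1, 1], "tie": [0, 0, 0], "boots": [0, 1, 0]},
    "man":   {"shirt": [1, 1, 1], "dress": [0, 0, 0], "tie": [1, 0, 1], "boots": [0, 0, 1]},
}
vec = gep_vector(ann)
print(vec, gep_score(vec))
</code></pre>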
<center><img src="./static/images/table.png"
class="example-image"
style="margin-left: auto; margin-right: auto;"
width="500"
align="center"
alt="Results of the GEP score."/></center>
<p style="font-size: small" align="center">CS: CLIP Score, GEP: GEP Score. CogView refers to CogView2, DALLE refers to DALLE-2, Stable refers to Stable Diffusion.</p>
</div>
</div>
</div>
<div class="columns is-centered">
<div class="column is-full-width">
<div class="content has-text-justified">
<p style="font-size: large; background-color : rgb(214, 236, 244)">
<strong>Data Release:</strong> Based on 2 genders, 15 attributes, and 16 contexts, we create 512 <strong>prompts</strong>. We generate 5 <strong>images</strong> per prompt using 3 state-of-the-art text-to-image models (7680 images in total).
We annotate whether each attribute appears in each image, which is needed to calculate the GEP metric. <strong>We release the prompts/images/annotations</strong> <a href="https://github.com/SALT-NLP/GEP_data">here</a>.
</p>
</div>
</div>
</div>
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Automatic Estimation</h2>
<div class="content has-text-justified">
<p>
To automatically calculate the GEP metric, we propose <strong>cross-modal classifiers</strong> based on CLIP embeddings to detect the existence of attributes, outperforming the (calibrated) CLIP similarity in terms of the correlation with human annotations.
</p>
<center><span>
<img src="./static/images/auto.png" align="absmiddle"/>
</span><!---->
</center>
</div>
<p style="font-size: small" align="center">Left: Automatic GEP estimation using CLIP similarity. Right: Automatic GEP estimation using Cross-modal classifiers.</p>
<p>
Specifically, we train attribute classifiers in the shared CLIP embedding space using text captions only, and then apply these classifiers to the CLIP embeddings of generated images. Note that the proposed approach is as flexible and scalable as calculating CLIP similarity while achieving better performance.
</p>
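<p>
As a rough sketch of this idea (not the released pipeline), one can train a simple linear classifier, e.g., logistic regression, on CLIP text embeddings of captions that do or do not mention an attribute, and then apply it to CLIP image embeddings; the model name, captions, and classifier choice below are illustrative assumptions.
</p>
<pre><code># Rough sketch of a cross-modal attribute classifier: train on CLIP text
# embeddings only, then classify CLIP image embeddings in the shared space.
# The model name, captions, and classifier choice are illustrative assumptions;
# see the released code for the actual pipeline.
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_texts(texts):
    inputs = processor(text=texts, return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=-1, keepdims=True)  # unit-normalize

def embed_images(images):
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=-1, keepdims=True)

# Toy training set: captions with (label 1) and without (label 0) the attribute "tie".
captions = [
    "a photo of a person wearing a tie",
    "a person in a suit and tie holding an umbrella",
    "a photo of a person in a t-shirt",
    "a person in a dress holding an umbrella",
]
labels = [1, 1, 0, 0]
clf = LogisticRegression().fit(embed_texts(captions), labels)

# Apply the text-trained classifier to generated images via their CLIP embeddings.
images = [Image.open("generated_0.png")]  # illustrative path
print(clf.predict(embed_images(images)))  # 1 if "tie" is predicted present
</code></pre>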
</div>
</div>
<div class="columns is-centered">
<div class="column is-full-width">
<div class="content has-text-justified">
<p style="font-size: large; background-color : rgb(214, 236, 244)">
<strong>Code Release:</strong> We release the code of the whole pipeline <a href="https://github.com/SALT-NLP/GEP_data">here</a>, together with ready-to-use code for testing new models.
</p>
</div>
</div>
</div>
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Ethics Statement</h2>
<div class="content has-text-justified">
<p>
In this work, the gender indicators used to prompt text-to-image models are limited to binary genders. However, gender is <em>not</em> binary. We are fully aware of the harm of excluding non-binary people, as it might further marginalize minority groups. Text-to-image models, unfortunately, are predominantly trained on data depicting two genders. The lack of representation of LGBT individuals in datasets remains a limiting factor for our analysis. Importantly, the framework we propose can be extended to non-binary groups. As dataset representation improves for text-to-image models, we urge future work to re-evaluate representation differences across a wider set of genders.
</p>
</div>
</div>
</div>
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Acknowledgement</h2>
<div class="content has-text-justified">
<p>
This work was partially supported by the Google Research Collabs program. The authors appreciate valuable feedback and leadership support from David Salesin and Rahul Sukthankar, with special thanks to Tomas Izo for supporting our research in Responsible AI. We would like to thank Hongxin Zhang, Camille Harris, Shang-Ling Hsu, Caleb Ziems, Will Held, Omar Shaikh, Jaemin Cho, Kihyuk Sohn, Vinodkumar Prabhakaran, Susanna Ricco, Emily Denton, and Mohit Bansal for their helpful insights and feedback.
</p>
</div>
</div>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@misc{zhang2023auditing,
title={Auditing Gender Presentation Differences in Text-to-Image Models},
author={Yanzhe Zhang and Lu Jiang and Greg Turk and Diyi Yang},
year={2023},
eprint={2302.03675},
archivePrefix={arXiv},
primaryClass={cs.CV}
}</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>
This website is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
Commons Attribution-ShareAlike 4.0 International License</a>.
</p>
<p>
The source code of this website is borrowed from <a
href="https://github.com/nerfies/nerfies.github.io">Nerfies</a>.
</p>
</div>
</div>
</div>
</div>
</footer>
</body>
</html>