Understanding Data
To understand the data, the team referred to the papers below. We have also noted down the key points of these research papers, which should fast-track the understanding of this project for any newcomer.
Papers we referred to:
- Diversity in Faces: https://arxiv.org/pdf/1901.10436.pdf
- Deep Learning Face Attributes in the Wild: https://arxiv.org/pdf/1411.7766.pdf
- YFCC100M: The New Data in Multimedia Research: https://arxiv.org/pdf/1503.01817.pdf
- These AI systems learn what they are taught. If they are not taught with robust and diverse data sets, accuracy and fairness are at risk.
- The training data sets should be large enough and diverse enough to learn the many ways in which faces inherently differ.
- Dimensions like face symmetry, facial contrast, and the sizes, distances and ratios of the various attributes of the face (eyes, nose, forehead, etc.), among many others, are important.
- Diversity in Faces (DiF) is a new large and diverse data set designed to advance the study of fairness and accuracy in face recognition technology.
- The DiF annotations are made on faces sampled from the publicly available YFCC-100M data set of 100 million images.
- The community has built open evaluations around these data sets, such as MegaFace [13], MS-Celeb [14] and the NIST Face Recognition Vendor Test (FRVT).
- One prominent example of an early face data set and open evaluation is Labeled Faces in the Wild (LFW).
- Data set construction: How should balance be measured? Are age, gender and skin color sufficient? What about other highly personal attributes that are part of our identity, such as race, ethnicity, culture, geography, or visible forms of self-expression that are reflected in our faces in a myriad of ways?
- The authors concluded that no single facial feature or combination of commonly used classifications (such as age, gender and skin color) would suffice. They therefore formulated a novel multi-modal approach that incorporates a diversity of face analysis methods.
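The DiF paper quantifies how balanced a data set is along each coded dimension using Shannon diversity and evenness statistics. A minimal sketch of those two measures, assuming faces have already been binned along one dimension (e.g. age groups):

```python
import math

def shannon_diversity(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over the non-empty bins."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def evenness(counts):
    """Evenness E = H / ln(k) for k bins; E = 1 means a perfectly
    balanced split, values near 0 mean one bin dominates."""
    k = len(counts)
    if k < 2:
        return 1.0
    return shannon_diversity(counts) / math.log(k)

# A balanced split across 4 age bins scores evenness 1.0;
# a heavily skewed split such as [97, 1, 1, 1] scores far lower.
```

The exact binning of each facial dimension is the paper's own; the measures above are the standard entropy-based definitions.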
- Pre-processing pipeline: The first step was to check whether each URL was still active. If so, the license was then checked.
- Faces smaller than 50 × 50 pixels, or with an inter-ocular distance of less than 30 pixels, were removed, as were faces with substantial non-frontal pose.
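The size and inter-ocular filters above can be sketched as a simple predicate. The record layout (a dict with `w`, `h` and `interocular` keys) is a hypothetical stand-in for whatever the face detector emits:

```python
MIN_FACE_SIDE = 50    # pixels, per the DiF pre-processing rules above
MIN_INTEROCULAR = 30  # pixels

def keep_face(face):
    """Return True if a detected face passes the DiF size filters.

    `face` is a hypothetical record: {"w": ..., "h": ..., "interocular": ...}.
    (The pose filter from the paper is omitted here.)
    """
    return (face["w"] >= MIN_FACE_SIDE
            and face["h"] >= MIN_FACE_SIDE
            and face["interocular"] >= MIN_INTEROCULAR)

faces = [
    {"w": 120, "h": 130, "interocular": 45},  # kept
    {"w": 40,  "h": 60,  "interocular": 35},  # too small
    {"w": 90,  "h": 95,  "interocular": 20},  # eyes too close together
]
kept = [f for f in faces if keep_face(f)]
```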
- As prior work has pointed out, skin color alone is not a strong predictor of race; other features, such as facial proportions, are also important.
- To provide the basis for the three craniofacial feature coding schemes used in DiF, the authors built on a subset of 19 facial landmarks.
- Each image is then cropped to 128 × 128 pixels to create a square image with the face mid-line centered vertically. Next, the spatially transformed image is converted to grayscale to measure intensity.
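The crop-and-grayscale step can be sketched in plain Python (images as nested lists). The paper only states that images are converted to grayscale; the Rec. 601 luma weights used below are a common convention, not something the paper specifies:

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (rows of (r, g, b) tuples) to intensity
    using the common Rec. 601 luma weights; the exact weights are
    an assumption, since the paper only says 'grayscale'."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def center_crop(image, size=128):
    """Crop a square `size` x `size` window from the center of a 2D image."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]
```

In practice this would be done with an image library; the sketch just makes the transform concrete.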
- This study found that high-contrast faces were judged to be younger than low-contrast faces.
- To generate a feature measure for the whole face, the Individual Typology Angle (ITA) is extracted for pixels within a masked face region, as shown in the figure.
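ITA is the standard skin-tone measure computed from CIELAB lightness L* and the yellow-blue channel b*: ITA = arctan((L* − 50) / b*) × 180 / π, with higher angles for lighter skin. A minimal sketch, averaging over a hypothetical boolean face mask:

```python
import math

def ita_degrees(L, b):
    """Individual Typology Angle for one pixel, from CIELAB L* and b*:
    ITA = arctan((L* - 50) / b*) * 180 / pi.
    atan2 matches atan for the positive b* typical of skin while
    avoiding division by zero."""
    return math.degrees(math.atan2(L - 50.0, b))

def mean_ita(lab_pixels, mask):
    """Average ITA over pixels where the (hypothetical) face mask is True.

    `lab_pixels` is a flat list of (L, a, b) tuples, `mask` a parallel
    list of booleans selecting the face region."""
    vals = [ita_degrees(L, b) for (L, a, b), m in zip(lab_pixels, mask) if m]
    return sum(vals) / len(vals)
```

For example, a pixel with L* = 70 and b* = 20 gives ITA = 45°.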
- Predicting face attributes in the wild is challenging due to complex face variations.
- The paper proposed a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently.
- LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction.
- Firstly, LNet is trained in a weakly supervised manner, i.e. only image-level attribute tags of training images are provided, making data preparation much easier.
- Secondly, ANet extracts discriminative face representation, making attribute recognition from the entire face region possible. ANet is pre-trained by classifying massive face identities and is fine-tuned by attributes.
- Pre-training on massive face identities discloses that the high-level hidden neurons of ANet implicitly learn and discover semantic concepts related to identity, such as race, gender, and age.
- Creative Commons (CC), a nonprofit organization founded in 2001, seeks to build a rich public domain of "some rights reserved" media, an effort sometimes referred to as the copyleft movement.
- This dataset is the largest public multimedia collection ever released, comprising a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all uploaded to Flickr between 2004 and 2014 and published under a CC commercial or non-commercial license.
- Like the metadata, the photo and video data can then be mounted as a read-only network drive.
- Each media object included in the dataset is represented by its metadata: its Flickr identifier, the user that created it, the camera that took it, the time at which it was taken and when it was uploaded, the location where it was taken (if available), and the CC license it was published under.
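The metadata ships as tab-separated records, one per media object. A sketch of parsing one line into a dict; the field subset and order below are a simplified assumption for illustration, not the actual YFCC100M schema:

```python
# Hypothetical, simplified subset of the tab-separated metadata layout;
# the real YFCC100M files carry more fields in a fixed documented order.
FIELDS = ["photo_id", "user_id", "date_taken", "date_uploaded",
          "longitude", "latitude", "license_url"]

def parse_record(line):
    """Split one tab-separated metadata line into a dict keyed by FIELDS."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

record = parse_record(
    "12345\tuser42\t2009-07-01 12:00:00\t1246449600\t"
    "-122.41\t37.77\thttp://creativecommons.org/licenses/by/2.0/"
)
```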
- There are 48,366,323 photos and 103,506 videos in the dataset that have been annotated with a geographic coordinate, either manually by the user or automatically via GPS.
- To understand more about the visual content represented in the dataset, the authors used a deep learning approach to detect the presence of a variety of concepts, such as people, animals, objects, food, events, architecture, and scenery.
- Computing features for 100 million media items is a time-consuming and computationally expensive task. Accompanying the datasets with precomputed features reduced the burden on the participating teams, allowing them to focus on solving the task at hand rather than on processing the data.
- The YFCC100M offers the opportunity to advance research, give rise to new challenges and solve existing ones.