THE WEDNESDAY PAPER — Matching Problem

By Vítor Albiero, Nisha Srinivas, Esteban Villalobos, Jorge Perez-Facuse, Roberto Rosenthal, Domingo Mery, Karl Ricanek, and Kevin W. Bowyer

Matching live images (“selfies”) to images from ID documents is a problem that can arise in various applications. A challenging instance of the problem arises when the face image on the ID document is from early adolescence and the live image is from later adolescence. We explore this problem using a private dataset, the Chilean Young Adult (CHIYA) dataset, where we match live face images taken at age 18–19 to face images on ID documents created at ages 9 to 18. State-of-the-art deep learning face matchers (e.g., ArcFace) have relatively poor accuracy for document-to-selfie face matching. To achieve higher accuracy, we fine-tune the best available open-source model with triplet loss for few-shot learning. Experiments show that our approach achieves higher accuracy than the DocFace+ model recently developed for this problem. Our fine-tuned model improved the true acceptance rate for the most difficult (largest age span) subset from 62.92% to 96.67% at a false acceptance rate of 0.01%. Our fine-tuned model is available for use by other researchers.

The main challenge is that the ID document image is from early adolescence and the live image is from later adolescence or early adulthood, so the facial appearance may have undergone a substantial natural change. This problem context arises, for example, in applications where a young person presents an ID card to enrol for a government benefit and the ID card either does not contain a digital image or the application is not authorised or equipped to access such an electronically embedded image.

Some previous works are from before the wave of deep learning algorithms in face recognition. Starting from the open CNN matcher that achieves the highest accuracy, we fine-tune it using triplet loss for few-shot learning. Triplet loss suits the fact that we have just two images per person: it trains the network for verification rather than classification, so no convergence difficulty is observed.

Few-Shot Learning with Triplet Loss

To deal with the heterogeneity in the two types of images, the fine-tuning selects positive and negative samples that are of the opposite type to the anchor, thus learning to separate impostor and authentic pairs in the appropriate context. By using more than 230,000 subjects in training, with subjects having different elapsed times for their image pairs, the fine-tuning process should learn to deal with the elapsed time problem. Thus, the fine-tuning approach is designed with features to address the issues of (i) few images per subject, (ii) matching images of two different types, and (iii) varying age difference during adolescence for different image pairs.
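The triplet selection described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' training code: the function name `cross_type_triplet_loss`, the margin of 0.3, the squared Euclidean distance on L2-normalized embeddings, and the choice of the hardest in-batch negative are all assumptions made for the example. The only property taken from the text is that the positive and the negative are always of the opposite image type to the anchor.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def cross_type_triplet_loss(id_emb, selfie_emb, margin=0.3):
    """Triplet loss where positives and negatives are of the opposite type.

    id_emb[i] and selfie_emb[i] are embeddings of the same subject i
    (one ID-card image and one selfie per subject).
    The margin value and hardest-negative mining are assumptions.
    """
    id_emb = l2_normalize(np.asarray(id_emb, dtype=float))
    selfie_emb = l2_normalize(np.asarray(selfie_emb, dtype=float))

    # dist[i, j] = squared distance between ID image i and selfie j.
    diff = id_emb[:, None, :] - selfie_emb[None, :, :]
    dist = np.sum(diff ** 2, axis=2)

    # Authentic pairs sit on the diagonal.
    pos = np.diag(dist)

    # Exclude authentic pairs before searching for negatives.
    same_subject = np.eye(dist.shape[0], dtype=bool)
    impostor_dist = np.where(same_subject, np.inf, dist)

    # ID anchor: positive and negative are both selfies.
    neg_for_id = impostor_dist.min(axis=1)
    # Selfie anchor: positive and negative are both ID images.
    neg_for_selfie = impostor_dist.min(axis=0)

    loss_id = np.maximum(0.0, pos - neg_for_id + margin).mean()
    loss_selfie = np.maximum(0.0, pos - neg_for_selfie + margin).mean()
    return 0.5 * (loss_id + loss_selfie)
```

Because every subject contributes exactly one ID image and one selfie, each subject yields one authentic pair, while all other subjects in the batch supply impostors. No per-subject classifier is needed, which is why two images per person are enough.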


  • Only work to study adolescence scanned identity document to selfie matching, demonstrating the difficulty of the problem.
  • A freely available model with a significant increase in accuracy in the CHIYA dataset; this model also performs well on a different dataset (Public-Ivs ), and surpass previous works on ID versus selfie matching problem.
  • Finding of no consistent or large gender difference in accuracy between adolescents document-to-selfie face matching.

Two streams of previous work come together in this project. One is the document-face-to-selfie matching problem, and the other is the problem of face matching across varying time lapses during adolescence.

Face Matching Across Adolescence

Adolescence refers to the period of growth and development of children, roughly the age range of 10 to 19 years, starting with puberty and ending with an adult identity and behaviour. The “In the Wild Celebrity Children” (ITWCC) dataset contains images of subjects at child, preadolescent and adolescent ages.

Document Face to Selfie Face Matching

Using a dataset collected in a real banking scenario, called FaceBank, previous work used pre-trained CNNs to extract features to match ID cards to selfies. This private dataset consists of 13,501 subjects, each with an ID card image and face images. The classifiers were trained to predict whether a pair is authentic or impostor, and showed accuracy rates higher than 93%.

Points to note from this stream of previous work include:

  • No previous work on document-to-selfie face matching considers the adolescent age range.
  • Only [15] works with document face images acquired by taking an image of the ID document, as distinguished from digital images stored as part of the ID document.


Accuracy Evaluation

Accuracy Difference By Gender

Comparison Between ID Card Formats

Final Acceptance Rate

Finally, from the 5,503 images available, we were able to detect a face in 5,502.


This work studied the problem of matching ID cards to selfies across age differences in adolescence. Our results show that existing methods perform poorly on the task, especially when there is a large age difference (8–9 years) between the ID card and selfie images. Our proposed method AIM-CHIYA is effective in improving the accuracy of the best method (ArcFace), increasing the average TAR across groups from 85.13% to 98.76% at a FAR of 0.01%. Our method also improves on the accuracy of the fine-tuned state-of-the-art DocFace+ model. Our analysis suggests that there is no significant general difference in accuracy between males and females. Of the two types of card format in the CHIYA dataset, the yellow card results in lower accuracy. We also find that yellow cards make up a larger fraction of the hardest subgroup. AIM-CHIYA was able to improve the accuracy on both formats, resulting in much more similar accuracy across groups and reducing the difference in TAR@FAR=0.01% between the best (i18s1819) and the worst (i10s1819) subgroups from 34.39% to 2.95%.
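For readers unfamiliar with the TAR@FAR metric quoted above, the sketch below shows one common way to compute it from similarity scores. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `tar_at_far`, the convention that higher scores mean greater similarity, and the strict "greater than threshold" acceptance rule are all choices made for this example.

```python
import numpy as np


def tar_at_far(genuine_scores, impostor_scores, far=1e-4):
    """True acceptance rate at a fixed false acceptance rate.

    genuine_scores: similarity scores of authentic (same-subject) pairs.
    impostor_scores: similarity scores of impostor pairs.
    far: target false acceptance rate (1e-4 corresponds to 0.01%).
    Assumes higher score = more similar; a pair is accepted when its
    score is strictly greater than the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]

    # Number of impostor pairs we may accept without exceeding the target.
    # The small epsilon guards against floating-point round-down.
    allowed = int(np.floor(far * impostor.size + 1e-9))
    allowed = min(allowed, impostor.size - 1)

    # Threshold is the (allowed + 1)-th highest impostor score, so at most
    # `allowed` impostor scores lie strictly above it.
    threshold = impostor[allowed]

    tar = float(np.mean(genuine > threshold))
    return tar, threshold


# Toy usage with made-up scores (not data from the paper).
tar, thr = tar_at_far([9.5, 8.5, 7.0, 10.0], np.arange(10), far=0.1)
```

Note that a FAR of 0.01% allows only one false accept per 10,000 impostor comparisons, so a meaningful estimate needs far more impostor pairs than that.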