Facial Recognition

(Editor: Pam Marek)


Why do we make errors in recognition? Research indicates that memory is reconstructive rather than reproductive, a distinction originally made in Bartlett's (1932) classic work. In other words, what people recognize or recall is seldom an exact replica of an original event, object, or face, but is rather a re-creation of what has been experienced. As people undertake the task of reconstructing the past, they are influenced by multiple variables including, but not limited to, the extent to which the target memory is associated with other memories (Roediger & McDermott, 1995), or is linked to multiple contexts. In some cases, people may be confident that an event, object, or face is familiar, but may be unable to identify the particular context in which they have seen it before. This effect of familiarity (Schacter, 2001) has been targeted as a source of false recognition in eyewitness identifications. Memory is also influenced by attention (Stanny & Johnson, 2000), encoding (the processes people use to bring information into long-term memory), the extent to which the encoding context resembles the retrieval context (Godden & Baddeley, 1975), and the emotional content of the stimuli (LaBar & Phelps, 1998). With all these variables involved, it is not surprising that people have a tendency to make errors.

What are the consequences of recognition errors? In the case of facial recognition, errors can lead to the embarrassment that most people have experienced when they fail to recognize someone to whom they were recently introduced. However, errors can be considerably more serious. For example, incorrect eyewitness identification may result in the long-term imprisonment of an innocent person who was incorrectly identified as a perpetrator.

How do researchers assess the accuracy of recognition memory? Suppose people were asked to study several faces. Shortly thereafter, in a "test" phase, they would be shown the same "old" faces, interspersed among other "new" faces ("foils") that had not been seen before. Their task would be to indicate ("Yes" or "No") whether they had seen each face before. Responses would be classified in one of four categories derived from signal detection theory:

                                 | Actual situation/"Reality"
Response                         | "Old" (shown in study phase) | "New" (foil, not shown in study phase)
"Yes," saw during study          | Hit (correct answer)         | False alarm/false recognition (incorrect answer)
"No," did not see during study   | Miss (incorrect answer)      | Correct rejection (correct answer)
Figure 1
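
The four categories in Figure 1 can be expressed as a small classification rule. The following Python sketch is illustrative only: the function name and the sample trials are hypothetical and are not part of the experiment's software.

```python
from collections import Counter


def classify(face_was_studied: bool, said_yes: bool) -> str:
    """Map one test trial onto a signal detection category."""
    if face_was_studied:
        return "hit" if said_yes else "miss"
    return "false alarm" if said_yes else "correct rejection"


# Hypothetical trials: (face was in the study set, participant said "Yes")
trials = [
    (True, True),    # old face, "Yes"
    (True, False),   # old face, "No"
    (False, True),   # foil, "Yes"
    (False, False),  # foil, "No"
    (True, True),    # old face, "Yes"
]

# Tally how often each category occurred across the trials.
counts = Counter(classify(studied, yes) for studied, yes in trials)
print(counts)
```

Tallying the categories in this way gives the counts from which hit and false alarm rates are later computed.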

Examining only the "hits" would provide a misleading impression of accuracy. Suppose that, on the test, 50% of the faces were "new" and 50% were "old," and that outcomes are expressed as percentages of all test faces. People who could frequently, but not perfectly, discriminate between the "new" and "old" faces might have 40% hits and 40% correct rejections (with the remaining 20% being false alarms or misses). In contrast, people who could not frequently discriminate between the "new" and "old" faces might guess "old" whenever they were in doubt. Such a pattern of guessing might yield an even higher percentage of hits (perhaps even 50% if every "old" face were called "old"), but would also yield a high percentage of false alarms (possibly even reaching 50% if such people responded "old" on every trial). Clearly, the people who can discriminate between the old and new faces are more accurate than those who simply say "old" whenever they are in doubt, yet the hit rates for these two types of people might be quite similar. Thus, to provide a clearer picture of accuracy in recognition memory experiments, researchers often calculate a discrimination index, a mathematical combination of hit and false alarm rates. In this facial recognition experiment, the discrimination index is labeled A' (A prime). It ranges from .50 (chance level) to 1.00 (perfect discrimination).
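
The text above does not give the formula used for A'. As a minimal sketch, the Python function below implements one widely used nonparametric form of A', computed from the hit rate (hits divided by the number of "old" faces) and the false alarm rate (false alarms divided by the number of "new" faces). Whether this experiment uses exactly this form is an assumption, and the function name is hypothetical. Note that this form can fall below .50 when the false alarm rate exceeds the hit rate.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """One common nonparametric form of A' (assumed, not taken from the experiment)."""
    h, f = hit_rate, fa_rate
    if h == f:
        # Equal hit and false alarm rates indicate chance-level discrimination.
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: false alarms exceed hits.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# A discriminating participant: 8 of 10 old faces called "old",
# 2 of 10 foils called "old".
print(f"{a_prime(8 / 10, 2 / 10):.3f}")

# A participant who answers "old" on every trial.
print(f"{a_prime(10 / 10, 10 / 10):.3f}")
```

Under this form, the discriminating participant scores well above .50, whereas the participant who always answers "old" scores exactly .50 despite a perfect hit rate.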

This facial recognition experiment is designed to yield two discrimination indices, one for immediate recognition and one for delayed recognition (the following day). Additionally, group data will permit assessment of the effect of familiarity on discrimination and false recognition.

Design & Procedures

In the study phase, you will view 10 "Most Wanted" faces, shown simultaneously. There will be no time limit for studying the faces. Next, you will complete two recognition tests, one immediately and another the following day. On each test, you will view a set of 20 sequentially presented faces (10 from the Most Wanted list and 10 "foils," faces you have not seen before). For each face, you will click a button to respond "Yes" or "No" to the question "Is this one of the Most Wanted?" On the second (delayed) test, some participants will be randomly assigned to view a series of faces with the same foils shown on the first test, whereas others will be assigned to view a series of faces with different foils. Thus, this is a 2 (time of test: Day 1 or Day 2) x 2 (Day 2 foil: Same as Day 1/False memory or Different from Day 1/Control) mixed-samples design. Time of test is a repeated-measures variable. Type of foil on Day 2 is an independent-measures variable.

Because of the familiarity effect, it is hypothesized that people who view the same foils (on the second day) will have poorer discrimination and more false recognition than people who view the different foils. In other words, if the foils on the first and second test are the same, people taking the second test may recognize that the foils are familiar, yet fail to identify the source of the familiarity. They may mistakenly conclude that the foils were seen during study of the Most Wanted faces, rather than on the first test. Such misjudgment is called an error in source monitoring.

Data Analyses

The data can be downloaded in three formats (XML, Excel spreadsheet format, and comma delimited for statistical software packages). Figure 2 shows an excerpt from a sample Excel spreadsheet. The first four columns provide classification data (user ID number, gender, class ID, and age); Columns G and N provide the dates of the first and second tests, respectively.

The column labeled "Condition" indicates whether participants viewed the same foil faces on Day 2 (False Memory) or different foil faces on Day 2 (Control). The columns Hit, Miss, False Recognition, and Correct Rejection indicate the number of times each occurred on Day 1 and Day 2, respectively. Because there are 10 Most Wanted and 10 foil faces on each test, the total of Hits + Misses equals 10, as does the total of False Recognitions (false alarms) + Correct Rejections. The column labeled DI includes the discrimination index for each day. The columns labeled Day1Time and Day2Time indicate how long it took to respond to the test set of faces.
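
Before analysis, it may be worth confirming that each row satisfies the totals described above. The following Python sketch checks a comma-delimited excerpt. The column headers (for example, Day1Hit and Day1FalseRecognition) and the sample values are hypothetical, invented for illustration; the headers in an actual download may differ.

```python
import csv
import io

FACES_PER_TYPE = 10  # 10 Most Wanted and 10 foil faces on each test

# Hypothetical excerpt with made-up values; real headers may differ.
SAMPLE_CSV = """UserID,Condition,Day1Hit,Day1Miss,Day1FalseRecognition,Day1CorrectRejection
101,Control,9,1,2,8
102,False Memory,7,3,4,6
103,Control,8,1,3,7
"""


def check_counts(row, day="Day1"):
    """Return a list of problems with one participant's counts for a given day."""
    hits = int(row[f"{day}Hit"])
    misses = int(row[f"{day}Miss"])
    false_recognitions = int(row[f"{day}FalseRecognition"])
    correct_rejections = int(row[f"{day}CorrectRejection"])
    problems = []
    if hits + misses != FACES_PER_TYPE:
        problems.append(f"Hits + Misses = {hits + misses}")
    if false_recognitions + correct_rejections != FACES_PER_TYPE:
        problems.append(
            f"False Recognitions + Correct Rejections = "
            f"{false_recognitions + correct_rejections}"
        )
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    problems = check_counts(row)
    print(row["UserID"], "ok" if not problems else "; ".join(problems))
```

In this made-up excerpt, the third row is flagged because its hits and misses sum to 9 rather than 10.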

Sample data image from the Facial Recognition experiment
Figure 2

The most appropriate analysis of these data would involve conducting 2 (condition: control or false memory) by 2 (time of test: Day 1 or Day 2) mixed-samples analyses of variance. For the dependent variables of discrimination index and false alarms, you might expect to find poorer discrimination and more false alarms on Day 2, with the differences being more pronounced in the false memory condition. If available statistical packages do not include a procedure for mixed analyses of variance, t tests may be used, with the understanding that multiple tests increase the risk of Type I error (concluding there is a difference when, in reality, there is no difference in the population). Independent-samples t tests might be used to determine if discrimination scores and false alarms differed on Day 2 for the false memory and control conditions. Dependent-samples t tests might be used to determine if discrimination scores declined and false alarms increased from Day 1 to Day 2 in the false memory condition.
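
For readers who want to see what the t tests compute, the following Python sketch uses only the standard library. It returns the t statistic and degrees of freedom; obtaining a p value would require a statistical package or a t table. The function names are hypothetical, the false alarm counts are made up for illustration, the independent-samples version assumes equal variances, and the familywise error estimate assumes the tests are independent.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(first, second):
    """Dependent-samples t: mean difference divided by its standard error."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def independent_t(group_a, group_b):
    """Independent-samples t using a pooled variance (assumes equal variances)."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def familywise_error(alpha, n_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Made-up false alarm counts, for illustration only.
false_memory_day1 = [2, 3, 2, 4, 3]
false_memory_day2 = [4, 5, 3, 6, 4]
control_day2 = [2, 3, 2, 1, 2]

t_paired, df_paired = paired_t(false_memory_day1, false_memory_day2)
t_indep, df_indep = independent_t(false_memory_day2, control_day2)
print(f"Day 1 vs. Day 2 (false memory): t({df_paired}) = {t_paired:.2f}")
print(f"False memory vs. control (Day 2): t({df_indep}) = {t_indep:.2f}")
print(f"Familywise error for 4 tests at .05: {familywise_error(0.05, 4):.3f}")
```

The last line illustrates why multiple t tests are a fallback rather than the preferred analysis: with four independent tests at the .05 level, the chance of at least one Type I error is considerably higher than .05.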


Recent research on facial recognition has focused on heuristics (mental shortcuts) that people use when deciding if they recognize a face. For example, Kleider and Goldinger (2006) have examined the interplay between the generation and resemblance heuristics. The generation heuristic involves making a recognition decision based on whether or not details of an original context can be retrieved, with such generation leading to a more positive recognition judgment. The resemblance heuristic may lead to a positive recognition judgment if a face is closely associated with a prototype (e.g., a thief). Other researchers (e.g., Weber & Brewer, 2003) have continued to study factors influencing confidence-accuracy calibration and how longer exposure may inappropriately inflate confidence (Memon, Hope, & Bull, 2003). Cross-race bias, the difficulty people have in identifying faces of ethnic groups other than their own, remains a concern (Sporer, 2001) in facial recognition, as does own-age bias (Wright & Stroud, 2002). As DNA evidence becomes a source of exoneration for innocent people convicted of crimes, research has suggested that a majority of erroneous convictions have stemmed from reliance on eyewitness identifications (Wells et al., 2000). Clearly, continuing clarification of the conditions in which facial recognition is faulty or accurate has meaningful applications in the justice system.


Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. 
    Cambridge, England: Cambridge University Press.

Godden, D. R., & Baddeley, A. D. (1975). Context-dependent memory in two 
    natural environments: On land and underwater. British Journal of Psychology, 66, 325-331.

Kleider, H. M., & Goldinger, S. D. (2006). The generation and resemblance 
    heuristics in face recognition: Cooperation and competition. Journal 
    of Experimental Psychology: Learning, Memory, and Cognition, 32, 259-276.

LaBar, K. S., & Phelps, E. A. (1998). Arousal-mediated memory consolidation. 
    Psychological Science, 9, 490-493.

Memon, A., Hope, L., & Bull, R. (2003). Exposure duration: Effects on eyewitness 
    accuracy and confidence. British Journal of Psychology, 94, 339-354.

Roediger, H. L. III, & McDermott, K. B. (1995). Creating false memories: 
    Remembering words not presented in lists. Journal of Experimental 
    Psychology: Learning, Memory, and Cognition, 21, 803-814.

Schacter, D. L. (2001). The seven sins of memory. New York: Houghton Mifflin.

Sporer, S. L. (2001). The cross-race effect: Beyond recognition of faces 
    in the laboratory. Psychology, Public Policy, and Law, 7, 170-200.

Stanny, C. J., & Johnson, T. C. (2000). Effects of stress induced by a 
    simulated shooting on recall by police and citizen witnesses. 
    American Journal of Psychology, 113, 359-386.

Weber, N., & Brewer, N. (2003). The effect of judgment type and confidence 
    scale on confidence-accuracy calibration in face recognition. 
    Journal of Applied Psychology, 88, 490-499.

Wells, G. L., Malpass, R. S., Lindsay, R. C. L., Fisher, R. P., Turtle, 
    J. W., & Fulero, S. M. (2000). From the lab to the police station. 
    American Psychologist, 55, 581-598.

Wright, D. B., & Stroud, J. N. (2002). Age differences in lineup identification 
    accuracy: People are better with their own age. Law and Human Behavior, 26, 641-654.