Saturday, July 21, 2012

Job interest and performance: a revised view

Is job interest of any importance to job performance? It seems very likely that it should be, but as pointed out by Nye et al. (Nye, Su, Rounds, & Drasgow, 2012), "interest measures are generally ignored in the employee selection literature" (p. 384). Part of the reason seems to be that previous meta-analytic work reported a very low correlation between interest and performance, only about 0.1 (Hunter & Hunter, 1984). However, Nye et al. criticized the often-cited meta-analysis published by Hunter and Hunter and conducted a very extensive new analysis of the relation between interest and performance. They came to a different conclusion: for studies where the interest scales matched the character of the jobs, the estimated correlation was 0.36, after correction for measurement error and indirect range restriction. They concluded that interest should be considered in selection contexts.
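To make "correction for measurement error" concrete, here is a minimal Python sketch of the classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The function name and all numbers are hypothetical and are not taken from any study cited here. The correction for indirect range restriction is a separate step and is not shown.

```python
import math

def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Classical disattenuation: r / sqrt(r_xx * r_yy)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Hypothetical illustration: an observed correlation of 0.30 between an
# interest scale (reliability 0.80) and supervisor ratings (reliability 0.50).
r_corrected = correct_for_attenuation(0.30, 0.80, 0.50)
print(round(r_corrected, 2))  # prints 0.47
```

The less reliable the two measures are, the more the observed correlation understates the underlying relation, which is why corrected and uncorrected estimates can differ substantially.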

This is not the only example showing that earlier meta-analyses of the effectiveness of predictors of job performance may be quite misleading. A recent publication on integrity tests by Van Iddekinge et al. (2012) showed that earlier meta-analytic work (Ones et al., 1993), cited by Hunter and Hunter, grossly overestimated the validity of integrity tests.

The recent work by Nye et al. is undoubtedly very important. However, even stronger results can probably be obtained with specific interest measures. Vocational interest does not measure interest in a specific job, but in a class of jobs. In the UPP test, we routinely measure interest in the specific job under consideration, either in selection or in various types of follow-up. As an example, data from a study of employees in customer service in a finance company (Sjöberg, 2010) were re-analyzed. The correlation between job (not vocational) interest and supervisor-rated performance on core job tasks was 0.55, after correction for measurement error and indirect range restriction. The specific interest measure is proximal to job performance, while vocational interest is distal; hence vocational interest should be expected to have the lower correlation.

What creates interest (Sjöberg, 2006)? For a given task content, optimal challenge may be the answer. Interests are also probably somewhat elastic, i.e. you may develop a new interest under favorable circumstances (support, optimal challenge). Maybe one should try to measure not only interest but also the potential for developing interest. In a selection situation, it must be expected that interest scores are contaminated by impression management, and there is a need to correct for that factor. Alternatively, indirect measurement can be attempted, such as knowledge of facts. People who are strongly interested inform themselves about a job or area of study, and hence know more. I tried this idea in the selection of applicants to the Stockholm School of Economics, with some success.


Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.
Nye, C. D., Su, R., Rounds, J., & Drasgow, F. (2012). Vocational interests and performance: A quantitative summary of over 60 years of research. Perspectives on Psychological Science, 7(4), 384-403.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology Monograph, 78, 679-703.
Van Iddekinge, C. H., Roth, P. L., Raymark, P. H., & Odle-Dusseau, H. N. (2012). The criterion-related validity of integrity tests: An updated meta-analysis. [doi:10.1037/a0021196]. Journal of Applied Psychology, 97(3), 499-530.
Sjöberg, L. (2006). What makes something interesting? (Review of the book, Exploring the psychology of interest by Paul J. Silvia). PsycCRITIQUES, 51 (46, Article 4), No Pagination Specified.
Sjöberg, L. (2010). UPP-testet och kundservice: Kriteriestudie. (The UPP test and customer service: A criterion study). Forskningsrapport 2010:6. Stockholm: Psykologisk Metod AB.
Sjöberg, L. (2010/2012). A third generation personality test (SSE/EFI Working Paper Series in Business Administration No. 2010:3). Stockholm: Stockholm School of Economics.

Thursday, July 19, 2012

Dealing with test complexity

People have a limited ability to make complex judgments without the support of computers and explicit decision rules. This fact has been well known for many years. An often-cited classic is a paper by Miller [12]. Studies of expert judgments of many kinds, including the assessment of job applicants, have confirmed this general principle [3; 8]. There are some interesting exceptions in special cases, if the experts get fast and clear feedback based on valid theory [9]. These conditions are rarely present in the assessment of job applicants.

It is usual for judges to come to different conclusions if the information they use is complex and extensive, which is a common situation. Furthermore, assessments tend to vary over time. Alongside these limitations in our judgment capacity, we have a tendency to fall prey to an illusion. The more information we get, the more confident we are, but beyond a modest limit, judgments become worse as information increases. See Fig. 1.

Figure 1.  Decision quality as a function of amount of information. 

Most personality tests give a complicated picture of a person. This is reasonable since everyone "knows" that people are complicated. Popular tests provide results for 30-40 dimensions. It is likely that such an abundance of information is popular due to the information illusion discussed above. More information makes us more confident. Research has, however, shown that explicit rules for combining information give better results. Such a rule can simply be based on the decision maker's own systematic strategy, so-called bootstrapping [7], or on explicitly judged importance weights. The use of weights is an effective way of answering the question: "How do I interpret this test result?" The alternative approach is to use a holistic evaluation based on the pattern of results. Holism has traditionally had a strong position in the interpretation of test results, but it cannot be justified on empirical and scientific grounds [14].
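The bootstrapping idea [7] can be illustrated with a small sketch: fit a linear model to a judge's own earlier ratings, then let the model, rather than the judge, score new cases, so that the judge's policy is applied with perfect consistency. The Python code below is only a minimal illustration under stated assumptions. The function names, the two cue scores per applicant, and the ratings are all invented, and ordinary least squares is used as the fitting method.

```python
def fit_linear_model(cue_rows, judgments):
    """Least-squares weights (intercept first) via the normal equations."""
    X = [[1.0] + [float(c) for c in row] for row in cue_rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, judgments))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("cues are collinear; weights are not identified")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def predict(weights, cues):
    """Apply the fitted judge model to one new case."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], cues))

# Invented data: two test scores per applicant and one judge's overall ratings.
past_cues = [(3, 5), (4, 2), (1, 4), (5, 5), (2, 1), (4, 4)]
past_ratings = [6, 5, 3, 8, 2, 7]

judge_model = fit_linear_model(past_cues, past_ratings)
print(predict(judge_model, (3, 3)))  # model-based rating for a new applicant
```

The fitted weights make the judge's implicit strategy explicit, and the model applies it without the random variation that enters a human judgment from case to case.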

Subjective interpretation typically results in narrative texts which may be very credible, due to a number of psychological factors. Such factors have been discussed as enabling "cold reading", i.e. credible inferences about a person which lack a factual basis [13]. Historical examples show how the credibility of the Rorschach test was established by "wizards" who could seemingly produce surprisingly correct statements about a person on the basis of responses to that test [18], in spite of the fact that this test, as well as other projective techniques, has been found to lack validity [6; 10]. I give two examples of research that illustrate how illusory credibility may be established.

The Forer effect. Flattering texts which are full of statements that are generally true, and which say "both A and its opposite B", are perceived as very accurate. Forer showed this in a classic study [5], and the results have been replicated many times [4; 16].

Forer gave a group of students a "test" which he said would reveal their personalities. After some time he returned with narrative texts said to be based on the responses to the test. Each student got his or her own text, but the texts were all the same. The students were asked to judge how well the texts described their personalities. About 90% said that the texts fitted very well. Here is what they got (typical astrological texts):

"You have a need for other people to like and admire you, and yet you tend to be critical of yourself. While you have some personality weaknesses you are generally able to compensate for them. You have considerable unused capacity that you have not turned to your advantage. Disciplined and self-controlled on the outside, you tend to be worrisome and insecure on the inside. At times you have serious doubts as to whether you have made the right decision or done the right thing. You prefer a certain amount of change and variety and become dissatisfied when hemmed in by restrictions and limitations. You also pride yourself as an independent thinker; and do not accept others' statements without satisfactory proof. But you have found it unwise to be too frank in revealing yourself to others. At times you are extroverted, affable, and sociable, while at other times you are introverted, wary, and reserved. Some of your aspirations tend to be rather unrealistic. "

MBTI and PPA excel in using statements of this type, and they provide popular reading for those who have taken the tests. The statements are perceived to be almost perfectly accurate and to give self-insight, but they simply flatter [15] and/or confirm already existing self-beliefs. Once credibility is established, the tester can give important advice about selection, team composition and personal development. No research exists which shows such advice to be useful, but since the test report is so persuasive, the advice is probably also believed.

The "Draw-a-man" effect. The draw-a-man test is credible to many users although it has no demonstrated validity [17]. This is because of common-sense thinking about what various aspects of a drawing could mean. Examples: large muscles mean problems with male self-image, large eyes imply paranoid tendencies, etc. In addition, there is selective memory of cases which supported these speculations, while the others are forgotten or explained away [1; 2].

The UPP test deals with complexity by using aggregate variables, which are linear composites of selected subscales. Extensive research, over a period of 50 years, has shown that this approach is superior to subjective integration of information [8; 11]. For a review of work on UPP, click here.
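As a generic illustration of what a linear composite of subscales means, the Python sketch below standardizes each subscale and forms a weighted sum per person. It is not the UPP scoring procedure: the subscale names, the weights, and the scores are all hypothetical, since the actual subscales and weights are not given in this post.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to z-scores within the group."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        raise ValueError("subscale has no variance")
    return [(x - m) / s for x in scores]

def composite(subscales, weights):
    """Weighted sum of standardized subscales, one value per person."""
    z = {name: standardize(subscales[name]) for name in weights}
    n_people = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in weights)
            for i in range(n_people)]

# Hypothetical raw scores for four people on three invented subscales.
raw = {
    "persistence": [12, 15, 9, 14],
    "cooperation": [20, 18, 22, 25],
    "stress_sensitivity": [7, 11, 5, 6],
}
# Hypothetical importance weights; a negative weight reverses a subscale.
w = {"persistence": 0.5, "cooperation": 0.3, "stress_sensitivity": -0.2}

for person, score in enumerate(composite(raw, w), start=1):
    print(person, round(score, 2))
```

Because the same weights are applied to every person, two users who receive the same subscale scores always arrive at the same aggregate, which is not guaranteed with holistic interpretation.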


[1]. Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of Abnormal Psychology, 73, 193-204.

[2]. Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271-280.

[3]. Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.

[4]. Dickson, D. H., & Kelly, I. W. (1985). The 'Barnum effect' in personality assessment: A review of the literature. Psychological Reports, 57, 367-382.

[5]. Forer, B. R. (1949). The fallacy of personal validation: a classroom demonstration of gullibility. Journal of Abnormal & Social Psychology, 44, 118-123.

[6]. Garb, H. N., Lilienfeld, S. O., & Wood, J. M. (2004). Projective techniques and behavioral assessment. In S. N. Haynes & E. M. Heiby (Eds.), Comprehensive handbook of psychological assessment, Vol. 3: Behavioral assessment (pp. 453-469). Hoboken, NJ, US: John Wiley & Sons Inc.

[7]. Goldberg, L. R. (1970). Man versus model of man: A rationale plus some evidence for a method of improving clinical inferences. Psychological Bulletin, 73, 422-432.

[8]. Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.

[9]. Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. [doi:10.1037/a0016755]. American Psychologist, 64, 515-526.

[10]. Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 1, 27-66.

[11]. Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.

[12]. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

[13]. Rowland, I. (2005). The full facts book of cold reading, 4th edition. London: Full Facts Books.

[14]. Ruscio, J. (2002). The emptiness of holism. Skeptical Inquirer, 26, 46-50.

[15]. Thiriart, P. (1991). Acceptance of personality test results. Skeptical Inquirer, 15, 166-172.

[16]. Trankell, A. (1961). Magi och förnuft i människobedömning. Stockholm: Bonnier.

[17]. Willcock, E., Imuta, K., & Hayne, H. (2011). Children’s human figure drawings do not measure intellectual ability. [doi:10.1016/j.jecp.2011.04.013]. Journal of Experimental Child Psychology, 110, 444-452.

[18]. Wood, J. M., Nezworski, M. T., Lilienfeld, S. O., & Garb, H. N. (2003). What's wrong with the Rorschach?: Science confronts the controversial inkblot test. San Francisco, CA, US: Jossey-Bass.
