Saturday, April 30, 2011

360 degrees judgments as criteria of a personality test

In a current project we tested 166 middle level managers in health care with the UPP test (Sjöberg, 2010). 360 degrees dimensions (Arvonen, 2002) were related to the dimensions derived from the test. Results:

Correlations between 360-degrees judgments of managers and matching UPP test variables (N=166), corrected for measurement errors in the criteria and range restriction due to indirect selection

                      360 degrees judgment dimensions
Judges                Relations   Structuring   Change
Superior manager      0.66        0.21          0.39
Colleagues            0.65        0.26          0.38
Direct reports        0.66        0.40          0.51
Self judgment         0.52        0.37          0.64

The test had a very high level of validity in these data, both in relation to the external criteria provided by independent judges of three types and levels, and in relation to self judgments. These values are somewhat unique for a personality test, but see also Hogan and Holland (2003) about the need to match predictors and criteria content-wise, an example of an old principle in psychological research (Sjöberg, 1980).

It is interesting to note that "Structuring" is not quite as strongly related to personality as the other two dimensions, which may be because that dimension reflects more of "can do" aspects than "will do". It should therefore be related more strongly to ability. Note that personality as measured by the UPP test is unrelated to ability, implying that a combination of personality and ability should be ideal for prediction.
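The point about combining unrelated predictors can be illustrated with the standard two-predictor formula for the multiple correlation. This is only a sketch: the validity values are hypothetical and not taken from the UPP data, and the function name is made up for the example.

```python
import math

def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each predictor with the criterion.
    r12:    correlation between the two predictors.
    """
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)

# Hypothetical validities: personality 0.30, ability 0.40.
uncorrelated = multiple_r(0.30, 0.40, 0.0)  # predictors unrelated
overlapping = multiple_r(0.30, 0.40, 0.5)   # predictors share variance
print(round(uncorrelated, 3), round(overlapping, 3))
```

With unrelated predictors the squared validities simply add, so each measure contributes its full share to the prediction; with these particular values, overlap between the predictors lowers the combined validity.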

Note that the correlations have NOT been increased by applying an exploratory multiple regression model, a common enough trick. 

A state-of-the-art article on correcting for measurement error and range restriction is Hunter, Schmidt & Le (2006).
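For readers unfamiliar with such corrections, here is a minimal sketch of the two textbook building blocks: the classical correction for unreliability in the criterion, and the Thorndike Case II correction for direct range restriction. Note that this is NOT the indirect range restriction procedure of Hunter, Schmidt & Le (2006) that was applied to the table above; it is a simpler stand-in, and all input values are hypothetical.

```python
import math

def correct_for_criterion_unreliability(r_obs, criterion_reliability):
    """Classical disattenuation: divide the observed correlation by the
    square root of the criterion reliability."""
    return r_obs / math.sqrt(criterion_reliability)

def correct_for_direct_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction.

    sd_ratio: predictor SD in the selected group divided by the SD in the
              applicant group (1.0 means no restriction).
    """
    big_u = 1.0 / sd_ratio
    return (r_restricted * big_u) / math.sqrt(
        1.0 + r_restricted**2 * (big_u**2 - 1.0)
    )

# Hypothetical inputs: observed r = 0.30, criterion reliability = 0.64,
# and a selected-group SD that is 80 % of the applicant-group SD.
r_step1 = correct_for_criterion_unreliability(0.30, 0.64)
r_step2 = correct_for_direct_range_restriction(r_step1, 0.80)
print(round(r_step1, 3), round(r_step2, 3))
```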


Arvonen, J. (2002). Change, production and employees. An integrated model of leadership. Stockholm: Department of Psychology, University of Stockholm.

Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job-performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88(1), 100-112.

Hunter, J. E., Schmidt, F. L., & Le, H. (2006). Implications of direct and indirect range restriction for meta-analysis methods and findings. Journal of Applied Psychology, 91(3), 594-612. doi:10.1037/0021-9010.91.3.594

Sjöberg, L. (1980). Similarity and correlation. In E.-D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 70-87). Bern: Huber.

Sjöberg, L. (2010). A third generation personality test. Stockholm: Psykologisk Metod AB.

Friday, April 22, 2011

SIOP 2011 comments: Soldier recruitment

A major project in the US Army involves a new personality test, Tapas. This is a Big Five test with a combination of ipsative and normative formats which should make it less vulnerable to faking, although this has yet to be proven. A very interesting finding was that mental ability, or g, was an important predictor for "can do" criteria, while personality was important for "will do" criteria (around 0.2, probably not corrected for measurement error and restriction of range). The debate on how to weight g and personality must clearly take into account what is to be predicted. It is striking how much more important personality, even constrained to the ineffective Big Five framework, is with regard to "will do" criteria. Just what personality dimensions are important is a matter of concern. Traditional military psychology, based as it was on WW II experience, pointed to emotional stability when combat effectiveness was the criterion and was studied in real-world applications (war). Current work emphasizes conscientiousness, as it is a dominating dimension in civilian and peacetime, and perhaps even peaceful, settings. Will conscientiousness really help in high-stakes and threatening situations?

The Swedish Government has recently decided to create a professional army, where soldiers will get a small but decent salary (SEK 16 500 per month). So far, the program is hugely popular, with some 22 000 applicants for about 2000 openings. This means a selection ratio of just under 10 %, which should make screening by testing very feasible and effective. Values, held to be very important and measured in another US Army project, can be measured by the proxy dimension of emotional intelligence (EI) (self report). We have a wealth of data showing this. The second Army project uses another new personality test, GAT (see earlier blog entry), but it seems to lack a relationship to Tapas. No such studies were mentioned (and nobody asked). The GAT test is, by the way, kept secret, and item formats and content are not disclosed; measuring such things as spiritual values and justice seems to be a real challenge.
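The selection-ratio argument for the Swedish recruitment figures can be made concrete with a small sketch. It computes the ratio from the applicant and opening counts quoted above and, under the textbook assumptions of strict top-down selection and a bivariate normal predictor-criterion relationship, the expected standardized criterion score of those selected. The validity of 0.2 is a hypothetical input chosen for illustration, and the function names are invented for this example.

```python
from statistics import NormalDist

def selection_ratio(openings, applicants):
    """Share of applicants who can be accepted."""
    return openings / applicants

def mean_criterion_of_selected(validity, ratio):
    """Expected criterion z-score of those selected top-down on the test.

    Uses validity * (normal density at the cutoff) / selection ratio,
    which assumes bivariate normality and strict top-down selection.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - ratio)
    return validity * std_normal.pdf(cutoff) / ratio

ratio = selection_ratio(2000, 22000)
gain_low_ratio = mean_criterion_of_selected(0.2, ratio)  # strict selection
gain_half = mean_criterion_of_selected(0.2, 0.5)         # lenient selection
print(round(ratio, 3), round(gain_low_ratio, 2), round(gain_half, 2))
```

The comparison with a selection ratio of 0.5 shows why a low ratio matters: the same test validity yields a larger expected gain when only a small share of applicants needs to be accepted.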

Check out these for the promise of a proxy measure of values:

Engelberg, E., & Sjöberg, L. (2005). Emotional intelligence and interpersonal skills. In R. D. Roberts & R. Schulze (Eds.), International handbook of emotional intelligence (pp. 289-308). Cambridge MA: Hogrefe.

Engelberg, E., & Sjöberg, L. (2006). Money attitudes and emotional intelligence. Journal of Applied Social Psychology, 36(8), 2027-2047.

Engelberg, E., & Sjöberg, L. (2007). Money obsession, social adjustment, and economic risk perception. Journal of Socio-Economics, 36(5), 689-697.

Alternatively, write an e-mail to get reprints, write to

Saturday, April 16, 2011

Comments on SIOP 2011: Narrow vs broad traits

The tremendous interest in Big Five factors has not resulted in improvements of personality tests in the sense of increased validity. The reason is that the five factors all have no or only very modest validity in relation to job performance. Very extensive research, summarized in dozens of meta analyses, documents this fact. There is now growing consensus about the need to develop and use measures of "narrow traits" in order to increase validity.

Some of these narrow traits are found among the facets of the Big Five, others not. The principles guiding the search for an improved basis of testing are seldom found in theory, but rather in common-sense thinking about what traits one could expect to be of importance with regard to important criteria, such as psychopathy in the case of counter-productive behavior. Other examples are emotional intelligence and affect. The Big Five cannot account for more than a minor share of the variance of any of the narrow traits.

In order to get improved personality tests there is a need for tests which complement the Big Five with a number of narrow trait scales which are focused on job functionality. Some can be found among the facets, others not. There is a need to limit the number of narrow trait scales in order to make test validation and test interpretation practically manageable. The illusory richness of tests having 30-40 subscales creates merely a feeling of understanding the tested person, a feeling which reflects a well-known tendency to over-estimate the value of information simply because there is more of it.

The feeling that the Big Five somehow constitute a final answer to personality and personality testing is fading. The sooner that belief is abandoned, the better it is.

Comments on SIOP 2011: Faking on personality tests

The issue of faking is alive and well. Several sessions at the 2011 SIOP are devoted to it. Nobody, or very few, deny that faking occurs and that it can affect the outcome of a test, sometimes severely so. It is also realized that faking greatly hurts the credibility of personality testing. Non-expert test users are simply convinced that test takers often fake good in a high-stakes situation, such as when they apply for a very desirable job or admittance to a prestige school.

It is clear that faking reduces the validity of personality tests, if left uncorrected. The effect can be very substantial. Meta analyses of the validity of personality tests tend to be based on data from incumbents, since job performance (criterion) data cannot normally be obtained from all applicants, and applicant scores are only correlated about 0.5 with incumbent scores. Hence, faking makes the data used in meta analyses of doubtful relevance to the question of test validity.

The most important of the Big Five factors, conscientiousness, is the one most affected by faking. It is also clear that the group of fakers, while heterogeneous, may contain some people who are risky to hire. Ignoring faking comes with great risks for the test users.

What can be done? A powerful alternative is to measure social desirability (SD) and use an SD scale to correct other scales for faking, to the extent that they correlate with SD (not all scales do, and correlations vary strongly in the typical case). The procedure has been validated both in experimental and field work.
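One simple way to carry out such a correction is to remove from each scale the part that is linearly predictable from the SD scale. The sketch below does this with an ordinary least-squares slope; it is an assumption about how the correction could be implemented, not a description of any particular test's scoring, and the scores are made up.

```python
def adjust_for_sd(scale_scores, sd_scores):
    """Return scale scores with the linear effect of SD removed.

    The adjusted scores keep the original mean but are uncorrelated
    with the SD scale. A scale unrelated to SD is left unchanged.
    """
    n = len(scale_scores)
    mean_scale = sum(scale_scores) / n
    mean_sd = sum(sd_scores) / n
    sum_sq_sd = sum((s - mean_sd) ** 2 for s in sd_scores)
    sum_cross = sum(
        (s - mean_sd) * (y - mean_scale)
        for s, y in zip(sd_scores, scale_scores)
    )
    # Regression slope of the scale on SD; zero if SD does not vary.
    slope = sum_cross / sum_sq_sd if sum_sq_sd else 0.0
    return [y - slope * (s - mean_sd) for s, y in zip(sd_scores, scale_scores)]

# Made-up scores for five test takers.
sd_scale = [1, 2, 3, 4, 5]
conscientiousness = [2, 4, 5, 4, 5]
print(adjust_for_sd(conscientiousness, sd_scale))
```

Test takers with high SD scores are adjusted downwards and those with low SD scores upwards, in proportion to how strongly the scale in question is related to SD.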

There are a few objections, however.

1. SD scales are said to measure "personality". It is somewhat unclear what this means and why it is an argument. SD scales do correlate with many other dimensions and they also have a certain amount of consistency over time and situations. So what? They can still measure faking at any given time.

2. There are several SD scales and they do not measure the same thing. The best known scales do have high intercorrelations, however.

3. You cannot detect who is a faker. Well, you can to some extent, albeit not perfectly, but who said that psychometrics ever comes up with perfect solutions?

4. Some people fake bad. This can be detected, but is not a major problem. Few people fake bad in high-stakes situations where they have applied for a desirable job.

Some commercial test suppliers and their agents try to solve the problem of faking by denying that it exists. This is not a credible statement. Since the future of personality testing is probably dependent on there being a solution to the faking problem - why not use the solution described here? It works.

Friday, April 15, 2011

Comments on SIOP 2011: Emotional intelligence

Yesterday's panel discussion of EI attracted a huge crowd, which certainly confirmed the statement of one of the panelists, that interest in EI is strong and steadily increasing. The panelists, all well-known researchers in the field, gave a good overview of the field and agreed that the term should only be used for performance-based measures, not for so-called mixed models (read: the Bar-On test). However, the major test of performance EI is the MSCEIT, which has NO incremental validity in accounting for job performance, while mixed-model approaches do have such utility. So, why bother about performance EI, for other than theoretical purposes? It can be said that "we do not know what mixed-model tests of EI measure". This is true for such tests as the Bar-On (often used in Sweden), which indeed seem not to measure anything beyond the Big Five, or very little.

Tests can be devised which measure self-assessed EI, and they do make important contributions, admittedly for unclear reasons. Here is a promising topic for research. Meanwhile, they can be used in practical work. Worry about "faking" need not be a concern since scores can be adjusted for this bias factor with the help of one or more measures of social desirability - such adjustment is clearly necessary, by the way.

It is also interesting that performance and self-assessed EI have similar correlates, such as age and gender, so there is SOME evidence for a relationship, in spite of the very low within-group correlation.

In addition, work in my group has shown clear relationships between self-report EI and values: people with low EI in this sense tend to be materialistic, egoistic, and perhaps even manipulative. It is, reverse scored, a "dark side" measure of such attitudes and motivations, and could function as a good proxy measurement of them.

As a final comment, why continue work on performance EI if it offers so little of practical value? Sure, there are logical reasons it should have priority to the term EI, but that seems a remote advantage.