Albert Bandura, rated as the fourth most eminent psychologist of the 20th century (Haggbloom et al., 2002), first criticized my work in 2003 in a paper with Edwin Locke, an equally eminent I-O psychologist. I responded to that critique with my own article-length review of control theory (Vancouver, 2005 [PDF]), which is the theoretical approach I take and the primary target of the Bandura-Locke critique. More recently, Bandura (2012) has mounted a solo offensive in the form of a 36-page guest editorial in the Journal of Management. I responded to the more recent critique with my own editorial response in JOM (Vancouver, 2012), but I was limited to only five pages. Given the constraint, I focused that response on the limitations of verbal theorizing, as opposed to formal (e.g., computational) theorizing, and on the rhetorical fallacies in Bandura's editorial. On this web page, I address the "methodological deficiencies" Bandura claims plague my research studies. These problems are listed in a table in Bandura's editorial, which we have partially reproduced here. We left off the last part of the table, which refers to a paper by a different set of researchers. Otherwise, we reproduced all 20 "deficiencies" claimed regarding my papers. Of these 20, 15 are false or mostly false; the remaining five are true but trivial. To see our response to each deficiency, merely click on the claim in the table below.
Table 1 from Bandura, 2012
Methodological Deficiencies in Tests of Perceptual Control Theory for Negative Self-Efficacy Effects
Vancouver, Thompson, & Williams (2001)
- Deficient assessment of self-efficacy
- Guessing game with random sequential disconnectedness
- Participants discarded because they guessed the correct solution on first trial
- Posited key mediator, "perceived discrepancy," never measured
Vancouver, Thompson, Tischner, & Putka (2002)
- Deficient assessment of self-efficacy
- Guessing game with random sequential disconnectedness
- Deletion of goal comparator rendered control theory untestable
Vancouver, More, & Yoder (2008)
- Deficient assessment of self-efficacy
- Performance tasks with random sequential disconnectedness
- Pseudo "do your best" goal substituted for an explicit goal comparator in test of control theory
- Discrepancy not judgable against an indefinite "do your best" goal comparator
Vancouver & Kendall (2006)
- Deficient assessment of self-efficacy
- Participants discarded for "illogical responses" judging themselves inefficacious for low grades
- Assessment of strength of self-efficacy abandoned
- Posited "perceived preparedness" mediator not measured
- Effort tested by retrospective self-report
- Loss of discriminative information by converting continuous exam scores to categorical grades
- Lack of variance in goals for testing performance effects
- Past performance deleted from reported hierarchical linear modeling (HLM) analysis
- Use of a dated HLM program
Other issues raised by Bandura (in press) deserve comment. These will be listed and addressed soon. In the meantime, below are the responses to the items in the table above.
Deficient Assessment (2001; 2002)
This first issue is best classified as mostly false because all measurement of latent variables is deficient to some extent. However, the reason Bandura asserts that my measures are deficient in the 2001 and 2002 papers has to do with the task in those studies: the Mastermind game. Bandura cannot understand how participants could answer a self-efficacy measure given that each task's performance is based on determining a pattern of colors that changes randomly each game. Of course, the answer is that what is constant from game to game is the means and method for determining the color pattern (i.e., breaking the code). Indeed, the respondents were very good at assessing their capacity at this, as indicated by the strong positive relationship (r = .54) between average performance and average self-efficacy, a relationship exactly in line with Stajkovic and Luthans's (1996) meta-analysis of that relationship. This is strong evidence that the measure is valid and that task performance is somewhat a function of capacity. Return to Table 1
Deficient Assessment (2008)
Bandura never specifies the measurement deficiency in the 2008 study, so I do not know what his problem was there. Return to Table 1
Deficient Assessment (2006)
Bandura had a couple of issues related to the measurement of self-efficacy in Vancouver and Kendall (2006). First, he did not like that we used a single item. I understand this. I would rather have a multi-item scale, though I do not think it is necessary. Actually, we did have a multi-item scale, which we abandoned (i.e., that is one of the true "deficiencies"), but that is the topic of the third item listed in Bandura's table for this study, which is where I talk about that issue. I think the main issue here is that Bandura thinks we asked participants to foretell their grade on the upcoming quiz, which is a forecast and not a measure of self-efficacy (though often a very fine line if one follows Bandura's prescription for constructing a self-efficacy scale). However, we did not ask them to foretell the future, but to estimate their grade if they took the quiz immediately (generally two days before the actual quiz). If we had asked them to foretell, they would have figured in the anticipated change in capacity that the studying would provide. This would not provide an assessment of the effect of current capacity beliefs on the amount of effort that they thought they would need to get the grade they wanted (or would be willing to settle for). So this claim is false. Return to Table 1
Random sequential disconnectedness (2001; 2002; 2008)
The next false deficiency had to do with something Bandura (in press) called random sequential disconnectedness. This was presumably a problem in the Mastermind studies (the 2001 and 2002 papers) and the 2008 study, where we used a task called the hurricane game. He is not clear about exactly what he means by this, but it seems there are three possibilities. First, he may be referring to his belief that the tasks have no skill component (i.e., performance is based on luck). This was one of Bandura and Locke's (2003) main criticisms of the Mastermind game, which I addressed in my 2005 response. The most obvious counter is that we had evidence that people varied in how good they were at the game, which could not happen if it were purely a game of chance. Indeed, the game is called Mastermind because it takes thought to do it well.
The second meaning might be that performances do not build on previous performances, which means we cannot generalize to cases where performances do build on previous performances. Such building on previous performances is not uncommon, but then neither is it the case that performances are independent. It is an empirical question whether the effects generalize to such situations.
The third possibility, closely related to the first two, is that by random sequential disconnectedness Bandura means that one cannot learn how to do the task better. Indeed, Bandura claims that a problem with this random disconnectedness is that nothing is "incrementally learnable" (p. 12). If this means that self-efficacy only operates when something is incrementally learnable, then he is articulating a new boundary condition for the theory. Indeed, regarding the 2008 study Bandura notes that "self-efficacy beliefs cannot function proactively to promote incremental growth" (p. 12). Again, surely he is not saying that self-efficacy's only chance to be functional is to promote incremental growth. What seems more likely is that Bandura is using the red herring fallacy. A red herring is when someone throws in an irrelevant topic to distract from the original issue. Learning and growth are not requirements of self-efficacy's functionality. Learning is one of many aspects of human behavior where self-efficacy can play a role, and certainly self-efficacy beliefs need to be learned (evidenced in all my studies), but it is not relevant to questions of whether self-efficacy can negatively relate to resource allocation during or while planning for task performance. Indeed, to be clear, if Bandura is claiming these boundary conditions, that is his prerogative, but that does not mean someone else (like me) cannot suggest self-efficacy has other effects in other kinds of contexts. No, this is just a red herring. Oh yeah, and subsequent research has shown that learning does remove the negative effect unless one does not do the analysis correctly (Yeo & Neal, 2006, who show how to do the analysis correctly and what happens if one does not). Indeed, I am sure that the Mastermind game and the hurricane game have learnable elements, but it is just not the case that we would see much evidence of that in the relatively short study sessions used. Return to Table 1
Discarded Participants (2001; 2002)
We did not discard any participants. We discarded a game if the solution was found on the first attempt because that would be performance based on pure luck. Indeed, this "deficiency" appears to be used as evidence of the random, guessing issue rather than as some independent issue. However, as noted in my 2005 response, there is an element of luck in the Mastermind game, but once an individual makes a guess there is information available that can be used to inform subsequent tries. Performance was largely based on how well one used this information, a point we demonstrated clearly in the 2002 paper. Return to Table 1
Mediators (2001; 2006)
A set of false deficiencies relate to mediators we failed to measure. I debated calling these true "deficiencies" because it is true that we did not directly measure all the processes and intermediate results we talked about. Few studies that attempt to account for the deeper workings of the mind do. However, I decided "false" seemed a more accurate classification given the specifics provided. For example, regarding the 2001 paper Bandura (in press) claims we said perceived discrepancies were a key mediator. We said discrepancies were, not perceived discrepancies. Had it been perceived discrepancies, a stronger case could be made for trying to assess these. However, control theory does not expect the discrepancies to be perceived. Indeed, much of the operation of feedback loops occurs subconsciously. This means that asking about them is likely not to be fruitful. I am not saying it could never be done. Meanwhile, regarding the 2006 paper Bandura claims that we did not measure "perceived preparedness," but in fact we did. Perceived preparedness and belief in capacity are the same thing. Thus, when we asked students to describe the grade they thought they would get on the exam if they took it at that time (two days prior to the exam time), we were measuring their perception of preparedness/capacity at that instant. Return to Table 1
Deletion of goal comparator (2002)
The next deficiency in Bandura's table refers to a similar issue, the "deletion" of the goal comparator in the 2002 studies. Indeed, this complaint was described in a set of paragraphs claiming that goals were a problem for all my studies. To give some perspective, goals are central to both social cognitive and control theory conceptualizations of self-regulation (Austin & Vancouver, 1996). Moreover, both theories predict that self-efficacy would positively relate to accepting and staying committed to difficult goals as well as the level of self-set goals one is likely to adopt (Bandura, 1986; Powers, 1991; Vancouver, 2005). Most importantly, both theories predict that performance would be positively related to goal level (or goal difficulty). Yet, in the first study reported in the 2001 paper we observed a negative relationship between goal level and performance similar to the negative relationship between self-efficacy and performance. For Bandura, this was a sign that the task was flawed. This bothered us as well, but we suspected the culprit was that our measure of goal was picking up changes in predicted performance as opposed to goal change. That is, individuals did not want to look silly claiming a goal level that their performance level did not seem to warrant. To determine whether the fluctuation in self-reported goal and the corresponding negative goal-to-performance relationship were valid or spurious, we manipulated goal level in the second study reported in the 2001 paper. Consistent with self-regulation theories, we found a positive effect for goal level on performance and we found, as I have argued all along, that self-efficacy predicted acceptance of the difficult goal. We did find the same negative relationship between our self-reported goal measure and performance, further indicating that this covariance was likely spurious. 
The result of this analysis and interpretation was to drop the self-report measure of goal level in the second set of studies (i.e., the two reported in the 2002 paper). Bandura apparently interpreted this lack of a goal measure as the deletion of the goal comparator process. If choosing to measure something or not affects the processes participants use in a study, one likely has a serious problem (e.g., demand characteristics might be playing a role). The fact that our main findings did not change in the studies reported in the 2002 paper indicates that we did not have that problem. I am not sure what Bandura's problem is.
One issue regarding the Mastermind studies and goals needs to be clear, however. The goal, perception, and comparator most likely relevant to determining performance relate to the sub-process of "thinking through" the information provided by the computer while playing the game. Nearly every self-regulation theory, including control theory and social cognitive theory, describes a hierarchy of goals. That is, goals are accomplished via subgoals and are evoked because of superordinate goals (Austin & Vancouver, 1996). In the Mastermind game, a subgoal we presumed is evoked as part of the process needed to achieve the game performance goal is to consider the information available from previous moves and use that information to determine and evaluate one's next move. It is this perception of considering the information that we felt was likely biased by self-efficacy beliefs. This goal comparator process was never removed, but it is difficult to measure directly. The best we could do was measure the results of insufficient thinking through (i.e., the number of logical errors made given the feedback), which we did in the second study in the 2002 paper. Return to Table 1
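The comparator process described above can be sketched in a few lines of code. This is a generic illustration of a negative feedback loop, not a model from any of the papers; the function name, the gain value, and the numbers are all invented for the example.

```python
# A generic negative-feedback (control) loop: a comparator subtracts the
# perceived state from the goal (the reference), and the resulting
# discrepancy drives output that reduces the discrepancy. All names and
# parameter values here are illustrative, not taken from the studies.

def control_loop(goal, state, gain=0.5, steps=20):
    """Iterate a simple feedback loop and return the state trajectory."""
    history = []
    for _ in range(steps):
        perception = state                   # the perceived current state
        discrepancy = goal - perception      # the comparator's output
        state = state + gain * discrepancy   # output acts to reduce the discrepancy
        history.append(state)
    return history

trajectory = control_loop(goal=10.0, state=0.0)
print(round(trajectory[-1], 4))  # the state has converged toward the goal of 10
```

The point of the sketch is only that a comparator is a process, not a measured variable: what an observer sees is the trajectory of behavior, while the goal, perception, and discrepancy operate inside the loop.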
Do Your Best Goal (2008)
With regard to the 2008 study, where participants' task was to nail (i.e., click on) boards (i.e., squares) moving randomly about the computer screen, Bandura (in press) complained that the participants had a weak, do-your-best goal, which goal-setting theory research shows does not work well. Although it is true that we assigned a non-specific do-your-best goal for the overall task, we assigned a clear, specific goal for each instance of the task (i.e., nail the board). Given the within-person design we used (i.e., examining what happens across instances of the task for each participant), this specific goal is the relevant goal. It appears that Bandura is thinking in terms of between-person designs, where the goal for the whole session is most relevant. That said, many a study of self-efficacy has occurred without the assignment of some goal, specific or not, by the researcher. This might add some noise, but it does not invalidate these studies. Likewise, control theory studies can be conducted without assigning specific goals. Indeed, figuring out what goals are being sought is often a purpose of control theory research. To be clear though, that was not the case in the 2008 study; the goal was clear. Thus the last two "deficiencies" listed for the 2008 study are false. Return to Table 1
Deleted Participants (2006)
No participants were deleted from the study for illogical responses. But see the next issue regarding the strength measure of self-efficacy. Return to Table 1
Abandoned Strength of Self-Efficacy Measure
It is true that we "abandoned" the strength measure of self-efficacy in the 2006 study. Strength measures ask respondents to indicate the confidence (from 0 to 100 percent) they have in their capacity to reach various performance levels, which in this case were grades. In contrast, magnitude (or level) measures ask participants to indicate what they can do (yes or no) across the performance levels. Bandura (1986) originally argued that one should count the number of levels the respondent answered yes to on the magnitude measure to get a magnitude score, and then sum the confidence ratings on the strength measure items for which "yes" was endorsed on the magnitude scale to obtain a strength measure. More recently he has advocated averaging all the items across both scales to obtain a self-efficacy strength measure (Bandura, 1997). Note that adding, averaging, counting the number of yes's, or simply recording the highest level endorsed on the magnitude scale all get one the same thing. This is easiest to see if the coding used is 1 for yes, 0 for no. Moreover, simply asking one to indicate the highest level one can obtain also gets the same result, but using only one item. That last approach is what we chose to use. It is the most efficient approach. Meanwhile, the strength measure is odd for a number of reasons. First, strength of belief should refer to one's confidence in the belief, not the confidence one has in reaching each level of performance. That is, strength of belief refers to the degree to which one holds a belief, not the level of that belief. Moreover, strength and magnitude should be largely independent. For example, I am highly confident in my belief that I would perform terribly in the MBA (or the NFL for that matter). Strength should predict how well the belief stands up to disconfirming information, not the level of the belief.
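The equivalence among those magnitude scoring schemes (counting the yes's, summing or averaging 1/0-coded items, or recording the highest level endorsed) is easy to verify. Here is a hypothetical sketch, assuming the usual cumulative response pattern in which a respondent endorses every level up to their maximum; the function and data are invented for illustration.

```python
# Hypothetical magnitude-scale responses (1 = yes, 0 = no), ordered from the
# easiest performance level to the hardest. Assumes the typical cumulative
# pattern: every level up to one's maximum is endorsed.

def magnitude_scores(responses):
    """Return three scorings of the same yes/no magnitude scale."""
    count_of_yes = sum(responses)              # count the number of yes's
    average = sum(responses) / len(responses)  # average the 1/0 items
    highest = max((i + 1 for i, r in enumerate(responses) if r == 1), default=0)
    return count_of_yes, average, highest

# A respondent endorsing the five easiest of 11 levels:
resp = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
count_of_yes, average, highest = magnitude_scores(resp)
print(count_of_yes, highest)   # 5 5 (the count equals the highest level endorsed)
print(round(average, 3))       # 0.455 (the average is just the count divided by 11)
```

Under this pattern, a single item asking for the highest attainable level recovers the same score as any of the multi-item scorings, which is the logic behind the one-item approach.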
That said, Bandura's suggestions regarding how to measure strength have always confounded it with magnitude/level to the point where strength is not measured at all; only magnitude is. This is ironic because in his diatribe (Bandura, in press) he accuses Maurer and Andrews (2000) of creating a bipolar scale and then coding it in a way that yields a single-dimension score. It is true that their scale is bipolar, but it measures a single dimension. Indeed, they were advocating a magnitude measure using sound psychometric principles. Moreover, I believe that Bandura's magnitude and strength measures are both measuring a single dimension: magnitude. That is, strength is not measured.
So what does this have to do with me and the 2006 study? Well, it turns out, as Maurer and Andrews (2000) also found, that respondents can get confused by strength measures. Specifically, when asked how confident they are that they can reach some low level on the scale, they will mark it low because they are confident they will reach much higher on the scale. That is, some (not many) appear to think the question is asking them the probability of a particular outcome, not their ability to perform at or above that level of outcome. Now when the scale is averaged across all the items, which makes sense if individuals are thinking "at or above" (i.e., it essentially represents the "area under the curve"), the lower confidence ratings for lower levels of performance add error to the measure. We have found this happens only in certain kinds of contexts, like the Vancouver and Kendall (2006) one. So, we dropped the measure. We did not, as erroneously indicated by Bandura (in press; Table 1), drop any participants. Return to Table 1
Retrospective Assessment of Effort (2006)
It is true that we measured effort retrospectively via self-report. Of course, this was a supplement to our planned study time measure. Specifically, effort was operationalized as the amount of time spent studying for the exam. We were more interested in the planned time to study because of our interest in the planning context, but we (and our reviewers) wanted to assess whether the plans had downstream effects as well (e.g., related to actual time studying and performance). We asked them to prospectively report the amount of time they intended to study. To get at actual time studying, we asked retrospectively how much time they actually studied. Bandura (in press) thought we should record study time "antecedently." I have to admit I do not know what that means, but I would certainly agree that self-reports of either planned or past studying are proxies for actual planned and actual studying times. I am not sure what the less-than-perfect measurement does to the findings besides weaken them. Bandura did not make any arguments regarding how the measurement method might have biased the findings in any substantive way. But we did not hide our methods, and others are free to question our interpretations as they wish.
Continuous to Categorical (2006)
Another true "deficiency" is that we translated continuous exam scores into categorical grades. Well, actually, that is only partly true. The exam scores were translated into 11 grade levels (letter grades with pluses and minuses), which are essentially interval scale, not categorical. What might bother Bandura is that some information was lost in that translation. However, that loss of information could only have weakened our findings, not created a spurious effect. Indeed, it is knowable how much the finding was weakened. Specifically, to correct for the bias we could simply multiply the effect sizes we found by 1.012. Given that, it hardly seems worth reading this paragraph. Return to Table 1
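The point that binning into 11 levels loses very little information can be illustrated with a small simulation. This uses synthetic data, not the 2006 study data, and all the parameter values are invented; it merely shows the direction and rough size of the attenuation.

```python
# A minimal simulation (synthetic data, not the 2006 study) showing that
# mapping a continuous exam score onto 11 grade levels only slightly
# attenuates its correlation with another variable.
import random

random.seed(1)
n = 5000
predictor = [random.gauss(0, 1) for _ in range(n)]
exam = [0.4 * p + random.gauss(0, 1) for p in predictor]  # continuous scores

def to_grade(score, lo=-3.0, hi=3.0, levels=11):
    """Clamp and bin a continuous score into one of 11 grade levels."""
    step = (hi - lo) / levels
    return max(0, min(levels - 1, int((score - lo) / step)))

grades = [to_grade(s) for s in exam]

def corr(a, b):
    """Pearson correlation computed from scratch."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

r_continuous = corr(predictor, exam)
r_graded = corr(predictor, grades)
print(round(r_continuous, 3), round(r_graded, 3))  # nearly identical values
```

With 11 levels the two correlations differ by a percent or so, which is consistent in spirit with a correction factor as small as 1.012.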
Lack of Variance in Goals (2006)
This "deficiency" is also true. That is, we had very little variance on goals. Not sure what he expected us to do or have done about that. It is what it was. You measure your variables and report your findings. Not sure what the methodological deficiency was. Return to Table 1
Past Performance Deleted from HLM Analysis (2006)
This complaint seems to stem from an analysis Bandura had done on the Vancouver and Kendall data. Apparently, they found that past performance was significantly negatively related to subsequent performance, though I am not sure, given that neither the effect size nor the probability of a Type II error was reported. Assuming the above though, Bandura seems to think this is remarkable because (a) he cannot understand how it could be and (b) I always complain about the lack of past performance as a control in other people's studies. Let me begin with the second issue.
I frequently point out in my studies that when researchers test the relationship between self-efficacy and performance using between-person, passive observational designs (i.e., no experimental manipulation), one cannot tell if self-efficacy is causing performance, performance is causing self-efficacy, or both. In particular, it seems reasonable, as Bandura often points out, that one develops one's sense of capacity (i.e., one's self-efficacy belief) regarding a specific task by observing performance on that task. That is, if one does well, self-efficacy likely goes up; but if one does poorly, self-efficacy likely goes down. Indeed, I confirm this prediction in nearly all my studies of self-efficacy, including the 2006 study. However, this would only be a problem if there were systematic differences in the levels of self-efficacy around which each person's belief varied. Of course, there are likely to be systematic differences due to the systematic differences in performance that occur across individuals. These systematic differences in performance might be because of systematic differences in self-efficacy beliefs, but they also might be, indeed seem much more likely to be, because of systematic differences in actual capacity. That is, some people are better at the task than other people. My concern with passive observational, between-person studies is that this third variable, actual capacity, determines performance and performance determines self-efficacy. If that is the case, then controlling for actual capacity would leave no covariance between self-efficacy and performance. Because actual capacity is a difficult thing to measure, most use a proxy: performance. Thus, some statistically control for performance, measured at a previous time, when examining the relationship between self-efficacy and performance, measured at a subsequent time.
When they do that, the relationship between self-efficacy and performance is typically negligible (Heggestad & Kanfer, 2005).
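The third-variable argument above can be sketched with a toy simulation. All the structural assumptions (effect sizes, noise levels, variable names) are invented for illustration: capacity drives performance at both time points, and self-efficacy merely tracks past performance.

```python
# A toy simulation (synthetic data) of the third-variable account: actual
# capacity drives performance at both time points, and self-efficacy simply
# tracks past performance. Self-efficacy then correlates with later
# performance in a between-person analysis, but partialling out past
# performance removes that relationship. All parameter values are illustrative.
import random

random.seed(2)
n = 4000
capacity = [random.gauss(0, 1) for _ in range(n)]
past_perf = [c + random.gauss(0, 0.5) for c in capacity]
self_eff = [p + random.gauss(0, 0.5) for p in past_perf]   # belief formed from past performance
next_perf = [c + random.gauss(0, 0.5) for c in capacity]   # capacity alone drives performance

def corr(a, b):
    """Pearson correlation computed from scratch."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def residuals(y, x):
    """Residuals of y after a simple linear regression on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]

zero_order = corr(self_eff, next_perf)   # a sizable positive correlation
partial = corr(residuals(self_eff, past_perf), residuals(next_perf, past_perf))
print(round(zero_order, 2), round(partial, 2))  # the partial correlation is near zero
```

Note that in this setup self-efficacy has no causal effect on later performance at all, yet its zero-order correlation with performance is substantial; controlling for past performance (the capacity proxy) drives the relationship to roughly zero, exactly the pattern the paragraph describes.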
Perhaps not surprisingly, Bandura has a problem with controlling for past performance. He thinks that self-efficacy is essentially the third variable responsible for a person's past performance level. Thus, when one controls for past performance, one is removing self-efficacy covariance from the equation, which is why self-efficacy is not related to subsequent performance when past performance is controlled. For me, when I think about whether actual capacity or belief in capacity is responsible for performance, actual capacity wins, but Bandura's argument is not fallacious. That is, he could be correct, and if so, the controlling-for-past-performance method is problematic for the reason he gives. The solution to this problem for Bandura was a statistical one (see Wood & Bandura, 1989), but that procedure was also predicated on the assumption that Bandura was correct (we talk about this in the 2001 paper). It could not be used to test whether he was correct or not. That is why I addressed the problem with a design change: study behavior within an individual across repeated performances. If one does that, one can confirm that capacity does not change; or if it does, one can control for it using trend rather than past performance. In my studies, there is little evidence that actual capacity changes (this does not mean learning is not taking place: individuals are likely learning their capacity, or, in the case of the 2006 study, they are learning the course material, but they are not changing their capacity to learn the material). Bottom line, there is generally no reason to statistically control for past performance (i.e., by including it in the model) when using a within-person design (though there are some exceptions to this).
The above still does not explain why past performance would be negatively related to subsequent performance. One possible answer is that past performance is positively affecting self-efficacy, which is negatively affecting subsequent performance. Given that self-efficacy's effect on subsequent performance is not very strong, I would not expect the indirect effect to be very strong. Bandura did not tell us what the effect was. I do not remember if I even looked at the relationship. Indeed, I was not planning to look at subsequent performance at all, but that is discussed below in the next and final issue. Return to Table 1
Dated HLM Program (2006)
The last "deficiency" Bandura mentions in his table is that we used a dated HLM program. Well, again, this is only partly true. At the time we used it, it was up-to-date. Time has passed and now it is out-of-date. So "deficiency" seems harsh. But the issue here is that Bandura claims that if one used a more up-to-date program, the negative relationship between self-efficacy and subsequent performance would become non-significant. He notes that Stajkovic did that analysis for him. Well, I redid it as well, and they are correct. The relationship is no longer significant at the .05 level. The new p-value was .051 (the original was .046). That is, I found that the barely significant negative effect is now barely non-significant when using a presumably better software program. The effect size estimate did not change.
So what does this mean? First, this is an object lesson in the silliness of hypothesis testing (see Hunter and Schmidt, 2004, for a good discussion of this). For instance, if the arbitrarily conventional alpha level were a little smaller, it would not have been significant in the first place; a little larger and it still would have been significant. It seems odd that Stajkovic, who is a meta-analyst, would get caught up in such silliness. But it gets worse. Bandura mentions that 25 models were run on our data and that none found the significant negative effect of self-efficacy on subsequent performance. That sounds pretty bad. It sounds like they really gave it every chance to appear and it did not. However, what he also says is that they repeated our 25 models. If you look at our paper, only two of those 25 models included self-efficacy as a predictor and performance as a criterion (one with goals controlled and one without, the latter of which was reported as non-significant originally). We were not trying to obfuscate anything here. I cannot speak to Bandura's intentions.
A semi-interesting aspect of this final point is that I never really expected a negative performance effect in this study. The data was likely to be too messy for what is a weak effect, as I have always argued the negative effect of self-efficacy on performance would be. Specifically, for a negative effect to occur, the students would have to be pretty badly miscalibrated and rely heavily on this miscalibrated belief to determine their study time. Because we were not trying to actively miscalibrate them (like a direct intervention might), it seemed any natural effect would be hard to detect. We did not even look at the effect of self-efficacy on subsequent performance until a reviewer asked for it. Our main purpose for conducting the study was to examine the effect on effort allocation (e.g., planned study time). That effect remained negative even using the latest software (as of this writing). Return to Table 1