Of Babies and Bathwater
All that said, while I do very much like SI (And how many men aren't referring to Sports Illustrated when they use that abbreviation?) I don't always agree with or approve of everything they publish. In the most recent issue, for example, there is an article by Paul Kurtz arguing that science can be used to help us make ethical judgements. Having read it, I can't claim to agree with his points, though I do think he makes some excellent ones. He also provides a quote that I rather approve of:
We might live in a better world if inquiry were to replace faith; deliberation, passionate commitment; and education and persuasion, force and war.
I mean, seriously, how awesome is that? It almost compares to my favorite quote of all time from Erasmus that goes:
All misery and injustice will disappear, if only reason can penetrate ignorance, superstition, and hate.
But, again, the point of this post isn't to chat about quotes that can be used to support a humanistic outlook (Erasmus, of course, was Christian, but that also isn't the point). I'm really just saying that I don't always agree with the things that SI prints. It's one such instance of disagreement that will be our topic today.
In the July 2004 issue of the Skeptical Inquirer there was an article titled "Capital Punishment and Homicide: Sociological Realities and Econometric Illusions" by Ted Goertzel, a professor of sociology at Rutgers University. This article, as you might guess from the title, contrasts sociological work on the death penalty in the United States with the work of economists on the same subject. Needless to say, I was thrilled to see a sociologist taking the economists to task, especially given their propensity to... um... borrow things from us.
On reading further, however, it rapidly becomes clear that Dr. Goertzel is not waging the battle of sociology against economics; instead he appears to be interested in a methodological battle, implicitly against some of his fellow sociologists, and he seems intent on pursuing it in the strongest terms. And when I say "the strongest terms," I mean it:
In fact, the comparative method has produced valid, useful, and consistent findings, while econometrics has failed in this and every similar area of research.
But what does this "econometrics" mean? Is he referring to a rational-choice based economic model of human behavior? At first I thought so, but it turns out his definition is much broader than that. Specifically, his view of econometrics includes virtually the entirety of quantitative social science:
...econometrics, also known as multiple regression modeling, structural equation modeling, or path analysis. This involves constructing complex mathematical models on the assumption that the models mirror what happens in the real world.
Thus, as you can see, Dr. Goertzel is taking to task not so much economists as everyone who employs the general linear model in social science. Moreover, while I would argue that his definition of multiple regression is correct, for the most part, it displays a disturbing tendency to gloss over certain points. Specifically, in the above, it is not assumed that the models mirror what happens in the real world; the idea is to see if such a model can be constructed, and if so, how that may be done. We have fit statistics for a reason, after all.
As a contrast to this "econometric" modelling, Dr. Goertzel advocates what he refers to as "comparative" research. Now, comparative is, itself, a pretty vague label. In sociology, this covers any research that looks at two or more entities in relation to each other, such as an examination of the effect of welfare policies on several different national economies. The term "comparative" has also been used to refer to the "Qualitative Comparative Analysis" methodology developed by Charles Ragin of the University of Arizona. So, there is some question about what exactly Goertzel means by "comparative" research. His explanation of his meaning is unclear, particularly to a social scientist, but is probably summed up best as follows:
Once the statistical data are collected, the analysis consists largely in displaying them in tables, graphs, and charts which are then interpreted in light of qualitative knowledge of the states in question. This research can be understood by people with only modest statistical background. This allows consumers of the research to make their own interpretations, drawing on their qualitative knowledge of the states in question.
Thus, in Goertzel's view, the preferred way to interpret comparative statistical data is by displaying it graphically and then, essentially, eyeballing it for results. The actual challenge appears to be not so much analyzing the data, as finding a way to display it that allows a relevant pattern to become apparent. Those of you who know me realize that I'm chomping at the bit here, since I don't place a great deal of faith in eyeball judgements when it comes to science. Indeed, I don't know why anyone does, since it is well known that graphs can be quite misleading. Graphs can, unfortunately, become very subtle Rorschach tests allowing the viewer to project whatever interpretations they please onto them. So, I can't say that I'm all that taken by Goertzel's "organize and eyeball" approach.
Now, some bright young qualitative person out there is doubtless going to observe that because regression models can be constructed in a variety of ways, the same potential for infinite interpretation can be had in them as well. Goertzel as much as argues this himself when he says:
There are many ways to adjust things statistically, and the answer will depend on which one is chosen. We also know that of the many possible ways to specify a regression model, each researcher is likely to prefer one that will give results consistent with his or her predispositions.
The difference, however, is that statistical approaches to data analysis allow us to quantify how useful our answers are. This is no small point: when I lecture my students on statistics (And lord knows if you never touch on statistics in a Soc 101 course, you're doing something wrong) I always explain that statistics is a system for guessing that provides two pieces of information. The first piece of information answers the question, "What is the effect, if any?" This would be the answer to a question like, "Does the death penalty reduce crime?" The second piece of information addresses the question, "How sure are we that our answer is correct?" While some might argue that the first question can be answered with graphs, I have yet to see a graph that can answer the second to my satisfaction. Further, by providing specific figures on the strengths of each relationship, and of the model as a whole, the researcher gains greater insight into exactly what is going on, rather than simply taking the entire situation as an analytical "black box." Certainly graphs have their place, but statistics offer something that graphs frequently cannot: precision.
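To make the "two pieces of information" concrete, here is a minimal sketch in Python of a one-predictor regression that reports both the estimated effect (the slope) and a measure of how sure we are (its standard error). The five data points are invented purely for illustration and have nothing to do with death penalty data; the function name is my own.

```python
import math

def slope_with_standard_error(x, y):
    """Ordinary least squares for y = a + b*x.

    Returns (slope, standard_error_of_slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: what the line fails to explain.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, standard_error

# Invented toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, se = slope_with_standard_error(x, y)
print(f"effect (slope): {slope:.3f}")
print(f"uncertainty (standard error): {se:.3f}")
```

With these toy numbers the slope comes out at 0.6 with a standard error of about 0.28. With only five cases, a conventional 95% interval around that slope is wide enough to include zero, which is exactly the sort of thing a graph alone will not tell you.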
That being said, Goertzel does make some useful points about regression modeling in the context of the death penalty. His first such point has to do with the appropriateness of regression for analyzing the efficacy of the death penalty:
...this method has consistently failed to offer reliable and valid results in studies of social problems where the data are very limited. Its most successful use is in making predictions in areas where there is a large flow of data for testing.
This is, indeed, correct. Regression analysis is a method that begs for large datasets with many, many cases. In situations with few cases, also referred to as "small-n" situations, regression estimates become imprecise and highly sensitive to individual cases and to model specification, and are thus of little use for understanding the social world. Of course, there are ways to try to correct for these problems, including the adjustment of standard errors, but the fundamental logic of the regression model makes small-n cases problematic. This is, of course, relevant here because the number of executions in the United States, even over long periods, is relatively low. Further, since the rate of executions in most states is low, one must include a very long span of time, and thus a great deal of fluctuation in conditions, in a model in order to incorporate enough executions. Whether I agree with Goertzel's proposed alternative to regression analysis or not (and I don't, as you could already tell) he is correct in questioning the applicability of regression to the death penalty.
Goertzel makes another useful point in terms of the patterning of data for regression analysis. Specifically, even if you have a dataset that includes enough cases and isn't plagued by excessive exogenous variation, you may have a difficult time using regression safely. He writes that:
Statistician Francis Anscombe (1973) demonstrated how bizarre the Flatland assumption [i.e. that two variables may have a consistent relation to each other] can be. He plotted four graphs that have become known as Anscombe’s Quartet. Each of the graphs shows the relationship between two variables. The graphs are very different, but for a resident of Flatland they are all the same. If we approximate them with a straight line (following a “linear regression equation”) the lines are all the same (figure 2). Only the first of Anscombe’s four graphs is a reasonable candidate for a linear regression analysis, because a straight line is a reasonable approximation for the underlying pattern.
Goertzel is correct: statisticians and sociologists have long been aware that data may or may not be amenable to regression modeling. Sometimes the data itself may be patterned in such a way as to violate the assumptions of the regression model, systematically altering estimates for the worse. As the author claims, this does seem, suggestively, to be the case when we examine the actual patterning of data for the death penalty. Clearly, the presence of extreme outliers may compromise the accuracy of our models. However, this is only an argument against the use of regression in cases when the data clearly violate regression assumptions, and more specifically an argument against using regression to model the effect of the death penalty. This is not a broadly effective critique of regression, since it essentially boils down to an advisory that you should use the right tool for the right job. Just as you would not try to turn a screw with a hammer, Anscombe is warning against the use of regression when the data are clearly incompatible.
It is further worth pointing out that sociologists and statisticians have not been unaware of Anscombe's quartet and are not without methods for dealing with it. For panel two, we have quadratic regression, a special case of multiple regression, that allows us to fit a curve to the data instead of a line. By neglecting to mention quadratic regression, Goertzel is giving a rather narrow, and not wholly accurate, picture of what regression can actually do. In panels three and four we have a fairly clear case of an outlier, which is often dropped from a dataset. Certainly throwing away data isn't an ideal solution, but if there IS a consistent pattern in all the data, save for one case, there is good reason to believe that your outlier case is responding to a different process, is error-filled, or has otherwise been distorted in some way that obscures reality.
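For the curious, the points above can be sketched in a few lines of Python using Anscombe's published data. This is only an illustration, not anything from Goertzel's article: the helper names are mine, and the fitting is done by hand with plain least squares so that nothing beyond the standard library is needed.

```python
# Anscombe's quartet (Anscombe 1973). Panels I-III share x; panel IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
PANELS = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def fit_line(x, y):
    """Least-squares line y = intercept + slope*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_quadratic(x, y):
    """Least-squares curve y = b0 + b1*x + b2*x^2 via the normal equations."""
    cols = [[1.0] * len(x), [float(a) for a in x], [float(a) ** 2 for a in x]]
    # Augmented matrix [X'X | X'y] for the three columns (1, x, x^2).
    A = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols]
         + [sum(p * b for p, b in zip(ci, y))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(3):
            if r != i:
                factor = A[r][i] / A[i][i]
                A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    return [A[i][3] / A[i][i] for i in range(3)]

def r_squared(y, fitted):
    """Share of the variance in y accounted for by the fitted values."""
    my = sum(y) / len(y)
    rss = sum((b - f) ** 2 for b, f in zip(y, fitted))
    tss = sum((b - my) ** 2 for b in y)
    return 1.0 - rss / tss

# 1. A straight line fits all four panels almost identically.
for name, (x, y) in PANELS.items():
    intercept, slope = fit_line(x, y)
    print(f"panel {name}: y = {intercept:.2f} + {slope:.2f}x")

# 2. Panel II: a quadratic term captures the curve the line misses.
x2, y2 = PANELS["II"]
a0, a1 = fit_line(x2, y2)
b0, b1, b2 = fit_quadratic(x2, y2)
r2_line = r_squared(y2, [a0 + a1 * a for a in x2])
r2_quad = r_squared(y2, [b0 + b1 * a + b2 * a * a for a in x2])
print(f"panel II R^2: line {r2_line:.3f}, quadratic {r2_quad:.3f}")

# 3. Panel III: refit after dropping the single outlying case (largest y).
x3, y3 = PANELS["III"]
kept = [(a, b) for a, b in zip(x3, y3) if b != max(y3)]
i3, s3 = fit_line([a for a, _ in kept], [b for _, b in kept])
print(f"panel III without outlier: y = {i3:.2f} + {s3:.2f}x")
```

All four panels come out at roughly y = 3.0 + 0.5x, which is Anscombe's point. In panel II the quadratic fit accounts for essentially all of the variance where the straight line accounts for about two thirds, and in panel III dropping the one outlying case changes the slope from about 0.50 to about 0.35. None of this requires abandoning regression; it requires looking at your data before choosing the model.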
Another valid point comes when Goertzel, in quoting another researcher, notices the number of regression specifications that may be attempted before a research study is published:
There is simply too little data and too many ways to manipulate it. In one careful review, McManus (1985, 417) found that: “there is much uncertainty as to the ‘correct’ empirical model that should be used to draw inferences, and each researcher typically tries dozens, perhaps hundreds, of specifications before selecting one or a few to report. Usually, and understandably, the ones selected for publication are those that make the strongest case for the researcher’s prior hypothesis.” [Emphasis added]
As Adrian Raftery of the University of Washington observed in 1995 (as it happens, this is one of my favorite articles of all time: "Bayesian Model Selection in Social Research," Sociological Methodology 25: 111-163), running repeated regression models and refining the models each time can essentially "conserve error." In short, because our statistical models usually set the probability of making an alpha error, that is, detecting an effect when one is actually absent (a false positive), at .05 (5%), repeated running of models makes it likely, and eventually all but certain, that a false positive will be detected. (Note that this differs from the error-reducing properties of study replication. In the former case, each model is not statistically independent of the others whereas, in the latter, the models are independent. I'd go into more detail but, really, unless somebody asks, I'm not going to take the time.) As such, routinely running dozens or hundreds of models and refining them at each step introduces a serious risk of error. Raftery even went so far as to demonstrate that repeated modeling and refining could locate strong effects even in random data.
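A rough back-of-the-envelope sketch of how quickly this adds up, in Python. Be warned that it assumes each test is independent of the others, which, as noted above, is not true of successive refinements of the same model; it is an idealized illustration of the arithmetic rather than Raftery's own argument, and the function name is mine.

```python
def chance_of_false_positive(k, alpha=0.05):
    """Probability of at least one false positive in k tests at level alpha.

    Assumes the k tests are statistically independent (an idealization).
    """
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 5, 14, 50, 100):
    print(f"{k:>3} specifications -> {chance_of_false_positive(k):.3f}")
```

Under that assumption, the chance of at least one spurious "significant" result passes one half by the fourteenth specification and exceeds 99% by the hundredth. Correlated specifications will not follow this curve exactly, but the direction of the problem is the same.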
However, while this drawback to regression modeling is valid, it is not impossible to overcome. Raftery's own paper advocated the use of Bayes factors and, particularly, the Bayesian Information Criterion (BIC) as a method for comparing and selecting models. Since BIC is more resistant to the error issues described above, it can provide researchers with a more valid way to test and refine models.
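To give a flavor of how BIC penalizes extra predictors, here is a small Python sketch. It uses a commonly cited R-squared based approximation for linear regression, BIC' = n ln(1 - R²) + p ln(n), where n is the number of cases and p the number of predictors; treat the exact form as an assumption to be checked against Raftery's paper. The R² values below are hypothetical numbers chosen for illustration, not results from any study.

```python
import math

def bic_prime(n, r_squared, p):
    """Approximate BIC of a linear regression relative to the null model.

    n: number of cases; r_squared: model R^2; p: number of predictors.
    More negative values indicate stronger support for the model.
    """
    return n * math.log(1.0 - r_squared) + p * math.log(n)

# Hypothetical example: does adding a second predictor earn its keep?
n = 100
small = bic_prime(n, r_squared=0.10, p=1)  # one predictor
large = bic_prime(n, r_squared=0.11, p=2)  # second predictor adds 1 point of R^2

print(f"one predictor:  BIC' = {small:.2f}")
print(f"two predictors: BIC' = {large:.2f}")
print("preferred:", "one predictor" if small < large else "two predictors")
```

Here the second predictor nudges R² upward, as an added variable always will, but the ln(n) penalty outweighs the improvement and the smaller model is preferred. That built-in skepticism toward marginal gains is what makes this kind of criterion useful when many specifications are being tried.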
Goertzel also challenges the idea of finding unique relationships between variables, stating that:
Econometricians inhabit the mythical land of Ceteris Paribus, a place where everything is constant except the variables they choose to write about.
So, since context varies all the time, we should give up on any attempt to identify regular patterns across all contexts. As I have stated before, I find this logic highly questionable. If we attend too much to context, we become historians, merely providing a factual account of what happened without explaining why. Certainly context matters, certainly we cannot expect to discover mechanistic laws with the same general application as the laws of physics, but to throw up our hands and surrender to complexity is unwise.
I must also confess that I rather doubt Goertzel's sincerity on this point. I do not accuse him of lying; that is not my meaning. Rather, I find it odd that a researcher who claims to be able to answer the question "Does the death penalty deter crime?" should advance the argument that a relationship between two variables cannot be determined. If one can assert that the death penalty does, or does not, deter crime, without first obtaining a host of information on the circumstances, one obviously harbors a belief that two variables can have a consistent relation to each other, while being influenced by other factors, or that, ceteris paribus, one variable has a particular effect on another. He concludes his article by saying that:
The value of this [comparative] research is shown by its success in demonstrating that capital punishment has not deterred homicide.
It would seem apparent that Goertzel does believe that two variables can have a particular relationship to each other, all things being equal. Why his objection, then, only applies to statistical methods and not to "comparative" methods remains elusive.
In the final analysis, I agree with a great deal that Dr. Goertzel says. Regression is a technique that can be abused and bent to support incorrect conclusions, much like any other technique. It can be applied to inappropriate circumstances, such as small-n death penalty studies, and arguably use of standard p-value based fit statistics may lead researchers to concentrate, rather than eliminate, error. These are all points that social scientists need to be aware of.
Yet, Goertzel goes too far when he claims that:
It is time to abandon the illusion that mathematics can convert the real world into the mythical land of Ceteris Paribus. Social science can provide valid and reliable results with methods that present the data with as little statistical manipulation as possible and interpret it in light of the best qualitative information available.
Statistical analysis is a highly useful and informative approach to data, when used properly. Certainly statistical methods have been abused in the search for definitive answers to questions about the death penalty and other public matters, but is that reason enough to abandon a powerful set of tools? The problems he identifies in regression modeling are not intractable, nor even unique to this sort of analysis. Many of them must be grappled with regardless of the approach one takes to data.
So, does this mean that I think Dr. Goertzel is wrong, or a fool? Well, not precisely. As it happens, I commend him both for writing such a clear article, and for taking the time to make such matters public. It might be reasonably argued that The Skeptical Inquirer is a niche publication, but it certainly has a wider readership than most of our blogs, and thus Dr. Goertzel might be justly called a public sociologist. No, it isn't that I disagree with him; I simply believe that he takes too little evidence and attempts to stretch it much too far.
When we attempt to discern the facts about social life, and answer questions in the public arena, it is right that we question our methods. Let's just be careful that we don't throw out the baby with the bathwater.