The Evaluation Debate, Summer 2004

On Thursday, July 15, 2004, Claremont Graduate University co-sponsored a very thoughtful debate in front of a standing-room-only audience of over 120 people.  This debate was part of our annual Professional Development Workshop Series.  Highlights from the debate, as well as streaming video of the event, are below.

Determining Causality in Program Evaluation & Applied Research:

Should Experimental Evidence Be the Gold Standard?

 
Mark W. Lipsey
Vanderbilt University


Michael Scriven
Western Michigan University

 Moderated by:

Stewart I. Donaldson
Claremont Graduate University

Lipsey Opens With:

“In this context, it seems to me that there are at least three topics that we might discuss.”

“One has to do with the way randomized trials appear in government agencies and the legislation and so on, some of which is simplistic and inept, as uncharacteristic as that is of government activity.”

“Another thing we might talk about is the little flap in the American Evaluation Association (AEA) that involves the stance that was taken last year opposing an obscure division of the Department of Education to try to bring in some randomized evaluations to some of the projects it was funding.  Since this event is being sponsored by an AEA Affiliate, that is a possibility.  I’d be happy to explain to you why I think the AEA now has the same relationship to the Field of Evaluation as the Flat Earth Society has to the Field of Geology.”

“The third thing we might talk about is the methodological issue and what is actually at stake in these methodological critiques.  That is actually what I want to talk about, but if anyone, maybe the audience, or Michael wants to talk about the others, then I’d be happy to do that.”

 “We really are poorly served by this gold standard terminology.  I think that when you use randomized experiments, which I am basically going to defend in this context, they are much like what Winston Churchill once said about democracy.  He said, ‘It’s the worst form of government except for all the others that have been tried from time to time.’  I do not think this is the gold standard.  I think that for impact assessment randomized experiments are the worst methodology except for some of the others that have been tried from time to time.  That is pretty much my theme here.”

“Experimental and quasi-experimental designs have been around a long time and have well known properties.  What’s really new is this broadside against them from certain research communities.”

“This issue has evoked mostly a yawn in areas where intervention research and program evaluation is done broadly.  So, in mental health, public health, drugs, medicine, chronic delinquency evaluations, and a whole range of areas this is not a particularly exciting topic where randomized field trials are well respected, well known, widely used, and understood to be something of the state of the art for doing impact assessments.  The reactions I’ve seen have come predominantly from the education research culture and to a certain extent from one wing of economists that work in this field that have an interesting take on it and I will get to that later on.”

“Let me turn now to the non-experimental approaches.  This is an area that has fascinated me.  Back when the flap was going on, methodological pluralism was all over the Evaltalk.  I kept asking respondents and finally gave up on what these other methods were that were supposed to be equally valid, and the most interesting list came out: epidemiological methods, observational correlation modeling, realist methods, case studies, qualitative, ethnographic, Glaser and Strauss’ grounded theory, and from Michael Scriven the modus operandi technique, forensic analysis, direct observation, all put forth in establishing the effects of programs.”

“I have in recent years, every time I see somebody putting forward the argument that qualitative methods could be used to assess program effects, I’ve been writing them for some examples.  Show me a case where this was done convincingly.”

“Why is the education research culture so riled up about randomized experiments?  Here are a couple of possibilities.  In all the politics this year, the Bush Administration, the Department of Education, the No Child Left Behind Act, there’s a lot not to like there, okay?  They have been pushing for randomized designs, so we may as well not like those too.  The biggest factor I think is ideological.  The education research culture bought into constructivism and postmodernist epistemologies and so on really big time and there is a lot of ideological opposition.  Tom Cook calls it science phobia to quantitative methods and experimentation and so on.  Third, I think that there is a considerable amount of ignorance, not stupidity, not stupidity, but ignorance.”

Scriven Responds:

“Well, apart from the character assassination at the end, which I can tell you in the education community there may be people in it about which those things can be said, but the greatest attacks on constructivism are from people within the education community.  So, there are plenty of others like us who absolutely reject all of that crap and so, it is certainly not true.  Some of my friends are also on the side of the angels over there, like Tom Cook, for the new move.  So, no, I don’t think that is really a very plausible account of the story.”

“I think that if you want to look at reasons why people objected, the three big ones are these.  One, the objections were not at all against randomized control trials (RCTs), they were against the decision to take all $500 million of their research money and pull it out of anything except randomized control trials.  Now, it is quite clear the previous speaker is not identifying himself with this extreme wing, but who is the leader of the extreme wing?  It is the guy who is the head of the Institute of Education Sciences that has the $500 million, and what does he say?  He says there is no scientific way of establishing causation except by randomized and allocated control group trials, etc. etc.  There is no such thing as scientific research in the area of human behavior except by means of RCTs, and that is complete bullshit!  It happens to be coming from the guy who has all of the money.  So, the sad thing is that this man is killing off alternatives.”

“Read Tom Cook on problems in practice of running RCTs.  So, this is a very tricky procedure.  While it has theoretical advantages, the theoretical advantages in validity aspects of it are undeniable.  That is not the issue.  The issue is not whether or not there is an alternative that has the same theoretical bulletproofness.  The question is whether there is an alternative that can get you results beyond reasonable doubt, and that is another story altogether.  Very often, you can get results beyond reasonable doubt in other ways.”

“First, the concessions.  We have not used RCTs when we should have many, many times.  There have been many occasions when we could have pulled off RCTs, when we could have staffed them with competent people, and this is still the case in the present, and that was the best design around.  The arguments around are sloppy arguments including a number of arguments that Professor Lipsey ran into at the Evaltalk discussion.  There was a lot of whistling in the dark going on there and ideological crap going on.  You have to get down to the logic of the cases and you can’t just pull this off by waving things like constructivism, observational, or etc.  So, this is a situation where there is no doubt at all.  This is a very powerful tool, and sometimes much the best tool, but it has the same value as the torque wrench in a good mechanic’s toolbox.  For certain tasks, you can’t beat it.  After all, this is a quantitative instrument.  The torque wrench reads out in inches and meters and so on, so this is very important if you are interested in matching the specs that you are supposed to be matching…a very good instrument.  Nothing can match it, but it has a very narrow range of uses.  Now, that doesn’t matter if the alternative approaches aren’t very good, but of course there is a lot of them and some of them are very good indeed.”

“Well, there’s a lot more I’d like to say, but perhaps I can just leave it by saying I think I agree strongly with him.  A lot of the attacks have been empty and they have lacked specific examples that will work.  A lot of the attacks are based on ideological positions, which are logically unsound.  All of this is true, but nevertheless, given the difficulties facing RCTs, one has to be very cautious going to any sort of wholesale commitment to them.  I hope in the future we can develop a better kind of existence than what we have at the moment.”

The Debate Document

You can now download "The Claremont Debate: Lipsey vs. Scriven" from the Journal of MultiDisciplinary Evaluation (No. 3, October 2005).  This paper summarizes the history and the controversial statements that set the context for the debate.

The Debate Video

By popular request, we have made this debate available to users with Apple Quicktime, RealPlayer v.10, or RealOne Player.

Debate Section 1: Lipsey's Opening Comments
Debate Section 2: Scriven's Response and Lipsey's Rebuttal
Debate Section 3: Questions from the Audience

Technical Note:

It may be necessary to change the default media player of your computer to one of the programs above, as earlier versions of RealPlayer are not equipped for this video.

RealOne Player may state that the software to play the video is unavailable, but if you bypass this message, the player will bring up the video in a few seconds.

It is also possible to open Quicktime and, in the "Open URL in New Player" option, type in "rtsp://streams.cgu.edu/pt_debate1.mov".  Quicktime will then route itself to the first part of the debate.  Parts 2 and 3 have identical addresses; simply replace "debate1.mov" with "debate2.mov" etc.

Please send any thoughts, questions, or comments about the debate to Paul Thomas at paul.thomas@cgu.edu.

This debate was held following the workshops on Thursday, July 15, at 5:30 p.m.  Admission to this debate was free.

Sponsored by the Southern California Evaluation Association and Claremont Graduate University.