Wednesday, July 29, 2009

Numbers don't kill careers, people do.

This is the second part of a longer piece discussing the particulars of the Obama/Duncan Race to the Top, a scheme developed by powerful people who deign to help your child to "outcompete any worker in the world." Funding depends on metrics, determining who's competent, and who is not. Arne Duncan has already made noises about using tests designed for other purposes to measure teachers. I'm not sure if he's trying to fool us, or if he's really that ignorant.

I used to practice, and teach, medicine. Time after time, some young gun would run a willy-nilly battery of needless tests, then come running to me with the "diagnosis," completely off base.

If you do not have a good grasp of how tests depend on the prevalence of the condition being tested, you will be misled by even good tests. Lots of decent doctors are lousy at metrics.

Suppose Arne's Army developed a test that could separate the good teachers from the bad. Never mind what "good" or "bad" means; the test itself defines competency.

For simplicity, let's make it a urine dipstick--if you get a "+" you're in the club, a "-" and your license is revoked.

Suppose the test was so well designed that if you were competent, the test would be "+" 95% of the time, and if you were incompetent, it would be "-" 95% of the time. Not bad, eh?

So what happens if you get a "-"--does that mean you're likely incompetent? Are you 95% confident in the result?

(I'll play the Jeopardy theme while you come up with an answer....)

No way to tell--the test's usefulness depends on how many teachers are, in fact, incompetent.

Let's say only 5% of teachers are incompetent--in that case, a "-" result reflects a truly incompetent teacher only half the time.
Here's the math: if 100 teachers are tested, the 5 who are incompetent would likely be picked up (95% of 5 is 4.75, call it 5). Of the remaining 95 competent teachers, however, 5% will wrongly get a "-", and 5% of 95 is also about 5. Only half of the roughly 10 teachers who test "-" would indeed be incompetent.
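A quick simulation can double-check that hand count. This is a minimal Python sketch of the thought experiment, nothing more: the dipstick, its 95% figures, and the function name `simulate_dipstick` are all hypothetical. A "-" means "flagged as incompetent," as in the dipstick setup above.

```python
import random


def simulate_dipstick(n_teachers, prevalence, p_minus_if_incompetent=0.95,
                      p_plus_if_competent=0.95, seed=0):
    """Run the hypothetical dipstick on n_teachers simulated teachers.

    Returns (true_minus, false_minus): how many "-" results landed on
    incompetent teachers and how many landed on competent ones.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    true_minus = false_minus = 0
    for _ in range(n_teachers):
        if rng.random() < prevalence:  # this teacher is truly incompetent
            if rng.random() < p_minus_if_incompetent:
                true_minus += 1  # correctly flagged
        elif rng.random() >= p_plus_if_competent:
            false_minus += 1  # competent teacher wrongly flagged
    return true_minus, false_minus


true_minus, false_minus = simulate_dipstick(200_000, prevalence=0.05)
share = true_minus / (true_minus + false_minus)
print(f'{true_minus + false_minus} "-" results; {share:.1%} were truly incompetent')
```

With 5% of teachers incompetent, the share of "-" results that are correct lands near 50%, the same answer as counting 100 teachers by hand.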

It's counterintuitive, but it's real; it's called Bayes' theorem, and it reveals a practical problem with any type of binary testing--how much you can trust a result depends on the frequency of the condition you are testing for.
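In symbols, using the post's numbers (5% of teachers incompetent, and the dipstick right 95% of the time either way), Bayes' theorem reads:

```latex
P(\text{incompetent} \mid -)
  = \frac{P(- \mid \text{incompetent})\,P(\text{incompetent})}
         {P(- \mid \text{incompetent})\,P(\text{incompetent}) + P(- \mid \text{competent})\,P(\text{competent})}
  = \frac{0.95 \times 0.05}{0.95 \times 0.05 + 0.05 \times 0.95}
  = 0.5
```

The prevalence, P(incompetent), sits in both the numerator and the denominator, which is why the same test gives different answers in different populations.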

If 90% of teachers are incompetent (and some of the public might even believe that), then the chance that a "-" result reflects a truly incompetent teacher exceeds 99%--same test, different population.
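The same formula can be run across several populations to watch the meaning of a "-" shift. Another hypothetical Python sketch: the function name is made up, and the 25% and 50% prevalence values are illustrative assumptions; only the 5% and 90% cases come from the post.

```python
def p_incompetent_given_minus(prevalence, p_minus_if_incompetent=0.95,
                              p_plus_if_competent=0.95):
    """Bayes' theorem for the hypothetical dipstick: P(incompetent | "-")."""
    true_minus = p_minus_if_incompetent * prevalence
    false_minus = (1 - p_plus_if_competent) * (1 - prevalence)
    return true_minus / (true_minus + false_minus)


for prevalence in (0.05, 0.25, 0.50, 0.90):
    answer = p_incompetent_given_minus(prevalence)
    print(f"{prevalence:>4.0%} incompetent -> a '-' is right {answer:.1%} of the time")
```

This prints roughly 50%, 86%, 95%, and 99.4%. Only when half the teachers are incompetent does the "95%" on the box match what a "-" actually tells you.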

I have no problem with metrics, but I am opposed to what knuckleheads can do with numbers. Numbers don't kill careers, people do.

Most docs are reasonably bright people with a generous dose of ambition, and a lot of them can't grasp this--what hope do we have that Arne will?

The picture is from the Life archives via Google--OK for personal, non-commercial use.
A tip of the hat to Tom Hoffman at Tuttle SVC for pointing out the Gerald Bracey article.


John Spencer said...

I'm glad to see you blogging again. This is a great, thought-provoking blog. I really had no idea how metrics work, but this definitely made me think!

doyle said...

Dear John,

I wandered around Ireland for a bit, but I'm back.

I'm not even sure "metrics" is the right word--I'm waiting for a statistician to set me straight!

Charlie Roy said...

I believe you had a similar post last year. I'm sure the lords of stats will be back. I heard a phrase I like last week: "Measure what you value; don't value what you measure." Interesting thought. We are throwing around the idea of merit pay, but we want a system that isn't a one-number-indicates-everything system with test scores as the only factor.

What makes a good teacher? I'd argue the following:
1. commitment to developing children into reflective human beings who think critically and live humanely
2. consider themselves a professional and continue to improve constantly.
3. model justice and commitment through their actions and how they treat those around them

I don't suppose Duncan will come up with a test to measure the above factors.

doyle said...

Dear Charlie,

Yep, it was the post on understanding the difficulties of screening for drugs--same principle.

I am not being fair to the Obama/Duncan team here--I don't think that they would go by a single measure--but using tests requires a subtlety I'm not sure Duncan possesses.

I like your criteria. They may be difficult to measure; indeed, their value is immeasurable in more than one sense.

I keep hoping Duncan comes up with a higher purpose than "our" children out-competing "their" children.

John Spencer said...

I got in an argument in a staff meeting with a "consultant" about this. I said that measuring learning was about as possible as measuring love or justice or anything else important.

He then went into great detail about how, if I love my wife, I could create a continuum with various categories for things like commitment to do household chores, romantic dates, etc. I laughed at first, until I realized the man was being absolutely serious.

I've said this before, but the worst kind of ideology is the type that won't admit that it's an ideology: the type that pretends to be science. It's not just that I disagree with it, but that I think it's toxic to humanity.

Blogger In Middle-earth said...

Kia ora e Michael!

Bayes' theorem has been used in discussions to do with drug testing (for athletes, for instance), as similar situations exist where tests are claimed to be 99% accurate (or whatever). The scientist in me jumps to the idea that if there is no unequivocal single test, the introduction of another completely different test using different criteria is needed to reduce the chance of a wrong result. After all, this is what we're supposedly aiming at - that the chance of a wrong result approaches zero.

The problem is that there is always the residue - the thin region of overlap of any pair of tests. One could suggest a third test to reduce this even further. But there is always a residue, for seldom is the thin sliver eliminated altogether.
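The idea of stacking a second and third test can be sketched the same way. This Python sketch assumes, loudly, that each hypothetical follow-up test is as good as the first (95% either way) and that its errors are independent of the earlier tests' errors; real tests that measure overlapping things rarely behave that well. Each "-" result feeds Bayes' theorem again, with the previous answer as the new starting point. The function name `update` is illustrative only.

```python
def update(prior, p_minus_if_incompetent=0.95, p_plus_if_competent=0.95):
    """One Bayes update after a "-" result; returns P(incompetent | "-")."""
    true_minus = p_minus_if_incompetent * prior
    false_minus = (1 - p_plus_if_competent) * (1 - prior)
    return true_minus / (true_minus + false_minus)


belief = 0.05  # starting prevalence from the post's example
for test_number in range(1, 4):
    # Assumes this test's errors are independent of the earlier tests' errors.
    belief = update(belief)
    print(f"after {test_number} '-' result(s): {belief:.2%} incompetent, "
          f"{1 - belief:.2%} residue")
```

Starting from 5% prevalence, the belief climbs from 50% to 95% to about 99.7%, but the residue shrinks without ever reaching zero.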

The difficulty is that we are dealing with humans. We don't like the idea of making mistakes - by using any system of testing - when humans are involved. But that's life, and not just in teaching.

It happens, as you well know, in medicine. It happens in skydiving. It happens in the justice systems. It happens in love.

Some humans cannot accept the possibility of a mistake where humans are involved yet they are prepared to accept it with animals or with any mechanical system where quality assurance of goods is assessed.

Humans are different.

Just as well I reckon.

Peace in harmony

doyle said...

Dear John,

Many consultants depend on that kind of voodoo quantitative universe--they get paid to analyze and fix problems.

No simpler way to "prove" then "fix" problems than using pseudo-quantifiable variables.

So few folks have a handle on numbers these days--it gets easier and easier to fool the many.

Your "consultant" may truly believe what he sells--good salespeople do. I think Arne Duncan's sales pitch works as well as it does because he actually believes that 100% of the children can be better than average.

By the way, I scored an 87.6 on the exam--do you love your wife as much as I love mine? If not, I have a consultant that will sell you a method that can improve your numbers....

Dear Ken,

If those in power had the same scientist in them that you have in you, I'd not be so worried.

I'm not that worried anyway--read Charlie Roy's comments and his blog; he gets it. I work under a principal and a supervisor who both get it. I have no problem with qualified people observing and assessing my practices in the classroom, and a few of us in our school observe each other across disciplines to improve our teaching.

I do worry, though, that what's happening at the national level is not simply an attempt to improve education and to weed out incompetent teachers; the people at the table have monied interests. Corporations are openly trying to groom schools to serve their needs in the feigned interests of national economic security. Testing companies are raking in money in a test-crazy culture. It's no secret that at least some of those in the Bush administration "saw NCLB as a Trojan horse for the choice agenda."

I can accept the possibility of mistakes, and even ignorance (as long as we work to repair that); what I will not accept, though, is an effort by the powerful few to destroy what works for most of the rest of us because of undeclared allegiances.

Blogger In Middle-earth said...

Kia ora e Michael.

Ah yes. "Undeclared allegiances". I've met this idea before. It goes with, "If it ain't broke, break it."

Managerialism suffers the same dysfunction.

Catchya later