Exhibit 4.6 of The Investing in Innovation Fund: Summary of 67 Evaluations, Final Report, U.S. Dept. of Education, 2018

As part of the federal recovery effort to boost the economy after the 2008 recession, the U.S. Education Department suddenly had a big pot of money to give away to “innovations” in education. Since then, more than $1.5 billion has been spent on almost 200 ideas because Congress continued to appropriate funds even after the recession ended. Big chunks went to building new KIPP charter schools and training thousands of new Teach for America recruits to become teachers. Other funds made it possible for lesser-known programs in reading, writing, math and science instruction to reach classrooms around the country. Many of the grant projects involved technology, sometimes delivering lessons or material over the internet. One “innovation” was to help teachers select good apps for their students. Another was for a novel way to evaluate teachers.

In order to obtain the grants, recipients had to determine whether their ideas were effective by tracking test scores. Results are in for the first wave of 67 programs, representing roughly $700 million of the innovation grants, and they don’t look promising.

Only 12 of the 67 innovations, or 18 percent, were found to have any positive impact on student achievement, according to a report published earlier in 2018. Some of these positive impacts were tiny, but as long as the students who received the “innovative treatment” posted larger test score gains than a comparison group of students who were taught as usual, it counted.

“It’s only a handful,” said Barbara Goodson, a researcher at Abt Associates Inc., a research and consulting firm that was hired to analyze the results of the Investing in Innovation (i3) Fund for the Department of Education. “It’s discouraging to everybody. We are desperate to find what works. Here was a program that was supposed to identify promising models. People are disappointed that we didn’t come up with 20 new models.”

“That’s the dirty secret of all of education research,” Goodson added. “It is really hard to change student achievement. We have rarely been able to do it. It’s harder than anybody thinks.” She cited a 2013 study that also found that when education reforms were put to rigorous scientific tests with control groups and random assignment, 90 percent of them failed to show positive effects.

Why is innovation so hard in education?

To Goodson, who has specialized in early childhood education research for 40 years, the problem is that learning is ultimately about changing human behavior and that is always difficult for adults and children. And so many other things — like nutrition, sleep, safety and relationships at home — affect learning. “We’ve known for the longest time that economic background characteristics swamp any education intervention,” she said. “We’re starting out with only being able to make a small difference in how people do. The lever of education is only operating on a small slice of the pie.”

In some cases, the current measures of effectiveness, generally standardized assessments, may be too broad to capture the targets of these innovations, Goodson said. For example, a phonics program might help some kids read more fluently. But the ability to read more fluently might only be indirectly captured in a reading test that’s focused on comprehension and vocabulary. An intervention aimed at soft skills, such as the ability to persist and try again, can’t be measured at all on these conventional tests.

Many interventions target kids who are several grade levels behind. A seventh-grade math test might not pick up on how a student progressed through two years’ worth of math, from third-grade multiplication of single digits to fifth-grade addition of fractions. Instead, the test might suggest a minuscule academic improvement because the student flubbed most of the seventh-grade questions on solving for x and graphing equations.

A more sensitive yardstick for measuring innovation would require creating and administering more tests to students. That’s a hard sell to principals, teachers and families who may already feel that there’s too much testing in schools.

Saro Mohammed, a partner at the Learning Accelerator, a non-profit organization that supports using technology to tailor instruction to each child, says that one reason it’s sometimes hard to prove that an innovation works is the unintended consequences that arise when schools try something new. For example, if a school increases the amount of time that children read independently to try to boost reading achievement, it might shorten the amount of time that students work together collaboratively or engage in a group discussion.

“Your reading outcomes may turn out to be the same [as the control group], but it’s not because independent reading doesn’t work,” Mohammed said. “It’s because you inadvertently changed something else. Education is super complex. There are lots of moving pieces.”

Mohammed said the study results are not all bad. Only one of the 67 programs produced negative results, meaning that kids in the intervention ended up worse off than those learning as usual. Most studies ended up producing “null” results, and she said that means “we’re not doing worse than business as usual. In trying these new things, we’re not doing harm on the academic side.”

Mohammed also pointed out that learning improvements are slow and incremental. It can take longer than even the three-to-five-year time horizon that the innovation grants allowed.

Eighteen of the studies had to be thrown out because of problems with the data or the study design. In some cases, too many students who tried the innovation were ignored in the final figures. When you exclude kids with disabilities, for example, that can skew the results upward. Too many of the early-stage innovations weren’t tried on enough students to produce statistically significant results. That means even when the students in the intervention produced larger test score gains than those in a comparison control group, the researchers still had to call it a “null” result if the odds of reproducing such a positive result were no better than flipping a coin. (One of the reasons that many small education studies cannot be replicated is that they were lucky flukes in the first place.) In more recent grant making, Goodson says, the small studies have been “powered up” so that the results will be statistically useful. (They’re now called Education Innovation and Research grants.)
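To see why small studies struggle to produce statistically significant results, here is a minimal sketch in Python. It is not drawn from the report. It assumes a hypothetical program that truly raises scores by 0.1 standard deviations, a conventional 5 percent significance threshold, and a simple normal approximation for comparing two equally sized groups. The function name `approx_power` and the sample sizes are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate chance that a two-sided, two-group comparison detects a real effect.

    effect_size is the assumed true gain in standard-deviation units (hypothetical).
    n_per_group is the number of students in each of the two groups.
    Uses a normal (z-test) approximation, not the exact t-test.
    """
    std_normal = NormalDist()
    # Cutoff a test statistic must exceed to be called "significant".
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic given the true effect and sample size.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing beyond either cutoff.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical study sizes, from a small pilot to a large trial.
    for n in (50, 200, 1000, 5000):
        print(f"{n:>5} students per group: power ~ {approx_power(0.1, n):.2f}")
```

Under these assumptions, a study with 50 students per group would usually miss a real but small gain, while one with thousands of students per group would almost always catch it. That is the sense in which newer studies are being “powered up.”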

This grant program was also a first test of using rigorous scientific evidence as a way of issuing grants in education. Proven concepts received the largest $25-50 million grants. Ideas with the least evidence received less than $5 million to help them build an evidence base. Ideas in between might get $15 million. Among the 48 least proven ideas, only 4 were found to increase student achievement. That’s a low 8 percent success rate. (Links to all the publicly available evaluations for each program are here. Appendix D of the report lists the academic results for each program.)

But programs in the highest tier were supposed to have a proven track record and only two of the four — the KIPP charter school network and Reading Recovery — generated stronger test scores.

Michael Hansen, director of the Brown Center on Education Policy at the Brookings Institution, characterized the results as “discouraging” but cautioned that high failure rates are not a reason to give up on educational innovation. “This is the nature of R&D,” he said. “If we stop giving out grants, then we stop innovating.”

This story about innovation in education was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.