Evidence mills are companies hired by educational program developers to conduct research showing their programs are effective and “evidence-based,” in order to qualify for government funding. However, critics argue these studies lack rigor, independence, and transparency, often using small sample sizes, short durations, missing control groups, and vague measures to produce positive findings that support their clients. Proponents contend evidence mills provide useful, affordable services, but there are widespread concerns about conflicts of interest and research quality. The rise of evidence mills has been linked to policies requiring so-called “evidence-based programs,” leading to high demand for studies that give interventions the appearance of being “evidence-based.”
I bring this up as I’ve just attended a mandatory, day-long professional development session. I can’t help but think of the event as evidence of “corporate capture,” whereby the corporation has an ally inside the District who makes sure not only that employees sign up, but that everyone has to sign up AND attend. In this case, the training was (and remains) mandatory for all Resource Specialist Program (RSP) teachers, like me.
Waiting for us at check-in was a stack of materials, including a copy of The Knowledge Gap by Natalie Wexler ($18 on Amazon). I’m sure these weren’t gifts from the host. Rather, I’m confident that these were part of the “package” that was sold to the district. No mention of the text was made during the whole day, so this “free gift” seemed odd. It’s more like a case of “just sell them something else” to increase the profitability of the event. At least when I would teach Photoshop to forensic scientists, my book - Forensic Photoshop (pdf version) - served as their note catcher.
Problems from the start
When I received the notice to attend, it included a local District contact person. There was a suggestion in the language of the notice that accommodations could be requested, so I requested them. I sent off an email asking for access to the printed materials in advance, using the language from my Schedule A and Reasonable Accommodations letters - my impaired functional language (Autism: Level 2) is supported by assistive technology. The gracious contact explained that she could send me a few things that she had, but that the corporation tightly controlled their materials. There would be no access to their PowerPoint slides or any of the materials they would be using during the session. Similarly, the “practice kits” were the property of the company. Using my phone to scan / convert would not be allowed.
Here we go again …
Deep diving in the capitalist dumpster
I dug around the vendor’s web site to see what I was in for. I wanted to support myself as best I could. Remember, we Level 2s have difficulty with language, so I wanted to prepare by pre-processing the information. I do this all the time. As I see it, it’s part of the “autism tax.”
The vendor, the 95 Percent Group, is one of the many on the “Science of Reading” bandwagon. I found the flyer for the products they would be demonstrating during the training - the one that lists “the evidence” that supports their methods.
At the top of the flyer was a giant pink graphic indicating that the product qualifies for ESSA funding.
The Every Student Succeeds Act (ESSA) was passed in the US in 2015 as the most recent reauthorization of the seminal Elementary and Secondary Education Act of 1965, replacing the controversial No Child Left Behind Act. If you ask our captured government, the ESSA aims to ensure access to high-quality education for all students, especially disadvantaged groups, by returning more accountability and school improvement authority to states and districts rather than federal control. Key provisions include maintaining annual standardized testing requirements whilst allowing state flexibility, identifying low-performing schools for federally-funded improvement support, providing grants for initiatives like preschool access and enrichment programming (what I was ostensibly there to be trained to do), and promoting increased stakeholder engagement and teacher leadership. With “bipartisan” (aka, everyone was bribed, errr … lobbied successfully) Congressional backing, ESSA has served as the main federal K-12 education law since its passage in 2015.
Tier 3 evidence (shown on the graphic as “Level 3”) under ESSA refers to interventions supported by “promising evidence” from at least one well-designed and well-implemented correlational study with statistical controls for selection bias. (“Moderate evidence,” Tier 2, is the rung above that, and it requires a well-designed and well-implemented quasi-experimental study.)
Quasi-experimental studies aim to demonstrate causality between an intervention and outcomes without the use of random assignment to treatment and control groups. Common quasi-experimental designs include regression discontinuity, propensity score matching, and (most frequently for the “evidence mills”) pre-test/post-test assessments with non-randomly selected control groups.
In reality, “non-randomly selected control groups” translates to a teacher giving the intervention to some of their class periods and withholding it from others. Some classes will naturally perform better than others simply because of who is assigned to the class. Schools do not seek to balance intellect and support needs in each class period. They just put kids in classes.
Pre-test/post-test assessments with teachers just using the classes on their roster will yield less reliable results for a few key reasons:
Lack of random assignment means there will be inherent, uncontrolled differences between the treatment and control groups. This introduces selection bias where the groups are not equivalent at baseline.
Pre-existing differences between groups, rather than the intervention itself, could influence the results. For example, one class may have had higher pre-test scores, motivation levels, or other advantages.
Without random assignment, it’s unclear whether any differences in outcomes are actually due to the intervention rather than pre-existing group differences. Changes could be due to regression toward the mean.
Non-equivalent groups make it more challenging to isolate the impact of the intervention itself. Confounding variables and external factors will likely influence outcomes.
Self-selection bias can be an issue if participants choose whether to be in the treatment or control groups rather than being randomly placed. This is especially true of small-group work where students choose their own groups.
Small, non-representative sample sizes in non-randomized studies can further limit generalizability of findings. Just because it worked in a few classes in suburban Missouri does not mean it will work in Title 1 Los Angeles.
So whilst quasi-experiments have value, non-random assignment introduces uncertainty about the comparability of groups. This reduces (or eliminates) confidence that observed results are directly attributable to the intervention being studied. Careful statistical controls and analysis are needed to address these limitations.
By employing techniques like statistical controls, comparison groups, and pre-testing, high-quality quasi-experiments can provide evidence that an intervention is effective. However, they are not considered as scientifically rigorous as randomized controlled trials.
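To make the selection-bias and regression-toward-the-mean problems concrete, here’s a minimal Python sketch. It is purely illustrative - made-up numbers, not data from any real study - and the simulated “intervention” does absolutely nothing:

```python
# Purely illustrative simulation - made-up numbers, not data from any real
# study. The "intervention" below does nothing at all, yet a naive
# pre/post comparison with non-random groups will still "find" an effect.
import random

random.seed(1)

N = 200
abilities = [random.gauss(70, 10) for _ in range(N)]   # fixed true ability
pre = [a + random.gauss(0, 8) for a in abilities]      # pre-test = ability + noise
post = [a + random.gauss(0, 8) for a in abilities]     # post-test = ability + fresh noise

# Non-random assignment: give the (do-nothing) intervention to the students
# who scored lowest on the pre-test - exactly how remediation groups form.
ranked = sorted(range(N), key=lambda i: pre[i])
treated = set(ranked[: N // 2])
control = set(ranked[N // 2 :])

def mean_gain(ids):
    """Average pre-to-post change for a group of student indices."""
    return sum(post[i] - pre[i] for i in ids) / len(ids)

print(f"Gain for treated (low pre-test) group: {mean_gain(treated):+.1f}")
print(f"Gain for control (high pre-test) group: {mean_gain(control):+.1f}")
# Typical run: the treated group "gains" several points while the control
# group "loses" several - pure regression toward the mean. With random
# assignment, both groups' expected gains would be zero.
```

Swap the selection rule for a random shuffle and the phantom effect disappears, which is the whole point of random assignment.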
Additionally, Tier 3 evidence is generally regarded as providing initial or preliminary support for an intervention’s efficacy and usefulness. Additional evidence is typically needed to firmly establish an intervention as “evidence-based” or “highly reliable.” In other words, the intervention needs to be tested in a wide variety of contexts, using statistical controls, random assignment, and proper control groups.
But wait, there’s more …
Further down the page, I found the company that provided the report. Learning Experience Design (LXD) Research and Consulting, part of the Charles River Media Group, was responsible for the research and the report. So, off I went to find their information.
Beginning with the end in mind, LXD is not a neutral, third-party research firm. Rather, “LXD Research guides recruitment with public schools and discusses the design of the study to consider how the final product will meet the requirements of critical reviewers (Evidence for ESSA, What Works Clearinghouse, State Boards of Education, etc.).” (source) Put another way, LXD will find the optimal setting to get the data necessary to qualify your intervention for government funding. WOW?!
Further bolstering the thesis that this firm is indeed an “evidence mill,” its parent company describes itself this way, “Charles River Media Group is a Boston-based production company that specializes in corporate communications, television commercials, live streaming events, film, non-profit, and political videos.” (source) It’s a marketing firm, so it figures that its research division knows how to write up its reports.
More interestingly, however, Charles River Media Group (CRMG) describes LXD this way on its main page, “LXD and Pre-Production - Whether it’s Learning Experience Design, Evaluation, Consultation or Pre-Production for your media production, our team will work with you whether it’s on paper or on screen.” (source) Nothing at all about educational research.
To be fair, CRMG does describe LXD’s educational research on a separate page. There, you will find this gem, “(Our CEO and CRO are available for additional consulting and company boards.)” Another WOW moment. Can you say “conflict of interest”?
Who’s LXD
At the helm of LXD is Dr. Rachel Schechter. From her LinkedIn profile, she seems rather well educated. Unlike a lot of people in this space, however, she doesn’t list employment as a classroom teacher in her work experience. You can find, however, that she is Pragmatic Marketing Certified Level III (link). Given that much of her posting activity is centered on marketing ESSA services to companies as a way to increase profits, this certification is probably paying off.
Destiny Riley, their “Qualitative Researcher,” is likely the brains behind the reports. According to her LinkedIn profile, she does have actual classroom experience as well as experience in creative writing. Here’s where a look into what it means to be a “qualitative researcher” can help us in our journey.
The main differences between “qualitative” and “quantitative” researchers are:
Qualitative researchers gather non-numerical, textual data through methods like interviews, observations, and document analysis. Their goal is to understand meanings, concepts, experiences, and insights.
Quantitative researchers gather numerical data that can be analyzed statistically. They use methods like surveys, experiments, and analysis of datasets with quantifiable metrics.
Qualitative research is more exploratory and open-ended, whilst quantitative research tests specific hypotheses and research questions.
Qualitative researchers interpret and make sense of narrative data through techniques like coding and thematic analysis. Quantitative researchers analyze numerical data using statistical software and methods.
Qualitative research can uncover new theories and detailed insights about phenomena. Quantitative research tests theories and provides measurable, generalizable results.
Qualitative researchers often use more conversational methods and get to know research subjects personally. Quantitative researchers maintain objective distance from what they study.
Qualitative research provides depth and understanding of social contexts. Quantitative research provides breadth and generalizability through larger sample sizes.
Don’t get me wrong, both approaches have value in research and can complement each other. I’ve done both myself. Mixed methods research combines qualitative and quantitative techniques to leverage the strengths of each.
The Lead Researcher, Paul Chase, PhD, likely met Dr. Schechter at Tufts University. According to his LinkedIn profile, they were both there around the same time. His profile notes a nine-month stint as a student teacher as well as a quick trip to Japan as an exchange teacher.
Last on our list (I’m reading their page from left to right) is Isabella Ilievski. She is listed as their Research Project Manager. She spent 4 months as a student teacher before setting her sights on research. Her LinkedIn page activity lists this post from LXD that contains this quote, “… One last shout out to our Digital Promise Certification Guidance clients, Labster, Edpuzzle, Handwriting Heroes, QoreInsights, engage2learn, 95 Percent Group LLC, Get More Math!, Mrs. Myers' Education Services, and MindPlay, for partnering with us to tell and validate your #research-informed design story!”
What does it mean to “tell and validate your #research-informed design story”? I’ll get there in a bit.
The posting includes this comment from another LinkedIn user, “It's always wonderful to collaborate with Dr. Rachel Schechter. And her track record as being the most approved firm on Evidence for ESSA reading is not surprising knowing the high quality work she does!”
What is the nature of the quality that’s being commented upon, the research or the write-up?
Considerations
Based upon what I found prior to arriving at the event, I didn’t think that I would take much away from it. After all, the media group’s marketing and reporting are quite slick, but they kind of hide the real issues. AND … I think that’s the brilliance of Dr. Schechter’s team. They write a report good enough to turn what seems like small-scale action research into “evidence” that an intervention can be effective everywhere. We’ll see what it actually is in a bit. Hang on.
Back to the LinkedIn post comment about LXD partnering with companies like the 95% Group to “tell and validate your #research-informed design story!” What does it mean to “validate” research?
Validating a research study means assessing whether the study measured what it intended to measure in a methodologically sound way. The main components researchers look at when validating a study are:
Construct validity - do the instruments/measures used actually assess the concepts they are supposed to? Were variables accurately operationalized?
Internal validity - can a causal relationship between variables be properly demonstrated given the study design? Were confounding variables accounted for?
External validity - to what contexts/populations can study findings be generalized? How representative was the sample?
Statistical validity - were appropriate statistical analysis techniques used correctly? Do the results achieve statistical significance? (One concrete way to check this appears in the sketch below.)
Conclusion validity - is the study free from systematic bias? Are alternative explanations reasonably ruled out?
Criterion validity - does the study result align with other measures or outcomes of the same construct?
Content validity - do measures comprehensively cover all aspects of the constructs being studied?
In general, to validate a study, researchers examine methodology to ensure it adheres to rigorous standards and logically leads to the conclusions reached. A validity assessment examines whether the study properly measures and tests what it claims to according to accepted scientific principles.
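As an aside on the statistical-validity item above: one concrete check a third-party validator can run is re-deriving a report’s significance test and effect size from nothing more than the summary statistics it publishes. Here’s a minimal sketch using hypothetical numbers (not figures from any actual report):

```python
# One "statistical validity" check an outside reviewer can actually run:
# recompute the significance test and effect size from the summary
# statistics a report publishes. All numbers here are hypothetical
# stand-ins, NOT figures from any actual report.
from scipy.stats import ttest_ind_from_stats

# Hypothetical reported summary statistics: mean, SD, N per group.
treat_mean, treat_sd, treat_n = 82.0, 12.0, 55
ctrl_mean, ctrl_sd, ctrl_n = 78.0, 13.0, 52

# Welch's t-test, re-derived straight from the published numbers.
t_stat, p_value = ttest_ind_from_stats(
    treat_mean, treat_sd, treat_n,
    ctrl_mean, ctrl_sd, ctrl_n,
    equal_var=False,
)

# Cohen's d with a pooled standard deviation - the usual effect-size check.
pooled_sd = (
    ((treat_n - 1) * treat_sd**2 + (ctrl_n - 1) * ctrl_sd**2)
    / (treat_n + ctrl_n - 2)
) ** 0.5
cohens_d = (treat_mean - ctrl_mean) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
# If the recomputed p-value or effect size doesn't match what the report
# claims, that's a statistical-validity red flag worth chasing down.
```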
This information changes things dramatically. Did LXD conduct the study, or did they validate a study conducted by other researchers, or did they conduct and validate the study themselves?
I wasn’t able to find the actual studies, as such. They’re not in any peer-reviewed publication. They are, however, linked off of various marketing pages on the 95% Group’s web site. Here’s a link to one that we’ll dissect in a bit. Additionally, I did find this quote (above) on the main page of the LXD web site from Laura Stewart, the Chief Academic Officer of the 95 Percent Group (LinkedIn Profile). Her experience in education seems to come from her career in educational publishing.
Nevertheless, it seems that LXD not only designed the study and coordinated with the schools to run it, but also validated its own work, consulted on the design of the 95% Group’s products, and worked on the marketing. This amazing feat was done with a listed staff of four people. Wow!
Is this the right way to do things, you might ask? There are pros and cons to having the original researchers validate their own study:
Pros:
The researchers have intimate knowledge of the study methodology, instruments, data, and analyses. This allows them to thoroughly examine validity.
It is efficient and avoids the need to get outside validators up to speed on study details.
Researchers validating their own work may be more motivated to conduct a rigorous assessment.
Cons:
Researchers are inherently biased towards their own work and may consciously or unconsciously overlook flaws.
There can be a lack of objectivity and neutrality when reviewing one's own research practices.
Blind spots or assumptions made during the original study may persist during validation.
Outside perspectives can often identify issues or alternative explanations that researchers miss.
Relying solely on original researchers violates basic principles of impartiality and independent verification.
In general, best validation practice involves both original researchers critically self-reflecting on their methods AND bringing in external experts to independently evaluate the study. This balances insider knowledge with unbiased external scrutiny. But original researchers alone cannot provide full authoritative validation of a study's legitimacy. Some degree of third-party assessment is considered essential for robust research validation.
Deconstructing the “evidence”
Back to the study I found. It appears to be a report summarizing the results of an efficacy study of an educational reading intervention. The first section contains a very attractively designed summary, so much so that it can serve as a marketing slick for those who never make it past the first few pages.
Why might one want to read through the whole thing?
First of all, an efficacy report summarizes the results of an efficacy study, which is a type of research study designed to evaluate how effective an intervention, treatment, or program is under ideal and controlled conditions. Efficacy reports describe the study design, methodology, data analysis, and key findings from the efficacy study. They help determine if an intervention can work in an ideal scenario before being tested for real-world effectiveness. Efficacy reports provide evidence of what results are possible from an intervention under optimal circumstances. Is your classroom typical? Would you consider your situation “optimal?”
Unfortunately, the report does not contain any disclosures about research ethics approval (IRB approval), funding sources, conflicts of interest, or the institutional affiliations and relationships of the authors. The report itself provides limited information about who conducted the study, their potential biases, or the sponsorship behind it. But we can assume that LXD produced it because their name is on it. To be fair, IRB involvement is not mandatory - meaning it’s not illegal to conduct such research without an IRB being involved. It’s just highly unusual, and can be quite problematic … as we’ll see in a moment.
Ethical statements, conflict of interest disclosures, and funding source declarations are extremely important in assessing the validity and legitimacy of efficacy reports and scientific research studies in general. A lack of transparency around these issues can significantly undermine credibility.
Here are some key reasons why ethical and conflict of interest declarations matter:
Allows assessment of whether proper research ethics protocols were followed and if the study design was morally appropriate. This is vital when using human subjects in research. We’ll get there in a bit.
Reveals any potential biases, affiliations, or incentives that could influence how the study was conducted and its conclusions.
Establishes if appropriate safeguards were in place to maintain scientific integrity and objectivity.
Provides context on who sponsored/funded the research and their possible stakes in the results.
Discloses any relevant relationships, financial interests, or non-monetary benefits tied to the study.
Ensures participants were treated ethically and gave properly informed consent. Part of the ethics piece that we’ll get to next.
Allows outside reviewers to factor in conflicts when determining validity of the findings.
Whilst the presence of conflicts or funding sources does not inherently make findings invalid, transparency is crucial. The lack of disclosure raises justifiable skepticism. Strong efficacy reports will proactively address ethical considerations and disclose any real or perceived competing interests. So why omit this vital information?
The ethics of testing human subjects
Based on the information provided, the report appears to describe a research study involving the testing of human subjects. Specifically:
The report states the study was conducted across 16 elementary schools, with students in grades K-3 participating.
It mentions random assignment of classrooms within each school to either the treatment or control conditions.
Student achievement on reading assessments is reported as the main outcome measure.
Sample sizes and demographic data for students in the study are provided.
The methodology describes an experiment involving manipulation of classroom reading instruction and materials for the treatment group, compared to regular instruction for the control group.
So human subjects, specifically elementary school students, were involved in an experimental intervention where they were organized into groups, given particular instructional materials/methods, and their learning outcomes measured.
Since human participants were directly intervened with and studied, this constitutes human subjects research according to standard definitions. Best practices for human subjects research include securing proper ethical approval and oversight, informed consent/assent from participants, and disclosing the processes followed to protect subjects’ rights and welfare. However, the report does not mention if any such ethical procedures were undertaken. It mentions nothing about an Institutional Review Board (IRB) signing off on the research. (Neither could I find any mention of LXD working with an IRB on their web page.) As a side note, I’m the current Chair of the IRB at Towcester Abbey. I’m trained and qualified to serve in that role and the Abbey’s IRB, an independent IRB, is registered with US HHS (#IORG0009727).
An IRB is an independent ethics committee that reviews and oversees research involving human subjects. IRB involvement and oversight would be important for a study like this for the following reasons:
First, IRBs ensure that studies adhere to ethical principles and regulatory standards for human subjects research. They require that researchers minimize risks and obtain informed consent from participants. IRB review is a cornerstone of protecting subjects’ rights, welfare, and privacy. This is where studies conducted in schools often fail the first steps of IRB review. They wrongly assume that general consent forms - like those for filming classrooms or taking field trips - would cover human subjects research. Hint: they don’t.
Second, IRBs assess whether the potential benefits of a study outweigh any risks. They can require modifications to study design to further reduce risks to subjects. For an intervention study like this one involving minors in schools, an IRB could provide valuable guidance on ethical implementation. Consider this question: would parents consent to having their children land in the control group, where they might receive an inferior education? That’s essentially what the study has done. One group gets the amazing new intervention and the other gets the old, unproven one.
Third, IRBs examine participant recruitment processes and materials to guarantee there is no coercion or undue influence. For example, what happens to students when parents opt them out of the study? This independent scrutiny promotes voluntariness and social justice. IRB oversight could be especially important given this study’s school setting.
Finally, IRB approval and supervision signals to the public, researchers, and regulators that a study underwent an impartial ethics review. This can increase trustworthiness and credibility. Mandatory registration of IRB-approved studies also facilitates transparency.
If a human subjects research study report lacks information about ethics oversight, IRB involvement, and disclosure of conflicts of interest, the public could reasonably reach some potentially concerning conclusions:
The researchers may have cut corners on ethics and failed to obtain proper approvals or fully inform subjects. This could violate research norms and put participants (young school children) at increased risk.
There was something to hide regarding ethics procedures or the researchers' affiliations, so this information was omitted. Secrecy surrounding a study breeds distrust.
The researchers had already decided on the study outcome they wanted to show and did not want ethics oversight that could hinder this agenda. A lack of impartial oversight is a red flag.
The study was funded by an organization with financial interests in the results, which could bias the design, execution, and reporting to favour the funder. Disclosing funding sources is standard practice to evaluate this.
Reported results and conclusions may be exaggerated, distorted or unreliable without independent ethics monitoring and complete transparency from the researchers themselves.
Cutting ethical corners would reflect poorly on the integrity of the researchers. Combined with lack of details provided about the authors, it becomes hard to evaluate their credibility.
Indeed, lack of ethical disclosures could lead the public to question the legitimacy of the study and truthfulness of its conclusions. It violates expected transparency practices that allow properly assessing the validity of research. This could undermine trust in the work based on the omissions. It certainly did for me.
Professional Development Day
With all of this in mind, I now felt prepared for my professional development day. It was not anything that could be called “training” or “instruction.” It was a sales presentation. The presenter did not check for understanding or call for questions. The “you do” time was focused on using their toolset, which we would have to buy if we wanted to run the suggested program at our schools.
I had to constantly remind myself that District PDs are monologues, not dialogues. I had to curb my desire to speak and ask questions. I had to centre myself so intently that I became entirely dysregulated.
At the end of the event, we were grouped according to the age of the students we teach. High school teachers gathered at the front of the room. We stared at each other, puzzled and confused. Someone asked the obvious: did anyone hear anything that remotely sounded helpful for high school students? The answer around the group was a resounding NO.
Towards the end of what became a gripe session, the presenter came over to ask us if we had any questions. One teacher, who, like me, has a Developmental Maths class, asked if there are studies showing that the proposed programs are effective with dyslexic students.
To our combined horror, the presenter began by saying that “many teachers note success with their dyslexic students …” At this point, I couldn’t take it any more. I clarified the teacher’s question: does the company have any ESSA-ready studies done on dyslexic students? (I already knew the answer.) When the presenter began to prevaricate, I answered for her. I let her know that I had deep-dived their website and found none for dyslexics, gestalt processors, or any other “non-standard” populations.
Why couldn’t the presenter just say, “No, we don’t”? Likely because she was there to generate sales, and “no” doesn’t lead to sales activity. Instead, the semi-scripted response invites the teacher to investigate the intervention for herself - to join the bandwagon.
Conclusions
I’ve been very careful in my language here. I’ve provided links to the claims that I’ve made. You can verify the statements I’ve made. I encourage you to do so. Please know that the opinions presented here are my own, and do not represent any of my employers, either past or present. They’re simply a recounting of my experiences as I engaged with the processes described. Again, do your own homework and verify my opinions and claims to your own satisfaction.
So, ouch. The cognitive dissonance that comes from finding out such horror, that the Science of Reading is just a marketing slogan supported by the bandwagon effect, is quite jarring and dysregulating. I knew it was unscientific. I’ve posted about it previously. But I didn’t quite realize how bad it was until this week.
In the comments below, let me know how you feel after reading this far. Would you allow your children to participate in such research? How would you know if their school is doing such things in their classrooms?