Amanda Spielman – 2023 Speech to the University of Oxford’s Department of Education

The speech made by Amanda Spielman, the Chief Inspector of Ofsted, on 18 January 2023.

So I have been asked to talk today about the use of research evidence in education and I’m going to talk mainly about how Ofsted uses research, but I am also going to be talking about its wider use in the education sector.

Overall, I think there is a tremendous amount for the sector to be proud of: England is really ahead of many countries in harnessing research effectively in education. And Ofsted has clearly been part of that movement in recent years.

I must declare at the outset that I am not myself an education researcher. But I have now spent more than 20 years in education, and in all of that time I have been working in different contexts to make good use of available evidence, and to encourage others to do the same, and have made sure that at Ofsted we now have the capacity to do that well.

And of course, we have several big stakes in good use of research evidence.

First, we want to ground our inspection approach as securely as we can in evidence about education itself.

In this way inspections can encourage schools (and of course nurseries, colleges and the other entities we inspect) to align their models and practices with what is already known about quality. That is a big part of being a force for improvement.

Secondly, we aim to build and iterate inspection models that achieve the intended purposes with sufficient validity and reliability and minimal unintended consequences. Of course, we don’t have total freedom here: we have to work within our statutory framework and within the policy constraints that are set by government, including funding. So that’s 2 stakes.

The third stake is that the aggregation of the evidence we collect in doing our work, together with the related research work that we carry out, makes us a generator of research evidence for others’ benefit, as well as a user.

And of course, we are just one part of a wider landscape. Much excellent work has been carried out in universities like this one [the University of Oxford] over many years; the Education Endowment Foundation (EEF) has become part of the national network of What Works centres; and many other institutes and bodies do significant work.

And that brings me to a fourth strand, which links back to the first. Many bodies act as intermediaries, translating complex maps of academic evidence into reports and summaries that can be more immediately useful to practitioners. And this is not of itself a core Ofsted activity, but we know that it is one of the ways that our products are used.

Curriculum reviews

For instance, over the last 2 years, we have drawn up and published a series of curriculum reviews. These offer a researched conception of what we consider to be a high-quality education, by subject and by phase. They help translate our researched framework into subjects and phases. And they provide a platform for inspector training in judging curriculum quality.

(And of course, if we are to be consistent as an inspectorate, we must have a shared conception of what constitutes quality. If you ask people to judge quality in the absence of a clear corporate statement, they will inevitably bring their own views to bear: and of course, individual views will always vary to some extent.)

But we also know that schools draw extensively on these reviews to develop their curriculums. They have been downloaded many hundreds of thousands of times. I believe this shows a tremendous appetite for engagement with educational research, as well as an understandable desire to gain some insight into Ofsted’s approach.

But of course, there is no comprehensive and definitive version of educational truth. There is much that is well established, and much that is not. New evidence and insights can cast doubt on or discredit previously accepted wisdom. I’ll come back to the difficulties this creates a bit later.

But children’s lives cannot be put on hold. So neither schools nor we can down tools, to wait for a pot of fairy gold at the end of an evidential rainbow. We must work with what is available, and what is most relevant to our work, while recognising that we will always have to iterate in the light of new developments.

How Ofsted works

I think this is a good moment to explain just a little more about Ofsted.

In many ways we [Ofsted] operate as you would expect. The principles of good inspection and regulation are straightforward: proportionality, accountability, consistency, transparency and targeting. These are the Hampton principles, and they are deeply embedded in our frameworks and handbooks.

But how does an inspectorate work?

I think we operate to a fairly standard model.

Our frameworks and handbooks are the policy instruments. They are powerful levers on the education sector, and they exert influence long before an inspector comes through the door.

The inspection process itself is designed around professional dialogue. It is intended to help schools improve – and our post-inspection surveys do find that, in most cases, it does.

At the end of most inspections, we make judgements, for overall effectiveness and for several component judgements. They give parents, responsible bodies and government a clear statement about the overall performance of the institution.

We also publish inspection reports, describing what is being done well and what needs to improve.

We inspect at the level of the individual school and other institutions, but to report only at this level would be a tremendous waste of evidence and insight. So we have a strand that is responsible for drawing out the insights from the aggregation of our evidence, and for additional research where needed to supplement this, and also to run our evaluation programme.

In fact, there are 3 distinct flows here.

One is the dissemination programme, which includes the curriculum reviews I just talked about, thematic reviews and other research, such as reports recently commissioned by the DfE on tutoring and on T Levels. These are intended mainly for policymakers and for the education sector.

One flow is back into our frameworks and handbooks.

And the final flow is back into our inspection processes, including inspector training and quality assurance.

And of course, we are informed by the work of institutions in all this – we do not exist in a bubble.

What inspection is, and is not

And I want to take a couple of minutes to remind us of a broader question: what are the purposes of inspection?

I believe there are 3 main purposes for inspection today that are relevant for the area of research. These sit in the context of a long-standing government policy that puts responsibility for diagnosis with Ofsted, but locates responsibility for treatment and support with schools themselves and with the regions group at the Department for Education (DfE). (This policy is often misunderstood by people who would like us to function primarily as a support mechanism.)

So, what are those purposes?

First, inspections provide information and assurance to parents. Ofsted was created in the early 90s in the context of the Parent’s Charter.

Secondly, they inform central and local government and other controllers of schools. Given the independence of our judgements, they provide a legitimate basis for action by others when it’s needed. And they also signal excellence that others can learn from.

And then, thirdly, they can and should be of value to the people at the receiving end: to teachers and heads. This is true even when inspection is limited to diagnosis. I would be deviating too far from my subject today if I went into the reasons why, but this is a matter of tremendous importance to me.

Case study: the education inspection framework (EIF)

So I am going to take as a case study the development of our main education inspection framework, the EIF. It had to meet those purposes: they are largely defined by government. But we do have flexibility in how we go about meeting these purposes.

And we aim to ground all our work in research evidence and to operate as transparently as possible.

So we took time and care to develop the framework iteratively over 2 years.

To prepare, we reviewed a wide range of research, from many universities, from the Education Endowment Foundation, from the Department for Education, and from other sources. We summarised what we drew on in a review that was published to provide transparency, both as to the evidence we used and our interpretation of that evidence. This gave the framework additional credibility and showed the thought, attention and range of views that fed into its development.

And we also did some substantial work on the state of curricula in both primary and secondary schools, which was itself informed by research in cognitive psychology. This is an important body of knowledge that wasn’t always being drawn on.

The first phase of our curriculum research found systemic weaknesses in curriculum approach and design across much of the sector.

In the second phase we studied a sample of schools that had curriculum thinking and development embedded in their approach.

The third phase tested a model of inspecting curriculum, based on our findings. This confirmed much of what we found in the first 2 phases and also allowed us to explore some potential curriculum indicators, some evidence collection methods, and also the practical limitations of inspections. And we were also able to test our ability to discern strength from weakness in curriculum development and application.

All of this evidence gathering, research, consultation, evaluation, iterative development and testing resulted in the most evidenced framework that Ofsted has ever produced. The EIF is built around a strong and well-warranted construct of what good education is. And it is built around the importance of curriculum: the real substance of education.

And I have talked before about the substance and purpose of education. It does need to prepare young people for life and work, but that is not all. It must also be about broadening their minds and horizons. It should give them the tools to make their communities and the world better places to live in. And it should allow them to contribute to society and the advancement of civilisation, not just the labour market.

The EIF is broad enough to recognise all of these purposes of education. And that is why it firmly promotes a full and rich conception of knowledge, not a narrow and reductive one.

The EIF and the sector-specific handbooks now underpin all the education inspections we do. They help us to assess the quality of education a service provides.

I will add that there has been considerable interest from overseas education ministries and inspectorates in the EIF, and in how we developed it. As far as we know, it really is the first education inspection framework to be developed in this way.

Area SEND framework development

To develop the EIF, we had a wealth of research and findings to draw on. But that is not always the case. Sometimes, we have to develop iteratively in the light of experience, bringing in such evidence as is available.

I thought I’d talk briefly about our new framework for special needs inspections, as a quick contrast. These inspections review the effectiveness of all the relevant agencies in providing joined-up special educational needs and/or disabilities (SEND) services in a local area. Surprisingly, there is very little research evidence to draw on for this.

In planning a successor to our first framework, we recognised the important work and lessons from the first set of inspections, but we did also see room for improvement.

We’d already identified recurring weaknesses, flaws and delays in the identification of children’s needs. We had also often found a lack of clarity about who is responsible for what, between the various organisations involved.

We also listened to a lot of feedback from children, young people and their families, from people working in all kinds of SEND and related services, and from the many organisations that support children and young people with SEND as well as representative bodies.

We combined the inspection analysis with the feedback from the various strands of engagement. That enabled us to develop and refine our new proposals. These proposals or aspects of them were then tested through discussions and a set of pilot inspections. (Piloting is a very powerful tool for us.)

All of this led to a new approach with 9 proposals for improvement, which we consulted on last summer. Happily, we found strong support for all the proposals, which increased our confidence in the direction. Respondents also provided valuable comments and suggestions that led to some changes and clarifications in the draft framework and handbook.

In summary, we started by building on our existing framework and inspection programme. We incorporated our analysis, feedback and engagement. We tested our new proposals. We consulted on them – and all of this fed into the framework. We think we have created an approach that will improve outcomes for pupils with SEND, help families navigate a complex and sometimes adversarial system, and strengthen accountability by clarifying where responsibility for improvement lies.

I think it’s a good example of how to develop a framework in a less evidence-rich environment.


Evaluation

The next thing I want to talk about is evaluation.

These case studies illustrate how we draw on established research and generate research to design our models, in the light of both well-developed and under-developed bodies of research.

But we also need to know whether our frameworks and methodologies are being implemented as intended and having the effects we expect. We therefore have a programme of evaluation work. When we do this, we make a contribution to the body of professional knowledge about inspection. But, significantly for us, the evaluation work completes a positive feedback loop. We harness those findings and then use them in refining our process, our handbooks and our frameworks.

One important example of how we evaluate is by using research methods to establish how reliable inspections are. Our frameworks and handbooks clearly outline what we focus on in inspection, and what we consider to be of high quality. So inspector judgement is, from the very start, focused on a construct that’s transparent to all through our handbooks. Our inspectors are there to apply the framework, not to apply their own individual ideas of what good looks like.

Beyond our routine quality assurance activities, we have conducted studies of the inter-rater reliability of inspector judgements. In other words: do 2 inspectors come to the same judgement? We saw high levels of agreement in the results.

Taken together, our quality assurance work and reliability studies all feed back into the continuing development of our frameworks and handbooks.

The limits on consistency

And I want to talk a bit more, actually, about the concept of consistency of inspection judgements. Those of you here who, like Michelle Meadows and Jo-Anne Baird, are experts in educational assessment will immediately recognise the issue of reliability, with all its counter-intuitive complexities.

School inspection is of course a process of human judgement. It complements various other measurement processes, including exams and testing and also many other kinds of measurement, such as attendance reporting. Judgements of overall effectiveness are composite judgements reflecting many aspects of performance.

Now the reliability of human judgement processes has been studied in contexts in and beyond education. Michelle’s 2005 review of the literature on marking reliability was something I read early in my time at Ofqual, and gave me really valuable insight into the strengths and limitations of human judgement.

For me, there are 2 particularly important lessons that come from that literature. First, that ‘perfect’ reliability is unlikely to be achievable. And secondly, that improving reliability often comes at the price of sacrificing some validity. The narrower the construct you choose to assess, the more precisely you can assess it, at least in theory. But the narrower the construct, the less valuable the assessment is likely to be in practice.

And as you all know, national expectations of schools and other education institutions are broad. There is a democratic consensus that compulsory education should extend far beyond minimum competence in maths and literacy, that it should encompass wider personal development on many fronts as well as academic study, and that schools should have responsibilities for safeguarding children.

This means that the ‘overall effectiveness’ that we are required to judge is, and is likely to remain, a broad construct. The corollary of this is that so-called ‘perfect’ reliability is not achievable.

We accept this in many other areas of life, though perhaps without pausing to think a great deal about it. Driving test examiners; judges passing sentence in courts; judges in an Olympic sporting event; I am sure you can think of other examples where we accept that there will be some level of human variation. (The Eurovision Song Contest is an example of where the divergence between markers is so extreme as to suggest that they may not all be assessing the same construct.)

And in fact one of the reasons that inspection continues to exist is precisely because we all recognise that data measures alone cannot carry the entire weight of measuring quality. And there can be unintended consequences of putting too much weight on data outcomes alone: there can be unhealthy backwash, for children and adults alike. So looking under the bonnet, at how outcomes are being achieved, has real value.

There will therefore always be a degree of variability that cannot be engineered out of inspection, and where we could do more harm than good if we tried.

But of course, we take consistency very seriously. We design the framework with great care, to be clear, structured and unambiguous. We design inspection processes with great care. We put a great deal of effort into recruiting and training our inspectors, when they join, in their early months and throughout their time with us. We have many quality assurance processes, covering all aspects of the process and also our reporting. And we have many sources of feedback: post-inspection surveys, complaints, our evaluation work, as well as regular interaction with sector representative bodies. All of this is used to keep on improving our work.

Proactive research

But our research isn’t only about developing and improving Ofsted’s regular work. We publish a lot that faces the outside world.

Some of this is relatively straightforward aggregated information: we produce official statistics, including inspection outcome data, and publications such as our annual children’s social care survey.

We also aggregate, analyse and disseminate evidence that we collect through our routine work, to produce our annual report and other publications.

And we do more than just secondary analysis of inspection and regulatory evidence. We also conduct primary research where we need to supplement what we can learn directly from inspection.

Our body of work on pandemic recovery was a significant recent contribution. We recognised that we were particularly well-placed to report on the continuing challenges schools and children faced as education gradually returned to normal. We do have unparalleled access to thousands of children and professionals.

We saw the effects of the pandemic and restrictions on children: on their academic progress but also on their physical, social and emotional development. And for a minority of children, being out of the line of teachers’ sight had harmful consequences.

We saw the efforts that have been made, and are still being made, to accelerate children’s learning and wider development and to address those harms. Collating, aggregating and evaluating what we found gave valuable insights.

We reported on a live, shifting situation, publishing dozens of rapid reports, briefing notes and commentaries from September 2020 onwards. Our reports and the speed of their publication helped everyone understand what was happening. Our insight was crucial in making sure that policymakers understood the continuing challenges and it helped us highlight the good or innovative practice that others could learn from. We also reported on poorer practice and on how we would expect schools and other providers to improve.

And professionals in all sectors have told us that our research accurately reflected their experience of the pandemic and post-pandemic periods. We know that we were one of the few bodies doing early research on this. And there was international interest in our work – it was picked up in places like Portugal and South Korea, for example, as well as by other European inspectorates. And I think this showed both its importance and the scarcity of credible research on education during the pandemic.

This work made us very aware of the difficulties in schools, colleges and nurseries, at every level, from those working directly with children, all the way through to their leaders.

It also gave us a strong basis for our decision to return to inspection, confident that we had the right level of understanding of the continuing challenges. It helped us to frame the right expectations, suitably high but still realistic. We wanted to see high ambition and support to help children make up for lost time. But our judgements needed to be fair in this context.

And it is worth noting that the flexibility designed into the EIF allowed us to do this within the existing framework. The previous framework would not have been able to adapt in the same way. We would have needed a new temporary framework – something that professionals in the sector clearly told us that they did not want. The sector had spent time contributing to the development of the EIF, and then in understanding and embedding it. Sector feedback was very clearly in favour of sticking with the framework, suitably applied.

We’re also examining other trends in education and social care, bringing our unique position and reach to bear for the benefit of children and learners. We have researched, for example, how local authorities plan for sufficient accommodation and services for children in care; how alternative provision for primary-age pupils is being used; and how secondary schools are supporting struggling readers.


Commissioned research

Much of our research work is commissioned by government. One example is our work on tutoring, the first phase of which was published last year. This was based on visits to 63 schools to explore their tuition strategies and how well they had integrated tuition with their core education programmes. The aim was to report on the progress and, to the extent possible, the effectiveness of the National Tutoring Programme, on which the government is spending £1 billion.

We found some good use of tutoring, but also that quality varied greatly depending on the school and the tutoring provider. And we also found limited understanding of the effectiveness of tutoring. Used well and properly integrated, tutoring can be a huge help to pupils who fall behind, but it is a very expensive intervention. It therefore needs to have a big enough impact to justify its cost.

There are obvious difficulties with assessing impact. Getting a handle on the effectiveness of tutoring at the level of the individual child or the school is always going to be problematic: how do you attribute progress as between classroom teaching and tutoring? It may be possible where tutoring is very targeted at specific topics or areas of the curriculum. But expectations here do need to be realistic.

Our reviews are already helping the government develop the tuition programme, and helping schools and colleges to implement and integrate tutoring better. The second phase of our research, which is currently in the field, will explore how schools are adapting and applying the programme after a year’s experience.

Policy evaluation

Some of our work is characterised as policy evaluation. One recent example was the exemption of outstanding schools from inspection.

We have now reported on the first year of inspections of previously exempt schools since the exemption was lifted. Most schools inspected were no longer outstanding, and over a fifth dropped to requires improvement or inadequate. These were generally the schools that had gone longest without inspection, typically around 13 years. And we have also set a somewhat higher bar for the outstanding grade in the EIF, so no-one should over-interpret this data. But nonetheless, we can now see that the policy expectation of continuing improvement in the absence of inspection was not realised.

We will be publishing a further report on this strand of inspection later this spring, including an analysis of the weaknesses that have been found in formerly outstanding schools that have been judged RI or inadequate.

Research for practitioners

Our research doesn’t just provide recommendations or suggest improvements for policymakers though. We also publish research reports and reviews for the education sector: for early years, schools and post-16, from the viewpoint of our inspection framework.

For example, we recently published our ‘Best start in life’ research review, which examines the factors that contribute to a high-quality early education. The review drew on a range of sources, including academic and policy literature.

That was the first in a series of reports on early education. We identified some of the features that high-quality early years curriculum and pedagogy may have. What were these features? A curriculum that considers what all children should learn; practitioners who choose activities and experiences after they have determined the curriculum; and adults who think carefully about what children already know, teach them what they need to know, and broaden their interests.

It was the latest in the series of research reviews we have published since early 2021 – I mentioned the school curriculum reviews earlier.

I think this might be a good moment to pick up on the issue of challenge and contest in education research. Some of our work is in areas where there is little that is contested. But much of it, like so many domains of knowledge, is in areas that are highly contested. And this is certainly true of much of the curriculum.

I can remember a previous Ofqual research director, Michelle’s predecessor, a man with a very long memory, telling me that in successive rounds of qualification reform, the 2 subjects that have always been hardest to finalise have been religious studies and mathematics, where the divergence of views among academic subject experts is, perhaps surprisingly to those outside the mathematics world, particularly wide. I also remember hearing that in the most recent round of reforms, disagreements between members in another subject expert group were so profound that tears were shed in a group meeting.

It is therefore entirely unsurprising that our work attracts hostility from some quarters. I think this tends to reflect those wider continuing disputes.

As we said in the principles paper which we published ahead of the curriculum reviews:

Educational research is contestable and contested, and so are documents such as these research reviews. Therefore, we are sharing our thinking with subject communities so that we can get input from the broader subject community. We hope that publishing our evidence base for how we have developed our understanding of subject quality will provide insight, both on what evidence we have used and on how we have interpreted that evidence when creating research criteria for our subject reports.

Each curriculum review collates relevant research evidence, but they are not intended to be all-embracing papers covering the entirety of academic thought on a subject. That is not our job, and it would not be a responsible use of our time and resources. Instead, their primary purpose is to lay out the evidence base for the kind of subject education that our frameworks reward as high quality. They give a broad foundation for the judgements that we make.

While it is not their primary purpose, we do also hope that they will help subject leaders in their curriculum planning. The reviews are not narrowly prescriptive but offer what appear to be reliable general principles that schools can then apply intelligently. They are also not overly restrictive: each review lays out only the possible features of a high-quality education, without claiming that these are the only features. The enormous popularity with schools, of both the reports and of the related webinars that we offer, is an encouraging indicator that they are indeed helpful.

And we have also heard how helpful schools have found having reviews across the set of subjects. Schools are really appreciating the exploration of the nature of a high quality curriculum across subjects, including computing, PE, music and so on. These research reviews fill a vacuum because in some subjects, curriculum (as opposed to pedagogical approaches) has not been a significant focus of other work. Subject and senior leaders regularly share their appreciation of our work, which gives them guidance across a range of subjects.

And of course, this will in turn contribute to improving the quality of education, raising standards for all children.

How the sector uses research

In exploring the place and function of research evidence in educational policy and practice, it is also interesting to reflect on how the sectors we inspect themselves use research.

On the one hand, there is a very positive picture, with much to be optimistic about. We know that many teachers see being reflective practitioners and researching practice as part of their professional identity. Teachers and other practitioners draw on EEF toolkits and summaries, for example, and apply them in their everyday practice. All this is helping to eliminate some of the perhaps fashionable fads and follies of the past.

Twinned with our focus on subject education in the EIF, there’s also been a renewed interest in subject-based research. This development, in particular, really helpfully bridges academic departments within universities with classroom subject teaching in different phases of education. And teachers write about these things, blog about them, and exchange their knowledge at practitioner conferences such as ResearchEd.

And the aroma of that interest has drifted upwards – out of the classroom – to school leaders who, because of their leadership of the curriculum, are developing their subject research knowledge about how best to sustain and develop school subjects. In this way, I think we have contributed to an intellectual resurgence in school leadership. And I think this really is a tremendous thing, to awaken intellectual curiosity at all levels of educational institutions.

But, on the other hand, this brings complexity. As you all know, navigating research is not without its difficulties. The sheer range of research and evidence in a domain as large as education is daunting: some research is not empirical; other research is, using qualitative and quantitative methods. Discerning strength, weakness, relevance and applicability in research requires professional judgement. And without this, cargo cults and lethal mutations can emerge.

What I do think would be helpful now is a clearer overall architecture that recognises and values all the parts of the system that generate educational research and evidence, including the entities that are translating research into usable products for practitioners, and the tools to navigate it. And it would also be helpful to have a clearer medium-term focus on building consensus through research.


Now, this evening, I have concentrated mainly on how Ofsted uses research. What I really wanted to make clear is that research isn’t just one part of what we do, it is a part of everything we do.

It informs our day-to-day work, our frameworks and handbooks, and our overall approach. It helps us strive to be better, and to inspire improvement in the sectors we work in. And it lets us share what we know with government and with practitioners so that they can make informed decisions.

And I hope that you will take this talk and our wider approach as showing how much we value the work that happens in this and in many other universities, here and abroad, as well as in smaller specialist institutions. I believe that you and the whole education sector benefit from this renewed intellectual energy, which is being harnessed so constructively in so many places. I’m fortunate to have been in positions over the last 20 years where I have been able to promote this healthy development.

And with that, I’d be happy to take your questions. I have brought along 2 colleagues today: Alex Jones, who is our Director of Insights and Research, and Richard Kueh, acting Deputy director for Research and Evaluation, who was previously the religious education lead in our curriculum unit and author of our RE curriculum review.

Thank you.