The European home of Assessment (E-ATP) | Berlin, Germany | 23rd-25th October | Post-Conference review | cApStAn
This year’s European Conference of the Association of Test Publishers (E-ATP) was set in the vibrant city of Berlin. The conference theme was “Assessing Tomorrow—Shaping the Future Together”. Was the content aligned with the conference theme? Both linguists and psychometricians are used to responding “it depends” to every question, and I shall make no exception to this rule. My colleague Anubhav Nathani and I used every available time slot to attend sessions and every break to network with our peers, our partners, our competitors and our friends, and I can’t say that there was a coherent attitude toward the future of testing. There are the techno-optimists, such as Donald Clark, who delivered the opening keynote, titled “AI changes everything”. I have been following his Donald Clark Plan B blog for some time, so it came as no surprise that he finds us all too slow to embrace AI and too stately to overhaul education and assessment, now that AI has equipped us with the tools to do so. Donald makes bold claims about generative AI’s capacity to provide feedback to learners and effectively help them.
In a presentation by Patrick Coates, we could see the outcome of a rigorous comparison between zero-shot prompting, an agentic approach (with one AI agent tasked with reviewing the items it generated), and a RAG approach (retrieval-augmented generation, in this case without retrieval). It was a clear and edifying session, with data and a robust methodology to underpin the findings.
In every second session, we heard that AI needs to be supervised by humans and that the output of large language models is often unreliable. In these presentations, the human expert in the loop is allegedly the cornerstone of a discerning use of generative AI. However, there were also stats and voices to make us suspect that humans are the weak link in an AI-driven workflow. Humans are notoriously poor at coherent and consistent oversight (of machines). Paul Edelblut, whose organisation has been using algorithms to score essays automatically for over two decades, argues (and uses solid data to demonstrate) that automated scoring is more reliable and more consistent than human scoring.
A comprehensive presentation on the assessment landscape in Germany’s education sector by Ulrich Schulze-Althoff reported numerous challenges, a huge investment to make schools in precarious locations more autonomous, and a tentative upward trend in formative assessment. The opportunities are there, but nothing will change overnight in a Federal Republic where each Land has its own policy and where teachers’ unions are a powerful force. Shaping the future together will still take some doing, but the ripple effect of successful implementation of data-driven diagnostics and formative assessment in Hamburg, for example, may accelerate adoption.
The two panel sessions I personally contributed to were both very well attended and elicited the participation of a knowledgeable audience. In Leveraging AI to Transform Learning: Insights from the Product Development Journey, moderated by Ada Woo, Sara Vispoel, Rory McCorkle, and I shared the floor and our experience to dispel some myths, sing the praises of principled design and discuss the crucial milestones in an AI roadmap. In Diverse Voices, Fair Assessments—Navigating Cultural and Linguistic Variations in Test Creation and Administration, Cicek Svensson, Nikki Eatchel, and I looked at the different components of the assessment life cycle in which diversity comes into play. The audience chimed in and the discussion was lively.
In a nutshell: for some of the participating organisations, the integration of AI in product development has been more spectacular on PowerPoint slides than in real life, and the need for a Responsible AI policy has emerged as the first priority. For others, steady progress has been made, but roadblocks such as managing client expectations, navigating privacy issues or understanding regulations have slowed the pace. The closing keynote was a healthy, constructive, interactive panel session where interesting questions were raised.
Thank you to Chelsea Dowd and the Conference Committee for a successful event!
~ Steve Dept, Managing Founder