Tag Archives: assessment

BTL Surpass for online assessment in Computer Science

Over the last couple of years I have been leading the introduction of BTL’s Surpass online assessment platform for exams in Computer Science. A few months ago I posted the requirements for an online exam system that we agreed on. I have now written up an evaluation case study: Use of BTL Surpass for online exams in Computer Science, an LTDI report (local copy). TL;DR: nothing is perfect, but Surpass did what we hoped, and the plan is to continue and expand its use.

My colleague Hans-Wolfgang has also presented on our experiences of “Enhancing the Learning Experience on Programming-focused Courses via Electronic Assessment Tools” at the Trends in Functional Programming in Education Conference, Canterbury, 19-21. This paper includes work by Sanusi Usman on using Surpass for formative assessment.

A fill-the-blanks style question for online exams in computer coding, showing a few lines of Java code with gaps for the student to complete. (Not from a real exam!)

Requirements for online exam system

Some time back we started looking for an online exam system for some of our computer science exams. Part of the process was to list a set of “acceptance criteria”, i.e. conditions that any system we looked at had to meet. One of my aims in writing these was to avoid chasing after some mythical ‘perfect’ system and to focus on finding one that would meet our needs. Although the headings below differ, as a system for high-stakes assessment the overarching requirements were security, reliability and scalability, which are reflected below.

Having these criteria was useful in reaching a consensus decision when there was no ‘perfect’ system.

Security:

  • Only authorised staff (+ external examiners) to have access before exam time.
  • Only authorised staff and students to have access during exams.
  • Only authorised staff (+ external examiners) to have access to results.
  • Authorised staff and external examiners to have only the level of access they need, no more.
  • Software must be kept up to date and patched in a timely fashion.
  • Must track and report all access attempts.
  • Must not rely on security by obscurity.
  • Secure access must not depend on location.

Audit:

  • Provide suitable access to internal checkers and external examiners.
  • Logging of changes to questions and exams would be desirable.
  • It must be possible to set a point after which exams cannot be changed (e.g. once they are passed by checkers).
  • Must be able to check marking (either the exam setter or another individual), i.e. provide clear reports on how each question was answered by each candidate.
  • Must be possible to adjust marking/remark if an error is found after the exam (e.g. if a mistake was made in setting the correct option for an MCQ, or if a question was found to be ambiguous or too hard).

Pedagogy:

  • Must be possible to reproduce content of previous CS electronic exams in a similar or better format [this one turned out not to be important].
  • Must be able to decide how many points to assign to each question.
  • Desirable to have provision for alternate answers or insignificant differences in answers (e.g. y=a*b, y=b*a).
  • Desirable to reproduce the style of standard HW CS exam papers, i.e. four potentially multipart questions, with the student able to choose which 3 to answer.
  • Desirable to be able to provide access to past papers on a formative basis.
  • Desirable to support formative assessment with feedback to students.
  • Must be able to remove access to past papers if necessary.
  • Students should be able to practice with the same (or a very similar) system prior to the exam.
  • Desirable to be able to open up access to a controlled list of websites and tools (c.f. open book exams).
  • Should be able to use mathematical symbols in questions and answers, including student-entered text answers.

Operational:

  • Desirable to have programmatic transfer of staff information to assessment system (i.e. to know who has what role for each exam)
  • Must be able to transfer student information from student information system to assessment system (who sits which exam and at which campus).
  • Desirable to be able to transfer study requirements from student information system to assessment system (e.g. who gets extra time in exams)
  • Programmatic transfer of student results from the assessment system to student record systems or the VLE (one of these is required); a sketch of this kind of transfer follows this list.
  • Desirable to support import/export of tests via QTI.
  • Integration with VLE for access to past papers, mock exams, formative assessment in general (e.g. IMS LTI)
  • Hardware & software requirements for test taking must be compatible with PCs we have (at all campuses and distance learning partners).
  • Set up requirements for labs in which assessments are taken must be within capabilities of available technical staff at relevant centre (at all campuses and distance learning partners).
  • Lab infrastructure* and servers must be able to operate under load of full class logging in simultaneously (* at all campuses and distance learning partners)
  • Must have adequate paper backup at all stages, at all locations.
  • Must be provision for study support arrangements in exams (e.g. extra time for some students).
  • Need to know whether there is secure API access to responses.
  • API documentation must be open and response formats open and flexible.
  • Require support helpline / forum / community.
  • Timing of release of encryption key
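
By way of illustration, here is a minimal sketch in Python of the kind of programmatic results transfer meant above. It is not based on any particular vendor's API: the endpoint, token, response fields and CSV layout are all assumptions made up for the example.

```python
# Hypothetical sketch: pull per-candidate marks from an assessment-system REST API
# and write a CSV that a VLE gradebook or student record system could import.
# The endpoint, token and field names below are assumptions, not a real vendor API.
import csv
import requests

API_BASE = "https://assessment.example.ac.uk/api"  # hypothetical endpoint
TOKEN = "replace-with-a-real-token"                # hypothetical credential


def fetch_results(exam_id):
    """Fetch results for one exam; assumes a JSON body like
    {"results": [{"student_id": "H001", "mark": 57}, ...]}."""
    resp = requests.get(
        f"{API_BASE}/exams/{exam_id}/results",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]


def write_vle_import(results, path):
    """Write a two-column CSV (student_id, mark) for bulk import."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["student_id", "mark"])
        for r in results:
            writer.writerow([r["student_id"], r["mark"]])


if __name__ == "__main__":
    write_vle_import(fetch_results("example-exam-id"), "exam_marks.csv")
```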

Other:

  • Costs: clarify how many students would be involved and what this would cost.

 

Quick notes: Ian Pirie on assessment

Ian Pirie, Assistant Principal for Learning Developments at the University of Edinburgh, came out to Heriot-Watt yesterday to talk about some assessment and feedback initiatives at UoE. The background ideas motivating what they have been doing are not new, and Ian didn’t say that they were: they centre around the pedagogy of assessment and feedback as learning, and the generally low student satisfaction relating to feedback shown through the USS. Ian did make a very compelling argument about the focus of assessment: he asked whether we thought the point of assessment was

  1. to ensure standards are maintained [e.g. only the best will pass]
  2. to show what students have learnt,
    or
  3. to help students learn.

The responses from the room were split 2:1 between answers 2 and 3, showing progress away from the exam-as-a-hurdle model of assessment. Ian’s excellent point was that if you design your assessment to help students learn, which means doing things like making sure your assessments address the right objectives, that the students understand these learning objectives and criteria, and that they get feedback which is useful to them, then you will also address points 2 and 1.

Ideas I found interesting from the initiatives at UoE included:

  • Having students describe learning objectives in their own words, to check they understand them (or at least have read them).
  • Giving students verbal feedback and having them write it up themselves (for the same reason). Don’t give students their mark until they have done this: that means they won’t avoid doing it, and besides, once students know they have / have not done “well enough” their interest in the assessment wanes.
  • Peer marking with adaptive comparative judgement. Getting students to rank other students’ work leads to reliable marking (the course leader can assess which pieces of work sit on grade boundaries if that’s what you need).

In the context of that last one, Ian mentioned No More Marking, which has links with the Mathematics Learning Support Centre at Loughborough University. I would like to know more about how many comparisons need to be made before a reliable rank ordering is arrived at, which will influence how practical the approach is given the number of students on a course and the length of the work being marked (you wouldn’t want all students to have to mark all submissions if each submission was many pages long). But given the advantages of peer marking in getting students to reflect on the objectives for a specific assessment, I am seriously considering using the approach to mark a small piece of coursework from my design for online learning course. There is the additional rationale that it illustrates the use of technology to manage assessment and facilitate a pedagogic approach, showing that computer-aided assessment goes beyond multiple choice objective tests, which is part of the syllabus for that course.
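
To give a feel for how a rank order emerges from pairwise judgements, here is a toy sketch in Python that fits a simple Bradley-Terry model to a handful of invented comparisons. This is only an illustration of the general technique, not the algorithm No More Marking uses, and the data are made up; the practical question above then becomes how many judgements per script are needed before the fitted ordering stops changing.

```python
# Toy sketch: estimate a rank order of scripts A-D from made-up pairwise
# comparative judgements, using simple Bradley-Terry (MM) updates.
from collections import defaultdict

# Each tuple is (winner, loser) from one judgement. Invented data.
judgements = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"),
              ("B", "C"), ("B", "D"), ("C", "D"), ("D", "C")]

scripts = sorted({s for pair in judgements for s in pair})
wins = defaultdict(int)    # total wins per script
pairs = defaultdict(int)   # number of comparisons per unordered pair
for winner, loser in judgements:
    wins[winner] += 1
    pairs[frozenset((winner, loser))] += 1

# Iterative Bradley-Terry updates: p_i <- wins_i / sum_j( n_ij / (p_i + p_j) )
strength = {s: 1.0 for s in scripts}
for _ in range(200):
    new = {}
    for i in scripts:
        denom = sum(pairs[frozenset((i, j))] / (strength[i] + strength[j])
                    for j in scripts if j != i)
        new[i] = wins[i] / denom
    total = sum(new.values())
    strength = {s: v / total for s, v in new.items()}  # normalise to sum to 1

for s in sorted(scripts, key=strength.get, reverse=True):
    print(f"{s}: {strength[s]:.3f}")
```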

New projects for me at Heriot-Watt

Understanding large numbers in context, an exercise with Socrative

I came across an exercise that aimed to demonstrate that numbers are easier to understand when broken down and put into context. It’s one of a number of really useful resources for the general public, journalists and teachers from the Royal Statistical Society. The idea is that the large numbers associated with important government budgets (you know, a few billion here, a few billion there, pretty soon you’re dealing with large numbers) are difficult to get our heads around, whereas the same number expressed in a more familiar context, e.g. a person’s annual or weekly budget, should be easy to understand. I wondered whether that exercise would work as an in-class exercise using Socrative; it’s the sort of thing that might be a relevant ice breaker for a critical thinking course that I teach.
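
To make that concrete, the arithmetic behind the exercise is just a division or two. The figures below are round illustrative assumptions (an annual budget of about £100 billion and a population of about 65 million), not the quiz’s actual numbers:

```python
# Illustrative only: turn a large annual budget into a per-person, per-week figure.
annual_budget = 100e9   # assumed: about £100 billion per year
population = 65e6       # assumed: about 65 million people
weeks_per_year = 52

per_person_per_week = annual_budget / population / weeks_per_year
print(f"about £{per_person_per_week:.0f} per person per week")  # roughly £30
```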

A brief aside: Socrative is a free online student response system which “lets teachers engage and assess their students with educational activities on tablets, laptops and smartphones”. The teacher writes some multiple choice or short-response questions for students to answer, normally in-class. I’ve used it in some classes and students seem to appreciate the opportunity to think and reflect on what they’ve been learning; I find it useful in establishing a dialogue which reflects the response from the class as a whole, not just one or two students.

I put the questions from the Royal Stats. Soc. into Socrative as multiple choice questions, with no feedback on whether the answer was right or wrong except for the final question, just some linking text to explain what I was asking about. I left it running in “student-paced” mode and asked friends on Facebook to try it out over the next few days. Here’s a run-through of what they saw:

[Six screenshots of the questions as they appeared in Socrative, captured 2015-03-31.]

 

Socrative lets you download the results as a spreadsheet showing the responses from each person to each question. A useful way to visualise the responses is as a Sankey diagram:
[Sankey diagram of the responses, question by question.]

[I created that diagram with sankeymatic. It was quite painless, though I could have been more intelligent in how I got from the raw responses to the input format required.]
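
For anyone wanting to be more intelligent about it than I was, the conversion could be scripted along these lines. This is only a sketch: the column names are stand-ins for whatever headers the downloaded spreadsheet actually has, and the flows simply count how respondents moved between answers on consecutive questions.

```python
# Sketch: convert a per-respondent answers CSV into SankeyMATIC
# "Source [count] Target" lines by counting answer-to-answer transitions
# between consecutive questions. Column names are assumed, not Socrative's real export.
import csv
from collections import Counter

QUESTIONS = ["Q1", "Q2", "Q3", "Q4", "Q5"]  # assumed column headers, one per question

flows = Counter()
with open("socrative_responses.csv", newline="") as f:
    for row in csv.DictReader(f):
        for q_from, q_to in zip(QUESTIONS, QUESTIONS[1:]):
            # Prefix answers with the question so nodes stay distinct per column.
            flows[(f"{q_from}: {row[q_from]}", f"{q_to}: {row[q_to]}")] += 1

for (src, dst), count in sorted(flows.items()):
    print(f"{src} [{count}] {dst}")  # paste the output into sankeymatic.com
```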

So did it work? What I was hoping to see was the initial answers being all over the place but converging on the correct answer, i.e. not so many choosing £10B per annum for Q1 as choosing £30 per person per week for the last question. That’s not really what I’m seeing. But I have some strange friends: a few people commented that they knew the answer for the big per annum number but either could or couldn’t do the arithmetic to get to the weekly figure. Also it’s possible that the question wording was misleading people into thinking about how much it would cost to treat a person for a week in an NHS hospital. Finally, I have some odd friends who are more interested in educational technology than in answering questions about statistics, who might just have been looking to see how Socrative worked. So I’m still interested in trying out this question in class. Certainly Socrative worked well for it, and one thing I learnt (somewhat by accident) is that you can leave a quiz in Socrative open for responses for several months.

 

QAA Scotland Focus On Assessment and Feedback Workshop

Today was spent at a QAA Scotland event which aimed to identify and share good practice in assessment and feedback, and to gather suggestions for feeding in to a policy summit for senior institutional managers that will be held on 14 May. I’ve never had much to do with technology for assessment, though I’ve worked with good specialists in that area, and so this was a useful event for catching up with what is going on.

"True Humility" by George du Maurier, originally published in Punch, 9 November 1895. (Via Wikipedia, click image for details)
“True Humility” by George du Maurier, originally published in Punch, 9 November 1895. (Via Wikipedia)

The first presentation was from Gill Ferrell on electronic management of assessment. She started by summarising the JISC assessment and feedback programmes of 2011-2014. An initial baseline survey for this programme had identified practice that could at best be described as “excellent in parts” but with causes for concern in other areas: wide variations in practice for no clear reason, programmes in which assessment was fragmentary rather than building a coherent picture of a student’s capabilities and progress, not much evidence of formative assessment, not much student involvement in deciding how assessment was carried out, assessments that did not reflect how people would work after they graduate, policies that were more about procedures than educational aims, and so on.

Gill identified some of the excellent parts that had served as starting points for the programme, for example the REAP project from CAPLE, formerly at Strathclyde University, and she explained how the programme proceeded from there with ideas such as: projects agreeing on basic principles of what they were trying to do (the challenge was to do this in such a way that allowed scope to change and improve practice); projects involving students in setting learning objectives; encouraging discussion around feedback; changing the timing of assessment to avoid over-compartmentalized learning; shifting from summative to formative assessment; and making assessment ipsative, i.e. focussing on comparing with the student’s past performance to show what each individual was learning.

A lifecycle model for assessment from Manchester Metropolitan helped locate some of the points where progress can be made.

Assessment lifecycle developed at Manchester Metropolitan University. Source: Open course on Assessment in HE.

Steps 5, “marking and production of feedback”, and 8, “reflecting”, were those where most help seemed to be needed (Gill has a blog post with more details).

The challenges were all pedagogic rather than technical; there was a clear message from the programme that the electronic management of assessment and feedback was effective and efficient. So, Jisc started scoping work on the Electronic Management of Assessment. A second baseline review in Aug 2014 showed trends in the use of technology that have also been seen in similar surveys by the Heads of eLearning Forum: eSubmission (e.g. use of TurnItIn) is the most embedded use of technology in managing assessment, followed by some use of technology for feedback. Marking and exams were the areas where least was happening. The main pain points were around systems integration: systems were found to be inflexible, many were based around US assumptions of assessment practice and processes, and assessment systems, VLEs and student record systems often just didn’t talk to each other. Staff resistance to use of technology for assessment was also reported to be a problem; students were felt to be much more accepting. There was something of an urban myth that QAA wouldn’t permit certain practices, which enshrined policy and existing procedure so that innovation happened “in the gaps between policy”.

The problems Gill identified all sounded quite familiar to me, particularly the fragmentary practice and the lack of systems integration. What surprised me most was the low uptake of computer-marked assessments and computer-set exams. My background is in mathematical sciences, so I’ve seen innovative (i.e. going beyond MCQs) computer-marked assessments since about 1995 (see SToMP and CALM). I know it’s not appropriate for all subjects, but I was surprised it’s not used more where it is appropriate (more on that later). On computer-set exams, it’s now nearly 10 years since school pupils first sat online exams, so why is HE so far behind?

We then split into parallel sessions for some short case-study style presentations. I heard from:

Katrin Uhilg and Anna Rolinska from the University of Glasgow about the use of wikis (or other collaborative authoring environments such as Google Docs) for learning-oriented assessment in translation. The tutor sets a text to be translated; students work in groups on this, but can see and provide feedback on each other’s work. They need to make informed decisions about how to provide and how to respond to feedback. I wish there had been more time to go into some of the practicalities around this.

Jane Guiller of Glasgow Caledonian had students creating interactive learning resources using Xerte. They provide support for the use of Xerte and for issues such as copyright. These were peer assessed using a rubric. Students really appreciate demonstrating a deep understanding of a topic by creating something that is different to an essay. The approach also builds and demonstrates the students’ digital literacy skills. There was a mention at the end that the resources created are released as OERs.

Lucy Golden and Shona Robertson of the University of Dundee spoke about using WordPress blogs in a distance learning course on teaching in FE. Learners were encouraged to keep a reflective blog on their progress; Lucy and Shona described how they encouraged (OK, required) the keeping of this blog through a five-step induction, and how they and the students provided feedback. These are challenges that I can relate to from asking students on one of my own courses to keep a reflective blog.

Jamie McDermott and Lori Stevenson of Glasgow Caledonian University presented on using rubrics in Grademark (on TurnItIn). The suggestion came from their learning technologist John Smith, who clearly deserves a bonus, and who pointed out that they had access to this facility that would speed up marking and the provision of feedback and would help clarify the criteria for various grades. After Jamie used Grademark rubrics successfully in one module they were implemented across a programme. Lori described the thoroughness with which they had been developed, with drafting, feedback from other staff, feedback from students and reflection. A lot of effort, but all with the collateral benefits of better coherency across the programme and better understanding by the students of what was required of them.

Each one of these four case studies contained something that I hope to use with my students.

The final plenary was Sally Jordan, who teaches physics at the Open University, talking about computer-marked assessment. Sally demonstrated some of the features of the OU’s assessment system, for example the use of a computer algebra system to make sure that mathematically equivalent answers are marked appropriately (e.g. y = (x+2)/2 and y = x/2 + 1 may both be correct). Also the use of text analysis to mark short textual answers, allowing “it decreases” to be marked as partially right and “it halves” to be marked as fully correct when the model answer is “it decreases by 50%”. This isn’t simple keyword matching: you have to be able to distinguish between “kinetic energy converts to potential energy” and “potential energy converts to kinetic energy”, one being right and the other entirely wrong, even though they contain the same words. These questions are useful for testing a student’s conceptual understanding of physics, and can be placed “close to the learning activity” so that they provide feedback at the right time.
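
The computer algebra idea is straightforward to sketch. The snippet below is just an illustration of the general technique using SymPy, not the OU’s actual marking engine:

```python
# Sketch: accept any student expression that is mathematically equivalent
# to the model answer, by asking a computer algebra system (SymPy) whether
# the difference simplifies to zero. Not the OU system, just the general idea.
from sympy import symbols, simplify, sympify

x = symbols("x")
model_answer = sympify("(x + 2)/2")


def equivalent(student_answer):
    """True if the student's expression simplifies to the model answer."""
    return simplify(sympify(student_answer) - model_answer) == 0


print(equivalent("x/2 + 1"))  # True: same function written differently
print(equivalent("x/2 + 2"))  # False: off by a constant
```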

Here was the innovative automatic marking I had expected to be commonly used in appropriate subjects. But Sally also said that an analysis of computer-marked assessments in Moodle showed that 75% of the questions were plain old multiple choice questions, and probably as much as 90% were some variety of selected-response question. These lack authenticity (no patient ever says “Doctor, I’ve got one of the following four things wrong with me…”) and can be badly set so as to be guessable without prior knowledge. So why? Well, Sally had made clear that the OU is exceptional: huge numbers of students learning at a distance mean that there are few more cost-effective options for marking and providing feedback, even when a large amount of effort is required. The numbers of students also allowed for piloting of questions and the use of assessment analytics to sort out the most useful questions and feedback. For the rest of us, Sally suggested we could do two things:
A) run MOOCs, with peer marking, and use machine learning to infer the rules for marking automatically, or
B) talk to each other. Share the load of developing questions, share the questions (make them editable for different contexts).

So, although I haven’t worked much in assessment, I ended up feeling on familiar ground, with an argument being made for one form of Open Education or another.