Introduction

Making sound assessment judgements is essential for reliable student assessment, and it is the main mechanism whereby value (a mark) is attached to student work. However, the unreliability of this process is well documented across the sector. In line with the principles embodied in our Education for Social Justice Framework (ESJF), all staff responsible for making assessment judgements should take action to maintain fairness and transparency in the decisions being made.

Assessment judgements should allow everyone involved in the process (students, the university, markers, moderators, and external examiners) to have confidence that the marks awarded are a fair reflection of the quality of learning demonstrated in the work submitted. It is therefore extremely important that assessment judgements are made through a carefully considered and constructed process that can be explained to all concerned and is acknowledged to be fair.

This section considers the activities involved in making assessment decisions in three stages:

  • preparation for marking,
  • marking, and
  • moderation of marks or external examining.

Much of the work needed to ensure that assessment decisions are accurate, fair, and consistent will have taken place beforehand, in the design of assessment criteria and grade descriptors. Ensuring that the assessment decisions are useful to students also requires a careful feedback strategy.

Useful additional resources are available from Advance HE.

Preparation for marking

  • The design of assessment criteria and grade descriptors will be part of the process undertaken before assessment decisions are made.
  • An assessment rubric (assessment criteria plus grading scheme) will have been prepared and shared with students and staff as part of the assessment process, showing evaluative criteria, quality definitions for those criteria at particular levels, and a scoring strategy.
  • Reliability of assessment decisions is influenced by the nature of the subject and the markers’ own interpretation of criteria (Bloxham and Boyd 2007). 
  • Prior to making assessment decisions, the markers should meet at a calibration event. This should occur before the assessment is launched to students. 
  • A calibration event enables the markers and support staff to discuss the expectations of the assessment and the associated assessment and grading criteria.
  • Academic and other staff involved in supporting preparation for assessment should be involved in calibration processes. This should be an important part of inducting new staff responsible for teaching and assessing on a module.
  • Students need to have the chance to discuss the expectations of the assessment as part of the assessment and feedback cycle (see, for example, Hounsell et al 2008). 
  • Staff responsible for making assessment decisions are internal examiners, recruited by the Subject Standards Boards.

Research into the cognitive processes involved in grading has identified a range of factors that can influence markers' judgements, including:

  • the quality of the introductory paragraph;
  • the readability of the text and other surface features;
  • the accurate use of writing conventions; 
  • the timing of the marking; 
  • knowledge of the candidate; 
  • the quality of the preceding five papers (if poor, this tends to result in higher grades for the subsequent paper);
  • the values and beliefs of the assessors; 
  • experience in marking similar work; 
  • being asked to explain a grade tends to result in grades gathering around the median. 

(Orrell 2008)

Marking

Assessment judgements may be undertaken for summative and/or formative purposes.

Marking undertaken for grading or summative purposes is one of the means by which assessment is judged to be valid, reliable and fair. It is a complex process and reliability is influenced by the nature of the subject in that more discursive topics rely more on subjective judgements and the markers’ own interpretation of criteria. For this reason, the shared application of standards relies on negotiation and discussion of criteria interpretation in the local context.  

So whilst it is not possible to assure reliability across the sector, or even across courses, there are several aspects of good practice that can help to promote confidence in the reliability of marking within a specific module (Price 2005; see Preparation for Marking above).

Subject Standards Boards have responsibility for ensuring that students’ work is marked by sufficiently experienced and professional staff, formally approved at the departmental level as internal examiners. Markers should be members of the university staff who are familiar with the teaching associated with the assignments being assessed, and the standard that is required of students. The status of the markers is an important factor to be considered in the choice of double marking procedure. Where staff are new to marking or from outside of the university, time should be provided to induct them into the standards for the course, perhaps through more extensive moderation or marking workshops. 

Each assessment task should be accompanied by a set of assessment criteria. These criteria will have been developed by the course team and discussed as part of preparation for marking.   

Discussion between markers ensures that the assessment criteria provide a framework within which assessors can reach broad agreement about the basis of assessment decisions (i.e. increased inter-tutor reliability) and pinpoint areas of disagreement between markers. Such discussion is also important in supporting the internal consistency of the individual marker.

Use of Rubrics (assessment criteria and mark scheme)

A rubric is an extension of the assessment criteria to provide a mark scheme, a scoring guide that provides students, markers, moderators and external examiners with evaluative criteria, quality definitions for those criteria at particular levels and a scoring strategy (Popham 1997; Dawson 2017). Guidance from Oxford Brookes University (OCSLD 2020) provides helpful, practical advice on their use.  

  • Develop a rubric for each type of assessment task a student is asked to complete.  
  • Within a programme, use the same rubric for a given type of assessment task across modules within a level of study.  
  • Set out clearly each of the criteria that will be assessed for a given type of assessment task.  
  • Show the weighting of each criterion in determining the grade awarded for the work (a worked sketch of this arithmetic follows this list).
  • Use criteria that describe features of the assessment task itself (for example, abstract, literature review and methods for a research project and voice and slide design for a screencast) and/or that describe knowledge and its application within the context of the assessment task, for example, critical understanding, analysis and synthesis, evaluation. 
  • Describe the features and qualities of the work (performance standards) for each criterion separately in clear and simple English.  
  • Describe the performance standards for each degree class, and for work that fails to meet the pass mark, expressed on a percentage scale.
  • Share rubrics with students at the beginning of each module, in module handbooks, and on the VLE.
  • Use assessment rubrics during in-class activities with students to encourage self and peer assessment and to develop students’ understanding of the performance standards and criteria in use for their work.  
  • When offering feedback to students on their performance, provide comments that reflect the statements in the assessment rubric so students can make links between their performance and the criteria that were used to assess it.
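
To make the scoring strategy concrete, here is a minimal sketch, in Python, of how a weighted rubric might combine per-criterion marks into an overall percentage. The criteria names and weightings are invented for illustration and do not represent any prescribed University scheme.

    # A minimal sketch of a weighted rubric; the criteria and weights
    # below are hypothetical, not a prescribed marking scheme.
    RUBRIC_WEIGHTS = {
        "literature review": 0.30,
        "methods": 0.40,
        "evaluation": 0.30,
    }  # weights sum to 1.0

    def overall_mark(criterion_marks: dict[str, float]) -> float:
        """Combine per-criterion percentage marks into a weighted overall mark."""
        return sum(w * criterion_marks[c] for c, w in RUBRIC_WEIGHTS.items())

    # Example: marks awarded against each criterion on a 0-100 scale.
    print(overall_mark({"literature review": 65, "methods": 58, "evaluation": 72}))  # approximately 64.3

Sharing the same weights with students in the rubric itself keeps the link between criterion-level judgements and the overall grade transparent.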

Anonymous marking

Anonymous marking involves marking work without reference to the identity of the individual who produced it, and matching the work to the individual only after a grade has been derived. University policy is that students’ work should, where possible, be anonymized for marking, so that markers do not know the identity of the student who submitted the work. This is because there is good reason to suppose that non-anonymous marking raises a serious risk of introducing subliminal marking biases, an important consideration in addressing awarding gaps and fair practice, and an essential element of the ESJF emphasis on Inclusive Assessment. Recent research into racial inequalities in assessment in HE highlighted that black students wanted anonymous marking where possible, “because they felt this provided more chance of being judged fairly and without bias” (Campbell et al, 2021, p. 36).
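
As a minimal sketch of this decoupling, assuming hypothetical candidate numbers and a lookup table held separately from the markers (an illustration only, not a description of the University's actual systems):

    # Marks are recorded against anonymous candidate numbers only;
    # identities are matched back after all grading is complete.
    # Candidate numbers and names below are invented for illustration.
    marks_by_candidate = {"C1042": 68, "C1043": 55}
    candidate_to_student = {"C1042": "Student A", "C1043": "Student B"}  # held separately

    marks_by_student = {
        candidate_to_student[cand]: mark
        for cand, mark in marks_by_candidate.items()
    }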

For guidance on setting up online assessments including making provision for anonymous submission and marking, see the WebLearn guide to assessment tools. 

The process could still provide for personalised feedback comments where the assessment has a formative purpose, but that step would be decoupled from the grading process. A recent study of anonymous marking revealed that “[f]eedback on non-anonymously marked work was perceived by students to have greater potential for learning than feedback on anonymously marked work” (Pitt and Winstone (2018) cited in Pitt and Quinlan (2022) p.53). 

Exceptions to anonymous marking include research degrees and forms of assessment approved by standards boards as impractical.  

Research projects 

Research projects may present challenges for anonymous marking, as they are often marked with the student's identity revealed to the markers. Approaches to help mitigate the resulting risk of bias in marking include:

  1. Exclude staff from marking projects they have supervised, so that projects can be marked anonymously. In a trial of this procedure (reported by Newstead, 2002) marks went down by 5% on average, but students were much more satisfied with the process. 
  2. Have every student project marked anonymously by a second marker as well as the supervisor. This is probably the minimum requirement for marking what is usually a substantial piece of work that contributes more than any other item of assessment to the overall grade awarded for a course. 
  3. Introduce a larger number of markers into the process, which may reduce the chance, or the impact, of unconscious bias.

Oral presentations  

Oral presentations cannot be anonymised even for the second marker, and this is a compelling reason for oral presentations to be double marked, particularly if they contribute to an overall degree classification. If the presentations are video or audio recorded, they can also be reviewed by a third marker who, although unable to avoid being aware of the gender and appearance of the students, will at least be unaware of the students’ personal histories in the department and will have no personal relationship with the students that could bias the appraisal of their performance.

In addition to formal methods for controlling the quality of marking, such as moderation, there are a number of activities and processes that can help to achieve and maintain consistency of marking across markers.

These are all different aspects of one underlying principle: that of involving all staff who take part in the assessment of students as active members of a community of shared practice and understanding.

Shared staff involvement in all aspects of student assessment (communities of practice) 

Tightly knit communities of practice among academics involved in student assessment were at one time much easier to maintain in higher education. This has become much more difficult with the increasing trend towards fragmentation of the academic communities in higher education, which is part of the rationale for greater transparency of the process through increased documentation. Activities that help to promote such communities of practice are: 

  • Common involvement in and ownership of the formal assessment criteria through calibration activities forming part of preparation for assessment (see above) 
  • Attendance at moderation meetings and discussions of student performance. 
  • Pre-meetings of markers in advance of the main marking task, to discuss what is required of students and what markers should be looking for in their work, and to compare markers’ applications of the assessment criteria. This can usefully be done with a small sample of work marked independently and with judgements shared before completing the remaining marking. This process is a second calibration exercise.
  • Team preparation of course/programme (re)validation documents. 
  • Team development of programme handbooks and module booklets. 
  • Marking workshops - a useful way of inducting new or less experienced tutors and part-time lecturers. This might include a similar process to pre-meetings of markers, more extensive collective marking or marking of work from a previous cohort – the principle is to share interpretations of the assessment criteria. 
  • Discussion of, and common approaches (within subjects) to, ways of dealing with language proficiency, referencing and plagiarism. This recognizes that learning how to write and communicate appropriately in different subject contexts is a developmental learning process which continues throughout the course of study.

When pairs of markers meet to compare examination marks and resolve any differences through discussion, a record of the basis for the marks awarded, linked to the assessment criteria, can save time and make the agreement process fairer.

Rather than having to read the answer again and try to remember why they awarded the mark they did, markers of discursive papers who have used a form like this can quickly locate the reasons for their different marks and focus on those to agree a mark more quickly and more fairly. Such a form may also be useful in interpreting detailed mark schemes for feedback to students.
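
As a minimal sketch of what such a record might contain, assuming invented criterion and field names, so that paired markers can compare the basis of their marks criterion by criterion:

    # A hypothetical per-script marking record; field and criterion
    # names are invented for illustration.
    marking_record = {
        "candidate": "C1042",
        "marker": "first marker",
        "marks": {
            "argument":  {"mark": 62, "note": "clear thesis; limited counter-arguments"},
            "evidence":  {"mark": 58, "note": "relies heavily on two sources"},
            "structure": {"mark": 70, "note": "well signposted throughout"},
        },
    }

When the two markers' records disagree on a criterion, the notes point directly to the judgements that need discussing.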

Moderation and external examining

The purpose of moderation and external examining relates to the assurance of academic standards.

Calibration before the launch of an assessment activity (see above) and moderation of marked work afterwards are designed to safeguard against errors and biases in marking, and help to create an assessment community amongst academic staff.

Moderation is an expensive and time-consuming process, but it improves reliability, particularly when used in conjunction with calibration activities, so that awareness of assessment criteria and/or marking schemes works alongside discussion of marking decisions.

The method chosen for moderation after the assessment task has been completed should suit the module, task and level of the assessment.   

Seven methods of moderation have been identified:

  1. universal unseen double marking (where two markers mark all the assignments, with the second marker not having sight of the marks awarded by the first marker), 
  2. universal seen double marking (where a second marker sees all the assignments and has access to marks awarded by the first marker), 
  3. universal second marking as a check or audit (where the second marker sees all the assignments to review the work of the first marker), 
  4. second marking as sampling (where the second marker sees only a sample of the students’ work), 
  5. partial second marking (where second marking is applied only to certain categories of assignment, such as fails, firsts and borderlines), 
  6. marking teams (where groups of markers work together), and 
  7. second marking for clerical aspects of the first marking (such as transcription or addition of marks). 

Which system should be selected for a given module or assignment? The minimum standard required by the University assessment policy is second marking of a 20% sample of the assignments, with no requirement for the second marker to work blind to the marks awarded by the first.
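
To make the sampling arithmetic concrete, here is a minimal sketch of drawing a 20% second-marking sample, assuming a simple random sample; the policy itself does not prescribe how the sample should be drawn, and local practice may, for example, deliberately include fails, firsts and borderlines:

    import math
    import random

    # A minimal sketch: draw a random 20% sample of submissions for
    # second marking. The simple random sampling method is an assumption.
    def moderation_sample(submission_ids: list[str], fraction: float = 0.20) -> list[str]:
        """Return a random sample of submissions for second marking."""
        n = math.ceil(len(submission_ids) * fraction)  # round up so small cohorts are covered
        return random.sample(submission_ids, n)

    sample = moderation_sample([f"C{i:04d}" for i in range(50)])  # 10 of 50 scripts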

Course/Subject leaders and Subject Standards Boards should consider whether the minimum standard is always sufficient, however, given the principles of fairness to students, fitness for purpose, and the requirement for mechanisms to avoid and detect biases and errors. There are several factors to bear in mind: 

  • Where students do not routinely receive feedback on their work, there is a greater likelihood of lapses in attention on the part of markers. This means that there is a much stronger need for universal double or second marking for examination scripts than for coursework. 
  • There is potentially a trade-off between the expertise and experience of the marker and the level of quality control required. This means that there is a much stronger need for universal double or second marking where less experienced markers, or staff from outside the university, are involved in student assessment.
  • The importance of the assignment, and the significance of the decisions being taken by markers, should influence the choice of the moderation model. This means that there is a much stronger case for universal double unseen marking for assignments that contribute significantly to students’ degree classification. 

Moderation is an expensive process which can delay the return of feedback to students; it is therefore worth spending some time deciding on the method of double or second marking. Key factors include:

  • the form and function of the assessment 
  • the contributory weight of the assessment 
  • the level of study 
  • the norms of the discipline 
  • the number of students 
  • the number of markers and their level of experience and responsibility 
  • the necessity of making adjustments to marks across the whole cohort, not just the sample of work moderated. 

Whatever model is adopted, the decision should be formalised through a Subject Standards Board and the rationale for the choice should be made clear.

If work has been universally double marked (seen or unseen), the two markers will agree or negotiate the mark for each individual piece of work. Where moderation is based on a sample and there is general agreement, the first marker’s marks stand for the cohort. If there is substantial variation in marks, then further discussion and exploration is needed to determine the extent of the differences. Whatever adjustment is made must be applied fairly to all relevant candidates, and not just to those sampled by the second marker. If the two markers cannot agree, a third internal marker is used (Bloxham and Boyd 2007). There should be a record of the moderation process available for quality audits and reviews. All involved in marking should take responsibility for their work by signing off the mark sheets at the end of the process.
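
To illustrate the cohort-wide principle, here is a minimal sketch, assuming a simple additive adjustment agreed through moderation; the form of any real adjustment would be decided by the markers and the relevant standards board:

    # Apply an agreed adjustment to ALL candidates, not just the
    # moderated sample. The additive form is an assumption.
    def apply_adjustment(marks: dict[str, float], adjustment: float) -> dict[str, float]:
        """Apply the agreed adjustment to every candidate, clamped to 0-100."""
        return {cand: min(100.0, max(0.0, mark + adjustment)) for cand, mark in marks.items()}

    cohort = {"C1001": 62, "C1002": 48, "C1003": 71}  # invented marks
    adjusted = apply_adjustment(cohort, +3.0)         # every candidate adjusted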

External examining (see Chapter 5 of the Quality Manual 21-22)  

External examiners are recruited by the university to provide oversight of academic standards on a module or course.   

Key points

  • Staff making assessment decisions should be fully familiar with the academic regulations.
  • The process used for arriving at assessment decisions should be fair and transparent, carefully considered and constructed so that it can be effectively communicated to markers, moderators, students, and external examiners.
  • Assessment decisions should be guided by assessment criteria specifically designed for the assessment task, and suitable grade descriptors. (See section 5:  Assessment criteria).
  • Staff teams engaged in making assessment decisions should meet before the assessment is launched to agree the criteria and benchmarks that will apply, to ensure consistency and fairness.  
  • A calibration event involving all those involved in making assessment decisions should take place before marking commences. 
  • Wherever possible, student work should be marked anonymously. 
  • A suitable second or double marking process that complies with academic regulations must be carried out; there may be a need for further moderation or parity exercises where there is a large cohort and/ or a large marking team, or where the work is complex or carrying a high credit load. 
  • That a moderation process has been carried out should be visible on the work assessed, normally through the addition of comments by, or the name of, a moderator.
  • Staff should always be mindful to assess only the work presented for assessment and avoid introducing knowledge of the student and their circumstances. 
  • Students should have the opportunity to participate in or observe the process of making assessment decisions, e.g. through reviewing and grading prior submissions, or through video guidance on how the marker on the module reviews and grades the work. This is particularly important where self- or peer-assessment takes place.
  • Students should not be engaged in summative assessment of their own or others’ work. 
  • Marks and feedback should be issued in accordance with academic regulations and University policy. 
  • Close the assessment loop by sharing external examiner feedback with students and the actions taken to improve the assessment (if necessary) as a result of this.

References

Bloxham, S. and Boyd, P. (2007). Developing Effective Assessment in Higher Education. Maidenhead: McGraw Hill.

Campbell, P. et al. (2021). Tackling Racial Inequalities in Assessment in Higher Education: A Multi-Disciplinary Case Study. University of Leicester.

Dawson, P. (2017). Assessment rubrics: towards clearer and more replicable design, research and practice. Assessment & Evaluation in Higher Education, 42(3), 347-360. 

Ecclestone, K. (2001). “I know a 2.1 when I see it”: understanding degree standards in programmes franchised to colleges. Journal of Further and Higher Education, 25, 301-313. 

Pitt, E. and Quinlan, K. (2022). Impacts of Higher Education Assessment and Feedback Policy and Practice on Students: A Review of the Literature 2016-2021.

Newstead, S.E. & Dennis, I. (1990). Blind marking and sex bias in student assessment. Assessment and Evaluation in Higher Education, 15, 132-139. 

Newstead, S.E. (2002). Examining the examiners: why are we so bad at assessing students? Psychology Learning and Teaching, 2, 70-75.

Orrell, J. (2008). ‘Assessment beyond belief: the cognitive process of grading’, in Havnes, A. and McDowell, L. (eds) Balancing Dilemmas in Assessment and Learning in Contemporary Education. London: Routledge, pp. 251-263.

OCSLD (2020). The Design and Use of Assessment Rubrics. Oxford Brookes University.

Popham, W. J. (1997). What's wrong - and what's right - with rubrics. Educational Leadership, 55, 72-75.

Price, M. (2005). Assessment standards: the role of communities of practice and the scholarship of assessment. Assessment and Evaluation in Higher Education, 30(3), 213-230.