Proposal view
Proposal Type: Symposium 
Domain: Assessment and Evaluation 
SIG: Assessment and Evaluation 
Scheduling category: Assessment and Evaluation 
Type Submitted Symposium 
Title Critical elements of peer assessment for learning: Reliability, task-type, and feedback 
Abstract

Peer assessment is an arrangement in which equal-status students judge a peer’s performance with a rating scheme or qualitative report (Topping, 1998); it stimulates students to share responsibility, reflect, discuss and collaborate with their peers (Boud, 1990; Orsmond, Merry, & Callaghan, 2004). Although many researchers advocate that peer assessment has a positive effect on learning, the empirical evidence is based either on student self-report ratings or on anecdotal evidence from case studies, and usually not on standardised measures of performance improvement. Furthermore, peer assessment is rarely studied in (quasi-)experimental settings (comparing an experimental group to a control or baseline group), which considerably limits the claims and evidence regarding specific conditions that are believed to affect learning. Hence, empirical support for learning effects, as well as for the effects of specific peer assessment conditions, is scarce.

The contributions in this symposium will address three elements that are critical to peer assessment for learning. A first element is reliability. The reliability of peer versus teacher assessment is a core topic in peer assessment research, and learning benefits are often inferred from the accuracy of peer versus teacher assessment (Falchikov & Goldfinch, 2000), but the reliability of multiple peer assessments and the relationships of self-assessment with peer and teacher assessment are rarely considered. Secondly, reviews of peer assessment studies reveal a high degree of diversity in practices, but the impact of elements like task complexity and task type (domain-specific task and peer assessment task) on learning has hardly been investigated. Thirdly, the element of feedback for learning has recently regained attention (Hattie & Timperley, 2007; Shute, 2008), and although the role of peer feedback in peer assessment is broadly advocated, the impact of peer feedback content and of the specific peer feedback setting (e.g., opportunity to reply; sender of feedback) on students’ performance and learning is not well understood.

 
Equipment Computer and data projector / beamer
Keywords Assessment methods
New Modes of Assessment
Peer Interaction 
Chairperson list
First Name Last Name/Surname Institution Country E-Mail EARLI Number
Dominique Sluijsmans Open University of the Netherlands Netherlands dominique.sluijsmans@ou.nl  
Organiser list
First Name Last Name/Surname Institution Country E-Mail EARLI Number
Jan-Willem Strijbos Leiden University Netherlands jwstrijbos@fsw.leidenuniv.nl  
Dominique Sluijsmans Open University of the Netherlands Netherlands dominique.sluijsmans@ou.nl  
Discussant list
First Name Last Name/Surname Institution Country E-Mail EARLI Number
Frank Fischer Ludwig Maximilians University Germany frank.fischer@psy.lmu.de  
Paper Details
Paper type Empirical
Title Self-, peer and instructor assessment in a wiki environment: Reliability and accuracy
Abstract

The present study focuses on the combination of self-, peer and instructor assessment in a 1st year university course. Students (N = 325) worked collaboratively in groups of 8 on wiki-assignments. They were asked to rate themselves and their peers on 4 criteria. The instructors used the same criteria and scale to assess the students. The main aims of this study are to (1) check the reliability of several peer marks of the same students before (2) comparing the peer assessment scores with self- and instructor assessment scores in order to investigate the reliability and accuracy of self- and peer assessment. Krippendorff’s alpha (α) was used to calculate the reliability of the peer scores of an individual student. When the most deviant assessment scores are discarded the α coefficients for the four criteria are between .75 and .83, which can be considered reliable. When using this reliable peer assessment score and comparing it to the self- and instructor assessment scores, we find rather low degrees of correlation between self-assessment scores and both peer and instructor assessment scores, and moderate degrees of correlation between peer assessment scores and instructor assessment scores. The results imply that students’ peer assessment scores on the work of their fellow students in the wiki-environment appear to be mutually concordant and consistent with instructor scores.

Summary

Technological tools are able to support both collaborative learning and alternative forms of assessment. As a consequence, self- and peer assessment tools are increasingly implemented in online learning environments. Several authors have studied the accuracy of self- and peer assessment (for an overview see the review study of Dochy, Segers, & Sluijsmans, 1999). However, the results are often inconclusive with regard to the accuracy of the ratings. Moreover, peer assessment is often compared to self- or instructor assessment, but mutual comparisons of the assessment scores assigned to each individual by their peers are currently lacking. In this respect, the main aims of this study are to (1) check the reliability of several peer marks of the same students before (2) comparing the peer assessment scores with self- and instructor assessment scores in order to investigate the reliability and accuracy of self- and peer assessment.

Method

The present study focuses on the combination of self-, peer and instructor assessment in a 1st year university course. According to Dochy et al. (1999), “self- and peer assessment are combined when students are assessing peers but the self is also included as a member of the group and must be assessed” (p. 340). Students (N = 325) worked collaboratively in groups of 8 on wiki-assignments. They were asked to rate themselves and their peers on 4 criteria: (1) contribution (are the contributions relevant, full of information, and goal-oriented); (2) discussion (does the student state an opinion, grounded in arguments and sources); (3) sources (is the student actively searching for, or creating, material that is relevant for the wiki); and (4) social aspects (is the student actively collaborating, interacting, discussing, commenting, and helping others). Each criterion was graded on a scale ranging from 1 to 4. The instructors used the same criteria and scale to assess the students.

Results

Krippendorff’s alpha (α) was used to calculate the reliability of the peer scores of an individual student. The results show that when taking the peer assessment scores of all students in a group (on average 5.85 students), the assessment scores cannot be considered highly reliable (α between .41 and .65 for the 4 criteria). However, if we discard the most deviant assessment scores (based on an approach suggested by Sluijsmans, Dochy, and Moerkerke (1999) and keeping on average the assessment of 4.07 to 4.28 students), the α coefficients for the four criteria are between .75 and .83, which can be considered reliable.
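To make the reliability computation concrete, the following is a minimal sketch of Krippendorff’s α for the ratings that each assessed student receives from several peers. It assumes the interval difference function (squared difference between two ratings); the abstract does not state which metric was used. The function name and the ratings are hypothetical illustrations, not the study’s data, and the procedure for discarding the most deviant scores is not reproduced here.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (assumed, see lead-in).

    units: one list per assessed student, holding the ratings that student
    received from peers; None marks a missing rating.
    """
    # Drop missing ratings and keep only units with at least two ratings,
    # because a single rating cannot be compared with anything.
    cleaned = [[v for v in unit if v is not None] for unit in units]
    pairable = [unit for unit in cleaned if len(unit) >= 2]
    n = sum(len(unit) for unit in pairable)
    if n < 2:
        raise ValueError("need at least one unit with two or more ratings")

    # Observed disagreement: squared differences within each unit.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(unit, 2)) / (len(unit) - 1)
        for unit in pairable
    ) / n

    # Expected disagreement: squared differences across all pairable ratings.
    pooled = [v for unit in pairable for v in unit]
    d_e = sum((a - b) ** 2 for a, b in permutations(pooled, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")

    return 1 - d_o / d_e


# Hypothetical ratings on the 1-4 scale: four students, up to five peer raters.
ratings = [
    [3, 3, 4, 3, None],
    [2, 2, 2, None, 1],
    [4, 4, 4, 4, 4],
    [1, 2, 1, 1, 2],
]
print(round(krippendorff_alpha_interval(ratings), 2))
```

With ordinal data a different difference function would be needed; only the two squared-difference expressions would change.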

When using this reliable peer assessment score and comparing it to the self- and instructor assessment scores, we find rather low (.28 to .40) degrees of correlation between self-assessment scores and both peer and instructor assessment scores, and moderate (.40 to .58) degrees of correlation between peer assessment scores and instructor assessment scores, except for the “social” criterion, where all correlations are of a moderate degree (.42 to .58). The correlations are all significant and are presented in Table 1.
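As an illustration of how such agreement between score sets can be quantified, here is a minimal sketch of a correlation between peer and instructor scores. It assumes the Pearson product-moment coefficient; the abstract does not specify the coefficient used, and the scores below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: mean peer score and instructor score for six students.
peer_scores = [3.2, 2.5, 3.8, 1.9, 2.8, 3.5]
instructor_scores = [3, 2, 3, 2, 3, 4]
print(round(pearson_r(peer_scores, instructor_scores), 2))
```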

Conclusion

In conclusion, a high reliability can be achieved in peer assessment with 1st year university students and moderate positive correlations between peer assessment and instructor assessment can be found. These results imply that students’ peer assessment scores on the work of their fellow students in the wiki-environment appear to be mutually concordant and consistent with instructor scores. The presentation will go deeper into the process of reliability and accuracy checking and implications for assessment procedures in wiki-based environments will be discussed.

References

Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education: A review. Studies in Higher Education, 24, 331-350.

Sluijsmans, D., Dochy, F., & Moerkerke, G. (1999). Creating a learning environment by using self-, peer- and co-assessment. Learning Environments Research, 1, 293-319.

Keywords Assessment methods
New Modes of Assessment
Peer Interaction
Appendices Table 1_De Wever et al.jpg 
Authors
Name Surname Institution Country e-mail EARLI Number Presenting
Bram De Wever Gent University Belgium bram.dewever@ugent.be   *  
Hilde Van Keer Gent University Belgium hilde.vankeer@ugent.be    
Tammy Schellens Gent University Belgium tammy.schellens@ugent.be    
Martin Valcke Gent University Belgium martin.valcke@ugent.be    
Paper type Empirical
Title Peer assessment for learning research skills: Effects of task type and task complexity
Abstract

Previous research by Van Zundert, Sluijsmans, and Van Merriënboer (submitted) emphasized that the wide variation in peer assessment practices makes it difficult to clarify which specific variables foster learning. Moreover, it was shown that the share of true/quasi-experimental studies in this field is small, as is the share of studies in secondary education; the latter is surprising, as this type of education calls for more innovative modes of assessment. The current study tackled these issues by conducting quasi-experimental research in secondary education, specifying peer assessment in terms of scoring and commenting. We compared peer assessment skill and domain skill, and investigated the effects of task complexity on these skills. A total of 110 secondary education students were presented with study tasks and peer assessment formats (to support students in their domain skill and peer assessment skill respectively), transfer tasks (to measure domain skill), and peer assessment tasks (to measure peer assessment skill), all in the domain of biology research. In the case of scoring, for complex tasks, students’ peer assessment skill appeared to be poorer than their domain skill. In the case of providing comments, peer assessment skill was poorer than domain skill, regardless of task complexity. Future research should examine why students experience more difficulties in peer assessment and how they can be supported in learning this skill.

Summary

Peer assessment has become increasingly popular in education. A wide variation in peer assessment practices complicates clarifying which variables foster learning. Previous research by Van Zundert, Sluijsmans, and Van Merriënboer (submitted) indicated that this remains problematic because of holistic peer assessment descriptions in existing research reports. Moreover, a diversity in terms and instruments for measuring peer assessment effects and a lack of true/quasi experimental research in this field complicate drawing inferences in terms of cause and effect. Additionally, the share of studies in secondary education is small, which is surprising as current developments here call for more innovative modes of assessment. This study tackled these issues by conducting quasi experimental research in secondary education, specifying peer assessment in terms of scoring and commenting.

This study compared peer assessment skill and domain skill, and investigated the effects of task complexity and task type on these skills. Complex tasks induce higher cognitive load, based on three principles of element interactivity of Cognitive Load Theory: adding more elements, adding irrelevant information, and providing ambiguous relations between the elements. It was hypothesized that:

  1. Students perform better on domain-specific tasks than on peer assessment tasks;
  2. This is especially true in case of complex tasks;
  3. Students experience higher cognitive load in case of complex tasks.

Method

Participants

A total of 110 students from a school for secondary education participated. They were randomly divided between the simple (N=51) and complex (N=59) conditions. Their mean age was 15.43 years (SD=.60), and 47.3% were male.

Design

The study was set up as a 2x2 mixed factorial design with the between-subjects factor Task Complexity (simple and complex) and within-subjects factor Task Type (domain-specific task and peer assessment task).

Materials

Three task types (study-, transfer-, and peer assessment tasks) and a peer assessment format were designed collaboratively with a biology teacher. All materials were embedded in an electronic learning environment specifically designed for this study.

Study tasks. To support students in their domain skill, four study tasks were designed for each complexity level. These tasks consisted of short biology experiment descriptions. Students were shown how to identify the six steps of scientific research (e.g., observation, conclusions) in each experiment.

Peer assessment formats. A peer assessment format was developed as an aid to guide students in their peer assessment skill. It contained information on the six steps with an explanation, and assessment guidelines.

Transfer tasks. To measure domain skill, there were two transfer tasks, which provided students with experiment descriptions that were dissimilar to the ones used in the study tasks. Students were required to fill in the correct steps of scientific research next to the appropriate experiment text and to provide a rationale for their choice. To prevent guessing, only experiment text for five of the six steps of scientific research was displayed in each transfer task.

Peer assessment tasks. To measure peer assessment skill, two peer assessment tasks were designed. Students assessed the solutions of a fictitious peer. Here too, only experiment text for five steps was displayed in each task.

Cognitive load measure. To measure students’ perceived cognitive load on all tasks, a subjective 9-point rating scale ranging from 1 ‘very, very low effort’ to 9 ‘very, very high effort’ was used (Paas, Van Merriënboer, & Adams, 1994).

Quality of Selection Rating Scale (QSRS). An expert score on the two transfer tasks and on the two peer assessment tasks was a first measure for domain skill and peer assessment skill. Performance could vary between a minimum score of 0 and a maximum score of 10 per task type (i.e., transfer task and peer assessment task).

Quality of Argumentation Rating Scale (QARS). The written rationale students provided in the transfer tasks and peer assessment tasks provided a second measure for domain skill and peer assessment skill. A minimum score of 0 and a maximum score of 40 could be achieved per task type. Two researchers independently rated all comments, with an interrater reliability of .73 (Cohen’s Kappa).
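The interrater agreement reported here can be illustrated with a minimal sketch of Cohen’s kappa for two raters. It assumes the unweighted form computed on categorical codes; the abstract does not say whether a weighted variant was used, and the codes below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per code.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes assigned by two researchers to ten student comments.
researcher_1 = [2, 3, 1, 4, 2, 2, 3, 1, 4, 3]
researcher_2 = [2, 3, 1, 3, 2, 1, 3, 1, 4, 3]
print(round(cohens_kappa(researcher_1, researcher_2), 2))
```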

Procedure

Students received a short introduction, logged on to a computer and entered the electronic learning environment. Students studied four study tasks and were simultaneously presented with the peer assessment format. Subsequently, they solved two transfer tasks of similar complexity to their study tasks. Thereupon, they received two peer assessment tasks of similar complexity to the study tasks, and the corresponding solutions of a fictitious peer. Students peer assessed the received task solutions. They indicated their perceived cognitive load after each task (four study tasks, two transfer tasks, and two peer assessment tasks; hence eight times in total). The entire procedure took approximately two hours.

Results

QSRS performance

For complex tasks, students scored significantly lower on peer assessment skill (M=6.86; SD=1.73) than domain skill (M=7.86; SD=2.22).

QARS performance

Both for simple and complex tasks, students performed significantly worse in peer assessment (M=18.43; SD=6.60; M=16.60; SD=5.44 for simple and complex tasks respectively) than in domain (M=24.47; SD=4.80; M=22.85; SD=5.94 for simple and complex tasks respectively).

Cognitive load

Students’ perceived cognitive load on peer assessment tasks was significantly higher (M=4.79; SD=1.20; M=4.79; SD=1.37 for simple and complex tasks respectively) than that of transfer tasks (M=4.43; SD=1.36; M=4.47; SD=1.48 for simple and complex tasks respectively) and study tasks (M=4.07; SD=1.51; M=4.06; SD=1.47 for simple and complex tasks respectively).

Conclusion

For scoring, the interaction effect of the second hypothesis was confirmed. For providing comments, the main effect of the first hypothesis was confirmed. These results raise the questions of why students experience more difficulties in peer assessment than in domain-related work, and how students can be supported in peer assessment. Future research may explore student and teacher views on this issue, which can be used to design new peer assessment supports.

References

Paas, G. W. C., Van Merriënboer, J. J. G., & Adams, J. J. (1994). Measurement of cognitive load in instructional research. Perceptual and Motor Skills, 79, 419–430.

Van Zundert, M. J., Sluijsmans, D. M. A., & Van Merriënboer, J. J. G. (submitted). Peer assessment for learning: A state-of-the-art in research and future directions.

Keywords Assessment methods
New Modes of Assessment
Peer Interaction
Appendices
Authors
Name Surname Institution Country e-mail EARLI Number Presenting
Marjo Van Zundert Open University of the Netherlands Netherlands marjo.vanzundert@ou.nl   *  
Dominique Sluijsmans Open University of the Netherlands Netherlands dominique.sluijsmans@ou.nl    
Jeroen Van Merrienboer Open University of the Netherlands Netherlands jeroen.vanmerrienboer@ou.nl    
Paper type Empirical
Title Peer feedback in undergraduate academic writing: How do feedback content, writing ability-level and gender of the sender affect feedback perception and performance?
Abstract

The shift towards student-centered learning places a strong emphasis on students assuming responsibility for their learning. Peer assessment is well suited in this respect: equal-status students judge a peer’s performance by a rating or a qualitative report (Topping, 1998). The impact of feedback on learning has recently regained attention (Hattie & Timperley, 2007). Although peer assessment researchers stress the role of feedback, evidence for peer feedback effects is scarce. Moreover, the impact of feedback content has hardly been studied. Students also express concerns about the fairness and usefulness of peer assessment (Cheng & Warren, 1997), which appear related to sender characteristics that may influence the effect of peer feedback (Leung, Su, & Morris, 2001). This study investigates the effect of feedback content (concise evaluative feedback (CEF) or elaborated informative feedback (EIF)), the sender’s writing ability-level (high/low) and the sender’s gender (male/female) on feedback perception and performance.

In comparison to concise evaluative feedback (CEF), elaborated informative feedback (EIF) is perceived as more adequate (PAF) and leads to more positive affect. Similarly, feedback by a high-ability sender is perceived as more adequate and leads to more positive affect. However, students’ response in terms of willingness to improve (WI) is more complex: the feedback content by ability-level interaction varies according to the sender’s gender. This points towards possible gender discrimination by our female participants against a female peer, known as the ‘Queen Bee’ effect: females appear more critical of EIF by a high-ability female peer. The response in terms of WI carries over to revision performance, where a similar three-way interaction is observed. Future research should identify whether such a pattern might be observed for males, and examine the impact of sender characteristics in actual classroom settings.

Summary

The shift towards student-centered learning places a strong emphasis on students assuming responsibility for their learning. Peer assessment is well suited in this respect: equal-status students judge a peer’s performance by a rating or a qualitative report (Topping, 1998). The impact of feedback on learning has recently regained attention (Hattie & Timperley, 2007). Although peer assessment researchers stress the role of feedback, evidence for peer feedback effects is scarce. Moreover, the impact of feedback content has hardly been studied. Students also express concerns about the fairness and usefulness of peer assessment (Cheng & Warren, 1997), which appear related to sender characteristics that may influence the effect of peer feedback (Leung, Su, & Morris, 2001). This study investigates the effect of feedback content (concise evaluative feedback (CEF) or elaborated informative feedback (EIF)), the sender’s writing ability-level (high/low) and the sender’s gender (male/female) on feedback perception and performance.

Method

Participants. The subjects were 163 first year Educational Science students at Leiden University and participation in the experiment was part of a regular course. All students were female.

Design and materials. A three-way factorial between-subjects pretest-treatment-posttest control group design was used. Subjects in the experimental conditions received concise evaluative feedback (CEF) or elaborated informative feedback (EIF), the writing ability-level of the sender was high or low, and the sender’s gender was female (Astrid) or male (Joost). The experiment lasted ninety minutes (30 minutes per phase). Students were equally distributed over the research conditions. The experiment was embedded in the context of ‘academic writing’. Subjects revised three texts in view of text comprehension criteria (e.g., clarity, conciseness) and completed questions regarding their writing ability, feedback perception and peer assessment attitude (semantic differential).

Procedure. During the pretest all subjects were asked to write down what they thought to be the criteria for a readable and understandable text. Next, they revised a text that contained several errors. Afterwards they answered questions regarding academic writing and their own writing ability. During the treatment phase the subjects studied a scenario consisting of a text revised by a fictional student and the feedback that this fictional student received from a fictional peer. Feedback content, the sender’s ability-level and the sender’s gender were manipulated. Subjects in the experimental conditions answered questions on the perceived adequacy of the peer feedback (PAF: fairness, usefulness, acceptance), willingness to improve (WI) and affect (AF). Afterwards, they made a second revision of the ‘revised text’, using the psychological criteria for text comprehension provided in the scenario and applying the feedback. Subjects in the control condition received the same criteria and revised the same text, but they received no feedback. During the posttest all subjects were asked to describe the criteria provided in the treatment/control phase. Subsequently they revised a third text and answered questions on their attitude towards peer assessment and feedback.

Results

Feedback perception. All scales were reliable: PAF (9 items, α=.90), WI (3 items, α=.71), and AF (6 items, α=.62). MANOVA reveals an overall main effect for feedback content (Wilks λ = .90, F = 4.595, p=.004, ηp2 =.093) and sender’s ability-level (Wilks λ =.72, F=17.485, p<.001, ηp2 =.281). Knowledge of sub-criteria is a significant overall covariate (Wilks λ = .94, F=2.721, p=.047, ηp2 =.057) and the between-subject effects reveal that it only contributes to AF (F(1, 136) = 6.474, p=.012, ηp2 =.045). There is a significant main effect for feedback content on PAF (F(1, 136) = 10.746, p=.001, ηp2 =.073) and AF (F(1, 136) = 5.561, p=.020, ηp2 =.039). There is a significant main effect for sender’s ability-level on PAF (F(1, 136) = 36.276, p<.001, ηp2 =.211) and AF (F(1, 136) = 6.439, p=.012, ηp2 =.045). There is a significant three-way interaction for WI (F(1, 136) = 5.628, p=.019, ηp2 =.040; see Figure 1).
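For readers who want to check the univariate effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom as ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this standard identity to the between-subject tests reported above; the function name is a hypothetical helper, and the identity is not applied to the multivariate Wilks λ tests, whose degrees of freedom are not reported.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from a univariate F test."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values of the between-subject effects reported above, all with df = (1, 136).
reported = {
    "feedback content on PAF": 10.746,
    "feedback content on AF": 5.561,
    "ability-level on PAF": 36.276,
    "three-way interaction on WI": 5.628,
}
for label, f_value in reported.items():
    print(label, round(partial_eta_squared(f_value, 1, 136), 3))
```

For example, F(1, 136) = 10.746 gives approximately .073 and F(1, 136) = 36.276 gives approximately .211, in line with the values stated in the text.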

Learning outcomes. Revision performance (RP) correlated with time on task in all phases; ratios (RP by time on task) were therefore computed and used in a repeated measures ANOVA. Mauchly’s test indicated that the assumption of sphericity had been violated (χ² = 28.873, df=2, p<.001). Degrees of freedom were corrected using Huynh-Feldt estimates (ε=.828). The results reveal a multivariate main effect for RP over time (F(1.8, 279.2) = 112.997, p<.001, ηp2 =.423): RP for the treatment (Mean=.221, SD=.010) and posttest (Mean=.283, SD=.011) are both higher than the pretest (Mean=.104, SD=.006). The multivariate interaction between RP and condition was not significant (F(14.5, 279.2) = 1.209, p=.266, ηp2 =.059). When excluding the control group, there is a three-way interaction for RP (Figure 2): the interaction between feedback content and ability-level of the sender depends on the sender’s gender (F(1, 137) = 6.330, p=.013, ηp2 =.044, R2 =.061).

Conclusion

In comparison to concise evaluative feedback (CEF), elaborated informative feedback (EIF) is perceived as more adequate (PAF) and leads to more positive affect. Similarly, feedback by a high-ability sender is perceived as more adequate and leads to more positive affect. However, students’ response in terms of willingness to improve (WI) is more complex: the feedback content by ability-level interaction varies according to the sender’s gender. This points towards possible gender discrimination by our female participants against a female peer, known as the ‘Queen Bee’ effect: females appear more critical of EIF by a high-ability female peer. The response in terms of WI carries over to revision performance, where a similar three-way interaction is observed. Future research should identify whether such a pattern might be observed for males, and examine the impact of sender characteristics in actual classroom settings.

References

Cheng, W., & Warren, M. (1997). Having second thoughts: Student perceptions before and after a peer assessment exercise. Studies in Higher Education, 22, 233-239.

Leung, K., Su, S., & Morris, M. W. (2001). When is criticism not constructive? The roles of fairness perceptions and dispositional attributions in employee acceptance of critical supervisory feedback. Human Relations, 54, 1155-1187.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77, 81-112.

Topping, K. (1998). Peer assessment between students in colleges and universities. Review of Educational Research, 68, 249-276.

Keywords Assessment methods
New Modes of Assessment
Peer Interaction
Appendices Figure 1_Strijbos et al.jpg 
Figure 2_Strijbos et al.jpg
Authors
Name Surname Institution Country e-mail EARLI Number Presenting
Jan-Willem Strijbos Leiden University Netherlands jwstrijbos@fsw.leidenuniv.nl   *  
Susanne Narciss Technical University Dresden Germany narciss@rcs.urz.tu-dresden.de    
Mien Segers Leiden University Netherlands segers@fsw.leidenuniv.nl    
Paper type Empirical
Title The effectiveness of peer feedback for learning: The effects of constructiveness, accuracy and embedding in the learning environment
Abstract

Under what conditions is peer feedback beneficial for learning? This study on formative peer assessment for writing assignments in grade 7 shows that receiving ‘justified’ comments in peer feedback improves performance. No significant effect is found for appropriateness to assessment criteria, presence of positive and negative comments, specificity, presence of suggestions for improvement, or clear formulation. Providing justifications is found to be more important than the accuracy of the comments. Asking assessees to indicate their personal needs for feedback to their assessor beforehand, or to reflect upon the received feedback after peer assessment, is not found to increase the learning gain, but the former does increase the constructiveness of the received feedback.

Summary

The main and inevitable difference between peer and staff feedback is that peer assessors, unlike teachers, are not domain experts. A first consequence is that the accuracy of the feedback cannot be guaranteed. A second consequence is that the assessor is not seen as a ‘knowledge authority’ by the assessee, which makes the assessee more reticent about accepting the assessor’s judgement or advice. A first issue is whether training peers in giving constructive feedback is a good trade-off for the lack of guaranteed accuracy of their feedback, and which characteristics of feedback are crucial to make it “constructive”. The present study provides an external validation of some quality criteria for peer feedback in the literature (e.g., Prins, Sluijsmans, & Kirschner, 2006). Six characteristics of ‘constructive feedback’ are selected in this study:

· Appropriateness to assessment criteria;
· Presence of positive and negative comments (unless no negative possible);
· Specificity;
· Justification;
· Presence of suggestions for improvement;
· Clear formulation.

A second issue is how peer feedback should be embedded in the learning environment, and how assessees should be taught to deal with it. Several researchers report scepticism among students about relying on peers’ feedback. Gibbs and Simpson (2004) emphasise that it is not sufficient to focus on the kind or quality of feedback provided, but that it is necessary to address students’ response to feedback explicitly, in order to raise its impact on learning. They discuss two tactics to tackle this problem. A first tactic is to provide feedback only on those aspects that students request. Assessees may feel more personally addressed, which supports a ‘mindful reception’ of the feedback (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991), and assessors may be motivated and guided to give useful feedback through feelings of ‘individual accountability’ and ‘positive interdependency’ (Slavin, 1989). In our study, this tactic was built into an a priori question form. A second tactic is to ask students to demonstrate how they used the feedback in their revisions. This tactic aims at closing the ‘feedback loop’ (Boud, 2000), and an a posteriori reply form was designed for this study.

Aim of this study

The following hypotheses will be tested:

1. The six characteristics of constructive feedback have a positive effect on the performance improvement of assessees;
2. The accuracy of critique in peer feedback is positively related to performance improvement, but the six characteristics of constructive feedback have an additional positive effect;
3. Students in conditions with an a priori question form or an a posteriori reply form show more performance improvement than students in the plain peer feedback condition;
4. Students in the condition with an a priori question form provide ‘better’ feedback in terms of the six characteristics of constructive feedback.

Method

Participants. Sixty-eight Flemish seventh grade students participated. They were divided over three different classes, taught by the same teacher.

Peer assessment design. The present study adopts a (quasi)experimental repeated measures pretest-posttest control group design (see Figure 1). Three successive writing assignments (draft – feedback – revision cycle) are studied.

Conditions. The study consists of three conditions (see Figure 1), randomly assigned at class level.

1. ‘PA-REPLY group’: assessees report, in a written reply to the teacher, which feedback comments they took into account and how.
2. ‘QUEST-PA group’: assessees indicate to their peers, in a ‘question form’, for which criteria they request feedback.
3. ‘PA group’: a ‘plain’ peer feedback condition.

Analyses. The dataset contains different assignments by the same student. Therefore, a multilevel approach will be used (Snijders & Bosker, 1999).

Results and conclusions

Hypothesis 1: Constructiveness of feedback. When controlled for entry level of performance, only justification has a significant positive effect on performance improvement. Furthermore, an interaction effect between justification and entry level appears to be significant and replaces the main effect of entry level. This indicates that the ceiling effect of students with a high entry level only comes into play when students receive feedback with justifications.

Hypothesis 2: Constructiveness against accuracy. The number of accurate negative comments is a significant predictor of performance improvement when added on its own or together with entry level in a regression. When all constructiveness indicators are added as competing predictors, accuracy loses its predictive power, while justification remains significant in a one-sided test. Consequently, it is more important to give constructive comments than to give accurate critique.

Hypothesis 3: Condition. Students in the conditions with the a priori question form and the a posteriori reply form do not make significantly more progress than students in the condition with plain peer feedback.

Hypothesis 4: Condition and constructiveness of feedback. Feedback by assessors who receive a completed question form from the assessee before they give feedback is composed more constructively (with regard to formulation, specificity, and positive and negative comments) than in the conditions where the assessor starts from scratch. An increased perception of individual accountability and positive interdependency might have led to a higher engagement in providing constructive feedback. Their feedback, however, does not fit the indicated assessment criteria as well as in the other conditions. Perhaps it is more difficult to distinguish the different criteria in this condition, because assessors are not expected to go systematically through all assessment criteria but to respond to the assessees’ needs.

References

Bangert-Drowns, R., Kulik, C. L. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213-238.

Boud, D. (2000). Sustainable assessment: Rethinking assessment for the learning society. Studies in Continuing Education, 22, 151-167.

Gibbs, G., & Simpson, C. (2004). Conditions under which assessment supports students' learning. Learning and Teaching in Higher Education, 1, 3-31.

Prins, F., Sluijsmans, D., & Kirschner, P. A. (2006). Feedback for general practitioners in training: Quality, styles, and preferences. Advances in Health Sciences Education, 11, 289-303.

Slavin, R. E. (1989). Research on cooperative learning: An international perspective. Scandinavian Journal of Educational Research, 33, 231-243.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis. London: Sage.

Keywords Assessment methods
New Modes of Assessment
Peer Interaction
Appendices Figure 1_Gielen et al.jpg 
Authors
Name Surname Institution Country e-mail EARLI Number Presenting
Sarah Gielen University of Leuven Belgium sarah.gielen@ped.kuleuven.be    
Elien Peeters University of Leuven Belgium elien.peeters@ped.kuleuven.be    
Filip Dochy University of Leuven Belgium filip.dochy@ped.kuleuven.be   *  
Patrick Onghena University of Leuven Belgium patrick.onghena@ped.kuleuven.be    
Katrien Struyven University of Leuven Belgium katrien.struyven@ped.kuleuven.be    
© European Association for Research on Learning and Instruction, 2010 All rights reserved.