Assessing Writing 35 (2018) 41–55
Examining the validity of an analytic rating scale for a Spanish test
for academic purposes using the argument-based approach to
validation
Arturo Mendoza a,*, Ute Knoch b


a Department of Applied Linguistics, School of Languages, Linguistics and Translation, Universidad Nacional Autónoma de México, Circuito interior s/n, CP 04510, Mexico City, Mexico
b Director, Language Testing Research Centre, University of Melbourne, Parkville 3010, Victoria, Australia

* Corresponding author. E-mail addresses: a.mendoza@enallt.unam.mx (A. Mendoza), uknoch@unimelb.edu.au (U. Knoch).
https://doi.org/10.1016/j.asw.2017.12.003
Received 11 July 2016; Received in revised form 7 December 2017; Accepted 19 December 2017




Keywords: Analytic rating scales; Writing assessment for academic purposes; Argument-based approach to validation; Many-facet Rasch measurement

Abstract

Rating scales are used to assess the performance of examinees presented with open-ended tasks. Drawing on an argument-based approach to validation, this study reports on the development of an analytic rating scale designed for a Spanish test for academic purposes. The study is one of the first that sets out the detailed scale development and validation activities for a rating scale for Spanish as a second language. The rating scale was grounded in a communicative competence model and developed and validated over two phases. The first version was trialed by five raters, and its quality was analyzed by means of many-facet Rasch measurement. Based on the raters’ experience and on the statistical results, the rating scale was modified and a second version was trialed by six raters. After the rating process, raters were sent an online questionnaire in order to collect their opinions and perceptions of the rating scale, the training and the feedback provided during the rating process. The results suggest the rating scale was of good quality and raters’ comments were generally positive, although they mentioned that more samples and training were needed. The study has implications for rating scale development and validation for languages other than English.




1. Introduction

Students wishing to study at a university where the medium of instruction is different from their mother tongue are often required
to prove their proficiency by taking a language test for academic purposes. These tests are considered high-stakes because results are
used to make decisions that have important consequences in students’ lives (Bachman & Palmer, 2010; Kane, 2013). In order to
guarantee that scores are fair, language tests must be carefully scrutinized and validated to ensure that the scores and the
interpretations based on the scores are valid and fair. When examinees are presented with open-ended writing tasks, the scripts they
produce are usually assessed by trained raters who use a rating scale to assign a score to the examinee’s performance. Rating such
performances is a complex undertaking. A score on such a writing test is not always purely a reflection of the writers’ performance,
but the outcome of the interaction between the rater, the rating scale and the script (Crusan, 2014; McNamara, 1996; Weigle, 2002).
This interaction can lead to undesired sources of variability that threaten the reliability of the exam and its results (East, 2009). This is
why rater training and monitoring are essential (Knoch, 2009, 2011; Weigle, 2002), as is studying the quality of the scoring process and the scores (Montee & Malone, 2014).
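
The interaction of examinee, rater and scale described above is precisely what many-facet Rasch measurement, used in this study to examine the quality of the scale, is designed to disentangle. As a sketch only, and assuming the standard three-facet rating scale formulation rather than the exact facet specification reported later in this paper, the log-odds of examinee n being awarded band k rather than band k−1 by rater j on criterion i are modelled as

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - C_j - D_i - F_k

where B_n is the ability of examinee n, C_j the severity of rater j, D_i the difficulty of criterion i, and F_k the threshold between bands k−1 and k. Rater severity estimates and fit statistics derived from such a model are the kind of evidence typically used to judge how well a scale and its raters are functioning.
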
While publications on rating processes, including scale development, and on rater functioning are abundant in the
assessment of English as a second or foreign language, very little has been written about similar endeavors in the assessment of other
languages, for example Spanish (but see e.g., Ducasse & Hill, 2015 for the development of a rating scale to assess the writing of
Spanish speaking graduate students). In this paper, we describe the development and validation of a rating scale for Spanish for
academic purposes. This study is important as it sets out in detail the kind of procedures that other researchers involved in scale
development for rating scales for languages other than English may want to follow or adapt. In particular, we argue that rating scales
for languages other than English cannot simply rely on adapting a scale developed for English in such contexts as there are clear
differences in the languages and in the way second language ability develops. In the literature review that follows, we describe the
existing literature on rating scale development and validation, the assessment of language for academic purposes more generally and
the assessment of Spanish for academic purposes. We then describe the context of the study and the current project in more detail.


1.1. Rating scale development and validation

The development and validation of rating scales for academic writing is no simple undertaking. Scales should be conceived and
designed with the purpose of the assessment in mind (Crusan, 2014; Fulcher, 2010; Knoch, 2009; Montee & Malone, 2014; Weigle,
2002) and should be a good representation of the construct of the assessment (McNamara, 2002). In the Anglophone context, rating
scales are often adapted or adopted from existing scales (Becker, 2011). For instance, in an academic setting, rating scales might be
derived from rating scales used in large-scale language tests for academic purposes. However, East (2009) cautions about the perils of
adapting rating scales from other similar ones, especially across languages. He argues that rating scales should take the target
language into account.
Rating scale developers have a number of decisions to make in the development process, all of which have been described in detail
in the literature. The type of rating scale selected (e.g., holistic, analytic, checklist, etc.) needs to closely reflect the purpose of the test
(Crusan, 2014; Hamp-Lyons, 1991; Montee & Malone, 2014; Weigle, 2002) and the outcome reported to users (Knoch, 2009). The
criteria in a scale are usually a reflection of the test construct and can either be based on a theory of language learning or
development or may be a reflection of a careful empirical analysis of written data produced by students (Fulcher, 2010; Knoch, 2009, 2011;
Montee & Malone, 2014). Scale designers also need to ensure that the scale is not so context-dependent that it cannot be
generalized to other testing contexts (Fulcher, 2010). Further decisions involve the number of band levels included in a scale (see e.g.,
Alderson, Clapham, & Wall, 1995; Attali, Lewis, & Steier, 2012).
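
To make these design decisions concrete, the brief Python sketch below shows the structure they produce. The criteria, band counts and descriptor texts are hypothetical placeholders, not the scale developed in this study: an analytic scale pairs each criterion with its own band descriptors and reports a score per criterion, whereas a holistic scale would use a single descriptor set and report one global score.

# Illustrative sketch only: the criteria, band levels and descriptor texts below
# are hypothetical placeholders, not the scale developed in this study.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    descriptors: dict[int, str]  # band level -> descriptor wording

# An analytic scale pairs every criterion with its own band descriptors and
# reports one score per criterion; a holistic scale would instead use a single
# descriptor set and report a single global score.
analytic_scale = [
    Criterion("task fulfilment", {k: f"band {k} descriptor" for k in range(1, 6)}),
    Criterion("coherence and cohesion", {k: f"band {k} descriptor" for k in range(1, 6)}),
    Criterion("language accuracy", {k: f"band {k} descriptor" for k in range(1, 6)}),
]

def analytic_profile(ratings: dict[str, int]) -> dict[str, int]:
    """Return the per-criterion score profile that an analytic scale reports."""
    return {c.name: ratings[c.name] for c in analytic_scale}

A call such as analytic_profile({"task fulfilment": 4, "coherence and cohesion": 3, "language accuracy": 3}) returns the kind of per-criterion profile that an analytic scale reports to score users, which is why the choice of scale type needs to match the outcome reported to users.
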
Rating scale validation is often not clearly articulated in scale development reports, which makes it difficult to conduct
comparisons between studies or replication research. Scale validation projects are also rarely framed within a theoretical model of scale
validation or validation in assessment. A brief review of recently published scale development and validation studies in language
assessments shows that very few of these studies were grounded within a theoretical model of validation (but see Deygers & Van
Gorp, 2015; Janssen, Meier, & Trace, 2015; Knoch, 2009; Lallmamode, Daud, & Kassim, 2016; Youn, 2015). In a recent paper
integrating rating processes into an argument-based framework to validation, Knoch and Chapelle (2017) put forward a range of
warrants, assumptions and possible sources of backing, many of which are directly relevant to the validation of rating scales. Drawing
on Kane’s (2001, 2006, 2013) conceptualization of inferences, warrants and assumptions, they were able to show that rating
processes are not only located within the evaluation inference as commonly conceptualized, but have relevance throughout most
inferences described in validation work. The warrants and assumptions relating to rating scales are thus not confined to the
scoring inference (as rating had previously been conceptualized), but show that rating scales relate more broadly to all inferences in an
argument-based approach to validation, including the explanation inference (which examines the theoretical construct underlying
the test and the scale), as well as test consequences and decisions. Their framework provides a useful starting point for rating scale
validation and we will draw on this framework to situate our validation work as outlined in the description of the current study
below. Due to the scope of this study, we focus only on parts of the evaluation and explanation inferences in this paper; however,
in the final section of the paper, we also provide suggestions for future work to broaden the validation activities. We list the specific
warrants and assumptions for which we sought backing for this study in Table 1 in the methodology section.
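
As a rough illustration of how such a framework organises validation work, the sketch below groups warrants, their underlying assumptions and the backing sought for them under the inferences they support. The entries are hypothetical placeholders loosely echoing this study's methods (many-facet Rasch analysis, a rater questionnaire, a communicative competence model); they are not the warrants and assumptions actually listed in Table 1.

# Illustrative sketch only: hypothetical warrants and backing grouped by inference,
# not the warrants and assumptions listed in Table 1 of this study.
from dataclasses import dataclass, field

@dataclass
class Warrant:
    statement: str            # claim that must hold for the inference to go through
    assumptions: list[str]    # assumptions underlying the warrant
    backing: list[str] = field(default_factory=list)  # evidence sought as backing

validity_argument = {
    "evaluation": [
        Warrant(
            statement="Raters apply the rating scale consistently",
            assumptions=["descriptors are interpretable", "rater training is adequate"],
            backing=["many-facet Rasch analysis of ratings", "rater questionnaire responses"],
        ),
    ],
    "explanation": [
        Warrant(
            statement="Scale criteria reflect the intended construct",
            assumptions=["the scale is grounded in a model of communicative competence"],
            backing=["review of the scale against the construct definition"],
        ),
    ],
}
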


1.2. Language tests for academic purposes

Tests designed for academic purposes should authentically reflect the writing skills needed by students for academic success
(Cumming, 2013, 2014). These skills vary from field to field, making the selection of writing tasks a difficult endeavor. Studies
conducted in Anglophone contexts have shown the diversity of genres and writing tasks required of university students across
academic disciplines (Canseco & Byrd, 1989; Cooper & Bikowski, 2007; Gardner & Nesi, 2012; Hale et al., 1996; Horowitz, 1986).
Research has also been conducted with faculty members and students regarding the importance of different academic writing skills
(Rosenfeld, Courtney, & Fowles, 2004; Rosenfeld, Leung, & Oltman, 2001) and these studies have highlighted the importance of
academic writing skills such as paraphrasing, and the ability to appropriately cite from a range of sources. For Spanish, such studies
are few, but they reflect to a large extent what has been found in Anglophone contexts (Castelló et al., 2012; Hernández & Castelló,
2014; Mendoza, 2014). Without a careful examination of the setting under assessment – in this case, academic writing – there is a risk
of under-representing or ill-defining the construct (Cumming, 2014).

