The Routledge Handbook of Language Testing
The Routledge Handbook of Language Testing offers a critical and comprehensive overview of
language testing and assessment within the fields of applied linguistics and language study.
An understanding of language testing is essential for applied linguistic research, language
education, and a growing range of public policy issues. This handbook is an indispensable
introduction and reference to the study of the subject. Specially commissioned chapters by
leading academics and researchers of language testing address the most important topics facing
researchers and practitioners, including:
An overview of the key issues in language testing
Key research methods and techniques in language test validation
The social and ethical aspects of language testing
The philosophical and historical underpinnings of assessment practices
The key literature in the field
Test design and development practices through the use of practical examples
The Routledge Handbook of Language Testing is the ideal resource for postgraduate students, language
teachers and those working in the field of applied linguistics.
Glenn Fulcher is Reader in Education (Applied Linguistics and Language Testing) at the
University of Leicester in the United Kingdom. His research interests include validation theory,
test and rating scale design, retrofit issues, assessment philosophy, and the politics of testing.
Fred Davidson is a Professor of Linguistics at the University of Illinois at Urbana-Champaign.
His interests include language test development and the history and philosophy of educational
and psychological measurement.
The Routledge Handbook of Language Testing
Edited by
Glenn Fulcher and Fred Davidson
First published 2012
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
Simultaneously published in the USA and Canada
by Routledge
711 Third Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2012 Selection and editorial matter, Glenn Fulcher and Fred Davidson; individual chapters,
the contributors.
The right of the editors to be identified as the authors of the editorial material, and of the
authors for their individual chapters, has been asserted in accordance with sections 77 and 78
of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any
form or by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying and recording, or in any information storage or retrieval system,
without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging in Publication Data
The Routledge handbook of language testing / edited by Glenn Fulcher and Fred
Davidson.
p. cm.
Includes bibliographical references and index.
1. Language and languages–Ability testing. I. Fulcher, Glenn. II. Davidson, Fred.
P53.4.R68 2011
418.0028'7–dc23
2011019617
ISBN: 978-0-415-57063-3 (hbk)
ISBN: 978-0-203-18128-7 (ebk)
Typeset in Times New Roman
by Taylor & Francis Books
Contents
List of illustrations
List of contributors

Introduction
Glenn Fulcher and Fred Davidson

PART I
Validity

1 Conceptions of validity
Carol A. Chapelle
2 Articulating a validity argument
Michael Kane
3 Validity issues in designing accommodations for English language learners
Jamal Abedi

PART II
Classroom assessment and washback

4 Classroom assessment
Carolyn E. Turner
5 Washback
Dianne Wall
6 Assessing young learners
Angela Hasselgreen
7 Dynamic assessment
Marta Antón
8 Diagnostic assessment in language classrooms
Eunice Eunhee Jang

PART III
The social uses of language testing

9 Designing language tests for specific social uses
Carol Lynn Moder and Gene B. Halleck
10 Language assessment for communication disorders
John W. Oller, Jr.
11 Language assessment for immigration and citizenship
Antony John Kunnan
12 Social dimensions of language testing
Richard F. Young

PART IV
Test specifications

13 Test specifications and criterion-referenced assessment
Fred Davidson
14 Evidence-centered design in language testing
Robert J. Mislevy and Chengbin Yin
15 Claims, evidence, and inference in performance assessment
Steven J. Ross

PART V
Writing items and tasks

16 Item writing and writers
Dong-il Shin
17 Writing integrated items
Lia Plakans
18 Test-taking strategies and task design
Andrew D. Cohen

PART VI
Prototyping and field tests

19 Prototyping new item types
Susan Nissan and Mary Schedl
20 Pre-operational testing
Dorry M. Kenyon and David MacGregor
21 Piloting vocabulary tests
John Read

PART VII
Measurement theory and practice

22 Classical test theory
James Dean Brown
23 Item response theory
Gary J. Ockey
24 Reliability and dependability
Neil Jones
25 The generalisability of scores from language tests
Rob Schoonen
26 Scoring performance tests
Glenn Fulcher

PART VIII
Administration and training

27 Quality management in test production and administration
Nick Saville
28 Interlocutor and rater training
Annie Brown
29 Technology in language testing
Yasuyo Sawaki
30 Validity and the automated scoring of performance tests
Xiaoming Xi

PART IX
Ethics and language policy

31 Ethical codes and unexpected consequences
Alan Davies
32 Fairness
F. Scott Walters
33 Standards-based testing
Thom Hudson
34 Language testing and language management
Bernard Spolsky

Index
Illustrations
Tables
7.1 Examiner–student discourse during DA episodes
8.1 Incremental granularity in score reporting (for the student JK)
14.1 Summary of evidence-centered design layers in the context of language testing
14.2 Design pattern attributes and relationships to assessment argument
14.3 A design pattern for assessing cause-and-effect reasoning in reading comprehension
14.4 Steps taken to redesign TOEFL iBT and TOEIC speaking and writing tests, as guided by layers in Evidence-Centered Design
23.1 Test taker response on a multiple-choice listening test
25.1 Scores for 10 persons on a seven-item test (fictitious data)
25.2 Analysis of variance table for the sample data
25.3 Scores for 15 persons on two speaking tasks rated twice on a 30-point scale (fictitious data)
25.4 Analysis of variance table (ptr) for the sample data 2 (Table 25.3)
25.5 Analysis of variance table (p(r:t)) for the sample data 2 (Table 25.3), with raters nested within task
26.1 Clustering scores by levels
33.1 Interagency Language Roundtable Levels and selected contexts – speaking
33.2 The Interagency Language Roundtable (ILR) to the American Council on the Teaching of Foreign Languages (ACTFL) concordance
33.3 Foreign Service Institute descriptor for Level 2 speaking
33.4 American Council on the Teaching of Foreign Languages Advanced descriptor
33.5 Example standards for foreign language learning
33.6 Example descriptors for the intermediate learner range of American Council on the Teaching of Foreign Languages K-12 Guidelines
33.7 Canadian Language Benchmarks speaking and listening competencies
33.8 Example global performance descriptor and performance conditions from the Canadian Language Benchmarks
33.9 Common European Framework—global scale
33.10 Example descriptors for the Common European Framework of Reference for Languages
33.11 California English Language Development Standards Listening and Speaking Strategies and Applications
33.12 California English Language Development Test blueprint from grade 2

Figures

8.1 Distracter characteristics curves (DCCs)
13.1 A sample score report for the internet-based TOEFL, Reading Subsection
13.2 The evolutionary role of test specs in test development
14.1 Toulmin's general structure for arguments
14.2 Extended Toulmin diagram for assessment
19.1 Schematic table
19.2 Summary task
19.3 Outline completion task
21.1 Sample item from the WAT for Dutch Children
23.1 Item characteristic curves for three items from the 1PL Rasch model
23.2 Item characteristic curves for 1PL, 2PL, and 3PL
23.3 Category response curves for a five-point rating scale
24.1 Some sources of error in a test
25.1 Generalisability coefficients (Eρ²) for different numbers of tasks and raters
26.1 Empirically derived, binary-choice, boundary definition scale schematic
27.1 Core processes
27.2 The assessment cycle
27.3 The assessment cycle showing periodic test review
32.1 Worksheet grid

Box

13.1 A sample test specification
Contributors
Jamal Abedi is a Professor at the School of Education of the University of California, Davis,
and a research partner at the National Center for Research on Evaluation, Standards, and Student
Testing. His research interests include accommodations and classification for English language learners,
and comparability of alternate assessments for students with significant cognitive disabilities.
Marta Antón is Associate Professor of Spanish at Indiana University–Purdue University Indianapolis and Research Fellow at the Indiana Center for Intercultural Communication. She has published on classroom interaction, sociocultural theory, dynamic assessment and Spanish sociolinguistics.
Annie Brown is a Principal Research Fellow with the Australian Council for Educational
Research. Her research interests include rater and interlocutor behaviour and the assessment of
oral proficiency.
James Dean (J.D.) Brown, Professor and Chair in the Department of Second Language
Studies at the University of Hawai’i at Manoa, has worked in places ranging from Brazil to
Yugoslavia, and has published numerous articles and books on language testing, curriculum
design, and research methods.
Carol A. Chapelle is Distinguished Professor in Liberal Arts and Sciences at Iowa State University, where she teaches courses in applied linguistics. She is author of books on technology
for language learning and assessment.
Andrew D. Cohen is Professor of Second Language Studies at the University of Minnesota. His
research interests are in language learner strategies, language assessment, pragmatics, and research
methods.
Alan Davies is Emeritus Professor of Applied Linguistics at the University of Edinburgh. His
research interests include language proficiency testing; language testing as a research methodology;
the role of ethics in applied linguistics and language testing; and the native speaker construct.
Gene B. Halleck is Professor of Linguistics/TESL at Oklahoma State University. She has designed a
set of diagnostic tests, the Oral Proficiency Tests for Aviation, to accompany an aviation English
curriculum designed for the US Federal Aviation Administration.
Angela Hasselgreen is Professor of Language Didactics at Bergen University College, Faculty
for Teacher Training, Norway. She has carried out research and published extensively on the
subject of assessing young language learners.
Thom Hudson is Professor of Second Language Studies at the University of Hawai’i. His research
has concentrated on second language reading, testing, language for specific purposes, and program
development.
Eunice Eunhee Jang is Associate Professor at the Ontario Institute for Studies in Education,
University of Toronto. Her research areas include diagnostic assessment for improving English
language learners’ English development, validation of English proficiency descriptors-based
assessment for K-12 English language learners, evaluation of school effectiveness in challenging
circumstances, and fairness issues.
Neil Jones is currently Director of the European Survey on Language Competences within the
Research and Validation Unit at the University of Cambridge, ESOL Examinations. His interests
include applying IRT to item banking, analysis and computer-adaptive testing, formative assessment,
scaling, and cross-language equating and standard-setting within multilingual proficiency frameworks.
Michael Kane holds the Samuel J. Messick Chair in Validity at Educational Testing Service in
Princeton, NJ. His main research interests are in validity theory and practice, in test score precision and errors of measurement, in differential performance and bias across racial/ethnic and
other groups, and in standard setting.
Dorry M. Kenyon, Vice-President and Director of the Language Testing Division at the
Center for Applied Linguistics in Washington, DC, is particularly interested in the application of
new technologies to the development and delivery of content and language assessments.
Antony John Kunnan is a Professor of Education at California State University, Los Angeles
and Honorary Professor of Education at the University of Hong Kong. His interests include test
fairness, test evaluation, structural equation modelling, differential item functioning, and language
requirements for immigration and citizenship.
David MacGregor is Manager of the Academic Language Testing Research and Development
Team at the Center for Applied Linguistics. His research interests include exploring the construct
of academic language and applications of cognitive linguistics to language testing.
Robert J. Mislevy is Professor of Measurement, Statistics and Evaluation and Affiliated Professor
of Second Language Acquisition at the University of Maryland at College Park. His research applies
developments in psychology, statistics, and technology to practical problems in educational assessment.
Carol Lynn Moder is Professor of Linguistics/TESL at Oklahoma State University. She has designed
aviation English assessments for the International Training Division of the US Federal Aviation
Administration and for Ordinate Corporation. She served as a consultant to the International Civil
Aviation Organization in 2005.
Susan Nissan is a Director of English Language Learning Assessments at Educational Testing
Service. Her current research interests include assessing listening, the use of corpora in assessment
and standard setting.
Gary J. Ockey is Director of Assessment in the English Language Institute at Kanda University of
International Studies in Chiba, Japan. He has published numerous theoretical and quantitative
articles in the field of language assessment.
John W. Oller, Jr. is the University of Louisiana Hawthorne Regents Professor IV. His current research focuses on communication disorders attributable to disrupted biological and social
control systems from genetics to human languages.
Lia Plakans is Assistant Professor in Foreign Language/ESL Education at the University of
Iowa. Her research focuses on assessing integrated skills and connections in L2 reading and
writing, language planning and policy, and second language learning.
John Read is an Associate Professor in Applied Language Studies at the University of Auckland, New Zealand. His research interests are in vocabulary assessment and the testing of English
for academic and professional purposes.
Steven J. Ross is Professor of Second Language Acquisition at the University of Maryland’s
School of Language, Literature and Culture. His research interests are focused on assessment of
language proficiency and performance, and how assessment methods provide the evidential basis
for models of instructed and naturalistic second language acquisition.
Nick Saville is Director of Research and Validation at the University of Cambridge, ESOL
Examinations. His research includes the development of Reference Level Descriptions for English to
supplement the Common European Framework of Reference (CEFR) and language assessment in
the context of migration.
Yasuyo Sawaki is Associate Professor of foreign language education and applied linguistics at
Waseda University in Tokyo, Japan. She has research interests in diverse topics in second/foreign language assessment including test validation, diagnosing language ability, test preparation,
and using technology in language assessment.
Mary Schedl is a Second Language Test Specialist at Educational Testing Service. Her current
research interests include assessing reading comprehension, especially the interrelationship of
specific text characteristics and item difficulty.
Rob Schoonen is Associate Professor of Second Language Acquisition at the University of
Amsterdam. His main research interests are second and foreign language acquisition, models of
language proficiency, language testing and research methodology.
Dong-il Shin is Professor in the Department of English Language and Literature at Chung-Ang University, Seoul, South Korea. His research interests include test development and validation, the interfaces between language testing and other academic disciplines, and the social
dimensions of English language assessment in South Korea.
Bernard Spolsky is Professor emeritus in the English Department at Bar-Ilan University. He
has published widely in applied linguistics, with a focus on language assessment, language policy
and language management.
Carolyn E. Turner is Associate Professor in the Department of Integrated Studies in Education,
McGill University. Her main focus and commitment is language testing/assessment within educational
settings.
Dianne Wall teaches language testing at Lancaster University (UK), and is Head of Research at
Trinity College London examinations board. She specialises in language test construction and
evaluation, and the use of innovation theory to study the washback and impact of new tests.
F. Scott Walters teaches L2 assessment at Southern Connecticut State University. His research
involves L2 pragmatics testing, conversation analysis and assessment-literacy development.
Xiaoming Xi is a Senior Research Scientist at Educational Testing Service. Her areas of
interest include factors affecting performance on speaking tests, issues related to speech scoring,
automated scoring of speech, and validity and fairness issues in the broader context of test use.
Chengbin Yin specializes in Second and Foreign Language Education at the University of Maryland
at College Park. Her current research focuses on assessment design from a sociocognitive/
interactionist perspective and measurement models in educational assessment.
Richard F. Young is Professor of English at the University of Wisconsin-Madison. His research
is concerned with the relationship between language and social context, with a particular focus
on variation in interlanguage morphology, talking and testing, language and interaction, and
discursive practices in language teaching and learning.