The Routledge Handbook of Language Testing
The Routledge Handbook of Language Testing offers a critical and comprehensive overview of
language testing and assessment within the fields of applied linguistics and language study.
An understanding of language testing is essential for applied linguistic research, language
education, and a growing range of public policy issues. This handbook is an indispensable
introduction and reference to the study of the subject. Specially commissioned chapters by
leading academics and researchers of language testing address the most important topics facing
researchers and practitioners, including:
An overview of the key issues in language testing
Key research methods and techniques in language test validation
The social and ethical aspects of language testing
The philosophical and historical underpinnings of assessment practices
The key literature in the field
Test design and development practices through the use of practical examples
The Routledge Handbook of Language Testing is the ideal resource for postgraduate students, language
teachers and those working in the field of applied linguistics.
Glenn Fulcher is Reader in Education (Applied Linguistics and Language Testing) at the
University of Leicester in the United Kingdom. His research interests include validation theory,
test and rating scale design, retrofit issues, assessment philosophy, and the politics of testing.
Fred Davidson is a Professor of Linguistics at the University of Illinois at Urbana-Champaign.
His interests include language test development and the history and philosophy of educational
and psychological measurement.
The Routledge Handbook of Language Testing
Edited by
Glenn Fulcher and Fred Davidson
First published 2012
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
Simultaneously published in the USA and Canada
by Routledge
711 Third Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2012 Selection and editorial matter, Glenn Fulcher and Fred Davidson; individual chapters,
the contributors.
The right of the editors to be identified as the authors of the editorial material, and of the
authors for their individual chapters, has been asserted in accordance with sections 77 and 78
of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any
form or by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying and recording, or in any information storage or retrieval system,
without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging in Publication Data
The Routledge handbook of language testing / edited by Glenn Fulcher and Fred
Davidson.
p. cm.
Includes bibliographical references and index.
1. Language and languages–Ability testing. I. Fulcher, Glenn. II. Davidson, Fred.
P53.4.R68 2011
418.0028'7–dc23
2011019617
ISBN: 978-0-415-57063-3 (hbk)
ISBN: 978-0-203-18128-7 (ebk)
Typeset in Times New Roman
by Taylor & Francis Books
Contents
List of illustrations
List of contributors

Introduction
Glenn Fulcher and Fred Davidson

PART I
Validity

1 Conceptions of validity
Carol A. Chapelle
2 Articulating a validity argument
Michael Kane
3 Validity issues in designing accommodations for English language learners
Jamal Abedi

PART II
Classroom assessment and washback

4 Classroom assessment
Carolyn E. Turner
5 Washback
Dianne Wall
6 Assessing young learners
Angela Hasselgreen
7 Dynamic assessment
Marta Antón
8 Diagnostic assessment in language classrooms
Eunice Eunhee Jang

PART III
The social uses of language testing

9 Designing language tests for specific social uses
Carol Lynn Moder and Gene B. Halleck
10 Language assessment for communication disorders
John W. Oller, Jr.
11 Language assessment for immigration and citizenship
Antony John Kunnan
12 Social dimensions of language testing
Richard F. Young

PART IV
Test specifications

13 Test specifications and criterion-referenced assessment
Fred Davidson
14 Evidence-centered design in language testing
Robert J. Mislevy and Chengbin Yin
15 Claims, evidence, and inference in performance assessment
Steven J. Ross

PART V
Writing items and tasks

16 Item writing and writers
Dong-il Shin
17 Writing integrated items
Lia Plakans
18 Test-taking strategies and task design
Andrew D. Cohen

PART VI
Prototyping and field tests

19 Prototyping new item types
Susan Nissan and Mary Schedl
20 Pre-operational testing
Dorry M. Kenyon and David MacGregor
21 Piloting vocabulary tests
John Read

PART VII
Measurement theory and practice

22 Classical test theory
James Dean Brown
23 Item response theory
Gary J. Ockey
24 Reliability and dependability
Neil Jones
25 The generalisability of scores from language tests
Rob Schoonen
26 Scoring performance tests
Glenn Fulcher

PART VIII
Administration and training

27 Quality management in test production and administration
Nick Saville
28 Interlocutor and rater training
Annie Brown
29 Technology in language testing
Yasuyo Sawaki
30 Validity and the automated scoring of performance tests
Xiaoming Xi

PART IX
Ethics and language policy

31 Ethical codes and unexpected consequences
Alan Davies
32 Fairness
F. Scott Walters
33 Standards-based testing
Thom Hudson
34 Language testing and language management
Bernard Spolsky

Index
Illustrations
Tables
7.1 Examiner–student discourse during DA episodes
8.1 Incremental granularity in score reporting (for the student JK)
14.1 Summary of evidence-centered design layers in the context of language testing
14.2 Design pattern attributes and relationships to assessment argument
14.3 A design pattern for assessing cause-and-effect reasoning in reading comprehension
14.4 Steps taken to redesign TOEFL iBT and TOEIC speaking and writing tests, as guided by layers in Evidence-Centered Design
23.1 Test taker response on a multiple-choice listening test
25.1 Scores for 10 persons on a seven-item test (fictitious data)
25.2 Analysis of variance table for the sample data
25.3 Scores for 15 persons on two speaking tasks rated twice on a 30-point scale (fictitious data)
25.4 Analysis of variance table (ptr) for the sample data 2 (Table 25.3)
25.5 Analysis of variance table (p(r:t)) for the sample data 2 (Table 25.3), with raters nested within task
26.1 Clustering scores by levels
33.1 Interagency Language Roundtable Levels and selected contexts – speaking
33.2 The Interagency Language Roundtable (ILR) to the American Council on the Teaching of Foreign Languages (ACTFL) concordance
33.3 Foreign Service Institute descriptor for Level 2 speaking
33.4 American Council on the Teaching of Foreign Languages Advanced descriptor
33.5 Example standards for foreign language learning
33.6 Example descriptors for the intermediate learner range of American Council on the Teaching of Foreign Languages K-12 Guidelines
33.7 Canadian Language Benchmarks speaking and listening competencies
33.8 Example global performance descriptor and performance conditions from the Canadian Language Benchmarks
33.9 Common European Framework—global scale
33.10 Example descriptors for the Common European Framework of Reference for Languages
33.11 California English Language Development Standards Listening and Speaking Strategies and Applications
33.12 California English Language Development Test blueprint from grade 2

Figures

8.1 Distracter characteristics curves (DCCs)
13.1 A sample score report for the internet-based TOEFL, Reading Subsection
13.2 The evolutionary role of test specs in test development
14.1 Toulmin's general structure for arguments
14.2 Extended Toulmin diagram for assessment
19.1 Schematic table
19.2 Summary task
19.3 Outline completion task
21.1 Sample item from the WAT for Dutch Children
23.1 Item characteristic curves for three items from the 1PL Rasch model
23.2 Item characteristic curves for 1PL, 2PL, and 3PL
23.3 Category response curves for a five-point rating scale
24.1 Some sources of error in a test
25.1 Generalisability coefficients (Eρ²) for different numbers of tasks and raters
26.1 Empirically derived, binary-choice, boundary definition scale schematic
27.1 Core processes
27.2 The assessment cycle
27.3 The assessment cycle showing periodic test review
32.1 Worksheet grid

Box

13.1 A sample test specification
Contributors
Jamal Abedi is a Professor at the School of Education of the University of California, Davis,
and a research partner at the National Center for Research on Evaluation, Standards, and Student
Testing. His research interests include accommodations and classification for English language learners,
and comparability of alternate assessments for students with significant cognitive disabilities.
Marta Antón is Associate Professor of Spanish at Indiana University–Purdue University Indianapolis and Research Fellow at the Indiana Center for Intercultural Communication. She has published on classroom interaction, sociocultural theory, dynamic assessment and Spanish sociolinguistics.
Annie Brown is a Principal Research Fellow with the Australian Council for Educational
Research. Her research interests include rater and interlocutor behaviour and the assessment of
oral proficiency.
James Dean (J.D.) Brown, Professor and Chair in the Department of Second Language
Studies at the University of Hawai’i at Manoa, has worked in places ranging from Brazil to
Yugoslavia, and has published numerous articles and books on language testing, curriculum
design, and research methods.
Carol A. Chapelle is Distinguished Professor in Liberal Arts and Sciences at Iowa State University, where she teaches courses in applied linguistics. She is author of books on technology
for language learning and assessment.
Andrew D. Cohen is Professor of Second Language Studies at the University of Minnesota. His
research interests are in language learner strategies, language assessment, pragmatics, and research
methods.
Alan Davies is Emeritus Professor of Applied Linguistics at the University of Edinburgh. His
research interests include language proficiency testing; language testing as a research methodology;
the role of ethics in applied linguistics and language testing; and the native speaker construct.
Gene B. Halleck is Professor of Linguistics/TESL at Oklahoma State University. She has designed a
set of diagnostic tests, the Oral Proficiency Tests for Aviation, to accompany an aviation English
curriculum designed for the US Federal Aviation Administration.
Angela Hasselgreen is Professor of Language Didactics at Bergen University College, Faculty
for Teacher Training, Norway. She has carried out research and published extensively on the
subject of assessing young language learners.
Thom Hudson is Professor of Second Language Studies at the University of Hawai’i. His research
has concentrated on second language reading, testing, language for specific purposes, and program
development.
Eunice Eunhee Jang is Associate Professor at the Ontario Institute for Studies in Education,
University of Toronto. Her research areas include diagnostic assessment for improving English
language learners’ English development, validation of English proficiency descriptors-based
assessment for K-12 English language learners, evaluation of school effectiveness in challenging
circumstances, and fairness issues.
Neil Jones is currently Director of the European Survey on Language Competences within the
Research and Validation Unit at the University of Cambridge, ESOL Examinations. His interests
include applying IRT to item banking, analysis and computer-adaptive testing, formative assessment,
scaling, and cross-language equating and standard-setting within multilingual proficiency frameworks.
Michael Kane holds the Samuel J. Messick Chair in Validity at Educational Testing Service in
Princeton, NJ. His main research interests are in validity theory and practice, in test score precision and errors of measurement, in differential performance and bias across racial/ethnic and
other groups, and in standard setting.
Dorry M. Kenyon, Vice-President and Director of the Language Testing Division at the
Center for Applied Linguistics in Washington, DC, is particularly interested in the application of
new technologies to the development and delivery of content and language assessments.
Antony John Kunnan is a Professor of Education at California State University, Los Angeles
and Honorary Professor of Education at the University of Hong Kong. His interests include test
fairness, test evaluation, structural equation modelling, differential item functioning, and language
requirements for immigration and citizenship.
David MacGregor is Manager of the Academic Language Testing Research and Development
Team at the Center for Applied Linguistics. His research interests include exploring the construct
of academic language and applications of cognitive linguistics to language testing.
Robert J. Mislevy is Professor of Measurement, Statistics and Evaluation and Affiliated Professor
of Second Language Acquisition at the University of Maryland at College Park. His research applies
developments in psychology, statistics, and technology to practical problems in educational assessment.
Carol Lynn Moder is Professor of Linguistics/TESL at Oklahoma State University. She has designed
aviation English assessments for the International Training Division of the US Federal Aviation
Administration and for Ordinate Corporation. She served as a consultant to the International Civil
Aviation Organization in 2005.
Susan Nissan is a Director of English Language Learning Assessments at Educational Testing
Service. Her current research interests include assessing listening, the use of corpora in assessment
and standard setting.
Gary J. Ockey is Director of Assessment in the English Language Institute at Kanda University of
International Studies in Chiba, Japan. He has published numerous theoretical and quantitative
articles in the field of language assessment.
John W. Oller, Jr. is the University of Louisiana Hawthorne Regents Professor IV. His current research focuses on communication disorders attributable to disrupted biological and social
control systems from genetics to human languages.
Lia Plakans is Assistant Professor in Foreign Language/ESL Education at the University of
Iowa. Her research focuses on assessing integrated skills and connections in L2 reading and
writing, language planning and policy, and second language learning.
John Read is an Associate Professor in Applied Language Studies at the University of Auckland, New Zealand. His research interests are in vocabulary assessment and the testing of English
for academic and professional purposes.
Steven J. Ross is Professor of Second Language Acquisition at the University of Maryland’s
School of Language, Literature and Culture. His research interests are focused on assessment of
language proficiency and performance, and how assessment methods provide the evidential basis
for models of instructed and naturalistic second language acquisition.
Nick Saville is Director of Research and Validation at the University of Cambridge, ESOL
Examinations. His research includes the development of Reference Level Descriptions for English to
supplement the Common European Framework of Reference (CEFR) and language assessment in
the context of migration.
Yasuyo Sawaki is Associate Professor of foreign language education and applied linguistics at
Waseda University in Tokyo, Japan. She has research interests in diverse topics in second/foreign language assessment including test validation, diagnosing language ability, test preparation,
and using technology in language assessment.
Mary Schedl is a Second Language Test Specialist at Educational Testing Service. Her current
research interests include assessing reading comprehension, especially the interrelationship of
specific text characteristics and item difficulty.
Rob Schoonen is Associate Professor of Second Language Acquisition at the University of
Amsterdam. His main research interests are second and foreign language acquisition, models of
language proficiency, language testing and research methodology.
Dong-il Shin is Professor in the Department of English Language and Literature at Chung-Ang University, Seoul, South Korea. His research interests include test development and validation, the interfaces between language testing and other academic disciplines, and the social
dimensions of English language assessment in South Korea.
Bernard Spolsky is Professor emeritus in the English Department at Bar-Ilan University. He
has published widely in applied linguistics, with a focus on language assessment, language policy
and language management.
Carolyn E. Turner is Associate Professor in the Department of Integrated Studies in Education,
McGill University. Her main focus and commitment is language testing/assessment within educational
settings.
Dianne Wall teaches language testing at Lancaster University (UK), and is Head of Research at
Trinity College London examinations board. She specialises in language test construction and
evaluation, and the use of innovation theory to study the washback and impact of new tests.
F. Scott Walters teaches L2 assessment at Southern Connecticut State University. His research
involves L2 pragmatics testing, conversation analysis and assessment-literacy development.
Xiaoming Xi is a Senior Research Scientist at Educational Testing Service. Her areas of
interest include factors affecting performance on speaking tests, issues related to speech scoring,
automated scoring of speech, and validity and fairness issues in the broader context of test use.
Chengbin Yin specializes in Second and Foreign Language Education at the University of Maryland
at College Park. Her current research focuses on assessment design from a sociocognitive/
interactionist perspective and measurement models in educational assessment.
Richard F. Young is Professor of English at the University of Wisconsin-Madison. His research
is concerned with the relationship between language and social context, with a particular focus
on variation in interlanguage morphology, talking and testing, language and interaction, and
discursive practices in language teaching and learning.