Quality of Telephone-Based Spoken Dialogue Systems, part 3

Quality of Human-Machine Interaction over the Phone

has later been modified to better predict the effects of ambient noise, quantizing distortion, and time-variant impairments like lost frames or packets. The current model version is described in detail in ITU-T Rec. G.107 (2003).

The idea underlying the E-model is to first transform the effects of individual impairments (e.g. those caused by noise, echo, or delay) to an intermediate ‘transmission rating scale’. During this transformation, instrumentally measurable parameters of the transmission path are converted into the amount of degradation they provoke, called ‘impairment factors’. Three types of impairment factors, reflecting three types of degradations, are calculated:

- All degradations which occur simultaneously with the speech signal, e.g. an overly loud connection, quantizing noise, or a non-optimum sidetone, are expressed by the simultaneous impairment factor Is.

- All degradations which are delayed relative to the speech signal, e.g. the effects of pure delay (in a conversation) or of listener and talker echo, are expressed by the delayed impairment factor Id.

- All degradations resulting from low bit-rate codecs, partly also under transmission error conditions, are expressed by the effective equipment impairment factor Ie,eff. Ie,eff takes the equipment impairment factor for the error-free case, Ie, into account.

These types of degradations do not necessarily reflect the quality dimensions which can be obtained in a multidimensional auditory scaling experiment. In fact, such dimensions have been identified as “intelligibility” or “overall clarity”, “naturalness” or “fidelity”, loudness, color of sound, or the distinction between background and signal distortions (McGee, 1964; McDermott, 1969; Bappert and Blauert, 1994). Instead, the impairment factors of the E-model have been chosen for practical reasons, to distinguish between parameters which can easily be measured and handled in the network planning process.

The different impairment factors are subtracted from the highest possible transmission rating level Ro, which is determined by the overall signal-to-noise ratio of the connection. This ratio is calculated assuming a standard active speech level of 26 dB below the overload point of the digital system (cf. the definition of the active speech level in ITU-T Rec. P.56, 1993), and taking into account the SLR and RLR loudness ratings, the circuit noise Nc, the noise floor Nfor, and the ambient room noise. An allowance is added to the transmission rating level to reflect differences in user expectation towards networks that differ from the standard wireline one (e.g. cordless or mobile phones), expressed by a so-called ‘advantage of access’ factor A. For a discussion of this factor see Möller (2000). As a result, the overall transmission rating factor R of the connection can be calculated as

$$R = R_o - I_s - I_d - I_{e,\mathrm{eff}} + A$$
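The subtraction of the impairment factors from Ro can be illustrated with a minimal Python sketch. It assumes that Ro, Is, Id and Ie,eff have already been computed with the formulae of ITU-T Rec. G.107; the class name, the function names and the numerical values in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class EModelFactors:
    """Pre-computed E-model terms (field names are illustrative assumptions)."""
    ro: float       # highest possible transmission rating level Ro
    i_s: float      # simultaneous impairment factor Is
    i_d: float      # delayed impairment factor Id
    ie_eff: float   # effective equipment impairment factor Ie,eff
    a: float = 0.0  # advantage of access factor A


def transmission_rating(f: EModelFactors) -> float:
    """Subtract the impairment factors from Ro and add the allowance A."""
    return f.ro - f.i_s - f.i_d - f.ie_eff + f.a


def largest_impairment(f: EModelFactors) -> str:
    """Name the impairment factor that causes the largest degradation."""
    contributions = {"Is": f.i_s, "Id": f.i_d, "Ie,eff": f.ie_eff}
    return max(contributions, key=contributions.get)
```

With invented values such as `EModelFactors(ro=94.0, i_s=1.5, i_d=10.0, ie_eff=11.0)`, `transmission_rating` yields 71.5 and `largest_impairment` points to Ie,eff, i.e. to the codec as the dominant source of degradation in this made-up case.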

This transmission rating factor is the principal output of the E-model. It reflects the overall quality level of the connection which is described by the input parameters discussed in the last section. For normal parameter settings, R can be transformed to an estimate of a mean user judgment on the 5-point ACR quality scale defined in ITU-T Rec. P.800 (1996), using the fixed S-shaped relationship

$$\mathrm{MOS} = 1 + 0.035\,R + R\,(R - 60)\,(100 - R) \cdot 7 \cdot 10^{-6} \quad \text{for } 0 < R < 100,$$

with MOS = 1 for R ≤ 0 and MOS = 4.5 for R ≥ 100 (ITU-T Rec. G.107, 2003).
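The mapping from R to an estimated MOS can be sketched in a few lines of Python. The sketch uses the polynomial published in ITU-T Rec. G.107, MOS = 1 + 0.035 R + R (R - 60)(100 - R) · 7 · 10^-6 for 0 < R < 100, with the end values 1 and 4.5 outside that range; the function name `r_to_mos` is an assumption made for this example.

```python
def r_to_mos(r: float) -> float:
    """Estimate the mean opinion score from the transmission rating factor R.

    Polynomial as published in ITU-T Rec. G.107; the result is fixed
    to 1 for R <= 0 and to 4.5 for R >= 100.
    """
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6
```

For example, `r_to_mos(50.0)` evaluates to 1 + 1.75 - 0.175 = 2.575, and values of R slightly above 93 give an estimated MOS of about 4.4.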

Both the transmission rating factor R and the estimated mean opinion score MOS give an indication of the overall quality of the connection. They can be related to the network planning quality classes defined in ITU-T Rec. G.109 (1999), see Table 2.5. For the network planner, not only the overall R value is important, but also the single contributions (Ro, Is, Id and Ie,eff), because they indicate the sources of the quality degradations and point to potential remedies (e.g. introducing an echo canceller). Other formulae exist for relating R to the percentage of users rating a connection good or better (%GoB) or poor or worse (%PoW).
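How R relates to %GoB, %PoW and the planning classes can be sketched as follows. The sketch assumes the Gaussian-based relations given in ITU-T Rec. G.107, %GoB = 100 · E((R - 60)/16) and %PoW = 100 · E((45 - R)/16), where E is the standard normal cumulative distribution, and the five 10-point R bands of ITU-T Rec. G.109 as recalled here; the band labels should be checked against Table 2.5. All function names are hypothetical.

```python
import math


def _std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def percent_gob(r: float) -> float:
    """Percentage of users rating the connection good or better."""
    return 100.0 * _std_normal_cdf((r - 60.0) / 16.0)


def percent_pow(r: float) -> float:
    """Percentage of users rating the connection poor or worse."""
    return 100.0 * _std_normal_cdf((45.0 - r) / 16.0)


def g109_category(r: float) -> str:
    """Map R to a speech transmission quality category (assumed G.109 bands)."""
    bands = ((90.0, "best"), (80.0, "high"), (70.0, "medium"),
             (60.0, "low"), (50.0, "poor"))
    for lower_bound, label in bands:
        if r >= lower_bound:
            return label
    return "not recommended"
```

By construction, R = 60 corresponds to 50 %GoB and R = 45 to 50 %PoW, which makes the two relations easy to sanity-check.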

The exact formulae for calculating Ro, Is, Id, and Ie,eff are given in ITU-T Rec. G.107 (2003). For Ie and A, fixed values are defined in ITU-T Appendix I to Rec. G.113 (2002) and ITU-T Rec. G.107 (2003). Another example of a network planning model is the SUBMOD model developed by British Telecom (ITU-T Suppl. 3 to P-Series Rec., 1993), which is based on ideas from Richards (1973).

If the network has already been set up, it is possible to obtain realistic measurements of major parts of the network equipment. The measurements can be performed either off-line (intrusively, when the equipment is taken out of network operation), or on-line in operating networks (non-intrusive measurement). In operating networks, however, it might be difficult to access the user interfaces; therefore, standard values are taken for this part of the transmission chain. The measured input parameters or signals can be used as an input to the signal-based or network planning models, which are then called monitoring models. In this way, it becomes possible to monitor quality for the specific network under consideration. Different models and model combinations can be envisaged, and details can be found in the literature (Möller and Raake, 2002; ITU-T Rec. P.562, 2004; Ludwig, 2003).

The quality aspects which may be predicted follow from the principles used by the models. Current signal-based measures predict only one-way voice transmission quality for the specific parts of the transmission channel that they have been optimized for. These predictions usually reach a high accuracy because adequate input parameters are available. In contrast, network planning models like the E-model base their predictions on simplified and perhaps imprecisely estimated planning values. In addition to one-way voice transmission quality, they cover conversational aspects and, to a certain extent, the effects caused by the service and its context of use. All models which have been described in this section address HHI over the phone. Investigations on how they may be used in HMI for predicting ASR performance are described in Chapter 4, and for synthesized speech in Chapter 5.

2.4.2 SDS Specification

The specification phase of an SDS may be of crucial importance for the success of a service. An appropriate specification gives an indication of the scale of the whole task, increases the modularity of the system, allows problems to be spotted early, and is particularly suited to checking the functionality of the system to be set up. The specification should begin with a survey of user requirements: Who are the potential users, and where, why and how will they use the service?

Before starting with an exact specification of a service and the underlying system, the target functionality has to be clarified. Several authors point out that system functionality may be a very critical issue for the success of a service. For example, Lamel et al. (1998b) reported that the prototype users of their French ARISE system for train information did not differentiate between the service functionality (operative functions) and the system responses, which may be critically determined by the technical functions. If the system informs the user about its limitations, the system response may be appropriate under the given constraints, but completely unsatisfying for the user. Thus, systems which are well-designed from a technological and from an interaction point of view may be unusable because of a restricted functionality.

In order to design systems and services which are usable, human factors issues should be taken into account early in the specification phase (Dybkjær and Bernsen, 2000). The specification should cover all aspects which potentially influence the system usability, including its ease of use, its capability to perform a natural, flexible and robust dialogue with the user, a sufficient task domain coverage, and contextual factors in the deployment of the SDS (e.g. service improvement or economic benefit). The following information needs to be specified:

- Application domain and task. Although developers are seeking application-independent systems, there are a number of basic design decisions which depend on the specific application under consideration. Within a domain, different tasks may require completely different solutions, e.g. an information task may be insensitive to security requirements, whereas the corresponding reservation may require the communication of a credit card number and thus may be inappropriate for the speech modality. The application will also determine the linguistic aspects of the interaction (vocabulary, syntax, etc.).

- User and task requirements. They may be determined from recordings of human services if the corresponding situation exists, or via interviews in the case of new tasks which have no prior history in HHI.

- Intended user group.

- Contextual factors. They may be amongst the most important factors influencing users' satisfaction with SDSs, and include service improvement (longer opening hours, introduction of new functionalities, avoidance of queues, etc.) and economic benefits (e.g. users pay less for an SDS service than for a human one), see Dybkjær and Bernsen (2000).

- Common knowledge which will have to be shared between the human user and the SDS. This knowledge will arise from the application domain and task, and will have to be specified in terms of an initial vocabulary and language model, the required speech understanding capability, and the speech output capability.

- Common knowledge which will have to be shared between the SDS and the underlying application, and the corresponding interface (e.g. SQL).

- Knowledge to be included in the user model, cf. the discussion of user models in Section 2.1.3.4.
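The items listed so far could be collected in a simple record during the specification phase. The following Python sketch is purely illustrative: the class `SDSSpecification`, its field names and the helper `missing_items` are assumptions made for this example and do not stem from the cited literature.

```python
from dataclasses import dataclass, field


@dataclass
class SDSSpecification:
    """Hypothetical container for the specification items listed above."""
    application_domain: str
    tasks: list[str]
    user_requirements: list[str] = field(default_factory=list)
    intended_user_group: str = ""
    contextual_factors: list[str] = field(default_factory=list)
    user_system_knowledge: dict[str, str] = field(default_factory=dict)
    system_application_interface: str = ""
    user_model_knowledge: list[str] = field(default_factory=list)

    def missing_items(self) -> list[str]:
        """Return the names of all items which are still empty."""
        return [name for name, value in vars(self).items() if not value]
```

A draft such as `SDSSpecification("train information", ["timetable query"])` would then report the intended user group, the contextual factors and the remaining items as still to be specified.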
