Title: Talking to Computers: An Empirical Investigation
Journal: Carnegie Mellon University Research Showcase, Computer Science Department
Link:
Introduction:
This research article was written by Alexander G. Hauptmann and Alexander I. Rudnicky of the Department of Computer Science, Carnegie-Mellon University. It describes an empirical study of man-computer speech interaction.
The objectives of this study are:
1) To study how speech input to a computer differs from interpersonal spoken communication.
2) To describe the differences between the speech-to-computer, speech-to-human and typing-to-computer modes.
The purpose of this study is:
To compare three communication modes (speech-to-computer, speech-to-human and typing-to-computer) and to show that speech to a computer is not as ill-formed as one might expect.
Statement of problem:
The differences between the three communication modes (speech-to-computer, speech-to-human and typing-to-computer) are difficult to identify, and there is a common misconception that speech to a computer is much more ill-formed than speech to a human.
Research questions:
1) How does speech input to a computer differ from interpersonal spoken communication?
2) What are the differences between the speech-to-computer, speech-to-human and typing-to-computer modes?
Theoretical framework:
No specific theory is used in this investigation.
Sample:
A total of forty subjects were drawn from the population of electronic mail users in the Computer Science Department at C-MU. Ten of them served as pilot subjects to test and debug the experimental setup. The remaining thirty subjects were classified as non-proficient users based on their electronic mail experience. Furthermore, some of the subjects had no experience with the particular mail system used in this study.
Method:
Methodology
Three modes of communication were tested in this investigation: speech-to-computer, speech-to-human and typing-to-computer. Ten subjects were randomly assigned to each communication mode, and each subject completed three sessions. The data were then analysed separately for each mode.
In the speech-to-computer mode, subjects were told that the computer could understand their utterances, with occasional help from the experimenter. The experimenter sat in the adjacent room and transcribed all commands into equivalent system commands, so that it appeared as if the system itself had understood the utterances. However, when subjects were in editing mode, speech input was disabled and they had to edit manually using the keyboard. Subjects were asked to speak all commands to the system, but how they phrased them was up to them.
In the speech-to-human mode, the experimenter was placed in the same room as the subject and translated the subject's utterances into typed commands for the electronic mail system. Again, speech input was disabled in editing mode, and subjects had to edit manually using the keyboard.
In the typing-to-computer mode, subjects were led to believe that a natural language computer mail system could interpret their typing. This mode was the same as the speech-to-computer mode except that there was no speech input: subjects were told to type everything themselves. The "system", in other words the hidden experimenter, then processed the subjects' input.
Procedure
Subjects were first given a questionnaire designed to determine their level of familiarity with electronic mail. They were then given background information and instructions specific to their communication mode. A total of nine tasks were given to the subjects, each involving the mail database file the subjects were working on. The tasks included replying to mail, locating information about previously sent mail and adding a carbon copy of some new mail to the file. Each subject received the same tasks in the same order. The first three sessions were treated as training sessions and were not included in the final analysis, as they were meant to double-check the equipment and to ensure that the subject had understood the tasks. In addition, an attitude questionnaire was given to assess the subjects' feedback on their particular interaction mode. A time-stamped screen image, together with the voice commands, was recorded in each session, and the videotape recordings were transcribed. The typed input from the typing mode provided comparable data. The total time taken for each task was also recorded.
Instruments for data gathering and instruments for data analysis:
The data collected were grouped into four categories: attitude, communication, errors and syntax. Styles of interaction were also emphasized. The data were transcribed and summarized in an ANOVA summary table.
Findings:
A sample transcript for a subject can be found in Figures 1, 2 and 3. Table 1 summarizes the results of the statistical analyses performed on the quantitative data. A total of 3233 words were spoken by the subjects in 708 utterances. The total vocabulary consisted of 304 distinct words.
- Attitude: Subjects felt positive about the experiment, as indicated by the mean score of 30.3. There was no significant difference in attitude between the three groups of subjects.
- Communicative Variables: The number of utterances per session and the time to completion did not differ much between the three groups. However, the total number of words used to solve the tasks showed major differences. The two speech groups used considerably more words: speech-to-computer subjects averaged 60.35 words and speech-to-human subjects about 65.5 words, compared with only 36.8 words for the typing group. Utterance length also differed between groups. Speech-to-computer had the longest utterances, averaging 6.10 words, followed by speech-to-human at 5.45 words, while typed lines averaged only 3.21 words. The number of distinct words used also differed across the three communication modes. Participants in the typing condition needed only 23.75 distinct words to complete a session, whereas the speech-to-computer and speech-to-human subjects used 32.7 and 36.65 distinct words respectively.
- Error Variables: The percentage of word repetitions did not differ significantly between the groups. The relative number of noise words such as 'umm', 'ahh' and 'ohh', however, differed clearly across the communication modes: the typed mode had none, the speech-to-computer mode had 4 per thousand words and the speech-to-human mode averaged 15 per thousand words.
- Syntax Variables: There was no significant difference between the groups in the relative frequencies of pronoun usage. However, the frequency of pronouns used by the subjects did increase significantly in the speech conditions. The frequency per word went from 0.048 in the speech-to-computer group and 0.021 in the speech-to-human group down to 0.008 in the typing group.
Conclusion:
In conclusion, some of the findings reported in this paper seem to contradict the preliminary impressions reported by Werner, who concluded that computer discourse is much less structured than what was seen in the present experiment. The authors believe that these different and somewhat contradictory experiences point to the crucial importance of task definition in the success of a speech recognition system. A successful speech recognition application requires careful task analysis, followed by equally careful language and environment design. Even though people interacted with the computer in a more disciplined way, a number of purely speech-related phenomena were still observed. In particular, subjects were more likely to stick to their familiar set of commands in the familiar (typed) interaction mode, while they used more natural, English-like phrasing in the two speech conditions.
Some of these differences in communication modes, such as the increased use of pronouns in the discourse, represent a quantitative shift in the use of language. The natural language processing principles that handle these phenomena in typed input situations should also be adaptable to the spoken communication mode. This adaptation is by no means trivial, as pointed out by Hayes, Hauptmann, Carbonell and Tomita (1986).
Reflection on Talking to Computers: An Empirical Investigation:
The use of technology in education has closely mirrored the development of the personal computer. Since their introduction in the late seventies, personal computers have improved greatly in speed, power and ease of use. Many early innovations in educational technology grew out of a desire to help students with various physical and learning disabilities overcome barriers to success in school. Among the many innovative tools, programs that convert printed text into audible speech have been among the most popular. Although originally designed for students who were visually impaired or had learning difficulties, text-to-speech software was soon recognized by educators as beneficial for students with a wide range of learning needs, including language learners.
Based on this investigation, we believe that all three approaches (speech-to-computer, speech-to-human and typing-to-computer) can be used in language learning, especially second language learning. Nowadays, people focus mainly on speech-to-human interaction, as it is the dominant mode in language learning, and neglect the other two modes. Speech-to-computer can help learners improve in multiple ways, such as grammar, vocabulary and syntax, since the findings of this investigation show that subjects spoke more formally and carefully to the computer. Thus, learners should be exposed to the use of computers in class to sharpen their speaking skills.
REFLECTION:
Salaam beautiful souls! How did we feel upon getting this assignment (a summary of an article on speaking skills)? Honestly, it was not an easy task. It may seem simple, as it is just a summary, but going through the whole process of searching for empirical research and finding a good article that emphasizes speaking skills turned out to be quite a tough process altogether. At first, we searched for articles on jurn.org, but we couldn't find an empirical yet comprehensive work on speaking skills using computer applications (CALL). Then, we started searching other websites, and at last we found this article, Talking to Computers, using the Google search engine. We thought our job had become easier once we had found the best article to start with, but we were wrong, as we found it difficult to comprehend the article itself. The article is very technical, so we had to read it many times and highlight the key points. However, after consulting Dr Rozina, we got an idea of how to do the summary and managed to deduce the gist of the article. The article is an investigation of three modes of communication: speech-to-computer, speech-to-human and typing-to-computer. As human beings, we might think that it is easier to communicate with other humans than with a tool or a machine, but this investigation shows how people can actually produce better speech in terms of vocabulary, grammar and syntax when communicating with a computer than when interacting with another human. This is because people tend to think that computers know less than they do, so they use proper and formal speech to make the computer understand their commands. Hence, speech-to-computer produces more organized speech than speech-to-human, as subjects know that humans can easily understand one another even without proper language. In conclusion, this article has been a big help in making us as a group aware that speech-to-computer can be a reliable mode of communication, especially for second language learners of English.