Monday 26 November 2012

GRADED ASSIGNMENT 1: (b) Summary of the Article





Title: Talking to Computers: An Empirical Investigation
Journal: Carnegie Mellon University Research Showcase, Computer Science Department
Link :


Introduction:
            This research article was written by Alexander G. Hauptmann and Alexander I. Rudnicky of the Department of Computer Science, Carnegie-Mellon University. It describes an empirical study of man-computer speech interaction.
The objectives of this study are:
       1)      To study how speech input to a computer differs from interpersonal spoken communication.
       2)      To describe the differences between the speech-to-computer, speech-to-human and typing-to-computer modes.

On the other hand, the purpose of this study is:
To compare three communication modes, namely speech-to-computer, speech-to-human and typing-to-computer, and to show that speech to a computer is not as ill-formed as one might expect.
Statement of problem:
The differences between the three communication modes (speech-to-computer, speech-to-human and typing-to-computer) are difficult to establish, and there is a misconception that speech to a computer is far more ill-formed than speech to a human.
Research questions:
      1)      How does speech input to a computer differ from interpersonal spoken communication?
      2)      What are the differences between speech-to-computer mode, speech-to-human mode and typing-to-computer mode?
Theoretical framework:
No specific theory was used for this investigation.
Sample: 
A total of forty subjects were drawn from a population of electronic mail users in the Computer Science Department at CMU. Ten of them served as pilot subjects to test and debug the experimental setup. The remaining thirty subjects were classified as non-proficient on the basis of their electronic mail experience; furthermore, some of the subjects had no experience at all with the mail system used in this study.
Method:
Methodology
Three modes of communication were tested in this investigation: speech-to-computer, speech-to-human and typing-to-computer. Ten subjects were randomly assigned to each communication mode, and each subject completed three sessions. The data were then analysed separately for each mode.
In the speech-to-computer mode, subjects were told that the computer could understand their utterances, with occasional help from the experimenter. The experimenter sat in an adjacent room and transcribed all spoken commands into equivalent system commands, so that the system appeared to have understood the utterances itself. However, when the subjects were in editing mode, speech input was disabled and they had to edit manually using the keyboard. Subjects were asked to speak all commands to the system, but the manner of speaking was left up to them.
In the speech-to-human mode, the experimenter was placed in the same room as the subject and translated the subject's utterances into typed commands to the electronic mail system. As before, when the subjects were in editing mode, speech input was disabled and they had to edit manually using the keyboard.
In the typing-to-computer mode, subjects were led to believe that a natural language mail system could interpret their typing. This mode was the same as the speech-to-computer mode except that there was no speech input: subjects were told to type everything themselves, and the "system" (in reality the unseen experimenter) processed their input.
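The three modes described above amount to a Wizard-of-Oz setup: the subject issues free-form commands, and a hidden experimenter maps them onto commands the mail system actually accepts. The sketch below is purely illustrative and not from the study; the utterances, command names and mapping are invented.

```python
# Purely illustrative sketch of the Wizard-of-Oz idea behind the speech-to-computer
# mode: a hidden "wizard" maps each free-form utterance onto an equivalent
# mail-system command. All command names here are hypothetical.
COMMAND_MAP = {
    "read my new mail": "headers new",
    "reply to this message": "reply",
    "delete that message": "delete",
}

def wizard_translate(utterance: str) -> str:
    """The hidden experimenter's role: turn a free-form utterance into a system command."""
    key = utterance.lower().strip(" .!?")
    # If the utterance is not recognised, the wizard would ask the subject to rephrase.
    return COMMAND_MAP.get(key, "<ask subject to rephrase>")

print(wizard_translate("Read my new mail"))   # -> "headers new"
```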
Procedure
Subjects were first given a questionnaire designed to determine their level of familiarity with electronic mail. They were then given background information and instructions appropriate to their communication mode. A total of nine tasks were distributed to the subjects, and each task had something to do with the mail database file the subjects were working on. Tasks included replying to mail, locating information about previously sent mail and adding a carbon copy of some new mail to the file. Each subject received the same tasks in the same order. The first three tasks were treated as training and were therefore not included in the final analysis; they were meant to double-check the equipment and to ensure that the subjects understood the tasks. Furthermore, an attitude questionnaire was given to the subjects to assess their feedback on the particular interaction mode. A time-stamped screen image together with the voice commands was recorded in each session, and the videotape recordings were transcribed. In addition, the typed input from the typing mode provided comparable data. The total time taken for each task was also recorded.
Instruments for data gathering and instruments for data analysis:
The data collected were grouped into four categories: attitude, communication, errors and syntax. Styles of interaction were also examined. The transcribed data were summarised in an ANOVA table.
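The article reports its quantitative comparisons in a summary ANOVA table. As a rough illustration only, the following sketch runs a one-way ANOVA on invented word-count data for the three modes; the numbers are hypothetical and not taken from the study.

```python
# Minimal sketch (hypothetical data, not the study's): a one-way ANOVA comparing
# words-per-session across the three communication modes, the kind of test that
# would feed the summary ANOVA table described in the article.
from scipy import stats

speech_to_computer = [58, 62, 61, 64, 57]   # invented words-per-session values
speech_to_human    = [66, 63, 68, 64, 67]
typing_to_computer = [35, 38, 37, 36, 39]

f_stat, p_value = stats.f_oneway(speech_to_computer, speech_to_human, typing_to_computer)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```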
Findings:
            A sample transcript for a subject can be found in Figures 1, 2 and 3. Table 1 summarizes the results of the statistical analyses performed on the quantitative data. A total of 3233 words were spoken by the subjects in 708 utterances. The total vocabulary consisted of 304 distinct words.
  • Attitude: Subjects felt positive about the experiment, as indicated by a mean score of 30.3, and there was no significant difference between the three groups of subjects.
  • Communicative Variables: The number of utterances per session and the time to completion did not differ greatly between the three groups. However, the total number of words used to solve the tasks showed major differences: the speech-to-computer group averaged 60.35 words and the speech-to-human group approximately 65.5 words, both considerably more than the typing group's average of only 36.8 words. Utterance length also differed between groups: speech-to-computer utterances were the longest at an average of 6.10 words, speech-to-human utterances averaged 5.45 words, and typed lines only 3.21 words. The number of distinct words used also differed across the three communication modes: typing-condition participants needed only 23.75 distinct words to complete a session, whereas the speech-to-computer and speech-to-human subjects used 32.7 and 36.65 distinct words respectively (a short sketch after this list shows how such per-transcript measures might be computed).
  • Error Variables: The percentage of word repetitions did not differ significantly between the groups, but the relative number of noise words like "umm", "ahh" and "ohh" clearly varied across the communication modes: the typed mode had none, the speech-to-computer mode averaged about 4 noise words per thousand words, and the speech-to-human mode averaged about 15 per thousand words.
  • Syntax Variables: There was no significant difference in the relative frequencies of pronoun usage overall; however, the frequency of pronouns used as subjects did increase significantly in the speech conditions, going from 0.048 per word spoken in the speech-to-computer group and 0.021 in the speech-to-human group down to 0.008 per word in the typing group.
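As a rough illustration of how the per-transcript measures above (words per utterance, distinct vocabulary, noise words per thousand words, pronouns per word) might be computed, here is a small sketch over a hypothetical transcript; the word lists and example utterances are invented, not taken from the study.

```python
# Minimal sketch (hypothetical, not the authors' code): computing the kinds of
# per-transcript measures reported above from a list of transcribed utterances.
FILLERS = {"umm", "ahh", "ohh"}                               # noise words, as in the article
PRONOUNS = {"i", "me", "my", "it", "them", "that", "this"}    # illustrative set only

def transcript_measures(utterances):
    """Return words per utterance, distinct-word count, noise rate and pronoun rate."""
    words = [w.lower().strip(".,?!") for u in utterances for w in u.split()]
    total = len(words)
    return {
        "words_per_utterance": total / len(utterances),
        "distinct_words": len(set(words)),
        "noise_per_1000_words": 1000 * sum(w in FILLERS for w in words) / total,
        "pronouns_per_word": sum(w in PRONOUNS for w in words) / total,
    }

# Hypothetical speech-to-human transcript
example = ["umm show me the new messages", "reply to that one please", "ahh delete it"]
print(transcript_measures(example))
```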

Conclusion:
            In conclusion, some of the findings reported in this paper seem to contradict the preliminary impressions reported by Werner, who concluded that computer discourse is much less structured than was seen in the present experiment. The authors believe that these different and somewhat contradictory experiences point to the crucial importance of task definition in the success of a speech recognition system: a successful speech recognition application requires careful task analysis, followed by equally careful language and environment design. Even though people interacted with the computer in a more disciplined way, a number of purely speech-related phenomena were still observed. The subjects were more likely to stick to their familiar set of commands in the familiar (typed) interaction mode, while they used more natural, English-like ways of phrasing utterances in the two speech conditions.
Some of these differences between communication modes, like the increased use of pronouns in the discourse, represent a quantitative shift in the use of language. The principles that natural language processing systems apply to these phenomena in typed-input situations should also be adaptable to the spoken communication mode, although this adaptation is by no means trivial, as pointed out by Hayes, Hauptmann, Carbonell and Tomita (1986).

Reflection on Talking to Computers: An Empirical Investigation:
The use of technology in education has closely mirrored the development of the personal computer. Since their introduction in the late seventies, personal computers have developed in speed, power and ease of use. Many early innovations in educational technology grew out of a desire to help students with various physical and learning disabilities overcome barriers to success in school. Among the many innovative tools, programs that converted printed text into audible speech have been among the most popular. Although originally designed for students who were visually impaired or had learning difficulties, educators soon realized that text-to-speech software could benefit students with a wide range of learning needs, including language learners.
Based on this investigation, it can be argued that all three approaches, speech-to-computer, speech-to-human and typing-to-computer, can be used in language learning, especially second language learning. Nowadays, people focus mainly on speech-to-human, as it is the prominent mode in language learning, and tend to neglect the other two modes. Speech-to-computer can help learners improve in multiple ways, in their grammar, vocabulary and syntax, as the findings of this investigation show that subjects were more formal and spoke more correctly with the computer. Thus, learners should be exposed to the use of computers in class to sharpen their speaking skills.

REFLECTION:
Salaam beautiful souls! How did we feel upon getting this assignment (a summary of an article regarding speaking skills)? Honestly, it was not an easy task. It may seem simple, since it is just a summary, but going through the whole process of searching for empirical research and finding a good article that emphasises speaking skills was quite a tough process altogether. At first, we searched for articles on jurn.org, but we couldn't find an empirical yet comprehensive work on speaking skills using computer applications (CALL). Then we started searching on other websites, and at last we found this article, Talking to Computers, using the Google search engine. We thought our job had become easier once we had found the best article to start with, but we were wrong, as we had difficulty comprehending the article itself. The article is very technical, so we had to read it many times and highlight the key points. However, after consulting Dr Rozina, we got an idea of how to do the summary and managed to deduce the gist of the article. The article is an investigation of three modes of communication: speech-to-computer, speech-to-human and typing-to-computer. As human beings, we might think that it is easier to communicate with other people than with a tool or a machine, but this investigation shows how people can actually produce better speech in terms of vocabulary, grammar and syntax when communicating with a computer than when interacting with a human. This is because humans tend to assume that computers know less than they do, so they use proper and formal speech to make the computer understand their commands. Hence, speech-to-computer produces more organised speech than speech-to-human, as subjects know that humans can easily comprehend one another even without proper language. In conclusion, this article has been a big help in making us, as a group, aware that speech-to-computer can be a reliable mode of communication, especially for second language learners of English.
