The present invention relates to a speech recognition system for teaching assistance, and more particularly to the use of an automatic speech recognition (ASR) classroom server and a listener-typist to provide a caption service in the classroom for the hearing impaired.
In ordinary classrooms, hearing impaired students have difficulty following lessons because there is no monitor that directly displays captions of the teacher's lecture. Likewise, in many presentations and conferences, the hearing impaired cannot participate because no monitor displays captions.
Therefore, providing captions that show what the teacher or speaker says is of great benefit to the hearing impaired.
Nowadays, some conferences use a listener-typist who types what the speaker says on a computer in real time and displays it on the computer screen as captions, so that the hearing impaired can follow the proceedings. However, listening to the speaker demands a great deal of the listener-typist's energy, and when the working hours are too long, sentences may be missed and typos may occur. Therefore, a more complete solution for the listener-typist must be provided.
The object of the present invention is to provide a speech recognition system for teaching assistance that provides a caption service for the hearing impaired in the classroom. The contents of the present invention are described below.
This system includes a speaker and an automatic speech recognition (ASR) classroom server, a listener-typist and a computer, and a hearing impaired person and a live screen. The ASR classroom server, the computer and the live screen are connected by a local area network, and all are located in the same classroom.
The automatic speech recognition (ASR) classroom server includes: a microphone input; an open source speech recognition toolkit for speech recognition and signal processing; a web server that provides the web page interface, which is transmitted to the computer and the live screen through HTTP; and a recording module that provides the playback function for the listener-typist.
The speaker's audio is sent through the microphone input to the ASR classroom server, where it is converted into a text caption. The text caption is then sent to the live screen of the hearing impaired and to the computer of the listener-typist together with the speaker's audio, so that the hearing impaired can read what the speaker says. If the text caption contains errors, the listener-typist can correct them immediately on the computer.
The ASR classroom server 2 uses the open source speech recognition toolkit Kaldi ASR 9 for speech recognition and signal processing. Kaldi is freely available under the Apache License v2.0.
The ASR classroom server 2 is equipped with a web server 10, which provides the web page interface and delivers it to the clients' web browsers through HTTP. The clients are the computer 4 and the live screen 6. The ASR classroom server 2 also has a recording module 11, which the listener-typist 3 uses for the playback function.
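The delivery of captions from the web server to its clients over HTTP can be illustrated with a minimal sketch. It uses only the Python standard library and is not the implementation of the web server 10. The `/captions` route, the JSON layout, and the names `CAPTIONS`, `CaptionHandler`, `fetch_captions` and `demo` are assumptions made for illustration, and the caption text is placeholder data rather than recognizer output.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory caption list. In the described system the text
# would be produced by the speech recognition toolkit.
CAPTIONS = [
    {"label": "A", "text": "Good morning, class."},
    {"label": "B", "text": "Today we cover fractions."},
]


class CaptionHandler(BaseHTTPRequestHandler):
    """Serves the current captions as JSON on a hypothetical /captions route."""

    def do_GET(self):
        if self.path != "/captions":
            self.send_error(404)
            return
        body = json.dumps(CAPTIONS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def fetch_captions(port):
    """What a client (live screen or typist's computer) would do: one GET."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/captions")
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


def demo():
    """Start the server on a free local port, fetch once, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CaptionHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_captions(port)
    finally:
        server.shutdown()
        server.server_close()
```

In a classroom deployment the server would bind to an address on the local area network rather than the loopback address, and the browser clients would poll or otherwise refresh the caption view.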
Referring to
Referring to
The listener-typist 3 is given read and write authority in the ASR classroom server 2 so as to be able to revise the text generated by the Kaldi ASR 9 in the web server 10. Each section of the text has a label. For example, if the listener-typist 3 double-clicks on section C of the text, the web server 10 follows the instructions of the related label and asks the audio record 13 to play back the segment that starts at the N3-th second and lasts Z seconds, so that the listener-typist 3 can hear what the speaker 1 said and amend the text.
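The relation between a section label, its start time and its length can be sketched as follows. This is a minimal illustration under stated assumptions, not the described implementation: the `Section` record, the function names, the use of a flat sequence of audio samples, and the `can_write` flag standing in for the read and write authority are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Section:
    label: str       # label of the text section, e.g. "C"
    text: str        # text produced by the recognizer for this section
    start_s: float   # offset of the section in the recording, in seconds
    length_s: float  # duration of the section, in seconds


def playback_range(section, sample_rate):
    """Return (first_sample, end_sample) of the section in the recording."""
    first = int(round(section.start_s * sample_rate))
    end = int(round((section.start_s + section.length_s) * sample_rate))
    return first, end


def clip_for_label(sections, label, samples, sample_rate):
    """Look up a section by its label and slice its audio from the recording."""
    by_label = {s.label: s for s in sections}
    first, end = playback_range(by_label[label], sample_rate)
    return samples[first:end]


def revise(sections, label, new_text, can_write):
    """Replace a section's text; only a user with write authority may do so."""
    if not can_write:
        raise PermissionError("read-only client cannot revise captions")
    for s in sections:
        if s.label == label:
            s.text = new_text
            return s
    raise KeyError(label)
```

With a section `Section("C", "...", 5.0, 4.0)` and a recording sampled at 8000 Hz, `clip_for_label(sections, "C", recording, 8000)` returns the samples from index 40000 up to, but not including, index 72000, which is the four-second segment the listener-typist would hear before calling `revise`.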
Referring to
The text caption 61 read by the hearing impaired 5 on the live screen 6 is a conversion of the lecture content of the speaker 1 by Kaldi ASR 9, and usually more than 98% of it is correct. If the listener-typist 3 finds errors, the listener-typist 3 can correct them. The hearing impaired 5 can store the text caption 61 after the class, and the stored text caption 61 is the final version as amended by the listener-typist 3.
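How the stored caption can reflect the amendments of the listener-typist may be sketched as a merge of corrections over the recognizer output. The function names, the plain-text file format and the sample sentences below are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def final_transcript(asr_sections, corrections):
    """Merge typist corrections over recognizer text, keeping spoken order.

    asr_sections: list of (label, text) pairs in the order spoken.
    corrections:  dict mapping a label to its corrected text.
    """
    return [corrections.get(label, text) for label, text in asr_sections]


def save_transcript(lines, path):
    """Write one caption section per line to a UTF-8 text file."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def demo():
    """Merge one correction, save to a temporary file, and read it back."""
    asr = [("A", "Good morning class"), ("B", "two haves make a hole")]
    fixes = {"B": "two halves make a whole"}
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "captions.txt"
        save_transcript(final_transcript(asr, fixes), out)
        return out.read_text(encoding="utf-8")
```

Sections without a correction keep the recognizer text, so the stored file always contains one line per section.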
The scope of the present invention depends upon the following claims, and is not limited by the above embodiments.