This application claims the benefit of Taiwan Application No. 112150753, filed on Dec. 26, 2023, the entirety of which is incorporated by reference herein.
The present invention relates to spectacle-type display devices, and, in particular, to spectacle-type display devices configured to display subtitles.
When watching a stage show, hearing-impaired or hard-of-hearing users have a poor viewing experience because they can hardly hear the lines of dialogue spoken by the actors. To solve this problem, staff members in the theater may operate a caption machine to show the corresponding subtitles when they hear the lines. However, this solution is labor-intensive, and the staff may make mistakes. Thus, how to correctly and immediately display the lines as spoken by the actors is a problem in this field that needs to be solved.
Embodiments of the present disclosure provide a spectacle-type display device, which includes a lens, a micro project device, a memory, and a control unit. The control unit is configured to read a line in the memory. The control unit is further configured to compare an input voice with the line to generate a comparison result. The control unit is further configured to determine whether to output the line according to the comparison result. The control unit is further configured to control the micro project device to project the line on the lens, in response to a determination to output the line.
The present invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
The following description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
Refer to
The microphone 130 is configured to receive voice and convert the received voice into electronic signals. In some embodiments, the microphone 130 is a directional microphone. A directional microphone receives voice from a particular direction and ignores voice from other directions. Thus, using a directional microphone suppresses the voices of other audience members around the user (who is wearing the spectacle-type display device 100) and achieves better reception. The camera 140 is configured to capture images and convert the captured images into electronic signals. For example, the camera 140 is a charge-coupled device (CCD) image sensor or a complementary metal-oxide-semiconductor (CMOS) image sensor.
The control unit 150 provides computation and processing capability. The control unit 150 is able to execute programs, software, firmware, and modules. Moreover, the control unit 150 is configured to control the frame 110, the lenses 120, the microphone 130, and the camera 140 (e.g., using instructions). For example, the control unit 150 may comprise a general-purpose processor, a central processing unit, and/or a microcontroller unit.
The voice comparison module 151 is configured to receive the input voice (speech) from the microphone 130. The voice comparison module 151 determines the line (i.e. string) to be output according to the input voice. The position determination module 152 receives the input image from the camera 140. The position determination module 152 is configured to determine the position where the line is displayed (projected) on the lenses 120 according to the input image. The voice comparison module 151 and the position determination module 152 will be described in more detail below.
The memory 160 stores data required by the operations of the control unit 150, such as program code and lines (strings). The control unit 150 is able to read from and write to the memory 160. In some embodiments, the memory 160 stores program code. The program code can be read and executed by the control unit 150 and causes the control unit 150 to operate or implement the voice comparison module 151, the position determination module 152, and methods in accordance with the embodiments of the present disclosure. The memory 160 may comprise non-volatile memories, such as read-only memory (ROM) and flash memory. The memory 160 may also comprise volatile memories, such as dynamic random access memory (DRAM) and static random access memory (SRAM).
The micro project device 170 is able to project a given line on a specific position of the lenses 120, under the control of the control unit 150. Thus, the user can simultaneously see the lines and the performance through the lenses 120.
Refer to
In the embodiment shown in
Refer to
Refer to
In step S420, the voice comparison module 151 reads a first line among a plurality of lines (strings) stored in the memory. The lines stored in the memory may be referred to as a line library. For example, when the voice comparison module 151 is operated by the control unit 150, the voice comparison module 151 reads the lines stored in the memory 160. Alternatively, when the voice comparison module 151 is operated by the electronic device 200, the voice comparison module 151 reads the lines stored in the memory of the electronic device 200. In other embodiments, the voice comparison module 151 may read the lines stored in a server. These lines are the lines as written in the script, and the actors perform according to the script. In other words, the input voice corresponds to one of the lines in the line library. These lines are stored in the memory in advance (e.g., before the performance starts). In some embodiments, these lines are stored in the memory before step S410 or step S420. In some embodiments, these lines are stored in the memory as a text file.
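As a minimal sketch of how a line library might be loaded from such a text file, the following Python example assumes one line of dialogue per row and UTF-8 encoding. Neither detail is specified in the present disclosure, and the function name load_line_library is hypothetical.

```python
from pathlib import Path
from typing import List


def load_line_library(path: str) -> List[str]:
    """Read a script text file and return its lines of dialogue in order.

    Assumption (not from the disclosure): one line of dialogue per row,
    UTF-8 encoded. Blank rows and surrounding whitespace are dropped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [row.strip() for row in text.splitlines() if row.strip()]
```

Keeping the lines in script order lets the first line, the second line, and so on be read sequentially from the returned list.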
In step S430, the voice comparison module 151 compares the input voice with the voice corresponding to the first line (i.e., the vocalization of the first line) to generate a comparison result. Thus, the voice comparison module 151 determines whether the first line among the lines matches the input voice. As mentioned above, the first line is stored as text, and the voice corresponding to the first line is an audio signal that represents the pronunciation of the first line. In some embodiments, the voice comparison module 151 determines whether the input voice is the same as the voice corresponding to the first line using voiceprint comparison technology. In some embodiments, the memory which stores the lines also stores voices (audio signals) corresponding to these lines. The voice comparison module 151 reads the voices stored in the memory to determine whether these voices are the same as the input voice. In another embodiment, the voice comparison module 151 converts the input voice into text using Speech-to-Text or speech recognition technology to determine whether the input voice is the same as the first line. By doing so, it is not necessary to store the voices (audio signals) corresponding to the lines.
In some embodiments, the comparison result is the accuracy rate. The input voice can be separated into multiple voice words (also referred to as spoken words). Each voice word is an audio signal representing the pronunciation of a word (a vocabulary). The first line can be separated into multiple line words (also referred to as written words). The accuracy rate is calculated as follows:
In step S440, the voice comparison module 151 determines whether to output the first line according to the comparison result (accuracy rate). In some embodiments, the voice comparison module 151 outputs the first line (for example, to the control unit 150) when it determines that the accuracy rate is higher than a first threshold. For example, the first threshold is 80%. Thus, the spectacle-type display device 100 can display the correct line even if the line spoken by the actor differs slightly from the line as written in the script. If the voice comparison module 151 determines that the first line does not match the input voice, for example, when the comparison result (accuracy rate) is lower than the first threshold and higher than a second threshold, the voice comparison module 151 continues to compare the input voice with other lines (e.g., a second line) in the line library. When the voice comparison module 151 determines that the accuracy rate is lower than the second threshold, the voice comparison module 151 does not output the first line (e.g., to the control unit 150). For example, the second threshold is 20%. Thus, the spectacle-type display device 100 does not display any line when the actor forgets the line (and does not speak any line).
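The two-threshold decision of step S440 can be sketched in Python as follows. This is an illustrative sketch only, under several assumptions that are not taken from the present disclosure: the names select_line and word_overlap are hypothetical; word_overlap is a simple text-based stand-in for the accuracy rate (the embodiments above compare audio signals); the search is assumed to stop as soon as a line scores below the second threshold; and a score exactly equal to either threshold is treated as "continue comparing".

```python
from typing import Callable, Optional, Sequence

FIRST_THRESHOLD = 0.80   # example value given in the description
SECOND_THRESHOLD = 0.20  # example value given in the description


def word_overlap(spoken: str, line: str) -> float:
    """Hypothetical stand-in for the accuracy rate: the fraction of the
    words of the script line that also occur in the spoken text."""
    line_words = line.lower().split()
    if not line_words:
        return 0.0
    spoken_words = set(spoken.lower().split())
    return sum(w in spoken_words for w in line_words) / len(line_words)


def select_line(
    spoken: str,
    line_library: Sequence[str],
    accuracy_fn: Callable[[str, str], float] = word_overlap,
) -> Optional[str]:
    """Return the script line to display, or None to display nothing."""
    for line in line_library:
        accuracy = accuracy_fn(spoken, line)
        if accuracy > FIRST_THRESHOLD:
            return line   # close enough: output the scripted line
        if accuracy < SECOND_THRESHOLD:
            return None   # assumed reading: stop and output nothing
        # Between the thresholds: continue with the next line.
    return None           # no line in the library matched
```

With the library ["the night is dark and cold", "the night is long"], the spoken text "the night is dark and chilly" scores 5/6 against the first line and returns the scripted wording "the night is dark and cold", whereas "the night is long" scores 3/6 against the first line, falls between the thresholds, and is matched against the second line instead.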
Refer to Table 1 and Table 2, which illustrate the relationship between the input voice, the line in the line library, and the line output from the voice comparison module 151.
Thus, the voice comparison module 151 in the above embodiments determines whether the input voice matches known (given) words, instead of converting the input voice into words. In other words, the voice comparison module 151 in accordance with the embodiments of the present disclosure does not perform language understanding or sentence generation. Compared to speech recognition (including language understanding and sentence generation), the voice comparison module 151 in accordance with the embodiments of the present disclosure requires less computation. Thus, the voice comparison module 151 can output the line spoken by the actor with little delay. Moreover, the voice comparison module 151 outputs words which are stored in the memory in advance. By doing so, outputting wrong content because of a wrong input voice can be avoided. In particular, the voice comparison module 151 in accordance with the embodiments of the present disclosure avoids the problem of distinguishing homophones (words having the same pronunciation) and words having similar pronunciations.
Refer to
In step S530, the position determination module 152 cuts (divides) the input image into multiple areas (referred to as image areas below). The position determination module 152 may cut the input image in any way. For example, the position determination module 152 may cut the input image into N*M image areas, where N and M may be any positive integers. N may or may not be equal to M. In some embodiments, each of the image areas has the same size. In step S540, the position determination module 152 cuts (divides) the lenses 120 into multiple areas (referred to as lens areas below) in the same way that the input image is cut (i.e., in the way that the position determination module 152 cuts the input image into image areas). For example, each of the lenses 120 is cut into N*M areas. Thus, each lens area corresponds to one image area. In some embodiments, the position determination module 152 assigns coordinates (or numbers) to the image areas and the lens areas in the same way. An image area and the lens area corresponding to that image area have the same coordinates (or number). For example, refer to
In step S550, the position determination module 152 determines the position where the line is displayed on the lenses 120. The position determination module 152 determines a first image area in the input image according to the marked-up position. The first image area does not include any human. Then, the position determination module 152 determines to display the line on the first lens area of the lenses 120. The first lens area corresponds to the first image area. In some embodiments, the position determination module 152 may also select more than one area to display the line.
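Steps S530 to S550 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the position determination module 152: the marked-up positions of humans are assumed to be axis-aligned bounding boxes in image pixels; the function names grid_cells, overlaps, and choose_lens_area are hypothetical; and scanning the bottom row first is an arbitrary preference chosen for the example.

```python
from typing import Dict, Optional, Sequence, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]
Coord = Tuple[int, int]  # (row, column) shared by image and lens areas


def grid_cells(width: int, height: int, n_rows: int, n_cols: int) -> Dict[Coord, Box]:
    """Cut a width x height surface into n_rows x n_cols areas."""
    cells: Dict[Coord, Box] = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cells[(r, c)] = (
                c * width // n_cols,
                r * height // n_rows,
                (c + 1) * width // n_cols,
                (r + 1) * height // n_rows,
            )
    return cells


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def choose_lens_area(
    image_size: Tuple[int, int],
    lens_size: Tuple[int, int],
    n_rows: int,
    n_cols: int,
    person_boxes: Sequence[Box],
) -> Optional[Tuple[Coord, Box]]:
    """Pick an image area containing no human and return its coordinates
    together with the lens area that has the same coordinates."""
    image_cells = grid_cells(image_size[0], image_size[1], n_rows, n_cols)
    lens_cells = grid_cells(lens_size[0], lens_size[1], n_rows, n_cols)
    for r in reversed(range(n_rows)):  # assumed preference: bottom row first
        for c in range(n_cols):
            cell = image_cells[(r, c)]
            if not any(overlaps(cell, box) for box in person_boxes):
                return (r, c), lens_cells[(r, c)]
    return None  # every area contains a human
```

Because the image and the lens are cut with the same N*M grid, the coordinates of the chosen image area index the lens area directly, even when the image and the lens have different pixel dimensions.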
In some embodiments, the position determination module 152 does not perform steps S510 to S540. In these embodiments, the position determination module 152 determines to display the line at a default position on the lenses 120. For example, the default position is the top, bottom, leftmost, or rightmost part of the lenses 120, because actors usually stand in the middle of the stage. By doing so, the amount of computation and the processing time may be reduced, and the spectacle-type display device 100 can display the line faster.
Thus, by using the position determination module 152, the spectacle-type display device 100 in accordance with the embodiments of the present disclosure can prevent the line from blocking the actors in the view of the user. Refer to
In conclusion, the spectacle-type display device in accordance with the embodiments of the present disclosure can correctly display the line corresponding to the received voice and can display the line (caption) without blocking the actor. Thus, the spectacle-type display device in accordance with the embodiments of the present disclosure can display the correct line at the correct time and improve the user experience.
While the invention has been described by way of example and in terms of the preferred embodiments, it should be understood that the invention is not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Number | Date | Country | Kind |
---|---|---|---|
112150753 | Dec 2023 | TW | national |