The present disclosure relates to a language presentation device, a language presentation method, and a language presentation program for presenting a language on the basis of a recognition result of an uttered voice.
Patent document 1 discloses a voice translation device that receives voices of at least two kinds of languages, recognizes the contents of the received voices, and translates the recognized contents into different languages. This voice translation device outputs a translated content in voice and displays, in different directions on the screen, a text of an input voice and a text of the translated content.
Patent Document 1: WO 2017/086434
The concept of the present disclosure has been conceived in view of the above circumstances in the art, and an object of the disclosure is to provide a language presentation device, a language presentation method, and a language presentation program that allow two persons who cannot understand each other's language to make a conversation while continuing to look at each other's face, by presenting their respective languages toward their faces, and to thereby realize, in a simplified manner, a natural and smooth conversation.
The present disclosure provides a language presentation device including a first acquisition unit configured to acquire a first voice uttered by at least one of a first user and a second user who are located with a transparent presentation unit interposed between the first user and the second user; a second acquisition unit configured to acquire a content of the acquired first voice and a translated content obtained by translating the content of the first voice into a language suitable for the first user or the second user; and a control unit configured to present the acquired content of the first voice and the acquired translated content on the transparent presentation unit in such a manner that one of the acquired content of the first voice and the acquired translated content is inverted in a left-right direction.
The disclosure provides a language presentation method employed in a language presentation device that serves for a conversation between a first user and a second user located with a transparent presentation unit interposed between them, including the steps of acquiring a first voice uttered by at least one of the first user and the second user; acquiring a content of the acquired first voice and a translated content obtained by translating the content of the first voice into a language suitable for the first user or the second user; and presenting the acquired content of the first voice and the acquired translated content on the transparent presentation unit in such a manner that one of the acquired content of the first voice and the acquired translated content is inverted in a left-right direction.
The disclosure also provides a language presentation program for causing a language presentation device that is a computer and serves for a conversation between a first user and a second user located with a transparent presentation unit interposed between the first user and the second user to execute the steps of acquiring a first voice uttered by at least one of the first user and the second user; acquiring a content of the acquired first voice and a translated content obtained by translating the content of the first voice into a language suitable for the first user or the second user; and presenting the acquired content of the first voice and the acquired translated content on the transparent presentation unit in such a manner that one of the acquired content of the first voice and the acquired translated content is inverted in a left-right direction.
Furthermore, the disclosure provides a language presentation device including a transparent presentation unit; an acquisition unit configured to acquire a first voice uttered by a user in a first language; and a control unit configured to present a content of the acquired first voice and a second content obtained by translating the content of the first voice into a second language that is different from the first language on the transparent presentation unit in such a manner that the content of the acquired first voice and the second content are inverted from each other in a left-right direction.
Still further, the disclosure provides a language presentation program for causing a language presentation device that is a computer connected to a transparent presentation unit to execute the steps of acquiring a first voice uttered by a user in a first language; acquiring a content of the acquired first voice and a second content obtained by translating the content of the first voice into a second language that is different from the first language; and presenting the acquired content of the first voice and the acquired second content on the transparent presentation unit in such a manner that the content of the acquired first voice and the second content are inverted from each other in a left-right direction.
The present disclosure allows two persons who cannot understand each other's language to make a conversation while continuing to look at each other's face, by presenting their respective languages toward their faces, and thereby realizes, in a simplified manner, a natural and smooth conversation.
The configuration of the above-described Patent document 1 may be able to realize a smooth conversation between two persons, even if each person cannot understand the language of the other, by having them look at the respective displayed texts. However, in Patent document 1, the two persons who cannot understand each other's language need to turn their eyes away from each other's face (e.g., eyes) to look at the screen of the voice translation device during their conversation. As a result, a person who is accustomed to making a conversation while looking at the eyes of the other person (e.g., a foreigner who has come to Japan for sightseeing, business, or the like) would feel uncomfortable and have difficulty making a conversation naturally and smoothly.
The first embodiment described below has been conceived in view of the above circumstances in the art, and will describe a language presentation device, a language presentation method, and a language presentation program that allow two persons who cannot understand each other's language to make a conversation while continuing to look at each other's face, by presenting their respective languages toward their faces, and to thereby realize, in a simplified manner, a natural and smooth conversation.
An embodiment that specifically discloses a language presentation device, a language presentation method, and a language presentation program according to the disclosure will be described in detail by referring to the accompanying drawings as appropriate. However, unnecessarily detailed descriptions may be omitted. For example, detailed descriptions of already well-known items and duplicated descriptions of constituent elements that are substantially the same as ones already described may be omitted. This is to prevent the following description from becoming unnecessarily redundant and to thereby facilitate understanding by those skilled in the art. The following description and the accompanying drawings are provided to allow those skilled in the art to understand the disclosure thoroughly and are not intended to restrict the subject matter set forth in the claims.
In the following, an example will be described in which a language presentation system including the language presentation device according to the disclosure is used for (e.g., assists) a conversation that a host and a guest who cannot understand each other's language make while facing each other and looking at each other's face with a transparent screen located between them on a counter such as a reception counter (see
The language presentation system 100 shown in
The face-to-face translator 10 (an example of a term "language presentation device") is configured so as to include a communication unit 11, a memory 12, a control unit 13, and a storage unit 14. The face-to-face translator 10 is configured using an information processing device that is a computer such as a server or a PC (personal computer) and is installed at a position where, for example, it is visually recognized by neither the host HST1 nor the guest GST1 (e.g., inside a counter (not shown) or in a backyard monitoring room (not shown)). The face-to-face translator 10 assists a conversation between the host HST1 and the guest GST1 who face each other with the transparent screen 30 located between them.
The communication unit 11, which serves as a communication interface for communication with the translation server 50, transmits data (hereinafter referred to as "uttered voice data") of a voice (described later) picked up by the microphone MC1 to the translation server 50 over the network NW. The communication unit 11 receives, over the network NW, translated text data and translated voice data transmitted from the translation server 50. The communication unit 11 may temporarily store data or information acquired by itself in the memory 12.
The memory 12, which is configured using, for example, a RAM (random access memory) and a ROM (read-only memory), temporarily holds programs and data that are necessary for operation of the face-to-face translator 10 and data or information generated during an operation of the face-to-face translator 10. The RAM is a work memory that is used, for example, during an operation of the face-to-face translator 10. The ROM stores and holds, in advance, programs and data for, for example, controlling the face-to-face translator 10.
The memory 12 holds information relating to a language (e.g., Japanese) used by the host HST1 and information relating to a language (e.g., English) used by the guest GST1 in such a manner that they are correlated with each other. The information relating to the language used by the host HST1 may be either recorded in, for example, the ROM in advance or stored in the memory 12 as information that is set every time a manipulation is made (e.g., the button BT1 for language selection is pushed) by the host HST1. The information relating to the language used by the guest GST1 is stored in the memory 12 as information that is set every time a manipulation is made (e.g., the button BT1 for language selection is pushed) by the guest GST1.
The memory 12 holds information indicating a projection position, on the transparent screen 30, of first text data obtained by character-recognizing the content of a voice (an example of a term “first voice”) uttered by the host HST1 (i.e., information indicating a height of presentation of the first text data on the transparent screen 30).
Likewise, the memory 12 holds information indicating a projection position, on the transparent screen 30, of second text data obtained by character-recognizing the content of a voice (an example of a term “second voice”) uttered by the guest GST1 (i.e., information indicating a height of presentation of the second text data on the transparent screen 30).
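Although the disclosure does not specify any concrete data format, the correlated settings described above could, purely for illustration, be held as a simple structure like the following sketch; all field names and values here are hypothetical assumptions.

```python
# Hypothetical sketch of settings held in the memory 12: the languages used
# by the host HST1 and the guest GST1, and the presentation heights on the
# transparent screen 30 of the first and second text data.
conversation_settings = {
    "host": {"language": "ja", "text_height_mm": 250},   # projection height of first text data
    "guest": {"language": "en", "text_height_mm": 400},  # projection height of second text data
}

def set_guest_language(settings: dict, selected_language: str) -> None:
    # The guest's language would typically be (re)set each time the language
    # selection button BT1 is operated.
    settings["guest"]["language"] = selected_language
```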
For example, the control unit 13 is a processor PRC1 that is configured using a CPU (central processing unit), an MPU (microprocessing unit), a DSP (digital signal processor), or an FPGA (field programmable gate array). Functioning as a controller for controlling the operation of the face-to-face translator 10, the control unit 13 performs control processing for supervising operations of individual units of the face-to-face translator 10 comprehensively, data input/output processing with the individual units of the face-to-face translator 10, data computation (calculation) processing, and data storage processing. The control unit 13 operates according to programs and data stored in the memory 12. Using the memory 12 during an operation, the control unit 13 may store data or information generated or acquired by the control unit 13 in the memory 12 temporarily. The details of the operation of the control unit 13 will be described later with reference to
The storage unit 14 is a storage device that is configured using an HDD (hard disk drive) or an SSD (solid-state drive), for example. For example, the storage unit 14 stores data or information generated or acquired by the control unit 13. The storage unit 14 may be omitted in the configuration of the face-to-face translator 10.
The projector 20 (an example of a term "transparent presentation unit") is connected to the face-to-face translator 10 so as to be able to transmit and receive data or information to and from the face-to-face translator 10. The projector 20 is disposed so as to be opposed to the transparent screen 30. When receiving and acquiring a projection instruction, including data of a projection image, transmitted from the face-to-face translator 10, the projector 20 generates, on the basis of the projection instruction, projection light (e.g., visible light) for projecting the projection image specified by the projection instruction onto the transparent screen 30 and projects it toward the transparent screen 30. In this manner, the projector 20 can project, onto the transparent screen 30, a projection image (e.g., text data corresponding to a voice uttered by the host HST1 or the guest GST1) specified by the face-to-face translator 10 and thereby assist a conversation between the host HST1 and the guest GST1.
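The disclosure does not fix a concrete format for the projection instruction exchanged between the face-to-face translator 10 and the projector 20; as one minimal, purely illustrative sketch, it might be modeled as follows (all names are assumptions).

```python
from dataclasses import dataclass

@dataclass
class ProjectionInstruction:
    """Hypothetical container for a projection instruction and the data of the
    projection image (e.g., rendered text data) that it refers to."""
    image_png: bytes        # rendered projection image data
    x_mm: int               # horizontal position on the transparent screen 30
    y_mm: int               # vertical position (presentation height)
    mirrored: bool = False  # whether the image is inverted in the left-right direction

def send_to_projector(instruction: ProjectionInstruction) -> None:
    # Stub: a real system would transmit this over the connection between the
    # face-to-face translator 10 and the projector 20.
    print(f"project {len(instruction.image_png)} bytes at ({instruction.x_mm}, {instruction.y_mm}) mm")
```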
The transparent screen 30 (an example of the term "transparent presentation unit") has a structure in which, for example, a sheet onto which projection light coming from the projector 20 is projected is bonded to a transparent glass plate, and is installed in a stationary manner. Projection light (e.g., visible light) coming from the projector 20 is projected onto the transparent screen 30, and the transparent screen 30 presents, to both the host HST1 and the guest GST1, a projection image for assisting a conversation between the host HST1 and the guest GST1 (e.g., text data corresponding to a voice uttered by the host HST1 or the guest GST1). The transparent screen 30 does not always require the projector 20. For example, it is preferable that the transparent screen 30 be a transparent display whose transparency is higher than or equal to about 40%, and it is particularly preferable that its transparency be higher than or equal to about 70%. The transparent screen 30 may be a transparent liquid crystal display, a transparent organic EL display, or the like having such a characteristic.
Furthermore, the transparent screen 30 may be a transparent screen (refer to Referential Non-patent Document 1, for example) in which switching can be made between a transparent mode and a screen mode alternately.
In the first embodiment, a transparent touch panel that can display data or information supplied from the face-to-face translator 10 and detect a direct manipulation, such as a touch, by the host HST1 or the guest GST1 may be provided in place of the projector 20 and the transparent screen 30 as an example of the transparent presentation unit.
The button BT1 is a language selection button to be used for setting information relating to languages used by the host HST1 and the guest GST1 and, as shown in
The switch SW1 is a switch that is pushed by the host HST1 to inform the face-to-face translator 10 of timing of utterance of the host HST1. In other words, the switch SW1 is pushed by the host HST1 immediately before utterance of the host HST1. This allows the face-to-face translator 10 to recognize the timing when the host HST1 has spoken on the basis of a signal received from the switch SW1.
The switch SW2 is a switch that is pushed by the guest GST1 to inform the face-to-face translator 10 of timing of utterance of the guest GST1. In other words, the switch SW2 is pushed by the guest GST1 immediately before utterance of the guest GST1. This allows the face-to-face translator 10 to recognize the timing when the guest GST1 has spoken on the basis of a signal received from the switch SW2.
The microphone MC1 picks up a voice uttered by whichever of the host HST1 and the guest GST1 is speaking (the two speak alternately) and sends a signal indicating the picked-up voice to the face-to-face translator 10. To make it easier to pick up a voice of the guest GST1 than a voice of the host HST1, the microphone MC1 may be installed on the stage of the transparent screen 30 so as to be directed to the side of the guest GST1. Alternatively, to pick up a voice uttered by the host HST1 and a voice uttered by the guest GST1 equally, the microphone MC1 may be installed on the stage of the transparent screen 30 so as to be equally distant from the host HST1 side and the guest GST1 side.
The speaker SP1 receives a signal of voice data that is output from the face-to-face translator 10 and outputs a corresponding voice. For example, a signal of voice data that is input to the speaker SP1 is one of a signal indicating voice data of a voice uttered by the host HST1, a signal indicating voice data of a voice uttered by the guest GST1, a signal indicating voice data of a result of translation, into a language suitable for the guest GST1, of the content of a voice uttered by the host HST1 (i.e., translated voice data), and a signal indicating voice data of a result of translation, into a language suitable for the host HST1, of the content of a voice uttered by the guest GST1 (i.e., translated voice data).
The translation server 50 (an example of the term “language presentation device”) is configured so as to include a communication unit 51, a memory 52, a translation control unit 53, and a storage unit 54. The translation server 50 is a cloud server that is configured using an information processing device that is a computer such as a server or a PC, and is connected to the face-to-face translator 10 via the network NW. When receiving and acquiring voice data from the face-to-face translator 10, the translation server 50 character-recognizes a voice corresponding to the acquired voice data and performs translation processing on the acquired voice. The translation server 50 transmits, to the face-to-face translator 10, text data as a character recognition result (hereinafter referred to as “recognized text data”), text data as a translation processing result (hereinafter referred to as “translated text data”), and voice data as a translation processing result (hereinafter referred to as “translated voice data”).
The communication unit 51, which serves as a communication interface for communication with the face-to-face translator 10, transmits recognized text data, translated text data, and translated voice data as mentioned above to the face-to-face translator 10 over the network NW. The communication unit 51 receives, over the network NW, uttered voice data transmitted from the face-to-face translator 10. The communication unit 51 may temporarily store data or information acquired by itself in the memory 52.
The memory 52, which is configured using, for example, a RAM and a ROM, temporarily holds programs and data that are necessary for operation of the translation server 50 and data or information generated during an operation of the translation server 50. The RAM is a work memory that is used, for example, during an operation of the translation server 50. The ROM stores and holds, in advance, programs and data for, for example, controlling the translation server 50.
For example, the translation control unit 53 is a processor PRC2 configured using a CPU, an MPU, a DSP, or an FPGA. Functioning as a controller for controlling the operation of the translation server 50, the translation control unit 53 performs control processing for supervising operations of individual units of the translation server 50 comprehensively, data input/output processing with the individual units of the translation server 50, data computation (calculation) processing, and data storage processing. The translation control unit 53 operates according to programs and data stored in the memory 52. Using the memory 52 during an operation, the translation control unit 53 may store data or information generated or acquired by the translation control unit 53 in the memory 52 temporarily. The details of the operation of the translation control unit 53 will be described later with reference to
The storage unit 54 is a storage device configured using an HDD or an SSD, for example. For example, the storage unit 54 stores data or information acquired by the translation control unit 53. Furthermore, the storage unit 54 holds a dictionary DB (database) to be used by the translation control unit 53 in performing translation processing on recognized text data. Still further, the storage unit 54 holds a voice DB to be used by the translation control unit 53 to generate voice data (that is, translated voice data) corresponding to translated text data. The translation server 50 may update the contents of the above-mentioned dictionary DB and voice DB on a regular basis by, for example, communicating regularly with an external dictionary server (not shown) that is connected via the network NW.
Next, how the language presentation system 100 according to the first embodiment operates will be outlined with reference to
For example, assume that the host HST1 first pushes the switch SW1 and utters a sentence in Japanese. When the voice uttered by the host HST1 has been picked up by the microphone MC1, the face-to-face translator 10 acquires data of that voice (uttered voice data) from the microphone MC1 and transmits it to the translation server 50. The translation server 50 performs character recognition processing on the uttered voice data transmitted from the face-to-face translator 10, generates recognized text data as its character recognition result (i.e., text data of the uttered sentence in Japanese) and transmits it to the face-to-face translator 10. The face-to-face translator 10 receives and acquires the recognized text data transmitted from the translation server 50. The face-to-face translator 10 presents the recognized text data HTX1 to the host HST1 by projecting it on the transparent screen 30 via the projector 20.
Then, as shown in
Then, as shown in
Then, assume that at time t=t4 that is after time t=t3 the guest GST1 pushes the switch SW2 and says “Thank you for letting me know” (in English). When the voice of “Thank you for letting me know” said by the guest GST1 has been picked up by the microphone MC1, the face-to-face translator 10 acquires data of that voice (uttered voice data) from the microphone MC1 and transmits it to the translation server 50. The translation server 50 performs character recognition processing on the uttered voice data transmitted from the face-to-face translator 10, generates recognized text data as its character recognition result (i.e., text data of “Thank you for letting me know” (English)) and transmits it to the face-to-face translator 10. The face-to-face translator 10 receives and acquires the recognized text data transmitted from the translation server 50. The face-to-face translator 10 presents the recognized text data GLTX2 to the guest GST1 by projecting it onto the transparent screen 30 via the projector 20.
Then, at time t=t5, the translation server 50 generates translated text data by translating the recognized text data into a language suitable for the host HST1 (i.e., text data of a Japanese translation of "Thank you for letting me know") by referring to the dictionary DB stored in the storage unit 54. Furthermore, at time t=t5, the translation server 50 generates voice data (translated voice data) corresponding to the translated text data. The translation server 50 transmits the translated text data and the translated voice data to the face-to-face translator 10 in such a manner that they are correlated with each other. The face-to-face translator 10 receives and acquires the translated text data and the translated voice data transmitted from the translation server 50. The face-to-face translator 10 presents the translated text data HLTX2 to the host HST1 by projecting it onto the transparent screen 30 via the projector 20 in a state that it is inverted in the left-right direction from the direction in which the recognized text data GLTX2 is being presented on the transparent screen 30. Still further, at time t=t5, the face-to-face translator 10 causes the speaker SP1 to output the translated voice data. The timing at which the translation server 50 generates translated text data and translated voice data may be time t4 (earlier timing) rather than time t5. As shown in
Next, an operation procedure of the language presentation system 100 according to the first embodiment will be described with reference to
Referring to
The control unit 13 (an example of a term "first acquisition unit") of the face-to-face translator 10 acquires, by receiving it via the communication unit 11, the data of the voice (an example of the term "first voice") picked up by the microphone MC1 at step S1 (S11). Since the control unit 13 of the face-to-face translator 10 can recognize which switch was pushed immediately before the time point of step S11, it can recognize by which of the host HST1 and the guest GST1 the voice of the voice data acquired at the time point of step S11 was uttered. Alternatively, since the control unit 13 of the face-to-face translator 10 has recognized in advance what languages the host HST1 and the guest GST1 use, the control unit 13 may infer which of the host HST1 and the guest GST1 has spoken by, for example, inferring the language of the uttered voice data through known language inference processing performed on the uttered voice data.
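As a minimal sketch of this determination, assuming hypothetical event and field names (the disclosure only states that the switch signals and the pre-registered language settings are used), the speaker and language could be resolved as follows.

```python
# Hypothetical sketch: the last switch event (SW1 or SW2) received before the
# voice data of step S11 indicates who spoke, and the stored settings give the
# corresponding language.
def identify_speaker(last_switch_event: str, settings: dict) -> tuple[str, str]:
    speaker = "host" if last_switch_event == "SW1" else "guest"
    return speaker, settings[speaker]["language"]

# Example: the switch SW2 was pushed immediately before the utterance.
speaker, language = identify_speaker(
    "SW2", {"host": {"language": "ja"}, "guest": {"language": "en"}})
```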
The communication unit 11 of the face-to-face translator 10 transmits the voice data (i.e., uttered voice data) acquired at step S11 to the translation server 50 (S12). In a case where, for example, the translation control unit 53 of the translation server 50 has not been set with information relating to the language (e.g., Japanese) used by the host HST1 and information relating to the language (e.g., English) used by the guest GST1, the communication unit 11 of the face-to-face translator 10 may transmit information relating to the respective languages used by the host HST1 and the guest GST1 to the translation server 50 together with the uttered voice data. This allows the translation control unit 53 of the translation server 50 to recognize, on the basis of the language-related information transmitted from the face-to-face translator 10 at the time point of step S12, from what language to what language a translation should be made.
The translation control unit 53 of the translation server 50 receives and acquires the uttered voice data transmitted from the face-to-face translator 10 at step S12 and performs known character recognition processing on the acquired uttered voice data (S21). Using a result of the character recognition processing obtained at step S21, the translation control unit 53 of the translation server 50 generates recognized text data that is character-recognized data of the content of the uttered voice data (S22). The communication unit 51 of the translation server 50 transmits the recognized text data generated at step S22 to the face-to-face translator 10 (S23).
The translation control unit 53 of the translation server 50 generates translated text data by translating the character recognition result obtained at step S21 into a language suitable for the host HST1 or the guest GST1 by referring to the dictionary DB stored in the storage unit 54 (S24). Furthermore, the translation control unit 53 of the translation server 50 generates translated voice data, which is a connection of voice data corresponding to the respective text data (e.g., words and sentences) in the translated text data and is suitable for the host HST1 or the guest GST1, by referring to the voice DB stored in the storage unit 54 (S24). The communication unit 51 of the translation server 50 transmits both the translated text data and the translated voice data generated at step S24 to the face-to-face translator 10 (S25).
The translation control unit 53 of the translation server 50 may execute steps S22 and S23 and steps S24 and S25 either in parallel or in order of steps S22, S23, S24, and S25 after the execution of step S21.
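For illustration of the flow of steps S21 to S25 only, the following self-contained sketch stubs out the character recognition and replaces the dictionary DB and the voice DB with plain Python dictionaries; none of these names or behaviors are taken from the disclosure.

```python
# Toy stand-ins for the dictionary DB and voice DB held in the storage unit 54.
DICTIONARY_DB = {("en", "ja"): {"hello": "こんにちは"},
                 ("ja", "en"): {"こんにちは": "Hello"}}
VOICE_DB = {"en": lambda text: f"<en audio: {text}>".encode(),
            "ja": lambda text: f"<ja audio: {text}>".encode()}

def recognize_speech(uttered_voice_data: bytes, language: str) -> str:
    """Stub for the character recognition of step S21; a real recognizer would
    convert the audio waveform into text."""
    return uttered_voice_data.decode("utf-8")

def handle_uttered_voice(uttered_voice_data: bytes, src: str, dst: str) -> dict:
    recognized_text = recognize_speech(uttered_voice_data, src)            # S21, S22
    entry = DICTIONARY_DB.get((src, dst), {})
    translated_text = entry.get(recognized_text.lower(), recognized_text)  # S24 (dictionary DB)
    translated_voice = VOICE_DB[dst](translated_text)                      # S24 (voice DB)
    return {"recognized_text": recognized_text,                            # returned at S23
            "translated_text": translated_text,                            # returned at S25
            "translated_voice": translated_voice}
```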
Although it was described above with reference to
The communication unit 11 (an example of a term “second acquisition unit”) of the face-to-face translator 10 receives and acquires the recognized text data that is transmitted from the translation server 50 at step S23 (S13). The control unit 13 of the face-to-face translator 10 generates a first projection instruction to project the recognized text data onto the transparent screen 30 and transmits the first projection instruction including the recognized text data to the projector 20 via the communication unit 11 (S13). The projector 20 projects the recognized text data onto the transparent screen 30 in such a manner that the host HST1 and the guest GST1 can see it on the basis of the first projection instruction received from the face-to-face translator 10 (S2).
Furthermore, the communication unit 11 of the face-to-face translator 10 receives and acquires the translated text data and the translated voice data transmitted from the translation server 50 at step S25 (S14). The translated text data indicates the content of a translated voice (an example of a term “second voice”) obtained by translating the content of the voice of uttered voice data into a language suitable for the host HST1 or the guest GST1. The translated voice data is voice data that is a connection of voice data corresponding to respective words constituting the translated text data and is suitable for the host HST1 or the guest GST1. The control unit 13 of the face-to-face translator 10 outputs the translated voice data to the speaker SP1 and thereby presents a translated voice representing the content of the translated voice data to the host HST1 or the guest GST1 by causing the speaker SP1 to output it (S3).
The control unit 13 of the face-to-face translator 10 generates a second projection instruction to project the translated text data in a state that it is inverted in the left-right direction from the direction in which the recognized text data is being presented on the transparent screen 30 and transmits the second projection instruction including the translated text data to the projector 20 via the communication unit 11 (S15). The projector 20 projects the translated text data onto the transparent screen 30 in such a manner that the host HST1 and the guest GST1 can see it on the basis of the second projection instruction received from the face-to-face translator 10 (S4).
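A minimal rendering sketch of this left-right inversion, written with the Pillow imaging library for illustration only; the layout values are arbitrary and, in practice, a font covering the languages concerned would have to be supplied.

```python
from PIL import Image, ImageDraw, ImageFont, ImageOps

def render_projection(recognized_text: str, translated_text: str,
                      size=(960, 240)) -> Image.Image:
    """Compose one projection image: the translated text mirrored left-right
    (readable from the far side of the transparent screen) above the
    recognized text drawn as-is (readable by the speaker for confirmation)."""
    font = ImageFont.load_default()  # replace with a font that covers the target language
    canvas = Image.new("RGBA", size, (0, 0, 0, 0))
    # Translated text data: drawn normally, then mirrored in the left-right direction.
    upper = Image.new("RGBA", (size[0], size[1] // 2), (0, 0, 0, 0))
    ImageDraw.Draw(upper).text((20, 20), translated_text,
                               fill=(255, 255, 255, 255), font=font)
    mirrored = ImageOps.mirror(upper)
    canvas.paste(mirrored, (0, 0), mirrored)
    # Recognized text data: drawn as-is in the lower half.
    ImageDraw.Draw(canvas).text((20, size[1] // 2 + 20), recognized_text,
                                fill=(255, 255, 255, 255), font=font)
    return canvas
```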
As described above, in the language presentation system 100 according to the first embodiment, the face-to-face translator 10 acquires uttered voice data of a voice uttered by at least one of the host HST1 and the guest GST1 who are facing each other with the transparent screen 30 interposed between them. The face-to-face translator 10 acquires the voice content of the acquired uttered voice data and the voice content of translated voice data obtained by translating the voice content of the uttered voice data into a language suitable for the host HST1 or the guest GST1. The face-to-face translator 10 presents the acquired voice content of the uttered voice data and the acquired voice content of the translated voice data on the transparent screen 30 in such a manner that one of them is inverted in the left-right direction.
Configured in the above-described manner, the language presentation system 100 allows two persons who cannot understand each other's language to make a conversation while continuing to look at each other's face, by presenting their respective languages toward their faces, and hence realizes, in a simplified manner, a natural and smooth conversation.
The face-to-face translator 10 acquires, as the content of a first voice, recognized text data (an example of the term "first text data") obtained by character-recognizing the voice content of the uttered voice data, and acquires, as a translated content, translated text data (an example of the term "second text data") obtained by translating the recognized text data into the language suitable for the host HST1 or the guest GST1. With this measure, the face-to-face translator 10 can properly present the content of a voice uttered by the host HST1 or the guest GST1 on the transparent screen 30 as text data and hence can effectively facilitate understanding of the voice, like a telop (e.g., subtitle) used in a television broadcast, for example.
The face-to-face translator 10 further acquires, as a translated content, voice data of a second voice (e.g., translated voice data) obtained by translating the content of the uttered voice data into the language suitable for the host HST1 or the guest GST1. With this measure, by conveying not only the text but also an output sound to the counterpart, the face-to-face translator 10 can effectively convey, to the counterpart, the voice obtained by translating the voice uttered by the host HST1 or the guest GST1 into a language that can be understood by the counterpart and hence can help the counterpart to understand the content of the voice quickly.
The face-to-face translator 10 sends a projection instruction to the projector 20 so that the voice content of the uttered voice data is presented on the transparent screen 30 in the form of outline characters in a first-shape frame (e.g., rectangular frame) that is painted out in a first color (e.g., light blue). For example, outline characters are characters that are made recognizable to the host HST1 by removing only the character portions from a rectangular frame that is painted out in light blue, so that the characters appear as cut-outs. Outline characters are less recognizable than solid characters (described below). On the other hand, the face-to-face translator 10 sends a projection instruction to the projector 20 so that the voice content of the translated text data is presented on the transparent screen 30 in the form of solid characters having a second color (e.g., white) in a transparent second-shape frame (e.g., rectangular frame). For example, solid characters are characters that are made recognizable to the guest GST1 by writing only the character portions in white on a transparent rectangular background frame. Solid characters are more recognizable than outline characters (described above). With this measure, in the face-to-face translator 10, for example, a text of the content of a voice uttered by the host HST1 may be presented to the host HST1 in the form of outline characters only for confirmation, whereas a text written in solid characters, which are more visible to the guest GST1 than outline characters, can be presented to the guest GST1. In this manner, texts can be presented on the transparent screen 30 in favor of the guest GST1 while avoiding confusion in recognition due to the presentation of text data that can be understood by the two persons.
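One way such "outline" (knock-out) characters and "solid" characters could be rendered is sketched below with the Pillow imaging library; the colors, sizes, and function names are illustrative assumptions, not part of the disclosure.

```python
from PIL import Image, ImageDraw, ImageFont

FONT = ImageFont.load_default()  # replace with a font that covers the target language

def outline_characters(text: str, size=(480, 80),
                       frame_color=(173, 216, 230, 255)) -> Image.Image:
    """First-shape frame filled with the first color (light blue) from which the
    character portions are punched out, so the screen shows through the strokes."""
    frame = Image.new("RGBA", size, frame_color)
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).text((10, 10), text, fill=255, font=FONT)
    hole = Image.new("RGBA", size, (0, 0, 0, 0))
    return Image.composite(hole, frame, mask)  # where the mask is set, the frame becomes transparent

def solid_characters(text: str, size=(480, 80),
                     color=(255, 255, 255, 255)) -> Image.Image:
    """Transparent second-shape frame on which only the character portions are
    drawn in the second color (white); more visible than outline characters."""
    img = Image.new("RGBA", size, (0, 0, 0, 0))
    ImageDraw.Draw(img).text((10, 10), text, fill=color, font=FONT)
    return img
```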
For example, the transparent presentation unit may be a touch panel (not shown) that can be manipulated by each of the host HST1 and the guest GST1 instead of being composed of the projector 20 and the transparent screen 30. The face-to-face translator 10 presents recognized text data (an example of the term “content of the first voice”) at a first presentation position and presents translated text data (an example of the term “translated content”) at a second presentation position on the touch panel on the basis of the first presentation position and the second presentation position specified by the host HST1 and the guest GST1, respectively. With this measure, the face-to-face translator 10 can display (present) recognized text data and translated text data on the touch panel at desired positions specified by the host HST1 and the guest GST1, respectively. That is, text data corresponding to their respective languages can be presented at such positions that the host HST1 and the guest GST1 can see each other's face easily and recognize each other during an actual conversation, for example, at positions located a little below their lines of sight.
The face-to-face translator 10 presents translated text data (an example of the term "translated content") and recognized text data (an example of the term "content of the first voice") on the transparent screen 30 in such a manner that the translated text data is located above the recognized text data. With this measure, for example, the face-to-face translator 10 can present the translated text data at a top-side position on the transparent screen 30 where the guest GST1 can see it more easily, with preference given to the guest GST1 over the host HST1. That is, the texts can be presented on the transparent screen 30 in favor of the guest GST1.
The face-to-face translator 10 presents translated text data (an example of the term "translated content") for a longer time than recognized text data (an example of the term "content of the first voice") on the transparent screen 30. With this measure, for example, the face-to-face translator 10 can present, on the transparent screen 30, the translated text data to be viewed by the guest GST1 for a longer time than the recognized text data to be viewed by the host HST1 for confirmation, with preference given to the guest GST1 over the host HST1. That is, the texts can be presented on the transparent screen 30 in favor of the guest GST1.
The face-to-face translator 10 presents translated text data (an example of the term "translated content") on the transparent screen 30 in a prescribed color (e.g., white) that is easy to recognize. With this measure, since the face-to-face translator 10 allows the guest GST1 to see the translated text data that is projected onto the transparent screen 30 in the prescribed color (e.g., white), the guest GST1 can understand the content of the translated text data quickly.
For example, the transparent presentation unit is composed of the transparent screen 30 and the projector 20. The face-to-face translator 10 sends, to the projector 20, an instruction to project recognized text data (an example of the term “content of the first voice”) and translated text data (an example of the term “translated content”) onto the transparent screen 30. With this measure, the face-to-face translator 10 can present the recognized text data of a voice uttered by the host HST1 and the translated text data suitable for the guest GST1 on the transparent screen 30 in a simple manner.
For example, the transparent presentation unit is a touch panel (not shown) that can be manipulated by the host HST1 and the guest GST1. The face-to-face translator 10 sends recognized text data (an example of the term "content of the first voice") and translated text data (an example of the term "translated content") to the touch panel so that they are displayed on the touch panel. With this measure, even if equipped with neither the projector 20 nor the transparent screen 30, the face-to-face translator 10 allows the host HST1 and the guest GST1 to see the recognized text data and the translated text data in a state in which they face each other with the touch panel interposed between them, and hence can effectively realize a natural conversation.
In the first embodiment, the size of each of the various text data (more specifically, recognized text data and translated text data) to be projected onto the transparent screen 30 is specified to the projector 20 by being included, for example, in a projection instruction sent from the face-to-face translator 10. With this measure, the face-to-face translator 10 can flexibly change the size of text data to be projected onto the transparent screen 30 according to, for example, an age range specified by a manipulation of the host HST1 or the guest GST1.
In the first embodiment, the transparent screen 30 is provided as an example of the transparent presentation unit. Thus, the language presentation system 100 can be used as a service tool in entertaining a special customer (e.g., guest GST1) by installing the transparent screen 30 at, for example, a place where a luxury environment can be produced (e.g., a selling area of cosmetic products in a department store or a reception counter of a premium train).
In the language presentation system 100 according to the first embodiment, the face-to-face translator 10 operates together with the transparent screen 30 (an example of the term "transparent presentation unit"), and the control unit 13 (an example of the term "acquisition unit") of the face-to-face translator 10 acquires a first voice (e.g., a voice included in uttered voice data) of a first language (e.g., Japanese) uttered by the host HST1 or the guest GST1. The control unit 13 of the face-to-face translator 10 presents the content of the acquired first voice and a translated content obtained by translating the content of the first voice into a second language (e.g., English) that is different from the first language on the transparent screen 30, directly or via the projector 20, in such a manner that they are inverted from each other in the left-right direction.
With this measure, the face-to-face translator 10 can present the content of a voice of a first language (e.g., Japanese) uttered by one user (e.g., the host HST1 who speaks Japanese) and a translated content obtained by translating the content of that voice into a second language (e.g., English) that is suitable for the other user (e.g., the guest GST1 who speaks English) on the transparent screen 30 in such a manner that they are inverted from each other in the left-right direction. Thus, when, for example, two persons who cannot understand each other's language make a conversation, each of them can see a text of his or her own language and a text of the other person's language while looking at the other person's face. As a result, a natural and smooth conversation can be realized in a simplified manner.
Although the embodiment has been described above with reference to the accompanying drawings, it goes without saying that the disclosure is not limited to that example. It is apparent that those skilled in the art could conceive various changes, modifications, replacements, additions, deletions, or equivalents within the confines of the claims, and they are construed as being included in the technical scope of the disclosure. And constituent elements of the above-described various embodiments can be combined in a desired manner without departing from the spirit and scope of the invention.
Incidentally, in the language presentation system 100 according to the first embodiment, the table TBL1 on which the transparent screen 30 is installed is not limited to a table placed on a counter (see
Although the above-described first embodiment is directed to the case that the host HST1 and the guest GST1 make a conversation facing each other with the transparent screen (installed on the counter such as a reception counter) interposed between them, the place where the transparent screen is installed is not limited to a counter such as a reception counter and may be a taxi, a restaurant, a conference room, an information office of a train station, etc. For example, a transparent glass plate provided between the driver seat and the rear seat in a taxi can be used as the transparent screen 30. In a restaurant, a conference room, or an information office of a train station, the transparent screen 30 may be provided between persons who face each other and make a conversation.
The above-described language presentation system 100 according to the first embodiment can be applied to what is called finger-pointing translation in which text data of each other's language is displayed on a touch panel or the like.
The present application is based on Japanese Patent Application No. 2018-013425 filed on Jan. 30, 2018, the disclosure of which is incorporated herein by reference.
The present disclosure is useful when applied to language presentation devices, language presentation methods, and language presentation programs that allow two persons who cannot understand each other's language to make a conversation while continuing to look at each other's face, by presenting their respective languages toward their faces, and thereby realize, in a simplified manner, a natural and smooth conversation.