This application is based on and claims priority under 35 U.S.C. § 119 of an Indian patent application number 202041000734, filed on Jan. 7, 2020, in the Indian Patent Office, the disclosure of which is incorporated by reference herein in its entirety.
The present disclosure relates to generation of multimodal content in an application. More particularly, the disclosure relates to systems and methods for multimodal content generation within an application based on user input.
With the widespread use of the Internet and social networking platforms, the exchange of information over such platforms has increased. Since information is exchanged via messages on such platforms, it is important that the intent embedded in a message be conveyed properly to the recipient. Most platforms provide static content, such as emoticons, memes, and Graphics Interchange Format (GIF) images, which is non-customizable and may not be appropriate for a given conversation. Hence, traditional content generation tools are not able to convey the intent behind a message properly, which leads to additional messages being sent for the same context to explain the intended meaning.
Further, there are customizable digital content creation tools available which are aided by supporting props such as greeting templates and sticker templates. However, it is tedious to find the right content and navigate through endless templates to generate the desired content.
A related technology provides dynamic content creation, modification, and distribution from a single source of content in online and offline scenarios, so that a user may create dynamic content from any network device based on user-modifiable elements, and modify and distribute the content in multiple sizes and/or formats on different platforms.
Another related technology provides adaptable layouts for social feeds. In the related technology, an activity is generated based on the social network action to collect metadata associated with a shared content. The shared content and the metadata are then mapped to layout templates that are each generated for different display layout formats associated with different types of client devices.
However, the foregoing related technologies do not provide methods and systems for generating a multimodal content within an application based on a user input.
Example embodiments address at least the above problems and/or disadvantages and other disadvantages not described above. Also, the example embodiments are not required to overcome the disadvantages described above, and may not overcome any of the problems described above.
In accordance with an aspect of the disclosure, there is provided a method of generating a customized content, including: obtaining an input from a user; detecting, from the input, at least one feature and a modality of the input among a plurality of modalities comprising a text format, a sound format, a still image format, and a moving image format; determining a mode of the customized content, from a plurality of modes, based on the at least one feature and the modality of the input, the plurality of modes including an image mode and a text mode; and generating the customized content based on the determined mode.
The detecting the at least one feature and the modality of the input may include: detecting an emotion or an activity of the user based on texts recognized in the input; and categorizing the modality and the at least one feature based on the detected emotion of the user or the detected activity of the user.
The generating the customized content may include generating the customized content based on the determined mode and the detected emotion of the user.
The categorizing the modality and the at least one feature may include categorizing the modality and the at least one feature further based on a learned model and a predefined model.
The method may further include: obtaining intent information based on texts extracted from the input, wherein the texts extracted from the input may include at least one verb or one adjective, and wherein the generating the customized content may include generating the customized content based on the extracted texts and the determined mode.
The method may include determining at least one of a font size, a font type, or a color of the texts, based on the intent information; and the generating the customized content may include generating the customized content based on the determined mode and the at least one of the font size, the font type, or the color of the texts.
The method may further include: obtaining intent information based on texts extracted from the input, and determining a layout of the customized content based on the intent information, wherein the generating the customized content may include generating the customized content based on the layout and the determined mode.
The method may further include: obtaining at least one of time information or location information from the input, and the generating the customized content may include generating the customized content based on the determined mode and the at least one of the time information or the location information.
The method may further include: determining that the customized content requires an intervention of the user; and in response to the intervention of the user, displaying a second customized content.
The input may be a voice signal, and the method may further include: converting the voice signal into texts; identifying words, each of which has a pitch and a volume that are greater than a predetermined pitch and a predetermined volume, from the texts converted from the voice signal; and determining text information based on the identified words, wherein the generating the customized content may include generating the customized content based on the text information and the determined mode.
The method may further include: obtaining intent information, which indicates an intention of the user, based on texts extracted from the input, wherein the input comprises a plurality of texts, wherein the generating the customized content may include generating the customized content based on the intent information.
In accordance with an aspect of the disclosure, there is provided an apparatus for generating a customized content, the apparatus including: at least one memory configured to store one or more instructions; at least one processor configured to execute the one or more instructions to: obtain an input from a user; detect, from the input, at least one feature and a modality of the input among a plurality of modalities comprising a text format, a sound format, a still image format, and a moving image format; determine a mode of the customized content, from a plurality of modes, based on the at least one feature and the modality of the input, the plurality of modes including an image mode and a text mode; and generate the customized content based on the determined mode; and a display configured to display the customized content.
The at least one processor may be further configured to execute the one or more instructions to: detect an emotion or an activity of the user based on texts recognized in the input; and categorize the modality and the at least one feature based on the detected emotion of the user or the detected activity of the user.
The at least one processor may be further configured to execute the one or more instructions to: generate the customized content based on the determined mode and the detected emotion of the user.
The at least one processor may be further configured to execute the one or more instructions to: obtain intent information based on texts extracted from the input, the texts extracted from the input comprising at least one verb or one adjective, and generate the customized content based on the extracted texts and the determined mode.
The at least one processor may be further configured to execute the one or more instructions to: determine at least one of a font size, a font type, or a color of the texts based on the intent information, and generate the customized content based on the determined mode and the at least one of the font size, the font type, or the color of the texts.
The at least one processor may be further configured to execute the one or more instructions to: obtain intent information based on texts extracted from the input; determine a layout of the customized content based on the intent information; and generate the customized content based on the layout and the determined mode.
The at least one processor may be further configured to execute the one or more instructions to: obtain at least one of time information or location information from the input, and generate the customized content based on the determined mode and the at least one of the time information or the location information.
The at least one processor may be further configured to execute the one or more instructions to: determine that the customized content requires an intervention of the user; and in response to the intervention of the user, control the display to display a second customized content.
In accordance with an aspect of the disclosure, there is provided a non-transitory computer readable storage medium having computer readable instructions stored therein which, when executed by at least one processor, cause the at least one processor to: obtain an input from a user; detect, from the input, at least one feature and a modality of the input among a plurality of modalities comprising a text format, a sound format, a still image format, and a moving image format; determine a mode of a customized content, from a plurality of modes, based on the at least one feature and the modality of the input, the plurality of modes including an image mode and a text mode; and generate the customized content based on the determined mode.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
Various embodiments are described in greater detail below with reference to the accompanying drawings.
In the following description, like drawing reference numerals are used for like elements, even in different drawings. The matters defined in the description, such as detailed construction and elements, are provided to assist in a comprehensive understanding of the example embodiments. However, it is apparent that the embodiments can be practiced without those specifically defined matters. Also, well-known functions or constructions are not described in detail since they would obscure the description with unnecessary detail.
The terms and words used in the following description and claims are not limited to the bibliographical meanings, but, are merely used by the inventor to enable a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following description of various embodiments of the disclosure is provided for illustration purpose only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.
It is to be understood that the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a component surface” includes reference to one or more of such surfaces.
As used herein, the terms “1st” or “first” and “2nd” or “second” may use corresponding components regardless of importance or order and are used to distinguish one component from another without limiting the components.
Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. For example, the expression, “at least one of a, b, and c,” should be understood as including only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or any variations of the aforementioned examples.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
In an embodiment, the further devices 104-1, 104-2, 104-3, 104-4, . . . , 104-n may interchangeably be referred to as devices 104-1, 104-2, 104-3, 104-4, . . . , 104-n. The further devices 104-1, 104-2, 104-3, 104-4, . . . , 104-n may collectively be referred to as the devices 104, without departing from the scope of the disclosure. In an embodiment, the devices 104 may individually be referred to as the device 104, without departing from the scope of the present disclosure.
In an embodiment, the devices 104 may include but are not limited to, physical devices, vehicles, home appliances, and any other electronic item that can be connected to the network 106. For example, with respect to the home appliances, the devices 104 may include, but are not limited to, an Air Conditioner (AC), a refrigerator, a sound system, a television, a cellular device, a communication device, a microwave oven, an ambient light, a voice assistance device, interchangeably referred to as the voice assistance.
The electronic device 100 may interact with the devices 104 through a network 106. The network 106 may be a wired network or a wireless network. The network 106 may include, but is not limited to, a mobile network, a broadband network, a Wide Area Network (WAN), a Local Area Network (LAN), and a Personal Area Network.
In an embodiment, the electronic device 100 may be embodied as a smartphone, without departing from the scope of the present disclosure. In an embodiment, the electronic device 100 may be configured to generate multimodal content 112 within an application based on user input 110.
In an embodiment, the user 108 may enter an input 110 within an application on the electronic device 100. The electronic device 100 may recognize the input 110 to generate multimodal content as an output 112 within the application.
In an embodiment, the input 110 may be a multimodal input such as text, voice, an image, a video, a GIF, an Augmented Reality input, a Virtual Reality input, an Extended Reality input, or any other mode of input. The input 110 is inputted by the user 108 within the application on the electronic device 100. The electronic device 100 may detect modality information and a plurality of features from the input 110, and may identify the intent of the input 110 based on the detected modality information and the detected plurality of features. The modality information may also be referred to as modality throughout the present disclosure. The electronic device 100 retrieves information for multimodal customized content generation and generates the multimodal customized content 112 based on the detected modality information, the detected plurality of features, and the retrieved information. The electronic device 100 renders at least one multimodal customized content 112 to the user 108 within the application on the electronic device 100 for further action. Throughout the specification, the term "multimodal customized content" may be used interchangeably with the terms "multimodal content" or "customized content".
Constructional and operational details of the electronic device 100 are explained in detail referring to
In an embodiment, the electronic device 100 may be implemented to generate a multimodal content as an output 112 within the application. In another embodiment, the electronic device 100 may be implemented using the information from one of the devices 104 to generate the multimodal content as an output 112. For instance, the electronic device 100 may include a processor 202 which obtains an input 110 from a user 108 in an application. The processor 202 may detect at least one modality and a plurality of features from the input 110 and identify intent of the user from the input 110 based on the detected modality and the detected plurality of features. In order to detect the modality and the plurality of features from the input 110, the processor 202 may detect emotion of the user 108, recognize activity of the user 108 and/or categorize the modality and the plurality of features based on the detected emotion, recognized activity, learned model and a predefined model. The processor 202 may retrieve information for multimodal content generation and generate the multimodal content based on the detected modality, the detected plurality of features and the retrieved information. In order to retrieve the information, the processor 202 may obtain the intent of the user from the input 110, extract at least one keyword from the intent of the user from input 110 and search a database of a user device 100 and other connected devices 104 for the information based on the at least one keyword. The processor 202 may render the generated multimodal content 112 to the user 108. Constructional and operational details of the electronic device 100 are explained in detail in later sections of the present disclosure.
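By way of illustration only, the overall flow described above (detect the modality and features, identify the intent, retrieve information, and generate the content) may be organized as in the following minimal Python sketch; the function and class names (ParsedInput, detect_modality_and_features, identify_intent, retrieve_information, generate_content) and the simple keyword rules are illustrative assumptions and are not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedInput:
    modality: str                      # "text", "voice", "image", or "video"
    features: List[str] = field(default_factory=list)


EMOTION_WORDS = {"happy": "happy", "sad": "sad", "congrats": "happy"}


def detect_modality_and_features(text: str) -> ParsedInput:
    # The input is assumed to already be text; a full system would also accept
    # voice, image, and video inputs and route them to dedicated parsers.
    features = [f"emotion:{EMOTION_WORDS[word]}"
                for word in text.lower().split() if word in EMOTION_WORDS]
    return ParsedInput(modality="text", features=features)


def identify_intent(parsed: ParsedInput) -> str:
    # Emotional wording is treated as a greeting; everything else as information.
    return "greeting" if any(f.startswith("emotion:") for f in parsed.features) else "information"


def retrieve_information(intent: str, local_db: Dict[str, str]) -> str:
    # Stand-in for searching the device database and connected devices.
    return local_db.get(intent, "")


def generate_content(text: str, parsed: ParsedInput, info: str) -> str:
    # Compose a simple text rendering of the customized content.
    return f"[{parsed.modality}/{identify_intent(parsed)}] {text} {info}".strip()


if __name__ == "__main__":
    db = {"greeting": "(flower image)"}
    parsed = detect_modality_and_features("Dear Barath, Happy Birthday")
    info = retrieve_information(identify_intent(parsed), db)
    print(generate_content("Dear Barath, Happy Birthday", parsed, info))
```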
In an embodiment, the electronic device 100 may include the processor 202, memory 204, module(s) 206, and database 208. The module(s) 206 and the memory 204 are connected to the processor 202. The processor 202 may be implemented as a single processing unit or a number of computer processing units. The processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 202 is configured to fetch and execute computer-readable instructions and data stored in the memory 204.
The memory 204 may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
The module(s) 206, among other things, include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement data types. The module(s) 206 may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulate signals based on operational instructions.
Further, the module(s) 206 may be implemented in hardware, instructions executed by at least one processing unit, e.g., the processor 202. The processing unit may include a processor, a state machine, a logic array and/or any other suitable devices capable of processing instructions. The processing unit may be a general-purpose processor which executes instructions to cause the general-purpose processor to perform operations or, the processing unit may be dedicated to performing the required functions. In some example embodiments, the module(s) 206 may be machine-readable instructions (software, such as web-application, mobile application, program, etc.) which, when executed by a processor/processing unit, perform any of the described functionalities.
In an implementation, the module(s) 206 may include a modality feature extraction module 210, an intent identification module 212, an information module 214, a beautify module 216, a structural module 218, a modality prediction module 220 and a rendering module 222. The modality feature extraction module 210, the intent identification module 212, the information module 214, the beautify module 216, the structural module 218, the modality prediction module 220 and the rendering module 222 are in communication with each other. The database 208 serves, among other things, as a repository for storing data processed, received, and generated by one or more of the modules 206.
In an embodiment of the present disclosure, the module(s) 206 may be implemented as a part of the processor 202. In another embodiment of the disclosure, the module(s) 206 may be external to the processor 202. In yet another embodiment of the disclosure, the module(s) 206 may be a part of the memory 204. In another embodiment of the present disclosure, the module(s) 206 may be a part of hardware, separate from the processor 202.
The electronic device may generate multimodal content as output 112 within the application on the electronic device 100. For the sake of brevity, features of the disclosure explained in detail in the description referring to
The electronic device 100 may include at least one processor 302 (also referred to herein as “the processor 302”), a memory 304, a communication interface unit(s) 306, display 308, a microphones(s) 310, speaker(s) 312, a resource(s) 314, a camera 316, a sensor 318, a module(s) 320, and/or database 322. The processor 302, the memory 304, the communication interface unit(s) 306, the display 308, the microphones(s) 310, the speaker(s) 312, the resource(s) 314, the camera 316, the sensor 318, and/or the module(s) 320 may be communicatively coupled with each other via a bus (illustrated using directional arrows). The electronic device 100 may also include one or more input devices (not shown in
The processor 302 may be a single processing unit or a number of units, all of which could include multiple computing units. The processor 302 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, processor cores, multi-core processors, multiprocessors, state machines, logic circuitries, application-specific integrated circuits, field programmable gate arrays and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 302 may be configured to fetch and/or execute computer-readable instructions and/or data (e.g., the data 314) stored in the memory 304. In an example embodiment, the processor 202 of the electronic device 100 may be integrated with the processor 302 for optimizing the causal device usage parameters of the electronic device 100.
The memory 304 may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and/or dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM (EPROM), flash memory, hard disks, optical disks, and/or magnetic tapes. In an example embodiment, the memory 204 of the electronic device 100 may be integrated with the memory 304 of the electronic device 100 for optimizing the causal device usage parameters of the electronic device 100.
The communication interface unit(s) 306 may enable (e.g., facilitate) communication by the electronic device 100 with the devices 104. The display 308 may display various types of information (for example, media content, multimedia data, text data, etc.) in the form of messages to the user of the electronic device 100. The display 308 may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, an organic LED (OLED) display, a plasma cell display, an electronic ink array display, an electronic paper display, a flexible LCD, a flexible electro-chromic display, and/or a flexible electro wetting display. The display 308 can be a touch enabled display unit or a non-touch display unit. In an example, the electronic device 100 may be the smartphone with or without voice assistance capabilities. The microphones(s) 310 and the speaker(s) 312 may be integrated with the electronic device 100.
The resource(s) 314 may be physical and/or virtual components of the electronic device 100 that provide inherent capabilities and/or contribute to the performance of the electronic device 100. Examples of the resource(s) 314 may include, but are not limited to, memory (e.g., the memory 304), power unit (e.g. a battery), display unit (e.g., the display 308), etc. The resource(s) 314 may include a power unit/battery unit, a network unit (e.g., the communication interface unit(s) 306), etc., in addition to the processor 302, the memory 304, and the display 308.
The camera 316 may be integral or external to the electronic device 100 (therefore illustrated with dashed lines). Examples of the camera 316 include, but are not limited to, a 3D camera, a 360-degree camera, a stereoscopic camera, a depth camera, etc. In an example, the electronic device 100 may be the smartphone and therefore may include the camera 316.
The sensor 318 may be integral or external to the electronic device 100 (therefore illustrated with dashed lines). Examples of the sensor 318 include, but are not limited to, an eye-tracking sensor, a facial expression sensor, an accelerometer, a gyroscope, a location sensor, a gesture sensor, a grip sensor, a biometric sensor, an audio module, and/or a location/position detection sensor. The sensor 318 may include a plurality of sensors.
The module(s) 320 may include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement data types. The module(s) 320 may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device and/or component that manipulates signals based on operational instructions.
Further, the module(s) 320 may be implemented in hardware, as instructions executed by a processing unit, or by a combination thereof. The processing unit may include a computer, a processor, such as the processor 302, a state machine, a logic array, and/or any other suitable devices capable of processing instructions. The processing unit may be a general-purpose processor which executes instructions to cause the general-purpose processor to perform operations, or the processing unit may be dedicated to performing the functions. In another aspect of the present disclosure, the module(s) 320 may be machine-readable instructions (software) which, when executed by a processor/processing unit, may perform any of the described functionalities.
According to some example embodiments, operations described herein as being performed by any or all of the module(s) 206, the modality feature extraction module 210, the intent identification module 212, the information module 214, the beautify module 216, the structural module 218, the modality prediction module 220 and the rendering module 222, may be performed by at least one processor (e.g., the processor 302) executing program code that includes instructions (e.g., the module(s) 206 and/or the module(s) 320) corresponding to the operations. The instructions may be stored in a memory (e.g., the memory 304).
Referring to
Referring to
In the data pipeline 120, text features are extracted from the input 110 including at least one of an image 1101, a voice (or a voice signal) 1103, and/or a rich text 1105. As shown in
The non-text features (such as image, voice, and stress features) and the textual features are classified. Thereafter, these annotations and classifications are used for predicting features as tags using the learned model 402 and the predefined model 404, which are then processed for applying beautification at a later stage based on the respective features. The learned model 402 may include a Bi-LSTM Encoder 4021, feature attentions 4023, and Softmax 4025. The pre-defined model 404 may include a Bi-LSTM Decoder 4041, a multimodal classifier 4043 (using information of the feature attentions 4023), and Softmax 4045. Further, these tags are mapped to form classes which are used for predicting the modes 408 and the multimodal features 406 of the input 110.
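By way of illustration only, the following minimal sketch shows a bidirectional LSTM encoder with a simple attention layer and a softmax output, in the spirit of the learned model 402 described above; the layer sizes, the number of tags, and the attention-pooling scheme are assumptions, and the sketch does not represent the claimed model.

```python
import torch
import torch.nn as nn


class BiLSTMTagClassifier(nn.Module):
    """Bi-LSTM encoder with feature attention and a softmax output layer."""

    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_tags=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)   # per-token attention scores
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)             # (batch, seq, embed)
        encoded, _ = self.encoder(embedded)              # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attention(encoded), dim=1)
        context = (weights * encoded).sum(dim=1)         # attention-pooled summary
        return torch.softmax(self.classifier(context), dim=-1)  # tag probabilities


if __name__ == "__main__":
    model = BiLSTMTagClassifier()
    dummy_tokens = torch.randint(0, 10000, (1, 12))      # one sentence of 12 tokens
    print(model(dummy_tokens).shape)                      # torch.Size([1, 16])
```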
The audio parser 410 may parse audio input included in the input 110 and extract features such as a text sequence along with corresponding time points 4101, volume with time points 4103, words in the audio sound (Audio of words 4105), emotion with text and audio 4107, and/or importance of text parts 4109.
In an embodiment, the importance of text parts 4109 may be determined based on a combination of the text sequence along with the time points 4101 and the volume corresponding to each of the time points 4103. For example, a certain text (e.g., Text1 1201 as shown in
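By way of illustration only, the following sketch flags words as important when the volume associated with their time points exceeds a threshold; the (word, time, volume) representation and the threshold value are assumptions.

```python
from typing import List, Tuple


def important_words(words_with_volume: List[Tuple[str, float, float]],
                    volume_threshold: float = 0.7) -> List[str]:
    """words_with_volume holds (word, time_in_seconds, normalized_volume) triples."""
    return [word for word, _time, volume in words_with_volume
            if volume > volume_threshold]


if __name__ == "__main__":
    parsed_audio = [("happy", 0.2, 0.9), ("birthday", 0.6, 0.85), ("to", 1.0, 0.3)]
    print(important_words(parsed_audio))  # ['happy', 'birthday']
```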
In an embodiment, the text parser 420 may extract or detect features from the text included in the input 110. The features extracted from the text included in the input 110 may include the text sequence 4201, the font/size/style/color of the text 4203, emotion/emotion transition 4205, a summary 4207, and/or the importance of text parts 4209. In an embodiment, the importance of text parts 4209 may be determined based on any of the foregoing features extracted from the text. For example, if the font, size, style (e.g., italics), and/or color of a certain text (e.g., Text2 1203 as shown in
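By way of illustration only, the following sketch marks a text part as important when its formatting differs from a base style; the attribute names and base values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    text: str
    font_size: int = 12
    bold: bool = False
    italic: bool = False
    color: str = "black"


def is_important(span: TextSpan, base_font_size: int = 12,
                 base_color: str = "black") -> bool:
    # A span stands out when any formatting attribute deviates from the base style.
    return (span.font_size > base_font_size
            or span.bold or span.italic
            or span.color != base_color)


if __name__ == "__main__":
    print(is_important(TextSpan("Happy Birthday", font_size=18, bold=True)))  # True
    print(is_important(TextSpan("Dear Barath")))                              # False
```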
In an embodiment, the image parser 430 may extract or detect features from the image included in the input 110. The features extracted from the image included in the input 110 may include background/position (of text and/or an object in the image)/color 4301, a region of interest (ROI) 4303, text/order of text 4305, summary 4307, and/or importance of image parts 4309. In an embodiment, the importance of image parts 4309 may be determined based on any of the foregoing features extracted from the image included in the input 110. For example, the text “LOL” included in the image 4311 may represent the emotion extractable from the image and the text “LOL” may be regarded as the feature of high importance. The electronic device 100 may store a list of words or variations thereof (e.g., “LOL”), in the database 322, which are classified as important words or emotion words.
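By way of illustration only, the following sketch checks text extracted from an image against a stored list of emotion words such as "LOL"; the word list and the tokenization are assumptions.

```python
# Words treated as emotion or importance markers when found in image text.
EMOTION_WORDS = {"lol", "haha", "omg", "wow"}


def emotion_tags_from_image_text(extracted_text: str) -> list:
    # Tokenize crudely and keep only the stored emotion words.
    words = extracted_text.lower().replace("!", " ").split()
    return [word for word in words if word in EMOTION_WORDS]


if __name__ == "__main__":
    print(emotion_tags_from_image_text("LOL! See you tomorrow"))  # ['lol']
```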
The video parser 440 may parse video input included in the input 110 and extract features such as frames and audio 4401 and/or the importance of video parts based on a rate of scene changes, the volume of each scene, etc. Features extractable by the audio parser 410 and the image parser 430 may also be extracted by the video parser 440.
The audio parser 410, the text parser 420, the image parser 430, and the video parser 440 may be a part of the modality feature extraction module 210 and/or the processor 202.
Referring to
For example, assuming that the text in the input 110 recites "Dear Barath, Happy Birthday", the modality feature extraction module 210 detects a textual format as the modality information from the input 110 and extracts features such as a happy emotion based on the words "Happy Birthday" included in the text. The electronic device 100 may generate customized content 450 with an image of flower(s) 4501 suitable for the happy mood and the words "Happy Birthday". The customized content 450 may also include the original text "Dear Barath, Happy Birthday" in an appropriate position in the customized content 450. The electronic device 100 may perform an Internet search using the keyword "happy" to find an image (e.g., the image 4501) related to the emotion of happiness, or search a local storage (e.g., the memory 204 or the database 208) to find an image related to the happiness emotion.
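By way of illustration only, the following sketch looks up a background image by keyword in a local index, with the Internet search mentioned above omitted; the index contents and file paths are assumptions.

```python
# A local keyword-to-image index standing in for the memory 204 / database 208.
LOCAL_IMAGE_INDEX = {
    "happy": "images/flowers.png",
    "birthday": "images/balloons.png",
}


def find_background_image(keywords):
    # Return the first matching image; None triggers a plain text layout instead.
    for keyword in keywords:
        if keyword in LOCAL_IMAGE_INDEX:
            return LOCAL_IMAGE_INDEX[keyword]
    return None


if __name__ == "__main__":
    print(find_background_image(["happy", "birthday"]))  # images/flowers.png
```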
Referring to
Referring to
Referring to
A smiling face with an expression such as winking, grinning, or rolling on the floor laughing may be classified into one category of "smiling face" in an embodiment. A pre-existing image recognition method may be used for recognizing the mood extractable from the image. The beautify module 216 receives the detected multimodal features 406 and modes 408 as an input to generate one or more layout templates for the multimodal content. The one or more layout templates include the style 511 of texts, the font 513 of texts, and the color 515 of the foreground and background of the multimodal content 112.
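By way of illustration only, the following sketch maps detected features to a layout template containing a style, a font, and foreground/background colors; the specific mappings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LayoutTemplate:
    style: str
    font: str
    foreground: str
    background: str


TEMPLATES = {
    "happy":       LayoutTemplate("festive", "Comic Sans MS", "#FFFFFF", "#FF8800"),
    "information": LayoutTemplate("simple", "Times New Roman", "#000000", "#FFFFFF"),
}


def pick_template(features) -> LayoutTemplate:
    # Return the first template whose key appears among the detected features.
    for feature in features:
        if feature in TEMPLATES:
            return TEMPLATES[feature]
    return TEMPLATES["information"]  # neutral default


if __name__ == "__main__":
    print(pick_template(["happy", "birthday"]))
```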
Referring to
Referring to
Referring to
Referring to
The electronic device 100 receives an input 110 from a user within an application at step 1002. Further, at step 1004, the electronic device 100 detects or extracts modality information and a plurality of features from the input 110. The detecting of the modality (modality information) and the plurality of features from the input 110 includes detecting an emotion of the user, recognizing an activity of the user, and categorizing the modality and the plurality of features based on the detected emotion, the recognized activity, a learned model, and a predefined model. At step 1006, the electronic device 100 identifies the intent of the input 110 from the detected modality and the plurality of features. At step 1008, the electronic device 100 retrieves information for multimodal content generation, and at step 1010, the electronic device 100 generates the multimodal content based on the detected modality, the detected plurality of features, and the retrieved information. The retrieving of the information includes receiving the intent of the input, extracting at least one keyword from the received intent of the input, and/or searching for the information based on the at least one keyword in an internal database of the electronic device 100 and/or other connected devices. At step 1012, the electronic device 100 renders the generated multimodal content to the user. The user may select at least one multimodal content if a plurality of multimodal contents are generated and shown on the display 308 of the electronic device 100.
The electronic device 100 may obtain an input 110 from a user at step 1022. The input 110 may be obtained via an application installed in the electronic device 100. The processor 202 of the electronic device 100 may detect, from the user input 110, at least one feature and a modality of the input among a plurality of modalities at step 1024. The modalities may include at least one of a text format, a sound format, a still image format, and a moving image format. The detection of the at least one feature and the modality of the input may include detecting an emotion or an activity of the user based on texts recognized in the input 110 and categorizing the modality and the at least one feature based on the detected emotion of the user or the detected activity of the user. The categorization may be performed based on a learned model and a predefined model. In an embodiment, the processor 202 may obtain intent information based on texts extracted from the input 110. Based on the intent information, the processor 202 may determine at least one of a font size, a font type, and a color of the texts. Based on the intent information, the processor 202 may determine a layout of the customized content. In an embodiment, the processor 202 may obtain at least one of time information and location information from the input 110. In an embodiment, the processor 202 may determine that the customized content requires an intervention of the user, and in response to the intervention of the user, the processor 202 may control the display to display a second customized content.
The processor 202 of the electronic device 100 may determine a mode of a customized content from a plurality of modes based on the at least one feature and the modality of the input 110 at step 1026. The plurality of modes may include an image mode and a text mode. The customized content may be generated further based on the detected emotion of the user.
The processor 202 of the electronic device 100 may generate the customized content based on the determined mode at step 1028. The customized content may be generated further based on the extracted texts and the determined mode. The customized content may be generated further based on the determined mode and the at least one of a font size, a font type, and a color of the texts. The customized content may be generated further based on the determined mode and the determined layout. The customized content may be generated based on the determined mode and the at least one of the time information and the location information.
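By way of illustration only, the following sketch combines the mode decision and the content generation of steps 1024 through 1028; the decision rule and the output structure are assumptions and do not represent the claimed method.

```python
def determine_mode(features, has_background_image: bool) -> str:
    # Emotional input with an available background image favors the image mode.
    emotional = any(f.startswith("emotion:") for f in features)
    return "image" if emotional and has_background_image else "text"


def generate_customized_content(text, mode, font="Times New Roman",
                                color="black", layout="simple"):
    # Assemble a simple dictionary describing the content to be rendered.
    if mode == "image":
        return {"mode": "image", "overlay_text": text,
                "font": font, "color": color, "layout": layout}
    return {"mode": "text", "body": text, "font": font,
            "color": color, "layout": layout}


if __name__ == "__main__":
    features = ["emotion:happy"]
    mode = determine_mode(features, has_background_image=True)
    print(generate_customized_content("Dear Barath, Happy Birthday", mode))
```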
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
As another example, the user 108 enters the text "Dear Parents, Please note that Sanskrit speaking and Vedic Maths class is cancelled for tomorrow on account of Akshay Tritya festival, however, there will be: Fun with Science: under 3 year olds, 10-11 am = Simple science experiments. Dodge the tables: 7-13 year olds, 11 am-12 noon = revision of tables 2-12. No class for 4-6 year olds tomorrow. Looking forward to seeing you!". The modality feature extraction module 210 recognizes the input as text, and the intent identification module 212 recognizes the intent as "information". The information module 214 retrieves information such as When: tomorrow; Keywords: Sanskrit Speaking, Vedic Maths, Fun with science, Dodge the tables, cancelled due to Akshay Tritya festival, simple science experiments, revision of tables 2-12, no class; Time: 10-11 am, 11-12 pm; Age: <3, 7-13, 4-6. The beautify module 216 predicts a layout template such as Style: Simple; Font: Times New Roman (Body); Size: 12; Text Color: Black. The structural module 218 determines the structure as Class: Sanskrit Speaking, Vedic Maths, Fun with science, Dodge the tables; Time: 10-11 am, 11-12 pm; Age: <3, 7-13, 4-6; Details: cancelled due to Akshay Tritya festival, simple science experiments, revision of tables 2-12, no class. The modality prediction module 220 determines the mode of the multimodal content 112 to be a text mode and, accordingly, the rendering module 222 generates the multimodal content as an output, illustrated in the table below:
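By way of illustration only, the following sketch shows how the structural module 218 might arrange the extracted classes, age groups, times, and details into the rows of a text-mode table such as the one referenced above; the column names and the row grouping are assumptions derived from the example.

```python
# Rows derived from the example message above; the grouping is an assumption.
CLASSES = [
    {"class": "Sanskrit Speaking", "age": "-", "time": "-",
     "details": "Cancelled due to Akshay Tritya festival"},
    {"class": "Vedic Maths", "age": "-", "time": "-",
     "details": "Cancelled due to Akshay Tritya festival"},
    {"class": "Fun with Science", "age": "under 3", "time": "10-11 am",
     "details": "Simple science experiments"},
    {"class": "Dodge the Tables", "age": "7-13", "time": "11 am-12 noon",
     "details": "Revision of tables 2-12"},
    {"class": "No class", "age": "4-6", "time": "-", "details": "No class tomorrow"},
]


def render_text_table(rows):
    # Produce a plain-text table with a header row followed by one line per class.
    headers = ["class", "age", "time", "details"]
    lines = [" | ".join(header.title() for header in headers)]
    lines += [" | ".join(str(row[header]) for header in headers) for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_text_table(CLASSES))
```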
Referring to
Likewise, steps 2402 through 2416 illustrate generating customized content 2418 by matching each of the modules with its corresponding function, as illustrated in
The foregoing exemplary embodiments are merely exemplary and are not to be construed as limiting. The present teaching can be readily applied to other types of apparatuses. Also, the description of the exemplary embodiments is intended to be illustrative, and not to limit the scope of the claims, and many alternatives, modifications, and variations will be apparent to those skilled in the art.
Number | Date | Country | Kind |
---|---|---|---
202041000734 | Jan 2020 | IN | national |