This invention generally relates to information processing, content processing and generation, multimedia, ontological subject processing, and generating multimedia compositions.
Content creation and generation is an important task in today's online world, for a variety of reasons and in various areas of interest. The subject matter of contents can range from sophisticated scientific research topics and programs, local or global political issues, and business-oriented analysis, to daily-life subjects of temporary interest such as celebrity news, advertisement, entertainment, etc. Contents are usually represented by a variety of types and media forms, such as textual, audio or aural, visual, and graphical, or by any combination of them, i.e. multimedia in general.
Multimedia contents are in greater demand and more valuable, since contents are much more informative, entertaining, pleasing, and easier to grasp when they are accompanied by more than one media representation. However, valuable content creation, and particularly multimedia content creation and generation, is not a trivial task.
A creator of valuable content usually must know a great deal about the subject matter of the content and about ways of presentation in order to create even a single-media content such as a textual content. Making multimedia contents requires yet additional expertise, is time consuming and expensive, and does not lend itself to automation easily.
Consequently, generation of content in general, and multimedia content in particular, is not straightforward, making it a difficult assignment for the general public as well as for professionals. Therefore, there is a need in the art for a process, or method and system, that can facilitate the production and sharing of a variety of contents for everyone and for many desirable applications.
Information and contents can be represented in different languages and forms, such as text, audio, image, video, or a combination of these forms.
A content creator usually has some ideas, a design, scripts, or perhaps just a keyword, and would like to generate a desired content for publishing, broadcasting, or presentation. The starting content can be a short text message (e.g. SMS), a Twitter message, an audio command or speech, an email, a movie script, or a short or long essay, written or spoken in any language. The starting content can sometimes even be a multimedia content. Assuming we have a given content, the problem then is to transform the given content to another content having different materials, length, language, media form, or publication/broadcasting type. Therefore, very often and for a variety of reasons, we need to represent a content by another content.
Accordingly, it would be desirable to have a method and system that can transform the representation of information from one form, language, and shape to another, by capturing and regenerating the essence and semantics of the given information and representing it in another form having a desired semantic relationship with the given content. For instance, text messages in the form of short messaging service (SMS) texts, emails, Twitter texts, or even long essays and scripts would be more appealing, and sometimes more informative, if they were accompanied by or transformed into a visual or aural message that is semantically related to the given message. Especially for entertainment, education, artistic experimentation, advertisement, and many other desirable applications, it can be quite useful to have a system with a method of converting, for instance, textual compositions to, or accompanying them by, other forms such as compositions of visual, audio, or graphical content, graphical essays, and the like.
In this disclosure a method and system are presented to find or generate a representative content for a given content. The representative content can be of the same or a different type of content media. The method can be used to generate a textual representative content for a given textual content; a visual representative content for a given textual content or a given aural content, and/or vice versa; or an audio representative content for a given visual content or given textual content, and/or vice versa; and so forth.
According to one embodiment of the invention, the representative content is found, composed, or generated from pre-existing or pre-built contents, or from the partitions of pre-existing or pre-built contents.
The problem then lies in finding or selecting an appropriate representative for a given content. The most appropriate representative content, however, is not always easy to find, since many representative contents may exist for a given content, or sometimes no representative can be found that sufficiently satisfies the desired semantic relationship between the given content and the representative one. The most appropriate representative content may be found in different partitions of a collection of contents that may not be of the same form and type as the given content.
The disclosed method in effect transforms or translates contents of different forms, types, and languages into each other in order to produce a desired content as a representative for a given content. Although the disclosed method is applicable to content representation and transformation regardless of the type of content and language, in the exemplified embodiment we use the method in a general instance, that is, to generate multimedia content for a given content. However, since semantics can be best processed through textual representation, we focus on transforming textual information to other types, or converting a multimedia content to another multimedia content by extracting the textual information of the multimedia contents. Hence, in the description we use an equally general exemplary embodiment wherein the given content is a textual content which will be transformed to, or represented by, a multimedia content. In one embodiment, according to this disclosure, this is done automatically for an input content and/or at the request of a user or a client.
The method uses existing or pre-built contents to generate new contents. According to one embodiment of the invention, a plurality of multimedia contents or a set of segments of multimedia contents is obtained, from which the Ontological Subjects of different types, e.g. textual, audio or aural, and visual, are extracted from the said plurality of multimedia contents or their partitions. Ontological Subjects, as used in this disclosure, are in general in accordance with the definitions of the patent application entitled "System And Method For A Unified Semantic Ranking Of Compositions Of Ontological Subjects And The Applications Thereof", filed on Apr. 7, 2010, application Ser. No. 12/755,415 (incorporated herein by reference). However, more specific types of Ontological Subjects (OSs) are given in the definitions section of the detailed description of the current disclosure.
The corresponding Ontological Subjects of different types are then stored and indexed in a computer readable database or storage medium for further processing and usage. From the lists of OSs of the different types, desired types of Participation Matrices having desired orders (denoted by XYPMkl in this disclosure) are built. The XYPMkls show the participation of Ontological Subjects of one type (type X) and a predetermined order (order k) in Ontological Subjects of another type (type Y) and another predetermined order (order l).
For instance, a TVPM12 can be built that shows, for each partition of a movie, what words have been used in the dialogue of the characters in that movie partition, segment, or clip (the T stands for textual and the V stands for visual Ontological Subjects). Effectively, one dimension of the matrix corresponds to the words and the other dimension corresponds to the clips in which the words have appeared or been used. (The clips can, for instance, be denoted by the names of the data files in which the clips are stored.)
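By way of a minimal illustrative sketch (not part of the claimed embodiments), a TVPM12 of the kind described above can be built from clip transcripts as follows; the clip file names and dialogue lines are hypothetical stand-ins for stored multimedia partitions:

```python
# Sketch: build a TVPM12-style participation matrix whose rows are words
# (TOS of order 1) and whose columns are clips (VOS of order 2).

def build_tvpm12(clip_transcripts):
    """Return (word_list, clip_list, matrix) where matrix[i][j] is 1 if
    word i appears in the dialogue of clip j, and 0 otherwise."""
    clips = sorted(clip_transcripts)
    words = sorted({w for text in clip_transcripts.values()
                    for w in text.lower().split()})
    matrix = [[1 if w in clip_transcripts[c].lower().split() else 0
               for c in clips]
              for w in words]
    return words, clips, matrix

# Hypothetical stored clips, referred to by their data-file names.
transcripts = {
    "clip_001.avi": "nice weather today",
    "clip_002.avi": "how are you",
}
words, clips, tvpm = build_tvpm12(transcripts)
```

In practice the matrix would be sparse and very large; the dense list-of-lists form here is only for explanation.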
Depending on the application, different PM matrices can be built to show, for instance, the participation of audio partitions in text partitions, audio partitions in visual partitions, or textual partitions in textual partitions, and so on. Nevertheless, since semantically related partitions of different types can be represented and processed more easily through their textual forms, we mainly focus on participation matrices of the TTPMkl format.
The information of the TTPMkls is used for finding and selecting the most appropriate partitions of one type, language, and form as the representative of one or more Ontological Subjects of another type, language, and form. For instance, using the ranking methods disclosed in the non-provisional patent application Ser. No. 12/755,415, which is incorporated herein by reference, one can score all the partitions of a composition or a set of compositions that contain a specific OS or a group of specific OSs, and select the most semantically related partitions based on their scores or ranks. In fact, when the stored repository of pre-existing multimedia partitions becomes very large, the information of the Participation Matrices and the ranking methods of patent application Ser. No. 12/755,415 can be used to cluster and classify the partitions, and/or be used for searching through countless pre-existing multimedia partitions to find the most semantically related multimedia partitions in response to a query, thereby building an effective multimedia search engine.
Methods are given for calculating the most semantically related partitions, from a set of partitions (e.g. existing partitions of a plurality of contents), to a given partition. In one embodiment, the method comprises building participation matrix/es for a first set of contents, building participation matrix/es for the given content, and using the information of both participation matrices to find the most semantically related partitions from the first set of contents to the partitions of the given content.
Exemplary systems are also given for generating a new multimedia composition from pre-existing or pre-built multimedia partitions for an input composition or content. In one embodiment, we desire to find the pre-existing partitions most semantically related to the partitions of an input composition, for which we want to compose a semantically related multimedia composition. The semantic relatedness is a predefined relationship function. For instance, a semantic relationship can be defined as "similarity", which can be measured by simply counting/calculating the common OSs of the two partitions, or by evaluating/calculating other predefined similarity functions. More specialized or complicated relationships can also be considered, such as a certain range of semantic similarity, a semantically opposite relation, semantic stem similarity, contextual correlations, etc.
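The simplest relationship function mentioned above, counting the common OSs of two partitions, can be sketched as follows; the whitespace tokenization is a simplifying assumption, not a requirement of the method:

```python
# Sketch of a predefined "similarity" relationship function: the number of
# lower-order Ontological Subjects (here, words) shared by two partitions.

def common_os_similarity(partition_a, partition_b):
    """Count the OSs (words) shared by two textual partitions."""
    os_a = set(partition_a.lower().split())
    os_b = set(partition_b.lower().split())
    return len(os_a & os_b)

score = common_os_similarity("Nice weather today", "the weather is nice")
```

Other predefined functions (e.g. normalized overlap, stem similarity, or a contextual correlation) could be substituted for this count without changing the surrounding procedure.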
Now, for instance, when a client or user inputs a message in the form of text or audio, the method provides algorithm/s to select the most appropriate visual or audio partition to represent the text or the audio, for accompanying the message or for use instead of the message. The system then composes a multimedia content by retrieving the pre-stored multimedia partitions from the storage, databases, or filing systems, and assembles a new multimedia content according to the client's input or request, essentially using the existing or pre-built contents.
The system employing the method can further expand the client's input (i.e. the given content) to include more semantics and materials, according to the definition of the services that the system is designed for. In one embodiment, the expansion is equivalent to applying the method more than once. For instance, one can generate a secondary content for the given content and then apply the method further to generate yet another content (e.g. a multimedia content) for the generated secondary content.
Furthermore, the characteristics or attributes of the audio, video, or texts can be modified, before composing the final generated contents, using customary methods of video and signal processing or text processing. For instance, a movie can be transformed to animation using video processing methods (e.g. by edge detection), colors can be modified, voices can be modified by speech processing methods (e.g. synthesizing or modifying voices) or generated by computer, or words and phrases in the text can be replaced with other words and phrases by natural language processing methods (e.g. synonym substitution), etc.
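The text-side modification mentioned above (synonym substitution) can be sketched as follows. The substitution table here is a small hand-made example; an actual system would consult a lexical resource such as the WordNet collections referenced elsewhere in this disclosure:

```python
# Sketch: modifying textual attributes of a partition by synonym substitution.
# The SYNONYMS table is illustrative only.

SYNONYMS = {"nice": "pleasant", "weather": "climate"}

def substitute_synonyms(text, table=SYNONYMS):
    """Replace each word that has an entry in `table`; keep others as-is."""
    return " ".join(table.get(w, w) for w in text.split())

modified = substitute_synonyms("nice weather today")
```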
Moreover, the method and system can readily be used for clustering or classifying contents, and/or for searching and finding the most appropriate videos or multimedia contents, from a collection or repository of videos and multimedia contents, in response to a given content (e.g. a keyword, question, textual content, audio command, speech, another multimedia content, etc.).
The visual or audio partitions can be selected from a special genre, or from specified character/s, players, voices, music, types, and the like. Alternatively, the inventory of partitions of the existing or pre-built multimedia contents can be classified under different databases according to predetermined criteria such as the genre, directors, creators, writers, speakers, the character/s, the voices of the characters, or any other desired criteria. A non-comprehensive list of applications is given in the description for illustration purposes only. Those skilled and knowledgeable in the art can readily employ or adapt the method for a variety of applications that have not been explicitly mentioned throughout the disclosure, without departing from the scope and spirit of the present invention.
a: illustrates conceptually an exemplary embodiment of a multimedia content and shows the process of a method for extracting and storing the Ontological Subjects of different types and orders from a plurality of multimedia segments or compositions and building the participation matrices.
b: illustrates more explicitly the concept of a participation matrix and the process of building one exemplary participation matrix for the multimedia content of
a: illustrates explicitly the building of one desired participation matrix, i.e. TTPM12, for an exemplary input content in the form of an input text.
b: shows the process of using the stored TTPMst12 from the
Information-bearing symbols can be in the form of audio signals, text characters, and visuals such as pictures, still images, animations, or video clips and signals. In this disclosure, the information-bearing symbols are called Ontological Subjects and are defined herein below in the definitions section.
Now we start describing the invention in detail. In this invention it is noticed that many applications can be viewed as finding or generating a piece of content in response to, or as a representative for, another content. The applications may include generating a multimedia content for a given textual script, translating a composition from one language to another, providing a response to a chatter's input to a chatting machine or chat-robot, or simply getting some content generated which is related to an input or a given content.
Furthermore, semantics contain information that can be carried by symbols and OSs, as defined in the definitions section, in the form of texts, data, and signals. Therefore, semantics are carried and transmitted by symbols. In the world of semantics that is comprehensible by humans, semantics are most efficiently transferred by natural language texts. Therefore, if the semantic information of different representations (i.e. video signal/data, audio signal/data, and texts in another language) is transformed to textual types and symbols, the semantic information represented by the different media can also be processed through text. Consequently, using the initial corresponding media representation of the semantics, one becomes able to convert the results of the semantic processing of the texts back again to the desired media representation, e.g. visual, audio, or textual. Accordingly, we may first transform the media representation of the semantics into textual form in a desired language, e.g. English, perform our processing and calculation, and finally represent the resultant composition of semantics in the desired media, language, form, or type. That is the basic idea of the invention.
Accordingly, in this disclosure a method and system are given that can transform the representation of information from one form, shape, or language to another, by capturing and regenerating the essence and semantics of the original information and representing it with another piece of content, having perhaps a different media type, language, or form, according to predefined relationship functions between the given content and the representing content. An interesting application of the method is, of course, to transform a text message to a multimedia clip using pre-built or pre-existing multimedia contents.
In another aspect, a method is given, for instance in its general form, for converting a given multimedia content to another multimedia content, using pre-built or pre-existing multimedia contents or their partitions to compose a new content. The partitions of the composed content and the partitions of the given content have a certain predefined relationship. However, as mentioned, since semantics can be best processed through textual representation, we focus on transforming a given textual information or content to other contents of the same or a different type. The given textual content could itself have been extracted from a multimedia content. In the preferred embodiment, according to this disclosure, the generation of content for a given content is done automatically for an input content and/or at the request of a user or client/s: for example, converting textual compositions to visual/audio compositions which are semantically similar to, or express a predefined semantic relationship with, the given content. The given content and the composed content can belong to different languages. For example, the language of the given content could be English and the language of the composed content Spanish.
The method uses existing or pre-built contents, single-media or multimedia, to generate new representative contents for a given content. Although the method is applicable to the transformation of all types of contents and compositions into each other (even across different languages), in the detailed description the method is explained by way of a general exemplified case of transforming a textual content to a multimedia content. That is because a multimedia content can also be semantically represented by a textual content. Therefore, for ease of explanation, we assume the given content is textual. The given content therefore can be any textual composition, i.e. textual OSs, such as keywords, a date, a subject matter, a sentence, a paragraph, a short script, a short or long essay, or a document in any language. The given textual content furthermore could have been generated for another content in general; e.g. an initial given content is a sentence, and the secondary given content could be a number of sentences or statements semantically related to the first given content (the given sentence), and therefore our assumed given content could be a representative content itself.
Now the invention is disclosed in detail in reference to the accompanying figures and exemplary cases and embodiments in the following subsections.
According to an exemplary embodiment of the invention, a plurality of multimedia contents or a set of segments of multimedia contents is obtained, from which the Ontological Subjects of different types, e.g. textual, audio or aural, and visual, are extracted from partitions of the said set of multimedia contents.
Referring to
Therefore, if one can have semantically similar or matched partitions of OSs of different natures, types, or languages, then one can transform a composition of one type of OSs, e.g. a textual composition, to another form or type of composition, e.g. visual or audio. Accordingly, in this disclosure, in order to obtain semantically related TOSs, AOSs, and VOSs, we should have a repository of OSs of different types. To build the repository, one convenient exemplary way is to start with the available or premade multimedia contents and separate their Ontological Subjects of different types with the desired orders. Alternatively, it is possible to have a pre-built database or filing system of audio, visual, and textual partitions of related, similar, or identical semantic content.
Referring to
Referring to
Also shown in the
Therefore, we can have three kinds of representations here, all referring basically to the same or similar semantics, semantic partitions, or frames. As mentioned, the desirable semantic partitions here are usually the sentences that were pronounced in the clips, or that would be pronounced in the composed clip, as we will discuss later in this disclosure.
The partitions consequently can be indexed and kept, temporarily or permanently, in databases or file systems for easy retrieval and later use or processing.
Also shown in
In the general case, the number of pre-existing or pre-built multimedia contents and partitions could be very large and diverse, so the number of Ontological Subjects of any type and any order becomes large, thereby making a large repository and inventory of most of the practical and routine visual scenes and the associated texts, vocal conversations, etc.
Referring to
The general stored participation matrix is denoted by TXPMst12 and takes the form of a matrix whose rows correspond to the first-order textual OSs (e.g. words or characters) and whose columns correspond to the second-order partitions of type X, i.e. TXPMst12=[pmij], where pmij is nonzero if the i-th TOS1 participates in the j-th partition, and zero otherwise.
The index “st” stands for “stored” and show that this matrix is built for multimedia partitions of pre-existed multimedia contents and is stored (temporarily or more permanently). However in implementation of the method the PM can be stored or shown by other forms and instruments such as lists, lists in lists, dictionaries, cell arrays and so on which basically contain the information of participation of one OS in another OS. The current notations and formulation is for ease of explanation and calculation and should not be interpreted as the only way of implementing the method's concept in actual implementations.
The entries of the PMs are nonzero if the word or character is used in, or participates in, the partitions of the text, the audio, or the video, and are zero otherwise, as indicated in the
The matrix TXPMst12 carries the most important and useful information related to the multimedia partitions. It can be used to summarize a large multimedia content, to cluster and classify the partitions, and/or to search through a large number of pre-existing multimedia partitions and find the most semantically related multimedia partitions in response to a query, thereby building an effective multimedia search engine. The applications of participation matrices have been explained in the patent application Ser. No. 12/755,415, filed on 7 Apr. 2010, and can be used readily here. However, in this embodiment we are more focused on composing new multimedia compositions using pre-existing multimedia contents, because this is the most general use of PMs, besides the ability to effectively search and sift through partitions in response to a given content or query.
Referring to
However, various other Participation Matrices, with different orders, could be built, such as TVPM22 or AVPM22. For example, TVPM22 shows the participation information of sentences in their respective visual clips, in which a visual clip would usually be a data file object, rather than a text string, that can be referred to by its file name. In practice, only the textual representation of the audio and visual partitions is enough for performing the calculations and processing the semantic information. Moreover, texts of different languages can be used to build a PM. Assuming there is a collection of textual contents from one language, one can partition the text of the first language, find the semantically related (e.g. similar) textual partitions from the second language, and build a TTPMlm wherein the first T belongs to one language and the second T belongs to a different language.
The purpose of making lists of all types of OSs and building the PMs is to have them stored as an inventory of building blocks of a multimedia composition that, on demand, can be fetched or retrieved and used in a new multimedia composition. Nevertheless, these software objects can also be built on demand and in real time. So, from a large number of multimedia collections, we build lists of TOS1 and TOS2 extracted from the partitions of those collections and correspond them to the respective visual and audio OSs. Consequently, the participation matrices are built, which essentially carry the information about the usage of each lower-order OS, especially the TOS1, in higher-order OSs. These stored data and information serve both retrieval facilitation and numerical calculation, such as calculating the similarity measures between OSs, as will follow later in this disclosure.
In actual and practical uses, it is desirable to have stored a large inventory list of visual partitions, along with large lists of their textual and audio partitions, so that the databases can accommodate any, or at least most, hypothetical retrieval requests from a client, user, or software engine demanding a predefined relationship with their prospective representative content.
In this embodiment we construct a library of video clips of real videos, animations, comic scripts, pictures, etc. that convey a semantic unit (or segment) or a number of semantic units that might be related. Semantic units here are meant as short, semantically meaningful units and/or combinations of semantics that can describe an event, an object, or an abstract idea. For instance, an English word is a semantic unit referring to an entity, a symbol, or a state of an entity and the like, whereas here a sentence that is composed of a subject, a verb, and an object can also be considered a semantic unit. Therefore, semantic units can use combiners or connectors to form larger semantic units.
Referring to
One simple way to get the text, and to extract and partition the OSs of different orders and types, is to convert the audio to text, or to get the text from the subtitles of the movie and clips when one is included in the multimedia. Nowadays, most DVDs also contain the text of the conversations and almost the entire script of the movie. Moreover, there are numerous, freely available repositories of movies and their texts and scripts, and many user-generated video clips are also available. However, in reality, legal issues related to copyrighted materials and contents must be taken care of by certain arrangements, which are not the main point of this description.
Referring to
In one special and important case input content in the
Accordingly in
Now, using the information of TTPMin12 and TTPMst12, we can find the best semantically matched partitions (from the stored pre-existing multimedia partitions) for the input partitions, in order to assemble a multimedia composition for the input text or audio. In particular, when one translates or substitutes one or more of the TOS1 in the LTOSin1 or LTOSst1, she/he can obtain a desired relationship, e.g. an antonym relation, between the partitions of the input and the stored partitions.
a shows a simple exemplary case demonstrating the method of finding the best matched partitions of the multimedia partitions. In this case, assume the text of the input content is "How are you? Nice weather today!", which contains two simple partitions or sentences. Again, here we build a participation matrix for this input textual composition and calculate the similarity matrix, given that the desired predefined relationship function is the semantic similarity between the partitions of the given text and the stored partitions.
However in
SMin,st2|1=(TTPMin12)′*TTPMst12. (3)
In Eq. (3), the "′" denotes the matrix transposition operation, and SMin,st2|1 is the similarity matrix showing the pair-wise similarity of each of the partitions of the input composition with each of the stored partitions.
For the above exemplary cases of the stored partitions and the input partitions, the resulting similarity matrix is:
which shows that the first partition of the input content is most similar to the second stored partition, and the second partition of the input content is most similar to the third stored partition (where the similarity coefficient, i.e. the entry of the similarity matrix, in each row is maximum). Hence, the second and third partitions of the stored repository of multimedia partitions will be the best suited representatives of the input text, and the Generated Multi Media Vector (denoted GMMVout, showing the sequence of output partitions), as the output of the multimedia generator, will be:
GMMVout=[P2st P3st].
The GMMVout is therefore used for assembling and playing the generated multimedia content corresponding to the input text, from the stored inventory of multimedia partitions. In general, when the number of stored partitions is large enough, more than one representative content with high similarity to one or more of the input partitions could be found. Therefore, there is also the possibility of choosing more than one representative partition from the stored partitions for one of the input partitions.
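The whole selection step above, Eq. (3) followed by taking the row-wise maximum to form GMMVout, can be sketched end to end with plain Python lists standing in for the participation matrices. The stored partitions below are hypothetical textual sides of pre-existing multimedia partitions, and a shared vocabulary (row dimension) is assumed for both PMs:

```python
# Sketch of Eq. (3): SM = (TTPMin12)' * TTPMst12, then GMMVout selection.
import string

def tokens(text):
    """Lowercase, split, and strip punctuation (a simplifying assumption)."""
    return [w.strip(string.punctuation) for w in text.lower().split()]

def build_pm(partitions, vocabulary):
    """TTPM12-style matrix: rows = words, columns = partitions."""
    return [[1 if w in tokens(p) else 0 for p in partitions]
            for w in vocabulary]

def similarity_matrix(pm_in, pm_st):
    """Pair-wise common-word counts: the (i, j) entry compares input
    partition i with stored partition j."""
    return [[sum(pm_in[k][i] * pm_st[k][j] for k in range(len(pm_in)))
             for j in range(len(pm_st[0]))]
            for i in range(len(pm_in[0]))]

stored = ["hello there", "how are you", "nice weather today", "good bye"]
given = ["How are you?", "Nice weather today!"]
vocab = sorted({w for p in stored + given for w in tokens(p)})
sm = similarity_matrix(build_pm(given, vocab), build_pm(stored, vocab))
# GMMVout: for each input partition, the index of the best-matched stored one.
gmmv = [row.index(max(row)) for row in sm]
```

Here gmmv selects the second and third stored partitions (indices 1 and 2), matching the P2st/P3st outcome described above; the file-retrieval and assembly steps are omitted.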
It should be mentioned that P2st and P3st can be either the audio parts or the visual parts of different pre-existing multimedia contents, or both can belong to the same pre-existing multimedia content, as long as they correspond to the same textual partitions (for the case of the semantic similarity relation). In other words, the process can be done independently for text-to-audio conversion on demand, or text-to-visual conversion on demand, and, if desired, the audio and visual parts of different origins can be combined to generate new multimedia compositions. In general, the audio and the visual partitions can be selected from independent pre-existing multimedia contents or different sets of multimedia contents.
Furthermore, the attributes of the representative partitions, or of the generated content in general, can be modified before or after assembling the generated contents. For instance, characteristics, attributes, and semantics of the generated content can be altered from their pre-existing form: voices can be filtered by signal equalizers; videos and visuals can be distorted or changed; visual colors can be modified; or even the semantics can be altered or substituted with other symbols or Ontological Subjects.
There could also be other types of relationships, such as an antonym or semantically opposite type of relationship. In this case, one can substitute the words or the partitions of the input content with their antonyms by consulting a word dictionary, taxonomy, ontology, etc. For implementation, one can translate or substitute one or more of the initial TOS1 in the LTOSin1 or LTOSst1 with one or more TOS1 that have a predetermined or predefined relationship with the initial TOS1, and thereby obtain a desired relationship, e.g. an antonym relation, between the partitions of the input and the stored partitions, using the calculations above.
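The antonym substitution step described above can be sketched as follows; the antonym table is a small hand-made example, whereas an actual system would consult a dictionary, taxonomy, or ontology:

```python
# Sketch: rewrite the input's TOS1 with antonyms before running the same
# similarity calculation, yielding a semantically opposite relationship.
# The ANTONYMS table is illustrative only.

ANTONYMS = {"nice": "awful", "hot": "cold", "good": "bad"}

def antonym_form(text, table=ANTONYMS):
    """Substitute each word that has an antonym entry; keep others as-is."""
    return " ".join(table.get(w, w) for w in text.lower().split())

query = antonym_form("nice hot day")
```

The transformed text is then matched against the stored partitions exactly as in the similarity case, so the same Eq. (3) machinery is reused unchanged.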
Another relationship function can be defined as a measure of semantic context similarity, by replacing some of the words or partitions of the input content, and/or the words or partitions of the stored content, with their stems and senses, synonyms, similar words, associated words, or any other words and partitions deemed desirable. Semantic context similarity in particular might be useful in order to increase the chance of finding matches and representatives for the given content. One way to find semantic context similarity between the partitions is to replace groups of words and phrases having similar meanings or stems with one common word or phrase, wherein the replacement for a similar group of words is derived from a dictionary or an ontology such as the WordNet collections, as explained in the patent application Ser. No. 12/755,415.
Another way to obtain a predefined contextual relationship, while still using Eq. (3), between the partitions of the input content and the generated or found partitions from the stored repository of partitions is to replace words, i.e. the initial TOS1 in the LTOSin1 or LTOSst1, with one or more TOS1 that are semantic associates of the initial TOS1, having a predetermined association strength to each other. The contextual similarity or contextual relationship case is interesting in practice for finding or generating short or long answers in response to a chatter's input or a questionnaire input, in which the response and the input are somehow related but are not mirrors of each other, so that there could be, for instance, a meaningful conversation between a chatter and a machine. The similarity matrix concept can therefore be used effectively in the implementation and calculation to find one or more stored partitions having a desired relationship with one or more of the partitions of the input composition or content. For instance, the similarity matrix, SM, can even be used to find semantically opposite or non-similar partitions.
Going back to the
It is also noticed that the lists of words and characters, i.e. the rows of both PMs, can be either chopped or extended so that both matrices have the same number of participating ontological subjects. Those competent in the art can realize that there are various embodiments for making the building of the similarity matrix possible, by either extending or chopping one or both lists of the words and characters from the input text and/or from the stored list, so that the multiplication of Eq. (3) is possible. Depending on the definition of the similarity measure, the appropriate list of OSs, e.g. LTOS1, or the row dimension of both matrices, can be determined. For instance, it is possible to consider in the calculation only those stored partitions that contain at least one word in common with one of the partitions of the input content, by, for example, combing out the other corresponding rows of the TTPMst12.
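One way to realize the extension of both word lists to a common row dimension, as described above, can be sketched as follows; the binary TTPM representation as lists of rows and the function name are illustrative assumptions only:

```python
from typing import List

def build_participation_matrices(input_parts: List[List[str]],
                                 stored_parts: List[List[str]]):
    """Build binary TTPM-style matrices over a shared word list.

    Rows are words (OSs of order 1), columns are partitions (OSs of a
    higher order); entry (i, j) is 1 if word i appears in partition j.
    Taking the union of both vocabularies is one way to give the two
    matrices the same row dimension so that the matrix product of
    Eq. (3) is defined.
    """
    vocab = sorted({w for p in input_parts + stored_parts for w in p})
    index = {w: i for i, w in enumerate(vocab)}

    def matrix(parts):
        m = [[0] * len(parts) for _ in vocab]
        for j, part in enumerate(parts):
            for w in set(part):
                m[index[w]][j] = 1
        return m

    return vocab, matrix(input_parts), matrix(stored_parts)
```

Chopping instead of extending would correspond to intersecting, rather than uniting, the two vocabularies before building the matrices.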
In general, other forms of similarity measures can be defined as:

$SM^{in,st}_{ij}(l|k)=f\left(P^{in,kl}_{i},\,P^{st,kl}_{j}\right)\qquad(4)$

where $SM^{in,st}(l|k)$ is the similarity matrix of OSs of order l given k, derived based on the participations of OSs of order k in the OSs of order l of the input and the stored contents; $P^{in,kl}_{i}$ and $P^{st,kl}_{j}$ are the ith and jth columns of $TTPM^{in,kl}$ and $TTPM^{st,kl}$, respectively, and correspond to the partitions of the input text and the stored partitions. Also, f is a predefined function or operator of the two vectors $P^{in,kl}_{i}$ and $P^{st,kl}_{j}$. The function f yields the desired similarity measure and is usually proportional to the inner product or scalar multiplication of the two vectors.
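A minimal sketch of Eq. (4), assuming binary participation matrices represented as lists of rows and the default f taken as the inner product (the function name is illustrative), could read:

```python
def similarity_matrix(ttpm_in, ttpm_st, f=None):
    """Compute SM[i][j] = f(P_i_in, P_j_st), per Eq. (4).

    The columns of the two participation matrices are the partition
    vectors; the default f is the inner product, which for binary
    vectors counts the lower-order OSs the two partitions share."""
    if f is None:
        f = lambda a, b: sum(x * y for x, y in zip(a, b))
    cols_in = list(zip(*ttpm_in))   # columns = input partition vectors
    cols_st = list(zip(*ttpm_st))   # columns = stored partition vectors
    return [[f(pi, pj) for pj in cols_st] for pi in cols_in]
```

Any other f, such as the cosine or common-over-unique measures discussed next, can be passed in place of the default.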
In one preferred embodiment the similarity of partitions can be given by:

$SM^{in,st}_{ij}(l|k)=\dfrac{P^{in,kl}_{i}\cdot P^{st,kl}_{j}}{\lVert P^{in,kl}_{i}\rVert\,\lVert P^{st,kl}_{j}\rVert}\qquad(5)$

which is the cosine similarity, i.e. correlation, measure and in fact shows the similarity between the partitions of the input composition and the stored partitions in the system. This similarity measure is between zero and one.
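The cosine measure just described, applied to a pair of binary participation (column) vectors, may be sketched as follows (illustrative only):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two binary participation vectors; returns a
    value between zero and one."""
    dot = sum(x * y for x, y in zip(a, b))
    # For binary vectors the squared norm equals the number of ones.
    na, nb = sqrt(sum(a)), sqrt(sum(b))
    return dot / (na * nb) if na and nb else 0.0
```

Passing this function as f into the general similarity computation of Eq. (4) yields the cosine similarity matrix.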
Alternatively, in many cases the similarity measure is better justified if one uses the following formula:

$SM^{in,st}_{ij}(l|k)=\dfrac{P^{in,kl}_{i}\cdot P^{st,kl}_{j}}{\left|P^{in,kl}_{i}\cup P^{st,kl}_{j}\right|}\qquad(6)$

where $P^{in,kl}_{i}\cdot P^{st,kl}_{j}$ is the number of common OSs of order k between $P^{in,kl}_{i}$, i.e. $OS^{in,l}_{i}$, and $P^{st,kl}_{j}$, i.e. $OS^{st,l}_{j}$ (the inner product of the binary vectors $P^{in,kl}_{i}$ and $P^{st,kl}_{j}$), and $\left|P^{in,kl}_{i}\cup P^{st,kl}_{j}\right|$ is the total number of unique OSs of order k in the combined $P^{in,kl}_{i}$ and $P^{st,kl}_{j}$ (i.e. the summation of the logical OR of the binary vectors $P^{in,kl}_{i}$ and $P^{st,kl}_{j}$).
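This alternative measure, the ratio of common to total unique OSs (i.e. a Jaccard-type measure on binary vectors), can be sketched as:

```python
def jaccard(a, b):
    """Ratio of common OSs to total unique OSs for two binary
    participation vectors (intersection over union)."""
    inter = sum(x & y for x, y in zip(a, b))   # logical AND, summed
    union = sum(x | y for x, y in zip(a, b))   # logical OR, summed
    return inter / union if union else 0.0
```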
Having obtained the similarity measures between the input partitions and the stored partitions, for each input partition the one or more most semantically similar stored partitions of the desired type can be selected to be used as the representatives of that particular input partition. Usually the sequence of the representative partitions would be the same as the sequence of the partitions in the input textual or audio composition.
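The selection step just described, picking for each input partition the single most similar stored partition while preserving the input sequence, might be sketched as follows (names illustrative):

```python
def select_representatives(sm, stored_ids):
    """Given the similarity matrix sm (rows = input partitions,
    columns = stored partitions) and identifiers of the stored
    partitions, return one representative per input partition,
    in the order of the input composition."""
    return [stored_ids[max(range(len(row)), key=row.__getitem__)]
            for row in sm]
```

Selecting the top-n columns per row, rather than the single maximum, would yield several candidate representatives per input partition.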
One may build an inventory of content partitions, e.g. video clips, in house, to have a pre-built repository of multimedia clips, visuals, and audio clips corresponding to a plurality of textual contents, such as a list of sentences, phrases, statements, essays, etc. In this case she/he normally has an inventory of pre-built video shots, animations, avatars, 3D animations, etc., with their corresponding textual dialogue or description of the scene.
In practice the repository of stored partitions of the set of original (pre-built) or pre-existing multimedia contents can include several tens or hundreds of millions of partitions, semantically covering almost all partitions of possible input compositions to the system. Moreover, there will be many choices of classes of visuals, audio, voices, and languages that can represent the input composition or content. Those skilled in the art can devise and introduce efficient numerical methods and alternative formulations to efficiently calculate the similarities or any other desired semantic relationship values, finding and retrieving the most desired and appropriate stored partitions to be used in the generated multimedia, without departing from the scope and spirit of the current disclosure.
More importantly, though the illustrating example focused on finding visual partitions for the input text or audio, it is possible, by the same concept, to find the most semantically similar audio partitions for an input text, an input aural content, or in general any input multimedia content in any language, which need not be the same as the language of the stored contents. Therefore, in the generated multimedia it becomes possible, as an option, for a user to choose the voice, language, and visuals to represent an input composition or content. The voice, language, and visual partitions in the generated multimedia need not belong to the same pre-existing multimedia content. Therefore the method can also be used to generate cross-language contents or multimedia contents.
Referring to
The system consists of the hardware and software programs needed to store the databases, obtain and process pre-existing or pre-built contents, perform the algorithms, process the requests of clients, and receive user input content from users' computer devices across a communication network. Customarily the system will include processing units of a variety of types and physical mechanisms, computer servers, and software packages for serving the client at the frontend or working on the client request at the backend engine to fulfill the client request (e.g. web servers, file servers, application servers, etc.).
Users' computer devices can include laptops, desktop computers, handheld computer devices, mobile phones, any terminal with a point of connection to the internet, workstation computers, gaming machines, and generally any electronic device capable of sending, receiving, and processing data through communication and computer networks. Said electronic devices or computers can be connected to networks by network interfaces via wires, cables, fiber optics, or wireless networks of current and/or future generations and technologies such as Wi-Fi, WiMAX, 4G networks, etc.
In describing an exemplary service of the system in
Continuing therefore with describing
The system can also contain the processing units, software, and storage apparatuses needed to perform the algorithm on the partitions of pre-existing or original pre-built multimedia contents from different sources and store them in its databases for other parts or units of the system to use. The stored partitions can further be classified and stored in the database based on their features, such as genre, characters, time of production, subjects, serial name, characters' voices, type, required bandwidth, or any other conceivable feature that can cluster or classify the pre-existing media contents.
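The feature-based classification of stored partitions can support retrieval filters such as the following sketch, where partitions are assumed to carry metadata dictionaries and the field names (genre, voice, etc.) are illustrative only:

```python
def filter_partitions(partitions, **criteria):
    """Return only those stored partitions whose metadata matches every
    given criterion, e.g. genre='animation', voice='character_x', so the
    similarity search can be restricted to a user-chosen class."""
    return [p for p in partitions
            if all(p.get(k) == v for k, v in criteria.items())]
```

Filtering the repository before computing the similarity matrix also reduces the number of stored partitions that must be compared against the input.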
A user can choose a real or animated character, with its own voice or the voice of another preferred character, even the user's own voice, to represent the user in the generated multimedia. Moreover, a repository of movie collections can be turned into animated versions, which many people might find more entertaining, and from which users can choose. Therefore, using the method and the system disclosed here, with the combination of available voices and visuals, one has an endless opportunity to compose multimedia contents as representations of even a single input composition.
The method, system, and service can be applied to perform many useful services, such as real-time conversion of conference calls to multimedia conferences, educational and presentation purposes, entertainment, fun, etc.
One of the possible applications of such a system and method of multimedia generation is for aspiring or casual content creators who want to quickly prototype their textual or audio content as a multimedia content. Moreover, they can use the system and method to capitalize on their content-creation talent.
Optionally the content creators and the service and system provider can capitalize on their creation and investment as depicted in
Referring to
As a practical and general example of using such a system in its more complicated form of service and operation, consider the animated television series “The Simpsons” or the sitcom “Seinfeld” that was a hit during the 90's. If a talented creator composes single or episodic scripts and wants to use the characters of the Seinfeld show, or the voices of The Simpsons' characters for that matter, then she/he would probably find a significant number of viewers and audiences for her compositions (given that the input script is semantically well composed) by borrowing the characters of the Seinfeld show. Accordingly, she/he would like, and should be able, to capitalize on her/his talent. One way is to allow the service provider, or the provider of the content generator, to insert advertisement materials into the content or multimedia composed from the user's input composition. Of course the Seinfeld show, as well as many other in-demand media contents, is copyrighted. However, there are possibilities for using copyrighted materials through various financial instruments such as licensing by the service provider, paying royalties, revenue sharing with the copyright holder, etc.
Though the above hypothetical example is targeted at entertainment applications, the disclosed methods and systems nevertheless have many valuable applications in education, journalism, document translation, rapid sharing of ideas, more effective communication, etc.
Applications:
A few exemplary applications of the methods and systems disclosed here are listed hereunder. They are intended for further emphasis and illustration only, and are meant neither as an exhaustive application list nor as restrictive technical boundaries to the teachings of the invention.
1) Representing text messages in the form of short messaging service (SMS) messages, emails, twitter texts, or even long essays and scripts, with other contents having a different type, language, media, and/or length. In particular, such texts would be more appealing and sometimes more informative if they were accompanied by, or transformed into, a visual or audio message conveying the same message as the text. For instance, an SMS can be converted to a multimedia message essentially conveying the same message as the given SMS but in a more entertaining and informative way.
2) Assisting in finding and generating content for education, entertainment, artistic experimentation, and many other desirable applications, for which it can be quite useful to have a system with a method of converting a given content
3) The method can also be used for tagging and/or translating multimedia contents and/or their partitions, so as to assist in efficiently searching, ranking, and classifying collections of multimedia contents using the ranking methods of Ontological Subjects of different orders, as disclosed in the patent application Ser. No. 12/755,415.
4) Converting and representing a summary or short note as a more comprehensive essay or multimedia content, or scripts as multimedia contents, conveying the same essence and semantics.
5) Using the methods, a chat-robot or chatting machine can produce relevant responses to the input of a chatter so as to make the conversation between the user and the machine an intelligent one. A system can be envisioned that converses with a user, in which the user writes or says something and the system, using the disclosed method, responds back in some form or type of media content that has a certain semantic relationship with the user's input. Such a system can be used as a Q&A service for users and clients, wherein the system provides a variety of contents for the user in response to his input (question).
6) In networks supporting mobile networking and communication, one can use the method and system to provide content search and content generation ability and service to mobile users through speech and voice recognition or mobile text messaging.
7) The method and system can effectively be used for content searches, e.g. multimedia searches, scoring contents relative to each other, and selections.
8) The method and the system can also be used for ranking the contents in a set of contents, especially ranking multimedia contents based on their substance within a set of multimedia contents. The system and method can also be used in the same way for clustering and classification of multimedia contents, by calculating the relevancy, i.e. the values of the predefined relationship functions, of a plurality of contents to a given content, and then grouping the relevant contents that have passed a predetermined relevancy threshold value into a group, database, or file corresponding to the given content.
9) Assume a reporter or an amateur content creator wants to have relevant highlights of a speech or speeches (perhaps from a famous speaker or personality) from some collection of audio or visual archives of the speaker, by providing a content such as a keyword, statement, or essay. Furthermore, a simple statement can be turned into a content that also includes more known details in addition to the given content.
10) Small business owners can generate multimedia advertisement clips. Content creators and advertisers can find related advertisements or interesting contents, e.g. visuals, to include with their content, and on the other hand advertisers or agents can find the most suitable content in which to put their ads. A creative writer (or an application developer) can transform her/his content into a multimedia content and insert some advertisement into the generated content.
11) Visualizing a textual content into visually equivalent semantics using pre-existing or pre-built visuals. Furthermore, the textual content can be enriched with more substance. For instance, a textual statement stating a particular fact about an entity can be visualized while adding further information that is known about that entity. In other words, generating a visual essay with a desired length for a given subject matter or a given content.
12) Translating contents from one language to another. If the given content is in one language and the collection of pre-existing contents is in another language, one can generate participating matrices in the same manner as they are generated for a variety of types, e.g. TVPMs; this time there is another type, attribute, or label for the PMs, which is the language of the content. For instance, one can translate a given text from language X to a representative text in language Y by making a TXTYPM using the partitions or OSs of both languages. For this application, the partitions of a plurality of pre-existing contents, e.g. textual ones, in language Y are translated, using human operators for example, into language X, wherein there is a one-to-one mapping between the partitions of the pre-existing contents of language Y and the corresponding translated partitions in language X. Alternatively, one can have a dictionary of translated pairs of partitions from two different languages, wherein the keys of the dictionary are partitions in language X and the values of the dictionary are the semantically equivalent partitions in language Y. The method can then be applied to transform a given content in language X into representative contents in language Y. In this case one uses the collective contents corresponding to the keys of the dictionary in language X to build one or more TXTXPMs, performs the method using the information of said matrix, TXTXPM, finds the representative contents from the stored partitions of contents in language X, i.e. the keys of said dictionary, and consequently finds the equivalent representative contents in language Y. Once there is a large enough TXTYPM, or its equivalent TXTXPM, i.e. PMs of large dimensions, the translation can be done effectively and efficiently.
13) Comparing videos and audio in the form of electrical signals is very complicated and processing intensive. However, once their semantic representations are transformed into textual contents, or tagged by textual partitions, then it becomes easier to:
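The clustering-by-relevancy application mentioned in the list above (grouping contents whose relationship values pass a predetermined threshold) can be sketched as follows; the function name is illustrative:

```python
def cluster_by_relevancy(scores, contents, threshold):
    """Split contents into those whose relationship value to the given
    content passes the predetermined threshold, and the remainder."""
    passed = [c for s, c in zip(scores, contents) if s >= threshold]
    rest = [c for s, c in zip(scores, contents) if s < threshold]
    return passed, rest
```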
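The dictionary-of-paired-partitions approach to cross-language generation described in the list above might be sketched as follows, assuming a toy paired dictionary and any matcher built on the similarity machinery described earlier (all names are illustrative):

```python
# Hypothetical paired-partition dictionary: keys are partitions in
# language X, values are the semantically equivalent partitions in
# language Y.
PAIR_DICT = {
    "good morning": "buenos dias",
    "thank you very much": "muchas gracias",
}

def translate_via_representatives(input_partitions, find_representative):
    """For each input partition in language X, find its best
    representative among the dictionary keys (language X), then emit
    the paired language-Y partition.  `find_representative` is any
    matcher, e.g. one based on the similarity matrix of Eq. (4)."""
    keys = list(PAIR_DICT)
    return [PAIR_DICT[find_representative(p, keys)] for p in input_partitions]
```

A simple word-overlap matcher suffices for illustration; in practice the cosine or Jaccard measures over participation matrices would serve as `find_representative`.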
The list above is not comprehensive and mentions only a number of possible applications in which the methods and the related systems may be employed by users, service providers, and software developers. Those skilled in the art can use the teachings of the invention in numerous other applications without departing from the scope and spirit of the invention.
In summary, it is noticed that for most subject matters there is a great deal of content in the form of texts, audio, video, graphics, pictures, etc., in many languages and cultures. It is usually very hard to compose a totally new content, or to create a content that is not at least semantically related to pre-existing contents, without using the existing contents or their parts in one way or another. However, the content and the composition made of combined partitions of a collection of contents can be different from any of the pre-existing contents. The composition's script can make a great differentiation between contents. Therefore, combining different parts of already existing contents into a new combination can yield a new content and composition that can be very valuable, especially if the content media is also modified while keeping the essential semantics of a given content or script.
Accordingly, the invention provides methods and systems for generating representative contents for a given content in various forms, types, languages, and media. The method uses pre-built and pre-existing content partitions in a new and modified combination to yield a new content which has a predefined relationship with the given content. The methods are therefore instrumental in creating and generating new contents which can be accompanied by other media contents. The methods and systems can assist average creators and contributors to regenerate, transform, and produce content of high value, substance, and attraction, entertaining and pleasing for consumers of the content.
Those familiar with the art can yet envision and use the methods and systems for many other applications. It is understood that the preferred or exemplary embodiments and examples described herein are given to illustrate the principles of the invention and should not be construed as limiting its scope. Various modifications to the specific embodiments could be introduced by those skilled in the art without departing from the scope and spirit of the invention as set forth in the following claims.
This application claims priority from U.S. provisional patent application No. 61/253,511 filed on Oct. 21, 2009, entitled “System and Method of Multimedia Generation” which is incorporated herein by reference.