The present invention relates generally to animation based encoding, decoding and playback of a video content, and, particularly but not exclusively, to a method and system for animation based encoding, decoding and playback of a video content in a client-server architecture.
Digital video communication is a rapidly developing field, especially with the progress made in video coding techniques. This progress has led to a large number of video applications, such as High-Definition Television (HDTV), videoconferencing and real-time video transmission over multimedia networks. With the advent of multimedia computing, the demand for such videos has increased; however, storing and manipulating them in raw form is very expensive, significantly increases the transmission time and makes storage costly. Moreover, a video file stored as a simple digital chunk carries very little information that a machine can understand. Further, existing video processing algorithms have no maintained standard defining which algorithm to use and when. In addition, contemporary video search engines are mostly based on manually fed metadata, which leads to a very limited search space.
For example, Chinese Patent Application CN106210612A discloses a video coding method and device, and a video decoding method and device. The video coding device comprises a video collection unit for collecting video images; a processing unit for carrying out compression coding on background images in the video images, thereby obtaining video compression data, and for carrying out structuring on foreground moving targets in the video images, thereby obtaining foreground target metadata; and a data transmission unit for transmitting the video compression data and the foreground target metadata, wherein the foreground target metadata is the data in which video structured semantic information is stored. This application provides a method to compress a video in which the video details are obtained in the form of objects and background, and the action is obtained with the timestamp and location details.
Another United States Patent Application, US20100156911A1, discloses a method wherein a request may be received to trigger an animation action in response to reaching a bookmark during playback of a media object. In response to the request, data is stored defining a new animation timeline configured to perform the animation action when playback of the media object reaches the bookmark. When the media object is played back, a determination is made as to whether the bookmark has been encountered. If the bookmark is encountered, the new animation timeline is started, thereby triggering the specified animation action. An animation action may also be added to an animation timeline that triggers a media object action at a location within a media object. When the animation action is encountered during playback of the animation timeline, the specified media object action is performed on the associated media object. This application discloses that the animation event is triggered upon reaching a bookmark or a point of interest.
Another European Patent Application, EP1452037B1, discloses a video coding and decoding method, wherein a picture is first divided into sub-pictures corresponding to one or more subjectively important picture regions and to a background region sub-picture, which remains after the other sub-pictures are removed from the picture. The sub-pictures are formed to conform to predetermined allowable groups of video coding macroblocks (MBs). The allowable groups of MBs can be, for example, of rectangular shape. The picture is then divided into slices so that each sub-picture is encoded independently of other sub-pictures, except for the background region sub-picture, which may be coded using other sub-pictures. The slices of the background sub-picture are formed in scan order, skipping over MBs that belong to another sub-picture. The background sub-picture is decoded only if the positions and sizes of all other sub-pictures can be reconstructed on decoding the picture.
Another European Patent Application, EP1492351A1, discloses true-colour images that are transmitted in ITV systems by disassembling an image frame into background and foreground image elements, and providing the background and foreground image elements that have changed with respect to the background and foreground image elements of a preceding image frame to a data carousel generator and/or a data server. These true-colour images are received in ITV systems by receiving the background and foreground image elements that have changed with respect to the previously received background and foreground image elements of a preceding image frame from a data carousel decoder and/or a data server, and assembling an image frame from the received background and foreground image elements.
In view of the above deficiencies of the conventional approaches, there is a need for a technical solution that ameliorates one or more of said deficiencies, or at least changes the way a video is stored so as to make it more understandable to a machine while reducing the video size and the transmission bandwidth. Hence, there is a need for a video compression technique that helps in reducing the number of bits required to represent digital video data while maintaining an acceptable video quality.
This summary is provided to introduce concepts related to a method and system for animation based encoding, decoding and playback of a video content in a client-server architecture. The invention, more particularly, relates to animating actions on the video content during playback after decoding the encoded video content, wherein a video compression, decompression and playback technique is used to save bandwidth and storage for the video content. This summary is neither intended to identify essential features of the present invention nor is it intended for use in determining or limiting the scope of the present invention.
For example, various embodiments herein may include one or more methods and systems for animation based encoding, decoding and playback of a video content in a client-server architecture. In one of the implementations, the method includes processing the video content to divide the video content into a plurality of parts based on one or more categories of instructions. Further, the method includes detecting one or more object frames and a base frame from the plurality of parts of the video based on one or more related parameters. The one or more related parameters include the physical and behavioural nature of the relevant object, the action performed by the relevant object, the speed, angle and orientation of the relevant object, the time and location of the plurality of activities and the like. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, the method includes detecting a plurality of activities in the object frame and storing the object frame, the base frame, the plurality of activities and the related parameters in a second database. The method further includes identifying and mapping a plurality of API's corresponding to the plurality of activities based on the related parameters. Further, a request for playback of the video content is received from one of a plurality of client devices. Here, the plurality of client devices include smartphones, tablet computers, web interfaces, camcorders and the like. Upon receiving the request for playback of the video content, the plurality of activities are merged with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
In another implementation, the method includes capturing the video content for playback. Further, the method includes processing the captured video content to divide the video content into a plurality of parts based on one or more categories of instructions. Further, the method includes detecting one or more object frames and a base frame from the plurality of parts of the video based on one or more related parameters. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, the method includes detecting a plurality of activities in the object frame and storing the object frame, the base frame, the plurality of activities and the related parameters in a second database. The method further includes identifying and mapping a plurality of API's corresponding to the plurality of activities based on the related parameters. Further, the method includes merging the plurality of activities with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
In another implementation, the method includes receiving a request for playback of the video content from one of a plurality of client devices. Further, the method includes processing the received video content to divide the video content into a plurality of parts based on one or more categories of instructions. Further, the method includes detecting one or more object frames and a base frame from the plurality of parts of the video based on one or more related parameters. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, the method includes detecting a plurality of activities in the object frame and storing the object frame, the base frame, the plurality of activities and the related parameters in a second database. The method further includes identifying and mapping a plurality of API's corresponding to the plurality of activities based on the related parameters. Further, the method includes merging the plurality of activities with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
In another implementation, the method includes sending a request for playback of video content to the server. Further, the method includes receiving, from the server, one or more object frames, a base frame, a plurality of API's corresponding to a plurality of activities and one or more related parameters. Furthermore, the method includes merging the object frames and the base frame with the corresponding plurality of activities associated with the plurality of API's, and playing the merged video.
In another implementation, the system includes a video processor module configured to process the video content to divide the video content into a plurality of parts based on one or more categories of instructions. Further, the system includes an object and base frame detection module configured to detect one or more object frames and a base frame from the plurality of parts of the video based on one or more related parameters. Further, an object and base frame segregation module is configured to segregate the object frame and the base frame from the plurality of parts of the video based on the related parameters. Further, an activity detection module is configured to detect a plurality of activities in the object frame. Furthermore, the system includes a second database that stores the object frame, the base frame, the plurality of activities and the related parameters. The system further includes an activity updating module configured to identify and map a plurality of API's corresponding to the plurality of activities based on the related parameters. Further, the system includes a server configured to receive a request for playback of the video content from one of a plurality of client devices. Further, the system includes an animator module configured to merge the plurality of activities with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
The various embodiments of the present disclosure provide a method and system for animation based encoding, decoding and playback of a video content in a client-server architecture. The invention, more particularly, relates to animating actions on the video content during playback after decoding the encoded video content, wherein a video compression, decompression and playback technique is used to save bandwidth and storage for the video content.
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and modules.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The various embodiments of the present disclosure provide a method and system for animation based encoding, decoding and playback of a video content in a client-server architecture. The invention, more particularly, relates to animating actions on the video content during playback after decoding the encoded video content, wherein a video compression, decompression and playback technique is used to save bandwidth and storage for the video content.
In the following description, for purpose of explanation, specific details are set forth in order to provide an understanding of the present claimed subject matter. It will be apparent, however, to one skilled in the art that the present claimed subject matter may be practiced without these details. One skilled in the art will recognize that embodiments of the present claimed subject matter, some of which are described below, may be incorporated into a number of systems.
However, the methods and systems are not limited to the specific embodiments described herein. Further, structures and devices shown in the figures are illustrative of exemplary embodiments of the present claimed subject matter and are meant to avoid obscuring of the present claimed subject matter.
Furthermore, connections between components and/or modules within the figures are not intended to be limited to direct connections. Rather, data between these components and modules may be modified, re-formatted or otherwise changed by intermediary components and modules.
The present claimed subject matter provides an improved method and system for animation based encoding, decoding and playback of a video content in a client-server architecture.
Various embodiments herein may include one or more methods and systems for animation based encoding, decoding and playback of a video content in a client-server architecture. In one of the embodiments, the video content is processed to divide the video content into a plurality of parts based on one or more categories of instructions. Further, one or more object frames and a base frame are detected from the plurality of parts of the video based on one or more related parameters. The one or more related parameters include the physical and behavioural nature of the relevant object, the action performed by the relevant object, the speed, angle and orientation of the relevant object, the time and location of the plurality of activities and the like. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, a plurality of activities are detected in the object frame, and the object frame, the base frame, the plurality of activities and the related parameters are stored in a second database. Further, a plurality of API's corresponding to the plurality of activities are identified and mapped based on the related parameters. Further, a request for playback of the video content is received from one of a plurality of client devices. Here, the plurality of client devices include smartphones, tablet computers, web interfaces, camcorders and the like. Upon receiving the request for playback of the video content, the plurality of activities are merged with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
In another embodiment, the video content is captured for playback. Further, the captured video content is processed to divide the video content into a plurality of parts based on one or more categories of instructions. Further, one or more object frames and a base frame are detected from the plurality of parts of the video based on one or more related parameters. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, a plurality of activities are detected in the object frame, and the object frame, the base frame, the plurality of activities and the related parameters are stored in a second database. Further, a plurality of API's corresponding to the plurality of activities are identified and mapped based on the related parameters. Further, the plurality of activities are merged with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
In another embodiment, a request for playback of the video content is received from one of a plurality of client devices. Further, the received video content is processed to divide the video content into a plurality of parts based on one or more categories of instructions. Further, one or more object frames and a base frame are detected from the plurality of parts of the video based on one or more related parameters. Further, the detected object frame and the base frame are segregated from the plurality of parts of the video based on the related parameters. Further, a plurality of activities are detected in the object frame, and the object frame, the base frame, the plurality of activities and the related parameters are stored in a second database. Further, a plurality of API's corresponding to the plurality of activities are identified and mapped based on the related parameters. Further, the plurality of activities are merged with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
In another embodiment, a video player is configured to send a request for playback of video content to the server. Further, one or more object frames, a base frame, a plurality of API's corresponding to a plurality of activities and one or more related parameters are received from the server. Furthermore, the object frames and the base frame are merged with the corresponding plurality of activities associated with the plurality of API's, and the video player is further configured to play the merged video.
In another embodiment, the video player is further configured to download and store one or more object frames, the base frame, the plurality of API's corresponding to the plurality of activities and one or more related parameters. The video player, which is configured to play the merged video, further creates a buffer of the merged video and the downloaded content.
In another embodiment, a video processor module is configured to process the video content to divide the video content into a plurality of parts based on one or more categories of instructions. Further, an object and base frame detection module is configured to detect one or more object frames and a base frame from the plurality of parts of the video based on one or more related parameters. Further, an object and base frame segregation module is configured to segregate the object frame and the base frame from the plurality of parts of the video based on the related parameters. Further, an activity detection module is configured to detect a plurality of activities in the object frame. Furthermore, a second database is configured to store the object frame, the base frame, the plurality of activities and the related parameters. Further, an activity updating module is configured to identify and map a plurality of API's corresponding to the plurality of activities based on the related parameters. Further, a server is configured to receive a request for playback of the video content from one of a plurality of client devices. Further, an animator module is configured to merge the plurality of activities with the object frame and the base frame for outputting a formatted video playback based on the related parameters.
In another embodiment, the object frame and the base frame are stored in the form of an image and the plurality of activities are stored in the form of an action with the location and the timestamp.
In another embodiment, the video content is processed to divide said video content into a plurality of parts based on one or more categories of instructions, wherein the received video content is processed by the video processor module. Further, one or more types of the video content are detected and one or more categories of instructions are applied to the type of the video content by a first database. The video content is then divided into a plurality of parts based on the one or more categories of instructions from the first database.
In another embodiment, a plurality of unknown activities are identified by the activity updating module. A plurality of API's are created for the plurality of unknown activities by the activity updating module, and the created API's are mapped with the plurality of unknown activities. Moreover, the created API's for the plurality of unknown activities are updated in a third database.
In another embodiment, the related parameters of the object frames are extracted from the video content.
In another embodiment, identifying the plurality of unknown activities by the activity updating module further comprises detecting the plurality of API's corresponding to the plurality of activities in the third database and segregating the plurality of activities from the plurality of unknown activities.
In another embodiment, a foreign object and a relevant object from the object frame are detected by an object segregation module.
In another embodiment, the plurality of activities that are irrelevant in the video content are segregated by an activity segregation module.
In another embodiment, a plurality of timestamps corresponding to the plurality of activities are stored by a timestamp module. Further, a plurality of location details and the orientation of the relevant object corresponding to the plurality of activities are stored by an object locating module. A plurality of data tables are generated based on the timestamp and location information and stored by a file generation module.
In another embodiment, the location is a set of coordinates corresponding to the plurality of activities, and the plurality of timestamps correspond to the start and end of the plurality of activities with respect to the location.
In another embodiment, additional information corresponding to the object frame is stored in the second database. Further, an interaction input is detected on the object frame during playback of the video content, and the additional information is displayed along with the object frame.
In another embodiment, the first database is a video processing cloud, and the video processing cloud further provides, to the video processor module, instructions related to detecting the scene from the plurality of parts of the video and determines the instructions to be provided for each of the plurality of parts of the video. Further, each of the plurality of parts of the video is assigned to the server, wherein said server provides the required instructions, and a buffer of instructions is provided for downloading at the server.
In another embodiment, the second database is a storage cloud.
In another embodiment, the third database is an API cloud and the API cloud further stores the plurality of API's and provides the plurality of API's corresponding to the plurality of activities and a buffer of the plurality of API's at the client device.
In another embodiment, the first database, second database and the third database correspond to a single database providing a virtual division among themselves.
In another embodiment, the server is connected with the client and the storage cloud by a server connection module, and the client is connected with the server and the storage cloud by a client connection module.
In another embodiment, a plurality of instructions are generated for video playback corresponding to the object frame, the base frame and the plurality of activities based on the related parameters by a file generation module.
It should be noted that the description merely illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present invention. Furthermore, all examples recited herein are principally intended expressly to be only for explanatory purposes to help the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.
In the present implementation, the server 110 includes, but is not limited to, a proxy server, a mail server, a web server, an application server, a real-time communication server, an FTP server and the like.
In the present implementation, the client devices or user devices include, but are not limited to, mobile phones (e.g. a smartphone), Personal Digital Assistants (PDAs), smart TVs, wearable devices (e.g. smart watches and smart bands), tablet computers, Personal Computers (PCs), laptops, display devices, content playing devices, IoT devices, devices on a content delivery network (CDN) and the like.
In the present implementation, the system 100 further includes one or more processor(s). The processor may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in a memory.
In the present implementation, the database may be implemented as, but not limited to, an enterprise database, a remote database, a local database, and the like. Further, the databases may themselves be located either within the vicinity of each other or at different geographic locations. Furthermore, the database may be implemented inside or outside the system 100, and the database may be implemented as a single database or as a plurality of parallel databases connected to each other and to the system 100 through a network. Further, the database may reside in each of the plurality of client devices, such as the client 112.
In the present implementation, the audio/video input is the input source to the video processor module 102. The audio/video input can be an analog video signal or digital video data that is processed and deduced by the video processor module 102. It may also be an existing video format such as .mp4, .avi, and the like.
In the present implementation, the video processing cloud 114 is configured to provide the appropriate algorithm to process a part of the video content. The video processing cloud 114 is configured to provide scene detection algorithms to the video processor module 102. It further divides the video into a plurality of parts or sub-frames and determines the algorithm to be used for each of the plurality of parts. Further, the video processing cloud 114 is configured to assign the plurality of parts or sub-frames to the video processing server 110, which provides the appropriate algorithms to deduce the object frame, the base frame and the plurality of activities of the video content. Further, the video processing cloud 114 is configured to detect and store a plurality of unknown activities in the form of animation in the API cloud 118. Further, a buffer of algorithms is provided which can be downloaded at the server 110. Further, the video processing cloud 114 is configured to maintain the video processing standards.
In the present implementation, the API cloud 118 is configured to store a plurality of animations that the video processing cloud 114 has processed. It further provides the accurate API as per the activity segregated out by the video processor module 102. The API cloud 118 is further configured to create an optimized and Graphics Processing Unit (GPU) safe library. It is configured to provide a buffer of API's at the client 112 where the video is played.
In the present implementation, the storage cloud 116 is configured to store the object frame, the base frame and the plurality of activities that are segregated by the video processor module 102. The storage cloud 116 is present between the server 110 and the client 112 through the connection modules (104, 106). Here, the video processing cloud 114 is a first database, the storage cloud 116 is a second database and the API cloud 118 is a third database. The first database, the second database and the third database correspond to a single database providing a virtual division among themselves.
Further, the system 100 includes a video processor module 102, a connection module (104, 106) and an animator module 108. The video processor module 102 is configured to process the analog video input and to segregate the entities, which include the objects, also referred to as the object frame, the background frames, also referred to as the base frame, and the plurality of actions, also referred to as the plurality of activities. The video processor module 102 is further configured to store these entities in the animator module 108. The video processor module 102 works in conjunction with the video processing cloud. Further, the conventional algorithms of the video processing techniques are used to deduce the object frame, the base frame and the plurality of activities of the video content. Further, the system 100 includes the connection module, which includes the server connection module 104 and the client connection module 106. The server connection module 104 is configured to connect the server 110 with the client 112 and the storage cloud 116. It also sends the output of the video processor module 102 to the storage cloud 116. The client connection module 106 is configured to connect the client 112 with the server 110 and the storage cloud 116. It also fetches the output of the video processor module 102 from the storage cloud 116. Further, the system 100 includes the animator module 108, which is configured to merge the plurality of activities with the object frame and the base frame and to animate a video out of them. The animator module 108 is connected to the API cloud 118, which helps it to map the plurality of activities with the animation API, and it further works in conjunction with the API cloud 118.
In the present implementation, the system 100 includes the storage, which includes the server storage 120 and the client storage 122. The server storage 120 is the storage device at the server side in which the output of the video processor module 102 is stored. The output of the video processor module 102 comes as the object frame, the base frame and the plurality of activities involved. These object frames and base frames are stored as images, and the plurality of activities are stored as actions with location and timestamp. Further, the client storage 122 is configured to store the data obtained from the storage cloud 116. This data is the output of the video processor module 102, which comes as the object frame, the base frame and the plurality of activities involved; here too, the object frames and base frames are stored as images, and the plurality of activities are stored as actions with location and timestamp.
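By way of a non-limiting illustration only, such a stored entity may be sketched as a simple data structure in Python; the field names below (for example start_time, end_time and path) are assumptions introduced for illustration and are not mandated by the present disclosure.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ActivityRecord:
    # Illustrative sketch only: one detected activity tied to an object frame and a base frame
    activity: str                         # e.g. "Moving"
    object_id: str                        # reference to the stored object frame image
    base_frame_id: str                    # reference to the stored base frame image
    start_time: float                     # timestamp at which the activity starts (seconds)
    end_time: float                       # timestamp at which the activity ends (seconds)
    path: List[Tuple[int, int]] = field(default_factory=list)  # (x, y) coordinates of the activity

# Example record: a car moving across a highway base frame
record = ActivityRecord("Moving", "car_001.png", "highway_000.png", 2.0, 6.5, [(10, 240), (620, 240)])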
Further, the audio/video output is obtained using the animator module 108 which is configured to merge the plurality of activities with the object frame and the base frame.
Further, the scene detection module 202 is configured to detect the type of algorithm to be used on the video content. Each of the plurality of parts of the video content may need a different type of processing algorithm. The scene detection module 202 is configured to detect the algorithm to be used as per the change in the video content. Further, the type of the video is obtained in order to apply the appropriate processing algorithm, and the appropriate algorithms are deployed to detect the type of the scene. The video processing cloud 114 obtains the type of the scene from the scene detection module 202 and then determines one or more categories of instructions to apply as per the relevance of the scene. Further, the video division module 204 is configured to divide the video into a plurality of parts as per the processing algorithm required to proceed. The video can be divided into parts and even sub-frames to apply processing and make it available as a video thread for the video processors. Further, many known methods are used for detecting scene changes in video content, such as colour changes, motion changes and the like, and for automatically splitting the video into separate clips. Once the division of each of the plurality of parts is completed, each of said parts is sent to the video processing cloud 114, where the available server is assigned the task of processing the video. The video is divided into a plurality of parts as per the video processing algorithm to be used.
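As a minimal, hedged sketch of one possible scene-change based division (assuming the OpenCV library and a simple histogram-difference criterion, which is only one of the many known methods referred to above; the function name split_into_parts and the threshold value are illustrative assumptions):

import cv2

def split_into_parts(video_path, threshold=0.5):
    # Illustrative sketch: split a video into parts at large histogram changes between frames
    cap = cv2.VideoCapture(video_path)
    parts, current, prev_hist = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None and cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            parts.append(current)      # low correlation suggests a scene change
            current = []
        current.append(frame)
        prev_hist = hist
    if current:
        parts.append(current)
    cap.release()
    return parts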
Further, the objects and base frames detection module 206 is configured to detect one or more object frames present in the part of the video content. The three main steps in the analysis of a video are: detecting moving objects in the video frames, tracking the detected object or objects from one frame to another, and studying the tracked object paths to estimate their behaviours. Mathematically, every image frame is a matrix of order i×j, and the f-th image frame may be defined as the matrix F_f = [I(m, n, t)], where i and j are the width and the height of the image frame respectively. The pixel intensity or gray value at location (m, n) at time t is denoted by I(m, n, t). Further, the objects and base frames segregation module 208 is configured to segregate the object frame and the base frame. The fundamental objective of image segmentation algorithms is to partition a picture into similar regions. Each segmentation algorithm normally addresses two issues: deciding the criteria on which the segmentation of the image is based, and the technique for attaining an effective division. The various segmentation methods that may be used include image segmentation using graph cuts (normalized cuts), mean-shift clustering, active contours and the like. Further, the objects segregation module 210 is configured to detect whether the object is relevant to the context. Appropriate machine learning algorithms are used to differentiate a relevant object from a foreign object in the object frame. The present invention discloses a characterization of optimal decision rules: if anomalies are local, optimal decision rules are local even when the nominal behaviour exhibits global spatial and temporal statistical dependencies. This helps collapse the large ambient data dimension for detecting local anomalies. Consequently, consistent data-driven local rules with provable performance can be derived with limited training data. These rules are based on score functions derived from local nearest-neighbour distances. They aggregate statistics across spatio-temporal locations and scales, and produce a single composite score for video segments.
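A minimal sketch of one possible realisation of the moving-object detection and object/base frame segregation described above, assuming OpenCV's standard background subtraction over the frames I(m, n, t); the function name and parameters are illustrative assumptions rather than the claimed implementation:

import cv2

def detect_object_and_base_frames(frames, min_area=500):
    # Illustrative sketch: separate moving object regions from the static base (background) frame
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    detections = []
    for t, frame in enumerate(frames):
        mask = subtractor.apply(frame)                      # foreground mask derived from I(m, n, t)
        mask = cv2.medianBlur(mask, 5)                      # suppress small noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]
        detections.append((t, boxes))                       # candidate object frames at time t
    base_frame = subtractor.getBackgroundImage()            # estimated base frame
    return detections, base_frame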
Further, the activity detection module 212 is configured to detect the plurality of activities in the video content. The activities can be motion detection, illuminance change detection, colour change detection and the like. In an exemplary implementation, human activity detection/recognition is provided herein. Human activity recognition can be separated into three levels of representation, namely the low-level core technology, the mid-level human activity recognition systems and the high-level applications. In the first level of core technology, three main processing stages are considered, i.e., object segmentation, feature extraction and representation, and activity detection and classification algorithms. The human object is first segmented out from the video sequence. The characteristics of the human object, such as shape, silhouette, colours, poses and body motions, are then properly extracted and represented by a set of features. Subsequently, an activity detection or classification algorithm is applied on the extracted features to recognize the various human activities. Moreover, in the second level of human activity recognition systems, three important recognition systems are discussed, including single-person activity recognition, multiple-people interaction and crowd behaviour, and abnormal activity recognition. Finally, the third level of applications discusses the recognized results applied in surveillance environments, entertainment environments or healthcare systems. In the first stage of the core technology, the object segmentation is performed on each frame in the video sequence to extract the target object. Depending on the mobility of the camera, the object segmentation can be categorized into two types, static camera segmentation and moving camera segmentation. In the second stage of the core technology, characteristics of the segmented objects such as shape, silhouette, colours and motions are extracted and represented in some form of features. The features can be categorized into four groups: space-time information, frequency transform, local descriptors and body modelling. In the third stage of the core technology, activity detection and classification algorithms are used to recognize various human activities based on the represented features. They can be categorized as dynamic time warping (DTW), generative models, discriminative models and others.
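Purely as a simplified, hedged sketch of the three-stage core technology (segmented object tracks, feature extraction and classification), using elementary motion features of a tracked object path and a nearest-neighbour classifier; the features, labels and library choice (scikit-learn) are assumptions for illustration only:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def motion_features(track):
    # Illustrative feature extraction from a tracked (x, y) path of a segmented object
    steps = np.diff(np.asarray(track, dtype=float), axis=0)
    speed = np.linalg.norm(steps, axis=1)
    return [speed.mean(), speed.std(), abs(steps[:, 0]).mean(), abs(steps[:, 1]).mean()]

# Tiny illustrative training set: tracked paths labelled with known activities
tracks = [[(0, 0), (1, 0), (2, 0), (3, 0)],        # slow, steady horizontal motion
          [(0, 0), (5, 0), (10, 0), (15, 0)],      # fast horizontal motion
          [(0, 0), (0, 0), (0, 0), (0, 0)]]        # no motion
labels = ["walking", "running", "standing"]

classifier = KNeighborsClassifier(n_neighbors=1).fit([motion_features(t) for t in tracks], labels)
print(classifier.predict([motion_features([(0, 0), (4, 0), (9, 0), (14, 0)])]))  # -> ['running']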
Furthermore, the activity segregation module 214 is configured to segregate the irrelevant activities from the video content. For example, an irrelevant activity can be an insect dancing in front of a CCTV camera. Further, the activity updating module 216 is configured to identify a plurality of unknown activities. Further, the timestamp module 218 is configured to store timestamps of each of the plurality of activities. Time-stamping, time-coding and spotting are all crucial parts of audio and video workflows, especially for captioning, subtitling and translation services. This refers to the process of adding timing markers, also known as timestamps, to a transcription. The timestamps can be added at regular intervals, or when certain events happen in the audio or video file. Usually the timestamps contain just minutes and seconds, though they can sometimes contain frames or milliseconds as well. Further, the object locating module 220 is configured to store the location details of the plurality of activities. It can store a motion as the start and end points of the motion and the curvature of the motion. Further, the file generation module 222 is configured to generate a plurality of data tables based on the timestamp and location information. Examples of the data tables generated are as below:
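Purely as a hypothetical illustration (the actual tables are shown in the accompanying drawings), a row of such a data table may carry the activity, the object, the base frame, the start and end timestamps and the location coordinates; all values below are assumptions:

# Hypothetical illustration of a generated data table (values are assumptions, not from the disclosure)
activity_table = [
    {"activity": "Moving",  "object": "Car",    "base_frame": "Highway",
     "start": "00:00:02", "end": "00:00:06", "location": [(10, 240), (620, 240)]},
    {"activity": "Blossom", "object": "Flower", "base_frame": "Soil",
     "start": "00:00:00", "end": "00:00:09", "location": [(320, 180)]},
]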
Further, the video processor module 102 is configured to output the activity details of the video content as: the type of the activity, i.e. the activity; who performs the activity, i.e. the object; on whom the activity is performed, i.e. the base frame; when the activity is performed, i.e. the timestamp; and where the activity is performed, i.e. the location. The output is a formatted video playback based on the related parameters. The related parameters include the physical and behavioural nature of the relevant object, the action performed by the relevant object, the speed, angle and orientation of the relevant object, the time and location of the plurality of activities and the like.
Further, the animation API animates the activity that has occurred. It needs the basic parameters required for the animation to run. Some examples are shown below:
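Purely as a hedged illustration, such an animation API may take basic parameters of the following kind; the names and signatures below are assumptions modelled on the MovingCarAnimation(S) and Blossom( ) functions discussed elsewhere in this description:

# Hypothetical animation API signatures (illustrative assumptions only)
def MovingCarAnimation(car_image, base_frame, speed, angle, curvature, start_time, end_time):
    # Would animate the car object over the base frame along the described motion
    ...

def Blossom(flower_image, base_frame, start_time, duration):
    # Would animate a flower blossoming at its stored location on the base frame
    ...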
Further, the player is an application capable of reading the object frame and the base frame and drawing activities on and with them so as to give the illusion of a video. It is made up of simple image linkers and animation APIs, and is an application compatible with playback of a video in the proposed file format. Further, the video player provides animation modules which are called in association with one or more objects. Further, the playback buffer is obtained by first downloading the contents, which are the data of the plurality of activities, the object frame and the base frame. The object frame and the base frame are then merged with the API's associated with the plurality of activities, and the merged video is played.
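A minimal sketch of such a playback loop, assuming OpenCV for display and a linear interpolation of the activity path between its start and end coordinates; the naive compositing shown here is a deliberate simplification of the animation APIs, and the names are illustrative assumptions:

import cv2

def play(base_frame, object_frame, path, start, end, fps=25):
    # Illustrative player sketch: draw the object frame on the base frame along its activity path
    (x0, y0), (x1, y1) = path[0], path[-1]
    frames = int((end - start) * fps)
    h, w = object_frame.shape[:2]
    for k in range(frames):
        t = k / max(frames - 1, 1)
        x, y = int(x0 + t * (x1 - x0)), int(y0 + t * (y1 - y0))   # interpolated activity location
        canvas = base_frame.copy()
        ch, cw = canvas.shape[:2]
        hh, ww = max(0, min(h, ch - y)), max(0, min(w, cw - x))    # clip to the base frame bounds
        canvas[y:y + hh, x:x + ww] = object_frame[:hh, :ww]        # naive compositing of the object
        cv2.imshow("playback", canvas)
        if cv2.waitKey(int(1000 / fps)) & 0xFF == 27:              # Esc stops playback
            break
    cv2.destroyAllWindows()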
Object Flower = new Object()
BaseFrame Soil = new BaseFrame()
Further, a cactus is irrelevant to grow in this soil; thus, the object is irrelevant to the context of the base frame, and the cactus would be a foreign object to this soil.
Further, a plurality of data tables based on the timestamp and location information are generated by the file generation module. As these data tables are generated for the given video scenario, the activity is animated at the given time and location with the applicable animation APIs. Further, in this figure, the mapped animation API is downloaded and initiated at the node to play the animation. For example, the Blossom( ) API, i.e. the animation function F, is downloaded for the flower's blossom activity.
O: Set of foreground Objects
B: Set of Background Objects
A: Action
Further, the video processor module 102 is configured to generate a function called an Action function G(O, A, B), which is the function obtained after merging the entities O, A and B. Thus, G(O, A, B) is denoted as follows:
G(O,A,B):MovingCar(Car, Highway, Moving);
Such that,
O: Car
B: Highway
A: Moving
Here, O and B, being the images of the car and the highway, also hold the physical and behavioral data. Thus, O and B represent the object, or the computer readable variable, which holds the value of the object frame and the background frame. The corresponding animation function is given as:
F(S): MovingCarAnimation(S)
Such that,
S={speed, angle, curvature, . . . }
Further, the animation-action mapping function is configured to calculate the most similar Animation function mapped to the input action function, which is given as below:
H(G) ~ F
Thus, H(G) gives the most similar Animation Function F corresponding to a given Action Function G, as shown in the below table:
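The mapping table itself is illustrated in the accompanying drawings; purely as a hedged sketch of one possible realisation of H(G) ~ F, a dictionary keyed by the (object, action, base frame) triple may be used, together with a simple similarity fallback and creation of a new animation function when no similar one exists (the entries, names and similarity rule below are assumptions for illustration):

# Hypothetical action-to-animation map (illustrative entries only)
animation_map = {
    ("Car", "Moving", "Highway"): "MovingCarAnimation",
    ("Flower", "Blossom", "Soil"): "BlossomAnimation",
}

def H(action):
    # Return the most similar animation function for an action function G(O, A, B)
    obj, act, base = action
    if action in animation_map:
        return animation_map[action]
    # Simple similarity: prefer entries sharing the same action and the same object or base frame
    for (o, a, b), anim in animation_map.items():
        if a == act and (o == obj or b == base):
            return anim
    # No similar animation found: the create module would generate and register a new one
    new_anim = act + obj + "Animation"
    animation_map[action] = new_anim
    return new_anim

print(H(("Car", "Moving", "Highway")))   # exact match -> MovingCarAnimation
print(H(("Car", "Moving", "Bridge")))    # similar action and object -> MovingCarAnimation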
Further, if an animation F is produced by an action G, then the animation F can also produce an action F⁻¹, which is G. For example, if MovingCarAnimation(F) is produced due to MovingCarAction(G), then MovingCarAnimation(F) can also produce an action MovingCarAction2(G′), which would have been MovingCarAction(G). In simple terms, a Moving Car animation can produce a Moving Car action if the Moving Car animation is produced by the Moving Car action, and vice versa. The action function G(O, A, B) is the inverse of F. Thus, F⁻¹ = G. This implies,
If, G→F
Then, F→G
Hence, F↔G
Thus, the Similarity function is a measure of how inverse an animation-action pair is.
For example, there is no action-animation pair in the map for a moving car without gravity, as such a video has never been processed. Thus, when such an action is detected, the Action Function Gc is created by the video processor module 102, but a similar function Fc is not found in the map. Thus, the create module 1404 creates a new Animation Function Fc for this action. Consider, as a further example, the motion of a car being parked, where:
G: Action function for the motion of the car for parking it,
V.P.: The vertical plane of the background frame, and
H.P.: The horizontal plane of the background frame.
The initial horizontal motion of the car toward the parking lot is represented as:
EQ1: y = a
where 'a' is a constant distance from H.P.; as the motion is horizontal, EQ1 is parallel to H.P. After reaching the parking lot, the car needs to rotate by some angle to adjust the turns, and this rotational motion is represented by the equation of a circle:
EQ2: (x − a)² + (y − b)² = r²
where,
a: distance between H.P. and the center of the circle;
b: distance between V.P. and the center of the circle; and
r: radius of the circle
Further, this motion could also be represented by the equation for the arc of the circle. This is given by:
EQ2′: arc length = 2πr(φ/360)
where,
r: radius of the arc; and
φ: central angle of the arc in degrees
EQ3: y = mx + c
where,
m: slope/gradient; and
c: intercept (the value of y when x = 0)
Further, the other motions of the car are represented by the following equations:
EQ4: (x − a)² + (y − b)² = r²
EQ5: y = mx + c
EQ6: (x − a)² + (y − b)² = r²
EQ7: y = mx + c
EQ8: (x − a)² + (y − b)² = r²
EQ9: x = b
where 'b' is a constant distance from V.P. As this final motion is parallel to V.P., EQ9 is a straight line at a constant distance from V.P. Hence, the action function G is represented as below:
G = EQ1 > EQ2 > EQ3 > EQ4 > EQ5 > EQ6 > EQ7 > EQ8 > EQ9 > null
where,
>: a special type of binary function such that,
If A>B, A happens before B; and
Null marks the end of the function.
Thus, G is the combination of all the motions that have taken place. Further, the animation function F, as discussed above, is used while playing the video. During the search, the action functions are generated with the help of the animation function. The action functions similar to the action that occurred are received by the video processor module. It is the decision of the video processor module either to map the action to an animation API or to create a new animation API corresponding to the action that occurred if there is no similarity.
In the example above, the animation-action map stores the linear and the rotary motions of the car. Thus, many action functions would be downloaded until all these types of motion functions are obtained, i.e. from EQ1 to EQ9. The set of similar functions is downloaded until all of EQ1 to EQ9 are found. In case any of the motion functions is not found, the corresponding animation function is created and added into the map, as shown in the below table:
Thus,
G = G1 ∪ G2 ∪ G3 ∪ G4 ∪ G7 or G3 ∪ G5 ∪ G7 ∪ G8.
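As a hedged sketch of how the ordered action function G = EQ1 > EQ2 > ... > EQ9 > null and the union of downloaded action functions might be represented, the '>' ordering may be modelled by a fixed sequence list; the segment contents of G1, G2, G3, G4 and G7 below are assumptions for illustration:

# The full parking action as an ordered sequence of motion segments (EQ1 .. EQ9)
required = ["EQ1", "EQ2", "EQ3", "EQ4", "EQ5", "EQ6", "EQ7", "EQ8", "EQ9"]

# Hypothetical downloaded action functions, each covering a subset of the motion segments
G1 = ["EQ1", "EQ2"]
G2 = ["EQ3", "EQ4"]
G3 = ["EQ5", "EQ6"]
G4 = ["EQ7", "EQ8"]
G7 = ["EQ9"]

def combine(*functions):
    # Union the downloaded functions and restore the order; '>' means 'happens before'
    segments = set().union(*functions)
    return [eq for eq in required if eq in segments]

G = combine(G1, G2, G3, G4, G7)
print(G, "complete" if set(G) == set(required) else "incomplete")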
In another exemplary embodiment, match highlights can be made by analysing the frequencies of the video and sound waves. Further, the most important data related to the game is obtained. For example, a football goal kick could be kept in the highlights.
Scene 1: Wedding of Blood bride:
Part 1:
time <actor, action, base frame>
T0<bride, gets ready, wedding set>
T1<bride, listening to wedding prayers, wedding set>
Part 2:
T2<bridegroom, holds hand, wedding set>
T3<bridegroom, dies, wedding set>
Scene 2: Killing by blood bride:
Part 3:
time <actor, action, base frame>
Tx <bride, dies, wedding set>
Ty <bride, becomes ghost, wedding set>
Part 2:
Tz <bride, kills X bride's bridegroom, X's wedding set>
The actors of the scene are detected and their physical and behavioral data traits are obtained. Further, the present invention provides a very refined and advanced video search engine, wherein even if the movie name is not known, the search could still return a relevant result.
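As a minimal sketch of how such a content-based search might operate over the stored time/actor/action/base-frame records of the example above (the matching logic is an assumption for illustration):

# Structured records from the example scenes: (time, actor, action, base_frame)
records = [
    ("T0", "bride", "gets ready", "wedding set"),
    ("T1", "bride", "listening to wedding prayers", "wedding set"),
    ("T2", "bridegroom", "holds hand", "wedding set"),
    ("T3", "bridegroom", "dies", "wedding set"),
    ("Tx", "bride", "dies", "wedding set"),
    ("Ty", "bride", "becomes ghost", "wedding set"),
    ("Tz", "bride", "kills X bride's bridegroom", "X's wedding set"),
]

def search(query):
    # Return records whose actor, action or base frame matches any query term
    terms = query.lower().split()
    return [r for r in records if any(term in " ".join(r[1:]).lower() for term in terms)]

print(search("bride ghost"))   # finds the relevant scenes even if the movie name is unknown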
In
It should be noted that the description merely illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present invention. Furthermore, all the use cases recited herein are principally intended expressly to be only for explanatory purposes to help the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art and are to be construed as being without limitation to such specifically recited use cases and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.
This application is a 371 of International Application No. PCT/KR2020/005050, filed Apr. 14, 2020, which claims priority to Indian Patent Application No. 201911015094, filed Apr. 15, 2019, the disclosures of which are herein incorporated by reference in their entirety.