The present disclosure relates generally to the field of multimedia stream communications. More specifically, an aspect of the present disclosure provides systems and methods for real-time user verification and modification of a multimedia stream that includes the user.
During online interviews or other online interactions, a person's accent or looks may influence the outcome of the interaction (e.g., interview). On the other hand, during a video conference, if a user uses an avatar, it may not be possible to know if the user is who they say they are.
Accordingly, there is interest in improving multimedia stream communications.
The present disclosure relates generally to the field of multimedia stream communications. More specifically, an aspect of the present disclosure provides systems and methods for real-time user verification and modification of a multimedia stream that includes the user.
In accordance with aspects of this disclosure, a system for real-time verification and modification of a multimedia stream is presented. The system includes a processor and a memory. The memory includes instructions stored thereon, which, when executed by the processor, cause the system to access a real-time multimedia stream of a user. The real-time multimedia stream includes a real-time image of the user and a real-time speech of the user. The instructions, when executed by the processor, further cause the system to: determine a facial mapping of the user based on the real-time image of the user; authenticate an identity of the user based on the facial mapping of the user; in response to the authenticated identity of the user, generate a mask of the user based on the facial mapping of the user; modify the real-time multimedia stream to display the mask on the user; determine feedback based on analyzing the real-time multimedia stream of the user; and display the feedback in real time. The feedback includes a feedback score or a visual representation of a quality of an engagement of the user.
In an aspect of the present disclosure, the instructions, when executed by the processor, may further cause the system to generate a modified real-time speech of the user based on the real-time speech of the user. The modified real-time speech of the user may be different from the real-time speech of the user.
In another aspect of the present disclosure, the modified real-time speech may include a modified accent, emotion, demeanor, and/or inflection.
In yet another aspect of the present disclosure, the real-time image of the user may be modified by a pre-determined avatar selected by the user.
In a further aspect of the present disclosure, the instructions, when executed by the processor, may further cause the system to determine that the feedback score is below a threshold value and provide a visual warning that the feedback is below the threshold value.
In yet a further aspect of the present disclosure, the real-time multimedia stream of a user may include biometric data. Determining feedback may include providing biometric data as an input to a machine learning network and predicting the feedback by the machine learning network.
In an aspect of the present disclosure, the instructions, when executed by the processor, may further cause the system to display a review screen. The review screen may include a multimedia review and may replay the multimedia stream.
In another aspect of the present disclosure, authenticating the identity of the user may be further based on a unique ID stored in a blockchain.
In yet another aspect of the present disclosure, the visual representation of the quality of an engagement of the user may include a circular graph and/or a bar chart.
In a further aspect of the present disclosure, the instructions, when executed by the processor, may further cause the system to generate a final report after completion of the real-time multimedia stream. The final report may include a scrollable timeline and feedback correlated to a time of the timeline.
In accordance with aspects of this disclosure, a computer-implemented method for real-time verification and modification of a multimedia stream includes: accessing a real-time multimedia stream of a user, the real-time multimedia stream including a real-time image of the user and a real-time speech of the user, determining a facial mapping of the user based on the real-time image of the user, authenticating an identity of the user based on the facial mapping of the user, generating a mask of the user based on the facial mapping of the user in response to the authenticated identity of the user, modifying the real-time multimedia stream to display the mask on the user, determining feedback based on analyzing the real-time multimedia stream, and displaying the feedback in real time.
In yet a further aspect of the present disclosure, the method may further include generating a modified real-time speech of the user based on the real-time speech of the user. The modified real-time speech of the user may be different from the real-time speech of the user.
In an aspect of the present disclosure, the modified real-time speech includes a modified accent, emotion, demeanor, and/or inflection.
In another aspect of the present disclosure, the real-time image of the user may be modified by a pre-determined avatar selected by the user.
In yet another aspect of the present disclosure, the method may further include determining that the feedback score is below a threshold value and providing a warning that the feedback is below the threshold value.
In a further aspect of the present disclosure, the real-time multimedia stream of a user may include biometric data. Determining feedback may include providing biometric data as an input to a machine learning network and predicting the feedback by the machine learning network.
In yet a further aspect of the present disclosure, authenticating the identity of the user may be further based on a unique ID stored in a blockchain.
In an aspect of the present disclosure, the feedback further includes a circular graph and/or a bar chart.
In another aspect of the present disclosure, the method may further include generating a final report after completion of the real-time multimedia stream. The final report may include a scrollable timeline and feedback correlated to a time of the timeline.
In accordance with aspects of this disclosure, a non-transitory computer-readable medium storing instructions which, when executed by a processor, cause the processor to perform a computer-implemented method for real-time verification and modification of a multimedia stream is presented. The method includes: accessing a real-time multimedia stream of a user, the real-time multimedia stream including a real-time image of the user and a real-time speech of the user, determining a facial mapping of the user based on the real-time image of the user, authenticating an identity of the user based on the facial mapping of the user, generating a mask of the user based on the facial mapping of the user in response to the authenticated identity of the user, modifying the real-time multimedia stream to display the mask on the user, determining feedback based on analyzing the real-time multimedia stream, and displaying the feedback in real time.
The details of one or more aspects of this disclosure are set forth in the accompanying drawings and the description below. Other aspects, features, and advantages will be apparent from the description, the drawings, and the claims that follow.
A better understanding of the features and advantages of the disclosed technology will be obtained by reference to the following detailed description that sets forth illustrative aspects, in which the principles of the technology are utilized, and the accompanying drawings of which:
Further details and aspects of exemplary aspects of the disclosure are described in more detail below with reference to the appended figures. Any of the above aspects and aspects of this disclosure may be combined without departing from the scope of the disclosure.
The present disclosure relates generally to the field of multimedia stream verification and modification. More specifically, an aspect of the present disclosure provides systems and methods for real-time user verification and modification of a multimedia stream that includes the user.
Although this disclosure will be described in terms of specific aspects, it will be readily apparent to those skilled in this art that various modifications, rearrangements, and substitutions may be made without departing from the spirit of this disclosure.
For purposes of promoting an understanding of the principles of this disclosure, reference will now be made to exemplary aspects illustrated in the drawings, and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of this disclosure is thereby intended. Any alterations and further modifications of the inventive features illustrated herein, and any additional applications of the principles of this disclosure, as illustrated herein, which would occur to one skilled in the relevant art and having possession of this disclosure, are to be considered within the scope of this disclosure.
Referring to
In aspects of the disclosure, the memory 230 can be random access memory, read-only memory, magnetic disk memory, solid-state memory, optical disc memory, and/or another type of memory. In some aspects of the disclosure, the memory 230 can be separate from the controller 200 and can communicate with the processor 220 through communication buses of a circuit board and/or through communication cables such as serial ATA cables or other types of cables. The memory 230 includes computer-readable instructions that are executable by the processor 220 to operate the controller 200. In other aspects of the disclosure, the controller 200 may include a network interface 240 to communicate with other computers or to a server. A storage device 210 may be used for storing data. The disclosed method may run on the controller 200 or on a user device, including, for example, on a mobile device, an IoT device, or a server system.
With reference to
In machine learning, a CNN is a class of artificial neural network (ANN) most commonly applied to analyzing visual imagery. The convolutional aspect of a CNN relates to applying matrix processing operations to localized portions of an image, and the results of those operations (which can involve dozens of different parallel and serial calculations) are sets of many features that are delivered to the next layer. A CNN typically includes convolution layers, activation function layers, deconvolution layers (e.g., in segmentation networks), and/or pooling (typically max pooling) layers to reduce dimensionality without losing too many features. Additional information may be included in the operations that generate these features. Providing unique information that yields distinguishing features gives the neural network an aggregate way to differentiate between different data inputs. The deep learning network may include a convolutional long short-term memory neural network (CNN-LSTM). Although CNNs are used as an example, other machine learning classifiers are contemplated.
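The convolution, activation, and pooling operations described above can be sketched in a few lines of Python. This is a minimal illustrative implementation using NumPy, not part of the disclosure; the function names and the toy edge-detection kernel are assumptions for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a single-channel image with one kernel
    (the 'convolution' as implemented in CNN libraries, i.e., no flip)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation function layer: keep positive responses only."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Max pooling layer: reduce dimensionality by keeping local maxima."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# An 8x8 frame with a vertical edge, filtered by a 3x3 edge kernel
frame = np.zeros((8, 8))
frame[:, 4:] = 1.0
features = max_pool(relu(conv2d(frame, np.array([[-1, 0, 1]] * 3))))
```

Applied to this frame, the kernel responds only at the edge columns, and pooling reduces the 6x6 feature map to 3x3, illustrating how dimensionality shrinks while the salient feature (the edge) survives.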
The deep learning network 320 may be trained on labeled training data to optimize weights. For example, image feature data may be taken and labeled using other image feature data. In some methods in accordance with this disclosure, the training may include supervised or semi-supervised learning. Persons skilled in the art will understand how to train the deep learning network 320 and how to implement it.
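As an illustration of training on labeled data to optimize weights, the sketch below fits a logistic regression classifier by gradient descent on a tiny labeled set. It stands in for training the deep learning network 320; the data, hyperparameters, and function names are hypothetical.

```python
import numpy as np

def train(features, labels, lr=0.5, epochs=200):
    """Toy supervised training loop: gradient descent on cross-entropy loss,
    standing in for optimizing the weights of a deep learning network."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=features.shape[1])  # initial weights
    b = 0.0
    for _ in range(epochs):
        z = features @ w + b
        p = 1.0 / (1.0 + np.exp(-z))         # predicted probabilities
        grad = p - labels                    # gradient of the loss w.r.t. z
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

# Linearly separable labeled training data (two classes)
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])
w, b = train(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
```

After training, the learned weights classify all four labeled examples correctly, which is the sense in which labeling the data "optimizes the weights."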
Referring to
Initially, at block 402, the controller 200 accesses a real-time multimedia stream that includes a user. The real-time multimedia stream may include a real-time image of the user and/or a real-time speech of the user. The multimedia stream of the user may include biometric data on the user derived from the image of the user or the voice/speech of the user. The real-time multimedia stream may be captured using a webcam, for example, during a video conference.
At block 404, the controller 200 determines a facial mapping of the user based on the real-time image of the user (
At block 406, the controller 200 authenticates an identity of the user based on the facial mapping of the user. In aspects, authenticating the identity of the user may further be based on a unique ID stored in a blockchain. For example, the blockchain may include an Ethereum-based blockchain. In aspects, the blockchain-based proof of identity provides users mobility and security. For example, the blockchain-based proof of identity provides a tamper-proof and trusted medium to distribute the asymmetric verification and encryption keys of the identity holders (i.e., the users). The controller 200 may use the voice of the user to authenticate the identity of the user. The user may onboard prior to the video chat (e.g., the multimedia stream) by entering basic user information, which may be reviewed by a third-party background provider for baseline information. A multi-point biometric still-frame scan with a liveness check may be performed. The user may provide identification, such as a driver's license, which may be matched with the still-frame scan. The photo match may be compared to a threshold value to confirm the user's identity. For example, anything above about 80% would pass, and anything below 80% may go to manual review. The name and address from the identification may be compared to the information entered.
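The threshold comparison described above can be sketched as follows. The 80% pass threshold comes from the example in the preceding paragraph; the function name, the return values, and the exact form of the name-matching step are assumptions for illustration, and the match score itself would come from an upstream face-matching model that is not shown.

```python
def verify_identity(match_score, id_name, entered_name, pass_threshold=0.80):
    """Route an onboarding photo match to automatic pass or manual review.

    match_score: face-match confidence in [0, 1] from an upstream model.
    Scores at or above the threshold pass; lower scores, or a mismatch
    between the name on the identification and the name the user entered,
    are routed to manual review.
    """
    if id_name.strip().lower() != entered_name.strip().lower():
        return "manual_review"
    if match_score >= pass_threshold:
        return "verified"
    return "manual_review"
```

For example, a 92% match with matching names passes automatically, while a 61% match, or a name mismatch at any score, is routed to manual review.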
At block 408, the controller 200 generates a mask of the user based on the facial mapping of the user in response to the authenticated identity of the user. For example, the real-time image of the user may be modified by a pre-determined avatar (i.e., mask) selected by the user.
At block 410, the controller 200 modifies the real-time multimedia stream to display the predetermined avatar (e.g., mask) on the user in the multimedia stream. In aspects, the controller 200 may analyze the real-time speech of the user and modify the real-time speech of the user based on the real-time speech of the user. The modified real-time speech of the user is different from the real-time speech of the user. For example, the modified real-time speech may include a modified accent, emotion, demeanor, and/or inflection (
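The per-pixel step of displaying the mask on the user can be sketched as an alpha blend. This is a minimal NumPy sketch under stated assumptions: a real pipeline would first warp the avatar using the facial mapping from block 404, and the function and variable names here are illustrative, not part of the disclosure.

```python
import numpy as np

def apply_mask(frame, avatar, alpha):
    """Composite a pre-rendered avatar mask onto a video frame.

    frame, avatar: HxWx3 float arrays with values in [0, 1].
    alpha: HxW matte, 1.0 where the avatar covers the user's face, 0.0
    elsewhere (fractional values feather the boundary).
    """
    a = alpha[..., None]                 # broadcast matte over color channels
    return a * avatar + (1.0 - a) * frame

frame = np.zeros((4, 4, 3))              # black frame
avatar = np.ones((4, 4, 3))              # white avatar
alpha = np.zeros((4, 4))
alpha[:2, :2] = 1.0                      # mask covers the top-left quadrant
out = apply_mask(frame, avatar, alpha)
```

Only the masked quadrant shows the avatar; the rest of the frame passes through unchanged, which is how the avatar can track the face region while leaving the background intact.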
At block 412, the controller 200 determines feedback based on analyzing the real-time multimedia stream. In aspects, the controller 200 may determine feedback by providing biometric data as an input to a machine learning network and predicting the feedback by the machine learning network.
At block 414, the controller 200 displays the feedback in real-time. The feedback may include a feedback score. In aspects, the controller 200 may determine that the feedback score is below a threshold value and provide a visual warning that the feedback is below the threshold value.
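Blocks 412 and 414 can be sketched together: a toy scoring function standing in for the machine learning network that predicts feedback from biometric inputs, plus the threshold check that triggers a warning. The signal names, weights, and the 0.5 threshold are all illustrative assumptions, not values from the disclosure.

```python
def feedback_score(biometrics, weights=None):
    """Toy engagement score: a weighted combination of biometric signals,
    standing in for the machine learning network's prediction."""
    weights = weights or {"eye_contact": 0.4, "voice_energy": 0.3, "posture": 0.3}
    return sum(weights[k] * biometrics[k] for k in weights)

def check_feedback(score, threshold=0.5):
    """Return the score plus a warning flag when it drops below threshold."""
    if score < threshold:
        return {"score": score, "warning": "engagement below threshold"}
    return {"score": score, "warning": None}

high = feedback_score({"eye_contact": 0.9, "voice_energy": 0.8, "posture": 0.7})
result = check_feedback(high)
```

An engaged user scores well above the threshold and produces no warning, while a low score would carry the warning text that the display layer could render as the visual warning described above.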
In aspects, the feedback may include a feedback score or a visual representation of a quality of an engagement of the user. The visual representation of the quality of an engagement of the user may include a circular graph and/or a bar chart.
In aspects, the controller 200 may cause a display to display a review screen (
In aspects, the method 400 may be used to continue the persona (the avatar and modified real-time speech) in a remote work environment. The method 400 enables integration with virtual meeting software to help minimize workplace bias while ensuring that the person in attendance is always the verified employee, and enables monitoring engagement in real time. Monitoring engagement is valuable for providing progressive upward feedback internally and enables multi-layered companies to tailor their messaging more effectively.
Referring to
Referring to
Referring to
In aspects, the system may provide the user questions to answer during the multimedia conference (
The aspects disclosed herein are examples of the disclosure and may be embodied in various forms. For instance, although certain aspects herein are described as separate aspects, each of the aspects herein may be combined with one or more of the other aspects herein. Specific structural and functional details disclosed herein are not to be interpreted as limiting, but as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ this disclosure in virtually any appropriately detailed structure.
The phrases “in an aspect,” “in aspects,” “in various aspects,” “in some aspects,” or “in other aspects” may each refer to one or more of the same or different aspects in accordance with this disclosure.
It should be understood that the description herein is only illustrative of this disclosure. Various alternatives and modifications can be devised by those skilled in the art without departing from the disclosure. Accordingly, this disclosure is intended to embrace all such alternatives, modifications, and variances. The aspects described are presented only to demonstrate certain examples of the disclosure. Other elements, steps, methods, and techniques that are insubstantially different from those described above and/or in the appended claims are also intended to be within the scope of the disclosure.
This application claims the benefit of, and priority to, U.S. Provisional Patent Application No. 63/400,560, filed on Aug. 24, 2022, the entire contents of which are hereby incorporated herein by reference.