System and method for optimizing video communications based on device capabilities

Information

  • Patent Grant
  • Patent Number
    11,770,584
  • Date Filed
    Monday, May 23, 2022
  • Date Issued
    Tuesday, September 26, 2023
Abstract
A system and method for optimizing video for transmission on a device are provided. In one example, the method includes capturing an original video frame and scaling the original video frame down to a lower resolution video frame. The lower resolution video frame is encoded using a first encoder to produce a first layer output, and the first layer output is decoded. The decoded first layer output is upscaled to match a resolution of the original video frame. A difference is obtained between the upscaled decoded first layer output and the original video frame. The difference is independently encoded using a second encoder to create a second layer output. The first and second layer outputs may be stored or sent to another device.
Description
BACKGROUND

The manner in which communication sessions with remote parties occur is currently limited in functionality and flexibility. Accordingly, what is needed are a system and method that address these issues.


SUMMARY

In some example embodiments, a method for optimizing video for transmission on a device based on the device's capabilities includes capturing, by a camera associated with the device, an original video frame, scaling the original video frame down to a lower resolution video frame, encoding the lower resolution video frame using a first encoder to produce a first layer output, decoding the first layer output, upscaling the decoded first layer output to match a resolution of the original video frame, obtaining a difference between the upscaled decoded first layer output and the original video frame, and encoding the difference using a second encoder to create a second layer output, wherein the encoding to produce the second layer output occurs independently from the encoding to produce the first layer output.


In one or more of the above examples, the first and second encoders perform the encoding of the first and second layer outputs, respectively, using different video coding standards.


In one or more of the above examples, the first and second encoders perform the encoding of the first and second layer outputs, respectively, using identical video coding standards.


In one or more of the above examples, the method further includes communicating, by the device, with another device in order to determine which video coding standard is to be used to perform the encoding by each of the first and second encoders.


In one or more of the above examples, the method further includes sending the first and second layer outputs to another device during a video call.


In one or more of the above examples, the method further includes sending the first and second layer outputs to a storage device.


In some example embodiments, a method for decoding video for display by a device includes receiving an encoded first video frame and an encoded second video frame, independently decoding the encoded first and second video frames using a first decoder and a second decoder, respectively, upscaling the decoded first video frame to a resolution matching a resolution of the decoded second video frame, and adding the upscaled decoded first video frame and the decoded second video frame to create an additive video frame.


In one or more of the above examples, the first and second decoders perform the decoding of the encoded first and second video frames, respectively, using different video coding standards.


In one or more of the above examples, the first and second decoders perform the decoding of the encoded first and second video frames, respectively, using identical video coding standards.


In one or more of the above examples, the method further includes sending the additive video frame for display by the device.


In one or more of the above examples, receiving the encoded first video frame and the encoded second video frame includes retrieving the encoded first video frame and the encoded second video frame from a storage device.


In some example embodiments, a device or system for sending and receiving optimized video frames includes a processor, and a memory coupled to the processor, the memory having a plurality of instructions stored therein for execution by the processor, the plurality of instructions including instructions for scaling an original video frame down to a lower resolution video frame, encoding the lower resolution video frame using a first encoder to produce a first layer output, decoding the first layer output, upscaling the decoded first layer output to match a resolution of the original video frame, obtaining a difference between the upscaled decoded first layer output and the original video frame, and encoding the difference using a second encoder to create a second layer output, wherein the encoding to produce the second layer output occurs independently from the encoding to produce the first layer output.


In one or more of the above examples, the first and second encoders perform the encoding of the first and second layer outputs, respectively, using different video coding standards.


In one or more of the above examples, the first and second encoders perform the encoding of the first and second layer outputs, respectively, using identical video coding standards.


In one or more of the above examples, the instructions further include communicating with another device in order to determine which video coding standard is to be used to perform the encoding by each of the first and second encoders.


In one or more of the above examples, the instructions further include sending the first and second layer outputs to another device during a video call.


In one or more of the above examples, the instructions further include sending the first and second layer outputs to a storage device.





BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding, reference is now made to the following description taken in conjunction with the accompanying Drawings in which:



FIGS. 1A-1C illustrate various embodiments of environments within which video communications may be optimized;



FIG. 2 illustrates one embodiment of an encoding process that may be used by a transmitting device to optimize a video frame prior to transmission or storage;



FIG. 3 illustrates one embodiment of a decoding process that may be used by a receiving device to recover a video frame optimized by the encoding process of FIG. 2;



FIG. 4 illustrates a flow chart showing one embodiment of an encoding process that may be used by a transmitting device to optimize a video frame prior to transmission or storage;



FIG. 5 illustrates a flow chart showing one embodiment of a decoding process that may be used by a receiving device to recover a video frame optimized by the encoding process of FIG. 4;



FIG. 6 illustrates a flow chart showing one embodiment of a process that may occur to establish and use video encoding parameters;



FIG. 7 illustrates one embodiment of a server conference call environment within which different encoded frames may be used for video communications;



FIGS. 8A-8D illustrate various embodiments of environments showing different optimization configurations; and



FIG. 9 is a simplified diagram of one embodiment of a computer system that may be used in embodiments of the present disclosure as a communication device or a server.





DETAILED DESCRIPTION

It is understood that the following disclosure provides many different embodiments or examples. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.


Referring to FIGS. 1A-1C, embodiments of an environment 100 are illustrated within which various aspects of the present disclosure may be practiced. The environment 100 of FIG. 1A includes a first communication device 102 and a second communication device 104. The two devices 102 and 104 may be involved in a one-way or two-way communication session involving video. The two devices may be similar or different, and may include identical or different hardware and/or software capabilities, such as graphics processing units (GPUs), video encoders, and video decoders.


The environment 100 of FIG. 1B illustrates video information being sent from a communication device 102 to a storage 106. The environment of FIG. 1C illustrates a conference call environment where a server 108 uses a selective transmission unit 110 to manage a conference call with multiple communication devices 102, 104, and 112. Although only three communication devices are illustrated, it is understood that any number of devices may be in communication with the server 108, subject to technical limitations such as bandwidth, processing power, and/or similar factors.


The communication devices 102, 104, and 112 may be mobile devices (e.g., tablets, smartphones, personal digital assistants (PDAs), or netbooks), laptops, desktops, workstations, smart televisions, and/or any other computing device capable of receiving and/or sending electronic communications via a wired or wireless network connection. Such communications may be direct (e.g., via a peer-to-peer network, an ad hoc network, or using a direct connection), indirect, such as through a server or other proxy (e.g., in a client-server model), or may use a combination of direct and indirect communications.


One video optimization method involves the use of video scaling, which enables more efficient resource usage in video communications. Generally, the scaling of video may be accomplished using two different methods. The first scaling method is resolution scaling, in which a video frame has similar information at different resolutions, but uses different amounts of bandwidth due to the different resolutions. The second scaling method is temporal scaling, in which reference frames are arranged such that every other frame (or some percentage or number of frames) can be dropped without any real impact on the decoding process. The present disclosure refers generally to resolution scaling, although it is understood that temporal scaling may be incorporated with aspects of the described embodiments.


The present disclosure provides a scaling approach that enables video optimizations for various devices even when those devices do not include support for standards such as Scalable Video Coding (SVC) as embodied in the Annex G extension of the H.264/MPEG-4 AVC video compression standard. This allows the present disclosure's approach to be used with a broad range of devices, including devices such as older mobile phones and devices with different encoding and decoding hardware and/or software. By dynamically adjusting to each device's capabilities, the scaling process may be configured to achieve an optimized outcome that may take into account the device itself, available network bandwidth, and/or other factors. Furthermore, for devices that support standards such as SVC, the present disclosure's approach may provide more flexibility due to its enabling of independent encoding steps and the provision for using different encoders during different steps of the encoding process. For purposes of convenience, the terms “codec,” “video coding format,” and “video coding standard” may be used interchangeably in the present disclosure.


Referring to FIG. 2, one embodiment of an encoding process 200 that may be used by a sending device (e.g., one of the communication devices of FIGS. 1A-1C or the server 108/STU 110) is illustrated. An original video frame 201a is captured by a camera in step 202. The resolution and other parameters of the video frame 201a may depend on the settings used to capture the image, the quality of the camera, and/or similar factors. For purposes of example, the video frame is captured at 1280×720.


The original frame is then scaled down in step 204 to create a scaled down frame 201b. The scaling may be performed, for example, using the device's GPU. For purposes of example, the original video frame 201a is scaled down to 320×180 for the frame 201b. The frame 201b is then encoded in step 206 to produce a Layer 0 output. The Layer 0 output is sent to a server, another device, and/or to storage in step 216, depending on the environment within which the device is operating.


Depending on factors such as the level of scaling and the compression type used, Layer 0 may be significantly smaller than the original frame while containing much of the same information. For example, Layer 0 may be around 1/16 the size of the original image, and the required bandwidth may be reduced to around 1/8 of the bandwidth that would otherwise be needed.
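The size figure above follows directly from the example resolutions: scaling 1280×720 down to 320×180 shrinks each dimension by a factor of four, so the pixel count drops by a factor of sixteen. A minimal check of this arithmetic (the variable names are illustrative):

```python
# Pixel-count arithmetic for the example downscale in the text:
# 1280x720 -> 320x180 shrinks each dimension by 4x, so the scaled
# frame carries 1/16 of the original pixels. (The ~1/8 bandwidth
# figure differs because compression efficiency is not linear in
# pixel count.)
orig_w, orig_h = 1280, 720
low_w, low_h = 320, 180

pixel_ratio = (low_w * low_h) / (orig_w * orig_h)
print(pixel_ratio)  # 0.0625, i.e., 1/16
```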


The Layer 0 output is decoded in step 208 and scaled up to the original resolution in step 210 to create a frame 201c. In the present example, the decoded frame 201b is scaled up from 320×180 to 1280×720 by the GPU. Due to the process of scaling and/or encoding/decoding, the frame 201b will likely not be exactly the same as the original frame 201a even after it is scaled up. For example, if a lossy algorithm is used to scale the frame down to 320×180, then some information will generally be lost during the downscaling process. When the frame is upscaled to the original resolution as frame 201c, the lost information may result in differences between the scaled up frame 201c and the original frame 201a.


In step 212, the difference between the original frame 201a and the scaled up frame 201c is calculated. This operation may be performed, for example, by the GPU. This difference results in a “ghost” image 201d that contains the differences between the original frame 201a and the scaled up frame 201c. The actual content of the ghost image 201d may vary depending on the process used to scale the frame and the encoding process used to create the Layer 0 output. In step 214, the ghost image 201d is encoded to produce a Layer 1 output. The Layer 1 output is sent to a server, another device, and/or storage in step 216, depending on the environment within which the device is operating. It is understood that the terms “Layer 0” and “Layer 1” are used for purposes of illustration and any identifiers may be used for the encoder outputs.
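The steps of the encoding process 200 can be sketched in pure Python. The box-filter downscale, nearest-neighbour upscale, and identity "encoder" stubs below are illustrative stand-ins, not the codecs the disclosure describes, and the function names are assumptions:

```python
# Sketch of the two-layer encoding flow of FIG. 2, using plain Python
# lists of pixel values as frames. Real encoders are replaced by
# identity stubs, so the layers here are uncompressed.

def downscale(frame, factor):
    """Step 204: average each factor-by-factor block (box filter)."""
    h, w = len(frame), len(frame[0])
    return [[sum(frame[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) // factor ** 2
             for x in range(w // factor)]
            for y in range(h // factor)]

def upscale(frame, factor):
    """Step 210: nearest-neighbour upscale back to the original size."""
    return [[frame[y // factor][x // factor]
             for x in range(len(frame[0]) * factor)]
            for y in range(len(frame) * factor)]

def diff(a, b):
    """Step 212: per-pixel difference -- the 'ghost' image 201d."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_layers(original, factor=2):
    low = downscale(original, factor)   # step 204: scaled down frame 201b
    layer0 = low                        # step 206: first encoder (stub)
    decoded0 = layer0                   # step 208: decode Layer 0 (stub)
    up = upscale(decoded0, factor)      # step 210: scaled up frame 201c
    ghost = diff(original, up)          # step 212: ghost image 201d
    layer1 = ghost                      # step 214: independent second encoder (stub)
    return layer0, layer1

original = [[10, 20, 30, 40],
            [50, 60, 70, 80],
            [15, 25, 35, 45],
            [55, 65, 75, 85]]
layer0, layer1 = encode_layers(original)
```

With identity codec stubs, upscaling Layer 0 and adding the ghost image recovers the original exactly; with real lossy encoders the reconstruction would only approximate it, as the disclosure notes.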


It is noted that the encoding step 214 is independent of the encoding step 206. Accordingly, different encoding processes may be used by the two steps or the same encoding process may be used. This allows flexibility in the encoding processes. For example, a preferred encoder for the low resolution encoding that produces the Layer 0 output may not be ideal for the high resolution encoding of step 214. Accordingly, because of the described independent encoding process, the encoding steps 206 and 214 may be performed using different video coding standards.


The encoders may provide header information, such as encoder type, layer number, timestamps (e.g., to ensure the correct Layer 0 and Layer 1 frames are used properly on the receiving device), resolution information, and/or other information. The encoding process 200 of FIG. 2, including the creation and inclusion of header information, may be managed by an application on the device, and may include coordination with an STU (e.g., the STU 110 of FIG. 1C) and/or other communication devices. Accordingly, determining which video coding standards to use may include a negotiation process with other devices. The encoders may be hardware, while the decoders (which are generally less complex and use fewer resources) may be hardware or software. If hardware encoders are not available, software encoders may be used with adjustments made to account for the slower encoding and higher resource usage.
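The header fields listed above might be modeled as follows; the exact wire format is not specified in the disclosure, so this structure and its field names are purely illustrative:

```python
# Hypothetical layer header carrying the fields named in the text:
# encoder type, layer number, timestamp, and resolution. Matching
# timestamps let the receiver pair a Layer 0 frame with its Layer 1
# ghost image.
from dataclasses import dataclass

@dataclass
class LayerHeader:
    encoder_type: str   # e.g. "Vp9" for Layer 0, "Vp8" for Layer 1
    layer: int          # 0 = low-resolution base, 1 = difference image
    timestamp_ms: int   # pairs Layer 0 and Layer 1 frames on receipt
    width: int
    height: int

h0 = LayerHeader("Vp9", 0, 1_000, 320, 180)
h1 = LayerHeader("Vp8", 1, 1_000, 1280, 720)
frames_match = h0.timestamp_ms == h1.timestamp_ms
```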


It is noted that, in the present embodiment, information may not be transferred between the two independently operating encoders. Instead, each encoder may simply encode the frame it receives without taking information from the other encoder into account. In other embodiments, information may be transferred between the encoders. While two separate encoders are used for purposes of example, both encoding steps may be performed by a single encoder in some embodiments.


Referring to FIG. 3, one embodiment of a decoding process 300 that may be used by a receiving device (e.g., one of the communication devices of FIGS. 1A-1C or the server 108/STU 110) is illustrated. For purposes of example, the receiving device is receiving the Layer 0 and Layer 1 outputs sent by the process 200 of FIG. 2. The Layer 0 and Layer 1 outputs of FIG. 2 are received in step 302. The low resolution Layer 0 stream is decoded in step 304 to recover the scaled down frame 201b. The frame 201b is scaled up (e.g., by the GPU) from its current resolution of 320×180 to 1280×720 to create the frame 201c, which matches the resolution of the ghost image 201d.


The high resolution Layer 1 stream is independently decoded in step 308 to recover the ghost image 201d. Depending on the video coding standards used to encode the Layer 0 and Layer 1 outputs, the decoders for steps 304 and 308 may be different or may be the same. The ghost image 201d and the scaled up frame 201c are added in step 310 (e.g., by the GPU) to recreate the image 201a or an approximation thereof. It is noted that the recreated frame 201a of FIG. 3 may not exactly match the original frame of FIG. 2. The recreated frame 201a is then displayed in step 312.
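The receiving side of FIG. 3 can be sketched similarly; as before, the decoder stubs and helper names are illustrative, and the example layers are taken to be uncompressed:

```python
# Sketch of the decoding flow of FIG. 3: independently decode both
# layers, upscale Layer 0, then add the ghost image to recreate the
# frame. Real decoders are replaced by identity stubs.

def upscale(frame, factor):
    """Nearest-neighbour upscale of the decoded Layer 0 frame."""
    return [[frame[y // factor][x // factor]
             for x in range(len(frame[0]) * factor)]
            for y in range(len(frame) * factor)]

def add(a, b):
    """Step 310: per-pixel addition of frame 201c and ghost image 201d."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def decode(layer0, layer1, factor=2):
    decoded0 = layer0               # step 304: first decoder (stub)
    up = upscale(decoded0, factor)  # e.g. 320x180 -> 1280x720 in the text
    ghost = layer1                  # step 308: independent second decoder (stub)
    return add(up, ghost)           # step 310: additive (recreated) frame

# Example layers consistent with a 2x box-filter downscale of a 4x4 frame:
layer0 = [[35, 55], [40, 60]]
layer1 = [[-25, -15, -25, -15], [15, 25, 15, 25],
          [-25, -15, -25, -15], [15, 25, 15, 25]]
frame = decode(layer0, layer1)
```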


It is understood that the encoder/decoder may depend on the device and its capabilities. Examples of hardware and software vendors and their supported encoder/decoder standards that may be used with the present disclosure are provided below in Table 1.












TABLE 1

Chipset Vendor/Software Vendor    Encoder/Decoder Standards Supported
Qualcomm                          Vp8, H.264
Samsung Exynos                    Vp8, H.264
MediaTek                          H.264
Google (software)                 Vp9, Vp8, H.264
Apple (iPhone)                    H.264

As can be seen, some devices may not support certain video coding standards, which in turn affects the selection of the encoders used in the encoding process 200 of FIG. 2. The receiving device is also taken into account, as it must be able to decode the received Layer 0 and Layer 1 streams. Examples of possible pairings of sending and receiving devices are provided in the following Tables 2-5. It is noted that if no native compatibility exists between two devices, a software encoder/decoder solution may be provided (identified as Damaka H.264 in the following tables). Listed standards may be in order of preference, but the order may change in some situations.










TABLE 2

Android Transmitter (Encoder)                         Android Receiver (Decoder)
Low Resolution                   Difference Image
Vp9, Vp8, H.264, Damaka H.264    Vp8, H.264           Vp9, Vp8, H.264, Damaka H.264

TABLE 3

Android Transmitter (Encoder)                         iPhone Receiver (Decoder)
Low Resolution                   Difference Image
Vp9, Vp8, H.264, Damaka H.264    Vp8, H.264           Hardware: H.264
                                                      Software: Vp9, Vp8

TABLE 4

iPhone Transmitter (Encoder)                          Android Receiver (Decoder)
Low Resolution                   Difference Image
H.264                            H.264                H.264

TABLE 5

iPhone Transmitter (Encoder)                          iPhone Receiver (Decoder)
Low Resolution                   Difference Image
H.264                            H.264                H.264

It is understood that many different combinations are possible and such combinations may change as new models of devices are introduced, as well as new or modified encoders and decoders. Accordingly, due to the flexibility provided by the encoding process described herein, the process may be applied relatively easily to currently unreleased combinations of hardware and software.
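The pairings in Tables 2-5 reduce to a simple selection rule. The rule sketched below (take the first entry in the sender's preference list that the receiver also supports, falling back to the software codec) is an assumption consistent with the tables, not a procedure the disclosure specifies:

```python
# Hypothetical codec selection from per-device preference lists, run
# once per layer since the two layers are encoded independently and
# may use different standards.

def pick_codec(sender_prefs, receiver_supported):
    """Return the first mutually supported codec, else the software fallback."""
    for codec in sender_prefs:
        if codec in receiver_supported:
            return codec
    return "Damaka H.264"  # software fallback when no native match exists

# Example from Table 3: Android low-resolution layer sent to an iPhone
# whose hardware decoder supports only H.264.
android_low_prefs = ["Vp9", "Vp8", "H.264"]
iphone_hw_decoders = ["H.264"]
chosen = pick_codec(android_low_prefs, iphone_hw_decoders)
print(chosen)  # H.264
```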


Generally, the process described herein encodes both lower resolution video frames and difference video frames independently. The type of encoder used for lower resolutions can be different from the type of encoder used for higher resolution. For example, Vp9 can be used for low resolution encoding, while Vp8 (which may have built-in support in current devices) can be used for high resolution encoding. The process on the receiving end uses independent decoding and the synchronized addition of images.


Referring to FIG. 4, a flowchart illustrates one embodiment of a method 400 that may be used by a device to encode and send video information. In step 402, an original video frame is acquired. In step 404, the original video frame is scaled down. In step 406, the scaled down video frame is encoded to produce a Layer 0 output. In step 408, the Layer 0 output is transmitted or stored. In step 410, the Layer 0 output is decoded. In step 412, a difference between the Layer 0 output and the original video frame is obtained. In step 414, the difference is encoded to produce a Layer 1 output. This encoding is independent of the encoding in step 406 and may use a different video coding standard. In step 416, the Layer 1 output is transmitted or stored.


Referring to FIG. 5, a flowchart illustrates one embodiment of a method 500 that may be used by a device to decode received video information. In step 502, a Layer 0 frame and a Layer 1 frame are obtained. In step 504, the Layer 0 and Layer 1 frames are decoded. In step 506, the decoded Layer 0 frame is scaled up to match the resolution of the decoded Layer 1 frame. In step 508, the scaled up Layer 0 frame and the Layer 1 frame are added to create an additive frame. In step 510, the additive frame is displayed.


Referring to FIG. 6, a flowchart illustrates one embodiment of a method 600 that may be used by a device to establish video parameters. In step 602, video parameters are established during communications with a server and/or another device. In step 604, encoding is performed based on the established parameters. In step 606, the Layer 0 output is sent, and the Layer 1 output is sent if needed.


Referring to FIG. 7, one embodiment of an environment 700 illustrates (from the perspective of the device 102) communication devices 102, 104, 112, and 702 interacting on a conference call via a server 108/STU 110. In the present example, each device 102, 104, 112, and 702 may have the ability to transmit at multiple resolutions and to receive multiple streams of video of different participants. Accordingly, the STU 110 includes logic to determine such factors as what resolution(s) each device should use to send its video to the server 108, how many video streams each device should receive from the server 108, and how many “small” videos and “large” videos should be sent to a device. In the present example, a “small” video uses only Layer 0 frames and a “large” video uses the recreated frames formed by adding the Layer 0 and Layer 1 frames. Accordingly, a device may be showing users in a grid (generally “small” videos) and/or may have one user in a spotlight (a “large” video). The STU 110 then selects and transmits the video streams as needed.
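One way the STU's per-viewer selection could be expressed is sketched below. The mapping of grid tiles to Layer 0 only and spotlight views to both layers follows the description above; the function and field names are assumptions:

```python
# Illustrative STU forwarding logic: a "small" grid tile needs only the
# Layer 0 stream, while a spotlighted "large" view needs both layers so
# the receiver can recreate the full-resolution frame.

def layers_to_forward(view_size):
    if view_size == "small":
        return ["Layer 0"]
    return ["Layer 0", "Layer 1"]

# A receiver showing one spotlighted participant and two grid tiles:
views = {"device_104": "large", "device_112": "small", "device_702": "small"}
plan = {device: layers_to_forward(size) for device, size in views.items()}
```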


Compared to a simulcast conference call model, the described process may provide all required video streams while using less bandwidth (e.g., approximately fifteen to thirty percent less). The process may, in some situations, cause an additional delay (e.g., thirty-three to eighty milliseconds). It is understood that these examples may vary based on a large number of factors and are for purposes of illustration only. Adjustments may be made, for example, by reducing the bit rate, changing the maximum resolution, sending only Layer 0 frames, and/or dropping the frame rate.


Referring to FIGS. 8A-8D, embodiments of an environment 800 are illustrated within which various aspects of the present disclosure may be practiced. In previous embodiments, as shown with respect to FIG. 8A, the server 108/STU 110 was generally managing multiple devices with each device performing the encoding and decoding operations needed for that device. This distribution of encoding/decoding may enable the STU 110 to handle more devices for a particular conference session (e.g., may provide more scalability) as the encoding and decoding processes are offloaded to each device, rather than being performed by the server 108/STU 110. FIG. 8A may also illustrate the storage of encoded data from the device 102 and then the forwarding of the encoded data to the device 104 for decoding. However, in FIGS. 8B-8D, the server 108/STU 110 may perform encoding and/or decoding steps when communicating with a device.


Referring to FIG. 8B, the device 102 may be streaming (or may have previously streamed) video data to the server 108. It is understood that the video stream may be processed by the server 108 without use of the STU 110 or may be managed by the STU 110. The video stream may be sent in encoded format (e.g., using the video scaling optimization process disclosed herein) as shown and the server 108/STU 110 decodes the stream. The server 108/STU 110 then encodes the data prior to sending the data to the device 104, which decodes the data. In the illustration of FIG. 8B, it is understood that encoding/decoding negotiations may occur between each device 102, 104 and the server 108/STU 110, or the server 108/STU 110 may use information from negotiations between the devices 102 and 104 for its encoding and decoding.


Referring to FIG. 8C, the device 102 may be streaming (or may have previously streamed) video data to the server 108. It is understood that the video stream may be processed by the server 108 without use of the STU 110 or may be managed by the STU 110. However, the video stream is not in encoded format (e.g., does not use the video scaling optimization process disclosed herein) as shown and the server 108/STU 110 does not need to decode the stream. The server 108/STU 110 then encodes the data prior to sending the data to the device 104, which decodes the data. In the illustration of FIG. 8C, it is understood that encoding/decoding negotiations may occur between the device 104 and the server 108/STU 110.


Referring to FIG. 8D, the device 102 may be streaming (or may have previously streamed) video data to the server 108. It is understood that the video stream may be processed by the server 108 without use of the STU 110 or may be managed by the STU 110. The video stream may be sent in encoded format (e.g., using the video scaling optimization process disclosed herein) as shown and the server 108/STU 110 decodes the stream. The server 108/STU 110 then sends the data to the device 104 without encoding, and the device 104 does not need to decode the data. In the illustration of FIG. 8D, it is understood that encoding/decoding negotiations may occur between the device 102 and the server 108/STU 110.


As an example scenario using server-side encoding and decoding, the device 102 may stream video data to the server 108 for storage. The device 102 then goes offline. During a later communication session, the server 108/STU 110 retrieves the stored data and provides it to the device 104. As the device 104 was not able to negotiate the encoding/decoding parameters with the device 102, the server 108/STU 110 may perform encoding/decoding in order to establish the parameters with the device 104. It is understood that this process may be used with live streaming video call data, as well as with stored data. It is further understood that this server-side encoding and decoding may occur with only some devices (e.g., the device 102 of FIG. 1C) on a conference call, with other devices (e.g., the devices 104 and 112 of FIG. 1C) being managed as shown in FIG. 8A. This enables the server 108/STU 110 to manage exceptions on a per device basis, while still offloading as much of the encoding/decoding to the remaining devices as possible.


Referring to FIG. 9, one embodiment of a computer system 900 is illustrated. The computer system 900 is one possible example of a system component or computing device such as a communication device or a server. The computer system 900 may include a controller (e.g., a central processing unit (“CPU”)) 902, a memory unit 904, an input/output (“I/O”) device 906, and a network interface 908. The components 902, 904, 906, and 908 are interconnected by a transport system (e.g., a bus) 910. A power supply (PS) 912 may provide power to components of the computer system 900, such as the CPU 902 and memory unit 904. It is understood that the computer system 900 may be differently configured and that each of the listed components may actually represent several different components. For example, the CPU 902 may actually represent a multi-processor or a distributed processing system; the memory unit 904 may include different levels of cache memory, main memory, hard disks, and remote storage locations; the I/O device 906 may include monitors, keyboards, and the like; and the network interface 908 may include one or more network cards providing one or more wired and/or wireless connections to a network 916. Therefore, a wide range of flexibility is anticipated in the configuration of the computer system 900.


The computer system 900 may use any operating system (or multiple operating systems), including various versions of operating systems provided by Microsoft (such as WINDOWS), Apple (such as iOS or Mac OS X), Google (Android), UNIX, and LINUX, and may include operating systems specifically developed for handheld devices, personal computers, and servers depending on the use of the computer system 900. The operating system, as well as other instructions (e.g., for the processes and message sequences described herein), may be stored in the memory unit 904 and executed by the processor 902. For example, if the computer system 900 is the server 108 or a communication device 102, 104, 112, or 702, the memory unit 904 may include instructions for performing some or all of the message sequences and methods described with respect to such devices in the present disclosure.


The network 916 may be a single network or may represent multiple networks, including networks of different types. For example, the server 108 or a communication device 102, 104, 112, or 702 may be coupled to a network that includes a cellular link coupled to a data packet network, or a data packet link such as a wireless local area network (WLAN) coupled to a data packet network. Accordingly, many different network types and configurations may be used to establish communications between the server 108, the communication devices 102, 104, 112, and 702, other servers, and/or other components described herein.


Exemplary network, system, and connection types include the internet, WiMax, local area networks (LANs) (e.g., IEEE 802.11a and 802.11g wi-fi networks), digital audio broadcasting systems (e.g., HD Radio, T-DMB and ISDB-TSB), terrestrial digital television systems (e.g., DVB-T, DVB-H, T-DMB and ISDB-T), WiMax wireless metropolitan area networks (MANs) (e.g., IEEE 802.16 networks), Mobile Broadband Wireless Access (MBWA) networks (e.g., IEEE 802.20 networks), Ultra Mobile Broadband (UMB) systems, Flash-OFDM cellular systems, and Ultra wideband (UWB) systems. Furthermore, the present disclosure may be used with communications systems such as Global System for Mobile communications (GSM) and/or code division multiple access (CDMA) communications systems. Connections to such networks may be wireless or may use a line (e.g., digital subscriber lines (DSL), cable lines, and fiber optic lines).


Communication among the server 108, communication devices 102, 104, 112, 702, servers, and/or other components described herein may be accomplished using predefined and publicly available (i.e., non-proprietary) communication standards or protocols (e.g., those defined by the Internet Engineering Task Force (IETF) or the International Telecommunication Union Telecommunication Standardization Sector (ITU-T)), and/or proprietary protocols. For example, signaling communications (e.g., session setup, management, and teardown) may use a protocol such as the Session Initiation Protocol (SIP), while data traffic may be communicated using a protocol such as the Real-time Transport Protocol (RTP), File Transfer Protocol (FTP), and/or Hyper-Text Transfer Protocol (HTTP). A sharing session and other communications as described herein may be connection-based (e.g., using a protocol such as the transmission control protocol/internet protocol (TCP/IP)) or connection-less (e.g., using a protocol such as the user datagram protocol (UDP)). It is understood that various types of communications may occur simultaneously, including, but not limited to, voice calls, instant messages, audio and video, emails, document sharing, and any other type of resource transfer, where a resource represents any digital data.


While the preceding description shows and describes one or more embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present disclosure. For example, various steps illustrated within a particular sequence diagram or flow chart may be combined or further divided. In addition, steps described in one diagram or flow chart may be incorporated into another diagram or flow chart. Furthermore, the described functionality may be provided by hardware and/or software, and may be distributed or combined into a single platform. Additionally, functionality described in a particular example may be achieved in a manner different than that illustrated, but is still encompassed within the present disclosure. Therefore, the claims should be interpreted in a broad manner, consistent with the present disclosure.

Claims
  • 1. A method for optimizing video for transmission on a device based on the device's capabilities, the method comprising: capturing, by a camera associated with the device, an original video frame; scaling the original video frame down to a lower resolution video frame; encoding the lower resolution video frame using a first encoder to produce a first layer output; decoding the first layer output; upscaling the decoded first layer output to match a resolution of the original video frame; obtaining a difference between the upscaled decoded first layer output and the original video frame; and encoding the difference using a second encoder to create a second layer output, wherein the encoding to produce the second layer output occurs independently from the encoding to produce the first layer output.
  • 2. The method of claim 1 wherein the first and second encoders perform the encoding of the first and second output layers, respectively, using different video coding standards.
  • 3. The method of claim 1 wherein the first and second encoders perform the encoding of the first and second output layers, respectively, using identical video coding standards.
  • 4. The method of claim 1 further comprising communicating, by the device, with another device in order to determine which video coding standard is to be used to perform the encoding by each of the first and second encoders.
  • 5. The method of claim 1 further comprising sending the first and second output layers to another device during a video call.
  • 6. The method of claim 1 further comprising sending the first and second output layers to a storage device.
  • 7. A method for decoding video for display by a device, the method comprising: receiving an encoded first video frame and an encoded second video frame; independently decoding the encoded first and second video frames using a first decoder and a second decoder, respectively; upscaling the decoded first video frame to a resolution matching a resolution of the decoded second video frame; and adding the upscaled decoded first video frame and the decoded second video frame to create an additive video frame.
  • 8. The method of claim 7 wherein the first and second decoders perform the decoding of the encoded first and second video frames, respectively, using different video coding standards.
  • 9. The method of claim 7 wherein the first and second decoders perform the decoding of the encoded first and second video frames, respectively, using identical video coding standards.
  • 10. The method of claim 7 further comprising sending the additive video frame for display by the device.
  • 11. The method of claim 7 wherein receiving the encoded first video frame and the encoded second video frame includes retrieving the encoded first video frame and the encoded second video frame from a storage device.
  • 12. A device for sending and receiving optimized video frames, the device comprising: a processor; and a memory coupled to the processor, the memory having a plurality of instructions stored therein for execution by the processor, the plurality of instructions including instructions for scaling an original video frame down to a lower resolution video frame; encoding the lower resolution video frame using a first encoder to produce a first layer output; decoding the first layer output; upscaling the decoded first layer output to match a resolution of the original video frame; obtaining a difference between the upscaled decoded first layer output and the original video frame; and encoding the difference using a second encoder to create a second layer output, wherein the encoding to produce the second layer output occurs independently from the encoding to produce the first layer output.
  • 13. The device of claim 12 wherein the first and second encoders perform the encoding of the first and second output layers, respectively, using different video coding standards.
  • 14. The device of claim 12 wherein the first and second encoders perform the encoding of the first and second output layers, respectively, using identical video coding standards.
  • 15. The device of claim 12 wherein the instructions further include communicating with another device in order to determine which video coding standard is to be used to perform the encoding by each of the first and second encoders.
  • 16. The device of claim 12 wherein the instructions further include sending the first and second output layers to another device during a video call.
  • 17. The device of claim 12 wherein the instructions further include sending the first and second output layers to a storage device.
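The two-layer scheme recited in claims 1, 7, and 12 can be illustrated with a minimal sketch. This is not the patented implementation: the first and second encoders are stood in for by simple uniform quantizers (a real system would use video codecs, possibly different standards per claim 2), the scaling steps use 2x block averaging and nearest-neighbor upscaling, and all function names and step sizes below are hypothetical.

```python
import numpy as np

def downscale(frame: np.ndarray) -> np.ndarray:
    """Average 2x2 blocks to halve each dimension (claim 1: scaling down)."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale(frame: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upscaling (claims 1 and 7: upscaling)."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def encode(frame: np.ndarray, step: float) -> np.ndarray:
    """Stand-in for a video encoder: uniform quantization."""
    return np.round(frame / step).astype(np.int32)

def decode(code: np.ndarray, step: float) -> np.ndarray:
    """Stand-in for the matching decoder."""
    return code.astype(np.float64) * step

def encode_two_layers(original, base_step=8.0, enh_step=2.0):
    # First layer: encode the downscaled frame.
    layer1 = encode(downscale(original), base_step)
    # Reconstruct what a receiver would recover from the first layer alone.
    recon = upscale(decode(layer1, base_step))
    # Second layer: encode the residual, independently of the first encoder.
    layer2 = encode(original - recon, enh_step)
    return layer1, layer2

def decode_two_layers(layer1, layer2, base_step=8.0, enh_step=2.0):
    # Decode both layers independently, upscale the first, and add (claim 7).
    return upscale(decode(layer1, base_step)) + decode(layer2, enh_step)

frame = np.random.default_rng(0).uniform(0, 255, (8, 8))
l1, l2 = encode_two_layers(frame)
out = decode_two_layers(l1, l2)
# Additive reconstruction error is bounded by half the enhancement step.
assert np.max(np.abs(out - frame)) <= 1.0
```

Under these assumptions, the base layer alone yields a coarse low-resolution picture, while adding the independently coded residual layer recovers the full-resolution frame to within the enhancement quantizer's step, which is the capability-based trade-off the claims describe.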
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 63/192,051, filed on May 23, 2021, and entitled “SYSTEM AND METHOD FOR OPTIMIZING VIDEO COMMUNICATIONS BASED ON DEVICE CAPABILITIES,” which is hereby incorporated by reference in its entirety.

US Referenced Citations (150)
Number Name Date Kind
5442637 Nguyen Aug 1995 A
5612744 Lee Mar 1997 A
5761309 Ohashi et al. Jun 1998 A
5790637 Johnson et al. Aug 1998 A
5818447 Wolf et al. Oct 1998 A
5889762 Pajuvirta et al. Mar 1999 A
6031818 Lo et al. Feb 2000 A
6041078 Rao Mar 2000 A
6128283 Sabaa et al. Oct 2000 A
6141687 Blair Oct 2000 A
6161082 Goldberg et al. Dec 2000 A
6195694 Chen et al. Feb 2001 B1
6202084 Kumar et al. Mar 2001 B1
6219638 Padmanabhan et al. Apr 2001 B1
6298129 Culver et al. Oct 2001 B1
6311150 Ramaswamy et al. Oct 2001 B1
6343067 Drottar et al. Jan 2002 B1
6360196 Poznanski et al. Mar 2002 B1
6389016 Sabaa et al. May 2002 B1
6438376 Elliott et al. Aug 2002 B1
6473425 Bellaton et al. Oct 2002 B1
6574668 Gubbi et al. Jun 2003 B1
6606112 Falco Aug 2003 B1
6654420 Snook Nov 2003 B1
6674904 McQueen Jan 2004 B1
6741691 Ritter et al. May 2004 B1
6754181 Elliott et al. Jun 2004 B1
6766373 Beadle et al. Jul 2004 B1
6826613 Wang et al. Nov 2004 B1
6836765 Sussman Dec 2004 B1
6842460 Olkkonen et al. Jan 2005 B1
6850769 Grob et al. Feb 2005 B2
6898413 Yip et al. May 2005 B2
6912278 Hamilton Jun 2005 B1
6940826 Simard et al. Sep 2005 B1
6963555 Brenner et al. Nov 2005 B1
6975718 Pearce et al. Dec 2005 B1
6987756 Ravindranath et al. Jan 2006 B1
6999575 Sheinbein Feb 2006 B1
6999932 Zhou Feb 2006 B1
7006508 Bondy et al. Feb 2006 B2
7010109 Gritzer et al. Mar 2006 B2
7013155 Ruf et al. Mar 2006 B1
7079529 Khuc Jul 2006 B1
7080158 Squire Jul 2006 B1
7092385 Gallant et al. Aug 2006 B2
7117526 Short Oct 2006 B1
7123710 Ravishankar Oct 2006 B2
7184415 Chaney et al. Feb 2007 B2
7185114 Hariharasubrahmanian Feb 2007 B1
7272377 Cox et al. Sep 2007 B2
7302496 Metzger Nov 2007 B1
7304985 Sojka et al. Dec 2007 B2
7345999 Su et al. Mar 2008 B2
7346044 Chou et al. Mar 2008 B1
7353252 Yang et al. Apr 2008 B1
7353255 Acharya et al. Apr 2008 B2
7412374 Seiler et al. Aug 2008 B1
7457279 Scott et al. Nov 2008 B1
7477282 Firestone et al. Jan 2009 B2
7487248 Moran et al. Feb 2009 B2
7512652 Appelman et al. Mar 2009 B1
7542472 Gerendai et al. Jun 2009 B1
7546334 Redlich Jun 2009 B2
7564843 Manjunatha et al. Jul 2009 B2
7570743 Barclay et al. Aug 2009 B2
7574523 Traversat et al. Aug 2009 B2
7590758 Takeda et al. Sep 2009 B2
7613171 Zehavi et al. Nov 2009 B2
7623476 Ravikumar et al. Nov 2009 B2
7623516 Chaturvedi et al. Nov 2009 B2
7656870 Ravikumar et al. Feb 2010 B2
7664495 Bonner et al. Feb 2010 B1
7769881 Matsubara et al. Aug 2010 B2
7774495 Pabla et al. Aug 2010 B2
7778187 Chaturvedi et al. Aug 2010 B2
7782866 Walsh et al. Aug 2010 B1
7917584 Arthursson Mar 2011 B2
8009586 Chaturvedi et al. Aug 2011 B2
8065418 Abuan et al. Nov 2011 B1
8135232 Kimura Mar 2012 B2
8200796 Margulis Jun 2012 B1
8402551 Lee Mar 2013 B2
8407314 Chaturvedi et al. Mar 2013 B2
8407576 Yin et al. Mar 2013 B1
8447117 Liao May 2013 B2
8560642 Pantos et al. Oct 2013 B2
8611540 Chaturvedi et al. Dec 2013 B2
8990877 Hart Mar 2015 B2
9143489 Chaturvedi et al. Sep 2015 B2
9356997 Chaturvedi et al. May 2016 B2
9742846 Chaturvedi et al. Aug 2017 B2
10091258 Carter et al. Oct 2018 B2
10097638 Chaturvedi et al. Oct 2018 B2
10147202 Nystad Dec 2018 B2
10834256 Nair et al. Nov 2020 B1
10887549 Wehrung et al. Jan 2021 B1
10924709 Faulkner et al. Feb 2021 B1
11315158 Lidster et al. Apr 2022 B1
20020112181 Smith Aug 2002 A1
20030036886 Stone Feb 2003 A1
20030164853 Zhu et al. Sep 2003 A1
20040091151 Jin May 2004 A1
20040141005 Banatwala et al. Jul 2004 A1
20050071678 Lee et al. Mar 2005 A1
20050138110 Redlich Jun 2005 A1
20050147212 Benco et al. Jul 2005 A1
20050193311 Das Sep 2005 A1
20060195519 Slater et al. Aug 2006 A1
20060233163 Celi et al. Oct 2006 A1
20070003044 Liang et al. Jan 2007 A1
20080005666 Sefton Jan 2008 A1
20080037753 Hofmann Feb 2008 A1
20080163378 Lee Jul 2008 A1
20090178019 Bahrs Jul 2009 A1
20090178144 Redlich Jul 2009 A1
20090254572 Redlich Oct 2009 A1
20090282251 Cook et al. Nov 2009 A1
20100005179 Dickson Jan 2010 A1
20100064344 Wang Mar 2010 A1
20100158402 Nagase Jun 2010 A1
20100202511 Shin Aug 2010 A1
20100250497 Redlich Sep 2010 A1
20100299529 Fielder Nov 2010 A1
20110044211 Long et al. Feb 2011 A1
20110110603 Ikai May 2011 A1
20110129156 Liao Jun 2011 A1
20110145687 Grigsby et al. Jun 2011 A1
20110164824 Kimura Jul 2011 A1
20120030733 Andrews Feb 2012 A1
20120064976 Gault et al. Mar 2012 A1
20120173971 Sefton Jul 2012 A1
20120252407 Poltorak Oct 2012 A1
20120321083 Phadke Dec 2012 A1
20130051476 Morris Feb 2013 A1
20130063241 Simon Mar 2013 A1
20130091290 Hirokawa Apr 2013 A1
20140096036 Mohler Apr 2014 A1
20140185801 Wang Jul 2014 A1
20150295777 Cholkar et al. Oct 2015 A1
20160057391 Block et al. Feb 2016 A1
20160234264 Coffman et al. Aug 2016 A1
20170249394 Loeb et al. Aug 2017 A1
20180176508 Pell Jun 2018 A1
20190273767 Nelson et al. Sep 2019 A1
20200274965 Ravichandran Aug 2020 A1
20200301647 Yoshida Sep 2020 A1
20200382618 Faulkner et al. Dec 2020 A1
20210099574 Nair et al. Apr 2021 A1
20220086197 Lohita et al. Mar 2022 A1
Foreign Referenced Citations (16)
Number Date Country
1603339 Dec 2005 EP
1638275 Mar 2006 EP
1848163 Oct 2007 EP
1988698 Nov 2008 EP
1404082 Oct 2012 EP
1988697 Feb 2018 EP
2005094600 Apr 2005 JP
2005227592 Aug 2005 JP
2007043598 Feb 2007 JP
20050030548 Mar 2005 KR
03079635 Sep 2003 WO
2005009019 Jan 2005 WO
2004063843 Mar 2005 WO
2006064047 Jun 2006 WO
2006075677 Jul 2006 WO
2008099420 Dec 2008 WO
Non-Patent Literature Citations (28)
Entry
Balamurugan Karpagavinayagam et al. (Monitoring Architecture for Lawful Interception in VoIP Networks, ICIMP 2007, Aug. 24, 2008).
Blanchet et al.; “IPv6 Tunnel Broker with the Tunnel Setup Protocol (TSP)”; May 6, 2008; IETF; IETF draft of RFC 5572, draft-blanchet-v6ops-tunnelbroker-tsp-04; pp. 1-33.
Chathapuram, “Security in Peer-To-Peer Networks”, Aug. 8, 2001, XP002251813.
Cooper et al; “NAT Traversal for dSIP”; Feb. 25, 2007; IETF; IETF draft draft-matthews-p2psip-dsip-nat-traversal-00; pp. 1-23.
Cooper et al; “The Effect of NATs on P2PSIP Overlay Architecture”; IETF; IETF draft draft-matthews-p2psip-nats-and-overlays-01.txt; pp. 1-20.
Dunigan, Tom, “Almost TCP over UDP (atou),” last modified Jan. 12, 2004; retrieved on Jan. 18, 2011 from 18 pgs.
Hao Wang, Skype VoIP service-architecture and comparison, In: INFOTECH Seminar Advanced Communication Services (ASC), 2005, pp. 4, 7, 8.
Isaacs, Ellen et al., “Hubbub: A sound-enhanced mobile instant messenger that supports awareness and opportunistic interactions,” Proceedings of the SIGCHI Conference On Human Factors in Computing Systems; vol. 4, Issue No. 1; Minneapolis, Minnesota; Apr. 20-25, 2002; pp. 179-186.
J. Rosenberg et al., SIP: Session Initiation Protocol (Jun. 2008) retrieved at http://tools.ietf.org/html/rfc3261. Relevant pages provided.
J. Rosenberg et al. “Session Traversal Utilities for NAT (STUN)”, draft-ietf-behave-rfc3489bis-06, Mar. 5, 2007.
Jeff Tyson, “How Instant Messaging Works”, www.verizon.com/learningcenter, Mar. 9, 2005.
Mahy et al., The Session Initiation Protocol (SIP) “Replaces” Header, Sep. 2004, RFC 3891, pp. 1-16.
NiceLog User's Manual 385A0114-08 Rev. A2, Mar. 2004.
Pejman Khadivi, Terence D. Todd and Dongmei Zhao, “Handoff trigger nodes for hybrid IEEE 802.11 WLAN/cellular networks,” Proc. Of IEEE International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, pp. 164-170, Oct. 18, 2004.
Philippe Bazot et al., Developing SIP and IP Multimedia Subsystem (IMS) Applications (Feb. 5, 2007) retrieved at redbooks IBM form No. SG24-7255-00. Relevant pages provided.
Qian Zhang; Chuanxiong Guo; Zihua Guo; Wenwu Zhu, “Efficient mobility management for vertical handoff between WWAN and WLAN,” Communications Magazine, IEEE, vol. 41, issue 11, Nov. 2003, pp. 102-108.
RFC 5694 (“Peer-to-Peer (P2P) Architecture: Definition, Taxonomies, Examples, and Applicability”, Nov. 2009).
Rory Bland, et al,“P2P Routing” Mar. 2002.
Rosenberg, “STUN—Simple Traversal of UDP Through NAT”, Sep. 2002, XP015005058.
Rosenberg, J.; “Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols”; Oct. 29, 2007; IETF; IETF draft of RFC 5245, draft-ietf-mmusic-ice-19; pp. 1-120.
Salman A. Baset, et al, “An Analysis Of The Skype Peer-To-Peer Internet Telephony Protocol”, Department of Computer Science, Columbia University, New York, NY, USA, Sep. 15, 2004.
Seta, N.; Miyajima, H.; Zhang, L.; Fujii, T., “All-SIP Mobility: Session Continuity on Handover in Heterogeneous Access Environment,” Vehicular Technology Conference, 2007. VTC 2007—Spring. IEEE 65th, Apr. 22-25, 2007, pp. 1121-1126.
Singh et al., “Peer-to Peer Internet Telephony Using SIP”, Department of Computer Science, Columbia University, Oct. 31, 2004, XP-002336408.
Sinha, S. and Oglieski, A., A TCP Tutorial, Nov. 1998 (Date posted on Internet: Apr. 19, 2001) [Retrieved from the Internet ].
Srisuresh et al.; “State of Peer-to-Peer (P2P) Communication Across Network Address Translators (NATs)”; Nov. 19, 2007; IETF; IETF draft for RFC 5128, draft-ietf-behave-p2p-state-06.txt; pp. 1-33.
T. Dierks & E. Rescorla, The Transport Layer Security (TLS) Protocol (Ver. 1.2, Aug. 2008) retrieved at http://tools.ietf.org/html/rfc5246. Relevant pages provided.
Wireless Application Protocol—Wireless Transport Layer Security Specification, Version 18—Feb. 2000, Wireless Application Forum, Ltd. 2000; 99 pages.
WISPA: Wireless Internet Service Providers Association; WISPA-CS-IPNA-2.0; May 1, 2009.
Provisional Applications (1)
Number Date Country
63192051 May 2021 US