The present disclosure generally relates to real-time screen recording and content sharing; and in particular relates to a system architecture for configuring devices to form a decentralized, device-to-device (D2D), edge, and/or “Fog” network and further configuring the devices to provide real-time screen and content sharing functionality over the network via an improved streaming protocol which provides high throughput and reduced latency, despite the technical challenges associated with real-time content sharing across decentralized networks.
Content sharing between devices is desirable in a number of different scenarios, and each scenario may present its own unique challenges. For example, it may be desirable to share content between mobile devices associated with workers at a mining site or on a cruise ship; yet, physical obstructions or proximity to radio towers may result in general bandwidth restrictions which may limit functionality and streaming quality. Further, the type of content being shared can affect performance and implementation requirements. For example, a screen of a desktop computer to be shared may reflect general movement patterns of sliding, page up, and page down and may define content that does not change significantly most of the time, whereas a screen of a third party application such as a game may define content that changes substantially over a short period of time.
Most conventional mobile platforms for content creation and sharing are cloud-based. Notwithstanding the tremendous success of cloud-based solutions, a consensus is that there are some open issues associated with cloud-based services, including unpredictable latency and jitter, privacy/security concerns, and demands on network bandwidth and storage.
It is with these observations in mind, among others, that various aspects of the present disclosure were conceived and developed.
The present patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Corresponding reference characters indicate corresponding elements among the views of the drawings. The headings used in the figures do not limit the scope of the claims.
The present disclosure is directed to embodiments of a decentralized, D2D, peer-to-peer, and/or fog-based system architecture for connecting and sharing content between devices. In particular, the system architecture may include a service application installed to and executable by two or more devices for forming a decentralized network, such as a D2D network, fog network, edge network, or any ad hoc network, and connecting the devices via the network. In addition, the service application may define or otherwise be implemented in conjunction with an improved streaming protocol (RTScreen) that configures devices for, via the network, real-time screen recording/sharing, real-time collaborative content creation, and live content sharing via screen broadcast between the devices as described herein. In some embodiments, the subject screen recording/sharing may be provided directly between the devices of the network. Using the network and service application as described, the devices do not require connection to the Internet or an intermediate device such as a managing server, administrator, or network device. Collectively, the various components of the system architecture provide screen-sharing and/or streaming of content from one streaming device to another receiving device or many other receiving devices, as described herein. Advantageously, it should be understood that each device of the network described herein, configured with the service application, may function as a streaming device and/or as a receiving device.
The present system architecture, including the improved streaming protocol dubbed RTScreen described herein, is configured to address specific technical challenges associated with media content sharing in the context of D2D or peer-to-peer networking and computing. First, full screen recording remains largely undeveloped with respect to conventional mobile device operating systems. For example, full screen recording was disallowed on iOS devices until at least the summer of 2017. Second, real-time screen sharing has its own unique characteristics. For example, a computer screen includes movement patterns associated with sliding, navigating a page up, and/or navigating a page down, where the content does not change significantly most of the time; whereas sharing the screen of a third party application such as a game leads to drastic changes of the screen content. Further, real-time streaming of a high resolution screen directly between devices makes it more difficult to carry out buffering and adaptive usage of bandwidth, which is fundamentally different from on-demand video streaming over the Internet where more aggressive buffering can be employed. The situation gets even more complicated in a system that allows anyone with the application to be a streamer or a watcher, and where multiple streaming sessions are competing for limited local WiFi bandwidth in the same network.
Responsive to the aforementioned technical challenges, a variety of technical solutions are described herein as part of the disclosed system architecture, associated with RTScreen or otherwise. In particular, for example, RTScreen includes screen content streaming and screen content management protocol features for implementing closed-loop (both feedback and feedforward) control between a streaming device and a receiving device (or devices) to enable real-time adaptation, thereby significantly enhancing real-time streaming performance. RTScreen further includes features for adaptively updating a streaming rate based on changes to screen content; features for unified dynamic packetization, active frame duplication, and selective packet transmission to increase the likelihood of packet delivery (associated with media content) even under congested network conditions; automatic screen-to-screen delay control for dynamically adjusting a receiving device buffer based on network quality; smart stream congestion detection; features for traffic shaping and packet scheduling to improve streaming performance with multiple different streaming devices; and overrun control for managing temporary reductions in processing resources; such protocol features are collectively designed to address unique challenges with content sharing over a D2D network, as further described herein.
Numerous use cases or possible applications of the present novel concept are contemplated. For example, when delivering a digital presentation via a presenter device, a presenter may open a set of presentation slides within the described service application (executed by the presenter device), or while the application is running in the background. Other users associated with devices connected to the presenter device may then “connect” to the presenter's application using an authentication token generated by the presenter's application. The screen of the presenter's application may then be shared, via the live screen recording and streaming technology described herein, with the connected devices in a real-time manner. In this way, the users may be able to view whatever the host (the presenter in this example) is viewing on the host device. As another example, a user may share multimedia content (e.g., music or video) with other users connected within the same network in real time. Embodiments of the system architecture for forming a network for real-time media content sharing between devices of the network are generally described as 100 or 100A in
Referring to
Advantageously, it should be understood that the devices 101, configured with the application 102, may also function both as streaming and/or receiving devices, such that the receiving device 106 may also be configured to share with and stream content to the streaming device 104. In addition,
The network 108 as illustrated may include any decentralized network such as a D2D network, a fog-based network, a peer-to-peer network, an edge network, a D2D or peer-to-peer content delivery network (CDN), or any form of ad hoc network, such that the network 108 provides a decentralized logical infrastructure for media sharing between the devices 101. In some embodiments, the network 108 is devoid of a central, core, or intermediate entity such as a base station, router, intermediate resource management, or other intermediate network element, such that, e.g., the streaming device 104 establishes a direct connection with the receiving device 106 (and vice versa) and the devices are in direct communication with one another via in-band or out-of-band frequencies and/or direct connections (examples of which are described herein). In this manner, in some embodiments, the network 108 leverages the close physical proximity of the devices 101 to one another while also leveraging the various networking advantages of direct communication between the devices 101 for media content sharing or otherwise, as described herein.
In some embodiments, the network 108 includes a fog network, which refers to extending computing to the edge of a network, also known as edge computing, or fogging. In other words, fog networking facilitates the operation of compute, storage, and networking services between end devices. With a fog network, by interconnecting devices (directly or otherwise) at the network edge, data may be processed more efficiently rather than having to transmit data to a cloud or intermediate device for processing.
In some embodiments, the network 108 may further define a peer-to-peer overlay network or ad hoc network. A peer-to-peer overlay network relates to a network in which member nodes obtain services in the absence of server-based infrastructure. In a peer-to-peer overlay, peer nodes co-operate with each other both to provide services and to maintain the network. Peer-to-peer overlay networks can be built on top of an underlying network that provides host-to-host connectivity, such as a network utilizing an Internet Protocol (IP). Examples of an overlay network include, but are not limited to, the Internet (e.g., an overlay on a telephone network), Chord, Content Addressable Network (CAN), Pastry, and Viceroy. In general, nodes in an overlay may not all be able to communicate with each other pairwise. However, they may all communicate with one another via the overlay through multi-hop underlay routes. An ad hoc network is a temporary, self-organizing, device-to-device or computer-to-computer network connection (generally wireless) for establishing a connection between devices without having to connect to an access point, router, or other such network device. An ad hoc network operates without any need for infrastructure such as a central entity including a base station or networking device, may utilize multi-hop radio relay concepts, and may hence be considered a multi-hop network.
The network 108 may also include a software-defined network. In a software-defined network, a network can be managed and modified without the need to interact with physical switches or other network devices. A software-defined controller or other software element directs the delivery of network services as desired, regardless of the specific connections between the devices. The controller (or control layer) further manages the policies and flow of traffic through a network. Where a traditional network would involve a specialized appliance or network device, a software-defined network replaces such an appliance with an application that uses the controller to manage data plane behavior and other network functions.
In some embodiments, the network 108 may include an Internet of Things network. Loosely, the term “Internet of Things” or “IoT” (or “Internet of Everything” or “IoE”) refers to uniquely identifiable objects (things) and their virtual representations in a network-based architecture. In particular, the IoT network relates to the ability to connect more than just computers and communications devices, but rather the ability to connect “objects” in general, such as lights, appliances, vehicles, heating, ventilating, and air-conditioning (HVAC), windows and window shades and blinds, doors, locks, displays, etc. The “Internet of Things” thus generally refers to the interconnection of objects (e.g., smart objects), such as sensors and actuators, over a computer network (e.g., via IP), which may be the public Internet or a private network. In these embodiments, as a simple non-limiting example, the receiving device 106 may include a display of a refrigerator or other smart device of an IoT network.
In some embodiments, the network 108 may include a general CDN, a private CDN, or a peer-to-peer CDN. As such, the streaming device 104 may be considered an original content distributor or point of presence (POP) of the network 108, and may share media content with the receiving devices 106. In this embodiment, a variety of unicasting and/or multicasting techniques may be implemented to optimize bandwidth consumption.
In any event, the network 108 is configured to be decentralized in the manner described by any of the embodiments described herein in order to improve upon and address the limitations of, e.g., the general Transmission Control Protocol (TCP), which may result in packet loss and may be prone to errors under network congestion. The system 100 is uniquely designed to optimize and improve media content sharing in the context of any of the decentralized networks described herein.
Different embodiments of the system 100 are contemplated. For example, referring to
Each of the streaming device 104A/104B, the receiving devices 106A-106F, or broker device 114 may include a name or identifier, typically the same as the hostname or an arbitrary name set by the device owner. The streaming devices 104A-104B may access or grab the media content 112 in the form of screen content frames (sometimes referred to herein as “frames”), may record audio using speakers, microphones, or other audio components of the streaming devices 104A-104B, and transmit this media content 112 to the receiving devices 106A-106F and optionally the broker device 114. In turn, the receiving devices 106A-106F receive screen frames and audio segments from the streaming device 104A/104B or a broker and render them. The broker device 114 may include an intermediate server that receives screen frames from a streaming device 104 and forwards them to one or more receiving devices 106. From the streamer side, the broker device 114 is the same as a receiving device 106 (while the connection initiation is in the reverse direction), and from the receiving device 106 side, the broker device 114 is the same as the streaming device 104. In some embodiments, the broker device 114 is a server with a public IP address accessible via the Internet that allows devices 101 behind firewalls to share screens over the Internet.
The present disclosure is directed mainly to local D2D-based fog network streaming of a screen and/or audio file without the broker device 114, for simplicity. In addition, audio streaming is similar to screen frame streaming, such that the present disclosure mainly focuses on the streaming of a screen or video file from one device to another. In some embodiments, the traffic pattern mainly defines or relates to a publish-subscribe model (with feedback from the receiving device 106 to the streaming device 104, and/or vice versa), and the present disclosure describes the additional components of the system 100 with the assumption that only a sole streaming device 104 is implemented, although the present disclosure is not limited in this regard.
Referring to the process flow 200 of
At block 202, a host device of the devices 101 (shown as 220 in
Referring to
Referring to
Further in block 202, an authentication process may be initiated in which the host device 220 generates a password for authenticating a particular one of the client devices 222 to the host device 220. Once this authentication is completed, the network 108 is generally configured for accommodating the real time screen recording and content sharing functionality described herein. In addition, upon being authenticated, future sharing sessions may no longer require authentication from client devices 222 that have previously been authenticated or have otherwise previously provided approval for forming the network 108 with the host device 220.
Referring to block 204, media content, in the form of a document or content presented in another application, including text content, audio, image/audio objects, or other media content associated with e.g., the streaming device 104A of the network 108, may be accessed and may be recorded or otherwise temporarily stored. In some embodiments, the application screen of the streaming device 104A is captured and recorded, in a continuous manner, using screen recording application programming interfaces (APIs) associated with, e.g., Android and iOS. Recording the media content in this manner may initially include capturing screen updates.
More specifically, in some embodiments, recording of the screen of the streaming device 104A may include two different approaches: (i) in-app screen recording 226; and/or (ii) full screen recording 228, as depicted in
Conversely, the full screen recording 228 functions differently, as elaborated further below. Since there is no in-app recording in this case, a background service may be used to record the full screen of the streaming device 104A. In other words, in this example, the application 102, or at least aspects of the application 102, may take the form of a background service. This background service can only be enabled by the user, so it is generally considered secure: without user intervention, the service cannot be enabled to commence screen recording. As some examples, full screen broadcast may be enabled from the iOS control center; i.e., on iOS devices, this service may be available natively in the operating system itself. There are a few technical challenges in sharing the screen using this approach. (A) The background service has limited resources in terms of CPU and memory, and processing and transmitting large-size frames is difficult with such limited resources. To resolve this issue, hardware-enabled encoding may be used, because it is fast and uses very few resources. (B) The second challenge is to run a video streaming server within the background service to support live streaming in a fog or general decentralized network. Addressing these challenges is described in greater detail below.
At block 206, the media content recorded by the streaming device 104A may be encoded for transmission to other devices 101. Referring to
Referring to
Upon receiving the encoded frames 234, real time sharing of the aforementioned media content between the streaming device 104A and one or more of the receiving devices 106 continues as the receiving devices 106 decode the encoded frames 234 and display the data using the presentation timestamp provided by the streaming device 104A.
Returning to the steps of
The implementation of streaming may include generating the encoded frames 234 in order to send them over UDP using either multicast or multiple unicast as described. One main challenge that arises with the implementation of the Real-time Transport Protocol (RTP) in this context is the potential for data loss or corrupted data during transmission, which may lead to corrupted video on the receiving end. To resolve these issues, larger frames need to be further fragmented into smaller packets and multiple tiny frames need to be combined into one single packet. In some embodiments, an existing implementation of RTP may be sufficient for streaming. The benefit of using RTP is that it pairs with the Real Time Streaming Protocol (RTSP), which can fulfill the session creation requirements and adjust an encoder according to the other receiving devices in the network. However, the existing RTP implementation can have noticeable latency and is resource intensive, so it may not work well with a background service, as in the case where the streaming device 104A is configured for full-screen recording 228.
Stream Session, Stream Management (Streamer and Watcher):
RTScreen, a possible component of the system 100 and system 100A, is an improved streaming protocol that enhances real-time streaming performance. RTScreen may comprise a streaming policy, protocol(s), or platform implemented completely in software on commercial off-the-shelf smartphones running Android and iOS and desktop computers running the Linux, Windows, and Mac OS operating systems, connected to a TCP/IP network through WiFi or Ethernet, and may be integrated with the application 102 or separately implemented. Any streaming device 104 and/or host machine of the devices 101 may be equipped with RTScreen via the application 102 (or a separate application) and can function as a streaming or receiving device in a screen sharing group.
The remaining discussion is directed to implementation of RTScreen for streaming/sharing media content once the network 108 is formed among the devices 101 according to
Multiple stream sessions may be generated between the same pair of a streamer and watcher to stream the same screen content, while each session may have different quality configurations. Each session may define a session ID, an identifier that is typically only meaningful with respect to the same streamer. Hence, in a system with multiple streamers or with a broker (broker device 114), a stream session may be uniquely identified by the streamer name and the session ID (or broker name and the session ID) at the same time. Note that if the screen sharing involves one streamer, one broker, and one watcher, two sessions are required: one from the streamer to the broker and the other from the broker to the watcher.
Different stream sessions can connect one or more watchers to the same media object or different media objects of one or more streamers. For example, if the streaming machine/device 104 has multiple screens, the watcher can create multiple video objects to stream these screens independently. However, this is a rare case. Typically, different video objects contain frames from the same screen but with different quality parameters. Each media object may include a media object ID, usually a human-understandable alphanumeric string with a quality-level suffix. Similar to the session ID, the media object ID is only meaningful to the same streamer. In a system with multiple streamers or with brokers, a media object must be uniquely identified by both the streamer name and the media object ID. The broker does not change the media object ID when forwarding the stream.
Software associated with the streamer and watcher may contain two planes (similar to an IP router): a data plane and a management plane. The data plane is responsible for transferring the actual screen frames and audio using a UDP-based RTScreen data stream protocol; and the management plane is responsible for setting up and tearing down stream sessions as well as applying the adaptive streaming policy via a TCP-based RTScreen management protocol as defined herein. As a result, similar to the stream session, there is a stream management session between a pair of a streamer and watcher (or streamer and broker, or broker and watcher), which is uniquely identified by a pair of streamer name and watcher name (or streamer name and broker name, or broker name and watcher name).
A watcher's management plane may contain components 370 including a stream management client 372 and an adaptive streaming policy 374. The stream management client 372 connects to the stream management server 322 on the streamer side. The adaptive streaming policy 374 module is discussed in greater detail below.
Procedure of Streaming and Messages:
As previously described, the behavior of the streamer and the watcher may be categorized into two planes: the management plane and the data plane. Further, two independent protocols may be used for the two planes: a TCP-based request-response style RTScreen management protocol for the management plane, and a UDP-based RTScreen data stream protocol for the data plane. One possible procedure of streaming screen frames in the management plane is shown as follows, and also illustrated in
One possible procedure of streaming screen frames in the data plane is shown as follows, and also illustrated in
Emergent conditions may be triggered by a user interface (UI) of the application 102; e.g., the app is closed, or switched to background, etc.
Referring to
Details of Data Plane and Timing:
Referring to
The streaming of audio may be conducted using a similar procedure.
According to a possible timing model of RTScreen, it is assumed that the streamer and the watcher are synchronized with the same clock in millisecond precision (for the convenience of explanation). This possible timing model is defined as follows:
On the streamer side, a presentation time may be attached to each frame and sent along with the frame to help the watcher render the frames smoothly. Typically, the presentation time of a frame is just the grabber_ts.
On the watcher side, a frame has “arrived” if a certain percentage of the whole frame is received. This percentage is a parameter of the watcher (typically set to 95%). On the watcher side, for each iteration, the “current time” is typically defined as the watcher_ts. A frame may be considered timely or “on time” if the current time matches the presentation time within a specified delay, as further described herein.
Message Format of the RTScreen Data Stream Protocol:
In one embodiment, the RTScreen Data Stream Protocol is a stateless binary protocol based on UDP. The streamer and the watcher mainly exchange three types of messages: PACKET of a screen frame or audio segment, FEEDBACK from the watcher to the streamer, and FEEDFORWARD from the streamer to the watcher containing important statistics that assist the adaptive streaming policy. Each type of message has a few variations for different purposes. A list of possible messages is provided below:
In some embodiments, a few rules may be implemented for message format as follows: all integer fields in the messages may be little endian; and all timestamps and all types of round trip time may be in milliseconds.
Packet of a Screen Frame:
As mentioned above, a screen frame may be encoded, packetized, and transmitted from a streamer (streaming device 104) to a watcher (receiving device 106). One possible format of the packet of a screen frame is illustrated in
FEEDBACK of Individual Packet of Screen Frame:
The feedback of an individual packet of a screen frame may be in the form of a FEEDBACK message sent by the watcher/receiving device 106 immediately upon receiving each PACKET message. One possible format is illustrated in
FEEDBACK of a Whole Screen Frame:
The feedback of a whole screen frame is a FEEDBACK message sent by the watcher/receiving device 106 after the whole frame is rendered. The format is illustrated in
FEEDFORWARD of a Whole Stream Session (Periodical):
The periodical feedforward of a whole stream session may be sent by the streamer to the watcher, mainly to deliver timing- and congestion-related information to help the watcher make adaptive streaming decisions. One possible format (periodical) is illustrated in
FEEDFORWARD of a Whole Stream Session (Emergent):
The emergent feedforward of a whole stream session is sent by the streamer to the watcher, usually denoting emergency conditions that require tearing down the stream session (or downgrading the stream session, which is essentially the same as tearing down this session). One possible format is illustrated in
Message Format of the RTScreen Management Protocol:
In one embodiment, the RTScreen management protocol is a text based protocol with payload of JSON objects. It is a stateless request-response protocol. Possible messages may include the following:
LIST message
The LIST message allows the watcher to browse the available media objects on a streamer or a proxy. The format of the LIST message is shown as follows:
Response of the LIST Message
The response of the LIST message provides the watcher a list of available media objects, so that the watcher can set up stream sessions associated with one or more of them. The format of the response of the LIST message may be as follows:
The response of the LIST message may contain a list of media objects (it might be empty if no media object is provided by the streamer, which is a rare case). Each list item contains at least four fields: an “ID”, a “service_level”, a “sync_seq” (synchronization sequence; media objects with the same sync_seq have the same reference of presentation time), and a “desc” (description text). In addition, the list item should also provide “frame_width”, “frame_height”, and “target_fps”.
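For illustration, a response carrying these fields might resemble the following Python literal; the field names come from the description above, but the enclosing layout and the values are assumptions, not the published format:

```python
# Hypothetical response of the LIST message. Field names ("ID", "service_level",
# "sync_seq", "desc", "frame_width", "frame_height", "target_fps") come from the
# description above; the enclosing structure and values are assumptions.
list_response = {
    "media_objects": [
        {
            "ID": "screen0_q2",   # alphanumeric string with a quality-level suffix
            "service_level": 2,
            "sync_seq": 1,        # same sync_seq => same presentation-time reference
            "desc": "Main screen",
            "frame_width": 1280,
            "frame_height": 720,
            "target_fps": 30,
        }
    ]
}
```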
ADVERTISE Message
The ADVERTISE message provides the proxy a list of available media objects, so that the proxy can set up stream sessions associated with one or more of them. The format of the ADVERTISE message is shown as follows; it is almost the same as the response of the LIST message:
Response to ADVERTISE Message
The response to the ADVERTISE message is sent by the proxy, which is just a confirmation of the delivery of the ADVERTISE message, illustrated as follows:
SETUP Message
The watcher sends a SETUP message to set up a stream session. The format of the SETUP message is shown as follows:
The “media_object_id” is the ID of the media object (usually obtained from the response of the LIST message). The “ip_address” is the IP address of the watcher to receive the stream. The “nonce” is a randomly generated 128-bit number to help both ends generate a cryptographic key to encrypt the streaming data.
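As an illustrative sketch (only the three field names are taken from the description above; the layout and values are assumptions):

```python
import os

# Hypothetical SETUP request; field names come from the description above,
# the surrounding layout is an assumption.
setup_request = {
    "media_object_id": "screen0_q2",  # from the response of the LIST message
    "ip_address": "192.168.1.42",     # watcher address that will receive the stream
    "nonce": os.urandom(16).hex(),    # random 128-bit value for key generation
}
```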
Response to SETUP Message
The streamer sends a response of the SETUP message to finish the stream session setup. Once this message is sent out, the streamer may start sending PACKET messages to the designated IP address and port. The format of the response of the SETUP message may be as follows:
The “error” field is an error code. If it is “None”, the remaining fields will appear in this message; otherwise, this field denotes the error reason and the session is not set up. The “session_id” is the session ID. The “ip_address” is the IP address of the streamer to send the stream. The “stream_port” is the UDP port of the streamer to send the stream, and the “feedback_port” is the UDP port of the sender to receive feedback. It should be noted that the same port names mean different things for the streamer and the watcher. The “nonce” is a randomly generated 128-bit number to help both ends generate a cryptographic key to encrypt the streaming data, similar to the “nonce” in the SETUP message.
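A hypothetical successful response, again using only the field names described above (layout and values are assumptions):

```python
# Hypothetical successful response of the SETUP message; field names come from
# the description above, the values and layout are assumptions.
setup_response = {
    "error": "None",                  # any other value: session was not set up
    "session_id": 1,
    "ip_address": "192.168.1.7",      # streamer address that sends the stream
    "stream_port": 46000,             # streamer's UDP port for sending PACKETs
    "feedback_port": 46001,           # streamer's UDP port for receiving FEEDBACK
    "nonce": "6f2c1b9a8d7e5f4a3b2c1d0e9f8a7b6c",  # streamer's 128-bit nonce
}
```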
It should be further noted that session setup and teardown will be different if the proxy is used, because in this case the streamer and the watcher usually do not possess public IP addresses, and their TCP and UDP flows must be initiated by the streamer and the watcher to reach the proxy.
TEARDOWN Message
The TEARDOWN message is sent by the watcher to tear down an existing stream session. The format of TEARDOWN message is shown as follows:
The “session_id” is the ID of the stream session, usually obtained from the response of the SETUP message.
Response of TEARDOWN Message
The response of the TEARDOWN message is sent by the streamer to confirm tearing down an existing stream session. The format of the response of TEARDOWN message is shown as follows:
It should be noted that the TEARDOWN message is not necessarily required to tear down a stream session. A stream session may be torn down directly in the data plane.
The following technologies were developed as part of RTScreen and the application 102 to improve the real-time streaming quality between the devices 101, collectively shown in Table 1. Using feedback and/or feedforward control messages, all of the following technologies are implemented in the adaptive streaming policy modules on the streamer side and the watcher side to improve real-time streaming performance (including latency reduction and throughput improvement). These modules work in concert to make decisions periodically or upon an event to achieve a common objective: a smooth, low-delay stream with good resolution. In some embodiments, smoothness is the top priority, i.e., no frames stuck buffering and no frame corruption. Low screen-to-screen delay may be the next priority. Streaming resolution may have a lower priority. As a result, the policy module may take effect by adaptively changing parameters of the underlying streaming mechanisms (such as frame rate, receiver buffering delay, quality level, etc.) provided by the protocol and the implementation to achieve these goals from different perspectives.
A summary associated with each core technology or technological feature of RTScreen is as follows:
Adaptive Frame Rate:
In one embodiment, RTScreen includes an Adaptive Frame Rate feature where the streamer detects how much the screen content changes. Referring to
The streamer may adjust the frame rate according to the extent of the screen content change. Typically the default rate may be 30 frames per second, and it can be configured on the streamer side. Typically a key frame can be enforced for every 30 frames. When the screen content is detected to be in an “almost not changing” mode, the frame rate can be decreased by half every second, down to one frame per second. However, if the screen content is detected to be in a “changing significantly” mode, the rate may be immediately restored to the default streaming rate, and a key frame may be injected.
As a result, if the screen does not change significantly or does not change at all, which is common when streaming a desktop or in some apps where the UI maintains similar image content, the transmission bandwidth consumption is significantly reduced.
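One way to express this rate-adaptation rule is sketched below. The 30 fps default, per-second halving, 1 fps floor, and key-frame injection come from the description above; the change-mode labels and the encoder's request_key_frame() hook are assumptions:

```python
DEFAULT_FPS = 30  # default streaming rate per the description above
MIN_FPS = 1       # floor of one frame per second

class AdaptiveFrameRate:
    def __init__(self, encoder):
        self.encoder = encoder  # assumed to expose request_key_frame()
        self.fps = DEFAULT_FPS

    def on_second_elapsed(self, change_mode):
        """Called once per second with the detected screen-change mode;
        'almost_not_changing' and 'changing_significantly' are assumed labels."""
        if change_mode == "changing_significantly":
            self.fps = DEFAULT_FPS            # restore the default rate immediately
            self.encoder.request_key_frame()  # and inject a key frame
        elif change_mode == "almost_not_changing":
            self.fps = max(MIN_FPS, self.fps // 2)  # halve the rate each second
        return self.fps
```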
Adaptive Packetizing-Duplication-Retransmission:
In one embodiment, implementing the present feature under RTScreen, each frame may be packetized with a packet size typically ranging from 192 bytes to 1472 bytes in increments of 32 bytes (mainly for the convenience of encryption). For the key frames, where the influence of packet loss is more significant, the packet size is determined by the quality of the communication channel as indicated by the packet loss rate and RTT. The lossier the link, the smaller the packet size used. For video codecs such as H.264, the codecs may also be instructed to generate NAL (network abstraction layer) units of smaller size. This scheme may also work without the assistance of codecs and even agnostic of the bit stream after encoding.
A rule of thumb for the packet size: size = base_size / (1 + k1 * loss_rate + k2 * RTT). Here, base_size is typically the maximum packet size configured on the streamer, loss_rate is the packet loss rate in percentage, and RTT is the round trip time, while k1 and k2 are two parameters obtained from empirical results. If the calculated packet size is less than the minimum packet size configured on the streamer, it is set to the minimum packet size.
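A direct transcription of this rule of thumb, with the clamping described above and rounding to the 32-byte increments used for packetization (the default k1 and k2 values below are placeholders, since the empirical values are not given):

```python
def rule_of_thumb_packet_size(base_size, loss_rate_pct, rtt_ms,
                              k1=0.1, k2=0.01,       # placeholder empirical parameters
                              min_size=192, max_size=1472):
    # size = base_size / (1 + k1 * loss_rate + k2 * RTT)
    size = int(base_size / (1.0 + k1 * loss_rate_pct + k2 * rtt_ms))
    # Clamp to the configured minimum/maximum packet sizes.
    size = max(min_size, min(max_size, size))
    # Round down to the 32-byte increment used for encryption convenience.
    return size - (size - min_size) % 32

# Example: a lossy, slow link shrinks packets from the 1472-byte maximum.
print(rule_of_thumb_packet_size(1472, loss_rate_pct=5, rtt_ms=40))  # -> 768
```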
Besides general packetization, the sender may actively duplicate the packets for sending a frame, i.e., send the same packet twice, in the hope that as long as one packet is received on time, the stream can be played smoothly. To avoid burst loss, the duplicated packets are sent out of order in a random fashion. Additionally, the sender may retransmit a packet if the feedback is not received within a short time period, typically 15 ms to (1000/FPS) ms. A packet may be retransmitted at most once.
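The duplication step might be sketched as follows; the retransmission timer (resending a packet at most once if its feedback is missing after roughly 15 ms to 1000/FPS ms) is omitted here, and the socket usage is an assumption:

```python
import random

def send_with_duplication(packets, udp_sock, watcher_addr):
    """Send each packet of a frame twice, shuffled so that duplicates are
    spread out in time (mitigating burst loss), per the description above."""
    queue = list(packets) * 2   # actively duplicate every packet
    random.shuffle(queue)       # send duplicates out of order, in a random fashion
    for pkt in queue:
        udp_sock.sendto(pkt, watcher_addr)
```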
The rationale behind packet duplication and retransmission is that, given the uncontrollable delay of the link (especially a link with one or more wireless hops) and given the independence of the delay of packets, more attempts at the packet level may help deliver the whole frame in time. An example of packet receiving timestamps is shown in
In some embodiments, this technology applies only to key frames at a low service level, mainly to cope with high jitter in the network transmission by achieving lower delay at the cost of higher bandwidth consumption. In H.264, key frames also include the SPS (sequence parameter set), PPS (picture parameter set), and other NAL units that help the codec stay synchronized. It should be noted that this may actually consume more network bandwidth.
Automatic Delay Control:
The present technology controls the buffering delay on the watcher side. To illustrate further, the following discussion elaborates on timing.
On the watcher side, for each iteration, the watcher process is woken up by the watcher timer and checks whether there is a frame received and ready to be shown on the screen. Previously it was assumed that the streamer and the watcher are synchronized with the same clock in millisecond precision. In reality, the clock on the streamer and the clock on the receiver cannot be synchronized perfectly, since they have different definitions of time 0. As a result, the watcher implementation will associate its clock with the streamer's, typically at the arrival of the first packet of the first session. For multiple sessions of the same content at different service levels, the streamer is responsible for maintaining the consistency of all presentation times (i.e., all presentation times of these sessions have the same reference of time 0). This allows convenient mixing of the audio stream with the video stream, or switching from one video stream of a certain quality to another video stream of another quality of the same content.
For simplicity, consider a single streamer-watcher pair; on the watcher side, the clock association may function as follows:
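The listing itself is not reproduced in this text, so the following is one plausible reconstruction, assuming the clock offset is captured at the arrival of the first packet of the first session, consistent with the description above:

```python
class WatcherClock:
    """Plausible watcher-side clock association (a sketch, not the disclosed code)."""

    def __init__(self):
        self.offset_ms = None  # watcher clock minus streamer presentation clock

    def on_packet(self, presentation_ts_ms, watcher_now_ms):
        # Associate the two clocks at the arrival of the first packet.
        if self.offset_ms is None:
            self.offset_ms = watcher_now_ms - presentation_ts_ms

    def relative_current_time(self, watcher_now_ms):
        # Watcher time expressed on the streamer's presentation-time axis.
        return watcher_now_ms - self.offset_ms
```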
For a system with multiple streamers and multiple watchers, relative presentation time and relative current time are defined for each streamer-watcher pair on the watcher side. The absolute presentation time is defined by each streamer and it does not change from watcher to watcher.
For the watcher, a frame is delivered (or a frame arrives at the watcher) if a large portion (typically 95%) of the frame is received. A frame is “on time” if relative current time − buffering delay ≥ relative presentation time (of this frame) and the frame is “not late.” A frame may be deemed late if its immediate next frame also satisfies the previous relation, shown in
Similarly, a frame may be considered to still be early if the relative current time is less than the presentation time of the frame plus the renderer delay. Usually the watcher process wakes up regularly at the target frame rate. However, it can wake up early if the stream does not maintain a constant frame rate. Due to the existence of buffering, the watcher will always delay the rendering of the frame to a certain extent to buffer a sufficient number of packets. As long as the delay is small enough, human users will not be able to perceive it. The buffering delay is a parameter of the watcher, which can be dynamically changed by the watcher by the following algorithm:
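The concrete algorithm is likewise not reproduced here. The sketch below pairs the “on time” test defined above with one plausible adaptation rule (back off quickly when a frame is late, creep back down otherwise); the rule and its constants are assumptions, not the disclosed algorithm:

```python
def frame_on_time(rel_current_ms, rel_presentation_ms, buffering_delay_ms):
    # "On time" condition from the text above; the "not late" check is done by
    # the caller against the immediate next frame.
    return rel_current_ms - buffering_delay_ms >= rel_presentation_ms

class BufferingDelay:
    """Assumed multiplicative-increase / linear-decrease adaptation of the
    watcher's buffering delay."""

    def __init__(self, delay_ms=100, min_ms=0, max_ms=1000):
        self.delay_ms, self.min_ms, self.max_ms = delay_ms, min_ms, max_ms

    def update(self, frame_was_late):
        if frame_was_late:
            # Back off quickly: double the buffering delay.
            self.delay_ms = min(self.max_ms, max(1, self.delay_ms) * 2)
        else:
            # Creep back down slowly to keep screen-to-screen delay low.
            self.delay_ms = max(self.min_ms, self.delay_ms - 1)
        return self.delay_ms
```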
Automatic delay control also mitigates packet jitter from a different perspective. It does not interfere with retransmission, because the delay here means rendering delay with respect to the frame grabbing time, not the delay of packet feedback (which is always sent immediately upon receipt of a packet).
The rationale behind automatic delay control is based on the packet propagation delay and jitter, which can be inferred from RTT.
Multilevel Adaptive Bitrate Streaming:
Since RTScreen mainly targets screen sharing on mobile devices such as smartphones and laptops with wireless connections, a multilevel adaptive bitrate streaming technology was developed. This technology is different from the adaptive bitrate streaming technologies designed for video-on-demand websites, which segment the video and encode the segments into multiple levels ahead of time. Instead, the present method is segment-less and the encoding is done on the fly. Note that online video can use much more capable servers than the present streamer, which is typically a smartphone or a laptop. In this case, it is desirable to monitor in real time the resource limits on both the streamer and the watcher, as well as the network capacity, to make a decision on the service level. The adaptive bitrate streaming method contains mainly five parts:
Possible service levels are listed in Tables 2, 3, 4, and 5 below for screen frames of different ratios.
A few explanations regarding the service levels and the tables:
For the iPhone 6, iPhone 6S, iPhone 7, and iPhone 8, whose screen resolution is non-standard (1334*750) but whose ratio is close to the standard 16:9, the resolution can be resized to 1280*720 in the preprocessor on the streamer side.
For an iPhone X, whose screen resolution is non-standard (2436*1125) and whose ratio is also non-standard (13:6), the screen can be resized to 2340*1080 in the preprocessor on the streamer side, and another set of quality level parameters in Table 5 can be used.
The receiver detects link congestion from both the packet loss rate and RTT and estimates the link capacity using a method similar to TCP throughput modeling. Both the sender and the receiver monitor the host resource usage. If the link capacity is enough for the next service level and the host resource usage is below a threshold level (typically 50% of the CPU, 90% of the memory, and 50% of the hardware codec), the watcher makes the decision to upgrade the current service level to the next level. Since the link capacity estimation and host resource monitoring are made periodically (typically every 30 seconds), if the link capacity drops to a critical level (typically twice the target bitrate of the current service level), or the host resource usage is above the threshold level, the watcher makes the decision to downgrade the current service level to the previous level.
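Put together, the decision logic might read as follows. The resource thresholds and the 2x downgrade threshold are the “typical” values above; the margin assumed for “capacity enough for the next level” (twice that level's target bitrate, symmetric with the downgrade rule) and the level bounds are assumptions:

```python
def decide_service_level(level, capacity_bps, target_bps, next_target_bps,
                         cpu_usage, mem_usage, codec_usage):
    """Periodic (typically every 30 s) watcher-side decision, per the text above.
    Usages are fractions in [0, 1]; bitrates are in bits per second."""
    resources_ok = cpu_usage < 0.50 and mem_usage < 0.90 and codec_usage < 0.50
    if capacity_bps < 2 * target_bps or not resources_ok:
        return max(0, level - 1)   # downgrade to the previous level
    if next_target_bps is not None and capacity_bps >= 2 * next_target_bps:
        return level + 1           # upgrade to the next level
    return level                   # otherwise, stay put
```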
Once the service upgrade or downgrade decision is made, a new session with the target service level will be set up. If no congestion is detected in the new session (i.e., most frames in the new session are delivered on time), and once a key frame is received in the new stream session, the watcher can switch to render the frames from the new session, and the old session can be torn down.
Packet Scheduling and Traffic Shaping:
Packet scheduling is implemented on the sender side to prioritize packets in stream sessions with a low service level. This is designed to improve smoothness when multiple stream sessions are running at the same time, especially when the watcher is going to switch from a low quality level to a high quality level. At the switching point, two sessions are open and the sender will prioritize packets at the low quality level, such that even if the high quality session causes pressure on the network bandwidth, only the packets in the high quality session will be lost. Also, for the codec, frames of the low service level are always encoded first for the same reason. The following formula may be used to calculate priority:
priority = (10 − service_level) * 100 + (100 − min(session_id, 100))
The rationale is that a session with a lower service level has a higher priority, and if two sessions are at the same service level, the session which was set up earlier (hence, with a smaller session ID) has the higher priority. The calculated priority is an integer from 100 to 999.
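A direct transcription of the priority formula:

```python
def packet_priority(service_level, session_id):
    # Lower service level => higher priority; among equal levels, the earlier
    # session (smaller session ID) wins. For service levels 1..9 and session
    # IDs >= 1 the result is an integer in 100..999, as stated above.
    return (10 - service_level) * 100 + (100 - min(session_id, 100))
```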
Traffic shaping is implemented on the sender side for all sessions to enforce a maximum bandwidth usage. It mainly prevents bursty traffic that causes UDP packet loss on the sender side, especially when there are many sessions connected to the same streamer. In this case, once a new frame is grabbed, it is encoded and sent to multiple watchers, which creates a lot of traffic in a short amount of time (usually within one or a few milliseconds) compared to the frame grabbing period (more than 30 milliseconds).
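The text does not name a specific shaping mechanism; a token bucket is one standard choice and is sketched below under that assumption:

```python
import time

class TrafficShaper:
    """Token-bucket pacing of outgoing packets (an assumed mechanism; the text
    only specifies that a maximum bandwidth is enforced across all sessions)."""

    def __init__(self, max_bytes_per_sec, burst_bytes=64 * 1024):
        self.rate = float(max_bytes_per_sec)
        self.capacity = float(burst_bytes)  # packets (<= 1472 B) fit within the burst
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self, nbytes):
        """Block until nbytes may be sent without exceeding the maximum rate."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)  # wait for tokens
```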
Packet scheduling and traffic shaping are collectively implemented on the streamer, as shown in
Overrun Control:
Overrun control refers to the actions taken if the current frame is still being processed when the next frame arrives. Overrun can happen on the streamer or the watcher for many reasons, such as another process using a significant portion of the shared CPU resource. Usually when overrun happens, there is no better method than to discard the next frame, as illustrated in
For the streamer, overrun is detected if the streamer timer expires for the next iteration but the current iteration is still not finished. The streamer will take the following action once overrun happens:
For the watcher, overrun is detected similarly using the watcher timer. Note that whether the packets are received on time is irrelevant to overrun. It is less likely that a watcher overruns, since a watcher typically only handles one session. The watcher will take the following action once overrun happens:
The rationale behind overrun control is that overrun happens when there is too much work to do or too few resources to spend within one period of frame processing. If it is caused by the streaming app, the workload can be reduced by skipping frames, in the hope that future frames can still be processed smoothly. However, if severe overrun happens continuously, it usually means some other process is taking a significant portion of system resources for urgent work; hence, the streaming application 102 can be paused temporarily to wait for the user to resume.
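A compact sketch of this policy follows; the timer-based detection mirrors the description above, while the “severe, continuous” threshold and the pause hook are assumed:

```python
class OverrunControl:
    SEVERE_CONSECUTIVE = 30  # assumed threshold: about one second at 30 fps

    def __init__(self):
        self.iteration_busy = False
        self.consecutive_overruns = 0

    def begin_iteration(self):
        self.iteration_busy = True

    def end_iteration(self):
        self.iteration_busy = False

    def on_timer(self, app):
        """Called when the streamer/watcher timer expires for the next iteration."""
        if self.iteration_busy:                  # current iteration not finished
            self.consecutive_overruns += 1
            if self.consecutive_overruns >= self.SEVERE_CONSECUTIVE:
                app.pause_streaming()            # assumed hook: wait for user resume
            return "skip_frame"                  # discard the next frame
        self.consecutive_overruns = 0
        return "process_frame"
```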
Use Cases
A few use cases illustrating how the system 100 or system 100A may be implemented for collaborative content creation and sharing are discussed in greater detail below.
The present inventive concept is designed to be more than just an individual mobile application. It defines a novel platform to provide private local content creation and sharing in a decentralized or fog network. The described systems 100 and 100A emphasize that real-time collaborative content creation is more advanced than screen sharing; it enriches users with an engaging user experience. Further, all of this can take place without external Internet connectivity. The nature of the design as a platform allows for a variety of new use cases.
Consider an augmented reality (AR) application that APPLE used to demonstrate its ARKit SDK, where a child is able to build and view virtual LEGO sets in the real world. The present design would enable children to collaborate as they build something together. This concept can be expanded all the way up to a professional level, with mechanical engineers viewing and analyzing CAD designs in real time via augmented reality. This kind of collaborative content creation can take place anywhere, without the need for Internet connectivity, as everything takes place in a fog network. The workflow is demonstrated in
Another application use case may relate to off-site meetings and gatherings. Sharing and collaborating on reports, presentations, and legal documents can be an inconvenient process when a group of associates are together in a restaurant or a hotel conference room. The service platform enables users to easily share and collaborate on documents privately even when there is no network connection from their device. Specifically, the CastEx service platform provides the users with tools for content creation. A basic example is presenting a whiteboard in the service platform with a few handy tools to draw or write something. The user can create content this way and also share it in real time with others.
An example application of this design is for use in medical offices. It is not uncommon for a medical professional to share diagnosis and treatment information with the patient at the time of the visit. Descriptions of each of these data may or may not be given to the patient upon completion of the visit, either on paper or distributed digitally. Having this information is critical to making sure that medical advice is followed exactly as prescribed, and sharing it with family members or other medical professionals may allow for more precise treatment. Furthermore, the majority of medical offices today use digital media to create and describe this information. To improve communication between medical professionals and patients, adding the design proposed here to the existing systems used would enable the real-time informational content to be shared with the patient during the appointment, with the option of saving the stream and the audio from the doctor-patient conversation to the patient's mobile device for later review, allowing the patient to revisit the appointment exactly as it occurred.
The inventive concepts discussed herein provide various improvements to existing device-to-device communications. For example, while some mobile devices are configured with Walkie-talkie like features or functionality, the application 102 described herein could configure such devices to include video streaming in addition to audio streaming. Any device, configured by the application 102 (e.g., having the application 102 installed or having access to its functionality), can share media content with other devices similarly configured.
Main memory 704 can be Random Access Memory (RAM) or any other dynamic storage device(s) commonly known in the art. Read-only memory 706 can be any static storage device(s) such as Programmable Read-Only Memory (PROM) chips for storing static information such as instructions for processor 702. Mass storage device 707 can be used to store information and instructions. For example, hard disks such as the Adaptec® family of Small Computer Serial Interface (SCSI) drives, an optical disc, an array of disks such as Redundant Array of Independent Disks (RAID), such as the Adaptec® family of RAID drives, or any other mass storage devices, may be used.
Bus 701 communicatively couples processor(s) 702 with the other memory, storage, and communications blocks. Bus 701 can be a PCI/PCI-X, SCSI, or Universal Serial Bus (USB) based system bus (or other) depending on the storage devices used. Removable storage media 705 can be any kind of external hard drives, thumb drives, Compact Disc-Read Only Memory (CD-ROM), Compact Disc-Re-Writable (CD-RW), Digital Video Disk-Read Only Memory (DVD-ROM), etc.
Embodiments herein may be provided as a computer program product, which may include a machine-readable medium having stored thereon instructions which may be used to program a computer (or other electronic devices) to perform a process. The machine-readable medium may include, but is not limited to optical discs, CD-ROMs, magneto-optical disks, ROMs, RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions. Moreover, embodiments herein may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., modem or network connection).
As shown, main memory 704 may be encoded with the application 102 that supports functionality discussed above. In other words, aspects of the application 102 (and/or other resources as described herein) can be embodied as software code such as data and/or logic instructions (e.g., code stored in the memory or on another computer readable medium such as a disk) that supports processing functionality according to different embodiments described herein. During operation of one embodiment, processor(s) 702 accesses main memory 704 via the use of bus 701 in order to launch, run, execute, interpret, or otherwise perform processes, such as through logic instructions, executing on the processor 702 and based on the application 102 stored in main memory or otherwise tangibly stored.
The description above includes example systems, methods, techniques, instruction sequences, and/or computer program products that embody techniques of the present disclosure. However, it is understood that the described disclosure may be practiced without these specific details. In the present disclosure, the methods disclosed may be implemented as sets of instructions or software readable by a device. Further, it is understood that the specific order or hierarchy of steps in the methods disclosed are instances of example approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the method can be rearranged while remaining within the disclosed subject matter. The accompanying method claims present elements of the various steps in a sample order, and are not necessarily meant to be limited to the specific order or hierarchy presented.
The described disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable medium may include, but is not limited to optical storage medium (e.g., CD-ROM); magneto-optical storage medium, read only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or other types of medium suitable for storing electronic instructions.
Certain embodiments are described herein as including one or more modules, services, and the like. Such modules are hardware-implemented, and thus include at least one tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. For example, a hardware-implemented module may comprise dedicated circuitry that is permanently configured (e.g., as a special-purpose processor, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware-implemented module may also comprise programmable circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software or firmware to perform certain operations. In some example embodiments, one or more computer systems (e.g., a standalone system, a client and/or server computer system, or a peer-to-peer computer system) or one or more processors may be configured by software (e.g., an application or application portion) as a hardware-implemented module that operates to perform certain operations as described herein.
Accordingly, the term “hardware-implemented module” or “module” encompasses a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner and/or to perform certain operations described herein. Considering embodiments in which hardware-implemented modules are temporarily configured (e.g., programmed), each of the hardware-implemented modules need not be configured or instantiated at any one instance in time. For example, where the hardware-implemented modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware-implemented modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware-implemented module at one instance of time and to constitute a different hardware-implemented module at a different instance of time.
Hardware-implemented modules may provide information to, and/or receive information from, other hardware-implemented modules. Accordingly, the described hardware-implemented modules may be regarded as being communicatively coupled. Where multiple of such hardware-implemented modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware-implemented modules. In embodiments in which multiple hardware-implemented modules are configured or instantiated at different times, communications between such hardware-implemented modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware-implemented modules have access. For example, one hardware-implemented module may perform an operation, and may store the output of that operation in a memory device to which it is communicatively coupled. A further hardware-implemented module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware-implemented modules may also initiate communications with input or output devices.
It is believed that the present disclosure and many of its attendant advantages should be understood by the foregoing description, and it should be apparent that various changes may be made in the form, construction, and arrangement of the components without departing from the disclosed subject matter or without sacrificing all of its material advantages. The form described is merely explanatory, and it is the intention of the following claims to encompass and include such changes.
While the present disclosure has been described with reference to various embodiments, it should be understood that these embodiments are illustrative and that the scope of the disclosure is not limited to them. Many variations, modifications, additions, and improvements are possible. More generally, embodiments in accordance with the present disclosure have been described in the context of particular implementations. Functionality may be separated or combined in blocks differently in various embodiments of the disclosure or described with different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.
This is a non-provisional application that claims the benefit of U.S. provisional application Ser. No. 62/561,069, filed on Sep. 20, 2017, which is incorporated by reference in its entirety.
This invention was made with government support under ECCS-1408409 awarded by the National Science Foundation. The government has certain rights in the invention.