1. Field of the Invention
The present invention relates generally to computing systems and, more specifically, to a technique for offloading compute operations utilizing a low-latency data transmission protocol.
2. Description of the Related Art
Low power design for many consumer electronic products has become increasingly important in recent years. With the proliferation of battery-powered handheld devices, efficient power management is quite important to the success of a particular product or system. Among other things, users of handheld devices are demanding the ability to perform tasks on their device that may require the processing of large or complex compute operations. Examples of such tasks include auto-fix of captured video, stereoscopic image and video processing, computer vision, and computational photography. However, the demand for performing such tasks on a handheld device comes at the cost of reduced battery life.
Specifically, despite the techniques that have been developed to increase performance on handheld devices, such as multi-threading techniques and multi-core techniques, these devices may consume too much power when performing such computationally expensive tasks, which can lead to poor user experiences. Therefore, although a handheld device may have the processing power to perform those types of tasks, it may not be desirable for the handheld device to perform such tasks because of the negative impact on battery life. In fact, many handheld devices are simply not configured with sufficient processing power to perform complex processing tasks like those described above, because, as is well-understood, including such processing power in handheld devices would come at the cost of accelerated battery drain.
As the foregoing illustrates, what is needed in the art is a technique that allows handheld devices to perform more complex compute operations without substantially impacting battery life.
One embodiment of the present invention sets forth a method for offloading one or more compute operations to an offload device. The method includes the steps of discovering the offload device in a wireless personal area network (WPAN) via a low-latency communications protocol, offloading data to the offload device for performing the one or more compute operations, and receiving from the offload device processed data generated when the one or more compute operations are performed on the offloaded data.
One advantage of the disclosed method is that a handheld device may perform complex operations without substantially impacting battery life. Another advantage of the disclosed method is that a handheld device has more flexibility in terms of the types of applications that can be installed or downloaded and executed using the handheld device.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the present invention may be practiced without one or more of these specific details.
Wi-Fi Direct is a standard that allows Wi-Fi devices to connect and communicate with each other without the need for a wireless access point. Therefore, the handheld devices 102 and the offload device 106 may communicate directly, or peer-to-peer (P2P), through the Wi-Fi Direct protocol. In one embodiment, WPAN 100 is configured such that the handheld devices 102 may offload certain classes of compute operations to the offload device 106 by utilizing Wi-Fi Direct. Examples of tasks involving such compute operations include auto-fix of captured video, stereoscopic image and video processing, computer vision, and computational photography. Compared to devices communicating in a WPAN via a wireless access point, Wi-Fi Direct provides higher throughput for devices within close range, allowing for the transmission of greater amounts of data. In addition, because a handheld device 102 and the offload device 106 can communicate directly, the amount of time required to offload a compute operation from the handheld device 102 to the offload device 106 and receive the processed results back from the offload device 106 may be within the processing times tolerated by many applications. Therefore, a handheld device 102 may offload compute operations suited for real-time computing offload scenarios, such as real-time processing of photos and videos captured using the handheld device 102.
Although Wi-Fi Direct has been illustrated as an appropriate protocol for exchanging communications between the handheld devices 102 and the offload device 106, any protocol that supports low-latency data transmissions may be utilized. In addition, a combination of low-latency communications protocols may be used for exchanging data and information between the handheld devices 102 and the offload device 106. For example, the real-time transport protocol (RTP) may be used in conjunction with Wi-Fi Direct for streaming data between a handheld device 102 and the offload device 106. RTP may be used in conjunction with Wi-Fi Direct in situations where the compute operations involve the processing of video or audio data (e.g., the data and the resulting processed data would be streamed).
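By way of illustration only, the following Python sketch shows one way encoded media chunks could be packetized into RTP packets and transmitted over a UDP socket on a link assumed to have been established via Wi-Fi Direct. The peer address, port, payload type, and SSRC shown below are hypothetical placeholders and are not specified by the embodiments described herein.

# Illustrative sketch only: packetizes encoded media chunks into minimal RTP
# packets and sends them over UDP on a link assumed to be Wi-Fi Direct.
# The peer address, port, payload type, and SSRC are hypothetical placeholders.
import socket
import struct
import time

PEER_ADDR = ("192.168.49.1", 5004)   # hypothetical Wi-Fi Direct peer address
PAYLOAD_TYPE = 96                    # dynamic payload type, chosen arbitrarily
SSRC = 0x1234ABCD                    # arbitrary synchronization source identifier

def rtp_packet(payload: bytes, seq: int, timestamp: int, marker: bool = False) -> bytes:
    """Builds a minimal 12-byte RTP header (version 2, no CSRCs) plus payload."""
    first_byte = 0x80                        # version=2, padding=0, extension=0, CC=0
    second_byte = (0x80 if marker else 0) | PAYLOAD_TYPE
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, SSRC)
    return header + payload

def stream_chunks(chunks):
    """Sends each encoded chunk to the offload peer as one RTP packet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq = 0
    clock_rate = 90000                       # typical RTP clock rate for video
    start = time.time()
    for chunk in chunks:
        ts = int((time.time() - start) * clock_rate)
        sock.sendto(rtp_packet(chunk, seq, ts, marker=True), PEER_ADDR)
        seq += 1
    sock.close()

if __name__ == "__main__":
    # Stand-in data; in practice these would be encoded video frames.
    stream_chunks([b"frame-0", b"frame-1", b"frame-2"])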
As also shown, and as will be described in greater detail herein, the handheld device 102 includes a client process 208 that communicates with a server process 210 via the communications protocol 108 when offloading compute operations from the handheld device 102 to the offload device 106. In operation, to offload an operation to the offload device 106, the client process 208 may discover the offload device 106 within the WPAN via a discovery mechanism. For example, the handheld device 102 and the offload device 106 may negotiate a link by using Wi-Fi Protected Setup. Once the offload device 106 is discovered, the client process 208 may offload large or complex compute operations to the offload device 106. Prior to offloading the compute operations from the handheld device 102, the client process 208 may perform certain operations such as encoding data for the compute operations that are being offloaded. Optionally, the client process 208 may also encrypt the encoded data in an effort to secure the data prior to offloading the compute operations over the wireless link. The server process 210 may perform the compute operations offloaded from the handheld device 102 to the offload device 106. Prior to performing the compute operations, the server process 210 may decrypt the data for the compute operations (i.e., if the data was encrypted). In addition, the server process 210 may decode the data and then perform the compute operations. Upon performing the compute operations, the server process 210 may transmit the processed results to the handheld device 102.
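The client-side sequence described above (encoding the data, optionally encrypting it, offloading it, and then decrypting and decoding the returned results) could be organized along the following lines. This is a minimal sketch only; the length-prefixed framing, the use of zlib as a stand-in codec, and the use of the third-party cryptography package (Fernet) for the optional encryption step are assumptions made purely for illustration, and the address of the offload device is assumed to have been obtained by the discovery mechanism.

# Illustrative client-side sketch of the offload pipeline described above.
# The wire format and the zlib/Fernet stand-ins are assumptions; the text only
# requires that data be encoded, optionally encrypted, offloaded, and that the
# processed results be decrypted and decoded on return.
import socket
import struct
import zlib

from cryptography.fernet import Fernet  # third-party; stands in for "encrypt"

class OffloadClient:
    def __init__(self, offload_addr, key=None):
        self.offload_addr = offload_addr           # discovered offload device (host, port)
        self.cipher = Fernet(key) if key else None

    def offload(self, raw_data: bytes) -> bytes:
        """Encode, optionally encrypt, send, then receive/decrypt/decode results."""
        payload = zlib.compress(raw_data)          # "encode" step (stand-in codec)
        if self.cipher:
            payload = self.cipher.encrypt(payload) # optional "encrypt" step
        with socket.create_connection(self.offload_addr) as sock:
            sock.sendall(struct.pack("!I", len(payload)) + payload)
            result = self._recv_frame(sock)
        if self.cipher:
            result = self.cipher.decrypt(result)   # decrypt processed results
        return zlib.decompress(result)             # decode processed results

    @staticmethod
    def _recv_frame(sock) -> bytes:
        (length,) = struct.unpack("!I", OffloadClient._recv_exact(sock, 4))
        return OffloadClient._recv_exact(sock, length)

    @staticmethod
    def _recv_exact(sock, n) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed connection")
            buf += chunk
        return buf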
As an example for offloading certain classes of compute operations from the handheld device 102 to the offload device 106, the offload device 106 may advertise specific services that the handheld device 102 may need, such as support for gesture recognition or facial recognition tasks. If the handheld device 102 is then utilized for a gesture recognition or facial recognition task, the handheld device 102 may offload data collected by the handheld device 102, such as one or more captured images, to the offload device 106 that has advertised those specific services. In other words, the processing related to the gesture recognition or facial recognition task occurs at the offload device 106, and the offload device 106 then transmits the processed results back to the handheld device 102.
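A minimal sketch of such service advertisement and matching follows. The JSON beacon format, broadcast port, and service names below are assumptions; in practice, Wi-Fi Direct's own service discovery mechanism could carry equivalent records.

# Illustrative sketch of service advertisement: the offload device periodically
# broadcasts the services it supports, and a handheld device selects a peer that
# advertises the service it needs. The beacon format and port are assumptions.
import json
import socket
import time

BEACON_PORT = 40123  # hypothetical

def advertise_services(services, interval=2.0, rounds=5):
    """Run on the offload device: broadcast the supported service names."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    beacon = json.dumps({"role": "offload", "services": services}).encode()
    for _ in range(rounds):
        sock.sendto(beacon, ("255.255.255.255", BEACON_PORT))
        time.sleep(interval)

def find_offload_device(needed_service, timeout=10.0):
    """Run on the handheld device: return the address of a peer offering the service."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", BEACON_PORT))
    sock.settimeout(timeout)
    try:
        while True:
            data, (addr, _port) = sock.recvfrom(4096)
            beacon = json.loads(data)
            if needed_service in beacon.get("services", []):
                return addr
    except socket.timeout:
        return None

# Example: the offload device would call
#   advertise_services(["gesture_recognition", "facial_recognition"])
# while the handheld device calls
#   find_offload_device("facial_recognition")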
In another contemplated implementation, rather than advertising specific services that the handheld device 102 may utilize, the offload device 106 may advertise its compute capabilities to the handheld device 102. The handheld device 102 can then leverage those compute capabilities on an as-needed basis, such as when executing a more sophisticated computer program. For example, in addition to offloading captured images for a gesture recognition or facial recognition task from the handheld device 102 to the offload device 106, the handheld device 102 may also offload the program code for performing the gesture recognition or facial recognition task to the offload device 106. As a result, the offload device 106 is able to perform the gesture recognition or facial recognition task using the data and program code received from the handheld device 102 and then transmit the processed results back to the handheld device 102. With the ability to offload program code to the offload device 106, there is more flexibility in terms of the types of applications that can be installed or downloaded on the handheld device 102 because the handheld device 102 can offload the work related to those applications to an offload device 106 that advertises its compute capabilities to the handheld device 102.
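The following sketch illustrates, under stated assumptions, how program code might be bundled with data for offloading and then executed on the offload device. The envelope format and entry-point convention are hypothetical, and executing received code on an offload device would in practice require appropriate sandboxing and trust mechanisms.

# Purely illustrative sketch of sending program code along with data. The
# envelope format ("code", "entry_point", "data") is hypothetical, and running
# received code on the offload device would in practice require sandboxing.
import json
import zlib

def build_offload_request(source_code: str, entry_point: str, data: bytes) -> bytes:
    """Handheld side: bundle the task's code, its entry point, and the input data."""
    envelope = {
        "code": source_code,
        "entry_point": entry_point,
        "data": zlib.compress(data).hex(),
    }
    return json.dumps(envelope).encode()

def run_offload_request(request: bytes) -> bytes:
    """Offload-device side: execute the received code against the received data."""
    envelope = json.loads(request)
    namespace = {}
    exec(envelope["code"], namespace)              # load the offloaded program code
    task = namespace[envelope["entry_point"]]
    data = zlib.decompress(bytes.fromhex(envelope["data"]))
    return task(data)                              # processed results, sent back later

# Example with a trivial stand-in task (not a real recognition workload):
#   code = "def process(data):\n    return data.upper()"
#   request = build_offload_request(code, "process", b"pixels")
#   result = run_offload_request(request)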
At 312, the offload device 106, which may include one or more GPUs, performs one or more compute operations using the data offloaded from the handheld device 102. If program code is also offloaded from the handheld device 102, then the offload device 106 performs the compute operations based on the offloaded program code. After performing the compute operations, the offload device 106 encodes the processed results at 314 and optionally encrypts the results at 316, prior to transmitting the results back to the handheld device 102 at 318. Upon receiving the processed results, the handheld device 102, to the extent necessary, decrypts the processed results at 320 and decodes the processed results at 322.
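An offload-device counterpart to the client sketch shown earlier might look as follows, again as a sketch only: receive a framed request, optionally decrypt and decode it, perform the compute operation (which could dispatch to one or more GPUs), and then encode, optionally encrypt, and transmit the results. The framing, port, and stand-in codec mirror the earlier assumptions and are not specified by the embodiments described herein.

# Illustrative offload-device counterpart to the client sketch above.
import socket
import struct
import zlib

from cryptography.fernet import Fernet  # third-party; optional encryption step

def serve_offload_requests(compute_fn, port=50007, key=None):
    """Receive framed requests, run compute_fn on the decoded data, return results."""
    cipher = Fernet(key) if key else None
    with socket.create_server(("", port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                payload = _recv_frame(conn)
                if cipher:
                    payload = cipher.decrypt(payload)          # decrypt offloaded data
                data = zlib.decompress(payload)                # decode offloaded data
                result = compute_fn(data)                      # compute step (312)
                result = zlib.compress(result)                 # encode results (314)
                if cipher:
                    result = cipher.encrypt(result)            # encrypt results (316)
                conn.sendall(struct.pack("!I", len(result)) + result)  # transmit (318)

def _recv_frame(conn):
    (length,) = struct.unpack("!I", _recv_exact(conn, 4))
    return _recv_exact(conn, length)

def _recv_exact(conn, n):
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf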
As the foregoing illustrates, by using one or more low-latency communications protocols, such as Wi-Fi Direct or a combination of Wi-Fi Direct and RTP, the handheld device 102 may offload compute operations to the offload device 106 and receive the processed results back from the offload device 106 within the processing times tolerated by many applications that may execute on the handheld device 102. In other words, the handheld device 102 may offload compute operations suited for real-time computing offload scenarios, thereby circumventing the need for the handheld device to perform those compute operations directly. Consequently, the handheld device 102 does not have to expend battery power performing such operations, which typically are computationally intensive operations that would quickly drain the batteries powering the handheld device 102.
As shown, the method begins at step 402, where the handheld device 102 discovers the offload device 106 in a WPAN 100 for offloading compute operations (e.g., via a discovery mechanism). As an example, the handheld device 102 may negotiate a link with the offload device 106 by using Wi-Fi Protected Setup.
Optionally, at step 404, the handheld device 102 may offload to the offload device 106 the program code that is used for performing the compute operations. For example, the offload device 106 may advertise its compute capabilities to the handheld device 102, allowing the handheld device 102 to offload the program code for performing the compute operations.
At step 406, the handheld device 102 offloads to the offload device 106 the data required for performing the compute operations. Upon offloading the data, the processing related to the compute operations occurs at the offload device 106. At step 408, the handheld device 102 receives the processed results of the compute operations.
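The four steps of the method could be composed into a small driver routine such as the following sketch, which reuses the illustrative helpers sketched earlier (find_offload_device, OffloadClient, and build_offload_request); the port number and the "main" entry-point name are further assumptions made for illustration.

# Illustrative driver tying the method steps together. The referenced helpers
# are the hypothetical sketches shown earlier in this description.
def offload_compute(needed_service, data, program_code=None, key=None):
    # Step 402: discover an offload device in the WPAN.
    addr = find_offload_device(needed_service)
    if addr is None:
        raise RuntimeError("no offload device discovered")
    client = OffloadClient((addr, 50007), key=key)
    # Step 404 (optional): offload the program code for the compute operations.
    if program_code is not None:
        payload = build_offload_request(program_code, "main", data)
    else:
        payload = data
    # Step 406: offload the data; step 408: receive the processed results.
    return client.offload(payload)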
The techniques described above for offloading compute operations to an offload device via one or more low-latency communications protocols may also be implemented in more conventional Wi-Fi network topologies.
In one embodiment, WPAN 500 is configured such that the handheld devices 502 may offload certain classes of compute operations to the offload device 506 by utilizing the access point 504. Because communications between the handheld devices 502 and the offload device 506 are transmitted through the access point 504, there may be bandwidth limitations or performance issues when offloading those compute operations to the offload device 506. For example, the amount of data offloaded from a handheld device 502 when offloading a particular type of compute operation to the offload device 506 may exceed the bandwidth limitations of the channel between the handheld device 502 and the offload device 506. In such a situation, not all the data necessary to perform the compute operation can be transmitted to the offload device 506. Therefore, the handheld device 502 is configured to reduce the amount of data transmitted to the offload device 506.
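One illustrative way the handheld device 502 could reduce the amount of data transmitted is sketched below: frames are progressively subsampled and compressed until the payload fits an estimated byte budget for the channel. The budget computation from an estimated channel rate and a transfer deadline is an assumption made for illustration only.

# Illustrative sketch of reducing the offloaded payload to fit an estimated
# channel budget; the budget model and stand-in compression are assumptions.
import os
import zlib

def fit_payload(frames, est_bytes_per_sec, max_transfer_sec):
    """Return a compressed payload no larger than the estimated channel budget."""
    budget = int(est_bytes_per_sec * max_transfer_sec)
    step = 1
    while True:
        subset = frames[::step]                     # keep every step-th frame
        payload = zlib.compress(b"".join(subset))
        if len(payload) <= budget or len(subset) <= 1:
            return payload
        step += 1                                   # drop more frames and retry

# Example: fit ten 100 KB frames into a roughly 300 KB budget
# (about one second of transfer at an assumed 300 KB/s).
frames = [os.urandom(100_000) for _ in range(10)]
payload = fit_payload(frames, est_bytes_per_sec=300_000, max_transfer_sec=1.0)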
In addition to bandwidth limitations, there also may be timing limitations that reduce the efficacy of offloading compute operations within WPAN 500. For example, the amount of time required to offload a compute operation from the handheld device 502 to the offload device 506 and receive the processed results back from the offload device 506 may be increased because those transmissions have to pass through the access point 504. Consequently, the round trip time associated with offloading compute operations from the handheld device 502 to the offload device 506 may exceed the processing time tolerated by the relevant compute application executing on the handheld device 502. In such situations, the offload techniques described herein may result in a poor user experience. Nonetheless, certain compute applications may have processing times that can be met, even when the transmissions related to compute operations offloaded between the handheld device 502 and the offload device 506 have to pass through the access point 504. Examples of such compute applications may include compute operations suited for near-real-time computing offload scenarios, such as batch processing of a large amount of data (e.g., facial recognition on all photos stored on a handheld device 502 or auto-fix of a badly captured video). In other words, the handheld device 502 may offload compute operations to the offload device 506 that do not require real-time processing.
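The decision of whether a given compute application can tolerate offloading through the access point 504 could be made by probing the round-trip time to the offload device 506 and comparing it against the application's processing-time budget, as in the following sketch. The probe method, port, and timing model are assumptions made for illustration.

# Illustrative sketch: offload only when the measured round-trip latency plus
# an estimated remote compute time fits the application's deadline.
import socket
import time

def measure_rtt(offload_addr, probes=3):
    """Average TCP connect round-trip time to the offload device, in seconds."""
    samples = []
    for _ in range(probes):
        start = time.monotonic()
        with socket.create_connection(offload_addr, timeout=2.0):
            samples.append(time.monotonic() - start)
    return sum(samples) / len(samples)

def should_offload(offload_addr, est_compute_sec, deadline_sec):
    """Offload only if transfer latency plus remote compute fits the deadline."""
    rtt = measure_rtt(offload_addr)
    return rtt + est_compute_sec <= deadline_sec

# Example: a near-real-time batch task with a multi-second budget tolerates a
# long round trip through the access point, whereas a per-frame real-time
# budget of a few tens of milliseconds generally will not.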
The techniques described above for offloading compute operations to an offload device via one or more low-latency communications protocols allow a handheld device to execute various compute applications.
In sum, embodiments of the invention provide techniques for offloading certain classes of compute operations from battery-powered handheld devices operating in a wireless personal area network (WPAN) to devices with relatively greater computing capabilities operating in the WPAN that are not power-limited by batteries. Examples of such “offload” devices that have greater computing capabilities, but are not power-limited, include, without limitation, desktop or server machines that have one or more graphics processing units (GPUs), including one or more GPUs that are configurable to implement Compute Unified Device Architecture (CUDA) capabilities. In order to offload certain classes of compute operations, a handheld device (i.e., the client) may discover an offload device within a local network via a discovery mechanism that includes a low-latency data transmission protocol, such as Wi-Fi Direct. Once the offload device is discovered, the handheld device may offload large or complex compute operations to the offload device, thereby circumventing the need for the handheld device to perform those compute operations, which would drain the batteries powering the handheld device.
As an example for offloading certain classes of compute operations from the handheld device to the offload device, the offload device may advertise specific services that the handheld device may need, such as support for gesture recognition or facial recognition tasks. If the handheld device is then utilized for a gesture recognition or facial recognition task, the handheld device may offload data collected by the handheld device, such as one or more captured images, to the offload device that has advertised those specific services. In other words, the processing related to the gesture recognition or facial recognition task occurs at the offload device, and the offload device then transmits the processed results back to the handheld device.
In another contemplated implementation, rather than advertising specific services that a handheld device may utilize, an offload device may advertise its compute capabilities to the handheld device. The handheld device can then leverage those compute capabilities on an as-needed basis, such as when executing a more sophisticated computer program. For example, in addition to offloading captured images for a gesture recognition or facial recognition task from the handheld device to the offload device, the handheld device may also offload the program code for processing the gesture recognition or facial recognition task to the offload device. As a result, the offload device processes the gesture recognition or facial recognition task and then transmits the processed results back to the handheld device.
One advantage of the disclosed techniques is that the techniques allow handheld devices to perform complex operations without substantially impacting battery life. Another advantage is that the handheld device has more flexibility in terms of the types of applications that can be installed or downloaded and executed using the handheld device.
One embodiment of the invention may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as compact disc read only memory (CD-ROM) disks readable by a CD-ROM drive, flash memory, read only memory (ROM) chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored.
The invention has been described above with reference to specific embodiments. Persons of ordinary skill in the art, however, will understand that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Therefore, the scope of embodiments of the present invention is set forth in the claims that follow.