This disclosure relates in general to the field of computer systems and, more particularly, to visual computing.
The Internet has enabled interconnection of different computer networks all over the world. While Internet connectivity was previously limited to conventional general-purpose computing systems, ever-increasing numbers and types of products are being redesigned to accommodate connectivity with other devices over computer networks, including the Internet. For example, smart phones, tablet computers, wearables, and other mobile computing devices have become very popular, in recent years even supplanting larger, more traditional general-purpose computing devices, such as desktop computers. Increasingly, tasks traditionally performed on a general-purpose computer are performed using mobile computing devices with smaller form factors and more constrained feature sets and operating systems. Further, traditional appliances and devices are becoming “smarter” as they become ubiquitous and are equipped with functionality to connect to or consume content from the Internet. For instance, devices such as televisions, gaming systems, household appliances, thermostats, automobiles, and watches have been outfitted with network adapters that allow the devices to connect with the Internet (or another device) either directly or through a connection with another computer connected to the network. As humans spend more and more of their time working with and consuming content on computing systems, concern has developed over how such use impacts human health and well-being. For instance, the impact of extensive use of computing systems on human ergonomics has emerged as a concern within institutions whose employees, customers, and user bases may be using computing systems in a manner that could negatively impact their health.
Like reference numbers and designations in the various drawings indicate like elements.
Over the past century, the work and recreation of human beings have become increasingly tethered to computers, electronics, and displays, including personal computers, televisions, and video games. An unintended consequence of this evolution toward activities in which users sit and view displays for extended (and increasing) lengths of time is a similarly increasing array of health maladies, many of which are connected to poor ergonomics, posture, and extended sitting, including obesity, musculoskeletal disabilities, heart disease, and others. While a variety of desks, chairs, computer stands, keyboards, and other tools have been developed to help improve ergonomic environments for humans using user computing devices, an inherent weakness of such tools is that, while humans engross themselves in work, video content, video games, etc. using the user computer, they lose focus on using the computer, chair, desk, or keyboard correctly or consistently and may still relapse into a bad posture, positioning, or other habit that may ultimately be detrimental to their health, among other example issues.
Human pose, posture, and mood detection has been the focus of research in recent years, with interest, for instance, from retailers and other commercial enterprises in better understanding the mood and behavioral tendencies of customers (e.g., as they enter or navigate a store). Human pose and movement detection have also been the subject of research within autonomous vehicle and industrial robotic systems, for instance, to detect humans within an environment and predict their behavior (e.g., vis-à-vis the vehicle or robot) based on the pose or posture of the human (e.g., to predict whether a human is more or less likely to move into the path of the vehicle and prompt evasive movement by the machine). Solutions currently being developed, however, rely on proprietary and expensive sensors, as well as large, resource-intensive machine learning models and systems that require specialized computing systems. Such solutions are impracticable for consumer-level user computing devices, in terms of their cost, their physical dimensions (which may conflict with integration within user computing devices, which tend to evolve toward thinner and/or smaller form factors), and their tendency to overwhelm the resources of typical user computing devices (e.g., where memory and processor resources intended to perform the core function of the user computing device (e.g., providing content to a user) would be diverted to the performance of a complex machine learning algorithm used for the secondary purpose of detecting user posture), among other example issues.
In one example, sensors may be provided on a user computing device to automatically detect the posture of the user while the user uses the device. The posture may be continuously sensed, such that the user is provided with immediate and even historical feedback on the user's posture while using the device, including prompts to correct bad posture detected by the device, thereby mitigating negative health outcomes that may be connected to users' unconscious and prolonged bad posture while using such devices.
For example, an improved user computing device, such as a laptop computer, desktop computer, smart phone, video game console, or smart television, may be equipped with sensors and logic to implement contactless posture detection and provide dynamic and accurate ergonomic feedback to the user. As user computing systems are often mass marketed, the economic model dictates that such systems and their constituent subsystems be built from relatively inexpensive hardware so that the price point of the overall system remains accessible to the general populace. This often requires a tradeoff, as advanced, state-of-the-art hardware may guarantee the best or most desirable performance but may make the overall system prohibitively expensive. As such, a contactless posture detection system intended for mass-market user computing devices ideally utilizes inexpensive sensor hardware that may be cheaply and easily integrated within existing computing systems. Making such a system accessible to the wider populace allows the potential health and social benefits (from improving user ergonomics) to be maximized. However, designs that utilize inexpensive components traditionally challenge the effectiveness and accuracy of subsystems designed for the mass market. For such systems to be accepted and adopted by users, the functionality and features of the system, particularly those relating to health-related feedback, should be accurate, with minimal false detections, so as to earn the trust of users, among other example considerations.
As an example of advanced imaging sensors, full RGB-depth (RGB-D) sensors have been developed, which natively fuse RGB sensors and compatible high-resolution depth sensors (e.g., stereo or LiDAR) into a unified solution. Such sensors, given their relatively high cost and advanced performance capabilities, are currently employed in research and industrial domains, such as robotics, autonomous vehicles, and visual computing research. An RGB-D camera, however, is currently out of reach from a price perspective for integration within consumer-level user computing devices without breaking the pricing model for such devices (e.g., where current personal computing RGB cameras are priced under $10 and RGB-D cameras in the $100s).
While RGB cameras are readily available in modern user computing devices, posture detection results derived from a single, user-facing RGB camera sensor struggle to accurately capture the true posture of the user. Such a solution is generally likely to return errors in determining user posture in some situations (e.g., back posture) without explicit depth information (e.g., to determine forward and backward lean). While advanced camera or image sensor solutions may serve as a more accurate basis for a posture detection subsystem, such sensors may be prohibitively expensive to include in mass-market user computing devices. Further, machine learning models that are built and trained to accept the relatively large and complex data sets from advanced camera sensors (e.g., stereo cameras, RGB-D cameras, etc.) may be unduly heavy and strain the memory and/or compute resources of a conventional user computing device developed for the mass market. Additionally, user-facing cameras have become nearly standard in conventional user computing devices, making it advantageous to implement a posture detection subsystem that makes use of this already available feature. Further, machine learning models developed to utilize conventional camera images may be more manageable in size and more appropriate for the computing capacity of user computing systems.
In some implementations of user computing devices, a contactless posture detection subsystem may be implemented that utilizes a traditional, user-facing RGB image sensor together with a compact and inexpensive time of flight (ToF) sensor to enhance the data obtained from the RGB image with a low-resolution depth image (e.g., 8×8 pixels corresponding to the camera's multi-megapixel RGB image). In some cases, the ToF sensor may also be utilized in association with additional features of the user computing device, such as user presence detection. The photo image captured by the camera may be utilized to generate a preliminary posture estimation. The low-resolution depth image and the preliminary posture estimation may then be provided as inputs to a fusion model to introduce minimal additional depth information, resolve the remaining ambiguities, and markedly improve the accuracy of the posture determination.
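By way of illustration only, the following Python sketch outlines this two-stage flow with placeholder functions standing in for the trained models; the function names, the 17-joint feature layout, and the frame and depth-grid dimensions are assumptions for demonstration and do not reflect any particular implementation.

```python
import numpy as np

def estimate_pose_2d(rgb_image: np.ndarray) -> np.ndarray:
    """Stage 1 (placeholder): preliminary joint feature set from a 2D RGB frame."""
    # A trained neural network model would run here; a dummy
    # (17 joints x 3 coordinates) estimate is returned instead.
    return np.zeros((17, 3), dtype=np.float32)

def fuse_with_depth(pose_estimate: np.ndarray, depth_grid: np.ndarray) -> np.ndarray:
    """Stage 2 (placeholder): refine the preliminary estimate with sparse ToF depth."""
    # A trained fusion model would consume both inputs; here the estimate passes through.
    return pose_estimate

rgb = np.zeros((720, 1280, 3), dtype=np.uint8)   # frame from the user-facing camera
depth = np.zeros((8, 8), dtype=np.float32)       # contemporaneous low-resolution ToF readings
refined = fuse_with_depth(estimate_pose_2d(rgb), depth)
print(refined.shape)                             # (17, 3) refined joint estimates
```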
Turning to the simplified block diagram 200 of
In some implementations, “secondary” programs, such as the posture detection engine, may be run on other processor hardware 225 so as not to interfere with the performance of the “principal” applications or content implemented using CPU 210. For instance, additional processor(s) 225 may be provided, for instance, in the lid of a laptop computer or elsewhere on the motherboard and may execute at least a portion of the logic of posture detection engine 250. In some implementations, additional processor 225 may implement specialized processing logic, such as processing logic adapted for use in implementing convolutional neural networks (CNNs), deep neural networks (DNNs), spiking neural networks (SNNs), or other machine learning algorithms, which may be relied upon by the posture detection engine 250. Further, additional processing hardware (e.g., 225) may also include near memory (e.g., scratchpad memory), among other elements, to assist in accelerating workloads performed at the processing hardware 225 and/or to offload workloads that may compete with those executed using CPU 210 and memory 215, among other example implementations.
In one example implementation, a posture detection engine 250, together with camera sensor 230 and depth sensor 240, may implement a posture detection subsystem of the user computing device 205. In one example, posture detection engine 250 may include a person detection engine 255, a neural network engine 260, a fusion engine 265, and a feedback engine 270, among other example components and subcomponents, including components representing subdivisions or combinations of these components, among other example implementations. In one implementation, person detection engine 255 may be implemented in hardware circuitry and/or software to utilize image data generated by the camera sensor 230 and/or depth sensor 240 to identify a subarea of the overall image viewed using the camera sensor 230 that corresponds to a person using the user computing device 205 within the field of view of the camera sensor 230 and depth sensor 240. The respective data generated by the camera sensor 230 and depth sensor 240 may be cropped or focused to include data corresponding to the subarea capturing the user and exclude data outside this subarea (e.g., to assist in making the input and the models using the input more lightweight and to enhance the efficiency of these models and of the user computing device's execution of the posture detection engine, among other example benefits).
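The sketch below illustrates one possible form of this cropping step, restricting both the high-resolution frame and the coarse depth grid to a detected person subarea; the bounding-box format, the grid size, and the assumption of aligned fields of view are illustrative only.

```python
import numpy as np

def crop_to_person(rgb: np.ndarray, depth: np.ndarray, bbox: tuple) -> tuple:
    """Crop RGB and low-resolution depth data to the detected person subarea."""
    x0, y0, x1, y1 = bbox            # pixel coordinates in the RGB frame (assumed format)
    h, w = rgb.shape[:2]
    dh, dw = depth.shape

    # Crop the high-resolution image to the person subarea.
    rgb_crop = rgb[y0:y1, x0:x1]

    # Map the same subarea onto the coarse depth grid (assumes aligned fields of view).
    gy0, gy1 = int(y0 / h * dh), int(np.ceil(y1 / h * dh))
    gx0, gx1 = int(x0 / w * dw), int(np.ceil(x1 / w * dw))
    depth_crop = depth[gy0:gy1, gx0:gx1]
    return rgb_crop, depth_crop

# Example: a detector (not shown) reports the user near the middle of a 720p frame.
rgb = np.zeros((720, 1280, 3), dtype=np.uint8)
depth = np.random.uniform(0.4, 1.2, size=(8, 8)).astype(np.float32)  # metres (illustrative)
rgb_person, depth_person = crop_to_person(rgb, depth, (400, 100, 880, 700))
```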
A neural network engine 260 may be executed to train and implement a first stage neural network model trained to accept image data generated by the camera sensor 230 (including image data cropped using person detection engine 255) and generate a preliminary determination of the posture of the user based on image data showing the user positioned in front of the user computing device (e.g., laptop computer, desktop computer, smart TV, etc.). In one example implementation, the output of the first stage neural network model implemented using the neural network engine 260 may be an estimation of the user's 3D pose within the environment captured by the camera sensor 230. In one example, the first stage neural network model may be implemented as a DNN topology trained on a large corpus of data (e.g., images of many different users of the same or a similar user computing device in various lighting, at various distances from the user computing device, among other variables, from which the ground truth of each user's true posture is known), among other examples. In one example, the neural network model may be a unified body model with a number of shape parameters, including vertices and joints corresponding to elements of the human anatomy (e.g., the SMPL-X model or another model of human anatomy and structure). The neural network model may output a feature set (based on the 2D image input) that defines the positions and/or angles of various joints of the human user identified in the 2D image, including positions of joints corresponding to the spine, neck/skull, shoulders, elbows, wrists, hips, knees, and ankles, among other examples. In some implementations, the camera of the user computing device (e.g., in a laptop, smartphone, desktop, etc.) may only capture the upper body of the user, and the neural network model may focus on generating a feature set from an image input corresponding to upper body joints. In other implementations, such as a smart TV or video game console, where the user may typically be positioned a greater distance away from the user computing device and its camera, the camera captures more than the upper body, allowing additional features of the user's position and posture to be determined using the trained neural network model (e.g., to detect positioning and angles of joints at the hips, knees, ankles, lower back, etc.), among other example implementations. Correct or incorrect posture may be determined from this feature set (e.g., based on the combination of joint positions and angles corresponding to joint positions and/or angles characteristic of a healthy or incorrect posture).
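As a simplified illustration of how such a joint feature set might be reduced to a posture judgment, the sketch below computes the angle at the neck joint from hypothetical 3D joint estimates and flags excessive forward flexion; the joint names, the angle rule, and the 25-degree threshold are assumptions, not prescribed values.

```python
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at joint b (degrees) formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

def neck_flexion_ok(joints: dict, max_deviation_deg: float = 25.0) -> bool:
    """Flag slouching when the spine-neck-head angle deviates too far from straight."""
    angle = joint_angle(joints["spine"], joints["neck"], joints["head"])
    return abs(180.0 - angle) <= max_deviation_deg

# Toy first-stage output: 3D joint estimates (metres, camera coordinates, illustrative).
joints = {
    "spine": np.array([0.0, 0.30, 0.60]),
    "neck":  np.array([0.0, 0.55, 0.60]),
    "head":  np.array([0.0, 0.70, 0.48]),  # head leaning toward the screen
}
print("posture ok" if neck_flexion_ok(joints) else "posture needs correction")
```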
Turning to
To address errors and inaccuracy of an image-only posture detection system, in some implementations, the system may be further enhanced to utilize an inexpensive, easy-to-integrate, and/or low-resolution depth sensor provided in addition to a webcam or other camera sensor of the user computing device to improve the accuracy of the posture detection system, such as improving the accuracy of measurements of how much the user leans toward or away from the device, as well as the angles of arms, wrists, hands, legs, or feet that face the camera of the user computing device. For instance, a user computing device may include a simple, low-cost ToF sensor. The ToF sensor may provide depth information over the image at a resolution orders of magnitude lower than the (e.g., megapixel-level) resolution of the camera (e.g., on the order of 8×8, 16×16, 16×9, etc.). At least a subset of the ToF sensor's readings may be identified as corresponding to points on the user's body and may correspond in time to a respective image collected by the camera of the user computing device (e.g., capturing the upper body of a user while the user sits at a PC user computing device). The data from the ToF sensor can be integrated into a neural network model (e.g., a DNN) used for posture detection by adding additional inputs at the input layer of the DNN. During training on data that includes both the ToF and camera sensor outputs, the neural network model may automatically learn to make use of the additional low-resolution depth information to minimize the cost function that corresponds to the ground truth during training. Once trained, the sensor fusion model will make use of the depth information to make a better prediction of the user's posture and allow the posture detection subsystem to generate and deliver more accurate results to the user.
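A minimal sketch of this input-layer fusion, written with PyTorch purely for illustration, is shown below; the layer widths, joint count, and 8×8 depth grid are assumptions, and a deployed model would be trained against ground-truth posture data as described above.

```python
import torch
import torch.nn as nn

class PoseDepthFusionNet(nn.Module):
    """Toy fusion model: preliminary pose features + flattened ToF depth at the input layer."""

    def __init__(self, pose_dim: int = 17 * 3, depth_pixels: int = 8 * 8, out_dim: int = 17 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + depth_pixels, 128),  # extra inputs for the ToF depth pixels
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, out_dim),                   # refined pose feature set
        )

    def forward(self, pose_feats: torch.Tensor, depth_grid: torch.Tensor) -> torch.Tensor:
        depth_flat = depth_grid.flatten(start_dim=1)  # (batch, 64)
        return self.net(torch.cat([pose_feats, depth_flat], dim=1))

# During training, a loss against ground-truth poses would let the network learn how
# much weight to place on the sparse depth readings.
model = PoseDepthFusionNet()
refined = model(torch.zeros(1, 51), torch.zeros(1, 8, 8))
```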
Returning to the discussion of
In some implementations, second feature set data, generated using fusion model engine 265, may be processed by a feedback engine 270 to identify whether the second feature set data describes physical features of the user indicative of a good, healthy posture by the user or, alternatively, an incorrect or harmful posture by the user. The feedback engine 270 may generate feedback data, based on the identified posture of the user, for consumption by the user. For instance, feedback data may be presented to the user through a user interface, such as display 220, a secondary display of the user computing device 205, a speaker (e.g., as audio feedback), or even via a separate coordinating device (e.g., a wearable, such as a smartwatch in communication with the user computing device), among other examples. In some implementations, presentation of feedback information indicating correct/incorrect posture or other biomechanical/ergonomic conditions (e.g., relating to neck/head positioning, arm/wrist/hand positioning on a keyboard or mouse, etc.) may be triggered in response to a single result of the sensor fusion model engine 265 (e.g., based on a single RGB image-depth map pair) or, alternatively, in response to a series of consistent results generated by the sensor fusion model engine 265 indicating that the user has been detected as maintaining a given posture or positioning over a span of time (e.g., based on a series of RGB image-depth map pairs captured by the camera sensor 230 and low-resolution depth sensor 240 and provided to a trained sensor fusion model executed using the sensor fusion model engine 265).
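The time-windowed feedback option described above might be sketched as follows; the window length, polling cadence, and message text are illustrative assumptions only.

```python
from collections import deque
from typing import Optional

class PostureFeedback:
    """Raise feedback only after a run of consistent 'incorrect posture' results."""

    def __init__(self, window: int = 30):
        # e.g., ~30 fusion-model results sampled roughly once per second
        self.recent = deque(maxlen=window)

    def update(self, posture_ok: bool) -> Optional[str]:
        """Record one posture result; return a feedback message only when warranted."""
        self.recent.append(posture_ok)
        if len(self.recent) == self.recent.maxlen and not any(self.recent):
            self.recent.clear()  # avoid re-prompting on every subsequent frame
            return "Prolonged slouching detected; consider adjusting your posture."
        return None

feedback = PostureFeedback(window=5)
for result in [False, False, False, False, False]:
    message = feedback.update(result)
    if message:
        print(message)
```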
In some implementations, posture determinations embodied in the second feature set data and/or feedback data generated by the posture detection engine 250 may be shared or transmitted by the user computing device 205 to external computing systems 280 (e.g., via one or more communication networks (e.g., 275)). Such data may be anonymized and/or encrypted and may be further processed by other trusted services, for instance, as part of a health monitoring service, therapeutic services, services offering gamification of proper work habits, social networks, etc., which may be used to further enhance the adoption and resulting health benefits that may be derived by providing users of user computing devices with real-time biofeedback to improve their biomechanical habits and body positioning when using such devices, among other example applications and potential benefits. Results generated by services or applications hosted on the user computing device 205 or one or more external computing devices (e.g., 280) may also be presented to the user (e.g., in addition to feedback information generated by the posture detection engine), among other example features.
In some cases, external computing systems (e.g., 280) with which a user computing device (e.g., 205) communicates may provide remotely hosted services, such as data storage, information services, geolocation services, and computational services (e.g., data analytics, search, diagnostics, etc.). In some cases, data generated from these remotely hosted services may be returned to the user computing device 205 to be consumed by the posture detection engine 250 or other applications hosted on the user computing device 205, for instance, to provide enhanced feedback, functionality, or other example features. One or more networks (e.g., 275) can facilitate communication between the user computing device 205 and external computing devices (e.g., 280). Such networks can include wired and/or wireless local networks, public networks, wide area networks, broadband cellular networks, the Internet, and the like.
In general, “servers,” “clients,” “computing devices,” “network elements,” “hosts,” “system-type system entities,” “user devices,” “gateways,” “IoT devices,” “sensor devices,” and “systems” (e.g., 205, 280, etc.) in an example computing environment, can include electronic computing devices operable to receive, transmit, process, store, or manage data and information associated with the computing environment. As used in this document, the term “computer,” “processor,” “processor device,” or “processing device” is intended to encompass any suitable processing apparatus. For example, elements shown as single devices within the computing environment may be implemented using a plurality of computing devices and processors, such as server pools including multiple server computers. Further, any, all, or some of the computing devices may be adapted to execute any operating system, including Linux, UNIX, Microsoft Windows, Apple OS, Apple IOS, Google Android, Windows Server, etc., as well as virtual machines adapted to virtualize execution of a particular operating system, including customized and proprietary operating systems.
In some implementations, a user computing device 205 may participate with other devices, such as wearable devices, Internet-of-Things (IoT) devices, connected home devices (e.g., home health devices), and other devices in a machine-to-machine network, such as an IoT network, a fog network, a connected home network, or other network (e.g., using wireless local area networks (WLAN), such as those standardized under the IEEE 802.11 family of standards, home-area networks such as those standardized under the Zigbee Alliance, personal-area networks such as those standardized by the Bluetooth Special Interest Group, cellular data networks, such as those standardized by the Third-Generation Partnership Project (3GPP), and other types of networks having wireless or wired connectivity).
Turning to
In one example, the respective fields of view of the camera sensor and the depth sensor of a user computing device may be configured or calibrated to align such that each captures a similar view of user(s) of the user computing device. As shown in
Continuing with the example of
As introduced above, in some implementations, the computing (e.g., processing and memory) resources of a user computing device may be relatively constrained compared to computers more specifically configured for performing machine learning operations. In some instances, a separate computing subsystem may be provided for use in executing a posture detection engine, such as a lid control hub located in a lid section (e.g., with the display) of a laptop, separate from the motherboard below the keyboard of the laptop, among other examples. The use of a sparse depth image (e.g., 410a-410b) may allow a simpler (e.g., relative to the first stage neural network model 420), lower-dimension machine learning model to be utilized to implement second stage model 430. The use of smaller models (e.g., 430) may assist with fitting such models for use on secondary or companion chips on the user computing device (e.g., separate from the CPU of the user computing device), particularly where such companion chips have lower memory footprints.
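For a rough sense of scale, the sketch below estimates the parameter count and float32 footprint of a small fusion network of the kind sketched earlier; the layer sizes are assumptions, and an actual second stage model could differ substantially.

```python
def dense_params(in_dim: int, out_dim: int) -> int:
    """Parameter count of a fully connected layer: weights plus biases."""
    return in_dim * out_dim + out_dim

# pose features (17 joints x 3) + 8x8 depth pixels -> refined pose features
layers = [(51 + 64, 128), (128, 64), (64, 51)]
total_params = sum(dense_params(i, o) for i, o in layers)
print(f"fusion model parameters: {total_params:,}")               # ~26k parameters
print(f"approx. size as float32: {total_params * 4 / 1024:.1f} KiB")
```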
Training of the machine learning models (e.g., 415, 420, 430) used in a posture detection pipeline (e.g., as illustrated in the example of
In some implementations, a low-resolution depth sensor may have a different field of view than that of the corresponding high resolution camera sensor on the user computing device. Turning to
The technology, resolution, angle, field of view, and other attributes and configurations of a depth sensor of a user computing device may be selected based on the anticipated user interactions with the user computing device. As an example, a depth sensor may be selected and configured on a given user computing device so as to effectively guarantee that the depth sensor will consistently generate a minimum or threshold number of depth pixels to measure a user of the user computing device during the user's anticipated use of the computing device. For instance, different user computing devices may adopt different depth sensors to account for the user's anticipated distance from the user computing device and its sensors during the user's (or users') use of the user computing device. The minimum or threshold number of depth pixel “hits” may correspond to a number determined to yield improvements to a 2D-image-only-based posture determination using a sensor fusion model, such as described in the example of
Processor 800 can execute any type of instructions associated with algorithms, processes, or operations detailed herein. Generally, processor 800 can transform an element or an article (e.g., data) from one state or thing to another state or thing.
Code 804, which may be one or more instructions to be executed by processor 800, may be stored in memory 802, or may be stored in software, hardware, firmware, or any suitable combination thereof, or in any other internal or external component, device, element, or object where appropriate and based on particular needs. In one example, processor 800 can follow a program sequence of instructions indicated by code 804. Each instruction enters a front-end logic 806 and is processed by one or more decoders 808. The decoder may generate, as its output, a micro-operation such as a fixed-width micro-operation in a predefined format, or may generate other instructions, microinstructions, or control signals that reflect the original code instruction. Front-end logic 806 also includes register renaming logic 810 and scheduling logic 812, which generally allocate resources and queue the operation corresponding to the instruction for execution.
Processor 800 can also include execution logic 814 having a set of execution units 816a, 816b, 816n, etc. Some embodiments may include a number of execution units dedicated to specific functions or sets of functions. Other embodiments may include only one execution unit or one execution unit that can perform a particular function. Execution logic 814 performs the operations specified by code instructions.
After completion of execution of the operations specified by the code instructions, back-end logic 818 can retire the instructions of code 804. In one embodiment, processor 800 allows out of order execution but requires in order retirement of instructions. Retirement logic 820 may take a variety of known forms (e.g., re-order buffers or the like). In this manner, processor 800 is transformed during execution of code 804, at least in terms of the output generated by the decoder, hardware registers and tables utilized by register renaming logic 810, and any registers (not shown) modified by execution logic 814.
Although not shown in
Processors 970 and 980 may also each include integrated memory controller logic (MC) 972 and 982 to communicate with memory elements 932 and 934. In alternative embodiments, memory controller logic 972 and 982 may be discrete logic separate from processors 970 and 980. Memory elements 932 and/or 934 may store various data to be used by processors 970 and 980 in achieving operations and functionality outlined herein.
Processors 970 and 980 may be any type of processor, such as those discussed in connection with other figures. Processors 970 and 980 may exchange data via a point-to-point (PtP) interface 950 using point-to-point interface circuits 978 and 988, respectively. Processors 970 and 980 may each exchange data with a chipset 990 via individual point-to-point interfaces 952 and 954 using point-to-point interface circuits 976, 986, 994, and 998. Chipset 990 may also exchange data with a co-processor 938, such as a high-performance graphics circuit, machine learning accelerator, or other co-processor 938, via an interface 939, which could be a PtP interface circuit. In alternative embodiments, any or all of the PtP links illustrated in
Chipset 990 may be in communication with a bus 920 via an interface circuit 996. Bus 920 may have one or more devices that communicate over it, such as a bus bridge 918 and I/O devices 916. Via a bus 910, bus bridge 918 may be in communication with other devices such as a user interface 912 (such as a keyboard, mouse, touchscreen, or other input devices), communication devices 926 (such as modems, network interface devices, or other types of communication devices that may communicate through a computer network 960), audio I/O devices 914, and/or a data storage device 928. Data storage device 928 may store code 930, which may be executed by processors 970 and/or 980. In alternative embodiments, any portions of the bus architectures could be implemented with one or more PtP links.
The computer system depicted in
While some of the systems and solutions described and illustrated herein have been described as containing or being associated with a plurality of elements, not all elements explicitly illustrated or described may be utilized in each alternative implementation of the present disclosure. Additionally, one or more of the elements described herein may be located external to a system, while in other instances, certain elements may be included within or as a portion of one or more of the other described elements, as well as other elements not described in the illustrated implementation. Further, certain elements may be combined with other components, as well as used for alternative or additional purposes in addition to those purposes described herein.
Further, it should be appreciated that the examples presented above are non-limiting examples provided merely for purposes of illustrating certain principles and features and not necessarily limiting or constraining the potential embodiments of the concepts described herein. For instance, a variety of different embodiments can be realized utilizing various combinations of the features and components described herein, including combinations realized through the various implementations of components described herein. Other implementations, features, and details should be appreciated from the contents of this Specification.
Although this disclosure has been described in terms of certain implementations and generally associated methods, alterations and permutations of these implementations and methods will be apparent to those skilled in the art. For example, the actions described herein can be performed in a different order than as described and still achieve the desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing may be advantageous. Additionally, other user interface layouts and functionality can be supported. Other variations are within the scope of the following claims.
In general, one aspect of the subject matter described in this specification can be embodied in methods and executed instructions that include or cause the actions of identifying a sample that includes software code, generating a control flow graph for each of a plurality of functions included in the sample, and identifying, in each of the functions, features corresponding to instances of a set of control flow fragment types. The identified features can be used to generate a feature set for the sample.
These and other embodiments can each optionally include one or more of the following features. The features identified for each of the functions can be combined to generate a consolidated string for the sample and the feature set can be generated from the consolidated string. A string can be generated for each of the functions, each string describing the respective features identified for the function. Combining the features can include identifying a call in a particular one of the plurality of functions to another one of the plurality of functions and replacing a portion of the string of the particular function referencing the other function with contents of the string of the other function. Identifying the features can include abstracting each of the strings of the functions such that only features of the set of control flow fragment types are described in the strings. The set of control flow fragment types can include memory accesses by the function and function calls by the function. Identifying the features can include identifying instances of memory accesses by each of the functions and identifying instances of function calls by each of the functions. The feature set can identify each of the features identified for each of the functions. The feature set can be an n-graph.
Further, these and other embodiments can each optionally include one or more of the following features. The feature set can be provided for use in classifying the sample. For instance, classifying the sample can include clustering the sample with other samples based on corresponding features of the samples. Classifying the sample can further include determining a set of features relevant to a cluster of samples. Classifying the sample can also include determining whether to classify the sample as malware and/or determining whether the sample is likely one of one or more families of malware. Identifying the features can include abstracting each of the control flow graphs such that only features of the set of control flow fragment types are described in the control flow graphs. A plurality of samples can be received, including the sample. In some cases, the plurality of samples can be received from a plurality of sources. The feature set can identify a subset of features identified in the control flow graphs of the functions of the sample. The subset of features can correspond to memory accesses and function calls in the sample code.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
The following examples pertain to embodiments in accordance with this Specification. Example 1 is a non-transitory machine-readable storage medium with instructions stored thereon, the instructions executable by the machine to cause the machine to: receive image data generated by a camera with a first resolution, where the camera is provided on a user computing device to capture an image of a user of the user computing device; execute a first machine learning model trained to determine a first feature set associated with posture of the user from the image data; receive depth data generated by a time of flight (ToF) sensor provided on the user computing device, where the depth data has a second resolution lower than the first resolution and is generated contemporaneously with generation of the image data; provide the first feature set as a first input and the depth data as a second input to a second machine learning model to generate a second feature set as an output of the second machine learning model; and determine a posture of the user from the second feature set.
Example 2 includes the subject matter of example 1, where the image data includes two-dimensional red-green-blue (RGB) image data.
Example 3 includes the subject matter of any one of examples 1-2, where dimensions of the first feature set are lower than dimensions of the image data.
Example 4 includes the subject matter of any one of examples 1-3, where the instructions are further executable to cause the machine to: provide a first version of the image data to a person detection model to detect that a view of the user occupies a subarea of the image data; and generate a cropped version of the image data, where the cropped version of the image data includes the subarea, where the cropped version of the image data is provided as an input to the first machine learning model.
Example 5 includes the subject matter of example 4, where the instructions are further executable to cause the machine to: determine a subset of depth pixels of the depth data corresponding to the subarea; and crop the depth data to generate a cropped version of the depth data to include the subset of depth pixels, where the cropped version of the depth data is provided as the second input to the second machine learning model.
Example 6 includes the subject matter of any one of examples 1-5, where the first machine learning model includes a convolutional neural network.
Example 7 includes the subject matter of any one of examples 1-6, where the first feature set and the second feature set each define a set of features associated with whether a body part of the user is angled toward or away from the user computing device.
Example 8 includes the subject matter of example 7, where the set of features in the second feature set are more accurate than the set of features in the first feature set.
Example 9 includes the subject matter of example 8, where the body part includes a torso of a user.
Example 10 includes the subject matter of example 8, where the body part includes a limb of a user.
Example 11 includes the subject matter of any one of examples 1-10, where the camera includes a webcam integrated into the user computing device and the ToF sensor includes a low-resolution ToF sensor integrated into the user computing device.
Example 12 includes the subject matter of any one of examples 1-11, where the user computing device includes one of a laptop computer, a desktop computer, a smart television, or a gaming system.
Example 13 includes the subject matter of any one of examples 1-12, where the instructions are further executable to cause the machine to determine whether the posture of the user is correct or incorrect based on the second feature set.
Example 14 includes the subject matter of example 13, where the instructions are further executable to cause the machine to generate feedback data for presentation to the user, where the feedback data identifies whether the posture of the user is correct or incorrect.
Example 15 is a method including: receiving two-dimensional image data generated by a camera of a user computing device, where the image data includes an image of a user using the user computing device; applying a first machine learning model to the image data to generate a first feature set, where the first feature set identifies features of a pose of the user from the image data; receiving depth data generated by a depth sensor of the user computing device, where the depth data includes a grid of depth pixels and is generated contemporaneously with the image data; providing the first feature set as a first input and the depth data as a second input to a second machine learning model to generate a second feature set as an output of the second machine learning model; and determining a posture of the user from the second feature set.
Example 16 includes the subject matter of example 15, further including determining whether the posture of the user is correct or incorrect based on the second feature set.
Example 17 includes the subject matter of example 16, further including generating feedback data for presentation to the user, where the feedback data identifies whether the posture of the user is correct or incorrect.
Example 18 includes the subject matter of any one of examples 15-17, where the image data includes red-green-blue (RGB) image data.
Example 19 includes the subject matter of any one of examples 15-18, where dimensions of the first feature set are lower than dimensions of the image data.
Example 20 includes the subject matter of any one of examples 15-19, further including: providing a first version of the image data to a person detection model to detect that a view of the user occupies a subarea of the image data; and generating a cropped version of the image data, where the cropped version of the image data includes the subarea, where the cropped version of the image data is provided as an input to the first machine learning model.
Example 21 includes the subject matter of example 20, further including: determining a subset of depth pixels of the depth data corresponding to the subarea; and cropping the depth data to generate a cropped version of the depth data to include the subset of depth pixels, where the cropped version of the depth data is provided as the second input to the second machine learning model.
Example 22 includes the subject matter of any one of examples 15-21, where the first machine learning model includes a convolutional neural network.
Example 23 includes the subject matter of any one of examples 15-22, where the first feature set and the second feature set each define a set of features associated with whether a body part of the user is angled toward or away from the user computing device.
Example 24 includes the subject matter of example 23, where the set of features in the second feature set are more accurate than the set of features in the first feature set.
Example 25 includes the subject matter of example 24, where the body part includes a torso of a user.
Example 26 includes the subject matter of example 24, where the body part includes a limb of a user.
Example 27 includes the subject matter of any one of examples 15-26, where the camera includes a webcam integrated into the user computing device and the depth sensor includes a low-resolution time of flight (ToF) sensor integrated into the user computing device.
Example 28 includes the subject matter of any one of examples 15-27, where the user computing device includes one of a laptop computer, a desktop computer, a smart television, or a gaming system.
Example 29 is a system including means to perform the method of any one of examples 15-28.
Example 30 is an apparatus including: a processor; a memory; a display; a camera sensor oriented to face a human viewer of the display; a depth sensor oriented to face the human viewer of the display; and a posture detection engine executable by the processor to: receive two-dimensional image data generated by the camera, where the image data includes an image of the human viewer; provide the image data as an input to a first machine learning model to determine a first feature set, where the first machine learning model is trained to determine a pose of a human from two-dimensional images; receive depth data generated by the depth sensor contemporaneously with generation of the image data, where the depth data includes one or more depth measurements of the human viewer; provide the first feature set as a first input and the depth data as a second input to a second machine learning model to generate a second feature set as an output of the second machine learning model; determine a posture of the human viewer from the second feature set; and determine quality of the posture of the human viewer based on the second feature set.
Example 31 includes the subject matter of example 30, further including a central processing unit (CPU), where the processor is separate from the CPU, and logic implementing primary functionality of a user computing device is executed using the CPU.
Example 32 includes the subject matter of example 30, where the apparatus includes a user computing device, and the user computing device includes the processor, the display, the camera, the depth sensor, and the posture detection engine.
Example 33 includes the subject matter of example 32, where the user computing device includes one of a laptop computer, a desktop computer, a tablet computer, a smart television, or a video gaming system.
Example 34 includes the subject matter of any one of examples 30-33, where the camera and the depth sensor are embedded in a bezel, where the bezel at least partially frames the display.
Example 35 includes the subject matter of any one of examples 30-34, where the camera includes a high-resolution RGB camera and the depth sensor includes a low resolution time of flight sensor.
Example 36 includes the subject matter of any one of examples 30-35, where the posture detection engine is further to generate feedback data for presentation to the user, where the feedback data identifies whether the posture of the user is correct or incorrect.
Example 37 includes the subject matter of any one of examples 30-36, where dimensions of the first feature set are lower than dimensions of the image data.
Example 38 includes the subject matter of any one of examples 30-37, where the posture detection engine is further to: provide a first version of the image data to a person detection model to detect that a view of the user occupies a subarea of the image data; and generate a cropped version of the image data, where the cropped version of the image data includes the subarea, where the cropped version of the image data is provided as an input to the first machine learning model.
Example 39 includes the subject matter of example 38, where the posture detection engine is further to: determine a subset of depth pixels of the depth data corresponding to the subarea; and crop the depth data to generate a cropped version of the depth data to include the subset of depth pixels, where the cropped version of the depth data is provided as the second input to the second machine learning model.
Example 40 includes the subject matter of any one of examples 30-39, where the first machine learning model includes a convolutional neural network.
Example 41 includes the subject matter of any one of examples 30-40, where the first feature set and the second feature set each define a set of features associated with whether a body part of the user is angled toward or away from the user computing device.
Example 42 includes the subject matter of example 41, where the set of features in the second feature set are more accurate than the set of features in the first feature set.
Example 43 includes the subject matter of example 42, where the body part includes a torso of a user.
Example 44 includes the subject matter of example 42, where the body part includes a limb of a user.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results.