The present application claims priority to Singapore patent application 10201602513X filed on 30 Mar. 2016, which is incorporated herein by reference in its entirety for all purposes.
The following discloses methods for providing task related information to a user, user assistance systems, and computer-readable media.
Various processes in industry are very complex, and it may be difficult for a human operator or a human inspector to assess all aspects that are relevant, for example relevant to operation of a device or machine, relevant to making a decision, and/or relevant to spotting a malfunction.
As such, there may be a desire for support of human operators or human inspectors.
Furthermore, other desirable features and characteristics will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and this background of the disclosure.
According to various embodiments, a method for providing task related information to a user may be provided. The method may include: determining location information based on a spatial model; determining task information based on a task model; determining sensor information; determining output information based on the location information, task information and sensor information; and providing the output information to the user.
According to various embodiments, the spatial model may include at least one of a spatial representation of a position in a work place, a scene recognition model, a vision recognition model for recognizing a body/view orientation, a vision recognition model for estimating a distance to a target position, a vision recognition model for detecting landmarks, and a vision recognition model for recognizing related objects.
According to various embodiments, the task model may include at least one of a position in relation to the spatial model, an indication of a vision task, or an action in relation to a user interface model.
According to various embodiments, the method may further include: determining a state of a task performance; and determining the output information further based on the state.
According to various embodiments, the state may be determined based on a dynamic Bayesian network.
According to various embodiments, determining the sensor information may include or may be included in determining a visual feature of an image.
According to various embodiments, the output information may include at least one of an orientation cue, an error indication, or a contextual cue.
According to various embodiments, the method may be applied to at least one of wire harness assembly, building inspection, or transport inspection.
According to various embodiments, a user assistance system for providing task related information to a user may be provided. The user assistance system may include: a location information determination circuit configured to determine location information based on a spatial model; a task information determination circuit configured to determine task information based on a task model; a sensor configured to determine sensor information; an output information determination circuit configured to determine output information based on the location information, task information and sensor information; and an output circuit configured to provide the output information to the user.
According to various embodiments, the spatial model may include at least one of a spatial representation of a position in a work place, a scene recognition model, a vision recognition model for recognizing a body/view orientation, a vision recognition model for estimating a distance to a target position, a vision recognition model for detecting landmarks, and a vision recognition model for recognizing related objects.
According to various embodiments, the task model may include at least one of a position in relation to the spatial model, an indication of a vision task, or an action in relation to a user interface model.
According to various embodiments, the user assistance system may further include a state determination circuit configured to determine a state of a task performance. According to various embodiments, the output information determination circuit may be configured to determine the output information further based on the state.
According to various embodiments, the state determination circuit may be configured to determine the state based on a dynamic Bayesian network.
According to various embodiments, the sensor may further be configured to determine a visual feature of an image.
According to various embodiments, the output information may include at least one of an orientation cue, an error indication, or a contextual cue.
According to various embodiments, the user assistance system may be configured to be applied to at least one of wire harness assembly, building inspection, or transport inspection.
According to various embodiments, the user assistance system may further include a wearable device including the output circuit.
According to various embodiments, the wearable device may include or may be included in a head mounted device.
According to various embodiments, the output circuit may be configured to provide the output information in an augmented reality.
According to various embodiments, a non-transitory computer-readable medium may be provided. The non-transitory computer-readable medium may include instructions, which when executed by a computer, make the computer perform a method for providing task related information to a user. The method may include: determining location information based on a spatial model; determining task information based on a task model; determining sensor information; determining output information based on the location information, task information and sensor information; and providing the output information to the user.
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views and which together with the detailed description below are incorporated in and form part of the specification, serve to illustrate various embodiments, by way of example only, and to explain various principles and advantages in accordance with a present embodiment.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been depicted to scale. For example, the dimensions of some of the elements in the block diagrams or steps in the flowcharts may be exaggerated relative to other elements to help improve understanding of the present embodiment.
The following detailed description is merely exemplary in nature and is not intended to limit the invention or the application and uses of the invention. Furthermore, there is no intention to be bound by any theory presented in the preceding background of the invention or the following detailed description. It is the intent of the preferred embodiments to disclose a method and a system which are able to assist a user (for example a worker or an engineer) in various tasks (for example visual inspection or operations in industries).
According to various embodiments, a “circuit” may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing software stored in a memory, firmware, or any combination thereof. Thus, in an embodiment, a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g. a microprocessor (e.g. a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor). A “circuit” may also be a processor executing software, e.g. any kind of computer program, e.g. a computer program using a virtual machine code such as e.g. Java. Any other kind of implementation of the respective functions which will be described in more detail below may also be understood as a “circuit” in accordance with an alternative embodiment.
Various embodiments are described for devices (or systems), and various embodiments are described for methods. It will be understood that properties described for a device may also hold true for a related method, and vice versa.
Various processes in industry are very complex, and it may be difficult for a human operator or a human inspector to assess all aspects that are relevant, for example relevant to operation of a device or machine, relevant to making a decision, and/or relevant to spotting a malfunction.
According to various embodiments, devices and methods may be provided for support of human operators or human inspectors.
According to various embodiments, a wearable assistant, for example for visual inspection and operation in industries, may be provided.
Visual inspection and operation assistance may be a device or method (in other words: process) that assists human memory in making judgments, and performing specified operations on a set of procedural tasks.
According to various embodiments, a computational framework and system architecture of a wearable mobile assistant may be provided, for example for visual inspection and operation in industrial-related tasks.
In other words, location information and task information may be used to determine and present to a user information that supports the user in performing a task.
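A rough Python sketch of a single iteration of this flow is given below; all class and method names (spatial_model.localize, task_model.current_subtask, and so on) are hypothetical placeholders rather than the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OutputInformation:
    """Information presented to the user (cf. orientation cues, error indications, contextual cues)."""
    orientation_cue: Optional[Any] = None   # e.g. direction and distance to the target position
    error_indication: Optional[Any] = None  # e.g. a wrongly placed or wrongly sequenced object
    contextual_cue: Optional[Any] = None    # e.g. subtask details scaled to the user's proximity


def assist_step(spatial_model, task_model, sensor_frame) -> OutputInformation:
    """One iteration of the assistance loop (hypothetical sketch).

    spatial_model -- source of location information (where the user is)
    task_model    -- source of task information (what is to be done there)
    sensor_frame  -- sensor information, e.g. a first-person-view camera image
    """
    location_info = spatial_model.localize(sensor_frame)    # determine location information
    task_info = task_model.current_subtask(location_info)   # determine task information
    return OutputInformation(                                # determine and provide output information
        orientation_cue=task_info.direction_from(location_info),
        contextual_cue=task_info.instructions,
    )
```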
According to various embodiments, the spatial model may include at least one of a spatial representation of a position in a work place, a scene recognition model, a vision recognition model for recognizing a body orientation, a vision recognition model for estimating a distance to a target position, a vision recognition model for detecting landmarks, and a vision recognition model for recognizing related objects.
According to various embodiments, the task model may include at least one of a position in relation to the spatial model, an indication of a vision task, or an action in relation to a user interface model.
According to various embodiments, the method may further include: determining a state of a task performance; and determining the output information further based on the state.
According to various embodiments, the state may be determined based on a dynamic Bayesian network.
According to various embodiments, determining the sensor information may include or may be included in determining a visual feature of an image.
According to various embodiments, the output information may include at least one of an orientation cue, an error indication, or a contextual cue.
According to various embodiments, the method may be applied to at least one of wire harness assembly, building inspection, or transport inspection.
According to various embodiments, the spatial model may include at least one of a spatial representation of a position in a work place, a scene recognition model, a vision recognition model for recognizing a body/view orientation, a vision recognition model for estimating a distance to a target position, a vision recognition model for detecting landmarks, and a vision recognition model for recognizing related objects.
According to various embodiments, the task model may include at least one of a position in relation to the spatial model, an indication of a vision task, or an action in relation to a user interface model.
According to various embodiments, the state determination circuit 128 may be configured to determine a state of a task performance. According to various embodiments, the output information determination circuit 120 may be configured to determine the output information further based on the state.
According to various embodiments, the state determination circuit 128 may be configured to determine the state based on a dynamic Bayesian network.
According to various embodiments, the sensor 118 may further be configured to determine a visual feature of an image.
According to various embodiments, the output information may include at least one of an orientation cue, an error indication, or a contextual cue.
According to various embodiments, the user assistance system 126 may be configured to be applied to at least one of wire harness assembly, building inspection, or transport inspection.
According to various embodiments, the user assistance system 126 may further include a wearable device (not shown in the figures) including the output circuit 122.
According to various embodiments, the wearable device may include or may be included in a head mounted device.
According to various embodiments, the output circuit 122 may be configured to provide the output information in an augmented reality.
According to various embodiments, a non-transitory computer-readable medium may be provided. The non-transitory computer-readable medium may include instructions which, when executed by a computer, make the computer perform a method for providing task related information to a user (for example the method described above with reference to the figures).
A professional task in industrial visual inspection may be a knowledge-intensive activity, requiring domain knowledge and cognitive perception. Cognitive psychology identifies three categories of knowledge for intelligence: declarative knowledge, procedural knowledge and reactive knowledge.
According to various embodiments, a computational framework may be provided for a wearable mobile assistance for visual inspection in industrial applications. According to various embodiments, domain knowledge (as an example of declarative knowledge of workspace and tasks), task monitoring based on cognitive visual perception (as an example of procedural knowledge of the task), and a user interface (as an example of reactive knowledge) may be integrated based on augmented reality for real-time assistance.
According to various embodiments, the wearable assistant system may perform online tracking of a task, and may provide help on aspects of ‘where’, ‘what’, ‘how’, ‘when’, and ‘why’, which corresponds to:
In the following, a long-term memory for domain knowledge representation will be described. According to various embodiments, the long-term memory of domain knowledge may be incorporated by two models: the model of spatial knowledge (or model of spatial cognition) and the model of task representation.
In the following, the model of spatial cognition according to various embodiments will be described.
Each task of visual inspection in an industrial application may be performed in a restricted working area. The workspace may further be divided into several positions. At each position, one or more specified operations are to be performed on related objects. According to various embodiments, a hierarchical structure model may be provided to represent the spatial knowledge for a specific task of visual inspection and operation, as shown in the accompanying figures.
According to various embodiments, a frame structure may be employed to integrate both declarative knowledge of spatial information, and vision models to perform visual spatial perception. In each node, the local cognitive map may describe the allocentric location in the workspace and geometrical relations of view-points, landmarks, and other related objects. The node may also include vision recognition models (e.g. SVM (Support Vector Machine) models or image templates) for location recognition (for example scene recognition), orientation recognition, distance estimation, and detection of landmarks in the surrounding region.
Combining this spatial knowledge of an allocentric cognitive map and egocentric vision descriptions of the corresponding working position, the system according to various embodiments may be able to know where the user is, what the user is looking at, what the user should do next, and other similar information. The model of spatial cognition may cover all the positions in the working area for the tasks of visual inspection.
For example, as described above and as illustrated by the boxes 410, 412 in the figures, each position node may combine a local cognitive map with the corresponding vision recognition models.
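As a minimal sketch (field names and types are illustrative assumptions, not the disclosed data structure), one such position node of the hierarchical spatial-knowledge model could be represented as:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PositionNode:
    """One node of the hierarchical spatial-knowledge model (illustrative sketch)."""
    name: str                                   # e.g. "assembly_board_left"
    cognitive_map: Dict[str, Any] = field(default_factory=dict)
    # allocentric location in the workspace and geometric relations of
    # view-points, landmarks and other related objects
    scene_model: Any = None                     # e.g. SVM model or image template for scene/location recognition
    orientation_model: Any = None               # e.g. model for recognizing body/view orientation
    distance_model: Any = None                  # e.g. model for estimating distance to the target position
    landmark_templates: List[Any] = field(default_factory=list)   # stored images of nearby landmarks
    children: List["PositionNode"] = field(default_factory=list)  # finer-grained sub-positions
```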
In the following, a model of task representation according to various embodiments will be described.
The procedural knowledge may describe each task as a series of steps for solving a problem. A graphical model may be employed to describe the procedural knowledge of a given task, as shown in the accompanying figures.
In the frame structure, a position slot may store a pointer to a position node in the model of spatial knowledge (in other words: a position connected to the spatial model). A slot of vision tasks may describe what vision operations are to be performed based on the information from the position node in the spatial knowledge model, such as scene recognition, orientation and distance estimation, viewpoint to the working surface, and landmark or object detection. An action slot may store a pointer connecting to the user interface (UI) model to describe what kind of assistance should be provided at a given instance, based on visual perception.
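Building on the hypothetical PositionNode sketch above, a subtask frame of the task model might be represented as follows; again, the slot names are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TaskFrame:
    """One step of the task model (illustrative sketch)."""
    position: Any                                # slot pointing to a PositionNode in the spatial model
    vision_tasks: List[str] = field(default_factory=list)
    # e.g. ["scene_recognition", "orientation_estimation", "distance_estimation",
    #       "viewpoint_to_working_surface", "landmark_detection", "object_detection"]
    action: Optional[Any] = None                 # slot pointing to the user interface model,
                                                 # i.e. what assistance should be presented
    next_steps: List["TaskFrame"] = field(default_factory=list)   # edges of the graphical task model
```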
In the following, working memory for task tracking and monitoring according to various embodiments will be described.
According to various embodiments, once a task is selected, a dynamic model of the procedure may be generated by extracting related knowledge and information from the spatial and task models in long-term memory. According to various embodiments, a graphical model to represent the task in working memory and a dynamic Bayesian network (DBN) model for state tracking may be provided.
According to various embodiments, a DBN model may be provided to describe the dynamic procedure of a specific task. One particular state may be described as a t-slice DBN, as shown in illustration 604 of the accompanying figures.
Assuming that the task takes T time steps (wherein it will be clear from the context whether T refers to a time or to a node of a task, as in the figures), the sequence of visual observations may be denoted Y_T = (y_1, . . . , y_T), the sequence of subtask states may be denoted S_K, and a joint probability P(Y_T, S_K) of the observations and states may be defined on the DBN (referred to as (1) in the following).
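A plausible explicit form of (1), sketched under the assumption of a first-order Markov chain over subtask states with one observation per time step (the original expression may differ in detail), is:

```latex
% Joint probability of the observation sequence Y_T and the state sequence,
% factorized over the DBN (sketch).
P(Y_T, S_K) \;=\; p(s_1)\, p(y_1 \mid s_1)\,
    \prod_{t=2}^{T} p(s_t \mid s_{t-1})\, p(y_t \mid s_t) \tag{1}
```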
The prior and state transition pdfs (probability density functions) are defined on the task knowledge representation. The probability p(s_k | s_{k−1}) is high if the operation for subtask s_{k−1} has been completed in the previous time steps; otherwise, it is low. The observation probability p(y_t | s_k) may be defined on the models of task and spatial knowledge. If the scene and objects related to subtask s_k are observed, the probability p(y_t | s_k) is high; otherwise, it is low. If the sequence of visual observations matches the description of the task (e.g., the scene matches the position, the viewpoint matches the working surface, and the activity matches the operation), the joint probability P(Y_T, S_K) is high; otherwise, it is low.
According to various embodiments, the joint probability (1) may be exploited to perform online state inference for state tracking. At any time t during the task, it may be desired to estimate the user's state s_t according to the observations made so far. According to (1), this may be expressed as a filtering posterior over s_t; from (1), the corresponding log pdf may be obtained; and the current state can then be obtained as the state maximizing this posterior.
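Assuming standard recursive Bayesian (HMM-style) filtering on the DBN of (1), these three quantities may, for example, take the following form (a sketch rather than a verbatim reproduction of the original expressions):

```latex
% Filtering posterior of the current state given the observations made so far:
p(s_t \mid y_{1:t}) \;\propto\; p(y_t \mid s_t)
    \sum_{s_{t-1}} p(s_t \mid s_{t-1})\, p(s_{t-1} \mid y_{1:t-1})

% Log pdf obtained from (1):
\log P(Y_T, S_K) \;=\; \log p(s_1) + \log p(y_1 \mid s_1)
    + \sum_{t=2}^{T} \bigl[\, \log p(s_t \mid s_{t-1}) + \log p(y_t \mid s_t) \,\bigr]

% Current state estimate:
\hat{s}_t \;=\; \arg\max_{s_t} \; p(s_t \mid y_{1:t})
```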
In the following, vision functions according to various embodiments will be described. Various vision functions, such as image classification for scene recognition, image recognition and retrieval for working place recognition, viewpoint estimation for spatial perception in working point, object detection, sign detection and text recognition, hand segmentation and gesture recognition for action recognition, may be provided in the framework according to various embodiments to perform working state monitoring.
According to various embodiments, various computer vision techniques may be employed and customized for tasks in different industrial applications. According to various embodiments, various vision functions may be provided which may be deployed for general scenarios, while customized for special situations.
In the following, scene recognition according to various embodiments will be described. To help a user in a task, it may be important to know where the user is. According to various embodiments, a vision-based scene recognition for workplace and position recognition may be provided. According to the domain knowledge representation, the system may perform scene recognition in hierarchical levels. First, at a top level, the scene recognition algorithm may classify the observed scenes into two categories: workspace or non-workspace. If the user is within the workspace area, a multi-class scene recognition may be performed to estimate the user's position, so that the system can predict what subtask the user has to perform.
According to various embodiments, a scene recognition model for workspace and position recognition may be provided. For the general case, SVM models may be trained only on gradient features. When special scenes are to be considered, the model may be extended to involve color features based on semantic color names. According to various embodiments, for example when applied to wire routing, at the top level, the scene recognition model may be trained to recognize whether the user has entered the working area and is facing the assembly board in the correct orientation.
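A minimal sketch of such a two-level scene recognizer, assuming scikit-learn linear SVMs trained on HOG gradient features of fixed-size first-person-view frames (the actual features and training procedure are not specified here), could look as follows:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC


def gradient_features(image_gray):
    """Global gradient descriptor of a first-person-view frame (assumes a fixed frame size)."""
    return hog(image_gray, pixels_per_cell=(32, 32), cells_per_block=(2, 2))


class HierarchicalSceneRecognizer:
    """Top level: workspace vs. non-workspace; second level: position within the workspace."""

    def __init__(self):
        self.workspace_clf = LinearSVC()   # binary classifier
        self.position_clf = LinearSVC()    # multi-class classifier (one position per class)

    def fit(self, images, workspace_labels, position_labels):
        X = np.array([gradient_features(im) for im in images])
        self.workspace_clf.fit(X, workspace_labels)
        in_ws = np.asarray(workspace_labels) == 1
        self.position_clf.fit(X[in_ws], np.asarray(position_labels)[in_ws])

    def predict(self, image_gray):
        x = gradient_features(image_gray).reshape(1, -1)
        if self.workspace_clf.predict(x)[0] != 1:
            return None                               # user is outside the workspace
        return self.position_clf.predict(x)[0]        # user's position within the workspace
```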
In the following, distance and orientation estimation according to various embodiments will be described. Once the user enters a workspace, the user's visual attention may be of interest, for example whether the user is at the correct task region, or how far the user is from the target position, so as to estimate what action should be taken and what helping information should be provided.
Taking wire harness assembly as an example, once the user enters the workspace, the devices or methods according to various embodiments may keep estimating the user's distance and orientation (i.e. working position), so that they can understand the user's current state, predict the user's next action, and determine the required guidance in the task. Instead of precisely detecting keypoints for a 3D reconstruction of the scene and viewpoint, which depends on 3D sensors, a vision method based on cognitive spatial perception of the user's workspace position may be provided according to various embodiments.
According to various embodiments, in cognitive concepts of spatial relations to a working place and operation point, when a user is standing facing a working board, the visual attention may be semantically described as "direct" to the board, or as looking at the "up", "down", "left" or "right" side, and the distance may be represented as "close", "near", "moderate", "far" and "far away". The definitions of such cognitive concepts may be fuzzy, but they may be informative enough for a user to understand his/her situation and make a decision on the next action.
According to various embodiments, a learning method may be provided to learn such spatial concepts during working just from FPV (first person view) images. The tilt angles of viewpoints may be roughly classified into 3 categories, i.e., ‘−1’ for “up”, ‘0’ for “direct”, and ‘+1’ for “down”, and the pan angles of viewpoints may be roughly classified into 5 categories as ‘−2’ for “far-left”, ‘−1’ for “left”, ‘0’ for “direct”, ‘+1’ for “right” and ‘+2’ for “far-right”, respectively.
According to various embodiments, the distance to the board may be quantified into 5 categories, for example '1' for "close", '2' for "near", '3' for "moderate", '4' for "far", and '5' for "far away". According to various embodiments, a mapping from an input image to a set of scores representing cognitive spatial concepts on pan and tilt angles, as well as distance to the working location, may be learned.
For an image from a working position, first a PHOG (Pyramid Histogram of Oriented Gradients) descriptor may be computed as a global representation of the image. The obtained image descriptor f may be a high-dimensional feature vector. PCA (Principal Component Analysis) may be used to transform f into a low-dimensional feature vector x = [x_1, . . . , x_K], where K may be selected as about 20 to 40. A hybrid linear model may be provided to learn the mapping from the feature space x ∈ R^K to the score of a cognitive spatial concept. The hybrid linear model may learn a general mapping for all samples, and a customized fine-tuning for some difficult samples. Let y represent the corresponding score of a cognitive spatial concept, e.g. the tilt angle of a viewpoint. Then the hybrid linear model may be expressed as a sum of two parts.
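One plausible instantiation of this model, sketched under the assumption that the fine-tuning bias is a residual correction taken from the neighbourhood of x in the complex training subset (the exact original formula may differ), is:

```latex
% Hybrid linear model: general regression plus a customized fine-tuning bias (sketch).
y \;=\; \underbrace{\mathbf{w}^{\top}\mathbf{x} + b}_{\text{general linear regression}}
   \;+\; \underbrace{\delta_{\mathcal{N}(\mathbf{x})}}_{\text{fine-tuning bias from the complex-sample neighbourhood}}
```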
where the first part (in other words: the first summand) may be a general linear regression model trained on all samples, and the second part (in other words: the second summand) may be an additional fine-tuning bias customized on a neighbourhood sample in a complex training set. The hybrid model may be trained in two steps. In a first step, the general model may be trained on all the training samples. Then, in a second step, the top 20% of the most complex samples may be selected; the general model may be applied to these samples, and the customized fine-tuning bias may be derived for them.
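A compact Python sketch of the corresponding pipeline is given below; it assumes a HOG-style global descriptor in place of PHOG, ridge regression as the general linear model, and a nearest-neighbour residual as the fine-tuning bias, so it is an illustrative approximation rather than the disclosed method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.neighbors import NearestNeighbors


class HybridSpatialConceptRegressor:
    """Maps an image descriptor to a cognitive spatial score (e.g. tilt, pan or distance category)."""

    def __init__(self, n_components=30, complex_fraction=0.2):
        self.pca = PCA(n_components=n_components)
        self.general = Ridge()          # general linear model trained on all samples
        self.nn = None                  # neighbourhood lookup over the complex samples
        self.residuals = None           # fine-tuning biases of the complex samples
        self.complex_fraction = complex_fraction

    def fit(self, descriptors, scores):
        X = self.pca.fit_transform(descriptors)         # f -> low-dimensional x
        y = np.asarray(scores, dtype=float)
        self.general.fit(X, y)
        residual = y - self.general.predict(X)
        # keep the top 20% most complex samples (largest residuals, an assumption) for fine-tuning
        n_complex = max(1, int(self.complex_fraction * len(y)))
        idx = np.argsort(-np.abs(residual))[:n_complex]
        self.nn = NearestNeighbors(n_neighbors=1).fit(X[idx])
        self.residuals = residual[idx]

    def predict(self, descriptors):
        X = self.pca.transform(descriptors)
        y = self.general.predict(X)
        dist, nbr = self.nn.kneighbors(X)
        # add the fine-tuning bias of the nearest complex sample when it is close enough
        close = dist[:, 0] < np.median(dist)
        y[close] += self.residuals[nbr[close, 0]]
        return y
```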
In the following, landmark recognition according to various embodiments will be described. In industrial inspection, there may often be a few specific and distinctive places and objects related to a task. These scenes and objects may be recognized by employing image matching techniques. According to various embodiments, a few images of the landmark may be stored in the spatial model. When approaching the working position, the input images may be compared with the stored images for landmark recognition.
According to various embodiments, a standard CBIR (Content Based Image Retrieval) pipeline with SIFT (Scale-Invariant Feature Transform) features may be used. A short list of candidates may be found with an inverted file system (IFS), followed by geometric consistency checks with RANSAC (Random Sample Consensus) on the top matches. If no landmark image passes RANSAC, the top match from the IFS may be declared to be the matching landmark image.
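A simplified version of such a landmark matcher, using OpenCV SIFT features, Lowe's ratio test as a stand-in for the inverted-file candidate short list, and a RANSAC homography check for geometric consistency, might look as follows:

```python
import cv2
import numpy as np


def match_landmark(query_gray, landmark_images, min_inliers=12):
    """Return the index of the best-matching landmark image, or None (illustrative sketch)."""
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    kq, dq = sift.detectAndCompute(query_gray, None)
    best_idx, best_inliers = None, 0
    for i, landmark in enumerate(landmark_images):
        kl, dl = sift.detectAndCompute(landmark, None)
        if dq is None or dl is None:
            continue
        # Lowe ratio test (a stand-in for the inverted-file candidate short list)
        pairs = matcher.knnMatch(dq, dl, k=2)
        good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        if len(good) < 4:
            continue
        src = np.float32([kq[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kl[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        # geometric consistency check with RANSAC
        _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        inliers = int(mask.sum()) if mask is not None else 0
        if inliers > best_inliers:
            best_idx, best_inliers = i, inliers
    return best_idx if best_inliers >= min_inliers else None
```

The fallback described above (declaring the top retrieval match when no landmark passes RANSAC) could be added by returning the candidate with the most ratio-test matches instead of None.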
In the following, object detection according to various embodiments will be described. In a workspace position, there may be one or two (or more) specific objects related to a specified task of examination or operation. According to various embodiments, a HOG (histogram of oriented gradients) and SVM detector may be provided for object detection. The devices and methods according to various embodiments may perform active object detection under the guidance of the position, distance and viewpoint estimation in the workspace. Thus, advantageously, the devices and methods according to various embodiments may achieve fast and robust object detection.
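A bare-bones sliding-window detector of the kind mentioned above, assuming a HOG descriptor and a pre-trained scikit-learn linear SVM (single scale for brevity, with the search region already narrowed by the position, distance and viewpoint estimates), might be sketched as:

```python
from skimage.feature import hog
from sklearn.svm import LinearSVC


def detect_objects(image_gray, clf: LinearSVC, window=(128, 128), step=32, threshold=0.5):
    """Slide a window over the (already narrowed) region of interest and score it with the SVM."""
    detections = []
    h, w = image_gray.shape
    for top in range(0, h - window[0] + 1, step):
        for left in range(0, w - window[1] + 1, step):
            patch = image_gray[top:top + window[0], left:left + window[1]]
            x = hog(patch, pixels_per_cell=(16, 16), cells_per_block=(2, 2)).reshape(1, -1)
            score = clf.decision_function(x)[0]
            if score > threshold:
                # (x, y, width, height, score) of a candidate object
                detections.append((left, top, window[1], window[0], score))
    return detections
```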
In the following, sign detection and text recognition according to various embodiments will be described. In the work place, there may be signs and marks to guide the user for correct operations. Signs and marks may be specially designed for people to easily find and understand, and they may be detected by devices and methods according to various embodiments.
In the following, hand detection and gesture recognition according to various embodiments will be described. According to various embodiments, devices and methods for hand segmentation in FPV videos may be provided. First, fast super-pixel segmentation may be performed. Then, a trained SVM may classify each super-pixel as skin region or not, for example based on colour and texture distributions of the super-pixel. The connected super-pixels of skin colour may be segmented into regions of hands based on the spatial constraints from FPV.
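As an illustrative sketch only (SLIC superpixels standing in for the fast super-pixel segmentation, and simple per-superpixel colour statistics standing in for the colour and texture distributions), the hand segmentation step might look like this:

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2hsv


def segment_hands(image_rgb, skin_clf, n_segments=300):
    """Label each superpixel as skin / non-skin and return a binary hand mask (illustrative)."""
    segments = slic(image_rgb, n_segments=n_segments, compactness=10, start_label=0)
    hsv = rgb2hsv(image_rgb)
    mask = np.zeros(segments.shape, dtype=bool)
    for label in np.unique(segments):
        region = segments == label
        # simple colour statistics of the superpixel (mean and spread as a crude texture cue)
        feat = np.concatenate([
            hsv[region].mean(axis=0),
            hsv[region].std(axis=0),
        ]).reshape(1, -1)
        if skin_clf.predict(feat)[0] == 1:     # pre-trained SVM: skin vs. non-skin superpixel
            mask |= region
    return mask
```

Connected skin-coloured superpixels in the returned mask could then be grouped into hand regions using the FPV spatial constraints mentioned above.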
According to various embodiments, HMM (hidden Markov model) or DBN may be trained for hand gesture recognition.
In the following, a user interface according to various embodiments will be described.
According to various embodiments, an augmented reality interface may advantageously provide the ability to front-project information that might otherwise be hidden, concealed or occluded from a user's field of view.
According to various embodiments, in the display, information may be color-coded to match that of the task, and to enable information to be clearly distinguished from other on-screen (graphic) objects. Graphical information may be scaled to accommodate different screen sizes, e.g. a wearable display compared to a portable tablet. According to various embodiments, the user interface may be designed to:
Intelligently adapt the display of information depending on the user's viewing angle and distance.
For task orientation 714, information may initially be displayed in the user interface to help guide the user to orientate into position, for example by identifying start and end points, the location of the assembly objects, and/or the location to move towards.
For task completion 730, 'on doing' the actual task (as illustrated by shaded box 722), information may flag up in the display when physical errors are identified. Furthermore, contextual information may be updated based on the user's changing movement and orientation in the inspection procedure.
For task confirmation 728, on completing the task, the user may need to check that the inspection task has been performed correctly. Here, the interface may highlight the completed sequence of a task or sub-task to enable the user to make comparisons to the real world.
The user may, for example, be an operator 712 or an engineer.
According to various embodiments, to support these three phases of operation, intelligent features in the user interface may include the ability to automatically scale graphical detail dependent on the user's proximity to the task, provide navigational cues to direct orientation, and support real-time error correction. These features may be based on the implementation of the visual functions and framework previously described.
Orientation cues 706 may be provided in the user interface 702. Graphical and audio cues may be provided to visually demonstrate the physical direction to the task. Information on the display may update directions and distance to a target object in real-time. This may be useful when orientating over a large distance. Features of the orientation cues 706 may include:
Information related to errors 708 may be provided in the user interface 702, for example related to error detection and recovery. Errors may include real-time errors detected in the inspection task, such as sequencing information in the wrong order, or the wrong placement of a target object. The system may highlight the error in the display, as well as provide suggestions for corrective actions (as illustrated in the figures).
Contextual cues 710 may be provided in the user interface 702. To reduce visual clutter and improve attention and visual search, the display of graphical information may automatically adapt and scale to the position of the user. This may advantageously reduce distractions in the environment, as information is prioritised in the task to support visual guidance. Features of the contextual cues 710 may include:
Various embodiments may be provided for wire harness assembly. The wire harness assembly industry may for example be related to aerospace, automobile, and shipping. During the wire harness process, operators are often required to sequentially assemble wires and wire bundles together on a specialized board, or work bench. Wire routing may involve a large workforce, and be very labor-intensive, resulting in high manufacturing costs. To support this process, devices and methods according to various embodiments may:
Various embodiments may be applied to building inspection. Building inspection may cover a wide spectrum of activities, from surveying exterior and interior structures and repair work, to providing reports on poor installation and on ceiling, window and floor defects. Devices and methods according to various embodiments may:
According to various embodiments, in the event that an object is incorrectly positioned, a warning message may automatically flag up in the user's field of view. Prompt messages may then be provided to correct the sub-task, such as the position to orientate the object.
On completing the inspection task, the user may request that the full structure be augmented to trace back through the order sequence.
Various embodiments may be applied to transport inspection. Inspection of transport may include trains, ships, airplanes, or other commercial vehicles. This may involve either the internal or the external inspection of the vehicle. This may, for example, be part of a surface structure of a ship, or of the internal cabin of an aircraft. Various embodiments may augment both visible and concealed information during the inspection process.
When inspecting over a wide surface area, the sequence of information around the structural surface may be augmented. According to various embodiments, a distinction may be made between faults and incorrect states. According to various embodiments, key features for inspection and scale information may be highlighted based on the user's proximity. According to various embodiments, the user may easily switch between the inspection of different object sizes (macro and micro views), e.g. the nose of an airplane versus a small fault. According to various embodiments, surface objects to inspect (e.g. vents, flaps, etc.) may be highlighted and distinguished from deviations in their structure (e.g. stress, deformation, deterioration, etc.).
The devices and methods according to various embodiments (for example according to the computational framework according to various embodiments) may assist the user in the visual guidance of inspection and operation tasks that require following a complex set of navigational steps or procedures. In this context, a ‘user’ can be a factory operator, technician, engineer or other workplace personnel.
Various embodiments provide real-time navigational guidance using an augmented visual display. This may allow hands free interaction, and an intelligent approach to displaying information in a user's field of view (i.e. FPV, First-Person-View).
According to various embodiments, a framework and algorithms may be provided which can actively detect features in the workplace environment using cognitive domain knowledge and a real-time video stream from an optical wearable camera, and which can sequence information in a dynamic interface to help reduce the working memory load and the skill demands placed on the user.
Various embodiments may provide real-time visual recognition of scene and objects, task errors and surface anomalies, may logically sequence task information to support memory and visual guidance, may provide contextual information to aid in orientation of the inspected area, may adapt the display of visual information to suit the task and environment, and/or may provide an easy to learn user interface.
Various embodiments advantageously may provide reference to information concealed or occluded from view, may help reduce human errors and uncertainty, may improve task efficacy through appropriate strategies and decision making, may reduce the need for paper documentation, and/or may avoid the need for AR markers.
Various embodiments may be used for various tasks, for example assembly, maintenance, emission monitoring, shift operation, incident reporting, control room monitoring, security patrol, equipment, and/or waste management.
Various embodiments may be used in various industries, for example manufacture, power generation, construction, oil and gas, hydro and water, petrochemical, mining, environment, and/or science and research.
While exemplary embodiments have been presented in the foregoing detailed description of the invention, it should be appreciated that a vast number of variations exist.
It should further be appreciated that the exemplary embodiments are only examples, and are not intended to limit the scope, applicability, operation, or configuration of the invention in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing an exemplary embodiment of the invention, it being understood that various changes may be made in the function and arrangement of elements and method of operation described in an exemplary embodiment without departing from the scope of the invention as set forth in the appended claims.
Foreign Application Priority Data:
Number | Date | Country | Kind
10201602513X | 30 Mar. 2016 | SG | national

PCT Filing Information:
Filing Document | Filing Date | Country | Kind
PCT/SG2017/050173 | 30 Mar. 2017 | WO | 00