Systems and methods for capturing images used to obtain visual measurements for use in simultaneous location and mapping are described herein.
Many robots are electro-mechanical machines, which are controlled by a computer. Mobile robots have the capability to move around in their environment and are not fixed to one physical location. An example of a mobile robot that is in common use today is an automated guided vehicle or automatic guided vehicle (AGV). An AGV is typically considered to be a mobile robot that follows markers or wires in the floor, or uses a vision system or lasers for navigation. Mobile robots can be found in industry, military and security environments. They also appear as consumer products, for entertainment or to perform specific tasks such as vacuum cleaning and home assistance.
In order to achieve full autonomy, a mobile robot typically needs to possess the ability to explore its environment without user-intervention, build a reliable map of the environment, and localize itself within the map. Significant research has been conducted in the area of Simultaneous Localization and Mapping (SLAM) to address this problem in mobile robotics. The development of better navigation algorithms and more accurate sensors has enabled significant progress towards building better robots.
The present invention provides a mobile robot configured to navigate an operating environment, that includes: a body having a top surface; a drive mounted to the body; a recessed structure beneath the plane of the top surface near a geometric center of the body; a controller circuit in communication with the drive, wherein the controller circuit directs the drive to navigate the mobile robot through an environment using a camera-based navigation system; and a camera including optics defining a camera field of view and a camera optical axis, wherein: the camera is positioned within the recessed structure and is tilted so that the camera optical axis is aligned at an acute angle of 30-40 degrees above a horizontal plane in line with the top surface and is aimed in a forward drive direction of the robot body, the field of view of the camera spans a frustum of 45-65 degrees in the vertical direction, and the camera is configured to capture images of the operating environment of the mobile robot.
In several embodiments, the camera is protected by a lens cover aligned at an acute angle with respect to the optical axis of the camera.
In certain embodiments, the lens cover is set back relative to an opening of the recessed structure at an acute angle with respect to the optical axis of the camera that is closer to perpendicular than an angle formed between a plane defined by the top surface and the optical axis of the camera.
In several embodiments, the acute angle is between 15 and 70 degrees.
In some embodiments, the angle formed between a plane defined by the opening in the recessed structure and the optical axis of the camera ranges between 10 and 60 degrees.
In some embodiments, the camera field of view is aimed at static features located in a range of 3 feet to 8 feet from a floor surface at a distance of 3 feet to 10 feet from the static features.
In some embodiments, the camera images contain about 6-12 pixels per inch and features at the top of the image move upward between successive images more quickly than the speed at which the mobile robot moves and features at the bottom of the image move downward between successive images more slowly than the speed at which the mobile robot moves and wherein the controller is configured to determine the speed of the mobile robot and location of features in the image in identifying disparity between successive images.
In certain embodiments, the mobile robot moves at a velocity of 220 mm per second to 450 mm per second and features lower than 45 degrees relative to the horizon will track slower than approximately 306 mm per second and features higher than 45 degrees will track faster than 306 mm per second.
In some embodiments, the optics define an f number that is between 1.8 and 2.0.
In several embodiments, the optics define a focal length that is at least 40 cm.
In some embodiments, the body further includes: a memory in communication with the controller circuit; and an odometry sensor system in communication with the drive, wherein the memory further contains a visual measurement application, a simultaneous location and mapping (SLAM) application, a landmarks database, and a map of landmarks, wherein the controller circuit directs a processor to: actuate the drive and capture odometry data using the odometry sensor system; acquire a visual measurement by providing at least the captured odometry information and the captured image to the visual measurement application; determine an updated robot pose within an updated map of landmarks by providing at least the odometry information, and the visual measurement as inputs to the SLAM application; and determine robot behavior based upon inputs including the updated robot pose within the updated map of landmarks.
In some embodiments, the landmarks database includes: descriptions of a plurality of landmarks; a landmark image of each of the plurality of landmarks and an associated landmark pose from which the landmark image was captured; and descriptions of a plurality of features associated with a given landmark from the plurality of landmarks including a 3D position for each of the plurality of features associated with the given landmark.
In certain embodiments, the visual measurement application directs the processor to: identify features within an input image; identify a landmark from the landmark database in the input image based upon the similarity of the features identified in the input image to matching features associated with a landmark image of the identified landmark in the landmark database; and estimate a most likely relative pose by determining a rigid transformation of the 3D structure of the matching features associated with the identified landmark that results in the highest degree of similarity with the identified features in the input image, where the rigid transformation is determined based upon an estimate of relative pose and the acute angle at which the optical axis of the camera is aligned above the direction of motion of the mobile robot.
In some embodiments, identifying a landmark in the input image includes comparing unrectified image patches from the input image to landmark images within the landmark database.
In several embodiments, the SLAM application directs the processor to: estimate the location of the mobile robot within the map of landmarks based upon a previous location estimate, odometry data and at least one visual measurement; and update the map of landmarks based upon the estimated location of the mobile robot, the odometry data, and the at least one visual measurement.
In some embodiments, the controller circuit further directs a processor to actuate the drive to translate the mobile robot toward a landmark identified in a previous input frame; and the visual measurement application further directs a processor to search for features of the landmark identified in the previous input image in locations above the locations in which the features were identified in the previous input image.
In some embodiments, the visual measurement application further directs the processor to generate new landmarks by: detecting features within images in a sequence of images; identifying a set of features forming a landmark in multiple images from the sequence of images; estimating 3D structure of the set of features forming a landmark and relative robot poses at the times each of the multiple images is captured using the identified set of features forming the landmark in each of the multiple images; recording a new landmark in the landmark database, where recording the new landmark comprises storing: an image of the new landmark, at least the set of features forming the new landmark, and the 3D structure of the set of features forming the new landmark; and notifying the SLAM application of the creation of a new landmark.
In several embodiments, the SLAM application directs a processor to: determine a landmark pose as the pose of the mobile robot at the time the image of the new landmark stored in the landmark database is captured; and record the landmark pose for the new landmark in the landmark database.
In some embodiments, the visual measurement application further directs a processor to estimate 3D structure of the set of features forming the landmark and relative robot poses at the times each of the multiple images is captured by minimizing reprojection error of the 3D structure of the set of features forming the landmark onto each of the multiple images for the estimated relative robot poses at the times each of the multiple images is captured.
In some embodiments, the visual measurement application further directs a processor to identify at least one landmark in multiple images from the sequence of images by comparing unrectified image patches from the images in the sequence of images.
In some embodiments, the at least one processor is a single processor directed by the visual measurement application, behavioral controller application, and SLAM application.
In a number of embodiments, the at least one processor includes at least two processors where the behavioral controller application directs a first of the at least two processors and the visual measurement application and SLAM application direct another of the at least two processors.
In some embodiments, the machine vision sensor system further includes a plurality of cameras under the top surface of the robot and having focal axes angled upward relative to the horizontal plane of the top surface and wherein at least one camera faces a reverse drive direction opposite to the forward drive direction.
In some embodiments, the machine vision sensor system includes a second camera including optics defining a camera field of view and a camera optical axis; and the second camera is positioned so that the optical axis of the optics of the second camera is aligned at an angle above the direction of motion.
In several embodiments, the machine vision system includes a stereo pair of cameras having overlapping fields of view.
In several embodiments, the camera is positioned within the recessed structure at a distance of at most 6 inches from a floor surface.
In some embodiments, the camera field of view is aimed at static features located in a range of 3 feet to 8 feet from a floor surface on a planar wall.
In several embodiments, the camera images contain about 6-12 pixels per inch.
In certain embodiments, the camera is a 320×420 VGA, 3 megapixel camera that does not have an IR filter.
Some embodiments of the invention provide a mobile robot configured to navigate an operating environment, including: a body that includes a top surface located at a height not more than 6 inches from a bottom surface, the body including a recessed structure under the top surface, the body containing: at least one processor; memory containing a behavioral controller application, where the behavioral controller application directs a processor to navigate an environment based on the captured images; a machine vision sensor system including a camera configured to capture images of the operating environment of the mobile robot, the camera including optics defining a camera field of view and a camera optical axis, where the camera is positioned within the recessed structure and is tilted so that the optical axis is aligned at an acute angle between 10 and 60 degrees above a forward drive direction of motion of the mobile robot.
In some embodiments, the body further includes: a lens cover protecting the camera, wherein the lens cover is set back relative to an opening of the recessed structure at an acute angle with respect to the optical axis of the camera that is closer to perpendicular than an angle formed between a plane defined by the opening in the recessed structure and the optical axis of the camera.
In some embodiments, the camera is a 320×420 VGA, 3 megapixel camera that does not have an IR filter.
Turning now to the drawings, particularly the
The combination of SLAM with visual sensors is often referred to as VSLAM 740. VSLAM 740 processes are typically vision and odometry-based, and enable reliable navigation in feature rich environments. Such visual techniques can be used by a vehicle, such as a mobile robot 10, to autonomously navigate an environment using a self-generated map that is continuously updated. A variety of machine vision systems have been proposed for use in VSLAM 740 including machine vision systems that include one or more cameras 125.
Systems and methods in accordance with embodiments of the invention perform VSLAM 740 using a camera 125 mounted under the top surface 108 of a mobile robot 10 and having an optical axis 155 aligned at an acute angle above the top surface 108 of a mobile robot 10. Many VSLAM 740 processes analyze disparity of features captured in a series of images in order to estimate distances to features and/or triangulate position. In general, the amount of disparity observed for a set of features between a series of images that have been captured from different vantage points determines the precision with which these features are mapped within the environment. The greater the observed disparity, the more accurate the distance measurement.
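The link between observed disparity and distance precision can be illustrated with the textbook triangulation relation for two views separated by a sideways baseline. This relation is a simplification swapped in for illustration and is not the motion model of the mobile robot 10. In the following minimal Python sketch, the focal length, baseline, and half-pixel matching error are assumed values, and the function names are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Distance to a feature from the textbook relation Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def relative_depth_spread(focal_px, baseline_m, disparity_px, match_error_px=0.5):
    """Width of the depth interval caused by a matching error, relative to the depth."""
    nearest = depth_from_disparity(focal_px, baseline_m, disparity_px + match_error_px)
    farthest = depth_from_disparity(focal_px, baseline_m, disparity_px - match_error_px)
    return (farthest - nearest) / depth_from_disparity(focal_px, baseline_m, disparity_px)


# Assumed values: 200 px focal length and 0.1 m baseline.
small_disparity_spread = relative_depth_spread(200.0, 0.1, disparity_px=2.0)
large_disparity_spread = relative_depth_spread(200.0, 0.1, disparity_px=20.0)
print(f"2 px disparity: {small_disparity_spread:.2f}")
print(f"20 px disparity: {large_disparity_spread:.3f}")
```

Under these assumptions, the same matching error produces a much narrower depth interval for the feature observed with the larger disparity.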
When a mobile robot 10 employs a forward looking camera 125 having an optical axis 155 of the lens 140 aligned parallel with the direction of forward motion, there is generally only a minimal amount of disparity that can be observed in a set of features positioned directly in front of the mobile robot 10 as the robot moves toward the features. As is discussed further below, features visible in the center of the field of view of a forward looking camera are likely to increase in scale as a mobile robot 10 moves toward the feature, with little discernable disparity between successive images of the feature. Accordingly, 3D structure of features within the center of the field of view of a forward looking camera may be difficult to ascertain from a series of images captured as the mobile robot 10 moves forward toward the features. The problem is particularly acute for small mobile robots, such as house cleaning robots, where the robot form factor dictates camera placement close to the ground (e.g., 4 inches or less above the ground). Location precision of forward looking cameras can be improved by increasing the field of view of the forward looking camera. However, increasing field of view reduces the angular resolution of the image data captured for a given image sensor resolution. Furthermore, the field of view is increased so that disparity can be observed with respect to the peripheral portions of the field of view, or the off-axis field of view, of the camera where the distortion of wide angle lenses is typically greatest and the angular resolution of the images captured by the camera is lowest.
Tilting a camera at an angle above the forward direction of motion can increase the disparity observed across the field of view of the camera when the camera moves toward an object. In particular, tilting the camera at an angle above the forward direction of motion increases the disparity observed within the center of the field of view, the portion of the camera field of view with the highest angular resolution. A feature observable within the center of the field of view of a camera tilted so that its optical axis 155 forms an acute angle relative to the horizontal axis aligned with the forward direction of motion will move upward in a series of images captured while moving toward the feature. Accordingly, objects such as (but not limited to) picture frames and televisions that are frequently hung on the walls of residences, and that have readily discernable features at any scale and in various lighting conditions, provide excellent navigation landmarks for mobile robots that operate in residential environments, such as house cleaning robots.
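The upward image motion described above can be reproduced with an ideal pinhole model. The following Python sketch is illustrative only: it considers a wall feature directly ahead of the robot, and the camera height, focal length in pixels, 35 degree tilt, and function names are assumptions rather than values or methods taken from this description.

```python
import math


def image_row_offset(focal_px, tilt_deg, feature_height_m, camera_height_m, distance_m):
    """Vertical offset in pixels (positive is up) of a wall feature from the image center."""
    elevation = math.atan2(feature_height_m - camera_height_m, distance_m)
    return focal_px * math.tan(elevation - math.radians(tilt_deg))


def vertical_shift(focal_px, tilt_deg, feature_height_m, camera_height_m,
                   d_before_m, d_after_m):
    """Change in vertical image position as the camera drives toward the wall."""
    after = image_row_offset(focal_px, tilt_deg, feature_height_m, camera_height_m, d_after_m)
    before = image_row_offset(focal_px, tilt_deg, feature_height_m, camera_height_m, d_before_m)
    return after - before


CAMERA_HEIGHT_M = 0.1  # assumed camera height above the floor
FOCAL_PX = 200.0       # assumed focal length in pixels

# Each feature lies on the optical axis of its camera when the wall is 2.0 m away.
forward_feature_m = CAMERA_HEIGHT_M
tilted_feature_m = CAMERA_HEIGHT_M + 2.0 * math.tan(math.radians(35.0))

forward_shift = vertical_shift(FOCAL_PX, 0.0, forward_feature_m, CAMERA_HEIGHT_M, 2.0, 1.9)
tilted_shift = vertical_shift(FOCAL_PX, 35.0, tilted_feature_m, CAMERA_HEIGHT_M, 2.0, 1.9)
print(f"forward looking: {forward_shift:.2f} px, tilted: {tilted_shift:.2f} px")
```

In this model the feature centered in the forward looking camera does not move in the image as the robot advances 0.1 m, whereas the feature centered in the tilted camera moves upward by several pixels.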
Tilting the camera 125 upward can also enable the mobile robot 10 to more precisely determine the 3D structure of the underside of objects hung on walls. Furthermore, tilting the camera 125 allows the mobile robot 10 to focus on an area within a typical indoor environment in which features are unchanging, such as those features imaged around door frames, picture frames and other static furniture and objects, allowing the mobile robot 10 to identify reliable landmarks repeatedly, thereby accurately localizing and mapping within an environment. Additionally, in implementations, the camera 125 on the mobile robot 10 is a 320×240 QVGA, 0.0768 MP camera (or 640×480 VGA, 0.3 MP camera) transferring each image in less than 30 milliseconds with an image processing rate of 3 frames per second. In implementations, the camera 125 has no IR filter for better detection of features in low illumination environments. In implementations, the mobile robot 10 will create new landmarks if the number of detectable features falls below a threshold for a minimum number of features for detecting a landmark. In embodiments, the threshold number of landmarks is a cluster of identifiable features detected at a rate of 1-10 landmarks per foot of travel, and preferably 3 landmarks per foot of robot travel, at a travel rate of approximately 1 ft per second or approximately 306 mm per second. In implementations, if the environment is too dark and the illumination is insufficient for feature detection, the mobile robot 10 will depend on another sensor, such as an optical dead reckoning sensor aimed at the floor (e.g., an LED or laser illuminated mouse sensor) to localize.
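The sensor figures and landmark rate quoted above follow from simple arithmetic, shown here as a short Python check. The function names are hypothetical, and the per-second landmark rate is derived from the stated per-foot rate and travel speed rather than stated directly in this description.

```python
def megapixels(width_px, height_px):
    """Total pixel count of a sensor, expressed in megapixels."""
    return width_px * height_px / 1_000_000


def landmarks_per_second(landmarks_per_foot, speed_ft_per_s):
    """Landmark detection rate implied by a per-foot rate and a travel speed."""
    return landmarks_per_foot * speed_ft_per_s


qvga_mp = megapixels(320, 240)  # 0.0768
vga_mp = megapixels(640, 480)   # 0.3072, quoted above as 0.3 MP
preferred_rate = landmarks_per_second(3.0, 1.0)  # 3.0 landmarks per second
print(qvga_mp, vga_mp, preferred_rate)
```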
Disparity measurements made with respect to features visible within the center of the field of view of the camera 125 can be made more precisely by sacrificing field of view for increased angular resolution. In several embodiments, the angular resolution achieved using a specific sensor resolution is increased relative to a forward looking camera employing a wide angle lens by utilizing a tilted camera 125 that employs a narrow field of view lens 140 having a horizontal field of view of, for example, 78 degrees. A typical distinction between narrow field of view lenses 140 and wide angle lenses is that perspective projection is generally a good approximation to the true imaging characteristics of narrow field of view lenses, whereas wide angle lenses introduce distortion. Accordingly, the angular resolution and modulation transfer function (MTF) of a lens tends to be more even over the field of view of a narrow field of view lens when compared to the variation in angular resolution and MTF experienced across the field of view of a wide angle lens. The distortion introduced by a wide angle lens can be corrected using computationally expensive distortion correction operations and disparity can be determined following rectification of the images captured by the camera. When a narrow field of view camera 125 is utilized, unrectified images can be utilized in subsequent image processing processes. Accordingly, use of narrow field of view lenses in combination with appropriate VSLAM processes can provide advantages relative to more traditional VSLAM processes using wide field of view lenses in that they avoid performing the additional computationally expensive step of rectification and can locate features and/or landmarks (i.e. groups of features) with higher precision. An example of an implementation using a narrow field of view lens to capture an environment is described in detail below with reference to
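The trade between field of view and angular resolution can be made concrete with a short calculation. In the following Python sketch, pairing the 78 degree horizontal field of view with a 320 pixel wide image is an assumption, the 150 degree wide angle lens is a hypothetical comparison, and the focal length formula assumes ideal perspective projection.

```python
import math


def average_pixels_per_degree(width_px, horizontal_fov_deg):
    """Mean angular resolution across the horizontal field of view."""
    return width_px / horizontal_fov_deg


def pinhole_focal_length_px(width_px, horizontal_fov_deg):
    """Focal length in pixels for an ideal perspective (pinhole) projection."""
    return (width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)


narrow = average_pixels_per_degree(320, 78.0)   # about 4.1 pixels per degree
wide = average_pixels_per_degree(320, 150.0)    # about 2.1 pixels per degree
focal_px = pinhole_focal_length_px(320, 78.0)   # about 197.6 pixels
print(f"narrow: {narrow:.2f} px/deg, wide: {wide:.2f} px/deg, focal: {focal_px:.1f} px")
```

For the same sensor width, the narrower lens in this example places roughly twice as many pixels on each degree of the scene.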
Many VSLAM processes rely upon having a wide baseline between observed features to perform localization. As is discussed further below, an alternative approach involves identifying related features having distinctive 3D structure as landmarks. Relative pose of a mobile robot 10 can be readily determined based upon the 3D structure of the features forming a landmark and the appearance of the features in an image captured by the mobile robot 10. Therefore, the precision of a relative pose estimate is largely dependent upon the angular precision with which the distances between features observed within an image captured by the mobile robot's 10 camera 125 can be measured. Accordingly, configuring the mobile robot 10 with a tilted camera 125 having a narrow field of view lens can achieve improved performance over the equivalent system equipped with a forward facing camera or a camera with a wider field of view and the same resolution. Implementations using a tilted camera 125 are described in detail below with reference to
In many embodiments, the field of view of a mobile robot's 10 camera 125 and the specific angle formed between the camera's optical axis 155 and the forward direction of motion are determined based upon the requirements of specific applications including (but not limited to) the frame rate of the camera 125, the speed of the mobile robot 10, and the processing capacity of the processor(s) utilized to perform image processing within the mobile robot 10. As can readily be appreciated, the greater the speed of the mobile robot 10 the greater the observed disparity between captured frames. Disparity is observed by overlapping the FOV of frames, and the mobile robot 10 moves at a speed of between 220 mm per second and 450 mm per second and preferably travels at a speed of 306 mm per second. Increasing the tilt of the camera 125 can further increase the disparity. The ability to observe the disparity depends upon the rate at which the mobile robot 10 can capture and process frames of image data in real time. A lower frame rate can be compensated for by a larger field of view. As noted above, however, increasing field of view can come at the computational cost of rectifying the captured images. Therefore, in many optional configurations of the mobile robot 10, camera 125 tilt and field of view are selected to meet the requirements of a specific mobile robot and its operating environment.
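The interaction between robot speed and frame rate can be expressed as the distance the robot travels between processed frames. The following Python sketch combines the speed range given above with the image processing rate of 3 frames per second mentioned earlier; treating that rate as the interval between frames used for disparity is an assumption, and the function name is hypothetical.

```python
def travel_per_frame_mm(speed_mm_per_s, frames_per_second):
    """Distance the robot moves between two consecutive processed frames."""
    return speed_mm_per_s / frames_per_second


PROCESSING_RATE_FPS = 3.0  # image processing rate mentioned earlier in this description

for speed in (220.0, 306.0, 450.0):
    step = travel_per_frame_mm(speed, PROCESSING_RATE_FPS)
    print(f"{speed:.0f} mm/s -> {step:.1f} mm between frames")
```

At the preferred speed of 306 mm per second, the robot advances 102 mm between processed frames under this assumption.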
The effective field of view of the navigation system 120 can be increased without decreasing angular resolution by adding additional cameras 125, 410 positioned at different locations around the mobile robot 10. In one optional aspect of the invention, each camera 125, 410 is housed within a separate recess 130 in the body 100 of the mobile robot 10 with a separate protective lens cover 135. Certain embodiments may include both a front facing tilted camera 125 and one or more rear facing tilted cameras 410. Implementations using multiple cameras are described in detail with reference to
During operation, the one or more cameras 125 mounted on the mobile robot 10 may become obstructed for any of a variety of reasons. For example, dust and debris may accumulate on the camera lens 140, or a lens cover 135, over time and with use of the mobile robot 10 and thus occlude portions of the images being captured. The mobile robot 10 is able to detect when some type of obstruction is occluding one or more portions of a camera lens 140. When an occlusion is detected, the mobile robot 10 may provide a notification that notifies a user to, for example, clean a lens cover 135 protecting the camera lens 140 in order to remove the occlusion. In order to detect the presence of an occlusion obstructing a portion of the field of view of a camera 125, some embodiments analyze the particular portions of the images providing useful information for a VSLAM process, and based on this analysis, are able to determine that certain other portions of the field of view may be occluded. In particular, some embodiments may maintain a histogram of the different portions of the field of view and a frequency with which each portion is capturing image data that is being used to generate new landmarks and/or recognize existing landmarks during the navigation of a mobile robot 10 through an environment using VSLAM. Regions that are used with low frequency can be flagged as occlusions and notifications generated accordingly.
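One possible form of the histogram-based occlusion check is sketched below in Python. The grid size, the threshold expressed as a fraction of the mean cell count, and the class and method names are all assumptions made for illustration and are not taken from this description.

```python
class FieldOfViewUsage:
    """Counts how often each region of the image contributes features to landmarks."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.counts = [[0] * cols for _ in range(rows)]

    def record_feature(self, x_norm, y_norm):
        """Record a feature used by VSLAM at normalized image coordinates in [0, 1]."""
        row = min(int(y_norm * self.rows), self.rows - 1)
        col = min(int(x_norm * self.cols), self.cols - 1)
        self.counts[row][col] += 1

    def suspected_occlusions(self, min_fraction=0.1):
        """Return cells used less often than min_fraction of the mean cell count."""
        total = sum(sum(row) for row in self.counts)
        if total == 0:
            return []  # no data yet, so nothing can be flagged
        threshold = min_fraction * total / (self.rows * self.cols)
        return [(r, c)
                for r in range(self.rows)
                for c in range(self.cols)
                if self.counts[r][c] < threshold]


usage = FieldOfViewUsage(rows=2, cols=2)
for _ in range(10):
    usage.record_feature(0.25, 0.25)  # upper left region
    usage.record_feature(0.75, 0.25)  # upper right region
    usage.record_feature(0.25, 0.75)  # lower left region
flagged = usage.suspected_occlusions()
print(flagged)
```

In this example the lower right region never contributes a feature and is the only cell flagged, which could then trigger a notification to clean the lens cover 135.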
Although much of the discussion that follows describes camera configurations used in combination with specific VSLAM processes, the techniques disclosed herein can be utilized by the mobile robot 10 configured using any of a variety of different mapping and navigation mechanisms. Accordingly, various optional configurations of the mobile robot 10 incorporating one or more tilted cameras 125 for use in navigating an environment are discussed further below.
Mobile Robots with Enhanced Vision Sensor Configurations
The mobile robot 10 incorporates a navigation system 120 including a camera 125 that can capture image data used by VSLAM processes in the navigation of the mobile robot 10 and the mapping of the environment surrounding the mobile robot 10. The tilted camera 125 used in the navigation system 120 of the mobile robot 10 is illustrated in
As shown in
The mobile robot 10 can be configured to actuate its drive 111 based on a drive command. In some embodiments, the drive command may have x, y, and θ components and the command may be issued by a controller circuit 605. The mobile robot body 100 may have a forward portion 105 corresponding to the front half of the shaped body, and a rearward portion 110 corresponding to the back half of the shaped body. The drive includes right and left driven wheel modules 111a, 111b that may provide odometry to the controller circuit 605. The wheel modules 111a, 111b are substantially disposed along a transverse axis X defined by the body 100 and include respective drive motors 112a, 112b driving respective wheels 113a, 113b. The drive motors 112a, 112b may releasably connect to the body 100 (e.g., via fasteners or tool-less connections) with the drive motors 112a, 112b optionally positioned substantially over the respective wheels 113a, 113b. The wheel modules 111a, 111b can be releasably attached to the chassis and forced into engagement with the cleaning surface by springs. The mobile robot 10 may include a caster wheel 116 disposed to support a portion of the mobile robot body 100, here, a forward portion of a round body 100. In other implementations having a cantilevered cleaning head, such as a square front or tombstone shaped robot body 100, the caster wheel is disposed in a rearward portion of the robot body 100. The mobile robot body 100 supports a power source (e.g., a battery 117) for powering any electrical components of the mobile robot 10.
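The odometry provided by the right and left wheel modules 111a, 111b can be integrated into an x, y, θ pose using standard differential-drive dead reckoning. The following Python sketch shows that standard technique, which this description does not confirm as the method used by the controller circuit 605; the wheel track value and the function name are assumptions.

```python
import math


def integrate_odometry(pose, left_mm, right_mm, wheel_track_mm):
    """Update an (x, y, theta) pose from the distance travelled by each wheel.

    pose           -- (x_mm, y_mm, theta_rad) before the motion
    left_mm        -- distance rolled by the left wheel
    right_mm       -- distance rolled by the right wheel
    wheel_track_mm -- assumed distance between the two drive wheels
    """
    x, y, theta = pose
    distance = (left_mm + right_mm) / 2.0
    delta_theta = (right_mm - left_mm) / wheel_track_mm
    # Use the mid-point heading to approximate motion along a short arc.
    heading = theta + delta_theta / 2.0
    return (x + distance * math.cos(heading),
            y + distance * math.sin(heading),
            theta + delta_theta)


straight = integrate_odometry((0.0, 0.0, 0.0), 100.0, 100.0, wheel_track_mm=200.0)
spin = integrate_odometry((0.0, 0.0, 0.0), -50.0, 50.0, wheel_track_mm=200.0)
print(straight, spin)
```

Equal wheel travel advances the pose along its heading, while equal and opposite wheel travel rotates the pose in place.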
Referring again to
In many embodiments, a forward portion 105 of the body 100 carries a bumper 115, which can be utilized to detect (e.g., via one or more sensors of the bumper sensor system 550) events including (but not limited to) obstacles in a drive path of the mobile robot 10. Depending upon the behavioral programming of the mobile robot 10, it may respond to events (e.g., obstacles, cliffs, walls) detected by the bumper 115, cliff sensors 119a-119f, and one or more proximity sensors 120a-120n by controlling the wheel modules 111a, 111b to maneuver the robot 10 in response to the event (e.g., back away from a detected obstacle).
As illustrated, a user interface 126 is disposed on a top portion of the body 100 and can be used to receive one or more user commands and/or display a status of the mobile robot 10. The user interface 126 is in communication with the controller circuit 605 carried by the robot 10 such that one or more commands received by the user interface 126 can initiate execution of a cleaning routine by the robot 10.
The mobile robot 10 may also include a camera 125 navigation system 120 embedded within the body 100 of the robot 10 beneath the top cover 108. The navigation system 120 may include one or more cameras 125 (e.g., standard cameras, volumetric point cloud imaging cameras, three-dimensional (3D) imaging cameras, cameras with depth map sensors, visible light cameras and/or infrared cameras) that capture images of the surrounding environment. In one optional configuration, the camera 125 captures images of the environment that are positioned at an acute angle relative to the axis of motion (e.g., F or A) of the mobile robot 10. For example, as illustrated in
In these embodiments, the lens 140 (
The camera 125 may optionally be tilted so that the lower periphery of the field of view of the camera 125 is unoccluded by the body 100 of the mobile robot 10. Alternatively, in implementations, the body 100 of the mobile robot 10 partially occludes a lower portion of the field of view of the tilted camera 125 and the controller circuit 605 discards this portion of the field of view when imaging features. As noted above, tilting the camera 125 can increase the amount of disparity observed across the field of view of the camera 125 as the mobile robot 10 moves through the environment. In an implementation, the mobile robot 10 employs a tilted camera 125 with optics having a sufficiently narrow field of view so that perspective projection can be assumed to be a good approximation to the true imaging characteristics of the narrow field of view lens. Subsequent image processing can be performed without rectifying the images and the camera 125 can observe features with a higher angular resolution than a wider angle lens that would also introduce distortion.
The images captured by the camera 125 may be used by VSLAM processes in order to make intelligent decisions about actions to take to maneuver the mobile robot 10 about an operating environment. While the camera 125 of the navigation system 120 is illustrated in
In addition to the camera 125 of the navigation system 120, the mobile robot 10 may include different types of sensor systems 500 in order to achieve reliable and robust autonomous movement. The additional sensor systems 500 may be used in conjunction with one another to create a perception of the mobile robot's 10 environment sufficient to allow the robot to make intelligent decisions about actions to take in that environment. The various sensor systems may include one or more types of sensors supported by the robot body 100 including, but not limited to, obstacle detection obstacle avoidance (ODOA) sensors, communication sensors, navigation sensors, range finding sensors, proximity sensors, contact sensors (e.g. bumper sensors), sonar, radar, LIDAR (Light Detection And Ranging, which can entail optical remote sensing that measures properties of scattered light to find range and/or other information of a distant target), and/or LADAR (Laser Detection and Ranging). In some implementations, the sensor system includes ranging sonar sensors, proximity cliff detectors 119a-119f, proximity sensors 120a-120n (e.g., “n” being an unlimited number in an array of proximity sensors looking out the sidewalls of the robot 10), contact sensors in the bumper sensor system 550, a laser scanner, and/or an imaging sonar.
There are several challenges involved in placing sensors on a robotics platform. First, the sensors are typically placed such that they have maximum coverage of areas of interest around the mobile robot 10. Second, the sensors are typically placed in such a way that the robot itself causes an absolute minimum of occlusion to the sensors; in essence, the sensors should not be placed such that they are blinded by the robot itself. Third, the placement and mounting of the sensors should not be intrusive to the rest of the industrial design of the platform. In terms of aesthetics, it can be assumed that a robot with sensors mounted inconspicuously is more attractive than otherwise. In terms of utility, sensors should be mounted in a manner so as not to interfere with normal robot operation (e.g., snagging on obstacles).
Additional options that can be employed in the implementation of the navigation system 120 of the mobile robot 10 are discussed further below.
In order to navigate through an environment, the mobile robot 10 may use information gathered from various different types of sensors in order to ascertain the characteristics of its surrounding environment. As noted above, the mobile robot 10 uses a navigation system 120 that includes one or more cameras 125 that capture images of the surrounding environment. The images may be provided to a VSLAM process for use in the localization and mapping of the mobile robot 10 within the environment.
In the implementation of
Returning to the implementation of
As noted above, the mobile robot 10 can optionally include a narrow field of view lens 140 that provides images in which perspective projection can be assumed to be a good approximation to the true imaging characteristics of narrow field of view lenses. Where a narrow field of view lens 140 is utilized by the mobile robot 10, the transfer of mechanical stresses from the mobile robot 10 to the lens 140 can distort the lens eliminating some of the benefits of utilizing the narrow field of view lens 140 by introducing a complex distortion correction process as part of the image processing pipeline. The design of the lens holder can play an important role in preventing the transfer of mechanical stresses from the mobile robot 10 to the lens 140 and avoiding distortion of the lens 140. Implementations of lens holders that can optionally be utilized in camera(s) 125 of the mobile robot 10 are discussed further below.
Although the mobile robot 10 is shown with a single camera 125 embedded within the top cover of the mobile robot body 100 in
Mobile Robot with Forward and Backward Cameras
The mobile robot 10 may optionally include multiple cameras distributed around the body 100 of the mobile robot. A particularly advantageous configuration involves the use of a tilted forward facing camera 125 and a tilted backward facing camera 410. The forward and backward facing tilted cameras 125, 410 can optionally be contained within separate recesses 130a, 130b within the top 108 of the body 100 of the mobile robot 10 and protected using lens covers 135a, 135b configured in a manner similar to those described above with respect to the mounting of a single recess 130 and tilted camera 125 behind a lens cover 135 with reference to
A cross-sectional view of the front and rear facing tilted cameras 125, 410 of the mobile robot 10 configured in the manner illustrated in
In some embodiments, both cameras 125, 410 may capture images of the surrounding environment and provide these images to a VSLAM process. In certain embodiments, only one of the cameras 125 or 410 provides a VSLAM process with input images. For example, the mobile robot 10 may use the front facing camera 125 to detect and track a set of features associated with a landmark while moving in a forward direction toward the landmark and, upon switching directions, use the rear facing camera 410 to detect and track the same set of features while moving away from the landmark.
The mobile robot 10 may simultaneously capture images of the surrounding environment using both the tilted front and rear cameras 125, 410, thereby capturing a larger portion of the surrounding environment in less time than a single-camera 125 enabled robot 10. The mobile robot 10 may optionally utilize a wide angle, omnidirectional, panoramic or fish-eye type lens to capture more of the surrounding environment at the expense of a decreased angular resolution. However, by using two cameras 125, 410 each with a narrowed field of view in comparison to, for example, a panoramic camera, to provide input images, the VSLAM process is able to detect a similar number of features as would be achieved using a panoramic, or similar wide field of view lens, but each feature is captured at a higher angular resolution with a narrower field of view lens (assuming comparable sensor resolution). In particular, the narrowed field of view spans a frustum of approximately 50-60 degrees in the vertical direction and is able to detect features in the environment at a height of generally 3-14 feet. As is discussed below, providing a VSLAM process with higher precision measurements of the locations of features visible within images captured by the machine vision sensor system 120 enables the VSLAM process to map the environment and localize the location of the mobile robot 10 with precision.
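The trade-off between field of view and angular resolution described above can be made concrete with a short calculation. The following is a minimal sketch, assuming a sensor with 240 pixels along the vertical axis, a uniform distribution of pixels across the field of view, and a hypothetical 180 degree wide-angle lens for comparison; the function name and the numeric values are illustrative assumptions rather than parameters of the mobile robot 10.

```python
def pixels_per_degree(pixels: int, fov_deg: float) -> float:
    """Average angular resolution along one image axis.

    Assumes pixels are spread uniformly over the field of view, which is
    only an approximation for real lenses.
    """
    if fov_deg <= 0:
        raise ValueError("fov_deg must be positive")
    return pixels / fov_deg


# Illustrative values only: same 240-pixel sensor axis behind two lenses.
narrow = pixels_per_degree(240, 60.0)   # narrow field of view lens
wide = pixels_per_degree(240, 180.0)    # hypothetical wide angle lens
ratio = narrow / wide                   # how much finer the narrow lens samples
```

Under these assumptions the narrow lens samples each degree of the scene with three times as many pixels, which is the sense in which each feature is captured at a higher angular resolution.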
Although various optional configurations of the mobile robot 10 involving tilted front and rear facing cameras 125, 410 are described above with respect to
The behavior of a mobile robot 10 is typically selected from a number of behaviors based upon the characteristics of the mobile robot's 10 surrounding operating environment and/or the state of the mobile robot 10. In many embodiments, characteristics of the environment may be ascertained from images captured by a navigation system 120. Captured images can be used by one or more VSLAM processes to map the environment surrounding the mobile robot 10 and localize the position of the mobile robot 10 within the environment.
A mobile robot controller circuit 605 (hereafter “controller circuit 605”) that can be used for VSLAM using an enhanced navigation system 120 is illustrated in
The landmarks database 650 contains information concerning a number of previously observed landmarks that the mobile robot 10 can utilize to perform visual measurements from which a relative pose can be determined. A landmark can be considered to be a collection of features having a specific 3D structure. Any of a variety of features can be utilized to identify a landmark including (but not limited to) 2D features, 3D features, features identified using Scale-invariant Feature Transform (SIFT) descriptors, features identified using Speeded Up Robust Features (SURF) descriptors, and/or features identified using Binary Robust Independent Elementary Features (BRIEF) descriptors. When the mobile robot 10 is configured as a housecleaning robot, a landmark could be (but is not limited to) a set of features identified based upon the 3D structure of the corner of a picture frame or a set of features identified based upon the 3D structure of a doorframe. Such features are based on static geometry within the room and, although the features have some illumination and scale variation, they are more readily discerned and identified in aggregate as landmarks than objects located within a lower region of the environment that are frequently displaced (e.g. chairs, trash cans, pets, etc.). In implementations, the camera 125 on the mobile robot 10 is a 320×240 QVGA, 0.0768 MP camera (or 640×480 VGA, 0.3 MP camera) that has no IR filter for better detection of features in low illumination environments. In implementations, particularly when the robot 10 is starting a new mission without storing data between runs or entering a previously unexplored area, the mobile robot 10 will create new landmarks. In implementations, the mobile robot 10 also will create new landmarks if lighting variations make previously viewed features indiscernible and the number of detectable features falls below a threshold for a minimum number of features for detecting a landmark.
In embodiments, the threshold number of landmarks is a cluster of identifiable features detected at a rate of 1-10 landmarks per foot of travel and preferably 3 landmarks per foot of robot travel at a rate of approximately 1 ft per second or approximately 306 mm per second. The robot 10 thus builds a useful localization map for features discernable at that lighting intensity and, in implementations, the robot stores one or more persistent maps with landmarks viewed at various light intensities, for example those associated with data including a time of day and calendar date associated with seasonal lighting variations. In still other implementations, if the environment is too dark and the illumination is insufficient for feature detection, the mobile robot 10 will depend on another sensor or combination of sensors, such as a wheel odometry and optical dead reckoning drift detection sensor 114 (
The visual measurement application 630 matches a portion of an input image to the landmark image and then determines relative pose based upon the spatial relationship of the features from the landmark identified in the input image and the 3D structure of the identified features from the landmark retrieved from the landmarks database 650. A variety of options exist for determining relative pose based upon the features from the landmark identified in the input image and the 3D structure of the identified features including (but not limited to) determining relative pose based upon the rigid transformation of the 3D structure that yields a spatial relationship of features most similar to that observed in the input image, thereby minimizing reprojection error. Alternatively or additionally, the rigid transformation yields an estimation of the most likely relative pose given the observed spatial relationship of features and knowledge of the statistical characteristics of the sources of error within the visual measurement system. Irrespective of the specific process utilized to determine the relative pose, the precision of the relative pose estimate is increased by more precisely measuring the 3D structure of the features forming a landmark and/or the spatial relationship between the features identified within an input image. Processes for creating new landmarks for use in visual measurements and for determining relative pose using landmarks are discussed further below.
Referring again to
The map of landmarks 640 includes a map of the environment surrounding the mobile robot 10 and the position of landmarks relative to the location of the mobile robot within the environment. The map of landmarks 640 may include various pieces of information describing each landmark in the map, including (but not limited to) references to data describing the landmarks within the landmark database.
The behavioral control application 645 controls the actuation of different behaviors of the mobile robot 10 based on the surrounding environment and the state of the mobile robot 10. In some embodiments, as images are captured and analyzed by the SLAM application 635, the behavioral control application 645 determines how the mobile robot 10 should behave based on the understanding of the environment surrounding the mobile robot 10. The behavioral control application 645 may select from a number of different behaviors based on the particular characteristics of the environment and/or the state of the mobile robot 10. The behaviors may include, but are not limited to, a wall following behavior, an obstacle avoidance behavior, and an escape behavior, among many other primitive behaviors that may be actuated by the mobile robot 10.
In several embodiments, the input/output interface 620 provides devices such as (but not limited to) sensors with the ability to communicate with the processor and/or memory. In some embodiments, the network interface 660 provides the mobile robot 10 with the ability to communicate with remote computing devices, such as computers and smartphone devices, via a wired and/or wireless data connection. Although various robot controller 605 architectures are illustrated in
The mobile robot 10 may include behavioral control applications 710 used to determine the mobile robot's behavior based upon the surrounding environment and/or the state of the mobile robot. The mobile robot 10 can include one or more behaviors that are activated by specific sensor inputs and an arbitrator determines which behaviors should be activated. Inputs can include images of the environment surrounding the mobile robot 10 and behaviors can be activated in response to characteristics of the environment ascertained from one or more captured images.
A mobile robot behavioral control application 710 configured to enable navigation within an environment based upon (but not limited to) a VSLAM process is conceptually illustrated in
The programmed behaviors 730 can include various modules that may be used to actuate different behaviors of the mobile robot 10. In particular, the programmed behaviors 730 may include a VSLAM module 740 and corresponding VSLAM database 744, a navigation module 742, and a number of additional behavior modules 743.
The VSLAM module 740 manages the mapping of the environment in which the mobile robot 10 operates and the localization of the mobile robot with respect to the mapping. The VSLAM module 740 can store data regarding the mapping of the environment in the VSLAM database 744. The data may include a map of the environment and characteristics of different regions of the map including, for example, regions that contain obstacles, other regions that contain traversable floor, regions that have been traversed, frontiers to regions that have not yet been traversed, the date and time of the information describing a specific region, and/or additional information that may be appropriate to the requirements of a specific application. In many instances, the VSLAM database 744 also includes information regarding the boundaries of the environment, including the location of stairs, walls, and/or doors. As can readily be appreciated, many other types of data may optionally be stored and utilized by the VSLAM module 740 in order to map the operating environment of the mobile robot 10. Where the VSLAM module 740 performs visual measurements and uses the visual measurements to provide relative poses as inputs to a SLAM module, the VSLAM database 744 can include a landmarks database similar to the landmarks database 650 described above.
The navigation module 742 actuates the manner in which the mobile robot 10 is to navigate through an environment based on the characteristics of the environment. For example, in implementations, the navigation module 742 may direct the mobile robot 10 to change directions, drive at a speed of approximately 306 mm per second and then slow down upon approaching an obstacle, drive in a certain manner (e.g., wiggling manner to scrub floors, or a pushing against a wall manner to clean sidewalls), or navigate to a home charging station.
Other behaviors 743 may also be specified for controlling the behavior of the mobile robot 10. Furthermore, to make behaviors 740-743 more powerful, it is possible to arbitrate between and/or chain the output of multiple behaviors together into the input of another behavior module to provide complex combination functions. The behaviors 740-743 are intended to implement manageable portions of the total cognizance of the mobile robot 10.
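Arbitration between behaviors can be sketched as follows. This is a minimal illustration, assuming a fixed-priority scheme in which the highest-priority behavior requesting control wins; the priority values, sensor keys (`range_m`, `bumper`), and command strings are hypothetical, and the arbitration policy actually used by the mobile robot 10 may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sensors = Dict[str, float]


@dataclass
class Behavior:
    name: str
    priority: int                              # hypothetical: larger value wins
    wants_control: Callable[[Sensors], bool]   # activation test on sensor inputs
    command: Callable[[Sensors], str]          # command issued when selected


def arbitrate(behaviors: List[Behavior], sensors: Sensors) -> Optional[str]:
    """Return the command of the highest-priority behavior that is active."""
    active = [b for b in behaviors if b.wants_control(sensors)]
    if not active:
        return None
    winner = max(active, key=lambda b: b.priority)
    return winner.command(sensors)


# Illustrative behaviors; thresholds are assumptions for the example.
behaviors = [
    Behavior("navigate", 1, lambda s: True, lambda s: "drive_forward"),
    Behavior("obstacle_avoidance", 5,
             lambda s: s.get("range_m", float("inf")) < 0.3,
             lambda s: "turn_away"),
    Behavior("escape", 9,
             lambda s: s.get("bumper", 0.0) > 0.5,
             lambda s: "back_up"),
]
```

With these illustrative behaviors, a clear path selects the navigation command, a nearby obstacle selects obstacle avoidance, and a bumper contact overrides both.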
Referring again to
The robot resources 760 may be a network of functional modules (e.g., actuators, drives, and groups thereof) with one or more hardware controllers. The commands of the control arbiter 750 are typically specific to the resource to carry out a given action. The specific resources with which the mobile robot 10 is configured typically depend upon the requirements of the specific application to which the mobile robot 10 is adapted.
Although specific robot controllers and behavioral control applications are described above with respect to
The mobile robot 10 can continuously detect and process information from various on board sensors in order to navigate through an environment. A process 800 that can optionally be used by the mobile robot 10 to navigate an environment is illustrated in
As the mobile robot 10 moves, image and odometry data is captured (820). In one optional aspect of the invention, the mobile robot 10 captures each new image after travelling a threshold distance following capture of a prior image, such as (but not limited to) 20 cm between images. The specific distance between image captures typically depends on factors including (but not limited to) the speed of the robot 10, the field of view of the camera 125, and the real time processing capabilities of the specific mobile robot 10 configuration. In several embodiments, the odometry data is provided by one or more different types of odometers, including a wheel odometer that captures the odometry data based on the rotation of the wheels or an optical flow odometry system that obtains the odometry data by capturing images of a tracking surface and determining distance traveled, including correcting for any heading drift, based on the optical flow observed between successive images. Other embodiments may use additional odometry sensors or combinations of these sensors as appropriate to the requirements of the specific application.
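Distance-triggered image capture of the kind described above can be sketched as follows. This is a minimal illustration, assuming planar odometry positions in meters and the 20 cm spacing given as an example; the class and method names are hypothetical.

```python
import math
from typing import Optional, Tuple


class CaptureTrigger:
    """Signals a new image capture once the robot has travelled far enough."""

    def __init__(self, threshold_m: float = 0.20) -> None:
        self.threshold_m = threshold_m
        self._last: Optional[Tuple[float, float]] = None  # position at last capture

    def should_capture(self, x: float, y: float) -> bool:
        """Return True (and record the position) when a capture is due."""
        if self._last is None:
            due = True  # always capture the first image
        else:
            travelled = math.hypot(x - self._last[0], y - self._last[1])
            due = travelled >= self.threshold_m
        if due:
            self._last = (x, y)
        return due


# Illustrative run: odometry reports a straight drive in 6 cm steps.
trigger = CaptureTrigger(0.20)
captures = [trigger.should_capture(0.06 * i, 0.0) for i in range(10)]
capture_steps = [i for i, due in enumerate(captures) if due]
```

In this run a capture is triggered at the first sample and again each time at least 20 cm of travel has accumulated since the previous capture.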
Visual measurements can be generated (825) by the mobile robot 10 based on the odometry data and the images captured by the tilted camera 125. In some embodiments, the process 800 matches the new image to a set of landmark images stored in a landmarks database and, for each match, estimates a relative pose determined relative to the landmark pose of the landmark given the 3-D structure and the feature correspondences between the new view and the landmark image.
A SLAM process can then be performed (830) using the visual measurement data and the odometry data. The mobile robot 10 can optionally maintain a map of landmarks and perform the SLAM process to estimate the location of the mobile robot 10 within this map. The SLAM process can also update the map of landmarks.
A determination (835) is made by the mobile robot 10 concerning whether the process has completed, and if so, completes. Otherwise, the process determines a new behavior (805).
The performance of a navigation process similar to the process described above with reference to
The mobile robot 10 can utilize landmarks, which are collections of features having specific visually discernable 3D structures, to perform navigation. The mobile robot 10 creates landmarks by capturing images of an environment and observing and aggregating common features between the images. By overlapping the images and measuring the disparity between each of the features and estimating changes in pose between the capture of the images, the mobile robot 10 moving at a known speed can measure the distance to each of the features. These distances can then be utilized to determine the 3D structure of the set of features that define a landmark. As is discussed above, when some or all of the features forming the landmark are subsequently observed, knowledge of the 3D structure of the features can be utilized to estimate the relative pose of the mobile robot 10. In implementations, the mobile robot 10 can take two or more images over a distance traveled in order to localize.
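The way in which disparity between two images and a known change in pose yield the distance to a feature can be illustrated with a simplified planar calculation. This is a minimal sketch, assuming the robot drives straight toward the feature's horizontal position, that the elevation angle of the feature above the horizontal is measured in each image, and that the distance travelled between the images is known from odometry; the function name is hypothetical, and a full implementation would operate on 3D feature positions.

```python
import math
from typing import Tuple


def triangulate_from_forward_motion(
    elev1_rad: float, elev2_rad: float, baseline_m: float
) -> Tuple[float, float]:
    """Locate a feature from two elevation angles taken baseline_m apart.

    Returns (horizontal distance from the first position, height above the
    camera). Uses tan(e1) = h / d and tan(e2) = h / (d - baseline).
    """
    t1, t2 = math.tan(elev1_rad), math.tan(elev2_rad)
    if abs(t2 - t1) < 1e-12:
        raise ValueError("no observable disparity between the two views")
    height = baseline_m * t1 * t2 / (t2 - t1)
    distance = height / t1
    return distance, height


# Illustrative check: feature 3 m ahead and 2 m above the camera,
# observed before and after the robot advances 0.2 m.
e1 = math.atan2(2.0, 3.0)
e2 = math.atan2(2.0, 2.8)
distance, height = triangulate_from_forward_motion(e1, e2, 0.2)
```

The guard on `t2 - t1` reflects the point made above: when a feature shows no disparity between views, its distance cannot be recovered from the motion.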
A process that can optionally be utilized by the mobile robot 10 to identify a new landmark for use in navigation is illustrated in
The 3D structure of the identified features can then be determined (906) by minimizing the reprojection error between the locations of the observed features in each of the input images and the predicted locations given a specific estimate for the 3D structure of the identified features. Due to uncertainty in the relative poses of the mobile robot 10 when the input images were captured, the mobile robot 10 can use techniques including (but not limited to) bundle adjustment to simultaneously determine (906) the 3D structure of the identified features and estimate the relative motion of the mobile robot during capture of the input images. The mobile robot 10 can optionally use any of a variety of processes for determining structure from motion including (but not limited to) a trifocal tensor method.
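The reprojection error that such a process minimizes can be written down compactly. The following is a minimal sketch, assuming an ideal pinhole camera whose axes are aligned with the world axes (x right, y down, z forward) so that each camera pose reduces to a position; the focal length, principal point, and function names are illustrative assumptions. The sketch only evaluates the cost; an optimizer such as bundle adjustment would adjust the 3D points and poses to reduce it.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(point: Point3, cam_pos: Point3,
            focal_px: float, cx: float, cy: float) -> Pixel:
    """Pinhole projection of a world point into a camera at cam_pos."""
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    if z <= 0.0:
        raise ValueError("point is behind the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def reprojection_rmse(points: Sequence[Point3],
                      observations: Sequence[Sequence[Pixel]],
                      cam_positions: Sequence[Point3],
                      focal_px: float, cx: float, cy: float) -> float:
    """Root-mean-square pixel error; observations[i][j] is point j in image i."""
    squared, count = 0.0, 0
    for cam_pos, pixels in zip(cam_positions, observations):
        for point, (u, v) in zip(points, pixels):
            pu, pv = project(point, cam_pos, focal_px, cx, cy)
            squared += (pu - u) ** 2 + (pv - v) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observations supplied")
    return math.sqrt(squared / count)


# Illustrative data: two features seen from two positions 0.2 m apart.
true_points: List[Point3] = [(0.5, -1.0, 3.0), (-0.4, -1.2, 4.0)]
cams: List[Point3] = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.2)]
obs = [[project(p, c, 230.0, 160.0, 120.0) for p in true_points] for c in cams]
error_true = reprojection_rmse(true_points, obs, cams, 230.0, 160.0, 120.0)
perturbed = [(x, y, z + 0.5) for x, y, z in true_points]
error_perturbed = reprojection_rmse(perturbed, obs, cams, 230.0, 160.0, 120.0)
```

The correct structure reproduces the observations, while a structure estimate with the wrong depths yields a larger error, which is the signal a minimization process exploits.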
Information concerning a newly identified landmark can be added (908) to the landmarks database. The mobile robot 10 can optionally associate one or more of the input images or portions of the input images and the estimate of the pose from which a specific input image was captured with a landmark in the landmarks database as landmark images and corresponding landmark poses. The mobile robot 10 can also optionally store the features and/or descriptors of the features that are associated with the landmark and a description of the 3D structure of the features. As can readily be appreciated, the specific structure of the database is typically determined based upon the requirements of a specific application and can include (but is not limited to) the use of a collection of kd-trees for performing efficient approximate nearest neighbor searches based upon observed features. The information added to the landmarks database by the mobile robot 10 can then be utilized in the determination of relative poses by the mobile robot 10 during subsequent navigation.
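Looking up an observed feature in a landmarks database can be sketched as follows. This is a minimal illustration, assuming binary descriptors (such as BRIEF) stored as Python integers and compared by Hamming distance, with one descriptor per landmark; a brute-force linear scan stands in here for an indexed approximate nearest neighbor structure such as the kd-trees mentioned above. The function names, the database layout, and the distance threshold are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_descriptor(query: int, database: Dict[str, int],
                     max_distance: int) -> Optional[str]:
    """Return the id of the closest stored descriptor, or None if none is
    within max_distance bits of the query."""
    best_id: Optional[str] = None
    best_distance = max_distance + 1
    for landmark_id, descriptor in database.items():
        distance = hamming(query, descriptor)
        if distance < best_distance:
            best_id, best_distance = landmark_id, distance
    return best_id


# Illustrative 8-bit descriptors; real descriptors are much longer.
database = {"doorframe": 0b11110000, "picture_frame": 0b00001111}
near_match = match_descriptor(0b11110001, database, max_distance=2)
no_match = match_descriptor(0b10101010, database, max_distance=2)
```

A query that differs from a stored descriptor by only a few bits is matched to that landmark, while a query that is far from every stored descriptor returns no match, which corresponds to the case in which no known landmark is visible.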
A process that can optionally be utilized by the mobile robot 10 to determine relative pose using previously created landmarks is illustrated in
When sufficient similarity exists between features in the input image and features of one or more landmarks in the landmarks database, the mobile robot 10 can determine (1006) that a known landmark from the landmark database is visible within the input image. The mobile robot 10 can then estimate (1008) the pose of the robot relative to a landmark pose associated with the landmark in the landmarks database based upon the reprojection of the 3D structure of the features of the landmark that most closely matches the spatial relationship of the features observed within the input image in which the landmark is visible. The relative pose can optionally be a description of the translation and/or rotation of the mobile robot 10 relative to the landmark pose associated with the landmark in the landmarks database. The specific manner in which the relative pose is represented is largely dependent upon the requirements of a particular application. The process of determining the most likely relative pose estimate can vary depending upon the configuration of the mobile robot 10. In one optional configuration of the mobile robot 10, a cost metric is utilized to minimize reprojection errors of the landmark features visible within an input image. In another optional configuration of the mobile robot 10, the estimation process considers likely sources of error associated with one or more of the process of estimating the 3D structure of the features forming the landmark, and/or the location of the landmark pose. In one specific configuration of the mobile robot 10, an initial estimate of relative pose is formed using reprojection of the 3D structure of the features forming the landmark and then a maximum likelihood estimation is performed assuming that the 3D structure of the features and relative pose can vary. 
In other configurations of the mobile robot 10, any of a variety of techniques can be utilized to determine relative pose based upon one or more of odometry data, the location of features in the images used to create the landmark, the 3D structure of the landmark, and/or the spatial relationship of the features associated with the landmark that are visible within the input image.
When the mobile robot 10 does not detect the presence of a known landmark within an input image, the mobile robot can optionally attempt to create (1010) a new landmark or may simply obtain another input image.
The mobile robot 10 can provide (1012) the identity of an identified landmark and the estimated relative pose to a SLAM process, which can determine a global pose estimate and/or update the global map of landmarks maintained by the mobile robot. As noted above, a mobile robot 10 can be optionally configured using any of a variety of SLAM processes that rely upon relative pose estimates determined relative to known landmarks as inputs. In implementations, the robot 10 uses FAST SLAM and BRIEF descriptors.
The processes described above for creating new landmarks and for navigating based upon previously created landmarks rely upon the spatial relationships that can be established using reprojection of the 3D structure of a set of features. As discussed further below with reference to
An example of the field of view captured by a mobile robot 10 that has a camera 125 configured such that the optical axis 155 is aligned with the forward direction of motion of the mobile robot is illustrated in
As described above with regard to
Examples of views of a scene captured by a camera 125 tilted so that its optical axis 155 forms an acute angle above the direction of motion of the mobile robot 10 are illustrated in
To illustrate the observable disparity, the three successive views 1310, 1312, 1314 are superimposed. When features do not lie on the exact centerline of the field of view of the camera 125, the features will move up and away from the centerline of the field of view of the camera. As can readily be appreciated, the presence of significant disparity can be utilized to determine distance to a feature with much greater precision than can be made from the feature observed in the successive views 1300, 1302, 1304 illustrated in
The impact of increasing observable disparity when determining the 3D structure of a landmark and estimating relative pose is illustrated by a series of simulations described below with reference to
While the simulations described above compare a forward facing camera and a camera tilted so that its optical axis is at a 30 degree angle above the direction of motion of the mobile robot, similar reductions in location uncertainty can be achieved using cameras tilted so that the optical axes of the cameras are aligned at other acute angles above the direction of motion of the mobile robot. Furthermore, although the benefits of using a tilted camera 125 to increase disparity observed in successive images captured by the camera using processes for creating landmarks and determining relative pose based upon landmarks are described above with reference to
Cameras employing wide angle lenses are typically utilized in mobile robots configured to perform VSLAM and CV-SLAM. A significant benefit of using a tilted camera 125 is that features observed within the center of the field of view of the camera can be utilized to obtain precise depth estimates. As is discussed further below, the precision with which the mobile robot 10 can determine the 3D structure of landmarks can be further increased by utilizing a tilted camera 125 configured with a narrow field of view lens 140 having increased angular resolution relative to a camera employing a sensor with the same resolution and a wider field of view lens.
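The benefit of tilting the camera for features near the center of the field of view can be illustrated numerically. The following is a minimal sketch, assuming a planar geometry in which a feature initially lies on the optical axis at a given range, the robot advances horizontally by a known distance, and the image shift is computed with an ideal pinhole model; the focal length of 230 pixels and the function name are illustrative assumptions.

```python
import math


def center_feature_shift_px(tilt_deg: float, range_m: float,
                            advance_m: float, focal_px: float) -> float:
    """Image shift of a feature that starts on the optical axis.

    The camera is tilted tilt_deg above the direction of motion and the
    robot advances advance_m along that direction.
    """
    tilt = math.radians(tilt_deg)
    # Feature position relative to the camera after the advance.
    forward = range_m * math.cos(tilt) - advance_m
    up = range_m * math.sin(tilt)
    elevation = math.atan2(up, forward)
    # Angle between the feature and the (unchanged) optical axis.
    return focal_px * math.tan(elevation - tilt)


# Illustrative comparison: feature 3 m away, robot advances 0.2 m.
forward_facing_shift = center_feature_shift_px(0.0, 3.0, 0.2, 230.0)
tilted_shift = center_feature_shift_px(30.0, 3.0, 0.2, 230.0)
```

Under these assumptions the centered feature does not move at all in the forward facing camera, so it carries no depth information, whereas the same feature shifts by several pixels in the camera tilted at 30 degrees.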
Increasing Navigation Precision with Increased Angular Resolution
Configuring the mobile robot 10 with a tilted camera 125 having a narrow field of view and increased angular resolution can increase the precision with which the mobile robot 10 can determine the 3D structure of landmarks and the precision of relative pose estimates determined based upon the reprojection of the 3D structure of the landmarks. Furthermore, the use of narrow field of view lenses can enable the mobile robot 10 to perform image processing without the computational expense (e.g., time, processing power, etc.) of rectifying the images acquired by the camera 125. As can readily be appreciated, decreasing processing load can enable the mobile robot 10 to process more images at a higher frame rate (enabling the mobile robot to move faster), decrease power consumption (increasing battery life), and/or provide the mobile robot 10 with additional processing capacity to perform other functions.
By contrast, the mobile robot 10 illustrated in
While different fields of view are illustrated in
The extent of the field of view of the camera 125 utilized by the mobile robot 10 can become largely irrelevant when some or all of the field of view of the camera 125 is occluded or severely blurred. Accordingly, the mobile robot 10 can be optionally configured to detect occlusions and notify the user of the need to inspect the mobile robot 10 to attempt to disregard the occlusion from comparisons of successive images. Occlusion detection processes that can be optionally employed by the mobile robot 10 are discussed further below.
The one or more cameras 125, 140 in the navigation system 120 of the mobile robot 10 may experience deteriorated functionality due to ongoing exposure to the elements in the environment surrounding the mobile robot 10. In particular, within an indoor environment, dust, debris, fingerprints, hair, food particles, and a variety of other objects may collect and settle on the camera lens cover(s) 135 as the mobile robot 10 cleans the environment. These obstructions may diminish the quality of images captured by the camera 125 for use in a VSLAM process and thus diminish the accuracy of the mobile robot's 10 navigation through an indoor environment. In order to maintain a sufficient level of navigation performance, the mobile robot 10 may provide a notification to a user when a determination is made that the mobile robot 10 is no longer receiving useful information from some or all of the view of at least one camera 125.
The ways in which a portion of the field of view of one of the cameras 125 of the mobile robot 10 can become occluded are as varied as the environments in which the mobile robot 10 can be configured to operate. When a portion of the field of view is occluded, the camera 125 may not provide any useful image data that can be used by navigation processes including (but not limited to) a VSLAM process. Examples of specific types of occlusions and the resulting images captured by the occluded camera 125 are conceptually illustrated in
An opaque occlusion 1806 is illustrated in
An occlusion detection process that can optionally be performed by the mobile robot 10 is illustrated in
The mobile robot 10 determines (1910) the different portions of the field of view of the camera(s) 125 in which features are identified and utilized during navigation processes including (but not limited to) VSLAM processes and updates (1915) occlusion detection data accordingly. When the mobile robot 10 is configured in such a way that the navigation process performs a VSLAM process, certain portions of an image may contain features that the VSLAM process may utilize to generate landmarks for use in performing visual measurements and mapping the environment. The mobile robot 10 can collect occlusion detection data describing the portions of the images used to generate and/or detect landmarks, and these portions of the images may correspond to different portions of the camera field of view. In one optional aspect of the invention, the mobile robot 10 may maintain a histogram of the various portions of the field of view being used to identify features. When an occlusion is present, the histogram would reflect that these portions of the field of view are not being utilized by VSLAM to generate and/or detect landmarks.
The mobile robot 10 can determine (1920) whether a certain threshold number of input images have been captured. In many embodiments, the threshold number of input images may vary and the accuracy of the occlusion detection will generally increase with a larger number of input images. When a threshold number of images have not been captured, the process continues (1905) to capture additional images. In embodiments, the threshold number of images is 1-10 images per foot of travel and preferably 3 images per foot of robot travel at a rate of approximately 1 ft per second or approximately 306 mm per second. When the threshold number of images has been captured, the process determines (1925) whether the occlusion detection data identifies one or more portions of the field of view of the camera 125 that are capturing image data that is not being used by the navigation processes. When the occlusion detection data does not identify one or more portions of the field of view that are capturing image data that is not being used by the navigation processes, the mobile robot 10 assumes that the field of view of the camera(s) 125 is unoccluded, and the occlusion detection process completes.
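The histogram-based occlusion check described above can be sketched as follows. This is a minimal illustration, assuming the field of view is divided into a coarse grid of cells, that each processed image contributes the pixel locations of the features used by the navigation process, and that a cell with no features after the threshold number of images is treated as potentially occluded; the class name, grid size, and thresholds are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


class OcclusionDetector:
    """Tracks which portions of the field of view contribute features."""

    def __init__(self, width: int, height: int,
                 cols: int = 4, rows: int = 3, min_images: int = 30) -> None:
        self.width, self.height = width, height
        self.cols, self.rows = cols, rows
        self.min_images = min_images
        self.counts = [[0] * cols for _ in range(rows)]  # histogram over cells
        self.images_seen = 0

    def update(self, feature_pixels: Iterable[Tuple[float, float]]) -> None:
        """Record the features used from one input image."""
        for u, v in feature_pixels:
            if 0 <= u < self.width and 0 <= v < self.height:
                col = int(u * self.cols / self.width)
                row = int(v * self.rows / self.height)
                self.counts[row][col] += 1
        self.images_seen += 1

    def unused_cells(self) -> Optional[List[Tuple[int, int]]]:
        """Cells (row, col) that never produced a feature, or None when
        fewer than min_images images have been processed."""
        if self.images_seen < self.min_images:
            return None
        return [(r, c) for r in range(self.rows) for c in range(self.cols)
                if self.counts[r][c] == 0]


# Illustrative run: features only ever appear in the left half of the image.
detector = OcclusionDetector(320, 240, cols=2, rows=2, min_images=3)
early = detector.unused_cells()          # too few images to decide
for _ in range(3):
    detector.update([(10.0, 10.0), (10.0, 200.0)])
suspect = detector.unused_cells()
```

In this run the right-hand cells are reported as unused once the threshold number of images has been reached, which is the condition under which a notification or cleaning action could be triggered. A practical implementation would likely compare cell counts against an expected rate rather than against zero.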
In one implementation, when the occlusion detection data identifies one or more portions of the field of view that are capturing image data that is not being used by the navigation processes, the mobile robot 10 provides (1930) a camera occlusion notification to one or more users associated with the mobile robot 10 so that the users can clear the occlusion from the field of view of the camera 125. In one optional aspect of the process, the notification is in the form of an electronic message delivered to a user using user account information located on a server with which the mobile robot 10 can communicate. For example, a user may be notified through email, text message, or other communication. In another optional aspect, the user may be notified through some form of indicator on the mobile robot 10, such as a flashing light, a sound, or another appropriate alert mechanism. The mobile robot 10 may also optionally be configured with a wiper blade, fan, air knife, and/or another appropriate cleaning mechanism that the mobile robot 10 can use to attempt to eliminate the detected occlusion.
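The notification step might be sketched as follows. The message fields, the function names `build_occlusion_notification` and `notify`, and the idea of modelling each delivery mechanism (email, text message, on-robot indicator) as a callable are assumptions made for illustration; the disclosure does not specify a message format.

```python
import json


def build_occlusion_notification(robot_id, occluded_cells):
    """Serialize a camera occlusion notification (hypothetical message format)."""
    return json.dumps(
        {
            "robot_id": robot_id,
            "event": "camera_occlusion",
            "occluded_cells": [list(cell) for cell in occluded_cells],
        },
        sort_keys=True,
    )


def notify(payload, channels):
    """Pass the payload to each delivery channel and return how many were used."""
    for send in channels:
        send(payload)
    return len(channels)


# Usage: a list stands in for a real delivery channel such as email or SMS.
outbox = []
payload = build_occlusion_notification("robot-10", [(0, 2)])
delivered = notify(payload, [outbox.append])
```

Keeping the channels as plain callables lets the same payload drive a server-side message and a local indicator without changing the detection logic.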
A communication diagram illustrating communication between the mobile robot 10, an external server and a user device in accordance with an optional aspect of the invention is illustrated in
A system for notifying user devices of an occlusion in accordance with an embodiment of the invention is illustrated in
In several embodiments, a variety of user devices can use HTTP, SMS text, or another appropriate protocol to receive messages via a network 2108 such as the Internet. In the illustrated embodiment, user devices include personal computers 2105-2106 and mobile phones 2107. In other embodiments, user devices can include consumer electronics devices such as DVD players, Blu-ray players, televisions, set top boxes, video game consoles, tablets, and other devices that are capable of connecting to a server via HTTP and receiving messages.
While the above contains descriptions of many specific optional aspects of the invention, these should not be construed as limitations on the scope of the invention, but rather as examples of different configurations thereof. Accordingly, the scope of the invention should be determined not by the examples illustrated, but by the appended claims and their equivalents.
The present application is a continuation of U.S. patent application Ser. No. 14/856,526, filed on Sep. 16, 2015, which claims the benefit of and priority from U.S. Provisional Patent Application No. 62/085,025, filed on Nov. 26, 2014, entitled “Systems and Methods for Performing Simultaneous Localization and Mapping using Machine Vision Systems” and from U.S. Provisional Patent Application No. 62/116,307 filed on Feb. 13, 2015, entitled “Systems and Methods for Performing Simultaneous Localization and Mapping using Machine Vision Systems”, the disclosures of which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
62116307 | Feb 2015 | US
62085025 | Nov 2014 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 14856526 | Sep 2015 | US
Child | 15353368 | | US