The following disclosure relates generally to techniques for automatically generating building floor plans and determining associated absolute locations for them using visual data of images and additional data captured in building interiors by multiple capture devices, and for subsequently using the generated building floor plan information in one or more manners, such as to improve navigation of the building or in other manners.
In various fields and circumstances, such as architectural analysis, property inspection, real estate acquisition and development, remodeling and improvement services, general contracting and other circumstances, it may be desirable to view information about the interior of a house, office, or other building without having to physically travel to and enter the building, including to determine actual as-built information about the building rather than design information from before the building is constructed. However, it can be difficult to effectively capture, represent and use such building interior information, including to display visual information captured within building interiors to users at remote locations (e.g., to enable a user to fully understand the layout and other details of the interior, including to control the display in a user-selected manner). In addition, while a floor plan of a building may provide some information about layout and other details of a building interior, such use of floor plans has some drawbacks in certain situations, including that floor plans can be difficult to construct and maintain, to accurately scale and populate with information about room interiors, to visualize and otherwise use, etc.
The present disclosure describes techniques for using computing devices to perform automated operations related to automatically generating building floor plans and determining associated absolute locations for the generated floor plans using visual data of images and additional data captured in building interiors by multiple data capture devices, and for subsequently using the generated building floor plan information in one or more manners. In at least some embodiments, the described techniques include automatically generating a building floor plan based at least in part on analyzing visual data of images captured at multiple image acquisition locations in a building by a camera device to determine room shapes of the rooms surrounding the image acquisition locations, and automatically determining and associating GPS (Global Positioning System) location data or other absolute location data with the generated floor plan based at least in part on additional data captured at other data capture locations at the building by a separate mobile device that moves independently from the camera device, such as by automatically determining relative positions of an image acquisition location of the camera device and a different other data capture location of the mobile device in order to extend the absolute location data from the other data capture location of the mobile device to the image acquisition location of the camera device and its surrounding room shape. The images may, for example, include panorama images or other images (e.g., rectilinear perspective images) that are acquired at image acquisition locations in or around a multi-room building (e.g., a house, office, etc.) by one or more camera devices, referred to generally herein as ‘target images’. 
In at least some embodiments, the target images and/or other acquired data may be analyzed to generate a floor plan and/or other mapping information for the building (e.g., a three-dimensional model of the building's interior, a linked group of target images with pairwise inter-image directional information, etc.), such as by using visual data and determined acquisition locations of acquired images to determine room shapes and to position such room shapes relative to each other, or to otherwise determine relative positions of acquisition locations—in at least some such embodiments, the automated analysis and use of acquired images and/or other data is further performed without having or using any acquired depth data from any depth sensors or other distance-measuring devices about distances from an acquisition location to walls or other objects in the surrounding building. Such generated floor plans and/or other mapping information may be further used in various manners in various embodiments, such as for controlling navigation of mobile devices (e.g., autonomous vehicles), for display or other presentation on one or more client devices in corresponding GUIs (graphical user interfaces), etc. Additional details are included below regarding the automated determination and use of image acquisition location information, and some or all of the techniques described herein may be performed via automated operations of a Building Floor Plan Generation and Location Determination Manager (“BFPGLDM”) system in at least some embodiments, as discussed further below.
As noted above, automated operations of the BFPGLDM system may include automatically determining relative positions of an image acquisition location of the camera device, at which one or more target panorama images (or other target images) are acquired at a building (e.g., in a room or other defined area), and a different other data capture location of the mobile device at the building at which other additional data is captured, such as to enable GPS location data or other absolute location data acquired by the mobile device at the other data capture location to be extended to the separate image acquisition location of the camera device and/or to other locations around that image acquisition location that are determined at least in part from analysis of visual data of the one or more target images (e.g., locations of a room shape of a surrounding room, such as locations of walls of that room). The relative positions of such an image acquisition location and other data capture location may in some situations be referred to as inter-location ‘pose’ information, such as to correspond to at least directions between the image acquisition location and other data capture location and optionally to relative or absolute distances between the two locations.
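As one non-exclusive illustration of extending absolute location data across such inter-location pose information, the following Python sketch offsets a known GPS fix by a compass bearing and a distance using the standard great-circle destination formula. The function name `extend_absolute_location`, the spherical Earth radius, and the representation of the pose as a single bearing and distance are assumptions made for illustration only and are not details of the BFPGLDM system.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean radius of a spherical Earth model


def extend_absolute_location(lat_deg, lon_deg, bearing_deg, distance_m):
    """Return (lat, lon) in degrees reached from a known fix.

    lat_deg, lon_deg: absolute location of the mobile device's data capture location.
    bearing_deg: compass bearing (0 = north, 90 = east) toward the image acquisition location.
    distance_m: distance between the two locations, in meters.
    """
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance along the sphere

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


if __name__ == "__main__":
    # Hypothetical values: a mobile device fix, with the camera 4.2 m away to the southeast.
    print(extend_absolute_location(47.6062, -122.3321, 135.0, 4.2))
```

The same offset could then be applied to each wall location of the room shape surrounding the image acquisition location, assuming those wall locations are expressed as bearings and distances from that acquisition location.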
As noted above, the additional data captured by the mobile device at a data capture location may be used as part of determining the position of an image acquisition location of the camera device relative to that data capture location of the mobile device, and may have various forms in various embodiments, with non-exclusive examples of types of additional captured data including one or more of the following: absolute location data for the mobile device, such as from one or more GPS receivers on the mobile device and/or that may be received or determined in other manners (e.g., via other wireless transmissions, such as Bluetooth, NFC, etc.; via relative positions to other nearby objects with known absolute locations; etc.); one or more additional images (e.g., non-panoramic perspective images) having additional visual data; motion data for the mobile device, such as from one or more IMU (inertial measurement unit) sensors on the mobile device; geographical directional data, such as from a compass sensor on the mobile device; etc. While the mobile device is referred to in the singular at times herein, it will be appreciated that multiple mobile devices may be used in some embodiments and situations for a given building, such as different mobile devices that capture additional data at different times (e.g., during different data acquisition sessions) and/or at different data capture locations (whether in the same or different rooms or other defined areas as one or more other mobile devices) and/or of different types. In addition, such a mobile device may have various forms in various embodiments, including as a mobile computing device (e.g., a smart phone, a tablet or laptop computer, etc.) that includes computing capabilities and that may be used to perform at least some of the automated operations. Additional details are included below related to mobile devices and types of captured additional data.
The determination of the position of an image acquisition location of a camera device relative to the other data capture location of a mobile device may be performed in various manners in various embodiments, including using visual data of the target image(s) captured at that image acquisition location by the camera device and/or using other additional data captured by the mobile device at that other data capture location.
As a first non-exclusive example of determining the position of an image acquisition location of a camera device relative to the other data capture location of a mobile device, and in situations in which the additional data captured by the mobile device includes one or more additional second images captured at the data capture location, the visual data of one or more first target images captured at the image acquisition location by the camera device may be analyzed and compared to the additional visual data of the one or more second images of the mobile device in order to determine shared visual content (e.g., the same structural wall elements, such as wall portions and/or wall borders, or more generally the same visible features), and the positions of that shared visual content in the first and second images from the different perspectives of the image acquisition location and the other data capture location may be used to determine at least relative inter-location pose information for those two locations. If distance information is available for the first target image(s) and/or the second additional image(s) (such as based on visible objects of known size or in other manners) or is otherwise available (e.g., via a distance measurement to a visible wall element), that information may be used to further determine the distance between those two locations as part of the inter-location pose information for those two locations. Additional details are included below related to determining inter-location pose information for a camera device's image acquisition location and a separate mobile device's other data capture location using overlapping visual coverage of shared visual content.
As a second non-exclusive example of determining the position of an image acquisition location of a camera device relative to the other data capture location of a mobile device, and in situations in which the additional data captured by the mobile device includes one or more additional second images captured at the data capture location, the additional visual data of the one or more second images of the mobile device may be analyzed to determine whether some or all of the camera device is visible in that additional visual data. If so, and if the camera device is at the image acquisition location when the second image(s) are captured (e.g., if the one or more additional second images are captured simultaneously or otherwise concurrently with the capture of the first target image(s) at that image acquisition location, or otherwise at a time when the camera device is at that image acquisition location, such as before or after the capture of the first target image(s)), the position of the camera device in that additional visual data (along with other structural elements visible in that additional visual data) may be used to determine at least relative inter-location pose information for those two locations. If distance information is available for the first target image(s) and/or the second additional image(s) (such as based on visible objects of known size or in other manners) or is otherwise available (e.g., via a distance measurement to a visible wall element), that information may be used to further determine the distance between those two locations as part of the inter-location pose information for those two locations. Additional details are included below related to determining inter-location pose information for a camera device's image acquisition location and a separate mobile device's other data capture location using some or all of the camera device being visible in the additional visual data of the mobile device's additional second images.
As a third non-exclusive example of determining the position of an image acquisition location of a camera device relative to the other data capture location of a mobile device, the visual data of one or more first target images captured by the camera device at the image acquisition location may be analyzed to determine whether some or all of the mobile device is visible in that visual data, whether directly visible or via visibility of a transporter of the mobile device (e.g., a vehicle carrying the mobile device, such as a drone; a human user carrying the mobile device; etc.) that blocks or obscures the mobile device. If so, and if the mobile device (or its transporter) is at the data capture location when the first target image(s) are captured (e.g., if the one or more first target images are captured simultaneously or otherwise concurrently with the additional data capture by the mobile device at that data capture location, or otherwise at a time when the mobile device is at that data capture location, such as before or after the capture of the additional data), the position of the mobile device (or its transporter) in that visual data (e.g., along with other structural elements visible in that visual data) may be used to determine at least relative inter-location pose information for those two locations. If distance information is available for the first target image(s) and/or second additional image(s) captured by the mobile device (such as based on visible objects of known size or in other manners) or is otherwise available (e.g., via a distance measurement to a visible wall element), that information may be used to further determine the distance between those two locations as part of the inter-location pose information for those two locations.
Additional details are included below related to determining inter-location pose information for a camera device's image acquisition location and a separate mobile device's other data capture location using some or all of the mobile device (or its transporter) being visible in the visual data of the camera device's first target image(s).
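As one non-exclusive illustration of using the position of a visible device (or its transporter) of known size in a target image, the following Python sketch derives a bearing from a pixel column and an approximate distance from a vertical pixel extent. It assumes an equirectangular panorama with 360° of horizontal and 180° of vertical coverage mapped linearly onto columns and rows, a known compass heading for column 0, and an object of known height that is roughly centered on the horizon. All function and parameter names are hypothetical.

```python
import math


def bearing_from_column(column, image_width, heading_at_column0_deg=0.0):
    """Bearing in degrees of a pixel column in a 360-degree equirectangular panorama."""
    return (heading_at_column0_deg + 360.0 * column / image_width) % 360.0


def distance_from_known_height(real_height_m, top_row, bottom_row, image_height):
    """Approximate range to an object of known height from its angular height.

    Assumes rows map linearly onto 180 degrees of vertical coverage and that the
    object is roughly centered on the horizon, so each half subtends half the angle.
    """
    angular_height = math.pi * abs(bottom_row - top_row) / image_height
    return real_height_m / (2.0 * math.tan(angular_height / 2.0))


def relative_offset(bearing_deg, distance_m):
    """(east, north) offset in meters of the observed object from the camera."""
    b = math.radians(bearing_deg)
    return distance_m * math.sin(b), distance_m * math.cos(b)


if __name__ == "__main__":
    # Hypothetical detection: a 1.7 m tall person carrying the mobile device.
    bearing = bearing_from_column(column=1000, image_width=4000)
    distance = distance_from_known_height(1.7, top_row=740, bottom_row=1060, image_height=1800)
    print(bearing, distance, relative_offset(bearing, distance))
```

The bearing alone corresponds to the relative direction portion of the inter-location pose information, while the known-size estimate supplies the optional distance.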
As a fourth non-exclusive example of determining the position of an image acquisition location of a camera device relative to the other data capture location of a mobile device, and in situations in which the additional data captured by the mobile device includes one or more additional second images captured at the data capture location, the additional visual data of the one or more second images of the mobile device may be analyzed to determine whether one or more objects are visible in that additional visual data that have known location data (e.g., objects outside of the building, such as streets, buildings, lawns, pools, trees, other landmarks, etc.; objects inside the building, such as a visible marker or installed device that is placed at a known location or otherwise has an associated known location; etc.)—if so, that known location data for the visible object(s) may be used in combination with the position of the object(s) in that additional visual data (e.g., along with other structural elements visible in that additional visual data) to determine absolute location data for the mobile device (whether in addition to or instead of other location data available to the mobile device, such as via one or more GPS receivers of the mobile device). That absolute location data for the mobile device may then be extended to the image acquisition location of the camera device in one or more manners as previously discussed, or further based on one or more of the object(s) also being visible in the visual data of the target image(s) captured from that image acquisition location. 
Alternatively, even without such additional second images of the mobile device, the visual data of the target image(s) may be similarly analyzed to determine whether one or more such objects are visible in that visual data that have known location data, and if so, that known location data for the visible object(s) may be used in combination with the position of the object(s) in that visual data (e.g., along with other structural elements visible in that visual data) to determine absolute location data for the camera device (whether in addition to or instead of other location data available to the mobile device, such as via one or more GPS receivers of the mobile device).
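As one non-exclusive illustration of determining absolute location data from visible objects that have known location data, the following Python sketch intersects two bearing lines toward two such objects. It assumes that the known locations have already been converted to a local metric (east, north) coordinate frame and that an absolute bearing toward each object is available (e.g., from an object's position in an image combined with compass data). The function name and the two-object formulation are assumptions for illustration.

```python
import math


def locate_from_two_landmarks(landmark1, bearing1_deg, landmark2, bearing2_deg):
    """Return the (east, north) position of the observing device, in meters.

    landmark1, landmark2: known (east, north) positions of two visible objects.
    bearing1_deg, bearing2_deg: compass bearings from the device toward each object.
    """
    u1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    u2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))

    # device + t1*u1 = landmark1 and device + t2*u2 = landmark2, therefore
    # t1*u1 - t2*u2 = landmark1 - landmark2; solve for t1 with Cramer's rule.
    rhs_e = landmark1[0] - landmark2[0]
    rhs_n = landmark1[1] - landmark2[1]
    det = u2[0] * u1[1] - u1[0] * u2[1]
    if abs(det) < 1e-9:
        raise ValueError("bearings are parallel; the position is not determined")
    t1 = (u2[0] * rhs_n - u2[1] * rhs_e) / det

    return landmark1[0] - t1 * u1[0], landmark1[1] - t1 * u1[1]


if __name__ == "__main__":
    # Hypothetical objects: a tree 10 m due north and a street sign 10 m due east.
    print(locate_from_two_landmarks((0.0, 10.0), 0.0, (10.0, 0.0), 90.0))
```

With more than two visible objects, a least-squares solution over all bearing lines would be a natural extension, and is likewise an assumption rather than a described implementation.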
In addition, if multiple types of automated operations are performed in a given embodiment and situation for determining the inter-location pose information for a camera device's image acquisition location and a mobile device's other data capture location in different manners, the results of those multiple automated operations may be used in various manners to represent that inter-location pose information. For example, in some such embodiments and situations, the results of a single one of the multiple types of automated operations may be selected and used to represent the inter-location pose information, such as for a result having a highest confidence level, and/or based on a defined priority for some or all of the multiple types of automated operations (e.g., to use the results of a highest priority type of automated operation if it is performed and available, or if not, to use the results of the next highest priority type of automated operation that is performed and available). Alternatively, in some embodiments and situations, the results of multiple types of automated operations may be combined and used together to represent the inter-location pose information, such as by performing an average (e.g., a weighted average using confidence levels associated with particular results for the weighting), by performing a statistical analysis (e.g., including to discard outliers at the low and/or high ends of the confidence values), etc. In addition, in some embodiments and situations, the results of one or more types of automated operations may be used as initial values that are provided as input to one or more additional types of automated operations that update (e.g., refine) those initial values, with the updated results of one or more of the additional types of automated operations used to represent the inter-location pose information.
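As one non-exclusive illustration of the selection and combination strategies described above, the following Python sketch represents each result as a bearing, a distance and a confidence level, and shows selection by highest confidence, selection by defined priority, and a confidence-weighted average. The `PoseEstimate` type, the method labels and the use of a circular mean for bearings are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PoseEstimate:
    method: str         # hypothetical label for the type of automated operation
    bearing_deg: float  # direction between the two locations
    distance_m: float   # distance between the two locations
    confidence: float   # assumed to lie in the range 0..1


def select_highest_confidence(estimates):
    """Use the single result having the highest confidence level."""
    return max(estimates, key=lambda e: e.confidence)


def select_by_priority(estimates, priority):
    """Use the result of the highest-priority method that is available, else None."""
    by_method = {e.method: e for e in estimates}
    for method in priority:
        if method in by_method:
            return by_method[method]
    return None


def weighted_average(estimates):
    """Confidence-weighted combination; returns (bearing_deg, distance_m).

    Bearings are averaged on the unit circle so that 350 and 10 degrees
    combine to roughly 0 degrees rather than 180 degrees.
    """
    total = sum(e.confidence for e in estimates)
    if total <= 0:
        raise ValueError("at least one estimate must have positive confidence")
    s = sum(e.confidence * math.sin(math.radians(e.bearing_deg)) for e in estimates)
    c = sum(e.confidence * math.cos(math.radians(e.bearing_deg)) for e in estimates)
    bearing = math.degrees(math.atan2(s, c)) % 360.0
    distance = sum(e.confidence * e.distance_m for e in estimates) / total
    return bearing, distance


if __name__ == "__main__":
    results = [
        PoseEstimate("shared_content", 350.0, 4.0, 0.6),
        PoseEstimate("camera_visible", 10.0, 6.0, 0.3),
    ]
    print(select_highest_confidence(results).method)
    print(select_by_priority(results, ["mobile_visible", "camera_visible"]).method)
    print(weighted_average(results))
```

Discarding outliers before averaging, or feeding one result as an initial value into a refining operation, could be layered on top of the same data structure.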
As noted above, the generation of a partial or complete floor plan for a building may include analyzing the visual data of one or more target images captured by a camera device at one or more image acquisition locations in a room of the building (or other defined area at the building) to determine at least some of the walls of that room that are visible in that visual data and to combine multiple pieces of determined wall data to form a room shape for the surrounding room (or other shape of another defined area). Such a determination of the walls may, for example, include modeling the walls as planar surfaces and/or as groupings of 3D data points, and the resulting determined room shape may be a 3D (three-dimensional) and/or 2D (two-dimensional) room shape based at least in part on the walls and their inter-wall borders, as well as similarly modeling some or all of the floor and/or ceiling (e.g., for 3D room shapes) in at least some embodiments and situations. For example, the described techniques may, in at least some embodiments, include using one or more trained neural networks or other techniques to estimate a 3D room shape shown in one or more such target images. As non-exclusive examples, such 3D room shape estimation may include one or more of the following: using a trained convolutional neural network or other analysis technique to take the target image(s) as input and to estimate a 3D point cloud of the walls and other surfaces of the enclosing room from the visual contents of the target image(s) and/or to estimate a piecewise planar representation (e.g., 3D walls and other planar surfaces) of the enclosing room from the visual contents of the target image(s); using a trained neural network or other analysis technique to take the target image(s) as input and to estimate wireframe structural lines of the enclosing room from the visual contents of the target image(s) (e.g., structural lines to show one or more of borders between walls, borders between walls and ceiling, borders between walls and floor, outlines of doorways and/or other inter-room wall openings, outlines of windows, etc.); using a trained neural network or other analysis technique to detect wall structural elements (e.g., windows and/or skylights; passages into and/or out of the room, such as doorways and other openings in walls, stairs, hallways, etc.; borders between adjacent walls; borders between walls and a floor; borders between walls and a ceiling; corners (or solid geometry vertices) where at least three surfaces or planes meet; etc.) in the visual contents of the target image(s) and to optionally detect other fixed structural elements (e.g., countertops, bath tubs, sinks, islands, fireplaces, etc.) and to optionally generate 3D bounding boxes for the detected elements; etc. While the camera device is referred to in the singular at times herein, it will be appreciated that multiple camera devices may be used in some embodiments and situations for a given building, such as different camera devices that acquire different target images at different times (e.g., during different image acquisition sessions and/or at different image acquisition locations, whether in the same or different rooms or other defined areas as one or more other camera devices), different camera devices that acquire different target images at the same time (e.g., during the same image acquisition session and at different or the same image acquisition locations, whether in the same or different rooms or other defined areas as one or more other camera devices), etc.
In addition, in some embodiments and situations, the analysis of the visual data of one or more target images captured by one or more camera devices at one or more image acquisition locations in a room (or other defined area) may be combined with additional room shape data that is determined from analysis of other data captured by one or more mobile devices at one or more other data capture locations in that room (or other defined area), with non-exclusive examples including the following: analyzing additional visual data of additional images captured by the mobile device to determine information about at least some walls of a surrounding room (and optionally some or all of the floor and/or the ceiling), optionally in combination with IMU data to generate a 3D point cloud of at least some of the room shape; analyzing depth data captured by the mobile device using one or more sensors that measure depth or otherwise determine distances to walls or other surrounding objects; etc. In at least some embodiments, the operations of the mobile device may be based at least in part on performing a SLAM (Simultaneous Localization And Mapping) and/or SfM (Structure from Motion) and/or MVS (multiple-view stereovision) analysis, such as by using motion data from IMU sensors on the mobile computing device in combination with visual data from one or more image sensors on the mobile computing device, including in at least some such embodiments to use the additional data captured by the mobile computing device to generate an estimated three-dimensional (“3D”) shape of the enclosing room (e.g., based on a 3D point cloud with a plurality of 3D data points and/or estimated planar surfaces of walls and optionally the floor and/or ceiling). In some such embodiments, these automated operations are performed without using any depth sensors or other distance-measuring devices about distances from the mobile computing device to walls or other objects in the surrounding room, while in other embodiments the mobile computing device (or other additional associated mobile device) may capture depth data to walls of the surrounding room and use that captured depth data as part of determining the position of the mobile computing device. The automated determination of the position for the mobile computing device may further be performed in some embodiments as part of generating a travel path of the mobile computing device through the enclosing room (e.g., using one or more of a SLAM, SfM and/or MVS analysis), whether instead of or in addition to generating a 3D shape of the enclosing room. In other embodiments, the automated determination of the position for the mobile computing device may be based at least in part on other analyses, such as via Wi-Fi triangulation, Visual Inertial Odometry (“VIO”), etc. Additional details are included below related to determining room shapes and to combining room shapes to form a partial or complete building floor plan.
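As one non-exclusive illustration of combining multiple pieces of determined wall data into a 2D room shape, the following Python sketch treats each wall as a line in a top-down view and intersects consecutive walls to obtain the inter-wall borders (corners) of the room. The line representation `(a, b, d)` meaning `a*x + b*y = d`, the assumption that walls are supplied in order around the room, and all function names are hypothetical.

```python
def intersect_walls(wall_a, wall_b):
    """Intersect two wall lines, each given as (a, b, d) with a*x + b*y = d."""
    a1, b1, d1 = wall_a
    a2, b2, d2 = wall_b
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        raise ValueError("walls are parallel and do not share a corner")
    return (d1 * b2 - d2 * b1) / det, (a1 * d2 - a2 * d1) / det


def room_shape_from_walls(walls):
    """Return room corners by intersecting each wall with the next wall in order."""
    n = len(walls)
    return [intersect_walls(walls[i], walls[(i + 1) % n]) for i in range(n)]


def polygon_area(corners):
    """Floor area of the room shape via the shoelace formula."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


if __name__ == "__main__":
    # Hypothetical 4 m by 3 m room: south, east, north and west walls in order.
    walls = [(0.0, 1.0, 0.0), (1.0, 0.0, 4.0), (0.0, 1.0, 3.0), (1.0, 0.0, 0.0)]
    corners = room_shape_from_walls(walls)
    print(corners, polygon_area(corners))
```

In practice each wall line might itself be fitted to estimated 3D data points projected onto the floor plane, which is outside the scope of this sketch.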
The described techniques provide various benefits in various embodiments, including to allow partial or complete floor plans of multi-room buildings and other structures to be automatically generated concurrently with the acquisition of one or more target image(s) acquired for the building or other structure, and/or to allow such a partial or complete floor plan to be augmented with information about associated absolute location data, including in some embodiments without having or using information from depth sensors or other distance-measuring devices about distances from images' acquisition locations to walls or other objects in a surrounding building or other structure. Non-exclusive examples of such benefits include the following: the ability to provide feedback during capture of one or more target images acquired for a building or other structure to an operator of the camera device (e.g., to display or otherwise provide an operator user with a determined room shape for an enclosing room, such as part of a partial or complete floor plan for the building or other structure; to cause movement to a different image capture location that provides improved visual data, such as to move near a window in order to obtain visual data of objects external to the building that have known location data; etc.), including to optionally allow the user to determine one or more other areas of the building at which to acquire one or more further target images (e.g., for a partial floor plan, to acquire additional target images in other areas of the building that are not yet represented in the partial floor plan), and such as in a real-time or near-real-time manner relative to the acquisition of the target image (e.g., with initial information determined on a mobile computing device used immediately, such as displayed on the mobile computing device to the user, while also being further supplied to one or more other computing devices such as remote server computing systems for refinement or other updating); the ability to provide other types of feedback to a user or other transporter of a mobile device, such as to move to a different data capture location that provides improved additional data (e.g., to move near a window or outside a doorway in order to obtain stronger GPS signals and/or visual data of objects external to the building that have known location data; to move to a specified area proximate to the camera device, such as within visual range of the camera device or a specific area of such a range, such as centered in the visual coverage of the camera device at a specified range, to enable the mobile device or its transporter to be visible to the camera device, such as for the purpose of determining inter-location pose data between the camera device's image acquisition location and a data capture location of the mobile device at the specified area; etc.); etc. Furthermore, the described automated techniques allow such inter-location pose information and floor plan generation and association with absolute location data to be determined more quickly than previously existing techniques, and in at least some embodiments with greater accuracy, including by using information acquired from the actual building environment (rather than from plans on how the building should theoretically be constructed), as well as enabling the capture of changes to structural elements that occur after a building is initially constructed. Such described techniques further provide benefits in allowing improved automated navigation of a building by mobile devices (e.g., semi-autonomous or fully-autonomous vehicles), including to significantly reduce computing power and time used to attempt to otherwise learn a building's layout and/or location.
In addition, in some embodiments the described techniques may be used to provide an improved GUI in which a user may more accurately and quickly obtain information about a building's interior (e.g., for use in navigating that interior), including in response to search requests, as part of providing personalized information to the user, as part of providing value estimates and/or other information about a building to a user, etc. Various other benefits are also provided by the described techniques, some of which are further described elsewhere herein.
As noted above, a building floor plan having associated room shape information for some or all rooms of the building may be generated and used in at least some embodiments, and may have various forms in various embodiments, such as a 2D (two-dimensional) floor map of the building (e.g., an orthographic top view or other overhead view of a schematic floor map that does not include or display height information) and/or a 3D (three-dimensional) or 2.5D (two and a half-dimensional) floor map model of the building that does display height information. Furthermore, in some embodiments, a target image (and optionally additional images) may be acquired outside of one or more buildings, such as in one of multiple separate areas of one or more properties (e.g., for a house, a garden, patio, deck, back yard, side yard, front yard, pool, carport, dock, etc.) that each has a previously or concurrently determined area shape (e.g., a 3D shape, a 2D shape, etc.)—if so, the shape of a surrounding area of the image may similarly be automatically determined and included as part of a building floor plan using the techniques described herein.
As noted above, in at least some embodiments and situations, some or all of the target images acquired for a building may be panorama images that are each acquired at one of multiple acquisition locations in or around the building, such as to generate a panorama image at each such acquisition location from one or more of a video captured at that acquisition location (e.g., a 360° video taken from a smartphone or other mobile device held by a user turning at that acquisition location), or multiple images captured in multiple directions from the acquisition location (e.g., from a smartphone or other mobile device held by a user turning at that acquisition location; from automated rotation of a device at that acquisition location, such as on a tripod at that acquisition location; etc.), or a simultaneous capture of all the image information for a particular acquisition location (e.g., using one or more fisheye lenses), etc. It will be appreciated that such a panorama image may in some situations be represented in a spherical coordinate system and provide up to 360° coverage around horizontal and/or vertical axes (e.g., 360° of coverage along a horizontal plane and around a vertical axis), while in other embodiments the acquired panorama images or other images may include less than 360° of vertical coverage (e.g., for images with a width exceeding a height by more than a typical aspect ratio, such as at or exceeding 21:9 or 16:9 or 3:2 or 7:5 or 4:3 or 5:4 or 1:1, including for so-called ‘ultrawide’ lenses and resulting ultrawide images). 
In addition, it will be appreciated that a user viewing such a panorama image (or other image with sufficient horizontal and/or vertical coverage that only a portion of the image is displayed at any given time) may be permitted to move the viewing direction within the panorama image to different orientations to cause different subset images (or “views”) to be rendered within the panorama image, and that such a panorama image may in some situations be represented in a spherical coordinate system (including, if the panorama image is represented in a spherical coordinate system and a particular view is being rendered, to convert the image being rendered into a planar coordinate system, such as for a perspective image view before it is displayed). Furthermore, acquisition metadata regarding the capture of such panorama images may be obtained and used in various manners, such as data acquired from IMU sensors or other sensors of a mobile device as it is carried by a user or otherwise moved between acquisition locations. Non-exclusive examples of such acquisition metadata may include one or more of acquisition time; acquisition location, such as GPS coordinates or other indication of location; acquisition direction and/or orientation; relative or absolute order of acquisition for multiple images acquired for a building or that are otherwise associated; etc., and such acquisition metadata may further optionally be used as part of determining the images' acquisition locations in at least some embodiments and situations, as discussed further below. Additional details are included below regarding automated operations of device(s) implementing an Image/Data Capture and Analysis (IDCA) system involved in acquiring images and optionally acquisition metadata, including with respect to
As is also noted above, shapes of rooms of a building may be automatically determined in various manners in various embodiments. For example, in at least some embodiments, a Mapping Information Generation Manager (MIGM) system may analyze various images acquired in and around a building in order to automatically determine room shapes of the building's rooms (e.g., 3D room shapes, 2D room shapes, etc.) and to automatically generate a floor plan for the building. As one example, if multiple images are acquired within a particular room, those images may be analyzed to determine a 3D shape of the room in the building (e.g., to reflect the geometry of the surrounding structural elements of the building)—the analysis may include, for example, automated operations to ‘register’ the camera positions for the images in a common frame of reference so as to ‘align’ the images and to estimate 3D locations and shapes of objects in the room, such as by determining features visible in the content of such images (e.g., to determine the direction and/or orientation of the capture device when it took particular images, a path through the room traveled by the capture device, etc., such as by using SLAM techniques for multiple video frame images and/or other SfM techniques for a ‘dense’ set of images that are separated by at most a defined distance (such as 6 feet) to generate a 3D point cloud for the room including 3D points along walls of the room and at least some of the ceiling and floor of the room and optionally with 3D points corresponding to other objects in the room, etc.) 
and/or by determining and aggregating information about planes for detected features and normal (orthogonal) directions to those planes to identify planar surfaces for likely locations of walls and other surfaces of the room and to connect the various likely wall locations (e.g., using one or more constraints, such as having 90° angles between walls and/or between walls and the floor, as part of the so-called ‘Manhattan world assumption’) and form an estimated room shape for the room. After determining the estimated room shapes of the rooms in the building, the automated operations may, in at least some embodiments, further include positioning the multiple room shapes together to form a floor plan and/or other related mapping information for the building, such as by connecting the various room shapes, optionally based at least in part on information about doorways and staircases and other inter-room wall openings identified in particular rooms, and optionally based at least in part on determined travel path information of a mobile computing device between rooms. Similar techniques may be used as part of determining inter-location pose information for images captured at multiple locations, as discussed in greater detail elsewhere herein. Additional details are included below regarding automated operations of device(s) implementing an MIGM system involved in determining room shapes and combining room shapes to generate a floor plan, including with respect to
For illustrative purposes, some embodiments are described below in which specific types of information are acquired, used and/or presented in specific ways for specific types of structures and by using specific types of devices; however, it will be understood that the described techniques may be used in other manners in other embodiments, and that the invention is thus not limited to the exemplary details provided. As one non-exclusive example, while house floor plans may be generated in some examples that do not include detailed measurements for particular rooms or for the overall houses, it will be appreciated that other types of floor plans or other mapping information may be similarly generated in other embodiments, including for buildings (or other structures or layouts) separate from houses. As another non-exclusive example, while floor plans for houses or other buildings may be used for display to assist viewers in navigating the buildings, generated mapping information may be used in other manners in other embodiments. As yet another non-exclusive example, while some embodiments discuss obtaining and using additional data from a mobile computing device that is separate from a camera device that captures a target image, in other embodiments the one or more devices used in addition to the camera device may have other forms, such as to use a mobile device that acquires some or all of the additional data but does not provide its own computing capabilities (e.g., an additional ‘non-computing’ mobile device), multiple separate mobile devices that each acquire some of the additional data (whether mobile computing devices and/or non-computing mobile devices), etc.
In addition, the term “building” refers herein to any partially or fully enclosed structure, typically but not necessarily encompassing one or more rooms that visually or otherwise divide the interior space of the structure; non-limiting examples of such buildings include houses, apartment buildings or individual apartments therein, condominiums, office buildings, commercial buildings or other wholesale and retail structures (e.g., shopping malls, department stores, warehouses, etc.), etc. The term “acquire” or “capture” as used herein with reference to a building interior, acquisition location, or other location (unless context clearly indicates otherwise) may refer to any recording, storage, or logging of media, sensor data, and/or other information related to spatial and/or visual characteristics and/or otherwise perceivable characteristics of the building interior or subsets thereof, such as by a recording device or by another device that receives information from the recording device. As used herein, the term “panorama image” may refer to a visual representation that is based on, includes or is separable into multiple discrete component images originating from a substantially similar physical location in different directions and that depicts a larger field of view than any of the discrete component images depict individually, including images with a sufficiently wide-angle view from a physical location to include angles beyond that perceivable from a person's gaze in a single direction (e.g., greater than 120° or 150° or 180°, etc.).
The term “sequence” of acquisition locations, as used herein, refers generally to two or more acquisition locations that are each visited at least once in a corresponding order, whether or not other non-acquisition locations are visited between them, and whether or not the visits to the acquisition locations occur during a single continuous period of time or at multiple different times, or by a single user and/or device or by multiple different users and/or devices. In addition, various details are provided in the drawings and text for exemplary purposes, but are not intended to limit the scope of the invention. For example, sizes and relative positions of elements in the drawings are not necessarily drawn to scale, with some details omitted and/or provided with greater prominence (e.g., via size and positioning) to enhance legibility and/or clarity. Furthermore, identical reference numbers may be used in the drawings to identify similar elements or acts.
In the illustrated embodiment, the IDCA system 150 obtains target images 155 captured at each of one or more image acquisition locations in each of one or more buildings by one or more camera devices 184, such as 360° panorama images captured using one or more camera devices that are designed to simultaneously capture 360° of horizontal visual coverage or that otherwise have one or more lenses used in the aggregate to capture 360° of horizontal visual coverage. The IDCA system further obtains additional data 155 captured at each of one or more other data capture locations in each of the one or more buildings by one or more mobile data capture devices 185 that move independently from the camera devices 184 (e.g., are not mounted to or otherwise physically coupled together, such as to enable either device to be moved without moving the other device), with the illustrated mobile data capture device 185 in this example being a mobile computing device that includes computing capabilities. The additional captured data includes GPS location data (e.g., using one or more GPS receiver sensors 134) and/or other absolute location data, and may further include, for example, additional images (e.g., using one or more imaging systems 135), geographical direction data (e.g., using compass sensor 148c), device motion data (e.g., using one or more sensor modules 148, such as part of IMU sensors), etc.
The BFPGLDM system 140 obtains the images and other captured data 155 from the IDCA system 150 and uses it to determine absolute location data 156 for one or more positions in each of the buildings (e.g., extracts such data 156 from the captured data 155, determines such data 156 via analysis of the captured data 155, etc.), although in other embodiments the system 140 may directly control the capture of some or all such data, whether in addition to or instead of the IDCA system. The BFPGLDM system 140 further uses the visual data of captured images to determine room shapes of surrounding rooms, optionally in combination with some of the additional captured data (e.g., device motion data for the mobile data capture device), and combines the determined room shapes to generate associated building floor plans 165, such as by using corresponding functionality of the MIGM system 160, although in other embodiments may directly control some or all such generation of building floor plans, whether in addition to or instead of the MIGM system. 
The BFPGLDM system 140 also automatically determines particular GPS location data or other absolute location data to associate with each generated floor plan 159, whether during or after the floor plan generation, including in at least some embodiments by automatically determining information 157 about the camera device(s)' image acquisition locations and the mobile data capture device(s)' data capture locations (e.g., from analysis of the individual data captured by each device), determining absolute location data for at least some of the data capture locations, and further determining inter-location relative position data 158 for image acquisition locations and other data capture locations (e.g., between pairs of one such image acquisition location and one such data capture location that are associated, such as by being in the same room or other area, having overlapping acquired visual data, etc.), with that information 157 and 158 used to extend the absolute location data from a data capture location of the mobile data capture device to an image acquisition location of the camera device and its surrounding room shape.
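As one non-exclusive illustration of extending absolute location data across a determined relative position, the following minimal sketch assumes that the relative position between a data capture location and an image acquisition location is available as a metric offset in the mobile device's local frame, together with a compass heading for that frame. The function names, the frame convention (x to the right, y forward, heading measured clockwise from north) and the spherical-Earth approximation are assumptions made for this example and are not taken from the described systems.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def rotate_to_east_north(dx_m, dy_m, heading_deg):
    """Rotate an offset from a device frame into east/north components.

    Assumed convention: x points to the device's right, y points forward,
    and heading_deg is the compass heading of the forward axis, measured
    clockwise from north.
    """
    h = math.radians(heading_deg)
    east = dx_m * math.cos(h) + dy_m * math.sin(h)
    north = -dx_m * math.sin(h) + dy_m * math.cos(h)
    return east, north


def extend_absolute_location(lat_deg, lon_deg, east_m, north_m):
    """Return (lat, lon) of a point offset by (east_m, north_m) from a known point.

    Uses a local flat-Earth approximation that is reasonable for the
    room-scale distances considered here.
    """
    dlat = north_m / EARTH_RADIUS_M
    dlon = east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    return lat_deg + math.degrees(dlat), lon_deg + math.degrees(dlon)


# Hypothetical usage: the camera is 3 m ahead of a mobile device facing east.
east, north = rotate_to_east_north(0.0, 3.0, 90.0)
camera_lat, camera_lon = extend_absolute_location(47.6, -122.3, east, north)
```

The same offset could in principle be applied to each vertex of the surrounding room shape so that the room shape carries absolute coordinates as well.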
In at least some embodiments and situations, the automated determinations by the BFPGLDM system 140 (and by the IDCA system and/or the MIGM system if the BFPGLDM system uses their functionality for data capture and floor plan generation, respectively) are performed concurrently with the data capture (e.g., in a real-time or near-real-time manner, such as within milliseconds, seconds, minutes, etc. of the data capture), including to generate partial building floor plans (e.g., to incrementally expand a floor plan with the room shape for each room in which the images and additional data are captured), and to optionally use such partial building floor plans and/or other acquired and generated data to provide feedback to one or more operator users of the camera device(s) and/or mobile data capture device(s), including in some embodiments and situations to display corresponding information in a GUI shown on a mobile data capture computing device. The BFPGLDM system 140 may optionally further use supporting information supplied by system operator users via computing devices 105 over intervening computer network(s) 199 in some embodiments and situations.
The IDCA system 150 and/or MIGM system 160 may in some embodiments execute on the same server computing system(s) 180 as the BFPGLDM system (e.g., with all systems being operated by a single entity or otherwise being executed in coordination with each other, such as with some or all functionality of all the systems integrated together), and in some embodiments the IDCA system 150 and/or MIGM system 160 may operate on one or more other systems separate from the system(s) 180 (e.g., on one or more mobile data capture devices 185 and/or other computing systems, not shown), whether instead of or in addition to the copies of those systems executing on the system(s) 180 (e.g., to have a copy of the MIGM system 160 executing on the device 185 to incrementally generate at least partial building floor plans as building images are acquired by the IDCA system 150 executing on the device 185 and/or by that copy of the MIGM system, while another copy of the MIGM system optionally executes on one or more server computing systems to generate a final complete building floor plan after all images are acquired; etc.). In the illustrated embodiment, client applications 154 for one or more of the BFPGLDM system and/or the IDCA system and/or the MIGM system may execute on the capture devices 185 (and in other embodiments and situations, some or all of the entire BFPGLDM system and/or the IDCA system and/or the MIGM system may execute on some or all mobile devices 185, such as in a distributed manner), and a BFPGLDM client application or other building information viewer system (not shown) may execute on one or more user client devices 175.
In addition, building information may in some embodiments be obtained by the BFPGLDM system in manners other than via IDCA and/or MIGM systems (e.g., if such IDCA and/or MIGM systems are not part of the BFPGLDM system), such as to receive building images and/or other data from other sources, and/or to generate floor plans without using the MIGM system. Other data 143 may also be optionally stored and used by the system 140, including about users of capture devices 185 and/or camera devices 184 and/or other client devices 175 (e.g., as part of associated accounts at the BFPGLDM system), such as preference-related data (e.g., for use in personalizing information and/or functionality provided to the user, including feedback related to the data capture activities). Additional details related to the automated operations of the BFPGLDM system are included elsewhere herein, including with respect to
Various components of the mobile data capture computing device 185 are also illustrated in
One or more users (e.g., end-users, not shown) of one or more mobile client devices 175 may further interact over one or more computer networks 199 with the BFPGLDM system 140 (and optionally the IDCA system 150 and/or MIGM system 160), and/or with some or all of the BFPGLDM system executing on that device 175 (not shown), such as to participate in acquiring additional images in or around a building using one or more cameras of the device 175 or otherwise providing user-supplied information, displaying received building data, etc. Such mobile devices 175 may each execute a BFPGLDM client application or other building information viewer system (not shown) that is used to interact with the BFPGLDM system to request and receive building information, to present such received building information and/or other received information on that mobile device (e.g., as part of a GUI displayed on that mobile device), and further optionally receive and respond to interactions by one or more users with the presented information (e.g., with displayed user-selectable controls, such as part of the generated visual data enhancements), as discussed in greater detail elsewhere herein, including with respect to
In the depicted computing environment of
As noted above, the IDCA system may perform automated operations involved in generating multiple 360° panorama images at multiple associated image acquisition locations (e.g., in multiple rooms or other locations within a building or other structure and optionally around some or all of the exterior of the building or other structure), such as using visual data acquired via one or more camera devices 184, and for use in generating and providing a representation of an interior of the building or other structure. For example, in at least some such embodiments, such techniques may include using one or more such camera devices (e.g., a camera having one or more fisheye lenses and/or other lenses and mounted on a rotatable tripod or otherwise having an automated rotation mechanism; a camera having sufficient fisheye lenses and/or other lenses to acquire 360° horizontally without rotation; a camera of a smartphone or separate device held by or mounted on a user or the user's clothing and using one or more non-fisheye lenses, such as wide-angle rectilinear lenses and/or telephoto lenses and/or macro lenses and/or standard lenses; etc.) to acquire data from a sequence of multiple acquisition locations within multiple rooms of a house (or other building), and to optionally further acquire data involved in movement of the capture device (e.g., movement at an acquisition location, such as rotation; movement between some or all of the acquisition locations, such as for use in linking the multiple acquisition locations together; etc.), in at least some cases without having distances between the acquisition locations being measured or having other measured depth information to objects in an environment around the acquisition locations (e.g., without using any depth-sensing sensors). 
After an acquisition location's information is acquired, the techniques may include producing a 360° panorama image from that acquisition location with 360° of horizontal information around a vertical axis (e.g., a 360° panorama image that shows the surrounding room in an equirectangular format), and then providing the panorama images for subsequent use by the MIGM and/or BFPGLDM systems.
Additional details related to embodiments of a system providing at least some such functionality of an IDCA system are included in U.S. Non-Provisional patent application Ser. No. 16/693,286, filed Nov. 23, 2019 and entitled “Connecting And Using Building Data Acquired From Mobile Devices” (which includes disclosure of an example BIDCA system that is generally directed to obtaining and using panorama images from within one or more buildings or other structures); in U.S. Non-Provisional patent application Ser. No. 16/236,187, filed Dec. 28, 2018 and entitled “Automated Control Of Image Acquisition Via Use Of Acquisition Device Sensors” (which includes disclosure of an example IDCA system that is generally directed to obtaining and using panorama images from within one or more buildings or other structures); and in U.S. Non-Provisional patent application Ser. No. 16/190,162, filed Nov. 14, 2018 and entitled “Automated Mapping Information Generation From Inter-Connected Images”; each of which is incorporated herein by reference in its entirety.
In addition, a floor plan (or portion of it) may be linked to or otherwise associated with one or more additional types of information, such as one or more associated and linked images or other associated and linked information, including for a two-dimensional (“2D”) floor plan of a building to be linked to or otherwise associated with a separate 2.5D model floor plan rendering of the building and/or a 3D model floor plan rendering of the building, etc., and including for a floor plan of a multi-story or otherwise multi-level building to have multiple associated sub-floor plans for different stories or levels that are interlinked (e.g., via connecting stairway passages) or are part of a common 2.5D and/or 3D model. Accordingly, non-exclusive examples of an end-user's interactions with a displayed or otherwise generated 2D floor plan of a building may include one or more of the following: to change between a floor plan view and a view of a particular image at an acquisition location within or near the floor plan; to change between a 2D floor plan view and a 2.5D or 3D model view that optionally includes images texture-mapped to walls of the displayed model; to change the horizontal and/or vertical viewing direction from which a corresponding subset view of (or portal into) a panorama image is displayed, such as to determine a portion of a panorama image in a 3D coordinate system to which a current user viewing direction is directed, and to render a corresponding planar image that illustrates that portion of the panorama image without the curvature or other distortions present in the original panorama image; etc. 
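As one non-exclusive illustration of rendering a planar view from a portion of a panorama image, the following minimal sketch assumes an equirectangular panorama stored as a list of pixel rows, and uses nearest-neighbor sampling. The function names and the angle conventions (yaw of 0° at the center column and increasing to the right, pitch of +90° at the top row) are assumptions made for this example; a practical renderer would typically interpolate between pixels and operate on arrays.

```python
import math


def direction_to_equirect_pixel(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to (column, row) in an equirectangular panorama."""
    u = (yaw_deg / 360.0 + 0.5) % 1.0   # wraps around horizontally
    v = 0.5 - pitch_deg / 180.0         # 0 at the top row, 1 at the bottom row
    col = min(int(u * width), width - 1)
    row = min(max(int(v * height), 0), height - 1)
    return col, row


def view_ray(px, py, view_w, view_h, hfov_deg, yaw_deg, pitch_deg):
    """Return (yaw, pitch) in degrees of the ray through pixel (px, py) of a planar view."""
    # Focal length in pixels for the requested horizontal field of view.
    f = (view_w / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    # Ray through the pixel center in the camera frame: x right, y up, z forward.
    x = px + 0.5 - view_w / 2.0
    y = view_h / 2.0 - (py + 0.5)
    z = f
    p, yw = math.radians(pitch_deg), math.radians(yaw_deg)
    # Tilt the ray up by the view pitch, then turn it by the view yaw.
    y2 = y * math.cos(p) + z * math.sin(p)
    z2 = -y * math.sin(p) + z * math.cos(p)
    x3 = x * math.cos(yw) + z2 * math.sin(yw)
    z3 = -x * math.sin(yw) + z2 * math.cos(yw)
    yaw = math.degrees(math.atan2(x3, z3))
    pitch = math.degrees(math.atan2(y2, math.hypot(x3, z3)))
    return yaw, pitch


def render_view(pano, view_w, view_h, hfov_deg, yaw_deg, pitch_deg):
    """Render a planar perspective view by sampling the panorama along each pixel's ray."""
    height, width = len(pano), len(pano[0])
    out = []
    for py in range(view_h):
        row = []
        for px in range(view_w):
            yaw, pitch = view_ray(px, py, view_w, view_h, hfov_deg, yaw_deg, pitch_deg)
            c, r = direction_to_equirect_pixel(yaw, pitch, width, height)
            row.append(pano[r][c])
        out.append(row)
    return out
```

Because each output pixel is sampled along a straight ray through a planar image, straight edges in the room appear straight in the rendered view, although they are curved in the equirectangular source.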
Additional details regarding example embodiments of systems to provide or otherwise support at least some functionality of a building information viewer system and routine as discussed herein, including to display various types of information related to a building of interest and such as by a BIIP (Building Information Integrated Presentation) system and/or an ILTM (Image Locations Transition Manager) system and/or a BMLSM (Building Map Lighting Simulation Manager) system, are included in U.S. Non-Provisional patent application Ser. No. 16/681,787, filed Nov. 12, 2019 and entitled “Presenting Integrated Building Information Using Three-Dimensional Building Models,” in U.S. Non-Provisional patent application Ser. No. 16/841,581, filed Apr. 6, 2020 and entitled “Providing Simulated Lighting Information For Three-Dimensional Building Models,” and in U.S. Non-Provisional patent application Ser. No. 15/950,881, filed Apr. 11, 2018 and entitled “Presenting Image Transition Sequences Between Acquisition locations,” each of which is incorporated herein by reference in its entirety. In addition, while not illustrated in
In operation, a camera device 184 arrives at a first acquisition location 210A within a first room of the building interior (in this example, in a living room accessible via an external door 190-1), and acquires a target image with a view of a portion of the building interior that is visible from that acquisition location 210A (e.g., some or all of the first room, and optionally small portions of one or more other adjacent or nearby rooms, such as through doorway wall openings, non-doorway wall openings, hallways, stairways or other connecting passages from the first room). Similarly, mobile device 185 arrives at one or more different data capture locations in the first room at which it acquires additional data, such as additional images and GPS location data, such as is discussed further with respect to
After the target image for the first acquisition location 210A has been acquired, the camera device 184 may be moved or move under its own power to a next acquisition location (such as acquisition location 210B), and the mobile device 185 may similarly be moved or move under its own power to a next data capture location, optionally recording images and/or video and/or other data from the hardware components (e.g., from one or more IMUs, from the camera, etc.) during movement between locations. At the next acquisition location, the camera device(s) 184 may similarly acquire a 360° target panorama image and/or other type of target image from that acquisition location, and the mobile device(s) 185 may similarly capture additional data from one or more next data capture locations. This process may repeat for some or all rooms of the building 198 and in some cases parts of the property 183 external to the building, as illustrated for additional acquisition locations 210C-210P in this example, including in this example to acquire target panorama image(s) and associated additional other data on an external deck or patio or balcony area 186, on a larger external back yard or patio area 187a, in a separate side yard area 187b, near or in an external additional accessory structure area 189 (e.g., a garage, shed, accessory dwelling unit, greenhouse, gazebo, carport, etc.)
that may have one or more rooms, in a front yard 187c outside the external doorway 190-1 (e.g., during a different acquisition session than used to acquire some or all of the other target images, such as with images for acquisition locations 210A to 210-O being acquired in a single image acquisition session in a substantially continuous manner that occurs within a period of time such as 5 minutes or 15 minutes or 30 minutes), and in other embodiments and situations from further acquisition locations (not shown) on an adjoining street or road 181 and/or sidewalk 182, from one or more overhead locations (e.g., from a drone, airplane, satellite, etc., not shown), etc. The acquired images for each acquisition location may also be further analyzed, including in some embodiments to render or otherwise place each panorama image in an equirectangular format, whether at the time of image acquisition or later, as well as further analyzed by the MIGM and/or BFPGLDM systems in the manners described herein.
As the mobile device moves through the building, it may receive GPS signals 178 at some or all data capture locations and associate corresponding GPS location data with each such data capture location, although in some embodiments and situations the mobile device may not be able to receive the GPS signals at some data capture locations and may instead perform other actions to determine absolute location data for such data capture locations—for example, in some embodiments and situations, the mobile device may further interact with one or more devices 235 in the home as part of determining absolute location data, such as to receive a wireless transmission from a device 235a having an associated absolute location (e.g., a Bluetooth beacon, a Wi-Fi transmitter, etc.), to identify a device 235 or other visible information (e.g., a marker on a wall) having an associated absolute location in visual data of the mobile device's second image(s), etc., and to further use such location data as part of determining the absolute location data for one or more data capture locations of the mobile device, whether in addition to or instead of using GPS location data. In at least some such embodiments, however, the camera device does not have a GPS receiver and does not receive any of the GPS signals, or may instead receive some GPS signals but not with sufficient data to determine its own GPS location with a sufficient degree of accuracy (e.g., below a defined distance or uncertainty threshold). 
In at least some embodiments, one or more additional devices may also be present at the building (e.g., a drone device 179 inside or outside the building) that also receives the GPS signals 178 and optionally acquires further visual data, and if so, the further visual data and associated GPS location data (or other absolute location data) captured at further locations by such one or more additional devices may be used in combination with the camera device(s) and mobile device(s) to determine room shapes and to extend absolute location data to image acquisition locations and/or to floor plans (e.g., to room shapes within floor plans), such as to further use triangulation as part of determining absolute location data for image acquisition locations in the manners described herein.
Various details are provided with respect to
In particular,
In particular, information 255d1 illustrates features of the northeast portion of the living room that are visible in multiple second images captured along path 116, and information 255d2 further illustrates similar information about features in the northwest portion of the living room that are visible in the same or other second images captured along path 116, with various example features shown (e.g., corners 195-1 and 195-2, windows 196-1 and 196-2, etc.). As part of the automated analysis of the second images using the SLAM and/or MVS and/or SfM techniques, information about planes 286e and 286f corresponding to portions of the northern wall of the living room may be determined from the features that are detected, and information 287e and 285f about portions of the east and west walls of the living room may be similarly determined from corresponding features identified in the images. In addition to identifying such surface plane information for detected features (e.g., for each point in a determined sparse 3D point cloud from the image analysis), the SLAM and/or MVS and/or SfM techniques may further determine information about likely acquisition pose (locations and orientations/directions) 220 for the second image(s) captured from location 240a (e.g., pose location 220g and optionally direction 220e in information 255d1, and corresponding pose location 220g and optionally direction 220f in information 255d2), and likely acquisition pose (locations and orientations/directions) 222 for the second image(s) captured from location 240c (e.g., pose location 222g and optionally direction 222e in information 255d1, and corresponding pose location 222g and optionally direction 222f in information 255d2).
While only features for part of the living room are illustrated in information 255d1 and 255d2, it will be appreciated that the other portions of the images corresponding to other portions of the living room may be analyzed in a similar manner, in order to determine possible information about possible planes for the various walls of the room, as well as for other features (not shown) in the living room. In addition, similar analyses may be performed between some or all other images captured in the living room, resulting in a variety of determined feature planes from the various image analyses that may correspond to walls of the room.
Information 255d3 further illustrates information about a variety of determined feature planes that may correspond to the west and north walls of the living room, from analyses of images captured from at least locations 240a and 240c, and optionally further from images captured at other locations (e.g., from image location 210B by the camera device). The illustrated plane information includes determined planes 286g near or at the northern wall (and thus corresponding possible locations of the northern wall), and determined planes 285g near or at the western wall (and thus corresponding possible locations of the western wall). In this example, there are a number of variations in different determined planes for the northern and western walls from different features detected in the analysis of the images, such as differences in position, angle and/or length, causing uncertainty as to the actual exact position and angle of each of the walls. While not illustrated in information 255d3, it will be appreciated that similar determined feature planes for the other walls of the living room may similarly be detected, along with determined feature planes corresponding to features that are not along the walls (e.g., furniture). Information 255d4 further illustrates additional determined feature planes that may correspond to the west and north walls of the living room, from analyses of various other images captured at the image acquisition locations along the path 115 in the living room by the camera device and/or at the data capture locations along the path 116 in the living room by the mobile device; in this example, the analyses of the further images provide even greater variations in the different determined planes for the northern and western walls.
The information 255d4 further illustrates additional determined information that is used to aggregate information about the various determined feature planes in order to identify likely locations 295a and 295b of the west and north walls, as illustrated in information 255d5. In particular, information 255d4 includes indications 291a of normal orthogonal directions for some of the determined feature planes corresponding to the west wall, along with additional information 290a about those determined feature planes. In the example embodiment, the determined feature planes are clustered to represent hypothesized wall locations of the west wall, and the information about the hypothesized wall locations is combined to determine the likely wall location 295a, such as by weighting information from the various clusters and/or the underlying determined feature planes. In at least some embodiments, the hypothesized wall locations and/or normal information are analyzed via use of machine learning techniques to determine the resulting likely wall location, optionally by further applying assumptions or other constraints (such as a 90° corner, as illustrated by 282 in information 255d3, and/or having flat walls) as part of the machine learning analysis or to results of the analysis. Similar analysis may be performed for the north wall using information 290b about corresponding determined feature planes and additional information 291b about resulting normal orthogonal directions for at least some of those determined feature planes. The resulting likely wall locations 295a and 295b for the west and north walls of the living room, respectively, are shown in information 255d5.
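As a non-limiting illustration of the clustering and weighting described above, the following minimal sketch groups determined feature planes for a single wall by their offset and reports a weighted location for the most heavily supported group. The `PlaneHypothesis` type, its fields, the gap threshold, and the choice of keeping only the heaviest cluster are all assumptions made for illustration, and other embodiments may combine clusters in other manners (e.g., via a trained model).

```python
import math
from dataclasses import dataclass


@dataclass
class PlaneHypothesis:
    # All fields are hypothetical names chosen for this sketch.
    normal_angle_deg: float  # direction of the plane normal in the floor plane
    offset_m: float          # signed distance of the plane from the origin
    weight: float            # assumed confidence of the underlying feature


def cluster_by_offset(planes, max_gap_m=0.15):
    """Group planes whose sorted offsets differ by at most max_gap_m."""
    clusters = []
    for plane in sorted(planes, key=lambda p: p.offset_m):
        if clusters and plane.offset_m - clusters[-1][-1].offset_m <= max_gap_m:
            clusters[-1].append(plane)
        else:
            clusters.append([plane])
    return clusters


def likely_wall(planes, max_gap_m=0.15):
    """Return (normal_angle_deg, offset_m) from the heaviest cluster."""
    clusters = cluster_by_offset(planes, max_gap_m)
    best = max(clusters, key=lambda c: sum(p.weight for p in c))
    total = sum(p.weight for p in best)
    offset = sum(p.weight * p.offset_m for p in best) / total
    # Average the angles via unit vectors to avoid wrap-around at 0/360.
    x = sum(p.weight * math.cos(math.radians(p.normal_angle_deg)) for p in best)
    y = sum(p.weight * math.sin(math.radians(p.normal_angle_deg)) for p in best)
    return math.degrees(math.atan2(y, x)) % 360.0, offset
```

In this sketch, a plane derived from a feature away from the wall (e.g., furniture) falls into its own low-weight cluster and does not influence the reported wall location.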
While not illustrated in
Additional details related to embodiments of a system providing at least some such functionality of an MIGM system or related system for generating floor plans and associated information and/or presenting floor plans and associated information, and/or of a system providing at least some such functionality of a BFPGLDM system or related system for determining acquisition positions of images, are included in U.S. Non-Provisional patent application Ser. No. 16/190,162, filed Nov. 14, 2018 and entitled “Automated Mapping Information Generation From Inter-Connected Images” (which includes disclosure of an example Floor Map Generation Manager, or FMGM, system that is generally directed to automated operations for generating and displaying a floor map or other floor plan of a building using images acquired in and around the building); in U.S. Non-Provisional patent application Ser. No. 16/681,787, filed Nov. 12, 2019 and entitled “Presenting Integrated Building Information Using Three-Dimensional Building Models” (which includes disclosure of an example FMGM system that is generally directed to automated operations for displaying a floor map or other floor plan of a building and associated information); in U.S. Non-Provisional patent application Ser. No. 16/841,581, filed Apr. 6, 2020 and entitled “Providing Simulated Lighting Information For Three-Dimensional Building Models” (which includes disclosure of an example FMGM system that is generally directed to automated operations for displaying a floor map or other floor plan of a building and associated information); in U.S. Non-Provisional patent application Ser. No. 17/080,604, filed Oct. 26, 2020 and entitled “Generating Floor Maps For Buildings From Automated Analysis Of Visual Data Of The Buildings' Interiors” (which includes disclosure of an example Video-To-Floor Map, or VTFM, system that is generally directed to automated operations for generating a floor map or other floor plan of a building using video data acquired in and around the building); in U.S. Provisional Patent Application No. 63/035,619, filed Jun. 5, 2020 and entitled “Automated Generation On Mobile Devices Of Panorama Images For Buildings Locations And Subsequent Use”; in U.S. Non-Provisional patent application Ser. No. 17/069,800, filed Oct. 13, 2020 and entitled “Automated Tools For Generating Building Mapping Information”; in U.S. Non-Provisional patent application Ser. No. 16/807,135, filed Mar. 2, 2020 and entitled “Automated Tools For Generating Mapping Information For Buildings” (which includes disclosure of an example MIGM system that is generally directed to automated operations for generating a floor map or other floor plan of a building using images acquired in and around the building); in U.S. Non-Provisional patent application Ser. No. 17/013,323, filed Sep. 4, 2020 and entitled “Automated Analysis Of Image Contents To Determine The Acquisition Location Of The Image” (which includes disclosure of an example Image Location Mapping Manager, or ILMM, system that is generally directed to automated operations for determining acquisition positions of images); and in U.S. Provisional Patent Application No. 63/117,372, filed Nov. 23, 2020 and entitled “Automated Determination Of Image Acquisition Locations In Building Interiors Using Determined Room Shapes” (which includes disclosure of an example Building Floor Plan Generation and Location Determination Manager, or BFPGLDM, system that is generally directed to automated operations for determining acquisition positions of images); each of which is incorporated herein by reference in its entirety.
Various details have been provided with respect to
As a non-exclusive example embodiment, the automated operations of the BFPGLDM system may include the following operations to determine acquisition positions (e.g., acquisition locations and optionally acquisition orientations) of target panorama images acquired by a camera device, such as for use in determining inter-location pose data, by using visual data of the target panorama images and additional data acquired by an accompanying mobile computing device (a mobile smart phone computing device, or ‘phone’, in this example). The various operations may, for example, include one or more of the following:
The various operations have various strengths and weaknesses; for example, room shape matching might not work completely in crowded or unusually-shaped rooms, and motion analysis can introduce location uncertainties and fail to provide camera orientation. By using multiple localization techniques together, benefits can be achieved, including to use different techniques in different situations, and to use results of some techniques as initial estimates that are updated by other techniques (e.g., using motion pattern matching and/or camera marker recognition as initial estimates used by optimization-based techniques such as depth/point cloud matching and RGB feature matching). In addition, multiple candidate results and/or confidence information from each of multiple techniques can be used to combine results from the multiple techniques (e.g., to discard results with lowest confidence from one or more techniques; to use statistical analysis to combine results, such as discarding outliers or choosing a median; etc.).
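As a non-limiting illustration of combining results from multiple techniques, the following minimal sketch discards low-confidence location estimates, discards outliers relative to a per-axis median, and reports the median of the remaining estimates. The function name, the tuple layout, and both threshold values are assumptions made for illustration only.

```python
from statistics import median


def combine_estimates(estimates, min_confidence=0.3, max_dev_m=1.0):
    """Combine (x, y, confidence) estimates produced by different techniques.

    Returns an (x, y) tuple, or None if every estimate is below min_confidence.
    """
    # Step 1: discard results with the lowest confidence.
    kept = [e for e in estimates if e[2] >= min_confidence]
    if not kept:
        return None
    # Step 2: discard outliers that lie far from the per-axis median.
    mx = median(e[0] for e in kept)
    my = median(e[1] for e in kept)
    inliers = [e for e in kept
               if abs(e[0] - mx) <= max_dev_m and abs(e[1] - my) <= max_dev_m]
    if not inliers:
        return (mx, my)
    # Step 3: choose the median of the remaining estimates.
    return (median(e[0] for e in inliers), median(e[1] for e in inliers))
```

A per-axis median is used here only because it is simple and robust; a weighted combination using the confidence values would be an equally plausible choice.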
After the automated acquisition position determination operations are performed for each of multiple target panorama images acquired for a building, and if applicable coordinate system mappings have been identified to allow multiple coordinate systems to be combined into a single coordinate system, then the entire set of panorama localization (6 degrees of freedom each) and coordinate system mappings (up to 5 or 6 degrees of freedom for each pair) can be combined into a small set of global systems (one per disconnected set of coordinate systems). Additional constraints or objective functions can also be applied based on knowledge or assumptions about the overall floor plan geometry, such as room non-intersection or door matching, and optimization-based techniques can optionally be employed again to optimize each of these near-global systems simultaneously, providing an improved set of global results. Such global optimization activities and resulting information can be updated each time a new target panorama image is added.
Once such information is determined for such target panorama images, the information may be used in a variety of manners, such as one or more of the following:
As another non-exclusive example embodiment, the automated operations of the BFPGLDM system may include the following actions. Begin with one or more target images with RGB visual data (but no separate depth data), optionally with further acquisition metadata for one or more of the target images that may include image capture time stamps, image room tags (e.g., supplied by a user who captured a target image for its enclosing room), etc. The automated operations may include performing pre-processing on the target image(s) to solve for camera intrinsics and extrinsics if needed, such as to detect image vanishing lines and vanishing points, to extract (for a perspective image) camera focal length and field of view angle, to solve for camera roll, pitch and yaw angles relative to vanishing lines present in the images, and to re-project each image into spherical space (with the new camera pose leveled relative to the floor plane).
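As a non-limiting illustration of the re-projection into spherical space, the following minimal sketch derives a focal length in pixels from a horizontal field of view angle and maps a pixel of a perspective image to longitude and latitude angles. It assumes an ideal pinhole camera with its principal point at the image center and an already-leveled camera; the function names and parameters are hypothetical.

```python
import math


def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels for an ideal pinhole camera (assumed model)."""
    return (image_width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)


def pixel_to_spherical(u, v, width, height, horizontal_fov_deg):
    """Map pixel (u, v) to (longitude_deg, latitude_deg) on the unit sphere.

    Longitude is positive to the right of the optical axis and latitude is
    positive above the horizon; image rows are assumed to grow downward.
    """
    f = focal_length_px(width, horizontal_fov_deg)
    dx = u - width / 2.0
    dy = v - height / 2.0
    lon = math.atan2(dx, f)
    lat = math.atan2(-dy, math.hypot(dx, f))
    return math.degrees(lon), math.degrees(lat)
```

For example, with a 90° horizontal field of view, a pixel on the right edge of the middle row maps to a longitude of approximately 45° and a latitude of 0°.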
The automated operations may further include generating geometry predictions for each target image, including the following: estimating room shape geometry of the indoor structure in which the camera is located (e.g., using a convolutional-neural-network-based room shape estimator, such as HorizonNet or DuLaNet, to approximate room shape geometry as 3D shapes with uniform room height, with the camera located at the origin of the shape); optionally using an image structural wireframe estimator (e.g., LCNN) to predict image structural lines and projecting these lines into image 3D space as room corner candidates; using an object detection algorithm on the image to generate 2D object bounding boxes with labels and object image descriptors, and then ray casting the 2D image bounding boxes onto the previously estimated 3D room shapes and generating footprints of 3D objects to represent the spatial information of the objects, as well as using 3D bounding box generation algorithms; optionally generating image embedding vectors (e.g., using deep neural network models) for later use in comparing image content similarities and image overlaps; and optionally tagging the image with one or more room types (e.g., bedroom, kitchen, etc.).
The automated operations may further include generating image-to-image relations between each target image and one or more additional images, including the following: optionally using a feature-based image matching algorithm between the pair of images, such as SfM, to solve for image angular connections or pairwise image location information (e.g., which direction in image A connects to which direction in image B); and optionally using a deep learning-based image co-visibility algorithm between the pair of images to determine image content similarity (e.g., for later use with an assumption that images sharing high co-visibility scores have a high chance of being close to each other spatially).
The automated operations may further include retrieving a set of room shape candidates on which to attempt to localize each target image in order to determine a precise acquisition location of the target image. The room shape candidates may be obtained from room shapes estimated for a set of spatially-related additional images. Various heuristics may be used to generate binary relations between a pair of a target image and an additional image or between a target image and an area in an existing floor plan, including the following: use similarity/overlaps between room type tags for the target image and the paired image/area (if available, such as created by an automated image classification algorithm and/or photographer and/or subsequent annotator) to aggregate a list of preferred candidate room shapes; use the temporal relation between images (if image capture time stamp metadata is available) to retrieve a set of temporally-related additional images; use a feature-matching-based image alignment algorithm to generate pairwise or groupwise image co-relations (e.g., image relative angle or binary image co-relation); use a neural-network-based image comparison algorithm to generate pairwise image-to-image co-relations; use IMU metadata collected during the image capture process (if available) to give image angular connections; and use a SLAM-based camera tracking algorithm (if SLAM data is available) to produce image spatial relations.
The automated operations may further include performing geometry matching for each target image to one or more candidate room shapes, to match the target image's estimated room shape to a corresponding determined room shape for a room on a floor plan or to a corresponding estimated room shape for an additional image, and to localize the target image to a single room shape (e.g., to produce one or more camera pose acquisition positions for the target image, optionally along with a confidence score for each camera pose). The automated operations generally include the following: proposing a number of shape matching options (each based on a target image camera pose in the candidate room shape space); computing a score for each of the proposed camera poses (proposed shape matching positions); selecting the camera pose with the highest score or using a threshold to pick multiple camera poses; and refining the one or more selected camera poses.
The proposing of the various shape matching options may include assuming that 2 room shapes have the same scale if they are captured by the same camera at the same height (such as for one or more target images and one or more additional images that are concurrently captured during the same period of time). Corners of the room shapes are used to generate a collection of corner snapping options (alternative shape matches) between the target image's existing room shape and the candidate room shape, with different shape orientations. The shape orientations are generated by snapping the horizontal vanishing angle of the target image to the vanishing angle of the paired additional or existing image or candidate room shape. Thus, if there are M predicted room corners in the target image, N room corners in the candidate room shape, and 4 vanishing directions from the target image and the paired additional or existing image, M*N*4 camera poses are proposed for the target image. When 2 images are captured with inconsistent camera heights, a camera pose can be proposed by selecting 2 control corners from each shape and using them to generate a proposed scale and xyz position, with the vanishing angle alignment used to correct the proposed camera angle.
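As a non-limiting illustration of the corner snapping described above for the same-scale case, the following minimal sketch enumerates M*N*4 candidate camera poses in a 2D floor plane by snapping each target corner onto each candidate corner under each of 4 orientations spaced 90° apart. The 2D simplification, the function name, and the representation of a pose as (x, y, theta) are assumptions made for illustration.

```python
import math
from itertools import product


def propose_poses(target_corners, candidate_corners, base_angle_deg=0.0):
    """Enumerate candidate camera poses by corner snapping.

    target_corners are (x, y) room corners in the camera's own frame (camera
    at the origin); candidate_corners are (x, y) corners in the candidate room
    shape's frame. base_angle_deg is the assumed vanishing-angle alignment.
    Returns a list of (x, y, theta_rad) poses in the candidate frame.
    """
    poses = []
    for k, (tx, ty), (cx, cy) in product(range(4), target_corners, candidate_corners):
        theta = math.radians(base_angle_deg + 90.0 * k)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the target corner, then place the camera so that the rotated
        # corner lands exactly on the candidate corner.
        rx, ry = c * tx - s * ty, s * tx + c * ty
        poses.append((cx - rx, cy - ry, theta))
    return poses
```

For a room with 4 predicted corners matched against a candidate shape with 4 corners, this sketch yields 4*4*4 = 64 proposed poses, each of which would then be scored.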
The computing of a score for each of the proposed camera poses (proposed shape matching positions) may include combining multiple individual scores given the proposed camera pose (e.g., taking the weighted sum of the individual scores, extracting a descriptor from each of these terms and using a machine learning model to generate the final score, etc.). Individual scores may include one or more of the following: a corner re-projection score, in which the candidate room shape is re-projected into the target image space, the projected room corners from the candidate room shape are compared with room corners from the original target image's existing room shape, and each target room corner is matched with its nearest candidate room shape corner, using the distance of each matching corner pair and the number of matches to generate the corner re-projection score (e.g., with the closer the match, the higher the score); a wireframe structural line re-projection score, in which the candidate room shape's structural lines are re-projected into the target image space, the projected structural lines from the candidate room shape are compared with the structural lines from the target image estimated room shape, and each target image structural line is matched with its nearest candidate room shape structural line, using the distance of each matching structural line pair and the number of matches to generate the wireframe structural line re-projection score (e.g., with the closer the match, the higher the score); a structural wall element object re-projection score, in which the candidate room shape's 3D object bounding boxes are re-projected into the target image estimated room space, the projected object bounding boxes from the candidate room shape are compared with the object bounding boxes from the target image estimated room shape, and each target image object bounding box is matched with its nearest candidate room shape object bounding box, using the distance of each matching object bounding box pair based on an intersection-over-union and the consistency of object type tags; an image angular score, in which the departure/landing angle starting from the target image to the additional/existing image is generated, in which a separate departure/landing angle is also generated for each pair of images using a different technique (e.g., SfM, convolutional neural network, etc.), and in which the score is computed by comparing these 2 sets of angles (e.g., with the bigger the discrepancy, the greater the penalty in this score); an image content matching score, in which the image content similarity for a given image pair is generated (e.g., using a convolutional neural network); and a shape-based boundary intersection score, in which structural walls of the candidate room shape are re-projected into the 3D space of the target image, and the mismatch between the structural walls of the projected room shape and of the target image estimated room shape is used to evaluate the proposed camera pose.
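As a non-limiting illustration of the scoring described above, the following minimal sketch computes a simplified corner score for a proposed camera pose and combines named individual scores by weighted sum. For brevity it compares corners in a 2D floor plane rather than re-projecting into image space, and the function names, the linear falloff, the distance threshold, and the example weights are all assumptions made for illustration.

```python
import math


def corner_score(pose, target_corners, candidate_corners, max_dist=0.5):
    """Score in [0, 1]; closer nearest-corner matches give a higher score.

    pose is (x, y, theta_rad) of the camera in the candidate room shape frame,
    and target_corners are (x, y) corners in the camera's own frame.
    """
    x, y, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for tx, ty in target_corners:
        # Transform the target corner into the candidate frame.
        px, py = x + c * tx - s * ty, y + s * tx + c * ty
        # Match it with the nearest candidate corner.
        d = min(math.hypot(px - cx, py - cy) for cx, cy in candidate_corners)
        # Corners farther than max_dist count as unmatched (contribute 0).
        total += max(0.0, 1.0 - d / max_dist)
    return total / len(target_corners)


def combined_score(scores, weights):
    """Weighted sum of named individual scores (weights are assumed values)."""
    return sum(weights[name] * value for name, value in scores.items())
```

A usage example under the same assumptions: `combined_score({"corner": 1.0, "angle": 0.5}, {"corner": 0.7, "angle": 0.3})` weights the corner term more heavily than the angular term.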
The refining of the one or more selected camera poses may include using an initial camera pose for the target image from the previous operations (e.g., using corner point matching), and refining the camera pose using one or a combination of multiple steps. The steps may include one or more of the following: performing an alignment using corner inliers, in which a distance threshold is used to filter all the matching pairs from the previous corner matching operations within a certain re-projection image distance (with the resulting corner pairs called corner inliers), and weighted least squares is used to find the best camera position xyz, with confidence scores from the predicted corners of the target image's estimated room shape (e.g., as generated by a neural network model) used as weights in the weighted least squares regression to generate a more accurate camera position than the previous camera pose; performing an alignment using line matching of wireframe structural line predictions for the target image and for the candidate room shape (e.g., between horizontal lines on the floor), such as with a distance threshold used to filter all the matching lines from the previous line matching operations within a certain re-projection image distance (with the resulting line pairs called line inliers), and weighted least squares used to find the best camera position xyz, with confidence scores from the predicted structural lines of the target image's estimated room shape (e.g., as generated by a neural network model) used as weights in the weighted least squares regression to generate a more accurate camera position than the previous camera pose; and performing a differentiable rendering optimization method using image normal predictions, where camera pose is optimized for a lower cost function value, by rendering the pixel-level surface normal information for the candidate room shape in the target image space starting from an initial camera pose guess, comparing the rendered surface normal with surface normal estimated from the target image in its image space (e.g., using a neural-network-based method like Taskonomy), and computing a cost value, to optimize camera pose by iteration until the cost value reaches a local minimum.
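As a non-limiting illustration of the alignment using corner inliers, the following minimal sketch keeps only matched corner pairs that lie within a distance threshold under the initial pose and then solves a weighted least squares problem for the camera position. With the camera angle held fixed, minimizing the weighted sum of squared distances between transformed target corners and their matched candidate corners has a closed-form solution, namely the weighted mean of the per-pair offsets. The 2D simplification, the function name, and the threshold are assumptions made for illustration.

```python
import math


def refine_position(pose, matches, max_dist=0.5):
    """Refine the camera position with the camera angle held fixed.

    pose is (x, y, theta_rad); matches is a list of
    (target_corner, candidate_corner, confidence) with (x, y) corners, where
    confidence is the assumed weight of the predicted target corner.
    """
    x, y, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    sum_w = sum_x = sum_y = 0.0
    for (tx, ty), (cx, cy), w in matches:
        rx, ry = c * tx - s * ty, s * tx + c * ty
        # Keep only corner inliers under the initial pose.
        if math.hypot(x + rx - cx, y + ry - cy) > max_dist:
            continue
        sum_w += w
        sum_x += w * (cx - rx)
        sum_y += w * (cy - ry)
    if sum_w == 0.0:
        return pose  # no inliers, so keep the initial pose
    return (sum_x / sum_w, sum_y / sum_w, theta)
```

In this sketch a mismatched corner pair is rejected by the threshold and therefore does not bias the refined position.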
Various details have been provided above with respect to these example non-exclusive embodiments, but it will be appreciated that the provided details are included for illustrative purposes, and other embodiments may be performed in other manners without some or all such details.
The server computing system(s) 180 and executing BFPGLDM system 140, and server computing system(s) 380 and executing IDCA and MIGM systems 150 and 160, and data capture devices 185 and executing software 154, and mobile devices 175 and executing software 396 may communicate with each other and with other computing systems and devices in this illustrated embodiment, such as via one or more networks 199 (e.g., the Internet, one or more cellular telephone networks, etc.), including to interact with optional other navigable devices 395 that receive and use floor plans and optionally other generated information for navigation purposes (e.g., for use by semi-autonomous or fully autonomous vehicles or other devices), and for capture devices 185 to communicate with building devices 235 (e.g., using communication and/or sensor components to receive transmissions from transmitter devices and/or to otherwise communicate with other building devices, such as electronic lockboxes or locks, smart home devices, etc.). The mobile devices 175 in this example embodiment are illustrated as including one or more displays 392 on which to present building information from the BFPGLDM system, and optionally other components 394 (e.g., computing resources, I/O components, sensors, etc.).
Some of the described functionality may be combined in fewer computing systems in other embodiments, such as to combine some or all of the BFPGLDM system 140 with a building information viewer system 396 in a single system or device (e.g., a mobile device 175), to combine the BFPGLDM system 140 and the data capture functionality of device(s) 185 in a single system or device, to combine the IDCA and MIGM systems 150 and 160 and the data capture functionality of device(s) 185 in a single system or device, to combine the BFPGLDM system 140 and one or both of the IDCA and MIGM systems 150 and 160 in a single system or device, to combine the BFPGLDM system 140 and the IDCA and MIGM systems 150 and 160 and the data capture functionality of device(s) 185 in a single system or device, etc.
In the illustrated embodiment, an embodiment of the BFPGLDM system 140 executes in memory 330 of the server computing system(s) 180 in order to perform at least some of the described techniques, such as by using the processor(s) 305 to execute software instructions of the system 140 in a manner that configures the processor(s) 305 and computing system 180 to perform automated operations that implement those described techniques. The illustrated embodiment of the BFPGLDM system may include one or more components (not shown), such as to each perform portions of the functionality of the BFPGLDM system, and the memory may further optionally execute one or more other programs 335—as one specific example, a copy of the IDCA and/or MIGM systems may execute as one of the other programs 335 in at least some embodiments, such as instead of or in addition to the IDCA and/or MIGM systems 150 and 160 on the server computing system(s) 380, and/or a copy of a building information viewer system may execute as one of the other programs 335 (e.g., if the computing system(s) 180 are the same as a mobile device 175). 
The BFPGLDM system 140 may further, during its operation, store and/or retrieve various types of data on storage 320 (e.g., in one or more databases or other data structures), such as acquired images/data 155, building floor plans and determined room shapes and associated wall element information 165, acquired absolute location data 156, data 157 and 158 about image acquisition locations and data capture locations (including inter-location pose data), generated floor plans and/or other mapping information and associated absolute location data 159 (e.g., generated and saved 2.5D and/or 3D models, building and room dimensions for use with associated floor plans, additional images and/or annotation information, etc.), and/or various types of optional other information 329 (e.g., various analytical information related to presentation or other use of one or more building interiors or other environments).
In addition, embodiments of the IDCA and MIGM systems 150 and 160 execute in memory 385 of the server computing system(s) 380 in the illustrated embodiment in order to perform techniques related to generating panorama images and floor plans for buildings, such as by using the processor(s) 381 to execute software instructions of the systems 150 and/or 160 in a manner that configures the processor(s) 381 and computing system(s) 380 to perform automated operations that implement those techniques. The illustrated embodiment of the IDCA and MIGM systems may include one or more components, not shown, to each perform portions of the functionality of the IDCA and MIGM systems, respectively, and the memory may further optionally execute one or more other programs 383. The IDCA and/or MIGM systems 150 and 160 may further, during operation, store and/or retrieve various types of data on storage 384 (e.g., in one or more databases or other data structures), such as video and/or image information 155 acquired for one or more buildings (e.g., 360° video or images for analysis to generate floor plans, to provide to users of client computing devices 370 for display, etc.), floor plans and/or other generated mapping information 165, and optionally other information 385 (e.g., additional images and/or annotation information for use with associated floor plans, building and room dimensions for use with associated floor plans, various analytical information related to presentation or other use of one or more building interiors or other environments, etc.)—while not illustrated in
Some or all of the mobile devices 175, mobile data capture devices 185, optional other navigable devices 395, other client devices 105 and other computing systems (not shown) may similarly include some or all of the same types of components illustrated for server computing system 180. As one non-limiting example, the mobile data capture devices 185 are each shown to include one or more hardware CPU(s) 132, memory 367, storage 365, one or more GPS receiver sensors 134, one or more imaging systems 135 (e.g., for use in acquisition of video and/or images), optionally IMU hardware sensors 148 (e.g., for use in acquisition of associated device movement data, etc.), optionally one or more depth sensors 136, and optionally other components (not shown). In the illustrated example, zero or one or more client applications 154 (e.g., an application specific to the IDCA system and/or to the MIGM system and/or to the BFPGLDM system) and/or other programs 154 are executing in memory 367, such as to participate in communication with the BFPGLDM system 140, IDCA system 150, MIGM system 160 and/or other computing systems. While particular components are not illustrated for the other navigable devices 395 or other computing devices/systems 105, it will be appreciated that they may include similar and/or additional components.
It will also be appreciated that computing systems/devices 180 and 185 and 380 and 175 and the other systems and devices included within
It will also be appreciated that, while various items are illustrated as being stored in memory or on storage while being used, these items or portions of them may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other embodiments some or all of the software components and/or systems may execute in memory on another device and communicate with the illustrated computing systems via inter-computer communication. Thus, in some embodiments, some or all of the described techniques may be performed by hardware means that include one or more processors and/or memory and/or storage when configured by one or more software programs (e.g., by the BFPGLDM system 140 executing on server computing systems 180, by a BFPGLDM client application or other building information viewer system executing on mobile devices 175 or other computing systems/devices, etc.) and/or data structures, such as by execution of software instructions of the one or more software programs and/or by storage of such software instructions and/or data structures, and such as to perform algorithms as described in the flow charts and other disclosure herein. Furthermore, in some embodiments, some or all of the systems and/or components may be implemented or provided in other manners, such as by consisting of one or more means that are implemented partially or fully in firmware and/or hardware (e.g., rather than as a means implemented in whole or in part by software instructions that configure a particular CPU or other processor), including, but not limited to, one or more application-specific integrated circuits (ASICs), standard integrated circuits, controllers (e.g., by executing appropriate instructions, and including microcontrollers and/or embedded controllers), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), etc. 
Some or all of the components, systems and data structures may also be stored (e.g., as software instructions or structured data) on one or more non-transitory computer-readable storage mediums, such as a hard disk or flash drive or other non-volatile storage device, volatile or non-volatile memory (e.g., RAM or flash RAM), a network storage device, or a portable media article (e.g., a DVD disk, a CD disk, an optical disk, a flash memory device, etc.) to be read by an appropriate drive or via an appropriate connection. The systems, components and data structures may also in some embodiments be transmitted via generated data signals (e.g., as part of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable transmission mediums, including wireless-based and wired/cable-based mediums, and may take a variety of forms (e.g., as part of a single or multiplexed analog signal, or as multiple discrete digital packets or frames). Such computer program products may also take other forms in other embodiments. Accordingly, embodiments of the present disclosure may be practiced with other computer system configurations.
The illustrated embodiment of the routine begins at block 405, where instructions or information are received. The routine continues to block 407, where it determines if the instructions or other information received in block 405 indicate to associate absolute location data with one or more devices at a building and/or other objects (e.g., objects inside the building, objects outside the building and visible from the building, etc.), and if so continues to block 409, where it determines or otherwise obtains absolute location data for such devices and/or objects, and stores them for later use. Such absolute location data may be automatically determined in various manners, such as based on public data sources (e.g., for objects outside the building), data captured during installation and/or placement of devices and/or objects (e.g., for objects inside a building, such as a visual marker), based on a mesh network of multiple such devices with associated transmitter and receiver capabilities and absolute location data for at least one such device, etc.
After block 409, or if it is instead determined in block 407 that the instructions or other information received in block 405 are not to associate absolute location data with devices and/or objects, the routine continues to block 410, where it determines if the instructions or information received in block 405 are to generate a floor plan and/or other building information for use with an indicated building and to associate absolute location data with the floor plan and/or other building information, and if not continues to block 490. Otherwise, the routine continues to block 415 to perform the IDCA system routine to acquire one or more images with visual data and optionally acquisition metadata (e.g., orientation data and other pose data) for the building and/or other data (e.g., GPS locations for data capture locations) for the building, and to receive the results from the routine; one example of such an IDCA system routine is discussed further with respect to
After block 420, the routine continues to block 425, where it retrieves, for any data capture locations having associated GPS location data or other absolute location data from the operations of blocks 415 and/or 420 (e.g., GPS location data captured at particular data capture locations, absolute location data determined for particular data capture locations using visual data and/or other data acquired at those data capture locations, etc.), that associated GPS location data or other absolute location data for those data capture locations (if any). In block 430, the routine then determines whether to currently determine further absolute location data for any data capture locations using visual data and/or other data acquired at those data capture locations, such as data capture locations for which GPS location data was not previously captured or otherwise determined, all data capture locations for which such visual data and/or other data was acquired, etc. If so, the routine continues to block 435 to, for each such data capture location, analyze the visual data of any second images acquired from that data capture location to attempt to identify one or more visible objects with associated absolute locations and/or to use other captured data (e.g., transmissions received from an in-building device having a known absolute location) to identify other locations with known absolute location data, and to determine corresponding absolute location data for the data capture location by extending the known absolute location data from the identified objects or other locations. 
In embodiments in which a data capture location has associated GPS location data captured at that data capture location, such additional location data determined in block 435 may in some embodiments be used to supplement the captured GPS location data, such as in situations in which the GPS signal received by the mobile device at that data capture location was sufficiently weak to introduce uncertainty in the captured GPS location data.
After block 435, or if it was instead determined in block 430 to not determine further absolute location data for any data capture locations (e.g., to instead use the retrieved GPS location data or other absolute location data from block 425), the routine continues to block 440 to identify pairs of locations that each include one of the image acquisition locations and one of the data capture locations, such as location pairs used in the operations of block 420 to determine a room shape from a combination of visual data of target image(s) captured at that image acquisition location and other data captured at that data capture location and/or to determine inter-location pose data for the locations of the pair. Other location pairs beyond any such location pairs used in block 420 may be further identified in block 440, such as based on proximity (e.g., being in the same room), capture time (e.g., having data captured simultaneously or within a defined amount of time of each other), comparison of visual data of target image(s) captured at an image acquisition location to additional visual data of second image(s) captured at a data capture location (e.g., to determine if there is overlapping visual data, if the visual data of the target image(s) includes the mobile device or its transporter at the data capture location, if the additional visual data of the second image(s) includes the camera device at the image acquisition location, etc.), etc. After block 440, the routine continues to block 445 where it retrieves, for any identified location pairs having associated inter-location pose data from the operations of block 420, that associated inter-location pose data for those location pairs (if any).
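As a minimal illustrative sketch of the proximity-based and capture-time-based pairing described for block 440, the following Python example pairs image acquisition locations with data capture locations. The record fields, the function name and the 30-second default window are assumptions made for illustration only and are not specified by the described techniques.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CaptureRecord:
    """Hypothetical record for one location; all field names are assumptions."""
    location_id: str
    room_id: Optional[str]  # room label, if one is known
    timestamp_s: float      # capture time in seconds


def identify_location_pairs(
    image_locs: List[CaptureRecord],
    data_locs: List[CaptureRecord],
    max_time_gap_s: float = 30.0,  # assumed "defined amount of time"
) -> List[Tuple[str, str]]:
    """Pair each image acquisition location with each data capture location
    that is in the same room or was captured within the time window."""
    pairs = []
    for img in image_locs:
        for dat in data_locs:
            same_room = img.room_id is not None and img.room_id == dat.room_id
            close_in_time = abs(img.timestamp_s - dat.timestamp_s) <= max_time_gap_s
            if same_room or close_in_time:
                pairs.append((img.location_id, dat.location_id))
    return pairs
```

Pairing based on comparison of visual data (e.g., overlapping visual content) would require image analysis and is not shown in this sketch.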
In block 450, the routine then determines whether to currently determine further inter-location pose data for any location pairs using visual data of target image(s) and/or additional second images captured at one or more of the locations of the pair, such as for location pairs for which inter-location pose data was not previously determined, all location pairs, etc. If so, the routine continues to block 460 to, for each such location pair of an image acquisition location with one or more target images and a data capture location with one or more additional second images, perform one or more of the following: analyze the visual data of the target image(s) and the additional visual data of the second image(s) to determine if there is overlapping visual data, such as to identify common features in the overlapping visual data, and if so to use such common features or other information from the overlapping visual data to determine inter-location pose data for the location pair; analyze the visual data of the target image(s) to determine if the visual data of the target image(s) includes the mobile device or its transporter at the data capture location (e.g., at times of simultaneous or other concurrent data capture), and if so to use the position of the mobile device and/or transporter in the visual data of the target image(s) to determine inter-location pose data for the location pair; analyze the additional visual data of the second image(s) to determine if the additional visual data of the second image(s) includes the camera device at the image acquisition location (e.g., at times of simultaneous or other concurrent data capture), and if so to use the position of the camera device in the additional visual data of the second image(s) to determine inter-location pose data for the location pair; etc.
In addition, if multiple types of such analyses are performed to determine inter-location pose data for a given location pair, the techniques may further include determining the inter-location pose data to use for that location pair, such as by selecting the results of one of the types of analyses (e.g., a type of analysis with a highest level of confidence or lowest associated uncertainty or otherwise having a highest associated priority) or by combining inter-location pose data from multiple such analyses (e.g., using a weighted average or other combination technique).
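As one non-exclusive illustration of the two options noted above, the following Python sketch either selects the pose estimate with the highest confidence or combines multiple estimates using a confidence-weighted average. The representation of inter-location pose data as a planar offset plus a heading, and the use of a scalar confidence as the weight, are assumptions for illustration; headings are averaged on the unit circle so that angles near the wrap-around point combine correctly.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PoseEstimate:
    """Hypothetical planar inter-location pose from one type of analysis."""
    dx_m: float         # offset along x, in meters
    dy_m: float         # offset along y, in meters
    heading_rad: float  # relative heading, in radians
    confidence: float   # assumed positive weight; higher is more trusted


def select_best(estimates: List[PoseEstimate]) -> PoseEstimate:
    """Option 1: keep the single estimate with the highest confidence."""
    return max(estimates, key=lambda e: e.confidence)


def combine_weighted(estimates: List[PoseEstimate]) -> PoseEstimate:
    """Option 2: confidence-weighted average of all estimates."""
    total = sum(e.confidence for e in estimates)
    if total <= 0:
        raise ValueError("at least one estimate with positive confidence is required")
    dx = sum(e.confidence * e.dx_m for e in estimates) / total
    dy = sum(e.confidence * e.dy_m for e in estimates) / total
    # Average headings as unit vectors to avoid wrap-around errors.
    s = sum(e.confidence * math.sin(e.heading_rad) for e in estimates)
    c = sum(e.confidence * math.cos(e.heading_rad) for e in estimates)
    return PoseEstimate(dx, dy, math.atan2(s, c), total)
```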
After block 460, the routine continues to block 475 to, for some or all location pairs in which the data capture location has associated absolute location data from blocks 425 and/or 435, extend the absolute location data to the image acquisition location of the location pair using inter-location pose data for that location pair, and to a determined room shape that is generated based at least in part on analysis of the visual data of the target image(s) captured at that image acquisition location, including to the floor plans in which those determined room shapes are positioned. If any image acquisition location is part of multiple location pairs and has multiple extended absolute location data values from the multiple data capture locations of those multiple location pairs, the techniques may further include determining the absolute location data to use for that image acquisition location, such as by selecting the extended data from one of those data capture locations (e.g., a data capture location with a highest level of confidence or lowest associated uncertainty or otherwise having a highest associated priority with respect to its own absolute location data and/or to the inter-location pose data for that data capture location) or by combining absolute location data from multiple such data capture locations (e.g., using a weighted average or other combination technique).
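The extension of absolute location data in block 475 may be illustrated with the following minimal Python sketch, which assumes that the inter-location pose data has already been expressed as an east/north offset in meters from the data capture location to the image acquisition location, and that room shape vertices are likewise expressed as east/north offsets from the image acquisition location. The function names and the small-offset spherical-Earth approximation are assumptions for illustration and are not part of the described techniques.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def offset_lat_lon(lat_deg: float, lon_deg: float,
                   east_m: float, north_m: float) -> Tuple[float, float]:
    """Shift a latitude/longitude by a small east/north offset in meters.

    Adequate for building-scale offsets; not valid near the poles.
    """
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return lat_deg + dlat, lon_deg + dlon


def georeference_room(acq_lat: float, acq_lon: float,
                      room_vertices_m: List[Tuple[float, float]]
                      ) -> List[Tuple[float, float]]:
    """Assign a latitude/longitude to each room shape vertex, given the
    absolute location of the image acquisition location inside the room."""
    return [offset_lat_lon(acq_lat, acq_lon, e, n) for e, n in room_vertices_m]
```

For example, the absolute location of an image acquisition location could be obtained by calling `offset_lat_lon` with the GPS location of the paired data capture location and the east/north offset between the two locations, and the result could then be passed to `georeference_room`.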
After block 475, the routine continues to block 480 to determine whether to use absolute location data associated with one or more points of a floor plan to display that absolute location data on a map, such as based on instructions or other information received in block 405, and if so continues to block 485 to perform such a map display (e.g., to transmit the map data to a client device or other recipient for display), including to overlay or otherwise include on the map one or more visual indicators for the one or more points (e.g., to display the generated floor plan on the map). After block 485, or if it is instead determined in block 480 to not perform such a map display at the current time, the routine continues to block 489, where it optionally provides some or all of the determined and/or generated information for the routine to one or more recipients.
If it is determined in block 410 that the instructions or other information received in block 405 are not to generate building information, the routine continues instead to block 490 to perform one or more other indicated operations as appropriate. Such other indicated operations may include, for example, one or more of the following non-exclusive examples: receiving and storing (or otherwise determining) information about known absolute locations of particular devices and/or objects; receiving and storing information about buildings and/or capture devices and/or companion devices and/or users for later use; retrieving and providing information from a BFPGLDM system account for a user device and/or associated user to that device or user; etc.
After blocks 489 or 490, the routine continues to block 495 to determine whether to continue, such as until an explicit indication to terminate is received, or instead only if an explicit indication to continue is received. If it is determined to continue, the routine returns to block 405 to await additional instructions or other information, and otherwise continues to block 499 and ends.
While not illustrated with respect to the automated operations shown in the example embodiment of
The illustrated embodiment of the routine begins at block 505, where instructions or information are received. At block 510, the routine determines whether the received instructions or information indicate to perform directed acquisition of visual data and/or other data representing a building interior (e.g., in accordance with supplied information about one or more acquisition locations and/or other guidance acquisition instructions), and if not continues to block 590 to perform one or more other indicated operations, including in some embodiments and situations to receive one or more target images captured by one or more camera devices without directed acquisition and/or other data captured by one or more other mobile devices without directed acquisition. Otherwise, the routine proceeds to block 512 to receive an indication to begin the image acquisition process by a camera device at a first image acquisition location (e.g., from a user of a camera device that will perform the target image acquisition) and/or to begin the capture of other data by a mobile device at a first data capture location (e.g., from a user of a mobile data capture device that will perform the data capture process). 
After block 512, the routine proceeds to block 515 in order to perform image acquisition activities for acquiring a 360° panorama image for the image acquisition location at the target building of interest using the camera device (e.g., via one or more fisheye lenses and/or non-fisheye rectilinear lenses on the mobile device and to provide horizontal coverage of at least 360° around a vertical axis, although in other embodiments other types of images and/or other types of data may be acquired), and/or to perform data capture activities for acquiring other data at the data capture location by the mobile device (e.g., to capture GPS location data and one or more additional second images, and to optionally obtain IMU data and/or other acquisition metadata during the image acquisition activities), such as to concurrently capture data by both devices at locations that are proximate to each other (e.g., within visual range of each other or otherwise having overlapping visual data). As one non-exclusive example, the camera device may be a rotating (scanning) panorama camera equipped with a fisheye lens (e.g., with 180° of horizontal coverage) and/or other lens (e.g., with less than 180° of horizontal coverage, such as a regular lens or wide-angle lens or ultrawide lens or macro lens). The routine may also optionally obtain annotation and/or other information from one or more users of the camera device and/or the mobile device regarding the respective image acquisition location and/or data capture location and optionally a surrounding environment, such as for later use in presentation of information regarding the location(s) and/or surrounding environment.
After block 515 is completed, the routine continues to block 520 to determine if there are more image acquisition locations at which to acquire target images using the camera device and/or more data capture locations at which to acquire other data using the mobile device, such as based on corresponding information provided by one or more users of the device(s) and/or received in block 505. In some embodiments, the IDCA routine will acquire only one or more target images captured by the camera device at a single image acquisition location and/or other data captured at a single data capture location, and then proceed to block 577 to provide those target image(s) and/or other data and optionally corresponding information (e.g., to the BFPGLDM system and/or MIGM system for further use before receiving additional instructions or information to acquire one or more next images at one or more next image acquisition locations and/or one or more other groups of data at one or more next data capture locations). If there are more image acquisition locations at which to acquire additional images from the camera device at the current time and/or more data capture locations at which to acquire other data from the mobile device at the current time, the routine continues to block 522 to optionally initiate the acquisition of linking information (e.g., acceleration data, visual data, etc.) during movement of the device(s) along travel path(s) away from the current location(s) and towards next location(s) at the building. The acquired linking information may include additional sensor data (e.g., from one or more IMUs, or inertial measurement units, on the mobile device or otherwise carried by the user) and/or additional visual information (e.g., images, video, etc.) recorded during such movement.
Initiating the acquisition of such linking information may be performed in response to an explicit indication from a user of a device or based on one or more automated analyses of information recorded from a device. In addition, the routine may further optionally monitor the motion of a device in some embodiments during movement to the next acquisition location, and provide one or more guidance cues (e.g., to the user) regarding the motion of the device, quality of the sensor data and/or visual information being acquired, associated lighting/environmental conditions, advisability of acquiring images and/or other data at a next location, and any other suitable aspects of acquiring the linking information. Similarly, the routine may optionally obtain annotation and/or other information from the user(s) regarding the travel path(s), such as for later use in presentation of information regarding a travel path or a resulting inter-location connection. In block 524, the routine determines that the camera device has arrived at the next image acquisition location and/or that the mobile device has arrived at the next data capture location (e.g., based on an indication from a user, based on forward movement of the device stopping for at least a predefined amount of time, etc.), for use as the new current image acquisition location and/or data capture location, respectively, and returns to block 515 to perform further target image acquisition activities for the new current image acquisition location and/or further capture of other data for the new current data capture location.
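As a minimal sketch of one way in which the arrival determination of block 524 could be made based on forward movement of a device stopping for at least a predefined amount of time, the following Python example scans timestamped speed samples. The sample format, function name, speed threshold and stillness duration are all assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def detect_arrival(
    samples: List[Tuple[float, float]],  # (timestamp in s, speed in m/s)
    speed_threshold_mps: float = 0.05,   # assumed "stopped" threshold
    min_still_s: float = 2.0,            # assumed predefined stillness duration
) -> Optional[float]:
    """Return the timestamp at which the device has been still for at least
    min_still_s seconds, or None if that never happens."""
    still_start = None
    for t, speed in samples:
        if speed <= speed_threshold_mps:
            if still_start is None:
                still_start = t  # start of a new still period
            if t - still_start >= min_still_s:
                return t
        else:
            still_start = None  # movement resets the still period
    return None
```

An explicit indication from a user, as also noted above, could be used instead of or in addition to such an automated check.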
If it is instead determined in block 520 that there are not any more image acquisition locations at which to acquire additional target images for the current building or other structure at the current time and not any more data capture locations at which to acquire additional other data for the current building or other structure at the current time, the routine proceeds to block 545 to optionally preprocess acquired 360° target panorama images and/or other acquired data before subsequent use (e.g., for generating related mapping information, for providing information about structural elements or other objects of rooms or other enclosing areas, etc.), such as to produce images of a particular type and/or in a particular format (e.g., to perform an equirectangular projection for each such image, with straight vertical data such as the sides of a typical rectangular door frame or a typical border between 2 adjacent walls remaining straight, and with straight horizontal data such as the top of a typical rectangular door frame or a border between a wall and a floor remaining straight at a horizontal midline of the image but being increasingly curved in the equirectangular projection image in a convex manner relative to the horizontal midline as the distance increases in the image from the horizontal midline and/or as the distance to the acquisition location decreases). In block 577, the images and other captured data and any associated generated or obtained information are stored for later use, and optionally provided to one or more recipients (e.g., to block 415 of routine 400 if invoked from that block).
If it is instead determined in block 510 that the instructions or other information received in block 505 are not to acquire images and other data representing a building interior using directed capture, the routine continues instead to block 590 to perform any other indicated operations as appropriate, such as to receive one or more target images captured by one or more camera devices at one or more image acquisition locations without directed acquisition, to receive other data captured by one or more other mobile devices at one or more data capture locations without directed acquisition, to respond to requests for generated and stored information (e.g., to identify one or more panorama images that match one or more specified search criteria, etc.), to obtain and store other information about users of the system, to configure parameters to be used in various operations of the system (e.g., based at least in part on information specified by a user of the system, such as a user of a mobile device who acquires one or more building interiors, an operator user of the IDCA system, etc.), to perform any housekeeping tasks, etc.
Following blocks 577 or 590, the routine proceeds to block 595 to determine whether to continue, such as until an explicit indication to terminate is received, or instead only if an explicit indication to continue is received. If it is determined to continue, the routine returns to block 505 to await additional instructions or information, and if not proceeds to block 599 and ends.
While not illustrated with respect to the automated operations shown in the example embodiment of
The illustrated embodiment of the routine begins at block 605, where information or instructions are received. The routine continues to block 610 to determine whether image information and optionally other captured data is already available to be analyzed for one or more rooms (e.g., for some or all of an indicated building, such as based on one or more such images received in block 605 as previously generated by the IDCA routine), or if such image information instead is to be currently acquired. If it is determined in block 610 to currently acquire some or all of the image information, the routine continues to block 612 to acquire such information, optionally waiting for one or more users or devices to move throughout one or more rooms of a building and acquire panoramas or other target images at one or more image acquisition locations in one or more of the rooms or other areas (e.g., at multiple acquisition locations in each room of the building) and/or to acquire other second images and optionally other data at one or more data capture locations in the one or more rooms or other areas (e.g., at multiple data capture locations in each room of the building), optionally along with metadata information regarding the acquisition and/or interconnection linking information related to movement between acquisition locations, as discussed in greater detail elsewhere herein. Implementation of block 612 may, for example, include invoking an IDCA system routine to perform such activities, with
After blocks 612 or 615, the routine continues to block 620, where it determines whether to generate mapping information that includes an inter-linked set of target panorama images (or other images) for a building or other group of rooms (referred to at times as a ‘virtual tour’, such as to enable an end-user to move from any one of the images of the linked set to one or more other images to which that starting current image is linked, including in some embodiments via selection of a user-selectable control for each such other linked image that is displayed along with a current image, optionally by overlaying visual representations of such user-selectable controls and corresponding inter-image directions on the visual data of the current image, and to similarly move from that next image to one or more additional images to which that next image is linked, etc.), and if so continues to block 625. The routine in block 625 selects pairs of at least some of the images (e.g., based on the images of a pair having overlapping visual content), and if acquisition location position information is not already determined and provided, determines, for each pair, relative directions between the images of the pair based on shared visual content and/or on other acquired linking interconnection information (e.g., movement information) related to the images of the pair (whether movement directly from the location at which one image of a pair was acquired to the location at which the other image of the pair was acquired, or instead movement between those starting and ending locations via one or more other intermediary locations of other images)—if acquisition location position information is already determined and provided, that information may be used to determine the relative direction information between pairs of images, whether instead of or in addition to the visual data analysis. 
The routine in block 625 may further optionally use at least the relative direction information for the pairs of images to determine global relative positions of some or all of the images to each other in a common coordinate system, and/or generate the inter-image links and corresponding user-selectable controls as noted above. Additional details are included elsewhere herein regarding creating such a linked set of images.
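As a simplified illustration of determining global relative positions in a common coordinate system from pairwise information, the following Python sketch chains pairwise offsets outward from a chosen root image using a breadth-first traversal. It assumes that each pairwise relation is available as a planar (dx, dy) offset already expressed in a shared orientation, which is a stronger input than the relative directions described above; the function name and data layout are likewise assumptions for illustration.

```python
from collections import deque
from typing import Dict, Tuple

Offset = Tuple[float, float]


def global_positions(
    pairwise_offsets: Dict[Tuple[str, str], Offset],
    root: str,
) -> Dict[str, Offset]:
    """Place images in one coordinate system with the root at the origin.

    pairwise_offsets[(a, b)] is the position of b minus the position of a.
    Images not connected to the root are omitted from the result.
    """
    adjacency: Dict[str, list] = {}
    for (a, b), (dx, dy) in pairwise_offsets.items():
        adjacency.setdefault(a, []).append((b, dx, dy))
        adjacency.setdefault(b, []).append((a, -dx, -dy))  # reverse edge

    positions: Dict[str, Offset] = {root: (0.0, 0.0)}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        cx, cy = positions[current]
        for neighbor, dx, dy in adjacency.get(current, []):
            if neighbor not in positions:
                positions[neighbor] = (cx + dx, cy + dy)
                queue.append(neighbor)
    return positions
```

This sketch uses the first path found to each image and does not reconcile conflicting offsets around loops; a fuller approach might apply an optimization over all pairwise constraints.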
After block 625, or if it is instead determined in block 620 that the instructions or other information received in block 605 are not to determine a linked set of images, the routine continues to block 635 to determine whether the instructions received in block 605 indicate to generate other mapping information for an indicated building (e.g., a floor plan), and if so the routine continues to perform some or all of blocks 637-685 to do so, and otherwise continues to block 690. In block 637, the routine optionally obtains additional information about the building, such as from activities performed during acquisition and optionally analysis of the images, and/or from one or more external sources (e.g., online databases, information provided by one or more end-users, etc.)—such additional information may include, for example, exterior dimensions and/or shape of the building, additional images and/or annotation information acquired corresponding to particular locations external to the building (e.g., surrounding the building and/or for other structures on the same property, from one or more overhead locations, etc.), additional images and/or annotation information acquired corresponding to particular locations within the building (optionally for locations different from acquisition locations of the acquired panorama images or other images), determined acquisition location position information, etc.
After block 637, the routine continues to block 640 to select the next room (beginning with the first) for which one or more images (e.g., 360° target panorama images, other target images, other second images, etc.) acquired in the room are available, and to analyze the visual data of the image(s) for the room to determine a room shape (e.g., by determining at least wall locations), optionally along with determining uncertainty information about walls and/or other parts of the room shape, and optionally including identifying other wall and floor and ceiling elements (e.g., wall structural elements/objects, such as windows, doorways and stairways and other inter-room wall openings and connecting passages, wall borders between a wall and another wall and/or ceiling and/or floor, etc.) and their positions within the determined room shape of the room. If acquisition location position information is already determined and provided, that information may be used as part of determining the room shape information, whether instead of or in addition to the visual data analysis. In some embodiments, the room shape determination may include using boundaries of the walls with each other and at least one of the floor or ceiling to determine a 2D room shape (e.g., using one or more trained machine learning models), while in other embodiments the room shape determination may be performed in other manners (e.g., by generating a 3D point cloud of some or all of the room walls and optionally the ceiling and/or floor, such as by analyzing at least visual data of the panorama image and optionally additional data acquired by a mobile data capture device or associated mobile computing device, optionally using one or more of SfM (Structure from Motion) or SLAM (Simultaneous Localization And Mapping) or MVS (Multi-View Stereo) analysis).
In addition, the activities of block 640 may further optionally determine and use acquisition location position information for each of the analyzed images (e.g., within a corresponding determined room shape), and/or obtain and use additional metadata for each panorama image (e.g., acquisition height information of the camera device or other mobile data capture device used to acquire a panorama image relative to the floor and/or the ceiling). Additional details are included elsewhere herein regarding determining room shapes and identifying additional information for the rooms. After block 640, the routine continues to block 645, where it determines whether there are more rooms for which to determine room shapes based on images acquired in those rooms, and if so returns to block 640 to select the next such room for which to determine a room shape.
If it is instead determined in block 645 that there are not more rooms for which to generate room shapes, the routine continues to block 660 to determine whether to further generate at least a partial floor plan for the building (e.g., based at least in part on the determined room shape(s) from block 640 and on determined acquisition location position information if available, and optionally further information regarding how to position the determined room shapes relative to each other). If not, such as when determining only one or more room shapes without generating further mapping information for a building (e.g., to determine the room shape for a single room based on one or more images acquired in the room by the IDCA system), the routine continues to block 688. Otherwise, the routine continues to block 665 to retrieve one or more room shapes (e.g., room shapes generated in block 640) or otherwise obtain one or more room shapes (e.g., based on human-supplied input) for rooms of the building, whether 2D or 3D room shapes, and then continues to block 670. In block 670, the routine uses the one or more room shapes to create an initial floor plan (e.g., an initial 2D floor plan using 2D room shapes and/or an initial 3D floor plan using 3D room shapes), such as a partial floor plan that includes one or more room shapes but less than all room shapes for the building, or a complete floor plan that includes all room shapes for the building.
If there are multiple room shapes, the routine in block 670 further determines positioning of the room shapes relative to each other, such as by using visual overlap between images from multiple acquisition locations to determine relative positions of those acquisition locations and of the room shapes surrounding those acquisition locations, and/or by using other types of information (e.g., using connecting inter-room passages between rooms, optionally applying one or more constraints or optimizations; using determined acquisition location position information; etc.). In at least some embodiments, the routine in block 670 further refines some or all of the room shapes by generating a binary segmentation mask that covers the relatively positioned room shape(s), extracting a polygon representing the outline or contour of the segmentation mask, and separating the polygon into the refined room shape(s). Such a floor plan may include, for example, relative position and shape information for the various rooms without providing any actual dimension information for the individual rooms or building as a whole, and may further include multiple linked or associated sub-maps (e.g., to reflect different stories, levels, sections, etc.) of the building. The routine further optionally associates positions of the doors, wall openings and other identified wall elements on the floor plan.
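The positioning of room shapes relative to each other in block 670 can be illustrated with the following minimal Python sketch, which applies a rigid 2D transform (a rotation followed by a translation) to move a room shape from its own local coordinates into common floor plan coordinates. The function names and the representation of a room shape as a list of (x, y) vertices are assumptions for illustration; determining the translation and rotation themselves (e.g., from visual overlap or connecting inter-room passages) is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def place_room(vertices: List[Point], tx: float, ty: float,
               rotation_rad: float) -> List[Point]:
    """Rotate a room polygon about its local origin, then translate it
    into floor plan coordinates."""
    c, s = math.cos(rotation_rad), math.sin(rotation_rad)
    return [(tx + c * x - s * y, ty + s * x + c * y) for x, y in vertices]


def polygon_area(vertices: List[Point]) -> float:
    """Unsigned polygon area via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Because the transform is rigid, the shape and area of each room are unchanged by placement, which is consistent with a floor plan that conveys relative position and shape information for the rooms.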
After block 670, the routine optionally performs one or more of blocks 680-685 to determine and associate additional information with the floor plan. In block 680, the routine optionally estimates the dimensions of some or all of the rooms, such as from analysis of images and/or their acquisition metadata or from overall dimension information obtained for the exterior of the building, and associates the estimated dimensions with the floor plan; it will be appreciated that if sufficiently detailed dimension information were available, architectural drawings, blueprints, etc. may be generated from the floor plan. After block 680, the routine continues to block 683 to optionally associate further information with the floor plan (e.g., with particular rooms or other locations within the building), such as additional existing images with specified positions and/or annotation information. In block 685, if the room shapes from block 640 are not 3D room shapes, the routine further optionally estimates heights of walls in some or all rooms, such as from analysis of images and optionally sizes of known objects in the images, as well as height information about a camera when the images were acquired, and uses that height information to generate 3D room shapes for the rooms. The routine further optionally uses the 3D room shapes (whether from block 640 or block 685) to generate a 3D computer model floor plan of the building, with the 2D and 3D floor plans being associated with each other; in other embodiments, only a 3D computer model floor plan may be generated and used (including to provide a visual representation of a 2D floor plan if so desired by using a horizontal slice of the 3D computer model floor plan).
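As one hedged illustration of how camera height information could be used to estimate a wall distance and wall height from an image, the following Python sketch applies basic trigonometry to an equirectangular panorama image. It assumes a level camera at a known height above the floor, an image whose horizontal midline corresponds to the horizon, and pixel rows already identified for the wall-floor border and wall-ceiling border in a single image column; none of these assumptions, nor the function names, are specified by the described techniques.

```python
import math
from typing import Tuple


def row_to_pitch(row: float, image_height: int) -> float:
    """Pitch angle in radians for a pixel row of an equirectangular image
    (top row is +pi/2, horizontal midline is 0, bottom row is -pi/2)."""
    return (0.5 - row / image_height) * math.pi


def estimate_wall(camera_height_m: float, floor_row: float,
                  ceiling_row: float, image_height: int) -> Tuple[float, float]:
    """Return (horizontal distance to the wall, wall height) in meters."""
    down = -row_to_pitch(floor_row, image_height)   # angle below the horizon
    up = row_to_pitch(ceiling_row, image_height)    # angle above the horizon
    if down <= 0 or up <= 0:
        raise ValueError("floor border must be below and ceiling border above the midline")
    distance = camera_height_m / math.tan(down)
    height = camera_height_m + distance * math.tan(up)
    return distance, height
```

For example, with a camera 1.5 meters above the floor and wall borders seen 45° below and 45° above the horizon, the wall would be estimated to be 1.5 meters away and 3.0 meters high.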
After block 685, or if it is instead determined in block 660 not to determine a floor plan, the routine continues to block 688 to store the determined room shape(s) and/or generated mapping information and/or other generated information, to optionally provide some or all of that information to one or more recipients (e.g., to block 420 of routine 400 if invoked from that block), and to optionally further use some or all of the determined and generated information, such as to provide the generated 2D floor plan and/or 3D computer model floor plan for display on one or more client devices and/or to one or more other devices for use in automating navigation of those devices and/or associated vehicles or other entities, to similarly provide and use information about determined room shapes and/or a linked set of images and/or about additional information determined about contents of rooms and/or passages between rooms, etc.
If it is instead determined in block 635 that the information or instructions received in block 605 are not to generate mapping information for an indicated building, the routine continues instead to block 690 to perform one or more other indicated operations as appropriate. Such other operations may include, for example, receiving and responding to requests for previously generated floor plans and/or previously determined room shapes and/or other generated information (e.g., requests for such information for display on one or more client devices, requests for such information to provide it to one or more other devices for use in automated navigation, etc.), obtaining and storing information about buildings for use in later operations (e.g., information about dimensions, numbers or types of rooms, total square footage, adjacent or nearby other buildings, adjacent or nearby vegetation, exterior images, etc.), etc.
After blocks 688 or 690, the routine continues to block 695 to determine whether to continue, such as until an explicit indication to terminate is received, or instead only if an explicit indication to continue is received. If it is determined to continue, the routine returns to block 605 to wait for and receive additional instructions or information, and otherwise continues to block 699 and ends.
While not illustrated with respect to the automated operations shown in the example embodiment of
The illustrated embodiment of the routine begins at block 705, where instructions or information are received. At block 710, the routine determines whether the received instructions or information in block 705 are to present determined information for one or more target buildings, and if so continues to block 715 to determine whether the received instructions or information in block 705 are to select one or more target buildings using specified criteria (e.g., based at least in part on an indicated building), and if not continues to block 720 to obtain an indication of a target building to use from the user (e.g., based on a current user selection, such as from a displayed list or other user selection mechanism; based on information received in block 705; etc.). Otherwise, if it is determined in block 715 to select one or more target buildings from specified criteria, the routine continues instead to block 725, where it obtains indications of one or more search criteria to use, such as from current user selections or as indicated in the information or instructions received in block 705, and then searches stored information about buildings (e.g., floor plans, videos, generated textual descriptions, etc.) to determine one or more of the buildings that satisfy the search criteria or otherwise obtains indications of one or more such matching target buildings, such as information that is currently or previously generated by the BFPGLDM system (with one example of operations of such a system being further discussed with respect to
After blocks 720 or 725, the routine continues to block 730 to determine whether the instructions or other information received in block 705 indicate to present one or more maps with one or more visual indicators for each of one or more target buildings, and if so continues to block 732 to do so, including to retrieve or otherwise generate one or more maps for one or more areas that include a location of the one or more target buildings (e.g., one or more maps that match criteria specified in the information of block 705 or otherwise determined, such as using preference information or other information specific to a recipient), and to initiate presentation of the map(s) (e.g., to transmit the map(s) to client device(s) for presentation on those devices) with the visual indicators overlaid on or otherwise included on the maps. In some embodiments and situations, the visual indicator(s) for a target building on a map include some or all of the generated floor plan for that target building. After block 732, the routine continues to block 795.
If it is instead determined in block 730 that the instructions or other information received in block 705 do not indicate to present one or more maps with visual indicators, the routine continues to block 735 to retrieve information for the target building for display (e.g., a floor plan; other generated mapping information for the building, such as a group of inter-linked images for use as part of a virtual tour; generated building description information; etc.), and optionally indications of associated linked information for the building interior and/or a surrounding location external to the building, and/or information about one or more generated explanations or other descriptions of the target building, and selects an initial view of the retrieved information (e.g., a view of the floor plan, a particular room shape, a particular image, some or all of the generated building description information, etc.). In block 740, the routine then displays or otherwise presents the current view of the retrieved information, and waits in block 745 for a user selection. After a user selection in block 745, if it is determined in block 750 that the user selection corresponds to adjusting the current view for the current target building (e.g., to change one or more aspects of the current view), the routine continues to block 755 to update the current view in accordance with the user selection, and then returns to block 740 to update the displayed or otherwise presented information accordingly.
The user selection and corresponding updating of the current view may include, for example, displaying or otherwise presenting a piece of associated linked information that the user selects (e.g., a particular image associated with a displayed visual indication of a determined acquisition location, such as to overlay the associated linked information over at least some of the previous display; a particular other image linked to a current image and selected from the current image using a user-selectable control overlaid on the current image to represent that other image; etc.), and/or changing how the current view is displayed (e.g., zooming in or out; rotating information if appropriate; selecting a new portion of the floor plan to be displayed or otherwise presented, such as with some or all of the new portion not being previously visible, or instead with the new portion being a subset of the previously visible information; etc.). If it is instead determined in block 750 that the user selection is not to display further information for the current target building (e.g., to display information for another building, to end the current display operations, etc.), the routine continues instead to block 795, and returns to block 705 to perform operations for the user selection if the user selection involves such further operations.
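As one non-limiting illustration of the view update of blocks 750-755, the following Python sketch applies a user selection to a current view state and returns the updated view to be presented again in block 740. The ViewState fields, the apply_selection function and the dictionary keys used for selections are hypothetical and are chosen only for illustration; an actual embodiment may represent views and selections in other manners.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ViewState:
    """Hypothetical description of the currently presented view."""
    zoom: float = 1.0
    rotation_deg: float = 0.0
    center: Tuple[float, float] = (0.0, 0.0)


def apply_selection(view: ViewState, selection: dict) -> ViewState:
    """Return a new view reflecting one user selection (block 755)."""
    kind = selection["kind"]
    if kind == "zoom":
        # Clamp so that repeated zooming out cannot reach zero or below.
        return replace(view, zoom=max(0.1, view.zoom * selection["factor"]))
    if kind == "rotate":
        new_angle = (view.rotation_deg + selection["degrees"]) % 360.0
        return replace(view, rotation_deg=new_angle)
    if kind == "pan":
        # Select a new portion of the floor plan to present.
        return replace(view, center=selection["center"])
    raise ValueError(f"selection does not adjust the current view: {kind}")


if __name__ == "__main__":
    view = ViewState()
    view = apply_selection(view, {"kind": "zoom", "factor": 2.0})
    view = apply_selection(view, {"kind": "rotate", "degrees": 450.0})
    print(view)
```

A selection of any other kind raises an error in this sketch, which corresponds to the case in which block 750 determines that the selection does not adjust the current view and control passes to block 795 instead.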
If it is instead determined in block 710 that the instructions or other information received in block 705 are not to present information representing a building, the routine continues instead to block 760 to determine whether the instructions or other information received in block 705 indicate to identify other images (if any) corresponding to one or more indicated target images, and if so continues to blocks 765-770 to perform such activities. In particular, the routine in block 765 receives the indications of the one or more target images for the matching (such as from information received in block 705 or based on one or more current interactions with a user) along with one or more matching criteria (e.g., an amount of visual overlap), and in block 770 identifies one or more other images (if any) that match the indicated target image(s), such as by interacting with the IDCA and/or MIGM systems to obtain the other image(s). The routine then displays or otherwise provides information in block 770 about the identified other image(s), such as to provide information about them as part of search results, to display one or more of the identified other image(s), etc. If it is instead determined in block 760 that the instructions or other information received in block 705 are not to identify other images corresponding to one or more indicated target images, the routine continues instead to block 775 to determine whether the instructions or other information received in block 705 correspond to obtaining and providing guidance acquisition instructions during an image acquisition session with respect to one or more indicated target images (e.g., a most recently acquired image), and if so continues to block 780, and otherwise continues to block 790. 
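As one non-limiting illustration of the matching of blocks 765-770, the following Python sketch selects other images whose visual overlap with an indicated target image meets a specified matching criterion. The sketch assumes that pairwise overlap scores between 0 and 1 have already been computed elsewhere (e.g., obtained from another system) and are supplied as a nested dictionary; the function name find_matching_images, the score format and the image identifiers are hypothetical.

```python
from typing import Dict, List


def find_matching_images(
    target_id: str,
    overlap_scores: Dict[str, Dict[str, float]],
    min_overlap: float,
) -> List[str]:
    """Return ids of other images whose overlap with the target meets
    the threshold, ordered from highest to lowest overlap."""
    scores = overlap_scores.get(target_id, {})
    matches = [
        (other_id, score)
        for other_id, score in scores.items()
        if other_id != target_id and score >= min_overlap
    ]
    # Highest overlap first; ties are broken by id for a stable order.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other_id for other_id, _ in matches]


if __name__ == "__main__":
    scores = {"img_a": {"img_b": 0.6, "img_c": 0.2, "img_d": 0.9}}
    print(find_matching_images("img_a", scores, min_overlap=0.5))
```

An empty list is returned if no other image satisfies the criterion, consistent with the "if any" qualification in the description of block 770.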
In block 780, the routine obtains information about guidance acquisition instructions of one or more types, such as by interacting with the IDCA system, and displays or otherwise provides information in block 780 about the guidance acquisition instructions, such as by overlaying the guidance acquisition instructions on a partial floor plan and/or recently acquired image in manners discussed in greater detail elsewhere herein.
In block 790, the routine instead performs other indicated operations as appropriate, such as to configure parameters to be used in various operations of the system (e.g., based at least in part on information specified by a user of the system, such as a user of a mobile device who acquires images of one or more building interiors, an operator user of the BFPGLDM and/or MIGM systems, etc., including for use in personalizing information display for a particular recipient user in accordance with his/her preferences or other information specific to that recipient), to obtain and store other information about users of the system (e.g., preferences or other information specific to that user), to respond to requests for generated and stored information, to perform any housekeeping tasks, etc.
Following blocks 732, 770, 780 or 790, or if it is determined in block 750 that the user selection does not correspond to the current building, the routine proceeds to block 795 to determine whether to continue, such as until an explicit indication to terminate is received, or instead only if an explicit indication to continue is received. If it is determined to continue (including if the user made a selection in block 745 related to a new building to present), the routine returns to block 705 to await additional instructions or information (or to continue directly on to block 735 if the user made a selection in block 745 related to a new building to present), and if not proceeds to block 799 and ends.
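As one non-limiting illustration of the top-level decision structure of blocks 710, 715, 760 and 775, the following Python sketch maps received instructions to the number of the block at which handling begins. The request is modeled as a dictionary of boolean flags whose key names are hypothetical, and the sketch covers only the routing decisions, not the operations performed in each block.

```python
def next_block(request: dict) -> int:
    """Return the block number at which handling of a request begins."""
    if request.get("present_building"):            # decision of block 710
        if request.get("use_search_criteria"):     # decision of block 715
            return 725  # select target buildings via search criteria
        return 720      # obtain an indicated target building
    if request.get("identify_matching_images"):    # decision of block 760
        return 765
    if request.get("provide_guidance"):            # decision of block 775
        return 780
    return 790          # other indicated operations


if __name__ == "__main__":
    print(next_block({"present_building": True}))
    print(next_block({"provide_guidance": True}))
    print(next_block({}))
```

Each decision is tested in the order given in the description, so a request that sets more than one flag is handled by the earliest matching branch in this sketch.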
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present disclosure. It will be appreciated that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions. It will be further appreciated that in some implementations the functionality provided by the routines discussed above may be provided in alternative ways, such as being split among more routines or consolidated into fewer routines. Similarly, in some implementations illustrated routines may provide more or less functionality than is described, such as when other illustrated routines instead lack or include such functionality respectively, or when the amount of functionality that is provided is altered. In addition, while various operations may be illustrated as being performed in a particular manner (e.g., in serial or in parallel, or synchronous or asynchronous) and/or in a particular order, in other implementations the operations may be performed in other orders and in other manners. Any data structures discussed above may also be structured in different manners, such as by having a single data structure split into multiple data structures and/or by having multiple data structures consolidated into a single data structure. Similarly, in some implementations illustrated data structures may store more or less information than is described, such as when other illustrated data structures instead lack or include such information respectively, or when the amount or types of information that is stored is altered.
From the foregoing it will be appreciated that, although specific embodiments have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by corresponding claims and the elements recited by those claims. In addition, while certain aspects of the invention may be presented in certain claim forms at certain times, the inventors contemplate the various aspects of the invention in any available claim form. For example, while only some aspects of the invention may be recited as being embodied in a computer-readable medium at particular times, other aspects may likewise be so embodied.