A single camera used to take an image or video provides a single perspective on a subject, which is both literally and figuratively a single point of view. If multiple images are taken of the same subject, more information regarding the subject may be collected. If the images are numerous and are taken at varying positions or angles with respect to the subject, the aggregate of the images contains depth and perspective information that cannot be obtained from a single conventional camera.
The widespread use of smartphones and tablets has placed cameras (integrated with general-purpose computers) in the hands of the general public. Due to this wide distribution, at any event or attraction, many individuals have cameras on their person. This creates countless opportunities to collect multiple images of any interesting subject. By coordinating even a small fraction of the camera-carrying users and their cameras, each user may be able to capture better or more interesting images and video that are derived from a plurality of images or video captured by a plurality of devices.
Varying embodiments of the invention relate to collaborative imaging sessions or projects where several devices such as smartphones perform image captures under supervisory control for the purpose of creating a combined image. In some embodiments, in order to perform a collaborative imaging session, a group of participating nodes must be identified. One or more embodiments may identify nodes by using network service discovery tools such as BONJOUR® or by using a registration system.
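For illustration only, the following is a minimal Python sketch of the discovery step, using plain UDP broadcast on a local network as a simplified stand-in for a service-discovery tool such as Bonjour; the port number and service name are hypothetical and not part of any particular embodiment.

```python
# Simplified peer discovery over UDP broadcast -- a stand-in for a
# service-discovery tool such as Bonjour. Port and service name are
# illustrative placeholders.
import json
import socket
import time

DISCOVERY_PORT = 50505          # hypothetical port for this sketch
SERVICE = "collab-imaging"      # hypothetical service identifier

def announce(node_id: str, seconds: float = 5.0) -> None:
    """Broadcast this node's presence so nearby peers can find it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    payload = json.dumps({"service": SERVICE, "node": node_id}).encode()
    end = time.time() + seconds
    while time.time() < end:
        sock.sendto(payload, ("255.255.255.255", DISCOVERY_PORT))
        time.sleep(1.0)

def listen(seconds: float = 5.0) -> set:
    """Collect node identifiers announced by peers on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", DISCOVERY_PORT))
    sock.settimeout(0.5)
    peers, end = set(), time.time() + seconds
    while time.time() < end:
        try:
            data, _ = sock.recvfrom(1024)
        except socket.timeout:
            continue
        msg = json.loads(data.decode())
        if msg.get("service") == SERVICE:
            peers.add(msg["node"])
    return peers
```

One node would run announce() while others run listen(); in practice a node may do both, and a registration system could replace the broadcast entirely.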
Once a group of nodes is assembled, some embodiments allow for supervisory control of the group. The supervisory node may be a server, or any type of computer, including an imaging participant in the collaborative imaging project. In one embodiment, the supervisory node will examine preview information and provide feedback to supervised nodes to adjust their positioning or other settings. When the positions and image previews are deemed suitable, the supervisory node will initiate an image capture, which may happen simultaneously or in sequence.
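As a minimal sketch of this supervisory pass, the following code collects preview status from each node, sends adjustment feedback, and, when all previews are acceptable, schedules a capture at a shared future timestamp so the supervised nodes fire together. The message shapes, field names, and the send() callback are assumptions for illustration.

```python
# Sketch of a supervisory control pass: review previews, send feedback,
# then schedule a capture. All message shapes here are illustrative.
import time
from dataclasses import dataclass

@dataclass
class Preview:
    node_id: str
    exposure_ok: bool
    framing_ok: bool

def supervise(previews, send):
    """`send(node_id, message)` is assumed to deliver a dict to a node."""
    ready = True
    for p in previews:
        if not p.exposure_ok:
            send(p.node_id, {"type": "adjust", "what": "exposure"})
            ready = False
        if not p.framing_ok:
            send(p.node_id, {"type": "adjust", "what": "framing"})
            ready = False
    if ready:
        fire_at = time.time() + 2.0   # shared trigger time, 2 s in the future
        for p in previews:
            send(p.node_id, {"type": "capture", "fire_at": fire_at})
    return ready
```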
In varying embodiments, the captured images are collected for processing to form a combined image, which may, for example, be a stitched image or a collage. The combined image may reflect more or better information than any one node could capture individually.
The inventive embodiments described herein may have application and use in and with respect to all types of devices, including single- and multi-processor computing systems and vertical devices (e.g. cameras or appliances) that incorporate single- or multi-processing computing systems. The discussion herein references a common computing configuration having a CPU resource including one or more microprocessors. The discussion is only for illustration and is not intended to confine the application to the disclosed hardware. Other systems having other known or common hardware configurations (now or in the future) are fully contemplated and expected. With that caveat, a typical hardware and software operating environment is discussed below. The hardware configuration may be found, for example, in a server, a laptop, a tablet, a desktop computer, a smart phone, a phone, or any computing device, whether mobile or stationary.
Referring to
Processor 105 may execute instructions necessary to carry out or control the operation of many functions performed by device 100 (e.g., to control one or more cameras and to run software applications such as games, productivity software, and low-level software such as frameworks). In general, many of the functions described herein are based upon a microprocessor acting upon software (instructions) embodying the function. The software instructions may be written in any known computer language and may be in any form, including compiled or clear text. The instructions may be stored on any known media such as magnetic memory disks, optical disks, or FLASH memory and other similar semiconductor-based media. Processor 105 may, for instance, drive display 110 and receive user input from user interface 115. User interface 115 can take a variety of forms, such as a button, keypad, dial, click wheel, keyboard, display screen and/or a touch screen, or even a microphone or camera (video and/or still) to capture and interpret input sound/voice or images including video. The user interface 115 may capture user input for any purpose, including management of a camera-related application and supervisory control of remote devices including remote cameras.
Processor 105 may be a system-on-chip, such as those found in mobile devices, and may include a dedicated graphics processing unit (GPU). Processor 105 may be based on reduced instruction-set computer (RISC) or complex instruction-set computer (CISC) architectures or any other suitable architecture and may include one or more processing cores. Graphics hardware 120 may be special-purpose computational hardware for processing graphics and/or for assisting processor 105 to process graphics information. In one embodiment, graphics hardware 120 may include one or more programmable graphics processing units (GPUs).
Image capture circuitry 150 includes a variety of camera components, known to the skilled artisan, such as an image sensor (e.g. CCD or CMOS). While not shown in the diagram, image capture circuitry 150 may be associated with a camera optical system including one or more lenses and the capability to adjust focus and aperture, either manually or automatically. The image sensor is in optical communication with the lens system so that light arriving on the sensor may form a desirable image. Furthermore, the activity of the image sensor may be controlled by operation of the processor in response to software instructions.
Sensors 125 and camera circuitry 150 may capture contextual and/or environmental phenomena such as location information; the status of the device with respect to light, gravity, and magnetic north; and even still and video images. All captured contextual and environmental phenomena may be used to provide context for images captured by the camera. Output from the sensors 125 or camera circuitry 150 may be processed, at least in part, by video codec(s) 155 and/or processor 105 and/or graphics hardware 120 and/or a dedicated image processing unit incorporated within circuitry 150. Information so captured may be stored in memory 160 and/or storage 165 and/or in any storage accessible on an attached network. Memory 160 may include one or more different types of media used by processor 105, graphics hardware 120, and image capture circuitry 150 to perform device functions. For example, memory 160 may include memory cache, electrically erasable memory (e.g., flash), read-only memory (ROM), and/or random access memory (RAM). Storage 165 may store data such as media (e.g., audio, image, and video files), metadata for media, computer program instructions, or other software, including database applications, preference information, device profile information, and any other suitable data. Storage 165 may include one or more non-transitory storage media including, for example, magnetic disks (fixed, floppy, and removable) and tape, optical media such as CD-ROMs and digital video disks (DVDs), and semiconductor memory devices such as Electrically Programmable Read-Only Memory (EPROM) and Electrically Erasable Programmable Read-Only Memory (EEPROM). Memory 160 and storage 165 may be used to retain computer program instructions or code organized into one or more modules in either compiled form or written in any desired computer programming language. When executed by, for example, processor 105, such computer program code may implement one or more of the acts or functions described herein.
Referring now to
Also coupled to networks 205, and/or data server computers 210, are client computers 215 (i.e., 215A, 215B and 215C), which may take the form of any smartphone, tablet, computer, set top box, entertainment device, communications device, or intelligent machine, including embedded systems. In some embodiments, users will employ client computers in the form of smart phones or tablets that include cameras. In some embodiments, network architecture 210 may also include network printers such as printer 220 and storage systems such as storage system 225, which may be used to store multi-media items (e.g., images) that are referenced herein. To facilitate communication between different network devices (e.g., data servers 210, end-user computers 215, network printer 220, and storage system 225), at least one gateway or router 230 may be optionally coupled therebetween. Furthermore, in order to facilitate such communication, each device employing the network may comprise a network adapter circuit and related software. For example, if an Ethernet network is desired for communication, each participating device must have an Ethernet adapter or embedded Ethernet capable ICs. Further, the devices may carry network adapters for any network in which they will participate (including PANs, LANs, WANs and cellular networks).
As noted above, embodiments of the inventions disclosed herein include software. As such, a general description of common computing software architecture is provided as expressed in layer diagrams of
With those caveats regarding software, referring to
In the application layer 35 of the software stack shown in
No limitation is intended by these hardware and software descriptions and the varying embodiments of the inventions herein may include any manner of computing device such as MACs®, PCs, PDAs, phones, servers, or even embedded systems.
Some embodiments seek to assemble multiple images that are sourced from multiple cameras so the images may be fused or otherwise combined to create a result that is more interesting or more information-rich than any one camera can produce. Some embodiments collect images or video from node-members of a collaborative imaging group, where the pictures or video are captured during a collaborative imaging session or a collaborative imaging project. Differing embodiments may employ a camera that includes or is part of a general-purpose computer. In these embodiments, the general-purpose computer may be programmed to support the functionality of a collaborative imaging arrangement employing a plurality of cameras/nodes. For example, the general-purpose computer may be programmed to: control one or more cameras integral with the computer; provide for communication with Internet or other network-based resources; provide for direct or indirect communication among and between any members of the collaborative group; process images; and simultaneously run a large number of software programs that are both related to and unrelated to photo/video capture and imaging.
Some examples of general-purpose computers that are suitable for use with certain embodiments are smartphones and tablets. A popular line of such devices is produced by Apple Inc. The Apple devices are commonly known as iOS devices, which include the Apple IPHONE® and IPAD® product lines. Smartphone and tablet platforms work well with many disclosed embodiments because they include cameras and the computing and communications infrastructure necessary for synchronizing and coordinating activity between the cameras. In some embodiments employing general-purpose computers such as smartphones and tablets, each computer runs one or more software programs for managing the activity related to the collaborative imaging arrangement. For example, the software may be capable of: presenting a GUI to receive settings and provide information to users; allowing for user control and providing for user feedback; sending or receiving communications via networks such as Bluetooth, Wi-Fi, LTE, UMTS, etc.; determining and applying camera settings; triggering an integral camera or any camera in the group to capture an image; and processing images (including combining images) and sharing the images.
Referring to
Referring now to
Referring again to
As discussed above, device 401 may carry a software arrangement such as the software discussed with respect to
For many embodiments, communication among devices in a collaborative group may be necessary to the task of collaborative imaging. Varying embodiments envision differing ways to route shared information. There are three general ways for nodes to communicate between and amongst themselves: directly with each other without intermediary computers or networking equipment; indirectly but over local area networks such as WiFi; and, indirectly over wide area networks such as the Internet. For purposes of this discussion, indirect communications over a local network are analogous to path 430 shown in
As shown in
In certain embodiments, communications path 425 may be used to access network-based resources represented by servers 435. For example, the network may be used for image processing or communication purposes. In some embodiments, a server-based service available over the network may allow group members to find each other and to form a group as discussed below. In some embodiments, path 425 may be used so that servers 435 can perform intermediary processing before forwarding information to another device 401 within a collaborative group. For example, when transporting image information, it may be desirable to allow servers 435 to perform image processing rather than the source or destination device. Such an arrangement may conserve time or the nodes' battery power, processor, and memory availability. As discussed below, some collaborative imaging embodiments involve combining images in any of a variety of ways. Since certain types of image combinations, such as fusing or stitching, can be intensive from the standpoint of processing resources, some embodiments perform some or all of the image combining at servers 435 (e.g. each device 401 forwards image information to servers 435 and some or all of the devices receive a final fused image or video in preview or final form).
In addition to or in lieu of the collaboration application 410, devices 401 may also be enabled for collaborative imaging largely through other applications 415 and/or low-level software 420. For example, APIs and frameworks may enable collaborative imaging functionality, and users may interact with that functionality through software and mechanisms pre-installed in the device, perhaps with the aid of a plugin (e.g. bundled photo or camera software, or an Internet browser application).
Referring to
Once two or more users have indicated a general interest in a collaborative imaging project, a specific group may be formed through a user interface such as those shown in
In some embodiments, when a node/user is added to a group, for example through the interface of
An invitation may be sent by any node to any other node. In some embodiments, a supervisory node, such as a master node or a managing node, may send invitations to other nodes. In other embodiments, a server may send the invitations. Invitations may be generated in response to a user's action or, in some embodiments, they may be created by software after evaluating a situational context. For example, software on a node may recognize proximity of other devices, and, based upon the proximity, the software may decide to offer the users the ability to enter a collaborative imaging session, either generally or for a specific project. There are many factors that software may consider in determining whether a collaborative imaging session might be offered to other nodes/cameras. For example, software may consider location information combined with calendar, map, and/or public event information to determine if a group of nodes/cameras is in a place and/or at a time when an event suitable for imaging is taking place. For purposes of this feature, other applications on a node/device may allow a user to flag an event that should be considered for collaborative imaging. Some examples of such other applications are a reminder application or a calendar application wherein an express entry can be made to flag an event as appropriate for group imaging. This entry may be a check box or a pull-down selection on an appointment or reminder. Alternatively, the reminder or calendar application information may be used to infer whether a situation is appropriate for a collaborative imaging project. For example, the collaboration application may search the reminder or calendar application for a keyword such as “collaboration” or any user defined word. In this manner, the user may flag collaboration-appropriate events in reminder or calendar software that is not specially adapted. In addition to searching the reminder and calendar data, the collaboration application may register to receive notice of certain types of entries when they are made into those applications.
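For illustration, a minimal sketch of the keyword-based inference described above, scanning generic calendar-style records for a flag word such as "collaboration"; the entry structure is a hypothetical stand-in, not a real calendar API.

```python
# Illustrative scan of calendar-style entries for a collaboration keyword.
# The entry structure is a stand-in, not a real calendar API.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CalendarEntry:
    title: str
    notes: str
    start: datetime

def flagged_events(entries, keyword="collaboration"):
    """Return entries whose title or notes mention the flag keyword."""
    keyword = keyword.lower()
    return [e for e in entries
            if keyword in e.title.lower() or keyword in e.notes.lower()]

events = [CalendarEntry("Concert", "collaboration shoot with friends",
                        datetime(2024, 6, 1, 20, 0)),
          CalendarEntry("Dentist", "", datetime(2024, 6, 2, 9, 0))]
print([e.title for e in flagged_events(events)])   # -> ['Concert']
```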
As with calendars and reminders, web pages and web-based directories or a web-accessible database can employ flags or other indications regarding collaborative imaging desirability and permissibility. For example, events or attractions listed or otherwise promoted on the Internet may embed (visibly or not) GPS information and an entry indicating the desirability of collaborative imaging. The collaboration application may access the web-based content so that the node user may be notified of the opportunity.
Referring again to
In some embodiments, each node/user may register to participate in collaborative imaging. The registration may provide consent to use location information of the node and, in addition, provide an indication of a collaborative imaging interest either in text or graphic/photo form (e.g. a performer, a sports event/game, artwork, or even civil infrastructure such as the Golden Gate Bridge). In one embodiment, a networked/Internet-based server may maintain the identity, location, and imaging interest of each participant and may make the information available so that groups may be formed through user interaction with the collaboration application 410 (e.g. as shown in
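A minimal sketch of such a registry is shown below: it suggests nearby registrants who share the same imaging interest, using a great-circle distance calculation. The record layout, interest strings, and radius are hypothetical.

```python
# Sketch of a registry that suggests nearby participants who registered
# the same imaging interest. The record layout and radius are illustrative.
import math
from dataclasses import dataclass

@dataclass
class Registration:
    node_id: str
    lat: float
    lon: float
    interest: str     # e.g. "Golden Gate Bridge"

def distance_m(a: Registration, b: Registration) -> float:
    """Great-circle (haversine) distance between two registrations, in meters."""
    r = 6371000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = math.radians(b.lat - a.lat)
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def candidates(me: Registration, registry, radius_m=500.0):
    """Other registrants with the same interest within `radius_m`."""
    return [r for r in registry
            if r.node_id != me.node_id
            and r.interest == me.interest
            and distance_m(me, r) <= radius_m]
```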
The supervisory arrangement of a collaborative imaging project may take many different forms. In some embodiments, a master-slave arrangement may be employed. In a master-slave arrangement, a single node: (1) receives and assembles all the relevant information for an imaging session from each member node (e.g. sensor and image information from all the cameras); (2) controls all the camera settings of all the cameras/nodes; and (3) sends the capture signal(s) to capture still images or begin and end video recording.
In other embodiments, the collaborative arrangement may be more peer-to-peer in nature and generally lacking supervisory functionality. In a peer-to-peer arrangement, each node/device/user decides its own settings and/or its own capture time and the images are sent to a single device or a server for processing (e.g. fusing, stitching, or other combination).
Still other embodiments may use a hybrid form of control. In a hybrid arrangement, each node/device/user controls some functions and parameters while a managing server or node/user controls other functions and parameters. In one such embodiment, individual nodes may decide exposure settings because they can most quickly react to exposure conditions. A managing node/user may be sent the exposure information and may be given the ability to override a member's exposure setting (e.g. for creative reasons), but absent an override the managed node's setting will control. Finally, in a hybrid arrangement, the managing node/user may decide the capture time(s) for a still image or video recording.
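The override rule just described can be reduced to a few lines; the following sketch assumes hypothetical per-node exposure values and a simple override table, purely for illustration.

```python
# Sketch of the hybrid rule described above: a node applies its own
# exposure unless the managing node has sent an override. Field names
# and values are illustrative.
def effective_exposure(local_settings: dict, overrides: dict, node_id: str) -> float:
    """Manager override wins when present; otherwise the node controls."""
    if node_id in overrides:
        return overrides[node_id]           # creative override from the manager
    return local_settings[node_id]          # node's own metered exposure

local = {"nodeA": 1 / 250, "nodeB": 1 / 125}
overrides = {"nodeB": 1 / 500}              # manager overrides nodeB only
print(effective_exposure(local, overrides, "nodeA"))   # 0.004 (local setting)
print(effective_exposure(local, overrides, "nodeB"))   # 0.002 (override)
```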
In some supervisory embodiments, control may be shifted between nodes or users either on a temporal basis, for functional reasons, or upon user request. In these embodiments or any others, the supervisory node may receive its authority by holding a control token. Tokens are a common networking construct and the skilled artisan will understand how to implement a token-sharing system, including such a system with multiple tokens where each token represents a different function.
Referring again to
During or after any preview period, an image or video may be captured for use in the collaborative image result. There are several embodiments regarding the manner in which images are captured for use in the collaborative result. Depending upon the embodiment, not all nodes will necessarily capture images. In some instances a functional or creative decision may be made to exclude a node and in other instances, a node may be used only for the data it provides to aid in the overall process (i.e. even if captured, image data may not be used from every node).
In one embodiment, all devices may capture an image or video and record associated metadata. For example, in addition to the image or video data and any camera-related metadata (e.g. focal length, aperture, etc.), a node may record sensor data, clock data, battery data, user preference data, and any other information that may be useful in combining the captured images/video. Any data, including sensor data or user input relating to relative and absolute positioning of the camera, may be used to combine the plurality of images or video (e.g. GPS, magnetometer, accelerometer, etc.). Of course, this data may also be employed in a preview mode where the collaboration application may use a GUI to display an assumed relative positioning (e.g. on a map) of the group of cameras, and the user may be asked to correct his or her node position or all node positions by dragging to a different (more correct) position.
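For illustration, a sketch of recording such per-capture metadata alongside an image as a JSON sidecar file, so the combining step can read it later; the field names and file naming scheme are assumptions.

```python
# Sketch of recording per-capture metadata alongside an image so the
# combining step can use it later. Field names are illustrative.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class CaptureMetadata:
    node_id: str
    timestamp: float        # clock data
    latitude: float         # GPS
    longitude: float
    heading_deg: float      # magnetometer
    focal_length_mm: float  # camera metadata
    aperture_f: float
    battery_pct: int

def save_sidecar(image_path: str, meta: CaptureMetadata) -> str:
    """Write metadata next to the image as a JSON sidecar file."""
    sidecar = image_path + ".json"
    with open(sidecar, "w") as f:
        json.dump(asdict(meta), f, indent=2)
    return sidecar

meta = CaptureMetadata("nodeA", time.time(), 37.8199, -122.4783,
                       312.0, 4.2, 2.2, 87)
# save_sidecar("IMG_0001.jpg", meta)   # would write IMG_0001.jpg.json
```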
In one embodiment, all the cameras/nodes may capture images simultaneously in order to record the subject at a single instant in time. Other embodiments provide for temporal sequencing of capture times. In these implementations, instead of having simultaneous capture, each camera or sub-group of cameras may capture its image(s) or video at designated times. Embodiments of this nature allow the individual images or video to be separated in both geographical space and time. For example, if the subject is moving, sequential triggers could provide an artistic visualization or follow the subject through its motion, producing a result that appears like an artistic video of the motion. Furthermore, if the cameras capture video instead of still images, individual frames from each camera's video can be selected in post-capture processing to achieve a wider variety of results at the discretion of the user.
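A temporally sequenced capture of this kind might be scheduled as in the sketch below, where nodes ordered along the subject's path each receive their own absolute trigger time; the interval and start delay are hypothetical values chosen only for illustration.

```python
# Sketch of a temporally sequenced capture: nodes ordered along the
# subject's path fire one after another at a fixed interval. The start
# delay and interval are illustrative.
import time

def staggered_schedule(node_ids, interval_s=0.5, start_delay_s=2.0):
    """Map each node to an absolute trigger time, in path order."""
    start = time.time() + start_delay_s
    return {node: start + i * interval_s for i, node in enumerate(node_ids)}

schedule = staggered_schedule(["n1", "n2", "n3", "n4"])
# Each node receives its own fire-at time and captures when its clock
# reaches that value, producing a capture sequence that follows the subject.
```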
Referring again to
In many embodiments, pre- and post-capture image and device properties may be shared or synchronized along with the image information. For example, all sensor information and camera settings may be distributed for use in image processing or in combining the images (e.g. stitching or fusing). In one embodiment, images may be shared in RAW form, so sensor and camera information must also be shared in order to process the images. While RAW images are large, they allow the most flexibility in post-capture processing.
Referring again to
Image combinations may also change the character of the image information. For example, the varying images may be used to develop depth information relating to the subject: multiple images captured from different angles provide depth and perspective information that may be useful for developing 3D images, models or scenes, or other types of image forms that exploit or expose depth and perspective information (e.g. disparity imaging). In one embodiment, the depth and perspective information may be employed to create images similar to those created by light-field or plenoptic cameras; this may provide users with the ability to perform focusing operations after the image is captured. In embodiments where this is the goal, a supervisory node (e.g. server or camera) may direct the settings and placement of the cameras and, in some embodiments, may retrieve multiple image captures from each of one or more of the group members. In one particular embodiment, a plenoptic lens effect may be achieved by: (1) setting all the nodes at the same focus level; (2) registering image or video feeds so they are all pointing at the same subject; and (3) intentionally sending focus parameters to make each node focus at different levels and snap one or more pictures at each focus point, potentially using multiple focus points for each node. The collected information may be processed by a server or supervisory node to produce a plenoptic image file or other information that may simulate post-capture focusing.
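As an illustration of step (3) above, the following sketch spreads focus distances across the group so that, together, the nodes sample the scene at many focus levels; the distance range and shots-per-node values are hypothetical.

```python
# Sketch of the focus sweep described above: a supervisory node assigns
# each member several different focus distances so the group covers the
# scene at many focus levels. Distances and counts are illustrative.
def assign_focus_levels(node_ids, near_m=1.0, far_m=20.0, shots_per_node=2):
    """Spread focus distances evenly over the group, several per node."""
    total = len(node_ids) * shots_per_node
    step = (far_m - near_m) / max(total - 1, 1)
    distances = [round(near_m + i * step, 2) for i in range(total)]
    plan = {}
    for i, node in enumerate(node_ids):
        plan[node] = distances[i * shots_per_node:(i + 1) * shots_per_node]
    return plan

print(assign_focus_levels(["n1", "n2", "n3"]))
# -> {'n1': [1.0, 4.8], 'n2': [8.6, 12.4], 'n3': [16.2, 20.0]}
```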
Referring again to
According to differing embodiments, the nodes may communicate by any mechanism known now or in the future. For example, any network available on a smartphone or tablet may be convenient, such as WiFi, Bluetooth, or a cellular network. Furthermore, as discussed above, communications may be direct or through network equipment/computers and/or a server. Moreover, not all of the nodes need use the same communications network or protocol. For example, nearby nodes may use a PAN while geographically-distant nodes may use a cell network.
Since many embodiments of the invention envision communication between and among several nodes, a communications protocol may be desirable to ensure that the nodes receive all the intended messages. Many satisfactory protocols are known to the skilled artisan. For example, one embodiment may use a token system where a device must have an electronic token to send communications. The token may be passed in a ring or other system in order to organize communication between and among the participants.
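For illustration only, a minimal sketch of such a token-passing ring: only the current token holder may transmit, and the token is then handed to the next node. The class and node identifiers are hypothetical.

```python
# Sketch of a token ring among group members: only the token holder
# transmits, then passes the token to the next node. Purely illustrative.
class TokenRing:
    def __init__(self, node_ids):
        self.ring = list(node_ids)
        self.holder = 0                     # index of current token holder

    def may_send(self, node_id: str) -> bool:
        return self.ring[self.holder] == node_id

    def pass_token(self) -> str:
        """Hand the token to the next node in the ring and return its id."""
        self.holder = (self.holder + 1) % len(self.ring)
        return self.ring[self.holder]

ring = TokenRing(["n1", "n2", "n3"])
assert ring.may_send("n1") and not ring.may_send("n2")
ring.pass_token()
assert ring.may_send("n2")
```

A multi-token variant could keep one TokenRing per function (voice, video feed, instructions), as noted in the control-token discussion above.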
The arrangement of the nodes can vary greatly and can be left to the creativity of the photographers. The examples shown in
Another interesting arrangement may be to synchronize image capture in time but not in location. For example, a group of friends could take pictures at the exact same time but in different parts of the country or world, thus sharing a common experience in time. In this case, the combined image may be more suitable to a collage, but the choice of fusing and stitching may be left to the users.
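A collage of this kind could be assembled as in the sketch below, which tiles the collected images on a simple grid using the Pillow imaging library; the file names, tile size, and grid shape are hypothetical.

```python
# Sketch of a simple grid collage of the collected images, using Pillow.
# File names, tile size, and grid shape are illustrative.
import math
from PIL import Image

def make_collage(paths, tile_w=400, tile_h=300, columns=3):
    rows = math.ceil(len(paths) / columns)
    canvas = Image.new("RGB", (columns * tile_w, rows * tile_h), "black")
    for i, path in enumerate(paths):
        tile = Image.open(path).resize((tile_w, tile_h))
        x = (i % columns) * tile_w
        y = (i // columns) * tile_h
        canvas.paste(tile, (x, y))
    return canvas

# make_collage(["nodeA.jpg", "nodeB.jpg", "nodeC.jpg"]).save("collage.jpg")
```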
In yet another embodiment, geographically-separated nodes may capture images in an absolute time sequence, such as minutes or seconds apart, or even at the same local time. Moreover, time-separated capture sequences may be useful at live events where the collaborative group desires to track moving objects and the members of the group are arranged in the path of movement (e.g. 10 group members at a race track following a car or group of cars; or 5 members at a basketball game following a fast break). In different time-separated capture sequences, sensor and contextual data, such as time of day, GPS information, event information, and map information, may be useful. For example, the collaboration application may use GPS, time, location, and event information to prompt each user to take a picture at their evening meal (text or graphics can be used to prompt the user regarding the image subject, and/or image analysis can be used to verify or identify the image subject).
Information regarding the physical node arrangement (e.g., the relative location of nodes and the direction of each lens) may be useful in pre-capture adjustments and post-capture processing operations. In order to determine the lens position and relative location of the nodes with respect to each other and the subject, a combination of sensor data and user input data may be used. In one embodiment, each of multiple devices may try to determine its relative position with respect to other nodes and the subject. After making such an attempt, the device may share with other nodes (or a supervisory node) both its source data (e.g. sensor data) and its conclusions regarding relative positioning. The shared data may speed or improve the final result calculated by a server, a supervisory node, or a combination thereof. Differing embodiments of the invention envision any one or more of the following techniques and information sources that may be used to estimate the relative position and lens direction of other nodes: signal strength; audible signals broadcast by nodes at a scheduled time and received by other nodes at a measured arrival time; audible signal measurement where different nodes use different frequencies of sound; a GUI on each node that cues users to move their node or to signal to the remainder of the group (e.g. raise a hand); and any accessible data available from computing devices within or proximate to the nodes but not participating in the collaborative imaging project.
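For illustration, a minimal sketch of the scheduled audible-signal technique mentioned above, assuming the nodes' clocks are already synchronized; the times in the example are hypothetical.

```python
# Sketch of the audible-signal idea above: if a node emits a tone at a
# scheduled time and another node records when the tone arrives, the
# offset gives an approximate distance. Assumes synchronized clocks.
SPEED_OF_SOUND_M_S = 343.0   # approximate speed of sound in air at ~20 C

def estimated_distance_m(scheduled_emit_s: float, measured_arrival_s: float) -> float:
    """Distance between emitter and receiver from acoustic time of flight."""
    time_of_flight = measured_arrival_s - scheduled_emit_s
    if time_of_flight <= 0:
        raise ValueError("arrival must follow the scheduled emission")
    return SPEED_OF_SOUND_M_S * time_of_flight

print(estimated_distance_m(10.000, 10.058))   # ~19.9 m apart
```

Using different tone frequencies per node, as mentioned above, would let a receiver attribute each arrival to a specific emitter.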
In some embodiments using supervisory nodes, the GUI on the supervisory node may display the camera view of every camera in the group. An illustrative example of this arrangement is shown in
A toolset for each node may provide a variety of capabilities for the operator of a supervisory node. For example, the supervisory node's view of the feeds may be real-time or near-real-time so that adjustments made to the camera settings or node position and tilt may be evaluated by reviewing the image data that results from those changes. In some embodiments, the supervisory node may additionally receive feedback in the form of a post-processed combined image. For example, with reference to
The user interface of
Depending upon user preference and/or the size of the supervisory node screen, the GUI types of
In some embodiments, the real-time or near-real-time feeds may be reduced in resolution or highly compressed in order to speed processing and transmission. In addition, the combined image GUI of
Some embodiments of the invention may use one or more communication tokens to define the level of communication that a particular node can have with the organizer. In a multiple token situation, different tokens may represent different forms of communication: a token for voice communication; a token for video feed; a token for still image feed; and a token for sending instructions to other nodes.
In cases where cameras are geographically dispersed, the nodes or the supervisory node may present a GUI that overlays the feeds shown in
In accordance with some embodiments, each supervised node may display a GUI or may allow audio information to inform the user of a supervised node regarding instructions from the supervisory node. In one embodiment, suggestions or instructions from the supervisory node may be received by a supervised node and may be graphically displayed (e.g. by arrows or by placing a goal box on the low-res combined image) or may be displayed in text or both. An example of such an interface is shown in
With reference to
In another set of embodiments, the supervisory node user may use the GUI of
In yet another embodiment, the automated iterative software assistance discussed above can be used to assist the user of a supervised node in maintaining the camera at the desired position. For example, once the supervisory node indicates that a node's position is correct, the software may treat that position as the target position and consistently provide instructions to the user of the supervised node regarding the maintenance of that position.
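A sketch of such position-maintenance guidance is shown below: the node compares its current heading and pitch (from its sensors) with the target pose set by the supervisory node and emits a short instruction. The tolerance and the instruction wording are hypothetical.

```python
# Sketch of turning a target pose into user guidance: compare the node's
# current heading/pitch with the target set by the supervisory node and
# emit a simple instruction. The tolerance is illustrative.
def guidance(current_heading, target_heading, current_pitch, target_pitch,
             tolerance_deg=2.0):
    """Return short instructions, or 'hold position' when within tolerance."""
    steps = []
    # Signed shortest rotation from current to target heading, in (-180, 180].
    d_heading = ((target_heading - current_heading + 180) % 360) - 180
    if abs(d_heading) > tolerance_deg:
        steps.append(f"pan {'right' if d_heading > 0 else 'left'} {abs(d_heading):.0f} deg")
    d_pitch = target_pitch - current_pitch
    if abs(d_pitch) > tolerance_deg:
        steps.append(f"tilt {'up' if d_pitch > 0 else 'down'} {abs(d_pitch):.0f} deg")
    return steps or ["hold position"]

print(guidance(350.0, 10.0, 5.0, 5.0))   # -> ['pan right 20 deg']
```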
In another embodiment, subject to user preferences and settings, each node's GUI may provide a preview of the combined image with a highlighted box showing the node's contribution to the combined image. This embodiment may aid the users of supervised nodes in holding or finding a desired camera position.
Referring now to
It is to be understood that the above description is intended to be illustrative, and not restrictive. The material has been presented to enable any person skilled in the art to make and use the invention as claimed and is provided in the context of particular embodiments, variations of which will be readily apparent to those skilled in the art (e.g., many of the disclosed embodiments may be used in combination with each other). In addition, it will be understood that some of the operations identified herein may be performed in different orders. The scope of the invention therefore should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.”