Displayed images are ubiquitous in the modern world. Such images can be presented via paintings, paper pictures, digital picture frames, paper and electronic billboards, advertisements, computer monitors, digital signage, and a wide variety of other mechanisms that show images to human observers.
It is frequently beneficial to convey information with displayed images. For example, it may be beneficial to convey a Web address for a car company along with a picture of an available car so that an interested viewer can access more information about the car. Naturally, there are countless other examples of instances in which it would be beneficial to convey information with images.
Accordingly, new mechanisms for conveying information with images are provided.
Systems, methods, and media for extracting information and a display image from two captured images are provided. In some embodiments, systems for extracting information and a display image from two captured images are provided, the systems comprising: a rolling shutter sensor; and a hardware processor coupled to the rolling shutter sensor that is configured to: cause the rolling shutter sensor to capture two captured images; receive the two captured images; and extract the information and the display image from the two captured images, wherein the information is represented in the captured images as a flicker pattern.
In some embodiments, methods for extracting information and a display image from two captured images are provided, the methods comprising: causing a rolling shutter sensor to capture two captured images using a hardware processor; receiving the two captured images using the hardware processor; and extracting the information and the display image from the two captured images using the hardware processor, wherein the information is represented in the captured images as a flicker pattern.
In some embodiments, non-transitory computer-readable media containing computer executable instructions that, when executed by a processor, cause the processor to perform a method for extracting information and a display image from two captured images are provided, the method comprising: causing a rolling shutter sensor to capture two captured images; receiving the two captured images; and extracting the information and the display image from the two captured images, wherein the information is represented in the captured images as a flicker pattern.
In accordance with some embodiments, mechanisms (which can include systems, methods, and media) for extracting information and a display image from two captured images can enable displays and cameras to communicate with each other, while also displaying and capturing images for human consumption. A message can be transmitted with these displays by temporally modulating a display's brightness at high frequencies (that are imperceptible to humans) while displaying an image. The message can then be captured by a rolling shutter camera, which converts the temporally modulated incident light into a spatial flicker pattern. In the captured image, the flicker pattern will be superimposed on the image shown on the display. The flicker pattern and the display image can be separated by performing suitable signal processing on two such captured images of the display that are captured with different exposure settings.
Turning to
Any suitable image display with a modulated brightness can be used as display 102 in some embodiments. For example, as shown in
In some embodiments, display 202 can be a controllable display, such as an LCD display. In such a case, an image presented on display 202 can be controlled by image driver 214 based on an image from an image source 212 (such as a memory device, an interface, a hardware processor, etc.). In some embodiments, display 202 can be a non-controllable display, such as a glass display that holds a fixed transparency with content printed thereon. In such a case, no image driver or image source is needed.
Another example of a display that can be used to implement display 102 is shown in
In some embodiments, display 222 can be a controllable display, such as a transflective display. In such a case, an image presented on display 222 can be controlled by image driver 228 based on an image from image source 230 (such as a memory device, an interface, a hardware processor, etc.). In some embodiments, display 222 can be a non-controllable display, such as a printed piece of paper, a work of art, etc. In such a case, no image driver or image source is needed.
Although two examples of displays that can be used to implement display 102 are shown above, it should be apparent to one of ordinary skill in the art that any suitable mechanism that provides an image and a modulated light source can be used to implement display 102.
In some embodiments, multiple modulated light sources can be used per display 102 and/or multiple displays 102 can be used to provide a higher data rate and/or multiple streams of information. When multiple modulated light sources are used with the same display, the light sources can be used at different frequencies for the same portion of the display or can be used at any suitable frequency for different portions of the display.
Turning to
When acting as a rolling shutter camera, sensor 304 can capture light one row at a time. For example, as shown in
As mentioned above, in order to extract the information and the display image from the captured image, two images can be captured and processed. An example of a mechanism through which this occurs is described below.
In order to capture two images, any suitable technique can be used. For example, in some embodiments, a bracketing mode can be used in which two sequential pictures are rapidly taken. Such an approach performs best when there is little or no movement of the camera. In instances where there is some movement, any suitable mechanism can be used to correct the captured image(s) for that motion.
As another example, in some embodiments, a simultaneous dual exposure (SDE) sensor can be used. An illustration 310 of an example of an SDE sensor is shown in
In the following paragraphs, an example of a way in which the information and the display image can be extracted from two captured images, in accordance with some embodiments, is described.
In the following paragraphs, it is assumed that the display completely occupies the sensor field-of-view so that every sensor pixel receives light only from the display area. This assumption is made only for ease of explanation, and is not a requirement of the mechanisms described herein. In general, a sensor pixel may receive light from outside the display, due to the display not completely occupying the sensor's field of view or due to occlusions. It can be shown that the image formation model for the general case has the same form as that of the special case where pixels receive light only from the display.
Conceptually, the display can be thought of as having two layers—a signal layer and a texture layer. The signal layer contains the information to be conveyed and the texture layer contains an image that is displayed for human consumption.
As set forth above, the information can be transmitted from display 102 to camera/processor 104 by temporally modulating the brightness of the display. The function through which the brightness is modulated is referred to herein as the signal function, f(t).
Any suitable technique for modulating the brightness of the display can be used in some embodiments. For example, in some embodiments, phase-shift keying (PSK) signal coding, where information is embedded in the phase of sinusoidal signals at any suitable frequency (e.g., 500 Hz), can be used. More particularly, for example, binary PSK signal coding, where the phase θ of the sinusoids takes binary values (0 and π), thus encoding binary bits, can be used to encode and transmit bits sequentially in time in some embodiments.
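As an illustration of binary PSK brightness modulation, the following is a minimal sketch rather than a definitive implementation. The function name `bpsk_brightness`, the DC offset, the modulation amplitude, the number of cycles per bit, and the sample rate are all assumptions chosen for the example; only the 500 Hz frequency and the 0/π phase mapping come from the description above.

```python
import numpy as np

def bpsk_brightness(bits, freq_hz=500.0, cycles_per_bit=2,
                    sample_rate_hz=100_000.0, dc=1.0, amplitude=0.5):
    """Return (t, f): a brightness waveform f(t) carrying `bits` with binary PSK.

    The DC offset keeps the brightness non-negative (assumed values).
    """
    samples_per_bit = int(round(cycles_per_bit * sample_rate_hz / freq_hz))
    t_bit = np.arange(samples_per_bit) / sample_rate_hz
    chunks = []
    for b in bits:
        phase = np.pi if b else 0.0  # bit 0 -> phase 0, bit 1 -> phase pi
        chunks.append(dc + amplitude * np.cos(2 * np.pi * freq_hz * t_bit + phase))
    f = np.concatenate(chunks)
    t = np.arange(f.size) / sample_rate_hz
    return t, f

# Four bits transmitted sequentially in time.
t, f = bpsk_brightness([0, 1, 1, 0])
```

Each bit occupies a whole number of carrier cycles here, so the average brightness seen by a human observer stays at the DC level regardless of the transmitted bits.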
As the camera observes the display, a combination of light rays (shown as 206 and 226 in
Because the entire display is modulated by a single temporal function f(t) (in the present example), the radiance l(x,y,t) can be factorized into spatial and temporal components:
$$l(x,y,t) = l_{tex}(x,y)\, f(t) \tag{1}$$
where ltex(x,y) is the amplitude of the temporal radiance profile at pixel (x,y), and is determined by the display's texture layer. Note that the temporal radiance profiles at different sensor pixels differ only in their amplitude ltex(x,y). Let e(x,y,t) be the exposure function at pixel (x,y). If pixel (x,y) is on (i.e., it captures light) at time t, e(x,y,t)=1, otherwise, if the pixel is off (i.e., it does not capture light) at time t, e(x,y,t)=0. The measured brightness value i(x,y) is:
$$i(x,y) = k \int_{-\infty}^{\infty} l(x,y,t)\, e(x,y,t)\, dt \tag{2}$$
where k is the sensor gain that converts radiance to pixel brightness. Since the sensor has a rolling shutter, different rows capture light during different, shifted time intervals. The amount of shift is determined by the row index y and the speed of the rolling shutter. The exposure function e(x,y,t) can be modeled as a time-shifted function s(t):
$$e(x,y,t) = s(t - t_y) \tag{3}$$
where ty is the temporal shift for a pixel in row y. The function s(t), called the shutter function, can be any suitable function, such as a rect (pill-box) function, a temporally coded shutter function, a temporal Gaussian, a high-frequency binary code, or any other suitable function.
Substituting equations (1) and (3) into equation (2) provides:
$$i(x,y) = k\, l_{tex}(x,y) \int_{-\infty}^{\infty} s(t - t_y)\, f(t)\, dt = k\, l_{tex}(x,y)\, g'(t_y) \tag{4}$$
where g′(ty)=(s*f)(ty) is the convolution of the signal and the shutter functions. g′(ty) is a function of the temporal shift ty, which in turn depends on the sensor row index y. Typically, ty=y/r, where r is the speed of the rolling shutter in rows per second.
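The image formation model of equations (1) through (4) can be simulated numerically. The sketch below assumes a rect shutter, a uniformly sampled signal f(t), and hypothetical values for the rolling shutter speed (30,000 rows per second), the image size, and the texture; the function names `signal_image` and `capture` are illustrative only.

```python
import numpy as np

def signal_image(f, sample_rate_hz, exposure_s, rows, rows_per_second):
    """g'(t_y) per row: integrate f over [t_y, t_y + exposure], with t_y = y / r."""
    dt = 1.0 / sample_rate_hz
    n_exp = max(1, int(round(exposure_s * sample_rate_hz)))
    # Running integral of f, so each exposure window is a difference of two values.
    cumulative = np.concatenate(([0.0], np.cumsum(f) * dt))
    start = np.round(np.arange(rows) / rows_per_second * sample_rate_hz).astype(int)
    if start[-1] + n_exp > f.size:
        raise ValueError("signal too short for the requested rows and exposure")
    return cumulative[start + n_exp] - cumulative[start]

def capture(l_tex, g, gain=1.0):
    """i(x, y) = k * l_tex(x, y) * g'(t_y); rows (y) are the first array axis."""
    return gain * l_tex * g[:, None]

sample_rate = 200_000.0
t = np.arange(int(0.02 * sample_rate)) / sample_rate
f = 1.0 + 0.5 * np.cos(2 * np.pi * 500.0 * t)       # assumed 500 Hz signal
g = signal_image(f, sample_rate, exposure_s=0.5e-3, rows=480,
                 rows_per_second=30_000.0)
texture = np.full((480, 640), 0.8)                  # assumed flat texture layer
image = capture(texture, g)
```

With these assumed numbers the flicker repeats every 30,000/500 = 60 rows, and every pixel in a row shares the same flicker value.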
Equation (4) can be re-written as:
$$i(x,y) = i_{tex}(x,y) \times g(y) \tag{5}$$
where itex(x,y)=k×ltex(x,y) represents the display image, and g(y)=g′(ty)=(s*f)(ty) is the signal image that encodes the signal function f(t). Equation (5) states that the texture layer and the signal layer of the display can be observed as two separable (and unknown) components: the display image and the signal image. The temporal signal f(t) manifests only in the signal image g(y), and the display's texture layer is captured only in the display image itex(x,y).
When the display brightness is changing, the signal image g(y) varies along the y dimension because different sensor rows sample the signal function f(t) at different instants. However, all the pixels in a given row sample f(t) at the same instant, and thus should have the same intensity for f(t). As a result, g(y) has the form of a horizontal flicker pattern. Since the signal image g(y) is one-dimensional (1-D), for computational efficiency, analysis can be performed on horizontal sum images, which are 1-D signals, i.e., i(y)=Σxi(x,y) and itex(y)=Σxitex(x, y). Saturated image pixels can be excluded from the summation. Then, equation (5) can be written as i(y)=itex(y)×g(y). For the remainder of the discussion below, this 1-D form of the image formation equation is used.
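The horizontal summation with saturated pixels excluded can be sketched as follows. Using one shared validity mask for both captured images, so that both 1-D sums see the same set of pixels, is an assumption of this example, as are the function name `horizontal_sums` and the 8-bit saturation level of 255.

```python
import numpy as np

def horizontal_sums(image_a, image_b, saturation_level=255):
    """Sum each row over x, skipping pixels saturated in either image."""
    valid = (image_a < saturation_level) & (image_b < saturation_level)
    row_a = np.where(valid, image_a, 0).sum(axis=1)
    row_b = np.where(valid, image_b, 0).sum(axis=1)
    return row_a, row_b

a = np.array([[10, 255, 20],
              [5, 5, 5]])
b = np.array([[1, 2, 255],
              [1, 1, 1]])
row_a, row_b = horizontal_sums(a, b)
```

In the first row only the leftmost pixel is unsaturated in both images, so only that pixel contributes to either sum.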
The image formation model in equation (5) is derived without making any assumptions about the display's shape, orientation or location with respect to the sensor, or about imaging parameters such as zoom and defocus. Since the signal component g(y) depends only on the signal function f(t) and the shutter function s(t), any changes in display-sensor geometry or imaging parameters (e.g., zoom and focus) manifest only in the display image itex(x,y). Specifically, the display's orientation and location determine the shape of the display's projection in the captured image, sensor zoom influences the size of the display's projection, and camera focus determines the amount of blur in the display image (signal image g(y) is invariant to camera defocus).
If the display is partially occluded so that it is visible to a (non-empty) subset of pixels in each sensor row, because the captured image is summed horizontally, the signal image g(y) is still sampled at every row location. If αy>0 is the fraction of pixels in sensor row y that see the display, the amplitude of the signal image can be scaled by αy. Under mild assumptions, αy can be assumed to be locally constant, and absorbed in the display image. As a result, the signal image is always a horizontal flicker pattern. Its functional form and structure are invariant to the display-camera geometry, partial occlusions and camera parameters.
In some embodiments, f(t) can be a 500 Hz sinusoidal signal and the shutter s(t) can be a rect function of width 0.5 ms such that s(t)=1 when 0≤t≤0.5 ms, otherwise s(t)=0. This can result in a sinusoidal flicker pattern. Notice that the period of the flicker, hsine, is independent of camera-display geometry or camera zoom. Even if only a small fraction of the display is visible to the camera due to large zoom, the flicker image can retain the same structure and capture the information contained in the signal function, in some embodiments.
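The sinusoidal flicker can be worked out in closed form from equation (4). The derivation below is a sketch under stated assumptions: the signal is taken to be f(t) = 1 + a cos(2π f0 t + θ) with modulation amplitude a (the unit DC level and the symbol a are assumptions of this example), and the shutter is a rect function of width T.

```latex
% Assumed signal:  f(t) = 1 + a\cos(2\pi f_0 t + \theta), \quad 0 \le a \le 1
% Assumed shutter: s(t) = 1 for 0 \le t \le T, and 0 otherwise
\begin{aligned}
g'(t_y) &= \int_{-\infty}^{\infty} s(t - t_y)\, f(t)\, dt
         = \int_{t_y}^{t_y + T} \bigl[\, 1 + a\cos(2\pi f_0 t + \theta) \,\bigr]\, dt \\
        &= T + \frac{a}{\pi f_0}\, \sin(\pi f_0 T)\,
           \cos\bigl( 2\pi f_0 t_y + \theta + \pi f_0 T \bigr), \\
g(y)    &= g'(y / r)
  \quad\Longrightarrow\quad h_{sine} = \frac{r}{f_0} \ \text{rows}.
\end{aligned}
```

For f0 = 500 Hz and T = 0.5 ms, π f0 T = π/4, so the flicker amplitude relative to the constant term T is a·sin(π/4)/(π/4) ≈ 0.90a. The period in rows depends only on the rolling shutter speed r and the signal frequency f0; for a hypothetical r of 30,000 rows per second it would be 60 rows. When T is a whole multiple of 1/f0, the factor sin(π f0 T) is zero and the flicker vanishes.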
In order to decode the information in the signal image g(y), it needs to be separated from the display image itex(y). Since both signal and display components are unknown, in general, they cannot be separated from a single captured image. However, if two images i1(y) and i2(y) are captured with two different shutter functions s1(t) and s2(t), two different equations are obtained, which will enable the signal image and the display image to be separated.
As described above, in some embodiments, the two images can be captured sequentially using the exposure bracketing mode. This approach, while suitable for static scenes and cameras, is prone to errors if there is scene/camera motion. As also described above, in some embodiments, the two images can be captured using a single SDE sensor that captures two images with different exposure functions simultaneously in a single shot.
The two images can be given as:
$$i_1(y) = i_{tex}(y) \times (s_1 * f)(t_y) \tag{6}$$

$$i_2(y) = i_{tex}(y) \times (s_2 * f)(t_y) \tag{7}$$
This is a system of two equations in two unknowns: signal f(t) and the flicker-free display image itex(y). Since the shutter functions s1(t) and s2(t) are known, these two equations can be solved simultaneously to recover both f(t) and the flicker-free image itex(x,y).
The signal f(t) can be considered to be a sum of sinusoids of different frequencies (the set of frequencies is typically a small, discrete set). These frequencies can have any suitable values such as 1 kHz, 2 kHz, 3 kHz, 4 kHz, etc. This signal encoding scheme is called orthogonal-frequency-division-multiplexing (OFDM). However, any suitable encoding scheme can be used in some embodiments.
In some embodiments, for each frequency, information can be embedded in the phase of the sinusoids. This method of embedding information is called phase-shift keying. For instance, in binary phase-shift keying, binary symbols of 0 and 1 can be embedded by using sinusoids of phase 0 and π, respectively. Bits (sinusoids with different phases) can be transmitted sequentially in time. An example for a single frequency is illustrated in
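Taking the Fourier transform of equations (6) and (7) gives:

$$I_1(\omega) = I_{tex}(\omega) * \bigl( S_1(\omega)\, F(\omega) \bigr) \tag{8}$$

$$I_2(\omega) = I_{tex}(\omega) * \bigl( S_2(\omega)\, F(\omega) \bigr) \tag{9}$$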
where ω is the spatial frequency. The functions denoted by uppercase letters are the Fourier transforms of the functions denoted by the corresponding lower case letters. These two equations can be combined as:
$$I_1(\omega) * \bigl( S_2(\omega)\, F(\omega) \bigr) - I_2(\omega) * \bigl( S_1(\omega)\, F(\omega) \bigr) = 0 \tag{10}$$
The temporal signal f(t) consists of a small, discrete set of temporal frequencies Ω=[ω1, . . . , ωM]. Equation (10) needs to be solved only for the frequency set Ω. Let $\vec{I}_1$ be the vector of values [I1(ω1), . . . , I1(ωM)]. The vectors $\vec{I}_2$, $\vec{S}_1$, $\vec{S}_2$, and $\vec{F}$ are defined similarly. By observing that convolution can be expressed as multiplication by a Toeplitz matrix and element-wise multiplication as multiplication by a diagonal matrix, equation (10) can be compactly represented in matrix form as:
$$(I_1 S_2 - I_2 S_1)\, \vec{F} = 0 \tag{11}$$
where I1 and I2 are Toeplitz matrices defined by vectors $\vec{I}_1$ and $\vec{I}_2$, respectively, and S1 and S2 are diagonal matrices defined by vectors $\vec{S}_1$ and $\vec{S}_2$, respectively. The matrices I1 and I2 are defined by captured image intensities, and S1 and S2 are defined in terms of the known shutter functions.
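The two matrix identities used to reach equation (11) can be checked numerically. The sketch below uses a full linear convolution, which is an assumption made for illustration; how the convolution is restricted to the frequency set Ω is not specified here. The helper name `convolution_matrix` is hypothetical.

```python
import numpy as np

def convolution_matrix(a, n):
    """Toeplitz matrix T such that T @ b equals np.convolve(a, b) for len(b) == n."""
    a = np.asarray(a)
    T = np.zeros((a.size + n - 1, n), dtype=a.dtype)
    for j in range(n):
        T[j:j + a.size, j] = a  # each column is a shifted copy of a
    return T

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])
T = convolution_matrix(a, b.size)
conv_via_matrix = T @ b               # convolution as a Toeplitz product

s = np.array([0.5, 2.0])
product_via_matrix = np.diag(s) @ b   # element-wise product as a diagonal product
```

Because both operations are linear in the unknown vector, the whole of equation (10) becomes a single matrix acting on that vector.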
The goal is to recover the unknown vector $\vec{F}$. The above equation can be solved as a linear system of the form AX=0. In order to avoid the degenerate solution ($\vec{F}$=0) and scale ambiguity (if $\vec{F}$ is a solution, then s$\vec{F}$ is also a solution for any complex number s), the constraint that F(0)=1.0, i.e., that the DC level of the signal f(t) is 1.0, can be imposed.
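One way to impose the constraint is to move the column that multiplies the DC term to the right-hand side and solve the remaining system in the least-squares sense. The sketch below is one possible approach under stated assumptions, not necessarily the solver used in any particular embodiment: it assumes the DC term is the first entry of the unknown vector, and it is demonstrated on a synthetic matrix built to have a known null vector. The name `solve_with_unit_dc` is hypothetical.

```python
import numpy as np

def solve_with_unit_dc(A):
    """Least-squares solution of A @ F = 0 subject to F[0] = 1."""
    A = np.asarray(A, dtype=complex)
    # With F[0] fixed at 1, A @ F = 0 becomes A[:, 1:] @ F[1:] = -A[:, 0].
    rest, *_ = np.linalg.lstsq(A[:, 1:], -A[:, 0], rcond=None)
    return np.concatenate(([1.0 + 0.0j], rest))

# Synthetic check: build A so that A @ F_true = 0 for a known F_true.
rng = np.random.default_rng(0)
F_true = np.array([1.0, 0.5j, -0.25 + 0.1j])
B = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
A = B - np.outer(B @ F_true, F_true.conj()) / np.vdot(F_true, F_true)

F = solve_with_unit_dc(A)
```

Fixing one entry removes both the trivial zero solution and the arbitrary complex scale factor, which leaves an ordinary overdetermined linear system.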
Because the signal f(t) includes multiple bits that are transmitted sequentially, these bits can be captured at different spatial locations in the signal image. Thus, each bit can be recovered by analyzing a corresponding portion of the captured image. The size of the portion, hbit, is the number of image rows required to encode a single bit. hbit can be determined by the signal frequency such that higher frequencies of g(y) (due to f(t) having a high temporal frequency) result in smaller portion sizes. Thus, the captured images i1(y) and i2(y) can be divided into 1-D portions, and $\vec{F}$ (the Fourier transform of f(t)) can be recovered by solving equation (11) on each portion individually. Since computations are done locally, I1(ω) and I2(ω) are the short-time Fourier transforms (STFT) of i1(y) and i2(y). Once $\vec{F}$ is computed, the signal f(t) and the embedded information can be recovered by applying an inverse Fourier transform. The display image itex(x,y) can then be computed using equation (5): itex(x,y)=i(x,y)/g(y)=i(x,y)/((s*f)(ty)). If one of the shutter functions is significantly longer than the period of the signal f(t), the corresponding g(y) will be approximately constant. In that case, the corresponding captured image i(x,y) is nearly flicker free, and can directly be used as the display image.
As mentioned above, in some embodiments, an exposure bracketing mode can be used to capture the two images needed to extract the information and the display image. However, because the two images are taken sequentially, the second image samples the emitted temporal signal at a different time instant than the first image, and thus captures a different temporal signal f′(t). The two images can thus be given as:
$$i_1(y) = i_{tex}(y) \times (s_1 * f)(t_y) \tag{12}$$

$$i_2(y) = i_{tex}(y) \times (s_2 * f')(t_y) \tag{13}$$
To solve these equations, two images ishort and ilong can be captured with short and long exposures, sshort and slong, respectively. If slong is chosen so that it is significantly longer than the period of the temporal signal, the signal image glong(y)=(slong*f)(ty) is approximately constant, irrespective of the time instant when the signal is sampled. Thus:
$$(s_{long} * f)(t_y) \approx (s_{long} * f')(t_y) \approx K \tag{14}$$
where K is a constant. By using the above approximation, the two images ishort and ilong can be expressed as:
$$i_{short}(y) = i_{tex}(y) \times (s_{short} * f)(t_y) \tag{15}$$

$$i_{long}(y) = i_{tex}(y) \times (s_{long} * f')(t_y) \approx i_{tex}(y) \times (s_{long} * f)(t_y) \tag{16}$$
Equations (15) and (16) are the same as equations (6) and (7). Thus the signal f(t) can be estimated using the same technique for solving for f(t) described above.
Note that the data transmit rate is halved since only the signal transmitted during the capture of short exposure frames is decoded. Because ilong(x,y) can be approximated as the texture image, it is also possible to estimate the flicker component by calculating the image ratio iratio(x,y)=ishort(x,y)/ilong(x,y)≈gshort(y)/K.
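The image-ratio estimate can be sketched as follows. Averaging the ratio along each row and guarding against division by very small values are assumptions added for this example, as are the function name `flicker_from_ratio`, the synthetic texture, the 60-row flicker period, and the value of K.

```python
import numpy as np

def flicker_from_ratio(i_short, i_long, eps=1e-6):
    """Per-row estimate of g_short(y) / K from the ratio of the two exposures."""
    ratio = i_short / np.maximum(i_long, eps)
    # The flicker is constant along x, so average each row to suppress noise.
    return ratio.mean(axis=1)

rng = np.random.default_rng(1)
texture = rng.uniform(0.2, 1.0, size=(120, 80))            # assumed display image
g_short = 1.0 + 0.4 * np.cos(2 * np.pi * np.arange(120) / 60.0)
K = 32.0                                                   # assumed constant g_long
i_short = texture * g_short[:, None]
i_long = texture * K
estimate = flicker_from_ratio(i_short, i_long)
```

In this noise-free example the texture cancels exactly, so the estimate equals the flicker pattern up to the constant K.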
The approach to extracting the information and the display image when using exposure bracketing described above assumes that both the scene and the camera are static while the two images are captured. If there is scene/camera motion during capture, the images need to be aligned by computing relative motion between them. Unfortunately, if the inter-frame motion is large, image alignment techniques often produce inaccurate results. This can result in erroneous signal recovery.
Any suitable exposure lengths can be used in some embodiments. For example, the two exposures can be 0.25 ms and 16 ms in some embodiments.
As described above, in order to avoid errors in the recovered signal due to motion, the two images with different exposures can be captured simultaneously in some embodiments. One way to achieve this is by using two synchronized cameras that are co-located using additional optics. In another example, two different exposures can be captured in a single image using a simultaneous dual exposure (SDE) sensor. Since little or no motion is present in these images, the signal f(t) can be solved for as described above.
If the sensor and the display are not temporally synchronized, the start of the transmitted signal cannot be localized in the signal image, and the signal cannot be decoded. In order to synchronize the sensor and the display, any suitable technique can be used. For example, in some embodiments, a pilot symbol can be embedded in the signal to determine the beginning of the signal. The pilot symbol can be a sinusoid of a frequency (e.g., 1 kHz when only a single 2 kHz modulation frequency is used) that is not used to encode the main signal (so that it is readily detected) in some embodiments.
Additionally, in some embodiments, guard-interval-based synchronization can be used to determine the start of every symbol (bit). In this scheme, the end of each symbol is copied to its beginning. Then, by autocorrelating the signal f(t), the beginning location of every symbol is computed.
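Guard-interval synchronization can be illustrated with a normalized correlation between each candidate guard window and the window one symbol body later. This is a sketch under stated assumptions: noise-like symbol bodies stand in for the transmitted symbols, the symbol and guard lengths are arbitrary, and the function name `symbol_start` is hypothetical.

```python
import numpy as np

def symbol_start(x, body_len, guard_len):
    """Estimate the offset (modulo the symbol length) at which guarded symbols begin."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # remove the DC brightness level
    sym_len = body_len + guard_len
    n_lags = x.size - sym_len + 1
    rho = np.empty(n_lags)
    for d in range(n_lags):
        head = x[d:d + guard_len]
        tail = x[d + body_len:d + body_len + guard_len]
        denom = np.sqrt(np.dot(head, head) * np.dot(tail, tail))
        rho[d] = np.dot(head, tail) / denom if denom > 0 else 0.0
    # Average the metric over all symbols by folding it modulo the symbol length.
    usable = (n_lags // sym_len) * sym_len
    folded = rho[:usable].reshape(-1, sym_len).mean(axis=0)
    return int(np.argmax(folded))

rng = np.random.default_rng(0)
body_len, guard_len, n_symbols = 64, 16, 6
bodies = rng.standard_normal((n_symbols, body_len))
# Copy the end of each symbol to its beginning to form the guard interval.
symbols = np.concatenate([bodies[:, -guard_len:], bodies], axis=1)
stream = 1.0 + 0.3 * symbols.ravel()
cut = 23                                   # unknown capture offset
observed = stream[cut:]
estimate = symbol_start(observed, body_len, guard_len)
```

The correlation reaches its maximum only where the guard window and the end of the same symbol coincide, which is why the offset can be found without knowing the transmitted data.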
There are various possible sources of error that can impact the technique described above for recovering f(t), such as sensor saturation, low display brightness, small display area, and sensor noise. Moreover, severe occlusions where none of the pixels in a sensor row sees the display can lead to errors. Further, if the display occupies only a small area in the captured image, the signal image can have a low amplitude and the recovered signal f(t) can have a low signal-to-noise ratio. In all these scenarios, the recovered signal f(t) may contain errors, which are desirable to detect.
In order to detect errors, the left-hand side of equation (11), (I1S2−I2S1)$\vec{F}$ (where $\vec{F}$ is the recovered solution for a portion of the captured image), can be computed, and if the value is greater than a prescribed threshold, the recovered signal can be determined to be erroneous. Any suitable threshold, such as 0.5, can be used in some embodiments.
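The error check can be sketched as a residual test. Because the left-hand side of equation (11) is a vector, reducing it to a single value with the Euclidean norm is an assumption of this example; the function name `is_erroneous` is hypothetical, and the matrix `A` stands for (I1S2−I2S1).

```python
import numpy as np

def is_erroneous(A, F, threshold=0.5):
    """Flag a recovered F when the residual norm of A @ F exceeds `threshold`."""
    residual = np.linalg.norm(np.asarray(A) @ np.asarray(F))
    return bool(residual > threshold)

A = np.eye(2)
good = is_erroneous(A, np.array([0.1, 0.1]))   # residual norm about 0.14
bad = is_erroneous(A, np.array([1.0, 1.0]))    # residual norm about 1.41
```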
In some embodiments, occlusions can be addressed by creating redundancy in the transmitted signal f(t). For example, the display can transmit the signal f(t) repeatedly and the sensor can capture a sequence of frames (assuming small or no inter-frame motion). The signal length can be optimized so that a particular signal bit is captured in different image locations in successive captured images. Since the errors are location specific (due to occlusions or low texture brightness), if a bit is decoded incorrectly in one frame, it may be decoded correctly in a subsequent frame. The number of frames that need to be captured depends on the signal size, the extent of occlusions, the brightness of background texture, display area and sensor noise.
The above-described mechanisms can be used for any suitable application(s). For example, in some embodiments, these mechanisms can be used in a spotlight configuration in which an LED lamp illuminating a photograph on a wall is used to tag the photograph with meta-information (e.g., the time and location of the photograph) by modulating the lamp brightness. The meta-information can then be received by capturing two rolling shutter images and extracting the meta-information and display image as described above.
As another example, in some embodiments, these mechanisms can be used for marketing and conveying meta-information (e.g., URLs, schedules of shows, item-prices and availability, etc.) via LED-based billboards installed in public places that display images and modulate the display brightness based on meta-information. Users can receive the information by simply pointing their cell phone towards the display, and the cell phone capturing images and performing the processing described above to extract the meta-information so that it can be displayed to a user.
As still another example, in some embodiments, these mechanisms can be used for pairing of cell phone screens with displays. This can allow a user to have a large display as an extension of his/her small cell phone screen, and also to share a large display with other users. Pairing of cell phones with a display can be achieved by the display broadcasting its unique pairing key as modulated display brightness. One or more users can receive the key simply by pointing his/her/their phone(s) towards the display, with the phone(s) capturing images, extracting the key information, and establishing pairing using the key. Once pairing is established, the cell phone(s) can send data (e.g., images, videos, etc.) to be displayed on the screen using an existing communication modality, such as Wi-Fi, Bluetooth, or any other suitable communication mechanism. In some embodiments, if there are multiple displays available, a display can be selected by capturing an image of that display.
As still another example, in some embodiments, these mechanisms can be used during a presentation being given on a large screen. Using the mechanisms, a member of the audience can pair their cell phone or laptop (as described in the paragraph above) with the screen and show relevant information (e.g., additional charts, web pages).
As still another example, in some embodiments, these mechanisms can be used to perform non-line-of-sight communication where a spotlight shining on a surface conveys meta-information about it. The information can be received by a user by simply pointing his/her cell phone at the surface. This functionality can be used in museum settings, for example. More particularly, for example, a strategically installed spotlight can serve the dual purpose of enhancing an artifact's appearance while simultaneously communicating information about it in an unobtrusive manner.
As still another example, in some embodiments, these mechanisms can be used for indoor navigation and location specific services. More particularly, for example, the mechanisms can utilize a single light source or an array of light sources (e.g., ceiling lights) as transmitters. The light sources, in addition to providing illumination, can also broadcast their location (or other location specific information).
In some embodiments, any suitable computer readable media can be used for storing instructions for performing the functions and/or processes described herein. For example, in some embodiments, computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention, which is limited only by the claims which follow. Features of the disclosed embodiments can be combined and rearranged in various ways.
This application claims the benefit of U.S. Provisional Patent Application No. 61/980,002, filed Apr. 15, 2014, and U.S. Provisional Patent Application No. 62/073,787, filed Oct. 31, 2014, which are hereby incorporated by reference herein in their entireties.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2015/025994 | 4/15/2015 | WO | 00

Number | Date | Country
---|---|---
62073787 | Oct 2014 | US
61980002 | Apr 2014 | US