The present disclosure relates to the field of image processing, associated methods, computer programs and apparatus, and in particular concerns the representation of stereoscopic images on a conventional display. Certain disclosed aspects/embodiments relate to portable electronic devices, in particular, so-called hand-portable electronic devices which may be hand-held in use (although they may be placed in a cradle in use). Such hand-portable electronic devices include so-called Personal Digital Assistants (PDAs).
The portable electronic devices/apparatus according to one or more disclosed aspects/embodiments may provide one or more of: audio/text/video communication functions (e.g. tele-communication, video-communication, and/or text transmission, Short Message Service (SMS)/Multimedia Message Service (MMS)/emailing functions), interactive/non-interactive viewing functions (e.g. web-browsing, navigation, TV/program viewing functions), music recording/playing functions (e.g. MP3 or other format and/or (FM/AM) radio broadcast recording/playing), downloading/sending of data functions, image capture functions (e.g. using an in-built digital camera), and gaming functions.
Three-dimensional imaging, or stereoscopy, is any technique capable of creating the illusion of depth in an image. Typically, the illusion of depth is created by presenting a slightly different image to each of the observer's eyes, and there are various ways of achieving this. For example, to present a stereoscopic motion picture, two images are projected superimposed onto the same screen through orthogonal polarizing filters. To appreciate the depth of the image, the observer may wear a pair of 3D glasses which contain a pair of orthogonal polarizing filters. As each filter passes only light which is similarly polarized and blocks the orthogonally polarized light, each eye sees one of the images, and the three-dimensional effect is achieved.
Autostereoscopy, on the other hand, is a method of displaying three-dimensional images that can be viewed without the need for polarized glasses. Several technologies exist for autostereoscopic 3D displays, many of which use a lenticular lens or a parallax barrier.
A lenticular lens comprises an array of semi-cylindrical lenses which focus light from different columns of pixels at different angles. When an array of these lenses is arranged on a display, images captured from different viewpoints can be made visible depending on the viewing angle. Because each eye views the lenticular lens from a slightly different angle, the screen creates an illusion of depth.
A parallax barrier consists of a layer of material with a series of precision slits. When a high-resolution display is placed behind the barrier, light from an individual pixel in the display is visible only from a narrow range of viewing angles. As a result, the set of pixels seen through each slit differs with changes in viewing angle, allowing each eye to see a different set of pixels and so creating a sense of depth through parallax.
Whilst the above-mentioned technology may be effective at creating the illusion of depth in an image, the requirement for polarized glasses or a specialised display is a disadvantage.
The listing or discussion of a prior-published document or any background in this specification should not necessarily be taken as an acknowledgement that the document or background is part of the state of the art or is common general knowledge. One or more aspects/embodiments of the present disclosure may or may not address one or more of the background issues.
According to a first aspect, there is provided a processor configured to: receive respective image data, representative of images, of the same subject scene from two or more image capture sources spaced apart at a particular predetermined distance; identify corresponding features from the respective image data; determine the change in position of the identified features represented in the respective image data; and identify the depth-order of the identified features according to their determined relative change in position, to allow for depth-order display of the identified features.
The processor may be configured to depth-order the identified features for display such that the features which are determined to have changed position most are depth-ordered for display in front of features which are determined to have changed less in position.
Those identified features which are determined to have undergone a substantially similar change in position may be assigned to a layer lying parallel to a plane connecting the image capture sources, wherein the depth of the layer with respect to the plane connecting the image capture sources is unique. The term “substantially similar” in this case may refer to determined changes in position which are substantially the same, or which fall within some specified range.
Different points on the same feature which are determined to have undergone different changes in position may be assigned to different layers. Therefore, some features may be assigned to multiple layers.
The change in position may be determined with respect to a reference point. The reference point may be the centre of each image represented by the respective image data. The reference point could also be a corresponding edge of each image or a corresponding point located outside of each image.
The determined change in position may be the determined change in position of the centre of the identified features. Furthermore, the determined change in position may be a translational shift of the identified features, which might be a horizontal and/or vertical shift.
The number of identified features may be less than or equal to the total number of corresponding features present in the respective image data. The images may be represented by pixels, wherein each of the identified features comprises one or more groups of specific pixels. Each group of pixels may comprise one or more pixels.
The image data captured by the image capture sources may be captured substantially at the same time. Each image capture source may be one or more of a digital camera, an analogue camera and an image sensor for a digital camera. The image capture sources may be connected by a communication link to synchronise the capture and image processing. The image capture sources may reside in the same device/apparatus or different devices/apparatus.
The processor may be configured to calculate image data for a selected viewing angle based on the identified depth-order.
The processor may be configured to display calculated image data, the image data calculated based on the identified depth-order.
Advantageously, the processor may be configured to calculate image data for a selected viewing angle by interpolating one or more of the size, shape and translational shift position of the identified features. The processor may be configured to calculate image data for a selected viewing angle by extrapolating one or more of the size, shape and translational shift position of the identified features.
The image data from each image capture source may be encoded independently using an image compression algorithm. The calculated image data may be encoded using joint image coding. Advantageously, the calculated image data may be compressed by exploiting the redundancy between the image data from the image capture sources. One or more of the following may be encoded in the image data: the depth-order of the identified features, the depth-order of the layers, the relative difference in depth of the identified features, the relative difference in depth of the layers, and the layers to which the identified features have been assigned.
The shape of features which have been assigned to multiple layers may be smoothed in the calculated image. The shape of features in the calculated image may be interpolated or extrapolated using a morphing function.
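By way of non-limiting illustration only, the following Python sketch shows one possible morphing function of the kind mentioned above; it assumes (as an illustration, not as part of the disclosure) that a feature's outline is represented by corresponding vertex lists in the two captured images, and that the outline for the calculated image is a weighted blend of the two.

```python
# Minimal sketch of one possible morphing function (an assumption, not a
# technique mandated by the disclosure): the outline of a feature is
# represented by the same number of corresponding vertices in each captured
# image, and the outline for the calculated image is a weighted blend.

def morph_outline(vertices_img1, vertices_img2, weight):
    """vertices_img*: list of (x, y) vertices in corresponding order.
    weight = 0 reproduces the outline from image 1, weight = 1 the outline
    from image 2; values outside [0, 1] extrapolate the shape."""
    return [((1.0 - weight) * x1 + weight * x2,
             (1.0 - weight) * y1 + weight * y2)
            for (x1, y1), (x2, y2) in zip(vertices_img1, vertices_img2)]

outline_1 = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
outline_2 = [(2.0, 0.0), (13.0, 0.0), (13.0, 8.0), (2.0, 8.0)]
print(morph_outline(outline_1, outline_2, 0.5))
# [(1.0, 0.0), (11.5, 0.0), (11.5, 8.0), (1.0, 8.0)]
```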
According to a further aspect, there is provided a device/apparatus comprising any processor described herein. The device may comprise a display, wherein the display is configured to display an image corresponding to the selected viewing angle based on the calculated image data. The device may or may not comprise image capture sources for providing the respective image data to the processor.
The device may be one or more of a camera, a portable electronic/telecommunications device, a computer, a gaming device and a server. The portable electronic/telecommunications device, computer or gaming device may comprise a camera.
Advantageously, the processor may be configured to obtain the respective image data from a storage medium located locally on the device or from a storage medium located remote to the device. The storage medium may be a temporary storage medium, which could be a volatile random access memory. The storage medium may be a permanent storage medium, wherein the permanent storage medium could be one or more of a hard disk drive, a flash memory, and a non-volatile random access memory. The storage medium may be a removable storage medium such as a memory stick or a memory card (SD, mini SD or micro SD).
The processor may be configured to receive the respective image data from a source external to the device/apparatus, wherein the source might be one or more of a camera, a portable telecommunications device, a computer, a gaming device or a server. The external source may or may not comprise a display or image capture sources.
The processor may be configured to receive the respective image data from the external source using a wireless communication technology, wherein the external source is connected to the device/apparatus using said wireless communication technology, and wherein the wireless communication technology may comprise one or more of the following: radio frequency technology, infrared technology, microwave technology, Bluetooth™, a Wi-Fi network, a mobile telephone network and a satellite internet service.
The processor may be configured to receive the respective image data from the external source using a wired communication technology, wherein the external source is connected to the device/apparatus using said wired communication technology, and wherein the wired communication technology may comprise a data cable.
The viewing angle may be selected by rotating the display, adjusting the position of an observer with respect to the display, or adjusting a user interface element. The user interface element may be a slider control displayed on the display.
The orientation of the display relative to the position of the observer may be determined using any of the following: a compass, an accelerometer sensor, and a camera. The camera may detect the relative motion using captured images. The camera may detect the observer's face and corresponding position relative to an axis normal to the plane of the display.
The processor may be a microprocessor, including an Application Specific Integrated Circuit (ASIC).
According to a further aspect, there is provided a method for processing image data, the method comprising: receiving respective image data, representative of images, of the same subject scene from two or more image capture sources spaced apart at a particular predetermined distance; identifying corresponding features from the respective image data; determining the change in position of the identified features represented in the respective image data; and identifying the depth-order of the identified features according to their determined relative change in position.
There is also provided a computer program recorded on a carrier, the computer program comprising computer code configured to operate a device, wherein the computer program comprises: code for receiving respective image data, representative of images, of the same subject scene from two or more image capture sources spaced apart at a particular predetermined distance; code for identifying corresponding features from the respective image data; code for determining the change in position of the identified features; and code for identifying the depth-order of the identified features according to their determined relative change in position.
The code could be distributed between the (two or more) cameras and a server. The cameras could handle the capture, and possibly also the compression, of the images, while the server performs the feature and depth identification. The cameras may have a communication link between them to synchronise the capture and to allow joint image processing. When the camera modules reside in the same physical device, the above-mentioned joint coding of two or more images is easier to perform.
The present disclosure includes one or more corresponding aspects, embodiments or features in isolation or in various combinations whether or not specifically stated (including claimed) in that combination or in isolation. Corresponding means for performing one or more of the discussed functions are also within the present disclosure.
The above summary is intended to be merely exemplary and non-limiting.
A description is now given, by way of example only, with reference to the accompanying drawings, in which:
a shows how the viewing angle can be selected by rotating the display;
b shows how the viewing angle can be selected by adjusting the position of the observer with respect to the display;
Referring first to the accompanying drawings, two image capture sources 101, 102, spaced apart at a particular pre-determined distance, capture images of the same subject scene from different positions.
The field of view of each image capture source is represented approximately by the dashed lines 107. As each image capture source captures an image of the scene from a different viewpoint, the features of the scene appear different from the perspective of image capture source 101 than they do from the perspective of image capture source 102. As a result, the respective images 201, 202 (two-dimensional projections of the scene) captured by the image capture sources are different. The respective captured images are illustrated schematically in the accompanying drawings.
When faced with a single two-dimensional image, an observer relies on the appearance of the features present in the image, and the overlap of these features, in order to perceive depth. For example, with respect to image 202, a feature which partially overlaps another feature is perceived as being in front of it, i.e. closer to the observer.
With reference to the accompanying drawings, a method of determining the relative depth of the features captured in the respective images will now be described.
Starting with a pair of images 301, 302 captured according to the arrangement described above, the first step is to identify features 304, 305, 306 which appear in both images, i.e. corresponding features in the respective image data.
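By way of non-limiting illustration only, the following Python sketch shows one possible way of identifying corresponding features from the respective image data, here using the OpenCV ORB detector and a brute-force matcher; neither this library nor this particular matching technique is prescribed by the present disclosure, and all names are illustrative.

```python
# Illustrative sketch only: ORB + brute-force matching is one possible way
# to identify corresponding features in the two captured images; the
# disclosure does not mandate any particular matching technique.
import cv2

def find_corresponding_features(path_left, path_right, max_matches=50):
    img_l = cv2.imread(path_left, cv2.IMREAD_GRAYSCALE)
    img_r = cv2.imread(path_right, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()
    kp_l, des_l = orb.detectAndCompute(img_l, None)
    kp_r, des_r = orb.detectAndCompute(img_r, None)

    # Brute-force Hamming matcher with cross-checking for more reliable pairs.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_l, des_r), key=lambda m: m.distance)

    # Return the (x, y) coordinates of each matched feature in both images.
    return [(kp_l[m.queryIdx].pt, kp_r[m.trainIdx].pt)
            for m in matches[:max_matches]]
```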
Having identified the corresponding features 304, 305, 306 from the respective images 301, 302 (or from the respective image data if the problem is being solved by a computer), the next step is to determine the change in position between images of each identified feature. This can be considered with respect to a reference point (e.g. a co-ordinate origin) which can be a point on the identified feature itself. The reference point can be any point inside or outside the image, provided the same point is used with respect to each image. In the illustrated example, the reference point used is the centre 307 of each image.
Likewise, the change in position may be the change in position of any point in the identified feature, provided the same point is used with respect to each image. In the illustrated example, the centre 308 of each identified feature is used. Since the image capture sources 101, 102 are spaced apart horizontally (along the x-axis), the identified features undergo a purely horizontal shift between the two images, i.e. the vertical shift is zero.
Rather than just using two image capture sources to determine both the horizontal and vertical shift of each identified feature, the vertical shift could be determined independently using an additional image capture source (not shown) positioned at a different point on the y-axis from image capture sources 101 and 102 (e.g. immediately above or in-between image capture sources 101, 102). With this arrangement, the images captured by image capture sources 101 and 102 could be used to determine the horizontal shift of each feature, whilst the image captured by the third image capture source could be used in combination with at least one of the other images, 201 or 202, to determine the vertical shift of each feature. The vertical and horizontal shift calculations should result in similar depth order information and thus calculation of both shifts could be used as a verification of the depth order calculated for a feature.
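Purely as an illustrative sketch of the verification suggested above (the data structures are assumptions, not part of the disclosure): with a vertically offset third image capture source, the depth order derived from the horizontal shifts can be cross-checked against the depth order derived from the vertical shifts.

```python
# Hedged sketch of the suggested verification: with a third (vertically
# offset) image capture source, the depth orders derived independently from
# the horizontal and vertical shifts should agree.

def orders_agree(horizontal_shifts, vertical_shifts):
    """Both arguments: {feature_id: shift_magnitude}. Returns True when
    sorting the features by either shift gives the same depth order."""
    by_h = sorted(horizontal_shifts, key=horizontal_shifts.get)
    by_v = sorted(vertical_shifts, key=vertical_shifts.get)
    return by_h == by_v

print(orders_agree({'304': 40, '305': 25, '306': 10},
                   {'304': 12, '305': 8, '306': 3}))   # True
```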
Considering un-shaded feature 304, the horizontal and vertical distances from the centre 307 of image 301 to the centre 308 of feature 304 are denoted X1 and Y1, respectively. Likewise, the horizontal and vertical distances from the centre 307 of image 302 to the centre 308 of feature 304 are denoted X2 and Y2, respectively. The horizontal and vertical shifts are therefore (X1-X2) and (Y1-Y2), respectively. As mentioned above, the vertical shift in the present case is zero. The change in position may be determined in this way for every corresponding feature.
In the above example, the centre of the image 307 has been used. It will be appreciated that, in other embodiments, the horizontal/vertical shifts (change in position) of the identified feature can be obtained directly by determining a motion vector which defines the shift in position of the feature from one image 301 to the other image 302. In such a case, the reference point could be considered to be the starting position of the identified feature (e.g. by comparison of the starting point of the centre of the identified feature in image 301 with the ending point of the centre of the identified feature in image 302). It will be appreciated that the magnitude of the motion vector will represent the change in the position of the feature.
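The change-in-position calculation described above may be sketched as follows (a non-limiting illustration in which each feature is assumed to be represented by the position of its centre): the same reference point is subtracted from the feature's position in each image, so the reference cancels out and the result equals the direct motion vector.

```python
# Sketch of the change-in-position (parallax shift) calculation described
# above, assuming each feature is represented by the (x, y) position of its
# centre in image 301 and image 302. Names are illustrative only.

def feature_shift(centre_in_img1, centre_in_img2, reference=(0.0, 0.0)):
    """Return the (horizontal, vertical) shift of one feature between images.

    Using the same reference point for both images means the reference
    cancels out, which is why a direct motion vector gives the same result.
    """
    x1, y1 = centre_in_img1[0] - reference[0], centre_in_img1[1] - reference[1]
    x2, y2 = centre_in_img2[0] - reference[0], centre_in_img2[1] - reference[1]
    return (x1 - x2, y1 - y2)          # (X1 - X2, Y1 - Y2)

# Example: a feature whose centre moves from (420, 300) to (380, 300)
# between the two images has a horizontal shift of 40 pixels and no
# vertical shift (horizontally aligned image capture sources).
print(feature_shift((420, 300), (380, 300)))   # (40, 0)
```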
The shift in position is related to the depth of the features, i.e. their position on the z-axis: identified features which are closer to the image capture sources undergo a greater shift in position between the captured images than features which are further away. The determined shift therefore indicates the relative depth of each identified feature.
The relative depth information can then be used to calculate images of the scene from viewing perspectives not captured by the image capture sources 101, 102. This calculation is performed by interpolating or extrapolating image data from each of the captured images as will be discussed shortly. Once the relative depth information is obtained for each of the corresponding features, the features may be ordered for display according to depth. The features which are determined to have changed in position most are depth-ordered for display in front of features which are determined to have changed less in position.
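A minimal sketch of this depth-ordering step is given below (illustrative only; the feature records are assumptions): the features are sorted by the magnitude of their determined shift so that those with the largest shift are treated as nearest the front.

```python
# Illustrative depth-ordering of identified features by the magnitude of
# their determined shift: the larger the shift, the closer the feature and
# the nearer the front it is displayed. The data structures are assumptions.

def depth_order(features):
    """features: list of dicts like {'name': 'A', 'shift': (dx, dy)}.

    Returns the features sorted back-to-front (smallest shift first), so
    that drawing them in this order paints nearer features over farther
    ones (painter's algorithm).
    """
    def shift_magnitude(f):
        dx, dy = f['shift']
        return (dx * dx + dy * dy) ** 0.5
    return sorted(features, key=shift_magnitude)

features = [{'name': '304', 'shift': (40, 0)},
            {'name': '305', 'shift': (25, 0)},
            {'name': '306', 'shift': (10, 0)}]
print([f['name'] for f in depth_order(features)])  # ['306', '305', '304']
```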
To simplify the ordering (which is useful when the images contain a large number of corresponding features), the features may be assigned to different layers (planes), each layer lying parallel to the xy-plane. In the above case, features 304, 305 and 306 may be assigned to different layers, each layer having a different depth (z-component). Those identified features which are determined to have undergone a substantially similar change in position may be assigned to the same layer. The term “substantially similar” may refer to determined changes in position which are substantially the same, or which fall within some specified range. Therefore, any features whose relative depth falls within the specified range of depths may be assigned to the same layer. The choice of the particular specified range of depths will depend on the subject scene (e.g. images of features captured close up will have different specified ranges to those of features captured at a distance).
Also, different points on the same feature which are determined to have undergone different changes in position, and which are therefore positioned at different relative depths (i.e. the feature itself has a depth), may be assigned to different layers. Essentially this means that some features may be assigned to multiple layers.
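The layer assignment may be sketched as a simple binning of the determined shifts (illustrative only; the bin width is an assumed tuning parameter that would depend on the subject scene, as noted above).

```python
# Sketch of assigning identified features (or individual points on a
# feature) to depth layers: shifts falling within the same specified range
# ("substantially similar" changes in position) share a layer. The bin
# width is an assumed tuning parameter that would depend on the scene.

def assign_layers(points, bin_width=5.0):
    """points: iterable of (point_id, horizontal_shift) pairs.

    Returns {point_id: layer_index}, where a larger layer index means a
    larger shift and therefore a layer nearer the front.
    """
    return {pid: int(shift // bin_width) for pid, shift in points}

# Two points on the same feature with noticeably different shifts end up
# in different layers, so the feature spans multiple layers.
print(assign_layers([('304a', 42.0), ('304b', 37.0), ('305', 24.0)]))
# {'304a': 8, '304b': 7, '305': 4}
```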
Once the features have been assigned to different layers, the calculation of images (or image data) from different viewing perspectives can be performed. The purpose of calculating such images is to enable the representation of an image on a conventional display 601 in such a way as to create the illusion of depth, without necessarily requiring polarized glasses or specialised display technology. The technique involves presenting an image of the captured scene on the display 601 which corresponds to the observer's selected viewing perspective, as though the image had been captured from this selected position. If the observer then changes position, the image displayed on the screen also changes so that it corresponds to the new viewing perspective. It should be noted, however, that this technique does not produce a “three-dimensional” image as such. As mentioned in the background section, three-dimensional images require two different images to be presented at the same time (one for each eye) in order to create the three-dimensional effect. In the present case, a single two-dimensional image is produced on the display which changes to suit the position of the observer with respect to the display. In this way, the observer can appreciate the depth of the image by adjusting his position with respect to the display.
In one embodiment, the depth information obtained from the determined change in position of the identified features is encoded with the image data for each of the captured images. Each of the captured images is encoded separately, although the depth information is common to each image. Any redundancy between the images may be used to improve the overall compression efficiency using joint image coding. Coding and decoding of the images may be performed using known techniques.
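Purely as a non-limiting sketch (and not the disclosed codec), the common depth information could, for example, be stored as a small metadata record alongside the two independently compressed images; the JSON sidecar format and field names below are assumptions used only for illustration.

```python
# Minimal sketch (an assumption, not the disclosed coding scheme) of storing
# the common depth information alongside the two independently compressed
# images, e.g. as a JSON sidecar written next to the encoded image files.
import json

def write_depth_sidecar(path, layer_assignments, layer_depths):
    """layer_assignments: {feature_id: layer_index}
    layer_depths: {layer_index: relative_depth}."""
    metadata = {
        'layers': {str(k): v for k, v in layer_depths.items()},
        'features': {str(k): v for k, v in layer_assignments.items()},
    }
    with open(path, 'w') as f:
        json.dump(metadata, f, indent=2)

write_depth_sidecar('scene_depth.json',
                    {'304': 2, '305': 1, '306': 0},
                    {0: 0.1, 1: 0.5, 2: 0.9})
```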
In order to calculate an image (or image data) for an arbitrary viewing perspective (which lies between the perspectives at which each image was captured), the size, shape and position of the identified features are interpolated from the features in the captured images (image data). Two scenarios can be considered: one where the display 601 is rotated with respect to an observer 602 (e.g. in the case of a small hand-portable electronic device), and one where the observer 602 changes position with respect to the display 601.
As mentioned above, the perspective of the observer 602 may be selected by adjusting his position in the xy-plane with respect to the axis 603 normal to the centre of the plane of the display 601. The change in the observer position may be determined using appropriate sensing technology. In the present example, the image on the display would only vary if the observer changed position on the x-axis, the selected viewing perspective being characterised by the angle θ between the axis 603 and the line joining the observer 602 to the centre of the display 601.
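By way of illustration only, one possible realisation of the "appropriate sensing technology" for this scenario is a front-facing camera combined with face detection; the sketch below uses an OpenCV Haar cascade and an assumed horizontal field of view, neither of which is prescribed by the present disclosure.

```python
# Illustrative only: one way the "appropriate sensing technology" could be
# realised is a front-facing camera plus face detection. The Haar cascade
# and the assumed horizontal field of view are examples, not part of the
# disclosure.
import math
import cv2

FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
CAMERA_HFOV_DEG = 60.0   # assumed horizontal field of view of the camera

def observer_angle(frame):
    """Estimate the viewing angle theta (degrees) of the observer from the
    horizontal offset of the detected face relative to the frame centre.
    Returns None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
    face_centre_x = x + w / 2.0
    frame_centre_x = frame.shape[1] / 2.0
    # Fraction of the frame width by which the face is off-centre (-0.5..0.5),
    # mapped to an angle using the assumed field of view.
    offset_frac = (face_centre_x - frame_centre_x) / frame.shape[1]
    half_fov = math.radians(CAMERA_HFOV_DEG / 2.0)
    return math.degrees(math.atan(2.0 * offset_frac * math.tan(half_fov)))

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    print(observer_angle(frame))
cap.release()
```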
As mentioned above, the perspective of the observer may also be selected by adjusting the orientation of the display 601 with respect to the observer 602, keeping the position of the observer constant; rotating the display changes the angle θ between the axis 603 normal to the display and the direction of the observer in the same way as moving the observer does.
To illustrate the calculation method, consider a scene captured by two image capture sources to produce a first captured image (image 1) and a second captured image (image 2). When the selected viewing perspective lies exactly mid-way between the perspectives of the two image capture sources (i.e. the observer is positioned on the axis 603 normal to the centre of the display, so that θ is zero), the size, shape and position of each identified feature in the calculated image may be taken as the average of the corresponding values in image 1 and image 2.
The average values of each characteristic can be weighted with respect to each of the captured images. For example, when the scene is viewed from the mid-way position as described above, the values of each characteristic fall exactly between the values of those characteristics in the captured image data. As a result, the calculated image (shown on the display) resembles image 1 just as much as it resembles image 2. On the other hand, if the observer's position is moved further to the left or right (on the x-axis) such that angle θ increases, the calculated image corresponding to this new position more closely resembles image 1 or image 2, respectively. As angle θ increases, the calculated image (data) converges towards the nearest captured image (or image data) until eventually, at a pre-determined maximum value, the image displayed is identical to the captured image. In practice, the exact maximum value of θ is not critical: for example, the display could be configured to display one of the captured images when θ reaches 30°, or not until θ reaches 45°. There may, however, be angular restrictions set by the practicality of observer position and by the degree to which a display can be rotated.
As well as calculating images corresponding to viewing perspectives between the perspectives of the image capture sources by interpolation (as mentioned above), the image data may also be extrapolated to calculate images corresponding to viewing perspectives beyond those of the image capture sources, based on the interpolated data (or the captured image data). In this way, an increase in θ past the pre-determined “maximum value” (i.e. the value at which the image displayed is identical to one of the captured images) would produce an image corresponding to a broader perspective than that of image capture source 101 or 102, rather than simply converging to the captured image 201 or 202, respectively.
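The interpolation and extrapolation described above may be sketched as a single weighted blend of the per-image feature characteristics (a non-limiting illustration; the assumption that the images were captured at viewing angles of -THETA_MAX and +THETA_MAX, the linear blend, and the parameter names are all illustrative choices, not part of the disclosure).

```python
# Sketch of the interpolation/extrapolation step, assuming the two images
# were captured at viewing angles -THETA_MAX and +THETA_MAX and that each
# identified feature is described by simple per-image parameters (here just
# centre position and size). The linear blend and THETA_MAX are assumptions.
THETA_MAX = 30.0  # assumed angle at which the displayed image equals a captured image

def blend(value_img1, value_img2, theta):
    """Linearly interpolate between the two captured values for |theta| <=
    THETA_MAX; values of |theta| beyond THETA_MAX extrapolate past the
    captured perspectives instead of clamping to them."""
    w2 = (theta + THETA_MAX) / (2.0 * THETA_MAX)   # weight given to image 2
    return (1.0 - w2) * value_img1 + w2 * value_img2

def calculate_feature(feature_img1, feature_img2, theta):
    """feature_img*: dicts like {'cx': ..., 'cy': ..., 'width': ..., 'height': ...}."""
    return {key: blend(feature_img1[key], feature_img2[key], theta)
            for key in feature_img1}

# theta = 0 gives the mid-way (average) view; theta = +30 reproduces image 2;
# theta = +45 extrapolates beyond the perspective of image capture source 102.
f1 = {'cx': 420.0, 'cy': 300.0, 'width': 80.0, 'height': 60.0}
f2 = {'cx': 380.0, 'cy': 300.0, 'width': 84.0, 'height': 60.0}
print(calculate_feature(f1, f2, 0.0))
print(calculate_feature(f1, f2, 45.0))
```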
Furthermore, whilst the above description refers to the capture of images (sets of image data) from two image capture sources, there may be more than two image capture sources. For example, as described above, a third image capture source positioned at a different point on the y-axis may be used, in combination with at least one of the other image capture sources, to determine the vertical shift of each identified feature.
As well as the size, shape and position of the features, other appearance characteristics may also be taken into account when calculating the image. For example the shading, texture, shadow, reflection, refraction or transparency of the features may be adjusted according to changes in orientation. The incorporation of these additional characteristics into the calculation may help to enhance the illusion of depth in the image.
The importance of the relative depth information should not be overlooked. To illustrate its importance, consider a further pair of captured images 701, 702 of the same subject scene. Without the relative depth information, there is no way of knowing which identified features should be displayed in front of (and therefore partially occlude) which others in the calculated image, and features could be drawn overlapping incorrectly as the viewing perspective changes.
The apparatus required to perform the image processing described above will now be described with reference to the accompanying drawings. The device/apparatus comprises a processor 1001, an orientation determinator 1002, a display 1003, a storage medium 1004 and two or more image capture sources 1005.
The orientation determinator 1002 is used to determine the orientation of the display 1003 with respect to the position of the observer, and may comprise one or more of a compass, an accelerometer, and a camera. The orientation determinator may provide the orientation information to the processor 1001 so that the processor can calculate an image corresponding to this orientation.
The display 1003, which comprises a screen, is configured to display on the screen an image corresponding to the selected viewing angle, θ, based on the calculated image data. The display may comprise an orientation determinator 1002. For example, a camera located on the front of the display may determine the position of the observer with respect to the plane of the screen. The viewing angle may be selected by rotating the display, adjusting the position of the observer with respect to the display, or by adjusting a user interface element (e.g. display, physical or virtual slider/key/scroller). The user interface element may be a user operable (virtual) slider control (not shown) displayed on the display. The display may not contain a lenticular lens or a parallax barrier, and may only be capable of displaying a single two-dimensional image at any given moment.
The storage medium 1004 is used to store the image data from the image capture sources 1005, and could also be used to store the calculated image data. The storage medium may be a temporary storage medium, which could be a volatile random access memory. On the other hand, the storage medium may be a permanent storage medium, wherein the permanent storage medium could be one or more of a hard disk drive, a flash memory, and a non-volatile random access memory.
The image capture sources 1005 are spaced apart at a particular pre-determined distance, and are used to capture an image (or generate image data representing the image) of the same subject scene from their respective positions. Each image capture source may be one or more of a digital camera, an analogue camera and an image sensor for a digital camera. The images (image data) captured by the image capture sources may be captured substantially at the same time.
The device illustrated in the accompanying drawings may be configured to run a computer program for performing the image processing described above.
The computer program may comprise code for receiving respective image data, representative of images, of the same subject scene from two or more image capture sources spaced apart at a particular predetermined distance, code for identifying corresponding features from the respective image data, code for determining the change in position of the identified features represented in the respective image data, and code for identifying the depth-order of the identified features according to their determined relative change in position to allow for depth-order display of the identified features. A corresponding method, comprising the same steps, is illustrated in the accompanying drawings.
The computer program may also comprise code for assigning the features to layers based on their relative change in position, and code for calculating image data for a selected viewing angle using the relative depth information and received image data, wherein the image is calculated by interpolating or extrapolating the size, shape and position of the identified features.
Other embodiments depicted in the figures have been provided with reference numerals that correspond to similar features of earlier described embodiments. For example, feature number 1 may also correspond to numbers 101, 201, 301 etc. These numbered features may appear in the figures but may not have been directly referred to within the description of these particular embodiments. These have still been provided in the figures to aid understanding of the further embodiments, particularly in relation to the features of similar earlier described embodiments.
At least three implementation scenarios have been described. In the first scenario, a single device (camera/phone/image sensor) captures the image data, calculates the interpolated/extrapolated image data, and displays the calculated image to the user. In the second scenario, a first device (camera/phone/image sensor) is used to capture the image data, but a second device (camera/phone/computer/gaming device) is used to calculate and display the interpolated/extrapolated image. In the third scenario, a first device (camera/phone/image sensor) is used to capture the image data, a second device (server) is used to calculate the interpolated/extrapolated image, and a third device is used to display the interpolated/extrapolated image (camera/phone/computer/gaming device).
It will be appreciated by the skilled reader that any mentioned apparatus/device/server and/or other features of particular mentioned apparatus/device/server may be provided by apparatus arranged such that they become configured to carry out the desired operations only when enabled, e.g. switched on, or the like. In such cases, they may not necessarily have the appropriate software loaded into the active memory in the non-enabled (e.g. switched-off) state and may only load the appropriate software in the enabled (e.g. switched-on) state. The apparatus may comprise hardware circuitry and/or firmware. The apparatus may comprise software loaded onto memory. Such software/computer programs may be recorded on the same memory/processor/functional units and/or on one or more memories/processors/functional units.
In some embodiments, a particular mentioned apparatus/device/server may be pre-programmed with the appropriate software to carry out desired operations, where the appropriate software can be enabled for use by a user downloading a “key”, for example, to unlock/enable the software and its associated functionality. Advantages associated with such embodiments can include a reduced requirement to download data when further functionality is required for a device, and this can be useful in examples where a device is perceived to have sufficient capacity to store such pre-programmed software for functionality that may not be enabled by a user.
It will be appreciated that any mentioned apparatus/circuitry/elements/processor may have other functions in addition to the mentioned functions, and that these functions may be performed by the same apparatus/circuitry/elements/processor. One or more disclosed aspects may encompass the electronic distribution of associated computer programs and computer programs (which may be source/transport encoded) recorded on an appropriate carrier (e.g. memory, signal).
It will be appreciated that any “computer” described herein can comprise a collection of one or more individual processors/processing elements that may or may not be located on the same circuit board, or the same region/position of a circuit board or even the same device. In some embodiments one or more of any mentioned processors may be distributed over a plurality of devices. The same or different processor/processing elements may perform one or more functions described herein.
With reference to any discussion of any mentioned computer and/or processor and memory (e.g. including ROM, CD-ROM etc), these may comprise a computer processor, Application Specific Integrated Circuit (ASIC), field-programmable gate array (FPGA), and/or other hardware components that have been programmed in such a way as to carry out the inventive function.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole, in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that the disclosed aspects/embodiments may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the disclosure.
While there have been shown and described and pointed out fundamental novel features as applied to different embodiments thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices and methods described may be made by those skilled in the art without departing from the spirit of the invention. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or embodiment may be incorporated in any other disclosed or described or suggested form or embodiment as a general matter of design choice. Furthermore, in the claims means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures. Thus although a nail and a screw may not be structural equivalents in that a nail employs a cylindrical surface to secure wooden parts together, whereas a screw employs a helical surface, in the environment of fastening wooden parts, a nail and a screw may be equivalent structures.
Filing Document: PCT/EP2009/008689; Filing Date: 12/4/2009; Country: WO; Kind: 00; 371(c) Date: 5/31/2012