The present invention relates to a system that supports a user's recognition of an object. Particularly, the present invention relates to a system that supports recognition of an object by using a device which acts on an acoustic sense or a touch sense of a user.
Systems that permit a user to experience a virtual three-dimensional world by using a computer are becoming increasingly widespread. As a result, virtual world systems are increasingly expected to be put to business use, such as providing virtually created services which have been difficult to realize in the real world.
Techniques of generating an image of an object in a space viewed from a predetermined viewpoint are taught in Japanese Patent Application Laid-Open No. 11-259687 and Japanese Patent Application Laid-Open No. 11-306383. International Application No. 2005-506613, published as US 2003067440, details one example of a device which acts on a touch sense.
In such a system, an object composing a virtual world is represented by a two-dimensional image obtained by projecting its three-dimensional shape. Viewing the two-dimensional image, a user feels as if the user is seeing three-dimensional shapes and thereby recognizes three-dimensional objects. Experiencing a virtual world therefore presupposes that a user can perceive a two-dimensional image with the visual sense and infer three-dimensional shapes from it. This makes it difficult for a user who cannot rely on the visual sense, such as a visually handicapped person, to use such a system.
Accordingly, it is an object of the present invention to provide a system, method and program which can overcome the foregoing problem. The object is achieved by combinations of the features described in independent claims in the appended claims. Dependent claims define further advantageous specific examples of the present invention.
To overcome the problem, according to a first aspect of the present invention, there is provided a system that supports recognition of an object drawn in an image, comprising a memory device that stores, in association with each of a plurality of areas obtained by dividing an input image, a feature amount of an object drawn in the area; a selection section that selects a range of the input image to be recognized by a user based on an instruction therefrom; a calculation section that reads the feature amount corresponding to each area contained in the selected range from the memory device, and calculates an index value based on each read feature amount; and a control section that controls a device which acts on an acoustic sense or a touch sense based on the calculated index value. There are also provided a method and a program which support recognition of an image using the system.
The summary of the present invention does not recite all the necessary features of the invention, and subcombinations of those features may also constitute the invention.
The present invention will be described below by way of examples. However, an embodiment and modifications thereof described below do not limit the scope of the invention recited in the appended claims.
The client computer 100 has a memory device 104, such as a hard disk drive, a communication interface 106, such as a network interface card, and an input/output interface 108, such as a speaker, as main hardware. The client computer 100 executes a program stored in the memory device 104 to serve as a virtual world browser 12, a support system 15 and a rendering engine 18.
The virtual world browser 12 acquires data indicating a three-dimensional shape from the server computer 200 connected to, for example, the Internet 400. The data acquisition is achieved by cooperation of an operating system and device drivers that operate hardware such as the communication interface 106. The rendering engine 18 generates a two-dimensional image by rendering the three-dimensional shapes indicated by the acquired data, and provides the virtual world browser 12 with the two-dimensional image. The virtual world browser 12 presents the provided image to the user. When the image indicates a virtual world, the rendered image represents the field of view of an avatar (the avatar being the user's “representative” in the virtual world).
For example, the rendering engine 18 determines viewpoint coordinates and a view direction based on data input as the position and direction of an avatar, and renders a three-dimensional shape acquired from the server computer 200 to a two-dimensional plane. The viewpoint coordinates and the view direction may be input from a probe device mounted on the user as well as from a device, such as a keyboard or a pointing device. A GPS device installed on the probe device outputs real positional information of the user to the rendering engine 18. The rendering engine 18 calculates viewpoint coordinates based on the positional information and then performs rendering. This enables the user to feel as if the user were moving in the virtual world.
The support system 15 supports recognition of an object drawn in an image generated in the above manner. For example, the support system 15 controls the input/output interface 108 which acts on a sense other than a visual sense, based on an image in the object-drawn image which lies in a range selected by the user. As a result, the user can sense the position, size, color, depth and various attributes of an object drawn in an image, or any combination thereof with a sense other than the visual sense.
Accordingly, the user senses the depth by recognizing those two-dimensional images with the visual sense, and feels as if the user were viewing a three-dimensional shape. This allows the user to virtually experience, for example, a virtual world or the like.
To clarify the description,
The Z buffer image 300B is data storing a distance component of each pixel contained in the input image 300A in correspondence to that pixel. A distance component for one pixel indicates a distance from the viewpoint in rendering to a portion corresponding to the pixel in an object drawn in the input image 300A. Although the input image 300A and the Z buffer image 300B are stored in separate files in
A rectangular second portion having coordinates (100, 150), coordinates (104, 150), coordinates (100, 154) and coordinates (104, 154) as vertexes is used in the descriptions of
As another example, for each pixel in the second portion, the input image 300A contains values from 160 to 200 or so. Those values indicate the intensity of one color element in a case where the color element is evaluated in 256 levels from 0 to 255. In the example of
As a further example, for each pixel in the third portion, the input image 300A contains values from 65 to 105 or so. Those values indicate slightly different colors. The colors differ from those of the second portion. Referring to
For example, for any pixel in the first portion, the Z buffer image 300B contains a value “−1” as a distance component. The value “−1” indicates, for example, an infinite distance, and is treated as a value greater than any other distance value. Referring to
As another example, for any pixel in the second portion, the Z buffer image 300B contains values of “150” or so. Those values indicate slightly different distances. Referring to
As a further example, for any pixel in the third portion, the Z buffer image 300B contains values from “30” to “40” or so. Those values indicate slightly different distances. Referring to
In the foregoing descriptions of
As another example, the feature amount is not limited to a distance component and a pixel value. For example, the feature amount may indicate the attribute value of an object. The scenario of a virtual world, for example, may include a case where each object is associated with an attribute indicating the owner or manager of that object. The memory device 104 may store such attributes of objects drawn in a plurality of areas obtained by segmenting the input image 300A, in association with the areas. It is to be assumed in the following description that the memory device 104 stores the input image 300A and the Z buffer image 300B.
Specifically, the selection section 710 accepts an input in the virtual view direction from the user using the view direction input device 705A. The virtual view direction is coordinates of, for example, a point in the display area of the input image 300A. Then, the selection section 710 accepts an input of the virtual view extent of the user using the view extent input device 705B. The virtual view extent is the size of a range to be recognized with the accepted coordinates taken as a reference. Then, the selection section 710 selects the accepted size of the range with the accepted coordinates taken as a reference.
As one example, the selection section 710 accepts an input of center coordinates of a circular range using the view direction input device 705A. The selection section 710 accepts an input of the radius or diameter of the circular range using the view extent input device 705B. Then, the selection section 710 selects the range with the accepted radius or diameter about the accepted center coordinates as the range to be recognized by the user.
As another example, the selection section 710 accepts an input of the coordinates of one vertex of a rectangular range using the view direction input device 705A. The selection section 710 accepts an input of the length of one side of the rectangular range using the view extent input device 705B. Then, the selection section 710 selects the square range which has the accepted coordinates as one vertex and the accepted length as the length of each side as the range to be recognized by the user.
The view direction input device 705A is realized by a pointing device, such as a touch panel, a mouse or a track ball. Note that the view direction input device 705A is not limited to those devices as long as it is a two-degree-of-freedom device which can accept an input of coordinate values on a plane. The view extent input device 705B is realized by a device, such as a slider or a wheel. Note that the view extent input device 705B is not limited to those devices as long as it is a one-degree-of-freedom device which can accept an input of a value indicating the size of the range. The one-degree-of-freedom device can allow the user to change the size of the range as if to change the focus range of a camera.
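As one way to picture the two selection modes above, the circular and square ranges can be sketched as boolean masks over the image grid. This is a hedged illustration only; the function names and the row/column grid layout are assumptions, not part of the embodiment.

```python
import numpy as np

def circular_mask(h, w, cx, cy, radius):
    """Boolean mask for a circular range centered at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2

def square_mask(h, w, x0, y0, side):
    """Boolean mask for a square range with vertex (x0, y0) and the given side length."""
    ys, xs = np.mgrid[0:h, 0:w]
    return (xs >= x0) & (xs < x0 + side) & (ys >= y0) & (ys < y0 + side)
```

A selection section along these lines would read the feature amounts of exactly those pixels where the mask is True.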
In general, if the size of the range is made adjustable with a solid angle (one degree of freedom), the relationship between a directional vector r and an area vector S is expressed by the following equation 1.
[Eq. 1]

S = Ω·|r|²·(r/|r|)  equation 1

where Ω denotes the solid angle.
The calculation section 720 reads from the memory device 104 the feature amount corresponding to each area (e.g., pixel) contained in the selected range. Then, the calculation section 720 calculates an index value based on each feature amount read. For example, the calculation section 720 may read a distance component corresponding to each pixel from the Z buffer image 300B in the memory device 104, and may calculate an index value based on the sum or the average value of the read distance components.
The control section 730 controls the voice output device 740 which acts on the acoustic sense of the user based on the calculated index value. For example, the control section 730 makes the loudness of a sound from the voice output device 740 greater when the average value of the distances indicated by the index value is smaller as compared with a case where the average value of the distances indicated by the index value is larger.
When the size of the range input by the view extent input device 705B is fixed, the control section 730 has only to control the voice output device 740 based on the sum of the distances. For example, the control section 730 makes the loudness of a sound from the voice output device 740 greater when the sum of the distances indicated by the index value is smaller as compared with a case where the sum of the distances is larger.
While the voice output device 740 is realized by a device, such as a speaker or a headphone, in the embodiment, the device which acts on the user is not limited to those devices. For example, the input/output interface 108 may have a device, such as a vibrator, which causes vibration instead of the voice output device 740. The device to be controlled by the control section 730 is not limited to the voice output device 740, as long as it acts on the user's acoustic sense or touch sense. In this case, the control section 730 controls the reaction of such a device. Specifically, the controllable types of device reaction include the loudness of a sound, the pitch (frequency) of a sound, the sound pressure of a sound, the amplitude of vibration, and the frequency of vibration (the number of vibrations).
Next, the client computer 100 stands by until the view direction input device 705A or the view extent input device 705B accepts an input (S810: NO). When the view direction input device 705A or the view extent input device 705B accepts an input (S810: YES), the selection section 710 selects a range in the input image 300A to be recognized by the user based on the accepted input (S820). Alternatively, the selection section 710 changes the range already selected, based on the input.
Next, every time the selected range is changed, the calculation section 720 reads the feature amount corresponding to each pixel contained in the selected range from the memory device 104, and calculates an index value based on each read feature amount (S830). This processing has several variations, which are discussed below.
The calculation section 720 reads a distance component corresponding to each pixel contained in the selected range from the Z buffer image 300B in the memory device 104, and calculates an index value based on each read distance component. Let Zi,j be a distance represented by a distance component for a pixel having coordinates (i, j). Also let S be the selected range. In this case, an index value t to be calculated is expressed by, for example, the following equation 2.
[Eq. 2]

t = ( Σ(i,j)∈S 1/Zi,j² ) / |S|  equation 2
The index value t in this case becomes a value which is inversely proportional to the square of the distance to the object portion corresponding to each pixel contained in the range S, and is inversely proportional to the area of the range S. That is, when an object positioned close to the viewpoint occupies the range S, t becomes a larger value. When the reciprocal of the square of the distance is generalized as f(Zi,j), the index value t is expressed as follows.
[Eq. 3]

t = ( Σ(i,j)∈S f(Zi,j) ) / |S|  equation 3
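The distance-based index of equation 2 can be sketched as follows, assuming the Z buffer image is held as a numeric array and that the background distance component of −1 (infinite distance) contributes nothing to the sum; the background handling is an assumption of this sketch.

```python
import numpy as np

def distance_index(zbuf, mask):
    """Index value t: mean of 1/Z^2 over the selected range S (equation 2).

    A distance component of -1 marks the background (infinite distance)
    and is treated as contributing 0 -- an assumption of this sketch.
    """
    z = zbuf[mask].astype(float)
    contrib = np.where(z > 0, 1.0 / np.maximum(z, 1e-9) ** 2, 0.0)
    return contrib.sum() / mask.sum()
```

With this sketch, a range dominated by near objects (small Z) yields a large t, matching the behavior described for the control section.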
The calculation section 720 reads a pixel value corresponding to each pixel contained in the selected range from the input image 300A in the memory device 104, and calculates an index value indicating an edge component contained in an image in the selected range based on each read pixel value. Specifically, first, the calculation section 720 calculates a luminance component based on an RGB element of the pixel value.
Given that Ri,j is a red component at coordinates (i, j), Gi,j is a green component at the coordinates (i, j) and Bi,j is a blue component at the coordinates (i, j), a luminance component Li,j of the pixel at the coordinates (i, j) is expressed by the following equation 4.
[Eq. 4]

Li,j = 0.29891×Ri,j + 0.58661×Gi,j + 0.11448×Bi,j  equation 4
Next, the calculation section 720 calculates edge components in the vertical direction and horizontal direction by applying, for example, a Sobel operator to a luminance image in which the luminance components are arranged in the layout order of the pixels. Given that EVi,j is a vertical edge component and EHi,j is a horizontal edge component, the calculation is expressed by the following equation 5.
[Eq. 5]

EVi,j = −Li−1,j−1 − 2Li,j−1 − Li+1,j−1 + Li−1,j+1 + 2Li,j+1 + Li+1,j+1,
EHi,j = −Li−1,j−1 − 2Li−1,j − Li−1,j+1 + Li+1,j−1 + 2Li+1,j + Li+1,j+1  equation 5
Then, the calculation section 720 calculates the sum of the edge components from the following equation 6.
[Eq. 6]

Ei,j = √((EVi,j)² + (EHi,j)²)  equation 6
Of the edge components Ei,j calculated this way, the sum or average over the selected range S may be used as the index value t. The edge-component calculation can also be realized by various other image processing schemes, such as a Laplacian filter or a Prewitt filter. Therefore, the scheme of calculating an edge component in the embodiment is not limited to the one given by the equations 4 to 6.
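As an illustration of the pipeline of equations 4 to 6 (luminance conversion, Sobel filtering, then edge magnitude), the following sketch computes Ei,j over a whole image. Leaving border pixels at 0 and the HxWx3 array layout are assumptions of this sketch, not specified in the text.

```python
import numpy as np

# BT.601-style luminance weights from equation 4
R_W, G_W, B_W = 0.29891, 0.58661, 0.11448

def edge_magnitude(rgb):
    """Edge component E_ij (equation 6) from an HxWx3 RGB array."""
    lum = R_W * rgb[..., 0] + G_W * rgb[..., 1] + B_W * rgb[..., 2]
    ev = np.zeros_like(lum)  # vertical edge component E^V (equation 5)
    eh = np.zeros_like(lum)  # horizontal edge component E^H (equation 5)
    # Interior pixels only; border pixels stay 0 (sketch assumption).
    ev[1:-1, 1:-1] = (-lum[:-2, :-2] - 2 * lum[1:-1, :-2] - lum[2:, :-2]
                      + lum[:-2, 2:] + 2 * lum[1:-1, 2:] + lum[2:, 2:])
    eh[1:-1, 1:-1] = (-lum[:-2, :-2] - 2 * lum[:-2, 1:-1] - lum[:-2, 2:]
                      + lum[2:, :-2] + 2 * lum[2:, 1:-1] + lum[2:, 2:])
    return np.sqrt(ev ** 2 + eh ** 2)
```

The sum or average of the returned magnitudes over the pixels of the selected range S would then serve as the index value t.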
In place of the foregoing example, the index value t may be calculated based on the combination of an edge component and a distance component as described below.
For example, as given by equation 7 below, the calculation section 720 may divide the edge component of each pixel contained in the range S by the square of the distance for that pixel, and sum up the calculated values for the individual pixels contained in the range S as the index value t. The distance Z′i,j in the equation indicates the largest one of the distances of the 3×3 pixels about the coordinates (i, j) taken as the center.
[Eq. 7]

t = Σ(i,j)∈S Ei,j/(Z′i,j)²  equation 7
Accordingly, it is possible to calculate an index value t which becomes larger as an edge component contained in the range S gets larger, and which becomes larger as the distance to the object portions contained in the range S gets smaller.
There are further variations of the combination of a distance component and an edge component. For example, for a Z buffer image in which values indicating distances corresponding to respective pixels contained in the range S are arranged in the layout order of the pixels, the calculation section 720 may calculate the edge component of the Z buffer image as an index value. This means that a greater index value is calculated for a range which contains a larger number of portions having large distance changes.
Further, the calculation section 720 may calculate an index value indicating both the edge component of the Z buffer image 300B in the range S and the edge component of an image in the range S. The index value t thus calculated is expressed by, for example, an equation 8 below.
[Eq. 8]

t = Σ(i,j)∈S ( α·Fi,j + (1−α)·Ei,j )  equation 8
In the equation, Fi,j indicates an edge component at the coordinates (i, j) of the Z buffer image 300B. α indicates a blend ratio of those two edge components, which takes a real number from 0 to 1. The combination of a discontinuous component acquired from the Z buffer with the edge component of the input image 300A can make the index value t larger for a range containing the boundary between an object and the background (e.g., the contour or ridge of an object).
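One way to realize such a blended index value can be sketched as follows, assuming the per-pixel blend α·Fi,j + (1−α)·Ei,j is simply summed over the range S; that summation is an assumption of this sketch of equation 8.

```python
import numpy as np

def blended_index(edge_img, edge_z, mask, alpha=0.5):
    """Index value t combining the edge component of the Z buffer image
    (F_ij, edge_z) and of the input image (E_ij, edge_img) with blend
    ratio alpha in [0, 1], summed over the selected range S (mask)."""
    blend = alpha * edge_z + (1.0 - alpha) * edge_img
    return float(blend[mask].sum())
```

With alpha near 1, discontinuities in the Z buffer (object boundaries) dominate; with alpha near 0, color edges of the input image dominate.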
The calculation section 720 may calculate a plurality of index values, not just one of various index values mentioned above. As will be described later, the control section 730 uses the calculated index values to control the reaction by the sound output device 740.
Next, the control section 730 will be described. The control section 730 controls the sound output device 740 based on the calculated index value (S840). In the case (1), for example, the control section 730 makes the reaction by the sound output device 740 greater when the average value of the distances indicated by the index value is smaller as compared with a case where the average value of the distances indicated by the index value is larger.
In the case (2), the control section 730 makes the reaction by the sound output device 740 greater when the edge component indicated by the index value is larger as compared with a case where the edge component indicated by the index value is smaller. In the case (3), the combination of the processes in those two cases is taken.
In the case (4), the device reaction is influenced by the combination of the edge component of the input image 300A and the edge component of the Z buffer image 300B. If the edge component for the range S of the input image 300A is constant, the control section 730 makes the reaction by the sound output device 740 greater when the edge component indicated by the index value is larger for the range S of the Z buffer image 300B as compared with a case where the edge component indicated by the index value is smaller for the range S of the Z buffer image 300B.
If the edge component for the range S of the Z buffer image 300B is constant, on the other hand, the control section 730 makes the reaction by the sound output device 740 greater when the edge component indicated by the index value is larger for the range S of the input image 300A as compared with a case where the edge component indicated by the index value is smaller for the range S of the input image 300A.
More specifically, the control section 730 may calculate a frequency f, a sound pressure p or an intensity (amplitude) a of vibration from the index value t using the following equation 9, where cf, cp and ca are predetermined adjustment constants. The control section 730 may vibrate, or generate a sound from, the sound output device 740 based on the frequency f, the sound pressure p, the amplitude a, or a combination thereof.
[Eq. 9]

f = 10^(cf·t) [Hz]
p = cp·t [dB]
a = ca·t  equation 9
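A sketch of the equation 9 mapping follows; the default values of the adjustment constants cf, cp and ca are placeholders, since the text leaves them to be tuned per device.

```python
def device_parameters(t, cf=1.0, cp=1.0, ca=1.0):
    """Map index value t to device reaction parameters (equation 9):
    frequency f = 10^(cf*t) in Hz, sound pressure p = cp*t in dB,
    and vibration amplitude a = ca*t. The constant defaults are
    placeholder assumptions for this sketch."""
    f = 10.0 ** (cf * t)  # exponential mapping spreads small t differences audibly
    p = cp * t
    a = ca * t
    return f, p, a
```

A larger index value thus raises the frequency exponentially while scaling sound pressure and vibration amplitude linearly.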
Alternatively, when the calculation section 720 calculates a plurality of different index values, the control section 730 may adjust a plurality of different parameters for controlling the reaction of the sound output device 740. As one example, the control section 730 controls the loudness of a sound output from the sound output device 740 based on a first index value, and controls the pitch of the sound output from the sound output device 740 based on a second index value.
More specifically, it is desirable that the first index value should be based on the sum or average of distances corresponding to individual pixels contained in the selected range S. It is desirable that the second index value should indicate the edge component of a pixel value corresponding to each pixel contained in the selected range S.
In this case, the control section 730 makes the sound pressure of a sound output from the sound output device 740 greater when the sum or average of distances indicated by the first index value is smaller as compared with a case where the sum or average of distances indicated by the first index value is larger. Further, the control section 730 makes the pitch of a sound output from the sound output device 740 higher when the edge component indicated by the second index value is larger as compared with a case where the edge component indicated by the second index value is smaller. This control can allow the user to recognize a plurality of different components, namely a distance component and an edge component, with a single sense or an acoustic sense.
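The dual-parameter control described above can be sketched as follows. The specific mapping functions and base values are illustrative assumptions; the text only requires that loudness grow as the average distance shrinks and that pitch grow with the edge component.

```python
def control_sound(distance_avg, edge_component,
                  base_volume=1.0, base_pitch=440.0):
    """Sketch of dual-parameter control: the first index value (average
    distance) drives loudness, the second (edge component) drives pitch.
    The reciprocal and linear mappings, and the base values, are
    assumptions of this sketch."""
    volume = base_volume / max(distance_avg, 1e-9)  # nearer -> louder
    pitch = base_pitch * (1.0 + edge_component)     # more edges -> higher pitch
    return volume, pitch
```

This lets a single acoustic channel convey two independent components, distance and edges, at once.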
As a further example, the control section 730 may change the device reaction based on a change in index value t. For example, the control section 730 may change the device reaction based on the degree of the difference between the average value of the distance components indicated by the index value calculated by the calculation section 720 before changing the selected range and the average value of the distance components indicated by the index value calculated by the calculation section 720 after changing the selected range. This method can also make it easier to recognize the boundary between the contour of a drawn object and the background.
Next, the support system 15 determines whether an instruction to terminate the process of recognizing an image has been received or not (S850). Under a condition that such an instruction has been received (S850: YES), the support system 15 terminates the processing illustrated in
With the configuration explained above referring to
In the first example, the selected range contains various objects including the background. Therefore, the calculation section 720 calculates an index value based on the average value of distances for various portions of those objects. Then, the control section 730 causes the sound output device 740 to act with the power according to the index value.
When the view direction is changed with the view extent first set wide, as in the first example, the user can grasp the various objects in the display area, as if catching them with a widespread palm.
Therefore, the calculation section 720 calculates an index value based on the distance to the cone at the foremost position. Then, the control section 730 causes the sound output device 740 to act with the power according to the index value. In the second example, as compared with the first example, the device reaction is extremely strong. The device reaction becomes gradually stronger as the view extent is made gradually narrower from the state in the first example, until the cone occupies the view extent. Once the view extent becomes still narrower, as in the second example, the device reaction does not change so much.
If, with the view direction fixed, the view extent is made gradually narrower after the rough position of a desired object is grasped as in the second example, the approximate size of a displayed object can be grasped.
Referring to
Then, the volume changes as shown at the lower portion in
If the position of the range S is changed this way sequentially, the user can accurately grasp the depth as a change in volume with the acoustic sense, as if a three-dimensional shape were traced with a finger. As the volume changes distinguishably at the boundary between a three-dimensional shape and the background or at a ridge line of a three-dimensional shape, the user can accurately grasp the three-dimensional shape. For example, if the view direction is changed carefully so as not to change the volume, rather than being changed linearly, as in the example shown in
Because the user can change the size of a range to be recognized according to the usage or the situation, as shown in
The host controller 1082 connects the RAM 1020 to the CPU 1000 and the graphics controller 1075, which accesses the RAM 1020 at a high transfer rate. The CPU 1000 operates to control the individual sections based on programs stored in the ROM 1010 and the RAM 1020. The graphics controller 1075 acquires image data which is generated by the CPU 1000 or the like on a frame buffer provided in the RAM 1020. Alternatively, the graphics controller 1075 may include a frame buffer inside to store image data generated by the CPU 1000 or the like.
The input/output controller 1084 connects the host controller 1082 to the communication interface 106, the hard disk drive 104 and the CD-ROM drive 1060, which are relatively fast input/output devices. The communication interface 106 communicates with an external device over a network. The hard disk drive 104 stores programs and data which the client computer 100 uses. The CD-ROM drive 1060 reads programs and data from a CD-ROM 1095, and provides the RAM 1020 or the hard disk drive 104 with the programs and data.
The input/output controller 1084 is connected with the ROM 1010, the input/output interface 108, and relatively slow input/output devices, such as the flexible disk drive 1050 and the input/output chip 1070. The ROM 1010 stores a boot program which is executed by the CPU 1000 when the client computer 100 is activated, and programs or the like which depend on the hardware of the client computer 100. The flexible disk drive 1050 reads programs and data from a flexible disk 1090, and provides the RAM 1020 or the hard disk drive 104 with the programs and data via the input/output chip 1070.
The input/output chip 1070 connects the flexible disk drive 1050, and connects various kinds of input/output devices via, for example, a parallel port, a serial port, a keyboard port, a mouse port and so forth. The input/output interface 108 outputs a sound or causes vibration to thereby act on the acoustic sense or the touch sense. The input/output interface 108 accepts an input made from the user by the pointing device or slider.
The programs that are supplied to the client computer 100 are stored in a recording medium, such as the flexible disk 1090, the CD-ROM 1095 or an IC card, to be provided to a user. Each program is read from the recording medium via the input/output chip 1070 and/or the input/output controller 1084, and is installed on the client computer 100 to be executed. Because the operations which the programs allow the client computer 100 or the like to execute are the same as the operations of the client computer 100 which have been explained referring to
The programs described above may be stored in an external storage medium. An optical recording medium, such as a DVD or PD, a magneto-optical recording medium, such as an MD, a tape medium, a semiconductor memory, such as an IC card, and the like can be used as storage media in addition to the flexible disk 1090 and the CD-ROM 1095. A storage device, such as a hard disk or RAM, provided at a server system connected to a private communication network or the Internet can be used as a recording medium to provide the client computer 100 with the programs over the network.
Although the embodiment of the present invention has been described above, the technical scope of the invention is not limited to the scope of the above-described embodiment. It should be apparent to those skilled in the art that various changes and improvements can be made to the embodiment. It is apparent from the description of the appended claims that modes of such changes or improvements are encompassed in the technical scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2007-237839 | Sep 2007 | JP | national |