This application claims the priority benefit of Taiwan application serial no. 112110395, filed on Mar. 21, 2023. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The application relates to an image generation method and an image generation device.
Traditionally, during the development and training stage of image recognition models, developers often need to spend considerable time collecting training images, for example, by manually collecting a large number of images that are related to the target to be recognized but different from each other, and then using these training images to train the image recognition model to improve its recognition ability for the target. However, this method of collecting training images is inefficient. In practice, it may take a long time to collect enough training images to train the image recognition model, delaying the improvement of the image recognition ability of the model.
The application provides an image generation method, which includes: reading a first image from a storage circuit; setting an image mask on the first image, wherein the image mask covers a portion of the image area of the first image; randomly moving the image mask to change the coverage area of the image mask in the first image, wherein the coverage area of the image mask does not exceed the edge of the first image; extracting a second image from the first image according to the moved image mask; and storing the second image in the storage circuit.
The application also provides an image generation device including a storage circuit and a processor. The processor is coupled to the storage circuit. The processor is configured to: read a first image from the storage circuit; set an image mask on the first image, wherein the image mask covers a portion of the image area of the first image; randomly move the image mask to change the coverage area of the image mask in the first image, wherein the coverage area of the image mask does not exceed the edge of the first image; extract a second image from the first image based on the moved image mask; and store the second image in the storage circuit.
Based on the above, after reading the first image from the storage circuit, the image mask can be set on the first image to cover a portion of the image area of the first image. Then, the image mask can be randomly moved to change the coverage area of the image mask in the first image. In particular, the coverage area of the image mask does not exceed the edge of the first image. Then, the second image can be extracted from the first image based on the moved image mask and stored in the storage circuit. Therefore, multiple images (i.e., the second images) that are similar to but different from each other can be randomly generated based on a small number of images (i.e., the first images).
Referring to
The electronic device 10 may include a processor 11, a storage circuit 12, and an input/output interface 13. The processor 11 is responsible for the overall or partial operation of the electronic device 10. For example, the processor 11 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other programmable general-purpose or special-purpose microprocessors, Digital Signal Processors (DSPs), programmable controllers, Application Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), or combinations thereof.
The storage circuit 12 is coupled to the processor 11 and is used to store data. For example, the storage circuit 12 may include non-volatile storage circuits and volatile storage circuits. The non-volatile storage circuits may include Read Only Memory (ROM), flash memory, traditional hard disk drives (HDDs), or similar non-volatile storage media. The volatile storage circuits may include Dynamic Random Access Memory (DRAM) or similar volatile storage media.
The input/output interface 13 is coupled to the processor 11 and is used to receive input signals and/or send output signals. For example, the input/output interface 13 may include a display, a network interface card, a mouse, a keyboard, a touchpad, a touchscreen, a joystick, a remote control, a microphone, and/or a speaker, and the type of input/output interface 13 is not limited to these. In addition, the electronic device 10 may also include a battery, a power management circuit, and/or other types of circuit modules, which are not limited by the present invention.
In one embodiment, the storage circuit 12 may be used to store an image (also referred to as a first image) 101. The image 101 may include a two-dimensional or three-dimensional image. In addition, the number of the images 101 may be one or more, and the present invention is not limited thereto.
In one embodiment, the processor 11 may read the image 101 from the storage circuit 12. After obtaining the image 101, the processor 11 may set an image mask on the image 101. The image mask may cover only a portion of the image area of the image 101. It should be noted that the shape of the image mask may be rectangular, circular, elliptical, or other shapes, and the present invention is not limited thereto. In addition, the shape of the image mask may be the same as or different from the shape of the image 101, and the present invention is not limited thereto.
In one embodiment, the processor 11 may generate the image mask based on a scaling ratio and an original size of the image 101, so that the relationship between the size of the image mask and the original size of the image 101 complies with the scaling ratio. For example, assuming that the scaling ratio is 0.8, the size of the generated image mask may be 0.8 times the original size of the image 101. Alternatively, assuming that the scaling ratio is 0.9, the size of the generated image mask may be 0.9 times the original size of the image 101. It should be noted that the scaling ratio may also be any other value between 0 and 1 (such as 0.7, 0.75, 0.85, or 0.95), and the present invention is not limited thereto.
In one embodiment, assuming that the shape of the image 101 is rectangular, the original size of the image 101 can be represented or defined by the length, width, and/or total area of the image 101. Alternatively, in one embodiment, assuming that the shape of the image 101 is circular or elliptical, the original size of the image 101 can be represented or defined by the radius and/or total area of the image 101. Similarly, in one embodiment, assuming that the shape of the image mask is rectangular, the size of the image mask can be represented or defined by the length, width, and/or total area of the image mask. Alternatively, in one embodiment, assuming that the shape of the image mask is circular or elliptical, the size of the image mask can be represented or defined by the radius and/or total area of the image mask.
In one embodiment, the processor 11 can multiply the original size of the image 101 by a scaling ratio to determine the size of the image mask. For example, the processor 11 can multiply the length, width, radius, or total area of the image 101 by the scaling ratio to obtain the length, width, radius, or total area of the image mask.
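The sizing step above can be sketched in a few lines of Python. This is an illustrative sketch only, not the application's implementation; the function name and example dimensions are assumptions introduced here.

```python
# Illustrative sketch (not the application's implementation): derive the
# mask dimensions by multiplying the original image dimensions by a
# scaling ratio strictly between 0 and 1.

def mask_size(image_width, image_height, ratio):
    """Return (mask_width, mask_height) scaled by `ratio`."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    return int(image_width * ratio), int(image_height * ratio)

# A 1000x800 image with a scaling ratio of 0.8 yields an 800x640 mask.
print(mask_size(1000, 800, 0.8))  # -> (800, 640)
```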
In one embodiment, the processor 11 can receive a user input through the input/output interface 13. The user input may contain information about the scaling ratio input or selected by the user. The processor 11 can determine the scaling ratio based on the user input. Alternatively, in one embodiment, the scaling ratio may also be a default value or selected by the user from multiple candidate values, and the present invention is not limited thereto.
In one embodiment, after setting the image mask, the processor 11 can randomly move the image mask to change the coverage area of the image mask in image 101. It should be noted that the coverage area of the image mask cannot exceed the edge of the image 101. That is, the image mask can be randomly moved (e.g., translated and/or rotated) in the image 101 without exceeding the edge of the image 101, to change the position of the image mask and its coverage area in the image 101.
In one embodiment, after random movement of the image mask, the processor 11 can extract an image (also referred to as a second image) 102 from the image 101 based on the moved image mask. For example, after random movement of the image mask, the processor 11 can extract a portion of the image 101 within the current coverage area of the image mask and generate the image 102 based on the extracted image. Then, the processor 11 can store the image 102 in the storage circuit 12.
In one embodiment, the number of the images 102 can be one. In another embodiment, the image 102 can include multiple images 102(1) to 102(n). In one embodiment, the processor 11 can randomly move the image mask and extract the image 102(i) from the image 101 based on the moved image mask. After extracting the image 102(i), the processor 11 can again randomly move the image mask and extract the image 102(j) from the image 101 based on the moved image mask. Both i and j are between 1 and n, and i is not equal to j. In this way, in one embodiment, the processor 11 can randomly generate multiple images 102(1) to 102(n) that are similar to but different from each other based on a single image 101.
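The repeated move-and-extract loop described above can be sketched as follows. The image is modeled here as a simple list of pixel rows, and the function and parameter names are illustrative assumptions, not the application's actual implementation.

```python
import random

def random_crops(image, mask_w, mask_h, n, seed=None):
    """Extract `n` sub-images of size mask_h x mask_w from `image`
    (a list of rows), placing the mask at a random position that
    never exceeds the image edges."""
    rng = random.Random(seed)
    img_h, img_w = len(image), len(image[0])
    crops = []
    for _ in range(n):
        # Random top-left corner such that the mask stays inside the image.
        x = rng.randint(0, img_w - mask_w)
        y = rng.randint(0, img_h - mask_h)
        crops.append([row[x:x + mask_w] for row in image[y:y + mask_h]])
    return crops

# A single 10x8 source image yields five similar but different 8x6 crops.
image = [[10 * r + c for c in range(10)] for r in range(8)]
crops = random_crops(image, mask_w=8, mask_h=6, n=5, seed=0)
print(len(crops), len(crops[0]), len(crops[0][0]))  # -> 5 6 8
```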
Referring to
In one embodiment, the processor 11 can set an image mask 22 in the image 21. The shape of the image mask 22 is also rectangular, with its length and width represented by H2 and W2, respectively. For example, if the scaling ratio corresponding to the width is S1 and the scaling ratio corresponding to the length is S2, then W2=W1×S1 and H2=H1×S2. Both S1 and S2 are values between 0 and 1.
In one embodiment, the width direction of the image mask 22 is parallel to the first reference direction (the +/−X axis direction). The length direction of the image mask 22 is parallel to the second reference direction (the +/−Y axis direction). The image mask 22 has four endpoints B1 to B4. The coordinates of the endpoints B1 to B4 are (X2a, Y2a), (X2b, Y2b), (X2c, Y2c), and (X2d, Y2d), respectively, where (X2b, Y2b)=(X2a+W2, Y2a), (X2c, Y2c)=(X2a, Y2a+H2), and (X2d, Y2d)=(X2a+W2, Y2a+H2). In addition, the coverage area of the image mask 22 in the image 21 is indicated by diagonal lines in
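The endpoint relations above can be transcribed directly into code. The helper below is a hypothetical illustration; only the coordinate relations themselves come from the text.

```python
# Illustrative sketch: corners B1..B4 of an axis-aligned rectangular mask,
# following the coordinate relations given in the text, with B1 at
# (X2a, Y2a) and the mask W2 wide and H2 long.

def mask_corners(x2a, y2a, w2, h2):
    b1 = (x2a, y2a)
    b2 = (x2a + w2, y2a)            # (X2b, Y2b) = (X2a + W2, Y2a)
    b3 = (x2a, y2a + h2)            # (X2c, Y2c) = (X2a, Y2a + H2)
    b4 = (x2a + w2, y2a + h2)       # (X2d, Y2d) = (X2a + W2, Y2a + H2)
    return b1, b2, b3, b4

print(mask_corners(2, 3, 8, 6))  # -> ((2, 3), (10, 3), (2, 9), (10, 9))
```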
In one embodiment, the processor 11 can randomly determine a movement direction and a movement distance of the image mask 22. For example, the processor 11 can generate multiple random values based on one or more random functions. The processor 11 can use one of these random values (also known as a first random value) to determine the movement direction of the image mask 22, with different first random values corresponding to different movement directions. In addition, the processor 11 can use another one of these random values (also known as a second random value) to determine the movement distance of the image mask 22, with different second random values corresponding to different movement distances. Then, the processor 11 can move (i.e., translate) the image mask 22 according to the determined movement direction and movement distance.
In one embodiment, the processor 11 can determine a critical value for the movement distance corresponding to the −Y-axis direction (also known as a first movement distance critical value) based on a distance D1 between the edge 201 of the image 21 and the endpoint B1 (or B2) of the image mask 22. Specifically, the endpoint B1 (or B2) is closer to the edge 201 in the −Y-axis direction than the other endpoints B3 (or B4) of the image mask 22. For example, the first movement distance critical value can be less than or equal to the distance D1. Then, the processor 11 can limit the movement distance of the image mask 22 in the −Y-axis direction to no more than the first movement distance critical value. In other words, the processor 11 can limit the random movement distance of the image mask 22 in the −Y-axis direction based on the first movement distance critical value, thereby preventing the coverage area of the moved image mask 22 from exceeding the edge 201 of the image 21.
In one embodiment, the processor 11 can determine a critical value for the movement distance corresponding to the +X-axis direction (also known as a second movement distance critical value) based on a distance D2 between the edge 202 of the image 21 and the endpoint B2 (or B4) of the image mask 22. Specifically, the endpoint B2 (or B4) is closer to the edge 202 in the +X-axis direction than the other endpoints B1 (or B3) of the image mask 22. For example, the second movement distance critical value can be less than or equal to the distance D2. Then, the processor 11 can limit the movement distance of the image mask 22 in the +X-axis direction to no more than the second movement distance critical value. In other words, the processor 11 can limit the random movement distance of the image mask 22 in the +X-axis direction based on the second movement distance critical value, thereby preventing the coverage area of the moved image mask 22 from exceeding the edge 202 of the image 21.
In one embodiment, the processor 11 can determine a critical value for the movement distance corresponding to the +Y-axis direction (also known as a third movement distance critical value) based on a distance D3 between the edge 203 of the image 21 and the endpoint B3 (or B4) of the image mask 22. Specifically, the endpoint B3 (or B4) is closer to the edge 203 in the +Y-axis direction than the other endpoints B1 (or B2) of the image mask 22. For example, the third movement distance critical value can be less than or equal to the distance D3. Then, the processor 11 can limit the movement distance of the image mask 22 in the +Y-axis direction to no more than the third movement distance critical value. In other words, the processor 11 can limit the random movement distance of the image mask 22 in the +Y-axis direction based on the third movement distance critical value, thereby preventing the coverage area of the moved image mask 22 from exceeding the edge 203 of the image 21.
In one embodiment, the processor 11 can determine a critical value for the movement distance corresponding to the −X-axis direction (also known as a fourth movement distance critical value) based on a distance D4 between the edge 204 of the image 21 and the endpoint B1 (or B3) of the image mask 22. Specifically, the endpoint B1 (or B3) is closer to the edge 204 in the −X-axis direction than the other endpoints B2 (or B4) of the image mask 22. For example, the fourth movement distance critical value can be less than or equal to the distance D4. Then, the processor 11 can limit the movement distance of the image mask 22 in the −X-axis direction to no more than the fourth movement distance critical value. In other words, the processor 11 can limit the random movement distance of the image mask 22 in the −X-axis direction based on the fourth movement distance critical value, thereby preventing the coverage area of the moved image mask 22 from exceeding the edge 204 of the image 21.
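The four movement distance critical values above amount to clamping a random translation so that the mask stays inside the image. Below is a minimal sketch under the assumption that each critical value equals the corresponding distance D1 to D4, with the −Y edge as 201, +X as 202, +Y as 203, and −X as 204; all names are illustrative.

```python
import random

def random_translation(b1, w2, h2, img_w, img_h, rng=None):
    """Randomly translate a mask with top-left corner b1 = (x, y) so its
    coverage never exceeds the image edges. The per-direction limits play
    the role of the four movement distance critical values."""
    rng = rng or random.Random(0)
    x, y = b1
    d1 = y                     # room toward edge 201 (-Y direction)
    d2 = img_w - (x + w2)      # room toward edge 202 (+X direction)
    d3 = img_h - (y + h2)      # room toward edge 203 (+Y direction)
    d4 = x                     # room toward edge 204 (-X direction)
    dx = rng.randint(-d4, d2)  # signed moves bounded by the critical values
    dy = rng.randint(-d1, d3)
    return (x + dx, y + dy)

# An 8x6 mask starting at (3, 2) inside a 16x12 image always lands with
# its top-left corner in [0, 8] x [0, 6], keeping the mask fully inside.
new_b1 = random_translation((3, 2), w2=8, h2=6, img_w=16, img_h=12)
assert 0 <= new_b1[0] <= 8 and 0 <= new_b1[1] <= 6
print(new_b1)
```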
Referring to
In one embodiment, based on the updated coordinates of endpoints B1 to B4, the processor 11 can determine whether the coverage area of the rotated image mask 22 exceeds any edge of the image 21. In one embodiment, if the coverage area of the rotated image mask 22 exceeds any edge of the image 21, the processor 11 can discard the rotation of the image mask 22. In one embodiment, if the coverage area of the rotated image mask 22 does not exceed any edge of the image 21, the processor 11 can retain the rotation of the image mask 22.
In one embodiment, if the coverage area of the rotated image mask 22 exceeds any edge of the image 21, the processor 11 can randomly determine another rotation angle (also referred to as a second rotation angle) for the image mask 22. For example, the second rotation angle can also be any angle between 0 and 360 degrees, and the first rotation angle is different from the second rotation angle. Then, the processor 11 can rotate the image mask 22 based on the second rotation angle. In one embodiment, the processor 11 can repeatedly rotate the image mask 22 using different randomly determined rotation angles until the rotated image mask 22 does not exceed any edge of the image 21. It should be noted that in one embodiment, the processor 11 can also determine the first rotation angle and/or the second rotation angle based on user operation, and the present invention is not limited thereto.
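The rotate, check, and retry behavior described above can be sketched by rotating the mask's corner points about a center and discarding any rotation that leaves the image. The center, the bounded retry count, and the fallback to the unrotated mask are illustrative assumptions, not details from the application.

```python
import math, random

def rotate_until_inside(corners, center, img_w, img_h, rng=None, max_tries=100):
    """Rotate the mask corners about `center` by a random angle, discarding
    any rotation that pushes a corner outside the image, and retrying with
    a new random angle until the rotated mask fits (or tries run out)."""
    rng = rng or random.Random(1)

    def rotate(pts, angle):
        cx, cy = center
        s, c = math.sin(angle), math.cos(angle)
        return [(cx + (x - cx) * c - (y - cy) * s,
                 cy + (x - cx) * s + (y - cy) * c) for x, y in pts]

    def inside(pts):
        return all(0 <= x <= img_w and 0 <= y <= img_h for x, y in pts)

    for _ in range(max_tries):
        angle = rng.uniform(0, 2 * math.pi)  # any angle in [0, 360) degrees
        rotated = rotate(corners, angle)
        if inside(rotated):          # retain this rotation
            return rotated
    return corners                   # fall back to the unrotated mask

# An 8x6 mask centered in a 16x12 image stays inside for any rotation,
# since its half-diagonal (5) is smaller than the clearance to each edge.
corners = [(4, 3), (12, 3), (12, 9), (4, 9)]
rotated = rotate_until_inside(corners, center=(8, 6), img_w=16, img_h=12)
assert all(0 <= x <= 16 and 0 <= y <= 12 for x, y in rotated)
```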
In one embodiment, after rotating the image mask 22, the processor 11 can randomly determine a movement direction and a movement distance of the image mask 22. Then, the processor 11 can move (i.e., translate) the image mask 22 based on the randomly determined movement direction and movement distance.
In one embodiment, after rotating the image mask 22, the processor 11 can obtain distances D1′, D2′, D3′, and D4′. The distance D1′ is the distance between the endpoint B2 of the rotated image mask 22 and the edge 201 of the image 21 (i.e., the shortest distance between the edge of the image mask 22 and the edge 201). The distance D2′ is the distance between the endpoint B4 of the rotated image mask 22 and the edge 202 of the image 21 (i.e., the shortest distance between the edge of the image mask 22 and the edge 202). The distance D3′ is the distance between the endpoint B3 of the rotated image mask 22 and the edge 203 of the image 21 (i.e., the shortest distance between the edge of the image mask 22 and the edge 203). The distance D4′ is the distance between the endpoint B1 of the rotated image mask 22 and the edge 204 of the image 21 (i.e., the shortest distance between the edge of the image mask 22 and the edge 204). The processor 11 can update the first, second, third, and fourth movement distance critical values based on the distances D1′, D2′, D3′, and D4′, respectively. For example, the updated first, second, third, and fourth movement distance critical values can be less than or equal to the distances D1′, D2′, D3′, and D4′, respectively.
In one embodiment, after rotating the image mask 22, the processor 11 can limit the movement distance of the image mask 22 in the −Y axis direction to not exceed the updated first movement distance critical value, limit the movement distance of the image mask 22 in the +X axis direction to not exceed the updated second movement distance critical value, limit the movement distance of the image mask 22 in the +Y axis direction to not exceed the updated third movement distance critical value, and limit the movement distance of the image mask 22 in the −X axis direction to not exceed the updated fourth movement distance critical value. Based on these movement distance critical values, the processor 11 can randomly move (e.g., translate) the image mask 22, thereby preventing the coverage area of the moved (i.e., rotated and translated) image mask 22 from exceeding any edge of the image 21. Then, the processor 11 can generate the image 102 based on the moved image mask 22.
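Refreshing the critical values after rotation reduces to measuring each rotated corner's clearance from the four image edges. A minimal sketch under that assumption follows; the function name is hypothetical.

```python
def updated_thresholds(rotated_corners, img_w, img_h):
    """After rotation, the four movement distance critical values can be
    refreshed from the shortest distance between the rotated mask and each
    image edge (D1' to D4' in the text)."""
    xs = [x for x, _ in rotated_corners]
    ys = [y for _, y in rotated_corners]
    d1 = min(ys)             # clearance toward the -Y edge (201)
    d2 = img_w - max(xs)     # clearance toward the +X edge (202)
    d3 = img_h - max(ys)     # clearance toward the +Y edge (203)
    d4 = min(xs)             # clearance toward the -X edge (204)
    return d1, d2, d3, d4

# A rotated mask with these corners in a 16x12 image has clearances
# 2, 4, 3, and 3 toward edges 201 to 204, respectively.
print(updated_thresholds([(5, 2), (12, 4), (10, 9), (3, 7)], 16, 12))
# -> (2, 4, 3, 3)
```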
In one embodiment, the processor 11 can continuously move (e.g., rotate and/or translate) the image mask 22. The processor 11 can sequentially generate images 102(1) to 102(n) of
Referring to
Referring to
Referring to
It should be noted that the setting and movement of the image mask in
In one embodiment, the image 101 in
Referring to
However, the steps in
In summary, by randomly moving the image mask in the first image, multiple second images that are similar to but different from each other can be randomly generated. Therefore, a small number of images (i.e., the first images) can be used to efficiently generate multiple images (i.e., the second images) that are similar to but different from each other, thereby effectively improving the efficiency of generating training data for artificial intelligence models. It should be noted that the generated second images can also be used for other purposes (such as building image libraries) and are not limited to training artificial intelligence models.
Although the present invention has been disclosed with reference to the embodiments, it is not intended to limit the present invention. Those skilled in the art may make some modifications and refinements within the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be determined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
112110395 | Mar 2023 | TW | national |