DEVICE AND METHOD FOR DATA CLEANSING

Information

  • Patent Application
  • 20250191137
  • Publication Number
    20250191137
  • Date Filed
    December 19, 2023
  • Date Published
    June 12, 2025
  • CPC
    • G06T5/60
  • International Classifications
    • G06T5/60
Abstract
A device and a method for data cleansing are provided. The method includes the following steps: receiving, by a processor, an image through a transceiver, wherein the image includes a picture; when the processor determines that a continuous value corresponding to the image is greater than a continuous value threshold, performing, by the processor, a global continuity detection on the picture to obtain a gradient distribution value corresponding to the picture; and using, by the processor, the gradient distribution value to determine whether to perform a data cleansing corresponding to a training set on the picture.
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application claims the priority benefit of Taiwan application serial no. 112147819, filed on Dec. 8, 2023. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.


TECHNICAL FIELD

The present disclosure relates to a device and a method for data cleansing.


BACKGROUND

Currently, in the field of artificial intelligence/deep learning, 3D modeling technology requires a lot of training data (pictures), and requires manpower to filter/clean the training data. How to perform data cleansing more efficiently is one of the goals that those skilled in the art should strive for.


SUMMARY

The present disclosure provides a device and a method for data cleansing, which can perform the data cleansing more efficiently.


The device for data cleansing of the present disclosure includes a transceiver and a processor. The processor is coupled to the transceiver, wherein the processor receives an image through the transceiver, wherein the image includes a picture; when the processor determines that a continuous value corresponding to the image is greater than a continuous value threshold, the processor performs a global continuity detection on the picture to obtain a gradient distribution value corresponding to the picture; the processor uses the gradient distribution value to determine whether to perform a data cleansing corresponding to a training set on the picture.


The method for data cleansing of the present disclosure includes the following steps: receiving, by the processor, an image through the transceiver, wherein the image includes a picture; when the processor determines that a continuous value corresponding to the image is greater than a continuous value threshold, performing, by the processor, a global continuity detection on the picture to obtain a gradient distribution value corresponding to the picture; and using, by the processor, the gradient distribution value to determine whether to perform a data cleansing corresponding to a training set on the picture.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a schematic diagram of a device for data cleansing according to an embodiment of the present disclosure.



FIG. 2 is a flow chart of a method for data cleansing according to an embodiment of the present disclosure.



FIG. 3 is a schematic diagram illustrating obtaining the continuous value corresponding to the image according to an embodiment of the present disclosure.



FIGS. 4A and 4B are flowcharts of a method for data cleansing according to another embodiment of the present disclosure.





DESCRIPTION OF THE EMBODIMENTS


FIG. 1 is a schematic diagram of a device 100 for data cleansing according to an embodiment of the present disclosure. Please refer to FIG. 1. The device 100 may include a storage device 110, a transceiver 130, and a processor 150.


The storage device 110 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), a similar component, or a combination of the above components, and is used to store multiple modules or various applications that can be executed by the processor 150.


The transceiver 130 transmits and receives signals in a wireless or wired manner.


The processor 150 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field programmable gate array (FPGA), another similar element, or a combination of the above elements. The processor 150 can be coupled to the storage device 110 and the transceiver 130, and can access and execute the multiple modules and various applications stored in the storage device 110.



FIG. 2 is a flow chart of a method for data cleansing according to an embodiment of the present disclosure, wherein the method can be executed by the device 100 shown in FIG. 1. Please refer to FIG. 1 and FIG. 2 at the same time.


In step S210, the processor 150 can receive an image through the transceiver 130, wherein the image includes a picture.


In step S220, when the processor 150 determines that a continuous value corresponding to the image is greater than a continuous value threshold, the processor 150 may perform a global continuity detection on the picture to obtain a gradient distribution value (PR, Percentile Rank) corresponding to the picture. In one embodiment, the continuous value threshold may be 0.75. An implementation example of “continuous value corresponding to the image” in step S220 will be described below with reference to FIG. 3.



FIG. 3 is a schematic diagram illustrating obtaining the continuous value corresponding to the image according to an embodiment of the present disclosure. Please refer to FIG. 1, FIG. 2 and FIG. 3 at the same time. In this embodiment, the processor 150 can perform an image normalization, an image feature value extraction, and a nonlinear layer feature regression operation on the image to obtain the continuous value. As shown in FIG. 3, the processor 150 can divide each frame of the image into a uniform grid. Then, for the image content in each grid, the processor 150 can use a convolutional neural network to perform the feature extraction. In detail, for the grids of the current frame, the processor 150 may use the relative positional relationship between different grids to obtain scene features. On the other hand, for the grids across frames, the processor 150 can obtain temporal motion trajectories, behavioral changes, and dynamic characteristics of the entire image sequence. After obtaining the features of each grid, the processor 150 can input these features into a regression model to obtain the continuous value corresponding to the image.
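
The uniform grid division described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names split_into_grid and cell_mean are hypothetical, a frame is assumed to be a grayscale list of pixel rows, and the mean intensity per cell is only a placeholder for the convolutional features, which the description does not specify further.

```python
from typing import List

Frame = List[List[int]]  # assumption: one grayscale frame as rows of pixel intensities


def split_into_grid(frame: Frame, rows: int, cols: int) -> List[List[Frame]]:
    """Split a frame into rows x cols cells of (nearly) equal size."""
    height, width = len(frame), len(frame[0])
    # Integer boundaries chosen so that every pixel lands in exactly one cell.
    row_edges = [height * r // rows for r in range(rows + 1)]
    col_edges = [width * c // cols for c in range(cols + 1)]
    grid = []
    for r in range(rows):
        grid_row = []
        for c in range(cols):
            cell = [line[col_edges[c]:col_edges[c + 1]]
                    for line in frame[row_edges[r]:row_edges[r + 1]]]
            grid_row.append(cell)
        grid.append(grid_row)
    return grid


def cell_mean(cell: Frame) -> float:
    """Placeholder per-cell feature (mean intensity), standing in for CNN features."""
    values = [pixel for line in cell for pixel in line]
    return sum(values) / len(values)
```

In such a sketch, the per-cell features of the current frame and of neighboring frames would then be passed to a regression model to produce the continuous value; that model is outside the scope of this example.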


Please return to FIG. 2. In step S230, the processor 150 may use the gradient distribution value to determine whether to perform a data cleansing corresponding to a training set on the picture.



FIGS. 4A and 4B are flowcharts of a method for data cleansing according to another embodiment of the present disclosure. Please refer to FIG. 1, FIG. 2, FIG. 3, FIG. 4A and FIG. 4B at the same time. After the processor 150 obtains the continuous value corresponding to the image, the processor 150 may determine whether the continuous value is greater than the continuous value threshold. As shown in FIG. 4A, when the processor 150 determines that the continuous value is not greater than the continuous value threshold, the processor 150 does not add the picture to the training set. On the other hand, when the processor 150 determines that the continuous value is greater than the continuous value threshold, as shown in step S220 of FIG. 4A, the processor 150 can perform a global continuity detection on the picture to obtain a gradient distribution value corresponding to the picture.


In this embodiment, the picture may include a pixel. Furthermore, the global continuity detection can include a picture scaling, a picture grayscale normalization, a left to right comparison of the pixel, a front to back comparison of the picture, and a picture whole area grouping. Specifically, in the picture scaling, the processor 150 removes the details of the picture and retains the structure and the light and dark information of the picture. In the picture grayscale normalization, the processor 150 converts the picture into grayscale. In the left to right comparison of the pixel, the processor 150 compares adjacent pixels from left to right. If the left pixel is larger than the right pixel, the processor 150 marks the position as 1. On the other hand, if the left pixel is less than or equal to the right pixel, the processor 150 marks the position as 0. Thereby, the processor 150 can generate a matrix of difference values for the picture.


In the front to back comparison of the picture, the processor 150 uses the Hamming distance to calculate the degree of difference between a previous picture and a latter picture in the image. The smaller the Hamming distance between the previous picture and the latter picture, the higher the similarity between them. Conversely, the larger the Hamming distance, the lower the similarity. In the picture whole area grouping, the processor 150 groups all pictures according to similarity (that is, similar pictures are grouped into the same group). After performing the picture whole area grouping, the processor 150 may perform gradient sorting on the pictures in each group to obtain the gradient distribution value corresponding to the picture.
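
As an illustration only, the left to right comparison, the Hamming distance comparison and the similarity grouping could be sketched in Python as follows. The function names are hypothetical, the pictures are assumed to be already scaled down and converted to grayscale, and the greedy grouping rule with a max_distance parameter is an assumption, since the description does not state how groups are formed.

```python
from typing import List

Gray = List[List[int]]  # assumption: small grayscale picture as rows of intensities


def difference_bits(picture: Gray) -> List[int]:
    """Mark 1 where a pixel is larger than its right-hand neighbor, else 0."""
    bits = []
    for row in picture:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Count the positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("bit lists must have the same length")
    return sum(x != y for x, y in zip(a, b))


def group_by_similarity(hashes: List[List[int]], max_distance: int) -> List[List[int]]:
    """Greedy grouping (assumption): join the first group whose first member is close enough."""
    groups: List[List[int]] = []  # each group holds picture indices
    for index, bits in enumerate(hashes):
        for group in groups:
            if hamming_distance(hashes[group[0]], bits) <= max_distance:
                group.append(index)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([index])
    return groups
```

A smaller max_distance yields more, tighter groups; the value that suits a given data set would have to be chosen empirically.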


Next, as shown in step S230 of FIG. 4A, when the processor 150 determines that the gradient distribution value is a mode, the processor 150 can add the picture to the training set. On the other hand, when the processor 150 determines that the gradient distribution value is an extreme number, the processor 150 does not add the picture to the training set. In one embodiment, the mode is, for example, a gradient distribution value between 40 and 60. On the other hand, the extreme number is, for example, a gradient distribution value greater than 90 or less than 10.
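
Because step S220 describes the gradient distribution value as a percentile rank (PR), the mode and extreme number tests can be illustrated with the short Python sketch below. The percentile rank formula used is one common definition and is an assumption, as is the idea that each picture in a group carries a single numeric score; the function names are hypothetical, and treating the range 40 to 60 as inclusive is also an assumption.

```python
from typing import List


def percentile_rank(scores: List[float], value: float) -> float:
    """Percentile rank of value within scores on a 0-100 scale (one common definition)."""
    below = sum(s < value for s in scores)
    equal = sum(s == value for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def is_mode(pr: float) -> bool:
    """Example range from the description: between 40 and 60 (assumed inclusive)."""
    return 40 <= pr <= 60


def is_extreme(pr: float) -> bool:
    """Example range from the description: greater than 90 or less than 10."""
    return pr > 90 or pr < 10
```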


Please continue to refer to FIG. 4A and FIG. 4B. As shown in step S410 of FIG. 4B, when the processor 150 determines that the gradient distribution value is an away from mean value, the processor 150 can use a natural image quality evaluator (NIQE) to perform a distortion detection on the picture to obtain a distortion value corresponding to the picture. In one embodiment, the above-mentioned away from mean value is, for example, a gradient distribution value greater than 10 and less than 40, or a gradient distribution value greater than 60 and less than 90. When the processor 150 determines that the distortion value is less than a distortion value threshold, the processor 150 can add the picture to the training set. In one embodiment, the distortion value threshold may be 10. On the other hand, as shown in step S420, step S430 and step S440 of FIG. 4B, when the processor 150 determines that the distortion value is not less than the distortion value threshold, the processor 150 can perform a correction operation and a filtering operation on the picture to obtain an integrity value corresponding to the picture.


In one embodiment, in step S420 of FIG. 4B, the processor 150 may use a DBGAN (Data Balancing Generative Adversarial Network) to perform the correction operation on the picture. In detail, the processor 150 can use a BGAN (Balancing Generative Adversarial Network) to learn the blur process of the picture to generate a blurred synthetic picture. The processor 150 can then input the blurred synthetic picture to the DBGAN to transform the picture from blurry to clear.


In one embodiment, in step S430 of FIG. 4B, the processor 150 may use an edge detection technology to perform the filtering operation on the picture. In detail, the processor 150 can use the edge detection technology to determine whether the contour of the picture is complete.
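
The description does not name a particular edge detection technology or state how contour completeness is scored. Purely as an illustration, the Python sketch below applies the well-known Sobel operator to a grayscale picture and counts edge pixels. The choice of Sobel, the function names, the threshold parameter and the use of an edge pixel count are all assumptions and are not taken from the disclosure.

```python
from typing import List

Gray = List[List[int]]  # assumption: grayscale picture as rows of intensities

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(picture: Gray, threshold: float) -> List[List[int]]:
    """Return a 0/1 edge map; border pixels are left as 0."""
    height, width = len(picture), len(picture[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Accumulate the horizontal and vertical gradients over the 3x3 window.
            for dy in range(3):
                for dx in range(3):
                    pixel = picture[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


def count_edge_pixels(edges: List[List[int]]) -> int:
    """Total number of pixels marked as edges."""
    return sum(sum(row) for row in edges)
```

How an edge map of this kind would be turned into the integrity value compared against the integrity value threshold is not specified by the description.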


After executing step S420, step S430 and step S440 of FIG. 4B, when the processor 150 determines that the integrity value is greater than the integrity value threshold, the processor 150 may add the picture to the training set. On the other hand, when the processor 150 determines that the integrity value is not greater than the integrity value threshold, the processor 150 does not add the picture to the training set. In one embodiment, the integrity value threshold may be 100.
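
The overall decision flow of FIG. 4A and FIG. 4B, as described in the text, can be summarized in a short Python sketch. The function keep_picture and its parameters are hypothetical. The thresholds are the example values given in the embodiments. The distortion and integrity values are supplied as callables so that they are only evaluated when the flow reaches them. The description leaves gradient distribution values of exactly 10 and exactly 90 unassigned; sending them down the away from mean route is an assumption of this sketch.

```python
from typing import Callable

# Example thresholds taken from the embodiments in the description.
CONTINUOUS_THRESHOLD = 0.75
DISTORTION_THRESHOLD = 10.0
INTEGRITY_THRESHOLD = 100.0


def keep_picture(continuous_value: float,
                 gradient_pr: float,
                 distortion: Callable[[], float],
                 integrity_after_repair: Callable[[], float]) -> bool:
    """Return True if the picture would be added to the training set."""
    # Continuous value not greater than the threshold: do not add.
    if continuous_value <= CONTINUOUS_THRESHOLD:
        return False
    # Mode (example range 40 to 60, assumed inclusive): add.
    if 40 <= gradient_pr <= 60:
        return True
    # Extreme number (greater than 90 or less than 10): do not add.
    if gradient_pr > 90 or gradient_pr < 10:
        return False
    # Remaining values follow the away from mean route (assumption: includes 10 and 90).
    if distortion() < DISTORTION_THRESHOLD:
        return True
    # Correction and filtering are assumed to happen inside integrity_after_repair.
    return integrity_after_repair() > INTEGRITY_THRESHOLD
```

For example, keep_picture(0.8, 30, lambda: 12.0, lambda: 150.0) follows the away from mean route, fails the distortion test, and is kept because the integrity value exceeds 100.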


To sum up, the device and the method for data cleansing of the present disclosure can use the gradient distribution value corresponding to the picture to determine whether to perform the data cleansing corresponding to the training set on the picture, after determining that the continuous value corresponding to the image is greater than the continuous value threshold. In other words, after the present disclosure determines that a specific picture is not to be added to the training set, the picture can be automatically deleted (data cleansing). Based on this, no manpower is required to perform filtering/cleansing of the training data, so that the data cleansing can be performed more efficiently.

Claims
  • 1. A device for data cleansing, including: a transceiver; and a processor, coupled to the transceiver, wherein the processor receives an image through the transceiver, wherein the image includes a picture; when the processor determines that a continuous value corresponding to the image is greater than a continuous value threshold, the processor performs a global continuity detection on the picture to obtain a gradient distribution value corresponding to the picture; the processor uses the gradient distribution value to determine whether to perform a data cleansing corresponding to a training set on the picture.
  • 2. The device of claim 1, wherein the continuous value threshold is 0.75.
  • 3. The device of claim 1, wherein the processor performs an image normalization, an image feature value extraction, and a nonlinear layer feature regression operation on the image to obtain the continuous value.
  • 4. The device of claim 1, wherein when the processor determines that the continuous value is not greater than the continuous value threshold, the processor does not add the picture to the training set.
  • 5. The device of claim 1, wherein the picture includes a pixel, wherein the global continuity detection includes a picture scaling, a picture grayscale normalization, a left to right comparison of the pixel, a front to back comparison of the picture, and a picture whole area grouping.
  • 6. The device of claim 1, wherein when the processor determines that the gradient distribution value is a mode, the processor adds the picture to the training set.
  • 7. The device of claim 1, wherein when the processor determines that the gradient distribution value is an extreme number, the processor does not add the picture to the training set.
  • 8. The device of claim 1, wherein when the processor determines that the gradient distribution value is an away from mean value, the processor uses a natural image quality evaluator (NIQE) to perform a distortion detection on the picture to obtain a distortion value corresponding to the picture; when the processor determines that the distortion value is less than a distortion value threshold, the processor adds the picture to the training set.
  • 9. The device of claim 8, wherein the distortion value threshold is 10.
  • 10. The device of claim 8, wherein when the processor determines that the distortion value is not less than the distortion value threshold, the processor performs a correction operation and a filtering operation on the picture to obtain an integrity value corresponding to the picture; when the processor determines that the integrity value is greater than an integrity value threshold, the processor adds the picture to the training set; when the processor determines that the integrity value is not greater than the integrity value threshold, the processor does not add the picture to the training set.
  • 11. The device of claim 10, wherein the integrity value threshold is 100.
  • 12. The device of claim 10, wherein the processor uses a DBGAN (Data Balancing Generative Adversarial Network) to perform the correction operation on the picture.
  • 13. The device of claim 10, wherein the processor uses an edge detection technology to perform the filtering operation on the picture.
  • 14. A method for data cleansing, applicable for a device including a transceiver and a processor, wherein the method includes following steps: receiving, by the processor, an image through the transceiver, wherein the image includes a picture; when the processor determines that a continuous value corresponding to the image is greater than a continuous value threshold, performing, by the processor, a global continuity detection on the picture to obtain a gradient distribution value corresponding to the picture; and using, by the processor, the gradient distribution value to determine whether to perform a data cleansing corresponding to a training set on the picture.
  • 15. The method of claim 14, wherein the method further includes following steps: performing, by the processor, an image normalization, an image feature value extraction, and a nonlinear layer feature regression operation on the image to obtain the continuous value.
  • 16. The method of claim 14, wherein the picture includes a pixel, wherein the global continuity detection includes a picture scaling, a picture grayscale normalization, a left to right comparison of the pixel, a front to back comparison of the picture, and a picture whole area grouping.
  • 17. The method of claim 14, wherein the method further includes following steps: when the processor determines that the gradient distribution value is an away from mean value, using, by the processor, a natural image quality evaluator (NIQE) to perform a distortion detection on the picture to obtain a distortion value corresponding to the picture; and when the processor determines that the distortion value is less than a distortion value threshold, adding, by the processor, the picture to the training set.
  • 18. The method of claim 17, wherein the method further includes following steps: when the processor determines that the distortion value is not less than the distortion value threshold, performing, by the processor, a correction operation and a filtering operation on the picture to obtain an integrity value corresponding to the picture; when the processor determines that the integrity value is greater than an integrity value threshold, adding, by the processor, the picture to the training set; and when the processor determines that the integrity value is not greater than the integrity value threshold, not adding, by the processor, the picture to the training set.
  • 19. The method of claim 18, wherein when the processor determines that the distortion value is not less than the distortion value threshold, performing, by the processor, the correction operation and the filtering operation on the picture to obtain the integrity value corresponding to the picture includes following steps: using, by the processor, a DBGAN (Data Balancing Generative Adversarial Network) to perform the correction operation on the picture.
  • 20. The method of claim 18, wherein when the processor determines that the distortion value is not less than the distortion value threshold, performing, by the processor, the correction operation and the filtering operation on the picture to obtain the integrity value corresponding to the picture includes following steps: using, by the processor, an edge detection technology to perform the filtering operation on the picture.
Priority Claims (1)
Number      Date      Country  Kind
112147819   Dec 2023  TW       national