The present disclosure relates to the field of video surveillance technology, and in particular, to systems and methods for image processing.
Video surveillance is becoming more and more important for security purposes, and is widely used in various locations, such as roads, shopping malls, residential areas, parking lots, etc., because it is accurate, timely, and informative. In recent years, with the advancement of technology, video surveillance techniques have developed rapidly. For example, different image processing functions of a capture device can be executed according to different scene categories of images. However, due to factors such as frequent scene transitions within a short period, the recognition of scene categories in images may not be accurate. In addition, in video surveillance, one or more target regions (e.g., a facial region, a region including a license plate) in an image may be of concern. Thus, it is necessary to ensure that the target region(s) can be clearly displayed. However, in scenes with poor lighting, such as a backlighting condition or a reflective condition, the target region may be overexposed or underexposed, which may result in ineffective presentation of the target region. Therefore, in order to ensure that the target region has a desired brightness, it is necessary to provide systems and methods for image processing.
In one aspect of the present disclosure, a system is provided. The system may include at least one storage device and at least one processor in communication with the at least one storage device. The at least one storage device may include a set of instructions. When executing the set of instructions, the at least one processor may be configured to cause the system to perform operations including: obtaining an image captured by a capture device, multiple historical images captured by the capture device before the image, and multiple scene categories; for each of the multiple scene categories, generating a confidence level of the image belonging to the scene category, the scene category corresponding to a confidence level threshold; for each of the multiple historical images, obtaining an initial scene category to which the historical image belongs; determining, based on initial scene categories of the multiple historical images, a target scene category to which the multiple historical images belong; determining multiple updated confidence level thresholds by updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories; determining, based on the multiple updated confidence level thresholds and confidence levels of the image corresponding to the multiple scene categories, a final scene category to which the image belongs; and processing the image based on the final scene category of the image.
In some embodiments, the updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories includes updating the at least a portion of the multiple confidence level thresholds by decreasing a confidence level threshold corresponding to the target scene category.
In some embodiments, for each of the multiple scene categories, the confidence level of the image belonging to the scene category is generated using a scene recognition model, and the scene recognition model is configured to: extract a target feature from the image; determine a reference feature for the scene category; determine a similarity between the target feature of the image and the reference feature of the scene category; and designate the similarity between the target feature and the reference feature as the confidence level of the image belonging to the scene category.
In some embodiments, the determining, based on the multiple updated confidence level thresholds and confidence levels of the image corresponding to the multiple scene categories, a final scene category to which the image belongs includes: for each of the multiple scene categories corresponding to the multiple updated confidence level thresholds, determining a relationship between the confidence level of the image belonging to the scene category and an updated confidence level threshold of the scene category; determining, based on relationships corresponding to the multiple scene categories, a preliminary scene category to which the image belongs; and determining, based on the preliminary scene category of the image, the final scene category to which the image belongs.
In some embodiments, the determining, based on the preliminary scene category of the image, the final scene category to which the image belongs includes: determining whether the preliminary scene category of the image and the initial scene categories of the multiple historical images are consistent; in response to the preliminary scene category of the image and the initial scene categories of the multiple historical images being consistent, designating the preliminary scene category of the image as an updated scene category of the image, or in response to an inconsistency of the preliminary scene category of the image and the initial scene categories of the multiple historical images, determining the target scene category as an updated scene category of the image; and determining the final scene category of the image based on the updated scene category.
In some embodiments, the determining the final scene category of the image based on the updated scene category includes: obtaining a count of scene category switches within a preset time period, wherein the preset time period is a time period during which the image and the multiple historical images are captured, and an inconsistency between the scene categories of two continuous images is counted as one scene category switch; and in response to the count of scene category switches exceeding a preset threshold, determining the preliminary scene category of the image as the final scene category of the image.
In some embodiments, the operations further include in response to the count of scene category switches not exceeding the preset threshold, determining the updated scene category as the final scene category of the image.
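By way of illustration only, the following Python sketch shows one possible reading of the consistency check and switch-count logic described in the preceding embodiments; the function names, the list-based representation of historical scene categories, and the choice to include the preliminary category of the current image in the switch count are assumptions made for demonstration rather than the disclosed implementation.

```python
# Hedged sketch of the final-scene-category decision described above.
from typing import List


def count_scene_switches(categories: List[str]) -> int:
    """Count how many times the scene category changes between
    consecutive images (one change = one scene category switch)."""
    return sum(1 for prev, cur in zip(categories, categories[1:]) if prev != cur)


def final_scene_category(preliminary: str,
                         historical: List[str],
                         target: str,
                         switch_threshold: int) -> str:
    """Decide the final scene category of the current image."""
    # If the preliminary category agrees with all historical categories,
    # keep it; otherwise fall back to the target scene category.
    if all(cat == preliminary for cat in historical):
        updated = preliminary
    else:
        updated = target

    # If the scene has been switching too often within the preset period,
    # trust the preliminary category of the current image instead.
    switches = count_scene_switches(historical + [preliminary])
    return preliminary if switches > switch_threshold else updated


if __name__ == "__main__":
    history = ["backlight", "backlight", "normal", "backlight"]
    print(final_scene_category("normal", history, target="backlight",
                               switch_threshold=2))
```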
In some embodiments, different scene categories correspond to different image processing techniques, and the processing the image based on the final scene category of the image includes processing the image using an image processing technique corresponding to the final scene category of the image.
In some embodiments, the processing the image based on the final scene category of the image includes: determining whether the final scene category is a preset scene category, the preset scene category including at least one of a face recognition category or a license plate recognition category; in response to determining that the final scene category is the preset scene category, obtaining a target region within the image; determining, based on gray information of the target region, a current brightness of the target region; and processing the image based on the current brightness of the target region.
In some embodiments, the image is a human image that includes a human and the target region is a facial region, and the obtaining a target region within the image includes: identifying an initial region including the target region from the image based on an object detection model, the object detection model being a deep learning model. The initial region includes multiple objects. The multiple objects include the human. The identifying the initial region from the image includes: extracting one or more features of each object in the image; assigning a score for each object based on the object detection model; in response to the score of an object exceeding a threshold, determining a region of the image where the object is located as the initial region; trimming the initial region according to a predetermined aspect ratio to obtain a trimmed initial region; and determining the target region by performing a G-channel downsampling on the trimmed initial region.
In some embodiments, the object is the human. In some embodiments, when the initial region includes a whole body of the human, the trimming the initial region includes: trimming the initial region by applying a three-step trimming strategy. The three-step trimming strategy includes: obtaining a trimmed region by trimming a lower part of the initial region, the trimmed region including an upper part of the body of the human; trimming a portion of an upper part of the trimmed region and a portion of a lower part of the trimmed region; and symmetrically trimming a portion of a left side and a right side of the trimmed region to obtain the trimmed initial region. In some embodiments, when the initial region includes the upper part of the body of the human, the trimming the initial region includes: trimming the initial region by applying a two-step trimming strategy. The two-step trimming strategy includes: trimming a portion of an upper part of the initial region and a portion of the lower part of the initial region; and symmetrically trimming a portion of a left side and a right side of the initial region to obtain the trimmed initial region. In some embodiments, when the initial region includes a lower part of the body of the human, the trimming the initial region includes: removing the initial region.
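The following is a minimal, non-limiting sketch of the trimming strategies described above, assuming detection boxes in (x, y, width, height) form; the specific trimming ratios (e.g., one tenth of the box size) are illustrative placeholders, not values specified by the present disclosure.

```python
# Illustrative sketch of the three-step and two-step trimming strategies.
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def trim_whole_body(box: Box) -> Box:
    """Three-step strategy for a whole-body detection box."""
    x, y, w, h = box
    # Step 1: drop the lower half so only the upper part of the body remains.
    h = h // 2
    # Step 2: trim a margin from the top and bottom of the kept part.
    y, h = y + h // 10, h - 2 * (h // 10)
    # Step 3: symmetrically trim the left and right sides.
    x, w = x + w // 10, w - 2 * (w // 10)
    return x, y, w, h


def trim_upper_body(box: Box) -> Box:
    """Two-step strategy for an upper-body detection box."""
    x, y, w, h = box
    y, h = y + h // 10, h - 2 * (h // 10)   # top/bottom margins
    x, w = x + w // 10, w - 2 * (w // 10)   # symmetric left/right margins
    return x, y, w, h


def trim_initial_region(box: Box, body_part: str) -> Optional[Box]:
    """Dispatch to the strategy matching the detected body part."""
    if body_part == "whole":
        return trim_whole_body(box)
    if body_part == "upper":
        return trim_upper_body(box)
    return None  # lower body only: the initial region is removed
```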
In some embodiments, the processing the image based on the current brightness of the target region includes: in response to determining that the current brightness of the target region is less than a minimum value of a predetermined range, increasing the current brightness of the target region, wherein the further the current brightness falls below the minimum value of the predetermined range, the greater the increase of the current brightness is; and in response to determining that the current brightness of the target region is greater than a maximum value of the predetermined range, decreasing the current brightness of the target region, wherein the further the current brightness exceeds the maximum value of the predetermined range, the greater the decrease of the current brightness is.
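A minimal sketch of this brightness-adjustment rule is given below; the gain factor and the example range bounds are illustrative assumptions, and in practice the adjustment may be realized by changing exposure parameters rather than by directly scaling a brightness value.

```python
# Sketch: the further the current brightness falls outside the predetermined
# range, the larger the correction applied to it.


def adjust_brightness(current: float,
                      low: float,
                      high: float,
                      gain: float = 0.5) -> float:
    """Return an adjusted brightness for the target region."""
    if current < low:
        # The larger the shortfall, the larger the increase.
        return current + gain * (low - current)
    if current > high:
        # The larger the excess, the larger the decrease.
        return current - gain * (current - high)
    return current  # already within the desired range


if __name__ == "__main__":
    print(adjust_brightness(40.0, low=80.0, high=150.0))   # underexposed case
    print(adjust_brightness(200.0, low=80.0, high=150.0))  # overexposed case
```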
In another aspect of the present disclosure, a method is provided. The method is implemented on a computing device having at least one processor and at least one storage device. The method comprising: obtaining an image captured by a capture device, multiple historical images captured by the capture device before the image, and multiple scene categories; for each of the multiple scene categories, generating a confidence level of the image belonging to the scene category, the scene category corresponding to a confidence level threshold; for each of the multiple historical images, obtaining an initial scene category to which the historical image belongs; determining, based on initial scene categories of the multiple historical images, a target scene category to which the multiple historical images belong; determining multiple updated confidence level thresholds by updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories; determining, based on the multiple updated confidence level thresholds and confidence levels of the image corresponding to the multiple scene categories, a final scene category to which the image belongs; and processing the image based on the final scene category of the image.
In yet another aspect of the present disclosure, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium may include at least one set of instructions. When executed by at least one processor of a computing device, the at least one set of instructions may direct the at least one processor to perform acts of: obtaining an image captured by a capture device, multiple historical images captured by the capture device before the image, and multiple scene categories; for each of the multiple scene categories, generating a confidence level of the image belonging to the scene category, the scene category corresponding to a confidence level threshold; for each of the multiple historical images, obtaining an initial scene category to which the historical image belongs; determining, based on initial scene categories of the multiple historical images, a target scene category to which the multiple historical images belong; determining multiple updated confidence level thresholds by updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories; determining, based on the multiple updated confidence level thresholds and confidence levels of the image corresponding to the multiple scene categories, a final scene category to which the image belongs; and processing the image based on the final scene category of the image.
In yet another aspect of the present disclosure, a system for adjusting an exposure parameter is provided. The system may include an obtaining module, a current brightness determination module, a target brightness determination module, and an exposure parameter determination module. The obtaining module may be configured to obtain a target region within an image captured by a capture device. The current brightness determination module may be configured to determine a current brightness of the target region based on gray information of the target region. The target brightness determination module may be configured to determine a target brightness of the target region based on the current brightness of the target region. The exposure parameter determination module may be configured to determine one or more exposure parameters of the capture device based on the target brightness of the target region.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. The drawings are not to scale. These embodiments are non-limiting schematic embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well known methods, procedures, systems, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise,” “comprises,” and/or “comprising,” “include,” “includes,” and/or “including,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood that the terms “system,” “unit,” “module,” and/or “block” used herein are one method to distinguish different components, elements, parts, sections, or assemblies of different levels in ascending order. However, the terms may be displaced by another expression if they achieve the same purpose.
The modules (or units, blocks) described in the present disclosure may be implemented as software and/or hardware modules and may be stored in any type of non-transitory computer-readable medium or another storage device. In some embodiments, a software module may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules or from themselves, and/or can be invoked in response to detected events or interrupts. Software modules configured for execution on computing devices can be provided on a computer readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that requires installation, decompression, or decryption prior to execution). Such software code can be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions can be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules (e.g., circuits) can be comprised of connected or coupled logic units, such as gates and flip-flops, and/or can be comprised of programmable units, such as programmable gate arrays or processors. The modules or computing device functionality described herein are preferably implemented as hardware modules, but can be software modules as well. In general, the modules described herein refer to logical modules that can be combined with other modules or divided into sub-units despite their physical organization or storage.
Generally, the word “module,” “sub-module,” “unit,” or “block,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions. A module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or another storage device. In some embodiments, a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts.
Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution). Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules/units/blocks may be comprised of connected logic components, such as gates and flip-flops, and/or of programmable units, such as programmable gate arrays or processors. The modules/units/blocks or computing device functionality described herein may be implemented as software modules/units/blocks, but may be represented in hardware or firmware. In general, the modules/units/blocks described herein refer to logical modules/units/blocks that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks despite their physical organization or storage. The description may be applicable to a system, an engine, or a portion thereof.
It will be understood that when a unit, engine, module or block is referred to as being “on,” “connected to,” or “coupled to,” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
These and other features, and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure.
The present disclosure relates to systems and methods for image processing. The systems and methods may obtain an image captured by a capture device, multiple historical images captured by the capture device before the image, and multiple scene categories. For each of the multiple scene categories, the systems and methods may generate a confidence level of the image belonging to the scene category. The scene category corresponds to a confidence level threshold. For each of the multiple historical images, the systems and methods may obtain an initial scene category to which the historical image belongs. The systems and methods may determine, based on initial scene categories of the multiple historical images, a target scene category to which the multiple historical images belong. The systems and methods may determine multiple updated confidence level thresholds by updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories. The systems and methods may determine, based on the multiple updated confidence level thresholds and confidence levels of the image corresponding to the multiple scene categories, a final scene category to which the image belongs. The systems and methods may process the image based on the final scene category of the image.
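Merely for illustration, the following Python sketch outlines one possible arrangement of this flow; the helper callables (generate_confidences, initial_scene_category, decide_final_category, process_image) and the threshold-decreasing factor of 0.8 are hypothetical placeholders for the operations detailed below, not the exact implementation of the disclosed systems and methods.

```python
# A high-level, hedged sketch of the processing flow summarized above.
from collections import Counter
from typing import Callable, Dict, List


def process_frame(image,
                  historical_images: List,
                  thresholds: Dict[str, float],
                  generate_confidences: Callable,
                  initial_scene_category: Callable,
                  decide_final_category: Callable,
                  process_image: Callable):
    # 1. Confidence level of the image for each scene category.
    confidences = generate_confidences(image)
    # 2. Initial scene category of each historical image.
    history = [initial_scene_category(h) for h in historical_images]
    # 3. Target scene category of the historical images (most frequent here).
    target = Counter(history).most_common(1)[0][0]
    # 4. Updated thresholds: lower the target category's threshold.
    updated = {c: t * (0.8 if c == target else 1.0)
               for c, t in thresholds.items()}
    # 5. Final scene category from the confidences and updated thresholds.
    final = decide_final_category(confidences, updated, history, target)
    # 6. Process the image according to the final scene category.
    return process_image(image, final)
```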
According to some embodiments of the present disclosure, by determining the multiple updated confidence level thresholds by updating the confidence level threshold based on the target scene category of the multiple historical images captured before the image, and determining a preliminary scene category of the image based on the multiple updated confidence level thresholds, the difficulty of determining the preliminary scene category of the image as belonging to the target scene category is increased, which maintains the stability of scene recognition.
The capture device 110 may be configured to capture one or more images. As used in this application, an image may be a still image, a video, a stream video, or a video frame obtained from a video. The image may be a two-dimensional (2D) image, a three-dimensional (3D) image, a four-dimensional (4D) image, or the like. In some embodiments, the image may include one or more target regions, such as a facial region, a region including a license plate, or the like. The capture device 110 may be or include one or more cameras. In some embodiments, the capture device 110 may be a digital camera, a video camera, a security camera, a web camera, a smartphone, a tablet, a laptop, a video gaming console equipped with a web camera, a camera with multiple lenses, a camcorder, etc. In some embodiments, the capture device 110 may be a visible light camera, an infrared camera, or the like.
The network 120 may facilitate the exchange of information and/or data. In some embodiments, one or more components of the image processing system 100 (e.g., the capture device 110, the terminal 130, the processing device 140, the storage device 150) may send information and/or data to another component(s) in the image processing system 100 via the network 120. For example, the processing device 140 may process an image obtained from the capture device 110 via the network 120. As another example, the capture device 110 may obtain user instructions from the terminal 130 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network, or combination thereof. Merely by way of example, the network 120 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public telephone switched network (PSTN), a Bluetooth™ network, a ZigBee™ network, a near field communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network 120 may include one or more network access points. For example, the network 120 may include wired or wireless network access points such as base stations and/or internet exchange points 120-1, 120-2, . . . , through which one or more components of the image processing system 100 may be connected to the network 120 to exchange data and/or information.
The terminal 130 may include a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, or the like, or any combination thereof. In some embodiments, the mobile device 130-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a bracelet, footgear, eyeglasses, a helmet, a watch, clothing, a backpack, an accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant (PDA), a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, an augmented reality glass, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include a Google Glass™, an Oculus Rift™, a HoloLens™, a Gear VR™, etc. In some embodiments, the terminal 130 may remotely operate the capture device 110. In some embodiments, the terminal 130 may operate the capture device 110 via a wireless connection. In some embodiments, the terminal 130 may receive information and/or instructions inputted by a user, and send the received information and/or instructions to the capture device 110 or to the processing device 140 via the network 120. In some embodiments, the terminal 130 may be part of the processing device 140. In some embodiments, the terminal 130 may be omitted.
In some embodiments, the processing device 140 may process data obtained from the capture device 110, the terminal 130, or the storage device 150. For example, the processing device 140 may obtain an image and generate a confidence level of the image belonging to each scene category. As another example, the processing device 140 may determine, based on multiple confidence level thresholds and confidence levels of the image corresponding to multiple scene categories, a final scene category to which the image belongs. The processing device 140 may be a central processing unit (CPU), a digital signal processor (DSP), a system on a chip (SoC), a microcontroller unit (MCU), or the like, or any combination thereof. In some embodiments, the processing device 140 may be a single server or a server group. The server group may be centralized or distributed. In some embodiments, the processing device 140 may be local to or remote from one or more other components of the image processing system 100. For example, the processing device 140 may access information and/or data stored in the capture device 110, the terminal 130, and/or the storage device 150 via the network 120. As another example, the processing device 140 may be directly connected to the capture device 110, the terminal 130, and/or the storage device 150, to access stored information and/or data. In some embodiments, the processing device 140 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
The storage device 150 may store data and/or instructions. In some embodiments, the storage device 150 may store data or images obtained from the capture device 110, the terminal 130 and/or the processing device 140. In some embodiments, the storage device 150 may store data and/or instructions that the processing device 140 may execute or use to perform exemplary methods described in the present disclosure. In some embodiments, the storage device 150 may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc. Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. Exemplary volatile read-and-write memory may include a random-access memory (RAM). Exemplary RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), and a zero-capacitor RAM (Z-RAM), etc. Exemplary ROM may include a mask ROM (MROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a compact disk ROM (CD-ROM), and a digital versatile disk ROM, etc. In some embodiments, the storage device 150 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
In some embodiments, the storage device 150 may be connected to the network 120 to communicate with one or more components of the image processing system 100 (e.g., the capture device 110, the terminal 130, the processing device 140). One or more components in the image processing system 100 may access the data or instructions stored in the storage device 150 via the network 120. In some embodiments, the storage device 150 may be directly connected to or communicate with one or more components in the image processing system 100 (e.g., the capture device 110, the terminal 130, the processing device 140). In some embodiments, the storage device 150 may be part of the capture device 110, or the processing device 140.
The computing device 200 may be a special purpose computer used to implement a multimedia content processing system for the present disclosure. The computing device 200 may be used to implement any component of the multimedia content processing system as described herein. For example, the processing device 140 may be implemented on the computing device, via its hardware, software program, firmware, or a combination thereof. Although only one such computer is shown, for convenience, the computer functions relating to the image processing as described herein may be implemented in a distributed fashion on a number of similar platforms, to distribute the processing load.
The computing device 200, for example, may include a COM port 250 connected to and/or from a network connected thereto to facilitate data communications. The computing device 200 may also include a processor 220, in the form of one or more processors (or CPUs), for executing program instructions. The exemplary computing device may include an internal communication bus 210, different types of program storage units and data storage units (e.g., a disk 270, a read only memory (ROM) 230, a random access memory (RAM) 240), various data files applicable to computer processing and/or communication. The exemplary computing device may also include program instructions stored in the ROM 230, RAM 240, and/or another type of non-transitory storage medium to be executed by the processor 220. The method and/or process of the present disclosure may be implemented as the program instructions. The computing device 200 also includes an I/O device 260 that may support the input and/or output of data flows between the computing device 200 and other components. The computing device 200 may also receive programs and data via the communication network.
Merely for illustration, only one CPU and/or processor is described in the computing device 200. However, it should be noted that the computing device 200 in the present disclosure may also include multiple CPUs and/or processors, thus operations and/or method steps that are performed by one CPU and/or processor as described in the present disclosure may also be jointly or separately performed by the multiple CPUs and/or processors. For example, if in the present disclosure the CPU and/or processor of the computing device 200 executes both step A and step B, it should be understood that step A and step B may also be performed by two different CPUs and/or processors jointly or separately in the computing device 200 (e.g., the first processor executes operation A and the second processor executes operation B, or the first and second processors jointly execute operations A and B).
In some embodiments, an operating system 370 (e.g., iOS™, Android™, Windows Phone™, etc.) and one or more applications 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340. The applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to image processing or other information from the image processing system 100. User interactions with the information stream may be achieved via the I/O 350 and provided to the storage device 150, the capture device 110 and/or other components of the image processing system 100.
To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform(s) for one or more of the elements described herein. A computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device. A computer may also act as a system if appropriately programmed.
The obtaining module 410 may be configured to obtain an image captured by a capture device, multiple historical images captured by the capture device before the image, and multiple scene categories. More descriptions regarding the obtaining of the image, the multiple historical images, and the multiple scene categories may be found elsewhere in the present disclosure (e.g.,
The scene category determination module 420 may be configured to determine a final scene category to which the image belongs. Specifically, for each of the multiple scene categories, the scene category determination module 420 may generate a confidence level of the image belonging to the scene category. The scene category corresponds to a confidence level threshold. For each of the multiple historical images, the scene category determination module 420 may further determine an initial scene category to which the historical image belongs. The scene category determination module 420 may determine, based on initial scene categories of the multiple historical images, a target scene category to which the multiple historical images belong. The scene category determination module 420 may determine multiple updated confidence level thresholds by updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories. The scene category determination module 420 may determine, based on the multiple updated confidence level thresholds and confidence levels of the image corresponding to the multiple scene categories, the final scene category to which the image belongs. More descriptions regarding the determining of the final scene category of the image may be found elsewhere in the present disclosure (e.g.,
The image processing module 430 may be configured to process the image based on the final scene category of the image.
In some embodiments, the image processing module 430 may include an obtaining module 402, a current brightness determination module 404, a target brightness determination module 406, and a capture device adjusting module 408.
The obtaining module 402 may be configured to obtain information and/or data related to the image processing system 100. In some embodiments, the obtaining module 402 may determine whether the final scene category is a preset scene category. The preset scene category includes at least one of a face recognition category or a license plate recognition category. In response to determining that the final scene category is the preset scene category, the obtaining module 402 may obtain a target region within an image captured by a capture device (e.g., the capture device 110). More descriptions regarding the obtaining of the target region may be found elsewhere in the present disclosure (e.g.,
The current brightness determination module 404 may be configured to determine a current brightness of the target region based on gray information of the target region and process the image based on the current brightness of the target region. More descriptions regarding the determination of the current brightness and the processing of the image may be found elsewhere in the present disclosure (e.g.,
The target brightness determination module 406 may be configured to determine a target brightness of the target region based on the current brightness of the target region. More descriptions regarding the determination of the target brightness may be found elsewhere in the present disclosure (e.g., operation 525 of process 500D, and the descriptions thereof).
The capture device adjusting module 408 may be configured to determine one or more exposure parameters of the capture device based on the target brightness of the target region. In some embodiments, the capture device adjusting module 408 may adjust the exposure parameter(s) based on the target brightness of the target region, and determine a desired brightness of the target region.
The modules in the processing device 140 may be connected to or communicate with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a Local Area Network (LAN), a Wide Area Network (WAN), a Bluetooth, a ZigBee, a Near Field Communication (NFC), or the like, or any combination thereof.
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the processing device 140 may further include one or more additional modules. For example, the processing device 140 may further include a storage module (not shown in
In 501, the processing device 140 (e.g., the obtaining module 410) may obtain an image captured by a capture device (e.g., the capture device 110), multiple historical images captured by the capture device before the image, and multiple scene categories.
In some embodiments, the capture device may include one or more visible light cameras. The capture device may be applied to a road (e.g., a highway, a provincial road, a city road), a station (e.g., a bus station, a train station, a metro station), a shopping mall, a hospital, a hotel, a scenic area, a community, or the like, or any combination thereof. In some embodiments, the capture device may be used to capture an image including vehicle(s) (also referred to as vehicle image), an image including non-motor vehicle(s) (also referred to as non-motor vehicle image), an image including human(s) (also referred to as human image), or the like, or any combination thereof.
In some embodiments, the image may be a two-dimensional (2D) image, a three-dimensional (3D) image, a four-dimensional (4D) image, or the like. The image may include a visible light image. The visible light image may refer to an image captured by the capture device (e.g., a visible light camera) within a visible light range. In some embodiments, the image may be a processed image (e.g., an image in YUV domain, an image in RGB domain). Alternatively, the image may be a raw image (an unprocessed image), thereby saving the subsequent image signal processing (ISP) process and improving a response speed of exposure parameter adjustment.
Each image corresponds to a scene category, i.e., a scene in the image belongs to the scene category. As used herein, the scene category of an image refers to a category to which a scene in the image can be classified during an image recognition process. It should be noted that the terms “scene” and “scene category” in the present disclosure can be used interchangeably.
In some embodiments, the scene (or scene category) in an image may be associated with a location of the capture device, an environment in which the capture device is located, a function to be achieved by using the capture device, or the like, or any combination thereof.
In some embodiments, according to the location of the capture device, the scene may be classified into an indoor scene and/or an outdoor scene. Specifically, the scene in an image captured by a capture device arranged indoors may be classified into an indoor scene category. The scene in an image captured by a capture device arranged outdoors may be classified into an outdoor scene category. Exemplary indoor scenes may include a home scene (such as a living room scene, a kitchen scene, a bedroom scene, etc.), a business scene (such as a shop scene, an office scene, a restaurant scene, etc.), etc. Exemplary outdoor scenes may include a natural scene (such as a forest scene, a mountain scene, a beach scene, etc.), an urban scene (such as a street scene, a building scene, a square scene, etc.), etc. In some embodiments, the scene may also be classified according to the actual location of the capture device. For example, when the capture device is arranged around a road, the scene in an image captured by the capture device may be classified into a road scene, such as a highway scene, a city street scene, a country road scene, etc. As another example, when the capture device is arranged in a bookstore, the scene in an image captured by the capture device may be classified as a bookstore scene. As a further example, when the capture device is arranged in a classroom, the scene in an image captured by the capture device may be classified as a classroom scene.
In some embodiments, the scene may be classified according to the environment in which the capture device is located. For example, the scene may be classified according to the weather in which the capture device is located, such as a foggy scene, a rainy scene, a snowy scene, a sunny scene, etc. As another example, the scene may be classified according to the orientation of the capture device relative to the light, such as a backlight scene, a facing light scene, etc. In some embodiments, the scene may be classified according to the function to be achieved by using the capture device, such as a face recognition scene, a license plate recognition scene, etc.
In some embodiments, the multiple scene categories may be preset by a user according to his/her actual needs. For example, the user may designate both a bookstore scene and a classroom scene as a learning scene.
The multiple historical images and the image are captured within a certain time period (such as 1 s, 3 s, 5 s, 10 s, etc.). In some embodiments, the multiple historical images and the image are continuous images captured by the capture device. In some embodiments, the multiple historical images and the image may not be continuous images. For example, the processing device 140 may obtain the historical images and the image from the capture device at a preset frequency that is less than an intrinsic sampling frequency of the capture device.
In 502, for each of the multiple scene categories, the processing device 140 (e.g., the scene category determination module 420) may generate a confidence level of the image belonging to the scene category.
The confidence level of the image corresponding to a scene category is an evaluation metric used to determine whether the image belongs to the scene category. Specifically, each scene category may correspond to a confidence level threshold. If the confidence level corresponding to a scene category is greater than the corresponding confidence level threshold, it indicates that the image belongs to that scene category. In some embodiments, the confidence level thresholds corresponding to the multiple scene categories may be the same or different. In some embodiments, the confidence level thresholds may be set according to a default setting of the image processing system 100 or preset by a user or operator via the terminal 130.
In some embodiments, the confidence level corresponding to each scene category may be determined based on a similarity between a target feature of the image and a reference feature of the scene category. The reference feature may include preset image feature information of a reference image corresponding to each scene category. The reference features corresponding to different scene categories may be different. The reference feature corresponding to each scene category may have scene image feature information specific to that scene category. Specifically, for a certain scene category, the processing device 140 may extract the target feature from the image using an image feature extraction algorithm. Exemplary image feature extraction algorithms may include a histogram of oriented gradient (HOG) algorithm, a local binary pattern (LBP) algorithm, a Haar algorithm, etc. The processing device 140 may also extract the reference feature from a reference image of the certain scene category. The processing device 140 may determine the similarity between the target feature of the image corresponding to the certain scene category and the reference feature of the certain scene category as the confidence level corresponding to the certain scene category. Alternatively, the processing device 140 may normalize or map the similarities of the multiple scene categories and determine the normalized or mapped similarities as the confidence levels corresponding to the multiple scene categories.
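Merely by way of example, the following sketch shows one way the similarities may be mapped to confidence levels, assuming that fixed-length feature vectors (e.g., HOG descriptors) have already been extracted for the image and the reference images; the softmax-style normalization used here is one of several possible mappings.

```python
# A minimal numpy sketch of turning feature similarities into per-category
# confidence levels. Feature extraction itself is assumed and not shown.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def confidence_levels(target_feature: np.ndarray,
                      reference_features: dict) -> dict:
    """Map each scene category to a confidence level in [0, 1]."""
    sims = {cat: cosine_similarity(target_feature, ref)
            for cat, ref in reference_features.items()}
    # Optional normalization of the raw similarities (softmax mapping).
    vals = np.array(list(sims.values()))
    probs = np.exp(vals) / np.exp(vals).sum()
    return dict(zip(sims.keys(), probs.tolist()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = {"foggy": rng.random(128), "rainy": rng.random(128)}
    print(confidence_levels(rng.random(128), refs))
```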
In some embodiments, the confidence level corresponding to each scene category may be determined by normalizing or mapping features of the image.
In some embodiments, the processing device 140 may input the image into the scene recognition model. The scene recognition model may output the confidence level of the image corresponding to each scene category. Specifically, for each scene category, a feature extraction layer of the scene recognition model may extract the target feature from the image. In some embodiments, a backbone net of the feature extraction layer may include a MobileNet feature extraction net, an Inception feature extraction net, etc. In some embodiments, the feature extraction layer may also extract the reference feature from the reference image corresponding to each scene category. Further, the scene recognition model may determine a similarity between the target feature of the image and the reference feature of the scene category. For example, the similarity may be determined by calculating a cosine similarity or a Euclidean distance between the target feature of the image and the reference feature of the scene category. The scene recognition model may designate the similarity between the target feature and the reference feature as the confidence level of the image belonging to the scene category.
In some embodiments, the scene recognition model may be constructed based on a deep learning model. The deep learning model may be trained based on a plurality of sets of training data. Each set of training data may include multiple sample images, and the label of each sample image includes sample confidence levels of the sample image belonging to the multiple scene categories. However, in practical scene recognition tasks, extreme imbalance in the amount of training data across different scene categories is common. That is, in a training set, the amount of labeled data for a certain common scene may be very large, while the amount of labeled data for some uncommon scenes may be relatively small, and the ratio between the two may be very large, which can lead to the scene recognition model easily misclassifying uncommon scene categories as common scene categories. In such cases, common processing techniques include adding more data of scarce categories, increasing loss weights of scarce category samples, using a focal loss function, and so on. In some embodiments of the present disclosure, a focal loss function is used as a classification loss function to train the scene recognition model.
The focal loss function is given by Equation (1) as follows:

FL(P) = -α(1-P)^λ log(P),  (1)

where FL(P) denotes the classification loss function; P denotes a sample confidence level of each sample image belonging to each scene category; α denotes a scene category weight; and λ denotes a weight of a scarce category sample (or a difficult-to-classify sample). Both the scene category weight α and the weight λ of the scarce category sample are hyperparameters, that is, they can be customized according to training needs. The sample confidence level of a sample image belonging to a specific scene category indicates a similarity between a sample target feature corresponding to the specific scene category in the sample image and a sample reference feature of the specific scene category.
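For illustration, a direct implementation of Equation (1) may look like the following sketch, where alpha and lam stand for the hyperparameters α and λ; the example values are arbitrary.

```python
# A hedged sketch of the focal loss in Equation (1).
import numpy as np


def focal_loss(p: np.ndarray, alpha: float = 0.25, lam: float = 2.0) -> np.ndarray:
    """FL(P) = -alpha * (1 - P)**lam * log(P), applied element-wise to the
    confidence levels P of the ground-truth scene categories."""
    p = np.clip(p, 1e-7, 1.0)  # numerical stability
    return -alpha * (1.0 - p) ** lam * np.log(p)


if __name__ == "__main__":
    # A well-classified sample (P=0.9) contributes far less than a hard one.
    print(focal_loss(np.array([0.9, 0.3])))
```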
In some embodiments, in order to avoid the drawbacks of using a single focal loss function, such as sensitivity to outlier samples and poor handling of outlier points at classification boundaries, a cluster center feature vector for each scene category is added during the training process, and similarities between feature vectors are used as a loss function.
Merely by way of example, a network structure of the deep learning model is first determined. The feature extraction layer of the deep learning model may be a preset feature extraction net according to the actual needs. Then, a focal loss function that includes the cluster center feature vector for each scene category is constructed for training the deep learning model. During the construction process, a multi-center loss function (such as a SoftTriple loss function) is constructed, and a distance metric between each feature vector of a sample image and a feature vector of a specific scene category is determined as a loss function, which is combined with the focal loss function to increase the weight of the scarce category sample (or a difficult-to-classify sample).
In some embodiments, the process of constructing the multi-center loss function may include obtaining sample target features corresponding to several sample images by inputting the several sample images into the deep learning model for feature extraction; determining multiple reference features corresponding to the multiple scene categories; determining a similarity between a sample target feature corresponding to each scene category in the sample image and a sample reference feature of the scene category as a center similarity; determining a multi-center loss based on the multi-center loss function and the center similarities; and using the multi-center loss to adjust the parameters in the deep learning model to obtain a trained deep learning model (i.e., the scene recognition model).
In some embodiments, a classification loss obtained from the focal loss function and the multi-center loss may be weighted and summed to obtain a target loss. Subsequently, the target loss is used to adjust the parameters in the deep learning model, resulting in the trained deep learning model (i.e., the scene recognition model).
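Merely by way of example, the weighted combination of the classification loss and the multi-center loss may be sketched as follows; the simple nearest-center cosine term used here stands in for a SoftTriple-style multi-center loss, and the loss weights are illustrative assumptions rather than values given in the present disclosure.

```python
# A simplified, hedged sketch of combining the focal (classification) loss
# with a multi-center loss over cluster-center feature vectors.
import numpy as np


def multi_center_loss(feature: np.ndarray, centers: np.ndarray) -> float:
    """Distance between a sample feature and its nearest class center."""
    sims = centers @ feature / (
        np.linalg.norm(centers, axis=1) * np.linalg.norm(feature) + 1e-12)
    return float(1.0 - sims.max())  # smaller when close to some center


def target_loss(classification_loss: float,
                center_loss: float,
                w_cls: float = 1.0,
                w_center: float = 0.5) -> float:
    """Weighted sum of the two losses used to update the model parameters."""
    return w_cls * classification_loss + w_center * center_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.random(64)
    class_centers = rng.random((3, 64))  # e.g., 3 centers for one category
    print(target_loss(0.42, multi_center_loss(feat, class_centers)))
```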
According to some embodiments of the present disclosure, during the training process of the deep learning model, the multi-center loss function (such as a SoftTriple loss function) combined with the focal loss function is used to train a feature extraction layer and a set of feature vectors measuring cluster centers of the scene categories. The trained deep learning model (i.e., the scene recognition model) directly eliminates the prior information of imbalanced data classification samples and converts it into a measurement problem between feature vectors of various scene category clusters, which can effectively solve the problem of false detection in scene category determination caused by extreme sample imbalance.
In 503, for each of the multiple historical images, the processing device 140 (e.g., the scene category determination module 420) may obtain an initial scene category to which the historical image belongs.
Specifically, the processing device 140 may determine a confidence level of each historical image corresponding to each scene category, and determine the initial scene category of the historical image based on the confidence levels of the historical image corresponding to the multiple scene categories. For example, the processing device 140 may input each historical image into the scene recognition model. The scene recognition model may output the confidence level of the historical image corresponding to each scene category. The processing device 140 may determine a scene category with a corresponding confidence level greater than the corresponding confidence level threshold (or the corresponding updated confidence level threshold determined as described in operation 505 below) as the initial scene category of the historical image.
In 504, the processing device 140 (e.g., the scene category determination module 420) may determine, based on initial scene categories of the multiple historical images, a target scene category to which the multiple historical images belong.
In some embodiments, the initial scene categories to which the multiple historical images belong may be the same or different.
In some embodiments, if the initial scene categories of the multiple historical images are the same, the processing device 140 may determine the initial scene category as the target scene category to which the multiple historical images belong. In some embodiments, if at least a portion of the initial scene categories of the multiple historical images are different, the processing device 140 may determine the initial scene category of a specific historical image as the target scene category to which the multiple historical images belong. In some embodiments, the specific historical image may be a historical image closest in time to the image, or a historical image whose initial scene category appears the most times among all the initial scene categories.
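By way of illustration, the selection of the target scene category may be sketched as follows; exposing the choice between the most frequent category and the category of the most recent historical image as a flag is an assumption made for demonstration purposes.

```python
# Sketch of choosing the target scene category of the historical images.
from collections import Counter
from typing import List


def target_scene_category(initial_categories: List[str],
                          use_most_recent: bool = False) -> str:
    if len(set(initial_categories)) == 1:
        return initial_categories[0]       # all historical images agree
    if use_most_recent:
        return initial_categories[-1]      # historical image closest in time
    # Otherwise take the category that appears the most times.
    return Counter(initial_categories).most_common(1)[0][0]


if __name__ == "__main__":
    print(target_scene_category(["foggy", "rainy", "foggy", "foggy"]))
```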
In 505, the processing device 140 (e.g., the scene category determination module 420) may determine multiple updated confidence level thresholds by updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories.
In some embodiments, the processing device 140 may update the confidence level threshold corresponding to the target scene category, and maintain the remaining confidence level thresholds. Specifically, the processing device 140 may fine tune the confidence level threshold corresponding to the target scene category according to usage requirements. For example, the processing device 140 may increase or decrease the confidence level threshold corresponding to the target scene category. The processing device 140 may determine the updated confidence level threshold corresponding to the target scene category and the remaining confidence level thresholds as the multiple updated confidence level thresholds.
In some embodiments, the processing device 140 may also maintain the confidence level threshold corresponding to the target scene category, and update the remaining confidence level thresholds. The processing device 140 may determine the confidence level threshold corresponding to the target scene category and the updated remaining confidence level thresholds as the multiple updated confidence level thresholds.
In some embodiments, the processing device 140 may update the at least a portion of the multiple confidence level thresholds by decreasing the confidence level threshold corresponding to the target scene category. For example, the processing device 140 may multiply the current confidence level threshold by a preset value to determine the updated confidence level threshold. The preset value is less than 1, such as 0.9, 0.8, 0.7, 0.5, etc. At this time, the processing device 140 may maintain the remaining confidence level thresholds corresponding to scene categories other than the target scene category. For example, when the multiple scene categories include a foggy scene category and a rainy scene category, if the target scene category is the foggy scene category, the processing device 140 may decrease the confidence level threshold of the foggy scene category to obtain an updated confidence level threshold for the foggy scene category, and maintain the confidence level threshold of the rainy scene category.
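A minimal sketch of this update, using the example preset value 0.8, is given below; the dictionary representation of the thresholds is an assumption.

```python
def update_thresholds(thresholds, target_category, preset_value=0.8):
    """Decrease only the target category's threshold; keep the remaining thresholds unchanged."""
    updated = dict(thresholds)
    updated[target_category] = thresholds[target_category] * preset_value
    return updated

print(update_thresholds({"foggy": 0.70, "rainy": 0.70}, "foggy"))
# {'foggy': 0.56, 'rainy': 0.7}
```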
According to some embodiments of the present disclosure, in the absence of significant changes in the actual scene, the confidence levels of several continuous images corresponding to a specific scene category may fluctuate around the confidence level threshold corresponding to the specific scene category, so that the scene categories of the continuous images are determined to constantly switch between belonging to and not belonging to the specific scene category, resulting in inaccurate determination of the scene categories of the continuous images. By determining the multiple updated confidence level thresholds based on the target scene category of the multiple historical images captured before the image, and determining a preliminary scene category of the image based on the multiple updated confidence level thresholds, the determination of whether the image belongs to the target scene category becomes less sensitive to such fluctuations, which maintains the stability of scene recognition.
In 506, the processing device 140 (e.g., the scene category determination module 420) may determine, based on the multiple updated confidence level thresholds and confidence levels of the image corresponding to the multiple scene categories, a final scene category to which the image belongs.
In some embodiments, for each of the multiple scene categories corresponding to the multiple updated confidence level thresholds, the processing device 140 may determine a relationship between the confidence level of the image belonging to the scene category and an updated confidence level threshold of the scene category. Specifically, the processing device 140 may compare the confidence level of the image belonging to the scene category and the updated confidence level threshold of the scene category, and determine the comparison result as the relationship between the confidence level of the image belonging to the scene category and the updated confidence level threshold of the scene category. The processing device 140 may determine, based on relationships corresponding to the multiple scene categories, a preliminary scene category to which the image belongs. Specifically, the processing device 140 may determine a scene category with a corresponding confidence level greater than the corresponding updated confidence level threshold as the preliminary scene category of the image. The processing device 140 may determine, based on the preliminary scene category of the image, the final scene category to which the image belongs.
In some embodiments, the processing device 140 may determine any one of the preliminary scene category of the image and the initial scene categories of the multiple historical images as the final scene category of the image. For example, the processing device 140 may directly determine the preliminary scene category as the final scene category of the image.
In some embodiments, the processing device 140 may determine the final scene category of the image based on a distribution situation of the initial scene categories of the multiple historical images and the preliminary scene category of the image. The distribution situation is determined based on statistical values of the initial scene categories of the multiple historical images and the preliminary scene category of the image. Merely by way of example, the processing device 140 may determine a scene category that appears the most times among all the initial scene categories and the preliminary scene category as the final scene category of the image. For example, in the case where the total number of the historical images and the image is 10 frames, the statistical value of a first scene category may be 8, the statistical value of a second scene category may be 1, and the statistical value of a third scene category may be 1. The processing device 140 may determine the first scene category as the final scene category of the image.
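The distribution-based rule in the example above can be sketched as follows:

```python
from collections import Counter

def final_by_distribution(initial_categories, preliminary_category):
    """Keep the scene category appearing the most among the initial categories and the preliminary category."""
    return Counter(initial_categories + [preliminary_category]).most_common(1)[0][0]

# 10 frames in total: 8 of a first category, 1 of a second, and the image's preliminary category is a third.
print(final_by_distribution(["first"] * 8 + ["second"], "third"))  # first
```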
According to some embodiments of the present disclosure, for the image to be processed, determining the final scene category of the image based on the updated confidence level thresholds corresponding to the multiple scene categories can improve the stability and/or accuracy of the determined final scene category. Specifically, by determining the multiple updated confidence level thresholds through updating at least a portion of the multiple confidence level thresholds corresponding to the multiple scene categories based on the target scene category, it is ensured that the multiple updated confidence level thresholds are highly adapted to the scene categories when determining the preliminary scene category of the image, thereby improving the accuracy of the final scene category of the image determined based on the highly adapted confidence level thresholds.
In some embodiments, the processing device 140 may determine whether the preliminary scene category of the image and the initial scene categories of a preset count of historical images among the multiple historical images are consistent. In response to the preliminary scene category of the image and the initial scene categories of the preset count of historical images among the multiple historical images being consistent, the processing device 140 may designate the preliminary scene category of the image as an updated scene category of the image. For example, taking a preset count of one as an example, if the initial scene category of the historical image is a foggy scene category and the preliminary scene category of the image is also the foggy scene category, in response to the preliminary scene category of the image and the initial scene category of the historical image being consistent, the processing device 140 may determine the preliminary scene category as the updated scene category.
In some embodiments, in response to an inconsistency between the preliminary scene category of the image and the initial scene categories of the preset count of historical images among the multiple historical images, the processing device 140 may determine the target scene category as the updated scene category. For example, taking a preset count of one as an example, if the initial scene category of the historical image is a foggy scene category and the preliminary scene category of the image is a rainy scene category, in response to the inconsistency between the preliminary scene category of the image and the initial scene category of the historical image, the processing device 140 may determine the target scene category as the updated scene category.
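The consistency check of these two paragraphs can be sketched as follows (the argument names are illustrative):

```python
def updated_scene_category(preliminary, recent_initial_categories, target, preset_count=1):
    """Keep the preliminary category if it matches the last `preset_count` initial categories; otherwise use the target category."""
    recent = recent_initial_categories[-preset_count:]
    if all(category == preliminary for category in recent):
        return preliminary
    return target

print(updated_scene_category("foggy", ["rainy", "foggy"], target="foggy"))  # foggy (consistent)
print(updated_scene_category("rainy", ["foggy"], target="foggy"))           # foggy (inconsistent)
```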
According to some embodiments of the present disclosure, in practice, a few misjudgments in the scene category determination of the current image cannot be avoided, or the scene captured in the image may undergo an instantaneous change even though the actual scene category does not change. By updating or maintaining the preliminary scene category to which the current image belongs, the obtained updated scene category to which the current image belongs can ensure the accuracy of the subsequent determination of the final scene category of the image.
The processing device 140 may determine the final scene category of the image based on the updated scene category. For example, the processing device 140 may directly determine the updated scene category as the final scene category.
In some embodiments, the processing device 140 may customize the final scene category of the image based on the preliminary scene category and the updated scene category. Specifically, a user may assign a preset weight to each scene category. For example, if the weight for the foggy scene category is greater than that for the rainy scene category, in the case where the preliminary scene category and the updated scene category of the image include the foggy scene category and the rainy scene category, the processing device 140 may determine the final scene category of the image as the foggy scene category. In some embodiments, the processing device 140 may designate any one of the preliminary scene category and the updated scene category as the final scene category of the image.
In some embodiments, the processing device 140 may choose the final scene category from the preliminary scene category and the updated scene category based on a target distribution situation of the updated scene category of the image and final scene categories of a second preset count of historical images. In some embodiments, the target distribution situation may be determined based on statistical values of scene categories of a predetermined count of images. That is, the predetermined count is equal to a sum of the second preset count and a count of images to be processed (e.g., one). In some embodiments, the predetermined count may be greater than or equal to a count of the multiple historical images. That is to say, the count of historical images contained in the predetermined count of images may be greater than or equal to the count of historical images in the aforementioned multiple historical images. The predetermined count of images may include the multiple historical images captured before the image. For example, the predetermined count of images may be 200 frames, including the image and 199 frames of historical images. Thus, the target distribution situation is a distribution situation of the updated scene category of the image and the final scene categories of the 199 frames of historical images.
In some embodiments, the target distribution situation may be a count of times that various scene category combinations appear, and each scene category combination may include at least one scene category.
In some embodiments, the target distribution situation may be a count of scene category switches within a preset time period. The preset time period is a time period during which the image and the second preset count of historical images are captured. As used herein, if the scene categories of two consecutive images are not consistent, one scene category switch is counted.
The processing device 140 may determine whether the count of scene category switches exceeds a preset threshold (e.g., 10, 15, 20, etc.). In some embodiments, the preset threshold may be set according to a default setting of the image processing system 100 or preset by a user or operator via the terminal 130. In some embodiments, the preset threshold may be dynamically adjusted. For example, the processing device 140 may adjust the preset threshold based on the predetermined count. The greater the predetermined count is, the greater the preset threshold may be.
In response to the count of scene category switches exceeding the preset threshold, the processing device 140 may determine the preliminary scene category of the image as the final scene category of the image. In response to the count of scene category switches not exceeding the preset threshold, the processing device 140 may determine the updated scene category as the final scene category of the image.
Merely by way of example, in the dimension of time domain, a queue corresponding to the predetermined count of images is first-in-first-out according to the order of acquisition time, and a queue of final scene categories to which the predetermined count of images belong is retained. Specifically, the queue of the final scene categories to which the predetermined count of images belong may be represented as a State_steady queue. When a count of scene category switches in the State_steady queue is greater than or equal to the preset threshold, a scene state of the predetermined count of images is determined as a fluctuating state. In the fluctuating state, the processing device 140 may determine the preliminary scene category of the image as the final scene category of the image. When the count of scene category switches in the State_steady queue is less than the preset threshold, the scene state of the predetermined count of images is determined as a stable state. In the stable state, the processing device 140 may determine the updated scene category of the image as the final scene category of the image.
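A minimal sketch of the State_steady queue and the switch-counting rule is shown below; the queue length (200) follows the example above, while the preset threshold value and category names are only illustrative.

```python
from collections import deque

def count_switches(categories):
    """Two consecutive images with inconsistent scene categories count as one scene category switch."""
    categories = list(categories)
    return sum(1 for a, b in zip(categories, categories[1:]) if a != b)

state_steady = deque(maxlen=200)  # first-in-first-out queue of final scene categories
state_steady.extend(["backlight", "front_light"] * 20)  # a frequently switching example

preset_threshold = 15
if count_switches(state_steady) >= preset_threshold:
    chosen = "preliminary scene category"  # fluctuating state
else:
    chosen = "updated scene category"      # stable state
print(count_switches(state_steady), chosen)  # 39 preliminary scene category
```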
It should be noted that in actual image acquisition scenarios, short-term continuous scene switching often occurs. For example, when the capture device is facing the entrance and exit of an animal enclosure area, animals frequently appear at the entrance and exit, which may cause the image scene category to frequently switch between a backlight scene and a facing light scene, resulting in a corresponding image processing module frequently switching between a wide dynamic mode and a non-wide dynamic mode, which can easily lead to resource consumption of the capture device and abnormal image processing results. Therefore, according to some embodiments of the present disclosure, from the dimension of time domain, by determining the preliminary scene category or the updated scene category as the final scene category of the image based on the scene state (including the fluctuating state and the stable state) of the predetermined count of images, the accuracy and stability of the final scene category of the image can be ensured in the case of abnormal discrimination and/or when the scene state of the predetermined count of images is in the fluctuating state.
In 507, the processing device 140 (e.g., the image processing module 430) may process the image based on the final scene category of the image.
At least a portion of the multiple scene categories correspond to different image processing techniques. For example, different scene categories may correspond to different image processing techniques. In some embodiments, different scene categories may also correspond to the same image processing technique.
The processing device 140 may process the image using an image processing technique corresponding to the final scene category of the image.
In some embodiments, the image processing technique represents a mode that a functional module or image processing module in the capture device can be in, and in this mode, the functional module or image processing module can perform corresponding image processing on the image. Merely by way of example, in the case where the multiple scene categories are the foggy scene category and the backlight scene category, the corresponding image processing techniques for the two scene categories are different. When the scene category is the backlight scene category, the functional module or image processing module may be in a wide dynamic mode to improve the dynamic range of the image. Specifically, the wide dynamic mode may be used to handle high contrast scenes and capture details of bright and dark parts in the image. When the scene category is the foggy scene category, the functional module or image processing module may be in a penetrating fog mode to increase image clarity. Specifically, the penetrating fog mode may remove fog from the image and improve the clarity and contrast of the image.
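As a simple illustration of this mode selection, final scene categories might be dispatched to processing modes as sketched below; the dispatch table and mode names are assumptions mirroring the description above.

```python
MODE_BY_CATEGORY = {
    "backlight": "wide_dynamic",   # capture details of both bright and dark parts
    "foggy": "penetrating_fog",    # remove fog, improve clarity and contrast
}

def select_processing_mode(final_category, default="normal"):
    return MODE_BY_CATEGORY.get(final_category, default)

print(select_processing_mode("foggy"))      # penetrating_fog
print(select_processing_mode("backlight"))  # wide_dynamic
```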
It should be noted that the above description regarding the process 500A is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
In 511, the processing device 140 (e.g., the obtaining module 410) may obtain an image captured by a capture device, multiple historical images captured by the capture device before the image, and multiple scene categories.
In 512, for each of the multiple scene categories, the processing device 140 (e.g., the scene category determination module 420) may generate a confidence level of the image belonging to the scene category using a scene recognition model.
In 513, for each of the multiple historical images, the processing device 140 (e.g., the scene category determination module 420) may obtain an initial scene category to which the historical image belongs.
In 514, the processing device 140 (e.g., the scene category determination module 420) may determine, based on initial scene categories of the multiple historical images, a target scene category to which the multiple historical images belong.
In 515, the processing device 140 (e.g., the scene category determination module 420) may determine multiple updated confidence level thresholds by updating, based on the target scene category, at least a portion of multiple confidence level thresholds corresponding to the multiple scene categories.
Operations 511-515 may be performed in a similar manner as operations 501-505 as described in connection with
In 516, the processing device 140 (e.g., the scene category determination module 420) may determine a preliminary scene category of the image. Specifically, for each of the multiple scene categories corresponding to the multiple updated confidence level thresholds, the processing device 140 may determine a relationship between the confidence level of the image belonging to the scene category and an updated confidence level threshold of the scene category. The processing device 140 may determine, based on relationships corresponding to the multiple scene categories, the preliminary scene category to which the image belongs. Operation 516 may be performed in a similar manner as operation 506 as described in connection with
In 517, the processing device 140 (e.g., the scene category determination module 420) may determine whether the preliminary scene category of the image and the initial scene categories of a preset count of historical images among the multiple historical images are consistent. In response to the preliminary scene category of the image and the initial scene categories of the preset count of historical images among the multiple historical images being consistent, the processing device 140 may proceed to perform operation 518 to designate the preliminary scene category of the image as an updated scene category of the image. In response to an inconsistency between the preliminary scene category of the image and the initial scene categories of the preset count of historical images among the multiple historical images, the processing device 140 may proceed to perform operation 519 to determine the target scene category as the updated scene category.
In 520, the processing device 140 (e.g., the scene category determination module 420) may determine whether a count of scene category switches exceeds a preset threshold. In response to the count of scene category switches exceeding the preset threshold, the processing device 140 may proceed to perform operation 521 to determine the preliminary scene category of the image as the final scene category of the image. In response to the count of scene category switches not exceeding the preset threshold, the processing device 140 may proceed to perform operation 522 to determine the updated scene category as the final scene category of the image.
In 523, the processing device 140 (e.g., the scene category determination module 420) may process the image based on the final scene category of the image.
It should be noted that the above description regarding the process 500B is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
In 531, the processing device 140 may determine whether the final scene category is a preset scene category.
The preset scene category may include a face recognition category, a license plate recognition category, etc.
In 532, in response to determining that the final scene category is the preset scene category, the processing device 140 may obtain a target region within the image.
In some embodiments, the image may include one or more target regions. The target region may be a region in the image that needs to be highlighted. For example, if the image is a vehicle image, the target region may be a region including a license plate. As another example, if the image is a human image, the target region may be a facial region. In some embodiments, the processing device 140 may identify the target region from the image. The processing device 140 may perform a downsampling on the target region. More descriptions regarding the target region may be found elsewhere in the present disclosure (e.g.,
In some embodiments, the processing device 140 may obtain the target region (or the image) from one or more components of the image processing system 100, such as the capture device 110, the terminal 130, a storage device (e.g., the storage device 150), etc. The processing device 140 may obtain the target region (or the image) from the capture device 110 in real time. For example, the processing device 140 may obtain the target region (or the image) from the capture device 110 via a real-time stream protocol (RTSP). Alternatively or additionally, the processing device 140 may obtain the target region (or the image) from an external source (e.g., a cloud disk) via the network 120.
In 533, the processing device 140 may determine, based on gray information of the target region, a current brightness of the target region.
The gray information of the target region may include a gray level of the target region, a gray value of each pixel in the target region, a distribution of gray values of pixels in the target region, or the like, or any combination thereof.
In some embodiments, the gray level may refer to the difference in brightness of pixels in a monochrome display, or the difference in color of pixels in a color display. The more gray levels there are, the clearer the hierarchy of the image is. The gray level is determined by the number of bits of a refresh storage unit corresponding to each pixel and the performance of the display (e.g., the monochrome display, the color display). The gray level may include a 16-gray level, 32-gray level, 64-gray level, 256-gray level, or the like. The gray value may be the value that represents the brightness of a pixel in an image (e.g., a monochrome image). For example, if the gray values in an image are represented by 8-bit binary data, the range of the gray values is between 0 and 255. For example, the gray value of a pixel may be 0, 10, 20, 30, or the like. The smaller the gray value(s) are, the darker the corresponding pixel(s) are. The larger the gray value(s) are, the brighter the corresponding pixel(s) are.
In some embodiments, the target region (or the image) may be a color image (e.g., an image in YUV domain, an image in RGB domain). The processing device 140 may convert the color image into a monochrome image. In some embodiments, the processing device 140 may convert the pixel value of a pixel in the color image (e.g., an image in RGB domain) into the corresponding gray value in the monochrome image according to different application needs. For example, the pixel value of a pixel in the color image may be converted to the corresponding gray value by determining a weighted average of the red color value, the green color value, and the blue color value. For example, the gray value of the pixel in the monochrome image may be determined according to Equation (1) as below:

V = W_R × R + W_G × G + W_B × B,    (1)

wherein V refers to the gray value of the pixel in the monochrome image; R refers to the red color value in the color image; G refers to the green color value in the color image; B refers to the blue color value in the color image; and W_R, W_G, and W_B refer to the weights of the red color value, the green color value, and the blue color value, respectively.
In some embodiments, the processing device 140 may convert the pixel value of the pixel in the color image into the corresponding gray value in the monochrome image by determining the average of the red color value, the green color value, and the blue color value corresponding to the pixel; or by determining the maximum value among the red color value, the green color value, and the blue color value corresponding to the pixel. In some embodiments, the image may be a raw image, and the processing device 140 may perform a G-channel downsampling on the target region (see, e.g., operation 1005 of process 1000). The downsampled target region may include one or more pixels with G color value. The processing device 140 may convert the G color value of each pixel in the downsampled target region into the gray value of the pixel. For example, the processing device 140 may determine the G color value of each pixel as the gray value of the pixel.
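The conversions mentioned above can be sketched as follows, assuming the color image is a NumPy array of shape (H, W, 3); the 0.299/0.587/0.114 weights are the commonly used luminance weights and are an assumption rather than necessarily the weights of Equation (1).

```python
import numpy as np

def to_gray(rgb, method="weighted"):
    r, g, b = (rgb[..., k].astype(float) for k in range(3))
    if method == "weighted":
        return 0.299 * r + 0.587 * g + 0.114 * b  # weighted average of R, G, B (assumed weights)
    if method == "average":
        return (r + g + b) / 3.0                  # plain average of the three color values
    if method == "max":
        return np.maximum(np.maximum(r, g), b)    # maximum of the three color values
    raise ValueError(f"unknown method: {method}")

rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
print(to_gray(rgb).shape)  # (4, 4)
```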
In some embodiments, the current brightness of the target region may include the degree of exposure of the target region, such as overexposure, underexposure, etc. Alternatively, the current brightness of the target region may include the brightness level of the target region, such as too dark, too bright, normal. Alternatively, the current brightness of the target region may be represented by a specific brightness value, such as 50, 60, 70, etc. The processing device 140 may determine the current brightness of the target region based on the gray information (e.g., the gray value of each pixel) of the target region. More descriptions of the determination of the current brightness may be found elsewhere in the present disclosure (e.g.,
In some embodiments, the image may include a plurality of target regions. The processing device 140 may identify a target region with a largest size among the plurality of target regions. The size of a target region may be measured by the length of a diagonal of the target region, the number of pixels in the target region, or the like. The processing device 140 may then determine the current brightness of the target region with the largest size based on its gray information, and determine the current brightness of the target region with the largest size as the current brightness of the image. Alternatively or additionally, the processing device 140 may determine the current brightness of each target region based on its gray information, and determine a weighted average of the current brightnesses of the plurality of target regions as the current brightness of the image. More descriptions regarding the determination of the current brightness may be found elsewhere in the present disclosure (e.g.,
In 534, the processing device 140 may process the image based on the current brightness of the target region.
In some embodiments, in response to determining that the current brightness of the target region is less than a minimum value of a predetermined range, the processing device 140 may increase the current brightness of the target region. The further the current brightness is below the minimum value of the predetermined range, the greater the increase in the current brightness may be. In some embodiments, in response to determining that the current brightness of the target region is greater than a maximum value of the predetermined range, the processing device 140 may decrease the current brightness of the target region. The further the current brightness is above the maximum value of the predetermined range, the greater the decrease in the current brightness may be.
In some embodiments, the processing device 140 may adjust the capture device based on the current brightness of the target region. More descriptions regarding the adjustment of the capture device may be found elsewhere in the present disclosure (e.g.,
It should be noted that the above description regarding the process 500C is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
In 525, the processing device 140 (e.g., the target brightness determination module 406) may determine a target brightness of the target region based on the current brightness of the target region.
In some embodiments, the processing device 140 may determine whether the current brightness of the target region is within a predetermined brightness range. In some embodiments, the predetermined brightness range may be a default brightness range or an empirical brightness range related to the image processing system 100. Alternatively, the predetermined brightness range may vary according to different conditions. Merely by way of example, the predetermined brightness range may be [70, 150], [80, 160], [60, 120], or the like.
If the current brightness of the target region is within the predetermined brightness range, the processing device 140 may determine the current brightness as the target brightness. For example, assuming that the predetermined brightness range is [80, 160], if the current brightness of the target region is 90, which is within the predetermined brightness range of [80, 160], the processing device 140 may determine the current brightness as the target brightness.
In some embodiments, if the current brightness of the target region is less than the minimum value of the predetermined brightness range, the processing device 140 may adjust the current brightness, e.g., increasing the current brightness, to obtain the target brightness. The smaller the current brightness is, the greater the adjustment of the current brightness is. For example, if the current brightness is 75, slightly smaller than the minimum value of the predetermined brightness range [80, 160], the processing device 140 may slightly increase the current brightness to determine the target brightness. As another example, if the current brightness is 20, much smaller than the minimum value of the predetermined brightness range [80, 160], the processing device 140 may need to greatly increase the current brightness to obtain the target brightness.
In some embodiments, if the current brightness of the target region is greater than the maximum value of the predetermined brightness range, the processing device 140 may adjust the current brightness, e.g., decreasing the current brightness, to reach the target brightness. The greater the current brightness is, the greater the adjustment of the current brightness is. For example, if the current brightness is 161, slightly greater than the maximum value of the predetermined brightness range [80, 160], the processing device 140 may slightly decrease the current brightness to determine the target brightness. As another example, if the current brightness is 230, much greater than the maximum value of the predetermined brightness range [80, 160], the processing device 140 may need to greatly decrease the current brightness to determine the target brightness.
In some embodiments, if the current brightness of the target region is not within the predetermined brightness range, the processing device 140 may determine the target brightness based on the current brightness and a brightness mapping relationship. The brightness mapping relationship may be a correspondence between one or more target brightness values and one or more current brightness values. In some embodiments, the one or more target brightness values may be values within the predetermined brightness range. The current brightness values within different value ranges may correspond to different target brightness values. For example, the current brightness values within a first value range (e.g., the value range [0, 20]) may correspond to a first target brightness value; the current brightness values within a second value range (e.g., the value range [20, 40]) may correspond to a second target brightness value; the current brightness values within a third value range (e.g., the value range [40, 60]) may correspond to a third target brightness value; or the like. In some embodiments, the processing device 140 may determine the brightness corresponding to the current brightness as the target brightness according to the brightness mapping relationship. For example, if the current brightness belongs to the second value range (i.e., the value range [20, 40]), the processing device 140 may determine the second target brightness value as the target brightness.
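A minimal sketch of such a brightness mapping relationship is given below; the value ranges follow the example above, while the target brightness values (80, 90, 100) are illustrative assumptions within the predetermined brightness range.

```python
BRIGHTNESS_MAP = [((0, 20), 80), ((20, 40), 90), ((40, 60), 100)]  # (current range, target brightness)

def map_to_target(current_brightness):
    for (low, high), target in BRIGHTNESS_MAP:
        if low <= current_brightness <= high:
            return target
    return current_brightness  # no mapping entry applies

print(map_to_target(30))  # 90 (second value range -> second target brightness value)
```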
In 527, the processing device 140 (e.g., the capture device adjusting module 408) may determine one or more exposure parameters of the capture device (e.g., the capture device 110) based on the target brightness of the target region. In some embodiments, the exposure parameter(s) may include an aperture value, an exposure time, a shutter speed, a gain, or the like. The processing device 140 may adjust the exposure parameter(s) based on the target brightness of the target region, and determine a desired brightness of the target region.
In some embodiments of the present disclosure, the processing device 140 may obtain the target region within the image, determine the current brightness of the target region based on the gray information of the target region, determine the target brightness of the target region based on the current brightness, and determine the exposure parameter(s) based on the target brightness. In some embodiments, the target brightness of the target region may be determined based on the current brightness, and the exposure parameter(s) may further be determined based on the target brightness of the target region, which may ensure the effective presentation of the target region in the image. Besides, the exposure parameter(s) may be adjusted and/or varied based on the current brightness of the target region under different situations, which may facilitate the adaptive exposure of the capture device (e.g., the capture device 110), thereby improving the adaptability of the capture device to different scenes.
It should be noted that the above description regarding the process 500D is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, one or more other optional operations (e.g., a storing operation) may be added elsewhere in the process 500D. In the storing operation, the processing device 140 may store information and/or data (e.g., the current brightness of the target region, the target brightness of the target region) associated with the image processing system 100 in a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure.
In 601, the processing device 140 (e.g., the current brightness determination module 404) may determine a number of pixels corresponding to each gray value and a number of pixels falling in different gray value ranges by performing statistics on gray values of pixels in the target region.
In some embodiments, the target region may include M*N pixels. The pixels in the target region may be expressed as p(i, j), wherein i is an integer from 0 to M; and j is an integer from 0 to N. In some embodiments, the processing device 140 may perform statistics on the gray value of each pixel p(i, j). The processing device 140 may determine the number of pixels corresponding to each gray value after performing the statistics on the gray value of each pixel p(i, j). The number of pixels corresponding to each gray value may be represented as a gray value histogram, in which the number of pixels is the coordinate on Y-axis and the gray value is the coordinate on X-axis. For example, the number of pixels with gray value 0 may be 10, the number of pixels with gray value 1 may be 21, the number of pixels with gray value 2 may be 3, or the like.
In some embodiments, the processing device 140 may also determine the number of pixels falling in different gray value ranges after performing the statistics on the gray value of each pixel p(i, j). The number of pixels falling in different gray value ranges may be represented as a gray value cumulative histogram. For example, the processing device 140 may determine the number of pixels falling in a first gray value range (e.g., the number of pixels falling in gray value range [0, 0] may be 10), the number of pixels falling in a second gray value range (e.g., the number of pixels falling in gray value range [0, 1] may be 31), the number of pixels falling in a third gray value range (e.g., the number of pixels falling in gray value range [0, 2] may be 34), or the like. The processing device 140 may determine the gray value as the coordinate on X-axis and the number of pixels falling in different gray value ranges as the coordinate on Y-axis to establish the gray value cumulative histogram.
It should be noted that the above description is merely for illustration purposes, and is not intended to limit the scope of the present disclosure. In some embodiments, the processing device 140 may determine a ratio of the number of pixels with a certain gray value to the total number of pixels in the target region as the coordinate on Y-axis of the gray value histogram. The processing device 140 may determine a ratio of the number of pixels in a gray value range to the total number of pixels in the target region as a coordinate on Y-axis of the gray value cumulative histogram.
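The statistics of operation 601 can be sketched with NumPy as follows, assuming an 8-bit gray target region; the variable names are illustrative.

```python
import numpy as np

target_region = np.random.randint(0, 256, size=(120, 160), dtype=np.uint8)  # M*N gray values

# Number of pixels corresponding to each gray value (gray value histogram).
hist, _ = np.histogram(target_region, bins=256, range=(0, 256))

# Number of pixels falling in the ranges [0, 0], [0, 1], [0, 2], ... (gray value cumulative histogram).
cum_hist = np.cumsum(hist)

# Optional ratios relative to the total number of pixels, as noted above.
total = target_region.size
pixel_ratio = hist / total
cum_pixel_ratio = cum_hist / total
print(hist[:3], cum_hist[:3])
```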
In 602, the processing device 140 (e.g., the current brightness determination module 404) may determine the current brightness of the target region based on the number of pixels corresponding to each gray value and the number of pixels falling in different gray value ranges. In some embodiments, the processing device 140 may determine a degree of exposure of the target region based on the number of pixels falling in different gray value ranges. The degree of exposure may include overexposure, underexposure, or the like. The processing device 140 may then determine the current brightness of the target region based on the number of pixels corresponding to each gray value, the number of pixels falling in different gray value ranges, and a first predetermined threshold corresponding to the degree of exposure. Details regarding the determination of the current brightness of the target region may be found elsewhere in the present disclosure (e.g.,
It should be noted that the above description regarding the process 600 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, one or more other optional operations (e.g., a storing operation) may be added elsewhere in the process 600. In the storing operation, the processing device 140 may store information and/or data (e.g., the number of pixels corresponding to each gray value, the number of pixels falling in different gray value ranges) associated with the image processing system 100 in a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure.
In 701, the processing device 140 (e.g., the current brightness determination module 404) may determine a degree of exposure based on the number of pixels falling in different gray value ranges.
In some embodiments, the degree of exposure may include underexposure, overexposure, or the like. The “underexposure” may mean that the image is too dark, and thus the brighter parts of the image can be recognizable and the darker parts of the image cannot be shown in detail. The “overexposure” may mean that the image is too bright, and thus the darker parts of the image can be recognizable and the brighter parts of the image cannot be shown in detail.
In some embodiments, the processing device 140 may determine whether the degree of exposure is underexposure or overexposure according to process 800.
In 801, the processing device 140 (e.g., the current brightness determination module 404) may determine whether a pixel ratio of a first gray value range is greater than a second predetermined threshold based on the number of pixels in different gray value ranges. The pixel ratio of the first gray value range may be a ratio of the number of pixels falling in the first gray value range to the total number of pixels in the target region. In some embodiments, the first gray value range may be a default value range or an empirical value range related to the image processing system 100. Alternatively, the first gray value range may be different value ranges according to different conditions. Merely by way of example, the first gray value range may be a gray value range of [0, 30], a gray value range of [0, 50], a gray value range of [0, 60], a gray value range of [0, 100], or the like. The second predetermined threshold may be a default value or an empirical value related to the image processing system 100. In some embodiments, the second predetermined threshold may be set according to a default setting of the image processing system 100. For example, the second predetermined threshold may be 20%, 30%, 35%, 40%, 50%, etc.
In response to a determination that the pixel ratio is greater than the second predetermined threshold, the processing device 140 may determine that the target region is too dark, and may proceed to operation 803. In 803, the processing device 140 (e.g., the current brightness determination module 404) may determine the degree of exposure of the target region as underexposure. Merely by way of example, the second predetermined threshold may be 40%. The first gray value range may be the gray value range of [0, 50]. Assuming that the total number of pixels in the target region may be 1000, and the number of pixels in the first gray value range may be 700. The pixel ratio of the first gray value range may be 70%, which is greater than the second predetermined threshold (40%). That is, the target region is too dark, and the degree of exposure of the target region may be determined as underexposure.
In response to a determination that the pixel ratio is less than or equal to the second predetermined threshold, the processing device 140 may determine that the target region is too bright, and may proceed to operation 805. In 805, the processing device 140 (e.g., the current brightness determination module 404) may determine the degree of exposure of the target region as overexposure. Merely by way of example, the second predetermined threshold may be 40%. The first gray value range may be the gray value range of [0, 50]. Assuming that the total number of pixels in the target region may be 1000, and the number of pixels falling in the first gray value range may be 100. The pixel ratio of the first gray value range may be 10%, which is less than the second predetermined threshold (40%). That is, the target region is too bright, and the degree of exposure of the target region may be determined as overexposure.
In some embodiments, the degree of exposure may include underexposure, normal, overexposure, or the like. In some embodiments, the second predetermined threshold may be a value range, such as [35%, 45%], [30%, 50%], [25%, 50%], [20%, 60%], etc. If the pixel ratio of the first gray value range is greater than the maximum value of the value range, the processing device 140 may determine the degree of exposure as underexposure. If the pixel ratio of the first gray value range is within the value range, the processing device 140 may determine the degree of exposure as normal. If the pixel ratio of the first gray value range is less than the minimum value of the value range, the processing device 140 may determine the degree of exposure as overexposure. Merely by way of example, the second predetermined threshold may be a value range of [35%, 45%]. If the pixel ratio of the first gray value range is greater than 45%, the processing device 140 may determine the degree of exposure as underexposure. If the pixel ratio of the first gray value range is within the value range [35%, 45%], the processing device 140 may determine the degree of exposure as normal. If the pixel ratio of the first gray value range is less than 35%, the processing device 140 may determine the degree of exposure as overexposure.
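The decision described above can be sketched as follows, using the ratio-range form of the second predetermined threshold; the default values follow the examples in the text.

```python
def degree_of_exposure(dark_pixel_ratio, low=0.35, high=0.45):
    """`dark_pixel_ratio` is the pixel ratio of the first gray value range (e.g., [0, 50])."""
    if dark_pixel_ratio > high:
        return "underexposure"  # too many dark pixels: the target region is too dark
    if dark_pixel_ratio < low:
        return "overexposure"   # too few dark pixels: the target region is too bright
    return "normal"

print(degree_of_exposure(0.70))  # underexposure (e.g., 700 of 1000 pixels fall in [0, 50])
print(degree_of_exposure(0.10))  # overexposure
```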
In 703, the processing device 140 (e.g., the current brightness determination module 404) may determine the current brightness of the target region based on the number of pixels corresponding to each gray value, the number of pixels falling in different gray value ranges, and a first predetermined threshold corresponding to the degree of exposure.
In some embodiments, the first predetermined threshold may be a default value or an empirical value related to the image processing system 100. In some embodiments, the first predetermined threshold may be set according to a default setting of the image processing system 100. In some embodiments, the first predetermined threshold corresponding to the degree of exposure may have different values when the degree of exposure is different. For example, if the degree of exposure is underexposure or normal, the first predetermined threshold may be a first value (e.g., 45%, 50%, 55%, etc.). If the degree of exposure is overexposure, the first predetermined threshold may be a second value (e.g., 75%, 80%, 85%, etc.).
In some embodiments, the processing device 140 may determine the current brightness of the target region according to process 900.
In 901, the processing device 140 (e.g., the current brightness determination module 404) may determine a gray value whose pixel ratio satisfies a first preset condition as a target gray value. The pixel ratio of a gray value may be a ratio of the number of pixels corresponding to the gray value to the total number of pixels in the target region. The first preset condition may be that the pixel ratios of the target gray value and one or more subsequent gray values are all less than a third predetermined threshold. For example, the processing device 140 may determine the pixel ratios of the gray values 20-30, respectively. If the pixel ratios of the gray values 20-30 are all less than the third predetermined threshold, the processing device 140 may determine the gray value 20 as the target gray value.
In some embodiments, the third predetermined threshold may be a default value or an empirical value related to the image processing system 100. In some embodiments, the third predetermined threshold may be set according to a default setting of the image processing system 100. For example, the third predetermined threshold may be 1%, 1.5%, 2%, or the like.
In 903, the processing device 140 (e.g., the current brightness determination module 404) may determine whether a pixel ratio of a second gray value range is greater than the first predetermined threshold based on the number of pixels in different gray value ranges. In some embodiments, the second gray value range may be determined based on the target gray value. Specifically, pixels in the second gray value range may include one or more pixels whose gray value is less than or equal to the target gray value. For example, if the target gray value is 20, the second gray value range may be a gray value range of [0, 20]. The pixel ratio of the second gray value range may be a ratio of the number of pixels falling in the second gray value range to the total number of pixels in the target region. In some embodiments, the pixel ratio of the second gray value range may be 40%, 45%, 49%, 50%, 51%, 55%, or the like.
In response to a determination that the pixel ratio of the second gray value range is greater than the first predetermined threshold, the processing device 140 may proceed to operation 905. In 905, the processing device 140 may determine the target gray value as the current brightness of the target region. For example, if the degree of exposure of the target region is determined as underexposure, the first predetermined threshold may be 50%. As described in operation 901, the target gray value may be 20. The processing device 140 may determine the pixel ratio of the pixels having gray values in the second gray value range (i.e., the gray value range of [0, 20]). If the pixel ratio of the pixels having gray values in the second gray value range is determined as 51%, which is greater than 50%, the processing device 140 may determine the target gray value (i.e., gray value 20) as the current brightness of the target region.
Otherwise, in response to a determination that the pixel ratio of the second gray value range is less than or equal to the first predetermined threshold, the processing device 140 may proceed to operation 901. In 901, the processing device 140 may re-determine a gray value whose pixel ratio satisfies the first preset condition as the target gray value. For example, if the pixel ratio of the pixels falling in the second gray value range is determined as 49%, which is less than 50%, the processing device 140 may proceed to operation 901. That is, the target gray value 20 is not the current brightness of the target region. In 901, the processing device 140 may determine whether the pixel ratio of the gray value 21 satisfies the first preset condition. If the pixel ratio of the gray value 21 satisfies the first preset condition, the processing device 140 may determine the gray value 21 as the target gray value, and proceed to operation 903. Otherwise, if the pixel ratio of the gray value 21 does not satisfy the first preset condition, the processing device 140 may continue to determine whether the pixel ratios of the subsequent gray values satisfy the first preset condition until the target gray value is determined.
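Operations 901-905 can be sketched as the loop below, assuming pixel_ratio[v] is the pixel ratio of gray value v and cum_ratio[v] is the pixel ratio of the gray value range [0, v]; the window of subsequent gray values and the default thresholds are illustrative assumptions.

```python
def current_brightness(pixel_ratio, cum_ratio, first_threshold=0.5,
                       third_threshold=0.01, window=10):
    for v in range(len(pixel_ratio) - window):
        # First preset condition: this gray value and the subsequent ones are all rare.
        if all(pixel_ratio[v + k] < third_threshold for k in range(window + 1)):
            # Check the second gray value range [0, v] against the first predetermined threshold.
            if cum_ratio[v] > first_threshold:
                return v  # target gray value -> current brightness of the target region
    return None           # no gray value satisfied both conditions

# Example: most pixels lie at gray values 10, 15, and 20; a few bright pixels lie at 200.
pixel_ratio = [0.0] * 256
pixel_ratio[10], pixel_ratio[15], pixel_ratio[20], pixel_ratio[200] = 0.3, 0.3, 0.3, 0.1
cum_ratio = [sum(pixel_ratio[:v + 1]) for v in range(256)]
print(current_brightness(pixel_ratio, cum_ratio))  # 21
```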
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, one or more other optional operations (e.g., a storing operation) may be added elsewhere in the process 700. In the storing operation, the processing device 140 may store information and/or data (e.g., the pixel ratio of the first gray value range, the pixel ratio of the second gray value range, etc.) associated with the image processing system 100 in a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure.
In 1001, the processing device 140 (e.g., the current brightness determination module 404) may identify an initial region including the target region from the image. In some embodiments, the initial region may be a rectangular box, a box along a 2D outline of an object (e.g., a human, a vehicle), a circular box, or any regular or irregular box. For example, the initial region may be a rectangular box including a human, a rectangular box including a vehicle, or the like. In some embodiments, the processing device 140 may identify the initial region using an object detection algorithm. The object detection algorithm may include a Regions with Convolutional Neural Network features (R-CNN) algorithm, a Fast R-CNN algorithm, a Faster R-CNN algorithm, a You Only Look Once (YOLO) algorithm, a Single Shot multibox Detector (SSD) algorithm, or the like, or any combination thereof.
In some embodiments, the processing device 140 may identify the initial region based on an object detection model (e.g., a human detection model, a vehicle detection model). For example, the processing device 140 may extract one or more features of each object, and assign a score for each object. If the score of an object exceeds a threshold, the processing device 140 may determine the object as an initial region. In some embodiments, the object detection model may be generated by pre-training a preliminary model based on a plurality of training samples. For example, in order to generate a human detection model, the processing device 140 may train the preliminary model using a plurality of images including humans. As another example, in order to generate a vehicle detection model, the processing device 140 may train the preliminary model using a plurality of images including vehicles. The preliminary model may include a Ranking Support Vector Machine (SVM) model, a Gradient Boosting Decision Tree (GBDT) model, an adaptive boosting model, a recurrent neural network model (e.g., a long short term memory (LSTM) neural network model, a hierarchical recurrent neural network model, a bi-direction recurrent neural network model, a second-order recurrent neural network model, a fully recurrent network model), a convolutional network model, a hidden Markov model, a perceptron neural network model, a Hopfield network model, a self-organizing map (SOM), or a learning vector quantization (LVQ), or the like, or any combination thereof.
In some embodiments, the processing device 140 may also determine a position of the initial region in the image. For example, the processing device 140 may determine coordinate information of the upper left corner and/or upper right corner of the initial region, or coordinate information of the center of the initial region.
In 1003, the processing device 140 (e.g., the current brightness determination module 404) may determine the target region by trimming the initial region according to a predetermined trimming strategy. For an initial region including an animal (e.g., a human), the target region may be a facial region. For an initial region including a vehicle, the target region may be a region including a license plate. The processing device 140 may trim the initial region to obtain the target region.
In some embodiments, the predetermined trimming strategy may be different for different initial regions. Merely by way of example, the initial region may be a rectangular box including a human. For a relatively long rectangular box (e.g., a rectangular box including a whole body), the processing device 140 may apply a three-step trimming strategy. Specifically, in step one, the processing device 140 may trim the lower part of the rectangular box (e.g., 45%, 49%, 50%, 51%, 55%, etc. of the rectangular box). The trimmed rectangular box may include an upper part of the body. In step two, the processing device 140 may trim a portion of the upper part of the trimmed rectangular box (e.g., 10% of the rectangular box) and a portion of the lower part of the trimmed rectangular box (e.g., 30% of the rectangular box). In step three, the processing device 140 may symmetrically trim a portion of the left side and the right side of the trimmed rectangular box (e.g., 2% of the rectangular box). Then the target region (e.g., a facial region) may be determined. The aspect ratio of the target region may be 3:4. For a relatively short rectangular box (e.g., a rectangular box including an upper part of the body), the processing device 140 may apply a two-step trimming strategy. Specifically, step one may be omitted, and the processing device 140 may determine the target region by trimming the initial region according to steps two and three. The aspect ratio of the target region may be 3:4. In some embodiments, if the initial region is a rectangular box including a lower part of the body (which does not include a face), the processing device 140 may remove the initial region.
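A rough sketch of the three-step trimming strategy for a rectangular human box is given below; the box is assumed to be (x, y, w, h) with y growing downward, the trimming fractions follow the example percentages in the text (interpreted relative to the current box), and the 3:4 aspect ratio is not enforced in this sketch.

```python
def trim_human_box(x, y, w, h, full_body=True):
    """Approximate a facial region from a rectangular human box."""
    if full_body:                # step one: drop the lower ~50% to keep the upper part of the body
        h *= 0.50
    y, h = y + 0.10 * h, h * (1.0 - 0.10 - 0.30)  # step two: trim 10% from the top, 30% from the bottom
    x, w = x + 0.02 * w, w * (1.0 - 0.02 - 0.02)  # step three: trim 2% from each side symmetrically
    return x, y, w, h

print(trim_human_box(0, 0, 100, 400))  # (2.0, 20.0, 96.0, 120.0)
```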
In 1005, the processing device 140 (e.g., the current brightness determination module 404) may perform a downsampling on the target region. In some embodiments, in order to reduce the amount of computation for determining the current brightness, the target region may be downsampled. For example, the facial region may be downsampled.
In some embodiments, the image may be a raw image in a Bayer format. The target region may be a part of the raw image in the Bayer format. The Bayer format may include a GRBG format, an RGGB format, or the like. As used herein, “G” refers to green, “R” refers to red, and “B” refers to blue. In some embodiments, the processing device 140 may perform a G-channel downsampling on the target region. Merely by way of example, for each 2×2 cell of the Bayer pattern, the processing device 140 may take the green (G) pixel value(s) of the cell as the gray information of the cell, such that the downsampled target region retains the G-channel information of the raw image.
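Merely for purposes of illustration, the following sketch performs a G-channel downsampling on a Bayer raw array. Averaging the two green samples of each 2×2 cell, and the use of a NumPy array to hold the raw values, are assumptions for illustration and do not limit the disclosure.

```python
# Illustrative sketch of a G-channel downsampling on a Bayer raw target region.
import numpy as np

def g_channel_downsample(bayer: np.ndarray, pattern: str = "GRBG") -> np.ndarray:
    """Average the two green samples of every 2x2 Bayer cell into one gray value."""
    if pattern == "GRBG":
        g1 = bayer[0::2, 0::2]  # G samples at even rows, even columns
        g2 = bayer[1::2, 1::2]  # G samples at odd rows, odd columns
    elif pattern == "RGGB":
        g1 = bayer[0::2, 1::2]  # G samples at even rows, odd columns
        g2 = bayer[1::2, 0::2]  # G samples at odd rows, even columns
    else:
        raise ValueError("unsupported Bayer pattern")
    # Each 2x2 cell yields a single value, halving the resolution in each dimension.
    return (g1.astype(np.float32) + g2.astype(np.float32)) / 2.0

# Example: a 4x4 GRBG target region is downsampled to a 2x2 array of gray values.
region = np.arange(16, dtype=np.uint16).reshape(4, 4)
gray = g_channel_downsample(region)
```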
In some embodiments of the present disclosure, the initial region may be determined using an object detection model, and the target region may be obtained by trimming the initial region, which can effectively identify the target region under the backlighting condition, the reflective condition, or the like. In the present disclosure, the processing device 140 may perform the downsampling on the raw image (or the target region), thereby reducing the subsequent image signal processing (ISP) and the amount of computation for determining the current brightness.
It should be noted that the above description regarding the process 1000 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, operation 1005 may be omitted. The processing device 140 may determine the current brightness of the target region based on the gray information of the target region without downsampling. In some embodiments, operation 1005 may be performed before operation 1001. The processing device 140 may perform a downsampling on the image. The processing device 140 may then identify the initial region including the target region from the downsampled image.
In 1201, the processing device 140 (e.g., the current brightness determination module 404) may obtain a position of each of the plurality of target regions in the image.
In some embodiments, the position of each target region may be represented by a coordinate of the target region. For example, the processing device 140 may establish a two-dimensional (2D) coordinate system by taking the lower left corner of the image as the origin, or by taking a center of the image as the origin. The processing device 140 may then determine the coordinate of each target region in the 2D coordinate system. In some embodiments, the coordinate of a target region may refer to the coordinate of a center of the target region.
In 1203, the processing device 140 (e.g., the current brightness determination module 404) may determine a deviation distance between each target region and a center of the image based on the position of the target region.
In some embodiments, the processing device 140 may determine the deviation distance between a target region and the center of the image based on the coordinate of the target region. For example, the coordinate of a target region (specifically, the coordinate of the center of the target region) may be (30, 40). The coordinate of the center of the image may be (0, 0). The processing device 140 may determine the deviation distance between the target region and the center of the image as 50 according to the Pythagorean theorem (i.e., √(30² + 40²) = 50).
In 1205, the processing device 140 (e.g., the current brightness determination module 404) may determine a weight for each target region based on the deviation distance of the target region.
In some embodiments, the processing device 140 may determine a mapping relationship between a plurality of deviation distances and weights. In some embodiments, the mapping relationship may be a negative correlation mapping relationship. The greater the deviation distance is, the smaller the weight is. The smaller the deviation distance is, the greater the weight is. The processing device 140 may determine a weight for each target region based on the deviation distance of the target region and the mapping relationship.
In 1207, the processing device 140 (e.g., the current brightness determination module 404) may determine a current brightness by performing a weighted average on the current brightness of each target region based on the weight of each target region.
In some embodiments, the current brightness of each target region may be determined according to operation 523 of the process 500D, and the descriptions are not repeated herein. The processing device 140 may determine a weighted average of the current brightness of each target region as the current brightness.
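Merely for purposes of illustration, the following sketch combines operations 1201 through 1207. The image-center origin, the inverse-distance form of the negative correlation mapping, and the example values are assumptions; the disclosure only requires that the weight decrease as the deviation distance increases.

```python
# Illustrative sketch of operations 1201-1207: position -> deviation distance -> weight
# -> weighted average of the current brightness of the target regions.
import math

def deviation_distance(center_xy, image_center_xy=(0.0, 0.0)):
    """Distance between the center of a target region and the center of the image."""
    return math.hypot(center_xy[0] - image_center_xy[0], center_xy[1] - image_center_xy[1])

def weight_from_distance(distance, scale=100.0):
    """An assumed negative-correlation mapping: the larger the deviation distance, the smaller the weight."""
    return 1.0 / (1.0 + distance / scale)

def weighted_current_brightness(regions):
    """regions: list of (center_xy, current_brightness) pairs for the target regions."""
    weights = [weight_from_distance(deviation_distance(center)) for center, _ in regions]
    total = sum(weights)
    return sum(w * brightness for w, (_, brightness) in zip(weights, regions)) / total

# Example: the region centered at (30, 40) lies 50 units from the image center (0, 0)
# and therefore receives a smaller weight than the region centered at (0, 5).
regions = [((30, 40), 120.0), ((0, 5), 80.0)]
current_brightness = weighted_current_brightness(regions)
```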
The processor 1304 may include one or more central processing units (CPUs). The processor 1304 may be an integrated circuit chip with signal processing capabilities. The processor 1304 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic device, a discrete gate, a transistor logic device, a discrete hardware component, or the like, or any combination thereof. In some embodiments, the general-purpose processor may be a microprocessor, a conventional processor, or the like. In some embodiments, the processor 1304 may be implemented by multiple circuit chips.
In some embodiments, the processor 1304 may be configured to obtain a target region within an image captured by a capture device (e.g., the capture device 1306). The processor 1304 may also be configured to determine a current brightness of the target region based on gray information of the target region. The processor 1304 may be configured to determine a target brightness of the target region based on the current brightness of the target region. The processor 1304 may be further configured to determine one or more exposure parameters of the capture device (e.g., the capture device 1306). More descriptions of the determination of the exposure parameter(s) may be found elsewhere in the present disclosure.
In some embodiments, the processor 1304 may be configured to determine a number of pixels corresponding to each gray value and a number of pixels falling in different gray value ranges by performing statistics on gray values of pixels in the target region, and determine the current brightness of the target region based on the number of pixels corresponding to each gray value and the number of pixels falling in different gray value ranges.
In some embodiments, the processor 1304 may be configured to determine a degree of exposure based on the number of pixels in different gray value ranges. The processor 1304 may also be configured to determine the current brightness of the target region based on the number of pixels corresponding to each gray value, the number of pixels falling in different gray value ranges, and a first predetermined threshold corresponding to the degree of exposure.
In some embodiments, the processor 1304 may be configured to determine whether a pixel ratio of a first gray value range is greater than a second predetermined threshold based on the number of pixels falling in different gray value ranges. The pixel ratio of the first gray value range may be a ratio of the number of pixels falling in the first gray value range to a total number of pixels in the target region. In response to a determination that the pixel ratio is greater than the second predetermined threshold, the processor 1304 may determine the degree of exposure of the target region as underexposure. Alternatively, in response to a determination that the pixel ratio is less than or equal to the second predetermined threshold, the processor 1304 may determine the degree of exposure of the target region as overexposure.
In some embodiments, the processor 1304 may be configured to determine the gray value of the pixel whose pixel ratio satisfies a first preset condition as a target gray value. The pixel ratio of the target gray value may be a ratio of the number of pixels corresponding to the target gray value to a total number of pixels in the target region. The processor 1304 may also be configured to determine whether a pixel ratio of a second gray value range is greater than the first predetermined threshold based on the number of pixels falling in different gray value ranges. Pixels in the second gray value range may include one or more pixels whose gray value is less than or equal to the target gray value. In response to a determination that the pixel ratio of the second gray value range is greater than the first predetermined threshold, the processor 1304 may determine the target gray value as the current brightness of the target region. In some embodiments, the first preset condition may be that the pixel ratios of the target gray value and one or more subsequent gray values are each less than a third predetermined threshold.
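Merely for purposes of illustration, the following sketch estimates the current brightness from the gray-value histogram of the target region. The particular gray value ranges, the threshold values, the scanning direction used to find the target gray value, and the fallback to the average gray value are assumptions and do not limit the disclosure.

```python
# Illustrative sketch of the histogram-based determination of the current brightness.
import numpy as np

def current_brightness_from_histogram(
    region: np.ndarray,           # 8-bit gray values of the (downsampled) target region
    low_range=(0, 64),            # assumed "first gray value range" for the exposure check
    second_threshold=0.5,         # assumed second predetermined threshold
    first_thresholds=(0.6, 0.8),  # assumed first predetermined thresholds (underexposure, overexposure)
    third_threshold=0.01,         # assumed third predetermined threshold (first preset condition)
):
    pixels = region.ravel()
    total = pixels.size
    ratios = np.bincount(pixels, minlength=256) / total  # pixel ratio of each gray value

    # Degree of exposure: a large share of dark pixels suggests underexposure.
    low_ratio = ratios[low_range[0]:low_range[1]].sum()
    first_threshold = first_thresholds[0] if low_ratio > second_threshold else first_thresholds[1]

    # First preset condition (assumed reading): find the smallest gray value such that it and
    # all higher ("subsequent") gray values each account for less than the third threshold.
    target_gray = None
    for g in range(255, -1, -1):
        if ratios[g:].max() < third_threshold:
            target_gray = g
        else:
            break

    # Second gray value range: pixels whose gray value is <= the target gray value.
    if target_gray is not None and ratios[:target_gray + 1].sum() > first_threshold:
        return float(target_gray)
    return float(pixels.mean())  # assumed fallback: average gray value of the target region

# Example usage with a synthetic, mostly dark target region.
rng = np.random.default_rng(0)
region = np.clip(rng.normal(40, 10, (60, 80)), 0, 255).astype(np.uint8)
brightness = current_brightness_from_histogram(region)
```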
In some embodiments, the image may be a raw image. The processor 1304 may be configured to identify an initial region including the target region from the image using an object detection model. The processor 1304 may also be configured to determine the target region by trimming the initial region according to a predetermined trimming strategy. The processor 1304 may further be configured to perform a downsampling on the target region. In some embodiments, the processor 1304 may perform a G-channel downsampling on the target region. The processor 1304 may determine the current brightness of the target region based on the gray information of the downsampled target region.
In some embodiments, the image may include a plurality of target regions. The processor 1304 may be configured to identify a target region with a largest size among the plurality of target regions, and designate a current brightness of the target region with the largest size as the current brightness. In some embodiments, the processor 1304 may be configured to obtain a position of each of the plurality of target regions in the image. The processor 1304 may also be configured to determine a deviation distance between each target region and a center of the image based on the position of the target region. The processor 1304 may be further configured to determine a weight for each target region based on the deviation distance of the target region, and determine the current brightness by performing a weighted average on the current brightness of each target region based on the weight of each target region.
In some embodiments, the processor 1304 may be configured to, if the current brightness of the target region is within a predetermined brightness range, determine the current brightness as the target brightness. If the current brightness of the target region is not within the predetermined brightness range, the processor 1304 may determine a target brightness based on the current brightness and a brightness mapping relationship. The brightness mapping relationship may be a correspondence between one or more target brightness values and one or more current brightness values. In some embodiments, if the current brightness of the target region is less than the minimum value of the predetermined brightness range, the processor 1304 may adjust the current brightness, e.g., increasing the current brightness, to obtain the target brightness. In some embodiments, if the current brightness of the target region is greater than the maximum value of the predetermined brightness range, the processor 1304 may adjust the current brightness, e.g., decreasing the current brightness, to obtain the target brightness.
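Merely for purposes of illustration, the following sketch shows one possible brightness mapping relationship. The predetermined brightness range and the mapping of out-of-range values to the nearest range boundary are assumptions; the disclosure permits any correspondence in which an underexposed region is brightened and an overexposed region is darkened.

```python
# Illustrative sketch of determining the target brightness from the current brightness.
def determine_target_brightness(current: float, low: float = 80.0, high: float = 160.0) -> float:
    """Return the target brightness of the target region given its current brightness."""
    if low <= current <= high:
        return current  # the current brightness is already within the predetermined range
    if current < low:
        return low      # assumed mapping: increase an underexposed region to the range minimum
    return high         # assumed mapping: decrease an overexposed region to the range maximum

# Example: a current brightness of 35 is mapped to the (assumed) range minimum of 80.
target = determine_target_brightness(35.0)
```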
Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.
Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment,” “an embodiment,” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.
Further, it will be appreciated by one skilled in the art that aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in an implementation combining software and hardware that may all generally be referred to herein as a “unit,” “module,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB. NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS).
Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, e.g., an installation on an existing server or mobile device.
Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the various embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, claimed subject matter may lie in less than all features of a single foregoing disclosed embodiment.
| Number | Date | Country | Kind |
|---|---|---|---|
| 201910568901.3 | Jun 2019 | CN | national |
| 202411402380.1 | Oct 2024 | CN | national |
This application claims priority to Chinese Application No. 202411402380.1, filed on Oct. 9, 2024 and is a continuation-in-part of U.S. patent application Ser. No. 17/643,185, filed on Dec. 8, 2021, which is a continuation of International Application No. PCT/CN2019/129747, filed on Dec. 30, 2019, which claims priority to Chinese Application No. 201910568901.3, filed on Jun. 27, 2019. Each of the above-related applications is hereby incorporated by reference.
| Number | Date | Country | |
|---|---|---|---|
| Parent | PCT/CN2019/129747 | Dec 2019 | WO |
| Child | 17643185 | US |
| Number | Date | Country | |
|---|---|---|---|
| Parent | 17643185 | Dec 2021 | US |
| Child | 19037213 | US |