The present disclosure relates to image processing technologies, in particular to an image inference method, a computer device, and a storage medium.
Generally, a deep learning model is trained under a machine learning framework and a hardware accelerator acting jointly. However, in application scenarios that require frequently performing inference on images, it is difficult for the trained deep learning model to respond to requirements quickly. Existing inference methods cannot perform inference at high speed and do not meet the requirements of high-speed production.
In order to provide a clearer understanding of the objects, features, and advantages of the present disclosure, the same are given with reference to the drawings and specific embodiments. It should be noted that the embodiments of the present disclosure and the features in the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a full understanding of the present disclosure. The present disclosure may be practiced otherwise than as described herein. The following specific embodiments are not to limit the scope of the present disclosure.
Unless defined otherwise, all technical and scientific terms herein have the same meanings as commonly understood by those skilled in the art. The terms used in the present disclosure are for the purpose of describing particular embodiments and are not intended to limit the present disclosure.
At block S1, the computer device receives an inference request for inferring an image, and detects whether the inference request is correct.
In one embodiment, the inferring of the image may refer to detecting defects in the image, recognizing one or more objects in the image, or performing other operations on the image.
In one embodiment, information contained in the inference request includes, but is not limited to, the image, a name of a neural network required to infer the image, and a format of the image.
In one embodiment, the computer device detects whether the inference request is correct by:
In one embodiment, the computer device pre-trains a plurality of neural network models, and each of the plurality of neural network models corresponds to a name. For example, a neural network model used for defect detection is named “Defect detection_A”. The plurality of neural network models may also include image recognition models. The image recognition model may be a neural network model that recognizes objects (e.g., people, characters, etc.) in images. In other embodiments, the plurality of neural network models may further include an image localization model, an image collocation model, and the like.
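The naming scheme described above can be sketched as a simple registry that maps each model name to its pre-trained model. This is an illustrative sketch only; the names other than "Defect detection_A" and the registry structure are assumptions, not the disclosure's API.

```python
# Hypothetical registry of pre-trained neural network models keyed by name.
# The values here are placeholder strings standing in for loaded models.
MODEL_REGISTRY = {
    "Defect detection_A": "defect-detection model",    # detects defects in images
    "Recognition_B": "image-recognition model",        # recognizes people, characters, etc.
    "Localization_C": "image-localization model",      # assumed further model type
}

def lookup_model(name):
    """Return the pre-trained model registered under `name`, or None."""
    return MODEL_REGISTRY.get(name)
```

A request naming an unregistered model can then be rejected simply because `lookup_model` returns `None`.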
It should be noted that the inference request may further include a request for detecting defects appearing in the image, a request for recognizing the image, and the like.
In one embodiment, if the detection results of the first detection, the second detection, and the third detection do not meet the corresponding conditions, the computer device confirms that the inference request is wrong, and receives updated inference requests until an inference request received by the computer device is found to be correct. The computer device performs subsequent blocks on the basis of a correct inference request.
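The three detections can be sketched as below. The disclosure does not specify what each detection checks, so the particular conditions (image present, model name registered, format supported) and the request's dictionary shape are assumptions for illustration.

```python
KNOWN_NETWORKS = {"Defect detection_A"}        # names of pre-trained models (assumed)
SUPPORTED_FORMATS = {"matrix", "jpeg", "png"}  # formats the device accepts (assumed)

def is_request_correct(request):
    """Apply three detections to an inference request, given as a dict
    with 'image', 'network_name', and 'format' keys.  All three must
    pass for the request to be considered correct."""
    first = request.get("image") is not None                # image is present
    second = request.get("network_name") in KNOWN_NETWORKS  # named model exists
    third = request.get("format") in SUPPORTED_FORMATS      # format is supported
    return first and second and third
```

When any detection fails, the device would reject the request and wait for an updated one, as described above.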
At block S2, when the inference request is correct, the computer device determines a target collocation scheme for the image according to the inference request and a preset weight table. The target collocation scheme includes a hardware accelerator in an idle state and an estimated time duration of inferring the image.
In one embodiment, the preset weight table includes: a name of each of a plurality of hardware accelerators (for example, CPU, GPU, TPU, VPU, etc.), a current usage state of each hardware accelerator, a name of each of a plurality of machine learning frameworks (for example, Tensorflow, OpenVINO, PyTorch, ONNX, etc.) supported by each hardware accelerator, a name of each of a plurality of neural networks loaded by each hardware accelerator, and an estimated time duration of each neural network loaded by each hardware accelerator. The usage state of each hardware accelerator may be an in-use state or an idle state. For example, as shown in
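One possible in-memory layout for such a weight table is sketched below. The field names, the specific accelerators and frameworks listed, and the duration values are illustrative assumptions; only the kinds of fields come from the description above.

```python
# Illustrative weight table: one row per hardware accelerator, holding its
# usage state and, per supported framework, the loaded networks with their
# estimated inference durations (in seconds).
WEIGHT_TABLE = [
    {"accelerator": "CPU", "state": "idle",
     "frameworks": {"Tensorflow": {"Defect detection_A": 1.2}}},
    {"accelerator": "GPU", "state": "in use",
     "frameworks": {"PyTorch": {"Defect detection_A": 0.3}}},
    {"accelerator": "VPU", "state": "idle",
     "frameworks": {"OpenVINO": {"Defect detection_A": 0.5}}},
]

def idle_accelerators(table):
    """Return the rows whose hardware accelerator is currently idle."""
    return [row for row in table if row["state"] == "idle"]
```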
In one embodiment, the determining of the target collocation scheme for the image according to the inference request and the preset weight table includes:
In other embodiments, the computer device may obtain a plurality of collocation schemes by exhausting all combinations of each of the plurality of hardware accelerators, each of the plurality of machine learning frameworks, and each of the plurality of neural networks; obtain the estimated time duration of each of the plurality of collocation schemes; save the plurality of collocation schemes; when the inference request for inferring an image is correct, select first collocation schemes, each of which includes a hardware accelerator in an idle state, from the plurality of collocation schemes; select second collocation schemes suitable for the image from the first collocation schemes; and determine the collocation scheme corresponding to the shortest estimated time duration from the second collocation schemes as the target collocation scheme.
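The selection described above (enumerate combinations, keep those with an idle accelerator that load the requested network, take the shortest estimated duration) can be sketched as follows, assuming a weight table shaped like the earlier illustrative example:

```python
SAMPLE_TABLE = [
    {"accelerator": "CPU", "state": "idle",
     "frameworks": {"Tensorflow": {"Defect detection_A": 1.2}}},
    {"accelerator": "GPU", "state": "in use",
     "frameworks": {"PyTorch": {"Defect detection_A": 0.3}}},
    {"accelerator": "VPU", "state": "idle",
     "frameworks": {"OpenVINO": {"Defect detection_A": 0.5}}},
]

def choose_target_scheme(table, network_name):
    """Enumerate (accelerator, framework, network) collocation schemes,
    filter to idle accelerators that have the requested network loaded,
    and return the scheme with the shortest estimated duration (or None
    if no accelerator is currently suitable)."""
    schemes = []
    for row in table:
        if row["state"] != "idle":
            continue  # first filter: accelerator must be idle
        for framework, networks in row["frameworks"].items():
            if network_name in networks:  # second filter: network available
                schemes.append({
                    "accelerator": row["accelerator"],
                    "framework": framework,
                    "network": network_name,
                    "estimated": networks[network_name],
                })
    return min(schemes, key=lambda s: s["estimated"], default=None)
```

Note that in this sample the fastest scheme overall (GPU, 0.3 s) is skipped because its accelerator is in use, so the idle VPU scheme (0.5 s) is chosen.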
In other embodiments, each of the plurality of collocation schemes may include more than one hardware accelerator, and the hardware accelerators can perform inference on the image in parallel. Accordingly, the target collocation scheme may include more than one hardware accelerator that perform inference on the image in parallel, thereby further improving the efficiency of inferring an image.
In one embodiment, when there is no hardware accelerator in an idle state during the calculation of the target collocation scheme, the computer device suspends the calculation of the target collocation scheme until there is at least one hardware accelerator in the idle state. It should be noted that the usage state of the hardware accelerator mentioned at block S4 will be updated to the idle state after the inference of the image is completed.
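The suspend-until-idle behavior could be realized with a condition variable, as in the minimal sketch below. The disclosure does not specify a mechanism; the class name and the use of `threading.Condition` are assumptions.

```python
import threading

class AcceleratorPool:
    """Sketch of suspending scheme calculation until an accelerator is idle."""

    def __init__(self, states):
        self._states = dict(states)        # accelerator name -> "idle" / "in use"
        self._cond = threading.Condition()

    def acquire_idle(self):
        """Block until some accelerator is idle, mark it in use, return its name."""
        with self._cond:
            while True:
                for name, state in self._states.items():
                    if state == "idle":
                        self._states[name] = "in use"
                        return name
                self._cond.wait()          # suspend until release() wakes us

    def release(self, name):
        """As at block S4: mark the accelerator idle again and wake waiters."""
        with self._cond:
            self._states[name] = "idle"
            self._cond.notify_all()
```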
At block S3, the computer device updates a usage state of the hardware accelerator included in the target collocation scheme from the idle state to an in-use state, and infers the image according to the target collocation scheme.
In one embodiment, the inferring of the image according to the target collocation scheme includes:
For example, when the target collocation scheme indicates that the hardware accelerator is the VPU, the machine learning framework is OpenVINO, and the neural network is “Defect Detection_A”, the computer device first updates the usage state of the VPU to the in-use state, converts the format of the image from the matrix format into the format required by OpenVINO (for example, by binarizing the image), and then performs defect detection on the converted image by using “Defect Detection_A”.
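The example above can be sketched as a small dispatch function. The converter table, the binarization threshold of 127, and the stand-in "network" (which simply counts binarized pixels) are illustrative assumptions; a real system would invoke the actual framework and model.

```python
def infer_with_scheme(image, scheme):
    """Walk through block S3 for the example above: mark the scheme's
    accelerator in use, convert the image (a matrix of pixel values) to
    the format required by the scheme's framework, then run the network.
    The converter and the network stand-in are illustrative only."""
    converters = {
        # assumed: OpenVINO's required format here is a binarized matrix
        "OpenVINO": lambda img: [[1 if px > 127 else 0 for px in row]
                                 for row in img],
    }
    scheme["state"] = "in use"                       # update usage state
    converted = converters[scheme["framework"]](image)
    # stand-in for running "Defect Detection_A" on the converted image
    return sum(map(sum, converted))
```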
At block S4, when the inferring of the image is completed, the computer device updates the usage state of the hardware accelerator included in the target collocation scheme from the in-use state to the idle state, obtains an actual time duration of inferring the image, and updates the estimated time duration included in the target collocation scheme to be the actual time duration when the estimated time duration is not equal to the actual time duration.
In one embodiment, the actual time duration of inferring the image refers to a time duration between a first time point and a second time point. The first time point is a time point at which the computer device begins to determine the required format, and the second time point is a time point at which the computer device obtains a result of inferring the image. It should be noted that the first time point and the second time point can be defined to be other suitable time points. In one embodiment, it is assumed that the target collocation scheme indicates that the hardware accelerator is “VPU”, the machine learning framework is “OpenVINO”, the neural network is “Defect Detection_A”, and the estimated time duration equals 0.5 seconds. If the actual time duration of inferring the image equals 0.4 seconds, which is different from the estimated time duration, the computer device updates the estimated time duration of the target collocation scheme to be the actual time duration, i.e., 0.4 seconds, so as to provide a more accurate estimated time duration for a next use of this target collocation scheme.
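The measurement and update step can be sketched as follows, using a monotonic clock between the two time points described above. The function and field names are illustrative assumptions.

```python
import time

def run_and_update(scheme, infer_fn):
    """Measure the actual duration between the first time point (format
    determination begins) and the second time point (result obtained),
    write it back into the scheme when it differs from the estimate, and
    release the accelerator as at block S4."""
    start = time.monotonic()            # first time point
    result = infer_fn()                 # format conversion + inference
    actual = time.monotonic() - start   # second time point reached
    if actual != scheme["estimated"]:
        scheme["estimated"] = actual    # fresher estimate for next use
    scheme["state"] = "idle"            # accelerator released
    return result
```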
The inference efficiency improvement method provided by the present disclosure is aimed at production fields that demand high prediction speed. It makes full use of all software and hardware resources, breaks through the limitation of a single hardware accelerator paired with a single machine learning framework, and maximizes the use of software and hardware resources, which improves inference throughput while maintaining the advantages of each machine learning framework. It can also dynamically update the actual time duration of inferring an image, effectively improving the inference efficiency of images.
It should be understood that the described embodiments are for illustrative purposes only, and the scope of the claims is not limited by this structure.
In at least one embodiment, the computer device 3 may include a terminal that is capable of automatically performing numerical calculations and/or information processing in accordance with pre-set or stored instructions. The hardware of the terminal can include, but is not limited to, a microprocessor, an application-specific integrated circuit, programmable gate arrays, digital processors, and embedded devices.
It should be noted that the computer device 3 is merely an example; other existing or future electronic products that may be adapted to the present disclosure are also included within its scope.
In some embodiments, the storage device 31 can be used to store program codes of computer readable programs and various data, such as an image inference system 30 installed in the computer device 3, and to provide high-speed automatic access to programs or data during the running of the computer device 3. The storage device 31 can include a read-only memory (ROM), a random access memory (RAM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), a one-time programmable read-only memory (OTPROM), an electronically-erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other storage medium readable by the computer device 3 that can be used to carry or store data.
In some embodiments, the at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or multiple integrated circuits with the same or different functions. The at least one processor 32 can include one or more central processing units (CPUs), a microprocessor, a digital processing chip, a graphics processor, and various control chips. The at least one processor 32 is the control unit of the computer device 3, which connects the various components of the computer device 3 using various interfaces and lines. By running or executing computer programs or modules stored in the storage device 31, and by invoking the data stored in the storage device 31, the at least one processor 32 can perform various functions of the computer device 3 and process data of the computer device 3. For example, the processor 32 may perform the function of inferring an image shown in
In some embodiments, the image inference system 30 operates in the computer device 3. The image inference system 30 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the image inference system 30 can be stored in the storage device 31 of the computer device 3 and executed by the at least one processor 32 to achieve the blocks of the method as shown in
In this embodiment, the image inference system 30 can be divided into a plurality of functional modules. For example, the image inference system 30 can include a request receiving module 301, a weight calculation module 302, a format conversion module 303, and an inference module 304. A “module” means a series of computer program segments that are stored in the storage device 31, can be executed by the at least one processor 32, and perform fixed functions.
In one embodiment, the request receiving module 301 receives the inference request for inferring the image and detects whether the inference request is correct. When the inference request is correct, the request receiving module 301 sends the inference request to the weight calculation module 302. The weight calculation module 302 determines the target collocation scheme for the image according to the inference request and the preset weight table, updates the usage state of the hardware accelerator in the target collocation scheme from the idle state to the in-use state, and sends the inference request and the target collocation scheme to the format conversion module 303. The format conversion module 303 determines the format required by the machine learning framework comprised in the target collocation scheme, obtains the converted image by converting the format of the image into the required format, and sends the converted image and the target collocation scheme to the inference module 304. The inference module 304 infers the converted image by using the neural network comprised in the target collocation scheme. When the inferring of the image is completed, the inference module 304 updates the usage state of the hardware accelerator in the target collocation scheme from the in-use state to the idle state, obtains the actual time duration of inferring the image, and updates the estimated time duration to be the actual time duration.
The program codes are stored in the storage device 31, and the at least one processor 32 may invoke the program codes stored in the storage device 31 to perform the related functions. The program codes stored in the storage device 31 can be executed by the at least one processor 32, so as to realize the function of each module for inferring an image as shown in
In one embodiment of this application, the storage device 31 stores at least one instruction, and the at least one instruction is executed by the at least one processor 32 for the purpose of inferring an image as shown in
Although not shown, the computer device 3 may further include a power supply (such as a battery) for powering the various components. Preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so that the power management device manages functions such as charging, discharging, and power consumption. The power supply may include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The computer device 3 may further include other components, such as a BLUETOOTH module, a WI-FI module, and the like; details are not described herein.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed devices and methods can be implemented by other means. For example, the device embodiments described above are only schematic; the division of the modules is only a logical function division, which can be implemented in other ways.
The modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical units, that is, may be located in one place, or may be distributed over multiple network units. Part or all of the modules can be selected according to the actual needs to achieve the purpose of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure can be integrated into one processing unit, or can be physically present separately in each unit, or two or more units can be integrated into one unit. The above integrated unit can be implemented in a form of hardware or in a form of a software functional unit.
The above integrated modules implemented in the form of software function modules may be stored in a storage medium. The function modules stored in the storage medium include several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute the method described in the embodiments of the present disclosure.
The present disclosure is not limited to the details of the above-described exemplary embodiments, and the present disclosure can be embodied in other specific forms without departing from the spirit or essential characteristics of the present disclosure. Therefore, the present embodiments are to be considered as illustrative and not restrictive, and the scope of the present disclosure is defined by the appended claims. All changes and variations in the meaning and scope of equivalent elements are included in the present disclosure. Any reference sign in the claims should not be construed as limiting the claim. Furthermore, the word “comprising” does not exclude other units, nor does the singular exclude the plural. A plurality of units or devices stated in the system claims may also be implemented by one unit or device through software or hardware. Words such as “first” and “second” are used to indicate names but not to signify any particular order.
The above description presents only embodiments of the present disclosure and is not intended to limit the present disclosure; various modifications and changes can be made to the present disclosure. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present disclosure are intended to be included within the scope of the present disclosure.
Number | Date | Country | Kind
---|---|---|---
202210652112.X | Jun 2022 | CN | national