The present disclosure relates to an electronic device, and more specifically to a method and an electronic device for efficiently reducing the dimensions of an image frame.
A flagship computing device usually uses a high-resolution image (e.g., 1600×1600) to achieve better accuracy when performing Artificial Intelligence (AI) based use cases such as segmentation, matting, night mode, depth estimation, deblurring, live focus for photos, etc. It is difficult to perform the AI based use cases using such a high-resolution image on a mid-tier/low-end computing device due to high memory and computational requirements.
An existing method allows the mid-tier/low-end computing device to down-sample the high-resolution image to half or a quarter of its actual resolution, enabling the AI based use cases on the mid-tier/low-end computing device and achieving the desired Key Performance Indicator (KPI). Even though the down-sampling operation reduces computation, memory requirements, and communication bandwidth, it removes salient information from the down-sampled image as compared to the high-resolution image, which results in loss of accuracy.
As shown, the mid-tier/low-end computing device (30) converts the high-resolution image with dimension 1600×1600×4 (11) to a low-resolution image with dimension 800×800×4 (15), which results in loss of significant features from the high-resolution image with dimension 1600×1600×4 (11). 800, 800, and 4 represent the height, width, and number of channels of the low-resolution image in the spatial domain, respectively. Further, the mid-tier/low-end computing device (30) generates a segmented output (16) from the low-resolution image with dimension 800×800×4 (15) by performing the segmentation using the DNN (12). The inference time for generating the segmented output (16) is 90 ms, which is near the inference time taken by the flagship computing device. The segmented output (16) meets the desired KPI, but has poor accuracy. Since reducing the resolution results in poor accuracy, it is hard to enable the AI based use case on the mid-tier/low-end computing device.
The existing method allows the mid-tier/low-end computing device to reduce the complexity of the neural network (e.g., a DNN) used for processing the high-resolution image for the AI based use cases. A few existing layers are removed from the neural network to reduce its complexity, which also results in accuracy degradation and hence poor use-case performance as compared to the results of flagship computing devices. Thus, it is desired to provide a useful solution for processing the image for the AI based use cases while keeping the desired KPI and an acceptable level of accuracy.
Embodiments of the disclosure may provide a method and an electronic device for efficiently reducing the dimensions of an image frame for AI based use cases with lower inference time while keeping the desired accuracy and KPI.
Embodiments of the disclosure may transform the image frame in the Red Green Blue (RGB)/spatial domain to a low-resolution image in a non-spatial domain and thereby filter out irrelevant/less-informative channels of the image frame in the non-spatial domain, which results in dimensionality reduction of the image frame and thereby reduces the computation and memory requirements of the AI based use cases.
Embodiments of the disclosure may select the most informative channels from among the channels of the image frame in the non-spatial domain and ignore the rest of the channels. Thus, the electronic device can execute the AI based use cases faster while achieving better accuracy as compared to existing methods, as the electronic device operates on the high-resolution image without increasing processing time or network complexity.
Embodiments of the disclosure may provide a generic stub layer as a simple plug-and-play block to be embedded with a neural network, bypassing insignificant input layers of the neural network, so that the neural network can process the transformed image frame in the non-spatial domain without changing/retraining/redesigning the existing architecture of the neural network.
Accordingly, various example embodiments of the disclosure provide a method for efficiently reducing dimensions of an image frame by an electronic device. The method includes: receiving, by the electronic device, the image frame; transforming, by the electronic device, the image frame from a spatial domain comprising a first plurality of channels to a non-spatial domain comprising a second plurality of channels, wherein a number of the second plurality of channels is greater than a number of the first plurality of channels; removing, by the electronic device, channels comprising irrelevant information from among the second plurality of channels using an Artificial Intelligence (AI) engine to generate a low-resolution image frame in the non-spatial domain; and providing, by the electronic device, the low-resolution image frame to a neural network for faster and more accurate inference of the image frame.
In an example embodiment, a Discrete Cosine Transformation (DCT) or a Fourier transformation may be performed on the image frame by the electronic device to transform the image frame from the spatial domain to the non-spatial domain.
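For illustration only, a minimal sketch of such a transformation is given below; it assumes an 8×8 block-wise DCT-II implemented with SciPy, which is one possible realization and is not mandated by the disclosure.

```python
# Minimal sketch, assuming an 8x8 block-wise DCT-II (SciPy); the block
# size and library choice are illustrative, not part of the disclosure.
import numpy as np
from scipy.fft import dctn

def block_dct(channel: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply a 2-D DCT to each non-overlapping block of a single channel."""
    h, w = channel.shape
    out = np.empty((h, w), dtype=np.float64)
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y + block, x:x + block] = dctn(
                channel[y:y + block, x:x + block], norm="ortho")
    return out
```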
In an example embodiment, a generic stub layer may be embedded at an input of the neural network for compatibility of the neural network in receiving the low-resolution image frame, where the generic stub layer bypasses input layers of the neural network that are relevant only for the image frame in the spatial domain.
In an example embodiment, the non-spatial domain comprises one of a Luminance, Red difference, Blue difference (Y, Cr, Cb) domain, a Hue, Saturation, Value (H, S, V) domain, or a Luminance, Chrominance (YUV) domain.
In an example embodiment, transforming, by the electronic device, the image frame from the spatial domain comprising the first plurality of channels to the non-spatial domain comprising the second plurality of channels, where the number of the second plurality of channels is greater than the number of the first plurality of channels, comprises: transforming, by the electronic device, the image frame from the spatial domain to the non-spatial domain with the first plurality of channels; and grouping, by the electronic device, components of the transformed image frame with the same frequency into a channel of the second plurality of channels while preserving the spatial position information of each component.
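For illustration only, a minimal sketch of the grouping step is given below, assuming 8×8 block-DCT coefficients and image dimensions divisible by the block size; the reshape sends the (u, v) coefficient of every block to one channel, so each channel retains the blocks' spatial layout.

```python
# Minimal sketch: group same-frequency DCT components into channels while
# preserving each component's spatial (block) position. The 8x8 block size
# is an illustrative assumption.
import numpy as np

def group_frequencies(coeffs: np.ndarray, block: int = 8) -> np.ndarray:
    """(H, W, C) coefficients -> (H/block, W/block, C * block * block)."""
    h, w, c = coeffs.shape
    x = coeffs.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 4, 1, 3)          # -> (H/b, W/b, C, b, b)
    return x.reshape(h // block, w // block, c * block * block)
```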
In an example embodiment, removing, by the electronic device, the channels comprising the irrelevant information from among the second plurality of channels using the AI engine to generate the low-resolution image frame in the non-spatial domain comprises: generating, by the electronic device, a tensor by performing a depth-wise convolution and an average pool on each channel of the second plurality of channels; adding, by the electronic device, two trainable parameters to each component of the tensor; determining, by the electronic device, values of the two trainable parameters using the AI engine; determining, by the electronic device, a binary value for each component of the tensor based on the values of the two trainable parameters; performing, by the electronic device, an elementwise product between the second plurality of channels and the binary values of the components of the tensor; filtering, by the electronic device, the channels with non-zero values from among the second plurality of channels upon performing the elementwise product; and generating, by the electronic device, the low-resolution image frame in the non-spatial domain using the filtered channels.
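For illustration only, one possible reading of this gating step is sketched below in PyTorch; interpreting the two trainable parameters as a per-channel scale and offset is an assumption, and the hard threshold would need a straight-through estimator or a sigmoid relaxation during training (omitted here for brevity).

```python
# Sketch only: an assumed reading of the channel-gating step (PyTorch).
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depth-wise convolution: one 3x3 filter per frequency channel.
        self.dw = nn.Conv2d(channels, channels, kernel_size=3,
                            padding=1, groups=channels)
        self.pool = nn.AdaptiveAvgPool2d(1)   # average pool per channel
        # Two trainable parameters per component (assumed: scale, offset).
        self.alpha = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) in the non-spatial domain.
        score = self.pool(self.dw(x)).flatten(1)             # -> (N, C)
        gate = (self.alpha * score + self.beta > 0).float()  # binary 0/1
        # The elementwise product zeroes out the irrelevant channels.
        return x * gate[:, :, None, None]
```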
Accordingly, various example embodiments herein provide an electronic device for efficiently reducing the dimensions of an image frame. The electronic device includes: an image frame inferencing engine comprising executable program instructions; a memory; and a processor, wherein the image frame inferencing engine is coupled to the memory and the processor. The image frame inferencing engine is configured to: receive the image frame; transform the image frame from the spatial domain comprising the first plurality of channels to the non-spatial domain comprising the second plurality of channels, wherein the number of the second plurality of channels is greater than the number of the first plurality of channels; remove the channels comprising irrelevant information from among the second plurality of channels using an artificial intelligence (AI) engine to generate the low-resolution image frame in the non-spatial domain; and provide the low-resolution image frame to the neural network for faster and more accurate inference of the image frame.
These and other aspects of the various example embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating various example embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the disclosure, and the embodiments herein include all such modifications.
In the accompanying drawings, like reference letters indicate corresponding parts in the various figures. Further, the above and other aspects, features and advantages of certain embodiments of the present disclosure will be more apparent from the following detailed description, taken in conjunction with the accompanying drawings, in which:
The example embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting examples that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques may be omitted so as to not unnecessarily obscure the description herein. The various example embodiments described herein are not necessarily mutually exclusive, as various embodiments can be combined with one or more other embodiments to form new embodiments. The term “or” as used herein, refers to a non-exclusive or, unless otherwise indicated. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein can be practiced and to further enable those skilled in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
Various example embodiments may be described and illustrated in terms of blocks which carry out a described function or functions. These blocks, which may be referred to herein as managers, units, modules, hardware components or the like, are physically implemented by analog and/or digital circuits such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits and the like, and may optionally be driven by firmware. The circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like. The circuits of a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware to perform some functions of the block and a processor to perform other functions of the block. Each block of the various embodiments may be physically separated into two or more interacting and discrete blocks without departing from the scope of the disclosure. Likewise, the blocks of the various embodiments may be physically combined into more complex blocks without departing from the scope of the disclosure.
The accompanying drawings are used to help easily understand various technical features and it should be understood that the embodiments presented herein are not limited by the accompanying drawings. As such, the present disclosure should be construed to extend to any alterations, equivalents and substitutes in addition to those which are particularly set out in the accompanying drawings. Although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are generally used simply to distinguish one element from another.
Throughout this disclosure, the terms “image frame” and “image” are used interchangeably and refer to the same feature.
Accordingly, the various example embodiments herein provide a method for efficiently reducing dimensions of an image frame by an electronic device. The method includes receiving, by the electronic device, the image frame. The method includes transforming, by the electronic device, the image frame from a spatial domain comprising a first plurality of channels to a non-spatial domain comprising a second plurality of channels, wherein a number of the second plurality of channels is greater than a number of the first plurality of channels. The method includes removing, by the electronic device, channels comprising irrelevant information from among the second plurality of channels using an Artificial Intelligence (AI) engine to generate a low-resolution image frame in the non-spatial domain. The method includes providing, by the electronic device, the low-resolution image frame to a neural network for faster and more accurate inference of the image frame.
Accordingly, the various example embodiments herein provide the electronic device for efficiently reducing the dimensions of an image frame. The electronic device includes an Image Frame Inferencing Engine (IFIE) including various circuitry and/or executable program instructions, a memory, and a processor, where the image frame inferencing engine is coupled to the memory and the processor. The image frame inferencing engine is configured to receive the image frame. The image frame inferencing engine is configured to transform the image frame from the spatial domain comprising the first plurality of channels to the non-spatial domain comprising the second plurality of channels, wherein the number of the second plurality of channels is greater than the number of the first plurality of channels. The image frame inferencing engine is configured to remove the channels comprising irrelevant information from among the second plurality of channels using the AI engine to generate the low-resolution image frame in the non-spatial domain. The image frame inferencing engine is configured to provide the low-resolution image frame to the neural network for faster and more accurate inference of the image frame.
Unlike existing methods and systems, the electronic device may efficiently reduce the dimensions of the image frame for AI based use cases with lower inference time while keeping the desired accuracy and KPI.
Unlike existing methods and systems, the electronic device may transform the image frame in the Red Green Blue (RGB)/spatial domain to a low-resolution image in the non-spatial domain and thereby filter out irrelevant/less-informative channels of the image frame in the non-spatial domain, which results in dimensionality reduction of the image frame and thereby reduces the computation and memory requirements of the AI based use cases.
Unlike existing methods and systems, the electronic device may select the most informative channels from among the channels of the image frame in the non-spatial domain and ignore the rest. Thus, the electronic device can execute the AI based use cases faster while achieving better accuracy as compared to conventional systems, as the electronic device operates on the high-resolution image without increasing processing time or network complexity.
Unlike existing methods and systems, the electronic device may use a generic stub layer as a simple plug-and-play block to be embedded with the neural network, bypassing insignificant input layers of the neural network, so that the neural network can process the transformed image frame in the non-spatial domain without changing/retraining/redesigning the existing architecture of the neural network.
Unlike existing methods and systems, the electronic device may transform the image frame in the spatial domain to the low-resolution image frame in the non-spatial domain using a Discrete Cosine Transform (DCT), which changes the dimensions of the image frame in the spatial domain. The dimensionality of the image frame in the non-spatial domain fed to the network is drastically reduced in height (H) and width (W) by a factor of 'X'. The resulting shape of the non-spatial-domain input, in which the number of channels (depth) is increased by a factor of X², is a hardware-accelerator-friendly (e.g., DSP/NPU) format and hence is very compute-effective.
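As a worked example of this shape change (the block size X = 8 and the 1600×1600×4 input are illustrative assumptions):

```python
# Worked shape arithmetic: H and W shrink by X, depth grows by X^2.
H, W, C, X = 1600, 1600, 4, 8
out_shape = (H // X, W // X, C * X * X)
print(out_shape)  # (200, 200, 256)
```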
The electronic device may identify irrelevant features/channels in the low-resolution image frame using a novel deep learning engine (e.g., the AI engine), thereby reducing the dimensions of the transformed image frame and hence drastically reducing the computation and memory requirements as well as the data transfer bandwidth. Since the relevant information is preserved in the transformed image frame, the disclosed method achieves better accuracy as compared to processing the image frame in the spatial domain. Thus, the disclosed method enables flagship-compatible AI based use cases having high input resolution on low-end/mid-tier computing devices with better accuracy and lower computation/memory requirements. Additionally, since the input image is transformed to the non-spatial domain, the electronic device can operate on high-resolution images more easily than spatial-domain methods. The disclosure results in accuracy improvements on flagship computing devices, and accuracy, performance, and power-saving benefits on low-end/mid-tier computing devices.
Referring now to the drawings, and more particularly to
The IFIE (110) receives the image frame from a source such as the memory (120), a device connected to the electronic device (100), or the camera sensor of the electronic device (100), where the image frame is created in a spatial domain including a first plurality of channels. Further, the IFIE (110) transforms the image frame from the spatial domain to a non-spatial domain including a second plurality of channels, where the number of the second plurality of channels is greater than the number of the first plurality of channels. In an example, the IFIE (110) may perform a Discrete Cosine Transformation (DCT) or a Fourier transformation on the image frame to transform the image frame from the spatial domain to the non-spatial domain. In an embodiment, the non-spatial domain can be a Luminance, Red difference, Blue difference (Y, Cr, Cb) domain, a Hue, Saturation, Value (H, S, V) domain, or a Luminance, Chrominance (YUV) domain. In an embodiment, for transforming the image frame from the spatial domain to the non-spatial domain, the IFIE (110) transforms the image frame from the spatial domain to the non-spatial domain with the first plurality of channels. Further, the IFIE (110) groups components of the transformed image frame with the same frequency into a channel of the second plurality of channels while preserving the spatial position information of each component.
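For illustration, the hypothetical block_dct and group_frequencies helpers sketched above can be tied together as an end-to-end shape check (8×8 blocks and a 1600×1600×4 input assumed):

```python
# End-to-end shape check using the illustrative helpers sketched earlier.
import numpy as np

rgb = np.random.rand(1600, 1600, 4)                        # spatial domain
coeffs = np.stack([block_dct(rgb[..., c]) for c in range(4)], axis=-1)
freq = group_frequencies(coeffs)                           # non-spatial domain
print(freq.shape)                                          # (200, 200, 256)
```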
The IFIE (110) removes channels including irrelevant information from among the second plurality of channels using the Artificial Intelligence (AI) engine (150) to generate a low-resolution image frame in the non-spatial domain. In an embodiment, for removing the channels including the irrelevant information from among the second plurality of channels, the IFIE (110) generates a tensor by performing a depth-wise convolution and an average pool on each channel of the second plurality of channels. Further, the IFIE (110) adds two trainable parameters to each component of the tensor. Further, the IFIE (110) determines values of the two trainable parameters using the AI engine (150). Further, the IFIE (110) determines a binary value for each component of the tensor based on the values of the two trainable parameters. Further, the IFIE (110) performs an elementwise product between the second plurality of channels and the binary values of the components of the tensor. Further, the IFIE (110) filters the channels with non-zero values from among the second plurality of channels upon performing the elementwise product. Further, the IFIE (110) generates the low-resolution image frame in the non-spatial domain using the filtered channels.
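At inference time, the channels zeroed by the gate can simply be dropped; a minimal sketch is given below, assuming PyTorch and a per-channel binary gate derived from the trained parameters (the derivation itself is not shown).

```python
# Sketch only: drop gated-off channels so only C' channels move on.
import torch

def drop_gated_channels(x: torch.Tensor, gate: torch.Tensor):
    """x: (N, C, H, W); gate: (C,) binary values from the trained gate."""
    keep = gate.bool()                    # True for channels to retain
    return x[:, keep], int(keep.sum())    # (N, C', H, W) and C'
```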
The IFIE (110) provides the low-resolution image frame to the neural network (e.g., a Deep Neural Network (DNN)) of the electronic device (100) for faster and more accurate inference of the image frame. In an embodiment, a generic stub layer is embedded at an input of the neural network for compatibility of the neural network in receiving the low-resolution image frame, where the generic stub layer bypasses input layers of the neural network that are relevant only for the image frame in the spatial domain.
The memory (120) stores the image frame, the neural network, and an AI model. The memory (120) stores instructions to be executed by the processor (130). The memory (120) may include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories. In addition, the memory (120) may, in some examples, be considered a non-transitory storage medium. The term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted that the memory (120) is non-movable. In some examples, the memory (120) can be configured to store larger amounts of information than its storage space. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache). The memory (120) can be an internal storage unit or it can be an external storage unit of the electronic device (100), a cloud storage, or any other type of external storage.
The processor (130) may include various processing circuitry and may be configured to execute instructions stored in the memory (120). The processor (130) may include one or a plurality of processors. The processor (130) may be a general-purpose processor, such as a Central Processing Unit (CPU), an Application Processor (AP), or the like, or a graphics-only processing unit such as a Graphics Processing Unit (GPU), a Visual Processing Unit (VPU), and the like. The processor (130) may include multiple cores to execute the instructions. The communicator (140) may include various communication circuitry and may be configured for communicating internally between hardware components in the electronic device (100). Further, the communicator (140) is configured to facilitate communication between the electronic device (100) and other devices via one or more networks (e.g., radio technology). The communicator (140) includes an electronic circuit specific to a standard that enables wired or wireless communication.
At least one of a plurality of modules may be implemented through the AI engine (150). A function associated with the AI engine (150) may be performed through the non-volatile memory, the volatile memory, and the processor. The one or a plurality of processors control the processing of the input data in accordance with a predefined operating rule or the AI engine (150) stored in the non-volatile memory and the volatile memory. The predefined operating rule or the AI engine (150) is provided through training or learning. Here, being provided through learning may refer, for example, to a predefined (e.g., specified) operating rule or AI engine (150) of a desired characteristic being made by applying a learning method to a plurality of learning data. The learning may be performed in the electronic device (100) itself in which the AI engine (150) according to an embodiment is performed, and/or may be implemented through a separate server/system. The AI engine (150) may include a plurality of neural network layers. Each layer has a plurality of weight values, and performs a layer operation based on the calculation result of a previous layer and the plurality of weights. Examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), generative adversarial networks (GAN), and deep Q-networks. The learning method is a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction. Examples of the learning method include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
Although
The various actions, acts, blocks, steps, or the like in the flow diagram (400) may be performed in the order presented, in a different order, or simultaneously. Further, in various embodiments, some of the actions, acts, blocks, steps, or the like may be omitted, added, modified, skipped, or the like without departing from the scope of the disclosure.
As shown in
A channel in the non-spatial domain that is multiplied by 0 is trimmed or ignored. A channel in the non-spatial domain that is multiplied by 1 is retained. At 606, the electronic device (100) combines the channels in the non-spatial domain that are retained to form a second tensor (e.g., the low-resolution image frame) of dimension H×W×C′. H, W, and C′ represent the height, width, and number of channels of the second tensor, respectively. C′ is always much smaller than C. Since only C′ channels are transmitted to the next stage of the DNN or the neural network, the disclosed method results in better performance and reduced memory data transfers. Also, since the C′ channels encapsulate the most relevant information, the disclosed method also results in increased accuracy as compared to the traditional spatial method.
Consider an example of an existing neural network with layers (701-704). The layer (701) of the neural network is useful for processing the image frame in the spatial domain, but may not be useful for processing the second tensor. The generic stub layer (705) bypasses the layer (701) of the neural network and connects to the second layer (702) of the neural network for compatibility of the neural network in receiving the second tensor (e.g., the low-resolution image frame) of dimension H×W×C′. The generic stub layer (705) receives the second tensor of dimension H×W×C′ and, after processing it using the layers of the generic stub layer (705), provides it to the second layer (702) of the neural network. With the help of the generic stub layer (705), the disclosed method is easily adaptable to existing neural networks/DNNs without modifying the network architecture or retraining the layers of the existing neural networks/DNNs.
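For illustration only, one way such a generic stub layer could be realized is sketched below in PyTorch; the 1×1 adapter convolution and the way the backbone is split are assumptions, since the disclosure only requires that the stub maps the H×W×C′ tensor to the input expected by the layer that follows the bypassed one.

```python
# Sketch only: a plug-and-play stub that bypasses the spatial-only first
# layer (701) and feeds the second layer (702) of an existing backbone.
import torch
import torch.nn as nn

class StubLayer(nn.Module):
    def __init__(self, c_prime: int, c_expected: int):
        super().__init__()
        # A 1x1 convolution adapts the C' frequency channels to the channel
        # count that the second backbone layer expects.
        self.adapt = nn.Conv2d(c_prime, c_expected, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapt(x)

def wrap_backbone(backbone: nn.Sequential, c_prime: int, c_expected: int):
    """Prepend the stub and skip the backbone's first (spatial) layer."""
    return nn.Sequential(StubLayer(c_prime, c_expected), *list(backbone)[1:])
```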
The various example embodiments disclosed herein can be implemented using at least one hardware device and performing network management functions to control the elements.
While the disclosure has been illustrated and described with reference to various example embodiments, it will be understood that the various example embodiments are intended to be illustrative, not limiting. It will be further understood by those skilled in the art that various changes in form and detail may be made without departing from the true spirit and full scope of the disclosure, including the appended claims and their equivalents. It will also be understood that any of the embodiment(s) described herein may be used in conjunction with any other embodiment(s) described herein.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 202241006331 | Feb 2022 | IN | national |
This application is a bypass continuation application of International Application No. PCT/KR2022/015277 designating the United States, filed on Oct. 11, 2022, in the Korean Intellectual Property Receiving Office and claiming priority to Indian Patent Application No. IN202241006331, filed Feb. 7, 2022, in the Indian Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entireties.
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/KR2022/015277 | Oct 2022 | US |
| Child | 17978458 | | US |