In the process of reconstructing a three-dimensional model based on a two-dimensional image, features of the image are first acquired by a deep neural network, regression is then performed on the image features to obtain parameters of the three-dimensional model, and the three-dimensional model is reconstructed based on the obtained parameters.
The disclosure relates to the technical field of image processing, and particularly to a three-dimensional model generation method and apparatus, a neural network generation method and apparatus, an electronic device and a computer-readable storage medium.
Embodiments of the disclosure provide at least a three-dimensional model generation method and apparatus, a neural network generation method and apparatus, an electronic device and a computer-readable storage medium.
In a first aspect, an embodiment of the disclosure provides a three-dimensional model generation method, which includes: acquiring first sphere position information of each first sphere of multiple first spheres in a camera coordinate system based on a first image including a first object, where the multiple first spheres are configured to represent different parts of the first object respectively; generating a first rendered image based on the first sphere position information of the multiple first spheres; obtaining gradient information of the first rendered image based on the first rendered image and a semantically segmented image of the first image; and adjusting the first sphere position information of the multiple first spheres based on the gradient information of the first rendered image, and generating a three-dimensional model of the first object by utilizing the adjusted first sphere position information of the multiple first spheres.
Therefore, image rendering is performed on the first sphere position information of the multiple first spheres representing the three-dimensional model, the gradient information capable of representing the degree of correctness of the first sphere position information of the multiple first spheres is determined based on the result of rendering the first image, and the first sphere position information corresponding to the first spheres is readjusted respectively based on the gradient information, such that the adjusted first sphere position information has higher accuracy, i.e., the three-dimensional model recovered based on the first sphere position information corresponding to the first spheres respectively also has higher accuracy.
In an implementation, the operation of generating the first rendered image based on first sphere position information of the multiple first spheres includes: determining first three-dimensional position information of each vertex of multiple patches forming each first sphere in a camera coordinate system respectively based on the first sphere position information; and generating the first rendered image based on the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system.
Therefore, a first object is divided into multiple parts represented as different first spheres, the first rendered image is generated based on the first three-dimensional position information of each vertex of the multiple patches forming different spheres in the camera coordinate system, the first rendered image includes three-dimensional relation information of the different parts of the first object, and a three-dimensional model of the first object may be constrained based on the gradient information determined by the first rendered image, such that the three-dimensional model of the first object has higher accuracy.
In an implementation, the operation of determining first three-dimensional position information of each vertex of the multiple patches forming each first sphere in a camera coordinate system based on first sphere position information includes: determining the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively based on a first positional relation between template vertices of multiple template patches forming a template sphere and a center point of the template sphere, as well as the first sphere position information of each first sphere.
Therefore, multiple first spheres are obtained by deforming multiple template patches, and the surfaces of the spheres are characterized by the patches, such that the complexity in the generation of the first rendered image through rendering is reduced.
In an implementation, the first sphere position information of each first sphere includes: second three-dimensional position information of a center point of each first sphere in the camera coordinate system, lengths corresponding to three coordinate axes of each first sphere respectively, and a rotation angle of each first sphere relative to the camera coordinate system.
Therefore, the position and attitude of each first sphere in the camera coordinate system may be clearly represented through the foregoing three parameters.
In an implementation, the operation of determining first three-dimensional position information of each vertex of the multiple patches forming each first sphere in a camera coordinate system respectively based on the first positional relation between the template vertices of the multiple template patches forming a template sphere and the center point of the template sphere, as well as the first sphere position information of each first sphere includes: transforming the template sphere in terms of shape and rotation angle based on lengths corresponding to the three coordinate axes of each first sphere respectively and the rotation angle of each first sphere relative to the camera coordinate system; determining a second positional relation between each template vertex and a center point of the transformed template sphere based on the result of transforming the template sphere in terms of shape and rotation angle, as well as the first positional relation; and determining the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relation.
Therefore, the first three-dimensional position information can be quickly acquired.
In an implementation, the method further includes: acquiring a camera projection matrix of the first image. The operation of generating the first rendered image based on the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system includes: determining a part index and a patch index of each pixel in the first rendered image based on the first three-dimensional position information and the projection matrix; and generating the first rendered image based on the determined part index and patch index of each pixel in the first rendered image. The part index of a pixel is configured to identify a part of the first object corresponding to the pixel; and the patch index of a pixel is configured to identify a patch corresponding to the pixel.
In an implementation, the operation of generating the first rendered image based on first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system includes: for each first sphere, generating the first rendered image corresponding to each first sphere according to the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively. The operation of obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image includes: for each first sphere, obtaining the gradient information of the first rendered image corresponding to each first sphere according to the first rendered image and the semantically segmented image corresponding to each first sphere.
Therefore, it is conducive to simplifying the expression of class values corresponding to different parts and simplifying the computational complexity in gradient calculation.
In an implementation, the gradient information of the first rendered image includes: a gradient value of each pixel in the first rendered image. The operation of obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image includes: traversing each pixel in the first rendered image, and determining the gradient value of each traversed pixel based on a first pixel value of each traversed pixel in the first rendered image and a second pixel value of each traversed pixel in the semantically segmented image.
Therefore, the gradient information of the first rendered image can be obtained based on the first rendered image and the semantically segmented image of the first image.
In an implementation, the operation of determining the gradient value of each traversed pixel according to the first pixel value of each traversed pixel in the first rendered image and the second pixel value of each traversed pixel in the semantically segmented image includes: determining a residual error of each traversed pixel according to the first pixel value of each traversed pixel and the second pixel value of each traversed pixel; in the case where the residual error of each traversed pixel is a first value, determining the gradient value of each traversed pixel as the first value; in the case where the residual error of each traversed pixel is not the first value, determining a target first sphere corresponding to each traversed pixel from the multiple first spheres based on the second pixel value of each traversed pixel, and determining a target patch from the multiple patches forming the target first sphere; determining target three-dimensional position information of at least one target vertex of the target patch in the camera coordinate system, where in the case where the at least one target vertex is positioned at the position identified by the target three-dimensional position information, the residual error between a new first pixel value obtained by re-rendering each traversed pixel and the second pixel value corresponding to each traversed pixel is the first value; and obtaining the gradient value of each traversed pixel based on the first three-dimensional position information and the target three-dimensional position information of the target vertex in the camera coordinate system.
Therefore, the gradient value of each pixel in the first rendered image can be obtained.
In an implementation, the operation of acquiring first sphere position information of each of the multiple first spheres in the camera coordinate system based on a first image including the first object includes: performing position information prediction processing on the first image by utilizing a pre-trained position information prediction network to obtain the first sphere position information of each first sphere of the multiple first spheres in the camera coordinate system.
In a second aspect, an embodiment of the disclosure further provides a neural network generation method, which includes: performing three-dimensional position information prediction processing on a second object in a second image by utilizing a to-be-trained neural network to obtain second sphere position information of each second sphere of multiple second spheres representing different parts of the second object in a camera coordinate system; generating a second rendered image based on the second sphere position information corresponding to the multiple second spheres respectively; obtaining gradient information of the second rendered image based on the second rendered image and a semantically annotated image of the second image; and updating the to-be-trained neural network based on the gradient information of the second rendered image to obtain an updated neural network.
Therefore, after the three-dimensional position information prediction processing is performed on the second object in the second image by utilizing the to-be-optimized neural network to obtain the second sphere position information of the multiple second spheres representing a three-dimensional model of the second object in the second image, image rendering is performed based on the second sphere position information, the gradient information representing the degree of correctness of the second sphere position information of the multiple second spheres is determined based on the result of image rendering, and the to-be-optimized neural network is updated based on the gradient information to obtain the optimized neural network, such that the optimized neural network has higher accuracy in the prediction of the three-dimensional position information.
In a third aspect, an embodiment of the disclosure further provides a three-dimensional model generation apparatus, which includes: a first acquisition part, configured to acquire first sphere position information of each first sphere of multiple first spheres in a camera coordinate system based on a first image including a first object, where the multiple first spheres are configured to represent different parts of the first object respectively; a first generation part, configured to generate a first rendered image based on the first sphere position information of the multiple first spheres; a first gradient determination part, configured to obtain gradient information of the first rendered image based on the first rendered image and a semantically segmented image of the first image; an adjustment part, configured to adjust the first sphere position information of the multiple first spheres based on the gradient information of the first rendered image; and a model generation part, configured to generate a three-dimensional model of the first object by utilizing the adjusted first sphere position information of the multiple first spheres.
In an implementation, in the case where a first rendered image is generated based on first sphere position information of the multiple first spheres, the first generation part is configured to: determine first three-dimensional position information of each vertex of multiple patches forming each first sphere in a camera coordinate system respectively based on the first sphere position information; and generate the first rendered image based on the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively.
In an implementation, in the case where the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively is determined based on first sphere position information, the first generation part is configured to: determine the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively based on a first positional relation between template vertices of multiple template patches forming a template sphere and a center point of the template sphere, as well as the first sphere position information of each first sphere.
In an implementation, the first sphere position information of each first sphere includes: second three-dimensional position information of a center point of each first sphere in the camera coordinate system, lengths corresponding to three coordinate axes of each first sphere respectively, and a rotation angle of each first sphere relative to the camera coordinate system.
In an implementation, in the case where the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system is determined based on the first positional relation between the template vertices of the multiple template patches forming the template sphere and the center point of the template sphere, as well as first sphere position information of each first sphere, the first generation part is configured to: transform the template sphere in terms of shape and rotation angle based on the lengths corresponding to the three coordinate axes of each first sphere respectively and the rotation angle of each first sphere relative to the camera coordinate system; determine a second positional relation between each template vertex and a center point of the transformed template sphere based on the result of transforming the template sphere in terms of shape and rotation angle, as well as the first positional relation; and determine the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relation.
In an implementation, the first acquisition part is further configured to: acquire a camera projection matrix of a first image. In the case where the first rendered image is generated based on the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in a camera coordinate system respectively, the first generation part is configured to: determine a part index and a patch index of each pixel in the first rendered image based on the first three-dimensional position information and the projection matrix; and generate the first rendered image based on the determined part index and patch index of each pixel in the first rendered image. The part index of a pixel is configured to identify a part of the first object corresponding to the pixel; and the patch index of a pixel is configured to identify a patch corresponding to the pixel.
In an implementation, in the case where the first rendered image is generated based on the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in a camera coordinate system respectively, the first generation part is configured to: for each first sphere, generate the first rendered image corresponding to each first sphere according to the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively.
In the case where the gradient information of the first rendered image is obtained based on the first rendered image and the semantically segmented image of the first image, the first gradient determination part is configured to: for each first sphere, obtain the gradient information of the first rendered image corresponding to each first sphere according to the first rendered image and the semantically segmented image corresponding to each first sphere.
In an implementation, the gradient information of the first rendered image includes: a gradient value of each pixel in a first rendered image. In the case where the gradient information of the first rendered image is obtained based on the first rendered image and the semantically segmented image of a first image, the first gradient determination part is configured to: traverse each pixel in the first rendered image, and determine the gradient value of each traversed pixel based on a first pixel value of each traversed pixel in the first rendered image and a second pixel value of each traversed pixel in the semantically segmented image.
In an implementation, in the case where the gradient value of each traversed pixel is determined based on a first pixel value of each traversed pixel in a first rendered image and a second pixel value of each traversed pixel in a semantically segmented image, the first gradient determination part is configured to: determine a residual error of each traversed pixel according to the first pixel value of each traversed pixel and the second pixel value of each traversed pixel; in the case where the residual error of each traversed pixel is a first value, determine the gradient value of each traversed pixel as the first value; in the case where the residual error of each traversed pixel is not the first value, determine a target first sphere corresponding to each traversed pixel from the multiple first spheres based on the second pixel value of each traversed pixel, and determine a target patch from the multiple patches forming the target first sphere; determine target three-dimensional position information of at least one target vertex of the target patch in the camera coordinate system, where in the case where the at least one target vertex is positioned at a position identified by the target three-dimensional position information, the residual error between a new first pixel value obtained by re-rendering each traversed pixel and the second pixel value corresponding to each traversed pixel is the first value; and obtain the gradient value of each traversed pixel based on the first three-dimensional position information and the target three-dimensional position information of the target vertex in the camera coordinate system.
In an implementation, in the case where the first sphere position information of each first sphere of multiple first spheres in the camera coordinate system is acquired based on a first image including a first object, the first acquisition part is configured to: perform position information prediction processing on the first image by utilizing a pre-trained position information prediction network to obtain the first sphere position information of each first sphere of the multiple first spheres in the camera coordinate system.
In a fourth aspect, an embodiment of the disclosure further provides a neural network generation apparatus, which includes: a second acquisition part, configured to perform three-dimensional position information prediction processing on a second object in a second image by utilizing a to-be-trained neural network to obtain second sphere position information of each second sphere of multiple second spheres representing different parts of the second object in a camera coordinate system; a second generation part, configured to generate a second rendered image based on the second sphere position information corresponding to the multiple second spheres respectively; a second gradient determination part, configured to obtain gradient information of the second rendered image based on the second rendered image and a semantically annotated image of the second image; and an updating part, configured to update the to-be-trained neural network based on the gradient information of the second rendered image to obtain an updated neural network.
In a fifth aspect, an implementation of the disclosure further provides an electronic device, which includes a processor and a memory. The memory is configured to store machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory. When the machine-readable instructions are executed by the processor, the steps in the first aspect or any implementation of the first aspect are executed, or the steps in the second aspect or any implementation of the second aspect are executed.
In a sixth aspect, an implementation of the disclosure further provides a computer-readable storage medium. The computer-readable storage medium is configured to store a computer program. When the computer program is run, the steps in the first aspect or any implementation of the first aspect are executed; or the steps in the second aspect or any implementation of the second aspect are executed.
In a seventh aspect, an implementation of the disclosure further provides a computer program, including computer-readable codes. When the computer-readable codes are run in an electronic device, the steps in the first aspect or any implementation of the first aspect are implemented by a processor of the electronic device, or the steps in the second aspect or any implementation of the second aspect are implemented.
In order to illustrate the technical schemes of the embodiments of the disclosure more clearly, the drawings required in the embodiments will be briefly described below, and the drawings herein incorporated in and forming a part of the specification illustrate embodiments consistent with the disclosure and, together with the specification, are used to illustrate the technical schemes of the disclosure. It should be understood that the following drawings merely illustrate certain embodiments of the disclosure and are therefore not intended to limit the scope of the disclosure. It is apparent to those skilled in the art that other related drawings may be derived from the drawings without inventive efforts.
In order to make the objectives, technical schemes and advantages of the embodiments of the disclosure to be understood clearly, the technical schemes of the embodiments of the disclosure will be illustrated clearly and comprehensively with reference to the drawings. It is apparent that the illustrated embodiments are parts and not all embodiments of the disclosure. Generally, the components of the embodiments of the disclosure, as described and illustrated with reference to the drawings herein, may be arranged and designed in a wide variety of different configurations. Therefore, the following detailed description of the embodiments of the disclosure, presented with reference to the drawings, is not intended to limit the scope of the embodiments of the disclosure as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which may be derived by those skilled in the art from the embodiments of the disclosure without making any inventive efforts, should fall within the scope of the embodiments of the disclosure.
In the process of generating a three-dimensional model based on a two-dimensional image, a neural network is generally adopted to predict parameters of the three-dimensional model of an object in the two-dimensional image, and the three-dimensional model is generated based on those parameters. In the process of training the neural network, supervision data of a sample image needs to be utilized to supervise the training process. That is, the parameters of the three-dimensional model of the object in each sample image utilized in training are annotated in advance and are utilized as supervision for the neural network training. Because such supervision data is difficult to acquire, a simulation system is utilized in many cases to acquire the two-dimensional image and its supervision data. However, there are differences between a two-dimensional image acquired by the simulation system and a real two-dimensional image, which leads to a decrease in the accuracy of the neural network in generating the three-dimensional model based on the real two-dimensional image.
In addition, the existing three-dimensional model generation methods cannot deal with the ambiguity caused when some parts of the object whose three-dimensional model is to be reconstructed are blocked, so the attitude of the object in depth cannot be restored accurately, which in turn decreases the accuracy of the generated three-dimensional model. As such, the existing three-dimensional model generation methods can have low accuracy.
Based on the above research, the embodiments of the disclosure provide a three-dimensional model generation method. In the method, image rendering is performed on first sphere position information of multiple first spheres representing the three-dimensional model, gradient information representing degree of correctness of the first sphere position information of the multiple first spheres is determined based on the result of rendering the first image, and the first sphere position information corresponding to the first spheres respectively is readjusted based on the gradient information, such that the adjusted first sphere position information has higher accuracy, i.e., the three-dimensional model recovered based on the first sphere position information corresponding to the first spheres respectively also has higher accuracy.
In addition, in the three-dimensional model generation method according to the embodiments of the disclosure, since the first sphere position information corresponding to the multiple first spheres is readjusted respectively by utilizing the gradient information representing the degree of correctness of the first sphere position information of the multiple first spheres, the depth information of the first object may be restored more accurately, thereby achieving higher accuracy.
The embodiments of the disclosure further provide a neural network generation method. In the method, three-dimensional position information prediction processing is performed on a second object in a second image by utilizing a to-be-optimized neural network to obtain second sphere position information of multiple second spheres representing a three-dimensional model of the second object in the second image, image rendering is performed based on the second sphere position information, gradient information representing the degree of correctness of the second sphere position information of the multiple second spheres is determined based on the result of image rendering, and the to-be-optimized neural network is updated based on the gradient information to obtain an optimized neural network, such that the optimized neural network has higher accuracy in the prediction of the three-dimensional position information.
It should be noted that same reference numerals and letters designate same items in the following drawings, and therefore, once an item is defined in one drawing, there is no need to further define and explain the item in subsequent drawings.
To enable the readers to better understand the embodiments, the three-dimensional model generation method according to the embodiment of the disclosure will firstly be illustrated in detail. Generally, an executor of the three-dimensional model generation method according to the embodiment of the disclosure is a computer device with certain computing capacities, including, for example, a terminal device, a server or another processing device. The terminal device may be a UE (User Equipment), a mobile device, a user terminal, a terminal, a cellular telephone, a cordless telephone, a PDA (Personal Digital Assistant), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. In some implementations, the three-dimensional model generation method may be implemented in a manner that a processor calls computer-readable instructions stored in a memory.
The three-dimensional model generation method according to the embodiments of the disclosure will be firstly described below.
At S101, first sphere position information of each first sphere of multiple first spheres in a camera coordinate system is acquired based on a first image including a first object, where the multiple first spheres are configured to represent different parts of the first object respectively.
At S102, a first rendered image is generated based on the first sphere position information of the multiple first spheres.
At S103, gradient information of the first rendered image is obtained based on the first rendered image and a semantically segmented image of the first image.
At S104, the first sphere position information of the multiple first spheres is adjusted based on the gradient information of the first rendered image, and a three-dimensional model of the first object is generated by utilizing the adjusted first sphere position information of the multiple first spheres.
According to the embodiment of the disclosure, on the basis of the acquisition of the first sphere position information of each first sphere of multiple first spheres representing different parts of the first object in the camera coordinate system, the first object is re-rendered according to the first sphere position information to obtain the first rendered image, and the gradient information of the first rendered image is obtained based on the first rendered image and the semantically segmented image. The gradient information represents the degree of correctness of the first rendered image obtained by re-rendering the first object based on the first sphere position information, so in the process of adjusting the first sphere position information of each first sphere based on the gradient information, the incorrectly predicted parts of the first sphere position information are adjusted. The adjusted first sphere position information can therefore more accurately represent the positions of different parts of the first object in the camera coordinate system, and a three-dimensional model of the first object with higher accuracy is then generated based on the adjusted first sphere position information of each first sphere.
In addition, according to the embodiment of the disclosure, the gradient information representing the degree of correctness of the first sphere position information of the multiple first spheres is utilized to readjust the first sphere position information corresponding to the first spheres respectively, such that depth information of the first object may be restored more accurately, and the acquired three-dimensional model has higher accuracy.
The above S101 to S104 will be described respectively in detail below.
For the above S101: according to the embodiment of the disclosure, when the three-dimensional model of the first object is generated based on the two-dimensional image of the first object, the first object is divided into multiple parts, and three-dimensional position information prediction is performed on the different parts of the first object respectively.
Exemplarily, the three-dimensional position information corresponding to the different parts of the first object is represented by the first sphere position information of the first spheres in the camera coordinate system. The first sphere position information of a first sphere in the camera coordinate system includes the three-dimensional position information of the center point of the first sphere in the camera coordinate system (i.e., the second three-dimensional position information), the lengths corresponding to the three coordinate axes of the first sphere, and the rotation angle of the first sphere relative to the camera coordinate system.
Taking a human body being the first object as an example, the human body may be divided into multiple parts according to the limbs and trunk of the human body, and each part is represented by one first sphere; and each first sphere includes three coordinate axes, which respectively represent the bone length and the thicknesses of the corresponding part in different directions.
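For concreteness, the parameterization of one first sphere can be summarized in a small data structure. Below is a minimal Python sketch; the field names and example values are illustrative rather than taken from the disclosure, and the rotation is stored here as a 3×3 matrix for convenience even though the disclosure describes a rotation angle:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FirstSphere:
    """Position information of one first sphere in the camera coordinate system."""
    center: np.ndarray        # (3,) second three-dimensional position information of the center point
    axis_lengths: np.ndarray  # (3,) lengths along the sphere's three coordinate axes
    rotation: np.ndarray      # (3, 3) rotation of the sphere relative to the camera coordinate system

# Example: a sphere representing a forearm, longer along one axis (bone length)
# and thinner along the other two axes (part thickness).
forearm = FirstSphere(
    center=np.array([0.1, -0.2, 2.5]),
    axis_lengths=np.array([0.25, 0.04, 0.04]),
    rotation=np.eye(3),
)
```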
Exemplarily, with reference to the corresponding drawing, the multiple first spheres are connected according to the key point layout of the first object.
The position and attitude data $S_i$ of the i-th first sphere satisfies the following formula (1):

$$S_i = R_{\mathrm{parent}(i)} \cdot (l_i O_i) + S_{\mathrm{parent}(i)} \qquad (1)$$
where $O_i$ is an offset vector representing the offset direction from the parent part corresponding to the i-th first sphere to the current part; $l_i O_i$ represents the local position of the i-th part of the human body in the key point layout; $S_{\mathrm{parent}(i)}$ represents the position and attitude data of the parent part; and $R_{\mathrm{parent}(i)}$ represents the rotation information of the parent part corresponding to the i-th first sphere in the camera coordinate system. Formula (1) constrains the connection relationship between different first spheres.
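A minimal Python sketch of formula (1), under the assumptions that rotations are stored as 3×3 matrices and that parts are ordered so every parent precedes its children; all names are illustrative:

```python
import numpy as np

def sphere_positions(parents, rotations, lengths, offsets, root_position):
    """Compute S_i = R_parent(i) @ (l_i * O_i) + S_parent(i) along the kinematic chain.

    parents[i]   : index of the parent part of part i (-1 for the root)
    rotations[i] : (3, 3) rotation of part i in the camera coordinate system
    lengths[i]   : bone length l_i of part i
    offsets[i]   : (3,) unit offset vector O_i from the parent part to part i
    """
    n = len(parents)
    S = np.zeros((n, 3))
    for i in range(n):
        if parents[i] < 0:
            S[i] = root_position  # the root part is placed directly
        else:
            p = parents[i]
            S[i] = rotations[p] @ (lengths[i] * offsets[i]) + S[p]
    return S
```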
When the first sphere position information of each of multiple first spheres in the camera coordinate system is acquired, for example, a pre-trained position information prediction network may be utilized to perform position information prediction processing on the first image to obtain the first sphere position information of each first sphere of the multiple first spheres in the camera coordinate system.
Exemplarily, with reference to the corresponding drawing, the position information prediction network includes a feature extraction sub-network, a key point prediction sub-network and a three-dimensional position information prediction sub-network.
Here, the feature extraction sub-network is configured to perform feature extraction on the first image to obtain a feature map of the first image.
Here, the feature extraction sub-network includes, for example, a convolutional neural network (CNN) capable of performing at least one stage of feature extraction on the first image to obtain the feature map of the first image. The process of performing the at least one stage of feature extraction on the first image by the CNN may further be considered as the process of encoding the first image by utilizing a CNN encoder.
The key point prediction sub-network is configured to determine two-dimensional coordinate values of multiple key points of the first object in the first image based on the feature map of the first image.
Here, the key point prediction sub-network, for example, may perform at least one stage of deconvolution based on the feature map of the first image to obtain a heat map of the first image. The size of the heat map is, for example, the same as that of the first image; and a pixel value of each first pixel in the heat map represents the probability that a second pixel corresponding to the position of each first pixel in the first image is a key point of the first object. Then, the two-dimensional coordinate values of the multiple key points of the first object in the first image may be obtained by utilizing the heat map.
The three-dimensional position information prediction sub-network is configured to obtain the first sphere position information of the multiple first spheres forming the first object in the camera coordinate system based on the two-dimensional coordinate values of the multiple key points of the first object in the first image and the feature map of the first image.
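The three sub-networks can be sketched as follows in Python (PyTorch). The layer sizes, the key point and sphere counts, and the 9-parameter split per sphere (center, axis lengths, rotation angle) are illustrative assumptions, not values from the disclosure:

```python
import torch
import torch.nn as nn

class PositionInfoPredictionNet(nn.Module):
    """Sketch of the position information prediction network described above."""

    def __init__(self, num_keypoints=24, num_spheres=20):
        super().__init__()
        # Feature extraction sub-network: a small CNN encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Key point prediction sub-network: deconvolution back to a heat map,
        # one channel per key point of the first object.
        self.heatmap_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, num_keypoints, 4, stride=2, padding=1),
        )
        # Three-dimensional position information prediction sub-network:
        # per sphere, regress center (3) + axis lengths (3) + rotation angle (3).
        self.regressor = nn.Linear(128 + num_keypoints * 2, num_spheres * 9)

    def forward(self, image):
        feat = self.encoder(image)            # feature map of the first image
        heatmaps = self.heatmap_head(feat)    # probability of each pixel being a key point
        b, k, h, w = heatmaps.shape
        # Two-dimensional key point coordinates from the heat map maxima
        # (a soft-argmax would keep this step differentiable).
        idx = heatmaps.reshape(b, k, -1).argmax(dim=-1)
        coords = torch.stack(
            [idx % w, torch.div(idx, w, rounding_mode="floor")], dim=-1
        ).float()
        pooled = feat.mean(dim=(2, 3))
        sphere_params = self.regressor(torch.cat([pooled, coords.reshape(b, -1)], dim=1))
        return heatmaps, coords, sphere_params.reshape(b, -1, 9)
```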
In the above S102, after the first sphere position information corresponding to the multiple first spheres is obtained, for example, the first rendered image may be generated in the following manner.
First three-dimensional position information of each vertex of multiple patches forming each first sphere in the camera coordinate system respectively is determined based on the first sphere position information; and the first rendered image is generated based on the first three-dimensional position information of each vertex of multiple patches forming each first sphere in the camera coordinate system respectively.
Here, a patch is a set of vertices and polygons that represents a polyhedron in three-dimensional computer graphics, which is also called an unstructured grid. Based on the determination of the first sphere position information corresponding to the multiple first spheres forming the first object, the first three-dimensional position information of the patches forming each first sphere in the camera coordinate system may be determined based on the first sphere position information.
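In code, such a patch set is commonly stored as a vertex array plus a face-index array. A minimal illustration follows, using a tetrahedron rather than a sphere for brevity:

```python
import numpy as np

# A tiny example mesh (a tetrahedron): four vertices, four triangle patches.
# Each row of `faces` holds the indices of the three vertices of one patch.
vertices = np.array([
    [ 1.0,  1.0,  1.0],
    [ 1.0, -1.0, -1.0],
    [-1.0,  1.0, -1.0],
    [-1.0, -1.0,  1.0],
])
faces = np.array([
    [0, 1, 2],
    [0, 3, 1],
    [0, 2, 3],
    [1, 3, 2],
])
```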
Here, the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system is determined based on the first positional relation between the template vertices of the multiple template patches forming the template sphere and the center point of the template sphere, as well as the first sphere position information of each first sphere.
Here, for example, a template sphere (shown as 41 in the corresponding drawing) is formed by multiple template patches, and the first positional relation between each template vertex and the center point of the template sphere is known in advance.
Here, when the template sphere is transformed in terms of shape and rotation angle, the template sphere may first be transformed in terms of shape, such that the lengths of the three coordinate axes of the template sphere are equal to the lengths of the three coordinate axes of the first sphere; and then, based on the result of transforming the template sphere in terms of shape, the rotation angle is transformed, such that the directions of the three coordinate axes of the template sphere in the camera coordinate system are in one-to-one correspondence with the directions of the three coordinate axes of the first sphere, and the transformation of the template sphere in terms of shape and rotation angle is then completed.
In addition, the template sphere may be transformed in terms of rotation angle firstly, such that the directions of the three axes of the template sphere in the camera coordinate system are in one-to-one correspondence with the directions of the three coordinate axes of the first sphere; and then transformation in terms of shape is performed based on the result of transforming the template sphere in terms of rotation angle, such that the lengths of three coordinate axes of the template sphere are equal to the lengths of three coordinate axes of the first sphere, and then the transformation of the template sphere in terms of shape and rotation angle is completed.
After the transformation of the template sphere in terms of shape and rotation angle is completed, the lengths of the three coordinate axes of the template sphere and its rotation angle in the camera coordinate system are determined. Then, the second positional relation between each template vertex of the multiple template patches and the center point of the transformed template sphere is determined based on these lengths and this rotation angle, together with the first positional relation between each template vertex of the multiple template patches forming the template sphere and the center point of the template sphere. The three-dimensional position information of each template vertex of the multiple template patches in the camera coordinate system is then determined based on the second positional relation and the second three-dimensional position information of the center point of each first sphere in the camera coordinate system. At this time, the three-dimensional position information of the template vertices of the multiple template patches in the camera coordinate system forms the first three-dimensional position information of the vertices of the multiple patches forming each first sphere in the camera coordinate system.
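The transformation chain described above can be sketched as follows. Here `template_vertices` encodes the first positional relation (vertex coordinates relative to the template sphere's center), the rotation angle is assumed to have been converted to a 3×3 matrix, and all names are illustrative:

```python
import numpy as np

def transform_template(template_vertices, axis_lengths, rotation, center):
    """Map template-sphere vertices onto one first sphere in the camera coordinate system.

    template_vertices : (V, 3) vertices of the unit template sphere, relative to its
                        center point (the first positional relation)
    axis_lengths      : (3,)   target lengths of the first sphere's three coordinate axes
    rotation          : (3, 3) rotation of the first sphere relative to the camera coordinate system
    center            : (3,)   second three-dimensional position information of the center point
    """
    scaled = template_vertices * axis_lengths  # transform in terms of shape
    rotated = scaled @ rotation.T              # transform in terms of rotation angle
    # `rotated` now encodes the second positional relation; adding the center
    # point yields the first three-dimensional position information.
    return rotated + center
```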
Exemplarily, the corresponding drawing illustrates the process of transforming the template sphere to obtain a first sphere.
After the first three-dimensional position information of the vertices of the multiple patches of each first sphere in the camera coordinate system is obtained, image rendering is performed on the multiple first spheres forming the first object based on the first three-dimensional position information of the vertices of the multiple patches forming each first sphere in the camera coordinate system respectively to generate the first rendered image.
Here, for example, the image rendering may be performed on the multiple first spheres forming the first object in the following manner.
A part index and a patch index of each pixel in the first rendered image are determined based on the first three-dimensional position information and a camera projection matrix; and
The first rendered image is generated based on the determined part index and patch index of each pixel in the first rendered image.
The part index of a pixel is configured to identify a part of the first object corresponding to the pixel; and the patch index of a pixel is configured to identify a patch corresponding to the pixel.
Here, the camera is the camera configured to acquire the first image, and the projection matrix of the camera may be obtained based on the position of the camera in the camera coordinate system and the first three-dimensional position information of the vertices of the multiple patches forming each first sphere in the camera coordinate system. After the camera projection matrix is obtained, the multiple first spheres may be projected onto the image plane based on the projection matrix to obtain the first rendered image.
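A deliberately simplified sketch of this index rendering: each patch is projected with the camera projection matrix, and for every covered pixel the patch closest to the camera is kept together with its part index and patch index. The (3, 4) projection-matrix shape, the per-patch depth approximation and the point-in-triangle test are illustrative assumptions:

```python
import numpy as np

def point_in_triangle(p, tri):
    """Barycentric point-in-triangle test in image coordinates."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    d = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if abs(d) < 1e-9:
        return False
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / d
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / d
    return l1 >= 0 and l2 >= 0 and (1.0 - l1 - l2) >= 0

def render_indices(spheres, projection, height, width):
    """Rasterize part and patch indices per pixel with a simple z-buffer.

    spheres    : list of (vertices (V, 3), faces (F, 3)) pairs, one per first sphere/part
    projection : (3, 4) camera projection matrix mapping camera coordinates to pixels
    """
    depth = np.full((height, width), np.inf)
    part_index = np.full((height, width), -1, dtype=int)   # which part covers the pixel
    patch_index = np.full((height, width), -1, dtype=int)  # which patch covers the pixel

    for pi, (verts, faces) in enumerate(spheres):
        homo = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)
        proj = homo @ projection.T            # (V, 3): u*z, v*z, z
        uv = proj[:, :2] / proj[:, 2:3]       # pixel coordinates
        z = proj[:, 2]
        for fi, (a, b, c) in enumerate(faces):
            tri = uv[[a, b, c]]
            zs = z[[a, b, c]].mean()          # coarse per-patch depth
            umin, vmin = np.floor(tri.min(axis=0)).astype(int)
            umax, vmax = np.ceil(tri.max(axis=0)).astype(int)
            for v in range(max(vmin, 0), min(vmax + 1, height)):
                for u in range(max(umin, 0), min(umax + 1, width)):
                    # Keep the patch closest to the camera at this pixel.
                    if point_in_triangle((u, v), tri) and zs < depth[v, u]:
                        depth[v, u] = zs
                        part_index[v, u] = pi
                        patch_index[v, u] = fi
    return part_index, patch_index
```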
In an implementation, when image rendering is performed on the multiple first spheres forming the first object, the multiple first spheres are collectively rendered based on the first sphere position information corresponding to the multiple first spheres to obtain the first rendered image including all the first spheres. Then, gradient information of the first rendered image corresponding to all the first spheres is obtained, and the first sphere position information of the multiple first spheres is adjusted based on the gradient information.
In an implementation, when image rendering is performed on multiple first spheres forming the first object, each first sphere of the multiple first spheres is rendered respectively to obtain a first rendered image corresponding to each first sphere respectively. Then, gradient information of the first rendered image corresponding to each first sphere respectively is acquired, and the first sphere position of each first sphere is adjusted based on the gradient information of the first rendered image corresponding to each first sphere.
In the above S103, for example, semantic segmentation processing may be performed on the first image by utilizing a pre-trained semantic segmentation network to obtain the semantically segmented image of the first image.
(1) When the multiple first spheres are collectively rendered to obtain the first rendered image, different first spheres are rendered with different pixel values. Accordingly, when semantic segmentation is performed on the first image to obtain the semantically segmented image of the first image, the pixel value of each pixel in the semantically segmented image represents the class value of the part to which the pixel at the corresponding position in the first image belongs; different parts of the first object have different class values in the semantically segmented image.
Exemplarily, when the first sphere corresponding to a part is rendered into the first rendered image, the pixel value used for that first sphere is the same as the class value corresponding to the part in the semantically segmented image.
(2) When the multiple first spheres are rendered respectively, semantic segmentation is performed on the first image to obtain semantically segmented images respectively corresponding to the first spheres representing different parts of the first object.
Then, the gradient information of the first rendered image is obtained based on the first rendered image and the semantically segmented image of the first image, for example, in the following manner.
The gradient information of the first rendered image corresponding to each first sphere is obtained according to the first rendered image and the semantically segmented image corresponding to each first sphere.
The total gradient information corresponding to the multiple first spheres is obtained based on the gradient information of the first rendered image corresponding to each first sphere respectively.
Therefore, it is conducive to simplifying the expression of class values corresponding to different parts and simplifying the computational complexity in gradient calculation.
Theoretically, when the obtained first sphere position information corresponding to each first sphere respectively is completely correct, the pixel values of pixels at the corresponding position in the generated first rendered image and in the first semantically segmented image are the same. If an error occurs in the prediction of the first sphere position information of any first sphere, the pixel values of pixels corresponding to at least part of the positions in the first rendered image and the first semantically segmented image may be different.
Based on the above principle, the gradient information of the first rendered image may be determined by the first rendered image and the semantically segmented image of the first image. The gradient information represents the degree of correctness of the first sphere position information of each first sphere of the multiple first spheres in the camera coordinate system. Generally, a larger gradient characterizes lower accuracy of the first sphere position information; and accordingly, a smaller gradient characterizes higher accuracy of the first sphere position information. Therefore, the gradient information of the first rendered image may be utilized to guide the adjustment of the first sphere position information corresponding to each first sphere respectively, such that the obtained first rendered image may be gradually optimized towards the correct direction during the continuous adjustment of the first sphere position information, thereby making the finally generated three-dimensional model of the first object have higher accuracy.
Here, the gradient information of the first rendered image includes the gradient value of each pixel in the first rendered image.
When the gradient information of the first rendered image is determined, for example, each pixel in the first rendered image may be traversed, and the gradient value of each traversed pixel may be determined according to the first pixel value of each traversed pixel in the first rendered image and the second pixel value of each traversed pixel in the semantically segmented image.
With reference to the corresponding drawing, for each traversed pixel, the gradient value may be determined through the following S501 to S505.
At S501, a residual error of the traversed pixel is determined according to a first pixel value of the traversed pixel and a second pixel value of the traversed pixel.
At S502, when the residual error of the traversed pixel is a first value, the gradient value of the traversed pixel is determined as the first value.
Here, for the traversed pixel, when the first pixel value and the second pixel value of the traversed pixel are equal, the first sphere position information of the first sphere to which the position point having the traversed pixel as its projection point belongs is considered to be predicted correctly. Here, the position point may be a position point on any patch of a first sphere representing any part of the first object. When the first pixel value and the second pixel value of the traversed pixel are not equal, the first sphere position information of the first sphere to which the position point having the traversed pixel as its projection point belongs is considered to be predicted erroneously.
In an implementation, the first value is, for example, 0.
At S503, when the residual error of the traversed pixel is not the first value, a target first sphere corresponding to the traversed pixel is determined from multiple first spheres based on the second pixel value of the traversed pixel, and a target patch is determined from multiple patches forming the target first sphere.
At S504, target three-dimensional position information of at least one target vertex of the target patch in the camera coordinate system is determined; where when the at least one target vertex is positioned at the position identified by the target three-dimensional position information, the residual error between a new first pixel value obtained by re-rendering the traversed pixel and the second pixel value corresponding to the traversed pixel is determined as the first value.
At S505, the gradient value of the traversed pixel is determined based on the first three-dimensional position information and the target three-dimensional position information of the target vertex in the camera coordinate system.
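The control flow of S501 to S505 can be sketched as follows. The two callables stand in for the geometric selection and solving steps described above; they are assumptions of this sketch, not functions named in the disclosure:

```python
import numpy as np

def pixel_gradients(rendered, segmented, find_target_vertex, solve_target_position):
    """Per-pixel gradient values following S501 to S505.

    rendered, segmented   : (H, W) first rendered image / semantically segmented image
    find_target_vertex    : callable(pixel, class_value) -> (vertex_id, current_xyz);
                            picks the target patch and target vertex on the target
                            first sphere (S503)
    solve_target_position : callable(pixel, vertex_id) -> target_xyz at which
                            re-rendering the pixel would make its residual 0 (S504)
    """
    gradients = {}
    height, width = rendered.shape
    for v in range(height):
        for u in range(width):
            residual = rendered[v, u] - segmented[v, u]       # S501
            if residual == 0:                                 # S502: first value (0)
                continue                                      # gradient stays 0
            vertex_id, xyz = find_target_vertex((u, v), segmented[v, u])
            target_xyz = solve_target_position((u, v), vertex_id)
            # S505: gradient from the current and target vertex positions,
            # e.g. residual / (x1 - x0) along the axis the vertex must move.
            delta = np.asarray(target_xyz) - np.asarray(xyz)
            gradients[(u, v)] = residual / np.where(delta != 0, delta, np.inf)
    return gradients
```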
According to some embodiments of the disclosure, the determination of the gradient value of a traversed pixel is further illustrated below with reference to the corresponding drawing.
In the example, a pixel P is a traversed pixel, and the coordinate value of P in the image coordinate system is represented as $P = (u_P, v_P)$. $I_P(x) \in \{0, 1\}$ represents the rendering function of the pixel P.
In the drawing, 62 denotes a blocking patch blocking the target patch in the direction in which the camera is positioned; the blocking patch and the target patch belong to different first spheres.
As shown in panel a of the drawing, the target patch is blocked by the blocking patch in the direction in which the camera is positioned.
Here, the gradient value of the pixel P satisfies the following formulae (2) and (3):

$$\frac{\partial I_P}{\partial x_k^{i,j}} = \frac{\delta I_P}{x_1 - x_0} \qquad (2)$$

$$\frac{\partial I_P}{\partial z_1^{i,j}} = \lambda \cdot \frac{\delta I_P}{\Delta z} \cdot \frac{\Delta(M_0, Q)}{\Delta(M_0, v_1^{i,j})} \qquad (3)$$

where $\frac{\partial I_P}{\partial x_k^{i,j}}$ represents the gradient value of the pixel P in the x-axis direction and $\frac{\partial I_P}{\partial z_1^{i,j}}$ represents the gradient value of the pixel P in the z-axis direction. The gradient value of the pixel P in the y-axis direction is 0.
In the above formulae (2) and (3), $\delta I_P$ represents the residual error of the pixel P; $x_0$ represents the coordinate value of the target vertex $v_k^{i,j}$ on the x-axis before the target vertex moves along the x-axis direction, and $x_1$ represents the coordinate value of the target vertex $v_k^{i,j}$ on the x-axis after the move. $\Delta z = z_0 - z_1$ represents the depth difference between the position point Q projected to the pixel P in the target patch and the position point Q′ projected to the pixel P in the blocking patch, where $z_0$ represents the depth value of Q and $z_1$ represents the depth value of Q′; the connecting line between $v_1^{i,j}$ and Q intersects the connecting line between $v_2^{i,j}$ and $v_3^{i,j}$ at $M_0$. $\lambda$ represents a hyperparameter, and $\Delta(\cdot, \cdot)$ represents the distance between two points.
Panel e of the drawing illustrates a further case of the movement of the target vertex.
In panel b of the drawing, the gradient value of the pixel P satisfies the above formula (2), and the gradient values of the pixel P in the z-axis direction and the y-axis direction are both 0.
In panel c of the drawing, the gradient value of the pixel P satisfies the above formula (3), and the gradient values of the pixel P in both the x-axis direction and the y-axis direction are 0.
In panel d of the drawing, the gradient value of the pixel P satisfies the above formula (2), and the gradient values of the pixel P in the y-axis direction and the z-axis direction are both 0.
By adopting the foregoing method, the gradient value of each pixel in the first rendered image may be obtained; and the gradient values of all pixels in the first rendered image form the gradient information of the first rendered image.
In the above S104, when the first sphere position information of each first sphere is adjusted based on the gradient information of the first rendered image, at least one item of the first sphere position information of each first sphere may be adjusted, that is, at least one of the second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths corresponding to the three coordinate axes of each first sphere respectively, and the rotation angle of each first sphere relative to the camera coordinate system. The adjustment is made such that, in a new first rendered image generated based on the adjusted first sphere position information, the gradient value of each pixel changes towards the first value. Through multiple iterations, the first sphere position information gradually approximates the true values, the accuracy of the first sphere position information is improved, and finally the accuracy of the three-dimensional model of the first object is improved.
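The adjustment in S104 thus amounts to iterative gradient descent on the sphere parameters. A hedged sketch follows, assuming the per-pixel gradients have already been aggregated into per-parameter gradients by a `gradient_fn`; the names and step size are illustrative:

```python
def adjust_sphere_positions(sphere_params, render_fn, gradient_fn, steps=100, lr=1e-2):
    """Iterative adjustment of the first sphere position information (S104).

    sphere_params : dict mapping a parameter name (e.g. "center", "axis_lengths",
                    "rotation") to a numpy array, per first sphere
    render_fn     : sphere_params -> first rendered image
    gradient_fn   : rendered image -> per-parameter gradients, i.e. the per-pixel
                    gradient information backpropagated through the renderer
    """
    for _ in range(steps):
        rendered = render_fn(sphere_params)
        grads = gradient_fn(rendered)
        for name in sphere_params:
            # Any subset of the items (center point, axis lengths, rotation angle)
            # may be adjusted; here every item takes a descent step so the
            # per-pixel gradients shrink towards the first value (0).
            sphere_params[name] = sphere_params[name] - lr * grads[name]
    return sphere_params
```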
With reference to the corresponding drawing, an embodiment of the disclosure further provides a neural network generation method, which includes the following S701 to S704.
At S701, three-dimensional position information prediction processing is performed on a second object in a second image by utilizing a to-be-trained neural network to obtain second sphere position information of each second sphere of multiple second spheres representing different parts of the second object in a camera coordinate system.
At S702, a second rendered image is generated based on the second sphere position information corresponding to the multiple second spheres.
At S703, gradient information of the second rendered image is obtained based on the second rendered image and a semantically annotated image of the second image.
At S704, the to-be-trained neural network is updated based on the gradient information of the second rendered image to obtain an updated neural network.
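A hedged PyTorch-style sketch of one training iteration covering S701 to S704. The `renderer` is assumed to be a differentiable part renderer, and the L1 residual merely stands in for the custom gradient rule described above:

```python
import torch

def train_step(network, optimizer, image, annotated, renderer):
    """One update of the to-be-trained neural network (S701 to S704).

    renderer : a differentiable part renderer whose backward pass supplies the
               per-pixel gradient information described above
    """
    sphere_params = network(image)        # S701: second sphere position information
    rendered = renderer(sphere_params)    # S702: second rendered image
    # S703/S704: the residual against the semantically annotated image yields the
    # gradient information, which is backpropagated into the network parameters.
    loss = torch.abs(rendered - annotated).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```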
According to the embodiment of the disclosure, after the three-dimensional position information prediction processing is performed on the second object in the second image by utilizing the to-be-optimized neural network to obtain the second sphere position information of the multiple second spheres representing the three-dimensional model of the second object in the second image, image rendering is performed based on the second sphere position information, the gradient information representing the degree of correctness of the second sphere position information of the multiple second spheres is determined based on the result of the image rendering, and the to-be-optimized neural network is updated based on the gradient information to obtain the optimized neural network, such that the optimized neural network has higher accuracy in the prediction of the three-dimensional position information.
The implementation process of the above S702 is similar to that of the above S102; and the implementation process of the above S703 is similar to that of the above S103, which will not be repeated here.
In the above S704, when the to-be-trained neural network is updated based on the gradient information of the second rendered image, new second sphere position information is obtained by utilizing the updated neural network, such that in a new second rendered image obtained based on the new second sphere position information, the gradient value of each pixel changes towards the first value; by optimizing the neural network multiple times, the accuracy of the neural network in predicting the second sphere position information may be gradually improved.
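As a hedged sketch of the S701 to S704 loop, the following PyTorch fragment shows one way the manually computed gradient information could be injected into automatic differentiation so that an ordinary optimizer step updates the to-be-trained network; rasterize and pixel_to_param_gradients are illustrative placeholders, not calls from any real library.

```python
import torch

def rasterize(sphere_params):
    # Placeholder: a real implementation would rasterize the patches of
    # the second spheres, as described for the first rendered image.
    return sphere_params.new_zeros(64, 64)

def pixel_to_param_gradients(rendered, seg_annotation, sphere_params):
    # Placeholder for S703: per-pixel gradients computed against the
    # semantically annotated image, chained back to the predicted
    # sphere parameters.
    return torch.zeros_like(sphere_params)

class RenderWithCustomGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, sphere_params, seg_annotation):
        rendered = rasterize(sphere_params)  # non-differentiable step
        ctx.save_for_backward(
            pixel_to_param_gradients(rendered, seg_annotation, sphere_params))
        return rendered

    @staticmethod
    def backward(ctx, grad_output):
        (param_grads,) = ctx.saved_tensors
        return param_grads, None  # gradient w.r.t. sphere_params only

def train_step(net, image, seg_annotation, optimizer):
    optimizer.zero_grad()
    sphere_params = net(image)                                        # S701
    rendered = RenderWithCustomGrad.apply(sphere_params, seg_annotation)  # S702
    rendered.sum().backward()  # S703/S704: flow the injected gradients
    optimizer.step()           # update the to-be-trained network
```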
It can be seen from the above contents that, according to the embodiments of the disclosure, the gradient of a certain pixel may be transferred to the Euclidean coordinates of a node of the three-dimensional grid, i.e., the shape of the three-dimensional object model may be corrected by utilizing image information such as the object contour and part-level semantic segmentation. An application scenario of the embodiments of the disclosure will be described below.
1. Forward propagation: from three-dimensional model grid to image pixels.
According to given camera parameters, the projection of each triangle patch (the foregoing patch) on the image plane is calculated according to the imaging principle of a pinhole camera; for each pixel on the image plane, the index of the patch closest to the camera among the patches covering the pixel is calculated (i.e., which triangle patch this pixel is rendered from in complete rendering); and an image in which each pixel stores the index of the corresponding patch is the triangle face index. Here, whether a pixel (u, v) belongs to the i-th part is represented by Ai(u, v), which is referred to as an element index (the foregoing part index). A complete rendered image is generated, and then, for each element (the foregoing part), the pixels whose coordinates belong to the current element in the part indexes are separately extracted from the complete rendered image.
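The following is a minimal sketch of this forward pass, assuming camera-space triangle vertices and an intrinsic matrix K; all names, and the use of a single mean depth per triangle instead of per-pixel interpolated depth, are simplifications for illustration, not the disclosure's renderer.

```python
import numpy as np

def project_vertices(vertices_cam, K):
    """Pinhole projection of Nx3 camera-space vertices: p = K @ X followed
    by division by depth; returns Nx2 pixel coordinates and N depths."""
    proj = (K @ vertices_cam.T).T
    return proj[:, :2] / proj[:, 2:3], vertices_cam[:, 2]

def point_in_triangle(p, tri):
    """Same-sign test of p against the three edges of a 2D triangle."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    s = [cross(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(x >= 0 for x in s) or all(x <= 0 for x in s)

def rasterize_patch_index(uv, depths, faces, h, w):
    """Triangle face index image: each pixel stores the index of the patch
    closest to the camera among those covering it, or -1 for background."""
    patch_index = np.full((h, w), -1, dtype=np.int64)
    zbuf = np.full((h, w), np.inf)
    for t, face in enumerate(faces):
        tri, depth = uv[face], depths[face].mean()  # simplified depth
        u0, v0 = np.floor(tri.min(axis=0)).astype(int)
        u1, v1 = np.ceil(tri.max(axis=0)).astype(int)
        for v in range(max(v0, 0), min(v1 + 1, h)):
            for u in range(max(u0, 0), min(u1 + 1, w)):
                if depth < zbuf[v, u] and point_in_triangle((u + 0.5, v + 0.5), tri):
                    zbuf[v, u] = depth
                    patch_index[v, u] = t
    return patch_index
```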
2. Back propagation: gradients of pixels are transmitted back to the nodes of the three-dimensional grid.
Since the situation is the same for the x-direction and the y-direction, the situation in which the gradients are transmitted back in the x-direction is taken as an example for illustration. A pixel value may be an RGB value, a gray value, a brightness value or a binary value; the binary value is taken as an example here, i.e., 1 represents "visible" and 0 represents "invisible". The gradient of each pixel is either positive (from 0 to 1) or negative (from 1 to 0). In order to correlate the Euclidean coordinates of the nodes (the foregoing vertices) with the gradients of the pixels, it is considered here that the value of each pixel changes linearly, rather than abruptly, when a node is moved. When blocking does not exist, for example, as shown in panel a of the corresponding figure, the gradient of the pixel value with respect to the node coordinate is the slope of this linear change, as shown by a black solid line in the second line chart in the lower portion of panel a; for the gradient of the node k, which belongs to the j-th triangle patch of the i-th part, the above formula (2) applies. When blocking exists, due to part-level rendering, if the current part is blocked by another part, the corresponding value is not rendered, such that regardless of whether the part covers the pixel or not, the value of the pixel is 0 in the rendered image of the part; the blocked cases are illustrated in the remaining panels of the figure.
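A small sketch of the part-level rendering just described is given below for illustration; it assumes a patch-index image from the forward pass and a mapping from patch index to part index (part_of_patch, an assumed input), so that a part blocked by another part renders 0 at the blocked pixels regardless of coverage.

```python
import numpy as np

def part_level_images(patch_index, part_of_patch, num_parts):
    """patch_index: HxW nearest-patch indices (-1 = background);
    part_of_patch: length-T array mapping patch index -> part index.
    Returns one binary image per part."""
    h, w = patch_index.shape
    images = np.zeros((num_parts, h, w))
    covered = patch_index >= 0
    # Part index per pixel; background pixels stay at -1 and therefore
    # never match any part below.
    parts = np.full((h, w), -1, dtype=np.int64)
    parts[covered] = part_of_patch[patch_index[covered]]
    for i in range(num_parts):
        images[i][parts == i] = 1.0  # visible there and not blocked
    return images
```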
3. All the pixels are traversed according to the foregoing part 1 and part 2, and the gradients of the traversed pixels transmitted back to the nodes of the three-dimensional model are calculated; when the gradients of multiple pixels are transmitted back to the same node, all the gradients are accumulated. A parallel acceleration method may be adopted here, for example, CUDA or parallel CPUs may be adopted to calculate each pixel independently. Finally, with given supervision information, the gradients of the nodes of the three-dimensional model are obtained in this way.
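The following sketch illustrates this traversal and accumulation under the simplifying assumption that each pixel's gradient is attributed to the three nodes of the patch recorded in its patch index; the per-pixel gradient values themselves are taken as already computed according to part 2.

```python
import numpy as np

def accumulate_node_gradients(pixel_grads, patch_index, faces, num_nodes):
    """pixel_grads: HxW per-pixel gradient values (x-direction here, as in
    the text); patch_index: HxW nearest-patch indices (-1 = background);
    faces: Tx3 node indices per triangle patch."""
    node_grads = np.zeros(num_nodes)
    h, w = patch_index.shape
    for v in range(h):       # each pixel is independent, so this double
        for u in range(w):   # loop parallelizes (e.g. one CUDA thread
            t = patch_index[v, u]  # per pixel)
            if t < 0 or pixel_grads[v, u] == 0.0:
                continue
            for k in faces[t]:
                # Gradients of multiple pixels reaching the same node
                # are accumulated.
                node_grads[k] += pixel_grads[v, u]
    return node_grads
```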
According to the method, the used supervision information is no longer limited to complete rendered images, and the semantic segmentation of an object may be utilized as the supervision information. When multiple objects are rendered together, different objects may further be considered as different parts and rendered independently, such that the positional relation between different objects may be known.
Those skilled in the art should understand that, in the foregoing methods, the sequence in which the steps are written does not imply a strict sequence of execution and does not limit the implementations; the sequence of execution of the steps should be determined by their functions and possible inherent logic.
Based on the same inventive concept, an embodiment of the disclosure further provides a three-dimensional model generation apparatus corresponding to the three-dimensional model generation method. Since the apparatus according to the embodiment of the disclosure is similar to the three-dimensional model generation method according to the foregoing embodiments of the disclosure, the implementation of the apparatus may refer to the implementation of the method, and the same parts will not be repeated herein. The apparatus includes a first acquisition part 81, a first generation part 82, a first gradient determination part 83, an adjustment part 84 and a model generation part 85.
The first acquisition part 81 is configured to acquire first sphere position information of each first sphere of multiple first spheres in a camera coordinate system based on a first image including a first object, where the multiple first spheres represent different parts of the first object respectively.
The first generation part 82 is configured to generate a first rendered image based on the first sphere position information of the multiple first spheres.
The first gradient determination part 83 is configured to obtain gradient information of the first rendered image based on the first rendered image and a semantically segmented image of the first image.
The adjustment part 84 is configured to adjust the first sphere position information of the multiple first spheres based on the gradient information of the first rendered image.
The model generation part 85 is configured to generate a three-dimensional model of the first object by utilizing the adjusted first sphere position information of the multiple first spheres.
According to some embodiments of the disclosure, when the first rendered image is generated based on the first sphere position information of the multiple first spheres, the first generation part 82 is configured to: determine first three-dimensional position information of each vertex of multiple patches forming each first sphere in a camera coordinate system respectively based on the first sphere position information; and generate the first rendered image based on the first three-dimensional position information of each vertex of multiple patches forming each first sphere in the camera coordinate system respectively.
According to some embodiments of the disclosure, when first three-dimensional position information of each vertex of multiple patches forming each first sphere in a camera coordinate system is determined based on first sphere position information, the first generation part 82 is configured to: determine the first three-dimensional position information of each vertex of multiple patches forming each first sphere in the camera coordinate system based on a first positional relation between template vertices of multiple template patches forming a template sphere and a center point of the template sphere, as well as the first sphere position information of each first sphere.
According to some embodiments of the disclosure, the first sphere position information of each first sphere includes: second three-dimensional position information of a center point of each first sphere in a camera coordinate system, lengths corresponding to three coordinate axes of each first sphere respectively, and the rotation angle of each first sphere relative to the camera coordinate system.
According to some embodiments of the disclosure, when first three-dimensional position information of each vertex of multiple patches forming each first sphere in a camera coordinate system is determined based on a first positional relation between template vertices of multiple template patches forming a template sphere and a center point of the template sphere, as well as the first sphere position information of each first sphere, the first generation part 82 is configured to: transform the template sphere in terms of shape and rotation angle based on the lengths corresponding to the three coordinate axes of each first sphere respectively and the rotation angle of each first sphere relative to the camera coordinate system; determine a second positional relation between each template vertex and a center point of the transformed template sphere based on the result of transforming the template sphere in terms of shape and rotation angle, as well as the first positional relation; and determine the first three-dimensional position information of each vertex of multiple patches forming each first sphere in the camera coordinate system based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relation.
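For illustration, a minimal sketch of this transform is given below; the "xyz" Euler-angle convention and the unit-radius template are assumptions, and only the vertex placement is shown.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def first_sphere_vertices(template_offsets, axis_lengths, euler_xyz, center):
    """template_offsets: Vx3 offsets of the template vertices from the
    template sphere's center (the first positional relation). Scaling by
    the per-axis lengths and rotating yields the second positional
    relation; adding the predicted center point gives the first
    three-dimensional position information of each vertex."""
    scaled = template_offsets * axis_lengths             # shape transform
    R = Rotation.from_euler('xyz', euler_xyz).as_matrix()
    return scaled @ R.T + center                         # rotate, then place
```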
According to some embodiments of the disclosure, the first acquisition part 81 is further configured to: acquire a camera projection matrix of a first image.
When the first rendered image is generated based on first three-dimensional position information of each vertex of multiple patches forming each first sphere in a camera coordinate system, the first generation part 82 is configured to: determine a part index and a patch index of each pixel in the first rendered image based on the first three-dimensional position information and the camera projection matrix; and generate the first rendered image based on the determined part index and patch index of each pixel in the first rendered image.
The part index of a pixel is configured to identify a part of the first object corresponding to the pixel; and the patch index of a pixel is configured to identify a patch corresponding to the pixel.
According to some embodiments of the disclosure, when a first rendered image is generated based on first three-dimensional position information of each vertex of multiple patches forming each first sphere in a camera coordinate system respectively, the first generation part 82 is configured to: for each first sphere, generate the first rendered image corresponding to each first sphere according to the first three-dimensional position information of each vertex of the multiple patches forming each first sphere in the camera coordinate system respectively.
When the gradient information of the first rendered image is obtained based on the first rendered image and the semantically segmented image of the first image, the first gradient determination part 83 is configured to: for each first sphere, obtain the gradient information of the first rendered image corresponding to each first sphere according to the first rendered image and the semantically segmented image corresponding to each first sphere.
According to some embodiments of the disclosure, the gradient information of the first rendered image includes: a gradient value of each pixel in the first rendered image.
When the gradient information of the first rendered image is obtained based on a first rendered image and the semantically segmented image of the first image, the first gradient determination part 83 is configured to: traverse each pixel in the first rendered image, and determine the gradient value of each traversed pixel according to a first pixel value of each traversed pixel in the first rendered image and a second pixel value of each traversed pixel in the semantically segmented image.
According to some embodiments of the disclosure, when the gradient value of each traversed pixel is determined according to the first pixel value of each traversed pixel in the first rendered image and the second pixel value of each traversed pixel in the semantically segmented image, the first gradient determination part 83 is configured to: determine a residual error of each traversed pixel according to the first pixel value of each traversed pixel and the second pixel value of each traversed pixel; when the residual error of each traversed pixel is a first value, determine the gradient value of each traversed pixel as the first value; when the residual error of each traversed pixel is not the first value, determine a target first sphere corresponding to each traversed pixel from the multiple first spheres based on the second pixel value of each traversed pixel, and determine a target patch from the multiple patches forming the target first sphere; determine target three-dimensional position information of at least one target vertex of the target patch in the camera coordinate system, where, when the at least one target vertex is positioned at the position identified by the target three-dimensional position information, the residual error between a new first pixel value obtained by re-rendering each traversed pixel and the second pixel value of each traversed pixel is the first value; and obtain the gradient value of each traversed pixel based on the first three-dimensional position information and the target three-dimensional position information of the target vertex in the camera coordinate system.
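A condensed sketch of this per-pixel rule follows, assuming the first value is zero and that the caller has already located the target vertex and the target position at which re-rendering would zero the residual error.

```python
import numpy as np

def traversed_pixel_gradient(first_pixel, second_pixel,
                             current_xyz, target_xyz, eps=1e-9):
    """first_pixel / second_pixel: the traversed pixel's values in the
    first rendered image and in the semantically segmented image;
    current_xyz / target_xyz: the target vertex's current position and
    the caller-located position at which re-rendering the pixel would
    make the residual error the first value (assumed to be zero)."""
    residual = first_pixel - second_pixel
    if residual == 0.0:  # residual already equals the assumed first value
        return np.zeros(3)
    delta = np.asarray(target_xyz, dtype=float) - np.asarray(current_xyz, dtype=float)
    grad = np.zeros(3)
    mask = np.abs(delta) > eps
    # Slope of the assumed linear change: residual over the distance the
    # target vertex must move along each axis (zero where it need not move).
    grad[mask] = residual / delta[mask]
    return grad
```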
According to some embodiments of the disclosure, when first sphere position information of each first sphere of the multiple first spheres in a camera coordinate system is obtained based on a first image including a first object, the first acquisition part 81 is configured to: perform position information prediction processing on the first image by utilizing a pre-trained position information prediction network to obtain the first sphere position information of each first sphere of the multiple first spheres in the camera coordinate system.
With reference to the corresponding figure, an embodiment of the disclosure further provides a neural network generation apparatus, which includes a second acquisition part 91, a second generation part 92, a second gradient determination part 93 and an updating part 94.
A second acquisition part 91 is configured to perform three-dimensional position information prediction processing on a second object in a second image by utilizing a to-be-trained neural network to obtain second sphere position information of each second sphere of multiple second spheres representing different parts of the second object in a camera coordinate system.
A second generation part 92 is configured to generate a second rendered image based on the second sphere position information corresponding to the multiple second spheres respectively.
A second gradient determination part 93 is configured to obtain gradient information of the second rendered image based on the second rendered image and a semantically annotated image of the second image.
The updating part 94 is configured to update the to-be-trained neural network based on the gradient information of the second rendered image to obtain an updated neural network.
The process of each part of the device and the interaction between different parts may refer to relevant descriptions of the foregoing embodiments of the method, which will not be described in detail here.
According to the embodiments and other embodiments of the disclosure, a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc.; the "part" may further be a unit, which may be modular or non-modular.
An embodiment of the disclosure further provides a computer device which, as shown in the corresponding figure, includes a processor 11 and a memory 12.
The memory 12 is configured to store machine-readable instructions executable by the processor 11; when the computer device runs, the machine-readable instructions are executed by the processor to implement the following steps.
First sphere position information of each first sphere of multiple first spheres in a camera coordinate system is acquired based on a first image including a first object, where the multiple first spheres are configured to represent different parts of the first object respectively.
A first rendered image is generated based on the first sphere position information of the multiple first spheres.
Gradient information of the first rendered image is obtained based on the first rendered image and a semantically segmented image of the first image.
The first sphere position information of the multiple first spheres is adjusted based on the gradient information of the first rendered image, and a three-dimensional model of the first object is generated by utilizing the adjusted first sphere position information of the multiple first spheres.
In an embodiment, the machine-readable instructions are executed by the processor to implement the following steps.
Three-dimensional position information prediction processing is performed on a second object in a second image by utilizing a to-be-trained neural network to obtain second sphere position information of each second sphere of multiple second spheres representing different parts of the second object in a camera coordinate system.
A second rendered image is generated based on the second sphere position information corresponding to multiple second spheres respectively.
Gradient information of the second rendered image is obtained based on the second rendered image and a semantically annotated image of the second image.
The to-be-trained neural network is updated based on the gradient information of the second rendered image to obtain an updated neural network.
The execution process of the foregoing instructions may refer to the steps of the three-dimensional model generation method and the neural network generation method according to the embodiments of the disclosure, which will not be repeated herein.
An embodiment of the disclosure further provides a computer-readable storage medium, in which a computer program is stored. When the computer program is run by a processor, the steps of a three-dimensional model generation method or a neural network generation method according to the foregoing embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the disclosure provides a computer program product for a three-dimensional model generation method or a neural network generation method, which includes a computer-readable storage medium storing program codes. The program codes include instructions configured to execute the steps of the three-dimensional model generation method or the neural network generation method according to the foregoing embodiments; reference may be made to the foregoing embodiments, and details will not be repeated herein.
An embodiment of the disclosure further provides a computer program. When the computer program is executed by a processor, any of the methods according to the foregoing embodiments may be implemented. The computer program product may be implemented in terms of hardware, software, or a combination thereof. According to an embodiment, a computer program product may be implemented as a computer storage medium, and according to another embodiment, a computer program product may be implemented as a software product, such as an SDK (Software Development Kit), etc.
An embodiment of the disclosure further provides a computer program, which includes computer-readable codes. When the computer-readable codes are run in an electronic device, a processor in the electronic device executes a three-dimensional model generation method or a neural network generation method according to the foregoing embodiments.
According to the embodiments of the disclosure, in the task of reconstructing a three-dimensional model, the accuracy of the reconstructed model can be improved, and the ambiguity generated by the self-blocking of a high-degree-of-freedom model is reduced. Moreover, in deep learning, an image and the three-dimensional space can be connected according to the embodiments of the disclosure, thereby improving the accuracy of semantic segmentation, three-dimensional reconstruction and other tasks.
Those skilled in the art should understand that, for convenience and conciseness of the descriptions, the work processes of the foregoing systems and devices may refer to the corresponding processes of the methods according to the foregoing embodiments and will not be repeated herein. According to the embodiments of the disclosure, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The foregoing embodiments of the devices are merely for illustration, for example, the units are merely classified according to the logical functions thereof, and may be classified in another way in actual application; and for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. On the other hand, the “coupling” or “direct coupling” or “communication connection” shown or discussed herein may be “indirect coupling” or “indirect communication connection” through some communication interfaces, devices or units, which may be implemented in electrical, mechanical or other forms.
The units, illustrated as separate components, may or may not be physically separated, and the components displayed as units may or may not be physical units, i.e., the components may be positioned in one place, or may be distributed over multiple network units. Part or all of the units may be selected according to actual needs to achieve the objectives of the embodiments.
In addition, the functional units according to the embodiments of the disclosure may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or utilized as an independent product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such an understanding, the technical schemes of the disclosure in essence, or the part thereof contributing to the related art, or part of the technical schemes, may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes multiple instructions configured to enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods according to the embodiments of the disclosure. The foregoing storage medium may be a USB flash disk, a portable hard drive, a ROM (Read-Only Memory), a RAM (Random Access Memory), a diskette, a CD or another medium capable of storing program codes.
Finally, it should be noted that the foregoing embodiments are merely specific embodiments of the disclosure used to illustrate the technical schemes of the disclosure, and are not intended to limit the disclosure; the scope of the disclosure is not limited thereto. Although the disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that modifications and variations to the technical schemes according to the foregoing embodiments, as well as equivalent substitutions for some technical features thereof, may be made within the technical scope of the disclosure. Such modifications, variations and substitutions do not depart from the spirit and scope of the embodiments disclosed herein and should fall within the scope of the disclosure. Therefore, the scope of the disclosure should be defined by the scope of the claims.
Number | Date | Country | Kind
---|---|---|---
202010607430.5 | Jun 2020 | CN | national
This is a continuation of international patent application no. PCT/CN2021/082485 filed on Mar. 23, 2021, which claims priority to Chinese patent application no. 202010607430.5 filed on Jun. 29, 2020. The disclosures of the above-referenced applications are hereby incorporated by reference in their entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2021/082485 | Mar 2021 | US
Child | 17645446 | | US