The disclosure relates to an electronic device for performing camera calibration of a plurality of cameras and an operation method thereof. More particularly, the disclosure relates to an electronic device for obtaining information about a positional relationship between cameras based on two-dimensional (2D) feature points extracted from the cameras.
Triangulation is used to obtain a distance from a camera to an object, i.e., a depth value, by using a plurality of cameras. In order to obtain an accurate depth value of the object by using triangulation, it is necessary to know in advance a positional relationship between the plurality of cameras, i.e., information about the relative positions and orientations between the plurality of cameras. In particular, when a camera is not fixed at a certain location, for example, in the case of a camera in a smartphone, a home closed-circuit television (CCTV), or a robot vacuum cleaner, a position and a view of the camera may be changed due to movement thereof. To accurately predict a changed positional relationship between the plurality of cameras, camera calibration needs to be performed again.
In general, various methods have been proposed for performing camera calibration to estimate a positional relationship between a plurality of cameras, including structure-from-motion (SfM), stereo vision, visual localization, and a method using a checkerboard.
Among the existing methods, SfM involves extracting two-dimensional (2D) feature points from a plurality of images captured at different angles by a plurality of cameras, and estimating a positional relationship between the cameras by matching corresponding 2D feature points across the plurality of images. Because SfM is applied only to pairs of 2D feature points lying on the same plane, it has a limitation in that it cannot be applied to feature points on other planes. Among the other methods, the method using a checkerboard involves obtaining a plurality of images of a checkerboard captured by a plurality of cameras and predicting a positional relationship between the cameras by matching grid points on the checkerboard across the obtained plurality of images. The checkerboard method has a prerequisite that all grid points on the checkerboard be on the same plane; it is also cumbersome because a separate checkerboard is required, and whenever a position of a camera changes, the checkerboard has to be prepared again.
The above information is presented as background information only to assist with an understanding of the disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.
Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide an electronic device for obtaining information about a positional relationship between cameras based on two-dimensional (2D) feature points extracted from the cameras.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
In accordance with an aspect of the disclosure, an electronic device for performing camera calibration is provided. The electronic device includes a communication interface, memory storing one or more computer programs, and one or more processors including processing circuitry and communicatively coupled to the communication interface and the memory, wherein the one or more computer programs include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to obtain, via the communication interface, a first image of a user captured by a first camera and a second image of the user captured by a second camera, extract first joint feature points, which are two-dimensional (2D) position coordinate values of joints of the user, from the first image, and second joint feature points, which are 2D position coordinate values of the joints, from the second image, obtain three-dimensional (3D) joint feature points of the joints by lifting the extracted first joint feature points to 3D position coordinate values, obtain a projection relationship for projecting the 3D joint feature points onto the 2D position coordinate values of the second joint feature points, and perform camera calibration by predicting a positional relationship between the first camera and the second camera based on the obtained projection relationship.
In accordance with another aspect of the disclosure, a method performed by an electronic device for performing camera calibration is provided. The method includes obtaining, by the electronic device, a first image of a user captured from a first camera, and a second image of the user captured from a second camera, extracting, by the electronic device, first joint feature points, which are 2D position coordinate values of joints of the user, from the first image, and second joint feature points, which are 2D position coordinate values of the joints, from the second image, obtaining, by the electronic device, 3D joint feature points of the joints by lifting the extracted first joint feature points to 3D position coordinate values, obtaining, by the electronic device, a projection relationship for projecting the 3D joint feature points onto the 2D position coordinate values of the second joint feature points, and performing, by the electronic device, camera calibration by predicting a positional relationship between the first camera and the second camera based on the obtained projection relationship.
In accordance with another aspect of the disclosure, one or more non-transitory computer-readable storage media storing one or more computer programs including computer-executable instructions that, when executed by one or more processors of an electronic device individually or collectively, cause the electronic device to perform operations, are provided. The operations include obtaining, by the electronic device, a first image of a user captured by a first camera and a second image of the user captured by a second camera, extracting, by the electronic device, first joint feature points, which are 2D position coordinate values of joints of the user, from the first image, and second joint feature points, which are 2D position coordinate values of the joints, from the second image, obtaining, by the electronic device, 3D joint feature points of the joints by lifting the extracted first joint feature points to 3D position coordinate values, obtaining, by the electronic device, a projection relationship for projecting the 3D joint feature points onto the 2D position coordinate values of the second joint feature points, and performing, by the electronic device, camera calibration by predicting a positional relationship between the first camera and the second camera based on the obtained projection relationship.
Other aspects, advantages, and salient features of the disclosure will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, discloses various embodiments of the disclosure.
The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
The same reference numerals are used to represent the same elements throughout the drawings.
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. It includes various specific details to assist in that understanding but these are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to the bibliographical meanings, but, are merely used by the inventor to enable a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following description of various embodiments of the disclosure is provided for illustration purpose only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.
It is to be understood that the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a component surface” includes reference to one or more of such surfaces.
Throughout the disclosure, when a part “includes” or “comprises” an element, unless there is a particular description contrary thereto, it is understood that the part may further include other elements, not excluding the other elements. Furthermore, terms, such as “portion,” “module,” etc., used herein indicate a unit for processing at least one function or operation, and may be implemented as hardware or software or a combination of hardware and software.
The expression “configured to (or set to)” used herein may be used interchangeably, according to context, with, for example, the expression “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of”. The term “configured to (or set to)” may not necessarily mean only “specifically designed to” in terms of hardware. Instead, the expression “a system configured to” may mean, in some contexts, the system being “capable of”, together with other devices or components. For example, the expression “a processor configured to (or set to) perform A, B, and C” may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a general-purpose processor (e.g., a central processing unit (CPU) or an application processor (AP)) capable of performing the corresponding operations by executing one or more software programs stored in memory.
Furthermore, in the disclosure, when a component is referred to as being “connected” or “coupled” to another component, it should be understood that the component may be directly connected or coupled to the other component, but may also be connected or coupled to the other component via another intervening component therebetween unless there is a particular description contrary thereto.
As used herein, ‘camera calibration’ refers to an operation of estimating or obtaining a positional relationship between a plurality of cameras. The positional relationship between the cameras may include information about positions and orientations of the plurality of cameras arranged at different locations. In an embodiment of the disclosure, the camera calibration may include an operation of obtaining a rotation matrix, denoted by R, and a translation vector, denoted by t. The camera calibration may also be referred to as ‘pose estimation between cameras’.
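By way of a non-limiting illustration, under the standard pinhole camera model the rotation matrix R and the translation vector t relate a 3D point X, expressed in the coordinate system of one camera, to its projection x in the image of the other camera, given that camera's intrinsic matrix K (the intrinsic matrix is assumed to be known and is not itself part of the positional relationship):

    s\,\mathbf{x} = \mathbf{K}\,[\mathbf{R} \mid \mathbf{t}]\begin{bmatrix}\mathbf{X}\\ 1\end{bmatrix}

where s is a projective scale factor and x is expressed in homogeneous pixel coordinates.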
As used in the disclosure, a ‘joint’ is a part of a human body where bones are connected to each other, such as one or more regions included in the head, neck, arm, shoulder, waist, knee, leg, or foot.
As used herein, ‘joint feature points’ represent position coordinate values for a plurality of joints included in a body.
As used herein, a ‘three-dimensional (3D) pose’ of a user refers to a pose consisting of 3D position coordinate values of 3D feature points of joints of the user. The term ‘pose’ in a 3D pose of the user has a different meaning from the ‘pose’ between cameras, which is another expression for camera calibration.
In the disclosure, functions related to artificial intelligence (AI) are performed via a processor and memory. The processor may be configured as one or a plurality of processors. In this case, the one or plurality of processors may be a general-purpose processor such as a CPU, an AP, a digital signal processor (DSP), etc., a dedicated graphics processor such as a graphics processing unit (GPU) or a vision processing unit (VPU), or a dedicated AI processor such as a neural processing unit (NPU). The one or plurality of processors control input data to be processed according to predefined operation rules or an AI model stored in the memory. Alternatively, in a case that the one or plurality of processors are a dedicated AI processor, the dedicated AI processor may be designed with a hardware structure specialized for processing a particular AI model.
The predefined operation rules or AI model are created via a training process. In this case, the creation via the training process means that the predefined operation rules or AI model set to perform desired characteristics (or purposes) are created by training a basic AI model on a large amount of training data via a learning algorithm. The training process may be performed by an apparatus itself on which AI according to the disclosure is performed, or via a separate server and/or system. Examples of a learning algorithm may include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
In the disclosure, an ‘AI model’ may consist of a plurality of neural network layers. Each of the plurality of neural network layers has a plurality of weight values and may perform neural network computations via calculations between a result of computations in a previous layer and the plurality of weight values. A plurality of weight values assigned to each of the plurality of neural network layers may be optimized by a result of training the AI model. For example, the plurality of weight values may be updated to reduce or minimize a loss or cost value obtained in the AI model during a training process. An artificial neural network model may include a deep neural network (DNN), such as a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent DNN (BRDNN), or a deep Q-network (DQN), but is not limited thereto.
An embodiment of the disclosure will be described more fully hereinafter with reference to the accompanying drawings so that the embodiment may be easily implemented by a person of ordinary skill in the art. However, the disclosure may be implemented in different forms and should not be construed as being limited to embodiments set forth herein.
It should be appreciated that the blocks in each flowchart and combinations of the flowcharts may be performed by one or more computer programs which include instructions. The entirety of the one or more computer programs may be stored in a single memory device or the one or more computer programs may be divided with different portions stored in different multiple memory devices.
Any of the functions or operations described herein can be processed by one processor or a combination of processors. The one processor or the combination of processors is circuitry performing processing and includes circuitry like an application processor (AP, e.g. a central processing unit (CPU)), a communication processor (CP, e.g., a modem), a graphics processing unit (GPU), a neural processing unit (NPU) (e.g., an artificial intelligence (AI) chip), a Wi-Fi chip, a Bluetooth® chip, a global positioning system (GPS) chip, a near field communication (NFC) chip, connectivity chips, a sensor controller, a touch controller, a finger-print sensor controller, a display driver integrated circuit (IC), an audio CODEC chip, a universal serial bus (USB) controller, a camera controller, an image processing IC, a microprocessor unit (MPU), a system on chip (SoC), an IC, or the like.
Hereinafter, embodiments of the disclosure are described in detail with reference to the accompanying drawings.
Referring to
The electronic device 100 extracts joint feature points from each of the first image 10 and the second image 20 (operation {circle around (1)}). In an embodiment of the disclosure, the electronic device 100 may extract first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n from the first image 10 and second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n from the second image 20. The first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n and the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n may include a plurality of 2D position coordinate values for a plurality of joints included in a body of the user 1, e.g., one or more regions included in the head, neck, arms, shoulders, waist, knees, legs, or feet. In an embodiment of the disclosure, the electronic device 100 may obtain the first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n from the first image 10 and the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n from the second image 20 by using an AI model trained to extract 2D position coordinate values corresponding to joint feature points for human joints from a 2D image.
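By way of a non-limiting illustration, the following sketch shows one possible way to extract such 2D joint feature points with an off-the-shelf pose estimator (here, MediaPipe Pose); the disclosure does not mandate any particular model, and the variables first_image and second_image are assumed to hold the captured frames.

    import cv2
    import mediapipe as mp
    import numpy as np

    def extract_joint_feature_points(image_bgr):
        # Returns an (n, 2) array of 2D joint positions in pixel coordinates,
        # or None when no person is detected in the image.
        h, w = image_bgr.shape[:2]
        with mp.solutions.pose.Pose(static_image_mode=True) as pose:
            result = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks is None:
            return None
        # MediaPipe outputs normalized coordinates; convert them to pixels.
        return np.array([[lm.x * w, lm.y * h]
                         for lm in result.pose_landmarks.landmark])

    first_joints = extract_joint_feature_points(first_image)    # Pi1_1 ... Pi1_n
    second_joints = extract_joint_feature_points(second_image)  # Pi2_1 ... Pi2_n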
The electronic device 100 may obtain 3D joint feature points P1, P2, . . . , and Pn by lifting the first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n extracted from the first image 10 to 3D position coordinate values (operation {circle around (2)}). In an embodiment of the disclosure, the electronic device 100 may obtain the 3D joint feature points P1, P2, . . . , and Pn from the first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n, which are 2D position coordinate values, by using an AI model trained to obtain 3D joint feature points from joint feature points included in a red, green, and blue (RGB) image. For example, the fifth first joint feature point Pi1_5 among the first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n may be transformed, via lifting, into the position coordinate values of the fifth 3D joint feature point P5 among the 3D joint feature points P1, P2, . . . , and Pn.
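By way of a non-limiting illustration, the lifting step can be sketched as follows. The LiftingMLP network below is a stand-in introduced only for this example; the trained lifting model of the disclosure (for example, a multi-stage CNN) and its weights are not specified here, and first_joints is assumed to be the (n, 2) array from the previous sketch.

    import torch
    import torch.nn as nn

    class LiftingMLP(nn.Module):
        # Illustrative 2D-to-3D lifting network: maps n 2D joint coordinates to
        # n 3D joint coordinates. An actual deployment would load trained weights.
        def __init__(self, num_joints):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_joints * 2, 1024), nn.ReLU(),
                nn.Linear(1024, 1024), nn.ReLU(),
                nn.Linear(1024, num_joints * 3),
            )

        def forward(self, joints_2d):                 # (batch, num_joints, 2)
            b, n, _ = joints_2d.shape
            return self.net(joints_2d.reshape(b, -1)).reshape(b, n, 3)

    lifter = LiftingMLP(num_joints=first_joints.shape[0])
    joints_3d = lifter(torch.from_numpy(first_joints).float().unsqueeze(0))
    joints_3d = joints_3d[0].detach().numpy()         # (n, 3): P1 ... Pn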
The electronic device 100 may obtain a projection relationship for respectively matching the 3D joint feature points P1, P2, . . . , and Pn with the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n in the second image 20 (operation {circle around (3)}). In an embodiment of the disclosure, the electronic device 100 may obtain information about a projection relationship for projecting the 3D joint feature points P1, P2, . . . , and Pn to match 2D position coordinate values of the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n. According to a result of projecting the 3D joint feature points P1, P2, . . . , and Pn, combinations of the 3D joint feature points P1, P2, . . . , Pn and the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n may have 2D-3D correspondences. In the embodiment illustrated in
The electronic device 100 predicts a positional relationship of the second camera 220 based on the projection relationship (operation {circle around (4)}). In an embodiment of the disclosure, the electronic device 100 may predict a position and an orientation of the second camera 220 based on the projection relationship between the 3D joint feature points P1, P2, . . . , and Pn and the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n.
The electronic device 100 may obtain a relative positional relationship between the first camera 210 and the second camera 220 (operation {circle around (5)}). In an embodiment of the disclosure, the electronic device 100 may obtain information about the relative position and orientation of the second camera 220 with respect to a position and an orientation of the first camera 210 as a pose between the cameras. For example, the electronic device 100 may estimate a relative positional relationship between the cameras by using a Perspective-n-Point (PnP) method. In an embodiment of the disclosure, the electronic device 100 may obtain information about a relative positional relationship between the cameras, including a rotation matrix R and a translation vector t.
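By way of a non-limiting illustration, such a PnP-based estimate can be computed with OpenCV as sketched below, assuming joints_3d holds the lifted 3D joint feature points expressed in the first camera's coordinate system, second_joints holds the matching 2D joint feature points from the second image, and the intrinsic parameters of the second camera are known (the numeric values below are placeholders).

    import cv2
    import numpy as np

    # Placeholder intrinsics of the second camera (focal lengths and principal point).
    fx = fy = 1000.0
    cx, cy = 960.0, 540.0
    K2 = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)

    ok, rvec, tvec = cv2.solvePnP(
        joints_3d.astype(np.float64),      # 3D joint feature points P1 ... Pn
        second_joints.astype(np.float64),  # matching 2D joint feature points Pi2_1 ... Pi2_n
        K2, None,                          # no lens distortion assumed
        flags=cv2.SOLVEPNP_ITERATIVE)

    R, _ = cv2.Rodrigues(rvec)             # rotation matrix R
    t = tvec                               # translation vector t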
The electronic device 100 may perform camera calibration by using the relative positional relationship between the cameras. A specific method, performed by the electronic device 100, of performing camera calibration by using the first image 10 obtained from the first camera 210 and the second image 20 obtained from the second camera 220 is described with reference to
Referring to
The electronic device 100 may extract joint feature points from each of the first image 10 and the second image 20. In the embodiment illustrated in
The electronic device 100 may obtain 3D joint feature points P1, P2, . . . , and Pn by performing 2D-3D lifting of the first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n. By performing 2D-3D lifting, the electronic device 100 may transform the first joint feature points Pi1_1, Pi1_2, . . . , and Pi1_n consisting of 2D position coordinate values into the 3D joint feature points P1, P2, . . . , and Pn, which are 3D position coordinate values. In the embodiment illustrated in
The electronic device 100 may obtain a projection relationship R, t for matching the 3D joint feature points P1, P2, . . . , and Pn with the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n extracted from the second image 20. In an embodiment of the disclosure, the electronic device 100 may obtain information about the projection relationship R, t for projecting the 3D joint feature points P1, P2, . . . , and Pn to match the 2D position coordinate values of the second joint feature points Pi2_1, Pi2_2, . . . , and Pi2_n. In an embodiment of the disclosure, the projection relationship R, t may include information about a relative positional relationship between the cameras, including a rotation matrix R and a translation vector t.
The electronic device 100 may predict a relative positional relationship between the first camera 210 and the second camera 220 based on the projection relationship R, t, and perform camera calibration by using the relative positional relationship therebetween.
Existing camera calibration methods that are commonly used include structure-from-motion (SfM), stereo vision, visual localization, and a method using a checkerboard. Among the existing methods, SfM involves extracting 2D feature points from a plurality of images captured at different angles by a plurality of cameras, and estimating a positional relationship between the cameras by matching corresponding 2D feature points across the plurality of images. Because SfM is applied only to pairs of 2D feature points lying on the same plane, it has a limitation in that it cannot be applied to feature points on other planes. Among the other methods, the method using a checkerboard involves obtaining a plurality of images of a checkerboard captured by a plurality of cameras and predicting a positional relationship between the cameras by matching grid points on the checkerboard across the obtained plurality of images. The checkerboard method has a prerequisite that all grid points on the checkerboard be on the same plane; it is also cumbersome because a separate checkerboard is required, and whenever a position of a camera changes, the checkerboard has to be prepared again. In addition, the stereo vision and visual localization methods necessarily have prerequisites, such as knowing 3D position coordinate values of a reference point in advance.
The disclosure aims to provide the electronic device 100 and a method of operating the same that perform camera calibration by predicting a positional relationship between cameras by using feature points related to joints of the user, without a prerequisite such as the requirement that 2D feature points exist on the same plane and without a separate device such as a checkerboard.
According to the embodiments illustrated in
According to an embodiment of the disclosure, the electronic device 100 performs camera calibration by using joint feature points extracted from an image of a user in an environment where a camera is not fixed and a position or view of the camera changes, for example, in the case of a camera in a mobile device, a home closed-circuit television (CCTV), or a robot vacuum cleaner, thereby increasing usability in daily life.
According to an embodiment of the disclosure, the electronic device 100 may obtain a 3D pose of the user via triangulation based on the first image 10, the second image 20, and the relative positional relationship between the first camera 210 and the second camera 220 obtained by performing camera calibration.
The electronic device 100 may be a mobile device, such as a smartphone, a tablet personal computer (PC), a laptop computer, a digital camera, an e-book terminal, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, an MP3 player, or the like. In an embodiment of the disclosure, the electronic device 100 may be a home appliance such as a television (TV), an air conditioner, a robot vacuum cleaner, or a clothing manager. However, the disclosure is not limited thereto, and in another embodiment of the disclosure, the electronic device 100 may be implemented as a wearable device, such as a smartwatch, an eye glasses-shaped augmented reality (AR) device (e.g., AR glasses), a head-mounted display (HMD) apparatus, or a body-attached device (e.g., a skin pad).
Referring to
The first camera 210 and the second camera 220 are each configured to obtain an image of an object included in a real-world space (e.g., an indoor space) by capturing the image of the object. The first camera 210 and the second camera 220 may each include a lens module, an image sensor, and an image processing module. The first camera 210 and the second camera 220 may each obtain a still image or a video of an object through an image sensor (e.g., a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) sensor). An image processing module may encode a still image consisting of a single image frame or video data consisting of a plurality of image frames obtained through the image sensor, and transmit the encoded data to the processor 120.
In an embodiment of the disclosure, the first camera 210 may obtain a first image by capturing an image of a user, and the second camera 220 may obtain a second image by capturing an image of the user. The first camera 210 and the second camera 220 may be connected to the electronic device 100 via a wired or wireless communication network and transmit and receive data to and from the electronic device 100. In the embodiment illustrated in
The communication interface 110 is configured to perform data communication with an external device or server. In an embodiment of the disclosure, the communication interface 110 may be connected to the first camera 210 and the second camera 220 via a wired or wireless communication network, and receive the first image from the first camera 210 and the second image from the second camera 220. The communication interface 110 may receive the first image and the second image from the first camera 210 and the second camera 220, respectively, by using at least one of data communication methods including, for example, wired local area network (LAN), wireless LAN, Wi-Fi, Bluetooth, ZigBee, Wi-Fi Direct (WFD), Infrared Data Association (IrDA), Bluetooth Low Energy (BLE), near field communication (NFC), wireless broadband Internet (WiBro), World Interoperability for Microwave Access (WiMAX), Shared Wireless Access Protocol (SWAP), Wireless Gigabit Alliance (WiGig), and radio frequency (RF) communication. The communication interface 110 may provide image data of the received first image and second image to the processor 120.
The processor 120 may execute one or more instructions of a program stored in the memory 130. The processor 120 may be composed of hardware components that perform arithmetic, logic, and input/output (I/O) operations, and image processing. The processor 120 is shown as an element in
The processor 120 according to an embodiment of the disclosure may include various processing circuitry and/or multiple processors. For example, as used herein, including the claims, the term “processor” may include various processing circuitry, including at least one processor, wherein one or more of at least one processor, individually and/or collectively in a distributed manner, may be configured to perform various functions described herein. As used herein, when “a processor”, “at least one processor”, and “one or more processors” are described as being configured to perform numerous functions, these terms cover situations, for example and without limitation, in which one processor performs some of recited functions and another processor(s) performs other of recited functions, and also situations in which a single processor may perform all recited functions. Additionally, the at least one processor may include a combination of processors performing a variety of the recited/disclosed functions, e.g., in a distributed manner. At least one processor may execute program instructions to achieve or perform various functions.
The memory 130 may include at least one type of storage medium among, for example, flash memory-type memory, a hard disk-type memory, a multimedia card micro-type memory, card-type memory (e.g., a Secure Digital (SD) card or an eXtreme Digital (XD) memory), random access memory (RAM), static RAM (SRAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), programmable ROM (PROM), or an optical disc.
The memory 130 may store instructions related to operations of the electronic device 100 performing calibration. In an embodiment of the disclosure, the memory 130 may store at least one of instructions, algorithms, data structures, program code, and application programs readable by the processor 120. The instructions, algorithms, data structures, and program code stored in memory 130 may be implemented in programming or scripting languages such as C, C++, Java, assembler, etc.
The memory 130 may store instructions, algorithms, data structures, or program code related to a joint feature point extraction module 132, a lifting module 134, a camera calibration module 136, and a 3D pose estimation module 138. A ‘module’ included in the memory 130 refers to a unit for processing a function or an operation performed by the processor 120, and may be implemented as software such as instructions, algorithms, data structures, or program code.
Hereinafter, with reference to
The joint feature point extraction module 132 is composed of instructions or program code related to a function and/or an operation of extracting feature points related to human joints from a 2D image. In an embodiment of the disclosure, the joint feature point extraction module 132 may include an AI model trained, via supervised learning, by applying a 2D image as input data and applying a plurality of 2D position coordinate values of human joints, e.g., one or more regions included in a head, a neck, arms, shoulders, waist, knees, legs, or feet, extracted from the 2D image, as output ground truth. For example, the AI model may be configured as a DNN, such as a CNN, an RNN, an RBM, a DBN, a BRDNN, or a DQN. In an embodiment of the disclosure, the joint feature point extraction module 132 may include a pose estimation model trained to extract 2D feature points of joints from a 2D RGB image and output a 2D pose by using the extracted 2D feature points. The pose estimation model may be configured as a DNN model, such as a TensorFlow Lite-based pose estimation model or LitePose, but is not limited thereto.
The processor 120 may extract, from the first image, first joint feature points, which are 2D position coordinate values for joints, and extract, from the second image, second joint feature points, which are 2D position coordinate values for the joints, by executing the instructions or program code of the joint feature point extraction module 132. In an embodiment of the disclosure, the processor 120 may input the first image to the pose estimation model configured as a DNN model, and extract the first joint feature points from the first image via inference using the pose estimation model. Similarly, the processor 120 may input the second image to the pose estimation model, and extract the second joint feature points from the second image via inference using the pose estimation model. The processor 120 may obtain a 2D pose of the user from the first image based on the extracted first joint feature points. The processor 120 may obtain a 2D pose of the user from the second image based on the second joint feature points.
The joint feature point extraction module 132 may provide data of the extracted first joint feature points to the lifting module 134 and data of the second joint feature points to the camera calibration module 136.
The lifting module 134 is composed of instructions or program code related to a function and/or an operation of obtaining 3D position coordinate values from 2D position coordinate values. In an embodiment of the disclosure, the lifting module 134 may include an AI model trained, via supervised learning, by applying 2D feature points obtained from an RGB image as input data and applying 3D position coordinate values corresponding to the 2D feature points as output ground truth. The AI model included in the lifting module 134 may be configured as, for example, a multi-stage CNN model, but is not limited thereto. For example, the lifting module 134 may also include a DNN model such as an RNN, an RBM, a DBN, a BRDNN, or a DQN.
The processor 120 may obtain 3D joint feature points, which are 3D position coordinate values, by lifting the first joint feature points, which are 2D position coordinate values, by executing the instructions or program code of the lifting module 134. In an embodiment of the disclosure, the processor 120 may input the first joint feature points to a DNN model included in the lifting module 134, and obtain 3D joint feature points that are 3D position coordinate values via inference using the DNN model. The processor 120 may obtain a lifting image representing a 3D pose of the user based on the 3D joint feature points.
In an embodiment of the disclosure, the processor 120 may determine whether a 3D pose of the user composed of the 3D joint feature points is suitable to apply to camera calibration, based on a distribution of position coordinate values in the z-axis direction among 3D position coordinate values included in the 3D joint feature points obtained through the lifting. In a case that the processor 120 determines that the pose of the user is not suitable to apply to camera calibration, the processor 120 may display, on the display, guide information requesting the user to assume a predetermined pose. A specific embodiment in which the processor 120 determines suitability for application to camera calibration based on z-axis coordinate values of 3D joint feature points is described in detail with reference to
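By way of a non-limiting illustration, such a suitability check could take the following form; the spread threshold and the show_pose_guide helper are assumptions introduced only for this example, and joints_3d is assumed to be the (n, 3) array of lifted 3D joint feature points from the earlier sketch.

    import numpy as np

    def is_pose_suitable(joints_3d, min_z_spread=0.15):
        # The pose is treated as suitable for calibration only when the z-axis
        # (depth) coordinates of the 3D joint feature points are spread widely enough.
        z = np.asarray(joints_3d)[:, 2]
        return (z.max() - z.min()) >= min_z_spread

    if not is_pose_suitable(joints_3d):
        # Display guide information requesting a predetermined pose, e.g., a pose
        # with one hand extended forward, and then recapture the first and second images.
        show_pose_guide()   # hypothetical UI helper, not a standard library call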
The lifting module 134 may provide the 3D position coordinate values to the camera calibration module 136.
In an embodiment of the disclosure, the processor 120 may obtain a plurality of image frames captured over a certain period of time from the first camera 210, and obtain a plurality of 3D joint feature points by lifting, through the lifting module 134, a plurality of first joint feature points extracted from each of the plurality of image frames. The processor 120 may identify an image frame with a largest degree of distribution of position coordinate values in the z-axis direction among a plurality of 3D position coordinate values included in the plurality of 3D joint feature points, and provide information about the identified image frame to the camera calibration module 136. A specific embodiment in which the processor 120 identifies an image frame with a largest degree of distribution of z-axis coordinate values among 3D position coordinate values from a plurality of image frames, and performs camera calibration by using the identified image frame is described in detail with reference to
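By way of a non-limiting illustration, selecting the image frame with the widest z-axis distribution could be sketched as follows; lifted_frames is an assumed list containing the (n, 3) lifted 3D joint feature points for each captured image frame.

    import numpy as np

    def select_best_frame(lifted_frames):
        # Return the index of the frame whose 3D joint feature points have the
        # largest peak-to-peak spread along the z-axis (depth) direction.
        spreads = [np.ptp(np.asarray(joints)[:, 2]) for joints in lifted_frames]
        return int(np.argmax(spreads))

    best_idx = select_best_frame(lifted_frames)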
The camera calibration module 136 is composed of instructions or program code related to a function and/or an operation of obtaining a relative positional relationship between cameras based on a projection relationship. In an embodiment of the disclosure, the camera calibration module 136 may obtain information about a projection relationship regarding a projection rotation direction and a position movement for projecting 3D position coordinate values onto 2D position coordinate values. In an embodiment of the disclosure, a ‘projection relationship R, t’ may include information about a relative positional relationship between the cameras, including a rotation matrix R and a translation vector t.
The processor 120 may obtain a projection relationship R, t for projecting 3D joint feature points onto 2D positional coordinate values of second joint feature points by executing the instructions or program code of the camera calibration module 136. In an embodiment of the disclosure, the processor 120 may obtain information about a rotation direction and a movement distance value for projecting the 3D joint feature points to match the 2D position coordinate values of the second joint feature points. The processor 120 may perform camera calibration by predicting a relative positional relationship between the first camera 210 and the second camera 220 based on the projection relationship.
The camera calibration module 136 may provide information about the projection relationship R, t to the 3D pose estimation module 138.
The 3D pose estimation module 138 is composed of instructions or program code related to a function and/or an operation of estimating a 3D pose of the user by reflecting a result of camera calibration. In an embodiment of the disclosure, the 3D pose estimation module 138 may calculate 3D position coordinate values for feature points of the user's joints by using triangulation, and estimate a 3D pose of the user by using the calculated 3D position coordinate values.
The processor 120 may obtain 3D position coordinate values for feature points of the user's joints that reflect the result of camera calibration by executing the instructions or program code of the 3D pose estimation module 138. By using triangulation, the processor 120 may calculate 3D position coordinate values of the user's joints, based on the first joint feature points, the second joint feature points, and the relative positional relationship between the first camera 210 and the second camera 220 obtained by performing the camera calibration. The processor 120 may predict a 3D pose of the user based on the 3D position coordinate values.
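By way of a non-limiting illustration, the triangulation step can be sketched with OpenCV as follows, assuming K1 and K2 are the known intrinsic matrices of the two cameras, (R, t) is the calibration result for the second camera relative to the first, and first_joints and second_joints are the matched 2D joint feature points.

    import cv2
    import numpy as np

    # The first camera is taken as the reference (identity rotation, zero translation);
    # the second camera's projection matrix uses the calibration result (R, t).
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R, t.reshape(3, 1)])

    points_h = cv2.triangulatePoints(P1, P2,
                                     first_joints.T.astype(np.float64),
                                     second_joints.T.astype(np.float64))
    joints_3d_world = (points_h[:3] / points_h[3]).T   # (n, 3) joint positions -> 3D pose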
In an embodiment of the disclosure, the processor 120 may obtain first position coordinate values and second position coordinate values by reprojecting the 3D position coordinate values onto 2D position coordinate values based on camera calibration information, and determine accuracy of the calibration based on the first position coordinate values and the second position coordinate values obtained as a result of the reprojection. In an embodiment of the disclosure, the processor 120 may measure bone lengths between joints from the estimated 3D pose, and determine the accuracy of the calibration based on the measured bone lengths. A specific embodiment in which the processor 120 determines the accuracy of camera calibration is described in detail with reference to
In operation S410, the electronic device 100 obtains a first image of a user captured from the first camera, and obtains a second image of the user captured from the second camera. In an embodiment of the disclosure, the electronic device 100 may respectively receive image data of the first image and image data of the second image from the first camera and the second camera via a wired or wireless communication network. The electronic device 100 may receive the first image and the second image from the first camera 210 and the second camera 220, respectively, by using at least one of data communication methods including, for example, wired LAN, wireless LAN, Wi-Fi, Bluetooth, ZigBee, WFD, IrDA, BLE, NFC, WiBro, WiMAX, SWAP, WiGig, and RF communication.
In operation S420, the electronic device 100 extracts first joint feature points, which are 2D position coordinates of joints of the user, from the first image, and second joint feature points, which are 2D position coordinates of the joints, from the second image. In an embodiment of the disclosure, the electronic device 100 may input the first image to an AI model trained to output a plurality of 2D position coordinate values for human joints, e.g., one or more regions included in a head, a neck, arms, shoulders, a waist, knees, legs, or feet, from a 2D image, and obtain first joint feature points via inference using the AI model. Similarly, the electronic device 100 may input the second image to the AI model, and obtain second joint feature points via inference using the AI model. The AI model may be implemented as a DNN model trained, via supervised learning, by applying a 2D image as input data and 2D feature points of human joints included in the 2D image as ground truth. The DNN model may be, for example, a pose estimation model. Because the ‘pose estimation model’ is the same as described with reference to
In operation S430, the electronic device 100 obtains 3D joint feature points of the joints by lifting the extracted first joint feature points. In an embodiment of the disclosure, the electronic device 100 may input the first joint feature points to an AI model trained, via supervised learning, by applying 2D feature points extracted from an RGB image as input data and 3D position coordinate values corresponding to the 2D feature points as output ground truth, and obtain 3D joint feature points of the joints via inference using the AI model. The AI model may be configured as, for example, a multi-stage CNN model that performs a lifting function and/or operation, but is not limited thereto.
In operation S440, the electronic device 100 obtains a projection relationship for projecting the 3D joint feature points onto the 2D position coordinate values of the second joint feature points. In an embodiment of the disclosure, the electronic device 100 may obtain a projection relationship regarding a rotation direction and a movement distance value for projecting the 3D joint feature points to match the 2D position coordinate values of the second joint feature points. In an embodiment of the disclosure, the ‘projection relationship’ may include information about a relative positional relationship between the cameras, including a rotation matrix R and a translation vector t.
In operation S450, the electronic device 100 performs camera calibration by predicting a relative positional relationship between the first camera and the second camera based on the obtained projection relationship.
Operations S510 to S530 of
Hereinafter, operations of the electronic device 100 are described with reference to
Referring to
Referring to
The processor (120 of
Referring back to
In a case that it is determined that the 3D joint feature points are not suitable to apply to camera calibration, the electronic device 100 displays guide information requesting the user to assume a predetermined pose (operation S530). In an embodiment of the disclosure, a ‘predetermined pose’ may be a pose with a large degree of movement in the z-axis direction. The predetermined pose may be set based on a user input, but is not limited thereto. The predetermined pose may be a pose input in advance as guide information. The predetermined pose may be, for example, a pose with one hand extended forward. In an embodiment of the disclosure, the electronic device 100 may further include a display, and the processor (120 of
After the guide information is displayed, the electronic device 100 obtains a first image and a second image by recapturing an image of the user assuming the predetermined pose according to the guide information (operation S410).
In order to perform camera calibration by using the 3D joint feature points obtained through lifting, the accuracy of the lifted 3D joint feature points is important. In the embodiments illustrated in
Operation S710 illustrated in
Hereinafter, operations of the electronic device 100 are described with reference to
Referring to
In operation S720 of
In operation S730 of
In operation S740 of
In operation S750 of
In order to perform camera calibration by using 3D joint feature points obtained through lifting, the accuracy of the obtained 3D joint feature points is important. According to the embodiments illustrated in
Operations S910 to S960 illustrated in
Hereinafter, the operation in which the electronic device 100 determines the accuracy of camera calibration is described with reference to
Referring to
In operation S920 of
In operation S930 of
In operation S940 of
In a case that the difference value is less than the threshold α (operation S950) as a result of the comparison, the electronic device 100 determines that the camera calibration is accurate and then terminates the process. In an embodiment of the disclosure, when the difference value is less than the threshold α, the processor 120 may determine that the camera calibration was performed accurately.
In a case that the difference value is greater than or equal to the threshold α as a result of the comparison (operation S960), the electronic device 100 determines that the camera calibration is inaccurate and performs re-calibration. Referring to the embodiment illustrated in
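By way of a non-limiting illustration, the reprojection-based accuracy check described above could be sketched as follows; the threshold value is a placeholder, and K1, K2, R, t, joints_3d_world, first_joints, and second_joints are assumed to be available from the earlier sketches.

    import cv2
    import numpy as np

    # Reproject the triangulated 3D joints into both images; the first camera is the
    # reference, so its rotation vector and translation are zero.
    rvec2, _ = cv2.Rodrigues(R)
    proj1, _ = cv2.projectPoints(joints_3d_world, np.zeros(3), np.zeros(3), K1, None)
    proj2, _ = cv2.projectPoints(joints_3d_world, rvec2, t, K2, None)

    err1 = np.linalg.norm(proj1.reshape(-1, 2) - first_joints, axis=1).mean()
    err2 = np.linalg.norm(proj2.reshape(-1, 2) - second_joints, axis=1).mean()

    ALPHA = 5.0   # placeholder pixel threshold; the disclosure does not specify a value
    calibration_accurate = max(err1, err2) < ALPHA   # otherwise re-perform calibration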
Operations S1110 to S1160 illustrated in
In operation S1110, the electronic device 100 obtains a 3D pose of the user based on the first joint feature points, the second joint feature points, and the positional relationship between the first camera and the second camera. In an embodiment of the disclosure, the processor 120 may use triangulation to calculate 3D position coordinate values of the joints, based on the first joint feature points, which are the 2D position coordinate values extracted from the first image, the second joint feature points, which are the 2D position coordinate values extracted from the second image, and the relative positional relationship between the cameras. The electronic device 100 may obtain a 3D pose of the user based on the calculated 3D position coordinate values of the joints.
In operation S1120, the electronic device 100 measures a bone length between joints from the 3D pose. In an embodiment of the disclosure, the processor (120 of
Referring to operation S1120 in conjunction with
Referring back to
In operation S1140 of
In a case that the difference value is less than the threshold β (operation S1150) as a result of the comparison, the electronic device 100 determines that the camera calibration is accurate and then terminates the process. In an embodiment of the disclosure, in a case that the difference value is less than the threshold β, the processor 120 may determine that the camera calibration was performed accurately. Referring to
In a case that the difference value is greater than or equal to the threshold β as a result of the comparison (operation S1160), the electronic device 100 determines that the camera calibration is inaccurate and performs re-calibration.
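By way of a non-limiting illustration, the bone-length check could be sketched as follows. The joint index pairs, the reference lengths, and the threshold below are placeholders introduced for this example; they depend on the pose model actually used and are not values taken from the disclosure.

    import numpy as np

    BONES = [(5, 7), (7, 9), (6, 8), (8, 10),            # upper-arm / forearm pairs
             (11, 13), (13, 15), (12, 14), (14, 16)]     # thigh / shin pairs
    REFERENCE_LENGTH_M = {bone: 0.30 for bone in BONES}  # placeholder average lengths
    BETA = 0.10                                          # placeholder deviation threshold

    def calibration_plausible(joints_3d_world):
        # Compare each measured bone length against a reference length; a large
        # deviation suggests the calibration is inaccurate and should be re-performed.
        for (a, b), ref in REFERENCE_LENGTH_M.items():
            if abs(np.linalg.norm(joints_3d_world[a] - joints_3d_world[b]) - ref) >= BETA:
                return False
        return True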
Referring
In the embodiments illustrated in
Operation S1310 illustrated in
Hereinafter, an operation in which an electronic device 100 distinguishes the plurality of users 1410 and 1420 in the images 1401 and 1402 is described with reference to
Referring to
In operation S1320 of
In operation S1330 of
In operation S1340 of
The electronic device 100 performs camera calibration based on the obtained projection relationship (operation S450).
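By way of a non-limiting illustration, distinguishing the plurality of users by matching the lifted 3D poses from the two views could be sketched as follows; the simple mean-centered distance used here is an assumption (a fuller implementation might also normalize scale and rotation), and first_view_poses and second_view_poses are assumed lists of (n, 3) lifted 3D poses, one per detected person in each image.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def pose_distance(pose_a, pose_b):
        # Compare two (n, 3) 3D poses after removing their mean positions.
        a = pose_a - pose_a.mean(axis=0)
        b = pose_b - pose_b.mean(axis=0)
        return np.linalg.norm(a - b, axis=1).mean()

    def match_users(first_view_poses, second_view_poses):
        # Assign each person detected in the first image to the most similar
        # person detected in the second image (Hungarian assignment).
        cost = np.array([[pose_distance(a, b) for b in second_view_poses]
                         for a in first_view_poses])
        rows, cols = linear_sum_assignment(cost)
        return list(zip(rows.tolist(), cols.tolist()))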
In the embodiments illustrated in
The disclosure provides an electronic device 100 for performing camera calibration. According to an embodiment of the disclosure, the electronic device 100 may include a communication interface 110, memory 130 storing one or more computer programs, and one or more processors 120 communicatively coupled to the communication interface and the memory, wherein the one or more computer programs include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to obtain, via the communication interface 110, a first image of a user captured by a first camera, and a second image of the user captured by a second camera, extract first joint feature points, which are 2D position coordinate values of joints of the user, from the first image, and second joint feature points, which are 2D position coordinate values of the joints, from the second image, obtain 3D joint feature points of the joints by lifting the extracted first joint feature points to 3D position coordinate values, obtain a projection relationship for projecting the 3D joint feature points onto the 2D position coordinate values of the second joint feature points, and to perform camera calibration by predicting a positional relationship between the first camera and the second camera based on the obtained projection relationship.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to obtain information about a rotation direction and a movement distance value for projecting the 3D joint feature points to match the 2D position coordinate values of the second joint feature points.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to determine whether a 3D pose of the user consisting of the 3D joint feature points is suitable to apply to the camera calibration, based on a distribution of coordinate values in a z-axis direction among the 3D position coordinate values included in the 3D joint feature points.
In an embodiment of the disclosure, the electronic device 100 may further include a display, and the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to, in a case that it is determined that the pose of the user is not suitable for the application to the camera calibration, control the display to display guide information requesting the user to assume a predetermined pose.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to obtain a plurality of image frames captured over a certain period of time from the first camera, obtain a plurality of 3D joint feature points by lifting a plurality of first joint feature points extracted from each of the plurality of image frames, identify, among the plurality of first image frames, an image frame with a largest degree of distribution of coordinate values in the z-axis direction among a plurality of 3D position coordinate values included in the plurality of 3D joint feature points, extract the second joint feature points from the second image corresponding to the identified image frame among the plurality of second image frames, and perform the camera calibration based on a projection relationship between 3D joint coordinate values obtained from the identified image frame and the second joint feature points extracted from the second image.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to obtain 3D position coordinate values of the joints of the user, based on the first joint feature points, the second joint feature points, and the positional relationship between the first camera and the second camera, and estimate a 3D pose of the user based on the obtained 3D position coordinate values of the joints.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to obtain first position coordinate values and second position coordinate values by reprojecting the 3D position coordinate values onto 2D position coordinate values based on information about calibration between the first camera and the second camera, calculate a difference value between the first position coordinate values obtained as a result of the reprojection and the first joint feature points and a difference value between the second position coordinate values and the second joint feature points, compare the calculated difference values with a predetermined threshold, and determine accuracy of the camera calibration based on a result of the comparison.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to measure a bone length between joints from the 3D pose, calculate a difference value by comparing the measured bone length with a bone length of a normal person, and determine whether to re-perform the camera calibration based on a result of comparing the calculated difference value with a predetermined threshold.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to extract, from the first image, a plurality of first joint feature points that are 2D position coordinate values of joints of a plurality of users, and extract, from the second image, a plurality of second joint feature points that are 2D position coordinate values of the joints of the plurality of users, obtain a plurality of first 3D joint feature points and a plurality of second 3D joint feature points by lifting the plurality of first joint feature points and the plurality of second joint feature points to 3D position coordinate values, respectively, and distinguish the plurality of users included in the first image and the second image by matching a first 3D pose consisting of the obtained plurality of first 3D joint feature points with a second 3D pose consisting of the plurality of second 3D joint feature points.
In an embodiment of the disclosure, the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors 120 individually or collectively, cause the electronic device to obtain, based on a result of the distinguishing of the plurality of users, a projection relationship for respectively projecting the plurality of first 3D joint feature points onto the plurality of second joint feature points.
The disclosure provides a method, performed by an electronic device 100, for performing camera calibration. The method may include obtaining, by the electronic device, a first image of a user captured from a first camera 210, and a second image of the user captured from a second camera 220 (S410), extracting, by the electronic device, first joint feature points, which are 2D position coordinate values of joints of the user, from the first image, and second joint feature points, which are 2D position coordinate values of the joints, from the second image (S420), obtaining, by the electronic device, 3D joint feature points of the joints by lifting the extracted first joint feature points to 3D position coordinate values (S430), obtaining, by the electronic device, a projection relationship for projecting the 3D joint feature points onto the 2D position coordinate values of the second joint feature points (S440), and performing, by the electronic device, camera calibration by predicting a positional relationship between the first camera 210 and the second camera 220 based on the obtained projection relationship (S450).
In an embodiment of the disclosure, the obtaining of the projection relationship (S440) may include obtaining information about a rotation direction and a movement distance value for projecting the 3D joint feature points to match the 2D position coordinate values of the second joint feature points.
In an embodiment of the disclosure, the method may further include determining whether a 3D pose of the user, consisting of the 3D joint feature points, is suitable for application to the camera calibration, based on a distribution of coordinate values in a z-axis direction among the 3D position coordinate values included in the 3D joint feature points (S510).
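For example, one heuristic consistent with the above, given here only as an assumed sketch, is to require a minimum spread of the joints' z-coordinates so that a nearly planar pose, which poorly constrains the projection relationship, is rejected; the standard-deviation measure and the 0.15 m threshold are assumptions.

```python
import numpy as np

def pose_is_suitable(joints_3d, min_depth_spread_m=0.15):
    """Heuristic for S510: a pose whose joints all lie at nearly the same
    depth is close to planar and is treated as unsuitable for calibration."""
    depth_spread = float(np.std(joints_3d[:, 2]))  # spread along the z-axis
    return depth_spread >= min_depth_spread_m
```

When this check fails, the guide information of operation S530 described below may be displayed to ask the user to assume a different pose.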
In an embodiment of the disclosure, the method may further include displaying guide information requesting the user to assume a predetermined pose in a case where it is determined that the 3D pose of the user is not suitable for application to the camera calibration (S530).
In an embodiment of the disclosure, the method may further include obtaining 3D position coordinate values of the joints of the user, based on the first joint feature points, the second joint feature points, and the positional relationship between the first camera 210 and the second camera 220, and estimating a 3D pose of the user based on the obtained 3D position coordinate values of the joints.
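As a non-limiting sketch, once the rotation R and translation t between the cameras have been estimated, the 3D joint positions could be recovered by triangulation, for example with OpenCV; treating the first camera as the reference frame and assuming the intrinsic matrices K1 and K2 are known are choices made only for this illustration.

```python
import cv2
import numpy as np

def estimate_3d_pose(joints1_2d, joints2_2d, K1, K2, R, t):
    """Triangulate 3D joint positions from the two views using the estimated
    positional relationship (R, t) between the first and second cameras."""
    # Projection matrices; the first camera defines the reference frame.
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R, t.reshape(3, 1)])

    # OpenCV expects 2 x J arrays of image points and returns 4 x J
    # homogeneous coordinates.
    pts_h = cv2.triangulatePoints(P1, P2,
                                  joints1_2d.T.astype(np.float64),
                                  joints2_2d.T.astype(np.float64))
    return (pts_h[:3] / pts_h[3]).T  # (J, 3) Euclidean joint coordinates
```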
In an embodiment of the disclosure, the method may further include obtaining first position coordinate values and second position coordinate values by reprojecting the 3D position coordinate values onto 2D position coordinate values based on information about calibration between the first camera and the second camera (S920), calculating a difference value between the first position coordinate values obtained as a result of the reprojection and the first joint feature points and a difference value between the second position coordinate values and the second joint feature points (S930), comparing the calculated difference values with a predetermined threshold (S940), and determining accuracy of the camera calibration based on a result of the comparing.
In an embodiment of the disclosure, the method may further include measuring a bone length between joints from the 3D pose (S1120), calculating a difference value by comparing the measured bone length with a bone length of a normal person (S1130), and determining whether to re-perform the camera calibration based on a result of comparing the calculated difference value with a predetermined threshold.
In an embodiment of the disclosure, the extracting of the first joint feature points and the second joint feature points (S420) may include extracting, from the first image, a plurality of first joint feature points that are 2D position coordinate values of joints of a plurality of users, and extracting, from the second image, a plurality of second joint feature points that are 2D position coordinate values of the joints of the plurality of users (S1310), obtaining a plurality of first 3D joint feature points and a plurality of second 3D joint feature points by lifting the plurality of first joint feature points and the plurality of second joint feature points to 3D position coordinate values, respectively (S1320), and distinguishing the plurality of users included in the first image and the second image by matching a first 3D pose consisting of the obtained plurality of first 3D joint feature points with a second 3D pose consisting of the plurality of second 3D joint feature points (S1330).
In an embodiment of the disclosure, the obtaining of the projection relationship may include obtaining, based on a result of the distinguishing of the plurality of users, the projection relationship for respectively projecting the plurality of first 3D joint feature points onto the plurality of second joint feature points (S1340).
The disclosure provides one or more non-transitory computer-readable storage media storing one or more computer programs including computer-executable instructions that, when executed by one or more processors of an electronic device individually or collectively, cause the electronic device to perform operations. The operations include obtaining, by the electronic device, a first image of a user captured by a first camera 210 and a second image of the user captured by a second camera 220, extracting, by the electronic device, first joint feature points, which are 2D position coordinate values of joints of the user, from the first image, and second joint feature points, which are 2D position coordinate values of the joints, from the second image, obtaining, by the electronic device, 3D joint feature points of the joints by lifting the extracted first joint feature points to 3D position coordinate values, obtaining, by the electronic device, a projection relationship for projecting the 3D joint feature points onto the 2D position coordinate values of the second joint feature points, and performing, by the electronic device, camera calibration by predicting a positional relationship between the first camera 210 and the second camera 220 based on the obtained projection relationship.
A program executed by the electronic device 100 described in this specification may be implemented as a hardware component, a software component, and/or a combination of the hardware component and the software component. The program may be executed by any system capable of executing computer-readable instructions.
Software may include a computer program, a piece of code, an instruction, or a combination of one or more thereof, and may independently or collectively instruct or configure a processing device to operate as desired.
The software may be implemented as a computer program including instructions stored in computer-readable recording media. Examples of the computer-readable recording media include magnetic storage media (e.g., ROM, RAM, floppy disks, hard disks, etc.) and optical recording media (e.g., a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)). The computer-readable recording media may be distributed over computer systems connected through a network so that computer-readable code may be stored and executed in a distributed manner. The media may be read by a computer, stored in memory, and executed by a processor.
A computer-readable storage medium may be provided in the form of a non-transitory storage medium. In this regard, the term ‘non-transitory’ only means that the storage medium does not include a signal and is a tangible device, and the term does not differentiate between where data is semi-permanently stored in the storage medium and where the data is temporarily stored in the storage medium. For example, the ‘non-transitory storage medium’ may include a buffer in which data is temporarily stored.
Furthermore, programs according to embodiments disclosed in the specification may be included in a computer program product when provided. The computer program product may be traded, as a product, between a seller and a buyer.
The computer program product may include a software program and a computer-readable storage medium having stored thereon the software program. For example, the computer program product may include a product (e.g., a downloadable application) in the form of a software program electronically distributed by a manufacturer of the electronic device 100 or through an electronic market (e.g., Samsung Galaxy Store™). For such electronic distribution, at least a part of the software program may be stored in the storage medium or may be temporarily generated. In this case, the storage medium may be a storage medium of a server of a manufacturer of the electronic device 100, a server of the electronic market, or a relay server for temporarily storing the software program.
In a system including the electronic device 100 and/or a server, the computer program product may include a storage medium of the server or a storage medium of the electronic device 100. Alternatively, in a case where there is a third device (e.g., a wearable device) communicatively connected to the electronic device 100, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include a software program itself that is transmitted from the electronic device 100 to the third device or that is transmitted from the third device to the electronic device.
In this case, one of the electronic device 100 and the third device may execute the computer program product to perform methods according to embodiments of the disclosure. Alternatively, at least one of the electronic device 100 or the third device may execute the computer program product to perform the methods according to the embodiments of the disclosure in a distributed manner.
For example, the electronic device 100 may execute the computer program product stored in the memory 130 to perform the methods according to the disclosed embodiments.
In another example, the third device may execute the computer program product to control an electronic device communicatively connected to the third device to perform the methods according to the disclosed embodiments.
In a case where the third device executes the computer program product, the third device may download the computer program product from the electronic device 100 and execute the downloaded computer program product. Alternatively, the third device may execute the computer program product that is pre-loaded therein to perform the methods according to the disclosed embodiments.
For example, adequate effects may be achieved even when the above-described techniques are performed in a different order than that described, and/or when the aforementioned components, such as computer systems or modules, are coupled or combined in forms different from those described, or are replaced or supplemented by other components or their equivalents.
While the disclosure has been shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.
Number | Date | Country | Kind
---|---|---|---
10-2022-0114495 | Sep 2022 | KR | national
10-2022-0159491 | Nov 2022 | KR | national
This application is a continuation application, claiming priority under 35 U.S.C. § 365 (c), of an International application No. PCT/KR2023/011608, filed on Aug. 7, 2023, which is based on and claims the benefit of a Korean patent application number 10-2022-0114495, filed on Sep. 8, 2022, in the Korean Intellectual Property Office, and of a Korean patent application number 10-2022-0159491, filed on Nov. 24, 2022, in the Korean Intellectual Property Office, the disclosure of each of which is incorporated by reference herein in its entirety.
 | Number | Date | Country
---|---|---|---
Parent | PCT/KR2023/011608 | Aug 2023 | WO
Child | 19071152 | | US