The present disclosure generally relates to the field of video encoding and decoding and, in particular, to encoding and decoding video and other data for machines.
Recent trends in robotics, surveillance, monitoring, the Internet of Things, etc. have introduced use cases in which a significant portion of all the images and videos that are recorded in the field is consumed by machines only, without ever reaching human eyes. Those machines process images and videos with the goal of completing specific tasks such as object detection, object tracking, segmentation, event detection, etc. Recognizing that this trend is prevalent and will only accelerate in the future, international standardization bodies have established efforts to standardize image and video coding that is primarily optimized for machine consumption. For example, efforts such as JPEG AI and Video Coding for Machines are ongoing, in addition to already established standards such as Compact Descriptors for Visual Search and Compact Descriptors for Video Analytics. Solutions that improve efficiency compared to classical image and video coding techniques are needed and are presented herein.
In one embodiment, a video encoder for encoding data for machine consumption is provided. The video encoder includes a region detector selection module receiving source video and detector selection parameters and selecting an object detector model. A region detection module applies the selected model to the source video to identify regions of interest in the source video. A region extractor module extracts the pixels for the identified regions from the source video. A region packing module receives the extracted regions from the source video and packs those regions into a packed frame in which pixels outside the regions of interest are omitted. A region parameter module receives the identified regions from the region extractor and provides parameters for placing the regions of interest in a reconstructed video frame. A video encoder receives the packed frame from the region packing module and region parameters from the region parameter module and generates an encoded bitstream.
In some embodiments, the region detector selection module selects one of a plurality of models based on detector selection parameters from a machine task system. The detector selection parameters from the machine task system may be updated based on the performance of the machine task system on the encoded bitstream.
In certain embodiments, the detector models may include at least one of a RetinaNet model and a Yolov7 model.
The region detection module may define each detected region at least in part by a rectangular bounding box. In some embodiments, the encoder may include a region padding module which adds a padding parameter to one or more dimensions of a bounding box of a detected region. Each detected region may have an associated object type, and the padding parameter may be determined at least in part based on that object type. Alternatively or additionally, the padding parameter may be determined based at least in part on the region size and/or bounding box size.
In another embodiment, the encoder may include a merge split region extractor module which further processes detected regions and performs at least one of selectively merging regions with substantial overlap and selectively splitting regions to optimize packing performance. The merge split region extractor module can receive adaptive extraction parameters from a machine task system and dynamically adjust merge and split parameters based on those parameters.
In certain embodiments the encoder may include both a region padding module and a merge split region extractor module.
A method of encoding video data for consumption by machine processing is provided which includes the steps of receiving source video; identifying at least one region of interest in the source video, each region of interest defined by an associated bounding box; extracting identified content of the regions of interest within the associated bounding box from the source video; packing the extracted regions into a packed video frame in which pixels outside the regions of interest are omitted; providing region parameters for the bounding boxes sufficient to reconstruct the regions of interest in a reconstructed video frame; and generating an encoded bitstream including the packed frame and associated region parameters.
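By way of non-limiting illustration, the following Python sketch shows one simplified way the extraction, packing, and region-parameter steps of the method could be realized. The helper names, the naive side-by-side packing layout, and the parameter dictionary format are assumptions made only for this sketch and do not represent any normative syntax.

```python
# Illustrative sketch only: detect regions of interest, extract their pixels,
# pack them into a single frame, and emit region parameters. The packing layout
# (regions placed side by side) and the parameter format are assumptions.
from dataclasses import dataclass
from typing import Callable, List, Tuple
import numpy as np

@dataclass
class Region:
    x: int               # bounding box position in the source frame
    y: int
    w: int
    h: int
    pixels: np.ndarray   # extracted pixel block of shape (h, w, 3)

def encode_frame_for_machines(frame: np.ndarray,
                              detect_rois: Callable) -> Tuple[np.ndarray, List[dict]]:
    """detect_rois(frame) returns a list of (x, y, w, h) boxes; any detector may be used."""
    regions = [Region(x, y, w, h, frame[y:y + h, x:x + w].copy())
               for (x, y, w, h) in detect_rois(frame)]
    # Pack extracted regions into one frame; pixels outside the ROIs are omitted.
    packed_h = max((r.h for r in regions), default=1)
    packed_w = max(sum(r.w for r in regions), 1)
    packed = np.zeros((packed_h, packed_w, 3), dtype=frame.dtype)
    params, cursor = [], 0
    for r in regions:
        packed[:r.h, cursor:cursor + r.w] = r.pixels
        # Region parameters: where the region sits in the packed frame and where
        # it must be restored in the reconstructed frame.
        params.append({"packed_x": cursor, "packed_y": 0, "w": r.w, "h": r.h,
                       "orig_x": r.x, "orig_y": r.y})
        cursor += r.w
    return packed, params  # both are then passed to the video encoder
```

In this sketch, the packed frame and the associated region parameters would be provided to a conventional video encoder to generate the encoded bitstream.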
In some cases the method may further include, for at least one region of interest, applying region padding to at least one dimension of the associated bounding box. The method may further include merge split processing comprising at least one of selectively merging regions of interest with substantial overlap and selectively splitting regions to optimize packing performance. A region of interest may have an associated object type, and the region padding may be determined at least in part based on the object type. In some embodiments, a region of interest has an associated bounding box size and the region padding is determined based at least in part on the bounding box size.
In some embodiments, the method may include receiving performance data from a machine system at a decoder site that receives the encoded bitstream, wherein the region padding is determined at least in part based on the received performance data.
The present disclosure also includes a video decoder comprising circuitry configured to receive and decode an encoded bitstream generated by the above-described encoders and encoding methods. The present disclosure further describes embodiments of computer-readable media on which an encoded bitstream is stored, the encoded bitstream being generated by any of the encoders and encoding methods described herein.
These and other aspects and features of non-limiting embodiments of the present invention will become apparent to those skilled in the art upon review of the following description of specific non-limiting embodiments in conjunction with the accompanying drawings.
For the purpose of illustrating the invention, the drawings show aspects of one or more embodiments of the invention. However, it should be understood that the present invention is not limited to the precise arrangements and instrumentalities shown in the drawings, wherein:
The drawings are not necessarily to scale and may be illustrated by phantom lines, diagrammatic representations and fragmentary views. In certain instances, details that are not necessary for an understanding of the embodiments or that render other details difficult to perceive may have been omitted.
Encoder 105 may include, without limitation, an inference with region extractor 110, a region transformer and packer 115, a packed picture converter and shifter 120, and/or an adaptive video encoder 125.
The Packed Picture Converter and Shifter 120 processes the packed image so that further redundant information may be removed before encoding. Examples of conversion are conversions of color space (e.g., converting from RGB to grayscale), quantization of the pixel values (e.g., reducing the range of represented pixel values and thus reducing the contrast), and other conversions that remove redundancy in the sense of the machine model. Shifting entails reducing the range of represented pixel values by a direct right-shift operation (e.g., right-shifting the pixel values by 1 is equivalent to dividing all values by 2). Both the conversion and shifting processes are reversed on the decoder side by block 140, using the inverse of the mathematical operations used in block 120.
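As a concrete, non-normative illustration of the conversion and shifting operations just described, a minimal sketch follows; the color-transform coefficients and shift amount are examples only, and converter/shifter 120 may use other operations.

```python
import numpy as np

def convert_and_shift(rgb: np.ndarray, shift: int = 1) -> np.ndarray:
    """Example pre-encoding reduction: convert RGB to grayscale, then right-shift.
    Right-shifting by 1 halves every pixel value, reducing the represented range."""
    gray = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
            + 0.114 * rgb[..., 2]).astype(np.uint8)
    return gray >> shift

def inverse_shift(gray_shifted: np.ndarray, shift: int = 1) -> np.ndarray:
    """Decoder-side inverse (cf. block 140): left-shift to approximately restore
    the value range. The discarded low-order bits and the grayscale conversion
    itself are not recovered; only the range reduction is reversed."""
    return (gray_shifted.astype(np.uint16) << shift).astype(np.uint8)
```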
Given a frame of a video or an image, effective compression of such media can be achieved by detecting and extracting its important regions and packing them into a single frame. Simultaneously, the system discards any detected regions that are not of interest. These packed frames serve as input to an encoder to produce a compressed bitstream. The produced bitstream 155 contains the encoded packed regions along with parameters needed to reconstruct and reposition each region in the decoded frame. A machine task system 150 can perform tasks such as designated computer vision related functions on the reconstructed video frames.
Such a video compression system can be improved by enabling adaptive selection of region detection methods. Adaptively selecting which encoder-side region detection system to use is beneficial in supporting the endpoint target machine task.
The decoder 236 includes a video decoder 240, which receives the compressed bitstream 232, a region unpacking module 244, and a region parameter module 248, which together generate unpacked reconstructed video frames 252 for the machine task system 256.
Significant frame or image regions are identified using the detection system 212, which produces coordinates of discovered objects. The resulting coordinates are used to determine regions for packing and enable the identification of pixels deemed unimportant by the detection module. Such unimportant regions may be discarded and need not be used in packing.
Improvements to the region detection module 212 within the compression pipeline aim to better support target machine task performance. The adaptive selection methodology described herein provides that the encoder-side detection algorithm may be chosen based on specified characteristics of the endpoint evaluation network. For example, a neural network can be chosen that matches a neural network with similar characteristics used by the machine, e.g., a convolutional neural network with a similar number of layers and similar input and output dimensions. In some examples an identical algorithm can be used. If the information about the detection algorithm that the machine 150 uses is not available, does not contain a detailed description, or the algorithm itself is not available for implementation on the encoder side, a similar algorithm can be used. In some cases, a similar but more recent algorithm can be used as a substitute on the encoder side to allow faster operation.
It will be appreciated by those skilled in the art that, in applying the proposed methods, one particular detection network may be preferred over others based on the particular target machine task.
The selected detection method, along with information that characterizes the detection method, may be included in the output bitstream 232. Such information can be signaled in the bitstream header or provided as supplemental information in the bitstream. In some embodiments, this can be included in the sequence parameter set (“SPS”) data that usually remains unchanged for a sequence of frames, or included in picture parameter set (“PPS”) data that can change from frame to frame. Detection method information may include the detection method used, the version number of the detection method, the training data used by the detection method, performance parameters such as the minimum and maximum detection confidence for detections in a frame, and the object classes detected in a frame. The detection confidence for each object class may also be included in the bitstream. Other parameters that characterize detection performance can be determined and included. At the decoder, the model parameters extracted from the bitstream may be used to select or adapt the machine/algorithm used for the machine task.
The following is a description of exemplary object detection semantics that can be encoded in the bitstream:
object_detector_ID—ID of the object detector. This ID may be from a known detector registration authority or may be configured and agreed upon between the encoding and decoding systems.
object_detector_version—version of the object detector. The object detector version may be used to identify how a specific detector is trained. Additional information, such as the number of classes the detector can handle, can be obtained based on the version number. This bitstream field can be extended to include a list of classes the detector can detect.
object_detector_name_length—number of bytes used for the object detector name.
object_detector_name [object_detector_name_length]—name of the object detector. This is usually a displayable string.
object_classes_detected—number of object classes detected in this frame.
min_detection_confidence—confidence described as a number between 0 and 100, where 100 is 100% confidence and 0 is 0% confidence.
max_detection_confidence—confidence described as a number between 0 and 100, where 100 is 100% confidence and 0 is 0% confidence.
object_class_name_length—number of bytes used for the object class name.
object_detector_information_present—a one-bit field which, when set to 1, signals the presence of object_detector_information in the sequence parameter set (“SPS”).
object_detector_information_present—a one-bit field which, when set to 1, signals the presence of object_detector_information in the picture parameter set (“PPS”).
Similarly, such object detector information may be extended or reduced and signaled in other places in a video bitstream, such as the slice header. In some cases, such object detection information can be signaled as supplemental enhancement information (SEI) data that may be associated with the frame and signaled in a separate information packet that is not included in the video bitstream.
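Purely by way of illustration, and not as any normative syntax, the semantics listed above could be serialized into a supplemental information payload along the following lines; the field widths, byte order, and field ordering in this sketch are assumptions.

```python
import struct

def pack_object_detector_info(detector_id: int, version: int, name: str,
                              classes_detected: int,
                              min_conf: int, max_conf: int) -> bytes:
    """Illustrative serialization of the object detector semantics listed above.
    Confidences are integers in [0, 100]; the detector name is length-prefixed."""
    name_bytes = name.encode("utf-8")
    payload = struct.pack(">HHB", detector_id, version, len(name_bytes))
    payload += name_bytes
    payload += struct.pack(">HBB", classes_detected, min_conf, max_conf)
    return payload

# Example: describe a hypothetical detector configuration.
sei_payload = pack_object_detector_info(
    detector_id=7, version=2, name="retinanet-r50",
    classes_detected=3, min_conf=35, max_conf=98)
```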
The selected method for object detection, along with any specification information, serves as input to the region detector module 212. The signaled method from machine system 256, along with any selection parameters 264, is used to perform inference and identify regions in module 212. Additional region extraction methodologies may be paired with the proposed adaptive system. These steps may be used to isolate the best possible combinations of region detection predictions, derived from the specified characteristics. That is, based on the detection network chosen, different thresholds for box selection may be applied according to the characteristics of the network. This includes thresholding of box confidence to keep high-, low-, or all-confidence predictions, or additionally using class-based methods to select regions based on their inferred categories from the network.
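For example, a simple confidence-threshold and class-based filter over a detector's raw predictions might look like the sketch below; the threshold value, class names, and prediction format are illustrative assumptions only.

```python
def select_boxes(predictions, conf_threshold=0.5, keep_classes=None):
    """predictions: iterable of dicts with 'box', 'score', and 'class_name'.
    Keep boxes whose confidence meets the threshold and, optionally, whose
    inferred class is in keep_classes."""
    selected = []
    for p in predictions:
        if p["score"] < conf_threshold:
            continue
        if keep_classes is not None and p["class_name"] not in keep_classes:
            continue
        selected.append(p["box"])
    return selected

# Example: keep only confident person/vehicle detections for a surveillance task.
raw_predictions = [
    {"box": (10, 20, 64, 128), "score": 0.91, "class_name": "person"},
    {"box": (200, 40, 80, 60), "score": 0.32, "class_name": "dog"},
]
boxes = select_boxes(raw_predictions, conf_threshold=0.6,
                     keep_classes={"person", "car", "truck"})
```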
Table 1 shows the difference in performance between the previously mentioned RetinaNet and Yolov7 detection networks. From the table, it may be concluded that such a region packing system may be influenced by the inference predictions derived from detection module 212. Thus, the inclusion of selection module 260 benefits the system, as it allows performance to be improved through adaptive selection of detection methods.
The resulting coordinates from region detection system 212 serve as input to region extraction system 216. The extraction module 216 extracts the significant image regions and prepares the coordinates for the packing module 220. The packing module 220 receives the extracted regions and packs them tightly into a single frame. The module additionally outputs packing parameters that will be signaled later in bitstream 232.
Packed object frames are processed through video encoder 228 to produce a compressed bitstream 232. The compressed bitstream includes the encoded packed regions along with parameters 224 needed to reconstruct and reposition each region in the decoded frame. Video encoder 228 can take the form of any advanced video encoder known for use with encoding standards such as HEVC, AV1, and VVC, or variations on such known encoders. Optionally, any detection thresholds applied, along with the network selected by the detection module, may be signaled in bitstream 232 for use in decoder-side reconstruction at decoder 236.
The compressed bitstream 232 is decoded using video decoder 240 to produce a packed region frame along with its signaled region information. Video decoder 240 will generally take a form that is complementary to the selected video encoder 228 and can be any advanced video decoder known for use with conventional codec standards such as HEVC, AV1, and VVC, or variations on such known standards. The signaled region information includes parameters needed for reconstruction of the frame and may incorporate the signaled detection thresholds and methods used for each of the regions applied in the encoder 208.
Region parameter module 248 provides the decoded region parameters to unpack the region frames via region unpacking module 244. During region unpacking each box is returned to its position within the context of the original video frame. The resulting unpacked frame only includes the significant regions determined by region detection system 212 and does not include the discarded pixels. These unpacked regions include the predictions made by the adaptively selected detection network in encoder 208.
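A minimal sketch of the unpacking step is shown below, assuming region parameters of the same illustrative form used in the earlier encoder sketch; the actual decoder 240 and unpacking module 244 operate on whatever syntax the bitstream defines.

```python
import numpy as np

def unpack_regions(packed: np.ndarray, params: list,
                   frame_h: int, frame_w: int) -> np.ndarray:
    """Place each packed region back at its original position. Pixels that were
    discarded at the encoder remain zero, since they carried no machine-relevant
    content."""
    frame = np.zeros((frame_h, frame_w, 3), dtype=packed.dtype)
    for p in params:
        block = packed[p["packed_y"]:p["packed_y"] + p["h"],
                       p["packed_x"]:p["packed_x"] + p["w"]]
        frame[p["orig_y"]:p["orig_y"] + p["h"],
              p["orig_x"]:p["orig_x"] + p["w"]] = block
    return frame
```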
The unpacked and reconstructed video frame 252 is used as input to machine task system 256, which performs a specified machine task such as computer vision related functions. Machine task performance on the regions selected by selection module 260 may be analyzed to determine optimal box selection methods and inference thresholds on a case-by-case basis. Optimized region detection parameters 264 may be altered and signaled to the encoder-side pipeline in order to more effectively select region detection methods.
Significant frame or image regions are identified using the encoder-side region detector module 412, which produces coordinates of discovered objects, typically in the form of rectangular bounding boxes around the detected objects. Saliency-based detection methods using video motion may also be employed to identify important regions. For example, uniform motion that is detected across consecutive frames can be designated as salient. In another example, any motion that persists over a long period of time (for example, 100 frames) in a continuous trajectory can be designated as salient. In another example, motion that is detected at the same coordinates at which objects are detected can be designated as salient. Spatial coordinates of salient regions can be used to determine regions for packing and enable the identification of pixels deemed unimportant by the detection module. Such unimportant regions may be discarded and not used in packing.
The discovered objects and regions from region detector 412 may be additionally processed prior to the extraction and packing stage in order to enable more efficient compression and/or endpoint machine task performance. Detected object boundaries may be extended using region padding module 460. Padding is the expansion of the bounding box beyond the minimum area detected by one or more pixels in one or more directions or dimensions. Padding may be applied uniformly around a bounding box, e.g., the same number of pixels in each dimension, or dynamically where the padding varies in different dimensions.
Padding size may be determined based on internal decisions made by the module, either with or without receiving adaptive padding parameters 464. This can include applying padding based on object class, object size, and/or object confidence. For example, the decision to enable padding and the amount of padding can be calculated using an optimized search in the inference space, which may compare the detection accuracy for boxes with and without padding, and with various amounts of padding applied. It will be appreciated that not all object classes and instances need to be evaluated. Representative samples of classes with similar characteristics, such as size, orientation, and color, can be used to assign a padding decision to all the objects represented by the exemplary object.
Application of the padding module 460 can provide better context for post-compression machine task evaluation by the machine system at the decoder site. Each endpoint machine system may have varying sensitivities to background pixel information; thus, the extension of initially predicted regions can help to increase evaluation accuracy. The object padding described herein is preferably performed by expanding each dimension with respect to overall image boundaries in order to include additional pixels for context. Such additional pixels are ones which reside outside of the original coordinates output by the region detection module.
Prediction box extension may be performed using a fixed padding amount or can be adaptively determined on a box-by-box basis. Adaptive padding can be performed using the characteristics of the detected object, including object class, inference confidence/score, and/or object size. Additionally, padding may be skipped based on the type of region detection method selected, or on a similar box-by-box basis. Padding size may also be determined using supplementary adaptive padding parameters 464 provided as input based on machine task feedback.
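One simplified way such adaptive, boundary-aware padding could be computed is sketched below; the per-class padding amounts and the default value are illustrative assumptions, not values taken from this disclosure.

```python
def pad_box(x, y, w, h, img_w, img_h, object_class=None,
            pad_by_class=None, default_pad=8):
    """Expand a bounding box by a padding amount chosen per object class
    (the class-to-pixels table is an illustrative assumption), clamped to the
    image boundaries so that only valid context pixels are added."""
    pad = (pad_by_class or {}).get(object_class, default_pad)
    x0, y0 = max(0, x - pad), max(0, y - pad)
    x1, y1 = min(img_w, x + w + pad), min(img_h, y + h + pad)
    return x0, y0, x1 - x0, y1 - y0

# Example: pad small person boxes more aggressively than large vehicle boxes.
padded = pad_box(120, 48, 30, 30, img_w=1920, img_h=1080,
                 object_class="person", pad_by_class={"person": 16, "car": 4})
```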
The resulting extended/padded coordinates from padding module 460 serve as input to region extraction module 416. The region extraction module 416 extracts the image regions and prepares the coordinates for the packing module 420. The region packing module 420 takes the extracted regions and packs them tightly into a single frame. The packing module 420 additionally outputs packing parameters that will be signaled later in bitstream 432.
Packed object frames are processed through video encoder 428 to produce a compressed bitstream. The video encoder 428 can take the form of any advanced video encoder known for use with standards such as HEVC, AV1, and VVC, or variations on such known standards adapted for machine use. The compressed bitstream includes the encoded packed regions along with parameters 424 needed to reconstruct and reposition each region in the decoded frame. Optionally, the padding size used for each of the boxes may be signaled in the bitstream for use in decoder-side reconstruction and for data collection. Such signaling may include signaling within a header, an SPS, a PPS, or auxiliary signaling such as supplemental enhancement information (SEI).
While the various functional modules in encoder 408 have been described as distinct functional modules, it will be appreciated that these functional modules can be further divided into sub-modules or functionality combined without departing from the intent of the embodiments described herein.
The structure and operation of a decoder for the bitstream with region padding are substantially the same as illustrated and described above.
The unpacked and reconstructed video frame 252 is used as input to machine task system 256, which may perform machine tasks such as computer vision related functions. Machine task performance on the padded regions may be analyzed and used to determine optimal padding amounts on a box-by-box or object-by-object basis. Optimized padding parameters 464 may be updated and signaled to the encoder-side pipeline in order to effectively extend object boundaries.
The merge split region extraction module 656 receives object coordinates from region detection system 612. These coordinates may contain multiple predictions within the same region and/or may consist of overlapping regions with redundant pixels. The merge/split extraction module 656 creates new region boxes based on the given predictions, as described further below.
The merge split region extractor module 756 examines which regions are close in proximity and identifies them as candidates for further processing. The decision to merge and the decision to split are made primarily based on rate-saving considerations, and secondarily on the expected detection accuracy on the machine. By merging region predictions, more compact and continuous spatial structures may be obtained, which may be more amenable to predictive hybrid video and image coding. By splitting region predictions, smaller geometric structures, such as smaller rectangles, are obtained, which can potentially be packed in a spatially more optimal way. For example, in some cases, the machine detection performance can be improved if the bits that are saved by improved object packing are spent on more accurate texture representation (for example, preserving more of the high-frequency components).
Different criteria may be used to determine appropriate scenarios for merging and splitting actions. For example, inference prediction boxes that overlap beyond a determined threshold may be merged to form a single new region box. Here, the previous inference boxes may be discarded and replaced with the unified new box. Splitting may be performed on boxes whose overlap falls below such a threshold. In this case, the split boxes are preserved while the original inference boxes may be discarded.
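As one concrete, non-limiting realization of such criteria, overlapping boxes can be compared by their intersection-over-union and either merged or marked for splitting, as in the sketch below; the threshold values are illustrative only, and a full implementation would also weigh the rate-saving considerations described above.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax0, ay0, ax1, ay1 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx0, by0, bx1, by1 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax1, bx1) - max(ax0, bx0))
    iy = max(0, min(ay1, by1) - max(ay0, by0))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def merge_boxes(a, b):
    """Replace two boxes with the bounding box of their union."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)

def merge_or_split(a, b, merge_threshold=0.5, split_threshold=0.1):
    """Merge heavily overlapping boxes, mark lightly overlapping pairs as
    candidates for horizontal/vertical splitting, and otherwise keep both."""
    overlap = iou(a, b)
    if overlap >= merge_threshold:
        return ("merge", [merge_boxes(a, b)])
    if overlap >= split_threshold:
        return ("split", [a, b])
    return ("keep", [a, b])
```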
One or more region detections can be grouped into a local cluster. Machine learning methods may also be used to determine the optimal number of clusters based on given inference parameters and image characteristics. For example, each region detection can be designated as a single instance in the k-means clustering algorithm; a cost function that minimizes the bit budget to encode the frames, and/or a cost function that maximizes detection accuracy in the inference model, can be used as the objective function of the k-means algorithm.
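A simplified illustration of such clustering is sketched below using the standard distance-based k-means objective from scikit-learn; in the approach described above, the rate- or accuracy-based cost functions would replace or augment this default objective, and the number of clusters would itself be optimized rather than fixed.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_detections(boxes, n_clusters=3):
    """Group detected boxes (x, y, w, h) into local clusters by their centers.
    n_clusters is fixed here only for illustration."""
    centers = np.array([[x + w / 2.0, y + h / 2.0] for (x, y, w, h) in boxes])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(centers)
    clusters = {}
    for label, box in zip(labels, boxes):
        clusters.setdefault(int(label), []).append(box)
    return clusters
```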
Clusters of boxes may be merged to form regions which contain multiple inference objects. Additionally, a variety of splitting methods may be used on a case-by-case basis. This includes horizontal splitting, vertical splitting, and/or some combination of both. For instance, overlapping inference boxes which contain vertically oriented objects may be split vertically, whereas horizontally oriented objects may be split horizontally.
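Under one reading of the horizontal/vertical splitting described above (a vertical split produces side-by-side halves, a horizontal split produces stacked halves), a minimal splitting helper might look like the following; the orientation convention is an assumption of this sketch.

```python
def split_box(box, orientation="vertical"):
    """Split a box (x, y, w, h) into two halves along the chosen orientation."""
    x, y, w, h = box
    if orientation == "vertical":
        # Cut with a vertical line: left and right halves.
        return [(x, y, w // 2, h), (x + w // 2, y, w - w // 2, h)]
    # Cut with a horizontal line: top and bottom halves.
    return [(x, y, w, h // 2), (x, y + h // 2, w, h - h // 2)]
```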
Merging inference boxes may be beneficial to the encoding and machine task processes. Merged boxes may help to reduce the number of individual boxes sent to the packing system and consequently enable improved spatial preservation of image objects. Alternatively, splitting overlapping boxes helps to reduce occurrences of duplicate pixels propagated throughout the pipeline. The region extraction step performed by merge split region extractor 656 may include additional pixels outside of what was determined to be significant by the region detection module 612. It may additionally change which pixels are to be discarded. The system outputs new region coordinates 720.
The newly identified region box coordinates 720, returned from the merge split extraction module 656, serve as input to the region packing system 616. The region packing module 616 extracts the significant image regions and packs them tightly into a single frame. The region packing module produces packing parameters that will be signaled later in the encoded bitstream 628.
Packed object frames which contain the processed regions from module 656 are input to video encoder 624, which produces a compressed bitstream 628. The compressed bitstream includes the encoded packed regions along with parameters 620 needed to reconstruct and reposition each region in the decoded frame. Optionally, the original region coordinates (i.e., those derived from inferences from region detector 612, prior to merge split region extraction 656) may be signaled in the bitstream for decoder-side usage.
The compressed bitstream 628 is decoded using decoder 236, substantially as described above.
The decoded region parameters 248 are used by region unpacking module 244 to unpack the packed region frames. Each box is returned to its position within the context of the original video frame. The resulting unpacked frame includes only the significant regions determined by the region detection system 612 after processing by merge split extractor module 656, and preferably does not include the discarded pixels.
The unpacked and reconstructed video frame 252 is used as input to machine task system 256, which may perform machine tasks such as computer vision related functions. Machine task performance on the regions determined by the merge split extraction module 656 may be analyzed to determine optimal extraction actions. Optimized region extraction parameters 660 may be updated and signaled to the encoder-side pipeline in order to effectively merge and split the inference boxes to identify regions.
Some embodiments may include non-transitory computer program products (i.e., physically embodied computer program products) that store instructions, which when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform operations described herein.
Embodiments may include circuitry configured to implement any operations as described above in any embodiment, in any order and with any degree of repetition. For instance, modules, such as encoder or decoder, may be configured to perform a single step or sequence repeatedly until a desired or commanded outcome is achieved; repetition of a step or a sequence of steps may be performed iteratively and/or recursively using outputs of previous repetitions as inputs to subsequent repetitions, aggregating inputs and/or outputs of repetitions to produce an aggregate result, reduction or decrement of one or more variables such as global variables, and/or division of a larger processing task into a set of iteratively addressed smaller processing tasks. Encoders and decoders described herein may perform any step or sequence of steps as described in this disclosure in parallel, such as simultaneously and/or substantially simultaneously performing a step two or more times using two or more parallel threads, processor cores, or the like; division of tasks between parallel threads and/or processes may be performed according to any protocol suitable for division of tasks between iterations. Persons skilled in the art, upon reviewing the entirety of this disclosure, will be aware of various ways in which steps, sequences of steps, processing tasks, and/or data may be subdivided, shared, or otherwise dealt with using iteration, recursion, and/or parallel processing.
Non-transitory computer program products (i.e., physically embodied computer program products) may store instructions, which when executed by one or more data processors of one or more computing systems, causes at least one data processor to perform operations, and/or steps thereof described in this disclosure, including without limitation any operations described above and/or any operations decoder and/or encoder may be configured to perform. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, or the like.
It is to be noted that any one or more of the aspects and embodiments described herein may be conveniently implemented using one or more machines (e.g., one or more computing devices that are utilized as a user computing device for an electronic document, one or more server devices, such as a document server, etc.) programmed according to the teachings of the present specification, as will be apparent to those of ordinary skill in the computer art. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those of ordinary skill in the software art. Aspects and implementations discussed above employing software and/or software modules may also include appropriate hardware for assisting in the implementation of the machine executable instructions of the software and/or software module.
Such software may be a computer program product that employs a machine-readable storage medium. A machine-readable storage medium may be any medium that is capable of storing and/or encoding a sequence of instructions for execution by a machine (e.g., a computing device) and that causes the machine to perform any one of the methodologies and/or embodiments described herein. Examples of a machine-readable storage medium include, but are not limited to, a magnetic disk, an optical disc (e.g., CD, CD-R, DVD, DVD-R, etc.), a magneto-optical disk, a read-only memory “ROM” device, a random-access memory “RAM” device, a magnetic card, an optical card, a solid-state memory device, an EPROM, an EEPROM, and any combinations thereof. A machine-readable medium, as used herein, is intended to include a single medium as well as a collection of physically separate media, such as, for example, a collection of compact discs or one or more hard disk drives in combination with a computer memory. As used herein, a machine-readable storage medium does not include transitory forms of signal transmission.
Such software may also include information (e.g., data) carried as a data signal on a data carrier, such as a carrier wave. For example, machine-executable information may be included as a data-carrying signal embodied in a data carrier in which the signal encodes a sequence of instruction, or portion thereof, for execution by a machine (e.g., a computing device) and any related information (e.g., data structures and data) that causes the machine to perform any one of the methodologies and/or embodiments described herein.
Examples of a computing device include, but are not limited to, an electronic book reading device, a computer workstation, a terminal computer, a server computer, a handheld device (e.g., a tablet computer, a smartphone, etc.), a web appliance, a network router, a network switch, a network bridge, any machine capable of executing a sequence of instructions that specify an action to be taken by that machine, and any combinations thereof. In one example, a computing device may include and/or be included in a kiosk.
The foregoing has been a detailed description of illustrative embodiments of the invention. Various modifications and additions can be made without departing from the spirit and scope of this invention. Features of each of the various embodiments described above may be combined with features of other described embodiments as appropriate in order to provide a multiplicity of feature combinations in associated new embodiments. Furthermore, while the foregoing describes a number of separate embodiments, what has been described herein is merely illustrative of the application of the principles of the present invention. Additionally, although particular methods herein may be illustrated and/or described as being performed in a specific order, the ordering is highly variable within ordinary skill to achieve methods, systems, and software according to the present disclosure. Accordingly, this description is meant to be taken only by way of example, and not to otherwise limit the scope of this invention.
Exemplary embodiments have been disclosed above and illustrated in the accompanying drawings. It will be understood by those skilled in the art that various changes, omissions and additions may be made to that which is specifically disclosed herein without departing from the spirit and scope of the present invention.
The present application is a continuation of international application PCT/US23/33662, filed on Sep. 26, 2023, and entitled Systems and Methods for Region Detection and Region Packing in Video Coding and Decoding for Machines, which international application claims the benefit of priority to U.S. Provisional Application Ser. No. 63/409,843 filed on Sep. 26, 2022, and entitled “System and Method for Adaptive Region Detection and Region Packing,” and also claims the benefit of priority of U.S. Provisional Application Ser. No. 63/409,847 filed on Sep. 26, 2022, and entitled “System and Method for Extending Predicted Object Boundaries in a Video Packing System,” and further claims the benefit of priority of U.S. Provisional Application Ser. No. 63/409,851 filed on Sep. 26, 2022, and entitled “Systems and Methods for Merge and Split Region Extraction in Video Region Packing,” the disclosures of each which are hereby incorporated by reference in their entireties.
| Number | Date | Country |
| --- | --- | --- |
| 63409843 | Sep 2022 | US |

|  | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/US2023/033662 | Sep 2023 | WO |
| Child | 19089533 |  | US |