This application is based on and claims the benefit of priority to U.S. Provisional Patent Application No. 63/467,569 filed on May 18, 2023 and entitled “Methods of understanding Indoor Environments using Panel Representation in 360 Panorama Images,” which is herein incorporated by reference in its entirety.
This disclosure relates generally to image processing and specifically to processing panorama images using neural networks to generate depth maps, layouts, semantic maps or the like with reduced distortion and improved continuity.
Semantic content extraction/prediction and object recognition from digital images using computer vision techniques are essential for many autonomous applications. Panorama images generated in various formats may differ from typical perspective 2D images in various geometric and other properties. Computer vision techniques and architectures for processing panorama images may be designed to exploit such properties in order to improve prediction accuracy and reduce negative effects of panoramic distortions.
This disclosure relates generally to image processing and specifically to processing panorama images using neural networks to generate depth maps, layouts, semantic maps or the like with reduced distortion and improved continuity. Methods and systems are described for generating such maps by leveraging several essential properties of these panorama images and by using a panorama panel representation and a neural network framework. A panel geometry embedding network is incorporated for encoding both the local and global geometric features of the panels in order to reduce negative impact of panoramic distortion. A local-to-global transformer network is also incorporated for capturing geometric context and aggregating local information within a panel and panel-wise global context.
In some example implementations, a method for processing a panorama image dataset by a computing circuitry is disclosed. The method may include generating a plurality of data panels from the panorama image dataset; executing a first neural network to process the plurality of data panels to generate a set of embeddings representing geometric features of the plurality of data panels; executing a second neural network to process the plurality of data panels and the set of embeddings to generate a plurality of mapping panels; and fusing the plurality of mapping panels into a mapping dataset of the panorama image dataset.
In the example implementation above, the mapping dataset comprises one of a depth map, a layout map, or a semantic map corresponding to the panorama image dataset.
In any one of the example implementations above, the panorama image dataset comprises a data array in two dimensions; and each of the plurality of data panels comprises a subarray of the data array spanning an entirety of a first dimension of the two dimensions and a segment of a second dimension of the two dimensions.
In any one of the example implementations above, the first dimension represents a gravitational direction of the panorama image dataset and the second dimension represents a horizontal direction of the panorama image dataset.
In any one of the example implementations above, the plurality of data panels are generated from the panorama image dataset consecutively using a window having a length of the entirety of the first dimension in the first dimension and a predefined width in the second dimension, the window sliding along the second dimension by a predefined stride.
In any one of the example implementations above, the window continuously slides across the panorama image dataset in the second dimension, wrapping from one edge of the panorama image dataset in the second dimension into another edge of the panorama image dataset in the second dimension.
In any one of the example implementations above, the first neural network is configured to encode local and global geometric features of the plurality of data panels to reduce impact of geometric distortions in the panorama image dataset and to enhance preservation of geometric continuity across the plurality of mapping panels.
In any one of the example implementations above, the first neural network comprises a multilayer perceptron (MLP) network for processing geometric information extracted from the plurality of data panels to generate the set of embeddings comprising a set of global geometric features and a set of local geometric features of the plurality of data panels.
In any one of the example implementations above, the second neural network is configured to process the plurality of data panels and reduce geometric distortions in the panorama image dataset based on the set of embeddings.
In any one of the example implementations above, the second neural network comprises: a down-sampling network; a transformer network; and an up-sampling network.
In the example implementations above, the down-sampling network is configured for processing the plurality of data panels and the set of embeddings to generate a series of down-sampled features with decreasing resolutions; the transformer network is configured for processing lowest resolution down-sampled features to generate transformed low-resolution features; and the up-sampling network is configured for processing the transformed low-resolution features and the series of down-sampled features to generate the plurality of mapping panels.
In any one of the example implementations above, the transformer network comprises a feature processor.
In any one of the example implementations above, the feature processor is configured to increase continuity of the geometric features.
In any one of the example implementations above, the feature processor is configured to aggregate local information within each of the plurality of data panels to capture panel-wise context.
Aspects of the disclosure also provide an electronic device or apparatus including a circuitry or processor configured to carry out any of the method implementations above.
Aspects of the disclosure also provide non-transitory computer-readable mediums storing instructions which when executed by an electronic device, cause the electronic device to perform any one of the method implementations above.
Further features, the nature, and various advantages of the disclosed subject matter will be more apparent from the following detailed description and the accompanying drawings in which:
Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. The phrase “in one embodiment/implementation” or “in some embodiments/implementations” as used herein does not necessarily refer to the same embodiment/implementation and the phrase “in another embodiment/implementation” or “in other embodiments” as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter includes combinations of exemplary embodiments/implementations in whole or in part.
In general, terminology may be understood at least in part from usage in context. For example, terms, such as “and”, “or”, or “and/or,” as used herein may include a variety of context-dependent meanings. Typically, “or” if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B or C, here used in the exclusive sense. In addition, the term “one or more”, “at least one”, “a”, “an”, or “the” as used herein, depending at least in part upon context, may be used in a singular sense or plural sense. In addition, the term “based on” or “determined by” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
The disclosure below relates generally to image processing and specifically to processing panorama images using neural networks to generate depth maps, layouts, semantic maps or the like with reduced distortion and improved continuity. Methods and systems are described for generating such maps by leveraging several essential properties of these panorama images and by using a panorama panel representation and a neural network framework. A panel geometry embedding network is incorporated for encoding both the local and global geometric features of the panels in order to reduce negative impact of panoramic distortion. A local-to-global transformer network is also incorporated for capturing geometric context and aggregating local information within a panel and panel-wise global context.
In comparison to regular perspective 2D images taken from, for example, a normal camera with a fixed, limited viewing angle, a panorama image offers a wide, e.g., 360-degree, field of view (FoV) around a particular viewing axis. A panorama image may be taken by a specialized camera with a wide/multiple-lens optical system or may be created by stitching together a plurality of overlapping regular images taken from multiple perspective angles around the viewing axis. A full panorama image may be referred to as a 360 panorama and may be represented in digital form using one of several example data formats. One example format may be based on an equirectangular projection (ERP) representation. A panorama image in the ERP format may be represented by a dataset in a 2D pixel array, similar to a regular image, but may include full 360 panorama information with respect to the viewing axis. Each pixel of the panorama image of a 360 scene in the ERP formatted dataset contains perspective imaging information (e.g., RGB or YUV values) corresponding to a perspective viewing solid-angle unit in the 360 scene.
An ERP representation of an example 360 panorama image is illustrated in
The 2D ERP representation above for a 360 panorama image is, by its nature, continuous in the panning direction. As such, continuity is expected to be preserved across the vertical edges 106 and 108 of the 2D data array of the ERP representation of
A panorama image above may be generated from any 360-degree scene. For example, such a 360-degree scene may be in an indoor or outdoor environment. A realistic or natural scene of any of such environments may be characterized by an object alignment dictated by gravity, which, for clarity and illustration purposes in this disclosure, is taken as the vertical direction above in
For some applications and image analytics tasks, additional information may be extracted or generated from an input image dataset based on computer vision techniques and modeling. Such information extraction or generation may include but is not limited to image classification, image segmentation, object identification and recognition, depth estimation, object layout generation, and the like. Just as with image processing of regular 2D images, such information may also be extracted or generated from a panorama image using computer vision modeling. An input for such information extraction or generation, for example, may be the ERP dataset of the panorama image described above with respect to
The extracted or generated information from a panorama image above may be of various types and complexity. For example, an output of a classification model may be simple and compact. On the other hand, an output dataset representing extracted or generated information, such as depth estimate information at each pixel, object layout, and semantic information of the pixels of the panorama image, may be more complex. Such information may be used to construct, for example, depth maps, semantic maps, and layouts.
As an example,
For example, panorama depth prediction may be generated via computer vision modeling to determine 3D positions of objects recognized in the panorama image. For another example, panorama layout prediction may be generated via computer vision modeling to acquire a layout structure of the 3D scene captured in the panorama image. For another example, panorama semantic segmentation may represent another important dense prediction task to generate pixel semantic information for understanding the content of the panorama image.
An example computer vision model for generating the mapping datasets above may include, for example, various neural networks (e.g., convolutional neural networks, CNNs) and/or other data analytics algorithms or components. The various example maps above may be crucial for many practical applications. In the indoor environment, such practical applications may include but are not limited to room reconstruction, robot navigation, and virtual reality applications. While early methods focused on modeling indoor scenes using perspective images, with the development of CNNs and omnidirectional photography, panorama images have become candidate inputs for generating the mapping datasets above. Compared to using traditional perspective images, panorama images have a larger FoV and provide geometric context, particularly of the indoor environment, that can be learned via training in a continuous manner.
While the ERP format provides a convenient representation of a panorama image, modeling a holistic panorama scene in ERP format or representation by computer vision may be challenging. For example, as described above, the ERP distortion increases when the ERP pixels are close to the zenith or nadir of the panorama image, which may decrease the power of convolutional network structures that may be included in a computer vision model designed for distortion-free perspective images.
In some example implementations for negating the effects of the ERP distortion above, a panorama may be first decomposed into perspective patches of, e.g., tangent images, so that the computer vision model can be configured to extract image information and features at a patch level where the relative distortion is less disparate across a patch (e.g., relatively uniform distortion within each patch). However, partitioning a typical gravity-aligned panorama image into discontinuous patches may break the local continuity of gravity-aligned scenes and objects, thereby still limiting the performance of typical distortion-free modeling.
The example embodiments in the disclosure below further provide a computer vision system involving a partitioning method and a neural network architecture for processing panorama images to extract or generate imaging information for understanding the content included in the panorama image, in a manner that can negate the effect of the ERP distortion and at the same time maintain continuity between image partitions. These embodiments, while being particularly adapted and useful for information extraction and understanding of indoor panorama scenes including gravity-aligned indoor objects and represented in the ERP format, may also be applicable to analyzing panorama images in other environments and in representations other than the ERP format. The various neural networks in these embodiments may be pretrained using training panorama images, and may be retrained and updated as more training datasets become available.
Specifically, an input ERP panorama image may be partitioned in a continuous manner and the partitions may be processed by a neural network structure referred to as a PanelNet, which may be designed, configured, and trained to be capable of tackling major panorama understanding tasks such as depth estimation, semantic segmentation, and layout prediction. In some example implementations, only the last one or a few layers of a decoder in the PanelNet may need to be slightly modified in order to accommodate the different extraction/generation tasks above.
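As a minimal sketch of this continuous partitioning, assuming an ERP image tensor of shape (H, W, 3) and example interval (panel width) and stride values (the function name and defaults below are illustrative, not taken from the disclosure), a sliding window that wraps around the horizontal seam may be implemented as follows:

```python
import torch

def partition_erp_into_panels(erp: torch.Tensor, interval: int = 128, stride: int = 32) -> torch.Tensor:
    """Split an ERP image (H, W, C) into vertical panels of width `interval`,
    sliding horizontally by `stride` and wrapping around the horizontal seam."""
    H, W, C = erp.shape
    # Pad by wrapping so a window near the right edge continues into the left edge,
    # preserving the 360-degree horizontal continuity of the panorama.
    padded = torch.cat([erp, erp[:, :interval, :]], dim=1)
    panels = [padded[:, x0:x0 + interval, :] for x0 in range(0, W, stride)]
    return torch.stack(panels)  # (num_panels, H, interval, C)

# Example: a 512x1024 ERP image with interval 128 and stride 32 yields 32 panels.
erp = torch.zeros(512, 1024, 3)
print(partition_erp_into_panels(erp).shape)  # torch.Size([32, 512, 128, 3])
```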
In some example implementations that are further detailed below, the PanelNet may be based on at least two essential properties of the ERP representation of a panorama image: (1) the ERP representation of the panorama image is continuous and seamless in the horizontal direction, as described above in relation to
In some example implementations of
As illustrated in
In some example implementations, the geometry embedding generation process may be configured to combine geometry features of the ERP panels with image features and to thus reduce the negative impact of the ERP distortion. A pixel Pe(xe, ye) located in an ERP representation (where xe and ye represent the horizontal and vertical coordinates of the pixel in the ERP representation, respectively) would correspond to an angular position including φ and θ representing the azimuth angle and the polar angle of the corresponding direction in the FoV. As such, a pixel in the ERP with pixel position (xe, ye) may be mapped to an angular direction of (φ, θ), which may further be converted to an absolute 3D world coordinate Ps(xs, ys, zs) on a unit sphere Ps(φ, θ) in the FoV with the following conversion relationship:
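One common convention for an ERP image of width W and height H, stated here as an assumption since the original expression is not reproduced in this text, is:

```latex
\varphi = \left(\frac{x_e}{W} - \frac{1}{2}\right) 2\pi, \qquad
\theta = \left(\frac{1}{2} - \frac{y_e}{H}\right) \pi,
```
```latex
x_s = \cos\theta\,\sin\varphi, \qquad
y_s = \cos\theta\,\cos\varphi, \qquad
z_s = \sin\theta .
```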
The 3D world coordinates Ps(xs, ys, zs) of all pixels in all panels as converted may then be used to generate global geometric features. Since each ERP panel above has the same distortion profile in the vertical direction as any other ERP panel (as dictated by the manner in which the ERP is partitioned into vertical panels), the relative position of each pixel within the panel where it is located is also important. In some example implementations, a relative 3D local position P(x′, y′, z′) may be assigned to each pixel per ERP panel. The global 3D world coordinates of a randomly selected ERP panel may be chosen to represent the relative 3D positions of all ERP panels. Due to the vertical partitioning manner used to generate the ERP panels, zs would be equal to z′. As such, a final output parameter set of a point on an ERP panel from the geometric parametrization process 702 may be a combination of its local and global coordinates (xs, ys, zs, x′, y′). Such a geometric parameter set for each pixel of the ERP panels may then be input to the geometry embedding network 704 to generate the geometry embeddings 705.
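A minimal PyTorch sketch of this geometric parametrization and of an MLP-based geometry embedding network is given below. The ERP-to-sphere convention, the use of the first panel as the reference panel, and the MLP sizes are illustrative assumptions rather than values taken from the disclosure.

```python
import torch
import torch.nn as nn

def build_panel_geometry(H: int, W: int, interval: int, stride: int) -> torch.Tensor:
    """Build the per-pixel parameter set (xs, ys, zs, x', y') for every panel.
    Global coordinates come from the ERP-to-sphere conversion; local coordinates
    (x', y') are copied from a single reference panel, since every panel shares
    the same vertical distortion profile and zs equals z'."""
    ys_idx, xs_idx = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                                    torch.arange(W, dtype=torch.float32),
                                    indexing="ij")
    phi = (xs_idx / W - 0.5) * 2 * torch.pi    # azimuth (assumed convention)
    theta = (0.5 - ys_idx / H) * torch.pi      # polar angle (assumed convention)
    world = torch.stack([torch.cos(theta) * torch.sin(phi),
                         torch.cos(theta) * torch.cos(phi),
                         torch.sin(theta)], dim=-1)          # (H, W, 3)
    world = torch.cat([world, world[:, :interval]], dim=1)   # wrap the horizontal seam
    panels = torch.stack([world[:, x0:x0 + interval]
                          for x0 in range(0, W, stride)])    # (N, H, interval, 3)
    local_xy = panels[0, :, :, :2].expand(panels.shape[0], -1, -1, -1)  # reference panel
    return torch.cat([panels, local_xy], dim=-1)             # (N, H, interval, 5)

class PanelGeometryEmbedding(nn.Module):
    """Small MLP mapping the 5-dimensional geometric parameter set of each pixel
    to a learned geometry embedding (hidden and output sizes are illustrative)."""
    def __init__(self, hidden_dim: int = 64, embed_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(5, hidden_dim), nn.ReLU(inplace=True),
                                 nn.Linear(hidden_dim, embed_dim))

    def forward(self, geom: torch.Tensor) -> torch.Tensor:
        return self.mlp(geom)   # (..., 5) -> (..., embed_dim)
```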
As further shown in
The pixel positions and various conversions between the ERP pixel positions and the 3D world coordinates would be determined by the partitioning of the vertical ERP panels. As described above, the vertical ERP panel partitioning is determined by the width I and stride S of the sliding windows. As such, the local and global geometric features are determined and are generated together as part of the geometry embeddings 705, given I and S.
The global geometric features, referred to as global geometry embeddings, may thus be extracted across ERP panels to record the split panel locations in the panorama. The global geometry information, for example, may include panel location information in the panorama ERP image, such as the panel center pixel location in the ERP image, the boundary range of each panel in the ERP image, and the corresponding sphere geometry in spherical coordinates. The global features may be used by the decoder network (described in further detail below) across the ERP panels when processing each panel, as shown in
As further shown in
In some example implementations, a 1×1 convolution layer may be applied to reduce the dimensions of the final feature map of each ERP panel to f_b∈R^C
As further shown in
As further shown in the example implementation of
In some examples, the transformer network 608 may be implemented as a local-to-global transformer network for performing information aggregation as described in further detail below, to particularly extract long-range dependencies between distant ERP panels.
Specifically, although partitioning the ERP into consecutive vertical panels via a sliding window as described above may help preserve the continuity of structures or objects in the panorama scene, capturing long-range dependencies is still crucial. Since the ERP representation is seamless in the horizontal direction, two vertical ERP panels that are distant in the ERP representation may nonetheless be close in the actual scene and thus correlated. Such correlation may not be easily captured. To address this problem and further improve local information aggregation, the local-to-global transformer network 608 may be designed and configured to include at least two major components: (1) Window Blocks to enhance the geometry relations within a local panel, and (2) Panel Blocks for capturing long-range context among panels. An example local-to-global transformer network is shown as 800 in
In some example implementations, for each ERP panel, the input feature map f_b∈R^C
In Panel Blocks, global information may be aggregated via panel-wise multi-head self-attention. The feature maps of all panels may be compressed to N 1-D feature vectors f_p∈R^(N×D) and then used as tokens in the Panel Blocks. Similar to the Window Blocks, a learnable positional embedding E_p∈R^(N×D) may be added to the tokens to retain panel-wise positional information.
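A simplified PyTorch sketch of such a Panel Block is shown below; the compression to one token per panel, the embedding dimension, and the head count are illustrative assumptions, not values from the disclosure.

```python
import torch
import torch.nn as nn

class PanelBlock(nn.Module):
    """Panel-wise multi-head self-attention over one token per panel,
    followed by a feed-forward network, both with residual connections."""
    def __init__(self, dim: int = 256, num_heads: int = 8, num_panels: int = 32):
        super().__init__()
        self.pos_embed = nn.Parameter(torch.zeros(1, num_panels, dim))  # E_p
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_panels, dim), one compressed feature vector per panel
        x = tokens + self.pos_embed                 # add learnable positional embedding
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)            # panel-wise multi-head self-attention
        x = x + attn_out                            # residual connection
        x = x + self.ffn(self.norm2(x))             # feed-forward network with residual
        return x
```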
In some example implementations, and as shown in the example local-to-global transformer network 800 of
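The per-block computation is not reproduced in this text; a standard residual formulation consistent with the following sentence, reconstructed here as an assumption, is:

```latex
\hat{z}^{l} = \mathrm{MSA}\!\left(\mathrm{LN}\!\left(z^{l-1}\right)\right) + z^{l-1}, \qquad
z^{l} = \mathrm{FFN}\!\left(\mathrm{LN}\!\left(\hat{z}^{l}\right)\right) + \hat{z}^{l},
```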
where l is the block number of each stage, and ẑ^l and z^l represent the output feature maps of the Window/Panel-MSA and the FFN, respectively. To aggregate the features from local to global, the Window Blocks may be stacked according to the window size from small to large successively. The Panel Blocks may be stacked after the Window Blocks, as shown in
As illustrated in
In some example implementations, for each decoder layer, its feature map may be concatenated with the feature map generated by a corresponding layer or stage in the encoder 602, as indicated by the vertical arrows from the encoder 602 to decoder 604 in
The output from the decoder 604 may represent panel-wise mapping datasets, and may be referred to as mapping panels. The mapping panels, at each pixel, may contain predicted information such as depth information, layout information, or semantic information, rather than the original RGB or YUV image information. The mapping panels may then be fused or merged together to form the overall mapping dataset(s) corresponding to the input ERP.
In some example implementations, a learnable confidence map may be predicted by the fusion network 608 to improve the final merged or fused result. For the final merge, an average of the predictions of all mapping panels may be taken. In some example implementations, by slightly modifying the network structure, the model of
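A minimal sketch of the panel fusion step is given below, under stated assumptions: per-panel predictions and confidence maps of shape (H, interval), a hypothetical stride, and wrap-around at the horizontal seam. With uniform confidences this reduces to the plain averaging described above.

```python
import torch

def fuse_panels(panel_preds: torch.Tensor, panel_confs: torch.Tensor,
                erp_width: int, stride: int) -> torch.Tensor:
    """Merge overlapping per-panel predictions (N, H, interval) back into an
    ERP-sized map (H, erp_width) using confidence-weighted averaging."""
    num_panels, height, interval = panel_preds.shape
    fused = torch.zeros(height, erp_width)
    weight = torch.zeros(height, erp_width)
    for i in range(num_panels):
        x0 = i * stride
        cols = torch.arange(x0, x0 + interval) % erp_width  # wrap around the seam
        fused[:, cols] += panel_preds[i] * panel_confs[i]
        weight[:, cols] += panel_confs[i]
    return fused / weight.clamp(min=1e-6)
```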
The model of
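The loss expression itself is not reproduced in this text; the description in the following sentence, an L1/L2 switch governed by a threshold c, matches a reverse Huber (BerHu) style loss commonly used for panorama depth estimation, given here as an assumption:

```latex
\mathcal{B}(e) =
\begin{cases}
|e|, & |e| \le c,\\[4pt]
\dfrac{e^{2} + c^{2}}{2c}, & |e| > c,
\end{cases}
```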
where e represents an error term and an error threshold c is used to determine where a switch from L1 loss (e.g., least absolute deviation) to L2 loss (e.g., least squares errors) occurs. The combination of L1 loss for horizontal depth and room height, normal loss, and normal gradient loss may be optimized for training the model of
For semantic segmentation prediction, in some example implementations, a loss function based on Cross-Entropy Loss with class-wise weights may be used.
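As a minimal PyTorch sketch of such a class-weighted cross-entropy objective (the number of classes and the weights below are placeholders, not values from the disclosure):

```python
import torch
import torch.nn as nn

num_classes = 13                           # placeholder number of semantic classes
class_weights = torch.ones(num_classes)    # placeholder class-wise weights
criterion = nn.CrossEntropyLoss(weight=class_weights)

# logits: (batch, num_classes, H, W); labels: (batch, H, W) with class indices
logits = torch.randn(2, num_classes, 256, 512)
labels = torch.randint(0, num_classes, (2, 256, 512))
loss = criterion(logits, labels)
```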
For testing the PanelNet implementations above, a real-world dataset consisting of 1,413 panoramas collected in 6 large-scale indoor areas, referred to as Stanford2D3D is used. For depth estimation, a split of the dataset into area 1, area 2, area 3, area 4, area 6 for training and area 5 for testing is adopted. For semantic segmentation, a 3-fold split of the data set is adopted for training, evaluation, and testing. Example resolutions used for depth estimation and semantic segmentation are 512×1024 and 256×512, respectively.
In addition, datasets referred to as PanoContext and the extended Stanford2D3D are also used for training and testing of the PanelNet implementations above. These two datasets are cuboid room layout datasets. PanoContext, for example, contains 514 annotated cuboid room layouts collected from the SunCG dataset. Specifically, 571 panoramas were collected from Stanford2D3D and annotated with room layouts. The input resolution of both datasets is 512×1024. The same example split as above for training and testing is adopted for these datasets.
Further, a dataset referred to as Matterport3D may also be used. This is a large-scale RGB-D dataset that contains 10,800 panoramic images collected in 90 indoor scenes. This dataset may be particularly used for depth estimation training, evaluation, and testing. The dataset may be split into 7,829 panoramas from 61 houses for training and the rest for testing. A resolution of 512×1024 may be used for training and testing.
Further, for depth estimation, the performance of the PanelNet implementations may be evaluated using standard depth estimation metrics, including Mean Relative Error (MRE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), log-based Root Mean Square Error (RMSE(log)) and threshold-based precision, e.g., δ1, δ2 and δ3. For semantic segmentation, performance of the PanelNet implementations is evaluated using standard semantic segmentation metrics including class-wise mIoU and class-wise mAcc. For layout prediction, the performance of the PanelNet implementations is evaluated using 3D Intersection over Union (3DIoU).
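For reference, a straightforward sketch of the listed depth metrics is given below; it is a generic implementation rather than code from the disclosure, assumes predicted and ground-truth depth tensors of the same shape, and uses a common convention for the log base and the δ thresholds.

```python
import torch

def depth_metrics(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> dict:
    """Standard panorama depth metrics computed over valid (positive) pixels."""
    valid = gt > eps
    pred, gt = pred[valid], gt[valid]
    abs_err = torch.abs(pred - gt)
    ratio = torch.max(pred / gt, gt / pred)
    return {
        "MRE": torch.mean(abs_err / gt).item(),
        "MAE": torch.mean(abs_err).item(),
        "RMSE": torch.sqrt(torch.mean((pred - gt) ** 2)).item(),
        "RMSE(log)": torch.sqrt(torch.mean(
            (torch.log10(pred.clamp(min=eps)) - torch.log10(gt)) ** 2)).item(),
        "delta1": torch.mean((ratio < 1.25).float()).item(),
        "delta2": torch.mean((ratio < 1.25 ** 2).float()).item(),
        "delta3": torch.mean((ratio < 1.25 ** 3).float()).item(),
    }
```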
The PanelNet model(s) may be implemented in PyTorch and trained on, for example, eight NVIDIA GTX 1080 Ti GPUs with a batch size of 16. The network is trained using the Adam optimizer, and the initial learning rate may be set to 0.0001. For depth estimation, the network/model may be trained on the Stanford2D3D dataset above for, e.g., 100 epochs, and on the Matterport3D dataset above for, e.g., 60 epochs. The network/model may be trained for 200 epochs on the semantic segmentation datasets above, and 1000 epochs on the layout prediction datasets above. Random flipping, random horizontal rotation, and random gamma augmentation may be further adopted for data augmentation. An example default stride and interval for depth estimation of 32 and 128, respectively, may be used, while the stride may be set to, for example, 16 for semantic segmentation.
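A minimal sketch of the stated optimizer configuration and of the random horizontal rotation augmentation, which for an ERP image is a lossless circular shift of the columns, is shown below; the stand-in model is a placeholder, not the PanelNet architecture itself.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=1))      # placeholder stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial learning rate 0.0001

def random_horizontal_rotation(erp: torch.Tensor) -> torch.Tensor:
    """Randomly rotate an ERP image (C, H, W) about the vertical axis by
    circularly shifting its columns; the 360-degree wrap makes this lossless."""
    shift = int(torch.randint(0, erp.shape[-1], (1,)))
    return torch.roll(erp, shifts=shift, dims=-1)
```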
The method for PanelNet above may be evaluated against state-of-the-art panorama depth estimation algorithms in Table 1 below. The results may be averaged over the best results from three training sessions. The results of SliceNet on Stanford2D3D were reproduced with the fixed metrics, and a 2-iteration Omnifusion model is retrained and reevaluated on the corresponding Matterport3D dataset. Table 1 shows that the PanelNet model implementations outperform existing models on all metrics on both datasets.
In comparison to the PanelNet implementations, the method that directly operates on the panoramas predicts a continuous background while lacking object details. The fusion-based method generates sharp depth boundaries, while artifacts caused by patch-wise discrepancy lead to inconsistent depth predictions that are not removable with its patch fusion module or iteration mechanism. The PanelNet implementations, however, with the help of the Local-to-Global Transformer network above, preserve the geometric continuity of the room structure and show superior performance even for some challenging scenarios. The PanelNet model is also capable of generating sharp object depth edges.
The PanelNet is further evaluated against state-of-the-art panorama semantic segmentation methods, as shown in Table 2 below. The PanelNet model improves the mIoU metrics by 6.9% and mAcc metrics by 8.9% against, for example, the existing Ho-HoNet implementations. The PanelNet provides a strong ability to segment out objects with a smooth surface. The segmentation edges generated by the PanelNet appear natural and continuous. This is because the Local-to-Global Transformer network is capable of successfully capturing geometric context of the object. The PanelNet model is also capable of segmenting out small objects from the background. The segmentation boundaries of the ceiling and the walls generated by the PanelNet model are highly smooth, indicating the power of the panel geometry embedding network capable of learning the ERP distortion.
The PanelNet model method is further evaluated against state-of-the-art panorama layout estimation methods, as shown in Table 3 below. By adding linear layers at the end of the depth estimation network as described above, the PanelNet model achieves competitive performance against state-of-the-art methods designed specifically for layout estimation. Since the PanelNet model is initially designed for dense prediction, it suffers an information loss in the process of upsampling and channel compression. The layout model based on the PanelNet shares the same structure with the depth estimation model before the linear layers. The PanelNet model may be initialized with the weights pretrained on depth estimation datasets to reduce the training overhead. The layout prediction model based on the PanelNet has the highest performance when the stride is set at 64 and the interval is set at 128.
Ablation studies may be further performed to evaluate the impact of the various elements and hyper-parameters of the PanelNet on, for example, the Stanford2D3D dataset for depth estimation, as shown in Table 4 below. The stride may be set at an example value of 32 and the interval at an example value of 128 for all networks. The baseline model with a ResNet-34 encoder and a depth decoder as illustrated above may be used. Since partitioning the entire panorama into vertical panels with overlaps greatly increases the computational complexity, ResNet-34 rather than vision Transformers may be used as the backbone (the encoder and the decoder). As shown in Table 4, the performance improvement of adding the panel geometry embedding network to the pure CNN structure of the PanelNet may be small, since the network's ability to aggregate distortion information with image features is low. However, by applying the Local-to-Global Transformer network as a feature processor, the baseline network gains a significant performance improvement on all evaluation metrics. Benefiting from the information aggregation ability of the Local-to-Global Transformer network, the panel geometry embedding network exercises its distortion perception ability to a fuller extent and improves the performance both quantitatively and qualitatively. The combination of the Local-to-Global Transformer network and the panel geometry embedding network leads to the clearest object edges in depth estimation. The effect of a panel-wise relative position embedding similar to that of the LGT-Net is further evaluated for the Panel Blocks. However, it appears that it brings minimal performance improvements on depth estimation while increasing the computational complexity.
An ablation study may be further conducted to validate the usefulness of the panel representation against, for example, tangent image partitioning. Omnifusion implementations may be used as a comparison since it has a similar input format and can be trained via the same encoder-decoder CNN architecture as the PanelNet model. The comparison is shown in Table 5. As shown in Table 5, the panel representation with the pure CNN architecture outperforms the original Omnifusion, which demonstrates the superiority of the panel representation. The default transformer of Omnifusion may be replaced with the Local-to-Global Transformer network. However, the Local-to-Global Transformer network does not appear to bring a significant performance improvement for tangent images, since the discontinuous tangent patches lower the ability of the Window Blocks to aggregate local information in the vertical direction, which reduces the continuity of depth estimation for gravity-aligned objects and scenes. On the contrary, the vertical continuity is preserved within the vertical panels of the PanelNet. With the panel representation, the Local-to-Global Transformer exerts its greatest information aggregation ability.
The effect of panel size and stride of the PanelNet on the performance and speed of the model is further evaluated, as shown in Table 6. For Table 6, the FPS values are obtained by measuring the average inference time on a single NVIDIA GTX 1080Ti GPU. It is observed that for PanelNet models that have the same panel interval, i.e., sliding window width, a smaller stride enhances the performance. For the same stride, the PanelNet models with larger panels have better performance. Theoretically, smaller strides improve performance because horizontal consistency is preserved by the larger overlapping area of consecutive panels. Larger panels also lead to better performance because larger panels provide a larger FoV, which contains more geometric context within a panel. However, it is observed that continuing to increase the interval may have a negative impact on performance. Specifically, the larger panel brings higher computational complexity, which forces the stride to increase to reduce the computational overhead. This causes the performance gain brought by the larger FoV to be negated by the consistency loss due to fewer overlaps. To gain the best performance, the interval may be set to, for example, 128, and the stride may be set to, for example, 32.
The techniques described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example,
The computer software can be coded using any suitable machine code or computer language, that may be subject to assembly, compilation, linking, or like mechanisms to create code comprising instructions that can be executed directly, or through interpretation, micro-code execution, and the like, by one or more computer central processing units (CPUs), Graphics Processing Units (GPUs), and the like.
The instructions can be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, internet of things devices, and the like.
The components shown in
Computer system (1000) may include certain human interface input devices. Input human interface devices may include one or more of (only one of each depicted): keyboard (1001), mouse (1002), trackpad (1003), touch screen (1010), data-glove (not shown), joystick (1005), microphone (1006), scanner (1007), camera (1008).
Computer system (1000) may also include certain human interface output devices. Such human interface output devices may be stimulating the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (for example tactile feedback by the touch-screen (1010), data-glove (not shown), or joystick (1005), but there can also be tactile feedback devices that do not serve as input devices), audio output devices (such as: speakers (1009), headphones (not depicted)), visual output devices (such as screens (1010) to include CRT screens, LCD screens, plasma screens, OLED screens, each with or without touch-screen input capability, each with or without tactile feedback capability, some of which may be capable of outputting two-dimensional visual output or more than three-dimensional output through means such as stereographic output; virtual-reality glasses (not depicted), holographic displays and smoke tanks (not depicted)), and printers (not depicted).
Computer system (1000) can also include human accessible storage devices and their associated media such as optical media including CD/DVD ROM/RW (1020) with CD/DVD or the like media (1021), thumb-drive (1022), removable hard drive or solid state drive (1023), legacy magnetic media such as tape and floppy disc (not depicted), specialized ROM/ASIC/PLD based devices such as security dongles (not depicted), and the like.
Those skilled in the art should also understand that term “computer readable media” as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.
Computer system (1000) can also include an interface (1054) to one or more communication networks (1055). Networks can for example be wireless, wireline, optical. Networks can further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet, wireless LANs, cellular networks to include GSM, 3G, 4G, 5G, LTE and the like, TV wireline or wireless wide area digital networks to include cable TV, satellite TV, and terrestrial broadcast TV, vehicular and industrial to include CAN bus, and so forth.
Aforementioned human interface devices, human-accessible storage devices, and network interfaces can be attached to a core (1040) of the computer system (1000).
The core (1040) can include one or more Central Processing Units (CPU) (1041), Graphics Processing Units (GPU) (1042), specialized programmable processing units in the form of Field Programmable Gate Arrays (FPGA) (1043), hardware accelerators for certain tasks (1044), graphics adapters (1050), and so forth. These devices, along with Read-only memory (ROM) (1045), Random-access memory (1046), internal mass storage such as internal non-user accessible hard drives, SSDs, and the like (1047), may be connected through a system bus (1048). In some computer systems, the system bus (1048) can be accessible in the form of one or more physical plugs to enable extensions by additional CPUs, GPUs, and the like. The peripheral devices can be attached either directly to the core's system bus (1048), or through a peripheral bus (1049). In an example, the screen (1010) can be connected to the graphics adapter (1050). Architectures for a peripheral bus include PCI, USB, and the like.
The computer readable media can have computer code thereon for performing various computer-implemented operations. The media and computer code can be those specially designed and constructed for the purposes of the present disclosure, or they can be of the kind well known and available to those having skill in the computer software arts.
While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope thereof.