Documents are a popular way for businesses, governments, educational institutions, and others to store information. With the rise of personal computing, documents have transitioned from physical media (e.g., paper) stored in the real world to electronic files stored in the cloud. For example, newly created documents often exist only in digital form, with no physical copy ever printed. Likewise, existing physical documents are increasingly converted to digital formats like PDF. Modern document usage is no longer restricted to reading or sharing, but is shifting to more active modes such as authoring, editing styles, and customizing figures and tables. A key part of active usage is an advanced search mechanism. However, search functionality within documents is mostly limited to locating regions of a page containing text that matches a given textual query.
Introduced here are techniques/technologies that enable one-shot multi-modal document snippet search. A document snippet may include a portion of a document that may be characterized by text, image, spatial, and/or other features. Document snippet search allows for portions of a document with similar features (though not necessarily exactly the same content) to be identified. Embodiments perform one-shot document snippet search by extracting features corresponding to each modality from a query snippet and a target document to be searched. This may be performed using multiple encoders (e.g., a text encoder, an image encoder, a layout encoder, etc.).
Once the features have been extracted, they may be combined into co-attention and cross-attention feature sets. These may be formed by combining like features from the query snippet and the target document and combining unlike features from the query snippet and the target document. These feature sets can be used to create a feature volume from which regions of interest in the target document can be identified. These regions of interest correspond to predicted portions of the target document that match the query snippet.
Additional features and advantages of exemplary embodiments of the present disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of such exemplary embodiments.
The detailed description is described with reference to the accompanying drawings in which:
One or more embodiments of the present disclosure include a document search system which enables one-shot document snippet search of target documents. Document search in prior systems has most often been limited to identifying text that exactly matches a text query. While this kind of searching is effective for basic text editing, it cannot be used to search for any modalities other than text, such as layout, image content, etc. Other techniques, such as template matching, have attempted to provide a more intelligent search for similar content across documents. Template matching refers to the task of detecting and localizing a given query image in a target (usually larger) image. Such techniques typically use traditional computer vision methods like Normalized Cross Correlation (NCC) and Sum of Squared Differences (SSD) for searching.
Template matching techniques have clear limitations. For instance, they struggle when presented with large variations in scale, occlusions, poor image quality, different lighting conditions, etc. Template matching also offers limited real-time use. The rise of deep learning has allowed researchers to develop more sophisticated searching techniques like QATM and DeepOneClass that perform matching between deep features of natural images for tasks like GPS localization. However, such techniques have not performed well when attempting to match templates within documents rather than natural images. For example, documents may present diverse and complicated arrangements of layout, visual structures, and textual content compared to natural images.
Another prior technique is one-shot object detection (OSOD). OSOD aims at detecting instances of novel classes (e.g., classes not seen during training) within a test image given a single example of the unseen/novel class. At a high level, most OSOD techniques perform alignment between deep features of a query (e.g., an example of a novel class) and a target image (e.g., a test image where the novel class instance is present). Such techniques have shown that the learned attention-based correlation can outperform standard Siamese matching because it captures multi-scale context better through global and local attention. Popular OSOD techniques have been shown to perform well on natural images when class definitions are clearly specified. However, due to the complexity of document data and the lack of a well-defined, yet exhaustive, set of layout patterns, it is not possible to enumerate a finite set of classes. More recently, attempts have been made to learn a hierarchical relationship (e.g., Balanced and Hierarchical Relation Learning or BHRL) between object proposals within a target and the query. While BHRL shows impressive performance on natural images, it does not leverage multi-modal information that is critical for document snippet detection.
As discussed, prior approaches to document search have been formulated in two distinct ways. First, as a retrieval task where a database of search items is matched against the user query. However, creating and storing large databases for complex modalities like document snippets is a non-trivial task. The number and types of snippets that may be queried is unbounded, meaning that even if such a database is created, the next snippet that is queried still may not be included in the database. As a result, such retrieval implementations can only be practically implemented for limited subsets of snippets, such as text and simple multi-modal structures like logos etc. The second formulation is as an object detection task where a fixed set of classes are detected by a function (e.g., a deep model or other machine learning technique). However, as in the retrieval case, document snippets can be arbitrarily complex making it likely impossible to fully train a model on all of the possible classes to which a snippet may belong. As such, these prior techniques have failed to provide effective results when applied to document snippet search.
As discussed, traditional text searching provides very limited functionality to the document author (e.g., find and replace and similar use cases related to exact text matching). However, there are a number of use cases where snippet searches would be much more useful. For example, a user may want to add a column to a particular kind of table to accommodate more statistics. In such an instance, the query snippet may include an example of the table to be edited. The target document would then be searched for similar tables (e.g., tables with the same number of columns, potentially with the same or different column labels). Similarly, a form author may want to add an extra field in an information collection question. In such an instance, the query snippet may include a question field that includes text (e.g., the question) and a document control (e.g., a menu of selectable answers to the question). Likewise, a schoolteacher may want to find a multiple-choice question with three options to edit it to four options. In such an instance, the query snippet may include an example multiple-choice question that includes three options.
In the above use cases, a traditional text search system would return, at best, search results that were under-inclusive. For example, the text search system would only return an exact match to the text of the query snippet, while missing any similar snippets with varying text content. Prior intelligent searching systems would require the model in use to be trained on the specific query classes to find potentially matching snippets. However, even if such training had occurred, the model would have been trained on only a single modality (e.g., image data of the class), making the search results less accurate.
Contrary to these existing approaches, embodiments use a one-shot multi-modal framework that fuses context from visual, textual, and spatial modalities across the query snippet and the target document. For example, when a user seeks to search a target document for a query snippet, multiple modalities of the query snippet and target document are encoded (e.g., a text encoder encodes the text content, an image encoder encodes the image content, etc.). These encoded representations (e.g., embeddings) of the query snippet and the target document are then combined using co-attention and cross-attention modules to create a combined feature representation. For example, in some embodiments, the outputs of the co-attention and cross-attention modules are 2D vector representations (e.g., encoded representations) which are then combined to form a 3D feature volume. The feature volume can then be used to identify candidate snippets of the target document that match the query snippet. As discussed further below, embodiments use a new model architecture that enables the fusion of multi-modal inputs, which results in more accurate snippet detection in documents.
As shown in
In the example of
A neural network may include a machine-learning model that can be tuned (e.g., trained) based on training input to approximate unknown functions. In particular, a neural network can include a model of interconnected digital neurons that communicate and learn to approximate complex functions and generate outputs based on a plurality of inputs provided to the model. For instance, the neural network includes one or more machine learning algorithms. In other words, a neural network is an algorithm that implements deep learning techniques, i.e., machine learning that utilizes a set of algorithms to attempt to model high-level abstractions in data.
At numeral 2, the query snippet 104 and target document 106 are processed by the encoders of feature extractor 108 to generate a plurality of features. For example, each encoder outputs its own set of features for the query snippet and the target document. These features are then provided to feature fusion manager 110. As discussed further below, feature fusion manager 110 includes a co-attention module which combines like features from the query snippet and the target document, and a cross-attention module that combines unlike features from the query snippet and target document and outputs a fused combined feature representation. At numeral 3, feature fusion manager 110 combines the features extracted from the query snippet 104 and the target document 106 into a combined feature representation which is provided to snippet detector 112. For example, in some embodiments, the features extracted from the query snippet and the target document are each 2D feature vectors. When these 2D feature vectors are combined, they form a 3D feature volume.
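By way of illustration, the following sketch (in which the tensor shapes are assumed for example purposes) shows how 2D feature maps from a query snippet and a target document might be stacked into a single 3D feature volume:

```python
# Minimal sketch (PyTorch): combining 2D feature maps from the query snippet
# and the target document into a 3D feature volume. The shapes are
# illustrative assumptions; the actual fusion uses co-attention and
# cross-attention modules as described below.
import torch

batch_size, seq_len, feat_dim = 2, 1024, 1024
query_features = torch.randn(batch_size, seq_len, feat_dim)   # 2D per example
target_features = torch.randn(batch_size, seq_len, feat_dim)  # 2D per example

# Stacking the per-example 2D maps along a new axis yields a 3D volume that a
# downstream detector can treat like a multi-channel image.
feature_volume = torch.stack([query_features, target_features], dim=1)
print(feature_volume.shape)  # torch.Size([2, 2, 1024, 1024])
```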
The snippet detector 112 may include one or more detection heads which identify bounding boxes in the target document 106 associated with likely matches to the query snippet 104. At numeral 4, the feature volume is processed by the snippet detector 112 to identify matching snippets in the target document. At numeral 5, an augmented target document 114 is returned. In some embodiments, the augmented target document 114 has been augmented to include the bounding boxes identified by the snippet detector 112 which highlight matching snippets in the target document. In some embodiments, the augmented target document 114 is displayed to the user. Alternatively, the matching snippets may be displayed in isolation (e.g., removed from the target document). Additionally, or alternatively, the bounding box data (e.g., coordinates defining the bounding box) are returned to the requesting system to be used for further processing of the target document.
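As a simple illustration of how an augmented target document might be produced, the following sketch overlays predicted bounding boxes on a rendered page using Pillow; the file names and box coordinates are placeholders:

```python
# Minimal sketch: producing an "augmented" target document by drawing the
# predicted bounding boxes onto a rendered page with Pillow. File names and
# box coordinates are placeholders used for illustration only.
from PIL import Image, ImageDraw

def augment_with_boxes(page_image_path, boxes, out_path):
    """Overlay predicted snippet bounding boxes on the target document page."""
    page = Image.open(page_image_path).convert("RGB")
    draw = ImageDraw.Draw(page)
    for (left, top, right, bottom) in boxes:
        draw.rectangle([left, top, right, bottom], outline=(255, 0, 0), width=3)
    page.save(out_path)
    return page

augment_with_boxes("target_page.png", [(95, 410, 780, 520)], "augmented_page.png")
```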
For example, in some embodiments, the document search system 100 may include a query snippet source 202. The query snippet source may be a document store which includes a variety of potential snippets which the user may select from to use for searching a target document. This may be used, for example, to aid the user in document authoring or editing.
In some embodiments, the document from which the query snippet is to be taken can be displayed in user interface 200. The user can then select the snippet from the document using the user interface. For example, the user may draw a rectangle, or other shape, around the snippet in the document. Information defining the selected region (e.g., coordinates, path objects, etc.) may be provided to snippet selector 206 which extracts the query snippet 208 from the document. In some embodiments, the snippet selector 206 may crop the query snippet from the document based on the user input (e.g., based on the bounding information provided by the user). In some embodiments, the snippet selector 206 may use a machine learning model to extract a portion of the document corresponding to the query snippet. The machine learning model may receive the document and the user input identifying the snippet and output a predicted portion of the document corresponding to the query snippet. This query snippet 208 is now available to search a target document for similar snippets.
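For example, a straightforward way to extract a query snippet from a rendered document page, given a rectangle drawn by the user, is to crop the page image to the selected region; the file name and coordinates below are placeholders:

```python
# Minimal sketch: cropping a user-selected region out of a rendered document
# page with Pillow. "page.png" and the rectangle coordinates are placeholders
# standing in for the document page and the user's drawn selection.
from PIL import Image

def extract_query_snippet(page_image_path, selection_box):
    """Crop the query snippet from a page image.

    selection_box is (left, top, right, bottom) in pixel coordinates, e.g. as
    reported by a rectangle-drawing tool in the user interface.
    """
    page = Image.open(page_image_path)
    return page.crop(selection_box)

snippet = extract_query_snippet("page.png", (120, 340, 760, 480))
snippet.save("query_snippet.png")
```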
The target document to be searched for similar snippets is a form titled “Application for approval of a maintenance controller.” Upon being searched by the document search system 100 for matching snippets to the query, the document search system identifies one target snippet. This is shown in augmented target document 302 and highlighted by bounding box 304. The matching snippet also includes a line of text followed by two check boxes each associated with their own lines of text. Specifically, the matching snippet reads:
As can be seen, the text of the snippets is completely different, but the structure of the snippets is very similar. As a result, the document search system 100 allows users to find other versions of a query snippet, where its structure would be similar but the content, styles, fonts etc. might vary.
Given a dataset 𝒟 of query-target pairs (Q, T) which are generated using an oracle (not accessible afterwards), embodiments find snippets Sqt for each pair (Q, T) ∈ 𝒟. Let fθ be a model with parameters θ which predicts similar snippets Ŝqt for a given (Q, T) pair. Let loss ℒ be the measure of error between Sqt and Ŝqt; the optimization problem is then that of minimizing ℒ as follows:

θ* = argminθ Σ(Q,T)∈𝒟 ℒ(Sqt, Ŝqt)

Let 𝒮 be the set of all document snippets. Similar snippets can be identified using a similarity criterion editqt based on the edit distance (e.g., Levenshtein distance), such that editqt: 𝒮² → ℝ, which takes two document snippets A, B ∈ 𝒮 and outputs a similarity score s = editqt(A, B). Essentially, the similarity score compares a distance between the layout of the query and a potential region in the target; this allows structurally similar query-target pairs to be formed for training. In some embodiments, this enables similarity search datasets to be created from various document datasets, such as the Flamingo forms dataset and the PubLayNet document dataset.
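By way of illustration, the sketch below computes a Levenshtein-based similarity score between two snippets that have been reduced to strings of layout tokens; this string encoding and the normalization to [0, 1] are assumptions made for example purposes rather than the exact criterion used to build the training pairs:

```python
# Minimal sketch: an edit-distance-based similarity score between two document
# snippets. Here each snippet is reduced to a string of layout tokens (e.g.,
# "T" for a text block, "C" for a checkbox, "I" for an image); this encoding
# and the normalization to [0, 1] are illustrative assumptions.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def edit_qt(snippet_a: str, snippet_b: str) -> float:
    """Similarity score in [0, 1]; 1.0 means identical layout strings."""
    if not snippet_a and not snippet_b:
        return 1.0
    dist = levenshtein(snippet_a, snippet_b)
    return 1.0 - dist / max(len(snippet_a), len(snippet_b))

# A question with two checkboxes vs. similar questions with two or three boxes:
print(edit_qt("TCC", "TCC"))   # 1.0 (structurally identical)
print(edit_qt("TCC", "TCCC"))  # 0.75 (one extra checkbox)
```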
As shown in
As discussed, when a query is received, the query snippet and the target document are first processed by feature extractor 108. Feature extractor 108 may include an image encoder, a text encoder, and a layout encoder, each configured to generate a representation of the query snippet or target document. In some embodiments, the query snippet and target document may be processed by separate encoders (e.g., as shown in
In some embodiments, the image encoder 402A, 402B can be implemented as a Document Image Transformer (DiT) backbone with an encoder-only architecture having four layers, each including four attention heads with a model dimension of 512. The image encoder receives a three-channel document image (e.g., RGB) resized (e.g., using bi-cubic interpolation) to 224×224 resolution, which is further cut into 16×16 sized patches, and outputs a token sequence of length 197. The 197 tokens are formed as follows:

197 = (224/16) × (224/16) + 1 = 196 patch tokens + 1 additional token
where the additional token corresponds to the CLS token as in the original Bidirectional Encoder representation from Image Transformers (BEiT). In some embodiments, a pretrained DiT base model is used that has a hidden dimension of 768. Since both query image Qiinp and target image Tiinp are preprocessed to the same dimension, two feature vectors Qv, Tv are created, each of size BS×197×1024, where 1024 is the maximum sequence length and BS denotes the batch size. Note that the maximum sequence length is a hyperparameter choice that is chosen based on the maximum number of text-blocks in the target document. The encodings are then padded to final vectors Qv, Tv of size BS×1024×1024 each. The rationale behind doing so is to conveniently be able to perform the subsequent cross-attention with different modalities. The sequence of operations is as follows:
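By way of illustration, a rough sketch of this resize, patchify, and pad pipeline is shown below; a simple convolutional patch embedding stands in for the pretrained DiT backbone, and the 1024-dimensional token size is an illustrative assumption:

```python
# Minimal sketch (PyTorch) of the image-encoder preprocessing path: resize to
# 224x224, split into 16x16 patches (14*14 = 196 patches + 1 CLS token = 197
# tokens), then pad the token sequence to the maximum sequence length of 1024.
# A Conv2d patch embedding stands in for the pretrained DiT backbone, and the
# 1024-dimensional token size is an illustrative choice.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyImageEncoder(nn.Module):
    def __init__(self, token_dim=1024, max_seq_len=1024):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, token_dim, kernel_size=16, stride=16)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, token_dim))
        self.max_seq_len = max_seq_len

    def forward(self, image):                        # image: (BS, 3, H, W)
        image = F.interpolate(image, size=(224, 224),
                              mode="bicubic", align_corners=False)
        patches = self.patch_embed(image)            # (BS, dim, 14, 14)
        tokens = patches.flatten(2).transpose(1, 2)  # (BS, 196, dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)     # (BS, 197, dim)
        pad = self.max_seq_len - tokens.size(1)      # pad sequence to 1024
        return F.pad(tokens, (0, 0, 0, pad))         # (BS, 1024, dim)

enc = ToyImageEncoder()
print(enc(torch.randn(2, 3, 300, 200)).shape)        # torch.Size([2, 1024, 1024])
```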
In some embodiments, the text encoder 404A, 404B is implemented as a pretrained BERT-based sentence transformer. The text encoder generates a 768-dimensional embedding for a given block of text. In some embodiments, the continuous blocks of text in the query and in the target document are fed into this encoder to generate token sequences Ttinp, Qtinp of dimension BS×textt×768 and BS×textq×768, respectively, where textt is the number of text-blocks in the target document and textq is the number of text-blocks in the query snippet. Additionally, both Ttinp, Qtinp can be padded to a constant size of BS×1024×768. Unlike other MONOMER parameters, in some embodiments, the text encoder weights are kept frozen. Mathematically, text encoding is represented as follows:
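The text-encoding step might be sketched as follows using the sentence-transformers library; the particular pretrained checkpoint and the padding helper are assumptions, as the description above only requires a BERT-based sentence transformer that produces 768-dimensional embeddings:

```python
# Minimal sketch: encoding the text blocks of a query snippet and target
# document with a BERT-based sentence transformer, then zero-padding to a
# fixed sequence length of 1024 blocks. The checkpoint name is an assumption;
# the description above only requires a 768-dimensional sentence embedding.
import numpy as np
from sentence_transformers import SentenceTransformer

text_encoder = SentenceTransformer("all-mpnet-base-v2")  # 768-dim embeddings

def encode_text_blocks(text_blocks, max_seq_len=1024):
    """Encode a list of text blocks and zero-pad to max_seq_len blocks."""
    embeddings = text_encoder.encode(text_blocks)        # (n_blocks, 768)
    padded = np.zeros((max_seq_len, embeddings.shape[1]), dtype=np.float32)
    padded[: len(text_blocks)] = embeddings
    return padded

query_blocks = ["Do you hold a current licence?", "Yes", "No"]
target_blocks = ["Application for approval", "Have you previously applied?", "Yes", "No"]
Qt = encode_text_blocks(query_blocks)    # (1024, 768)
Tt = encode_text_blocks(target_blocks)   # (1024, 768)
print(Qt.shape, Tt.shape)
```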
In some embodiments, the layout encoder 406A, 406B is implemented as a vision transformer (ViT). The layout encoder encodes bounding box (e.g., spatial) information in the target document and query snippet. In some embodiments, the layout encoder is implemented using an encoder-only transformer architecture with four layers, four heads, and hidden dimension of 1024. The layout encoder receives bounds of the target Tsinp and query snippet Qsinp of size BS×boxt×4, and BS×boxq×4, where boxt and boxq are the number of bounding boxes in the target and query, respectively. Similar to the text-encoder, boxt and boxq are padded to the maximum sequence length of 1024. In some embodiments, weights of this encoder are initialized randomly. The bounding box encoding can be denoted as follows:
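A rough sketch of such a layout encoder is shown below; the linear projection from the four box coordinates to the hidden dimension is an illustrative choice made for example purposes, consistent with the description above:

```python
# Minimal sketch (PyTorch): an encoder-only transformer over bounding boxes,
# with four layers, four heads, and a hidden dimension of 1024. The linear
# projection from the 4 box coordinates to the hidden size is an illustrative
# choice; weights are randomly initialized as described above.
import torch
import torch.nn as nn

class ToyLayoutEncoder(nn.Module):
    def __init__(self, hidden_dim=1024, num_layers=4, num_heads=4,
                 max_seq_len=1024):
        super().__init__()
        self.box_proj = nn.Linear(4, hidden_dim)   # (x0, y0, x1, y1) -> hidden
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.max_seq_len = max_seq_len

    def forward(self, boxes):                      # boxes: (BS, n_boxes, 4)
        pad = self.max_seq_len - boxes.size(1)     # pad box sequence to 1024
        boxes = nn.functional.pad(boxes, (0, 0, 0, pad))
        return self.encoder(self.box_proj(boxes))  # (BS, 1024, 1024)

layout_encoder = ToyLayoutEncoder()
target_boxes = torch.rand(2, 37, 4)                # 37 boxes in the target page
print(layout_encoder(target_boxes).shape)          # torch.Size([2, 1024, 1024])
```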
Once the feature sets (e.g., embeddings) are generated for the query snippet and the target document, as discussed above, the feature sets are provided to the feature fusion manager 110 for further processing. As shown in
As shown in
Similarly, the cross-attention module 411 includes two symmetric attention modules (e.g., 418 and 422) for generating spatio-visual features and two for attending text over those generated features (e.g., 420 and 424). The cross-attention module 411 generates cross-attention feature sets SqVtTt 426 and StVqTq 428, each having a sequence length of 1024 and a token size of 1024. The co-attention feature sets 416 are then combined with the cross-attention feature sets 426, 428 to create feature volume Fsim, which is provided to snippet detector 112. In some embodiments, the feature volume is formed by concatenating the co-attention feature sets 416 and cross-attention feature sets 426, 428. Fsim can be represented as:
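By way of illustration, the fusion step might be sketched with standard multi-head attention blocks as follows; the exact query/key/value wiring across modalities, the number of co-attention feature sets, and the common 1024×1024 feature shape are assumptions made for example purposes:

```python
# Minimal sketch (PyTorch): fusing per-modality query/target features with
# attention and concatenating the resulting co-attention and cross-attention
# feature sets into Fsim. The query/key/value wiring, the number of
# co-attention sets, and the common 1024-dimensional shape are illustrative
# assumptions (e.g., text features are assumed projected from 768 to 1024).
import torch
import torch.nn as nn

dim, heads, BS = 1024, 4, 2
attend = nn.MultiheadAttention(dim, heads, batch_first=True)

def fuse(queries, context):
    """Attend one feature set over another (one shared attention block is
    reused here purely for brevity)."""
    out, _ = attend(queries, context, context)
    return out                                     # (BS, 1024, 1024)

# Hypothetical per-modality features, all padded/projected to 1024 x 1024.
Qv, Tv = torch.randn(BS, 1024, dim), torch.randn(BS, 1024, dim)  # visual
Qs, Ts = torch.randn(BS, 1024, dim), torch.randn(BS, 1024, dim)  # spatial
Qt, Tt = torch.randn(BS, 1024, dim), torch.randn(BS, 1024, dim)  # textual

# Co-attention: combine like modalities across query and target
# (two co-attention sets are assumed here for illustration).
co_visual = fuse(Qv, Tv)
co_spatial = fuse(Qs, Ts)

# Cross-attention: combine unlike modalities, e.g. spatio-visual features
# with text attended over the result (the exact wiring is an assumption).
SqVtTt = fuse(fuse(Qs, Tv), Tt)
StVqTq = fuse(fuse(Ts, Qv), Qt)

# Concatenate the co-attention and cross-attention feature sets to form Fsim.
Fsim = torch.cat([co_visual, co_spatial, SqVtTt, StVqTq], dim=-1)
print(Fsim.shape)                                  # torch.Size([2, 1024, 4096])
```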
As shown, the feature volume Fsim is reshaped into a feature map Ffeat of shape BS×1024×64×64. Ffeat is then processed by a sequence of convolutional layers, each with a kernel size of 1, followed by LeakyReLU activation (slope=0.1), to output features at 4 different levels, with shapes BS×256×64×64, BS×512×64×64, BS×1024×64×64, and BS×2048×64×64. The hierarchical features are subsequently processed through a feature pyramid network (FPN) architecture, followed by a region proposal network (RPN) and region of interest (RoI) heads (e.g., as in Faster R-CNN) to obtain the final bounding boxes. The FPN returns features at a common representation size of 1024. The RPN outputs proposed regions of the target document that are predicted to be most similar to the query snippet. The RoI head then outputs bounding boxes corresponding to the predicted regions, as shown at 432.
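A sketch of the hierarchical feature construction and FPN stage might look as follows; torchvision's FeaturePyramidNetwork is used as a stand-in, and the RPN and RoI heads that produce the final bounding boxes are omitted for brevity:

```python
# Minimal sketch (PyTorch): turning the fused feature map Ffeat
# (BS x 1024 x 64 x 64) into four hierarchical levels with 1x1 convolutions
# and LeakyReLU (slope 0.1), then running a feature pyramid network that
# returns features at a common channel size of 1024. torchvision's
# FeaturePyramidNetwork is used as a stand-in; the RPN and RoI heads
# (e.g., Faster R-CNN style) that produce the final boxes are omitted.
from collections import OrderedDict
import torch
import torch.nn as nn
from torchvision.ops import FeaturePyramidNetwork

class ToyDetectionNeck(nn.Module):
    def __init__(self, in_channels=1024, level_channels=(256, 512, 1024, 2048)):
        super().__init__()
        self.levels = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_channels, c, kernel_size=1),
                          nn.LeakyReLU(0.1))
            for c in level_channels
        )
        self.fpn = FeaturePyramidNetwork(list(level_channels), out_channels=1024)

    def forward(self, ffeat):                      # ffeat: (BS, 1024, 64, 64)
        features = OrderedDict(
            (f"level{i}", layer(ffeat)) for i, layer in enumerate(self.levels)
        )
        return self.fpn(features)                  # dict of (BS, 1024, 64, 64)

neck = ToyDetectionNeck()
ffeat = torch.randn(2, 1024, 64, 64)
for name, feat in neck(ffeat).items():
    print(name, feat.shape)
```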
The first technique used is Balanced and Hierarchical Relation Learning (BHRL). As shown at 604, BHRL did not identify any matching snippets in the target document. The second technique is LayoutLMv3 which uses a pretrained model to perform various document AI tasks. As shown at 606, LayoutLMv3 results in a bounding box covering part of the chart in the target document along with a significant portion of the target document's text. While LayoutLMv3 identified a match, it was very imprecise. However, as shown at 608, the embodiments described herein were able to correctly identify the chart of the target document as matching the query snippet 600. In general, embodiments were found to predict correct bounds of matching snippets while making fewer extraneous predictions than prior techniques.
As illustrated in
Additionally, the user interface manager 702 allows users to request the document search system 700 to search the target document for snippets matching the query snippet. For example, the user can select a query snippet using the user interface. In some embodiments, this selection may be made by selecting a region of a document that includes the query snippet. Selection may be performed by drawing the region (e.g., using a box tool, a free hand tool, etc.). The user may then request that the document search system search the target document for similar snippets to the query snippet. The document search system may then perform the techniques described herein to identify matching snippets.
As illustrated in
As discussed, feature extractor 708 may include a plurality of encoders (e.g., text encoders, image encoders, spatial encoders, etc.) which receive the query snippet and the target document and generate, e.g., text features, image features, and spatial features that represent the query snippet and target document. The encoders may be implemented as neural networks, such as transformers or networks of transformers, as discussed above. Once the features have been generated for the query snippet and the target document, the features are provided to feature fusion manager 710.
As discussed, the feature fusion manager 710 may include a plurality of transformer networks, including a co-attention module and a cross-attention module. The co-attention module combines like features from the query snippet and the target document, and the cross-attention module combines unlike features from the query snippet and the target document. The resulting feature sets are then combined to form a feature volume that is provided to snippet detector 712.
As discussed, snippet detector 712 may include a detection head which generates hierarchical features from the feature volume received from the feature fusion manager 710. In some embodiments, the hierarchical features are subsequently processed through a feature pyramid network (FPN) architecture, followed by a region proposal network (RPN) and region of interest (RoI) heads (e.g., as in Faster R-CNN) to obtain the final bounding boxes. The FPN returns features at a common representation size of 1024. The RPN outputs proposed regions of the target document that are predicted to be most similar to the query snippet. The RoI head then outputs bounding boxes corresponding to the predicted regions. In some embodiments, the bounding boxes are used to create an augmented target document by overlaying the bounding boxes on the target document.
Although depicted in
As illustrated in
As further illustrated in
Each of the components 702-706 of the document search system 700 and their corresponding elements (as shown in
The components 702-706 and their corresponding elements can comprise software, hardware, or both. For example, the components 702-706 and their corresponding elements can comprise one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices. When executed by the one or more processors, the computer-executable instructions of the document search system 700 can cause a client device and/or a server device to perform the methods described herein. Alternatively, the components 702-706 and their corresponding elements can comprise hardware, such as a special purpose processing device to perform a certain function or group of functions. Additionally, the components 702-706 and their corresponding elements can comprise a combination of computer-executable instructions and hardware.
Furthermore, the components 702-706 of the document search system 700 may, for example, be implemented as one or more stand-alone applications, as one or more modules of an application, as one or more plug-ins, as one or more library functions or functions that may be called by other applications, and/or as a cloud-computing model. Thus, the components 702-706 of the document search system 700 may be implemented as a stand-alone application, such as a desktop or mobile application. Furthermore, the components 702-706 of the document search system 700 may be implemented as one or more web-based applications hosted on a remote server. Alternatively, or additionally, the components of the document search system 700 may be implemented in a suite of mobile device applications or “apps.”
As shown, the document search system 700 can be implemented as a single system. In other embodiments, the document search system 700 can be implemented in whole, or in part, across multiple systems. For example, one or more functions of the document search system 700 can be performed by one or more servers, and one or more functions of the document search system 700 can be performed by one or more client devices. The one or more servers and/or one or more client devices may generate, store, receive, and transmit any type of data used by the document search system 700, as described herein.
In one implementation, the one or more client devices can include or implement at least a portion of the document search system 700. In other implementations, the one or more servers can include or implement at least a portion of the document search system 700. For instance, the document search system 700 can include an application running on the one or more servers or a portion of the document search system 700 can be downloaded from the one or more servers. Additionally or alternatively, the document search system 700 can include a web hosting application that allows the client device(s) to interact with content hosted at the one or more server(s).
For example, upon a client device accessing a webpage or other web application hosted at the one or more servers, in one or more embodiments, the one or more servers can provide access to the document search system. The client device can receive a request (i.e., via user input) to search for content in one or more target documents that match a query snippet, and provide the request to the one or more servers. As discussed, the query snippet and target documents may be provided to the document search system to conduct a search. Upon receiving the request, the one or more servers can automatically perform the methods and processes described above to identify matching content in the target document(s). The one or more servers can provide all or portions of the matching content to the client device for display to the user.
The server(s) and/or client device(s) may communicate using any communication platforms and technologies suitable for transporting data and/or communication signals, including any known communication technologies, devices, media, and protocols supportive of remote data communications, examples of which will be described in more detail below with respect to
The server(s) may include one or more hardware servers (e.g., hosts), each with its own computing resources (e.g., processors, memory, disk space, networking bandwidth, etc.) which may be securely divided between multiple customers (e.g., client devices), each of which may host their own applications on the server(s). The client device(s) may include one or more personal computers, laptop computers, mobile devices, mobile phones, tablets, special purpose computers, TVs, or other computing devices, including computing devices described below with regard to
As illustrated in
In some embodiments, the method may further include extracting, by a plurality of encoders, the first multi-modal features from the query snippet and the second multi-modal features from the target document. As discussed, the document search system may include a feature extractor that includes multiple encoders, each corresponding to a different modality being encoded. In some embodiments, the plurality of encoders includes one or more of a text encoder, an image encoder, and a layout encoder.
As illustrated in
For example, in some embodiments, combining the multi-modal features includes obtaining a first plurality of feature vectors from the first multi-modal features, wherein each feature vector from the first plurality of feature vectors is associated with a different feature type and obtaining a second plurality of feature vectors from the second multi-modal features, wherein the second plurality of feature vectors include feature vectors corresponding to the feature types of the first plurality of feature vectors. A co-attention module generates a plurality of co-attention feature sets by combining feature vectors of like feature types from the first plurality of feature vectors and the second plurality of feature vectors.
Additionally, in some embodiments, combining the multi-modal features includes obtaining the first plurality of feature vectors from the first multi-modal features and obtaining the second plurality of feature vectors from the second multi-modal features. A cross-attention module generates a plurality of cross-attention feature sets by combining feature vectors of unlike feature types from the first plurality of feature vectors and the second plurality of feature vectors. A feature volume is then generated by combining the plurality of co-attention feature sets with the plurality of cross-attention feature sets. For example, the co-attention feature sets and the cross-attention feature sets may be concatenated.
As illustrated in
Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein). In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.
Computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are non-transitory computer-readable storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the disclosure can comprise at least two distinctly different kinds of computer-readable media: non-transitory computer-readable storage media (devices) and transmission media.
Non-transitory computer-readable storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other non-transitory storage medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to non-transitory computer-readable storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. Thus, it should be understood that non-transitory computer-readable storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. In some embodiments, computer-executable instructions are executed on a general-purpose computer to turn the general-purpose computer into a special purpose computer implementing elements of the disclosure. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Embodiments of the present disclosure can also be implemented in cloud computing environments. In this description, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources. For example, cloud computing can be employed in the marketplace to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources. The shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
A cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model can also expose various service models, such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”). A cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a “cloud-computing environment” is an environment in which cloud computing is employed.
In particular embodiments, processor(s) 902 includes hardware for executing instructions, such as those making up a computer program. As an example, and not by way of limitation, to execute instructions, processor(s) 902 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 904, or a storage device 908 and decode and execute them. In various embodiments, the processor(s) 902 may include one or more central processing units (CPUs), graphics processing units (GPUs), field programmable gate arrays (FPGAs), systems on chip (SoC), or other processor(s) or combinations of processors.
The computing device 900 includes memory 904, which is coupled to the processor(s) 902. The memory 904 may be used for storing data, metadata, and programs for execution by the processor(s). The memory 904 may include one or more of volatile and non-volatile memories, such as Random Access Memory (“RAM”), Read Only Memory (“ROM”), a solid state disk (“SSD”), Flash, Phase Change Memory (“PCM”), or other types of data storage. The memory 904 may be internal or distributed memory.
The computing device 900 can further include one or more communication interfaces 906. A communication interface 906 can include hardware, software, or both. The communication interface 906 can provide one or more interfaces for communication (such as, for example, packet-based communication) between the computing device and one or more other computing devices 900 or one or more networks. As an example and not by way of limitation, communication interface 906 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as WI-FI. The computing device 900 can further include a bus 912. The bus 912 can comprise hardware, software, or both that couples components of computing device 900 to each other.
The computing device 900 includes a storage device 908 that includes storage for storing data or instructions. As an example, and not by way of limitation, storage device 908 can comprise a non-transitory storage medium described above. The storage device 908 may include a hard disk drive (HDD), flash memory, a Universal Serial Bus (USB) drive, or a combination of these or other storage devices. The computing device 900 also includes one or more input or output (“I/O”) devices/interfaces 910, which are provided to allow a user to provide input to (such as user strokes), receive output from, and otherwise transfer data to and from the computing device 900. These I/O devices/interfaces 910 may include a mouse, keypad or a keyboard, a touch screen, camera, optical scanner, network interface, modem, other known I/O devices or a combination of such I/O devices/interfaces 910. The touch screen may be activated with a stylus or a finger.
The I/O devices/interfaces 910 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain embodiments, I/O devices/interfaces 910 is configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
In the foregoing specification, embodiments have been described with reference to specific exemplary embodiments thereof. Various embodiments are described with reference to details discussed herein, and the accompanying drawings illustrate the various embodiments. The description above and drawings are illustrative of one or more embodiments and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of various embodiments.
Embodiments may be embodied in other specific forms without departing from their spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. For example, the methods described herein may be performed with fewer or more steps/acts or the steps/acts may be performed in differing orders. Additionally, the steps/acts described herein may be repeated or performed in parallel with one another or in parallel with different instances of the same or similar steps/acts. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
In the various embodiments described above, unless specifically noted otherwise, disjunctive language such as the phrase “at least one of A, B, or C,” is intended to be understood to mean either A, B, or C, or any combination thereof (e.g., A, B, and/or C). As such, disjunctive language is not intended to, nor should it be understood to, imply that a given embodiment requires at least one of A, at least one of B, or at least one of C to each be present.