The present invention relates generally to the field of data processing, and more particularly to generating a personal corpus, which consists of a knowledge of individual summaries of an invention and conventional technology.
Aspects of an embodiment of the present invention disclose a method, computer program product, and computer system for generating a user-specific personal corpus. A processor creates a basic corpus for a first user using a first set of data sources, wherein the basic corpus includes one or more basic words and one or more vectors of the one or more basic words. A processor extracts a set of text from a second set of data sources associated with the first user. Responsive to finding an unknown word included in the set of text extracted, a processor updates the basic corpus, wherein the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus.
In some aspects of an embodiment of the present invention, a processor tags each basic word of the one or more basic words with a flag.
In some aspects of an embodiment of the present invention, a processor separates a first basic word from the basic corpus if the basic word is polysemous. A processor clusters the first basic word with a second basic word based on a degree of similarity.
In some aspects of an embodiment of the present invention, the second set of data sources includes at least one of a group of historical information acquired from a user computing device of the first user and a group of information input into the user computing device by the first user.
In some aspects of an embodiment of the present invention, the group of historical information acquired from the user computing device of the first user and the group of information input into the user computing device by the first user includes at least one of a web browsing history of the user computing device, an email history of the user computing device, a chat history of the user computing device, and a text message history of the user computing device.
In some aspects of an embodiment of the present invention, a processor divides the set of text into one or more words using morphological analysis. A processor creates a first word group from the set of text.
In some aspects of an embodiment of the present invention, subsequent to extracting the set of text from the second set of data sources associated with the first user, a processor processes the unknown word from the first word group created. A processor processes a known word from the first word group created.
In some aspects of an embodiment of the present invention, a processor extracts a third basic word from the first word group created. A processor classifies the third basic word into a basic word group. A processor calculates the average vector for the basic word group.
In some aspects of an embodiment of the present invention, responsive to finding the known word included in the set of text extracted, a processor determines a distance between a vector of the known word and the average vector for the basic word group. Responsive to determining the distance does exceed a first threshold, a processor registers the known word in the first personal corpus as a polysemous word. Responsive to determining the distance does not exceed the first threshold, a processor updates the vector of the known word by replacing the vector of the known word with an average of the vector of the known word and the average vector for the basic word group.
In some aspects of an embodiment of the present invention, a processor obtains a plurality of unique words other than the basic words from the second set of data sources associated with the first user. A processor determines that, among the plurality of unique words, one or more common words are included in a second personal corpus of a second user. A processor extracts a second word group and a third word group having a vector close to a common word of the one or more common words included in the first personal corpus of the first user and the second personal corpus of the second user, respectively. Responsive to the similarity between the second word group and the third word group not exceeding a second threshold, a processor sends a notification to the first user, or a processor sends a word that has the vector close to the common word and is selected from the first word group to the second user together with a set of textual information.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the example embodiments of the present invention.
Embodiments of the present invention recognize that an individual word or phrase can be used (in different contexts) to express two or more different meanings. This is referred to as polysemy. Polysemy is distinguished from simple homonyms (i.e., where words sound alike but have different meanings) by etymology. For example, the word dish is a polysemous word. Dish may mean a kind of plate (e.g., “It is your turn to wash the dishes.”). Dish may also mean a meal (e.g., “How long does it take to cook this dish?”).
Embodiments of the present invention recognize that polysemous words may create communication issues between two or more communicating parties. For example, an issue may arise when a word has multiple meanings and the meaning of the word differs among the two or more communicating parties. This is true even when the same word is included in the corpora of the two or more communicating parties. Therefore, embodiments of the present invention recognize the need for a system and method to compare the personal corpora of the two or more communicating parties and to detect any differences in the meaning of a word between the two or more communicating parties.
Embodiments of the present invention provide a system and method to generate a user-specific personal corpus. Embodiments of the present invention provide a system and method to compare the personal corpora of the two or more communicating parties to detect differences in the meaning of a word contained in each personal corpus. A personal corpus is a personal database of a set of words that a user knows. Each personal corpus can be built from various sources of information including, but not limited to, a web browsing history, an email history, a chat history, and a text message history. Embodiments of the present invention detect differences in the meaning of a word by selecting, from a vector-based corpus, the words close to the subject word used in the conversation, and by determining the similarity of the selected words between parties. Embodiments of the present invention send a notification to either or both of the two or more communicating parties if the similarity of the vectors of the word in the communicating party's personal corpus falls below a threshold, indicating a word may have a different meaning. Embodiments of the present invention update each personal corpus by replacing the vector of an unknown word, which is included in the extracted word group but not included in a basic corpus, with the average vector of the basic words included in the basic corpus, and adding the unknown word to each personal corpus.
Implementation of embodiments of the present invention may take a variety of forms, and exemplary implementation details are discussed subsequently with reference to the Figures.
Network 110 operates as a computing network that can be, for example, a telecommunications network, a local area network (LAN), a wide area network (WAN), such as the Internet, or a combination of the three, and can include wired, wireless, or fiber optic connections. Network 110 can include one or more wired and/or wireless networks capable of receiving and transmitting data, voice, and/or video signals, including multimedia signals that include data, voice, and video information. In general, network 110 can be any combination of connections and protocols that will support communications between server 120, user computing devices 1301-N, and other computing devices (not shown) within distributed data processing environment 100.
Server 120 operates to run personal corpus creation program 122 and to send and/or store data in database 124. In an embodiment, server 120 can send data from database 124 to user computing devices 1301-N. In an embodiment, server 120 can receive data in database 124 from user computing devices 1301-N. In one or more embodiments, server 120 can be a standalone computing device, a management server, a web server, a mobile computing device, or any other electronic device or computing system capable of receiving, sending, and processing data and capable of communicating with user computing devices 1301-N via network 110. In one or more embodiments, server 120 can be a computing system utilizing clustered computers and components (e.g., database server computers, application server computers, etc.) that act as a single pool of seamless resources when accessed within distributed data processing environment 100, such as in a cloud computing environment. In one or more embodiments, server 120 can be a laptop computer, a tablet computer, a netbook computer, a personal computer, a desktop computer, a personal digital assistant, a smart phone, or any programmable electronic device capable of communicating with user computing devices 1301-N and other computing devices (not shown) within distributed data processing environment 100 via network 110. Server 120 may include internal and external hardware components, as depicted and described in further detail in
Personal corpus creation program 122 operates to generate a user-specific personal corpus. In the depicted embodiment, personal corpus creation program 122 is a standalone program. In another embodiment, personal corpus creation program 122 may be integrated into another software product, such as a communication software (i.e., an application designed to share information from one system to another, e.g., an application used for such tasks as file transfers or an application used for such tasks as instant messaging and video conferencing). In the depicted embodiment, personal corpus creation program 122 resides on server 120. In another embodiment, personal corpus creation program 122 may reside on user computing devices 1301-N or on another computing device (not shown), provided that personal corpus creation program 122 has access to network 110. The operational steps of personal corpus creation program 122 are depicted and described in further detail with respect to
In an embodiment, a user of user computing devices 1301-N registers with personal corpus creation program 122 of server 120. For example, the user completes a registration process (e.g., user validation), provides information to create a user profile, and authorizes the collection, analysis, and distribution (i.e., opts-in) of relevant data on identified computing devices (e.g., on user computing devices 1301-N) by server 120 (e.g., via personal corpus creation program 122). Relevant data includes, but is not limited to, personal information or data provided by the user or inadvertently provided by the user's device without the user's knowledge; tagged and/or recorded location information of the user (e.g., to infer context (i.e., time, place, and usage) of a location or existence); time stamped temporal information (e.g., to infer contextual reference points); and specifications pertaining to the software or hardware of the user's device. In an embodiment, the user opts-in or opts-out of certain categories of data collection. For example, the user can opt-in to provide all requested information, a subset of requested information, or no information. In one example scenario, the user opts-in to provide time-based information, but opts-out of providing location-based information (on all or a subset of computing devices associated with the user). In an embodiment, the user opts-in or opts-out of certain categories of data analysis. In an embodiment, the user opts-in or opts-out of certain categories of data distribution. Such preferences can be stored in database 124.
Database 124 operates as a repository for data received, used, and/or generated by personal corpus creation program 122. A database is an organized collection of data. Data includes, but is not limited to, information about user preferences (e.g., general user system settings such as alert notifications for user computing devices 1301-N); information about alert notification preferences; a user-specific profile; a user-specific corpus C; and any other data received, used, and/or generated by personal corpus creation program 122.
Database 124 can be implemented with any type of device capable of storing data and configuration files that can be accessed and utilized by server 120, such as a hard disk drive, a database server, or a flash memory. In an embodiment, database 124 is accessed by personal corpus creation program 122 to store and/or to access the data. In the depicted embodiment, database 124 resides on server 120. In another embodiment, database 124 may reside on another computing device, server, cloud server, or spread across multiple devices elsewhere (not shown) within distributed data processing environment 100, provided that personal corpus creation program 122 has access to database 124.
The present invention may contain various accessible data sources, such as database 124, that may include personal and/or confidential company data, content, or information the user wishes not to be processed. Processing refers to any operation, automated or unautomated, or set of operations such as collecting, recording, organizing, structuring, storing, adapting, altering, retrieving, consulting, using, disclosing by transmission, dissemination, or otherwise making available, combining, restricting, erasing, or destroying personal and/or confidential company data. Personal corpus creation program 122 enables the authorized and secure processing of personal data.
Personal corpus creation program 122 provides informed consent, with notice of the collection of personal and/or confidential data, allowing the user to opt-in or opt-out of processing personal and/or confidential data. Consent can take several forms. Opt-in consent can impose on the user to take an affirmative action before personal and/or confidential data is processed. Alternatively, opt-out consent can impose on the user to take an affirmative action to prevent the processing of personal and/or confidential data before personal and/or confidential data is processed. Personal corpus creation program 122 provides information regarding personal and/or confidential data and the nature (e.g., type, scope, purpose, duration, etc.) of the processing. Personal corpus creation program 122 provides the user with copies of stored personal and/or confidential company data. Personal corpus creation program 122 allows the correction or completion of incorrect or incomplete personal and/or confidential data. Personal corpus creation program 122 allows for the immediate deletion of personal and/or confidential data.
User computing devices 1301-N each operate to run user interfaces 1321-N, respectively, through which a user can interact with personal corpus creation program 122 on server 120, and to store data in and/or send data from local databases 1341-N. As used herein, N represents a positive integer, and accordingly the number of scenarios implemented in a given embodiment of the present invention is not limited to those depicted in
User interfaces 1321-N operate as a local user interface between personal corpus creation program 122 on server 120 and a user of user computing devices 1301-N. In some embodiments, user interfaces 1321-N are a graphical user interface (GUI), a web user interface (WUI), and/or a voice user interface (VUI) that can display (i.e., visually) or present (i.e., audibly) text, documents, web browser windows, user options, application interfaces, and instructions for operations sent from personal corpus creation program 122 to a user via network 110. User interfaces 1321-N can also display or present alerts including information (such as graphics, text, and/or sound) sent from personal corpus creation program 122 to a user via network 110. In an embodiment, user interfaces 1321-N are capable of sending and receiving data (i.e., to and from personal corpus creation program 122 via network 110, respectively). Through user interfaces 1321-N, a user can opt-in to personal corpus creation program 122; create a user profile; set user preferences and alert notification preferences; utilize web browsing, email, chat, and text messaging; receive alert notifications; receive a request for feedback; and input feedback.
A user preference is a setting that can be customized for a particular user. A set of default user preferences are assigned to each user of personal corpus creation program 122. A user preference editor can be used to update values to change the default user preferences. User preferences that can be customized include, but are not limited to, general user system settings, specific user profile settings, alert notification settings, and machine-learned data collection/storage settings. Machine-learned data is a user's personalized corpus of data. Machine-learned data includes, but is not limited to, past results of iterations of personal corpus creation program 122.
Local databases 1341-N operate as a repository for a user-specific profile and corpus C. Local databases 1341-N can be implemented with any type of device capable of storing data and configuration files that can be accessed and utilized by server 120, such as a hard disk drive, a database server, or a flash memory. In an embodiment, local databases 1341-N are each accessed by personal corpus creation program 122 to store and/or to access the data. In the depicted embodiment, local databases 1341-N reside on respective user computing devices 1301-N. In another embodiment, local databases 1341-N may reside on another computing device, server, cloud server, or spread across multiple devices elsewhere (not shown) within distributed data processing environment 100, provided that personal corpus creation program 122 has access to local databases 1341-N.
In step 210, personal corpus creation program 122 creates a basic corpus C for a first user. In an embodiment, personal corpus creation program 122 creates the basic corpus C using a first set of data sources. The first set of data sources may include, but is not limited to, sources that consist of only common terms (e.g., Wikipedia®). The basic corpus C may include, but is not limited to, one or more basic words and one or more vectors of the one or more basic words.
In an embodiment, personal corpus creation program 122 tags each basic word in the basic corpus C with a flag. The flag denotes each basic word as “is_general_word=true”.
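By way of illustration only, basic corpus C and the flag described above might be represented as in the following minimal sketch. The class names `CorpusEntry` and `Corpus`, the method `add_basic_word`, and the three-dimensional example vectors are hypothetical and are not part of the described embodiments; any word-embedding model could supply the vectors.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CorpusEntry:
    """One word in the corpus: its vector plus the general-word flag."""
    vector: List[float]
    is_general_word: bool = True


@dataclass
class Corpus:
    """Maps each word to its entry; starts out as basic corpus C."""
    entries: Dict[str, CorpusEntry] = field(default_factory=dict)

    def add_basic_word(self, word: str, vector: List[float]) -> None:
        # Basic words come from the first set of data sources and are
        # tagged is_general_word=true.
        self.entries[word] = CorpusEntry(list(vector), is_general_word=True)

    def __contains__(self, word: str) -> bool:
        return word in self.entries


# Illustrative vectors only; real vectors would come from an embedding model.
basic_corpus = Corpus()
basic_corpus.add_basic_word("plate", [0.9, 0.1, 0.0])
basic_corpus.add_basic_word("meal", [0.1, 0.9, 0.0])
```

A word that is absent from `basic_corpus` (for example, "dish" in this sketch) corresponds to an unknown word in the later steps.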
In an embodiment, if the basic corpus C contains a basic word that is polysemous (i.e., the basic word can be used in different contexts to express two or more different meanings), then personal corpus creation program 122 separates the basic word from the basic corpus C (i.e., as a separate entry or as a separate vector). In an embodiment, personal corpus creation program 122 clusters the vectors of the basic words separated from the basic corpus C based on a degree of similarity (i.e., between the basic words separated from the basic corpus C). In an embodiment, personal corpus creation program 122 clusters the vectors of the basic words separated from the basic corpus C using one or more existing techniques known in the art. The one or more existing techniques known in the art may include, but are not limited to, Word2Vec. Clustering of the vectors of the basic words separated from the basic corpus C is optional and may or may not be performed by personal corpus creation program 122.
In step 220, responsive to the first user (via user computing device 1301) preparing a communication to a second user (via user computing device 130N), personal corpus creation program 122 extracts a set of text X from the communication. In another embodiment, personal corpus creation program 122 extracts a set of text X from a second set of data sources. The second set of data sources may include, but are not limited to, a group of historical information that the first user acquires from the first user computing device (e.g., user computing device 1301) and a group of information that the first user inputs into the first user computing device (e.g., user computing device 1301). The information may include, but is not limited to, a web browsing history (i.e., a list of web pages the first user has visited as well as associated metadata such as a page title and a time of visit), an email history (e.g., a list of emails sent and a list of emails received), chat history (e.g., a list of chat messages sent and a list of chat messages received), text message history (e.g., a list of written and/or voice text messages sent (including text messages written by the first user) and a list of written and/or voice text messages received (including text messages read by the first user)), and a set of text the first user read when using a Head-Mounted Display.
In an embodiment, personal corpus creation program 122 divides the set of text X into individual words using morphological analysis. In an embodiment, personal corpus creation program 122 creates a word group W from the set of text X extracted from an entire page. In another embodiment, personal corpus creation program 122 creates a word group W from the set of text X extracted from a pre-defined window size.
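The division of the set of text X into words and the creation of word group W might be sketched as follows. This is an illustration under stated assumptions: a simple regular expression stands in for morphological analysis (a real morphological analyzer would be needed for languages without whitespace word boundaries), and the function names `split_into_words` and `word_groups` and the use of non-overlapping windows are hypothetical.

```python
import re
from typing import List


def split_into_words(text: str) -> List[str]:
    # Stand-in for morphological analysis: lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_groups(words: List[str], window_size: int) -> List[List[str]]:
    # Non-overlapping windows; the last window may be shorter.
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [words[i:i + window_size] for i in range(0, len(words), window_size)]


text_x = "How long does it take to cook this dish?"
words = split_into_words(text_x)

whole_page_group = words                 # word group W from the entire page
windowed_groups = word_groups(words, 4)  # word groups W from a fixed window size
```

A smaller window keeps each word group W closer to a single context, which matters later when the average vector of the basic words in the group is used to characterize that context.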
In step 230, personal corpus creation program 122 processes any unknown words from word group W. In an embodiment, personal corpus creation program 122 processes any unknown words from word group W by extracting any basic words from word group W. Basic words are flagged as “is_general_word=true”. In an embodiment, if a basic word extracted is stored in basic corpus C (i.e., a known word), personal corpus creation program 122 classifies the basic word extracted into a basic word group BW. In an embodiment, if a basic word extracted is not stored in basic corpus C (i.e., an unknown word), personal corpus creation program 122 classifies the basic word extracted into an unknown word group NW.
In an embodiment, personal corpus creation program 122 obtains a vector for each basic word in basic word group BW. In an embodiment, personal corpus creation program 122 calculates the average vector VBW (i.e., for all of the basic words in basic word group BW). In an embodiment, personal corpus creation program 122 saves the average vector VBW as the vector of each unknown word n ∈ NW. If there is more than one unknown word in unknown word group NW, the vector for all of the unknown words n ∈ NW is the same. To avoid this, the average vector VBW may be multiplied by a random number for each unknown word to fine-tune the vector.
In an embodiment, personal corpus creation program 122 replaces the vector for each basic word extracted but not stored in basic corpus C (i.e., an unknown word) with the average vector VBW. In an embodiment, personal corpus creation program 122 registers each basic word extracted but not stored in basic corpus C (i.e., an unknown word) in the basic corpus C. By registering the basic words extracted but not stored in basic corpus C (i.e., an unknown word) in basic corpus C, the basic words become known words. Additionally, by registering the basic words extracted but not stored in basic corpus C (i.e., unknown words) to basic corpus C, basic corpus C becomes a personal corpus (i.e., a corpus personal to the first user).
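The processing of unknown words in step 230 might be sketched as follows. The sketch rests on several assumptions that the description does not fix: the corpus is held as a dictionary from word to a (vector, flag) pair, the random multiplier is drawn uniformly from a narrow range around 1 (the `jitter` parameter is hypothetical), and newly registered words are flagged as non-basic.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical corpus layout: word -> (vector, is_general_word flag).
Corpus = Dict[str, Tuple[Vector, bool]]


def average_vector(vectors: List[Vector]) -> Vector:
    # Component-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def register_unknown_words(corpus: Corpus, word_group: List[str],
                           rng: random.Random, jitter: float = 0.05) -> List[str]:
    unique_words = list(dict.fromkeys(word_group))  # keep order, drop repeats
    # Basic word group BW: basic words of W that are stored in the corpus.
    bw = [w for w in unique_words if w in corpus and corpus[w][1]]
    # Unknown word group NW: words of W that are not stored in the corpus.
    nw = [w for w in unique_words if w not in corpus]
    if not bw or not nw:
        return []
    v_bw = average_vector([corpus[w][0] for w in bw])
    for word in nw:
        # Multiply V_BW by a random number so unknown words do not all
        # share exactly the same vector (the range is an assumption).
        scale = rng.uniform(1.0 - jitter, 1.0 + jitter)
        corpus[word] = ([scale * x for x in v_bw], False)
    return nw


corpus = {"wash": ([1.0, 0.0], True), "plate": ([0.0, 1.0], True)}
added = register_unknown_words(corpus, ["wash", "plate", "dish"], random.Random(0))
```

After the call, "dish" is stored in the corpus with a vector close to the average of "wash" and "plate", so the corpus has become personal to the user whose text supplied the word group.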
In an embodiment, personal corpus creation program 122 sends an alert notification to the first user (via user computing device 1301). In an embodiment, personal corpus creation program 122 sends an alert notification to the first user, notifying the first user of the unknown word. In another embodiment, personal corpus creation program 122 sends an alert notification to the second user (via user computing device 130N), notifying the second user of the unknown word (i.e., to teach the second user the meaning of the unknown word).
In step 240, personal corpus creation program 122 processes any known words and any polysemous word in word group W. A known word is a non-basic word stored in basic corpus C. A non-basic word is flagged as “is_general_word=false”. A group of known words are treated as a known word group KW. A known word is also a word that was not originally included in basic corpus C, but later added to basic corpus C. A polysemous word is identified by determining whether the word is included in a circle (i.e., an ellipse) encompassing a cluster of words. In an embodiment, personal corpus creation program 122 processes any known words and any polysemous words in word group W by determining the distance between the existing VKW, which is the vector of the known word k stored in corpus C, and VBW, which is the average vector of the basic word group BW.
In decision step 250, personal corpus creation program 122 determines whether the distance between the existing VKW and VBW exceeds a predetermined threshold TD. If personal corpus creation program 122 determines the distance between the existing VKW and VBW does not exceed the predetermined threshold TD (decision step 250, NO branch), then personal corpus creation program 122 proceeds to step 260, updating the vector of a known word in word group W. If personal corpus creation program 122 determines the distance between the existing VKW and VBW does exceed the predetermined threshold TD (decision step 250, YES branch), then personal corpus creation program 122 proceeds to step 270, adding a known word to the personal corpus.
In step 260, responsive to determining the distance between the existing VKW and VBW does not exceed the predetermined threshold TD, personal corpus creation program 122 updates the vector of a known word (i.e., k ∈ KW) in word group W. In an embodiment, personal corpus creation program 122 updates the vector of a known word in word group W with VKW. In an embodiment, personal corpus creation program 122 updates the vector of a known word in word group W by replacing the vector with an average of the existing VKW, which is the vector of a known word k stored in corpus C, and VBW, which is the average vector of the basic word group BW.
In an embodiment, personal corpus creation program 122 sends an alert notification to the first user (via user computing device 1301). In an embodiment, personal corpus creation program 122 sends an alert notification to the first user, notifying the first user of the difference in perception of the known word.
In step 270, responsive to determining the distance between the existing VKW and VBW does exceed the predetermined threshold TD, personal corpus creation program 122 registers the known word in the personal corpus. In an embodiment, if there is more than one polysemous word, personal corpus creation program 122 subjects the closest polysemous word of the more than one polysemous word to the calculation (i.e., determining whether the distance between the existing VKW and VBW exceeds the predetermined threshold TD). In an embodiment, personal corpus creation program 122 selects a polysemous word.
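Decision step 250 together with steps 260 and 270 might be sketched as follows. The description does not name a distance metric or state which vector is stored for a newly registered meaning, so this sketch assumes Euclidean distance and stores VBW as the vector of the new meaning; the function name `process_known_word` and the `senses` layout (one vector per recorded meaning of a word) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def process_known_word(senses: Dict[str, List[Vector]], word: str,
                       v_bw: Vector, threshold_td: float) -> str:
    # senses maps a known word to one vector per recorded meaning.
    vectors = senses[word]
    # With more than one polysemous entry, test the one closest to V_BW.
    idx = min(range(len(vectors)), key=lambda i: math.dist(vectors[i], v_bw))
    v_kw = vectors[idx]
    if math.dist(v_kw, v_bw) > threshold_td:
        # Step 270 (YES branch): register the word as polysemous.
        # Storing V_BW as the new meaning's vector is an assumption.
        vectors.append(list(v_bw))
        return "registered_polysemous"
    # Step 260 (NO branch): replace V_KW with the average of V_KW and V_BW.
    vectors[idx] = [(a + b) / 2.0 for a, b in zip(v_kw, v_bw)]
    return "updated"


senses = {"dish": [[1.0, 0.0]]}
first = process_known_word(senses, "dish", [0.8, 0.0], threshold_td=0.5)
second = process_known_word(senses, "dish", [0.0, 1.0], threshold_td=0.5)
```

In this example the first context lies within the threshold, so the existing vector is averaged toward it; the second context lies far from the existing vector, so a second meaning is registered for the same word.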
In an embodiment, personal corpus creation program 122 updates the frequency f of the word selected among the polysemous words. The frequency f is a parameter representing priority among polysemous words. The frequency f may be the sum of the number of times the word is used (i.e., in word group W), or a separate formula may be created to allow an administrator to optimize it. The user's occupation and other information regarding the user may be used as a reference when calculating the frequency f (and proficiency) of a word. For example, Information Technology engineers may use DI to indicate Dependency Injection. The frequency f (and proficiency) of DI (indicating Dependency Injection) is set to be greater than that of DI (indicating Diffusion Index). Generally, the higher the frequency f, the more frequently the word is used and/or seen by the user. In an embodiment, personal corpus creation program 122 stores the frequency f of a word in a database (e.g., database 124). In another embodiment, personal corpus creation program 122 stores the frequency f of a word as fields in a relational database (RDB). An RDB is a collective set of multiple data sets organized by tables, records, and columns. In another embodiment, personal corpus creation program 122 uses the frequency f of a word as one of the components of a vector.
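As one possible illustration of the frequency f acting as a priority among polysemous words, the following sketch counts uses per (word, meaning) pair and prefers the meaning with the highest count. The in-memory `Counter`, the function names `record_use` and `preferred_sense`, and the example counts are hypothetical; an embodiment may instead store f in database 124 or compute it with a different formula.

```python
from collections import Counter
from typing import Optional

# Frequency f per (word, meaning) pair; an in-memory stand-in for a database.
frequency: Counter = Counter()


def record_use(word: str, sense: str, count: int = 1) -> None:
    # Here f is simply the running sum of observed uses.
    frequency[(word, sense)] += count


def preferred_sense(word: str) -> Optional[str]:
    # The meaning with the highest frequency f takes priority.
    candidates = {s: f for (w, s), f in frequency.items() if w == word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Example counts for an Information Technology engineer (illustrative only).
record_use("DI", "Dependency Injection", 5)
record_use("DI", "Diffusion Index", 1)
```

With these counts, `preferred_sense("DI")` returns "Dependency Injection", matching the example of the Information Technology engineer above.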
In another embodiment, personal corpus creation program 122 updates the proficiency p of the word selected among polysemous words. For example, it is assumed that a person understands a word better if he or she has used (e.g., written) the word than if he or she has only read it. In an embodiment, personal corpus creation program 122 stores the number of times the word has been read and the number of times the word has been written with the vector in a database (e.g., database 124). In an embodiment, personal corpus creation program 122 calculates the proficiency p using the following equation: p = c_read × RP + c_write × WP, wherein c_read is the number of times the word has been read and c_write is the number of times the word has been written, and wherein RP and WP are predetermined constants, where RP < WP.
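The proficiency equation can be illustrated with a short worked example. The constant values RP = 1.0 and WP = 2.0 below are assumptions chosen only to satisfy RP < WP; the description does not specify them.

```python
def proficiency(c_read: int, c_write: int, rp: float = 1.0, wp: float = 2.0) -> float:
    # p = c_read * RP + c_write * WP, with RP < WP so that writing a word
    # contributes more to proficiency than reading it.
    if not rp < wp:
        raise ValueError("RP must be smaller than WP")
    return c_read * rp + c_write * wp


# A word read 10 times and written 3 times: 10 * 1.0 + 3 * 2.0
p = proficiency(c_read=10, c_write=3)
```

With these constants, a word written once has a higher proficiency than a word read once, which reflects the assumption stated above that using a word indicates better understanding than only reading it.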
In an embodiment, personal corpus creation program 122 sends an alert notification to the first user (via user computing device 1301). In an embodiment, personal corpus creation program 122 sends an alert notification to the first user, notifying the first user of the difference in perception of the polysemous word.
In a third application of personal corpus creation program 122, personal corpus creation program 122 obtains a plurality of unique words other than the basic words from a second set of data sources associated with a first user. Among the plurality of unique words, personal corpus creation program 122 determines that one or more common words are included in a second personal corpus of a second user. Personal corpus creation program 122 extracts a first word group and a second word group having a vector close to a common word of the one or more common words included in a first personal corpus of the first user and the second personal corpus of the second user, respectively. Responsive to the similarity between the first word group and the second word group not exceeding a predetermined threshold, personal corpus creation program 122 either sends a notification to the first user, or sends a word that has the vector close to the common word and is selected from a word group to the second user together with a set of textual information.
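The comparison between two personal corpora might be sketched as follows. The description does not specify how closeness of vectors or similarity of word groups is measured, so this sketch assumes cosine similarity for selecting the words close to the common word and Jaccard similarity for comparing the two word groups; the function names, the group size `k`, and the example vectors are hypothetical.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_words(corpus: Dict[str, Vector], word: str, k: int) -> Set[str]:
    # The k words whose vectors are closest to the vector of the common word.
    target = corpus[word]
    others = [w for w in corpus if w != word]
    others.sort(key=lambda w: cosine(corpus[w], target), reverse=True)
    return set(others[:k])


def meaning_may_differ(corpus_a: Dict[str, Vector], corpus_b: Dict[str, Vector],
                       word: str, k: int, threshold: float) -> bool:
    group_a = nearest_words(corpus_a, word, k)
    group_b = nearest_words(corpus_b, word, k)
    union = group_a | group_b
    similarity = len(group_a & group_b) / len(union) if union else 1.0
    # A similarity not exceeding the threshold triggers a notification.
    return similarity <= threshold


first_user = {"DI": [1.0, 0.0], "injection": [0.9, 0.1],
              "container": [0.8, 0.2], "economy": [0.0, 1.0]}
second_user = {"DI": [0.0, 1.0], "economy": [0.1, 0.9],
               "indicator": [0.2, 0.8], "injection": [1.0, 0.0]}

notify = meaning_may_differ(first_user, second_user, "DI", k=2, threshold=0.3)
```

In this example the words near "DI" in the two corpora do not overlap, so `notify` is true and a notification (or an explanatory word with textual information) would be sent.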
In a fourth application of personal corpus creation program 122, the corpus of a static page such as a blog is embedded when the page is created or updated, to record the meanings of words as the author recognizes them at that time. The entire corpus may be loaded using JavaScript (JS), etc. (e.g., <script src="load-cupus-20190301.js"></script>). If the corpus is hosted or version-managed, the link and version can be recorded as meta-information on the page (e.g., <meta name="corpus-link" content="https://corpus.com/user1/"/> or <meta name="corpus-version" content="1.1.10"/>).
In a fifth application of personal corpus creation program 122, a particular word is found in a book that a user is reading but does not exist in the user's corpus. Personal corpus creation program 122 provides the user with the meaning of the word.
Computing environment 400 contains an example of an environment for the execution of at least some of the computer code involved in performing the inventive methods, such as personal corpus creation program 122. In addition to personal corpus creation program 122, computing environment 400 includes, for example, computer 401, wide area network (WAN) 402, end user device (EUD) 403, remote server 404, public cloud 405, and private cloud 406. In this embodiment, computer 401 includes processor set 410 (including processing circuitry 420 and cache 421), communication fabric 411, volatile memory 412, persistent storage 413 (including operating system 422 and personal corpus creation program 122, as identified above), peripheral device set 414 (including user interface (UI) device set 423, storage 424, and Internet of Things (IoT) sensor set 425), and network module 415. Remote server 404 includes remote database 430. Public cloud 405 includes gateway 440, cloud orchestration module 441, host physical machine set 442, virtual machine set 443, and container set 444.
Computer 401, which represents server 120 of
Processor set 410 includes one, or more, computer processors of any type now known or to be developed in the future. Processing circuitry 420 may be distributed over multiple packages, for example, multiple, coordinated integrated circuit chips. Processing circuitry 420 may implement multiple processor threads and/or multiple processor cores. Cache 421 is memory that is located in the processor chip package(s) and is typically used for data or code that should be available for rapid access by the threads or cores running on processor set 410. Cache memories are typically organized into multiple levels depending upon relative proximity to the processing circuitry. Alternatively, some, or all, of the cache for the processor set may be located “off chip.” In some computing environments, processor set 410 may be designed for working with qubits and performing quantum computing.
Computer readable program instructions are typically loaded onto computer 401 to cause a series of operational steps to be performed by processor set 410 of computer 401 and thereby effect a computer-implemented method, such that the instructions thus executed will instantiate the methods specified in flowcharts and/or narrative descriptions of computer-implemented methods included in this document (collectively referred to as “the inventive methods”). These computer readable program instructions are stored in various types of computer readable storage media, such as cache 421 and the other storage media discussed below. The program instructions, and associated data, are accessed by processor set 410 to control and direct performance of the inventive methods. In computing environment 400, at least some of the instructions for performing the inventive methods may be stored in personal corpus creation program 122 in persistent storage 413.
Communication fabric 411 is the signal conduction paths that allow the various components of computer 401 to communicate with each other. Typically, this fabric is made of switches and electrically conductive paths, such as the switches and electrically conductive paths that make up busses, bridges, physical input/output ports and the like. Other types of signal communication paths may be used, such as fiber optic communication paths and/or wireless communication paths.
Volatile memory 412 is any type of volatile memory now known or to be developed in the future. Examples include dynamic type random access memory (RAM) or static type RAM. Typically, the volatile memory is characterized by random access, but this is not required unless affirmatively indicated. In computer 401, the volatile memory 412 is located in a single package and is internal to computer 401, but, alternatively or additionally, the volatile memory may be distributed over multiple packages and/or located externally with respect to computer 401.
Persistent storage 413 is any form of non-volatile storage for computers that is now known or to be developed in the future. The non-volatility of this storage means that the stored data is maintained regardless of whether power is being supplied to computer 401 and/or directly to persistent storage 413. Persistent storage 413 may be a read only memory (ROM), but typically at least a portion of the persistent storage allows writing of data, deletion of data and re-writing of data. Some familiar forms of persistent storage include magnetic disks and solid-state storage devices. Operating system 422 may take several forms, such as various known proprietary operating systems or open-source Portable Operating System Interface type operating systems that employ a kernel. The code included in personal corpus creation program 122 typically includes at least some of the computer code involved in performing the inventive methods.
Peripheral device set 414 includes the set of peripheral devices of computer 401. Data communication connections between the peripheral devices and the other components of computer 401 may be implemented in various ways, such as Bluetooth connections, Near-Field Communication (NFC) connections, connections made by cables (such as universal serial bus (USB) type cables), insertion type connections (for example, secure digital (SD) card), connections made through local area communication networks and even connections made through wide area networks such as the internet. In various embodiments, UI device set 423 may include components such as a display screen, speaker, microphone, wearable devices (such as goggles and smart watches), keyboard, mouse, printer, touchpad, game controllers, and haptic devices. Storage 424 is external storage, such as an external hard drive, or insertable storage, such as an SD card. Storage 424 may be persistent and/or volatile. In some embodiments, storage 424 may take the form of a quantum computing storage device for storing data in the form of qubits. In embodiments where computer 401 is required to have a large amount of storage (for example, where computer 401 locally stores and manages a large database) then this storage may be provided by peripheral storage devices designed for storing very large amounts of data, such as a storage area network (SAN) that is shared by multiple, geographically distributed computers. IoT sensor set 425 is made up of sensors that can be used in Internet of Things applications. For example, one sensor may be a thermometer and another sensor may be a motion detector.
Network module 415 is the collection of computer software, hardware, and firmware that allows computer 401 to communicate with other computers through WAN 402. Network module 415 may include hardware, such as modems or Wi-Fi signal transceivers, software for packetizing and/or de-packetizing data for communication network transmission, and/or web browser software for communicating data over the internet. In some embodiments, network control functions and network forwarding functions of network module 415 are performed on the same physical hardware device. In other embodiments (for example, embodiments that utilize software-defined networking (SDN)), the control functions and the forwarding functions of network module 415 are performed on physically separate devices, such that the control functions manage several different network hardware devices. Computer readable program instructions for performing the inventive methods can typically be downloaded to computer 401 from an external computer or external storage device through a network adapter card or network interface included in network module 415.
WAN 402 is any wide area network (for example, the internet) capable of communicating computer data over non-local distances by any technology for communicating computer data, now known or to be developed in the future. In some embodiments, the WAN may be replaced and/or supplemented by local area networks (LANs) designed to communicate data between devices located in a local area, such as a Wi-Fi network. The WAN and/or LANs typically include computer hardware such as copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and edge servers.
End user device (EUD) 403 is any computer system that is used and controlled by an end user (for example, a customer of an enterprise that operates computer 401) and may take any of the forms discussed above in connection with computer 401. EUD 403 typically receives helpful and useful data from the operations of computer 401. For example, in a hypothetical case where computer 401 is designed to provide a recommendation to an end user, this recommendation would typically be communicated from network module 415 of computer 401 through WAN 402 to EUD 403. In this way, EUD 403 can display, or otherwise present, the recommendation to an end user. In some embodiments, EUD 403 may be a client device, such as a thin client, heavy client, mainframe computer, desktop computer and so on.
Remote server 404 is any computer system that serves at least some data and/or functionality to computer 401. Remote server 404 may be controlled and used by the same entity that operates computer 401. Remote server 404 represents the machine(s) that collect and store helpful and useful data for use by other computers, such as computer 401. For example, in a hypothetical case where computer 401 is designed and programmed to provide a recommendation based on historical data, then this historical data may be provided to computer 401 from remote database 430 of remote server 404.
Public cloud 405 is any computer system available for use by multiple entities that provides on-demand availability of computer system resources and/or other computer capabilities, especially data storage (cloud storage) and computing power, without direct active management by the user. Cloud computing typically leverages sharing of resources to achieve coherence and economies of scale. The direct and active management of the computing resources of public cloud 405 is performed by the computer hardware and/or software of cloud orchestration module 441. The computing resources provided by public cloud 405 are typically implemented by virtual computing environments that run on various computers making up the computers of host physical machine set 442, which is the universe of physical computers in and/or available to public cloud 405. The virtual computing environments (VCEs) typically take the form of virtual machines from virtual machine set 443 and/or containers from container set 444. It is understood that these VCEs may be stored as images and may be transferred among and between the various physical machine hosts, either as images or after instantiation of the VCE. Cloud orchestration module 441 manages the transfer and storage of images, deploys new instantiations of VCEs and manages active instantiations of VCE deployments. Gateway 440 is the collection of computer software, hardware, and firmware that allows public cloud 405 to communicate through WAN 402.
Some further explanation of virtualized computing environments (VCEs) will now be provided. VCEs can be stored as “images.” A new active instance of the VCE can be instantiated from the image. Two familiar types of VCEs are virtual machines and containers. A container is a VCE that uses operating-system-level virtualization. This refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances, called containers. These isolated user-space instances typically behave as real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can utilize all resources of that computer, such as connected devices, files and folders, network shares, CPU power, and quantifiable hardware capabilities. However, programs running inside a container can only use the contents of the container and devices assigned to the container, a feature which is known as containerization.
Private cloud 406 is similar to public cloud 405, except that the computing resources are only available for use by a single enterprise. While private cloud 406 is depicted as being in communication with WAN 402, in other embodiments a private cloud may be disconnected from the internet entirely and only accessible through a local/private network. A hybrid cloud is a composition of multiple clouds of different types (for example, private, community or public cloud types), often respectively implemented by different vendors. Each of the multiple clouds remains a separate and discrete entity, but the larger hybrid cloud architecture is bound together by standardized or proprietary technology that enables orchestration, management, and/or data/application portability between the multiple constituent clouds. In this embodiment, public cloud 405 and private cloud 406 are both part of a larger hybrid cloud.
The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
Various aspects of the present disclosure are described by narrative text, flowcharts, block diagrams of computer systems and/or block diagrams of the machine logic included in computer program product (CPP) embodiments. With respect to any flowcharts, depending upon the technology involved, the operations can be performed in a different order than what is shown in a given flowchart. For example, again depending upon the technology involved, two operations shown in successive flowchart blocks may be performed in reverse order, as a single integrated step, concurrently, or in a manner at least partially overlapping in time.
A computer program product embodiment (“CPP embodiment” or “CPP”) is a term used in the present disclosure to describe any set of one, or more, storage media (also called “mediums”) collectively included in a set of one, or more, storage devices that collectively include machine readable code corresponding to instructions and/or data for performing computer operations specified in a given CPP claim. A “storage device” is any tangible device that can retain and store instructions for use by a computer processor. Without limitation, the computer readable storage medium may be an electronic storage medium, a magnetic storage medium, an optical storage medium, an electromagnetic storage medium, a semiconductor storage medium, a mechanical storage medium, or any suitable combination of the foregoing. Some known types of storage devices that include these mediums include: diskette, hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or Flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanically encoded device (such as punch cards or pits/lands formed in a major surface of a disc) or any suitable combination of the foregoing. A computer readable storage medium, as that term is used in the present disclosure, is not to be construed as storage in the form of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide, light pulses passing through a fiber optic cable, electrical signals communicated through a wire, and/or other transmission media. 
As will be understood by those of skill in the art, data is typically moved at some occasional points in time during normal operations of a storage device, such as during access, de-fragmentation or garbage collection, but this does not render the storage device as transitory because the data is not transitory while it is stored.
The foregoing descriptions of the various embodiments of the present invention have been presented for purposes of illustration and example but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The terminology used herein was chosen to best explain the principles of the embodiment, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.