This disclosure relates generally to apparatuses, methods, and computer readable media for automatically archiving, sorting, and/or indexing content related to messages sent to and from computing devices across multiple communications formats and protocols.
The proliferation of personal computing devices in recent years, especially mobile personal computing devices, combined with a growth in the number of widely-used communications formats (e.g., text, voice, video, image) and protocols (e.g., SMTP, IMAP/POP, SMS/MMS, XMPP, etc.) has led to a communications experience that many users find fragmented and restrictive. Users desire an experience where all of their data is accessible, searchable, and sortable to them through a single interface.
Typically, users receive data, for example, text, voice, video and images, through a variety of communications formats. Data received from such sources is generally accessible, searchable, and/or sortable through only the communications format used to transmit it. As such, users experience difficulties in organizing, managing, and searching across such data. For example, a user may have to open various communications applications to discover where a certain data file is located. Even then, that file must be sent by, for example, email or direct message to another communications application in order to enable sharing with other users. This process is time-consuming and may cause difficulties in locating user data.
Moreover, some messages or communications may contain what is referred to herein as “secondary message content,” e.g., message content that may include: file attachments; links to other files and/or webpages; as well as text and/or other media information on the files and/or webpages that are linked out to by the links embedded in the original message, etc. As such, easily searching for particular content across all of a given user's communications formats and communications applications (including any “secondary message content” that may be embedded in the user's messages) is not possible. In fact, no methods are known for the creation of a single, integrated communications platform that can automatically index and archive data (including “secondary message content”) from messages in a variety of communications formats and received over a variety of communications protocols, index the data for deep searching, and allow for such data to be accessible to users through a single communications application interface.
The subject matter of the present disclosure is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above. To address these and other issues, techniques that enable automatic archival, indexing, and accessibility of data, including “secondary message content” originating from messages received in a plurality of communications formats and delivered via a plurality of communications protocols, are described herein.
Disclosed are apparatuses, methods, and computer readable media for automatically archiving, sorting, and/or indexing content related to messages sent to and from computing devices across multiple communications formats and protocols. More particularly, but not by way of limitation, this disclosure relates to apparatuses, methods, and computer readable media to permit computing devices, e.g., smartphones, smart devices, tablets, wearable devices, laptops, and the like, to utilize a single, integrated communications platform that can automatically index and archive message data (including “secondary message content,” such as: file attachments; links to other files and/or webpages; as well as text and/or other media information on the files and/or webpages that are linked out to by the links embedded in the original message, etc.) from messages in a variety of communications formats and received over a variety of communications protocols. The indexed message data may thus allow users to conduct deep searches for message content (and/or secondary message content, such as URL links) in a seamless fashion that is accessible to users through a single communications application interface.
Referring now to
Server 106 in the server-entry point network architecture infrastructure 100 of
Referring now to
Referring now to
System unit 205 may be programmed to perform methods in accordance with this disclosure. System unit 205 comprises one or more processing units, input-output (I/O) bus 225 and memory 215. Access to memory 215 can be accomplished using the communication bus 225. Processing unit 210 may include any programmable controller device including, for example, a mainframe processor, a mobile phone processor, or, as examples, one or more members of the INTEL® ATOM™, INTEL® XEON™, and INTEL® CORE™ processor families from Intel Corporation and the Cortex and ARM processor families from ARM. (INTEL, INTEL ATOM, XEON, and CORE are trademarks of the Intel Corporation. CORTEX is a registered trademark of the ARM Limited Corporation. ARM is a registered trademark of the ARM Limited Company). Memory 215 may include one or more memory modules and comprise random access memory (RAM), read only memory (ROM), programmable read only memory (PROM), programmable read-write memory, and solid-state memory. As also shown in
Referring now to
The processing unit core 210 is shown including execution logic 280 having a set of execution units 285-1 through 285-N. Some embodiments may include a number of execution units dedicated to specific functions or sets of functions. Other embodiments may include only one execution unit or one execution unit that can perform a particular function. The execution logic 280 performs the operations specified by code instructions.
After completion of execution of the operations specified by the code instructions, back end logic 290 retires the instructions of the code 250. In one embodiment, the processing unit core 210 allows out of order execution but requires in order retirement of instructions. Retirement logic 295 may take a variety of forms as known to those of skill in the art (e.g., re-order buffers or the like). In this manner, the processing unit core 210 is transformed during execution of the code 250, at least in terms of the output generated by the decoder, the hardware registers and tables utilized by the register renaming logic 262, and any registers (not shown) modified by the execution logic 280.
Although not illustrated in
Auto-Archiving and Indexing of Secondary Message Content
Auto-archiving and indexing of secondary message content from messages received by a centralized, multi-protocol communications system in a variety of communications formats and via a variety of delivery protocols may be achieved through the use of certain databases 104 of the centralized communications system. These databases are referred to in this disclosure as “Vault” storage databases or, simply, “Vault,” for short. When a message, a data file, is received by the multi-protocol communications system, the central sync server 105 may initiate the following exemplary process to archive the message and its contents (including secondary message content, such as URL links and/or attachments from the message) and then index its contents to be searchable. Because the databases 104 of the multi-protocol communications system may act as a central repository for these messages, users are presumed to have registered for this service with the multi-protocol communications system and logged-in to the system using authorized credentials before viewing and/or searching across archived messages.
An example of this auto-archiving and indexing process is shown in
At step 303, if the save procedure is successful, the body of the UMO message may be sent to what will be referred to herein as the Content Discovery Service (CDS), shown as step 304. If the save procedure is not successful, then the process reverts to step 301 and attempts the save procedure again. The CDS may comprise a software-implemented rules/machine-learning engine implementing a set of criteria, machine learning heuristics, artificial intelligence, or the like, designed to identify the “key,” that is, relevant, pieces of content related to any given message in any given format. For example, in the case of an email message object, the CDS may disregard the email's “To,” “From,” and “BCC” fields, and instead scan the rest of the message body and any relevant links, attachments, etc. thereto for relevant content, e.g., information, topics, or “keywords” from the message and its associated content that a user may later wish to search based on. Thus, at step 305, the CDS may parse the metadata of the UMO message to identify key characteristics of the message, such as its contents and its format. For example, the CDS may pull likely-relevant information, e.g., names, places, proper nouns, dates, times, URL links, media content, etc. from the message as potential “key” characteristics of the message. From this process, any secondary message content associated with the UMO message, if present, may also be identified. This process may also isolate the parsed items for individual processing, as will be described below. The isolated, parsed items may then be checked against the CDS Rules Engine at step 306 to determine if any particular actions should be taken to archive, sort, and/or index the parsed item from the message. If there are parsed secondary message content items, such as a URL link, among the isolated, parsed items, then, at step 307, the system may follow the relevant instructions for processing the particular type of secondary message content item.
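By way of illustration only, the parsing and rules check of steps 305 and 306 might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the regular expressions, the `ParsedItem` type, the `RULES` table, and the action names are all hypothetical, and a practical CDS may instead rely on machine learning heuristics as noted above.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors; the disclosure does not specify how "key" items are found.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass(frozen=True)
class ParsedItem:
    kind: str   # e.g., "url" or "date"
    value: str


# Hypothetical rules table mapping an item kind to an archiving action (step 306).
RULES = {
    "url": "archive_weblink",
    "date": "index_only",
}


def discover_items(body: str) -> list[ParsedItem]:
    """Isolate candidate 'key' items from a message body (step 305)."""
    # Trailing sentence punctuation is trimmed from each matched URL.
    items = [ParsedItem("url", m.group().rstrip(".,;"))
             for m in URL_PATTERN.finditer(body)]
    items += [ParsedItem("date", m.group())
              for m in DATE_PATTERN.finditer(body)]
    return items


def apply_rules(items: list[ParsedItem]) -> list[tuple[ParsedItem, str]]:
    """Pair each parsed item with the action the rules table prescribes."""
    return [(item, RULES.get(item.kind, "ignore")) for item in items]


if __name__ == "__main__":
    body = "See https://example.com/report, due 2015-03-01."
    for item, action in apply_rules(discover_items(body)):
        print(item.kind, item.value, action)
```

In this sketch, only the message body is scanned, consistent with the CDS disregarding address fields; any item kind without a rule falls through to a hypothetical "ignore" action.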
For example, in the case of a URL link, the system may create a ‘weblink’ object based on the URL link located in the message and add the created ‘weblink’ object to a database, e.g., database 104, referred to here as the “Vault” database. The ‘weblink’ object may, e.g., be stored as a file of a pre-existing format (e.g., a PDF, a screenshot image, an exact HTML/CSS copy, a downloaded copy of an image or video, etc.). This object could also be stored in association with any relevant text-based tags associated with the URL link (e.g., tags generated by full-text analyzers, computer vision-based analyzers, etc.).
The system may then proceed to step 308 and actually visit or ‘crawl’ the webpage (or other data item) linked to by the URL link located in the message and download any “key” page content on the associated webpage hosted at the URL link, e.g., text, images, video, and/or other media data. In parallel, at step 309, the process may also capture a live, i.e., contemporaneous, “snapshot” of the webpage (or other data item) linked to by the URL link located in the message. This may be particularly useful in situations where the content of a webpage changes frequently (or even if it changes slowly over time), so that, when the user goes back to search for a URL link sent to him by a particular contact (which may be years after the URL link was originally sent), the user is able to see the webpage as it looked at the date and time that the contact first transmitted the URL link to the user. This is much more likely to convey the actual content that the sender of the URL link actually wanted to convey to the recipient. The downloaded content may then be saved to the “Vault” database by the system at step 310. The system may then create an association between the created ‘weblink’ object and the originally received message at step 311. It is this message with the associated ‘weblink’ object that is then accessible to the user by search. In this manner, the URL link itself (as well as relevant content located on the webpage linked to by the URL link) is indexed and maintained on the databases 104 of the system in a quickly searchable fashion that is true-to-the-content of the URL link as of the date and time that it was sent by the sender.
It is to be understood that, if the data item associated with the secondary message content is something other than a URL link, e.g., a document or attachment, an analogous process may be undertaken by the system to download “key” content from the data item (step 308) and/or capture and save a live “snapshot” of the data item (step 309) as it existed at the time of sending. An object storing the non-URL link data item and the associated “key” content and/or snapshot may then also be stored in the Vault database (step 310) and associated with the message object that it was sent in (step 311).
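As a non-limiting illustration, steps 308 through 311 might be sketched as shown below. The `WebLink` and `Vault` types, the `archive_weblink` function, and the injected `fetch` callable are hypothetical names introduced only for this sketch; the in-memory `Vault` merely stands in for databases 104, and the whitespace-collapsing `extract_key_content` is a placeholder for whatever "key" content extraction an embodiment actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class WebLink:
    url: str
    key_content: str       # "key" page content downloaded at step 308
    snapshot: bytes        # contemporaneous capture made at step 309
    captured_at: datetime  # when the snapshot was taken
    message_id: str        # association with the original message (step 311)


@dataclass
class Vault:
    """In-memory stand-in (an assumption) for the 'Vault' database."""
    weblinks: list[WebLink] = field(default_factory=list)
    by_message: dict[str, list[WebLink]] = field(default_factory=dict)

    def save(self, link: WebLink) -> None:
        # Step 310: persist the object; step 311: associate it with its message.
        self.weblinks.append(link)
        self.by_message.setdefault(link.message_id, []).append(link)


def extract_key_content(page: bytes) -> str:
    # Placeholder extraction: decode the page and collapse runs of whitespace.
    return " ".join(page.decode("utf-8", errors="replace").split())


def archive_weblink(vault: Vault, message_id: str, url: str,
                    fetch: Callable[[str], bytes]) -> WebLink:
    """Crawl the linked page, snapshot it, and store the result in the Vault."""
    page = fetch(url)  # the caller supplies how the page is actually retrieved
    link = WebLink(
        url=url,
        key_content=extract_key_content(page),
        snapshot=page,
        captured_at=datetime.now(timezone.utc),
        message_id=message_id,
    )
    vault.save(link)
    return link
```

Passing the retrieval function in as `fetch` keeps the sketch independent of any particular HTTP client, and the same shape could plausibly serve a non-URL data item by supplying a function that reads an attachment instead.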
An exemplary process for searching and accessing saved messages and their associated attachments and web links is disclosed exemplarily in
The exemplary process shown in
Upon identifying the “key” parameters of the search request, the process may proceed to step 314, where the sync servers 105 pass a search query to the databases 104 and utilize the database's search index, which may have been previously constructed, e.g., as disclosed with reference to
Subsequently, at step 316, the sync servers 105 generate and return the results of the search request to the user-facing application. The search results set may be presented in a number of ways. For example, as shown at step 317, the search results set, comprised of the messages, attachments, documents, and/or weblinks, may optionally be ranked, e.g., with the ranking based on a strength of the match with the search request. Alternately, the search results set may be a list, sortable against one or more of the search parameters entered by the user and/or one or more preferences of the user.
Finally, at step 318, the search results set may be presented to the user through the multi-protocol communication system's user-facing application. The results may be presented as a ranked list, a sortable list or chart, or any other method readily apparent to those of ordinary skill in the art. Using those results, a user may sort matches against one or more search parameters in an order of his or her choosing and then request the specific messages, attachments, documents, and/or weblinks to be retrieved from the sync servers 105, which will pass the requested data files to the user-facing application for user access.
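For purposes of illustration, the index lookup and optional ranking described above might be sketched as follows. This is a simplified sketch that assumes a plain inverted index and scores "strength of match" as the number of distinct query terms an item contains; the `SearchIndex` class, the `tokenize` helper, and the item identifiers are hypothetical and are not drawn from the disclosure.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Toy inverted index over archived messages, attachments, and weblinks."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, item_id: str, text: str) -> None:
        # Record every distinct term of the item under its identifier.
        for term in set(tokenize(text)):
            self._postings[term].add(item_id)

    def search(self, query: str) -> list[tuple[str, int]]:
        # Score each item by how many distinct query terms it contains.
        scores: dict[str, int] = defaultdict(int)
        for term in set(tokenize(query)):
            for item_id in self._postings.get(term, ()):
                scores[item_id] += 1
        # Strongest match first; ties are broken by identifier for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = SearchIndex()
    index.add("msg-1", "Quarterly budget report")
    index.add("weblink-7", "budget spreadsheet for travel")
    index.add("msg-2", "lunch plans")
    print(index.search("budget report"))  # [('msg-1', 2), ('weblink-7', 1)]
```

A sortable (rather than ranked) presentation could be obtained by returning the unsorted scores and letting the user-facing application order them by a user-selected parameter.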
Turning now to
As shown in
Participant 401 objects represent “on-network” or “off-network” users. Participant 401 objects correspond to any people identified in the traditional email format fields of “To,” “From,” “Cc,” and “Bcc.” However, the Participant 401 objects are not limited to this, as a Participant 401 may be any user engaged in the conversation, and is relational to the service being used as the underlying communication protocol.
Service Identifier 402 object represents the service utilized by a single Participant 401 object in the delivery of a format over a communication protocol. For each “To,” “From,” “Cc,” and “Bcc” associated with a message, there may be a Participant 401 object containing a Service Identifier 402 indicating which service was used as the underlying format and communication protocol. The Service Identifier includes data related to the delivery of the message, including the type of the service, and the address. In the case of an SMS text message, a Service Identifier 402 object would have the type of “SMS” and the address would be the respective telephone number. The Service Identifier 402 object implies a format and communication protocol unique to that indicated service.
Message Unique 405 is the format- and communication protocol-specific representation of a message. For every message sent using a particular delivery method to one or more recipients, one or more Message Unique 405 objects may be instantiated. Message Unique 405 objects contain the format and communication protocol specific data gathered during the delivery process. For example, timestamps of “sent” and “received,” based on the communication protocol, may be stored in this object. Additionally, in instances where the format and communication protocol are limited in some fundamental way, e.g., TWITTER® messages are limited to 140 characters and SMS text messages are limited to 160 characters, it may be necessary to send multiple messages across these communication protocols to fully convey the Sender's intended message. For this purpose, multiple Message Unique 405 objects would be instantiated to track the transmitted content.
The Message Common 403 object is the message that an “on-network” user views in their Inbox feed. For every user message sent, there are common components present in all formats and communication protocols. For efficiency, these common components are extracted and contained in one object. Because of this efficiency, there is one Message Common 403 object for every message sent by the Sender. For example, the Message Common 403 object may store the body of the message, as well as the time sent at the moment the Sender selects ‘send,’ not the actual ‘sent time’ as reported by the underlying communication protocol (which may vary from protocol to protocol). This has the advantage of presenting one ‘unified’ or ‘common’ view to the Sender and recipient(s), while resolving minor discrepancies from the underlying communication protocols.
The Message Source 406 object is a representation of the Message Unique 405 object, e.g., in a JavaScript Object Notation (JSON) format. The Message Source 406 object may thus have a one-to-one relationship with the Message Unique 405 object.
Message Group 404 object is a representative identifier that coordinates a Message Common 403 object. The purpose of a Message Group 404 object is to enable multi-protocol communication and establish a relationship between those messages. There may also be a one-to-one relationship between the Message Group 404 object and the Message Common 403 object.
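Solely as an illustrative sketch, the objects described above might be modeled as follows. The field names, the `LIMITS` table, and the `build_uniques` and `message_source` helpers are assumptions made for this example; the character limits are those recited above, and the naive fixed-width split ignores any encoding details of the underlying protocols.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ServiceIdentifier:      # Service Identifier 402
    type: str                 # e.g., "SMS"
    address: str              # e.g., a telephone number


@dataclass
class Participant:            # Participant 401
    role: str                 # "To", "From", "Cc", or "Bcc"
    service: ServiceIdentifier


@dataclass
class MessageUnique:          # Message Unique 405
    service_type: str
    part: int                 # 1-based position when a message is split
    content: str


@dataclass
class MessageCommon:          # Message Common 403
    body: str
    participants: list[Participant] = field(default_factory=list)
    uniques: list[MessageUnique] = field(default_factory=list)


@dataclass
class MessageGroup:           # Message Group 404, one-to-one with Message Common
    group_id: str
    common: MessageCommon


# Per-protocol character limits as recited in the description above.
LIMITS = {"SMS": 160, "TWITTER": 140}


def build_uniques(body: str, service_type: str) -> list[MessageUnique]:
    """Split a body into as many Message Unique objects as the protocol needs."""
    limit = LIMITS.get(service_type)
    if limit is None:
        chunks = [body]       # no known limit: send as a single message
    else:
        chunks = [body[i:i + limit] for i in range(0, len(body), limit)] or [""]
    return [MessageUnique(service_type, n, chunk)
            for n, chunk in enumerate(chunks, start=1)]


def message_source(unique: MessageUnique) -> str:
    """Message Source 406: a JSON representation of one Message Unique object."""
    return json.dumps(asdict(unique))
```

For example, under these assumptions a 200-character body sent via SMS would yield two Message Unique objects of 160 and 40 characters, each with its own JSON Message Source, while a single Message Common object retains the full body.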
Turning now to
The following examples pertain to further embodiments.
Example 1 is a non-transitory computer readable medium that comprises computer executable instructions stored thereon to cause one or more processing units to: receive a first message in a first communications format; parse the first message based, at least in part, on the first communications format, to extract one or more characteristics; apply a first set of rules to the one or more characteristics; discover a first secondary message content item based, at least in part, on the application of the first set of rules to the one or more characteristics; store the first secondary message content item in a database; store one or more key content items associated with the first secondary message content item in the database; index the one or more key content items; and associate the one or more indexed key content items and the first secondary message content item with the first message in the database.
Example 2 includes the subject matter of example 1, wherein the computer executable instructions further cause the one or more processing units to: store a first contemporaneous data item associated with the first secondary message content item in the database; and associate the first contemporaneous data item with the one or more indexed key content items and the first secondary message content item in the database.
Example 3 includes the subject matter of example 2, wherein the computer executable instructions further cause the one or more processing units to: index the first contemporaneous data item.
Example 4 includes the subject matter of example 1, wherein the computer executable instructions further cause the one or more processing units to: receive, from a first client application, a first query for content associated with the first secondary message content item; and generate a result set comprising at least one of the following: the first message; the first secondary message content item; and the one or more key content items associated with the first secondary message content item.
Example 5 includes the subject matter of example 4, wherein the result set is sorted based, at least in part, on a preference of a user of the first client application.
Example 6 includes the subject matter of example 2, wherein the computer executable instructions further cause the one or more processing units to: receive, from a first client application, a first query for content associated with the first secondary message content item; and generate a result set comprising at least one of the following: the first message; the first secondary message content item; the one or more key content items associated with the first secondary message content item; and the first contemporaneous data item associated with the first secondary message content item.
Example 7 includes the subject matter of example 1, wherein the first secondary message content item comprises at least one of the following: an attachment to the first message; a document associated with the first message; and a URL link from the first message.
Example 8 includes the subject matter of example 1, wherein the instructions to store one or more key content items associated with the first secondary message content item in the database further comprise instructions to crawl a webpage associated with the first secondary message content item.
Example 9 includes the subject matter of example 8, wherein the instructions to store one or more key content items associated with the first secondary message content item in the database further comprise instructions to store one or more media items from the webpage associated with the first secondary message content item.
Example 10 includes the subject matter of example 4, wherein: the first secondary message content item comprises a URL link, and the one or more key content items associated with the first secondary message content item comprise at least one of the following: a path address of the URL link; a contemporaneous capture of the URL link target webpage or web-accessible file; and a clipping of text or media from the URL link target webpage or web-accessible file.
Example 11 is a computer-implemented method, comprising: receiving a first message in a first communications format; parsing the first message based, at least in part, on the first communications format, to extract one or more characteristics; applying a first set of rules to the one or more characteristics; discovering a first secondary message content item based, at least in part, on the application of the first set of rules to the one or more characteristics; storing the first secondary message content item in a database; storing one or more key content items associated with the first secondary message content item in the database; indexing the one or more key content items; and associating the one or more indexed key content items and the first secondary message content item with the first message in the database.
Example 12 includes the subject matter of example 11, further comprising: storing a first contemporaneous data item associated with the first secondary message content item in the database; and associating the first contemporaneous data item with the one or more indexed key content items and the first secondary message content item in the database.
Example 13 includes the subject matter of example 12, further comprising: indexing the first contemporaneous data item.
Example 14 includes the subject matter of example 11, further comprising: receiving, from a first client application, a first query for content associated with the first secondary message content item; and generating a result set comprising at least one of the following: the first message; the first secondary message content item; and the one or more key content items associated with the first secondary message content item.
Example 15 includes the subject matter of example 14, wherein the result set is sorted based, at least in part, on a preference of a user of the first client application.
Example 16 includes the subject matter of example 12, further comprising: receiving, from a first client application, a first query for content associated with the first secondary message content item; and generating a result set comprising at least one of the following: the first message; the first secondary message content item; the one or more key content items associated with the first secondary message content item; and the first contemporaneous data item associated with the first secondary message content item.
Example 17 includes the subject matter of example 11, wherein the first secondary message content item comprises at least one of the following: an attachment to the first message; a document associated with the first message; and a URL link from the first message.
Example 18 includes the subject matter of example 11, wherein the act of storing one or more key content items associated with the first secondary message content item in the database further comprises crawling a webpage associated with the first secondary message content item.
Example 19 includes the subject matter of example 18, wherein the act of storing one or more key content items associated with the first secondary message content item in the database further comprises storing one or more media items from the webpage associated with the first secondary message content item.
Example 20 includes the subject matter of example 14, wherein: the first secondary message content item comprises a URL link, and the one or more key content items associated with the first secondary message content item comprise at least one of the following: a path address of the URL link; a contemporaneous capture of the URL link target webpage or web-accessible file; and a clipping of text or media from the URL link target webpage or web-accessible file.
Example 21 is a system, comprising: a memory; and one or more processing units, communicatively coupled to the memory, wherein the memory stores instructions to configure the one or more processing units to: receive a first message in a first communications format; parse the first message based, at least in part, on the first communications format, to extract one or more characteristics; apply a first set of rules to the one or more characteristics; discover a first secondary message content item based, at least in part, on the application of the first set of rules to the one or more characteristics; store the first secondary message content item in a database; store one or more key content items associated with the first secondary message content item in the database; index the one or more key content items; and associate the one or more indexed key content items and the first secondary message content item with the first message in the database.
Example 22 includes the subject matter of example 21, wherein the instructions are further configured to cause the one or more processing units to: store a first contemporaneous data item associated with the first secondary message content item in the database; index the first contemporaneous data item; and associate the indexed first contemporaneous data item with the one or more indexed key content items and the first secondary message content item in the database.
Example 23 includes the subject matter of example 21, wherein the instructions are further configured to cause the one or more processing units to: receive, from a first client application, a first query for content associated with the first secondary message content item; and generate a result set comprising at least one of the following: the first message; the first secondary message content item; and the one or more key content items associated with the first secondary message content item.
Example 24 includes the subject matter of example 21, wherein the first secondary message content item comprises at least one of the following: an attachment to the first message; a document associated with the first message; and a URL link from the first message.
Example 25 includes the subject matter of example 21, wherein the instructions to store one or more key content items associated with the first secondary message content item in the database further comprise instructions to crawl a webpage associated with the first secondary message content item.
In the foregoing description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, to one skilled in the art that the disclosed embodiments may be practiced without these specific details. In other instances, structure and devices are shown in block diagram form in order to avoid obscuring the disclosed embodiments. References to numbers without subscripts or suffixes are understood to reference all instances of subscripts and suffixes corresponding to the referenced number. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter. Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one disclosed embodiment, and multiple references to “one embodiment” or “an embodiment” should not be understood as necessarily all referring to the same embodiment.
It is also to be understood that the above description is intended to be illustrative, and not restrictive. For example, above-described embodiments may be used in combination with each other and illustrative process steps may be performed in an order different than shown. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention therefore should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.