The disclosed subject matter is related to at least the technical fields of information retrieval systems, distributed computing systems, natural language processes, semantic-search processes, word embedding processes, and formal concept analysis processes.
Application software products (i.e., applications) have been developed to perform a variety of functions related to, for example, word processing, spreadsheets, slide show presentations, database management, electronic mail, Internet access, business productivity, educational assistance, health and fitness management, providing digital content (such as, for example, text, pictures, audio, video, and electronic games), navigation, text messaging, access to social media networks, etc. The advancement of electronic communication network bandwidth capabilities in the last decade has enabled the delivery of applications to shift from being primarily performed via physical data storage devices (such as, for example, floppy disks, compact discs, digital versatile discs, and Universal Serial Bus flash drives) to being performed via online distribution in which developers can upload applications from an application host platform to a digital distribution platform, and users can download applications from the digital distribution platform to a user device. The digital distribution platform can be an application marketplace, online store, or other distribution system.
According to an implementation of the disclosed subject matter, in a method for producing a personalized selection of applications for presentation on a web-based interface, a first vector can be produced by a processor through a word embedding process. The first vector can represent one or more first words. The one or more first words can be from a first query. The first query can be a free-form text query. A second query can be transmitted, in response to a first determination, from the processor to a digital distribution platform. The second query can include the one or more first words and one or more second words. The first determination can be that a measure of similarity between the first vector and a second vector is greater than a threshold. The second vector can represent the one or more second words. A response to the second query, from the digital distribution platform, can be received by the processor. The response to the second query can include an identification of a first application. The first application can be available for distribution by the digital distribution platform. A cluster of applications can be generated, in response to a second determination, by the processor. The cluster of applications can include the first application and a second application. The second application can be available for distribution by the digital distribution platform. The second determination can be of an existence of a relationship between the first application and the second application. The personalized selection of applications can be produced, based on information about the cluster of applications, by the processor for presentation on the web-based interface for a user account associated with the first query.
According to an implementation of the disclosed subject matter, in a non-transitory computer-readable medium storing computer code for controlling a processor to cause the processor to produce a personalized selection of applications for presentation on a web-based interface, the computer code can include instructions to cause the processor to produce a first vector through a word embedding process. The first vector can represent one or more first words. The one or more first words can be from a first query. The first query can be a free-form text query. The computer code can include instructions to cause the processor to transmit, in response to a first determination, a second query to a digital distribution platform. The second query can include the one or more first words and one or more second words. The first determination can be that a measure of similarity between the first vector and a second vector is greater than a threshold. The second vector can represent the one or more second words. The computer code can include instructions to cause the processor to receive a response to the second query from the digital distribution platform. The response to the second query can include an identification of a first application. The first application can be available for distribution by the digital distribution platform. The computer code can include instructions to cause the processor to generate, in response to a second determination, a cluster of applications. The cluster of applications can include the first application and a second application. The second application can be available for distribution by the digital distribution platform. The second determination can be of an existence of a relationship between the first application and the second application. 
The computer code can include instructions to cause the processor to produce, based on information about the cluster of applications, the personalized selection of applications for presentation on the web-based interface for a user account associated with the first query.
According to an implementation of the disclosed subject matter, a system for producing a personalized selection of applications for presentation on a web-based interface can include a processor, communications circuitry, and a memory. The processor can be configured to produce, through a word embedding process, a first vector. The first vector can represent one or more first words. The one or more first words can be from a first query. The first query can be a free-form text query. The processor can be configured to determine that a measure of similarity between the first vector and a second vector is greater than a threshold. The second vector can represent one or more second words. The processor can be configured to determine an existence of a relationship between a first application and a second application. The first application and the second application can be available for distribution by a digital distribution platform. The processor can be configured to generate, in response to a first determination, a cluster of applications. The cluster of applications can include the first application and the second application. The first determination can be of the existence of the relationship. The processor can be configured to produce, based on information about the cluster of applications, the personalized selection of applications for presentation on the web-based interface for a user account associated with the first query. The communications circuitry can be configured to transmit, in response to a second determination, a second query to the digital distribution platform. The second query can include the one or more first words and the one or more second words. The second determination can be that the measure of similarity is greater than the threshold. The communications circuitry can be configured to receive, from the digital distribution platform, a response to the second query. 
The response to the second query can include an identification of the first application. The memory can be configured to store the one or more first words, the first vector, the first query, the one or more second words, the second vector, the second query, the measure of similarity, the threshold, the response to the second query, and the information about the cluster of applications.
According to an implementation of the disclosed subject matter, a system can be provided for producing a personalized selection of applications for presentation on a web-based interface. The system can include means for producing, through a word embedding process, a first vector. The first vector can represent one or more first words. The one or more first words can be from a first query. The first query can be a free-form text query. The system can include means for transmitting, in response to a first determination, a second query to a digital distribution platform. The second query can include the one or more first words and one or more second words. The first determination can be that a measure of similarity between the first vector and a second vector is greater than a threshold. The second vector can represent the one or more second words. The system can include means for receiving, from the digital distribution platform, a response to the second query. The response to the second query can include an identification of a first application. The first application can be available for distribution by the digital distribution platform. The system can include means for generating, in response to a second determination, a cluster of applications. The cluster of applications can include the first application and a second application. The second application can be available for distribution by the digital distribution platform. The second determination can be of an existence of a relationship between the first application and the second application. The system can include means for producing, based on information about the cluster of applications, the personalized selection of applications for presentation on the web-based interface for a user account associated with the first query.
Additional features, advantages, and aspects of the disclosed subject matter are set forth or apparent from consideration of the following detailed description, drawings, and claims. Moreover, it is to be understood that both the foregoing summary and the following detailed description are illustrative and are intended to provide further explanation without limiting the scope of the claims.
The accompanying drawings, which are included to provide a further understanding of the disclosed subject matter, are incorporated in and constitute a part of this specification. The drawings also illustrate aspects of the disclosed subject matter and together with the detailed description serve to explain the principles of aspects of the disclosed subject matter. No attempt is made to show structural details in more detail than may be necessary for a fundamental understanding of the disclosed subject matter and various ways in which it may be practiced.
As used herein, a statement that a component can be “configured to” perform an operation can be understood to mean that the component requires no structural alterations, but merely needs to be placed into an operational state (e.g., be provided with electrical power, have an underlying operating system running, etc.) in order to perform the operation.
An information retrieval platform can be an electronic system configured to receive, from a user, a request that represents one or more characteristics of an informational need of the user. The information retrieval platform can be configured to produce a web-based interface to be transmitted to a user device of the user to facilitate receipt of the request. The web-based interface can be presented on the user device and can include a text box into which the user can enter the request as a query. The information retrieval platform can be configured to provide a response to the query. The response can include one or more data objects, from a collection of data objects, relevant to the informational need. A data object can be a particular way of organizing data so that the data can be used efficiently. Determining that a data object is relevant to the informational need can involve interpreting a relationship between the data object and the informational need. Accordingly, the information retrieval platform typically can perform operations to measure a degree of the relationship, or the relevancy, between the data object and the informational need. The response can be presented on the web-based interface as graphical control elements (e.g., icons) associated with the one or more data objects. Frequently, the response can include a large number of data objects. (For example, the World Wide Web has over 4.7 billion pages and Google Play™ has over two million applications.) For at least this reason, the information retrieval platform usually can rank the data objects according to degrees of relevancy and can present the data objects according to their ranks.
The information retrieval platform can be configured to generate a cluster of data objects. A cluster can be a set of data objects grouped in such a way that data objects in the same cluster are more similar (in some sense or another) to each other than to data objects in other clusters. The response to the query can be presented on the web-based interface as data objects organized into clusters. The information retrieval platform can be configured to operate in conjunction with a digital distribution platform so that the data objects are for applications. The web-based interface can be used as a home page for the digital distribution platform. The home page can include a predetermined set of applications organized into predetermined clusters.
The web-based interface 106 can be used as the home page for the digital distribution platform. The home page can include a predetermined set of applications organized into predetermined clusters. For example, the web-based interface 106 can present graphical control elements (e.g., icons) associated with some of the applications in the “games” cluster 110, the “movies” cluster 112, the “music” cluster 114, and the “books” cluster 116 that were predetermined to be included in the home page. The information retrieval platform 104 can rank the applications, presented on the web-based interface 106, according to pre-determined degrees of relevancy. For example, the web-based interface 106 can present applications “b1”, “i1”, “f1”, and “m1” from the “games” cluster 110, applications “j2”, “g2”, “c2”, and “e2” from the “movies” cluster 112, applications “k3”, “d3”, “l3”, and “p3” from the “music” cluster 114, and applications “h4”, “n4”, “a4”, and “o4” from the “books” cluster 116.
However, because: (1) the digital distribution platform can include a large number of applications (for example, Google Play™ has over two million applications) and (2) only a limited number of applications can be presented on the web-based interface 106, determining that an application is relevant to the informational need based strictly on a measure of a degree of relevancy can have an unintended consequence of producing a response that includes applications having a small degree of variety from one another. Such a response can fail to fully capture an intent of the user who entered the query. For example, if: (1) the user entered a query for “job search book” and (2) “What Color Is Your Parachute?” is the job search book with the largest degree of relevancy, then a response in which (with reference to the web-based interface 106 illustrated in
Additionally, because: (1) a query for “job search book” can indicate that the user is interested in the topic of “job search” and (2) the applications presented on the web-based interface 106 in response to the query (“e1”, “j1”, “g1”, “m1”, “k2”, “h2”, “c2”, “f2”, “l3”, “e3”, “m3”, “p3”, “i4”, “o4”, “a4”, and “p4” illustrated in
In contrast, according to the disclosed subject matter, a personalized selection of applications can be produced, based on a history of one or more queries associated with a user account of a user, for presentation on a web-based interface. The disclosed production of the personalized selection of applications is rooted in information retrieval technology to overcome a problem specifically arising from: (1) producing a response to a query based strictly on a measure of a degree of relevancy between an application and an informational need represented by the query and (2) failing to include other techniques to determine topics that are of interest to the user. Advantageously, because the disclosed production of the personalized selection of applications can present applications directed to topics that are of interest to the user, the disclosed production of the personalized selection of applications can preclude, in some instances, a need for the user to enter a query. Such preclusion of a need to enter a query can free bandwidth between the user device 102 and the information retrieval platform 104 to convey information other than the query and a response to the query.
The disclosed production of the personalized selection of applications can be realized using a natural language process, a semantic-search process, a word embedding process, a formal concept analysis process, the like, or any combination thereof. A natural language process can refer to a technique to interact with a computer system using a natural human language. A semantic-search process can refer to a technique to improve an understanding of: (1) an intent of a user who enters a query, (2) a contextual meaning of a term as it appears in a searchable dataspace, (3) the like, or (4) any combination thereof. A word embedding process can refer to a set of language modeling and feature learning techniques in which one or more words are mapped from a vocabulary to vectors of real numbers in a low-dimensional space relative to a size of the vocabulary (i.e., a number of dimensions of the vectors can be less than a number of words included in the vocabulary). (For example, the Oxford English Dictionary has a vocabulary of more than 200,000 words.) Dimensions of a vector can represent various aspects of the one or more words that the vector represents. For example, the aspects can include: (1) a number of occurrences of those words in documents in a collection of documents, (2) other words and their displacements from those words in a context of a phrase, (3) the like, or (4) any combination of the foregoing. A formal concept analysis process can refer to a technique for deriving a concept hierarchy or formal ontology from a collection of data objects and their properties. A concept in a hierarchy can represent a set of data objects that share the same values for a specific set of the properties.
According to the disclosed subject matter, a first vector, which can represent one or more first words from a first query associated with the user account of the user, can be produced through a word embedding process. The first query can be a free-form text query. A second vector can be obtained. For example, the second vector can be retrieved from a knowledge base. For example, the knowledge base can include the Knowledge Graph (developed and maintained by Google Inc. of Mountain View, Calif.). The Knowledge Graph is a knowledge base used to enhance search results of a search engine with semantic-search information. A first determination can be made that a measure of similarity between the first vector and the second vector is greater than a threshold. The second vector can represent one or more second words. A second query can be transmitted to a digital distribution platform in response to the first determination. The second query can include the one or more first words (from the first query) and the one or more second words (derived from the second vector). (A response to a query with two sets of one or more words can be more likely to include applications having a large degree of variety from one another than a response to a query with only one set of the two sets of one or more words.) A response to the second query can be received from the digital distribution platform. The response to the second query can include an identification of a first application available for distribution by the digital distribution platform. A second determination can be made of an existence of a relationship between the first application and a second application available for distribution by the digital distribution platform. 
For example, the relationship can be based on: (1) an action performed on a user device associated with the user account and that involves the first application and the second application, (2) a same topic associated with the first application and the second application, (3) the like, or (4) any combination thereof. A cluster of applications can be generated in response to the second determination. The cluster of applications can include the first application and the second application. (A cluster with two applications can have a greater degree of variety than a cluster with only one of the two applications.) The personalized selection of applications can be produced, based on information about the cluster of applications, for presentation on the web-based interface for the user account associated with the first query.
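The second determination and the cluster generation can be sketched as follows. The relationship test mirrors the two bases listed above (an action on the user device that involved both applications, or a topic shared by both applications), but the data structures, function names, and sample data are hypothetical and are not taken from the disclosure.

```python
def is_related(first_app, second_app, co_actions, topics):
    """Hypothetical relationship test mirroring the two listed bases:
    (1) a single device action involved both applications, or
    (2) the two applications share at least one topic."""
    shared_action = frozenset((first_app, second_app)) in co_actions
    shared_topic = bool(topics.get(first_app, set()) & topics.get(second_app, set()))
    return shared_action or shared_topic


def generate_cluster(first_app, candidates, co_actions, topics):
    """Cluster the first application with every candidate related to it."""
    cluster = [first_app]
    for app in candidates:
        if app != first_app and is_related(first_app, app, co_actions, topics):
            cluster.append(app)
    return cluster


# Hypothetical device history and topic labels for one user account.
co_actions = {frozenset(("Learn Guitar Music", "Maps"))}
topics = {
    "Learn Guitar Music": {"guitar", "music"},
    "Getting Started on the Guitar": {"guitar"},
    "Learning Sports": {"sports"},
}
cluster = generate_cluster(
    "Learn Guitar Music",
    ["Maps", "Getting Started on the Guitar", "Learning Sports"],
    co_actions,
    topics,
)
print(cluster)  # ['Learn Guitar Music', 'Maps', 'Getting Started on the Guitar']
```

Maps joins the cluster through the shared device action, Getting Started on the Guitar through the shared "guitar" topic, and Learning Sports through neither.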
In general, each of the information retrieval platform 104, the digital distribution platform 302, and the application host platform 306 can be a computer-implemented platform configured to automatically perform some or all of the functions disclosed herein. The information retrieval platform 104 can be, for example, a combination of hardware architecture, operating system, runtime libraries, and/or computer software or code object to support an information retrieval system. In an implementation, the information retrieval platform 104 can be configured specifically to support information retrieval operations. The digital distribution platform 302 can be, for example, a combination of hardware architecture, operating system, runtime libraries, and/or computer software or code object to support a digital distribution system. In an implementation, the digital distribution platform 302 can be configured specifically to support digital distribution operations. The application host platform 306 can be, for example, a combination of hardware architecture, operating system, runtime libraries, and/or computer software or code object to support an application host system. In an implementation, the application host platform 306 can be configured specifically to support application host operations. Alternatively, the information retrieval platform 104 and the digital distribution platform 302 can be combined in a platform 310. Alternatively, the information retrieval platform 104, the digital distribution platform 302, and the knowledge base 304 can be combined in a platform 312.
In general, the user device 102 can be, for example, any suitable electronic client device, such as a smartphone, a cellular phone, a personal digital assistant (PDA), a wireless communication device, a handheld device, a desktop computer, a laptop computer, a netbook, a tablet computer, a web portal, a digital video recorder, a video game console, an e-book reader, etc. The user device 102 can be associated with one or more users. Likewise, the user device 102 can be a plurality of user devices and a single user can be associated with one or more of the plurality of user devices.
The network 308 can be, for example, a telecommunications network configured to allow computers to exchange data. Connections between elements of the distributed computing system 300 via the network 308 can be established using cable media, wireless media, or both. Data traffic on the network 308 can be organized according to a variety of communications protocols including, but not limited to, the Internet Protocol Suite (Transmission Control Protocol/Internet Protocol (TCP/IP)), the Institute of Electrical and Electronics Engineers (IEEE) 802 protocol suite, the synchronous optical networking (SONET) protocol, the Asynchronous Transfer Mode (ATM) switching technique, the like, or any combination thereof. In an aspect, the network 308 can include the Internet.
The collection of data objects 400 can include, for descriptive purposes herein, a data object 402 for the application Learning to Sing Music, a data object 404 for the application Singing Made Easy, a data object 406 for the application Basic Singing, a data object 408 for the application Beginning Guitar, a data object 410 for the application Learn Guitar Music, a data object 412 for the application Getting Started on the Guitar, a data object 414 for the application Maps, a data object 416 for the application How to Make a Guitar, a data object 418 for the application Beginning Banjo, a data object 420 for the application Beginning Instruments, a data object 422 for the application Accompanying Piano, and a data object 424 for the application Learning Sports.
Returning to
Returning to
Returning to
Therefore, in the vector 800, the value for the dimension “fun, −2” can be one, the value for the dimension “fun, −1” can be zero, the value for the dimension “is, −3” can be one, the value for the dimension “it, −4” can be one, the value for the dimension “music, +2” can be zero, the value for the dimension “music, +3” can be one, the value for the dimension “play, +2” can be one, the value for the dimension “play, +3” can be zero, the value for the dimension “to, −2” can be zero, the value for the dimension “to, −1” can be one, the value for the dimension “to, +1” can be one, and the value for the dimension “to, +2” can be zero.
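The values above can be reproduced by a short sketch that checks, for each (word, displacement) dimension, whether that word sits at that offset from a target word. The phrase "it is fun to learn to play music" and the target word "learn" are assumptions chosen for illustration, because the figure defining the actual phrase is not reproduced here; with them, the sketch yields exactly the twelve values listed.

```python
def displacement_vector(tokens, target_index, dimensions):
    """Set each (context word, displacement) dimension to 1 if that word
    appears at that offset from the target word, and to 0 otherwise."""
    vector = {}
    for word, offset in dimensions:
        position = target_index + offset
        present = 0 <= position < len(tokens) and tokens[position] == word
        vector[(word, offset)] = 1 if present else 0
    return vector


# Dimensions in the order listed for the vector 800.
DIMENSIONS = [
    ("fun", -2), ("fun", -1), ("is", -3), ("it", -4),
    ("music", +2), ("music", +3), ("play", +2), ("play", +3),
    ("to", -2), ("to", -1), ("to", +1), ("to", +2),
]

# Hypothetical phrase and target word (assumptions for illustration).
tokens = "it is fun to learn to play music".split()
vec = displacement_vector(tokens, tokens.index("learn"), DIMENSIONS)
values = [vec[d] for d in DIMENSIONS]
print(values)  # [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0]
```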
One of skill in the art in light of the description herein understands other dimensions that can be used for the first vector besides the dimensions illustrated in
Returning to
At an operation 612, a second query can be transmitted, from the processor to the digital distribution platform, in response to a first determination. The second query can include the one or more first words (from the first query) and the one or more second words (derived from the second vector). (A response to a query with two sets of one or more words can be more likely to include applications having a large degree of variety from one another than a response to a query with only one set of the two sets of one or more words.) For example, the second query can include “learn” and “music” (from the first query) and “play” and “guitar” (derived from the second vector).
The first determination can be that a measure of similarity between the first vector and a second vector is greater than a second threshold. The measure of similarity can include, for example: (1) a cosine similarity between the first vector and the second vector, (2) a product of the first vector multiplied by a weight, (3) the like, or (4) any combination thereof. A value of the weight can be determined, for example, by: (1) a part of speech of one of the one or more first words (e.g., noun, pronoun, adjective, verb, adverb, preposition, conjunction, or interjection), (2) a number of occurrences of that first word in documents in a collection of documents, (3) the like, or (4) any combination thereof.
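The similarity test and the construction of the second query can be sketched as follows. Cosine similarity is implemented directly. Because multiplying a vector by a positive scalar does not change its cosine similarity, this sketch assumes the weight instead scales the similarity score. The part-of-speech weights, vectors, and threshold are hypothetical values chosen so that the example reproduces the second query “learn music play guitar” described above.

```python
import math

# Hypothetical part-of-speech weights; the text names part of speech as one
# possible basis for the weight but gives no values.
POS_WEIGHTS = {"noun": 1.0, "verb": 1.0, "adjective": 0.8, "preposition": 0.2}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length real vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(first_vector, second_vector, weight=1.0):
    """Assumed combination: the cosine similarity scaled by a scalar weight."""
    return weight * cosine_similarity(first_vector, second_vector)


def build_second_query(first_words, first_vector, candidates, threshold, weight=1.0):
    """Append the second words of every candidate whose similarity exceeds the threshold."""
    terms = list(first_words)
    for second_words, second_vector in candidates:
        if similarity(first_vector, second_vector, weight) > threshold:
            for word in second_words:
                if word not in terms:
                    terms.append(word)
    return " ".join(terms)


first_words = ["learn", "music"]
first_vector = [1.0, 0.0, 1.0, 1.0]                # hypothetical embedding
candidates = [
    (["play", "guitar"], [1.0, 0.0, 1.0, 0.5]),    # hypothetical, similar
    (["maps"], [0.0, 1.0, 0.0, 0.0]),              # hypothetical, orthogonal
]
second_query = build_second_query(
    first_words, first_vector, candidates, threshold=0.8, weight=POS_WEIGHTS["verb"]
)
print(second_query)  # learn music play guitar
```

The first candidate scores about 0.96 against the first vector and passes the 0.8 threshold; the orthogonal candidate scores 0 and is ignored.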
One of skill in the art in light of the description herein understands other techniques that can be used to make a determination of similarity between vectors besides the techniques illustrated in
Returning to
At an optional operation 616, information about a second application can be retrieved, by the processor, from the digital distribution platform. The second application can be available for distribution by the digital distribution platform. For example, the information about the second application can be used by the processor to perform the operation 618 illustrated in
In
For example, with reference to
For example, with reference to
For example, with reference to
For example, with reference to
Therefore, the cluster of applications for Darla can include the application Learn Guitar Music, the application Maps, and the application Getting Started on the Guitar; the cluster of applications for Brad can include the application Learn Guitar Music, the application How to Make a Guitar, and the application Getting Started on the Guitar.
Returning to
At the optional operation 622, a concept of data objects for applications available for distribution by the digital distribution platform (i.e., a concept) can be determined, by the processor, through a formal concept analysis process. The concept can include a set of data objects from a population of data objects. The set of data objects can be defined by a set of specific words included in an attribute field of each data object in the set of data objects.
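A minimal sketch of the concept computation follows. It assumes, for illustration only, that each data object's attribute field is its application title (lower-cased, split on whitespace, with a small hypothetical stop list removed), and it uses a subset of the data objects 402 to 424. Concept intents are found by closing the object attribute sets under intersection, which is one standard way to enumerate every formal concept of a small context; it is not presented as the implementation used here.

```python
STOP_WORDS = {"a", "on", "the", "to"}  # hypothetical stop list


def attribute_words(title):
    """Treat the application title as the attribute field (an assumption)."""
    return frozenset(w for w in title.lower().split() if w not in STOP_WORDS)


def formal_concepts(context):
    """Return every formal concept (extent, intent) of an object -> attributes context."""
    all_attributes = frozenset().union(*context.values())
    intents = {all_attributes}
    for object_intent in context.values():
        # Closing the family under intersection yields exactly the concept intents.
        intents |= {object_intent & existing for existing in intents}
    concepts = []
    for intent in intents:
        # The extent is every object whose attributes include the whole intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


TITLES = [
    "Learning to Sing Music", "Singing Made Easy", "Basic Singing",
    "Beginning Guitar", "Learn Guitar Music", "Getting Started on the Guitar",
    "Beginning Banjo", "Beginning Instruments",
]
context = {title: attribute_words(title) for title in TITLES}
by_intent = {intent: extent for extent, intent in formal_concepts(context)}
print(sorted(by_intent[frozenset({"guitar"})]))
# ['Beginning Guitar', 'Getting Started on the Guitar', 'Learn Guitar Music']
```

Because no stemming is applied, "sing" and "singing" remain distinct attributes; a practical system would likely normalize word forms first. This enumeration also grows quickly with the number of data objects, so it suits only small contexts.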
For example, with reference to
Optionally, the concept can include a merger of a first concept and a second concept. (For example, several concepts may be determined through performance of the operation 622.) The merger can be produced by merging the first concept and the second concept.
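How the merger is computed is not specified. One possible reading, sketched here as an assumption, takes the union of the two extents. For comparison, the sketch also computes the join (supremum) of the two concepts in the formal concept lattice, whose intent is the intersection of the two intents and whose extent is every data object carrying that intent. The context and both concepts are hypothetical.

```python
# Hypothetical context: application -> words in its attribute field.
CONTEXT = {
    "Beginning Guitar": frozenset({"beginning", "guitar"}),
    "Learn Guitar Music": frozenset({"learn", "guitar", "music"}),
    "Getting Started on the Guitar": frozenset({"getting", "started", "guitar"}),
    "Beginning Banjo": frozenset({"beginning", "banjo"}),
    "Beginning Instruments": frozenset({"beginning", "instruments"}),
    "Learning Sports": frozenset({"learning", "sports"}),
}


def merge_by_union(first, second):
    """Assumed merger: union of the extents, shared attributes as the intent."""
    (extent_a, intent_a), (extent_b, intent_b) = first, second
    return extent_a | extent_b, intent_a & intent_b


def lattice_join(first, second, context):
    """FCA supremum: intersect the intents, then collect every object having that intent."""
    intent = first[1] & second[1]
    extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
    return extent, intent


guitar = (frozenset({"Beginning Guitar", "Learn Guitar Music",
                     "Getting Started on the Guitar"}), frozenset({"guitar"}))
beginning = (frozenset({"Beginning Guitar", "Beginning Banjo",
                        "Beginning Instruments"}), frozenset({"beginning"}))

merged = merge_by_union(guitar, beginning)
joined = lattice_join(guitar, beginning, CONTEXT)
print(len(merged[0]), len(joined[0]))  # 5 6
```

Here the two concepts share no attribute, so the lattice join falls back to the empty intent and pulls in every data object (including Learning Sports), whereas the union keeps only the five applications drawn from the two concepts. Which behavior is preferable depends on how broad the resulting cluster should be.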
For example, with reference to
Returning to
For example, with reference to
Returning to
Returning to
For example, with reference to
Therefore, the cluster of applications for Darla can include the application Learn Guitar Music, the application Maps, the application Getting Started on the Guitar, the application Learning to Sing Music, the application Singing Made Easy, the application Basic Singing, the application Beginning Guitar, the application Beginning Instruments, and the application Accompanying Piano; the cluster of applications for Brad can include the application Learn Guitar Music, the application How to Make a Guitar, the application Getting Started on the Guitar, the application Learning to Sing Music, the application Singing Made Easy, the application Basic Singing, the application Beginning Guitar, the application Beginning Instruments, and the application Accompanying Piano.
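The extension of each relationship-based cluster by a concept can be sketched as an order-preserving union, so that an application contained in both, such as Getting Started on the Guitar, appears only once. The concept extent and its ordering below are assumptions inferred from the applications added in the paragraph above, since the figure defining the concept is not reproduced here.

```python
def extend_cluster(cluster, concept_extent):
    """Append concept members not already in the cluster, preserving order."""
    extended = list(cluster)
    for app in concept_extent:
        if app not in extended:
            extended.append(app)
    return extended


# Relationship-based clusters, as listed earlier for Darla and Brad.
darla_cluster = ["Learn Guitar Music", "Maps", "Getting Started on the Guitar"]
brad_cluster = ["Learn Guitar Music", "How to Make a Guitar", "Getting Started on the Guitar"]

# Hypothetical concept extent (assumed, not taken from the figure).
concept_extent = [
    "Learning to Sing Music", "Singing Made Easy", "Basic Singing",
    "Beginning Guitar", "Getting Started on the Guitar",
    "Beginning Instruments", "Accompanying Piano",
]

darla = extend_cluster(darla_cluster, concept_extent)
brad = extend_cluster(brad_cluster, concept_extent)
print(len(darla), len(brad))  # 9 9
```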
Returning to
Additionally, the information retrieval platform 104 can be configured to generate the cluster of applications for the user account associated with the first query. For example, the information retrieval platform 104 can generate the “Brad's interests” cluster 1402 (based on the first query having been entered by the user account associated with Brad) and the “Darla's interests” cluster 1404 (based on the first query having been entered by the user account associated with Darla). The “Brad's interests” cluster 1402 can include, for example, modified data objects associated with those applications, from the “games” cluster 110, the “movies” cluster 112, the “music” cluster 114, and the “books” cluster 116, determined to be related to the topics that are of interest to Brad. For example, the “Brad's interests” cluster 1402 can include a modified data object “j4′” (for the application Learn Guitar Music) from the “books” cluster 116, a modified data object “f4′” (for the application How to Make a Guitar) from the “books” cluster 116, a modified data object “n1′” (for the application Getting Started on the Guitar) from the “games” cluster 110, and a modified data object “f3′” (for the application Learning to Sing Music) from the “music” cluster 114. Likewise, the “Darla's interests” cluster 1404 can include, for example, modified data objects associated with those applications, from the “games” cluster 110, the “movies” cluster 112, the “music” cluster 114, and the “books” cluster 116, determined to be related to the topics that are of interest to Darla. For example, the “Darla's interests” cluster 1404 can include a modified data object “j4″” (for the application Learn Guitar Music) from the “books” cluster 116, a modified data object “c4″” (for the application Maps) from the “books” cluster 116, a modified data object “n1″” (for the application Getting Started on the Guitar) from the “games” cluster 110, and a modified data object “l2″” (for the application Singing Made Easy) from the “movies” cluster 112.
The user device 102 illustrated in
Because the applications presented on the web-based interface 106 can include the personalized selection of applications 1406, the applications presented on the web-based interface 106 may be directed to topics of interest to Brad. Because the applications presented on the web-based interface 106 may be directed to topics of interest to Brad, the web-based interface 106 with the personalized selection of applications 1406 can preclude, in some instances, a need for Brad to enter a query. Such preclusion of a need to enter a query can free bandwidth between the user device 102 and the information retrieval platform 104 to convey information other than the query and a response to the query.
The processor 1602 can be configured to produce, through a word embedding process, a first vector. The first vector can represent one or more first words, and the one or more first words can be from a first query. The first query can be a free-form text query. The word embedding process can include, for example: (1) a neural network process, (2) a process to reduce dimensions of a word co-occurrence matrix, (3) a process that uses a probabilistic model, (4) a process to represent the one or more first words in terms of a context in which the one or more first words are used, (5) the like, or (6) any combination thereof. A dimension of the first vector can include, for example: (1) a number of occurrences of the one or more first words in documents in a collection of documents, (2) another word and a displacement of the other word from one of the one or more first words in a context of a phrase, (3) the like, or (4) any combination thereof.
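The two kinds of vector dimensions listed above can be illustrated with a minimal, count-based sketch. It assumes documents are already tokenized into lowercase word lists, and the function names `document_count_vector` and `displacement_vector` are hypothetical. These raw count vectors are not themselves a learned embedding; a dimension-reduction or neural process would typically be applied on top of them.

```python
from collections import Counter
from typing import Dict, List, Tuple

def document_count_vector(word: str, documents: List[List[str]]) -> List[int]:
    """One dimension per document: the number of occurrences of `word` in it."""
    return [doc.count(word) for doc in documents]

def displacement_vector(
    word: str, documents: List[List[str]], window: int = 2
) -> Dict[Tuple[str, int], int]:
    """Sparse vector whose dimensions are (other word, signed displacement) pairs."""
    counts: Counter = Counter()
    for doc in documents:
        for i, token in enumerate(doc):
            if token != word:
                continue
            # Look at neighbors up to `window` positions before and after `word`.
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or j < 0 or j >= len(doc):
                    continue
                counts[(doc[j], offset)] += 1
    return dict(counts)

docs = [
    ["learn", "guitar", "chords"],
    ["guitar", "lessons", "for", "kids"],
    ["sing", "along"],
]
print(document_count_vector("guitar", docs))  # [1, 1, 0]
print(displacement_vector("guitar", docs, window=1))
# {('learn', -1): 1, ('chords', 1): 1, ('lessons', 1): 1}
```

Keying the sparse vector by signed displacement, rather than by the neighbor word alone, keeps "learn guitar" distinguishable from "guitar learn".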
The processor 1602 can be configured to determine that a measure of similarity between the first vector and a second vector is greater than a first threshold. The second vector can represent one or more second words. The measure of similarity can include, for example: (1) a cosine similarity between the first vector and the second vector, (2) a product of the first vector multiplied by a weight, (3) the like, or (4) any combination thereof. A value of the weight can be determined, for example, by: (1) a part of speech of one of the one or more first words (e.g., noun, pronoun, adjective, verb, adverb, preposition, conjunction, or interjection), (2) a number of occurrences of that word in documents in a collection of documents, (3) the like, or (4) any combination thereof.
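The following is a minimal sketch of a weighted cosine-similarity test against a threshold. The part-of-speech weight table is hypothetical, and the IDF-style factor is one assumed way of turning a document-occurrence count into a weight. Because cosine similarity is unchanged when a vector is scaled, the sketch applies the weight to the similarity score rather than to the vector itself; that choice is an assumption, not the only reading of item (2).

```python
import math
from typing import Dict

Vector = Dict[str, float]

# Hypothetical part-of-speech weights, chosen only for illustration.
POS_WEIGHTS = {"noun": 1.0, "verb": 0.8, "adjective": 0.6, "preposition": 0.1}

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def word_weight(part_of_speech: str, doc_frequency: int, num_docs: int) -> float:
    """Weight from the word's part of speech and how many documents contain it.

    Rarer words (lower doc_frequency) receive a larger, IDF-style factor.
    """
    pos_weight = POS_WEIGHTS.get(part_of_speech, 0.5)
    idf = math.log((1 + num_docs) / (1 + doc_frequency)) + 1.0
    return pos_weight * idf

def exceeds_threshold(first: Vector, second: Vector, weight: float, threshold: float) -> bool:
    """Apply the weight to the cosine score, since cosine ignores vector length."""
    return weight * cosine_similarity(first, second) > threshold
```

For example, `exceeds_threshold({"guitar": 1.0}, {"guitar": 2.0}, word_weight("noun", 0, 0), 0.9)` evaluates to `True`, because the vectors point in the same direction and the noun weight is 1.0.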
The processor 1602 can be configured to determine an existence of a relationship between a first application and a second application. The first application and the second application can be available for distribution by the digital distribution platform 302.
The processor 1602 can be configured to generate, in response to a first determination, a cluster of applications, the cluster of applications including the first application and the second application. The first determination can be of the existence of the relationship. The existence of the relationship can include, for example: (1) an indication that the first application was opened on a user device, associated with a user account associated with the first query, at a first time, and the second application was opened on the user device at a second time; (2) an indication that the first application was installed on the user device at a third time, and the second application was installed on the user device at a fourth time; (3) an indication that the first application and the second application are related to a same topic; (4) the like; or (5) any combination thereof. Optionally, the first time can be different from the second time, and the first time and the second time can be within a first duration of time. Optionally, the third time can be different from the fourth time, and the third time and the fourth time can be within a second duration of time.
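The three example relationship tests above (opened close in time, installed close in time, or sharing a topic) can be sketched as follows. The `AppRecord` structure, its field names, and the `cluster_for` helper are hypothetical; the sketch also enforces the optional condition that the two times differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set

@dataclass
class AppRecord:
    """Hypothetical per-device record for one application."""
    name: str
    opened_at: Optional[datetime] = None
    installed_at: Optional[datetime] = None
    topics: Set[str] = field(default_factory=set)

def _close_in_time(t1: Optional[datetime], t2: Optional[datetime], window: timedelta) -> bool:
    # Both times must exist, differ, and fall within `window` of each other.
    return t1 is not None and t2 is not None and t1 != t2 and abs(t1 - t2) <= window

def related(a: AppRecord, b: AppRecord, open_window: timedelta, install_window: timedelta) -> bool:
    """True if any of the three example relationship conditions holds."""
    return (
        _close_in_time(a.opened_at, b.opened_at, open_window)
        or _close_in_time(a.installed_at, b.installed_at, install_window)
        or bool(a.topics & b.topics)
    )

def cluster_for(seed: AppRecord, candidates: List[AppRecord],
                open_window: timedelta, install_window: timedelta) -> List[str]:
    """Names of the seed application plus every candidate related to it."""
    return [seed.name] + [
        c.name
        for c in candidates
        if c.name != seed.name and related(seed, c, open_window, install_window)
    ]
```

The first duration and second duration correspond to `open_window` and `install_window`; keeping them separate allows, for instance, a short window for opens and a longer one for installs.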
The processor 1602 can be configured to produce, based on information about the cluster of applications, the personalized selection of applications for presentation on the web-based interface for a user account associated with the first query.
The communications circuitry 1604 can be configured to transmit, to the digital distribution platform 302 and in response to a second determination, a second query. The second query can include the one or more first words (from the first query) and the one or more second words (represented by the second vector). The second determination can be that the measure of similarity is greater than the first threshold.
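A minimal sketch of forming the second query is shown below. It assumes candidate second words arrive with precomputed sparse vectors, and it appends only the single most similar candidate whose cosine similarity exceeds the threshold. The function name `build_second_query` and the single-best-candidate policy are assumptions for illustration; transmitting the resulting string to the digital distribution platform is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Dict[str, float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_second_query(
    first_words: Sequence[str],
    first_vector: Vector,
    candidates: List[Tuple[Sequence[str], Vector]],
    threshold: float,
) -> Optional[str]:
    """Append the most similar candidate words whose similarity exceeds `threshold`.

    Returns None when no candidate passes, in which case no second query is sent.
    """
    best_words: Optional[Sequence[str]] = None
    best_score = threshold
    for words, vector in candidates:
        score = cosine(first_vector, vector)
        if score > best_score:
            best_words, best_score = words, score
    if best_words is None:
        return None
    return " ".join(list(first_words) + list(best_words))
```

Returning `None` rather than the unchanged first query makes the "second determination" explicit: the caller sends a second query only when the threshold is exceeded.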
The communications circuitry 1604 can be configured to receive, from the digital distribution platform 302, a response to the second query. The response to the second query can include an identification of the first application.
The memory 1606 can be configured to store the one or more first words, the first vector, the first query, the one or more second words, the second vector, the second query, the measure of similarity, the first threshold, the response to the second query, and the information about the cluster of applications.
Optionally, the processor 1602 can be further configured to retrieve the first query from the digital distribution platform 302.
Optionally, the processor 1602 can be further configured to produce a modified first query by: (1) changing a specific tense of a first specific word of the one or more first words, (2) changing a specific grammatical number of a second specific word of the one or more first words, (3) removing a stop word from the first query, (4) the like, or (5) any combination thereof. Optionally, the processor 1602 can be further configured to determine that a number of occurrences, in the digital distribution platform, of the modified first query is greater than a second threshold.
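The query modification and popularity check can be sketched with small lookup tables. The stop-word list and the tense and grammatical-number maps below are hypothetical stand-ins for a real stemmer or lemmatizer, and the query log is assumed to be a list of raw query strings from the digital distribution platform.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical lookup tables; a production system would likely use a lemmatizer.
STOP_WORDS = {"a", "an", "the", "to", "for", "how"}
TENSE_MAP = {"learned": "learn", "learning": "learn", "played": "play"}
NUMBER_MAP = {"guitars": "guitar", "songs": "song", "lessons": "lesson"}

def modify_query(query: str) -> str:
    """Lowercase, drop stop words, and normalize tense and grammatical number."""
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    normalized: List[str] = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        token = TENSE_MAP.get(token, token)
        token = NUMBER_MAP.get(token, token)
        normalized.append(token)
    return " ".join(normalized)

def is_popular(query: str, query_log: Iterable[str], threshold: int) -> bool:
    """True if the modified query occurs in the log more than `threshold` times."""
    counts = Counter(modify_query(q) for q in query_log)
    return counts[modify_query(query)] > threshold
```

Normalizing every logged query the same way lets "Learning guitars" and "learned the guitar" count toward the same modified query, "learn guitar".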
Optionally, the processor 1602 can be further configured to retrieve the second vector from a knowledge base. A knowledge base can be, for example, a technology used to store complex structured and unstructured information used by a computer system. The knowledge base can include, for example, the Knowledge Graph (developed and maintained by Google Inc. of Mountain View, Calif.).
Optionally, the processor 1602 can be further configured to retrieve information about the second application from the digital distribution platform 302.
Optionally, the processor 1602 can be further configured to determine, through a formal concept analysis process, a concept of data objects for applications available for distribution by the digital distribution platform. The concept can include a set of data objects from a population of data objects. The set of data objects can be defined by a set of specific words included in an attribute field of each data object in the set of data objects.
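In formal concept analysis, a concept pairs an extent (a set of data objects) with an intent (the set of words shared by the attribute fields of all of those objects), and each determines the other. The brute-force sketch below enumerates every concept of a small formal context by closing each subset of objects. The context format (object identifier mapped to a set of attribute words) is an assumption, and the exponential enumeration is suitable only for illustration; practical systems typically use algorithms such as NextClosure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent: data objects, intent: words)

def intent_of(objects: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Words shared by the attribute fields of all given data objects."""
    objects = list(objects)
    if not objects:
        # By convention, the empty set of objects shares every word.
        return frozenset(set().union(*context.values()))
    return frozenset(set.intersection(*(set(context[o]) for o in objects)))

def extent_of(words: FrozenSet[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Data objects whose attribute field contains every given word."""
    return frozenset(o for o, attrs in context.items() if words <= attrs)

def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """All (extent, intent) pairs, found by closing every subset of objects."""
    concepts = set()
    objects = list(context)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = intent_of(subset, context)
            concepts.add((extent_of(intent, context), intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))
```

With attribute fields `{"j4": {"guitar", "music", "learn"}, "n1": {"guitar", "learn", "beginner"}, "f3": {"sing", "music", "learn"}}`, one resulting concept is the set of data objects `{"j4", "n1"}` defined by the set of specific words `{"guitar", "learn"}`.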
Optionally, the processor 1602 can be further configured to retrieve, from the digital distribution platform 302, information from the data objects.
Optionally, the concept can include a merger of a first concept and a second concept.
For example, the processor 1602 can be further configured to produce the merger by: (1) calculating a first quotient by dividing the number of words in the set of specific words included in the attribute field of each data object included in both the first concept and the second concept by the number of words in the set of specific words included in the attribute field of each data object included in the first concept, (2) calculating a second quotient by dividing the number of words in the set of specific words included in the attribute field of each data object included in both the first concept and the second concept by the number of words in the set of specific words included in the attribute field of each data object included in the second concept, and (3) producing, in response to a third determination, the merger. The third determination can be that at least one of the first quotient or the second quotient is greater than or equal to a third threshold.
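The merger test can be sketched under one interpretation: the words included in the attribute field of every data object in both concepts are taken to be the intersection of the two concepts' word sets, and the merged concept is taken to combine both sets of data objects under those shared words. Both readings, and the function name `merge_concepts`, are assumptions for illustration.

```python
from typing import FrozenSet, Optional, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent: data objects, intent: words)

def merge_concepts(first: Concept, second: Concept, threshold: float) -> Optional[Concept]:
    """Merge two concepts if either shared-word quotient reaches `threshold`."""
    extent1, intent1 = first
    extent2, intent2 = second
    if not intent1 or not intent2:
        return None  # Quotients are undefined for an empty word set.
    # Words present in the attribute field of every data object in both concepts.
    shared = intent1 & intent2
    first_quotient = len(shared) / len(intent1)
    second_quotient = len(shared) / len(intent2)
    if first_quotient >= threshold or second_quotient >= threshold:
        return (extent1 | extent2, shared)
    return None
```

Using "at least one" quotient, rather than both, lets a concept with a small word set be absorbed into a broader concept that contains most of its words.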
Optionally, the processor 1602 can be further configured to modify, in response to a fourth determination, the cluster of applications to include the applications associated with the data objects included in the concept. The fourth determination can be that a word, of the set of specific words, matches at least one of the one or more first words or the one or more second words.
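A minimal sketch of the fourth determination and the resulting cluster modification follows. It assumes all words are already normalized to lowercase so that a plain set intersection counts as a match, and the function name `expand_cluster` is hypothetical.

```python
from typing import Iterable, List, Set

def expand_cluster(
    cluster: List[str],
    concept_apps: Iterable[str],
    concept_words: Set[str],
    first_words: Set[str],
    second_words: Set[str],
) -> List[str]:
    """Add the concept's applications if any concept word matches a query word."""
    if not concept_words & (first_words | second_words):
        return list(cluster)  # No match: the cluster is left unchanged.
    expanded = list(cluster)
    for app in concept_apps:
        if app not in expanded:  # Avoid listing an application twice.
            expanded.append(app)
    return expanded
```

Returning a new list leaves the original cluster untouched, which keeps the unmodified cluster available if the expansion is later discarded.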
In situations in which the systems discussed here collect personal information about users, or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, a user's preferences, or a user's current location), or to control whether and/or how to receive content from the content server that may be more relevant to the user. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over how information is collected about the user and used by a system as disclosed herein.
Aspects of the presently disclosed subject matter may be implemented in and used with a variety of component and network architectures.
The bus 21 can allow data communication between the central processor 24 and one or more memory components, which can include RAM, ROM, and other memory, as previously noted. Typically, RAM can be the main memory into which an operating system and application programs are loaded. A ROM or flash memory component can contain, among other code, the basic input-output system (BIOS), which can control basic hardware operation such as the interaction with peripheral components. Applications resident with the computer 20 can generally be stored on and accessed via a computer-readable medium, such as a hard disk drive (e.g., fixed storage 23), an optical drive, a floppy disk, or another storage medium.
The fixed storage 23 can be integral with the computer 20 or can be separate and accessed through other interfaces. The network interface 29 can provide a direct connection to a remote server via a wired or wireless connection. The network interface 29 can provide such connection using any suitable technique and protocol as is readily understood by one of skill in the art, including digital cellular telephone, WiFi™, Bluetooth®, near-field, and the like. For example, the network interface 29 can allow the computer to communicate with other computers via one or more local, wide-area, or other communication networks, as described in further detail below.
Many other devices or components (not shown) can be connected in a similar manner (e.g., document scanners, digital cameras and so on). Conversely, all of the components illustrated in
More generally, various aspects of the presently disclosed subject matter can include or be realized in the form of computer-implemented processes and apparatuses for practicing those processes. Aspects also can be realized in the form of a computer program product having computer program code containing instructions embodied in non-transitory and/or tangible media, such as floppy diskettes, CD-ROMs, hard drives, universal serial bus (USB) drives, or any other machine readable storage medium, such that when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing aspects of the disclosed subject matter. Aspects also can be realized in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, such that when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing aspects of the disclosed subject matter. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.
In some configurations, a set of computer-readable instructions stored on a computer-readable storage medium can be implemented by a general-purpose processor, which can transform the general-purpose processor or a device containing the general-purpose processor into a special-purpose device configured to implement or carry out the instructions. Aspects can be implemented using hardware that can include a processor, such as a general-purpose microprocessor and/or an application-specific integrated circuit (ASIC) that embodies all or part of the techniques according to aspects of the disclosed subject matter in hardware and/or firmware. The processor can be coupled to memory, such as RAM, ROM, flash memory, a hard disk, or any other device capable of storing electronic information. The memory can store instructions adapted to be executed by the processor to perform the techniques according to aspects of the disclosed subject matter.
The foregoing description, for purposes of explanation, has been described with reference to specific aspects. However, the illustrative discussions above are not intended to be exhaustive or to limit aspects of the disclosed subject matter to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The aspects were chosen and described in order to explain the principles of aspects of the disclosed subject matter and their practical applications, to thereby enable others skilled in the art to utilize those aspects as well as various aspects with various modifications as may be suited to the particular use contemplated.