1. Technical Field
The present teaching relates to methods, systems, and programming for information processing. More specifically, the present teaching is directed to methods, systems, and programming for representation of information.
2. Discussion of Technical Background
Text documents that arrive in a sequence are common in real data and can arise in various contexts. For example, consider Web pages visited by users in random walks along hyperlinks, streams of click-through URLs associated with a query in a search engine, publications of an author in chronological order, threaded posts in online discussion forums, answers to a question in online knowledge-sharing communities, or emails exchanged under the same subject, to name a few. The co-occurrence of documents in a temporal sequence may reveal the relatedness between them, such as their semantic and topical similarity. In addition, the sequence of words within the documents introduces another rich and complex source of data, which can be leveraged to learn useful and insightful representations of information, such as documents and keywords.
The idea of distributed word representations has spurred many applications in natural language processing. For example, some known solutions learn vector representations of words by considering sentences and learning similar representations for words that either often appear in each other's neighborhood (e.g., vectors for “ham” and “cheese”) or do not often appear in each other's neighborhood but have similar neighborhoods (e.g., vectors for “Monday” and “Tuesday”). However, those solutions are not able to represent higher-level entities, such as documents or users, since they use a shallow neural network. This significantly limits the applicability of their method.
More recently, the concept of distributed representations has been extended beyond pure language words to phrases, sentences and paragraphs, general text-based attributes, descriptive text of images, and nodes in a network. For example, some known solutions define a vector for each document and consider this document vector to be in the neighborhood of all word tokens that belong to it. Thus, those known solutions are able to learn a document vector that in some sense summarizes the words within the document. However, those known solutions merely consider the specific document in which the words are contained, but not the global context of the specific document and words, e.g., contextual documents in the document stream or users related to the content. In other words, those known solutions do not model contextual relationships between information at higher levels, e.g., documents, users, and/or user groups. Thus, such an architecture remains shallow.
Therefore, there is a need to provide an improved solution for representation of information to solve the above-mentioned problems.
The present teaching relates to methods, systems, and programming for information processing. Particularly, the present teaching is directed to methods, systems, and programming for representation of information.
In one example, a method, implemented on at least one computing device each having at least one processor, storage, and a communication platform connected to a network, for determining similarity between information is presented. A first piece of information and a second piece of information are received. Each of the first and second pieces of information relates to one word in a plurality of documents, one of the plurality of documents, or one of users to whom the plurality of documents are given. A model for estimating feature vectors of the first and second pieces of information is obtained. The model includes a first neural network model based, at least in part, on a first order of words within one of the plurality of documents and a second neural network model based, at least in part, on a second order in which at least some of the plurality of documents are given. Based on the model, a first feature vector of the first piece of information and a second feature vector of the second piece of information are estimated. A similarity between the first and second pieces of information is determined based on a distance between the first and second feature vectors.
In a different example, a system having at least one processor, storage, and a communication platform for determining similarity between information is presented. The system includes a data receiving module, a modeling module, an optimization module, and a similarity measurement module. The data receiving module is configured to receive a first piece of information and a second piece of information. Each of the first and second pieces of information relates to one word in a plurality of documents, one of the plurality of documents, or one of users to whom the plurality of documents are given. The modeling module is configured to obtain a model for estimating feature vectors of the first and second pieces of information. The model includes a first neural network model based, at least in part, on a first order of words within one of the plurality of documents and a second neural network model based, at least in part, on a second order in which at least some of the plurality of documents are given. The optimization module is configured to estimate, based on the model, a first feature vector of the first piece of information and a second feature vector of the second piece of information. The similarity measurement module is configured to determine a similarity between the first and second pieces of information based on a distance between the first and second feature vectors.
Other concepts relate to software for implementing the present teaching on determining similarity between information. A software product, in accord with this concept, includes at least one non-transitory machine-readable medium and information carried by the medium. The information carried by the medium may be executable program code data, parameters in association with the executable program code, and/or information related to a user, a request, content, or information related to a social group, etc.
In one example, a non-transitory machine-readable medium having information recorded thereon for determining similarity between information is presented. The recorded information, when read by the machine, causes the machine to perform a series of processes. A first piece of information and a second piece of information are received. Each of the first and second pieces of information relates to one word in a plurality of documents, one of the plurality of documents, or one of users to whom the plurality of documents are given. A model for estimating feature vectors of the first and second pieces of information is obtained. The model includes a first neural network model based, at least in part, on a first order of words within one of the plurality of documents and a second neural network model based, at least in part, on a second order in which at least some of the plurality of documents are given. Based on the model, a first feature vector of the first piece of information and a second feature vector of the second piece of information are estimated. A similarity between the first and second pieces of information is determined based on a distance between the first and second feature vectors.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present teachings may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
The methods, systems, and/or programming described herein are further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well known methods, procedures, systems, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.
The present disclosure describes method, system, and programming aspects of efficient and effective distributed representation of information, e.g., related concepts, realized as a specialized and networked system by utilizing one or more computing devices (e.g., mobile phone, personal computer, etc.) and network communications (wired or wireless). The method and system as disclosed herein introduce an algorithm that can simultaneously model documents from a stream as well as their residing natural language in a common lower-dimensional vector space. The method and system in the present teaching include a general unsupervised learning framework to uncover the latent structure of contextual documents, where feature vectors are used to represent documents and words in the same latent space. The method and system in the present teaching introduce hierarchical models where document vectors act as units in a context of document sequences and also as global contexts of word sequences contained within them. In the hierarchical models, the probability distribution of a document depends on the surrounding documents in the stream data. The models may be trained to predict words and documents in a sequence with maximum likelihood.
The vector representations (feature vectors) of documents and words learned by the models are useful for various applications in online businesses. For example, by means of measuring the distance in the joint vector space between document and word vectors, hybrid query tasks can be addressed: 1) given a query keyword, search for similar keywords to expand the query (useful in the search product); 2) given a keyword, search for relevant documents such as news stories (useful in document retrieval); 3) given a document, retrieve similar or related documents, useful for news stream personalization and document recommendation; and 4) automatically generate related words to tag or summarize a given document, useful in native advertising or document retrieval. All these tasks are essential elements of a number of online applications, including online search, advertising, and personalized recommendation. In addition, learned vector representations can be used to obtain state-of-the-art classification results. The proposed approach represents a step towards automatic organization, semantic analysis, and summarization of documents observed in sequences.
Moreover, the method and system in the present teaching are flexible, and it is straightforward to add more layers in order to learn additional representations for related concepts. The method and system in the present teaching are not limited to joint representations of documents and their content (words), and can be extended to higher levels of global contextual information, such as users and user groups. For example, using data with documents specific to a different set of users (or authors), more complex models can be built in the present teaching to additionally learn distributed representations of users. These extensions can be applied to, for example, personalized recommendation and social relationship mining.
In this example, the hierarchical structure also includes a “user layer” above the “document layer.” User 1 may be the person who creates or consumes the documents in the document sequence (Doc 1, Doc 2, Doc 3, Doc 4, . . . ). For example, the documents may be recommended to user 1 as a personalized content stream, or user 1 may actively browse those documents in this sequence. In any event, the profile of user 1, e.g., her/his declared or implied interests, demographic information, geographic information, etc., may be taken into consideration in modeling the lower-level concepts in the hierarchical structure, e.g., the distributed representations of the document sequence and/or the word sequences. In addition to user 1 who creates or consumes those documents in the document sequence, other related users may be represented in the user layer in a similar manner.
It is understood that context is not only provided by higher-level concepts to lower-level concepts as described above, but can also be provided by lower-level concepts to higher-level concepts. For example, the word sequence may be used as the context for modeling the representation of Doc 2 and/or other documents in the document sequence. In another example, the document sequence may be used as the context for estimating the profile of user 1 and/or other related users. In some embodiments, both higher-level concepts and lower-level concepts may serve as the global context together. For example, in modeling distributed representations of the document sequence, both related users and the content (word sequences) of those documents may be used as the global context.
The training documents in this example are given in a sequence. For example, if the documents are news articles, a document sequence can be a sequence of news articles sorted in the order in which the user read them. More specifically, assume that a set \mathcal{S} of S document sequences, \mathcal{S} = \{s_1, s_2, \ldots, s_S\}, is given, each sequence s_i consisting of N_i documents, s_i = (d_1, d_2, \ldots, d_{N_i}). Moreover, each document d_m is a sequence of T_m words, d_m = (w_1, w_2, \ldots, w_{T_m}). The hierarchical neural network models in this example simultaneously learn distributed representations of contextual documents and language words in a common vector space and represent each document and word as a continuous feature vector of dimensionality D. Suppose there are M unique documents in the training data set and W unique words in the vocabulary; then, during training, (M + W) \cdot D model parameters are learned.
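For illustration only, a minimal Python sketch of the parameter layout described above is given below; the sizes M, W, and D are hypothetical placeholders, and the two matrices together hold the (M + W) · D learned parameters (output vectors used by the softmax formulations below would be stored analogously).

    import numpy as np

    # Hypothetical sizes chosen only for illustration.
    M, W, D = 1000, 5000, 100  # unique documents, vocabulary words, vector dimensionality

    rng = np.random.default_rng(0)
    doc_vectors = rng.normal(scale=0.01, size=(M, D))   # one D-dimensional vector per document
    word_vectors = rng.normal(scale=0.01, size=(W, D))  # one D-dimensional vector per word

    # Total number of learned parameters, as stated above.
    assert doc_vectors.size + word_vectors.size == (M + W) * D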
The context of the document sequence and the natural language context are learned using the hierarchical neural network models of this example, where document vectors act not only as the units to predict their surrounding documents, but also as the global context of the word sequences within them. The second neural network model 204 learns the temporal context of the document sequence, based on the assumption that temporally closer documents in the document stream are statistically more dependent. The first neural network model 202 makes use of the contextual information of the word sequences. The two neural network models 202, 204 are connected by considering each document token as the global context for all words within the document. In this example, the document Dm is not only used in the second neural network model 204, but also serves as the global context for projecting the words within the document in the first neural network model 202.
In this example, given the sequences of documents, the objective of the hierarchical model is to maximize the average data log-likelihood,

\mathcal{L} = \frac{1}{S} \sum_{s \in \mathcal{S}} \sum_{d_m \in s} \Big( \alpha \sum_{w_t \in d_m} \log P(w_t \mid w_{t-c:t+c}, d_m) + \sum_{-b \le i \le b,\, i \ne 0} \log P(d_{m+i} \mid d_m) \Big),   (Equation 1)
where α is the weight that trades off between the log-likelihood of the word sequences and the log-likelihood of the document sequences (set to 1 in the experiments described below), b is the length of the training context for document sequences, and c is the length of the training context for word sequences. In this example, the continuous bag-of-words (CBOW) model is used as the first neural network model 202, and the continuous skip-gram (SG) model is used as the second neural network model 204. It is understood that any suitable neural network model, such as, but not limited to, an n-gram language model, a log-bilinear model, a log-linear model, the SG model, or the CBOW model, can be used in any layer, and the choice depends on the modalities of the problem at hand.
The CBOW model is a simplified neural language model without any non-linear hidden layers. A log-linear classifier is used to predict the current word based on the consecutive history and future words, whose vector representations are averaged as the input. More precisely, the objective of the CBOW model is to maximize the average log probability

\frac{1}{T} \sum_{t=1}^{T} \log P(w_t \mid w_{t-c:t+c}),

where c is the context length, and w_{t-c:t+c} is the subsequence (w_{t-c}, . . . , w_{t+c}) excluding w_t itself. The probability P(w_t \mid w_{t-c:t+c}) is defined using the softmax,

P(w_t \mid w_{t-c:t+c}) = \frac{\exp(\bar{v}^{\top} v'_{w_t})}{\sum_{w=1}^{W} \exp(\bar{v}^{\top} v'_{w})},

where v'_w is the output vector representation of word w, and \bar{v} is the averaged input vector representation of the context,

\bar{v} = \frac{1}{2c} \sum_{-c \le j \le c,\, j \ne 0} v_{w_{t+j}},

where v_w is the input vector representation of w.
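As a purely illustrative sketch of the CBOW probability defined above, the following Python code averages the input vectors of the context words and applies an explicit softmax over the output vectors; the function name and array layout are assumptions, and the hierarchical softmax approximation discussed later would replace the explicit normalization.

    import numpy as np

    def cbow_probability(context_ids, target_id, input_vecs, output_vecs):
        # P(w_t | w_{t-c:t+c}) with an explicit softmax, for illustration only.
        v_bar = input_vecs[context_ids].mean(axis=0)   # averaged context input vectors
        scores = output_vecs @ v_bar                   # one score per vocabulary word
        scores -= scores.max()                         # numerical stability
        probs = np.exp(scores) / np.exp(scores).sum()
        return probs[target_id]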
The SG model tries to predict the surrounding words within a certain distance based on the current word. The SG model defines its objective function as the exact counterpart of the CBOW model,

\frac{1}{T} \sum_{t=1}^{T} \log P(w_{t-c:t+c} \mid w_t).

Furthermore, the SG model simplifies the probability distribution by introducing an assumption that the contextual words w_{t-c:t+c} are independent given the current word w_t,

P(w_{t-c:t+c} \mid w_t) = \prod_{-c \le j \le c,\, j \ne 0} P(w_{t+j} \mid w_t),

with P(w_{t+j} \mid w_t) defined as

P(w_{t+j} \mid w_t) = \frac{\exp(v_{w_t}^{\top} v'_{w_{t+j}})}{\sum_{w=1}^{W} \exp(v_{w_t}^{\top} v'_{w})},

where v_w and v'_w are the input and output vectors of w, respectively. Increasing the range of the context c would generally improve the quality of the learned word vectors, but at the expense of higher computation cost. The SG model considers the surrounding words to be equally important, and in this sense the word order is not fully exploited, similar to the CBOW model.
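A similarly hedged sketch of the SG probability P(w_{t+j} | w_t) above, again with an explicit softmax and hypothetical names:

    import numpy as np

    def sg_probability(current_id, surrounding_id, input_vecs, output_vecs):
        # P(w_{t+j} | w_t) with an explicit softmax, for illustration only.
        scores = output_vecs @ input_vecs[current_id]
        scores -= scores.max()
        probs = np.exp(scores) / np.exp(scores).sum()
        return probs[surrounding_id]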
Returning to Equation 1, the probability of observing a surrounding document based on the current document, P(d_{m+i} \mid d_m), is defined as

P(d_{m+i} \mid d_m) = \frac{\exp(v_{d_m}^{\top} v'_{d_{m+i}})}{\sum_{d=1}^{M} \exp(v_{d_m}^{\top} v'_{d})},

where v_d and v'_d are the input and output vector representations of document d, respectively. The probability of observing a word depends not only on its surrounding words, but also on the specific document that the word belongs to. More precisely, the probability P(w_t \mid w_{t-c:t+c}, d_m) is defined as

P(w_t \mid w_{t-c:t+c}, d_m) = \frac{\exp(\bar{v}^{\top} v'_{w_t})}{\sum_{w=1}^{W} \exp(\bar{v}^{\top} v'_{w})},

where v'_{w_t} is the output vector representation of w_t, and \bar{v} is the averaged input vector representation of the context, which now also includes the input vector of the document d_m that the word belongs to,

\bar{v} = \frac{1}{2c+1} \Big( v_{d_m} + \sum_{-c \le j \le c,\, j \ne 0} v_{w_{t+j}} \Big).
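The following sketch, a direct extension of the CBOW example above with the same caveats (hypothetical names, explicit rather than hierarchical softmax), illustrates how the document's input vector joins the averaged context when computing P(w_t | w_{t-c:t+c}, d_m):

    import numpy as np

    def word_probability_with_doc(context_ids, doc_id, target_id,
                                  word_input_vecs, doc_input_vecs, word_output_vecs):
        # P(w_t | w_{t-c:t+c}, d_m): the document vector is averaged with the context words.
        stacked = np.vstack([word_input_vecs[context_ids], doc_input_vecs[doc_id]])
        v_bar = stacked.mean(axis=0)
        scores = word_output_vecs @ v_bar
        scores -= scores.max()
        probs = np.exp(scores) / np.exp(scores).sum()
        return probs[target_id]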
In some embodiments, the document-level model instead predicts P(d_m \mid d_{m-b:m-1}), i.e., the probability of the mth document in the sequence given its b preceding documents. This is reflected, for example, in the second neural network model 302 of the corresponding figure. In other embodiments, models that predict P(d_m \mid d_{m+1:m+b}), i.e., the probability of the mth document given its b succeeding documents, are applied.
From this example, it is understood that the inputs and outputs in each of the hierarchical neural network models for modeling each layer of concepts may be reversed as needed. For example, the inputs and outputs of the first neural network model 202 may be reversed in some embodiments such that it can learn the temporal context of the word sequence for the word Wt.
In this example, the first layer of the hierarchical neural network models is the first neural network model 402 for document content/words. On top of the first neural network model 402, the second neural network model 404 for documents is added and connected to the first neural network model 402 by the document Dm 406. Dm 406 may be the document that contains the word sequence in the first neural network model 402, as described above.
The third neural network model 410 for users and the second neural network model 404 are arranged in a cascade of models in this example. The third neural network model 410 is connected to the second neural network model 404 via the user Un 412. The documents in the second neural network model 404 may be specific to Un 412. For example, the documents may be a personalized content stream for Un 412, or Un 412 may be the author or consumer of the documents. Then, Un 412 could serve as the global context of contextual documents pertaining to that specific user, much like Dm 406 serves as the global context for words pertaining to that specific document. For example, a document may be predicted based on the surrounding documents, while also conditioning on a specific user. This variant model can be represented as P(d_m \mid d_{m-b:m-1}, u), where u denotes the indicator for the user. Learning vector representations of users would open doors for further improvement of personalization. The first, second, and third neural network models 402, 404, 410 may be viewed as a combined neural network model 414 for users, documents, and document content.
The fourth neural network model 416 for user groups is also part of the cascade of models in this example. The fourth neural network model 416 is connected to the third neural network model 410 via the user group Gk 418. The users in the third neural network model 410 may belong to Gk 418. For example, all the users may be in the same family. Then, Gk 418 could serve as the global context of contextual users pertaining to that specific user group, much like Dm 406 serves as the global context for words pertaining to that specific document and Un 412 serves as the global context for documents pertaining to that specific user. Learning vector representations of user groups would open doors for further improvement of social relationship mining. It is understood that the neural network models in this example may be continuously extended by cascading more neural network models for related concepts at other levels.
The hybrid query tasks that can be addressed by the hybrid query engine 602 in this example include: 1) given a query keyword, search for similar keywords to expand the query (useful in the search product); 2) given a keyword, search for relevant documents such as news stories (useful in document retrieval); 3) given a document, retrieve similar or related documents, useful for news stream personalization and document recommendation; and 4) automatically generate related words to tag or summarize a given document, useful in native advertising or document retrieval. All these tasks are essential elements of a number of online applications, including online search, advertising, and personalized recommendation.
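Because documents and keywords are embedded in the same vector space, each of the four query types above reduces to a nearest-neighbor lookup by vector similarity. A minimal sketch follows, assuming the learned vectors and identifiers are already available in the hypothetical arrays shown:

    import numpy as np

    def nearest(query_vec, candidate_vecs, candidate_ids, k=5):
        # Rank candidates by cosine similarity to the query vector.
        q = query_vec / np.linalg.norm(query_vec)
        c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
        top = np.argsort(-(c @ q))[:k]
        return [candidate_ids[i] for i in top]

    # Keyword expansion, document retrieval for a keyword, related-document retrieval,
    # and document tagging differ only in which vectors are passed as query and candidates.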
The optimization module 806 in this example is configured to estimate, based on the hierarchical neural network model 810, feature vectors of the input information. The feature vectors may be estimated by automatically optimizing the hierarchical neural network model 810. In some embodiments, the hierarchical neural network model 810 is optimized using stochastic gradient descent. In this embodiment, the hierarchical softmax approach is used for automatically optimizing the hierarchical neural network model 810. The hierarchical softmax approach reduces the time complexity to O(R log(W) + 2bM log(N)), where R is the total number of words in the document sequences. Instead of evaluating each distinct word or document in different entries in the output, the hierarchical softmax approach uses two binary trees, one with distinct documents as leaves and the other with distinct words as leaves. For each leaf node, there is a unique path assigned, and the path is encoded using binary digits. To construct the tree structure, a Huffman tree may be used, where more frequent words (or documents) in the data have shorter codes. The internal tree nodes are represented as real-valued vectors of the same dimensionality as the word and document vectors. More precisely, the hierarchical softmax approach expresses the probability of observing the current document (or word) in the sequence as a product of probabilities of the binary decisions specified by the Huffman code of the document as follows,

P(d_{m+i} \mid d_m) = \prod_{l} P(h_l \mid q_l, d_m),

where h_l is the lth bit in the code with respect to q_l, which is the lth node in the specified tree path of d_{m+i}. The probability of each binary decision is defined as follows,

P(h_l = 1 \mid q_l, d_m) = \sigma(v_{d_m}^{\top} v_{q_l}),

where σ(x) is the sigmoid function, and v_{q_l} is the vector representation of node q_l. It can be verified that \sum_{d=1}^{N} P(d_{m+i} = d \mid d_m) = 1, and hence the property of a probability distribution is preserved. Similarly, P(w_t \mid w_{t-c:t+c}, d_m) can be expressed in the same manner, but with the construction of a separate, word-specific Huffman tree. It is understood that any other suitable approach known in the art may be applied to optimize the hierarchical neural network model 810 as well.
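For illustration, a minimal sketch of evaluating such a product of binary decisions is given below; the Huffman codes and tree paths are assumed to be precomputed, and all names are hypothetical.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hs_probability(code, path, context_vec, node_vecs):
        # code: bits h_l along the leaf's Huffman path; path: indices q_l of internal nodes.
        # context_vec: e.g., the input vector of the current document d_m.
        p = 1.0
        for h, q in zip(code, path):
            s = sigmoid(node_vecs[q] @ context_vec)  # P(h_l = 1 | q_l, d_m)
            p *= s if h == 1 else (1.0 - s)
        return p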
The vector similarity measurement module 808 in this example determines the similarity between any two or more pieces of input information based on a distance between their feature vectors. In one example, a cosine distance, a Hamming distance, or a Euclidean distance may be used as the metric of the similarity measure. The vector representations in this example are all in the common vector space with the same dimensionality and thus can be compared directly by the distance between them. In this example, the dimensionality of the common vector space may be on the order of hundreds.
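As a simple sketch of such a distance-based comparison (the vector values shown are hypothetical):

    import numpy as np

    v1 = np.array([0.12, -0.43, 0.88])  # feature vector of the first piece of information (hypothetical)
    v2 = np.array([0.10, -0.40, 0.91])  # feature vector of the second piece of information (hypothetical)

    cosine_similarity = (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    euclidean_distance = np.linalg.norm(v1 - v2)
    # A larger cosine similarity (or smaller distance) indicates more similar pieces of information.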
At 906, based on the obtained model, first and second feature vectors are estimated for the first and second pieces of information, respectively. In one example, the first and second feature vectors are estimated by automatically optimizing the model using a hierarchical softmax approach. At 908, the similarity between the first and second pieces of information is determined based on a distance between the first and second feature vectors. The similarity may be used for a hybrid query task in which the first and second pieces of information are an input query and a query result, respectively. The similarity may also be used for classifying the first and second pieces of information.
The method and system in the present teaching have been evaluated by preliminary experiments as described below in detail. In the first set of experiments, the quality of the distributed document representations obtained by the method and system in the present teaching is evaluated on classification tasks. In the experiments, the training data set is the public movie ratings data set MovieLens 10M (http://grouplens.org/datasets/movielens/, September 2014), consisting of movie ratings for around 10,000 movies generated by more than 71,000 users, combined with a movie synopses data set found online (ftp://ftp.fu-berlin.de/pub/misc/movies/database/, September 2014). Each movie is tagged as belonging to one or more genres, such as “action” or “horror.” Then, following the terminology used in the present teaching, movies are considered as “documents” and synopses are considered as “content/words.” The document streams were obtained by taking, for each user, the movies rated 4 and above (on a scale from 1 to 5) and ordering them in a sequence by the timestamp of the rating. This resulted in 69,702 document sequences comprising 8,565 movies.
Several assumptions were made while generating the movie data set. First, only high-rated movies are used in order to make the data less noisy, as the assumption is that a user is more likely to enjoy two movies that belong to the same genre than two movies coming from two different genres. Thus, by removing low-rated movies, the experiments aim to retain only similar movies in a single user's sequence. The experimental results shown below support this assumption. In addition, the rating timestamp is used as a proxy for the time when the movie was actually watched. Although this might not always hold in reality, the empirical results suggest that the assumption was reasonable for learning useful movie and word embeddings.
As comparisons, movie vector representations for the training data set are also learned by some known solutions: (1) latent Dirichlet allocation (LDA), which learns low-dimensional representations of documents (i.e., movies) as a topic distribution over their synopses; (2) paragraph vector (paragraph2vec), where the entire synopsis of a movie is taken as a single paragraph; and (3) word2vec, where movie sequences are used as “documents” and movies as “words.” The method and system in the present teaching are referred to as hierarchical document vector (HDV). Note that LDA and paragraph2vec only take into account the content of the documents (i.e., movie synopses), and word2vec only considers the movie sequences and does not consider synopses in any way, while HDV combines the two approaches and jointly models both the movie sequences and the content of the movie synopses. The dimensionality of the embedding space was set to 100 for all low-dimensional embedding methods, and the neighborhood of the neural language modelling methods was set to 5. A linear support vector machine (SVM) was used to predict a movie genre in order to reduce the effect of the variance of non-linear methods on the results.
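A sketch of the classification protocol described above, assuming the learned 100-dimensional movie vectors and binary genre labels are available; the arrays below are random placeholders, and scikit-learn's LinearSVC stands in for the linear SVM:

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    movie_vectors = np.random.rand(8565, 100)           # placeholder for learned movie vectors
    genre_labels = np.random.randint(0, 2, size=8565)   # placeholder binary labels for one genre

    clf = LinearSVC()
    scores = cross_val_score(clf, movie_vectors, genre_labels, cv=5)  # 5-fold cross-validation
    print("Mean accuracy:", scores.mean())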
The classification results after 5-fold cross-validation are shown in TABLE 1, where results are reported on eight binary classification tasks for the eight most frequent movie genres in the training data set. As shown in TABLE 1, the neural language models obtained higher accuracy than LDA on average, although LDA achieved very competitive results on the last six tasks. It is interesting to observe that word2vec obtained higher accuracy than paragraph2vec despite the fact that the latter was specifically designed for document representation, which indicates that the users have strong genre preferences that were exploited by word2vec. Note that the method and system in the present teaching (HDV) achieved higher accuracy than the known solutions, obtaining on average 5.62% better performance than the state-of-the-art paragraph2vec and 1.52% better performance than the word2vec model. This can be explained by the fact that the method and system in the present teaching (HDV) successfully exploit both the document content and the relationships between documents in a stream, resulting in improved performance.
In another news topic classification experiment, the learned representations are used to label news documents with the 19 first-level topic tags from a large Internet company's internal hierarchy (e.g., “home & garden,” “science”). A large-scale training data set was collected at servers of the company. The data consists of nearly 200,000 distinct news stories viewed by a subset of the company's users from March to June 2014. After pre-processing, in which stopwords were removed, the hierarchical neural network models in the present teaching were trained on 80 million document sequences generated by users, containing a total of 100 million words and with a vocabulary size of 161 thousand. A linear SVM is used to predict each topic separately, and the average improvement over LDA after 5-fold cross-validation is given in TABLE 2. Note that the method and system in the present teaching (HDV) outperformed the known solutions on this large-scale problem, strongly confirming the benefits of the method and system in the present teaching (HDV) for contextual document representation.
In the second set of experiments, the applications of the method and system in the present teaching to hybrid queries are evaluated. The experimental results show a wide potential of the method and system in the present teaching for online applications, using the large-scale training data set collected at servers of the large Internet company as mentioned above. In the second set of experiments, the cosine distance is used to measure the closeness, i.e., similarity, of two vectors (either document or word) in the common embedding space.
Note that the method and system in the present teaching differ from traditional information retrieval in that the retrieved document does not need to contain the query word, as seen in the example of the keyword “boxing.” As can be seen, the method and system in the present teaching found that the articles discussing UFC and WSOF events are related to the sport, despite the fact that they do not specifically contain the word “boxing.”
Using the trained models, the method and system in the present teaching retrieve the nearest words given a news story as an input.
Users 1502 may be of different types, such as users connected to the network 1504 via desktop computers 1502-1, laptop computers 1502-2, a built-in device in a motor vehicle 1502-3, or a mobile device 1502-4. A user 1502 may send a query of any type (a user group, a user, a document, or a keyword) to the hybrid query engine 602 via the network 1504 and receive query result(s) of any type from the hybrid query engine 602. The user 1502 may also send information of any type (user groups, users, documents, or keywords) to the classification engine 702 via the network 1504 and receive classification results from the classification engine 702. In this embodiment, the joint representation engine 502 serves as a backend system for providing vector representations of any incoming information, or similarity measures between any pieces of information, to the hybrid query engine 602 and/or the classification engine 702.
The content sources 1506 include multiple content sources 1506-1, 1506-2, . . . , 1506-n, such as vertical content sources (domains). A content source 1506 may correspond to a website hosted by an entity, whether an individual, a business, or an organization such as USPTO.gov, a content provider such as cnn.com and Yahoo.com, a social network website such as Facebook.com, or a content feed source such as Twitter or blogs. The joint representation engine 502, the hybrid query engine 602, or the classification engine 702 may access information from any of the content sources 1506-1, 1506-2, . . . , 1506-n.
To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform(s) for one or more of the elements described herein (e.g., the joint representation engine 502, the hybrid query engine 602, and the classification engine 702 described above).
The computer 1700, for example, includes COM ports 1702 connected to and from a network connected thereto to facilitate data communications. The computer 1700 also includes a central processing unit (CPU) 1704, in the form of one or more processors, for executing program instructions. The exemplary computer platform includes an internal communication bus 1706, program storage and data storage of different forms, e.g., disk 1708, read only memory (ROM) 1710, or random access memory (RAM) 1712, for various data files to be processed and/or communicated by the computer, as well as possibly program instructions to be executed by the CPU 1704. The computer 1700 also includes an I/O component 1714, supporting input/output flows between the computer and other components therein such as user interface elements 1716. The computer 1700 may also receive programming and data via network communications.
Hence, aspects of the methods of joint information representation and/or other processes, as outlined above, may be embodied in programming. Program aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Tangible non-transitory “storage” type media include any or all of the memory or other storage for the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide storage at any time for the software programming.
All or portions of the software may at times be communicated through a network such as the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer of a search engine operator into the hardware platform(s) of a computing environment or other system implementing a computing environment or similar functionalities in connection with joint information representation. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine-readable medium may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, which may be used to implement the system or any of its components as shown in the drawings. Volatile storage media include dynamic memory, such as a main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that form a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a physical processor for execution.
Those skilled in the art will recognize that the present teachings are amenable to a variety of modifications and/or enhancements. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing server. In addition, the joint information representation as disclosed herein may be implemented as firmware, a firmware/software combination, a firmware/hardware combination, or a hardware/firmware/software combination.
While the foregoing has described what are considered to constitute the present teachings and/or other examples, it is understood that various modifications may be made thereto and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings.