The present application generally relates to computer-implemented recommendation systems. More specifically, the present application relates to a recommendation system for online job postings with a candidate selection technique that leverages machine learning to rank and select attributes to be used with a query for selecting the candidate job postings.
Many online or web-based applications, services and sites have recommendation systems for deriving recommendations for presentation to end-users. By way of example, e-commerce sites have recommendation systems for recommending products, digital content, and/or services. Online dating services may provide recommendations relating to people of potential interest for dating. An online job hosting service allows those who are seeking employees to create and post online job postings that describe available job opportunities, while simultaneously allowing those seeking job opportunities to browse recommended online job postings and/or search for online job postings.
To generate relevant recommendations, many recommendation systems use attributes of end-users to generate queries that fetch candidate content items, before ranking the candidate content items, and then presenting as recommendations a subset of the highest-ranking content items. This two-step paradigm generally involves a retrieval step—frequently referred to as a candidate selection or candidate retrieval step—followed by a separate ranking step. Typically, during the candidate selection step, content items are fetched by matching end-user attributes with corresponding attributes of content items being considered for recommendation.
By way of example, in the context of an online job hosting service, an end-user may have a user profile that specifies a skill possessed by the end-user. During the candidate selection step, a recommendation system may retrieve all job postings that specify a required skill that matches the skill possessed by the end-user, as reflected by the end-user's profile. Once the candidate job postings are fetched, a ranking algorithm is applied to rank the job postings. Finally, several of the highest-ranking job postings are selected for presentation as job recommendations to the end-user. As set forth below, embodiments of the present invention provide an improved candidate selection technique for use with recommendation systems.
Embodiments of the present invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which:
Described herein are techniques for selecting candidate content items (e.g., online job postings) for an online recommendation service or system. Specifically, the present disclosure describes a technique for training and using a machine learned model to rank attributes of an end-user, so that a query can be generated with a combination of high-ranking attributes that are likely to return relevant content items. To ensure that the selected candidate content items are relevant, the machine learning model is optimized using a loss function that expresses similarity between an end-user and a content item as a match between at least “k” attributes. The query that is generated to fetch candidate content items is derived to ensure that the fetched candidate content items have “k” or more attributes that match those specified in the query. In the following description, for purposes of explanation, numerous specific details and features are set forth to provide a thorough understanding of the various aspects of different embodiments of the present invention. It will be evident, however, to one skilled in the art, that the present invention may be practiced and/or implemented with varying combinations of the many details and features presented herein.
Referring now to
The request, including the set of user attributes for the end-user 106, is then provided as an input to a query processor 108. The query processor formulates a query 110 to be executed against a datastore 112 of content items. In this example, the query 110 is formulated with individual terms expressing each of the individual attributes and joined by a Boolean OR operator. For instance, the first term is represented in the query as “A1” for a first attribute. Accordingly, upon executing the query to fetch the candidate content items, any content item having a single attribute matching one of the several attributes (e.g., attributes “A1” or “A2” . . . or “A5”) expressed by the several terms of the query will satisfy the query. Once the candidate content items have been fetched, the candidate content items are ranked by a ranker before a subset of high-ranking content items is selected and presented as recommendations 114 to the end-user.
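As a sketch of the Boolean-OR candidate fetch described above (the attribute labels “A1” through “A5” follow the example in the text, while the content items themselves are hypothetical):

```python
# Illustrative sketch: a content item is fetched if it shares at least
# ONE attribute with the end-user (Boolean OR semantics).

def fetch_candidates_or(user_attributes, content_items):
    """Return every content item sharing at least one attribute with the user."""
    user_set = set(user_attributes)
    return [item for item in content_items
            if user_set & set(item["attributes"])]

user_attrs = ["A1", "A2", "A3", "A4", "A5"]
items = [
    {"id": 1, "attributes": ["A1", "A9"]},  # one match (A1) suffices
    {"id": 2, "attributes": ["A8", "A9"]},  # no match: not fetched
]
candidates = fetch_candidates_or(user_attrs, items)
```

A single shared attribute is enough to satisfy such a query, which is what allows weakly related items into the candidate set.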
When the content items are stored using a pre-built index (e.g., Lucene or a key-value store), the fetching of the content items occurs very quickly. However, as the number of attributes under consideration increases, it takes more time to fetch the content items and the overall relevance of each content item decreases. Although the example of
Turning now to
As shown in
A candidate selection technique consistent with that shown in
The candidate selection technique illustrated and described in connection with
Here, “S” is a similarity function whose value is derived from a document “d” (representing a content item) and a query “q” (representing an end-user), where “i” represents an index for the number “N” of attributes. If no attributes are shared in common between the query (end-user) and document (content item), the expression “qidi” evaluates to zero, indicating that the query and the document are not similar. However, when just one attribute is shared in common between the query (end-user) and the document (content item), the expression “qidi” evaluates to one, indicating the query and document are deemed to be similar. Because the query rewriter 208 derives the query 214 with Boolean OR operators, the result is that, in some instances, a candidate content item may be fetched on the basis of the content item having only one attribute shared in common with the end-user. This can lead to confusing and poor-quality results. Specifically, in the context of an online job hosting service, an end-user may be presented with one or more recommended job postings that are not at all suitable for, and/or of interest to, the end-user.
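The similarity check described above reduces to a dot product over binary attribute vectors; a minimal sketch (the dimension assignments are illustrative, not taken from the application):

```python
def similarity(q, d):
    """S(q, d): 1 if the query and document share at least one attribute, else 0."""
    return 1 if sum(qi * di for qi, di in zip(q, d)) > 0 else 0

q = [1, 1, 0, 0]           # end-user possesses attributes 0 and 1
d_shared = [0, 1, 0, 1]    # shares attribute 1 with the query
d_disjoint = [0, 0, 1, 1]  # shares no attribute with the query
```

Because one shared attribute makes S evaluate to one, a candidate fetched under this notion of similarity may have only a single, possibly unimportant, attribute in common with the end-user.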
Embodiments of the present invention address the technical problems of the prior art by improving the manner in which a machine learning model is trained to score attributes, and improving the manner in which a query rewriter derives a query, based on scored attributes, for use in selecting candidate content items. Turning now to the diagram illustrated in
The request, including the end-user attributes 306, is then provided as input to a query rewriter 308. A pre-trained machine learned model 310 derives for each attribute a score representing a predictive power of the attribute in identifying relevant content items for the end-user. As reproduced below and as shown in
Similar to the loss function (“S”) shown in
Continuing with the example shown in
Once the candidate content items have been fetched, the ranker 316 processes the candidate content items to derive a rank or ranking score for each content item. Finally, a subset of the ranked content items 318 is selected for presentation to the end-user, based on their respective ranking scores, and the selected content items are presented in an order that is based on their respective rank. By scoring attributes with a machine learned model that has been optimized to determine similarity based on “k” attributes, and generating queries that require “k” common attributes between the query and the content item, the overall relevancy of the candidate content items selected is increased, as compared to conventional candidate content selection techniques.
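The fetch-then-rank flow with a “k common attributes” requirement can be sketched as follows (the attribute names are hypothetical, and the simple overlap-count ranker stands in for the application's trained ranking models):

```python
def fetch_candidates_k(user_attrs, content_items, k):
    """Return content items sharing at least k attributes with the end-user."""
    user_set = set(user_attrs)
    return [item for item in content_items
            if len(user_set & set(item["attributes"])) >= k]

def rank_candidates(candidates, user_attrs):
    """Stand-in ranker: order by the number of shared attributes, descending."""
    user_set = set(user_attrs)
    return sorted(candidates,
                  key=lambda item: len(user_set & set(item["attributes"])),
                  reverse=True)

user = ["python", "tensorflow", "sql"]
postings = [
    {"id": 1, "attributes": ["python", "tensorflow"]},        # 2 matches
    {"id": 2, "attributes": ["python"]},                      # 1 match: filtered at k=2
    {"id": 3, "attributes": ["python", "tensorflow", "sql"]}, # 3 matches
]
ranked = rank_candidates(fetch_candidates_k(user, postings, 2), user)
```

Requiring k matches at fetch time keeps the single-attribute posting (id 2) out of the candidate set before ranking ever runs.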
As illustrated in
During the training stage 406, after each instance of training data is processed by the machine learned model 400 to generate scores 402 for the attributes 404, an evaluation or optimization operation 418 is performed to determine how to manipulate the weights of the machine learned model 400 to generate more accurate predictions (e.g., scores). For example, the evaluation operation generally involves comparing the predicted output of the machine learned model with the actual output associated with the example input. A loss function is used to evaluate the performance of the model in generating the desired outputs, based on the provided inputs.
During the training stage 406, as the training data are provided to the learning system 408, the weights of the individual neurons of the neural network model 400 are manipulated to minimize the error or difference, as measured by the loss function. Once fully trained and deployed in a production setting 420, the model 400 is provided with a set of attributes for an end-user, for whom content recommendations are to be generated. The machine learned model 400 then generates the scores 402 for the received attributes. Finally, the query rewriter selects those attributes that have scores exceeding some predetermined threshold, for use in a weighted OR query for selecting the candidate content items (e.g., job postings).
Consistent with some embodiments, each content item or job posting (represented for purposes of modeling as a document, “d”) is represented as an embedding—a high dimensional binary vector with “N” dimensions, with d∈{0, 1}N, where each dimension corresponds with an attribute (e.g., “skill=TENSOR FLOW” and “title=ENGINEER”). Similarly, the attributes of an end-user (represented for modeling as a query, “q”) are also represented as an embedding (e.g., a high dimensional binary vector) in the same space, q∈{0, 1}N. Accordingly, if the attributes of any given end-user (e.g., query, “q”) are similar to (e.g., match) the attributes of a content item or job posting (e.g., document, “d”), then the probability of the content item or job posting (e.g., document, “d”) being relevant to the end-user (e.g., query, “q”) increases. Thus, with the goal being to find relevant content items (e.g., job postings) to present to the end-user, the selection of an appropriate loss function is important to achieve the goal.
Consistent with some embodiments of the present invention, the similarity loss function used to optimize the machine learned model is based on the Heaviside Step function:
Here, “k” is the number of attributes matching between the query, “q” (e.g., the end-user) and the document, “d” (e.g., the content item or job posting). However, as the Heaviside Step function is not a continuous function, the following logistic function, which is a smooth and continuous approximation of the Heaviside Step function, may be used:
Here, the larger the choice of “r”, the sharper the transition will be at the origin.
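One plausible reading of the two functions, reconstructed from the surrounding description (the exact forms appear in the application's equations): the step function fires when at least “k” attributes match, and the logistic form smooths that transition, with larger “r” producing a sharper step:

```python
import math

def step_similarity(q, d, k):
    """Heaviside-style similarity: 1 when at least k attributes match, else 0."""
    return 1 if sum(qi * di for qi, di in zip(q, d)) >= k else 0

def logistic_similarity(q, d, k, r):
    """Smooth, differentiable approximation of the step; larger r -> sharper."""
    x = sum(qi * di for qi, di in zip(q, d)) - k
    return 1.0 / (1.0 + math.exp(-r * x))

q = [1, 1, 1, 0]
d = [1, 1, 1, 1]  # three shared attributes with q
```

Unlike the step function, the logistic form has a usable gradient everywhere, which is what makes it suitable as a training loss.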
During information retrieval, the arguments of the maxima (abbreviated as the arg-max) of the loss function are determined. Because log(x) is an increasing function, a maximum for the loss function, H(x), implies a maximum for log(H(x)),
Because eƒ(x) is convex, and because the arg-max of log(1+ƒ(x)) is the same as the arg-max of log(ƒ(x)), we have:
In addition, the embedding spaces of the query and document are discrete. For instance, the query, “q” has dimensions that are between the values of zero and one, making back-propagation difficult. Therefore, to enable back propagation, the query embedding is smoothed using the sigmoid function as follows:
where, ƒ∈RN, which can take any real number and T>0 is the level of smoothness. If T is close to 0, depending on the sign of ƒ, q can take discrete values, that is, either zero (0) or one (1). When T is close to one (1), the query, “q” will have a soft value lying anywhere between zero (0) and one (1).
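The temperature-controlled sigmoid smoothing described here might be sketched as follows (the vector f is an illustrative stand-in for the learned real-valued query representation):

```python
import math

def smooth_query(f, T):
    """q_i = sigmoid(f_i / T): small T drives q toward {0, 1}; T near 1 keeps it soft."""
    return [1.0 / (1.0 + math.exp(-fi / T)) for fi in f]

f = [2.0, -1.5, 0.5]
near_discrete = smooth_query(f, 0.01)  # approximately [1, 0, 1]
soft = smooth_query(f, 1.0)            # soft values strictly between 0 and 1
```

The soft regime is what permits gradients to flow through the query embedding during back-propagation; the near-discrete regime recovers the binary vectors used at query time.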
The loss function, H(x), is aligned with the goal, Pr(+|U, d), using cross entropy.
By way of example, consider the query expression 500 shown in
For a given content item, the query is evaluated by determining which, if any, attributes expressed in the various terms are present in the content item—meaning, the content item expresses corresponding attributes. By way of example, if the content item is a job posting, and the job posting indicates the job being promoted has a job title of “software engineer,” then the query term 502 would be considered to have a corresponding, matching attribute expressed in the job posting. To determine whether the content item satisfies the query, the sum of the weighting factors associated with matching attributes is compared with the query threshold score 508. If the score (e.g., sum of weighting factors) is equal to or greater than the query threshold score 508 for a particular content item, then that content item is deemed to satisfy the query, and the content item is fetched as a candidate content item.
Continuing with the example, if a content item—in this instance, an online job posting—indicates a job being promoted has a job title of “software engineer,” and lists as a required skill, “TensorFlow,” then the online job posting satisfies the query because the sum of the weighting factors is two (e.g., one plus one equals two), which, in this instance, is equal to the query threshold score (e.g., “THRESHOLD=2”) 508. Similarly, if an online job posting indicates a job being promoted has a job title of “marketing specialist,” and lists as a required skill, “TensorFlow,” and no other attributes expressed in the query match those expressed in the job posting, then the online job posting would not satisfy the query because the sum of the weighting factors is one, which, in this instance, is less than the query threshold score (e.g., “THRESHOLD=2”) 508.
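The worked example above can be sketched as a weighted-OR evaluation (the weights of one and the threshold of two follow the example in the text; the string representation of query terms is illustrative):

```python
def satisfies_weighted_or(item_attrs, weighted_terms, threshold):
    """Sum the weights of the terms the item matches; compare to the threshold."""
    score = sum(weight for attr, weight in weighted_terms if attr in item_attrs)
    return score >= threshold

terms = [("title=software engineer", 1), ("skill=TensorFlow", 1)]

posting_a = {"title=software engineer", "skill=TensorFlow"}    # score 2: satisfies
posting_b = {"title=marketing specialist", "skill=TensorFlow"} # score 1: fails
```

With per-attribute weights derived from model scores rather than uniform ones, the same evaluation lets high-value attributes contribute more toward clearing the threshold.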
In
Turning now to the example in
Subsequent to training the machine learning model, the model (now referred to as a pre-trained model) is deployed in a production setting, such as that illustrated in
At method operation 606, the attributes of the end-user, as obtained at method operation 604, are provided as input to the pre-trained machine learned model. The machine learned model processes the attributes to derive for each attribute a score. The score represents a predictive power of the attribute when used as a term in a query to select relevant content items for the end-user.
At method operation 608, a query processor derives a query with a plurality of attributes having scores, derived by the machine learned model (e.g., at operation 606), exceeding a predetermined threshold. With some embodiments, the query is derived as a weighted OR query, as described in connection with the description of
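Method operations 606 and 608—scoring each attribute and keeping only those whose score clears a threshold for use as weighted query terms—might be sketched as follows (the scoring function here is a stand-in for the application's pre-trained machine learned model):

```python
def derive_query_terms(attributes, score_fn, score_threshold):
    """Score each attribute; keep those exceeding the threshold, weighted by score."""
    scored = ((attr, score_fn(attr)) for attr in attributes)
    return [(attr, score) for attr, score in scored if score > score_threshold]

# Stand-in scores, as if produced by the pre-trained model.
fake_scores = {"skill=TensorFlow": 0.9, "title=engineer": 0.7, "location=remote": 0.2}
query_terms = derive_query_terms(fake_scores.keys(), fake_scores.get, 0.5)
```

The retained (attribute, score) pairs then serve as the weighted terms of the OR query executed against the content-item index.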
At method operation 610, the candidate content items are provided as input to a ranker, which processes the candidate content items to rank the candidate content items with relation to one another. The ranking of the candidate content items by the ranker may involve one or more additional pre-trained machine learned models, and input features relating to the end-user and/or candidate content items. Finally, at method operation 612, a plurality of content items are selected from the ranked content items, for presentation, as recommended content items, to the end-user. As part of selecting the content items to be presented as recommended content items, a variety of rule-based constraints beyond the scope of the present application may be applied in determining the final selection and order in which the content item recommendations are presented.
In various implementations, the operating system 804 manages hardware resources and provides common services. The operating system 804 includes, for example, a kernel 820, services 822, and drivers 824. The kernel 820 acts as an abstraction layer between the hardware and the other software layers, consistent with some embodiments. For example, the kernel 820 provides memory management, processor management (e.g., scheduling), component management, networking, and security settings, among other functionality. The services 822 can provide other common services for the other software layers. The drivers 824 are responsible for controlling or interfacing with the underlying hardware, according to some embodiments. For instance, the drivers 824 can include display drivers, camera drivers, BLUETOOTH® or BLUETOOTH® Low Energy drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth.
In some embodiments, the libraries 806 provide a low-level common infrastructure utilized by the applications 810. The libraries 806 can include system libraries 830 (e.g., C standard library) that can provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 806 can include API libraries 832 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as Moving Picture Experts Group-4 (MPEG4), Advanced Video Coding (H.264 or AVC), Moving Picture Experts Group Layer-3 (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR) audio codec, Joint Photographic Experts Group (JPEG or JPG), or Portable Network Graphics (PNG)), graphics libraries (e.g., an OpenGL framework used to render in two dimensions (2D) and three dimensions (3D) in a graphic context on a display), database libraries (e.g., SQLite to provide various relational database functions), web libraries (e.g., WebKit to provide web browsing functionality), and the like. The libraries 806 can also include a wide variety of other libraries 834 to provide many other APIs to the applications 810.
The frameworks 808 provide a high-level common infrastructure that can be utilized by the applications 810, according to some embodiments. For example, the frameworks 808 provide various GUI functions, high-level resource management, high-level location services, and so forth. The frameworks 808 can provide a broad spectrum of other APIs that can be utilized by the applications 810, some of which may be specific to a particular operating system 804 or platform.
In an example embodiment, the applications 810 include a home application 850, a contacts application 852, a browser application 854, a book reader application 856, a location application 858, a media application 860, a messaging application 862, a game application 864, and a broad assortment of other applications, such as a third-party application 866. According to some embodiments, the applications 810 are programs that execute functions defined in the programs. Various programming languages can be employed to create one or more of the applications 810, structured in a variety of manners, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, the third-party application 866 (e.g., an application developed using the ANDROID™ or IOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as IOS™, ANDROID™, WINDOWS® Phone, or another mobile operating system. In this example, the third-party application 866 can invoke the API calls 812 provided by the operating system 804 to facilitate functionality described herein.
The machine 900 may include processors 910, memory 930, and I/O components 950, which may be configured to communicate with each other such as via a bus 902. In an example embodiment, the processors 910 (e.g., a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an ASIC, a Radio-Frequency Integrated Circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 912 and a processor 914 that may execute the instructions 916. The term “processor” is intended to include multi-core processors that may comprise two or more independent processors (sometimes referred to as “cores”) that may execute instructions contemporaneously. Although
The memory 930 may include a main memory 932, a static memory 934, and a storage unit 936, all accessible to the processors 910 such as via the bus 902. The main memory 932, the static memory 934, and the storage unit 936 store the instructions 916 embodying any one or more of the methodologies or functions described herein. The instructions 916 may also reside, completely or partially, within the main memory 932, within the static memory 934, within the storage unit 936, within at least one of the processors 910 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 900.
The I/O components 950 may include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 950 that are included in a particular machine will depend on the type of machine. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 950 may include many other components that are not shown in
In further example embodiments, the I/O components 950 may include biometric components 956, motion components 958, environmental components 960, or position components 962, among a wide array of other components. For example, the biometric components 956 may include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like. The motion components 958 may include acceleration sensor components (e.g., accelerometer), gravitation sensor components, rotation sensor components (e.g., gyroscope), and so forth. The environmental components 960 may include, for example, illumination sensor components (e.g., photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., barometer), acoustic sensor components (e.g., one or more microphones that detect background noise), proximity sensor components (e.g., infrared sensors that detect nearby objects), gas sensors (e.g., gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 962 may include location sensor components (e.g., a GPS receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.
Communication may be implemented using a wide variety of technologies. The I/O components 950 may include communication components 964 operable to couple the machine 900 to a network 980 or devices 970 via a coupling 982 and a coupling 972, respectively. For example, the communication components 964 may include a network interface component or another suitable device to interface with the network 980. In further examples, the communication components 964 may include wired communication components, wireless communication components, cellular communication components, Near Field Communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 970 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a USB).
Moreover, the communication components 964 may detect identifiers or include components operable to detect identifiers. For example, the communication components 964 may include Radio Frequency Identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar code, multi-dimensional bar codes such as Quick Response (QR) code, Aztec code, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar code, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information may be derived via the communication components 964, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.
The various memories (i.e., 930, 932, 934, and/or memory of the processor(s) 910) and/or storage unit 936 may store one or more sets of instructions and data structures (e.g., software) embodying or utilized by any one or more of the methodologies or functions described herein. These instructions (e.g., the instructions 916), when executed by processor(s) 910, cause various operations to implement the disclosed embodiments.
As used herein, the terms “machine-storage medium,” “device-storage medium,” and “computer-storage medium” mean the same thing and may be used interchangeably in this disclosure. The terms refer to a single or multiple storage devices and/or media (e.g., a centralized or distributed database, and/or associated caches and servers) that store executable instructions and/or data. The terms shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media, including memory internal or external to processors. Specific examples of machine-storage media, computer-storage media and/or device-storage media include non-volatile memory, including by way of example semiconductor memory devices, e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), FPGA, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The terms “machine-storage media,” “computer-storage media,” and “device-storage media” specifically exclude carrier waves, modulated data signals, and other such media, at least some of which are covered under the term “signal medium” discussed below.
In various example embodiments, one or more portions of the network 980 may be an ad hoc network, an intranet, an extranet, a VPN, a LAN, a WLAN, a WAN, a WWAN, a MAN, the Internet, a portion of the Internet, a portion of the PSTN, a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, the network 980 or a portion of the network 980 may include a wireless or cellular network, and the coupling 982 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 982 may implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long range protocols, or other data transfer technology.
The instructions 916 may be transmitted or received over the network 980 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 964) and utilizing any one of a number of well-known transfer protocols (e.g., HTTP). Similarly, the instructions 916 may be transmitted or received using a transmission medium via the coupling 972 (e.g., a peer-to-peer coupling) to the devices 970. The terms “transmission medium” and “signal medium” mean the same thing and may be used interchangeably in this disclosure. The terms “transmission medium” and “signal medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 916 for execution by the machine 900, and includes digital or analog communications signals or other intangible media to facilitate communication of such software. Hence, the terms “transmission medium” and “signal medium” shall be taken to include any form of modulated data signal, carrier wave, and so forth. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
The terms “machine-readable medium,” “computer-readable medium” and “device-readable medium” mean the same thing and may be used interchangeably in this disclosure. The terms are defined to include both machine-storage media and transmission media. Thus, the terms include both storage devices/media and carrier waves/modulated data signals.