The present application relates generally to data processing systems, and in one specific example, to a social and/or business networking system that includes a job search engine that is especially suited for recent college graduates.
Online social and professional networking services are becoming increasingly popular, with many such services boasting millions of active members. In particular, the professional networking website LinkedIn has become successful at least in part because it allows members to actively search for jobs.
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which:
Example methods and systems are described for permitting recent college graduates to search for jobs, and for locating jobs that are suited to those graduates. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of example embodiments. It will be evident, however, to one skilled in the art that the present invention may be practiced without these specific details.
According to various exemplary embodiments, a job search engine for recent college graduates is configured to identify jobs that are particularly suited to the recent college graduates. These jobs can be jobs that are posted on or associated with a social network service such as LinkedIn. For example, the job search engine may identify types of jobs for which recently graduated college students were hired, and then recommend similar types of jobs to other recently graduated college students.
As shown in
Once registered, a member may invite other members, or be invited by other members, to connect via the social network service. A “connection” may require a bi-lateral agreement by the members, such that both members acknowledge the establishment of the connection. Similarly, with some embodiments, a member may elect to “follow” another member. In contrast to establishing a connection, the concept of “following” another member typically is a unilateral operation, and at least with some embodiments, does not require acknowledgement or approval by the member that is being followed. When one member follows another, the member who is following may receive status updates or other messages published by the member being followed, or relating to various activities undertaken by the member being followed. Similarly, when a member follows an organization, the member becomes eligible to receive messages or status updates published on behalf of the organization. For instance, messages or status updates published on behalf of an organization that a member is following will appear in the member's personalized data feed or content stream. In any case, the various associations and relationships that the members establish with other members, or with other entities and objects, are stored and maintained within the social graph, shown in
The social network service may provide a broad range of other applications and services that allow members the opportunity to share and receive information, often customized to the interests of the member. For example, with some embodiments, the social network service may include a photo sharing application that allows members to upload and share photos with other members. With some embodiments, members may be able to self-organize into groups, or interest groups, organized around a subject matter or topic of interest. With some embodiments, the social network service may host various job listings providing details of job openings with various organizations.
As members interact with the various applications, services and content made available via the social network service, the members' behavior (e.g., content viewed, links or member-interest buttons selected, etc.) may be monitored and information concerning the member's activities and behavior may be stored, for example, as indicated in
With some embodiments, the social network system 20 includes what is generally referred to herein as a job search engine 200. The job search engine 200 is described in more detail below in conjunction with
Although not shown, with some embodiments, the social network system 20 provides an application programming interface (API) module via which third-party applications can access various services and data provided by the social network service. For example, using an API, a third-party application may provide a user interface and logic that enables an authorized representative of an organization to publish messages from a third-party application to a content hosting platform of the social network service that facilitates presentation of activity or content streams maintained and presented by the social network service. Such third-party applications may be browser-based applications, or may be operating system-specific. In particular, some third-party applications may reside and execute on one or more mobile devices (e.g., phone, or tablet computing devices) having a mobile operating system.
Turning now to
As described in more detail below, the profile filtering module 202 is configured to filter profiles of members of an online social or business network service to identify recent college graduates who have recently become employed in their first (or one of their first) employment positions. Thereafter, the metadata code identification module 203 identifies metadata codes that are associated with these first employment positions. The job filtering module 204 filters a database of job listings using the identified metadata codes to identify job listings that are similar to the first employment positions for which the recent college graduates have recently become employed. The job description analysis module 205 analyses the job descriptions of the identified job listings to identify keywords that may indicate that a job listing has optional or mandatory requirements (e.g., of previous work experience). A modeling module 206 invokes a logistic regression to model job listings (identified by the job description analysis module 205) and their probability of having optional or mandatory requirements. The job suitability determination module 207 takes a new job posting and predicts whether the job is likely suitable or not suitable for a recent college graduate. For a given job posting, it extracts features using the job description analysis module 205, it inputs the feature vector into a binary logistic regression model from the modeling module 206, and it records the prediction. The operation of each of the aforementioned modules of the job search engine 200 will now be described in greater detail in conjunction with
Referring now specifically to
More specifically, each member of an online social network service (e.g., LinkedIn) may be associated with a member profile page that includes various information about that member. An example of a member profile page 400 of a member (e.g., a LinkedIn® page of a member “Jane Doe”) is illustrated in
In some embodiments, by analyzing the member profile data and/or member profile page of a member of a social network service, the profile filtering module 202 may determine that a member is a recent college graduate who has recently become employed. The online social networking system will check a member's profile to determine if he or she has graduated from college (432), and then check to see if the member is employed (412). To determine if the member is a recently graduated and employed person, the online social networking system compares the graduation date or degree conferred date 432A in the member's profile to the employment start date 412A of an employment position in the member's profile. If the job start date is within a certain time of the member's graduation date, that member is considered to be recently employed.
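By way of illustration only, the date comparison described above might be sketched as follows. The function name, the 365-day window, and the choice to count only positions that start on or after the graduation date are assumptions made for this sketch; the description above specifies only that the start date falls "within a certain time" of the graduation date.

```python
from datetime import date

# Hypothetical window; the description only says "within a certain time".
RECENT_HIRE_WINDOW_DAYS = 365


def is_recently_hired_graduate(graduation_date, employment_start_date,
                               window_days=RECENT_HIRE_WINDOW_DAYS):
    """Return True if the position started within window_days after graduation.

    Assumption: positions that started before graduation are not counted.
    """
    if graduation_date is None or employment_start_date is None:
        # A profile missing either date cannot be classified.
        return False
    delta_days = (employment_start_date - graduation_date).days
    return 0 <= delta_days <= window_days


# Example: degree conferred in May 2014, first position started in August 2014.
print(is_recently_hired_graduate(date(2014, 5, 15), date(2014, 8, 1)))
```

A different window, or a window that also admits positions started shortly before graduation, could be substituted without changing the overall approach.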
Returning to
At 310, the metadata code identification module 203 identifies codes associated with these first employment positions. These codes can be in the form of metadata, and can consist of one or more parts that identify the type of job, the company, and the specific job title, for example. Then, at 315, the job filtering module 204 filters a database of job listings using the identified codes. The database can be the database 208 in
At 320, the online social networking system retrieves the job listings from the database that were identified with the codes, and stores these job listings, whose codes are similar to those of the first employment positions for which the recent college graduates have recently become employed, into a first subset of job listings. Then, at 325, the job description analysis module 205 and the modeling module 206 analyse the job descriptions in the first subset of job listings using a logistic regression to model job listings as a function of predictor variables indicating whether the requirements expressed in the job descriptions are optional or mandatory. That is, the logistic regression models a job listing's probability of having optional or mandatory requirements. As indicated at 326, examples of mandatory requirements include a previous work experience requirement, an advanced degree requirement, and/or a professional certification requirement. As indicated at 327A, job listings can be identified as having mandatory requirements by searching for key words in the job description that indicate such requirements are mandatory. Such key words can include “must,” “minimum,” and “at least.” Similarly, at 327B, job listings can be identified as having optional requirements by searching for keywords in the job description that indicate that the requirements are optional. Such keywords can include “should,” “preferably,” “ideally,” and/or “equivalent.” In another embodiment, as indicated at 328, job listings that include mandatory requirements are identified by examining the length of the job posting and modeling longer job postings as likely to include mandatory requirements.
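As a non-limiting sketch, the keyword search at 327A and 327B and the length indicator at 328 could be turned into a small feature vector as shown below. The keyword lists come from the description above; the function names, the use of whole-word case-insensitive matching, and the use of a word count as the length measure are assumptions of this sketch.

```python
import re

# Keywords named in the description above.
MANDATORY_KEYWORDS = ("must", "minimum", "at least")
OPTIONAL_KEYWORDS = ("should", "preferably", "ideally", "equivalent")


def count_keyword(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def extract_features(description):
    """Return [mandatory keyword count, optional keyword count, word count]."""
    mandatory = sum(count_keyword(description, k) for k in MANDATORY_KEYWORDS)
    optional = sum(count_keyword(description, k) for k in OPTIONAL_KEYWORDS)
    length = len(description.split())  # assumed length measure: words
    return [float(mandatory), float(optional), float(length)]


print(extract_features("Candidates must have at least 5 years of experience."))
```

Feature vectors of this kind are one possible form of the predictor variables supplied to the logistic regression.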
Additionally, according to various exemplary embodiments, job descriptions that have mandatory requirements may be labelled or pre-classified (by a trusted individual or set of individuals) as negative training samples for training the model. In other words, the negative training data may be treated by the modeling module 206 as representative samples of job descriptions that have mandatory requirements, and the modeling module 206 may train the model based on such data (e.g., by refining the coefficients of the logistic regression model). In this way, the model may be later used to determine whether a given job description has mandatory requirements, analyzing the same types of features or indicators used when training the model on other job descriptions. For example, as illustrated at 500 in
Returning again to
As indicated at 325A-325D, the analysis of the job descriptions in the first subset of job listings, which uses a logistic regression to model job listings as a function of features or indicators of optional or mandatory requirements, proceeds as follows. At 325A, the logistic regression trains models on the first subset of data, that is, the subset of jobs whose metadata codes are similar to those of the jobs for which recently graduated college students have recently been hired. At 325B, the regression identifies potential models as a function of performance from cross-validation within a subset of job listings. At 325C, the potential models are tested, and at 325D, a model is selected as a function of performance from cross-validation within a subset of job listings.
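Purely as an illustration of selecting a model as a function of cross-validation performance, the following sketch uses k contiguous folds and mean held-out accuracy. The function names, the fold layout, the accuracy metric, and the convention that each candidate is a function `fit(X_train, y_train)` returning a `predict(x)` function are all assumptions of this sketch and are not taken from the description above. The sketch also assumes that k does not exceed the number of samples.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(fit, X, y, k=3):
    """Mean held-out accuracy of the model returned by fit(X_train, y_train)."""
    scores = []
    for train, test in k_fold_splits(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def select_model(candidates, X, y, k=3):
    """Return the name of the candidate with the best cross-validated accuracy."""
    return max(candidates,
               key=lambda name: cross_val_accuracy(candidates[name], X, y, k))
```

Here `candidates` is a dictionary mapping a model name to its training function, so differently configured logistic regressions could be compared on the same folds.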
More specifically, the modeling module 206 performs a prediction modeling process based on the indicators (i.e., of optional or mandatory requirements associated with job descriptions) in order to identify job listings that are suitable to recent graduates and job listings that are not suitable to recent graduates. According to various exemplary embodiments described in more detail below, the aforementioned modeling process may include training a model (e.g., a logistic regression model) using positive data samples (job descriptions with optional requirements) and negative data samples (job descriptions with mandatory requirements) which may exhibit none, some, or all of the features or indicators, in varying magnitude. Thereafter, the trained model may analyse a particular job description posted on the online social network service to predict a likelihood or probability that the particular job posting will be suitable or not suitable to a recent college graduate. This may then be repeated for all the job listings on the online social network service, in order to identify all job listings for which recent college graduates may be suited.
The modeling module 206 may use any one of various known modeling techniques to perform the modeling. For example, according to various exemplary embodiments, the modeling module 206 may apply a statistics-based machine learning model such as a logistic regression model to the indicators. As understood by those skilled in the art, logistic regression is an example of a statistics-based machine learning technique that uses a logistic function. The logistic function is based on a variable, referred to as a logit. The logit is defined in terms of a set of regression coefficients of corresponding independent predictor variables. Logistic regression can be used to predict the probability of occurrence of an event given a set of independent/predictor variables. A highly simplified example machine learning model using logistic regression may be ln[p/(1−p)]=a+BX+e, or [p/(1−p)]=exp(a+BX+e), where ln is the natural logarithm (the logarithm to the base 2.71828 . . . ), exp is the corresponding exponential function, p is the probability that the event Y occurs, p(Y=1), p/(1−p) is the odds, ln[p/(1−p)] is the log odds, or “logit”, a is the coefficient on the constant term, B is the regression coefficient(s) on the independent/predictor variable(s), X is the independent/predictor variable(s), and e is the error term. In some embodiments, the independent/predictor variables of the logistic regression model may be data associated with the job descriptions of job listings (where the data may be encoded into feature vectors). The regression coefficients may be estimated using maximum likelihood or learned through a supervised learning technique from the indicators, as described in more detail below.
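For illustration only, the relationship between the logit and the probability p may be sketched as follows, with the error term omitted. The function names are hypothetical; a, B, and X correspond to the symbols defined above.

```python
import math


def logit(p):
    """Log odds ln[p/(1-p)] for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def predict_probability(a, B, X):
    """Invert the logit: p = 1 / (1 + exp(-(a + B.X)))."""
    z = a + sum(b * x for b, x in zip(B, X))
    # Two algebraically equal forms, chosen to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


# With a = 0 and every predictor equal to 0, the odds are even (p = 0.5).
print(predict_probability(0.0, [1.0], [0.0]))
```

Applying `logit` to the output of `predict_probability` recovers a + BX, which is the sense in which the logit is linear in the predictor variables.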
Accordingly, once the appropriate regression coefficients (e.g., B) are determined, the features included in a feature vector (e.g., data associated with a job description of a social network service) may be plugged into the logistic regression model in order to predict the probability that the event Y occurs (where the event Y may be, for example, a particular job listing has optional requirements and therefore is suitable for a recent college graduate). In other words, provided a feature vector including various requirements associated with a particular job listing, the feature vector may be applied to a logistic regression model to determine the probability that the particular job listing is suitable to a recent college graduate. Logistic regression is well understood by those skilled in the art, and will not be described in further detail herein, in order to avoid occluding various aspects of this disclosure. The modeling module 206 may use various other modeling techniques understood by those skilled in the art to predict whether a particular job listing is suitable for a recent college graduate. For example, other modeling techniques may include other machine learning models such as a Naïve Bayes model, a support vector machines (SVM) model, a decision trees model, and a neural network model, all of which are understood by those skilled in the art.
According to various embodiments described above, the job listing indicators may be used for the purposes of both training the model (for generating and refining a model and/or the coefficients of a model) and using the trained model (for predicting whether a particular job listing is suitable for a recent college graduate). For example, if the modeling module 206 is utilizing a logistic regression model (as described above), then the regression coefficients of the logistic regression model may be learned through a supervised learning technique from the indicators. Accordingly, in one embodiment, the job description analysis module 205 may operate in an off-line training mode by assembling the job description indicators into feature vectors. (For the purposes of training the system, the system generally needs both positive examples of job listings having optional requirements, as well as negative examples of job listings having mandatory requirements, as will be described in more detail below). The feature vectors may then be passed to the modeling module 206, in order to refine regression coefficients for the logistic regression model. For example, statistical learning based on the Stochastic Gradient Descent technique may be utilized for this task. Thereafter, once the regression coefficients are determined, the job suitability determination module 207 may operate to perform online (or offline) inferences based on the trained model (including the trained model coefficients) on a feature vector representing a job listing of the online social network service. 
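As a minimal sketch of refining regression coefficients by stochastic gradient descent, the following pure-Python routine updates the constant term a and the coefficients B one training sample at a time. The learning rate, the number of epochs, the fixed random seed, and the function names are assumptions of this sketch; labels follow the convention above, with 1 for a positive example (optional requirements) and 0 for a negative example (mandatory requirements).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def train_sgd(samples, labels, learning_rate=0.1, epochs=200, seed=0):
    """Fit a logistic regression by per-sample gradient steps on the log loss."""
    rng = random.Random(seed)
    a = 0.0                          # coefficient on the constant term
    B = [0.0] * len(samples[0])      # one coefficient per predictor variable
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)           # visit samples in a random order
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(a + sum(b * v for b, v in zip(B, x)))
            error = p - y            # gradient of the log loss w.r.t. the logit
            a -= learning_rate * error
            B = [b - learning_rate * error * v for b, v in zip(B, x)]
    return a, B


# Features: [mandatory keyword count, optional keyword count] (illustrative data).
samples = [[2, 0], [3, 0], [0, 2], [0, 1], [1, 0], [0, 3]]
labels = [0, 0, 1, 1, 0, 1]
a, B = train_sgd(samples, labels)
print(a, B)
```

On this illustrative data the coefficient on the mandatory keyword count becomes negative and the coefficient on the optional keyword count becomes positive, matching the intuition that mandatory language lowers suitability.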
For example, according to various exemplary embodiments described herein, the job suitability determination module 207 is configured to predict the likelihood that a particular job listing is suitable to a recent college graduate, based on the job description indicators of the particular job listing compared to the contributions or weights of these indicators in the job listings that were utilized to train the model. In some embodiments, if the probability that the particular job listing is suitable for a recent college graduate is greater than a specific threshold (e.g., 0.5, 0.8, etc.), then the job suitability determination module 207 may classify that particular job listing as being suitable for a recent college graduate. In other embodiments, the job suitability determination module 207 may calculate a score for the particular job listing, based on the probability that the particular job listing is suitable for a recent college graduate. Accordingly, the job suitability determination module 207 may repeat this process for all the job listings of an online social network service.
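The threshold comparison described above may be illustrated with the following sketch. The function names and the listing identifiers are hypothetical, and the strict "greater than" comparison follows the wording above.

```python
def is_suitable(probability, threshold=0.5):
    """Classify a listing as suitable when its probability exceeds the threshold."""
    return probability > threshold


def suitable_listings(probability_by_listing, threshold=0.5):
    """Return the identifiers of all listings classified as suitable."""
    return [listing_id
            for listing_id, probability in probability_by_listing.items()
            if is_suitable(probability, threshold)]


# Hypothetical probabilities produced by a trained model.
predictions = {"job-1": 0.9, "job-2": 0.4, "job-3": 0.8}
print(suitable_listings(predictions, threshold=0.5))
```

Raising the threshold (e.g., to 0.8) trades coverage for confidence, since fewer listings are then classified as suitable.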
According to various exemplary embodiments, the off-line process of training or retraining the model based on the job description indicators may be performed periodically at regular time intervals (e.g., once a month), or may be performed at irregular time intervals, random time intervals, continuously, etc. Since job listing indicators may change over time based on changes in the listing of jobs on the social networking system, it is understood that the model itself may change over time (based on the current job listing indicators being used to train the model). The descriptions of job listings may change over time because, for example, industry practice within a field may change, or features, products and technology of the online social network service may change, and so on.
As described above, for the purposes of training the logistic regression model, the model generally requires both positive examples of job listings having optional requirements, as well as negative examples of job listings having mandatory requirements. In other words, the job listing examples may be treated by the job suitability determination module 207 as representative samples of job listings associated with optional requirements and those associated with mandatory requirements. The job suitability determination module 207 may train the model based on the indicators or predictor variables contained in these job listings (e.g., by refining the coefficients of the prediction model). In this way, the model may be later utilized to analyse data associated with a given job listing, in order to determine the contributions of the values of the predictor variables in this particular listing, and to thus determine whether the given job listing is suitable for a recent college graduate.
Returning again to
At 331 and 332, the online social networking system ranks the jobs whose metadata codes are similar to those of jobs for which recent college graduates have recently been hired. Specifically, at 331, the online social networking system scores each such job listing according to the probability that the requirements or skills listed in the job listing are mandatory or optional. For example, if a job listing has a description with one or more instances of the term “must” in it, that job listing will likely be scored lower (since a recent college graduate is not likely to have many of the required skill sets listed after the word “must”). Similarly, if a job listing has a description with one or more instances of the term “preferably” in it, that job listing will likely be scored higher (since a recent college graduate who does not have many skill sets yet will more likely be considered for such a position). Then, at 332, these job listings are ranked as a function of the score. Consequently, job listings with optional requirements are ranked higher and presented to the recent college graduate who is searching for a job, because that recent college graduate is less likely to land a job with mandatory requirements.
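By way of example only, the ranking at 332 might be sketched as a descending sort on the score assigned at 331. The function name, the (identifier, score) pair layout, and the optional cut-off `top_n` are assumptions of this sketch.

```python
def rank_listings(scored_listings, top_n=None):
    """Sort (listing_id, score) pairs so that higher scores come first."""
    ranked = sorted(scored_listings, key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical scores: a higher score means the requirements look more optional.
scored = [("job-a", 0.2), ("job-b", 0.9), ("job-c", 0.5)]
print(rank_listings(scored))
```

Because Python's sort is stable, listings with equal scores keep their original relative order.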
At 329, the online social networking system's analysis of the job descriptions of the jobs that have recently been filled by recent college graduates involves searching for job types and job titles, and identifying the job types and job titles as unattainable or undesirable. For example, if the job description includes the terms “senior,” “CEO,” or “group leader,” then it is likely that the recent college graduate is not qualified for that job. Similarly, a recent college graduate may not be interested in jobs with descriptions of “chef,” “mechanic,” or “grocer.”
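The term search at 329 may be illustrated, without limitation, as follows. The term lists are the examples given above; the function name, the whole-word case-insensitive matching, and the rule that an "unattainable" match takes precedence over an "undesirable" match are assumptions of this sketch.

```python
import re

# Example terms from the description above.
UNATTAINABLE_TERMS = ("senior", "CEO", "group leader")
UNDESIRABLE_TERMS = ("chef", "mechanic", "grocer")


def flag_listing(text):
    """Return "unattainable", "undesirable", or None for a title or description."""
    def contains_any(terms):
        return any(
            re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
            for term in terms
        )

    if contains_any(UNATTAINABLE_TERMS):
        return "unattainable"
    if contains_any(UNDESIRABLE_TERMS):
        return "undesirable"
    return None


print(flag_listing("Senior Software Engineer"))
```

Listings flagged in this way could be excluded from the results or scored lower before ranking.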
In summary, an embodiment of an online social networking system identifies jobs similar to those for which recent college graduates have been recently hired, analyses the descriptions of those jobs and creates models based on that analysis, and uses the models to identify other job listings that may be the type of job suitable to a recent college graduate. This embodiment improves the functionality of the computerized online social networking service because it displays to a recent college graduate jobs for which he or she is more likely to be hired. By searching for, locating, and displaying only the jobs for which a recent college graduate is likely to be hired, and not searching for, locating, and displaying jobs for which the recent college graduate is not likely to be hired, the computer hardware on which the online social networking system executes operates much more efficiently.
The example computer system 600 includes a processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 604 and a static memory 606, which communicate with each other via a bus 608. The computer system 600 may further include a video display unit 610 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 600 also includes an alphanumeric input device 612 (e.g., a keyboard or a touch-sensitive display screen), a user interface (UI) navigation device 614 (e.g., a mouse), a disk drive unit 616, a signal generation device 618 (e.g., a speaker) and a network interface device 620.
The disk drive unit 616 includes a machine-readable medium 622 on which is stored one or more sets of instructions and data structures (e.g., software) 624 embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 624 may also reside, completely or at least partially, within the main memory 604 and/or within the processor 602 during execution thereof by the computer system 600, the main memory 604 and the processor 602 also constituting machine-readable media.
While the machine-readable medium 622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions or data structures. The term “machine-readable medium” shall also be taken to include any tangible medium that is capable of storing, encoding or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Specific examples of machine-readable media include non-volatile memory, including by way of example semiconductor memory devices, e.g., Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
The instructions 624 may further be transmitted or received over a communications network 626 using a transmission medium. The instructions 624 may be transmitted using the network interface device 620 and any one of a number of well-known transfer protocols (e.g., HTTP). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), the Internet, mobile telephone networks, Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., WiFi, LTE, and WiMAX networks). The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.
Although an embodiment has been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof, show by way of illustration, and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
Such embodiments of the inventive subject matter may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Thus, although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A “hardware module” is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In some embodiments, a hardware module may be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software encompassed within a general-purpose processor or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any suitable combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.