This U.S. patent application claims priority under 35 U.S.C. § 119 to: Indian Patent Application No. 202121015335, filed on 31 Mar. 2021. The entire contents of the aforementioned application are incorporated herein by reference.
The disclosure herein generally relates to the field of computer assisted generation of questionnaire, and, more particularly, to systems and methods for integer linear programming-based generation of a personalized optimal questionnaire using a skill graph.
Large multinational Information Technology companies typically hire a large number of employees yearly, sometimes on the order of tens of thousands. Assuming an average interview time of 30 minutes, three interviewers in each interview, four candidates interviewed for every position, and 30,000 positions to be filled, 180,000 person-hours may be spent on just conducting interviews in one year. Given the diversity of candidates and complexity of job requirements and considering that interviewing is an inherently human and subjective process, it is a mammoth task to ensure consistent, uniform, efficient and objective interviews that result in high quality recruitment.
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
In an aspect, there is provided a processor implemented method comprising the steps of: receiving, via one or more hardware processors, input data comprising: (i) a resume associated with a candidate, the resume comprising a first set of skills, each skill in the first set of skills being associated with a first set of concepts, (ii) a job description comprising a second set of skills, each skill in the second set of skills being associated with a second set of concepts, (iii) a skill graph pertaining to each skill in the second set of skills, wherein each concept in the second set of concepts serves as a node in the skill graph and edges connect each node to a semantically related one or more concepts in the second set of concepts, (iv) a predefined question bank comprising a plurality of questions pertaining to each skill in the second set of skills, each question in the plurality of questions being mapped to (a) one or more concepts from the second set of concepts and (b) an associated difficulty level being one of easy, medium and hard, (v) a difficulty level proportion configured to define number of questions from the predefined question bank corresponding to each difficulty level, (vi) a proficiency level corresponding to each skill in the second set of skills, and (vii) a time budget allotted for an assessment of the candidate; computing, via the one or more hardware processors, a coverage of each question in the predefined question bank as a union of (i) the one or more concepts from the second set of concepts directly mapped to each question in the predefined question bank and (ii) one or more concepts from the second set of concepts that are neighbors, in the skill graph, of the directly mapped one or more concepts, the neighbors being the one or more concepts that are semantically related to each other; generating, via the one or more hardware processors, a question graph corresponding to the predefined question bank, the question graph having vertices 
corresponding to each question in the predefined question bank and edges connecting a pair of vertices if the corresponding questions share at least one concept from the second set of concepts, thereby representing related questions, while questions from the predefined question bank that do not share a concept represent unrelated questions; computing, via the one or more hardware processors, a coverage of the resume as a union of (i) the first set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the first set of concepts; computing, via the one or more hardware processors, a coverage of the job description as a union of (i) the second set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the second set of concepts; formulating, via the one or more hardware processors, an objective function to be maximized, the objective function representing a sum of a plurality of additive components subject to one or more constraints; computing, via the one or more hardware processors, a solution to an Integer Linear Programming (ILP) formulation comprising the formulated objective function and the one or more constraints, the solution being in the form of a bit array having a size corresponding to a number of questions in the predefined question bank; and generating, via the one or more hardware processors, a personalized optimal questionnaire, for the candidate, the personalized optimal questionnaire comprising an optimal set of questions from the predefined question bank, corresponding to a 1-bit in the bit array, and satisfying one or more criteria, each of the one or more criteria being an additive component of the objective function subject to the one or more constraints.
In another aspect, there is provided a system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution via the one or more hardware processors to: receive input data comprising: (i) a resume associated with a candidate, the resume comprising a first set of skills, each skill in the first set of skills being associated with a first set of concepts, (ii) a job description comprising a second set of skills, each skill in the second set of skills being associated with a second set of concepts, (iii) a skill graph pertaining to each skill in the second set of skills, wherein each concept in the second set of concepts serves as a node in the skill graph and edges connect each node to a semantically related one or more concepts in the second set of concepts, (iv) a predefined question bank comprising a plurality of questions pertaining to each skill in the second set of skills, each question in the plurality of questions being mapped to (a) one or more concepts from the second set of concepts and (b) an associated difficulty level being one of easy, medium and hard, (v) a difficulty level proportion configured to define number of questions from the predefined question bank corresponding to each difficulty level, (vi) a proficiency level corresponding to each skill in the second set of skills, and (vii) a time budget allotted for an assessment of the candidate; compute a coverage of each question in the predefined question bank as a union of (i) the one or more concepts from the second set of concepts directly mapped to each question in the predefined question bank and (ii) one or more concepts from the second set of concepts that are neighbors, in the skill graph, of the directly mapped one or more concepts, the neighbors being the one or more concepts that are semantically related to each other; generate a question graph corresponding to the predefined question bank, 
the question graph having vertices corresponding to each question in the predefined question bank and edges connecting a pair of vertices if the corresponding questions share at least one concept from the second set of concepts, thereby representing related questions, while questions from the predefined question bank that do not share a concept represent unrelated questions; compute a coverage of the resume as a union of (i) the first set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the first set of concepts; compute a coverage of the job description as a union of (i) the second set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the second set of concepts; formulate an objective function to be maximized, the objective function representing a sum of a plurality of additive components subject to one or more constraints; compute a solution to an Integer Linear Programming (ILP) formulation comprising the formulated objective function and the one or more constraints, the solution being in the form of a bit array having a size corresponding to a number of questions in the predefined question bank; and generate a personalized optimal questionnaire, for the candidate, the personalized optimal questionnaire comprising an optimal set of questions from the predefined question bank, corresponding to a 1-bit in the bit array, and satisfying one or more criteria, each of the one or more criteria being an additive component of the objective function subject to the one or more constraints.
In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive input data comprising: (i) a resume associated with a candidate, the resume comprising a first set of skills, each skill in the first set of skills being associated with a first set of concepts, (ii) a job description comprising a second set of skills, each skill in the second set of skills being associated with a second set of concepts, (iii) a skill graph pertaining to each skill in the second set of skills, wherein each concept in the second set of concepts serves as a node in the skill graph and edges connect each node to a semantically related one or more concepts in the second set of concepts, (iv) a predefined question bank comprising a plurality of questions pertaining to each skill in the second set of skills, each question in the plurality of questions being mapped to (a) one or more concepts from the second set of concepts and (b) an associated difficulty level being one of easy, medium and hard, (v) a difficulty level proportion configured to define number of questions from the predefined question bank corresponding to each difficulty level, (vi) a proficiency level corresponding to each skill in the second set of skills, and (vii) a time budget allotted for an assessment of the candidate; compute a coverage of each question in the predefined question bank as a union of (i) the one or more concepts from the second set of concepts directly mapped to each question in the predefined question bank and (ii) one or more concepts from the second set of concepts that are neighbors, in the skill graph, of the directly mapped one or more concepts, the neighbors being the one or more concepts that are semantically related to each other; generate a question graph corresponding to 
the predefined question bank, the question graph having vertices corresponding to each question in the predefined question bank and edges connecting a pair of vertices if the corresponding questions share at least one concept from the second set of concepts, thereby representing related questions, while questions from the predefined question bank that do not share a concept represent unrelated questions; compute a coverage of the resume as a union of (i) the first set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the first set of concepts; compute a coverage of the job description as a union of (i) the second set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the second set of concepts; formulate an objective function to be maximized, the objective function representing a sum of a plurality of additive components subject to one or more constraints; compute a solution to an Integer Linear Programming (ILP) formulation comprising the formulated objective function and the one or more constraints, the solution being in the form of a bit array having a size corresponding to a number of questions in the predefined question bank; and generate a personalized optimal questionnaire, for the candidate, the personalized optimal questionnaire comprising an optimal set of questions from the predefined question bank, corresponding to a 1-bit in the bit array, and satisfying one or more criteria, each of the one or more criteria being an additive component of the objective function subject to the one or more constraints.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to formulate the objective function to be maximized by: formulating an additive component f1 from the plurality of additive components, the additive component f1 being configured to select questions from the predefined question bank having a maximum coverage of the concepts in the second set of concepts; formulating an additive component f2 from the plurality of additive components, the additive component f2 being configured to select the related questions in the question graph; formulating an additive component f3 from the plurality of additive components, the additive component f3 being configured to select the unrelated questions in the question graph; formulating an additive component f4 from the plurality of additive components, the additive component f4 being configured to select questions having a coverage that shares one or more concepts in the first set of concepts by comparing the coverage of each question in the predefined question bank with the coverage of the resume; formulating an additive component f5 from the plurality of additive components, the additive component f5 being configured to select questions having a coverage that shares one or more concepts in the second set of concepts by comparing the coverage of each question in the predefined question bank with the coverage of the job description; formulating a constraint C1 configured to select questions such that the time taken for responding to the personalized optimal questionnaire is less than or equal to the time budget allotted for the assessment; formulating a constraint C2 configured to select the number of questions associated with each of the difficulty levels easy, medium and hard, based on the difficulty level proportion; and formulating a constraint C3 configured to select the number of questions, from the predefined question bank, corresponding to each difficulty level based on the proficiency level.
In accordance with an embodiment of the present disclosure, the additive component f1 is represented as: $f_1: \sum_{i=1}^{|Q|} x_i \cdot |\psi(q_i)|$, with $\psi(q_i)$ representing the coverage of a question $q_i$, $|Q|$ representing the number of questions, pertaining to a skill, in the predefined question bank, and $x_i$ representing a Boolean variable corresponding to an ith question, where $1 \le i \le |Q|$; the additive component f2 is represented as: $f_2: \sum_{i=1}^{|Q|} \sum_{j>i}^{|Q|} x_i \cdot x_j \cdot \mathrm{is\_qgraph\_edge}(i, j)$, with $x_j$ representing a Boolean variable corresponding to a jth question, $x_i$ representing a Boolean variable corresponding to an ith question, and is_qgraph_edge representing a function that returns a value 1 in the presence of an edge between the ith question and the jth question in the question graph; the additive component f3 is represented as: $f_3: \sum_{i=1}^{|Q|} \sum_{j>i}^{|Q|} x_i \cdot x_j \cdot \neg\,\mathrm{is\_qgraph\_path}(i, j)$, with $x_i$ representing a Boolean variable corresponding to an ith question, $x_j$ representing a Boolean variable corresponding to a jth question, and is_qgraph_path representing a function that returns a value 1 in the presence of a path between the ith question and the jth question in the question graph; the additive component f4 is represented as: $f_4: \sum_{i=1}^{|Q|} x_i \cdot (\psi(q_i) \cap \Phi(W_R) \neq \emptyset)$, with $x_i$ representing a Boolean variable corresponding to an ith question, $W_R$ representing the first set of concepts and $\Phi(W_R)$ representing the coverage of the resume; the additive component f5 is represented as: $f_5: \sum_{i=1}^{|Q|} x_i \cdot (\psi(q_i) \cap \Phi(W_J) \neq \emptyset)$, with $x_i$ representing a Boolean variable corresponding to an ith question, $W_J$ representing the second set of concepts and $\Phi(W_J)$ representing the coverage of the job description; the constraint C1 is represented as: C1: Σi=1|Q|xi·Tδ(q
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to formulate an additive component f4d from the plurality of additive components, the additive component f4d being configured to select questions directly mapped to the first set of concepts; and formulate an additive component f5d from the plurality of additive components, the additive component f5d being configured to select questions that directly map to the second set of concepts.
In accordance with an embodiment of the present disclosure, the additive component f4d is represented as: $f_{4d}: \sum_{i=1}^{|Q|} x_i \cdot (\psi_0(q_i) \cap W_R \neq \emptyset)$, with $x_i$ representing a Boolean variable corresponding to an ith question, and $\psi_0(q_i)$ representing the concepts directly mapped to the question $q_i$; and the additive component f5d is represented as: $f_{5d}: \sum_{i=1}^{|Q|} x_i \cdot (\psi_0(q_i) \cap W_J \neq \emptyset)$, with $x_i$ representing a Boolean variable corresponding to an ith question.
In accordance with an embodiment of the present disclosure, the objective function to be maximized is a weighted sum of a plurality of additive components subject to one or more constraints, wherein weights associated with one or more of the plurality of additive components are configured to change relative importance associated with a corresponding additive component.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Given the diversity of candidates and the complexity of job requirements, and since interviewing is an inherently subjective process, it is important to ensure consistent, uniform, efficient and objective interviews that result in high quality recruitment. The method and system of the present disclosure facilitate automatic and objective selection of an optimal set of technical questions (from question banks) personalized for a candidate. This set helps an interviewer plan for an upcoming interview of the candidate.
In the field of Computerized Adaptive Testing (CAT), questions (also called items) are selected on-the-fly from a question bank, depending on how the student has answered the questions so far (i.e., adjusting to the student's ability level). CAT techniques are used in large-scale general online examinations. Students are anonymous in CAT, whereas interviewers have detailed knowledge about the candidate, for instance, through the Curriculum Vitae (CV) or resume. CAT is about a dynamically assembled, personalized, ordered sequence of questions which depend on the student's answers so far, whereas the present disclosure addresses a static one-time selection of interview questions customized to cater to the breadth and depth of the resume. Skill graphs have been employed to create a semantically rich and detailed characterization of questions in terms of concepts. The optimization formulation uses the skill graph to generate constraints (including content balancing) and objective functions for selection of questions. DuerQuiz (Qin et al. 2019) recommends a set of questions from a question bank using a skill graph, given the job description and the resumes of candidates, but unlike the present disclosure, does not consider the difficulty level associated with questions or a question graph. Also, the skill graph used in the prior art is generated using historical recruitment data. The state-of-the-art method therefore cannot be used in the absence of historical recruitment data. Further, the state-of-the-art skill graph may not include necessary skills if they are not part of the historical recruitment data.
Referring now to the drawings, and more particularly to
I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
During an assessment, say an interview, technical questions that explore the breadth and depth of the candidate's understanding of a particular technical skill need to be addressed. Given a candidate resume, a technical skill and a question bank about that skill, the task addressed by the present disclosure is how to select the “best” set of questions from the question bank that maximizes some objective functions and meets required constraints. The selected “best” set of questions needs to be personalized for the candidate, in the sense that the set of questions needs to be closely related to the candidate's background mentioned in the resume. Accordingly, in an embodiment of the present disclosure, the one or more hardware processors 104, are configured to receive, at step 302, input data comprising (i) a resume associated with a candidate, (ii) a job description, (iii) a skill graph, (iv) a predefined question bank, (v) a difficulty level proportion, (vi) a proficiency level, and (vii) a time budget allotted for an assessment of the candidate. In an embodiment, the assessment may be in the form of an interview or a written test.
The resume comprises a first set of skills, each skill being associated with a first set of concepts. Likewise, the job description comprises a second set of skills, each skill in the second set of skills being associated with a second set of concepts. It may be understood that the first set of skills and the second set of skills are not disjoint. A skill is a subject in the technical domain. For example, Machine learning, Java™, Python™, etc. are skills in Computer science domain. A concept is a topic associated with a skill. For example, linear regression is a concept associated with the skill Machine learning, Object-oriented programming is a concept associated with the skill Java™. The examples in the present disclosure are mainly related to the skill Machine learning, merely for ease of explanation. It may be noted by those skilled in the art that the method 300 is applicable to any skill in the technical domain.
The skill graph, in accordance with the present disclosure, pertains to each skill in the second set of skills, wherein each concept in the second set of concepts serves as a node in the skill graph and edges connect each node to a semantically related one or more concepts in the second set of concepts. The skill graph of the present disclosure, received as input data, is built from public knowledge graphs such as DBpedia and Wikidata, which already contain a lot of knowledge about many skills. However, since these knowledge graphs are open-domain, only a small subset of each is relevant for a given skill. Given such a larger knowledge graph, in accordance with the present disclosure, the skill graph is identified as a sub-graph of the larger knowledge graph. To aid in this task, a set of positive seed concepts for the skill is identified. This includes the concepts corresponding to the skill itself, e.g. Machine Learning, and additional concepts. A natural source for positive seeds is the syllabi and lecture schedules of open courses (e.g. MIT OpenCourseWare). The output is a skill specific knowledge graph, or skill graph. To extract this sub-graph, a state-of-the-art technique (CRUMBTRAIL) is used. However, CRUMBTRAIL only considers strict-weak-order relations such as category hierarchy, and further, in the generated graph, the positive seed concepts are the only leaf concepts. This is an issue when working with a syllabus, since syllabus concepts are not specific enough and do not cover all leaves of the desired skill graph. Therefore, an initializer stage is run before invoking CRUMBTRAIL as a pruner. The positive seeds then contain both leaf concepts and intermediate concepts from the category hierarchy in the larger knowledge graph. The initializer augments the positive seeds with concepts from the vertices of the larger knowledge graph that are at most k hops away from the category hierarchy in the graph. The augmented positive seed set, along with the larger knowledge graph, is provided as input to CRUMBTRAIL. 
This algorithm climbs bottom-up, traversing concepts from the leaf concepts to a layer containing the intermediate concepts, searching for paths that connect the intermediate concepts to the leaf concepts. It then unifies these paths to construct the skill graph. However, the edge set of the skill graph now contains only the category hierarchy relations. Further post-processing is done by adding, to the edge set, relations of other types from the larger knowledge graph between the concepts already in the vertex set of the skill graph.
The predefined question bank comprises a plurality of questions pertaining to each skill in the second set of skills, wherein each question in the plurality of questions is mapped to (a) one or more concepts from the second set of concepts and (b) an associated difficulty level being one of easy, medium and hard. An exemplary question bank is as shown in Table 1 below.
The difficulty level proportion, in accordance with the present disclosure, is configured to define the number of questions from the predefined question bank corresponding to each difficulty level. The time taken to answer a question depends upon the difficulty level of the question. It is assumed that an easy question takes ~1 minute to answer, a medium question takes ~2 minutes and a hard question takes ~3 minutes. The proficiency level received as part of the input data corresponds to an expected expertise the candidate must have for each skill in the second set of skills.
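The time assumption above can be illustrated with a minimal sketch. The per-level minutes come from the text; the function names `questionnaire_minutes` and `fits_budget` and the sample selection are hypothetical and serve only to show how a time budget check might look.

```python
# Approximate answering time per difficulty level, in minutes (values from the text).
MINUTES_PER_LEVEL = {"easy": 1, "medium": 2, "hard": 3}


def questionnaire_minutes(difficulties):
    """Estimated total time for a list of difficulty labels."""
    return sum(MINUTES_PER_LEVEL[level] for level in difficulties)


def fits_budget(difficulties, budget_minutes):
    """True if the estimated total time does not exceed the time budget."""
    return questionnaire_minutes(difficulties) <= budget_minutes


# Hypothetical selection: two easy, one medium and one hard question.
selected = ["easy", "easy", "medium", "hard"]
total = questionnaire_minutes(selected)  # 1 + 1 + 2 + 3 = 7 minutes
```

Under these assumed times, the selection above would fit a 10 minute budget but not a 6 minute one.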
Coverage of a question is the set of concepts associated with it, along with their neighborhoods. In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to compute, at step 304, a coverage of each question in the predefined question bank as a union of (i) the one or more concepts from the second set of concepts directly mapped to each question in the predefined question bank and (ii) one or more concepts from the second set of concepts that are neighbors, in the skill graph, of the directly mapped one or more concepts, the neighbors being the one or more concepts that are semantically related to each other.
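The coverage computation of step 304 can be sketched as follows. This is a minimal illustration, assuming the skill graph is held as an adjacency mapping from each concept to its set of neighbors; the function name `coverage` and the toy graph are hypothetical and are not taken from the disclosure's figures or tables.

```python
def coverage(direct_concepts, skill_graph):
    """Union of the directly mapped concepts and their skill-graph neighbors."""
    covered = set(direct_concepts)
    for concept in direct_concepts:
        # Concepts absent from the graph simply contribute no neighbors.
        covered |= skill_graph.get(concept, set())
    return covered


# Hypothetical toy skill graph; undirected edges are listed in both directions.
skill_graph = {
    "Regression analysis": {"Linear regression", "Mean squared error"},
    "Linear regression": {"Regression analysis"},
    "Mean squared error": {"Regression analysis"},
    "Cluster analysis": {"K-means clustering"},
    "K-means clustering": {"Cluster analysis"},
}

# Coverage of a question mapped directly to 'Regression analysis'.
psi_q = coverage({"Regression analysis"}, skill_graph)
```

Only one-hop neighbors are added here, which matches the description of neighbors in the skill graph; a wider neighborhood would be a different design choice.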
For the seventh question in the question bank, ‘What are the evaluation metrics for regression?’, the associated concept is ‘Regression analysis’, so according to the skill graph in
To get related questions, a question graph is generated using the questions in the question bank. In the question graph, any two questions with a common concept are connected by an edge. For instance, the questions 6 and 7 in the question bank of Table 1 share an edge because of the common concept ‘Regression analysis’. In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to generate, at step 306, a question graph corresponding to the predefined question bank, the question graph having vertices corresponding to each question in the predefined question bank and edges connecting a pair of vertices if the corresponding questions share at least one concept from the second set of concepts, thereby representing related questions, while questions from the predefined question bank that do not share a concept represent unrelated questions.
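The question graph construction of step 306 can be sketched as below. This is a minimal illustration, assuming each question is identified by an integer and mapped to its set of directly mapped concepts; the function name `build_question_graph` and the concept assignments are hypothetical, since the question bank table is not reproduced here.

```python
from itertools import combinations


def build_question_graph(question_concepts):
    """Return the edge set: pairs (i, j), i < j, of questions sharing a concept."""
    edges = set()
    for qi, qj in combinations(sorted(question_concepts), 2):
        if question_concepts[qi] & question_concepts[qj]:
            edges.add((qi, qj))
    return edges


# Hypothetical mapping of question ids to directly mapped concepts.
question_concepts = {
    5: {"Cluster analysis"},
    6: {"Regression analysis", "Linear regression"},
    7: {"Regression analysis"},
}

edges = build_question_graph(question_concepts)
```

With this toy data, questions 6 and 7 are joined by an edge through the shared concept 'Regression analysis', while question 5 remains isolated.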
In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to compute, at step 308, a coverage of the resume as a union of (i) the first set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the first set of concepts.
In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to compute, at step 310, a coverage of the job description as a union of (i) the second set of concepts and (ii) one or more concepts that are neighbors, in the skill graph, of the concepts from the second set of concepts.
In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to formulate, at step 312, an objective function to be maximized, the objective function representing a sum of a plurality of additive components subject to one or more constraints.
In an embodiment of the present disclosure, the step of formulating an objective function to be maximized comprises formulating a plurality of additive components (f1, f2, f3, f4, f5) and a plurality of constraints (C1, C2 and C3) as explained hereinafter.
In an embodiment, the additive component f1 is configured to select questions from the predefined question bank having a maximum coverage of the concepts in the second set of concepts and is represented as:
$$f_1: \sum_{i=1}^{|Q|} x_i \cdot |\psi(q_i)|$$
with $\psi(q_i)$ representing the coverage of a question $q_i$, $|Q|$ representing the number of questions, pertaining to a skill, in the predefined question bank, and $x_i$ representing a Boolean variable corresponding to an ith question, where $1 \le i \le |Q|$.
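As an illustration of the additive component f1, the following minimal sketch evaluates the sum of coverage sizes over the selected questions. The function name `f1`, the bit array `x` and the toy coverages are hypothetical; the sketch only evaluates the component for a given selection and does not solve the optimization.

```python
def f1(x, coverages):
    """Sum of coverage sizes |psi(q_i)| over questions with x[i] == 1."""
    return sum(xi * len(cov) for xi, cov in zip(x, coverages))


# Hypothetical coverages for a bank of three questions.
coverages = [{"a", "b", "c"}, {"b"}, {"c", "d"}]

# Selecting the first and third questions: 3 + 2 = 5.
score = f1([1, 0, 1], coverages)
```

Note that, as written in the formula, overlapping concepts are counted once per selected question rather than once overall.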
Table 2 below provides a recommended set of questions, from the exemplary question bank of Table 1, obtained using the additive component f1.
The questions 1, 3, 4, and 5 in Table 2 are about ‘Cluster analysis’ and ‘Regression analysis’ and their large coverage size may be attributed to the dense neighborhood of these concepts in the skill graph.
Questions asked in an interview are often related to an earlier question, indicating exploration of the depth of a candidate's knowledge. This is obtained by selecting questions which are directly connected in the question graph. Accordingly, in an embodiment, the additive component f2 is configured to select the related questions in the question graph and is represented as:
$$f_2: \sum_{i=1}^{|Q|} \sum_{j>i}^{|Q|} x_i \cdot x_j \cdot \mathrm{is\_qgraph\_edge}(i, j)$$
with $x_j$ representing a Boolean variable corresponding to a jth question, $x_i$ representing a Boolean variable corresponding to an ith question, and is_qgraph_edge representing a function that returns a value 1 in the presence of an edge between the ith question and the jth question in the question graph. In the Table 2 above, questions 3 and 4 are related because they share a common concept, ‘Regression analysis’.
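The additive component f2 can be evaluated for a given selection as sketched below. The function name `f2`, the 0-based indexing and the toy edge set are hypothetical. One observation that is not stated in the disclosure: the product of two Boolean variables is quadratic, and a common way to keep such a term linear in an ILP is to introduce an auxiliary Boolean variable $y_{ij}$ with $y_{ij} \le x_i$, $y_{ij} \le x_j$ and $y_{ij} \ge x_i + x_j - 1$; whether the disclosed method does this is an assumption.

```python
def f2(x, edges):
    """Count selected question pairs (i < j) joined by a question-graph edge."""
    n = len(x)
    return sum(
        x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
        if (i, j) in edges
    )


# Hypothetical question graph on three questions (0-based ids, i < j).
edges = {(0, 1), (1, 2)}

related_pairs = f2([1, 1, 1], edges)  # both edges have both endpoints selected
```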
Questions asked in an interview are often unrelated to an earlier question, indicating exploration of the breadth of a candidate's knowledge. This is obtained by selecting questions which are unrelated, i.e. do not have a path in the question bank. Accordingly, in an embodiment, the additive component f3 is configured to select the unrelated questions in the question graph and is represented as:
with xi representing a set of Boolean variables corresponding to an ith question, xj representing a set of Boolean variables corresponding to a jth question, and is_qgraph_path representing a function that returns a value 1 in the presence of a path between the ith question and the jth question in the question graph. In the Table 2 above, question 1 is unrelated to question 2 or question 3.
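By way of a non-limiting illustration, is_qgraph_path may be realized with a breadth-first search, and f3 may then be evaluated for a given selection. The sketch assumes that f3 counts unordered pairs of selected questions that have no connecting path. The adjacency list is hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical question graph as an undirected adjacency list.
q_adj = {
    "q1": {"q5"},
    "q2": set(),
    "q3": {"q4"},
    "q4": {"q3"},
    "q5": {"q1"},
}


def is_qgraph_path(src, dst):
    """Return 1 if dst is reachable from src in the question graph, else 0."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return 1
        for nxt in q_adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return 0


def f3(x):
    """Assumed form of f3: number of selected pairs with no path between them."""
    return sum(
        (1 - is_qgraph_path(qi, qj)) * x[qi] * x[qj]
        for qi, qj in combinations(sorted(x), 2)
    )


x = {"q1": 1, "q2": 1, "q3": 1, "q4": 1, "q5": 0}
print(f3(x))  # 5: every selected pair except (q3, q4) is unconnected
```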
Questions asked in an interview are mostly about concepts mentioned in the candidate's resume or about concepts related to them. This is obtained by selecting questions whose coverage shares some concepts with the resume concepts coverage. Accordingly, in an embodiment, the additive component f4 is configured to select questions having a coverage that shares one or more concepts in the first set of concepts by comparing the coverage of each question in the predefined question bank with the coverage of the resume and is represented as:
with xi representing a set of Boolean variables corresponding to an ith question, WR representing the first set of concepts and ϕ(WR) representing the coverage of the resume. In Table 2 above, all the questions except question 6 are related to the candidate's resume concepts.
Similarly, questions asked in an interview are also mostly about concepts mentioned in the job description or about concepts related to them. This is obtained by selecting questions whose coverage shares some concepts with the job description concepts coverage. Accordingly, in an embodiment, the additive component f5 is configured to select questions having a coverage that shares one or more concepts in the second set of concepts by comparing the coverage of each question in the predefined question bank with the coverage of the job description and is represented as:
with xi representing a set of Boolean variables corresponding to an ith question, Wj representing the second set of concepts and ϕ(Wj) representing the coverage of the job description. In the Table 2 above, question 6 is related to the job description concept.
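By way of a non-limiting illustration, the additive components f4 and f5 may both be evaluated with one helper that compares the coverage of each question with a target coverage. The sketch assumes that each selected question contributes the number of concepts it shares with the target; an indicator of a non-empty overlap would be an equally plausible reading. All data shown are hypothetical.

```python
# Hypothetical coverage of each question (direct concepts plus neighbors).
question_coverage = {
    "q1": {"Cluster analysis", "K-means"},
    "q2": {"Regression analysis", "Linear regression"},
    "q6": {"Neural network"},
}

# Hypothetical coverage of the resume and of the job description.
resume_coverage = {"Cluster analysis", "K-means", "Regression analysis"}
jd_coverage = {"Neural network", "Regression analysis"}


def overlap_score(x, question_coverage, target_coverage):
    """Sum, over selected questions, of the concepts shared with the target."""
    return sum(len(question_coverage[q] & target_coverage) * x[q] for q in x)


x = {"q1": 1, "q2": 1, "q6": 1}
f4 = overlap_score(x, question_coverage, resume_coverage)
f5 = overlap_score(x, question_coverage, jd_coverage)
print(f4, f5)  # 3 2
```

In this toy data, q6 shares no concept with the resume but does share one with the job description, which parallels the role of question 6 described for Table 2.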
The objective function also needs to take into account certain constraints. Given a time budget allotted for an assessment of the candidate, the total time taken to answer all the questions in the optimal questionnaire needs to be within the time budget allotted. In Table 2, it may be noted that there are 3 easy questions, 2 hard questions and 1 question with medium difficulty. As mentioned above, it is assumed that the easy, medium and hard questions take 1, 2, and 3 minutes respectively, to respond. Thus, the total time to answer these questions is 3×1+1×2+2×3=11 minutes, which is less than the time budget allotted, say 15 minutes. It may be noted that the time taken to ask a question is ignored here. Accordingly, in an embodiment, the constraint C1 is configured to select questions such that time taken for responding to the personalized optimal questionnaire is less than or equal to the time budget allotted for the assessment and is represented as:
with xi representing a set of Boolean variables corresponding to an ith question, δ(qi) representing the difficulty level associated with the question qi, Tδ(q
The difficulty level distribution of the questions in the optimal questionnaire needs to follow the user specified distribution. The number of easy, medium or hard questions needs to be less than or equal to the difficulty level proportion of easy, medium and hard questions specified in the input data. Here, the questions follow the user-specified difficulty level proportion, say 0.5 for easy questions, 0.33 for hard questions and 0.2 for medium questions. Accordingly, in an embodiment, the constraint C2 is configured to select number of questions associated with the difficulty level being easy, medium and hard, based on the difficulty level proportion and is represented as:
with xi representing a set of Boolean variables corresponding to an ith question, k representing the difficulty level and mk representing the difficulty level proportion.
Based on the proficiency level that a candidate has in a given skill, as received as part of the input data, the difficulty level of selected questions is required to be adjusted. For example, a candidate with high proficiency level may be asked fairly difficult questions. This provides another constraint, which states that the average difficulty level of selected questions should be above a user-specified constant that may be derived from the proficiency level of the candidate in a skill. Here, if it is assumed that ‘easy’ is 0, ‘medium’ is 1, ‘hard’ is 2, then the average difficulty level of the questions in Table 2 is 0.83, which is greater than the received proficiency level, say 0.8. Accordingly, in an embodiment, the constraint C3 is configured to select the number of questions, from the predefined question bank, corresponding to each difficulty level based on the proficiency level and is represented as:
with xi representing a set of Boolean variables corresponding to an ith question and h0 representing the proficiency level.
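By way of a non-limiting illustration, the constraints C1 and C3 may be checked for the difficulty mix described for Table 2 (3 easy, 1 medium and 2 hard questions). The sketch assumes that C1 bounds the summed response times by the time budget and that C3 bounds the mean numeric difficulty from below by h0. The function names are hypothetical.

```python
# Difficulty levels of the six selected questions as described for Table 2.
selected = ["easy"] * 3 + ["medium"] * 1 + ["hard"] * 2

MINUTES = {"easy": 1, "medium": 2, "hard": 3}  # assumed response time per level
LEVEL = {"easy": 0, "medium": 1, "hard": 2}    # assumed numeric difficulty


def total_time(levels):
    """Total response time of the selected questions, in minutes."""
    return sum(MINUTES[level] for level in levels)


def average_difficulty(levels):
    """Mean numeric difficulty of the selected questions."""
    return sum(LEVEL[level] for level in levels) / len(levels)


def satisfies_c1(levels, time_budget):
    """C1 (assumed form): total response time within the time budget."""
    return total_time(levels) <= time_budget


def satisfies_c3(levels, h0):
    """C3 (assumed form): mean difficulty at least the proficiency level h0."""
    return average_difficulty(levels) >= h0


print(satisfies_c1(selected, time_budget=15))   # True
print(round(average_difficulty(selected), 2))   # 0.83
print(satisfies_c3(selected, h0=0.8))           # True
```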
Considering all the requirements mentioned above, the method 300 selects a fixed set of questions for the candidate before an assessment, as shown in Table 2. Although a reasonable assessment must include questions related to as many concepts as possible in the neighborhood of the concepts in the resume, it may sometimes be necessary to refine this objective to specifically consider questions directly related to the concepts in the resume. Accordingly, in an embodiment of the present disclosure, to increase selection of questions directly mapped to the concepts in the resume, an additional additive component f4d configured to select questions directly mapped to the first set of concepts may be formulated. The additive component f4d may be represented as:
with xi representing a set of Boolean variables corresponding to an ith question and ψ0(qi) representing the concepts directly mapped to the question qi.
Likewise, a reasonable assessment must also include questions related to as many concepts as possible in the neighborhood of the concepts in the job description. Accordingly, in an embodiment of the present disclosure, to increase selection of questions directly mapped to the concepts in the job description, an additional additive component f5d configured to select questions directly mapped to the second set of concepts may be formulated. The additive component f5d may be represented as:
with xi representing a set of Boolean variables corresponding to an ith question.
Sometimes, in an assessment, it may be needed to change the relative importance of any of the additive components. Accordingly, in an embodiment of the present disclosure, the objective function to be maximized is a weighted sum of a plurality of additive components subject to one or more constraints, wherein weights associated with one or more of the plurality of additive components are configured to change relative importance associated with a corresponding additive component.
In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to compute, at step 314, a solution to an Integer Linear Programming (ILP) formulation comprising the formulated objective function and the one or more constraints, the solution being in the form of a bit array having a size corresponding to a number of questions in the predefined question bank. In an embodiment, a Linear Program solver such as the COIN-OR Branch and Cut (CBC) solver available through the Python linear programming (PuLP) Application Programming Interface (API) is used.
The vector (bit array) is a combination of 0's and 1's, wherein the questions in the predefined question bank corresponding to the 1's in the vector form the questions to be selected as part of the personalized optimal questionnaire. Accordingly, in an embodiment of the present disclosure, the one or more hardware processors 104, are configured to generate, at step 316, a personalized optimal questionnaire, for the candidate, the personalized optimal questionnaire comprising an optimal set of questions from the predefined question bank, corresponding to a 1-bit in the bit array, and satisfying one or more criteria, each of the one or more criteria being the additive component of the objective function subject to the one or more constraints.
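By way of a non-limiting illustration, the selection of a bit array that maximizes a weighted objective subject to constraints may be sketched on a toy question bank. To remain self-contained, the sketch uses exhaustive enumeration in place of an ILP solver, which is practical only for a handful of questions. The bank, the weights W1 and W2, the time budget T and the threshold H0 are hypothetical, and only f1, f2, C1 and C3 (in assumed forms) are included.

```python
from itertools import product

# Hypothetical toy bank: coverage size and difficulty level per question.
bank = [
    {"cov": 5, "level": "easy"},
    {"cov": 2, "level": "easy"},
    {"cov": 4, "level": "medium"},
    {"cov": 4, "level": "hard"},
    {"cov": 3, "level": "hard"},
    {"cov": 1, "level": "medium"},
]
edges = {(2, 3), (0, 4)}  # hypothetical question-graph edges (index pairs)

MINUTES = {"easy": 1, "medium": 2, "hard": 3}
LEVEL = {"easy": 0, "medium": 1, "hard": 2}
W1, W2 = 1.0, 2.0  # hypothetical weights of f1 and f2
T, H0 = 6, 1.0     # hypothetical time budget and minimum average difficulty


def objective(bits):
    """Weighted sum of f1 (coverage) and f2 (related question pairs)."""
    f1 = sum(q["cov"] * b for q, b in zip(bank, bits))
    f2 = sum(bits[i] * bits[j] for i, j in edges)
    return W1 * f1 + W2 * f2


def feasible(bits):
    """Check C1 (time budget) and C3 (average difficulty) for a bit array."""
    chosen = [q for q, b in zip(bank, bits) if b]
    if not chosen:
        return False
    time_needed = sum(MINUTES[q["level"]] for q in chosen)
    avg_level = sum(LEVEL[q["level"]] for q in chosen) / len(chosen)
    return time_needed <= T and avg_level >= H0


# Enumerate all 2^|Q| bit arrays and keep the best feasible one.
candidates = [b for b in product((0, 1), repeat=len(bank)) if feasible(b)]
best = max(candidates, key=objective)
print(best)             # (1, 0, 1, 1, 0, 0)
print(objective(best))  # 15.0
```

The 1-bits of the resulting array identify the questions that form the questionnaire. For a bank of realistic size, the same objective and constraints would instead be passed to a solver such as CBC.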
Baselines: To compare against the Integer Linear Programming (ILP) formulation-based method of the present disclosure, the following baselines for selecting questions for a candidate having resume R were used:
Baseline 1 for resume, BR1: Select nq questions randomly from QBs, where nq is same as the number of questions in the optimal set.
Baseline 2 for resume, BR2: Let FR(s) denote the set of concepts related to skill s mentioned in the resume R. Select nq questions randomly from QBs, where the coverage of each selected question q has at least one concept in common with the neighborhood of the set of concepts FR(s), i.e., ψ(q)∩ϕ(FR(s))≠Ø.
Baseline 3 for resume, BR3: same as BR2 but ensures even distribution of difficulty level of the questions.
DuerQuiz (Qin et al., 2019): Since no implementation is publicly available, an in-house version of this baseline was implemented. Since historical resumes or skill graph edge labels are not used in the method of the present disclosure, the DuerQuiz method was adapted in the best possible manner. The terms corresponding to historical resumes were ignored when assigning weights to concepts. Further, the descendants of a concept were approximated as its direct neighbors in the skill graph, both for weight initialization and weight propagation.
Mixed Integer Programming (MIP): The method of the present disclosure is represented as MIP and the following weights and hyper-parameters were used:
w1=100, w2=100, w3=100, w4=30, w4d=70, w5=30, w5d=70, m0=0.3, m1=0.4, m2=0.3, h0=0.9 and T=45. The weights were not fine-tuned, aside from T, which controls the number of recommended questions. For DuerQuiz, there was no rule of thumb for setting hyper-parameters. A dissipation hyper-parameter or propagation weight αc=0.85 was set by hand-tuning on one resume and a smoothing weight βf=0.001 was further set.
All experiments were performed on an Ubuntu 18.04.5 LTS machine with 8-core Intel™ i7-8550U 1.80 GHz processors, and 16 GB memory. For MIP, generation of questions with a 45 min time budget (25 questions) takes 155 seconds on average. The MIP method was compared with the baselines BR1 and BR3 (a stronger version of BR2) and with the DuerQuiz method.
Dataset: A skill graph of 714 concepts and 903 edges (average degree 2.51) for two skills, Machine Learning and Deep Learning, was used. The degree of a vertex (here, a concept) in a graph is the number of edges linked to it, and the average degree of a graph is the mean of the degrees of all its vertices. A question bank of 549 questions associated with the two skills was used. Each question was annotated with concepts from the skill graph (1.18 concepts per question on average). Finally, resumes of 40 candidates (mostly fresh Information Technology graduates) who were actually interviewed were used. The skill graph concepts associated with the resumes (4.7 concepts per resume on average) were identified. For 20 of the candidates, the actual questions asked during the interviews were obtained. Of these, only the questions related to the two skills under consideration were used. The average number of questions per candidate is 5.05.
Intrinsic evaluation on actual interviews: In a first evaluation, the set of suggested questions was compared with the set of actually asked questions.
In a second evaluation, the questionnaires generated using pairs of the methods (MIP-BR1, MIP-BR3, MIP-DuerQuiz) mentioned above were compared by three experienced human interviewers. The method pairs were distributed randomly among the 20 candidates, e.g., 7 candidates got the pair MIP-BR1, and so on. For each candidate, two sets of questions were generated, each for the skill Machine Learning, using the method pair assigned. The two sets of questions were then presented to each of the 3 reviewers along with skills extracted from the resumes, without disclosing the method pair used. For each candidate, each human interviewer gave a comparative ranking, indicating whether set 1 was better than set 2. Each human interviewer used his/her own intuition for the comparison.
There were 7×3=21 evaluations of the MIP-BR1 pair, where MIP did well in 19 evaluations. Using the χ² test with 99.9% confidence, the null hypothesis was rejected, and it was concluded that MIP performed better than BR1. Similarly, MIP performed better than BR3 in 14 evaluations. Hence, MIP performed better than the relatively simpler methods BR1 and BR3. However, MIP performed better than DuerQuiz in only 6 evaluations. There was a large disagreement amongst the human interviewers and the discussions showed that the human evaluation criteria were considerably simpler than the objective functions used in MIP. For instance, none of the human interviewers considered the inter-linking of the questions in the evaluation, nor did they consider duplication of questions across different candidates as undesirable (not personalized); but these are important factors in MIP for selecting questions.
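By way of a non-limiting illustration, the χ² test for the MIP-BR1 pair may be reproduced as sketched below. The sketch assumes a goodness-of-fit test with one degree of freedom against a null hypothesis that either method is preferred with equal probability; the exact test set-up is not stated above. The function name is hypothetical.

```python
import math


def chi_square_equal_preference(wins, total):
    """Chi-square statistic and p-value (1 degree of freedom) against a 50/50 null."""
    expected = total / 2
    losses = total - wins
    stat = ((wins - expected) ** 2 + (losses - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


CRITICAL_99_9 = 10.828  # chi-square critical value, 1 degree of freedom, alpha = 0.001

stat, p_value = chi_square_equal_preference(wins=19, total=21)
print(round(stat, 2))        # 13.76
print(stat > CRITICAL_99_9)  # True: the null hypothesis is rejected at 99.9%
print(p_value < 0.001)       # True
```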
Thus it is seen from the experimental validations, that the method of the present disclosure ensures the quality of output of the system 100 using the method 300 is improved when compared to state-of-the-art methods and facilitates selection of an optimal set of technical questions, from a question bank, personalized for a candidate in an automatic and objective manner. This ensures consistent, standardized, efficient and objective interviews that result in high quality recruitment, given the diversity of candidates, complexity of job requirements and the fact that interviews are inherently subjective.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more hardware processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202121015335 | Mar 2021 | IN | national |
Number | Name | Date | Kind |
---|---|---|---|
10796217 | Wu | Oct 2020 | B2 |
20190188645 | Monasor | Jun 2019 | A1 |
20210233030 | Preuss | Jul 2021 | A1 |
20210391039 | Laszlo | Dec 2021 | A1 |
Number | Date | Country |
---|---|---|
111445200 | Jul 2020 | CN |
2922002 | Sep 2015 | EP |
2010-61179 | Mar 2010 | JP |
10-2015-0000941 | Jan 2015 | KR |
Entry |
---|
Qin et al., “DuerQuiz: A Personalized Question Recommender System for Intelligent Job Interview,” Applied Data Science Track Paper (2019). |
Number | Date | Country | |
---|---|---|---|
20220366351 A1 | Nov 2022 | US |