© 2016 PreSeries Tech, SL. A portion of the present disclosure may contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the present disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This disclosure pertains to software-driven machine learning and, more specifically, to systems and methods that utilize machine learning technologies to quantitatively evaluate multiple types of data about individuals and teams to predict the likelihood a corresponding enterprise will be successful.
Early-stage investors are frequently characterized as following a “gut-driven”, “term-driven”, or “lemming-like” approach. In summary, gut-driven investors primarily follow their instincts about specific companies in making investment decisions. Term-driven investors seek to maximize potential returns by favoring companies that offer better financial terms than others. Lemming-like investors let others identify promising opportunities and follow them, frequently co-investing in companies that others feel are promising. It would be advantageous to have a technical tool for evaluating investment opportunities to identify those entities that are likely to be more successful than others.
We disclose a system and methods that include characterizing individual team members in terms of specified “character elements” or attributes, and combining the individual characterizations into an aggregate characterization of the team. The team evaluation does not merely sum individual attributes; rather, it analyzes the composition of the team relative to predetermined metrics, taking into account what combinations of individual team member attributes are more likely to lead to success of the team. The system and methods described here use machine learning technologies to evaluate multiple types of data about individuals and teams to predict the likelihood the company will be successful and therefore be a good investment. These individual and team characterizations can be combined with other measures of company performance relevant to predicting whether the company will succeed or fail. This brief summary is not intended to limit the scope of the more detailed description that follows, nor does it limit the scope of the claims. It is provided as a convenience to the reader.
The included drawings are for illustrative purposes and serve to provide examples of possible structures and operations for the disclosed inventive systems, apparatus, methods and computer-readable storage media. These drawings in no way limit any changes in form and detail that may be made by one skilled in the art without departing from the spirit and scope of the disclosed implementations.
Examples of systems, apparatus, computer-readable storage media, and methods according to the disclosed implementations are described in this section. These examples are being provided solely to add context and aid in the understanding of the disclosed implementations. It will thus be apparent to one skilled in the art that the disclosed implementations may be practiced without some or all of the specific details provided. In other instances, certain process or method operations, also referred to herein as “blocks,” have not been described in detail in order to avoid unnecessarily obscuring the disclosed implementations. Other implementations and applications also are possible, and as such, the following examples should not be taken as definitive or limiting either in scope or setting.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which are implemented via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which are implemented on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
In the following detailed description, references are made to the accompanying drawings, which form a part of the description and in which are shown, by way of illustration, specific implementations. Although these disclosed implementations are described in sufficient detail to enable one skilled in the art to practice the implementations, it is to be understood that these examples are not limiting, such that other implementations may be used and changes may be made to the disclosed implementations without departing from their spirit and scope. For example, the blocks of the methods shown and described herein are not necessarily performed in the order indicated in some other implementations. Additionally, in some other implementations, the disclosed methods may include more or fewer blocks than are described. As another example, some blocks described herein as separate blocks may be combined in some other implementations. Conversely, what may be described herein as a single block may be implemented in multiple blocks in some other implementations. Additionally, the conjunction “or” is intended herein in the inclusive sense where appropriate unless otherwise indicated; that is, the phrase “A, B or C” is intended to include the possibilities of “A,” “B,” “C,” “A and B,” “B and C,” “A and C” and “A, B and C.”
In
A feature synthesizer component 110 is arranged to process the received data from sources 102. Such processing may include identifying data fields, field types, and corresponding data values, etc. The feature synthesizer may determine which fields of data to import into a dataset, and which to ignore. The processing may include cleaning or “scrubbing” the data to remove errors or anomalies. In some cases, text analysis may be applied to text fields, for example, tokenizing, stop word processing, stemming, etc. to make the data more usable. Further, various types of input data besides text can be used; for example, sound or image files. As a simple example, “wears a tie” can be synthesized from an image of a person wearing a tie, such as from a profile posted in social media.
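The text-analysis steps mentioned above (tokenizing, stop word processing, stemming) can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the stop-word list, the suffix-stripping rule, and the function names `naive_stem` and `preprocess` are assumptions introduced for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "at", "is", "was", "for", "to"}


def naive_stem(token: str) -> str:
    """Strip a few common English suffixes. Illustrative only, not a full stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words such as "led" survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize the text, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("Led the engineering teams at a startup"))
```

A production system would more likely rely on an established text-processing library; the point here is only the order of the steps applied to a text field.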
More importantly, the feature synthesizer 110, although illustrated as a single entity for simplicity, actually comprises N individual feature synthesizer components. Each individual feature synthesizer is arranged to provide data, derived from the input data sources 102, and store it in a corresponding Dataset 112 for use with a corresponding specialized model builder 120. The system is initialized or configured for processing a given set of attributes of interest.
A feature synthesizer for a given attribute is configured to recognize, and extract from the input data, information that is indicative of the attribute of interest. Some examples are given in Table 1 below. The feature synthesizer then stores the extracted data in the corresponding dataset 112. As discussed below, the process is repeated periodically over time. To illustrate, a feature synthesizer directed to technology understanding might look for data on a person's education, technical degrees, patents, and work experience. It may collect what degrees were earned at what schools, and when. It might even look for grade reports or special awards or designations such as cum laude. It may evaluate technical publications of which the person was an author. All of this data is collected into a dataset for the technology understanding attribute. As another example, a feature synthesizer for the attribute attention to detail may collect writings authored by the person of interest, and determine a frequency of misspellings or grammatical errors in those writings. Alternatively, inconsistencies within the same writing may be an indicator of a lack of attention to detail. Again, the corresponding feature synthesizer component gleans data relevant to its task from the input data sources and stores it in a dataset.
The dataset must also include an assessment or score for the particular attribute or variable of interest, at least for some of the records. In some cases, this evaluation may be conducted programmatically. In other cases, records may be evaluated by an expert with regard to the attribute of interest, and the evaluation results input to the dataset in association with the records reviewed. The evaluation may be expressed as a binary result (detail oriented or not detail oriented; high level of technical understanding, or not). In some embodiments, these evaluations may take the form of an analog value, say between 0 and 1.
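As a rough illustration of the attention-to-detail example, the frequency of misspellings could be estimated by checking each word against a reference vocabulary. The function name `misspelling_rate` and the vocabulary passed to it are hypothetical; the disclosure does not specify how misspellings are detected.

```python
import re


def misspelling_rate(text: str, vocabulary: set[str]) -> float:
    """Return the fraction of words in `text` that are absent from `vocabulary`.

    Assumption: a word missing from the reference vocabulary counts as a
    misspelling. Real systems would need a far richer dictionary or a
    dedicated spell checker.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in vocabulary)
    return unknown / len(words)


# Hypothetical reference vocabulary and sample writing.
vocab = {"we", "build", "great", "products"}
print(misspelling_rate("We buld great products", vocab))
```

The resulting rate is one candidate field for the attention-to-detail dataset, alongside whatever other indicators the feature synthesizer extracts.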
Referring again to
Example types of information (attributes) about team members that could be included in an entrepreneurial team member's profile could include background and experience data, such as that shown below.
Other information that could be included in a profile might address character attributes such as “nonconformist?”, “dissenter?”, or “maverick?”, or aggregate attributes such as “rebel” for the preceding distinct attributes. Suitable feature synthesizers can be configured to collect the data for model building.
In some systems, data may be collected for a mature organization, as distinguished from a startup. Here we mean an entity that has reached an “outcome” indicative of success or failure (conveniently, a binary variable). Preferably, such data may be collected from thousands of organizations so that it is statistically meaningful. Further, detailed information for each such entity may include attribute data for each team member in that entity, such as described herein. That data may be processed, and the actual outcomes included in appropriate datasets. This information may be used to further train or “tune” the attribute models by taking into account the eventual outcomes of actual companies.
Referring again to
In other embodiments, mutability may be a single Boolean value (indicating mutable or not). For example, whether a person (team member) speaks English might take a Boolean value, or it may have a scaled value from 0 (not at all) to 1 (fluent). Referring again to
Referring again to
The same process is repeated for each team member, or any selected subset of a team. Thus, the feature synthesizer, as part of collecting raw data, will identify the other team members of interest, and collect data associated with each of them. Accordingly, a dataset may include records for each team member of interest, or separate datasets may be provisioned. Details of the data storage are a matter of design choice. In
Individual team member profiles may be combined by formal mathematical rules into an aggregate profile for the team as represented in
Each vector may correspond to a vector such as those described with regard to
An EC (character score) represents, and quantifies objectively, whether or to what extent an individual is appropriate to start or continue leading a company, and whether the character is predicted to evolve positively or negatively. More specifically, the mutability metrics stored in feature vectors such as 160, 170 can be acquired and analyzed over time in the vectors from T=0 to T=M. With these metrics, average values, rates of change, and other statistical measures can be used to assess and predict where each mutable attribute is moving. Increasing values of a positive attribute may contribute to a higher overall team member score 312, 314, 316 and to a higher team score 333.
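The average value and rate of change of an attribute over the observations T=0 to T=M can be sketched as below. The sketch assumes one numeric value per equally spaced time step and uses an ordinary least-squares slope as the rate of change; both choices, and the name `attribute_trend`, are assumptions rather than details taken from the disclosure.

```python
from statistics import mean


def attribute_trend(values: list[float]) -> tuple[float, float]:
    """Return (average value, least-squares slope per time step).

    `values[t]` is the attribute value observed at time step t = 0..M.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate a trend")
    times = range(n)
    t_bar = mean(times)
    v_bar = mean(values)
    numerator = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    denominator = sum((t - t_bar) ** 2 for t in times)
    return v_bar, numerator / denominator


# Hypothetical history of one attribute for one team member.
average, slope = attribute_trend([0.2, 0.4, 0.6, 0.8])
print(average, slope)
```

A positive slope suggests that the attribute is moving upward over the observed period, and a negative slope suggests the opposite.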
Individual team member profiles may be combined by formal mathematical rules into an aggregate profile for the team as represented in
The combiner 200 (
Distribution of Character Components
Some character components or attributes are generally positive for every individual in which they are found, for example, hard working or well educated, and they remain positive when these attributes are found from the input data to exist across multiple members of a team. In a sense, they may be considered additive contributions to the overall team score. In some cases, attributes such as assertiveness, strong leadership, or authoritarianism may be positive for an individual, but may not be positive where found in multiple members on the same team. For this reason, our system may implement a preferred distribution (or composition) in assessing a team. For some attributes, a very small number of instances (team members) may be preferred. For other attributes, the more team members that exhibit the attribute, the better for overall team function. To that end, we create a preferred distribution for each character component. Then the process assesses how closely the distribution for a given attribute matches the preferred distribution. Mathematically, this can be done in various ways, for example, summing the differences between the actual distribution and the preferred distribution, or using a sum of squares, etc. In some embodiments, correlation coefficients may be used to assess this “closeness” or deviation from the preferred distribution. Preferred distributions may be created (or inferred) based on historical data that describes teams that were successful.
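The closeness measures named above (sum of differences, sum of squares, correlation coefficient) might be computed as in the following sketch. It assumes, purely for illustration, that the actual and preferred distributions for an attribute are equal-length lists of numbers, for example per-member values sorted from highest to lowest; the disclosure does not fix a representation, and the function names are hypothetical.

```python
from math import sqrt


def _check(actual: list[float], preferred: list[float]) -> None:
    if len(actual) != len(preferred) or not actual:
        raise ValueError("distributions must be non-empty and of equal length")


def abs_deviation(actual: list[float], preferred: list[float]) -> float:
    """Sum of absolute differences; 0 means a perfect match."""
    _check(actual, preferred)
    return sum(abs(a - p) for a, p in zip(actual, preferred))


def squared_deviation(actual: list[float], preferred: list[float]) -> float:
    """Sum of squared differences; penalizes large departures more heavily."""
    _check(actual, preferred)
    return sum((a - p) ** 2 for a, p in zip(actual, preferred))


def pearson(actual: list[float], preferred: list[float]) -> float:
    """Pearson correlation coefficient; values near 1 indicate a similar shape."""
    _check(actual, preferred)
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(preferred) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, preferred))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in preferred)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant distribution")
    return cov / sqrt(var_a * var_p)


# Hypothetical "strong leader" values for three members versus a preferred
# composition in which exactly one member exhibits the attribute.
actual = [1.0, 0.5, 0.0]
preferred = [1.0, 0.0, 0.0]
print(abs_deviation(actual, preferred), squared_deviation(actual, preferred))
print(pearson(actual, preferred))
```

Note that the deviation measures are sensitive to magnitude while the correlation is sensitive only to shape, so which is appropriate depends on how the preferred distribution is defined.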
The operators 410, 412, 414 may be selected according to the specific attribute of interest. To illustrate, if the team is going to work together in the English language, it would be important for all members of the team to speak English. Here, we will use English language skill for attribute #1, and assume it is a Boolean variable. Thus we apply the Boolean AND operator for operator 410 so that the team result at 420 will be true (Boolean 1) only if all team members speak English.
As another example, suppose the team is going to build a web application for consumers to use. It would be important for at least one team member to be skilled at building user interfaces (UX). Here, we will use UX skill for attribute #2, and again assume it is a Boolean variable (the skill is present or it is absent in each team member, as ascertained from the input data by a corresponding feature synthesizer and model). Assuming that one person skilled in UX is enough, we apply the Boolean OR as operator 412 in the drawing, to determine the team result 422. If one or more team members have that UX skill, the result 422 will be true.
Suppose that attribute #N is a strong leader and authoritarian. It would be helpful to have exactly one person on the team with that attribute. Again, for now, we assume it is a Boolean variable. For the operator 414 we apply the Boolean XOR operator across the team members, understood here in its “exactly one” sense (a chained pairwise XOR over more than two members would instead be true for any odd number of them). If there is exactly one team member with that attribute, the output at 426 will be true. In general, Boolean logic can be applied to realize any desired composition of the team. Further, compound expressions can be used in forming the team values for a given attribute. A compound expression here refers to a Boolean operation where at least one of the operands is itself a Boolean function of the team member's data.
The results at 420, 422, 426, that is, the Boolean output for the team for each attribute, together form a team profile: a vector of Boolean values. The number of “ones” can be counted to form a team score. This score will improve in proportion to the number of elements or attributes for which the team “fits” the preferred distribution. This score can be used to compare teams or subsets of a team quite readily. Different sets of attributes can be used by creating a desired or paradigm distribution and processing the data with correspondingly selected operators. Comparison of the team's resulting profile to the paradigm distribution will immediately identify where the team misses the mark. As explained above, some attributes are not simply input data from the input data sources. Rather, some attributes must be inferred, or estimated, by the feature synthesizer and model building processes described above.
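The per-attribute operators and the resulting team profile can be sketched as follows. The attribute names, the sample team data, and the helper `exactly_one` are hypothetical. The "exactly one" condition is implemented as a count rather than a chained XOR, because a chained pairwise XOR would be true for any odd number of members.

```python
from typing import Callable, Iterable


def exactly_one(flags: Iterable[bool]) -> bool:
    """True only when precisely one team member has the attribute."""
    return sum(1 for flag in flags if flag) == 1


# One operator per attribute: AND (all members), OR (at least one member),
# and "exactly one" for attributes that should appear in a single member.
OPERATORS: dict[str, Callable[[Iterable[bool]], bool]] = {
    "speaks_english": all,
    "ux_skill": any,
    "strong_leader": exactly_one,
}

# Hypothetical Boolean attribute values for a three-person team.
team = {
    "speaks_english": [True, True, True],
    "ux_skill": [False, True, False],
    "strong_leader": [True, False, True],
}

# Team profile: one Boolean per attribute; the score counts the "ones".
profile = {name: OPERATORS[name](flags) for name, flags in team.items()}
score = sum(profile.values())

print(profile)
print(score)
```

In this sample, the team satisfies the first two attributes but has two strong leaders, so the profile flags that attribute as the one where the team departs from the preferred composition.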
We have discussed several examples of Boolean attributes. Other attributes, or some of the same attributes, may have numeric values, for example, in a range of 0 to 1. For example, English language proficiency or UX programming skills can be assessed on a numeric scale. A team can be evaluated using these metrics as well.
The team score can be used for comparison to other teams. Importantly, the delta data can quickly identify where the team attributes depart from the preferred values. Further, the size of those departures can be reported to help to build a better team.
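For numeric attributes, the departures from preferred values might be reported as in this sketch. It assumes the delta for an attribute is simply the team value minus the preferred value, with negative deltas treated as shortfalls; that definition, the function names, and the sample numbers are all assumptions made for illustration.

```python
def team_deltas(team_values: dict[str, float], preferred: dict[str, float]) -> dict[str, float]:
    """Per-attribute difference between the team value and the preferred value."""
    return {name: team_values[name] - preferred[name] for name in preferred}


def largest_shortfalls(deltas: dict[str, float], limit: int = 3) -> list[tuple[str, float]]:
    """Attributes where the team falls below the preferred value, worst first."""
    shortfalls = [(name, delta) for name, delta in deltas.items() if delta < 0]
    return sorted(shortfalls, key=lambda item: item[1])[:limit]


# Hypothetical team-level values on a 0-to-1 scale and hypothetical preferred values.
team_values = {"english": 1.0, "ux": 0.25, "leadership": 0.5}
preferred = {"english": 0.75, "ux": 0.75, "leadership": 0.5}

deltas = team_deltas(team_values, preferred)
print(deltas)
print(largest_shortfalls(deltas))
```

Reporting the size of each shortfall, rather than only a pass or fail flag, is what allows the output to guide changes to the team's composition.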
In viewing and using these metrics, the mutability values discussed above may be taken into consideration. Where a team score is relatively low, but the attributes that contribute to lowering the score are mutable in a positive direction, the score may improve over time. On the other hand, where the mutability values are low or negative, improvement over time is less likely.
Further with regard to
Otherwise, proceed to block 644 to combine team member feature vectors to form a team (aggregate) feature vector. Next, compare the team vector to a preferred distribution or composition, block 646, as described in more detail above. The differences between the team vector and the preferred composition may be assessed, block 650, which may include generating an overall team score for ready comparison to other teams. Finally, results reporting, block 652, may include final team score, problematic attributes, mutability assessment, and other metrics which can be used to predict success of the team, and to improve its composition. The process concludes at terminator 660.
One of skill in the art will recognize that the concepts taught herein can be tailored to a particular application in many other ways. In particular, those skilled in the art will recognize that the illustrated examples are but one of many alternative implementations that will become apparent upon reading this disclosure. It will be obvious to those having skill in the art that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.
This application is a non-provisional of and claims priority benefit to U.S. provisional patent application 62/307,918, filed Mar. 14, 2016, and U.S. provisional patent application 62/308,095, filed Mar. 14, 2016, both of which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
62308095 | Mar 2016 | US
62307918 | Mar 2016 | US