1. Technical Field
The disclosure relates generally to data mining and knowledge discovery.
2. Description of Related Art
A variety of person-to-person communication forms have been created throughout history. While many forms are still in use today, electronic mail, “e-mail,” currently has become a ubiquitous tool in both the business and private sectors of everyday life. The use of e-mail and content of an e-mail message can be analyzed to derive other information not necessarily inherent in the content itself. Natural language processing techniques and pattern recognition techniques when applied to e-mail messaging and e-mail content can be used to derive other, non-inherent, information. For example, within an organization's computer network, based on an analysis of e-mail message header and attachment information, a system administrator may derive reports based on that information rather than the content to determine appropriate uses of e-mail in the network without reading the message content itself. As another example, monitoring and displaying to a user a variety of e-mail usage statistics may provide information that may affect the user's own e-mail usage practices and habits.
Identifying organizational hierarchical structures has been a focus for data mining and knowledge discovery researchers. Organizational hierarchy knowledge may be a useful tool for many types of studies. For example, an organization may have an interest in understanding its formal or informal hierarchy and communication flow as a way of improving knowledge sharing. With respect to businesses, the hierarchy, usually represented in a known manner as an “organization chart,” is often constructed by extensive and expensive manual labor, given access to precise data, namely, each employee's name, title, ranking of such a title, and the like. There is a need for data mining and knowledge discovery techniques for reducing such extensive manual labor tasks and improving derivative results.
The invention generally provides for using personal communications data for approximating a hierarchical structure.
The foregoing summary is not intended to be inclusive of all aspects, objects, advantages and features of the present invention nor should any limitation on the scope of the invention be implied therefrom. This Brief Summary is provided in accordance with the mandate of 37 C.F.R. 1.73 and M.P.E.P. 608.01(d) merely to apprise the public, and more especially those interested in the particular art to which the invention relates, of the nature of the invention in order to be of assistance in aiding ready understanding of the patent in future searches.
Like reference designations represent like features throughout the drawings. The drawings in this specification should be understood as not being drawn to scale unless specifically annotated as such.
In general, acquired data about inter-organizational communication interactions (such as e-mail, including instant messaging exchanges, telephone call routing connections, voice mail messaging, paper mail, or any like “pairwise,” person-to-person, communication data) may be used to form constructs which are indicative of a hierarchical structure for the organization. A graphical layout, or other imaging diagram, may be derived from the addressing data associated with the interactions to depict a communication network construct of the organization over time. Placement of individuals in the graphical construct is used to infer each individual's placement in an organizational hierarchy construct. In order to describe details of the present invention, an exemplary embodiment is used in which e-mail logs (a substantially complete set of the “To” and “From” information available at the communications network system level during a predetermined, or given, time period) serve as the basis for approximating the hierarchical structure of the organization.
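As a minimal sketch of how such To/From log entries might be reduced to pairwise communication counts, assuming each log entry is available as a (sender, recipient) tuple: the function name `pairwise_counts` and the sample addresses are hypothetical and are not part of the disclosure.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple


def pairwise_counts(log: Iterable[Tuple[str, str]]) -> Dict[FrozenSet[str], int]:
    """Count messages exchanged between each unordered pair of addresses."""
    counts: Counter = Counter()
    for sender, recipient in log:
        if sender == recipient:
            continue  # a message to oneself carries no pairwise information
        counts[frozenset((sender, recipient))] += 1
    return dict(counts)


# Hypothetical sample log: addressing data only, no message content.
sample_log = [
    ("ann@example.com", "bob@example.com"),
    ("bob@example.com", "ann@example.com"),
    ("ann@example.com", "cat@example.com"),
]
counts = pairwise_counts(sample_log)
```

Treating each pair as unordered is a design assumption of this sketch; an implementation that distinguishes direction could key the counts on the ordered (sender, recipient) tuple instead.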
Based on the To/From data, an inter-organizational communications network construct may be formed 105. One methodology 201 for forming a communications network construct is shown in
Referring to both
Each nodal connector 305 may be a virtual spring with a given equal spring constant. Since the nodes repel each other, and each spring constant is identical, in the final diagram 301, in effect, the length of each virtual spring may be selected to be inversely proportional to the amount of e-mail between the person nodes 303; in other words, the higher the number of e-mail messages between two nodes, the shorter, “stronger,” the connector may be. Thus, in another aspect, each nodal connector 305 may be also indicative of a higher e-mail messaging frequency between nodes 303 at each end thereof.
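The inverse relationship between message volume and connector length can be illustrated with a short sketch. The function name `spring_rest_length` and the `scale` constant are assumptions introduced only for illustration; the disclosure does not specify a particular proportionality constant.

```python
def spring_rest_length(message_count: int, scale: float = 1.0) -> float:
    """Rest length of a virtual spring, inversely proportional to message count.

    `scale` is an assumed proportionality constant (length for one message).
    """
    if message_count <= 0:
        raise ValueError("a connector requires at least one message")
    return scale / message_count


# More messages between two person nodes give a shorter, "stronger" connector.
lengths = {n: spring_rest_length(n, scale=10.0) for n in (1, 5, 20)}
```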
A calculation 205 is performed for each possible pair of nodes 303 to determine the repulsion between them; e.g., for a given repulsive force, the repulsion may be modeled as inversely proportional to the square of the distance between them. The nodes of each pair under analysis may then be moved away from each other according to the calculated amount of repulsion 207.
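One way the pairwise repulsion calculation 205 and the resulting movement 207 might look is sketched below, assuming two-dimensional node positions. The names `repulsion_displacements`, `strength`, `step`, and `min_dist` are hypothetical parameters of this sketch, not values taken from the disclosure.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def repulsion_displacements(
    pos: Dict[str, Point],
    strength: float = 1.0,   # assumed repulsive force constant
    step: float = 0.1,       # assumed scaling from force to movement
    min_dist: float = 1e-3,  # assumed clamp to avoid division by zero
) -> Dict[str, Point]:
    """Return, per node, the displacement caused by inverse-square repulsion."""
    disp = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy)
            if dist < min_dist:
                # Coincident (or nearly so): push apart along x at the clamp.
                dx, dy, dist = min_dist, 0.0, min_dist
            force = strength / dist ** 2
            ux, uy = dx / dist, dy / dist  # unit vector pointing from b to a
            disp[a][0] += step * force * ux
            disp[a][1] += step * force * uy
            disp[b][0] -= step * force * ux
            disp[b][1] -= step * force * uy
    return {n: (d[0], d[1]) for n, d in disp.items()}


moves = repulsion_displacements({"a": (0.0, 0.0), "b": (2.0, 0.0)}, step=1.0)
```

In this usage, the two nodes are two units apart, so each receives a displacement of magnitude 1/4 directed away from the other.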
For each nodal connector 305, inserted between two nodes 303 once the threshold is achieved based on the To/From data 103, the amount by which the corresponding spring tends to shrink or lengthen may be calculated 209 based on the frequency of messaging.
Based on the shrink/lengthen calculation 209, the nodes 303 at each end may be moved accordingly.
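The shrink/lengthen calculation 209 and the corresponding node movement can be sketched as follows, assuming each connector has a rest length already derived from messaging frequency. The names `spring_displacements` and `stiffness` are assumptions of this sketch; the disclosure states only that the spring constants are equal.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def spring_displacements(
    pos: Dict[str, Point],
    rest_lengths: Dict[Tuple[str, str], float],
    stiffness: float = 0.5,  # assumed common spring constant for all connectors
) -> Dict[str, Point]:
    """Return, per node, the displacement caused by its virtual springs."""
    disp = {n: [0.0, 0.0] for n in pos}
    for (a, b), rest in rest_lengths.items():
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # no defined direction; leave it to the repulsion step
        stretch = dist - rest            # > 0: spring tends to shrink
        move = stiffness * stretch / 2.0  # each end takes half the correction
        ux, uy = dx / dist, dy / dist     # unit vector pointing from a to b
        disp[a][0] += move * ux
        disp[a][1] += move * uy
        disp[b][0] -= move * ux
        disp[b][1] -= move * uy
    return {n: (d[0], d[1]) for n, d in disp.items()}


moves = spring_displacements(
    {"a": (0.0, 0.0), "b": (4.0, 0.0)}, {("a", "b"): 2.0}, stiffness=1.0
)
```

With a stiffness of 1.0, the two ends in this usage each move one unit toward the other, which brings the connector exactly to its rest length of 2.0; smaller stiffness values approach the rest length gradually over repeated passes.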
The process may be repeated 213 for each nodal pair until the diagram 301 is substantially stabilized. In
Returning now to
From the graph 107, a predictive approximation of organizational structure can be derived 109. It should be recognized by those skilled in the art that generation of a communications network image, graph, or other intercommunications construct for the period in question may itself be completely transparent to the user; in other words, the user may be interested only in the goal of generating an organizational hierarchy. Thus, the addressing data may be simply stored in appropriate tables or the like toward achieving this goal.
It will be readily apparent that in most corporations, the chief executive officer, “CEO,” is a publicly known figure to be placed at the apex of the pyramid. However, the process 401 may be implemented for sub-structures of the organization, such as one operating division within a corporation where such information is not publicly available or known to an analyst using the process. Therefore, if the topmost person in the organization is known, 405, YES-path, that person/node may be chosen 409 as the current person/node under consideration. If the topmost person in the organization is not known, 405, NO-path, the centermost node in the graph (or other locus depending on the specific implementation) may be assigned 407 as the topmost person, as a starting point for constructing the hierarchical structure. Continuing the corporate operating division example, the centermost node is predicted to be the “Head of Division.” The name of the person associated with the centermost node is assigned to the top of the approximated organization chart. It should be recognized at this point that this approximation may not be true. That is, there may be a member of the organization who received and sent more e-mail during the predetermined time period than the actual Head of Division. Nevertheless, in testing simulations of the present invention, it has been found that the exemplary method employed in the experiment had a better than about sixty-five percent (65%) accuracy in approximating the actual hierarchical structure of the tested organization. When the topmost person is known at the start, the accuracy may improve to better than about seventy-five percent (75%).
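A minimal sketch of this starting-point choice follows. It assumes that the "center of the graph" is the centroid of the node positions, which is only one possible locus; the function names and the sample coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def centermost_node(pos: Dict[str, Point]) -> str:
    """Node nearest the centroid of all positions (assumed notion of center)."""
    cx = sum(x for x, _ in pos.values()) / len(pos)
    cy = sum(y for _, y in pos.values()) / len(pos)
    # The name breaks ties so that the choice is deterministic.
    return min(pos, key=lambda n: (math.hypot(pos[n][0] - cx, pos[n][1] - cy), n))


def choose_top(pos: Dict[str, Point], known_top: Optional[str] = None) -> str:
    """Use the known topmost person if given; otherwise fall back to the center."""
    return known_top if known_top is not None else centermost_node(pos)


# Hypothetical stabilized layout of one operating division.
layout = {
    "head": (0.1, 0.0),
    "a": (-2.0, 0.0),
    "b": (2.0, 0.0),
    "c": (0.0, 2.0),
    "d": (0.0, -2.0),
}
top = choose_top(layout)
```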
Once the topmost person is assigned, that topmost person/node 303 is selected 409 as the first, “current,” person/node-under-analysis. Each iteration of the method involving a subsequent person/node 303 becomes the next “current” person/node-under-analysis. A decision 411 is made as to whether the current person/node has nodal connectors 305 to other nodes that are further from the center of the graph than the current person/node. For each current person/node 303 where such a connector 305 exists, 411, YES-path, the persons represented by the connected nodes may be added 413 to the approximated organization structure as direct reports of the current person/node-under-analysis 409. In other words, it may be predicted that those nodes represent persons who are managed directly by the current person/node-under-analysis 409 because they have direct e-mail access.
Once those nodes are accounted for 413, or if the current person/node has no connectors to nodes that are farther from the center of the graph than the current person/node, 411, NO-path, a determination is preferably made 415 as to whether there are persons/nodes yet to be considered. If so, 415, YES-path, the next closest node 303 to the center of the graph may be selected 417 as the current person/node-under-analysis. In this embodiment, the process loops back to step 411. If not, the approximation analysis may be terminated and the approximated organization structure is provided 419, 111 (
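The loop through steps 409 to 419 might be sketched as shown below. Several points are assumptions of this sketch rather than statements of the disclosure: the center is taken as the centroid of the node positions, a person already placed in the structure is not reassigned to a second manager, and ties in distance are broken by name. The function name `approximate_hierarchy` and the sample data are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def approximate_hierarchy(
    pos: Dict[str, Point], connectors: Iterable[Tuple[str, str]], top: str
) -> Dict[str, List[str]]:
    """Map each person to the persons predicted to report directly to them."""
    cx = sum(x for x, _ in pos.values()) / len(pos)
    cy = sum(y for _, y in pos.values()) / len(pos)

    def radius(n: str) -> float:
        return math.hypot(pos[n][0] - cx, pos[n][1] - cy)

    neighbors = {n: set() for n in pos}
    for a, b in connectors:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Start at the top, then visit remaining nodes from the center outward.
    order = [top] + sorted((n for n in pos if n != top), key=lambda n: (radius(n), n))
    reports: Dict[str, List[str]] = {n: [] for n in pos}
    placed = {top}
    for current in order:
        for other in sorted(neighbors[current], key=lambda n: (radius(n), n)):
            # Step 411: only connectors leading farther from the center count.
            if other not in placed and radius(other) > radius(current):
                reports[current].append(other)  # step 413
                placed.add(other)
    return reports


layout = {"head": (0.0, 0.0), "m1": (-1.0, 0.0), "m2": (1.0, 0.0),
          "e1": (-2.0, 0.0), "e2": (2.0, 0.0)}
links = [("head", "m1"), ("head", "m2"), ("m1", "e1"), ("m2", "e2")]
chart = approximate_hierarchy(layout, links, top="head")
```

In this hypothetical layout, `chart` places `m1` and `m2` under `head`, `e1` under `m1`, and `e2` under `m2`.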
From the foregoing description, it should now be apparent to persons skilled in the art that the present invention may be implemented as a computer program in software, firmware, or the like, and contained in a computer memory device.
The present invention may be implemented as a method of doing business such as by being a purveyor of software or providing a service in which the business employs the above-described methodologies to present a client organization with a finished product such as a report based on the data mining and knowledge discovery results from analyzing specific communications data provided by the client organization.
It is also to be recognized that only the To/From data may be needed for the analysis of hierarchical structure. In other words, given a database of To/From data for a given set of individual nodal artifices—which may be persons, organizations, collectives, and the like—prediction of some form of relationship between those nodes may be implied.
The foregoing Detailed Description of exemplary and preferred embodiments is presented for purposes of illustration and disclosure in accordance with the requirements of the law. It is not intended to be exhaustive nor to limit the invention to the precise form(s) described, but only to enable others skilled in the art to understand how the invention may be suited for a particular use or implementation. The possibility of modifications and variations will be apparent to practitioners skilled in the art, particularly with respect to adaptations for other peer-to-peer communications data such as telephone call logs, instant e-mail messaging exchanges, and the like. No limitation is intended by the description of exemplary embodiments which may have included tolerances, feature dimensions, specific operating conditions, engineering specifications, or the like, and which may vary between implementations or with changes to the state of the art, and no limitation should be implied therefrom. Applicant has made this disclosure with respect to the current state of the art, but also contemplates advancements and that adaptations in the future may take those advancements into consideration, namely in accordance with the then current state of the art. It is intended that the scope of the invention be defined by the Claims as written and equivalents as applicable. Reference to a claim element in the singular is not intended to mean “one and only one” unless explicitly so stated. Moreover, no element, component, nor method or process step in this disclosure is intended to be dedicated to the public regardless of whether the element, component, or step is explicitly recited in the Claims. No claim element herein is to be construed under the provisions of 35 U.S.C. Sec. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for . . . ” and no method or process step herein is to be construed under those provisions unless the step, or steps, are expressly recited using the phrase “comprising the step(s) of . . . . ”