The invention concerns identifying relationships between users of a communications domain, for example, but not limited to, identifying relationships between users of an online communications domain such as a social networking website. Aspects of the invention include methods, computer systems and software.
Communication domains typically comprise multiple online communication-enabled devices, each equipped with software to send and receive communications in that domain. Example devices are computers, portable computers, mobile phones and personal digital assistant devices.
Communication domains are currently used by millions of people (i.e. users) worldwide to interact on a daily basis. Example domains are online social networks, virtual worlds and instant messaging systems.
Users of these domains have relationships with other users. These relationships define how the two related users can communicate in the domain. A relationship between two users must be identified before the users can communicate in that domain. A relationship may be a one-to-one or a one-to-many relationship. An example of a one-to-one relationship is between two users. An example of a one-to-many relationship is between a user and another user that represents a set of users. The set may be closed, such as a set of specific users defined in that user's buddy list. Alternatively, the set may be open, such as all users residing in Australia.
A central service provider typically supports services associated with the online communications domain. The service provider, such as a constellation of central servers, stores information relating to each user, such as their identity information and information on their identified relationships.
In most existing online communication domains, the relationships between users are explicitly defined by the users themselves, and usually include a request and an authorisation step. The process for a new user is described as follows:
The new user may also have new relationships identified as other users perform searches on the new user's identification information.
In some domains the authorisation is automatic with no authorisation check required.
Once the new user has a relationship with another user, the new user can use the communications domain to communicate with that user.
In a first aspect the invention provides a computer implemented method of identifying relationships between users in a communications domain comprising:
It is an advantage of the invention that the relationships between users can be identified while preserving the privacy of each of the users, both of their actual identity and the identity of each user they have a relationship with. By receiving only one-way encrypted identity information the underlying identity information is unintelligible to the communications domain. This helps to prevent the use of this information in an unauthorized manner by the service provider of the communications domain and reduces the negative impacts of potential unauthorized access of the information.
At the same time the identified relationships can be used to help identify further relationships or help ensure further identified relationships are correct while still maintaining the privacy of the users.
The one-way encrypted identification information of an identification token may correspond to one or more identity attributes of the associated domain user.
The one-way encrypted identification information of a relationship token may correspond to one or more identity attributes of a possible domain user.
An identity attribute may be one of:
An identity attribute may be a predetermined number of suffix characters of an identity attribute. It is an advantage of this embodiment of the invention that matches can be identified despite variations in the notation of identity attributes that typically vary in the prefix of an identity attribute.
The identification information associated with possible domain users may be identification information known to the first user.
The one-way encrypted information is coded in a way that renders it unintelligible to the recipient. In this case, the communications domain is unable to decipher the underlying identification information as the tokens cannot be inverted. The encrypted identification information may be computed using a cryptographic secure hash. Examples include SHA-1 and MD5.
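By way of illustration only, the one-way encryption of a single identity attribute may be sketched as follows. The function name `one_way_token` and the sample e-mail address are hypothetical and do not appear in the specification; the sketch assumes SHA-1 as provided by Python's standard `hashlib` module and UTF-8 encoding of the attribute.

```python
import hashlib


def one_way_token(attribute: str) -> str:
    """Return a SHA-1 digest (40 hex characters) of an identity attribute.

    The UTF-8 encoding step is an assumption; the specification does not
    state how the attribute is serialised before hashing.
    """
    return hashlib.sha1(attribute.encode("utf-8")).hexdigest()


# Hypothetical identity attribute (an e-mail address).
token = one_way_token("alice@example.com")
```

The same attribute always yields the same token, which is what allows tokens to be compared for equality without the underlying attribute being disclosed.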
The method may have an additional step of authorising the relationship with the domain user associated with the matched identification token.
The method may further comprise the step of analysing previously identified relationships of the first user and/or the domain user associated with the matched identification token and only identifying a match in step (c) if the previously identified relationships are indicative that the two users know each other.
The method may further comprise the step of only identifying a match in step (c) if at least two identification tokens associated with the domain user are matched to the relationship tokens of the first user.
The allowed communications between the first user and the domain user associated with the matched identification token in the communications domain may include one or more of:
In a second aspect the invention provides software, being computer readable instructions stored on computer readable media that when executed by a computer cause the computer to perform the method described immediately above.
In a third aspect the invention provides a computer system for identifying relationships between users in a communications domain comprising:
The computer system may be a server or a collection of servers that offer services that support the communications domain.
In a fourth aspect the invention provides a computer implemented method for identifying relationships between a first user and other users in a communications domain, comprising:
The one-way encrypted identification information of a relationship token may correspond to one or more identity attributes of a possible domain user. The identity attributes may be read from an electronic address book of the first user.
The method may further comprise the steps of:
In a fifth aspect the invention provides software, being computer readable instructions stored on computer readable media that when executed by a computer cause the computer to perform the method described immediately above.
In a sixth aspect the invention provides a communications enabled device to identify relationships between a first user and other users in a communications domain, comprising:
An example of the invention will now be described with reference to the following drawings in which:
An example of identifying relationships between users of a communications domain will now be described. In this example the communications domain is an online social network.
Referring to
Here we can see that user A's identity information 20 that is stored in memory of user A's computer has three identity attributes A1, A2 and A3. In this example A1 is an email address, A2 is a mobile phone number and A3 is a home phone number.
User A's address book 22 is also stored in the memory of user A's computer and has the identity information of users that are known to user A. User B is known to user A since user B's mobile number B2 is stored in the address book 22. This creates a relationship between users A and B shown schematically in
User C is also known to user A who has stored user C's email address C1 and mobile phone number C2 creating a relationship shown schematically in
The relationship 26 between users A and C is unidirectional since user C does not have stored in their address book 28 any identity information of user A.
The relationship between users A and B is bidirectional since user A is known to user B as user A's email address A1 is stored in user B's address book 32. This forms the relationship between users B and A shown schematically in
Again, the relationship between users B and C is bidirectional. User B has user C's office number C4 and home phone number C3 stored in their address book 32 creating the relationship schematically shown as 40 in
Since user D's email address D1 is stored in user C's address book, the relationship schematically shown as 44 in
Note that there is no relationship defined between users A and D as neither user has any identity attributes of the other user stored in their address book 22.
The service provider of the social network in this example is a set of servers shown schematically in
A method of identifying these relationships in the online social network in a manner that maintains the privacy of each of the users will now be described with reference to
Referring first to user A, user A registers 100 with the online social network. In this example user A, using their device, accesses the website of the online social network. The website is hosted by one or more servers 58 that are the service providers of the social network. From the website or from a third party supplier, user A downloads a small software application that is then installed on their device.
As part of the registration process user A operates their device to compute an encrypted version of each identity attribute A1, A2 and A3 using the downloaded software 100a.
In this example the software computes a one-way encrypted version of each of user A's identity attributes. In this example, the secure hash function SHA-1 is used to produce a set of identity tokens I(A)={H(A1), H(A2), H(A3)} representing user A's identity.
User A also operates their device to compute a one-way encrypted version of their address book 22 using the downloaded software 100b. Again using the secure hash function SHA-1, each attribute in the address book is hashed to produce a set of relationship tokens R(A)={H(B2), H(C1), H(C2)} which represents all the possible users of the social network known to user A. They are possible users as user A is not yet aware whether or not the users in the address book 22 are registered users of the social network. User A's token sets are schematically shown in
In this example, user A's token sets are stored on their device. As part of the registration process, using the website, user A sends 70, 100c the two token sets 60 to the server 58. In this example the communication channel with the server 58 is insecure, but in other embodiments of this invention the communication channel may be secure.
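Purely as an illustrative sketch, the computation of user A's two token sets on the device may look as follows. The attribute values, the helper `H` and the structure of `registration_payload` are assumptions made for the example; only the use of SHA-1 and the composition of I(A) and R(A) are taken from the description above.

```python
import hashlib


def H(attribute: str) -> str:
    """SHA-1 hex digest of an attribute (UTF-8 encoding is assumed)."""
    return hashlib.sha1(attribute.encode("utf-8")).hexdigest()


# Hypothetical values for user A's identity attributes A1, A2 and A3.
identity_attributes = {
    "A1": "a@example.com",
    "A2": "+61 400 000 001",
    "A3": "+61 2 0000 0001",
}

# Hypothetical values for the entries B2, C1 and C2 in address book 22.
address_book = {
    "B2": "+33 1 42 23 11 54",
    "C1": "c@example.com",
    "C2": "+61 400 000 003",
}

# I(A) and R(A): only hashed values are placed in the sets.
identity_tokens = {H(value) for value in identity_attributes.values()}
relationship_tokens = {H(value) for value in address_book.values()}

# Hypothetical payload sent to the server 58; it carries no plaintext.
registration_payload = {
    "identity_tokens": sorted(identity_tokens),
    "relationship_tokens": sorted(relationship_tokens),
}
```

Because only the token sets leave the device, the server receives neither user A's own attributes nor the contents of the address book in intelligible form.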
The server 58 stores 102 these two token sets 60 in a related manner in its relational database (or other suitable data structure) of all registered users.
Users B and C also register with the online social network in the same way as user A, which includes performing the same one-way encryption steps. User B produces a set of identity tokens I(B)={H(B1), H(B2)} representing user B's identity. User B also produces a set of relationship tokens R(B)={H(A1), H(C4), H(C3)}. User B's token sets are schematically shown in
User C computes a set of identity tokens I(C)={H(C1), H(C2), H(C3), H(C4)} representing user C's identity. User C also produces a set of relationship tokens R(C)={H(D1), H(B2)} representing all of user C's relationships. User C's token sets are schematically shown in
User D does not perform these encryption steps 100 as user D does not wish to become a registered user of this online social network.
As each new user registers with the online social network, the server 58 compares each of the received relationship tokens to the stored identity tokens of registered users. In this example, this is done as a bit comparison of tokens stored in the relational database or using a set of JOIN statements, linking users to stored tokens to relationship tokens to users again. This allows for a fast lookup on most relational database engines. Alternatively, for large sets of users a search on a flat identity directory system could be used. For each match with a distinct user, the server 58 identifies a relationship between the two users. The server 58 stores the identified relationship in the database as an association between those two users.
For example, if user A is the first to register, no relationships are identified by the server 58 as identity tokens corresponding to the attributes B2, C1 or C2 of user A's relationship tokens are not found in the database.
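The JOIN-based comparison may, by way of non-limiting example, be sketched using an in-memory SQLite database. The table names, column names and the use of placeholder strings such as `H(A1)` in place of real digests are assumptions of the sketch; the specification does not prescribe a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE identity_tokens (user_id TEXT, token TEXT);
    CREATE TABLE relationship_tokens (user_id TEXT, token TEXT);
    """
)

# Token sets of users A, B and C from the example; "H(X)" is a placeholder
# string standing in for the actual hash of attribute X.
identity = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"], "C": ["C1", "C2", "C3", "C4"]}
relationship = {"A": ["B2", "C1", "C2"], "B": ["A1", "C4", "C3"], "C": ["D1", "B2"]}

conn.executemany(
    "INSERT INTO identity_tokens VALUES (?, ?)",
    [(user, f"H({attr})") for user, attrs in identity.items() for attr in attrs],
)
conn.executemany(
    "INSERT INTO relationship_tokens VALUES (?, ?)",
    [(user, f"H({attr})") for user, attrs in relationship.items() for attr in attrs],
)

# A relationship (x, y) is identified when any relationship token of user x
# equals any identity token of a distinct user y. DISTINCT collapses
# multiple matching tokens (e.g. C1 and C2) into a single relationship.
matches = conn.execute(
    """
    SELECT DISTINCT r.user_id, i.user_id
    FROM relationship_tokens AS r
    JOIN identity_tokens AS i ON r.token = i.token
    WHERE r.user_id <> i.user_id
    ORDER BY r.user_id, i.user_id
    """
).fetchall()
```

In a production database an index on the token columns would be the usual way to keep this lookup fast; that choice is outside what the specification describes.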
User B is next to register. The server 58 matches user B's relationship token H(A1) to an identity token stored in the relational database. As a result the server 58 identifies a relationship between user B and A (B, A). An indication of this identified relationship (B, A) is also sent to user B by the server 58 (as described directly below) and an appropriate record is also stored in the database.
The server 58 now also checks 104 whether any previously registered users have a relationship token in the database that is the same as either of user B's identity tokens H(B1) and H(B2). The server 58 identifies 106 that user A's relationship token set does include H(B2) and accordingly identifies a relationship between user A and B (A, B). Since the relationship (B, A) also exists, this makes the relationship between users A and B bidirectional. Server 58 creates an appropriate record in the database and sends 108 to user A an indication that the relationship (A, B) has been identified. For example, if user A is currently online in the social network, the webpage currently viewed by user A may be modified by information sent by the server 58 to show this, such as a pop up box. Alternatively, a message may be sent that can be retrieved by user A next time user A reads their emails or the next time user A is online.
User C is next to register. The server 58 identifies that user C's relationship token H(B2) is an identity token stored in the relational database. As a result server 58 identifies a relationship between user C and B (C, B). Server 58 also identifies that user C's relationship token H(D1) is not stored in the database and no new relationship is identified with user D. The server 58 sends to user C an indication that the relationship (C, B) has been identified but does not send an indication that a relationship with user D has been found. In this way, user C receives an indication that user C now has a relationship with only a subset (B) of the possible users of the social network (B & D) listed in user C's address book 28.
The server 58 now also checks whether any previously registered users have a relationship token in the database that is the same as any of user C's identity tokens H(C1), H(C2), H(C3) and H(C4). The server 58 identifies that user A's relationship token set does include H(C1) and H(C2), which uniquely match to one user, namely user C. The server 58 identifies one relationship between user A and C (A, C). Since there is no relationship (C, A) this new relationship is unidirectional. Whether a relationship is unidirectional or bidirectional can be used by the communications domain to confer some attributes to the communications allowed between the users. That is, users involved in a unidirectional relationship may have different communication and information privileges to each other.
The server 58 identifies that user B's relationship token set does include H(C4) and H(C3). The server stores an association between user B and C (B, C). Since the relationship (C, B) also exists this new relationship is bidirectional.
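The registration sequence of users A, B and C described above may be traced with the following minimal sketch. The functions `register` and `direction` and the in-memory dictionary `db` are hypothetical; attribute labels such as "B2" stand in for the corresponding hashed tokens.

```python
def register(db, user, identity, relationship):
    """Register a user and return the newly identified directed relationships.

    db maps each registered user to a pair (identity tokens, relationship tokens).
    """
    new_edges = set()
    for other, (other_identity, other_relationship) in db.items():
        # Forward check: the new user's relationship tokens against stored identities.
        if relationship & other_identity:
            new_edges.add((user, other))
        # Reverse check: stored relationship tokens against the new user's identity.
        if other_relationship & identity:
            new_edges.add((other, user))
    db[user] = (identity, relationship)
    return new_edges


def direction(edges, u, v):
    """Classify the relationship between two users from the directed edges."""
    forward, backward = (u, v) in edges, (v, u) in edges
    if forward and backward:
        return "bidirectional"
    if forward or backward:
        return "unidirectional"
    return "none"


db, edges, steps = {}, set(), []
for user, identity, relationship in [
    ("A", {"A1", "A2", "A3"}, {"B2", "C1", "C2"}),
    ("B", {"B1", "B2"}, {"A1", "C4", "C3"}),
    ("C", {"C1", "C2", "C3", "C4"}, {"D1", "B2"}),
]:
    new_edges = register(db, user, identity, relationship)
    steps.append(new_edges)
    edges |= new_edges
```

Under these assumptions no relationship is found when user A registers, (B, A) and (A, B) are found when user B registers, and (C, B), (A, C) and (B, C) are found when user C registers, consistent with the description.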
A summary of the relationships identified and stored on the server 58 is schematically shown in
In other communication domains, the edge (i.e. relationship) may allow user B to communicate with user A using synchronous or asynchronous messaging, such as email, instant messaging, multimedia file transfers such as video, pictures and audio files and real-time streamed multimedia communication such as video or audio chat.
Over time, to maintain the relationships 80 so that they are synchronized with the data contained in the address books of the users A, B and C, a simple synchronization protocol is used to transmit only the variation of the address book to the servers. That is, the software on the user's device, at a predetermined interval or based on the detection of an event, re-computes the identity and relationship token sets and compares them to the versions of these token sets most recently computed and stored in the user's device. The differences are identified and only the differences are sent to the server 58.
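One possible form of the difference computation is sketched below. The function name `token_delta`, the shape of the returned dictionary and the new contact E1 are assumptions of the example; the specification states only that the differences are identified and sent.

```python
def token_delta(previous, current):
    """Return the tokens added and removed since the last stored version."""
    return {
        "added": current - previous,
        "removed": previous - current,
    }


# Token set stored on the device after the last synchronisation
# (placeholder strings stand in for real digests).
previous_relationship_tokens = {"H(B2)", "H(C1)", "H(C2)"}

# Token set re-computed after user A deletes C2 and adds a hypothetical
# new contact attribute E1.
current_relationship_tokens = {"H(B2)", "H(C1)", "H(E1)"}

delta = token_delta(previous_relationship_tokens, current_relationship_tokens)
```

Only `delta` would need to be transmitted; an empty delta means no message need be sent at all.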
A particular identity attribute may often be represented in different notation standards. For example, a phone number could be noted with the full country and area code or in a shorter version without the country code and still be correct. The following three numbers are different representations of user B's telephone number B2:
+33 1 42 23 11 54 (as stored by user B as an identity attribute)
0011 33 1 42 23 11 54 (as stored by user A as a relationship attribute)
01 42 23 11 54 (as stored by user C as a relationship attribute)
For these different representations of the same identity attribute, there is a chance that the server 58 will not correctly identify that they are in fact the same attribute and therefore fail to identify the relationships (A, B) and (C, B).
To address this, a transformation can be applied, specific to telephone numbers, where only the last k digits are selected by the software when encrypting an attribute. In this case where k=7, 2 23 11 54 is the number that is encrypted, in this example hashed.
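The suffix transformation may be illustrated with the sketch below, which assumes that all non-digit characters are stripped before the last k digits are selected. The function names `last_k_digits` and `suffix_token` are hypothetical.

```python
import hashlib


def last_k_digits(phone_number: str, k: int = 7) -> str:
    """Strip every non-digit character and keep the last k digits."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    return digits[-k:]


def suffix_token(phone_number: str, k: int = 7) -> str:
    """SHA-1 hex digest of the last k digits of a phone number."""
    return hashlib.sha1(last_k_digits(phone_number, k).encode("utf-8")).hexdigest()


# The three notations of user B's telephone number B2 given above.
representations = [
    "+33 1 42 23 11 54",
    "0011 33 1 42 23 11 54",
    "01 42 23 11 54",
]
tokens = {suffix_token(number) for number in representations}
```

With k=7 all three notations reduce to the digits 2231154, so `tokens` contains a single value and the server can match them by simple equality.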
When using this truncation technique the probability of collision increases. A collision can occur as phone numbers lose their globally unique semantics. For example, the following phone numbers: +1 4344-982-209 and +33 3 44 98 22 09 share the same suffix when k=7. As a result they would both produce the same token even though they represent different phone numbers. This may result in an incorrect relationship being identified by the server 58.
To address this, the criteria needed to create a relationship by the server could be made more strict. For example, a relationship may only be identified by the server where the relationship is bidirectional. Alternatively or in addition, a relationship may only be identified if two or more relationship tokens match two or more identity tokens of a user. That means a user needs to have, for example, another user's telephone number AND email address for a relationship to be identified.
Alternatively or in addition, the distance between two nodes in the graph (i.e. minimum number of edges) can be used as an indicator of the probability of collision. For example, if two users have a relationship to a third user in common, the chance that they themselves have a relationship is much higher.
A privacy leak threat exists in the form of a brute-force attack, in which the attacker tries every possible number to revert from the hash to the user's phone number. To mitigate this problem, the attribute, in this example the phone number, can be truncated by using e.g. the last x digits, x being sufficiently large to minimise collision probability, but sufficiently small to maximize the user's privacy. For example, if user A's phone number is +1 421 510 889, one could choose to truncate it to 510889 and hash that value instead.
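The stricter matching criteria may be sketched as follows. The function `is_match`, its parameters `min_matches` and `require_bidirectional`, and the use of attribute labels in place of hashed tokens are assumptions made for illustration.

```python
def is_match(rel_tokens, identity_tokens, min_matches=2,
             require_bidirectional=False, reverse_exists=False):
    """Decide whether a relationship should be identified.

    rel_tokens:       relationship tokens of the first user
    identity_tokens:  identity tokens of the candidate domain user
    reverse_exists:   whether the opposite relationship is already identified
    """
    enough_tokens = len(rel_tokens & identity_tokens) >= min_matches
    if require_bidirectional:
        return enough_tokens and reverse_exists
    return enough_tokens


# User A holds two attributes of user C (C1 and C2) but only one of user B (B2).
R_A = {"B2", "C1", "C2"}
I_B = {"B1", "B2"}
I_C = {"C1", "C2", "C3", "C4"}

a_to_c = is_match(R_A, I_C)   # two tokens match
a_to_b = is_match(R_A, I_B)   # only one token matches
```

Under a two-token threshold the relationship (A, C) is still identified while (A, B) is not, which shows the trade-off: fewer false relationships from suffix collisions at the cost of missing some genuine ones.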
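The distance between two nodes may, for instance, be computed with a breadth-first search over the identified relationships treated as undirected edges. The function `distance` is a sketch, and the users E and F in the sample graph are hypothetical additions used only to show distances greater than one.

```python
from collections import deque


def distance(graph, start, goal):
    """Minimum number of edges between two users, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Undirected adjacency sets. A, B and C follow the example; E and F are
# hypothetical users added for illustration.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "E"},
    "C": {"A", "B"},
    "E": {"B"},
    "F": set(),
}

d_a_e = distance(graph, "A", "E")   # A and E share the neighbour B
d_a_f = distance(graph, "A", "F")   # F has no identified relationships
```

A candidate match between users at distance two (one user in common) could then be treated as more plausible than one between users with no connecting path; the threshold to apply is a design choice not fixed by the specification.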
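The brute-force threat and the effect of truncation may be illustrated with the following sketch, in which an attacker enumerates every six-digit value. The function names are hypothetical and SHA-1 over the UTF-8 digit string is assumed.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def brute_force_suffix(target_token: str, digits: int):
    """Try every digit string of the given length until the token matches."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if sha1_hex(candidate) == target_token:
            return candidate
    return None


# Token for the truncated form of user A's number +1 421 510 889.
token = sha1_hex("510889")

# At most 10**6 hashes are needed, so the suffix itself is recoverable.
recovered = brute_force_suffix(token, 6)
```

The attacker recovers only the truncated value 510889. Every full phone number ending in those six digits produces the same token, so the token alone does not reveal which of them belongs to user A; a smaller x enlarges that set, while a larger x reduces collisions.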
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the scope of the invention as broadly described.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Number | Date | Country | Kind |
---|---|---|---|
2008904718 | Sep 2008 | AU | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/AU2009/001186 | 9/10/2009 | WO | 00 | 1/5/2010 |