The aspects of the present disclosure relate generally to Federated Learning Systems and Federated Recommendation Systems and more particularly to enhancing privacy of data in a Federated Learning or Recommendation System.
Federated Learning and Federated Recommendation systems have been shown to have a high level of inherent user privacy preservation, mainly because the user data remains on the user equipment or device and the user recommendations are also generated on the user equipment. The part of a Federated Learning or Recommendation system that is most vulnerable from a user-privacy perspective is the model updates that are transferred between the user equipment and the backend server.
While different methods based on secure aggregation techniques have been proposed to protect user data in these systems, such methods typically require a sophisticated system of pair-wise (between users) secure communication channels involving security measures such as key sharing, for example. This requires extra infrastructure, resources and management of different processes. These methods may also not be robust when users drop out.
Accordingly, it would be desirable to be able to provide a system that addresses at least some of the problems identified above.
It is an object of the disclosed embodiments to provide an apparatus and method that enhances privacy of federated learning. This object is solved by the subject matter of the independent claims. Further advantageous modifications can be found in the dependent claims.
According to a first aspect, the above and further objects and advantages are obtained by a user equipment. In one embodiment, the user equipment includes a processor configured to download a master machine learning model for generating a user recommendation related to one or more of a use or interaction with an application of the user equipment; calculate a model update for the master machine learning model using the master machine learning model and data related to one or more of a user of the user equipment or a user interaction with the user equipment; encode the calculated model update using an ε-differential privacy mechanism; and transmit the ε-differential privacy encoded model update. The aspects of the disclosed embodiments enhance the privacy of a federated learning system by applying ε-Differential Privacy (DP) to the model updates uploaded from the user equipment to the backend server. The privacy of the user is further enhanced as the model updates are hashed and randomized and cannot be decoded individually to learn anything about the user. The aspects of the disclosed embodiments do not require additional infrastructure or communication channels or the sharing of keys/data between individual users, and no management of encryption keys is required, which reduces the amount of required computational resources.
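As an illustration only, a minimal client-side sketch of this aspect follows. The helper names, the clipping step, and the use of per-entry Laplace noise as the ε-differential privacy mechanism are assumptions made for the example; the disclosure does not prescribe a specific encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_update(delta, epsilon, clip=1.0):
    """Illustrative per-entry epsilon-DP encoding of a model update.

    Each entry is clipped to [-clip, clip] (sensitivity 2*clip) and perturbed
    with Laplace noise of scale 2*clip/epsilon. For simplicity the privacy
    budget is accounted per entry rather than for the whole matrix.
    """
    clipped = np.clip(delta, -clip, clip)
    return clipped + rng.laplace(0.0, 2.0 * clip / epsilon, size=delta.shape)

# Toy stand-in for the update calculated from the master model and local user data.
master_model = np.array([[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]])
local_signal = np.array([[0.24, 0.39, 0.58], [0.18, 0.25, 0.53]])
delta = local_signal - master_model  # placeholder "model update"

encoded_delta = encode_update(delta, epsilon=1.0)
# Only encoded_delta is transmitted; the raw update and the user data stay on the device.
```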
In a possible implementation form of the user equipment according to the first aspect, the downloaded master machine learning model is one or more of a collaborative filter (CF) model or a Federated Learning collaborative filter model. The aspects of the disclosed embodiments can be applied to a general set of Machine Learning algorithms in Federated Learning mode, and more specific filter models.
In a possible implementation form of the user equipment according to the first aspect as such or the previous possible implementation form, the processor is configured to generate the user recommendation related to the use of the application based on the downloaded master machine learning model and the data related to one or more of the user of the user equipment or the user interaction with the user equipment. The aspects of the disclosed embodiments minimize the risk of exposing user data by generating the recommendations on the user equipment.
In a further possible implementation form of the apparatus, the application is a video service. The aspects of the disclosed embodiments provide a high level of user privacy when the user uses the personalised recommendations that propose video choices to the user based on video preference selections, user demographic and/or gender data, or videos they have previously selected and/or watched through the service.
According to a second aspect, the above and further objects and advantages are obtained by a server apparatus. In one embodiment, the server apparatus includes a processor that is configured to receive a plurality of ε-differential privacy encoded model updates for a master machine learning model; aggregate the plurality of the received ε-differential privacy encoded updates; decode the aggregation of the plurality of received ε-differential privacy encoded updates to recover an aggregated version of the plurality of received ε-differential privacy encoded updates; and update the master machine learning model from the aggregated version of the plurality of received ε-differential privacy encoded updates. The aspects of the disclosed embodiments use ε-Differential Privacy to encode the model updates sent from a user equipment to the backend in such a way that it is impossible or very difficult for any agent (including the backend itself) that intercepts or views the encoded updates to reverse engineer them and extract any useful information about the user data. By aggregating the encoded model updates from many users and decoding the resulting aggregate, an estimate of the actual model updates can be calculated. This aggregate of the model updates is all that is required in the Federated Learning system as opposed to knowing the updates from the individual users. This further enhances the privacy properties of Federated Learning.
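A corresponding server-side sketch is given below. The aggregation as a sum follows the implementation form described herein, while the averaging "decode" step, which relies on zero-mean additive noise cancelling over many users, is an assumption of this example rather than the mechanism of the disclosure.

```python
import numpy as np

def aggregate_encoded_updates(encoded_updates):
    """Aggregate the received epsilon-DP encoded updates as their sum."""
    return np.sum(encoded_updates, axis=0)

def decode_aggregate(encoded_sum, num_clients):
    """Illustrative decoding: with zero-mean additive noise, the per-client
    noise averages out over many users, so dividing the sum by the number of
    clients yields an estimate of the average model update."""
    return encoded_sum / num_clients

def server_round(master_model, encoded_updates, learning_rate=1.0):
    """Update the master model from the decoded aggregate of the encoded updates."""
    encoded_sum = aggregate_encoded_updates(encoded_updates)
    estimated_update = decode_aggregate(encoded_sum, len(encoded_updates))
    return master_model + learning_rate * estimated_update
```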
In a possible implementation form of the server apparatus according to the second aspect as such, the master machine learning model is one or more of a collaborative filter (CF) model or a Federated Learning collaborative filter model. The aspects of the disclosed embodiments can be applied to a general set of Machine Learning algorithms in Federated Learning mode, and more specific filter models.
In another possible implementation form of the server apparatus according to the second aspect as such, the processor is configured to aggregate the plurality of received ε-differential privacy encoded updates as a sum of the plurality of received ε-differential privacy encoded updates. The aspects of the disclosed embodiments make it difficult for anyone looking at the encoded versions of the model updates to extract accurate information from the encoded model updates. By aggregating the encoded model updates from many users and decoding the resulting aggregate, an estimate of the actual model updates can be calculated. This aggregate of the model updates is all that is required in the Federated Learning system as opposed to knowing the updates from the individual users, which further enhances privacy.
According to a third aspect, the above and further objects and advantages are obtained by a method. In one embodiment, the method includes downloading, to a user equipment, a master machine learning model for generating a recommendation related to an application of the user equipment; calculating a model update for the master machine learning model using the master machine learning model and data related to one or more of a user of the user equipment or a user interaction with the user equipment; encoding the model update using an ε-differential privacy mechanism; and transmitting the encoded model update from the user equipment to a server. In one embodiment, the master machine learning model is downloaded from a backend server associated with an application service. The aspects of the disclosed embodiments enhance the privacy of a federated learning system by applying ε-Differential Privacy (DP) to the model updates uploaded from the user equipment to the backend server. The privacy of the user is further enhanced as the model updates are hashed and randomized and cannot be decoded individually to learn anything about the user. This makes it very difficult, if not impossible, for any agent, including the backend itself, that intercepts or views the encoded updates to reverse engineer them and extract any useful information about the user data. The use of computational resources is reduced compared to other methods using secure communications, encryption and decryption.
In a possible implementation form of the method according to the third aspect as such, the master machine learning model is one or more of a collaborative filter (CF) model or a Federated Learning collaborative filter model. The aspects of the disclosed embodiments can be applied to a general set of Machine Learning algorithms in Federated Learning mode, and more specific filter models.
In a possible implementation form of the method according to the third aspect as such, the method further includes receiving, in the server, a plurality of ε-differential privacy encoded model updates for the master machine learning model; aggregating the plurality of ε-differential privacy encoded model updates; decoding the aggregation of the ε-differential privacy encoded model updates to recover an aggregated version of the received plurality of ε-differential privacy encoded model updates; and updating the master machine learning model from the recovered aggregated version. The aspects of the disclosed embodiments make it difficult for anyone looking at the encoded versions of the model updates to extract accurate information from the encoded model updates. By aggregating the encoded model updates from many users and decoding the resulting aggregate, an estimate of the actual model updates can be calculated. This aggregate of the model updates is all that is required in the Federated Learning system as opposed to knowing the updates from the individual users, which further enhances privacy.
In a further possible implementation form of the method according to the third aspect as such, the method further includes aggregating the plurality of ε-differential privacy encoded model updates as a sum of the plurality of ε-differential privacy encoded model updates. By aggregating the encoded model updates from many users and decoding the resulting aggregate, an estimate of the actual model updates can be calculated. This aggregate of the model updates is all that is required in the Federated Learning system as opposed to knowing the updates from the individual users.
In a further possible implementation form of the method according to the third aspect as such, the application is a video service running on the user equipment. The aspects of the disclosed embodiments provide a high level of user privacy when the user uses the personalised recommendations that propose video choices to the user based on, for example, videos they have previously watched through the service, user demographics, user gender and user preferences selected through the application and service.
According to a fourth aspect, the above and further objects and advantages are obtained by a method. In one embodiment, the method includes receiving, in a server, a plurality of ε-differential privacy encoded model updates for a master machine learning model; aggregating the plurality of ε-differential privacy encoded machine learning model updates; decoding the aggregation of the plurality of ε-differential privacy encoded master machine learning model updates to recover an aggregated version of the received plurality of ε-differential privacy encoded master machine learning model updates; and updating the master machine learning model from the recovered aggregated version. The aspects of the disclosed embodiments use ε-Differential Privacy to encode the model updates sent from a user's device to the backend in such a way that it is impossible or very difficult for any agent (including the backend itself) that intercepts or views the encoded updates to reverse engineer them and extract any useful information about the user data. By aggregating the encoded model updates from many users and decoding the resulting aggregate, an estimate of the actual model updates can be calculated. This aggregate of the model updates is all that is required in the Federated Learning system as opposed to knowing the updates from the individual users. This further enhances the privacy properties of Federated Learning.
In a possible implementation form of the method according to the fourth aspect as such, the method further includes aggregating the plurality of ε-differential privacy encoded machine learning model updates as a sum of the plurality of ε-differential privacy encoded machine learning model updates. By aggregating the encoded model updates from many users and decoding the resulting aggregate, an estimate of the actual model updates can be calculated. This aggregate of the model updates is all that is required in the Federated Learning system as opposed to knowing the updates from the individual users.
According to a fifth aspect, the above and further objects and advantages are obtained by a non-transitory computer readable media having stored thereon program instructions that when executed by a processor cause the processor to perform the method of the possible implementations forms recited herein.
According to a sixth aspect, the processor is configured to execute non-transitory machine readable program instructions to perform the method of the possible implementation forms recited herein.
These and other aspects, implementation forms, and advantages of the exemplary embodiments will become apparent from the embodiments described herein considered in conjunction with the accompanying drawings. It is to be understood, however, that the description and drawings are designed solely for purposes of illustration and not as a definition of the limits of the disclosed invention, for which reference should be made to the appended claims. Additional aspects and advantages of the invention will be set forth in the description that follows, and in part will be obvious from the description, or may be learned by practice of the invention. Moreover, the aspects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
In the following detailed portion of the present disclosure, the invention will be explained in more detail with reference to the example embodiments shown in the drawings, in which:
Referring to
In one embodiment, referring to
The processor 102 is configured to download a master machine learning model for generating a user recommendation related to use of an application of the user equipment 100. In one embodiment, the master machine learning model can be downloaded from the backend server 200 to the user equipment 100. The user recommendation can provide one or more different options, or recommendations, to the user related to the use of the application or service. The processor 102 can then calculate a model update for the master machine learning model based on the master machine learning model and data related to one or more of the user of the user equipment or the user's interaction with the user equipment.
In one embodiment, the data, also referred to as user data, can have different types. For example, the data can include data obtained or recorded from the user's interaction with the application or service. This can include data recorded based on a user's selection of an item or option of the application, or selection of one or more items being recommended. For example, when the application is a video service, the data can include information pertaining to a video watched by the user in the video service.
Another form of data can include information about the user. For example, the data can include any form of user demographic data. The data can include meta data such as the location of the user and the user equipment, a type of the user equipment, user gender, or user age, or any combination thereof. In an alternate embodiment, the data can include user behavioural data and/or user meta data, or any combination thereof. In one embodiment, this data is obtained by and stored locally in the user equipment 100. In alternate embodiments, this type of data is obtained in any suitable manner and stored on any suitable storage medium accessible by the user equipment 100.
The calculated model update is encoded using an ε-differential privacy mechanism, and the ε-differential privacy encoded model update is then transmitted. In one embodiment, the encoded model update is transmitted to the apparatus 200, referred to herein as the server, or backend server. As is illustrated in
The aspects of the disclosed embodiments use ε-Differential Privacy (DP) to encode a model update on the user equipment or device 100 and decode an aggregation of user model updates on the backend server 200. DP allows values, for example numbers, to be encoded by a process involving hashing and randomization. This hashing-randomization process is applied to the model updates (which are numbers) and, instead of transferring the plain model updates from the user device 100 to the backend server 200, the encoded versions are transferred.
Anyone looking at the encoded versions of the model updates would find it very difficult, if not impossible, to extract accurate information from them. However, on the side of the server 200, by aggregating the encoded model updates from many user devices 100 and decoding the resulting aggregate, an estimate of the actual model updates can be calculated. This aggregate of the model updates is all that is required in the Federated Learning system 10, as opposed to knowing the updates from the individual user devices 100.
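The disclosure characterizes the encoder only as a hashing-randomization process. Purely as an illustrative stand-in (omitting the hashing step), the sketch below uses a standard one-bit randomized-response mechanism per entry: each clipped entry is reported as ±1 with a value-dependent bias, which is ε-differentially private per entry, and the server recovers an unbiased estimate of the sum of the original updates by rescaling the aggregate.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, epsilon, clip=1.0):
    """Randomized-response encoding E(.) applied entry-wise.

    An entry v in [-clip, clip] is reported as +1 with probability
    1/2 + (v / (2*clip)) * t, where t = (e^eps - 1) / (e^eps + 1), and as -1
    otherwise. The two output probabilities differ by at most a factor of
    e^eps, so the encoding is epsilon-DP per entry (budget accounting over
    the whole matrix is ignored here for simplicity).
    """
    t = (np.exp(epsilon) - 1.0) / (np.exp(epsilon) + 1.0)
    v = np.clip(x, -clip, clip)
    p_plus = 0.5 + (v / (2.0 * clip)) * t
    return np.where(rng.random(x.shape) < p_plus, 1.0, -1.0)

def decode_sum(encoded_updates, epsilon, clip=1.0):
    """Decoding E^-1(.) of the aggregate: rescale the summed bits so that the
    result is an unbiased estimate of the sum of the original entries."""
    t = (np.exp(epsilon) - 1.0) / (np.exp(epsilon) + 1.0)
    return (clip / t) * np.sum(encoded_updates, axis=0)

# Toy check: many clients, each holding a small update matrix.
epsilon = 1.0
updates = [rng.uniform(-0.05, 0.05, size=(3, 3)) for _ in range(10000)]
encoded = [encode(u, epsilon, clip=0.05) for u in updates]
estimate = decode_sum(encoded, epsilon, clip=0.05)
true_sum = np.sum(updates, axis=0)
# estimate approximates true_sum, while each individual encoded matrix reveals
# no more than a biased coin flip per entry.
```

No single encoded matrix can be inverted to recover its original entries, yet the rescaled sum approaches the true aggregate as the number of users grows, which is the property the Federated Learning system relies on.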
An advantage of the method of the disclosed embodiments is that the privacy of the user of the user device 100 is further enhanced, as the model updates are hashed and randomized and cannot be decoded individually to learn anything about the user. However, Federated Learning can still be used, as only the aggregate of the updates is required, and this can be extracted from the aggregate of the individual encoded user model updates.
Compared to other approaches, the method of the disclosed embodiments requires no additional infrastructure or communication channels, and no sharing of keys/data between individual user devices 100. Management of encryption keys is not required. The hashing-randomisation process on the user device 100 is straightforward and not resource consuming. The use of computational resources and processing is reduced compared to other methods using secure communications, encryption and decryption of updates. Thus, the aspects of the disclosed embodiments afford a level of privacy comparable to other methods, but in a much simpler and more resource efficient manner.
Referring to
As shown in
The Master model Y for the Federated Collaborative Filter (FCF) can be a matrix of numbers, such as for example [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5], [0.7, 0.8, 0.9] . . . ]. The Master model Y will be stored locally on the user equipment 100a-100m as Xi. The storage can utilize a memory 108, such as that shown in
Using a combination of the locally stored master model Xi and local user data, such as for example videos the user has previously watched, a set of personalised recommendations for the user of the user equipment 100a-100m can be generated. The model updates ΔXi to “learn” the model Y are then calculated in the user equipment 100a-100m for each user or client, such as Client 1-Client M, respectively, from the master model Xi stored locally on a specific user equipment 100a-100m, and the corresponding local user data.
The model update ΔXi is also a matrix of numbers, such as for example [[0.04, −0.01, −0.02], [0.08, −0.05, 0.03], [−0.04, 0.01, −0.03] . . . ]. Differential Privacy encoding is applied to the model update ΔXi of a particular user equipment 100a-100m to give E(ΔXi)=[[*, *, *], [*, *, *], [*, *, *] . . . ], the DP encoded update.
The encoded model updates E(ΔXi) are transferred back to the backend server 200 and are aggregated on the server 200 as
E(ΔY) = Σi E(ΔXi)
A decoding is applied to E(ΔY) to give an approximation ΔŶ to ΔY:
ΔŶ = E⁻¹(E(ΔY)) ≈ ΔY
The master model Y is updated as:
Y = Y + η·ΔŶ
where η is a learning rate.
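For concreteness, a compact end-to-end sketch of one such round is given below, using the matrix shape from the example above. The Laplace-noise encoder standing in for E(·), the identity-style decoding for E⁻¹(·), and the choice of learning rate η are all assumptions made for this example and are not prescribed by the disclosure.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, clip, num_clients = 1.0, 0.1, 5000

# Master model Y held on the backend server (shape as in the text example).
Y = np.array([[0.2, 0.4, 0.6], [0.1, 0.3, 0.5], [0.7, 0.8, 0.9]])

def E(delta):
    """Illustrative epsilon-DP encoding E(.) of a local model update dXi:
    clip each entry, then add Laplace noise of scale 2*clip/eps."""
    return np.clip(delta, -clip, clip) + rng.laplace(0.0, 2.0 * clip / eps, delta.shape)

def E_inv(encoded_sum):
    """Illustrative decoding E^-1(.): the zero-mean noise cancels in
    expectation, so the sum of the encoded updates already estimates
    dY = sum_i dXi."""
    return encoded_sum

# Each client computes a local update dXi (random placeholders here) and uploads E(dXi).
local_updates = [rng.uniform(-0.05, 0.05, size=Y.shape) for _ in range(num_clients)]
encoded_updates = [E(dx) for dx in local_updates]

# Backend server: aggregate, decode and update the master model, Y = Y + eta * dY_hat.
dY_hat = E_inv(np.sum(encoded_updates, axis=0))
eta = 1.0 / num_clients  # one common choice: average the summed updates
Y = Y + eta * dY_hat
```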
The process can continue with the distribution of the updated master model Y, as described above. Thus, the example of
The model update ΔXi is encoded 306 by applying ε-Differential Privacy. The encoded model update, referred to as E(ΔXi), is then sent 308 to the backend server, such as the backend server 200 of
Referring to
As an example, Huawei video service provides an application to users to run on their mobile device that allows them to watch videos through the service. The service backend is hosted in a cloud service. The video service would like to offer users a personalised recommendation service to propose video choices to users based on videos they have previously watched through the service, as well as other user-specific preferences and demographics. The video service would like to provide the highest level of user privacy it can when the user uses the personalised recommendations. The video service decides to use a Collaborative Filter (CF) recommendation algorithm/model and to use a Federated Learning mode to build and update the CF model. In particular, they decide to use a Federated version of CF, or FCF.
In accordance with the aspects of the disclosed embodiments, the video service applies ε-Differential Privacy to encode the model updates. The encoded model updates are then sent, via the cloud service, to the service backend. The service backend aggregates the encoded model updates and decodes the resulting aggregate to calculate an estimate of the actual model updates. In this manner, the privacy of the user is enhanced since the model updates cannot be decoded individually to learn anything about the user.
The apparatus 1000 includes or is coupled to a processor or computing hardware 1002, a memory 1004, a radio frequency (RF) unit 1006 and a user interface (UI) 1008. In certain embodiments such as for an access node or base station, the UI 1008 may be removed from the apparatus 1000. When the UI 1008 is removed the apparatus 1000 may be administered remotely or locally through a wireless or wired network connection (not shown).
The processor 1002 may be a single processing device or may comprise a plurality of processing devices including special purpose devices, such as, for example, digital signal processing (DSP) devices, microprocessors, graphics processing units (GPU), specialized processing devices, or general purpose central processing units (CPU). The processor 1002 often includes a CPU working in tandem with a DSP to handle signal processing tasks. The processor 1002, which can be implemented as one or more of the processors 102 and 202 described with respect to
In the example of
The program instructions stored in memory 1004 are organized as sets or groups of program instructions referred to in the industry with various terms such as programs, software components, software modules, units, etc. Each module may include a set of functionality designed to support a certain purpose. For example a software module may be of a recognized type such as a hypervisor, a virtual execution environment, an operating system, an application, a device driver, or other conventionally recognized type of software component. Also included in the memory 1004 are program data and data files which may be stored and processed by the processor 1002 while executing a set of computer program instructions.
The apparatus 1000 can also include or be coupled to an RF Unit 1006, such as a transceiver, coupled to the processor 1002 that is configured to transmit and receive RF signals based on digital data 1012 exchanged with the processor 1002, and may be configured to transmit and receive radio signals with other nodes in a wireless network. In certain embodiments, the RF Unit 1006 includes receivers capable of receiving and interpreting messages sent from satellites in the global positioning system (GPS) and of working together with information received from other transmitters to obtain positioning information pertaining to the location of the computing device 1000. To facilitate transmitting and receiving RF signals, the RF unit 1006 includes an antenna unit 1010 which in certain embodiments may include a plurality of antenna elements. The multiple antennas 1010 may be configured to support transmitting and receiving MIMO signals as may be used for beamforming.
The UI 1008 may include one or more user interface elements such as a touch screen, keypad, buttons, voice command processor, as well as other elements adapted for exchanging information with a user. The UI 1008 may also include a display unit configured to display a variety of information appropriate for a computing device or mobile user equipment and may be implemented using any appropriate display type such as for example organic light emitting diodes (OLED), liquid crystal display (LCD), as well as less complex elements such as LEDs or indicator lamps.
The aspects of the disclosed embodiments are directed to the use of ε-Differential Privacy to encode the model updates sent from a user's device to the backend. In this manner, it is impossible or very difficult for any agent intercepting or viewing the encoded updates to reverse engineer the encoded updates to extract any useful information about the user data. This further enhances the privacy properties of Federated Learning.
Thus, while there have been shown, described and pointed out, fundamental novel features of the invention as applied to the exemplary embodiments thereof, it will be understood that various omissions, substitutions and changes in the form and details of devices and methods illustrated, and in their operation, may be made by those skilled in the art without departing from the spirit and scope of the presently disclosed invention. Further, it is expressly intended that all combinations of those elements, which perform substantially the same function in substantially the same way to achieve the same results, are within the scope of the invention. Moreover, it should be recognized that structures and/or elements shown and/or described in connection with any disclosed form or embodiment of the invention may be incorporated in any other disclosed or described or suggested form or embodiment as a general matter of design choice. It is the intention, therefore, to be limited only as indicated by the scope of the claims appended hereto.