LEARNING APPARATUS, LEARNING SYSTEM, LEARNING METHOD, AND PROGRAM

Information

  • Publication Number: 20250190864
  • Date Filed: March 18, 2022
  • Date Published: June 12, 2025
  • CPC: G06N20/00
  • International Classifications: G06N20/00
Abstract
A learning apparatus updates a model variable wi by using a dual variable zA and noise Rσi including a random number R in a normal distribution and a standard deviation σi of noise, obtains a parameter λ used when learning of an update difference yA and the standard deviation of noise is performed by using the updated model variable wi and the noise Rσi, exchanges the update difference yA when communication with another learning apparatus constituting the learning system is performed, updates the standard deviation σi of noise by using a dual variable zB, a hyperparameter L, and noise Rλ including a random number R in a normal distribution and the parameter λ, obtains an update difference yB by using the updated standard deviation σi, the hyperparameter L, and the noise Rλ, and exchanges the update difference yB when communication with the other learning apparatus is performed.
Description
TECHNICAL FIELD

The present invention relates to federated learning.


BACKGROUND ART

Information such as e-mails and purchase histories on personal terminals, inspection data and IoT information in companies, and diagnosis information in hospitals could yield many advantages, for example matching, automatic control, and AI medical care, but it cannot be utilized because of information security concerns and anxiety about leakage. As means for solving this problem, federated learning, which is capable of performing learning in a distributed manner, is known.


The federated learning can be implemented at a high speed by distributing learning. However, in the federated learning, it is necessary to transmit an AI model generated by a user to another party, and thus there is a risk that the original data is reproduced from the model. In addition, it has not been known how much information is disclosed through the transmission of the AI model during learning.


In this regard, the reproduction risk of the original data can be reduced by applying a differential privacy method and adding noise to a model (see Non Patent Literature 1).


CITATION LIST
Non Patent Literature



  • Non Patent Literature 1: K. Wei, et al., "Federated Learning with Differential Privacy: Algorithms and Performance Analysis", IEEE Transactions on Information Forensics and Security, Volume 15, pp. 3454-3469, 2020.



SUMMARY OF INVENTION
Technical Problem

However, in the conventional method, the risk of leakage can be reduced as more noise is added, but the accuracy of the AI model to be learned then decreases. In other words, in the known method, the noise to be added must be determined arbitrarily in advance, and the problem of achieving a balance between the leakage risk and the accuracy remains.


An object of the present invention is to provide a learning apparatus, a learning system, a learning method, and a program therefor that optimally adjust the safety and the accuracy of an AI model learned by an end user terminal by controlling the magnitude of the noise to be added, that is, by generating as much noise as possible within a range that does not interfere with AI learning in federated learning and adding that noise to the AI model.


Solution to Problem

In order to solve the above problem, according to an aspect of the present invention, a learning apparatus constitutes a learning system including N learning apparatuses. The learning apparatus includes a model learning unit that updates a model variable wi by using a dual variable zA and noise Rσi including a random number R in a normal distribution and a standard deviation σi of noise, obtains a parameter λ used when learning of an update difference yA and the standard deviation of noise is performed by using the updated model variable wi and the noise Rσi, and exchanges the update difference yA when communication with another learning apparatus constituting the learning system is performed, and a noise learning unit that updates the standard deviation σi of noise by using a dual variable zB, a hyperparameter L, and noise Rλ including a random number R in a normal distribution and the parameter λ, obtains an update difference yB by using the updated standard deviation σi, the hyperparameter L, and the noise Rλ, and exchanges the update difference yB when communication with the other learning apparatus is performed.


In order to solve the above problem, according to another aspect of the present invention, a learning system includes N learning apparatuses. Each learning apparatus i includes a model learning unit that updates a model variable wi by using a dual variable zA and noise Rσi including a random number R in a normal distribution and a standard deviation σi of noise, and obtains a parameter λ used when learning of an update difference yA and the standard deviation of noise is performed by using the updated model variable wi and the noise Rσi, and a noise learning unit that updates the standard deviation σi of noise by using a dual variable zB, a hyperparameter L, and noise Rλ including a random number R in a normal distribution and the parameter λ, and obtains an update difference yB by using the updated standard deviation σi, the hyperparameter L, and the noise Rλ. The learning apparatus i and another learning apparatus j exchange the update differences yB and yA when the learning apparatus i communicates with the other learning apparatus j.


Advantageous Effects of Invention

According to the present invention, there is an effect that it is possible to achieve both safety (security) and model accuracy and to perform model learning, in contrast to a case where known differential privacy and federated learning are simply combined.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is a diagram illustrating a configuration example of a learning system according to a first embodiment.



FIG. 2 is a diagram illustrating an example of a processing flow of the learning system according to the first embodiment.



FIG. 3 is a diagram for explaining an algorithm of the learning system according to the first embodiment.



FIG. 4 is a functional block diagram of the learning apparatus according to the first embodiment.



FIG. 5 is a diagram illustrating an example of a communication schedule.



FIG. 6 is a diagram illustrating a configuration example of a computer to which the present method is applied.





DESCRIPTION OF EMBODIMENTS

Hereinafter, embodiments of the present invention will be described. Note that, in the drawings to be used for the following description, components having the same functions or steps for performing the same processing will be denoted by the same reference numerals, and redundant description will be omitted. In the following description, the symbol “-” or the like used in the text would normally be written immediately above the immediately following character, but is notated immediately before the character due to limitations of text notation. In expressions, these symbols are written in their original positions. In addition, processing performed in units of elements of a vector or a matrix is applied to all elements of the vector or the matrix unless otherwise specified.


<Conventional Federated Learning>

First, conventional federated learning will be described.

    • (i) The variance of the random number is determined in advance from the number of learning apparatuses that perform the federated learning and the number of rounds of learning.
    • (ii) A learning apparatus learns a model, adds noise to the model, and transmits the model to which the noise has been added to a learning server.
    • (iii) The learning server aggregates the received models, further adds noise to the aggregated model, and transmits the noise-added aggregated model to the learning apparatuses.
    • (iv) The model is learned by repeating the above (ii) and (iii); a minimal sketch of this flow follows this list.
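For concreteness, the following is a minimal sketch of this conventional flow in Python. It is not the method of the present embodiment; the model dimension, the fixed noise standard deviation SIGMA, and the dummy local_update function are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLIENTS, ROUNDS, SIGMA = 8, 4, 3, 0.1  # assumed sizes and a fixed noise scale


def local_update(w, client_id):
    # Stand-in for local training: one gradient-like step on a dummy quadratic loss.
    target = np.full(DIM, client_id, dtype=float)
    return w - 0.1 * (w - target)


server_model = np.zeros(DIM)
for r in range(ROUNDS):
    noisy_models = []
    for i in range(N_CLIENTS):
        w_i = local_update(server_model, i)                    # (ii) learn locally
        noisy_models.append(w_i + rng.normal(0, SIGMA, DIM))   # (ii) add noise, then transmit
    aggregated = np.mean(noisy_models, axis=0)                 # (iii) server aggregates
    server_model = aggregated + rng.normal(0, SIGMA, DIM)      # (iii) server adds noise, broadcasts
print(server_model)
```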


On the other hand, a learning system in a first embodiment is a distributed system in which a plurality of learning apparatuses operates asynchronously without providing a learning server.


In the learning system in the first embodiment, each learning apparatus learns the standard deviation of noise simultaneously with learning of the model. Furthermore, the result of adding noise to an update difference of the model and the result of adding noise to an update difference of the standard deviation of the noise are exchanged between the learning apparatuses.


First Embodiment


FIG. 1 illustrates a configuration example of a learning system according to the first embodiment. FIG. 2 illustrates an example of a processing flow thereof. FIG. 3 illustrates an example of an algorithm thereof.


The learning system includes N learning apparatuses 100-i. The configuration of the network, the number N of learning apparatuses, and the number Ei of other learning apparatuses with which each learning apparatus 100-i can communicate can be set as appropriate. Here, i=1, 2, . . . , N is defined. For example, a ring network with N=6 and Ei=2 as illustrated in FIG. 1A may be used, or a random network with N=6 as illustrated in FIG. 1B may be used. In the random network, the number Ei of other learning apparatuses with which each learning apparatus 100-i can communicate differs. For example, the learning apparatus 100-1 can communicate only with the learning apparatus 100-5, but the learning apparatus 100-5 can communicate with three learning apparatuses 100-1, 100-4, and 100-6.
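The communication graph can be represented simply by a neighbor list per learning apparatus. The sketch below is illustration only; the ring follows FIG. 1A, while the random network reproduces the two relations stated above (100-1 with 100-5; 100-5 with 100-1, 100-4, and 100-6) and fills in the remaining edges with assumed values.

```python
# Minimal sketch of the communication graphs in FIG. 1 (apparatus indices 1..6).
N = 6

# FIG. 1A: ring network, every apparatus i has E_i = 2 neighbors.
ring = {i: [((i - 2) % N) + 1, (i % N) + 1] for i in range(1, N + 1)}

# FIG. 1B-style random network: E_i differs per apparatus.
# Only 1-5 and 5-{1,4,6} are stated in the text; the remaining edges are assumed.
random_net = {1: [5], 2: [3, 4], 3: [2, 6], 4: [2, 5], 5: [1, 4, 6], 6: [3, 5]}

for name, graph in [("ring", ring), ("random", random_net)]:
    for i, neighbors in graph.items():
        print(f"{name}: apparatus 100-{i} has E_i = {len(neighbors)} neighbors: {neighbors}")
```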



FIG. 4 illustrates a functional block diagram of the learning apparatus 100-i according to the first embodiment.


The learning apparatus 100-i includes an initialization unit 110, a model learning unit 120, and a noise learning unit 140.


The learning apparatus 100-i learns the model by using a data subset xi as an input and performing federated learning with other learning apparatuses. The data subset xi is a set with cardinality |xi| consisting of ζ-dimensional samples available to each learning apparatus 100-i, i∈N. |⋅| represents the cardinality of the set ⋅. The xi may be different types of data, meaning that xi and xj (i≠j, i∈N, j∈N) may be sampled from different distributions. N represents the set of N learning apparatuses, and N=|N|. Note that the set of other learning apparatuses (the set of bidirectional edges) with which the learning apparatus 100-i can communicate is expressed as Ei={j∈N|(i,j)∈E}, Ei=|Ei| represents the number of other learning apparatuses with which the learning apparatus 100-i can communicate, and E=Σi∈NEi is defined. Ei is also referred to as the number of edges.



FIG. 5 illustrates an example of a communication schedule. Assuming that the calculation and communication performance of all the learning apparatuses is similar, each learning apparatus 100-i performs the update K times for each communication round r∈{1, . . . , R}. That is, each round r includes K inner iterations in each learning apparatus 100-i. Each learning apparatus 100-i communicates one or more times per round r with each other learning apparatus 100-j with which it can communicate, and exchanges an update difference of a model parameter and an update difference of the standard deviation of noise. Note that FIG. 3 illustrates the N learning apparatuses 100-i as sequentially performing processing within a round, but actually the N learning apparatuses 100-i perform processing in parallel while keeping the internal iteration count k in step.
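The schedule can be pictured as the nested loop below. This is a minimal sketch; local_update and exchange_with_neighbors are placeholder names for the per-iteration processing and the per-round communication described above, and the numbers R_ROUNDS and K_ITERS are assumed.

```python
R_ROUNDS, K_ITERS = 3, 5  # assumed numbers of rounds R and inner iterations K
NEIGHBORS = {1: [2, 6], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5, 1]}  # assumed ring


def local_update(i, r, k):
    # Placeholder for one inner update of apparatus 100-i (model and noise learning).
    pass


def exchange_with_neighbors(i, r):
    # Placeholder for exchanging the update differences y_A and y_B with each j in E_i.
    return NEIGHBORS[i]


for r in range(1, R_ROUNDS + 1):            # communication rounds r = 1, ..., R
    for i in NEIGHBORS:                      # in practice the N apparatuses run in parallel
        for k in range(1, K_ITERS + 1):      # K inner iterations per round
            local_update(i, r, k)
        exchange_with_neighbors(i, r)        # one or more exchanges per round with each neighbor
```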


This distributed optimization method can be applied to any machine learning (ML) model. Before starting a learning procedure, the same model architecture and local cost function fi are defined for all the learning apparatuses 100-i (fi=fj|i,j∈N). A cost function of the learning apparatus 100-i is the following expression.








f_i(w_i) = \mathbb{E}_{\chi_i \sim x_i}\left[\, f_i(w_i;\, \chi_i) \,\right]





Note that the subscript X_Y means XY. Here, wi∈Rm is a model variable of the learning apparatus 100-i, and χi is a mini-batch data sample from xi. It is assumed that the cost function fi: Rζ→R is Lipschitz smooth, and is convex or non-convex (for example, DNN). Thus, the function fi is differentiable and the gradient is calculated by gi(wi)=∇fi(wi;xi). Here, ∇ represents a differential operator.
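As a minimal sketch of the stochastic gradient gi(wi)=∇fi(wi; χi), the example below assumes a synthetic least-squares cost on randomly generated data standing in for the data subset xi; the closed-form gradient is specific to that assumed cost and is not part of the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_samples, batch = 4, 100, 16           # assumed model dimension, |x_i|, and mini-batch size
X = rng.normal(size=(n_samples, m))        # assumed data subset x_i (samples taken m-dimensional here)
y = X @ rng.normal(size=m) + 0.1 * rng.normal(size=n_samples)


def grad_f_i(w, X_batch, y_batch):
    # Gradient of the assumed mini-batch least-squares cost f_i(w; chi_i) = ||X w - y||^2 / (2B).
    residual = X_batch @ w - y_batch
    return X_batch.T @ residual / len(y_batch)


w_i = np.zeros(m)
idx = rng.choice(n_samples, size=batch, replace=False)   # draw the mini-batch chi_i from x_i
g_i = grad_f_i(w_i, X[idx], y[idx])
print(g_i)
```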


In the present embodiment, while the N learning apparatuses 100-i learn the model asynchronously, the update difference of the model parameter and the update difference of the standard deviation of the noise are exchanged with another learning apparatus 100-j that can communicate for each round r. In this manner, the accuracy of the model is improved in the entire learning system. The learning system finds, by learning, a model variable that minimizes the global cost function f(w)=(1/N)Σi∈Nfi(wi). Note that the model variable is given by the following expression.






w = \left[ w_1^{T}, \ldots, w_N^{T} \right]^{T} \in \mathbb{R}^{Nm}






Here, T represents transposition.
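A minimal sketch of evaluating the global cost f(w)=(1/N)Σi∈Nfi(wi) with the stacked model variable w∈RNm, again under the assumed least-squares local cost; N, m, and the data are illustration-only values.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 3, 4                                 # assumed number of apparatuses and model dimension
data = [(rng.normal(size=(50, m)), rng.normal(size=50)) for _ in range(N)]  # assumed x_i


def f_i(w_i, X, y):
    # Assumed local least-squares cost f_i(w_i).
    return 0.5 * np.mean((X @ w_i - y) ** 2)


w = np.zeros(N * m)                         # stacked model variable w = [w_1^T, ..., w_N^T]^T
global_cost = np.mean([f_i(w[i * m:(i + 1) * m], X, y) for i, (X, y) in enumerate(data)])
print(global_cost)
```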


The learning apparatus 100-i is a special device configured such that a special program is read by a known or dedicated computer having, for example, a central processing unit (CPU), a main storage device (a random access memory (RAM)), and the like. For example, the learning apparatus 100-i executes each processing under the control of the central processing unit. The data which is input to the learning apparatus 100-i or the data obtained by each processing is stored in, for example, the main storage device. The data stored in the main storage device is read into the central processing unit and used for other processing as necessary. At least some of the processing units of the learning apparatus 100-i may be configured by hardware such as an integrated circuit. Each storage unit included in the learning apparatus 100-i can be configured with, for example, a main storage device such as a random access memory (RAM) or middleware such as a relational database or a key value store. However, each storage unit is not necessarily provided inside the learning apparatus 100-i; it may be configured with an auxiliary storage device such as a hard disk, an optical disc, or a semiconductor memory element such as a flash memory, and may be provided outside the learning apparatus 100-i.


Hereinafter, each unit will be described.


<Initialization Unit 110>

The initialization unit 110 initializes a dual variable zi|j (S101), and outputs the initialized dual variable zi|j. For example, zi|j=0 is defined. zi|j is a variable held and updated by the learning apparatus 100-i, and includes dual variables zA_(i|j) and zB_(i|j). The dual variable zA_(i|j) is a parameter used when the model as a learning target is learned, and the dual variable zB_(i|j) is a parameter used when the standard deviation of noise is learned.
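A minimal sketch of the initialization S101. The dimension m and the neighbor sets are assumptions, and both dual variables are held as zero vectors per neighbor j∈Ei for illustration.

```python
import numpy as np

m = 4                                        # assumed dimension of the dual variables
NEIGHBORS = {1: [2, 6], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5, 1]}  # assumed ring

# z[i][j] holds the pair (z_A_(i|j), z_B_(i|j)) kept and updated by apparatus 100-i.
z = {
    i: {j: {"A": np.zeros(m), "B": np.zeros(m)} for j in neighbors}
    for i, neighbors in NEIGHBORS.items()
}
print(z[1][2]["A"])   # z_A_(1|2) initialized to 0
```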


<Model Learning Unit 120>

The model learning unit 120 updates the model variable wir,k of the learning apparatus 100-i by using the data subset xi, a constraint parameter Ai|j∈Rm×m, the dual variable zA_(i|j)r,k, and noise Rσir,k including the random number R in the normal distribution and the standard deviation σir,k of the noise (S103). Note that the constraint parameter Ai|j is a constraint parameter used during model learning for the edge (i, j) of the learning apparatus 100-i.


The model learning unit 120 obtains the update difference yA_(i|j)r,k by using the dual variable zA_(i|j)r,k, the constraint parameter Ai|j, the updated model variable wir,k, and the noise Rσir,k. When communication with the learning apparatus 100-j∈Ei is performed (YES in S105), the model learning unit 120 transmits the update difference yA_(i|j)r,k to the learning apparatus 100-j instead of the model variable wir,k, receives the update difference yA_(j|i)r,k from the learning apparatus 100-j, and thereby exchanges the update differences (S107).


For example, first, the model learning unit 120 calculates the gradient gi(wir,k)=∇fi(wir,ki) by using the mini-batch data sample χi from the data subset xi and the model variable wir,k∈Rm of the learning apparatus 100-i of the model learning unit 120.


Then, the model learning unit 120 updates the model variable wir,k by using the gradient gi(wir,k), the constraint parameter Ai|j, the dual variable zA_(i|j)r,k, and the noise Rσir,k.


For example, the model variable wir,k is updated by the following expression.










w_i^{r,k} \leftarrow a^{(1)} w_i^{r,k} - a^{(2)} \mu \left[ a^{(3)} g_i(w_i^{r,k}) + \frac{a^{(4)}}{\mu K E_i} \left\{ a^{(5)} \sum_{j \in \mathcal{E}_i} \left( a^{(6)} w_i^{r,k} - a^{(7)} A_{i|j}^{T} z_{A_{(i|j)}}^{r,k} + a^{(8)} A_{i|j}^{T} R \sigma_i^{r,k} \right) - a^{(9)} \mu E_i\, g_i(w_i^{r,k}) \right\} \right]    [Math. 1]







However, a(1) to a(9) are predetermined coefficients, and may be manually set, or appropriate values may be calculated and set by simulation or the like. For example, a(1) to a(9) are all set to 1, and the following expression is defined.










w_i^{r,k} \leftarrow w_i^{r,k} - \mu \left[ g_i(w_i^{r,k}) + \frac{1}{\mu K E_i} \left\{ \sum_{j \in \mathcal{E}_i} \left( w_i^{r,k} - A_{i|j}^{T} z_{A_{(i|j)}}^{r,k} + A_{i|j}^{T} R \sigma_i^{r,k} \right) - \mu E_i\, g_i(w_i^{r,k}) \right\} \right]    [Math. 2]







Note that μ is a step size. The model variable wir,k is a parameter used in the model as the learning target. The learning of the model is performed by updating the model variable wir,k and the dual variable zA_(i|j)r,k. Note that the superscript r,k means the parameter at the k-th iteration of the r-th round.
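A minimal sketch of the update of Math. 2 (all coefficients a(1) to a(9) equal to 1). The model dimension, the identity constraint parameters Ai|j, the scalar standard deviation σi, and the placeholder gradient are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, K, mu = 4, 5, 0.1                          # assumed dimension, inner iterations K, step size mu
neighbors = [2, 6]                             # assumed E_i for apparatus 100-i
E_i = len(neighbors)
A = {j: np.eye(m) for j in neighbors}          # assumed constraint parameters A_(i|j)
z_A = {j: np.zeros(m) for j in neighbors}      # dual variables z_A_(i|j)
w_i = np.zeros(m)
sigma_i = 0.05                                 # current noise standard deviation sigma_i


def grad(w):
    # Placeholder mini-batch gradient g_i(w_i^{r,k}).
    return w - 1.0


g = grad(w_i)
noise = rng.normal(0.0, sigma_i, size=m)       # R * sigma_i with R drawn from a normal distribution
inner = sum(w_i - A[j].T @ z_A[j] + A[j].T @ noise for j in neighbors)
w_i = w_i - mu * (g + (inner - mu * E_i * g) / (mu * K * E_i))   # Math. 2
print(w_i)
```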


Further, the model learning unit 120 obtains the parameters yA_(i|j)r,k and λi|jr,k by using the updated model variable wir,k, the dual variable zA_(i|j)r,k, the constraint parameter Ai|j, and the noise Rσir,k.








y_{A_{(i|j)}}^{r,k} \leftarrow b^{(1)} z_{A_{(i|j)}}^{r,k} - b^{(2)} \left( b^{(3)} A_{i|j} w_i^{r,k} + b^{(4)} R \sigma_i^{r,k} \right)

\lambda_{i|j}^{r,k} \leftarrow c^{(1)} z_{A_{(i|j)}}^{r,k} - c^{(2)} \left( c^{(3)} A_{i|j} w_i^{r,k} + c^{(4)} R \sigma_i^{r,k} \right)







However, b(1) to b(4) and c(1) to c(4) are predetermined coefficients, and may be manually set, or appropriate values may be calculated and set by simulation or the like. For example, the following expressions are defined.







b^{(2)} = 2 \quad \text{and} \quad b^{(1)} = b^{(3)} = b^{(4)} = c^{(1)} = c^{(2)} = c^{(3)} = c^{(4)} = 1

y_{A_{(i|j)}}^{r,k} \leftarrow z_{A_{(i|j)}}^{r,k} - 2 \left( A_{i|j} w_i^{r,k} + R \sigma_i^{r,k} \right)

\lambda_{i|j}^{r,k} \leftarrow z_{A_{(i|j)}}^{r,k} - \left( A_{i|j} w_i^{r,k} + R \sigma_i^{r,k} \right)






Here, j∈Ei.


The parameter yA_(i|j)r,k is a parameter used when the model as the learning target is updated, is an update difference to which the noise Rσir,k is added, and is a parameter that is held by the learning apparatus 100-i and transmitted to the learning apparatus 100-j. The parameter λi|jr,k is a parameter used when the standard deviation of noise is learned.
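A minimal sketch of the simplified expressions above (b(2)=2 and the remaining b and c coefficients equal to 1), continuing the same illustration-only assumptions (identity Ai|j, scalar σi).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4
neighbors = [2, 6]                             # assumed E_i
A = {j: np.eye(m) for j in neighbors}          # assumed constraint parameters A_(i|j)
z_A = {j: np.zeros(m) for j in neighbors}      # dual variables z_A_(i|j)
w_i = rng.normal(size=m)                       # updated model variable w_i^{r,k}
sigma_i = 0.05
noise = rng.normal(0.0, sigma_i, size=m)       # R * sigma_i

y_A = {j: z_A[j] - 2.0 * (A[j] @ w_i + noise) for j in neighbors}   # transmitted to apparatus 100-j
lam = {j: z_A[j] - (A[j] @ w_i + noise) for j in neighbors}         # lambda_(i|j), kept for noise learning
print(y_A[2], lam[2])
```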


When the learning apparatus 100-i communicates with the learning apparatus 100-j∈Ei (YES in S105), the model learning unit 120 receives the parameter yA_(j|i)r,k held by the learning apparatus 100-j and transmitted to the learning apparatus 100-i, and updates the dual variable zA_(i|j)r,k.








z_{A_{(i|j)}}^{r,k} \leftarrow y_{A_{(j|i)}}^{r,k}






Furthermore, the model learning unit 120 transmits the parameter yA_(i|j)r,k to the learning apparatus 100-j and exchanges the update difference of the model parameters.
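A minimal sketch of this exchange: after apparatus 100-i and apparatus 100-j swap their update differences, each overwrites its own dual variable with the value it received. The two vectors below are assumed placeholders.

```python
import numpy as np

m = 4
# Assumed update differences prepared by apparatus 100-i for 100-j and by 100-j for 100-i.
y_A_i_to_j = np.full(m, 0.3)    # y_A_(i|j), held by apparatus 100-i
y_A_j_to_i = np.full(m, -0.7)   # y_A_(j|i), held by apparatus 100-j

# After the exchange, each side updates its own dual variable with what it received.
z_A_i_j = y_A_j_to_i.copy()     # apparatus 100-i: z_A_(i|j) <- y_A_(j|i)
z_A_j_i = y_A_i_to_j.copy()     # apparatus 100-j: z_A_(j|i) <- y_A_(i|j)
print(z_A_i_j, z_A_j_i)
```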


<Noise Learning Unit 140>

The noise learning unit 140 updates the standard deviation σir,k of the noise by using the parameter ηi, the constraint parameter Bi|j, the dual variable zB_(i|j)r,k, the hyperparameter L, the random number R of the normal distribution, and the parameter λi|jr,k (S109). Note that the constraint parameter Bi|j is a constraint parameter used when the learning apparatus 100-i learns the standard deviation of noise with respect to the edge (i, j). In addition, ηi is a parameter used for learning the standard deviation of the noise, and, for example, ηi=1/(μEi(K−1)) is defined.


The noise learning unit 140 obtains an update difference yB_(i|j)r,k by using the dual variable zB_(i|j)r,k, the constraint parameter Bi|j, the updated standard deviation σir,k, the hyperparameter L, and the noise Rλi|jr,k including the random number R of the normal distribution and the parameter λi|jr,k. When communication with the learning apparatus 100-j∈Ei is performed (YES in S111), the noise learning unit 140 transmits the update difference yB_(i|j)r,k to the learning apparatus 100-j, receives the update difference yB_(j|i)r,k from the learning apparatus 100-j, and thereby exchanges the update differences (S113).


For example, first, the noise learning unit 140 updates the standard deviation σir,k of the noise by the following expression.










\sigma_i^{r,k} \leftarrow \frac{d^{(1)} L}{2 E_i \eta_i} \left( d^{(2)} \sum_{j \in \mathcal{E}_i} d^{(3)} \eta_i \left( d^{(4)} B_{i|j} z_{B_{(i|j)}}^{r,k} - d^{(5)} B_{i|j} R \lambda_{i|j} \right) \pm \sqrt{ d^{(6)} \left\{ \sum_{j \in \mathcal{E}_i} d^{(7)} \eta_i \left( d^{(8)} B_{i|j} R \lambda_{i|j} - d^{(9)} B_{i|j} z_{B_{(i|j)}}^{r,k} \right) \right\} \left\{ \sum_{j \in \mathcal{E}_i} d^{(10)} \eta_i \left( d^{(11)} B_{i|j} R \lambda_{i|j} - d^{(12)} B_{i|j} z_{B_{(i|j)}}^{r,k} \right) \right\} + d^{(13)} E_i \eta_i } \right)    [Math. 3]







Here, d(1) to d(13) are predetermined coefficients, and may be manually set, or appropriate values may be calculated and set by simulation or the like. For example, d(1) to d(12) are all set to 1 and d(13)=4 is defined. Further, the following expression is defined.










\sigma_i^{r,k} \leftarrow \frac{L}{2 E_i \eta_i} \left( \sum_{j \in \mathcal{E}_i} \eta_i \left( B_{i|j} z_{B_{(i|j)}}^{r,k} - B_{i|j} R \lambda_{i|j} \right) \pm \sqrt{ \left\{ \sum_{j \in \mathcal{E}_i} \eta_i \left( B_{i|j} R \lambda_{i|j} - B_{i|j} z_{B_{(i|j)}}^{r,k} \right) \right\}^2 + 4 E_i \eta_i } \right)    [Math. 4]
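A minimal sketch of the update of Math. 4 as reconstructed above (d(1) to d(12) equal to 1, d(13)=4), taking the "+" branch of the ± and reading the discriminant-like term as lying under a square root. Scalar Bi|j, zB_(i|j), and λi|j and the numerical values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
neighbors = [2, 6]                      # assumed E_i for apparatus 100-i
E_i = len(neighbors)
mu, K, L = 0.1, 5, 0.01                 # assumed step size, inner iterations, hyperparameter L
eta_i = 1.0 / (mu * E_i * (K - 1))      # eta_i = 1 / (mu * E_i * (K - 1)) as in the text

B = {j: 1.0 for j in neighbors}         # assumed scalar constraint parameters B_(i|j)
z_B = {j: 0.2 for j in neighbors}       # dual variables z_B_(i|j)
lam = {j: 0.1 for j in neighbors}       # parameters lambda_(i|j) from the model learning unit
R = {j: rng.standard_normal() for j in neighbors}   # normal random numbers R

s1 = sum(eta_i * (B[j] * z_B[j] - B[j] * R[j] * lam[j]) for j in neighbors)
s2 = sum(eta_i * (B[j] * R[j] * lam[j] - B[j] * z_B[j]) for j in neighbors)
sigma_i = (L / (2 * E_i * eta_i)) * (s1 + np.sqrt(s2 * s2 + 4 * E_i * eta_i))   # "+" branch of Math. 4
print(sigma_i)
```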







Furthermore, the noise learning unit 140 obtains the parameter yB_(i|j)r,k by the following expression using the dual variable zB_(i|j)r,k, the constraint parameter Bi|j, the hyperparameter L, the updated standard deviation σir,k, and the noise Rλir,k.








y_{B_{(i|j)}}^{r,k} \leftarrow e^{(1)} z_{B_{(i|j)}}^{r,k} - e^{(2)} \left( e^{(3)} B_{i|j} \sigma_i^{r,k} + e^{(4)} L R \lambda_i^{r,k} \right)







Here, e(1) to e(4) are predetermined coefficients, and may be manually set, or appropriate values may be calculated and set by simulation or the like. For example, the following expressions are defined.







e^{(1)} = e^{(3)} = e^{(4)} = 1 \quad \text{and} \quad e^{(2)} = 2

y_{B_{(i|j)}}^{r,k} \leftarrow z_{B_{(i|j)}}^{r,k} - 2 \left( B_{i|j} \sigma_i^{r,k} + L R \lambda_i^{r,k} \right)







Here, j∈Ei. The parameter yB_(i|j)r,k is a parameter used when the standard deviation of the noise is updated, is an update difference to which the noise Rλir,k is added, and is a parameter held by the learning apparatus 100-i and transmitted to the learning apparatus 100-j. Note that L is a hyperparameter for controlling safety, and a value of about 0.01 is usually set. By setting L to a value of 0.02 or more and 0.03 or less, safety is improved at the cost of a slight deterioration in learning accuracy.
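A minimal sketch of the simplified yB_(i|j) above (e(2)=2 and the remaining e coefficients equal to 1), with scalar quantities and numerical values assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
neighbors = [2, 6]                      # assumed E_i
L = 0.01                                # hyperparameter L (0.02 to 0.03 trades accuracy for safety)
B = {j: 1.0 for j in neighbors}         # assumed scalar constraint parameters B_(i|j)
z_B = {j: 0.2 for j in neighbors}       # dual variables z_B_(i|j)
sigma_i = 0.05                          # updated standard deviation sigma_i^{r,k}
lam_i = 0.1                             # parameter lambda used in the noise R*lambda
noise = rng.standard_normal() * lam_i   # R * lambda

y_B = {j: z_B[j] - 2.0 * (B[j] * sigma_i + L * noise) for j in neighbors}   # transmitted to apparatus 100-j
print(y_B)
```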


When the learning apparatus 100-i communicates with the learning apparatus 100-j∈Ei (YES in S111), the noise learning unit 140 receives the parameter yB_(j|i)r,k held by the learning apparatus 100-j and transmitted to the learning apparatus 100-i, and updates the dual variable zB_(i|j)r,k (S113).








z_{B_{(i|j)}}^{r,k} \leftarrow y_{B_{(j|i)}}^{r,k}






Furthermore, the noise learning unit 140 transmits the parameter yB_(i|j)r,k to the learning apparatus 100-j and exchanges the update difference of the standard deviation of the noise.


In each learning apparatus 100-i, the above processing is repeated K times to form one round of processing, and this round of processing is repeated R times.
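Putting the pieces together, the following is a minimal end-to-end sketch of one learning apparatus running R rounds of K inner iterations. Every function body is an empty placeholder standing in for the corresponding step S103 to S113, under the same illustration-only assumptions as the sketches above.

```python
R_ROUNDS, K_ITERS = 3, 5                # assumed R and K
NEIGHBORS = [2, 6]                       # assumed E_i of apparatus 100-i


def model_update(r, k):
    """Placeholder for S103: update w_i with z_A and the noise R*sigma_i (Math. 1/2)."""


def compute_y_A_and_lambda(r, k):
    """Placeholder for obtaining y_A_(i|j) and lambda_(i|j)."""


def noise_update(r, k):
    """Placeholder for S109: update sigma_i with z_B, L, and the noise R*lambda (Math. 3/4)."""


def compute_y_B(r, k):
    """Placeholder for obtaining y_B_(i|j)."""


def exchange(r):
    """Placeholder for S107/S113: swap y_A and y_B with each neighbor and refresh z_A, z_B."""


for r in range(1, R_ROUNDS + 1):
    for k in range(1, K_ITERS + 1):
        model_update(r, k)
        compute_y_A_and_lambda(r, k)
        noise_update(r, k)
        compute_y_B(r, k)
    exchange(r)                          # communication with each j in E_i, once or more per round
```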


Effects

With the above configuration, it is possible to achieve both safety (security) and model accuracy and to perform model learning.


Other Modification Examples

The present invention is not limited to the embodiment and the modification example. For example, the various kinds of processing described above may be executed not only in time series in accordance with the description but also in parallel or individually in accordance with processing abilities of the devices that execute the processing or as necessary. In addition, modifications can be made as needed within the gist of the present invention.


<Program and Recording Medium>

The various processes described above can be performed by causing a recording unit 2020 of a computer 2000 illustrated in FIG. 6 to read a program for executing each step of the method described above and causing a control unit 2010, an input unit 2030, an output unit 2040, a display unit 2050, and the like to operate.


The program in which the processing content is written may be recorded on a computer-readable recording medium. The computer-readable recording medium may be, for example, any recording medium such as a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory.


Moreover, the program is distributed by, for example, selling, transferring, or renting a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Furthermore, the program may be stored in a storage device of a server computer, and the program may be distributed by transferring the program from the server computer to another computer via a network.


For example, a computer for executing such a program first temporarily stores a program recorded on a portable recording medium or a program transferred from a server computer in a storage device of the computer. Then, when executing processing, the computer reads the program stored in the recording medium of the computer and executes the processing according to the read program. Moreover, as another mode of the program, the computer may read the program directly from a portable recording medium and execute processing according to the program, or alternatively, the computer may sequentially execute processing according to a received program every time the program is transferred from a server computer to the computer. Moreover, the above-described processing may be executed by a so-called application service provider (ASP) type service that implements a processing function only by an execution instruction and result acquisition without transferring the program from a server computer to the computer. Note that the program in the present embodiment includes information that is used for processing by an electronic computer and is equivalent to the program (data or the like that is not a direct command to the computer but has property that defines processing performed by the computer).


Moreover, although the present devices are each configured by executing a predetermined program on a computer in this mode, at least a part of the processing content may be implemented by hardware.

Claims
  • 1. A learning apparatus constituting a learning system including N learning apparatuses, the learning apparatus comprising: processing circuitry configured to: update a model variable wi by using a dual variable zA and noise Rσi including a random number R in a normal distribution and a standard deviation σi of noise, obtain a parameter λ used when learning of an update difference yA and the standard deviation of noise is performed by using the updated model variable wi and the noise Rσi, and exchange the update difference yA when communication with another learning apparatus constituting the learning system is performed; and update the standard deviation σi of noise by using a dual variable zB, a hyperparameter L, and noise Rλ including a random number R in a normal distribution and the parameter λ, obtain an update difference yB by using the updated standard deviation σi, the hyperparameter L, and the noise Rλ, and exchange the update difference yB when communication with the other learning apparatus is performed.
  • 2. The learning apparatus according to claim 1, wherein the hyperparameter L is any value of 0.02 or more and 0.03 or less.
  • 3. A learning system comprising: N learning apparatuses, wherein each learning apparatus i includes processing circuitry configured to: update a model variable wi by using a dual variable zA and noise Rσi including a random number R in a normal distribution and a standard deviation σi of noise, and obtain a parameter λ used when learning of an update difference yA and the standard deviation of noise is performed by using the updated model variable wi and the noise Rσi; and update the standard deviation σi of noise by using a dual variable zB, a hyperparameter L, and noise Rλ including a random number R in a normal distribution and the parameter λ, and obtain an update difference yB by using the updated standard deviation σi, the hyperparameter L, and the noise Rλ, and the learning apparatus i and another learning apparatus j exchange the update differences yA and yB when the learning apparatus i communicates with the other learning apparatus j.
  • 4. A learning method using N learning apparatuses, the learning method comprising: a model learning step in which processing circuitry included in a learning apparatus i updates a model variable wi by using a dual variable zA and noise Rσi including a random number R in a normal distribution and a standard deviation σi of noise, and obtains a parameter λ used when learning of an update difference yA and the standard deviation of noise is performed by using the updated model variable wi and the noise Rσi; a model parameter update difference exchange step in which the learning apparatus i and another learning apparatus j exchange the update difference yA when the learning apparatus i and the other learning apparatus j communicate with each other; a noise learning step in which the processing circuitry included in the learning apparatus i updates the standard deviation σi of noise by using a dual variable zB, a hyperparameter L, and noise Rλ including a random number R in a normal distribution and the parameter λ, and obtains an update difference yB by using the updated standard deviation σi, the hyperparameter L, and the noise Rλ; and a noise standard deviation update difference exchange step in which the learning apparatus i and the other learning apparatus j exchange the update difference yB when the learning apparatus i and the other learning apparatus j communicate with each other.
  • 5. A non-transitory computer-readable recording medium having recorded thereon a program for causing a computer to function as the learning apparatus according to claim 1.
PCT Information
Filing Document Filing Date Country Kind
PCT/JP2022/012677 3/18/2022 WO