The present disclosure relates to an information processing apparatus and an information processing method.
In recent years, technologies that execute computations such as prediction or recognition using a model constructed by machine learning are being used widely. For example, Patent Literature 1 below discloses a technology that, while continuing a real service, improves a classification model by machine learning while also checking to a certain extent the degree of influence on past data.
Patent Literature 1: JP 2014-92878A
However, in some cases it is difficult for humans to understand the appropriateness of a computational result using a model constructed by machine learning. For this reason, one has been constrained to indirectly accept the appropriateness of the computational result, such as by accepting that the data set used for advance learning is suitable, that unsuitable learning such as overfitting and underfitting has not been executed, and the like, for example. Accordingly, it is desirable to provide a mechanism capable of providing information indicating the appropriateness of a computational result by a model constructed by machine learning.
According to the present disclosure, there is provided an information processing apparatus including: a first acquisition section that acquires a first data set including a combination of input data and output data obtained by inputting the input data into a neural network; a second acquisition section that acquires one or more second data sets including an item identical to the first data set; and a generation section that generates information indicating a positioning of the first data set in a relationship with the one or more second data sets.
In addition, according to the present disclosure, there is provided an information processing apparatus including: a notification section that notifies an other apparatus of input data; and an acquisition section that acquires, from the other apparatus, information indicating a positioning of a first data set in a relationship with a second data set including an item identical to the first data set, the first data set including a combination of output data obtained by inputting the input data into a neural network.
In addition, according to the present disclosure, there is provided an information processing method including: acquiring a first data set including a combination of input data and output data obtained by inputting the input data into a neural network; acquiring one or more second data sets including an item identical to the first data set; and generating, by a processor, information indicating a positioning of the first data set in a relationship with the one or more second data sets.
In addition, according to the present disclosure, there is provided an information processing method including: notifying an other apparatus of input data; and acquiring, by a processor, from the other apparatus, information indicating a positioning of a first data set in a relationship with a second data set including an item identical to the first data set, the first data set including a combination of output data obtained by inputting the input data into a neural network.
According to the present disclosure as described above, there is provided a mechanism capable of providing information indicating the appropriateness of a computational result by a model constructed by machine learning. Note that the effects described above are not necessarily limitative. With or in the place of the above effects, there may be achieved any one of the effects described in this specification or other effects that may be grasped from this specification.
Hereinafter, (a) preferred embodiment(s) of the present disclosure will be described in detail with reference to the appended drawings. Note that, in this specification and the appended drawings, structural elements that have substantially the same function and structure are denoted with the same reference numerals, and repeated explanation of these structural elements is omitted.
Hereinafter, the description will proceed in the following order.
In the machine learning framework of the related art, the design of feature quantities, or in other words, the design of the method for calculating the values required by computations such as recognition or prediction (that is, regression) has been carried out using human-led analysis, observation, incorporation of rules of thumb, and the like. In contrast, with deep learning, the design of feature quantities is entrusted to the function of the neural network. Because of this difference, in recent years, technologies using deep learning have come to overwhelm other technologies using machine learning.
A neural network is known to be capable of approximating an arbitrary function. With deep learning, by exploiting this characteristic of universal approximation, the series of processes from the design of a process for transforming data into feature quantities up to the computational process that executes recognition, prediction, or the like from feature quantities can be executed in an end-to-end manner without making distinctions. Also, with regard to the feature quantity transformation process and the computational processes such as recognition, prediction, or the like, deep learning is freed from the constraints on expressive ability imposed by design with the range of human understanding. Because of such tremendous improvements in expressive ability, deep learning has made it possible to automatically construct feature quantities and the like which are not understandable, analyzable, or discoverable by human experience or knowledge.
On the other hand, in a neural network, although the process of obtaining a computational result from input data is learned as a mapping function, in some cases the mapping function takes a form which is not easy for humans to understand. Consequently, in some cases it is difficult to persuade humans that the computational result is correct.
For example, it is conceivable to use deep learning to assess the real estate prices of existing apartments. In this case, for example, a multidimensional vector of the “floor area”, “floor number”, “time on foot from station”, “age of building”, “most recent contract price of a different apartment in the same building”, and the like of a property becomes the input data, and the “assessed value” of the property becomes the output data. By learning a relationship (that is, a mapping function) between input and output data in a data set collected in advance, the neural network after learning becomes able to calculate output data corresponding to new input data. For example, in the example of real estate price assessment described above, the neural network is able to calculate a corresponding “assessed value” when input data having items (such as “floor area”, for example) similar to those at the time of learning is input.
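Supplementary sketch (not part of the original disclosure): a minimal Python illustration of the shape of such a learned mapping, assuming a toy two-layer fully connected network; the parameter values and the attribute vector are illustrative assumptions only, not a trained model.

import numpy as np

# Hypothetical parameters of a two-layer network (illustrative values, not learned).
rng = np.random.default_rng(0)
A0, b0 = rng.normal(size=(16, 5)), np.zeros(16)   # input layer -> hidden layer
A1, b1 = rng.normal(size=(1, 16)), np.zeros(1)    # hidden layer -> output layer

def assess(x):
    """Map a property attribute vector to an 'assessed value'."""
    h = np.maximum(0.0, A0 @ x + b0)   # linear combination + ReLU activation
    return (A1 @ h + b1).item()        # linear output for regression

# Attribute vector: floor area, floor number, time on foot from station,
# age of building, most recent contract price in the same building.
x = np.array([70.0, 3.0, 8.0, 15.0, 32_000_000.0])
print(assess(x))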
However, from the output data alone, it is difficult to accept whether or not the output data was derived via an appropriate computation. Accordingly, one has been constrained to indirectly accept the appropriateness of the computation by accepting that the data set used for advance learning is suitable, and that unsuitable learning such as overfitting and underfitting has not been executed.
Thus, one embodiment of the present disclosure provides a technology capable of providing information for directly knowing how a computational result was derived by a neural network.
Overview
In general, in recognition or prediction (that is, regression) technologies not just limited to neural networks, the function that maps a data variable x onto an objective variable y (for example, a label, a regression value, or a prediction value) is expressed by the following formula.
[Math. 1]
y=ƒ(x) (1)
However, each of x and y may also be a multidimensional vector (for example, x is D-dimensional and y is K-dimensional). Also, f is a function determined for each dimension of y. Finding the function f in a recognition or regression problem is typical in the field of machine learning, and is not limited to neural networks or deep learning.
For example, in a linear regression problem, it is hypothesized that the regression takes the form of a linear response with respect to input, like in the following formula.
[Math. 2]
y=Ax+b (2)
Herein, A and b are a matrix and a vector, respectively, which do not depend on the input x. Each component of A and b may also be treated as a parameter of the model illustrated in the above Formula (2). With a model expressed in this form, it is possible to describe the calculation result easily. For example, assuming that K=1 and y is a scalar value y1, A is expressed as a 1×D matrix. Accordingly, assuming that the value of each component of A is wj=a1j, the above Formula (2) is transformed into the following formula.
[Math. 3]
y1=w1x1+w2x2+ . . . +wDxD+b1 (3)
According to the above formula, y1 is a weighted sum of the input x. In this way, if the mapping function is a simple weighted sum, it is easy to describe the contents of the mapping function in a way that humans can understand. Although a detailed description is omitted, the same also applies to classification problems.
Next, an example of performing some kind of nonlinear transformation (however, a linear transformation may also be included in the nonlinear transformation) on the input will be considered. The vector obtained by the nonlinear transformation is called a feature vector. Hypothetically, if the nonlinear transformation is known, the function of the nonlinear transformation (hereinafter also designated the nonlinear function) is used to express the regression formula as follows.
[Math. 4]
y=Aϕ(x)+b (4)
Herein, ϕ(x) is a nonlinear function for finding a feature vector, and for example, is a function that outputs an M-dimensional vector. For example, assuming that K=1 and y is a scalar value y1, A is expressed as a 1×M matrix. Accordingly, assuming that the value of each component of A is wj=a1j, the above Formula (4) is transformed into the following formula.
[Math. 5]
y1=w1ϕ1(x)+ . . . +wMϕM(x)+b1 (5)
According to the above formula, y1 is a weighted sum of the feature quantities of each dimension. With this mapping function, since the mapping function is a weighted sum of known feature quantities, it is also considered easy to describe the contents of the mapping function in a way that humans can understand.
Herein, the model parameters A and b are obtained by learning. Specifically, the model parameters A and b are learned such that the above Formula (1) holds for a data set supplied in advance. Herein, as an example, assume that the data set is (xn, yn), where n=1, . . . , N. In this case, learning refers to finding the weights w whereby the difference between the prediction value f(xn) and the actual value yn is reduced over all supplied data sets. The difference may be the sum of squared errors illustrated in the following formula, for example.
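(The formula referenced here is not reproduced in the text. A standard sum-of-squared-errors form, consistent with Formula (10) below, would presumably be the following; the original equation number cannot be recovered and is omitted.)

L(w) = \sum_{n=1}^{N} \left| y_n - f(x_n) \right|^2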
The function L expressing the difference may be a simple sum of squared errors, a weighted sum of squared errors weighted by dimension, or a sum of squared errors including constraint terms (such as an L1 constraint or an L2 constraint, for example) expressing functional constraints on the parameters. Otherwise, the function L expressing the difference may be an arbitrary function, insofar as the function is an index that expresses how correctly the output prediction value f(xn) predicts the actual value yn, such as binary cross entropy, categorical entropy, log-likelihood, or variational lower bound. Note that besides learning, the model parameters A and b may also be human-designed.
In some cases, the model parameters A and b obtained by learning are not such round values as in the human-designed case. However, since it is easy to know how much weight (that is, contribution) is set for which dimension, this model is also considered to be within the range of human understanding.
Meanwhile, in the case of creating a model designed to be human-understandable in this way, the variations of data that can be expressed by the model become limited. For this reason, actual data cannot be expressed adequately, that is, adequate prediction is difficult in many cases.
Accordingly, to increase the variations of data that can be expressed by the model, it is desirable for even the portion of feature quantities from the nonlinear transformation designed using existing experience and knowledge to be obtained by learning. For this purpose, approximation of a feature function by a neural network is effective. A neural network is a model suited to expressing composite functions. Accordingly, expressing the portion of feature transformation and a composite function of subsequent processes with a neural network is considered. In this case, it becomes possible to also entrust to learning the portion of feature quantity transformation by a known nonlinear transformation, and inadequate expressive power due to design constraints can be avoided.
As illustrated in the following formulas, a neural network includes a composite function which is a layered joining of linear combinations of inputs and the outputs of a nonlinear function (activation) applied to the linear combination values.
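(The formulas themselves are not reproduced in the text. Under the notation introduced in the next paragraph, the layered composition would presumably take a form such as the following, with the last line corresponding to the overall map referenced below as Formula (8); the exact original numbering cannot be recovered.)

x^{(0)} = x
z^{(l+1)} = A^{(l)} x^{(l)} + b^{(l)}
x^{(l+1)} = f^{(l+1)}\bigl(z^{(l+1)}\bigr), \quad l = 0, \ldots, L-1
y = f(x) = x^{(L)}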
Herein, A(l) and the vector b(l) are the weight and the bias of the linear combination of the lth layer, respectively. Also, f(l+1) is a vector combining the nonlinear (activation) functions of the (l+1)th layer. The reason these are a matrix and vectors is that the inputs and outputs of each layer are multidimensional. Hereinafter, the parameters of the neural network are taken to refer to A(l) and b(l), where l=0, . . . , L−1.
Objective Function
In the neural network, the parameters are learned such that the above Formula (8) holds for a data set supplied in advance. Herein, as an example, assume that the data set is (xn, yn), where n=1, . . . , N. In this case, learning refers to finding the parameters whereby the difference between the prediction value f(xn) and the actual value yn is reduced over all supplied data sets. This difference is expressed by the following formula, for example.
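(The formula is not reproduced in the text. Given the description in the next paragraph, it would presumably be the sum of per-sample distances:)

L(\theta) = \sum_{n=1}^{N} l\bigl(y_n, f(x_n)\bigr)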
Herein, l(yn, f(xn)) is a function that expresses the distance between the prediction value f(xn) and the actual value yn. Also, θ is a merging of the parameters A(l) and b(l) included in f, where l=0, . . . , L−1. L(θ) is also designated the objective function. Hypothetically, in the case in which the following formula holds, the objective function L(θ) is a squared error function.
[Math. 10]
l(yn,f(xn))=|yn−f(xn)|2 (10)
The objective function may also be other than the sum of squared errors, and may be, for example, weighted squared error, normalized squared error, binary cross entropy, categorical entropy, or the like. Which kind of objective function is appropriate depends on the data format, the task (for example, whether y is discrete-valued (a classification problem) or continuous-valued (a regression problem)), and the like.
Learning
Learning is executed by minimizing the objective function L(θ). For this purpose, a method that uses a gradient related to the parameter θ may be used, for example. In one example known as the stochastic gradient method, the parameter θ that minimizes the objective function L(θ) is repeatedly searched for while the parameter is successively updated in accordance with the following formula.
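(The formula is not reproduced in the text. A standard stochastic gradient update consistent with the mini-batch description below would presumably be the following; the learning-rate coefficient η is an assumption:)

\theta \leftarrow \theta - \eta \sum_{n=N_b}^{N_{b+1}} \frac{\partial l\bigl(y_n, f(x_n)\bigr)}{\partial \theta}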
Herein, the process of summing the gradient over the range from Nb to Nb+1 is also designated a mini-batch. Methods that extend the above method, such as AdaGrad, RMSProp, and Adam, are often used for learning. With these methods, the variance of the gradient is used to eliminate scale dependence, or the second moment, moving average, or the like of the gradient ∂L/∂θ is used, to thereby avoid excessive response and the like.
In a neural network, by utilizing the chain rule for composite functions, the gradient of the objective function is obtained as a recurrence relation like the following formula.
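(The formula is not reproduced in the text. A standard backpropagation recurrence consistent with the notation of Formulas (18) and (19) below would presumably be the following, where ⊙ denotes the element-wise product:)

\delta^{(l)} = f^{(l)\prime}\bigl(z^{(l)}\bigr) \odot \Bigl( A^{(l)\top} \delta^{(l+1)} \Bigr)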
The processing of obtaining a gradient with a recurrence relation is also called backpropagation. Since gradients are obtained easily by the recurrence relation, these gradients may be used in Formula (11) or a gradient method derived therefrom to successively estimate the parameter.
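Supplementary sketch (not part of the original disclosure): a minimal numpy illustration of obtaining gradients by such a recurrence, assuming the toy two-layer ReLU network from the earlier sketch and a squared-error loss; all names and shapes are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
A0, b0 = rng.normal(size=(16, 5)), np.zeros(16)
A1, b1 = rng.normal(size=(1, 16)), np.zeros(1)

def forward(x):
    z1 = A0 @ x + b0
    h = np.maximum(0.0, z1)              # ReLU activation
    y = A1 @ h + b1                      # linear output
    return z1, h, y

def gradients(x, y_true):
    z1, h, y = forward(x)
    delta2 = y - y_true                  # output-layer error for l = |y_true - f(x)|^2 / 2
    dA1, db1 = np.outer(delta2, h), delta2
    delta1 = (A1.T @ delta2) * (z1 > 0)  # recurrence: weight transpose times error, times ReLU derivative
    dA0, db0 = np.outer(delta1, x), delta1
    return dA0, db0, dA1, db1

# A stochastic gradient step would then update, e.g., A0 -= eta * dA0.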
Configuration of Each Layer of Neural Network
Each layer of the neural network may take various configurations. The principal configuration of each layer of the neural network is determined by the configuration of the linear combination and the type of nonlinear function. Configurations of the linear combination chiefly include fully connected networks, convolutional networks, and the like. On the other hand, with regard to nonlinear activation functions, a succession of new functions are being proposed. One such example is the sigmoid illustrated in the following formula.
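(The formula is not reproduced in the text; the sigmoid, presumably Formula (13) referenced below, which maps (−∞, ∞) to [0, 1], is the standard form:)

\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}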
The above Formula (13) is a nonlinear mapping that transforms an input over (−∞, ∞) to an output over [0, 1]. A rescaled variant of the sigmoid is the tanh illustrated in the following formula.
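(The formula is not reproduced in the text; the tanh referenced here, presumably Formula (14), is the standard form:)

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}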
Also, the rectified linear unit (ReLU) illustrated in the following formula is an activation function that has come into wide use recently.
[Math. 15]
relu(x)=max(0,x) (15)
Additionally, Softmax, Softplus, and the like, illustrated in the following formulas, are also well-known.
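(The formulas are not reproduced in the text; the standard forms, presumably corresponding to Formula (16), are:)

\mathrm{softmax}(x)_k = \frac{e^{x_k}}{\sum_{j} e^{x_j}}, \qquad \mathrm{softplus}(x) = \log\bigl(1 + e^{x}\bigr)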
In this way, in a neural network with increased expressive ability through the use of a composite function, the mapping function is learned by using a data set that includes a large amount of data. For this reason, the neural network is able to achieve highly accurate recognition or prediction, without designing feature quantities by human work as in the past.
The processing apparatus 100 and the terminal apparatus 200 are connected by a network 300. The network 300 is a wired or wireless transmission path for information transmitted from apparatuses connected to the network 300. The network 300 may include, for example, a cellular network, a wired local area network (LAN), a wireless LAN, or the like.
The processing apparatus 100 is an information processing apparatus that executes various processes. The terminal apparatus 200 is an information processing apparatus that functions as an interface with a user. Typically, the system 1 interacts with the user by the cooperative action of the processing apparatus 100 and the terminal apparatus 200.
Next, exemplary configurations of each apparatus will be described with reference to the drawings.
The communication section 110 includes a function of transmitting and receiving information. For example, the communication section 110 receives information from the terminal apparatus 200, and transmits information to the terminal apparatus 200.
The storage section 120 temporarily or permanently stores programs and various data for the operation of the processing apparatus 100.
The control section 130 provides various functions of the processing apparatus 100. The control section 130 includes a first acquisition section 131, a second acquisition section 132, a computation section 133, a generation section 134, and a notification section 135. Note that the control section 130 may additionally include other components besides the above components. In other words, the control section 130 may also execute operations besides the operations of the above components.
The operation of each component will be described briefly. The first acquisition section 131 and the second acquisition section 132 acquire information. The computation section 133 executes various computations and learning of the neural network described later. The generation section 134 generates information indicating a result of computations or learning by the computation section 133. The notification section 135 notifies the terminal apparatus 200 of information generated by the generation section 134. Other detailed operations will be described in detail later.
The input section 210 includes a function of receiving the input of information. For example, the input section 210 receives the input of information from a user. For example, the input section 210 may receive text input by a keyboard, touch panel, or the like, may receive voice input, or may receive gesture input. Otherwise, the input section 210 may receive data input from a storage medium such as flash memory.
The output section 220 includes a function of outputting information. For example, the output section 220 outputs information through images, sound, vibration, light emission, or the like.
The communication section 230 includes a function of transmitting and receiving information. For example, the communication section 230 receives information from the processing apparatus 100, and transmits information to the processing apparatus 100.
The storage section 240 temporarily or permanently stores programs and various data for the operation of the terminal apparatus 200.
The control section 250 provides various functions of the terminal apparatus 200. The control section 250 includes a notification section 251 and an acquisition section 253. Note that the control section 250 may additionally include other components besides the above components. In other words, the control section 250 may also execute operations besides the operations of the above components.
The operation of each component will be described briefly. The notification section 251 notifies the processing apparatus 100 of information indicating user input which is input into the input section 210. The acquisition section 253 acquires information indicating a result of computation by the processing apparatus 100, and causes the information to be output by the output section 220. Other detailed operations will be described in detail later.
Next, technical features of the system 1 according to the present embodiment will be described.
The system 1 (for example, the computation section 133) executes various computations related to the neural network. For example, the computation section 133 executes the learning of the parameters of the neural network, as well as computations that accept input data as input into the neural network and output output data. The input data includes each value of one or more input items (hereinafter also designated input values), and each input value is input into a corresponding unit of the input layer of the neural network. Similarly, the output data includes each value of one or more output items (hereinafter also designated output values), and each output value is output from a corresponding unit of the output layer of the neural network.
The system 1 according to the present embodiment provides a visualization of information indicating the response relationship of the inputs and outputs of the neural network. Through the visualization of information indicating the response relationship of the inputs and outputs, the basis of a computational result of the neural network is illustrated. Thus, the user becomes able to check against one's own prior knowledge, and intuitively accept the appropriateness of the computational result. For this purpose, the system 1 executes the following two types of neural network computations, and provides the computational results.
As a first method, the system 1 adds a disturbance to the input values, and provides a visualization of information indicating how the output value changed as a result. Specifically, the system 1 provides information indicating the positioning inside the distribution of output values in the case of locking some of the input values in the multidimensional input data, and varying (that is, adding a disturbance to) the other input values. With this arrangement, the user becomes able to grasp the positioning of the output values.
As a second method, the system 1 backpropagates the error of the output values to provide a visualization of information indicating the distribution of input values. Specifically, the system 1 provides information indicating the degree of contribution to the output values by the input values. With this arrangement, the user becomes able to grasp the positioning of the input values.
Otherwise, the system 1 provides a visualization of a calculating formula or statistical processing result (for example, a flag or the like) indicating the response relationship of the system 1. With this arrangement, the user becomes able to understand the appropriateness of the computational result in greater detail.
Regarding the first method, the system 1 visualizes the response of an output value y in the case of adding a disturbance r to an input value x in Formula (1). The response is expressed like the following formula.
[Math. 17]
ƒk(r)=ƒ(x+rek) (17)
Herein, r is the value of the disturbance on a k-axis. Also, ek is the unit vector of the k-axis.
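Supplementary sketch (not part of the original disclosure): a Python illustration of Formula (17), assuming f is a trained scalar-output model (for example, the assess function from the earlier sketch).

import numpy as np

def response_curve(f, x, k, radius, num=50):
    """Evaluate f_k(r) = f(x + r * e_k): lock every input except dimension k,
    add a disturbance r along the k-axis, and record the output."""
    e_k = np.zeros_like(x)
    e_k[k] = 1.0
    rs = np.linspace(-radius, radius, num)
    return rs, np.array([f(x + r * e_k) for r in rs])

# Example: sweep the "age of building" input (here assumed to be dimension 3):
# rs, ys = response_curve(assess, x, k=3, radius=10.0)
# Plotting ys against x[3] + rs yields the kind of model output curve described below.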
Regarding the second method, the system 1 visualizes the contribution of the inverse response of an input value x in the case of adding a disturbance δ to an output value y in Formula (11) or Formula (12). In the case of considering the differential from the output side, the system 1 computes the error to each layer by backpropagation using the following formula.
[Math. 18]
δ(L)=l(yn+δ,ƒ(xn)) (18)
Herein, by taking the following formula in the above Formula (12), the system 1 is also able to compute the backpropagation error to an input value itself.
[Math. 19]
ƒj(0)′(z)=1 (19)
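Supplementary sketch (not part of the original disclosure): backpropagating an output-side disturbance down to the input itself, reusing the toy two-layer network of the earlier sketches and setting the input-layer activation derivative to 1 as in Formula (19).

import numpy as np

def input_error(delta_out, A0, A1, z1):
    """Propagate an output-side error delta_out back to the input values."""
    delta1 = (A1.T @ delta_out) * (z1 > 0)  # error at the hidden layer (ReLU derivative)
    delta0 = A0.T @ delta1                  # with f^(0)'(z) = 1, the error at the input itself
    return delta0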
Note that this visualization of response is not limited to the inputs into a neural network and the outputs from a neural network. For example, the response may also be visualized between the input into a unit in an intermediate layer close to the input side and the output from a unit in an intermediate layer close to the output side. Generally, on the side close to the output, feature quantities which are close to human-developed concepts, such as categories, are calculated. Also, on the side close to the input, feature quantities related to concepts closer to the data are calculated. The response between the two may also be visualized.
The above describes an overview of the visualization of response of a neural network. Hereinafter, an example of a UI related to the first method and the second method will be described specifically. As an example, a neural network is imagined in which real estate attribute information, such as the time on foot from the nearest station, the land area, and the age of the building, is treated as the input data, and a real estate price is output.
First Data Set
The system 1 (for example, the first acquisition section 131) acquires a first data set including combinations of input data and output data obtained by inputting the input data into the neural network. For example, the real estate attribute information corresponds to the input data, and the real estate price corresponds to the output data.
Second Data Set
The system 1 (for example, the second acquisition section 132) acquires one or more second data sets that include items with the same values as the first data set. The second data set includes items with the same values as the first data set, and items corresponding to the output data. For example, the second data set includes combinations of real estate attribute information and real estate prices, similarly to the first data set. Additionally, for example, a part of the attribute information is identical to the first data set.
In particular, in the first method, the second data set may include, among the items included in the input data, a first item whose values are different from the first data set, and another second item whose values are identical to the first data set. For example, the first item corresponds to an input item which is a target of the response visualization, while the second item corresponds to an input item which is not a target of the response visualization. For example, in the case in which the target of the response visualization is the age of the building, a second data set is acquired in which, among the input items, only the age of the building is different, while the others are identical. In other words, the second data set corresponds to a data set for the case of adding a disturbance to the value of the input item which is the target of the response visualization, expressed by the above Formula (17). With this arrangement, the second data set becomes an appropriate data set for providing a response relationship to the user.
The second data set includes a data set generated by a neural network taught on the basis of a data set actually observed in the past. Specifically, the system 1 (for example, the computation section 133) uses the above Formula (17) to compute output data for the case of adding a disturbance to the input item which is the target of visualization on the input unit side, and generates the second data set. For example, in some cases, it may be difficult to acquire a sufficient amount of real data satisfying the conditions described above, namely, having the values of the input item which is the target of the response visualization be different from the first data set while the values of the other input items are identical to the first data set. In such cases, generating a data set by a neural network makes it possible to compensate for an insufficient amount of data.
The second data set may also include a data set actually observed in the past. For example, the system 1 acquires the second data set from a database (for example, the storage section 120) that accumulates real data of the attribute information and value of real estate bought and sold in the past. In this case, the second data set may be learning data used in the learning of the neural network.
Generation of Information Indicating Positioning
The system 1 (for example, the generation section 134) generates information indicating the positioning of the first data set in the relationship with the second data set. The generated information indicating the positioning is provided to the user by the terminal apparatus 200, for example. With this arrangement, it becomes possible to provide the basis of a computational result of the neural network to the user.
Basic UI Example
In the first method, the information indicating the positioning illustrates the relationship between the values of the input item (corresponding to the first item) which is the target of the response visualization and the value of the item corresponding to the output data in the first data set and the second data set. With this arrangement, it becomes possible to clearly provide the response relationship, such as how the age of the building influences the real estate price, for example, to the user. Hereinafter, a UI example of this case will be described.
Note that the number of points of actual data 432 to be plotted may be a certain number, or only data close to the model output curve 431 may be plotted. Herein, “close” refers to the distance (for example, the Euclidean distance) in the input space being close, and in the case of treating one of the intermediate layers as the input, refers to the distance in the intermediate layer being close. In the case in which the distance (for example, the squared distance) between the plot of the actual data 432 and the model output curve 431 is close, the user becomes able to accept the computational result by the neural network. Also, it is desirable for the plotted actual data 432 to include both data whose value is greater than and data whose value is less than the input value of at least the provisional data 433. Also, in the case in which the plot of the actual data 432 and the contents (for example, the entirety of the data set) of the actual data 432 are linked, the contents of the actual data 432 may be provided when the plot of the actual data 432 is selected.
Applied UI Examples
The system 1 (for example, the generation section) may also generate information indicating the response relationship between an input value into a unit of an input layer and an output value from a unit of an output layer, the input layer and the output layer being selected from intermediate layers of the neural network. This refers to executing the visualization of the response relationship between an input layer and an output layer executed in the above UI example in the intermediate layers. With this arrangement, the user becomes able to understand the response relationship in the intermediate layers, and as a result, becomes able to understand in greater detail the appropriateness of a computational result of the neural network. Also, for example, in the case in which an abstract concept is expected to be expressed in a specific intermediate layer, through a visualization of the response relationship in the intermediate layers, the user becomes able to know the concept.
Graph
The information indicating the response relationship may also be a graph. A UI example of this case will be described with reference to the drawings.
The UI block 444 is a UI for selecting the unit (that is, the dimension) to treat as the target of the visualization of the response relationship from among the intermediate layer 442 and the intermediate layer 443 selected in the UI block 441. As illustrated in the drawings, a unit 445 of the intermediate layer 442 and a unit 446 of the intermediate layer 443 are selected.
The UI block 447 is a UI that visualizes the response relationship between the input unit and the output unit selected in the UI block 444 with a graph. The X axis of the graph corresponds to the input value into the unit 445, and the Y axis corresponds to the output value from the unit 446. The curve included in the graph of the UI block 447 is generated by a technique similar to the model output curve 431 described above.
Note that the UI blocks 441, 444, and 447 may be provided on the same screen, or may be provided through successive screen transitions. The same applies to other UIs that include multiple UI blocks.
Function
The information indicating the response relationship may also be a function. A UI example of this case will be described with reference to the drawings.
In the neural network, the response relationship of input and output is expressed by the above Formula (8). Formula (8) is a composite function of the linear combination by the model parameters A and b obtained by learning, and the nonlinear activation function f. However, this function is written with an enormous number of parameters, and it is normally difficult to understand its meaning. Accordingly, the system 1 (for example, the generation section 134) generates a function with a reduced number of variables by treating the selected units 445A and 445B as variables, and locking the input values into the input units other than the units 445A and 445B. Note that the locked input values into the input units other than the selected units 445A and 445B refer, for example, to the values input into each of those units as a result of computation in the case of locking the inputs into the neural network. Subsequently, the generated function is provided in the UI block 451. In this way, the system 1 provides a projection function that treats the selected input units as variables in an easy-to-understand format with a reduced number of variables. With this arrangement, the user becomes able to understand a partial response of the neural network easily.
In the case in which the function is difficult for the user to understand, an approximate function may also be provided, as illustrated in the drawings.
Note that in the UI block 451 or 461, the function may be provided in text format. Otherwise, the function may be provided as a statement in an arbitrary programming language. For example, the function may be stated in a programming language selected by the user from among a list of programming languages. In addition, the function provided in the UI block 451 or 461 may also be available to copy and paste into another arbitrary application.
Neural Network Design Assistance
The system 1 (for example, the generation section) may also generate and provide a UI for neural network design assistance.
For example, the system 1 may generate information for proposing the removal of units in an input layer which do not contribute to an output value from a unit in an output layer. For example, in the UI 460, in the case in which the coefficients of the input/output response of a function provided in the UI block 461 are all 0 and the bias is also 0, the unit 446 does not contribute to the network downstream, and thus may be removed. In this case, the system 1 may generate information for proposing the removal of the output unit. An example of the UI is illustrated in the drawings.
For example, the system 1 may generate information related to a unit that reacts to specific input data. For example, the system 1 may detect a unit that reacts to the input data of a certain specific category (or class), and provide a message such as “This unit reacts to data in this category”, for example, to the user. Furthermore, the system 1 may recursively follow the input units that contribute to the unit, and extract a sub-network that generates the unit. Additionally, the system 1 may also reuse the sub-network as a detector for detecting input data of the specific category.
For example, suppose that a certain unit is a unit that reacts to “dog”. In this case, it is also possible to treat the unit as a “dog detector”. Furthermore, by classifying concepts subordinate to the concept by which “dog” is detected, the system 1 may provide a classification tree. A UI for associating a name such as “dog” or “dog detector” to the unit may also be provided. Also, besides a UI corresponding to the responsiveness of a unit, a UI corresponding to the distribution characteristics of a unit may be provided.
First Data Set
Similarly to the first method, the system 1 (for example, the first acquisition section 131) acquires a first data set including combinations of input data and output data obtained by inputting the input data into the neural network.
Second Data Set
Similarly to the first method, the system 1 (for example, the second acquisition section 132) acquires one or more second data sets that include items with the same values as the first data set. Similarly to the first method, the second data set includes items with the same values as the first data set, and items corresponding to the output data.
In particular, in the second method, the second data set may include, among the items included in the input data, a first item whose values are different from the first data set, and another second item as well as an item corresponding to the output data whose values are identical to the first data set. For example, the first item corresponds to an input item which is a target of the response visualization, while the second item corresponds to an input item which is not a target of the response visualization. For example, in the case in which the target of the response visualization is the age of the building, a second data set is acquired in which, among the input items, only the age of the building is different, while the other input items and the real estate price are identical. In other words, the second data set corresponds to the backpropagation error of the input item which is the target of the response visualization, expressed by the above Formula (18). With this arrangement, the second data set becomes an appropriate data set for providing a response relationship to the user.
The second data set includes a data set generated by backpropagation in a neural network taught on the basis of a data set actually observed in the past. Specifically, the system 1 (for example, the computation section 133) uses the above Formula (18) to compute the response of the input unit which is the target of visualization in the case of adding an error on the output unit side. Herein, the input and output units may be units of the input layer and the output layer of the neural network, or units of selected intermediate layers. The error on the output unit side is repeatedly calculated using a random number (for example, a Gaussian random number) that takes a value in the first data set as its center value. It is assumed that the standard deviation of the random number can be set by the user. Meanwhile, on the input unit side, a distribution corresponding to the random number on the output unit side is obtained.
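Supplementary sketch (not part of the original disclosure): the sampling procedure described above, reusing forward and input_error from the earlier sketches; the network, the data point x, and the standard deviation are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                       # standard deviation of the output-side disturbance (user-settable)
z1, h, y = forward(x)             # forward pass through the toy network
samples = []
for _ in range(1000):
    delta_out = rng.normal(0.0, sigma, size=y.shape)   # Gaussian error centered on the data-set value
    samples.append(input_error(delta_out, A0, A1, z1))
samples = np.asarray(samples)     # distribution produced on the input unit side
# The position (or deviation value) of the first data set inside this
# distribution is what the UI described below visualizes.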
Generation of Information Indicating Positioning
The system 1 (for example, the generation section 134) generates information indicating the positioning of the first data set in the relationship with the second data set.
In the second method, the information indicating the positioning may also be information indicating the position of the input data in the distribution of the second data set for which the values of items corresponding to the output data are identical to the output data. In this case, the system 1 provides information indicating the position of the input data in the first data set within the distribution of input values produced on the input unit side in response to an error added on the output unit side, for example. A UI example of this case will be described with reference to the drawings.
Also, in the second method, the information indicating the positioning may also be information indicating the deviation value of the input data in the distribution of the second data set for which the values of items corresponding to the output data are identical to the output data. In this case, the system 1 provides information indicating the deviation value of the input data in the first data set within the distribution of input values produced on the input unit side in response to an error added on the output unit side, for example. A UI example of this case will be described with reference to the drawings.
Note that the deviation value may also be calculated by a method other than the method using a random number described above. For example, the deviation value of the first data set in the distribution of accumulated actual data may be calculated. With this arrangement, the user becomes able to know the absolute positioning of the first data set in the actual data as a whole. Otherwise, in the case in which the input values are locked to some degree, the method using a random number described above, by which the positioning of the target input value among input values that obtain a similar output value can be known, is effective.
In addition, although the foregoing describes a case in which the input values are continuous quantities, the present technology is not limited to such an example. For example, the input values may also be discrete quantities. In the case of discrete quantities, the calculation of deviation values may become difficult, but an occurrence distribution of the discrete quantities may be calculated, and the occurrence probability may be used instead of deviation values.
Also, besides deviation values, the distribution width of input values, for example, may also be provided in the UI 530. Even if the output value is the same, an input value with a large distribution width is only weakly correlated with the output value, and so its degree of contribution to the output value is understood to be low (for example, degree of contribution = 1/distribution width).
Also, likewise in the second method, a UI for neural network design assistance described in relation to the first method may also be provided.
Next, application examples of the present technology will be described.
The present technology is usable in any application for a classification problem that determines a category to which input data belongs, or for a regression problem that predicts output data obtained by input data. In the following, as one example, an example of an application for a regression problem will be described.
Also, the present technology is able to handle a variety of input data and output data. For example, the input data is attribute information about an evaluation target, and the output data is an evaluation value of the evaluation target. To cite the example above, the evaluation target may be real estate, and the evaluation value may be the price of the real estate.
For example, an example in which a user assesses real estate prices on an Internet site is anticipated. In this case, the terminal apparatus 200 corresponds to a user terminal that the user operates, and the processing apparatus 100 may exist on the backend of the Internet site.
Suppose that the user desires to sell or purchase “Tokiwa-so Apt. 109”. The user inputs attribute information about the real estate into the Internet site, and is provided with an estimated result of the assessed price. An example of the UI in this case is illustrated in the drawings.
As described above, the system 1 is capable of providing the basis of a computational result of the neural network. For example, if a button 612 is selected, the system 1 provides the basis of assessment, that is, information indicating the basis of the price indicated in the system-estimated price or the system-estimated price range. An example of the UI is illustrated in the drawings.
First, the user is able to confirm that the provisional data 623 is on the model output curve 621. In addition, the user is also able to confirm the contents (for example, other attribute information besides the number of square meters) of the actual data 622 linked to the plot of the actual data 622. The user is also able to switch the X axis from the number of square meters to another input item. For example, if the user selects an X axis label 624, a candidate list of input items, such as the age of the building, is displayed, and the input item corresponding to the X axis may be switched in response to a selection from the list. From this information, the user is able to confirm that the assessed price is appropriate.
In this way, by applying the present technology to a real estate price assessment application, the user is able to confirm the basis of the assessed price. In the related art, even if a real estate agent provides such a basis, the appropriateness of the basis is unclear, and for example, there is also the possibility that a basis of weak reliability including the agent's intentions may be provided. In contrast, according to the present technology, since the basis by which the calculator has calculated automatically is provided, a basis of higher reliability is provided.
The evaluation target in this application is a work of art. The evaluation value may be a score of the work of art. In the following, a piano performance is assumed as one example of a work of art.
For example, in a musical competition, a score is assigned by a judge. Specifically, the judge evaluates the performance on multiple axes. Conceivable evaluation axes are, for example, the selection of music, the expression of the music, the assertiveness of the performer, the technical skill, the sense of tempo, the sense of rhythm, the overall balance of the music, the pedaling, the touch, the tone, and the like. Evaluation on each of these evaluation axes may be executed on the basis of the judge's experience. Subsequently, the judge calculates a score by weighting each of the evaluation results. The weighting may also be executed on the basis of the judge's experience.
The system 1 may also execute evaluation conforming to such an evaluation method by a judge. A computational process for this case will be described with reference to the drawings.
Each of the feature quantity computation function 141, the individual item evaluation function 142, and the comprehensive evaluation function 143 may be configured by a neural network, or may be an approximate function of a neural network. For example, in the case of applying the present technology in the comprehensive evaluation function 143, the system 1 becomes able to provide the basis of the appropriateness of the evaluation by providing the response in the case of varying the performance data, the contribution of an individual item evaluation value when the comprehensive evaluation is modified, and the like.
The above gives a piano performance as one example of a work of art, but the present technology is not limited to such an example. For example, the work of art to be evaluated may also be a song, a painting, a literary work, or the like.
The evaluation target in this application is a sport. The evaluation value may be a score in the sport. Herein, the score in the sport does not refer to something that is easy to quantify, like the score in soccer or the time in a track and field event, but instead refers to something that is difficult to quantify, like the points of a figure skating performance. In the following, figure skating is assumed as one example of a sport.
For example, a figure skating performance is assigned a score by a judge. Specifically, the judge evaluates the performance on multiple axes. The evaluation axes are, for example, the skating skill, how the elements are connected, the motion, the carriage of the body, the choreography, the composition, the interpretation of the music, and the like. Evaluation on each of these evaluation axes may be executed on the basis of the judge's experience. Subsequently, the judge calculates a score by weighting each of the evaluation results. The weighting may also be executed on the basis of the judge's experience. In figure skating, to avoid an arbitrary evaluation by a judge, the evaluations of judges with high or low scores are excluded to make a comprehensive evaluation. In the future, this evaluation could also be executed by machines.
Accordingly, the system 1 may evaluate a sport by a process similar to the process of evaluating a work of art described above with reference to the drawings.
Finally, a hardware configuration of an information processing apparatus according to the present embodiment will be described with reference to the drawings.
As illustrated in the drawings, the information processing apparatus 900 includes a CPU 901, a ROM 902, a RAM 903, a host bus 904a, a bridge 904, an external bus 904b, an input device 906, an output device 907, a storage device 908, a drive 909, a connection port 911, and a communication device 913.
The CPU 901 functions as an arithmetic processing device and a control device and controls the overall operation in the information processing apparatus 900 according to various programs. Further, the CPU 901 may be a microprocessor. The ROM 902 stores programs, operation parameters, and the like used by the CPU 901. The RAM 903 temporarily stores programs used in execution of the CPU 901, parameters appropriately changed in the execution, and the like. The CPU 901 can form the control section 130 described above, for example.
The CPU 901, the ROM 902, and the RAM 903 are connected by the host bus 904a including a CPU bus and the like. The host bus 904a is connected with the external bus 904b such as a peripheral component interconnect/interface (PCI) bus via the bridge 904. Further, the host bus 904a, the bridge 904, and the external bus 904b are not necessarily separately configured, and such functions may be implemented on a single bus.
The input device 906 is realized by a device through which a user inputs information, such as a mouse, a keyboard, a touch panel, a button, a microphone, a switch, and a lever. In addition, the input device 906 may be a remote control device using infrared rays or other radio waves, or external connection equipment such as a cellular phone or a PDA corresponding to operation of the information processing apparatus 900, for example. Furthermore, the input device 906 may include an input control circuit or the like which generates an input signal on the basis of information input by the user using the aforementioned input means and outputs the input signal to the CPU 901, for example. The user of the information processing apparatus 900 may input various types of data or instruct a processing operation for the information processing apparatus 900 by operating the input device 906.
In addition to the above, the input device 906 can be formed by a device that detects information related to the user. For example, the input device 906 can include various sensors such as an image sensor (a camera, for example), a depth sensor (a stereo camera, for example), an acceleration sensor, a gyro sensor, a geomagnetic sensor, an optical sensor, a sound sensor, a distance measurement sensor, and a force sensor. Also, the input device 906 may acquire information related to the state of the information processing apparatus 900 itself, such as the posture and the moving velocity of the information processing apparatus 900, and information related to the surrounding environment of the information processing apparatus 900, such as brightness or noise around the information processing apparatus 900. Also, the input device 906 may include a GNSS module that receives a GNSS signal (a GPS signal from a global positioning system (GPS) satellite, for example) from a global navigation satellite system (GNSS) satellite and measures position information including the latitude, the longitude, and the altitude of the device. In addition, the input device 906 may detect the position through Wi-Fi (registered trademark), transmission and reception to and from a mobile phone, a PHS, a smartphone, or the like, near-field communication, or the like, in relation to the position information. The input device 906 can form the input section 210 described above, for example.
The output device 907 is formed by a device that can visually or aurally notify the user of acquired information. Examples of such devices include display devices such as a CRT display device, a liquid crystal display device, a plasma display device, an EL display device, a laser projector, an LED projector, and a lamp; sound output devices such as a speaker and a headphone; a printer device; and the like. The output device 907 outputs results acquired through various processes performed by the information processing apparatus 900, for example. Specifically, the display device visually displays results acquired through various processes performed by the information processing apparatus 900 in various forms such as text, images, tables, and graphs. On the other hand, the sound output device converts audio signals including reproduced sound data, audio data, and the like into analog signals and aurally outputs the analog signals. The aforementioned display device or sound output device can form the output section 220 described above, for example.
The storage device 908 is a device for data storage, formed as an example of a storage section of the information processing apparatus 900. For example, the storage device 908 is realized by a magnetic storage device such as an HDD, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like. The storage device 908 may include a storage medium, a recording device for recording data on the storage medium, a reading device for reading data from the storage medium, a deletion device for deleting data recorded on the storage medium, and the like. The storage device 908 stores programs executed by the CPU 901, various types of data, various types of data acquired from the outside, and the like. The storage device 908 can form the storage section 120 described above, for example.
The drive 909 is a reader/writer for storage media and is included in or externally attached to the information processing apparatus 900. The drive 909 reads information recorded on a removable storage medium such as a magnetic disc, an optical disc, a magneto-optical disc or a semiconductor memory mounted thereon and outputs the information to the RAM 903. In addition, the drive 909 can write information on the removable storage medium.
The connection port 911 is an interface connected with external equipment and is a connector to the external equipment through which data may be transmitted through a universal serial bus (USB) and the like, for example.
The communication device 913 is a communication interface formed by a communication device for connection to a network 920 or the like, for example. The communication device 913 is, for example, a communication card or the like for a wired or wireless local area network (LAN), long term evolution (LTE), Bluetooth (registered trademark), or wireless USB (WUSB). In addition, the communication device 913 may be a router for optical communication, a router for asymmetric digital subscriber line (ADSL), various communication modems, or the like. For example, the communication device 913 may transmit/receive signals and the like to/from the Internet and other communication apparatuses according to a predetermined protocol, for example, TCP/IP or the like. The communication device 913 can form the communication section 110 described above, for example.
Further, the network 920 is a wired or wireless transmission path of information transmitted from devices connected to the network 920. For example, the network 920 may include a public circuit network such as the Internet, a telephone circuit network or a satellite communication network, various local area networks (LANs) including Ethernet (registered trademark), a wide area network (WAN) and the like. In addition, the network 920 may include a dedicated circuit network such as an internet protocol-virtual private network (IP-VPN).
The above shows an example of a hardware configuration capable of realizing the functions of the information processing apparatus 900 according to this embodiment. The respective components may be implemented using general-purpose members, or may be implemented by hardware specific to the functions of the respective components. Accordingly, the hardware configuration to be used can be changed as appropriate according to the technical level at the time of carrying out the embodiments.
In addition, a computer program for realizing each of the functions of the information processing apparatus 900 according to the present embodiment as described above may be created and implemented on a PC or the like. Furthermore, a computer-readable recording medium on which such a computer program is stored may be provided. The recording medium is, for example, a magnetic disc, an optical disc, a magneto-optical disc, a flash memory, or the like. Further, the computer program may be delivered through a network, for example, without using the recording medium.
The above describes an embodiment of the present disclosure in detail with reference to the drawings.
The preferred embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the above examples. A person skilled in the art may find various alterations and modifications within the scope of the appended claims, and it should be understood that they will naturally come under the technical scope of the present disclosure.
For example, in the foregoing embodiments, the processing apparatus 100 and the terminal apparatus 200 are described as separate apparatuses, but the present technology is not limited to such an example. For example, the processing apparatus 100 and the terminal apparatus 200 may also be realized as a single apparatus.
Note that the processing described in this specification with reference to the flowchart and the sequence diagram need not be executed in the order shown. Some processing steps may be performed in parallel. Further, additional processing steps may be adopted, and some processing steps may be omitted.
Further, the effects described in this specification are merely illustrative or exemplified effects, and are not limitative. That is, with or in the place of the above effects, the technology according to the present disclosure may achieve other effects that are clear to those skilled in the art from the description of this specification.
Additionally, the present technology may also be configured as below; illustrative code sketches corresponding to configurations (1) to (13) follow the list.
(1)
An information processing apparatus including:
a first acquisition section that acquires a first data set including a combination of input data and output data obtained by inputting the input data into a neural network;
a second acquisition section that acquires one or more second data sets including an item identical to the first data set; and
a generation section that generates information indicating a positioning of the first data set in a relationship with the one or more second data sets.
(2)
The information processing apparatus according to (1), in which
the one or more second data sets include an item of identical value to the first data set.
(3)
The information processing apparatus according to (2), in which
in the one or more second data sets, among items included in the input data, a value of a first item is different from the first data set, while a value of a second item other than the first item is identical to the first data set.
(4)
The information processing apparatus according to (3), in which
the information indicating the positioning includes information indicating a relationship between the value of the first item and a value of an item corresponding to the output data in the first data set and the one or more second data sets.
(5)
The information processing apparatus according to (2), in which
the one or more second data sets include a data set generated by the neural network trained on a basis of a data set actually observed in the past.
(6)
The information processing apparatus according to (2), in which
the one or more second data sets include a data set actually observed in the past.
(7)
The information processing apparatus according to (2), in which
in the one or more second data sets, among items included in the input data, a value of a first item is different from the first data set, while a value of a second item other than the first item and a value of an item corresponding to the output data are identical to the first data set.
(8)
The information processing apparatus according to (7), in which
the information indicating the positioning is information indicating a position of the input data in a distribution of the one or more second data sets for which a value of an item corresponding to the output data is identical to the output data.
(9)
The information processing apparatus according to (7), in which
the information indicating the positioning is information indicating a deviation value of the input data in a distribution of the one or more second data sets for which a value of an item corresponding to the output data is identical to the output data.
(10)
The information processing apparatus according to (1), in which
the generation section generates information indicating a response relationship between an input value into a unit of an input layer and an output value from a unit of an output layer, the input layer and the output layer being selected from intermediate layers of the neural network.
(11)
The information processing apparatus according to (10), in which
the information indicating the response relationship is a graph.
(12)
The information processing apparatus according to (10), in which
the information indicating the response relationship is a function.
(13)
The information processing apparatus according to (10), in which
the generation section generates information for proposing a removal of a unit in the input layer that does not contribute to an output value from a unit in the output layer.
(14)
The information processing apparatus according to (1), in which
the input data is attribute information about an evaluation target, and
the output data is an evaluation value of the evaluation target.
(15)
The information processing apparatus according to (14), in which
the evaluation target is real estate, and
the evaluation value is a price.
(16)
The information processing apparatus according to (14), in which
the evaluation target is a work of art.
(17)
The information processing apparatus according to (14), in which
the evaluation target is a sport.
(18)
An information processing apparatus including:
a notification section that notifies an other apparatus of input data; and
an acquisition section that acquires, from the other apparatus, information indicating a positioning of a first data set in a relationship with a second data set including an item identical to the first data set, the first data set including a combination of output data obtained by inputting the input data into a neural network.
(19)
An information processing method including:
acquiring a first data set including a combination of input data and output data obtained by inputting the input data into a neural network;
acquiring one or more second data sets including an item identical to the first data set; and
generating, by a processor, information indicating a positioning of the first data set in a relationship with the one or more second data sets.
(20)
An information processing method including:
notifying an other apparatus of input data; and
acquiring, by a processor, from the other apparatus, information indicating a positioning of a first data set in a relationship with a second data set including an item identical to the first data set, the first data set including a combination of output data obtained by inputting the input data into a neural network.
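For reference, the following is a minimal, illustrative sketch of configurations (1), (8), (9), and (15) in Python. All names, values, and tolerances here are assumptions made for illustration only: the first data set combines real-estate attribute items (input data) with a price (output data obtained from the neural network), the second data sets share the same items, and the "deviation value" of configuration (9) is read as the standard score 50 + 10z, a formula the disclosure itself does not fix.

```python
import numpy as np

# Hypothetical first data set: real-estate attributes (input data) and the
# price computed by the neural network (output data). Illustrative values.
first_input = {"area_m2": 72.0, "station_min": 8.0}
first_output = 41_000_000

# Hypothetical second data sets including items identical to the first
# data set (configuration (2)), each with its own price item.
second_sets = [
    {"area_m2": 65.0, "station_min": 10.0, "price": 40_500_000},
    {"area_m2": 80.0, "station_min": 6.0,  "price": 41_200_000},
    {"area_m2": 70.0, "station_min": 9.0,  "price": 40_900_000},
]

def positioning(first_input, first_output, second_sets, item="area_m2", rtol=0.05):
    """Generate one possible "positioning": the position and deviation value
    of an input item within the distribution of second data sets whose price
    matches the output data (configurations (8)-(9))."""
    # Keep only second data sets whose output item matches the output data.
    comparables = [s for s in second_sets
                   if np.isclose(s["price"], first_output, rtol=rtol)]
    values = np.array([s[item] for s in comparables], dtype=float)
    mean, std = values.mean(), values.std()
    z = (first_input[item] - mean) / std if std > 0 else 0.0
    # "Deviation value" taken as the standard score 50 + 10z (an assumption).
    return {"item": item, "mean": mean, "std": std,
            "deviation_value": 50.0 + 10.0 * z}

print(positioning(first_input, first_output, second_sets))
```

Running the sketch locates the first data set's floor area near the middle of the comparable distribution (a deviation value close to 50), which is one concrete form the information indicating the positioning could take.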
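Similarly, the response relationship of configurations (10) to (13) can be sketched as follows, again under stated assumptions: the `forward` function is a toy stand-in for the sub-network between two layers selected from the intermediate layers, one input-layer unit is swept while the others are held fixed, and the removal threshold for configuration (13) is chosen arbitrarily for illustration.

```python
import numpy as np

def response_relationship(forward, base, in_unit, out_unit, sweep):
    """Probe the response of one unit of a selected output layer to the input
    value of one unit of a selected input layer (configurations (10)-(12))."""
    responses = []
    for v in sweep:
        x = base.copy()
        x[in_unit] = v                      # vary only the selected input unit
        responses.append(forward(x)[out_unit])
    return np.array(responses)

# Toy stand-in for the sub-network between the two selected layers; column 1
# of the weight matrix is zero, so input unit 1 has no influence.
W = np.array([[0.7, 0.0, -0.3],
              [0.2, 0.0,  0.9]])
forward = lambda x: np.tanh(W @ x)

rng = np.random.default_rng(0)
base = rng.normal(size=3)                   # reference activation vector
sweep = np.linspace(-3.0, 3.0, 61)          # input values to substitute
curve = response_relationship(forward, base, in_unit=1, out_unit=0, sweep=sweep)

# Configuration (13): if sweeping the unit leaves the output essentially
# unchanged, the unit may be proposed for removal (threshold is an assumption).
if np.ptp(curve) < 1e-9:
    print("unit 1 does not contribute to output unit 0; propose removal")
```

The returned curve corresponds to the graph of configuration (11); fitting a function to it would correspond to configuration (12).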
Number | Date | Country | Kind
---|---|---|---
2016-063785 | Mar 2016 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2016/087186 | 12/14/2016 | WO | 00