This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2019-102944, filed on May 31, 2019, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to an information processing apparatus, a log control program, and a log control method.
In recent years, the increase in the number of Internet of Things (IoT) devices has led to the continuous generation of large amounts of data, and streaming processing technology that processes such continuously generated data as input has been developed. In a distributed stream data processing platform for executing such streaming processing, a plurality of tasks are coupled to each other, and a large amount of data is sequentially input to and processed by the tasks. For example, a developer may output a log for each task and each processing path by writing code that outputs the log at the time of development of each task. Examples of the distributed stream data processing platform include “Apache Storm” developed by Twitter Inc. and “Apache Hadoop” (according to Wikipedia, Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation).
In this regard, a technology related to the output of the log has been known (for example, Japanese Laid-open Patent Publication No. 2009-110156 and Japanese Laid-open Patent Publication No. 2013-206117).
Related techniques are disclosed in, for example, Japanese Laid-open Patent Publication No. 2014-81811.
According to an aspect of the embodiments, an information processing apparatus includes: a memory; and a processor, coupled to the memory, configured to: determine, for each task of a plurality of tasks executed in a distributed stream data processing platform, a log score based on an indication associated with easiness of occurrence of a failure; and output a log message for each task of the plurality of tasks at an output frequency based on a log score of each task and a log score of at least one of an upstream task located upstream of each task and a downstream task located downstream of each task.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
However, the developer determines how the log of a task is output at the time of development. At that time, it is difficult to predict the state in which the task will operate. As a result, logs may be acquired excessively, increasing the load on the distributed stream data processing platform, or sufficient logs may not be acquired, making later verification difficult. The distributed stream data processing platform is also called a distributed stream data processing base.
Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the drawings. The corresponding components in a plurality of the drawings are denoted by the same reference signs.
As described above, in the distributed stream data processing platform, a plurality of tasks are coupled to each other, and a large amount of data is input to these tasks to be processed one after another.
However, the developer determines, for example, how the log of the task 101 is output at the time of development. At that time, it is difficult to predict the situation in which the task 101 will operate. As a result, logs may be acquired excessively, increasing the load on the distributed stream data processing platform, or sufficient logs may not be acquired, making it difficult to investigate the cause when a failure or the like occurs. Therefore, there is a demand for a technology capable of improving the efficiency of log acquisition.
In the embodiments described below, for each task 101, a log score is calculated based on the state of the task 101. The log score may be, for example, a value obtained by evaluating a possibility that the task 101 is associated with a failure. The log score may, in one example, be calculated based on an indication associated with the occurrence of a failure in the task 101.
In the embodiment, the output frequency of the log of the task 101 is determined based on, for example, the log score of the task 101 and the log score of at least one of the tasks 101 upstream and downstream of the task 101.
Next, the determination of the output frequency of the log according to the embodiment will be further described.
For example, in
For example, in a case where the log score of a task 101 downstream of a certain task 101 is high, the control unit 401 may increase the log score of that certain task 101. For example, in
For example, when determining the output frequency of the log of the task C503, the control unit 401 considers the log scores of the task D504, the task E505, and the task F506 downstream. In this case, since the log score of the task D504 is high, the log score of the task C503 may be increased. However, the other downstream tasks, the task E505 and the task F506, do not have a high log score, and therefore the control unit 401 may not increase the log score of the task C503 by much. The control unit 401 may determine the output frequency of the log of the task C503 based on the log score slightly increased due to the influence of the task D504 downstream. For example, in a case where a failure occurs in the task D504, the task D504 is a task 101 downstream of the task C503, and thus the task C503 may be a cause of the failure, and its log score may be increased. However, since the task E505 and the task F506, which are the other two of the three tasks 101 downstream of the task C503, are operating normally, it may be estimated that the task C503 is less likely to be the actual cause of the failure, so that the log score is not increased by much.
For example, when determining the output frequency of the log of the task G507, the control unit 401 considers the log score of the task D504 upstream. In this case, since the log score of the task D504 is high, the log score of the task G507 may be increased. The control unit 401 may determine the output frequency of the log of the task G507 based on the log score slightly increased due to the influence of the task D504 upstream. The reason for this is that, in a case where a failure occurs in the task D504, checking the log of the task G507 downstream of the task D504 makes it easier to investigate the influence of the failure.
In the example illustrated in
As described above, in the embodiment, in a case where there is a task 101 that is estimated to have a high probability of causing a failure, the log scores of the tasks 101 upstream and downstream of that task 101 are also increased. The output frequency of the log is determined based on the increased log score. For example, assume that a failure has actually occurred in the task 101 that was estimated to be highly likely to cause a failure. In this case, since the output frequency of the log of the task 101 in which the failure occurred and of the log of at least one of the tasks 101 upstream and downstream of it has been increased, it is possible to efficiently investigate the cause of the failure and the influence of the failure.
Next, the adjustment of the total amount of the logs output from the distributed stream data processing platform 300 will be described. When the total amount of the logs output from the distributed stream data processing platform 300 is large, the load applied to the distributed stream data processing platform 300 also increases. Therefore, it is preferable that the amount of the logs output from the distributed stream data processing platform 300 be adjusted so as to approach a predetermined target number of logs. In the embodiments described below, processing for optimizing the amount of logs output from each task 101 is executed so that the total number of logs output in a predetermined period in the distributed stream data processing platform 300 approaches the predetermined number of logs. In one embodiment, the processing for optimizing the amount of logs may be executed by, for example, a management apparatus that manages the distributed stream data processing platform 300. In one example, the management apparatus may be one information processing apparatus 301 that represents the plurality of information processing apparatuses 301.
The control unit 401 of the management apparatus totalizes the log number for each log score determined for each of the tasks 101 (
Subsequently, the control unit 401 corrects the output frequency of the log in each task 101 so that the total number of logs of the distributed stream data processing platform 300 approaches a predetermined value as a target (
As described above, the total amount of the logs output from the distributed stream data processing platform 300 may be adjusted to an appropriate amount with respect to the distributed stream data processing platform 300. Therefore, it is possible to suppress the processing load on the distributed stream data processing platform 300.
The execution processing of the task 101 accompanied with the output of the log according to the embodiment will be described below.
In step 701 (the term “step” is abbreviated as “S” hereinafter. For example, referred to as “S701”), the control unit 401 waits to receive a message. For example, when the message is received, the processing of the control unit 401 may proceed to S702.
In S702, the control unit 401 determines the type of the received message. When the message type is a message 801 (illustrated in
The upstream task ID is, for example, identification information for identifying the task 101 upstream which transmits the message 801. The downstream task ID is, for example, identification information for identifying the task 101 downstream which is the destination of the message 801. The log score of the upstream task is a log score calculated for the task 101 upstream identified by the upstream task ID at the time of transmitting the message 801. The message for task processing may be, for example, information used in processing the task 101 downstream which is the destination of the message 801. The message for task processing may be information output as the execution result of the task 101 upstream, for example.
In S703, when receiving the message 801, the control unit 401 returns a response message 802 (illustrated in
In S704, the control unit 401 executes the task 101 designated as the transmission destination of the message 801. For example, the control unit 401 may execute the processing of the task 101 designated as the transmission destination by the downstream task ID of the message 801, using the information included in the message for task processing of the message 801. For example, the control unit 401 may temporarily store, in the storage unit 402, a log message for which an output request has been issued during the execution of the task 101. When executing the task 101, the control unit 401 may record the information related to the execution in task execution information 1000.
In S705, the control unit 401 generates and transmits the message 801 to the task 101 downstream of the executed task 101. The control unit 401 may set, for example, the identification information of the executed task 101 to the upstream task ID of the message 801. The control unit 401 may set the log score of the score information 900 corresponding to the task ID of the executed task 101 to the log score of the upstream task of the message 801, and may set the execution result of the task 101 to the message for task processing of the message 801. The control unit 401 may set the identification information of the task 101 downstream to the downstream task ID of the message 801. The control unit 401 may transmit the generated message 801 to the task 101 downstream.
In S706, the control unit 401 receives the response message 802 from the task 101 downstream. In the upstream task ID of the response message 802, for example, the task ID of the task 101 that has transmitted the message 801 in S705 is registered. The task ID of the task 101 that was the transmission destination of the message 801 in S705, and that is therefore the transmission source of the response message 802, and the log score of that task 101 are registered in the downstream task ID and the log score of the downstream task of the response message 802, respectively.
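The fields of the message 801 and the response message 802 described above can be pictured as two small record types. The following Python sketch is illustrative only: the class names, field names, and the helper `make_response` are hypothetical, and the embodiment does not prescribe any particular encoding of the messages.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TaskMessage:
    """Hypothetical representation of the message 801 sent to a downstream task."""
    upstream_task_id: str       # task 101 that transmits the message
    downstream_task_id: str     # task 101 that is the destination
    upstream_log_score: float   # log score of the upstream task at transmission time
    payload: Any                # message for task processing (e.g. an execution result)


@dataclass
class ResponseMessage:
    """Hypothetical representation of the response message 802 returned upstream."""
    upstream_task_id: str        # task 101 that transmitted the message 801
    downstream_task_id: str      # task 101 that returns the response
    downstream_log_score: float  # log score of the downstream task


def make_response(msg: TaskMessage, own_log_score: float) -> ResponseMessage:
    """Build the response to a received message, echoing both task IDs (S703)."""
    return ResponseMessage(msg.upstream_task_id, msg.downstream_task_id, own_log_score)
```

Because each message carries the sender's current log score, a task can learn the scores of its neighbors from ordinary traffic without a separate query.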
In S707, the control unit 401 calculates a log score for the task 101 executed in S704. The details of the log score calculation will be described later with reference to
In S708, the control unit 401 determines whether or not to output a log message based on the log score calculated for the task 101 in S707. For example, the task 101 outputs an output request for requesting the output of the log at the time of execution. In S708, the control unit 401 may determine whether or not to output the log message requested to be output to the log data in the storage unit 402. The process for determining whether or not to output the log message will be described in detail later with reference to
In a case where it is determined in the processing of S708 that the log message is to be output, in S709, the control unit 401 outputs the log message temporarily stored in the storage unit 402 and records it in the log data in the storage unit 402. In this case, the control unit 401 adds 1 to the number of outputs corresponding to the executed task 101 in log count information 1100 described below, and the flow returns to S701.
In a case where the message type is the log score correction message 1200 in S702, the flow proceeds to S710.
In S710, the control unit 401 stores the received log score correction message 1200 in the storage unit 402, and the flow returns to S701. In a case where the log score correction message 1200 is already stored in the storage unit 402, the control unit 401 may update the log score correction message 1200 stored in the storage unit 402 with a new received log score correction message 1200.
In a case where the message type is a tabulation request in S702, the flow proceeds to S711. As described above with reference to
When the control unit 401 receives the tabulation request in S711, it generates a tabulation request response message 1300, responds to the information processing apparatus 301 which has transmitted the tabulation request, and returns the flow to S701.
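The two branches of S702 that do not execute a task (S710 and S711) can be sketched as follows. This is a minimal Python illustration under stated assumptions: `NodeState`, `handle_control_message`, the string message types, and the dictionary layout of the response are all hypothetical, and the default correction value of 1.0 is an assumption for the case where no correction message has arrived yet.

```python
from typing import Dict, Optional


class NodeState:
    """Hypothetical per-apparatus state kept in the storage unit 402."""

    def __init__(self) -> None:
        self.correction_value: float = 1.0    # assumed default before any correction arrives
        self.log_counts: Dict[str, int] = {}  # number of log outputs per task ID


def handle_control_message(state: NodeState, kind: str,
                           body: Optional[dict] = None) -> Optional[dict]:
    """Handle the non-task branches of S702 and return a response, if any."""
    if kind == "log_score_correction":
        # S710: a newly received correction replaces the stored one.
        state.correction_value = float(body["correction_value"])
        return None
    if kind == "tabulation_request":
        # S711: report the per-task output counts to the requester.
        return {"log_counts": dict(state.log_counts)}
    raise ValueError(f"unknown message type: {kind}")
```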
According to the operation flow of
Subsequently, the calculation of the log score will be described in more detail.
In S1401, the control unit 401 calculates a log score for each of the one or more tasks 101 provided in the information processing apparatus 301. For example, the control unit 401 may calculate the log score of the task 101 based on the information related to past executions of the task 101 registered in the task execution information 1000. In the following description, an example of calculating a log score using the processing time of the task 101 as the information related to the execution of the task 101 will be described.
For example, in a case where the processing time of the task 101 is significantly longer than the normal processing time, it may be estimated that the task 101 is highly likely to cause a failure. Therefore, based on the difference from the normal processing time of the task 101, it is possible to obtain a log score in which the possibility of occurrence of the failure is evaluated. For example, it is assumed that an average of the processing time of the last 10 tasks 101 is set as Tave. The processing time of the current task 101 is set as Tcurr. In this case, the log score Stask for the task 101 may be calculated, for example, by the following Equation 1. In Equation 1, abs represents an absolute value.
Stask=abs(Tcurr−Tave)/Tave Equation 1
According to Equation 1, the log score may be evaluated based on the difference from the past processing time. In another embodiment, the log score may be evaluated by using an occurrence frequency of errors of the task 101, an elapsed time since the task 101 is provided, a frequency at which the task 101 outputs the message of the output request of the logs, and the like.
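Equation 1 can be sketched in Python as follows. The function name, the use of a bounded deque for the recent processing times, and the handling of an empty history or a zero average are assumptions made for illustration; the sample timings are hypothetical.

```python
from collections import deque
from typing import Deque


def task_log_score(t_curr: float, recent_times: Deque[float]) -> float:
    """Equation 1: relative deviation of the current processing time from the recent average."""
    if not recent_times:
        return 0.0  # assumption: without history, no deviation can be measured
    t_ave = sum(recent_times) / len(recent_times)
    if t_ave == 0:
        return 0.0  # assumption: avoid division by zero for degenerate timings
    return abs(t_curr - t_ave) / t_ave


# Hypothetical history: the last 10 processing times, all 100 ms.
history: Deque[float] = deque([100.0] * 10, maxlen=10)
score = task_log_score(150.0, history)  # current run takes 150 ms
```

Because the absolute value is taken, a run that is much faster than usual raises the score just as a slow run does.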
In S1402, the control unit 401 determines the log score of the task 101 upstream based on the message 801 received from the task 101 upstream. For example, in a case where there is one task 101 upstream, the control unit 401 may use the log score of the upstream task of the latest message 801 received from the task 101 upstream. In a case where there are a plurality of tasks 101 upstream that have transmitted data to the task 101, the control unit 401 may determine a value representative of a log score of an upstream task of the latest message 801 received from each of the tasks 101 upstream as a log score of the task 101 upstream. For example, as indicated in Equation 2 below, the control unit 401 may use the average of the log scores of the plurality of tasks 101 upstream as the log score: Supper of the task 101 upstream. In Equation 2, ave represents an average, and Su1, Su2, Su3, . . . represent log scores of a plurality of tasks 101 upstream.
Supper=ave(Su1,Su2,Su3, . . . ) Equation 2
In S1403, the control unit 401 determines the log score of the task 101 downstream based on the response message 802 received from the task 101 downstream. For example, in a case where there is one task 101 downstream, the control unit 401 may use the log score of the downstream task of the latest response message 802 received from the task 101 downstream. In a case where there are a plurality of tasks 101 downstream that have transmitted the response message 802 to the task 101, the control unit 401 may determine a value representative of the log scores of the downstream tasks of the latest response messages 802 received from the respective tasks 101 downstream as the log score of the task 101 downstream. For example, as indicated in Equation 3 below, the control unit 401 may use the average of the log scores of the plurality of tasks 101 downstream as the log score: Sdown of the task 101 downstream. In Equation 3, ave represents an average, and Sd1, Sd2, Sd3, . . . represent log scores of a plurality of tasks 101 downstream.
Sdown=ave(Sd1,Sd2,Sd3, . . . ) Equation 3
In S1404, the control unit 401 calculates a log score using the log scores of the adjacent tasks 101. For example, the control unit 401 may calculate the log score of the task 101 using the log score of the task 101 upstream of the task 101 obtained in S1402 and the log score of the task 101 downstream of the task 101 obtained in S1403. In one example, as indicated in Equation 4, the control unit 401 calculates the log score: Scurrent by causing the log score of the task 101 in S1401, the log score of the task 101 upstream of S1402, and the log score of the task 101 downstream of S1403 to be contributed at a predetermined ratio.
Scurrent=0.5×Stask+0.1×Supper+0.4×Sdown Equation 4
For example, the log scores of the adjacent tasks 101 may contribute to the log score of the task 101 according to Equation 4 above. For example, in a case where a task 101 has a higher log score than other tasks 101, the value is also propagated to the log scores of the adjacent tasks 101. When a failure occurs in a task 101, it is highly likely that the cause exists in a task 101 upstream of it rather than in a task 101 downstream of it. In other words, a high log score of the task 101 downstream suggests that the task 101 itself may be the cause. For this reason, in Equation 4, the coefficient of the log score Sdown of the task 101 downstream is set higher than the coefficient of the log score Supper of the task 101 upstream, so that the log score of the task 101 is influenced more strongly by the task 101 downstream than by the task 101 upstream. Although Equation 4 indicates an example in which the log scores of both the task 101 upstream and the task 101 downstream are used for the calculation of the log score of the task 101, the embodiment is not limited thereto. In another embodiment, only one of the task 101 upstream and the task 101 downstream may be used to calculate the log score of the task 101.
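Equations 2 to 4 can be combined into a short Python sketch. The function names are hypothetical, the default weights are the coefficients of Equation 4, and treating a task with no neighbors on one side as contributing 0 is an assumption not stated in the embodiment.

```python
from statistics import mean
from typing import Sequence


def neighbor_score(scores: Sequence[float]) -> float:
    """Equations 2 and 3: average log score of the adjacent tasks on one side."""
    return mean(scores) if scores else 0.0  # assumption: no neighbors contributes 0


def combined_score(s_task: float,
                   upstream: Sequence[float],
                   downstream: Sequence[float],
                   w_task: float = 0.5,
                   w_up: float = 0.1,
                   w_down: float = 0.4) -> float:
    """Equation 4: weighted sum of the task's own score and its neighbors' scores."""
    s_upper = neighbor_score(upstream)
    s_down = neighbor_score(downstream)
    return w_task * s_task + w_up * s_upper + w_down * s_down


# Hypothetical scores: one of three downstream tasks looks unhealthy (0.9).
s_current = combined_score(0.1, upstream=[0.1], downstream=[0.9, 0.1, 0.1])
```

With one unhealthy task among three downstream tasks, averaging dilutes its effect, which matches the behavior described for the task C503: the score rises, but only slightly.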
In S1405, the control unit 401 adjusts the log score to suppress a change in the log score. For example, in a case where the value of the log score drops abruptly and the frequency of taking a log drops abruptly, a sufficient log for pursuing the cause of the failure may not be taken. Therefore, in S1405, the control unit 401 performs adjustment to moderate the change of the log score. For example, as indicated in Equation 5 below, by causing the Scurrent currently calculated in S1404 to contribute to the log score: Sbase-1 previously calculated in S1405 in the execution of the operation flow illustrated in
Sbase=0.5×Scurrent+0.5×Sbase-1 Equation 5
In Equation 5, the log score: Sbase-1 previously calculated in S1405 in the execution of the operation flow illustrated in
In S1406, the control unit 401 performs correction so as to optimize the total number of logs. For example, as will be described later with reference to
S=min(1,max(0,Sbase×W)) Equation 6
In Equation 6, W is a correction value for the task 101. In Equation 6, the value of the log score S falls within the range of 0 to 1 by using a function of min and max. When the processing in S1406 is completed, the operation flow in
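The smoothing of Equation 5 and the correction of Equation 6 can be sketched in Python as follows. The function names and the `alpha` parameter are hypothetical; `alpha = 0.5` reproduces the coefficients of Equation 5.

```python
def smooth_score(s_current: float, s_base_prev: float, alpha: float = 0.5) -> float:
    """Equation 5: blend the newly calculated score with the previous smoothed score."""
    return alpha * s_current + (1.0 - alpha) * s_base_prev


def corrected_score(s_base: float, w: float) -> float:
    """Equation 6: apply the correction value W and clamp the result to the range 0 to 1."""
    return min(1.0, max(0.0, s_base * w))


# Hypothetical sequence: the raw score drops from 0.8 to 0.0 and stays there.
s_base = 0.8
trace = []
for _ in range(3):
    s_base = smooth_score(0.0, s_base)
    trace.append(s_base)
```

In this sequence the smoothed score halves at each execution rather than falling to zero at once, so the log output frequency decays gradually after an abrupt drop in the raw score.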
Next, log output determination processing executed in S708 in
In S1501, the control unit 401 calculates an output interval of the logs based on the log score. The output interval of the logs may be calculated by, for example, the following Equation 7. In Equation 7, round is a function that rounds off the fractional part, so that the value of Ninterval is an integer.
Ninterval=round(1/S) Equation 7
In S1502, the control unit 401 determines whether to output the log message by comparing the number of log messages for which an output request has been issued since the output of the previous log message with the log output interval obtained in S1501. For example, the control unit 401 may determine that the log message is to be output in a case where the number Nbefore of log output requests issued since the output of the previous log message is larger than Ninterval (Nbefore>Ninterval). When the processing in S1502 is completed, the operation flow in
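The interval calculation of Equation 7 and the comparison of S1502 can be sketched in Python as follows. Two points are assumptions rather than part of the embodiment: a log score of 0 is treated as "never output" because 1/S is undefined there, and rounding is done half-up with `math.floor(x + 0.5)` because Python's built-in `round` rounds halves to the nearest even integer. The function names are hypothetical.

```python
import math
from typing import Optional


def output_interval(s: float) -> Optional[int]:
    """Equation 7: N_interval = round(1/S); None stands for "never output" (assumed for S <= 0)."""
    if s <= 0.0:
        return None
    return math.floor(1.0 / s + 0.5)  # round half up


def should_output(n_before: int, s: float) -> bool:
    """S1502: output when the requests since the last output exceed the interval."""
    n_interval = output_interval(s)
    return n_interval is not None and n_before > n_interval
```

A higher log score gives a shorter interval, so a larger share of the requested log messages is actually written.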
Next, the optimization of the entire log amount will be described with reference to
In S1601, the control unit 401 of the management apparatus waits for a predetermined period of time. In one example, the administrator of the distributed stream data processing platform 300 may estimate the waiting time for which it is possible to see the tendency of the output of the log in the distributed stream data processing platform 300, and may set the time in advance as a predetermined time.
In S1602, the control unit 401 of the management apparatus transmits a tabulation request for a log score to the information processing apparatus 301 included in the distributed stream data processing platform 300. For example, the control unit 401 of the management apparatus may transmit the tabulation request to each of the plurality of information processing apparatuses 301 responsible for the execution of the task 101 in the distributed stream data processing platform 300.
In S1603, the control unit 401 of the management apparatus waits for a response from each of the information processing apparatuses 301. The response from each of the information processing apparatuses 301 may be, for example, the tabulation request response message 1300 described above.
In S1604, when the reception of the tabulation request response message 1300 from each of the information processing apparatuses 301 is completed, the control unit 401 of the management apparatus calculates a correction value using the received tabulation request response message 1300. For example, when the number of outputs of the log messages of each task 101 is Nt1, Nt2, Nt3, . . . , a correction value W may be obtained by the following Equation 8. Z is a target log message number, which may be set in advance by the administrator of the distributed stream data processing platform 300.
W=Z/(Nt1+Nt2+Nt3+ . . . ) Equation 8
In S1605, the control unit 401 of the management apparatus generates a log score correction message 1200 including the obtained correction value, transmits the log score correction message to each of the information processing apparatuses 301, and ends the operation flow.
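The correction value of Equation 8 can be sketched in Python as follows. The function name and the sample counts are hypothetical, and returning 1.0 when no logs were output at all is an assumption made to avoid a division by zero.

```python
from typing import Iterable


def correction_value(target: float, counts: Iterable[int]) -> float:
    """Equation 8: W = Z / (Nt1 + Nt2 + Nt3 + ...)."""
    total = sum(counts)
    if total == 0:
        return 1.0  # assumption: nothing was logged, so leave the scores unchanged
    return target / total


# Hypothetical period: three tasks output 500, 300, and 200 log messages; target Z is 800.
w = correction_value(800, [500, 300, 200])
```

A value of W below 1 scales every log score down, and a value above 1 scales it up, which moves the total toward the target Z in the next period.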
According to the operation flow of
In the operation flow of
For example, the task 101 having a high log score notified by the tabulation request response message 1300 is estimated to output logs frequently. Therefore, even if the amount of logs it outputs is reduced more than that of the other tasks, it is highly likely that logs sufficient for analyzing a failure can still be acquired. For example, in a case where the correction value W obtained by Equation 8 is 0.8, a lower correction value (for example, 0.7) may be set for the task 101 having a high log score. On the other hand, the task 101 having a low log score notified by the tabulation request response message 1300 may output only a small number of logs, and if that number is reduced further, the logs available when a failure or the like occurs may become insufficient. For example, in a case where the correction value W obtained by Equation 8 is 0.8, a higher correction value (for example, 0.9) may be set for the task 101 having a low log score. In this manner, the control unit 401 may adjust the correction value for each task 101 based on its log score.
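One possible way to derive a per-task correction value from W is sketched below in Python. The thresholds `high` and `low` and the step `delta` are hypothetical parameters chosen only to reproduce the 0.7 and 0.9 examples above; the embodiment does not specify how the adjustment is computed.

```python
def per_task_correction(w: float, log_score: float,
                        high: float = 0.7, low: float = 0.3,
                        delta: float = 0.1) -> float:
    """Lower W for high-score tasks and raise it for low-score tasks (assumed rule)."""
    if log_score >= high:
        return max(0.0, w - delta)  # frequent logger: can afford a stronger reduction
    if log_score <= low:
        return w + delta            # sparse logger: protect the few logs it outputs
    return w
```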
As described above, according to the above embodiments, when there is a task 101 having a high possibility of a failure, the output frequency of logs of tasks 101 around the task 101 is increased, so that it is possible to effectively collect a trace log or the like. As a result, it is possible to efficiently find the failure in which the plurality of tasks 101 are affected.
Although the example of the embodiment has been described, the embodiment is not limited to this. For example, the operation flows described above are exemplary, and the embodiment is not limited to these. Where possible, the order of performing the types of processing in the operation flows may be changed, the operation flow may further include a different type of processing, or a subset of the types of the processing may be omitted. For example, the processing of S1402 and the processing of S1403 in
In the log score calculation in the above-described embodiment, the elapsed time after the task 101 is provided in the distributed stream data processing platform 300 may be taken into consideration. For example, in a situation where the task 101 has only just been provided, it is highly possible that the task 101 contains a bug or the like that is not expected by its developer. The control unit 401 may perform processing for correcting the log score of a task 101 that has just been provided in this manner to a higher value.
In the above embodiments, an example is described in which a numerically higher log score value corresponds to a higher log output frequency. However, the embodiments are not limited thereto, and for example, the log score may be evaluated in such a manner that the smaller the log score value, the higher the log output frequency. In this case, a small log score value may be regarded as indicating a high log score.
In the above embodiments, an example in which the average value is used as representative values representing the log scores of the plurality of upstream tasks 101 and the log scores of the plurality of downstream tasks 101 is described, but the embodiments are not limited to this example. For example, in another embodiment, other statistics, such as a median value, may be used as a representative value.
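The choice of representative value can be illustrated with a short Python sketch. The function name and the `method` parameter are hypothetical, and the sample scores are invented for illustration.

```python
from statistics import mean, median
from typing import Sequence


def representative_score(scores: Sequence[float], method: str = "mean") -> float:
    """Representative log score of several adjacent tasks, by average or median."""
    if not scores:
        return 0.0  # assumption: no adjacent tasks contributes 0
    if method == "mean":
        return mean(scores)
    if method == "median":
        return median(scores)
    raise ValueError(f"unsupported method: {method}")


# Hypothetical downstream scores with a single outlier.
scores = [0.1, 0.1, 0.9]
by_mean = representative_score(scores, "mean")
by_median = representative_score(scores, "median")
```

With a single outlier among the neighbors, the median ignores it while the average is pulled toward it, so the choice determines how strongly one unhealthy neighbor raises the log score.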
In the embodiments described above, the message between the tasks illustrated in
In the above embodiments, an example in which the log score is corrected by a correction value is described, but the embodiments are not limited thereto, and the output frequency of the logs of the task 101 may be adjusted by using the correction values for correction of other values. For example, in another embodiment, in place of the processing of S1406 described above, in S1501, processing for correcting the output interval Ninterval of the log by the inverse of the correction value W may be executed in the following Equation 9. In this case, the control unit 401 may execute the processing in S1502 using the obtained Nc instead of the Ninterval.
Nc=Ninterval×1/W Equation 9
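Equation 9 can be sketched in Python as follows. The function name is hypothetical. The embodiment does not state whether Nc is rounded to an integer, so this sketch returns the unrounded value, and rejecting a non-positive W is an added assumption.

```python
def corrected_interval(n_interval: int, w: float) -> float:
    """Equation 9: N_c = N_interval x 1/W."""
    if w <= 0:
        raise ValueError("correction value W must be positive")  # assumed guard
    return n_interval / w
```

A correction value below 1 lengthens the interval and therefore reduces the log volume, which is the same direction of adjustment as scaling the log score by W in Equation 6.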
In the above embodiments, in the processing of S1401, the control unit 401 of the information processing apparatus 301 operates as, for example, the acquisition unit 411. The control unit 401 of the information processing apparatus 301 may operate as the output unit 412 in the processing of, for example, from S709 and S1402 to S1406. In the processing of the operation flow of
The processor 1701 may be, for example, a single processor, a multiple processor, or a multi-core processor. The processor 1701 uses the memory 1702 to execute a program in which the procedures for the aforementioned operation flows are described, thereby enabling some or all of the functions of the aforementioned control unit 401 illustrated in
The memory 1702 is, for example, semiconductor memory and may include a ROM area and a RAM area. The storage device 1703 is, for example, a hard-disk drive, semiconductor memory such as flash memory, or an external storage device. The RAM is an abbreviation for a random-access memory. The ROM is an abbreviation for a read-only memory.
The reader 1704 accesses the removable storage medium 1705 in accordance with an instruction from the processor 1701. For example, the removable storage medium 1705 is realized by a semiconductor device (USB memory or the like), a medium (a magnetic disk or the like) to which information is input and output by a magnetic action, or a medium (CD-ROM, DVD, or the like) to which information is input and output by an optical action. USB is an abbreviation for Universal Serial Bus. CD is an abbreviation for compact disc. DVD is an abbreviation for digital versatile disc.
The storage unit 402 of each of the information processing apparatus 301 and the management apparatus may include, for example, the memory 1702, the storage device 1703, and the removable storage medium 1705. For example, the score information 900, the task execution information 1000, the log count information 1100, and the log score correction message 1200 illustrated in FIGS. 9-12 may be stored in the storage device 1703 of the information processing apparatus 301. For example, the log score correction message 1200 and the tabulation request response message 1300 may be stored in the storage device 1703 of the management apparatus.
The communication interface 1706 transmits and receives data via a network in accordance with an instruction of the processor 1701. The communication interface 1706 is an example of the above-described communication unit 403. The input and output interface 1707 may be, for example, an interface between an input device and an output device. The input device is, for example, a device that accepts instructions from the user, such as a keyboard or a mouse. The output device is, for example, a display device such as a display, a printing device such as a printer, or a sound device.
Each program according to the embodiment is provided to the information processing apparatus 301 and the management apparatus in the following form, for example.
1) A subset or all of the programs are installed in advance in the storage device 1703.
2) A subset or all of the programs are provided from the removable storage medium 1705.
3) A subset or all of the programs are provided from a server such as a program server.
The hardware configuration of the computer 1700 for realizing the information processing apparatus 301 and the management apparatus, which is described with reference to
Some embodiments have been described above. However, the embodiments are not limited to the above-described embodiments. It is to be appreciated that the embodiments include a number of types of variations and alternatives of the above-described embodiments. For example, it would be appreciated that various types of embodiments are able to be embodied by modifying the elements without departing from the scope of the gist of the embodiments. It would also be appreciated that various types of embodiments are able to be implemented by appropriately combining a plurality of the elements disclosed according to the above-described embodiment. Also, one skilled in the art would appreciate that various types of embodiments are able to be implemented by deleting or replacing a subset of the elements out of all the elements described according to the embodiment or adding an element or elements to the elements described according to the embodiment.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2019-102944 | May 2019 | JP | national |