The present examples relate to techniques and processes for correlating input file sizes to consumption of scalable resources.
Mainframe computing is a platform used today by many of the largest companies in the world. A mainframe often processes many workloads, such as accounts receivable, general ledger, payroll, and a variety of applications needed for specific business requirements. These workloads are commonly referred to as jobs.
A mainframe environment includes databases which may be sequential files that can be accessed in any order through indexing. A sequential file is a file generally containing records of the same structure and length. Sequential files are processed sequentially from the start of the file to the end of the file. These sequential files typically reside on a direct access storage device (DASD) or disk drive. Regardless of the storage medium, sequential files are processed by a customer program that performs data manipulation on the records in the sequential files and stores the manipulated records in output files, which are typically allocated predefined sizes that may not be optimal for the manipulated records. Conventionally, it is not known how much output file space is required for the output of the program, how long the program will take to run, or how many resources it will consume.
A system for use in predicting resources required by a program includes a processor, a storage device accessible by the processor, and a sequential file prediction program that, when executed by the processor, configures the system to: access a history file to determine sizes of past sequential input files input to a customer program and sizes of resultant past sequential output files produced by the customer program processing the sequential input files; determine a correlation between the sizes of the past sequential input files and the sizes of the resultant past sequential output files; utilize the correlation to predict future consumption of scalable resources, including future sizes of future sequential output files, based on the current sizes of current sequential input files; and utilize the predicted future consumption of the scalable resources to perform at least one of memory allocation or scheduling for batch jobs being performed by the system, wherein the scalable resources include at least one of processing time and memory allocation.
The drawing figures depict one or more implementations in accord with the present concepts, by way of example only, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements.
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.
Many software programs, including International Business Machines (IBM)® z/OS programs, operate on sequential files and perform collations on the records within these files. A program may take multiple input files and produce multiple output files. The size of the output produced by these programs is conventionally not known before execution, but for practical reasons the output files cannot grow indefinitely (e.g. their size is defined upon their creation). If the output of a program exceeds the space allocated to the output file, the program will fail and terminate in an error state.
A program (e.g. a z/OS program) may fail to complete, or may have decreased performance (indicated by increased processing time), if it is limited by some resource. For example, a program may fail to complete if it is limited in output file size or working memory size. If a program is CPU intensive, it may take longer to process when other CPU intensive programs are running. If the operating system knew what the resource requirements of a program were before executing it, it could allocate sufficient resources (e.g. CPU time, I/O time, memory allocation, etc.) to the job, or re-schedule the execution to minimize load on the system.
A sequential file is a file generally containing records of the same structure and length. Sequential files are processed sequentially from the start of the file to the end. These sequential files typically reside on a direct access storage device (DASD) or disk drive. Regardless of the storage medium, sequential files are typically processed by a customer program that performs data manipulation on the records in the sequential files and stores the manipulated records in output files of predefined sizes.
In an example, a record in a sequential file may contain information about a transactional event, such as credit card transactions, phone calls, or purchases of a security, to name a few. The customer program (e.g. a credit card company program) may process these sequential files by reading the sequential files to collate the records into new records written to output sequential files. Some examples of possible collations include pulling the specific transaction details from each record necessary for a particular department (accounting, sales, logistics, etc.) within the credit card company, producing a record summarizing transactions per unique entity, and merging data about each transaction from two different sources (such as matching purchase orders to sell orders).
The customer program is generally designed to execute as part of a batch job, which defines a series of steps, each executing the customer program and providing the program references to sequential files via a set of parameters (e.g. “DD statements”) that describe a data set (name, size, etc.) and specify the input and output resources needed for the data set. A single customer program may perform different functions depending on what parameters are provided. However, the parameters provided to a customer program are unlikely to change for a particular step within a particular job, defined by the job name, step name, and optionally procedure step name. A procedure is a series of steps that can be used across multiple jobs and may have parameters affecting its execution. The names of DD statements are specific to a customer program and provide a layer of abstraction so the customer program does not need to know the specific name of the sequential file or where it resides. A DD statement may map to an input file or to an output file. The size of the output file is defined upon creation, and if the output is larger than the size of the output file, the customer program may fail to complete. In addition, the size of the output of the customer program is generally correlated to the processing time (e.g. CPU load, input/output (I/O) time, a combination of CPU/I/O time, etc.) and the working memory required to process the dataset records. Thus, a means for using the DD statement to predict an appropriate output file size for optimal memory allocation (e.g. allocating optimal output file sizes), and for predicting processing resources for determining system timeliness, would be beneficial to the customer (e.g. how much memory and processing time is needed for pulling, manipulating, and storing transaction details from sequential files).
In an example mainframe environment, a job is a collection of programs, or job steps, instructing a processor to perform particular functions. When a job is represented as a sequence of job control language (JCL) statements, each job step is defined by an EXEC statement referencing a program and supported by DD statements which represent the datasets to be processed. The DD statements may include information such as the dataset name, the dataset volume (e.g. where the dataset is stored) and the dataset size among others. While a job includes one or more programs, or job steps, a single job step may, in turn, represent a procedure, program and/or other process that includes one or more steps, functions, and/or other components that each perform some form of processing. Thus, a job step is a single entry within a job, but is not limited to performing a single step of processing. A job is also often referred to as a batch job, because a collection of JCL is processed in batch by the processor. In this mainframe environment, jobs are continually being processed. These jobs regularly reference datasets (e.g., sequential files) as part of their processing. As such, at any given point in time, some number of jobs may have some number of datasets opened and may be processing data related to those opened datasets.
As shown in
Rather than allocating output file DDs of a predefined size that may or may not be optimal for storage of the customer program output, prediction of the output file sizes would be beneficial.
Rather than executing a job at a pre-determined time when the current load on the system may be high, prediction of the processing resources necessary to execute a job, and rescheduling the job for a later time when system load is lower would be beneficial. If a job is initiated and it is predicted to use a large amount of working memory while available working memory on the system is low, the operating system may choose to schedule the job for a later time when more working memory is available for the job.
The analysis of the input file DD information for predicting the optimal sizes for the one or more output files, and for predicting the consumption of other scalable resources (e.g. DDname, CPU time, I/O time, memory allocation, etc.), can be performed by a number of different methods. These methods include linear regression techniques and neural network techniques, among others, that determine a predictive function based on historical file data indicating known relationships between past input file sizes and their corresponding resource requirements. These prediction techniques are described below.
A goal of the predictive solution is to analyze historical data of job executions to create a predictive model that computes the quantity of one or more scalable resources required by a customer program, given the sizes of its input DDs. Some examples of a scalable resource include CPU time, real world time, the size of an output DD, and the amount of working memory used. Historical data for all jobs is kept in SMF records. From these SMF records, the system can extract the job name, procedure name, step name, customer program name, a list of input and output DDs and their sizes, CPU time, run time, number of I/O operations, and more.
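As an illustration only, the following Python sketch shows one possible in-memory representation of such an extracted history record and the identifier used to group runs into a model. The field names, types, and grouping key are assumptions for illustration and are not taken from the actual SMF record layout.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class StepHistoryRecord:
    """Hypothetical summary of one historical job-step execution."""
    job_name: str
    proc_name: str
    step_name: str
    program_name: str
    input_dd_sizes: Dict[str, int]   # DD name -> size in bytes
    output_dd_sizes: Dict[str, int]  # DD name -> size in bytes
    cpu_time_sec: float
    io_count: int

def model_key(rec: StepHistoryRecord) -> tuple:
    """Group historical runs by job, procedure, step, and program names."""
    return (rec.job_name, rec.proc_name, rec.step_name, rec.program_name)
```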
The job, procedure, step, and customer program names are used as an identifier to build a predictive model. A model may not be tied to a customer program name alone, as the program may perform different operations depending on the parameters, and these parameters may not be accessible from the SMF records. However, the parameters provided to a customer program are unlikely to change within one specific job, so the behavior of a customer program is likely to remain consistent from one job execution to the next.
More specifically, the system uses this historical data to train one of the prediction algorithms in steps 404-408 which could include linear regression for all input/output file sizes, linear regression for a targeted resource (e.g. the size of a specific output DD, or IO time), and a neural network, which may predict all resources used in one algorithm. Once the linear function or neural network is trained, the system may then use the prediction algorithm in step 410 to accurately predict future resource consumption based on future input file sizes to be processed by the customer program. Developing/Training the linear equation and neural network for resource consumption prediction is described in more detail in
For example, a linear regression could be performed using the input file size as the explanatory variable and the output file size as the predicted variable. If the hypothesis of a linear relationship holds true for the program step in the historical file, then a correlation between input and output file sizes can be established so that the output size can be predicted for future files. In one example, the customer program may reduce or increase the output file size, so a variable of percent change between input and output may also be computed and recorded. In another example, the customer program splits and merges records and has multiple inputs and outputs. This type of customer program may need multiple passes through the previously stated linear regression.
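For illustration, a minimal sketch of such a single-input regression is shown below; the historical sizes are hypothetical and the fit is a plain least-squares line.

```python
import numpy as np

# Hypothetical historical sizes (bytes) for one job step's input and output DD.
input_sizes = np.array([1.0e6, 2.5e6, 4.0e6, 8.0e6, 1.6e7])
output_sizes = np.array([0.6e6, 1.4e6, 2.3e6, 4.9e6, 9.4e6])

# Least-squares fit of output_size = slope * input_size + intercept.
slope, intercept = np.polyfit(input_sizes, output_sizes, deg=1)

# Percent change between input and output, as mentioned above.
percent_change = (output_sizes - input_sizes) / input_sizes * 100.0

def predict_output_size(new_input_size: float) -> float:
    """Predict a future output DD size from a future input DD size."""
    return slope * new_input_size + intercept

print(predict_output_size(5.0e6), percent_change.mean())
```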
The results of step 504 in
Rather than producing linear functions for all possible numbers of input files versus all possible numbers of output files, the system could perform linear regression of all input file sizes for a particular consumable, scalable resource (e.g. specific DDname or working memory).
For the linear regression, there may be more than N factors for N input DDs. For example, for each input DD, X, k transformations can be applied to it (e.g. X^2, X^3, log(X), sqrt(X), etc.), which are then input into the linear equation. As a result, the total number of inputs would be k*N, k being the number of transformations applied to each input DD. The linear equation would then output the predicted size for one output DD.
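A sketch of this k*N feature expansion is shown below; the particular transformation functions and sizes are illustrative assumptions only.

```python
import numpy as np

# Example transformation functions applied to each input DD size X.
TRANSFORMS = [
    lambda x: x,            # identity
    lambda x: x ** 2,       # X^2
    lambda x: np.log1p(x),  # log(X); log1p avoids log(0) for empty files
    lambda x: np.sqrt(x),   # sqrt(X)
]

def expand_features(input_dd_sizes):
    """Expand N input sizes into k*N regression factors (k = len(TRANSFORMS))."""
    return np.array([f(s) for s in input_dd_sizes for f in TRANSFORMS])

features = expand_features([1.0e6, 2.5e6, 4.0e6])  # N=3, k=4 -> 12 factors
```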
Regardless, if there is a linear relationship between the size of inputs and a consumable resource, a linear equation can be constructed that multiplies the size of every input DD (sn) by some weight (wn) and sums the values together to get the quantity of a desired resource:

r = w1·s1 + w2·s2 + . . . + wn·sn
If a non-linear relationship exists between the size of inputs and a consumable resource, a set of transformation functions (ƒk) can be applied to each input size to produce a larger set of inputs. Each input DD could also be passed through a transformation function ƒ (such as squaring, cubing, or taking the logarithm) and provided as another input to open the possibility of non-linear relationships. If there are k transformations applied to n input DDs:

r = w1,1·ƒ1(s1) + w1,2·ƒ2(s1) + . . . + w1,k·ƒk(s1) + . . . + wn,k·ƒk(sn)

The transformations applied can be summarized as follows: each of the n input sizes s1 . . . sn is passed through each of the k transformation functions ƒ1 . . . ƒk, producing k·n transformed inputs, and each transformed input ƒj(si) receives its own weight wi,j in the linear equation.
The weights are specific to one resource, and other resources will have a different set of weights. Once a set of weights has been determined for every desired resource, the system can calculate predicted resource usage given a list of input sizes. Assuming there are n input DDs and m output DDs, the sizes of the output DDs can be predicted using the following operation:

[o1 . . . om] = W · [s1 . . . sn]

where W is an m×n matrix of weights (one row per output DD and one column per input DD), s1 . . . sn is the vector of input DD sizes, and o1 . . . om is the vector of predicted output DD sizes.
A biasing constant can also be added to the equation by including another column of weights in W and adding the value 1 to the end of the input size vector s.
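A minimal sketch of this matrix form, including the bias column, is shown below; the weight values and sizes are placeholders rather than values from the text, and real weights would come from the fitting step described next.

```python
import numpy as np

# Placeholder m x (n+1) weight matrix: one row per output DD, one column per
# input DD, plus a final bias column.
W = np.array([[0.5, 0.1, 0.0, 1.0e5],
              [0.0, 0.4, 0.3, 2.0e4]])

input_sizes = np.array([1.0e6, 2.5e6, 4.0e6])   # n = 3 input DD sizes
augmented = np.append(input_sizes, 1.0)         # append 1 for the bias column

predicted_output_sizes = W @ augmented          # m = 2 predicted output DD sizes
```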
The weights of this equation can be computed by analyzing the historical file data from the SMF record. By looking at the sizes of input and output DDs for each run of a customer program, optimal weights that best predict the output DD sizes can be computed.
Only one linear regression is required per output DD per customer program. Finding the right combination of weights (wn) will result in a function that can compute an output size given a list of input sizes. If there are n unique input DDs across k runs of a particular customer program, the system can construct this set of linear equations for each output DD:

s1,1·w1 + s1,2·w2 + . . . + s1,n·wn = o1
s2,1·w1 + s2,2·w2 + . . . + s2,n·wn = o2
. . .
sk,1·w1 + sk,2·w2 + . . . + sk,n·wn = ok

where si,j is the size of input DD j observed in historical run i, and oi is the size of the output DD produced in run i. Which can be written as:

S·w = o

where S is the k×n matrix of historical input DD sizes (one row per run), w is the vector of n weights, and o is the vector of the k historical sizes of the output DD. k denotes the index of a historical execution of a customer program and is used to avoid confusion with m used above, which denotes the index of a unique output DD. The system can solve for w using the following equation:

w = S+·o

where S+ is the Moore-Penrose pseudoinverse of S. This will result in the smallest sum of squared differences between S·w and o. While computing S+ will be computationally expensive, it only needs to be computed once per customer program. Then, for each output DD of that customer program, the vector of all its historical sizes can be multiplied by S+ to get the weights vector. Once all w vectors have been solved, the resulting row vectors can be multiplied by a vector of known input sizes to predict the sizes of all output DDs.
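A minimal sketch of this least-squares solution, using hypothetical historical sizes and a standard pseudoinverse routine, is shown below.

```python
import numpy as np

# S: input DD sizes from k=4 historical runs of one customer program
# (one row per run, one column per input DD). o: observed size of one
# output DD in each run. All sizes are hypothetical.
S = np.array([[1.0e6, 2.0e6],
              [2.5e6, 1.0e6],
              [4.0e6, 3.0e6],
              [8.0e6, 2.5e6]])
o = np.array([1.4e6, 1.5e6, 3.1e6, 4.3e6])

S_plus = np.linalg.pinv(S)   # computed once per customer program
w = S_plus @ o               # least-squares weights for this output DD

# S_plus can be reused for every other output DD; then predict from new sizes.
predicted_output_size = w @ np.array([5.0e6, 2.0e6])
```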
Another approach to solving for w is using gradient descent. Gradient descent is an iterative process, where the partial derivative of a cost function C is taken in terms of each weight in order to create a gradient. This gradient is then used to adjust all weights in order to step towards a minimum in the cost function:

wt+1 = wt − α·(1/k)·( ∇C1(wt) + ∇C2(wt) + . . . + ∇Ck(wt) )

where each gradient ∇Ci has components ∂Ci/∂wt,n. Where α is the “learning rate”, wt denotes the weight vector at a particular iteration of training, wt,n denotes a value within a weight vector, and Ci denotes the cost function associated with a particular training observation out of k observations.
A gradient vector, like a derivative, describes the rate of change. This vector describes the direction and rate of fastest increase. By subtracting the gradient from the weights, the weight vector moves in the direction of fastest decrease in order to move towards a minimum. This is done iteratively, taking small steps down the gradient and re-evaluating at the next point in order to prevent overshooting a minimum.
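For illustration, a gradient-descent fit of the same weight vector might be sketched as follows; the squared-error cost, learning rate, iteration count, and column scaling are assumptions made for the sketch rather than values from the text.

```python
import numpy as np

def fit_weights_gradient_descent(S, o, learning_rate=0.1, iterations=5000):
    """Fit w so that S @ w approximates o by stepping down the gradient of a
    mean squared-error cost. Columns are scaled so one learning rate behaves
    reasonably; the weights are rescaled back before returning."""
    scale = S.max(axis=0)
    Sn = S / scale
    k, n = Sn.shape
    w = np.zeros(n)
    for _ in range(iterations):
        errors = Sn @ w - o                # prediction error per observation
        gradient = (Sn.T @ errors) / k     # average gradient over k observations
        w -= learning_rate * gradient      # small step toward a cost minimum
    return w / scale
```

Called with the same hypothetical S and o as in the previous sketch, this should converge to approximately the same weight vector as the pseudoinverse solution.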
A single neural network can be made to predict the sizes of all output DDs at once. This neural network would be constructed to have one input layer (one set) of input nodes, one output layer of output nodes, and multiple layers of nodes in between the input and output. In each layer, each node receives input from all nodes in the previous layer and computes a linear equation on these inputs. The resulting value is then passed to each node in the next layer for it to compute its own linear equation. Very complex relationships can be modeled, and external variables, such as parameters or flags passed to the customer program, can also be encoded as input to the network, which it can then account for.
The input to this algorithm can take many forms. For example: a) each input node may receive the real size of one input DD, b) the size of one input DD may be split across multiple input nodes, c) each input node may be a binary digit of the size of an input DD, d) each input node may be a binary digit relating to an order of magnitude of the size, or e) each input node may be the normalized size of an input DD (scaled to be between 0 and 1). The input and output data will then be manipulated to convert between real sizes and the data the algorithm uses during processing.
Gradient descent can also be used to train the algorithm as mentioned in the previous proposal to find the right weights and biases for each node. By providing the algorithm with the historical data, a cost function can be evaluated for each data point to grade how close the algorithm is to the desired result. By evaluating the cost function over many historical data points, a gradient can be computed to determine what weights and biases need to change in the network to minimize the cost function. Since the gradient may change over the vector space of all weights and biases, a small step is taken down the gradient, and the process is repeated many times.
There are several advantages to using a neural network. A neural network can model more complex relationships between the inputs and outputs, one model can solve for all input and output DDs of a particular customer program, and parameters that influence output size can be encoded as input to the network, which it can then account for. Some such parameters may include: input file sizes; logical record length and block size of input files; record format; day of week/year of job execution; parameters passed to the customer program, encoded as on-off flags or as an enumerated value; names of DDs, which could be encoded as an index; and duration of job execution.
Some options for a neural network include: a) a network with no data normalization of any kind. In this case, each input and output node would be equal to the unbounded size of the corresponding input/output DD. This is essentially a multi-leveled linear equation solver, where each layer of the network is a set of linear equations exactly like the first example; b) the sizes of each input and output DD would be encoded into binary, with each input/output node of the network corresponding to one specific bit of the size of one specific input or output DD; c) normalizing input data using logarithms or z-score normalization of the values, and possibly passing the resulting values through normalization functions such as sigmoid or tanh. Using logarithms will dramatically reduce the range of input sizes while maintaining a meaningful difference between extremely large and extremely small data sets. Using a typical normalization function (such as sigmoid) without logarithms would result in negligible differences between values at the extremes, even if they vary by a large quantity; d) each input and output node corresponds to a specific order of magnitude. This is a generalization of the two methods above. This has an advantage in that the system can create buckets to put each DD into based on its size, and instead of requiring 40 bits (nodes) just to describe a single 1 TB DD, the system can encode it into 10 bits or even fewer. Using this approach may provide a better distinction between the sizes of inputs, resulting in a more accurate model.
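The sketch below illustrates three of these encodings for a single DD size; the bit width, logarithm scaling constant, and bucket count are illustrative assumptions only.

```python
import numpy as np

def encode_binary(size_bytes: int, bits: int = 40) -> np.ndarray:
    """Option b): one node per bit of the size (40 bits covers roughly 1 TB)."""
    return np.array([(size_bytes >> i) & 1 for i in range(bits)], dtype=float)

def encode_log(size_bytes: int, max_log10: float = 12.0) -> float:
    """Option c): log-scale the size, then scale to roughly [0, 1]; a sigmoid
    or tanh could additionally be applied to this value."""
    return np.log10(size_bytes + 1) / max_log10

def encode_magnitude_buckets(size_bytes: int, buckets: int = 10) -> np.ndarray:
    """Option d): a one-hot bucket per order of magnitude of the size."""
    one_hot = np.zeros(buckets)
    index = min(int(np.log10(size_bytes + 1)), buckets - 1)
    one_hot[index] = 1.0
    return one_hot

node_values = encode_magnitude_buckets(1_000_000_000)  # ~1 GB -> last bucket
```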
Once data normalization is done, operation of the network is essentially the same. A neural network is conceptually split up into layers, with each layer being comprised of a set of nodes. There is one input layer, one output layer, and several middle or “hidden” layers. Starting with the first middle layer, each node in a layer receives input from every node in the previous layer. Each node has a set of weights (w) it multiplies the values (x0) by and adds a bias b:

x1 = σ( w1·x0,1 + w2·x0,2 + . . . + wm·x0,m + b )

If there are m nodes in the previous layer, and n nodes in the current layer, a layer can be processed using the following operation:

x1 = σ( W·x0 + b )

where W is an n×m matrix of weights, x0 is the vector of m outputs from the previous layer, b is a vector of n biases, and x1 is the vector of n outputs of the current layer.
The top equation is the function of each node in the network, while the bottom equation computes one layer of the network. σ is an activation function which helps to limit the range of the output. The subscripts 0 and 1 serve to differentiate between the output of the previous layer and the output of the current layer. The output of each layer is provided as input to the next, possibly passing through some sort of activation function which serves to limit the range of outputs of the node. Activation functions help to prevent values from growing too large and either becoming meaningless to the network or overflowing the limits of the data type size.
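A minimal sketch of this forward computation is shown below; the choice of a sigmoid activation for hidden layers and a linear output layer is an assumption made for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(W, b, x0, activation=sigmoid):
    """The layer equation above: x1 = sigma(W @ x0 + b)."""
    return activation(W @ x0 + b)

def forward_network(layers, x):
    """layers is a list of (W, b) pairs; the final layer is kept linear so
    predicted sizes are not squashed into the activation's output range."""
    for W, b in layers[:-1]:
        x = forward_layer(W, b, x)
    W_out, b_out = layers[-1]
    return W_out @ x + b_out
```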
Stochastic gradient descent is used to train a neural network by gradually approximating the optimal weights and biases to model the relationship between input and output. By combining all weights and biases from all layers into one vector, a gradient of the cost function can be computed in terms of all weights and biases. Note that the partial derivative of the cost function will be different from layer to layer, as layers downstream will change the impact a particular weight or bias has on the cost function.
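For illustration only, a stochastic gradient descent training loop for a small one-hidden-layer network with a squared-error cost might be sketched as follows; the layer sizes, learning rate, epoch count, and the assumption of already-normalized inputs and targets are all choices made for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden=8, lr=0.05, epochs=200, seed=0):
    """Stochastic gradient descent for a one-hidden-layer network.
    X is (k, n_in) normalized input sizes; Y is (k, n_out) normalized outputs."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (hidden, n_in)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (n_out, hidden)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        for x, t in zip(X, Y):                 # one observation per update
            h = sigmoid(W1 @ x + b1)           # hidden layer
            y = W2 @ h + b2                    # linear output layer
            dy = y - t                         # gradient of 0.5*||y - t||^2 w.r.t. y
            dW2 = np.outer(dy, h); db2 = dy
            dz = (W2.T @ dy) * h * (1.0 - h)   # back-propagate through the sigmoid
            dW1 = np.outer(dz, x); db1 = dz
            W2 -= lr * dW2; b2 -= lr * db2     # step down the gradient
            W1 -= lr * dW1; b1 -= lr * db1
    return (W1, b1), (W2, b2)
```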
Once the linear functions are computed, the system chooses a linear function for a given input/output file combination for a DD name of the customer program, inputs the size of the input file for the given DD name to be processed by the customer program into the chosen linear function, computes the predicted output file size that will result from processing the file for that DD name, and uses the predicted output file size for memory allocation and processing time prediction.
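As a usage illustration, a fitted weight vector for one output DD might be applied as follows; the weights, sizes, and 20% allocation margin are hypothetical values chosen for the sketch.

```python
import numpy as np

w = np.array([0.52, 0.31])                     # fitted weights for one output DD
todays_input_sizes = np.array([6.0e6, 2.2e6])  # sizes of today's input DDs (bytes)

predicted_bytes = float(w @ todays_input_sizes)
allocation_bytes = int(predicted_bytes * 1.2)  # pad prediction before allocating
print(f"predicted {predicted_bytes:,.0f} bytes; allocate {allocation_bytes:,} bytes")
```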
It is noted that a mainframe, for example, includes a data communication interface for packet data communication and an input/output (I/O) controller. The I/O controller manages communication to various I/O elements and storage facilities. Storage facilities include one or more direct access storage devices (DASD) and/or one or more tape systems. Such storage facilities provide storage for data, jobs for managing batch processing and applications. The mainframe includes an internal communication bus providing a channel of communication between the communications ports, the I/O controller, and one or more system processors. Each system processor includes one or more central processing units (CPUs) and local memory corresponding to each CPU, as well as shared memory available to any CPU. An operating system (OS) executed by the system processors manages the various jobs and applications currently running to perform appropriate processing. The OS also provides a system management facility (SMF) and open exit points for managing the operation of the mainframe and the various jobs and applications currently running. The hardware elements, operating systems, jobs and applications of such mainframes may be conventional in nature. Of course, the mainframe functions may be implemented in a distributed fashion on a number of similar platforms, to distribute the processing load, and/or replicated across one or more similar platforms, to provide redundancy for the processing. As such,
It is also noted that a computer type user terminal device, such as a PC, similarly includes a data communication interface, CPU, main memory and one or more mass storage devices for storing user data and the various executable programs. The various types of user terminal devices will also include various user input and output elements. A computer, for example, may include a keyboard and a cursor control/selection device such as a mouse, trackball, or touchpad; and a display for visual outputs. The hardware elements, operating systems and programming languages of such user terminal devices may be conventional in nature.
Hence, aspects of the methods for predicting the resources required by jobs executing within a mainframe computing environment outlined above may be embodied in programming. Program aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine readable medium. “Storage” type media include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through a global information network (e.g. the Internet) or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the mainframe platform that will execute the various jobs. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to hold datasets of sequential files and customer/prediction programs for enterprise applications. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
It will be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein. Relational terms such as first and second and the like may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “includes,” “including,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “a” or “an” does not, without further constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Unless otherwise stated, any and all measurements, values, ratings, positions, magnitudes, sizes, and other specifications that are set forth in this specification, including in the claims that follow, are approximate, not exact. They are intended to have a reasonable range that is consistent with the functions to which they relate and with what is customary in the art to which they pertain.
While the foregoing has described what are considered to be the best mode and/or other examples, it is understood that various modifications may be made therein and that the subject matter disclosed herein may be implemented in various forms and examples, and that they may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all modifications and variations that fall within the true scope of the present concepts.
This application claims priority to U.S. Provisional Application No. 63/334,362, filed Apr. 25, 2022, the contents of which are incorporated herein by reference in their entirety for all purposes.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2023/060705 | 4/24/2023 | WO | |
| Number | Date | Country |
|---|---|---|
| 63334362 | Apr 2022 | US |