The present disclosure generally relates to allocating resources of computer processors. More specifically, the present disclosure relates to optimizing the allocation of computing resources (e.g., Graphics Processing Unit (GPU)) in a production workspace, so that a utilization rate of the computing resources is increased.
Rapid advances have been made in the past several decades in the fields of computer technology and artificial intelligence. For example, GPU cards—as an example of computer processors—have been used to perform tasks associated with data analysis, such as machine learning model training, the non-limiting examples of which may include image classification, video analysis, natural language processing, etc. Since GPU cards are expensive, it is desirable to fully utilize the resources of the GPU cards (e.g., with as low of an idle rate as possible for the GPU cards). However, existing schemes of using GPU cards to perform machine learning model training have not optimized the allocation of the computing resources of the GPU cards. Consequently, the GPU cards (or portions thereof) may have an excessively high idle rate, which leads to undesirable waste.
Therefore, although existing schemes of allocating computing resources have been generally adequate for their intended purposes, they have not been entirely satisfactory in every aspect.
The accompanying drawings, which are included to provide further understanding and are incorporated in and constitute a part of this specification, illustrate disclosed embodiments and, together with the description, serve to explain the principles of the disclosed embodiments. In the drawings:
In the following description of the various embodiments, reference is made to the accompanying drawings identified above and which form a part hereof, and in which is shown by way of illustration various embodiments in which aspects described herein may be practiced. It is to be understood that other embodiments may be utilized, and structural and functional modifications may be made without departing from the scope described herein. Various aspects are capable of other embodiments and of being practiced or being carried out in various different ways.
Rapid advances in the fields of computer technology have produced increasingly powerful computer processors, such as Graphics Processing Units (GPUs). One advantage of GPUs is that they are configured for parallel processing, which is suitable for processing the vast amounts of data typically involved in artificial intelligence (AI), such as in a machine learning context. GPU cards are expensive, and therefore it is desirable to utilize them fully when vast amounts of data need to be processed in a production environment. However, existing schemes of allocating GPU resources have not been able to accurately predict how many GPU resources are needed to execute each task (e.g., a given machine learning job). As a result, significant portions of a GPU card (that has been assigned to execute a data analysis task) may sit idle, which wastes precious GPU resources.
The present disclosure overcomes the above problems with the use of a research workspace that is configured for research and/or experimentation, as well as a production workspace that is used in an actual production environment. For example, a first version of a data analysis task (e.g., a machine learning job with a small amount of training data) is first executed in a research workspace. The research workspace may have a plurality of virtualized resource units that each correspond to a small amount of computing resources offered by a physical device. For example, each of the virtualized resource units may represent an amount of processing power equivalent to a small fraction of a physical GPU card. Statistics may be gathered from the execution of the first version of the data analysis task. For example, the gathered statistics may include a size of the training data, the number of the virtualized resource units used to execute the task, the idle rate of the virtualized resource units during the execution, the total amount of execution time, etc. A second version of the data analysis task (e.g., a machine learning job with the same algorithm as the one executed in the research workspace but with a much greater amount of training data) may then be sent to the production workspace for execution. The statistics gathered via the research workspace may then be used to determine the resource allocation for the second version of the data analysis task in the production workspace. For example, if a maximum amount of time for completing the second version of the data analysis task in the production workspace is specified (e.g., by a Service Level Agreement (SLA)), a calculation may be made as to how many physical GPU cards, and/or what portions of the physical GPU cards, should be allocated to execute the second version of the data analysis task in order for the task to be completed within the specified amount of time. Such a calculation is made so that the idle rate of the GPU cards is low (e.g., a high utilization rate of the GPU cards), which reduces waste of resources and improves efficiency. As such, the concepts of the present disclosure are directed to improvements in computer technology, since the computing resources such as physical GPU cards are allocated to the relevant data analysis tasks with enhanced precision. In addition, since electronic resource waste is reduced, the overall cost associated with the architecture of the present disclosure is also reduced.
Still referring to
The jobs that come through the job de-duplicator 120 (e.g., the jobs that were not removed by the job de-duplicator 120) are then sent to a job queue 130 according to the process flow 100. In some embodiments, the job queue 130 includes a temporary data storage, in which the jobs may temporarily reside before they are sent away for further processing. In some embodiments, the job queue 130 may employ a first-in-first-out (FIFO) methodology for receiving and outputting the jobs. In other embodiments, the job queue 130 may employ alternative methodologies that are different from the FIFO methodology. For example, the jobs may be designated with their respective priority levels. The jobs that have higher priority levels may be outputted sooner than the jobs that have lower priority levels.
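For purposes of illustration only, the following is a minimal sketch (in Python, which the present disclosure does not prescribe) of a job queue in which jobs with higher priority levels are output sooner, while jobs with equal priority leave in FIFO order. The class name JobQueue, its methods, and the example job identifiers are hypothetical and not part of the process flow 100.

```python
import heapq
import itertools

class JobQueue:
    """Illustrative job queue: jobs with higher priority levels are output sooner;
    jobs with equal priority leave in first-in-first-out (FIFO) order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order, used to break priority ties

    def put(self, job_id, priority=0):
        # Negate the priority so that heapq's min-heap pops the highest priority first.
        heapq.heappush(self._heap, (-priority, next(self._counter), job_id))

    def get(self):
        _, _, job_id = heapq.heappop(self._heap)
        return job_id

queue = JobQueue()
queue.put("sample_job_id", priority=1)
queue.put("urgent_job_id", priority=5)
print(queue.get())  # "urgent_job_id" comes out first despite arriving later
```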
Still referring to
Due to the unique nature of the research workspace 140, the present disclosure configures the computing resources allocated to the research workspace 140 by virtualizing them into a plurality of small units. For example, in the embodiment shown in
Similarly, the virtualized CPU and the virtualized electronic memory may include small portions of one or more physical CPU cards and one or more physical electronic memory devices, respectively. In that regard, the physical CPU cards and/or electronic memory devices may be considered parts of physical computing resources 160 that also include the physical GPU cards 161-164. In some embodiments, the physical computing resources 160 may include computing resources (e.g., GPUs, CPUs, electronic memories) in a centralized environment, such as a data center. In other embodiments, the physical computing resources 160 may include computing resources (e.g., GPUs, CPUs, electronic memories) in a decentralized environment, such as in an edge computing environment. For example, edge computing may refer to a computing paradigm that performs computing and/or data storage at nodes that are at or close to the end users, where the sources of data originate. Since edge computing allows computing of the data to be closer to the sources of the data, the computing may be done at greater speeds and/or larger volumes. As a result, response times and/or bandwidth may be improved.
In any case, regardless of what is included in the physical computing resources 160 or its context, it is understood that the virtualized computing resource units 151-154 each correspond to a small fraction (e.g., a fraction substantially less than 1) of the processing power offered by the physical computing resources 160. In some embodiments, additional and/or different types of computer hardware (e.g., other than CPU, GPU, or electronic memory) may also be included in each of the virtualized computing resource units 151-154. In addition, although four of such virtualized computing resource units 151-154 are illustrated herein for reasons of simplicity, it is understood that many more of the virtualized computing resource units 151-154 (e.g., hundreds, thousands, or tens of thousands) may be implemented in the research workspace 140. Furthermore, it is understood that although the virtualized computing resource units 151-154 illustrated in
Regardless of how many virtualized computing resource units are implemented, or what type of virtual resources are included in each of them, it is understood that they may be utilized individually or collectively to execute the data analysis tasks within the research workspace 140. For example, one of the jobs that is sent to the research workspace 140 may include a job 165, which may include a plurality of attributes. As a non-limiting example herein, the job 165 has a job identifier (ID) that is "sample_job_id." Such a job ID is used to identify the job 165 from a plurality of other jobs in the research workspace 140. The job 165 has a job name that is "predict backorder rate." As the name indicates, a goal of the job 165 is to make a specific prediction, for example, a backorder rate of a product inventory of a particular merchant. The job 165 has a job start time that is "2022-03-21:14:24", which indicates that the execution of the job 165 began on the date of March 21 of the year 2022, at the time of 14:24. The job 165 has a training data size of "100G", which indicates there are one hundred gigabytes of data in the training data associated with the job 165. The job 165 has virtualized computing resource units "[1,2,3,4]", which indicates that the virtualized computing resource units 151-154 (which correspond to the "virtualized computing resource units [1, 2, 3, 4]" herein) are used to execute the job 165. The job 165 may also have a plurality of other attributes, but these attributes are not specifically discussed herein for reasons of simplicity.
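For purposes of illustration only, the following is a minimal sketch (in Python) of how the attributes of a job such as the job 165 might be represented. The class name ResearchJob and its field names are hypothetical; the values mirror the example above.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchJob:
    """Illustrative record of a job submitted to the research workspace."""
    job_id: str
    job_name: str
    start_time: str               # e.g., "2022-03-21:14:24"
    training_data_size_gb: float  # e.g., 100 for "100G"
    virtualized_units: list = field(default_factory=list)

job_165 = ResearchJob(
    job_id="sample_job_id",
    job_name="predict backorder rate",
    start_time="2022-03-21:14:24",
    training_data_size_gb=100,
    virtualized_units=[1, 2, 3, 4],
)
```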
The research workspace 140 keeps electronic records of each job executed therein and the combinations of virtualized computing resource units utilized to execute the jobs. In some embodiments, these electronic records are maintained in the form of a table 170, which is shown in
The table 170 includes a plurality of rows, where each row includes the data of a record for a different one of the virtualized computing resource units 1-4 (e.g., corresponding to the virtualized computing resource units 151-154). The table 170 also includes a plurality of columns, where each column represents a different aspect of the records. For example, the column "virtualized computing resource units" lists the names of the virtualized computing resource units (e.g., the virtualized computing resource units 151-154) used to execute the job 165. The column "Is occupied" lists the occupied status of each of the different ones of the virtualized computing resource units 1-4. For example, a status of "TRUE" indicates that the corresponding virtualized computing resource unit is occupied (e.g., being utilized to execute the job 165). Conversely, a status of "FALSE" indicates that the corresponding virtualized computing resource unit is not used to execute the job 165. The column "Last occupied time" indicates the date and time at which the corresponding virtualized computing resource unit was utilized to execute the job 165. The column "Idle rate" indicates a percentage of time during which the corresponding virtualized computing resource unit is idle, rather than being used to execute the job 165. For example, an idle rate of 95% for the virtualized computing resource unit 1 means that 95% of the time, the virtualized computing resource unit 151 is not being used to execute the job 165. Alternatively stated, the idle rate of 95% indicates that the virtualized computing resource unit 151 is used 5% of the time to execute the job 165. It is understood that the idle rate may be replaced by a utilization rate in alternative embodiments. In that case, a column of "utilization rate" would be 5%, 4%, 2%, and 5% in the example herein. In some embodiments, the table 170 may include both an idle rate and a utilization rate. Of course, it is understood that a goal of the present disclosure is to increase the utilization rate and to decrease the idle rate, so as to increase the efficiency and to reduce the waste of using the computing resources to perform data analysis. The table 170 may include additional columns that represent additional aspects of the records kept for the virtualized computing resource units 1-4 during their execution of the job 165. However, these additional columns are not discussed herein for reasons of simplicity. The content of the records maintained by the table 170 may also be referred to as the meta-data of the job 165.
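For purposes of illustration only, the following is a minimal sketch (in Python) of an in-memory form of the records described above. The idle rates correspond to the utilization rates of 5%, 4%, 2%, and 5% mentioned in the example; the "last_occupied_time" values and the dictionary keys are illustrative assumptions rather than prescribed by the table 170.

```python
# Illustrative in-memory form of the records of the table 170.
table_170 = [
    {"unit": 1, "is_occupied": True, "last_occupied_time": "2022-03-21:14:24", "idle_rate": 0.95},
    {"unit": 2, "is_occupied": True, "last_occupied_time": "2022-03-21:14:24", "idle_rate": 0.96},
    {"unit": 3, "is_occupied": True, "last_occupied_time": "2022-03-21:14:24", "idle_rate": 0.98},
    {"unit": 4, "is_occupied": True, "last_occupied_time": "2022-03-21:14:24", "idle_rate": 0.95},
]

def utilization_rate(record):
    """The utilization rate is simply the complement of the idle rate."""
    return 1.0 - record["idle_rate"]

print([utilization_rate(r) for r in table_170])  # [0.05, 0.04, 0.02, 0.05]
```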
In some embodiments, the table 170 is dynamically updated (e.g., in real-time) throughout the execution of the job 165 in the research workspace 140. At some point, the job 165 may be completed. In some embodiments, the execution of the job 165 may be deemed complete when the data scientist 110 that submitted the job 165 sends an indication to the research workspace 140. For example, the data scientist 110 may trigger the indication by pressing a button in a software program interface configured to interact with the research workspace 140. This may occur when the data scientist 110 is satisfied with the results of the job 165. For example, the job 165 may offer a backorder prediction accuracy above a threshold defined by the data scientist 110.
Regardless of how the job 165 is completed, the data maintained by the table 170 for the job 165 may be useful when computing resources need to be allocated to jobs similar to the job 165 in a production environment. For example, if a job that is much larger in size but otherwise substantially identical in terms of algorithm to the job 165 needs to be executed in a production environment, the amount of computing resources needed to execute that job may be calculated based on the larger data size of that job, the data size of the job 165, and the computing resources used to execute the job 165 in the research workspace 140. In some embodiments, the amount of computing resources needed to execute the larger job may be extrapolated based on a linear relationship with the amount of computing resources used to execute the job 165 in the research workspace 140, as will be discussed in more detail below.
Still referring to
When the job 165 is promoted and sent to the production workspace 180, it may include a substantially similar version (e.g., in terms of algorithm) of the task that was submitted to and executed in the research workspace 140, but with a substantially larger data size. For example, whereas the job 165 in the research workspace 140 has 100 gigabytes of data, the promoted version of the job 165 sent to the production workspace 180 may have terabytes, petabytes, or exabytes of data. Due to the large size of the data of the promoted version of the job 165, its execution may involve multiple physical GPU cards 161-164, and the resource allocation of the physical GPU cards 161-164 may need to be optimized. As discussed below in more detail, such a resource allocation optimization may include extrapolating, based on the computing resources used to execute the original version of the job 165 and the difference between the data sizes of the original version of the job 165 and the promoted version of the job 165, the amount of computing resources needed to execute the promoted version of the job 165.
When the promoted version of the job 165 is sent to the production workspace 180, it first resides in a queue 185. Similar to the job queue 130, the queue 185 may include a temporary data storage, so that the promoted version of the job 165 (along with other jobs) can be stored therein before they can be executed when the computing resources of the production workspace 180 become available.
The production workspace 180 allocates computing resources to execute the promoted version of the job 165 via a scheduler 190. The scheduler 190, which may include a software program in some embodiments, has access to the physical computing resources 160. For example, the scheduler 190 can assign portions of any one of the physical GPU cards 161-164 to any job that needs to be executed by the production workspace 180. To reduce waste, it is desirable to lower the idle rate of the physical GPU cards. Unfortunately, existing systems may not be able to provide an accurate estimate of how many GPU resources are needed to execute any given job. As a result, too many GPU resources may be allocated for a job, which results in a high idle rate of the physical GPU cards and therefore unnecessary waste of GPU resources. Alternatively, not enough GPU resources may be allocated for the job, which may result in an unacceptably slow execution of the job.
The present disclosure optimizes the computing resource allocation by calculating, with enhanced precision, what percentage of each of the physical GPU cards 161-164 should be allocated to any given job in the production workspace 180, based on the history of execution of the smaller version of that job in the research workspace 140. Using the job 165 as a simplified example herein, suppose that the promoted version of the job 165 that needs to be executed in the production workspace 180 has 100 petabytes of data, which is a million times greater than the 100 gigabytes of data associated with the job 165 when it was executed in the research workspace 140. Also suppose that each of the virtualized computing resource units 151-154 used to execute the job 165 in the research workspace 140 has processing power that is equal to one hundred thousandth (or 0.001%) of the processing power of a single one of the physical GPU cards 161-164. In addition, suppose that the execution of the job 165 in the research workspace 140 using the combination of the virtualized computing resource units 151-154 took 2 hours, where the virtualized computing resource units 151-154 each had an idle rate of 25% (e.g., equivalent to a utilization rate of 75%, since the utilization rate=100%-idle rate).
The scheduler 190 then uses the above data to calculate how much computing resources to allocate to the promoted version of the job 165 in the production workspace 180. In some embodiments, the maximum amount of time given to finish the execution of the promoted version of the job 165 is already known, for example, via an SLA. As a simple example, the SLA may specify that the promoted version of the job 165 should be completed within 24 hours. Since the computing resources of the virtualized computing resource units 151-154 and the computing resources of the physical GPU cards 161-164 may have a linear relationship, as do the data sizes of the original version of the job 165 executed in the research workspace 140 and the promoted version of the job 165 to be executed in the production workspace 180, the optimal computing resources allocated for the execution of the promoted version of the job 165 in the production workspace 180 may be calculated using a simple algebraic equation:
X=A*B*C*D*E, where: X is the number of physical GPU cards (or the equivalent amount of GPU processing power) to be allocated in the production workspace 180; A is the ratio between the data size of the promoted version of the job 165 and the data size of the original version of the job 165 (e.g., 100 petabytes divided by 100 gigabytes, or 1,000,000, in this example); B is the fraction of the processing power of a single physical GPU card represented by each of the virtualized computing resource units 151-154 (e.g., 0.001% in this example); C is the number of virtualized computing resource units used to execute the job 165 in the research workspace 140 (e.g., 4 in this example); D is the ratio between the execution time of the job 165 in the research workspace 140 and the maximum amount of time allotted to complete the promoted version of the job 165 in the production workspace 180 (e.g., 2 hours divided by 24 hours, or 1/12, in this example); and E is the utilization rate of the virtualized computing resource units 151-154 during the execution of the job 165 in the research workspace 140 (e.g., 75% in this example).
Based on the above numbers, X=1,000,000*0.001%*4*(1/12)*0.75=2.5. This means that 2.5 physical GPU cards 161-164 should be allocated to the production workspace 180 to ensure that the promoted version of the job 165 can be completed within the allotted time (e.g., 24 hours in this simplified example). In some embodiments, the number X may be adjusted upwards slightly, for example, by a predefined percentage (e.g., 1% to 10%), in order to provide a margin of safety. For example, the margin of safety may allow the utilization rate of the allocated portions of the physical GPU cards 161-164 to fall slightly below 100% (e.g., having an idle rate slightly above 0%), and still ensure that the SLA-specified time limit of 24 hours is met.
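For purposes of illustration only, the following is a minimal sketch (in Python) of the extrapolation X=A*B*C*D*E using the worked numbers above. The function name gpu_cards_needed, its parameter names, and the safety_margin parameter are illustrative choices, not prescribed by the scheduler 190.

```python
def gpu_cards_needed(
    production_data_size,   # A numerator: data size of the promoted version of the job
    research_data_size,     # A denominator: data size of the job in the research workspace
    unit_fraction_of_gpu,   # B: fraction of one physical GPU card per virtualized unit
    num_virtualized_units,  # C: virtualized units used in the research workspace
    research_hours,         # D numerator: execution time in the research workspace
    allowed_hours,          # D denominator: maximum time allowed (e.g., per the SLA)
    utilization_rate,       # E: utilization rate of the virtualized units
    safety_margin=0.0,      # optional upward adjustment (e.g., 0.01 to 0.10)
):
    a = production_data_size / research_data_size
    d = research_hours / allowed_hours
    x = a * unit_fraction_of_gpu * num_virtualized_units * d * utilization_rate
    return x * (1.0 + safety_margin)

# Worked example from the text: 100 PB vs. 100 GB, 0.001% of a card per unit,
# 4 units, 2 hours in research vs. a 24-hour SLA, 75% utilization.
x = gpu_cards_needed(100e15, 100e9, 0.00001, 4, 2, 24, 0.75)
print(x)  # 2.5 physical GPU cards
```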
Regardless of the exact value determined for X, it is understood that X represents the amount of computing processing power that is equivalent to the hardware processing power offered by X number of GPU cards (e.g., GPU cards 161-164). However, there are many different ways to apportion this amount of hardware processing power to execute the promoted version of the job 165 in the production workspace 180. For example, the scheduler 190 may allocate the physical GPU cards 161 and 162 in their entireties, as well as half of the physical GPU card 163, to execute the promoted version of the job 165 in the production workspace 180. As another example, the scheduler 190 may allocate the physical GPU card 161 in its entirety, as well as half of each of the physical GPU cards 162, 163, and 164, to execute the promoted version of the job 165 in the production workspace 180. As a further example, the scheduler 190 may allocate 80% of the physical GPU card 161, 70% of the physical GPU card 162, 40% of the physical GPU card 163, and 60% of the physical GPU card 164, to execute the promoted version of the job 165 in the production workspace 180. It is understood that in embodiments where a portion of a given physical GPU card 161-164 (but not the entire GPU card) is allocated, the average utilization rate discussed above may apply to the portion of the physical GPU card that is allocated. For example, if 40% of the physical GPU card 161 is allocated, then the average utilization rate of 100% discussed above may refer to the fact that the 40% of the allocated portion of the physical GPU card 161 is being utilized 100% of the time, regardless of what is happening with the rest of the physical GPU card 161 that is not allocated for the execution of the promoted version of the job 165.
In some embodiments, the physical GPU cards 161-164 may be divided into a plurality of GPU blocks 195. For example, each of the GPU blocks 195 may represent 10% (or some other suitable percentage) of a physical GPU card. These GPU blocks 195 may be substantially identical to one another and offer substantially identical processing power. As shown in
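For purposes of illustration only, the following is a minimal sketch (in Python) of dividing the physical GPU cards into 10% blocks and greedily reserving enough idle blocks to cover a fractional card requirement such as the 2.5 cards computed above. The function name allocate_blocks and the block bookkeeping are illustrative assumptions; the scheduler 190 may apportion the blocks in other ways, as described above.

```python
from math import ceil

BLOCKS_PER_CARD = 10  # each block represents 10% of a physical GPU card, per the example

def allocate_blocks(required_cards, free_blocks):
    """Greedily reserve enough blocks (card_id, block_index) to cover the requirement.

    required_cards: fractional number of GPU cards needed (e.g., 2.5).
    free_blocks: list of currently idle blocks, each a (card_id, block_index) tuple.
    """
    needed = ceil(required_cards * BLOCKS_PER_CARD)
    if needed > len(free_blocks):
        raise RuntimeError("not enough idle GPU blocks to satisfy the request")
    allocated = free_blocks[:needed]
    remaining = free_blocks[needed:]
    return allocated, remaining

# Example: four cards (161-164), all blocks idle, and a 2.5-card requirement -> 25 blocks reserved.
free = [(card, b) for card in (161, 162, 163, 164) for b in range(BLOCKS_PER_CARD)]
allocated, free = allocate_blocks(2.5, free)
print(len(allocated))  # 25
```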
It is also understood that similar to the job 165, other jobs (e.g., other types of machine-learning jobs) may run through the research workspace 140 to allow data to be gathered on their execution, and then the promoted versions of these jobs may then be executed in the production workspace 180. The data gathered for these jobs in the research workspace 140 may be used to determine how the computing resources of the production workspace 180 should be allocated to these jobs. Since there may be multiple jobs at the production workspace 180 at any point in time, the scheduler 190 may utilize a plurality of scheduling schemes to determine the order of execution of these jobs. In some embodiments, the scheduler 190 may use a “shorter job first” scheduling scheme, in which the job with the shortest execution time (which may be allotted or projected) will be executed first. In some embodiments, the scheduler 190 may use a “round robin” scheduling scheme, in which time slots are assigned to each job (which may be done in equal portions) in a cyclic (e.g., circular) order, such that all jobs are handled without priority. In some embodiments, the scheduler 190 may use a “first-in-first-out (FIFO)” scheduling scheme, in which the job that comes out of the queue 185 first will be executed first. In some embodiments, the scheduler 190 may use a “shortest time to execute and complete first” scheduling scheme, in which the jobs that have the shortest time to be executed and completed will be executed first. In some embodiments, the scheduler 190 may use a “completely fair” scheduling scheme, which is the default scheduling algorithm in the Linux operating system.
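For purposes of illustration only, the following is a minimal sketch (in Python) of one of the scheduling schemes listed above, namely executing the job with the shortest projected execution time first. The function name shortest_job_first and the tuple format are illustrative assumptions.

```python
import heapq

def shortest_job_first(jobs):
    """Order jobs by projected execution time, shortest first.

    jobs: iterable of (job_id, projected_hours) tuples.
    """
    heap = [(hours, job_id) for job_id, hours in jobs]
    heapq.heapify(heap)
    while heap:
        hours, job_id = heapq.heappop(heap)
        yield job_id

order = list(shortest_job_first([("job_a", 12), ("job_b", 3), ("job_c", 7)]))
print(order)  # ['job_b', 'job_c', 'job_a']
```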
Regardless of how the scheduler 190 schedules the different jobs and/or the exact manner in which the computing resources corresponding to the physical GPU cards 161-164 are allocated to each of the jobs in the production workspace 180, it is understood that the allocation of these resources is optimized by the present disclosure. For example, using the data gathered by executing a job with smaller data size in the research workspace 140, the amount of computing resources needed to execute a promoted version of that job with a much larger data size can be accurately determined. This allows each of the physical GPU cards 161-164 to be utilized as fully as possible, which leads to a lower idle rate and a reduction in waste of computing resources compared to existing systems.
In the embodiment shown in
User device 210, merchant server 240, payment provider server 270, acquirer host 265, issuer host 268, and payment network 272 may each include one or more electronic processors, electronic memories, and other appropriate electronic components for executing instructions such as program code and/or data stored on one or more computer readable mediums to implement the various applications, data, and steps described here. For example, such instructions may be stored in one or more computer readable media such as memories or data storage devices internal and/or external to various components of system 200, and/or accessible over network 260.
Network 260 may be implemented as a single network or a combination of multiple networks. For example, in various embodiments, network 260 may include the Internet or one or more intranets, landline networks, wireless networks, and/or other appropriate types of networks. Software programs (e.g., programs developed by the payment provider or by another entity) may be installed on the network 260 to facilitate the offer solicitation, transmission, and presentation processes discussed above. The network 260 may also include a blockchain network in some embodiments.
User device 210 may be implemented using any appropriate hardware and software configured for wired and/or wireless communication over network 260. For example, in one embodiment, the user device may be implemented as a personal computer (PC), a smart phone, a smart phone with additional hardware such as NFC chips or BLE hardware, a wearable device with similar hardware configurations (such as a gaming device or a Virtual Reality headset), a device that talks to a smart phone with unique hardware configurations and running appropriate software, a laptop computer, and/or other types of computing devices capable of transmitting and/or receiving data, such as an IPHONE™ or IPAD™ from APPLE™.
User device 210 may include one or more browser applications 215 which may be used, for example, to provide a convenient interface to permit user 205 to browse information available over network 260. For example, in one embodiment, browser application 215 may be implemented as a web browser configured to view information available over the Internet, such as a user account for online shopping and/or merchant sites for viewing and purchasing goods and/or services.
Still referring to
User device 210 also may include other applications 225 to perform functions, such as email, texting, voice and IM applications that allow user 205 to send and receive emails, calls, and texts through network 260, as well as applications that enable the user to communicate, transfer information, make payments, and otherwise utilize a digital wallet through the payment provider as discussed here. In some embodiments, these other applications 225 may include a mobile application downloadable from an online application store (e.g., from the APPSTORE™ by APPLE™). The mobile application may be developed by the payment provider or by another entity, such as an offer aggregation entity. The mobile application may then communicate with other devices to perform various transaction processes. In some embodiments, the execution of the mobile application may be done locally without contacting an external server such as the payment provider server 270. In other embodiments, one or more processes associated with the execution of the mobile application may involve or be performed in conjunction with the payment provider server 270 or another entity. In addition to allowing the user to receive, accept, and redeem offers, such a mobile application may also allow the user 205 to send payment transaction requests to the payment provider server 270, which includes communication of data or information needed to complete the request, such as funding source information.
User device 210 may include one or more user identifiers 230 which may be implemented, for example, as operating system registry entries, cookies associated with browser application 215, identifiers associated with hardware of user device 210, or other appropriate identifiers, such as used for payment/user/device authentication. In one embodiment, user identifier 230 may be used by a payment service provider to associate user 205 with a particular account maintained by the payment provider. A communications application 222, with associated interfaces, enables user device 210 to communicate within networked system 200.
In conjunction with user identifiers 230, user device 210 may also include a trusted zone 235 owned or provisioned by the payment service provider with agreement from a device manufacturer. The trusted zone 235 may also be part of a telecommunications provider SIM that is used to store appropriate software by the payment service provider capable of generating secure industry standard payment credentials as a proxy to user payment credentials based on user 205's credentials/status in the payment providers system/age/risk level and other similar parameters.
Still referring to
Merchant server 240 also may include a checkout application 255 which may be configured to facilitate the purchase by user 205 of goods or services online or at a physical POS or store front. Checkout application 255 may be configured to accept payment information from or on behalf of user 205 through payment provider server 270 over network 260. For example, checkout application 255 may receive and process a payment confirmation from payment provider server 270, as well as transmit transaction information to the payment provider and receive information from the payment provider (e.g., a transaction ID). Checkout application 255 may be configured to receive payment via a plurality of payment methods including cash, credit cards, debit cards, checks, money orders, or the like. The merchant server 240 may also be configured to generate offers for the user 205 based on data received from the user device 210 via the network 260.
Payment provider server 270 may be maintained, for example, by an online payment service provider which may provide payment between user 205 and the operator of merchant server 240. In this regard, payment provider server 270 may include one or more payment applications 275 which may be configured to interact with user device 210 and/or merchant server 240 over network 260 to facilitate the purchase of goods or services, communicate/display information, and send payments by user 205 of user device 210.
The payment provider server 270 also maintains a plurality of user accounts 280, each of which may include account information 285 associated with consumers, merchants, and funding sources, such as credit card companies. For example, account information 285 may include private financial information of users of devices such as account numbers, passwords, device identifiers, usernames, phone numbers, credit card information, bank information, or other financial information which may be used to facilitate online transactions by user 205. Advantageously, payment application 275 may be configured to interact with merchant server 240 on behalf of user 205 during a transaction with checkout application 255 to track and manage purchases made by users and which and when funding sources are used.
A transaction processing application 290, which may be part of payment application 275 or separate, may be configured to receive information from a user device and/or merchant server 240 for processing and storage in a payment database 295. Transaction processing application 290 may include one or more applications to process information from user 205 for processing an order and payment using various selected funding instruments, as described here. As such, transaction processing application 290 may store details of an order from individual users, including funding source used, credit options available, etc. Payment application 275 may be further configured to determine the existence of and to manage accounts for user 205, as well as create new accounts if necessary.
The payment provider server 270 may also include a computing resource allocation module 298 that is configured to optimize the allocation of computing resources in accordance with the process flow 100 discussed above. For example, the computing resource allocation module 298 may include modules to configure the research workspace 140 and/or the production workspace 180, including the virtualized computing resource units 150 and the physical computing resources 160 discussed above. The computing resource allocation module 298 may leverage the statistics extracted during the execution of a data analysis task (e.g., a machine learning job) in the research workspace 140 to determine how the computing resources of physical computing devices (e.g., GPU cards) should be allocated to execute the data analysis task in the production workspace 180. It is understood that although the computing resource allocation module 298 is shown to be implemented on the payment provider server 270 in the embodiment of
The payment network 272 may be operated by payment card service providers or card associations, such as DISCOVER™, VISA™, MASTERCARD™, AMERICAN EXPRESS™, RUPAY™, CHINA UNION PAY™, etc. The payment card service providers may provide services, standards, rules, and/or policies for issuing various payment cards. A network of communication devices, servers, and the like also may be established to relay payment related information among the different parties of a payment transaction.
Acquirer host 265 may be a server operated by an acquiring bank. An acquiring bank is a financial institution that accepts payments on behalf of merchants. For example, a merchant may establish an account at an acquiring bank to receive payments made via various payment cards. When a user presents a payment card as payment to the merchant, the merchant may submit the transaction to the acquiring bank. The acquiring bank may verify the payment card number, the transaction type and the amount with the issuing bank and reserve that amount of the user's credit limit for the merchant. An authorization will generate an approval code, which the merchant stores with the transaction.
Issuer host 268 may be a server operated by an issuing bank or issuing organization of payment cards. The issuing banks may enter into agreements with various merchants to accept payments made using the payment cards. The issuing bank may issue a payment card to a user after a card account has been established by the user at the issuing bank. The user then may use the payment card to make payments at or with various merchants who agreed to accept the payment card.
As discussed above, the computing resource allocation scheme of the present disclosure may apply to various types of data analysis tasks, such as machine learning. In some embodiments, machine learning may be used to predict and/or detect fraud. For example, nefarious entities may pose as legitimate users such as the user 205. Using training data such as data pertaining to the user's historical or current activities and/or behavioral patterns, a machine learning model may be trained to predict whether a seemingly legitimate user may actually be a bad-faith actor seeking to perpetrate fraud. In some other embodiments, machine learning may be used to predict metrics for merchants that operate the merchant server 240. For example, based on data pertaining to sales of goods of the merchant, a machine learning model may be trained to predict when to reorder the goods to refill the merchant's inventory.
In some embodiments, the machine learning processes of the present disclosure (e.g., the job 165 discussed above with reference to
In some embodiments, each of the nodes 316-318 in the hidden layer 304 generates a representation, which may include a mathematical computation (or algorithm) that produces a value based on the input values received from the nodes 308-314. The mathematical computation may include assigning different weights to each of the data values received from the nodes 308-314. The nodes 316 and 318 may include different algorithms and/or different weights assigned to the data variables from the nodes 308-314 such that each of the nodes 316-318 may produce a different value based on the same input values received from the nodes 308-314. In some embodiments, the weights that are initially assigned to the features (e.g., input values) for each of the nodes 316-318 may be randomly generated (e.g., using a computer randomizer). The values generated by the nodes 316 and 318 may be used by the node 322 in the output layer 306 to produce an output value for the artificial neural network 300. When the artificial neural network 300 is used to implement machine learning, the output value produced by the artificial neural network 300 may indicate a likelihood of an event.
The artificial neural network 300 may be trained by using training data. For example, the training data herein may include the data pertaining to the electronic modules that is collected using various time period lengths. By providing training data to the artificial neural network 300, the nodes 316-318 in the hidden layer 304 may be trained (e.g., adjusted) such that an optimal output is produced in the output layer 306 based on the training data. By continuously providing different sets of training data and penalizing the artificial neural network 300 when the output of the artificial neural network 300 is incorrect, the artificial neural network 300 (and specifically, the representations of the nodes in the hidden layer 304) may be trained to improve its performance in data classification. Adjusting the artificial neural network 300 may include adjusting the weights associated with each node in the hidden layer 304.
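For purposes of illustration only, the following is a minimal sketch (in Python) of a feed-forward pass through a tiny network shaped like the artificial neural network 300 described above, taken here as four input nodes (308-314), two hidden nodes (316 and 318), and one output node (322), with randomly generated initial weights. The choice of tanh and sigmoid activations, the input values, and the layer sizes as interpreted here are illustrative assumptions rather than requirements of the present disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes mirror the description: four input nodes, two hidden nodes, one output node.
# The weights are initially random, as described above.
w_hidden = rng.normal(size=(4, 2))
w_output = rng.normal(size=(2, 1))

def forward(x):
    """One forward pass: each hidden node computes its own weighted value from the inputs,
    and the output node combines the hidden values into a likelihood between 0 and 1."""
    hidden = np.tanh(x @ w_hidden)
    return 1 / (1 + np.exp(-(hidden @ w_output)))

x = np.array([[0.2, 1.0, 0.5, 0.3]])  # illustrative input features
print(forward(x))
```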
Although the above discussions pertain to an artificial neural network as an example of machine learning, it is understood that other types of machine learning methods may also be suitable to implement the various aspects of the present disclosure. For example, support vector machines (SVMs) may be used to implement machine learning. SVMs are a set of related supervised learning methods used for classification and regression. An SVM training algorithm may build a model (which may be a non-probabilistic binary linear classifier) that predicts whether a new example falls into one category or another. As another example, Bayesian networks may be used to implement machine learning. A Bayesian network is an acyclic probabilistic graphical model that represents a set of random variables and their conditional dependencies with a directed acyclic graph (DAG). The Bayesian network could represent the probabilistic relationship between one variable and another variable. Other types of machine learning algorithms are not discussed in detail herein for reasons of simplicity.
The method 400 includes a step 410 to access a first machine learning task through a research workspace. The research workspace comprises a plurality of virtualized computing resource units. The first machine learning task has a first data size.
The method 400 includes a step 420 to execute the first machine learning task via a subset of the plurality of virtualized computing resource units.
The method 400 includes a step 430 to associate the first machine learning task with the subset of the virtualized computing resource units used and an amount of execution time.
The method 400 includes a step 440 to access a second machine learning task through a production workspace. The production workspace comprises a plurality of physical computing resource units, the second machine learning task having a second data size greater than the first data size. The second machine learning task and the first machine learning task have a same algorithm.
The method 400 includes a step 450 to allocate a subset of the physical computing resource units for an execution of the second machine learning task. The allocation is at least in part based on an association between the first machine learning task, the subset of the virtualized computing resource units used during an execution of the first machine learning task in the research workspace, and the amount of execution time during the execution of the first machine learning task in the research workspace.
In some embodiments, each virtualized computing resource unit corresponds to a portion of a physical hardware processor or a portion of a physical electronic memory.
In some embodiments, the physical computing resource units comprise computing resources in a decentralized environment, such as an edge computing environment.
In some embodiments, the first machine learning task is one of a plurality of machine learning tasks submitted to the research workspace. The duplicative ones of the machine learning tasks may be filtered out before the rest of the machine learning tasks (including the first machine learning task) are submitted to the research workspace.
In some embodiments, the allocation of step 450 comprises: dividing each of the physical computing resource units into a plurality of blocks, and allocating one or more blocks from the subset of the physical computing resource units for the execution of the second machine learning task. In some embodiments, the method 400 further comprises monitoring, in the production workspace, which of the one or more blocks have been allocated and which other blocks of the plurality of blocks are idle. In some embodiments, the allocating is performed at least in part using a scheduler software program within the production workspace. In some embodiments, the allocating is performed by extrapolating, based on a difference between the first data size and the second data size and further based on the subset of the virtualized computing resource units and the amount of execution time used during the execution of the first machine learning task in the research workspace, how much time or how much of the physical computing resource units are needed to complete the execution of the second machine learning task. In some embodiments, an amount of time needed to complete the execution of the second machine learning task is defined according to a Service-Level Agreement (SLA), and the extrapolating further comprises calculating how much of the physical computing resource units are needed to complete the execution of the second machine learning task in order to meet the amount of time defined according to the SLA.
In some embodiments, the associating step 430 comprises recording, for the first machine learning task via an electronic table maintained within the research workspace, the subset of the virtualized computing resource units used and the amount of execution time for each individual virtualized computing resource unit. In some embodiments, the associating step 430 further comprises associating the first machine learning task with an idle rate for each of the virtualized computing resource units in the subset.
It is also understood that additional method steps may be performed before, during, or after the steps 410-450 discussed above. For example, the method 400 may include a step of: before the accessing the second machine learning task through the production workspace, promoting the first machine learning task to be production-ready. In some embodiments, after the resources are allocated, a transaction request may be received, and the second machine learning model may be accessed in the production workspace. The transaction request may be processed using the machine learning model. For reasons of simplicity, other additional steps are not discussed in detail herein.
Turning now to
Input/output (I/O) device 509 may include a microphone, keypad, touch screen, and/or stylus through which a user of the computing device 505 may provide input (e.g., via motion or gestures), and may also include one or more speakers for providing audio output and a video display device for providing textual, audiovisual, and/or graphical output. Software may be stored within memory 515 to provide instructions to processor 503 allowing computing device 505 to perform various actions. For example, memory 515 may store software used by the computing device 505, such as an operating system 517, application programs 519, and/or an associated internal database 521. The various hardware memory units in memory 515 may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Memory 515 may include one or more physical persistent memory devices and/or one or more non-persistent memory devices. Memory 515 may include, but is not limited to, random access memory (RAM) 506, read only memory (ROM) 507, electronically erasable programmable read only memory (EEPROM), flash memory or other memory technology, optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store the desired information and that may be accessed by processor 503.
Communication interface 511 may include one or more transceivers, digital signal processors, and/or additional circuitry and software for communicating via any network, wired or wireless, using any protocol as described herein.
Processor 503 may include a single central processing unit (CPU), which may be a single-core or multi-core processor, or may include multiple CPUs. Processor(s) 503 and associated components may allow the computing device 505 to execute a series of computer-readable instructions to perform some or all of the processes described herein. Although not shown in
Although various components of computing device 505 are described separately, functionality of the various components may be combined and/or performed by a single component and/or multiple computing devices in communication without departing from the invention.
One aspect of the present disclosure pertains to a method. The method includes: accessing a first machine learning task through a research workspace, the research workspace comprising a plurality of virtualized computing resource units, the first machine learning task having a first data size; executing the first machine learning task via a subset of the plurality of virtualized computing resource units; associating the first machine learning task with the subset of the virtualized computing resource units used and an amount of execution time; accessing a second machine learning task through a production workspace, the production workspace comprising a plurality of physical computing resource units, the second machine learning task having a second data size greater than the first data size, wherein the second machine learning task and the first machine learning task have a same algorithm; and allocating a subset of the physical computing resource units for an execution of the second machine learning task, wherein the allocating is at least in part based on an association between the first machine learning task, the subset of the virtualized computing resource units used during an execution of the first machine learning task in the research workspace, and the amount of execution time during the execution of the first machine learning task in the research workspace.
Another aspect of the present disclosure pertains to a system. The system includes a processor and a non-transitory computer-readable medium having stored thereon instructions that are executable by the processor to cause the system to perform operations comprising: receiving, via a non-production workspace, a data analysis job; executing the data analysis job in the non-production workspace via a plurality of virtualized computing resource units, the virtualized computing resource units each providing a fraction of processing power offered by physical computing resources that are located outside the non-production workspace; recording statistics of an execution of the data analysis job in the non-production workspace; sending the data analysis job to a production workspace based on a determination that the data analysis job is production-ready; and determining, based on the statistics recorded during the execution of the data analysis job in the non-production workspace, how the physical computing resources should be allocated to execute the data analysis job in the production workspace.
Yet another aspect of the present disclosure pertains to a non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: accessing a first version of a machine learning job in a non-production environment, the first version of the machine learning job having a first data size, the non-production environment comprising a plurality of virtualized computing resource units, wherein each of the virtualized computing resource units provides a fraction of computing power provided by a physical computing device, the fraction being less than 1; executing the first version of the machine learning job in the non-production environment via a subset of the virtualized computing resource units; extracting data from an execution of the first version of the machine learning job in the non-production environment, wherein the data extracted comprises a total amount of execution time, which subset of the virtualized computing resource units were used in the execution, or a utilization rate of each of the virtualized computing resource units of the subset during the execution; promoting, based on a satisfaction of a predetermined condition, the first version of the machine learning job to a second version of the machine learning job that is production-ready; accessing the second version of the machine learning job in a production environment that comprises a plurality of the physical computing devices, the second version of the machine learning job having a second data size that is greater than the first data size; and determining, based on a difference between the first data size and the second data size and further based on the data extracted from the execution of the first version of the machine learning job in the non-production environment, how the plurality of the physical computing devices should be allocated to execute the second version of the machine learning job in the production environment.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are described as example implementations of the following claims.
This application claims priority to and the benefit of International Patent Application No. PCT/CN2022/135165, filed Nov. 29, 2022, the contents of which are hereby incorporated by reference herein in their entirety.