This application claims priority to Chinese Patent Application No. 202210236324.X, filed on Mar. 10, 2022, which is hereby incorporated by reference in its entirety.
The present disclosure relates to technologies such as deep learning, natural language processing, and text recognition in the field of artificial intelligence, and in particular, to a training method and apparatus for a document processing model, a device, a storage medium, and a program.
Artificial intelligence is a discipline that studies how to make a computer simulate certain thinking procedures and intelligent behaviors of people (such as learning, reasoning, thinking, planning, etc.), and it involves both hardware-level and software-level techniques. Artificial intelligence hardware techniques generally include technologies such as sensors, special-purpose artificial intelligence chips, cloud computing, distributed cloud storage, and big data processing; artificial intelligence software techniques mainly include major directions such as computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph techniques.
Artificial intelligence has been widely used in document processing scenarios. For example, documents can be analyzed or classified, or information can be extracted from them, by a target model obtained through training. A training procedure of the above target model usually includes two stages: pre-training and fine-tuning. Specifically, a sample document is first used to pre-train a basic model, so as to obtain a pre-training model which can be used to represent a document semantically. After the pre-training, for a specific document processing task, a small amount of sample data is used to perform fine-tuning training on the pre-training model, so as to obtain a target model corresponding to the specific document processing task.
Generally, in the above pre-training stage, character information in the sample document can be recognized first, and the basic model can be trained by using the character information to obtain the pre-training model. However, in practical applications, it has been found that the accuracy of the above pre-training model in representing a document semantically is not high.
According to a first aspect of the present disclosure, there is provided a training method for a document processing model, including:
acquiring a first sample document;
determining element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document; where the document element corresponds to a character or a document area in the first sample document, and M is an integer greater than or equal to 1; and
performing training on a basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element to obtain the document processing model.
According to a second aspect of the present disclosure, there is provided a training apparatus for a document processing model, including:
at least one processor; and
a memory connected with the at least one processor in a communication way; where,
the memory stores instructions executable by the at least one processor which, when executed by the at least one processor, enable the at least one processor to:
acquire a first sample document;
determine element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document; wherein the document element corresponds to a character or a document area in the first sample document, and M is an integer greater than or equal to 1; and
perform training on a basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element to obtain the document processing model.
According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to:
acquire a first sample document;
determine element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document; wherein the document element corresponds to a character or a document area in the first sample document, and M is an integer greater than or equal to 1; and
perform training on a basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element to obtain the document processing model.
The drawings are used for a better understanding of the present solution and do not constitute a limitation of the present disclosure. In the drawings:
Exemplary embodiments of the present disclosure are described below with reference to the drawings, where various details of the embodiments of the present disclosure are included to facilitate understanding, which should be considered as merely exemplary. Therefore, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Also, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.
In order to facilitate the understanding of technical solutions provided by the present disclosure, an application scenario of the present disclosure is illustrated with reference to
Referring to
Continue to refer to
Generally, in the above pre-training stage, character information in the sample document can be recognized first, and the basic model can be trained by using the character information to obtain a pre-training model. However, in practical applications, it has been found that the accuracy of the above pre-training model in representing a document semantically is not high.
The present disclosure provides a training method and apparatus for a document processing model, a device, a storage medium and a program, which are applied to technologies such as deep learning, natural language processing and text recognition in the field of artificial intelligence, and can be used in the model pre-training stage to improve the accuracy of the pre-training model for representing a document semantically.
In the technical solutions provided by the present disclosure, the pre-training procedure is as follows: acquiring a first sample document; determining element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document; where the document element corresponds to a character or a document area in the first sample document, and M is an integer greater than or equal to 1; performing training on a basic model according to the element features of the above plurality of document elements and the positions corresponding to the M position types of each document element to obtain a pre-training model.
In the above procedure of pre-training the basic model, not only the element features of the plurality of document elements but also the positions corresponding to the M position types of each document element are utilized, which is equivalent to considering the relationships among all document elements; that is, the considered information is more comprehensive, and thus the accuracy of the pre-training model in representing a document semantically can be improved. In addition, each of the above-mentioned document elements can correspond to a character or a document area in the first sample document; that is, the present disclosure can analyze a document not only from the character dimension but also from the document area dimension. Therefore, the accuracy of the pre-training model in representing a document semantically can be further improved.
The technical solutions provided by the present disclosure will be described in detail with reference to the following specific embodiments. The following embodiments can be combined with each other. The same or similar concept or procedure may not be described in detail in some embodiments.
S201: acquire a first sample document.
Illustratively, the first sample document can be a sample document in a sample document database in
In the embodiments of the present disclosure, the first sample document may include at least one of the following contents: a character, a picture, a table, etc., where the character can be a Chinese character, an English character, or a character of any other language.
S202: determine element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document; where the document element corresponds to a character or a document area in the first sample document, and M is an integer greater than or equal to 1.
Here, a document element refers to an object that constitutes the first sample document. A document element can correspond to a character or a document area in the first sample document.
As an example,
As an example,
In the embodiments of the present disclosure, each character and each document area in the first sample document can be taken as a document element. That is, assuming that the first sample document includes K1 characters and the first sample document is divided into K2 document areas, K1 characters and K2 document areas in the first sample document are all taken as document elements. In this way, K1+K2 document elements can be determined in the first sample document.
Element features of each document element are used to describe semantic information of the document element. Illustratively, after the plurality of document elements in the first sample document are determined, each document element can be semantically represented to determine the element features of the document element.
Generally, the position of a document element can be described in a number of ways. Illustratively, in a possible way, an identifier (an index or an ID) of each document element can be adopted to describe the position of the document element. With reference to
In the embodiments of the present disclosure, it is considered that the semantics of a document are related not only to each document element in the document, but also to the positional relationships among the document elements. Therefore, in order to better represent a document semantically, after the plurality of document elements in the first sample document are determined, the position of each document element can also be determined.
In an implementation, the position of each document element can be a relative position of the document element relative to a certain reference object. Illustratively, the first document element in the first sample document can be used as the reference object, and the relative position of each document element relative to the first document element can be determined respectively.
Further, in the embodiments of the present disclosure, when the position of the document element is determined, the positions corresponding to M position types can be determined. That is, M position types are adopted to represent the positions of the document elements, respectively. In an implementation, the M position types include one or more of the following: a one-dimensional position type, a document width direction position type, and a document height direction position type.
Here, the position corresponding to the one-dimensional position type of a document element indicates an arrangement position of the document element among the plurality of document elements.
For example, taking
A position corresponding to the document width direction position type of a document element indicates an offset between a coordinate of the document element in the document width direction and a first preset reference coordinate, where the first preset reference coordinate can be a coordinate of a preset reference object in the document width direction.
A position corresponding to the document height direction position type of a document element indicates an offset between a coordinate of the document element in the document height direction and a second preset reference coordinate, where the second preset reference coordinate can be a coordinate of the preset reference object in the document height direction.
For example, assume that the coordinate information of the document element 301 is (x1, y1, h, w), that of the document element 302 is (x2, y2, h, w), and that of the document element 303 is (x3, y3, h, w). Taking the document element 301 as the preset reference object:
for a position type in the document height direction,
the position of the document element 301 can be represented as 0 (y1−y1=0);
the position of the document element 302 can be represented as y2−y1;
the position of the document element 303 can be represented as y3−y1;
for a position type in the document width direction,
the position of the document element 301 can be represented as 0 (x1−x1=0);
the position of the document element 302 can be represented as x2−x1;
the position of the document element 303 can be represented as x3−x1.
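For illustration only, the worked example above can be reproduced by the following Python sketch; the numeric coordinate values are hypothetical placeholders, and only the subtraction scheme is taken from the description above.

```python
# Worked example of the three position types, taking document element 301
# as the preset reference object; coordinate values are hypothetical.
elements = [
    {"x": 10.0, "y": 20.0},   # document element 301 (x1, y1)
    {"x": 60.0, "y": 20.0},   # document element 302 (x2, y2)
    {"x": 110.0, "y": 35.0},  # document element 303 (x3, y3)
]

ref = elements[0]  # preset reference object (document element 301)
for i, e in enumerate(elements):
    one_dim = i                      # one-dimensional position (arrangement index)
    width_pos = e["x"] - ref["x"]    # document width direction position (xi - x1)
    height_pos = e["y"] - ref["y"]   # document height direction position (yi - y1)
    print(f"element {301 + i}: 1d={one_dim}, width={width_pos}, height={height_pos}")
```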
In some possible implementations, positions corresponding to various position types of document elements can be converted into vector forms by using a preset look-up table method.
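A minimal sketch of one way such a look-up table could be realized is given below, assuming a learnable embedding table indexed by clipped relative positions; the clipping range and vector dimension are assumed values, not taken from the disclosure.

```python
import torch
import torch.nn as nn

# Sketch of a preset look-up table: each relative position, clipped to a
# fixed range, indexes a row of a learnable embedding table.
max_rel, dim = 128, 64                      # assumed clipping range and dimension
table = nn.Embedding(2 * max_rel + 1, dim)

rel_pos = torch.tensor([0, 50, 100])               # e.g. width-direction offsets
idx = rel_pos.clamp(-max_rel, max_rel) + max_rel   # shift into range [0, 2*max_rel]
pos_vectors = table(idx)                           # (3, dim) position vectors
```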
S203: perform training on a basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element to obtain the document processing model.
Here, the basic model refers to a to-be-trained model, i.e., a blank model that has not yet been trained. It should be noted that this embodiment does not limit the network structure of the basic model. Illustratively, the basic model can be a Transformer model.
In this embodiment, the basic model can be trained according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element, so that the basic model continuously learns the relationship among the document semantics, the element features of each document element, and the positions of each document element. That is, through training, the basic model acquires the ability to represent a document semantically.
It should be understood that the embodiment shown in
The training method for a document processing model provided by this embodiment includes: acquiring a first sample document; determining element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document, where a document element corresponds to a character or a document area in the first sample document; and performing training on a basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element to obtain the document processing model. In the above procedure, not only the element features of the plurality of document elements but also the positions corresponding to the M position types of each document element are utilized, which is equivalent to considering the relationships among all document elements; that is, the considered information is more comprehensive, so the accuracy of the document processing model in representing a document semantically can be improved.
On the basis of the embodiment shown in
In this embodiment, the plurality of document elements include K1 characters and K2 document areas, and both K1 and K2 are integers greater than or equal to 0. The first sample document can be processed as follows.
(1) character recognition processing is performed on the first sample document to obtain element features of the K1 characters and positions corresponding to M position types of each character.
Illustratively, optical character recognition (OCR) technique can be used to perform character recognition processing on the first sample document to obtain the characters included in the first sample document and the position of each character in the first sample document, where the above position can be represented by a one-dimensional position or a two-dimensional position (for example, coordinate information (x, y, h, w)).
For each character, a word vector corresponding to the character is obtained by performing vector mapping on the character. Position information of each character recognized by the above OCR technique is usually an absolute position. By performing vector mapping on the absolute position of the character, a position vector corresponding to the character can be obtained. According to the word vector and the position vector corresponding to the character, the element feature of the character is generated.
Further, for each position type, a relative position of the character relative to the preset reference object can also be determined according to the absolute position of the character. Thereby, the positions corresponding to the M position types of the character are obtained.
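As a hedged illustration of the character branch just described, the element feature of a character might be formed as follows; summing the word vector and the position vector is an assumption (the text only states that the feature is generated according to both vectors), and all sizes and ids are hypothetical.

```python
import torch
import torch.nn as nn

vocab_size, dim, max_pos = 30000, 64, 512   # assumed sizes
word_emb = nn.Embedding(vocab_size, dim)    # word vector table
pos_emb = nn.Embedding(max_pos, dim)        # absolute-position vector table

token_ids = torch.tensor([5, 17, 23])       # hypothetical character ids from OCR
abs_pos = torch.tensor([0, 1, 2])           # absolute positions of the characters

# Element feature of each character, generated from its word vector and
# its position vector (summation is an assumption).
element_features = word_emb(token_ids) + pos_emb(abs_pos)   # (3, dim)
```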
In some possible scenarios, due to reasons such as document typesetting and layout, the characters in a document are not arranged in order from left to right and from top to bottom. For example, in the document shown in
For the above scenario, the document layout can be parsed first to obtain layout information, and then character recognition processing is performed based on the layout information, so as to ensure that the recognized character sequence is consistent with the reading sequence. The following is an example with reference to
Continue to refer to
For each character in the K1 characters, a word vector corresponding to the character is obtained by performing vector mapping on the character. According to the position of the character in its text block and the positional relationships among the text blocks, the absolute position of the character in the first sample document is determined. By performing vector mapping on the absolute position of the character in the first sample document, the position vector corresponding to the character is obtained. According to the word vector and the position vector corresponding to the character, the element feature of the character is generated.
Further, for each position type, the relative position of the character relative to the preset reference object can also be determined according to the absolute position of the character in the first sample document. Thereby, the positions corresponding to the M position types of the character are obtained.
(2) the document image corresponding to the first sample document is divided into K2 document areas, and feature extraction is performed on the document image to obtain element features of the K2 document areas and positions corresponding to M position types of each document area.
The following is an example with reference to
Further, feature extraction can be performed on the document image to obtain an image feature of the document image. For example, the document image can be input into a visual encoder with a convolutional network structure, and the visual encoder encodes the document image to obtain the image feature. For each document area in the K2 document areas, the area feature corresponding to the document area is obtained from the image feature. For example, the image feature is input into an average pooling layer and a fully connected layer to map the image feature into the area features of the K2 document areas. For each document area, vector mapping is performed on the absolute position of the document area in the document image to obtain a position feature of the document area. The area feature and the position feature of the document area are concatenated (spliced) to obtain the element feature of the document area.
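The following sketch illustrates one plausible realization of this area-feature pipeline; the concrete encoder layers, the 7×7 grid of areas (K2 = 49), and all dimensions are assumptions for illustration only, since the figures specifying them are not reproduced here.

```python
import torch
import torch.nn as nn

dim = 64
k2 = 7 * 7   # assume the document image is divided into a 7x7 grid, K2 = 49 areas

visual_encoder = nn.Sequential(                       # convolutional visual encoder
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
pool = nn.AdaptiveAvgPool2d(7)    # average pooling layer (one cell per document area)
fc = nn.Linear(64, dim)           # fully connected layer mapping to area features
pos_fc = nn.Linear(4, dim)        # vector mapping of absolute positions (x, y, h, w)

image = torch.randn(1, 3, 224, 224)                    # document image
grid = pool(visual_encoder(image))                     # (1, 64, 7, 7) image feature
area_feats = fc(grid.flatten(2).transpose(1, 2))       # (1, K2, dim) area features
boxes = torch.rand(1, k2, 4)                           # absolute positions of the areas
pos_feats = pos_fc(boxes)                              # (1, K2, dim) position features

# Element feature of each document area: area feature and position
# feature concatenated ("spliced"), as described above.
element_features = torch.cat([area_feats, pos_feats], dim=-1)   # (1, K2, 2*dim)
```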
It should be understood that through the above procedure shown in
On the basis of any of the above embodiments, the training method for a document processing model provided by the present disclosure will be described in more detail with reference to a specific embodiment.
S601: input element features of a plurality of document elements and the positions corresponding to the M position types of each document element into the basic model.
For ease of understanding, the following is an example with reference to
Referring to
In this embodiment, the positions corresponding to the M position types of each document element are respectively input into the basic model, instead of being fused first and then input into the basic model as one fused position. In this way, premature fusion of the positions corresponding to different position types is avoided, so that the positions corresponding to different position types can be distinguished, or decoupled, within the basic model; more knowledge can thus be learned in the model training procedure, which improves the ability to represent a document semantically.
S602: determine an attention weight parameter of each document element through the basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element.
In other words, within the basic model, an attention weight parameter of each document element is determined according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element. It should be understood that the greater the attention weight of a document element, the more attention will be applied to the element feature of the document element in the training procedure; and the smaller the attention weight of the document element, the less attention will be applied to the element feature of the document element in the training procedure. It is thus evident that the attention weight parameter of each document element can guide the model training procedure.
In a possible implementation, the attention weight parameter of each document element can be determined as follows.
(1) first linear processing and second linear processing are performed on the element features of the plurality of document elements to obtain a first feature matrix and a second feature matrix, respectively.
Illustratively, referring to
(2) the first linear processing and the second linear processing are performed, for each position type of the M position types, on the position of each document element corresponding to the position type to obtain a first position matrix and a second position matrix corresponding to the position type, respectively.
Illustratively, referring to
Continue to refer to
Continue to refer to
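Before step (3), the two projections described in steps (1) and (2) can be sketched as follows; M = 3 position types are assumed, and sharing the same projection weights between features and positions is an assumption, since the figure detailing this is not reproduced here.

```python
import torch
import torch.nn as nn

n, dim = 10, 64                    # n document elements, assumed dimension
wq = nn.Linear(dim, dim)           # first linear processing
wk = nn.Linear(dim, dim)           # second linear processing

X = torch.randn(n, dim)            # element features of the n document elements
Qc, Kc = wq(X), wk(X)              # first / second feature matrices

# For each of the M = 3 assumed position types, apply the same two linear
# processings to that type's position vectors (weight sharing is an assumption).
pos = {t: torch.randn(n, dim) for t in ("one_dim", "width", "height")}
Qp = {t: wq(p) for t, p in pos.items()}   # first position matrices
Kp = {t: wk(p) for t, p in pos.items()}   # second position matrices
```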
(3) the attention weight parameter of each document element is determined according to the first feature matrix, the second feature matrix, and the first position matrix and the second position matrix corresponding to each of the M position types.
In a possible implementation, the following manners can be adopted.
(a) a first attention matrix is determined according to the first feature matrix and the second feature matrix.
Illustratively, referring to
(b) a second attention matrix corresponding to the position type is determined according to the first feature matrix and the second position matrix corresponding to each position type.
Continue to refer to
(c) a third attention matrix corresponding to the position type is determined according to the second feature matrix and the first position matrix corresponding to each position type.
Continue to refer to
(d) the attention weight parameter of each document element is determined according to the first attention matrix, and the second attention matrix and the third attention matrix corresponding to each of the M position types.
In an implementation, the sum of the first attention matrix and the second and third attention matrices corresponding to each of the M position types can be determined as a target attention matrix, and then the attention weight parameter of each document element is determined according to the target attention matrix.
Illustratively, referring to
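Steps (a) through (d) can be sketched together as follows; the operand order within each matrix product, the scaling factor, and the softmax are assumptions consistent with standard attention formulations, since the figures specifying them are not reproduced here.

```python
import torch

n, dim = 10, 64
Qc, Kc = torch.randn(n, dim), torch.randn(n, dim)   # stand-ins for the feature matrices
types = ("one_dim", "width", "height")
Qp = {t: torch.randn(n, dim) for t in types}        # first position matrices
Kp = {t: torch.randn(n, dim) for t in types}        # second position matrices

A1 = Qc @ Kc.T                              # (a) first attention matrix
A2 = {t: Qc @ Kp[t].T for t in types}       # (b) second attention matrix per position type
A3 = {t: Qp[t] @ Kc.T for t in types}       # (c) third attention matrix per position type

# (d) target attention matrix: the sum of the first attention matrix and the
# second and third attention matrices of every position type; the scaling
# and softmax are assumptions from standard attention.
A = A1 + sum(A2[t] for t in types) + sum(A3[t] for t in types)
weights = torch.softmax(A / dim ** 0.5, dim=-1)     # attention weight parameters
```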
S603: perform training on the basic model according to the element features of the plurality of document elements and the attention weight parameter of each document element to obtain the document processing model.
Illustratively, continue to refer to
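Continuing the sketch at an assumption level, the third linear processing and the use of the attention weight parameters might look as follows, with random stand-ins for the quantities computed earlier.

```python
import torch
import torch.nn as nn

n, dim = 10, 64
X = torch.randn(n, dim)                               # element features
weights = torch.softmax(torch.randn(n, n), dim=-1)    # stand-in attention weights

wv = nn.Linear(dim, dim)    # third linear processing
Vc = wv(X)                  # third feature matrix
out = weights @ Vc          # representation used in training: elements with larger
                            # attention weights contribute more to the output
```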
Since the attention weight parameter of each document element indicates how much attention is applied to each document element in the training procedure, upon training the basic model, different attentions can be applied to different document elements according to the attention weight parameter of each document element, thereby improving the ability of the document processing model for representing a document semantically.
In this embodiment, by inputting the element features of the document elements and the positions corresponding to the M position types of each document element into the basic model, the positions corresponding to different position types can be distinguished, or decoupled, within the basic model, so that more knowledge can be learned in the model training procedure, and the ability to represent a document semantically is thus improved.
Further, within the basic model, upon determining the attention weight parameter of each document element, not only is the first attention matrix obtained from the first feature matrix Qc and the second feature matrix Kc considered, but also the second attention matrices obtained, for the respective position types, from the first feature matrix Qc and the second position matrices (Kr, Kx, Ky), and the third attention matrices obtained, for the respective position types, from the second feature matrix Kc and the first position matrices (Qp, Qx, Qy). That is, upon determining the attention weight parameter of each document element, the relationships between the element features and the positions corresponding to the different position types are fully considered, so that more knowledge can be learned in the model training procedure, and the ability to represent a document semantically is further improved.
On the basis of the embodiments shown in
The following description takes 4 training tasks as an example. Assume the 4 training tasks are as follows.
Training task 1: mask part of the characters in the sample document, and, during the pre-training procedure, predict the masked characters. In this prediction task, in addition to masking part of the characters, it is also necessary to smear the document areas where the masked characters are located, so as to avoid leakage of the label from the document area side.
Training task 2: randomly smear a document area in the first sample document and then predict which character(s) is/are smeared.
Training task 3: randomly replace a certain document area in the first sample document, and predict which document area is replaced.
Training task 4: for a certain character in the first sample document, predict which character is the next character.
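As an illustration of training task 1 only, the following sketch masks a hypothetical character sequence and records the masked characters as prediction targets; the 15% masking ratio and the "[MASK]" token are assumptions borrowed from common masked-prediction practice, not from the disclosure.

```python
import random

# Hypothetical sketch of training task 1: mask a fraction of the characters
# and keep them as prediction targets.
chars = list("INVOICE TOTAL 42.00")
targets = {}
for i, c in enumerate(chars):
    if c != " " and random.random() < 0.15:
        targets[i] = c           # target document element for the task
        chars[i] = "[MASK]"      # mask the character; the document area where
                                 # it is located would also be smeared in the
                                 # image, to avoid leaking the label
print(chars, targets)
```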
With reference to
Within the basic model, a prediction document element corresponding to each training task can be determined according to the third feature matrix and the attention weight parameter of each document element respectively. Taking
Further, training on the basic model can be performed according to the target document element corresponding to each of the N training tasks and the prediction document element corresponding to each of the N training tasks, to obtain the document processing model.
Illustratively, for each training task of the N training tasks, a loss function corresponding to the training task is determined according to a target document element and a prediction document element corresponding to the training task. Taking
A target loss function is determined according to the loss function corresponding to each of the N training tasks. Referring to
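A minimal sketch of combining the per-task losses into the target loss function follows; equal (unweighted) summation is an assumption, as the text only states that the target loss function is determined according to the N task losses, and the loss values shown are placeholders.

```python
import torch

# Placeholder per-task losses; the values are illustrative only.
task_losses = {
    "task1_masked_character": torch.tensor(0.9),
    "task2_smeared_area": torch.tensor(0.7),
    "task3_replaced_area": torch.tensor(0.4),
    "task4_next_character": torch.tensor(1.1),
}

# Target loss function; equal weighting is an assumption.
target_loss = sum(task_losses.values())
# In actual training, target_loss.backward() would update the model parameters.
```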
It should be understood that the above description is of one iterative training procedure. The iterative training procedure is executed for each of a plurality of sample documents, and the training is stopped when the basic model reaches a convergence condition. The basic model that reaches the convergence condition is taken as the document processing model.
In this embodiment, by training with a plurality of training tasks simultaneously, the document processing model integrates the training objectives of the plurality of training tasks, which improves the effect of the document processing model in representing a document semantically, and enables the document processing model to be quickly migrated to different document processing scenarios.
On the basis of any of the above embodiments, after the document processing model is obtained, the method further includes: acquiring sample data corresponding to a preset document task, where the sample data includes a second sample document and annotation data corresponding to the second sample document; performing processing on the second sample document through the document processing model to obtain prediction data; and adjusting a parameter of the document processing model according to a difference between the prediction data and the annotation data to obtain a target model corresponding to the preset document task.
Here, the above preset document task can be, but is not limited to, any of the following: a document classification task, a document analysis task, a document information extraction task, etc.
The sample data includes a second sample document and annotation data corresponding to the second sample document. It should be understood that the annotation data in the sample data may be different for different document processing tasks, which is not limited in this embodiment. For example, for the document classification task, the above annotation data may indicate an annotation category of the second sample document; for the document analysis task, the above annotation data may indicate an annotation analysis result of the second sample document; and for the document information extraction task, the above annotation data may indicate an annotation information extraction result of the second sample document.
The second sample document is input into the document processing model, and the document processing model processes the second sample document to obtain the prediction data. It should be understood that the prediction data output by the document processing model may be different for different document processing tasks, which is not limited in this embodiment. For example, for the document classification task, the above prediction data may indicate a prediction category of the second sample document; for the document analysis task, the above prediction data may indicate a prediction analysis result of the second sample document; and for the document information extraction task, the above prediction data may indicate a prediction information extraction result of the second sample document.
The loss function is determined according to the prediction data and the annotation data, and the model parameter of the document processing model is adjusted according to the loss function.
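The fine-tuning stage described above can be sketched as follows; the optimizer choice, the learning rate, and the callable model and loss function are placeholder assumptions, with only the predict / compare / adjust loop taken from the text.

```python
import torch

# Sketch of the fine-tuning stage: document_model, samples and loss_fn
# are placeholders supplied by the caller.
def fine_tune(document_model, samples, loss_fn, lr=1e-5):
    optimizer = torch.optim.AdamW(document_model.parameters(), lr=lr)
    for second_sample_document, annotation_data in samples:
        prediction_data = document_model(second_sample_document)
        loss = loss_fn(prediction_data, annotation_data)   # difference between
        optimizer.zero_grad()                              # prediction and annotation
        loss.backward()
        optimizer.step()    # adjust the parameters of the document processing model
    return document_model   # target model for the preset document task
```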
It should be understood that this embodiment describes the fine-tuning stage shown in
a first acquisition module 901, configured to acquire a first sample document;
a determination module 902, configured to determine element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document; where the document element corresponds to a character or a document area in the first sample document, and M is an integer greater than or equal to 1; and
a first training module 903, configured to perform training on a basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element to obtain the document processing model.
In a possible implementation, the first training module 903 includes:
an input unit, configured to input the element features of the plurality of document elements and the positions corresponding to the M position types of each document element into the basic model;
a first determination unit, configured to determine an attention weight parameter of each document element through the basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element; and
a training unit, configured to perform training on the basic model according to the element features of the plurality of document elements and the attention weight parameter of each document element to obtain the document processing model.
In a possible implementation, the first determination unit includes:
a first processing subunit, configured to perform first linear processing and second linear processing on the element features of the plurality of document elements to obtain a first feature matrix and a second feature matrix respectively;
a second processing subunit, configured to perform, for each position type of the M position types, the first linear processing and the second linear processing on the position of each document element corresponding to the position type to obtain a first position matrix and a second position matrix corresponding to the position type respectively; and
a determination subunit, configured to determine the attention weight parameter of each document element according to the first feature matrix, the second feature matrix, and the first position matrix and the second position matrix corresponding to each of the M position types.
In a possible implementation, the determination subunit is specifically configured to:
determine a first attention matrix according to the first feature matrix and the second feature matrix;
determine a second attention matrix corresponding to the position type according to the first feature matrix and the second position matrix corresponding to each position type;
determine a third attention matrix corresponding to the position type according to the second feature matrix and the first position matrix corresponding to each position type; and
determine the attention weight parameter of each document element according to the first attention matrix, and the second attention matrix and the third attention matrix corresponding to each of the M position types.
In a possible implementation, the determination subunit is specifically configured to:
determine a sum of the first attention matrix, and the second attention matrix and the third attention matrix corresponding to each of the M position types as a target attention matrix; and
determine the attention weight parameter of each document element according to the target attention matrix.
In a possible implementation, the training unit includes:
a third processing subunit, configured to perform third linear processing on the element features of the plurality of document elements to obtain a third feature matrix; and
a training subunit, configured to perform training on the basic model according to the third feature matrix and the attention weight parameter of each document element to obtain the document processing model.
In a possible implementation, the first training module 903 further includes:
a scrambling processing unit, configured to determine a target document element corresponding to each training task in the plurality of document elements according to N training tasks respectively, and perform scrambling processing on the target document element, where N is an integer greater than or equal to 1;
the training subunit is specifically configured to:
determine a prediction document element corresponding to each training task respectively, according to the third feature matrix and the attention weight parameter of each document element; and
perform training on the basic model according to the target document element corresponding to each of the N training tasks and the prediction document element corresponding to each of the N training tasks to obtain the document processing model.
In a possible implementation, the training subunit is specifically configured to:
for each training task of the N training tasks, determine a loss function corresponding to the training task according to the target document element and the prediction document element corresponding to the training task;
determine a target loss function according to the loss function corresponding to each of the N training tasks; and
update, according to the target loss function, a model parameter of the basic model to obtain the document processing model.
In a possible implementation, the plurality of document elements include K1 characters and K2 document areas, and both K1 and K2 are integers greater than or equal to 0; and the determination module 902 includes:
a second determination unit, configured to perform character recognition processing on the first sample document to obtain element features of the K1 characters and positions corresponding to M position types of each character; and
a third determination unit, configured to divide a document image corresponding to the first sample document into K2 document areas, and perform feature extraction on the document image to obtain element features of the K2 document areas and positions corresponding to M position types of each document area.
In a possible implementation, the training apparatus 900 for a document processing model of this embodiment further includes:
a second acquisition module, configured to acquire sample data corresponding to a preset document task, where the sample data includes a second sample document and annotation data corresponding to the second sample document;
a processing module, configured to perform processing on the second sample document through the document processing model to obtain prediction data; and
a second training module, configured to adjust a parameter of the document processing model according to a difference between the prediction data and the annotation data to obtain a target model corresponding to the preset document task.
In a possible implementation, the M position types include one or more of the following:
a one-dimensional position type, a document width direction position type, and a document height direction position type;
a position corresponding to the one-dimensional position type of a document element indicates an arrangement position of the document element among the plurality of document elements;
a position corresponding to the document width direction position type of a document element indicates an offset between a coordinate of the document element in a document width direction and a first preset reference coordinate; and
a position corresponding to the document height direction position type of a document element indicates an offset between a coordinate of the document element in a document height direction and a second preset reference coordinate.
The training apparatus for a document processing model provided by this embodiment can be used to execute the training method for a document processing model provided by any of the above-mentioned method embodiments, and their implementation principles and technical effects are similar, which will not be repeated herein.
In the technical solutions of the present disclosure, the collection, storage, usage, processing, transmission, provision and disclosure of the user's personal information involved are in compliance with relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, the present disclosure further provides a computer program product, where the computer program product includes: a computer program stored in a readable storage medium, where at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the electronic device to execute the solution provided in any of the above embodiments.
As shown in
A plurality of components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006, such as a keyboard, a mouse, etc.; an output unit 1007, such as various types of displays, speakers, etc.; a storage unit 1008, such as a magnetic disk, an optical disk, etc.; and a communication unit 1009, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1001 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, central processing units (CPU), graphics processing units (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSP), and any appropriate processors, controllers, microcontrollers, etc. The computing unit 1001 executes the various methods and processes described above, for example, a training method for a document processing model. For example, in some embodiments, the training method for a document processing model may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the training method for a document processing model described above can be executed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the training method for a document processing model in any other suitable mode (for example, by means of firmware).
Various implementations of the systems and technologies described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard products (ASSP), systems-on-chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, where the programmable processor can be a special-purpose or general-purpose programmable processor and can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input apparatus, and at least one output apparatus.
The program codes used to implement the methods of the present disclosure can be written in any combination of one or more programming languages. These program codes can be provided to the processors or controllers of general-purpose computers, special-purpose computers, or other programmable data processing apparatuses, so that the program codes, when executed by the processors or controllers, enable the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program codes can be executed entirely on a machine, partly on the machine, partly on the machine and partly on a remote machine as an independent software package, or entirely on the remote machine or a server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, an apparatus or a device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, a magnetic, an optical, an electromagnetic, an infrared, or a semiconductor system, apparatus, or device, or any suitable combinations of the above. More specific examples of the machine-readable storage medium might include electrical connections based on one or more wires, portable computer disks, hard disks, random access memories (RAM), read-only memories (ROM), erasable programmable read-only memories (EPROM or flash memory), optical fibers, portable compact disk read-only memories (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
In order to provide interaction with the user, the systems and techniques described herein can be implemented on a computer that has: a display apparatus for displaying information to a user (for example, a CRT (cathode ray tube) or an LCD (liquid-crystal display) monitor); and a keyboard and a pointing apparatus (for example, a mouse or a trackball) through which the user can provide input to the computer. Other types of apparatuses can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
The systems and technologies described herein can be implemented in a computing system including background components (e.g., a data server), a computing system including middleware components (e.g., an application server), a computing system including front-end components (e.g., a user computer with a graphical user interface or a web browser through which users can interact with implementations of the systems and technologies described herein), or a computing system including any combination of such background components, middleware components, or front-end components. The components of the system can be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN), and the Internet.
The computer system can include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The relationship between the client and the server is generated by computer programs that run on the corresponding computers and have a client-server relationship with each other. The server can be a cloud server (also known as a cloud computing server or a cloud host), which is a host product in a cloud computing service system that solves the defects of difficult management and weak business scalability in traditional physical host and VPS (Virtual Private Server) services. The server can also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps can be reordered, added or deleted using the various forms of processes shown above. For example, the steps recited in the present disclosure can be executed in parallel, sequentially or in a different order, which is not limited herein as long as the desired result of the technical solution disclosed in the present disclosure can be achieved.
The above-mentioned detailed implementations do not limit the protection scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions can be made according to design requirements and other factors. Any modifications, equivalent replacements, improvements and the like made within the spirit and the principle of the present disclosure shall be included in the protection scope of the present disclosure.