This application is based upon and claims the benefit of priority from Japanese patent application No. 2020-205659, filed on Dec. 11, 2020, the disclosure of which is incorporated herein in its entirety by reference.
The present disclosure relates to an extrusion molding apparatus and its control method.
As disclosed in Japanese Unexamined Patent Application Publication No. 2020-152097, the inventors of the present application have developed an extrusion molding apparatus and its control method using machine learning.
The inventors have found various problems during the development of an extrusion molding apparatus and its control method.
Other problems and novel features will be clarified from the descriptions in this specification and the attached drawings.
In an extrusion molding apparatus according to an embodiment, a control unit, which is configured to perform feedback control for a rotation speed (e.g., number of revolutions per minute) of a pump so as to bring a pressure measured by a pressure sensor closer to a target pressure, determines a current state and a reward for an action selected in the past based on a difference between the measured pressure and the target pressure, updates a control condition based on the reward and selects an optimum action corresponding to the current state under the updated control condition, the control condition being a combination of a state and an action, and controls the rotation speed of the pump based on the optimum action.
According to the above-described embodiment, it is possible to provide a manufacturing apparatus capable of manufacturing an excellent resin film.
The above and other objects, features and advantages of the present disclosure will become more fully understood from the detailed description given hereinbelow and the accompanying drawings which are given by way of illustration only, and thus are not to be considered as limiting the present disclosure.
Specific embodiments are explained hereinafter in detail with reference to the drawings. However, the present disclosure is not limited to the below-shown embodiments. Further, the following descriptions and the drawings are simplified as appropriate for clarifying the explanation.
Firstly, an overall configuration of an extrusion molding apparatus according to a first embodiment will be described with reference to
Note that, needless to say, right-handed xyz-orthogonal coordinates shown in
Further, in this specification, the term “resin film” includes a resin sheet.
As shown in
The extruder 10 is, for example, a screw-type extruder. In the extruder 10 shown in
A motor M1 is connected to the base of the screw 12. The motor M1 is a driving source that drives the screw 12.
Note that only one screw 12 may be provided, or a plurality of screws 12 may be provided. For example, an extruder 10 with one screw 12 is called a single-screw extruder, while an extruder 10 with two screws 12 is called a twin-screw extruder.
The resin pellets 81 supplied from the hopper 13 are extruded (i.e., pushed) from the base of the screw 12, which is rotated by the motor M1, toward the tip thereof, i.e., extruded (i.e., pushed) in the x-axis positive direction. The resin pellets 81 are heated and compressed by the rotating screw 12 inside the cylinder 11, and are transformed into molten resin 82.
As shown in
Note that as shown in
Note that the pump, which sucks in the molten resin extruded from the extruder 10 and discharges it to the T-die 20, is not limited to the gear pump, and may be any of other types of pumps.
As shown in
The cooling roll 30 discharges a resin film 83, which is formed as the film-like molten resin 82a solidifies, while cooling the film-like molten resin 82a extruded from the T-die 20. The resin film 83 discharged from the cooling roll 30 is conveyed through the conveyor roll group 40 and is wound up by the winder 50. In the example shown in
The thickness sensor 60 is, for example, a noncontact-type thickness sensor and measures the distribution of thicknesses (hereinafter also referred to as the thickness distribution) of the resin film 83, which was discharged from the cooling roll 30 and is being conveyed, in the width direction thereof. In the example shown in
As shown in
When the rotation speed of the screw 12 driven by the motor M1 is increased, the amount of molten resin extruded (i.e., pushed) toward the gear pump GP increases, so that the pressure of the molten resin on the suction side of the gear pump GP rises. Conversely, when the rotation speed of the screw 12 is decreased, the amount of molten resin extruded toward the gear pump GP decreases, so that the pressure of the molten resin on the suction side of the gear pump GP decreases.
Therefore, when the control unit 70 performs feedback control for the rotation speed of the screw 12, if the pressure measured by the pressure sensor PS is lower than the target pressure, the rotation speed of the screw 12 (i.e., the output of the motor M1) is increased. Conversely, if the measured pressure is higher than the target pressure, the rotation speed of the screw 12 (i.e., the output of the motor M1) is decreased.
Meanwhile, when the rotation speed of the gear pump GP driven by the motor M2 is increased, the amount of molten resin sucked in by the gear pump GP increases, so that the pressure of the molten resin on the suction side of the gear pump GP decreases. Conversely, when the rotation speed of the gear pump GP is decreased, the amount of molten resin sucked in by the gear pump GP decreases, so that the pressure of the molten resin on the suction side of the gear pump GP rises.
Therefore, when the control unit 70 performs feedback control for the rotation speed of the gear pump GP, if the pressure measured by the pressure sensor PS is lower than the target pressure, the rotation speed of the gear pump GP (i.e., the output of the motor M2) is decreased. Conversely, if the measured pressure is higher than the target pressure, the rotation speed of the gear pump GP (i.e., the output of the motor M2) is increased.
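The opposite feedback directions for the screw 12 and the gear pump GP described above can be summarized in a minimal sketch. The function names and the ±1 step convention are illustrative assumptions, not part of the apparatus; a positive err means the measured pressure exceeds the target pressure.

```python
def screw_adjustment(err):
    """Screw 12 (motor M1): pressure too low -> speed up; too high -> slow down."""
    return +1 if err < 0 else -1 if err > 0 else 0

def pump_adjustment(err):
    """Gear pump GP (motor M2): pressure too low -> slow down; too high -> speed up."""
    return -1 if err < 0 else +1 if err > 0 else 0
```

Note how the two adjustment directions are mirror images, reflecting that the screw pushes resin toward the suction side while the pump draws it away.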
Further, as shown in
Note that the configuration and the operation of the control unit 70 will be described later in a more detailed manner.
The structure of the T-die 20 will be described hereinafter in a more detailed manner with reference to
As shown in
In the abutting surfaces of the pair of die blocks 21 and 22, an inlet port 20a, a manifold 20b, and a slit 20c are formed. The inlet port 20a extends downward (in the z-axis negative direction) from the upper surface of the T-die 20. The manifold 20b extends from the lower end of the inlet port 20a in the y-axis positive direction and the y-axis negative direction. In this way, the inlet port 20a and the manifold 20b are formed in a T-shape in the T-die 20.
Further, the slit 20c extending from the bottom surface of the manifold 20b to the lower surface of the T-die 20 extends in the y-axis direction. The molten resin 82 is extruded downward from the slit 20c (i.e., from the gap between the lips 21a and 22a) through the inlet port 20a and the manifold 20b.
Note that while the lip 21a is a stationary (fixed) lip, the lip 22a is a movable lip connected to heat bolts 23. In the lip 22a, a cut-out groove 22b is formed so as to extend obliquely upward from the outer-side surface toward the abutting surface. The lip 22a is pushed and pulled by the heat bolts 23, and can thereby be moved by using the bottom of the cut-out groove 22b as a fulcrum. Since only the lip 22a is formed as a movable lip as described above, the lip distance can be easily adjusted with a simple structure.
The heat bolts 23 extend obliquely upward along the tapered part of the die block 22. The heat bolts 23 are supported by holders 25a and 25b fixed to the die block 22. More specifically, the heat bolts 23 are screwed into threaded holes formed in the holder 25a. The tightness of each of the heat bolts 23 can be adjusted as desired. In contrast, although the heat bolts 23 are inserted through through-holes formed in the holder 25b, they are not fixed to the holder 25b. Note that the holders 25a and 25b do not necessarily have to be formed as components that are provided separately from the die block 22. That is, they may be integrally formed with the die block 22.
Note that as shown in
One heater 24 is provided for each heat bolt 23 to heat that heat bolt 23. In the example shown in
It is possible to adjust the distance between the lips 21a and 22a by adjusting the tightness of the heat bolts 23. Specifically, when the tightness of the heat bolts 23 is increased, the heat bolts 23 push the lip 22a, so that the distance between the lips 21a and 22a is reduced. On the other hand, when the tightness of the heat bolts 23 is reduced, the distance between the lips 21a and 22a is increased. For example, the tightness of the heat bolts 23 is adjusted manually.
Further, it is possible to finely adjust the distance between the lips 21a and 22a by the amounts of the thermal expansions (hereinafter also referred to as the thermal expansion amounts) of the heat bolts 23 caused by the heaters 24. Specifically, when the heating temperatures of the heaters 24 are raised, the thermal expansion amounts of the heat bolts 23 increase, so that the heat bolts 23 push the lip 22a and the distance between the lips 21a and 22a thereby is reduced. On the other hand, when the heating temperatures of the heaters 24 are lowered, the thermal expansion amounts of the heat bolts 23 decrease, so that the distance between the lips 21a and 22a is increased. The thermal expansion amount of each heat bolt 23, i.e., the heating by each heater 24 is controlled by the control unit 70.
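As a rough illustration of how heating produces such a fine adjustment, the linear thermal expansion of a bolt can be estimated as ΔL = α·L·ΔT. The following sketch uses assumed values throughout (the bolt length, the temperature rise, and an expansion coefficient typical of steel); the specification itself gives no dimensions.

```python
# Assumed linear expansion coefficient of a steel bolt, in 1/K.
ALPHA_STEEL = 1.2e-5

def bolt_expansion(length_mm, delta_t_k, alpha=ALPHA_STEEL):
    """Return the estimated change in bolt length (mm): delta_L = alpha * L * delta_T."""
    return alpha * length_mm * delta_t_k
```

Under these assumptions, a 200 mm bolt heated by 50 K expands by roughly 0.12 mm, which is the order of fine adjustment that narrows the gap between the lips 21a and 22a.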
An extrusion molding apparatus according to a comparative example has an overall configuration similar to that of the extrusion molding apparatus according to the first embodiment shown in
Next, the configuration of the control unit 70 according to the first embodiment will be described in a more detailed manner with reference to
The control unit 70 individually performs feedback control for the output of each of the motors M1 and M2, which are the driving sources of the screw 12 and the gear pump GP, respectively. Although
Note that each of the functional blocks constituting the control unit 70 can be implemented by hardware such as a CPU (Central Processing Unit), a memory, and other circuits, or can be implemented by software such as a program(s) loaded in a memory or the like. Therefore, each functional block can be implemented in various forms by computer hardware, software, or combinations thereof.
The state observation unit 71 calculates a control error err from a pressure pv measured by the pressure sensor PS. The control error err is a difference between the measured pressure pv and a target pressure.
Then, the state observation unit 71 determines a current state st and a reward rw for an action ac selected in the past (e.g., selected in the last time) based on the calculated control error err.
The state st is defined in advance in order to classify values of the control error err, which can take any of an infinite number of values, into a finite number of groups. As a simple example for explanatory purposes, when the control error is represented by err, a range “−4.0 MPa≤err<−3.0 MPa” is defined as a state st1; a range “−3.0 MPa≤err<−2.0 MPa” is defined as a state st2; a range “−2.0 MPa≤err<−1.0 MPa” is defined as a state st3; a range “−1.0 MPa≤err<+1.0 MPa” is defined as a state st4; a range “+1.0 MPa≤err<+2.0 MPa” is defined as a state st5; a range “+2.0 MPa≤err<+3.0 MPa” is defined as a state st6; a range “+3.0 MPa≤err<+4.0 MPa” is defined as a state st7; and a range “+4.0 MPa≤err<+5.0 MPa” is defined as a state st8. In practice, in many cases, a larger number of states st, each having a narrower range, may be defined.
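For illustration only, the discretization of the control error into the states st1 to st8 can be sketched as follows. The bin edges follow the example ranges above; the specification does not say how errors outside the defined ranges are handled, so they are returned as None here.

```python
# Bin edges in MPa, matching the example ranges st1..st8 above.
EDGES = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0, 5.0]

def classify_state(err):
    """Map a control error err (MPa) to a state index 1..8, or None if out of range."""
    for i in range(len(EDGES) - 1):
        if EDGES[i] <= err < EDGES[i + 1]:
            return i + 1
    return None
```

For example, an error of +3.5 MPa falls in the range “+3.0 MPa≤err<+4.0 MPa” and is classified as the state st7.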
The reward rw is an index for evaluating an action ac that was selected in a past state st.
Specifically, when the absolute value of the calculated current control error err is smaller than the absolute value of the past control error err, the state observation unit 71 determines that the action ac selected in the past is appropriate and assigns, for example, a positive value to the reward rw. In other words, the reward rw is determined so that the previously selected action ac is more likely to be selected again in the same state st as the past state.
On the other hand, if the absolute value of the calculated current control error err is larger than the absolute value of the past control error err, the state observation unit 71 determines that the action ac selected in the past is inappropriate and assigns, for example, a negative value to the reward rw. In other words, the reward rw is determined so that the previously selected action ac is less likely to be selected again in the same state st as the past state.
Note that specific examples of the reward rw will be described later. Further, the value of the reward rw can be determined as appropriate. For example, the reward rw may have a positive value at all times, or the reward rw may have a negative value at all times.
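The reward rule described above can be sketched as follows. The reward magnitudes (+1.0 and −1.0) and the treatment of an unchanged error are assumptions for illustration; as noted above, the actual values can be determined as appropriate.

```python
def compute_reward(err_now, err_prev):
    """Reward the past action by whether it shrank the absolute control error."""
    if abs(err_now) < abs(err_prev):
        return 1.0    # past action moved the pressure toward the target
    if abs(err_now) > abs(err_prev):
        return -1.0   # past action moved the pressure away from the target
    return 0.0        # unchanged error; this case is left open in the text
```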
The control condition learning unit 72 performs reinforcement learning for each of the motors M1 and M2. Specifically, the control condition learning unit 72 updates a control condition (a learning result) based on the reward rw, and selects an optimum action ac corresponding to the current state st under the updated control condition. The control condition is a combination of a state st and an action ac. Table 1 shows simple control conditions (learning results) corresponding to the above-described states st1 to st8. In the example shown in
Table 1 shows control conditions (learning results) obtained by Q learning, which is an example of reinforcement learning. The aforementioned eight states st1 to st8 are shown in the uppermost row of Table 1, i.e., in the second to ninth columns. Meanwhile, five actions ac1 to ac5 are shown in the leftmost column of Table 1, i.e., in the second to sixth rows.
Note that, in the example shown in Table 1, an action for reducing the output of the motor M2 shown in
A value determined by a combination of a state st and an action ac in Table 1 is called a quality Q(st, ac). After an initial value is given, the quality Q is successively updated based on the reward rw by using a known updating formula. The initial value of the quality Q is included in, for example, the learning condition shown in
The quality Q will be described by using the state st7 in Table 1 as an example. In the state st7, since the control error err is no smaller than +3.0 MPa and smaller than +4.0 MPa, the measured pressure pv is higher than the target pressure and the rotation speed of the gear pump GP is too low. That is, since the output of the motor M2, which drives the gear pump GP, is too low, it is necessary to increase the output of the motor M2. Therefore, as a result of learning by the control condition learning unit 72, the qualities Q of the actions ac4 and ac5 for increasing the output of the motor M2 are large. On the other hand, the qualities Q of the actions ac1 and ac2 for decreasing the output of the motor M2 are small.
In the example shown in Table 1, when the control error err is +3.5 MPa, for example, the state st falls in the state st7. Therefore, the control condition learning unit 72 selects the optimum action ac4, which has the maximum quality Q in the state st7, and outputs the selected action ac4 to the control signal output unit 74.
The control signal output unit 74 outputs a control signal ctr for increasing the output of the motor M2 by 0.5% to the motor M2 based on the action ac4 received from the control condition learning unit 72.
Then, when the absolute value of the next control error err is smaller than the absolute value 3.5 MPa of the current control error err, the state observation unit 71 determines that the selection of the action ac4 in the current state st7 is appropriate, and outputs a reward rw having a positive value. Therefore, the control condition learning unit 72 updates the control condition so as to increase the quality +3.6 of the action ac4 in the state st7 according to the reward rw. As a result, in the case of the state st7, the control condition learning unit 72 continues to select the action ac4.
On the other hand, when the absolute value of the next control error err is larger than the absolute value 3.5 MPa of the current control error err, the state observation unit 71 determines that the selection of the action ac4 in the current state st7 is inappropriate, and outputs a reward rw having a negative value. Therefore, the control condition learning unit 72 updates the control condition so as to reduce the quality +3.6 of the action ac4 in the state st7 according to the reward rw. As a result, when the quality of the action ac4 in the state st7 becomes smaller than the quality +2.6 of the action ac5, the control condition learning unit 72 selects the action ac5 instead of the action ac4.
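The selection-and-update cycle described for the state st7 can be sketched with a small Q-table. The learning rate and discount factor are assumptions, since the specification only refers to a known updating formula; the qualities +3.6 and +2.6 follow the example above, and all other entries are initialized to an assumed value of zero.

```python
# Q-table: one quality Q(st, ac) per (state, action) pair, as in Table 1.
states = [f"st{i}" for i in range(1, 9)]
actions = ["ac1", "ac2", "ac3", "ac4", "ac5"]
Q = {s: {a: 0.0 for a in actions} for s in states}
Q["st7"]["ac4"] = 3.6   # example qualities from the st7 discussion above
Q["st7"]["ac5"] = 2.6

def select_action(state):
    """Greedy policy: pick the action with the maximum quality Q(st, ac)."""
    return max(Q[state], key=Q[state].get)

def update_q(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update toward reward + discounted best next quality."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```

With repeated negative rewards, the quality of ac4 in st7 decays until it drops below that of ac5, at which point the greedy selection switches to ac5, as described in the text.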
Note that the timing of the updating of the control condition is not limited to the next time (e.g., not limited to when the control error is calculated the next time). That is, the timing of the updating may be determined as appropriate while taking a time lag or the like into consideration. Further, in the initial stage of the learning, the action ac may be randomly selected in order to expedite the learning. Further, although reinforcement learning by simple Q learning is described above with reference to Table 1, there are various types of learning algorithms, such as Q learning, the AC (Actor-Critic) method, TD learning, and the Monte Carlo method, and the learning algorithm is not limited to any particular type. For example, when the numbers of states st and actions ac increase and the number of their combinations grows explosively, an algorithm suited to the situation, such as the AC method, may be selected.
Further, in the AC method, a probability distribution function is used as a policy function in many cases. The probability distribution function is not limited to the normal distribution function. For example, for the purpose of simplification, a sigmoid function, a softmax function, or the like may be used. The sigmoid function is one of the functions most commonly used in neural networks. Since reinforcement learning, like neural networks, is a type of machine learning, the sigmoid function can also be used in reinforcement learning. Further, the sigmoid function has the additional advantage that the function itself is simple and easy to handle.
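As an illustration of a softmax policy over the qualities of one state, the following sketch converts qualities into selection probabilities; the temperature value is an assumption, and the max-subtraction is a standard numerical-stability measure rather than anything stated in the specification.

```python
import math

def softmax_policy(qualities, temperature=1.0):
    """Return selection probabilities proportional to exp(Q / temperature)."""
    m = max(qualities)  # subtract the max so exp() cannot overflow
    exps = [math.exp((q - m) / temperature) for q in qualities]
    total = sum(exps)
    return [e / total for e in exps]
```

Unlike the greedy selection, a softmax policy still assigns nonzero probability to lower-quality actions, which keeps some exploration in the learning.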
As described above, there are various learning algorithms and functions to be used, and an optimum algorithm and an optimum function may be selected as appropriate for the process.
As explained above, the PID control is not used in the extrusion molding apparatus according to the first embodiment. Therefore, to begin with, there is no need to adjust a parameter(s) which would otherwise be necessary when a process condition is changed. Further, the control unit 70 updates the control condition (the learning result) based on the reward rw through the reinforcement learning, and selects an optimum action ac corresponding to the current state st under the updated control condition. Therefore, even when a process condition(s) is changed, it is possible to reduce the time taken for the adjustment and the amount of a resin material required therefor as compared to those in the comparative example.
Note that the products manufactured by the extrusion molding apparatus according to the first embodiment are not limited to resin films, and may be pipe materials, rod materials, covering materials for wires, or the like. Further, the extrusion molding apparatus according to the first embodiment may be used for extrusion molding of parison for blow molding.
Next, an outline of a method for controlling an extrusion molding apparatus according to the first embodiment will be described with reference to
Firstly, as shown in
Next, as shown in
Next, as shown in
When the manufacturing of the resin film 83 has not been completed (Step S4 NO), the process returns to the step S3 and the control is continued. On the other hand, when the manufacturing of the resin film 83 has been completed (Step S4 YES), the control is finished. That is, the step S3 is repeated until the manufacturing of the resin film 83 is completed.
In
Next, details of the process for adjusting the rotation speed of the gear pump GP (Step S2) will be described with reference to
Firstly, as shown in
Next, the control condition learning unit 72 of the control unit 70 updates a control condition, which is a combination of a state st and an action ac, based on the reward rw. Then, the control condition learning unit 72 selects an optimum action ac corresponding to the current state st under the updated control condition (Step S22). Note that, at the start of the control, the control condition is not updated and remains as the initial value, but the optimum action ac corresponding to the state st at the start of the control is selected.
Then, the control signal output unit 74 of the control unit 70 outputs a control signal ctr to the motor M2 of the gear pump GP based on the optimum action ac selected by the control condition learning unit 72 (Step S23).
When the rotation speed of the gear pump GP has not been stabilized and hence the adjustment of the rotation speed of the gear pump GP has not been completed (Step S24 NO), the process returns to the step S21 and the adjustment of the rotation speed of the gear pump GP is continued. On the other hand, when the rotation speed of the gear pump GP has been stabilized (Step S24 YES), the adjustment of the rotation speed of the gear pump GP is finished. That is, the steps S21 to S23 are repeated until the adjustment of the rotation speed of the gear pump GP is completed. When the adjustment of the rotation speed of the gear pump GP has been completed, i.e., when the step S2 has been finished, the process goes to the step S3 shown in
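The loop of the steps S21 to S24 can be sketched structurally as follows. All the callables are placeholders standing in for the units described above, not an actual implementation, and the iteration cap is an assumption added to keep the sketch finite.

```python
def adjust_pump_speed(observe, learn_and_select, output_signal, is_stable,
                      max_iters=1000):
    """Repeat steps S21-S23 until the pump speed stabilizes (step S24)."""
    for _ in range(max_iters):
        state, reward = observe()                 # Step S21: state observation unit 71
        action = learn_and_select(state, reward)  # Step S22: control condition learning unit 72
        output_signal(action)                     # Step S23: control signal ctr to motor M2
        if is_stable():                           # Step S24: speed stabilized?
            return True
    return False
```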
As explained above, in the extrusion molding apparatus according to the first embodiment, the PID control is not used for the adjustment of the rotation speed of the gear pump GP. Therefore, to begin with, there is no need to adjust a parameter(s) which would otherwise be necessary when a process condition is changed. Further, the control condition (the learning result) is updated based on the reward rw through the reinforcement learning using a computer, and an optimum action ac corresponding to the current state st is selected under the updated control condition. Therefore, even when a process condition(s) is changed, it is possible to reduce the time taken for the adjustment of the rotation speed of the gear pump GP and the amount of a resin material required therefor as compared to those in the comparative example.
Next, details of the process for controlling the rotation speed of the screw 12 (Step S3) during the manufacturing of a product will be described with reference to
Firstly, as shown in
Next, the control condition learning unit 72 of the control unit 70 updates a control condition, which is a combination of a state st and an action ac, based on the reward rw. Then, the control condition learning unit 72 selects an optimum action ac corresponding to the current state st under the updated control condition (Step S32). Note that, at the start of the control, the control condition is not updated and remains as the initial value, but the optimum action ac corresponding to the state st at the start of the control is selected.
Then, the control signal output unit 74 of the control unit 70 outputs a control signal ctr to the motor M1 of the screw 12 based on the optimum action ac selected by the control condition learning unit 72 (Step S33).
When the manufacturing of the resin film 83 has not been completed (Step S4 NO), the process returns to the step S31 and the control is continued. On the other hand, when the manufacturing of the resin film 83 has been completed (Step S4 YES), the control is finished. That is, the steps S31 to S33 are repeated until the manufacturing of the resin film 83 is completed.
As explained above, in the extrusion molding apparatus according to the first embodiment, the PID control is not used for the control of the rotation speed of the screw 12 during the manufacturing of a product. Therefore, to begin with, there is no need to adjust a parameter(s) which would otherwise be necessary when a process condition is changed. Further, the control condition (the learning result) is updated based on the reward rw through the reinforcement learning using a computer, and an optimum action ac corresponding to the current state st is selected under the updated control condition. Therefore, as compared to the comparative example, it is possible to improve the yield rate of products in a situation in which a process condition(s) is changed, and to flexibly respond to fluctuations in the pressure of the molten resin caused by an external factor(s) during the manufacturing of products.
Next, an extrusion molding apparatus according to a second embodiment will be described with reference to
Similarly to the first embodiment, the state observation unit 71 determines a current state st and a reward rw for an action ac selected in the past based on a difference (a control error err) between the pressure pv measured by the pressure sensor PS and the target pressure. Then, the state observation unit 71 outputs the current state st and the reward rw to the control condition learning unit 72. Further, the state observation unit 71 according to the second embodiment outputs the calculated control error err to the PID controller 74a.
Similarly to the first embodiment, the control condition learning unit 72 also performs reinforcement learning for each of the motors M1 and M2. Specifically, the control condition learning unit 72 updates a control condition (a learning result) based on the reward rw, and selects an optimum action ac corresponding to the current state st under the updated control condition. Note that in the first embodiment, the output to the motor M2 is directly changed according to the content (i.e., the details) of the action ac selected by the control condition learning unit 72. In contrast, in the second embodiment, a parameter(s) of the PID controller 74a, which controls the output of the motor M2, is changed according to the content (i.e., the details) of the action ac selected by the control condition learning unit 72.
As shown in
As described above, in the extrusion molding apparatus according to the second embodiment, PID control is used, so that it is necessary to adjust a parameter(s) when a process condition(s) is changed. In the extrusion molding apparatus according to the second embodiment, the control unit 70 updates the control condition (the learning result) based on the reward rw through the reinforcement learning, and selects an optimum action ac corresponding to the current state st under the updated control condition. Note that the action ac in the reinforcement learning is to change a parameter of the PID controller 74a, which controls the output of the motor M2. Therefore, even when a process condition(s) is changed, it is possible to reduce the time taken for the adjustment of the parameter and the amount of a resin material required therefor as compared to those in the comparative example.
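The idea of the second embodiment, i.e., reinforcement-learning actions tuning a PID parameter rather than the motor output directly, can be sketched as follows. The gain values, the 5% adjustment step, and the action names are all assumptions for illustration.

```python
class PID:
    """Textbook PID controller; the gains are placeholders, not tuned values."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self._integral = 0.0
        self._prev_err = 0.0

    def step(self, err, dt=1.0):
        """Return the control output for the current control error err."""
        self._integral += err * dt
        deriv = (err - self._prev_err) / dt
        self._prev_err = err
        return self.kp * err + self.ki * self._integral + self.kd * deriv

def apply_action(pid, action):
    """An RL action adjusts a PID parameter instead of the motor output itself."""
    if action == "raise_kp":
        pid.kp *= 1.05
    elif action == "lower_kp":
        pid.kp /= 1.05
```

Here the PID controller keeps driving the motor output every cycle, while the learning loop only nudges its gains, which matches the division of roles between the PID controller 74a and the control condition learning unit 72 described above.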
The rest of the configuration is similar to that of the first embodiment, and therefore the description thereof will be omitted. The same applies to the control of a parameter(s) of a PID controller (a second PID controller) that controls the output of the motor M1, which is the driving source of the screw 12.
The program can be stored and provided to a computer using any type of non-transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g., magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.). The program may be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
From the disclosure thus described, it will be obvious that the embodiments of the disclosure may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the disclosure, and all such modifications as would be obvious to one skilled in the art are intended for inclusion within the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
2020-205659 | Dec 2020 | JP | national |