METHODS AND APPARATUS TO AUTOMATE THE MANAGEMENT OF INTENSIVELY MANAGED MILK PRODUCING LIVESTOCK TO PRODUCE CUSTOMIZED PRODUCT DEPENDING ON END-USE USING MACHINE LEARNING

BACKGROUND

The embodiments described herein relate to methods and apparatus for automated management of livestock and/or the production of bioproducts by the livestock, using machine learning, to produce customized product depending on an identified end use.

In the past decade, technological advancements in intensive livestock management techniques have expanded greatly. Farmers can economically collect data on many aspects of animal health and bioproduct (e.g., milk) parameters during various phases of life of an animal (e.g., during phases of a lactation cycle). Using this data, veterinarians can attempt to alter feed blends that the animals consume to alter animal health or the bioproduct. In some instances, input of other professionals (e.g., veterinarians) can be used to inform decisions (e.g., selection of feed). Conventionally, this gathering of data and decision making is done manually and is time consuming, uneconomical, and error prone. Accordingly, there exists a need to automate a process of management of livestock that produce bioproducts that meet multiple customers, providing farmers the flexibility to cost effectively provide tailored bioproducts for their clients.

SUMMARY

In some embodiments, a method includes receiving an indication of a target quality of a property associated with a bioproduct obtained from a managed livestock. The target quality can be associated with an identified end-use. The method further includes receiving an indication of a current health status of the managed livestock and generating a set of input vectors based on the target quality of the property. The method further includes providing the set of input vectors to a machine learning model to generate an output indicating a feed selection to be used to feed the managed livestock. The feed selection is such that, upon consumption, it increases a likelihood of meeting the target quality of the property. The method further includes administering a feed blend to the managed livestock, the feed blend including the feed selection.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustration of an automatic livestock management system, according to an embodiment.

FIG. 2 is a schematic representation of a compute device included in an automatic livestock management system, according to an embodiment.

FIG. 3 is a schematic representation of a livestock management device included in an automatic livestock management system, according to an embodiment.

FIG. 4 is a schematic illustration of a flow of information within a livestock management system to manage livestock producing milk targeted for multiple customers, according to an embodiment.

FIG. 5 is a schematic representation of an interaction between an agent included in an automatic livestock management system and an environment in which the agent takes action to implement an automatic livestock management process, according to an embodiment.

FIG. 6 is a schematic representation of a flow of information within a livestock management system implementing machine learning to manage livestock producing milk targeted for multiple end-uses, according to an embodiment.

FIG. 7 is a flowchart describing a method of managing a livestock producing a bioproduct, according to an embodiment.

FIG. 8 is a plot of an example feed schedule generated by a livestock management system to achieve a target quality of a bioproduct based on end use, according to an implementation.

FIG. 9 is a flowchart describing a method of managing a livestock producing a bioproduct, according to an embodiment.

FIG. 10 is a schematic representation of states and state changes assumed by one or more agents implemented by a livestock management system, according to an embodiment.

FIG. 11 is a schematic representation of a sequence of state changes including options assumed by agents included in a livestock management system, according to an embodiment.

FIG. 12 is a schematic representation of interaction between agents implemented by a livestock management system using hierarchical models and the external world environment, according to an embodiment.

FIG. 13 is a schematic representation of a flow of information in a livestock management system implementing agents and temporal abstractions to learn relationships in a world environment, according to an embodiment.

FIG. 14 is a schematic representation of an example hierarchical model implemented by a livestock management system, according to an embodiment.

FIG. 15 is a schematic representation of a flow of information in a livestock management system implementing generation of synthetic states, according to an embodiment.

FIGS. 16A and 16B are schematic representations of state transition graphs that can be implemented by a livestock management system, according to an embodiment. The graphs are shown to be without and with including synthetic states and actions, respectively.

FIG. 17 is a schematic representation of an example world state transition graph implemented in a world environment, and a set of example synthetic state transition graphs similar to the world state transition graph but also including synthetic states and synthetic actions that can be implemented by a livestock management system, according to an embodiment.

DETAILED DESCRIPTION

In some embodiments, an apparatus includes a memory and a processor operatively coupled to the memory. The processor can be configured to receive an indication of a target quality of a property associated with a bioproduct obtained from a managed livestock, the target quality being associated with an identified end-use. The processor can be further configured to receive an indication of a health status of the managed livestock. The processor can be further configured to generate a set of input vectors based on the target quality of the property. The processor can be further configured to provide the set of input vectors to a machine learning model to generate an output indicating a feed selection to be used to feed the managed livestock. The feed selection can, upon consumption, increase a likelihood of meeting the target quality of the property.

In some embodiments, an apparatus includes a memory and a processor. The processor is configured to train a machine learning model to receive a target quality of a property associated with a bioproduct of a first managed livestock, receive inputs associated with a health status of the first managed livestock, and determine a temporal abstraction based on the target property and the inputs to be used to identify a feed selection. The feed selection is configured to increase a likelihood of achieving the target quality of the property associated with the bioproduct of the first managed livestock, the target quality being associated with an identified end-use. The processor is further configured to receive a target value of the property associated with the bioproduct produced by a second managed livestock. The processor is further configured to receive, at a first time, a first indication of the property associated with the bioproduct produced by the second managed livestock, and generate a set of feature vectors based on the target value of the property and the first indication of the property. The processor is further configured to provide the set of feature vectors to the machine learning model to generate, based on the temporal abstraction and the first indication of the property, a first output including a first feed selection. The first feed selection is configured to, upon consumption by the second managed livestock, increase a likelihood of achieving the target value of the property associated with the bioproduct of the second managed livestock based on the first indication of the property. The processor is further configured to receive, at a second time after the first time, a second indication of the property, and compare the second indication of the property with at least one of the first indication of the property or the target value of the property, to calculate a difference metric. The machine learning model is configured to adaptively update, based on the difference metric, the temporal abstraction to generate a second output including a second feed selection configured to, upon consumption by the second managed livestock, increase a likelihood of achieving the target value of the property associated with the bioproduct of the second managed livestock based on the second indication of the property.

Disclosed embodiments include a non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the instructions including code to cause the processor to receive a target value of a property associated with a bioproduct obtained from a managed livestock. The target value can be associated with an identified end-use of the bioproduct. The instructions can further include code to cause the processor to receive, at a first time, a first indication of the property associated with the bioproduct obtained from the managed livestock, and generate a set of input vectors based on the target value of the property and the first indication of the property associated with the bioproduct. The instructions can further include code to cause the processor to provide the set of input vectors to a machine learning model associated with a set of hyperparameters to generate a first output indicating a first feed selection to be used to feed the managed livestock. The first feed selection can be configured to, upon consumption, increase a likelihood of achieving the target value associated with the bioproduct based on the first indication of the property associated with the bioproduct. The instructions can further include code to cause the processor to receive at a second time after the first time, a reward signal associated with a second indication of the property associated with the bioproduct. The instructions can further include code to cause the processor to automatically adjust at least one hyperparameter from the set of hyperparameters, in response to the reward signal. The machine learning model can be configured to generate a second output indicating a second feed selection to be used to feed the managed livestock, the second feed selection configured to, upon consumption, increase a likelihood of achieving the target value associated with the bioproduct based on the second indication of the property associated with the bioproduct.

FIG. 1 is a schematic illustration of a livestock management system 100, also referred to herein as “an LM system” or “a system”. The LM system 100 is configured to help in intensive management of livestock that produce bioproducts (e.g., milk, eggs, fiber (e.g., wool), honey, etc.) using machine learning models and/or tools. The LM system 100 can be configured to provide procedural guidance directed to obtain bioproducts that can have customized characteristics based on the intended end-use of the bioproducts. For example, commercial production of milk can have multiple end-uses including distribution of milk, production/distribution of cheeses, production/distribution of butter and/or yogurts. Commercial customers (producers of milk-based products) can have different specifications for milk content that they receive from farmers that manage livestock. Meeting the specifications of various commercial customers typically involves farmers that manually manage a process of resource allocation, and maintenance of animals to produce milk having specific characteristics that meet each of the requirements of the various customers. The LM system 100 is configured to implement machine learning models and/or tools to handle the process of resource allocation, organization, and/or maintenance of animals in managed livestock to meet specific requirements of the bioproducts efficiently. The LM system 100 can also be configured to carry out the maintenance automatically with minimal human intervention to aid milk producing farms to produce milk optimized for their various client needs.

The livestock management (LM) system 100 is configured to manage receiving information from a set of compute devices 101-104 and, based on the information, implement an automatic livestock management process including evaluating procedural alternatives, making choices from the alternatives, and/or implementing rules. The choices, decisions, or rules can be associated with any suitable action or resource related to intensively managed livestock (e.g., animal selection for generating bioproduct, feed selection, medicine selection, bioproduct analysis, resource allocation, and/or the like). The livestock management system 100 can receive data related to health and/or bioproduct produced by a cohort of animals. In some instances, the LM system 100 can receive data related to a quality of bioproduct of interest to a specified customer. In some instances, the LM system can receive data related to costs of maintenance of a cohort of livestock and/or an efficiency associated with a yield of bioproduct from a cohort of livestock. Based on a received data, the LM system 100 can evaluate past and/or new protocols of livestock management including animal selection for generating bioproduct, feed selection, medicine selection, bioproduct analysis, resource allocation, and/or the like, according to an embodiment. The LM system 100 includes compute devices 101, 102, 103, and 104, connected to a livestock management device 105 (also referred to as “the device”) through a communications network 106, as illustrated in FIG. 1. While the LM system 100 is illustrated to include four compute devices 101-104, a similar LM system can include any number of compute devices.

In some embodiments, the communication network 106 (also referred to as “the network”) can be any suitable communications network for transferring data, operating over public and/or private networks. For example the network 106 can include a private network, a Virtual Private Network (VPN), a Multiprotocol Label Switching (MPLS) circuit, the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a worldwide interoperability for microwave access network (WiMAX®), an optical fiber (or fiber optic)-based network, a Bluetooth® network, a virtual network, and/or any combination thereof. In some instances, the communication network 106 can be a wireless network such as, for example, a Wi-Fi or wireless local area network (“WLAN”), a wireless wide area network (“WWAN”), and/or a cellular network. In other instances, the communication network 106 can be a wired network such as, for example, an Ethernet network, a digital subscription line (“DSL”) network, a broadband network, and/or a fiber-optic network. In some instances, the network can use Application Programming Interfaces (APIs) and/or data interchange formats, (e.g., Representational State Transfer (REST), JavaScript Object Notation (JSON), Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), and/or Java Message Service (JMS)). The communications sent via the network 106 can be encrypted or unencrypted. In some instances, the communication network 106 can include multiple networks or subnetworks operatively coupled to one another by, for example, network bridges, routers, switches, gateways and/or the like (not shown).

The compute devices 101, 102, 103, and 104, in the LM system 100 can each be any suitable hardware-based computing device and/or a multimedia device, such as, for example, a device, a desktop compute device, a smartphone, a tablet, a wearable device, a laptop and/or the like.

FIG. 2 is a schematic block diagram of an example compute device 201 that can be a part of an LM system such as the LM system 100 described above with reference FIG. 1, according to an embodiment. The compute device 201 can be structurally and functionally similar to the compute devices 101-104 of the system 100 illustrated in FIG. 1. The compute device 201 can be a hardware-based computing device and/or a multimedia device, such as, for example, a device, a desktop compute device, a smartphone, a tablet, a wearable device, a laptop and/or the like. The compute device 201 includes a processor 211, a memory 212 (e.g., including data storage), and a communicator 213.

The processor 211 can be, for example, a hardware based integrated circuit (IC) or any other suitable processing device configured to run and/or execute a set of instructions or code. For example, the processor 211 can be a general purpose processor, a central processing unit (CPU), an accelerated processing unit (APU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), a complex programmable logic device (CPLD), a programmable logic controller (PLC) and/or the like. The processor 211 can be operatively coupled to the memory 212 through a system bus (for example, address bus, data bus and/or control bus).

The processor 211 can be configured to collect, record, log, document, and/or journal data associated with health and/or yield of bioproduct of a cohort of managed livestock. In some instances, the compute device 201 can be associated with a farmer, a veterinarian, animal handling personnel, and/or the like who collect/log data associated with a health of animals or data associated with a bioproduct produced by the animals. In some instances, the compute device 201 can be associated with a customer interested in the purchase of bioproducts produced by a managed livestock. In some instances, the compute device 201 can be associated with an entity providing analytical services to analyze the contents of samples. For example, the compute device can be associated with an analytical service provider configured to analyze the contents of milk produced by a cohort of managed livestock.

The processor 211 can include a data collector 214. The processor can optionally include the history manager 231, and application 241. In some embodiments, the data collector 214, the data history manager 231 and/or the application 241 can include a process, program, utility, or a part of a computer's operating system, in the form of code that can be stored in memory 212 and executed by the processor 211.

In some embodiments, each of the data collector 214, the history manager 231, and/or the application 241 can be software stored in the memory 212 and executed by processor 211. For example, each of the above-mentioned portions of the processor 211 can be code to cause the processor 211 to execute the data collector 214, the history manager 231, and/or the software application 241. The code can be stored in the memory 212 and/or a hardware-based device such as, for example, an ASIC, an FPGA, a CPLD, a PLA, a PLC and/or the like. In other embodiments, each of the data collector 214, the history manager 231, and/or the application 241 can be hardware configured to perform the specific respective functions.

The data collector 214 can be configured to run as a background process and collect or log data related to cohorts of animals in a managed livestock. In some instances, the data can be logged by personnel via the application 241 in the compute device 201. In some instances, the data can be automatically logged by sensors associated with the compute device 201 (not shown in FIG. 2). The sensors may be operated via the application 241 in the compute device 201. The sensors can be configured to automatically log data at specified time points or intervals and the data can be recorded by the data collector 214. As an example, an animal handling system in a farm can include an automatic health analysis sensor that measures and records values associated with an animal (e.g., weight, temperature, etc.). As another example, an animal handling system in a farm can include an automatic content analysis sensor that is associated with a bioproduct collection system (e.g., a milk collection system). The content analysis sensor can be configured to automatically measure values (e.g. temperature, pH, weight, volume, density, relative and/or absolute fat/protein/dry extract content, etc.).

The data collector 214 can monitor, collect and/or store data or information related to health status data, feed data, data related to yield, quantity, and/or quality of bioproducts produced, medical supplements data, data associated with targeted end-use, customer requirements of quantity, properties associated with a bioproduct, and/or target qualities of properties associated with bioproducts, and/or the like.

In some instances, the data collector 214 can store the information collected in any suitable form such as, for example, in the form of text-based narrative of events, tabulated sequence of events, data from sensors, and/or the like. In some instances, the data collector 214 can also analyze the data collected and store the results of the analysis in any suitable form such as, for example, in the form of event logs, or look-up tables, etc. The data collected by the data collector 214 and/or the results of analyses can be stored for any suitable period of time in the memory 212. In some instances, the data collector 214 can be further configured to send the collected and/or analyzed data, via the communicator 213, to a device that may be part of an LM system to which the compute device 201 is connected (e.g., the LM device 105 of the system 100 illustrated in FIG. 1). In some instances, the data collector 214 can be configured to send the collected and/or analyzed data automatically (e.g., at specified time points, or periodically with a predetermined frequency of communication), in response to receiving an instruction from a user to send the analyzed data, and/or in response to a query from the LM device for the analyzed data.

In some embodiments, the history manager 231 of the processor 211 can be configured to maintain logs or schedules associated with a history of handling or management of animals in a cohort of livestock, the quantity/quality of feed and/or medicinal supplement provided, quantity/quality of bio products produced, the costs associated with the maintenance of the cohort of animals, and/or the like. The history manager 231 can also be configured to maintain a log of information related to the sequence of events (e.g., interventions provided to animals) and/or a concurrent set of data logged indicating health and/or production of the animals.

The memory 212 of the compute device 201 can be, for example, a random-access memory (RAM), a memory buffer, a hard drive, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and/or the like. The memory 212 can be configured to store any data collected by the data collector 214, or data processed by the history manager 231, and/or the application 241. In some instances, the memory 212 can store, for example, one or more software programs and/or code that can include instructions to cause the processor 211 to perform one or more processes, functions, and/or the like (e.g., the data collector 214, the history manager 231 and/or the application 241). In some embodiments, the memory 212 can include extendable storage units that can be added and used incrementally. In some implementations, the memory 212 can be a portable memory (for example, a flash drive, a portable hard disk, and/or the like) that can be operatively coupled to the processor 211. In some instances, the memory can be remotely operatively coupled with the compute device. For example, a remote database device can serve as a memory and be operatively coupled to the compute device.

The communicator 213 can be a hardware device operatively coupled to the processor 211 and memory 212 and/or software stored in the memory 212 executed by the processor 211. The communicator 213 can be, for example, a network interface card (NIC), a Wi-Fi™ module, a Bluetooth® module and/or any other suitable wired and/or wireless communication device. Furthermore, the communicator 213 can include a switch, a router, a hub and/or any other network device. The communicator 213 can be configured to connect the compute device 201 to a communication network (such as the communication network 106 shown in FIG. 1). In some instances, the communicator 213 can be configured to connect to a communication network such as, for example, the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a worldwide interoperability for microwave access network (WiMAX®), an optical fiber (or fiber optic)-based network, a Bluetooth® network, a virtual network, and/or any combination thereof.

In some instances, the communicator 213 can facilitate receiving and/or transmitting data or files through a communication network (e.g., the communication network 106 in the LM system 100 of FIG. 1). In some instances, received data and/or a received file can be processed by the processor 211 and/or stored in the memory 212 as described in further detail herein. In some instances, as described previously, the communicator 213 can be configured to send data collected and/or processed by the data collector 214 and/or history manager 231 to a device of an LM system (e.g., LM device 105) to which the compute device 201 is connected.

Returning to FIG. 1, the compute devices 101-104 that are connected to LM system 100 can be configured to communicate with an LM device 105 via the communication network 106. FIG. 3 is a schematic representation of an LM device 305 that is part of an LM system. The LM device 305 can be structurally and/or functionally similar to the LM device 105 of the system 100 illustrated in FIG. 1. The LM device 305 includes a communicator 353, a memory 352, and a processor 351.

Similar to the communicator 213 within compute device 201 of FIG. 2, the communicator 353 of the LM device 305 can be a hardware device operatively coupled to the processor 351 and the memory 352 and/or software stored in the device memory 352 executed by the processor 351. The communicator 353 can be, for example, a network interface card (NIC), a Wi-Fi™ module, a Bluetooth® module and/or any other suitable wired and/or wireless communication device. Furthermore, the communicator 353 can include a switch, a router, a hub and/or any other network device. The communicator 353 can be configured to connect the LM device 305 to a communication network (such as the communication network 106 shown in FIG. 1). In some instances, the communicator 353 can be configured to connect to a communication network such as, for example, the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a worldwide interoperability for microwave access network (WiMAX®), an optical fiber (or fiber optic)-based network, a Bluetooth® network, a virtual network, and/or any combination thereof.

The memory 352 of the LM device 305 can be a random-access memory (RAM), a memory buffer, a hard drive, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and/or the like. The device memory 352 can store, for example, one or more software modules and/or code that can include instructions to cause the device processor 351 to perform one or more processes, functions, and/or the like. In some implementations, the device memory 352 can be a portable memory (e.g., a flash drive, a portable hard disk, and/or the like) that can be operatively coupled to the device processor 351. In some instances, the device memory can be remotely operatively coupled with the device. For example, the device memory can be a remote database device operatively coupled to the device and its components and/or modules.

The processor 351 can be a hardware based integrated circuit (IC) or any other suitable processing device configured to run and/or execute a set of instructions or code. For example, the processor 351 can be a general purpose processor, a central processing unit (CPU), an accelerated processing unit (APU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), a complex programmable logic device (CPLD), a programmable logic controller (PLC) and/or the like. The processor 351 is operatively coupled to the memory 352 through a system bus (e.g., address bus, data bus and/or control bus). The processor 351 is operatively coupled with the communicator 353 through a suitable connection or device as described in further detail.

The processor 352 can be configured to include and/or execute several components, units and/or instructions that may be configured to perform several functions, as described in further detail herein. The components can be hardware-based components (e.g., an integrated circuit (IC) or any other suitable processing device configured to run and/or execute a set of instructions or code) or software-based components (executed by the processor 352), or a combination of the two. As illustrated in FIG. 3, the processor 351 includes a data aggregator 355, an agent manager 356, an ML model 357, and a predictor 358.

The data aggregator 355 in the processor 351 can be configured to receive communications between the device 305 and compute devices connected to the device 305 through suitable communication networks (e.g., compute devices 101-104 connected to the device 105 via the communication network 106 in the system 100 in FIG. 1). The data aggregator 355 is configured to receive, from the compute devices, information collected and/or generated by the one or more data collectors in the compute devices (e.g., data collector 214 of compute device 201 shown and described with respect to FIG. 2). The data from data collectors of various compute devices can, in some instances, include one or more logs or records or other data relating to animal handling of a managed livestock, animal feed schedule, schedules of medicinal and/or dietary supplements provided to animals, productions of bioproducts by the animals, history of yield of bioproducts including measures of a quantity and/or quality of bioproducts, an indication of properties of a bioproduct that may be of interest to an end use customer (e.g., a measure of protein content of milk that of interest to manufacturers of milk products), a measure of costs associated with maintenance of livestock, and/or the like. The end use can include, for example, drinking milk, milk used to produce cheese, milk used to produce butter, milk used to produce yogurt, milk used to produce ice cream, milk used in baking, eggs used in baking, eggs for cooking, eggs used in specific diets (e.g., ketogenic diet, paleo diet, Atkins diet, etc.), wool or any other suitable fiber derived from animals, honey derived from bees raised in intensely managed bee colonies, etc.

The data aggregator 355 is further configured to receive data associated with history managers in the compute devices (e.g., history manager 231 on compute device 201 in FIG. 2). The data associated with history manager 231 can include information associated with a history of handling or management of animals in a cohort of livestock, the quantity/quality of feed and/or medicine supplement provided, quantity/quality of bio products produced, the costs associated with the maintenance of the cohort of animals, and/or the like. In some instances, the data aggregator 355 can be configured to receive a record of information related to a sequence of events (e.g., interventions provided to animals) and/or a concurrent set of data logged indicating health and/or production of the animals. In some implementations, the data aggregator 355 can receive the information sent by the compute devices at one or more specified time points or intervals. In some implementations, the data aggregator 355 can be configured to query the compute devices at one or more specified time points or time intervals to receive the data or information in response to the query. In some implementations, the data aggregator 355 can be configured to send queries and/or receive data/information from compute devices automatically and/or in response to a user generated action (e.g., user activated transmission of a query via a software user interface). In some instances, the data aggregator 355 can be further configured to receive, from farmers and/or animal handling personnel, data associated with day-to-day or regular handling of animals in a managed livestock. The data can include an identification associated with animals, feed, current health indications (age, weight, etc.), medicine, schedule, duration, intervals, quantity of feed/medicine/supplement, quality of feed/medicine/supplement and/or the like.

In some instances, the data aggregator 355 can be further configured to receive, analyze, and/or store communications from compute devices regarding any suitable information related to end use. The information received from a compute device can include, for example, a quantity/quality associated with bioproduct content (e.g., milk, eggs, honey, fiber, etc.) desired for a particular end use, one or more threshold values of one or more properties associated with quality of the bioproduct content, and/or the like. The data aggregator 355, in some instances, can also be configured to receive analytical reports based on analysis of bioproduct samples from a specified cohort of animals.

The data aggregator 355, in some instances, can also be configured to receive information from animal health experts such as veterinarians including reports on the current health status of specified animals in a managed livestock. In some instances, the information can include a recommendation of dietary feed and/or medicine supplements to be provided to the animals based on the analysis of the current health status of the animals. In some instances, the information can include a recommendation of feed and/or medicinal blend to be provided to animals based on a target property of a bioproduct to be achieved from the animals, and/or based on the current health status of the animals.

The processor 351 includes an agent manager 356 that can be configured to generate and/or manage one or more agents configured to interact in an environment and/or implement machine learning. An agent can refer to an autonomous entity that performs actions in an environment. An environment can be defined as a state/action space that an agent can perceive, act in, and receive a reward signal regarding the quality of its action in a cyclical manner (illustrated in FIG. 5). An LM system can define a dictionary of agents including definitions of characteristics of agents, their capabilities, expected behavior, parameters and/or hyperparameters controlling agent behaviors, etc. An LM system can define a dictionary of actions available to agents. In some implementations the actions available to an agent can depend and/or vary based on the environment or world in which the agent acts. For example, an environment or world can be defined to include state/action pairs associated with the health of a cohort of livestock, providing or administering feeds and/or medicinal treatments to animals in the cohort, obtaining data indicating health status from the animals, obtaining bioproducts from animals in the groups, analyzing the contents of bioproducts, providing recommendations of medicine/dietary changes to animals in a group, administering medicine/dietary changes to animals in a group, responding to unexpected turns in health status and/or change in quantity or quality of bioproduct, and/or the like. Through this simple cyclical interaction, agents can be configured to learn to automatically interact within a world intelligently without the need of a controller (e.g., a programmer) defining every action sequence that the agent takes.

In an example implementation, agent-world interactions can include the following steps. An agent observes an input state. An action is determined by a decision-making function or policy (which can be implemented by an ML model 358). The action is performed. The agent receives a scalar reward or reinforcement from the environment in response to the action being performed. Information about the reward given for that state/action pair is recorded. The agent can be configured to learn based on the recorded history of state/action pair and the associated reward. Each state/action pair can be associated with a value using a value function under a specific policy. Value functions can be state-action pair functions that estimate how good a particular action can be at a given state, or what the return for that action is expected to be. In some implementations, the value of a state (s) under a policy (p) can be designated V^p(s). A value of taking an action (a) when at state (s) under the policy (p) can be designated Q^p(s,a). The goal of the LM device 305 can then be estimating these value functions for a particular policy. The estimated value functions can then be used to determine sequences of actions that can be chosen in an effective and/or accurate manner such that each action is chosen to provide an outcome that maximizes and/or increases total reward possible, after being at a given state.

As an example, the agent manager 356 can define a virtualized environment that includes the virtualized management of a specified cohort of virtualized animals of a managed livestock (e.g., goats). The virtualized environment can be developed using data aggregated by the data aggregator 355.

The managed livestock can be raised to produce a specified bioproduct (e.g., milk). The agent manager 356 can define agents that perform actions that simulate events in the real world that may impact the management of the cohort of animals of the managed livestock. For example, the agent manager 356 can define actions that can simulate providing a specified feed blend to a cohort of animals, providing a blend of medicinal supplements to the cohort of animals, measuring a health status associated with the cohort of animals, obtaining a production of a specified quantity and/or quality of a bioproduct (e.g., a volume of milk, a measured value of a protein content in milk, and/or the like), etc.

In some implementations, each agent can be associated with a state from a set of states that the agent can assume. For example, the agent can be monitoring specific values associated with a bioproduct, which can be obtained from laboratory results from testing samples of the bioproduct. The specific values can be associated with by-products obtained from samples of the bioproduct. Samples of bioproduct can be associated with concurrent or otherwise temporally related feed and/or medicinal treatments used. Samples can be obtained and analyzed intermittently to aid in the monitoring. For example, the agent can use and/or track one or more values associated with by-products that can be in proximity to a set of target values. The by-product values and/or the target values can be used to define a reward function of the agent. In one example, the agent can be configured to optimize feed recommendations for a cheese producing customer who desires a target value of an optimal, desired, and/or sufficient value of average fat percentage in the milk that they purchase. In some implementations, the reward signal is computed based on analysis of a sample and a computation of a percent difference or a percent error of a value associated with the sample for example, a content of a by-product such as fat content in milk, or the like, compared to a selected target value. The reward signal can be configured to have a higher value for when the percent difference or percent error is smaller than a first threshold, that is the content of the by-product (e.g., fat) in the bioproduct (e.g., milk) is at or near the target value. The reward signal can be configured to have a lower value when the percent difference or percent error is greater than a second threshold, that is the content of the by-product (e.g., fat) in the bioproduct (e.g., milk) is farther away from the target value. Over time the LM system can learn to select feeds and medicinal treatments that maximize and/or increase the livestock cohort's ability to produce byproduct with the ideal and/or improved value of the by-product (e.g., fat content or fat percentage). In another example, the agent can be configured to optimize and/or improve feed recommendations for producing bioproduct, which can be milk, catered for a butter producing customer that has optimal or a desired protein and fat percentage needs based on which target values of protein and fat content can be set. The reward signal can be computed by root mean error of the protein and fat percentages in samples collected intermittently and compared against the target values of each, respectively. The reward signal can have a higher value when the milk is at or near the target values by a specified amount, for example, above a specified first threshold value for protein and a specified second threshold value for fat. Conversely, the reward signal can have a lower value when the protein and fat percentages are farther away from the respective target values. Over time the LM system can learn to select feeds and medicinal treatments that maximize and/or increase the livestock cohort's ability to produce by-product with a desired protein and fat percentages. Each agent can be configured to perform an action from a set of actions. The agent manager 356 can be configured to mediate an agent to perform an action, the result of which transitions the agent from a first state to a second state. In some instances, a transition of an agent from a first state to a second state can be associated with a reward. For example, an action of providing a dietary and/or medicinal supplement can result in a reward in the form of an increase in a protein content associated a milk produced by a cohort of animals of livestock. The actions of an agent can be directed towards achieving specified goals. An example goal can be maximizing rewards in an environment. For example, a goal can be defined to achieve a specified increase in a protein content associated with milk produced by a cohort of goats within a specified duration of time. The actions of agents can be defined based on observations of states of the environment obtained through data aggregated by the data aggregator 356 from compute devices or sources related to the environment (e.g., from sensors). In some instances, the actions of the agents can inform actions to be performed via actors (e.g., human or machine actors or actuators). In some instances, the agent manager 356 can generate and/or maintain several agents. The agents can be included in groups defined by specified goals. In some instances, the agent manager 356 can be configured to maintain a hierarchy of agents that includes agents defined to perform specified tasks and sub-agents under control of the agents.

In some instances, agent manager 356 can mediate and/or control agents to be configured to learn from past actions to modify future behavior. In some implementations, the agent manager 356 can mediate and/or control agents to learn by implementing principles of reinforcement learning. For example, the agents can be directed to perform actions, receive indications of rewards and associate the rewards to the performed actions. Such agents can then modify and/or retain specific actions based on the rewards that are associated with each action, to achieve a specified goal by a process directed to increase the number of rewards. In some instances, such agents can operate in what is initially an unknown environment and can become more knowledgeable and/or competent in acting in that environment with time and experience. In some implementations, agents can be configured to learn and/or use knowledge to modify actions to achieve specified goals.

In some embodiments, the agent manager 356 can configure the agents to learn to update or modify actions based on implementation of one or more machine learning models. In some embodiments, the agent manager 356 can configure the agents to learn to update or modify actions based on principles of reinforcement learning. In some such embodiments, the agents can be configured to update and/or modify actions based on a reinforcement learning algorithm implemented by the ML model 357, described in further detail herein.

In some implementations, the agent manager 356 can generate, based on data obtained from the data aggregator 355, a set of input vectors that can be provided to the ML model 357 to generate an output that determines an action of an agent. In some implementations, the agent manager 356 can generate input vectors based on inputs obtained by the data aggregator 355 including data received from compute devices and/or other sources associated with a managed livestock (e.g., sensors). In some implementations, the agent manager 356 can generate the input vectors based on a target quality of a property associated with the bioproduct. For example, the data aggregator 355 can receive data from a first compute device associated with a customer of a bioproduct (e.g., milk), from a farmer managing the livestock (e.g., goats), the customer being a manufacturer of cheese and cheese products. In some instances, the customer can provide a target quality (e.g., a targeted high level) of a property (e.g., protein content) of the bioproduct (e.g., milk) including an indication of a desired target quality of a property associated with a bioproduct produced by a cohort of animals of a managed livestock. For example, the indication can include a threshold volume of milk and/or a threshold level of protein content to be suitable for the end use of manufacturing cheese and cheese products. In some implementations, the agent manager 356 can receive the inputs obtained by the data aggregator 355 including the indication of the target quality of the property of the bioproduct, current health status of the animals, and generate input vectors to be provided to the ML model 357 to generate an output.

The ML model 357, according to some embodiments, can employ an ML algorithm to optimize a selection of schedules, feeds and/or medicines that can be used to produce bioproducts customized for different end-uses. In some instances, the ML model 357 can implement a reinforcement learning algorithm to determine action that can be undertaken by agents in a virtualized environment to arrive at predictions of indications of a selection of feed blends, feed schedules, and/or medicines to increase a probability or likelihood of achieving a specified goal. The goal can be a specific target quality of a property of a bioproduct, for example a target quality of a bioproduct desired by a specific customer for a specific end use.

The ML model 357 can be configured such that it receives input vectors and generates an output based on the input vectors, the output including an indication of a feed blend, medicine, supplements, schedule, and/or a feed selection that can increase the likelihood of meeting the target quality of the property. In some instances, the ML model 357 can be configured to generate an output indicating a feed schedule or feed blend that puts the animals producing the bioproduct on a trajectory to achieve the desired target quality within a specific time period. In some implementations, the ML model 357 can be configured to generate an output indicating a schedule to be adopted, to meet a target quantity/quality of property of bioproduct by a specific time point. In some implementations, the ML model 357 can be configured to account for a duration that the animals have to be on a particular feed schedule in order to achieve the desired type of bioproduct quality and/or quantity. The ML model 357 can be implemented using any suitable model (e.g., a statistical model, a mathematical model, a neural network model, and/or the like). The ML model 357 can be configured to receive inputs and based on the inputs generate outputs.

In some implementations, the ML model 357 can receive inputs related to a current health status of a cohort of identified animals of a managed livestock (e.g., current health status of a selected group of goats) and agents can perform actions proposed by the agent manager 356 based on one or more outputs of a machine learning (ML) model such as the ML model 357. In some implementations, the ML model 357 can be configured to model and/or implement the environment, agents, and interactions between the agents and the environment. The ML model 357 can be configured to implement agents, their actions, and/or state transitions associated with the agents and actions. In some implementations, the ML model 357 can be configured to receive inputs based on information related to health and/or yield of animals in the managed livestock and use the inputs to implement rewards in response to agent actions. For example, the inputs can include an indication of a change in a quality of a property of bioproduct (e.g., an increase in protein content), or a change in health status of an animal (e.g., a decrease in weight of an animal). The ML model 357 can implement any suitable form of learning such as supervised learning, unsupervised learning and/or reinforcement learning. The ML model 357 can be implemented using any suitable modeling tools including statistical models, mathematical models, decision trees, random forests, neural networks, etc. In some embodiments, the ML model 357 can implement one or more learning algorithms. Some example learning algorithms that can be implemented by the ML model can include Markov Decision Processes (MDPs), Temporal Difference (TD) Learning, Advantage Actor-Critic (A2C), Asynchronous Advantage Actor-Critic (A3C), Deep Q Networks (DQNs), Deep Deterministic Policy Gradient (DDPG), Evolution Strategies (ES) and/or the like. The learning scheme implemented can be based on the specific application of the task. In some instances, the ML model 357 can implement Meta-Learning, Automated Machine Learning and/or Self-Learning systems based on the suitability to the task.

The ML model 357 can incorporate the occurrence of rewards and the associated inputs, outputs, agents, actions, states, and/or state transitions in the scheme of learning. The ML model 357 can be configured to implement learning rules or learning algorithms such that upon receiving inputs indicating a desired goal or trajectory that is similar or related to a goal or trajectory that was achieved or attempted to be achieve in the past, the ML model 357 can use the history of events including inputs, outputs, agents, actions, state transitions, and/or rewards to devise an efficient strategy based on past knowledge to arrive at the solution more effectively.

While an ML model 357 is shown as included in the LM device 405, in some embodiments, the ML model can be omitted and the LM device 405 can implement a model free reinforcement learning algorithm to implement agents and their actions.

In some implementations, the ML model 357 and/or the agent manager 356 can implement hierarchical learning (e.g., hierarchical reinforcement learning) using multiple agents undertaking multi-agent tasks to achieve a specified goal. For example, a task can be decomposed into sub-tasks and assigned to agents and/or sub-agents to be performed in a partially or completely independent and/or coordinated manner. In some implementations, the agents can be part of a hierarchy of agents and coordination skills among agents can be learned using joint actions at higher level(s) of the hierarchy.

In some implementations, the ML model 357 and/or the agent manager 356 can implement temporal abstractions in learning and developing strategies to accomplish a task towards a specified goal. In some implementations, temporal abstractions can be an encapsulation of a set of repeatable sequence of actions (e.g., primitive actions) for use within a domain of implementation, but learnt from one set of tasks and applied to a different set of tasks. Temporal abstractions can be implemented using any suitable strategy including an options framework, bottleneck option learning, hierarchies of abstract machines and/or MaxQ methods.

The processor 351 further includes a predictor 358 configured to receive outputs from the ML model 357 and based on the outputs make predictions that can be tested in the real world. For example, the predictor 358 can receive outputs of ML model 357 and generate a prediction of achieving a specified target quality of a property of a bioproduct within a specified duration of time following the implementation of a feed schedule and/or a feed selection based on the outputs of the ML model 357. In some implementations, the predictor 358 can receive outputs of ML model 357 and generate a prediction of a projected amount of time needed to administer the feed selection to the managed livestock for the managed livestock to meet a specified target quality of the property. In some implementations, the predictor 358 can receive outputs of ML model 357 and generate a prediction of a projected amount of time that the managed livestock have to be fed using a recommend feed selection and/or feed schedule in a sustained manner or according to an indicated schedule for the managed livestock to meet a specified target quality of the property.

In some implementations, the predictor 358 can provide several predictions that can be used to choose a strategy to be implemented in the real world. In some implementations, the predictor 358 can be configured to recommend a feeding schedule or an animal care schedule while accounting for a duration of time that the animals will need to be under that schedule to achieve the desired goal. The scheduling needs can include a number of animals needed to produce a desired volume or quantity of bioproduct with the target quality of bioproduct for a customer's contract. In some instances, the output of the predictor 358 can be used to provide the farmer with an estimate of needs and costs to fulfill a customer's request. In some instances, the output of the predictor 358 can be used to determine profitability and quote estimation.

In use, the LM device 305 can receive inputs from one or more compute devices and/or remote sources using a data aggregator 355. The inputs can include information regarding health, handling, and/or feeding schedule of animals producing a bioproduct such as milk, information associated with a current yield (quantity/quality) of the bioproduct, indications of desired quantities/qualities in a bioproduct, etc. The LM device 305 can implement virtualized agents acting within a virtualized world or environment, using an agent manager 356 and/or an ML model 357. In some implementations, the environment can be defined in a form of a Markov decision process. For example, the environment can be modeled to include a set of environment and/or agent states (S), a set of actions (A) of the agent, and a probability of transition at a discreet time point (t) from a first state (S1) to a second state (S2), the transition being associated with an action (a).

In some implementations, the agents and/or the world can be developed based on one or more inputs or modified by one or more user inputs. The LM device 305 can provide aggregated information to the ML model 357. In some embodiments, the agent(s) can be part of the ML model 357. In some embodiments, the ML model 357 can implement the environment in which the agent(s) are configured to act. Similarly stated, in some embodiments, the agent(s) can be the ML model 357 for the system and in some embodiments, the ML model can be connected to the external environment and execute the recommendations in the environment. In some instances, the LM device 305 can receive an indication of a change in yield following an initiation of a feed schedule. The indication may include a positive change in the yield in the direction of a desired trajectory. In some instances, the LM device 305 can receive an indication of a recommendation of feed blend from a veterinarian. The recommendation can be closely aligned with a prior prediction or recommendation generated by the LM device 305. The LM device 305 can then provide the input associated with the positive change in the yield, and/or the indication of a recommendation from a veterinarian which is aligned with a recommendation of the LM device 305, in the form of a reward such that the ML model 357 can learn the positive association of a previously recommended strategy (e.g., feed blend, feed schedule, etc.) with external validation. Over time and/or a course of implementation of the virtualized environment/agents, the LM device 305 can generate an output based on the information received. The output of the ML model 357 can be used by a predictor 358 to generate a prediction of an outcome or an event or a recommendation of an event to achieve a desired goal. For example the output of the predictor 358 based on the output of the ML model 357 can include a recommendation of a feed blend or a feed schedule that a cohort of animals can be provided with for a specified period to achieve a higher likelihood of meeting a desired quality and/or quantity of the bioproduct obtained from the cohort of animals.

While the device 305 is described to have one each of a data aggregator, an agent manager, an ML model, and a predictor, in some embodiments, a device similar to the device 305 can be configured with several instances of the above mentioned units, components, and/or modules. For example, in some embodiments, the device may include several data aggregators associated with one or more compute devices or groups of compute devices. The device may include several agent managers generating and operating multiple agents as described in further detail herein. In some embodiments, the device may include several ML models and/or several predictors assigned to perform specified computations and/or predictions such as, for example, to predict a feed blend to most efficiently achieve a target property of a bioproduct, or to predict an estimated cost associated with a specified protocol of animal handling, to predict a quality (e.g., values associated with properties) of a bioproduct given a specified feed schedule and a given duration, etc. In some embodiments, one or more of the components including a data aggregator, an agent manager, an ML model, and a predictor can be omitted and/or combined with another component to perform related functions.

FIG. 4 is an illustration of a flow of information in a LM system 400, according to an implementation. The LM system 400 can be substantially similar to the system 100 in structure and/or function. In the illustrated implementation, the LM system 400 can include a LM device 405, a compute device 406 associated with a farmer managing a livestock including goats producing milk, a compute device 404 associated with an animal's health specialist (e.g., veterinarian), and compute devices 401-403 associated with customers interested in purchasing the milk produced by the goats managed by the farmer associated with the compute device 406, for use in producing milk-based foods. The customers associated with compute device 401 can be producers of cheese-based products, the customers associated with compute device 402 can be producers of yogurt-based products, and the customers associated with compute device 403 can be producers of milk.

The LM device 405 can receive inputs from the compute device 401 indicating a target quality being a desired level of dry extract and fat content (e.g., higher than a first threshold value of dry extract and lower than a second threshold value of fat content) in milk to produce cheese. The LM device 405 can receive input from the compute device 402 indicating a target quality being a desired level of protein and dry extract content (e.g., higher than a first threshold value of protein extract and higher than a second threshold value of dry extract content) in milk for yogurts. The LM device 405 can receive input from the compute device 403 indicating a target quality being a desired level of fat content (e.g., higher than a first threshold value of fat content) in milk for milk products. The LM device 405 can receive any number of inputs. For example, the LM device 405 can receive additional inputs (not shown in FIG. 4) from other compute devices (not shown) indicating a target quality being a desired level of fat content (e.g., higher than a first threshold value of fat content) in milk for butters for example. The LM device 405 can be configured to generate a strategy to achieve the desired goals associated with each customer. In some implementations, the LM system 400 can be configured to generate a cost estimate or a quote for sale of a specified or desired quantity of bioproduct i.e. milk with the desired target quality for each customer, and send information associated with the cost estimate to the respective compute devices 401-403.

The LM device 405 can send to and/or receive inputs from the compute device 404 associated with an animal health specialist (e.g., a veterinarian). In some implementations, the LM device 405 can send feeding data or other animal handling data (e.g., data received from compute device 406 associated with farmer) to the compute device 404. In some implementations, the LM system 405 can send an indication of a target quality of a property of bioproduct that is of interest (e.g., data received from compute devices 401-403 associated with end-use customers). In some implementations, the LM device 405 can receive from the compute device 404 associated with an animal health specialist an indication of a recommendation of feed schedule and/or feed blend to be provided to the animals. In some implementations, the LM device 405 can receive information and/or a recommendation related to medicinal and/or dietary supplements to be provided to the animals to increase a likelihood of achieving a target quality. In some implementations, the LM device 405 can be configured to over time learn a pattern of information or recommendation and events associated with the information or recommendation provided by the compute device 404 associated with the animal health specialist such that the LM device 405 can provide inputs in place of and/or in addition to the information or recommendations from the animal health specialist.

The LM system 400 can send to and/or receive inputs from the compute device 406 associated with a farmer. The LM system 400 can receive from the compute device 406 an indication of a health status and current schedule of maintenance of animals. The LM device can provide based on computations carried out and/or based on inputs received from other compute devices (e.g., devices 401, 402, 403, 404, 406) and/or sources (not shown) a recommendation of feed, feed blend, and/or dietary/medicinal supplement to be provided to the animals to achieve a specific target goal. In some instances, a medicine and/or a dietary supplement can be included in a feed blend or be a part of a feed schedule. In some instances, an LM system 400 can recommend aspects of animal health other than feeding. For example, an LM system 400 can recommend a schedule of animal handling including a schedule for exercise, a schedule for sleep, a schedule for light cycle, a schedule for temperature, a schedule for any other suitable activity or state, and/or the like. In some implementations, the LM device 405 can send feeding schedule and/or other animal handling schedule (e.g., data received from compute device 406 associated with farmer) to the compute device 406. In some implementations, the LM system 405 can send an indication of an estimated quality of a property of bioproduct that may be obtained at a specified period of time if the animals were maintained in a particular regimen of feed schedule and/or dietary/medicinal supplement schedule. In some implementations, the LM system 405 can send an indication of an estimated cost associated with achieving a target quality of a property of bioproduct that may be obtained at a specified period of time if the animals were maintained in a particular regimen of feed schedule and/or dietary/medicinal supplement schedule.

As described previously, an LM system can be configured to receive information related to animal handling and/or feed schedules of animals in managed livestock that produce bioproducts, receive inputs related to a target quality of bioproduct, and generate outputs including recommendation of animal handling and/or feed schedules that can be adopted to increase a likelihood of achieving the target quality of bioproduct. In some implementations, the interactions between the components of the LM system including compute devices and LM device, or between virtualized agents and environments can be configured to be automatically carried out. FIG. 5 is a schematic representation of an interaction between an environment and an agent included in a livestock management system 500, according to an embodiment. The LM system 500 can be substantially similar in structure and/or function to the LM systems 100 and/or 400 described above. The LM system 500 includes an LM device that can be substantially similar to the LM devices 105, 305, and/or 405 described herein. The LM system 500 can include compute devices similar to compute devices 101-104, 201,401-404, and/or 406, described herein.

The LM system 500 includes a virtualized agent and a virtualized environment or world that the agent can act in using actions that impact a state of the world. The world can be associated with a set of states and the agent can be associated with a set of potential actions that can impact the state of the world. The world and/or a change in state of the world in turn can impact the agent in the form of an observation of a reward that can be implemented by the LM system 500. The LM system 500 can be configured such that the interactions between the world and the agent via actions and/or observations of rewards within the LM system 500 can be triggered and/or executed automatically. For example, an LM device within the LM system 500 that executes the interactions between the world and the agent can be configured to automatically receive inputs from sources or compute devices, and based on the inputs automatically trigger agent actions, state transitions in the world, and/or implementations of reward.

FIG. 6 illustrates an example method 600 of using the data received from the compute devices to generate an output indicating a recommended feed selection that can be administered to managed livestock, using an LM system, according to an implementation. The method 600 can be implemented by an LM system similar in structure and/or function to the LM systems 100, 400, and/or 500. In some embodiments, the method 600 can be implemented partially or fully by an LM device (e.g., a processor of an LM device) substantially similar in structure and/or function to the LM devices 105, 305, and/or 405, described herein.

At 671, the method 600 includes receiving an indication of a target quality of a property associated with a bioproduct obtained from a managed livestock, the target quality being associated with an identified end-use. In some instances, the bioproduct can be milk and the managed livestock can be intensively managed cohorts of goats. An example of such a system is shown in the illustration in FIG. 7 of an LM system 700 that can be substantially similar in structure and/or function to the LM systems 100, 400, 500, and/or 600 described herein. The indication of target quality can be received from one or more customers interested in purchasing the milk. The target quality can include a threshold value of a protein, fat, and/or dry extract content of milk produced by the managed cohort of goats, desired by a customer and intended for a specified end use (e.g., manufacture and/or distribution of milk, cheese, butter, yogurts, etc.). In some instances, the target quality can also be a desired quantity or volume of milk desired by a customer. The inputs can include a variety of available feed blends, animal feeds, dietary supplements, and/or medicines as shown in the example in FIG. 7.

At 672, the method 600 includes receiving an indication of a current health status of the managed livestock. The indication of current health can be received from animal handling personnel or alternatively from animal health specialists with access to information related to a current health status of the cohort of animals. In some instances, the indication of current health can be received from one or more sensors associated with animal handling. The health status can include details regarding well-being, age, weight, growth, production of bioproduct, quantity/quality of yield of bioproduct, etc.

At 673, the method includes generating a set of input vectors based on the target quality of the property. For example, the target quality can include a threshold level of fat content of milk. At 674, the method includes providing the set of input vectors to a machine learning model to generate an output indicating a feed selection to be used to feed the managed livestock, the feed selection configured to, upon consumption, increase a likelihood of meeting the target quality of the property. The LM device can generate input vectors based on the target quality to be provided to an ML model to be implemented as a target in a virtualized world including virtualized agents capable of virtualized actions. The ML model can implement the world and agents acting in discreet time steps to induce discreet state changes that may result in specific rewards associated with specific actions of agents. In some implementations, the ML model can draw rewards from prior learning or experience (e.g., learning based on data obtained from past virtualizations, from inputs received from compute devices associated with animal handling personnel and/or animal health specialists, etc.). The LM device can implement the world and the agents such that the agents act to maximize a cumulative reward. The scheme of cumulative rewards can be organized such that the LM device is configured to pursue conditions or states of the virtualized world that increase the likelihood of the world to arrive at a state that includes a production of the target quality of bioproduct. LM device can generate outputs and/or predictions indicating a feed selection that is recommend in feeding the cohort of animals to increase a likelihood of meeting the target quality of the property.

At 675, the method 600 includes administering a feed blend to the managed livestock, the feed blend including the feed selection. The LM device can provide a feeding schedule of a specific feed blend including the feed selection that can be adopted to increase the likelihood of achieving the target quality.

In some instances, a LM System can be used to guide in assignment of animals in a managed livestock to groups defined by the intended end-use of the bioproduct that will be produced. In some implementations, the assignment of animals to groups can be based on target goals or target qualities of properties associated with the bioproduct. In some instances, an output of an LM system can indicate how many animals are to be assigned to each group to meet a set of customer or end-use demands. An example outcome is illustrated in the plot 880 in FIG. 8. The plot in FIG. 8 shows a block feed schedule to be adopted over a period of days, the block feed schedules indicated from start to finish by each block associated with each of three groups of animals. For example, the output of an LM system can indicate that to meet the requirement of three different end uses of three kinds of customers (milk, cheese, yogurt producers), the goats in a managed livestock should be assigned to three groups that follow the three feed schedules shown in FIG. 8 for the period indicated. The output of an LM system can indicate that each of the three groups of goats should include a different count or number of animals (450, 700, and 950 for the milk, cheese and yogurt group respectively) to obtain a specific quantity and/or quality of milk desired by customers associated with each group.

In some embodiments, the disclosed LM systems and/or methods can include implementation of cognitive learning in the learning of agent-world interactions. In some implementations, an LM system can be implemented based on a hierarchical cognitive architecture as described, and/or using a hierarchical learning algorithm and/or method by an LM Device (e.g., LM Device 105 and/or 305) or a compute device (e.g., compute device 101-103, and/or 201), as described herein. A hierarchical reinforcement learning algorithm and/or method can be configured to decompose or break up a reinforcement learning problem or task into a hierarchy of sub-problems or sub-tasks. For example, higher-level parent-tasks in the hierarchy can invoke lower-level child tasks as if they were primitive actions. Some or all of the sub-problems or sub-tasks can in turn be reinforcement learning problems. In some instances, an LM system, as described herein, can include an agent that can include one or more of many capabilities and/or processes including, for example: Temporal Abstraction, Repertoire Learning, Emotion Based Reasoning, Goal Learning, Attention Learning, Action Affordances, Model Auto-Tuning, Adaptive Lookahead, Imagination with Synthetic State Generation, Multi-Objective Learning, Working Memory System, and/or the like. In some embodiments, one or more of the above listed capabilities and/or processes can be implemented as follows.

(i) Repertoire Learning—Options learning can create and/or define non-hierarchical behavior sequences. By implementing repertoire learning hierarchical sequences of options can be built that can allow and/or include increasingly complicated agent behaviors.

(ii) Emotion Based Reasoning—Emotions in biological organisms can play a significant role in strategy selection and reduction of state-spaces improving the quality of decisions. Emotions can be implemented to impact agent decisions. Such an implementation can be configured to contribute to strategy selection by an agent and/or a reduction of state-spaces such that decisions made by the agent can be of improved quality.

(iii) Goal Learning—Goal learning can be a part of the hierarchical learning algorithm. Goal learning can be configured to support the decision-making process by selecting sub-goals for the agent. Such a scheme can be used by sub-models to select actions and features that may be relevant to their respective function.

(iv) Attention Learning—Attention learning can be included as a part of the implementation of hierarchical learning and can be responsible for selecting the features that are important to the agent performing its task.

(v) Action Affordances—Similar to Attention learning, affordances can provide the agent with a selection of actions that the agent can perform within a context. A model implementing action affordances can reduce the agent's error in action execution.

(vi) RL Model Auto-Tuning—This feature can be used to support the agent to operate in diverse contexts by changing contexts via auto-tuning.

(vii) Adaptive Lookahead—Using a self-attention mechanism that uses prior experience to control current actions/behavior, the adaptive lookahead can automate the agent search through a state space depending on the agent's emotive state and/or knowledge of the environment. Adaptive lookahead can improve the agent's computational needs by targeting search to higher value and understood state spaces.

(viii) Imagination with Synthetic State Generation—Synthetic state generation can facilitate agent learning through the creation of candidate options that can be reused within an environment with the agent not having to experience the trajectory first-hand. Additionally, synthetic or imagined trajectories including synthetic states can allow the agent to improve its attentional skills by testing implementation of different strategies of using masks such as attention masks.

(ix) Multi-Objective Learning—Many real-world problems can possess multiple and possibly conflicting reward signals that can vary from task to task. In some implementations, the agent can use a self-directed model to select different reward signals to be used within a specific context and sub-goal.

(x) Working Memory System—The Working Memory System (WMS), can be configured to maintain active memory sequences and candidate behaviors for execution by the agent. Controlled by the executive model (described in further detail herein), WMS facilitates adaptive behavior by supporting planning, behavior composition and reward assignment.

In some embodiments, the one or more capabilities and/or processes listed herein can be used to build ML systems in an LM device (e.g., the ML model 357 or agent manager 356 or predictor 358 in the LM device 305) that can operate with 98% less training data, compared to other conventional systems using ML models, while realizing superior long-term performance.

In some embodiments, the systems and/or methods described herein can be implemented using quantum computing technology. In some embodiments, systems and/or methods can be used to implement, among other strategies, Temporal Abstraction, Hierarchical Learning, Synthetic State and Trajectory Generation (Imagination), and Adaptive Lookahead.

Temporal Abstraction is a concept in machine learning related to learning a generalization of sequential decision making. An LM system implementing a Temporal Abstraction System (TAS) can use any suitable strategy including an options framework, bottleneck option learning, hierarchies of abstract machines and/or MaxQ methods. In some implementations, using the options framework, an LM system can provide a general-purpose solution to learning temporal abstractions and support an agent's ability to build reusable skills. The TAS can improve an agent's ability to successfully act in states that the agent has not previously experienced before. As an example, an agent can receive a specific combination of inputs indicating a sequence of states and can make a prediction of a trajectory of states and/or actions that may be different from its previous experience but effectively chosen based on implementing TAS. For example, an agent operating in an LM system simulating a world involving the management of livestock can receive, at a first time, inputs related to a health status of a cohort of animals on a predefined feed. The agent can be configured to interact with the world such that the LM system can predict a progress in health status and/or a yield of bioproduct, even if the prediction is different from the agent's past experience, based on implementing TAS. The prediction can include a recommendation of feed selection or feed schedule to increase a likelihood of achieving a predicted result (e.g., health status/yield). Another example includes agents operating in financial trading models that can use TAS to implement superior trading system logic.

The TAS can support generalization of agent behavior. The TAS can also support automatic model tuning where an internal reinforcement learning model can be used to automatically adjust agent hyperparameters that affect learning, future reward discounting and environment behaviors/interactions. For example, in some embodiments of an LM system, a set of parameters can be defined as hyperparameters. Some parameters involved in reinforcement learning include parameters used in Q-value update including a learning rate a, a discount factor associated with weight of future rewards y, a parameter to balance between exploration and exploitation by choosing a threshold value E, actions available to agents to choose from based on exploratory/greedy behavior, a measure of risk involving or unpredictable action or behavior that an agent can perform, a set of consequences and/or a time period of impact that a model can implement based on actions of an agent, and/or the like. One or more of these parameters can be implemented as hyperparameters that can be defined to be associated with a set of dependencies with respect to a model and/or an agent such that a specified change in a hyperparameter can impact other parameters or hyperparameters and/or the performance of the model and/or the agent in a specified manner. In some instances, a specified change in a hyperparameter can for example modify an agent from a practiced behavior to an exploratory behavior. An agent and/or a model can learn a set dependencies associated with hyperparameters such that a hyperparameter can be automatically tuned or modified in predefined degrees to alter agent behavior and/or model behavior. Temporal abstraction can be executed by the hyperparameter model to occur over a sequence of time intervals that the agent interacts with the environment.

As an example, an LM system can be configured to generate a first feed selection or feed schedule selection based on one set of inputs and/or an indication of a first state received at a first time. The LM system can receive a reward signal at a second time after the first time, and the reward signal can be associated with a second set of inputs and/or an indication of a second state. The LM system can generate a second feed selection or feed schedule selection in response to receiving the reward signal. In some implementations, the LM system can be configured to, based on the reward signal, automatically adjust one or more hyperparameters and then generate the second feed selection or feed schedule selection using the adjusted hyperparameter(s) such that the adjusted hyperparameter leads to an improvement in the outcomes (e.g., yield) associated with the second feed selection compared to the outcome associated with the first feed selection based on the change.

In such an auto tuning LM system, developers no longer have to iterate on finding model configurations with good convergence. The model can support contextually adaptive hyperparameter values depending on how much the agent knows about the current context and the environment's changing reward signal. Working in concert, the agent learns reusable strategies that are context sensitive allowing the agent to support adaptive behavior over time while enabling it to balance explorative/exploitative behaviors.

As described previously, embodiments of an LM system described herein can implement temporal abstraction in the virtualization of a world and/or agents to implement temporally extended courses of action, for example, to determine a recommended protocol of animal handling to meet demands on production of bioproducts based on end-use. Disclosed herein is a method to recursively build and optimize temporal abstractions also referred to as options and hierarchical Q-Learning states to facilitate learning and action planning of reinforcement learning based machine learning agents.

In some implementations, an LM system can build and define an entire library or dictionary of options that can be used and/or reused partially and/or fully at any suitable time in any suitable manner. Learning temporal abstractions for example, skills and hierarchical states that can applied to learning can enable an LM system to learn to respond to new stimuli in a sophisticated manner that can be comparable or competitive to human learning abilities. The disclosed method provides a general approach to automatically construct options and hierarchical states efficiently while controlling a rate or progress and/or growth of a model through the selection of salient features. When applied to reinforcement learning agents the disclosed method efficiently and generally solves problems related to implementing actions over temporally extended courses and improves learning rate and ability to interact in complex state/action spaces.

FIG. 9 illustrates an example method 900 of training an ML model to receive a target quality of a bioproduct and a first indication of a property associated with the bioproduct, and output a recommendation directed to achieve the target quality, using temporal abstraction, according to an implementation. The method 900 can be implemented by an LM system similar in structure and/or function to the LM systems 100, 400, 500, and/or 700. In some embodiments, the method 900 can be implemented partially or fully by an LM device substantially similar in structure and/or function to the LM devices 105, 305, and/or 405, described herein.

At 971, the method 900 includes training a machine learning model to receive a target quality of a property associated with a bioproduct of a first managed livestock, receive inputs associated with a health status of the first managed livestock, and determine a temporal abstraction based on the target property and the inputs to be used to identify a feed selection. The feed selection can be configured to increase a likelihood of achieving the target quality of the property associated with the bioproduct of the first managed livestock, the target quality being associated with an identified end-use. The temporal abstraction can include options, skills, hierarchical states, and/or hierarchical actions as described herein. The temporal abstractions allow the agent to execute reusable behaviors in environments that the agent has not previously experienced improving the agent's ability to interact in real-world environments and learn about these environments.

At 972, the method 900 includes receiving a target value of the property associated with the bioproduct produced by a second managed livestock different from the first managed livestock.

At 973, the method 900 includes receiving, at a first time, a first indication of the property associated with the bioproduct produced by the second managed livestock. At 974, the method includes generating a set of feature vectors based on the target value of the property and the first indication of the property.

At 974, the method 900 includes providing the set of feature vectors to the machine learning model to generate, based on the temporal abstraction and the first indication of the property, a first output including a first feed selection configured to, upon consumption by the second managed livestock, increase a likelihood of achieving the target value of the property associated with the bioproduct of the second managed livestock based on the first indication of the property.

At 975, the method 900 includes providing the set of feature vectors to the machine learning model to generate, based on the temporal abstraction and the first indication of the property, a first output including a first feed selection configured to, upon consumption by the second managed livestock, increase a likelihood of achieving the target value of the property associated with the bioproduct of the second managed livestock based on the first indication of the property.

At 976, the method includes receiving, at a second time after the first time, a second indication of the property. And at 977, the method 900 includes comparing the second indication of the property with at least one of the first indication of the property or the target value of the property, to calculate a difference metric. The machine learning model can be configured to adaptively update, based on the difference metric, the temporal abstraction to generate a second output including a second feed selection. The second feed selection can be configured to, upon consumption by the second managed livestock, increase a likelihood of achieving the target value of the property associated with the bioproduct of the second managed livestock based on the second indication of the property.

Temporal abstraction can be implemented by generating and/or using options that include sequences of states and/or sequences of actions. The implementation of options can be focused on generating and adopting reusable action sequences that can be applied within known and unknown contexts of the world implemented by an LM system.

An example option 1085 is illustrated in FIG. 10. An option can be defined by a set of initiation states (S0) 1086, action sequences 1089 involving intermediary states (S1, S2, S3, S4) 1087, and a termination probability associated with a termination state (S5) 1088. When an option 1085 is to be executed, the agent can be configured to first determine its current state and if any of the available options offers to have a start state that is similar to its current state. If there is a positive identification of an option that includes a start state the same as its current state, the agent can then execute the sequence of predefined actions for every new state included in the option until it reaches the termination state and the termination probability condition is set to true. For example, the agent can identify start state (S0) 1086 to be similar to a current state and identify the option 1085 as a selection to be executed. In some instances, the option 1085 can then be executed by the agent starting at the start state 1086 and progressing through intermediary states S1-S2-S5, via actions indicated by the lines joining the respective states, to reach the termination state S5 1088. In some instances, the agent can execute the option 1085 by starting at the start state S0 1086 and progressing through state S2 alone, or through states S2-S4, or through states S3- S4 indicated by lines representing actions, to reach the termination state S5 1088. At state S5 the option terminates and the agent proceeds to select another action or option as dictated by agent behavior designed by an agent manager and/or by outputs from a ML model.

In some instances, LM systems described herein can implement hierarchical states in reinforcement learning that can play a role in improving agent learning rate and/or in the development of long-term action plans. In some instances, with an increase in complexity of a task (e.g., increase in number of alternative solutions, increase in dimensionality of variable to be considered, etc.) the trajectory to the solution can become intractable due to exponentially increasing complexity of agent actions due to an increase in the number of states in the system. In some implementations, the LM system can implement hierarchical states, which decrease the size of a state space associated with an LM system. This implementation of hierarchical states and the resulting decrease in state space can lead to an exponential decrease in a time for learning in agents. Automatic learning of hierarchical states in conventional systems, however, can represent challenges by restricting size of models that can be used.

In some embodiments, an LM system can be configured that can learn options and generate and use hierarchical states effectively using a recursive execution of a process associated with a Bellman Optimization method as described herein. The recursive process can be configured to converge on optimal or desired values over a period of time. The method can allow for the agent to select improved and/or optimal or desired policies (e.g., actions resulting in state transitions) in known and unknown environments and update their quality values over time. In some instances, the method can treat options and hierarchical states as functionally dependent at creation and can allow for the merging of existing options and hierarchical states to build new state and action compositions. Over time, as the agent explores the state space, the algorithm can generate new hierarchical states and composition hierarchical states as the agent performs numerous trajectories through the state/action space.

FIG. 11 is an illustration of an example option 1185 and including hierarchical states (e.g., S′0) generated by an LM system, according to an embodiment. The option 1185 can include a start state 1186, intermediary states 1187 and termination state 1188. An example method adopted by the LM system can include building hierarchical states (e.g., S′0) and generating options (e.g., S2-S4, 1189).

To build a hierarchical state, the LM system can first identify two consecutive state/action transitions through the environment. The LM system can perform a sequence of verification steps including verifying that (1) the identified state/action transitions have non-zero Q^p(s,a) values (also referred to herein as Q values), which can be values associated with a state/action pair under a predefined policy, as defined previously, (2) the identified state/action sequence is non-cyclical, and (3) that the transition sequence does not include a transition cycle from S0 to Sn.

Following the above steps, if positively verified the LM system can continue to the next step and if not the LM system can return to identifying two new consecutive state action transitions. If positively verified the LM system can create and/or define a new hierarchical state S′, for example state S′0 as shown by S′0 in FIG. 11 and create and/or define a new state name hash. The new state can be associated with an action A′0 and an action A′1 as shown in FIG. 11.

The LM system can extract state primitives and action primitives from standard and hierarchical state transitions. Based on the extracted information, the LM system can create and/or define a new hierarchical action from S0 state in sequence to the new hierarchical state S′ (e.g., action A′0) and add the hierarchical action to a new hierarchical action associated with state S0. The LM system can create and/or define new hierarchical action from S′ (e.g., action A′1 from state S′0) to an intermediary state (e.g., S2) or a last state in sequence Sn (e.g., S5 in FIG. 10) and add the newly created and/or defined hierarchical action to an action list associated with state Sn. The LM system can then add state S′ (e.g., A′0) to Q Model states. This new hierarchical state can be reached using normal planning and its Q value be updated using the current system logic.

In some instances, an LM system can be configured to implement and/or learn to implement state deletion. In some instances, an LM system can consider combining multiple options to create and/or define a repertoire behavior or a subset of an option action sequence that can include states previously generated by a temporal abstraction algorithm, also referred to herein as hierarchical states. The LM system can be configured to learn to merge the two options to form a single option that builds hierarchical states from the two options. In some instances, the LM system can merge two options by selecting a set of hierarchical states and merging the action primitives to construct a new hierarchical state.

To generate an option, the LM system can initiate an induction cycle, in some implementations, to create and/or define a state name S′x (eg., x=1, 2, . . . n) from action sequences by using action sequences extracted from hierarchical state algorithms. The LM system can identify an action A′x associated with the state S′x. The LM system can check that action A′x is not in a preexisting dictionary of options and that a sum of action Q values associated the action sequence including A′x is above a threshold value of interest. If the verification steps are indicated to be true (i.e., A′x is not in the dictionary of options and the sum of action Q values associated with the action sequence including A′x is above a threshold value) the LM system can continue, if not the system exits from induction cycle. If true, the LM system can create and/or define an option with a S0 state from hierarchical state induction sequence as initial initiation state or start state.

A method to construct hierarchical states can be implemented using reinforcement learning. The method can be associated with agents and can use pairwise state/action transitions to recursively optimize and/or improve action values using the Bellman Optimality Principle. In some implementations, the method can use a Q-value threshold to determine if a new hierarchical state is to be added to the reinforcement model's options dictionary. In some implementations, the method can include generating hierarchical states in a recursive manner from other hierarchical states.

A method to construct options/skills can be implemented using reinforcement learning. The method can be associated with agents and can use pairwise state/action transitions to recursively optimize action values using the Bellman Optimality Principle. The method can use a state interest value to determine if a new Option/Skill is to be added to the reinforcement model. In some implementations, the method can use a state interest value to determine if a new option/skill is to be added to the model (e.g., reinforcement model). In some implementations, the method can include generating hierarchical states associated with options/skills in a recursive manner from other hierarchical states.

In some implementations, the LM system can additionally support automatic merging of previously generated hierarchical states with new action trajectories or action sequences in a manner that can be consistent with an existing sequence of states/actions. This functionality can simplify a process of building and maintaining hierarchical states no matter how complex an environment is in a general and fully automatic algorithm. The disclosed LM systems and/or methods can thus reuse existing Q-Learning model insertion, update and deletion mechanisms to manage hierarchical states. By using model update mechanisms of Q-Learning, selection of hierarchical states can help convergence to optimal and/or improved values over time according to the Bellman optimality principle. In some such implementations, the LM system thus combines sample efficient methods for the generation and merging of hierarchical states with mathematically mature methods to ensure that the quality of actions and options executed over time converge to optimal and/or improved values.

In some embodiments, the disclosed LM systems and/or methods can include implementation of cognitive or hierarchical learning in the learning of agent-world interactions. A Hierarchical Learning System (HLS) can include a learning algorithm that utilizes a recursively optimized collection of models (e.g., reinforcement learning models) to support different aspects of agent learning.

FIG. 12 illustrates a schematic representation of an LM system 1200, implementing cognitive learning, according to an embodiment. The LM system 1200 can be substantially similar in structure and/or function to the LM systems 100, 400, 500, and/or 700, and can implement methods similar to methods 600 and/or 900 described herein. In some embodiments, the cognitive learning in the LM system 1200 can be implemented by an LM device substantially similar in structure and/or function to the LM devices 105, 305, and/or 405, described herein.

In some implementations, a model (e.g., the ML model 357 described previously) in an LM system can include multiple models that in some instances, can be configured in a hierarchical organization. The LM system 1200 can include an agent/system architecture as shown in FIG. 12, such that agent interactions with the world are based on a set of models including an executive model, an integrated model, and a hierarchical model. The world can have many states (S0, S1 . . . Sn) and states can be associated with rewards (R0, R1 . . . Rn). An agent can be defined to interact with the world via actions and the agent actions can have consequences including an impact on the state, changes in the state of the world, and/or rewards. The executive model can include a model simulating a working memory component. The working memory component can in turn include an executive model that is configured to simulate agent actions and a world model that is configured to simulate world states, state transitions, responses to agent actions including rewards, etc. s

The integrated or hierarchical learning model (also illustrated in FIG. 14) can include multiple models that are each configured to simulate various levels of cognitive and/or behavioral functions including arousal states, emotive states, goals, attentional states, affordance, experiential states, etc. As an example, organized over the experiential model that provides actions that interact with the world, an HLS can use a model simulating emotions capable in an animal to enable the agent to select strategies that include sub-goals, state features to attend to, and action types the agent can execute within a particular context. This capability effectively reduces the strategy space that the agent can act in and can improve behavior selection while dramatically reducing reward variability over time. The hierarchical model can include an auto-tuning model that can be configured to implement adjustment of one or more parameters or hyperparameters of the policy model, a policy repertoire model that can create and/or define more complex behaviors by combining world policy options, and an auto-tuning repertoire model that can build more complex hyperparameter configurations by combining auto-tuning options.

Using the hierarchical architecture of the cognitive model, the LM system can be configured to operate effectively even in new environments by automatically surveying the environment and automatically tuning hyperparameters based on results of agent interactions. The Executive Model of the Working Memory System (WMS) can provide memory and behavior replay management of the agent. Specifically, the WMS can orchestrate the internal/external generation of experience and replays to adaptively learn temporal abstractions and selection of potential behaviors for future execution. The cognitive model can thus provide a general purpose LM system for all state and action spaces that the agent possesses.

In some implementations, an LM system can operate by using a model to simulate an external world and an internal model to simulate an internal world or representation (e.g., an internal representation of an animal or a cohort of animals, etc.). The internal model can be associated with internal states that can be perceived, organized using a system of memory, and impacted via internal actions. The internal model can be configured to impact a world state value and in turn impact agent action/behavior. FIG. 12 is a flowchart 1250 schematically illustrating a flow of information in an LM system similar to the systems described herein. FIG. 13 is a flowchart 1350 schematically illustrating a flow of information in an example implementation of a world agent by an LM system similar to the systems described herein. The world agent interacts with a world environment 1355 and an internal model 1353 as shown in FIG. 13. The internal model 1353 can be a model of the world environment 1355. The flow chart 1350 shows two primary flows of information through the agent reasoning system of the world agent. In the world flow path (on the left side of FIG. 13), which can be a primary path, the agent in the LM system selects behaviors that result in actions that are executed in the world 1355. In the flow path interacting with the internal model (shown on the right side of FIG. 13), which can be a secondary path, the agent in the LM system interacts with its own model of the world 1353 (internally generated) and is used for planning and creation options. In the LM workflow the executive model 1357 associated with the executive agents 1358 is responsible for the management of the active memory content which supports the agent creating and performing complex behaviors. The executive model 1357 loads the active memory system from one of multiple memory stores 1359 that include: Short Term Memory, Prospective Memory and Long-Term Memory. The world model 1353 is responsible for the selection of actions to be performed based on the active memory contents. The model 1357 in an LM system can receive information associated with a world state value and/or its reward signal, which can impact interactions between executive model 1357 and the memory associated with the system which can lead to new behaviors. The temporal abstraction manager analyzes the changing contents of the memory system to discover new options and repertoire of options. Information associated with the value can be relayed to world model, which can be translated into an agent's action in the external world, which can impact the world state, or an internal action that impacts an internal representation or internal model. As an example, a model of the world can be a model of a cohort of animals managed in a group intended to produce milk to be purchased by manufacturers of cheeses. The world model can simulate states such as a cohort of animals at a current health status with a first average quality of milk, a first average yield of milk, a first duration of feed consumption, a first average amount of loss of production, a first average amount of waste of resources (e.g., in the form of leaked protein), etc. As an example, an internal model can be a model of an animal cohort that is in a similar state as the current world state. The internal model can simulate states of a cohort including lactation states in a lactation cycle, states of hunger, states of growth, etc. Each of the internal states can be configured to impact a world state and vice versa. The impact on the world state and/or the internal model can in turn result in a world state or world state transition, each of which can be associated with a value and used for planning by the LM system. The world state value can recursively impact the interactions between executive model, world model, and the abstraction manager and so forth.

In some embodiments, the LM systems described herein can implement the Working Memory System (WMS) such that the WMS functions similar to a biological model and includes multiple subsystems that manage long term behavior selection, planning, and skill learning. In some implementations, an LM system can be configured such that the agent can interact not only in the world but also conceive states and/or state transitions or trajectories, or actions that are not experienced by the agent in the world. Such states conceived by agents can also be referred to as synthetic states, synthetic trajectories and synthetic actions imagined by agents. As part of the WMS, a processor of an LM device of the LM system can implement a Synthetic State & Trajectory Generation System (SSTGS) that is configured to manage generation of states and transition behavior for the agent's capability to conceive states/actions that are not experienced in the world (also referred to as the agents capability to imagine). FIG. 15 is a schematic illustration of generation of synthetic states by an LM system 1500, according to an embodiment. The LM system 1500 can be substantially similar in structure and/or function to the LM systems 100, 400, 500, 700, and/or 1200, and can implement methods similar to methods 600 and/or 900 described herein. In some embodiments, the synthetic state generation in the LM system 1500 can be implemented by an LM device substantially similar in structure and/or function to the LM devices 105, 305, and/or 405, described herein.

Managed by the Executive Model, the agent can create and/or define synthetic trajectories to generate temporal abstractions that can be reused in the live environment. Derived from past actual experience, synthetic states and their transitions enable the agent to learn new sub-goals, attention and affordances from experience in an offline manner for example, when an environment has not been actually experienced by the agent. These behaviors and goals/attentional/affordances can serve as templates for future use and can improve agent performance.

To create and/or define a synthetic state (SS) (e.g., synthetic states 0, 2, and 3), a new set of features can be selected using the original state features as the source (e.g., state features associated with states S1, S2, S5, S6, and S7). Actions can be generated from a subset of the original state's actions. The executive model can then estimate transition Q-values based on the average Q-values of the original state. Thus, synthetic state generation is achieved through the re-evaluation of an instant state's attended state features and its action space. The Executive Model (EM) selects new features to attend to and creates and/or defines a new synthetic state with actions and reward values based on the source action value. The primary function of the system configured to generate synthetic states can be to build targeted temporal abstraction candidates for the agent to use in the future and accelerates agent learning of the environment through more effective use of its current experience.

In addition to the creation and/or definition of synthetic states, a WMS can create and/or define synthetic trajectories based on the current model of the world. Through this the agent generates new temporal abstractions with estimated reward values. These skills are then tested in the real world and retained/discarded depending on the quality of the behavior. The creation and/or definition of targeted synthetic trajectories can conserve processing and memory use because this happens in an offline low priority process while the agent is executing an option in the world. Options allow the agent to execute preprogramed behaviors freeing the agent to allocate processing resources to planning and behavior generation through synthetic experience simulations. FIGS. 16A and 16B illustrate world graphs representing potential state transition trajectories without and with including synthetic trajectories 1689. In some implementations, synthetic states and/or synthetic trajectories can be included in temporal abstractions. FIG. 17 illustrates an example world transition graph 1785 that is associated with a world simulated by an LM system according to an embodiment. FIG. 17 also illustrates three example synthetic graphs 1791, 1792, and 1793, representing temporal abstractions that include synthetic trajectories that allow transitions between states included in the graph 1785, but via synthetic states. The synthetic trajectories can be generated and implemented in a simulation by the LM system that can be similar to a synthetic experience or a conceived/imagined experience by an agent in the LM system, according to an embodiment. Learning of temporal abstractions can have an exponential impact on agent learning of environments. Additionally, the synthetic trajectories allow the agent to test different attentional and behavioral constraints that may prove to be more reliable in appraising and execution of behavior. An example of this can be the agent shifting the agent's attention to features associated with physical health such as protein over milk fat content. This can then be adjusted by the agent to ensure that actions that change feed type are enabled. In another scenario, the animal cohort are in optimal and/or desired health and production quality so the system can create and/or define skills that prevent the use of medicinal treatments unnecessarily to the livestock cohort.

In some embodiments, similar to the generation of synthetic states/state transition trajectories, a subset of the action space of the parent state can be selected. An LM system can estimate action Q-values and adjust the estimated values using an executive model, allowing the executive model to update the value function of various simulated synthetic trajectories. Synthetic experience (including synthetic states/state transitions) can be implemented as a temporal abstraction that is in a volatile memory representation and trimmed from the agent's model over time. The trimming can be omitted when the agent encounters a portion of a synthetic trajectory or a portion of synthetic state in a temporal abstraction in a non-synthetic context or in a simulation of a world. When the agent experiences a synthetic experience in a real simulation of the world that synthetic experience can be made permanent and its value can be updated to match the actual return value in the real simulation or model.

In some embodiments, an LM system can be configured to implement a feature referred to as Adaptive Lookahead which can be implemented as a part of the WMS. The Adaptive Lookahead System (ALS) can be an Executive Model (EM) controlled function that performs contextually relevant lookaheads from current or expected future states to guide behavior selection. Similar to Monte Carlo methods, ALS can provide an agent the ability to optimize and/or improve the use of lookahead for the agent. This system balances internal simulation time and live behavior to improve agent computational needs while providing improved action selection through experience search. Managed by the EM, the agent is configured to learn how to optimize this process minimizing its computational load with improved reward gains over time.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Where methods and/or schematics described above indicate certain events and/or flow patterns occurring in certain order, the ordering of certain events and/or flow patterns may be modified. While the embodiments have been particularly shown and described, it will be understood that various changes in form and details may be made.

Although various embodiments have been described as having particular features and/or combinations of components, other embodiments are possible having a combination of any features and/or components from any of embodiments as discussed above.

Some embodiments described herein relate to a computer storage product with a non-transitory computer-readable medium (also can be referred to as a non-transitory processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations. The computer-readable medium (or processor-readable medium) is non-transitory in the sense that it does not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable). The media and computer code (also can be referred to as code) may be those designed and constructed for the specific purpose or purposes. Examples of non-transitory computer-readable media include, but are not limited to, magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM) and Random-Access Memory (RAM) devices. Other embodiments described herein relate to a computer program product, which can include, for example, the instructions and/or computer code discussed herein.

In this disclosure, references to items in the singular should be understood to include items in the plural, and vice versa, unless explicitly stated otherwise or clear from the context. Grammatical conjunctions are intended to express any and all disjunctive and conjunctive combinations of conjoined clauses, sentences, words, and the like, unless otherwise stated or clear from the context. Thus, the term “or” should generally be understood to mean “and/or” and so forth. The use of any and all examples, or exemplary language (“e.g.,” “such as,” “including,” or the like) provided herein, is intended merely to better illuminate the embodiments and does not pose a limitation on the scope of the embodiments or the claims.

Some embodiments and/or methods described herein can be performed by software (executed on hardware), hardware, or a combination thereof. Hardware modules may include, for example, a general-purpose processor, a field programmable gate array (FPGA), and/or an application specific integrated circuit (ASIC). Software modules (executed on hardware) can be expressed in a variety of software languages (e.g., computer code), including C, C++, Java™, Ruby, Visual Basic™, and/or other object-oriented, procedural, or other programming language and development tools. Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. For example, embodiments may be implemented using imperative programming languages (e.g., C, Fortran, etc.), functional programming languages (Haskell, Erlang, etc.), logical programming languages (e.g., Prolog), object-oriented programming languages (e.g., Java, C++, etc.) or other suitable programming languages and/or development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.

METHODS AND APPARATUS TO AUTOMATE THE MANAGEMENT OF INTENSIVELY MANAGED MILK PRODUCING LIVESTOCK TO PRODUCE CUSTOMIZED PRODUCT DEPENDING ON END-USE USING MACHINE LEARNING

Information

Publication Number

Date Filed

Date Published

Inventors

Original Assignees

CPC

International Classifications

Abstract

Description

Claims