Documents, newsletters, websites and other sources of information are increasingly being generated by automated writing services, sometimes called “artificial intelligence journalists” when referring to the generation of news articles. However, automated sources of information can be applied to multiple types of information, not just news articles.
It is with respect to these and other considerations that the disclosure made herein is presented.
Technologies are described herein for automating content generation. Generally, a content generator invokes a generator module and a research module to generate content, such as a news article. While the generator module is generating content, the research module is receiving intermediate content from the generator module. The research module determines additional information associated with the intermediate content received from the generator module to generate additional content. The content generator determines if the additional content is to be added to the intermediate content being generated by the generator module.
It should be appreciated that the above-described subject matter can be implemented as a computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable storage medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.
This Summary is provided to introduce a selection of technologies in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
The following detailed description is directed to technologies for automatically generating content. In conventional systems, content is generated using various algorithms known to those of ordinary skill in the relevant art. Some technologies, in an attempt to make the content appear to be written by a human, use algorithms to “enhance” the content. For example, an automated content generator may determine that the difference in points between a winning team and a losing team is so vast that the term “rout” is used rather than the term “win” when the article is generated. However clever and efficient these algorithms are, the articles are often still “flat” in that the articles provide no context beyond that which is received. In other words, the articles often reasonably provide information as to the “what” of the information, but do not provide information as to the “meaning” of the information.
In other automated or semi-automated systems, information in content is linked to information in other content in an attempt to achieve some level of “depth” of the content. For example, “wikis,” websites in which users generate content, often link information between various pages, sometimes using hyperlinks. In this manner, a user reading the content can select linked information to view additional information about the linked information. The user can explore by continually selecting linked information. The information is merely linked, however, as the link provides no meaning other than that the information is connected.
In still further conventional systems, additional information is provided to enhance generated content. For example, an automated source may generate an article about a sports event or a company's stock. Once the article is generated, a system is implemented whereby enhancements are made to the article in an attempt to make the article appear to be written by a human. For example, in the sports article description above whereby the term “rout” was used instead of “win,” the enhancement would be the difference in connotation between “rout” and “win.”
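The conventional, after-the-fact enhancement described above can be illustrated with a minimal sketch. The function name `describe_result` and the 20-point threshold are assumptions chosen for illustration only; the disclosure does not specify how a conventional system selects the term.

```python
def describe_result(winner: str, loser: str, winner_pts: int, loser_pts: int,
                    rout_margin: int = 20) -> str:
    """Choose a verb from the point margin alone.

    The rout_margin threshold is a hypothetical value, not taken from
    the disclosure.
    """
    margin = winner_pts - loser_pts
    # The only "enhancement" is the connotation of the chosen verb;
    # no context beyond the received scores is added.
    verb = "routed" if margin >= rout_margin else "beat"
    return f"{winner} {verb} {loser} {winner_pts}-{loser_pts}."
```

Note that the sketch changes wording only. It adds no information about why the result matters, which is the limitation the passage above attributes to such systems.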
Various implementations of the presently disclosed subject matter provide technological improvements over conventional content generation technologies. For example, as noted above, conventional content generation technologies enhance content after a significant portion (or all) of the content is generated. While the method of doing so is straightforward, thus potentially saving money, these enhancement technologies are akin to driving without a map, arriving at a location, and then analyzing the resulting location. Information that may be gained during the trip is lost or may be as inaccurate as trying to retrace steps, rather than analyzing as a person moving forward. Thus, these technologies often produce content that does not provide enough information or has an improper focus, causing the need for human intervention or a re-performance of the content generation mechanism.
While the subject matter described herein is presented in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations can be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein can be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific examples. Referring now to the drawings, aspects of technologies for generating content will be presented.
The content generator 108 receives information 112 from an information source 110. The information 112 can be of various formats and types. For example, the information source 110 can be an article, data, a website, and the like. The presently disclosed subject matter is not limited to any particular type of information 112. The information 112 is used to provide an article 114 to the user device 104. The presently disclosed subject matter is not limited to any particular length or format of the article 114. For example, the article 114 can be a news article, information in a webpage, a book, a table of contents for a book, a report, and the like. In some examples, the user device 104 is used by a user (not shown) to retrieve the article 114 about the information 112.
In order to generate the article 114, the content server 102 invokes a content generator 108. The content generator 108 receives the information 112 and generates the article using a process termed “connection generation.” The content generator 108 invokes a research module 120 and a generator module 122. In various examples, the research module 120 and the generator module 122 work in concert with each other to generate the article 114.
In some examples, connection generation is a process that determines, while the article 114 is being generated, relationships between facts in the information 112 and facts stored in the data resource data store 124 that may potentially be added to the article 114.
In some examples, the process of connection generation enhances content in a manner different than conventional systems. For example, as noted above, in some conventional systems, the content is first generated and then content is added to enhance the content. However, also as noted above, this approach may not provide desirable content. In some instances, if enhancements or additional content are added after content is generated, important features of the content may not be fully researched and described, while meaningless or less important features of the content may inadvertently be made more important.
For example, a sports article written about a sporting event may enhance the content about the winning score, but inadvertently neglect the importance of the content relating to a team winning its division or a player achieving a milestone in his or her career. In some instances, if all aspects of the information 112 are enhanced after the article 114 is generated, the article 114 may be difficult to read, causing the article 114 to be rewritten in some instances.
To generate the article 114, after receiving the information 112, the generator module 122 analyzes the information 112 to determine one or more themes of the article 114. A “theme” is a topic or subject of the article and is used to organize the article 114, described in more detail in
In
In some examples, if the article 114 is created using simply the content structure 200, the article 114 may appear or read like a computer-generated article. The article 114 will merely be rote from the information 112 provided to create the article 114. In some examples, conventional systems may enhance the sections of the content structure 200 once generated. For example, conventional systems may analyze the game summary and determine that the score had such a margin that terms like “win” should be replaced with “rout,” which connotes a larger difference in score. In a distinctly different manner, the content structure 200 of
The content data structure 300 of
Using the example started above in which the article 114 is a sports article using the content structure 200 of
To enhance the article 114 generated by the content generation system 100 of
In an example of content data structure generation for a sports-related article, the research module 120 may determine that the number of hits in a ball game or a total number of hits by a player is significant. The research module 120 thereafter instantiates the node 302B and the connection 304A, associating the node 302A with the information that the information associated with node 302A represents a significant number. The research module 120 then determines the number of hits that is significant (for example, a baseball record or a number of hits that very few players have reached). The research module 120 then instantiates the node 302C and the connections 304B and 304C.
The research module 120 then determines the players' names that have achieved the number of hits represented by node 302C, and memorializes that information by instantiating node 302N and connection 304N. Thus, for the information represented by node 302A, the research module 120 has “gone deep” by determining the information associated with nodes 302B, 302C, and 302N, and their connections 304A, 304B, 304C, and 304N. The connections 304 can be thought of as “meaning.” Thus, while the content generation system 100 of
The number of nodes 302 and connections 304 removed from the node representing the interim content, node 302A, represents a depth of informational research. In some examples, due to various factors such as processing or operational limitations of the content server, the number of nodes 302 and the number of connections 304 removed from the node 302A may be limited. For example, the content generator 108 can receive an input that the article 114 is to have a size or length limitation (e.g., 1000 words). The content generator 108 can receive this limitation as a connection 304 limitation or a node 302 limitation and, for example, limit the research module 120 to only two connections 304 removed from the node 302A, thus limiting the potential amount of content inserted into the article 114.
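The relationship between nodes, connections, and research depth can be sketched as a small graph with a depth-limited traversal. This is a minimal illustration under stated assumptions: the class names `Node` and `Connection`, the function `collect`, and the simple chain layout are hypothetical and only loosely modeled on the nodes 302 and connections 304 described above.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A piece of information (loosely, a node 302)."""
    label: str
    connections: list[Connection] = field(default_factory=list)


@dataclass
class Connection:
    """A labeled link carrying the "meaning" (loosely, a connection 304)."""
    meaning: str
    target: Node


def collect(root: Node, max_depth: int) -> list[tuple[int, str, str]]:
    """Return (depth, meaning, label) for every node reachable from root
    within max_depth connections, in breadth-first order."""
    found = []
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # connection limit reached; do not go deeper
        for conn in node.connections:
            found.append((depth + 1, conn.meaning, conn.target.label))
            frontier.append((conn.target, depth + 1))
    return found
```

With a chain of three connections below the interim-content node, calling `collect(root, 2)` returns only the first two levels, which corresponds to limiting the research module to two connections removed from the starting node.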
Returning to
If the content generator 108 determines that the content structure 200 of
In
Returning to
If the modified content structure 400 has been modified in a manner different than what has been used before, for example, by receiving input from a human editor that the modified content structure 400 should be used, the content generator 108 can update other content structures stored in the content structure data store 126 used for similarly situated articles, such as articles having the same subject matter. The content generator 108 can continually update content structures stored in the content structure data store 126 as new content structures are designed and, in some examples, approved for use. For example, sports-related articles generated in the future may be constructed using the modified content structure 400 if the content data structure of the new article is similar to the content data structure 300 of
As noted above, the content generator 108 can use the research module 120, with the generator module 122, to generate the article 114. To generate the content data structure 300 and, eventually, the article 114, the content generator 108 can use various processes, such as, but not limited to, observation (or data research), discovery, “going deep,” making connections, quality, quantity, and insight.
A process performed by the content generator 108 is the process of data research. The data research process can be performed by the research module 120 when, inter alia, constructing the content data structure 300 of
Another process that can be performed by the content generator 108 is the process of discovery. The process of discovery is the realizing of data generated by the data research process. For example, the content generator 108 can initiate the data research process by initiating a search of records relating to accidents in a particular location. The process of discovery is finding the data relating to the search processes commenced in the data research process. In terms of examples provided above, the data research process and the discovery process are used to generate nodes of the content data structure 300.
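The distinction between data research (initiating a search) and discovery (finding the data the search returns) can be sketched as two separate steps. All names here (`Query`, `data_research`, `discovery`, `to_nodes`) and the record layout with `subject`, `location`, and `detail` keys are assumptions made for illustration; the disclosure does not define a record format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    subject: str
    location: str


def data_research(subject: str, location: str) -> Query:
    """Data research: decide what to search for and initiate the search."""
    return Query(subject, location)


def discovery(query: Query, records: list[dict]) -> list[dict]:
    """Discovery: find the records that match the initiated search."""
    return [
        r for r in records
        if r["subject"] == query.subject and r["location"] == query.location
    ]


def to_nodes(found: list[dict]) -> list[str]:
    """Turn each discovered record into the label of a new node."""
    return [r["detail"] for r in found]
```

For example, a query for accident records in one location yields only the matching records, and each match becomes a candidate node of the content data structure.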
The process of “going deep” involves the content generator 108 using the data determined in the discovery process to start a new data research process. For example, in
The content generator 108 can continue this process of building the content data structure 300, adding additional levels to various nodes 302. It should be noted, however, that the content generator 108 may be limited in the process of “going deep,” e.g. building levels, by various factors such as the desired length of the article 114, the capabilities (such as processing power) of the content server 102, the amount of information available to the content generator 108, and the like. The content generator 108 can also receive a limiting input, such as a human input, a node limitation, or a connection limitation, that the number of levels is excessive, using that input as an input for a modified content structure, such as the modified content structure 400 of
The process of making connections is described by way of example in
It also should be understood that the illustrated method 500 can be ended at any time and need not be performed in its entirety. Some or all operations of the method 500, and/or substantially equivalent operations, can be performed by execution of computer-readable instructions included on computer-storage media, as defined herein. The term “computer-readable instructions,” and variants thereof, as used in the description and claims, is used expansively herein to include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, combinations thereof, and the like. Computer-storage media does not include transitory media.
Thus, it should be appreciated that the logical operations described herein can be implemented as a sequence of computer implemented acts or program modules running on a computing system, and/or as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These operations, structural devices, acts, and modules can be implemented in software, in firmware, in special purpose digital logic, and any combination thereof.
For purposes of illustrating and describing the technologies of the present disclosure, the method 500 disclosed herein is described as being performed by the content server 102 and user device 104 via execution of computer executable instructions such as, for example, the content generator 108. As explained above, the content generator 108 can include functionality for generating content such as the article 114.
While the method 500 is described as being provided by the content server 102, it should be understood that the content server 102 can provide the functionality described herein via execution of various application program modules and/or elements. Additionally, devices other than, or in addition to, the content server 102 can be configured to provide the functionality described herein via execution of computer executable instructions other than, or in addition to, the content generator 108. As such, it should be understood that the described configuration is illustrative, and should not be construed as being limiting in any way.
The method 500 begins at operation 502, where information 112 is received from an information source 110. The information source 110 can be one or more documents, web pages, scholarly articles, news events, and the like.
The method 500 continues to operation 504, where the content generator 108 is initiated. The content generator 108 performs various functions. For example, the content generator 108 receives and transmits content, organizes various modules, and receives input regarding content. The content generator 108 also determines if content generated by the research module 120 is to be included in the article 114.
The method 500 continues to operations 506 and 508, where the generator module 122 and the research module 120 are started. As discussed above, various conventional technologies for generating content implement a linear approach to generating the content. For example, content is generated from information and thereafter enhanced by reviewing the finished product.
In a distinctly different manner, according to various configurations described herein, the generator module 122 and the research module 120 are initiated at the same time. As described above and in additional detail below, the generator module 122 generates basic content from the information 112. For example, the information may be numbers relating to a sports game, such as baseball. The research module 120 is configured to receive interim information from the generator module 122. As used herein, “interim information” or “interim content” is content generated by the generator module 122 while the article 114 is being generated. For example, the content generator 108 may receive a baseball score and the teams as the information 112. The generator module 122 receives the information 112 and, while generating the article 114, transmits the interim information to the research module 120.
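The concurrent arrangement described above, in which the research module receives interim content while the generator module is still producing it, can be sketched with two threads and two queues. This is an illustrative sketch only: the function names, the use of `None` as an end-of-stream sentinel, and the dictionary standing in for the data resource data store 124 are all assumptions.

```python
import queue
import threading


def generator_module(information, interim_q):
    """Emit each piece of interim content as soon as it is produced."""
    for item in information:
        interim_q.put(item)
    interim_q.put(None)  # sentinel: generation finished


def research_module(interim_q, additional_q, data_store):
    """Look up additional content for each interim item as it arrives."""
    while True:
        item = interim_q.get()
        if item is None:
            additional_q.put(None)  # sentinel: research finished
            break
        for extra in data_store.get(item, []):
            additional_q.put((item, extra))


def run(information, data_store):
    """Start both modules together and gather the additional content."""
    interim_q, additional_q = queue.Queue(), queue.Queue()
    gen = threading.Thread(target=generator_module,
                           args=(information, interim_q))
    res = threading.Thread(target=research_module,
                           args=(interim_q, additional_q, data_store))
    gen.start()
    res.start()
    gen.join()
    res.join()
    results = []
    while True:
        item = additional_q.get()
        if item is None:
            break
        results.append(item)
    return results
```

The queue decouples the two modules, so research on an early item can proceed while later items are still being generated, in contrast to the linear approach in which enhancement waits for the finished product.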
The method 500 continues to operation 510, where a theme is determined. The theme can be the subject matter of the information 112, an intended use of the article 114, and the like. To determine a theme, in some examples, the content generator 108 analyzes the information 112. For example, the content generator 108 may determine the information 112 is a sports score, a financial report, a traffic report, an academic research paper, and the like. In other examples, the content generator 108 can receive an input that the article 114 is to be used for a humorous article. The presently disclosed subject matter is not limited to any particular manner of determining a theme.
The method 500 continues to operation 512, where a content structure 200 is received. The content structure 200 is an organizational structure of the content to be inserted into the article 114. The content structure 200 is based on the theme determined in operation 510.
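Operations 510 and 512 can be sketched as a theme lookup followed by retrieval of a matching content structure. The keyword sets, section names, and the `general` fallback below are hypothetical placeholders; the disclosure states that it is not limited to any particular manner of determining a theme, and keyword matching is simply one easy-to-read stand-in.

```python
from __future__ import annotations

import re

# Hypothetical keyword sets used to guess a theme from the information.
THEME_KEYWORDS = {
    "sports": {"score", "inning", "team", "game"},
    "science": {"scientists", "element", "experiment", "discovered"},
}

# Hypothetical content structures: an ordered list of sections per theme.
CONTENT_STRUCTURES = {
    "sports": ["headline", "game summary", "key players"],
    "science": ["headline", "discovery", "background"],
    "general": ["headline", "body"],
}


def determine_theme(information: str, requested_theme: str | None = None) -> str:
    """Operation 510: use an explicit input if given, else analyze the text."""
    if requested_theme is not None:
        return requested_theme
    words = set(re.findall(r"[a-z]+", information.lower()))
    best = max(THEME_KEYWORDS, key=lambda t: len(words & THEME_KEYWORDS[t]))
    return best if words & THEME_KEYWORDS[best] else "general"


def content_structure_for(theme: str) -> list[str]:
    """Operation 512: return the structure associated with the theme."""
    return list(CONTENT_STRUCTURES.get(theme, CONTENT_STRUCTURES["general"]))
```

The optional `requested_theme` parameter reflects the case in which the content generator receives an input specifying the intended use of the article rather than inferring the theme from the information.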
The method 500 continues to operation 514, where the generator module 122 commences content generation. For example, the content generator 108 may receive the following information 112: Scientists in Belgium report that Mary has discovered a new element, Marium. The information 112 is sparse and may not provide enough information to generate the article 114. Thus, commencing content generation, the generator module 122 receives the content structure 200 and commences inserting information into the content structure 200. For example, the content structure 200 can be associated with a scientific discovery theme.
To enhance the article while the generator module 122 generates the article 114, the method 500 continues to operation 516 from operation 508, where the initial content data structure 300 is received by the research module 120, and operation 518, where interim content is received by the research module 120 from the generator module 122. As noted above, the node 302A is an example of interim content received from the generator module 122.
The method 500 continues to operation 520, where the research module 120 receives the interim content from the generator module 122 and commences the construction of the content data structure 300. The content data structure 300 includes information relevant to the interim content. The content data structure 300 represents the relationships between additional information researched from the interim content. For example, continuing with the information about Mary discovering a new element, Marium, the research module 120 can access the data resource data store 124 to determine information about one of the first pieces of information, Mary. Records of Mary can be accessed to determine where Mary is, her educational background, and other experiments she has performed.
Continuing with this example using the content data structure 300 of
The method 500 continues to operation 522, where at least a portion of the content in the content data structure 300 is provided to the generator module 122.
The method 500 continues to operation 524 from operation 514, where the generator module 122 receives the content in the content data structure 300 from the research module 120. A determination is made at operation 526 whether or not to include the content in the content data structure 300. At operation 526, if the content is not to be included, the method 500 continues to operation 514 where the content generation is continued.
If at operation 526 the determination is made that the content is to be included, the method 500 continues to operation 528 where the portion is included.
The method 500 continues to operation 530, where a determination is made as to whether or not the content generation process is complete. If the determination at operation 530 is that the content generation process is complete, the method 500 thereafter ends. If the determination at operation 530 is that the content generation process is not complete, the method continues to operation 514, where the content generation is continued.
The method 500 at operation 532 determines if the content data structure 300 is complete. If the content data structure 300 is not complete, the research module 120 can continue generating the content data structure 300. If the content data structure 300 is complete, the process of generating the content data structure 300 ends. The method 500 can thereafter end or can continue at operation 514.
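The decision loop formed by operations 514 through 530 can be summarized in a short sketch. For readability it is written sequentially rather than concurrently, and the names `run_generation`, `research`, and `include` are hypothetical; the callables stand in for the research module 120 and for the inclusion determination made at operation 526.

```python
from typing import Callable, Iterable, List


def run_generation(
    sections: Iterable[str],
    research: Callable[[str], List[str]],
    include: Callable[[str, str], bool],
) -> List[str]:
    """Interleave generated content with researched content."""
    article: List[str] = []
    for interim in sections:                # operation 514: generate content
        article.append(interim)
        for extra in research(interim):     # operations 518-524: researched content
            if include(interim, extra):     # operation 526: include or not?
                article.append(extra)       # operation 528: include the portion
    return article                          # operation 530: generation complete
```

Because the inclusion test runs as each piece of interim content is produced, rejected material never enters the article, rather than being trimmed from a finished draft.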
The computer architecture 600 illustrated in
The mass storage device 612 is connected to the CPU 602 through a mass storage controller (not shown) connected to the bus 610. The mass storage device 612 and its associated computer-readable media provide non-volatile storage for the computer architecture 600. Although the description of computer-readable media contained herein refers to a mass storage device, such as a hard disk or CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available computer storage media or communication media that can be accessed by the computer architecture 600.
Communication media includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
By way of example, and not limitation, computer storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer architecture 600. For purposes of the claims, a “computer storage medium” or “computer-readable storage medium,” and variations thereof, do not include waves, signals, and/or other transitory and/or intangible communication media, per se. For the purposes of the claims, “computer-readable storage medium,” and variations thereof, refers to one or more types of articles of manufacture.
According to various configurations, the computer architecture 600 can operate in a networked environment using logical connections to remote computers through a network such as the network 130. The computer architecture 600 can connect to the network 130 through a network interface unit 614 connected to the bus 610. It should be appreciated that the network interface unit 614 can also be utilized to connect to other types of networks and remote computer systems. The computer architecture 600 can also include an input/output controller 616 for receiving and processing input from a number of other devices, including a keyboard, mouse, or electronic stylus (not shown in
It should be appreciated that the software components described herein can, when loaded into the CPU 602 and executed, transform the CPU 602 and the overall computer architecture 600 from a general-purpose computing system into a special-purpose computing system customized to facilitate the functionality presented herein. The CPU 602 can be constructed from any number of transistors or other discrete circuit elements, which can individually or collectively assume any number of states. More specifically, the CPU 602 can operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions can transform the CPU 602 by specifying how the CPU 602 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting the CPU 602.
Encoding the software modules presented herein can also transform the physical structure of the computer-readable media presented herein. The specific transformation of physical structure can depend on various factors, in different implementations of this description. Examples of such factors can include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media is characterized as primary or secondary storage, and the like. For example, if the computer-readable media is implemented as semiconductor-based memory, the software disclosed herein can be encoded on the computer-readable media by transforming the physical state of the semiconductor memory. For example, the software can transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. The software also can transform the physical state of such components in order to store data thereupon.
As another example, the computer-readable media disclosed herein can be implemented using magnetic or optical technology. In such implementations, the software presented herein can transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations can include altering the magnetic characteristics of particular locations within given magnetic media. These transformations can also include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.
In light of the above, it should be appreciated that many types of physical transformations take place in the computer architecture 600 in order to store and execute the software components presented herein. It also should be appreciated that the computer architecture 600 can include other types of computing devices, including hand-held computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that the computer architecture 600 might not include all of the components shown in
Based on the foregoing, it should be appreciated that technologies for automating content generation have been disclosed herein. Although the subject matter presented herein has been described in language specific to computer structural features, methodological and transformative acts, specific computing machinery, and computer readable media, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features, acts, or media described herein. Rather, the specific features, acts and mediums are disclosed as example forms of implementing the claims.
The subject matter described above is provided by way of illustration only and should not be construed as limiting. Various modifications and changes can be made to the subject matter described herein without following the example configurations and applications illustrated and described, and without departing from the true spirit and scope of the present invention, aspects of which are set forth in the following claims.