The present invention relates to an information-processing device, an information-processing method, and an information-processing program.
In Patent Literature 1, there is described a content creation system which selects, based on a rule managed by a rule management unit, images to be used for a slideshow from images managed by an image management unit, creates creation instruction information, and creates the slideshow through use of this creation instruction information and the images managed by the image management unit.
There can exist a certain demand for automatically extracting, from content including a plurality of images and a text, important images which can attract the attention of a user to the content. For example, the attention of the user can be attracted more effectively through use of secondary content which uses a small number of important images extracted from various pieces of content, such as web pages and electronic books and magazines, which present physical products and service products (travel products and the like) to users. Such secondary content includes, for example, slideshows, moving images, and two-dimensional images formed by arranging the extracted images in a tile form.
However, it is not realistic to manually and exhaustively extract important images from a large amount of content. Moreover, a publicly known technology as described in Patent Literature 1 simply selects images based on a predetermined specific rule, for example, a rule that a specific person appears in an image, and no technology has been known which extracts important images in accordance with a context that differs from one piece of content to another.
The present invention has been made in view of the above-mentioned circumstances, and has an object to automatically extract, from content including a plurality of images and a text, an important image which can attract attention of a user to the content.
The invention disclosed in the present application in order to solve the above-mentioned problem has various aspects, and representative aspects among the aspects are summarized as follows.
(1) There is provided an information-processing device for extracting an important image from target information including images and a text, the information-processing device including: an image classification module configured to determine a category indicating an attribute of each of the images; a category extraction module configured to extract a category for each division of the text; a score determination module configured to determine a score of the category based on the text; and an image selection module configured to extract, as the important image, one of the images corresponding to the category selected based on the score.
(2) In the information-processing device of Item (1), the category extraction module includes: a keyword extraction module configured to extract a keyword from the text; and a keyword classification module configured to determine the category corresponding to the extracted keyword.
(3) In the information-processing device of Item (1), the category extraction module includes a machine learning model configured to receive the text as input to output the category.
(4) In the information-processing device of any one of Items (1) to (3), the image classification module includes a machine learning model configured to receive each of the images as input, and to output the category.
(5) In the information-processing device of Item (2), the keyword classification module is configured to determine, based on a norm between a keyword and a category in a norm space, the category corresponding to the extracted keyword.
(6) In the information-processing device of Item (2), the keyword classification module is configured to determine, based on a correspondence between a keyword and a category given in advance, the category corresponding to the extracted keyword.
(7) In the information-processing device of Item (2), the keyword classification module includes a machine learning model configured to receive a keyword as input and to output a category, and is configured to input the extracted keyword into the machine learning model, to thereby determine the corresponding category.
(8) The information-processing device of any one of Items (1) to (7) further includes an assessment module configured to assess whether the text is affirmative or negative, and the score determination module is configured to determine the score based on the assessment.
(9) In the information-processing device of Item (8), the score determination module is configured to determine the score based on the number of appearances of the affirmative assessment for the determined category.
(10) In the information-processing device of Item (8), the score determination module is configured to determine the score based on a value of the assessment for the determined category.
(11) In the information-processing device of any one of Items (1) to (10), the image selection module is configured to distribute a total number of extractions specified in advance, being the number of important images to be extracted, to an individual number of extractions for each category, being the number of images to be extracted as the important images in that category.
(12) In the information-processing device of Item (11), the image selection module is configured to, when a plurality of images corresponding to one category are to be extracted, preferentially extract an image dissimilar to images which have already been extracted as the important images.
(13) In the information-processing device of Item (12), the image selection module is configured to assess similarity between the images through use of a method selected from a method of using a machine learning model, a method of using a distance between image feature amount vectors, and a combination thereof.
(14) In the information-processing device of any one of Items (1) to (13), the image selection module includes an image score determination module configured to determine an image score being a score for each of the images, and, when a plurality of images corresponding to the category selected based on the score exist, the image selection module is configured to select one of the plurality of images to be extracted as the important image based on the image score of each of the plurality of images.
(15) In the information-processing device of Item (14), the image score determination module includes a machine learning model configured to receive each of the plurality of images as input, and to output the image score.
(16) There is provided an information-processing method of extracting an important image from target information including images and a text, the information-processing method including causing a computer to execute: an image classification step of determining a category indicating an attribute of each of the images; a category extraction step of extracting a category for each division of the text; a score determination step of determining a score of the category based on the text; and an image selection step of extracting, as the important image, one of the images corresponding to the category selected based on the score.
(17) There is provided an information-processing program for causing a computer to function as an information-processing device for extracting an important image from target information including images and a text, the information-processing device including: an image classification module configured to determine a category indicating an attribute of each of the images; a category extraction module configured to extract a category for each division of the text; a score determination module configured to determine a score of the category based on the text; and an image selection module configured to extract, as the important image, one of the images corresponding to the category selected based on the score.
The information-processing device 100 includes an image classification module 10, a keyword extraction module 20, a keyword classification module 30, an assessment module 40, a score determination module 50, and an image selection module 60. Moreover, the information-processing device 100 is configured to receive target information 2 being a target of information processing as input, and to output one or a plurality of important images 3.
A brief description is now given of the information processing to be executed by the information-processing device 100. The target information 2 is information which includes images and a text, and is usually information forming content presented to a consumer. Examples of such content include web pages, electronic books, and magazines. Moreover, it is effective to prepare, for purposes such as introducing the content to the consumer and guiding the consumer to the content, secondary content which briefly indicates the substance of the content and allows the consumer to view the content in a short period.
The information-processing device 100 is built to serve this purpose. As, or in order to create, this secondary content, a desired number of important images, for example, about five to ten, are extracted from the plurality of images included in the target information 2. The important images to be extracted are required to be selected so as to attract more of the consumer's attention in a short time.
However, which image is to be assessed as the important image depends on the context of the target information 2. For example, when the target information 2 is a web page for introducing a hotel, photographs of the exterior of the hotel, the state of its rooms, the dishes served, and the like should properly be assessed as important images. Meanwhile, an irrelevant image included in the target information 2 by chance, for example, a photograph of a blue sky, may be attractive when viewed as a photograph alone, but is inappropriate for attracting the consumer to the target information 2.
Thus, the information-processing device 100 uses the text of the target information 2 to extract important images matching the context of the target information 2 itself.
A simple and clear description is now given of the components of the information-processing device 100. The image classification module 10 determines a category indicating an attribute of an image. The category is a keyword indicating the attribute of the image, that is, the meaning of the image to the consumer in the context of the target information 2. Continuing the previous example, when the target information 2 is the web page for introducing the hotel, a word indicating an aspect of the hotel expressed by a photograph is the category. Specifically, each of an appearance, a front desk, a lobby, a room, a view, a dish, a bathroom, an amenity, and the like corresponds to a category. That is, the image classification module 10 classifies the various images included in the target information 2 into the categories corresponding thereto.
The keyword extraction module 20 extracts keywords from the text. The text included in the target information 2 is considered to have a description in line with its context, and keywords reflecting this context are thus expected to be included. Those keywords span a wide range, and include, for example, lunch, buffet, bedroom, and view. A keyword may or may not match a word indicating the category.
The keyword classification module 30 determines a category corresponding to the keyword. That is, the extracted keyword itself does not always match a word indicating a category, and hence the keyword cannot directly be used to assess the images through use of the text. Thus, the keyword classification module 30 serves to classify the extracted keyword into a category, to thereby convert the keyword to the category. Specifically, words such as breakfast, lunch, dinner, buffet, and restaurant are to be classified into a category of “dish,” and words such as bedroom, interior, and wallpaper into a category of “room.”
The assessment module 40 is not always an indispensable component, and in this embodiment assesses whether the text is affirmative or negative. Specifically, when a portion of the text from which a certain keyword is extracted, for example, a sentence, is used in an affirmative context, the affirmative assessment is made. When the same portion is used in a negative context, the negative assessment is made. That is, the assessment module 40 serves to semantically attach an assessment to the extracted keyword.
The assessment made by the assessment module 40 is thus attached to the keyword extracted by the keyword extraction module 20, and hence information formed of a set (keyword, assessment) of the keyword and the assessment is obtained. Further, the keyword is classified into the category by the keyword classification module 30. Thus, information (category, assessment) formed of a set of the category and the assessment is finally obtained from the text.
The score determination module 50 determines a score of the category based on the text. In this embodiment, the score for each category is obtained from the information (category, assessment) formed of the set of the category and the assessment. As a result, information (category, score) formed of a set of the category and the score is obtained.
In the manner described above, the category of the image classified by the image classification module 10 and the score obtained from the text can be associated with each other. The image selection module 60 selects categories based on the scores, and extracts the images corresponding to those categories as important images 3.
Description is given later of specific details of the information processing executed in the image classification module 10, the keyword extraction module 20, the keyword classification module 30, the assessment module 40, the score determination module 50, and the image selection module 60 described above.
The information-processing device 100 described above may be physically implemented through use of a general computer.
In the computer 1, a central processing unit (CPU) 1a, a random access memory (RAM) 1b, a static storage device 1c, a graphics controller (GC) 1d, an input device 1e, and an input/output (I/O) 1f are connected through a data bus 1g so that electrical signals can mutually be transmitted and received. In this configuration, the static storage device 1c is a device which can statically record information, such as a hard disk drive (HDD) or a solid state drive (SSD). Moreover, the signal from the GC 1d is output to a monitor 1h for a user to visually recognize an image, such as a cathode ray tube (CRT) or a so-called flat panel display, and is displayed as an image. The input device 1e is a device for the user to input information, such as a keyboard, a mouse, or a touch panel. The I/O 1f is an interface for the computer 1 to transmit and receive information to and from external devices. A plurality of CPUs 1a may be prepared so that parallel computing is executed in accordance with a load of the information processing required to be executed by the computer 1.
An information-processing program including an instruction sequence for causing the computer 1 to function as the information-processing device 100 is installed in the static storage device 1c, is read out onto the RAM 1b as required, and is executed by the CPU 1a. Moreover, this program may be recorded in an appropriate computer-readable information recording medium such as an optical disc, a magneto-optical disc, or a flash memory and then provided, or may be provided through an information communication line such as the Internet. Moreover, the interface to be used by the user of the information-processing device 100 may be implemented on the computer 1 itself so that the user directly operates the computer 1; may be implemented by so-called cloud computing, in which general-purpose software such as a web browser is used on another computer and the function is provided from the computer 1 through the I/O 1f; or may be implemented so that the computer 1 provides an application programming interface (API) available to another computer and operates as the information-processing device 100 in response to a request from that computer.
Each of the components of the information-processing device 100 of
Referring back to
The machine learning model 11 included in the image classification module 10 is not limited to the CNN. The machine learning model 11 may be, for example, a machine learning model connected to a pretrained transformer-based model. Moreover, the machine learning model 11 may be a machine learning model other than a deep neural network (DNN), for example, a support vector machine (SVM) based on feature amounts of the image or another model.
After that, in Step S104, it is determined whether or not all of the images included in the target information 2 have been acquired. When images which have not been acquired exist, 1 is added to N in Step S105. The process then returns to Step S102, and the subsequent processing steps are repeated. When all of the images have been acquired, the categories have been determined for all of the images included in the target information 2, and the processing is thus finished.
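As an illustration only, a minimal Python sketch of this loop over the images might look as follows; the classify_image callable is a hypothetical stand-in for whatever trained model the image classification module 10 wraps.

```python
# A minimal sketch of the classification loop (Steps S101 to S105),
# assuming `classify_image` is any trained classifier that maps an
# image to one of the predefined category labels (an assumption,
# not an API named in the original text).

def classify_images(images, classify_image):
    """Determine a category for every image in the target information."""
    categories = {}
    for n, image in enumerate(images, start=1):   # N = 1, 2, ...
        categories[n] = classify_image(image)     # e.g., "room", "dish"
    return categories                             # done once all images acquired
```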
Referring back to
The extraction of the keywords by the keyword extraction module 20 may be executed by feature word extraction in a known natural language analysis method. Specifically, the importance of each word included in the text is obtained by the term frequency-inverse document frequency (TF-IDF), and the word having the highest importance, or one or a plurality of words having importance higher than a predetermined threshold value, in the division are extracted as the keywords. Depending on the language of the text being the target of the extraction of the keywords, appropriate preprocessing such as decomposition of the text into words and filtering based on the part of speech may be performed before the TF-IDF. For example, when the target text is Japanese, which is an agglutinative language, known morphological analysis is applied to the text, to thereby decompose the text into words, and a filter which extracts only nouns from the decomposed words is applied. When the target text is English, which is an inflectional language, it is only required to extract only the nouns, and to apply, for example, a filter which restores to its original form a word which has inflected in accordance with number (singular or plural) or case.
After that, in Step S203, 1 is assigned to the variable N. Subsequently, in Step S204, an N-th sentence included in the text is acquired. In Step S205, a keyword is extracted based on the importance from words included in the acquired sentence. In this embodiment, the word having the highest importance in the sentence is extracted as the keyword.
After that, in Step S206, it is determined whether or not all of the sentences included in the text have been acquired. When sentences which have not been acquired exist, 1 is added to N in Step S207. The process then returns to Step S204, and the subsequent processing steps are repeated. When all of the sentences have been acquired, the keywords have been extracted from all of the sentences included in the text, and the processing is thus finished.
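As an illustrative sketch, the per-sentence extraction described above can be approximated with scikit-learn's TfidfVectorizer by treating each sentence as one document; the preprocessing (morphological analysis, part-of-speech filtering) is omitted, and the function name extract_keywords is hypothetical.

```python
# A minimal TF-IDF keyword extractor: each sentence of the text is
# treated as one document, and the word with the highest TF-IDF
# importance in each sentence is taken as its keyword.
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(sentences):
    """Return the highest-importance word of each sentence (division)."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(sentences)      # one row per sentence
    vocab = vectorizer.get_feature_names_out()
    keywords = []
    for i in range(tfidf.shape[0]):
        weights = tfidf[i].toarray().ravel()
        keywords.append(vocab[weights.argmax()])     # top word in the division
    return keywords

print(extract_keywords([
    "The breakfast buffet was generous.",
    "Every bedroom has a quiet view.",
]))
```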
Referring back to
More specifically, the keyword classification module 30 maps the words indicating the categories and the keywords extracted by the keyword extraction module 20 into a multi-dimensional norm space. That is, each word is represented as a vector quantity in a semantic space, and the norm between two words in the norm space indicates their closeness in meaning. In
First, focus on keyword1. A norm, that is, a distance, from keyword1 to category1 is represented by d1, and a distance from keyword1 to category2 is represented by d2. In this case, the keyword classification module 30 determines the category having the shortest distance to keyword1 as the corresponding category. That is, in this example, “d1<d2” is satisfied, and category1 is thus determined as the corresponding category.
Similarly, for keyword2, a distance to category1 is represented by d3, and a distance to category2 is represented by d4. The relationship “d4<d3” is satisfied, and category2 is thus determined as the corresponding category.
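A minimal sketch of this nearest-category determination follows; the embed function, assumed to map a word to its vector in the semantic space (for example, via pretrained word embeddings), is a hypothetical placeholder.

```python
import numpy as np

def classify_keyword(keyword, category_words, embed):
    """Return the category word whose embedding has the smallest norm
    (distance) to the embedding of the keyword, as in the d1 < d2
    comparison above. `embed` is an assumed word-to-vector mapping."""
    keyword_vector = embed(keyword)
    return min(category_words,
               key=lambda c: np.linalg.norm(embed(c) - keyword_vector))
```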
In a second example in which the keyword classification module 30 determines the category corresponding to the keyword, the keyword classification module 30 determines the category corresponding to the extracted keyword based on correspondences between keywords and categories given in advance.
In the simplest form, a thesaurus is prepared for the words used as the categories. For example, for a category “meal,” breakfast, lunch, dinner, buffet, restaurant, and the like are given as the corresponding words. In this method, when the extracted keyword matches any one of those words, “meal” is determined as the corresponding category.
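A minimal sketch of this thesaurus lookup, with an illustrative correspondence table built from the example words above:

```python
# Illustrative thesaurus: category word -> words classified into it.
THESAURUS = {
    "meal": {"breakfast", "lunch", "dinner", "buffet", "restaurant"},
    "room": {"bedroom", "interior", "wallpaper"},
}

def classify_by_thesaurus(keyword):
    """Return the category whose word list contains the keyword, if any."""
    for category, words in THESAURUS.items():
        if keyword in words:
            return category
    return None  # the keyword matches no category word

print(classify_by_thesaurus("buffet"))  # -> "meal"
```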
As another example, the method may be a method of using a concept tree and determining a category having the shortest distance on the concept tree as the corresponding category.
In
In this diagram, a word indicating a category is highlighted by a thick frame. The keyword classification module 30 counts the distance between a keyword and a word indicating a category on this concept tree, and determines the category having the shortest distance as the corresponding category. This distance is conceptualized as the number of layers required to reach the target word on the concept tree. For example, when the extracted keyword is “appetizer,” the category “meal” can be reached via two layers as “appetizer”→“dinner”→“meal,” and the distance is thus “2.” Similarly, the distance to the category “restaurant” is “4,” and “meal,” having the shortest distance, is thus determined as the category in this case.
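A sketch of this distance computation, assuming the concept tree is given as an adjacency list; only the fragment from the “appetizer”→“dinner”→“meal” example is encoded, and breadth-first search counts the layers.

```python
from collections import deque

# Illustrative fragment of a concept tree; the full tree is assumed.
CONCEPT_TREE = {
    "meal": ["dinner"],
    "dinner": ["meal", "appetizer"],
    "appetizer": ["dinner"],
}

def tree_distance(tree, start, goal):
    """Count the layers (edges) between two words on the concept tree."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for neighbor in tree.get(word, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return float("inf")  # not reachable on this fragment

def nearest_category(tree, keyword, categories):
    """Return the category word with the shortest distance on the tree."""
    return min(categories, key=lambda c: tree_distance(tree, keyword, c))

# tree_distance(CONCEPT_TREE, "appetizer", "meal") == 2, as in the text.
```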
In a third example in which the keyword classification module 30 determines the category corresponding to the keyword, the keyword classification module 30 includes a machine learning model which receives a keyword as input and outputs a category, and inputs the extracted keyword into the machine learning model to determine the corresponding category. The machine learning model is trained in advance through use of appropriate teacher data. As this machine learning model, in addition to an appropriate DNN, an appropriate machine learning model such as an SVM or a decision tree may be used.
Subsequently, in Step S302, 1 is assigned to the variable N. In Step S303, an N-th keyword is acquired. In Step S304, the acquired keyword is mapped in the norm space. After that, in Step S305, a category having the smallest norm from the keyword is determined as the corresponding category.
After that, in Step S306, it is determined whether or not all of the keywords have been acquired. When keywords which have not been acquired exist, 1 is added to N in Step S307. The process then returns to Step S303, and the subsequent processing steps are repeated. When all of the keywords have been acquired, the corresponding categories have been determined for all of the keywords extracted from the text, and the processing is thus finished.
As described above, the information-processing device 100 according to this embodiment uses the keyword extraction module 20 to extract a keyword from each division of the text included in the target information 2, and uses the keyword classification module 30 to classify each of the extracted keywords into a category. Thus, when the keyword extraction module 20 and the keyword classification module 30 are considered as one integrated information-processing unit, they can be regarded as a module which extracts the category for each division of the text, and hence the two as a whole can be referred to as “category extraction module 70.”
Referring back to
For this assessment, a known sentiment analysis method in natural language processing may be used. Most primitively, for example, a dictionary collecting affirmative words and negative words may be created, and, when the number of affirmative words included in the target division, the sentence in this case, is larger than the number of negative words, the affirmative assessment may be given. Conversely, when the number of negative words is larger than the number of affirmative words, the negative assessment may be given, and, when the two numbers are the same, the neutral assessment may be given. As another example, the target division may be input to an appropriately trained machine learning model, to thereby obtain the assessment numerically, and this method is used in this embodiment.
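A sketch of the primitive dictionary-based variant follows; the affirmative and negative word lists are illustrative stand-ins for a real sentiment lexicon, and the tokenization is deliberately naive.

```python
# Illustrative sentiment lexicon; a real dictionary would be far larger.
AFFIRMATIVE = {"excellent", "comfortable", "delicious", "beautiful"}
NEGATIVE = {"noisy", "dirty", "cramped", "disappointing"}

def assess_sentence(sentence):
    """Return 'affirmative', 'negative', or 'neutral' for one division."""
    words = sentence.lower().split()          # naive tokenization
    pos = sum(w in AFFIRMATIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "affirmative"
    if neg > pos:
        return "negative"
    return "neutral"                          # equal counts
```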
In this embodiment, there is exemplified the aspect in which the assessment module 40 assesses the text based on a single pair of two polarities, affirmative and negative, but the text may be assessed based on a plurality of pairs. As the plurality of pairs, for example, four pairs based on eight polarities, namely, joy, sadness, anger, fear, fondness, dislike, relief, and surprise, are given. The polarities employed by the assessment module 40 are not limited to affirmative and negative, and known sentiment polarities may be employed regardless of their number. When the assessment module 40 assesses the text based on a plurality of pairs, a numerical (quantitative) assessment in one dimension may be obtained by weighting each pair. Moreover, the assessment module 40 may weight each of two or more polarities, to thereby obtain the assessment numerically (quantitatively). When the assessment is obtained numerically, the assessment may be determined on a binary basis by providing a threshold value.
Referring back to
A first example of the method of determining the score of the category by the score determination module 50 is a method of simply counting the number of appearances of the affirmative assessment in each category. When the assessment made in the assessment module 40 is numerical, it is only required to provide an appropriate threshold value, to thereby treat an assessment having an assessment value equal to or higher than the threshold value as affirmative. With this configuration, for example, when the categories are described as (appearance, room, dish, . . . ), the scores are given as (5, 20, 15, . . . ) in correspondence thereto.
A second example of the method of determining the score of the category by the score determination module 50 is a method of calculating, as the score, a statistical value such as a sum or an average of the assessments when the assessments are numerical. When the assessment is given, for example, in a range of from 0 to 1, the scores are given as, for example, (0.21, 0.79, 0.60, . . . ) in the above-mentioned example.
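For illustration, both methods can be sketched over the (category, assessment) sets obtained above; the function names are hypothetical.

```python
from collections import defaultdict

def scores_by_count(category_assessments):
    """First method: count affirmative assessments per category.
    Expects (category, assessment) pairs with labels such as
    'affirmative', 'negative', or 'neutral'."""
    scores = defaultdict(int)
    for category, assessment in category_assessments:
        if assessment == "affirmative":
            scores[category] += 1
    return dict(scores)

def scores_by_average(category_values):
    """Second method: average numerical assessments (e.g., in [0, 1])."""
    totals, counts = defaultdict(float), defaultdict(int)
    for category, value in category_values:
        totals[category] += value
        counts[category] += 1
    return {c: totals[c] / counts[c] for c in totals}
```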
It is only required that the method of determining the score of the category by the score determination module 50 be determined in accordance with the application of the extraction of the important images from the target information 2. For example, the above-mentioned first method determines the score through use of only the affirmative assessments, and is hence suited to a case in which important images 3 having stronger appeal to consumers who affirmatively accept the target information 2 are required to be extracted. The above-mentioned second method is suited to a case in which important images 3 appealing to consumers are required to be extracted while reducing the possibility that the consumers negatively accept the target information 2. A method other than the methods described herein may be used.
In Step S502, an N-th assessment is acquired. In Step S503, whether the acquired assessment is affirmative or negative is determined. When the assessment is affirmative, the process proceeds to Step S504, and 1 is added to the score of the category linked to the acquired assessment. When the assessment is not affirmative, no processing is executed, and the process proceeds to Step S505. When the assessment is negative, a score also reflecting the negative assessment may be obtained by subtracting 1 or another numerical value (for example, 0.5) from the score.
In Step S505, whether or not all of the assessments have been acquired is determined. When assessments which have not been acquired exist, 1 is added to N in Step S506. The process then returns to Step S502, and the subsequent processing steps are repeated. When all of the assessments have been acquired, the scores have been determined for all of the categories, and the processing is thus finished.
Referring back to
First, the selection of the categories based on the scores being the first step is, more simply put, a step of determining the number of images to be selected as the important images 3 from each of the plurality of categories in accordance with the scores.
The number of important images 3 is usually predetermined in accordance with the secondary content to be created based on the important images 3. For example, when the secondary content is a slideshow or a moving image, the required number of important images is determined under conditions such as an assumed viewing time thereof. The number of important images 3 to be extracted is hereinafter referred to as “total number of extractions.”
In this case, the problem is how to distribute the total number of extractions to the numbers of important images 3 belonging to the respective categories so as to appeal more strongly to the consumer. The score of each category determined by the score determination module 50 can be considered as an index which indicates the importance of that category for appealing to the consumer. That is, a large number of important images 3 are to be selected from an important category (that is, one having a high score), and a smaller number of important images 3 from an unimportant category (that is, one having a low score). Meanwhile, when the selected important images 3 are excessively concentrated in a specific category, only similar images are selected, and the appeal to the consumer is empirically observed to decrease instead. Thus, the image selection module 60 is required to, in the first step, select a larger number of important images 3 from more important categories while selecting important images 3 from a variety of categories.
Thus, when the number of images to be extracted as the important images 3 from an individual category is referred to as “individual number of extractions,” the image selection module 60 distributes the total number of extractions to the individual number of extractions of each category. A method for this distribution may be a known method or a novel method, but a method of distributing seats in proportional-representation elections can generally be used. Specifically, the distribution to the individual numbers of extractions can appropriately be executed by considering the categories as parties, the scores as the numbers of votes, and the total number of extractions and the individual numbers of extractions as the total number of seats and the numbers of won seats, respectively.
Various distribution methods for the seats exist, and any method may be employed. In this embodiment, the Sainte-Laguë method is used from the viewpoint that the important images 3 are then more likely to be extracted from a larger number of categories. However, other methods, for example, the Hare-Niemeyer method, the D'Hondt method, and the modified Sainte-Laguë method, may be used.
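A sketch of the Sainte-Laguë distribution under this analogy follows, using the category scores from the earlier example; the function name is hypothetical.

```python
def sainte_lague(scores, total_extractions):
    """Distribute the total number of extractions to categories by the
    Sainte-Lague method: categories play the role of parties and the
    category scores the role of the numbers of votes."""
    allocated = {c: 0 for c in scores}
    for _ in range(total_extractions):
        # The highest quotient score / (2 * seats + 1) wins the next seat.
        winner = max(scores, key=lambda c: scores[c] / (2 * allocated[c] + 1))
        allocated[winner] += 1
    return allocated

# Example: scores (appearance=5, room=20, dish=15), ten images in total.
print(sainte_lague({"appearance": 5, "room": 20, "dish": 15}, 10))
# -> {'appearance': 1, 'room': 5, 'dish': 4}
```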
Subsequently, in the second step, images corresponding to the selected categories are extracted as the important images 3. In this case, when the number of images included in the target information 2 and belonging to a specific category is equal to the individual number of extractions distributed to this category, it is only required to simply extract all of the images belonging to this category. However, those numbers are generally different from each other. In particular, when the number of images belonging to the specific category is larger than the individual number of extractions, it is required to select images to be extracted.
In the opposite case, that is, when the number of images belonging to the specific category is smaller than the individual number of extractions, it is only required to redistribute the individual numbers of extractions through use of a method of reallocating unfilled seats in proportional representation.
Thus, the image selection module 60 assesses in advance the appeal of each of the images included in the target information 2 when the image is viewed alone. This appeal is assessed numerically, and is referred to as “image score.” The image selection module 60 includes an image score determination module 61 which determines the image scores, being scores for the images. When there exist a plurality of images corresponding to the category selected based on the score, the image selection module 60 selects the images to be extracted as the important images 3 based on the image scores of those images.
The image score determination module 61 may include, in order to determine the image score, a machine learning model which receives an image as input and outputs an image score. As such a machine learning model, for example, an aesthetic score assessment model based on the CNN may be used, and the obtained aesthetic score may be used as the image score. As another example, in place of or in addition to the aesthetic score assessment model, another assessment model, for example, a click-through rate (CTR) prediction model, may be used, and the obtained CTR prediction may be used as the image score, or a weighted sum of the aesthetic score and the CTR prediction may be used as the image score. In this embodiment, the image score determination module 61 includes the aesthetic score assessment model, and uses the aesthetic score as the image score.
In order to extract images corresponding to the required individual number of extractions based on the image scores, simply put, it is only required to select images corresponding to the individual number of extractions in descending order of the image score. However, when this simple method is employed, a problem can occur in which images that have high image scores but are extremely close to one another, and hence substantially the same, are extracted, and the appeal of the important images 3 as a whole to the consumer decreases.
Thus, in this embodiment, when a plurality of images corresponding to one category are to be extracted, the image selection module 60 preferentially extracts images dissimilar to one another so that images similar to one another are not extracted as the important images 3. As a specific method for the extraction, the important images 3 are extracted not simply in the order of the image score, but so that an assessment value Reward, given by Expression 1 based on the image scores, is maximized.
In Expression 1, M is the individual number of extractions, Iscore(i) is the image score of an i-th image in the one category, Isimilarity(i, j) is the similarity between the i-th image and a j-th image in the one category, and Δ1 and Δ2 are arbitrary weights.
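Expression 1 itself is not reproduced in this text; a form consistent with the symbol definitions above and the interpretation given below would be:

```latex
\mathrm{Reward}
  = \Delta_1 \sum_{i=1}^{M} I_{\mathrm{score}}(i)
  - \Delta_2 \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} I_{\mathrm{similarity}}(i, j)
```

where both sums run over the M images extracted from the one category.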
As for the meaning of Expression 1, the first term is the sum of the image scores of the images extracted in the one category, and hence the assessment value Reward increases as more images having higher image scores are extracted. The absolute value of the second term increases as the extracted images include more similar images, and as those images are more similar to one another, which decreases the assessment value Reward. That is, maximizing the assessment value Reward is equivalent to extracting images having higher image scores while avoiding the extraction of images similar to one another.
The selection of the images which maximize the assessment value Reward can be executed by, for example, using dynamic programming to search for the optimal combination. Other methods may be used.
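The text names dynamic programming; as a simpler correct illustration, the following sketch exhaustively searches all size-M subsets for the one maximizing the assessment value Reward, which is practical only for small image counts. The parameter names w1 and w2 stand in for the weights Δ1 and Δ2.

```python
from itertools import combinations

def select_images(image_scores, similarity, m, w1=1.0, w2=1.0):
    """Choose the size-m subset of images maximizing the Reward of
    Expression 1. `image_scores[i]` plays the role of Iscore(i) and
    `similarity[i][j]` that of Isimilarity(i, j)."""
    best_subset, best_reward = None, float("-inf")
    for subset in combinations(range(len(image_scores)), m):
        score_term = sum(image_scores[i] for i in subset)
        similarity_term = sum(similarity[i][j]        # all pairs i < j
                              for a, i in enumerate(subset)
                              for j in subset[a + 1:])
        reward = w1 * score_term - w2 * similarity_term
        if reward > best_reward:
            best_subset, best_reward = subset, reward
    return list(best_subset)
```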
The similarity of the images used in Expression 1 may be obtained by a method used in any known image processing technology. As such a method, there are exemplified a method of using the DNN such as the CNN or another machine learning model, a method of using a distance between image feature amount vectors, a combination of those methods, and the like. In this embodiment, a machine learning model based on the CNN is used to obtain the similarity.
In addition to the above-mentioned method, the preferential extraction of images dissimilar to those which have already been extracted as the important images 3, when a plurality of images corresponding to one category are to be extracted, may be achieved by, for example, clustering the images belonging to the one category into as many clusters as the individual number of extractions and selecting the image having the highest image score in each cluster. Other methods may be used.
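A sketch of this clustering-based alternative using k-means follows; how the image feature vectors are obtained is left open, and the function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_by_clustering(features, image_scores, m):
    """Cluster the images of one category into m clusters and take the
    highest-scoring image of each cluster. `features` holds one image
    feature vector per image, however those vectors are obtained."""
    labels = KMeans(n_clusters=m, n_init=10).fit_predict(np.asarray(features))
    selected = []
    for cluster in range(m):
        members = [i for i, label in enumerate(labels) if label == cluster]
        selected.append(max(members, key=lambda i: image_scores[i]))
    return selected
```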
The image selection module 60 subsequently assigns 1 to the variable N in Step S602, and selects an N-th category in Step S603. In Step S604, dynamic programming is used to select the combination of images, corresponding in number to the individual number of extractions, which maximizes the assessment value Reward. The images extracted in this case are the important images 3.
In Step S605, whether or not all of the categories have been selected is determined. When categories which have not been selected exist, 1 is added to N in Step S606. The process then returns to Step S603, and the subsequent processing steps are repeated. When all of the categories have been selected, as many important images 3 as the total number of extractions have been selected, and the processing is thus finished.
The information-processing device 200 is not provided with the assessment module 40 (see
Accordingly, the score determination module 50′ cannot determine the score based on the assessment for the category. In this case, it is only required that the score determination module 50′ count the number of appearances of each of the categories finally obtained from the text, and set the number of appearances as the score. The context in which the keyword corresponding to the category is used in the text is then not considered, but, for target information 2 of such a nature that affirmative and negative assessments need not be considered, appropriate important images 3 are obtained even when the information-processing device 200 is used. Examples of such target information 2 include content mainly formed of objective information (for example, a catalog) and content in which negative information is considered almost as important as affirmative information (for example, an academic paper).
In Step S502′, an N-th category is acquired. In Step S503′, 1 is added to the score corresponding to the acquired category. Subsequently, in Step S504′, whether or not all of the categories have been acquired is determined. When categories which have not been acquired exist, 1 is added to N in Step S505′. The process then returns to Step S502′, and the subsequent processing steps are repeated. When all of the categories have been acquired, the scores have been determined for all of the categories, and the processing is thus finished.
Other configurations of the information-processing device 200 may be the same as those of the information-processing device 100 according to the previous embodiment. According to the information-processing device 200 of this embodiment, it is also possible to automatically extract, from content including a plurality of images and a text, an important image which can attract attention of a user to the content.
In this embodiment, in place of the keyword extraction module 20 and the keyword classification module 30, the category extraction module 70 directly extracts the categories from the text included in the target information 2. Specifically, the category extraction module 70 includes a machine learning model 71 which receives a text as input and outputs categories, and the categories are extracted by directly inputting the text of the target information 2 into the machine learning model 71.
In this case, the machine learning model 71 may be a pretrained transformer-based model. Moreover, the category extraction module 70 may divide the text included in the target information 2 into any units in advance, before the input into the machine learning model 71, in the same manner as the keyword extraction module 20 included in the information-processing devices 100 and 200 according to the first and second embodiments. A category for each division is immediately obtained by inputting the text of each division into the machine learning model 71.
Also with this configuration, the information-processing device 300 can use the score determination module 50 to obtain the information (category, score) formed of the set of the category and the score, and can thus subsequently use the image selection module 60 to extract the important images 3 in the same manner as in the information-processing device 100 according to the first embodiment.