The present application claims priority from Japanese patent application JP 2019-18804 filed on Feb. 5, 2019, the content of which is hereby incorporated by reference into this application.
The present invention relates to a detection apparatus, a detection method, and a detection program by which information is detected.
JP 2005-38402 A discloses a server system that, when probing for unauthorized use of image data that requires licensing, searches the internet for image data that matches or is similar to the image data subject to the probe, and that notifies the probe requester of results of the search. This server system has a search server and a management server, and is connected to a client terminal through a network. The management server records the image data inputted from the client terminal in a probe database as the image data being probed for each probe requester, and sets probe conditions for probing whether the image data has been used without authorization in a group of websites on a network. The search server calculates feature values of the image data recorded in the probe database and searches the group of websites for image data that matches or is similar to the image data being probed on the basis of the feature values and the search conditions, and the management server transmits the search results to the client terminal.
However, the server system disclosed in JP 2005-38402 accumulates image data in the probe database to increase accuracy. That is, an effort to keep adding image data to be recorded is required. Also, illegitimate content is uploaded to websites by changing the content or means of uploading, and thus, it would be difficult to adapt to changes in circumstance by a method in which image data is accumulated in a probe database.
An object of the present invention is to efficiently detect illegitimate transaction item candidates.
A detection apparatus which is an aspect of the invention disclosed in the present application is a detection apparatus, comprising: a processor that is configured to execute a program; and a storage device that stores the program, wherein the processor is configured to execute: a search process of accessing a site having a group of pages pertaining to transaction items using a search keyword pertaining to a legitimate transaction item, thereby searching the site for a given page including a character string that matches or relates to the search keyword; an acquisition process of acquiring, from the given page found by the search process, a first evaluation character string that indicates a given transaction item that is included in the given page, and a second evaluation character string that describes the given transaction item; an evaluation process of evaluating whether the given page is a page pertaining to an illegitimate transaction item on the basis of an evaluation keyword pertaining to an illegitimate transaction item, and the first and second evaluation character strings acquired by the acquisition process; and an output process of outputting evaluation results obtained by the evaluation process.
According to a representative embodiment of the present invention, it is possible to efficiently detect illegitimate transaction item candidates. Other objects, configurations, and effects than those described above are clarified by the following description of an embodiment.
An embodiment of a detection apparatus 100, a detection method, and a detection program according to the present invention will be explained below with reference to the attached drawings. The detection apparatus 100, the detection method, and the detection program detect illegitimate transaction item candidates. “Transaction items” include articles and software. Smartphones and smartwatches are examples of articles, and applications that are installed in smartphones and that control smartwatches are examples of software. In the embodiment below, an example is described in which unlicensed and illegitimate applications that have not received licensing from a provider (including developers) of legitimate applications are detected.
<Example of Detection of Illegitimate Applications>
The distribution server 102 is a site having specification pages relating to the transaction items. The specification pages are web pages with information pertaining to the transaction items. In this example, the distribution server 102 is an application store that distributes applications for use on a smartphone. The specification pages are, for example, specification pages 131 and 132 of applications. The distribution server 102 has the function of returning a list of information in which URLs to corresponding specification pages (including IDs for the applications) are listed in the order of a score based on the degree of coincidence to a provided search keyword. The end user terminal 103 is a terminal used by the end user, and in this example is a smartphone. The “end user” is a user of the end user terminal 103.
The end user terminal 103 accesses the distribution server 102, downloads specification pages for applications, and displays the specification pages on a display screen 130. Here, a specification page 131 for a legitimate application and a specification page 132 for an illegitimate application are given as examples, and both specification pages 131 and 132 have the same layout. In this example, the illegitimate application has not received licensing from XYZ Electrical Machinery Co., Ltd., which provides the legitimate application, and is an electronic version of an operation manual for “CDEF”, which is a product of XYZ Electrical Machinery Co., Ltd.
The specification page 131 of the legitimate application and the specification page 132 of the illegitimate application both display an icon 141, an application name 142, a provider name 143, a download button 144, a thumbnail image 145, and a description 146. The icon 141 is a thumbnail image of a prescribed size indicating the application. The application name 142 is a character string indicating the name of the application. In this example, the application name 142 of the legitimate application is “ABC” and the application name 142 of the illegitimate application is “CDEF Manual”.
The provider name 143 is a character string indicating the name of the provider of the application. In this example, the provider name 143 of the legitimate application is “XYZ Electrical Machinery Co., Ltd.” and the provider name 143 of the illegitimate application is “qrstuv”. The download button 144 is a button that, by being pressed by the end user, enables downloading of the application. If the download button 144 says “Install”, then the application is free of charge. If the download button 144 states a price, then the application costs money.
In this example, the download button 144 of the legitimate application says “Install” whereas the download button 144 of the illegitimate application says “¥500”. Thus, the end user would not be charged for the legitimate application but would be charged for the illegitimate application. If such an illegitimate application were to become prevalent, money that the legitimate provider should receive would not reach the provider. Even if the illegitimate application were free, poor quality could damage the brand image of the legitimate provider. The thumbnail image 145 is an image introducing the application. The description 146 is a character string describing how to use the application.
<Computer Hardware Configuration Example>
<Functional Configuration Example of Detection Apparatus 100>
The search condition database 310 is a database that stores search conditions. The search condition database 310 is provided in the detection apparatus 100, but may be provided externally so as to be accessible by the detection apparatus 100. The search condition database 310 specifically stores a country designation list 311, a search keyword list 312, a search result count upper limit 313, and an access sleep interval 314, for example.
The country designation list 311 is a list of information of country codes that designate countries (or regions). As an example, the code for Japan is “JP”, the code for the United States is “US”, the code for the People's Republic of China is “CN”, and the code for Taiwan is “TW”. The distribution server 102 changes the group of applications that can be distributed in each country. The specification page of a given application indicates in the end user terminal 103 in a certain country that the application is downloadable, while not indicating that the application is downloadable in the end user terminal 103 in other countries, for example. The country designation list 311 is set in the detection apparatus 100 in advance or by the user operating the user device 101. The country code is selected from the country designation list 311 by the user operating the user device 101.
The search keyword list 312 is a list of information pertaining to search keywords. There is a search keyword list 312 for each type of search keyword. The search keyword is a keyword for searching a group of specification pages of applications stored by the distribution server 102.
All of the search keyword lists 400 to 600 have a sole use condition 402. The sole use condition 402 is a flag that indicates whether the corresponding search keyword can be used on its own. “Yes” indicates that the corresponding search keyword can be used on its own. In entry number 1 of the search keyword list 400, for example, the company name 401 is “XYZ Electrical Machinery” and the sole use condition 402 is set to “yes”. Thus, “XYZ Electrical Machinery” can be used on its own as a search keyword.
“No” indicates that the corresponding search keyword cannot be used on its own. In entry number 4 of the search keyword list 400, for example, the company name 401 is “XYZ” and the sole use condition 402 is set to “no”. Thus, “XYZ” cannot be used on its own as a search keyword. A search keyword with a sole use condition 402 of “no” can be used in combination with other search keywords for a search by the search unit 301. Other search keywords may be present in the same search keyword list or may be present in other search keyword lists. Also, the sole use condition 402 of other search keywords may be “yes” or “no”.
The search keyword lists 400 to 600 are set in the detection apparatus 100 in advance or by the user operating the user device 101. The search keyword is selected by the detection apparatus 100 in consideration of the sole use condition 402 of the search keyword lists 400 to 600.
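The effect of the sole use condition 402 on keyword selection can be illustrated with a minimal sketch. The names `SearchKeyword` and `build_queries` are hypothetical, as is the choice to form combinations as space-joined pairs; the description only states that a keyword marked “no” is used in combination with other search keywords. Treating the product name “CDEF” as usable on its own is likewise an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SearchKeyword:
    text: str
    sole_use: bool  # True corresponds to a sole use condition 402 of "yes"


def build_queries(keywords):
    """Return query strings that respect the sole use condition.

    Assumption: combinations are pairs joined by a space.
    """
    # Keywords marked "yes" may be searched on their own.
    queries = [k.text for k in keywords if k.sole_use]
    for a, b in combinations(keywords, 2):
        # Pair keywords only when at least one cannot be used alone.
        if not (a.sole_use and b.sole_use):
            queries.append(f"{a.text} {b.text}")
    return queries


keywords = [
    SearchKeyword("XYZ Electrical Machinery", True),
    SearchKeyword("XYZ", False),
    SearchKeyword("CDEF", True),  # assumed flag value, for illustration
]
queries = build_queries(keywords)
```

With these inputs, “XYZ” never appears as a query by itself, but it does appear paired with each of the other two keywords.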
The company name 401 is a name of the company. The company name 401 may be in Japanese or in another language (such as English). The company name 401 may be an abbreviation. The product name 501 is the representative name or model number of a product by the company that manufactures the product. A nickname having an equivalent brand value may be used for the product name 501. The rival company name 601 is another company in the same industry as the company specified under the company name 401. By performing a search thereof in combination with the company name 401, it is possible to search for applications that handle products or parts by various manufacturers in the industry.
Returning to
The search result count upper limit 313 is set in the detection apparatus 100 in advance or by the user operating the user device 101.
The access sleep interval 314 is a time interval for which the search process is set to sleep from when the detection apparatus 100 accesses the distribution server 102 to execute the search process using the search keyword to when the detection apparatus 100 accesses the distribution server 102 next. Setting the access sleep interval 314 mitigates a situation in which the distribution server 102 blocks access from the detection apparatus 100 as a result of too many accesses from the detection apparatus 100 to the distribution server 102 in a short period of time.
The access sleep interval 314 is set in the detection apparatus 100 in advance or by the user operating the user device 101.
The whitelist 331 is a list of information that stores application IDs of legitimate applications. The application ID is unique identification information for identifying applications, and the application ID differs for different applications. The application IDs of legitimate applications are recorded in the whitelist 331 of the detection apparatus 100 and the distribution server 102. The application IDs of the whitelist 331 are set in advance or by the user operating the user device 101.
The exclusion list 332 is a list of information that stores application IDs of applications to be excluded. Applications to be excluded are applications that are not legitimate applications but should not be included in the search results from the distribution server 102, or in other words, applications that have already been detected as illegitimate applications, for example. The application IDs of applications to be excluded are recorded in the exclusion list 332 of the detection apparatus 100 and the distribution server 102. The application IDs of the exclusion list 332 are set in advance or by the user operating the user device 101.
The evaluation condition database 360 has an evaluation keyword list 361 and scoring rules 362. The evaluation keyword list 361 is a list of information pertaining to evaluation keywords. There is an evaluation keyword list 361 for each type of evaluation keyword. The evaluation keyword is for evaluating whether an application of which the specification page was searched is an illegitimate application.
All of the evaluation keyword lists 700 to 900 have the above-mentioned sole use condition 402. The sole use condition 402 is a flag that indicates whether the corresponding evaluation keyword can be used on its own. “Yes” indicates that the corresponding evaluation keyword can be used on its own. In entry number 1 of the evaluation keyword list 900, for example, the group company name 901 is “XYZ Automotive” and the sole use condition 402 is set to “yes”. Thus, “XYZ Automotive” can be used on its own as an evaluation keyword.
“No” indicates that the corresponding evaluation keyword cannot be used on its own. In entry number 2 of the evaluation keyword list 700, for example, the product type name 701 is “Cameras” and the sole use condition 402 is set to “no”. Thus, “Cameras” cannot be used on its own as an evaluation keyword. An evaluation keyword with a sole use condition 402 of “no” can be used in combination with other evaluation keywords for evaluation by the evaluation unit 305. The other evaluation keywords may be present in the same evaluation keyword list or may be present in other evaluation keyword lists. Also, the sole use condition 402 of the other evaluation keywords may be “yes” or “no”.
The evaluation keyword lists 700 to 900 are set in the detection apparatus 100 in advance or by the user operating the user device 101. The evaluation keyword is selected by the detection apparatus 100 in consideration of the sole use condition 402 of the evaluation keyword lists 700 to 900. The detection apparatus 100 may use at least one of the search keyword lists 400 to 600 as the evaluation keyword list 361.
The product type name 701 is a name of the type of product handled by the company. Applications that do not have hits with only the company name 401 might have hits when the company name is searched in combination with the product type name 701.
The suspicious keyword 801 is a given keyword that a user believes to be in common use in specification pages 132 of illegitimate applications, or has actually been used before in specification pages 132 of illegitimate applications. Specifically, the suspicious keyword 801 is a general word that is commonly included in documents created by companies, for example. More specifically, the suspicious keyword 801 is a keyword that pertains to the usage method for an application such as a user manual, a catalog, or a training book, a keyword that pertains to the usage method for a product connected to the application, or a keyword pertaining to a description of a part or the like that constitutes the product.
If the company name 401 and the product name 501 are searched in combination, electronic application versions of documents sometimes receive hits. If the suspicious keyword 801 is used in the specification page 132 of an illegitimate application, the end user might mistake the illegitimate application for a legitimate application and download the illegitimate application onto the end user terminal 103. In order to prevent such downloads of illegitimate applications, the suspicious keyword 801 is set as the search condition.
The group company name 901 is another company name in the same group as the company specified under the company name 401. In some cases, applications provided by a group company or applications of a partner company that engages in business with the group company receive hits.
Returning to
The evaluation points 1004 are points determined according to eight possible combinations of yes/no for the first to third evaluation items 1001 to 1003 in the illegitimate application candidate detection list (specification page data) 350. The higher the evaluation points 1004 are, the higher the probability is that the application is an illegitimate application.
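As a rough illustration, the eight yes/no combinations can be held in a lookup table. The sketch below assumes that the first to third evaluation items correspond to the presence of a company name, a product name, and a suspicious keyword, which is how the check items are worded later in this description. The point values are invented for illustration only and are not the values of the scoring rules 362.

```python
from itertools import product

# Hypothetical point values keyed by (company name, product name,
# suspicious keyword) presence. The real values belong to the scoring
# rules 362 and are not reproduced here.
SCORING_RULES = {
    (False, False, False): 0,
    (True, False, False): 1,
    (False, True, False): 1,
    (False, False, True): 1,
    (True, True, False): 2,
    (True, False, True): 3,
    (False, True, True): 3,
    (True, True, True): 5,
}


def evaluation_points(has_company, has_product, has_suspicious):
    """Look up the points for one combination of the three evaluation items."""
    return SCORING_RULES[(has_company, has_product, has_suspicious)]


# Every yes/no combination of the three items must have a rule.
assert set(SCORING_RULES) == set(product([False, True], repeat=3))
```

A table keeps the rules editable as data, which fits the description's statement that the scoring rules can be set by user operation.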
Returning to
Below, a detailed description will be made regarding the search unit 301. The search unit 301 has an extraction unit 302 and a search refinement unit 303. The extraction unit 302 searches for specification pages in the distribution server 102 according to search conditions of the search condition database 310, and extracts the URLs of the specification pages of illegitimate application candidates as search results. Search conditions of the search condition database 310 include a country code selected from the country designation list 311, a search keyword selected from the search keyword list 312, the search result count upper limit 313, and the access sleep interval 314.
Specifically, the extraction unit 302 transmits search information including the URL of the distribution server 102, the search keyword, and the country code to the distribution server 102, for example. The distribution server 102 searches for a group of specification pages according to search conditions, and returns to the extraction unit 302 the URLs of the specification pages of the corresponding illegitimate application candidates (including application IDs of the illegitimate application candidates) as search results. The search results are a list of information in which URLs to corresponding specification pages are listed in the order of a score based on the degree of coincidence to the search keywords in the distribution server 102.
The extraction unit 302 extracts, from the search results, URLs starting with the URL with the top score to the URL matching the search result count upper limit 313 in sequential order, and outputs the URLs as an illegitimate application candidate detection URL list 320. The extraction unit 302 stops transmission of search information to the distribution server 102 during the access sleep interval 314, and every time the access sleep interval 314 elapses, the extraction unit 302 generates search information with a different search keyword and transmits the search information to the distribution server 102.
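The behavior of the extraction unit 302 for a single country code can be sketched as follows. This is a minimal sketch, not the claimed implementation: `extract_candidate_urls` and `fake_search` are hypothetical names, the `search` callable stands in for the request to the distribution server 102, and the URL format is invented. Whether duplicate URLs returned for different keywords are merged is not stated in the description, so the sketch keeps them.

```python
import time


def extract_candidate_urls(search, keywords, country_code,
                           upper_limit, sleep_interval, sleep=time.sleep):
    """Collect the top-ranked URLs for each keyword, pausing between accesses.

    `search(keyword, country_code)` is assumed to return URLs already
    ordered by score, highest first.
    """
    url_list = []
    for i, keyword in enumerate(keywords):
        if i > 0:
            # Access sleep interval 314: wait before the next access.
            sleep(sleep_interval)
        ranked = search(keyword, country_code)
        # Keep only entries up to the search result count upper limit 313.
        url_list.extend(ranked[:upper_limit])
    return url_list


def fake_search(keyword, country_code):
    # Stand-in for the distribution server; URL format is an assumption.
    return [f"https://store.example/{country_code}/app?id={keyword}-{n}"
            for n in range(5)]


naps = []  # records requested sleeps instead of actually sleeping
urls = extract_candidate_urls(fake_search, ["ABC", "CDEF"], "JP",
                              upper_limit=2, sleep_interval=3.0,
                              sleep=naps.append)
```

Passing the sleep function as a parameter lets the pause be observed in a test without real waiting.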
The search refinement unit 303 uses at least one of the whitelist 331 or the exclusion list 332 to narrow down URLs in the illegitimate application candidate detection URL list 320. Specifically, if the search refinement unit 303 uses the whitelist 331, for example, it deletes URLs including application IDs in the whitelist 331 from the illegitimate application candidate detection URL list 320.
Also, if the search refinement unit 303 uses the exclusion list 332, for example, it deletes URLs including application IDs in the exclusion list 332 from the illegitimate application candidate detection URL list 320. The illegitimate application candidate detection URL list 320 outputted from the search refinement unit 303 is referred to as the illegitimate application candidate detection URL list (unnecessary data deleted) 340. The search refinement unit 303 is not a necessary function but rather one that can be selected. If the search refinement unit 303 is not used, the illegitimate application candidate detection URL list 320 outputted from the extraction unit 302 is outputted to the acquisition unit 304.
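The narrowing performed by the search refinement unit 303 amounts to filtering URLs by application ID. The sketch below assumes, purely for illustration, that the application ID is carried in an `id` query parameter; the description only says that the URL includes the application ID. The function names and sample IDs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def app_id_from_url(url):
    """Extract the application ID, assuming it is the `id` query parameter."""
    values = parse_qs(urlparse(url).query).get("id")
    return values[0] if values else None


def refine(url_list, whitelist=frozenset(), exclusion_list=frozenset()):
    """Drop URLs whose application ID is in the whitelist or exclusion list."""
    blocked = set(whitelist) | set(exclusion_list)
    return [u for u in url_list if app_id_from_url(u) not in blocked]


candidate_urls = [
    "https://store.example/app?id=com.xyz.abc",        # legitimate (whitelisted)
    "https://store.example/app?id=com.qrstuv.manual",  # new candidate
    "https://store.example/app?id=com.old.fake",       # already detected
]
refined = refine(candidate_urls,
                 whitelist={"com.xyz.abc"},
                 exclusion_list={"com.old.fake"})
```

Calling `refine` with both lists empty returns the input unchanged, which mirrors the case in which the search refinement unit 303 is not used.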
The acquisition unit 304 accesses the distribution server 102 with reference to the illegitimate application candidate detection URL list (unnecessary data deleted) 340 from the search unit 301 or the illegitimate application candidate URLs in the illegitimate application candidate detection URL list 320, and acquires from the distribution server 102 specification page data of specification pages corresponding to the illegitimate application candidate detection URLs. In the case of the specification page 132 shown in
The specification page data of each acquired specification page is referred to as the illegitimate application candidate detection list (specification page data) 350. The text data extracted from the icon 141 and the text data extracted from the thumbnail image 145 are referred to as in-image text.
The evaluation unit 305 uses the evaluation condition database 360 to evaluate the specification page data in the illegitimate application candidate detection list (specification page data) 350. Specifically, the evaluation unit 305 searches the specification page data in the illegitimate application candidate detection list (specification page data) 350 for an evaluation keyword in the evaluation keyword list 361, for example.
The evaluation unit 305 determines whether or not the evaluation keyword is present for each piece of specification page data in the illegitimate application candidate detection list (specification page data) 350, and calculates evaluation points using the scoring rules 362. Specifically, the evaluation unit 305 calculates, using the scoring rules 362, evaluation points regarding whether or not the evaluation keyword is present in the application name 142 of the specification page data in the illegitimate application candidate detection list (specification page data) 350, and whether or not the evaluation keyword is present in the description 146 and the in-image text.
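The presence check and scoring can be sketched as below. All names, the keyword lists, and the scoring rule (one point per matching keyword category) are assumptions made for illustration; the sole use condition 402 is ignored for brevity, and matching is a simple case-insensitive substring test, which the description does not prescribe.

```python
from itertools import product

# Hypothetical evaluation keywords, one list per evaluation item.
EVALUATION_KEYWORDS = {
    "company": ["XYZ Electrical Machinery"],
    "product": ["CDEF"],
    "suspicious": ["manual", "catalog"],
}

# Hypothetical scoring rules: one point per matching category.
SCORING_RULES = {combo: sum(combo)
                 for combo in product([False, True], repeat=3)}


def check_items(text):
    """Return (company, product, suspicious) presence flags for a text."""
    lowered = text.lower()
    return tuple(
        any(keyword.lower() in lowered for keyword in EVALUATION_KEYWORDS[cat])
        for cat in ("company", "product", "suspicious")
    )


def evaluate_page(app_name, description, in_image_text=""):
    """Score the application name, then the description plus in-image text."""
    name_points = SCORING_RULES[check_items(app_name)]
    desc_points = SCORING_RULES[check_items(description + " " + in_image_text)]
    return name_points, desc_points, name_points + desc_points


result = evaluate_page(
    "CDEF Manual",
    "Operation manual for CDEF by XYZ Electrical Machinery",
    in_image_text="CDEF",
)
```

Here the application name matches a product name and a suspicious keyword, and the description matches all three categories, so the third element of `result` is the sum of the two partial scores.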
The evaluation unit 305 calculates total points by adding up the evaluation points. The higher the total points are, the higher the probability is that the specification page is of an illegitimate application. Thereafter, the evaluation unit 305 outputs the illegitimate application candidate detection list (with scores) 370. The illegitimate application candidate detection list (with scores) 370 is specification page data (see
The creation unit 306 creates an illegitimate application candidate detection list 390 by adding the illegitimate application candidate detection list (with scores) 370 to an illegitimate application candidate detection list template 380.
The application ID 1101 is included in the URL 1105 of the specification page in the illegitimate application candidate detection list (specification page data) 350. The application name 1102 is a character string indicating the application name 142 in the specification page in the illegitimate application candidate detection list (specification page data) 350.
The fee 1103 is a character string indicating the price in the specification page in the illegitimate application candidate detection list (specification page data) 350. In the case of the specification page 131 of
The provider 1104 is a character string indicating the provider name 143 in the specification page in the illegitimate application candidate detection list (specification page data) 350. The URL 1105 is a URL that can access the specification page in the illegitimate application candidate detection list (specification page data) 350. The update date 1106 is the latest date on which the specification page in the illegitimate application candidate detection list (specification page data) 350 was updated.
The application name evaluation points 1107 are evaluation points calculated by the evaluation unit 305. Specifically, the application name evaluation points 1107 are evaluation points 1004 attained when the scoring rules 362 are applied in determining the presence or absence of the evaluation keyword in the application name 142, for example.
The application name check item 1108 is a combination of values of the first to third evaluation items 1001 to 1003 that serves as the source for calculating the application name evaluation points 1107. The application name check item 1108 in the first entry, for example, states “a product name and a suspicious keyword are included”, because, regarding the presence or absence of an evaluation keyword in the application name 142, the value for the first evaluation item 1001 is “no”, the value for the second evaluation item 1002 is “yes”, and the value for the third evaluation item 1003 is “yes”.
The description evaluation points 1109 are evaluation points calculated by the evaluation unit 305. Specifically, the description evaluation points 1109 are evaluation points 1004 attained when the scoring rules 362 are applied in determining the presence or absence of the evaluation keyword in the description 146, for example. Also, the description evaluation points 1109 may be evaluation points 1004 attained when the scoring rules 362 are applied in determining the presence or absence of the evaluation keyword in the description 146 and in the in-image text. The in-image text is a character string attained by recognizing a character string pattern included in the icon 141 or the thumbnail image 145 by an image recognition process and converting the character string pattern into text.
A character string “ABC” is recognized from the icon 141 in the specification page 131 of
The description check item 1110 is a combination of values of the first to third evaluation items 1001 to 1003 that serves as the source for calculating the description evaluation points 1109. The description check item 1110 in the first entry, for example, states “a company name, a product name, and a suspicious keyword are included”, because, regarding the presence or absence of an evaluation keyword in the description 146, the value for the first evaluation item 1001 is “yes”, the value for the second evaluation item 1002 is “yes”, and the value for the third evaluation item 1003 is “yes”.
The total evaluation points 1111 are the total of the application name evaluation points 1107 and the description evaluation points 1109 calculated by the evaluation unit 305 for the specification pages in the illegitimate application candidate detection list (specification page data) 350.
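One way to picture an entry of the list is as a record whose fields follow the reference numerals 1101 to 1111. The class name, the sample values (application IDs, URLs, dates, points), and the sorting by total points are illustrative assumptions; the description does not state that the list is sorted.

```python
from dataclasses import dataclass


@dataclass
class CandidateEntry:
    application_id: str          # 1101
    application_name: str        # 1102
    fee: str                     # 1103
    provider: str                # 1104
    url: str                     # 1105
    update_date: str             # 1106
    name_points: int             # 1107
    name_check_item: str         # 1108
    description_points: int      # 1109
    description_check_item: str  # 1110

    @property
    def total_points(self) -> int:  # 1111
        return self.name_points + self.description_points


# Sample values are invented for illustration.
entries = [
    CandidateEntry("com.example.viewer", "Viewer", "Free", "someone",
                   "https://store.example/app?id=com.example.viewer",
                   "2019-01-10", 0, "nothing included",
                   1, "a company name is included"),
    CandidateEntry("com.qrstuv.manual", "CDEF Manual", "¥500", "qrstuv",
                   "https://store.example/app?id=com.qrstuv.manual",
                   "2019-01-20", 3,
                   "a product name and a suspicious keyword are included",
                   5,
                   "a company name, a product name, and a suspicious "
                   "keyword are included"),
]

# Assumed convenience: review the highest-scoring candidates first.
ranked = sorted(entries, key=lambda e: e.total_points, reverse=True)
```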
Returning to
<Setting Screen Example>
Next, an example of setting various information in advance using the detection apparatus 100 will be described with reference to
The search condition setting button 1201 is a button for setting the content of the search condition database 310 by user operation. When the search condition setting button 1201 is pressed, a search condition setting screen 1300 shown in
The email recipient setting button 1202 is a button for setting the recipient of an email, that is, the email address by user operation. When the email recipient setting button 1202 is pressed, a setting screen for setting the email recipient (not shown) is displayed. When the email address is set by being inputted to the setting screen by user operation, the output unit 307 transmits the illegitimate application candidate detection list 390 to the recorded email address. In the example of
The whitelist recording button 1203 is a button for recording the application ID in the whitelist 331 by user operation. When the whitelist recording button 1203 is pressed, a recording screen for recording the application ID (not shown) is displayed. When the application ID is recorded by being inputted to the recording screen by user operation, the search refinement unit 303 narrows down the illegitimate application candidate detection URL list 320 using the whitelist 331 after the application ID is recorded therein.
The execution schedule setting button 1204 is a button for setting the execution schedule by user operation. The execution schedule is a schedule by which the detection apparatus 100 generates the illegitimate application candidate detection list 390. Specifically, the execution schedule is a periodic execution start time such as 9:00 every Monday, for example. The execution schedule may be set for each search condition such as country or search keyword. When the execution schedule setting button 1204 is pressed, a setting screen for setting the execution schedule (not shown) is displayed. When the execution schedule is recorded by being inputted to the setting screen by user operation, the detection apparatus 100 starts execution according to the set execution schedule.
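A periodic start time such as 9:00 every Monday can be computed with the standard library. The function name `next_weekly_run` and its parameters are hypothetical; how the detection apparatus 100 actually stores and triggers the schedule is not specified here.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday=0, hour=9, minute=0):
    """Return the next start time strictly after `now` (weekday 0 = Monday)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the requested weekday within the current 7-day window.
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # The slot for this week has passed; use the following week.
        candidate += timedelta(days=7)
    return candidate


# 2019-02-05 is a Tuesday, so the next Monday 9:00 is 2019-02-11.
next_run = next_weekly_run(datetime(2019, 2, 5, 12, 0))
```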
The exclusion list recording button 1205 is a button for recording the application ID in the exclusion list 332 by user operation. When the exclusion list recording button 1205 is pressed, a recording screen for recording the application ID (not shown) is displayed. When the application ID is recorded by being inputted to the recording screen by user operation, the search refinement unit 303 narrows down the illegitimate application candidate detection URL list 320 using the exclusion list 332 after the application ID is recorded therein.
The illegitimate application candidate detection list template recording button 1206 is a button for recording the illegitimate application candidate detection list template by user operation. When the illegitimate application candidate detection list template recording button 1206 is pressed, a recording screen for recording the illegitimate application candidate detection list template 380 (not shown) is displayed. When the illegitimate application candidate detection list template 380 is recorded by being inputted to the recording screen by user operation, the creation unit 306 creates the illegitimate application candidate detection list 390 using the illegitimate application candidate detection list template 380.
The scoring rule setting button 1207 is a button for setting the scoring rules 362 by user operation. When the scoring rule setting button 1207 is pressed, a setting screen for setting the scoring rules 362 (not shown) is displayed. When the scoring rules 362 are set by being inputted to the setting screen by user operation, the evaluation unit 305 evaluates the specification page data in the illegitimate application candidate detection list (specification page data) 350 using the set scoring rules 362.
The illegitimate application candidate detection list history button 1208 is a button for displaying the history of the illegitimate application candidate detection list 390. When the illegitimate application candidate detection list history button 1208 is pressed, past illegitimate application candidate detection lists 390 are displayed in the display of the detection apparatus 100.
The country designation list setting button 1301 is a button for setting the designation of the country for which the search by the search keyword is to be performed by user operation. When the country designation list setting button 1301 is pressed, a country designation list setting screen 1400 shown in
The search keyword setting button 1302 is a button for setting the search keyword by user operation. When the search keyword setting button 1302 is pressed, a search keyword setting screen 1500 shown in
The search result count upper limit setting button 1303 is a button for setting the search result count upper limit 313 by user operation. When the search result count upper limit setting button 1303 is pressed, a search result count upper limit setting screen 1600 shown in
The access sleep interval setting button 1304 is a button for setting the access sleep interval 314 by user operation. When the access sleep interval setting button 1304 is pressed, an access sleep interval setting screen 1700 shown in
The product name setting button 1502 is a button that calls a product name setting screen (not shown). The product name is recorded in the search keyword list 312 by being inputted to the product name setting screen by user operation. The rival company name setting button 1503 is a button that calls a rival company name setting screen (not shown). The rival company name is recorded in the search keyword list 312 by being inputted to the rival company name setting screen by user operation.
<Example of Illegitimate Application Detection Process Method Performed by Detection Apparatus 100>
The extraction unit 302 selects one search keyword if there are search keywords that have not yet been selected in the search keyword list 312 (step S1902), and executes steps S1903 and S1904 for the selected search keyword. If there are search keywords that have not yet been selected, then the process returns to step S1902 and if there are no search keywords that have not been selected (step S1905), then the process returns to step S1901 and the extraction unit 302 selects one country code that has not yet been selected.
Here, in step S1902, the search keyword selected on its own from the search keyword lists 400 to 600 of
For example, the search keyword “XYZ” for the company name 401 of
In step S1903, the extraction unit 302 accesses the distribution server 102 with search information including the selected country code and the selected search keyword, searches the group of specification pages in the distribution server 102, and acquires the top N (N=search result count upper limit 313) URLs among the group of URLs to the searched specification pages (step S1903).
In step S1904, the extraction unit 302 executes the sleep process for a time equal to the access sleep interval 314 (step S1904). This reduces the likelihood that the distribution server 102 blocks access from the detection apparatus 100 because of too many accesses in a short period of time. Then, the extraction unit 302 returns to step S1902 if there are search keywords that have not yet been selected, and if there are no search keywords that have not been selected (step S1905), the extraction unit 302 progresses to step S1906.
In step S1907, the extraction unit 302 generates the illegitimate application candidate detection URL list 320 by executing the merge process (step S1907), and progresses to the list comparison process (step S1802). If the specification page of a given application is available in multiple countries, then the same URL may be acquired once for each country code searched. As a countermeasure, the extraction unit 302 executes a merge process in which only one instance among a plurality of instances of the same URL that were acquired for each of the country codes is left remaining, with the other instances being deleted. As a result, the illegitimate application candidate detection URL list 320 does not have a plurality of instances of the same URL. Therefore, redundant processing of the same URL a plurality of times is eliminated from the following processes, and thus, it is possible to increase the efficiency of the illegitimate application detection process.
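The loop over country codes and search keywords, the sleep between accesses, and the merge process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the extraction unit 302: the function name `collect_candidate_urls`, the `search` callable (which stands in for the access to the distribution server 102), and the injectable `sleep` parameter are all hypothetical.

```python
import time
from typing import Callable, Iterable, List


def collect_candidate_urls(
    country_codes: Iterable[str],
    search_keywords: Iterable[str],
    search: Callable[[str, str], List[str]],  # hypothetical stand-in for the server access
    result_limit: int,                         # plays the role of the search result count upper limit 313
    sleep_seconds: float,                      # plays the role of the access sleep interval 314
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Search every (country code, keyword) pair and merge the URLs found."""
    keywords = list(search_keywords)
    collected: List[str] = []
    for country in country_codes:                  # select one country code (step S1901)
        for keyword in keywords:                   # select one search keyword (step S1902)
            urls = search(country, keyword)        # search the specification pages (step S1903)
            collected.extend(urls[:result_limit])  # keep only the top N URLs
            sleep(sleep_seconds)                   # sleep process (step S1904)
    # Merge process (step S1907): keep the first instance of each URL and
    # drop the others, preserving the order in which URLs were first seen.
    return list(dict.fromkeys(collected))
```

Passing `sleep` as a parameter is only a convenience for this sketch; it lets the interval be checked in a test without actually waiting.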
The search refinement unit 303 compares the selected application ID to the illegitimate application candidate detection URL list 320 (step S2002), and determines whether or not there are URLs including the application ID that match the selected application ID (step S2003). If there are no URLs including application IDs that match the selected application ID (step S2003: no), then the process progresses to step S2005. If there is a URL including an application ID that matches the selected application ID (step S2003: yes), then the search refinement unit 303 deletes the URL including the application ID matching the selected application ID from the illegitimate application candidate detection URL list 320 (step S2004) and the process progresses to step S2005.
The search refinement unit 303 selects one application ID if there are application IDs that have not yet been selected in the exclusion list 332 (step S2006), and executes steps S2007 to S2009. If there are application IDs that have not yet been selected, then the process returns to step S2006 and if there are no application IDs that have not been selected (step S2010), then the process progresses to the specification page data acquisition process (step S1803).
The search refinement unit 303 compares the selected application ID to the illegitimate application candidate detection URL list 320 (step S2007), and determines whether or not there are URLs including the application ID that match the selected application ID (step S2008). If there are no URLs including application IDs that match the selected application ID (step S2008: no), then the process progresses to step S2010. If there is a URL including an application ID that matches the selected application ID (step S2008: yes), then the search refinement unit 303 deletes the URL including the application ID matching the selected application ID from the illegitimate application candidate detection URL list 320 and outputs the illegitimate application candidate detection URL list (unnecessary data deleted) 340 (step S2009) and the process progresses to step S2010.
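The two comparison loops above both delete from the URL list any URL whose application ID appears in a given list of application IDs. A minimal sketch follows. It assumes, purely for illustration, that the application ID is carried in an `id` query parameter of the URL; the description does not specify how the ID is embedded. The function names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import parse_qs, urlparse


def application_id_of(url: str) -> str:
    """Return the value of the `id` query parameter, or '' if absent.

    Assumption: the application ID is carried in an `id` query parameter.
    """
    values = parse_qs(urlparse(url).query).get("id", [])
    return values[0] if values else ""


def remove_listed_applications(urls: Iterable[str], *id_lists: Iterable[str]) -> List[str]:
    """Delete every URL whose application ID appears in any of the given lists."""
    excluded = set()
    for ids in id_lists:
        excluded.update(ids)
    # Keep a URL only when its application ID is not in any list.
    return [url for url in urls if application_id_of(url) not in excluded]
```

Collecting the IDs into a set first gives the same result as comparing one application ID at a time, as the flow in the description does, while scanning the URL list only once.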
The acquisition unit 304 accesses the distribution server 102 with the selected URL and acquires the specification page therefrom (step S2102). As shown in
The acquisition unit 304 extracts text data from the image file (step S2104). In the case of the specification page 132 shown in
In step S2202, the evaluation unit 305 determines whether or not the application name in the selected specification page data corresponds to an evaluation keyword in the evaluation keyword list 361 (step S2202). Here, in step S2202, the evaluation keyword to be compared on its own is an evaluation keyword for which the sole use condition 402 is “yes”. Evaluation keywords for which the sole use condition 402 is “no” are compared in combination with one or more other evaluation keywords that have a sole use condition 402 of “yes” or “no”. This similarly applies to steps S2204 and S2206.
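One possible reading of the sole use condition can be sketched as follows. In this reading, a keyword whose sole use condition is "yes" corresponds on its own, while a keyword whose condition is "no" corresponds only when at least one other evaluation keyword also appears in the same text. The `EvaluationKeyword` type, the `corresponds` function, and the case-insensitive substring comparison are assumptions made for illustration and are not taken from the description.

```python
from typing import List, NamedTuple


class EvaluationKeyword(NamedTuple):
    text: str
    sole_use: bool  # True corresponds to a sole use condition of "yes"


def corresponds(evaluated_text: str, keywords: List[EvaluationKeyword]) -> bool:
    """Return True if the text corresponds to the evaluation keywords."""
    lowered = evaluated_text.lower()
    # Assumption: a keyword is "included" when it is a case-insensitive substring.
    hits = [k for k in keywords if k.text.lower() in lowered]
    if any(k.sole_use for k in hits):
        return True  # a sole-use keyword is compared on its own
    # Only keywords with a "no" condition were found: each one needs at
    # least one other keyword, so two or more distinct hits are required.
    return len({k.text.lower() for k in hits}) >= 2
```

For example, with "XYZ" as a sole-use keyword and "remote" and "control" as non-sole-use keywords, the text "Remote viewer" would not correspond under this reading, whereas "Remote control for TV" would.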
In step S2203, the evaluation unit 305 applies the determination results from step S2202 to the first to third evaluation items 1001 to 1003 of the scoring rules 362, calculates the evaluation points 1004 of the application name 142 as the application name evaluation points 1107, acquires the check results for the first to third evaluation items 1001 to 1003 as the application name check item 1108, and adds the application name evaluation points 1107 and the application name check item 1108 to the selected specification page data (step S2203).
In step S2204, the evaluation unit 305 determines whether or not the description 146 and the in-image text in the selected specification page data correspond to an evaluation keyword in the evaluation keyword list 361 (step S2204).
In step S2205, the evaluation unit 305 applies the determination results from step S2204 to the first to third evaluation items 1001 to 1003 of the scoring rules 362, calculates the evaluation points 1004 of the description 146 and the in-image text as the description evaluation points 1109, acquires the check results for the first to third evaluation items 1001 to 1003 as the description check item 1110, and adds the description evaluation points 1109 and the description check item 1110 to the selected specification page data (step S2205).
In step S2206, the evaluation unit 305 totals the evaluation points 1004 of the application name and the evaluation points 1004 of the description and the in-image text, calculates the total evaluation points, and adds the total evaluation points to the selected specification page data (step S2206).
Then, if there is specification page data of an illegitimate application candidate that has not yet been selected, then the evaluation unit 305 returns to step S2201 and if there is no specification page data of an illegitimate application candidate that has not yet been selected, then the evaluation unit 305 outputs the illegitimate application candidate detection list (with scores) 370 (step S2207) and progresses to the illegitimate application candidate detection list creation process (step S1805).
The creation unit 306 sorts the group of written specification page data in descending order by total evaluation points 1111 and ascending order by application ID 1101 (step S2302). As a result, a plurality of pieces of specification page data with the same total evaluation points 1111 are sorted in ascending order by application ID 1101.
The creation unit 306 deletes specification page data in which the total evaluation points 1111 amount to 0 (step S2304). The threshold for deletion is not limited to 0; specification page data in which the total evaluation points 1111 are at or below a prescribed number of points greater than 0 may be deleted instead. Then, the process progresses to the illegitimate application candidate detection list email sending process (step S1806).
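The sorting in step S2302 and the deletion in step S2304 can be sketched as follows. This is an illustration only: the function name `build_detection_list`, the dictionary keys `total_points` and `application_id`, and the `threshold` parameter are hypothetical names standing in for the total evaluation points 1111, the application ID 1101, and the prescribed number of points.

```python
from typing import Dict, List


def build_detection_list(pages: List[Dict], threshold: int = 0) -> List[Dict]:
    """Sort specification page data, then drop low-scoring entries."""
    # Step S2302: descending by total points; ties in ascending order by application ID.
    ordered = sorted(pages, key=lambda p: (-p["total_points"], p["application_id"]))
    # Step S2304: delete entries whose total points are at or below the threshold.
    return [p for p in ordered if p["total_points"] > threshold]
```

Negating the points in the sort key lets a single ascending sort produce both orderings at once, so entries with equal total points fall into ascending order by application ID.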
(1) Thus, the detection apparatus 100 of the present embodiment has the processor 201 that is configured to execute programs, and a storage device 202 that stores the programs. The processor 201 is configured to execute: a search process in which, as a result of accessing the distribution server 102 having a group of specification pages that pertain to an application using a search keyword pertaining to a legitimate application, given specification pages including a character string that matches or is related to the search keyword are searched for in the distribution server 102; an acquisition process of acquiring, from the given specification pages found by the search process, a first evaluation character string (application name 142, for example) that identifies given applications included in the given specification pages, and a second evaluation character string (description 146, for example) that describes the given applications; an evaluation process of evaluating whether or not the given specification pages are specification pages pertaining to an illegitimate application on the basis of evaluation keywords relating to illegitimate applications and the first and second evaluation character strings acquired in the acquisition process; and an output process of outputting the evaluation results from the evaluation process. As a result, it is possible to detect illegitimate application candidates automatically.
(2) In the detection apparatus 100 from (1), during the search process, the processor 201 accesses the distribution server 102 using a search keyword and a country code, thereby searching, in the distribution server 102, for a given specification page that includes a character string that matches or is related to the search keyword and for which the country is designated. As a result, it is possible to detect illegitimate application candidates that are only provided in a given country.
(3) In the detection apparatus 100 from (1), during the search process, the processor 201 removes specification pages including character strings matching or related to the search keyword from the given specification pages on the basis of a given application ID. As a result, it is possible to exclude legitimate applications or applications that have already been detected as illegitimate applications.
(4) In the detection apparatus 100 from (1), during the search process, the processor 201 accesses the distribution server 102 using the search keyword, and after a prescribed period of time has elapsed, accesses the distribution server 102 with another search keyword. As a result, a case in which the distribution server 102 blocks access from the detection apparatus 100 as a result of too many accesses from the detection apparatus 100 to the distribution server 102 in a short period of time is mitigated.
(5) In the detection apparatus from (1), during the acquisition process, the processor 201 accesses a given page, and after a prescribed period of time has elapsed, accesses another given page. As a result, a case in which the distribution server 102 blocks access from the detection apparatus 100 as a result of too many accesses from the detection apparatus 100 to the distribution server 102 in a short period of time is mitigated.
(6) In the detection apparatus 100 from (1), during the acquisition process, the processor 201 acquires, from the given specification page, a third evaluation character string identified from an image included in the given specification page. As a result, it is possible to detect illegitimate application candidates with character strings acquired from images.
(7) In the detection apparatus 100 from (1), during the evaluation process, the processor 201 evaluates whether a given specification page is a specification page pertaining to an illegitimate application on the basis of a first evaluation for determining whether the evaluation keyword is included in a first evaluation character string (application name 142, for example) and a second evaluation for determining whether the evaluation keyword is included in a second evaluation character string (description 146, for example). As a result, it is possible to evaluate a given specification page from different evaluation perspectives in the given specification page.
(8) In the detection apparatus 100 from (1), the evaluation keyword includes the same keyword as the search keyword and a keyword differing from the search keyword. As a result, the evaluation keyword and the search keyword partially overlap, and thus, it is possible to search, as a specification page of an illegitimate application candidate, a specification page that includes a search keyword included in the specification page of a legitimate application and an evaluation keyword that is not included in the specification page of a legitimate application. That is, it is possible to detect illegitimate application candidates that are similar to but not the same as legitimate applications.
(9) In the detection apparatus 100 from (1), the search keyword is at least one of the company name 401, the product name 501, or the rival company name 601 of the application, and the evaluation keyword is at least one of the company name 401, the product name 501, or the suspicious keyword 801 of the application. In this manner, if the search keyword and the evaluation keyword partially overlap, it is possible to search, as a specification page of an illegitimate application candidate, a specification page that includes a search keyword included in the specification page of a legitimate application and a suspicious keyword that is not included in the specification page of a legitimate application.
(10) In the detection apparatus 100 from (9), the suspicious keyword 801 is a keyword pertaining to the usage method for the application, the usage method for a product linked to the application, or a description of components of the application. As a result, it is possible to suitably evaluate the specification page of illegitimate application candidates.
It should be noted that this invention is not limited to the above-mentioned embodiments, and encompasses various modification examples and the equivalent configurations within the scope of the appended claims without departing from the gist of this invention. For example, the above-mentioned embodiments are described in detail for a better understanding of this invention, and this invention is not necessarily limited to what includes all the configurations that have been described. Further, a part of the configurations according to a given embodiment may be replaced by the configurations according to another embodiment. Further, the configurations according to another embodiment may be added to the configurations according to a given embodiment. Further, a part of the configurations according to each embodiment may be added to, deleted from, or replaced by another configuration.
Further, a part or entirety of the respective configurations, functions, processing modules, processing means, and the like that have been described may be implemented by hardware, for example, may be designed as an integrated circuit, or may be implemented by software by a processor interpreting and executing programs for implementing the respective functions.
The information on the programs, tables, files, and the like for implementing the respective functions can be stored in a storage device such as a memory, a hard disk drive, or a solid state drive (SSD) or a recording medium such as an IC card, an SD card, or a DVD.
Further, control lines and information lines that are assumed to be necessary for the sake of description are described, but not all the control lines and information lines that are necessary in terms of implementation are described. It may be considered that almost all the components are connected to one another in actuality.
Number | Date | Country | Kind |
---|---|---|---|
2019-018804 | Feb 2019 | JP | national |