The present disclosure relates generally to regulation of media content obtainable over computer networks, and more particularly to the classification and regulation of adverse and detrimental digital content.
Third-party content serving is the process of determining which content goes in which available sections on a publisher's webpage or app and then delivering the content to a user requesting a webpage or launching an app. Third-party content is not explicitly requested by a user and thus, a user has little to no control over the type of content that will be delivered upon requesting the webpage or launching the app. Moreover, third-party content can be configured to track users' behavior, obtain users' personal data, load malware on a user device, and/or initiate unsecure communication sessions between a user compute device and a third-party server.
The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology may be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the subject technology. It will be clear and apparent, however, that the subject technology is not limited to the specific details set forth herein and may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form to avoid obscuring the concepts of the subject technology.
The terms “computer”, “processor”, “computer processor”, “compute device” or the like should be expansively construed to cover any kind of electronic device with data processing capabilities including, by way of non-limiting example, a digital signal processor (DSP), a microcontroller, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or any other electronic computing device comprising one or more processors of any kind, or any combination thereof.
As used herein, the phrases “for example,” “such as”, “for instance” and variants thereof describe non-limiting embodiments of the presently disclosed subject matter. Reference in the specification to “one case”, “some cases”, “other cases”, or variants thereof means that a particular feature, structure or characteristic described in connection with the embodiment(s) is included in at least one embodiment of the presently disclosed subject matter. Thus, the appearance of the phrase “one case”, “some cases”, “other cases” or variants thereof does not necessarily refer to the same embodiment(s).
It is appreciated that, unless specifically stated otherwise, certain features of the presently disclosed subject matter, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the presently disclosed subject matter, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.
Until now, classification and analysis of third-party digital content has been limited to issue detection and manual remediation. Manual remediation limits users and publishers to opting between completely turning off third-party content tags or exposing users to potentially unsafe or non-compliant third-party content. Often, third-party content tags are daisy chained such that third-party content is served to users' devices from multiple servers and/or domains, unbeknownst to users, before they load a page, click a web link and/or load an app, increasing the complexity and uncertainty of classification and analysis of third-party digital content before it is loaded on their devices.
Thus, a need exists for methods and apparatus for classification of digital content and hindrance of adverse and detrimental digital content distributed over computer networks.
The subject technology is related to methods and apparatus for classification of digital content and hindrance of adverse and detrimental digital content distributed over computer networks. In some implementations, wrapping tags or blocking-enabled tags are configured with scripting language in which variables and/or processor executable functions are defined to regulate digital content distributed over computer networks. Differently stated, blocking-enabled tags are configured with processor executable and/or interpretable instructions to preempt the execution of unsolicited digital content classified as adverse or detrimental. The blocking-enabled tags are generated based on attributes of client compute devices and compliance rules implemented as computer executable instructions. In some instances, compliance rules can be generated through telemetry processes, extraction of attributes from unsolicited digital content tags, and/or emulation of user interactions with unsolicited digital content configured to be embedded in digital content or digital-based services requested by client compute devices. In some instances, the blocking-enabled tags can be maintained and updated over time through scheduled emulation and telemetry tasks.
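By way of a non-limiting, hypothetical illustration, the wrapping of a third-party content tag into a blocking-enabled tag described above can be sketched in JavaScript. The function and field names (buildBlockingEnabledTag, shouldBlock, input) are illustrative assumptions rather than elements of the disclosed implementation, and Node's Buffer stands in for any suitable encoder:

```javascript
// A compliance rule carries an "input" fragment that identifies
// third-party content to be blocked (names are hypothetical).
function shouldBlock(tagHtml, blockingRules) {
  return blockingRules.some((rule) => tagHtml.includes(rule.input));
}

// Wrap the original third-party tag: its HTML is stored as an
// encoded string so it is carried as inert data, alongside the
// outcome of the compliance check.
function buildBlockingEnabledTag(tagHtml, blockingRules) {
  const encoded = Buffer.from(tagHtml).toString("base64");
  return {
    encodedTag: encoded,
    blocked: shouldBlock(tagHtml, blockingRules),
    // In a browser, a small script would decode and write the
    // original tag only when `blocked` is false.
  };
}

const rules = [{ input: "evil-cdn.example" }];
const safeTag = buildBlockingEnabledTag(
  '<script src="https://ads.example/tag.js"></script>', rules);
const adverseTag = buildBlockingEnabledTag(
  '<script src="https://evil-cdn.example/t.js"></script>', rules);
```

Under this sketch, `safeTag.blocked` is false and `adverseTag.blocked` is true, so only the former would be decoded and executed on a client compute device.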
Advantageously, the subject technology provides systems and methods to classify and preempt the loading of adverse and/or detrimental third-party digital content in near real-time or corresponding to round-trip time between two compute devices (e.g., by way of non-limiting example, on a millisecond order, such as between 25 ms and 200 ms, between 50 ms and 150 ms, between 75 ms and 125 ms, etc.). Thus, users perceive the responsiveness of the computer-based methods as immediate and as a part of a mechanical action (e.g., as part of requesting a webpage via a browser or loading an app) made by a user. It is to be understood that where a range of values is provided, such as above, each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the disclosure. That the upper and lower limits of these smaller ranges can independently be included in the smaller ranges is also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure. Where a list of values is provided, it is understood that ranges between any two values in the list are also contemplated as additional embodiments encompassed within the scope of the disclosure, and it is understood that each intervening value to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of said range and any other listed or intervening value in said range is encompassed within the disclosure; that the upper and lower limits of said sub-ranges can independently be included in the sub-ranges is also encompassed within the disclosure, subject to any specifically excluded limit.
The subject technology provides objective, accurate, and consistent classification of third-party content and thus, can preempt the execution of adverse or detrimental digital content. The classification and preemption of adverse and/or detrimental digital content is reliable; that is, digital content deemed to be adverse and/or detrimental is preempted from being loaded or displayed across compute devices irrespective of whether they are mobile devices or stationary devices, and/or whether the content is called from a webpage and/or an app. The subject technology operates in near real-time and thus, optimizes security and user experience by decreasing overhead associated with human-based classification and/or intervention after adverse and/or detrimental digital content is loaded or displayed on a compute device.
Network 103 can include one or more types of communication networks. For example, communication networks can include the Internet, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), various types of telephone networks (including, for example, a Public Switched Telephone Network (PSTN) with Digital Subscriber Line (DSL) technology) or mobile networks (including, for example, Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), and other suitable mobile network technologies), or any combination thereof. Communication within network 103 can be established through any suitable connection (including wired or wireless) and communication technology or standard (wireless fidelity (WiFi®), 4G, long-term evolution (LTE™), or other suitable standard). Client compute devices 102 and 105 include one or more computer processors and computer memories, configured to perform various data processing operations.
Client compute devices 102 and 105 also include a network communication interface (not shown in
CHS server 101 includes one or more computer processors and computer memories, configured to perform multiple data processing and communication operations associated with classification of digital content provided by third-party entities and for hindrance of adverse and detrimental digital content distributed over computer networks. In general, CHS server 101 analyzes digital content generated by third-party servers, executes one or more risk analysis processes, classifies third-party digital content, and depending on classification outcomes, determines whether or not client compute devices should retrieve third-party digital content. In some implementations, CHS server 101, or one or more processes executed by CHS server 101, can be implemented in publisher server 107, publisher's third-party content manager server 109, and/or another compute device. Further structural and functional components of an embodiment of CHS server 101 are discussed below with reference to
Publisher server 107 includes one or more computer processors and computer memories, configured to perform multiple data processing and communication operations associated with the delivery of publisher digital content and/or services for the consumption of users 121, 123, and other suitable users. Publisher digital content and/or services include, for example, webpages, mobile apps, Software as a Service (SaaS), and other suitable services and digital content. In some instances, publisher digital content and/or services include processor executable instructions to request third-party digital content from another server, for instance, digital content requested from one or more servers in content distribution network 117, digital content from DP servers 115A, 115B, 115C, and other suitable servers connected to network 103. Such executable instructions can be carried out by client compute devices upon reception of publisher digital content and/or services provided by publisher server 107. In some implementations, publisher server 107 relies on services provided by CHS server 101 for the identification and regulation of third-party digital content configured to be delivered to client compute devices requesting publisher digital content and/or services from publisher server 107.
Publisher's third-party content manager server 109 includes one or more computer processors and computer memories, configured to deliver third-party digital content to client compute devices. Publisher's third-party content manager server 109 facilitates the placement of third-party content and delivers this content to, for example, web sites or services provided by publisher server 107. In some implementations, publisher's third-party content manager server 109 tracks third-party content displayed to users, the number of clicks made by users on the third-party content, and other suitable metrics or quotas that can then be processed for statistical reports and/or analytics. In some implementations, publisher's third-party content manager server 109 relies on services provided by CHS server 101 for the identification and regulation of third-party digital content configured to be delivered to client compute devices requesting publisher digital content and/or services from publisher server 107.
In some instances, publisher's third-party content manager server 109 can be implemented in a compute device separate from publisher server 107 and managed by a company, person, or non-person entity providing services to publisher server 107. In some other instances, publisher server 107 and publisher's third-party content manager server 109 can be managed by a same person or same non-person entity. In yet some further instances, publisher's third-party content manager server 109 can be implemented and maintained in publisher server 107 as shown in an example of an embodiment provided with reference to
Content distribution network (CDN) 117 is a network system connecting multiple compute devices and/or servers. CDN 117 is a large, geographically distributed network of specialized servers and/or compute devices that accelerate the delivery of third-party digital content, rich media files and other suitable content to client compute devices. In some instances, third-party digital content sent from CDN 117 is received by client compute devices upon a request for publisher digital content and/or services provided by publisher server 107. In some other instances, however, third-party digital content can be delivered to client compute devices by DP servers 115A, 115B, and 115C, SP server 113, or other suitable server shown or not shown in
In some instances, a salient difference between digital content and/or services provided by publisher server 107 and third-party digital content is that the former is explicitly requested by users (e.g., user 121 and user 123) while the latter is embedded in publisher digital content and/or services according to, for example, users' demographic characteristics, behavioral characteristics, and/or other suitable user-centric characteristics (e.g., a user compliance profile). In some instances, third-party digital content embedded in publisher digital content and/or services is often selected to be delivered to users based on outcomes from real-time bidding (RTB) processes in which multiple bids are exchanged between DP servers to earn a digital slot or space within publishers' digital content and/or services.
Marketer of digital content server 111 includes one or more computer processors and computer memories, configured to promote or announce third-party digital content on behalf of, for example, third-party users and/or non-person entities wanting to gain publicity or reach users for purposes unsolicited by users. Supply platform server 113 includes one or more computer processors and computer memories, configured to perform operations on behalf of publisher server 107. Specifically, supply platform server 113 manages sections within publisher digital content and/or services provided by publisher server 107 that can be configured to embed third-party digital content. Each of the demand platform servers 115A, 115B, and 115C includes one or more computer processors and computer memories, configured to perform operations on behalf of third-party publisher servers, for example, third-party publisher server 109 and other suitable types of third-party publisher servers. Specifically, demand platform servers 115A, 115B, and 115C submit bids to SP server 113 to win digital slots or spaces for the inclusion of third-party digital content in sections embedded in publisher digital content and/or services provided by publisher server 107, or other suitable publisher server.
Any of users 121 and 123 can request publisher digital content and/or services provided by publisher server 107. For example, user 121 can launch an app on mobile device 102 to start a client session with publisher server 107 and then retrieve publisher digital content and/or services. For another example, user 123 can enter a Uniform Resource Locator (URL) to a browser installed in compute device 105 to retrieve publisher digital content and/or services. The aforementioned request methods are some examples of how users can request content and/or services provided by publisher server 107. These methods can be interchangeably used depending on applications, data, users' preferences, services, and other suitable user dependent and compute devices dependent factors. Alternatively or additionally, other methods to initiate communication sessions and/or gain connectivity from client compute devices to publisher server 107 can be similarly employed. Thus, the examples of methods to request content or services from publisher server 107 are not intended to suggest any limitation as to the scope of use and/or functionality of the presently disclosed subject matter.
In some instances, digital content and/or services provided by publishers include embedded tags designated to include third-party digital content. In some instances, these tags can be configured to trigger and execute calls to SP server 113, passing along, for example, information about available spaces or digital slots for the inclusion of third-party digital content and the identity of publisher server 107. Accordingly, in some instances, SP server 113 can retrieve one or more cookies or other suitable data files associated with client compute devices (e.g., mobile compute device 102 and compute device 105) and/or users. Thereafter, SP server 113 can request bids from DP servers 115A, 115B, and 115C. SP server 113 sends the cookies or other suitable data files retrieved, for example, from client compute devices to each of the DP servers. The DP servers execute one or more processes to value the offer to embed third-party content within publisher digital content and/or services provided by publisher server 107.
In general, the richer the data available about users, the higher the bids from DP servers will be. In some instances, several DP servers place bids and send to SP server 113 redirecting links configured to call marketer of digital content server 111, servers in content distribution network 117, a winning DP server, and/or other suitable servers. Such redirecting links are used in case their bids end up being selected by SP server 113. SP server 113 selects a winning bid and sends the link received from the winning DP server to publisher server 107, which calls CHS server 101 to classify and/or preempt adverse or detrimental third-party digital content from being served to users' compute devices. In some instances, the winning DP server can send the redirecting link directly to publisher server 107. In some other instances, such a redirecting link can be sent to publisher server 107 by another suitable server.
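The bid-selection step described above can be sketched as follows; the server labels, bid amounts, and field names are hypothetical and for illustration only:

```javascript
// Hypothetical sketch of an SP server's bid-selection step.
// Each DP server responds with a bid amount and a redirecting
// link to be used only if its bid wins.
function selectWinningBid(bids) {
  if (bids.length === 0) return null;
  // The highest bid wins the digital slot or space.
  return bids.reduce((best, bid) => (bid.amount > best.amount ? bid : best));
}

const bids = [
  { dpServer: "115A", amount: 1.2, redirectLink: "https://dp-a.example/serve" },
  { dpServer: "115B", amount: 2.5, redirectLink: "https://dp-b.example/serve" },
  { dpServer: "115C", amount: 1.9, redirectLink: "https://dp-c.example/serve" },
];
const winner = selectWinningBid(bids);
// winner.redirectLink would then be forwarded to the publisher
// server, which calls the CHS server to classify the content
// before it is served.
```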
Bus 215 collectively represents system, peripheral, and/or chipset buses that communicatively connect numerous internal devices of CHS server 101. For instance, bus 215 communicatively couples processor 209 with read-only memory 211, system memory (RAM) 203, storage device 201, and network communication interface 205. From these various memory units, processor 209 can retrieve instructions to execute and/or data to perform processes for classification, and hindrance of adverse and detrimental digital content. In some implementations, processor 209 can be a single processor, a multi-core processor, a master-slave processor arrangement or other suitable compute processing device. In some implementations, processor 209 can be any suitable processor such as, for example, a general purpose processor, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), and/or other suitable hardware device.
CHS server 101 can further include physical compute devices not shown in
In some implementations, CHS server 101 can include any combination of one or more computer-usable or computer-readable media. For example, CHS server 101 can contain or can be operatively coupled to computer-readable media including system memory or random access memory (RAM) device 203, a read-only memory (ROM) device 211, a magnetic storage device 201, and other suitable memory devices. In some implementations, CHS server 101 can include a general purpose processor and/or one or more specialized processors configured to execute and optimize the processing tasks performed by Digital Content Blocking (DCB) engine 217, Digital Content Risk Analyzer (DCRA) engine 219, Third-party Content Manager (TCM) engine 109A, and other processes implemented and executed by processor 209.
In some implementations, storage device 201 can be physically integrated to CHS server 101; in other implementations, however, storage device 201 can be a repository such as a Network-Attached Storage (NAS) device, an array of hard-disks, a storage server, or other suitable repository separate from CHS server 101. In some instances, storage device 201 includes data structures and/or databases utilized by DCB engine 217, DCRA engine 219, TCM engine 109A, and other suitable engines and/or processes executed by processor 209. For example, in some implementations storage device 201 stores third-party content tags, third-party content server identifiers, third-party content domain names, and other data associated with third-party content recognizable by CHS server 101. For another example, storage device 201 stores white lists of domain names, third-party content servers, and third-party content tags allowed to be served along with digital content provided by one or more publishers. Such white lists can be configured or customized according to publishers' policies based, for example, on a set of non-compliant parameters and/or rules to regulate deliverance of adverse and detrimental third-party digital content. In some instances, blocking-enabled tags generated by CHS server 101 can block a given third-party content on behalf of some publishers while the same third-party content may not be blocked for other publishers, depending on each publisher's policies. Storage device 201 can also include one or more instances of databases and/or data structures including rules database 425, blocking rules database 507, telemetry database 513, discussed with reference to
In some instances, DCB engine 217, DCRA engine 219, and third-party content manager engine 109A are implemented as hardware, software, and/or a combination of hardware and software. For example, storage device 201 (or other suitable memory coupled to processor 209 or CHS server 101) can include processor-executable instructions and/or data to configure processor 209 to execute tasks performed by engines 217, 219, and 109A. Accordingly, in some instances, processor 209 can retrieve processor executable instructions and data to execute multiple CHS processes from one or more of the memory devices shown in
In general, DCB engine 217 generates blocking-enabled tags associated with third-party content. DCB engine 217 includes a blocking rule database with rules to determine whether or not a blocking-enabled tag should be configured to block third-party content. Rules stored in a blocking rule database are derived from multiple processes, including policies associated with a publisher server, e.g., publisher server 107 discussed with reference
In general, DCRA engine 219 analyzes third-party content tags and/or third-party content to determine whether or not adverse and/or detrimental digital content is associated with a third-party content tag. For example, in some instances DCRA engine 219 implements scanning processes that emulate sample-based controls such as hovering over, and clicking on, a third-party content to determine whether such emulated events result in unsecure communications with third-party content servers, loading of malicious third-party content, and/or other adverse or detrimental effects on client compute devices or publisher servers. In some instances, a third-party content tag can be daisy chained to other third-party content tags; thus, redirect calls to multiple third-party content servers can result once a first third-party content tag is loaded or called at a client compute device. In such a case, daisy chained third-party content tags are identified and further analyzed to determine potential adverse or detrimental effects. Further details regarding functional and structural attributes of DCRA engine 219 are discussed throughout this specification, for example, with reference to
In some implementations, TCM engine 109A, is hosted by CHS server 101. TCM engine 109A is functionally analogous to publisher third-party content manager server 109 discussed with reference to
Bus 215 can also couple CHS server 101 to network 103 shown in
The CHS server 101 generates, based on the security compliance assessment, a blocking-enabled tag including a set of computer executable instructions corresponding to a set of blocking rules and/or compliance rules associated with publisher server 107. The set of computer executable instructions is further discussed with reference to, for example,
Many of the examples discussed below are discussed in the context of third-party content associated with advertising campaigns. The contexts in which the subject technology is discussed are not intended to suggest limitations as to the scope of use and/or functionality of the presently disclosed subject matter. For example, the subject technology can be analogously used in non-advertising contexts in which client compute devices require high levels of cyber security and avoidance of intrusive and unsolicited third-party digital content.
In some instances, DCB engine 217 wraps, at 407, selected third-party content tag 10 into a blocking-enabled tag according to a set of rules executed during the digital content risk analysis process. Accordingly, a set of HTML instructions or other suitable markup or script language of selected third-party content tag 10 is stored as an encoded JavaScript string such that the blocking-enabled tag built at 409 is eventually served to client compute devices and, accordingly, client compute devices retrieve secure and/or pre-approved third-party content.
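One non-limiting way to store a tag's markup as an inert, encoded JavaScript string, as described above, is URI-encoding; the function names below are illustrative assumptions only:

```javascript
// Sketch of carrying a tag's HTML as an encoded string:
// encodeURIComponent neutralizes the "<script>" delimiters so the
// original markup cannot execute while it is carried as data.
function encodeTagHtml(html) {
  return encodeURIComponent(html);
}

// Decoding recovers the original markup exactly, for the case in
// which the content is deemed secure and/or pre-approved.
function decodeTagHtml(encoded) {
  return decodeURIComponent(encoded);
}

const thirdPartyTag =
  '<script src="https://cdn.example/creative.js"></script>';
const carried = encodeTagHtml(thirdPartyTag);
// `carried` contains no executable "<script" substring.
```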
In some implementations, JavaScript functions such as document.write can be used to act as a regular intermediary in a daisy chain of third-party content tags; however, other suitable scripting and non-scripting functions can be interchangeably used. For instance, third-party content tags can be configured to be daisy-chained such that a first third-party content tag associated with a first third-party digital content provider includes a redirect call to a second third-party content tag associated with a second third-party content provider, and so on. Third-party digital content resulting from daisy chained configurations can be nondeterministic. Thus, in some examples, the DCB engine 217 effectively manages nondeterministic redirect calls by wrapping the outermost tag in a daisy chained configuration into a blocking-enabled tag (e.g., at 407). In this instance, the blocking-enabled tag remains persistent throughout the daisy chained configuration, ensuring secure redirect calls and/or pre-approved third-party content.
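The document.write intermediary described above can be sketched with a stand-in document object so the example runs outside a browser; the names and blocked fragments are hypothetical:

```javascript
// In a browser, the real document.write would be patched; here a
// stand-in object keeps the example self-contained and runnable.
function makeInterceptedDocument(blockedFragments) {
  const written = [];
  return {
    written,
    write(markup) {
      // Each nested tag in the daisy chain funnels through this
      // intermediary; markup matching a blocked fragment is dropped
      // before it can be rendered or executed.
      const blocked = blockedFragments.some((f) => markup.includes(f));
      if (!blocked) written.push(markup);
    },
  };
}

const doc = makeInterceptedDocument(["malware.example"]);
doc.write('<script src="https://cdn.example/a.js"></script>');     // kept
doc.write('<script src="https://malware.example/x.js"></script>'); // dropped
```

Because every redirect call in the chain passes through the same patched intermediary, the wrapper remains persistent regardless of how many nested tags the chain produces.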
JavaScript code of blocking-enabled tag 11 includes an encoded third-party content tag, blocking JavaScript code, and processor executable instructions corresponding to blocking rules associated with blocking-enabled tag 11 (also referred to herein and labeled in
As discussed above, in some implementations, publisher's third-party content manager server 109 (shown in
In some instances, when selected third-party content tag 10 is received by CHS server 101 from publisher server 107, DCRA engine 219 executes a scanning process (also referred to herein as a security compliance assessment) to classify the third-party digital content. DCRA engine 219 can reside on a separate system or as part of CHS server 101 and performs sample-based controls, at 415, on selected third-party content tag 10 by replicating or emulating various browser environments (as shown, for example, in
In some instances, issues are detected during the scanning process at 417 because, for example, selected third-party content tag 10 shows malicious actions, adverse behavior, impermissible user tracking (e.g., browser fingerprinting), or violates a publisher's ad policies. In such cases, comprehensive information about the state of selected third-party content tag 10 as loaded is extracted and retained. Such extracted and retained information includes the state of the Document Object Model (DOM), the Uniform Resource Locator (URL) of the ad creative (if any), the URL of the landing page (if any), and the URL of the “creative tag”, which is the last tag nested in the daisy-chain of third-party content tags when multiple third-party content providers are involved in the selected third-party content tag. When an issue is detected via scanning, a probability level, score, or other suitable ordinal or discrete value is calculated to indicate a measure of the risk to which users are exposed.
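By way of a non-limiting illustration, the retained scan record and risk score described above could take a shape such as the following; the field names and weight values are assumptions for illustration, not part of the disclosed implementation:

```javascript
// Illustrative shape of the record retained when the scan at 417
// detects an issue, with a toy risk score in [0, 1] driven by the
// most severe issue observed.
function buildIssueRecord(scan) {
  const weights = { malicious: 0.9, unsecure: 0.6, tracking: 0.4 };
  const risk = Math.max(0, ...scan.issues.map((i) => weights[i] ?? 0.1));
  return {
    domState: scan.domState,             // serialized DOM snapshot
    creativeUrl: scan.creativeUrl,       // URL of ad creative, if any
    landingPageUrl: scan.landingPageUrl, // URL of landing page, if any
    creativeTagUrl: scan.creativeTagUrl, // last tag in the daisy chain
    riskScore: risk,
  };
}

const record = buildIssueRecord({
  domState: "<html>...</html>",
  creativeUrl: "https://cdn.example/creative.png",
  landingPageUrl: "https://landing.example/",
  creativeTagUrl: "https://dp-b.example/last-tag.js",
  issues: ["tracking", "malicious"],
});
```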
The DOM is generated by a browser whenever a web page is loaded. The DOM is a programmatic representation of the web page that allows code in a scripting language, for example, JavaScript, to convert static content into something more dynamic. For example, scripting code can be programmed to modify the DOM representation by adding new div elements. A div element can define a new section in a web page and inside this new section a virtual object (VO) data structure representing third-party content can be added to the DOM through, for example, a document.write Application Programming Interface (API) call for the insertion of a VO representing a given third-party content. Thus, programmed scripting language can modify the DOM representation and therefore, the content and/or sub-content rendered on client compute devices can be controlled.
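The DOM modification described above can be sketched against a minimal stand-in for the browser DOM so the example is self-contained; the object shapes and function names are hypothetical:

```javascript
// Minimal stand-in for a DOM tree (a real page would use the
// browser's document object instead).
function makeDom() {
  return { children: [] };
}

// Adds a new div section to the tree and inserts a virtual object
// (VO) data structure representing third-party content inside it.
function insertThirdPartyContent(dom, vo) {
  const div = { tag: "div", children: [vo] };
  dom.children.push(div);
  return div;
}

const dom = makeDom();
insertThirdPartyContent(dom, { type: "vo", src: "https://cdn.example/ad.js" });
// In a real page this corresponds to creating a <div> and calling
// document.write (or appendChild) to place the VO inside it.
```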
In some instances, when DCRA engine 219 detects an issue, at 417, associated with a third-party content tag during the scanning process, the DCRA engine 219 identifies the third-party digital content responsible for the detected issue and can issue an alert at 419, such that a blocking rule can be built that uniquely targets the adverse third-party digital content. Accordingly, in some examples, data of the alert and the third-party digital content are sent to DCB engine 217, at 421; such data includes, for example, in the case of a malicious digital advertisement, a domain name, a creative identification, a creative URL, and other suitable data. Accordingly, DCB engine 217 generates a blocking rule, at 423, for the issue found. Additionally or alternatively, DCB engine 217 sends a confirmation request to publisher server 107 prior to executing the rule, such that publisher server 107 is notified of, and can prevent when needed, the third-party content that will be configured to be blocked. A blocking rule includes an “input” value holding a fragment of HTML (or other suitable markup language) that uniquely identifies the third-party digital content configured to be blocked. Some examples of such fragments include: a creative ID in GET function parameters of third-party content tags; a URL of third-party content assets (or a fragment of it); and a name and/or identifier of the domain generating the issue (whether because a call to the domain serves nonsecure content, because the served content is known to be malicious, because the domain is known to be malicious, or because the domain and/or content is banned by publisher server 107 for any reason unrelated to security or other suitable constraints). These types of fragments are inferred from data points identified during the scanning process or during serving of third-party digital content, for example, via a network log with initiator information.
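The “input” fragment types enumerated above can be illustrated as follows; the rule values, parameter names, and field names are hypothetical examples:

```javascript
// Each rule's "input" holds a markup fragment that uniquely
// identifies the third-party digital content to be blocked.
const blockingRules = [
  { input: "creative_id=98765" },       // creative ID in GET parameters
  { input: "cdn.bad-assets.example/" }, // fragment of an asset URL
  { input: "tracker.example" },         // offending domain name
];

// A served fragment is blocked when it contains any rule's input.
function matchesAnyRule(htmlFragment, rules) {
  return rules.some((rule) => htmlFragment.includes(rule.input));
}

const servedTag =
  '<script src="https://ads.example/serve?creative_id=98765"></script>';
```

Here `servedTag` matches the first rule through its GET parameters, while markup referencing none of the fragments would pass unblocked.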
In some instances, fragments included in a given rule can be derived from each HTTP response containing third-party active content configured in, for example, JavaScript, HTML, plugin objects like Adobe Flash, and other suitable third-party active content. Third-party active content can include an initiator URL, corresponding to a third-party content tag redirecting to another third-party content tag, forming a daisy chain from a selected third-party content tag all the way down to a third-party content tag associated with an ultimate (or last) buyer of digital space offered by a publisher of digital content and/or digital service provider.
In some other instances, a digital content identifier can include an ad creative ID and/or creative URL, or other suitable field. An ad creative is a digital object associated with a format that contains data to render an ad or other digital content at a client device. Ad creatives can vary such that digital content is configured to be displayed or rendered in various ways. For instance, an ad creative can be configured to render a banner when a web page loads; multiple panels and/or floating ads can then be launched from the banner via a cursor click, cursor hovering, or auto-initiation. As another example, an ad creative can be configured to display digital content on a transparent layer over a web page. As yet another example, an ad creative can be configured to display video content based on interactive functionality via an interactive layer rendered over the video content, such that different video segments are delivered in response to user interactions.
In some instances, DP servers such as DP server 115A, 115B, and 115C (shown in
In some implementations, DCRA engine 219 logs calls (also referred to herein as network logs) to script functions to keep track of the functions used during the serving of a third-party content chain and their respective input parameters. This log then informs the rule generation process about which script functions need to be patched to filter out creatives where issues were found. In some examples, the blocking rule generation process at 423 can be executed differently depending on whether a domain associated with third-party content is determined to be adverse (i.e., shows an issue at 417) and on whether it is trusted/known or not. In some cases, when a third-party content server is classified as trusted, a predictable way to obtain creative IDs can be used to parse URLs in the network logs using, for example, regular expressions or sub-string matching. Creative IDs can then be used as input for rules as discussed with reference to
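A hedged sketch of parsing creative IDs out of network-log URLs for a trusted server with a predictable URL layout is shown below; the query-parameter name "creative" and the URLs are illustrative assumptions:

```javascript
// Extract creative IDs from network-log URLs via a regular expression,
// relying on the trusted server's predictable query-string layout.
function extractCreativeIds(networkLog) {
  const ids = [];
  const pattern = /[?&]creative=(\d+)/; // assumed parameter name
  for (const url of networkLog) {
    const m = url.match(pattern);
    if (m) ids.push(m[1]); // captured creative ID
  }
  return ids;
}

const log = [
  "http://www.adserv1.example/adserving?creative=123",
  "http://cdn.example/lib.js", // no creative parameter; skipped
  "http://www.adserv1.example/contentserving?creative=456&fmt=banner",
];
extractCreativeIds(log); // ["123", "456"]
```

The extracted IDs would then feed the rule "input" values, as discussed above.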
In some implementations, when a third-party content server is classified as untrusted/unknown, the domain included in its associated third-party content tag or the creative URL is used as the input for a rule, as discussed with reference to
In some implementations, a mutation observer is leveraged as an alternative to patching relevant ad serving functions like document.write, to achieve a similar instrumentation. As the browser adds any active element to the DOM (such as SCRIPT, IFRAME and others), the mutation observer notifies the DCB engine 217 so that the content of such a DOM element can be matched with blocking rules. In case a match occurs, the DCB removes the matched element from the DOM, which effectively prevents further execution. An example of a process to serve third-party content using a mutation observer is discussed below with reference to
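The matching of newly added DOM elements against blocking rules can be sketched as follows; shouldBlock() and the rule shape are illustrative assumptions, and the MutationObserver wiring is guarded so the matching logic itself remains portable outside a browser:

```javascript
// Decide whether a serialized DOM element matches any blocking rule
// via sub-string matching of each rule's "input" value.
function shouldBlock(outerHTML, rules) {
  return rules.some((rule) => outerHTML.includes(rule.input));
}

// Browser-only wiring (guarded so the sketch also runs outside a browser):
if (typeof MutationObserver !== "undefined") {
  const rules = [{ input: "ads.blocked.example" }]; // illustrative rule
  new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        // Remove SCRIPT, IFRAME, and similar active elements that match.
        if (node.outerHTML && shouldBlock(node.outerHTML, rules)) {
          node.remove(); // effectively prevents further execution
        }
      }
    }
  }).observe(document.documentElement, { childList: true, subtree: true });
}
```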
In some instances, when a browser installed on a client compute device loads a webpage, publisher third-party content manager server 109 configures scripts, at 503, for the display of third-party content embedded in the requested webpage. Such scripts are controlled by publisher third-party content manager server 109 to determine whether or not third-party content should be delivered to a client device. Because the publisher server 107 has fulfilled preparatory steps to replace a third-party content tag received by, for example, SP server 113 or a DP server, with a blocking-enabled tag (as described with reference to
The blocking-enabled tag gets loaded with blocking rules maintained in the CHS server 101, specifically rules contained in blocking rules database 507. These blocking rules are obtained via scanning processes, inputs, reports received from users, and/or telemetry as further explained below.
Using instrumentation of a JavaScript API as predetermined in the blocking rules, blocking-enabled tags intercept calls to relevant JavaScript APIs to match their inputs with values predetermined by blocking rules 507, thus assessing safety and compliance of digital content at 509. If a match is found, the instrumented JavaScript API function does not run. Accordingly, the third-party content tag, and any other content tag daisy chained from such third-party content tag, is not executed, i.e., it is blocked at 511.
In some instances, when a third-party content tag and/or any other content tag daisy chained from such a third-party content tag fails the safety and compliance assessment at 509, DCR engine 219 replaces the unsafe or noncompliant tag with a trusted third-party content tag and/or passes back the third-party content tag to DCRA engine 219 as shown at 523. The Document Object Model (DOM) output of the third-party content tag is serialized and submitted to the DCB engine 217 for telemetry/logging purposes at 513. Telemetry data is submitted to the ad-scanning functionality for verification. If no match is found, the third-party content tag JavaScript calls are allowed to proceed normally and any associated resources load as usual at step 523.
In some instances, third-party content associated with advertising campaigns may be served following an auction for ad placement in a webpage hosted at a publisher server. Multiple third parties electronically bid for the placement of their ads. In these auctions, the content of the highest bid (by dollar amount) is selected to be executed. When such a content tag and/or any other content tag daisy chained from such a third-party content tag fails the security and compliance assessment at 509, DCR engine 219 replaces the unsafe or noncompliant tag with a tag from the second highest bidder in the auction. The DCR engine 219 can, for example, query a database associated with or implementing the auction to retrieve a tag paired to the second highest bidder and use such a tag as a replacement for the winner tag (i.e., the tag associated with the highest bidder) in case the winner tag fails the security and compliance assessment.
On a sample basis, telemetry is submitted at 525 to DCB engine 217 to maximize the scope of third-party content being scanned. For example, DCRA engine 219 may not be set up to scan from a specific geography, whereas telemetry is received from all geographies where actual visitors are located. If no match is found and the content is allowed to load, a “Report this content” link is added (for example, a link to report unwanted ad content 517) underneath the content, to invite visitors to voluntarily report any quality issue or violation in the loaded content. At 519, publisher server 109 receives a message including a form filled out by a user describing an issue associated with third-party content and declarative information; a DOM dump of the third-party content is retrieved and submitted, at 513, to a telemetry database via DCB engine 217 for telemetry/logging purposes. At 515, a third-party content tag scanning process identifies issues that feed into rules database 501.
In some instances, the DCB engine 217 preemptively collects information on any content for which no match is found and that is allowed to execute. Such information may include the content itself (snippets of HTML or other suitable markup or script language), URLs of third-party hosted resources, any ID identifying the third-party serving the content, etc. A buffer of the most recent third-party content information is maintained in the browser in localStorage. Websites can include processor executable instructions to store content on the browser's disk via localStorage to be retrieved at a later time. In case the user experiences an issue and submits a form at 519 to describe the violation, the content of the localStorage is retrieved from disk and submitted along with the message to feed into the telemetry for scanning 515. An example of a process for the collection of adverse or detrimental content via localStorage is discussed below with reference to
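The bounded buffer of recent third-party content information can be sketched as follows; the storage key, the buffer size of 10 entries, and the record shape are illustrative assumptions, and a plain object stands in for the browser's localStorage so the sketch is self-contained:

```javascript
// Maintain a bounded buffer of the most recent third-party content info.
const BUFFER_KEY = "recentThirdPartyContent"; // assumed key name
const BUFFER_SIZE = 10;                       // assumed buffer size

function recordContentInfo(storage, info) {
  const buffer = JSON.parse(storage[BUFFER_KEY] || "[]");
  buffer.push(info); // info: HTML snippet, resource URLs, third-party ID, etc.
  if (buffer.length > BUFFER_SIZE) buffer.shift(); // drop the oldest entry
  storage[BUFFER_KEY] = JSON.stringify(buffer);
}

function retrieveContentInfo(storage) {
  // Called when a user submits a violation report, to attach recent content.
  return JSON.parse(storage[BUFFER_KEY] || "[]");
}

const storage = {}; // stands in for window.localStorage
for (let i = 0; i < 12; i++) {
  recordContentInfo(storage, { html: "<script src='tag" + i + ".js'>" });
}
retrieveContentInfo(storage).length; // 10 — only the most recent entries remain
```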
In some instances, during the execution of a third-party content tag, it can be determined whether the third-party content tag is configured to make calls to the JavaScript function that has now been replaced. When the third-party content tag calls such a JavaScript function, for example at step 611, a filtering proxy function is called instead. In step 613, the filtering proxy function starts by inspecting its caller arguments, comparing them against the input corresponding to third-party content configured to be blocked. This comparison can be based on sub-string matching or regular expressions. If there is no match, the function calls the original function with the caller's arguments unchanged, as shown at 615, and returns the resulting value to the caller. In such a case, the third-party content is deemed safe and/or acceptable, and the execution flow is not altered.
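The filtering-proxy pattern described above, inspecting caller arguments via sub-string comparison before deciding whether to forward the call, can be sketched as follows; all names are illustrative assumptions, and a stand-in function takes the place of a native API such as document.write:

```javascript
// Wrap a sensitive function in a proxy that inspects caller arguments
// against blocked inputs before deciding whether to forward the call.
function makeFilteringProxy(originalFn, blockedInputs, onBlocked) {
  return function (...args) {
    const joined = args.join(" ");
    // Sub-string comparison against inputs configured to be blocked.
    if (blockedInputs.some((input) => joined.includes(input))) {
      onBlocked(joined); // e.g., telemetry; the original is NOT called
      return undefined;  // further execution of the serving chain stops here
    }
    // No match: forward unchanged and return the original result.
    return originalFn.apply(this, args);
  };
}

const written = [];
const fakeDocumentWrite = (markup) => { written.push(markup); }; // stand-in
const blockedLog = [];
const proxied = makeFilteringProxy(fakeDocumentWrite, ["ads.blocked.example"],
                                   (input) => blockedLog.push(input));

proxied('<script src="https://ads.safe.example/a.js"></script>');    // forwarded
proxied('<script src="https://ads.blocked.example/b.js"></script>'); // blocked
```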
In some instances, digital content blocking engine 217 improves its performance by implementing rule matching processes at multiple string levels or substrings, such that a single rule can be configured to match more than one input. For example, a rule defined as rule_1=[“adserv1.com”, “?creative=”, “123”] would match more than one input value, including input1=“<script src=′http://www.adserv1.com/adserving?creative=123′>” and any other input containing the substrings “adserv1.com”, “?creative=”, and “123” (e.g., input2=“<script src=′http://www.adserv1.com/contentserving?creative=123′>”). In some implementations, a set of rules can be instantiated in a tree data structure having an arbitrary depth. Nodes in the tree can represent substrings included in a rule. In such a case, each tree branch defined from root node to leaf node specifies a rule. An input and a rule are considered to match when, for example, each node in a tree branch is sequentially matched from root node to leaf node. The rule-input matching process can be performed by traversing the tree structure using a depth-first search, a breadth-first search, or other suitable traversal algorithm.
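The tree-structured rule matching can be sketched as follows, using the rule_1 example above; the node layout is an illustrative assumption, and matching proceeds depth-first with each node's substring required to appear after the previous node's match (sequential matching from root to leaf):

```javascript
// Match an input string against a rule tree: each root-to-leaf branch
// spells out one rule, and the branch matches when every substring along
// it is found sequentially in the input (depth-first traversal).
function matchesTree(node, input, fromIndex = 0) {
  const idx = input.indexOf(node.substring, fromIndex);
  if (idx === -1) return false; // substring not found after the previous match
  const next = idx + node.substring.length;
  if (!node.children || node.children.length === 0) return true; // full branch matched
  return node.children.some((child) => matchesTree(child, input, next)); // DFS
}

// rule_1 = ["adserv1.com", "?creative=", "123"] as a single branch:
const root = {
  substring: "adserv1.com",
  children: [{ substring: "?creative=", children: [{ substring: "123" }] }],
};

matchesTree(root, "<script src='http://www.adserv1.com/adserving?creative=123'>");      // true
matchesTree(root, "<script src='http://www.adserv1.com/contentserving?creative=123'>"); // true
matchesTree(root, "<script src='http://www.adserv1.com/adserving?creative=999'>");      // false
```

Sharing common prefixes (e.g., the domain node) across branches is what lets a single traversal evaluate many rules at once.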
In some instances when there is a match at 613, the function does not proxy to the original JavaScript function, and further execution is prevented as shown at 617. Since the third-party content tag is dependent upon successful execution of this function, the ad serving is interrupted.
Optionally or alternatively, the blocking-enabled tag may serve a verified replacement content tag 608 as shown at 619. Such a verified replacement content tag can be, for example, an advertisement tag, including pass-backs, such that the publisher third-party content manager server 109 dynamically defines which ad to serve as a replacement for the blocked third-party content. Moreover, when it is determined that there is a match during step 613, the information is also submitted to publisher third-party content manager server 109 for telemetry/logging purposes as shown at 621.
In some instances, all, some, or other suitable parameters are passed to a browser scripting agent, whose role is to set up an environment that matches the parameters, pilot the browser, mimic user interaction, and log results. In step 707, at the operating system level, the time zone of the system clock is defined according to the geography specified, and the browser plugins are enabled (desktop environments) or disabled (mobile/tablet environments). The browser is configured to use a proxy server that matches the required geography at 709, and to emulate the selected device (user agent, screen) at 711. The browser scripting agent 735 starts the browser process in step 713 and loads the tag into browser environment 739, as shown at 723. Once the tag is loaded, it mimics user interaction with the third-party content by generating mouse events such as mouse hovering 725, mouse clicks 727, and other suitable user interactions. This allows triggering of additional events, like audio or expansion of the ad, that may only happen upon user interaction with the third-party content. For example, upon a mimicked click interaction over third-party content, a webpage associated with a URL can be loaded (landed) at a client compute device. Events triggered through user interactions can result in different operations at a compute device, as shown at 729. These operations include network requests, changes to the DOM model, events generated by a given browser, landing on a webpage, and other suitable events.
The browser scripting agent generates a comprehensive log including information obtained from mimicking user interaction with third-party content at 717. Examples of information included in the produced log include:
In some implementations, events such as associated audio outputs 731 and system calls 733 are obtained via sandboxing browser 737. Accordingly, interception of system calls 733 at the operating system level is performed to assess security issues by testing network calls and responses for detection of abnormal behavior resulting from, for example, software vulnerabilities exploited to achieve remote code execution. In some implementations, it is determined whether such events are automated (e.g., forcefully delivered to client compute devices) or delivered in response to user interactions and whether these events are triggered on mouse hover, on click, or other suitable user interactions with third-party content. In some instances, such determinations can be executed by matching each event to a time when a user interaction was mimicked or performed. At 719, data points obtained during the scanning process are saved or recorded in a dataset stored for the generation of blocking rules as discussed at 423 with reference to
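Attributing a logged event to a mimicked interaction by matching timestamps, as described above, can be sketched as follows; the 500 ms attribution window and the record shapes are illustrative assumptions:

```javascript
// Classify a logged event as user-triggered (hover, click, etc.) when it
// fires within a short window after a mimicked interaction timestamp,
// or as automated (forcefully delivered) otherwise.
function classifyEvent(eventTimeMs, interactionLog, windowMs = 500) {
  for (const interaction of interactionLog) {
    const delta = eventTimeMs - interaction.timeMs;
    if (delta >= 0 && delta <= windowMs) {
      return { trigger: interaction.kind }; // attributed to this interaction
    }
  }
  return { trigger: "automated" }; // no interaction explains the event
}

const interactions = [
  { kind: "hover", timeMs: 1000 }, // mimicked mouse hover at t=1000ms
  { kind: "click", timeMs: 2500 }, // mimicked mouse click at t=2500ms
];

classifyEvent(1200, interactions); // { trigger: "hover" }
classifyEvent(2600, interactions); // { trigger: "click" }
classifyEvent(4000, interactions); // { trigger: "automated" }
```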
Examples of adverse third-party content hindered by blocking-enabled tags are provided below.
In some implementations, at each hour, a background task initiates a scanning process for each third-party tag retrieved from, for example, the domain name dailymeme.com. Each block-enabled tag received from a call to the publisher digital content in which the domain dailymeme.com is hosted is staged for analysis through scanner process 8A, at 801. Accordingly, at 803, attributes of the third-party digital content associated with block-enabled tags are determined. Based on the attributes determined at 803, conditional statement 805 is executed to verify whether or not adverse third-party content or inappropriate content violates publication policies set up at the publisher server hosting the website dailymeme.com. In some instances, when the analyzed third-party content is determined to be safe, the scanning process ends at 804. In other instances, a violation of a publication policy is determined; for example, dailymeme.com can have a policy for the exclusion or censorship of tobacco content, and thus when the third-party content is identified as an ad for tobacco products, a violation of such publication policy is determined. The full network log (HTTP requests made by the third-party content tag) is parsed, at 807. The scanning process 8A can identify, during the parsing process executed at 807, client request calls (e.g., GET commands) directed to the third-party content server hosting ads.bestinclass.net.
In the provided example, during scanning process 8A it is identified, at 809, that third-party content is configured to be served from a trusted third-party content server, in this case third-party content server ads.bestinclass.net. A third-party content server is determined to be trusted if such a third-party content server has been previously white-listed by CHS server 101. In this case, ads.bestinclass.net is included in the CHS server white-list; thus, at 811, unique identifier(s) of the third-party content are extracted from a URL associated with the third-party content, using, for example, regular expression matching; verbal expression matching defined as descriptive, chainable functions; and other suitable string pattern matching techniques. In the examples provided with
In some implementations, the extracted domain name and third-party unique identifiers (ads.bestinclass.net, 3456) are used, at 813, to configure a blocking or compliance rule and create a blocking-enabled tag. Such a blocking-enabled tag is activated by a client compute device associated with a user as discussed below at 823 with reference to third-party content serving process (e.g., see
In some instances, third-party content serving process (see
The third-party content tag can run with JavaScript code (or code defined in another scripting or programming language) included in the blocking-enabled tag, configured to control and/or regulate execution flow and third-party content such that non-adverse third-party content, and content according to publisher server policies, is retrieved and displayed on the user client compute device. As a result, the third-party content tag makes a call at 821 to, for example, an associated third-party content domain, in this case to ads.bestinclass.net, such that third-party content can be served at the user's client compute device. In some instances, a response from a third-party content server received from the call executed at 821 can be in the form of a snippet and/or code in JavaScript. Such a response can be configured to perform calls to the DOM document.write API function to insert a VO data structure indicative of third-party content on the DOM instantiated at the client compute device for the dailymeme.com website loaded at 815. An example of originally retrieved code is provided below:
In the example provided above, an image file is a type of third-party content configured to be inserted into a webpage of the dailymeme.com website and its associated DOM model. However, the subject technology is equally capable of handling other types of third-party content according to the apparatus and methods disclosed herein; additional examples of third-party content include linear media content, non-linear media content, executable in-stream content, and other suitable types of third-party content. In this case, an image is originally configured to be served to the client compute device from third-party content server urbancigarrillos.net. As part of the image insertion, a document.write call is configured to be performed by the client compute device to update its current version of the DOM model corresponding to the dailymeme.com website such that the new version reflects the insertion of the third-party content, i.e., the image. Instead, however, the executable code in the blocking-enabled tag takes control over the aforementioned execution flow and replaces and/or overwrites the snippet and/or JavaScript code received in the response from the third-party content server. Differently stated, the retrieved document.write is controlled by the blocking-enabled tag. Accordingly, at 823, the new or overwritten code executes a matching process for the received document.write input against blocking rules obtained, for example, during a scanning process (e.g., see
In some instances, a full match is established. Thus, instead of calling the native document.write function with the input from ads.bestinclass.net, the blocking-enabled tag calls a document.write function with a trusted third-party content tag, provided by, for example, the publisher server hosting the dailymeme.com or other suitable trusted server, as shown at 827. Accordingly, the third-party content classified to be in violation of dailymeme's third-party content policy is not loaded to the user's client compute device.
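The substitution at 827 can be sketched as follows; the trusted replacement tag URL, the rule shape, and the stand-in for the native document.write are illustrative assumptions, while the rule values follow the (ads.bestinclass.net, 3456) example above:

```javascript
// Wrap the native document.write so that, on a full rule match, a trusted
// replacement tag is written instead of the flagged third-party input.
function makeBlockingWrite(nativeWrite, rules, trustedTag) {
  return function (markup) {
    // A rule (list of substrings) fully matches when every substring appears.
    const blocked = rules.some((rule) => rule.every((sub) => markup.includes(sub)));
    // On a match, write the trusted tag; otherwise pass the input through.
    nativeWrite(blocked ? trustedTag : markup);
  };
}

const dom = []; // stands in for the page's DOM updates
const nativeWrite = (markup) => dom.push(markup);
const write = makeBlockingWrite(
  nativeWrite,
  [["ads.bestinclass.net", "creative=3456"]], // rule from the scanning process
  '<script src="https://trusted.example/replacement.js"></script>' // assumed trusted tag
);

write('<script src="https://ads.bestinclass.net/serve?creative=3456"></script>');
dom[0]; // the trusted replacement tag, not the flagged third-party content
```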
In some implementations, at each hour, a background task initiates a scanning process for each third-party tag retrieved from, for example, the domain name dailymeme.com. Each block-enabled tag received from a call to the publisher digital content in which the domain dailymeme.com is hosted is staged for analysis through the scanner process at 901. Accordingly, at 903, attributes of the third-party digital content associated with block-enabled tags are determined. Based on the attributes determined at 903, conditional statement 905 is executed to verify whether or not adverse third-party content or inappropriate content violates publication policies set up at the publisher server hosting the website dailymeme.com. Differently stated, a security compliance assessment is executed on the website. In some instances, when the analyzed third-party content is determined to be safe, the scanning process ends at 904. In other instances, a violation of a publication policy is determined, for example, if the third-party content is identified as redirecting the control flow towards an unsecure domain or website, such as a website or domain classified to be a fraudulent or “scam” website. The full network log (HTTP requests made by the third-party content tag) is parsed, at 907. During such a parsing process executed at 907, a client request (e.g., a GET call) made to the third-party content server hosting ads.deceivenetworks.de is identified.
In the provided example, during the scanning process (e.g., see
In some instances, the third-party content serving process (e.g., see
The third-party content tag can run with JavaScript code (or other scripting or programming code) included in the blocking-enabled tag, configured to control and/or regulate execution flow and third-party content such that non-adverse third-party content, and content according to publisher server policies, is retrieved and displayed on the user client compute device. As a result, the third-party content tag makes a call at 919 to, for example, its associated third-party content domain, in this case to ads.deceivenetworks.de, such that third-party content can be served at the user's client compute device. In some instances, a response from a third-party content server received from the call executed at 919 can be in the form of a snippet and/or code in JavaScript. Such a response can be originally configured to perform calls to the DOM document.write API function to insert a VO data structure representing third-party content on the DOM instantiated at the client compute device for the dailymeme.com website loaded at 913. An example of originally retrieved code is provided below:
In the example provided above, a script including a call to an untrusted server hosting ads.deceivenetworks.de is configured to be executed over a webpage of the dailymeme.com website and its associated DOM model. Specifically, a document.write call is configured to be performed by the client compute device to update its current version of the DOM model corresponding to the dailymeme.com website such that the new version reflects any changes or third-party content received, or caused to be received, by the call to the untrusted server hosting the domain ads.deceivenetworks.de. Instead, however, the executable code in the blocking-enabled tag takes control over the aforementioned execution flow and replaces and/or overwrites the snippet and/or JavaScript code received in the response from the third-party content server. Differently stated, the retrieved document.write is controlled by the blocking-enabled tag. Accordingly, at 921, the new or overwritten code executes a matching process for the received document.write input against blocking rules obtained, for example, during the scanning process (e.g., see
In some instances, a full match is established. Thus, instead of calling the native document.write function with the input from ads.deceivenetworks.de, the blocking-enabled tag calls a document.write function configured to receive third-party content from a trusted domain hosted by a trusted server. An example of code to replace responses from third-party content servers associated with untrusted domains is provided below:
Accordingly, the redirect call to the server hosting the domain classified to be an untrusted domain is not made by the user's client compute device.
In some implementations, at each hour, a background task initiates a scanning process for each third-party tag retrieved from, for example, the domain name dailymeme.com. Each block-enabled tag received from a call to the publisher digital content in which the domain dailymeme.com is hosted is staged for analysis through the scanner process at 1001. Accordingly, at 1003, attributes of the third-party digital content associated with block-enabled tags are determined. Based on the attributes determined at 1003, conditional statement 1005 is executed to verify whether or not adverse third-party content or inappropriate content violates publication policies set up at the publisher server hosting the website dailymeme.com. Differently stated, a security compliance assessment is executed on the website.
In some instances, when the analyzed third-party content is determined to be safe, the scanning process ends at 1004. In other instances, a violation of a publication policy is determined, for example, when third-party content is configured to identify security vulnerabilities in a browser, operating system, or other suitable software installed at a client device for the purpose of installing malware or any type of malicious software, including viruses. The full network log (HTTP requests made by the third-party content tag) is parsed, at 1007. During such a parsing process, a client request (e.g., a GET call) made to the third-party content server hosting ads.infected.com is identified.
In the provided example, during the scanning process of an example of
In some instances, third-party content serving process of an example of
In this instance, it is identified at 1019 that a third-party video player is configured to load third-party content via an HTTP request, through the execution of the script:
The above code instantiates or initializes a new XMLHttpRequest() object associated with the variable request. The request.open is originally configured to retrieve third-party content from a server hosting the ads.infected.com domain. However, at 1021, the XMLHttpRequest request.open function in this instance is controlled by the blocking-enabled video wrapper tag, and accordingly, the rule for the domain ads.infected.com is matched. Thus, the blocking-enabled video wrapper tag overwrites the request.open function, such that third-party content is instead retrieved from a domain hosted by a trusted server, in this case, from ads.supersafe.com.
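The overwriting of request.open can be sketched as follows; a minimal stand-in class replaces the browser's XMLHttpRequest so the sketch is self-contained and runnable, and only the domain names follow the example above — the function and URL details are illustrative assumptions:

```javascript
// Minimal stand-in for XMLHttpRequest so the patching sketch is portable.
class FakeXMLHttpRequest {
  open(method, url) { this.method = method; this.url = url; }
}

// Patch the prototype's open() so calls aimed at a blocked domain are
// rewritten to retrieve content from a trusted server instead.
function patchOpen(xhrClass, blockedDomain, trustedDomain) {
  const nativeOpen = xhrClass.prototype.open;
  xhrClass.prototype.open = function (method, url, ...rest) {
    // Rule match on the blocked domain: swap in the trusted server's URL.
    const safeUrl = url.includes(blockedDomain)
      ? url.replace(blockedDomain, trustedDomain)
      : url;
    return nativeOpen.call(this, method, safeUrl, ...rest);
  };
}

patchOpen(FakeXMLHttpRequest, "ads.infected.com", "ads.supersafe.com");

const request = new FakeXMLHttpRequest();
request.open("GET", "https://ads.infected.com/serve?slot=video");
request.url; // "https://ads.supersafe.com/serve?slot=video"
```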
As discussed above, the examples of blocking-enabled tags discussed with reference to examples of
Attributes associated with a third-party content tag in the database 1101 include the last time a third-party content tag was analyzed, how frequently such a third-party content tag should be analyzed, and a geography or geolocation associated with the content tag. For instance, a third-party content tag can be configured to be displayed or loaded in a specific country, e.g., the United States, or in locations composed of regions and zones. Each region defines a separate geographic area and can have multiple, isolated locations.
Profile database 1103 includes attributes of platforms or software installed on client compute devices including, for example, operating systems, browser types, mobile compute device operating systems, and any other suitable software platform configured to display and/or execute digital content. Profiles stored in database 1103 are used to test and update third-party content tags stored in tag database 1101.
In some implementations, job dispatcher 1105 can be implemented in the CHS server 101 as part of, for example, DCRA engine 219; alternatively, job dispatcher 1105 can be implemented in a separate server operatively coupled to the CHS server 101. Job dispatcher 1105 configures scheduled tasks for the update of third-party content tags stored in tag database 1101 and profiles stored in profiles database 1103. In some implementations, the job dispatcher can schedule a task to update a third-party content tag based on the frequency field of the third-party content tag. A profile for the scheduled task can be selected at random from a set of candidate profiles. For example, for a selected third-party content tag configured to load on a mobile device app, a random profile can be selected from a group of profiles including Android™, iOS™, or another suitable mobile operating system. Similar profiles can be selected to update third-party content tags configured to load on webpages, for example, Internet Explorer™, Google Chrome™, Firefox™, and other suitable browsers.
Job dispatcher 1105 can then send multiple parameters at 1107 associated with a selected third-party content tag and selected profile to update selected third-party content and profile tags through emulation and telemetry executed by light client 1111. A proxy 1109 configured with a selected geolocation 1110 creates a connection ensuring accurate geolocation.
Light client 1111 emulates the loading of selected third-party content tag 1117 in an emulated environment (e.g., emulated environment 1115) configured according to the characteristics of the selected profile 1119 and a selected device 1121. A selected device can be a compute device, for example, a laptop, desktop, mobile compute device, virtualized compute device, or any other suitable compute device. Reports and screenshots 1123 are computed with information regarding, for example, displayed digital content, redirect calls initiated upon loading the third-party content tag, whether loaded third-party digital content is malware or in any way detrimental to the performance of a client device, and other suitable telemetry metrics.
Light client 1111 then sends data gathered during the emulation to job dispatcher 1105, including reports, screenshots, updates (if any) associated with the selected profiles or third-party content tag, and other suitable data gathered during the emulation. Emulation information and updates to profiles and/or third-party content tags are stored in tag database 1101 and profiles database 1103. Accordingly, the rules used by CHS server 101 for the generation of blocking-enabled tags remain effective and reliable over time even when third-party content changes and/or running profile environments change.
In this example, at 1204, the third-party ad tag located at ads.decentavertising.com redirects or calls another third-party ad tag served by or located at ads.deceivenetworks.de, for example, via the execution of the script:
At 1205, the mutation observer sends a notification signal to the CHS server 101, via executable instructions encoded in the blocking-enabled tag, when a new active element (such as SCRIPT, IFRAME, and others) is being added to the DOM. Thereafter, at 1206, the CHS server 101 uses parameters received via executable instructions encoded in the blocking-enabled tag to determine whether a rule exists that matches, for example, the domain ads.deceivenetworks.de stored in rules database 425 discussed with reference to
Thus, when the DCB engine 217 determines a rule match, further execution of code configured to load detrimental or adverse digital content is effectively prevented via the blocking-enabled tag.
Websites can include processor executable instructions to store content on the browser's disk via localStorage to be retrieved at a later time. For instance, in
In some instances, when the user experiences an issue or violation of a publication policy resulting from the content served from the third-party ad tag server at 1311, the user can be prompted to submit a form at 1312 to describe the issue or violation. In some instances, the user can submit information regarding the experienced issue or violation of the publication policy via a form displayed at the client compute device at 1313. In addition, the latest third-party content information is retrieved at 1314 from the disk of the client compute device, via localStorage, and submitted along with the information entered by the user to the CHS server 101. Thereafter, the DCRA engine 219 scans at 1315 the received information, including the third-party ad tag content, to uncover a violation or issue with the third-party content. Thus, a blocking rule is produced at 1316, such that a blocking-enabled tag can prevent the execution or loading of the digital content found to be adverse, detrimental, or in violation of a publication policy.
This application is a continuation of U.S. patent application Ser. No. 15/867,504, filed Jan. 10, 2018 and titled “METHODS AND APPARATUS FOR HINDRANCE OF ADVERSE AND DETRIMENTAL DIGITAL CONTENT IN COMPUTER NETWORKS”; which claims priority to and benefit of U.S. Provisional Application Ser. No. 62/444,634, filed Jan. 10, 2017 and titled “METHODS AND APPARATUS FOR HINDRANCE OF ADVERSE AND DETRIMENTAL DIGITAL CONTENT IN COMPUTER NETWORKS;” both of the aforementioned priority applications being hereby incorporated by reference for all purposes.
Number | Date | Country
--- | --- | ---
62444634 | Jan 2017 | US
Relationship | Number | Date | Country
--- | --- | --- | ---
Parent | 15867504 | Jan 2018 | US
Child | 17900835 | | US