This invention relates to digital rights display and methods and apparatus for determining reuse rights for content to which multiple licenses and subscriptions apply. Works, or “content”, created by an author are generally subject to legal restrictions on reuse. For example, most content is protected by copyright. In order to conform to copyright law, content users often obtain content reuse licenses. A content reuse license is actually a “bundle” of rights, including rights to present the content in different formats, rights to reproduce the content in different formats, rights to produce derivative works, etc. Thus, depending on the particular reuse, a specific license for that reuse may have to be obtained.
Many organizations use content for a variety of purposes, including research and knowledge work. These organizations obtain that content through many channels, including purchasing content directly from publishers and purchasing content via subscriptions from subscription resellers. Subscriptions generally include some reuse rights that are conveyed to the subscriber. A given subscription service will generally try to offer a standard set of rights across its subscriptions, but large customers will often negotiate with the service to purchase additional rights. Thus, reuse rights may vary from subscription to subscription and the reuse rights available for a particular subscription may vary even across publications within that subscription. In addition, the reuse rights conveyed in these subscriptions often overlap with other rights and licenses purchased from license clearinghouses, or from other sources.
Many knowledge workers attempt to determine which rights are available for particular content before using that content in order to avoid infringing legitimate rights of rightsholders. However, at present, determining what reuse rights an organization has for any given publication is a time-consuming, manual procedure, generally requiring a librarian or legal counsel to review, in advance of the use, all license agreements obtained from content providers and purchased from other sources which may pertain to the content and its reuse. The difficulty of this determination means that sometimes an organization will overspend to purchase rights for which it has already paid. Alternatively, knowledge workers may run the risk of infringing a reuse right for which they believe the organization has a license but for which, in actuality, it does not.
Accordingly, organizations, such as the Copyright Clearance Center located in Danvers, Mass., have developed mechanisms that allow knowledge workers to purchase licenses during the search process. In one of these mechanisms, when the worker searching on a publisher's website has navigated to a webpage containing, for example, the content of an article in which the worker is interested, and the worker wants to determine available rights for that article, the worker can click on a link provided on the webpage by the publisher. The link contains a “Rightslink” URL of a rights advisor website and accesses the website. A URL associated with the article is then provided to the website. In response, the rights advisor website extracts all agreements stored therein that are applicable to the organization to which the worker belongs. The rights advisor website converts the URL of the article to a standard publication identifier. The publication identifier is then used to determine agreements that are applicable to that publication. These agreements are processed to determine available rights, terms and prices, which are returned online to the knowledge worker.
However, in some cases, the knowledge worker is not searching on a publisher's website, but on another website which does not include the link to the rights advisor website. For example, the worker may be searching on a website, such as copyright.com, provided by the Copyright Clearance Center. In this case, if the worker requests information on available rights, information identifying an article located by the worker, such as a digital object identifier, is used to locate and access the publisher's webpage for that article. As noted above, the publisher's webpage contains a link which allows the worker to access the rights advisor webpage and obtain available rights, terms and prices for the article. The Rightslink URL data is then extracted from the publisher's webpage and used to access the rights advisor website to obtain the rights information as disclosed above.
Generally, the Rightslink URL data extraction process involves writing a small software program that is specific to the publisher or clearinghouse whose website is being examined and which processes the website in a manner particular to that website to extract the relevant information. This, in turn, generally involves the services of a programmer and thus the overall process is expensive and may be limited by the availability of programmer resources. It would therefore be desirable if non-programmer personnel could generate the required software code without programmer involvement. However, it is imperative that limitations be placed on the code generation process so that the malfunction of any generated software code does not compromise the entire system or code that extracts data from other websites or return erroneous results to the knowledge worker.
In accordance with the principles of the present invention, the website processing code can be constructed by a non-programmer user by connecting together a chain of steps, each of which uses a pre-defined module, called a “widget”, which, in turn, performs a specific task. By selecting, configuring and arranging steps, different websites can be processed in different manners. However, since the modules are predefined, they cannot be changed and thus the overall process can be controlled to prevent problems with one program from affecting other programs.
In one embodiment, each step is defined in XML text. A sequence of steps, also defined in the XML text, forms a rule that constitutes the website processing code.
In another embodiment, the XML text defines property expressions which are provided as input parameters to the associated widget.
In still another embodiment, widgets are implemented as Java classes.
As set forth above, a pre-written collection, or toolbox, of modules called “widgets”, each of which performs a specific task, is provided by a programming staff. A non-programmer user can then specify inputs to each widget and assemble the widgets into a chain called a “linking rule” which accepts article metadata as inputs and produces a Rightslink URL as an output. The user can then designate a set of works or articles with an existing tagging service and attach the linking rule to this set of works. Subsequently, a knowledge worker searching these works can invoke the linking rule which, in turn, scrapes or otherwise constructs a link that can be used, for instance, to invoke a rights advisor web application to review available content reuse rights.
As defined in the XML data 102, each step specifies a valid widget class name. This name can refer to any widget class that implements the ExecutableWidget interface (discussed below) and exists in the widget toolbox 108. The widget will be executed during execution of the step as schematically illustrated by arrow 110. A step definition also requires a step name, which is a character string value that is used to identify the step so the step properties and result can be referenced in subsequent steps.
Further included are zero or more optional property values that are provided to the widget. These property values can include a list of input parameters including top level arguments provided by the system that invokes the linking rule. These arguments, called context variables, could include, for example, article and work metadata, such as a digital object identifier (DOI). The context variables are stored in the execution engine thread as indicated schematically by context memory 114 and provided to the execution engine 106 as indicated schematically by arrow 112.
Other property values can also include literals, the output from a previous step, and Java Expression Language (JEXL) expressions. JEXL is a well-known open-source library intended to facilitate the implementation of dynamic and scripting features in applications and frameworks. More details can be found at commons.apache.org.
Property values can be either static or dynamic. A static property remains fixed for each execution of the step during execution of a rule. A dynamic property is any valid JEXL expression and is resolved just prior to execution of the widget. This JEXL expression can contain references to context variables and/or other widget properties.
A step further defines an optional gating expression which is a JEXL expression that can access properties from any other widget that has already executed and resolves to true or false. An empty expression or any expression that resolves to true will result in the widget associated with the step executing. If the expression resolves to false, the widget will not execute. The expression is resolved at runtime so its result depends on the state of the linking rule for that invocation.
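The gating behavior described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the class GatedStep and the step name "Fallback" are hypothetical, and a java.util.function.Predicate stands in for the JEXL gating expression so that the sketch needs no external library. The step names ArticleAbstractGetter and LinkScraper are borrowed from the example rule discussed later in this description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical step model: a step name, a widget class name, and an optional gate. */
final class GatedStep {
    final String name;
    final String widgetClassName;
    // Stand-in for the JEXL gating expression; null models an empty expression.
    final Predicate<Map<String, Object>> gate;

    GatedStep(String name, String widgetClassName, Predicate<Map<String, Object>> gate) {
        this.name = name;
        this.widgetClassName = widgetClassName;
        this.gate = gate;
    }

    /** An empty gate, or a gate that resolves to true, lets the widget execute. */
    boolean shouldExecute(Map<String, Object> resultsSoFar) {
        return gate == null || gate.test(resultsSoFar);
    }
}

final class GatingDemo {
    public static void main(String[] args) {
        // Results of steps that have already executed, keyed by step name.
        Map<String, Object> results = new HashMap<>();
        results.put("ArticleAbstractGetter", "<html>...</html>");

        GatedStep ungated = new GatedStep("LinkScraper", "StringFragmentExtractor", null);
        // Hypothetical fallback step that runs only if LinkScraper produced nothing.
        GatedStep fallback = new GatedStep("Fallback", "StringFragmentExtractor",
                r -> r.get("LinkScraper") == null);

        System.out.println(ungated.shouldExecute(results));
        System.out.println(fallback.shouldExecute(results));
    }
}
```

Because the gate is evaluated against the results accumulated so far, the same step can execute in one invocation of a rule and be skipped in another.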
In one embodiment, widgets are implemented as Java classes. Any Java class can be a widget as long as it implements an ExecutableWidget interface as defined in Java.
The widget further includes a set of methods 206 which are defined as follows:
An example widget written in the Java programming language that concatenates two character strings is shown below.
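By way of illustration only, a string-concatenating widget might look like the following sketch. The method signature of the ExecutableWidget interface is an assumption made for this sketch (a single execute method that receives the resolved property values), as are the property names "first" and "second". Wrapping of the return value in a result object is omitted here.

```java
import java.util.Map;

/** Assumed shape of the interface; the single-method signature is hypothetical. */
interface ExecutableWidget {
    Object execute(Map<String, Object> properties) throws Exception;
}

/** Concatenates the values of two properties, here named "first" and "second". */
final class ConcatenateWidget implements ExecutableWidget {
    @Override
    public Object execute(Map<String, Object> properties) {
        Object first = properties.get("first");
        Object second = properties.get("second");
        if (first == null || second == null) {
            throw new IllegalArgumentException("both 'first' and 'second' must be set");
        }
        // Property values may be any Java object, so convert before joining.
        return first.toString() + second.toString();
    }
}
```

A widget of this kind holds no state between invocations, so one instance could serve many steps that differ only in their property values.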
The execution engine 106 will look on the Java classpath for all implementations of the ExecutableWidget interface when it is invoked. The result of a widget can be any Java object from the Java classpath and must be wrapped within a WidgetResult object, which is a standard Java object. The WidgetResult object carries additional data about the result. For example, it indicates whether the invocation succeeded, failed or was gated. It also contains a reference to the exception if one was raised while executing the widget.
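One possible shape for such a result wrapper is sketched below. Only the class name WidgetResult and the three outcomes (succeeded, failed, gated) come from the description above; the field names, the Status enumeration and the factory methods are assumptions made for illustration.

```java
import java.util.concurrent.Callable;

/** Sketch of a result wrapper; all member names are hypothetical. */
final class WidgetResult {
    enum Status { SUCCEEDED, FAILED, GATED }

    final Status status;
    final Object value;        // any Java object, or null when there is no value
    final Exception exception; // non-null only when status is FAILED

    private WidgetResult(Status status, Object value, Exception exception) {
        this.status = status;
        this.value = value;
        this.exception = exception;
    }

    /** Result for a step whose gating expression resolved to false. */
    static WidgetResult gated() {
        return new WidgetResult(Status.GATED, null, null);
    }

    /** Runs the widget body and wraps either its return value or the exception raised. */
    static WidgetResult of(Callable<Object> widgetBody) {
        try {
            return new WidgetResult(Status.SUCCEEDED, widgetBody.call(), null);
        } catch (Exception e) {
            return new WidgetResult(Status.FAILED, null, e);
        }
    }
}
```

Catching the exception inside the wrapper keeps a failing widget from propagating an error into the engine, which is consistent with the stated goal of isolating one rule's malfunction from the rest of the system.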
Using a simple graphical user interface, a user can test an individual step by providing its input arguments via the user interface. The system will display the widget's output on the screen. The user can also test a sequence of steps by providing the necessary input arguments. The system will display the output of those steps on the screen.
A user can create a linking rule by selecting one or more widgets from toolbox 108, defining the input arguments for each widget and defining the order of execution. Both the input arguments and the order of execution are determined by means of XML linking rule data that is schematically illustrated as data 102 in
The final result of a rule is the same as the result of its final widget. The result is always a Java object and it is always wrapped within a conventional Java WidgetSetResult object. The WidgetSetResult object contains a status field that identifies whether all of the steps successfully executed or whether there was an error during execution.
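The relationship between a rule's result and the result of its final widget can be sketched as follows. This is a simplified model under stated assumptions: steps are represented as plain functions rather than XML-defined widgets, the class LinkingRuleSketch and the Status values are hypothetical, and stopping at the first error is a design choice of the sketch rather than something stated in the description.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of the overall rule outcome; member names are hypothetical. */
final class WidgetSetResult {
    enum Status { OK, ERROR }

    final Status status;
    final Object value; // result of the final step, or null on error

    WidgetSetResult(Status status, Object value) {
        this.status = status;
        this.value = value;
    }
}

final class LinkingRuleSketch {
    // Insertion-ordered map: step name -> step body. Each body sees earlier results.
    private final Map<String, Function<Map<String, Object>, Object>> steps = new LinkedHashMap<>();

    LinkingRuleSketch step(String name, Function<Map<String, Object>, Object> body) {
        steps.put(name, body);
        return this;
    }

    WidgetSetResult run(Map<String, Object> contextVariables) {
        // Context variables and step results share one lookup table in this sketch.
        Map<String, Object> state = new HashMap<>(contextVariables);
        Object last = null;
        for (Map.Entry<String, Function<Map<String, Object>, Object>> step : steps.entrySet()) {
            try {
                last = step.getValue().apply(state);
                state.put(step.getKey(), last);
            } catch (RuntimeException e) {
                // Assumption: the first failing step ends the rule with an error status.
                return new WidgetSetResult(WidgetSetResult.Status.ERROR, null);
            }
        }
        return new WidgetSetResult(WidgetSetResult.Status.OK, last);
    }
}
```

A rule built with two calls to step and run with a context containing a "doi" entry would return the second step's value as the value of the whole rule.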
The XML data that defines an example rule 300 is illustrated in
The XML data for a more complicated rule is shown in
Then, the LinkScraper step is executed. This step uses the StringFragmentExtractor Widget which extracts a string from a search string. The stringToSeach property expression 406 is set to the result of the previous step. At runtime this result contains the HTML code that was retrieved from the doi.org website by the ArticleAbstractGetter step. The startGatheringBeforeToken property value specifies the position in the HTML code at which the StringFragmentExtractor Widget begins extracting characters. This property value is set to a string constant 408 identifying where to start extracting characters. Characters are extracted until the stopGatheringBeforeToken property value is reached. This latter property value is set to another string constant 410. Other property values 412-418 which may be used in other situations are left blank and are not used in this rule. The result of executing the above rule is a java.lang.String containing the characters that form the Rightslink URL. This URL can then be used to access the rights advisor website and retrieve the available rights.
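The token-based extraction performed by the LinkScraper step can be approximated by the sketch below. It is not the StringFragmentExtractor Widget itself: the class name, the method signature and the returning of null when a token is absent are assumptions, and the sketch adopts one reading of the property names, namely that extraction starts at the first character of the start token and ends just before the stop token.

```java
/** Hypothetical stand-in for the token-based extraction described above. */
final class StringFragmentExtractorSketch {

    /**
     * Returns the text from the start of startToken up to, but not including,
     * the next occurrence of stopToken; returns null if either token is missing.
     */
    static String extract(String stringToSearch, String startToken, String stopToken) {
        int start = stringToSearch.indexOf(startToken);
        if (start < 0) {
            return null;
        }
        // Look for the stop token only after the start token has ended.
        int stop = stringToSearch.indexOf(stopToken, start + startToken.length());
        if (stop < 0) {
            return null;
        }
        return stringToSearch.substring(start, stop);
    }
}
```

For instance, applied to HTML containing an anchor whose href attribute begins with a known link prefix, with that prefix as the start token and a double quote as the stop token, the method returns the complete link without the closing quote.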
The XML data defining another example rule is shown in
Rule 500 also uses a GetAbstractPage step 502 which, similar to the ArticleAbstractGetter step shown in
Next, the Javascript function definition and function call are extracted from the retrieved web page HTML code by two steps, the ExtractFunctionDefinition step 504 and the ExtractFunctionCall step 506. Both of these steps use the StringFragmentExtractor Widget to selectively extract character strings from the HTML code. For example, step 504 extracts characters from the result of the GetAbstractPage step 502 as indicated at 508. The startGatheringBeforeToken property value specifies the position in the HTML code at which the StringFragmentExtractor Widget begins extracting characters. This property value is set to a string constant 510 identifying where to start extracting characters. Characters are extracted until the stopGatheringBeforeToken property value is reached. This latter property value is set to another string constant 512.
Similarly, step 506 extracts characters from the web page HTML as indicated at 514. The startGatheringBeforeToken property value is set to a string constant 516 identifying where to start extracting characters. Characters are extracted until the stopGatheringBeforeToken property value is reached. This latter property value is set to another string constant 518.
At this point, both the Javascript function definition and function call have been extracted. The Javascript is then run in step 520 which uses a JavascriptRunner widget, which can run Javascript from within Java using a third-party library called “Rhino”. The step assembles the function definition, the return value and the function call using the results of the ExtractFunctionDefinition step 504 and the ExtractFunctionCall step 506 and the JEXL concatenation operator “+” and then runs the Javascript. The result is a java.lang.String containing the characters that form the Rightslink URL.
An exemplary list of Widgets which can be used to process many web pages is set forth below:
While the invention has been shown and described with reference to a number of embodiments thereof, it will be recognized by those skilled in the art that various changes in form and detail may be made herein without departing from the spirit and scope of the invention as defined by the appended claims.