A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The invention relates generally to systems for software development, and particularly to a system and method for determining and supporting dependencies between objects or modules in a software development project.
A complex software project or software system may include multiple modules, components, or objects, each of which may relate to one another in some manner. Of particular importance are those relations in which a module (or component, object, etc.) depends on another module for some functionality. This creates a dependency by one module on the other, or between the modules. If the independent module is changed, then care must be taken that the dependent module is similarly changed, or will not fail in some way because of the change. For ease of development, modules may also be grouped and developed in layers, or as subsystems of a larger, more complex system. However, popular programming languages such as the JAVA™ programming language cannot express dependencies and module architecture relations such as layers and subsystems. These relations inherently span multiple modules (i.e. packages in the JAVA™ programming language) and hence they cannot be captured within the programming language's concepts. What is needed is a more formalized approach to managing relations and dependencies between software modules to ensure better compliance with the overall software architecture.
Disclosed herein is a system and method for module architecture specification that allows developers to maintain and control the module dependency structure of their products in a pragmatic, cost-effective way over the long lifetime of large-scale development and maintenance projects. As described herein, an embodiment of the system referred to as “IDARWIN” includes a specification language together with a set of tools that check software code, for example JAVA™ programming language source-code and class files, for adherence to a set of specifications. Unlike “module architecture diagrams” (the typical picture of layered software), which are highly ambiguous, an IDARWIN module architecture specification is precise and can be checked automatically. If the code deviates from the desired structure, developers are alerted without delay and can either revise the module architecture description or remove non-compliant code dependencies. IDARWIN is thus a cost-effective early warning system that helps to stop a particularly expensive form of code-aging.
Load-time linking in the JAVA™ programming language makes the issue of “dependency creep” particularly acute. While one can maintain layer- or subsystem-dependencies using build systems such as “make” or “ant” by carefully controlling the class path that is presented to the compiler runs for each subsystem, in practice this is done very rarely, since it is tedious and hard to adapt. As a result, a lower layer in a system can unintentionally start to depend on a higher layer due to a code change. In accordance with one embodiment, IDARWIN determines static dependencies among classes and interfaces in JAVA™ programming language source- and class-files. Next, it reads a set of module architecture specification files, which it composes into a single compound specification, and then checks each code dependency against the compound specification to find:
An important feature of the invention is the ability to present multiple specifications to IDARWIN, so as to allow both subsystem and layer “owners”, i.e. the developers of these components, to author their own specifications from their individual perspective. These alternate views will sometimes overlap and in these cases IDARWIN contains a sophisticated mechanism to detect and resolve contradictory specification statements.
In practice, the IDARWIN system in accordance with one embodiment of the invention may be used to provide a software development environment by which software developers (programmers, architects, etc.) can formulate rules that are to be followed during the development of a “software project”, i.e. a typically complex multi-module, multi-component, or multi-object software application. Particularly, these rules can be defined so as to allow, forbid, or constrain certain dependencies between software objects within the project. On a larger scale, the system can be used to allow, forbid, or constrain certain dependencies between software modules, or between the building blocks of large enterprise systems (essentially a larger scale “software project”). When the software project or enterprise system is built, for example at compile-time, the dependencies can be checked and verified against the rules. This serves as a positive check on the consistency of dependencies within the software project according to predefined architecture design/rules, and also allows a system architect or quality assurance (QA) team to identify any dependencies that may contravene those rules. A decision may then be made as to whether to, for example, change the rules, or eliminate the errant dependency.
The following section presents a simple module architecture specification with LAYER and ISOLATE statements. Subsequent sections describe the core language features, such as rules and patterns, as well as how rules are composed and checked against dependencies extracted from the code; the central notion of “rule contradiction”; extended-language statements, such as LAYER and ISOLATE; the meaning of extended-language statements and how they are desugared into core-language rules and patterns; and additional embodiments and conclusions.
Simple Example
Assume the code checked against this specification contains the type com.rsys.cluster.Cluster, which depends on com.rsys.jms.Destination. This dependency is flagged as a violation of the LAYER statement at line 22, because the Cluster type is part of the backplane pattern (line 5), whereas the Destination type is part of the jms pattern (line 9). The LAYER statement prevents such upward dependency. The precise meaning of the LAYER and ISOLATE statement is defined in later sections.
The need for some exceptions to the groundrules often arises. Generally speaking, the groundrules should apply to all subsystems of a project, but some specific pair of subsystems may need to allow for a few, well-known dependencies that violate the groundrules. For example, assume the ejb subsystem needs to refer to a type in the jms subsystem. As shown in
Module Architecture Specification
IDARWIN is based on a core language of two basic elements. The central concepts are the pattern, which is used to capture a set of class, interface and package names, and two kinds of rules. Macro-style core language extensions provide the user with convenient abstractions, such as the concept of layers and isolation. A subsystem may consist of many packages, and one might want to exclude certain subpackages. A pattern captures such configurations. The pattern language is limited (i.e. no regular expressions) for two reasons:
As illustrated in
The core language is made up of three atomic and two compound patterns as well as two kinds of rules. Atomic patterns are either fully qualified class and interface names (called “name patterns”), or wildcard patterns (called “children patterns” and “descendant patterns”). Compound patterns combine patterns by forming their union and difference. The syntax of patterns is:
Definitions 1–6 shown below capture the notion of pattern “matching” and pattern “subsumption”. An atomic name pattern a “matches” a pattern p (atomic or compound), written a ≦p p. A pattern p “subsumes” a pattern q, written q ≦p p. The meaning (written as [[e]]) of the ≦p pattern relation is given in terms of familiar set theory, Boolean logic and the string prefix ≦ relation. P denotes the set of all package name strings, and C denotes the set of all class and interface names. The variables r, s and t denote atomic and compound patterns. The variable a denotes an atomic pattern. N and M, as well as p and q, each denote a string. The ≦p relation, +, − as well as .* and .# are part of the core language alphabet.
The intuitive meaning of the ≦p relation is based on the interpretation of patterns as sets of strings (where the strings are class and interface names).
IDARWIN uses the following syntactic rewrite rules to evaluate pattern “matching” (definitions 11, 14, 17) and “pattern subsumption” (definitions 7–19). Here [[e]] reduces the relation e to terms involving the ≦ (string prefix) and ≐ (string equal) relations over strings and Boolean logic.
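As a minimal sketch of the set interpretation of patterns, the following Python fragment evaluates whether a fully qualified name matches an atomic or compound pattern. The class names Atom, Union and Diff and the function matches are hypothetical, and the reading of the wildcards is an assumption made for illustration: a trailing ".*" is taken to cover all descendants of a package, a trailing ".#" a single dotless name segment, and a bare "*" every name. It illustrates the idea only and is not the rewrite rules of definitions 7–19.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Atomic pattern: a full name, 'pkg.*', 'pkg.#', or the bare '*'."""
    text: str


@dataclass(frozen=True)
class Union:
    """Compound pattern p + q (hypothetical representation)."""
    left: object
    right: object


@dataclass(frozen=True)
class Diff:
    """Compound pattern p - q (hypothetical representation)."""
    left: object
    right: object


def matches(name: str, pattern) -> bool:
    """Return True if the name belongs to the set of names denoted by pattern."""
    if isinstance(pattern, Union):
        return matches(name, pattern.left) or matches(name, pattern.right)
    if isinstance(pattern, Diff):
        return matches(name, pattern.left) and not matches(name, pattern.right)
    text = pattern.text
    if text == "*":
        return True
    if text.endswith(".*"):
        # Assumed: descendants at any depth; keep the dot so "java." != "javax."
        return name.startswith(text[:-1])
    if text.endswith(".#"):
        # Assumed: exactly one dotless segment below the package
        prefix = text[:-1]
        rest = name[len(prefix):]
        return name.startswith(prefix) and rest != "" and "." not in rest
    return name == text


everything_but_java = Diff(Atom("*"), Atom("java.*"))
print(matches("com.foo.Bar", everything_but_java))
print(matches("java.lang.Object", everything_but_java))
```

Representing compound patterns as small immutable objects keeps the evaluation a direct recursion over union and difference, mirroring the set-theoretic reading described above.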
Rules
A dependency is defined as a pair of fully qualified class- or interface-names (atomic name patterns), e.g. (java.util.Iterator, java.lang.Object). The core language of IDARWIN contains two kinds of rules, the (ALLOW p, q) and the (PREVENT p, q) rule. Both take two patterns as arguments. The dependency match relation (≦d) between a dependency and a rule is defined below using the previously defined pattern match relation between patterns (≦p). Here a and b are atomic name patterns, and p and q are patterns (atomic or compound).
For example:
d (PREVENT java.lang.* java.*)
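The dependency match relation can be sketched in Python as follows: a dependency (a, b) matches a rule with patterns p and q when a matches p and b matches q. The names Rule, name_matches and dependency_matches are hypothetical, only name patterns, "pkg.*" patterns and the bare "*" are handled, and ".*" is assumed to cover all descendants. The dependency used in the example is illustrative and is not taken from the source.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    kind: str      # "ALLOW" or "PREVENT"
    source: str    # pattern for the depending type
    target: str    # pattern for the type depended upon


def name_matches(name: str, pattern: str) -> bool:
    """Simplified pattern match: exact name, 'pkg.*' (assumed: descendants), or '*'."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def dependency_matches(dep: Tuple[str, str], rule: Rule) -> bool:
    """(a, b) matches (kind p q) when a matches p and b matches q."""
    a, b = dep
    return name_matches(a, rule.source) and name_matches(b, rule.target)


rule = Rule("PREVENT", "java.lang.*", "java.*")
print(dependency_matches(("java.lang.String", "java.util.Iterator"), rule))
print(dependency_matches(("com.foo.Bar", "java.util.Iterator"), rule))
```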
Rule Precedence
The notion of rule precedence is central to the ability to allow users to specify many independent specifications and then compose them. During processing IDARWIN orders the rules, so that the most specific rule is checked first, and hence dominates a less specific rule. This allows for the intuitive notion of “rule X is an exception to the general rule Y”. Using the previously defined inclusion relation between patterns, it is possible to define what it means for a rule to be equal to or more specific than another rule.
For example:
r (ALLOW
r (ALLOW
A set of rules can now be sorted using the above relation. For example, for a set that contains (PREVENT java.util.* java.lang.*), (ALLOW java.util.jar.JarFile java.lang.Object) and (PREVENT * java.*), sorting leads to the following order, where the most specific rule comes first:
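One way to sketch this ordering in Python is shown below. A rule is treated as equal to or more specific than another when both of its patterns are subsumed by the corresponding patterns of the other rule. The names Rule, subsumed_by, more_specific_or_equal and sort_by_specificity are hypothetical, only name patterns, "pkg.*" patterns and "*" are handled, and ranking each rule by the number of rules at or below it is just one simple way to linearize the partial order, not necessarily the method used by the tool.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    kind: str
    source: str
    target: str


def subsumed_by(p: str, q: str) -> bool:
    """p is subsumed by q, for names, 'pkg.*' (assumed: descendants) and '*'."""
    if q == "*":
        return True
    if q.endswith(".*"):
        return p.startswith(q[:-1])
    return p == q


def more_specific_or_equal(a: Rule, b: Rule) -> bool:
    """Rule a is at or below rule b when both of its patterns are subsumed."""
    return subsumed_by(a.source, b.source) and subsumed_by(a.target, b.target)


def sort_by_specificity(rules: List[Rule]) -> List[Rule]:
    """Most specific first; incomparable rules keep their input order."""
    def rank(rule: Rule) -> int:
        # A strictly more specific rule always has a strictly smaller rank.
        return sum(more_specific_or_equal(other, rule) for other in rules)
    return sorted(rules, key=rank)


rules = [
    Rule("PREVENT", "java.util.*", "java.lang.*"),
    Rule("ALLOW", "java.util.jar.JarFile", "java.lang.Object"),
    Rule("PREVENT", "*", "java.*"),
]
for rule in sort_by_specificity(rules):
    print(rule.kind, rule.source, rule.target)
# ALLOW java.util.jar.JarFile java.lang.Object
# PREVENT java.util.* java.lang.*
# PREVENT * java.*
```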
We define the polarity π(r) of a rule r as:
Polarity is a simple way to determine the “type” of a rule (it is ALLOW or PREVENT). The notion of polarity is used to define two performance related simplifications:
There is a catch to the ordering defined above. For example if we have the following two rules: (PREVENT com.foo.* *) and (ALLOW * java.*), their order is undefined since neither is a subset of the other:
(PREVENT com.foo.* *) ≰r (ALLOW * java.*), since * ≰p java.*
(ALLOW * java.*) ≰r (PREVENT com.foo.* *), since * ≰p com.foo.*, despite java.* ≦p *
Since the two rules have different polarity (one is an ALLOW rule, while the other is a PREVENT rule), the fact that the order among them is undefined leads to an ambiguous result when a dependency is checked against them. For example, the dependency (com.foo.Bar, java.lang.Object) matches both rules, so depending on which rule is checked first, the dependency triggers an ALLOW or a PREVENT action. In the latter case the dependency will be marked as a specification violation, whereas in the former case the dependency would be accepted as specification compliant. The above rules leave it unclear whether a dependency that matches (ALLOW com.foo.* java.*) and (PREVENT com.foo.* java.*) should be accepted or rejected since both rules apply. Such ambiguity is clearly unacceptable.
A simple way to avoid the above contradiction is to rewrite the rules slightly and state, for example: (PREVENT com.foo.* (*-java.*)) and (ALLOW * java.*). Now the two rules do not conflict anymore since any dependency that matches the first rule won't match the second and vice versa, since (*-java.*) and java.* are disjoint. The notion of contradiction defined below will take this into account.
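The ambiguity and its removal by rewriting can be illustrated with a short Python sketch. It collects the kinds of all rules that a dependency matches: with the original pair of rules both an ALLOW and a PREVENT rule apply, while with the rewritten pair only one does. The function names and the tuple encoding ("-", p, q) for a difference pattern are hypothetical, and ".*" is assumed to cover all descendants.

```python
def name_matches(name, pattern):
    """Match a name against '*', 'pkg.*', an exact name, or a ('-', p, q) difference."""
    if isinstance(pattern, tuple):
        op, left, right = pattern
        if op != "-":
            raise ValueError("only difference patterns are sketched here")
        return name_matches(name, left) and not name_matches(name, right)
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def matching_kinds(dep, rules):
    """Return the set of rule kinds (ALLOW / PREVENT) whose rule matches dep."""
    source, target = dep
    return {
        kind
        for kind, p, q in rules
        if name_matches(source, p) and name_matches(target, q)
    }


dep = ("com.foo.Bar", "java.lang.Object")

ambiguous = [("PREVENT", "com.foo.*", "*"), ("ALLOW", "*", "java.*")]
rewritten = [("PREVENT", "com.foo.*", ("-", "*", "java.*")), ("ALLOW", "*", "java.*")]

print(sorted(matching_kinds(dep, ambiguous)))   # ['ALLOW', 'PREVENT']
print(sorted(matching_kinds(dep, rewritten)))   # ['ALLOW']
```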
In accordance with one embodiment of the invention, a specification is defined to be inconsistent if there is at least one pair of rules that contradict each other. Later we will refine this definition to allow for the ability to explicitly resolve contradictions by either supplementing a “resolver rule”, or by annotating contradicting rules with precedence levels.
If two rules a and b contradict each other, we write a ⊗ b, where ⊗ is defined below. Assuming that the patterns p, q, r and s are not union patterns:
If the rules contain union patterns, they must be split:
Example contradictions
There are two ways to resolve rule contradictions:
During operation, the IDARWIN system desugars a module architecture specification into a set of rules S. If the rules are consistent, we order them using the ≦r relation, to create an ordered set of rules O. The set of rules is now ready to be applied and checked against each dependency presented to the tool.
If the rules are inconsistent (i.e. there is at least one contradicting pair of rules), a few intermediate steps must be taken to arrive at O. First, the rules involved in contradictions (call this set of pairs of rules C) are determined by pairwise checking all rules in S. For example, after desugaring extended language statements into core language rules we may have the unordered set of rules S 270 as shown in
(PREVENT com.foo.*
(ALLOW *
Next, for each contradiction (a pair (p, q) of rules) in C, the set S\{p, q} is searched for a contradiction resolver. A rule r resolves a contradiction (p, q) ∈ C (two rules contradicting each other) if and only if r <r p ∧ r <r q, meaning r is more specific than both p and q. A resolved contradiction is removed from C, and the resolver rule is added to the set of resolvers R and removed from S. This process is repeated until all resolver rules are found.
In the above example, shown in
Now we have the three sets: the resolvers R, the remaining contradictions that were not resolved C−R, and the rules that do not contradict each other minus the resolvers that were moved to R, S−C−R.
The user can annotate each rule that is involved in a contradiction with a precedence level (during desugaring this precedence level is passed on). The remaining unresolved contradicting rules in C−R are ordered by precedence. If the user gave no precedence to the rules, IDARWIN exits with an error reporting that the contradictions remain unresolved.
In the above example, shown in
Now, we form the union O=R+op(C−R)+o≦r(S−C−R), where op means ordered by precedence annotation and o≦r means ordered by rule specificness.
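The composition steps above can be sketched in Python under several stated assumptions. The contradicting pairs C are supplied by the caller, since the definition of the contradiction relation is not reproduced in this text. A higher precedence value is assumed to dominate a lower one, and an unannotated rule is treated as precedence 0 when its partner is annotated; the source does not state either convention. All names (Rule, compose, and the helpers) are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    kind: str
    source: str
    target: str
    precedence: Optional[int] = None   # user annotation, if any


def subsumed_by(p: str, q: str) -> bool:
    """p is subsumed by q, for names, 'pkg.*' (assumed: descendants) and '*'."""
    if q == "*":
        return True
    if q.endswith(".*"):
        return p.startswith(q[:-1])
    return p == q


def rule_le(a: Rule, b: Rule) -> bool:
    return subsumed_by(a.source, b.source) and subsumed_by(a.target, b.target)


def strictly_more_specific(a: Rule, b: Rule) -> bool:
    return rule_le(a, b) and not rule_le(b, a)


def by_specificity(rules: List[Rule]) -> List[Rule]:
    return sorted(rules, key=lambda r: sum(rule_le(s, r) for s in rules))


def compose(S: List[Rule], C: List[Tuple[Rule, Rule]]) -> List[Rule]:
    """Build O = resolvers + precedence-ordered contested rules + remaining rules."""
    resolvers: List[Rule] = []
    unresolved: List[Tuple[Rule, Rule]] = []
    for a, b in C:
        found = [r for r in S if r not in (a, b)
                 and strictly_more_specific(r, a) and strictly_more_specific(r, b)]
        if found:
            resolvers.extend(r for r in found if r not in resolvers)
        else:
            unresolved.append((a, b))
    contested: List[Rule] = []
    for a, b in unresolved:
        if a.precedence is None and b.precedence is None:
            raise ValueError(f"unresolved contradiction: {a} vs {b}")
        contested.extend(r for r in (a, b) if r not in contested)
    # Assumed convention: higher precedence value is checked first.
    contested.sort(key=lambda r: -(r.precedence or 0))
    rest = [r for r in S if r not in resolvers and r not in contested]
    return resolvers + contested + by_specificity(rest)


prevent = Rule("PREVENT", "com.foo.*", "*")
allow = Rule("ALLOW", "*", "java.*")
resolver = Rule("ALLOW", "com.foo.*", "java.*")
for rule in compose([prevent, allow, resolver], [(prevent, allow)]):
    print(rule.kind, rule.source, rule.target)
```

In this example the third rule is more specific than both contradicting rules, so it is moved to the front as a resolver and no precedence annotation is needed.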
The toolkit's verbose mode 280, as illustrated for example in
Rule precedence is only invoked for rules that are involved in a contradiction. Therefore rule precedence annotations cannot be used to control the order of rule evaluation. In the above example, rule 3 has precedence 1, which lets it dominate rule 5.
Rule Checking Process
Assuming the system has a consistent set of rules O (all existing contradictions resolved through the above process), then a dependency d is checked against the rules following these steps.
After all dependencies have been presented to IDARWIN, all rules that were not used (i.e. the set O–U) can be printed, so that the user can update the specification and remove potentially outdated rules. This is an important tool to keep the specification in sync with code changes (e.g. a library is no longer used in the code, so all rules that refer to this library can be detected and purged).
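A minimal Python sketch of such a checking loop follows. Because the individual steps are not reproduced in this text, the sketch assumes the simplest reading consistent with the rule precedence discussion: the first rule in the ordered list O that matches a dependency decides the outcome and is recorded in the set of used rules U. Returning None for a dependency that matches no rule is also an assumption, as are all function names and the example rules.

```python
def name_matches(name, pattern):
    """Exact name, 'pkg.*' (assumed: all descendants), or the bare '*'."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def check(dep, ordered_rules, used):
    """Return the kind of the first matching rule and record that rule as used."""
    source, target = dep
    for rule in ordered_rules:
        kind, p, q = rule
        if name_matches(source, p) and name_matches(target, q):
            used.add(rule)
            return kind
    return None   # assumed: no rule applies to this dependency


# O: rules already ordered, most specific first (illustrative example)
O = [
    ("ALLOW", "java.util.jar.JarFile", "java.lang.Object"),
    ("PREVENT", "java.util.*", "java.lang.*"),
    ("PREVENT", "com.old.*", "*"),
]
U = set()

dependencies = [
    ("java.util.jar.JarFile", "java.lang.Object"),
    ("java.util.ArrayList", "java.lang.Object"),
]
violations = [d for d in dependencies if check(d, O, U) == "PREVENT"]
unused = [rule for rule in O if rule not in U]

print(violations)   # [('java.util.ArrayList', 'java.lang.Object')]
print(unused)       # [('PREVENT', 'com.old.*', '*')]
```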
In practice, the set of rules may be under the control of an individual software developer or development team. During compilation, dependencies between the software modules can be identified, and in accordance with an embodiment of the invention, checked against a set of rules. As shown in
Extended Language Constructs
In accordance with an embodiment of the invention, the IDARWIN module architecture specification language offers the user further constructs in addition to the core language rules. As shown below, p and q can be patterns or names bound to patterns. Some additional statements include:
To aid the structuring of large specifications, patterns can be bound to a name, so as to allow them to be reused. Bindings can be exported and imported to allow sharing of pattern definitions among specification files, for example:
To facilitate the IMPORT feature, specifications must be named, hence the first statement in any specification must be “DSPEC name IS . . . ”. Any top-level binding exported from specification name can be imported in another specification. In the example 310 shown in
Desugaring Into Core Language Rules and Patterns
The meaning of each statement type of the extended language is described in further detail below by describing how that statement type is desugared into core language rules and patterns.
The ALLOW and PREVENT statements are passed on to the core language unchanged, except for the resolution of bindings into patterns. For example, the lib binding in the ALLOW com.foo.* TO lib statement in the above example is resolved, and the corresponding core-language rule results.
Layer and Strictlylayer
LAYER and STRICTLYLAYER take an ordered list of patterns as arguments. The patterns capture the layers of the module-architecture stack. The first pattern is assumed to be the top of the layering stack, the last element is at the bottom. LAYER p ABOVE q ABOVE r will be rewritten into three core language PREVENT rules: PREVENT r q, PREVENT r p and PREVENT q p. These three rules prevent any “upward” pointing dependencies, ensuring that lower layers do not depend on any of the layers above them.
In the STRICTLYLAYER case, an additional 4th rule is generated: PREVENT p r. This rule ensures that layers can only depend on the layer immediately below them and cannot depend on any layer further down.
Note that LAYER and STRICTLYLAYER do not imply that there are no other pieces of software above and below the top and bottom layer-element, as the pictorial representation of layers does. To capture this, the TOP and BOTTOM statements can be used (described below), which are then composed with the LAYER and STRICTLYLAYER statements.
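The desugaring of LAYER and STRICTLYLAYER described above can be sketched in Python. The function name desugar_layer and the tuple encoding of rules are hypothetical. For three layers the output agrees with the rules listed in the text; the generalization of STRICTLYLAYER to more than three layers (one extra PREVENT rule for every non-adjacent pair) is an assumption.

```python
def desugar_layer(layers, strict=False):
    """Rewrite an ordered list of layer patterns (top first) into PREVENT rules."""
    rules = []
    for upper in range(len(layers)):
        for lower in range(upper + 1, len(layers)):
            # No upward dependencies: a lower layer must not depend on an upper one.
            rules.append(("PREVENT", layers[lower], layers[upper]))
            if strict and lower - upper > 1:
                # Strict layering: an upper layer must not skip over a layer.
                rules.append(("PREVENT", layers[upper], layers[lower]))
    return rules


print(desugar_layer(["p", "q", "r"]))
# [('PREVENT', 'q', 'p'), ('PREVENT', 'r', 'p'), ('PREVENT', 'r', 'q')]
print(desugar_layer(["p", "q", "r"], strict=True))
# [('PREVENT', 'q', 'p'), ('PREVENT', 'r', 'p'), ('PREVENT', 'p', 'r'), ('PREVENT', 'r', 'q')]
```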
Isolate
ISOLATE takes a set of patterns, where each pattern might, for example, represent a subsystem. ISOLATE p FROM q FROM r will be rewritten into six core language PREVENT rules that prevent each pair of patterns from referring to each other: PREVENT r q, PREVENT r p, PREVENT q r, PREVENT q p, PREVENT p r, PREVENT p q.
Interestingly, ISOLATE p FROM q FROM r can be implemented as a combination of LAYER statements. Write ISOLATE [p, q, r] as ISOLATE I; then desugar ISOLATE I = (desugar LAYER I) ∪ (desugar LAYER reverse I).
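This equivalence can be sketched in Python by taking the union of the LAYER desugaring of the list and of its reverse. The function names and the tuple encoding of rules are hypothetical.

```python
def desugar_layer(layers):
    """LAYER: every lower layer is prevented from depending on every upper layer."""
    return [
        ("PREVENT", layers[lower], layers[upper])
        for upper in range(len(layers))
        for lower in range(upper + 1, len(layers))
    ]


def desugar_isolate(patterns):
    """ISOLATE I = (desugar LAYER I) union (desugar LAYER reverse I)."""
    rules = []
    for rule in desugar_layer(patterns) + desugar_layer(list(reversed(patterns))):
        if rule not in rules:   # keep the result a set, in a stable order
            rules.append(rule)
    return rules


for rule in desugar_isolate(["p", "q", "r"]):
    print(*rule)
```

For three patterns this yields one PREVENT rule for every ordered pair of distinct patterns, six rules in total.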
Bottom and Top
The TOP and BOTTOM statements take a pattern and are rewritten into two core language rules. TOP p is desugared into PREVENT * p and ALLOW p p. BOTTOM p is desugared into PREVENT p * and ALLOW p p.
The purpose of TOP p is to express the fact that p is at the very top of the module hierarchy and no other module (other than p itself) should refer to p. The purpose of BOTTOM p is, analogously, to express the fact that p is at the very bottom of the module hierarchy and must itself not refer to any other module (except itself). BOTTOM is most useful to make sure a library is tagged as such.
BOTTOM p is a stronger statement than LAYER . . . ABOVE p, since the latter only prevents dependencies from p into upper-layers that are named in the LAYER statement, whereas the BOTTOM statement prevents p from depending on any code other than itself.
TOP p is not implied by LAYER p ABOVE . . . . Both are needed to capture the meaning of the familiar (but ambiguous) graphical layered diagrams.
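The TOP and BOTTOM desugaring, and the way the more specific ALLOW p p rule acts as an exception to the general PREVENT rule, can be sketched in Python. The function names and package names are hypothetical, the rules are returned most specific first so that a simple first-match check reflects rule precedence, and ".*" is assumed to cover all descendants.

```python
def name_matches(name, pattern):
    """Exact name, 'pkg.*' (assumed: all descendants), or the bare '*'."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def desugar_top(p):
    """TOP p: nothing outside p may refer to p."""
    return [("ALLOW", p, p), ("PREVENT", "*", p)]


def desugar_bottom(p):
    """BOTTOM p: p may not refer to anything outside p."""
    return [("ALLOW", p, p), ("PREVENT", p, "*")]


def verdict(dep, rules):
    """Kind of the first matching rule, or None if no rule matches."""
    source, target = dep
    for kind, p, q in rules:
        if name_matches(source, p) and name_matches(target, q):
            return kind
    return None


bottom = desugar_bottom("com.lib.*")
print(verdict(("com.lib.A", "com.lib.util.B"), bottom))   # ALLOW
print(verdict(("com.lib.A", "com.app.Main"), bottom))     # PREVENT

top = desugar_top("com.app.*")
print(verdict(("com.lib.A", "com.app.Main"), top))        # PREVENT
```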
Hide
HIDE p OUTSIDE q requires p ≦p q and is used to hide “internal” packages. For example: HIDE com.foo.internal.* OUTSIDE com.foo.* will make sure that no dependency from e.g. com.bar.* can reach into the internal package com.foo.internal.*. Only com.foo and its subpackages (including com.foo.internal) are allowed to depend on the internal package.
HIDE p OUTSIDE q is desugared into a single core language rule: PREVENT (*-q) p. For example HIDE com.foo.internal.* OUTSIDE com.foo.* results in the rule PREVENT
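Applying the stated desugaring PREVENT (*-q) p mechanically can be sketched in Python. The rule printed below is computed from that formula and is not copied from the source; the function names, the tuple encoding ("-", p, q) for a difference pattern, and the example type names are hypothetical, and ".*" is assumed to cover all descendants.

```python
def subsumed_by(p, q):
    """p is subsumed by q, for names, 'pkg.*' and '*'."""
    if q == "*":
        return True
    if q.endswith(".*"):
        return p.startswith(q[:-1])
    return p == q


def name_matches(name, pattern):
    """Match a name against '*', 'pkg.*', an exact name, or a ('-', p, q) difference."""
    if isinstance(pattern, tuple):
        _, left, right = pattern
        return name_matches(name, left) and not name_matches(name, right)
    return subsumed_by(name, pattern)


def desugar_hide(p, q):
    """HIDE p OUTSIDE q becomes PREVENT (*-q) p; p must lie inside q."""
    if not subsumed_by(p, q):
        raise ValueError("HIDE p OUTSIDE q requires p to be subsumed by q")
    return ("PREVENT", ("-", "*", q), p)


def violates(dep, rule):
    """True if the dependency matches the PREVENT rule."""
    _, source_pattern, target_pattern = rule
    return name_matches(dep[0], source_pattern) and name_matches(dep[1], target_pattern)


rule = desugar_hide("com.foo.internal.*", "com.foo.*")
print(rule)
print(violates(("com.bar.Client", "com.foo.internal.Impl"), rule))       # True
print(violates(("com.foo.api.Service", "com.foo.internal.Impl"), rule))  # False
```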
Additional Embodiments
Additional embodiments fall into two categories—improving the toolkit and extending the specification language. Once a specification is written for a subsystem, it tends to be relatively stable and to change slowly compared to the changes to the code itself. Hence one option is to separate the specification processing steps (“specification compiler” tool) from the code-dependency checking step (“compliance checker”). Specialized compliance checkers will be written—one for commandline use, and others that integrate tightly with popular IDEs, ideally reading their internal “code dependency” representations directly.
Another option is the use of generic statements. Experience shows that the HIDE p OUTSIDE q statement in particular is often used in a similar way for multiple subsystems. The particular idiom states that “internal” types in a subsystem shall not be referred to from the “outside” of the subsystem. Rather than repeating such a convention for each subsystem, the overall specification may be clearer and more maintainable if one expresses the above idiom with a single rule, such as
To instantiate generic statements, IDARWIN needs to find the package names that match the most specific pattern in the generic statement. For example, the package of class com.foo.ejb.internal.A would match the generic statement pattern (the # pattern matches the dotless ejb), leading to an instantiation of the specific statement, substituting ejb for subsystem. The most specific pattern is used to instantiate the generic statement, so that no specific but obsolete statements (i.e. statements that match no dependency) are generated.
To implement generic statements, IDARWIN needs access to the package structure of a project to instantiate specific rules. One embodiment of the current system performs this via two passes over the code-dependencies. The first pass, just after the specification-parsing phase, reads all dependencies and finds all package names that match a generic statement, instantiating additional specific statements. IDARWIN reads the code-dependencies a second time to check them against the core-language rules.
Reading the code-dependencies to find all package names in order to instantiate generic rules is at odds with the aim of separating the “specification compilation” phase from the “compliance checking” phase, because the former would now need to know the actual dependencies that will be checked in the second phase. This is only an issue for a commandline “batch” version of the system. An IDE version of IDARWIN is able to notice changes to the package structure of a project through IDE callbacks in the event of package additions, renaming and deletion, without having to see all code-dependencies.
The IDARWIN system in accordance with an embodiment of the invention provides a module architecture specification language, and a toolkit to check code for specification compliance. The specification language allows for the safe composition of independently authored specifications, by automatically detecting contradictory statements and by ordering the statements by “specificness”. As a result developers and architects can structure large specifications into “structural concerns” that each focus on overall-groundrules and subsystems. They can adapt specification fragments without the danger of unintentionally affecting the meaning of other specification sections. The specification language successfully strikes a balance between expressiveness and effort to learn. Users may pick up the language instantly, and the specification language is able to capture real world module architectures of large systems. IDARWIN is designed to be used in a fast moving development environment, where frequent code structure improvements through refactoring require a lightweight specification language. In particular, module architecture specifications can be used to:
The present invention may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
In some embodiments, the present invention includes a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. Particularly, it will be evident that while embodiments of the invention have been described herein with respect to a J2EE application server or WebLogic™ environment, embodiments and implementations may also be used with other application servers, and with other computing environments. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
This application claims the benefit of U.S. Provisional Application No. 60/408,697, filed Sep. 5, 2002 and claims the benefit of U.S. Provisional Patent Application Ser. No. 60/450,839, filed Feb. 28, 2003.
Number | Date | Country | |
---|---|---|---|
20040133875 A1 | Jul 2004 | US |
Number | Date | Country | |
---|---|---|---|
60450839 | Feb 2003 | US | |
60408697 | Sep 2002 | US |