SOURCE CODE ANALYSIS ARCHIVAL ADAPTER FOR STRUCTURED DATA MINING

Information

  • Patent Application
  • Publication Number
    20070261036
  • Date Filed
    May 02, 2006
  • Date Published
    November 08, 2007
Abstract
Embodiments of the present invention address deficiencies of the art in respect to code reuse management and provide a method, system and computer program product for a source code archival adapter for structured data mining. In one embodiment of the invention, a method for adapting archived source code for structured data mining for source code reuse can be provided. The method can include parsing source code to identify individual classification elements within the source code, generating a markup language formatted set of code constructs corresponding to the classification elements, and storing the markup language formatted set of code constructs in a source code archive.
Description

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention. The embodiments illustrated herein are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown, wherein:



FIG. 1 is a schematic illustration of a software development data processing system configured for structured data mining of archived source code for source code reuse; and,



FIGS. 2A and 2B, taken together, are a flow chart illustrating a process for adapting archived source code for structured data mining for source code reuse.





DETAILED DESCRIPTION OF THE INVENTION

Embodiments of the present invention provide a method, system and computer program product for adapting archived source code for structured data mining for source code reuse. In accordance with an embodiment of the present invention, source code can be parsed and statements within the source code can be classified. The parsed and classified source code can be transformed into markup conforming to a common source code schema and archived as markup. The archived markup can be searched to locate reusable code portions and selected reusable code portions can be retrieved for code reuse. Upon retrieval, the selected reusable code portions can be restored from markup form to source code form. In this way, reusable code portions can be more readily located and reused irrespective of the heterogeneous nature of source code across different projects.


In further illustration of an embodiment of the present invention, FIG. 1 is a schematic illustration of a software development data processing system configured for structured data mining of archived source code for source code reuse. The system can include a host computing platform 110 configured to host the operation of a software development tool 120, such as the Eclipse (TM) extensible development platform distributed by the Eclipse Foundation and its members. The development tool 120 further can be configured for code reuse through the persistence and retrieval of reusable source code in the code reuse repository 160.


The code reuse repository 160 can include a source code archive 130 coupled to the software development tool 120 via source code adapter 200. The source code adapter 200 can include program code enabled to process source code 140 into markup language formatted code portions according to the markup language schema 150. Specifically, the source code adapter 200 can parse the source code 140 to identify different code constructs within the source code 140.


The different code constructs can be denoted by markup language tags provided by the schema 150 and stored into the source code archive 130. In this way, the constructs can be located subsequently within the source code archive 130 through searching markup language denoted code portions. Once located, the markup language denoted code portions can be returned to source code form for code reuse.


In further illustration of the operation of the source code adapter, FIGS. 2A and 2B, taken together, are a flow chart illustrating a process for adapting archived source code for structured data mining for source code reuse. Beginning in block 205 of FIG. 2A, source code can be parsed to identify different portions of the source code associated with different code constructs. Examples can include classification elements including data member declarations, function declarations, event listeners and the like.
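The parsing step of block 205 might be sketched as follows. This is a minimal illustration, not the implementation described in the application: it assumes Python source parsed with the standard `ast` module, and the classification labels, the `classify_constructs` helper, and the rule that an `on_` name prefix marks an event listener are all hypothetical.

```python
import ast

# Hypothetical sample input; the application does not name a source language.
SAMPLE_SOURCE = '''
class Counter:
    limit = 10

    def increment(self, step):
        return step + 1

    def on_click(self, event):
        return event
'''


def classify_constructs(source):
    """Return (classification, name, source_text) tuples for class members."""
    tree = ast.parse(source)
    found = []
    for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
        for node in cls.body:
            text = ast.get_source_segment(source, node)
            if isinstance(node, ast.FunctionDef):
                # Assumption: an "on_" prefix marks an event listener.
                if node.name.startswith("on_"):
                    kind = "event_listener"
                else:
                    kind = "function_declaration"
                found.append((kind, node.name, text))
            elif isinstance(node, ast.Assign):
                # Class-level assignments are treated as data member declarations.
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        found.append(("data_member_declaration", target.id, text))
    return found


constructs = classify_constructs(SAMPLE_SOURCE)
for kind, name, _ in constructs:
    print(kind, name)
```

A production adapter would presumably use a parser for the project's actual language; the point here is only that each construct is paired with a classification before any markup is generated.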


In block 210, a first code structure can be selected and in block 215, the code structure can be matched to a schema element within a markup language schema for the source code. In block 220, a markup language tag can be applied to the code structure and the process can repeat through decision block 225 for additional selected code structures in the source code. When the source code has been fully processed, in block 230 the markup language tagged code structures can be stored for subsequent indexing, searching and retrieval.
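The loop of blocks 210 through 230 could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The tag names in `SCHEMA_TAGS`, the `sourceArchive` root element, and the `tag_constructs` helper are assumptions made for illustration; the application does not specify the contents of the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: maps each classification to a markup tag name.
SCHEMA_TAGS = {
    "data_member_declaration": "dataMember",
    "function_declaration": "function",
    "event_listener": "eventListener",
}


def tag_constructs(constructs, schema=SCHEMA_TAGS):
    """Wrap each classified code structure in the tag its schema element defines."""
    root = ET.Element("sourceArchive")
    for kind, name, text in constructs:      # blocks 210 and 225: iterate structures
        tag = schema.get(kind)               # block 215: match to a schema element
        if tag is None:
            continue                         # no matching schema element: skip
        element = ET.SubElement(root, tag, {"name": name})  # block 220: apply tag
        element.text = text
    return root


# Hypothetical classified structures, as a parsing step might produce them.
constructs = [
    ("data_member_declaration", "limit", "limit = 10"),
    ("function_declaration", "increment",
     "def increment(step):\n    return step + 1"),
]

archive = tag_constructs(constructs)
markup = ET.tostring(archive, encoding="unicode")  # block 230: serialized for storage
print(markup)
```

Keeping the original source text as the element body is one possible design choice; it makes the later conversion back to native source a matter of reading the text out again.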


Turning now to FIG. 2B, once a set of tagged code structures has been stored in the source code archive, the structures can be indexed for searching and retrieval. In this regard, in block 235 one or more search terms can be accepted for searching the source code archive. In block 240, the search terms can be applied to the tagged code structures in order to locate code structures of interest. In decision block 245, if a code structure can be located, in block 250 the code structure can be retrieved. Thereafter, in block 255 the located code structures can be converted to native source code such that the native source code can be reused within a software development project. Finally, in block 260 the process can end.
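The search and restore steps of blocks 235 through 255 might be sketched as below. The archive markup, its tag names, and the `search_archive` and `to_source` helpers are hypothetical, and a linear scan stands in for whatever index an actual archive would maintain.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive contents; the tag names are assumed, not taken from the source.
ARCHIVE_MARKUP = """<sourceArchive>
  <dataMember name="limit">limit = 10</dataMember>
  <function name="increment">def increment(step):
    return step + 1</function>
  <eventListener name="on_click">def on_click(event):
    return event</eventListener>
</sourceArchive>"""


def search_archive(root, terms, tag=None):
    """Return tagged structures whose name or body contains every search term."""
    hits = []
    for element in root:
        if tag is not None and element.tag != tag:
            continue  # optional restriction to one kind of construct
        haystack = (element.get("name", "") + " " + (element.text or "")).lower()
        if all(term.lower() in haystack for term in terms):
            hits.append(element)
    return hits


def to_source(elements):
    """Convert located structures back to native source text (block 255)."""
    return "\n\n".join(element.text for element in elements)


root = ET.fromstring(ARCHIVE_MARKUP)
hits = search_archive(root, ["step"])                          # blocks 235 and 240
listeners = search_archive(root, ["event"], tag="eventListener")
restored = to_source(hits)                                     # blocks 250 and 255
print(restored)
```

Because each structure carries a tag, a query can be narrowed to a single construct type, as the `listeners` lookup shows; this is the practical benefit of searching markup rather than raw source text.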


Embodiments of the invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like. Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.


For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.


A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

Claims
  • 1. A method for adapting archived source code for structured data mining for source code reuse, the method comprising: parsing source code to identify individual classification elements within the source code; generating a markup language formatted set of code constructs corresponding to the classification elements; and, storing the markup language formatted set of code constructs in a source code archive.
  • 2. The method of claim 1, further comprising: indexing the markup language formatted set of code constructs in the source code archive; and, keyword searching the indexed markup language formatted set of code constructs to locate code constructs of interest.
  • 3. The method of claim 2, further comprising: retrieving the located code constructs of interest; and, transforming the retrieved code constructs of interest into source code suitable for code reuse in a software development project.
  • 4. The method of claim 1, wherein parsing source code to identify individual classification elements within the source code, comprises parsing source code to identify individual classification elements within the source code selected from the group consisting of data member declarations, function declarations and event listeners.
  • 5. The method of claim 1, wherein generating a markup language formatted set of code constructs corresponding to the classification elements, further comprises generating the markup language formatted set of code constructs specified within a schema for the markup language as corresponding to the classification elements.
  • 6. A software development data processing system comprising: a source code reuse repository comprising a source code archive; a software development tool coupled to the source code reuse repository; and, a source code adapter disposed within the source code reuse repository, the source code adapter comprising program code enabled to parse source code to identify individual classification elements within the source code, generate a markup language formatted set of code constructs corresponding to the classification elements, and store the markup language formatted set of code constructs in the source code archive.
  • 7. The system of claim 6, further comprising a schema specifying markup language tags for a set of code constructs corresponding to the classification elements.
  • 8. The system of claim 6, wherein the classification elements comprise individual classification elements selected from the group consisting of data member declarations, function declarations and event listeners.
  • 9. The system of claim 6, further comprising a keyword search query interface to the source code archive.
  • 10. A computer program product comprising a computer usable medium embodying computer usable program code for adapting archived source code for structured data mining for source code reuse, said computer program product including: computer usable program code for parsing source code to identify individual classification elements within the source code; computer usable program code for generating a markup language formatted set of code constructs corresponding to the classification elements; and, computer usable program code for storing the markup language formatted set of code constructs in a source code archive.
  • 11. The computer program product of claim 10, further comprising: computer usable program code for indexing the markup language formatted set of code constructs in the source code archive; and, computer usable program code for keyword searching the indexed markup language formatted set of code constructs to locate code constructs of interest.
  • 12. The computer program product of claim 11, further comprising: computer usable program code for retrieving the located code constructs of interest; and, computer usable program code for transforming the retrieved code constructs of interest into source code suitable for code reuse in a software development project.
  • 13. The computer program product of claim 10, wherein the computer usable program code for parsing source code to identify individual classification elements within the source code, comprises computer usable program code for parsing source code to identify individual classification elements within the source code selected from the group consisting of data member declarations, function declarations and event listeners.
  • 14. The computer program product of claim 10, wherein the computer usable program code for generating a markup language formatted set of code constructs corresponding to the classification elements, further comprises computer usable program code for generating the markup language formatted set of code constructs specified within a schema for the markup language as corresponding to the classification elements.