1. Technical Field
The present invention relates to data processing and, in particular, to learning content management and delivery. Still more particularly, the present invention provides a method, apparatus, and program for creation of knowledge and content for a learning content management system.
2. Description of Related Art
Electronic learning (e-learning) is an umbrella term for providing computer-based instruction online over the Internet, over private distance learning networks, or in-house via an intranet. Computer-based training (CBT) uses a computer for training and instruction. CBT programs are called “courseware” and provide interactive training sessions for all disciplines. CBT courseware is typically developed with authoring languages that are designed to create interactive question/answer sessions.
A learning management system (LMS) is an information system that administers instructor-led and e-learning courses and keeps track of student progress. An LMS may be used internally by large enterprises for their employees. An LMS may be used to monitor the effectiveness of an organization's education and training.
A learning content management system (LCMS) is software that manages learning content for e-learning. An LCMS provides for the storage, maintenance, and retrieval of documents, such as hypertext markup language (HTML) and extensible markup language (XML) documents, and all related elements. For example, learning content management systems may be built on top of a native XML database and provide publishing capabilities to export content to a Web site, CD-ROM, or print.
Currently, when using an LCMS (i.e., entering content into the LCMS), customers must manually parse existing whole courses into discrete learning objects and manually associate metadata with the objects. This manual effort is intensive and reduces the immediate return on investment. Current LCMS implementations focus on drawing new content into the repository. However, current LCMS implementations do not provide a method for automating the import of legacy content and automatically deriving metadata for the legacy content.
As the e-learning industry shifts to a blended approach of knowledge content management and learning content management, legacy knowledge content of various formats will also need to be added to the LCMS. As is true for legacy learning content, this is currently a manually intensive effort.
Therefore, it would be advantageous to provide an improved mechanism for the automatic creation of knowledge and content for a learning content management system.
The present invention is a mechanism that automates the creation of learning objects from knowledge and learning content in various common formats. Importing is performed using a tool with custom parsers for common formats. The parsers split the content into learning objects, generate metadata, and relate metadata to the objects. The tool may also provide points of integration for making new parsers available through the tool. Candidate content may be presented to the user by searching the local file system. Search engine output may be used to present the candidate list.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the figures,
In the depicted example, learning content management system (LCMS) is implemented in server 104, which is connected to network 102 and provides for the storage, maintenance, and retrieval of documents, such as hypertext markup language (HTML) and extensible markup language (XML) documents, and all related elements in content database 106. For example, the learning content management system may be built on top of a native XML database and provide publishing capabilities to export content to a Web site, CD-ROM, or print. Learning content is information that is intended to be rendered in a learning experience. Knowledge content is content from a source other than educational materials. Knowledge content may be assimilated into learning content.
In the depicted example, learning management system (LMS) may be implemented in server 114. The LMS administers instructor-led and e-learning courses and keeps track of student progress. The LMS may deliver learning content from content database 106. Alternatively, the content database may be connected to server 114. The LMS may be used to monitor the effectiveness of an organization's education and training. The LMS, like the LCMS, may be implemented in a server, which may include a Web server or the like.
In addition, clients 108, 110, and 112 are connected to network 102. These clients 108, 110, and 112 may be, for example, personal computers or network computers. In the depicted example, LCMS server 104 or LMS server 114 may provide learning content, such as coursework, to clients 108-112. Clients 108, 110, and 112 are clients to server 104. Network data processing system 100 may include additional servers, clients, and other devices not shown.
Alternatively, an LCMS may include learning content delivery functionality and, similarly, an LMS may include content management functionality. However, in accordance with a preferred embodiment of the present invention, the LCMS includes a tool that automates the creation of learning objects from knowledge and learning content in various common formats. Importing is performed using a tool with custom parsers for common formats and a generic parser that splits the content into learning objects, generates metadata, and relates metadata to the objects. The tool may also provide points of integration for making new parsers available through the tool. Candidate content may be presented to the user by searching the local file system. Search engine output may be used to present the candidate list.
In the depicted example, network data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational and other computer systems that route data and messages. Of course, network data processing system 100 also may be implemented as a number of different types of networks, such as for example, an intranet, a local area network (LAN), or a wide area network (WAN).
Referring to
Peripheral component interconnect (PCI) bus bridge 214 connected to I/O bus 212 provides an interface to PCI local bus 216. A number of modems may be connected to PCI local bus 216. Typical PCI bus implementations will support four PCI expansion slots or add-in connectors. Communications links to clients 108-112 in
Additional PCI bus bridges 222 and 224 provide interfaces for additional PCI local buses 226 and 228, from which additional modems or network adapters may be supported. In this manner, data processing system 200 allows connections to multiple network computers. A memory-mapped graphics adapter 230 and hard disk 232 may also be connected to I/O bus 212 as depicted, either directly or indirectly.
Those of ordinary skill in the art will appreciate that the hardware depicted in
The data processing system depicted in
With reference now to
In the depicted example, local area network (LAN) adapter 310, SCSI host bus adapter 312, and expansion bus interface 314 are connected to PCI local bus 306 by direct component connection. In contrast, audio adapter 316, graphics adapter 318, and audio/video adapter 319 are connected to PCI local bus 306 by add-in boards inserted into expansion slots. Expansion bus interface 314 provides a connection for a keyboard and mouse adapter 320, modem 322, and additional memory 324. Small computer system interface (SCSI) host bus adapter 312 provides a connection for hard disk drive 326, tape drive 328, and CD-ROM drive 330. Typical PCI local bus implementations will support three or four PCI expansion slots or add-in connectors.
An operating system runs on processor 302 and is used to coordinate and provide control of various components within data processing system 300 in
Those of ordinary skill in the art will appreciate that the hardware in
As another example, data processing system 300 may be a stand-alone system configured to be bootable without relying on some type of network communication interfaces. As a further example, data processing system 300 may be a personal digital assistant (PDA) device, which is configured with ROM and/or flash ROM in order to provide non-volatile memory for storing operating system files and/or user-generated data.
The depicted example in
The custom parsers transcode content, such as whole courses, from given formats to well-structured documents. The well-structured documents may be in a markup language, such as HTML or XML, for example. Custom parsers 402, 404, 406 are designed with the knowledge of the particular content format. The custom parsers may include more or less intelligence based upon the complexity of the content format. As an example, a custom parser for HTML content may divide the content into learning objects, such as chapters, sections, subsections, etc., based upon header levels. As another example, a custom parser for word processing content may divide the content into learning objects based upon numbering tags.
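By way of illustration only, the following Python sketch shows one way a custom parser for HTML content might divide a document into candidate learning objects based upon header levels; the function name, regular expression, and returned structure are assumptions and are not drawn from the described embodiment.

    import re

    # Illustrative sketch: split an HTML document into candidate learning
    # objects at each heading tag (H1-H6), keeping the heading level so that
    # chapters, sections, and subsections can be reconstructed later.
    HEADING = re.compile(r"<H([1-6])[^>]*>(.*?)</H\1>", re.IGNORECASE | re.DOTALL)

    def split_on_headings(html_text):
        chunks = []
        matches = list(HEADING.finditer(html_text))
        for i, match in enumerate(matches):
            level = int(match.group(1))
            title = re.sub(r"<[^>]+>", "", match.group(2)).strip()
            start = match.end()
            end = matches[i + 1].start() if i + 1 < len(matches) else len(html_text)
            chunks.append({"level": level, "title": title,
                           "body": html_text[start:end]})
        return chunks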
The transcoded documents are provided to generic parser 410. The generic parser operates on the well-structured documents and parses the content into learning objects. Generic parser 410 automatically derives and associates learning object metadata with the learning objects. The learning object metadata may be Learning Object Metadata (LOM) or Shareable Content Object Reference Model (SCORM) conformant metadata. For more information on LOM, see Draft Standard for Learning Object Metadata, Jul. 15, 2002, IEEE 1484.12.1-2002, which is herein incorporated by reference. For more information on SCORM, see The Advanced Distributed Learning Sharable Content Object Reference Model Content Aggregation Model, Version 1.2, which is herein incorporated by reference. The metadata may include, for example, file size, multipurpose Internet mail extension (MIME) type, technical requirements, author, and date.
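A minimal Python sketch of deriving a few of these metadata values directly from the file system follows; the element names are loosely patterned after LOM categories and are illustrative only, and the author value is treated as user-supplied "seed" data.

    import mimetypes
    import os
    import time

    # Illustrative sketch: derive file size, MIME type, and date from the file
    # itself; the author cannot be derived and is seeded by the user.
    def derive_metadata(path, author="unknown"):
        mime_type, _ = mimetypes.guess_type(path)
        modified = time.localtime(os.path.getmtime(path))
        return {
            "general.title": os.path.splitext(os.path.basename(path))[0],
            "technical.size": str(os.path.getsize(path)),
            "technical.format": mime_type or "application/octet-stream",
            "lifecycle.date": time.strftime("%Y-%m-%d", modified),
            "lifecycle.entity": author,
        }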
The tool generates and stores object relationships using the metadata so that units, chapters, lessons, subsections, sections, etc. may be reconstructed when needed. For example, the tool may relate the metadata to the content objects through IMS-conformant XML Manifest metadata. For information on IMS Manifest metadata, see IMS Learning Resource Metadata XML Binding Specification, Version 1.2, which is herein incorporated by reference.
The tool then populates content database 420 with all metadata and content objects. The content database may be, for example, a standards-conformant LCMS repository for managing knowledge and learning content. The content database may also be used to deliver the learning content as coursework.
The tool automates the creation of learning objects from knowledge and learning content in various common formats. The tool may also provide points of integration for making new parsers available through the tool. Candidate content may be presented to the user by searching the local file system. Search engine output may be used to present the candidate list.
In an example custom parser implementation, a custom parser is created for instructor-led courses developed in Framemaker, so that the content and graphics can subsequently be transformed into specific Web-based training templates and a particular site structure used to deliver Web-based training.
Rather than developing a custom parser to parse the original Framemaker file, which is a possible alternative, the exemplary implementation starts with the input of the Framemaker files saved as HTML.
Each unit is a Framemaker file (.fm) that contains text, graphics, paragraph styles, and character styles. Each unit is saved as HTML in a directory by the name of the unit. The input structure is as shown, with the HTML files and graphics in each unit folder.
The custom parser for the Framemaker documents requires that the user create a load file, such as loadfile.txt, that identifies which units are to be parsed into the repository. The Framemaker table of contents could be used for this particular implementation; however, course developers may prefer to be able to specify which units are actually imported into the repository for future use.
The load file contains one line per unit to be parsed and resides at the top level of the course folder.
Example Load File:
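A hypothetical load file, assuming unit folders named Unit01_Overview, Unit02_Installation, and Unit03_Administration, contains one folder name per line:

    Unit01_Overview
    Unit02_Installation
    Unit03_Administration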
In the example implementation of the parser, the base required SCORM 1.2 metadata tags, plus those metadata tags of interest, are identified. The tags that can be auto-derived by the parsers, and those for which initial “seed” input is required from the user, are also identified.
To handle the metadata entities that could not be easily derived, a graphical user interface (GUI) allows users to enter the seed data.
In the depicted example, expanded “Import” menu 506 is presented responsive to “Import” being selected in menu 504. “Import” menu 506 includes selections for “Course” and “Course Outline.” Selection of “Course” launches a graphical user interface for selecting content type and for entering particular metadata, such as course title, source location, author, and so forth.
With reference to
GUI 600 also includes input field 604, which may be a drop-down box for selecting from known content types. The known content types may include, for example, student guides, instructor guides, student exercises (all from books), and a Web course (Web-Based Training). The custom parser is selected based upon the content type selected in input field 604.
Source location is used to identify the location of the source to be imported by the parser. Document type is used to identify the custom parser that will import the content. The custom parser reads in the load file that resides in a source location identified by the user. The custom parser then opens a hypertext markup language (HTML) file for each unit that resides in an identified unit folder. The custom parser may also perform some “clean up” tasks. These are defined below.
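A minimal Python sketch of this import flow is given below, under the assumptions that each unit folder contains an HTML file named after the unit and that a parser-selection table holds one entry per content type; all names are illustrative.

    import os

    def parse_web_course(html_text):
        # placeholder for a real custom parser; it simply returns the text here
        return html_text

    # illustrative mapping from the content type chosen in the GUI to a parser
    PARSERS = {"Web-Based Training": parse_web_course}

    def import_course(source_location, content_type):
        custom_parser = PARSERS[content_type]
        with open(os.path.join(source_location, "loadfile.txt")) as load_file:
            units = [line.strip() for line in load_file if line.strip()]
        cleaned = []
        for unit in units:
            # assumption: the saved-as-HTML file shares the unit folder's name
            unit_html = os.path.join(source_location, unit, unit + ".html")
            with open(unit_html, encoding="utf-8", errors="replace") as unit_file:
                cleaned.append(custom_parser(unit_file.read()))
        return cleaned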
Cleaning and Simplifying HTML Content
The parser includes regular expressions and other code to clean and simplify source HTML content. The parser simplifies the heading tags to basic “Hn” identifiers. This level information is used to identify reusable chunks and to determine the organizational metadata (course/object structure) for the IMS Manifest files for nested content objects. When content is reused, headings are adjusted to match their new position.
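One way to perform such a simplification with regular expressions is sketched below in Python; the substitutions shown are illustrative and do not reproduce the actual rule tables.

    import re

    # Illustrative sketch: reduce styled heading tags such as
    # <H3 CLASS="O-Objectives"> to plain <Hn> identifiers.
    def simplify_headings(html_text):
        # drop attributes from opening heading tags: <H2 CLASS="..."> -> <H2>
        html_text = re.sub(r"<H([1-6])\b[^>]*>", r"<H\1>", html_text,
                           flags=re.IGNORECASE)
        # normalize closing tags to upper case for consistency
        html_text = re.sub(r"</h([1-6])>", r"</H\1>", html_text)
        return html_text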
Over time, with multiple changes in formatting standards, the tags to identify headings can vary substantially from one course to another. Because of the application of multiple templates throughout the history of a course, for example, it is common for the more mature courses to contain a number of different tags, all representing the same heading level. The parser contains the rules to convert the headings defined in the legacy content to a simplified HTML heading. Table 1 below illustrates different representations of the same common level.
Additionally, there are situations where an item appearing in the same location is given a different heading level in different Framemaker templates. The parser contains the regular expression substitutions to homogenize and simplify HTML headings. Table 2 below illustrates examples of different levels for the same content.
Note that the substitution values will affect the transform. Consider that lessons are at the same level as the H3 Objectives heading. This means that the Objectives page will be formatted the same way a Lesson page is formatted if a substitution value of “<H3>” is selected to replace “<H3 CLASS=“O-Objectives”>” and “<H4 CLASS=“O-Objectives”>.”
In some legacy content, heading levels are chosen by their appearance within the browser, rather than as an indicator of level within a structured document. In those cases, the 3CS Parser substitutions map the existing heading level to the desired level in a structured document. Table 3 below illustrates an example mapping of heading levels.
The Tiv_SG parser eliminates extra, unnecessary HTML tags generated when doing a Save-as HTML from Framemaker files with IBM Tivoli Education character and paragraph templated styles. Table 4 illustrates examples of unnecessary HTML elements.
Certain characters cause the SCORM-conformant XML files storing metadata to be malformed. For example, if a special character appears in the first fifty words of a chunk of HTML, the special character might be used within the description metadata field, which would break the XML file.
The 3CS parser eliminates certain special characters that appear in the legacy content files. An alternative solution may be to find a suitable text-based replacement. Table 5 below illustrates examples of problematic characters that may be eliminated by a custom parser.
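A minimal Python sketch of such a clean-up step follows; because Table 5 is not reproduced here, the particular character set is an assumption.

    # Illustrative sketch: remove characters that would leave the SCORM metadata
    # XML malformed when the opening words of a chunk are copied into the
    # description field.
    PROBLEM_CHARS = "\u201c\u201d\u2018\u2019\u2013\u2014\u2022"  # curly quotes, dashes, bullets

    def clean_for_xml(text):
        # removal is the simplest fix; a text-based replacement (for example, a
        # plain quotation mark in place of a curly one) is an alternative
        for char in PROBLEM_CHARS:
            text = text.replace(char, "")
        return text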
To further add to the complexity of identifying headings and related content chunks, various HTML editors and save-as-HTML functions produce heading tags in different ways. Heading tags may span multiple lines and often contain other nested tags. The 3CS Parser may employ regular expression substitutions to simplify tags spanning multiple lines. Table 6 below illustrates examples of multi-line tags that may be simplified by a custom parser.
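A sketch of one such substitution, which collapses a heading tag and its nested contents onto a single line, is shown below; the regular expression is illustrative only.

    import re

    # Illustrative sketch: join heading tags that span multiple lines (and that
    # may wrap nested tags) onto one line so later substitutions can match them.
    def join_multiline_headings(html_text):
        def collapse(match):
            return re.sub(r"\s+", " ", match.group(0))
        return re.sub(r"<H[1-6]\b[^>]*>.*?</H[1-6]>", collapse, html_text,
                      flags=re.IGNORECASE | re.DOTALL)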
Once the Custom Parser has massaged the content into clean, well-structured HTML files for the generic parser, the generic parser handles the work of deriving the metadata that can be automatically set, as described in the metadata table below, chunking the content, and identifying nested levels of objects (units, lessons, sections, subsections, etc.).
Chunking of Content by the Generic Parser
Course developers may want to reuse units, lessons, sections, subsections, entire courses, or specific media objects, such as an image file, a video file, or a Macromedia Flash file, when creating a new course or updating an existing course.
The generic parser chunks content objects on the “Hn” tags in the HTML files provided by the custom parsers. The generic parser then generates a unique object ID for the HTML object and generates metadata conforming to SCORM 1.2. The content chunk is delimited with markers to identify the chunk as belonging to a particular object. This is not of immediate importance, but will be valuable in efforts to manage versioning and to propagate changes or notify course developers regarding changes to courses reusing the object. All paths to embedded media or links to media container objects are replaced with a file name, so that the media can be stored with less effort to adjust paths. This is helpful because the tool of the present invention is not integrated with a relational database management system, and the transforms must also adjust paths to meet the requirements of the desired output format.
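The following Python sketch illustrates the chunking, marking, and path-flattening steps described above; the comment markers and regular expression are assumptions, and the chunks are assumed to have the form produced by the heading-splitting sketch given earlier.

    import re

    def chunk_and_mark(chunks, next_id):
        # Illustrative sketch: wrap each chunk in markers naming its object ID
        # and reduce embedded media references to bare file names.
        marked = []
        for chunk in chunks:
            obj_id = next_id
            next_id += 1
            body = re.sub(r'(src|href)="[^"]*/([^"/]+)"', r'\1="\2"',
                          chunk["body"], flags=re.IGNORECASE)
            marked.append("<!-- begin object %d -->\n%s\n<!-- end object %d -->"
                          % (obj_id, body, obj_id))
        return marked, next_id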
The diagrams in
All object files and XML metadata files are then copied to a folder, named according to the Part Number of the object that was parsed, created at the top level of the repository. Another exemplary embodiment of the present invention may tie into a relational database.
For each reusable object encountered through parsing, the generic parser generates a unique object ID. These IDs are sequentially assigned, with the last used object ID being stored in a file in the configuration directory. This unique object ID is expected to eventually be used as the unique identifier for objects within a relational database management system. At this time, the object IDs are used as unique object identifiers within the XML metadata files (CatalogEntry.Catalog of the ObjID) and are used in the naming of the HTML asset XML files and XML manifests.
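A minimal sketch of such a sequential ID generator, persisting the last used value to a file in an assumed configuration directory, follows.

    import os

    ID_FILE = os.path.join("config", "last_object_id.txt")  # illustrative path

    def next_object_id():
        last = 0
        if os.path.exists(ID_FILE):
            with open(ID_FILE) as f:
                last = int(f.read().strip() or 0)
        new_id = last + 1
        os.makedirs(os.path.dirname(ID_FILE), exist_ok=True)
        with open(ID_FILE, "w") as f:
            f.write(str(new_id))
        return new_id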
Because it is slightly more difficult to rename media files linked in from parent HTML container files, the generic parser leaves multimedia objects as named in the legacy content. The associated XML file follows a naming convention of mediafile-extension.xml.
For HTML chunks, the ObjID entry is used to name the chunk of HTML. For example, when the parser locates a header and parses an HTML chunk with an automatically generated ObjID of 34323, the HTML asset file is named “34323.htm” and the associated asset XML file is named “34323-htm.xml.”
Each Asset XML file contains the structure identified in Table 7 below. Notice that many of these metadata elements are automatically derived by the generic parser.
A text file associated with each asset contains metadata information. The following are treated as assets by the parser:
In many cases, a chunk of HTML content parsed from an existing instructor-led training (ILT) course using the generic parser includes embedded media. To identify the multimedia objects as separate objects that can be searched on by keywords and reused in another course, the generic parser generates unique object IDs for each multimedia object embedded in the HTML page. In addition to using asset XML files to store asset metadata, the generic parser creates a manifest file that identifies the HTML file and the embedded media as a unified content object.
In this example, the HTML file contains three embedded graphics (only two are shown). The resources section of the manifest identifies all of the files included in the reusable object package.
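The sketch below approximates this step in Python by collecting the media files referenced from an HTML chunk and emitting a simplified resources fragment; the element layout is only a rough stand-in for the IMS/SCORM-conformant manifest that the tool actually produces.

    import re
    from xml.sax.saxutils import escape

    def build_resource_entry(obj_id, html_text):
        # Illustrative sketch: list the HTML file and its embedded media as the
        # files making up one reusable content object.
        media = re.findall(r'src="([^"]+)"', html_text, flags=re.IGNORECASE)
        lines = ['<resource identifier="RES-%d" href="%d.htm">' % (obj_id, obj_id),
                 '  <file href="%d.htm"/>' % obj_id]
        for name in media:
            lines.append('  <file href="%s"/>' % escape(name))
        lines.append("</resource>")
        return "\n".join(lines)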
Turning to
The HTML page with embedded media:
The generic parser uses the concept of metadata to filter, select, and assemble chunks of learning content (sharable content objects) into larger chunks of learning content. Ultimately, the implementation is expected to provide revision control and propagation of change notification. For this reason, it is important to be able to identify nested content objects and to maintain indication of both location and content changes within the smaller chunks of content and within parent objects.
As the Generic Parser encounters <Hn> tags, it pushes the object ID and level onto a stack. After handling the individual asset objects, the Generic Parser defines the aggregates of objects making up each level, essentially identifying the objects that would be included in any level of object that a course developer may select for reuse.
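A minimal Python sketch of this stack-based aggregation follows; the input is assumed to be a list of objects, each carrying an ID and a heading level, in document order.

    def build_aggregates(objects):
        # Illustrative sketch: map each parent object ID to the IDs of the
        # objects nested beneath it, based on heading levels.
        stack, children = [], {}
        for obj in objects:
            while stack and stack[-1]["level"] >= obj["level"]:
                stack.pop()                  # close deeper or sibling levels
            if stack:
                children.setdefault(stack[-1]["id"], []).append(obj["id"])
            stack.append(obj)
        return children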
Thereafter, the process splits the content into learning objects with a generic parser (step 1206) and generates learning object metadata (step 1208). Next, the process generates and stores object relationships using a metadata Manifest (step 1210). Then, the process populates the LCMS repository with metadata and content objects (step 1212) and ends.
Thus, the present invention solves the disadvantages of the prior art by providing a mechanism that automates the creation of learning objects from knowledge and learning content in various common formats. Importing is performed using a tool with custom parsers for common formats. The parsers split the content into learning objects, generate metadata, and relate metadata to the objects. The tool may also provide points of integration for making new parsers available through the tool. Candidate content may be presented to the user by searching the local file system. Search engine output may be used to present the candidate list.
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.