1. Field of the Invention
The present invention relates to computer programming languages having structured data types. More particularly, the invention concerns language processing techniques for supporting the initialization of variables based on structured data types with designated initializers.
2. Description of the Prior Art
By way of background, many programming languages have structured data types whose elements are defined by an ordered group of named fields, each usually having an associated type. The “struct,” “array” and “union” variables of the C programming language are examples. When declaring a variable that has a structured data type, many languages allow an initializer within the declaration that assigns initial values for the fields. Depending on the language, the initializer may contain only a group of values in the same order as the fields of the structured data type, called positional initializers, or it may contain a group of pairs of field names and corresponding values, called designated initializers.
Using positional initializers and relying on the order of the fields of the structured data type to initialize variables can in some cases prove error-prone and burdensome. An example would be a data structure whose elements are function pointers to a set of related operations, such as a set of file system operations for a file system. If the structured data type definition used to define the operations is modified so that the order of the fields changes, or if fields are removed or added (e.g., to remove or add operations), the meaning of one or more positional initializers in variable declarations based on this data type will likely change. Unless this leads to a type mismatch between a field and the corresponding value, language processors (such as compilers, interpreters, static analysis tools, semantic patch tools, etc.) cannot give an error, warning, or other diagnostic information when processing the variable declarations. Furthermore, whether or not language processors can issue a diagnostic, changing the fields of the structured data type will still require changing every positional initializer for declared variables of that type, which may prove prohibitive for large, complex software systems. Finally, some structured data types have many fields but users of those structured data types only utilize a smaller subset of the fields. For example, the file system data structure “struct file_operations” of version 2.6.24 of the Linux® kernel defines fields for 26 function pointers. However, most file systems do not implement 26 different file system operations. For example, the data structure “ext3_file_operations” for the Ext3 file system only fills in 12-13 of these function pointer fields (depending on the kernel configuration). Positional initializers for such variables must therefore include many placeholder values (such as NULL in the C programming language), and initializing the correct field requires counting out the correct number of placeholder values.
Designated initializers address these issues. By identifying fields by name, designated initializers will continue to work correctly even if the fields of the structured data type change, as long as the fields named in the designated initializer do not change. Furthermore, changing an existing field will only affect designated initializers that use the field, not necessarily all designated initializers. Finally, designated initializers remove the need for placeholder values.
Many large, complex systems make use of designated initializers for these reasons. For example, the Linux® kernel has long used C-language designated initializers, first via a GCC (GNU Compiler Collection) extension to the C89 standard, and later via the standard C99 syntax. For the reasons outlined above, many structured variables in the Linux® kernel should, by convention, only be declared with designated initializers, never positional initializers. Such data structures include, but are not limited to, the various data structures comprised of groups of function pointers representing related operations that may expand to include more operations at a later time. However, only convention and code review processes enforce the use of designated initializers.
A method, system and computer program product are provided for enforcing the use of designated initializers in structured type initializations. The technique may include determining whether a structured data type requires designated initialization, determining whether an initialization of a structured variable declared to use the structured data type employs an improper initializer that is inconsistent with the structured data type, and performing a diagnostic action if the initialization comprises an improper initializer.
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of exemplary embodiments of the invention, as illustrated in the accompanying Drawings, in which:
Turning now to the figures, wherein like reference numerals represent like elements in all of the several views,
Turning now to
As described in the Background section above, conventional language processors cannot enforce the use of designated initializers when declaring and/or initializing structured variables based on structured data type definitions. They do not give an error, warning, or other diagnostic information when processing the variable declarations/initializations, nor do they perform other actions such as refusing to build object code. As described in more detail below, the language processor 32 is designed to provide such enforcement, thereby aiding in the software development process by eliminating sources of programming errors and reducing software maintenance burdens.
To facilitate the discussion to follow, consider the following example of a C-language structured data type definition:
This structured data type definition defines a C-language “struct” data type called “my_struct.” The fields of the “my_struct” data type include a “flags” element of type “uint32_t” (an unsigned integer data type), a “value1” element of type “char *” (a pointer to a character data type) and a “value2” element of type “uint32_t” (the same unsigned integer data type as “flags”).
The following structured variable declarations based on the “my_struct” data type initialize the variable using positional initializers:
In the declaration of the structured variable “s1,” the elements “flags,” “value1” and “value2” of the “my_struct” structured data type are each assigned a value. In the declaration of the structured variable “s2,” only the “flags” and “value1” elements are assigned values. In the declaration of the structured variable “s3,” only the “flags” and “value2” elements are assigned values. Note that a placeholder is required for the “value1” element even though it was not initialized. If the “my_struct” structured data type had numerous fields that are not always initialized, the task of declaring and initializing many “my_struct” variables in a software project could become quite burdensome.
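By way of illustration only, the three positional declarations just described might be written as shown below. This is a sketch rather than a definitive listing: the initial values, the sample buffer “text” and the helper “check_positional” are assumptions introduced here, and the “char *” field is spelled “value1.”

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct my_struct {
    uint32_t flags;
    char    *value1;
    uint32_t value2;
};

static char text[] = "hello";  /* arbitrary sample data (assumption) */

/* Values are matched to fields purely by their position in the list. */
struct my_struct s1 = { 1, text, 42 };  /* flags, value1, value2 */
struct my_struct s2 = { 1, text };      /* value2 omitted, so it is zero */
struct my_struct s3 = { 1, NULL, 42 };  /* NULL is a placeholder for value1 */

/* Hypothetical helper stating what each declaration produces. */
void check_positional(void)
{
    assert(s1.flags == 1 && s1.value1 == text && s1.value2 == 42);
    assert(s2.value1 == text && s2.value2 == 0);
    assert(s3.value1 == NULL && s3.value2 == 42);
}
```

If a field were later inserted ahead of “value2” in the type definition, the value 42 in “s1” and “s3” would silently initialize a different field.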
The following structured variable declarations based on the “my_struct” data type initialize the variable using designated initializers:
In the declaration of the structured variable “s1,” the elements “flags,” “value1” and “value2” of the “my_struct” structured data type are each assigned a value. In the declaration of the structured variable “s2,” only the “flags” and “value1” elements are assigned values. In the declaration of the structured variable “s3,” only the “flags” and “value2” elements are assigned values. Note that a placeholder is not required for the “value1” element even though it was not initialized. In this case, if the “my_struct” structured data type had numerous fields that are not always initialized, the task of declaring and initializing many “my_struct” variables in a software project would not be unduly complex because unused fields can simply be ignored. There is no need for placeholders.
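The designated form of the same three declarations might look as follows. Again this is only a sketch under stated assumptions: the initial values, the buffer “text” and the helper “check_designated” are introduced here for illustration and the “char *” field is spelled “value1.”

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct my_struct {
    uint32_t flags;
    char    *value1;
    uint32_t value2;
};

static char text[] = "hello";  /* arbitrary sample data (assumption) */

/* Each value is bound to a field by name, so order does not matter. */
struct my_struct s1 = { .flags = 1, .value1 = text, .value2 = 42 };
struct my_struct s2 = { .flags = 1, .value1 = text };  /* value2 is zero */
struct my_struct s3 = { .flags = 1, .value2 = 42 };    /* value1 is NULL */

/* Hypothetical helper stating what each declaration produces. */
void check_designated(void)
{
    assert(s1.flags == 1 && s1.value1 == text && s1.value2 == 42);
    assert(s2.value1 == text && s2.value2 == 0);
    assert(s3.value1 == NULL && s3.value2 == 42);
}
```

Fields that are not named receive zero or null values under the C99 rules, which is why no placeholder is needed in “s3.”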
Applicant submits that it would be beneficial for language processors to enforce the use of designated initializers by issuing appropriate diagnostics. This would allow earlier detection of problems or potential problems stemming from the improper use of positional initializers. To that end, the language processor 32 is adapted to evaluate structured variable initializations and intervene if an initialization uses an improper initializer (e.g., a positional initializer) when the structured data type requires designated initializers.
One way that this could be done is to maintain an information resource such as a list, a database or other information entity that specifies structured data types requiring designated initializers. The language processor 32 could then consult the information resource as part of its operations. As the language processor 32 parses structured variable initializations, it will thus know which initializations should be inspected to ensure compliance with the designated initialization requirement. The information resource could be maintained as part of the language processor itself, or externally thereof, such as in association with a particular software project or the source code 34 itself. In
Another technique that may be used to identify structured data types requiring designated initialization is to extend the syntax for declaring a structured type to include a technique for marking the type as requiring designated initialization, such as via a type attribute or other indicator. The language processor 32 could thus determine from the source code itself which structured data types require designated initialization. The following definition for a structured data type called “my_struct_2” illustrates how such an attribute could be used:
Note that the “__attribute__” keyword is supported by existing implementations of the C programming language to specify attributes of data types in data type definitions, including but not limited to “struct” and “union” structured data types. However, the currently-allowed types of “__attribute__” do not include the “designated_init” attribute specified above. One way that the language processor 32 could be implemented is as a modified C-language compiler that supports the “designated_init” attribute. Existing interpreters for interpreted languages and static analyzers could be similarly modified to support such an attribute. When the attribute is detected in the definition of a structured data type, the language processor would inspect subsequent variable initializations based on the structured type for compliance with the designated initializer requirement.
Following is a set of example initializations of the “my_struct_2” structured data type. The declarations of the first two structured variables “s1” and “s2” use positional initializers. As indicated by the accompanying comments, these initializations would cause the language processor 32 to generate a diagnostic. The declarations of the second two structured variables “s3” and “s4” correctly use designated initializers. As indicated by the accompanying comments, these would not generate warnings.
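A marked type definition and the four initializations just described can be sketched as follows. Several details are assumptions made for illustration: the field types, the initial values, the “DESIGNATED_INIT” wrapper macro (which applies the attribute only where the compiler reports support for it and otherwise expands to nothing), the “SHOW_POSITIONAL” guard that keeps the offending declarations out of a default build, and the helper “check_marked.”

```c
#include <assert.h>

/* Apply the attribute only if the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(designated_init)
#    define DESIGNATED_INIT __attribute__((designated_init))
#  endif
#endif
#ifndef DESIGNATED_INIT
#  define DESIGNATED_INIT  /* unsupported: marker expands to nothing */
#endif

struct my_struct_2 {
    int field1;  /* field types are assumed; the text names only the fields */
    int field2;
} DESIGNATED_INIT;

#ifdef SHOW_POSITIONAL  /* hypothetical guard; define it to see diagnostics */
struct my_struct_2 s1 = { 1, 2 };  /* positional: diagnostic expected */
struct my_struct_2 s2 = { 1 };     /* positional: diagnostic expected */
#endif

struct my_struct_2 s3 = { .field1 = 1, .field2 = 2 };  /* no diagnostic */
struct my_struct_2 s4 = { .field2 = 2 };  /* no diagnostic; field1 is zero */

/* Hypothetical helper stating what the accepted declarations produce. */
void check_marked(void)
{
    assert(s3.field1 == 1 && s3.field2 == 2);
    assert(s4.field1 == 0 && s4.field2 == 2);
}
```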
Referring back to the definition for the “my_struct_2” structured data type, it will be seen that the “designated_init” attribute applies to the data type as a whole, and thus governs each of its elements (i.e., “field1” and “field2”). An alternative technique would be to allow such an attribute to be separately specified for individual fields. This technique could be used if a structured data type has a stable portion amenable to positional initialization and a varying portion that requires designated initialization. Consider the following example based on the “my_struct” structured data type introduced above. In the below-illustrated type definition, language processor 32 is advised that the first element can be initialized in any suitable fashion but the second two elements require designated initialization:
The following initialization of a structured variable “s1” based on the “my_struct” data type would not generate any warnings because designated initializers are used for all of the elements:
The following initialization of a structured variable “s2” based on the “my_struct” data type would also not generate any warnings because a designated initializer is used for the “value1” element:
In contrast, the following initialization of a structured variable “s3” based on the “my_struct” data type would generate a warning because a positional initializer is used for the “value1” element:
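The per-field variant and the three initializations “s1,” “s2” and “s3” might be sketched as shown below. The marker “FIELD_DESIGNATED_INIT” is hypothetical and is defined as empty, because this sketch does not assume that any compiler accepts the attribute on individual fields; the initial values, the buffer “text” and the helper “check_per_field” are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-field marker; empty so the fragment compiles anywhere. */
#define FIELD_DESIGNATED_INIT

struct my_struct {
    uint32_t flags;                         /* stable: any initializer style */
    char    *value1 FIELD_DESIGNATED_INIT;  /* must be named when initialized */
    uint32_t value2 FIELD_DESIGNATED_INIT;  /* must be named when initialized */
};

static char text[] = "hello";  /* arbitrary sample data (assumption) */

/* All fields designated: no warning. */
struct my_struct s1 = { .flags = 1, .value1 = text, .value2 = 42 };

/* flags is positional (allowed), value1 is designated: no warning. */
struct my_struct s2 = { 1, .value1 = text };

/* value1 is positional: an enforcing processor would warn here. */
struct my_struct s3 = { 1, text };

/* Hypothetical helper; all three forms yield the same flags and value1. */
void check_per_field(void)
{
    assert(s1.flags == 1 && s1.value1 == text && s1.value2 == 42);
    assert(s2.flags == 1 && s2.value1 == text && s2.value2 == 0);
    assert(s3.flags == 1 && s3.value1 == text && s3.value2 == 0);
}
```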
Regardless of how the language processor 32 determines which structured data types require designated initialization, the language processor can issue a diagnostic (such as a warning, an error, or even a refusal to build object code) when it encounters a positional initializer for a variable whose structured data type requires designated initialization.
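The check itself can be sketched in a few lines. The following is a simplified illustration and not the implementation of the language processor 32: the types “type_info” and “init_item,” the enumeration “init_kind” and the function “count_improper_initializers” are all hypothetical, and a real processor would work on its own parse tree and emit a diagnostic for each violation rather than return a count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum init_kind { INIT_POSITIONAL, INIT_DESIGNATED };

/* Hypothetical, simplified record for a structured data type. */
struct type_info {
    const char *name;
    bool requires_designated_init;  /* set from an attribute or a list */
};

/* Hypothetical record for one element of a parsed initializer list. */
struct init_item {
    enum init_kind kind;
};

/* Returns the number of improper (positional) initializers found. */
size_t count_improper_initializers(const struct type_info *type,
                                   const struct init_item *items,
                                   size_t n_items)
{
    assert(type != NULL && (items != NULL || n_items == 0));

    if (!type->requires_designated_init)
        return 0;  /* type is not marked: nothing to enforce */

    size_t improper = 0;
    for (size_t i = 0; i < n_items; i++) {
        if (items[i].kind == INIT_POSITIONAL)
            improper++;  /* a diagnostic would be issued here */
    }
    return improper;
}
```

A caller could treat a nonzero result as a warning, an error, or a reason to refuse to build object code, depending on how the processor is configured.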
Note that struct-type data structures are not the only structured data types that benefit from designated initialization compliance. The disclosed technique could also be used for other structured data types, such as unions and arrays.
An example of using the “designated_init” attribute for a union is shown below:
Structured variables based on the “my_union” data type may be initialized as follows, with the first two initializations for the variables “u1” and “u2” generating no diagnostics and the second two initializations for the variables “u3” and “u4” generating diagnostics stemming from the use of positional initializers:
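A union marked in this way, together with the four initializations “u1” through “u4,” might be sketched as follows. The member names and types, the initial values and the helper “check_union” are assumptions, and the “DESIGNATED_INIT” marker is defined as empty because this sketch does not assume compiler support for the attribute on unions.

```c
#include <assert.h>

/* Hypothetical marker for the attribute described in the text; empty here. */
#define DESIGNATED_INIT

union my_union {
    int   i;  /* member names and types are assumed for illustration */
    float f;
} DESIGNATED_INIT;

union my_union u1 = { .i = 1 };     /* designated: no diagnostic */
union my_union u2 = { .f = 2.5f };  /* designated: no diagnostic */
union my_union u3 = { 1 };  /* positional (first member): diagnostic expected */
union my_union u4 = { 2 };  /* positional (first member): diagnostic expected */

/* Hypothetical helper stating what each declaration produces. */
void check_union(void)
{
    assert(u1.i == 1);
    assert(u2.f == 2.5f);
    assert(u3.i == 1 && u4.i == 2);
}
```

A positional initializer for a union always sets the first member, so reordering the members would silently change the meaning of “u3” and “u4.”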
An example of using the “designated_init” attribute for an array is shown below:
Structured variables based on the “char_handlers” data type may be initialized as follows without generating diagnostics:
Structured variables based on the “char_handlers” data type may likewise be initialized as follows without generating diagnostics:
Structured variables based on the “char_handlers” data type may likewise be initialized as follows, but would result in diagnostics due to the use of a positional initializer:
The following initialization for a structured variable based on the “char_handlers” data type would also produce a diagnostic based on the use of positional initializers:
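The array case might be sketched as shown below. Only the type name “char_handlers” comes from the text; the element type, the array length of 256, the handler functions, the variable names “h1” through “h4” and the helper “check_handlers” are assumptions, and the “DESIGNATED_INIT” marker is defined as empty because this sketch does not assume compiler support for the attribute on array types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for the attribute described in the text; empty here. */
#define DESIGNATED_INIT

typedef int (*char_handler)(int c);  /* assumed element type */
typedef char_handler char_handlers[256] DESIGNATED_INIT;

static int handle_a(int c) { return c; }      /* placeholder handlers */
static int handle_b(int c) { return c + 1; }

/* Index designators only: no diagnostic. */
char_handlers h1 = { ['a'] = handle_a };
char_handlers h2 = { ['a'] = handle_a, ['b'] = handle_b };

/* The second value has no designator and lands on index 'a' + 1. */
char_handlers h3 = { ['a'] = handle_a, handle_b };

/* Both values are positional and land on indices 0 and 1. */
char_handlers h4 = { handle_a, handle_b };

/* Hypothetical helper stating where each handler ends up. */
void check_handlers(void)
{
    assert(h1['a'] == handle_a && h1['b'] == NULL);
    assert(h2['a'] == handle_a && h2['b'] == handle_b);
    assert(h3['a'] == handle_a && h3['a' + 1] == handle_b);
    assert(h4[0] == handle_a && h4[1] == handle_b && h4['a'] == NULL);
}
```

The declaration of “h4” shows why enforcement is useful here: the handlers are attached to character codes 0 and 1, which is unlikely to be what was intended.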
The disclosed technique may also be used with compound literal initializations. For example, consider the definition for the “my_struct_2” structured data type discussed earlier:
Compound literals based on the “my_struct_2” data type can be initialized in the manner shown below. The first initialization generates no warning, whereas the second initialization does, due to the presence of positional initializers:
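The two compound literals might be sketched as follows. The field types, the values, the functions “sum_fields,” “sum_designated,” “sum_positional” and “check_struct_literals,” the “DESIGNATED_INIT” wrapper macro and the “SHOW_POSITIONAL” guard are all assumptions made for illustration.

```c
#include <assert.h>

/* Apply the attribute only if the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(designated_init)
#    define DESIGNATED_INIT __attribute__((designated_init))
#  endif
#endif
#ifndef DESIGNATED_INIT
#  define DESIGNATED_INIT  /* unsupported: marker expands to nothing */
#endif

struct my_struct_2 {
    int field1;  /* field types are assumed */
    int field2;
} DESIGNATED_INIT;

static int sum_fields(struct my_struct_2 s)
{
    return s.field1 + s.field2;
}

int sum_designated(void)
{
    /* Designated compound literal: no diagnostic expected. */
    return sum_fields((struct my_struct_2){ .field1 = 1, .field2 = 2 });
}

#ifdef SHOW_POSITIONAL  /* hypothetical guard; define it to see the diagnostic */
int sum_positional(void)
{
    /* Positional compound literal: diagnostic expected. */
    return sum_fields((struct my_struct_2){ 1, 2 });
}
#endif

/* Hypothetical helper exercising the accepted form. */
void check_struct_literals(void)
{
    assert(sum_designated() == 3);
}
```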
As another example of compound literal initialization, consider the definition for the “my_union” structured data type discussed earlier:
Compound literals based on the “my_union” data type can be initialized in the manner shown below. The first two initializations generate no warning whereas the third initialization does due to the presence of a positional initializer, as follows:
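The union case might be sketched in the same manner. The member names and types, the values and the helper functions are assumptions, and the “DESIGNATED_INIT” marker is defined as empty because this sketch does not assume compiler support for the attribute on unions.

```c
#include <assert.h>

/* Hypothetical marker for the attribute described in the text; empty here. */
#define DESIGNATED_INIT

union my_union {
    int   i;  /* member names and types are assumed */
    float f;
} DESIGNATED_INIT;

static int   get_i(union my_union u) { return u.i; }
static float get_f(union my_union u) { return u.f; }

/* Hypothetical helper exercising the three compound literals. */
void check_union_literals(void)
{
    /* Designated: no diagnostic expected. */
    assert(get_i((union my_union){ .i = 7 }) == 7);
    assert(get_f((union my_union){ .f = 1.5f }) == 1.5f);

    /* Positional (sets the first member): diagnostic expected. */
    assert(get_i((union my_union){ 7 }) == 7);
}
```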
Although the foregoing examples of structured data types are all based on the C programming language, it should be understood that the technique disclosed herein may be used with other languages that support both designated and positional initializers. Examples of such languages include D, Ada and Haskell.
For example, a structured data type written in the D programming language might use the following syntax (comprising a pragma directive) to advise the language processor 32 that a structured data type requires designated initializers:
Structured variables based on the “my_struct” data type may be initialized as follows, with the initialization for the variable “s1” generating no diagnostic and the initialization for the variable “s2” generating a diagnostic stemming from the use of positional initializers:
A structured data type written in the Ada programming language might use the following syntax (comprising a pragma directive) to advise the language processor 32 that a structured data type requires designated initializers:
Structured variables based on the “DATE” data type may be initialized as follows, with the initializations for the variables “Date1” and “Date3” generating no diagnostics, and the initializations for the variables “Date2” and “Date4” generating diagnostics stemming from the use of positional initializers:
A structured data type written in the Haskell programming language might use the following syntax (the Haskell pragma syntax {-# #-}) to advise the language processor 32 that a structured data type requires designated initializers:
Structured variables based on the “Date” data type may be initialized as follows, with the initialization for the variable “Date1” generating no diagnostic, and the initialization for the variable “Date2” generating a diagnostic stemming from the use of positional initializers:
Pattern matching that implements initializations based on the Haskell “Date” data type may also enforce designated initializers, with the initialization in the “useYear” statement generating no diagnostic, and the initialization in the “useMonth” statement generating a diagnostic stemming from the use of a positional initializer:
Accordingly, a technique for enforcing the use of designated initializers in structured type initializations has been disclosed. It will be appreciated that the foregoing concepts may be variously embodied in any of a data processing system, a machine implemented method, and a computer program product in which programming logic is provided by one or more machine-useable media for use in controlling a data processing system to perform the required functions. Exemplary machine-useable media for providing such programming logic are shown by reference numeral 100 in
While various embodiments of the invention have been described, it should be apparent that many variations and alternative embodiments could be implemented in accordance with the invention. For example, a feature could be added to the language processor 32 that allows a user to selectively enable or disable enforcement of the use of designated initializers, thereby suppressing such processing and the implementation of diagnostic actions if positional initializers are improperly used. By way of example only, one way that this feature could be implemented is by way of a flag (e.g., “-Wdesignated-init”) that is passed to the language processor 32 when it is invoked. It is understood, therefore, that the invention is not to be in any way limited except in accordance with the spirit of the appended claims and their equivalents.
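By way of illustration only, the handling of such a flag might be sketched as follows. The function name “designated_init_enforced,” the negated spelling “-Wno-designated-init” and the default of leaving enforcement off are assumptions of this sketch; the text names only the “-Wdesignated-init” flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Scans the command line and reports whether enforcement is requested.
 * When both spellings appear, the last one on the command line wins. */
bool designated_init_enforced(int argc, char *const argv[])
{
    assert(argv != NULL || argc == 0);

    bool enabled = false;  /* assumed default: off unless requested */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-Wdesignated-init") == 0)
            enabled = true;
        else if (strcmp(argv[i], "-Wno-designated-init") == 0)
            enabled = false;  /* assumed negated form */
    }
    return enabled;
}
```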