|-------------------------------------|
| Paper presented at CAUSE92          |
| December 1-4, 1992, Dallas, Texas   |
|-------------------------------------|

IT'S ALL IN THE NAME--IMPLEMENTING DATA ELEMENT NAMING STANDARDS

Amy Brooks and Judy Smutek
University of Michigan
Ann Arbor, Michigan

ABSTRACT

Data element naming standards promote and facilitate data sharing across systems and among data users by providing a means for making data readily identifiable. At the University of Michigan one of the charges of the newly established Data Administration area is to develop and implement standards for naming and defining data. A working group was established to make recommendations for standards for institutional data. This group conducted research on standards used at other institutions in order to propose naming standards which could be applied to both internal and external data. This paper focuses on the goals and activities of the Data Element Naming Standards Working Group (DENS), their successes and obstacles, end users' and programmers' reactions to the proposed standards, and how these standards are being implemented at the University.

INTRODUCTION

Data is an important asset to the University of Michigan. Since data is an institutional resource, it is appropriate that formal standards and guidelines be developed and used to manage and control data. Just as accounting standards help to manage the University's finances, data standards can help to manage the University's data resources.

In the past, data was often viewed as belonging to a single department or business application. This meant that data was not always defined or named in such a way that it could be readily understood or shared by other departments or applications. Today, customers and information systems staff alike face a critical need to be able to merge and analyze data from many different systems in order to make informed decisions.
One way to facilitate this process is through the use of data standards.

What are Data Standards?

Data standards comprise the rules for defining, documenting, and naming data. At the U-M, such standards are evolving and currently consist of various kinds of recommendations and approved lists, including the following:

** Guidelines for defining data elements
** Major classifications of data
** Standard syntax for naming data
** Suggested formats for data
** Approved abbreviations
** Guidelines for using standards and enforcing their use

As mentioned above, one component of data standardization is a standard syntax for naming and defining data. Data element naming standards are instrumental in fostering a common understanding of data throughout the University, promoting data sharing across systems and among data users by providing uniformity in how data is defined and named. At the U-M, standards for naming and defining data were the first component of data standards to be developed.

To illustrate how data standards can favorably impact an organization, consider the following scenarios:

=== Example A:

Without Standards

Tuition charges have increased considerably. The new amount for medical students now exceeds the size of the original data element field. The database which stores this amount must be modified as well as all programs and files which use this data. Because there is not a standardized way to name data elements, each program must be analyzed and the data elements which contain the tuition amount identified in order for the size of the fields to be increased. The effort is intensive and takes several months.

With Standards

The data element which contains the amount of tuition has been consistently named according to the standards. In all programs, files and the originating database the element name is TUITAMT. A global search and replace is carried out to increase the field size. The effort is completed in less than one week.
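The "with standards" case in Example A works because a single, consistent name can be located mechanically. The following is a minimal sketch of such a search, not the tooling used at the U-M: it assumes the program sources are available as text keyed by file name, and the file names and contents are invented for illustration. Only the element name TUITAMT comes from the example.

```python
import re


def find_element_uses(sources, element_name):
    """Return {file_name: [line numbers]} for lines that mention element_name.

    The name must stand alone: it is not matched inside a longer identifier
    such as TUITAMT_OLD or NEW-TUITAMT.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9_-])" + re.escape(element_name) + r"(?![A-Za-z0-9_-])"
    )
    hits = {}
    for file_name, text in sources.items():
        line_numbers = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)
        ]
        if line_numbers:
            hits[file_name] = line_numbers
    return hits


# Hypothetical source files, invented for this sketch.
sample_sources = {
    "billing.cbl": "01 TUITAMT PIC 9(5)V99.\nMOVE TUITAMT TO OUT-AMT.\n",
    "report.pli": "DCL TUITAMT FIXED DEC(7,2);\nDCL TUITAMT_OLD FIXED DEC(7,2);\n",
    "misc.cbl": "01 STU-NAME PIC X(30).\n",
}

print(find_element_uses(sample_sources, "TUITAMT"))
# {'billing.cbl': [1, 2], 'report.pli': [1]}
```

Without a naming standard there is no single string to search for, which is why the "without standards" case requires reading each program.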
=== Example B:

Without Standards

A report is needed to identify all students who eventually become donors to the University. This report is to include the student/donor name, the date University degrees were conferred, the academic program of enrollment, and the total dollar amount of the gifts given in the last five years. The alumni/donor data, the student enrollment data, the student degree data, and the gift data must be merged. No data standards exist and each dataset contains dates which follow different formats. Conversion programs must be written to put the dates into a common format in order to merge the data. This process adds several weeks to the reporting effort.

With Standards

Data standards are adopted and enforced to ensure that all dates are represented in CCYY-MM-DD format. The query to produce the report can be written to join the date fields without converting the data. The report is produced in a timely fashion and the information proves invaluable to a major capital campaign.

Who Should Develop Standards?

At the U-M, University Information Systems (UIS) is a component of the Information Technology Division. UIS is responsible for developing and maintaining administrative systems. The Data Administration group within UIS is responsible for ensuring the establishment, maintenance, and delivery of stable, reliable, and accessible collections of institutional data in electronic form for shared access by the University community.

Institutional data is defined as data that satisfies one or more of the following criteria:

++It is relevant to planning, managing, operating, or auditing a major administrative function of the University.
++It is referenced or required for use by more than one organizational unit. Data elements used internally by a single Database.
++It is included in an official University administrative report.
++It is used to derive an element that meets the criteria above.
Data standards facilitate the discovery of data which can be shared throughout the University and provide uniformity which enables the control and management of the University data resource. Thus, the development of data standards became a charge to Data Administration.

The Data Element Naming Standards Working Group (DENS)

To gather support for developing data standards and to formulate a plan for getting started, a proposal was drafted identifying the major tasks. This proposal requested representatives from areas within UIS to form the core working group to develop standards, with the Data Administration representative serving as the chair. In October 1991 the Data Element Naming Standards Working Group (DENS) met for the first time. The seven members were all senior level staff who had expertise in areas of systems development. These individuals were to be responsible for developing and promoting the standards. It was determined that customers and UIS staff would serve as the review audience for the standards the group proposed.

Incorporating Total Quality Management (TQM)

With the recent integration of Total Quality Management (TQM) into the Information Technology Division, it was decided that some of these concepts would be incorporated into the group, particularly PAL, the practice that every meeting should have a set purpose, agenda, and time limits. Another TQM concept which was followed was the establishment of rules of conduct for the group. These rules of conduct were basic principles that the group members agreed were important to the success of the group's activities.

The rules of conduct proved instrumental to the success of the DENS group. As group members wearied of the tedious process for developing standards, and attendance at meetings began to falter, these basic principles were used to rally the troops.
The group put pressure on individual members to participate and to attend, emphasizing the need for everyone to participate in the weekly discussions so that decisions did not need to be rehashed and redecided.

Determining a Charge

The DENS group agreed to meet weekly and the first task was to establish a charge. The group decided to focus on tangible deliverables which could be completed within a year.

DENS CHARGE

--- Establish the University's Class Words. Draft and submit for approval a list of major classifications of data, approved acronyms and abbreviations, definitions, data types and formats to be used in standardizing the University's data elements.
--- Develop Guidelines for Defining Data Elements. Develop guidelines for defining data elements to be used by UIS staff and users when creating new datasets.
--- Establish a Syntax for Data Element Names. Recommend a syntax to be used when naming logical and physical data elements.
--- Develop a Method for Educating Staff and Users about Standards and Guidelines and Enforcing their Use. Recommend a method for incorporating data element naming standards into the current data element definition forms and a method for enforcing the use of the standards.

Interviews were held with UIS project coordinators to determine if formal or informal data standards already existed. Several internal UIS documents which addressed standards were reviewed as well. Many resources were used to provide background and guidelines for developing standards. The most valuable proved to be Arnold Barnett's All About Data Elements (1990). Many of Barnett's philosophies were incorporated into the standards developed for the U-M.

THE UNIVERSITY OF MICHIGAN'S DATA STANDARDS

++Components of Data Element Naming Standards

The three major entities governed by data element naming standards are:

1. data element definitions, which consist of prime words, optional qualifiers, and class words;
2. logical data element names, which are derived from data element definitions; and
3. physical data element names, which are developed from logical data element names.

A data element definition is an English phrase (or phrases) which describes a data element. The definition should not be specific to an application. Instead, it should describe the data so that anyone in the University will be able to read and understand the meaning of the data. For example, the data element Organization Group Description has the following definition:

++A textual description of the code assigned to each UM academic and administrative unit for the purposes of financial reporting.

A logical data element name uniquely identifies a data element within the University's data resource. An example of a logical data element name, derived from the above definition, is:

++Organization Group Description

Finally, a physical data element name is the name required by operating software to identify data uniquely and manage it within program and system code. The University may establish several physical data element names for one logical name, depending on the language or database management systems in which the data element is used. One data element at the University may be given different physical names for use in PL/I, COBOL, DB2, Oracle, and IMS. Because these different environments impose different constraints, the physical names for the previous example could differ in length and format as follows:

ORGANIZATION_GROUP_DESCRIPTION    PL/I
ORGANIZATION-GROUP-DESCRIPTION    COBOL
OrgGrpDes                         DB2, Oracle
OrgGrpDs                          IMS

++The Proposed Standards and Guidelines

Data element definitions, and logical and physical data element names all contain two or three basic components. It seems appropriate to name and explain these components before presenting the standards themselves. These basic components are prime words, class words, and optionally, qualifiers.

Prime words describe the subject area of data.
At the U-M, prime words are being identified by Data Administration during strategic data planning and data modeling efforts. Below are some prime words that have already been identified:

Student        Employee       Account        Registration
Degree         Funding        Election       Jobclass
Program        Appointment    Course         Organization

Class words describe the major classifications of data associated with data elements. The DENS group developed a complete list of class words appropriate for use at the University. For each data element defined and named, one class word from the list becomes part of the data element definition and associated logical and physical names. Some examples of class words at U-M are:

Date           Number         Amount         Code           Name

Qualifiers further define and distinguish the prime and class words. The following are examples of qualifiers:

Last           Starting       Birth          Type
Beginning      Assessed       Status         Previous

When defining and naming data elements, one step of the process flows into the next. In order to follow data naming standards at U-M, a three-step progression is recommended. First define the data element, incorporating the appropriate class word, a prime word and qualifiers as needed. Then use that definition to develop the logical element name. Finally, use the logical element name to arrive at the physical element name, taking the name length and format constraints of the operating software into consideration.

++Data Element Definitions

The following guidelines apply to the process of developing data element definitions at U-M:

a. The classification of data (class word) should be identified in the first part of the definition.
      A code representing...
      A name describing...
b. The definition should identify the associated entity (prime word).
      The date of birth of an employee...
      The starting time of a course...
c. The definition should consist of one or two statements which read clearly and concisely.
d. The definition should describe what the data is, not what it is not.
e. The definition should note if the data is derived from other data elements. If the data is derived, the associated algorithms should be included in the definition.
f. Ideally, the definition should describe the data fully, so that it is easily understandable to someone outside of the specific business area.

++Logical Data Element Names

The next task to be addressed was to establish a syntax for naming data. The experts had conflicting opinions on the correct syntax to follow:

      class word - prime word - qualifiers;
      qualifiers - prime word - class word;
      prime word - qualifiers - class word;
      prime word - class word - qualifiers.

At the U-M, the testing of real data proved to be the best approach to reach a decision. In order to test the data and determine the most intuitive syntax, the DENS group tapped into the energies of customers and systems staff who were working on a very successful project to provide administrative data to users in Oracle. Thus the Data Access Project became the proving grounds for the standards. The project consisted of three subgroups focused on delivering financial, personnel and student data in an environment conducive to ad hoc reporting. The Data Access Project provided an opportunity to test the standards in their infant stage and to incorporate valuable customer participation in the process.
After experimenting with different rules of syntax and applying them to many existing data elements, the DENS group concluded that the following syntax for logical data element names promotes element names that are most readable, intuitive and commonly understood:

      Prime word -- required
      Qualifiers -- optional
      Class word -- required

Examples of logical element names which adhere to this syntax follow:

PRIME WORD + (QUALIFIER or secondary    + CLASS WORD = LOGICAL DATA ELEMENT NAME
             PRIME WORDS)

Student    + Last                        + Name       = Student Last Name
Account    +                             + Balance    = Account Balance
Employee   + Major Instructional         + Code       = Employee Major Instructional
             Organization                               Organization Code

A slight exception to this syntax is warranted in a very few cases: that of inserting a qualifier before the prime word to promote readability. For example:

(QUALIFIERS) + PRIME WORD   + (QUALIFIERS or secondary + CLASS WORD = LOGICAL DATA ELEMENT NAME
                              PRIME WORDS)

Research     + Account      + Type                     + Code       = Research Account Type Code
Next Higher  + Organization +                          + Code       = Next Higher Organization Code

++Physical Data Element Names

Physical data element names are developed from logical data element names, but because of limitations imposed by operating software of various types, the physical names may require the use of abbreviations and acronyms. The class words used at the U-M are accompanied by two- and three-character approved abbreviations, and prime words are being documented with standard abbreviations as well. In some very constrained environments, however, physical data element names may be so restricted in length that even the abbreviation may need to be shortened, or the class or prime word may need to be excluded from the physical name. In such cases, the standard naming syntax should be followed as closely as possible, employing the shortest abbreviations while still aiming for readability and completeness.
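The progression from components to logical name to physical name can be sketched in a few lines of Python. This is an illustrative sketch only, not a tool described in the paper: the `DataElement` class, its method names, and the abbreviation table are assumptions made for the example, and the abbreviations are not the University's approved list. It also applies one abbreviation per word in every environment, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical abbreviations for illustration; not the University's approved list.
ABBREVIATIONS = {
    "Student": "Stu",
    "Account": "Acct",
    "Balance": "Bal",
    "Name": "Nm",
}


@dataclass(frozen=True)
class DataElement:
    prime_word: str          # required: the subject area of the data
    class_word: str          # required: the major classification of the data
    qualifiers: tuple = ()   # optional: words that further distinguish the element

    def logical_name(self) -> str:
        # Syntax chosen by the DENS group: prime word, qualifiers, class word.
        return " ".join((self.prime_word, *self.qualifiers, self.class_word))

    def physical_name(self, separator: str = "", upper: bool = False,
                      max_length: Optional[int] = None) -> str:
        # Abbreviate each word that has an entry in the table; keep the rest whole.
        parts = [ABBREVIATIONS.get(word, word) for word in self.logical_name().split()]
        name = separator.join(parts)
        if upper:
            name = name.upper()
        # If the environment's limit is exceeded, stop and let a person decide
        # how to shorten the name rather than truncating it silently.
        if max_length is not None and len(name) > max_length:
            raise ValueError(f"{name!r} exceeds {max_length} characters")
        return name


student_last_name = DataElement("Student", "Name", qualifiers=("Last",))
print(student_last_name.logical_name())                            # Student Last Name
print(student_last_name.physical_name())                           # StuLastNm
print(student_last_name.physical_name(separator="_", upper=True))  # STU_LAST_NM
```

Raising an error when a name is too long is a design choice of this sketch; it reflects the guidance above that further shortening should still aim for readability and completeness, which is a judgment call rather than a mechanical rule.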
When the operating system allows it, improved readability can be achieved by using a combination of upper and lower case in element names to distinguish the beginnings of component parts and/or by including underscores or hyphens between the parts. The examples below illustrate the development of physical element names for different software environments and their differing constraints:

LOGICAL ELEMENT NAME      PHYSICAL ELEMENT NAMES
                          IMS        DB2/ORACLE     PL/I                     COBOL

Student Last Name         StLstNm    StuLastNm      STU_LAST_NAME            STU-LAST-NAME
Account Balance           AcctBal    AcctBal        ACCT_BAL                 ACCT-BAL
Employee Major            MajInOrg   MajInstOrgCd   EMP_MAJ_INST_ORG_CODE    EMP-MAJ-INST-ORG-CODE
  Instructional
  Organization Code

++The University's Classifications of Data

Resources were tapped to provide sample lists of class words. The general recommendation was that the list should contain ten to fifteen classifications for the organization. The DENS group "tried on" the class words from the sample lists with data elements from many different University datasets.

Most of the sample lists contained the class words address and flag or indicator. After much discussion, it was decided that address was a qualifier, and all of the component elements of an address would fit into one of the classifications already identified (e.g., State Code, Street Line Text, Country Code, etc.).

The flag or indicator classification proved to be a test of the consensus building techniques the DENS group followed. The words flag and indicator were interpreted to represent a binary relationship, either on or off, yes or no. Based on the members' experiences in the trenches of systems development, they felt that no element ever remained completely binary, that eventually an additional value was added to the element, a "U" for unknown or an "X" for not applicable. Thus for the University's standards, an indicator became a code that indicates. This decision was reviewed several times, and the original arguments held.
As the standards are being followed, there has been no disagreement with the original decision.

Class words group data into major classifications. At the U-M, the classifications are themselves divided into major categories:

Chronology        Related to a point or span in time
Measurement       Indicating dimension, capacity, amount, performance or duration
Identification    Distinguishing a person or thing
Text              Free-form or narrative in nature

The University's class word list contains the class words grouped by category. Definitions of each class word, recommended abbreviations, and several examples of data elements in each class are also included. One of the first steps in defining and naming a data element involves consulting the class word list to choose the major classification to which the data element belongs. The selected class word then becomes an important part of the element's definition as well as a part of its logical and physical names.

++Obstacles to Standardizing Data

Two of the main obstacles to standardizing data names and definitions stem from resistance to change and the limitations imposed by operating systems' constraints. At the U-M, many administrative departments have been dealing with the same data elements for years and have become comfortable with data names that are not intuitive or clear to an outsider. And when length and format restrictions result from choices of operating environment and coding language, many data users prefer to retain names that have grown meaningful to them over time, without regard for the fact that "their" data may be accessed by increasing numbers of casual users. In these cases, it is helpful to remind data users and system designers that in the trend toward distributed data processing and data sharing, no one office or department can consider itself the owner of data.

The following example characterizes these obstacles and how the Data Access Project proved that standards are crucial.
++The Great Alpha Code Debate

The Data Access Project Financial Dataset Subgroup was ready to expand the initial set of corporate financial data to include additional tables and elements. The core elements had been selected and the tables developed using established data modeling techniques. The next step was to establish data element names and definitions for the expanded dataset. The Data Administration analyst was working with the group to identify names and definitions which conformed to the developing standards.

A data element which was fondly known as Alpha Code to the central financial office became the impetus for determining how the standards would be applied and accepted. The alpha code identifies the sponsor or donor of funds of an account, and its alphabetic coding scheme is used for sequencing purposes. As the data analyst questioned the subgroup about the "true nature of the element," the central office representative emphatically proclaimed that "everyone knows what an Alpha Code is!" The other subgroup members, representative business managers from schools and colleges and extensive users of financial data, promptly replied that, in fact, they had no idea what an Alpha Code was. The data element was defined and renamed Account Source Code by the subgroup. The decision was an acknowledgment of the benefits of, and the need for, naming standards.

++Education and Promotion

In William Durell's book, Data Administration: A Practical Guide to Successful Data Management, the seventh commandment of his Ten Commandments of Data Standards (1985) proved to be an excellent complement to the TQM philosophy of "customer first." These concepts were incorporated into the DENS group's actions when it was time to promote the standards. Durell's seventh commandment reads:

Standards must be sold, not dictated. Even if upper management wholeheartedly supports Data Administration standards, the standards must be sold to employees at all levels.
Data Administration must be willing to advertise the standards to all employees and to justify the need for such standards. Data Administration standards demand that programmers and analysts change the way they design data. Any lasting and meaningful change must come from the employees themselves.

Educating users and UIS staff about the standards and marketing the concept of standards became a two-pronged effort. Drafts of the standards document were distributed to all UIS employees and to key users for review and comment. Members of the DENS group made presentations, providing in-depth information about the use of standards and how they were developed, and answering questions and concerns. Reactions were generally positive and all comments and revision requests were documented and discussed within the DENS group. Many of the comments or questions could be addressed by reviewing the minutes of the group's earlier discussions. A detailed education plan was developed and followed, specifically:

DENS Education Plan

--A technical bulletin was issued to announce the standards to UIS staff. Accompanying this technical bulletin were the DENS standards and a Question & Answer information sheet covering the most frequently asked questions about the standards.
--An announcement was made to UIS customers via The Information Technology Digest, published by the University's Information Technology Division.
--A segment on the DENS standards was included in the existing Data Administration overview class for UIS staff.
--UIS staff and customers working with Data Administration on developing new data elements received support from Data Administration in applying the standards either through formal data modeling sessions or informal discussions.
--UIS staff were encouraged to contact members of the DENS committee or Data Administration for interpretation/support in applying the standards.

But How Can Standards Be Enforced?
The DENS group was ready to focus on developing a method for enforcing the standards. The Documentation Library (DocLib), an area within UIS, was responsible for moving programs and systems to production and ensuring that existing conventions were being followed. The DENS group had assumed that the naming standards would be enforced at the same point. The group also understood the necessity of avoiding bottlenecks which would frustrate customers and staff trying to follow the standards.

Another TQM concept, management by facts, was put into operation. A representative from DocLib gathered information regarding the number of data elements which had been defined or named using the current method and forms for entry into UCC10, the University's data dictionary for IMS data elements. Six months of UCC10 activity was studied and it was determined that fewer than a dozen data elements had been named or defined through the conventional method. It was determined that the existing standards for documenting new data elements were not being followed, as programmers and analysts were pressured to move systems to production as quickly as possible. If the standards were enforced at this point, a bottleneck was unavoidable and the enforcement was destined to fail.

The group interviewed additional UIS staff to determine a method of enforcement that could succeed. The DENS recommendation for enforcement was to educate and to encourage proactive involvement in naming and defining data by Data Administration.

DENS Recommendation for Enforcing Data Standards:

--In order to promote consistency in naming and defining data it is imperative that Data Administration be involved in the development process. Recognizing that resource and time constraints will prevent this involvement in all application projects, the standards should be understood and their use encouraged by managers and their staff who oversee projects.
--Enforcing the U-M's Systems Development Methodology will ensure that data is named and defined consistently and according to standards. It is inappropriate for naming standards to be enforced within the move-to-production process. Enforcement at this point would be ill-timed and cause delays in project implementation. If Data Administration is involved in the development process, data will be named and defined consistently up front, prior to programs being written.

SUMMARY

At the U-M over the course of a year, data naming standards were successfully developed and implemented, thanks to the efforts of many interested people. To begin, a Data Administration representative, already versed in the areas of data administration and data naming, requested the help of experienced staff from various areas of UIS. A small group was formed to perform the bulk of the work. This DENS group applied TQM techniques from the outset: first, as it proceeded to gain familiarity with the concepts and terms used in data names and definitions, and later as it focused on its charge and developed the components of data naming standards that fit the University environment.

When formulating the U-M naming standards, DENS group members tested the standards against real data elements and introduced them to customers collaborating with them on current projects, both to see how well the standards worked and to learn how easy they were to use. Even after several iterations of the standards were tested and reworked, the DENS group sought more customer and programmer feedback. The standards were documented, explained and distributed, and comments on the standards document were solicited from UIS managers and programming staff and from a number of U-M customer groups.

As the DENS group awaited comments from programmers and customers, it continued to work on the remaining aspects of the standards effort, education and enforcement. The DENS education plan was multi-faceted.
It included the standards document in its final form accompanied by a Question & Answer information sheet, a newsletter announcement, and incorporation into the Data Administration overview class. Data Administration staff continue the education process begun by DENS as they work with customers, analysts, and programmers in formal data modeling sessions or informal design discussions.

Responsibility for enforcement of data naming standards falls to Data Administration staff as well. Their ongoing efforts early in a project's life cycle, identifying and naming applicable data elements, almost eliminate the need for rigid enforcement of the standards. Because the standards make it easier to name and define data elements, and because Data Administration works with project developers, the enforcement process practically takes care of itself.

Since the U-M naming standards have been formalized and distributed, Data Administration has been contacted by other University units requesting copies of the DENS document for use in developing departmental systems. The DENS final report encouraged further development of standards for table names, views and synonyms, and the report also recommended that the DENS group reconvene at periodic intervals to review proposed changes to the existing standards.

Throughout the process of developing standards, the DENS group focused on the basis for the standards themselves: to promote data consistency, to reduce redundancy, and to make data readily understandable. At the same time, the group acknowledged that the full benefit of data standards would not be realized until a University-wide data dictionary or repository was in place. The implementation of a repository will certainly impact the standards that the University has adopted. In the interim, data is being named and defined more consistently, and UIS staff and customers better understand the need for standards.

References:

Barnett, Arnold. All About Data Elements. 1990.
Durell, William R. Data Administration: A Practical Guide to Successful Data Management. New York: McGraw-Hill, 1985.