Data Element Naming Standards

CAUSE INFORMATION RESOURCES LIBRARY

The attached document is provided through the CAUSE Information Resources Library. As part of the CAUSE Information Resources Program, the Library provides CAUSE members access to a collection of information related to the development, use, management, and evaluation of information resources -- technology, services, and information -- in higher education. Most of the documents have not been formally published and thus are not in general distribution.

Statements of fact or opinion in the attached document are made on the responsibility of the author(s) alone and do not imply an opinion on the part of the CAUSE Board of Directors, officers, staff, or membership.

This document was contributed by the named organization to the CAUSE Information Resources Library. It is the intellectual property of the author(s). Permission to copy or disseminate all or part of this material is granted provided that the copies are not made or distributed for commercial advantage, that the title and organization that submitted the document appear, and that notice is given that this document was obtained from the CAUSE Information Resources Library. To copy or disseminate otherwise, or to republish in any form, requires written permission from the contributing organization.

For further information: CAUSE, 4840 Pearl East Circle, Suite 302E, Boulder, CO 80301; 303-449-4430; e-mail info@cause.colorado.edu. To order a hard copy of this document contact CAUSE or send e-mail to orders@cause.colorado.edu.

University Information Systems

Data Element Naming Standards

September, 1992

Developed By: Data Element Naming Standards Working Group

Members:
    Peggy Bennett, Departmental Systems and Services
    Amy Brooks, Data Administration, Chair
    Mary Byrkit, Financial Information Systems
    Kurt Richardson, Application Support Center
    Sharon Smith, Documentation Library
    Judy Smutek, Database Administration and Support
    Jon Turbett, Academic Record Information Systems

Table of Contents

I.  Introduction
II. Standards
    A. Background
       1. Components of Data Element Naming Standards
       2. Components of Data Element Definitions and Names
       3. Steps to Follow in Applying Naming Standards
    B. Guidelines
       1. Data Element Definitions
       2. Logical Data Element Names
       3. Physical Data Element Names
    C. Class Words
       1. Instructions for Use
       2. Class Word List
    D. Quick Reference Guide for Applying Standards

I. Introduction

Data is an important asset to the University. Since data is an institutional resource, it is appropriate that formal standards and guidelines be developed and used to manage and control data. Just as accounting standards help to manage the University's finances, data standards can help to manage the University's data resource.

Data standards provide rules for defining, documenting, managing and naming data.

Standards include:                          Example:

- Guidelines for defining data elements     1 or 2 concise statements
- Major classifications of data             Code, Name, Date
- Standard syntax for naming data           Data classification is always last
                                            in the data element name
- Suggested formats for data                Dates should include century
- Approved abbreviations                    Cd, Nm, Dt
- Guidelines for using standards and        Guidelines should be used when adding
  enforcing their use                       data elements to existing datasets or
                                            creating new datasets

As mentioned above, one component of data standardization is a standard syntax for naming and defining data.
Data element naming standards will be instrumental in fostering a common understanding of data throughout the University, thereby promoting data sharing across systems and among data users.

Standards will also facilitate the development of a University-wide data dictionary. A comprehensive data dictionary does not currently exist; once developed, it is intended to serve as a resource for user documentation and interpretation, and as a means to promote systems development productivity by documenting, organizing and managing the University's data. It should also assist in conducting impact analyses of changes to data. Possible components of this future dictionary include: standard definitions for data elements, physical data element names, security levels for data, data formats, update frequency of data, and physical locations of data.

This document describes the following standards for defining and naming data:

1. Naming standards apply to data element definitions, logical data element names, and physical data element names.
2. Data definitions and logical and physical data element names consist of prime words, class words, and optional qualifiers, which adhere to an accepted syntax.
3. Logical data element names are developed from data element definitions.
4. Physical data element names are developed from logical data element names.

Standards apply to all new data elements. This means data elements in new databases (IMS or relational), new segments in existing databases, new sequential files, and references to these new data elements in application programs and copylib members.

II. Standards

A. BACKGROUND

1. Components of Data Element Naming Standards

This document focuses on three components of data element naming standards: data element definitions, logical data element names, and physical data element names.

a. Data element definitions

A data element definition is an English phrase which describes a data element.
This definition should not be specific to an application, but should describe the data in such a way that anyone in the University will be able to read and understand the meaning of the data.

An example of a definition for the data element Organization Group Description is:

    A textual description of the code assigned to each U-M academic and administrative unit for the purposes of financial reporting.

b. Logical data element names

A logical data element name is the name which uniquely identifies the data within the University's data resource. As we progress towards the development of a University-wide data dictionary, the logical data element name will be the unique identifier for the element within the dictionary.

An example of a logical data element name, based on the data definition above, is:

    Organization Group Description

c. Physical data element names

A physical data element name is the name that operating software requires to uniquely identify and manage data elements. Examples of physical data element names include PL/I, COBOL, DB2, Oracle, and IMS element names. There may be many physical data element names for one logical name.

Examples of physical data element names, based on the logical data element name above, are:

    ORGANIZATION_GROUP_DESCRIPTION    PL/I
    ORGANIZATION-GROUP-DESCRIPTION    COBOL
    OrgGrpDes                         DB2, Oracle
    OrgGrpDs                          IMS

2. Components of Data Element Definitions and Names

Data element definitions, logical data element names, and physical data element names are all composed of two to three basic components. These components are: prime words, class words and, optionally, qualifiers.

a. Prime words

A prime word describes the subject area of the data. Prime words for the University are being identified by Data Administration through strategic data planning and data modeling efforts.
Some examples of prime words which have been identified to date are:

    Student        Employee       Degree         Funding
    Election       Jobclass       Program        Appointment
    Course         Organization   Registration   Account

This list of prime words will be expanded and documented on a public server by Data Administration as data modeling and strategic data planning activities progress.

b. Class words

A class word describes the type of data, i.e., the major classification of data associated with a data element. This document contains a complete list of the class words for the University. See Section C.2. for details.

Some examples of class words are:

    Date           Number         Amount
    Code           Rate           Name

c. Qualifiers

A qualifier can further define or distinguish the prime and class words.

Some examples of qualifiers are:

    Last           Starting       Birth          Type
    Beginning      Assessed       Status         Previous

3. Steps to Follow in Applying Naming Standards

When naming data, there is a progression from the data element definition to the logical data element name and then to the physical data element name.

a. Define the data. For example:

    A code representing the category by University standards of the progress toward a degree or certificate by a student in a specific academic program.

b. Use the definition to develop the logical element name. For example:

    Student Degree Progression Code

c. Use the logical name to develop the physical data element name with proper consideration of element name length constraints imposed by the operating software. For example:

    DB2 (18 character limit)    StuDegPrgsnCd
    IMS (8 character limit)     StDgPnCd

B. GUIDELINES

1. Data Element Definitions

A data element definition should adhere to the following guidelines:

a. The classification of data (class word) should be identified in the first part of the definition.

    A code representing . . .
    A name describing . . .

b. The definition should identify the associated entity (prime word).

    The date of birth of an employee . . .
    A code representing the status of an account . . .

c. The definition should consist of 1 or 2 statements which read clearly and concisely.

d. The definition should describe what the data is (not what it is not).

e. The definition should note if the data is derived from other data elements. If the data is derived, the associated algorithms should be included in the definition.

f. Ideally, the definition should describe the data fully, so that it is easily understandable to someone outside of the business area primarily responsible for the data element.

Some examples of good data definitions are:

    Course Description - A textual description of a course.

    Registration Status Code - A code representing the status of a student's registration in an academic program.

    Account Balance - The balance of an account, which represents the net result of credits and debits applied to the account.

2. Logical Data Element Names

When developing logical data element names, the following standard syntax should be adhered to:

    Prime word -- required
    Qualifiers -- optional
    Class word -- required

Since, as mentioned, data element naming standards are intended to foster a common understanding of data throughout the University, it is important that names be readable and intuitive. After significant testing, the syntax above proved to be the most natural and the one which lent itself most readily to the readability of element names.
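
As an illustration only, the standard syntax can be sketched as a small Python check that splits a logical name into its components. The function name parse_logical_name and the choice of Python are assumptions made for this example and are not part of the standards. The prime word set holds only the sample prime words listed in this document (the full list is still being developed), and the class word set is copied from the Class Word List in Section C.2. The sketch assumes a single-word prime word in the first position and does not handle names in which a qualifier precedes the prime word.

```python
# Illustrative sketch only; not part of the University standards.
# PRIME_WORDS contains just the sample prime words given in this document.
PRIME_WORDS = {
    "Student", "Employee", "Degree", "Funding", "Election", "Jobclass",
    "Program", "Appointment", "Course", "Organization", "Registration",
    "Account",
}

# CLASS_WORDS is copied from the Class Word List (Section C.2).
CLASS_WORDS = {
    "Date", "Day", "Month", "Year", "Time",
    "Amount", "Balance", "Count", "Quantity", "Rate", "Percent", "Rank",
    "Hours", "Score", "Average", "Grade",
    "Number", "Key", "Code",
    "Text", "Name", "Description", "Comment",
}


def parse_logical_name(name):
    """Split a logical name into prime word, qualifiers, and class word.

    Hypothetical helper: assumes the prime word is the first word and the
    class word is the last word; everything in between is a qualifier.
    """
    words = name.split()
    if len(words) < 2:
        raise ValueError("a logical name needs a prime word and a class word")
    prime, *qualifiers, class_word = words
    if prime not in PRIME_WORDS:
        raise ValueError(f"'{prime}' is not a known prime word")
    if class_word not in CLASS_WORDS:
        raise ValueError(f"'{class_word}' is not a class word")
    return {"prime": prime, "qualifiers": qualifiers, "class_word": class_word}


if __name__ == "__main__":
    print(parse_logical_name("Student Last Name"))
    print(parse_logical_name("Account Balance"))
```

A name such as "Name Student" is rejected by this sketch because the class word does not come last.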

Some examples of logical data element names which adhere to this syntax are:

    Prime Word + (Qualifiers or secondary Prime Words) + Class Word = Logical Data Element Name

    Student    + Last                                  + Name       = Student Last Name
    Student    + Degree                                + Code       = Student Degree Code
    Program    + Status                                + Code       = Program Status Code
    Course     + Starting                              + Time       = Course Starting Time
    Account    +                                       + Balance    = Account Balance
    Employee   +                                       + Name       = Employee Name
    Employee   + Major Instructional Organization      + Code       = Employee Major Instructional Organization Code
    Funding    + Begin                                 + Date       = Funding Begin Date

A slight exception to this syntax is warranted in a very few cases: that of inserting a qualifier before the prime word. This exception should be considered carefully before being employed, but in some cases it may be useful for readability.

    (Qualifiers) + Prime Word   + (Qualifiers or secondary Prime Words) + Class Word = Logical Data Element Name

    Research     + Account      + Type                                  + Code       = Research Account Type Code
    Next Higher  + Organization +                                       + Code       = Next Higher Organization Code

3. Physical Data Element Names

The logical data element name is used to determine the physical data element name. Standard abbreviations are being developed and documented for prime words. Class word abbreviations are listed in Section C.2. However, at this time, the constraints of the operating software may require that the data element names, and in some cases even the abbreviations, be shortened.

As a rule of thumb, the full spellings of prime words, class words, and qualifiers should be replaced with abbreviations and/or acronyms which are consistent and intuitive.

Because of the severe element name length constraints in some environments (e.g., IMS constraint of a maximum of eight characters in data element names), it may be impossible to include the prime and class word abbreviations in the physical data element name and simultaneously make the name unique and identifiable.
In this case it may be necessary to exclude the prime or class word from the physical data element name.

When developing physical element names in a severely constrained environment, the naming syntax should be followed as closely as possible. The two character standard abbreviation for the class word should be used along with the shortest abbreviations for the prime word and qualifiers. When exceptions are absolutely necessary, the data element name should be as intuitive and complete as the physical environment allows.

Where the operating system allows it, readability can be improved by using a combination of upper and lower case in element names to distinguish the beginnings of the component parts and/or by including underscores between the parts.

The following examples of physical data element names are based on the logical data element names and element name length constraints imposed by the operating software:

Logical                         Physical Element Names
Element Name                    IMS       DB2/Oracle     PL/I
                                                         COBOL

Student Last Name               StLstNm   StuLastNm      STU_LAST_NAME
                                                         STU-LAST-NAME

Student Degree Progression Code DegPrgCd  StuDegPrgsnCd  STU_DEGREE_PROG_CODE
                                                         STU-DEGREE-PROG-CODE

Program Status Code             PgmStCd   PgmStatCd      PROG_STATUS_CODE
                                                         PROG-STATUS-CODE

Course Starting Time            CrsStTm   CrsStartTm     COURSE_START_TIME
                                                         COURSE-START-TIME

Account Balance                 AcctBal   AcctBal        ACCT_BAL
                                                         ACCT-BAL

Employee Name                   EmpName   EmpName        EMPLOYEE_NAME
                                                         EMPLOYEE-NAME

Employee Major Instructional    MajInOrg  MajInstOrgCd   EMP_MAJ_INST_ORG_CODE
Organization Code                                        EMP-MAJ-INST-ORG-CODE

Funding End Date                FndEndDt  FundEndDt      FUND_END_DATE
                                                         FUND-END-DATE

Research Account Type Code      ResAcTyp  ResAcTypCd     RESEARCH_ACCT_TYPE_CODE
                                                         RESEARCH-ACCT-TYPE-CODE

Next Higher Organization Code   NxtHiOrg  NextHiOrgCd    NEXT_HIGHER_ORG_CODE
                                                         NEXT-HIGHER-ORG-CODE

C. CLASS WORDS

1. Instructions for Use

Class words categorize data into major classifications.
The classifications of data are divided into the following major categories:

    Chronology      Data which indicates a point or span in time
    Measurement     Data which indicates dimension, capacity, amount, performance or duration
    Identification  Data which distinguishes a person or thing
    Text            Data which is relatively free-form or narrative in nature

Within each major category, the class words are listed with their definitions (Section C.2.). The standard abbreviations are indicated, and several examples of data elements in each class are provided.

2. Class Word List

The following comprehensive list of class words and their standard abbreviations (shown in parentheses beneath the class words) should be used when defining and/or naming data.

CHRONOLOGY    Data which indicates a point or span in time

Date          A calendar day, month and year      Employee Hire Date
(Dt)          (including century)                 Student Birth Date

Day           A day of the week                   Course Meeting Day
(Dy, Day)                                         Employee Work Day

Month         A calendar month in numeric form    Student Admitted Month
(Mo, Mon)                                         Account Budgeted Month

Year          A twelve month period               Student Registration Year
(Yr)                                              Account Fiscal Year

Time          Hours and minutes, may include      Class Start Time
(Tm)          seconds, hundredths of seconds,     Last Update Time
              etc.

MEASUREMENT   Data which indicates dimension, capacity, amount, performance or duration

Amount        A monetary value                    Student Aid Amount
(Am, Amt)                                         Purchase Order Amount

Balance       The net value of an account         Account Current Balance
(Bl, Bal)                                         Account Ending Balance

Count         A number of people or things        Student Count
(Ct, Cnt)     other than money                    Building Count

Quantity      A number of things other than       Invoice Order Quantity
(Qt, Qty)     money                               Shipment Received Quantity

Rate          A unit of measure expressed by      Student Enrollment Rate
(Rt)          its relation to another unit        Employee Full Time Rate
              of measure

Percent       Part of a whole expressed in        Housing Revenue Percentage
(Pc, Pct)     hundredths                          Tuition Percentage

Rank          Relative standing or position       Student Class Rank
(Rk, Rnk)                                         Employee Seniority Rank

Hours         A duration of time expressed in     Course Credit Hours
(Hr, Hrs)     hours                               Employee Hours

Score         A number that expresses merit       Student Test Score
(Sc, Scr)     or performance                      Student SAT Score

Average       The mean of two or more numbers     Student Grade Point Average
(Av, Avg)

Grade         A value assigned to reflect         Student Course Grade
(Gd, Grd)     performance or a position on
              a scale

IDENTIFICATION  Data which distinguishes a person or thing

Number        Alphanumeric data which             Account Number
(No, Num)     identifies a person, place          Course Number
              or thing                            Student Identification Number

Key           One or more data elements,          FI50Key
(Ky, Key)     each having its own class word,
              which combined may be used as
              a convenience for a sequence
              field in a physical segment or
              record. The class Key is not
              used for institutional data
              elements.
              (See definition of institutional
              data in the "Information Resource
              Policy and Guidelines.")

Code          Data which represents encoded       Address State Code
(Cd)          values or which functions as        Student Sex Code
              a flag or indicator                 Account Status Code
                                                  Honors Program Code

TEXT          Data which has relatively undefined content

Text          A free form string of data          Prospect Letter Text
(Tx, Txt)

Name          A word or combination of words      Student Last Name
(Nm)          by which a person, place or         State Name
              thing is commonly known

Description   Narrative data which translates     Account Status Description
(Ds, Des)     a code or number                    Honors Program Description

Comment       An explanatory, illustrative        Student Comment
(Cm, Cmt)     or critical note, remark or
              observation

D. QUICK REFERENCE GUIDE FOR DEFINING AND NAMING DATA ELEMENTS

1. Step 1 -- Define the Data

Begin with the class word and include the associated prime word. Use 1 or 2 statements which read clearly and concisely and which describe the data so that it is easily understandable to someone outside the business area responsible for the data element. Describe what the data is, not what it is not. If the data is derived, include the associated algorithms. For example:

    Student Degree Progression Code -- A code representing the category by University standards of the progress toward a degree or certificate by a student in a specific academic program.

    Account Balance -- The balance of an account, which represents the net result of credits and debits applied to the account.

2. Step 2 -- Develop the Logical Element Name

Use the definition to develop the logical element name, adhering to the following syntax: prime word (required), qualifiers (optional), class word (required).
For example:

    Prime Word + (Qualifiers or secondary Prime Words) + Class Word = Logical Data Element Name

    Student    + Degree Progression                    + Code       = Student Degree Progression Code
    Account    +                                       + Balance    = Account Balance
    Student    + Last                                  + Name       = Student Last Name
    Course     + Starting                              + Time       = Course Starting Time
    Employee   + Major Instructional Organization      + Code       = Employee Major Instructional Organization Code
    Funding    + End                                   + Date       = Funding End Date

3. Step 3 -- Develop the Physical Element Name

Use the logical name to develop the physical data element name with proper consideration of element name length constraints imposed by the operating software. For example:

Logical                         Physical Element Names
Element Name                    IMS       DB2/Oracle     PL/I
                                                         COBOL

Student Degree Progression Code DegPrgCd  StuDegPrgsnCd  STU_DEGREE_PROG_CODE
                                                         STU-DEGREE-PROG-CODE

Account Balance                 AcBal     AcctBal        ACCT_BAL
                                                         ACCT-BAL

Student Last Name               StLstNm   StuLastNm      STU_LAST_NAME
                                                         STU-LAST-NAME

Course Starting Time            CrsStTm   CrsStartTm     COURSE_START_TIME
                                                         COURSE-START-TIME

Employee Major Instructional    MajInOrg  MajInstOrgCd   EMP_MAJ_INST_ORG_CODE
Organization Code                                        EMP-MAJ-INST-ORG-CODE

Funding End Date                FndEndDt  FundEndDt      FUND_END_DATE
                                                         FUND-END-DATE

Research Account Type Code      ResAcTyp  ResAcTypCd     RESEARCH_ACCT_TYPE_CODE
                                                         RESEARCH-ACCT-TYPE-CODE

Next Higher Organization Code   NxtHiOrg  NextHiOrgCd    NEXT_HIGHER_ORG_CODE
                                                         NEXT-HIGHER-ORG-CODE

DATA STANDARDS: FREQUENTLY ASKED QUESTIONS

Q  Why should there be standards for defining and naming data elements?

A  Data is a valuable University resource that needs to be maintained, managed and secured, just like classroom buildings and equipment, in order for the University to conduct its business.

But why introduce data standards now? In the past, data was often viewed as belonging to a single department or business application. This meant that data was not always defined or named in such a way that it could be readily understood or shared by other departments or applications.
Today, customers and information systems staff alike face a critical need to be able to merge and analyze data from many different systems (e.g., financial, personnel, student) in order to make informed decisions. One way to facilitate this process is through the use of data standards, which make data more understandable and reusable by providing uniformity in how it is defined and named.

Q  How were the standards developed?

A  A team of University Information Systems (UIS) staff started with examples of industry standards for naming data elements and revised these to reflect the University environment. Initial drafts of the University standards were shared both with UIS staff and a representative group of information systems users from across campus. Revisions were made based on feedback from these groups, as well as from actual hands-on experience in applying the standards to new datasets under development.

Q  How will the standards help me?

A  For users, standards can help identify which data elements they want to use, especially when doing ad hoc queries. For example, each current employee may be associated with up to four different organizations (departments). If these were all named 'employee organization', it would be difficult to know which to use. By using standards, the data element name itself can help distinguish the different kinds of employee departments, for example, employee administrative organization, employee major instructional organization, employee statistical organization, and employee appointment organization.

Some users have also already expressed an interest in incorporating these standards into their own departmental systems to facilitate the process of merging local data with corporate data, as well as sharing data with other departments.

For information systems staff, standards used in conjunction with a comprehensive data dictionary can help identify where data elements are used.
For example, if it becomes necessary to expand all date fields to include century, it would be easier to locate all occurrences of these fields in a comprehensive data dictionary if they all ended with 'DT'.

Q  How will users learn about the standards?

A  Initially, there will be an announcement made in the ITD Digest, summarizing the purpose of the standards and explaining how to get a copy. On an ongoing basis, all UIS staff should share the standards with their customers whenever developing new data.

Q  Are data standards meant to limit my creativity?

A  No! Just as standards for writing structured application programs promote more understandable, maintainable and reusable code while still allowing for personal style, data standards are intended to foster data which is more understandable and reusable across systems.

Q  When do I use standards?

A  Standards apply to all new data elements. This means data elements in new databases (IMS or relational), new segments in existing databases, new sequential files, and references to these new data elements in copylib members and application programs.

Q  Are there times when I don't have to use the standards?

A  As mentioned above, the standards apply to new data elements. They do not apply when referencing existing data elements. In this case, simply use the existing data element name.

Another situation when the new standards may not apply is if there are already naming conventions used in an existing database, file, or software package. One example of this is IBM's Application Development Facility (ADF), which has prescribed standards for naming data. Another instance is when there is already a convention used for naming data in a file or database. For example, there may be a practice in an existing IMS database that each segment has an update date field named LastUpDt and an update source field named LastUpSr.
If a new segment is created in this database, the segment's update date and update source fields should follow the established naming convention for this database.

Q  How will the standards be enforced?

A  There will not be a formal process for monitoring and enforcing data element naming standards. UIS staff are expected to use these standards, however, just as they are expected to follow the UIS standard of writing structured application programs. Currently, no special 'data standards' forms are required as part of the move-to-production process, although this could change in the future when a comprehensive data dictionary is available.

Q  Will the standards change?

A  Yes, the standards will evolve over time. For example, the prime word list will expand based on future data modeling efforts and Strategic Data Planning activities within Data Administration. UIS staff should ensure that they are using the most current version of the standards by checking the UIS Standards and Guidelines manual in the section set, by obtaining a copy from the Documentation Library, or by accessing the standards on-line on the UIS public servers.

Q  How can I get help using the standards?

A  The Data Element Naming Standards document contains a Quick Reference Guide, which is intended as a 'cookbook' for getting started in applying the standards. Another useful reference is to look at samples of data dictionaries which have been developed using the new standards, such as the Data Access dictionaries which are available from Data Administration. A brief review of these dictionaries will quickly highlight the pattern that is used in defining and naming data elements.

Most importantly, Data Administration staff will be working with users and information systems staff as new data elements are identified, and as part of this process, will assist in developing data element names and definitions. UIS staff are encouraged to contact Data Administration for assistance.

10/02/92