Information Architecture -- Bring the University's Information Inventory Under Control

This paper was presented at the 1997 CAUSE annual conference and is part of the conference proceedings, "The Information Profession and the Information Professional," published online by CAUSE. The paper content is the intellectual property of the author. Permission to print out copies of this paper is granted provided that the copies are not made or distributed for commercial advantage and the source is acknowledged. To copy or disseminate otherwise, or to republish in any form, print or electronic, requires written permission from the author and CAUSE. For further information, contact CAUSE at 303-449-4430 or send e-mail to [email protected].

Laura Cisneros
Data Administrator
University of Arizona

David Hunt
Data Administrator
Center for Computer and Information Technology
University of Arizona

Donald McCollam
Senior Applications Systems Analyst
University of Arizona

Abstract - The University of Arizona recognizes data as an asset. This paper covers the University's efforts to manage institutional data and the events that led to developing an Information Architecture (IA). It explores what an Information Architecture is, the process for developing the IA, the arguments for doing an IA, the benefits of an IA, and the issues that have surfaced during the course of this project.

Introduction

In 1994 the University of Arizona engaged in a comprehensive assessment of its current information technology environment and identified its major goals in a plan that looked to the twenty-first century. That plan, "Strategic Directions for the Year 2000," focused on the development of an integrated information environment. This included integrating the operational systems, developing information systems (i.e., a data warehouse), and consolidating redundant business processes. Essentially this shift, from a process-based systems environment to a data-centric information environment, meant a change in our historical approach to system development. No longer would systems be purchased or developed in isolation. Collaboration would be the cornerstone of integrating data, consolidating business units, and refining data into information.

The effort to integrate data began with the formation of a working group of University Data Stewards. Before data could be integrated, some understanding of what data were essential to the operation of the university was needed. The Data Stewards Work Group began by exploring the discipline of data administration. While learning about data administration they began establishing some of the fundamental tools that would support their efforts. To date they have:

adopted data naming standards
established standard abbreviations
wrote an employee data access policy
formulated data access rules
compiled a glossary of data and computing terms
examined methods for identifying, collecting and documenting business rules
started exploring the issues and problems surrounding data integrity.

The Data Stewards Work Group needed a way to store the inventoried data. They considered the products that were available but found none that provided the functionality they were looking for, so the decision was made to develop an application in-house. The application used the latest technologies (an object-oriented language, a relational database, a client-server design, and a graphical user interface) and has more recently been converted to a Web application.

The Institutional Data Catalog (IDC) contains entities and entity definitions; entity attributes and attribute definitions; alias information about each attribute (i.e., where it is stored, how it is formatted, how it is named); data steward names and contact points; administrative system names and descriptions; rules that control access to certain pieces of data (e.g., "student grade" is classified as "limited access" data); and other useful information. The IDC can be accessed through this URL: http://da.arizona.edu/ds
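The kinds of information listed above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, field names, the "unrestricted" default, and the sample values are assumptions made for this example and do not describe the actual IDC schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alias:
    """Where and how one attribute is physically held in a given system."""
    system: str        # administrative system that stores the data
    stored_name: str   # name the attribute goes by in that system
    data_format: str   # how the value is formatted there


@dataclass
class Attribute:
    name: str
    definition: str
    access_class: str = "unrestricted"   # hypothetical default classification
    aliases: List[Alias] = field(default_factory=list)


@dataclass
class Entity:
    name: str
    definition: str
    steward: str                         # data steward contact point
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add(self, attribute: Attribute) -> None:
        """Register an attribute under its catalog name."""
        self.attributes[attribute.name] = attribute

    def limited_access(self) -> List[str]:
        """Names of attributes classified as limited access data."""
        return sorted(a.name for a in self.attributes.values()
                      if a.access_class == "limited access")


# Hypothetical sample entries, invented for illustration.
enrollment = Entity(
    name="course enrollment",
    definition="A student's registration in a course section.",
    steward="Registrar",
)
enrollment.add(Attribute(
    name="student grade",
    definition="The grade awarded to a student for a course section.",
    access_class="limited access",
    aliases=[Alias("Student Information System", "STU_GRD", "CHAR(2)")],
))
enrollment.add(Attribute(
    name="course title",
    definition="The published title of a course.",
))

print(enrollment.limited_access())
```

Keeping alias information separate from the attribute definition lets one logical attribute be recorded once while still documenting every place it is physically stored.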

In 1995, while the Data Stewards Work Group worked on their tasks, an RFP was written seeking professional development strategies in Business Process Engineering and Information Architecture development. The contract was awarded to Texas Instruments Inc. The University formed two teams: one to learn Business Process Engineering techniques and one to learn to develop an Information Architecture (IA).

A team composed of data analysts from the Computer Center and representatives from the Data Stewards Work Group met with a consultant from Texas Instruments to develop an IA for the University.

What is an Information Architecture?

An IA is a comprehensive view of the business activities of the enterprise and the data required for its operation. An IA is comprised of logical data models based on the activities which have been confirmed as a complete, correct, and stable statement of the business as it currently operates and is likely to operate in the foreseeable future. The IA is developed independent of organizational structure and technology trends.

How is an Information Architecture Developed?

An IA is developed through interviews with University employees who are most knowledgeable about a particular business area. Typically two to three employees from a business area meet with two IA modelers. A minimum of two meetings are scheduled with each business unit, one for conducting the interviews and one for verifying the accuracy of the information collected. More than two sessions may be required in more complex business situations. The steps for developing an IA are:

Through the interviews the modelers identify the subject areas important to the business unit. These subject areas are called "entities". An entity is loosely defined as anything that the University is interested in. It can be a concept, person, place, event, or thing.
An Entity Relationship Diagram (ERD) for each business activity is drawn. This models the relationships that exist between entities as well as the cardinality (e.g., occurs once and only once, occurs one or more times) and optionality (e.g., may occur, must occur) of each relationship. [Note: At this time we are using LBMS's System Engineer data modeling tool to draw the logical data models.]
The attributes or descriptive characteristics of each entity are identified.
A Data Item Set Diagram (DIS) for each super entity is developed. The DIS exhibits the super entity and its subtypes and their attributes.
Every component of the IA (activities and data) is defined to eliminate duplication and erroneous assumptions.
Finally, the entity and its attributes are added to the Institutional Data Catalog.

The IA identifies the salient activities and data once and only once. It is important to continually review and analyze the activities and data as they are collected to prevent duplication.
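The cardinality and optionality recorded on an ERD, and the "once and only once" rule, can be sketched in code. This is a minimal Python illustration under stated assumptions: the `Relationship` and `Model` classes and the sample entities are hypothetical and are not taken from the University's models or from the modeling tool.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Relationship:
    source: str
    verb: str
    target: str
    optional: bool   # optionality: True = "may occur", False = "must occur"
    many: bool       # cardinality: True = "one or more", False = "once and only once"

    def describe(self) -> str:
        """Read the relationship as a sentence, the way an ERD is read aloud."""
        modal = "may" if self.optional else "must"
        count = "one or more" if self.many else "exactly one"
        return f"Each {self.source} {modal} {self.verb} {count} {self.target}."

    def allows(self, n: int) -> bool:
        """Check whether n related occurrences satisfy the relationship."""
        if n < 0:
            return False
        if n == 0:
            return self.optional
        return self.many or n == 1


class Model:
    """Holds entity definitions and refuses to define the same entity twice."""

    def __init__(self) -> None:
        self.entities: Dict[str, str] = {}

    def define_entity(self, name: str, definition: str) -> None:
        key = name.strip().upper()
        if key in self.entities:
            raise ValueError(f"entity {key} is already defined")
        self.entities[key] = definition


# Hypothetical entities and relationships, invented for illustration.
model = Model()
model.define_entity("STUDENT", "A person admitted to the University.")
model.define_entity("COURSE SECTION", "One scheduled offering of a course.")

enrolls = Relationship("STUDENT", "enroll in", "COURSE SECTION",
                       optional=True, many=True)
belongs = Relationship("COURSE SECTION", "belong to", "COURSE",
                       optional=False, many=False)

print(enrolls.describe())
print(belongs.describe())
```

Rejecting a second definition of the same entity name is one simple way to surface possible duplication for review while the data is still being collected.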

Why do an Information Architecture?

Oceans of data - We are essentially drowning in an ocean of data. But while the quantity of data is ever-increasing, the availability of reliable, accurate information (especially for sound business decisions) seems to be increasingly suspect. The solution is to control the rate of acquisition of data. An institution cannot effectively do that unless it has an "inventory" or "catalog" of institutional data.

Duplication of data - Another dimension to the information inventory problem involves duplication of data across organizational and political boundaries. Duplication of data spawns many secondary evils. When multiple agencies collect the same data, there is duplication of effort. Data is collected in an inconsistent manner and errors become built-in. Reconciliation of data for decision support or reporting purposes becomes difficult and in many cases impossible. Data duplication has a high cost that few public institutions can afford.

Information redundancy - Redundant information is a direct result of duplicated database development efforts within individual organizational units. Information repositories containing redundant data will result when there is no overall information architecture in place to orchestrate development.
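A data inventory makes this kind of duplication visible. The Python sketch below is a hypothetical illustration: it assumes the inventory can be read as pairs of attribute name and holding system, and the system names and sample rows are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_redundant(inventory: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return attributes that are held in more than one system.

    Attribute names are normalized (trimmed, lower-cased) so that trivial
    spelling differences do not hide a duplicate.
    """
    systems_by_attribute = defaultdict(set)
    for attribute, system in inventory:
        systems_by_attribute[attribute.strip().lower()].add(system)
    return {
        attribute: sorted(systems)
        for attribute, systems in systems_by_attribute.items()
        if len(systems) > 1
    }


# Hypothetical inventory rows: (attribute name, system that stores it).
inventory = [
    ("student grade", "Student Information System"),
    ("Student Grade ", "Departmental Advising Database"),
    ("course title", "Student Information System"),
]

print(find_redundant(inventory))
```

Name matching alone will miss duplicates stored under different names, which is one reason agreed data naming standards and definitions matter.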

To achieve a general understanding - Understanding the processes and data needed to fulfill the University's mission seems on the surface a simple thing. However, the tendency is to think of the institution in terms of its organization and political structure, both of which are complex and ever-changing. The IA provides a view of the business that does not change as the infrastructure changes. Unless the business of higher education changes, the IA will change little over time. What we come to understand today will be true well into the future.

Institutions require reliable data for decision making - Institutions compete for students, research dollars, private and public donations, and faculty. This requires current, accurate information.

Ensure compliance with regulatory reporting - Funding from the Federal and State governments for research, student financial aid, and operations is based on reported data. Inaccurate reporting can result in reduced funding or penalties being assessed against the institution.

Information is an important and expensive corporate asset - For all of the reasons stated above, shouldn't we employ more techniques to improve the quality of data and reduce data redundancy?

Benefits of an Information Architecture

Facilitates integration of systems, processes, data, and information
Documents processes and data in a central repository
Supports data control, data management, and data inventory functions
Increases the understanding of the business and promotes a common data vocabulary
Establishes data as a corporate asset
Identifies redundant data and processes
Documents the most elementary business rules

Issues

Long-term project in a quick-fix culture - Changing from process-based systems to a data-centric information environment requires a major shift in system development strategies. While technology increasingly supports rapid application development, the IA is effecting cultural, political, and organizational changes, all of which occur gradually.

The "not invented here" mind-set - Application development teams are producing their own logical models rather than using those already in existence. They either don't trust or don't understand the work that's already been done. As the IA proves itself to be a true and complete statement of the business activities and data on which to base integration, and as more people understand and use the IA as it was intended, this issue will lessen.

The transformation of a logical model to a physical model - Only practical experience in this area will allow us to fully actualize the process. At times the logical model has elements of a physical model and the physical has elements of the logical. This in itself creates confusion and mistrust in the IA and the value of the logical data model. Clearly establishing the scope of each of these views will reduce duplication of effort and permit the completion of the cycle from logical to physical and back to logical.

Maintaining the IA over time - Procedures need to be put in place for maintaining the IA as new data and information requirements are established. The success of this will depend on our ability to market the IA to the campus.

Education - Educating the campus on the purpose and use of an IA is our greatest challenge. Continued use of the historical system development cycle will produce the same results: unplanned redundant data and duplication of effort across the organization. Integrated systems may result; integrated data and processes will not.

Strategies and Plans

Early Involvement with New System Development Initiatives - The fruits of the labor of an information architecture project pay off when IA becomes a part of the organization's computing strategy. The IA needs to be seen as the first stop on the development path or a measuring stick for any purchased application software.

Make the IDC Available over the Web - The Institutional Data Catalog can best serve the widest audience if it is a Web-based (update and query) application. Not only can the Data Stewards maintain their data via the Web using a thin client, but the Web also provides a link for other data initiatives interested in using the definitions contained in the IDC.

Identify and Catalog Business Rules - Business rules are defined as "collection(s) of specific rules or business policies that govern the enterprise's behavior. Since these rules govern changes in the state of an enterprise, they translate directly into updating rules for its databases". A rule processor from Pinnacle Software Corporation is being evaluated to determine if it can reasonably provide a central store for business rules. The IA defines the relationship between entities. These relationships are the most elementary business rules.
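To illustrate how a relationship can translate into an updating rule, the Python sketch below stores each rule with its plain-language statement and checks a record before it is written. It is a hypothetical example only: the `RuleCatalog` class, the `requires` helper, and the field names are assumptions, and it does not represent the rule processor under evaluation.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def requires(field_name: str) -> Rule:
    """Rule for a mandatory relationship: the referencing field must be set."""
    def check(record: Record) -> bool:
        return record.get(field_name) not in (None, "")
    return check


class RuleCatalog:
    """A central store of business rules, keyed by the entity they govern."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Tuple[str, Rule]]] = {}

    def register(self, entity: str, statement: str, rule: Rule) -> None:
        self._rules.setdefault(entity, []).append((statement, rule))

    def violations(self, entity: str, record: Record) -> List[str]:
        """Return the statements of every rule the record fails."""
        return [statement
                for statement, rule in self._rules.get(entity, [])
                if not rule(record)]


# Hypothetical rule derived from a mandatory one-to-one relationship.
catalog = RuleCatalog()
catalog.register(
    "COURSE SECTION",
    "Each COURSE SECTION must belong to exactly one COURSE.",
    requires("course_id"),
)

good = {"section_no": "001", "course_id": "MATH 125"}
bad = {"section_no": "002", "course_id": None}

print(catalog.violations("COURSE SECTION", good))
print(catalog.violations("COURSE SECTION", bad))
```

Storing the statement alongside the executable check keeps the rule readable by the business unit that owns it as well as by the system that enforces it.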

Make Code Tables Accessible - Identify and make available code tables from the transactional systems through either hot links on a web site or within an appropriate data warehouse.

Build a University Vocabulary for Data - It makes life much easier when common computer and data related terms are defined and when those definitions are more or less commonly agreed upon. Toward that end, the data catalog has a glossary section composed of computing and data related terms, easily accessible on the WWW so that they can be shared and critiqued as necessary. Defining terms is a cooperative effort involving diverse populations working together.

Complete the Information Architecture - It is our intention to fulfill our charter, "to document the Information Architecture of The University of Arizona".