Surfing the Data: Using the World Wide Web to Make Every Client of the Data Warehouse a Data Expert



This paper was presented at the 1996 CAUSE annual conference. 
It is part of the proceedings of that conference, "Broadening 
Our Horizons: Information, Services, Technology -- 
Proceedings of the 1996 CAUSE Annual Conference," page 4-1-
1+. Permission to copy or disseminate all or part of this 
material is granted provided that the copies are not made or 
distributed for commercial advantage. To copy or disseminate 
otherwise, or to republish in any form, requires written 
permission from the author and CAUSE. For further 
information, contact CAUSE, 4840 Pearl East Circle, Suite 
302E, Boulder, CO 80301; 303-449-4430; e-mail info@cause.org.



                 SURFING THE DATA:
              USING THE WORLD WIDE WEB
              TO MAKE EVERY CLIENT OF
                THE DATA WAREHOUSE
                  A DATA EXPERT
   Charlie J. Klumpp, Financial Management Analyst
          Wm. Clifton Ivy, Systems Analyst

                 Purdue University
                  West Lafayette
                     Indiana


ABSTRACT
As Data Warehouses greatly simplify data access, the people 
directly using the data expand far beyond the traditional 
"data reporting experts".  These new data users need to learn 
about the data, and they need an authoritative source for 
information about the data and its proper use.

Metadata (data about data) need to be:

  *  Easily accessible from each user's PC, without
     interrupting tasks in progress
  *  Presented in the user's own terminology
  *  Complete enough to answer the majority of data questions
  *  Linked so well that both novice and experienced users 
     can quickly find what they need

Holding metadata centrally makes it easy to maintain, 
ensuring that everyone sees the same current information.  
The hypertext capabilities of the web allow users to hot-key 
from indices to definitions to illustrations.

This paper illustrates Purdue University's metadata 
structure, the data within that structure, and how source 
data are stored to optimize usability and maintainability.


SURFING THE DATA: USING THE WORLD WIDE WEB
TO MAKE EVERY CLIENT OF THE DATA WAREHOUSE A DATA EXPERT

When the project charter was drafted to create Purdue's 
management decision support system, the focus was the 
University's * #1 Client Request *:

     Provide more management data and make it easy to use

Analysis of Purdue's management data needs revealed:[1]

  *  Most people did not know what data were (or could be)
     available
  *  The data in the source files did not match the reporting
     needs (key data were missing or granularity not right)
  *  Data information was too broad and complex -- it took
     too long to "master the data"
  *  Mainframe reporting tools did not meet user needs;
     conversely, customers were comfortable with PC tools

Even in the initial design stage it was apparent that a 
quality management decision support system (DSS) would be 
comprised of two complementary elements:

  *  Quality data
  *  Quality metadata (information about the data).

Everyone agreed that management data need to be easy to use.  
Many potential users of management data were unfamiliar with 
the client-server concept -- but they certainly could relate 
to the client-service concept!

The DSS design team held a series of information-gathering 
sessions where clients were encouraged to identify their 
information needs and to describe how they would use 
management information.  It turned out that to clients "easy 
to use" didn't necessarily mean particular software features 
-- "easy to use" meant not having to learn another 
application -- not having to remember another password -- not 
having to deal with another icon...or window...or manual.


PARTICIPANTS IN THE PROJECT

The ten-person DSS design team was drawn from technical 
experts ( Network and ORACLE specialists, data analysts, 
programmers) and business analysts from the client side of 
the house (selected for their experience and knowledge of the 
data).  All but one of these people were committed full-time 
to the DSS project.

The initial target audience for management data was the 
academic area business managers and their support staff, and 
the managers and clerks from Purdue's central business 
office.  Five business managers from the academic areas and 
seven data experts from various central offices each spent 5-
20 hours per week helping identify, define and test the data 
and metadata.  Weekly working meetings kept all participants 
updated and provided a forum for resolution of questions and 
problems.

The participant mix reflected the proposed client base: Some 
experienced experts and some data novices.  Most importantly, 
it was (and continues to be) the users of the data who 
determine what is good, what is usable -- what is _easy_.


DESIGN CONSIDERATIONS

The team kept coming back to the issues identified in the 
customer needs survey: The data need to work for clients who 
"know what they need to know" (but not where to find the 
answers), as well as for clients who are already very 
experienced data users.

Three key factors predicated the metadata design:

  *  The metadata need to be accessible to all staff and 
     accessible at all times, including while the data are 
     being queried
  *  The metadata need to be quickly and easily maintained by 
     non-technical staff
  *  Good resource information stored in many places is not 
     very usable; staff need one place to go for business
     rules, business facts and all related business
     information

The selected alternative was to place the metadata on the 
World Wide Web (WWW).  LAN (local area network) connections 
to the University's fiber-optic network allow the information 
to be easily accessed from any work station on any of 
Purdue's campuses.  Holding the metadata centrally makes it 
easy to maintain, and ensures that everyone is simultaneously 
seeing the same current information.  Best of all for the 
clients, all their reference data are accessed _one_ way, 
using one icon.  (The web browser icon functions as the 
desktop equivalent of the public library "Reference Section" 
sign -- it identifies _one_ spot where they can find all 
sorts of useful facts, definitions, and how-to's.)


DETERMINING THE METADATA STRUCTURE

In part the metadata structure is defined by the database it 
describes.  Purdue's Decision Support System is a multi-
dimensional database of fact and dimension tables[2] which 
are presented to the clients in logical data models.

Clients need to know what data models are available, and what 
data elements are contained in each table in the model.  They 
also need to know which data tables can and should be used 
together to answer management questions.

To accommodate these needs Purdue's DSS metadata are 
organized to fit the way clients think about and use the 
data.  The "doorway" to the metadata is a DSS home page.  
This home page is the directory of all information about 
using DSS, from how to request access, training materials and 
sample queries, to a list of the DSS steering team.  This 
page also has links to the top-level DSS metadata indexes.

Due to space and printing limitations, the sample web pages 
may not be fully readable.  The web pages are available at 
"http://www.adpc.purdue.edu/DSS/".

The indexes are the Tables Index, the Fields Index by Common 
Name, and the Fields Index by Oracle Name.



Tables Index


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


This index describes the two types of tables used in multi-
dimensional modeling, Fact tables and Dimension tables, and 
lists the tables.  

Each entry contains the table name, description, and a status 
alert if the table is being modified or is still being 
developed.  The table name is a link to more detailed 
information about the table.


FIELDS INDEX BY COMMON NAME


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


This index lists the fields in alphabetic sequence by the 
common name of the field.  Each entry contains the common 
name, the name of the table in which the field is found, a 
description of the field, and a status alert if the field is 
being modified or is still being developed.  

The field name is a link to more detailed information about 
the field.  There are letter pointers to allow the client to 
instantly jump to any point in the alphabetic listing.


FIELDS INDEX BY ORACLE NAME


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


This index lists fields by the Oracle (database) name of each 
field.  These names are usually abbreviated versions of the 
common name, and this is the name displayed by the Brio query 
tool.  

The entries in this index are like those in the common name 
index, with the Oracle name substituted for the common name.  
The Oracle name links to detailed information for the field.


The indexes provide clients with the ease of an encyclopedia 
index -- without the hassles of juggling a stack of heavy 
reference books.  The World Wide Web is ideal for just such a 
presentation.  With a single mouse click clients can jump to 
a data definition, drill down for greater detail, or jump 
back to check for alternative data sources.


SURFING THE DATA

Clients typically approach the data through the metadata.  A 
client asked to check into strangely varying account balances 
might go to the index of tables, and display the information 
about the Account Balance table.  They would see:


TABLE DETAIL

This page shows all the information stored for a table, 
including description, definition, typical business uses of 
the table, location of summarized or more detailed data, 
refresh rate, table size, the staff person to contact with 
questions or comments, and security information.

Links give a list of fields, external references (a picture 
of the model), and other related tables.

Links are also on every page to jump to the table index, both 
field indexes, and the home page.


The client might then click on the "Field list for table: 
ACCOUNT BALANCE" link to see what data fields are available:


FIELDS FOR A TABLE


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


This list shows the fields in the sequence they are displayed 
by the Brio query tool.  In some cases, this is alphabetic 
sequence, but many “most used” fields were moved to the top 
of the lists to make them easier to find.
The list shows the field's common name, description, and a 
status alert.  The field name is a link to detailed 
information about the field.  


Creating and running a simple query on the Account Balance 
table shows that the balances on a few specific accounts do 
vary in a regular manner.  Returning to the metadata and the 
table detail for Account Balance, the client clicks on the 
"Supporting Dimensions" link to see what other tables contain 
related information.


RELATED TABLES


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


This page lists the other tables in DSS that contain 
logically related data.  The page is tailored by type of 
table (fact or dimension).

For each related table, the list shows the table name, 
description, and status alert.  The table name is a link to 
detailed information for that table.


This information is also available graphically by clicking on 
the "External References" link.


External References for Tables


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


This page provides a list of links to other documents that 
help describe a table.
The first use for this has been to link to a graphical 
picture of the data model for this table.


DATA MODEL

This is a typical diagram. It shows the data model for the 
Account Balance fact table.  The fact table is in the center, 
and the supporting dimension tables are arrayed around it.  
The field names shown are Account Balance fields used to join 
to the dimension tables.


Checking the related tables, the client wonders if the 
accounts whose balances vary regularly might have common 
account characteristics.  Looking at the fields in the 
Account Number table, the client notices the "Pay Method" 
field, and decides to see what information it contains.


Detailed Field Information


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


This page shows all information stored for a field, including 
description, definition, oracle name, originating document, 
whether the field is indexed, format, and source file.
This information is tailored by table, to accommodate 
differences when the same field means different things in 
different tables.

Links are provided to go to the list of all fields in this 
table, the detailed information for the table, and the common 
links to the indexes and the home page.


Seeing that this field describes the terms in which a sponsor 
transfers funds to the University, and noting a link to the 
list of payment methods, the client clicks on the link.


AD HOC LINKS


(SCREEN SHOT MISSING IN ASCII TEXT VERSION)


A link can be inserted into virtually any text description 
field within the metadata, and that link can be to any URL 
supported by the browser.

In many cases, these are to hypertext documents that describe 
code values in plain english.  Since these documents may be 
"linked-to" from various places, they contain instructions to 
use browser capability to return to the earlier page.

The client adds the Pay Method field from the Account Number 
table to the simple query that was created earlier.  Running 
the query again shows that all of the accounts in question 
have payment methods that indicate quarterly payments, and 
quarterly payments are the reason that the balances of these 
accounts to vary on a regular basis.


DECIDING WHERE TO STORE METADATA

A data base query tool selection process was conducted 
concurrent with the DSS project.  Part of that process 
examined the capability of each tool to store and display 
metadata.  None of the tools reviewed provided as broad a 
definition of metadata as it was evident was needed, and all 
of them restricted access to the metadata to their tool.  For 
these reasons, other storage and display options were 
examined.

The selected alternative placed the metadata on the World 
Wide Web (WWW).  Separating the metadata from the query tool 
allows maximum flexibility and ease of use.  The metadata can 
be viewed, paged, scrolled, and linked to answer client's 
questions -- _and can be used along with or separate from any 
query tool_.  LAN (local area network) connections to the 
University's fiber-optic network allow the information to be 
easily accessed from any workstation on any of Purdue's 
campuses.  Holding the metadata centrally makes it easy to 
maintain, and ensures that everyone is seeing the same 
current information.

The wisdom of this choice became even more apparent as the 
use of the metadata became broader than just support of DSS 
queries.  The metadata quickly became a definitive source of 
information for all data applications.  Never before had 
source file data elements been so clearly and completely 
defined, nor had data definitions been accessible to such a 
broad client base.

Purdue is on its third generation of providing metadata on 
the web.  All three embody the same design concepts, but as 
newer tools became available we were able to better realize 
the design vision.  The key consideration in moving from one 
generation to the next was to make it simpler to maintain and 
present the metadata.

In the first generation, multiple hypertext documents were 
created by doing a one-way conversion of existing word-
processing documents that described the data.  These provided 
access and redundant display, but were _never_ updated due to 
the difficulty of finding and updating all iterations of 
repeated data.

The second generation stored the source metadata in Microsoft 
Word documents, in table form.  Microsoft's Internet 
Assistant allowed us to create web documents directly in 
Word, and macros were written to create the various 
interconnected types of hypertext documents.  Six macros were 
written to create the five different type documents.  This 
solution worked quite well until the list of fields reached 
nearly 1000, and then we started hitting some Word memory 
limitations.  These macros are a good illustration of a low-
tech method of creating interconnected web pages.  Copies of 
the macros are available by contacting either of the 
presenters.


MAINTAINING THE METADATA (OR IS IT MEGADATA?)

The third generation stores the source metadata in Oracle 
tables, and uses Oracle's Web Server tools to gather and 
present the data as the client clicks on a link.
Each of the pages shown above are created by a stored Oracle 
procedure that combines a query with Hypertext Markup 
Language (HTML) formatting.  The query can include any data 
from any Oracle table, although these are limited to the 
tables of metadata information.  Formatting can include 
boilerplate paragraphs in combination with metadata from the 
Oracle tables.  The metadata formatting creates the links to 
interconnect the pages.  


UPDATE THE ORACLE METADATA TABLES

The data in the Oracle tables are maintained by a Microsoft 
Access update application.  MS Access attaches to the Oracle 
tables through an Open Database Connectivity (ODBC) 
connection, and uses Access forms to simplify the 
presentation of the metadata in an updateable format.  
Audits and edits within the forms enforce data integrity, and 
make it easy to enter the correct information.

Access to data that are changed by the MS Access update 
application is as quick as clicking the "reload" button on 
your browser.


SO MUCH MORE THAN JUST A DICTIONARY....

As it turned out, Purdue's Decision Support System metadata 
has become much more than just a dictionary or encyclopedia 
to support management reporting from the DSS database.

The information stored in Purdue's metadata is every bit as 
valuable and useful as the management data it describes.  DSS 
metadata can be profitably used by clients who do not even 
access the DSS system.  In many instances, this is the only 
public-accessible source for complex business rules and 
plain-english interpretations of cryptic codes.  For example, 
Purdue's staff position codes are an intimidating melange of 
numbers and letters which defy interpretation by even 
seasoned veterans.  By using the DSS metadata, clients can 
quickly and easily discover how to determine a staff member's 
rank, voting status or tenure status -- an effort which 
previously required an extensive apprenticeship in arcane 
decryption techniques.  (The sort of stuff that has never 
been written down -- or at least not published!)

So what is this batch of information we call Purdue's DSS 
metadata?  Well, it certainly is a dictionary or encyclopedia 
of data about data...and of Purdue's business rules....with 
dynamic links which allow drilling down for greater detail or 
exploring related topics.  

Perhaps best of all, the DSS metadata have been discovered to 
be a wonderful training tool.  The way we approach teaching 
Purdue staff what data are available and how to use the data 
begins with the metadata.[3] Time and again we hear, "I 
didn't know we had that information" and "So _that's_ what 
that means! " -- oftentimes, from staff who have been trying 
(with limited success) to use these data for years.  The 
first groups of clients to be trained devised training for 
all the groups which followed them.[4] In every case, 
training begins with and is centered around the metadata.  We 
regard this as a strong validation of the metadata concept 
and design -- and acknowledge that the combined efforts of 
the DSS team and our very hard-working clients has resulted 
in a more useful product than we had ever envisioned.

The keystone to the amazingly rapid acceptance and 
utilization of Purdue's management decision support data has 
been accessibility and ease of use.[5] Using the World Wide 
Web as the delivery mechanism provides staff with a 
continuously current, homogenous, indexed, linked business 
resource in the same place and accessible in the same way as 
their other business office reference tools: Dictionaries, 
thesari, the Indiana Statutes, federal laws and regulations -
- all are accessible using the same easy-to-use tool.  Each 
reference source is a bookmark, and the DSS metadata 
bookmarks are among the most-utilized references in Purdue's 
electronic library.


=============================================================

ENDNOTES

[1] Purdue's Administrative Computing Master Plan identifies 
data access as the University’s #1 management need.  In 
October of 1993, 80 interviews were conducted, involving 116 
managers representing all areas of the University.  The 
results of that survey determined which data should be 
provided, to whom, and how the data were needed.  (Access to 
Data Problem Summary, Wilson, _et al._, 1993)

[2] Fact tables are tables of recurring business events, such 
as Accounting Transactions or Payroll Charges. The Data 
Models listed in Brio are named for the Fact table which is 
central to that model. 
Dimension tables are tables of multiple-use definitional 
information, such as Account Numbers. The same Account Number 
information is useful whether you are looking at accounting 
transactions, payroll charges, or account balances. These 
tables are included on Brio Data Models wherever they apply. 

[3] Training prerequisites include mastery of Windows; 
training in use of BRIO, Purdue’s selected query tool; 
familiarity with use of the Internet, and ability to utilize 
file transfer protocols.  Training classes for prerequisite 
skills are offered centrally and by training specialists in 
the various areas and schools.

[4] Roll out of Purdue’s DSS and associated metadata began in 
August of 1996.  After only three months there were 212 staff 
trained and operating as active DSS clients, with all 
training sessions through December fully enrolled.  (As of 
publication date, January enrollment had not been opened.)

[5] Within the first three months of deployment there were 
active DSS clients in every business and academic area of the 
university, and on each regional campus.