Data management has always been a challenge — for individuals, for small businesses, for big enterprises, and particularly for large, decentralized organizations like higher education institutions. The discipline of data management is not new — it started way back as records management, when paper files and folders were the data collection medium of choice. Unfortunately, legacy approaches based on paper often remain in place today for campus information, even when converted to electronic data formats and processes. At Berkeley, we continue to maintain significant portions of our employee records in paper format, in the form of the traditional personnel file, even though we have fully automated our human resources management systems.
In spite of higher education's legacy of paper, the trend is clear: the continuing automation of business processes will inexorably move virtually all data from paper to far more flexible digital objects. Notably, electronic data won't be limited to servers on your local campus. Thousands of options for data storage and digital search and retrieval services will emerge as part of the explosion of cloud computing services, which offer virtually unlimited storage in systems where physical location is no longer a driving concern. This also means that our past techniques for storing, protecting, and retrieving information, which were based on the physical paradigm of the file cabinet or its virtual equivalent on local computing systems, are no longer viable. Even the ubiquitous electronic "file and folder" paradigm, developed at Xerox PARC in the 1970s and incorporated into our everyday lives thanks to Apple and Microsoft, is under assault from new cloud-based models that will require fresh thinking on how to manage your institutional information.
Start with a Data Management Plan in Place
As most campuses have already recognized, cloud computing has many benefits, and we will want to position ourselves to take advantage of them. Nevertheless, any plan to move into the cloud should start with a data management plan, developed not by creating something new or customized for cloud environments but by modifying your existing internal data management policies and plans. If you don't have an institutional data management plan already in place, I suspect you are very aware of how difficult it is to manage and maintain your institutional data in a consistent, easily accessible, and secure fashion. Unfortunately, moving to the cloud will simply amplify those challenges. Taking the time to create a data management plan now, before you broadly embrace cloud services, will be well worth the effort. It is much easier to handle complex data management issues with clear, well-established, and consistently followed guidelines before you layer on the additional challenges of the cloud. Many good roadmaps for creating data management plans are available online, but let's start by reviewing six of the most common data management standard practices in use today.
- Institutional Data Dictionary: The dictionary is critical to the ability to manage and analyze data across systems. It should include agreed-upon definitions of the key data elements that are in widespread use across the institution, as well as those elements required for compliance with state and federal laws or policies.
- Data Quality Review Process: A procedure for the review, assessment, and resolution of conflicts within the data sets needs to be in place. This process will require the existence of, and management by, a strong data governance function (see below).
- Data Access Model: This component should address questions like: Who can have access to the data? What level of granularity is required to access a specific data element — one profile for all or one profile per element? Who should have read and edit rights to the data? Who has authority to grant the read/write permissions? How often must all access rights be reviewed or renewed? These questions, and many others, may be easily answered within small local systems but become increasingly complex when considered at the institutional level.
- Data Security Model: Beyond individual access, this component of the plan should identify the overall security expectations for the data. Is the data considered restricted? Is it covered by policies or laws mandating special protections? Under what conditions are electronic interfaces available to other systems, and what security requirements are in place for those ancillary systems? A good data security model and related policies should address these and other data security and privacy concerns.
- Data Life Cycle and Preservation: Effective institutional data management requires detailed understanding of the data element life cycle. What retention period is needed for each element? Are there specific cycles for snapshots or required retention windows? What about archiving or backups? Many members of a typical higher education institution have the expectation that digital assets are stored "forever," necessitating a data management strategy that includes clear documentation about retention and related issues.
- Data Governance: The key to bringing together a cohesive data management plan is the existence of an oversight body that has campus-wide responsibility to develop, implement, and change the rules, policy, ownership delegation, etc., of all aspects of institutional data. This group must have clear authority to make final decisions to resolve differences of opinion regarding the five data management components listed above.
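To make these components a little more concrete, here is a minimal sketch, in Python, of a single data dictionary entry that carries its own definition, steward, security classification, access rights, and retention period. The field names, role names, and office name are hypothetical and not drawn from any particular campus system; a real data access model would also record who granted each right and when it must next be reviewed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataElement:
    """One entry in a hypothetical institutional data dictionary."""
    name: str
    definition: str              # agreed-upon campus definition
    steward: str                 # office with authority to grant access
    classification: str          # security model label, e.g. "restricted"
    readers: set[str] = field(default_factory=set)  # roles with read rights
    editors: set[str] = field(default_factory=set)  # roles with edit rights
    retention_days: Optional[int] = None            # None = not yet decided


def can_access(element: DataElement, role: str, action: str) -> bool:
    """Per-element access check; edit rights imply read rights."""
    if action == "read":
        return role in element.readers or role in element.editors
    if action == "edit":
        return role in element.editors
    raise ValueError(f"unknown action: {action!r}")


# Hypothetical example entry for a sensitive prospect data element.
ethnicity = DataElement(
    name="prospect_ethnicity",
    definition="Self-reported ethnicity of a prospective student",
    steward="Admissions Office",
    classification="restricted",
    readers={"admissions_analyst"},
    editors={"admissions_data_manager"},
    retention_days=5 * 365,
)

print(can_access(ethnicity, "admissions_analyst", "read"))  # True
print(can_access(ethnicity, "admissions_analyst", "edit"))  # False
```

Leaving retention_days at None until the governance group decides makes the undecided case visible instead of silently defaulting to "forever."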
If you are not fortunate enough to have all six elements of a data management strategy in place, don't expect to get there overnight. If done right, developing the structure will take time, and it must involve your most senior campus leaders. It is not uncommon that when first presented with the problems and opportunities related to data management, senior leadership will perceive the challenge to be a technical one. Even when the CIO is closely involved or is the institutional leader appointed to spearhead a data management initiative, it is critical to frame the issue not as a technical one but as one of institutional resource and business process governance. The management of data might be achieved through technology, but the issues most in need of process oversight involve critical institutional decision making and operations.
Data Management in the Cloud
After your data management structure is established and operating well, you are ready to take on the new frontier of data management in the cloud. Let's start by considering how moving a typical mid-tier enterprise application to a software as a service (SaaS) offering in the cloud might work. Take, for example, a student prospect management system that contains sensitive data. Today, your institution likely receives data files from third-party vendors containing profile information about prospective students, including contact data, regional demographic data, ethnicity or gender, and possibly high school or transfer school detail. You may then add data elements related to recruiting plans, outreach targeting, alumni matching, or other relevant data drawn from other campus data sets. The department that manages this data follows the spirit and intent of your campus data management plan and agrees to protect the information: it guards the data closely, likely on a "secure" server under a desk in a locked office, with access limited to a select few individuals. The solution works well but is aging and doesn't meet the growing demand for new, more sophisticated analytic tools and features.
The business owner of this system is approached by a third-party vendor with a new SaaS solution that offers fantastic new features, high performance, and low cost, all available over the web. No infrastructure is required: just purchase it, log in, and start using it. And the vendor assures the business owner that its system can accommodate the integration of other data sets from both campus and external sources.
This all sounds great, right? But what happens to your data management plan in this new environment? The first thing to remember is that a good data management plan always follows the data. Regardless of the data location, the server platform, or the application itself, you need to maintain your ability to adapt and enforce your data plan in any environment, whether that environment is on campus or in the cloud.
If you have developed your data management plan well, most elements should continue to work with minimal modification. However, you will probably need some amendments to each of the basic data management practices to address some of the issues that could arise from moving your data to the cloud.
One particularly significant challenge might be the data dictionary. Cloud services aren't necessarily different from your in-house architecture. However, unlike locally developed solutions, where you can decide which data dictionary will take precedence, most cloud providers will not allow changes to their product's core data dictionaries. That will require you to maintain a local campus data dictionary that can integrate with cloud provider offerings, ideally automatically. Given the scarcity of standards for data dictionaries, though, you might need to plan on a simple XML feed and do manual mapping from there. Make sure you record these mappings in your local data definition tool so that you can keep straight which provider definitions you are agreeing to use unaltered and which must be mapped to your campus definitions.
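The manual mapping described here can be sketched in a few lines. This is an illustration only, assuming a hypothetical XML feed in which each provider element appears as an element tag with a name attribute and its definition as text; actual provider feeds will differ. Each provider element lands in one of three lists: definitions accepted unaltered, definitions mapped to a campus element, and elements with no campus decision yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical provider feed; real feed layouts vary by vendor.
PROVIDER_FEED = """
<dictionary>
  <element name="first_nm">Prospect given name</element>
  <element name="hs_ceeb">High school CEEB code</element>
  <element name="eth_cd">Ethnicity code</element>
</dictionary>
"""

# Hypothetical crosswalk kept in the campus data definition tool:
# provider element -> (campus element, "unaltered" or "mapped").
CROSSWALK = {
    "first_nm": ("prospect_first_name", "mapped"),
    "hs_ceeb": ("high_school_ceeb_code", "unaltered"),
}


def map_provider_dictionary(feed_xml: str, crosswalk: dict) -> dict:
    """Sort provider elements into unaltered, mapped, and unmapped lists."""
    result = {"unaltered": [], "mapped": [], "unmapped": []}
    root = ET.fromstring(feed_xml.strip())
    for elem in root.iter("element"):
        provider_name = elem.get("name")
        if provider_name in crosswalk:
            campus_name, status = crosswalk[provider_name]
            result[status].append((provider_name, campus_name))
        else:
            # No campus decision yet: keep the provider's definition for review.
            result["unmapped"].append((provider_name, (elem.text or "").strip()))
    return result


report = map_provider_dictionary(PROVIDER_FEED, CROSSWALK)
print(report["unmapped"])  # [('eth_cd', 'Ethnicity code')]
```

Anything in the unmapped list is a question for the data governance group rather than for the code.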
The challenge of moving to a SaaS provider also involves basic change management: new options, including data feeds from other sources, will likely become available faster than they did when the system was hosted locally. A provider may add data elements with little warning because it sees a benefit to its overall solution in doing so. But would you allow anyone to simply add new data elements to your solution without a change management plan? Even mashups are normally built as extensions to core data sets rather than as changes to the source data. Remember, you will need a process for deciding whether to take advantage of any additional data, just as you will need to consider whether you already have that data elsewhere, who uses it, and whether accepting the new data will create data conflict or quality issues.
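A minimal sketch of that review step follows, assuming the provider publishes a list of element names with each release and that the campus keeps a hypothetical alias table pointing to elements it already holds. Anything new is queued for the data governance group, together with any existing campus element that may already contain the same data. The function loads nothing; it only builds the review queue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewItem:
    provider_element: str
    possible_duplicate_of: Optional[str]  # existing campus element, if any


def new_elements_for_review(previous, current, campus_aliases):
    """List provider elements added since the last approved release.

    campus_aliases is a hypothetical table mapping a lowercase alias to the
    campus element that already holds that data. Nothing is loaded here;
    the result is only a queue for the data governance group.
    """
    added = sorted(set(current) - set(previous))
    return [ReviewItem(name, campus_aliases.get(name.lower())) for name in added]


# Hypothetical element lists from two provider releases.
last_release = ["first_nm", "hs_ceeb"]
this_release = ["first_nm", "hs_ceeb", "gpa_wtd", "act_comp"]
aliases = {"act_comp": "act_composite_score"}

for item in new_elements_for_review(last_release, this_release, aliases):
    print(item)
```

Matching on names alone is deliberately crude; in practice the duplicate check is a judgment call informed by the data dictionary definitions.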
Access and security model issues are often cited as reasons not to adopt cloud or SaaS services, under the premise that SaaS offerings are somehow inherently less secure because the servers are not physically located on your premises. I fundamentally disagree with this assessment, although that's not a popular position to take with IT departments. Many cloud service providers invest substantially more resources in security to protect their clients' data than we can locally in our increasingly constrained resource environments. This is where the need comes in for good contracts that include the requirements of your data management security and access practices. This is a broad topic, so I will not go into further detail here, but if you are interested in more depth, I highly recommend an article by Tom Trappler on cloud computing contract issues in the 2010 no. 2 issue of EDUCAUSE Quarterly (EQ).
Another particularly problematic area that threatens to become more challenging in a cloud environment is data life cycle and retention. Beyond the general-purpose SaaS providers, increasing numbers of dedicated SaaS applications promise to copy and protect your data, often with such extravagant retention claims as "forever." While that might sound enticing, do you really want your data retained forever (or until that company goes out of business, whichever comes first)? In some cases where extremely long retention periods are desired (like scholarly publications), you should take extra precautions to ensure the service in question will be able to back up its longevity claims. Many SaaS providers have simply gone out of business with no notice, leaving their clients little ability to quickly retrieve important information. Regardless of contractual commitments, you should retain a local copy of all data hosted by a SaaS provider. Where possible, you should also get copies of the data schema and metadata information about the SaaS application, along with an escrow agreement for the source code in the event of provider bankruptcy. These steps won't prevent problems if the provider fails to deliver, but they will improve your ability to recover.
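The mismatch between a provider's retention terms and campus policy can be checked element by element. In this sketch, with hypothetical element names and periods, retention is expressed in days and None stands for "forever." The comparison flags elements the provider keeps longer than policy allows, keeps for less time than policy requires, or does not address at all. It is an aid to contract review, not a substitute for keeping a local copy of everything.

```python
from typing import Optional

# Retention periods in days; None means "forever". All values are hypothetical.
CAMPUS_POLICY: dict[str, Optional[int]] = {
    "prospect_contact": 3 * 365,
    "scholarly_article": None,
    "recruiting_notes": 365,
}
PROVIDER_TERMS: dict[str, Optional[int]] = {
    "prospect_contact": None,        # vendor promises to keep it "forever"
    "scholarly_article": 10 * 365,
}


def retention_conflicts(policy, provider):
    """Compare provider retention terms with campus policy, element by element."""
    issues = {}
    for element, required in policy.items():
        if element not in provider:
            issues[element] = "no retention terms in contract"
            continue
        offered = provider[element]
        if offered == required:
            continue
        # A finite offer can never satisfy a "forever" requirement.
        if required is None or (offered is not None and offered < required):
            issues[element] = "provider retains for less time than policy requires"
        else:
            issues[element] = "provider retains longer than policy allows"
    return issues


for element, issue in retention_conflicts(CAMPUS_POLICY, PROVIDER_TERMS).items():
    print(f"{element}: {issue}")
```

Each flagged element is a question for the contract, not something the code can settle.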
The final consideration concerns your data governance group itself. While traditionally focused on internally controllable aspects of data management, the group may need to add a senior contracts administrator or an attorney as more and more data moves beyond the boundaries of the institution. While you might not have regular issues with cloud providers today, they will probably multiply as cloud services become increasingly customized and targeted to meet the needs of higher education.
In the next column, I will discuss some innovative ventures and specific progress occurring in the development of above-campus services by higher education for higher education.
© 2010 Shelton M. Waggener. The text of this article is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license.