In the summer of 2003, members of Purdue University’s School of Technology decided to produce 90 hours of HDTV-quality instructional video as part of a distance-learning initiative. To fully exploit the power of the video medium and the added resolution of high-definition television, the team used virtual-set and advanced computer-graphics compositing techniques, which allow multiple layers of video to be shown within the same frame as the instructor. Figure 1 shows a sample frame from a lesson that employs virtual-set technology, live video, and computer-generated images to teach linear algebra concepts.
Figure 1
Purdue University offers degree programs at its main campus in West Lafayette, Indiana, and at numerous smaller, regional campuses. Distance learning was identified as a vehicle for delivering consistent instruction across all of the sites. My team was charged with creating the video content.
While video is one of the most powerful communication media, most distance-learning initiatives do not maximize its instructional impact. A cursory look at such initiatives finds that most use a combination of video of the instructor and synchronized presentations of PowerPoint slides. Dastbaz¹ found that this approach is neither effective nor efficient. The ineffectiveness persists even when following recommendations such as those provided by Howell and Morrow² for using PowerPoint in the classroom.
The most effective use of instructional video involves showing animations and moving illustrations of the concepts presented, while the instructor conducts the lecture. For each instructional objective presented in a lecture, the design team must ask, "What is the most effective way to communicate this objective to the audience using video as a medium?" Seldom is the answer to this question "a PowerPoint slide" or "video of the instructor writing on the overhead projector." Creating effective distance-learning instruction carries with it the same requirements as creating high-quality instructional television shows such as those shown on the Public Broadcasting Service series NOVA or the Discovery Channel.
This level of production is rarely seen in distance learning because of its high cost and lengthy production times. Television producers employ the help of editors, animators, scriptwriters, and compositors to produce content. Instructional designers, on the other hand, are primarily taught to systematically structure instruction. They are typically skilled at conducting needs assessments, defining objectives, writing assessment items, and carrying out a plethora of other instructional design–related tasks in accordance with an instructional model.
In television production, the audience is typically much larger than in a distance-learning scenario. The net effect is that television production can achieve a relatively low cost per unit, even if the total production cost is high. The cost of distance learning typically cannot be amortized to this extent.
To produce instructional materials at a pedagogical and aesthetic level that maximized their effectiveness, we needed to find a way to structure the lessons in accordance with the canons of instructional design. We also needed to produce the content in a manner consistent with television production at a relatively low cost.
The project was assigned a total budget of $15,000 and a staff of six people. The project manager had a background in instructional design and video production. While the instructors who worked with us were experts in their fields, many did not have formal instructional-design experience. The initial project involved three instructors, who produced three courses composed of 30 lectures each. The project manager helped them create an instructional map that served as a basis for the script used during the production phase of the project.
Three members of the team were assigned to video compositing. Their primary responsibility was to use compositing tools such as Adobe After Effects or Discreet Combustion to create two-dimensional animations. The final two team members were responsible for creating three-dimensional animations and graphics using 3D Studio Max and AliasWavefront Maya.
To orchestrate the interaction of the media assets, we created a television-style script. The primary purpose of this script was to guide the media-creation efforts of the production team while minimizing the amount of instructor intervention.
For each course produced, we filmed each instructor while he lectured in a traditional classroom. From this video we created a script of the instructor’s lesson using speech recognition software. The project manager identified instructional objectives, media requirements, and questions asked by the instructor that could be turned into embedded assessment items.
The script was e-mailed to each instructor for approval. After iterative changes and final approval from the instructor for the course, the script was distributed to the animators and compositors, who started to create media assets. PowerPoint slides of the lecture notes and illustrations were created in anticipation of the production process. These slides would serve as placeholders during the production phase, to be replaced with broadcast-quality graphics during the postproduction phase.
With the script and slides prepared, we filmed the video of each of the instructors to be used in the final product. We shot the videos in an office that we converted into a small studio. The entire room was painted blue or covered with blue material to allow us to replace the instructor’s environment digitally.
Using the chromakeying process, we removed the blue environment around the instructor in the video and replaced it with computer-generated graphics. The net result was that we could make the instructor appear to be in any location we selected. We could also make it look as if the instructor were interacting with his created environment.
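The keying step can be illustrated with a minimal sketch. It assumes frames are plain nested lists of (r, g, b) tuples and uses a simple "blue dominates" test with a hypothetical `dominance` threshold; it is not the keyer used in this project, and production keyers generate soft-edged mattes rather than the hard on/off matte shown here.

```python
def is_backdrop(pixel, dominance=1.3):
    """Return True when blue clearly dominates red and green.

    `dominance` is an illustrative threshold, not a value from the project.
    """
    r, g, b = pixel
    return b > dominance * max(r, g, 1)


def composite(foreground, background, dominance=1.3):
    """Replace blue-backdrop pixels in `foreground` with `background` pixels.

    Both frames are lists of rows, each row a list of (r, g, b) tuples,
    and both must have the same dimensions.
    """
    result = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for fg, bg in zip(fg_row, bg_row):
            # Hard matte: each pixel is either fully instructor or fully set.
            row.append(bg if is_backdrop(fg, dominance) else fg)
        result.append(row)
    return result


# One-row example: a blue backdrop pixel next to a skin-tone pixel.
studio = [[(20, 30, 220), (200, 160, 140)]]
virtual_set = [[(0, 128, 0), (0, 128, 0)]]
keyed = composite(studio, virtual_set)
```

In the example, the blue pixel is swapped for the virtual-set pixel while the skin-tone pixel passes through unchanged.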
We shot the videos using a JVC GR-HD1 camera because it allowed us to capture high-definition video relatively inexpensively at a resolution of 1,280 × 720 pixels. Standard television is typically digitized at 720 × 480 or 640 × 480 pixels. Thus, with this camera we achieved a considerable increase in resolution at a relatively low price point of less than $3,500.
The camera also had disadvantages. It captured video using a customized version of MPEG-2 at approximately 20 megabits per second, sampled at 4:1:1. The chrominance portion of the video signal was sampled at half the frequency of the luminance. From these specifications, we expected our chromakeying process to be more difficult than if we had used a format such as Digital BetaCam or DVCPRO50. We found that by using software that up-samples the video to 4:2:2 before producing the mattes, we achieved respectable results.
During taping, the instructor referred to the PowerPoint slides while we used Camtasia to capture the computer’s desktop to a digital video file. The instructor used a Wacom tablet to make notes while lecturing, which provided a more natural feel for writing notes than using a mouse.
This scenario generated two video files. One showed the instructor in the blue environment. The second included PowerPoint slides and software demonstrations captured by Camtasia from the instructor’s computer.
To facilitate editing, we shot everything in a single take. Thus, the two videos were synchronized in time. This yielded tremendous savings when it came to editing the content. The media producers could reference the live video to see the instructor’s actions while also looking at the Camtasia video to observe any drawings or illustrations the instructor created. The net result was that the media creators seldom had to consult the instructor after the video session.
The editing and compositing phase of this project turned out to be the largest challenge in terms of expenses for labor and equipment. Typically, distance-learning initiatives do not spend resources on compositing computer-generated images with live video because it takes a relatively large team of people with specialized skills and equipment. We needed to find a way to use this technology to produce 90 hours of finished lectures in 120 days and within budget.
We had the correct balance of skills in the personnel involved in the project, but we lacked the equipment for our animators and compositors to work at a rate fast enough to meet our deadline. We had access to real-time editing packages, including Adobe Premiere Pro, Sonic Foundry’s Vegas Video, and Apple’s Final Cut—but these packages did not offer the capabilities we needed. Real-time editors excel at working with long-format videos with relatively few layers. We needed something that would allow us to stack a dozen layers of video in real time.
Before looking at some of the hardware-assisted compositors, we did a few tests to gauge our production times. Our test involved compositing a 10-minute segment in the software packages that we had available, which included Final Cut, Premiere, Vegas Video, and After Effects. The shortest production time that we could achieve was 50 hours to composite our 10-minute test. This was unacceptable.
The main thing that slowed our production pipeline was synchronizing the video layers with the audio. Every time the animators tried to preview the instructor’s work, the computer loaded up those layers and rendered them into RAM. This took an incredible amount of seat time for our compositors. We had no choice but to consider using a hardware-assisted solution.
We first considered tying our production efforts into the university’s video efforts because that would allow us to consider hardware solutions otherwise beyond our budgetary constraints, such as systems offered by Discreet or Quantel. Both of these companies offer solutions that can handle tons of layers at a resolution up to 2,048 × 1,556 pixels in real time. So how many layers is "tons of layers"? Well, it depends on the effects applied. In something simple, 30 to 40 layers are not uncommon. On the low end, solutions in this stratum cost approximately $100,000. On the high end, the prices approach $1 million. No university-wide plan existed for deploying this type of equipment in time for it to be useful to our team. Thus, tying our efforts to the university’s video efforts was not an option.
Next, we considered using a hardware-assisted hybrid editor/compositor from Media 100 called 844/X. This device can play eight layers of video in real time. Using advanced render caching techniques, it can give the operator the illusion of rendering many layers simultaneously. The final price on this solution ranged from $20,000 to $60,000. Again, this option was not available to us without external budgetary assistance. If we had had the funds, this would have been the solution we selected.
Unable to get a turnkey hardware compositor, we turned our attention to reducing the bottlenecks that plague software solutions. Video compositors are expensive because they have to move lots of data and perform image processing operations extremely fast. If we have a total of 10 layers of uncompressed video, each at 720 × 480 pixels, our system would have to move 270 megabytes per second just to load the data into RAM.
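The data-rate estimate can be reproduced with a short calculation. The sketch below assumes 30 frames per second and two common 8-bit pixel formats (2 bytes per pixel for 4:2:2, 3 bytes per pixel for RGB); the article does not state which format or frame rate underlies its figure, so these parameters are assumptions for illustration.

```python
def layer_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Raw data rate of one uncompressed video layer, in bytes per second."""
    return width * height * bytes_per_pixel * fps


def total_megabytes_per_second(layers, width=720, height=480,
                               bytes_per_pixel=2, fps=30):
    """Aggregate rate for several layers, in decimal megabytes per second."""
    per_layer = layer_bytes_per_second(width, height, bytes_per_pixel, fps)
    return layers * per_layer / 1_000_000


# Ten standard-definition layers under the two assumed pixel formats.
rate_422 = total_megabytes_per_second(10, bytes_per_pixel=2)  # 8-bit 4:2:2
rate_rgb = total_megabytes_per_second(10, bytes_per_pixel=3)  # 8-bit RGB
print(f"4:2:2: {rate_422:.2f} MB/s, RGB: {rate_rgb:.2f} MB/s")
```

Under these assumptions the result is roughly 207 MB/s for 4:2:2 and 311 MB/s for RGB, a range that brackets the figure quoted above.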
Our first idea was to find a way to deliver as much data to the CPU in our system in as little time as possible by using a very large RAM disk. Then we could use a tool like Vegas Video or After Effects and get a huge performance upgrade. We evaluated a Tyan Thunder GC-HE motherboard with 12 gigabytes of RAM, expandable to 24 GB, and four Intel Xeon processors. We set up 10 GB as a RAM disk and moved the video files onto it.
The performance was astonishing. In After Effects and Combustion we achieved almost real-time results. The real boost in speed came when we used Vegas Video. We threw more than 50 layers at our system, and it responded in real time. Thus, for under $5,000 we were getting the type of performance associated with a $100,000 system. We edited the video in 11-minute segments in order to get the video files to fit into the RAM disk.
We also considered using a real-time editing card, such as those offered by Canopus and Matrox. The limitations with the cards we examined—Matrox RT.X100 and Canopus DV Storm 2 SE—were that they did not support real-time compositing effects at HD resolution and only accelerated specific software packages. Both cards accelerated Adobe Premiere, however, and up-sampled the video to 4:2:2 to perform hardware keying. This was significant because these cards could greatly reduce the cost of performing chromakeying operations, which were required to create the illusion of a virtual set.
After evaluating these solutions, we purchased a Matrox card, knowing that its usefulness would be restricted to the chromakeying portion of our process. With a cost of less than $1,000 for the card, this trade-off was acceptable.
For real-time compositing of HD video, we decided to purchase Aspect HD. This product allowed us to edit up to four layers of HD-quality video in real time. To operate under these constraints, we made some changes to our production process that resulted in fewer simultaneous layers of video. Each major on-screen element was composited separately and kept to four layers or fewer. These elements were then prerendered and brought into the final composition as a single layer. Thus, the final composition could have tens of layers, but only four of them would be in an unrendered state. We also decided to employ the RAM disk to assist in the compositing of elements that exceeded the limitations of Aspect HD.
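The routing rule described above can be written as a small planning function. This is a sketch under stated assumptions: the element names and layer counts are hypothetical, and the four-layer limit is the real-time limit mentioned in the text. It only shows the decision logic, not any actual tool integration.

```python
REALTIME_LAYER_LIMIT = 4  # real-time layer limit stated in the article


def plan_compositing(elements, limit=REALTIME_LAYER_LIMIT):
    """Split on-screen elements by where they would be composited.

    `elements` maps an element name to its number of video layers.
    Elements within the limit go to the real-time editor; larger ones
    go to the RAM-disk workstation. Either way, each element is
    prerendered and enters the final composition as a single layer.
    """
    plan = {"realtime": [], "ram_disk": []}
    for name, layer_count in elements.items():
        target = "realtime" if layer_count <= limit else "ram_disk"
        plan[target].append(name)
    return plan


# Hypothetical lesson with three on-screen elements.
lesson = {"instructor_key": 3, "matrix_animation": 9, "lower_third": 2}
plan = plan_compositing(lesson)
final_layers = len(lesson)  # one prerendered layer per element
```

Here `matrix_animation` exceeds the limit and is routed to the RAM-disk machine, while the other two elements stay in the real-time editor.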
To save on some of our expenses, we did not go with the Tyan motherboard. We found a Super Micro motherboard at one-third of the cost that provided up to 6 GB of RAM. Any effects that required more than six layers were composited in After Effects and Combustion on the machine with the 6 GB of RAM. Those assets were then passed over to the editor working on the machine with the Matrox RT.X100 or the editor working with Aspect HD. They still required rendering at the end of the process, however.
In the end, this solution allowed us to achieve the real-time response often associated with systems costing hundreds of thousands of dollars—for less than $6,000. Figure 2 illustrates the process employed in this project.
Figure 2
Storage and Networking
Our network and storage infrastructure was not adequate for the project. For each hour of finished video, we produced approximately 30 GB, which transferred extremely slowly over our 100-megabit network. We alleviated the problem by placing gigabit network cards into our production machines and assembling a network attached storage (NAS) device composed of twelve 250-GB drives.
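A back-of-the-envelope calculation shows why the 100-megabit network was a bottleneck. The sketch below computes the theoretical best-case transfer time for 30 GB at each link speed, using decimal gigabytes. The `efficiency` parameter is an assumption added for illustration; real transfers are slower because of protocol overhead and disk throughput.

```python
def transfer_minutes(size_gb, link_mbps, efficiency=1.0):
    """Best-case minutes to move `size_gb` (decimal GB) over a link.

    `efficiency` (0 to 1) is an illustrative factor for protocol and
    disk overhead; 1.0 means the full nominal line rate is achieved.
    """
    megabits = size_gb * 1000 * 8
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 60


fast_ethernet = transfer_minutes(30, 100)    # 100-megabit network
gigabit = transfer_minutes(30, 1000)         # gigabit network
print(f"100 Mb/s: {fast_ethernet:.0f} min, 1 Gb/s: {gigabit:.0f} min")
```

At nominal line rates, one hour of finished video takes about 40 minutes to move over the 100-megabit network and about 4 minutes over gigabit Ethernet.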
Typically, our IT department does not use home-brewed hardware because the cost of maintenance usually exceeds that of turnkey solutions. In this case, we made a compelling argument to secure permission to build our own NAS. Compared to the cost of a 3-terabyte, turnkey NAS, we saved more than $10,000 without sacrificing performance or ease of maintenance. The combination of the gigabit Ethernet solution and the home-brewed NAS reduced idle time for our animators by a factor of 11.
This project demonstrates that, using instructional design practices and high-end computer graphics compositing, we can produce instructional content that is not only pedagogically effective but also maximizes the power of video. The cost of producing graphics-rich video can be reduced by using a structured production pipeline and strategically employing the latest technological advances.
Based on our experiences, we would recommend the following to anyone undertaking a similar project:
- Use a process that minimizes the time required from the instructor. Instructors will be more willing to participate if they do not have to create media assets or spend time guiding the creation of those assets.
- Implement a process that ensures an instructionally effective script. In our case, this involved videotaping the instructor twice and having an instructional designer review the script.
- Use multi-layer video instead of 3D graphics whenever possible. This will reduce the production time.
- Strategically select and use contemporary video and computer peripherals. Often current consumer versions of video gear can outperform and cost less than professional video gear produced a year or two ago.
Using these techniques, it is possible to generate distance-learning content that is instructionally effective and maximizes the effectiveness of the video.