Telepresence in Education:

Building the Universal University

By Ramesh Jain

Sequence: Volume 32, Number 3
Release Date: May/June 1997

The ongoing revolution in computing infrastructure is having a profound effect on the entire process of education. Not only is technology being used to extend educational opportunities to populations and communities previously underserved, but it is also transforming the way and form in which learners are absorbing knowledge. Some fundamental aspects and motivations remain unchanged, however. Most of "mass" education is still built around the notion of a classroom and a laboratory. Discussion groups do play an important role in education and books are still the primary source of information. But technological innovations are rapidly revolutionizing our traditional methods of education, transforming and enhancing these basic notions of classrooms, laboratories, discussion groups and books.

Multiple Perspective Interactive Video

Interactive TV and video-on-demand have been highly touted features of multimedia systems over the past few years.

The current popularity of virtual reality and cyber-communities can be attributed largely to their strong interactive component. A powerful interactive environment can not only give viewers a feeling of being present at the event, but also the ability to view the objects and events of interest from desired viewpoints. It will soon be possible to provide highly interactive, real, immersive environments that will impart the feeling of telepresence. In fact, in addition to the feeling of being present, in many cases it will be possible for users to extract information of interest and see other information related to events, such as the opinions of other professors on the same topic, or similar moves by other dancers.

Video games are a good example of flexible interactive environments. Though a player is immersed in a "virtual environment," he has freedom to interact with the environment through an avatar. It is this interactivity that makes video games so appealing. Even in their early days, when the quality of graphics was very primitive, interactivity popularized and motivated video game sales. Compare the interactivity offered by video-on-demand and interactive movies with the interactivity offered by video games such as Super Mario on Nintendo 64. Clearly the former is limited to very simple "branching" conditions at fixed points, whereas the interactivity offered in video games enables the freedom to act at any point in time and space.


Telephony allowed us to hear and interact with people far away; television brought us pictures from a distance. By combining the interactivity of telephones, the visual environments of television, the virtual environments of cyberworld, and new kinds of information systems, it is now feasible to develop telepresence systems. This presence can be as simple as a videophone, but today's technology can deliver much more. It may be possible for us to see from a distance an event that is taking place far away and interact in that environment. The technology to implement telepresence using audio and video is almost here; incorporating tactile sensors in this environment is around the corner; and touch and smell are a distinct possibility in the early part of the coming century.

Content-Based Interactivity

In systems with a large volume of data, content-based interactivity is not only desirable, it is essential. As the amount of information grows, the human ability to remember correct information sources becomes overloaded and begins to fail. The success of databases is largely attributable to their ability to allow access to their content, based on the queries related to specific aspects of that content. On the World Wide Web also, search engines have played a major role in granting easier access to textual information. Currently, commercial tools to provide content-based access to visual information are in their early infancy.

A video or a television event can be considered a vast stream of data representing intensity value at a point in an image. This intensity value represents some physical attributes in space for the scene captured by a camera. Viewers are interested in objects, their characteristics, relationships and temporal history. A video is interesting because it provides that information.

We can also view a physical event as the evolution of spatio-temporal characteristics at a certain location. Now, as the amount of data increases, human ability to specify the location decreases. Thus, a system that will provide facilities to specify objects and events and will return or retrieve data corresponding to those will be much more interesting and useful to humans.

Gestalt Vision

At any given time, we can only see the world, or the environment, from one perspective. To acquire other perspectives, we must move our eyes. To explore the environment from other viewpoints, we have to physically move. When we view the environment from one perspective, we are limited to what one may call tunnel vision or more precisely, considering the nature of image formation processing, "funnel vision." Remember the famous fable about the six blind men and an elephant? Cameras have similar limitations. Thus, when a scene is captured using only one camera, the perspective is limited. One could obtain more information about the environment by panning and tilting the camera so that one could see a complete view from one position. QuickTime VR has attracted attention by providing a mechanism to record these scenes from one position and then allowing a user to view the scene in any direction, but from this viewpoint. Similar efforts are being made in many research groups by taking multiple images of a scene and then using software to merge these images to provide a larger picture than is possible from any single camera view.

Using a powerful information system to mediate between viewers and multiple cameras, it is possible to provide Gestalt Vision, which is more than is possible using any individual camera. Gestalt Vision provides a holistic view by combining localized views. A viewer then can see the scene from any position and may walk through a dynamic scene without disturbing the events in the scene.

Architecture of Multiple Perspective Interactive Video

Content-based interactivity and Gestalt Vision can be implemented by combining the tools and techniques currently being developed in different disciplines of computer science. In many applications, such as sports broadcasting, traffic monitoring and visual surveillance, multiple cameras are placed at strategically selected points to provide an operator a global view of events. In all these applications, different camera shots are fed to one location and displayed. In a broadcast application, one of these views is selected by the editor or producer of the program to be broadcast to consumers. Clearly, this is intelligent multiplexing where the operator plays the role of the intelligent multiplexor.

Using evolving information systems and the delivery mechanisms created by the network infrastructure commonly available now, it is possible to develop systems that allow content-based interactivity and Gestalt Vision by strategically placing multiple cameras in an environment of interest. The image stream from each camera is processed to extract task-dependent information and is fed to an information system, called an Environmental Model. The Environmental Model assimilates information received from all cameras into one information system, which represents that information at multiple levels of abstraction.
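The assimilation step described above can be illustrated with a minimal sketch. It is not the actual Environmental Model; the class and field names (Observation, EnvironmentModel, position_at) are hypothetical, and the fusion rule (averaging positions reported within a short time window) is an assumption chosen only to show how observations from several cameras might be merged into one queryable store with a raw level and a fused level.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One task-dependent fact extracted from a single camera's image stream."""
    camera_id: str
    timestamp: float   # seconds since the start of the event
    object_id: str     # hypothetical label, e.g. "instructor"
    position: tuple    # (x, y) in a shared room coordinate frame (assumed)


class EnvironmentModel:
    """Hypothetical store that merges observations from all cameras."""

    def __init__(self):
        self._by_object = defaultdict(list)

    def assimilate(self, obs):
        """Add one camera observation to the shared model."""
        self._by_object[obs.object_id].append(obs)

    def track(self, object_id):
        """Lower abstraction level: every raw observation, ordered by time."""
        return sorted(self._by_object.get(object_id, []),
                      key=lambda o: o.timestamp)

    def position_at(self, object_id, timestamp, tolerance=0.5):
        """Higher abstraction level: one fused position near `timestamp`.

        Averages the positions reported by any camera within `tolerance`
        seconds; returns None when no camera saw the object then.
        """
        nearby = [o.position for o in self._by_object.get(object_id, [])
                  if abs(o.timestamp - timestamp) <= tolerance]
        if not nearby:
            return None
        n = len(nearby)
        return (sum(p[0] for p in nearby) / n, sum(p[1] for p in nearby) / n)
```

In this sketch a query names an object and a time rather than a camera, which is the sense in which access is based on content instead of on the data stream.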

This information system offers two major facilities. A user can interact with the information system at many different levels of information abstraction, and select the visualization mode in which to view the desired information. Also, a user can view any information of interest from any viewpoint of interest. Thus, the human multiplexor is removed and the user becomes the producer of information. Another major advantage is that the Environmental Model can be used by several users to view different information at the same time. Since the Environmental Model is an information system, it can be designed to reside at one or multiple locations and satisfy the information or entertainment needs of a diverse group of users at the same time.
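As a rough illustration of how the human multiplexor might be replaced, the sketch below lets each user name a desired viewpoint and picks the physical camera nearest to it. This is a deliberate simplification and an assumption: a full system could combine several camera views rather than select one. The function name nearest_camera and the camera layout are hypothetical.

```python
import math


def nearest_camera(cameras, viewpoint):
    """Return the name of the camera closest to the requested viewpoint.

    cameras   -- dict mapping camera name to its (x, y, z) position
    viewpoint -- (x, y, z) position from which the user wants to look
    """
    if not cameras:
        raise ValueError("no cameras available")
    return min(cameras, key=lambda name: math.dist(cameras[name], viewpoint))


# Hypothetical classroom layout, in meters.
cameras = {"front": (0, 0, 2), "back": (10, 0, 2), "side": (5, 6, 2)}

# Two users ask for different viewpoints at the same time;
# each gets an independent answer from the same shared layout.
view_for_user_a = nearest_camera(cameras, (1, 1, 1.5))
view_for_user_b = nearest_camera(cameras, (9, 1, 1.5))
```

Because the selection is a pure function of the shared camera layout and the individual request, any number of users can be served at once without affecting one another.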

Multiple Perspective Interactive Video provides a framework for the management of and interactive access to multiple streams of data capturing different perspectives of an event. It has strong database and hypermedia components that allow a user to interact with live events and browse the underlying database for similar or related events or to construct interesting queries.

Impact of Multiple Perspective Interactive Video on the Classroom

Classroom education is one of the most popular methods of instruction, from the elementary school to the graduate school. The attractive aspect of this method is that a learned person is presenting, sometimes in a very entertaining way, important material to an audience. The audience is free to ask questions and receive clarifications. The presenter, usually called instructor, uses several media to explain complex concepts. Even before the days of the popular multimedia technology, it was common to use physical models, speech, blackboard and other appropriate support tools. Classroom education has been a popular, efficient and reasonably effective method of education.

It is well-known that the effectiveness of classroom teaching diminishes in proportion to the number of students in a class. Once a class size reaches more than about 25 students, several deteriorations occur. The interactivity in class decreases dramatically and the instructor loses any personal touch with the class. Not every teacher or professor possesses the talent to explain the material well. And with other pressures mounting, many instructors either are not experts in their class material, are not good performers in class, or are not interested in teaching.

A traditional class meets at a particular time at a particular place, creating an essential requirement that all students and the instructor be collocated during the specified period. This is the most serious limitation inherent in the classroom model. By removing this requirement, all other limitations can be largely eliminated. This can be accomplished by applying Multiple Perspective Interactive techniques to a classroom. Let us call such a classroom an MPI classroom.

MPI Classroom

Consider the following scenario: A normal classroom can seat about 30 students. This room is equipped with strategically placed cameras to allow viewing of the instructor, to zoom in on material that the instructor may want to explain, to display the board, and to show all students in the class. There are enough cameras to allow close viewing of all activities in the classroom. The board itself may be an electronic board that functions as a computer display as well as a tablet used for input to a computer.

The Multiple Perspective Interactive system allows interactive viewing of the classroom, and works in both live and archival modes. The live mode allows any person anywhere to be telepresent in the class and participate in it. This telepresence is very different from the current video-based systems. The Multiple Perspective Interactive system removes several limitations of the current systems and provides a feeling of actually being there and participating in all activities.

The most important feature of the MPI classroom is that you have freedom to observe what you want to rather than relying on what the camera operator wants you to see. In fact, you have more access to activities than the students actually situated in the MPI classroom. In a real classroom, you have a fixed location and you are not free to move around without disturbing the class. In the MPI classroom you are free to change your position as many times as is convenient to you without disturbing anybody. This freedom helps in making a remote student part of the classroom. A student can be observing reactions of other students physically present in the classroom while the instructor is explaining some complex concepts. A remote student can electronically copy the contents of the classroom material, including what is displayed on the board. In fact, a student may give instructions to automatically store everything that is displayed on the board, and concentrate fully on understanding the material being presented. A remote student can ask questions in class at any time and his or her video may be displayed to the class any time. In fact, it is possible to display all remote students in the form of a video gallery to provide feedback to the instructor.

All activities in the MPI classroom can be archived for later use. The system can reconstruct the proceedings of the class in Multiple Perspective Interactive mode to allow a student to visit, or revisit, the class from any viewpoint. In archival mode, some post-processing can be done to further enhance certain features, as may be required. Many other information sources can be attached to the class to provide all information at one place.

Several utilities may be added in the archival mode to enhance its effectiveness. For each class, a list of frequently asked questions (FAQs) can be compiled and the lectures annotated with these questions, as if the instructor were answering them in real time. A student could ask a question in this class and the system would go back to the FAQ list to answer the question. If the question is not on the list, it can be e-mailed to the instructor with a reply automatically dispatched to the student asking, "Can I give you this information after the class?" A summary of the class along with the table of contents can also be provided to students. This will allow them to visit the parts of the class that they consider important. Also, they have the option of visiting only the part that they need to, very quickly. Another important thing is that the archival system can also be used by regular students to review the class.
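The FAQ lookup with its fallback reply can be sketched as follows. The matching rule (word overlap between the student's question and each stored question, with a 0.5 threshold) is an assumption made for illustration, and the function name answer_question is hypothetical. The sketch only reports whether a match was found; actually e-mailing the instructor is left to the caller.

```python
import re

FALLBACK_REPLY = "Can I give you this information after the class?"


def _words(text):
    """Lowercased set of alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_question(question, faq, min_overlap=0.5):
    """Look up `question` in `faq` (a dict of question -> answer).

    Returns (answer, True) when a stored question is similar enough,
    otherwise (FALLBACK_REPLY, False) so the caller can forward the
    question to the instructor.
    """
    asked = _words(question)
    best_answer, best_score = None, 0.0
    for known, answer in faq.items():
        known_words = _words(known)
        union = asked | known_words
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(asked & known_words) / len(union) if union else 0.0
        if score > best_score:
            best_answer, best_score = answer, score
    if best_answer is not None and best_score >= min_overlap:
        return best_answer, True
    return FALLBACK_REPLY, False


# Hypothetical FAQ compiled for one archived lecture.
faq = {
    "What is Gestalt Vision?":
        "A holistic view built by combining localized camera views.",
}
```

A question that closely matches a stored one returns the recorded answer; anything else returns the holding reply together with a flag indicating that the instructor still needs to respond.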

Another interesting implication of using MPI video technology may be that classes given by an excellent instructor on the topic will become timeless. Thus, lectures by an excellent instructor will be available to students to review even after the lecturer is no longer alive.

MPI Laboratory

Laboratories are important parts of science, medicine and engineering education. Even in other fields, many different forms of laboratories exist. Laboratories require physical access to a facility. Traditionally this facility was located at a particular place that invested in equipment. In many cases, the difference between a good or bad education boils down to variations in the quality of laboratory facilities.

High-quality laboratories are very expensive to build and in many cases are used only part of the time. If they could be shared among several universities, then the quality of facilities available to a large number of students could be enhanced without much, if any, increase in the overall cost. Supercomputing centers are a good example of providing mass access effectively and economically. This concept could be popularized using the Multiple Perspective Interactive approach.

There are two main types of activities that can be performed remotely. In one class of activities, a student may just want to observe things. Consider the case of a process or experiment in which a student needs to observe something from different viewpoints. Suppose that a famous surgery professor is performing an intricate procedure and many students want to observe the operation. Many of the students may be interested in different aspects of the operation. Clearly, not all students can be present in an operating theater, because the situation does not allow it and because the cost and logistics of visiting an expert located thousands of miles away are impractical.

In such situations, multiple cameras and other types of sensors (related to the task) can provide input to an Environmental Model to build the Gestalt Vision of the situation. The database representing the whole situation may also be linked to several other databases that could provide related information such as a particular patient's history, a surgeon's classification, information related to the patient's condition, and any other pertinent material. A student could then control the viewpoint to see the situation from the desired perspective, and could also access other measurements or signals to understand precisely what is being done. In fact, students may even pretend that they are the surgeon and record their own actions and then compare their decisions with those made by the surgeon.

To allow more interaction and conduct experiments personally, it may be possible to combine tele-robotics with Multiple Perspective Interactive-based technology. By integrating robotic platforms and manipulators, it is possible to perform several experiments remotely; it is also possible to put several devices and sensors on the Web and use them for different experiments. By improving the interfaces to control sensors and devices it will soon be possible to design laboratories that will allow a student in San Diego to run an experiment in a lab in Ann Arbor, or even a lab at the North Pole.

Such laboratories will enhance education significantly because students can experience high-quality research environments, independent of their own geographic location. Another major result will be the proliferation of "natural laboratory environments" by organizing virtual visits. To observe the force of falling water, students in Des Moines, Iowa, can see Niagara Falls from different perspectives, or to experience high mountains, they can travel virtually to the top of Mount Everest and take a stroll, while still in their virtual reality cave in Des Moines. Clearly, these natural laboratories will allow students to explore many different environments ranging from natural environments to completely controlled, well-equipped laboratories.

MPI classrooms and MPI laboratories can be implemented in the very near future. I believe that Multiple Perspective Interactive techniques will have a significant impact on the way discussion groups are conducted. People will be able to participate in virtual discussions that will combine Multiple Perspective Interactive video with "chat rooms" to provide very realistic virtual discussion groups. Multiple Perspective Interactive techniques will also result in a new generation of "books" that will contain current real world examples. Thus, in place of showing a stale picture of a bazaar in Bangkok, you will be able to experience it at different times of the day or night, as it is at that moment.

In the last few years, educators and policy makers have begun envisioning "Universal Universities." Multiple Perspective Interactive video technology will be key in realizing these universal universities of the future.

Ramesh Jain is professor of electrical and computer engineering and computer science and engineering, director of the Visual Computing Laboratory at the University of California, San Diego, and editor-in-chief of IEEE Multimedia.
