Body Language
How Virtual Reality Will Help Reduce Social Distance
Virtual reality will help solve the eye-contact problem in teleconferences.
Posted March 29, 2020
As we adjust to a different life during the coronavirus pandemic, we are all grappling with how to remain connected with each other. Shelter-in-place and stay-at-home orders have made typical face-to-face socialization a thing of the past—at least for the foreseeable future. Instead, many of us are resorting to teleconferencing technologies, like Zoom, Skype, Google Hangouts, and FaceTime, to keep us connected and reduce the effects of social distancing.
Last week, I attended a Zoom-based wedding reception with about a dozen close friends living in New York, Boston, Chicago, California, and Sweden. We all celebrated the newlyweds and spent over an hour exchanging stories, sharing updates, and laughing together. The ability to see everyone's faces in those little Zoom windows made the interaction much more personal than it would have been over the phone, and afterward, we all felt like we had spent some quality time with each other. However, as grateful as we were for the teleconferencing technology available to us, it also became clear that current platforms like Zoom are missing two important aspects of group communication: (1) the ability to make mutual eye contact, and (2) the ability to initiate multiple conversation threads. These are both issues that virtual reality is well poised to solve.
Achieving mutual eye contact
The first problem in group teleconferencing is the lack of mutual eye contact. As anyone who has used Skype or FaceTime knows, it is practically impossible to have true person-to-person eye contact in these systems. The reason has to do with the geometry of webcams and screens: With the typical placement of the webcam above the screen, when a user looks anywhere on the screen, they appear to other users to be looking down, below the perspective of the camera. For other users, the *appearance* of eye contact can be simulated intermittently, if and when the original user looks directly at the camera. But in those moments, that user cannot simultaneously look at the screen where the faces are, so from their own perspective, no eye contact is happening.
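The size of this effect is simple trigonometry: the apparent downward gaze angle depends on how far the webcam sits above the point on screen the user is looking at, relative to how far the user sits from the screen. A minimal sketch, using illustrative numbers (the 10 cm offset and 50 cm viewing distance are assumptions, not measurements):

```python
import math

def apparent_gaze_offset_deg(camera_offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle (degrees) by which a user appears to look *below* the camera
    when fixating a point on screen directly beneath the webcam."""
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))

# A webcam 10 cm above the on-screen face, viewed from 50 cm away,
# yields roughly an 11-degree downward offset -- easily noticeable.
angle = apparent_gaze_offset_deg(10, 50)  # ≈ 11.3
```

Even a few degrees of offset is enough for viewers to judge that someone is not looking at them, which is why the problem persists across devices.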
Navigating multiple conversation threads
The second problem in group teleconferencing, which is ultimately related to the first problem, is the difficulty of initiating and navigating multiple conversation threads. In a real-life group setting, subsets of participants can temporarily break off from the main group conversation and have an exchange before returning to the main conversation. These independent exchanges can be achieved with nonverbal cues, such as turning one's body to face an addressee or initiating eye contact.
Technologies such as Zoom do not allow for this type of person-to-person signaling, because nobody knows who anyone else is looking at. Of course, Zoom does provide alternative solutions: For example, one can "raise a hand" to request the floor, or a facilitator can create "breakout groups" to allow private chats among subsets of the group. However, these solutions are unnatural and cumbersome. They don't approach the quick, effortless action of turning to a person and addressing them directly, which happens naturally in face-to-face contexts.
Fortunately, advances in virtual reality technology promise to solve both of these problems. First, rather than representing each person as a static box somewhere on the screen, each person in a virtual group is represented by an avatar, a dynamic, three-dimensional character that co-exists in the same virtual environment as everyone else in the group. Each person's avatar has a virtual physical location, so if I turn to my right to address someone, that person will see my avatar turn toward them; they will know they are being addressed. Platforms like Facebook Horizon (oculus.com/facebookhorizon) and AltspaceVR (altvr.com) give users a range of choices of environments and possibilities for interaction.
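Because every avatar has a position and a facing direction in the shared space, the system can infer who is addressing whom with basic geometry. A minimal sketch of that idea (the names, coordinates, and 30-degree cone are all hypothetical choices, not any platform's actual API):

```python
import math

def addressee(my_pos, my_facing_deg, others):
    """Return the participant my avatar is most directly facing,
    or None if nobody falls within a 30-degree cone of my heading.
    Positions are (x, y) coordinates in the virtual room."""
    best, best_err = None, 30.0
    for name, (ox, oy) in others.items():
        bearing = math.degrees(math.atan2(oy - my_pos[1], ox - my_pos[0]))
        # Smallest signed difference between my heading and the bearing.
        err = abs((bearing - my_facing_deg + 180) % 360 - 180)
        if err < best_err:
            best, best_err = name, err
    return best

# Facing due "east" (0 degrees), with one friend ahead and one behind:
who = addressee((0, 0), 0.0, {"Ana": (2, 0.1), "Ben": (-2, 0)})  # → "Ana"
```

This is exactly the signal that a grid of static video boxes cannot provide: in Zoom, my gaze direction carries no information about which box I am looking at.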
Furthermore, virtual avatars can mimic the eye movements and facial expressions of the participant, including simulating mutual eye contact. For example, headsets such as the HTC Vive Pro Eye now incorporate inward-facing cameras and eye-tracking technology that can measure the precise direction of the user's gaze dozens of times per second. Once the gaze direction is measured, it can be rendered almost instantly on the user's avatar, where other users can see and interpret it.
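The mapping step itself is straightforward: copy the tracked gaze angles onto the avatar's eyes each frame, clamped to a plausible human range. A minimal sketch with hypothetical names (the `AvatarEyes` stand-in and the 35-degree limit are assumptions for illustration, not a real engine's API):

```python
class AvatarEyes:
    """Minimal stand-in for a rendering engine's eye joints (illustrative)."""
    def __init__(self):
        self.yaw = 0.0    # degrees, left/right
        self.pitch = 0.0  # degrees, up/down

    def set_rotation(self, yaw: float, pitch: float):
        self.yaw, self.pitch = yaw, pitch

def apply_tracked_gaze(eyes: AvatarEyes, yaw: float, pitch: float, limit: float = 35.0):
    """Mirror a measured gaze direction (degrees) onto the avatar's eyes,
    clamping to an assumed 35-degree range so noisy tracker readings
    never produce anatomically impossible eye poses."""
    eyes.set_rotation(max(-limit, min(limit, yaw)),
                      max(-limit, min(limit, pitch)))
```

Run at the tracker's sampling rate, this loop is what lets two avatars appear to lock eyes: each headset measures where its wearer is looking, and each avatar re-enacts it.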
Although virtual reality is not yet as popular as the teleconferencing technologies that have become so ubiquitous during these weeks of social distancing, it is not far behind. Soon, VR will allow us to have more naturalistic interactions in which we can use natural body language and eye movements to signal the beginning and end of conversational turns.
Until then, we will need to adjust to these limitations and be more deliberate about when we want to say something, and who we are saying it to.