As hiccups go, NBC’s attempt to live stream virtual reality coverage of some Winter Olympics events appears to have been a pretty big one, but in light of what’s in store with VR streaming it’s not likely to be more than a cautionary tale about how not to do things.
While more glitches can be expected at this early stage of VR programming, commitments to ongoing live sports coverage worldwide mark a significant jump beyond what’s been done so far, ensuring the ball will keep rolling as broadcasters and service providers adjust to advances that promise to expand the appeal of VR-related viewing experiences to ever larger audiences. Notably, growing adoption of what is known as tiling technology could be a game changer, affecting how VR-captured events are delivered for viewing with and without head-mounted displays (HMDs).
As described by Rob Koenen, co-founder and chief business officer of Tiledmedia and president of the Virtual Reality Industry Forum (VRIF), tiling is a function supported by MPEG’s HEVC (High Efficiency Video Coding) Main 10 profile. It provides a means by which the transmitted “viewport,” encompassing what a viewer sees at any instant in time, is filled in with degrees of resolution mapped to how the eye registers fields of vision in real life.
Tiledmedia’s ClearVR adaptation of tiling for VR applications, whose latest iterations were introduced at this year’s CES, “is ready for prime time,” Koenen says. Over the past year Tiledmedia has participated in a variety of trade show demonstrations with the likes of Sky, Harmonic, Viaccess-Orca, Ericsson and others.
ClearVR, now undergoing testing by service providers and broadcasters, has found its way into encoding, chipset and VR platform solutions from a growing number of suppliers. It’s unclear to what extent the industry will rely on Tiledmedia’s solution, but there’s little doubt tiling will be widely used.
“Investing in VR 360-degree experiences now has the potential to pay off,” says Harmonic vice president Thierry Fautier in a recent blog post. “With advances in tiling technology, VR 360 video is ready for trials and is expected to become a market reality this year. This is the beginning of something big.”
Amplifying on the point in an article posted last summer by CED, Fautier notes that network delivery of VR content “has not been practicable until recently, because it required very high bandwidth (at least 15 Mbps to 20 Mbps) while providing poor video quality.” Tiling resolves those issues by “reducing bandwidth requirements by an order of magnitude and enabling fully immersive UHD VR to be enjoyed by viewers on legacy head-mounted displays (HMDs).”
Reaction To VR Olympics Coverage
It remains to be seen whether Intel is ready to make use of tiling with upcoming live VR engagements. But there are plenty of inducements, judging from the reception accorded performance of Intel’s True VR platform with the Winter Olympics, not to mention the inclusion of tiling in guidelines recently issued by the VRIF Distribution Task Force co-chaired by Intel senior standardization manager Ozgur Oyman.
As reflected in an outpouring of commentary from news and consumer technology outlets and from users, the Olympics VR coverage supported by Intel’s True VR platform left a lot to be desired. Many observers expressed appreciation for some aspects of the immersive viewing experience Intel, NBC and the Olympic Broadcasting Services provided, but most reactions to live streaming of 50+ hours of events such as alpine skiing, snowboarding, ice hockey, curling and speed and figure skating were lukewarm at best.
According to Sports Business Daily, the NBC Sports VR app garnered overall ratings of 1.7 stars out of 5 in both the Apple App Store and the Google Play store. “In both cases, one-star reviews outnumbered all others combined,” the report said.
Echoing individual users’ assessments, MIT Technology Review’s senior editor for mobile Rachel Metz offered this reaction: “The first thing I realized was that while the resolution of virtual-reality videos appears to be better than it was at the 2016 Summer Olympics, it’s still pretty terrible.” While the immersive experience offered views not available on regular feeds, “faces were impossible to pick out unless I was watching in a special ‘VR Cast’ mode that automatically decided which VR camera view to show me; it included a virtual big-screen TV displaying close-ups of the athletes. To make matters worse, the stream regularly cut out,” she said.
Poor image quality and less-than-optimal vantages on fast-moving athletes were recurring themes. “Ice skating was abysmal in NBC and Intel’s VR because of the poorly placed cameras and the necessarily wide view,” wrote Engadget associate editor Steve Dent. “The latter made it hard at times to tell if there was even a skater on the rink.”
Dent added: “Ski jumping, snowboard halfpipe, bobsled and luge were also not great, because the athletes move by the cameras too quickly to see much. Again, the frame rates and resolution can’t keep up, so sometimes the athletes are literally just a blur.”
While, as Sports Business Daily put it, the reasons for the negative reviews “appear to run the gamut, from the design of the app itself to the quality of the 3-D video that comprised the content,” many of the issues cited by reviewers and users would be less problematic, if not non-existent, with use of tiling. This is why tiling is a key component of the guidelines for VR distribution presented by VRIF leaders at CES in January.
VRIF Tiling Guidelines
This first set of specifications focuses on best ways to apply existing industry standards for creating and distributing VR content in on-demand mode with three degrees of freedom (3DoF). The next round of guidelines slated for completion by year’s end will provide recommendations for live distribution with 6DoF, among other things, officials said. (In 3DoF mode the user can look around a statically rendered 360° or 180° space – turning to the left or right, looking up and down and tilting side to side, whereas with 6DoF the user can do all these things while moving around to shift the point of view within the captured space.)
Intel, by setting up multiple locations for camera clusters at each covered event, made it possible for users to switch from one cluster’s field of vision (FOV) to the next, but they were limited to 3DoF within each 180° panoramic view generated from each location. While 6DoF is commonly used in gaming and other forms of stored VR content, to date live VR programming has been delivered with 3DoF.
While tiling can be used advantageously in on-demand as well as live scenarios involving 3DoF or 6DoF, support for 6DoF in live event streaming is where the technology is set to play a major role. In addition, tiling will enable new approaches to viewing live VR content without as well as with HMDs.
As described at CES by VRIF Distribution Task Force co-chair Ozgur Oyman, the forum’s current guidelines support two approaches to delivering VR content – viewport independent and viewport dependent. Viewport-independent transmission occurs when the entire field captured by clustered or new 360° or 180° single-lens cameras is transmitted to each viewer, as in the case of the Olympics coverage.
In order to minimize the amount of bandwidth required in viewport-independent transmissions, providers load the entire FOV incrementally into buffers, creating a slight delay before the viewer sees the captured scene. The ongoing amount of bandwidth consumed during the viewing session is reduced by the traditional approaches to video compression, where only motion-induced changes in the scene, such as what’s happening on the field of play and in the stands, need to be transmitted.
Nonetheless, the need to deliver the full FOV with minimal delays imposes a bit-load penalty that produces compromises in overall video quality. This is exacerbated by the fact that a certain amount of bandwidth is unnecessarily consumed to transmit changes in the full FOV that are not within the user’s viewport at any moment in time.
In contrast, “tiling allows you to deliver the content within the viewport at higher levels of resolution while using lower resolution for the remainder,” Oyman said. “When the viewer looks in another direction, the client can fetch tiles at the higher resolution matched to the new viewport.”
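To make the viewport-dependent idea concrete, here is a minimal Python sketch of how a client might decide which tiles to request at high resolution. It assumes an equirectangular panorama cut into a simple rectangular grid and ignores the distortion that occurs near the poles. The function names, the 8-by-4 grid and the 90-degree field of view are illustrative assumptions, not values taken from the VRIF guidelines, OMAF or any vendor's implementation.

```python
def angular_diff(a_deg, b_deg):
    """Smallest signed difference between two angles, in degrees."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def select_tile_qualities(yaw_deg, pitch_deg,
                          h_fov_deg=90.0, v_fov_deg=90.0,
                          cols=8, rows=4):
    """Map each (row, col) tile to "high" or "low" for a given view direction.

    Hypothetical layout: yaw runs from -180 to 180 degrees across the columns,
    pitch runs from +90 (top row) to -90 (bottom row).
    """
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows
    qualities = {}
    for r in range(rows):
        for c in range(cols):
            center_yaw = -180.0 + (c + 0.5) * tile_w
            center_pitch = 90.0 - (r + 0.5) * tile_h
            # A tile overlaps the viewport when the two centers are closer
            # than half the viewport size plus half the tile size.
            d_yaw = abs(angular_diff(center_yaw, yaw_deg))  # wraps at +/-180
            d_pitch = abs(center_pitch - pitch_deg)
            overlaps = (d_yaw < (h_fov_deg + tile_w) / 2 and
                        d_pitch < (v_fov_deg + tile_h) / 2)
            qualities[(r, c)] = "high" if overlaps else "low"
    return qualities


if __name__ == "__main__":
    q = select_tile_qualities(yaw_deg=0.0, pitch_deg=0.0)
    high = sorted(tile for tile, level in q.items() if level == "high")
    print(len(high), "of", len(q), "tiles requested at high resolution")
```

Under these assumed numbers, a viewer looking straight ahead needs only 4 of the 32 tiles at high resolution, which gives a rough feel for where the bandwidth savings come from. A real client would also have to account for head-motion prediction and the projection format in use.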
Challenges And Performance Results
But there are a lot of conditions that need to be met to make this possible, including reduction in latency related to transmission and client decoding. “And,” he added, “tiling also depends on advanced CDN capabilities.”
But these are capabilities any up-to-date CDN can support. Tiling-based VR streams “can be delivered over existing CDN infrastructure using very well-known protocols,” Oyman said, pointing to how protocol stacking in the MPEG DASH (Dynamic Adaptive Streaming over HTTP) streaming platform can be applied with VR-optimized extensions encapsulated in MPEG’s Omnidirectional Media Application Format (OMAF).
As for the need to reduce latency to the point where a viewer can’t tell any difference between the real-world and the VR experience of looking around, applications over typical networks demonstrate this is doable with the help of a low resolution backdrop of the full FOV. “Since it takes approximately 20 to 40 ms to retrieve tiles from the network under good network conditions, precautions are needed to prevent black areas or picture freezes appearing in the HMD when users turn their head,” Fautier notes.
“The ideal solution is to use an extra layer at a much lower resolution of the entire panorama,” he continues. “When a user moves the attention to a different part of the panorama, this ‘fallback’ layer ensures that there are no black holes for the 20 ms to 40 ms it takes for the high-resolution tiles to arrive.”
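The fallback behavior Fautier describes can be sketched in a few lines of Python. This is a simplified model under stated assumptions: every high-resolution tile is assumed to arrive a fixed time after it is requested (30 ms here, the midpoint of the range quoted above), and the low-resolution panorama is assumed to be always available. The function and variable names are hypothetical and do not reflect any particular player.

```python
def layers_at(render_time_ms, request_times_ms, fetch_latency_ms=30.0):
    """Decide, per visible tile, which layer the renderer can show right now.

    request_times_ms maps a tile id to the time its high-resolution version
    was requested. A tile that has not yet arrived is drawn from the
    low-resolution fallback panorama instead of being left black.
    """
    return {
        tile: "high" if render_time_ms >= requested + fetch_latency_ms
        else "fallback"
        for tile, requested in request_times_ms.items()
    }


if __name__ == "__main__":
    # Tile "A" was already in view; tile "B" came into view at t = 100 ms
    # when the viewer turned their head.
    requests = {"A": 0.0, "B": 100.0}
    print(layers_at(110.0, requests))  # B still served by the fallback layer
    print(layers_at(130.0, requests))  # B's high-resolution tile has arrived
```

The point of the design is that the viewer briefly sees a softer image rather than a hole in the picture; the upgrade to full resolution happens once the tile lands.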
The ramifications of what tiling brings to the VR experience were demonstrated at CES following the VRIF workshop. In one demonstration utilizing Tiledmedia’s technology and Akamai’s VR-optimized CDN facilities, a VR clip accessed through a hotel Wi-Fi network provided a seamless viewing experience across the captured space with no discernable latency no matter how fast the viewer looked in different directions.
Another demonstration, using content stored on a local server a couple of miles away, offered the first public viewing of a fully immersive 6DoF, high-resolution VR implementation conforming to the VRIF specifications. The viewport-dependent transmission of a mountain-climbing scene delivered clear stereoscopic views of all elements at all distances instantaneously with rapid head movement in all directions.
These experiences comport with what Harmonic has found in extensive testing of tile-based VR encoding. “VR tiling technology is being tested and is used effectively in the real world,” Fautier says in his CED blog. “At the 2017 NAB Show, Harmonic teamed up with the Blue Man Group to showcase 360-degree VR video produced in 8K using Tiledmedia’s…technology integrated with the Harmonic PURE Compression Engine, Viaccess-Orca’s Connected Sentinel Player for secure playback, and Samsung Gear VR headsets.”
Video publishing platform provider Bitmovin is another entity preparing to adopt tiling in its encoding processes, says Tanya Vernitsky, senior product marketing manager at the firm. In a recent posting describing a paper delivered by Bitmovin executives at the IEEE International Conference on Image Processing in Beijing last year, Vernitsky says the researchers analyzed adaptive bitrate streaming of VR and 360° video over HTTP and described “the use of tiles, as specified within modern video codecs, such as HEVC/H.265 and VP9, to recognize bitrate savings of 40-65%.”
In another paper delivered at the Beijing conference, researchers from Trinity College in Dublin reported similar findings. “Tiling and adaptive streaming enable the proposed system to deliver very high-resolution 360° video at good visual quality,” they wrote. “Further, the proposed viewport-aware bitrate assignment selects an optimum DASH representation for each tile in a viewport-aware manner. The quality performance of the proposed system is verified in simulations with varying network bandwidth using realistic view trajectories recorded from user experiments. Our results show that the proposed streaming system compares favorably compared to existing methods in terms of PSNR [peak signal-to-noise ratio] and SSIM [structural similarity index measure] inside the viewport.”
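For readers unfamiliar with the metric, PSNR is conventionally computed as 10·log10(MAX²/MSE), where MSE is the mean squared error between reference and decoded pixels and MAX is the largest possible pixel value. The Python sketch below restricts that calculation to a rectangular viewport region. It is an illustration of the standard formula only; the rectangular crop, the 8-bit pixel range and the function name are assumptions, not the Trinity College researchers' evaluation method.

```python
import math


def viewport_psnr(reference, decoded, top, left, height, width, max_value=255):
    """PSNR in dB over one rectangular region of two equal-sized images.

    Images are lists of rows of pixel intensities. Pixels outside the
    region are ignored, mirroring the idea of measuring quality only
    where the viewer is looking.
    """
    squared_error = 0.0
    for r in range(top, top + height):
        for c in range(left, left + width):
            diff = reference[r][c] - decoded[r][c]
            squared_error += diff * diff
    mse = squared_error / (height * width)
    if mse == 0:
        return math.inf  # identical pixels inside the viewport
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [[100] * 4 for _ in range(4)]
    dec = [row[:] for row in ref]
    dec[0][0] = 0  # heavy error, but outside the viewport measured below
    print(viewport_psnr(ref, dec, top=1, left=1, height=2, width=2))
```

Because the error in this toy example lies outside the measured region, the viewport score is unaffected, which is the property that makes a viewport-only metric useful for judging tiled streams.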
Live VR Prospects
How fast tiling enters the live VR streaming arena remains to be seen, but it looks like it won’t be long in coming. Tiledmedia’s Koenen says his company is supporting trials with a major U.S. telco and others abroad, including one of the largest in Europe, and has deals pending with a large digital media platform provider and a Hollywood-based cloud platform start-up. Both Sky and BT in the U.K. have acknowledged they are exploring tiling with Tiledmedia.
Koenen acknowledges he was surprised to learn recently that one potential customer is weighing launch of a VR service using tiling that would be subscription-based. “This might come to light at NAB,” he says. “It would be a major demonstration of confidence in the appeal of VR services.”
Meanwhile, the pace of VR penetration into live sports venues continues to accelerate. NCAA’s March Madness, the NBA and MLB have committed to an expanded schedule of live broadcasts this year while the NFL indicates it’s exploring where to take the technology beyond the limited on-demand replay applications it offered last year. The technology is showing up in golf tournaments, NASCAR races, tennis, soccer and other sports as well.
And tiling might not be the only way distributors find to improve on performance. NextVR, which has led in live sports VRcasts, including regular NBA coverage, has promised to introduce 6DoF with its approach to live delivery, which is not tile-based but promises a high-resolution, latency-free experience as users look in different directions and move to different vantages for a better view of what’s happening.
“Producing VR content with 6DoF will deliver the most immersive experiences for fans,” says Danny Keens, NextVR’s vice president of content. “Live broadcasts are a point of differentiation for NextVR, and these new technical introductions to our VR technology platform will completely redefine live experiences for our fans and partners alike.”
The advantage for tiling rests in part at least on the fact that it is a function intrinsic to HEVC encoding. Koenen also points to the 2D part of the equation, which has been of particular interest to Ericsson, as shown in tablet-based demonstrations of non-immersive 360° viewing. In this application users are able to view and zoom in on whatever interests them across the entire 360° FOV.
In fact, this type of viewing experience was the initial target of the Tiledmedia technology, which was incubated at the Dutch national research institute TNO. “What we’re doing with Ericsson involves use of the technology for a second-screen app where, in sync with the main broadcast, users can use their tablets or smartphones to choose what they want to see,” Koenen says. “This allows people to view an event together on the TV while having an individual viewing experience that is very compelling.”
Indeed, while interest in HMD-based immersive experiences has dominated use of Tiledmedia’s technology so far, the prospect of having an unscripted, user-driven full-field view of what’s going on creates an opportunity with mass market potential well beyond the small population of people who own HMDs. Insofar as the tiling encoding process for a given event can deliver both the 6DoF and 2D 360° viewing experience, it seems inevitable that the 2D application will be widely available to everyone in the months ahead.
“VR tiling technology supports both VOD and live content, a stark contrast with other approaches, and the full panorama only has to be encoded once compared with other approaches that need to encode the panorama up to 30 times (once for each viewport),” Fautier says. “This makes VR tiling an inexpensive and easy to deploy solution. Even better, VR tiling works with HMDs as well as flat screens such as tablets, phones, and even set-top boxes.”