Just barely coming off the overclocking details, Intel is back with another information dump regarding its upcoming Arc Alchemist GPUs. Alright, I shouldn't be that harsh; it's Intel's first proper discrete graphics card launch, of course they're excited! We've seen Intel slowly pull back the curtain on Arc over the past few days, and today is no different. Continuing the tradition, Intel's Principal Engineer, Karthik Vaidyanathan, sat down with Wccftech to break down XeSS, Intel's new super sampling technology.
Recap On XeSS
XeSS is major. It's Intel saying they can do both hardware and software right. XeSS is supposed to compete against AMD's FidelityFX Super Resolution (FSR) and Nvidia's famous DLSS. Intel is showing extremely promising numbers for this tech, and they're making it open-source to enable wider adoption as quickly as possible. XeSS will work on both Intel's own Arc Alchemist GPUs and the competition's graphics cards as well.
On Alchemist, XeSS will be powered by XMX matrix engines; on green and red team cards, the DP4a instruction will be used to make XeSS work. XMX, or Xe Matrix eXtensions, is essentially Intel's equivalent of the Tensor cores found on RTX 20 and 30 series GPUs. This way, you get the best of both worlds and the barrier to adoption is lowered significantly. Naturally, the XMX version of XeSS will be the superior one, as it runs on Intel's own proprietary hardware optimized to make the best use of XeSS.
The New Information
All of that we already knew, but Wccftech was able to get a lot more out of Karthik, so let's take a deep dive into today's findings. Karthik first broke down the details of upscaling, explaining the differences between spatial upscaling and super sampling. He also laid out the main goal of XeSS: upscaling the image to a higher resolution without losing frames or quality in the process.
The ultimate goal for us with a technology like XeSS is to produce the highest quality rendering with the most accurate lighting that has the most detailed shadows and reflections – and introduce that at a smooth frame rate.
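To ground the spatial-versus-temporal distinction Karthik draws, here is a minimal sketch of what purely spatial upscaling looks like: every output pixel is computed from the current frame alone, with no motion data and no history to recover lost detail. Nearest-neighbour sampling is used here for brevity; real spatial upscalers like FSR 1.0 use far more sophisticated filters, so this is an illustration of the category, not of any shipping algorithm.

```python
def spatial_upscale(image, scale):
    """Upscale a 2D list of pixel values by an integer factor,
    sampling only the current frame (no temporal information)."""
    out = []
    for y in range(len(image) * scale):
        row = []
        for x in range(len(image[0]) * scale):
            # Nearest source pixel; detail lost at render time
            # cannot be recovered from a single frame.
            row.append(image[y // scale][x // scale])
        out.append(row)
    return out

low_res = [[10, 20],
           [30, 40]]
for row in spatial_upscale(low_res, 2):
    print(row)
```

However clever the filter, a single-frame approach can only redistribute the information already in the frame, which is exactly the limitation temporal super sampling addresses.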
Like DLSS, Like XeSS
Karthik further explained how AI-based neural networks are imperative for super sampling techniques. Like DLSS, XeSS takes motion vector data into account to predict nearby pixels and reconstruct the image. For instance, if we take 10 frames and each of them is dynamic (meaning the objects in the frame are moving), it's hard to upscale those images properly, as pixels present in one frame might not be there in the next. This is where neural networks shine: they help figure out the missing pixels and rebuild them from neighbouring pixels through AI and machine learning. This is poles apart from FSR, which is limited to spatial upscaling only.
There are many such scenarios where because the scene is dynamic, things are moving, you cannot have a one-to-one correspondence between the pixels in your previous frame and your current frame. […] You really need some smartness to try and detect which pixels are usable. […] And that’s where neural networks come in because this is almost an ideal problem for neural networks, because they are very good at detecting complex features and that’s where we can use them to integrate just the right amount of information and when that information is not there, try to detect these complex features and reconstruct them.
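The temporal reuse Karthik describes can be sketched in a few lines: a motion vector tells us where a pixel was in the previous frame, so its old value can be reprojected and blended into the current frame. When the reprojected sample lands off-screen (or is otherwise invalid), there is no one-to-one correspondence, which is the gap a neural network is then asked to fill. The function names and the blend factor below are illustrative assumptions, not Intel's implementation.

```python
def reproject(prev_frame, x, y, motion_vec):
    """Fetch the previous-frame value for pixel (x, y) using its
    motion vector, or None if the sample is invalid (disoccluded
    or off-screen)."""
    px, py = x - motion_vec[0], y - motion_vec[1]
    if 0 <= py < len(prev_frame) and 0 <= px < len(prev_frame[0]):
        return prev_frame[py][px]
    return None

def temporal_blend(current, history, alpha=0.1):
    """Exponential blend of the current sample with reprojected history."""
    if history is None:
        return current  # no usable history: fall back to current frame
    return alpha * current + (1 - alpha) * history

prev = [[0.0, 0.5], [0.5, 1.0]]
# Pixel (1, 1) moved one pixel right since last frame: motion vector (1, 0).
hist = reproject(prev, 1, 1, (1, 0))
print(temporal_blend(0.8, hist))
```

Classic TAA stops at heuristics for rejecting bad history; the pitch for XeSS and DLSS is replacing those heuristics with a trained network.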
XeSS Won’t Need Per-Game Training
When asked to put XeSS in the context of DLSS and FSR, Karthik describes XeSS as much closer to DLSS. Again, it comes back to how XeSS and DLSS are utilizing motion vector data whereas AMD’s FSR is solely spatial upscaling. We find out that each game won’t need its own training for XeSS. DLSS 1.0 was restricted to per-game training while DLSS 2.0 was a generalized technique. That means whatever Nvidia’s supercomputers learn from one game is generalized across every other. XeSS will also work like this from launch.
So FSR to my knowledge is spatial upscaling, and we already discussed spatial upscaling and some of its limitations. DLSS 1.0, again I am not aware of the internals of DLSS because its not open, but from my understanding, it was not something that generalized across games, DLSS 2.0 plus was neural network based and it generalized very well. XeSS from day one, our objective has to be a generalized technique.
Karthik also added that the Unreal Engine demo shown at Architecture Day was running with XeSS for the first time. XeSS was never trained to upscale that demo beforehand which makes the feat that much more impressive.
You’ve seen the demo and I can say that XeSS has never seen that demo. It was never trained on that demo. All the content in that scene that you saw was not used as a part of our training process.
XeSS using XMX vs. DP4a
As we know by now, XeSS will run on older and competitor hardware thanks to the DP4a instruction set. Karthik outlines that Microsoft Shader Model 6.4 is behind the DP4a version of XeSS and GPUs supporting that should be compatible with XeSS. So, that’s Nvidia‘s Pascal, Turing, and Ampere architectures along with AMD‘s RDNA 1 and 2. Moreover, he also mentions that the XMX version of XeSS will still be the better implementation since it’s hardware-accelerated, but the DP4a version is no slouch either.
Nvidia has had this [DP4a support] I think, since Turing and AMD has this now on RDNA2. So even without Matrix acceleration you can go quite far. It might not be as fast as matrix acceleration, but certainly meets the objective. […] So, when it comes to older models, on older internal GPUs, we’ve had dot product acceleration (DP4a) for a while now. […] Microsoft has enabled this through Shader Model 6.4 and above and on all these platforms XeSS will work.
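For readers unfamiliar with DP4a, what the instruction computes is simple to state: a dot product of four packed 8-bit integers, accumulated into a 32-bit value. On supported GPUs this is a single instruction, which is what lets the non-XMX path evaluate quantized int8 neural-network layers at reasonable speed. Emulated in plain Python:

```python
def dp4a(a_bytes, b_bytes, acc=0):
    """Emulate the DP4a operation: dot product of two 4-element
    int8 vectors plus a 32-bit accumulator."""
    assert len(a_bytes) == len(b_bytes) == 4
    return acc + sum(a * b for a, b in zip(a_bytes, b_bytes))

# One output of a tiny int8 "layer": weights dot activations.
weights = [3, -1, 2, 5]
activations = [10, 4, -2, 1]
print(dp4a(weights, activations))  # 30 - 4 - 4 + 5 = 27
```

A matrix engine like XMX performs many such multiply-accumulates per cycle, which is why the hardware-accelerated path remains faster even though both run the same network.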
XeSS 2.0 and 3.0 Will Happen
Intel isn't shy to admit that XeSS won't be perfect at launch. There are improvements to be made after release as the neural network, and Intel themselves, learn more. DLSS was notoriously awful on its first go, but Nvidia came back and made a night-and-day difference with DLSS 2.0. While XeSS may not follow the same path, Intel wants to evolve XeSS over time, and thus 2.0 and even 3.0 versions are inevitable.
A technology like XeSS, I believe there is so much more we can do and it would be naive of me to say that we solved all problems and that XeSS is just perfect, and that we’re done. It’s going to improve more. […] There’s so many interesting problems in this space that we will continue to improve, evolve and lead. So yes. There will be XeSS 2.0 at some point, XeSS 3.0 at some point.
XeSS Will Have Multiple Quality Modes
When asked about the different modes that XeSS could potentially offer, Karthik stated that it's become sort of a standard to expect multiple quality modes from an upscaling technology. DLSS does it, FSR does it, so it only makes sense to develop a similar model for XeSS. But apart from confirming that XeSS will have different quality modes, Karthik also detailed how the true purpose of upscaling often gets lost among these toggles and sliders.
The performance mode is meant for the highest FPS but also brings the most noticeable dip in quality with it, whereas the quality mode renders at a higher internal resolution to offer a better image but sacrifices frames in the process. The whole point is to produce an image that matches the quality mode while delivering the framerate of the performance mode. That's ultimately what these super sampling technologies are supposed to achieve through their different methods, and what XeSS will do.
We will have the quality modes as both FSR and DLSS have those at this point. So, you know, we will support the same when users are used to it. So we would support that. But I also wanted to point out that the one thing that sort of gets lost in these different modes, performance, quality, ultra quality is that what you really want to have is something like the performance mode produce an image quality that is so close to ultra quality that it doesn’t take away from the visual experience.
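In practice, a quality mode boils down to the scale factor between the internal render resolution and the output resolution. Intel hasn't published XeSS's exact numbers, so the factors below are assumptions borrowed from the ratios DLSS and FSR commonly use; the mapping itself, not the specific values, is the point.

```python
# Hypothetical quality-mode scale factors (not Intel's official values).
RENDER_SCALE = {
    "ultra_quality": 0.77,  # highest internal resolution, smallest FPS gain
    "quality":       0.67,
    "balanced":      0.59,
    "performance":   0.50,  # lowest internal resolution, biggest FPS gain
}

def internal_resolution(target_w, target_h, mode):
    """Internal render resolution for a given output size and mode."""
    s = RENDER_SCALE[mode]
    return round(target_w * s), round(target_h * s)

# A 4K output in performance mode renders internally at 1080p.
print(internal_resolution(3840, 2160, "performance"))
```

Karthik's point is that the mode names matter less than closing the quality gap between them, so that even the most aggressive scale factor still looks convincing.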
XeSS is Built Around Both Computers and Humans
Karthik told Wccftech that Intel is taking both quantitative and qualitative data into account for XeSS. They're using metrics like Peak Signal-to-Noise Ratio (PSNR) to study the numbers and figure out how well XeSS is doing its job. But Intel is also conducting user testing to gather feedback and build qualitative data. Both objective and subjective data are taken into consideration to measure the quality of the upscaled image.
We do both. We do user testing and we have a set of qualitative metrics that we use. […] You have things like PSNR, but there are more advanced metrics that are available to us now […] like perceptual metrics. […] No metric is perfect when it comes to user perception, especially with gaming. So, we always have to rely a fair amount on user testing too.
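PSNR, the metric Karthik names, compares an upscaled image against a reference through mean squared error; higher is better, and identical images give infinity. A minimal implementation over flattened pixel values:

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length
    sequences of pixel values."""
    n = len(reference)
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    if mse == 0:
        return math.inf  # images are identical
    return 10 * math.log10(peak ** 2 / mse)

reference = [100, 150, 200, 250]
upscaled  = [101, 149, 202, 247]
print(round(psnr(reference, upscaled), 2))
```

As the quote notes, no single number like this captures perception, which is why Intel pairs such metrics with perceptual metrics and human testing.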
At this time, we don't know whether AMD or Nvidia also rely on user testing to analyze the final product. It would make sense if they did since, as Intel puts it, hard numbers are not enough and you need human feedback to understand the subjective quality of something as well.
Zip, Zip and Zip!
Wccftech tried hard to get some info out of Karthik on topics still under wraps, but it's safe to say that his media training kept him professional. He declined to comment on any direct question and replied diplomatically to those that needed some sort of reference to unrevealed details. Intel knows they have something special on their hands here, so they want to control the flow of information and the narrative, and not let the press mold it.
I am not fully aware of the timelines involved – but it will be eventually open sourced that I can confirm.[…]
There are several partners that we are working with. I cannot comment more at this point.[…]
This includes game developers. I wish I could share more haha.[…]
We haven’t detailed that yet so I can’t comment on that.
Hardware-Accelerated Super Sampling on Nvidia Cards?
Moving on, Karthik made it clear that XeSS cannot and will not use the Tensor cores found on recent Nvidia GPUs. As mentioned before, Tensor is Nvidia's matrix acceleration hardware that powers DLSS. XeSS, which can leverage both matrix math and software wizardry to upscale games, could in theory use Tensor cores for hardware-accelerated super sampling, but Karthik was quick to turn down that possibility.
Ah, no. Until there is standardization around Matrix acceleration that is cross-platform, it’s not easy for us to build something that runs on all kinds of Matrix acceleration hardware. DP4a has reached a stage where, you know, it supports this on all platforms. Certainly on all modern platforms. So that makes it much easier for us. But Matrix acceleration is not at that same stage. So our matrix implementation that targets XMX is Intel specific.
AMD's FSR does not utilize any proprietary hardware for its upscaling, and that could be seen as its biggest disadvantage. Relying on software alone does allow for much broader and quicker adoption, but it comes at the cost of image quality. 🤷‍♂️ In this sense, if Intel could make XeSS compatible with Tensor cores, we could see results on the level of Nvidia's own DLSS on RTX GPUs. But since matrix acceleration isn't standardized across platforms and everyone has their own version of it, that scenario is about as plausible as the release of Half-Life 3.
No FP 16 or FP 32 Fallback
AMD's FSR had an FP16/FP32 fallback at launch so older GPUs could still support FSR and enjoy its upscaling prowess. This obviously made FSR much more accessible, but the same cannot be said about Intel's XeSS. Karthik went on record to say that XeSS won't have a fallback to FP16/FP32 at launch. But he left the answer open-ended, saying there is always a possibility in the future if the performance Intel is aiming for is there.
No, not at the moment. We will look into it, but I cannot commit to anything at this point. Even, if you were able to do it, there’s the big question of performance and whether its justified.
DLSS vs XeSS Training Model
Nvidia's DLSS is trained at 16K resolution whereas, we now know, Intel is training XeSS at effectively 32K. But that number is not 100% accurate. Karthik was quick to point out that Intel looks at this a little differently and that their model is trained on "64 samples per pixel reference images". That means 8 samples along each of the X and Y axes (8 × 8 = 64); multiply a 4K frame's axes by 8 and you arrive at an effective 32K resolution. Karthik also added that 64x SSAA is the target quality XeSS is trained against.
Let me put it differently, we train with 64 samples per pixel reference images and I think that makes more sense. Because what we are trying to match, the kind of quality that we are trying to train the network with is 64x SSAA. […] Now if you want to draw a resolution from that you can you can do the math. 64 would be, you know, 8 samples in X and Y so you could you know, that would be 32K. […] But I wouldn’t call it 32K because what we are doing is effectively all [64 of] those samples are contributing to the same pixel. But yeah, effectively 32K pixels is what we use to train – is what we used to create the reference for one image.
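The arithmetic behind "effectively 32K" is worth spelling out: 64 reference samples per pixel form an 8 × 8 grid, i.e. 8 samples along each axis, and scaling a 4K frame's axes by 8 yields a 32K-class pixel grid for one reference image.

```python
# 64 samples per pixel = an 8 x 8 grid, so 8 samples along each axis.
samples_per_pixel = 64
samples_per_axis = int(samples_per_pixel ** 0.5)  # 8

# Scale a 4K frame's axes by the per-axis sample count.
target_w, target_h = 3840, 2160                   # 4K
ref_w = target_w * samples_per_axis
ref_h = target_h * samples_per_axis
print(samples_per_axis, (ref_w, ref_h))           # 8 (30720, 17280)
```

30720 horizontal pixels is the "32K" figure, in the same loose sense that 3840 is called "4K"; as Karthik says, all 64 samples contribute to a single reference pixel, so calling it a resolution is a simplification.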
Karthik then confirmed that 8K support for XeSS is also on the way. Intel eventually wants to make XeSS compatible with resolutions higher than 4K, just like DLSS. For now, it seems 4K is the maximum resolution XeSS will support, but that 4K can look prettier than even native rendering in some cases.
Same API for XMX and DP4a
When asked for details on how the API is exposed across the XMX and DP4a versions of XeSS, Karthik pointed out that Intel is using the same API for both. The interface is the same for both versions and, thus, the implementation work is the same as well. A game engine running either the XMX-accelerated XeSS or the DP4a XeSS gets access to the exact same library. The only difference is the platform decision, where the library switches paths to use either XMX or DP4a depending on what GPU you have.
[…] I would want to point out that both the DP4a version and the XMX version are exposed through the same API. So as far as the integration is concerned, it’s actually the same. […] What the game engine sees is the same interface and underneath that interface, you know, you can select the DP4a or the XMX version and depending on the platform. […] It’s the same interface and the same library that it’s integrated with two different paths inside of it, which makes it a lot easier for game developers.
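The single-API, two-backends design Karthik describes is a classic dispatch pattern, sketched below. The class and method names are invented for illustration; Intel's actual SDK will look different, but the shape (one entry point, backend chosen inside the library) is what the quote describes.

```python
class XeSSContext:
    """Hypothetical sketch of a single upscaling API with two
    internal paths, selected by platform at initialization."""

    def __init__(self, gpu_vendor):
        # Backend selection happens once, inside the library,
        # not in the game engine's code.
        self.backend = "XMX" if gpu_vendor == "intel-arc" else "DP4a"

    def upscale(self, frame, motion_vectors):
        # Identical interface regardless of backend.
        if self.backend == "XMX":
            return self._upscale_xmx(frame, motion_vectors)
        return self._upscale_dp4a(frame, motion_vectors)

    def _upscale_xmx(self, frame, mv):
        return f"upscaled({frame}) via XMX"

    def _upscale_dp4a(self, frame, mv):
        return f"upscaled({frame}) via DP4a"

# Engine-side code is identical on every GPU:
for vendor in ("intel-arc", "nvidia-ampere"):
    ctx = XeSSContext(vendor)
    print(ctx.upscale("frame0", "mv0"))
```

For developers, this means one integration covers every supported GPU, which is exactly the adoption advantage Intel is after.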
Karthik further added to the question by revealing that the DP4a version works over DirectX 12, with Microsoft Shader Model 6.4 and beyond being used to make that happen. However, he goes on to say that SM 6.6 is what Intel actually recommends, as extracting and packing 8-bit data is much more efficient on SM 6.6; officially, though, SM 6.4 is the minimum XeSS can work with. More importantly, Karthik confirms that XeSS is not using DirectML as its machine-learning library; instead, XeSS is built around a custom solution. That being said, Intel is not oblivious to the possibility, and a future DirectML implementation is always on the table.
So, for DP4a, yes, SM 6.4 and beyond SM 6.6 for example supports DP4a and SM 6.6 also supports these packing intrinsics for extracting 8-bit data and packing 8-bit data. So we recommend SM 6.6.[…]
We don’t use DirectML. […] For our implementation we need to really push the boundaries of the implementation and the optimization and we need a lot of custom capabilities, custom layers, custom fusion, things like that to extract that level of performance, and in its current form, DirectML doesn’t meet those requirements, but we are certainly looking forward to the evolution of the standards around Matrix acceleration and we’re definitely keeping an eye on it and we hope that our approach to XeSS sets the stage for the standardization effort around real-time neural networks.
XeSS Has Been in the Works For Years
Wccftech continued the interview with an easier question, asking Karthik when Intel started working on XeSS. If the demo is anything to go by, it wasn't a last-minute decision, and Karthik certainly confirms that. Neither AMD nor Nvidia has detailed how long it took them to develop their respective upscaling technologies, but we can pretty safely assume that DLSS was in the oven for longer than FSR.
From the point at which we started working on our research, it’s been more than a couple of years. Let’s just say that. So certainly not something we put together in the last year or the last couple of months, it has been going on for a while.
Karthik further clarified that the DP4a version of XeSS will launch later this year but it won’t be open-source just yet. And, as we already know, the XMX version is releasing later this month for developers.
It will be later this month for ISVs (XMX) and later this year for the DP4a but it will not be a public release. And that as XeSS matures, we’ll open up to tools and SDK for everyone.
XeSS Will Be Integrated Within The Game Engine
Like DLSS, XeSS will need to be implemented at the engine level. Intel believes a driver-level solution is simply not as effective, and post-processing effects such as film grain can seriously mess up the upscaled output. Plus, being as close to the renderer as possible and having access to the frame buffer will allow XeSS to work its magic better. Karthik said Intel realizes that implementation at the game-engine level is more difficult, but XeSS builds upon the foundation of pre-existing technologies, which helps in this area, as you'll see under the next heading.
Just like DLSS, it would have to be integrated into the game engine. […] It requires developer support but having said that, generally, super sampling technologies that are implemented at the tail end of the pipeline, closer to the display, will always have more challenges. […] So being closer to the render gives you, […] the highest fidelity information with the amount of controllability that you need to be able to produce the best result.
XeSS Can Be Implemented Just As Easily As DLSS
Lastly, since both are engine-level solutions that can't be applied at the end of the pipeline, Wccftech inquired how difficult it would actually be to implement XeSS compared to DLSS. Karthik replied that it shouldn't be much different. He highlighted that TAA, which is present as an anti-aliasing option in most games today, already has all the building blocks of XeSS. All developers need to do is modify their TAA slightly, working in tandem with Intel, and XeSS should be up and running in no time.
It should be similar and there’s another way to look at it. So for a game that implements TAA already, integrating something like XeSS should only be a small amount of effort because you already have all the pieces that we need with any TAA Implementation. Like you have the motion vectors, you have the jitter. […] TAA has almost become like a de facto, you know, standard for antialiasing. So for any game that already has TAA, it already has the pieces that you would need to integrate XeSS or any super sampling technique with a few modifications, of course, but those are small modifications.
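One of the TAA building blocks the quote mentions, alongside motion vectors, is the sub-pixel jitter applied to the camera each frame; a Halton (2, 3) sequence is the common choice in shipping engines. An engine that already produces this jitter for TAA can feed the same input to a super sampling technique. The helper below is a generic sketch of that standard pattern, not code from XeSS itself.

```python
def halton(index, base):
    """Low-discrepancy Halton sample in [0, 1) for a 1-based index."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def jitter_offset(frame_index, period=8):
    """Per-frame sub-pixel camera offset in (-0.5, 0.5),
    cycling every `period` frames as TAA implementations do."""
    i = (frame_index % period) + 1
    return halton(i, 2) - 0.5, halton(i, 3) - 0.5

for frame in range(4):
    print(jitter_offset(frame))
```

Because each frame samples a slightly different sub-pixel position, accumulating frames over time recovers detail a single frame never rendered, which is the raw material both TAA and XeSS work from.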
From all the information above, it's quite easy to gather that Intel is putting its best effort forward with XeSS. The way Intel is shaping XeSS to be both open-source and as powerful as DLSS is fascinating. I was honestly surprised by how thoughtfully Intel is developing XeSS and how it could truly be a game-changer. All of a sudden, Intel is back on track and a real threat to AMD and Nvidia, and maybe even Apple.
I've dwelled on my intrigue over this new Intel since the day Arc was announced, wondering where the hell Intel has been for the past few years. My colleague mentioned that this sudden wave of innovation surfaces out of thin air after every new console generation launches, then quickly stagnates. But perhaps this time the wave will stay on land a little longer.