When a VFX issue reveals a compiler bug

Fri, Apr 21, 2023

The first symptoms

It was a quiet, simple Tuesday in March when one of our Sound Designers shared his latest sound work.
Nothing out of the ordinary here, just some nice sci-fi pews and beams.

But among the people giving feedback on the sound, a keen-eyed programmer noticed that the beam VFX of our weapons were coming from a weird origin,
not from the weapon’s muzzle as one would expect.
My first suspect was the Particle Effect Component. I noticed that many components were attached to the weapon,
but with the FAttachmentTransformRules::KeepRelativeTransform rule.
So I wondered if something had caused them to shift and end up offset.
And would you look at that, switching it to FAttachmentTransformRules::SnapToTargetNotIncludingScale fixed it.

So that would be it, right? Issue fixed, and I moved on. Until a week later, when I got another report saying it was happening again.
I launched the game, hopped into a debug level, and shot the beam weapon. And look at that, the beam VFX worked just fine for me.
So that’s very weird, right? It works on my machine but not on theirs.

Background

A little background on our workflow: since we have a lot of people working on our game who aren’t programmers, we make sure to push our DLLs to git.
This way all the art, design, narrative, animation, etc. people have the updated version of our project.
It also means that if one programmer compiles without the latest code and pushes the result, all those people will be missing changes even though they pulled them from git.
We work around this by making sure our programmers pull changes and recompile the DLLs before they commit and push.
Altogether we rarely have issues with this workflow.

“It works on my machine”

So now with an “it works on my machine” case, my suspicions shifted to my colleague who was the last to commit and push the DLLs.
I checked with him to see if he had forgotten to pull the changes before building.
We quickly ruled this out: it had been an entire week since my commit, so my changes were definitely included.
And to solidify this, when he compiled the game from a completely up-to-date checkout he would still get the issue.

So now what? If I compile it works, if he compiles it doesn’t.
This is around the time I remembered an issue we had years ago. An issue where FVector::GetSafeNormal would return faulty data.
This was something that we had only figured out because other people had reported it on the Unreal Engine forums.
So I wondered…
What if?

I started by checking my compiler version, “Visual Studio 2019 14.29.30145 toolchain”, and then his, “Visual Studio 2022 14.35.32215 toolchain”.
In this situation I was very glad I had not enforced any specific compiler version among the team members.
So I asked everyone on the team to report their compiler version and whether or not the beam VFX were coming from the right origin for them.
With this done I narrowed the issue down to anyone on the 14.34 or 14.35 toolchain versions.

This is fine

Damage control engaged!

Since our build pipeline solely uses 14.29.30145 as well, it was an easy decision for me to just require everyone to install Visual Studio 2019.
I had noticed that Unreal Engine prefers the 2019 compiler even when Visual Studio 2022 is installed as well.
Running from desk to desk and contacting remote colleagues, we got everyone on 2019 within an hour.
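
Relying on everyone having the right Visual Studio installed is a bit fragile, so here is a rough sketch of a compile-time guard that could back it up. The helper name IsSuspectMsvcVersion is made up for this example, and the mapping of the 14.34 and 14.35 toolsets to the _MSC_VER values 1934 and 1935 is my assumption based on Microsoft's predefined macro documentation, so double check it before relying on it. It also only covers the two versions we saw the issue on; other versions are untested.

```cpp
// Hypothetical helper: true for the _MSC_VER values that (as an assumption)
// correspond to the 14.34 and 14.35 toolsets we saw the beam issue on.
constexpr bool IsSuspectMsvcVersion(int MscVer)
{
    return MscVer == 1934 || MscVer == 1935;
}

// Only MSVC (and compilers emulating it) define _MSC_VER,
// so other compilers skip the check entirely.
#if defined(_MSC_VER)
static_assert(!IsSuspectMsvcVersion(_MSC_VER),
              "This MSVC toolset produced the broken beam origin; build with 14.29 instead.");
#endif
```

Dropped into a header that every module includes, something like this would turn a silent miscompile into a build error on the affected machines.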

A summary

The report that the issue was still happening came in at 9:40 AM; by 3:16 PM everyone was on the 2019 compiler and the issue was officially resolved.
I am really glad we had experience with a compiler bug before. If not for that I don’t think I would’ve jumped to the possibility this quickly.
And so we would’ve likely had a much longer resolution time.

Time to analyze!

As the dust settled I finally had time to sit down and analyze what was actually going wrong.
In the midst of the chaos some theories were flung around.
“The beam is coming from 0,0,0” is not a bad first guess, but when you actually test it yourself you quickly realize it’s not a fixed position.
Playing around with it myself, I noticed it felt like it was somehow offset relative to the gun barrel.
Turns out I wasn’t too far off. (Spoilers)
But let’s take a look at the code at hand!

for (int i = 0; i < Beams.Num(); i++)
{
	if (Beams[i].IsValid())
	{
		FVector Start;
		if (i == 0)
		{
			Start = Weapon->GetMuzzleSocketLocation();
		}
		else
		{
			Start = Beams[i].GetBeamOrigin();
		}
		// ... (part of the listing is collapsed in the original)
		for (int EmitterIndex = 0; EmitterIndex < EmittersCount; EmitterIndex++)
		{
			Beams[i].GetParticleSystem()->SetWorldLocation(Start);
			Beams[i].GetParticleSystem()->SetBeamSourcePoint(EmitterIndex, Start, 0);
			Beams[i].GetParticleSystem()->SetBeamEndPoint(EmitterIndex, End);
		}
	}
}

So nothing too crazy here, just some code to set the Start and End points of our beam Particle Effects,
along with some fallbacks that make sure there are particle effects in case there are none.

Testing setup

Okay, so before I could start digging further into the issue, I obviously had to reproduce it.
As I previously mentioned, the compiler version I had was seemingly working just fine.
I did have a faulty compiler version installed, but Unreal would still use the 2019 version.

Luckily it is pretty easy to force Unreal to use the faulty compiler version instead.
I just had to open %appdata%/Unreal Engine/UnrealBuildTool/BuildConfiguration.xml and add the following XML:

<?xml version="1.0" encoding="utf-8" ?>
<Configuration xmlns="https://www.unrealengine.com/BuildConfiguration">
    <WindowsPlatform>
        <Compiler>VisualStudio2022</Compiler>
        <CompilerVersion>14.35.32215</CompilerVersion>
    </WindowsPlatform>
</Configuration>

See the Unreal Engine Build Configuration documentation for the available settings.

Stepping through

Now that I had it using the problem compiler, I could start debugging the issue further!
Stepping through the code, there was something very peculiar.
After executing “Start = Weapon->GetMuzzleSocketLocation();” the Y value of Start never got set.

[Image: debugger showing the Y value of Start being 0]

That’s definitely not supposed to happen. And while it was incredibly unlikely that my weapon was exactly at Y 0, I did double check this and ruled it out.
No matter what happened, the Y value was always 0 even though the function definitely returned a vector with a valid Y value.

So it seems it was a bit of a combination of our previous guesses. The X and Z values would correspond to the barrel location,
but the Y value was always 0.

Taking it a step further

So now we roughly know what is happening. And I could’ve left it at that.
But I wanted to dig deeper into the issue and look at the assembly generated by the compiler.
Maybe there are some glaring differences?

“Correct” (14.29.30145)

;Problem Area
A0007FF86E39434D  movups		xmm1,xmmword ptr [rax+10h]		;XMM1 = 0000000042F32217-44EE58E2C4F55432
A0007FF86E394351  movaps		xmm2,xmm1						;XMM2 = 0000000042F32217-44EE58E2C4F55432
A0007FF86E394354  movaps		xmm0,xmm1						;XMM0 = 0000000042F32217-44EE58E2C4F55432
A0007FF86E394357  shufps		xmm0,xmm1,0AAh				;XMM0 = 42F3221742F32217-42F3221742F32217 
A0007FF86E39435B  shufps		xmm2,xmm1,55h					;XMM2 = 44EE58E244EE58E2-44EE58E244EE58E2
A0007FF86E39435F  unpcklps		xmm1,xmm2						;XMM1 = 44EE58E244EE58E2-44EE58E2C4F55432
A0007FF86E394362  movss		dword ptr [rbp+48h],xmm0		;0x00000066455797A8 = 42F32217
A0007FF86E394367  mov			eax,dword ptr [rbp+48h]			;RAX = 0000000042F32217
A0007FF86E39436A  movsd		mmword ptr [rbp+40h],xmm1		;0x0000006645579690 = 00007FF86E6F7688
A0007FF86E39436F  movsd		mmword ptr [Start],xmm1			;Sets X and Y
			;}
A0007FF86E394375  jmp         		UWeaponFiringBeamComponent::DrawBeam+44Dh (07FF86E3943BDh)  
			;FVector End;
			;if (Beams[i].HasValidHitResult())
A0007FF86E3943BD  cmp		edi,dword ptr [rbx+8]  
A0007FF86E3943C0  mov			dword ptr [rsp+38h],eax			;Sets Z

“Wrong” (14.35.32215)

;Problem Area
A0007FF86DD707BA  movups		xmm6,xmmword ptr [rax+10h]		;XMM6 = 00000000430214D2-44EE05BBC4F4FD36
A0007FF86DD707BE  movaps		xmm8,xmm6						;XMM8 = 00000000430214D2-44EE05BBC4F4FD36
			;}
A0007FF86DD707C2  movaps		xmm0,xmm6						;XMM0 = 00000000430214D2-44EE05BBC4F4FD36
A0007FF86DD707C5  shufps		xmm8,xmm6,55h					;XMM8 = 44EE05BB44EE05BB-44EE05BB44EE05BB 
A0007FF86DD707CA  movaps		xmm7,xmm6						;XMM7 = 00000000430214D2-44EE05BBC4F4FD36
A0007FF86DD707CD  unpcklps	xmm0,xmm8						;XMM0 = 44EE05BB44EE05BB-44EE05BBC4F4FD36
A0007FF86DD707D1  movss		dword ptr [Start],xmm0			;Sets X
A0007FF86DD707D7  shufps		xmm7,xmm6,0AAh				;XMM7 = 430214D2430214D2-430214D2430214D2 
A0007FF86DD707DB  jmp		UWeaponFiringBeamComponent::DrawBeam+435h (07FF86DD70835h)  
			;FVector End;
			;if (Beams[i].HasValidHitResult())
A0007FF86DD70835  cmp		esi,dword ptr [rdi+8]  
A0007FF86DD70838  mov		eax,r12d  
A0007FF86DD7083B  movss		dword ptr [rsp+38h],xmm7			;Sets Z

Now maybe many of you are like me and don’t really understand what’s going on here at first glance.
And honestly even after researching it for a while I still don’t have a very thorough understanding.
But at least I think I get the gist of the issue.
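
To make the register comments in the listings a bit easier to read, here is a small sketch that turns the hex dumps into floats. It assumes the debugger prints each XMM register as the high 64 bits, a dash, then the low 64 bits, and that each 64-bit half holds two 32-bit float lanes. LaneBits and LaneToFloat are names made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Extract 32-bit lane `Lane` (0 = lower, 1 = upper) from one 64-bit half
// of a register dump such as 0x44EE58E2C4F55432.
std::uint32_t LaneBits(std::uint64_t Half, int Lane)
{
    assert(Lane == 0 || Lane == 1);
    return static_cast<std::uint32_t>(Half >> (32 * Lane));
}

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
float LaneToFloat(std::uint32_t Bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    float Value;
    std::memcpy(&Value, &Bits, sizeof(Value));
    return Value;
}
```

For the first listing, the low half 0x44EE58E2C4F55432 splits into 0xC4F55432 and 0x44EE58E2, and the high half contributes 0x42F32217. Those decode to roughly -1962.6, 1906.8 and 121.6, which look like plausible X, Y and Z world coordinates.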

In the good version it stores Start with a 64-bit move (movsd, which writes the low 64 bits of the XMM register) and in the bad one with a 32-bit move (movss, which writes only the low 32 bits).
So I am assuming that the 64-bit store covers a 32-bit X and a 32-bit Y sitting next to each other in memory, so both get set.
The 32-bit store only sets the 32-bit X, so the Y never gets touched and always stays 0.
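
The difference between the two store widths can be mimicked in plain C++. This is only a sketch of my assumption above, not the engine code: Vec3 is a stand-in for UE 4.27's FVector (three consecutive 32-bit floats), and StoreLow is a made-up helper that copies the first 4 or 8 bytes of a register's lanes, the way movss or movsd would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for a vector of three consecutive 32-bit floats.
struct Vec3
{
    float X, Y, Z;
};

static_assert(offsetof(Vec3, Y) == sizeof(float), "Y must directly follow X");

// Copy the lowest `Bytes` bytes of the register lanes into Dest.
// 8 bytes mimics the 64-bit store, 4 bytes mimics the 32-bit store.
void StoreLow(Vec3& Dest, const float (&Lanes)[4], std::size_t Bytes)
{
    assert(Bytes == 4 || Bytes == 8);
    std::memcpy(&Dest, Lanes, Bytes);
}
```

With the lanes holding X, Y, Y, Y (the result of the unpcklps in both listings), an 8-byte store fills in both X and Y, while a 4-byte store fills in X and leaves Y at whatever value it had before.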

What’s next?

I am not sure what’s next. I have been trying to recreate the issue in a clean Unreal Engine 4.27 project and so far have been unsuccessful.
Which means there are aspects to this issue I haven’t figured out yet. Maybe part of the issue lies in the changes we’ve made to our version of UE 4.27.
Or maybe there is another part in the code somewhere that is somehow affecting it that I haven’t noticed yet.

Regardless, this doesn’t seem to be a widespread or easy-to-reproduce issue.
But maybe if someone else runs into this they will now know what is happening and how to resolve it!
I will probably keep investigating this further when I have time and keep you updated if I find anything new!