I absolutely despise this meme. Why on earth can’t it be rotated 45° the other way around? If I want to read it, my head ends up 90° from the head of the fox. It should be 0°.
I saw a YT vid where they mentioned that you hear things differently at a 45-degree angle because your ears end up at different heights. So it makes sense as a behavior associated with trying to better understand and observe something.
Helps them determine the relative height of a sound. Is the sound coming from above me or below me?
Expanding on why humans don’t do this (as often): the fleshy part of our ears is functional. How sound bounces around in your outer ear lets your brain pick out some additional features that it couldn’t get from just two ears. If I recall correctly, variation in ear shape between people also makes it difficult to create identical, universal 360 sound. Can’t help but find it fascinating that we’ve had ray-traced games for a bit now, but sound is mostly still just faked and not simulated.
I literally waded through hundreds of HRTFs (head-related transfer functions) and didn’t find a single one that works. All of them are worse than panning/delay because there’s some nasty “nope this should be in the back, not front” discontinuity somewhere. Try for yourself, you might be luckier. Best I can hope for is an app that lets me do photogrammetry on my ears to create a personalised one.
Regarding the sound that comes out of games, though: you can do a lot to be more realistic without getting HRTFs involved. The general theme is simulating the impact of geometry and textures on sound. Early 3D sound systems simply said “there’s an infinitely large room here, you can hang up speakers everywhere, say whether the room has reverb for that speaker or not, go nuts”, while newer stuff bounces audio waves off geometry so sound can be occluded, take multiple paths towards the two virtual microphones (your ears), and be influenced differently by the floor texture (say grass vs. tiles). All that is about recreating real-world timbre, not so much 3D perception.
Basically the only thing that HRTFs do is enable up/down detection. Left/right is practically flawless with pan/delay, directly front/back is technically ambiguous but generally obvious from context.
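For anyone wondering what “pan/delay” boils down to, here’s a rough sketch. It is not taken from any particular engine. I’m assuming Woodworth’s spherical-head approximation for the delay and a plain constant-power pan for the level, and the head radius, function names, and angle convention (0° = straight ahead, +90° = hard right) are all my own picks. It also shows the front/back ambiguity: a source at 30° and its mirror image at 150° come out with identical cues.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, an assumed average head radius


def fold_to_front(azimuth_deg):
    """Mirror rear angles onto the front half-plane (pan/delay cannot tell them apart)."""
    az = (azimuth_deg + 180.0) % 360.0 - 180.0  # normalise to [-180, 180)
    if az > 90.0:
        return 180.0 - az
    if az < -90.0:
        return -180.0 - az
    return az


def itd_seconds(front_azimuth_deg):
    """Interaural time difference, Woodworth spherical-head approximation.

    Positive means the sound reaches the right ear first. Expects -90..90.
    """
    theta = math.radians(front_azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))


def pan_gains(front_azimuth_deg):
    """Constant-power pan: returns (left_gain, right_gain). Expects -90..90."""
    angle = (front_azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)


def pan_delay_cues(azimuth_deg):
    """Return (left_gain, right_gain, itd_seconds) for any azimuth in degrees."""
    front = fold_to_front(azimuth_deg)
    left, right = pan_gains(front)
    return left, right, itd_seconds(front)


if __name__ == "__main__":
    for az in (0, 30, 90, 150):
        left, right, itd = pan_delay_cues(az)
        print(f"{az:4d} deg  L={left:.3f}  R={right:.3f}  ITD={itd * 1e6:.0f} us")
```

Note there is no elevation parameter anywhere in there, which is the point: up/down simply isn’t encoded by pan/delay.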
To expand a little bit - the ridges of your outer ear will attenuate certain frequencies of sound more than others depending on the height of the source relative to your head. Animals that don’t have these outer ear ridges can still find the source of a sound laterally since they have two ears, and tilting their head tilts the plane of their ears, letting them get some vertical sound perception.
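The head-tilt part is easy to check with a bit of geometry. Here’s a minimal sketch, assuming a far-field source and two point “ears” with no head in between, so the arrival-time difference is just the source direction projected onto the ear-to-ear axis. The ear distance, the function name, and the coordinate convention (x = right, y = forward, z = up) are my assumptions, not measured values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C
EAR_DISTANCE = 0.175    # m, assumed ear-to-ear distance


def arrival_difference(elevation_deg, azimuth_deg=0.0, tilt_deg=0.0):
    """Seconds by which the sound reaches the right ear before the left.

    tilt_deg rolls the head about the forward axis, raising the right ear.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    tilt = math.radians(tilt_deg)
    # Unit vector pointing from the head towards the source.
    source = (
        math.sin(az) * math.cos(el),
        math.cos(az) * math.cos(el),
        math.sin(el),
    )
    # Unit vector from the left ear to the right ear after the roll.
    ear_axis = (math.cos(tilt), 0.0, math.sin(tilt))
    projection = sum(s * e for s, e in zip(source, ear_axis))
    return EAR_DISTANCE / SPEED_OF_SOUND * projection


if __name__ == "__main__":
    for tilt in (0.0, 45.0):
        for elevation in (-30.0, 30.0):
            dt = arrival_difference(elevation, tilt_deg=tilt)
            print(f"tilt={tilt:4.0f}  elevation={elevation:5.0f}  dt={dt * 1e3:+.3f} ms")
```

With the head level, a source straight ahead gives zero difference whether it is above or below. With a 45° tilt, the same source 30° up or down gives roughly 0.18 ms with opposite signs, so the plain two-ear timing cue now says something about height.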
In the SmarterEveryDay vid on this topic, they had a blindfolded person point to the source of a sound (succeeded no problem), and then packed clay into the outer ear ridges to mess with that sound attenuation. The person was able to keep their lateral sound perception, but the vertical was all over the place.