It has been almost two decades since Stefan and I first encountered HRTFs. In that time, we came to realise that to raise the quality bar for spatial audio experiences we had to move away from HRTFs. In this post, I chart the journey to that realisation.
From the mid-2000s onwards, our response to HRTFs was dominated by disdain for the sonic quality of the processing. Every solution we listened to would have some mix of artefacts, timbral shifts and phase inconsistencies. From a production perspective, this wasn't something we were prepared to rely on in our work.
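For readers unfamiliar with how this processing works: binaural rendering with HRTFs typically convolves a mono source with a pair of head-related impulse responses (HRIRs), one per ear, for the desired direction. The sketch below is a minimal illustration of that standard pipeline, not any particular product's implementation; the HRIRs here are synthetic placeholders, since a real set is measured per listener and per direction. The mismatch between measured HRIRs and a listener's own ears is one common source of the timbral and phase problems described above.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono signal to stereo by convolving with an HRIR pair.

    This is the textbook binaural pipeline: each ear's signal is the
    source filtered by that ear's head-related impulse response.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Illustrative placeholder signals (a real HRIR set is measured, not random).
rng = np.random.default_rng(0)
source = rng.standard_normal(480)  # 10 ms of audio at 48 kHz
decay = np.exp(-np.arange(256) / 32.0)
hrir_l = rng.standard_normal(256) * decay
hrir_r = rng.standard_normal(256) * decay

out = render_binaural(source, hrir_l, hrir_r)
print(out.shape)  # (2, 735): stereo, length N + M - 1 from full convolution
```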
At that time, we didn't really understand the cause of what we were hearing. Nor did we appreciate how much of the detrimental effect was inherent to HRTF processing. It wasn't anything we'd put much thought into. It wasn't mission-critical to our work in surround sound production, so we weren't missing what we'd never had. We would simply move on and hope the next candidate that promised 3D Audio for Surround Sound would offer high-fidelity speaker-headphone translation. Nothing ever materialised.
By 2010 we were still hoping. On the back of rapidly emerging trends in smartphones, headphones and music streaming services, we decided to run an experiment. We wanted to test whether listeners would prefer to hear their music as per the status quo - in-head localisation with limited stereo-imaging - or with 3D Audio that would externalise the audio into virtual space with full stereo-imaging.
Our first goal was to develop a signal path that could take stereo and virtualise it. We explored several solutions on the market and available through academic networks. In the end, there were no HRTF sets that could overcome our original disdain for the poor-quality renders. This stopped us from taking our experiment further; what would be the benefit of offering a more spacious stereo presentation that came with a downgrade in fidelity?
By 2014, friends who were getting into VR development, and who knew we had an interest in 3D Audio, would ask about HRTF's quirks. Slowly, we started to think about how HRTFs were working in this domain. We soon realised the same sonic quality issues were present, but VR also brought to our attention other problems with HRTF, such as poor localisation and slow response to head tracking.
We started Kinicho in late 2015 to fix HRTF. Our own research and development efforts had yielded some incremental success, but most avenues of investigation produced no improvement; at best, we would find a trade-off. By 2017, several HRTF solutions had come onto the market backed by blue-chip corporates in audio, social media and other web technologies. Yet the problems persisted.
We speculated that the persistence was more fundamental, which caused a shift in our approach. We returned to first principles to consider hearing and audio reproduction from a fresh perspective. Once we had dug deep enough, we realised that HRTF itself was the problem. Every headache we encountered with HRTF was a result of the fundamental engineering behind binaural rendering. We concluded it simply wasn't fixable.
Out of this realisation, we understood that to raise the quality bar for spatial audio we needed to move away from HRTF entirely. The Sonic Reality Engine is the result.