Recently I received this excellent question and link about ultrasonic frequencies from the Computer Audiophile site:
Hi, Archimago. Visit your blog frequently and find your posts enlightening and entertaining at times without the usual smoke and mirrors.
What caught my eye in the musings on MQA utilizing the Mytek Brooklyn DAC was the selection of a musical reference which had ultrasonic, specifically musical, content. After reading the paper by James Boyk <https://www.cco.caltech.edu/~boyk/spectra/spectra.htm>, I have been interested in the subject of ultrasonics and their (potential) effect on the listening experience fully recognizing that these frequencies are well above the capability of human hearing. Included would be identification of recordings that have musical ultrasonic content.
Given that formats such as PCM (24/96 or higher) and DSD (2x or higher) [Ed: remember that even DSD64 1x can go >20kHz] have the potential to capture musical content above 20kHz, I am intrigued by the possibilities. As with all HiRes formats, I understand that not only all components of the recording chain but also the reproduction chain must have frequency response greater than 20kHz to accomplish this. I do see where speakers are being offered which are spec'd to 40kHz and as high as 100kHz. There are also numerous add-on "supertweeters" being offered which have this capability as well.
IF the topic were of interest to you and worthy of your time and consideration, I for one would be most interested in your musings on the subject.
My apologies for using this Musing as a portal for my inquiry but did not know how else to contact you with the proposal.
FWIW, given the potential of the existing HiRes formats to capture the musical experience if fully realized, I too am less than interested in MQA as the latest flavor-of-the-day.
Connecticut Audio Society
Thank you Frank for the link, interesting discussion and question. I try to do what I can to collect information and synthesize posts that offer hopefully reasonable thoughts on these matters, mixed with some measurements and personal subjective impressions as appropriate.
As is essentially characteristic of all biological traits, there is a "normal" distribution of threshold for frequency detection; some of us will be able to hear to 19kHz, others have ears that poop out by 14kHz. Even if it's not "hearing" the tone, but rather "sensing" the presence of it, this concept remains valid. Since I trust none of the readers here are aliens from another planet who may have extraordinary sensory perception, I believe we can find answers based on studies that are out there exploring this physical limitation of Homo sapiens.
You've brought up one of the fundamental claims in the audiophile world when companies embarked on going beyond CD-level resolution back in the late 1990's - that ultrasonic frequency reproduction improves the audible quality of high fidelity reproduction. This is of course on top of claims that going from 16 to 24-bits made a big difference... For more on the bit-depth discussion, have a look at the blind test from 2014 (and also general concepts of expectations for high-definition audio).
I remember reading about the supposed benefits of ultrasonic frequency reproduction around 1999. "Super tweeters" with response usually at least an octave beyond 20kHz started showing up on the scene; devices like this Fostex T90A horn, or even more easily available the Radio Shack 40-1310 which I have somewhere in my electrical parts box. Offerings grew in the early 2000's with the advent of SACD and DVD-A. Since sampling rates overcame the 44.1kHz (22.05kHz Nyquist) limits of CD audio, the opportunity was there to promote add-on transducers like the Tannoy SuperTweeter, Townshend Audio Supertweeter (2004) or this Audiosmile Supertweeter by 2008. As noted by Frank, there were some multi-way speakers over the years with super tweeters incorporated or the tweeters themselves capable of extended ultrasonics (Tannoy Dimension TD12, Linn Majik 140 for example among others like B&W's with ultrasonic-capable tweeters and JBL M2 rated to 40kHz). Check out the current Madisound list of super tweeters for sale.
Despite this ~20-year history, notice that only a minority of speakers these days explicitly aim for flat frequency extension significantly beyond 20kHz, even though many, especially those with metal tweeters (titanium, beryllium), can manage it to varying degrees. When was the last time you saw a speaker with a reasonably flat extended frequency response like -6dB at 40kHz viewed as a major selling point? Stereophile measures speakers to 30kHz but there's no suggestion that this extension is important (rather, it's a nice buffer to ensure everything looks good to 20kHz). By definition, ultrasonic refers to frequencies we humans cannot hear; at least not based on research with typical pure-tone tests.
As a starting point, a good document to analyze which originated back during the advent of "high resolution" digital audio is this Tannoy white paper "The Need for Extended High Frequency Bandwidth - or Why You Need A Supertweeter" from 1999. In there you see the link to the James Boyk measurements as well.
We can see from the white paper 3 major claims:
1. The fact that instruments have very high ultrasonic frequency harmonics - especially piccolos, oboes, triangles, cymbals. Fair enough, nobody is contesting this information from Boyk. It's obviously objectively measurable.
2. Time coherence is of importance... Yes, the concentric speaker system like in the Tannoy (or KEF, etc...) is great. Indeed, if one were to have a super tweeter incorporated into a multi-way speaker, it would be good that it remains time/phase coherent with crossovers typically up at the 5kHz-7.5kHz range. Again, I think nobody generally contests this since it applies to all speakers, not just those with super tweeters (and remember, we can use DSP processing to improve time domain performance in our sound rooms).
3. Human perception of ultrasound is possible... Ahhhhhh, here lies the contentious claim!
Is there any evidence that what are typically considered ultrasonic frequencies (>20kHz) are perceptible?
If you look around at audiophile articles, just like the Tannoy white paper, you will no doubt run across the "Oohashi Hypersonic Effect". Basically, this was a series of papers published at the turn of the Millennium by Tsutomu Oohashi and colleagues, with some concepts initially discussed at an AES convention way back in 1991 before making their way to the Journal of Neurophysiology in 2000 with further additions. I'm not going to spend much time talking about this again because I have already written about and analysed the papers (including a more recent 2014 paper where the researchers used DSD128) in the January 2015 post "MUSINGS: What Is The Value of High Resolution Audio (HRA?)". The bottom line IMO is that this stuff is simply too speculative and complicated to be used as proof that reproducing ultrasonic frequencies will always result in a beneficial effect. For example, the recent 2014 presentation actually considered safety issues, and both "positive" and "negative" hypersonic effects on EEG activity were reported. My sense is that until there is actual paradigmatic understanding (not just observational reports of neurophysiological change), there is no point debating the Oohashi stuff as a pro or con. (For more on this along with links to other studies, including those that failed to replicate Oohashi, see the Wiki on "Hypersonic Effect".)
As for the other research referenced by that Tannoy whitepaper, the 1991 paper by Lenhardt is worth thinking about. Realize that this research involved placing a transducer directly against the subject's body for bone conduction (see this New York Times article related to the research). The theory is that ultrasonics are not "heard" by the usual cochlear mechanism but rather the saccule which is an inner ear organ of the vestibular system used in balance / positional equilibrium. Although I don't know how far the research has progressed, we can see based on this recent 2013 paper by Kagomiya that they used a ceramic transducer placed over the mastoid bone behind the ear and had the non-deaf subjects determine the "audibility" of "sound" transmitted through a 30kHz carrier. We're not told how much power was needed to create the ultrasonic vibrations which the subjects were able to detect (ie. what SPL in the air would be equivalent to cause these vibrations?). Suppose we accept that ultrasonic stimulation through bone conduction is beneficial for high-fidelity appreciation, how loud does a 30kHz signal from the super tweeter need to be in order to vibrate our skull so that it even has a chance to be appreciated!? Perhaps someone can let us know but I have a feeling the typical amplitude of ultrasonic material during an acoustic musical concert is far less than the amount that the researchers in this experiment utilized (obviously I'm not talking about lo-fi ear-splitting amplified rock concerts filled with distortions and all kinds of wide bandwidth noise).
Finally, the Tannoy whitepaper speaks of a 10dB peak at 30kHz being audible in their internal testing. So did they publish this finding? If not, why not?! This would have been fascinating, would have prompted replication research, and would likely have spurred sales of their SuperTweeter. Talk is cheap, Tannoy.
Remember that recently there was also the meta-analysis by Joshua Reiss to determine perceptibility of high-resolution audio (Journal of the AES, June 2016). He cites 4 papers (out of 31 utilizing auditory experiments) which suggested the potential for hearing >20kHz. Two of these 4 are from Ashihara. Have a look at this 2006 paper where they tried to determine hearing thresholds beyond 20kHz for 15 subjects.
For convenience, here is Figure 6 from the paper where they showed the results from the 4 out of 15 subjects with best high-frequency hearing acuity:
So far, we see little evidence that ultrasonic frequencies are actually audible by assessing the claims at least put forth by the Tannoy whitepaper. Likewise, I see no evidence that typical audio playback could result in bone conduction to any significant degree as would be suggested by the Lenhardt work.
Here are a few further points to consider:
1. Again, only the young can likely hear up to 20kHz with any significance. Let's be honest guys. By the time we're 35 years old, on average, hearing acuity of an 8kHz tone is somewhere on the order of -10dB compared to 1kHz. We basically don't hear anything by 20kHz. Remember that the ladies among us are blessed with more graceful decline in acuity. Even the Ashihara paper above showed only a minority (6/15) of young folks up to 33 years old had a measurable threshold to 22kHz (Nyquist limit of CD), and nobody in their population was able to hear pure tones beyond 24kHz up to ~90dB SPL. It is unfortunate that they did not publish demographic information like the mean age or gender of those they found with best high frequency acuity.
The evidence at least based on pure-tone analysis suggests that a sample rate around 48kHz (24kHz Nyquist) covers the absolute most that any adult can even hope to perceive; any frequencies that need a higher sample rate are beyond benefit, barring the vague Oohashi stuff. As Figure 6 above also showed, if there were noise or other frequencies in the signal, masking happens and makes detection even worse. I think we can round up to 50kHz PCM sample rate and state with good confidence that this is all we need to capture for human consumption in the frequency domain.
2. Microphones typically are bandwidth limited to around 20kHz. You can easily see this in the vast majority of recordings (synthetic sounds of course can have all kinds of unintentional ultrasonics). An example of what one typically sees from a pop/rock studio is something like Joni Mitchell's "Both Sides Now" from the DVD-A released in 2000 (24/96):
Coincidentally, the graph also demonstrates just how noisy this recording is. Although it's presented as 24-bits on the DVD-A, there's no evidence that it needs anything more than 16-bits given the high noise floor clearly way above -96dB. Furthermore, in the production chain (I don't know if it was recorded to analogue tape, went through an analogue mixer, or had a noisy ADC), there is a rather high level 29kHz noise peak. Clearly this is not part of the music, nor would anyone claim they "should" hear this. So why bother reproducing this ultrasonic signal?
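For anyone wondering where the "-96dB" figure for 16-bit audio comes from, here is a minimal sketch of the arithmetic. It simply takes the ratio between full scale and one quantization step, and ignores dither, noise shaping and the way an FFT plot spreads noise across bins, so treat it as a rule of thumb rather than a measurement method. The function names and the -90dBFS example floor are hypothetical, chosen only for illustration.

```python
import math

def pcm_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

def bits_needed(noise_floor_dbfs: float) -> int:
    """Smallest bit depth whose quantization step sits at or below the given floor."""
    return math.ceil(-noise_floor_dbfs / (20 * math.log10(2)))

for bits in (16, 24):
    print(f"{bits}-bit: {pcm_dynamic_range_db(bits):.1f} dB")

# Hypothetical recording whose noise floor sits at -90 dBFS
print("bits needed for a -90 dBFS floor:", bits_needed(-90.0))
```

On this rule of thumb, a recording whose noise never drops below roughly -96dBFS gains nothing from the extra 8 bits of a 24-bit container.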
Even in bona fide high-resolution recordings, ones like this 2L DXD sample found here (I downsampled it to 96kHz to better demonstrate the roll-off in the recording at one of the more dynamic segments of the music):
Clearly this is a much better recording than "Both Sides Now" with very low noise floor so 24-bit resolution could be reasonable. But time and again, this is what you see in the spectrum of essentially all recordings. There's just nothing much up in the high frequencies beyond 20kHz in the music we buy; even those that are "high resolution" with high sample rates like 192+kHz.
I mentioned a bit more about microphones in the post last year: "MUSINGS/ANALYSIS: Is there any value in 176.4 and 192kHz Hi-Res audio files?".
Finally, just to complete these illustrations, here's a very good sounding SACD - Christina Pluhar & L'Arpeggiata's La Tarantella: Antidotum Tarantulae:
I'm showing the FFT at one of the most dynamic parts of the music - notice how the high frequency naturally drops off. As another natural sounding acoustic recording, this time to DSD, notice that there's no evidence of anything significant beyond 25kHz being recorded by the microphones. Without filtering, the DSD64 quantization noise is rather nasty as well, obviously adding nothing of value to "high fidelity" sound...
3. Adding super tweeters makes the speaker more complicated and increases the risk of poor integration: more expense, and more potential for suboptimal cross-overs. Why bother unless there is proof of value?
4. No transducer is perfect. Speaker non-linearities result in intermodulation and subharmonic distortions that may be audible <20kHz. This is a reason not to record too much ultrasonic content nor try to reproduce it. This is especially true with ultrasonically noisy content like unfiltered SACD/DSD64 as above, where the noise is a by-product of the noise shaping used in the technology. Large amounts of noise amplified and sent to the speakers in no way provides benefits to the audiophile who desires true high fidelity! (Refer to Monty's "Intermod Tests" and have a listen to demonstrate to yourself why too much ultrasonic content might be bad in your own system.)
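To make the intermodulation point concrete, here is a small sketch that feeds two purely ultrasonic tones (26kHz and 29kHz) through a toy non-linear "transducer" and then looks for energy at the 3kHz difference frequency. Everything here is an assumption for illustration: the 96kHz sample rate, the tone frequencies and levels, and especially the square-law model with coefficient `k2`, which is not a measurement of any real tweeter.

```python
import cmath
import math

FS = 96_000   # sample rate in Hz (assumed)
N = 9_600     # 0.1 s of signal, so every tone below sits on an exact 10 Hz bin

def tone_amplitude(signal, freq_hz, fs=FS):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    w = -2j * math.pi * freq_hz / fs
    acc = sum(s * cmath.exp(w * n) for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)

def two_tone(f1, f2, amp=0.5):
    """Sum of two equal-level sine waves."""
    return [amp * (math.sin(2 * math.pi * f1 * n / FS) +
                   math.sin(2 * math.pi * f2 * n / FS)) for n in range(N)]

def weakly_nonlinear(signal, k2=0.1):
    """Hypothetical transducer: linear term plus a small square-law term."""
    return [s + k2 * s * s for s in signal]

clean = two_tone(26_000, 29_000)
distorted = weakly_nonlinear(clean)

print("3 kHz level, ideal transducer:     ", tone_amplitude(clean, 3_000))
print("3 kHz level, non-linear transducer:", tone_amplitude(distorted, 3_000))
```

With a square-law term, the difference tone comes out at `k2 * amp**2`, so the two inaudible inputs create a perfectly audible 3kHz product about 26dB below each tone in this toy case. Real drivers are far more linear than this, but the mechanism is the same one the intermod test files are meant to expose.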
5. High frequencies are attenuated through air. Let's just focus on this for a bit because I think it's interesting and perhaps we don't think about it enough. Tell me, friends, how many of you enjoy listening to an orchestra or jazz band sitting from a vantage point just 4 feet in front of a cymbal, triangle, or trumpet? Nobody, I hope :-). Realize that this is the distance the measurement microphone (a 1/4" B&K 4135 condenser) was placed in the Boyk article analyzing the high-frequency content of instruments.
Assuming we could hear the ultrasonics, and our microphones are able to record >20kHz at high fidelity, and the playback chain including the speakers can reproduce >20kHz frequencies accurately, there is still significant attenuation of these high frequencies due to the air between us and the speakers. In fact, you can have fun calculating this here. As I type this on a sunny day in Vancouver, the humidity indoors is around 40%, and it's about 20°C room temperature in my sound room. Calculated attenuation per meter at 20/25/30 kHz looks like this for "atmospheric absorption":
Considering that I sit 3m (about 10 feet) from my speakers, this means that the sound reaching my ears is attenuated by a further 1.7dB at 20kHz, 2.3dB at 25kHz, 2.8dB at 30kHz, and 3.7dB by 40kHz. Here's a graph demonstrating the attenuation curve 10' away in a room at 20°C and 40% humidity as an average benchmark for absorption in this part of the world (typically lower temperature and humidity will increase absorption):
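If you would rather compute the absorption yourself than rely on an online calculator, here is a minimal sketch using the pure-tone formulation from ISO 9613-1. That the calculator linked above uses the same standard is an assumption on my part; the function name and default arguments are also just illustrative choices.

```python
import math

def air_absorption_db_per_m(freq_hz, temp_c=20.0, rel_humidity_pct=40.0,
                            pressure_kpa=101.325):
    """Pure-tone atmospheric absorption in dB per metre (ISO 9613-1 formulation)."""
    T = temp_c + 273.15        # air temperature, K
    T0 = 293.15                # reference temperature, K
    T01 = 273.16               # triple point of water, K
    p_rel = pressure_kpa / 101.325

    # Molar concentration of water vapour (%) from relative humidity
    p_sat_rel = 10 ** (-6.8346 * (T01 / T) ** 1.261 + 4.6151)
    h = rel_humidity_pct * p_sat_rel / p_rel

    # Relaxation frequencies of oxygen and nitrogen, Hz
    fr_o = p_rel * (24 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = (p_rel * (T / T0) ** -0.5
            * (9 + 280 * h * math.exp(-4.170 * ((T / T0) ** (-1 / 3) - 1))))

    classical = 1.84e-11 / p_rel * (T / T0) ** 0.5
    oxygen = 0.01275 * math.exp(-2239.1 / T) / (fr_o + freq_hz ** 2 / fr_o)
    nitrogen = 0.1068 * math.exp(-3352.0 / T) / (fr_n + freq_hz ** 2 / fr_n)
    return 8.686 * freq_hz ** 2 * (classical + (T / T0) ** -2.5 * (oxygen + nitrogen))

LISTENING_DISTANCE_M = 3.0
for f in (20_000, 25_000, 30_000, 40_000):
    loss = air_absorption_db_per_m(f) * LISTENING_DISTANCE_M
    print(f"{f / 1000:.0f} kHz over {LISTENING_DISTANCE_M:.0f} m: {loss:.1f} dB")
```

At 20°C and 40% humidity this should land close to the 3m figures quoted in the text. It covers absorption by the air only; the ordinary drop in level with distance affects all frequencies equally and is not included.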
Comparatively, this is not a huge amount at 10-feet (3m) of course. However, if we're talking about frequencies produced by real instruments, and we're trying to replicate the sound of an actual performance at a venue, we must then ask ourselves the question "what distance should the recording sound like it's coming from?" This is important for the tonal balance of high frequencies because atmospheric absorption rises disproportionately as frequency goes up.
As someone who enjoys going to the orchestra, when I listen to the Vancouver Symphony at the Orpheum Theatre in downtown Vancouver, the instruments are clearly much further away than the 10-feet or so between me and the speakers in the sound room. Suppose I sat front-and-center at the Orpheum, I would estimate that the guy playing the cymbal and triangle emitting all those ultrasonic frequencies at the back of the orchestra would be at least 40' away. Assuming 20°C, 40% humidity, this means that a 20kHz tone would have already been attenuated by at least 6dB even if the path was direct with nothing but air in the way. By 30kHz, there's >11dB of loss at this seating position; again without all the rows of musicians between me and the cymbal emanating >20kHz material. That's assuming I'm sitting close to the orchestra as an audience member - how much worse is the attenuation sitting further back with the bodies of other patrons in the rows in front of me?!
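The concert-hall numbers follow from the sound-room numbers by simple scaling, since absorption in dB grows linearly with path length. Here is a quick sketch of that arithmetic using only the 3m losses quoted earlier for 20°C and 40% humidity; the helper name is hypothetical.

```python
FEET_TO_M = 0.3048

# Loss over 3 m at 20 degC / 40% RH, as quoted in the text (dB)
loss_at_3m = {20_000: 1.7, 30_000: 2.8}

def loss_at_distance(freq_hz, distance_ft):
    """Scale the quoted 3 m loss linearly with path length (air absorption only)."""
    per_metre = loss_at_3m[freq_hz] / 3.0
    return per_metre * distance_ft * FEET_TO_M

for f in loss_at_3m:
    print(f"{f / 1000:.0f} kHz over 40 ft: {loss_at_distance(f, 40):.1f} dB")
```

Forty feet is a little over 12m, so the per-metre losses get multiplied by roughly four compared with the listening room, which is where the "at least 6dB" and ">11dB" estimates come from.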
It's worth thinking about this idea of perspective as a listener (not just for the frequency response reason of course). If our home stereos are supposed to reproduce sounds in "high fidelity", based on what sounds "natural" around us, the reality is that acoustic music (IMO the type of music most appropriate for "high resolution" reproduction) heard as an audience member naturally rolls off the highs. This is potentially why research into "target curves" or "house curves" used in room correction tends to de-emphasize frequencies from 10-20kHz. An example is the "Harman Target Curve" (as described in this 2015 paper):
This data was produced empirically by allowing subjects to use tone controls to find subjectively preferred tone curves starting with a flat-calibrated room/speaker system. Notice how trained listeners preferred a gradually descending frequency response of the speakers. Wouldn't you think that if ultrasonics were important, experimental results, especially with trained listeners, would at least suggest flat frequency response to 20kHz for most music?
I've often thought about the rationale for preference of rolled-off high frequencies in these target curves (the classic B&K curve has similar roll-off). I wonder whether close-mic'ed studio productions may be adding to this these days. Remember, we typically sit many meters away from the artists in a live performance. Our ears/mind naturally expect high frequency roll-off in such a venue and would also expect the same in the home sound room. When artists are recorded in a studio, close-mic techniques where the microphone is placed often a foot away from instruments could result in a rather unnatural tonal response picking up more high frequency energy than a normal listener would hear from many feet away - the kind of roll-off demonstrated by acoustic recordings like the Magnificat and La Tarantella tracks shown above.
High-fidelity speaker systems capable of flat response to 20kHz may actually sound too "harsh" or "analytical" when playing these close-mic'ed studio recordings. Who knows, maybe this is why some objectively "inferior" speakers with early high-frequency attenuation can sound natural and more "musical" in certain situations. Perhaps this is also why suboptimal digital converters like NOS DACs and those using early roll-off filters like the PonoPlayer could be preferred despite the aliasing distortions that come along with those filter settings. The idea about close-mic'ed studio recordings and harshness is speculation on my part, so I would be interested in others' thoughts. Considering recent developments, I think it is possible that it's not "time domain" quality and short impulse response graphs that are important when looking at digital filters... Rather, it's the high frequency roll-off that can sound more "natural" (eg. PonoPlayer / Ayre filter starts rolling off by 7kHz and about -4.5dB by 20kHz for 44.1kHz material). Perhaps these digital filters are acting as mild tone control compensating in an era where our hi-fi gear no longer has those control knobs.
For completeness, I know that some vinyl lovers will talk about the superiority of frequency response compared to CD. While it is true that a good quality LP can contain frequencies well above 20kHz, it's not like you see much ultrasonic content on most LP's. My personal experience with vinyl rips, whether with Denon DL-110, Shure M97xE, or Ortofon Cadenza Black cartridges, has shown that LP playback typically rolls off the highs, reaching the noise floor by 25-30kHz with little content above. I have no qualms with down-sampling my vinyl rips to 48kHz these days.
The Bottom Line...
However one looks at the evidence, I think it's fair to say that the likelihood of perceiving (not necessarily even hearing) a difference in sound material >20kHz is simply dubious. Then there's the question of whether this actually benefits the sound even if perceptible.
Yes, while there are the occasional research papers suggesting differences in audibility between sample rates (like this one comparing 88.2kHz vs. 44.1kHz from 2010), I have yet to see clear evidence that ultrasonic frequencies themselves have resulted in audible differences as opposed to sonic differences because of the DAC or ADC operating at different sample rates.
Among the research, I found the negative NHK study (Nishiguchi et al. 2009) particularly fascinating; it used the Pioneer/TAD PT-R9 super tweeter, B&W Nautilus 801 speakers, dCS digital gear, and Sony and Marantz amps. In that study, they used 36 subjects ranging from teenagers to those in their 50's; a huge proportion (33/36) had audio engineering experience, 6/36 were women, and the musicians who recorded the test audio also participated. There was no evidence of a difference whether or not the super tweeter was activated with >21kHz content. Interestingly, one 17-year-old female subject actually did really well in the research trials, but subsequent further trials with this individual did not reach statistical significance.
Yes, I know there are some folks with strong testimonies out there. I remember interacting with a "Golden Ear" who claimed he could hear up to 30 kHz (with no evidence). Anyone can have an opinion but opinions aren't facts. Even if this fellow tried a 30kHz tone test, it's possible that he heard harmonic distortion below 20kHz but didn't know it. More often than not, proponents of super tweeters claim that they improve the "air" of the sound system, improve the spatial dimension or clarity of the treble, some even claim they improve the "transient" accuracy of bass - vague claims indeed. But time and again, the scientific literature questions the audibility and benefits wherever we look. Whether it's our human physiology (which deteriorates with age - especially frequency response), recording equipment specs (like roll-off from typical microphones used), the music signal not likely containing much >20kHz, or even the air we breathe absorbing high frequencies, all these factors collude to limit the likelihood of significant ultrasonic content in live acoustic performances, within the recorded music itself and ultimately potential audibility.
My personal experience resonates with the science. For years now I have been examining "high resolution" albums from SACD, DVD-A, Blu-Ray to digital downloads and have rarely seen what looks like actual recorded ultrasonic content especially in good acoustic recordings. I have done my own ABX listening tests with what should be very high quality recordings like those from 2L on different systems (such as discussed here). These days, in my mid-40's, I just see no point in purposely pursuing a system with ultrasonic frequency response. A system with the ability to reproduce sound reasonably flat to around 20kHz is great. Beyond that, I'll do my own tweaking of the room, and find the best sounding mastering of albums...
If we take Ashihara's research in pure-tone audibility in young people as true, we might say a sample rate of 50kHz is absolutely all we would ever need. In fact, this is why, for the purpose of streaming "high-resolution" audio these days, I honestly have to ask: what's wrong with just 24/48 FLAC? This is what I would prefer instead of a restrictive scheme like MQA.
Having said this, as a "perfectionist audiophile", I like the idea of high-resolution in the sense that deeper bit-depth (ie. 24-bits) and higher sample rate to 88.2/96kHz (because of universal compatibility at these sample rates) will allow us to capture all that decades' worth of research reports have shown to be the limits of human hearing and more. As one who likes to "own" my own music collection, this would be my preference. Whatever arguments are being made about digital filter effects would be moot at the 88.2/96kHz sample rates as well. Realize that playback even of 88.2/96kHz material could result in non-linearities with poor speaker systems. So long as the high resolution recordings do not contain inordinate amounts of ultrasonic noise and respect the natural high frequency roll-off, this should not be an issue. No need in my mind for sample rate higher than 96kHz as a consumer.
Despite claims over the decades, I have yet to see super tweeters considered "must have" features of most speaker designs. Likewise, I'm not sure if there is any data to support the idea that headphones capable of frequency response significantly above 20kHz are considered to sound "better". Given the mere centimeters distance between the headphone transducer and the auditory organ, ultrasonic frequencies would be essentially free from air attenuation. Also, why is nobody (that I know of) asking for hybrid air/bone conduction headphones that can handle up to 20kHz or so by air conduction and offer ultrasonic stimulation through bone conduction* (as per the Lenhardt research)?!
Needless to say, I believe we can all enjoy the music fully without "super tweeters". No need for concern or regret; fearing that we're missing out.
I hope this provides some food for thought, Frank... Cheers and greetings to the Connecticut Audio Society.
*BTW, there are bone conduction headphones out there like this one, but if you look at reviews, one cannot expect sound quality to be as good as air conduction even though they do provide benefits (eg. low isolation so you still hear what's going on around, possibly easier for fit and comfort in some situations).
Over the years, I have wanted to do a blind test comparing content with >20kHz vs. filtered audio for you guys (something like the 24-bit vs. 16-bit Internet Blind Test in 2014). Unfortunately, it would be very difficult to ensure adequate blinding because anyone these days could pull up the file in an audio editor or see results on a spectral analyzer and determine which is which. I would need to ensure subjects are honest in order to have faith in the results if we did such a trial :-)...
Well, Dynamic Range Day 2017 just passed on March 31st. Always good to remember that at the end of the day, quality of the mastering (of which a big part is the retention of good dynamic range) is an essential piece of the joy of high-fidelity sound! While you're on the web site, do check out the Loudness War Research page for some great info.
This week I've been getting into the guitar blues of Hubert Sumlin (1931-2011). Perhaps best known as a member of Howlin' Wolf's band, this guy "rocks" on his solo work as well. His 1998 album I Know You (DR13) I found very enjoyable.
As usual, have a great week ahead everyone and hope you're all enjoying the music!
In response to jhwalker below in the comments about DSD and the 25kHz roll off... Not true, DSD64 can encode frequencies way beyond 25kHz. Remember those SACD ads claiming 100kHz frequency response? Those ultrasonic frequencies however then get buried in the rising noise floor - this is the price to pay for low-resolution 1-bit sampling, and why high sampling rates in the MHz range are needed. Many DAC's will implement analogue filtering so the amp/speakers don't suffer through trying to reproduce all that high frequency noise! Software like JRiver for example will by default also digitally low-pass filter DSD --> PCM playback around 24kHz.
The La Tarantella DSD track was converted with no filtering done to the conversion. Here's an example of "synthetic" music with >25kHz components in DSD64:
As you can see, that's from the Beck album. Synthetic instruments, close mic recording, studio tricks with all kinds of ultrasonic frequencies (likely noise), much of it buried below the typical DSD64 quantization noise from 35kHz onward.
Addendum 2: (April 2, 2017)
A friend just e-mailed me about his disappointment with Bob Dylan's Triplicate, available as 24/192 at the usual places. Apparently there's a nasty 27-28kHz noise that runs through the songs:
I don't know if others have "heard" this. If nobody has complained, I can only assume that a frequency like this above 24kHz is inaudible and undetectable.
Sadly, this is yet another example of why many of the pop/rock "high resolution" albums IMO are worthless. Why pay more for 24/192? With a noise peak like that, might as well downsample the whole thing from 192kHz to 48kHz. Plus the noise floor doesn't demand >16-bits, and the album is compressed to DR8 so by all means also dither it down to 16-bits.
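For the curious, here is a rough sketch of what "dither it down to 16-bits" means in practice, using plain TPDF (triangular) dither. It is only an illustration of the bit-depth step: it assumes samples already scaled to the range -1.0 to 1.0, uses a fixed random seed for repeatability, and does not attempt the 192kHz to 48kHz resampling, which needs a proper low-pass filter and is best left to a dedicated resampler. The function name is hypothetical.

```python
import random

def to_16bit_tpdf(samples, rng=None):
    """Quantize floats in [-1.0, 1.0) to 16-bit integers with TPDF dither."""
    rng = rng or random.Random(0)   # fixed seed by default so runs are repeatable
    out = []
    for x in samples:
        # Difference of two uniform values gives a triangular PDF over (-1, 1) LSB
        dither = rng.random() - rng.random()
        q = round(x * 32768 + dither)
        out.append(max(-32768, min(32767, q)))   # clip to the int16 range
    return out

print(to_16bit_tpdf([0.0, 0.25, -0.5, 0.999]))
```

The dither adds a tiny amount of noise, at most about one and a half 16-bit steps of error per sample, in exchange for keeping the quantization error uncorrelated with the music. On a recording whose own noise floor is already well above the 16-bit limit, that trade costs nothing audible.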