Jon, this is certainly an awesome and interesting way to target audio on the web, which can be tremendously useful.
Given what you’ve got here, I suspect that you may be unaware of the W3C spec for media fragments which may make portions of what you’re attempting to do a bit easier (and also much more standardized). The spec is relatively broadly supported by most browsers, so it immediately makes things a tad easier from a plumbing perspective.
Some people will be somewhat familiar with the targeting technique as it’s similar to the one used by YouTube which lets users hot link to specific portions of video on their platform.
To summarize the concept, on most audio and video files one can add a #t=XXX the the end of a URL where XXX is the number in seconds into the file where one wants to start. One can target stretches of audio similarly with the pattern #t=XXX,YYY where XXX is the start and YYY is the stop time for the fragment, again in seconds.
As an example I can use it to specifically target the audio on a particular standalone audio file like so:
As an added bonus, on this particular page with audio, you’ll notice that you can play the audio and if you pause it, the page URL in your browser should automatically refresh to indicate the particular audio timestamp for that particular position! Thus in your particular early example it makes things far easier to bookmark, save, or even share!
It would be nice if the user could queue up the particular audio segment and press pause, and then annotate the audio portion of the page using such a targeting segment. Then one could potentially share a specific URL for their annotation (in typical Hypothesis fashion) that not only targets the original page with the embedded audio, but it could also have that audio queued up to the correct portion (potentially with a page refresh to reset the audio depending on the annotation.)
The nice part is that the audio can be annotated within the page on which it originally lived rather than on some alternate page on the web that requires removing the context and causing potential context collapse. It also means one doesn’t have to host an intermediate page to have the whole thing work.
I’m curious if the scheme may make putting all the smaller loose pieces together even easier, particularly for use within Hypothesis? and while keeping more of the original context in which the audio was found?
I also suspect that these types of standards could be used to annotate audio in much the same way that the SoundCloud service handles their audio annotations, though in a much more open way. One would simply need to add on some additional UI to make the annotations on such audio present differently.