I misread your question.
Consider the fact that Google can recognize song lyrics and tell you the name of the song.
Consider the fact that Google also processes mobile searches from sampled audio input.
For example: "Hey Google, closest Chinese restaurant to me."
So are you optimizing for audio input or for a keyword phrase?
You optimize the content for the audience, so it makes much more sense to offer the content on the appropriate channels: a music video on YouTube, and so on.
The point is that you are not providing an audio sample; you are in fact targeting a keyword phrase that the content contains.
Until Google becomes 100% accurate with pictures, we describe each picture with many specific attributes: file name, alt text, title text, etc.
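As a rough illustration of describing a picture in words, here is a minimal Python sketch that builds an `<img>` tag carrying a descriptive file name, alt text and title text. The helper name `img_tag` and the sample values are made up for the example.

```python
from html import escape


def img_tag(src, alt, title):
    """Build an <img> tag whose attributes describe the picture in words."""
    # escape() also escapes quotes, so the values are safe inside attributes.
    return f'<img src="{escape(src)}" alt="{escape(alt)}" title="{escape(title)}">'


# Hypothetical sample values, for illustration only.
tag = img_tag(
    "red-vintage-bicycle.jpg",
    "Red vintage bicycle leaning on a wall",
    "Red vintage bicycle",
)
print(tag)
```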
The same principles apply to "audio SEO", as the market and communities now call it:
file name, song name, song lyrics, artist, year of recording, etc.
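One possible way to put those attributes into text a crawler can read is a descriptive file name plus schema.org `MusicRecording` markup. The sketch below is only an illustration: the helper names `slugify` and `audio_metadata` and the sample song are hypothetical, and nothing here implies that this particular markup affects ranking.

```python
import json
import re


def slugify(text):
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def audio_metadata(song, artist, year, lyrics, extension="mp3"):
    """Return a descriptive file name and a schema.org JSON-LD string."""
    file_name = f"{slugify(artist)}-{slugify(song)}-{year}.{extension}"
    json_ld = {
        "@context": "https://schema.org",
        "@type": "MusicRecording",
        "name": song,
        "byArtist": {"@type": "MusicGroup", "name": artist},
        "datePublished": str(year),
        # Lyrics belong to the composition that this recording is a recording of.
        "recordingOf": {
            "@type": "MusicComposition",
            "name": song,
            "lyrics": {"@type": "CreativeWork", "text": lyrics},
        },
    }
    return file_name, json.dumps(json_ld, indent=2)


# Hypothetical sample values, for illustration only.
file_name, markup = audio_metadata(
    "Example Song", "Example Artist", 2015, "First line of the lyrics"
)
print(file_name)
print(markup)
```

The JSON-LD string would go inside a `<script type="application/ld+json">` element on the page that hosts the audio.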
So is SEO worth it? Yes.