The default memory pools contain a lot of different things.
One of these things is the structural information of the audio to be played.
Each sound that can potentially be played, even when streaming, needs to have its parameters ready.
Sounds, being at the bottom of the tree, are likely to require the most memory, mainly because you can end up with hundreds of sounds inside a single actor mixer tree. Actor mixers are the nodes that require the least memory. (You usually save more memory by removing a sound than by removing an actor mixer.)
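
To make the arithmetic concrete, here is a minimal sketch that totals the structural cost of a small hierarchy by node type. The per-node byte costs, the `Node` class, and the `structure_bytes` helper are all hypothetical placeholders chosen for illustration; they are not measured values or part of any real API, so substitute figures from your own profiling.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# ASSUMPTION: placeholder per-node costs in bytes, not real measurements.
ASSUMED_BYTES = {"actor_mixer": 100, "sound": 300}


@dataclass
class Node:
    """Hypothetical stand-in for one node of an actor mixer tree."""
    name: str
    kind: str  # "actor_mixer" or "sound"
    children: list[Node] = field(default_factory=list)


def structure_bytes(node: Node) -> dict[str, int]:
    """Sum the assumed structural cost per node kind over a subtree."""
    totals = {kind: 0 for kind in ASSUMED_BYTES}
    stack = [node]
    while stack:
        current = stack.pop()
        totals[current.kind] += ASSUMED_BYTES[current.kind]
        stack.extend(current.children)
    return totals


# One actor mixer holding 200 sounds.
root = Node(
    "Footsteps",
    "actor_mixer",
    [Node(f"step_{i}", "sound") for i in range(200)],
)
totals = structure_bytes(root)
print(totals)
```

Whatever the real per-node figures turn out to be, the total for sounds grows with the number of sounds, so trimming unused sounds is usually where the structural savings are.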