I have been "away" from this site for a while, and thus was delighted to see Robin's thoughtful comments.
I would put the issue, or at least its emphasis, in a somewhat different (but compatible) framework.
In neuroscience (behavior, mental operations) there is a delicate balance between modularity (separation) and interconnection (even interdependence). Too much of the former leads to an anarchy of function; too much of the latter leads to smudge!
If neither anarchy nor smudge is a workable extreme, then we have the question of RELATIVE AUTONOMY, and its DYNAMICS. One way to achieve relative autonomy is to have systems with an excitatory core of relatively low threshold and surround (lateral) inhibition with a higher threshold.
The consequence is that systems are broadly tuned at low activation levels and more narrowly tuned at higher activation levels.
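To make the point concrete, here is a minimal toy sketch, not a model of any real circuit. Every name and number in it (response, tuning_width, the Gaussian input profiles, the thresholds and the inhibitory weight) is an assumption chosen purely for illustration. A unit receives narrow excitatory input with a low threshold and broader inhibitory input with a higher threshold, and we measure the width of its tuning curve at half of its peak response for several overall drive levels.

```python
import math


def response(offset, drive, sigma_center=1.0, sigma_surround=3.0,
             theta_exc=0.05, theta_inh=1.0, w_inh=0.8):
    """Response of one unit to a stimulus `offset` away from its preferred value.

    All parameter values are illustrative assumptions, not fitted to data.
    """
    # Narrow excitatory input and broader inhibitory input, both scaled by drive.
    exc_input = drive * math.exp(-offset ** 2 / (2 * sigma_center ** 2))
    inh_input = drive * math.exp(-offset ** 2 / (2 * sigma_surround ** 2))
    # Excitation has a low threshold; inhibition only engages above a higher one.
    excitation = max(0.0, exc_input - theta_exc)
    inhibition = max(0.0, inh_input - theta_inh)
    # Net output cannot go below zero.
    return max(0.0, excitation - w_inh * inhibition)


def tuning_width(drive, span=10.0, step=0.01):
    """Width of the tuning curve at half of its peak, sampled on a grid."""
    n = int(round(2 * span / step))
    offsets = [-span + i * step for i in range(n + 1)]
    responses = [response(x, drive) for x in offsets]
    peak = max(responses)
    if peak == 0.0:
        return 0.0
    # Count grid points at or above half the peak and convert to a width.
    return step * sum(1 for r in responses if r >= 0.5 * peak)


if __name__ == "__main__":
    for drive in (1.0, 2.0, 4.0):
        print(f"drive={drive:.1f}  half-max width={tuning_width(drive):.2f}")
```

With these assumed parameters the inhibition is silent at the lowest drive, so the tuning is set by excitation alone. At the higher drives the surround inhibition engages and the half-max width shrinks. Note that the comparison only holds once the drive is comfortably above the excitatory threshold; very close to that threshold the curve looks narrow simply because only its tip is above threshold.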
When viewing a given system, surround or contextual factors can play a greater or lesser role. The key point is that this role need not be fixed, but can shift with the overall dynamics of the systems in question.
Here is a loose analogy. Early ethologists often spoke and wrote about early "appetitive" behavior followed by narrow "consummatory acts". The early appetitive phase reflected early activation, and was broadly tuned (e.g. "get food"). As the system got cranked up over time the focus tightened (e.g. bite this deer on the nose, in this way). Tinbergen's hierarchy model was based on such considerations.
I suspect (but it's hard to get hard data!) that many systems are broadly tuned during early and/or low levels of activation. Contextual factors become important. Later, these systems become "self-organized" and relatively autonomous from their surrounds. The story can be modelled, at least qualitatively, through center excitatory processes and higher threshold surround inhibitory processes.
It's a fact that vision and audition are for the most part separable channels, and it's a good thing they are! However, it is also well known that, under appropriate conditions (such as early attention), each modality can influence processing through the other modality. Context becomes important. The systems do not operate in total isolation.
So systems are partially isolated, perhaps. This relative isolation can vary with system dynamics. These dynamics can in principle be modelled. This has not, to my knowledge, been done very well.
Make any sense?
John