The illusion of self-motion induced by moving visual stimuli ("vection") has typically been attributed to low-level, bottom-up perceptual processes. Past research has therefore focused primarily on how physical parameters of the visual stimulus (e.g., contrast or the number of vertical edges) affect vection. Here, we investigated whether higher-level, top-down cognitive processes, namely global scene consistency and spatial presence, also contribute to the illusion. These factors were manipulated indirectly by presenting either a natural scene (the Tübingen market place) or various scrambled, and thus globally inconsistent, versions of the same stimulus. Because scrambling prevented the stimulus from being perceived as a consistent 3D scene, it was expected to decrease spatial presence and thus impair vection. Twelve naive observers indicated the onset, intensity, and convincingness of circular vection induced by rotating visual stimuli presented on a curved projection screen (FOV: 54°×45°). Spatial presence was assessed using presence questionnaires. As predicted, scene scrambling impaired both vection and presence ratings on all dependent measures. Neither the type nor the severity of scrambling, however, had any clear effect. The data suggest that higher-level information (the interpretation of the globally consistent stimulus as a 3D scene and stable reference frame) dominated over low-level, bottom-up information (the additional contrast edges in the scrambled stimuli, which are known to facilitate vection). These results point to a direct relation between spatial presence and self-motion perception. We posit that stimuli depicting globally consistent, naturalistic scenes provide observers with a convincing spatial reference frame for the simulated environment, which allows them to feel "spatially present" therein.
We propose that this, in turn, increases the believability of the visual stimulus as a stable "scene" relative to which visual motion is more likely to be judged as self-motion. Not only low-level, bottom-up factors but also higher-level factors, such as the meaning of the stimulus, are thus relevant for self-motion perception and should receive more attention. This work has important implications both for our understanding of self-motion perception and for motion simulator design and applications.