A number of regions in the human brain are known to be involved in processing natural scenes, but the field has lacked a unifying framework for understanding how these different regions are organized and interact. We provide evidence from functional connectivity and meta-analyses for a new organizational principle, in which scene processing relies upon two distinct networks that split the classically defined parahippocampal place area (PPA). The first network of strongly connected regions consists of the occipital place area/transverse occipital sulcus and posterior PPA, which contain retinotopic maps and are not strongly coupled to the hippocampus at rest. The second network consists of the caudal inferior parietal lobule, retrosplenial complex, and anterior PPA, which connect to the hippocampus (especially anterior hippocampus), and are implicated in both visual and nonvisual tasks, including episodic memory and navigation. We propose that these two distinct networks capture the primary functional division among scene-processing regions, between those that process visual features from the current view of a scene and those that connect information from a current scene view with a much broader temporal and spatial context. This new framework for understanding the neural substrates of scene-processing bridges results from many lines of research, and makes specific functional predictions.
There are a number of brain regions that only show high levels of activity for full photographic scenes, not individual objects. By examining their relationships to each other and the rest of the brain, we argue that there are two types of scene-processing regions that belong to two separate networks. One network, which overlaps most of the visual system, processes visual features of the current view of the world, such as spatial layout. Another network, which is connected to long-term memory, puts this moment-by-moment information in context, allowing us to navigate through environments and remember past events in familiar locations. These two groups of brain regions cooperate to help us understand the world and our place in it.
Natural scene perception has been shown to rely upon a distributed set of cortical regions, including the parahippocampal place area (PPA; Epstein and Kanwisher, 1998), retrosplenial complex (RSC; O’Craven and Kanwisher, 2000), and the occipital place area [OPA; also called the transverse occipital sulcus (TOS); Nakamura et al., 2000; Hasson et al., 2003]. More recent work has suggested that the picture is even more complicated, with multiple subdivisions within PPA and the possible involvement of the parietal lobe (Baldassano et al., 2013). Although there has been substantial progress in understanding the functional properties of each of these regions and the differences between them, the field has lacked a coherent framework for summarizing the overall architecture of the human scene-processing system.
There is a long history of proposals for partitioning the visual system into separable components with different functions, such as spatial frequency channels (Campbell and Robson, 1968); what versus where/how pathways (Mishkin et al., 1983; Kravitz et al., 2011); or magnocellular, parvocellular, and koniocellular streams (Kaplan, 2004). With respect to natural scene perception, one can imagine at least two separable functions: processing the specific visual features present in the current glance of a scene, and connecting that to the stable, high-level knowledge of where the place exists in the world, what has happened here in the past, and what possible actions we could take here in the future. For most cognitive and physical tasks we undertake in real-world places, the specific visual attributes we perceive are just a means to this end, of recalling and updating information about the physical environment; “the essential feature of a landmark is not its design, but the place it holds in a city's memory” (Muschamp, 2006). The connection between place and memory has been recognized for thousands of years, reflected in the ancient Greek “method of loci” that strengthens a memory sequence by associating it with physical locations (Yates, 1966).
To determine whether moment-by-moment visual processing versus dependence on past experience is a major organizing principle of the brain, we take a data-driven approach to identifying scene-sensitive regions and clustering cortical connectivity. We first aggregate local high-resolution resting-state connectivity information into spatially coherent parcels, in order to increase signal to noise and obtain more interpretable units than individual voxels. We then apply hierarchical clustering to show that there exists a natural division in posterior human cortex that splits scene-related regions into two separate, bilaterally symmetric networks. The posterior network includes OPA and the posterior portion of PPA (retinotopic maps PHC1 and PHC2), while the anterior network is composed of the RSC, anterior PPA (aPPA), and the caudal inferior parietal lobule (cIPL). We then show that these two networks differ in their connectivity to the hippocampus, with the anterior network exhibiting much higher resting-state hippocampal coupling (especially to anterior hippocampus), suggesting that memory- and navigation-related functions are primarily restricted to the anterior network. We provide supporting evidence for this functional division from a reverse-inference meta-analysis of previous results from visual, memory, and navigation studies, and an atlas of retinotopic maps.
Based on these results, as well as a review of previous work, we propose that scene processing is fundamentally divided into two collaborating but distinct networks, with one focused on the visual features of a scene image and the other related to contextual retrieval and navigation. Under this framework, scene perception is less the function of a unified set of distributed neural machinery and more of “an ongoing dialogue between the material and symbolic aspects of the past and the continuously unfolding present” (Baker, 2012).
Materials and Methods
The majority of the data used in this study were obtained from the Human Connectome Project (HCP), which provides detailed documentation on the experimental and acquisition parameters for these datasets (Van Essen et al., 2013). We provide an overview of these datasets below.
The group-level functional connectivity data were derived from the 468-subject group–principal component analysis (PCA) eigenmaps, distributed with the June 2014 “500 Subjects” HCP data release. Resting-state fMRI data were acquired over four sessions (14 min, 33 s each), while subjects fixated on a bright cross-hair on a dark background, using a multiband sequence to achieve a TR of 720 ms at 2.0 mm isotropic resolution (59,412 surface vertices). These time courses were cleaned using the Oxford Centre for Functional MRI of the Brain independent component analysis-based Xnoiseifier (FIX; Salimi-Khorshidi et al., 2014), and then the top 4500 eigenvectors for each vertex were estimated across all subjects using Group–PCA (Smith et al., 2014). These data were used to perform the parcellation and network clustering, and to generate whole-brain maps (Figs. 1, 2a, 3a )
Because using the full dataset in its entirety would be computationally challenging to assess statistically, we performed more detailed analyses on a subset of 20 subjects (Figs. 2b,c, 3b–d ). For 20 subjects within the “500 Subjects” release with complete data (subject identifications 101006, 101107, 101309, 102008, 102311, 103111, 104820, 105014, 106521, 107321, 107422, 108121, 108323, 108525, 108828, 109123, 109325, 111413, 113922, and 120515), we created individual subject resting-state datasets by concatenating their four resting-state sessions (after removing the per-run means).
We identified group-level scene localizers (used only as functional landmarks) from a separate set of 24 subjects (see below). Subjects viewed blocks of stimuli from up to six categories: child faces, adult faces, indoor scenes, outdoor scenes, objects (abstract sculptures with no semantic meaning), and scrambled objects. Functional data were acquired on one of two GE MR 750 3 T scanners, with an in-place resolution of 1.56 mm, a slice thickness of 3 mm (with 1 mm gap), and a TR of 2 s; a high-resolution (1 mm isotropic) spoiled gradient-recalled acquisition in a steady state structural scan was also acquired to allow for transformation to MNI space.
The cIPL was defined using the Eickhoff–Zilles PGp probabilistic cytoarchitectonic map (Eickhoff et al., 2005; Baldassano et al., 2013). The hippocampus was divided into anterior and posterior subregions at MNI coordinate y = −21, consistent with previous studies (Poppenk et al., 2013; Zeidman et al. 2015).
Scene-localizer data were collected from 24 subjects (6 females; age range, 22–32, including one of the authors). Subjects were in good health with no history of psychiatric or neurological diseases, and with normal or corrected-to-normal vision. The experimental protocol was approved by the institutional review board of Stanford University. Subjects were recruited only at Stanford University and gave their written informed consent.
The 468-subject eigenmaps distributed by the HCP are approximately equal to performing a singular value decomposition on the concatenated time courses of all 468 subjects, and then retaining the right singular values scaled by their eigenvalues (Smith et al., 2014). This allows us to treat these eigenmaps as pseudo-time courses, since dot products (and thus Pearson correlations) between eigenmaps approximate the dot products between the original voxel time courses. We generated a voxel-level functional connectivity matrix by correlating the group-level eigenmaps for every pair of voxels and applying the Fisher z-transform (hyperbolic arctangent). We parcellated this 59,412 × 59,412 matrix into contiguous regions, using a generative probabilistic model (Baldassano et al., 2015). This method finds a parcellation of the cortex such that the connectivity properties within each parcel are as uniform as possible, making multiple passes over the dataset to fine-tune the parcel borders. We set the scaling hyperparameter to produce a manageable number of parcels, but our clustering results are similar for a wide range of settings for (producing between 140 and 216 parcels).
To identify PPA, RSC, and OPA, we deconvolved the localizer data from the 24 localizer subjects using the standard block hemodynamic model in AFNI (Cox, 1996), with faces, scenes, objects, and scrambled objects as regressors. The scenes > objects t statistic was used to define PPA (top 300 voxels near the parahippocampal gyrus), RSC (top 200 voxels near retrosplenial cortex), and OPA (top 200 voxels near the transverse occipital sulcus), with mask sizes chosen conservatively based on typical ROI volumes (Golarai et al., 2007). The ROI masks were then transformed to MNI space, summed across all subjects, and mapped to the closest vertices on the group cortical surface. The group-level ROI was then manually annotated as the cluster of highest overlap between the subject ROI masks. These ROIs are consistent with typical definitions in the literature (Julian et al., 2012).
Parcel-to-parcel and hippocampal functional connectivity
Given a parcellation, we computed the group-level connectivity between a pair of regions by taking the mean over all eigenmaps in each region, then correlating these mean eigenmaps (which, as described above, can be treated as pseudo-time courses) and applying the Fisher z-transform (hyperbolic arctangent). We computed subject-level connectivity in the same way, using the resting-state time course for each voxel rather than the eigenmap.
Connectivity between cortical parcels and the hippocampus was computed similarly, using eigenmaps (for group data) or time courses (for subject data) extracted from the hippocampal volume data distributed by the HCP. In order to focus on hippocampal connectivity differences among parcels, we used the mean gray time course regression version of the group data and regressed out the global time course from the subject data.
The 172 × 172 parcel functional connectivity matrix was converted into a distance matrix by subtracting every entry from the maximum entry. Hierarchical ward clustering (unconstrained by parcel position) was applied to the distance matrix to compute a hard clustering into 10 networks. After identifying the 16 parcels (8 per hemisphere) overlapping with scene-related regions, we computed a similar distance matrix for these parcels (subtracting every entry from the maximum entry) and applied classical multidimensional scaling to yield a two-dimensional visualization of its structure.
Meta-analysis and retinotopic field maps
Two reverse-inference meta-analyses were performed using the NeuroSynth website (Yarkoni et al., 2011). NeuroSynth is a set of open-source python tools for automatically extracting data from fMRI studies and computing activation likelihood maps, and the website hosts these tools (and associated datasets) for public use. Supplying a key word query identifies all studies whose abstract contains that key word, and then analyzes the activations reported in these queried studies. In addition to standard “forward inference” maps giving the probability p(activation|query) that a voxel will be activated in these studies, NeuroSynth generates “reverse-inference” maps giving the probability p(query|activation) that a voxel activation came specifically from this query set. Voxels appearing in the reverse-inference map, therefore, appear more often in the query set relative to the full set of (>10,000) fMRI studies in the database. This accounts for base rate differences in how often activation is observed in different brain regions.
Our meta-analyses can be viewed on-line at http://neurosynth.org/analyses/custom/dda0e003-efd0-4cfa/ and http://neurosynth.org/analyses/custom/9e6df59d-02df-4357/. The first used the query “scene,” and consisted of 47 studies. Manual inspection of all studies confirmed that they all studied the perception of environments, and 45 of 47 studies involved the presentation of visual scenes. The second meta-analysis used the query “episodic memory OR navigation OR past future,” which returned 125 studies that were nonoverlapping with the first query.
A volumetric group-level probabilistic atlas (Wang et al., 2014) was used to define retinotopic field maps. We computed the total probability mass of each map that fell within one of our two networks or in other regions of the cortex, and then normalized the sum of the three values to 100%. For visualization, the probability that a voxel belongs to any field map was computed as , where pi is the probability that the voxel falls within field map i.
Our primary dataset is a 1.8 billion element resting-state connectivity matrix distributed by the Human Connectome Project (Van Essen et al., 2013), which estimates the time course correlation between every pair of locations in the brain at 2 mm resolution based on a group of 468 subjects. Since we wish to understand the large-scale structure of visual cortex, it is helpful to abstract away from individual voxels and study the functional and connectivity properties of larger parcels. Rather than imposing a parcellation based on specific regions of interest, we used a data-driven approach to produce spatially coherent parcels tiling the cortical surface in a way that retains as much information as possible from the full connectivity matrix. This parcellation consists of 172 regions across both hemispheres, each of which contains surface vertices that all have very similar connectivity patterns with the rest of the brain. The connectivity matrix among these 172 parcels captures >76% of the variance in the original connectivity matrix, despite being dramatically smaller (by five orders of magnitude).
Clustering parcels into networks
To determine how these local parcels are organized into distributed networks, we performed hierarchical clustering to group together parcels with high functional connectivity (regardless of their spatial position). These networks are remarkably similar between hemispheres (despite not being constrained to be symmetric), as shown in the 10-network clustering in Figure 1.
Which of these networks are directly related to scene perception? We used data from a standard localizer in a separate group of subjects to define group-level regions of interest for scene-selective regions OPA, PPA, and RSC. We also anatomically identified cIPL as was done in a previous study (Baldassano et al., 2013), since this region has been shown to have functional connections to scene regions.
We found that these scene ROIs fell almost entirely onto two of the connectivity networks. A posterior network (dark blue), overlapping OPA and posterior PPA (pPPA), covered all of visual cortex outside of an early foveal cluster. An anterior network (magenta), overlapping cIPL, RSC, and anterior PPA, covered a parietal/medial-temporal network that includes anterior temporal and orbitofrontal parcels. This corresponds to a portion of known default mode regions, with other default mode regions being grouped into a separate network (green); a similar fractionation of the default mode has been proposed previously (Andrews-Hanna et al., 2010). Within the PPA, this anterior/posterior split occurred at approximately MNI coordinate y = −42 mm, with both segments of the PPA falling largely in the collateral sulcus and extending onto the parahippocampal gyrus.
We can visualize the connectivity differences among the parcels overlapping with scene-related regions using classic multidimensional scaling (Fig. 2a), which shows that the network clustering captures the primary dimension of variance in connectivity properties, separating the most posterior parcels overlapping OPA and pPPA from the most anterior parcels overlapping cIPL, RSC, and aPPA. To evaluate the reliability of this shift in connectivity properties within individual subjects, we measured the functional connectivity between these parcels and a reference parcel in the anterior network. We selected the reference parcel to be on the opposite side of the cortical surface (in order to avoid influences from local noise correlations) and to be as far anterior as possible; for dorsal parcels on the lateral surface (overlapping OPA and cIPL), the reference parcel overlapped RSC on the medial surface; and for ventral parcels on the medial surface (overlapping PPA), the reference parcel overlapped cIPL on the lateral surface. In both cases, we observed rapid increases in connectivity as we moved posterior to anterior across the network boundaries (Fig. 2b,c). Along the dorsal boundary, we see significant increases in connectivity to the RSC parcel when moving from the first to the second parcel (left: t(19) = 6.98, p < 0.001; right: t(19) = 6.35, p < 0.001; two-tailed paired t test), from the second to the third parcel (left: t(19) = 7.72, p < 0.001; right: t(19) = 6.16, p < 0.001), and from the third to the fourth parcel (right: t(19) = 2.44, p = 0.025). We observe a similar significant (though less dramatic) increase in connectivity to the cIPL parcel when moving from the first to the second PPA parcel (left: t(19) = 4.21, p < 0.001; right: t(19) = 2.68, p = 0.015) and from the second to the third PPA parcel (right: t(19) = 3.03, p = 0.007).
Connectivity with the hippocampus
Since the anterior scene network overlaps with default mode regions, while the posterior scene network does not, we predict that the anterior network should be more connected to the hippocampus (Buckner et al., 2008). To test this hypothesis, we measured the functional correlation at rest between mean hippocampal activity and the mean activity in each parcel within the posterior and anterior scene networks. As shown in Figure 3, there is a dramatic difference in hippocampal connectivity for parcels in the posterior network (overlapping with OPA and posterior PPA) compared with the anterior network (overlapping with RSC, cIPL, and anterior PPA). Moving posterior to anterior along the dorsal path, hippocampal connectivity first decreases slightly (first parcel to second parcel: left: t(19) = −3.04, p = 0.007; right: t(19) = 2.15 p < 0.04; two-tailed paired t test), then increases significantly when moving to the third parcel (left: t(19) = 5.62, p < 0.001; right: t(19) = 3.79, p = 0.001) and to the fourth parcel (left: t(19) = 4.17, p < 0.001; right: t(19) = 5.74, p < 0.001). Along the ventral path, hippocampal connectivity jumps from the first to the second parcel overlapping with PPA (left: t(19) = 5.27, p < 0.001; right: t(19) = 5.76, p < 0.001) and from the second to the third parcel (right: t(19) = 5.80, p < 0.001).
We also investigated whether this effect was being driven by a subregion of the hippocampus, by correlating the mean time course in both scene networks with the time courses of each posterior-to-anterior coronal slice of the hippocampus. Our results show that the entire hippocampus is more strongly connected to the anterior scene–network than the posterior scene–network, but this difference is especially large in the anterior hippocampus. To confirm this pattern of results, we divided the hippocampus into posterior and anterior subregions at MNI coordinate y = −21 (Poppenk et al., 2013; Zeidman et al., 2015) and correlated their mean time courses with the two scene–network time courses. This analysis confirmed that the anterior network is more strongly connected to both posterior (t(19) = 7.66, p < 0.001; two-tailed paired t test) and anterior (t(19) = 6.58, p < 0.001) hippocampus than is the posterior scene network, and that this anterior–network connectivity is larger in anterior hippocampus (t(19) = 3.29, p = 0.004); a repeated-measures ANOVA shows significant main effects of both hippocampal subregion (F(1,19) = 11.32, p = 0.003) and scene network (F(1,19) = 59.2, p < 0.001), and an interaction (F(1,19) = 7.03, p = 0.016). Group-level connectivity values are reported in Table 1. Note that both the anterior and posterior scene networks are closer to posterior hippocampus, ruling out a distance-based explanation for this pattern of results.
Comparison to meta-analyses and retinotopic atlas
The connectivity results described thus far suggest a functional division for scene-related regions, with some belonging to a posterior network and others belonging to an anterior network. To assess the functional significance of these two networks, we ran two reverse-inference meta-analyses using the NeuroSynth tool (Yarkoni et al., 2011). This system automatically extracts activation coordinates from many fMRI studies (>10,000 at the time of writing); given a particular set of studies, it can identify voxels that are more likely to be activated in this set of studies relative to the full set of studies. These voxels are therefore preferentially active in the query set compared with general fMRI experiments. Based on the areas involved, we hypothesized that the posterior network processes the current visual properties of the scene, whereas the anterior network incorporates episodic memories and contextual aspects of the scene. Thus, in Figure 4a, we compare meta-analyses for the query “scene” (47 studies) with the query “episodic memory, navigation, past future” (125 studies). Along the parahippocampal gyrus, we find that the visual scene activations tend to be posterior to the memory activations, and that the transition point corresponds almost exactly to the division between our two networks. Dorsally, we also observe a separation between the reverse inference maps, with scene and memory activations falling into our two separate networks. Overall, voxels significant only in the scene meta-analysis were concentrated in the posterior network (66% in posterior network, 18% in anterior, 16% in other), while voxels significant only in the memory/navigation meta-analysis were spread more widely across the cortex, but were concentrated more in the anterior than the posterior network (16% posterior, 42% anterior, 42% other). Voxels significant in both the scene and memory/navigation meta-analyses tended to fall near the border between the two networks and divided approximately equally among them (44% posterior, 53% anterior, 4% other).
Another prediction of our framework is that voxels whose activity is tied to specific locations in the visual field (i.e., retinotopic) should, as clearly visual voxels, be present only in the posterior scene network. In Figure 4b, we compared our networks to a group-level probabilistic atlas of retinotopic visual field maps (Wang et al., 2014). The vast majority of the probability mass in this atlas is concentrated in the posterior network. In early visual cortex (V1, V2, V3, hV4), all nonfoveal portions of the visual field maps fall in the posterior network (80% posterior, 0% anterior, 20% other). Ventrally, the posterior network covers VO1/2 (100% posterior, 0% anterior, 0% other), PHC1 (98% posterior, 2% anterior, 0% other), and the peak of the probability distribution for PHC2, which also extends slightly across the anterior network border (78% posterior, 22% anterior, 0% other). Laterally and dorsally, the posterior network includes most of the LO1/2 and TO1/2 maps (82% posterior, 0% anterior, 17% other), V3a and V3b (96% posterior, 0% anterior, 3% other), and IPS0–IPS5 (68% posterior, 4% anterior, 28% other), with SPL1 being the only map falling substantially outside the networks that we consider (18% posterior, 2% anterior, 80% other).
By combining a variety of data sources, we have shown converging evidence for a functional division of scene-processing regions into two separate networks (summarized in Fig. 5). The posterior visual network covers retintopically organized regions, including OPA and pPPA, while an anterior memory-related network connects cIPL, RSC, and aPPA. This division emerges from a purely data-driven network clustering, suggesting that this is a core organizing principle of the visual system.
Subdivisions of the PPA
The division of the PPA into multiple anterior–posterior subregions with differing connectivity properties replicates previous work (Baldassano et al., 2013) on an entirely different large-scale dataset, and shows that there is a strong connection between connectivity changes in PPA and the boundaries of retinotopic field maps. There is now a growing literature on anterior versus posterior PPA, including not only connectivity differences (Nasr et al., 2013; Silson et al., 2016a) but also the response to low-level (Nasr et al., 2014; Silson et al., 2015; Baldassano et al., 2016a,b; Watson et al., 2016) and high-level (Park et al., 2014; Aminoff and Tarr, 2015; Linsley and Macevoy, 2015; Marchette et al., 2015) scene features, as well as stimulation studies (Rafique et al., 2015). Our results place this division into a larger context, and demonstrate that the connectivity differences within PPA are not just an isolated property of this region but a general organizing principle for scene-processing regions.
The visual network
The visual network shows a close correspondence with the full set of retinotopic maps identified in previous studies (Brewer and Barton, 2012; Huang and Sereno, 2013; Wang et al., 2014). Previous measurements in individual subjects have also shown strong overlap between OPA and retinotopic maps, especially V3b and LO2 (Nasr et al., 2011; Bettencourt and Xu, 2013; Silson et al., 2016a), and between pPPA and VO2, PHC1, and PHC2 (Arcaro et al., 2009). The only portion of cortex with known retinotopic maps that is not clustered in this network is the shared foveal representation of early visual areas, which segregates into its own cluster, which is consistent with other work showing a peripheral eccentricity bias in the scene network (Malach et al., 2002; Goesaert and Op de Beeck, 2010; Huang and Sereno, 2013; Baldassano et al., 2016a).
OPA and posterior PPA have been shown to be closely related to the visual content of a stimulus. Even low-level manipulations of spatial frequency (Rajimehr et al., 2011; Kauffmann et al., 2015; Watson et al., 2016) or rectilinearity (Nasr et al., 2014) can drive responses in these regions. Higher-level visual features also drive response patterns in these regions (Bryan et al., 2016), and they are hypothesized to be involved in extracting visual environmental features that can be used for navigation (Marchette et al., 2015; Julian et al., 2016; Kamps et al., 2016). However, neither OPA nor posterior PPA show reliable familiarity effects (Epstein et al., 2007b; see further discussion below).
The functional distinction between pPPA and OPA is currently unclear. Previous work has speculated about the purpose of the apparent ventral and dorsal “duplication” of regions sensitive to large landmarks, proposing that it may be related to different output goals (e.g., action planning in OPA, object recognition in pPPA; Konkle and Caramazza, 2013), or to different input connections (e.g., lower visual field processing in OPA, upper visual field processing in pPPA; Kravitz et al., 2013; Silson et al., 2015). OPA and pPPA may also use information from different visual eccentricities, with OPA processing less peripheral, relatively high-resolution environmental features and pPPA processing more peripheral, large-scale geometry, and context (Baldassano et al., 2016a).
The memory and navigation network
The network of parahippocampal, retrosplenial, and posterior parietal regions that we identify has been emerged independently in many different fields of neuroimaging, outside of scene perception. Meta-analyses of internally directed tasks, such as theory of mind, autobiographical memory, and prospection, have identified this as a core, reoccurring network [Spreng et al., 2009; Kim, 2010; Yeo et al., 2015 (component C10 of )]. It comprises a subset of the broader default mode regions, but functional and anatomical evidence suggests that it is a distinct, coherent subnetwork (Andrews-Hanna et al., 2010, 2014; Yeo et al., 2011). The broad set of tasks that recruit this network have been summarized in various ways, such as “scene construction” (Hassabis and Maguire, 2007), “mnemonic scene construction” (Andrews-Hanna et al., 2010), “long-timescale integration” (Hasson et al., 2015), or “relational processing” (Eichenbaum and Cohen, 2014). A review of memory studies referred to this network as the posterior medial memory system, and proposed that it is involved in any task requiring “situation models” relating entities, actions, and outcomes (Ranganath and Ritchey, 2012).
The network has strong functional connections to the hippocampus, which has been implicated in a broad set of cognitive tasks involving “cognitive maps” for organizing declarative memories, spatial routes, and even social dimensions (Eichenbaum and Cohen, 2014; Schiller et al., 2015). During perception, the hippocampus binds together visual elements of an image (Olsen et al., 2012; Warren et al., 2012; Zeidman et al., 2015), which is especially important for scene stimuli (Lee et al., 2005a,b; Graham et al., 2006; Hodgetts et al., 2016) and then stores this representation into long-term memory (Ryan and Cohen, 2004). As we become familiar with an environment, the hippocampus builds a map of the spatial relationships between visual landmarks, which is critical for navigation (Morgan et al., 2011). Recalling or even imagining scenes also engages the hippocampus, especially anterior hippocampus, which may serve to integrate memory and spatial information (Zeidman and Maguire, 2016). Our results suggest that only the anterior scene regions interface directly with the hippocampus, potentially enabling the construction of hippocampal environmental representations, and retrieval of relevant memories and navigational information for a presented or imagined scene.
The specific functions of the individual components of this network have also been studied in a number of contexts. RSC appears to be most directly involved in orienting the viewer to the structure of the environment (both within and beyond the borders of the presented image) for the purpose of navigational planning; it encodes both absolute location and facing direction (Vass and Epstein, 2013; Epstein and Vass, 2014; Marchette et al., 2014), integrates across views presented in a panoramic sequence (Park and Chun, 2009), and shows strong familiarity effects (Epstein et al., 2007a,b). This is consistent with rodent neurophysiological studies, which have identified head direction cells in this region (Chen et al., 1994). RSC is not sensitive to low-level rectilinear features in nonscene images, such as objects or textures (Nasr et al., 2014), though it does show some preference for rectilinear features in images of 3D scenes (Nasr et al., 2014; Watson et al., 2016).
The specific properties of anterior PPA have been less well studied, since it was not recognized as a separate region within the PPA until recently. It has been shown to be driven more by high-level category information than by spatial frequency content (Watson et al., 2016), to represent real-world locations (even from perceptually distinct views; Marchette et al., 2015), to encode object co-occurrences (Aminoff and Tarr, 2015), and to represent real-world physical scene size (Park et al., 2014). Its representation of scene spaciousness draws on prior knowledge about the typical size of different scene categories, since it is affected by the presence of diagnostic objects (Linsley and Macevoy, 2015).
The cIPL (also referred to as posterior IPL, PGp, or the angular gyrus) has been proposed as a “cross-modal hub” (Andrews-Hanna et al., 2014) that connects visual information with other sensory modalities as well as knowledge of the past. It is more intimately associated with visual cortex than most lateral parietal regions, since it has strong anatomical connections to higher-level visual regions in humans and macaques (Caspers et al., 2011), and has a neurotransmitter receptor distribution similar to V3v and is distinct from the rest of the IPL (Caspers et al., 2013). It has been mostly ignored in the scene perception literature, primarily because it is not strongly responsive to standard scene localizers that show sequences of unfamiliar and unrelated scene images. For example, a study showing familiarity effects in cIPL described this location only as “near TOS” (Epstein et al., 2007b). The cIPL appears commonly, however, in studies involving personally familiar places, which are associated with a wealth of memory, context, and navigational information. It is involved in memory for visual scene images (Montaldi et al., 2006; Takashima et al., 2006; Elman et al., 2013; van Assche et al., 2016), learning navigational routes (Burgess et al., 2001; Bray et al., 2015), and even imagining past events or future events in familiar places (Hassabis et al., 2007; Szpunar et al., 2009). It can integrate information across space (Livne and Bar, 2016) and time (Lerner et al., 2011; Vilberg and Rugg, 2012), and has been shown in lesion studies to be critical for orientation and navigation (Kravitz et al., 2011). Our connectivity results and meta-analysis suggest that cIPL may play a prominent role in connecting visual scenes to the real-world location they depict.
Contrasting the two networks
Although our work is the first to propose the visual versus context networks as a general framework for scene perception, several previous studies have shown differential effects within these two networks. Contrasting the functional connectivity patterns of RSC versus OPA or lateral occipital cortex (LOC; Nasr et al., 2013) or anterior versus posterior PPA (Baldassano et al., 2013) show a division between the two networks, consistent with our results. Contrasting scene-specific activity with general (image or word) memory retrieval showed an anterior versus posterior distinction in PPA and cIPL/OPA, with only more anterior regions (aPPA and cIPL, along with RSC) responding to content-independent retrieval tasks (Johnson and Rugg, 2007; Fairhall et al., 2014). Our two-network division is also consistent with the “dual intertwined rings” model, which argues for a high-level division of cortex into a sensory ring and an association ring, the second of which is distributed but connected into a continuous ring through fiber tracts (Mesmoudi et al., 2013).
The anterior/posterior pairing of aPPA/pPPA and cIPL/OPA raises the question of whether there is a similar anterior/posterior division in RSC. Evidence for a division has been mixed: wide-field retinotopic mapping using natural scenes shows a partial retinotopic organization in RSC (Huang and Sereno, 2013); the response of RSC to visual rectilinear features appears to be limited to the posterior portion (Nasr et al., 2014); but a study of retinotopic coding in scene-selective regions failed to find any consistent topographic organization to RSC responses (Ward et al., 2010), and previous analyses of the functional properties of anterior versus posterior RSC have not found any significant differences (Park et al., 2014). A very recent study (Silson et al., 2016b) that carefully compared scene selectivity, functional connectivity, and retinotopic mapping has proposed that there are in fact two separable subregions in medial parietal cortex. The more anterior region is strongly connected to anterior PPA and is less retinotopic, likely corresponding to the parcel overlapping RSC on which we focus in this work. The more posterior region, which falls in the parieto-occipital sulcus, is more strongly driven by visual scenes, has a clear contralateral field bias, and is connected more evenly to the subregions of PPA (though still more to anterior than posterior PPA). Future work may confirm that this region should also be included as a part of the visual scene network, yielding a third interface between the two networks.
Another interesting question is how spatial reference frames differ between and within the two networks. Given its retinotopic fieldmaps, the visual network presumably represents scene information relative to the current eye position; previous work has argued that this reference frame is truly retina centered and not egocentric (Ward et al., 2010; Golomb and Kanwisher, 2012). The context network, however, likely transforms information between multiple reference frames. Models of spatial memory suggest that medial temporal lobe (possibly including aPPA) uses an allocentric representation, while the posterior parietal lobe (possibly including cIPL) is based on an egocentric reference frame, and that the two are connected via a transformation circuit in RSC that combines allocentric location and head direction (Byrne et al., 2007; Vann et al., 2009). There is some recent evidence for this model in human neuroimaging: posterior parietal cortex codes the direction of attention in an egocentric reference frame (even for positions outside the field of view; Schindler and Bartels, 2013), and RSC contains both position and head direction information (anchored to the local environment; Marchette et al., 2014; Shine et al., 2016). This raises the possibility that another critical role of cIPL could be to transform retinotopic visual information into a stable egocentric scene over the course of multiple eye movements. The properties of aPPA, however, are much less clear; it seems unlikely that it would use an entirely different coordinate system than neighboring PHC1/2, and some aspects of the scene encoded in aPPA, such as object co-occurrence (Aminoff and Tarr, 2015), do not seem tied to any particular coordinate system.
Finally, we note that a hard division into two networks is only a first-order description of the structure and function of scene regions. A number of these regions (e.g., PHC2) fall on a continuum from visual to contextual, and recent theories of information processing argue that almost all cortical regions accumulate information at varying timescales (Hasson et al., 2015). Task demands will also shift the functions of these regions (e.g., during top-down imagery; Dentico et al., 2014) and can lead to the dynamic reconfiguration of networks (Bray et al., 2015). Our proposed framework is intended to capture the primary functional dimension that distinguishes between scene-sensitive regions during natural perception, and to offer a starting point for future work on the organization of the human scene-processing system.
Based on data-driven connectivity analyses and analysis of previous literature, we have proposed a unifying framework for understanding the neural systems involved in processing both visual and nonvisual properties of natural scenes. This new two-network classification system makes explicit the relationships between known scene-sensitive regions, re-emphasizes the importance of the functional subdivision within the PPA, and incorporates posterior parietal cortex as a primary component of the scene-understanding system. Our proposal that much of the scene-processing network relates more to contextual and navigational information than to specific visual features suggests that experiments with unfamiliar natural scene images will give only a partial picture of the neural processes evoked in real-world places. Experiencing our visual environment requires a dynamic cooperation between distinct cortical systems to extract information from the current view of a scene, and then to integrate it with our understanding of the world and determine our place in it.
Note added in Proof - Minor revisions were made to the version that was published on-line October 10, 2016, as an Early Release, including adjustments to the labeling of Figures 2 and 3, and small wording changes in the Abstract and Materials and Methods.
Acknowledgments: We thank the Richard M. Lucas Center for Imaging, the Stanford Center for Cognitive and Neurobiological Imaging, and Michael Arcaro for helpful discussions.
The authors declare no competing financial interests.
Funding was provided by a National Science Foundation Graduate Research Fellowship (to C.B.) under Grant DGE-0645962, and by Office of Naval Research Multidisciplinary University Research Initiative (to D.M.B. and L.F.) Grant N000141410671. Data were provided in part by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.