The Challenge: Navigating Vast Networks with Limited Information

In our increasingly interconnected world, networks are everywhere. From social media connections to logistics routes, finding the quickest way from point A to point B, the shortest path, is fundamental. But as these networks grow ever larger, the challenge isn't just finding these paths; it's finding them faster and with significantly less data. Traditional methods often demand comprehensive knowledge of the entire graph, an impractical requirement for massive, dynamic networks. This creates a critical need for graph algorithms that can operate efficiently under such constraints, delivering approximate yet reliable results.
Imagine trying to find the fastest driving route across a continent without a complete map, only being able to query intersections as you go. This mirrors the complex problem faced by researchers and engineers: constructing an effective data structure for approximately answering shortest-path inquiries between pairs of vertices (s,t) in an undirected graph G, given limited query access to the graph.
The standard node query model is often employed for this. In this model, we start with an arbitrary seed vertex as the “access point” to the network. Querying a node v reveals its list of neighbors Γ(v). Unlike existing index-based solutions, which perform preprocessing on the whole graph, we aim for a solution that queries and stores only a small fraction of the nodes in the network.
Following the initialization of the data structure, the task is to answer multiple shortest path inquiries, where each inquiry SP(s,t) must be answered with a valid path p0, p1, …, pl with s = p0 and t = pl. The primary objective is to minimize the mean additive error over all inquiries. The additive error for an inquiry SP(s,t) is the difference between the length of the returned s–t path and the actual shortest distance between s and t in G. Depending on the application, one would like to minimize (a subset of) the additive error, running time, memory, and/or node queries.
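To make this setup concrete, here is a minimal sketch of the node query model and the additive error measure. The class and function names (`NodeQueryOracle`, `additive_error`) are illustrative, not from the paper:

```python
class NodeQueryOracle:
    """Node query model: the graph is hidden, and an algorithm learns it
    only by querying one vertex at a time."""

    def __init__(self, adjacency):
        self._adj = adjacency      # hidden adjacency lists: node -> set of neighbors
        self.queried = set()       # nodes whose neighbor lists have been revealed

    def query(self, v):
        """Reveal Γ(v), the neighbor set of v; counts toward the query budget."""
        self.queried.add(v)
        return set(self._adj[v])


def additive_error(returned_path, true_distance):
    """Additive error of one SP(s, t) answer: returned length minus true distance."""
    return (len(returned_path) - 1) - true_distance


# Path graph 0-1-2-3: returning the path [0, 1, 2, 3] for SP(0, 3) incurs error 0.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
oracle = NodeQueryOracle(adj)
print(oracle.query(1))                  # {0, 2}
print(additive_error([0, 1, 2, 3], 3))  # 0
```

Any algorithm in this model can be charged for exactly the vertices it queries, which is what makes sublinear query counts a meaningful goal.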
Leveraging Natural Network Structures: The Core-Periphery Model
Many real-world networks aren’t uniformly structured. Take social media, for instance. You often see a few highly connected individuals (influencers) and many smaller groups with fewer connections. This naturally occurring pattern is known as a core-periphery structure.
The degree distribution in social and information networks often follows a power-law distribution with exponent 2 < β < 3, which results in a core-periphery structure [9, 43, 50, 52, 63]. Here, the core is a highly connected component with good expansion properties, consisting of higher degree nodes. The periphery, conversely, is a collection of small, poorly connected components of low degree.
Our data structure is designed for networks exhibiting these structural characteristics. It takes advantage of the structure by first performing a preprocessing step to acquire (parts of) the core of the network, and then answering approximate shortest path inquiries by routing through the core. The working hypothesis is that pairs of nodes that are sufficiently far apart will typically have the shortest path between them (or close to it) routed through the higher degree parts of the network. This is somewhat reminiscent of approaches based on the highway dimension [1–3] for routing in road networks, although the structural characteristics of these network types differ considerably.
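As a rough illustration of this intuition only (WormHole's actual acquisition procedure is more refined and works under query access), one can split a fully known graph into a high-degree "core" and the remaining periphery with a simple degree cutoff; `degree_core` and `core_fraction` are names invented for this sketch:

```python
def degree_core(adj, core_fraction=0.1):
    """Crude core-periphery split: take the top `core_fraction` of nodes,
    ranked by degree, as the core; everything else is periphery."""
    nodes = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    k = max(1, int(core_fraction * len(nodes)))
    core = set(nodes[:k])
    periphery = set(nodes[k:])
    return core, periphery


# Star-like example: node 0 is connected to everyone, others only to 0.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
core, periphery = degree_core(adj, core_fraction=0.2)
print(core)  # {0}
```

The point of the hypothesis above is that, in heavy-tailed networks, shortest paths between distant node pairs tend to pass through exactly this kind of high-degree set.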
Introducing WormHole: A Two-Phase Pathfinding Algorithm
WormHole represents a significant advancement in finding shortest paths with reduced data. WormHole builds an explicit hierarchical core-periphery type structure with a sublinear inner ring and provides a framework which uses this structure to answer shortest path inquiries. The algorithm operates in two distinct phases, offering a streamlined approach to network exploration and pathfinding.
The Structural Decomposition Phase
The first phase is a crucial preprocessing step, in which we decompose the graph into three parts, storing only the smallest one: a highly dense subgraph of sublinear size. It is well documented that social networks exhibit a core-periphery structure; see, e.g., [43, 50, 52, 63] and the many references within. The core is a highly connected component with good expansion properties and smaller effective diameter.
The periphery, denoted P, consists of smaller isolated communities that connect to the core but are sparsely connected internally, and whose union is of linear size [16]. Therefore, when answering shortest path inquiries, it is reasonable to first check if the two vertices are in the same peripheral community, and otherwise route through the core.
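To make the "isolated communities" picture concrete, here is a minimal sketch (the function name is ours) that, given a core set, returns the connected components of the periphery, i.e., of the graph with the core vertices removed. Two vertices lie in the same peripheral community exactly when they share a component here:

```python
def periphery_components(adj, core):
    """Connected components of the graph restricted to non-core vertices.
    Each component corresponds to one peripheral community."""
    seen, components = set(), []
    for start in adj:
        if start in core or start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.add(v)
            stack.extend(u for u in adj[v] if u not in core and u not in seen)
        components.append(comp)
    return components


# Star graph with core {0}: each leaf is its own peripheral community.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(periphery_components(adj, core={0}))  # [{1}, {2}, {3}, {4}]
```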
The Routing Phase: Answering Shortest-Path Queries
The second phase is where the algorithm (approximately) answers shortest path inquiries of the form SP(s,t) for arbitrary vertex pairs (s,t). This phase leverages the pre-computed core-periphery structure for efficient querying.
Given a query SP(s,t), WormHole does the following. First, it checks if the two vertices are in the same peripheral component by performing a truncated BiBFS from both s and t up to depth two. If the two trees collide, it returns the shortest path between s and t. Otherwise, WormHole continues both BFS traversals until it reaches the outer ring (from both s and t). From here, it takes a single step to reach the inner ring, and then performs a restricted BiBFS on the subgraph induced by the inner ring vertices. We note that the choice of BiBFS here is arbitrary, and we can use any shortest-path algorithm (including modern index-based approaches, initialized only on the inner core) as a black-box to find a shortest path in the inner ring.
A figure illustrates a few typical cases encountered by the algorithm; in the first two cases the algorithm returns a true shortest path, and in the third case the returned path is not a shortest path (thus incurring a nonzero additive error). We stress that a single decomposition is subsequently used to answer all shortest path queries. Theorem 1.1 provides a strong theoretical guarantee on the performance of WormHole. It is worth emphasizing that our notion of approximation is inspired by practical relaxations and is distinct from the one usually considered in theoretical works.
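The routing logic can be caricatured in a few lines. The sketch below is a deliberate simplification under strong assumptions: it returns distances rather than explicit paths, collapses the outer-ring/inner-ring distinction into a single core set, assumes full access to the graph, and uses plain BFS in place of BiBFS; both function names are ours, not the paper's:

```python
from collections import deque


def bfs_layers(adj, source, max_depth):
    """BFS from `source` up to `max_depth`, returning {node: distance}."""
    dist = {source: 0}
    q = deque([source])
    while q:
        v = q.popleft()
        if dist[v] == max_depth:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return dist


def route(adj, core, s, t, local_depth=2):
    """Simplified WormHole-style routing: first try to meet within the
    depth-limited neighborhoods of s and t (the same-community check);
    otherwise report the length of a path forced through the core."""
    ds = bfs_layers(adj, s, local_depth)
    dt = bfs_layers(adj, t, local_depth)
    meet = set(ds) & set(dt)
    if meet:
        return min(ds[v] + dt[v] for v in meet)
    # Fallback: combine full BFS distances at core vertices only.
    ds_full = bfs_layers(adj, s, len(adj))
    dt_full = bfs_layers(adj, t, len(adj))
    through_core = [ds_full[v] + dt_full[v]
                    for v in core if v in ds_full and v in dt_full]
    return min(through_core) if through_core else None


# Path graph 0-1-2-3-4-5 with core {2, 3}:
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(route(adj, {2, 3}, 0, 3))  # 3 (the local searches meet)
print(route(adj, {2, 3}, 0, 5))  # 5 (routed through the core)
```

In this toy example both answers happen to be exact; in general, forcing the path through the core is what can introduce a (small) additive error, as in the third case of the figure.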
Conclusion
The ability to find shortest paths faster and with less data is transformative for analyzing vast and complex networks. The WormHole algorithm, with its innovative two-phase approach leveraging the inherent core-periphery structure of many real-world networks, offers a powerful solution. By focusing data acquisition and processing on the most relevant parts of the network—its dense core—it dramatically reduces the computational burden while still providing high-quality approximate shortest paths. This approach not only pushes the boundaries of graph algorithms but also paves the way for more scalable and efficient applications in diverse fields, from social network analysis to logistics and beyond. As networks continue to grow, algorithms like WormHole will be indispensable for making sense of their intricate connections.
Authors:
(1) Talya Eden, Bar-Ilan University (talyaa01@gmail.com);
(2) Omri Ben-Eliezer, MIT (omrib@mit.edu);
(3) C. Seshadhri, UC Santa Cruz (sesh@ucsc.edu).
This paper is available on arXiv under a CC BY 4.0 license.




