June 3, 2025
A few weeks ago I opened a group chat where seven close friends had their locations shared. To my surprise, their locations formed a near-perfect line spanning from Rochester, MN to Richmond, VA. Curious about the odds, I explored the data further using the tools at my disposal and open source data. What follows is my approach. Enjoy!
I started by treating each friend's location (longitude, latitude) as a coordinate and plotting those points on a graph. I immediately shared the plot with my friends, and that visual alone caught their attention:
"This is statistically significant"
"OMW to West Virginia"
"But what does it mean?"
I could have stopped the discussion there. The key takeaway was clear: something unusual is happening. However, I saw an opportunity to take a few more steps to validate this intuition. Below is the truncated data and the initial plot I shared:
This is a pretty remarkable fit, but it assumes that the Earth is flat. For reference, the length of the line is ~830 miles. The vertical drop due to the Earth's curvature over this distance is ~22 miles. If we extrapolate beyond this segment, the estimate completely falls apart.
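As a rough check on that ~22 mile figure, here is a minimal sketch. It assumes the "drop" refers to the sagitta, meaning the largest gap between the curved arc and the straight chord joining its endpoints; the post does not say which measure was used. The names `R_MILES`, `arc_miles`, and `sagitta` are mine.

```python
import math

R_MILES = 6371 / 1.609344   # Earth's mean radius (6371 km) converted to miles
arc_miles = 830             # approximate length of the line, from the text

theta = arc_miles / R_MILES                     # central angle of the arc, in radians
# ASSUMPTION: "drop" = sagitta, the max gap between the arc and its chord
sagitta = R_MILES * (1 - math.cos(theta / 2))

print(f"central angle: {math.degrees(theta):.1f} degrees")
print(f"sagitta: {sagitta:.1f} miles")
```

Under that assumption the result lands close to the number quoted above.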
I adjusted the model to a spherical geometry and built a parametric best-fit great circle through the points, leveraging basic linear algebra in Python (SVD). This allowed me to confirm that the alignment was not an artifact of 2D projection.
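A minimal sketch of one way to fit a great circle with SVD, not necessarily the exact code behind this post. It assumes the usual convention of x through 0°N 0°E and z through the North Pole, and the function names are hypothetical. The points are converted to unit vectors, and the right singular vector with the smallest singular value is taken as the normal of the best-fit plane through the Earth's center.

```python
import numpy as np

def to_unit_vectors(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3D unit vectors (x, y, z)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack([
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ])

def fit_great_circle(lat_deg, lon_deg):
    """Return the plane's unit normal and each point's angular residual (radians)."""
    X = to_unit_vectors(lat_deg, lon_deg)   # needs at least 3 points
    # No mean-centering: a great-circle plane must pass through Earth's center.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    normal = vt[-1]                         # direction of least variance
    residuals = np.arcsin(np.clip(X @ normal, -1.0, 1.0))
    return normal, residuals

# Synthetic check (NOT the friends' data): points scattered near the equator.
lats = [0.2, -0.1, 0.0, 0.1, -0.2]
lons = [-100, -95, -90, -85, -80]
normal, residuals = fit_great_circle(lats, lons)
print("normal:", normal)
print("residuals (degrees):", np.degrees(residuals))
```

For points hugging the equator, the normal should point almost straight along the polar axis, which makes for an easy sanity check before feeding in real coordinates.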
The result? An even stronger fit than expected. To be precise, the R² value was 0.9985, an exceptionally close alignment. Here is the resultant plot:
The R² here is calculated from the residuals, but instead of the linear distance to the line (or plane), each residual is the angular distance along the sphere to the great circle. For those wondering, the actual equation of the Rochester-Richmond line is:
r(θ) = R·[cos(θ) · v + sin(θ) · w]
where R = 6371 km, and v and w are orthogonal unit vectors spanning the best-fit plane found by SVD. The pair is not unique; one example is v = <-0.223, 0.783, -0.580> and w = <-0.868, 0.110, 0.484>.
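To make the equation concrete, here is a small sketch that plugs the rounded v and w from above into r(θ) and converts a few points back to latitude and longitude. The axis convention in `to_lat_lon` (z through the North Pole, x through 0°N 0°E) is my assumption, and both function names are hypothetical.

```python
import numpy as np

R_KM = 6371.0
v = np.array([-0.223, 0.783, -0.580])   # rounded values quoted in the text
w = np.array([-0.868, 0.110, 0.484])

def line_point(theta):
    """Point on the line, in km from Earth's center, at parameter theta (radians)."""
    return R_KM * (np.cos(theta) * v + np.sin(theta) * w)

def to_lat_lon(p):
    """ASSUMED convention: z through the North Pole, x through 0N 0E."""
    x, y, z = p / np.linalg.norm(p)
    return np.degrees(np.arcsin(z)), np.degrees(np.arctan2(y, x))

print("v . w =", v @ w)   # near zero if v and w are orthogonal
for theta in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    lat, lon = to_lat_lon(line_point(theta))
    print(f"theta = {theta:.2f} rad -> lat {lat:.1f}, lon {lon:.1f}")
```

Because v and w are rounded to three decimals, the dot product is only approximately zero and the points sit a hair off the exact radius R.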
Explore the length of the line for yourself below!
Most of the line manages to bypass major landmasses, which alone is interesting. However, it passes through or close to a few locations of note:
Fort Wayne, IN
Fernando de Noronha Archipelago, Brazil
Karlamilyi National Park, Australia
Chuuk, Micronesia
Vancouver, Canada
If you want to know how close you live to the Rochester-Richmond line, type in your ZIP code below:
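For anyone who would rather compute the distance by hand, here is a hedged sketch. It takes the normal of the line's plane as the cross product of the rounded v and w given earlier, then measures the angle between a location and that plane. The ZIP-to-coordinate lookup is left out, so the hypothetical `distance_to_line_km` takes latitude and longitude directly, and the axis convention (z through the North Pole, x through 0°N 0°E) is an assumption.

```python
import numpy as np

R_KM = 6371.0
v = np.array([-0.223, 0.783, -0.580])   # rounded values quoted in the text
w = np.array([-0.868, 0.110, 0.484])
normal = np.cross(v, w)                  # perpendicular to the plane of the line
normal /= np.linalg.norm(normal)

def distance_to_line_km(lat_deg, lon_deg):
    """Great-circle distance (km) from a location to the nearest point on the line."""
    lat, lon = np.radians([lat_deg, lon_deg])
    # ASSUMED convention: z through the North Pole, x through 0N 0E
    p = np.array([np.cos(lat) * np.cos(lon),
                  np.cos(lat) * np.sin(lon),
                  np.sin(lat)])
    return R_KM * np.arcsin(min(1.0, abs(p @ normal)))

print(distance_to_line_km(41.08, -85.14))    # Fort Wayne, IN (approximate coordinates)
print(distance_to_line_km(34.05, -118.24))   # Los Angeles, CA (approximate coordinates)
```

If the assumed convention matches the one used for the fit, the Fort Wayne distance should come out small and the Los Angeles one large.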
To understand how unlikely this is, I built a quick and dirty population-weighted Monte Carlo simulation. I had to make a few assumptions to speed up the process:
I only looked at the contiguous United States.
I assumed that while location is random, it is weighted heavily toward large cities. I took the 100 biggest cities in the United States and used each city's population as its "pull" factor.
Since this was a group chat of Notre Dame graduates, I added an extra pull factor close to South Bend, accounting for Midwestern skew.
I added an extra pull from Chicago since there is a disproportionate number of young alums in the Windy City.
Notre Dame actually publishes which states alumni end up in directly after graduation, so I gathered all the data available from 2014-2024, and used that as a general guideline to find the weight factors later on.
I coded up an initial guess, created a heatmap, and did the same for the ND alumni data. I optimized the weight factors (city pull, ND pull, Chicago pull, and their corresponding spreads), and found the following distribution:
This is not perfect, but it is good enough for a project like this.
Note that on the right, the population centers match the state centroid, rather than the actual population centers. This is because the ND alumni information only shows the states. This is also why we cannot use the alumni data as our distribution: there may be many ND alums in New York, but the vast majority are in NYC, not Pratts Hollow.
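The pull-factor idea can be sketched as a mixture of Gaussian blobs: pick a center with probability proportional to its pull, then scatter the point around it. This is only an illustration under my own assumptions. The five centers, their pull values, the alumni bonuses, and `spread_deg` are placeholders rather than the optimized weights from this project, and `sample_locations` is a hypothetical name.

```python
import numpy as np

# (name, lat, lon, pull): rough illustrative values, NOT the fitted weights
PULLS = [
    ("New York",    40.71,  -74.01, 8.3),
    ("Los Angeles", 34.05, -118.24, 3.9),
    ("Chicago",     41.88,  -87.63, 2.7 + 3.0),  # city pull + hypothetical alumni bonus
    ("Houston",     29.76,  -95.37, 2.3),
    ("South Bend",  41.68,  -86.25, 4.0),        # hypothetical Notre Dame pull
]

def sample_locations(n, spread_deg=1.0, rng=None):
    """Draw n (lat, lon) pairs: choose a center by pull, then add Gaussian scatter."""
    rng = np.random.default_rng() if rng is None else rng
    centers = np.array([[lat, lon] for _, lat, lon, _ in PULLS])
    weights = np.array([pull for *_, pull in PULLS], dtype=float)
    idx = rng.choice(len(PULLS), size=n, p=weights / weights.sum())
    return centers[idx] + rng.normal(scale=spread_deg, size=(n, 2))

friends = sample_locations(7, rng=np.random.default_rng(42))
print(friends)
```

A fuller version would use all 100 cities, give each pull its own spread, and discard points that fall outside the contiguous United States.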
Next, I generated seven random points on the map, based on the population distribution created previously. The great circle best fit is then calculated for these points using SVD. Just as before, we find the R² of that line, store the value, and then repeat. I decided to run this trial 100,000 times, which, running in the background, took long enough for my wife and me to get through an episode of How I Met Your Mother. Here is the resulting distribution:
This is even crazier than I thought! After 100k simulations, this line, which I stumbled upon by complete accident, is still among the most unlikely outcomes.
By weighting the population distribution, we made it less random, so this leftward skew makes sense. The weights that we added clump towards the Midwest, so any sample with data points at the coasts could create a relatively straight East-West line. This is the nature of the United States. Low R² values are rare since the country has directionality to it, and we are selecting a small number of data points.
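The trial loop described above might look something like the sketch below. Two things in it are assumptions on my part. First, the post does not spell out the total sum of squares behind its angular R², so this version uses the squared angles between each point and the mean direction of the sample. Second, `box_sampler` is a stand-in that draws uniformly from a rough bounding box instead of the population-weighted distribution, so its numbers will not match the results here.

```python
import numpy as np

def great_circle_r2(latlon_deg):
    """Fit a great circle by SVD and score it with an angular R^2."""
    lat, lon = np.radians(np.asarray(latlon_deg, dtype=float)).T
    X = np.column_stack([np.cos(lat) * np.cos(lon),
                         np.cos(lat) * np.sin(lon),
                         np.sin(lat)])
    normal = np.linalg.svd(X, full_matrices=False)[2][-1]
    resid = np.arcsin(np.clip(X @ normal, -1.0, 1.0))   # angle to the fitted circle
    # ASSUMPTION: baseline spread = angle from each point to the mean direction
    mean_dir = X.mean(axis=0)
    mean_dir /= np.linalg.norm(mean_dir)
    total = np.arccos(np.clip(X @ mean_dir, -1.0, 1.0))
    return 1.0 - np.sum(resid**2) / np.sum(total**2)

def box_sampler(n, rng):
    """Placeholder sampler: uniform over a rough contiguous-US bounding box."""
    return np.column_stack([rng.uniform(25, 49, n), rng.uniform(-125, -67, n)])

def run_trials(sampler, n_trials, n_points=7, seed=0):
    """Repeat: draw n_points locations, fit a great circle, record its R^2."""
    rng = np.random.default_rng(seed)
    return np.array([great_circle_r2(sampler(n_points, rng))
                     for _ in range(n_trials)])

r2_values = run_trials(box_sampler, n_trials=1_000)
print(f"mean {r2_values.mean():.3f}, median {np.median(r2_values):.3f}")
```

Swapping `box_sampler` for a population-weighted sampler and raising `n_trials` to 100,000 would bring this closer to the experiment described here.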
Take these four randomly generated examples from the total 100k simulations:
Example 1
Example 2
Example 3
Example 4
Even in these random cases, the best fit line is often still a decent fit, so this distribution makes sense. Because there is always the chance of a terrible fit, the long left tail pulls the mean below the median.
To better visualize the data, I plotted a cumulative distribution function. This essentially sorts the values from smallest to largest and shows the fraction of simulations at or below each one. The Rochester-Richmond line value is the vertical blue line on the right, and the intersecting horizontal line is the fraction of simulations that scored at or below it:
Zooming in on the top right intersection a bit:
Absolutely wild. It turns out that in this simulation, the Rochester-Richmond line was better than 99.98% of instances. Only 20 of the 100k simulated lines were a better fit.
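The percentile arithmetic is easy to reproduce. In this sketch, `simulated` is a stand-in array built only to mirror the counts above (20 values beating the observed R² out of 100,000), not the real simulation output, and both helper names are hypothetical.

```python
import numpy as np

def empirical_cdf(values):
    """Sorted values and the fraction of samples at or below each one."""
    x = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, len(x) + 1) / len(x)
    return x, p

def fraction_below(values, observed):
    """Share of simulated values strictly below the observed one."""
    return np.mean(np.asarray(values) < observed)

observed_r2 = 0.9985
# Stand-in data: 99,980 worse fits and 20 better fits, mirroring the counts above
simulated = np.concatenate([np.full(99_980, 0.90), np.full(20, 0.9999)])

x, p = empirical_cdf(simulated)
print(f"{fraction_below(simulated, observed_r2):.2%} of simulated lines fit worse")
```

Plotting `x` against `p` gives the cumulative curve, and the observed R² can be marked with a vertical line.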
I was curious what these lines looked like, and it turns out they all look similar. I have limited the visual to just a few for readability purposes, but see below:
All 20 instances took this largely east-to-west form. The pattern is not random noise; rather, it reflects a subtle clustering driven by population, geography, and social networks.
While this was a fun personal project, it is an example of how I approach any analytical challenge:
Start with a question, not a method. Define the problem, then create a structure to solve it. I started with a simple plot, and dove into additional questions as they arose.
Analyze and summarize findings. I will not claim that everyone should find this as interesting as I do, but I hope I was able to convince you of the significance and spark your curiosity as well.
Tailor the depth of analysis to the audience. My friends are all nerds: lawyers, doctors, engineers, consultants, etc. I knew they would appreciate the detail.
Prioritize actionable insights. While these results aren't life-changing, the takeaways and lessons satiated a core curiosity I had.
Couple these principles with access to powerful tools like Python and Excel, and an ability to use them, and I like to think I can tackle most problems that arise.
I did not mention this at the beginning, but the line is still somewhat around. The friend who was visiting Rochester, MN headed back to Chicago (which is still on the line!), and others shifted a bit as well. It is a little more wavy now that people have settled, but I guess my friend group is a statistically significant one.
The only real untapped curiosity I have from this project is whether, looking to the globe, any major city lines exist. Is there a Dublin-Paris-Rome line? It would be fun to investigate another time, but I need a break from maps to dive into something else. Thanks for reading!
Python packages used: pandas, matplotlib, numpy, scipy, re, sklearn, random. See code attached.