NYT Puzzle Score Analytics

Daily scores for Wordle, Connections, and Strands — scraped from Reddit and Bluesky and compared across communities. Updated hourly.


Connections

Recent Puzzle Rankings

[Plots: Connections score distribution, Connections score Gaussian, Connections D convergence]

Strands

Recent Puzzle Rankings

[Plots: Strands score distribution, Strands score Gaussian, Strands D convergence]

Wordle

Recent Puzzle Rankings

[Plots: Wordle score distribution, Wordle score Gaussian, Wordle D convergence]

Details

Connections
Points range from 0 to 180. '🟪' = 4 points, '🟦' = 3, '🟩' = 2, and '🟨' = 1. A correct row adds the points for each of its squares, and a mistake subtracts that row's points from the running score. Correct rows also earn a multiplier: the first correct row is worth x4, the second x3, and so on. Guess the hard categories first for maximum score!

Strands
Points range from 0 to ~135. '🟡' = 10 points, '🔵' = 5 points, and '💡' = -5 points. The position of the yellow dot is also taken into account: a blue word found before the spangram is worth fewer points. To get max points, find the spangram first!

Wordle
Points range from 0 to 150. '⬜' = 5 points, '🟨' = 3 points, and '🟩' = 0 points. Every square that is not green adds points, and a misplaced letter ('🟨') adds fewer than a missing one ('⬜'). Guess the word with as few mistakes as possible and get the lowest score you can!
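As a rough illustration of the Wordle scoring above, here is a minimal sketch that sums the point values over a shared emoji grid. The function name wordle_score and the handling of the dark-mode square '⬛' (treated like '⬜') are assumptions for this example, not something taken from the project's code.

```python
# Point values as stated in the text. Treating the dark-mode square
# "⬛" like "⬜" is an assumption made for this sketch.
POINTS = {"⬜": 5, "⬛": 5, "🟨": 3, "🟩": 0}


def wordle_score(grid: str) -> int:
    """Sum the points of every square in a shared Wordle emoji grid.

    Characters that are not scoring squares (newlines, spaces) add 0.
    """
    return sum(POINTS.get(ch, 0) for ch in grid)


# Hypothetical three-guess game.
example = "⬜🟨⬜⬜⬜\n🟨🟩⬜⬜🟨\n🟩🟩🟩🟩🟩"
print(wordle_score(example))
```

Six rows of five '⬜' squares give 6 x 5 x 5 = 150, which matches the stated upper bound.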

The puzzle's difficulty rank D is calculated from the mean score μ, the standard deviation σ, and the skewness δ of the score distribution. μ alone would make for a decent ranking, but the spread of the data also contributes: two puzzles can have a similar mean score, yet a wider spread of scores indicates a harder puzzle, so σ raises or lowers the difficulty ranking. The coefficient of variation, σ / μ, measures how far the data is spread out relative to the mean and is useful for comparing distributions with different means. δ accounts for how the distribution is skewed, towards high or low scores, which helps when scoring a distribution with outliers.

Finally, adjustable weights α and β are applied to the mean term and the spread term to balance their contributions.

For Wordle, a higher score means a harder puzzle, so difficulty increases with the mean score. The difficulty of Wordle puzzles is quantified as:

$$D = \delta \left[ \alpha \mu + \beta \frac{\sigma}{\mu} \right]$$

For Connections and Strands, lower scores indicate a harder puzzle, so the difficulty varies inversely with the mean of the score distribution. A small constant γ is added to the denominator to avoid dividing by zero. The difficulty of these puzzles is quantified as:

$$D = \delta \left[ \alpha \frac{1}{\mu + \gamma} + \beta \frac{\sigma}{\mu} \right]$$
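The two difficulty formulas can be sketched in a few lines. This is an illustration only: it reads the spread term as the coefficient of variation σ / μ described in the text, and the function names and default values for α, β, and γ are hypothetical placeholders, not the project's actual settings.

```python
def difficulty_wordle(mu, sigma, delta, alpha=1.0, beta=1.0):
    """Wordle: difficulty grows with the mean score.

    Assumes mu > 0; alpha and beta defaults are placeholder weights.
    """
    return delta * (alpha * mu + beta * sigma / mu)


def difficulty_inverse(mu, sigma, delta, alpha=1.0, beta=1.0, gamma=1e-9):
    """Connections / Strands: difficulty grows as the mean score falls.

    gamma keeps the 1 / (mu + gamma) term finite; the spread term
    sigma / mu still assumes mu > 0 in this sketch.
    """
    return delta * (alpha / (mu + gamma) + beta * sigma / mu)


# Hypothetical summary statistics, for illustration only.
print(difficulty_wordle(mu=10.0, sigma=2.0, delta=1.5))
print(difficulty_inverse(mu=100.0, sigma=20.0, delta=1.0))
```

With these forms, two puzzles with the same mean but different spread get different D values, which is the behavior the text describes.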

D* is a shrunk version of D computed via hierarchical empirical Bayes. Puzzles with few scores are pulled toward the grand mean across all puzzles for that platform — the less data behind a ranking, the more it shrinks. As scores accumulate, D* converges to D. This makes early rankings more conservative and comparable across puzzles with different sample sizes.
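The text does not spell out the estimator, so the following is only a sketch of the simplest shrinkage form with the stated behavior: a weighted average of a puzzle's own D and the platform's grand mean, with the weight on D growing with the number of scores n. The function name shrink and the prior-strength parameter k are assumptions introduced for this example.

```python
def shrink(d, n, grand_mean, k=20.0):
    """Pull a puzzle's D toward the grand mean when n is small.

    k is a hypothetical prior strength: at n == k the puzzle's own D
    and the grand mean are weighted equally. As n grows, the result
    approaches d.
    """
    w = n / (n + k)  # weight on the puzzle's own estimate
    return w * d + (1.0 - w) * grand_mean


# Hypothetical values: the same raw D backed by different sample sizes.
for n in (0, 20, 2000):
    print(n, shrink(d=10.0, n=n, grand_mean=5.0))
```

In a full empirical Bayes treatment, the prior strength would be estimated from the spread of D across puzzles rather than fixed by hand.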

Cross-platform comparison is computed analytically using the sampling distribution of the difference of means. The reported difference is Bluesky minus Reddit (BS − Reddit): positive means Bluesky players scored higher on average for that puzzle. P(BS higher) is the probability that the true Bluesky mean exceeds the true Reddit mean, given the observed scores and sample sizes. Values near 50% indicate the platforms are indistinguishable for that puzzle.
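One common way to compute such a probability is a normal approximation to the difference of two sample means. The sketch below assumes that approximation, with standard error sqrt(s_BS² / n_BS + s_Reddit² / n_Reddit); the text does not say which distribution the site actually uses, and the function name and arguments are hypothetical.

```python
import math


def prob_bluesky_higher(mean_bs, sd_bs, n_bs, mean_rd, sd_rd, n_rd):
    """Return (difference, P(BS higher)) under a normal approximation.

    Assumes independent samples and at least one nonzero standard
    deviation, so that the standard error is positive.
    """
    diff = mean_bs - mean_rd
    se = math.sqrt(sd_bs**2 / n_bs + sd_rd**2 / n_rd)
    z = diff / se
    # Standard normal CDF written with the error function.
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return diff, p


# Hypothetical per-platform summaries for one puzzle.
diff, p = prob_bluesky_higher(52.0, 10.0, 100, 50.0, 12.0, 80)
print(f"BS - Reddit = {diff:+.1f}, P(BS higher) = {p:.1%}")
```

Equal means give exactly 50%, and for a fixed difference the probability moves further from 50% as the sample sizes grow.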

The convergence plot shows how D evolves as more scores are collected. At each value of n, D is computed from the first n scores of each puzzle and averaged across all puzzles that have at least that many scores. A flat line means the ranking stabilizes quickly; a drifting line means early scores are not representative of the full distribution. Bluesky and Reddit are plotted separately, so differences in convergence behavior between platforms reflect differences in who posts early versus late.
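The averaging described above can be sketched as follows. The sketch assumes each puzzle's scores are stored in arrival order and that D at a given n is computed from the first n scores; the function name convergence_curve is hypothetical, and the plain mean stands in for the real D calculation to keep the example short.

```python
from statistics import mean


def convergence_curve(puzzles, stat, n_min=2):
    """Average stat(first n scores) over puzzles with at least n scores.

    puzzles: list of score lists, each in arrival order.
    stat:    function mapping a list of scores to a number (stand-in for D).
    Returns a dict {n: averaged value}.
    """
    n_max = max(len(scores) for scores in puzzles)
    curve = {}
    for n in range(n_min, n_max + 1):
        eligible = [scores for scores in puzzles if len(scores) >= n]
        curve[n] = mean(stat(scores[:n]) for scores in eligible)
    return curve


# Two hypothetical puzzles with different numbers of scores.
puzzles = [[10, 20, 30], [40, 60]]
print(convergence_curve(puzzles, stat=mean))
```

Because fewer puzzles qualify at large n, the right-hand end of such a curve rests on less data than the left-hand end.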

View on GitHub