clustermon

UMAP of the Pokédex

Building a multi-lens UMAP projection of every base-form Pokémon, then clustering and visualizing the result. Notes from the workbench.

Try the Cross-Lens Explorer →
Step 01

UMAP

dimensionality reduction over the Pokédex feature matrix

We loaded the canonical PokéAPI tables into a single SQLite database, flattened every species into one wide feature vector (stats, types, abilities, move pool, egg groups, color, shape, habitat, growth rate, generation), z-scored the numeric block, then ran UMAP nine times (once per lens) at two output dimensionalities each. Four lenses come from the structured features above; two more (flavor, sprite) are dense embeddings of Pokédex text and official artwork; two are type-supervised fine-tunes of those encoders (flavor-ft, sprite-ft); and all is the L2-normalized concatenation of the four structured sub-blocks. The 2-D output is for the eye; the 10-D output is for HDBSCAN later.
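A minimal sketch of the two preprocessing steps described above, assuming "z-scored" means per-column standardization and "L2-normalized concatenation" means each sub-block is row-normalized to unit length before stacking (the page does not spell out either detail). The function names and the toy block widths are hypothetical stand-ins.

```python
import numpy as np

def zscore(block):
    """Standardize each column to mean 0 / std 1; constant columns become 0."""
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero
    return (block - mean) / std

def l2_normalize_rows(block):
    """Scale every row to unit L2 norm; all-zero rows are left as zeros."""
    norms = np.linalg.norm(block, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)
    return block / norms

def build_all_lens(blocks):
    """Concatenate the row-normalized sub-blocks side by side."""
    return np.hstack([l2_normalize_rows(b) for b in blocks])

# Toy stand-in for 4 species (the real widths are 6 / 36 / 284 / 797).
stats = zscore(np.arange(24, dtype=float).reshape(4, 6))
types = np.eye(4, 5)       # one-hot style block
abilities = np.eye(4, 7)   # one-hot style block
moves = np.ones((4, 9))    # dense binary block
all_lens = build_all_lens([stats, types, abilities, moves])
```

Under this assumption every sub-block contributes the same norm to a row, so the 797-wide moves block cannot drown out the 6-wide stats block purely by size.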

Shape reduction · what UMAP actually did
| Lens | Input (rows × features) | Output (rows × 2) | Reduction ratio |
|---|---|---|---|
| all | 1,025 × 1,123 | 1,025 × 2 | 561.5× |
| stats | 1,025 × 6 | 1,025 × 2 | 3× |
| types | 1,025 × 36 | 1,025 × 2 | 18× |
| abilities | 1,025 × 284 | 1,025 × 2 | 142× |
| moves | 1,025 × 797 | 1,025 × 2 | 398.5× |
| flavor | 1,025 × 384 | 1,025 × 2 | 192× |
| flavor-ft | 1,025 × 384 | 1,025 × 2 | 192× |
| sprite | 1,025 × 512 | 1,025 × 2 | 256× |
| sprite-ft | 1,025 × 512 | 1,025 × 2 | 256× |
all-2d UMAP · every base-form Pokémon as a dot
UMAP scatter of every base-form Pokémon, with five spotlight species labeled.
1,025 grey dots = species. Labeled larger dots = the five spotlights below. The geometry is real: nearby dots are Pokémon whose 1,194-dim feature vectors UMAP placed close together.
five spotlight rows · line-level "what UMAP did"
| Pokémon | Stats · types · abilities | Feature density (1,194-d) | After (u0, u1) | 5 nearest UMAP neighbors |
|---|---|---|---|---|
| #0025 pikachu | stats 35/55/40/50/50/90 · types electric · ability static, lightning rod | 127/1,194 dims set (11% density) | (6.02, 4.13) | wattrel d0.01 · blitzle d0.05 · pawmo d0.06 · elekid d0.07 · mareep d0.07 |
| #0006 charizard | stats 78/84/78/109/85/100 · types fire / flying · ability blaze, solar power | 154/1,194 dims set (13% density) | (-1.47, 3.60) | magmar d0.06 · pyroar d0.12 · delphox d0.16 · typhlosion d0.16 · rapidash d0.18 |
| #0150 mewtwo (legendary) | stats 106/110/90/154/90/130 · types psychic · ability pressure, unnerve | 189/1,194 dims set (16% density) | (3.40, 4.76) | calyrex d0.05 · celebi d0.05 · azelf d0.08 · deoxys d0.09 · victini d0.11 |
| #0129 magikarp | stats 20/10/55/15/20/80 · types water · ability swift swim, rattled | 27/1,194 dims set (2% density) | (1.84, -2.18) | wiglett d0.09 · arrokuda d0.10 · tympole d0.10 · poliwag d0.12 · goldeen d0.12 |
| #0143 snorlax | stats 160/110/65/65/110/30 · types normal · ability immunity, thick fat | 160/1,194 dims set (13% density) | (-0.17, 6.53) | kecleon d0.02 · exploud d0.05 · oinkologne d0.07 · stantler d0.09 · greedent d0.10 |
# umap params for the 2-d viz pass (the 10-d clusterable pass uses min_dist=0.0)
import umap

umap.UMAP(
    n_components=2,
    min_dist=0.1,
    n_neighbors=30,
    metric="cosine",
    random_state=42,
    n_jobs=1,
    transform_seed=42,
).fit(X)  # X.shape = (1,025, up to 1,194)
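The "feature density" and "5 nearest UMAP neighbors" columns in the spotlight rows can be approximated with a few lines of NumPy. This is a sketch under two assumptions the page does not confirm: density is the share of nonzero dimensions, and neighbor distance is plain Euclidean distance in the 2-d embedding. The helper names are hypothetical; the coordinates are the five spotlight positions quoted above.

```python
import numpy as np

def feature_density(row):
    """Return (number of nonzero dims, fraction of dims that are nonzero)."""
    set_dims = int(np.count_nonzero(row))
    return set_dims, set_dims / row.size

def nearest_neighbors(coords, names, query, k=5):
    """k closest species to `query` by Euclidean distance in the embedding."""
    idx = names.index(query)
    dists = np.linalg.norm(coords - coords[idx], axis=1)
    order = [i for i in np.argsort(dists, kind="stable") if i != idx][:k]
    return [(names[i], round(float(dists[i]), 2)) for i in order]

# The five spotlight coordinates (u0, u1) from the table above.
names = ["pikachu", "charizard", "mewtwo", "magikarp", "snorlax"]
coords = np.array([
    [6.02, 4.13],
    [-1.47, 3.60],
    [3.40, 4.76],
    [1.84, -2.18],
    [-0.17, 6.53],
])
closest = nearest_neighbors(coords, names, "pikachu", k=2)
```

With only the five spotlights as candidates, Pikachu's closest spotlight is Mewtwo; on the full 1,025-point embedding the same function would return the much closer neighbors listed in the table.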
Step 01b

UMAP parameter sweep

140 fits across n_neighbors × min_dist, per lens

UMAP has two knobs that dominate the look of the projection. n_neighbors sets how many nearest neighbors each point uses to build the local manifold; small values (5) preserve fine-grained local detail, large values (100) emphasize global structure. min_dist sets the floor on point separation in the embedding: 0.0 lets clusters pack tight (best for downstream HDBSCAN), 0.5 leaves breathing room (better for the eye). Each panel below is one full UMAP fit at those settings; points are colored by primary type so you can see when type structure resolves and when it dissolves.

5 × 4 grid · rows = n_neighbors {5, 15, 30, 50, 100} · cols = min_dist {0.0, 0.1, 0.25, 0.5}
UMAP parameter sweep small-multiples for all lens
Scan top-to-bottom to see n_neighbors trade local for global structure; scan left-to-right to see min_dist loosen tight clumps into breathable shapes. The headline 2-d projection on Step 01 uses n_neighbors=30, min_dist=0.1 (row 3, col 2); the 10-d input to HDBSCAN uses min_dist=0.0 to give the clusterer maximum density to work with.
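The grid itself is easy to express as a loop over the two parameter lists. The sketch below keeps the fitting function pluggable so the layout can be checked without running UMAP; `sweep`, `umap_fit`, and `record_settings` are hypothetical names, and `umap_fit` assumes the umap-learn package is installed.

```python
from itertools import product

N_NEIGHBORS = [5, 15, 30, 50, 100]   # rows of the grid
MIN_DIST = [0.0, 0.1, 0.25, 0.5]     # columns of the grid

def sweep(X, fit_embedding):
    """Run one fit per grid cell; keys are 1-based (row, col) positions."""
    panels = {}
    for (r, nn), (c, md) in product(enumerate(N_NEIGHBORS, 1),
                                    enumerate(MIN_DIST, 1)):
        panels[(r, c)] = fit_embedding(X, n_neighbors=nn, min_dist=md)
    return panels

def umap_fit(X, n_neighbors, min_dist):
    """Real fit for one panel; imports umap-learn lazily."""
    import umap
    return umap.UMAP(n_components=2, n_neighbors=n_neighbors,
                     min_dist=min_dist, metric="cosine",
                     random_state=42).fit_transform(X)

def record_settings(X, n_neighbors, min_dist):
    """Stand-in that records the settings instead of fitting anything."""
    return (n_neighbors, min_dist)

panels = sweep(None, record_settings)
```

Swapping `record_settings` for `umap_fit` and passing a real feature matrix would produce the 20 embeddings for one lens.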
Step 02

HDBSCAN

density-based clusters over the 10-d UMAP outputs

HDBSCAN finds clusters of varying density without being told how many to look for. We feed it the 10-d UMAP output (the 2-d is for the eye; 10-d gives the clusterer room to find structure), run it across all nine lenses, and score each result against primary type as ground truth. Headline numbers at min_cluster_size (mcs) = 15: the all lens (L2-normalized concat of the four structured sub-blocks) scores ARI 0.933 and the types lens 0.958; the moves lens recovers type at ARI ≈ 0.54 without ever seeing a type label.

dial · min_cluster_size = 15 (choices: 8 · 15 · 30 · 60)
smaller mcs = more, smaller clusters · larger mcs = fewer, broader clusters. The table, scatter, and spotlight rows below swap to precomputed values for the selected mcs.
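cluster recovery per lens · truth = primary type

mcs=8

| Lens | Clusters | Noise | ARI vs type | ARI vs type-pair | NMI vs type |
|---|---|---|---|---|---|
| abilities | 31 | 150 (14.6%) | 0.139 | 0.104 | 0.377 |
| all | 18 | 35 (3.4%) | 0.933 | 0.486 | 0.952 |
| flavor | 17 | 537 (52.4%) | 0.108 | 0.076 | 0.281 |
| flavor-ft | 5 | 4 (0.4%) | 0.032 | 0.014 | 0.199 |
| moves | 17 | 145 (14.1%) | 0.561 | 0.303 | 0.633 |
| sprite | 14 | 592 (57.8%) | 0.155 | 0.117 | 0.371 |
| sprite-ft | 21 | 75 (7.3%) | 0.793 | 0.354 | 0.837 |
| stats | 2 | 0 (0.0%) | -0.002 | -0.001 | 0.007 |
| types | 18 | 0 (0.0%) | 0.958 | 0.463 | 0.985 |

mcs=15

| Lens | Clusters | Noise | ARI vs type | ARI vs type-pair | NMI vs type |
|---|---|---|---|---|---|
| abilities | 22 | 131 (12.8%) | 0.127 | 0.087 | 0.340 |
| all | 18 | 35 (3.4%) | 0.933 | 0.486 | 0.952 |
| flavor | 11 | 495 (48.3%) | 0.072 | 0.044 | 0.204 |
| flavor-ft | 5 | 4 (0.4%) | 0.032 | 0.014 | 0.199 |
| moves | 14 | 143 (14.0%) | 0.536 | 0.284 | 0.616 |
| sprite | 2 | 111 (10.8%) | -0.002 | 0.004 | 0.031 |
| sprite-ft | 19 | 58 (5.7%) | 0.800 | 0.376 | 0.839 |
| stats | 2 | 0 (0.0%) | -0.002 | -0.001 | 0.007 |
| types | 18 | 0 (0.0%) | 0.958 | 0.463 | 0.985 |

mcs=30

| Lens | Clusters | Noise | ARI vs type | ARI vs type-pair | NMI vs type |
|---|---|---|---|---|---|
| abilities | 14 | 97 (9.5%) | 0.120 | 0.073 | 0.282 |
| all | 16 | 62 (6.0%) | 0.969 | 0.452 | 0.962 |
| flavor | 8 | 518 (50.5%) | 0.069 | 0.040 | 0.184 |
| flavor-ft | 4 | 1 (0.1%) | 0.032 | 0.014 | 0.190 |
| moves | 8 | 25 (2.4%) | 0.249 | 0.119 | 0.484 |
| sprite | 2 | 111 (10.8%) | -0.002 | 0.004 | 0.031 |
| sprite-ft | 15 | 35 (3.4%) | 0.742 | 0.315 | 0.814 |
| stats | 2 | 0 (0.0%) | -0.002 | -0.001 | 0.007 |
| types | 16 | 29 (2.8%) | 0.994 | 0.434 | 0.995 |

mcs=60

| Lens | Clusters | Noise | ARI vs type | ARI vs type-pair | NMI vs type |
|---|---|---|---|---|---|
| abilities | 2 | 43 (4.2%) | 0.004 | 0.002 | 0.025 |
| all | 6 | 67 (6.5%) | 0.332 | 0.110 | 0.694 |
| flavor | 3 | 536 (52.3%) | 0.033 | 0.014 | 0.078 |
| flavor-ft | 2 | 55 (5.4%) | 0.020 | 0.009 | 0.097 |
| moves | 6 | 14 (1.4%) | 0.229 | 0.109 | 0.432 |
| sprite | 4 | 523 (51.0%) | 0.101 | 0.044 | 0.166 |
| sprite-ft | 6 | 157 (15.3%) | 0.414 | 0.134 | 0.642 |
| stats | 2 | 0 (0.0%) | -0.002 | -0.001 | 0.007 |
| types | 7 | 99 (9.7%) | 0.487 | 0.163 | 0.799 |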
types lens trivially recovers types (the input one-hots ARE the labels). stats lens can't (6 dims have no density structure). moves is the genuine finding: move pools predict type without supervision.
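For readers who want to see what the ARI column measures, here is a dependency-free sketch of the adjusted Rand index computed from the contingency counts. It is not the page's scoring code. Treating HDBSCAN's noise label (-1) as just another cluster is an assumption; the page does not say how noise points were handled when scoring.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same points."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Pairs of points that share a cell / a true class / a predicted cluster.
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)

# Cluster ids are arbitrary: a perfect partition scores 1.0 whatever the ids.
perfect = adjusted_rand_index(["fire", "fire", "water", "water"], [1, 1, 0, 0])
# A partition that cuts across every true class scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Scores near zero, like the stats lens at -0.002, mean the clustering agrees with primary type no better than chance.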
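baseline clusterers vs HDBSCAN · ARI vs primary type · k-sweep ∈ {6, 11, 18, 30, 50}

| Lens | HDBSCAN | k-means best | DBSCAN | GMM best | Best baseline | Δ vs HDBSCAN | Supervised UMAP + HDBSCAN | Δ vs best unsup |
|---|---|---|---|---|---|---|---|---|
| all | 0.933 | 0.788 (k=18) | 0.878 | 0.892 (k=18) | 0.892 (gmm@18) | +0.041 | 0.958 | +0.026 |
| stats | -0.002 | 0.025 (k=18) | 0.000 | 0.025 (k=30) | 0.025 (kmeans@18) | -0.027 | 0.033 | +0.008 |
| types | 0.958 | 1.000 (k=18) | 0.964 | 1.000 (k=18) | 1.000 (kmeans@18) | -0.042 | 0.960 | -0.040 |
| abilities | 0.127 | 0.116 (k=50) | 0.104 | 0.108 (k=30) | 0.116 (kmeans@50) | +0.011 | 0.283 | +0.156 |
| moves | 0.536 | 0.426 (k=11) | 0.231 | 0.377 (k=18) | 0.426 (kmeans@11) | +0.110 | 0.899 | +0.363 |
| flavor | 0.072 | 0.045 (k=30) | 0.001 | 0.046 (k=18) | 0.046 (gmm@18) | +0.026 | 0.481 | +0.409 |
| flavor-ft | 0.032 | 0.185 (k=18) | 0.032 | 0.167 (k=11) | 0.185 (kmeans@18) | -0.153 | 0.608 | +0.423 |
| sprite | -0.002 | 0.094 (k=11) | -0.001 | 0.093 (k=18) | 0.094 (kmeans@11) | -0.096 | 0.537 | +0.443 |
| sprite-ft | 0.800 | 0.745 (k=18) | 0.522 | 0.740 (k=18) | 0.745 (kmeans@18) | +0.055 | 0.934 | +0.134 |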
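baselines fit on the same 10-d UMAP coords as HDBSCAN. k-means + GMM are forced to pick k; HDBSCAN discovers it. A positive "Δ vs HDBSCAN" means HDBSCAN beat the best baseline; "Δ vs best unsup" is the supervised score minus the best unsupervised score in the row (HDBSCAN or a baseline).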
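The "Best baseline" and "Δ vs HDBSCAN" columns are simple arithmetic over the row. A small sketch, using the moves row from the table as input (the function name is hypothetical):

```python
def compare_to_baselines(hdbscan_ari, baselines):
    """Pick the best-scoring baseline and report HDBSCAN minus that score."""
    best_name = max(baselines, key=baselines.get)
    best_score = baselines[best_name]
    return best_name, best_score, round(hdbscan_ari - best_score, 3)

# moves row: HDBSCAN 0.536 vs k-means (k=11), DBSCAN, GMM (k=18).
moves_baselines = {"kmeans@11": 0.426, "dbscan": 0.231, "gmm@18": 0.377}
best_name, best_score, delta = compare_to_baselines(0.536, moves_baselines)
```

Because the table's deltas were presumably computed from unrounded scores, recomputing them from the rounded cells can differ by 0.001 on some rows.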
fine-tuned encoders · sprite + flavor → type1 contrastive

Two parallel fine-tunes ask the same question of two modalities: does in-domain type supervision unlock combat-type signal that the off-the-shelf encoder doesn't see? sprite-ft takes a ViT-B-32 vision tower and trains on 820 (sprite, type-prompt) contrastive pairs; flavor-ft takes MiniLM-L6-v2 and trains on the matching (Pokédex blurb, type-prompt) pairs. Both use 80/20 stratified splits, fp16 mixed precision, gradual unfreezing (last few blocks then full encoder), cosine LR with 100-step warmup, and early stopping on held-out test ARI. The prompt-encoder side stays frozen in both; the only thing changing is what the input encoder learns to look for.
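The learning-rate schedule is the most mechanical part of that recipe. Below is a minimal sketch of cosine decay with a 100-step linear warmup; the base learning rate and total step count are hypothetical, since the page does not report them, and the exact warmup shape used in training may differ.

```python
import math

def lr_at_step(step, base_lr, total_steps, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical settings for illustration only.
BASE_LR = 1e-4
TOTAL_STEPS = 1000
schedule = [lr_at_step(s, BASE_LR, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

The rate peaks at the end of warmup (step 99 here) and falls to zero at the final step, which keeps the late, fully unfrozen epochs from overwriting the pretrained weights too aggressively.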

before / after · HDBSCAN @ mcs=15 · ARI vs primary type
| Lens | ARI off-the-shelf | ARI fine-tuned (test) | ARI fine-tuned (train) | Δ test |
|---|---|---|---|---|
| sprite → sprite-ft | -0.002 | 0.371 | 0.928 | +0.373 |
| flavor → flavor-ft | 0.072 | 0.025 | 0.033 | -0.047 |
all-1025 ARI is 0.800 for sprite-ft and 0.032 for flavor-ft (train + test pooled); the test column above is the only number that defends against overfitting on a 1,025-sample dataset.
sprite-ft training curves · loss + test ARI/NMI
CLIP fine-tune training curves: train/test cross-entropy and test ARI/NMI over epochs
flavor-ft training curves · loss + test ARI/NMI
MiniLM fine-tune training curves: train/test cross-entropy and test ARI/NMI over epochs
the two iconic HDBSCAN plots · mcs=15
condensed tree (dendrogram)
HDBSCAN condensed-tree dendrogram for all lens
Vertical axis is λ (lambda), the inverse of the distance threshold at which clusters split or dissolve. Higher λ = denser neighborhood required. A cluster's persistence (its vertical extent on this tree) shows how stable it is across density scales. The dashed blue boxes mark the clusters HDBSCAN ultimately selected: long persistent bars beat short transient ones.
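The idea behind "long persistent bars beat short transient ones" can be written down directly. In the HDBSCAN formulation, a cluster's stability is the sum, over its points, of how far each point's exit λ sits above the λ at which the cluster was born. The sketch below computes that raw quantity on made-up λ values; the library's `cluster_persistence_` attribute is a related score that may be normalized differently, so treat this as an illustration of the concept rather than a reimplementation.

```python
def cluster_stability(lambda_birth, lambda_points):
    """Sum of (lambda at which each point leaves) - (lambda at cluster birth)."""
    return sum(lam - lambda_birth for lam in lambda_points)

# Hypothetical values: a cluster born at lambda = 0.5 whose three points
# fall out of it at lambda = 1.0, 1.5 and 2.0.
stability = cluster_stability(0.5, [1.0, 1.5, 2.0])
```

A cluster whose points stay together over a long λ range accumulates a large sum, which is why it wins the selection over a short-lived sibling.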
soft-membership heatmap
HDBSCAN soft-membership heatmap for all lens
Each row is one Pokémon; each column is one HDBSCAN cluster. Cell brightness = the soft-membership probability that point belongs to that cluster. Rows are sorted by primary (argmax) cluster so coherent bands appear along the diagonal. Rows whose probabilities sum to noticeably less than 1 are uncertain points; they sit on cluster boundaries or in low-density regions where the model is hedging (i.e., the soft analog of HDBSCAN noise).
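The row ordering and the "uncertain row" reading can be sketched on a toy membership matrix. In the hdbscan library such a matrix typically comes from `hdbscan.all_points_membership_vectors` on a clusterer fit with `prediction_data=True`; the values below are made up, and the 0.9 cutoff for "noticeably less than 1" is a hypothetical choice.

```python
import numpy as np

def summarize_membership(membership, threshold=0.9):
    """Primary cluster, row totals, heatmap row order, and uncertainty flags."""
    primary = membership.argmax(axis=1)           # argmax cluster per point
    total = membership.sum(axis=1)                # how much mass is assigned
    order = np.argsort(primary, kind="stable")    # rows grouped by cluster
    uncertain = total < threshold                 # the model is hedging here
    return primary, total, order, uncertain

# Toy matrix: 4 points x 3 clusters (made-up probabilities).
membership = np.array([
    [0.90, 0.10, 0.00],
    [0.10, 0.20, 0.30],   # sums to ~0.6: a boundary or low-density point
    [0.00, 1.00, 0.00],
    [0.05, 0.00, 0.95],
])
primary, total, order, uncertain = summarize_membership(membership)
```

Plotting `membership[order]` gives the banded heatmap layout described above.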
Step 02b

Library-canonical evals

GLOSH outlier scores · cluster persistence · UMAP diagnostics

Three eval surfaces the libraries themselves treat as canonical but the page didn't show until now. GLOSH outlier scores rank every point by how far it sits from any cluster's dense core (∈ [0,1]; 1 = extreme outlier). Cluster persistence quantifies how robustly each cluster survives across the density hierarchy: it is the numeric companion to the dendrogram above. UMAP diagnostics show per-region embedding quality so you can spot where the projection had to lie.

cluster persistence · all
LLM coherence 2.09 · 72% type-only · 83% agree
Sorted in descending order by cluster_persistence_. Color = relative persistence within this lens. Highest-persistence clusters tend to coincide with the highest LLM coherence scores (top-3 by persistence on the moves lens match clusters the judge rated 4 or 5), and noise-heavy lenses like flavor have lower median persistence. The numbers are grounded in domain truth, not just in each other.
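| cluster | size | persistence |
|---|---|---|
| #1 | 40 | 0.7865 |
| #2 | 27 | 0.6666 |
| #8 | 58 | 0.6177 |
| #9 | 58 | 0.6031 |
| #0 | 99 | 0.5423 |
| #13 | 33 | 0.5379 |
| #4 | 29 | 0.4703 |
| #6 | 79 | 0.4555 |
| #11 | 34 | 0.4492 |
| #7 | 63 | 0.4128 |
| #16 | 46 | 0.3867 |
| #3 | 131 | 0.3675 |
| #10 | 44 | 0.3600 |
| #12 | 40 | 0.3278 |
| #5 | 91 | 0.2973 |
| #14 | 30 | 0.2964 |
| #15 | 46 | 0.2365 |
| #17 | 42 | 0.2125 |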
top GLOSH outliers · all
mean=0.200 · p90=0.525 · max=0.846
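| Pokémon | Primary type | GLOSH score |
|---|---|---|
| #0250 ho-oh | fire | 0.846 |
| #0798 kartana | grass | 0.846 |
| #0795 pheromosa | bug | 0.843 |
| #0794 buzzwole | bug | 0.840 |
| #0426 drifblim | ghost | 0.831 |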
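The summary line (mean, p90, max) and the top-5 list are straightforward to derive from a per-point score array, which the hdbscan library exposes as `outlier_scores_` on a fitted clusterer. A sketch on a toy array: the first five scores are the ones shown above, and the two "filler" entries are hypothetical low scores added only so the statistics have something to average over.

```python
import numpy as np

def outlier_summary(scores, names, top_k=5):
    """Mean / 90th percentile / max of the scores plus the top_k outliers."""
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(-scores, kind="stable")[:top_k]  # ties keep input order
    return {
        "mean": float(scores.mean()),
        "p90": float(np.percentile(scores, 90)),
        "max": float(scores.max()),
        "top": [(names[i], float(scores[i])) for i in top],
    }

names = ["ho-oh", "kartana", "pheromosa", "buzzwole", "drifblim",
         "filler-a", "filler-b"]            # filler-* are hypothetical
scores = [0.846, 0.846, 0.843, 0.840, 0.831, 0.10, 0.05]
summary = outlier_summary(scores, names)
```

On the real 1,025-point array the same call would reproduce the mean=0.200 · p90=0.525 · max=0.846 line, assuming those figures were computed over all points.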
neighborhood
UMAP neighborhood diagnostic for all lens
Per-point Jaccard preservation of local neighborhoods (input ↔ embedding). Darker / cooler = better preserved.
local_dim
UMAP local_dim diagnostic for all lens
Local intrinsic dimensionality: regions where the embedding had to force high-d structure into 2-d.
pca
UMAP pca diagnostic for all lens
Embedding colored by PCA-RGB of input space. Smooth color transitions = global structure preserved.
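The neighborhood diagnostic can be approximated by hand: take each point's k nearest neighbors in the input space and in the embedding, then score the overlap with a Jaccard ratio. This sketch assumes Euclidean distance in both spaces and k=15 as a default, neither of which the page states (the UMAP fit itself used a cosine metric), so the numbers it produces would not necessarily match the plot.

```python
import numpy as np

def knn_sets(points, k):
    """k-nearest-neighbor index sets under Euclidean distance (self excluded)."""
    points = np.asarray(points, dtype=float)
    sq = (points ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T  # squared dists
    np.fill_diagonal(d2, np.inf)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in nearest]

def jaccard_preservation(high_dim, embedding, k=15):
    """Per-point |A ∩ B| / |A ∪ B| between input and embedding neighbor sets."""
    hi = knn_sets(high_dim, k)
    lo = knn_sets(embedding, k)
    return np.array([len(a & b) / len(a | b) for a, b in zip(hi, lo)])

# Toy check: a uniformly scaled copy keeps every neighborhood intact,
# while reversing the row order scrambles some of them.
X = np.array([[0.0], [1.0], [3.0], [7.0], [15.0], [31.0]])
kept = jaccard_preservation(X, 2.0 * X, k=2)
scrambled = jaccard_preservation(X, X[::-1], k=2)
```

A score of 1.0 means the point kept exactly the same neighbors after projection; a score near 0 marks a place where the embedding had to tear a neighborhood apart.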
all lens · 2-d UMAP colored by HDBSCAN cluster (mcs=15)
UMAP scatter colored by HDBSCAN cluster id, 18 clusters and 35 noise points.
Same 1,025 points as Step 01's grey scatter, now colored by the 18 clusters HDBSCAN found in the 10-d UMAP space. Noise points (×) are species too isolated to confidently belong to any cluster. Spotlights labeled with their cluster id; clusters defined in 10-d sometimes look overlapped in 2-d because dims 3-10 hide separations.
spotlight cluster membership · all lens · mcs=15
| Pokémon | Cluster · size | Prob. | Outlier | Top 5 same-cluster Pokémon |
|---|---|---|---|---|
| #0025 pikachu | #8 · 58 | 1.000 | 0.000 | elekid · togedemaru · emolga · dedenne · zebstrika |
| #0006 charizard | #7 · 63 | 1.000 | 0.000 | magmortar · centiskorch · heatran · magmar · ninetales |
| #0150 mewtwo | #9 · 58 | 0.765 | 0.235 | necrozma · mew · mesprit · calyrex · solgaleo |
| #0129 magikarp | #3 · 131 | 1.000 | 0.000 | feebas · finneon · arrokuda · tympole · luvdisc |
| #0143 snorlax | #5 · 91 | 1.000 | 0.000 | terapagos · kecleon · blissey · dudunsparce · audino |
interactive · open in a new tab

Each lens has its own self-contained datamapplot HTML: zoom, pan, search by Pokémon name, hover any point for its type and cluster id, cluster centroids labeled with the dominant type.

interactive · cross-lens explorer

Pick any Pokémon and see how each of the nine lenses describes its neighborhood. Same creature, nine different recommendation engines: watch them disagree. Click any neighbor sprite to navigate. This is the project's deliverable demo.

Open explorer →
What we learned

Three things the machinery told us

Findings that weren't visible until the pipeline produced numbers
  1. Move pools alone encode most of type identity. Without ever seeing a type label, the moves lens recovers primary type at ARI 0.54; with supervised UMAP, which does use the labels, it reaches 0.90. After the L2-balanced rebuild, the combined all lens reaches 0.93, close to its supervised ceiling of 0.96, meaning the structured features taken together carry the type signal almost completely. Because all includes the types one-hot block, moves (0.54 on its own) is best read as the strongest label-free contributor rather than the sole source of that score. The dual-type (type1, type2) ARI is a stricter measure and lands at 0.49 for all and 0.46 for types, suggesting secondary type is genuinely harder to recover from any single feature surface.
  2. Fine-tuning works for vision, but text remains harder: the modality matters. The pretrained sprite lens scored ARI -0.002 vs primary type; fine-tuning the ViT-B-32 vision tower on 820 (sprite, type-prompt) pairs lifted held-out test ARI to 0.371. Running the same recipe on the text side, fine-tuning MiniLM on (Pokédex blurb, type-prompt) pairs, leaves test ARI at 0.025, below the off-the-shelf 0.072. Pokédex blurbs are written without combat type in mind; even with supervision the encoder can only find what's in the input.
  3. GLOSH outliers on the flavor lens cleanly surface Ultra Beasts. Four of the top five by outlier score are Kartana (0.839), Celesteela (0.838), Pheromosa (0.837), and Nihilego (0.837), all members of the canonical "weird" Pokémon subgroup, surfaced with no supervision and no mention of "Ultra Beast" anywhere in the feature matrix. The eval surface was independently validated against domain truth.