[R] VLouvain: Louvain Community Detection Directly on Vectors, No Graph Construction
You have embeddings for your objects. You want to build a similarity graph and find communities, whether for GraphRAG, a recommender system, or just finding structure in your data. So you compute pairwise similarities, build the graph, run Louvain. Except now you have O(n^2) edges and everything crashes above ~15K nodes.
VLouvain reformulates Louvain to work directly on the embedding matrix. Degrees and modularity gains are computed from community-level vector sums, no edges involved. You maintain O(n*d) state instead of O(n^2). The result is mathematically identical to standard Louvain, not an approximation.
On Amazon Products (1.57M nodes, d=200), VLouvain completes in ~11,300 seconds. Every other method we tested (cuGraph, iGraph, GVE, NetworKit) fails before reaching half that scale.
One thing we didn't expect: Top-K sparsification doesn't save you. We built exact and approximate Top-K graphs via FAISS, and even at K=256 the partitions had NMI ~0.04 against the full graph. If you're truncating your similarity graph to make Louvain feasible, you're getting back essentially random communities.
As a drop-in replacement for graph construction in GraphRAG, indexing went from 3 hours to 5.3 minutes, retrieval recall improved from 37.9% to 48.8% on MultiHopRAG.
Paper (EDBT 2026): https://openproceedings.org/2026/conf/edbt/paper-72.pdf
[link] [comments]
Want to read more?
Check out the full article on the original site