What is PyAGC?
PyAGC is a Python library built on PyTorch and PyTorch Geometric for attributed graph clustering, unifying over 20 state-of-the-art methods like deep attentional embedding approaches and adaptive graph convolution under a simple encode-cluster-optimize framework. It handles everything from loading attributed graph datasets to training scalable models that run on graphs up to 111 million nodes using a single 32GB GPU, with mini-batch support and YAML-driven configs for quick experiments. Developers get reproducible benchmarks across 12 diverse datasets, complete with supervised accuracy metrics and unsupervised ones like modularity and conductance.
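For readers unfamiliar with the two structural metrics named above, here is a minimal pure-Python sketch of how modularity and conductance are commonly defined for an undirected, unweighted graph. It does not use PyAGC's API. The function names `modularity` and `conductance` and the toy graph are illustrative assumptions, and the library's own implementations may differ (for example, by averaging conductance over all clusters).

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q for an undirected, unweighted graph without self-loops.

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : labels[node] is the cluster id of that node
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # edges with both endpoints in the same cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    vol = defaultdict(int)  # sum of node degrees per cluster
    for node, d in degree.items():
        vol[labels[node]] += d
    return sum(intra[c] / m - (vol[c] / (2 * m)) ** 2 for c in vol)


def conductance(edges, labels, cluster):
    """cut(S, rest) / min(vol(S), vol(rest)) for the nodes S carrying `cluster`."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        in_u = labels[u] == cluster
        in_v = labels[v] == cluster
        cut += int(in_u != in_v)           # edge crosses the cluster boundary
        vol_in += int(in_u) + int(in_v)    # edge endpoints inside the cluster
        vol_out += int(not in_u) + int(not in_v)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_labels = [0, 0, 0, 1, 1, 1]

print(modularity(toy_edges, toy_labels))      # 5/14, roughly 0.357
print(conductance(toy_edges, toy_labels, 0))  # 1/7, roughly 0.143
```

Higher modularity and lower conductance both indicate clusters that are dense inside and sparsely connected to the rest of the graph, which is why they can be reported even when no ground-truth labels exist.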
Why is it gaining traction?
It stands out by fixing common pain points in attributed graph clustering: over-reliance on tiny homophilous graphs, poor scalability beyond 100k nodes, and inconsistent evaluations that ignore structural quality. The plug-and-play design lets you swap encoders or cluster heads via config tweaks, while production-tested mini-batching and GPU-accelerated KMeans make it deployable for real-world tasks like fraud detection. Clean docs, a PyPI install, and full benchmark reproducibility appeal to developers tired of scattered, non-scalable codebases.
Who should use this?
Graph ML engineers at companies like Ant Group who tackle user profiling, anti-money laundering, or community detection on large attributed graphs with tabular or textual features. Researchers who benchmark methods on heterophilous or massive datasets and need both label-alignment metrics (ACC, NMI) and graph-structure metrics. PyTorch Geometric users who want a drop-in library for attributed graph clustering without reinventing encoders or loaders.
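As a rough illustration of the label-alignment metrics mentioned above, the sketch below computes clustering accuracy (ACC) by searching for the best one-to-one mapping between predicted cluster ids and ground-truth classes, and NMI with arithmetic-mean normalization. It is pure Python and not PyAGC's code. The function names are assumptions, and the brute-force permutation search is only practical for a handful of clusters (real benchmarks typically use the Hungarian algorithm instead).

```python
from collections import Counter
from itertools import permutations
from math import log


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings of predicted ids to true classes."""
    counts = Counter(zip(y_pred, y_true))
    pred_ids = sorted(set(y_pred))
    true_ids = sorted(set(y_true))
    # Pad with placeholders so every predicted cluster gets a distinct target,
    # even when there are more predicted clusters than true classes.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        matched = sum(counts[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, matched)
    return best / len(y_true)


def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    p_true = Counter(y_true)
    p_pred = Counter(y_pred)
    mi = sum(
        c / n * log(c * n / (p_true[t] * p_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum(c / n * log(c / n) for c in p_true.values())
    h_pred = -sum(c / n * log(c / n) for c in p_pred.values())
    denom = (h_true + h_pred) / 2
    return mi / denom if denom > 0 else 1.0


# Hypothetical labels: cluster ids are swapped relative to the classes,
# and one node is assigned to the wrong cluster.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]

print(clustering_accuracy(y_true, y_pred))  # 5 of 6 nodes match after alignment
print(nmi(y_true, y_pred))
```

Both metrics are invariant to how cluster ids are numbered, which is what makes them usable for unsupervised methods evaluated against labeled datasets.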
Verdict
Grab it if you work on graph clustering: its scalability, docs, and benchmarks outweigh its early-stage maturity (19 stars, 1.0% credibility score). Low adoption means you should watch for edge cases, but it's a smart bet for PyTorch workflows.