Project Overview

1. Foundation & Architecture

1.1 Project Overview: The Engine vs. The Layer

GitGalaxy is, fundamentally, a high-velocity, deterministic knowledge graph generator for planetary-scale codebases.

While the project is widely recognized for its interactive 3D WebGL visualizer—which renders repositories as explorable galaxies—this visualizer is ultimately just a presentation layer. The true core of the project is the analysis engine that powers it.

We built GitGalaxy because modern software engineering has outgrown traditional analysis tools. Abstract Syntax Trees (ASTs) are computationally expensive, deeply fragile, and require compilable code—meaning they fail completely when analyzing legacy mainframes or fragmented microservices. On the other end of the spectrum, Large Language Models (LLMs) suffer from severe context window limitations and output probabilistic, hallucination-prone architectural summaries.

GitGalaxy solves this by treating code files as raw structural text, scanning for functional anchors and architectural triggers to build a deterministic 3D knowledge graph in seconds. It seamlessly handles mid-file language switching, assesses the architectural ratio of test files to logic, and extracts invaluable project structure data that standard parsers ignore.

1.2 The blAST Paradigm: Bypassing LLMs & ASTs

To achieve this, GitGalaxy abandons the AST entirely in favor of a novel algorithm: blAST (Bypassing LLMs and ASTs).

The engine hunts for the universal structural markers of logic across more than 50 languages and 250 file extensions. The blAST engine scans source code against regular expression (regex) profiles that indicate specific architectural intent. These heuristics identify:

* Structural Purity: Where does the actual logic live versus the literature and comments?
* Network Topology: How do files mechanically couple to one another?
* Behavioral Traits: Does this file act as a pure data producer, an orchestrator, or a UI view?
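To make the idea of regex profiles concrete, here is a minimal sketch of scanning raw text for structural markers without building a syntax tree. The profile names and patterns (`function_anchor`, `import_edge`, `comment_line`) are illustrative assumptions, not the actual blAST profiles, which this document does not list.

```python
import re
from collections import Counter

# Hypothetical regex profiles for illustration only; the real blAST
# profiles are not shown in this document.
PROFILES = {
    "function_anchor": re.compile(r"^[ \t]*(?:def|function|func|fn)\s+\w+", re.MULTILINE),
    "import_edge": re.compile(r"^[ \t]*(?:import|from|#include|require)\b", re.MULTILINE),
    "comment_line": re.compile(r"^[ \t]*(?:#(?!include)|//|--)", re.MULTILINE),
}

def scan_text(source: str) -> Counter:
    """Count how many times each profile matches in raw source text."""
    return Counter({name: len(rx.findall(source)) for name, rx in PROFILES.items()})

# The sample deliberately mixes Python and JavaScript-style lines; it does not
# need to compile, because the scan treats it as plain text.
sample = """\
# utility module
import os
from pathlib import Path

def load(path):
    // stray comment from another language
    return Path(path).read_text()

function render(view) { return view; }
"""

hits = scan_text(sample)
print(dict(hits))
```

Because the scan is purely lexical, the same input always yields the same counts, and broken or mixed-language files are handled like any other text. The trade-off is that patterns this simple will also match inside strings or unusual formatting.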

Hyper-Scale Velocity

By bypassing the compiler bottleneck and focusing on lexical alignment, blAST achieves processing velocities that traditional scanners cannot match:

* Peak Velocity: Parsed the 141,445 lines of the original Apollo-11 Guidance Computer assembly code in 0.28 seconds (an alignment rate of 513,298 LOC/s).
* Massive Monoliths: Processed the 3.2 million lines of OpenCV in just 11.11 seconds.
* Planetary Scale: Effortlessly maps the architectural topology of hyper-scale repositories like TensorFlow (7.8M LOC), Kubernetes (5.5M LOC), and FreeBSD (24.4M LOC).

1.3 The Hub and Spoke Architecture

Because the blAST engine creates a perfectly standardized, queryable dataset (outputting to SQLite and highly optimized JSON), it acts as a central "Hub." The raw telemetry generated by the engine powers a massive ecosystem of specialized "Spokes" designed for enterprise engineering operations:

Legacy Modernization: Automated pipelines that utilize the engine's graphs to map, slice, and refactor legacy COBOL monoliths into modern Java Spring Boot microservices.
Supply Chain Security: Zero-trust firewalls that verify the physical reality of dependencies against Unicode homoglyphs and malicious execution headers.
Boundary Cartography: Full API network maps that compare physical codebase routers against official Swagger documentation to hunt down "Shadow APIs."
Compliance & Auditing: High-speed log parsers and SBOM generators that run on the engine's foundational logic to track PII leaks and verify software bills of materials.
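As an illustration of the Boundary Cartography idea above, a "Shadow API" check can be reduced to a set difference between routes discovered in code and routes listed in the documentation. This is a sketch under assumptions: the function name `find_shadow_apis` and the `(method, path)` tuple representation are hypothetical, and extracting routes from source files or a Swagger document is left out.

```python
def find_shadow_apis(
    code_routes: set[tuple[str, str]],
    documented_routes: set[tuple[str, str]],
) -> dict[str, list[tuple[str, str]]]:
    """Compare routes found in code with documented routes (hypothetical helper)."""
    return {
        # Implemented in code but absent from the documentation.
        "shadow": sorted(code_routes - documented_routes),
        # Documented but not found in code.
        "stale_docs": sorted(documented_routes - code_routes),
    }

# Example data standing in for scanner output and a parsed Swagger file.
code_routes = {("GET", "/users"), ("POST", "/users"), ("GET", "/internal/debug")}
documented = {("GET", "/users"), ("POST", "/users"), ("DELETE", "/users/{id}")}

report = find_shadow_apis(code_routes, documented)
print(report["shadow"])
print(report["stale_docs"])
```

Reporting both directions is a design choice: undocumented routes are the security concern, while documented routes with no implementation usually point to stale documentation.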

1.4 Visualizing Complexity: The Artistic Layer

Code can be art. Logic can be art. Systems engineering can be art.

While the engine outputs forensic databases, humans are not optimized for processing rows of digits in a SQLite table; we are optimized for detecting patterns in nature and physics. The GitGalaxy Visualizer translates the engine's raw telemetry into a generative 3D environment.

GitGalaxy maps the calculated complexity to spatial position, movement, pulses, and colors, tapping into our evolutionary strengths to help us understand complex systems intuitively. It renders codebases with the following visual metaphors:

Files = Stars orbiting around an unseen black hole.
File Relationships (Dependencies) = Relative spatial locations of stars in the galaxy.
Inbound Imports (Gravity) = A star's pulse rate.
Functions / Classes = Satellites orbiting the stars.
Function Complexity = Arrangement, number, and size of satellites in a unit.
Languages Used = Base color overlays.
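The mapping from telemetry to visual properties can be pictured as a small data transformation. The sketch below derives a pulse rate from inbound import counts; the `Star` class, the `build_stars` helper, and the linear formula with its constants are all assumptions made for illustration, since the document does not specify how the visualizer computes pulse rates.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Star:
    """One file in the rendered galaxy (hypothetical structure)."""
    path: str
    inbound_imports: int
    pulse_hz: float

def build_stars(files, import_edges, base_hz=0.5, hz_per_import=0.25):
    """Map each file to a star whose pulse rate grows with inbound imports.

    import_edges is a list of (source, target) pairs. The linear formula is
    an illustrative assumption, not the visualizer's actual mapping.
    """
    inbound = Counter(target for _source, target in import_edges)
    return [
        Star(path, inbound[path], base_hz + hz_per_import * inbound[path])
        for path in files
    ]

files = ["a.py", "b.py", "c.py"]
edges = [("b.py", "a.py"), ("c.py", "a.py"), ("c.py", "b.py")]
stars = build_stars(files, edges)
for star in stars:
    print(star)
```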

1.5 System File Map: Hub & Spoke Architecture

The backend ecosystem is organized into strict domains, separating the core graphing logic from the operational toolsets:

```text
gitgalaxy/
├── galaxyscope.py              # The Main CLI Entry Point
├── ai_guardrails/              # Autonomous AI AppSec Constraints
├── core/                       # The Hub: File ingestion, network graphing, and lens routing
├── physics/                    # The Engine: Signal processing, ML analytics, and chronometers
├── recorders/                  # The Exporters: JSON, SQLite, and LLM-ready markdown generators
├── security/                   # The ML Models: XGBoost multiclass threat inference
├── standards/                  # The Universal Laws: Language heuristics and analysis schemas
└── tools/                      # The Spokes: Enterprise operations driven by the core graph
    ├── cobol_to_cobol/         # Legacy modernization and JCL analysis
    ├── cobol_to_java/          # Automated Spring Boot microservice forging
    ├── compliance/             # Zero-Trust SBOM generators
    ├── network_auditing/       # API boundary mappers (Shadow API hunting)
    ├── supply_chain_security/  # Quarantines, vault sentinels, and binary X-Rays
    └── terabyte_log_scanning/  # High-velocity PII leak and mainframe SMF telemetry hunters
```
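Since the exporters write to SQLite, a Spoke can in principle be as small as a SQL query over the graph. The sketch below shows that pattern with Python's standard `sqlite3` module. The schema (`files` and `imports` tables and their columns) and the `most_imported` helper are hypothetical stand-ins, because the real GitGalaxy schema is not documented in this section.

```python
import sqlite3

# Hypothetical schema: the real GitGalaxy table and column names are not
# documented here, so an in-memory database stands in for engine output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (path TEXT PRIMARY KEY, language TEXT, loc INTEGER);
    CREATE TABLE imports (source TEXT, target TEXT);
""")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("core/graph.py", "python", 420),
    ("core/ingest.py", "python", 310),
    ("tools/audit.py", "python", 150),
])
conn.executemany("INSERT INTO imports VALUES (?, ?)", [
    ("core/ingest.py", "core/graph.py"),
    ("tools/audit.py", "core/graph.py"),
    ("tools/audit.py", "core/ingest.py"),
])

def most_imported(db: sqlite3.Connection, limit: int = 5) -> list[tuple[str, int]]:
    """Rank files by inbound import count, the kind of query a Spoke might run."""
    rows = db.execute(
        "SELECT target, COUNT(*) AS inbound FROM imports "
        "GROUP BY target ORDER BY inbound DESC, target ASC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()

ranking = most_imported(conn)
print(ranking)
```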

1.6 Measuring Risk Exposure, Not "Code Quality"

GitGalaxy assesses deviations from organizational standards and projects them as localized heatmaps onto the generated 3D world. Importantly, GitGalaxy does not measure subjective "Code Quality." It measures Risk Exposure. Our measurements do not judge; they highlight. They function as sensors, not critics.

We do not assess "Bad Code"; we measure Cognitive Load Exposure—how hard it is for a human to work through the logic.
We do not assess "Missing Docs"; we assess Documentation Exposure—the risk to the team if a key person leaves.
We do not measure "System Security"; we measure Safety Exposure—the structural brittleness caused by a lack of try/except blocks and robust error handling.
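To show how an exposure can be a sensor reading rather than a verdict, here is a rough sketch that derives two ratios from raw Python text using regexes only. The formulas, the `exposure_signals` function, and the 0-to-1 scale are assumptions for illustration; the document does not state how GitGalaxy actually computes its exposure scores.

```python
import re

# Lexical markers for Python source; illustrative patterns, not blAST's own.
FUNC_RX = re.compile(r"^[ \t]*def\s+\w+", re.MULTILINE)
TRY_RX = re.compile(r"^[ \t]*try\s*:", re.MULTILINE)
COMMENT_RX = re.compile(r"^[ \t]*#", re.MULTILINE)

def exposure_signals(source: str) -> dict[str, float]:
    """Return hypothetical exposure ratios in [0, 1]; higher means more exposed."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    functions = len(FUNC_RX.findall(source))
    guarded = len(TRY_RX.findall(source))
    comments = len(COMMENT_RX.findall(source))
    # Share of functions with no try block to pair with (capped at zero exposure).
    safety = (1.0 - min(guarded, functions) / functions) if functions else 0.0
    # Share of non-empty lines that are not comment lines.
    documentation = (1.0 - comments / len(lines)) if lines else 0.0
    return {"safety_exposure": safety, "documentation_exposure": documentation}

sample = """\
# parse a config file
def parse(path):
    try:
        return open(path).read()
    except OSError:
        return ""

def render(data):
    return data.upper()
"""

signals = exposure_signals(sample)
print(signals)
```

The output is a pair of numbers a team can set thresholds on, which matches the "highlight, not judge" framing: the code reports a ratio and leaves the interpretation to the organization's standards.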

By visualizing these exposures as toggleable color overlays, the visualizer provides an intuitive non-numeric dashboard. It allows an architecture team to agree on standards and instantly see—without reading a single line of text—where their system might be drifting into dangerous territory.

1.7 Zero-Trust Privacy Protocol

Whether you are running the command-line engine or the WebGL visualizer, GitGalaxy operates on a strict Zero-Trust Privacy Model: Your code never leaves your computer.

No Data Transmission: Source code is never transmitted to any API, cloud database, or third-party LLM service.
Air-Gap Ready: The entire suite of tools is designed to run in highly secure, internet-disconnected environments.
Ephemeral Memory Processing: Repositories are unpacked into a volatile memory buffer and are automatically purged when the operation completes.
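The ephemeral processing idea can be sketched with the standard library: read an archive from an in-memory buffer and analyze its members without ever extracting them to disk. The `scan_archive_in_memory` function is a hypothetical example, not GitGalaxy's implementation, and it relies on ordinary Python garbage collection rather than any guaranteed secure wipe.

```python
import io
import zipfile

def scan_archive_in_memory(archive_bytes: bytes) -> dict[str, int]:
    """Count lines per file straight from an in-memory zip, without extracting to disk."""
    line_counts = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently so non-UTF-8 files do not abort the scan.
            text = archive.read(info).decode("utf-8", errors="replace")
            line_counts[info.filename] = len(text.splitlines())
    # The buffer and decoded text go out of scope here and become
    # eligible for garbage collection.
    return line_counts

# Build a tiny archive in memory to stand in for a repository snapshot.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("src/app.py", "import os\nprint(os.getcwd())\n")
    zf.writestr("README.md", "# demo\n")

counts = scan_archive_in_memory(buffer.getvalue())
print(counts)
```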

🌌 Powered by the blAST Engine

This documentation is part of the GitGalaxy Ecosystem, an AST-free, LLM-free heuristic knowledge graph engine.

🪐 Explore the GitHub Repository for code, tools, and updates.
🔭 Visualize your own repository at GitGalaxy.io using our interactive 3D WebGL dashboard.