
How to Prove Dead Code with Terabyte Log Scanning

In massive legacy systems (like COBOL mainframes or decade-old Java monoliths), developers are terrified to delete code. Even if a function looks unused in the repository, no one knows if a critical end-of-year batch job relies on it via a dynamic call.

Static analysis alone cannot settle this, because calls that are resolved at runtime never appear in a static call graph. To safely deprecate code, you must cross-reference your repository map against raw production execution logs.

Because production logs often span terabytes, standard indexing tools tend to fail on them. GitGalaxy addresses this with the Mega Log Parser, which streams binary logs at high throughput to build an execution matrix.

The "Memory Shield" Execution Scanner

The scanner takes the structural targets identified by GitGalaxy and searches for their execution signatures in translated ASCII SMF dumps or standard application logs, streaming the data without ever loading the whole file into RAM.
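The streaming idea can be illustrated with a minimal sketch. This is not the GitGalaxy implementation; it assumes that a target's "execution signature" is simply its name appearing as a substring of a log line, and the function names `count_hits` and `scan_file` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_hits(lines: Iterable[str], targets: list[str]) -> Counter:
    """Count the lines that mention each target, one line at a time."""
    # Pre-seed every target with 0 so programs that never run still appear.
    hits = Counter({target: 0 for target in targets})
    for line in lines:  # only the current line is held in memory
        for target in targets:
            if target in line:  # ASSUMPTION: signature == substring match
                hits[target] += 1
    return hits


def scan_file(path: str, targets: list[str]) -> Counter:
    """Stream a log file from disk and count hits per target."""
    # Iterating over a file object reads it lazily, so memory use stays
    # roughly constant regardless of file size. errors="replace" tolerates
    # stray non-ASCII bytes in a translated dump.
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        return count_hits(fh, targets)
```

The nested loop is the simplest correct form. With thousands of targets, a single compiled regular expression or an Aho-Corasick automaton would likely be faster, at the cost of more setup.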

1. Execute the Scan

You can pass manual keywords, or directly feed the scanner the ir_state.json generated by your initial GitGalaxy repository scan.

python gitgalaxy/tools/terabyte_log_scanning/terabyte_log_scanner.py /path/to/production_dump.log --input_state /path/to/ir_state.json

2. Analyze Runtime Execution vs. Dead Code

As the engine streams the log, it generates ASCII time-series histograms for every target program.

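If a program exists in your Git repository but registers 0 hits across 12 months of production logs, it is very likely dead "Graveyard" code. Provided the logs cover every production environment and a full business cycle (including end-of-year batch jobs), it is a strong candidate for deletion before migrating to the cloud.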

If a program is executing, the histogram instantly reveals its temporal cadence (e.g., a daily cron job vs. an erratic anomaly spike).

 === TIME-SERIES: PAYMENT_ROUTER_SVC ===
 [2026-04-18T01:00] █ (120 hits)
 [2026-04-18T02:00] ████████████████████████████████████████ (5,000 hits)  <-- ANOMALY SPIKE
 [2026-04-18T03:00] █ (115 hits)
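A histogram in this shape can be sketched in a few lines. The version below is an illustration only: it assumes each hit carries an ISO 8601 timestamp, buckets by hour, and scales bars so that the busiest hour is 40 blocks wide. The function name `hourly_histogram` is hypothetical, and anomaly flagging is left out.

```python
from collections import Counter


def hourly_histogram(timestamps, width=40):
    """Render one bar per hour, scaled so the busiest hour fills `width`."""
    # Truncate "2026-04-18T02:17:45" to its hour bucket "2026-04-18T02".
    buckets = Counter(ts[:13] for ts in timestamps)
    peak = max(buckets.values(), default=0)
    rows = []
    for hour in sorted(buckets):
        count = buckets[hour]
        # Every non-empty bucket gets at least one block so it stays visible.
        bar = "█" * max(1, round(count / peak * width))
        rows.append(f"[{hour}:00] {bar} ({count:,} hits)")
    return rows
```

Because the counts are kept per bucket rather than per event, the memory cost depends on the number of hours covered, not on the number of log lines.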

3. The Dynamic Telemetry Sidecar

The scanner outputs a dynamic_telemetry.json sidecar file alongside your logs.

This file contains the hard execution counts for every program. You can feed this sidecar back into the GitGalaxy visualizer or LLM Recorder to augment your static risk map with observed runtime behavior.
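The sidecar can also be used directly to list deletion candidates. The sketch below assumes, purely for illustration, that dynamic_telemetry.json is a flat JSON object mapping program names to hit counts. The real schema may differ, so check the technical specification before relying on it. The function name `zero_hit_programs` is hypothetical.

```python
import json


def zero_hit_programs(sidecar_text, repo_programs):
    """Return repository programs with no recorded executions, sorted."""
    # ASSUMPTION: the sidecar is a flat {"PROGRAM_NAME": hit_count} object.
    counts = json.loads(sidecar_text)
    # A program missing from the sidecar is treated the same as 0 hits.
    return sorted(p for p in repo_programs if counts.get(p, 0) == 0)
```

For example, given the text `{"PAYMENT_ROUTER_SVC": 5235, "LEGACY_TAX_CALC": 0}` and a repository containing `PAYMENT_ROUTER_SVC`, `LEGACY_TAX_CALC`, and `OLD_REPORT_GEN`, the function returns the last two names as candidates for review.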

Read the full technical specification: Terabyte Log Scanner