Operation Log System
Comprehensive audit trail with structured logging and querying
Background
A centralized operation logging service captures structured audit records from all microservices. It must sustain high write throughput without adding latency to source services, and support flexible historical queries for compliance and debugging.
Architecture
Source services → Kafka producer (structured log event with trace ID) → oplog consumer (go-zero) → batch insert → MySQL (3 tables). Query API: gRPC with time range, operator, and action type filters.
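To make the query path concrete, here is a minimal sketch of how the three filters (time range, operator, action type) might be turned into a parameterized MySQL query. The table name `operation_log`, the column names, and the `QueryFilter` and `BuildQuery` identifiers are assumptions for illustration; the actual schema and gRPC message types are not described here.

```go
package main

import (
	"strings"
	"time"
)

// QueryFilter mirrors the filters named in the query API.
// Field and type names are hypothetical.
type QueryFilter struct {
	From       time.Time // inclusive start of the time range
	To         time.Time // exclusive end of the time range
	Operator   string    // optional; empty means "any operator"
	ActionType string    // optional; empty means "any action"
}

// BuildQuery returns a parameterized SQL string and its arguments.
// The time range is always applied so queries stay bounded; the
// optional filters are added only when set. The table and column
// names are assumptions, not the real schema.
func BuildQuery(f QueryFilter) (string, []any) {
	conds := []string{"occurred_at >= ?", "occurred_at < ?"}
	args := []any{f.From, f.To}

	if f.Operator != "" {
		// Assumes the operator filter matches the "actor" field.
		conds = append(conds, "actor = ?")
		args = append(args, f.Operator)
	}
	if f.ActionType != "" {
		conds = append(conds, "action = ?")
		args = append(args, f.ActionType)
	}

	q := "SELECT trace_id, actor, action, target, occurred_at FROM operation_log WHERE " +
		strings.Join(conds, " AND ") +
		" ORDER BY occurred_at DESC"
	return q, args
}
```

Using placeholders instead of string concatenation keeps filter values out of the SQL text, and the returned pair can be passed directly to a `database/sql` query call.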
Key Implementations
Kafka Async Ingestion
All operation log entries are published to Kafka rather than written synchronously, decoupling log capture from source service response paths.
Why: Synchronous logging adds latency to every audited operation; async ingestion ensures audit capture never degrades user-facing performance.
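The decoupling can be sketched as a non-blocking hand-off on the producer side. This is an illustration under stated assumptions, not the actual producer code: `Publisher` is a hypothetical interface standing in for whichever Kafka client the source services use, and `AsyncLogger`, `PublishFunc`, and `ErrQueueFull` are invented names.

```go
package main

import "errors"

// Publisher is a hypothetical abstraction over the Kafka client.
type Publisher interface {
	Publish(key string, payload []byte) error
}

// PublishFunc adapts a plain function to the Publisher interface.
type PublishFunc func(key string, payload []byte) error

func (f PublishFunc) Publish(key string, payload []byte) error { return f(key, payload) }

// ErrQueueFull is returned when the local buffer cannot accept a record.
var ErrQueueFull = errors.New("oplog: local queue full")

type message struct {
	key     string
	payload []byte
}

// AsyncLogger hands records to a background goroutine so that the
// request path never waits on the broker.
type AsyncLogger struct {
	queue chan message
	done  chan struct{}
}

// NewAsyncLogger starts the background publisher. onError may be nil.
func NewAsyncLogger(p Publisher, buffer int, onError func(error)) *AsyncLogger {
	l := &AsyncLogger{queue: make(chan message, buffer), done: make(chan struct{})}
	go func() {
		defer close(l.done)
		for m := range l.queue {
			if err := p.Publish(m.key, m.payload); err != nil && onError != nil {
				onError(err)
			}
		}
	}()
	return l
}

// Log enqueues a record without blocking. If the buffer is full it
// returns ErrQueueFull instead of stalling the caller.
func (l *AsyncLogger) Log(key string, payload []byte) error {
	select {
	case l.queue <- message{key: key, payload: payload}:
		return nil
	default:
		return ErrQueueFull
	}
}

// Close stops accepting records and waits for queued ones to be published.
func (l *AsyncLogger) Close() {
	close(l.queue)
	<-l.done
}
```

Rejecting a record when the buffer is full is one possible policy. An audit system with strict compliance requirements might prefer to block briefly or spill to local disk instead, so the choice here should be read as a placeholder.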
Structured Log Schema
Each log entry captures actor, action, target, metadata, and cross-service trace ID in a consistent schema across all producing services.
Why: A uniform schema enables cross-service queries (e.g., all actions by a specific admin) without service-specific parsing logic.
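As an illustration of what such a schema could look like, the sketch below defines a single event type with JSON encoding and a basic validity check. The struct name, field names, JSON keys, and the `OccurredAt` timestamp are assumptions; only the five logical fields (actor, action, target, metadata, trace ID) come from the description above.

```go
package main

import (
	"encoding/json"
	"errors"
	"time"
)

// OpLogEvent is a hypothetical wire format shared by all producers.
type OpLogEvent struct {
	TraceID    string            `json:"trace_id"`           // cross-service trace ID
	Actor      string            `json:"actor"`              // who performed the operation
	Action     string            `json:"action"`             // what was done, e.g. "user.disable"
	Target     string            `json:"target"`             // what it was done to
	Metadata   map[string]string `json:"metadata,omitempty"` // free-form context
	OccurredAt time.Time         `json:"occurred_at"`        // assumed field, needed for time-range queries
}

// Validate rejects events that lack the fields every query relies on.
func (e OpLogEvent) Validate() error {
	switch {
	case e.TraceID == "":
		return errors.New("oplog: trace_id is required")
	case e.Actor == "":
		return errors.New("oplog: actor is required")
	case e.Action == "":
		return errors.New("oplog: action is required")
	case e.Target == "":
		return errors.New("oplog: target is required")
	}
	return nil
}

// EncodeEvent serializes an event for publishing.
func EncodeEvent(e OpLogEvent) ([]byte, error) {
	return json.Marshal(e)
}

// DecodeEvent parses an event on the consumer side.
func DecodeEvent(b []byte) (OpLogEvent, error) {
	var e OpLogEvent
	err := json.Unmarshal(b, &e)
	return e, err
}
```

Because every producer emits the same shape, the consumer needs only one decode path, and a query such as "all actions by a specific admin" reduces to a filter on `actor`.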
Batch Log Processing
Kafka consumers accumulate log entries in memory and batch-insert them into MySQL once a configurable batch size or flush interval is reached.
Why: Batch inserts reduce per-record write overhead and MySQL connection pressure compared to individual inserts under high log volume.
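The size-or-interval flush logic can be sketched as follows. This is a simplified, in-memory illustration rather than the consumer's actual code: `Batcher` and its fields are hypothetical names, the input channel stands in for the Kafka consumer, and the `Flush` callback stands in for the MySQL batch insert.

```go
package main

import "time"

// Batcher groups records and flushes them when either MaxSize records
// have accumulated or Interval has elapsed. All names are hypothetical.
type Batcher[T any] struct {
	MaxSize  int                   // flush when this many records are buffered
	Interval time.Duration         // flush at least this often; must be > 0
	Flush    func(batch []T) error // e.g. one multi-row INSERT
}

// Run consumes records until in is closed, then flushes any remainder.
// It returns the first flush error it encounters.
func (b *Batcher[T]) Run(in <-chan T) error {
	ticker := time.NewTicker(b.Interval)
	defer ticker.Stop()

	batch := make([]T, 0, b.MaxSize)
	emit := func() error {
		if len(batch) == 0 {
			return nil // nothing buffered; skip the write
		}
		err := b.Flush(batch)
		// Allocate a fresh slice so Flush may safely retain the old one.
		batch = make([]T, 0, b.MaxSize)
		return err
	}

	for {
		select {
		case rec, ok := <-in:
			if !ok {
				return emit() // input closed: flush the tail and stop
			}
			batch = append(batch, rec)
			if len(batch) >= b.MaxSize {
				if err := emit(); err != nil {
					return err
				}
			}
		case <-ticker.C:
			// Interval elapsed: flush whatever has accumulated so far.
			if err := emit(); err != nil {
				return err
			}
		}
	}
}
```

In a real consumer, offsets would typically be committed only after `Flush` succeeds, so that a crash between consume and insert causes a replay rather than a lost audit record. That detail is omitted here.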
Technical Decisions
| Decision | Chosen | Alternative | Reason |
|---|---|---|---|
| Log ingestion method | Kafka async | Direct gRPC write | Kafka decouples producers from the oplog service's availability and absorbs traffic spikes without backpressure on source services. |
| Storage engine | MySQL with time-partitioned queries | Elasticsearch | MySQL meets current query requirements without introducing an additional infrastructure component. |