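# Stardust Monitoring Center - Trace Statistics Architecture Diagrams
This document uses Mermaid diagrams to show the architecture and data flow of the trace statistics service.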
---
## 1. Overall Architecture
```mermaid
graph TB
subgraph "Client Application Layer"
A1[Business App 1<br/>Stardust SDK]
A2[Business App 2<br/>Stardust SDK]
A3[Business App N<br/>Stardust SDK]
end
subgraph "Ingestion Layer"
B[TraceController<br/>Validate + Filter + Batch Write]
end
subgraph "Statistics Service Layer"
C1[TraceStatService<br/>Core Statistics Engine]
C2[TraceItemStatService<br/>Trace Item Statistics]
C3[AppDayStatService<br/>App Daily Statistics]
end
subgraph "Data Storage Layer"
D1[(TraceData<br/>Raw Data<br/>Sharded by Day)]
D2[(TraceMinuteStat<br/>5-Minute Statistics)]
D3[(TraceHourStat<br/>Hourly Statistics)]
D4[(TraceDayStat<br/>Daily Statistics)]
D5[(AppMinuteStat<br/>App Minute Statistics)]
D6[(AppDayStat<br/>App Daily Statistics)]
end
A1 -->|HTTP POST| B
A2 -->|HTTP POST| B
A3 -->|HTTP POST| B
B -->|Batch Insert| D1
B -->|Trigger Stats| C1
B -->|Trigger Stats| C2
B -->|Trigger Stats| C3
C1 -->|Streaming + Batch| D2
C1 -->|Streaming + Batch| D3
C1 -->|Streaming + Batch| D4
C1 -->|Streaming + Batch| D5
C2 -->|Update| D2
C3 -->|Aggregate| D6
style C1 fill:#f96,stroke:#333,stroke-width:4px
style B fill:#9cf,stroke:#333,stroke-width:2px
style D1 fill:#ffa,stroke:#333,stroke-width:2px
```
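The ingestion path in the diagram has three steps: TraceController validates and filters a report, bulk-inserts the raw rows once, and then hands the same batch to each statistics service. The minimal Python sketch below shows that fan-out. It is not Stardust's implementation. `TraceItem`, `is_valid`, and `TraceIntake` are hypothetical names, and the filter rule is an assumed example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TraceItem:
    app: str        # application name
    name: str       # traced operation, e.g. an API path
    start_ms: int   # start time, epoch milliseconds
    cost_ms: int    # duration in milliseconds
    error: bool = False

def is_valid(item: TraceItem) -> bool:
    # Hypothetical filter rule: require app and operation names, reject negative durations.
    return bool(item.app) and bool(item.name) and item.cost_ms >= 0

class TraceIntake:
    """Validate and filter, bulk-insert raw rows once, then notify each stat service."""

    def __init__(self, insert_batch: Callable[[List[TraceItem]], None],
                 stat_services: List[Callable[[List[TraceItem]], None]]):
        self.insert_batch = insert_batch      # stands in for the bulk TraceData insert
        self.stat_services = stat_services    # stand-ins for TraceStat/TraceItemStat/AppDayStat

    def report(self, items: List[TraceItem]) -> int:
        accepted = [item for item in items if is_valid(item)]
        if not accepted:
            return 0
        self.insert_batch(accepted)           # one write for the whole batch
        for notify in self.stat_services:     # fan the same batch out to every service
            notify(accepted)
        return len(accepted)
```

Because the statistics services only receive the already-filtered batch, each invalid item is rejected once at the edge instead of once per service.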
---
## 2. TraceStatService Core Flow
```mermaid
graph TB
subgraph "Data Ingestion"
A[TraceData Reported]
end
subgraph "Streaming Computation - 5s Cycle"
B[Enqueue into ConcurrentQueue]
C{Queue length<br/>> 100k?}
D[DoFlowStat<br/>Dequeue Data]
E[Accumulate into Deferred Queues]
E1[DayQueue<br/>Daily Stats Queue]
E2[HourQueue<br/>Hourly Stats Queue]
E3[MinuteQueue<br/>Minute Stats Queue]
E4[AppMinuteQueue<br/>App Minute Queue]
end
subgraph "Batch Computation - 30s Cycle"
F[DoBatchStat Triggered]
G[ProcessMinute<br/>Aggregate from TraceData]
H[Wait 5 Seconds]
I[ProcessHour<br/>Aggregate from MinuteStat]
J[ProcessDay<br/>Aggregate from HourStat]
end
subgraph "Deferred Write - 60s Cycle"
K[EntityDeferredQueue<br/>Batch Commit]
end
subgraph "Database"
L[(Statistics Tables)]
end
A --> B
B --> C
C -->|Yes| X[Drop New Data]
C -->|No| D
D --> E
E --> E1
E --> E2
E --> E3
E --> E4
E1 --> K
E2 --> K
E3 --> K
E4 --> K
K --> L
F --> G
G --> H
H --> I
I --> J
J --> L
style C fill:#f96,stroke:#333,stroke-width:2px
style K fill:#9f6,stroke:#333,stroke-width:2px
```
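The streaming path above combines three mechanisms: a bounded queue that drops new data once the backlog is too large, a 5-second drain into four keyed accumulators, and a 60-second flush that writes every accumulated row in one batch. The Python sketch below illustrates them, assuming UTC-aligned buckets computed with a modulo. `FlowStat`, `Stat`, and the tuple keys are hypothetical and not Stardust's API. The timers are omitted: a caller would invoke `do_flow_stat` every 5 s and `flush` every 60 s.

```python
from collections import deque, defaultdict
from dataclasses import dataclass

MAX_QUEUE = 100_000              # the 100k backlog threshold from the diagram
FIVE_MIN_MS = 5 * 60 * 1000
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS

@dataclass
class Stat:
    total: int = 0
    errors: int = 0
    cost: int = 0        # summed duration, ms
    max_cost: int = 0

    def add(self, cost_ms: int, error: bool) -> None:
        self.total += 1
        self.errors += 1 if error else 0
        self.cost += cost_ms
        self.max_cost = max(self.max_cost, cost_ms)

class FlowStat:
    def __init__(self, max_queue: int = MAX_QUEUE):
        self.max_queue = max_queue
        self.queue = deque()
        self.dropped = 0
        # Deferred accumulators: bucket key -> Stat, written out together by flush().
        self.day = defaultdict(Stat)
        self.hour = defaultdict(Stat)
        self.minute = defaultdict(Stat)
        self.app_minute = defaultdict(Stat)

    def enqueue(self, app, name, start_ms, cost_ms, error=False) -> bool:
        if len(self.queue) >= self.max_queue:
            self.dropped += 1            # backlog full: the new item is lost
            return False
        self.queue.append((app, name, start_ms, cost_ms, error))
        return True

    def do_flow_stat(self, limit: int = MAX_QUEUE) -> int:
        """5 s tick: drain up to `limit` items into the four accumulators."""
        n = 0
        while self.queue and n < limit:
            app, name, start, cost, err = self.queue.popleft()
            self.day[(app, name, start - start % DAY_MS)].add(cost, err)
            self.hour[(app, name, start - start % HOUR_MS)].add(cost, err)
            self.minute[(app, name, start - start % FIVE_MIN_MS)].add(cost, err)
            self.app_minute[(app, start - start % FIVE_MIN_MS)].add(cost, err)
            n += 1
        return n

    def flush(self, write) -> None:
        """60 s tick: pass all accumulated rows to `write` as one batch, then reset."""
        batch = {}
        for table, acc in (("day", self.day), ("hour", self.hour),
                           ("minute", self.minute), ("app_minute", self.app_minute)):
            batch[table] = dict(acc)
            acc.clear()
        write(batch)
```

Each item is counted once per layer in memory, so the database sees one update per bucket per flush rather than one per trace.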
---
## 3. Data Flow Sequence Diagram
```mermaid
sequenceDiagram
participant Client as Client App
participant Controller as TraceController
participant Queue as ConcurrentQueue
participant FlowTimer as Streaming Timer (5s)
participant BatchTimer as Batch Timer (30s)
participant DelayQueue as Deferred Queue (60s)
participant DB as Database
Client->>Controller: Report trace data
Controller->>DB: Batch insert TraceData
Controller->>Queue: Enqueue
Note over Queue: Awaiting streaming computation
FlowTimer->>Queue: Dequeue data (up to 100k items)
Queue->>DelayQueue: Accumulate into deferred queues
Note over DelayQueue: DayQueue<br/>HourQueue<br/>MinuteQueue<br/>AppMinuteQueue
Note over DelayQueue: Batch commit after 60 seconds
DelayQueue->>DB: Batch update statistics tables
Note over BatchTimer: Triggered every 30 seconds
BatchTimer->>DB: ProcessMinute<br/>Aggregate from TraceData
DB-->>BatchTimer: Return aggregated data
BatchTimer->>DB: Write MinuteStat
Note over BatchTimer: Wait 5 seconds
BatchTimer->>DB: ProcessHour<br/>Aggregate from MinuteStat
DB-->>BatchTimer: Return aggregated data
BatchTimer->>DB: Write HourStat
BatchTimer->>DB: ProcessDay<br/>Aggregate from HourStat
DB-->>BatchTimer: Return aggregated data
BatchTimer->>DB: Write DayStat
```
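The batch timer must run its three steps in order, because each layer reads the one below it: hour rows are built from minute rows, and day rows from hour rows. The 5-second pause presumably gives the minute writes time to land before they are read. The Python sketch below shows one 30-second pass in which the pause is an awaited `asyncio.sleep`, so only this task is suspended rather than a worker thread (the .NET analogue would be `await Task.Delay(5000)` instead of `Thread.Sleep(5000)`). The step callables are hypothetical placeholders.

```python
import asyncio
from typing import Awaitable, Callable

Step = Callable[[], Awaitable[None]]

async def batch_cycle(process_minute: Step, process_hour: Step, process_day: Step,
                      settle_seconds: float = 5.0) -> None:
    """One DoBatchStat pass: minute, then (after a non-blocking pause) hour, then day."""
    await process_minute()                # TraceData -> MinuteStat
    await asyncio.sleep(settle_seconds)   # suspends this coroutine only; the thread stays free
    await process_hour()                  # MinuteStat -> HourStat
    await process_day()                   # HourStat -> DayStat
```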
---
## 4. Data Model Hierarchy
```mermaid
graph LR
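subgraph "Raw Layer"
A[TraceData<br/>Sharded by Day<br/>Retained 7-30 Days]
B[SampleData<br/>Sample Details<br/>Sharded by Day]
end
subgraph "Minute Layer - 5-Minute Granularity"
C[TraceMinuteStat<br/>Per-API<br/>Retained 30-90 Days]
D[AppMinuteStat<br/>Per-App<br/>Retained 30-90 Days]
end
subgraph "Hour Layer - 1-Hour Granularity"
E[TraceHourStat<br/>Per-API<br/>Retained 90-180 Days]
end
subgraph "Day Layer - 1-Day Granularity"
F[TraceDayStat<br/>Per-API<br/>Retained 365+ Days]
G[AppDayStat<br/>Per-App<br/>Retained 365+ Days]
end
A -->|Aggregate| C
A -->|Aggregate| D
C -->|Aggregate| E
E -->|Aggregate| F
F -->|Aggregate| G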
style A fill:#faa,stroke:#333,stroke-width:2px
style F fill:#afa,stroke:#333,stroke-width:2px
style G fill:#afa,stroke:#333,stroke-width:2px
```
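Each arrow in the hierarchy is a rollup from a finer bucket to a coarser one. Counts, error counts, summed durations, and maxima combine exactly across layers. An average, however, has to be recomputed from the summed cost and count, not averaged again. The Python sketch below makes this concrete, assuming each row carries those four fields. `StatRow` and `roll_up` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

HOUR_MS = 3_600_000
DAY_MS = 24 * HOUR_MS

@dataclass
class StatRow:
    total: int
    errors: int
    total_cost: int   # summed duration, ms
    max_cost: int

    @property
    def avg_cost(self) -> float:
        # Derived from sums, so it stays correct after any number of rollups.
        return self.total_cost / self.total if self.total else 0.0

def roll_up(rows: Iterable[Tuple[str, int, StatRow]], bucket_ms: int) -> Dict[Tuple[str, int], StatRow]:
    """Merge finer (key, start_ms, row) entries into coarser buckets of `bucket_ms`."""
    out: Dict[Tuple[str, int], StatRow] = {}
    for key, start_ms, r in rows:
        k = (key, start_ms - start_ms % bucket_ms)
        acc = out.get(k)
        if acc is None:
            out[k] = StatRow(r.total, r.errors, r.total_cost, r.max_cost)
        else:
            acc.total += r.total
            acc.errors += r.errors
            acc.total_cost += r.total_cost
            acc.max_cost = max(acc.max_cost, r.max_cost)
    return out
```

A percentile such as TP99 cannot be rolled up this way: the TP99 of an hour is not derivable from the TP99 values of its twelve 5-minute buckets.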
---
## 5. Performance Bottlenecks
```mermaid
graph TB
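subgraph "Ingestion Bottlenecks"
A1[Data Validation and Filtering]
A2[Batch Insert of Raw Data]
end
subgraph "Streaming Bottlenecks"
B1[Queue Backlog<br/>Dropped Above 100k]
B2[Deferred Queue Memory Usage]
B3[Concurrent Processing of Four Queues]
end
subgraph "Batch Computation Bottlenecks"
C1[Heavy Database Queries<br/>Tens of Thousands of Rows per App]
C2[TP99 Calculation<br/>Sort + Loop]
C3[Synchronous Blocking Wait<br/>Thread.Sleep 5s]
end
subgraph "Storage Bottlenecks"
D1[Query Scans<br/>Sharded Table Indexes]
D2[Batch Writes<br/>Concurrency Conflicts]
D3[Cache Penetration<br/>10s Expiry]
end
B1 -.->|Impact| E1[Data Loss]
C1 -.->|Impact| E2[Database Pressure]
C2 -.->|Impact| E3[CPU Consumption]
C3 -.->|Impact| E4[Reduced Throughput]
D3 -.->|Impact| E5[Cache Avalanche]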
style B1 fill:#f96,stroke:#333,stroke-width:3px
style C1 fill:#f96,stroke:#333,stroke-width:3px
style C2 fill:#fc6,stroke:#333,stroke-width:2px
style C3 fill:#fc6,stroke:#333,stroke-width:2px
style D3 fill:#fc6,stroke:#333,stroke-width:2px
```
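The TP99 bottleneck comes from computing the percentile exactly: every duration in the bucket must be held in memory and sorted, which costs O(n log n) per bucket. The Python sketch below uses the nearest-rank definition, rank = ⌈p·n/100⌉. This is one of several common percentile definitions, and Stardust's exact formula may differ. `tp99_exact` is a hypothetical name.

```python
import math
from typing import Sequence

def tp99_exact(costs_ms: Sequence[int], pct: float = 99.0) -> int:
    """Nearest-rank percentile: sort all samples and take the ceil(pct * n / 100)-th smallest."""
    if not costs_ms:
        return 0
    ordered = sorted(costs_ms)                        # O(n log n): the "sort" in the diagram
    rank = math.ceil(pct * len(ordered) / 100.0)      # 1-based rank
    return ordered[max(rank, 1) - 1]
```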
---
## 6. Optimization Roadmap
```mermaid
graph LR
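subgraph "Short Term: 1-2 Weeks"
S1[Increase Queue Capacity<br/>100k→500k]
S2[Remove Synchronous Blocking<br/>Thread.Sleep]
S3[Tune Cache Policy<br/>10s→30s]
end
subgraph "Mid Term: 1-2 Months"
M1[Introduce a Message Queue<br/>Kafka/RabbitMQ]
M2[Split the Statistics Service<br/>Microservices]
M3[Optimize TP99 Calculation<br/>Approximation Algorithm]
end
subgraph "Long Term: 3-6 Months"
L1[Real-Time Compute Engine<br/>Flink/Spark]
L2[Columnar Storage<br/>ClickHouse]
L3[Multi-Tier Storage<br/>Hot/Warm/Cold Separation]
end
Current[Current State<br/>80% Resource Usage] --> S1
S1 --> S2
S2 --> S3
S3 -->|Gain 20-30%| M1
M1 --> M2
M2 --> M3
M3 -->|Gain 50-100%| L1
L1 --> L2
L2 --> L3
L3 -->|Gain 10-100x| Future[Target State<br/>Supports 100M-Level QPS]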
style Current fill:#f96,stroke:#333,stroke-width:2px
style Future fill:#9f6,stroke:#333,stroke-width:2px
```
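The roadmap does not say which approximation algorithm M3 refers to. A fixed-bucket latency histogram is one common choice, with HdrHistogram and t-digest as more precise alternatives. It needs constant memory and an O(log b) insert, and two histograms merge bucket by bucket. That last property means TP99 becomes rollable across the minute, hour, and day layers. The Python sketch below assumes hypothetical, roughly exponential bucket bounds. It reports the upper bound of the bucket that holds the nearest-rank sample, capped at the observed maximum, so the result never underestimates the exact value.

```python
import bisect
import math
from typing import List

# Hypothetical bucket upper bounds in ms; larger values go to an overflow bucket.
BOUNDS_MS: List[int] = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]

class LatencyHistogram:
    def __init__(self, bounds: List[int] = BOUNDS_MS):
        self.bounds = bounds
        self.counts = [0] * (len(bounds) + 1)   # last slot is the overflow bucket
        self.max_ms = 0

    def add(self, cost_ms: int) -> None:
        # Bucket i holds values in (bounds[i-1], bounds[i]].
        self.counts[bisect.bisect_left(self.bounds, cost_ms)] += 1
        self.max_ms = max(self.max_ms, cost_ms)

    def merge(self, other: "LatencyHistogram") -> None:
        # Assumes both histograms share the same bounds.
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]
        self.max_ms = max(self.max_ms, other.max_ms)

    def percentile(self, pct: float = 99.0) -> int:
        n = sum(self.counts)
        if n == 0:
            return 0
        rank = math.ceil(pct * n / 100.0)
        seen = 0
        for i, c in enumerate(self.counts):
            seen += c
            if seen >= rank:
                if i < len(self.bounds):
                    return min(self.bounds[i], self.max_ms)
                return self.max_ms
        return self.max_ms
```

The trade-off is resolution. In the test data, the true TP99 of 1..100 ms is 99, but the sketch reports 100 because that sample falls in the (50, 100] bucket.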
---
## 7. Cache Architecture
```mermaid
graph TB
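subgraph "Query Request"
A[Statistics Query]
end
subgraph "Three-Level Cache"
B[L1: Object Cache<br/>5 Minutes<br/>Single Stat Object]
C[L2: List Cache<br/>10 Seconds<br/>Query Result Set]
D[L3: XCode Cache<br/>Default<br/>Entity Cache]
end
subgraph "Data Source"
E[(Database)]
end
A --> B
B -->|Miss| C
C -->|Miss| D
D -->|Miss| E
E -->|Backfill| D
D -->|Backfill| C
C -->|Backfill| B
B -->|Return| A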
style B fill:#9f6,stroke:#333,stroke-width:2px
style C fill:#9cf,stroke:#333,stroke-width:2px
style D fill:#fcf,stroke:#333,stroke-width:2px
```
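The cache diagram describes a read-through chain: each level is checked in turn, and a hit at a lower level (or a database load) backfills every level above it. The Python sketch below shows this lookup, with an injectable clock so expiry can be tested. It makes two simplifications. First, all levels share one key space, whereas the real L1 holds single objects and L2 holds result sets. Second, the L3 TTL is a hypothetical value, since the diagram only says "default". `TtlCache` and `TieredLookup` are hypothetical names, not XCode APIs.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

class TtlCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Any, Tuple[float, Any]] = {}

    def get(self, key) -> Optional[Any]:
        hit = self._data.get(key)
        if hit is None:
            return None
        expires, value = hit
        if self.clock() >= expires:
            del self._data[key]          # expired entries count as misses
            return None
        return value

    def set(self, key, value) -> None:
        self._data[key] = (self.clock() + self.ttl, value)

class TieredLookup:
    """L1 -> L2 -> L3 -> database; every miss backfills the levels above the hit."""

    def __init__(self, levels: List[TtlCache], load_from_db: Callable[[Any], Any]):
        self.levels = levels             # ordered fastest first
        self.load_from_db = load_from_db
        self.db_hits = 0

    def get(self, key):
        for i, level in enumerate(self.levels):
            value = level.get(key)
            if value is not None:
                for upper in self.levels[:i]:
                    upper.set(key, value)
                return value
        self.db_hits += 1
        value = self.load_from_db(key)
        for level in self.levels:
            level.set(key, value)
        # A None result reads back as a miss, so absent keys reach the
        # database every time: the cache-penetration risk noted in Section 5.
        return value
```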
---
## 8. Sharding Strategy
```mermaid
graph LR
subgraph "App Reporting"
A[TraceData<br/>2026-02-01]
B[TraceData<br/>2026-02-02]
C[TraceData<br/>2026-02-03]
end
subgraph "Automatic Daily Sharding"
D[(TraceData_01)]
E[(TraceData_02)]
F[(TraceData_03)]
end
subgraph "Query Routing"
G[XCode ShardPolicy<br/>Time-Based Routing]
end
A -->|Write| D
B -->|Write| E
C -->|Write| F
G -.->|Range Query| D
G -.->|Range Query| E
G -.->|Range Query| F
style G fill:#fc6,stroke:#333,stroke-width:2px
```
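Time-based routing maps each date to its shard table, and maps a range query to the list of tables it must scan. The diagram's names suggest the suffix is the day of the month. The Python sketch below assumes exactly that. It does not use XCode's ShardPolicy API, and `shard_table` and `tables_for_range` are hypothetical names. Under this naming, tables are reused every month, so a range longer than a month would map two dates to the same table. The sketch de-duplicates the table list, but such a query would also pick up the older month's rows unless it filters by time inside each table.

```python
from datetime import date, timedelta
from typing import List

def shard_table(day: date, base: str = "TraceData") -> str:
    """Table for one day, assuming a day-of-month suffix (TraceData_01 ... TraceData_31)."""
    return f"{base}_{day.day:02d}"

def tables_for_range(start: date, end: date, base: str = "TraceData") -> List[str]:
    """Every shard table that an inclusive [start, end] range query must touch, in date order."""
    if end < start:
        return []
    tables: List[str] = []
    d = start
    while d <= end:
        name = shard_table(d, base)
        if name not in tables:           # de-duplicate when a range wraps past a month
            tables.append(name)
        d += timedelta(days=1)
    return tables
```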
---
## 9. Dual-Mode Computation Comparison
```mermaid
graph TB
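subgraph "Streaming Mode"
A1[Real-Time Processing]
A2[In-Memory Queues]
A3[Deferred 60s Write]
A4[Suited to Normal Traffic]
A1 --> A2 --> A3 --> A4
end
subgraph "Batch Mode"
B1[Periodic Trigger Every 30s]
B2[Database Aggregation]
B3[Immediate Write]
B4[Backfills Missed Data]
B1 --> B2 --> B3 --> B4
end
subgraph "Eventual Consistency"
C[Statistics Results]
end
A4 -.->|Fast| C
B4 -.->|Accurate| C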
style A4 fill:#9f6,stroke:#333,stroke-width:2px
style B4 fill:#fc6,stroke:#333,stroke-width:2px
style C fill:#9cf,stroke:#333,stroke-width:3px
```
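For the two modes to converge on one result, the batch write presumably has to overwrite a row rather than add to it. The streaming path increments counts cheaply but never sees items dropped from the queue. A full recount from the layer below then replaces the value, and repeating the recount leaves it unchanged (it is idempotent). The Python sketch below shows this model, which is an assumption, not confirmed Stardust behavior. `DualModeStat` and its methods are hypothetical. Under this model, stream increments applied after a recount can briefly overcount until the next batch pass corrects the row.

```python
from typing import Dict, Hashable

class DualModeStat:
    """One statistics row per key, fed by both computation modes."""

    def __init__(self) -> None:
        self.rows: Dict[Hashable, int] = {}

    def flow_add(self, key: Hashable, count: int) -> None:
        # Streaming path: fast increments; items dropped from the queue are never counted.
        self.rows[key] = self.rows.get(key, 0) + count

    def batch_set(self, key: Hashable, recount: int) -> None:
        # Batch path: a full recount overwrites the row, so re-running it is idempotent
        # and fills in whatever the stream missed.
        self.rows[key] = recount
```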
---
## 10. Resource Consumption Breakdown (Estimated)
```mermaid
pie title System Resource Share
"Trace Statistics" : 80
"Other Services" : 20
```
```mermaid
pie title Statistics Service Internal Resource Consumption
"CPU (Aggregation)" : 35
"Database IO" : 35
"Memory (Queues and Caches)" : 25
"Network IO" : 5
```
---
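## Usage Notes
The diagrams above are written in Mermaid syntax and can be rendered in the following environments:
1. **GitHub/GitLab**: native Mermaid rendering
2. **VS Code**: install the `Markdown Preview Mermaid Support` extension
3. **Online tools**: [Mermaid Live Editor](https://mermaid.live/)
4. **Documentation tools**: Notion, Obsidian, and other tools that support Mermaid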
---
**Last updated:** 2026-02-02