feat: implement batch Kafka consumption and writes to improve throughput

Introduce a batching mechanism that buffers messages and writes them to the database in batches, significantly improving consumption performance. Tune Kafka configuration parameters to optimize consumer concurrency and the commit strategy. Add automatic creation of partition indexes, and refactor the processor to support batch operations. Add fallback write logic to handle data errors, and extend metrics collection to monitor the effect of batch processing.
2026-02-09 10:50:56 +08:00
parent a8c7cf74e6
commit 8337c60f98
17 changed files with 1165 additions and 330 deletions


@@ -0,0 +1,18 @@
# Change: Optimize Kafka Consumption Performance
## Why
A user reports extremely slow Kafka consumption. The current implementation processes and inserts messages one by one, so throughput is bottlenecked by the database network round-trip time (RTT).
## What Changes
- **New Requirement**: Implement Batch Processing for Kafka messages.
- **Refactor**: Decouple message parsing from insertion in `processor`.
- **Logic**:
- Accumulate messages in a buffer and flush on a size or time threshold (e.g., 500 items or 500ms).
- Perform Batch Insert into PostgreSQL.
- Implement Row-by-Row fallback for batch failures (to isolate bad data).
- Handle DB connection errors with a retry loop at the batch level (see the sketch after this list).
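
Below is a minimal sketch of that buffering logic, assuming the `BatchProcessor` name from the task list; the constructor options, the `insertBatch` callback, and the timer handling are illustrative placeholders, not the actual implementation:

```js
// Sketch only: buffer parsed rows and flush on a size or time threshold.
// `insertBatch` is a hypothetical async callback that performs the DB write.
class BatchProcessor {
  constructor(insertBatch, { maxItems = 500, maxWaitMs = 500 } = {}) {
    this.insertBatch = insertBatch;
    this.maxItems = maxItems;
    this.maxWaitMs = maxWaitMs;
    this.buffer = [];
    this.timer = null;
  }

  async add(rows) {
    this.buffer.push(...rows);
    if (this.buffer.length >= this.maxItems) {
      return this.flush();
    }
    if (!this.timer) {
      // The first row of a new batch arms the flush timer.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  async flush() {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    await this.insertBatch(batch); // one DB round trip per batch, not per message
  }
}
```

Flushing on whichever threshold fires first bounds both memory use and end-to-end latency.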
## Impact
- Affected specs: `onoffline`
- Affected code: `src/index.js`, `src/processor/index.js`
- Performance: Expected 10x-100x throughput increase.


@@ -0,0 +1,13 @@
## ADDED Requirements
### Requirement: Batch Consumption and Writes
The system SHALL buffer Kafka messages and write them to the database in batches to improve throughput.
#### Scenario: Batch write
- **GIVEN** multiple messages arrive within a short window (e.g., 500 messages)
- **WHEN** the buffer fills or times out (e.g., 200ms)
- **THEN** perform a single batch database insert
#### Scenario: Fallback on write failure
- **GIVEN** a batch write fails due to a data error (not a connection error)
- **WHEN** the exception is caught
- **THEN** automatically fall back to row-by-row writes to isolate the bad data and ensure valid rows are persisted
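
A hedged sketch of the batch insert with its row-by-row fallback, using node-postgres (`pg`); the table and column names (`onoffline_events`, `device_id`, `status`, `ts`) and the row shape are assumptions for illustration, not the project's actual schema:

```js
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the usual PG* env vars

// Treat SQLSTATE class 08 (connection exception) and common Node socket
// errors as connection failures; those should be retried at the batch
// level rather than degraded to row-by-row writes.
const isConnectionError = (err) =>
  typeof err.code === 'string' &&
  (err.code.startsWith('08') ||
    ['ECONNREFUSED', 'ECONNRESET', 'ETIMEDOUT'].includes(err.code));

async function insertBatch(rows) {
  // Build one multi-row INSERT: VALUES ($1,$2,$3), ($4,$5,$6), ...
  const values = [];
  const placeholders = rows.map((r, i) => {
    values.push(r.deviceId, r.status, r.ts);
    const o = i * 3;
    return `($${o + 1}, $${o + 2}, $${o + 3})`;
  });
  const sql = `INSERT INTO onoffline_events (device_id, status, ts)
               VALUES ${placeholders.join(', ')}`;
  try {
    await pool.query(sql, values);
  } catch (err) {
    if (isConnectionError(err)) throw err; // handled by the batch-level retry loop
    // Data error: fall back to row-by-row so one bad row cannot sink the batch.
    for (const r of rows) {
      try {
        await pool.query(
          'INSERT INTO onoffline_events (device_id, status, ts) VALUES ($1, $2, $3)',
          [r.deviceId, r.status, r.ts]
        );
      } catch (rowErr) {
        console.error('skipping bad row', rowErr.message); // valid rows still land
      }
    }
  }
}
```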


@@ -0,0 +1,5 @@
## 1. Implementation
- [ ] Refactor `src/processor/index.js` to export `parseMessageToRows`
- [ ] Implement `BatchProcessor` logic in `src/index.js`
- [ ] Update `handleMessage` to use `BatchProcessor`
- [ ] Verify performance improvement
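
As a rough illustration of the first two tasks, the parsing/insertion split could look like this; the message format (a JSON array of events) and the row field names are assumptions:

```js
// src/processor/index.js (sketch): pure parsing, no DB access.
function parseMessageToRows(message) {
  const events = JSON.parse(message.value.toString());
  return events.map((e) => ({
    deviceId: e.device_id,
    status: e.status,
    ts: e.ts,
  }));
}

module.exports = { parseMessageToRows };
```

`handleMessage` in `src/index.js` would then only parse and enqueue, leaving all writes to the batcher:

```js
const { parseMessageToRows } = require('./processor');

// Assumes a module-level batchProcessor instance (see the BatchProcessor sketch above).
async function handleMessage(message) {
  const rows = parseMessageToRows(message); // may throw on malformed JSON
  await batchProcessor.add(rows);           // buffered; flushed by size/time
}
```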


@@ -83,3 +83,16 @@
- **WHEN** parsing the timestamp
- **THEN** automatically multiply by 1000 to convert to milliseconds
### Requirement: Batch Consumption and Writes
The system SHALL buffer Kafka messages and write them to the database in batches to improve throughput.
#### Scenario: Batch write
- **GIVEN** multiple messages arrive within a short window (e.g., 500 messages)
- **WHEN** the buffer fills or times out (e.g., 200ms)
- **THEN** perform a single batch database insert
#### Scenario: Fallback on write failure
- **GIVEN** a batch write fails due to a data error (not a connection error)
- **WHEN** the exception is caught
- **THEN** automatically fall back to row-by-row writes to isolate the bad data and ensure valid rows are persisted
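
To make "data error (not a connection error)" concrete, here is a sketch of the batch-level retry loop that pairs with the fallback above; the retry count and backoff are arbitrary, and `insertBatch`/`isConnectionError` are the hypothetical helpers sketched earlier:

```js
// Retry whole batches on connection errors; data errors never reach here,
// because insertBatch already degrades them to row-by-row writes.
async function writeBatchWithRetry(rows, maxRetries = 5) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await insertBatch(rows);
    } catch (err) {
      if (!isConnectionError(err) || attempt >= maxRetries) throw err;
      // Linear backoff before reconnecting: 1s, 2s, 3s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
    }
  }
}
```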