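feat: implement Kafka batch consumption and writes to improve throughput
Introduce a batching mechanism that buffers messages and writes them to the database in batches, significantly improving consumption performance. Adjust Kafka configuration parameters to tune consumer concurrency and the commit strategy. Add automatic creation of partition indexes, and refactor the processor to support batch operations. Add fallback write logic to handle data errors, and extend metrics collection to monitor the effect of batching.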
@@ -0,0 +1,18 @@
# Change: Optimize Kafka Consumption Performance

## Why
A user reports extremely slow Kafka consumption. The current implementation processes and inserts messages one at a time, so every message pays a full database network round trip (RTT), and that round trip is the bottleneck.

## What Changes
- **New Requirement**: Implement batch processing for Kafka messages.
- **Refactor**: Decouple message parsing from insertion in `processor`.
- **Logic**:
  - Accumulate messages in a buffer (e.g., 500ms or 500 items).
  - Perform a batch insert into PostgreSQL.
  - Implement a row-by-row fallback for batch failures (to isolate bad data).
  - Handle DB connection errors with a retry loop at the batch level.
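
The buffering step above could be sketched as follows. This is a minimal illustration, not the code from this commit: the `BatchProcessor` name comes from the task list, but its constructor, the `add`/`flush` methods, and the `maxSize`/`maxWaitMs` options are assumptions made for the example.

```javascript
// Hypothetical sketch of a size-or-time buffer. Names and defaults are
// illustrative assumptions, not taken from src/index.js.
class BatchProcessor {
  // writeBatch: async function that receives an array of rows (one batch).
  constructor(writeBatch, { maxSize = 500, maxWaitMs = 500 } = {}) {
    this.writeBatch = writeBatch;
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
    this.buffer = [];
    this.timer = null;
  }

  // Buffer rows. Flush at once when the buffer is full; otherwise make sure
  // a timer is armed so a partially filled buffer is still written.
  async add(rows) {
    this.buffer.push(...rows);
    if (this.buffer.length >= this.maxSize) {
      return this.flush();
    }
    if (this.timer === null) {
      this.timer = setTimeout(() => {
        this.flush().catch((err) => console.error('batch flush failed:', err));
      }, this.maxWaitMs);
    }
  }

  // Hand everything buffered so far to writeBatch as a single batch.
  // Note: overlapping flushes are not serialized in this sketch.
  async flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    await this.writeBatch(batch);
  }
}
```

Under these assumptions, `handleMessage` would call `parseMessageToRows` on each message and pass the result to `add`, and a shutdown hook would call `flush()` so buffered rows are not lost.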

## Impact
- Affected specs: `onoffline`
- Affected code: `src/index.js`, `src/processor/index.js`
- Performance: Expected 10x-100x throughput increase (an estimate, still to be verified).
@@ -0,0 +1,13 @@
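## ADDED Requirements
### Requirement: Batch consumption and writes
The system SHALL buffer Kafka messages and write them to the database in batches to improve throughput.

#### Scenario: Batch write
- **GIVEN** multiple messages are received within a short period (e.g., 500 messages)
- **WHEN** the buffer is full or the timeout elapses (e.g., 200ms)
- **THEN** a single batch database insert is performed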
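
One way to satisfy "a single batch database insert" is a parameterized multi-row `INSERT`. The sketch below only builds the statement text and the flat value list; the `buildBatchInsert` helper, the `onoffline_events` table, and the column names are hypothetical and do not come from this commit.

```javascript
// Build one parameterized multi-row INSERT for PostgreSQL.
// Table and column names are interpolated into the SQL text, so they must
// come from trusted code, never from message data. Row values go through
// numbered placeholders ($1, $2, ...).
function buildBatchInsert(table, columns, rows) {
  if (rows.length === 0) throw new Error('rows must not be empty');
  const values = [];
  const tuples = rows.map((row, r) => {
    const placeholders = columns.map((col, c) => {
      values.push(row[col] ?? null); // missing fields are written as NULL
      return `$${r * columns.length + c + 1}`;
    });
    return `(${placeholders.join(', ')})`;
  });
  const text = `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${tuples.join(', ')}`;
  return { text, values };
}

// Hypothetical table and rows, for illustration only.
const query = buildBatchInsert('onoffline_events', ['device_id', 'status', 'ts'], [
  { device_id: 'a1', status: 'online', ts: 1700000000000 },
  { device_id: 'b2', status: 'offline', ts: 1700000000500 },
]);
console.log(query.text);
// INSERT INTO onoffline_events (device_id, status, ts) VALUES ($1, $2, $3), ($4, $5, $6)
```

The resulting `text` and `values` can be passed to a PostgreSQL client's parameterized query call. A statement can carry at most 65535 bind parameters, so the batch size times the column count has to stay under that limit.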
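
#### Scenario: Fallback on write failure
- **GIVEN** a batch write fails because of a data error (not a connection error)
- **WHEN** the exception is caught
- **THEN** the system automatically falls back to row-by-row writes, isolating the bad rows and ensuring that valid rows are stored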
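
The fallback scenario could look roughly like the sketch below. The function name, the options, and the set of error codes treated as connection errors are all assumptions; the real classification depends on the driver in use. The sketch relies on a single multi-row `INSERT` being atomic in PostgreSQL: a failed batch stores nothing, so replaying its rows one by one does not create duplicates.

```javascript
// Codes treated as "database unreachable" rather than "bad data".
// The exact list is an assumption: Node.js socket errors plus a few
// PostgreSQL SQLSTATE codes for connection failures and admin shutdown.
const CONNECTION_ERROR_CODES = new Set([
  'ECONNREFUSED', 'ECONNRESET', 'ETIMEDOUT', '08001', '08003', '08006', '57P01',
]);

function isConnectionError(err) {
  return Boolean(err) && CONNECTION_ERROR_CODES.has(err.code);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// insertRows: async function that writes an array of rows in one statement.
async function writeBatchWithFallback(rows, insertRows, { retryDelayMs = 1000, maxRetries = Infinity } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      await insertRows(rows);
      return { inserted: rows.length, rejected: [] };
    } catch (err) {
      if (!isConnectionError(err)) break; // data error: fall back below
      if (attempt >= maxRetries) throw err;
      await sleep(retryDelayMs); // connection error: retry the whole batch
    }
  }

  // Row-by-row fallback: valid rows are stored, bad rows are collected.
  const rejected = [];
  let inserted = 0;
  for (const row of rows) {
    try {
      await insertRows([row]);
      inserted++;
    } catch (err) {
      // A connection error here is rethrown; a caller that retries would
      // need idempotent inserts to avoid duplicating rows already written.
      if (isConnectionError(err)) throw err;
      rejected.push({ row, error: err });
    }
  }
  return { inserted, rejected };
}
```

Returning the rejected rows instead of only logging them leaves the decision (drop, dead-letter, alert) to the caller and gives the metrics layer a count to report.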
@@ -0,0 +1,5 @@
## 1. Implementation
- [ ] Refactor `src/processor/index.js` to export `parseMessageToRows`
- [ ] Implement `BatchProcessor` logic in `src/index.js`
- [ ] Update `handleMessage` to use `BatchProcessor`
- [ ] Verify performance improvement
@@ -83,3 +83,16 @@
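- **WHEN** a timestamp is parsed
- **THEN** it is automatically multiplied by 1000 to convert it to milliseconds

### Requirement: Batch consumption and writes
The system SHALL buffer Kafka messages and write them to the database in batches to improve throughput.

#### Scenario: Batch write
- **GIVEN** multiple messages are received within a short period (e.g., 500 messages)
- **WHEN** the buffer is full or the timeout elapses (e.g., 200ms)
- **THEN** a single batch database insert is performed

#### Scenario: Fallback on write failure
- **GIVEN** a batch write fails because of a data error (not a connection error)
- **WHEN** the exception is caught
- **THEN** the system automatically falls back to row-by-row writes, isolating the bad rows and ensuring that valid rows are stored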