KAFKA-19427: Allow the coordinator to grow its buffer dynamically #20040
Conversation
int maxBatchSize = partitionWriter.config(tp).maxMessageSize();
if (currentBatch.builder.buffer().capacity() > maxBatchSize) {
When we create the batch, we set maxMessageSize as the limit, so it should not create a buffer larger than that: we don't append to the batch if it does not have room for the record.
Ah, or are you doing this in case maxMessageSize is reduced?
Sorry, I haven’t written the PR description yet.
If there's a single record whose size exceeds maxMessageSize, it's still possible for the buffer to grow larger than maxMessageSize. So in this case, I think we should revert to using a smaller buffer afterward.
kafka/clients/src/main/java/org/apache/kafka/common/record/MemoryRecordsBuilder.java
Lines 857 to 858 in 56d1dc1

if (numRecords == 0)
    return true;
Oh, my fault. I found there is a check to prevent appending a large record:
Lines 1000 to 1004 in 3d8a018

if (estimatedSize > currentBatch.builder.maxAllowedBytes()) {
    throw new RecordTooLargeException("Message batch size is " + estimatedSize +
        " bytes in append to partition " + tp + " which exceeds the maximum " +
        "configured size of " + currentBatch.maxBatchSize + ".");
}
@mingyen066 the code you attached is an example of an atomic write. However, it is possible to write a single large record with non-atomic writes. Additionally, as @dajac commented, maxMessageSize could be dynamically reconfigured to a smaller value. Hence, I think the check is necessary.
On another note, the original buffer currentBatch.buffer could be larger than the new maxMessageSize too. Perhaps we should keep only the buffer that has a valid capacity. For example:

Stream.of(currentBatch.builder.buffer(), currentBatch.buffer)
    .filter(buf -> buf.capacity() <= maxBatchSize)
    .max(Comparator.comparing(Buffer::capacity))
    .ifPresent(bufferSupplier::release);
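To sanity-check that selection logic outside the runtime, here is a standalone, runnable sketch using plain java.nio buffers; the capacities, the maxBatchSize value, and the printed "release" are made up for illustration and this is not the coordinator code:

import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.stream.Stream;

public class ReleaseLargestValidBufferDemo {
    public static void main(String[] args) {
        int maxBatchSize = 1024;                                // hypothetical limit
        ByteBuffer builderBuffer = ByteBuffer.allocate(2048);   // grew past the limit
        ByteBuffer originalBuffer = ByteBuffer.allocate(512);   // still within the limit

        // Keep only buffers that fit within maxBatchSize, then "release"
        // (here: just print) the largest one, mirroring the suggestion above.
        Stream.of(builderBuffer, originalBuffer)
            .filter(buf -> buf.capacity() <= maxBatchSize)
            .max(Comparator.comparingInt(ByteBuffer::capacity))
            .ifPresent(buf -> System.out.println("would release buffer of capacity " + buf.capacity()));
    }
}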
Thanks @dajac and @chia7712. I’ve updated the code to check both currentBatch.buffer and currentBatch.builder.buffer().
A small finding while writing the test: I found MockPartitionWriter#append includes another record size check, but CoordinatorPartitionWriter#append does not. So I'm not sure if it's possible to write a single large record with non-atomic writes.
@mingyen066 you could simply override MockPartitionWriter#append to skip the check in the test case. In production code, the size check happens in UnifiedLog.
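For illustration of that pattern only (the classes below are stand-ins defined in the snippet itself, not the real MockPartitionWriter or Kafka test fixtures), overriding the mock's append in the test so the size check is skipped looks roughly like this:

import java.util.ArrayList;
import java.util.List;

class FakeRecord {
    final int sizeInBytes;
    FakeRecord(int sizeInBytes) { this.sizeInBytes = sizeInBytes; }
}

class FakePartitionWriter {
    static final int MAX_MESSAGE_SIZE = 1024;
    final List<FakeRecord> written = new ArrayList<>();

    long append(FakeRecord record) {
        if (record.sizeInBytes > MAX_MESSAGE_SIZE) {
            throw new IllegalArgumentException("record too large");
        }
        written.add(record);
        return written.size();
    }
}

public class OverrideMockInTest {
    public static void main(String[] args) {
        // In the test case, override append so no size validation runs and an
        // oversized record can flow through the write path under test.
        FakePartitionWriter writer = new FakePartitionWriter() {
            @Override
            long append(FakeRecord record) {
                written.add(record);
                return written.size();
            }
        };
        writer.append(new FakeRecord(4096)); // would have thrown in the default mock
        System.out.println("records written: " + writer.written.size());
    }
}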
@mingyen066 this PR also fixes the issue that dynamically decreasing the message size does not re-create the buffer. Hence, could you please add a test for that scenario?
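A rough shape for such a test follows; apart from the two expressions quoted elsewhere in this thread (mockWriter.config(TP).maxMessageSize() and ctx.bufferSupplier.get(1).capacity()), every step and helper below is hypothetical and would need to be adapted to whatever CoordinatorRuntimeTest actually provides:

@Test
public void testCoordinatorReleasesBufferWhenMaxMessageSizeDecreases() {
    // 1. Write with the initial, larger maxMessageSize so the runtime caches
    //    a buffer sized for that limit (e.g. via a scheduled write operation).

    // 2. Dynamically lower the limit on the mock writer (hypothetical helper,
    //    e.g. mockWriter.setMaxMessageSize(smallerLimit)).

    // 3. Write again; freeCurrentBatch should now refuse to recycle the old,
    //    oversized buffer.

    // 4. The cached buffer (if any) must respect the new limit; with nothing
    //    cached, BufferSupplier.get(1) hands back a fresh 1-byte buffer.
    assertTrue(ctx.bufferSupplier.get(1).capacity() <= mockWriter.config(TP).maxMessageSize());
}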
Thanks for the nice patch! Overall LGTM. I left one minor comment. Could we rename the test cases as follows:
- testCoordinatorDoNotRetainLargeBuffer -> testCoordinatorDoNotRetainBufferLargeThanMaxMessageSize
- testCoordinatorRetainExpandedBuffer -> testCoordinatorRetainBufferLessOrEqualToMaxMessageSize
Thanks for the PR. I left one comment. Also, it would be good to add a test case where maxBatchSize is set smaller than MIN_BUFFER_SIZE.
Thanks @chia7712, I've added a test to cover the scenario where the message size is decreased dynamically.
LGTM.
// Verify that there is no cached buffer.
assertEquals(1, ctx.bufferSupplier.get(1).capacity());

// Write #3.
Could you please add a unit test to ensure that the maximum capacity of the buffer is equal to maxMessageSize?
I replaced the previous assertion with assertEquals(mockWriter.config(TP).maxMessageSize(), ctx.bufferSupplier.get(1).capacity()) to ensure that the retained buffer's capacity equals maxMessageSize.
The Coordinator starts with a smaller buffer, which can grow as needed.
In freeCurrentBatch, release the appropriate buffer: the Coordinator recycles the expanded buffer (currentBatch.builder.buffer()), not currentBatch.buffer, because MemoryRecordsBuilder may allocate a new ByteBuffer if the existing one isn't large enough.
There are two cases where the buffer may exceed maxMessageSize:
1. If there's a single record whose size exceeds maxMessageSize (which, so far, is derived from max.message.bytes) and the write is in non-atomic mode, it's still possible for the buffer to grow beyond maxMessageSize. In this case, the Coordinator should revert to using a smaller buffer afterward.
2. The Coordinator does not recycle a buffer that is larger than maxMessageSize. If the user dynamically reduces maxMessageSize to a value even smaller than INITIAL_BUFFER_SIZE, the Coordinator should avoid recycling any buffer larger than maxMessageSize so that it can allocate the smaller buffer in the next round.
Add tests to verify the above scenarios.
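Taken together, and using only field and method names already quoted in this thread (everything else is illustrative, and this fragment is not meant to compile on its own), the release decision in freeCurrentBatch ends up looking roughly like this sketch:

private void freeCurrentBatch() {
    int maxBatchSize = partitionWriter.config(tp).maxMessageSize();

    // The builder may have allocated a new, larger ByteBuffer internally, so
    // currentBatch.builder.buffer() (not currentBatch.buffer) reflects the
    // buffer that actually grew.
    ByteBuffer expanded = currentBatch.builder.buffer();

    // Only recycle a buffer that still fits within the (possibly reduced)
    // maxMessageSize; otherwise drop it so the next write allocates a smaller one.
    if (expanded.capacity() <= maxBatchSize) {
        bufferSupplier.release(expanded);
    } else if (currentBatch.buffer.capacity() <= maxBatchSize) {
        bufferSupplier.release(currentBatch.buffer);
    }

    currentBatch = null;
}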