What Spark version are you using?
AFAIK, you can specify group.id; it's not random.
Also, you don't have to use a checkpoint directory; you can use Kafka itself to manage the offsets (refer to https://spark.apache.org/docs/2.4.0/streaming-kafka-0-10-integration.html#storing-offsets ).
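The commit-to-Kafka pattern from that integration guide looks roughly like this (a minimal Scala sketch; it assumes `stream` is the direct stream returned by `KafkaUtils.createDirectStream`, as in the linked page):

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

// `stream` is the InputDStream returned by KafkaUtils.createDirectStream.
stream.foreachRDD { rdd =>
  // Capture this batch's offset ranges before any transformation that
  // loses the RDD's Kafka lineage (e.g. a shuffle).
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... process the batch and write the results out ...

  // Commit the offsets back to Kafka only after the output succeeds,
  // so a failed batch is re-read instead of silently skipped.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```

The commit is asynchronous, so on restart you may re-process the last batch or two; your outputs should be idempotent or transactional if that matters.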
Last but not least, I like to measure the lag from both sides: consumer metrics and the broker side.
The reason is that I have seen cases in which a consumer group was allegedly assigned to all partitions, but for some of them no offset was reported:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID
topic1 0 45896605954 46222875284 326269330 consumer-1
topic1 1 45888273257 46227424210 339150953 consumer-1
...
topic1 16 45678505506 46013061139 334555633 consumer-1
topic1 17 - 46225917726 - consumer-1
topic1 18 45893413333 46225853655 332440322 consumer-1
So the consumer itself didn't report any lag on partition 17, and the only way to catch it was to describe the consumer group from the broker side and parse the output...
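A broker-side check like that can be automated by parsing the `kafka-consumer-groups.sh --describe` output shown above. A minimal Python sketch (the function name and column handling are my own, not from any Kafka tooling):

```python
def find_stale_partitions(describe_output: str):
    """Parse `kafka-consumer-groups.sh --describe` output and return
    (topic, partition) pairs whose consumer reported no offset ('-')."""
    stale = []
    for line in describe_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 6:  # ignore blank/elided lines
            continue
        topic, partition, current_offset, _log_end, lag = fields[:5]
        if current_offset == "-" or lag == "-":
            stale.append((topic, int(partition)))
    return stale


sample = """TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG        CONSUMER-ID
topic1  17         -               46225917726     -          consumer-1
topic1  18         45893413333     46225853655     332440322  consumer-1"""
print(find_stale_partitions(sample))  # → [('topic1', 17)]
```

Run on a schedule, an alert on a non-empty result catches exactly the partition-17 case above, which consumer-side lag metrics missed entirely.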