Kafka log deletion and load balancing across consumers


Say a consumer does time-intensive processing. In order to scale consumer-side processing, I would like to spawn multiple consumers and have them consume messages from a Kafka topic in round-robin fashion. Based on the documentation, it seems that if I create multiple consumers and add them to one consumer group, only one consumer will get the messages. If I add the consumers to different consumer groups, each consumer will get the same message. So, in order to achieve the above objective, is the only solution to partition the topic? This seems like an odd design choice, because consumer scalability is now bleeding into topic and even producer design. Ideally, if a topic does not need partitioning, there should be no need to partition it. This puts unnecessary logic on the producer and also causes other consumer types to consume from these partitions, which may only make sense for one type of consumer. Plus it limits the use case: a certain consumer type may want ordering over the messages, so splitting a topic into partitions may not be possible.

Second, if I set "cleanup.policy" to compact, does that mean the Kafka log will keep increasing, since it maintains the latest value for each key? If not, how can I get both log deletion and compaction?
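To make the compaction question concrete, here is a minimal sketch of what a compacted log ends up holding. It is a plain-Python simulation, not Kafka's cleaner: the `compact` function and the sample records are made up for illustration. It assumes the behaviour described in the Kafka documentation, where at least the latest value per key is retained and a record with a null payload acts as a delete marker for its key.

```python
def compact(log):
    """Return the records that survive compaction, in offset order.

    `log` is a list of (offset, key, value) tuples. The newest record for
    each key wins; a key whose newest value is None (a delete marker) is
    dropped entirely. This mimics the end state only, not the timing.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    ]
    return sorted(survivors)


log = [
    (0, "user-1", "a"),
    (1, "user-2", "b"),
    (2, "user-1", "c"),
    (3, "user-3", "d"),
    (4, "user-2", "e"),
    (5, "user-3", None),  # delete marker for user-3
]
compacted = compact(log)
print(compacted)
```

Under these assumptions the compacted part of the log is bounded by the number of distinct live keys rather than by the number of messages written, so it keeps growing only as long as new keys keep appearing.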

Update: it seems I have two options to achieve scalability on the consumer side, independent of topic scaling.

  1. Create two consumer groups and have them consume odd and even offsets respectively. This logic would have to be built into the consumers so that they discard unneeded messages. It also doubles the network requirements.

  2. Create a hierarchy of topics, where the root topic gets all the messages. Some job then classifies the logs and publishes them again to more fine-grained topics. In this case, strong ordering can be achieved at the root, and the more fine-grained topics can be constructed for consumer scaling.
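The two options above can be sketched roughly as follows. This is an in-memory simulation with no Kafka client involved; `odd_even_consumer`, `classify`, and the routing rule are hypothetical names chosen for illustration.

```python
def odd_even_consumer(messages, parity):
    """Option 1: fetch every message, keep only offsets with the given parity.

    Returns (number of messages fetched, messages actually processed).
    """
    fetched = 0
    kept = []
    for offset, payload in messages:
        fetched += 1  # the message still crosses the network
        if offset % 2 == parity:
            kept.append((offset, payload))
    return fetched, kept


def classify(messages, route):
    """Option 2: republish root-topic messages into finer-grained topics.

    `route` maps a payload to a topic name; order within each fine-grained
    topic follows the order of the root topic.
    """
    topics = {}
    for _offset, payload in messages:
        topics.setdefault(route(payload), []).append(payload)
    return topics


root = [(i, f"event-{i}") for i in range(6)]

fetched_even, even = odd_even_consumer(root, 0)
fetched_odd, odd = odd_even_consumer(root, 1)
print(fetched_even + fetched_odd, "fetches for", len(root), "messages")

# Hypothetical routing rule: spread events over three topics by event number.
fine = classify(root, lambda p: "topic-" + str(int(p.split("-")[1]) % 3))
print(fine)
```

With two parity-based groups every message is fetched twice, which is the doubled network cost mentioned in option 1. Option 2 avoids that at the price of an extra republishing job.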

In 0.8, Kafka maintains the consumer offset, so publishing messages in round robin across various consumers is not that far-fetched a requirement for its design.

Partitions are the unit of parallelism in Kafka by design. This is not only about consumption: Kafka distributes partitions across the cluster, which has other benefits, such as sharing load among different servers, managing replication to ensure no data loss, and letting a log scale beyond a size that fits on a single server.

Ordering of messages is the key factor. If you do not need strong ordering, dividing a topic into multiple partitions lets you distribute load evenly while producing (this is handled by the producer itself). And when using a consumer group, you just need to add more consumer instances to the same group in order to consume in parallel.
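As a rough illustration of why adding consumers to a group only helps when there are enough partitions, here is a simplified assignment sketch. Kafka has its own assignment strategies; `assign_round_robin` below is a hypothetical stand-in that only captures the rule that each partition is consumed by exactly one consumer in a group.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer in the group, in turn."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# One partition, three consumers: two consumers sit idle.
single = assign_round_robin([0], ["c1", "c2", "c3"])
print(single)

# Four partitions, two consumers: the load is split evenly.
multi = assign_round_robin([0, 1, 2, 3], ["c1", "c2"])
print(multi)
```

The first call matches what the question observed: with a single partition, only one consumer in the group receives messages, no matter how many consumers are added.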

"Plus it limits the use case: a certain consumer type may want ordering over the messages, so splitting a topic into partitions may not be possible."

True. From the docs:

"However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process."

Maintaining ordering while consuming in a distributed manner requires the messaging system to maintain per-message state to keep track of message acknowledgement. That would involve a lot of expensive random I/O in the system, so there is a trade-off.
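The trade-off can be pictured with two toy trackers. Both classes are hypothetical and only count how much bookkeeping each approach needs; they do not model real broker internals or I/O.

```python
class OffsetTracker:
    """Progress for one partition as a single integer.

    Everything below `next_offset` is considered consumed.
    """

    def __init__(self):
        self.next_offset = 0

    def commit(self, offset):
        self.next_offset = max(self.next_offset, offset + 1)

    def state_size(self):
        return 1  # one integer, regardless of message count


class PerMessageTracker:
    """Progress as a set of delivered-but-unacknowledged message ids."""

    def __init__(self):
        self.pending = set()

    def deliver(self, message_id):
        self.pending.add(message_id)

    def ack(self, message_id):
        self.pending.discard(message_id)

    def state_size(self):
        return len(self.pending)  # grows with the number of in-flight messages


offsets = OffsetTracker()
offsets.commit(4)  # messages 0..4 consumed in order by one consumer

per_message = PerMessageTracker()
for message_id in range(5):
    per_message.deliver(message_id)
for message_id in (1, 3):  # acknowledgements arrive out of order
    per_message.ack(message_id)

print(offsets.next_offset, offsets.state_size())
print(sorted(per_message.pending), per_message.state_size())
```

The offset approach stays at one integer per partition because a single consumer reads it in order, while the per-message approach has to remember every outstanding message individually.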

"Ideally, if a topic does not need partitioning, there should be no need to partition it. This puts unnecessary logic on the producer and also causes other consumer types to consume from these partitions, which may only make sense for one type of consumer."

Distributing messages across partitions is typically handled by the producer itself, without any intervention on the programmer's end (assuming you don't want to categorize messages using a key). As for the consumers you mention here, the better choice would be to use simple/low-level consumers, which allow you to consume only a subset of the partitions in a topic.
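For a feel of what "handled by the producer itself" means, here is a sketch of a partitioner. It is not Kafka's implementation: `SketchPartitioner` is a hypothetical class, `zlib.crc32` stands in for whatever hash the real producer uses, and round robin for unkeyed messages is an assumption made for illustration (the real strategy differs between versions).

```python
import itertools
import zlib


class SketchPartitioner:
    """Pick a partition for each message, with or without a key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key=None):
        if key is None:
            # No key: spread messages evenly (assumed round robin).
            return next(self._counter) % self.num_partitions
        # Keyed: the same key always maps to the same partition,
        # which preserves per-key ordering.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
unkeyed = [partitioner.partition() for _ in range(6)]
keyed = [partitioner.partition("user-42") for _ in range(3)]
print(unkeyed)
print(keyed)
```

The application code only supplies an optional key; the choice of partition stays inside the producer, so partitioning adds little logic on the producer side unless you want key-based grouping.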

"This seems like an odd design choice, because consumer scalability is now bleeding into topic and even producer design."

I believe a system like Kafka, which focuses on high throughput (handling hundreds of megabytes of reads and writes per second from thousands of clients) while ensuring scalability and strong durability and fault-tolerance guarantees, might not be a good fit for use cases with totally different business requirements.

