Messaging Systems: Queue Based vs Log Based
Learn the key differences and important properties of queue and log based messaging systems.
Today I am sharing another article on technology that is widely used in the real-time and streaming world. We will look at two popular families of messaging systems from a broad perspective, covering their differences, key aspects and properties, so you have a clear enough picture of where to go next.
Whether it's queue based or log based, the end goal is simple: buffer messages so that a system can handle high volumes of real-time data asynchronously, with producers and consumers decoupled.
Let's dive into queue based and log based systems.
Queue Based
Queue based messaging systems often implement a standard such as JMS (a Java messaging API) or AMQP (a wire protocol), although managed services such as AWS SQS expose their own APIs. They process data roughly in First In, First Out (FIFO) order.
A simple queue based example looks something like this: N producers push data to the queue and N consumers read data from the queue asynchronously.
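A minimal sketch of the setup described above, assuming Python's in-process `queue.Queue` as a stand-in for a real broker such as RabbitMQ or SQS: several producer threads push, several consumer threads pull, and each message is handed to exactly one consumer. The function name and its parameters are hypothetical, chosen only for illustration.

```python
import queue
import threading

def run_competing_consumers(num_producers=2, num_consumers=3, per_producer=5):
    """N producers and N consumers sharing one in-process FIFO queue."""
    q = queue.Queue()             # thread-safe FIFO, stands in for the broker
    consumed = []                 # (consumer_id, message) pairs
    lock = threading.Lock()       # protects `consumed`
    STOP = object()               # sentinel telling a consumer to exit

    def producer(pid):
        for i in range(per_producer):
            q.put(f"p{pid}-m{i}")

    def consumer(cid):
        while True:
            msg = q.get()         # blocks until a message is available
            if msg is STOP:
                return
            with lock:
                consumed.append((cid, msg))

    consumers = [threading.Thread(target=consumer, args=(c,)) for c in range(num_consumers)]
    producers = [threading.Thread(target=producer, args=(p,)) for p in range(num_producers)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()                  # all real messages are now queued
    for _ in consumers:
        q.put(STOP)               # one sentinel per consumer, behind the real messages
    for t in consumers:
        t.join()
    return consumed

if __name__ == "__main__":
    for consumer_id, msg in run_competing_consumers():
        print(consumer_id, msg)
```

Every message appears exactly once in the result; which consumer receives which message depends on thread scheduling, which is why the consumers are said to compete for messages.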
To take a queue to the next level, consider the PubSub approach: producers publish to a Topic, and each Subscription gets its own queue, so consumers can read the same messages from their respective queues.
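The fan-out behaviour of a Topic with several Subscriptions can be sketched in a few lines. This is a toy in-memory model, not the API of GCP Pub/Sub, SNS or RabbitMQ; the `Topic` class and `pull` helper are hypothetical. Publishing copies the message into every subscription's queue, and pulling from one subscription does not affect the others.

```python
from collections import deque

class Topic:
    """Toy topic: every published message is copied to each subscription's queue."""

    def __init__(self):
        self.subscriptions = {}           # subscription name -> its own FIFO queue

    def subscribe(self, name):
        self.subscriptions[name] = deque()
        return self.subscriptions[name]

    def publish(self, message):
        for sub_queue in self.subscriptions.values():
            sub_queue.append(message)     # fan-out: one copy per subscription

def pull(sub_queue):
    """Remove and return the oldest message, or None if the queue is empty."""
    return sub_queue.popleft() if sub_queue else None

orders = Topic()
billing = orders.subscribe("billing")
shipping = orders.subscribe("shipping")
orders.publish("order-1")
orders.publish("order-2")

print(pull(billing))    # order-1
print(pull(shipping))   # order-1: shipping got its own copy
print(pull(billing))    # order-2
```

Within each subscription the queue semantics are unchanged: once a message is pulled it is gone from that subscription only.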
Let's go through some important aspects:
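Examples: RabbitMQ, ActiveMQ, AWS SQS, GCP Pub/Sub. They share the same fundamentals, each with its own specific features.
Multiple producers can push to the same topic or queue, and multiple consumers can read from the same queue.
A queue can subscribe to multiple topics.
Consumers typically pull messages from the queue at an interval, and most queues today also support long polling; some brokers, such as RabbitMQ, can instead push messages to subscribed consumers.
Each message is delivered to only one consumer of a queue, but most systems guarantee at-least-once rather than exactly-once delivery, so a message can be redelivered (for example, when a consumer crashes before acknowledging it). Consumers should therefore be idempotent; some services add deduplication, such as SQS FIFO queues.
Multiple consumers cannot read the same message from the same queue, as shown in image 1a. Supporting that use case requires multiple queues, which leads to a PubSub solution as shown in image 1b.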
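Reprocessing is generally not supported, because messages are deleted once the consumer has processed and acknowledged them.
Failed messages are usually handled with a Dead Letter Queue (DLQ). A DLQ is a regular queue connected to the source queue, which makes it easy to redrive failed messages for another attempt.
Many queue based systems support message prioritization (e.g. RabbitMQ priority queues), though not all do; SQS, for example, has no built-in message priority.
Ordering is guaranteed per queue, meaning consumers read messages in the order the producers submitted them. In practice this order can be weakened by redeliveries, by multiple competing consumers, or by best-effort queue types such as SQS standard queues.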
Advanced message routing, filtering and exclusion can be done in a PubSub setup through topics.
Queues may persist or spill data to disk for several reasons, e.g. memory pressure or configuration such as the persistent delivery mode in RabbitMQ.
Messages stay in memory or on disk only until they are consumed, which keeps storage requirements much lighter.
Memory or disk issues arise under backpressure, i.e. when consumers cannot keep up with the rate at which producers push to the queue or topic and a backlog builds up.
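The acknowledgement, deletion and DLQ behaviour from the list above can be sketched as a small in-memory model. Everything here is hypothetical (`ToyQueue`, `max_receives`, `ack`, `nack`); it only mirrors the idea behind settings such as the `maxReceiveCount` of an SQS redrive policy and ignores visibility timeouts. A received message stays in flight until it is acked (deleted for good, so it cannot be replayed) or nacked (made visible again, or moved to the DLQ after too many failed deliveries).

```python
from collections import deque
from itertools import count

class ToyQueue:
    """Hypothetical in-memory queue with at-least-once delivery and a DLQ."""

    def __init__(self, max_receives=3):
        self.ready = deque()          # (msg_id, body) waiting for a consumer
        self.in_flight = {}           # msg_id -> body, delivered but not yet acked
        self.receive_counts = {}      # msg_id -> number of deliveries so far
        self.dlq = deque()            # messages that kept failing
        self.max_receives = max_receives
        self._ids = count()

    def send(self, body):
        msg_id = next(self._ids)
        self.receive_counts[msg_id] = 0
        self.ready.append((msg_id, body))
        return msg_id

    def receive(self):
        if not self.ready:
            return None
        msg_id, body = self.ready.popleft()
        self.receive_counts[msg_id] += 1
        self.in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        # Successful processing: the message is deleted and cannot be reprocessed.
        del self.in_flight[msg_id]
        del self.receive_counts[msg_id]

    def nack(self, msg_id):
        # Failed processing: redeliver, or dead-letter after too many attempts.
        body = self.in_flight.pop(msg_id)
        if self.receive_counts[msg_id] >= self.max_receives:
            del self.receive_counts[msg_id]
            self.dlq.append((msg_id, body))
        else:
            self.ready.append((msg_id, body))

q = ToyQueue(max_receives=2)
q.send("good")
q.send("poison")
processed = []
while (delivery := q.receive()) is not None:
    msg_id, body = delivery
    if body == "poison":
        q.nack(msg_id)            # simulate a consumer that always fails on this one
    else:
        processed.append(body)
        q.ack(msg_id)

print(processed)      # ['good']
print(list(q.dlq))    # [(1, 'poison')]
```

Because a nacked message goes back to the queue, a consumer that processes a message but crashes before acking would see it again, which is the at-least-once behaviour that makes idempotent consumers important.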
Log Based
A log based messaging system maintains an append-only log on disk that provides persistent storage; combined with replication, this makes it fault tolerant.
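To make the append-only log concrete, here is a minimal single-partition sketch; `ToyLog`, `poll` and `seek` are hypothetical names, not the Kafka or Kinesis client APIs, and retention is left out. Reading never deletes a record: each consumer group only tracks the offset of the next record it wants, which is what lets several groups read the same messages and lets a group rewind to reprocess them.

```python
class ToyLog:
    """Hypothetical single-partition append-only log with per-group offsets."""

    def __init__(self):
        self.records = []             # append-only; list index == offset
        self.committed = {}           # consumer group -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)   # "commit" right after reading
        return batch

    def seek(self, group, offset):
        # Rewind (or skip ahead) so the group reprocesses from `offset`.
        self.committed[group] = offset

log = ToyLog()
for event in ["e0", "e1", "e2"]:
    log.append(event)

analytics = log.poll("analytics")   # ['e0', 'e1', 'e2']
audit = log.poll("audit")           # same records, independent offset
log.seek("analytics", 1)
replayed = log.poll("analytics")    # ['e1', 'e2']
print(analytics, audit, replayed)
```

Contrast this with the queue sketches: consumption here is just moving a pointer, so storage is governed by retention rather than by how fast consumers read.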
A simple end-to-end architecture looks something like this:
To scale further, consumers can be grouped together as follows, so they process the same topic in parallel at the partition level.
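The partition-level parallelism above rests on two steps: a key decides which partition a message lands in, and the partitions of a topic are spread across the consumers of one group. The sketch below is an assumption-laden illustration: it uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner hashes keys with murmur2) and a plain round-robin assignment that is not a reimplementation of any Kafka assignor. Both function names are hypothetical.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: the same key always maps to the same partition,
    # which is what gives per-key ordering. (Kafka itself uses murmur2.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions over the consumers of one group.

    Each partition goes to exactly one consumer; consumers beyond the
    partition count receive nothing and sit idle.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

print(assign_partitions(4, ["c1", "c2", "c3"]))   # {'c1': [0, 3], 'c2': [1], 'c3': [2]}
print(assign_partitions(2, ["c1", "c2", "c3"]))   # {'c1': [0], 'c2': [1], 'c3': []}
for key in ["user-1", "user-2", "user-1"]:
    print(key, "->", partition_for(key, 4))        # "user-1" lands on the same partition both times
```

The second call shows why the partition count caps a group's parallelism: a third consumer on a two-partition topic has nothing to read.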
Let's go through some important aspects:
Examples: Apache Kafka, AWS Kinesis. They share the same fundamentals, each with its own specific features.
Multiple producers can push to the same topic, and multiple consumers or consumer groups can read from the same topic.
In Kafka, each partition is read by only one consumer within a consumer group at a time.
Consumers pull messages by position, e.g. offsets in Kafka or sequence numbers in Kinesis.
Multiple consumers can read the same message, as shown in image 2a. This is one of the big differentiators: one centralized system can serve many consumers, whereas the queue based model needs a separate queue per consumer.
Consumers can be grouped together to work like one big consumer; each consumer in the group reads from different partitions, and therefore different messages, as shown in image 2b.
A log based system allows messages to be reprocessed by moving the offsets back to a specific point in time, as long as the data is still within the log retention period.
Ordering is guaranteed per partition only, so the partitioning key has to be chosen carefully.
Built-in replication per partition.
Easier to scale horizontally by adding more brokers, topics and partitions, along with consumer groups, within one centralized system.
Message routing is coarser than in queue based systems: it happens via topics and partitions, so how you partition needs careful thought.
Kafka, for example, writes to the operating system's page cache first and lets the OS flush data to disk; its fault tolerance comes mainly from replicating partitions across brokers rather than from every write reaching disk immediately.
A log based system stores data persistently for the timeframe configured in the retention policy, so it acts as temporary, short-lived storage. It should not be treated as a database.
Disk space requirements grow with message size, message volume and retention period, so estimating future disk needs is critical.
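A back-of-the-envelope version of that disk estimate: retained bytes are roughly throughput × average message size × retention time, multiplied by the replication factor. This is a rough sketch under stated assumptions: it ignores compression, record and batch overhead and index files, and the `headroom` fraction is a hypothetical safety margin rather than any system's setting. Retention would come from configuration such as Kafka's `retention.ms`.

```python
def estimate_log_disk_bytes(msgs_per_sec: float, avg_msg_bytes: float,
                            retention_hours: float, replication_factor: int,
                            headroom: float = 0.3) -> float:
    """Rough cluster-wide disk estimate for a log based system.

    Ignores compression, record/batch overhead and index files; `headroom`
    is an assumed extra fraction for growth and uneven partitions.
    """
    retained = msgs_per_sec * avg_msg_bytes * retention_hours * 3600
    return retained * replication_factor * (1 + headroom)

# Example: 10,000 msgs/s of ~1 KB each, kept 7 days (168 h), replication factor 3.
raw = estimate_log_disk_bytes(10_000, 1_000, 168, 3, headroom=0)
print(raw / 1e12)                    # 18.144 (TB, decimal units)
with_margin = estimate_log_disk_bytes(10_000, 1_000, 168, 3)
print(round(with_margin / 1e12, 1))  # 23.6 (TB, with 30% headroom)
```

Doubling either the message rate or the retention period doubles the estimate, which is why retention settings deserve as much attention as partition counts.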
Use case
At a high level, both achieve the same functional goal. Choosing between them requires a deeper look at the following:
Ownership: Who will own and operate it?
Infrastructure: How easy is it to set up?
Scale: What is the estimated data scale in the near future?
Consumer requirements: What functionality do consumers need (e.g. replay, fan-out, prioritization)?
💬 Let me know in the comments what else you would consider when deciding between the two.