From Kafka to ZeroMQ for real-time log aggregation

thomaslee · on Sept 17, 2016

I used to be on a team responsible for a single small-ish Kafka cluster (between 6-12 nodes) doing non-trivial throughput on bare metal. Without commenting on whether ZeroMQ is the right alternative: I can understand being scared off. Our hand was forced such that we had to go the other way and understand what was going on in Kafka.

The kicker is that Kafka can be rock solid in terms of handling massive throughput and reliability when the wheels are well greased, but there are a lot of largely undocumented lessons to learn along the way RE: configuration and certain surprising behavior that can arise at scale (such as https://issues.apache.org/jira/browse/KAFKA-2063, which our team ran into maybe a year ago & is only being fixed now).

Symptoms of these issues can cause additional knock-on effects with respect to things like leader election (we wound up with a "zombie leader" in our cluster that caused all sorts of bizarre problems) and graceful shutdowns.

Add to that the fact the software is still very much under active development (sporadic partition replica drops after an upgrade from 0.8.1 to 0.8.2; we had to apply some small but crucial patches from Uber's fork) & that it needs a certain level of operational maturity to monitor it all ... it's easy to get nervous about what the next "surprise" will be.

Having said all that, I'd use Kafka again in a heartbeat for those high volume use cases where reliability matters. Not sure I'd advise others without similar operational experience to do the same for anything mission critical, though -- unless you like stress. That stress is why Confluent is in business. :)

BrandonBradley · on Sept 17, 2016

I can attest to 'getting nervous about what the next surprise will be' with Kafka. And I'm only dealing with a single node right now.

Kafka and Confluent Platform are very much still works in progress. I had to patch Kafka Connect HDFS connector because a fix I needed was left out of the last release. Be prepared to do something similar with any of Kafka's components.

buster · on Sept 17, 2016

To me it sounds like Kafka was not understood in full detail (maybe because missing documentation or the high complexity) and they switched to a system they build themselves. Naturally they know in full detail what is going on and can set up the system as needed.

I am wondering if working on solving the actual problems with Kafka would have been the better route. I've never used Kafka and i find ZeroMQ great, but reading that their logging solution does drop log messages is a huge no-go for operations. How can you claim to run a serious business and say "babies will die" when you can't be sure to be able to find problems?

Because, when will you lose logs? Not in normal operation, but when weird things happen. When networking has a hiccup. When Load on the system is too high, so most likely when many people are using your service. Exactly when shit hits the fan. And you just made the decision that it's ok to drop log messages in such cases? That's not good.

I think you should either dive into Kafka/Zookeeper and fix your problems or switch to another logging solution. You should probably just drop that non-sense "streaming and real-time logs" requirement and live with a log delay of a few seconds and build something really stable instead of building something inherently unstable. Honestly, just collecting syslogs on the core vm and sending them to a central server would have been the better solution. Better then looking into fancy real-time, streaming logs on a sunday night because the system is having a breakdown and you can't even be sure that you are not missing essential logs.

AYBABTME · on Sept 17, 2016

One has only two choices in those situations: drop logs or block receiving more logs. Given their availability requirements, I don't think that blocking is a viable choice. So dropping logs seems to be the only sane choice here. There's no other alternative really so I'm not sure about the consternation.

hinkley · on Sept 19, 2016

I come to this discussion much like one would sit down at a bar and find their two friends are deep in a coding discussion they didn't hear the beginning of.

How do you end up with a system design that can generate logs so fast that you can't keep up? It seems to be that some fundamental element of capacity planning was missed long ago and we're trying to fix the symptoms and not the cause.

If I have a geographically distributed system, I'm going to have bandwidth and latency issues if I try this, sure. But why do I care? A request to the SF data center shouldn't involve the Munich data center. That is, if I care about response times, and if I don't care about that, then why do I care about instantaneous log availability?

I think sometimes we get so bored with the problems we have that we invent new things to get upset about. Or management does, which is always worse.

buster · on Sept 17, 2016

What? No. You can just save logs on the disk and buffer them. Just dropping logs or blocking because some network resource is not available are both terrible choices and that's not how logging worked for the last decades. Throwing away logs ist a major step backwards.

AYBABTME · on Sept 17, 2016

If you buffer to disk, the same problem will eventually show up. Queues (in memory, on disk, anywhere) are all ultimately bounded, and when they are full, you have 2 choices: block or drop. Somehow you need to make the choice, there's no getting away from it.

If you don't make the choice consciously, say by assuming that you can buffer to disk and avoid the problem, at some point you'll fill up your disks and your system will block: you'll have unknowingly picked the "block" option. If you decide to rotate logs and delete old rotations when too many logs are present, then you're picking the "drop" option...

buster · on Sept 17, 2016

That's why you aggregate logs in a central service. I was writing about sending logs to a central service and not about how your disks fill up with more logs. There is log rotation for that and usually your logs will have been sent way before any log rotates. If your log rotation deletes logs before you aggregated them or if you let your disks fill up with logs you have a much bigger problem you should fix, of course.

AYBABTME · on Sept 18, 2016

That's again the same problem; if your centralized service isn't reachable for whatever reason, your nodes can buffer for a while (in memory or on disk) but eventually the problem always will boil down to 'drop' or 'block'. However you construct it, somewhere you need to make that call. They made the call to drop logs, it's totally fine.

buster · on Sept 18, 2016

If your central infrastructure is down for many days you have other problems. Buffering logs on disks. Rotating and zipping files doesn't take much space. You can buffer a lot before you run into trouble. That's my point. You sound like buffering on the node is only possible for seconds whereas in real world scenarios log files are written for days, weeks or even months. Even in very large deployments and lots of logs. You would make the decision to don't use this advantage and throw logs away for no good reason.

macns · on Sept 17, 2016

[..] i find ZeroMQ great, but reading that their logging solution does drop log messages is a huge no-go for operations.

ZMQ silently drops messages when subscribers fail or not listening or when buffers fill up, but as they describe later on "access to historical logs", it's much easier to set up separate process/es for just that.

It seems that when shit hits the fan for this reason ZMQ really is a more reliable choice because it's more flexible.

buster · on Sept 17, 2016

No. When you can't rely on your collected ZMQ logs and need to "access historical logs" by some other means, why use the ZMQ logs at all? You usually don't know that something was not logged.

Also, as he describes in the article, historical logs are scoped out and it is "likely" they they will develop something for those logs in the future. Again it looks like the plan is to use ZMQ and a subscriber to put those logs into logstash. That doesn't solve the problem i mentioned at all. ZMQ may still drop the logs! So, as far as i understand they don't have a plan for reliable logging. Even if they would, they'd have one reliable solution and an unreliable solution. The unreliable ZMQ based approach is probably neat and leads to fancy realtime dashboard stuff, but since it's not a reliable source of information it's not a good solution for operating a system where "babies will die".

macns · on Sept 18, 2016

AFAIK there's no ZMQ logs, ZMQ is just messaging patterns over various protocols, logging is your responsibility.

agentgt · on Sept 17, 2016

I don't understand why people need such ridiculously fast systems when we are using RabbitMQ and crappy Apache flume and we generate more than 5k with spikes of 50k messages/second. Please author of the article tell me your metrics.

And our log messages are ridiculously big at times (15k to as big as 50k).

Our pipe never has problems. What fails for us is Elastic Search. In fact at one point in the past we did 100k messages/s when embarrassingly had debug turned on in production and RabbitMQ did not fail but Elastic Search and sadly Flume did as well (I tried to get rid of flume with a custom Rust AMQP to Elastic Search client but at the time had some bugs with the libraries.. Maybe I will recheck out Mozilla Heka someday).

There is this sort of beating of the developer chest with a lot of tech companies.. that hey listen we are ultra important and we are dealing with ridiculously traffic and we need ultra high performance. Please tell/show me these numbers.... Or maybe stop logging crap you don't need to log.

Or maybe I'm wrong and we should log absolutely everything and Auth0 made the right choice given their needs (lets assume they have millions of messages a second), I still think I could make a sharded RabbitMQ go pretty far.

This goes with other technology as well. You don't need to pick hot glamorous NoSQL when Postgresql or MySQL and a tiny bit of engineering will get the job done just fine particularly when mature solutions give you such many things free out of the box (RabbitMQ gives you a ton of stuff like a cool admin UI and routing that you would have to build in ZeroMQ).

packetized · on Sept 17, 2016

We run an average of 14k logs/sec through a two-node RMQ cluster, with max sustained throughput in the ~50k range. You're spot on with the bottleneck being Elasticsearch, but the latest releases in the 2.x train have a lot of fine adjustments that have drastically improved our indexing rate, such that we actually index at a 50k/sec rate. Would be interested to hear about your ES cluster configuration.

agentgt · on Sept 17, 2016

I'm embarrassed to say that at the present moment we currently don't use ES clustering but rather a monstrous powerful bare metal machine as we had issues with the cluster failing with some network issues we had with Rackspace.

BTW I didn't mean to denigrate Elastic Search (I assume that is why I'm getting downvoted.... a comment would help). We just haven't had the chance to upgrade it and properly configure it.

In fact Elastic has been pretty darn speedy as of lately particularly since we purge some of the data after 6 months (we still have permanent filesystem storage of logs of course).

gerakinis · on Sept 17, 2016

You can turn off multicast discovery and write in unicast peering addresses. If you are in the cloud and you are clustering this is step 1 =)

trimbo · on Sept 17, 2016

> cluster failing with some network issues we had with Rackspace

Were you using Zen Discovery at the time?

I haven't kept up with ES development in the last year so maybe they fixed this, but a flaky network can cause a cluster using Zen to freak out a lot.

gerakinis · on Sept 17, 2016

Are you using HA functionality and also on disk backing? These two things bring down performance roughly 5-10x and are mostly required for situations that can't afford message loss. I still like the rabbitmq solution, it is my own, but i've found it takes more hardware than you are suggesting.

smetj · on Sept 17, 2016

Yes that's my experience as well.

trimbo · on Sept 17, 2016

> We run an average of 14k logs/sec through a two-node RMQ cluster

How many MB/s are you indexing?

benmccann · on Sept 17, 2016

What are the new options that help improve indexing rate?

smetj · on Sept 17, 2016

100K msg/s going through RabbitMQ ... Would you mind commenting on how your Rabbitmq is setup? Is it a cluster? Distributed queue(s)? Synced queues? What kind of exchange? How many queues your messages end up in? (because 1 queue is bound to 1 core), persistent queue? lazy queue? What is the "Consumer utilisation" value when doing 100K msg/s?

I'd be really interested to hear how you can achieve such a thoughput with rabbitmq

agentgt · on Sept 18, 2016

I'm not sure how much I can help because I didn't setup the RMQ cluster so I don't know the configuration details but I know it is fairly powerful (its also partly why I can't entirely be critical of Auth0 because ZMQ is probably far less expensive infrastructure wise).

I do know we use multiple queues and even exchanges (and I did not know about the one core to queue).

A simple googling shows though folks have achieved far greater throughput[1] than 100k (and by the way this wasn't sustained.. it was spikes).

[1]: https://blog.pivotal.io/pivotal/products/rabbitmq-hits-one-m...

gerakinis · on Sept 17, 2016

Heka is really stupid to configure, really fast, and now deprecated... I've used it for log tailing and metrics forwarding extensively and can't recommend it enough if you need to use amqps out.

If you don't need amqps out there are more modern, better supported projects.

brightball · on Sept 17, 2016

Agree with everything you said, just curious is the GT in your name for Georgia Tech?

agentgt · on Sept 17, 2016

Yep... I was the first agent@cc.gatech.edu circa 99-04 (I wonder who owns it now). Advance apologies if you knew me then... I was an inconsiderate a$$hole at the time. I'm still an a$$hole but less inconsiderate.

brightball · on Sept 17, 2016

No apologies needed, lol. I went to Clemson. Just saw it and was curious.

wcdolphin · on Sept 17, 2016

Did you ever try running 5 ZK's in the ensemble? 3 is the absolute minimum to survive a single machine failure. If you are having trouble with availability, it seems natural to increase your safety factor there.

I was surprised by the contrasting sense of importance of delivery guarantees in the article. At the start, losing a message was akin to the death of a child. At the end, shrug. Now every single machine failure (or even ømq process restart) failure will lose you log messages stored in memory :(.

Glad to hear you found a solution that worked for you though! Would love to hear about difficulties you had with the new system, in particular adding brokers.

_qc3o · on Sept 17, 2016

They said availability was "death of a child", not dropping log messages. The trade-off they've made here in terms of being available with some potential loss of visibility is the right one. The system overall is clearly simpler and simpler systems have simpler failure modes and so it is easier to add mitigation components on top that can recover from those failure modes to guarantee higher uptime.

I've never heard anyone say managing a production Kafka cluster was easy or simple. Well, anyone who has had to actually maintain such clusters hasn't said it anyway.

fauigerzigerk · on Sept 17, 2016

>They said availability was "death of a child", not dropping log messages.

True, but it appears to me that availability problems and dropped log messages often have the same root cause - network issues.

So whenever they do have availability issues (and dying babies) they won't be able to investigate properly because log messages are being lost as well.

That's obviously a very general observation. It may well be that in their architecture availability issues are mostly caused by something unrelated to networking (e.g. the database).

_qc3o · on Sept 17, 2016

It would be quite simple to have a two tiered approach to the logging problem since they have separated it into 2 components. One can just write and ship files while the other is what they have described in terms of providing real time streaming.

So the question then becomes what are the failures modes of their logging setup in terms of misbehaving clients? I don't know how kafka handles misbehaving clients. I suspect it would lead to global effects and slowdown of the entire cluster because of 1 or 2 misbehaving clients whereas in the current set up local misbehavior will be localized to the nearest aggregator dropping messages. Simple memory usage and other kinds of monitoring can then be used to find these issues and then mitigate them accordingly.

This is still a heck of lot simpler setup than using kafka and worrying about all sorts of weird distributed system failure modes. I'm sure kafka got them started initially but continuing to use it is like using a sledgehammer to kill a fly. For the use case they have this setup is the correct one and migrating to kafka if it becomes necessary will be possible. So in my view this is proper engineering. They've made all the right trade-offs instead of just chasing fads and trends.

fauigerzigerk · on Sept 17, 2016

>It would be quite simple to have a two tiered approach to the logging problem since they have separated it into 2 components. One can just write and ship files while the other is what they have described in terms of providing real time streaming.

Yes, they absolutely could do that, but they apparently don't. And maybe that's because they would lose a lot of the simplicity they won by ditching Kafka.

Anyway, I didn't want to defend Kafka specifically. The one time I considered it, I ended up not using it because it seemed too heavy weight for my use case in terms of memory usage and complexity.

kod · on Sept 17, 2016

I've managed a production kafka cluster at my current gig for over two years. It has been easy and trouble free with the exception of one incident, which was ultimately our fault.

TheHydroImpulse · on Sept 17, 2016

FYI, Kafka doesn't need to fetch from disk every time as it caches the logs pretty aggressively, as long as you have enough memory.

Running Zk and Kafka on the same nodes is likely not the best thing.

im_down_w_otp · on Sept 17, 2016

Why? I would think that, as long as there wasn't massive I/O contention between the two, that co-locating Kafka and Zookeeper on the same machines would mitigate a whole massive class of weird edge cases by removing one of the failure modes; the network boundary between the two critical components.

Though for my part I still don't understand why Zookeeper wasn't built as a library to add distributed strongly consistent coordination to software that needs/benefits from it rather than being an external service that needs to be connected to, and thus introduces a gnarly mess of new failure modes that make Zookeeper client behavior extremely critical and often fragile. Something that's more like a "libpaxos/libraft" (e.g. serf for Go-lang or riak_ensemble for Erlang) seems a lot more valuable. /shrug

TheHydroImpulse · on Sept 17, 2016

But co-locating them won't actually remove a class of errors because Zk is not HA. The Kafka brokers need to communicate with the leader in the Zk cluster.

If we have K1,Z1 -- K2,Z2 -- K3,Z3 -- and one node goes down, you've now taken down both a broker and a Zk node. Remember, the brokers don't care about connecting to any Zk node, they want the leader. So you aren't gaining any more fault tolerant by co-locating them.

If there's a network partition between the leader Zk node and other nodes, the local Kafka broker won't actually be able to do much because the Zk cluster will elect a new leader, on another node, so again, you aren't gaining anything.

Moreover, you're now tying the scalability of Kafka with Zk. Zk doesn't scale linearly, so there's only so many nodes you may have in a cluster. Kafka, on the other hand, scales linearly. So if you're colocating them and you have to bump up Kafka, do you still start up Zk for those nodes (but they don't actually join the cluster)? You're now special casing and adding more edge cases.

htn · on Sept 17, 2016

FWIW, you can get Kafka packaged as a fully managed and HA service from https://aiven.io on AWS and also Azure, GCE and DigitalOcean.

But if the Auth0 runs their entire operations on AWS, maybe Kinesis would have been a more natural transition.

mpd · on Sept 17, 2016

Eh, Kinesis has some pretty significant trade-offs to know about if you are comparing it with Kafka (e.g. data retention time and write latency).

janczukt · on Sept 17, 2016

We need an on-premise and cloud story, so cloud only solutions did not cut it for us.

PieterH · on Sept 17, 2016

The article is a little old. How has the system run since you deployed it? Do you have any interesting figures?

janczukt · on Sept 17, 2016

It continues to run beautifully. Since we rolled it out back in 2015 we had zero issues with real time logging. I have particularly fond memories of the first week after rollout, it felt like vacation. I finally could get some sleep.

ZenoArrow · on Sept 17, 2016

I'm in a similar boat. I'm hoping to propose Kafka to help with some data replication and consolidation tasks, but it has to be both on-premise and as low maintenance as possible (low maintenance in the sense of the work local developers would do).

To anyone reading this with Kafka experience, do you have any tips/advice when it comes to maintaining a Kafka service?

hashmp · on Sept 17, 2016

Use 5 zookeepers, on a separate set of servers.

Use configuration management such as chef to allow you to quickly build new nodes and to roll out changes accross the cluster. You will need to make tweaks. The chef Kafka cookbook which is the top result on Google has means of coordinating restarts of brokers accross the cluster. Use consul as a locking mechanism for this. You could use zookeeper, but consul works well for auto DNS registration and auto discovery.

Use the yahoo Kafka-manager app to manage the cluster and to see what is going on.

Don't use the Kafka default of storing data in /tmp/. Your OS will periodically clean it.

Gigablah · on Sept 17, 2016

> Don't use the Kafka default of storing data in /tmp/.

That seems like MySQL level of bad defaults.

cbsmith · on Sept 18, 2016

Five kind of kills performance compared to three, and doesn't map well into AWS, where you generally have 3 or 4 AZ's. I tend to go with three but make sure you've got fully automated responses towards failures.

ZenoArrow · on Sept 17, 2016

Five zookeepers? Seems like a lot. Why five? Is it hard to keep them active?

Thanks for the tips.

darkr · on Sept 17, 2016

Zab [the distributed consensus algorithm that powers ZK] shares some similarities with Paxos, and requires a quorum of nodes to be online.

If you want highly available ZK, your choices are 3, 5, 7... nodes, for which you can have 1, 2, or 3 nodes offline at any one time.

If you have one node fully down on a 3 node cluster, and there is even a tiny network blip or partition (as often happens in cloud environments) then you are down.

abritishguy · on Sept 17, 2016

Kinesis is very poor

StreamBright · on Sept 17, 2016

The author correctly points out that he is comparing apples to oranges.

Kafka gives you features that certain systems cannot live without, like on disk persistence (saved my life couple of times) and topics. Filtering messages on the client side like ZeroMQ does it not an option in many cases, just think about security. I think Kafka has a long way to go before it can be used as a general message queue (many features are not there yet like visibility timeout for example) but if you can manage Zookeeper and have means to work with it (somebody understands it and knows its quirks) it can provide a reliable platform for distributing a large number of messages with low latency and high throughput, just like it does at LinkedIN.

bachback · on Sept 17, 2016

With ZeroMQ I had the worst possible results and experience. Honestly much of what it claims is bogus. It is highly optimized for certain cases and utterly useless for distributed systems. Try and find out in PUB/SUB what the IP addresses of the subscribers are. Not possible. In many cases you will be much better off learning TCP/IP yourself. In the mentioned case you simply iterate over the vector of subscribers - much more powerful and the sane default. It seems at some point people confused internal networking solutions with the Internet.

vegabook · on Sept 17, 2016

it is trivially easy for any node to broadcast its IP address to the whole network periodically (in my case every 2 seconds) using a separate thread and UDP. Using this technique I have rock solid ZeroMQ topology that reconnects with max downtime about 2.5 seconds (because I broadcast every 2 sdconds) for any single node failure. I agree that this functionality could be better implemented in zmq but using this simple technique, the rest of zmq becomes amazing. In Python:

  import socket
  import time
  cs = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  cs.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
  cs.setsockopt(SOL_SOCKET, SO_BROADCAST, 1)
  while True:
      cs.sendto('Node ID', ('255.255.255.255', 54545))
      time.sleep(4)

Everybody listening on the same on port 54545 without knowing Node ID's IP address will get these messages which includes the broadcaster IP address.

  import socket
  s=socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.bind(('',54545))
  m=s.recvfrom(1024)
  print m[0]

This is a very useful technique when using ZeroMQ generally as you can broadcast services without knowning any IP address so they can come up and down on new addresses if / when necessary.

mej10 · on Sept 17, 2016

This would be great, but I don't think it works on AWS -- I don't think they support broadcast.

kchoudhu · on Sept 17, 2016

My ZeroMQ components all register themselves in a database when coming and going. This makes it trivially easy to find where stuff is just by running a bog simple database query.

Lots of ways to skin this cat...

vegabook · on Sept 17, 2016

indeed, with the small caveat that you now need to know the database's IP address, thereby introducing a failure point / master IP point. My solution is completely IP address agnostic. Indeed you could run my version even with DHCP for all nodes.

kchoudhu · on Sept 17, 2016

For sure. In my case, the bootstrap IP is distributed during initial software configuration. Not elegant as your solution, but it works.

As for the SPOF: my logic was that if the DB is dead, there's no point having the ZeroMQ components come up anyhow. The jobs will keep on trying to come up, but until Ops brings the database back up, they can't do any damage.

vegabook · on Sept 17, 2016

that's a very fair point. There are few applications where if the DB goes down, there's any point for the rest of the system to stay up. My use case happens to be distribution of financial tick info without persistence - ie: no cares on temporary data losses. Therefore it's very peer-oriented and doesn't care about lost messages or duplicated messages. All it cares about is throughput of latest data samples (note the world "samples". Not every single data point is needed though it's nice to have 99.999.. and we usually get that).

It's amazing how small use case differences can mean very big differences in the effectiveness of various strategies. The at least once / at most once / no guarantee either way but soft real time, use cases, can mean radically different toolchains. This is the real lesson of distributed streaming.

janczukt · on Sept 17, 2016

This is exactly how we solved our "discoverability" problem at the end of the day. It was pragmatic enough for us to have each node register itself in our production MongoDB in a collection with a TTL index. If the node went down, the DB itself removed its registration.

oldmanjay · on Sept 17, 2016

What if they crash without registering an exit?

kchoudhu · on Sept 17, 2016

Python context managers and heartbeats :)

vegabook · on Sept 17, 2016

They broadcast every 2 seconds. No heartbeat = dead.

oldmanjay · on Sept 19, 2016

So there's more machinery waiting in the wings, not just the bog-simple query. Presumably you've barely described all of the mechanisms, and I won't bother socratically making the point that handwaving complexity away isn't the same thing as simplifying. I'll simply state it.

vegabook · on Sept 20, 2016

The above code is wrapped in a thread and runs non stop in each node. It's really not very complicated. It's basically sticking a def around the top block of code and then thread.start() in the main. It's extremely cheap because it sends then sleeps for a few seconds. Any node than then listen to the broadcasts on a known port (but no IP address needed) and will know exactly what the state of the system is and all the IPs. Then in the main loop you just make sure that you've gotten a recent pulse otherwise you reconnect. If you're using zmq it's just another short poller entry amongst the others that you will already inevitably have. Literally a couple of lines. It allows you to bring nodes up and down at will for robustness and scalability.

I think it's much less complex than running an entire database node just for this, which btw will also require you constantly to poll, and will require you to bring in an (often heavyweight) client library into each node too, as opposed to standard-library sockets which if you're running multinode you're almost certainly already importing. If you're looking for "simple" distributed computing my sense is that that has yet to be invented.

vegabook · on Sept 17, 2016

That's very interesting and I did not know that. I happen to run my own modest (12-node) cluster, and had planned to move to cloud if I grew beyond 20 nodes. The fact that I would not own the transport layer in my case is quite severely problematic. Going to have to investigate if this is the case at Azure and the rest too. Can't I partition my instances into a VPN and UDP broadcast within that? If anybody has any insight, very happy to hear about it.

kchoudhu · on Sept 17, 2016

The ZeroMQ documentation is pretty up front about the need for you to build those pieces yourself. It would appear you chose the wrong tool for your requirements.

bachback · on Sept 17, 2016

ZeroMQ wants to be a neutral wrapper in any language but in the end its a C++ library enforcing C++ concepts. You can't map in a straightforward way process concepts from C++ to other languages (also the OS and VM sits in between that). In the end its just mapping programming logic to state machines. There are much better ways to do this and end up with something much more powerful (with first class meta-programming).

sidlls · on Sept 17, 2016

I've found ZeroMQ to be an immensely useful and powerful library for certain kinds of distributed queueing applications. It's just a library, not an entire application or web framework to drop practically pre-dictated code into.

Also what "c++ concepts" does it enforce and how? And why is that a bad thing?

PieterH · on Sept 17, 2016

ZeroMQ has native stacks in Java, C#, Erlang, C. Also C++.

PieterH · on Sept 17, 2016

If you try to use ZeroMQ to replace TCP with the same semantics, then no, it won't work.

bachback · on Sept 17, 2016

How should ZeroMQ replace TCP? Internet runs on TCP/IP (last time I checked). In private networks many problems of public networks don't appear. I'm aware that organisations use it for their internal systems. As soon as one bridges to the outside world, one is going to hit with a problem.

PieterH · on Sept 17, 2016

ZeroMQ is a messaging layer that runs over numerous network protocols (TCP, UDP, PGM, TIPC,...), as well as inter-process and inter-thread. It provides abstractions (like pub-sub) that work over all these protocols, and it shows the developer the abstraction using a socket-like API.

In practice you build services which are internally composed and scaled using threads and processes, and which talk to each other over plain (unencrypted) protocols, e.g. TCP between boxes, IPC between processes on the same box. And when you need to talk to the outside world you build bridges that can speak any protocol you like, such as HTTP, or ZeroMQ's encrypted TCP protocol (CurveZMQ). From the developer's point of view, it's mostly transparent. Obviously you need security infrastructure, e.g. key/certificate management.

You can build entire ZeroMQ applications using only encrypted TCP and they will run on public infrastructure. You can develop and test the same apps on a laptop. You might run the same code using IPC on a laptop, and CurveZMQ over the Internet.

So though ZeroMQ definitely uses TCP as one of its transport protocols, it's not a replacement for it.

Which means that doing things like asking for the IP address of a peer make no sense. What's the IP address of a thread, or another process on the same box? If you need identity information, you should not use the transport protocol for that.

_halgari · on Sept 17, 2016

ZMQ's default behavior (and in some cases only behavior) of dropping new messages when buffers are full, made it a no-go for my client. We ended up switching away from ZMQ to a more traditional durable queue and ended up saving a ton of code complexity and got a lot of reliability in the process. Having now researched it I can't think of a reason I'd ever use ZMQ again. I'll either use a durable queue when I care about message delivery, or something much more traditional when I don't.

markpapadakis · on Sept 17, 2016

Maybe TANK ( https://github.com/phaistos-networks/TANK ) would have been a good alternative on there. No features parity with Kafka, but setting it up is a matter of running one binary and creating a few topics, and it is faster than Kafka for produce/consume operations. (disclosure: I am involved in its development).

siscia · on Sept 17, 2016

Did you consider MQTT? Sound to me a more natural choice.

jpgvm · on Sept 17, 2016

Probably should have been running ZK and Kafka queues separate to CoreOS/container shenanigans.

If deployed using the Netflix co-processes both are very durable.

Nimimi · on Sept 17, 2016

You can deploy Kafka using DC/IO and it takes care about HA for you. DC/IO is quickly becoming the go-to solution for database deployments. ArangoDB even recommends it as default.

Now about Kafka vs ZeroMQ: you want Kafka if you cannot tolerate the loss of even a single message. The append-only log with committed reader positions is a perfect fit for that.

bogomipz · on Sept 17, 2016

>"DC/IO is quickly becoming the go-to solution for database deployments."

It is? Can you provide any evidence supporting this claim?

Mesos is mostly used to deploy stateless services.

oblio · on Sept 17, 2016

Do you mean this? https://dcos.io/get-started/ aka DC/OS?

From what I can see it doesn't really support database deployments except for ArangoDB and Cassandra.

ryanmaclean · on Sept 17, 2016

Riak and MySQL are in Universe, for example: https://github.com/mesosphere/universe/tree/version-3.x/repo...

Nimimi · on Sept 17, 2016

Ok, database was the wrong term. Maybe "things that big data companies use" or something.

mej10 · on Sept 17, 2016

I wish there was more information on how much work it takes to maintain a DC/OS setup. All the marketing makes it out to be the easiest thing in the world.

k__ · on Sept 17, 2016

I'm a total message queue noob. What are the usecases for them?

I used MQTT but only as a message bus.

zo1 · on Sept 17, 2016

From my point of view, the main things behind message queues (Not zMQ specifically) is guaranteed delivery, persistence, multiple-message atomicity, message passing/forwarding, and sometimes guaranteed message ordering. Other than that, all it does is facilitate communication between different actors.

Nothing magical/weird about it, just depends on whether or not you've got a nail to hammer with your MQ-hammer.

macns · on Sept 17, 2016

You should have a look at http://zguide.zeromq.org/page:all

It's a great read and describes most scenarios well and easy to understand.

weitzj · on Sept 17, 2016

Did you look at nsq.io or NATS?

tjholowaychuk · on Sept 17, 2016

+1 for NSQ, it's not a magic bullet in terms of scalability but you can get quite far. When I was at Segment we were pushing an easy 2-3B messages per day through it, if not more with message "amplification" internally.

tapirl · on Sept 17, 2016

Do you have any experience on NATS? It (with NATS steam) looks great.

manigandham · on Sept 26, 2016

Why dont all these companies ever just use real enterprise software?

There are about a dozen message systems out there that will handle much more than Kafka with minimal or no operational overhead while supporting everything they need.

wanderr · on Sept 17, 2016

I came up with a very different solution for real time access to logs: tail them to slack. It's not an aggregation solution and doesn't work well if you have chatty logs with nothing to filter on, but if you just want to be notified when things are happening in the logs it's pretty nice and doesn't need any infrastructure.

http://wanderr.com/jay/tail-error-logs-to-slack-for-fun-and-...

wanderr · on Sept 17, 2016

why the downvote? the article says "Real-time access to server-side logs is what makes backend development palatable in the era of cloud computing. As a developer you want to be able to get real-time feedback from your server side code deployed to the actual execution environment in the cloud, especially during active development or staging." and this is another solution that provides that.

jvoorhis · on Sept 17, 2016

efangs · on Sept 17, 2016

Anyone use collectd + rrd for this purpose? Still trying to understand at what level it's worth to move to something else.

asasidh · on Sept 17, 2016

So you used Kafka for something that should have been handled by a MQTT or ZeroMQ in the first place ?

cbsmith · on Sept 18, 2016

MQTT is just a protocol, so not sure how that helps.

0MQ doesn't sound like it is the right solution either, but yeah... often you pick the wrong tool and learn something in the process.

bdowling · on Sept 17, 2016

But why ZeroMQ and not nanomsg?

janczukt · on Sept 17, 2016

The answer is rather simplistic and does not even scratch the surface of the drama surrounding zeromq/nanomsg.

I knew a big part of the reliability problems we were having was related to the distributed state that needed to be kept synchronized. I wanted to move to something simpler that did not rely on any durable, distributed state, while supporting the messaging patters we required. ZeroMQ fit the bill.

While there were other implementations with similar properties, there is no reasonable way to compare them up front given that what makes the real difference at the end of the day is the behavior of the system at 2am one day after a prolonged stress run. As a startup one does not have resources to conduct an up front analysis of that sort. You just take a bet. If it does not pan out, you pivot. This is exactly what we have done with the move from Kafka to ZeroMQ in the first place.

Now that we've been using ZeroMQ for over a year and have been perfectly happy, there is no incentive to look elsewhere.

PieterH · on Sept 17, 2016

See http://hintjens.com/blog:112 for my opinion on why nano isn't (wasn't, perhaps, as it seems to be doing better) a good choice.

kal31dic · on Sept 17, 2016

With sincerely the greatest respect and admiration for what you achieved with ZeroMQ, Pieter, I think perhaps one might be a bit more nuanced when one isn't a neutral party. Full disclosure - I wrote a D wrapper, but I am not involved in nanomsg development and just a user.

There was some drama when the maintainer quit briefly before rejoining. Since then the gitter channel has been more active than I remember it being before. The mailing list is quiet it is true. Somebody just released a Rust version, and version 1.0.0 of nanomsg was indeed released.

You can see commit history here: https://github.com/nanomsg/nanomsg/commits/master/src

PieterH · on Sept 17, 2016

Thanks for the updates. I've edited my comment. I've always wanted nano to succeed, just disliked the negative attitude to ZeroMQ expressed in its docs, which seemed unnecessary and damaging.

kaleidic · on Sept 17, 2016

Yes, well the little guy always wants to unseat the dominant player, and when there is some personal history involved, things get more mixed up (technical, emotional, manner of expression). In your shoes I would be irritated by that too. But a project if it develops eventually transcends the personalities involved, and that seems to be happening now.

PieterH · on Sept 17, 2016

ZeroMQ is hardly dominant. It's a small player in a huge market and there was and is space for many more projects in this area. I'm glad nano seems healthy again, yet it's not enough, and I'll explain why.

In the end the whole point of ZeroMQ was to build new protocols and APIs for decentralized messaging. My real disappointment with nano was that it made zero effort to build on existing work (mainly, ZMTP) and instead just started again, as if thousands of people hadn't spent years figuring out what a decentralized messaging protocol might look like.

It was worse than that, in fact. Nano launched itself on a wave of negativity. It makes good press, and poor everything else. Such hate for one's own history and knowledge base isn't healthy. If a messaging product isn't aiming at interoperability as a primary goal, it is worthless.

I'm not a fan of the "IETF or bust" approach either. That just doesn't work if you're a decentralized community. We needed and still need lightweight processes for RFC development. We use such a process (Digistan's COSS) in our RFCs. It wasn't random. I built Digistan and COSS over years after seeing AMQP swallowed up and destroyed by a committee. Why isn't nano using a process like COSS?

Without interoperable protocols, all we have is a bunch of software projects. And they die. And then all this is for nothing and the proprietary systems will rule the world and our dreams of making distributed software cheap again will die as well.

And this makes me angry: nano had the chance to push this forwards, and threw it away like old trash. What a stupid, petty waste of opportunity and goodwill.

kaleidic · on Sept 17, 2016

Note that I replied to a different version of the parent comment. Thanks for updating.

Personally I started using zeromq, but was a bit disturbed by the experience of the saltstack guys with timeout problems and memory leaks. Probably fixed now, I should think. But more than that, the LGPL and MPL 2.0 licences aren't great for a commercial project that might in future need to be distributed to a different entity. So the license alone is a reason one might welcome greater diversity in available choices.

PieterH · on Sept 17, 2016

We use the LGPL license with a clause that makes it harmless. The MPLv2 license is not an issue. All complaints from businesses about licensing come straight to me, as they have for years. MPLv2 raises zero complaints.

ZeroMQ "memory leaks and timeouts" wasn't the issue. Do you have a reference for that?

Salt's issues with ZMTP were performance; they wanted to multicast securely, and ZMTP does not support that. I'd complain once again that making their own custom protocol (actually, it seems to be largely just CurveCP) rather than working with us to extend ZMTP was poor form. Also, Salt wanted a pure Python stack.

raarts · on Sept 17, 2016

From the blog: "Crazy Idea: Clone nanomsg, move to zeromq organization, relicense as MPL, support ZMTP, only new socket types and expose CZMQ API."

Did that ever happen? I still like the idea behind nanomsg.

PieterH · on Sept 17, 2016

It never happened because (IMO, I'm guessing), no-one actually needs nanomsg. I've never really understood the hostility some people had towards e.g. the ZeroMQ protocols, since these RFCs are as plastic and open to contributions as any part of the project. Yet people like to react against things, and this was a large part of nano's reason for existence. In reality, people who need stable working code just take ZeroMQ and that's it.

Which is all kind of a shame since it would have been so nice to see a new C engine in the community. We have new engines in C#, Erlang, Java, yet the old core project is still that somewhat clunky C++ engine.