tl;dr: fight entropy in your Kafka Connect connector configurations by resolving cluster nodes with my connect-dns-provider plugin
Thermodynamics presents us with the ruthless teaching that global entropy or disorder is constantly growing and that locally you can only fence it off by expending energy.
For living beings, this means the need to ingest enough food over time to sustain themselves. For the world of software development, this means the will to refactor and to introduce new organizing principles over time. The alternative to paying this price is a deteriorating codebase and supporting artefacts that exposes you to risks and reduces the development pace over time. Eventually, you reach the point in which you add one bug per bug removed and you can declare the thermal dead of the system.
How do you keep that entropy at bay? Ruthlessly remove what is no longer needed and introduce structure to support what is actually needed. Bonus points if you avoid falling for the shiny object syndrome and choose the appropriate proven technology even if it’s not the latest hype.
A personal recent example of this for me is related to working with a Kafka ecosystem in which we use Connect connectors to move events in and out of a main Kafka cluster. Connectors are created, modified and removed by means of some scripts your can launch as Jenkins pipelines.
A pain point of this setup is that the configuration for some connectors
require of a list of servers and ports of other clusters. For example, if you
are using the Cassandra source connector, you are expected to configure
connect.cassandra.contact.points=server1:port1,server2:port2. When you
rotate or redimension those nodes, you need to update the scripts and run
the pipelines anew.
At this point, there are many different options to attack this problem but many of them add unnecessary complexity. You might be tempted to integrate with a shiny new configuration parameter store or try to parse remote Terraform states and whatnot.
However, there is an old, well-proven technology for this purpose: DNS. In fact, there is a specific kind of DNS record for this purpose. It’s called SRV and it was introduced more than 20 years ago. Applied to the previous example, you will have one SRV record per Cassandra node, for example:
_cassandra._tcp.example.internal. 300 IN SRV 0 0 9042 c-123.example.internal. _cassandra._tcp.example.internal. 300 IN SRV 0 0 9042 c-333.example.internal. _cassandra._tcp.example.internal. 300 IN SRV 0 0 9042 c-234.example.internal.
These records are read in this way:
- Name. DNS name to use when resolving.
- Timeout in seconds. In this case, 5 minutes in seconds.
- Class. It’s always IN for SRV records.
- Type. SRV!
- Priority. For clusters like Cassandra it makes sense to have the same priority for all nodes. The lower, the most priority.
- Weight. This is useful to assign different fractions of the load to different servers but, as with priority, it makes sense to use the same value for all nodes of a Cassandra cluster.
- Host name.
You can keep these records updated in many ways. For example, if you use Terraform, Cloud Formation or any other software defined infrastructure, you can define the SRV records as linked to the node names. That way, any infrastructure change will automatically change the records.
Another common use case nowadays are services managed within Kubernetes. Because Kubernetes embraces DNS as an integral part to service discovery, you get SRV records for free for services.
The only missing piece in the puzzle is that our connectors resolve the SRV names. Fortunately, we don’t need to modify existing connectors but we can write a ConfigProvider implementation to add support for any connector.
As you can see, a simple single class is able to do the work. Because this might be useful for more people and I never before published libraries to Maven Central I took the pain to properly publishing it (GPG setup, opening JIRA tickets1). You can find usage instructions at the repository.
Remember, let DNS do what DNS does. Even if it’s a “boring” technology.
In one of the many official videos explaining the “simple process”, there was an amazing comment of someone comparing the process US immigration redtape ↩