Don’t leave Apache Flink and Schema Registry alone

Schema Registry?

  • Data evolution made simple: consumers and producers can use different schema versions and still be compatible, depending on the compatibility strategy in place, backward or forward compatibility (see the sketch after this list).
  • Centralized repository/catalog: a single component that manages the data models gives your architecture a data catalog that knows every data type in the system.
  • Schema enforcement: ensures the data published matches the defined schema and that its constraints are respected.
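As a sketch of how the strategy is chosen, Confluent Schema Registry lets you set the compatibility level per subject through its REST API; the endpoint below is the default local one and the subject name events-value is illustrative:

# Enforce backward compatibility for future versions of one (hypothetical) subject
curl -X PUT http://localhost:8081/config/events-value \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "BACKWARD"}'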

Apache Flink and Schema Registry

This integration uses Apache Kafka, Confluent Schema Registry, and Apache Flink.

Using Avro to define our model

Record definition in Avro
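The embedded snippet is not reproduced here, so as a stand-in, a minimal Avro record definition could look like the following; the Event record, its namespace, and its fields are illustrative, not the article's original model:

{
  "namespace": "com.example.model",
  "type": "record",
  "name": "Event",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "timestamp", "type": "long" }
  ]
}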
docker-compose.yml
  • Kafka Tool: GUI application for managing and using Apache Kafka clusters
  • Kafkacat: Generic command line non-JVM Apache Kafka producer and consumer
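A minimal sketch of such a docker-compose.yml, assuming the standard Confluent images (image tags, ports, and listener names are illustrative):

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # One listener for other containers (kafka:29092), one for the host (localhost:9092)
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  schema-registry:
    image: confluentinc/cp-schema-registry:7.5.0
    depends_on:
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:29092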

Register the new schema

When using the Avro format, it is first necessary to register the schema before starting to use it. You can register it manually through the REST API or with the Maven plugin; the plugin simplifies the process and reduces the complexity.
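For reference, manual registration is a single call against the registry's REST API; the subject events-value and the escaped schema reuse the illustrative Event definition from above:

curl -X POST http://localhost:8081/subjects/events-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"Event\",\"namespace\":\"com.example.model\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"long\"}]}"}'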

Schema registry plugin
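The plugin configuration itself is not shown above; a sketch using Confluent's kafka-schema-registry-maven-plugin, with the subject and schema file from the illustrative example, could be:

<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.5.0</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param>
    </schemaRegistryUrls>
    <subjects>
      <!-- subject name mapped to the schema file it should register -->
      <events-value>src/main/avro/Event.avsc</events-value>
    </subjects>
  </configuration>
</plugin>

Running mvn schema-registry:register then registers every listed subject in one step.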
Topic _schemas used to store the created schemas

Inserting messages

To send events, the kafka-avro-console-producer utility will be used to insert both valid and invalid events. The schema of each event sent is validated by the Schema Registry that is up and running at the endpoint specified in the command arguments.
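A sketch of such an invocation, reusing the illustrative Event schema and the local endpoints:

kafka-avro-console-producer \
  --broker-list localhost:9092 \
  --topic events \
  --property schema.registry.url=http://localhost:8081 \
  --property value.schema='{"type":"record","name":"Event","namespace":"com.example.model","fields":[{"name":"id","type":"string"},{"name":"name","type":"string"},{"name":"timestamp","type":"long"}]}'

Each line typed afterwards is parsed against that schema: a complete record such as {"id": "1", "name": "created", "timestamp": 1600000000000} is accepted, while an incomplete one such as {"id": "2"} is rejected with an AvroTypeException.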

Build Flink Job

Apache Flink provides a built-in connector to read and write messages to Apache Kafka; in the documentation you can easily find how to integrate the connector into the project.

Kafka consumer definition
Kafka producer definition
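Those two definitions were embedded above; a compact sketch of the wiring, assuming an Avro-generated Event class, the flink-avro-confluent-registry dependency, and the local endpoints used so far (topic and subject names are illustrative):

import java.util.Properties;

import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroSerializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class EventJob {
    public static void main(String[] args) throws Exception {
        final String registryUrl = "http://localhost:8081";

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-event-job");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Consumer: looks up each message's writer schema in the registry
        FlinkKafkaConsumer<Event> consumer = new FlinkKafkaConsumer<>(
                "events",
                ConfluentRegistryAvroDeserializationSchema.forSpecific(Event.class, registryUrl),
                props);

        // Producer: serializes with Avro and validates the schema under the subject "output-value"
        FlinkKafkaProducer<Event> producer = new FlinkKafkaProducer<>(
                "output",
                ConfluentRegistryAvroSerializationSchema.forSpecific(Event.class, "output-value", registryUrl),
                props);

        DataStream<Event> events = env.addSource(consumer);
        events.addSink(producer);

        env.execute("event-job");
    }
}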

Is it possible to define multiple event types in the same topic?

There are two very interesting blog posts, 1) and 2), that explain the purpose of having a topic containing multiple event types instead of a topic with only one event type.

Bonus: Flink Job reading multiple event types in a single topic

Slightly changing the previous approach to cope with this requirement, the following pieces are needed (a sketch of the key piece follows the list):

Multiple event types deserializer
Flink consumer for multiple event types
Model record
Kafka producer
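Those snippets are not reproduced here, so below is a minimal sketch of the central piece: a deserializer that copes with several event types in one topic. It wraps Confluent's KafkaAvroDeserializer, which resolves each message's writer schema from the registry, and flattens whatever arrives into a simple model record; class, field, and topic names are illustrative. (Publishing several types to one topic also assumes the producer uses a subject naming strategy such as TopicRecordNameStrategy instead of the default topic-name strategy.)

import java.util.Collections;

import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class MultipleEventTypesDeserializer implements KafkaDeserializationSchema<EventModel> {

    private final String registryUrl;
    private transient KafkaAvroDeserializer inner;

    public MultipleEventTypesDeserializer(String registryUrl) {
        this.registryUrl = registryUrl;
    }

    @Override
    public EventModel deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        if (inner == null) {
            // Created lazily on the task manager: KafkaAvroDeserializer is not serializable
            inner = new KafkaAvroDeserializer();
            inner.configure(Collections.singletonMap("schema.registry.url", registryUrl), false);
        }
        GenericRecord avro = (GenericRecord) inner.deserialize(record.topic(), record.value());
        // The writer schema travels with each message, so its name identifies the event type
        EventModel model = new EventModel();
        model.eventType = avro.getSchema().getName();
        model.payload = avro.toString(); // Avro renders a GenericRecord as JSON
        return model;
    }

    @Override
    public boolean isEndOfStream(EventModel next) {
        return false;
    }

    @Override
    public TypeInformation<EventModel> getProducedType() {
        return TypeInformation.of(EventModel.class);
    }
}

// A Flink-friendly POJO carrying any of the event types in a common shape
class EventModel {
    public String eventType;
    public String payload;
}

The Flink consumer then becomes new FlinkKafkaConsumer<>("events", new MultipleEventTypesDeserializer(registryUrl), props), and downstream operators can branch on eventType.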

Summary

The benefits of having a schema registry in our architecture are huge: it allows event types to evolve and lets us know exactly what's going on in our system.
