Apache Kafka

This article covers connecting to a Streaming Data Source - specifically Apache Kafka.

Introduction

Leveraging a streaming data source provides the ability to:

  • Collect data from any application
  • Ingest the streaming data and perform additional processing such as persisting the contents to a database or file format (e.g. Parquet, JSON).

Zetaris supports Apache Kafka streaming which consists of the following logical components:

  • Topic - messages belong to a particular category called a topic
  • Producer - publish messages to one or more Kafka topics. 
  • Consumer - subscribe to one or more topics and consume published messages

Architecture

Screen Shot 2022-11-23 at 10.30.47 pm

In this architecture, Zetaris plays the role of the Consumer, with topic definitions and publishing of messages performed outside of Zetaris.

The integration point between Zetaris and Kafka is the topic.

On the Producer side, we will leverage the Kafka producer console to publish messages.

For our scenario, we will ingest streaming data from and perform additional processing such as persisting the contents to a database or file format (e.g. Parquet, JSON).

Use Case

For our scenario, we have a Stock Ticker stream that publishes updates on technology stock equities.  As the consumer, Zetaris will ingest streaming data and persist the contents to a Parquet file format to the file system.

Kafka Steps

  1. Ensure Kafka is running
  2. Create a topic, in our example the topic is named stock.
  3. Ensure your firewall permits connections to your kafka bootstrap port (e.g. 9092).

Connect to a Streaming Data Source

  1. Click on NDB Fabric Builder widget.

    2022-11-21_10-52-10
  2. Click Screen Shot 2022-11-21 at 12.08.55 pmnext to Streaming Data Sources.

    2022-11-21_12-10-07
  3. Specify the following values:
    Data Source Name Stock_Ticker
    Input Format JSON
    Input Source Kafka
    kafka.bootstrap.servers your kafka boostrap server and port
    kafka.topic stock
    kafka.security.protocol PLAIN
    kafka.sasl.mechanism PLAIN

     
    For Input Sample enter the following JSON payload:

    { 
    "symbol" : "AAPL",
    "name": "Apple",
    "price": 150.12,
    "previous_close" : 148.01,
    "open" : 148.13,
    "bid" : 150.26,
    "ask" : 150.27,
    "volume" : 50304716,
    "avg_volume": 89767359
    }
  4. Your completed dialogue should look like the following:

    2022-11-24_11-46-27
    Click Create.
  5. Your streaming data source should now appear.

    Screen Shot 2022-11-23 at 4.06.46 pm
  6. Click Screen Shot 2022-11-21 at 12.08.55 pmnext to your streaming data source to add an aggregation table.

    2022-11-23_16-10-23
  7. Specify the following values:

    Aggregate Table Name stock
    Output Mode Append
    Output Format Parquet
    Storage Local File System
    File Path File path accessible to Zetaris

    For the Query enter the following:

    select timestamp, json_body.* from Stock_Ticker
  8. Your completed dialogue should look like the following:

    2022-11-24_11-52-27
    Click Create.
  9. You should now see your newly created aggregate table.

    Screen Shot 2022-11-23 at 8.52.06 pm
  10. Click the aggregated table to expand.  

    2022-11-23_20-53-30
  11. You should see a data structure that corresponds to the query we entered.

    Screen Shot 2022-11-23 at 8.55.28 pm
  12. Click preview next to stock to view the contents of the table. 

    2022-11-23_20-59-28

    Note - there will not be any contents as messages are yet to be published to the topic.

    Screen Shot 2022-11-23 at 9.00.43 pm

Publish messages

  1. Switch to your KAFKA_HOME/bin directory
  2. Start the topic producer
    ./kafka-console-producer.sh --broker-list localhost:9092 --topic stock
  3. Enter the following JSON payload and hit Enter to send the message.
    {"symbol" :"AAPL", "name":"Apple","price":150.12,"previous_close" :148.01,"open":148.13,"bid" :150.26,"ask":150.27,"volume":50304716,"avg_volume":89767359}
  4. Repeat the above exercise the the following JSON payload and hit Enter to send the message:
    {"symbol":"SPOT","name":"Spotify","price":75.25,"previous_close":74.12,"open":72.97,"bid" :74.98,"ask":75.25,"volume":1564035,"avg_volume":1847590}

View Messages

  1. Switch back to your Zetaris GUI to view messages consumed.
  2. Click preview next to stock to view the contents of the table. 

    2022-11-23_20-59-28
    You should see the following 2 messages in the result, corresponding to the messages we published from Kafka in the previous step

    Screen Shot 2022-11-23 at 10.33.40 pm
With messages now appearing in Zetaris, we have successfully integrated Zetaris with Apache Kafka.