Proof of concept | Collecting nginx log data and pushing it to kafka with the help of gRPC.


Logflow

What does it do?

Logflow collects nginx layer 7 (access log) data and layer 4 packet data and sends it to Kafka, serialized with protocol buffers. Protobuf keeps the payloads compact and fast to encode and decode. The project is still at a very early stage, and we will keep improving it.

Logflow receives the data over a client-streaming gRPC connection and publishes it into Kafka.

It can publish a single request into multiple topics, and the topics can change on every request. At the time of writing, however, all possible topics must be declared (in the .env file) before you start the server.
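As a sketch of what a client-streaming contract with per-request topics could look like (the service, message, and field names below are illustrative, not the project's actual .proto; only NginxLogRequest is a name that appears in this README):

```proto
syntax = "proto3";

package logflow;

// Client-streaming RPC: the client pushes many log entries over a
// single connection; the server answers once when the stream closes.
service LogService {
  rpc Collect (stream NginxLogRequest) returns (PublishSummary);
}

message NginxLogRequest {
  // A single request can target several Kafka topics.
  repeated string topics = 1;
  // Serialized log payload (layer 7 access log or layer 4 packet data).
  bytes payload = 2;
}

message PublishSummary {
  int64 received = 1;
}
```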

Dependencies

  1. Kafka
  2. Go
  3. Protocol Buffers (for development)

A Docker environment is not ready yet, so the dependencies need to be installed manually.

Running the Application

First, clone the repository:

// https 
git clone https://github.com/thearyanahmed/logflow.git

// or ssh
git clone [email protected]:thearyanahmed/logflow.git

// github cli
gh repo clone thearyanahmed/logflow

Then cd into the directory:

cd logflow

If you don't have ZooKeeper and Kafka running, you can follow the instructions in the Start ZooKeeper and Kafka section below.

.ENV

Once you have ZooKeeper and Kafka running, in the logflow directory run

cp .env.example .env

Make sure the .env values are set up correctly, in case you have changed any Kafka config or if some default ports/values are already in use on your machine.
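For reference, a minimal .env might look like the sketch below. The values are the ones referenced elsewhere in this README (5053 as the gRPC default, 6060 in the nginx example), but the key names are assumptions; check .env.example for the authoritative list.

```
# Illustrative key names; values match the defaults mentioned in this README.
RPC_PORT=5053
UDP_SERVER_PORT=6060
KAFKA_BROKER=localhost:9092
KAFKA_TOPICS=hello_world
```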

To start the server, run

go run main.go --action serve

This starts a TCP server listening on RPC_PORT (from .env). The default is 5053.

Then start the UDP server by running

go run server/udp_server.go

Add the following to your nginx config,

log_format tufin escape=json
'{'
     '"time":"$msec",'
     '"connection":"$connection",'
     '"request":"$request",'
     '"status":"$status",'
     '"user_agent":"$http_user_agent"'
'}';


# and inside your server block 
server {
    ...
    access_log syslog:server=localhost:6060,facility=local7,tag=nginx,severity=info tufin;
}

Make sure the server=$host:$port value matches UDP_SERVER_PORT from .env.

After that, visit any page of the web app that uses this nginx config; it should start streaming access logs to Kafka.
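As a rough sketch of what the UDP server has to do with each datagram (an illustration, not Logflow's actual code): nginx prepends a syslog header to every message, so the JSON document produced by the log_format above starts at the first `{`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// accessLog mirrors the fields emitted by the "tufin" log_format above.
type accessLog struct {
	Time       string `json:"time"`
	Connection string `json:"connection"`
	Request    string `json:"request"`
	Status     string `json:"status"`
	UserAgent  string `json:"user_agent"`
}

// parsePayload extracts the JSON body from a syslog datagram such as
// `<190>Oct 10 13:55:36 host nginx: {...}` by skipping everything
// before the first '{'.
func parsePayload(datagram string) (*accessLog, error) {
	i := strings.Index(datagram, "{")
	if i < 0 {
		return nil, fmt.Errorf("no JSON payload in datagram")
	}
	var entry accessLog
	if err := json.Unmarshal([]byte(datagram[i:]), &entry); err != nil {
		return nil, err
	}
	return &entry, nil
}

func main() {
	sample := `<190>Oct 10 13:55:36 localhost nginx: {"time":"1633873970.123","connection":"7","request":"GET / HTTP/1.1","status":"200","user_agent":"curl/7.68.0"}`
	entry, err := parsePayload(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(entry.Request, entry.Status) // GET / HTTP/1.1 200
}
```

In the real server this parsing would happen inside the UDP read loop before the entry is handed to the gRPC client.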

Notes

  • Before running the client, make sure you have ZooKeeper and Kafka running, and the topics have been created.
  • The current Docker image does not work, which I believe is due to a bug in wurstmeister/kafka-docker#issue-516; I get the same error as segmentio/kafka-go#issues/682. Thus, you'll need to run Kafka and ZooKeeper manually for the time being.
  • The program is at a very early stage. Program structure is subject to change.

Architecture

Logflow Architecture

Start ZooKeeper and Kafka

First run ZooKeeper and Kafka brokers. Go to your kafka installation directory and run

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

If this is your first time running them, you probably don't have any topics yet; you'll need at least one. The following command creates a topic named hello_world:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic hello_world
  • Running a Kafka consumer (optional), in case you want to watch the data arrive in Kafka directly:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic hello_world --from-beginning

With Docker

There is no Docker image ready for this yet, but one is planned. If you want to run Kafka inside Docker, simply use your container's address, e.g. localhost:39092, in the .env file.

Flags

At the moment there is only one flag you can pass when running main.go.

  1. --action Possible values: serve, client, help. Example: go run main.go --action serve

Useful Resources

Enterprise Network Flow Collector (IPFIX, sFlow, Netflow) from Verizon Media

The high-scalability sFlow/NetFlow/IPFIX collector used internally at Cloudflare

GopherCon 2016: John Leon - Packet Capture, Analysis, and Injection with Go

Capturing HTTP packets the hard way

LISA16: Linux 4.X Tracing Tools: Using BPF Superpowers

Sniffing Creds with Go, A Journey with libpcap

Collecting NGINX Plus Monitoring Statistics with Go

GoPacket by Google

Packet Capture, Injection, and Analysis with Go Packet

BCC HTTP Filter

Logkit by Qiniu

NGINX log data format

These ^ are gems

Todos

  • Installations & setup
    • Setup kafka
    • Use docker container for kafka
    • Decide on protobuf ( grpc / rpc )
    • Setup a nginx docker
  • Enable .env support
  • Draw system architecture
  • Connect to kafka
  • Setup a kafka client to receive messages
  • Setup a kafka producer
  • Write tests
  • Get some log data and test the whole system
  • Dockerize full app
  • Prepare dummy NginxLogRequest, Packet & Headers
  • Prepare a client to test grpc and kafka producer with dummy data
  • Also see manually if kafka consumer is consuming messages
  • Document everything (ongoing so far)

At the moment, no Docker image is included for any component.

More to come
