a session lasts only one hour, so we need to cycle the websocket processes before that – we use 30 minutes, because these logs take up a huge amount of disk space on high-traffic websites
we hit an issue (possibly a bug) with Fluent Bit v3: the GeoIP lookup did not work when ClientIP was the first field of the input JSON.
it was solved by moving the ClientIP field further down in the JSON output (RayID was removed on the same occasion).
this looks like a Fluent Bit v3 bug; it may be worth investigating further and reporting upstream.
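Since the workaround depends on field order, a small guard can flag a field list that starts with ClientIP before the job is created. This is only a sketch: the helper name `clientip_is_first` and the shortened `fields` value are illustrative assumptions, not part of the original setup.

```shell
# Hypothetical helper: return 0 when ClientIP is the first entry
# of a comma-separated field list.
clientip_is_first() {
    # ${1%%,*} drops everything from the first comma onwards
    [ "${1%%,*}" = "ClientIP" ]
}

# shortened, illustrative field list
fields="ClientRequestHost,ClientRequestMethod,ClientIP"
if clientip_is_first "$fields"; then
    echo "warning: ClientIP comes first, geoip lookups may fail" >&2
fi
```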
grab latest websocat binary
cd /usr/local/bin/
wget https://github.com/vi/websocat/releases/download/v1.12.0/websocat.x86_64-unknown-linux-musl
mv websocat.x86_64-unknown-linux-musl websocat
chmod +x websocat
and install the json parser
apt install jq
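Before going further it may help to confirm that the tools used in this walkthrough are actually on the `PATH`. The sketch below assumes nothing beyond a POSIX shell; `check_tools` is a hypothetical helper name.

```shell
# Hypothetical helper: report every missing command and return
# non-zero if at least one of them is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# the three tools used in the rest of this walkthrough
check_tools websocat jq curl || echo "install the missing tools before continuing" >&2
```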
find your Zone ID
Overview --> see bottom right corner
and create an api token
My Profile --> API Tokens --> Create Token
permissions: Account.Account Analytics, Zone.Logs, Zone.Analytics
define a few useful variables
zoneid=
token=
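An empty or mistyped variable only shows up later as an API error, so a quick check right after setting them can save a round trip. The sketch assumes that a zone ID is a 32-character lowercase hexadecimal string; `valid_zoneid` is a hypothetical helper, and the token is only checked for being non-empty.

```shell
# Hypothetical helper: return 0 if the argument looks like a zone ID
# (assumption: exactly 32 lowercase hexadecimal characters).
valid_zoneid() {
    [ "${#1}" -eq 32 ] || return 1
    case "$1" in
        *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}

if ! valid_zoneid "${zoneid:-}"; then
    echo "zoneid is missing or malformed" >&2
fi
if [ -z "${token:-}" ]; then
    echo "token is empty" >&2
fi
```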
define what fields you want
request a URL to pull
echo $zoneid
echo $token
# https://developers.cloudflare.com/logs/instant-logs/
# ClientIP,ClientRequestHost,ClientRequestMethod,ClientRequestURI,EdgeEndTimestamp,EdgeResponseBytes,EdgeResponseStatus,EdgeStartTimestamp,RayID
# w/o EdgeEndTimestamp EdgeStartTimestamp + additional fields after RayID
# fluentbit geoip does not work when ClientIP comes first for some reason
curl -sS -X POST "https://api.cloudflare.com/client/v4/zones/$zoneid/logpush/edge/jobs" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $token" \
    -d '{
    "fields": "ClientRequestHost,ClientRequestMethod,ClientRequestURI,EdgeResponseBytes,EdgeResponseStatus,RayID,ClientCountry,ClientDeviceType,ClientIPClass,ClientRequestUserAgent,ClientIP,ClientASN,EdgeServerIP,OriginIP,SecurityActions,SecurityRuleDescription,SecuritySources,WAFAttackScore,WAFRCEAttackScore,WAFSQLiAttackScore,WAFXSSAttackScore,LeakedCredentialCheckResult,EdgeColoID,EdgeColoCode,OriginResponseStatus,ZoneName,ParentRayID",
    "sample": 1,
    "filter": "",
    "kind": "instant-logs"
}' | jq .
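For scripting, the pretty-printed response is less useful than the websocket URL alone. The sketch below assumes the response carries a top-level `success` flag next to `result.destination_conf`; `extract_ws_url` is a hypothetical helper that exits non-zero when the job was not created or no URL is present.

```shell
# Hypothetical helper: read an API response on stdin and print the
# websocket URL. Exits non-zero if "success" is not true, if the URL
# is missing, or if the input is not valid JSON.
extract_ws_url() {
    jq -er 'select(.success == true) | .result.destination_conf // empty'
}

# usage, reusing the curl call shown above instead of piping to "jq .":
#   url=$(curl ... | extract_ws_url) || echo "job creation failed" >&2
```

Using `// empty` keeps the literal string `null` from ending up in the variable when the field is absent.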
and proceed with the provided url
websocat wss://...
the trick we use is to clean up old files and generate a new session every 30 minutes in an infinite loop – in a nutshell:
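workdir=/data/cloudflare-instant-logs
while true; do
    time=$(date +%Y-%m-%d-%H:%M:%S)
    # $zone must be set beforehand (e.g. to the zone name)
    dest="$workdir/$zone-$time.json"
    # remove capture files older than 31 minutes
    find "$workdir/" -maxdepth 1 -type f -mmin +31 -exec rm -f {} \; && echo done
    # request a fresh session URL (same curl call as above)
    url=$(curl ... | jq -r .result.destination_conf)
    # stop websocat after 30 minutes, well before the one-hour session expires
    timeout --preserve-status --foreground 30m websocat "$url" > "$dest"
    unset url
done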
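One weakness of the loop above is that a failed API call leaves `url` empty or set to the string `null` (what `jq -r` prints for a missing field), so websocat exits at once and the loop spins. A possible guard, sketched with a hypothetical `is_ws_url` helper and an arbitrary 60-second pause:

```shell
# Hypothetical helper: return 0 only for a non-empty wss:// URL.
is_ws_url() {
    case "$1" in
        wss://?*) return 0 ;;
        *)        return 1 ;;
    esac
}

# inside the loop, right after url=... :
#   if ! is_ws_url "$url"; then
#       echo "no session URL, retrying in 60s" >&2
#       sleep 60
#       continue
#   fi
```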
cat > /data/list-sockets <<EOF
#!/bin/bash
pgrep -a run-instant
pgrep -a websocat
EOF
cat > /data/kill-sockets <<EOF
#!/bin/bash
echo
echo previous processes:
pgrep -a run-instant
pgrep -a websocat
echo
echo killing
pkill run-instant
pkill websocat
echo
sleep 1
echo current processes:
pgrep -a run-instant
pgrep -a websocat
echo
EOF
chmod +x /data/list-sockets
chmod +x /data/kill-sockets
## ready to go
enable at boot-time
vi /etc/rc.local
nohup /data/run-wrapper > /var/log/instant-logs.log 2>&1 &
/data/list-sockets
you are now ready to proceed with log parsing
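Before wiring up a log shipper, a quick look at a capture file can confirm that records arrive as expected. The sketch assumes each line of the file is one JSON object containing the fields requested earlier; `server_errors` is a hypothetical helper that lists responses with a status of 500 or above.

```shell
# Hypothetical helper: print status, host, URI and client IP (tab-separated)
# for responses with status >= 500. Reads newline-delimited JSON from the
# given files, or from stdin when no file is given.
server_errors() {
    jq -r 'select(.EdgeResponseStatus >= 500)
           | [.EdgeResponseStatus, .ClientRequestHost, .ClientRequestURI, .ClientIP]
           | @tsv' "$@"
}

# usage:
#   server_errors /data/cloudflare-instant-logs/*.json
```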
fields we could not get
# "creating a new instant logs job is not allowed: Bot Management fields are not allowed"
# BotScore,BotScoreSrc,BotTags
#-H "X-Auth-Key: $authkey" \
#-H "X-Auth-Email: $email" \
#timeout --kill-after=3
https://github.com/vi/websocat
https://docs.fluentbit.io/manual/pipeline/outputs/elasticsearch
https://docs.fluentbit.io/manual/administration/scheduling-and-retries
https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/unit-sizes
https://stackoverflow.com/questions/69833783/fluent-bit-buffer-size-max-issue
https://github.com/fluent/fluent-bit/issues/4120
https://github.com/fluent/fluent-bit/discussions/5173
https://github.com/websockets/wscat