Linux network troubleshooting a la Dr. House

Intro

The following story is inspired by a recent case I had to troubleshoot at work. I think it is a nice example of troubleshooting Linux networking issues, so I’ve modified/simplified the setup a bit to be able to reproduce it on a VM. I’ll go through the troubleshooting steps in almost the same way we handled the actual case. Service names, IPs, ports, etc are all different that the real case as the focus should not be the example itself but the process.

It all started a few days ago when I was asked to help on an “unusual” case. Docker containers on every single host of an installation could not establish connections towards services that listen on the “main” IP of the host they run on, nor can they ping that IP, but the containers have full access to the internet and can connect to the service ports on other hosts in the LAN. As everyone who has done even a tiny bit of support, asking whether something changed recently in the setup is always replied back with a single global truth: “nothing has recently changed, it just stopped working”.
Challenge accepted!

Reproduction setup

For reproduction purposes I’ve used a VM with one ethernet interface, and a docker bridge. In this VM I have injected the same problem as with the real case. Even though the real case case was a bit more complicated, to make following the post somewhat easier, I’ve used only one service listening on the host, an Elasticsearch process, and only one Kibana docker container that needs to communicate with Elasticsearch on the host.

Troubleshooting process

Host interfaces:

2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:ce:3b:94:fe:ac brd ff:ff:ff:ff:ff:ff
    inet 172.31.45.100/20 brd 172.31.47.255 scope global dynamic ens5
       valid_lft 3572sec preferred_lft 3572sec
    inet6 fe80::4ce:3bff:fe94:feac/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:e9:38:3d:a8 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:e9ff:fe38:3da8/64 scope link 
       valid_lft forever preferred_lft forever

Kibana’s config has the following ENV variable set ELASTICSEARCH_HOSTS=http://172.31.45.100:9200, and for simplification purposes let’s assume that this IP could not be changed.

As originally described, curl from the container towards the service IP:port does not work

(container) bash-4.2$ curl -v 172.31.45.100:9200
* About to connect() to 172.31.45.100 port 9200 (#0)
*   Trying 172.31.45.100...

it just hangs there without error. There’s no DNS resolution involved here, straight curl towards the IP:port

Let’s check if the service is actually listening on the host

[root@ip-172-31-45-100 ~]# ss -ltnp | grep 9200
LISTEN      0      128                             [::]:9200                                  [::]:*                   users:(("java",pid=17892,fd=257))

The service listens on 9200. Since the service listens on all interfaces, let’s curl from the container towards the service IP:port on the docker0 interface.

bash-4.2$ curl -v 172.17.0.1:9200
* About to connect() to 172.17.0.1 port 9200 (#0)
*   Trying 172.17.0.1...
* Connected to 172.17.0.1 (172.17.0.1) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 172.17.0.1:9200
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 524
< 
{
  "name" : "node1",
  "cluster_name" : "centos7",
  "cluster_uuid" : "d6fBSua6Q9OvSu534roTpA",
  "version" : {
    "number" : "7.7.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
    "build_date" : "2020-05-12T02:01:37.602180Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

that works, so the service is running properly. Curl-ing the service from the host using the host’s IP also works

[root@ip-172-31-45-100 ~]# curl http://172.31.45.100:9200
{
  "name" : "node1",
  "cluster_name" : "centos7",
  "cluster_uuid" : "d6fBSua6Q9OvSu534roTpA",
  "version" : {
    "number" : "7.7.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
    "build_date" : "2020-05-12T02:01:37.602180Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Let’s check for internet connectivity from the container

bash-4.2$ curl -v 1.1.1.1
* About to connect() to 1.1.1.1 port 80 (#0)
*   Trying 1.1.1.1...
* Connected to 1.1.1.1 (1.1.1.1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 1.1.1.1
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Date: Sun, 31 May 2020 10:10:27 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< Location: https://1.1.1.1/
< Served-In-Seconds: 0.000
< CF-Cache-Status: HIT
< Age: 5334
< Expires: Sun, 31 May 2020 14:10:27 GMT
< Cache-Control: public, max-age=14400
< cf-request-id: 030bcf27a3000018e57f0f5200000001
< Server: cloudflare
< CF-RAY: 59bfe7b90b7518e5-FRA
< 
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare-lb</center>
</body>
</html>

internet connectivity for the container works just fine. Let’s curl to another host in the same LAN on the same service port.

bash-4.2$ curl -v 172.31.45.101:9200
* About to connect() to 172.31.45.101 port 9200 (#0)
*   Trying 172.31.45.101...
* Connected to 172.31.45.101 (172.31.45.101) port 9200 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 172.31.45.101:9200
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 524
< 
{
  "name" : "node2",
  "cluster_name" : "centos7",
  "cluster_uuid" : "d6fBSua6Q9OvSu534roTpA",
  "version" : {
    "number" : "7.7.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
    "build_date" : "2020-05-12T02:01:37.602180Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

That also works. Time to use the swiss army knife of network troubleshooting, tcpdump. If you want to find which veth interface a container uses you can either use dockerveth or use the following commands to figure it out manually.
Get the iflink of container’s eth0:

[root@ip-172-31-45-100 ~]# docker exec -it <container-name> bash -c 'cat /sys/class/net/eth0/iflink'

In this case that would be:

# docker exec -it kibana bash -c 'cat /sys/class/net/eth0/iflink'
41

then find the file name of the ifindex that contains that link number in `/sys/class/net/veth*/ifindex` of the host

[root@ip-172-31-45-100 ~]# grep -lw 41 /sys/class/net/veth*/ifindex
/sys/class/net/veth0006ca6/ifindex

`veth0006ca6` is what we need to use. Let’s run tcpdump on it

[root@ip-172-31-45-100 ~]# tcpdump -nni veth0006ca6
10:06:16.745143 IP 172.17.0.2.47166 > 172.31.45.100.9200: Flags [S], seq 1062649548, win 29200, options [mss 1460,sackOK,TS val 1781316 ecr 0,nop,wscale 7], length 0
10:06:16.749126 IP 172.17.0.2.47168 > 172.31.45.100.9200: Flags [S], seq 4174345004, win 29200, options [mss 1460,sackOK,TS val 1781320 ecr 0,nop,wscale 7], length 0
10:06:16.749131 IP 172.17.0.2.47170 > 172.31.45.100.9200: Flags [S], seq 1386880792, win 29200, options [mss 1460,sackOK,TS val 1781320 ecr 0,nop,wscale 7], length 0

the syn packet is seen going out of the container’s veth interface. So let’s tcpdump on docker0

[root@ip-172-31-45-100 ~]# tcpdump -nni docker0
10:07:07.813153 IP 172.17.0.2.47148 > 172.31.45.100.9200: Flags [S], seq 4114480937, win 29200, options [mss 1460,sackOK,TS val 1396384 ecr 0,nop,wscale 7], length 0
10:07:07.845141 IP 172.17.0.2.47144 > 172.31.45.100.9200: Flags [S], seq 3273546229, win 29200, options [mss 1460,sackOK,TS val 1412416 ecr 0,nop,wscale 7], length 0
10:07:07.845147 IP 172.17.0.2.47146 > 172.31.45.100.9200: Flags [S], seq 2062214864, win 29200, options [mss 1460,sackOK,TS val 1412416 ecr 0,nop,wscale 7], length 0

the syn packet can also be seen on the docker0 bridge. The syn packet cannot be seen on the interface (ens5) that has the service IP (172.31.45.100) on it, since it doesn’t traverse that link to go outside the host.

[root@ip-172-31-45-100 ~]# tcpdump -nni ens5 port 9200 or icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes

Let’s check routing entries.

[root@ip-172-31-45-100 ~]# ip route ls
default via 172.31.32.1 dev ens5 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
172.31.32.0/20 dev ens5 proto kernel scope link src 172.31.45.100 

Nothing interesting here at all. Time to check iptables.

[root@ip-172-31-45-100 ~]# iptables -nxvL
Chain INPUT (policy ACCEPT 169 packets, 27524 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy DROP 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
     368    27870 DOCKER-ISOLATION  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
     184    14254 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
     184    14254 ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
     184    13616 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
       0        0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT 109 packets, 10788 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain DOCKER (1 references)
    pkts      bytes target     prot opt in     out     source               destination         

Chain DOCKER-ISOLATION (1 references)
    pkts      bytes target     prot opt in     out     source               destination         
     368    27870 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0      

[root@ip-172-31-45-100 ~]# iptables -nxvL -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
     204    12200 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 6 packets, 456 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
       0        0 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 6 packets, 456 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
      92     6808 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0           

Chain DOCKER (2 references)
    pkts      bytes target     prot opt in     out     source               destination         
     191    11460 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0  

[root@ip-172-31-45-100 ~]# iptables -nxvL -t mangle
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination  

There’s not even a DROP rule at all and all the policies are set to ACCEPT. iptables is definitely not dropping the connection. Even if there was a DROP rule, we would see the packet on tcpdump…so where’s the packet going ?

Let’s add an extra rule for both FORWARD and INPUT chains just to see if iptables can match these rules as the packets are passing by.

[root@ip-172-31-45-100 ~]# iptables -I INPUT -p tcp --dport 9200
[root@ip-172-31-45-100 ~]# iptables -I FORWARD -p tcp --dport 9200

wait for a while and then check the statistics of those 2 rules:

[root@ip-172-31-45-100 ~]# iptables -nxvL | grep 9200
       0        0            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9200
       0        0            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9200

no packets match these 2 rules at all! Time to inspect the container and the docker bridge network.

[root@ip-172-31-45-100 ~]# docker network inspect bridge
[
    {
        "Name": "bridge",
        "Id": "a6290df54ea24d14faa8d003d17802b3f8a4967680bc0c82c1211ab75d1815e2",
        "Created": "2020-05-31T09:40:19.81958733Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {},
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]

pretty standard options for the bridge network, even `enable_icc` is set to `true`. What about the container though ?

[root@ip-172-31-45-100 ~]# docker inspect kibana
[
    {
        "Id": "2f08cc190b760361d9aa2951b4c9c407561fe35b8dbdc003f3f535719456f460",
        "Created": "2020-05-31T10:24:11.942315341Z",
        "Path": "/usr/local/bin/dumb-init",
        "Args": [
            "--",
            "/usr/local/bin/kibana-docker"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3735,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2020-05-31T10:24:12.329214827Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:eadc7b3d59dd47b1b56f280732f38d16a4b31947cbc758516adbe1df5472b407",
        "ResolvConfPath": "/var/lib/docker/containers/2f08cc190b760361d9aa2951b4c9c407561fe35b8dbdc003f3f535719456f460/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/2f08cc190b760361d9aa2951b4c9c407561fe35b8dbdc003f3f535719456f460/hostname",
        "HostsPath": "/var/lib/docker/containers/2f08cc190b760361d9aa2951b4c9c407561fe35b8dbdc003f3f535719456f460/hosts",
        "LogPath": "",
        "Name": "/kibana",
        "RestartCount": 0,
        "Driver": "overlay2",
        "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c434,c792",
        "ProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c434,c792",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "journald",
                "Config": {}
            },
            "NetworkMode": "bridge",
            "PortBindings": {
                "5601/tcp": [
                    {
                        "HostIp": "",
                        "HostPort": "5601"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": true,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "docker-runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "overlay2",
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/84217deb518fa6b50fb38aab03aa6a819150e0a248cf233bda8091b136c4825a-init/diff:/var/lib/docker/overlay2/bfda0aa2ec51f7047b5694e5daf89735f3021691e6154bc370827c168c4572f0/diff:/var/lib/docker/overlay2/5d32d74a3bb95b8e3377b1c115622f12a817a591936c4ae2da4512bc2e281e4b/diff:/var/lib/docker/overlay2/6482e711a89a90bebd61834aa8bd3463f567684dd3cdbbf2698179b752fdad7b/diff:/var/lib/docker/overlay2/4ae81e6a07956c974d985674c35e113ad3fbd9f4fdde43f4752c0e36a1153e69/diff:/var/lib/docker/overlay2/8330cdd839ec316133d659805f2839d1e65b16fbf7035324f419c2aa8d097925/diff:/var/lib/docker/overlay2/cd377c8c6fb23d050771d55ca15253cc9fa5043c7e49f41a2f73acd25f8e7ca9/diff:/var/lib/docker/overlay2/408c72d7e496be76503bbb01d5248c25be98e2290d71cae83d8d5d09d714f81d/diff:/var/lib/docker/overlay2/20068d51c4dd214db7b2b9d30fe13feb2e8ab35de646c9b652fea255476d396b/diff:/var/lib/docker/overlay2/0fdedfd6dbb551d32a9e826188a74936d0cec56e97c1b917fdd04b0e49a59a70/diff:/var/lib/docker/overlay2/0238ff31fbd60fdeaee1e162c92a1aa46735ec7b17df3b11455c09f18657c30f/diff:/var/lib/docker/overlay2/99a7a64a569e5e524e4139f9cf95bd929744c85c1633bcd8173c9172756c3233/diff",
                "MergedDir": "/var/lib/docker/overlay2/84217deb518fa6b50fb38aab03aa6a819150e0a248cf233bda8091b136c4825a/merged",
                "UpperDir": "/var/lib/docker/overlay2/84217deb518fa6b50fb38aab03aa6a819150e0a248cf233bda8091b136c4825a/diff",
                "WorkDir": "/var/lib/docker/overlay2/84217deb518fa6b50fb38aab03aa6a819150e0a248cf233bda8091b136c4825a/work"
            }
        },
        "Mounts": [],
        "Config": {
            "Hostname": "2f08cc190b76",
            "Domainname": "",
            "User": "kibana",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "5601/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "ELASTICSEARCH_HOSTS=http://172.31.45.100:9200",
                "PATH=/usr/share/kibana/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "ELASTIC_CONTAINER=true"
            ],
            "Cmd": [
                "/usr/local/bin/kibana-docker"
            ],
            "Image": "docker.elastic.co/kibana/kibana:7.7.0",
            "Volumes": null,
            "WorkingDir": "/usr/share/kibana",
            "Entrypoint": [
                "/usr/local/bin/dumb-init",
                "--"
            ],
            "OnBuild": null,
            "Labels": {
                "license": "Elastic License",
                "org.label-schema.build-date": "2020-05-12T03:25:49.654Z",
                "org.label-schema.license": "Elastic License",
                "org.label-schema.name": "kibana",
                "org.label-schema.schema-version": "1.0",
                "org.label-schema.url": "https://www.elastic.co/products/kibana",
                "org.label-schema.usage": "https://www.elastic.co/guide/en/kibana/index.html",
                "org.label-schema.vcs-url": "https://github.com/elastic/kibana",
                "org.label-schema.vendor": "Elastic",
                "org.label-schema.version": "7.7.0",
                "org.opencontainers.image.created": "2020-05-04 00:00:00+01:00",
                "org.opencontainers.image.licenses": "GPL-2.0-only",
                "org.opencontainers.image.title": "CentOS Base Image",
                "org.opencontainers.image.vendor": "CentOS"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "258a4a11f55f7425b837c7d5c0420dd344add081be79da7e33c146501dd8f0ec",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "5601/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "5601"
                    }
                ]
            },
            "SandboxKey": "/var/run/docker/netns/258a4a11f55f",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "7126885dddf9ff031f2ff8c3b2cbd14708391dae619020bdb40efe7a849a01c7",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:02",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "a6290df54ea24d14faa8d003d17802b3f8a4967680bc0c82c1211ab75d1815e2",
                    "EndpointID": "7126885dddf9ff031f2ff8c3b2cbd14708391dae619020bdb40efe7a849a01c7",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:02"
                }
            }
        }
    }
]

all looks very normal regarding the docker container. Let’s check sysctl settings in /etc

[root@ip-172-31-45-100 ~]# ls -Fla /etc/sysctl.d/
total 12
drwxr-xr-x.  2 root root   28 May 31 09:34 ./
drwxr-xr-x. 84 root root 8192 May 31 10:15 ../
lrwxrwxrwx.  1 root root   14 May 31 09:34 99-sysctl.conf -> ../sysctl.conf

[root@ip-172-31-45-100 ~]# cat /etc/sysctl.d/99-sysctl.conf 

# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).

nothing interesting here as well. What if someone has messed up ip forwarding via other means though ?

[root@ip-172-31-45-100 ~]# sysctl -a 2>/dev/null| grep forward | grep -v ipv6
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.docker0.forwarding = 1
net.ipv4.conf.docker0.mc_forwarding = 0
net.ipv4.conf.ens5.forwarding = 1
net.ipv4.conf.ens5.mc_forwarding = 0
net.ipv4.conf.lo.forwarding = 1
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_use_pmtu = 0

all looks fine here too. Let’s check some more sysctl settings regarding bridge + iptables

[root@ip-172-31-45-100 ~]# sysctl -a 2>/dev/null| grep bridge
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-filter-pppoe-tagged = 0
net.bridge.bridge-nf-filter-vlan-tagged = 0
net.bridge.bridge-nf-pass-vlan-input-dev = 0

everything still looks fine in these configuration settings, but the packets from the container still can’t reach the host.
Next step is to setup a netcat listening service on the host on a different port and try to connect to it via the container. That still doesn’t work, no packets to be seen on ens5.
Could it be ebtables ? No..no way..but what if…

[root@ip-172-31-45-100 ~]# ebtables -L
Bridge table: filter
Bridge chain: INPUT, entries: 0, policy: ACCEPT
Bridge chain: FORWARD, entries: 0, policy: ACCEPT
Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

still nothing interesting. Could it be a kernel bug ? is this some custom kernel ?

[root@ip-172-31-45-100 ~]# uname -a
Linux ip-172-31-45-100.eu-central-1.compute.internal 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

nope…that’s a vanilla centos7 kernel. Could it be nftables ? On 3.10 kernel and centos7 ?

[root@ip-172-31-45-100 ~]# nft list tables
-bash: nft: command not found

Nobody uses nftables yet, right ? Another wild thought, are there any ip rules defined ?

[root@ip-172-31-45-100 ~]# ip rule ls
0:    from all lookup local 
100:    from 172.31.45.100 lookup 1 
32766:    from all lookup main 
32767:    from all lookup default 

bingo, there’s a rule with priority 100 that matches the host’s IP address! What is this ip rule doing there ? Let’s check routing table 1 that the lookup of rule 100 points to

[root@ip-172-31-45-100 ~]# ip route ls table 1
default via 172.31.32.1 dev ens5 
172.31.32.0/20 dev ens5 scope link 

at last, here’s the answer!

There’s an IP rule entry that says that packets with a source IP of the ens5 interface should lookup routing entries only in routing table 1, which is not the main routing table. That routing table knows nothing about the docker network (172.17.0.0/16). Let’s delete the rule from the host

[root@ip-172-31-45-100 ~]# ip rule del from 172.31.45.100/32 tab 1 priority 100

and check if the container can contact the service now

(container) bash-4.2$ curl 172.31.45.100:9200
{
  "name" : "node1",
  "cluster_name" : "centos7",
  "cluster_uuid" : "d6fBSua6Q9OvSu534roTpA",
  "version" : {
    "number" : "7.7.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
    "build_date" : "2020-05-12T02:01:37.602180Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Success!

Where’s the SYN+ACK ?

Does the SYN packet reach the listening service ? No…and the reason is rp_filter. Centos7 sets net.ipv4.conf.default.rp_filter=1, so when docker0 interface gets created it is set to net.ipv4.conf.docker0.rp_filter=1.

Here’s what rp_filter values mean according to kernel documentation:

  • 0 – No source validation.
  • 1 – Strict mode as defined in RFC3704 Strict Reverse Path Each incoming packet is tested against the FIB and if the interface is not the best reverse path the packet check will fail. By default failed packets are discarded.
  • 2 – Loose mode as defined in RFC3704 Loose Reverse Path Each incoming packet’s source address is also tested against the FIB and if the source address is not reachable via any interface the packet check will fail.

After reverting the deleted ip rule via ip rule add from 172.31.45.100/32 tab 1 priority 100 and setting sysctl -w net.ipv4.conf.docker0.rp_filter=0 we can see the SYN+ACK packet going out of ens5 interface towards the default gateway.

[root@ip-172-31-45-100 ~]# tcpdump -enni ens5 port 9200
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
15:44:19.028353 06:ce:3b:94:fe:ac > 06:1b:e5:19:30:12, ethertype IPv4 (0x0800), length 74: 172.31.45.100.9200 > 172.17.0.2.47460: Flags [S.], seq 1431574088, ack 1879007024, win 26847, options [mss 8961,sackOK,TS val 22063599 ecr 22063599,nop,wscale 7], length 0
15:44:20.029163 06:ce:3b:94:fe:ac > 06:1b:e5:19:30:12, ethertype IPv4 (0x0800), length 74: 172.31.45.100.9200 > 172.17.0.2.47460: Flags [S.], seq 1431574088, ack 1879007024, win 26847, options [mss 8961,sackOK,TS val 22064600 ecr 22063599,nop,wscale 7], length 0
15:44:21.229126 06:ce:3b:94:fe:ac > 06:1b:e5:19:30:12, ethertype IPv4 (0x0800), length 74: 172.31.45.100.9200 > 172.17.0.2.47460: Flags [S.], seq 1431574088, ack 1879007024, win 26847, options [mss 8961,sackOK,TS val 22065800 ecr 22063599,nop,wscale 7], length 0

Finding such discarded packets, called martians, in the logs can be done by enabling log_martians via sysctl -w net.ipv4.conf.all.log_martians=1. Example syslog message:

May 31 16:01:08 ip-172-31-45-100 kernel: IPv4: martian source 172.31.45.100 from 172.17.0.2, on dev docker018
May 31 16:01:08 ip-172-31-45-100 kernel: ll header: 00000000: 02 42 e9 38 3d a8 02 42 ac 11 00 02 08 00 .B.8=..B......

But why ?

Why was the rule there in the original case ? Multihoming was tried, it didn’t work as expected and not all the configs were removed. Grep-ing /etc for the host’s IP found the following file:

/etc/sysconfig/network-scripts/rule-ens5:from 172.31.45.100/32 tab 1 priority 100

In multihoming it’s common that packets reaching a host on interface X should also be replied back from interface X. Part of a method to achieve this is to assign each interface its own routing table.

So when asked to troubleshoot networking issues act like Dr. House would, assume the worst.

P.S. thanks to Markos for the comments on improving the blogpost

Graphing Covid-19 related data for Greece using Elastic’s ElasticSearch Service

TL;DR

https://covid19.void.gr

Intro

A few weeks ago I started converting the Covid-19 daily reports from Greek National Public Health Organization (EODY) to json documents, and publishing them to Github in a repository called covid19-gr-json. Reasoning behind this effort was that EODY does not publish their data in a reusable format, one that “machines” can easily read and parse. There are many websites that have created their own Covid-19 graphs but most of them have very basic stats (cases/day, deaths/day) and they state that their data sources are coming from media sources, without letting you access and re-use their original data. I haven’t been able to locate any other open data set regarding Covid-19 data for Greece that is not based on media sources but from official data coming out of EODY.

Report frequency and data consistency

After a few days of trying to convert these daily reports to json, EODY stopped publishing them. 3 days later EODY published another report with a different set of metrics than the previous reports. This new report did not have a per region analysis of cases and EODY also stopped publishing data for the number of hospitalized people or how many people have recovered. 4 more days passed without a report. Then EODY started publishing daily reports again, we’re now at 9 days in a row, but the metrics are again different than the previous ones published. There’s now a drill down by age and gender regarding cases, deaths and people in intensive care (IC). Still no mention of how many people have recovered though or how many are hospitalized but are not in IC. The result is that json files in the repository cannot be consistent since the original sources are not consistent. Such is life.

Visualizing data

A few days had passed since I started converting pdf reports to json and I wanted to see how easy it is to visualize some of it. Since data were in json format, the easiest thing for me to do was to store them to an Elasticsearch cluster and create visualizations using Kibana. Working at Elastic has its perks, I created a new Elastic stack deployment in ElasticSearch Service (ESS), and posted the json documents to it. I had to make a tough decision though, which graph tool from Kibana to use to visualize data. I started using line graphs but soon switched to Time Series Visual Builder (TSVB) as it’s easier to use and allows to visualize multiple metrics/time series in the same graph. I believe it’s producing way more beautiful visualizations anyway. After creating some visualizations I also created a dashboard, took some screenshots and shared them with friends.

Allowing access to dashboards

As people took notice of the json files in the repo they started pinging me to create some visualizations from them. I created a Kibana readonly user and shared the password with a few of them. That worked well for a couple of days, but sharing a password for a readonly view is a bit meeeh. I wondered if I could create a password-less version of it and post the URL publicly. Publishing a URL in the form of https://abc123fe48752c2da3b19.eu-central-1.aws.cloud.es.io:9243 is not exactly memorable either. I googled if it’s possible to post credentials to Kibana, and indeed it is. So nginx to the rescue!

Reverse proxy to ESS

I first tried to use a URL within void.gr, eg void.gr/covid19 to point directly to Kibana hosted on ESS. Because of the various location rules already used in my void.gr nginx config this quickly became more complicated than it had to, so I created a new subdomain for it, covid19.void.gr. Here’s the reverse proxy config:

    location / {
      proxy_pass https://abc123fe48752997c2da3b19.eu-central-1.aws.cloud.es.io:9243;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection 'upgrade';
      proxy_set_header Host $proxy_host;
      proxy_cache_bypass $http_upgrade;
      proxy_set_header  X-Real-IP         $remote_addr; # pass on real client's IP
      proxy_set_header  X-Forwarded-For   $proxy_add_x_forwarded_for;
      proxy_set_header  X-Forwarded-Proto $scheme;
      auth_basic "Basic Auth";
      proxy_set_header Authorization "Basic bXl1c2VyOm15cGFzcw=="; # base64-encoded username:password to pass in header
      proxy_set_header x-forwarded-user $remote_user;
      proxy_hide_header kbn-license-sig;
      proxy_hide_header kbn-xpack-sig;
      proxy_hide_header kbn-name;
      proxy_hide_header x-found-handling-server;
      proxy_hide_header x-found-handling-instance;
      proxy_hide_header x-found-handling-cluster;
      proxy_hide_header x-cloud-request-id;
    }

Authorization header

Authentication is based on the base64 encoded output of username:password combination which is to be posted to Kibana. Create it via:

$ echo -n myuser:mypass | base64
bXl1c2VyOm15cGFzcw==

Adding these 2 config options is enough for nginx reverse proxy to authenticate to Kibana:

auth_basic "Basic Auth";
proxy_set_header Authorization "Basic bXl1c2VyOm15cGFzcw==";

Header removal

While not strictly necessary, I think it’s a good practice to remove non-essential headers being sent back to the browser via the reverse proxy.

End Result

One Let’s Encrypt certificate later and tada! https://covid19.void.gr is live!

Contribution

Do you want to contribute either by converting EODY’s reports to json or by improving the above visualizations dashboards ? leave a comment, send an email or ping me via twitter.