Resolving OSSEC active response iptables issues

Over the past few days some of my servers have been having a hard time because of a surge in spam from one or more botnets. Mail addressed to unknown recipients on local domains went from around 600-700 messages per day to a peak of 8,000 two days ago. To cut down on further botnet attempts I have OSSEC's active response kick in, which in turn tries to firewall the offending hosts.

That worked quite well for a while, but then I started seeing errors like the ones below in active-response.log:

Unable to run (iptables returning != 3): 1 – /var/ossec/active-response/bin/firewall-drop.sh delete – 91.121.21.8 1310919172.51029 31106
Unable to run (iptables returning != 1): 1 – /var/ossec/active-response/bin/firewall-drop.sh delete – 79.149.198.149 1310919524.52191 3302
Unable to run (iptables returning != 1): 2 – /var/ossec/active-response/bin/firewall-drop.sh delete – 79.149.198.149 1310919524.52191 3302
Unable to run (iptables returning != 1): 3 – /var/ossec/active-response/bin/firewall-drop.sh delete – 79.149.198.149 1310919524.52191 3302
Unable to run (iptables returning != 1): 4 – /var/ossec/active-response/bin/firewall-drop.sh delete – 79.149.198.149 1310919524.52191 3302
Unable to run (iptables returning != 1): 5 – /var/ossec/active-response/bin/firewall-drop.sh delete – 79.149.198.149 1310919524.52191 3302
Unable to run (iptables returning != 4): 1 – /var/ossec/active-response/bin/firewall-drop.sh add – 115.242.188.157 1310969220.1045522 3302

Evidently iptables is busy doing something else at that moment, adding or deleting some other rule, so the loop inside firewall-drop.sh sometimes fails. That was a bit worrying; I had to fix OSSEC so that, one way or another, the iptables rules would eventually be applied. I have faced the same issue with iptables in the past: trying to add multiple (>5) rules at exactly the same time is very error-prone, and there is no way to tell which of them will actually be applied. To work around the issue, I added locking to the active response script.

Whenever I need locking in shell scripts I use a set of four functions kept in a file that I source as needed. I usually place this file in /usr/local/bin/ as lock.sh.

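lockme () {
    # Usage: lockme NAME [PID]
    # Returns 0 if the lock was acquired, 1 otherwise.
    if [ -z "$1" ];then
        echo " o Use an argument to lock"
        return 1
    fi
    if [ -z "$2" ];then
        PID=$$
    else
        PID=$2
    fi
    LOCK_PID_FILE=/var/lock/$1
    if [ -f "$LOCK_PID_FILE" ];then
        sleep 1
        echo " o Lock file found"
        LOCK_OWNER=$(cat "$LOCK_PID_FILE" 2>/dev/null)
        # An empty owner must count as stale: "/proc/" alone always exists
        if [ -z "$LOCK_OWNER" ] || [ ! -d "/proc/$LOCK_OWNER" ];then
            echo " o Stale lock file ignoring..."
            rm -f "$LOCK_PID_FILE"
        else
            return 1
        fi
    fi
    # Write our PID to a temp file, then symlink the lock name to it;
    # ln -s fails if the lock name already exists, so only one caller wins
    echo -n "$PID" > "$LOCK_PID_FILE.$PID"
    ln -s "$LOCK_PID_FILE.$PID" "$LOCK_PID_FILE" && return 0
    rm -f "$LOCK_PID_FILE.$PID"
    return 1
}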

lockme_wait () {
    # Usage: lockme_wait NAME [PID]
    # Blocks, retrying every 4 seconds, until lockme succeeds.
    if [ -z "$1" ];then
        echo " o Use an argument to lock"
        return 1
    fi
    if [ -z "$2" ];then
        PID=$$
    else
        PID=$2
    fi
    while true;do
        lockme "$1" "$PID" && break
        sleep 4
    done
    return 0
}

unlockme () {
    # Usage: unlockme NAME
    if [ -z "$1" ];then
        echo " o Use an argument to unlock"
        return 1
    fi
    # Remove the PID file the lock points to, then the lock itself
    rm -f "/var/lock/$1.$(cat "/var/lock/$1" 2>/dev/null)"
    rm -f "/var/lock/$1"
    return 0
}

kill_locked () {
    # Usage: kill_locked NAME
    # Kills the process holding the lock, then removes the lock files.
    if [ -z "$1" ];then
        echo " o Use an argument to kill_locked"
        return 1
    fi
    if [ -e "/var/lock/$1" ]; then
        kill "$(cat "/var/lock/$1" 2>/dev/null)"
    fi
    rm -f "/var/lock/$1.$(cat "/var/lock/$1" 2>/dev/null)"
    rm -f "/var/lock/$1"
}
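
The part of lockme that does the actual locking is the ln -s call: creating a symlink fails when the name already exists, so two callers cannot both succeed. Here is a minimal sketch of just that step, using a scratch directory from mktemp instead of /var/lock. The try_lock function and the PIDs 111 and 222 are made up for the demonstration and are not part of lock.sh.

```shell
#!/bin/sh
# Sketch: "ln -s" as a lock primitive (it refuses to overwrite an existing name).
LOCK_DIR=$(mktemp -d)        # scratch directory for the demo, not /var/lock
LOCK="$LOCK_DIR/demo"

try_lock () {
    # $1 = PID to record (made-up values below)
    echo "$1" > "$LOCK.$1"
    if ln -s "$LOCK.$1" "$LOCK" 2>/dev/null; then
        return 0
    fi
    rm -f "$LOCK.$1"         # lost the race: clean up our temp file
    return 1
}

try_lock 111 && FIRST=acquired || FIRST=refused
try_lock 222 && SECOND=acquired || SECOND=refused
OWNER=$(cat "$LOCK")

echo "first: $FIRST, second: $SECOND, owner: $OWNER"
# prints: first: acquired, second: refused, owner: 111

rm -rf "$LOCK_DIR"
```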

If you prefer to keep the locks under /tmp, which is often a RAM-backed filesystem (tmpfs or ramfs), run :%s/var\/lock/tmp/g on lock.sh in vi.
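
As a usage illustration, a script that has sourced lock.sh can wrap any command between lockme_wait and unlockme. The run_locked helper below is a hypothetical convenience wrapper, not one of the four functions above, and it assumes lock.sh lives at /usr/local/bin/lock.sh.

```shell
#!/bin/sh
# Assumption: lock.sh is installed at this path. The guard only keeps the
# sketch from aborting on a machine where the file is absent.
if [ -r /usr/local/bin/lock.sh ]; then
    . /usr/local/bin/lock.sh
fi

# Hypothetical helper: run one command while holding the named lock.
run_locked () {
    # $1 = lock name, remaining arguments = command to run
    name=$1
    shift
    lockme_wait "$name" || return 1
    status=0
    "$@" || status=$?        # remember the command's exit status
    unlockme "$name"         # always release, even if the command failed
    return "$status"
}
```

With that in place, something like run_locked mylock some_command would queue behind any other caller using the same lock name and hand back the exit status of some_command.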

Afterwards I edited /var/ossec/active-response/bin/firewall-drop.sh, adding just 3 lines. I only edited the Linux section of the script; I haven’t tested the BSD and SunOS sections and don’t even know whether they need the change, so I left those untouched:

  • Add . /usr/local/bin/lock.sh right after the “# Checking for an IP” section (around line 45)
  • Right after “# Executing and exiting” add lockme_wait active-response (around line 75)
  • Right after the second while loop finishes, after “done” and before “exit 0”, add unlockme active-response (around line 110)

That’s it: just 3 lines added, and the errors have completely stopped since then.
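
To show where the three additions sit relative to each other, here is a rough layout sketch. Only the lines marked "added" come from the steps above; the apply_rule function, the LOCK_SH and IPTABLES variables, and the retry loop are simplified stand-ins of my own and not the actual contents of firewall-drop.sh. Adjust the lock.sh path to wherever the file lives on your system.

```shell
#!/bin/sh
# Layout sketch only. Everything not marked "added" is a stand-in,
# NOT the real firewall-drop.sh code.

LOCK_SH=${LOCK_SH:-/usr/local/bin/lock.sh}   # assumed location of lock.sh
IPTABLES=${IPTABLES:-iptables}               # overridable for a dry run

# added (1): load the lock helpers. The guard exists only so this sketch
# does not abort where lock.sh is missing; the post adds a plain ". <path>".
if [ -r "$LOCK_SH" ]; then
    . "$LOCK_SH"
fi

apply_rule () {
    # $1 = add|delete, $2 = IP address
    case "$1" in
        add)    flag=-I ;;
        delete) flag=-D ;;
        *)      echo "usage: apply_rule add|delete IP" >&2; return 1 ;;
    esac

    lockme_wait active-response          # added (2): wait for our turn

    # Stand-in for the script's retry loop: up to five attempts
    tries=1
    until "$IPTABLES" "$flag" INPUT -s "$2" -j DROP; do
        if [ "$tries" -ge 5 ]; then
            break
        fi
        tries=$((tries + 1))
        sleep 1
    done

    unlockme active-response             # added (3): let the next caller in
}
```

Because every invocation waits on the same active-response lock, only one iptables command runs at a time, which is the whole point of the change.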

P.S. Yes, I could have used lockfile-progs to achieve the same result, but I (also) use the lock.sh file on embedded systems when needed, and it’s far more portable and easier to use there.