Some statistics on linux-greek-users mailing list and forum.hellug.gr

Out of boredom I decided to parse the Linux-Greek-Users (LGU) archives and create some graphs. Then I wrote a few more one-liners to derive some numbers from the archives. These numbers may or may not mean anything to someone; that is entirely up to the reader. The archives contain some spam (not too much, though), so keep that in mind while reading the numbers I extracted below.

First thing I did was to download the index file containing the links to the monthly archives since February 1997:
wget http://lists.hellug.gr/pipermail/linux-greek-users/

Then download each month’s archive:
for i in `grep date index.html | cut -d"\"" -f2`; do foo=`echo $i|cut -d"/" -f1`; wget http://lists.hellug.gr/pipermail/linux-greek-users/$i -O $foo-date.html ; done

After a while I had the full archives on my disk.
Then it was time for the first metric.
How many posts does each month have?
This was quite easy because mailman contains this information inside each month’s archive:
for i in *-date.html; do count=`grep -i "Messages:" $i | cut -d">" -f 3 | cut -d"<" -f1`; echo "$i $count" >> count.txt; done
That command gave me an output such as:

1999-May-date.html 985
1999-November-date.html 1441
1999-October-date.html 1148
1999-September-date.html 1369
2000-April-date.html 690
2000-August-date.html 354
2000-December-date.html 444
2000-February-date.html 833

It didn’t take long to enter this data into OOcalc. Time for the first graph:
LGU posts per month:
lgu-posts_per_month
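
One practical detail before graphing: the shell glob lists the files alphabetically by month name (April, August, December, ...), so count.txt is not in chronological order. Below is a minimal sketch of one way to reorder it, assuming the "YYYY-Month-date.html count" format shown above. The helper name sort_chronologically is made up for illustration and was not part of the original workflow.

```shell
#!/bin/sh
# Sketch: turn "YYYY-Month-date.html count" rows into "YYYY-MM count" rows
# sorted chronologically. sort_chronologically is a hypothetical helper name.
sort_chronologically() {
    awk '
        BEGIN {
            # Map English month names to zero-padded month numbers.
            n = split("January February March April May June July August September October November December", names, " ")
            for (m = 1; m <= n; m++) num[names[m]] = sprintf("%02d", m)
        }
        {
            split($1, parts, "-")    # parts[1] = year, parts[2] = month name
            print parts[1] "-" num[parts[2]], $2
        }
    ' "$1" | sort
}
```

It could be run as `sort_chronologically count.txt > count-chronological.txt` (the output file name is arbitrary) before importing the result into a spreadsheet.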

And the average monthly posts per year:
lgu-monthly_average_posts_per_year
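
The yearly averages can also be computed straight from count.txt instead of in a spreadsheet. The following is only a sketch under the assumption that every line looks like "YYYY-Month-date.html count"; yearly_average is a hypothetical helper name. Note that it divides by the number of months actually present for each year, so partial years such as 1997 and 2009 are averaged over fewer than twelve months.

```shell
#!/bin/sh
# Sketch: average monthly post count per year from "YYYY-Month-date.html count" rows.
# yearly_average is a hypothetical helper name, not part of the original one-liners.
yearly_average() {
    LC_ALL=C awk '
        {
            split($1, parts, "-")      # parts[1] = year
            sum[parts[1]] += $2        # total posts seen for that year
            months[parts[1]]++         # number of monthly rows for that year
        }
        END {
            for (y in sum) printf "%s %.1f\n", y, sum[y] / months[y]
        }
    ' "$1" | LC_ALL=C sort
}
```

Usage would be something like `yearly_average count.txt`, which prints one "year average" pair per line.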

One can certainly argue there’s a trend here. 2008 had fewer posts than any other year…

What’s more interesting is who is actually writing on this list. To extract that information I wrote the following one-liners:
First of all I converted all html files to utf8, because the archives are kept in iso-8859-7 encoding and I am using a unicode terminal:
for i in *-date.html; do iconv -f iso-8859-7 -t utf-8 $i -o $i.utf; done
Then I could find all the authors for each monthly archive:
for i in *-date.html.utf; do grep "<I>" $i | cut -d">" -f 2 >$i-authors.txt; done
Then count each author’s posts:
for i in *-date.html.utf-authors.txt; do sort $i | uniq -c | sort -n >$i-sorted-count; done
The easiest thing to do is to count how many people post per month. Are the users of the mailing list increasing or decreasing?
for i in *-sorted-count; do wc -l $i >> posters-per-month; done
That gave a nice output like this:

55 1997-April-date.html.utf-authors.txt-sorted-count
49 1997-August-date.html.utf-authors.txt-sorted-count
88 1997-December-date.html.utf-authors.txt-sorted-count
62 1997-February-date.html.utf-authors.txt-sorted-count
41 1997-July-date.html.utf-authors.txt-sorted-count
60 1997-June-date.html.utf-authors.txt-sorted-count
62 1997-March-date.html.utf-authors.txt-sorted-count
69 1997-May-date.html.utf-authors.txt-sorted-count

I inserted that into OOcalc again and here is the output:
lgu-posters_per_month1

The decline in the number of posters is clearly shown.

Another interesting statistic would be the months with the most and the fewest posters respectively:
Months with most posters:
sort -n posters-per-month | tail -n10

166 1999-May-date.html.utf-authors.txt-sorted-count
167 2001-October-date.html.utf-authors.txt-sorted-count
168 1999-December-date.html.utf-authors.txt-sorted-count
173 1999-October-date.html.utf-authors.txt-sorted-count
173 2001-June-date.html.utf-authors.txt-sorted-count
174 2001-May-date.html.utf-authors.txt-sorted-count
176 1999-September-date.html.utf-authors.txt-sorted-count
184 1999-November-date.html.utf-authors.txt-sorted-count
192 2000-March-date.html.utf-authors.txt-sorted-count
222 2004-November-date.html.utf-authors.txt-sorted-count

Months with fewest posters:
sort -n posters-per-month | head -n10

16 2008-August-date.html.utf-authors.txt-sorted-count
32 2008-December-date.html.utf-authors.txt-sorted-count
36 2008-July-date.html.utf-authors.txt-sorted-count
41 1997-July-date.html.utf-authors.txt-sorted-count
44 2008-May-date.html.utf-authors.txt-sorted-count
49 1997-August-date.html.utf-authors.txt-sorted-count
51 2008-September-date.html.utf-authors.txt-sorted-count
51 2009-March-date.html.utf-authors.txt-sorted-count
53 2003-August-date.html.utf-authors.txt-sorted-count
53 2009-February-date.html.utf-authors.txt-sorted-count

Then I got interested in finding out who the top posters are throughout the archives. I didn’t want to write anything complex in order to sum every user’s posts, so I decided to find the top 5 posters in each month and then see whose names are repeated over and over in the top 5.
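
For comparison, summing every author’s posts across all months does not have to be complex either. This is only a sketch, not what was used for the numbers in this post: it assumes the `uniq -c` style lines ("count name") of the -sorted-count files produced earlier, and total_posts_per_author is a hypothetical helper name. Authors are matched by exact name string, so the same person posting under two spellings is counted twice.

```shell
#!/bin/sh
# Sketch: sum per-month "count name" lines (uniq -c output) into one total per author.
# total_posts_per_author is a hypothetical helper name.
total_posts_per_author() {
    awk '
        {
            count = $1
            name = $0
            # Strip the leading count written by uniq -c, keep the full name.
            sub(/^[ \t]*[0-9]+[ \t]+/, "", name)
            total[name] += count
        }
        END { for (a in total) print total[a], a }
    ' "$@" | sort -n
}
```

It could be called as `total_posts_per_author *-sorted-count | tail -n20` to list the twenty most prolific names.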

Using the following oneliner I was able to find the top5 posters of each month:
for i in *-count; do tail -n5 $i >> top5-of-each-month.txt; done

And then, using a bit of perl, I was able to extract the posters who appear most often. Here are the top 20 of them:

8 Alexios Chouchoulas
9 fs
9 Spiros Bolis
10 Alexandros Papadopoulos
10 Giannis Stoilis
10 Harris Kosmidhs
10 Vasilis Vasaitis
11 Giannis Papadopoulos
11 Michael Iatrou
11 Panos Katsaloulis
11 Άγγελος Οικονομόπουλος
16 George Notaras
17 Nick Demou
24 George Daflidis-Kotsis
24 Michalis Kabrianis
33 I.Ioannou
41 DJ Art
52 V13
74 Christos Ricudis
83 Giorgos Keramidas

The number before each name is the number of months in which that poster was among the month’s top 5. That means Giorgos Keramidas was in the top 5 for 83 of the 147 months of archives that I parsed. Pretty impressive!
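
The perl used for this tally is not shown here, but the same count can be sketched with standard tools. The snippet below is an assumption about one way to do it, not the original script: it strips the per-month post count from each line of top5-of-each-month.txt and counts how often each name remains. top5_appearances is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count in how many monthly top-5 lists each name appears.
# Input lines look like uniq -c output: "<posts that month> <name>".
# top5_appearances is a hypothetical helper name.
top5_appearances() {
    # Drop the leading per-month count, then count identical names.
    sed 's/^[[:space:]]*[0-9][0-9]*[[:space:]]*//' "$1" | sort | uniq -c | sort -n
}
```

Running `top5_appearances top5-of-each-month.txt | tail -n20` would then give a list in the same shape as the one above.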

Then I wanted to create some graphs about http://forum.hellug.gr as well. HELLUG’s forum is less than 2 years old, so its graphs cannot really be compared to the ones from LGU. I just put them here for completeness.

Forum.hellug.gr – Posts per month:
forumhelluggr-posts_per_month

Forum.hellug.gr – New members per month:
forumhelluggr-new_members_per_month

I won’t make any further comments… I’d be glad to see yours though 🙂

P.S. I know that I could have used some form of database to store and process the results of those commands but I wanted to keep it as simple as possible.

5 Responses to “Some statistics on linux-greek-users mailing list and forum.hellug.gr”

  1. May 3rd, 2009 | 00:00

    Looking at your graphs, especially the one about the yearly posts on the LGU mailing list, I think that the low activity is due to the increased number of Greek Linux forums. It is easier for a user to monitor a post via a forum platform than via a mailing list.

  2. May 3rd, 2009 | 02:06

    I know I have been posting many email messages since I got hooked with email, but I didn’t realize I was posting that many 🙂

    I have to ponder a bit about the apparent decline in email posts. One of the things I was sort of expecting was a rise in forum subscriptions and an equally impressive rise in forum posts. While there are new forum users subscribing every month, I am not sure these are the users who would post to l.g.u anyway, so we have to look elsewhere for the real reasons for the decrease in list email.

    One of the reasons may be that the list subscribers are a mostly static group of people, who are getting older. We used to be about 20 years old when l.g.u started. We are now in our 30s and have other things to do, so we post fewer and fewer messages, because we have a few more things to keep us busy now.

    I am not sure, though, and I certainly feel highly uncertain about extrapolating from my personal experience to draw conclusions about the wider population of the entire l.g.u subscriber base.

  3. May 3rd, 2009 | 02:12

    I think it makes sense to factor into any attempt to analyze the post count graphs what Markos said too. Back in 1997 forums were a new and slightly unknown way of posting information online. The proliferation of different Greek forums during the last few years may have played a very significant role in fragmenting the user base to many disjoint parts. The total of all Greek computer users who know about Linux and use the Internet has probably increased a lot, but we now have a multitude of places to post, instead of 4-5 “well known” mailing lists.

  4. May 3rd, 2009 | 05:06

    […] Here is the original post:  Some statistics on linux-greek-users mailing list and forum.hellug … […]

  5. May 3rd, 2009 | 12:38

    My personal view on this leans towards Keramidas’ view. The average age on LGU is surely above 25, and that means people are getting busier and busier with their everyday tasks. This leaves them with less time to provide “support” on a mailing list or even to post something humorous like they used to.

    What’s clear is that LGU and forum.hellug.gr have very different pools of posters. The forum did not actually steal any LGU members, as many had feared it would, but it covers an entirely different part of the Linux user pool.

    The latest LGU posts are of very high complexity; they mostly come from people who are already experienced and need some very specific guidance. On the other hand, the posts on the forum are mostly from less experienced users and cover general topics like “how to change my resolution”, “how to add greek support”, etc. These topics were once scoffed at on LGU, and people had stopped posting them before the forum even started.

    Both LGU and the forum should continue doing their much-needed work. There were, are, and always will be issues with user behavior on both, but I hope these won’t stop people from asking questions or stop others from replying to them.
