<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Into.the.Void. &#187; forum.hellug.gr</title>
	<atom:link href="http://www.void.gr/kargig/blog/tag/forumhelluggr/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.void.gr/kargig/blog</link>
	<description>Into The Void</description>
	<lastBuildDate>Sat, 07 Aug 2010 08:06:02 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Some statistics on linux-greek-users mailing list and forum.hellug.gr</title>
		<link>http://www.void.gr/kargig/blog/2009/05/02/some-statistics-on-linux-greek-users-mailing-list-and-forumhelluggr/</link>
		<comments>http://www.void.gr/kargig/blog/2009/05/02/some-statistics-on-linux-greek-users-mailing-list-and-forumhelluggr/#comments</comments>
		<pubDate>Sat, 02 May 2009 14:00:32 +0000</pubDate>
		<dc:creator>kargig</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[forum]]></category>
		<category><![CDATA[forum.hellug.gr]]></category>
		<category><![CDATA[HELLUG]]></category>
		<category><![CDATA[LGU]]></category>
		<category><![CDATA[linux greek users]]></category>
		<category><![CDATA[oneliner]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.void.gr/kargig/blog/?p=463</guid>
		<description><![CDATA[Out of boredom I decided to parse the Linux-Greek-Users (LGU) archives and create some graphs. Then I wrote a few more oneliners to derive some numbers from the archives. These numbers may or may not mean anything to someone; it&#8217;s entirely up to the reader. Since the archives contain some amount of spam (not [...]]]></description>
			<content:encoded><![CDATA[<p>Out of boredom I decided to parse the Linux-Greek-Users (LGU) archives and create some graphs. Then I wrote a few more oneliners to derive some numbers from the archives. These numbers may or may not mean anything to someone; it&#8217;s entirely up to the reader. Since the archives contain some spam (not too much, though), keep that in mind while reading the numbers I extracted below&#8230;</p>
<p>The first thing I did was download the index file, which links to the monthly archives going back to February 1997:<br />
<code>wget http://lists.hellug.gr/pipermail/linux-greek-users/</code></p>
<p>Then I downloaded each month&#8217;s archive:<br />
<code>for i in $(grep date index.html | cut -d'"' -f2); do foo=$(echo "$i" | cut -d"/" -f1); wget "http://lists.hellug.gr/pipermail/linux-greek-users/$i" -O "$foo-date.html"; done</code><br />
<span id="more-463"></span><br />
After a while I had the full archives on my disk&#8230;<br />
Then it was time for the first metric.<br />
<strong>How many posts does each month have?</strong><br />
This was quite easy because Mailman includes this information in each month&#8217;s archive:<br />
<code>for i in *-date.html; do count=$(grep -i "Messages:" "$i" | cut -d">" -f 3 | cut -d"&lt;" -f1); echo "$i $count" >> count.txt; done</code><br />
That command produced output such as:</p>
<blockquote><p>1999-May-date.html 985<br />
1999-November-date.html 1441<br />
1999-October-date.html 1148<br />
1999-September-date.html 1369<br />
2000-April-date.html 690<br />
2000-August-date.html 354<br />
2000-December-date.html 444<br />
2000-February-date.html 833<br />
&#8230;</p></blockquote>
<p>It didn&#8217;t take long to enter this data into OOcalc. Time for the first graph:<br />
LGU posts per month:<br />
<a href="http://www.void.gr/kargig/blog/wp-content/lgu-posts_per_month.png"><img src="http://www.void.gr/kargig/blog/wp-content/lgu-posts_per_month-300x77.png" alt="lgu-posts_per_month" title="lgu-posts_per_month" width="300" height="77" class="alignnone size-medium wp-image-464" /></a></p>
<p>And the average monthly posts per year:<br />
<a href="http://www.void.gr/kargig/blog/wp-content/lgu-monthly_average_posts_per_year.png"><img src="http://www.void.gr/kargig/blog/wp-content/lgu-monthly_average_posts_per_year-299x165.png" alt="lgu-monthly_average_posts_per_year" title="lgu-monthly_average_posts_per_year" width="299" height="165" class="alignnone size-medium wp-image-466" /></a></p>
<p>One can certainly argue there&#8217;s a downward trend here. 2008 had fewer posts than any other year&#8230;</p>
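<p>The yearly averages can also be computed straight from <code>count.txt</code> without a spreadsheet. What follows is only a sketch, not the method used for the graph above (that was done in OOcalc): the function name <code>average_posts_per_year</code> is made up for illustration, and it assumes every line looks like <code>1999-May-date.html 985</code>.</p>

```shell
# Average monthly posts per year, read from a count.txt-style file.
# Assumption: each line is "<year>-<month>-date.html <count>".
average_posts_per_year() {
    awk '{
        split($1, parts, "-")      # parts[1] is the year
        sum[parts[1]] += $2        # total posts seen for that year
        months[parts[1]]++         # number of monthly archives for that year
    }
    END {
        for (year in sum)
            printf "%s %.2f\n", year, sum[year] / months[year]
    }' "$1" | sort
}
```

<p>Calling <code>average_posts_per_year count.txt</code> prints one line per year. Years with incomplete archives are averaged over the months that exist, which may or may not be what you want.</p>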
<p>What&#8217;s more interesting is <strong>who is actually writing on this list</strong>. To extract that information I wrote the following oneliners:<br />
First of all I converted all HTML files to UTF-8 because the archives are kept in ISO-8859-7 encoding and I am using a Unicode terminal:<br />
<code>for i in *-date.html; do iconv -f iso-8859-7 -t utf-8 "$i" -o "$i.utf"; done</code><br />
Then I could find all the authors for each monthly archive:<br />
<code>for i in *-date.html.utf; do grep -i "&lt;i&gt;" "$i" | cut -d">" -f 2 > "$i-authors.txt"; done</code><br />
Then count each author&#8217;s posts:<br />
<code>for i in *-date.html.utf-authors.txt; do sort "$i" | uniq -c | sort -n > "$i-sorted-count"; done</code><br />
The easiest thing to do is to <strong>count how many people post per month</strong>. Is the number of users of the mailing list increasing or decreasing?<br />
<code>for i in *-sorted-count; do wc -l "$i" >> posters-per-month; done</code><br />
That gave nice output like this:</p>
<blockquote><p>55 1997-April-date.html.utf-authors.txt-sorted-count<br />
49 1997-August-date.html.utf-authors.txt-sorted-count<br />
88 1997-December-date.html.utf-authors.txt-sorted-count<br />
62 1997-February-date.html.utf-authors.txt-sorted-count<br />
41 1997-July-date.html.utf-authors.txt-sorted-count<br />
60 1997-June-date.html.utf-authors.txt-sorted-count<br />
62 1997-March-date.html.utf-authors.txt-sorted-count<br />
69 1997-May-date.html.utf-authors.txt-sorted-count<br />
&#8230;
</p></blockquote>
<p>I imported that into OOcalc again, and here is the result:<br />
<a href="http://www.void.gr/kargig/blog/wp-content/lgu-posters_per_month1.png"><img src="http://www.void.gr/kargig/blog/wp-content/lgu-posters_per_month1-300x179.png" alt="lgu-posters_per_month1" title="lgu-posters_per_month1" width="300" height="179" class="alignnone size-medium wp-image-469" /></a></p>
<p>The decline in the number of posters is clearly shown.</p>
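<p>One practical detail: the shell glob lists the files alphabetically by month name (April, August, December&#8230;), so the lines in <code>posters-per-month</code> are not in chronological order. A rough sketch for reordering them before graphing is shown below. The function name <code>posters_chronological</code> is hypothetical, and it assumes English month names exactly as they appear in the file names above.</p>

```shell
# Turn "55 1997-April-date.html..." lines into "1997-04 55",
# sorted chronologically. Assumes English month names in the file names.
posters_chronological() {
    awk 'BEGIN {
        # Map month names to their numbers (January is 1).
        n = split("January February March April May June July August September October November December", names, " ")
        while (n > 0) { month[names[n]] = n; n-- }
    }
    {
        split($2, parts, "-")   # parts[1] is the year, parts[2] the month name
        printf "%s-%02d %s\n", parts[1], month[parts[2]], $1
    }' "$1" | sort
}
```

<p>For example, <code>posters_chronological posters-per-month > posters-by-date.txt</code> would give a file that can be pasted into a spreadsheet as is (the output file name is just an example).</p>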
<p>Another interesting statistic would be the months with the most and fewest posters respectively:<br />
<strong>Months with most posters</strong>:<br />
<code>sort -n posters-per-month | tail -n10</code></p>
<blockquote><p>166 1999-May-date.html.utf-authors.txt-sorted-count<br />
167 2001-October-date.html.utf-authors.txt-sorted-count<br />
168 1999-December-date.html.utf-authors.txt-sorted-count<br />
173 1999-October-date.html.utf-authors.txt-sorted-count<br />
173 2001-June-date.html.utf-authors.txt-sorted-count<br />
174 2001-May-date.html.utf-authors.txt-sorted-count<br />
176 1999-September-date.html.utf-authors.txt-sorted-count<br />
184 1999-November-date.html.utf-authors.txt-sorted-count<br />
192 2000-March-date.html.utf-authors.txt-sorted-count<br />
222 2004-November-date.html.utf-authors.txt-sorted-count</p></blockquote>
<p><strong>Months with fewest posters</strong>:<br />
<code>sort -n posters-per-month | head -n10</code></p>
<blockquote><p>16 2008-August-date.html.utf-authors.txt-sorted-count<br />
32 2008-December-date.html.utf-authors.txt-sorted-count<br />
36 2008-July-date.html.utf-authors.txt-sorted-count<br />
41 1997-July-date.html.utf-authors.txt-sorted-count<br />
44 2008-May-date.html.utf-authors.txt-sorted-count<br />
49 1997-August-date.html.utf-authors.txt-sorted-count<br />
51 2008-September-date.html.utf-authors.txt-sorted-count<br />
51 2009-March-date.html.utf-authors.txt-sorted-count<br />
53 2003-August-date.html.utf-authors.txt-sorted-count<br />
53 2009-February-date.html.utf-authors.txt-sorted-count</p></blockquote>
<p>Then I got interested in finding out <strong>who the top posters are throughout the archives</strong>. I didn&#8217;t want to write anything complex to sum every user&#8217;s posts, so I decided to find the top5 posters of each month and then see whose names keep reappearing in the top5.</p>
<p>Using the following oneliner I was able to find the top5 posters of each month:<br />
<code>for i in *-count; do tail -n5 $i >> top5-of-each-month.txt; done</code></p>
<p>And then, using a bit of perl, I was able to extract the posters who appear most often. Here are the top 20:</p>
<blockquote><p>      8  Alexios Chouchoulas<br />
      9  fs<br />
      9  Spiros Bolis<br />
     10  Alexandros Papadopoulos<br />
     10  Giannis Stoilis<br />
     10  Harris Kosmidhs<br />
     10  Vasilis Vasaitis<br />
     11  Giannis Papadopoulos<br />
     11  Michael Iatrou<br />
     11  Panos Katsaloulis<br />
     11  Άγγελος Οικονομόπουλος<br />
     16  George Notaras<br />
     17  Nick Demou<br />
     24  George Daflidis-Kotsis<br />
     24  Michalis Kabrianis<br />
     33  I.Ioannou<br />
     41  DJ Art<br />
     52  V13<br />
     74  Christos Ricudis<br />
     83  Giorgos Keramidas</p></blockquote>
<p>The number before each name is the number of months in which that poster was among the month&#8217;s top5 posters. That means Giorgos Keramidas was in the top5 for 83 of the 147 months of archives that I parsed. Pretty impressive!</p>
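<p>The perl used for this step is not shown here. As a rough equivalent, the same count can probably be obtained with standard shell tools. This is a sketch under assumptions, not the original script: <code>top5_appearances</code> is a made-up name, and it expects lines in the <code>uniq -c</code> format found in <code>top5-of-each-month.txt</code> (a count, a space, then the author name).</p>

```shell
# Count in how many months each author made the top5.
# $1: file of "count name" lines, $2: how many names to show (default 20).
top5_appearances() {
    # Drop the leading per-month post count so only the name remains,
    # then count how many times each name occurs.
    sed 's/^ *[0-9][0-9]* //' "$1" | sort | uniq -c | sort -n | tail -n "${2:-20}"
}
```

<p>Usage would be <code>top5_appearances top5-of-each-month.txt</code>. Authors who posted under slightly different name spellings are counted separately, so the numbers are a lower bound for them.</p>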
<p>Then I wanted to create some graphs about <a href="http://forum.hellug.gr">http://forum.hellug.gr</a> as well. HELLUG&#8217;s forum has been around for less than 2 years, so its graphs cannot really be compared to the ones from LGU. I just put them here for completeness.</p>
<p><strong>Forum.hellug.gr &#8211; Posts per month</strong>:<br />
<a href="http://www.void.gr/kargig/blog/wp-content/forumhelluggr-posts_per_month.png"><img src="http://www.void.gr/kargig/blog/wp-content/forumhelluggr-posts_per_month-300x165.png" alt="forumhelluggr-posts_per_month" title="forumhelluggr-posts_per_month" width="300" height="165" class="alignnone size-medium wp-image-476" /></a></p>
<p><strong>Forum.hellug.gr &#8211; New members per month</strong>:<br />
<a href="http://www.void.gr/kargig/blog/wp-content/forumhelluggr-new_members_per_month.png"><img src="http://www.void.gr/kargig/blog/wp-content/forumhelluggr-new_members_per_month-300x215.png" alt="forumhelluggr-new_members_per_month" title="forumhelluggr-new_members_per_month" width="300" height="215" class="alignnone size-medium wp-image-477" /></a></p>
<p>I won&#8217;t make any further comments&#8230; I&#8217;d be glad to see yours though <img src='http://www.void.gr/kargig/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>P.S. I know that I could have used some form of database to store and process the results of those commands but I wanted to keep it as simple as possible. </p>
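<p>Even without a database, summing every author&#8217;s posts over all months (the step skipped earlier in favour of the top5 approach) may be simpler than it sounds. The sketch below is an untested-on-the-real-archives illustration: <code>total_posts_per_author</code> is a hypothetical name, and it assumes the <code>*-sorted-count</code> files keep the plain <code>uniq -c</code> layout of a count followed by the author name.</p>

```shell
# Sum each author's posts over all "*-sorted-count" files given as arguments.
total_posts_per_author() {
    awk '{
        count = $1
        sub(/^[ \t]*[0-9]+[ \t]+/, "")   # what remains of the line is the author name
        total[$0] += count
    }
    END {
        for (name in total) printf "%d %s\n", total[name], name
    }' "$@" | sort -n
}
```

<p>Something like <code>total_posts_per_author *-sorted-count | tail -n20</code> would then list the twenty most prolific authors overall, again counting different spellings of a name as different people.</p>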
]]></content:encoded>
			<wfw:commentRss>http://www.void.gr/kargig/blog/2009/05/02/some-statistics-on-linux-greek-users-mailing-list-and-forumhelluggr/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
