User:Bawolff bot
Summary
- Owner: user:Bawolff
- Updates {{popular articles}}
- Lists the most popular articles overall in the last hour
This bot is owned by user:Bawolff
If it has any problems, block it if necessary and please leave a note at user:Bawolff. (This should never happen, though, because I should always be around when it runs. If I am around, try to find me on IRC as well.) The bot is currently running.
This bot is run from the toolserver. Thank you toolserver folks for letting me use your servers.
Current Caveats
- It only runs when my computer is on, which is not that often
- Statistics can be distorted rather easily
- A hit does not necessarily equal a page view. It could be Googlebot indexing us, someone repeatedly hitting refresh, or JavaScript fetching pages without the user ever seeing them (although most scripts use the API, so those requests are not counted). It could be many things.
- Redirects are counted as separate pages.
- It may misreport the time period it used to generate the statistics. (If it gets a 404 on the stats file, it falls back to the file from the previous hour, but still reports the latest hour in some places.)
- Counts interwiki links that are both interlanguage and interproject as hits. For example, a link from English Wikipedia to Chinese Wikinews is routed through us and is counted as a hit.
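The fallback order behind the time-period caveat can be sketched in Python. This is a minimal illustration, not part of the bot. `candidate_urls` and `reported_period` are hypothetical helper names. The URL pattern is copied from update.sh, and the input is assumed to be a naive datetime already in UTC. Whichever of the four files is actually found, the period written into the template is always the last hour, so a fallback to the previous hour's file ends up mislabelled.

```python
from datetime import datetime, timedelta

# File-name pattern copied from update.sh (the dammit.lt site is historical).
BASE = "http://dammit.lt/wikistats/pagecounts-"


def candidate_urls(now):
    """Return the four URLs update.sh tries, in order.

    The current hour comes first (on the hour, then one minute late),
    followed by the same two names for the previous hour.
    `now` is assumed to be a naive datetime in UTC.
    """
    urls = []
    for when in (now, now - timedelta(hours=1)):
        for suffix in ("0000", "0001"):
            urls.append(BASE + when.strftime("%Y%m%d-%H") + suffix + ".gz")
    return urls


def reported_period(now):
    """The period update.sh writes into the template: always the last hour,
    regardless of which candidate URL actually succeeded."""
    start = (now - timedelta(hours=1)).strftime("%H")
    end = now.strftime("%H")
    return f"{start}:00\u2013{end}:00 UTC"


if __name__ == "__main__":
    now = datetime(2010, 9, 13, 21, 36)
    for url in candidate_urls(now):
        print(url)
    print(reported_period(now))
```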
New source code
- This version also filters out anything not in Category:Published.
- If you adapt it, you will have to change some things: the email address in the user-agent, the cd command at the beginning of each file, and the username/password in post.dat.
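update.sh ranks articles and filters them against Category:Published using awk, sort, head and join. Those steps can be approximated in a few lines of Python. This is a sketch under assumptions. `top_articles` is a hypothetical name. The published titles are passed in as a set in the same underscore form as the pagecounts titles, whereas the real script queries the API and maps normalised titles back. Only lines whose project code is exactly `en.n` are kept, whereas `zgrep 'en\.n'` matches that string anywhere in a line.

```python
import re

# Title prefixes update.sh drops. The list is copied from its awk regex, except
# that "Category talk:" and "File talk:" are written with underscores, since
# pagecounts titles never contain spaces.
EXCLUDED = re.compile(
    r"^(Main_Page|Talk:|User:|User_talk:|Wikinews:|Wikinews_talk:|Category:"
    r"|Category_talk:|File:|File_talk:|Special:|en:|Http:)"
)


def top_articles(lines, published, first_cut=45, final=15):
    """Approximate the ranking steps of update.sh.

    lines: pagecounts lines such as "en.n Some_title 12 34567"
           (project, title, hits, bytes).
    published: set of titles assumed to be in Category:Published.
    Returns up to `final` (hits, title) pairs, most hits first.
    """
    rows = []
    for line in lines:
        fields = line.split(" ")
        if len(fields) < 3 or fields[0] != "en.n":
            continue
        title = fields[1].replace("%27", "'")  # same as the sed step
        if not EXCLUDED.match(title):
            rows.append((int(fields[2]), title))
    # Most-viewed first, like `sort -g -r | head -n 45`.
    rows.sort(reverse=True)
    candidates = rows[:first_cut]
    # Keep only published titles (the `join` step), then take the top `final`.
    kept = [row for row in candidates if row[1] in published]
    return kept[:final]
```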
doPopArticle.sh
#!/bin/bash --
#Note: this script + save.sh and update.sh are under the GPL version 2
# as published by the free software foundation.
cd /home/bawolff/pop
temp=`mktemp -p . tmp.XXXXXXXXXXXXXXXX`
./update.sh > "$temp"
./save.sh 'Template:Popular_articles' "$temp"
rm -f "$temp"
save.sh
#!/bin/bash --
cd /home/bawolff/pop
# first param article name, second file containing article text.
postDat=`cat post.dat`
cookies=`mktemp -p . tmp.XXXXXXXXXXXXXXXXX `
site='http://en.wikinews.org/w/api.php'
# step 1 of the login: post the credentials and read back the login token
token=`wget --post-data "$postDat" --save-cookies "$cookies" --keep-session-cookies --header 'User-agent: Wikinews popular article bot - bawolff+wnbot@somewhere.invalid' -q --header 'From: bawolff@somewhere.invalid' "$site" -O - |egrep '^\s*token:'|cut -d : -f 2|tr -d ' '`
#echo $token
#echo "`cat post.dat`&token=$token&"
# step 2: post the credentials again together with the token
res=`wget -q --post-data "$postDat&lgtoken=$token&" --save-cookies "$cookies" --load-cookies "$cookies" --keep-session-cookies --header 'User-agent: Wikinews popular article bot - bawolff+wnbot@somewhere.invalid' --header 'From: bawolff+wnbot@somewhere.invalid' "$site" -O - | egrep '^\s*result: Success$' `
#echo result
#echo d"$res"d
if [ -z "$res" ]
then
    echo Error logging in 1>&2
    exit 1
fi
# fetch an edit token; its trailing "+\" is replaced by its percent-encoded form
editToken=`wget "${site}?action=query&prop=info&titles=${1}&intoken=edit&format=yaml" -q --save-cookies "$cookies" --load-cookies "$cookies" --keep-session-cookies --header 'User-agent: Wikinews popular article bot - bawolff+wnbot@somewhere.com' --header 'From: bawolff+wnbot@somewhere.com' -O - |egrep '^\s*edittoken:' | sed 's/^\s*edittoken:\s\([a-f0-9]*\)../\1%2B%5C/g' `
#echo $editToken
temp=`mktemp -p . tmp.XXXXXXXXXXXXXXXXXX `
echo -n "action=edit&format=yaml&title=${1}&token=${editToken}&summary=Updating%20popular%20article%20list&bot&minor&assert=user&text=" > $temp
# percent-encode the page text (%, newline, space, + and &) and append it
tr \\n \\v < $2 |sed -e 's/%/%25/g' -e 's/\v/%0A/g' -e 's/ /%20/g' -e 's/\+/%2B/g' -e 's/&/%26/g' >> $temp
#cat $temp
wget -q --post-file "$temp" --save-cookies "$cookies" --load-cookies "$cookies" --keep-session-cookies --header 'User-agent: Wikinews popular article bot - bawolff+wnbot@somewhere.com' --header 'From: bawolff+wnbot@somewhere.com' "$site" -O /dev/null
rm -f $temp
rm -f $cookies
#page=sed
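save.sh hand-encodes the article text with tr and sed, escaping only %, newlines, spaces, + and &. A rough Python equivalent follows. `encode_text` and `encode_text_fully` are hypothetical names. The first mirrors the sed expressions. The % replacement has to run first, or the escapes added later would be encoded a second time. The second uses the standard library's `urllib.parse.quote` to escape every reserved character, which may be the safer choice if the text contains characters such as = or #.

```python
from urllib.parse import quote


def encode_text(text):
    """Percent-encode text the way save.sh's tr/sed pipeline does.

    Only %, newline, space, + and & are escaped; everything else passes
    through unchanged.
    """
    out = text.replace("%", "%25")  # must come first, like the first sed expression
    out = out.replace("\n", "%0A")
    out = out.replace(" ", "%20")
    out = out.replace("+", "%2B")
    out = out.replace("&", "%26")
    return out


def encode_text_fully(text):
    """Stricter alternative: escape every reserved character (UTF-8 based)."""
    return quote(text, safe="")


if __name__ == "__main__":
    sample = "a b+c&d\n50%"
    print(encode_text(sample))
    print(encode_text_fully("a=b#c"))
```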
update.sh
#!/bin/bash --
#downloads statistics
#figures out what is relevant to wikinews
#new wikimarkup for stats page to standard out
cd /home/bawolff/pop

check_if_there () {
    #takes a date format string that equals url of stats, and a relative date/time.
    #checks http status code
    #returns 0 for 200, 1 for 404 or 301, and exits shell script for anything else
    isThere=$(HEAD -S -H 'User-agent: Wikinews stats bot. Contact [[user:Bawolff]]' -H 'From: bawolff+wnb@somewhere.com' $(date -d "$2" -u "$1") | head -n 1 |sed 's/.*--> \([0-9][0-9][0-9]\).*/\1/')
    if [ "$isThere" = 200 ]
    then
        return 0 # it is there, success!
    elif [ "$isThere" = 404 ]
    then
        return 1 # not there, try the next one
    elif [ "$isThere" = 301 ]
    then
        return 1
    else
        exit 1 # strange status code, bail out
    fi
}

get_and_make() {
    #takes a date format string that equals url of stats, and a relative date/time
    #gets the corresponding file, greps lines relevant to en.wikinews
    #cuts out Main Page and other namespaces, and gives [hits in hour, pagetitle]. sorts, takes top 45,
    #keeps only published articles, takes top 15
    #wikifies (including hour for map in [[Template:Popular articles/top]]) and outputs to stdout
    #to increase number of results you have to change head filter, AND change number of closing }}s
    export LC_ALL=C #for sorting
    sortedArticleList=`mktemp -p . tmp.XXXXXXXXXXXXXXX || echo "articleList-$$.tmp"`
    pubAPIRes=`mktemp -p . tmp.XXXXXXXXXXXXXXXXXX || echo "pubAPIRes-$$.tmp"`
    pubList=`mktemp -p . tmp.XXXXXXXXXXXXXXXXXXXXXXXXX || echo "pubList-$$.tmp"`
    filteredArticleList=`mktemp -p . tmp.XXXXXXXXXXXXXXXXXXX || echo "filteredList-$$.tmp"`
    #get pop articles. rm obvious non-articles take 45 most pop, then resort alphabetical (needed for join)
    wget `date -d "$2" -u "$1"` -q --header\='User-agent: Wikinews stats bot. Contact [[user:Bawolff]]' --header\='From: me@somewhere.com' -O - \
    |zgrep 'en\.n' |awk '-F ' '{if ($2 !~ /(^Main_Page)|(^Talk:)|(^User:)|(^User_talk:)|(^Wikinews:)|(^Wikinews_talk:)|(^Category:)|(^Category_talk:)|(^File:)|(^File_talk:)|(^Special:)|(^en:)|(^Http:)/) print $3, $2}' \
    |sed 's/%27/'\'/g \
    |sort -g -r \
    |head -n 45 \
    |sort -k 2 > "$sortedArticleList"
    #list of the $sortedArticleList that are in category pub
    wget 'http://en.wikinews.org/w/api.php?action=query&prop=categories&clcategories=Category:Published&format=xml&cllimit=max&titles='"`cut -d ' ' -f 2 $sortedArticleList |tr '\n' '|'`" -O "$pubAPIRes" -q --header\='User-agent: Wikinews stats bot. Contact [[user:Bawolff]]' --header\='From: somewhere@replacewithyouremail.com'
    #turn into nice newline separated list.
    echo 'cat api/query/normalized/n[@to=/api/query/pages/page[categories]/@title]/@from' | xmllint $pubAPIRes --shell --noent|sed -n -e 's/&quot;/"/' -e 's/^ from\=\"\(.*\)\"$/\1/p' | sort > "$pubList"
    #remove non-published from $sortedArticleList
    join -o 1.1\ 1.2 -1 2 "$sortedArticleList" "$pubList" \
    |sort -g -r > $filteredArticleList
    rm -f "$sortedArticleList" "$pubList" "$pubAPIRes"
    #take space separated value, turn to wikisyntax
    head -n 15 < "$filteredArticleList" |awk 'BEGIN { HOURSTART = "'$(date -u -d '1 hour ago' +%H)'" ;HOUREND = "'$(date -u +%H)'"; print "<noinclude>{{/top|" HOURSTART "}}</noinclude>"} {print "{{#ifexpr: {{{top|40}}} > "NR-1"|# [[:" gensub(/_/, " ", "g", $2) "]] {{#if:{{{nohits|}}}|| <small>('\'\'\''" $1 "'\'\'\'' hits last hour)</small>}}"} END {print "}} }} }} }} }} }} }} }} }} }} }} }} }} }} }} \n<noinclude>\nThese statistics are generated from [http://dammit.lt/wikistats/ Wikistats]. They are based on number of visits to each page over the last hour. These statistics include all visits, both by people and by automated computer programs. Although these are probably reasonably accurate, they are easy to distort. Please note that sometimes these statistics are updated on an irregular basis. This page was generated at ~~~~~ for the time period " HOURSTART ":00–" HOUREND ":00 UTC.</noinclude>"}'
    rm -f "$filteredArticleList"
}

# try each of these until we get one
if check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' now
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' now
elif check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' now # sometimes files are a minute late
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' now
elif check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' '1 hour ago'
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' '1 hour ago'
elif check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' '1 hour ago'
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' '1 hour ago'
else
    # none of them worked :(
    exit 2
fi
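The awk program at the end of update.sh wraps each row in its own {{#ifexpr:}}, so a page that transcludes the template can pass |top= to shorten the list. All of the conditionals are then closed at once. A Python sketch of that layout follows, with the footer text left out. `render_template` is a hypothetical name. One difference is deliberate. update.sh always prints fifteen closing }}, which only balances when exactly fifteen rows survive the filter. This sketch emits one }} per row instead.

```python
def render_template(rows, hour_start):
    """Build the list part of Template:Popular articles in update.sh's layout.

    rows: (hits, title) pairs, most popular first; titles use underscores.
    hour_start: two-digit UTC hour passed to {{/top|...}}.
    """
    lines = ["<noinclude>{{/top|%s}}</noinclude>" % hour_start]
    for n, (hits, title) in enumerate(rows):
        # Row n is shown only when {{{top}}} (default 40) is greater than n.
        lines.append(
            "{{#ifexpr: {{{top|40}}} > %d|# [[:%s]] "
            "{{#if:{{{nohits|}}}|| <small>('''%d''' hits last hour)</small>}}"
            % (n, title.replace("_", " "), hits)
        )
    # Close every #ifexpr that was opened above.
    lines.append(" ".join(["}}"] * len(rows)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_template([(50, "Foo_bar"), (40, "Old_story")], "20"))
```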
post.dat
action=login&lgname=Bawolff_bot&lgpassword=*****&format=yaml&
What it does (old)
- Takes Domas's statistics [1] and extracts what is relevant to Wikinews. It takes the top 20 (this could easily be changed to any number) along with the total hits, makes a wiki page out of them, and uses the pywikipedia sandbot to reset Template:Popular articles based on the output.
- Source code: if you want to run it, change the email address to something appropriate and make sure the directory structure matches (you might want to change some relative paths to absolute ones). It expects a pywikipedia install in a directory called pywikipedia, and lwp-request (part of Perl's libwww-perl) and wget to be installed.
- wn_stats/update.py (main script called to do stuff)
#!/bin/sh --
#cd /some path to/wn_stats/
#false || echo false
if ! ./make_stats.sh >cur_stats.wiki
then
    cd ../pywikipedia/
    python bot_error.py
    echo wikinews statistics error 1>&2
    exit 1
fi
cd ../pywikipedia/
python update_newpop.py && python bot_ok.py #file is hardcoded
- wn_stats/make_stats.sh
#!/bin/bash --
#downloads statistics
#figures out what is relevant to wikinews
#new wikimarkup for stats page to standard out
cd /home/bawolff/src/wn_stats/
check_if_there () {
    #takes a date format string that equals url of stats. check http status code
    # returns 0 for 200, 1 for 404, and exits shell script for anything else
    isThere=$(HEAD -H 'From: Bawolff+wnbots@**somewhere**.invalid' $(date -d "$2" -u "$1") | head -n 1 |cut -d ' ' -f 1)
    #echo $isThere is status
    if [ "$isThere" = 200 ]
    then
        #echo it is there
        return 0
    elif [ "$isThere" = 404 ]
    then
        #echo not there `date -d "$2" -u "$1"`
        return 1
    else
        exit 1
    fi
}
get_and_make() {
    #wget `date -d "$2" -u "$1"` -q -O -
    #cat pagecounts-20080713-170000.gz
    #to increase count you have to change head filter, AND change {{#ifexpr}}s
    wget `date -d "$2" -u "$1"` -q --header\='From: bawolff+wnbots@**somewhere**.invalid' -O -|zgrep 'en\.n'|awk '-F ' '{print $3, $2}'|sort -g -r|head -n 20|awk 'BEGIN { TIMESTART = "'$(date -u -d '1 hour ago' +%H)':00" ;TIMEEND = "'$(date -u +%H)':00"; print "<noinclude>\n== Most Popular Last Hour ==\n</noinclude>{{#ifexpr: {{{count|40}}} > 0|{{#ifexpr: {{{count|41}}} > 1|{{#ifexpr: {{{count|41}}} > 2|{{#ifexpr: {{{count|41}}} > 3|{{#ifexpr: {{{count|41}}} > 4|{{#ifexpr: {{{count|41}}} > 5|{{#ifexpr: {{{count|41}}} > 6|{{#ifexpr: {{{count|41}}} > 7|{{#ifexpr: {{{count|41}}} > 8|{{#ifexpr: {{{count|41}}} > 9|{{#ifexpr: {{{count|41}}} > 10|{{#ifexpr: {{{count|41}}} > 11|{{#ifexpr: {{{count|41}}} > 12|{{#ifexpr: {{{count|41}}} > 13|{{#ifexpr: {{{count|41}}} > 14|{{#ifexpr: {{{count|41}}} > 15|{{#ifexpr: {{{count|41}}} > 16|{{#ifexpr: {{{count|41}}} > 17|{{#ifexpr: {{{count|41}}} > 18|{{#ifexpr: {{{count|41}}} > 19|"} {print "# [[:" gensub(/_/, " ", "g", $2) "]] {{#if:{{{nohits|}}}| |<small>('\'\'\''" $1 "'\'\'\'' hits last hour)</small>}}| }}"} END {print "* Total hits last hour: '\'\'\'$(./make_proj_stats.sh)\'\'\''<noinclude>\n\nThese statistics are generated by Domas'\''s [http://dammit.lt/wikistats/ Wikistats]<sup>['$( date -d "$2" -u "$1")']</sup>. They are based on number of visits to each page over the last hour. These statistics include all visits, both people and automated computer programs ones. Although these are probably reasonably accurate, they are easy to distort. Please note sometimes these statistics are updated on an irregular basis. This page was generated on ~~~~~ over the time period of " TIMESTART "-" TIMEEND " UTC.\n\n'\'\''For an extended list, see a [http://wikistics.falsikon.de/latest-daily/wikinews/en/ daily] or [http://wikistics.falsikon.de/latest/wikinews/en/ monthly] summary.'\'\''</noinclude>"}'
}
#at pagecounts/pagecounts-20080701-000000 |grep '^en\.n'|awk '-F ' '{print $3, $2}'|sort -g
if check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' now
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' now
elif check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' now
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' now
elif check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' '1 hour ago'
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0000.gz' '1 hour ago'
elif check_if_there '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' '1 hour ago'
then
    get_and_make '+http://dammit.lt/wikistats/pagecounts-%Y%m%d-%H0001.gz' '1 hour ago'
else
    exit 2
fi
- wn_stats/make_proj_stats.sh
#!/bin/bash --
#helper script. not called directly
#downloads statistics
#figures out what is relevant to wikinews
#output number of total hits to wikinews
check_if_there () {
    #takes a date format string that equals url of stats. check http status code
    # returns 0 for 200, 1 for 404, and exits shell script for anything else
    isThere=$(HEAD -H 'From: bawolff+wnbots@**SOMEWHERE**' $(date -d "$2" -u "$1") | head -n 1 |cut -d ' ' -f 1)
    #echo $isThere is status
    if [ "$isThere" = 200 ]
    then
        #echo it is there
        return 0
    elif [ "$isThere" = 404 ]
    then
        #echo not there `date -d "$2" -u "$1"`
        return 1
    else
        exit 1
    fi
}
get_and_make() {
    wget `date -d "$2" -u "$1"` --header\='From: bawolff+wnbots@**SOMEWHERE**' -q -O -|grep 'en\.n'|cut -d ' ' -f 3
}
if check_if_there '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0000' now
then
    get_and_make '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0000' now
elif check_if_there '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0001' now
then
    get_and_make '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0001' now
elif check_if_there '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0000' '1 hour ago'
then
    get_and_make '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0000' '1 hour ago'
elif check_if_there '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0001' '1 hour ago'
then
    get_and_make '+http://dammit.lt/wikistats/projectcounts-%Y%m%d-%H0001' '1 hour ago'
else
    exit 2
fi
- pywikipedia/update_newpop.py
# -*- coding: utf-8 -*-
"""
This is a modified version of pywikipedia's sandbot to update based on hardcoded file
I know this is very ugly, but I really don't know python

This bot cleans a sandbox by replacing the current contents with predefined text.

This script understands the following command-line arguments:

    -hours:#       Use this parameter to make the script repeat itself
                   after # hours. Hours can be defined as a decimal. 0.001
                   hours is 3.6 seconds.
"""
#
# (C) Leogregianin, 2006
# (C) Wikipedian, 2006-2007
# (C) Andre Engels, 2007
# (C) Siebrand Mazeland, 2007
#
# Distributed under the terms of the MIT license.
#
__version__ = '$Id: clean_sandbox.py 4402 2007-10-03 14:24:58Z leogregianin $'
#
import wikipedia
import time

# page text produced by make_stats.sh
f = open('../wn_stats/cur_stats.wiki', 'r')
content = {
    'en': unicode(f.read(), 'UTF-8'),
}

msg = {
    'en': u'Robot: Updating Popular article list (over hour)',
}

sandboxTitle = {
    'en': u'template:Popular_articles',
}

class SandboxBot:
    def __init__(self, hours, no_repeat):
        self.hours = hours
        self.no_repeat = no_repeat

    def run(self):
        mySite = wikipedia.getSite()
        while True:
            now = time.strftime("%d %b %Y %H:%M:%S (UTC)", time.gmtime())
            localSandboxTitle = wikipedia.translate(mySite, sandboxTitle)
            sandboxPage = wikipedia.Page(mySite, localSandboxTitle)
            try:
                text = sandboxPage.get()
                translatedContent = wikipedia.translate(mySite, content)
                # only save when the generated text differs from the current page
                if text.strip() == translatedContent.strip():
                    wikipedia.output(u'No change!.')
                else:
                    translatedMsg = wikipedia.translate(mySite, msg)
                    sandboxPage.put(translatedContent, translatedMsg)
            except wikipedia.EditConflict:
                wikipedia.output(u'*** Loading again because of edit conflict.\n')
            if self.no_repeat:
                wikipedia.output(u'\nDone.')
                wikipedia.stopme()
                return
            else:
                wikipedia.output(u'\nSleeping %s hours, now %s' % (self.hours, now))
                time.sleep(self.hours * 60 * 60)

def main():
    hours = 1
    no_repeat = True
    for arg in wikipedia.handleArgs():
        if arg.startswith('-hours:'):
            hours = float(arg[7:])
            no_repeat = False
        else:
            wikipedia.showHelp('clean_sandbox')
            wikipedia.stopme()
            return
    bot = SandboxBot(hours, no_repeat)
    bot.run()

if __name__ == "__main__":
    try:
        main()
    finally:
        wikipedia.stopme()
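make_proj_stats.sh needs only one number from the hourly projectcounts file: the third space-separated field of the English Wikinews line. The Python sketch below assumes lines of the form `project placeholder hits bytes`, which is what `cut -d ' ' -f 3` implies. It uses the hypothetical name `project_hits` and matches the project code exactly rather than grepping for it anywhere in the line.

```python
def project_hits(lines, project="en.n"):
    """Return the hit count for one project from projectcounts lines.

    Each line is assumed to look like "en.n - 12345 67890"
    (project, placeholder, hits, bytes). Returns None if the project
    does not appear.
    """
    for line in lines:
        fields = line.split(" ")
        if len(fields) >= 3 and fields[0] == project:
            return int(fields[2])
    return None


if __name__ == "__main__":
    sample = ["en - 500000 1", "en.n - 1234 99"]
    print(project_hits(sample))
```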
These are set to run at 11 minutes past the hour, every hour my computer is on.
What it used to do
This is outdated, and no longer true.
Recent Popular articles
- Takes a list of the last 42 published articles (~3-5 days) and intersects it with a list of the 200 articles with the most hits between the start of the month and now.
- Admins can change the length of the published-articles list by modifying the count on the DPL at User:Bawolff bot/recentPub.
- If you wish to change the count, feel free, but leave me a note at user talk:bawolff
- If you want to change anything else about the DPL, please consult with me first to hopefully avoid breaking the bot
- If you want to change the length of the top-200 popular article list, contact me. (It can be any number up to 5001.)
- The resulting list is usually ~9 articles, and ordered by most popular. It can be found at {{Popular articles/recent}}
- If you wish to change the formatting of that template, please tell me, as the bot will override your changes.
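The old recent-popular list is essentially an ordered intersection. It keeps the monthly ranking order, but only for titles that also appear among the latest published articles. Below is a minimal Python sketch with a hypothetical `recent_popular` function. It takes plain title lists as inputs, whereas the real bot read them from the DPL and from Wikicharts.

```python
def recent_popular(recent_published, monthly_top):
    """Intersect recently published titles with the month's most-viewed titles.

    recent_published: titles of the last N published articles (42 by default
        on the DPL described above).
    monthly_top: titles ordered most-viewed first (the top 200 this month).
    Returns the overlap, most popular first.
    """
    recent = set(recent_published)
    return [title for title in monthly_top if title in recent]


if __name__ == "__main__":
    print(recent_popular(["A", "B", "C"], ["X", "C", "Y", "A"]))
```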
Popular articles (over month)
- Takes a list of the 15 most popular articles (the articles with the most hits between the beginning of the month and now). Technically it takes 16, including the Main Page, which is discarded.
- The length of this list can be changed to any value up to 5001. Leave me a note
- The list can be found at {{Popular articles}}
- If you wish to change the formatting of that template, please tell me, as the bot will override your changes.
- The list may be misleading, as it counts hits from the beginning of the month until now, and stories are usually popular for periods of less than a month.
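Discarding the Main Page before cutting the list explains why the bot fetched 16 entries to show 15. As a sketch, with a hypothetical `monthly_list` function and assuming titles are spelled in URL form (`Main_Page`):

```python
def monthly_list(ranked_titles, length=15):
    """Drop the Main Page from a most-viewed-first ranking, then keep `length` titles."""
    return [title for title in ranked_titles if title != "Main_Page"][:length]


if __name__ == "__main__":
    ranking = ["Main_Page"] + ["Story_%d" % i for i in range(1, 16)]
    print(len(ranking), len(monthly_list(ranking)))
```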
Misc. info
- Uses Leon's Wikicharts tool. This takes 1 out of every 50 downloads of a Wikinews page and reports it to Wikicharts. That covers any wiki page or special page viewed directly, including image description pages, but not downloads of images, CSS or JavaScript. Wikicharts then compiles a list of the pages with the most hits. (Since most people only care about articles, that is all this bot counts.)
- This bot may well be misleading, especially the recent popular list, because that list is limited to the 42 most recent articles, which are often very low on the popularity list (the #1 entry is often really the 30th most popular). This is because it takes time for an article to accumulate hits. In other words, USE THESE STATS WITH CAUTION.
- In addition, the statistics tool this depends on carries its own big warning label, so use them with double caution.
- Users without javascript are not counted.
- The Wikicharts tool can be viewed on the toolserver
- This bot runs sporadically. When it runs, the popular-articles-over-the-month list is updated every 2 hours on the hour, and the recent-articles list every 2 hours at 10 minutes past the hour. The bot does nothing if there is nothing to do (i.e. the list is the same as when it last checked). Note: it may go for periods without checking when my computer is off.
- This bot is a combination of a shell script and a modified version of the pywikipediabot sandbox cleaner bot.
- Please leave any questions or comments at user talk:Bawolff
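Because Wikicharts sees only 1 in 50 page downloads, the totals are estimates, and low counts are especially noisy. Suppose each download is sampled independently with probability 1/50. That binomial model is an assumption about how the sampling works. Under it, the estimate and its rough standard error can be computed as shown below. `estimate_views` is a hypothetical name.

```python
import math


def estimate_views(sampled_hits, rate=50):
    """Return (estimated total views, approximate standard error).

    Assumes a binomial model: each view is recorded independently with
    probability p = 1/rate. The estimate is sampled_hits / p, and its
    variance is approximated by sampled_hits * (1 - p) / p**2.
    """
    p = 1.0 / rate
    estimate = sampled_hits / p
    stderr = math.sqrt(sampled_hits * (1 - p)) / p
    return estimate, stderr


if __name__ == "__main__":
    print(estimate_views(4))
```

With 4 sampled hits, for example, the estimate is 200 views with a standard error of about 99. That is one concrete reason to treat the lower entries of these lists with caution.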
