User:IlyaHaykinson/On Bots and Checkers
This page includes some of my thoughts on the future of bots and checkers.
In the last few months, User:David Vasquez and I have been collaborating on WeatherChecker. I mostly wrote some of the framework code, and David has really taken charge of making the application do all of the graphics work.
One item that remained in front of us was automating the posting of the WeatherChecker output image and text files to Wikinews. For the longest time, David did this manually.
At some point I developed the initial implementation of Wikinewsbot, a .NET (and Mono) scriptable framework for automatic image upload, page retrieval, modification and posting. David immediately incorporated this framework into the WeatherChecker, so he no longer has to post images and text by hand, allowing more regular updates of the weather.
In order to test the bot framework, I've also developed the Market Data Bot. With some input from David and other users, I've made the bot automatically pull stock market information from Yahoo Finance, chart this information and upload regular updates to a test wiki on my server.
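The Market Data Bot's actual .NET code isn't shown here, but the quote-fetching step can be sketched as follows. This is a hypothetical illustration, assuming the old Yahoo Finance CSV quote interface (the `quotes.csv` URL and `f=` format string are assumptions, not a documented contract, and the symbols and prices are made up):

```python
import csv
import io

# Assumed shape of the Yahoo Finance CSV quote service the bot polls;
# s= takes +-joined symbols, f=sl1d1 requests symbol, last trade, date.
QUOTE_URL = "http://finance.yahoo.com/d/quotes.csv?s={symbols}&f=sl1d1"

def build_quote_url(symbols):
    """Build one request URL covering a list of index symbols."""
    return QUOTE_URL.format(symbols="+".join(symbols))

def parse_quote_csv(text):
    """Parse symbol,last-trade,date rows into a dict keyed by symbol."""
    quotes = {}
    for symbol, last, date in csv.reader(io.StringIO(text)):
        quotes[symbol] = {"last": float(last), "date": date}
    return quotes
```

The parsed dict is what the charting and upload steps would consume downstream.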
Issues at hand
David brought up a number of good points regarding the use of bots versus server-side controls or "special pages", the naming of applications (__Checker vs ___Bot), proposals for new bots, and the specific functionality of bots. Here are my views on how things could be structured.
We are not a big enough project for the MediaWiki developers to incorporate our code easily. Until we are big, or one of us becomes a core MediaWiki developer, we will not get functionality built into our MediaWiki software that lets us do what we want to do. The best we can hope for is that some processes will run on the Linux servers that host Wikinews.
Therefore, to a large degree we must depend on bots. A side-effect is that the information provided will not be interactive and will always be more out of date than it would be with server-side code. Another side-effect is that we run the risk of filling Wikimedia Commons with too many images, and of cluttering up Recent Changes with bot edits.
This leads us to the Principle of Limited Information: the data we provide via bot solutions should be minimal and of the greatest appeal.
Applications and naming
In the bot world, at the end of the day information must be uploaded on a regular basis to be useful. At the same time, the process is driven by humans, and as such should be somewhat user-friendly.
I propose that we have two levels of applications.
- configurators, which let humans define the content to be retrieved and configure where that content should be uploaded, and
- bots, which take the config files produced by the configurators and act on them, constantly retrieving data and uploading it to Wikinews.
Perhaps the configurators can be "Checkers" by name.
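The hand-off between the two levels might look like this. A minimal sketch, assuming an INI-style config file; the actual format and field names the configurators would emit are not specified anywhere, so everything here is hypothetical:

```python
import configparser

# Hypothetical config a configurator might write out for a bot to consume.
# Section and key names are assumptions for illustration only.
SAMPLE_CONFIG = """
[source]
url = http://weather.example.org/data
interval_minutes = 60

[target]
wiki = http://en.wikinews.org/
page = Template:Weather
"""

def load_bot_config(text):
    """Parse a configurator-produced config into the values a bot needs
    to run unattended: what to fetch, how often, and where to post it."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "source_url": parser["source"]["url"],
        "interval": parser["source"].getint("interval_minutes"),
        "wiki": parser["target"]["wiki"],
        "page": parser["target"]["page"],
    }
```

The point of the split is that everything the bot needs is in the file, so the bot side can stay UI-less.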
The bots should run as unattended Windows services. Indeed, unless there's reason to do otherwise, they should be runnable in some form on Linux using the Mono framework, in order to let Windows-less people run the bots. This is a goal, not a requirement. However, this means that the architecture for the bots should allow completely unattended, UI-less operation.
The configurators should, for now, be Windows applications. Their output should be enough information for the bots to be able to run unattended.
The bots should, for now, have custom code that reads in configuration files and acts on them. In the future we might develop a generic bot engine that can run bots that are wholly based in configuration files (some portions of which would be generated by the configurators).
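The generic-engine idea can be sketched as an engine that knows nothing about weather or market data and just runs a fetch/upload cycle described by a config entry, with the task-specific parts plugged in as handlers. Class and handler names here are assumptions, not anything that exists yet:

```python
# Sketch of a "generic bot engine": all task-specific behavior lives in
# registered handler callables, selected by name from the config entry.
class BotEngine:
    def __init__(self, fetchers, uploaders):
        self.fetchers = fetchers      # name -> callable(config) -> data
        self.uploaders = uploaders    # name -> callable(config, data)

    def run_once(self, config):
        """Execute one fetch-then-upload cycle for a single config entry."""
        data = self.fetchers[config["fetcher"]](config)
        self.uploaders[config["uploader"]](config, data)
        return data
```

A weather bot and a market bot would then differ only in which handlers they register and which config files they are fed.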
Specific bot: WeatherChecker
For now the WeatherChecker should remain as-is. It's a great application. Eventually, it would be nice to split up the automatic weather checking and updating Wikinews from the process of creating regions and selecting stations.
Specific bot: MarketDataBot
Currently, the market bot downloads the latest and historical information about the top world indexes, plots the daily history on charts, posts those charts to the Commons every 6 hours, and then updates Wikinews with the most recent current-value data (note: this process is in testing on my local wiki).
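The bot's actual scheduling code isn't shown here; a minimal sketch of the 6-hourly check, under the assumption that the bot tracks its last successful Commons upload, might be:

```python
from datetime import datetime, timedelta

# Interval between chart uploads to the Commons, per the description above.
UPDATE_INTERVAL = timedelta(hours=6)

def chart_update_due(last_upload, now):
    """True when the next 6-hourly chart upload is due.
    last_upload is None on the bot's first run."""
    return last_upload is None or now - last_upload >= UPDATE_INTERVAL
```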
David expressed an interest in having the market data collection be a lot more extensive. He suggested having the bot update the homepage and/or sub-pages with status on a few indexes (Dow Jones, NASDAQ, etc) and a few specific stocks that are index components.
Instead, I suggest continuing with what the bot currently does. Selecting just the US indexes is too US-centric. Showing component securities is a bit redundant (given that the index value reflects the aggregate movement of the components). Indeed, showing individual stocks means picking some set over others.
I thus mainly object to David's suggestions out of adherence to the Principle of Limited Information: given the technical limitation of running as a bot, we don't have room to post every stock. I would love to have all the data we can possibly get, but if we must pick a reasonable subset, I'd rather post the major international indexes and leave the rest to some commercial site for now. When we get our server-side controls, we could do more.
David suggests a few other ideas: sports, entertainment. I think these are great ideas — and ones that could perhaps be developed with the dual configurator/bot framework. I would imagine the first thing to do there would be to get a reliable source of data identified.
I am ready to unleash the current version of MarketDataBot on Wikinews. This bot makes lots of edits, though, so I would like it to be marked as a bot so as not to clutter Recent Changes.
This is written here in User space to reflect personal views on the matter, but I would like to encourage others (David, especially) to chime in on the talk page as a step to forming consensus and a Wikinews-wide bot policy that we can propose.
(originally written: 09:43, 14 Mar 2005 (UTC))