User:Amgine/Google News Sitemap
Google News Sitemap is an extension that adds a special page providing an XML feed from a MediaWiki website, using categories, notcategories, and namespace as the primary selection criteria. It was originally developed to give en.Wikinews a sitemap feed for Google News, so that its content could be distributed through Google News.
XML feed formats
Sitemap
- The Sitemap schema is a very basic listing of URLs, with optional last modification and priority elements. The Google News extension to the schema adds a publication date element and an optional keywords element. Currently all required elements, last modification, and keywords are supported; ideas for priority criteria are actively being solicited. See Planned improvements.
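A minimal Python sketch of the entry shape described above, using only the standard library. It is not the extension's own code: the article dictionary keys are hypothetical, and it emits only the elements this page mentions (loc, lastmod, publication date, keywords). The current Google News sitemap documentation may require further elements, so treat this as illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles):
    """Return a <urlset> string with one <url> entry per article.

    Each article is a dict with the hypothetical keys 'loc', 'lastmod',
    'publication_date' and an optional 'keywords' list.
    """
    # Serialize the sitemap namespace as the default and the news one as "news:"
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for art in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = art["loc"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = art["lastmod"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = art["publication_date"]
        # Keywords are optional, so the element is only written when present
        if art.get("keywords"):
            ET.SubElement(news, f"{{{NEWS_NS}}}keywords").text = ", ".join(art["keywords"])
    return ET.tostring(urlset, encoding="unicode")
```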
RSS
- An RSS 2.0-compliant feed, generated with MediaWiki's own feed classes, which are very robust.
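For comparison, here is a hand-rolled Python sketch of a minimal RSS 2.0 document, assuming a hypothetical (title, link, published) tuple per article. The extension delegates this work to MediaWiki's feed classes instead, so this only illustrates the output format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Return a minimal RSS 2.0 document as a string.

    items is an iterable of hypothetical (title, link, published) tuples,
    where published is a timezone-aware UTC datetime.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        # RSS 2.0 uses RFC 822 dates, e.g. "Mon, 02 Nov 2009 12:00:00 GMT"
        ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    return ET.tostring(rss, encoding="unicode")


print(build_rss(
    "Example Wikinews", "http://domain/wiki/", "Recently published articles",
    [("Example article", "http://domain/wiki/Example_article",
      datetime(2009, 11, 2, 12, 0, tzinfo=timezone.utc))],
))
```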
Atom
- An Atom feed compliant with RFC 4287 (December 2005), generated with MediaWiki's own feed classes, which are very robust.
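A similar Python sketch for Atom, showing the elements RFC 4287 requires of a feed and its entries. The function and dictionary keys are hypothetical; using the site name as the feed author follows the "Default author = $wgSitename" item in the to-do list, and reusing the article URL as the entry id is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _add(parent, tag, text=None, **attrs):
    """Append an Atom-namespaced child element and return it."""
    child = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        child.text = text
    return child


def build_atom(feed_id, title, updated, site_name, entries):
    """Return a minimal RFC 4287 feed; timestamps are RFC 3339 strings.

    entries is a list of hypothetical dicts with 'title', 'link' and 'updated'.
    """
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    # A feed needs id, title and updated, plus an author unless every entry has one
    _add(feed, "id", feed_id)
    _add(feed, "title", title)
    _add(feed, "updated", updated)
    _add(_add(feed, "author"), "name", site_name)  # site name as the default author
    for e in entries:
        entry = _add(feed, "entry")
        _add(entry, "id", e["link"])  # reusing the article URL as its IRI id
        _add(entry, "title", e["title"])
        _add(entry, "updated", e["updated"])
        # An alternate link is required when an entry has no content element
        _add(entry, "link", href=e["link"], rel="alternate")
    return ET.tostring(feed, encoding="unicode")
```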
URL parameters
Parameters are provided in the URL:
http://domain/wiki/Special:SpecialGNSM/[parameter=value][&parameter=value][...]
Parameters determine which articles are returned, the order in which they are sorted, and which feed format is used.
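The parameter string can be split as in this Python sketch. The function name is hypothetical, and treating category and notcategory as repeatable while letting every other parameter keep its last value (as documented for namespace) is an assumption about the extension's behavior.

```python
from urllib.parse import unquote

# Assumption: only these parameters are collected as lists
MULTI_VALUED = {"category", "notcategory"}


def parse_gnsm_params(url):
    """Split the text after 'Special:SpecialGNSM/' into parameter values."""
    _, _, tail = url.partition("Special:SpecialGNSM/")
    params = {}
    for pair in tail.split("&"):
        if not pair:
            continue  # tolerate stray or trailing '&'
        key, _, value = pair.partition("=")
        value = unquote(value)
        if key in MULTI_VALUED:
            params.setdefault(key, []).append(value)
        else:
            params[key] = value  # a later occurrence overwrites an earlier one
    return params
```

For example, a URL ending in category=Published&category=Science_and_Technology&namespace=Main&namespace=3 yields both categories and namespace "3".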
category
http://domain/wiki/Special:SpecialGNSM/category=Published
Selects only articles that are members of the given category. Up to six categories and notcategories may be provided (the limit is configurable); any beyond six are currently ignored. For multi-word categories, replace spaces with underscores, e.g. category=Science_and_Technology.
Options: string value
Default value = Published
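A sketch of the category normalization described above, assuming the limit applies to each list separately (the text does not say whether categories and notcategories share one limit). Empty values are dropped here, in line with the to-do item about failing gracefully on an empty category parameter; the helper name is hypothetical.

```python
MAX_CATEGORIES = 6  # documented default; configurable in the extension


def normalize_categories(values, limit=MAX_CATEGORIES):
    """Replace spaces with underscores, skip empty values, keep the first `limit`."""
    cleaned = [v.strip().replace(" ", "_") for v in values if v.strip()]
    return cleaned[:limit]  # anything past the limit is ignored
```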
count
http://domain/wiki/Special:SpecialGNSM/count=10
Returns at most count articles. The value cannot exceed the configurable maximum or fall below the configurable minimum.
Options: integer value
Default value = 50 (maximum)
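Clamping the requested count could look like this. The maximum of 50 matches the documented default; the minimum of 1 and the fallback for missing or non-numeric input are assumptions, since the page gives no configured minimum.

```python
def clamp_count(raw, minimum=1, maximum=50, default=50):
    """Parse the count parameter and keep it within [minimum, maximum]."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default  # missing or non-numeric: fall back to the default
    return max(minimum, min(value, maximum))
```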
days
http://domain/wiki/Special:SpecialGNSM/days=7[...]
Limits the feed to articles added to the category within the past X days. Currently available only for Sitemap feeds.
Options: integer value
Default value = 3
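The cutoff mirrors the to-do item ts > ( ts_now - 3 days ), with timestamps in Unix seconds. This Python sketch filters a hypothetical in-memory list of (title, added_timestamp) pairs rather than querying the database.

```python
import time

SECONDS_PER_DAY = 86_400


def recent_only(articles, days=3, now=None):
    """Keep (title, added_ts) pairs added within the last `days` days."""
    if now is None:
        now = time.time()
    cutoff = now - days * SECONDS_PER_DAY  # ts > ( ts_now - days )
    return [a for a in articles if a[1] > cutoff]
```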
feed
http://domain/wiki/Special:SpecialGNSM/feed=[rss/atom/sitemap][...]
Selects which standard feed format is produced.
Options: sitemap || atom || rss
Default value = atom
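A small Python sketch of validating the feed parameter. Case-insensitive matching and falling back to the default for unrecognized values are assumptions, not documented behavior.

```python
FEED_FORMATS = ("sitemap", "atom", "rss")


def choose_feed(value, default="atom"):
    """Return a supported feed format, falling back to the default."""
    value = (value or "").strip().lower()
    return value if value in FEED_FORMATS else default
```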
namespace
http://domain/wiki/Special:SpecialGNSM/namespace=String_value
http://domain/wiki/Special:SpecialGNSM/namespace=3
Selects only articles in the namespace given by name or number. If this parameter appears more than once, only the last value is used.
Options: integer value || string value
Default value = null
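Resolving the namespace parameter might look like the following, where only the last occurrence counts. The name table is a hypothetical subset of MediaWiki's default namespaces (a real wiki defines more and may localize names), and returning None for an unknown name is an assumption.

```python
# Hypothetical subset of MediaWiki's default namespace numbers.
# "Main" is a conventional label: namespace 0 has an empty canonical name.
NAMESPACES = {"Main": 0, "Talk": 1, "User": 2, "User_talk": 3,
              "Project": 4, "Category": 14}


def resolve_namespace(values):
    """Return the namespace number for the last value given, or None."""
    if not values:
        return None  # default: no namespace restriction
    last = values[-1].strip()  # only the last occurrence is used
    try:
        return int(last)
    except ValueError:
        return NAMESPACES.get(last.replace(" ", "_"))  # None if unknown
```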
notcategory
http://domain/wiki/Special:SpecialGNSM/notcategory=Unpublished
Selects only articles that are not members of the given notcategory. Up to six notcategories and categories may be provided (the limit is configurable); any beyond six are currently ignored. For multi-word notcategories, replace spaces with underscores, e.g. notcategory=Science_and_Technology.
Options: string value
Default value = null
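Combining category and notcategory into one filter, as a Python sketch over a hypothetical in-memory map from article title to its categories. It assumes an article must belong to every listed category (an intersection), which the page implies but does not state.

```python
def select_articles(memberships, categories, notcategories):
    """Return sorted titles in every category and in none of the notcategories.

    memberships maps an article title to the set of categories it belongs to.
    """
    required = set(categories)
    forbidden = set(notcategories)
    return sorted(
        title for title, cats in memberships.items()
        if required <= cats and not (forbidden & cats)
    )
```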
order
http://domain/wiki/Special:SpecialGNSM/order=[descending/ascending]
Sorts results in ascending or descending order of the key selected by ordermethod.
Options: ascending || descending
Default value = descending
ordermethod
http://domain/wiki/Special:SpecialGNSM/ordermethod=[lastedit/qualitypages/categoryadd]
Sorts the matching articles by the time of their last edit, by quality rating (using Flagged Revisions), or by the time they were first added to the first (or default) category.
Options: lastedit || qualitypages || categoryadd
Default value = categoryadd
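The order and ordermethod parameters together amount to choosing a sort key and a direction. This Python sketch sorts a hypothetical in-memory Article record purely for illustration; the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    last_edit: int       # Unix timestamp of the latest edit
    category_added: int  # Unix timestamp when added to the first category
    quality: int = 0     # Flagged Revisions quality level, if available


# One sort key per documented ordermethod value
SORT_KEYS = {
    "lastedit": lambda a: a.last_edit,
    "categoryadd": lambda a: a.category_added,
    "qualitypages": lambda a: a.quality,
}


def sort_articles(articles, ordermethod="categoryadd", order="descending"):
    """Sort by the chosen key; descending is the documented default."""
    key = SORT_KEYS[ordermethod]
    return sorted(articles, key=key, reverse=(order == "descending"))
```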
qualitypages
http://domain/wiki/Special:SpecialGNSM/qualitypages=[only/include/exclude]
If the Flagged Revisions extension is installed, excludes articles whose quality rating is greater than 1, returns only such articles, or ignores the rating (exclude, only, and include respectively).
Options: include || only || exclude
Default value = null
redirects
http://domain/wiki/Special:SpecialGNSM/redirects=[exclude/include/only]
Excludes redirects, returns only redirects, or ignores whether an article is a redirect (exclude, only, and include respectively).
Options: include || only || exclude
Default value = exclude
stablepages
http://domain/wiki/Special:SpecialGNSM/stablepages=[only/exclude/include]
If the Flagged Revisions extension is installed, excludes articles that have a stable revision, returns only such articles, or ignores stable status (exclude, only, and include respectively).
Options: include || only || exclude
Default value = only
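The qualitypages, redirects and stablepages parameters share the same include / only / exclude switch, sketched generically in Python below. Treating a null value as include (no filtering) is an assumption based on qualitypages having no default; the page dictionaries in the usage note are hypothetical.

```python
def tri_state(items, predicate, mode):
    """Apply an include / only / exclude switch to a boolean property.

    include (or None): ignore the property; only: keep items where it is
    true; exclude: keep items where it is false.
    """
    if mode in (None, "include"):
        return list(items)
    if mode == "only":
        return [i for i in items if predicate(i)]
    if mode == "exclude":
        return [i for i in items if not predicate(i)]
    raise ValueError(f"unknown mode: {mode!r}")
```

With the documented defaults, a caller would apply tri_state(pages, is_redirect, "exclude") and then tri_state(pages, has_stable_revision, "only"), where both predicates are hypothetical.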
Planned improvements
AKA ToDo list.
- Determine last modification date (Sitemap feed) (31 Oct 2009)
- Determine keywords from all category memberships (Sitemap feed) (31 Oct 2009)
- Filter category members
- - dates
- - Published
- Develop priority criteria and implement (Sitemap feed)
- - Age of article?
- - Additional ordermethod?
- Develop qualitypages as an ordermethod (02 Nov 2009)
- Develop curid URLs
- Remove useNamespace DPL cruft
- Graceful failure on errors
- - When category is empty, close the root XML element. (Reported 2 Nov 2009 by Bawolff; tentatively marked fixed 2 Nov 2009)
- - Fail gracefully when the category parameter is present but empty (clean up debug code). (Reported 2 Nov 2009 by Bawolff; tentatively marked fixed 2 Nov 2009)
- - Both of the above errors probably also apply to notcategory. (Tentatively marked fixed 2 Nov 2009)
- Add GN bool param to limit ts > ( ts_now - 3 days ) [high priority] (01 Nov 2009)
- - Make it configurable. (01 Nov 2009)
- - Make it actually work. (02 Nov 2009)
- Default author = $wgSitename (02 Nov 2009)
- Normalize feedSMItem to feedItem parameters, so $wgFeedClasses can be implemented. (02 Nov 2009)
- - Not sure this is worth the effort, as the feeds are quite different.