User:Amgine/Google News Sitemap

Google News Sitemap is an extension special page that provides an XML feed from a MediaWiki website, using categories, notcategories, and namespace as the primary selection criteria. It was originally developed to give en.Wikinews a sitemap feed for Google News, so that its content may be distributed.

XML feed formats

Sitemap

The Sitemap schema is a very basic listing of URLs, with optional last modification and priority elements. The Google News extension includes the publication date and optional keywords elements. Currently all required elements, last modification, and keywords are supported; ideas for a priority scheme are being actively solicited. See planned improvements.
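
To make the element list above concrete, here is a minimal Python sketch that builds one Sitemap entry with the Google News publication date and keywords elements. It is not the extension's own code. The function name sitemap_entry, the example URL, and the sample values are assumptions for illustration; the two namespace URIs are those published for the Sitemap protocol and the Google News sitemap schema. The Google News schema may require further elements that are not shown here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

# Serialize the Sitemap namespace as the default and the news one as "news:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)


def sitemap_entry(loc, publication_date, lastmod=None, keywords=()):
    """Build one <url> element (hypothetical helper, not part of the extension)."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    if lastmod:
        # Last modification is optional in the Sitemap schema.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
    ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = publication_date
    if keywords:
        # Keywords are optional; here they are joined into one comma-separated string.
        ET.SubElement(news, f"{{{NEWS_NS}}}keywords").text = ", ".join(keywords)
    return url


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
urlset.append(
    sitemap_entry(
        "http://domain/wiki/Example_article",  # placeholder URL
        publication_date="2009-11-02",
        lastmod="2009-11-02",
        keywords=["Science", "Technology"],
    )
)
print(ET.tostring(urlset, encoding="unicode"))
```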

RSS

An RSS 2.0 compliant feed. The page generates the feed with the MediaWiki feed classes, which are very robust.

Atom

An RFC 4287 (December 2005) compliant Atom feed. The page generates the feed with the MediaWiki feed classes, which are very robust.

URL parameters

Parameters are provided in the URL:

http://domain/wiki/Special:SpecialGNSM/[parameter=value][&parameter=value][...]

Parameters determine which articles will be found/returned, the order in which they are sorted, and which feed format will be used.
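
As a rough illustration of the URL pattern above, the following Python sketch joins parameter pairs into the part that follows Special:SpecialGNSM/. The helper name build_gnsm_url and the percent-encoding of values are assumptions, not behavior documented for the extension.

```python
from urllib.parse import quote


def build_gnsm_url(base, params):
    """Join (name, value) pairs into a parameter string after the special page name.

    `base` is assumed to end with 'Special:SpecialGNSM'. `params` is a list of
    pairs, so a name such as 'category' can appear more than once.
    """
    parts = []
    for name, value in params:
        # Spaces become underscores, following the multi-word category convention.
        text = str(value).replace(" ", "_")
        parts.append(f"{quote(name, safe='')}={quote(text, safe='')}")
    return base.rstrip("/") + "/" + "&".join(parts)


url = build_gnsm_url(
    "http://domain/wiki/Special:SpecialGNSM",
    [("category", "Published"), ("feed", "sitemap"), ("count", 10)],
)
print(url)
```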


category

http://domain/wiki/Special:SpecialGNSM/category=Published

Selects only articles that are members of the named category. Up to six (configurable) categories and notcategories may be provided; any beyond the sixth are currently ignored. For multi-word categories, replace spaces with _, e.g. category=Science_and_Technology.

Options: string value

Default value = Published
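
The rules above (underscores for spaces, a cap of six, and Published as the default) can be sketched in Python as follows. The names MAX_CATEGORIES and normalize_categories are hypothetical, and the sketch applies the cap to a single list, which is only one reading of how categories and notcategories share the limit.

```python
MAX_CATEGORIES = 6  # assumed stand-in for the configurable limit
DEFAULT_CATEGORY = "Published"


def normalize_categories(names, limit=MAX_CATEGORIES):
    """Replace spaces with underscores, drop empty names, and ignore extras."""
    cleaned = [name.strip().replace(" ", "_") for name in names if name.strip()]
    # Entries past the limit are silently ignored rather than treated as an error.
    kept = cleaned[:limit]
    # Fall back to the documented default when no category was supplied.
    return kept or [DEFAULT_CATEGORY]


print(normalize_categories(["Science and Technology", "Published"]))
```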

count

http://domain/wiki/Special:SpecialGNSM/count=10

Returns no more than count articles. The value may not exceed the configurable maximum or fall below the configurable minimum.

Options: integer value

Default value = 50 (maximum)
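
One way to picture the bounds on count is a simple clamp, sketched below in Python. The text does not say whether an out-of-range value is clamped or rejected, so clamping is an assumption, as are the function name clamp_count and the minimum of 1; only the maximum of 50 comes from the default shown above.

```python
def clamp_count(requested=None, minimum=1, maximum=50):
    """Keep the requested count inside [minimum, maximum] (assumed behavior)."""
    if requested is None:
        # No count given: use the documented default, which equals the maximum.
        return maximum
    return max(minimum, min(maximum, int(requested)))


print(clamp_count(10), clamp_count(500), clamp_count(0))
```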

days

http://domain/wiki/Special:SpecialGNSM/days=7[...]

Limits the feed to articles added to the category in the past X days (in seconds). Currently available only for Sitemap feeds.

Options: integer

Default value = 3
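
The cutoff implied by days can be sketched in Python as below. Converting the day count to seconds before subtracting is one possible reading of the "in seconds" note, not confirmed behavior; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds(days):
    """Convert a whole number of days to seconds."""
    return days * SECONDS_PER_DAY


def category_add_cutoff(days=3, now=None):
    """Return the earliest category-add time that still qualifies."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(seconds=days_to_seconds(days))


def added_recently(added_at, days=3, now=None):
    """True if the article joined the category within the past `days` days."""
    return added_at >= category_add_cutoff(days, now)


now = datetime(2009, 11, 2, 12, 0, tzinfo=timezone.utc)
print(category_add_cutoff(3, now).isoformat())
```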

feed

http://domain/wiki/Special:SpecialGNSM/feed=[rss/atom/sitemap][...]

Selects which standard feed format is produced.

Options: sitemap || atom || rss

Default value = atom

namespace

http://domain/wiki/Special:SpecialGNSM/namespace=String_value
http://domain/wiki/Special:SpecialGNSM/namespace=3

Selects only articles that are in the named or numbered namespace. If this parameter is present more than once, only the last occurrence is used.

Options: integer value || string value

Default value = null
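
The "last one wins" rule for namespace, next to the repeatable category and notcategory parameters, can be sketched as a small Python parser. The function name parse_gnsm_params is hypothetical, and applying last-one-wins to every other single-valued parameter is an assumption; the text states it only for namespace.

```python
def parse_gnsm_params(subpage):
    """Split 'name=value&name=value' into repeatable and single-valued parameters."""
    categories, notcategories, single = [], [], {}
    for pair in subpage.split("&"):
        name, sep, value = pair.partition("=")
        if not sep or not name:
            continue  # skip fragments that are not name=value
        if name == "category":
            categories.append(value)
        elif name == "notcategory":
            notcategories.append(value)
        else:
            # A later occurrence overwrites an earlier one, e.g. for namespace.
            single[name] = value
    return {"category": categories, "notcategory": notcategories, **single}


print(parse_gnsm_params("category=Published&namespace=3&namespace=Talk"))
```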

notcategory

http://domain/wiki/Special:SpecialGNSM/notcategory=Unpublished

Selects only articles that are not members of the named category. Up to six (configurable) notcategories and categories may be provided; any beyond the sixth are currently ignored. For multi-word notcategories, replace spaces with _, e.g. notcategory=Science_and_Technology.

Options: string value

Default value = null

order

http://domain/wiki/Special:SpecialGNSM/order=[descending/ascending]

Sorts the returned articles in either ascending or descending order, based on the ordermethod.

Options: ascending || descending

Default value = descending

ordermethod

http://domain/wiki/Special:SpecialGNSM/ordermethod=[lastedit/qualitypages/categoryadd]

Returns the found articles sorted by when they were last edited, by their qualitypages rating (using Flagged Revisions), or by the timestamp when they were first added to the first (or default) category.

Options: lastedit || qualitypages || categoryadd

Default value = categoryadd
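
How order and ordermethod combine can be sketched in Python as below. The article dictionaries and their field names (last_edit, quality, category_added) are invented for illustration, and the extension presumably sorts in its database query rather than in memory.

```python
# Hypothetical mapping from ordermethod values to fields of an article record.
SORT_FIELDS = {
    "lastedit": "last_edit",
    "qualitypages": "quality",
    "categoryadd": "category_added",
}


def sort_articles(articles, ordermethod="categoryadd", order="descending"):
    """Sort article records by the chosen method and direction."""
    if ordermethod not in SORT_FIELDS:
        raise ValueError(f"unknown ordermethod: {ordermethod}")
    if order not in ("ascending", "descending"):
        raise ValueError(f"unknown order: {order}")
    field = SORT_FIELDS[ordermethod]
    return sorted(articles, key=lambda a: a[field], reverse=(order == "descending"))


articles = [
    {"title": "A", "last_edit": 2, "quality": 0, "category_added": 1},
    {"title": "B", "last_edit": 1, "quality": 1, "category_added": 2},
]
print([a["title"] for a in sort_articles(articles)])
```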

qualitypages

http://domain/wiki/Special:SpecialGNSM/qualitypages=[only/include/exclude]

If the Flagged Revisions extension is installed, excludes articles whose quality rating is >1, returns only those articles, or ignores the quality rating.

Options: include || only || exclude

Default value = null

redirects

http://domain/wiki/Special:SpecialGNSM/redirects=[exclude/include/only]

Excludes redirects, returns only redirects, or ignores whether an article is a redirect.

Options: include || only || exclude

Default value = exclude
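
The include / only / exclude options used by redirects (and by qualitypages and stablepages) behave like a three-way filter. A minimal Python sketch, assuming in-memory article records with an invented is_redirect field and a hypothetical helper name apply_tristate:

```python
def apply_tristate(articles, mode, predicate):
    """Filter articles by a yes/no property according to include/only/exclude."""
    if mode == "include":
        return list(articles)  # the property is ignored
    if mode == "only":
        return [a for a in articles if predicate(a)]
    if mode == "exclude":
        return [a for a in articles if not predicate(a)]
    raise ValueError(f"unknown option: {mode}")


articles = [
    {"title": "Story", "is_redirect": False},
    {"title": "Old title", "is_redirect": True},
]
kept = apply_tristate(articles, "exclude", lambda a: a["is_redirect"])
print([a["title"] for a in kept])
```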

stablepages

http://domain/wiki/Special:SpecialGNSM/stablepages=[only/exclude/include]

If the Flagged Revisions extension is installed, excludes articles that have a stable revision, returns only those articles, or ignores whether an article has a stable revision.

Options: include || only || exclude

Default value = only

Planned improvements

AKA ToDo List.

  • Determine last modification date (Sitemap feed) 31 Oct 2009
  • Determine keywords from all category memberships (Sitemap feed) 31 Oct 2009
    • Filter category members
      • - dates
      • - Published
  • Develop priority criteria and implement (Sitemap feed)
    • Age of article?
    • Additional ordermethod?
  • Develop qualitypages as an ordermethod 02 Nov 2009
  • Develop curid urls
  • Remove useNamespace DPL cruft
  • Graceful error fails
    • When category is empty, close root XML element. (reported 2 Nov 2009 by Bawolff) tentatively marked fixed 2 Nov 2009
    • Graceful fail when category param is present but empty (clean up debug code) (reported 2 Nov 2009 by Bawolff) tentatively marked fixed 2 Nov 2009
    • Both of the above errors probably also apply to notcategory; tentatively marked fixed 2 Nov 2009
  • Add GN bool param to limit ts > ( ts_now - 3 days ) [high priority] 01 Nov 2009
    • Make it configurable. 01 Nov 2009
    • Make it actually work. 02 Nov 2009
  • Default author = $wgSitename 02 Nov 2009
  • Normalize feedSMItem to feedItem parameters, so $wgFeedClasses can be implemented. 02 Nov 2009
    • Not sure this is worth the effort as the feeds are quite different.