Wikinews:Categories

Wikinews categories help users find articles, and they help coordinate project infrastructure, especially article production.

Using categories

Much of this essay discusses different kinds of categories, how to set them up and when to put articles in them. To understand those things, though, one should first understand how categories are used.

DPLs

The primary tool we use for viewing selected sets of articles is the dynamic page list (DPL). All of our lists of recent articles are DPLs, such as those in infoboxes, on our Main Page and Newsroom, and on most of our categories. The dynamic page list extension to the MediaWiki software platform was designed for Wikinews and has been widely used throughout the Wikimedia sisterhood, though it is still used especially intensively here.

A DPL lists pages in an intersection of categories and/or inverted categories; that is, pages that belong to all of certain specified categories, and to none of certain other specified categories. There are two main technical limitations.

  • Only an intersection of categories (and/or inverted categories) can be listed. There is no way to list a union of categories.
  • Only a few categories and inverted categories can be specified for a given list. At this writing, the effective limit is three categories/inverted-categories; internally the DPL extension allows six, but most DPLs here only show published articles, and that in itself uses up three of the six (more on that below).
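The selection rule itself can be modelled in a few lines. This is a minimal sketch, not the extension's actual code: pages are represented as a hypothetical dictionary from page name to category set, the function name `dpl` is illustrative (its `category`/`notcategory` parameters are named after the corresponding DPL options), and results come back in dictionary order rather than by date.

```python
from typing import Dict, Iterable, List, Set

# Mirrors the extension-wide cap of six categories plus inverted categories.
MAX_CRITERIA = 6


def dpl(pages: Dict[str, Set[str]],
        category: Iterable[str],
        notcategory: Iterable[str] = (),
        count: int = 10) -> List[str]:
    """Return up to `count` pages that are in every category listed in
    `category` and in none of the categories listed in `notcategory`."""
    include = set(category)
    exclude = set(notcategory)
    if len(include) + len(exclude) > MAX_CRITERIA:
        raise ValueError("too many categories/inverted categories for one list")
    # Intersection: every required category present, no excluded one present.
    hits = [name for name, cats in pages.items()
            if include <= cats and not (exclude & cats)]
    return hits[:count]
```

Because the function accepts only one set of required categories, a union such as "Queensland or New South Wales" cannot be expressed in a single call, which is the same limitation described above.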

There is also a technical limit on the list length, but it doesn't usually matter because typically no more than ten pages at a time are listed, and the length limit is much larger (at this writing, 200). A single-category DPL can also be scrolled through ten-at-a-time using a dialog-based device; see {{topic cat/latest}}.

A more obscure technical limitation of DPLs concerns the efficiency, or lack thereof, of the DPL extension's implementation; this precipitated a major practical crisis for Russian Wikinews in September 2020. The intersection operation performed by a DPL, as originally implemented and still in place at this writing, is least efficient when all of the intersected categories are large. The server caches wiki pages, so it doesn't have to recompute the intersection each time a page is viewed; even so, creating a very large number of pages with expensive DPLs in a short time can impose a tremendous server load. This combination of circumstances arose on Russian Wikinews, and the developers eliminated the immediate problem by shutting off the DPL extension for Russian Wikinews. (See phab:T262391.)

String searches

We use string searches in our archives mainly when first hunting for candidate articles to populate a new category. Compared to DPLs, string searches have some significant disadvantages.

  • The results aren't ordered by date of publication.
  • Each search match comes with a bulky phrase context, in contrast to DPL entries which are typically single-line headlines (with or without a date).
  • The results are based on detecting words in articles rather than on the meaning of the content, so the search is less focused than that of a DPL. This is why each search match comes with phrase context: DPLs already have contextual understanding built into them, through the decisions that populated the categories.

An additional drawback of ordinary string searches is that, unless explicitly overridden, they search the final rendered appearance of the article pages, so anything that appears in a DPL on a page can match the search. For example, if the searched keyword occurs in a recently published article that appears in the infoboxes of many articles, the search will match all of those articles. At this writing, this behavior can be overridden by prefixing the keyword with insource:, which searches the wiki markup rather than the displayed page.

It is also possible (at this writing) to qualify a search with membership in a category, by prefixing the category name with incategory:.
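The two prefixes can be combined in one query. A minimal sketch, assuming the standard MediaWiki `Special:Search` URL form on en.wikinews.org; the helper names `build_query` and `search_url` are hypothetical, and the category name is quoted so that names containing spaces stay together.

```python
from urllib.parse import urlencode


def build_query(keyword: str, category: str = "", markup_only: bool = True) -> str:
    """Compose a search string, optionally restricted to one category."""
    # insource: searches the wiki markup, so DPL output (e.g. infobox headlines)
    # displayed on other pages cannot produce spurious matches.
    terms = [f'insource:"{keyword}"' if markup_only else f'"{keyword}"']
    if category:
        # incategory: limits results to members of the named category.
        terms.append(f'incategory:"{category}"')
    return " ".join(terms)


def search_url(query: str) -> str:
    """Turn a query string into a full-text search URL (assumed URL form)."""
    params = {"title": "Special:Search", "search": query, "fulltext": "1"}
    return "https://en.wikinews.org/w/index.php?" + urlencode(params)
```

For instance, `search_url(build_query("cyclone", "Queensland"))` produces a link that searches only the markup of pages in Category:Queensland.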

Publication

As part of the publication process, the article is filed in Category:Published; an article is not considered published if it doesn't belong to that category, and is also not considered published if it does belong to either Category:No publish or Category:Disputed. (This use of Category:Disputed is a legacy from the early years of Wikinews, before review; in modern practice there is ordinarily no use of Category:Disputed on published articles.)
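Stated as a predicate, the rule is short. This sketch is illustrative only: an article is modelled as a plain set of category names, and `is_published` and `PUBLISHED_CRITERIA` are hypothetical names. It also makes visible why, as noted in the DPLs section, restricting a DPL to published articles uses up three of its six criteria.

```python
from typing import Dict, List, Set

# One required category plus two inverted categories: three DPL criteria.
PUBLISHED_CRITERIA: Dict[str, List[str]] = {
    "category": ["Published"],
    "notcategory": ["No publish", "Disputed"],
}


def is_published(categories: Set[str]) -> bool:
    """True only if the article is in Category:Published and in neither
    Category:No publish nor Category:Disputed."""
    required = set(PUBLISHED_CRITERIA["category"])
    excluded = set(PUBLISHED_CRITERIA["notcategory"])
    return required <= categories and not (excluded & categories)
```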

Wikilinks and redirects

Redirects from mainspace play a vital role in orchestrating our categories and our categorization of articles, mediated by the {{w}} template.

Wikilinks in the body of an article are used to link a keyword (or key phrase) to supplemental information about it. Don't use wikilinks for basic information: a news article should stand on its own, understandable without consulting some other page along the way. Use local links when available. Links explicitly to other projects are common in image credits and may appear in Sister links sections, but are otherwise almost never used except in news about Wikimedia; and we never put external (i.e., non-Wikimedia) links in the body of an article. Wikilinks in the body of an article are ordinarily (or, at least, generally) set up initially using the {{w}} template, which, among other things, uses a local target if it exists, and links to Wikipedia (or to another project, specified via parameter sister) only if the target does not exist locally. The destination of these wikilinks is therefore regulated by our local mainspace redirects.

At this writing, almost all of our local mainspace redirects point to categories that articles may belong to (discussed below in section Topic cats). Wikilinks are therefore expected to follow a natural progression:

  1. If there is no local target when the wikilink is first written, it links to Wikipedia (or to some other project, if parameter sister is specified). The {{w}} template puts the page on which this occurs in hidden Category:Pages with defaulting non-local links.
  2. When a local redirect becomes available, the link becomes local and {{w}} puts the page in hidden Category:Pages with categorizable local links.
  3. Eventually, someone (likely an admin) considers the local link, to decide whether it leads to a category that the article does not belong to but should. If so, they add the article to that category. Once the article belongs to the category, or they have determined it shouldn't belong to the category, they convert the {{w}} call to a hard local link (i.e., using square brackets).
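The first two stages amount to a simple decision. The following is a hypothetical model, not the template's actual logic: the local mainspace is represented as a set of page titles, `resolve_w` is an illustrative name, `sister` is assumed to be an interwiki prefix such as `b` (the real template's parameter format may differ), and `w:` is the interwiki prefix for Wikipedia. Stage 3 is an editorial decision and is not modelled.

```python
from typing import Optional, Set, Tuple

LOCAL = "Pages with categorizable local links"
DEFAULTING = "Pages with defaulting non-local links"


def resolve_w(target: str, local_pages: Set[str],
              sister: Optional[str] = None) -> Tuple[str, str]:
    """Return (link target, hidden tracking category) for a {{w}}-style link."""
    if target in local_pages:
        # Stage 2: a local page (usually a redirect to a category) exists.
        return target, LOCAL
    # Stage 1: fall back to Wikipedia ("w:") or to the given sister prefix.
    prefix = sister or "w"
    return f"{prefix}:{target}", DEFAULTING
```

Creating a single mainspace redirect moves every page that links to that title from the first tracking category to the second the next time the page is rendered, which is why redirects drive the progression.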

Pages containing local links via {{w}} are located through Category:Pages with categorizable local links; to visually identify which wikilinks on the page are local, there's a gadget.

The more mainspace redirects we have (and thus, by implication, the more categories we have), the higher the proportion of local wikilinks in our articles. Before the modern {{w}} was set up, most of the wikilinks in our articles pointed to Wikipedia, which made the project feel less independent; for new articles, the local proportion is now usually much higher.

Once a category is created, our ability to keep it well-populated —by adding new articles to it whenever they're relevant to it— depends largely on the existence of a set of mainspace redirects to the category such that whenever an article is relevant to the category, it will naturally tend to contain a wikilink on one of those keywords. If we're considering creating a category, but it doesn't have an associated set of keywords that will likely keep it populated, that may be grounds not to create it.

Date cats

Date categories have names that are fully specified dates formatted per WN:DATE (e.g., Category:February 6, 2013). Each article should be filed in a date category via template {{date}}; relative dates in the article should be relative to the date specified via {{date}}. When an article is published, the review gadget fixes the {{date}} template to the publication date, and it should remain fixed at that date permanently thereafter.

The content of each date category page is set up by template {{datecategory}}, which also provides assistance for setting up new date categories and some related infrastructure (the corresponding date pages in project space). Assistance is offered, as needed, to set up the following day's date category if the following day's month is no more than one month after the current month (thus, e.g., in September 2019, assistance would be available for setting up any later non-existent date categories up to, but not beyond, October 31, 2019).
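A sketch of the naming format and of the one-month rule, assuming that "the current month" means the month of today's date; the function names are hypothetical and the template's real implementation may differ. Month names are spelled out explicitly rather than relying on locale-dependent date formatting.

```python
import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def date_category(d: datetime.date) -> str:
    """Format a date category title, e.g. Category:February 6, 2013."""
    return f"Category:{MONTHS[d.month - 1]} {d.day}, {d.year}"


def months_between(earlier: datetime.date, later: datetime.date) -> int:
    """Number of calendar months from earlier's month to later's month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def offer_next_day(page_date: datetime.date, today: datetime.date) -> bool:
    """Should the category page for page_date offer to create the next day's?"""
    next_day = page_date + datetime.timedelta(days=1)
    return months_between(today, next_day) <= 1
```

With today's date in September 2019, the October 30 page offers to create October 31, but the October 31 page does not offer November 1, matching the example above.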

Topic cats

Most non-date categories on Wikinews are used to file articles according to what the articles are about. These are commonly called topic categories.

When considering whether an article is sufficiently related to a topic to justify filing it in the category, the underlying question, and thus the starting point for further deliberation, should be: if you were researching the topic in our archives, would you want that article to be included in the set of articles returned by your query?

In naming topic categories, when faced with multiple things sharing the same name we prefer to qualify them all equally, rather than assigning the unqualified name to one alternative deemed in-some-sense "most important". Choosing one as most-important is a subjective decision and apt to promote bias. Typical cases are Category:Georgia (country) vs Category:Georgia (U.S. state), and Category:Tripoli, Libya vs Category:Tripoli, Lebanon. In such situations, the unqualified term should be given a mainspace disambiguation page rather than a mainspace redirect.

Topic category pages should use template {{topic cat}}. The template documentation contains a pattern you can copy and paste as a first approximation of the content; then look on each sister project for an appropriate target page.

Geocats

A large class of topic categories are geographical areas; geocats for short. These are arranged in a containment hierarchy. The root (that is, top) of the hierarchy is Category:News articles by region. Most of its children (that is, immediate subcategories) are news regions, informally called "continents", which partition much of the surface of the globe.

Usually, places mentioned in the article lede —as answers to where— are central enough to the story to warrant categorization. One might not categorize a mention of the specific meeting place of a body whose rulings apply to a larger area, when the specific meeting place doesn't otherwise impinge on the story; for example, an article about a ruling by the US Supreme Court would not necessarily be filed under Washington, D.C. even though that's where the court is located; though it would be filed under United States. Sometimes when a politician mentions something in a speech, the place where they gave the speech doesn't really matter, especially if their remark isn't the focal event of the article. On the other hand, sometimes a group such as a sports team is so strongly associated with a place that articles relevant to the team should be filed under the place even when they play an away game.

When an article is put into a geocat, it should usually also be put into all the geocats that contain that one (with a notable exception for oceans; more on that in a moment). Each parent geocat is therefore the union of its children, plus some articles that belong specifically to the parent; for example, Category:United States is the union of all the US state-and-territory categories plus some other articles that don't belong to any particular state or territory. The cost is that there is no ready means to extract a list of pages that relate specifically to the US as a whole. But, as noted earlier, DPLs do not support union, so without cumulative filing there would be no way to list everything about a large area. Given the choice between making unions (an essential operation) impossible and inconveniencing searches for articles about a large area as a whole, Wikinews chooses inconvenience over impossibility. (We try to avoid complicated arrangements, as news production needs to be relentlessly streamlined.)

The geocats themselves are only filed in their immediate parent, not in all their ancestors. Thus, for example, Category:Brisbane belongs to Category:Queensland, which belongs to Category:Australia, which belongs to Category:Oceania, but Brisbane doesn't belong directly to Australia, and neither Brisbane nor Queensland belongs directly to Oceania. This prevents hopeless cluttering of large geocats with huge numbers of much smaller geocats — as if, say, Category:United States directly listed as subcategories all supported US cities.
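Cumulative filing means an article's geocats can be found by walking up the parent chain from the most specific place. A minimal sketch, using a hypothetical hand-written excerpt of the hierarchy in which each geocat records only its immediate parent, as the category pages themselves do; the ocean exception and geocats with more than one parent are not modelled, and the walk stops below the root on the assumption that articles are not filed directly in Category:News articles by region.

```python
from typing import Dict, List

ROOT = "News articles by region"

# Hypothetical excerpt: each geocat maps to its immediate parent only.
PARENT: Dict[str, str] = {
    "Brisbane": "Queensland",
    "Queensland": "Australia",
    "Australia": "Oceania",
    "Oceania": ROOT,
}


def geocats_for(place: str, parent: Dict[str, str] = PARENT) -> List[str]:
    """Return the geocat and all its ancestors below the root, most specific first."""
    chain: List[str] = []
    current = place
    while current != ROOT:
        chain.append(current)
        current = parent[current]  # KeyError signals a geocat missing from the map
    return chain
```

The article gets every category in the chain, while the hierarchy stays uncluttered because each category page still stores a single parent link.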

Although many of the news regions have the names of continents, they sometimes deviate a bit from the geographical boundaries of the continents. Sometimes this happens for cultural or political reasons; for example, Category:Hawaii is placed under Category:US states and territories even though it is both culturally and geographically part of Polynesia (a subregion of Oceania). A major deviation from geographical continent boundaries is the Middle East news region.

Three immediate subcats of Category:News articles by region, at this writing, are not actually regions:

  • Category:Space covers everything beyond the Earth's atmosphere.
  • Category:Oceans covers most of the water area of the globe; however, it usually does not include news about islands in the midst of the oceans. For example, Comoros is an island nation in the Indian Ocean but belongs to the African news region.
  • Category:World does not have a single clear definition. It isn't the ancestor of all geocats (that would be News articles by region), nor is it the ancestor of all geocats except Space. It has been used differently by different contributors over the years, and there has been no general agreement on how to use it or what should replace it.

People

At this writing there are over 700 person categories. Each person category belongs to Category:News articles by person, and either to one or more of the "occupational" categories at Category:People by occupation, or to Category:People not categorized by occupation. Those three categories, and the occupation categories under the second, are internal categories, i.e., articles don't belong to them directly (discussed further below).

If an article says a person did something, and that something is part of the news at the time of the article's publication, the article should be categorized under that person. This notably includes saying something; if somebody died, say, and we report that some politician made a non-vacuous statement about the death, the article should be filed under that politician. Don't categorize for an historical mention of somebody; but do categorize for action that directly pertains to the person's past actions — such as repealing some legislation they were responsible for.

When setting up a person category, in addition to {{topic cat}}, Category:News articles by person, and the occupation categories (or the no-occupation category), we currently also add a DEFAULTSORT magic word for each person, in which their name is rotated to put the family name at the front, for alphabetical listings in categories. (This is a rare exception to the general principle that we avoid DEFAULTSORT.) Don't use a comma after the family name; for example, Barack Obama has {{DEFAULTSORT:Obama Barack}}. For Western names, simply move the last name to the front. Other cultures have different rules (Spanish names, for instance, work differently, and Portuguese names differently again), so some care is warranted. There has been some discussion of an alternative approach to alphabetization that avoids the magic word, but no definite proposal has been put forward as of this writing.
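For the simple Western case, the rotation is mechanical. A hypothetical helper, valid only for names whose family name is the final word; it does not handle suffixes such as "Jr." or the Spanish and Portuguese conventions mentioned above, which still need human judgment.

```python
def western_sort_key(full_name: str) -> str:
    """Move the last word (assumed to be the family name) to the front, no comma."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name  # single names are left unchanged
    return " ".join([parts[-1]] + parts[:-1])


def defaultsort(full_name: str) -> str:
    """Wrap the sort key in the DEFAULTSORT magic word."""
    return "{{DEFAULTSORT:" + western_sort_key(full_name) + "}}"
```

For example, `defaultsort("Barack Obama")` gives `{{DEFAULTSORT:Obama Barack}}`, the form shown above.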

Other topics

People and places are the two largest groupings of concrete topic categories. There are also categories for institutions, teams, musical groups, various kinds of animals, and specific events such as conferences or elections, as well as more abstract topics. Here are some factors to keep in mind.

When one topic cat belongs to another topic cat, sometimes this indicates that all articles in the child category should also belong to the parent category, but this is not always so. Some caution is warranted, as there is no formal marking, nor simple absolute rule, on which category relations are cumulative versus which are not. Geocats are the archetype for a cumulative category relationship (though, at this writing, there's at least one murky case we're struggling with, to do with Category:Russia vs. news regions Asia and Europe). Category relations representing simple classification-by-containment are usually cumulative, e.g. all articles in Category:Infectious disease also belong in Category:Disease. Organizational containment is usually of this sort; e.g., when an article is related to an organ of the United Nations, the article also goes in Category:United Nations.

On the other hand, more free-form category relations sometimes are suggestive rather than cumulative. There has not been full agreement on when these sorts of suggestive category-relations should be used. As of this writing, Category:Conflict-of-interest editing on Wikipedia belongs suggestively to Category:Science and technology and Category:Politics and conflicts. Person categories have sometimes been placed directly into topic categories, but most of these have since been eliminated by setting up non-topic occupation categories, such as Category:Musicians which contains people and belongs to Category:Music (thus uncluttering Category:Music, which used to directly contain many individual person categories). Person categories are thus usually not cumulative. When a category relation is particularly prone to confusion, it may be appropriate to clarify with a usage note on the child category (though best of all, when possible, would be to arrange the category hierarchy to avoid the confusion).

A Wikinews topic name for things of a class is usually plural, where the corresponding Wikipedia article would be singular; e.g., Wikinews Category:Whales versus Wikipedia article whale. For abstract topic names, beware of non-neutrality through phrase bias; e.g., there is no Category:Terrorism or Category:Islamophobia. A paradigmatic case is Category:Gun politics, which avoids the attitudinal bias of a name favoring either side of the issue ("Gun rights" or "Gun control" would frame the issue in terms of, respectively, permission or restriction). Such solutions are not always readily forthcoming, as with English Wikinews Category:Free speech and Category:Freedom of the press, where some other languages' Wikinews projects instead have a category focusing on censorship. Availability of suitable redirects (mentioned earlier) becomes especially important for abstract topics, since it is usually evident when an article references a concrete topic, such as a person or place, but much less so for an abstract one.

Internal cats

If a category isn't used for articles and isn't used for project pages, it's internal. Internal category pages should be tagged with template {{Internal cat}}, which displays a hatnote and puts the category in Category:Internal Wikinews organization. (Category:Internal Wikinews organization is, of course, itself an internal category, and therefore belongs to itself; somewhat idiosyncratically, although one might expect Category:Category to belong to itself since it is a category, it does not — but Category:Category is certainly internal, and accordingly belongs to Category:Internal Wikinews organization, making the latter essentially the root of the project category hierarchy.)

Most internal categories are not targets for mainspace redirects. In particular, an internal category should not be a mainspace redirect target if it is adjunct to a particular topic cat — as happens often with occupation categories, and in a few other cases at this writing, such as Category:US states and territories or Category:People from the United States. However, in some cases an abstract status such as a political office, while not meeting the three-article bound for a topic cat, may have an internal category grouping categories. It then makes sense to target a mainspace redirect to the internal category, so that {{w}} will link locally. All internal categories targeted by mainspace redirects should use template {{topic cat}}, which is designed with the flexibility to handle this unusual case; use parameter nolist, and parameter usage note ending with a call to {{internal cat}} — see e.g. Category:Presidents of the United States.

Wikidata

English Wikinews does not remove local interwiki markup when the information is also represented on Wikidata (though it does support the efforts of its sister project Wikidata to include such information). If there is a discrepancy between the two, there should be a good reason for it. At this writing, the Wikidata infrastructure favors maximal separation of items, while at most one item can link to a given page on another sister project, so in some cases Wikidata automatically generates far fewer interwikis than are appropriate. This may happen especially when projects differ on how to decompose concepts (e.g., English Wikinews Category:Guantanamo Bay, which combines the topics of at least three separate Wikipedia articles; or the complex contrast, mentioned earlier, in how the different-language Wikinews projects categorize free speech/expression/press and censorship).

A Wikinews topic category should be linked from (and to, via {{topic cat}}) the Wikidata item of the corresponding Wikipedia article, not to the associated Wikipedia category. (E.g., our Category:France links from Wikidata item Q142 corresponding to w:France, rather than Q8249 corresponding to w:Category:France.) Typically, corresponding categories on other-language Wikinews projects are linked from the same Wikidata item. As of this writing, Arabic Wikinews has, in some cases, pages corresponding to both Wikipedia's article and category, while Swedish Wikinews has pages corresponding to both Wikipedia's article and portal.