User:Michael.C.Wright/TranslationAnalysis

From Wikinews, the free news source you can write!

In response to this thread: Wikinews:Water_cooler/policy#Update_of_license, I am using Wikidata in an attempt to get an idea of how many en.wikinews articles have been translated.

SPARQL queries

I'm new to SPARQL and am using ChatGPT to help, so there is plenty of room for error. Take this with a healthy grain of salt.

The queries center on sitelinks, whose URLs indicate the language of the Wikinews project. For example, a Wikidata item with sitelinks to both "https://en.wikinews.org/" and "https://fr.wikinews.org/" is assumed to be an article translated from English to French. There are (at least) three problems with this:

  1. It assumes English → French translation when it could be the other way around.
  2. It will not catch translated articles whose Wikidata items have not been linked in this manner.
  3. It assumes a translation has occurred, when the sitelinks could instead point to two independently written articles in different languages about the same event, linked to the same Wikidata item. (This makes some of the assumptions below questionable.)
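
Reading a language from a sitelink, as described above, amounts to taking the subdomain of the site URL. A minimal sketch, assuming sitelinks are available as site URLs of the form "https://<lang>.wikinews.org/"; the helper names `wikinews_language` and `is_english_plus_other` are hypothetical and not part of the queries used on this page:

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def wikinews_language(site_url: str) -> Optional[str]:
    """Return the language label of a Wikinews site URL, or None if it is not Wikinews.

    Assumes URLs of the form "https://<lang>.wikinews.org/".
    """
    host = urlparse(site_url).hostname or ""
    labels = host.split(".")
    # Expect exactly three labels: <lang>, "wikinews", "org".
    if len(labels) == 3 and labels[1:] == ["wikinews", "org"]:
        return labels[0]
    return None


def is_english_plus_other(site_urls: Iterable[str]) -> bool:
    """True if the sitelinks include en.wikinews and at least one other Wikinews edition."""
    langs = {wikinews_language(url) for url in site_urls}
    langs.discard(None)  # drop non-Wikinews sitelinks (e.g. Wikipedia)
    return "en" in langs and len(langs) > 1


print(is_english_plus_other(["https://en.wikinews.org/", "https://fr.wikinews.org/"]))  # prints: True
```

This check inherits all three problems listed above: it says nothing about which version came first, or whether either one is a translation at all.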

I exclude internal pages that contain the following in their titles:

  • Template:
  • Category:
  • Portal:
  • Page:
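
For illustration, a count query for one "English + X" row could be written as below. This is a sketch, not the query behind the table: it assumes the Wikidata Query Service sitelink model, in which a sitelink node points to its item with `schema:about`, to its site with `schema:isPartOf`, and carries its page title in `schema:name`. The Python helper `build_pair_query` is hypothetical; it only assembles the SPARQL text and excludes the four title prefixes listed above.

```python
EXCLUDED_PREFIXES = ("Template:", "Category:", "Portal:", "Page:")


def build_pair_query(other_lang: str) -> str:
    """Build a SPARQL query counting items linked to both en.wikinews and <other_lang>.wikinews."""
    # One STRSTARTS test per excluded prefix, applied to the English page title.
    filters = " && ".join(
        f'!STRSTARTS(STR(?enTitle), "{prefix}")' for prefix in EXCLUDED_PREFIXES
    )
    # Doubled braces are literal braces in the f-string.
    return f"""PREFIX schema: <http://schema.org/>
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?en schema:about ?item ;
      schema:isPartOf <https://en.wikinews.org/> ;
      schema:name ?enTitle .
  ?other schema:about ?item ;
         schema:isPartOf <https://{other_lang}.wikinews.org/> .
  FILTER({filters})
}}"""


print(build_pair_query("fr"))
```

For the "English + ≥ one other language" row, the fixed site of the second sitelink would instead be a variable, filtered to Wikinews sites other than English.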

Note: The table entry for English + ≥ one other language is well below the sum of the per-language rows (7,439 for the twelve languages listed), because many articles are translated into more than one language. For example, Wikidata item Q120920258 indicates that the article exists in three different languages. I am exploring how to unpack that.
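
The gap between the distinct count and the per-language rows can be seen on a small example. The items below are hypothetical toy data, not real Wikidata items; each maps to the set of Wikinews languages it links to. Counting each item once gives the "English + ≥ one other language" figure, while the per-language counts add up to the number of non-English versions, so an item in three languages contributes once to the first figure and twice to the second.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical toy data: item -> Wikinews languages with a sitelink.
ITEMS: Dict[str, Set[str]] = {
    "item_a": {"en", "fr"},
    "item_b": {"en", "fr", "es"},  # one article in three languages
    "item_c": {"en"},              # English only
    "item_d": {"en", "pt"},
}


def overlap_summary(items: Dict[str, Set[str]]) -> Tuple[int, Counter, Counter]:
    """Return (distinct multilingual items, per-language counts, items by number of other languages)."""
    multi = [langs for langs in items.values() if "en" in langs and len(langs) > 1]
    # Like the "English + X" rows: an item is counted once per other language.
    per_language = Counter(lang for langs in multi for lang in langs if lang != "en")
    # How many non-English versions each multilingual item has.
    by_extra = Counter(len(langs) - 1 for langs in multi)
    return len(multi), per_language, by_extra


distinct, per_language, by_extra = overlap_summary(ITEMS)
print(distinct, sum(per_language.values()), dict(by_extra))  # prints: 3 4 {1: 2, 2: 1}
```

The `by_extra` breakdown is one possible way to unpack the overlap: multiplying each number of other languages by its item count and summing reproduces the per-language total.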

Wikinews in multiple languages (values as of February 24, 2024)
Languages | Article count | SPARQL query
Total English articles | 22,910 | [1]
English + ≥ one other language | 4,425 | [2]
English + French | 1,581 | [3]
English + Spanish | 1,041 | [4]
English + Portuguese | 1,016 | [5]
English + German | 838 | [6]
English + Chinese | 710 | [7]
English + Polish | 586 | [8]
English + Italian | 556 | [9]
English + Dutch | 310 | [10]
English + Czech | 288 | [11]
English + Serbian | 265 | [12]
English + Persian | 127 | [13]
English + Arabic | 121 | [14]
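
The figures in the table come from queries run at the Wikidata Query Service. A minimal sketch for re-running a count query from Python, assuming the public endpoint https://query.wikidata.org/sparql, the standard SPARQL JSON results format, and a query whose only result variable is `?count`; the function names and the User-Agent string are placeholders:

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def parse_count(result_json: str) -> int:
    """Extract the ?count value from a single-row SPARQL JSON result."""
    data = json.loads(result_json)
    return int(data["results"]["bindings"][0]["count"]["value"])


def run_count_query(query: str, user_agent: str = "TranslationAnalysis-sketch/0.1") -> int:
    """Send a query to the Wikidata Query Service and return its ?count value."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Wikimedia asks clients to send a descriptive User-Agent; replace the placeholder.
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    })
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_count(response.read().decode("utf-8"))
```

Keeping `parse_count` separate from the network call makes it possible to check the parsing against a saved result without querying the service.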

Checking the numbers

Below is an attempt to sanity-check the SPARQL results above.

Number of articles in Category:Published: 21,922
Number of articles in Category:Translated_news: 261
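
Category sizes like these can also be read programmatically. A sketch assuming the MediaWiki Action API at https://en.wikinews.org/w/api.php with `prop=categoryinfo` and `formatversion=2`, where `categoryinfo.pages` counts member pages excluding subcategories and files. How the figures above were actually obtained is not stated, and the function names and User-Agent string are hypothetical:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikinews.org/w/api.php"


def parse_category_pages(result_json: str) -> int:
    """Return categoryinfo.pages from a formatversion=2 response for one category."""
    data = json.loads(result_json)
    page = data["query"]["pages"][0]
    return int(page["categoryinfo"]["pages"])


def category_page_count(category: str, user_agent: str = "TranslationAnalysis-sketch/0.1") -> int:
    """Ask the MediaWiki API how many member pages a category has."""
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": category,
        "format": "json",
        "formatversion": "2",
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_category_pages(response.read().decode("utf-8"))
```

A call such as `category_page_count("Category:Published")` would return the current count, which will drift from the snapshot above as articles are published.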

The fact that Category:Published contains fewer articles than the SPARQL query found can be at least partially attributed to articles that were translated and published in other languages while still awaiting review, and then failed publication in English, as was the case with Wikidata item Q124460563.

The difference between the two counts, 988 articles, is not insignificant: it is about 4.5% of the articles in Category:Published. One way to look at that is as a margin of error of ±4.5%.

That would mean that, according to the SPARQL queries above, about 19.3% of English articles (4,425 of 22,910) also exist in at least one other language; applying the ±4.5 percentage-point margin gives an estimate of between 14.8% and 23.8%. The licenses of all versions of each of these articles will have to align.
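
The range above can be reproduced as follows. This restates the page's own heuristic (the share of English articles found by the queries, plus or minus the Category:Published discrepancy treated as percentage points); it is not a statistical confidence interval. It assumes a Category:Published count of 21,922, which is what the 988-article difference implies, and the function name is hypothetical:

```python
def estimate_range(total_en: int, en_plus_other: int, published: int) -> tuple:
    """Return (low, high) shares of English articles that also exist in another language."""
    share = en_plus_other / total_en              # about 0.193
    margin = (total_en - published) / published   # 988 / 21,922, about 0.045
    return share - margin, share + margin


low, high = estimate_range(total_en=22_910, en_plus_other=4_425, published=21_922)
print(f"{low:.1%} to {high:.1%}")  # prints: 14.8% to 23.8%
```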

I posted a request at Wikidata[15] asking someone to review the SPARQL queries (see the section below).

Using Wikidata to determine source language of a translation

I was hoping that we could use Wikidata to distinguish the original version of an article from its translations. However, the recommendation given at Wikidata's "Request a query" page was that this approach would overly complicate the Wikidata items we create along with each published article.

The queries above, however, were said to be fine.

Translated news category + SPARQL query

Articles that are translated into English and published on en.Wikinews are added to Category:Translated_news when published. This can help us infer the number of articles that were translated from English:

For example, if 4,425 articles have Wikidata items indicating English + ≥ one other language, and 261 articles on en.Wikinews state that they are translations, we can infer that 4,164 articles were translated from English.

If all articles published in all languages accurately use Wikidata, then we can say that about 94% of these translations (4,164 of 4,425) used English as the source language.
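
The inference above is a subtraction followed by a ratio. A sketch using the figures from this page; the function name is hypothetical, and it assumes, as the text does, that every article in Category:Translated_news is among the 4,425 Wikidata items and that every other item was translated from English:

```python
def translated_from_english(en_plus_other: int, translated_into_en: int) -> tuple:
    """Return (count, share) of multilingual articles inferred to have English as the source."""
    from_english = en_plus_other - translated_into_en
    return from_english, from_english / en_plus_other


count, share = translated_from_english(en_plus_other=4_425, translated_into_en=261)
print(count, f"{share:.0%}")  # prints: 4164 94%
```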

Conclusion

Wikidata could be used to indicate translations of Wikinews articles. However, doing so may be an inappropriate and overly complex use of Wikidata, and it was discouraged by one editor at Wikidata.

However, Wikidata can give us a rough estimate of the number of articles that exist in more than one language, which we can then compare with the number of articles known to be translations on en.Wikinews. Subtracting the number of known translations into English from the number of articles that exist as English + ≥ one other language gives an estimate of the number of articles translated from English.

As of February 24, 2024:

  • Wikidata indicates that there are 22,910 English articles with a corresponding Wikidata item
  • Wikidata indicates that there are 4,425 articles that exist as English + ≥ one other language
  • Category:Translated_news contains 261 articles
  • Category:Published contains 21,922 articles

Therefore we can estimate that:

  • 4,164 articles have been translated from English to at least one other language
  • between 14.8% and 23.8% of all English articles also exist in at least one other language