News | International
16 Apr 2025 20:01
NZCity News
    Wikipedia's largest non-English version was created by a bot. Generative AI poses new problems

    A single bot generated and published millions of articles on the largest non-English version of Wikipedia. The results caused a rift among editors — and a glimpse of what the online encyclopedia might face from AI.


    With nearly 7 million articles, the English-language edition of Wikipedia is by many measures the largest encyclopedia in the world.

    The second-largest edition of Wikipedia boasts just over 6 million articles. It isn't French, or Spanish, or Chinese Wikipedia.

    It's Cebuano: a language spoken mostly in the southern Philippines.

    But Cebuano Wikipedia didn't grow with the help of thousands of volunteer editors, as its English counterpart did. Most of the articles come from one person: Swedish linguist Sverker Johansson.

    Dr Johansson designed a program, dubbed "lsjbot", which generated millions of articles in several languages, but particularly Cebuano.

    It also laid bare a debate over automated content which Wikipedia has been grappling with since its inception, and which artificial intelligence (AI) is making ever more pressing.

    How lsjbot 'writes' articles

    Programs that automate parts of Wikipedia are nearly as old as the website itself.

    These bots crawl the site. Some do maintenance jobs such as fixing dead links, while many others generate articles only a sentence or two long.

    It was these article-producing bots that Dr Johansson encountered in the early 2010s, when he was writing and editing articles himself.

    "I started thinking: I can do that. I can do better," he says.

    Lsjbot generates articles by taking information from online databases, mostly on biology and geography, and fitting the data into a set number of pre-written sentences.

    "The core language model is a few hundred sentence templates, and then the bot will check which information is available," Dr Johansson says.

    An article about an animal, for instance, might start with the sentence "The X is a Y that belongs to the Z family", with lsjbot filling in the blanks: lion, mammal, cat.
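    To make the template approach concrete, here is a minimal Python sketch of the idea Dr Johansson describes: fill pre-written sentences from a data record, and use only the sentences whose data is actually available. It is not lsjbot's code. The template wording, the field names ("name", "kind", "family", "range") and the fill_article function are hypothetical, and the real bot used a few hundred templates written in Cebuano and other languages.

```python
from string import Template

# Hypothetical sentence templates (not lsjbot's real ones).
# Each entry pairs the fields a sentence needs with the sentence itself.
TEMPLATES = [
    (("name", "kind", "family"),
     Template("The $name is a $kind that belongs to the $family family.")),
    (("name", "range"),
     Template("The $name is found in $range.")),
]


def fill_article(record: dict) -> str:
    """Build an article from every template whose fields are all present."""
    sentences = []
    for fields, template in TEMPLATES:
        # Skip a sentence if any of its data is missing, rather than inventing it.
        if all(record.get(field) for field in fields):
            sentences.append(template.substitute({f: record[f] for f in fields}))
    return " ".join(sentences)


lion = {"name": "lion", "kind": "mammal", "family": "cat"}
print(fill_article(lion))
# prints: The lion is a mammal that belongs to the cat family.
```

    Because the record above has no "range" value, the second sentence is simply left out. This mirrors the point Dr Johansson makes later: a template bot can only repackage the data it is given and cannot invent anything new.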

    While lsjbot could work in any language, most of its output has been in Cebuano. So far it has generated a couple of million articles on plants and animals, 4 million articles on geography, and a few articles on smaller categories such as chemical elements.

    Dr Johansson chose to focus on Cebuano because it's his wife's native language. She helped him write the sentence templates.

    "I'm not really fluent in her language, but I wanted to help anyway, and I figured this is a way I can do it," he says.

    He has also run the bot in Waray, another language from the Philippines, and his native Swedish.

    The controversy around lsjbot

    Lsjbot caused huge ripples through the Philippine Wikipedia community, and not all of them good.

    Volunteers who create and maintain Wikipedia, called Wikipedians, found many of the Cebuano-language pages had grammatical and sometimes factual errors, thanks to imperfect translations.

    The sheer number of articles was another problem. With so few editors in the community, it was difficult to maintain or improve the articles' quality.

    In 2018, there was even a proposal to delete the entire Cebuano Wikipedia, including the small fraction of human-generated articles. The proposal was strongly opposed by the official Philippine Wikimedia Community and was ultimately rejected.

    Irvin Sto. Tomas, a member of that community, says a small group of local Wikipedians has been trying to improve the quality of the Cebuano pages, including working with Dr Johansson on lsjbot.

    "Unfortunately, there is so much to be done that volunteer editors alone cannot do," Mr Tomas, who works with other Philippine-language Wikipedias, says.

    Josh Lim, who also edits non-Cebuano Philippine Wikipedias, says automated bots caused reputation problems long before lsjbot.

    An early version of Cebuano Wikipedia was composed mostly of thousands of articles on French communes, created by another bot. Articles on topics relevant to Cebuano speakers were sparse.

    "You know how embarrassing that is?" Mr Lim says.

    He also believes the huge number of bot-generated articles caused a "race to the bottom" among Philippine-language Wikipedias, with editors valuing quantity over quality.

    Tagalog, the most widely spoken language in the Philippines, saw its Wikipedia grow in size as part of this race. At its largest, it boasted about 80,000 articles.

    It now has half the number of pages it used to. Mr Lim and his fellow editors have been culling short articles for ease of maintenance.

    Nevertheless, he believes Dr Johansson's intentions were good.

    "I think that what he did was right in the service of Wikipedia," Mr Lim says.

    He also thinks Wikipedians should not reject bots and automation tools outright.

    "It speaks to this capacity, or lack thereof, on the part of Wikipedians to assess and embrace change."

    The Swedish Wikipedia community, meanwhile, first agreed to, and then pulled back from, lsjbot's use.

    Lsjbot has been largely inactive since 2021. Dr Johansson says that the debate around its use was part of his reason for retiring it.

    Native languages devalued

    Another reason Dr Johansson retired lsjbot was that it wasn't achieving one of the aims he'd hoped it might: bringing a "critical mass" of readers and editors to Cebuano Wikipedia, catalysing a richer encyclopedia.

    According to Wikimedia Statistics, Cebuano Wikipedia currently reels in tens of thousands of page views from the Philippines each month.

    English Wikipedia, meanwhile, gets more than 100 million Filipino viewers per month.

    Mr Lim says this is a broader problem with non-English Wikipedias. The Tagalog Wikipedia, for instance, receives about 2 million local hits per month.

    "It's just general colonial experience — our native languages have been devalued," Mr Lim says.

    "That, in turn, impacts the amount of information that is available in those languages."

    This "devaluing" appears all over the internet. An early version of Google Translate, for instance, translated a number of scientific terms into profanities in Filipino, apparently lacking better data to do it accurately.

    But Mr Lim says an "upswell of linguistic pride" is drawing more Filipinos to use their native languages, and he believes sites like Wikipedia can help.

    "Wikipedia is part of that solution to allow people to express themselves, and to express complicated thoughts, in their native language."

    Cebuano Wikipedia has had other uses — sometimes from unusual corners.

    Heather Ford, a researcher at University of Technology Sydney, is part of a team investigating how Australia is depicted on English Wikipedia.

    Last year, the team used Cebuano Wikipedia as a point of comparison to see how Australian places were portrayed.

    They found that, compared to Cebuano's database-generated entries, the English Wikipedia focused more on Australian cities, and less on natural features.

    "We wanted to really make the point that it matters which encyclopedia you're looking at [as to the information presented]," Dr Ford says.

    Wikipedia in the age of generative AI

    Dr Johansson draws a clear line between his bot and generative AI models, such as ChatGPT and Microsoft's Copilot.

    "[Lsjbot] does not produce any new text; it only packages existing information into existing templates. It can't do anything else. It doesn't invent anything," he says.

    AI large language models, meanwhile, do invent things — they're designed to sound convincing, not to produce reliable information.

    "They are very risky if you want factual information, because it's fiction," Dr Johansson says.

    It's a distinction both Dr Ford and Mr Lim agree with.

    "We can debate the quality of the translations [of lsjbot], but the underlying facts are nevertheless true. We cannot say the same with generative AI," Mr Lim says.

    But while he cautions against using it to write whole articles, Mr Lim says some Wikipedians have found uses for generative AI, particularly those working in their non-native languages.

    "Generative AI can actually help … polish Wikipedia articles so that they look, read and have that gravitas of being something that is more encyclopedic."

    He emphasises that the decision to include AI-generated text should still rest with a human editor, who can check its accuracy.

    "I generally trust Wikipedians are much more thoughtful about this — that we don't just believe what generative AI says, hook, line and sinker," Mr Lim says.

    It's hard to tell exactly how much of Wikipedia has been AI-generated, although researchers have made attempts to find out. One recent preprint suggested the encyclopedia may now be one or two per cent AI, while another proposed as much as five per cent.

    If this proportion grows, Dr Ford fears it may make all versions of Wikipedia too big to maintain, as lsjbot did with Cebuano.

    "When Wikipedians make decisions about what should be on the encyclopedia and what shouldn't, it's not only about questions of notability," she says.

    "It's also a question of labour."

    Because Wikipedia is used to train large language models, adding AI-generated material to it also runs the risk of model collapse.

    Mistakes made by AI models would be used to train and be incorporated into future AI models, entrenching the errors. Or, as Mr Lim summarises: garbage in, garbage out.

    "I don't think that having this recursive engine to generate Wikipedia articles is necessarily in our best interest," he says.

    Automation is not a new concept for Wikipedia. But generative AI brings a new host of risks and benefits.

    Mr Lim says Wikipedians will need to address these questions sooner rather than later if the site is to maintain its status as a trusted resource.

    "I believe we are reliable, but I also believe that we are fallible, and we should not be playing God with facts."


    ABC




    © 2025 ABC Australian Broadcasting Corporation. All rights reserved

    © 2025 New Zealand City Ltd