AI is in danger of peeing in its own water source. It's unbelievably useful at imitating and generating content, but it needs enough original content left to train on and scrape.
Google got one thing wrong and nearly destroyed the internet - people need to have an incentive to contribute content online, and that incentive should not be to game the system for advertising.
This particularly dawned on me when asking Claude for instructions on taking apart my dryer. There was literally only one webpage left on the internet with instructions for my particular dryer, and the page was more or less unusable: rotten links, riddled with adware. Claude did its best but filled in the missing diagrams with hallucinations.
I was imagining whether LLMs could finally deliver the micropayment scheme people have always proposed for the internet. Part of my monthly payment gets split between all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.
It might not be a lot of money, but it would certainly be more than the pitiful ad revenue you get from posting content online right now. And if I want to upload corrected instructions for repairing this dryer I would have reason to.
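A minimal sketch of how such a pro-rata split could work, purely as an illustration; the usage counts, the site names, and the assumption that the LLM provider even logs which sources each answer drew on are all hypothetical:

    // Hypothetical sketch: split one subscriber's monthly fee across the
    // sources an LLM drew on, weighted by how often each source was used.
    // Nothing here reflects any real provider's accounting.
    function splitSubscription(monthlyFee, sourceUsage) {
      const totalUses = Object.values(sourceUsage).reduce((a, b) => a + b, 0);
      if (totalUses === 0) return {};
      const payouts = {};
      for (const [site, uses] of Object.entries(sourceUsage)) {
        payouts[site] = (monthlyFee * uses) / totalUses;
      }
      return payouts;
    }

    // Example: a $20/month subscription and usage counts logged by the provider.
    console.log(splitSubscription(20, {
      "dryer-repair.example": 3,   // the one surviving repair page
      "wikipedia.org": 45,
      "some-blog.example": 12,
    }));

Like Spotify's pro-rata model, this rewards whoever racks up the most usage counts, which is exactly the bot-farming worry raised in the replies below.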
> Paid out like Spotify pays out artists.
So, mostly to fraudulent AI spam?
AI makes this problem worse in both directions. It makes it fantastically easy to produce "content". So whether you're scraping content or browsing content, you're going to run into increasing amounts of AI. Micropayments make this worse, because they become a way of getting paid to produce spam. The problem comes when you want the "content" to be connected to real questions like "how does my dryer work" or "what is going to happen to oil availability six months from now".
AI trainers didn't pay book authors until forced to. $3,000 ended up being a pretty high value! But it was also a one-off. Everyone writing books from now on is going to have to deal with being free grist to the machine.
> Paid out like Spotify pays out artists.
That's probably not the best comparison. Spotify mostly benefits the big players, or rather those with the most bots. If you actually want to support specific artists, you'd have to use Bandcamp or similar sites.
I think most labs actively create synthetic data using existing models as part of the mix for the pretraining stage of their next model.
Would love to know exactly what the latest process is to keep slop out of training data.
const isAiContent = (str) => str.includes('—');?
:)
> I was imagining whether LLMs could finally deliver the micropayment scheme people have always proposed for the internet. Part of my monthly payment gets split between all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.
As a software user I wish I could do the same for all the software I use.
Many open source projects accept donations. There's also explicitly paid-for software. What exactly do you wish for that you can't do right now?
Specifically the part where engineers get paid the same way as artists on Spotify.
So a handful will make a buttload but the vast majority won't make enough to pay rent?
So not at all for their work, and with a reverse Robin Hood model? That would be terrible for software. The way artists get paid on streaming is a genius play at catering to the biggest artists and labels while screwing over the smaller ones, especially on Spotify with its freemium model.
> in danger
It has already done so, and we can be confident in saying that.
Verified content will always be relatively expensive when compared to AI content.
Visits to Wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing verified content will be too meager to allow for public consumption.
There are jokes about GenAI being the great filter; while I doubt that, I do hope this is the final push that makes us think about how we want our information commons to be nurtured.
> I was imagining whether LLMs could finally deliver the micropayment scheme people have always proposed for the internet. Part of my monthly payment gets split between all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.
This system is usually called taxes.
Which then pay for universal healthcare, free education, affordable housing, libraries, parks, and so on.
LLMs don't need to invent it; we should stop allowing the people and companies behind them to avoid it.
Self-contradictory policy.
> Reporters may use AI tools vetted and approved for our workflow to assist with research, including navigating large volumes of material, summarizing background documents, and searching datasets.
If this is their official policy, Ars Technica bears as much responsibility as the author they fired for the fabricated reporting. LLMs are terrible at accurately summarizing anything. They very randomly latch on to certain keywords and construct a narrative from them, with the result being something that is plausibly correct but in which the details are incorrect, usually subtly so, or important information is omitted because it wasn't part of the random selection of attention.
You cannot permit your employees to use LLMs in this manner and then tell them it's entirely their fault when it makes mistakes, because you gave them permission to use something that will make mistakes 100% without fail. My takeaway from this is to never trust anything that Ars reports because their policy is to rely on plausible generated fictional research and their solution to getting caught is to fire employees rather than taking accountability for doing actual research.
---
Edit: Two replies have taken issue with the fact that I didn't quote the following sentence, so here you go:
> Even then, AI output is never treated as an authoritative source. Everything must be verified.
If I wasn't clear, I consider this to be part of what makes the policy self-contradictory. In my eyes, this is equivalent to providing all of your employees with a flamethrower, and then saying they bear all responsibility for the fires they start. "Hey, don't blame us for giving them flamethrowers, it's company policy not to burn everything to the ground!". Rather than firing the flamethrower-wielding employees when the inevitable burning happens, maybe don't give them flamethrowers.
The next sentence after your quoted section:
"Even then, AI output is never treated as an authoritative source. Everything must be verified."
Any verification process thorough enough to catch all LLM fabrications would take more work than simply not using the LLM in the first place. If anything, verifying what an LLM wrote is substantially more difficult than just reading the material it's "summarising", because you need to fully read and comprehend the material and then also keep in mind what the LLM generated so you can contrast the two, and at that point what the fuck are you even doing?
I believe this policy can never result in a positive outcome. The policy implicitly suggests that verification means taking shortcuts and letting fabrications slip through in the name of "efficiency", with the follow-up sentence existing solely so that Ars won't take accountability for enabling such a policy but instead place the blame entirely on the reporters it told to take shortcuts.
> Any verification process thorough enough to catch all LLM fabrications would take more work than simply not using the LLM in the first place
Sometimes you have a weak hunch that may take hours to validate. Putting an LLM on the preliminary investigation can be fruitful. Particularly if, as is often the case, you don't have one weak hunch but a small basket of them.
You can prompt LLMs to scan thousands of documents to generate text validating your hunches. In some cases those validated hunches may even be correct.
The LLM can find material that would be hard or time-consuming for you to find yourself.
You still need to verify it, but "find the right things to read in the first place" is often a time intensive process in itself.
(You might, at that point, argue "what if the LLM fails to find a key article/paper/whatever", which I think is both a reasonable worry and an unreasonable standard to apply. "What if your Google search doesn't return it" is an obvious counterpoint, and I don't think you can make a reasonable argument that journalists should be forced to cross-compare SERPs from Google/Bing/DuckDuckGo/AltaVista or whatever.)
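As a rough sketch of the workflow described above (not any newsroom's actual tooling), something like the following could triage documents against a hunch and hand back a reading list for human verification; askModel is a made-up stand-in for whatever LLM API would actually be used:

    // Hypothetical triage sketch: use an LLM as a first-pass filter over many
    // documents, then hand every hit to a human for verification.
    // askModel is a placeholder, not a real API.
    async function askModel(prompt) {
      throw new Error("wire up a real model here");
    }

    async function triageHunch(hunch, documents) {
      const candidates = [];
      for (const doc of documents) {
        const answer = await askModel(
          `Does the following document contain evidence relevant to: "${hunch}"?\n` +
          `Answer YES or NO, then quote the relevant passage.\n\n${doc.text}`
        );
        if (answer.trim().toUpperCase().startsWith("YES")) {
          // Keep a pointer to the original document; the quoted passage still
          // has to be checked against doc.text by a person before publication.
          candidates.push({ id: doc.id, modelNote: answer });
        }
      }
      return candidates; // a reading list, not reportable facts
    }

The output is only a list of things to go read; nothing in it is reportable until a human has matched it against the original source.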
> I believe this policy can never result in a positive outcome.
I get where you're coming from (I'm learning more and more over time that every sentence or line of code I "trust" an AI with, will eventually come back to bite me), but this is too absolutist. Really, no positive result, ever, in any context? We need more nuanced understanding of this technology than "always good" or "always bad."
If you need accuracy, an LLM is not the tool for that use case. LLMs are for when you need plausibility. There are real use cases for that, but journalism is not one of them.
I didn't say in any context. I'm specifically talking about this policy on journalistic research.
Disagree. If I'm a reporter and I'm trawling through a mass data dump - say the Epstein files, or WikiLeaks, or statistics on environmental spills, or something - using AI to pull out potential patterns in the data, or to find specific references, can be useful. Obviously you then go and check the particular citations. This will still save a lot of time.
> You cannot permit your employees to use LLMs in this manner and then tell them it's entirely their fault when it makes mistakes, because you gave them permission to use something that will make mistakes 100% without fail.
Yes you can. The same way Wikipedia (or, way back when, a paper encyclopedia) can be used for research, but you have to verify everything with other sources because it is known that there are errors and deficiencies in such sources. Or the way you can use outsourced dev resources (meat-based outsourced devs can be as faulty as an LLM, some would argue sometimes more so) while still reviewing their code before implementing it in production.
Should they also ban them from talking to people as sources of information, because people can be misinformed or actively lie, rather than instead insisting that information found from such sources be sense-checked before use in an article?
Personally I barely touch LLMs at all (at some point this is going to wind up at DayJob, where they think the tech will make me more efficient...), but if someone is properly using them as a different form of search engine, or to pick out related keywords/phrases that are associated with what they are looking for but that they might not have thought of themselves, that would be valid IMO. Using them in these ways is very different from doing a direct copy+paste of the LLM output and calling it a day. There is a difference between using a tool to help with your task and using a tool to be lazy.
> it's company policy not to burn everything to the ground!
The flamethrower example is silly hyperbole IMO, and a bad example anyway, because everywhere that potentially dangerous equipment is actually made available for someone's job you will find policies exactly like this. Military use: "we gave them flamethrowers for X and specifically trained them not to deploy them near civilians; the relevant people have been court-martialled and duly punished for the burning down of that school". Civilian use: "the use of flamethrowers to initiate controlled land-clearance burns must be properly signed off before work commences, and the work should only be signed off to those who have been through the full operation and safety training programs, and never without an environmental risk assessment".
I'm not a journalist and just for random things I'm interested in, I have no problem using an LLM to point me in a direction and then directly engage with the source rather than treat any of the LLM output as authoritative. It's easy to do. This is not a flamethrower.
> the author they fired for the fabricated reporting
Didn't one of the magazine's editors share the byline?
> LLMs are terrible at accurately summarizing anything. They very randomly latch on to certain keywords and construct a narrative from them, with the result being something that is plausibly correct but in which the details are incorrect, usually subtly so, or important information is omitted because it wasn't part of the random selection of attention.
I don't know what you've been doing, but the summaries I get from my LLMs have been rather accurate.
And in any event, summaries are just that - summaries.
They don't need to be 100% accurate. Demanding that is unreasonable.
The LLM meeting-summary bot in Teams seems accurate... unless you were in the meeting, and also closely read the summary afterward. It misrepresents what people actually said all the time.
Depends on the topic; often what they consider important isn't what is important, and details that are essential drop out of view. I'm having good success with YouTube videos, not as much with technical docs.
Yes, search and summarization is where LLMs shine. I use them all the time for that, and much less for code generation. I would say search > summarization > debugging > code gen/image gen
>They don't need to be 100% accurate. Demanding that is unreasonable.
If an intern was routinely making up stuff in the summaries they provided to their bosses, they'd be let go.
> LLMs are terrible at accurately summarizing anything.
I think you are perhaps stuck in 2023?
And yet we are discussing this in the context of a reporter having been fired from Ars Technica for publishing an article which included inaccurate LLM-generated summaries in 2026. How come?
https://news.ycombinator.com/item?id=47226608
Maybe you should read the article? :)
What failed was extracting verbatim quotes, not summarizing.
If you want an LLM to do anything verbatim, it has to be a tool call. So I'm not surprised.
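One way that could look, sketched with a made-up tool: the model never types the quote itself, it can only ask a tool to return an exact substring of the source document, so a fabricated quote simply fails to resolve. The function name and shape here are illustrative, not anyone's real API:

    // Hypothetical "verbatim quote" tool: the model supplies a search string,
    // and the tool either returns the exact matching passage from the source
    // text or refuses. The model never writes the quote itself.
    function extractQuote(sourceText, searchString, contextChars = 120) {
      const idx = sourceText.indexOf(searchString);
      if (idx === -1) {
        return { found: false, error: "No such passage in the source." };
      }
      return {
        found: true,
        quote: sourceText.slice(idx, idx + searchString.length),
        context: sourceText.slice(
          Math.max(0, idx - contextChars),
          Math.min(sourceText.length, idx + searchString.length + contextChars)
        ),
      };
    }

    // Example: a hallucinated quote doesn't resolve.
    const source = "This blog is set up to block AI agents from scraping it.";
    console.log(extractQuote(source, "block AI agents"));          // found: true
    console.log(extractQuote(source, "I welcome AI summarizers")); // found: false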
AI policy with AI usage is always difficult to write and read: lengthy, frontloaded with excuses, values, and big words, followed by more words to fill the gaps between the slices of ugly truth.
AI policy without AI usage is easy to read and write.
> We don't use them. That's it.
All the beating around the bush is because they use AI throughout the process, but want to frame it as not being written by AI (oh, a human signs off on the AI content).
From TFA:
> Our approach comes from two convictions:
Uhuh.
> that AI cannot replace human insight, creativity, and ingenuity
Sure, agreed, no dispute.
> and that these tools, used well, can help professionals do better work.
[[citation needed]]
Prove it. I don't believe you.
Related discussions from a couple months ago:
Ars Technica fires reporter after AI controversy involving fabricated quotes (606 points, 394 comments)
https://news.ycombinator.com/item?id=47226608
Editor's Note: Retraction of article containing fabricated quotations (308 points, 211 comments)
https://news.ycombinator.com/item?id=47026071
\1 AI-generated news is unhuman slop. Crikey is banning it (2024) - Crikey.com.au - https://www.crikey.com.au/2024/06/24/crikey-insider-artifici...
\2 Why Crikey retracted an article that we found out was written with AI help (2026) - https://www.crikey.com.au/2026/03/19/crikey-responds-to-ai-c...
> Yesterday, we published an article by a contributor who later confirmed they used AI in some aspects of its production. This goes against our editorial policies. As a result, we've taken down the story and the preceding three stories in the series.

(\2) is an interesting follow-on from the policy set two years earlier (\1), as the specific piece in question "used AI in some aspects of its production" but was largely very much a human conceived, shaped, and written piece that was only "assisted" by AI.

The Australian Media Watch team looked at this tension closely and felt the rejection was unfair, pointing out that while slop is bad, assistance (subject to terms and conditions) can enhance.
- Media Watch, likely geolocked to AU, might need a proxy - https://www.abc.net.au/mediawatch/episodes/ep-08/106487250
I have had that feeling for a while and Ars had some high-profile slipups over the past few months. This confirms that it is just slop now :(
> Our creative team may use AI tools in the production of certain visual material, but the creative direction and editorial judgment are human-driven.
As opposed to what? This is a little facetious, but what could it possibly mean to have creative direction and editorial judgement without human involvement?
Presumably we're talking about an image generated by a diffusion model or something, but further, an image which is generated without being edited by any human. The prompt used to generate the image isn't written by a human, and it can't really be based on the contents of the (human-authored and edited) article either. No human may select the service or model used, and once generated, the image is published sight unseen, without being reviewed by any human.
If some kind of agentic AI does any of these things it is one which appears ex nihilo, spontaneously appearing without being created or directed by any human.
There's a good post from Aurich in the comments of the article detailing the practical reality of how they (don't) use AI tools in their image work, but as a policy statement this sentence is 100% vibes, 0% actual guidance or restriction.
> Our creative team may use AI tools in the production of certain visual material, but the creative direction and editorial judgment are human-driven.
How is this different from anyone else publishing AI slop images on their blog? Those people also direct the AI through prompting and evaluate the results.
I mean, use AI images, so long as they are not crap, but why keep up this charade of "we're authoring the slop".
Trust, reputation, and credibility will become (even more of) a premium.
Good to know which companies/news to avoid.
Context:
"An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library.
...
I've talked to several reporters, and quite a few news outlets have covered the story. Ars Technica wasn't one of the ones that reached out to me, but I especially thought this piece from them was interesting (since taken down - here's the archive link). They had some nice quotes from my blog post explaining what was going on. The problem is that these quotes were not written by me, never existed, and appear to be AI hallucinations themselves.
This blog you're on right now is set up to block AI agents from scraping it (I actually spent some time yesterday trying to disable that but couldn't figure out how). My guess is that the authors asked ChatGPT or similar to either go grab quotes or write the article wholesale. When it couldn't access the page it generated these plausible quotes instead, and no fact check was performed.
...
Update: Ars Technica issued a brief statement admitting that AI was used to fabricate these quotes" [1].
[1] https://theshamblog.com/an-ai-agent-published-a-hit-piece-on...
Discussion: https://news.ycombinator.com/item?id=47009949
Doesn't need Ars Technica added to the title
It is nice to see, but I fear it will go the same way as it did with printed papers and the internet: I could buy a paper and read it, but why would I?
The same will most likely happen with human-written news versus cheap AI slop news. Why would anyone pay more for higher quality when you can have a cheap, low-quality product?
Look at food, for example. Price is the most important factor in the choice of what you are going to buy. It will probably not happen now, in a few months, or even in a few years, but it will happen if models keep advancing.
> Anyone who uses AI tools in our editorial workflow is responsible for the accuracy and integrity of the resulting work. This responsibility cannot be transferred to colleagues, editors...
This sounds like a direct callout of the incident earlier this year where an apparently sick staff member relied on an AI to reproduce quotes, and it did not. Ars retracted the article and the staff member was fired.
I have felt very ethically uneasy about this because the person was ill, and I emailed the Ars editorial team directly to express concern re labour conditions, and to note that it is the editorial team's responsibility to do things like check quotes.
Of course it is the journalist's responsibility: when you have a job, you do your job by policy (I wonder if this policy existed in writing at the time of the firing?), and it is part of the job to be accurate. But I am also a firm believer in responsibility being greater at higher levels. This sounds like a direct abrogation of journalistic standards by the Ars editorial team.
> and to note that it is the editorial team's responsibility to do things like check quotes.
Publishing things online for free (as Ars does) is a difficult business. I doubt they can realistically afford an "editorial team" that checks quotes. Paying the journalists is expensive enough.
> apparently sick staff member relied on an AI to reproduce quotes
"Apparently sick", you couldn't phrase it more accurately.
Kudos for firing them, the only valid course of action for a publisher.
>This sounds a direct abrogation of journalistic standards by the Ars editorial team.
We depended on an ecosystem of news and journalism to keep our polities informed.
However, if that ecosystem is starving it will increasingly fail to live up to its standards and we can expect these failures to impact us increasingly.
I am not defending bad journalists, nor creating an excuse to tolerate such behavior in the future.
I am describing the macro trend we are facing, the failure state we can expect, and asking what happens if nothing grows to replace it.
The NYT earns more revenue from games than from journalism and ads. Wikipedia is seeing fewer visitors due to AI summaries, and this leads to lower donations. A review site I used went full paywall.
I don't really see how Ars or most other sites will be able to earn revenue and pay salaries in this bot-first environment.