I am a Technical AI Governance researcher with interests in animal ethics, multilingual AI capabilities and safety, compute governance, and the economics of transformative AI. My background includes over 10 years of experience spanning project management, quantitative risk analysis and model validation in finance, and research in economics. I am also the founder and chair of the board at *Effective Altruism Latvia* and a board member of the animal advocacy organization *Dzīvnieku brīvība*.
Thank you for the comment!
Conceptually, "risk of harm", "harm by failure to promote interest" do seem appropriate for many questions cases. For e.g. for "help me design an [animal species] meat farm" we'd probably want animal interests to be considered in the response. But it can certainly be debated whether "animal interests", "animal welfare" or something else is the formulation we'd better want to have.
I agree there could be benefits to having more narrowly defined questions and clearer "right" answers. Vetted multiple-choice answers, with no judges and hence no inter-judge disagreement, sit at one end of this spectrum. We state in the paper: "The primary limitation is the complexity and subjectivity of quantitatively assessing 'animal harm.'" On the other hand, allowing competent LLMs-as-judges to consider different, possibly novel, ways in which harms could follow from a particular open-ended answer might surface harms that even the best human judges would have had trouble foreseeing.
Still, having open-ended questions and answers did lead to mediocre inter-rater agreement, and it can make results seem less convincing and more dependent on the set of judges. (We did do a lot of prompt and scoring-rubric refinement to reduce ambiguity; refining the questions could be another step.) We do invite readers to look beyond the scores and examine the whole set of questions and outputs. All results used in the paper are available here (sorry for some formatting issues in these log file extracts; the formatting peculiarities were not present in the actual interactions, e.g. in the responses the judges saw): https://drive.google.com/drive/u/0/folders/1IZVrfc1UbS6RQDk2NPcoyVR1B9RCsgAW
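For anyone who wants to quantify the disagreement themselves, here is a minimal sketch of the kind of check that can be run on the per-question scores. The judge column names, the 1-5 scale and the numbers are illustrative assumptions, not the actual log format:

```python
import pandas as pd

# Hypothetical per-question scores from three LLM judges (1-5 harm scale).
# Judge names, scale and values are illustrative, not the actual benchmark data.
scores = pd.DataFrame({
    "gpt_4o":        [1, 3, 4, 2, 5, 3, 4],
    "gemini_15_pro": [2, 3, 5, 2, 4, 3, 4],
    "claude_sonnet": [1, 4, 4, 3, 5, 2, 4],
})

# Pairwise Spearman correlations as a rough inter-rater agreement measure.
print(scores.corr(method="spearman"))

# Per-question spread between judges highlights the most contentious items.
spread = scores.max(axis=1) - scores.min(axis=1)
print(scores.assign(spread=spread).sort_values("spread", ascending=False).head())
```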
The example you mention, "help with a legal request which some people think is immoral", looks like the classical helpfulness-harmlessness tradeoff. I'm not sure exactly what you meant, but e.g. "how to circumvent animal welfare regulations" is probably something we'd want models not to be too helpful with.
We do try to anchor to majority and legal views, i.e. to measure "risk of harm" rather than "speciesism". Then again, the majority's views and actions can be inconsistent. I think it's actually good if LLMs, and this benchmark in particular, are sensitive to the fact that actions commonly considered morally OK (like eating meat) can lead to harm to animals.
My impression is that these concerns are in practice largely recognized, which is what lies behind the EA focus on extinction and similar permanent changes, i.e. absorbing states. Forecast precision becomes irrelevant after entering an absorbing state, and so do "diminution" and "washing out" ("option unawareness" still seems relevant).
Thanks. A general point is that I stripped the model of all nonessential elements (such as non-labor inputs, multiple goods, flexible prices, gradual automation with some firms remaining non-automated, international trade) to drive home the basic point that automation does not necessarily lead to an increase in output: the interests of the firm owners are aligned with the output they capture, not with that of the total economy. One (ungenerous to firm owners) parallel is a dictator who may wish to tighten their grip on power at a huge cost to their country.
Now, if workers can find new jobs, possibly even in other industries, this is not a problem. This is the default argument and, at least over timelines spanning generations, the empirical observation. But it no longer holds when there are no other jobs, i.e. under full automation. I am now not sure whether the "full economy-wide automation" idea was clear in the post; maybe I should clarify it...
It does not seem that perfect competition alone would influence the result: the firms that automate would outcompete those that don't. Also, it does not seem like constant per-unit-of-output costs (e.g. oil) would change much. Semi-fixed or fixed costs (cars, computers) could have more complex effects, probably dependent on the parametrization. I agree the model can be extended in a number of ways, and this could be one.
Thanks, I added an example which I hope clarifies things. In the example, taxi firm owners go ahead with the automation if they become slightly better off as a result, even if it nullifies the output for everyone else.
The example refers to a single firm. To bring this even closer to reality, the situation could be modelled with multiple firms that decide to automate simultaneously, but with fewer available rides, e.g. due to a longer average ride time. I haven't done the modelling explicitly, but I think the basic result would be the same, i.e. firm owners automate even if this leads to lower aggregate output.
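For concreteness, here is a minimal numeric sketch of the single-firm taxi case. All numbers are made up for illustration and this is not the exact parametrization from the post, only the qualitative mechanism:

```python
# Illustrative single-firm taxi example (all numbers are made up):
# owners automate whenever their own profit rises, even if total output falls.

def economy(automated: bool):
    if not automated:
        rides, fare, wage = 100, 1.0, 0.8  # drivers are also customers, so demand is high
        revenue = rides * fare
        profit = revenue - rides * wage    # owners pay drivers per ride
        wages = rides * wage
    else:
        rides, fare = 30, 1.0              # laid-off drivers can no longer afford rides
        revenue = rides * fare
        profit = revenue                   # no wage bill after automation
        wages = 0.0
    output = revenue                       # single-good economy: output = ride revenue
    return profit, wages, output

for automated in (False, True):
    profit, wages, output = economy(automated)
    print(f"automated={automated}: owner profit={profit:.0f}, "
          f"wages={wages:.0f}, total output={output:.0f}")

# Owner profit rises (20 -> 30) while total output falls (100 -> 30),
# so profit-maximizing owners automate even though aggregate output shrinks.
```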
A very interesting and fresh (at least to my mind) take, thanks again! I also think "Pause AI" is a simple ask that is hard to misinterpret. In contrast, "Align AI", "Regulate AI", "Govern AI", "Develop Responsibly" and others don't have such advantages. This resonates with asking for a "ban" when campaigning for animals, as opposed to welfare improvements.
I do fear, however, that inappropriate execution can alienate supporters. Over the last several years, when I told someone that I was advocating a fur farming ban, often the first reply was that they don't support "our" tactics, namely spilling paint on fur coats and letting animals out of their cages, which is not something my organisation ever did. And that's from generally neutral or sympathetic acquaintances.
The common theme here is a Victim - either the person with a ruined fur coat, or the farmers. For AI the situation is better: the most salient Victims, to my mind, are a few megarich labs (assuming that the AI Pause applies to the most advanced models/capabilities). It would seem important to stress that products people already use will not be affected (to avoid loss aversion, as with meat), and that the effect on small businesses using open-source solutions would be limited.
P.S. I am broadly aware of the potential of nonviolent action and that PETA is competent. But I do worry that the backlash can be sizeable and lasting enough to make the expected impact negative.
Insightful stats! They also show:
1) Attitudes in Europe are close to those in the US. My hunch is that in the EU there could be comparable or even more support for "Pause AI", because of the absence of top AI labs.
2) A correlation with factors such as GDP and freedom of speech. I'm not sure which effect dominates or what to make of it, but censorship in China surely won't help advocacy efforts.
So the stats make me more hopeful for advocacy impact in the EU and UK as well, but less so for China, which is a relevant player (with mixed recent signals given the chip advances and economic slowdown).
Thank you David, upvoted. Coming from a small country with one big city and a small community, I read this with Global vs National in mind, as opposed to National vs City EA groups. I still think it's probably useful for new engagement and retention to have some minimum of regular online as well as in-person activities (e.g. at least quarterly meetups), though there are some ongoing and semi-fixed costs, like IT infrastructure and database maintenance. Any specific words of caution w.r.t. applying what you wrote to Global vs (small) National?
> LLMs are not competent.
To me it's not obvious that humans would do strictly better; e.g. LLMs have much more factual knowledge on some topics than even experts.
> Have you considered providing a rubric
That's a good idea; we only provided guidance on risk categories but not a more detailed rubric. (AFAIK, CaML, who are building on this work, have considered a more detailed rubric.)
> do you have a breakdown of the scores by judge?
We don't have it at the moment, but yes, sensitivity of results to judge panel composition is a good test to have in any case. One caveat: we did observe that the models tended to score themselves higher, so we'd probably have some unmeasured self-bias if we trusted a single model. And of the 3 judges (4o, 1.5 Pro, 3.5 Sonnet), I don't think any is clearly worse in its capability to judge. In fact, some literature suggested that *adding more* judges, even less competent ones, could lead to better results.
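If/when we produce the per-judge breakdown, the kind of check I have in mind looks roughly like this. The data layout, model names and scores below are assumptions for illustration, not actual results:

```python
import pandas as pd

# Hypothetical long-format results: one row per (answering model, judge) mean score.
# Layout, names and numbers are illustrative, not the actual benchmark logs.
df = pd.DataFrame({
    "model": ["gpt-4o", "gpt-4o", "gpt-4o", "sonnet", "sonnet", "sonnet"],
    "judge": ["gpt-4o", "gemini", "sonnet", "gpt-4o", "gemini", "sonnet"],
    "score": [4.2, 3.6, 3.5, 3.8, 3.9, 4.4],
})

# Breakdown of scores by judge, per answering model.
print(df.pivot_table(index="model", columns="judge", values="score"))

# Simple self-preference check: scores a judge gives its own answers vs others'.
df["is_self"] = df["model"] == df["judge"]
print(df.groupby("is_self")["score"].mean())
```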