So you know how the transformer works, and you know basic ML/DL, and you want to learn more about LLMs. One way to go is looking into the various "algorithmic" stuff (optimization algorithms, RL, DPO, etc). Lots of material exists on that. But the interesting stuff is (in my opinion, at least) not there.
This is an attempt to collect a list of academic (or academic-like) materials that explore LLMs from other directions, and focus on the non-ML-algorithmic aspects.
Structured-chain-of-thought breaks some basic language-use principles
Are OpenAI training models in a way that encourages security risks?
Today's topic is structured outputs: how to produce them, their interplay with chain-of-thought, and a potential security risk this opens up.
Structured Outputs
When using an LLM programmatically as part of a larger system or process, it is useful to have the model produce outputs in a structured format that is easy to parse programmatically. Formatting the output as JSON makes a lot of sense in this regard, and the commercial LLMs are trained to produce JSON outputs according to your specification.
So, for example, instead of asking the model to produce a list of 10 items, which may be tricky to parse, I could ask it to return the answer as a JSON list of 10 strings.
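To make this concrete, here is a minimal sketch using the OpenAI Python client; the model name and prompts are placeholders, and other providers expose similar JSON-mode options:

```python
import json
from openai import OpenAI  # official OpenAI Python client

client = OpenAI()

# Ask for a JSON object and enable JSON mode, so the reply is
# machine-parseable instead of free-form prose.
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": 'Answer with a JSON object of the form {"items": [<10 strings>]}.'},
        {"role": "user", "content": "Name 10 common kitchen utensils."},
    ],
    response_format={"type": "json_object"},  # request syntactically valid JSON
)

items = json.loads(resp.choices[0].message.content)["items"]
```

Parsing the structured variant is then a single `json.loads` call, whereas the free-form list would need ad-hoc string handling.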
Are multi-LLM-agent systems a thing? Yes they are. But.
Yoav Goldberg, Nov 24, 2024
This piece started with a pair of twitter and bluesky posts:
let's talk about "agents" (in the LLM sense). there's a lot of buzz around "multi-agent" systems where agents collaborate, but... i don't really get how it differs from thinking of a single agent with multiple modes of operation. what are the benefits of modeling it as multi-agent?
Can you tell an LLM "don't hallucinate" and expect it to work? My gut reaction was "oh, this is so silly", but upon some reflection, it really isn't. There is actually no reason why it shouldn't work, especially if it was preference-fine-tuned on instructions with "don't hallucinate" in them, and if it is a recent commercial model, it likely was.
What does an LLM need in order to follow an instruction? It needs two things (a concrete sketch follows the list):

1. An ability to perform the task. Something in its parameters/mechanisms should be indicative of the task objective, in a way that can be influenced. (In our case, it should "know" when it hallucinates, and/or should be able to change or adapt its behavior to reduce the chance of hallucinations.)
2. An ability to ground the instruction: the model should be able to associate the requested behavior with its parameters/mechanisms. (In our case, the model should associate "don't hallucinate" with the behavior related to point 1.)
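As a hedged sketch of the implied experiment: ask the same question with and without the instruction, and compare behavior. The wrapper below reuses the OpenAI client from the earlier sketch, but any chat API would do, and the model name is again a placeholder.

```python
from openai import OpenAI

client = OpenAI()

def ask(system: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

# A question with no correct answer: the first Nobel Prizes were awarded in 1901.
question = "Who won the 1897 Nobel Prize in Physics?"

baseline = ask("You are a helpful assistant.", question)
hedged = ask("You are a helpful assistant. Do not hallucinate; "
             "if you are unsure, say so.", question)

# If the model can detect its own uncertainty (point 1) and "don't hallucinate"
# is grounded in that mechanism (point 2), `hedged` should more often refuse.
print(baseline, hedged, sep="\n---\n")
```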
Somewhat surprisingly, I found myself agreeing with some core aspects of her argument. Perhaps less surprisingly, there is also a substantial part with which I strongly disagree. This text is a response to the address and, beyond just responding, may also shed some light on what ACL is and what NLP is. I of course welcome discussion of these topics, either in the comments section here (unfortunately not very convenient) or on Twitter (not convenient in a different way). OK, let's go.
Grand-master Level Chess without Search: Modeling Choices and their Implications
Yoav Goldberg, February 2024.
Researchers at Google DeepMind released a paper about a learned system that is able to play blitz chess at a grandmaster level, without using search. This is interesting and imagination-capturing, because up to now, computer chess systems that play at this level, whether based on machine learning or not, all used a search component.[^1]
Indeed, my first reaction when reading the paper was to tweet "wow, crazy and interesting". I still find it crazy and interesting, but upon a closer read, it may not be as crazy and as interesting as I initially thought. Many reactions on twitter, reddit, etc. were super-impressed, drawing implications about the projected learning abilities of AI systems, the ability of neural networks to learn semantics from observations, and so on, which are really over-the-top. The paper does not claim any of them, but they are still perceived that way by many.
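To make "without search" concrete, here is a toy sketch of what such an engine's move selection reduces to. `predicted_value` is a hypothetical stand-in for the paper's learned predictor, and the `chess` package (python-chess) is used only for board state and legal-move generation:

```python
import chess  # python-chess, used only for board state and legal moves

def predicted_value(board: chess.Board, move: chess.Move) -> float:
    """Hypothetical stand-in for the paper's transformer, which maps a
    (board, move) pair directly to a predicted win probability."""
    raise NotImplementedError

def play_without_search(board: chess.Board) -> chess.Move:
    # One network evaluation per legal move, then argmax. That is the whole
    # "engine": no game tree, no minimax, no Monte-Carlo rollouts.
    return max(board.legal_moves, key=lambda move: predicted_value(board, move))
```

A traditional engine wraps an evaluation like this in a deep tree search; the striking claim is that grandmaster-level blitz play survives removing that outer loop.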
Putting papers on arxiv early vs the protections of blind review
The tension between putting papers on arxiv as soon as possible and the double-blind peer-review process is ever-present. Some people favor the fast pace of progress facilitated by making papers available before or during the peer-review process, while others favor the protections of double-blind reviewing (actually, of author-blind reviewing; reviewer anonymity is not part of the debate).
As I now serve on an ACL committee tasked with assessing this tension, I've spent a longer-than-usual time thinking about it, and came up with an analysis which I find informative and which others may also find useful. These are my personal opinions, and are not representative of the committee, though naturally, I will share them there as well.
The analysis examines the dynamics of review bias due to author identities being exposed through a pre-print, and its effect on other authors at the same conference. The conclusion, as usual with me,
With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback".
I was puzzled for a while as to why RL (reinforcement learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language-model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much
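As a schematic illustration of the contrast (not Schulman's or anyone's actual training code), here is what the two signals look like side by side; `model`, `reward_model`, and `sample_with_logprob` are hypothetical placeholders:

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids, answer_ids):
    """Learning from demonstrations: maximize the likelihood of a human-written
    answer token by token (plain cross-entropy, i.e. instruction fine-tuning)."""
    inputs = torch.cat([prompt_ids, answer_ids[:-1]])
    logits = model(inputs)[-len(answer_ids):]  # positions predicting the answer
    return F.cross_entropy(logits, answer_ids)

def rl_loss(model, prompt_ids, reward_model):
    """RLHF-style signal: sample an answer from the model's *own* distribution
    and scale its log-probability by a learned reward (a bare-bones REINFORCE
    estimate, ignoring baselines, KL penalties, PPO clipping, etc.)."""
    answer_ids, logprob = sample_with_logprob(model, prompt_ids)  # hypothetical helper
    reward = reward_model(prompt_ids, answer_ids)
    return -(reward * logprob)
```

The conceptual difference: the supervised signal pushes the model toward the human answer regardless of what the model itself knows, while the RL signal grades the model's own outputs, which is where the argument about hallucination comes in.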
When I first heard of Searle's "Chinese Room" argument, some twenty+ years ago, I had roughly the following dialog:
"Imagine there is a room with instructions, and someone slips a note written in chinese into this room, and you don't know chinese, but you follow the instructios in the room and based on the instructions you produce a different note in chinese and send it back out, and whoever sends you the original note thinks your note is a perfect response."
Oh, so the person outside doesn't know chinese either?
"No no, they do know chinese, you produced a perfect answer"