So you know how the transformer works, and you know basic ML/DL, and you want to learn more about LLMs. One way to go is looking into the various "algorithmic" stuff (optimization algorithms, RL, DPO, etc). Lots of material exists on that. But the interesting stuff is (in my opinion, at least) not there.
This is an attempt to collect a list of academic (or academic-like) materials that explore LLMs from other directions, and focus on the non-ML-algorithmic aspects.
Structured-chain-of-thought breaks some basic language-use principles
Are OpenAI training models in a way that encourages security risks?
Today's topic is structured outputs: how to produce them, their interplay with chain-of-thought, and a potential security risk this opens up.
Structured Outputs
When using an LLM programmatically as part of a larger system or process, it is useful to have the model produce outputs in a structured format that is easy to parse programmatically. Formatting the output as JSON makes a lot of sense in this regard, and the commercial LLMs are trained to produce JSON outputs according to your specification.
So, for example, instead of asking the model to produce a list of 10 items, which may be tricky to parse, I could ask it to return the answer as a JSON list of 10 strings.
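To make this concrete, here is a minimal sketch using the OpenAI Python client; the model name and prompts are placeholders, and other providers expose similar JSON-mode options:

```python
import json
from openai import OpenAI  # official OpenAI Python client

client = OpenAI()

# Ask for a JSON object and enable JSON mode, so the reply is
# machine-parseable instead of free-form prose.
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": 'Answer with a JSON object of the form {"items": [<10 strings>]}.'},
        {"role": "user", "content": "Name 10 common kitchen utensils."},
    ],
    response_format={"type": "json_object"},  # request syntactically valid JSON
)

items = json.loads(resp.choices[0].message.content)["items"]
```

Parsing the structured variant is then a single `json.loads` call, whereas the free-form list would need ad-hoc string handling.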
Are multi-LLM-agent systems a thing? Yes they are. But.
Yoav Goldberg, Nov 24, 2024
This piece started with a pair of twitter and bluesky posts:
let's talk about "agents" (in the LLM sense). there's a lot of buzz around "multi-agent" systems where agents collaborate, but... i don't really get how it differs from thinking of a single agent with multiple modes of operation. what are the benefits of modeling it as multi-agent?
Can you tell an LLM "don't hallucinate" and expect it to work? My gut reaction was "oh, this is so silly", but upon some reflection, it really isn't. There is actually no reason why it shouldn't work, especially if it was preference-fine-tuned on instructions with "don't hallucinate" in them, and if it is a recent commercial model, it likely was.
What does an LLM need in order to follow an instruction? It needs two things (a concrete sketch follows the list):

1. An ability to perform the task. Something in its parameters/mechanisms should be indicative of the task objective, in a way that can be influenced. (In our case, it should "know" when it hallucinates, and/or should be able to change or adapt its behavior to reduce the chance of hallucinations.)
2. An ability to ground the instruction: the model should be able to associate the requested behavior with its parameters/mechanisms. (In our case, the model should associate "don't hallucinate" with the behavior related to point 1.)
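As a hedged sketch of the implied experiment: ask the same question with and without the instruction, and compare behavior. The wrapper below reuses the OpenAI client from the earlier sketch, but any chat API would do, and the model name is again a placeholder.

```python
from openai import OpenAI

client = OpenAI()

def ask(system: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

# A question with no correct answer: the first Nobel Prizes were awarded in 1901.
question = "Who won the 1897 Nobel Prize in Physics?"

baseline = ask("You are a helpful assistant.", question)
hedged = ask("You are a helpful assistant. Do not hallucinate; "
             "if you are unsure, say so.", question)

# If the model can detect its own uncertainty (point 1) and "don't hallucinate"
# is grounded in that mechanism (point 2), `hedged` should more often refuse.
print(baseline, hedged, sep="\n---\n")
```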
Somewhat surprisingly, I found myself agreeing with some core aspects of her argument. Perhaps less surprisingly, there is also a substantial part with which I strongly disagree. This text is a response to the address and, beyond just responding, may also shed some light on what ACL is and what NLP is. I of course welcome discussion of these topics, either in the comments section here (unfortunately not very convenient) or on Twitter (not convenient in a different way). OK, let's go.
Grand-master Level Chess without Search: Modeling Choices and their Implications
Yoav Goldberg, February 2024.
Researchers at Google DeepMind released a paper about a learned system that is able to play blitz chess at a grandmaster level, without using search. This is interesting and imagination-capturing, because up to now, computer chess systems that play at this level, whether based on machine learning or not, all used a search component.[^1]
Indeed, my first reaction when reading the paper was to tweet "wow, crazy and interesting". I still find it crazy and interesting, but upon a closer read, it may not be as crazy and as interesting as I initially thought. Many reactions on twitter, reddit, etc. were super-impressed, drawing implications about the projected learning abilities of AI systems, the ability of neural networks to learn semantics from observations, and so on, which are really over-the-top. The paper does not claim any of them, but they are still perceived that way by many.
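To make "without search" concrete, here is a toy sketch of what such an engine's move selection reduces to. `predicted_value` is a hypothetical stand-in for the paper's learned predictor, and the `chess` package (python-chess) is used only for board state and legal-move generation:

```python
import chess  # python-chess, used only for board state and legal moves

def predicted_value(board: chess.Board, move: chess.Move) -> float:
    """Hypothetical stand-in for the paper's transformer, which maps a
    (board, move) pair directly to a predicted win probability."""
    raise NotImplementedError

def play_without_search(board: chess.Board) -> chess.Move:
    # One network evaluation per legal move, then argmax. That is the whole
    # "engine": no game tree, no minimax, no Monte-Carlo rollouts.
    return max(board.legal_moves, key=lambda move: predicted_value(board, move))
```

A traditional engine wraps an evaluation like this in a deep tree search; the striking claim is that grandmaster-level blitz play survives removing that outer loop.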
Putting papers on arxiv early vs the protections of blind review
The tension between putting papers on arxiv as soon as possible and the double-blind peer-review process is ever-present. Some people favor the fast pace of progress facilitated by making papers available before or during the peer-review process, while others favor the protections of double-blind reviewing (actually, of author-blind reviewing; reviewer anonymity is not part of the debate).
As I now serve on an ACL committee tasked with assessing this tension, I've spent a longer-than-usual time thinking about it, and came up with an analysis which I find informative and which others may also find useful. These are my personal opinions, and are not representative of the committee, though naturally, I will share them there as well.
The analysis examines the dynamics of review bias due to author identities being exposed through a pre-print, and its effect on other authors at the same conference. The conclusion, as usual with me,
With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback".
I was puzzled for a while as to why RL (reinforcement learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language-model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much
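As a schematic illustration of the contrast (not Schulman's or anyone's actual training code), here is what the two signals look like side by side; `model`, `reward_model`, and `sample_with_logprob` are hypothetical placeholders:

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids, answer_ids):
    """Learning from demonstrations: maximize the likelihood of a human-written
    answer token by token (plain cross-entropy, i.e. instruction fine-tuning)."""
    inputs = torch.cat([prompt_ids, answer_ids[:-1]])
    logits = model(inputs)[-len(answer_ids):]  # positions predicting the answer
    return F.cross_entropy(logits, answer_ids)

def rl_loss(model, prompt_ids, reward_model):
    """RLHF-style signal: sample an answer from the model's *own* distribution
    and scale its log-probability by a learned reward (a bare-bones REINFORCE
    estimate, ignoring baselines, KL penalties, PPO clipping, etc.)."""
    answer_ids, logprob = sample_with_logprob(model, prompt_ids)  # hypothetical helper
    reward = reward_model(prompt_ids, answer_ids)
    return -(reward * logprob)
```

The conceptual difference: the supervised signal pushes the model toward the human answer regardless of what the model itself knows, while the RL signal grades the model's own outputs, which is where the argument about hallucination comes in.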
When I first heard of Searle's "Chinese Room" argument, some twenty+ years ago, I had roughly the following dialog:
"Imagine there is a room with instructions, and someone slips a note written in chinese into this room, and you don't know chinese, but you follow the instructios in the room and based on the instructions you produce a different note in chinese and send it back out, and whoever sends you the original note thinks your note is a perfect response."
Oh, so the person outside doesn't know chinese either?
"No no, they do know chinese, you produced a perfect answer"