Noeon Research’s architecture is intended to support pragmatic communication: the system should make common-sense decisions about incomplete specifications and ask clarifying questions when necessary. The problem of pragmatic communication consists of four parts:
1. Detecting insufficient knowledge. LLMs are not designed or trained to assess their own knowledge or to ask questions about what knowledge they lack. This makes current code synthesis systems, such as GitHub Copilot or Amazon CodeWhisperer, hard to use: the system always produces a completion from whatever information it is given, and it is up to the user to judge, from the quality of the output, how much context the system needs. The system cannot assess the quality of its own completions, and it cannot decline to complete when its context is insufficient.
2. Asking questions. It is possible to train an LLM to detect that it is missing information and to consult external sources [1] or ask a question directly, but the current generation of code assistants does not do this. Even if it did, the necessary information might not fit into a limited context window. A minimal sketch of such a detect-and-ask loop is given after this list.
3. Theory of Mind. In a human conversation, a person formulates sentences in the way they think is most appropriate for the listener: a developer explains their work differently to a colleague than to a client. A pragmatic communicator asks and answers questions in a way the other person can easily understand. For Noeon’s system to be useful, it should communicate in natural language that a developer can readily understand.
While it has been speculated that the latest LLMs possess Theory of Mind abilities [2] and can therefore model the beliefs of a person, this was only demonstrated in very limited tests involving the completion of children’s stories. In private testing, NR found that this behaviour did not occur in more complicated examples, particularly where the model was asked to explain the reasoning behind the characters’ decisions. For pragmatic communication, a system has to model not only the person it speaks with but also how that person perceives the system. MIT researchers attacked this problem directly with a two-stage probability distribution in an RL setup [3]; a general form of such nested modelling is sketched after this list. While their pragmatic system needed fewer explanations to accomplish a task, this too was demonstrated only in a limited setting.
4. Interpretability. Even if an LLM produces bug-free code that matches the specification, the system cannot explain what the code does or how it works [4]. Moreover, the result is uninterpretable: an LLM cannot justify why it gave this particular answer and not another. This is especially concerning given that code assistants often reproduce bugs they learned from code examples [5].
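The detect-and-ask behaviour described in points 1 and 2 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hypothetical `complete` function returning a completion with per-token log-probabilities, the `ask_user` clarification channel, the confidence threshold, and the prompt wording are not part of any existing assistant’s API.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Completion:
    text: str
    token_logprobs: List[float]  # log-probability of each generated token


def mean_confidence(completion: Completion) -> float:
    """Geometric-mean token probability: a crude proxy for the model's
    confidence in its own completion."""
    if not completion.token_logprobs:
        return 0.0
    avg_logprob = sum(completion.token_logprobs) / len(completion.token_logprobs)
    return math.exp(avg_logprob)


def complete_or_ask(
    spec: str,
    complete: Callable[[str], Completion],  # hypothetical model call
    ask_user: Callable[[str], str],         # hypothetical clarification channel
    threshold: float = 0.6,
    max_rounds: int = 3,
) -> str:
    """Return a completion, or ask clarifying questions while confidence is low."""
    context = spec
    for _ in range(max_rounds):
        completion = complete(context)
        if mean_confidence(completion) >= threshold:
            return completion.text
        # Confidence is low: ask the user for the missing information
        # instead of silently returning a low-quality completion.
        question = complete(
            "The following specification seems incomplete:\n"
            f"{context}\n"
            "Ask one short question about the most important missing detail."
        ).text
        context += "\n# clarification: " + ask_user(question)
    return complete(context).text  # best effort after max_rounds
```

The point of the sketch is the control flow, not the confidence measure: average token probability is a weak signal, and a production system would need a calibrated model of its own knowledge, which is exactly what current assistants lack.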
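The “two-stage probability distribution” mentioned in point 3 can be read, under our assumption, as nested speaker and listener models in the style of the Rational Speech Acts framework; the equations below are a general illustration of that idea, not the exact construction used in [3].

$$
L_0(m \mid u) \propto P(m)\,[\![\,u \text{ is true of } m\,]\!], \qquad
S_1(u \mid m) \propto \exp\big(\alpha\,(\log L_0(m \mid u) - \mathrm{cost}(u))\big), \qquad
L_1(m \mid u) \propto P(m)\,S_1(u \mid m).
$$

A literal listener $L_0$ interprets an utterance $u$ as the meanings $m$ consistent with it; a pragmatic speaker $S_1$ chooses utterances that make the intended meaning easy for $L_0$ to recover at low cost; a pragmatic listener $L_1$ interprets $u$ by reasoning about $S_1$. Each agent thus models the other’s model of itself, which is the reflexive requirement described in point 3.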
To be practically applicable to work on IT projects, the system must have a model of its own knowledge and a representation of its reasoning process. While it would be desirable for the system to also model the person it talks to, so that its explanations are tailored to that person, this problem exceeds the scope of NR’s proof of concept.