Noeon Research

Knowledge Representation

Aug 18, 2023

Completing an IT project efficiently, whether automated or not, requires good Knowledge Representation. Noeon Research designs its Knowledge Representation to handle different notions at different levels of abstraction. For example, structured entities such as code, configuration, and data models are treated at a different level of abstraction from facts and negations such as requirements, architectural decisions, and constraints; these, in turn, must be treated at a different level from causal relations, for instance how workload affects performance and memory consumption. At a still higher level lies the relation of the project's objectives to the objectives of adjacent projects, and so on.

LLMs represent knowledge as learned weights in vast neural networks that approximate a conditional probability distribution. This representation proves to be very versatile. For instance, ChatGPT natively answers programming questions and produces code in various programming languages. Surprisingly, ChatGPT often performs on par with, or even better than, fine-tuned systems like Copilot or CodeWhisperer [1], presumably due to a larger context window and cross-domain knowledge transfer. However, LLM-based systems struggle with hallucinations [2]: they make up facts and invent non-existent functions and APIs.

For an enterprise system to be trustworthy, it must give correct answers wherever correct answers exist, and it must be possible to check their correctness. To achieve this, we need much more structured representations.
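As a small illustration of what checking correctness can look like for generated code, the sketch below parses a snippet and reports references to module attributes that do not exist. It is a minimal example under stated assumptions, not a description of Noeon Research's system: the helper name `find_unknown_attributes` and the sample snippet are hypothetical, and only the simple pattern `import module` followed by `module.attribute` is handled.

```python
import ast
import importlib


def find_unknown_attributes(source: str) -> list[str]:
    """Return dotted names such as 'math.sqroot' that the imported module lacks."""
    tree = ast.parse(source)

    # Map local alias -> module name for `import x` and `import x as y`.
    modules = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules[alias.asname or alias.name] = alias.name

    unknown = []
    for node in ast.walk(tree):
        # Only the simple pattern `<module alias>.<attribute>` is checked.
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            module_name = modules.get(node.value.id)
            if module_name is None:
                continue
            # Assumption: importing the module is safe in this environment.
            module = importlib.import_module(module_name)
            if not hasattr(module, node.attr):
                unknown.append(f"{module_name}.{node.attr}")
    return unknown


# Hypothetical generated snippet: math.sqrt exists, math.sqroot does not.
generated = "import math\nprint(math.sqrt(2), math.sqroot(2))\n"
print(find_unknown_attributes(generated))  # ['math.sqroot']
```

A check like this only catches one narrow class of hallucination (names that do not exist); it says nothing about whether an existing function is called with the right arguments or semantics.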

To overcome the opacity of the weights and biases in neural network layers, researchers apply symbolic distillation [3] to recover the structure of an LLM's knowledge in the form of a Knowledge Graph. Being symbolic, structured, and explicit, Knowledge Graphs support direct reasoning about causal relationships and fact-checking. However, it is unclear whether this technique can be effectively transferred from common-sense general knowledge to domain-specific knowledge like programming.
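To make "direct reasoning about causal relationships and fact-checking" concrete, here is a minimal sketch of a Knowledge Graph stored as subject-relation-object triples. The class `KnowledgeGraph`, the relation name `affects`, and the `request_rate` node are illustrative assumptions; only the workload example comes from the text above.

```python
from collections import defaultdict

# Illustrative triples; "request_rate" is a hypothetical addition.
TRIPLES = [
    ("request_rate", "affects", "workload"),
    ("workload", "affects", "performance"),
    ("workload", "affects", "memory_consumption"),
]


class KnowledgeGraph:
    def __init__(self, triples):
        self.triples = set(triples)
        # Index: (subject, relation) -> set of objects.
        self.outgoing = defaultdict(set)
        for subject, relation, obj in self.triples:
            self.outgoing[(subject, relation)].add(obj)

    def has_fact(self, subject, relation, obj):
        """Fact-checking: is this exact statement recorded in the graph?"""
        return (subject, relation, obj) in self.triples

    def reachable(self, start, relation):
        """Causal reasoning: everything `start` influences, directly or not."""
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in self.outgoing.get((node, relation), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


kg = KnowledgeGraph(TRIPLES)
print(kg.has_fact("workload", "affects", "performance"))   # True
print(kg.has_fact("performance", "affects", "workload"))   # False
print(sorted(kg.reachable("request_rate", "affects")))
# ['memory_consumption', 'performance', 'workload']
```

Because every statement is explicit, both answers can be traced back to specific triples, which is not possible with knowledge held only in network weights.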

In particular, if compressing a corpus into an LLM's internal representation loses information about relationships between domain objects, that information cannot be recovered in the Knowledge Graph. This matters little in common-sense reasoning, where progress is measured in percentage points of accuracy, but it matters much more in software engineering, where a single incorrect API call can crash the program.

It is tempting to think that recovering a symbolic representation of knowledge entirely obviates the need for an LLM. However, instructions are usually given in natural language, which needs to be translated into a graph query language to interact with the symbolic knowledge base. LLMs are the best tool currently known for this kind of translation.
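The division of labour described above can be sketched as two steps: a translation step that turns a question into a structured query, and a symbolic step that evaluates the query against the knowledge base. In the sketch below the translation step is a deliberately trivial, rule-based placeholder standing in for an LLM; the function names, the triple-pattern query format, and the sample data are all assumptions made for illustration.

```python
import re

# Illustrative knowledge base of (subject, relation, object) triples.
TRIPLES = {
    ("workload", "affects", "performance"),
    ("workload", "affects", "memory_consumption"),
    ("cache_size", "affects", "memory_consumption"),
}


def translate(question: str):
    """Placeholder for the LLM translation step.

    Handles one fixed question shape and returns a triple pattern in which
    None marks the unknown to be filled in by the knowledge base.
    """
    match = re.fullmatch(r"what does (\w+) affect\?", question.strip().lower())
    if match is None:
        raise ValueError("question shape not supported by this stand-in")
    return (match.group(1), "affects", None)


def run_query(pattern, triples):
    """Symbolic step: return all triples that match the pattern."""
    return sorted(
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    )


query = translate("What does workload affect?")
print(query)  # ('workload', 'affects', None)
print(run_query(query, TRIPLES))
# [('workload', 'affects', 'memory_consumption'), ('workload', 'affects', 'performance')]
```

The point of the split is that only the translation step is probabilistic; once a query exists, the answer is computed exactly and can be checked against the stored triples.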

Ontologies [4] are the best-known form of structured Knowledge Representation. However, there is no universal philosophical and methodological approach to ontology construction, which results in a multitude of mutually incompatible domain-specific ontologies [5]. Moreover, because most ontologies are based on Description Logic [6], they are poorly suited to representing procedural (algorithmic) knowledge, which severely limits their usefulness for automatic code transformation.

References

[1] B. Yetiştiren, I. Özsoy, M. Ayerdem, and E. Tüzün, “Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT”, Apr. 21, 2023.

[2] Z. Ji et al., “Survey of Hallucination in Natural Language Generation”, ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, Mar. 2023.

[3] P. West et al., “Symbolic Knowledge Distillation: from General Language Models to Commonsense Models”, Nov. 28, 2022.

[4] S. Staab and R. Studer, “What Is an Ontology?”, in Handbook on Ontologies, Springer Science & Business Media, 2013.

[5] S. Borgo and P. Hitzler, “Some Open Issues After Twenty Years of Formal Ontology”, 2018, pp. 1–9.

[6] L. F. Sikos, “Description Logics: Formal Foundation for Web Ontology Engineering”, in Description Logics in Multimedia Reasoning, 2017.

Noeon Research UK Ltd is a registered company in England and Wales. Registration number: 16093898. VAT registration number: 490 4632 84.
C/O Mackrell Solicitors, 60 St Martins Lane, Covent Garden, London, United Kingdom, WC2N 4JS.