Noeon Research
Initial Population of Knowledge Base

For the Noeon Research system to be applicable, it needs to automate the processing and structuring of organisational knowledge. It should convert project specifications, documentation, and code into a rich internal knowledge representation capable of handling both highly structured and natural language data. This representation should connect to a large body of pre-existing common-sense and IT-specific knowledge so that the system can learn project specifics quickly and automatically.

Every knowledge-based system needs an initial population of its knowledge base, whatever form that base takes, whether an explicit knowledge graph or non-interpretable LLM weights. Cyc, arguably the most comprehensive knowledge graph and symbolic reasoning system, has been in development for almost 40 years, and much of that time was spent inputting handcrafted facts and rules [1]. By contrast, LLMs require only a few weeks of training distributed across a fleet of machines. But while ontologies [2] and knowledge graphs are laborious to build, LLMs are opaque and hard to interpret or reason about.

However, it is possible to automatically extract knowledge from LLMs in the form of a symbolic knowledge graph using symbolic knowledge distillation [3] techniques. Applying the same approach to current multi-modal LLMs could also cover less common-sense knowledge from diverse areas such as physics, chemistry and IT, including highly structured knowledge in the form of formulae, code and diagrams.

This approach, however, is very new, and its applicability to wider contexts remains speculative. Moreover, it needs an extensive dataset, written in restricted natural language and expressing the relevant knowledge, to coax the LLM into generating further knowledge in the same domain.
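To make the idea concrete, here is a minimal sketch of a generate-then-filter loop of the kind described above. It is an illustration under stated assumptions, not the method of [3] or Noeon Research's implementation: the `Triple` type, the seed facts, the sentence template, and the `generate` callable standing in for an LLM are all hypothetical, and the only filter applied is a template check plus de-duplication.

```python
import re
from typing import Callable, List, NamedTuple, Optional


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical seed facts written in a restricted natural-language template.
SEED = [
    Triple("a web server", "is used for", "handling HTTP requests"),
    Triple("a unit test", "is used for", "checking a single function"),
]

# Assumed template: "<head> <relation> <tail>." with a closed set of relations.
LINE_PATTERN = re.compile(
    r"^(?P<head>.+?) (?P<relation>is used for|requires|causes) (?P<tail>.+?)\.?$"
)


def build_prompt(seeds: List[Triple]) -> str:
    """Render the seed triples as few-shot example sentences."""
    return "\n".join(f"{t.head} {t.relation} {t.tail}." for t in seeds)


def parse_line(line: str) -> Optional[Triple]:
    """Parse one generated sentence; return None if it breaks the template."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return Triple(match["head"], match["relation"], match["tail"])


def distill(seeds: List[Triple], generate: Callable[[str], str]) -> List[Triple]:
    """Prompt the generator with the seeds and keep only new, well-formed triples."""
    known = set(seeds)
    new_triples = []
    for line in generate(build_prompt(seeds)).splitlines():
        triple = parse_line(line)
        if triple is not None and triple not in known:
            known.add(triple)
            new_triples.append(triple)
    return new_triples


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (an assumption made for this sketch)."""
    return (
        "a database index is used for speeding up lookups.\n"
        "this line is not in the template\n"
        "a unit test is used for checking a single function."
    )


new_facts = distill(SEED, fake_llm)
```

With the stub generator, `new_facts` holds a single triple: the malformed line and the sentence that repeats a seed fact are both discarded. A real pipeline would replace `fake_llm` with a model call and would likely need a much stronger quality filter than a regular expression.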

Another promising direction is to fine-tune pre-trained LLMs on a structured corpus in order to produce interpretable causal explanations [4]. These explanations are shaped into a predetermined structure in a restricted fragment of natural language and can be easily parsed into a symbolic form [5]. This approach also requires a relevant seed dataset for fine-tuning. Moreover, a limited context window demands careful prompt engineering so that the LLM is presented with all relevant information and structure and produces correct output.
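As a rough illustration of why a restricted fragment of natural language is easy to parse, the sketch below turns explanations of an assumed shape, "Because <cause>, <effect>.", into symbolic cause-effect links and then walks them backwards. The template, the `CausalLink` type, and the example sentences are hypothetical; they are not the format used in [4] or [5].

```python
import re
from typing import List, NamedTuple


class CausalLink(NamedTuple):
    cause: str
    effect: str


# Assumed restricted template: "Because <cause>, <effect>."
SENTENCE = re.compile(r"Because (?P<cause>[^,.]+), (?P<effect>[^.]+)\.")


def parse_explanation(text: str) -> List[CausalLink]:
    """Extract every template-conforming sentence as a cause-effect link."""
    return [CausalLink(m["cause"], m["effect"]) for m in SENTENCE.finditer(text)]


def trace_back(effect: str, links: List[CausalLink]) -> List[str]:
    """Follow the links from an effect back through its chain of causes."""
    by_effect = {link.effect: link.cause for link in links}
    chain = []
    seen = set()  # guards against cyclic explanations
    current = effect
    while current in by_effect and current not in seen:
        seen.add(current)
        current = by_effect[current]
        chain.append(current)
    return chain


# Hypothetical model output in the restricted template.
explanation = (
    "Because the cache is cold, the first request is slow. "
    "Because the first request is slow, the health check times out."
)
links = parse_explanation(explanation)
causes = trace_back("the health check times out", links)
```

Here `causes` lists the nearest cause first and the root cause last. The point of the design is that once generation is confined to a fixed template, a few lines of pattern matching recover a structure that symbolic tools can reason over.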

Mar 29, 2024
© 2024 Noeon Research. All rights reserved.
Midtown Tower 18F, 9-7-1 Akasaka, Minato-ku, Tokyo, Japan