diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 0000000..f3d69c6 --- /dev/null +++ b/docs/index.html @@ -0,0 +1,393 @@ + + + +
+ + + + ++ The challenge in object-based visual reasoning lies in generating descriptive yet distinct concept + representations. Moreover, doing this in an unsupervised fashion requires human users to understand a + model's learned concepts and potentially revise false concepts. In addressing this challenge, we + introduce the Neural Concept Binder, a new framework for deriving discrete + concept representations + resulting in what we term “concept-slot encodings”. These encodings leverage both "soft binding" via + object-centric block-slot encodings and "hard binding" via retrieval-based inference. The Neural Concept + Binder facilitates straightforward concept inspection and direct integration of external + knowledge, such + as human input or insights from other AI models like GPT-4. Additionally, we demonstrate that + incorporating the hard binding mechanism does not compromise performance; instead, it enables seamless + integration into both neural and symbolic modules for intricate reasoning tasks, as evidenced by + evaluations on our newly introduced CLEVR-Sudoku dataset. +
++ Our proposed Neural Concept Binder (NCB) framework tackles the challenge of + learning inspectable + and revisable object-factor level concepts from unlabeled images by combining two key elements: (i) + continuous representations via (block-)slot-attention based image processing with (ii) discrete + representations via retrieval-based inference. +
++ One key advantage of NCB's concept representations is their inherent + readability and inspectability. + The concepts of an object can be inspected and compared to different concepts of the same block. For a + more detailed understanding, concepts can even be swapped and new images based on the modified concept + representation can be generated. +
++ We introduce the CLEVR-Sudoku dataset, a new benchmark that represents a + challenging visual puzzle requiring both visual object perception and reasoning capabilities. The dataset + consists of 9x9 Sudoku puzzles with + varying degrees of difficulty. Each image is annotated with the correct solution to the puzzle, which + serves as the ground truth for evaluating the model's performance. +
+CLEVR-Sudoku requires visual perception as well as deductive reasoning skills. We as humans are able to + abstract the relevant information from the images to reason about missing cells. Try for yourself!
+ ++ In our evaluations we show that solving CLEVR-Sudoku puzzles is harder than + one + might expect. Even small errors in the concept prediction (including under supervised learning) + cause + a wrong symbolic representation of the grid and thus a wrong solution. With NCB + we propose a + strong baseline for solving the CLEVR-Sudoku puzzles without supervision of the ground truth concepts. +
+@article{stammer2024neural,
+ title={Neural Concept Binder},
+ author={Stammer, Wolfgang and W{\"u}st, Antonia and Steinmann, David and Kersting, Kristian},
+ journal={arXiv preprint arXiv:2406.09949},
+ year={2024}
+ }
+
+