[Community] Document Loader for Logseq #27400
ishaan-upadhyay started this conversation in Ideas
Replies: 0 comments
Feature request
Logseq is an open-source knowledge base with at least 30K GitHub stars (see the repository here). We, a group of four students from the University of Toronto, would like to implement document-loading support for Logseq to enable RAG through LangChain.
Motivation
I use Logseq quite heavily for taking notes and generally organizing information, to the point where my graph is getting quite complicated. With a first-class LangChain integration, using RAG to explore it would greatly simplify my searching process. Furthermore, as Logseq expands to incorporate collaborative editing with its database version and becomes more viable for organization-level knowledge bases, being able to navigate it using retrieval-augmented generation will be very useful.
Other knowledge bases, such as Obsidian, already have their own loaders, which make use of special properties instead of simply loading the directory and the Markdown files it contains. Furthermore, there is interest in LLM integration from the Logseq community. There are also a few plugins (#1, #2) for Logseq based around integrating LLMs, though these focus on summarizing text or assisting in note generation rather than retrieval.
Proposal (If applicable)
Currently, Logseq operates on a flat-directory structure of Markdown files, with pages and journal entries stored under `pages` and `journals` respectively, and embedded assets stored in `assets`. In the future, this may diverge as Logseq implements a database version (which should still have 2-way sync to the Markdown structure).

Therefore, the initial implementation would be similar to the existing ObsidianLoader in both implementation and usage, and we anticipate only having to add a `LogseqLoader` class.

Metadata is also stored similarly at the top of the file as front-matter (but not in YAML format). We propose a further extension of the loading functionality to add metadata to documents for:
- page references (`[[page name]]`) or tags (`#page`) in the body of the file,
- namespaces, where `A__B` in the file name corresponds to `A/B`, with `B` being a subpage of `A`.
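As a rough illustration of the metadata listed above, the following sketch pulls page references, tags, and the namespace parent out of a single file. The function names, the regex patterns, and the `sep` parameter are all hypothetical; the `__` separator is taken from the description above, and real Logseq syntax has more cases than these patterns cover.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed patterns (not taken from Logseq itself):
# a page reference is [[page name]]; a tag is '#' directly followed by word
# characters and preceded by whitespace or the start of the text, so that
# Markdown headings ("# Title", "## Title") are not picked up as tags.
PAGE_LINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")
TAG_RE = re.compile(r"(?<!\S)#([\w/-]+)")


def unique(items: list[str]) -> list[str]:
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def page_name_from_path(path: str, sep: str = "__") -> str:
    """Map a file name such as 'A__B.md' to the page name 'A/B'."""
    return Path(path).stem.replace(sep, "/")


def parent_page(page_name: str) -> str | None:
    """Return 'A' for 'A/B', or None for a top-level page."""
    parent, _, _ = page_name.rpartition("/")
    return parent or None


def extract_metadata(path: str, body: str, sep: str = "__") -> dict:
    """Collect the proposed metadata for one Markdown file."""
    page = page_name_from_path(path, sep)
    return {
        "page": page,
        "parent": parent_page(page),
        "links": unique(PAGE_LINK_RE.findall(body)),
        "tags": unique(TAG_RE.findall(body)),
    }
```

Keeping the separator as a parameter means the mapping can be adjusted if a graph uses a different file-name convention for namespaces.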
Some brief pseudocode below for how the `load` function would work:

These can all be detected with regexes and should not be overly complicated.
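A minimal sketch of how such a loader might look, under several assumptions: the `Document` dataclass is a stand-in for LangChain's document type so the example runs without extra dependencies, front-matter is assumed to be a run of leading `key:: value` lines, and the `subdirs` parameter and the `kind` metadata key are hypothetical. It is meant to show the overall shape, not a final design.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class Document:
    """Stand-in for LangChain's Document (text plus a metadata dict)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Assumption: page properties are leading lines of the form `key:: value`.
PROPERTY_RE = re.compile(r"^([\w.-]+)::\s*(.*)$")


class LogseqLoader:
    """Hypothetical loader for the Markdown files of a Logseq graph."""

    def __init__(self, graph_path, subdirs=("pages", "journals")):
        self.graph_path = Path(graph_path)
        self.subdirs = subdirs

    @staticmethod
    def _split_front_matter(text: str) -> tuple[dict, str]:
        """Separate leading `key:: value` lines from the rest of the file."""
        props: dict = {}
        lines = text.splitlines()
        index = 0
        while index < len(lines):
            match = PROPERTY_RE.match(lines[index])
            if match is None:
                break  # first non-property line ends the front-matter
            props[match.group(1)] = match.group(2).strip()
            index += 1
        return props, "\n".join(lines[index:]).strip()

    def lazy_load(self) -> Iterator[Document]:
        """Yield one Document per Markdown file, one directory at a time."""
        for subdir in self.subdirs:
            folder = self.graph_path / subdir
            if not folder.is_dir():
                continue
            for path in sorted(folder.glob("*.md")):
                text = path.read_text(encoding="utf-8")
                props, body = self._split_front_matter(text)
                # Loader-level keys are written last so they win over
                # page properties that happen to share a name.
                metadata = {**props, "source": str(path), "kind": subdir}
                yield Document(page_content=body, metadata=metadata)

    def load(self) -> list[Document]:
        """Eagerly collect everything produced by lazy_load."""
        return list(self.lazy_load())
```

Usage would then be along the lines of `docs = LogseqLoader("path/to/graph").load()`, where the path is a placeholder for the root directory of a graph.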
If a strategy-based approach is more appropriate (considering the similarities), we could also use strategies for parsing front matter, parsing bodies, and loading file paths, and then use a base `lazy_load` function for knowledge bases, but this may be over-engineering for this problem.

If accepted, we plan to submit a PR by no later than mid-November.