Contextual proximity: #6
Replies: 1 comment 1 reply
-
I must preface that I have no real background in LLMs, graph theory, etc. I'm currently in grad school for data science, trying to learn all I can.

Contextual proximity: I'm not sure what direction this will take, but I have some thoughts I want to get out there; again, I speak with no authority, I'm just interested in this stuff. Contextual proximity certainly does a great job connecting ideas, but it also strips some of the nuance that the LLM has given us. graph.csv is the output from an input of the first 6 chapters of the book History of Economic Thought, and for reference, dfg_merged.csv is the dataframe after it has gone through the contextual proximity function. The graph G produced from it has the following centrality measures; these are the top 10 nodes from each measure.
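For concreteness, here is a minimal networkx sketch of how top-10 lists per centrality measure can be pulled out of the merged dataframe. The column names `node_1`, `node_2`, and `count` are my assumption about the schema of dfg_merged.csv, not something taken from the repo:

```python
import pandas as pd
import networkx as nx

# Assumed schema for the merged dataframe: node_1, node_2, count (edge weight).
dfg = pd.read_csv("dfg_merged.csv")

G = nx.Graph()
for _, row in dfg.iterrows():
    G.add_edge(row["node_1"], row["node_2"], weight=row["count"])

measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}

for name, scores in measures.items():
    top10 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10]
    print(f"\nTop 10 nodes by {name} centrality:")
    for node, score in top10:
        print(f"  {node}: {score:.3f}")
```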
Some of these are very good and, in my opinion (having read the book), ARE the most central concepts of this text corpus. Economic thought, state, usury, money, Aristotle, and government are good and relevant. Some are less so: Aquinas (St. Thomas), pirate, plague, Cicero's parable. These are definitely relevant, but I feel like they are being over-weighted somewhere. This same analysis on dfg1 yields much worse results, with less relevant central nodes, which somewhat proves the usefulness of contextual proximity. A possible next step could be to do some preprocessing of dfg_merged.csv. Combining similar terms is an easy one (a rough sketch of what I mean is at the end of this post); I'm not sure what else.

Visualization: I am really trying to find a justification for the graph beyond its artistic satisfaction. There is a lot of value here for quickly viewing complex relationships, but I think it has to be refined. Not sure how; I will continue thinking about this. I made an ontology of this same text with webProtege; here is a small snippet. I like this because it quickly shows the flow of ideas and contributions throughout history. It's much more expansive in total and can be seen at the link above. Watching these ideas evolve from the point of view of their contributors is very interesting to me, and I think it has applications beyond learning/teaching; I will have to do some thinking about what those applications are. I think we are way off from having an LLM do this, but maybe finding a way to incorporate hierarchical relationships could make the visualization more useful and usable as a teaching tool. Some of these same relationships are being caught by the model, which is exciting. On the knowledge graph, most of the edges are just contextual proximity, but some maintain their unique edge, and some of those are quite insightful and interesting. Could restricting the choices of relationships the model can pick from be possible? Only let it connect ideas with certain phrases/ideas, like the ones in the ontology. I have not done much prompt engineering; most of my experience with LLMs so far has been with pretraining and fine-tuning much smaller models for text classification, so I'm not sure how feasible this would be. I will try to work on this. Somehow combining contextual proximity and hierarchical relationships could open up some new paths for the visualization.
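To sketch the "combine similar terms" preprocessing I mentioned: a simple pass that merges node labels whose normalised forms are near-duplicates. A crude matcher like this only catches spelling/plural variants; pairs like "Aquinas" vs. "St. Thomas Aquinas" would still need an embedding or LLM pass. The column names and the threshold here are assumptions on my part:

```python
from difflib import SequenceMatcher

import pandas as pd

def canonicalise(labels, threshold=0.85):
    """Map each label to the first earlier label it closely matches."""
    canon, seen = {}, []
    for label in labels:
        key = label.strip().lower()
        match = next(
            (s for s in seen if SequenceMatcher(None, key, s).ratio() >= threshold),
            None,
        )
        canon[label] = match if match is not None else key
        if match is None:
            seen.append(key)
    return canon

# Assumed columns in dfg_merged.csv: node_1, node_2, edge, count.
dfg = pd.read_csv("dfg_merged.csv")
mapping = canonicalise(pd.concat([dfg["node_1"], dfg["node_2"]]).unique())
dfg["node_1"] = dfg["node_1"].map(mapping)
dfg["node_2"] = dfg["node_2"].map(mapping)

# Re-aggregate edges that now collapse onto the same canonical pair.
dfg = dfg.groupby(["node_1", "node_2", "edge"], as_index=False)["count"].sum()
```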
-
Background:
Q:
Could you explain your rationale on contextual proximity? As I understand it, contextual proximity weighs topics that occur in the same text chunk more heavily than topics not in the same chunk. Does this have a negative impact on identifying concepts that are spread out over the whole corpus and pop up throughout? Is there another way to weigh the connections between nodes that doesn't account for proximity? Would it make sense to take dfg1, feed it back through an LLM, have it group similar nodes/edges together, and go from there?
Also, what do you think is the main purpose of the visualization? In your example about healthcare in India, the largest node is "doctors" and it has a strong connection to "India"; this is on the nose and does not provide any more insight than simply reading the title of the article. Would dropping the largest 1–5% of nodes leave room for more subtle connections? The same issue happens in my graph as well, and it seems like the more interesting things happen on the edges of the graph that represent non-obvious connections between concepts. Am I missing a different interpretation of the graph?
A:
Hey Luke, you have raised several great questions and ideas here.
This idea is somewhat of a guess. Concepts that appear in the same text chunk may be related, and when we implement a RAG-based approach on the KG, it may be beneficial to map as many relations to a text chunk as possible.
Contextual proximity actually has a positive impact on identifying concepts that are spread across the text, because every chunk a concept appears in increases the degree of that concept, sometimes undesirably so.
But I too feel it is not the best way to do it.
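For reference, the rough shape of the contextual-proximity step is a self-join of the per-chunk concept dataframe on the chunk id. A minimal sketch (illustrative only; the column names `node` and `chunk_id` are stand-ins, not necessarily the exact ones in the code):

```python
import pandas as pd

def contextual_proximity(df: pd.DataFrame) -> pd.DataFrame:
    """Connect every pair of concepts that share a text chunk.

    Assumes df has one row per (node, chunk_id) occurrence.
    """
    pairs = df.merge(df, on="chunk_id", suffixes=("_1", "_2"))
    # Keep each unordered pair once and drop self-pairs.
    pairs = pairs[pairs["node_1"] < pairs["node_2"]]
    edges = pairs.groupby(["node_1", "node_2"], as_index=False).agg(
        count=("chunk_id", "nunique")
    )
    edges["edge"] = "contextual proximity"
    return edges
```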
Your idea of feeding dfg1 back to the LLM to identify non-proximity connections is excellent. We should experiment with this.
Your idea of dropping the top 1% of nodes is also very good. Some concepts, like 'India' and 'Doctors', are bound to be ubiquitous in the body of work. This can be easily implemented by identifying the outlier nodes based on their degree and removing them from the KG.
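A minimal sketch of that pruning step with networkx (the 95th-percentile cutoff is an arbitrary stand-in for "drop the top few percent of hubs"):

```python
import numpy as np
import networkx as nx

def drop_hub_nodes(G: nx.Graph, percentile: float = 95) -> nx.Graph:
    """Return a copy of G with nodes above the given degree percentile removed."""
    degrees = dict(G.degree())
    cutoff = np.percentile(list(degrees.values()), percentile)
    hubs = [n for n, d in degrees.items() if d > cutoff]
    H = G.copy()
    H.remove_nodes_from(hubs)
    return H

# e.g. drop roughly the top 1% of highest-degree nodes:
# G_pruned = drop_hub_nodes(G, percentile=99)
```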
Would you mind if we take this discussion to the GitHub repo?
Also what do you think is the main purpose of the visualization?
I feel the main purpose of visualisation is artistic gratification.
Visualisation is used here just to demonstrate the possibilities. A good use of a KG or concept graph would be to improve upon RAG or recursive RAG and create a better AI-based agent.
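As one hedged sketch of what that could look like: match the query against concept nodes, expand to their graph neighbours, and retrieve the chunks those concepts were extracted from. The `node_chunks` mapping and the naive substring matching are illustrative assumptions, not anything implemented in the repo:

```python
import networkx as nx

def graph_retrieve(G: nx.Graph, node_chunks: dict, query_terms: list, hops: int = 1) -> set:
    """Return chunk ids linked to query-matched concepts and their neighbours.

    node_chunks is an assumed mapping {concept: set(chunk_ids)} built while
    extracting concepts chunk by chunk.
    """
    frontier = {n for n in G.nodes if any(t.lower() in str(n).lower() for t in query_terms)}
    for _ in range(hops):
        frontier |= {nbr for n in frontier for nbr in G.neighbors(n)}
    chunk_ids = set()
    for n in frontier:
        chunk_ids |= set(node_chunks.get(n, ()))
    return chunk_ids

# The retrieved chunks can then be passed to the LLM as grounding context, e.g.
# chunks = graph_retrieve(G, node_chunks, ["usury", "interest"])
```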