Educational Applications of AI in Web Browsers #108
-
Hi Adam, great questions. Some answers to get you started:
Yes, here's a prompt asked of -003:
I think so, i.e., you can ask the AI to prioritize different content in its responses. I believe sections and other docs would be fine. This would be worth experimenting with for specifics, though:
You could include the docs & articles in your prompts. Another thought is using a mix of native and semantic code to chunk your docs up, summarize pieces of them, and ask the AI to adjust priority based on those summaries. Skills I think you might be interested in: HTTPSkill, Summarizer skill. I suspect you will want to customize what is there, using your own custom prompts as you build out your app above. I suspect, though, that the in-box skills will get you through a POC and give you a feel for what you'd want to change / add. HTH - Tim
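To make that concrete, here is a minimal TypeScript sketch of the chunk-and-summarize idea. The `summarize` callback and the plain `fetch` call are stand-ins for whatever the Summarizer and HTTP skills (or your own custom prompts) actually expose, not their real APIs:

```typescript
// Hypothetical sketch: fetch a document, split it into chunks, and summarize each chunk.
// The `summarize` callback is a placeholder for a summarizer skill or custom prompt;
// the fetch call stands in for an HTTP skill.

interface DocumentChunk {
  index: number;    // position of the chunk within the source document
  text: string;     // raw chunk text
  summary: string;  // short summary, used later to adjust priority in prompts
}

async function chunkAndSummarize(
  url: string,
  summarize: (text: string) => Promise<string>,
  chunkSize = 1500
): Promise<DocumentChunk[]> {
  const fullText = await (await fetch(url)).text();

  // Naive fixed-size chunking; in practice you would split on headings or paragraphs
  // ("native" structure) before falling back to raw character windows.
  const chunks: DocumentChunk[] = [];
  for (let i = 0, index = 0; i < fullText.length; i += chunkSize, index++) {
    const text = fullText.slice(i, i + chunkSize);
    chunks.push({ index, text, summary: await summarize(text) });
  }
  return chunks;
}
```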
-
Adding @Stevenic here to see if he can help @AdamSobieski with any ideas on prompts for this great use case.
-
@AdamSobieski to pile on to what @timlaverty suggested, you're probably going to want to ground the model by showing it all or at least part of the document you want to discuss. The key issue here is that you only have so many tokens to work with, and passing in a large document could easily chew up most of your budget. As Tim suggested, you could break your document up into chunks and then use Semantic Search to identify the most relevant portions of the document to include in the prompt. To do that, the document chunks need to be converted into embeddings and the user's question needs to be converted into an embedding. You can then use cosine similarity to compare each chunk to the user's query and stack rank the results. It's probably a research topic in itself to identify how best to break up the source document...
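As a rough sketch of that ranking step (assuming some `embed` function that turns text into an embedding vector; the names here are illustrative, not a specific API):

```typescript
// Hypothetical sketch: rank document chunks against a user query by cosine similarity.
// `embed` is a placeholder for whatever embedding model or service you use.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function rankChunks(
  query: string,
  chunks: { index: number; text: string }[],
  embed: (text: string) => Promise<number[]>,
  topK = 3
): Promise<{ index: number; text: string; score: number }[]> {
  const queryEmbedding = await embed(query);

  // Score every chunk against the query embedding.
  const scored = await Promise.all(
    chunks.map(async (chunk) => ({
      ...chunk,
      score: cosineSimilarity(queryEmbedding, await embed(chunk.text)),
    }))
  );

  // Stack rank by similarity and keep only the most relevant chunks for the prompt.
  return scored.sort((a, b) => b.score - a.score).slice(0, topK);
}
```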
-
One tricky bit will be how you identify new document chunks to pull into the prompt... You want your students to feel like they can discuss any part of the document or text regardless of its size. This could mean that as the conversation progresses, new parts of the document might become more relevant... To help with that, you could take not just the user's current query but maybe the last 5 user messages, generate embeddings for that, and compare it to your document chunks. That way you're building a sliding window of sorts over the document that can be moved around as the conversation progresses... I'd probably also feed the document fragments into the prompt in source order and give the model a sense of where the gaps are. For example, your prompt might look something like this:
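(An illustrative sketch only; the bracketed markers and angle-bracket placeholders are just one way of indicating the gaps.)

```
You are helping a student discuss the document below. Only the excerpts shown are
visible to you; [...] marks portions of the document that have been omitted.

[Document summary]
<always-included summary of the whole document, if one exists>

[Excerpt from Section 1]
<text of a relevant chunk>

[... Sections 2 through 4 omitted ...]

[Excerpt from Section 5]
<text of another relevant chunk>

Conversation so far:
<the last few user messages>
<the student's current question>
```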
This will give the model a sense of what information it's not seeing. Just keep in mind it might try to fill in those gaps on its own. If the document has a summary, it might be useful to always include it, so that if the model does try to stray from the facts, it at least has a hope of synthesizing something that could be true.
-
Hello. I am interested in educational applications of AI in Web browsers.
I am looking forward to students being able to ask questions about, and engage in dialogues about, Web documents, e.g., Wikipedia articles, as well as PDF documents displayed in their Web browsers, e.g., from arXiv, Google Scholar, or Semantic Scholar. Interestingly, these technologies could also benefit students by interoperating with their digital textbooks.
With respect to interoperations between AI and Web browsers, one can envision the page and scroll positions of documents as well as any content selections as being relevant to end-users’ questions and dialogues. One can also envision AI components being able to turn pages, scroll documents, and highlight selections of content while responding to end-users. Soon, AI systems may be able to open documents for end-users and subsequently turn pages, scroll to, and highlight relevant content while communicating.
I would like to request more information and documentation about how the document-related skills work. I am eager to learn more about these topics to be able to better brainstorm about educational applications of AI in Web browsers and about how new Web standards could be of use.
For starters: Can developers provide one or more documents' URLs and, potentially, content selectors to LLMs? Can developers ask questions while indicating for LLMs to consider one or more documents, or chapters or sections thereof, as being prioritized above others while answering those questions? Would this, instead, require uploading portions of documents before or during end-users' questions or dialogue moves? Thank you for any more information about LLMs and document-related skills!