[AI Chat] Introduces support for host-specific distiller scripts #25722
Force-pushed from 3fec07d to 5e36be3.
if we intend to ship this, please file a sec review ticket
Working on that now :)
Force-pushed from 5e36be3 to 9495662.
A Storybook has been deployed to preview UI for the latest push
Force-pushed from 5bf95be to c76399f.
I'm going to work on breaking this logic up and restructuring it into an updatable component.
Force-pushed from c76399f to 00dcfea.
Force-pushed from 00dcfea to 5445ae1.
No more events. By outputting an ES6 module we can wrap the logic in our own IIFE, and return the distilled representation of the page (if any).
There's ultimately no need to do any special wrapping here if we can do it instead at build/transpilation time.
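To make the wrapping idea concrete, here is a minimal sketch of a build-time step that wraps a script body in an IIFE so that evaluating it yields the distilled text (or null). The helper name `wrapInIife` and the convention that the body ends with a `return` are assumptions for illustration, not the code in this PR.

```typescript
// Hypothetical build-time helper (name and contract are assumptions).
// The script body is expected to `return` a string, or null when the
// page cannot be distilled.
function wrapInIife(scriptBody: string): string {
  return `(() => {\n${scriptBody}\n})()`;
}

// Example body standing in for a transpiled distiller script.
const exampleBody = `
  const heading = "Branches";
  return heading.length > 0 ? "Distilled: " + heading : null;
`;

// The wrapped source is a single expression; whoever evaluates it
// receives the returned value directly, with no events involved.
const wrappedScript = wrapInIife(exampleBody);
```

Because the wrapped source is one expression, the caller that injects it into the page can read the result straight from the evaluation callback.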
This approach fits the convention more commonly seen in the codebase.
The main change here is the detection of a user's profile page, which is something we can distill (unlike other pages on X which may have a similar path-style/pattern). Additionally, this change introduces a small amount of refactoring, as well as a bug fix (need to invoke `isSupportedPage`). Lastly, this change introduces a small preface to the in-situ metadata, to assist the LLM in understanding where the actual page content begins within the distillation result.
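The following is a rough sketch of the kind of check this commit describes, under stated assumptions: the reserved-path list, the handle pattern, the `PageLocation` type, and the preface wording are all hypothetical. Only the name `isSupportedPage` comes from the commit message.

```typescript
// Minimal stand-in for the page location (assumption).
interface PageLocation {
  hostname: string;
  pathname: string;
}

// Illustrative, non-exhaustive list of single-segment paths on X that
// are not profiles (assumption).
const RESERVED_PATHS = new Set([
  "home", "explore", "notifications", "messages", "settings", "search", "i",
]);

function isProfilePath(pathname: string): boolean {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const [handle] = segments;
  if (segments.length !== 1 || handle === undefined) return false;
  // Assumes handles are 1-15 characters of letters, digits, or underscore.
  return /^[A-Za-z0-9_]{1,15}$/.test(handle) &&
    !RESERVED_PATHS.has(handle.toLowerCase());
}

function isSupportedPage(location: PageLocation): boolean {
  return location.hostname === "x.com" && isProfilePath(location.pathname);
}

// Hypothetical preface telling the model where page content begins.
const PREFACE =
  "Page metadata follows. The page content begins after the line of dashes.";

function distill(
  location: PageLocation, metadata: string, content: string,
): string | null {
  // The function must be invoked: a bare `isSupportedPage` reference is
  // always truthy, which is the class of bug the commit mentions.
  if (!isSupportedPage(location)) return null;
  return [PREFACE, metadata, "---", content].join("\n");
}
```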
Force-pushed from d873edf to 46abf01.
[puLL-Merge] - brave/brave-core@25722
Description
This PR introduces a new feature for host-specific distillation scripts in the Brave AI Chat functionality. It adds support for custom site-specific content extraction, particularly for GitHub and X (formerly Twitter) websites. The changes include new TypeScript files for content distillation, modifications to existing C++ files to incorporate the new feature, and updates to build configurations.
Changes
sequenceDiagram
participant User
participant BraveUI
participant ContentExtractor
participant DistillationScript
participant Website
User->>BraveUI: Requests content distillation
BraveUI->>ContentExtractor: Initiates content extraction
ContentExtractor->>Website: Loads webpage
ContentExtractor->>ContentExtractor: Checks for custom script
alt Custom script available
ContentExtractor->>DistillationScript: Executes custom script
DistillationScript->>Website: Extracts specific content
Website-->>DistillationScript: Returns raw content
DistillationScript-->>ContentExtractor: Returns distilled content
else No custom script
ContentExtractor->>Website: Extracts content using default method
Website-->>ContentExtractor: Returns raw content
end
ContentExtractor-->>BraveUI: Returns distilled content
BraveUI-->>User: Displays distilled content
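The branch in the diagram (custom script if one exists, default extraction otherwise) can be sketched as below. This is TypeScript for illustration only; the PR's extractor is C++. The registry, the function names, and the fallback to default extraction when a script returns null are assumptions.

```typescript
type Distiller = (pageText: string) => string | null;

// Hypothetical registry of host-specific distillers, keyed by host.
const customDistillers = new Map<string, Distiller>([
  ["github.com", (text) => `GitHub: ${text.trim()}`],
]);

// Stand-in for the default extraction path.
function defaultExtract(pageText: string): string {
  return pageText.trim();
}

function extractContent(
  host: string, pageText: string, customScriptsEnabled: boolean,
): string {
  // Only consult the registry when the feature is enabled.
  const distiller = customScriptsEnabled
    ? customDistillers.get(host)
    : undefined;
  const distilled = distiller ? distiller(pageText) : null;
  // Assumption: fall back to default extraction when no custom script
  // exists or when the script yields nothing.
  return distilled ?? defaultExtract(pageText);
}
```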
Possible Issues
Security Hotspots
std::string_view script_content,
int32_t world_id,
base::OnceCallback<void(const std::optional<std::string>&)> callback) {
// TODO (jonathansampson): Wrap scripts at build/transpile-time instead
CHECK(ai_chat::features::IsCustomSiteDistillerScriptsEnabled())
nit: TODO(jonathansampson)
and also is there an issue you can link to?
No issue to link to at this time; how we approach that might rely on whether or not the code is moved to a component.
LGTM
Released in v1.75.92
This changelist introduces a framework for distilling content from specific hosts using distiller scripts. The functionality is being introduced behind a `FEATURE_DISABLED_BY_DEFAULT` flag.
The key elements of this change include supporting the `x.com` and `github.com` domains, but it is designed to accommodate more sites in the future. Changes are made to only a few files, though many more files (mostly TypeScript) are being introduced.
The newly introduced distillation framework is crucial for site-specific content extraction, allowing Leo to provide distilled information from major sites. This change lays the groundwork for future expansions by adding more distiller scripts for additional hosts.
For GitHub, the distillation scripts focus on extracting data from branches, pull requests, and repository pages. Specifically, they capture information from tables, such as branch details (e.g., updated times, pull request statuses), and translate it into a distilled format for easy consumption. This provides a cleaner summary of repository activity, making it easier to extract actionable insights from GitHub pages.
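As a rough illustration of turning branch-table rows into distilled text, consider the sketch below. The `BranchRow` shape, the output format, and the sample values are assumptions. The actual scripts read from the page's DOM, which is omitted here so the example stays self-contained.

```typescript
// Hypothetical shape of one row in the branches table.
interface BranchRow {
  name: string;
  updated: string;
  pullRequest?: { number: number; status: "open" | "draft" | "merged" | "closed" };
}

// Emits one plain-text line per branch so a model can scan it easily.
function distillBranches(rows: BranchRow[]): string {
  return rows
    .map((row) => {
      const pr = row.pullRequest
        ? `PR #${row.pullRequest.number} (${row.pullRequest.status})`
        : "no pull request";
      return `Branch: ${row.name} | Updated: ${row.updated} | ${pr}`;
    })
    .join("\n");
}

// Sample rows with made-up values, for illustration only.
const sampleRows: BranchRow[] = [
  { name: "feature-a", updated: "2 days ago", pullRequest: { number: 101, status: "open" } },
  { name: "feature-b", updated: "last week", pullRequest: { number: 102, status: "draft" } },
  { name: "scratch", updated: "last month" },
];
const branchSummary = distillBranches(sampleRows);
```

Spelling out the pull request status on each line is what would let a question such as "which branches have an open, not draft, pull request" be answered from the text alone.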
For X (formerly Twitter), the distillation scripts handle content from posts, notifications, user profiles, and media entities. The framework extracts key elements such as tweet text, user interactions (likes, retweets), and attached media (photos, videos), as well as structured content like notifications and quoted posts. This allows for a comprehensive summary of the user's feed or interactions on X, presented in a simplified and structured format.
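A comparable sketch for posts, again under assumptions: the `Post` shape, field names, and output layout are hypothetical and stand in for whatever the scripts read from the page.

```typescript
// Hypothetical shape of a post after it has been read from the page.
interface Post {
  author: string;
  text: string;
  likes: number;
  retweets: number;
  media: Array<{ kind: "photo" | "video"; alt?: string }>;
  quoted?: Post;
}

// Renders a post as indented plain text; quoted posts nest one level deeper.
function distillPost(post: Post, depth: number = 0): string {
  const indent = "  ".repeat(depth);
  const lines = [
    `${indent}@${post.author}: ${post.text}`,
    `${indent}Likes: ${post.likes}, Retweets: ${post.retweets}`,
  ];
  for (const item of post.media) {
    lines.push(`${indent}[${item.kind}${item.alt ? ": " + item.alt : ""}]`);
  }
  if (post.quoted) {
    lines.push(`${indent}Quoted post:`);
    lines.push(distillPost(post.quoted, depth + 1));
  }
  return lines.join("\n");
}
```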
The distillation scripts for both platforms are considered incomplete, but get us considerably further than where we are today.
Security Review: https://github.com/brave/reviews/issues/1754
Resolves brave/brave-browser#40794
Submitter Checklist:
Added appropriate labels (`QA/Yes` or `QA/No`; `release-notes/include` or `release-notes/exclude`; `OS/...`) to the associated issue
`npm run test -- brave_browser_tests`, `npm run test -- brave_unit_tests`
`npm run presubmit`, `npm run gn_check`, `npm run tslint`
`git rebase master` (if needed)
Test Plan:
Navigate to https://github.com/brave/brave-core/branches/all and ask Leo for a list of branches with an open (as opposed to a draft) pull-request. This is information Leo was previously not able to provide.