Libretexts: get real cover page #68

Closed
benoit74 opened this issue Nov 13, 2024 · 1 comment · Fixed by #70

Comments

@benoit74
Contributor

Currently, to build the index (#55), soon the detailed licensing (#54), and probably the table of contents (#56), we need the URL and ID of the "cover page" of the current book.

In #55, we implemented this by walking up the page tree and searching for a page with the "article": "topic-category" property (see the get_page_parent_book_id logic).
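To make the current approach concrete, here is a minimal sketch of that kind of upward walk. It is not the actual get_page_parent_book_id code: the `findParentBookId` name, the `Map`-based tree and the `parentId` / `article` field names are all assumptions for illustration, as are the article values other than "topic-category". The sketch also assumes the starting page itself is checked first.

```javascript
// Hypothetical in-memory page tree: id -> { id, parentId, article }.
// Field names are assumptions, not the crawler's real data model.
function findParentBookId(pages, pageId) {
    const seen = new Set(); // guards against accidental cycles in the tree
    let current = pages.get(pageId);
    while (current && !seen.has(current.id)) {
        if (current.article === "topic-category") {
            return current.id; // first ancestor (or the page itself) that is a book
        }
        seen.add(current.id);
        current = current.parentId == null ? undefined : pages.get(current.parentId);
    }
    return null; // no book found above this page
}

// Illustrative tree: root -> book -> chapter -> page
const examplePages = new Map([
    ["1", { id: "1", parentId: null, article: "" }],
    ["2", { id: "2", parentId: "1", article: "topic-category" }],
    ["3", { id: "3", parentId: "2", article: "topic-guide" }],
    ["4", { id: "4", parentId: "3", article: "topic" }],
]);
console.log(findParentBookId(examplePages, "4"));
```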

The real logic used online on libretexts.org is based on tags instead; it lives in the getCoverpage function of https://cdn.libretexts.net/github/LibreTextsMain/Miscellaneous/reuse.js:

    /**
     * Locates the parent page that is the coverpage, if it exists
     * @param url - page to look up the coverpage for
     * @returns {Promise<string>} - path to the coverpage
     */
    async function getCoverpage(url = window.location.href) {
        if (typeof getCoverpage.coverpage === 'undefined') {
            const urlArray = url.replace("?action=edit", "").split("/");
            for (let i = urlArray.length; i > 3; i--) {
                let path = urlArray.slice(3, i).join("/");
                if (!path)
                    break;
                let response = await LibreTexts.authenticatedFetch(path, 'tags?dream.out.format=json');
                let tags = await response.json();
                if (tags.tag) {
                    if (tags.tag.length) {
                        tags = tags.tag.map((tag) => tag["@value"]);
                    }
                    else {
                        tags = tags.tag["@value"];
                    }
                    if (tags.includes("coverpage:yes") || tags.includes("coverpage:toc") || tags.includes("coverpage:nocommons")) {
                        getCoverpage.coverpage = path;
                        break;
                    }
                }
            }
        }
        return getCoverpage.coverpage;
    }

As one can see, this code walks up the tree of pages, starting from the current page, and looks for the first page tagged coverpage:yes, coverpage:toc or coverpage:nocommons.
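The tag check itself can be isolated from the network calls. The sketch below assumes the response shape implied by the code above (`tag` is either an array of objects or a single object, each carrying an `"@value"` string); the `extractTagValues` and `isCoverpage` names are hypothetical. Note one deliberate difference: when there is a single tag, the upstream code does a substring match on the tag string, while this sketch always compares whole tag values.

```javascript
// The three tags that mark a cover page in the upstream getCoverpage function.
const COVERPAGE_TAGS = ["coverpage:yes", "coverpage:toc", "coverpage:nocommons"];

// Normalizes a tags response into a plain array of tag strings.
// `tag` may be missing or empty, a single object, or an array of objects.
function extractTagValues(tagsResponse) {
    const tag = tagsResponse && tagsResponse.tag;
    if (!tag) {
        return [];
    }
    const list = Array.isArray(tag) ? tag : [tag];
    return list.map((item) => item["@value"]);
}

// True when at least one tag value is exactly one of the cover page tags.
function isCoverpage(tagsResponse) {
    const values = extractTagValues(tagsResponse);
    return COVERPAGE_TAGS.some((wanted) => values.includes(wanted));
}

console.log(isCoverpage({ tag: [{ "@value": "article:topic-category" }, { "@value": "coverpage:yes" }] }));
console.log(isCoverpage({ tag: { "@value": "article:topic" } }));
```

Keeping this as a pure function means it can be applied either to live responses or to tags already stored on the page tree.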

This is something we will have to consider at some point (it is probably easier to compute on the tree of pages at crawl time).

It should at least be analyzed, if not integrated into the crawler: finding all page tags means issuing a query to retrieve them, we do not use them at the moment, and this will have an impact on crawl time.

@benoit74 benoit74 added enhancement New feature or request question Further information is requested labels Nov 13, 2024
@benoit74 benoit74 added this to the 0.1 milestone Nov 13, 2024
@benoit74 benoit74 self-assigned this Nov 13, 2024
@benoit74 benoit74 removed the question Further information is requested label Nov 13, 2024
@benoit74
Contributor Author

In fact, we only need to retrieve tags for the backmatter pages and their parents. The impact is quite limited, so let's do it now.
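A rough sketch of why the impact stays limited: only the backmatter pages and the chain of parents above them need a tags query. The `pageIdsNeedingTags` name, the `Map`-based tree and the `parentId` field are assumptions for illustration, and how backmatter pages are identified is left to the caller.

```javascript
// Hypothetical in-memory page tree: id -> { id, parentId }.
// Returns the set of page ids whose tags would have to be fetched:
// every backmatter page plus all of its ancestors.
function pageIdsNeedingTags(pages, backmatterIds) {
    const needed = new Set();
    for (const startId of backmatterIds) {
        let current = pages.get(startId);
        // Stop as soon as we reach a page already collected:
        // its ancestors were added by an earlier walk.
        while (current && !needed.has(current.id)) {
            needed.add(current.id);
            current = current.parentId == null ? undefined : pages.get(current.parentId);
        }
    }
    return needed;
}

// Illustrative tree: one book with a chapter, a page and a backmatter section.
const exampleTree = new Map([
    ["1", { id: "1", parentId: null }], // root
    ["2", { id: "2", parentId: "1" }],  // book
    ["3", { id: "3", parentId: "2" }],  // chapter
    ["4", { id: "4", parentId: "3" }],  // regular page
    ["5", { id: "5", parentId: "2" }],  // backmatter section
    ["6", { id: "6", parentId: "5" }],  // backmatter page
]);
console.log([...pageIdsNeedingTags(exampleTree, ["6"])]);
```

In this example the chapter and its regular page are never queried, which is where the savings in crawl time would come from.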
