A search vignette #289

Merged
merged 3 commits into from
Oct 16, 2024
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -45,6 +45,7 @@ Suggests:
purrr,
ggplot2,
coro,
rentrez,
covr
Encoding: UTF-8
LazyData: true
2 changes: 1 addition & 1 deletion _pkgdown.yml
@@ -3,7 +3,7 @@ url: https://docs.ropensci.org/openalexR/
template:
bootstrap: 5
bslib:
primary: "#BB4827"
primary: "#1a3b6e"
border-radius: 0.5rem
btn-border-radius: 0.25rem
reference:
93 changes: 93 additions & 0 deletions vignettes/articles/literature-search.Rmd
@@ -0,0 +1,93 @@
---
title: "Literature search"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

While `oa_fetch()` offers a convenient and flexible way of retrieving results from queries to the OpenAlex API, you may want to set some of its arguments to optimize your API calls for certain use cases.

This vignette shows how to perform an efficient literature search and compares it with a similar search in PubMed using the [**rentrez**](https://github.com/ropensci/rentrez) package.

```{r setup, message=FALSE}
library(openalexR)
library(dplyr)
library(rentrez)
```

# Motivating example

Suppose you're interested in finding publications that explore the links between the **BRAF** gene and **melanoma**.

With the **rentrez** package, we can use the `entrez_search()` function to retrieve up to 10 records matching the search query from the PubMed database.

```{r}
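# Search PubMed and keep at most 10 matching record IDs
braf_pubmed <- entrez_search(db = "pubmed", term = "BRAF and melanoma", retmax = 10)
braf_pubmed
# Look up each ID's summary and tabulate the titles
braf_pubmed$ids |>
  entrez_summary(db = "pubmed") |>
  extract_from_esummary("title") |>
  tibble::enframe("id", "title")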
```

On the other hand, with **openalexR**, we can use the `search` argument of `oa_fetch()`:

```{r}
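# Search OpenAlex and request only the first page of 10 records
braf_oa <- oa_fetch(
  search = "BRAF AND melanoma",
  pages = 1,
  per_page = 10,
  verbose = TRUE
)
# Show a compact view of the results, keeping the first two columns
braf_oa |>
  show_works(simp_func = identity) |>
  select(1:2)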
```

This call performs a search using the OpenAlex API, retrieving the top 10 results for the query "BRAF AND melanoma".

By default, an `oa_fetch()` call returns all records associated with a search. For example, querying "BRAF AND melanoma" in OpenAlex may return over 54,000 records.
Fetching all of these records would be unnecessarily slow, especially when we are often only interested in the top, say, 10 results (ranked by citation count or relevance; more on sorting below).

We can limit the number of results with the arguments `per_page` (number of records to return per page, between 1 and 200, default 200) and `pages` (range of pages to return, *e.g.*, `1:3` for the first 3 pages; the default `NULL` returns all pages).
For example, if you want the top 250 records, you can set:

- `per_page = 50, pages = 1:5` to get exactly 250 records; or
- `per_page = 200, pages = 1:2` to get 400 records, then slice the resulting data frame to keep the first 250.
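
The arithmetic behind these two choices can be sketched with a small helper. Note that `page_plan()` is a hypothetical function written for illustration and is not part of **openalexR**; it only computes which pages to request and how many surplus records would need to be sliced away, without calling the API.

```r
# Hypothetical helper (NOT part of openalexR): given the number of records
# wanted and a page size, work out which pages to request.
page_plan <- function(n, per_page = 200) {
  # per_page must stay within the 1 to 200 range described above
  stopifnot(n >= 1, per_page >= 1, per_page <= 200)
  n_pages <- ceiling(n / per_page)
  list(
    per_page = per_page,
    pages    = seq_len(n_pages),          # e.g. 1:5
    fetched  = n_pages * per_page,        # upper bound on records requested
    surplus  = n_pages * per_page - n     # records to drop afterwards
  )
}

plan_a <- page_plan(250, per_page = 50)   # 5 pages, nothing to drop
plan_b <- page_plan(250, per_page = 200)  # 2 pages, 150 surplus records
```

The `per_page` and `pages` elements of either plan could then be passed to `oa_fetch()`; when `surplus` is greater than zero, something like `dplyr::slice_head(n = 250)` on the result would keep only the records wanted.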

# Sorting results

By default, the results from `oa_fetch()` are sorted by *relevance_score*, a measure of how closely each result matches the query.[^1]
If a different ordering is desired, such as sorting by citation count, you can specify `sort` in the `options` argument, appending `:desc` to the field name for descending order.

[^1]: *relevance_score* also includes a weighting term for citation counts: more highly-cited entities score higher.

Here are the commonly used sorting options:

- `relevance_score`: Default, ranks results based on query match relevance.
- `cited_by_count`: Sorts results based on the number of times the work has been cited.
- `publication_date`: Sorts by publication date.

```{r}
results <- openalexR::oa_fetch(
  search = "BRAF AND melanoma",
  pages = 1,
  per_page = 10,
  options = list(sort = "cited_by_count:desc"),
  verbose = TRUE
)
```
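
The `sort` value used above is a plain string, so it can be assembled programmatically. The following is a minimal sketch: `sort_spec()` is a hypothetical helper, not part of **openalexR**, and it assumes that the OpenAlex API sorts in ascending order when no suffix is given and accepts several comma-separated sort keys.

```r
# Hypothetical helper (NOT part of openalexR): build a string for options$sort.
# Assumptions: a key without a suffix sorts ascending, ":desc" reverses it,
# and multiple keys are joined with commas.
sort_spec <- function(keys, desc = TRUE) {
  desc <- rep_len(desc, length(keys))     # recycle one direction per key
  paste0(keys, ifelse(desc, ":desc", ""), collapse = ",")
}

by_citations <- sort_spec("cited_by_count")
by_date_then_citations <- sort_spec(
  c("publication_date", "cited_by_count"),
  desc = c(FALSE, TRUE)
)
```

Either string could be supplied as `options = list(sort = by_citations)` in place of the literal value in the call above.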

# Conclusion

The `openalexR` package provides a powerful and flexible interface for conducting academic literature searches using the OpenAlex API. By controlling the number of results and the sorting order, you can tailor your search to retrieve the most relevant or impactful publications.
In cases where large datasets are involved, it's useful to limit the number of results returned to ensure efficient and timely searches.

We encourage users to explore further options provided by `openalexR` to refine their search and retrieve the specific data they need for their research projects:

- <https://docs.openalex.org/api-entities/works/search-works>
- <https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/search-entities>