Creating a publication trend chart using the rentrez package in R
For researchers, tracking how often a topic appears in a major database such as PubMed is a useful way to see how interest in it has changed over time. In this post, we will see how to make a trend chart of PubMed search results for a given term using the rentrez package.
First, we need to create a handy function to extract the data on search trends:
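library(rentrez)  # R interface to the NCBI Entrez API
# Return the yearly number of PubMed records with `term` in the title or abstract
fetch_article_counts <- function(term, start_year, end_year) {
  years <- start_year:end_year
  counts <- integer(length(years))
  for (i in seq_along(years)) {
    # One search per year; retmax = 0 because only the hit count is needed
    query <- paste0(term, "[Title/Abstract] AND ", years[i], "[PDAT]")
    counts[i] <- entrez_search(db = "pubmed", term = query, retmax = 0)$count
  }
  data.frame(Year = years, Count = counts)
}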
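Before sending any requests, it can help to look at the exact query strings the loop produces. The following is a minimal sketch of that step pulled out on its own; `build_pubmed_query` and its `field` argument are hypothetical names introduced here for illustration, not part of rentrez or of the function above.

```r
# Hypothetical helper (not part of rentrez): build the per-year PubMed query string
build_pubmed_query <- function(term, year, field = "Title/Abstract") {
  # paste0() is vectorised, so `year` may be a single year or a whole range
  paste0(term, "[", field, "] AND ", year, "[PDAT]")
}

# Inspect the queries for three years without contacting PubMed
queries <- build_pubmed_query("suicide", 1990:1992)
print(queries)
```

Each element of `queries` could then be passed as the `term` argument of `entrez_search()`, which is what the loop does one year at a time.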
# Now we can use the function to pull the data for the term "suicide" from 1990 to 2021
library(ggplot2)
df <- fetch_article_counts("suicide", 1990, 2021)
ggplot(df, aes(x = Year, y = Count)) +
  geom_line(color = "blue") +
  geom_point(color = "red") +
  labs(title = "Trend of Articles Published on 'Suicide' (1990-2021)",
       x = "Year", y = "Number of Articles") +
  # theme_mediocre() is assumed to come from the mediocrethemes package;
  # swap in theme_minimal() if that package is not installed
  theme_mediocre()
How cool is that!
Now we will go a step further and plot several terms at once, say suicide, climate change, and inflation:
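library(dplyr)  # provides %>% and mutate()
years <- 1990:2021
search_terms <- c("suicide", "climate change", "inflation")
# One row per (year, term) pair; keep Term as character so it pastes cleanly into the query
df <- expand.grid(Year = years, Term = search_terms, stringsAsFactors = FALSE) %>%
  mutate(Count = mapply(function(term, year) {
    query <- paste0(term, "[Title/Abstract] AND ", year, "[Date - Publication]")
    entrez_search(db = "pubmed", term = query, retmax = 0)$count
  }, Term, Year, USE.NAMES = FALSE))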
ggplot(df, aes(x = Year, y = Count, color = Term)) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  # Overrides applied on top of theme_minimal()
  theme(
    panel.grid = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    legend.position = "bottom"
  ) +
  labs(
    title = "Trend of PubMed Articles (1990-2021)",
    x = "Year",
    y = "Number of Articles")
I hope you find the chart useful. Remember that the plot is highly customizable. You can easily change the year range, add or remove search terms, or modify the plot aesthetics.
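To make the year range and the search terms easy to swap, the data-building step can be wrapped in a small function. This is only a sketch under stated assumptions: `build_count_grid`, `count_fun`, and `fake_count` are hypothetical names introduced for illustration, and the stand-in counter returns made-up numbers so the example runs without contacting PubMed.

```r
# Hypothetical helper: one row per (term, year), with the counting step passed in
build_count_grid <- function(terms, years, count_fun) {
  grid <- expand.grid(Year = years, Term = terms, stringsAsFactors = FALSE)
  # Call count_fun(term, year) once for every row of the grid
  grid$Count <- mapply(count_fun, grid$Term, grid$Year, USE.NAMES = FALSE)
  grid
}

# Stand-in counter for offline use; these are made-up numbers, not PubMed data
fake_count <- function(term, year) nchar(term) + (year - 2000L)

demo <- build_count_grid(c("suicide", "inflation"), 2000:2002, fake_count)
print(demo)
```

For real data, `count_fun` would be a function that runs `entrez_search(db = "pubmed", term = query, retmax = 0)$count` on the query built from its two arguments, as in the code earlier in the post. The resulting data frame has the same Year, Term, and Count columns as `df`, so it can be passed to the same ggplot call.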