Changing trends in antimicrobial publications

microbiology
data science
Published

January 12, 2022

Introduction

Publication trends can be used as a surrogate to gauge the changing interests of the medical community in certain subjects. This approach has been used by Luz et al. to show changes in antimicrobial resistance research between 1999 and 2018.

Similarly, trends in individual antimicrobial publications could perhaps give insight into the “popularity” of certain agents. We’ll use the data from the above study to look into this further.

Data prep

Conveniently, the data from the above study has been made available for download. The file ‘data_osf.csv’ contains details of studies published between 1999 and 2018 related to antimicrobials and AMR, pulled from PubMed. Once this is downloaded, we’ll load it into R. I’ll load the tidyverse package, and the AMR package, which will simplify some things later on. First, let’s have a look at the data using glimpse().

library(tidyverse)
library(AMR)
data <- read_csv('data_osf.csv')
data <- data %>% rename("study_group" = "name")
glimpse(data, width = 60)
## Rows: 158,616
## Columns: 10
## $ pmid        <dbl> 10614508, 10614958, 10614950, 10614949…
## $ year        <dbl> 1999, 1999, 1999, 1999, 1999, 1999, 19…
## $ title       <chr> "Cytokine therapy: a natural alternati…
## $ abstract    <chr> "Disease control in food production an…
## $ journal     <chr> "Veterinary immunology and immunopatho…
## $ topic       <dbl> 1, 60, 22, 21, 10, 20, 20, 79, 70, 21,…
## $ study_group <chr> "Strategies for emerging resistances a…
## $ theme       <chr> "strategy", "clinical", "organism", "s…
## $ country     <chr> "Australia", "Spain", "Spain", "Nether…
## $ who_region  <chr> "WHO Western Pacific Region", "WHO Eur…

The main fields of interest to us are the year of publication, title and abstract. I have renamed the name column to study_group to avoid confusion (this column describes what type of study the item is).
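The word cloud code further down takes a long-format table called data_long, with an antibiotic column and a logical value column alongside the title. One way such a table might be built is sketched below. This is only an assumption about its shape: the toy data frame, the short abx_names vector and the simple case-insensitive text match are all hypothetical stand-ins, not the method used for the real data.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the real data (column names follow data_osf.csv)
toy <- tibble::tibble(
  pmid = c(1, 2, 3),
  year = c(1999, 2005, 2018),
  title = c("Rifampicin resistance in Mycobacterium tuberculosis",
            "Ciprofloxacin use in Salmonella infections",
            "Cytokine therapy: a natural alternative"),
  abstract = c("Isoniazid and rifampicin were tested.",
               "Ciprofloxacin susceptibility was measured.",
               "No antibiotic is named here.")
)

# Hypothetical short list of agents; the full list of names
# could instead come from AMR::antibiotics$name
abx_names <- c("Rifampicin", "Ciprofloxacin", "Isoniazid")

data_long <- toy %>%
  # search title and abstract together, ignoring case
  mutate(text = str_to_lower(paste(title, abstract))) %>%
  # one row per publication and antibiotic
  crossing(antibiotic = abx_names) %>%
  # TRUE when the antibiotic name appears in the text
  mutate(value = str_detect(text, fixed(str_to_lower(antibiotic)))) %>%
  select(-text)
```

With this layout, filtering on antibiotic and value gives the publications that mention a given agent, which is the input the word cloud function needs.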

Word Clouds

Without going into too much detail on how to create a word cloud, at a high level we need to:

  • clean up the title text (remove punctuation and stop words such as “at”, “by”, etc)
  • create a matrix of each word vs frequency detected in text
  • generate a word cloud using an R package such as wordcloud2 or ggwordcloud (the code below uses ggwordcloud)

The code is below (wrapped in a convenience function to allow changing of antibiotic).

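make_wordcloud <- function(x, abx){
  # x: long-format data with title, antibiotic and value columns
  # abx: antibiotic name as listed in AMR::antibiotics$name
  require(tidyverse)
  require(tm)
  require(ggwordcloud)
  if(!abx %in% AMR::antibiotics$name) stop('Invalid abx')
  words <- x %>% 
    # keep titles of publications flagged for this antibiotic
    filter(antibiotic == abx, value == TRUE) %>% 
    pull(title) %>% 
    VectorSource() %>% 
    Corpus() %>% 
    # clean the text: punctuation, numbers, whitespace, case, stop words
    tm_map(removePunctuation) %>% 
    tm_map(removeNumbers) %>% 
    tm_map(stripWhitespace) %>% 
    tm_map(content_transformer(tolower)) %>% 
    tm_map(removeWords, stopwords('en')) %>% 
    # rows are words, columns are titles
    TermDocumentMatrix() %>% 
    as.matrix()
  # total frequency of each word across all titles, most frequent first
  words <- sort(rowSums(words), decreasing = TRUE)
  words <- data.frame(word = names(words), freq = words)
  # drop the antibiotic's own name, then keep the 30 most frequent words
  words <- words %>% filter(word != tolower(abx))
  words <- slice_head(words, n = 30)
  ggplot(words, aes(label = word, size = freq)) +
    geom_text_wordcloud() +
    theme_minimal()
}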

make_wordcloud(data_long, 'Rifampicin')

Rifampicin word cloud

Indeed the word cloud tilts towards TB terms (tuberculosis, mycobacterium, isoniazid). If we compare this to ciprofloxacin, for example, the results are different. Here we mainly see “salmonella”, “pseudomonas”, “escherichia”.

make_wordcloud(data_long, 'Ciprofloxacin')

Ciprofloxacin word cloud
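Word clouds are compared by eye, so it can help to look at the underlying word counts as a table as well. The sketch below does this in base R on a couple of made-up titles. The top_terms() helper, the example titles and the short drop list are all hypothetical, and the cleaning is deliberately simpler than the tm pipeline above (for real use, tm's stopwords('en') would be a more complete stop word list).

```r
# Count the most frequent words in a set of titles (base R only).
# titles: character vector; drop: words to exclude; n: rows to return
top_terms <- function(titles, drop = character(), n = 5) {
  words <- tolower(titles)
  words <- gsub("[^a-z ]", " ", words)       # punctuation and digits become spaces
  words <- unlist(strsplit(words, " +"))     # split on runs of spaces
  words <- words[nchar(words) > 0 & !words %in% drop]
  freq <- sort(table(words), decreasing = TRUE)
  head(data.frame(word = names(freq), freq = as.integer(freq)), n)
}

# Made-up example titles, for illustration only
rif_titles <- c("Rifampicin resistance in Mycobacterium tuberculosis",
                "Isoniazid and rifampicin for tuberculosis")

rif_terms <- top_terms(rif_titles,
                       drop = c("rifampicin", "in", "and", "for"),
                       n = 3)
rif_terms
```

Running the same helper on titles for a second antibiotic gives two small tables that can be placed side by side.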

Conclusion

Publication trends seem to reveal changes in academic interest in certain antibiotics. The results also appear to reflect growing concern about certain AMR pathogens such as carbapenem-resistant Enterobacteriaceae, MDR-TB, MRSA, etc. Relatively novel antimicrobials released within the data period appear to reach peak interest and plateau quite quickly. If you have any ideas on what else could be looked at using this data, please do post below.