Using gendertext

Introduction

The gendertext package provides simple, transparent tools for identifying gendered language in text and suggesting gender-neutral alternatives. It is designed for researchers, policy analysts, editors, and practitioners who want to assess and improve inclusive language in documents.

The package follows a dictionary-based approach. All results come from a built-in dictionary of gendered terms paired with suggested neutral replacements, so every match can be traced back to a specific dictionary entry.

The built-in dictionary

The package ships with gender_dictionary, a curated dictionary of 208 gendered words and phrases. It covers occupational titles, pronouns, forms of address, family terms, and common idioms, and is informed by the United Nations guidelines for gender-inclusive language and the European Parliament guidance on gender-neutral language.

data(gender_dictionary)
head(gender_dictionary, 10)
#>       gendered         neutral
#> 1      actress           actor
#> 2       airman         aviator
#> 3       airmen        aviators
#> 4     alderman  council member
#> 5       alumna        graduate
#> 6       alumni       graduates
#> 7      alumnus        graduate
#> 8    anchorman     news anchor
#> 9  assemblyman assembly member
#> 10   authoress          author
nrow(gender_dictionary)
#> [1] 208

Scoring a text

The simplest way to use gendertext is to score a character string. The result reports how many tokens the text contains, how many of them are gendered according to the dictionary, and the corresponding percentages.

gender_score(
  text = "Ladies and gentlemen, the chairman said he will call the policeman."
)
#>   total_units gendered_units neutral_units gendered_percent neutral_percent
#> 1          11              6             5         54.54545        45.45455

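The reported neutral percentage is a proxy: it is the share of tokens not matched by any dictionary entry, not a judgement that those tokens are inclusive. Multi-word phrases are matched before single words, and each piece of text is counted at most once. The phrase “ladies and gentlemen” is therefore counted as one match spanning three tokens, never as “ladies” plus “gentlemen” on top of the phrase. In the example above, the six gendered tokens are the three tokens of that phrase plus “chairman”, “he”, and “policeman”.

If you only need the number of dictionary matches, use unit = "matches". As the output below shows, every counted unit is then a match, so the neutral columns are reported as NA: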
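
The counting rule can be illustrated with a small base R sketch. This is not the package's implementation: toy_dict is a hypothetical six-entry dictionary and count_gendered_tokens() is an illustrative helper, both written only to show phrase-first matching in which each token is claimed at most once.

```r
# Hypothetical toy dictionary; the real gender_dictionary has 208 entries
toy_dict <- c("ladies and gentlemen", "ladies", "gentlemen",
              "chairman", "he", "policeman")

# Illustrative helper, not a gendertext function
count_gendered_tokens <- function(text, terms) {
  # Lower-case the text and split it into word tokens
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  claimed <- rep(FALSE, length(tokens))
  matches <- 0L

  # Try longer entries first so phrases take priority over their parts
  term_tokens <- strsplit(terms, " ", fixed = TRUE)
  term_tokens <- term_tokens[order(-lengths(term_tokens))]

  for (tt in term_tokens) {
    n <- length(tt)
    if (n > length(tokens)) next
    for (i in seq_len(length(tokens) - n + 1L)) {
      idx <- i:(i + n - 1L)
      # Count a match only if none of its tokens was already claimed
      if (!any(claimed[idx]) && all(tokens[idx] == tt)) {
        claimed[idx] <- TRUE
        matches <- matches + 1L
      }
    }
  }
  data.frame(total = length(tokens),
             gendered = sum(claimed),
             matches = matches)
}

res <- count_gendered_tokens(
  "Ladies and gentlemen, the chairman said he will call the policeman.",
  toy_dict
)
res
gendered_percent <- 100 * res$gendered / res$total
```

On the example sentence this sketch finds 11 tokens, 6 of them gendered, from 4 matches, which agrees with the token counts reported by gender_score() above.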

gender_score(
  text = "The chairman and the spokesman left.",
  unit = "matches"
)
#>   total_units gendered_units neutral_units gendered_percent neutral_percent
#> 1           2              2            NA              100              NA

Listing suggestions

gender_suggestions() returns the gendered terms found in a text together with the suggested neutral replacement for each one.

gender_suggestions(
  text = "Our chairman said he will email the mailman and the stewardess."
)
#>     gendered suggested_neutral count
#> 1   chairman             chair     1
#> 2         he              they     1
#> 3    mailman      mail carrier     1
#> 4 stewardess  flight attendant     1

Rewriting a text

gender_replace() applies the dictionary to the original text and returns a rewritten version. Capitalisation follows the matched text.

gender_replace(
  text = "The Chairman called the policeman and the FIREMAN."
)
#> [1] "The Chair called the police officer and the FIREFIGHTER."

Replacement is plain substitution: the function does not adjust the surrounding grammar. Replacing “he” with “they”, for example, can leave verb agreement wrong (“he is” becomes “they is”), so treat the output as a draft and review it by hand.
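
One way to make the replacement follow the capitalisation of the matched text is sketched below in base R. The helpers match_case() and replace_term() are hypothetical and are not taken from the package; the sketch assumes only three casing patterns (all caps, leading capital, unchanged) and dictionary terms without regular expression metacharacters.

```r
# Hypothetical helper: copy the casing pattern of `matched` onto `replacement`
match_case <- function(matched, replacement) {
  if (matched == toupper(matched)) {
    # All caps, e.g. FIREMAN
    toupper(replacement)
  } else if (substr(matched, 1, 1) == toupper(substr(matched, 1, 1))) {
    # Leading capital, e.g. Chairman
    paste0(toupper(substr(replacement, 1, 1)), substring(replacement, 2))
  } else {
    # Anything else is returned unchanged, e.g. policeman
    replacement
  }
}

# Hypothetical helper: replace whole-word, case-insensitive matches of one term
replace_term <- function(text, gendered, neutral) {
  pattern <- paste0("\\b", gendered, "\\b")
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  regmatches(text, hits) <- lapply(
    regmatches(text, hits),
    function(found) vapply(found, match_case, character(1),
                           replacement = neutral, USE.NAMES = FALSE)
  )
  text
}

out <- "The Chairman called the policeman and the FIREMAN."
out <- replace_term(out, "chairman", "chair")
out <- replace_term(out, "policeman", "police officer")
out <- replace_term(out, "fireman", "firefighter")
out
```

Applied to the example sentence, the sketch yields the same string as the gender_replace() output shown above. It shares the limitation described there: only the matched words change, not the grammar around them.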

Using your own dictionary

Every function accepts a custom dictionary through the dictionary argument: a data frame with character columns gendered and neutral. This makes it easy to extend, restrict, or fully replace the built-in dictionary.

my_dict <- data.frame(
  gendered = c("dude", "bro"),
  neutral = c("person", "friend")
)
gender_suggestions(text = "Hey dude, thanks bro!", dictionary = my_dict)
#>   gendered suggested_neutral count
#> 1      bro            friend     1
#> 2     dude            person     1
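
Extending or restricting a dictionary is ordinary data frame manipulation. The sketch below uses base_dict, a hypothetical three-row stand-in copied from the first rows of gender_dictionary shown earlier, so that it runs without the package. With gendertext installed, gender_dictionary itself could presumably be used in its place.

```r
# Stand-in for a few rows of gender_dictionary (assumption: same two columns)
base_dict <- data.frame(
  gendered = c("actress", "airman", "alderman"),
  neutral  = c("actor", "aviator", "council member"),
  stringsAsFactors = FALSE
)

extra <- data.frame(
  gendered = c("dude", "bro"),
  neutral  = c("person", "friend"),
  stringsAsFactors = FALSE
)

# Extend: append new entries, keeping the first row for any repeated term
extended <- rbind(base_dict, extra)
extended <- extended[!duplicated(tolower(extended$gendered)), ]

# Restrict: drop an entry you do not want flagged
restricted <- base_dict[base_dict$gendered != "actress", ]

# Check the layout described above: two character columns
stopifnot(is.character(extended$gendered), is.character(extended$neutral))
```

The resulting data frames can then be passed through the dictionary argument in the same way as my_dict above.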

Working with files

The functions also accept a path argument. Plain text files are read with base R, so no additional packages are required.

txt <- system.file("extdata", "test.txt", package = "gendertext")
gender_score(path = txt)
#>   total_units gendered_units neutral_units gendered_percent neutral_percent
#> 1         113             20            93         17.69912        82.30088
head(gender_suggestions(path = txt))
#>      gendered suggested_neutral count
#> 1         his             their     5
#> 2         her              them     3
#> 3     actress             actor     1
#> 4 brotherhood         community     1
#> 5 businessmen    businesspeople     1
#> 6    chairman             chair     1

Other document formats, such as PDF and Word, are supported through the optional readtext package. Install it with install.packages("readtext").

pdf <- system.file("extdata", "test.pdf", package = "gendertext")
gender_score(path = pdf)
#>   total_units gendered_units neutral_units gendered_percent neutral_percent
#> 1         114             20            94         17.54386        82.45614

Please note: PDF analysis depends on the presence of extractable text. Scanned or image-only documents may not yield readable content.
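
Before scoring a PDF, it can help to confirm that any text was extracted at all. The following is a minimal sketch, assuming readtext is installed and that readtext::readtext() returns a data frame with a text column. The helper has_extractable_text() is hypothetical and not part of gendertext.

```r
# Hypothetical helper: TRUE if at least one string contains non-whitespace
has_extractable_text <- function(txt) {
  any(nzchar(trimws(txt)))
}

pdf <- system.file("extdata", "test.pdf", package = "gendertext")

# Only attempt the check when the file was found and readtext is available
if (nzchar(pdf) && requireNamespace("readtext", quietly = TRUE)) {
  doc <- readtext::readtext(pdf)
  if (!has_extractable_text(doc$text)) {
    warning("No extractable text found; the PDF may be scanned or image only.")
  }
}
```

A document that fails this check would need optical character recognition before it can be analysed, which is outside the scope of this sketch.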

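Limitations

The caveats noted in the sections above apply throughout. Results are limited to the entries of the dictionary in use, so gendered language that is not listed goes undetected. The neutral percentage is a proxy based on unmatched tokens. Replacement is plain substitution and does not repair the surrounding grammar. PDF analysis requires extractable text.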

Conclusion

gendertext offers a lightweight and reproducible way to examine gendered language in text. Its transparent, dictionary-based design makes it suitable for research, policy review, editorial work, and exploratory analysis.