
This function takes a dataframe and one or more multiple-choice variables, and expands each variable into binary indicators. Each variable is split on the split_by separator, and each distinct choice gets its own binary column. Each binary column is named by joining the original variable name and the choice with the bin_sep separator (for example, var1.a).
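The core mechanics (split, collect the distinct choices, build one 0/1 column per choice) can be illustrated with a minimal base-R sketch. expand_bin_sketch() is a hypothetical name used only for illustration and is not the package's implementation. It returns a plain data.frame rather than a data.table, appends the new columns at the end, and ignores drop_undefined, value_in, value_in_suffix and the column-removal options.

```r
# Hypothetical sketch of the expansion step; NOT the package's expand_bin().
# Ignores drop_undefined, value_in, value_in_suffix and the remove_* options.
expand_bin_sketch <- function(df, vars, split_by = " ", bin_sep = ".") {
  for (var in vars) {
    x <- as.character(df[[var]])
    missing <- is.na(x)
    # Split each response into its individual choices (NA stays NA)
    choices <- strsplit(x, split_by, fixed = TRUE)
    # Distinct, sorted choices across non-missing responses, without empty strings
    opts <- sort(setdiff(unlist(choices[!missing]), ""))
    for (opt in opts) {
      # 1 if the choice was selected, 0 otherwise
      hit <- as.integer(vapply(choices, function(ch) opt %in% ch, logical(1)))
      # Missing responses stay missing in every indicator
      hit[missing] <- NA_integer_
      df[[paste0(var, bin_sep, opt)]] <- hit
    }
  }
  df
}
```

Applied to the data in the Examples section, this sketch gives the same 0/1/NA values per indicator column, although its columns are ordered differently (all indicators come after the original columns).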

Usage

expand_bin(
  df,
  vars,
  split_by = " ",
  bin_sep = ".",
  drop_undefined = NULL,
  value_in = NULL,
  value_in_suffix = NULL,
  remove_new_bin = TRUE,
  remove_other_bin = TRUE
)

Arguments

df

The input dataframe.

vars

A character vector with the names of the variables to expand.

split_by

The separator used to split the variable into choices (default: " ").

bin_sep

The separator used to separate the original variable name and the choice name in the binary columns (default: ".").

drop_undefined

A character vector of values to treat as undefined. Defaults to NULL (no values are treated as undefined).

value_in

A character vector of values to treat as value_in values. Defaults to NULL (none).

value_in_suffix

A character scalar (possibly an empty string) appended to the variable names. Defaults to NULL.

remove_new_bin

A logical scalar indicating whether to remove the new binary columns if they already exist in the dataframe. Defaults to TRUE.

remove_other_bin

A logical scalar indicating whether to remove other binary columns whose names start with the variable name followed by bin_sep. Defaults to TRUE.
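The prefix rule that remove_other_bin describes can be sketched in base R as follows. drop_existing_bins() is a hypothetical helper written for illustration and is not taken from the package.

```r
# Hypothetical helper mirroring the remove_other_bin description; NOT package code.
# Drops every column whose name starts with `var` followed by `bin_sep`.
drop_existing_bins <- function(df, var, bin_sep = ".") {
  prefix <- paste0(var, bin_sep)
  # startsWith() matches literally, so "." is not a regex wildcard
  keep <- !startsWith(names(df), prefix)
  df[, keep, drop = FALSE]
}
```

Literal prefix matching matters with the default bin_sep of ".". A regular expression such as "^var1." would also match an unrelated column like var10, because "." matches any character.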

Value

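The modified dataframe, with one binary column for each distinct choice in each expanded variable. As the example shows, a dataframe that is not already a data.table is converted to one, with a warning.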

Examples

df <- data.frame(var1 = c("a b c", "a c", "d", NA), var2 = c("a b c", "a c", "c a", NA))
df <- expand_bin(df, c("var1", "var2"))
#> Warning: Converting df to data.table.
df
#>      var1 var1.a var1.b var1.c var1.d   var2 var2.a var2.b var2.c
#>    <char>  <int>  <int>  <int>  <int> <char>  <int>  <int>  <int>
#> 1:  a b c      1      1      1      0  a b c      1      1      1
#> 2:    a c      1      0      1      0    a c      1      0      1
#> 3:      d      0      0      0      1    c a      1      0      1
#> 4:   <NA>     NA     NA     NA     NA   <NA>     NA     NA     NA