library(impactR.analysis)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

This vignette shows how to analyze Kobo-based survey data, leveraging both the svy_*() family of functions and a Kobo tool.

Proportion for select ones

As of version v0.0.3, kobo_select_one() uses only the information present in the dataset and then retrieves labels from the Kobo tool. This means that it does not output rows for responses that were never chosen.


# With the labels, wonderful "var_label" and "var_value_label"
# Choices is optional
# Survey is mandatory
kobo_select_one(design, 
                vars = c("h_2_type_latrine", "admin1"), 
                survey, 
                choices, 
                group = "milieu")

# For all select ones in the survey sheet
kobo_select_one_all(design, survey, choices)
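
To illustrate why a response that was never chosen produces no row, here is a minimal base R sketch. It does not call impactR.analysis and ignores survey weights; the `answers` vector and the `choices_tool` data frame are made-up stand-ins for a dataset column and a Kobo choices sheet.

```r
# Hypothetical data: a select_one column and the matching choices from the tool
answers <- c("flush", "pit", "pit", "flush", "pit")
choices_tool <- data.frame(
  name  = c("flush", "pit", "none"),
  label = c("Flush toilet", "Pit latrine", "No latrine"),
  stringsAsFactors = FALSE
)

# Proportions are computed from the values observed in the data only
props <- prop.table(table(answers))
result <- data.frame(
  var_value = names(props),
  prop      = as.numeric(props),
  stringsAsFactors = FALSE
)

# Labels are looked up afterwards; "none" was never chosen, so it has no row
result$var_value_label <- choices_tool$label[match(result$var_value, choices_tool$name)]
result

# Choices defined in the tool but absent from the data
missing_choices <- setdiff(choices_tool$name, result$var_value)
missing_choices
```

If rows for unchosen responses are needed for reporting, they can be added afterwards from a list such as `missing_choices`.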

Proportion for select multiples

kobo_select_multiple() is deliberately conservative in the following ways:

  • if a choice exists in the Kobo tool but not in the dataset, it is removed from the calculation;
  • if a choice exists in the dataset but not in the Kobo tool, it is not taken into account; for example, if a choice has been added and recoded during cleaning, the Kobo tool must be updated beforehand (which goes hand in hand with the good practice of keeping an up-to-date Kobo tool that can be used as a dictionary of variables);
  • the survey sheet passed as input should be filtered to the variables corresponding to the data (main, hh roster, education loop, etc.).
# With the labels, note the "choices_sep" argument
# that allows for choosing the choice separator in the database
# either a "/" or "." or a "_" , etc.
# Arg 'vars' can take a vector of select_multiple variables
kobo_select_multiple(design, c("e_typ_ecole", "e_typ_ecole"), survey, choices, choices_sep = "_")

# For all select multiples
kobo_select_multiple_all(design, survey, choices, choices_sep = "_")
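
The matching rules listed in this section can be sketched in base R as a set intersection between the choices in the tool and the child columns in the dataset. This is only an illustration of the logic, not the package's implementation: the column names, the choice names ("privee", "communautaire", "recoded") and the assumption that child columns are named variable + separator + choice are all hypothetical.

```r
# Hypothetical inputs: dataset column names and choices listed in the Kobo tool
var <- "e_typ_ecole"
choices_sep <- "_"
data_cols <- c("e_typ_ecole", "e_typ_ecole_publique", "e_typ_ecole_privee",
               "e_typ_ecole_recoded", "h_2_type_latrine")
tool_choices <- c("publique", "privee", "communautaire")

# Assumption: child columns are named <var><choices_sep><choice>
expected_cols <- paste0(var, choices_sep, tool_choices)

# Only choices present in both the tool and the dataset are kept
kept_cols <- intersect(expected_cols, data_cols)

# In the tool but not in the data: dropped from the calculation
dropped_from_tool <- setdiff(expected_cols, data_cols)

# In the data but not in the tool: ignored until the tool is updated
prefix <- paste0(var, choices_sep)
child_cols <- data_cols[startsWith(data_cols, prefix)]
ignored_from_data <- setdiff(child_cols, expected_cols)

kept_cols
dropped_from_tool
ignored_from_data
```

A check like `ignored_from_data` run before the analysis is one way to spot choices that were added during cleaning but never added to the tool.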

Mean and median for numeric variables (decimal, integer, calculate)

# Mean for one or several numeric variables
kobo_mean(design, c("c_total_3_17_femmes", "e_abandont_3a_4a_fille"), survey)

# Median for one or several numeric variables
kobo_median(design, "f_5_depenses_ba", survey, group = "milieu")

# Do the same for all variables
kobo_mean_all(design, survey)
kobo_median_all(design, survey)

Ratio for numeric variables (decimal, integer, calculate)

kobo_ratio(design, nums = "e_abandont_3a_4a_fille", denoms = "c_total_3_17_femmes", survey = survey)
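
As a reminder of what a ratio estimate is, here is a base R sketch with made-up values. It assumes that kobo_ratio() follows the usual survey ratio estimator (weighted total of the numerator divided by weighted total of the denominator, as survey::svyratio() does); that assumption is not confirmed by this vignette, and the vectors below are hypothetical.

```r
# Hypothetical household data: numerator variable, denominator variable, weights
num    <- c(1, 0, 2, 1)
denom  <- c(3, 2, 4, 1)
weight <- c(10, 20, 10, 5)

# Ratio of weighted totals
ratio <- sum(weight * num) / sum(weight * denom)

# For comparison: the weighted mean of row-wise ratios is a different quantity
mean_of_ratios <- weighted.mean(num / denom, weight)

ratio
mean_of_ratios
```

The two quantities differ in general, which is why a ratio needs to be requested explicitly rather than computed as the mean of a pre-divided variable.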

Interaction of variables (e.g. for needs profiles)

kobo_interact(design, c("h_2_type_latrine", "e_typ_ecole_publique"), survey = survey)
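
An interaction can be read as the proportion of each combination of values across the listed variables. The unweighted base R sketch below illustrates that idea under this assumption; the data are made up and the code does not reproduce the output format of kobo_interact().

```r
# Hypothetical data: a select_one variable and a 0/1 select_multiple child column
latrine <- c("flush", "pit", "pit", "flush", "pit", "pit")
school  <- c(1, 1, 0, 0, 1, 1)

# Build one factor whose levels are the observed combinations of both variables
combo <- interaction(latrine, school, sep = " x ", drop = TRUE)

# Proportion of each combination
props <- prop.table(table(combo))
props
```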

Quick automation

The auto_kobo_analysis() function runs all of the above functions at once, except kobo_ratio().

While all these functions provide a quick workflow for analyzing survey data, the recommended way is to provide a data analysis plan and use kobo_analysis() or kobo_analysis_from_dap() (see below), which allow for finer analyses (e.g. providing ratios, labels of indicators, etc., beyond the types defined in the Kobo tool).

Make your own analysis

# Calculate a mean
kobo_analysis(design, analysis = "mean", vars = c("c_total_3_17_femmes", "e_abandont_3a_4a_fille"), survey)

# Calculate a median
kobo_analysis(design, analysis = "median", vars = "f_5_depenses_ba", survey)

# Calculate a ratio (named vector: numerator = denominator, as in the kobo_ratio() call above)
kobo_analysis(design, analysis = "ratio", vars = c("e_abandont_3a_4a_fille" = "c_total_3_17_femmes"), survey)

# Calculate a select_one proportion
kobo_analysis(design, analysis = "select_one", vars = c("h_2_type_latrine", "admin1"), survey, choices, na_rm = FALSE)

# Calculate a select_multiple proportion
kobo_analysis(design, analysis = "select_multiple", vars = "e_typ_ecole", survey, choices)

# Calculate the proportion of an interaction
kobo_analysis(design, analysis = "interact", vars = c("h_2_type_latrine", "e_typ_ecole_publique"), survey)

Make your own analysis using a data analysis plan

Necessary columns are: analysis, var, na_rm. Other arguments, e.g. group, level, vartype, etc., are passed once and apply to the whole data analysis plan. Any other columns, for instance ones useful for reporting such as the indicator name or the sector, are kept. The function proceeds as follows:

  • split the data frame into a list by analysis type
  • map out the analysis for each type
  • bind all
  • left_join the other columns

The data analysis plan should contain only one variable to analyze per row. The exceptions are a ratio, where the two variables are separated by a comma, and an interaction, where the variables are likewise separated by a comma. The package contains an example:

analysis_dap |> dplyr::as_tibble()
#> # A tibble: 6 × 6
#>   sector              indicator                      var   analysis na_rm subset
#>   <chr>               <chr>                          <chr> <chr>    <chr> <chr> 
#> 1 General information Mean of the number of school-… c_to… mean     yes   House…
#> 2 Expenses            Median of food expenses        f_5_… median   yes   NA    
#> 3 Education           A ratio that does not really … e_ab… ratio    yes   Maybe…
#> 4 Sanitation          % of households by type of la… h_2_… select_… no    NA    
#> 5 Education           % of households by type of sc… e_ty… select_… yes   House…
#> 6 Education           % of households by type of sc… e_ty… select_… no    NA

Then, to run the analysis, do the following:

# Default
kobo_analysis_from_dap(design, analysis_dap, survey, choices, choices_sep = "_")

# Grouped and confidence level of 0.99
kobo_analysis_from_dap(design, analysis_dap, survey, choices, group = "milieu", level = 0.99, choices_sep = "_")
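
The split, map, bind and join steps described in this section can be sketched in base R as follows. This is a simplified, unweighted illustration under stated assumptions, not the package's code: the `dap` and `df` objects, the `run_one()` helper and the restriction to mean and median are all hypothetical.

```r
# Hypothetical data analysis plan with one extra reporting column
dap <- data.frame(
  indicator = c("Mean of A", "Median of B", "Mean of C"),
  var       = c("a", "b", "c"),
  analysis  = c("mean", "median", "mean"),
  na_rm     = c("yes", "yes", "no"),
  stringsAsFactors = FALSE
)
# Hypothetical dataset
df <- data.frame(a = c(1, 2, NA), b = c(5, 1, 3), c = c(2, 4, 6))

# 1. Split the plan into a list by analysis type
dap_by_type <- split(dap, dap$analysis)

# 2. Map the matching analysis over each row of each piece
run_one <- function(piece) {
  fun <- switch(piece$analysis[1], mean = mean, median = stats::median)
  stat <- mapply(
    function(v, na_rm) fun(df[[v]], na.rm = na_rm == "yes"),
    piece$var, piece$na_rm
  )
  data.frame(var = piece$var, analysis = piece$analysis, stat = unname(stat),
             stringsAsFactors = FALSE)
}
results_by_type <- lapply(dap_by_type, run_one)

# 3. Bind all results together
results <- do.call(rbind, results_by_type)
rownames(results) <- NULL

# 4. Left join the remaining plan columns back onto the results
final <- merge(results, dap[, c("var", "indicator")], by = "var", all.x = TRUE)
final
```

Keeping the reporting columns out of the computation and joining them back at the end means they pass through unchanged, whatever they contain.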