class: center, middle, inverse, title-slide # AdvancedR chapter 22: Debugging ### Alejandra Hernandez-Segura ### 8-December-2020 --- # Welcome! - This meetup is part of a joint effort between RLadies Nijmegen, Rotterdam, 's-Hertogenbosch (Den Bosch), Amsterdam and Utrecht -- - We meet every 2 weeks to go through a chapter of the book "Advanced R" by Hadley Wickham -- - Slides from previous sessions are available: https://github.com/rladiesnl/book_club --- # Overall approach for Debugging Strategy recommended by Hadley Wickham: 1. Google + Remove variable names or values + Use specific packages for automation: `errorist107` or `searcher108` -- 2. Make it repeatable + MINIMAL reproducible example + Automated test case (if applicable) -- 3. Figure out where it is + Tools explained here + Ask for help (using the reproducible example) -- 4. Fix it and test it --- # Tools **Finding where the error is:** - Traceback in RStudio or through the function `traceback()` **Fixing the error:** - Editor breakpoint - Browser breakpoint `browser()` - `options(error = browser)` (Opposite: `options(error = NULL)`) - `debug()` and `debugonce()` --- # Women in the Workforce (TidyTuesday 4-Mar-2019) Data from the Bureau of Labor Statistics and the Census Bureau about women in the workforce. According to the AAUW - "The gender pay gap is the gap between what men and women are paid. Most commonly, it refers to the median annual pay of all women who work full time and year-round, compared to the pay of a similar cohort of men." The data is provided as is, and you recognize the limitations and issues in defining gender as binary. --- # An example ``` tibble [2,088 x 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame) $ year : num [1:2088] 2013 2013 2013 2013 2013 ... $ occupation : chr [1:2088] "Chief executives" "General and operations managers" "Legislators" "Advertising and promotions managers" ... $ major_category : chr [1:2088] "Management, Business, and Financial" "Management, Business, and Financial" "Management, Business, and Financial" "Management, Business, and Financial" ... $ minor_category : chr [1:2088] "Management" "Management" "Management" "Management" ... $ total_workers : num [1:2088] 1024259 977284 14815 43015 754514 ... $ workers_male : num [1:2088] 782400 681627 8375 17775 440078 ... $ workers_female : num [1:2088] 241859 295657 6440 25240 314436 ... $ percent_female : num [1:2088] 23.6 30.3 43.5 58.7 41.7 63.5 33.6 27.5 53.5 76.9 ... $ total_earnings : num [1:2088] 120254 73557 67155 61371 78455 ... $ total_earnings_male : num [1:2088] 126142 81041 71530 75190 91998 ... $ total_earnings_female: num [1:2088] 95921 60759 65325 55860 65040 ... $ wage_percent_of_male : num [1:2088] 76 75 91.3 74.3 70.7 ... - attr(*, "spec")= .. cols( .. year = col_double(), .. occupation = col_character(), .. major_category = col_character(), .. minor_category = col_character(), .. total_workers = col_double(), .. workers_male = col_double(), .. workers_female = col_double(), .. percent_female = col_double(), .. total_earnings = col_double(), .. total_earnings_male = col_double(), .. total_earnings_female = col_double(), .. wage_percent_of_male = col_double() .. ) ``` --- # Our goal <img src="WomenInWorkforce_files/figure-html/unnamed-chunk-3-1.png" width="648" height="103%" style="display: block; margin: auto;" /> --- # Non-interactive sessions - Steps are the same than before - Common problems: + Different environment + Different working directory + Different environment variables (like PATH or R_LIBS) --- # Specific tools for non-interactive sessions `dump.frames()` ```r # In batch R process ---- dump_and_quit <- function() { # Save debugging info to file last.dump.rda dump.frames(to.file = TRUE) # Quit R with error status q(status = 1) } options(error = dump_and_quit) # In a later interactive session ---- load("last.dump.rda") debugger() ``` If you work in bash... ```bash Rscript my_r_script.R &> log_file.txt ``` --- # My example ```r # Code to get input from user # Load libraries # Looking for files in input (run) directory list_R1_fastq <- list.files(run_dir, pattern = "R1_?\\d*.fastq$|R1_?\\d*.fastq.gz$", full.names = TRUE) list_R2_fastq <- list.files(run_dir, pattern = "R2_?\\d*.fastq$|R2_?\\d*.fastq.gz$", full.names = TRUE) # Get sample names sample_names <- list_R1_fastq %>% str_extract("[:alnum:]{0,2}[0-9]{1,3}[_-][0-9]{3}") ## Only keep files that are in the metadata list_R1_fastq <- list_R1_fastq[sample_names %in% metadata$assay_sample] list_R2_fastq <- list_R2_fastq[sample_names %in% metadata$assay_sample] # Only keep fastq files larger than 200kb ## Plot Quality Profiles R1 plotR1 <- plotQualityProfile(sample(list_R1_fastq, 50), aggregate = TRUE) + labs(title = "Quality plot Forward Reads") ggsave(plot_R1_file, plot = plotR1, width = 6.5, height = 5, units = 'in', dpi = 300) ## Plot Quality Profiles R2 # Filer and trim fastq files trimOut <- filterAndTrim(fwd = list_R1_fastq, rev = list_R2_fastq, compress = TRUE, maxN = maxN, # Does not allowed any Ns in the reads maxEE = maxEE, # Maximum expected errors truncLen = truncLen, # Truncating length (forward and reverse) rm.phix = TRUE, #Removes phix reads matchIDs = matchIDs, #Makes sure that both pairs are present multithread = TRUE, verbose = TRUE) colnames(trimOut) <- c("Initial Reads", "Trimmed reads") write.csv(trimOut, trimOut_file) ``` --- # The good old print statements... .panelset[ .panel[.panel-name[Undecipherable error] ``` V1 1 Loading required package: Rcpp 2 Registered S3 methods overwritten by 'ggplot2': 3 method from 4 [.quosures rlang 5 c.quosures rlang 6 print.quosures rlang 7 Warning message: 8 package â\200\230Rcppâ\200\231 was built under R version 3.6.3 9 Warning message: 10 package â\200\230purrrâ\200\231 was built under R version 3.6.3 11 Error in data.frame(sequence = names(freqtbl$top), count = as.integer(freqtbl$top), : 12 arguments imply differing number of rows: 0, 1 13 Calls: plotQualityProfile ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous> 14 Execution halted ``` ] ] .panel[.panel-name[Slightly less undecipherable error] ``` V1 1 Loading required package: Rcpp 2 Registered S3 methods overwritten by 'ggplot2': 3 method from 4 [.quosures rlang 5 c.quosures rlang 6 print.quosures rlang 7 Warning message: 8 package â\200\230Rcppâ\200\231 was built under R version 3.6.3 9 Warning message: 10 package â\200\230purrrâ\200\231 was built under R version 3.6.3 11 [1] Looking for files in input directory... 12 [1] Only keeping files larger than 200kb... 13 [1] Plotting quality profiles... 14 Error in data.frame(sequence = names(freqtbl$top), count = as.integer(freqtbl$top), : 15 arguments imply differing number of rows: 0, 1 16 Calls: plotQualityProfile ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous> 17 Execution halted ``` ] ] ] --- # Other non-errors - For warnings use: `options(warn = 2)` to turn warnings into error and use the the call stack, like `doWithOneRestart()`, `withOneRestart()` - For messages use: `rlang::with_abort()` to turn these messages into errors - A function might never return. Sometimes terminating the function and looking at the `traceback()` is informative - R crashing completely. This indicates a bug in compiled (C or C++) code - If the bug is in your compiled code, you’ll need to follow use an interactive C debugger (or insert many print statements) - If the bug is in a package or base R, you’ll need to contact the package maintainer. In either case, work on making the smallest possible reproducible example --- class: middle, inverse # .fancy[Time to share experiences!!!] --- # Share your experience - Did you know this tools? - Which debugging tools you use the most? - What is the worst error message/debugging experience you have ever had? --- class: middle, inverse # .fancy[Thank you!] <img src="files/thankyou.gif" width="40%" style="display: block; margin: auto;" /> ### We are almost at the end of the book! Follow the last chapter!!