@alexg9010
Last active August 12, 2025 19:18
Check if methylKit::unite with min.per.group filter would work for given sample number and chunk size
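library(methylKit)

# Estimate whether one chunk would exceed 2^31 - 1 bytes by linearly scaling
# the size of a small simulated chunk (10 samples x 100 rows).
test_overflow <- function(n_samples, chunk_size) {
  limit <- 2^31 - 1

  # simulate 10 samples with 1000 sites each and convert to tabix
  db_path <- dataSim(replicates = 10,
                     sites = 1e3,
                     treatment = rep(1, 10)) |>
    makeMethylDB(dbdir = tempdir()) |>
    suppressMessages() |>
    getDBPath()

  # read the first 100 rows; Rsamtools is called explicitly because
  # library(methylKit) may not attach it
  tbx <- Rsamtools::TabixFile(db_path, yieldSize = 100)
  open(tbx)
  on.exit(close(tbx), add = TRUE)

  # collapse the rows into a single string and measure its size in bytes
  ref_bytes <- Rsamtools::scanTabix(tbx) |>
    unlist() |>
    paste(collapse = "\n") |>
    object.size() |>
    as.numeric()

  # scale from 10 samples x 100 rows to the requested dimensions
  res <- ref_bytes * (n_samples / 10) * (chunk_size / 100)

  if (res < limit) {
    message("chunk size is fine")
    return(invisible(TRUE))
  }

  # factor by which the estimate exceeds the limit, and the largest
  # chunk size that would still fit
  overflow_factor <- res / limit
  max_chunk_size <- floor(chunk_size / overflow_factor)

  message(
    sprintf(
      "chunk would exceed 2^31-1 bytes by factor %.2f",
      overflow_factor
    )
  )
  message(
    sprintf(
      "Consider reducing chunk size to below %.0f",
      max_chunk_size
    )
  )
  return(invisible(max_chunk_size))
}

test_overflow(n_samples = 1000, chunk_size = 500000)
test_overflow(n_samples = 1000, chunk_size = 50000)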
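
The scaling arithmetic can be tried on its own, without simulating data. The sketch below is a minimal base R illustration, not part of methylKit: `estimate_chunk()` is a hypothetical helper, and the 5000 bytes used for the 10-sample, 100-row reference chunk is a made-up figure chosen only to make the numbers easy to follow.

```r
# Hypothetical helper (not a methylKit function): scale the byte size of a
# reference chunk (ref_samples x ref_rows) linearly to n_samples x chunk_size
# and compare the estimate with the 2^31 - 1 byte limit.
estimate_chunk <- function(ref_bytes, n_samples, chunk_size,
                           ref_samples = 10, ref_rows = 100,
                           limit = 2^31 - 1) {
  est_bytes <- ref_bytes * (n_samples / ref_samples) * (chunk_size / ref_rows)
  overflow_factor <- est_bytes / limit
  list(
    est_bytes = est_bytes,
    overflow_factor = overflow_factor,
    fits = est_bytes < limit,
    # largest chunk size whose estimate stays within the limit
    max_chunk_size = floor(chunk_size / overflow_factor)
  )
}

# ref_bytes = 5000 is an assumed value, not a measurement
est <- estimate_chunk(ref_bytes = 5000, n_samples = 1000, chunk_size = 500000)
str(est)
```

Because the estimate is linear in both arguments, halving `chunk_size` halves the estimated bytes, so dividing the requested chunk size by the overflow factor gives the largest size expected to fit.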