Skip to content

Instantly share code, notes, and snippets.

@seandavi
Last active March 10, 2026 15:01
Show Gist options
  • Select an option

  • Save seandavi/b73a6f4a9103a3f67474526e56e884b9 to your computer and use it in GitHub Desktop.

Select an option

Save seandavi/b73a6f4a9103a3f67474526e56e884b9 to your computer and use it in GitHub Desktop.
Temporary access to cmgd data
---
title: "CMGD data getting started"
format: html
editor: visual
---
## Duckdb setup
```{r}
library(duckdb)
```
Set up the connection:
```{r}
con <- dbConnect(duckdb::duckdb())
```
```{r}
dbExecute(con, "CREATE OR REPLACE SECRET r2 (
TYPE s3,
KEY_ID 'b161e1f32a72992c284eeca78a76f5c0',
SECRET 'da3015476a7694d0261687af2a59ec1420c76dd7e10c746f8f65bf8363e78944',
ENDPOINT '55bf7202fe14474e57a300f56a652f64.r2.cloudflarestorage.com',
URL_STYLE 'path',
REGION 'auto'
)")
```
```{r}
dbGetQuery(con, "select count(*) from read_parquet('s3://cmgd-export/marker_abundance/**/*parquet')")
```
```{r}
dbGetQuery(con, "select * from read_parquet('s3://cmgd-export/marker_abundance/**/*parquet') limit 10")
```
```{r}
dbGetQuery(con, "select count(*) from read_parquet('s3://cmgd-export/marker_presence/**/*parquet')")
dbGetQuery(con, "select count(*) from read_parquet('s3://cmgd-export/metaphlan_viruses_list/**/*parquet')")
dbGetQuery(con, "select count(*) from read_parquet('s3://cmgd-export/metaphlan_unknown_list/**/*parquet')")
dbGetQuery(con, "select count(*) from read_parquet('s3://cmgd-export/marker_rel_ab_w_read_stats/**/*parquet')")
```
```{r}
dbExecute(con,'create view marker_abundance as select * from read_parquet("s3://cmgd-export/marker_abundance/**/*parquet")')
dbGetQuery(con, "select count(*) as records, study_name from marker_abundance group by study_name order by records desc")
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment