@ocean
Last active November 15, 2024 20:52

Process an Omnivore export ZIP archive into a CSV

With the Omnivore app shutting down, lots of people will have export files to process before importing them into other read-it-later and bookmarking apps. I chose Raindrop.io, but there are other alternatives.

(originally posted here omnivore-app/omnivore#4461)

For anyone trying to process their Omnivore export into something more suitable for import into Raindrop.io, etc., Omnivore suggests using a jq command to convert your files, but it doesn't work very well.

Once you've installed jq, here's a command which creates a nice CSV out of your metadata_*.json files from your Omnivore export, including extracting your tags and cleaning up any non-printable characters in your title and description fields:

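jq -r -n '
  # Header row. With -n and inputs it is written once, not once per file.
  ["url","title","note","tags","created"],
  (inputs | .[] | [
    .url,
    # The // "" fallback guards against a null or missing field, which gsub rejects.
    # \u0027\u0027 is two single quotes; a literal pair would be swallowed by the shell.
    ((.title // "") | gsub("\\n";" ") | gsub("\\r";" ") | gsub("\"";"\u0027\u0027") | gsub("[^[:print:]]";" ") | gsub("\\s+";" ")),
    ((.description // "") | gsub("\\n";" ") | gsub("\\r";" ") | gsub("\"";"\u0027\u0027") | gsub("[^[:print:]]";" ") | gsub("\\s+";" ")),
    ([.labels[]?]|join(",")),
    .savedAt
  ]) | @csv
' metadata_*.json > omnivore-export.csv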

Once you've unzipped your Omnivore export .zip file, change into the directory where the metadata_*.json files are, and then you should be able to paste this command into your shell and run it (works for me in zsh and should work in bash as well).
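
To check that nothing was dropped, the number of data rows in the CSV can be compared with the number of items in the JSON files. This is a minimal sketch, assuming each metadata_*.json file holds one top-level array (which is what the command above expects) and that no field in the CSV contains a newline. The function names count_items and count_rows are made up for illustration.

```shell
# Total number of saved items: sum the lengths of the top-level arrays.
count_items() {
  jq -s 'map(length) | add' "$@"
}

# Number of CSV lines that are not a header row.
count_rows() {
  grep -vc '^"url","title"' "$1"
}
```

Run count_items metadata_*.json and count_rows omnivore-export.csv from the export directory; the two numbers should match.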

For your edification, learning and enjoyment, here's a detailed breakdown of what this jq command does:

  • First, it outputs a header row with column names: ["url","title","note","tags","created"]
  • Then for each object in the JSON array (.[]), it creates an array with these transformations:
    • .url - Grabs the URL
    • For the .title field, it applies these cleanups in sequence:
      • gsub("\\n";" ") - Replaces newlines with spaces
      • gsub("\\r";" ") - Replaces carriage returns with spaces
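      • gsub("\"";"\u0027\u0027") - Replaces double quotes with two single quotes. The replacement has to be written with \u0027 escapes: a literal '' inside the single-quoted shell string is swallowed by the shell, so gsub("\"";"''") ends up deleting double quotes instead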
      • gsub("[^[:print:]]";" ") - Replaces any non-printable characters with spaces
      • gsub("\\s+";" ") - Collapses each run of whitespace into a single space
    • Applies the same cleanups to the .description field
    • [.labels[]?]|join(",") - Takes the labels array and joins it into a single comma-separated string; the ? keeps jq from failing when an item has no labels array
    • .savedAt - Grabs the timestamp
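  • Finally, | @csv formats each row as CSV:
    • Wrapping every string field in double quotes
    • Doubling any double quotes that remain inside a field
    • Adding commas between fields
  • The -r flag makes jq print each row as raw text on its own line, rather than as a JSON-encoded string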
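
To see the label join and the @csv quoting in action, here is a cut-down version of the command run on a single made-up record. The sample data is hypothetical, and only the whitespace cleanup is kept from the title steps so that the effect of @csv on double quotes stays visible.

```shell
# Hypothetical sample record, shaped like one entry in a metadata_*.json file.
sample='[{"url":"https://example.com/a","title":"Say \"hi\"\n  twice","labels":["jq","csv"],"savedAt":"2024-11-01T10:00:00.000Z"}]'

# Cut-down filter: collapse whitespace in the title, join labels, emit CSV.
row=$(printf '%s' "$sample" | jq -r '.[] | [
  .url,
  (.title | gsub("\\s+";" ")),
  ([.labels[]?] | join(",")),
  .savedAt
] | @csv')

printf '%s\n' "$row"
```

This should print "https://example.com/a","Say ""hi"" twice","jq,csv","2024-11-01T10:00:00.000Z": the newline and extra spaces in the title become one space, the two labels share a single field, and @csv doubles the embedded double quotes.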