Created April 1, 2019 16:59
Pull download keys for all files listed in an OSF repository
```python
import json
import re

import requests

repo = '5hju4'  # my example repository, update as appropriate
query = '?filter[name][contains]=confounds_regressors.tsv'  # my example query, update or leave blank as appropriate
url = 'https://api.osf.io/v2/nodes/{0}/files/osfstorage/{1}'.format(repo, query)

guids = []
while True:
    resp = requests.get(url)
    resp.raise_for_status()
    data = json.loads(resp.content)
    # collect a (subject label, file id) pair for each file on this page
    for i in data['data']:
        sub = re.search(r'sub-(\S+)_task', i['attributes']['name']).group(1)
        guids.append((sub, i['id']))
    # the API paginates results; follow the 'next' link until it runs out
    url = data['links']['next']
    if url is None:
        break
```
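If you want the file contents too (not just the ids), each file entity in the response should also carry a direct download URL. In my reading of the OSF API docs that lives under `links['download']`, but treat that key as an assumption and inspect one response to confirm. A minimal sketch:

```python
import requests

def download_file(entity, out_path):
    """Stream one OSF file to disk.

    Assumes each file entity returned by the API carries a waterbutler
    download URL under links['download'] -- verify against one response.
    """
    url = entity['links']['download']
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        with open(out_path, 'wb') as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)

# e.g., inside the loop above:
# download_file(i, i['attributes']['name'])
```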
Hi @nicofarr!

It looks like you currently have Dropbox, GitLab, and OSFStorage as providers in that repository. Just to be clear, the code here will only pull from OSF storage! If that's the behavior you're looking for, the second thing to note is that your OSFStorage is structured with all the data files in subfolders, while the code here assumes a flat directory structure. That means we'll need to add another layer to this, in addition to updating the query. So, I'd suggest the following three immediate changes:

1. `repo = 'h285u'`
2. `query = ''`
3. Change `sub = re.search(r'sub-(\S+)_task', i['attributes']['name']).group(1)` to `sub = i['attributes']['name']`

This should then return the hashes for each of the individual folders (along with the folder names) in `guids`. Of note, it only seems to return 10 folder names for me. Are you expecting more to have data?

Then you'd need to query those hashes to get the associated contents. Let me know if that's in line with what you're looking for!

Hi @emdupre,

Thanks, after making those changes I do get 384 guids, corresponding to our folders. Now, how do you query the hashes to get the contents of each folder?

BTW, where did you find this info? Is this from the API guide on the OSF website? We're having a hard time extracting the relevant info from this doc...
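For anyone following along, here is a minimal sketch of that folder query. It assumes the node-files endpoint accepts a folder's id as a path segment (i.e. `/v2/nodes/{repo}/files/osfstorage/{folder_id}/`); that URL layout is an assumption here, so confirm it against the OSF API docs or one live response.

```python
import requests

repo = 'h285u'  # the repository id suggested above

def list_folder(repo, folder_id):
    """Return (name, id) pairs for everything inside one OSF folder.

    Follows pagination the same way as the original snippet. The URL
    layout below is an assumption -- check it against a live response.
    """
    url = 'https://api.osf.io/v2/nodes/{0}/files/osfstorage/{1}/'.format(repo, folder_id)
    contents = []
    while url is not None:
        resp = requests.get(url)
        resp.raise_for_status()
        data = resp.json()
        for entity in data['data']:
            contents.append((entity['attributes']['name'], entity['id']))
        url = data['links']['next']
    return contents

# e.g., for each (folder name, folder id) pair collected in guids:
# for sub, folder_id in guids:
#     print(sub, list_folder(repo, folder_id))
```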