| #!/usr/bin/env python | |
| """ | |
| Utility to generate a first-line support message for a user who has submitted a roundup issue. | |
| Deliberately kept to a single file to discourage feature creep in a small utility. | |
| The only major extension I'd really consider is automatically posting this message to roundup, | |
| but I don't think that is a good idea, as you may need to generate a couple of messages until | |
| an appropriate one is produced |
| { | |
| "patterns": { | |
| "P1": { | |
| "expression": "(path):(line)" | |
| }, | |
| "P2": { | |
| "expression": "(path)\\s+(line)", | |
| "path": "(?:\\/[\\w\\.\\-]+)+" | |
| } | |
| }, |
| #!/usr/bin/env python3 | |
| # the thing above is called a shebang. it tells your shell what program to use | |
| # to run this script. in this case, it says, this is python3. this makes it possible | |
| # to run the script by typing `thod ...`, rather than `python3 thod ...` | |
| # the thing below is a module docstring. it's where you describe what the script | |
| # is and how it works. it shows up if you do `thod --help` | |
| """ |
| #!/usr/bin/env python3 | |
| """ | |
| Script to make a plain text corpus of PTSD narratives, | |
| with a little bit of metadata. | |
| """ | |
| import os | |
| import time | |
| import requests |
| # query a conll file | |
| CONLLU2_FILE="/Users/danielmcdonald/Downloads/test.conllu" | |
| QUERY="[pos=/V.*/]" | |
| LANGUAGE="german" | |
| API="https://weblicht.sfs.uni-tuebingen.de/tundra-beta/api/query/visres" | |
| curl -X POST -F "file=@$CONLLU2_FILE" -F "query=$QUERY" -F "lang=$LANGUAGE" "$API" > api-test.json | |
| # query a treebank | |
| ID="UD_French" | |
| QUERY="[pos=/V.*/]" |
Halfway through my PhD candidature in linguistics at Melbourne Uni, I was introduced by Fiona to the ResPlat family. One of their aims, I was told, was to train researchers across the university in emerging tools and methods for doing better, more reproducible research. A specific target of this agenda was the Humanities and Social Sciences, which, let's admit, sometimes lag behind a little when it comes to engagement with digital tools and methods.
IMAGE OF RESPLAT http://67.media.tumblr.com/ede2ddf22557269fd92dd13c4b344c53/tumblr_inline_nk9gcyW6pE1ssbz72.jpg "ResPlat Family"
My thesis was about corpus linguistics—that is, using computers to locate patterns in large collections of written text. Because of this, Fiona asked me if I could come on board and help out, teaching Python to researchers around the university, but with extra focus on those from the humanities. A key issue among corpus linguists, however, is that many don't really know how to code. A more common w
| %matplotlib notebook | |
| import seaborn as sns | |
| import numpy as np | |
| from scipy.spatial.distance import pdist | |
| from scipy.cluster.hierarchy import linkage, dendrogram | |
| # the pdist metric can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', | |
| # 'dice', 'euclidean', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', | |
| # 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule' |
| from corpkit import * | |
| corpus = Corpus('corpus_name') | |
| langmod = corpus.make_language_model('modelname') | |
| # score string | |
| langmod.score('Check similarity for this text to each corpus') | |
| # score file | |
| corpus_file = corpus.subcorpora[5].files[1] | |
| langmod.score(corpus_file) |
| # a list of ints | |
| l = [4, 2, 6, 7, 8, 1, 50, 23, 13, 55, 12, 3] | |
| # one line to sort them! | |
| [l.insert(ind, l.pop(l.index(min(l[ind:])))) for ind in range(len(l))] | |
| # see? | |
| print(l) | |
| # elaborated code |
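The elaborated version is cut off in this listing. As a rough sketch only, the one-liner above can be unrolled into an explicit loop like the one below. The function name `selection_sort_in_place` is hypothetical and not from the original, and this version searches for the minimum only within the unsorted tail, a small change from the one-liner, which searches the whole list.

```python
def selection_sort_in_place(items):
    """Sort a list in place, one position at a time (hypothetical helper)."""
    for ind in range(len(items)):
        # smallest value in the not-yet-sorted tail of the list
        smallest = min(items[ind:])
        # position of that value, searching from ind onwards
        pos = items.index(smallest, ind)
        # pull it out and re-insert it at the front of the tail
        items.insert(ind, items.pop(pos))
    return items

# same list of ints as above
l = [4, 2, 6, 7, 8, 1, 50, 23, 13, 55, 12, 3]
selection_sort_in_place(l)
print(l)  # [1, 2, 3, 4, 6, 7, 8, 12, 13, 23, 50, 55]
```

This is effectively a selection sort, so it takes quadratic time; for real work the built-in `sorted(l)` or `l.sort()` is the better choice.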