This is a sketch of a proposal for a "robots.txt for github" -- a policy that defines what actions automated tooling can take against a given repository.
Bots self-identify, and use project/repo-style naming. So code that lives at https://github.com/jacobian/coolbot identifies as jacobian/coolbot. Forks should generally use the upstream identifier until/unless they become different enough to warrant new names. This is a matter of judgement.
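As a rough illustration of the naming rule, here is a minimal Python sketch that derives an identifier from a repository URL. The function name `bot_identifier` is hypothetical, and stripping a trailing `.git` is an assumption of mine, not something the proposal specifies.

```python
from urllib.parse import urlparse


def bot_identifier(repo_url: str) -> str:
    """Derive an owner/repo identifier from a GitHub repository URL."""
    parsed = urlparse(repo_url)
    # Drop empty segments so leading and trailing slashes do not matter.
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc.lower() != "github.com" or len(parts) < 2:
        raise ValueError(f"not a GitHub repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    # Assumption: a trailing ".git" (as in clone URLs) is not part of the name.
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{owner}/{repo}"
```

A fork that keeps the upstream identifier would simply pass the upstream URL here; deciding when a fork deserves its own name stays a human judgement.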
Policies live in .github/robots.yml. Well-behaved robots should consult this file before taking action.
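A minimal sketch of "consult this file first", assuming the bot already has a local checkout of the repository. The helper name `read_policy_text` is hypothetical. The proposal does not say what a missing file means, so the sketch returns `None` and leaves that decision to the caller.

```python
from pathlib import Path
from typing import Optional

# Location named in the proposal, relative to the repository root.
POLICY_PATH = Path(".github") / "robots.yml"


def read_policy_text(repo_root: str) -> Optional[str]:
    """Return the raw policy text from a checkout, or None if there is no policy file."""
    policy_file = Path(repo_root) / POLICY_PATH
    if not policy_file.is_file():
        return None
    return policy_file.read_text(encoding="utf-8")
```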
Somewhat inspired by robots.txt, but in YAML to troll security researchers. I have no spec yet so here are examples:
Robots may not interact with this repository:

    deny: *

Go hog wild:

    allow: *

Nobody is welcome except jacobian/coolbot:

    allow:
    - jacobian/coolbot

That's the same as:

    deny: *
    allow:
    - jacobian/coolbot

That is, an allow without a deny implies `deny: *`.
The same is true of a deny list. This allows any bot except jacobian/coolbot:

    deny:
    - jacobian/coolbot

and that's the same as:

    allow: *
    deny:
    - jacobian/coolbot

If there are both an allow list and a deny list, an implicit `deny: *` should also be inferred. So given:

    allow:
    - jacobian/coolbot
    deny:
    - jacobian/otherbot

jacobian/otherbot clearly should stay away, but so should jacobian/bot3 and all other bots. The above should be treated as:

    allow:
    - jacobian/coolbot
    deny: *

Bots can also be allowed or denied by organization. This policy welcomes bots from the Python Packaging Authority:

    allow:
    - pypa/*

This policy welcomes most bots, but none made by me:

    allow: *
    deny:
    - jacobian/*

Finally, policies may allow or deny specific actions. This policy allows jacobian/coolbot any action, and allows PyPA bots to open issues (but only open issues):

    allow:
    - jacobian/coolbot
    - pypa/*@issues

Valid actions are:

- `issues`
- `pull_requests`

TBD: more granular permissions, e.g. "open issue", "comment on issue", etc.?
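The rules in the examples can be read as a small decision procedure. What follows is a minimal Python sketch of one possible reading, not a spec. It takes an already-parsed policy (a dict whose `allow` and `deny` values are either the string `"*"` or a list of entries) and leaves YAML parsing out; note that a bare `*` is alias syntax in YAML, so a real parser would probably need it quoted. The names `matches` and `is_allowed` are hypothetical. Three behaviours are assumptions the examples do not settle: an explicit deny entry wins over an allow entry, `allow: *` combined with `deny: *` denies everyone, and a policy with neither key is rejected.

```python
from typing import Dict, List, Union

Rule = Union[str, List[str]]  # either "*" or a list of entries
VALID_ACTIONS = {"issues", "pull_requests"}


def matches(entry: str, bot: str, action: str) -> bool:
    """Return True if one policy entry covers this bot and action."""
    pattern, _, entry_action = entry.partition("@")
    # An "@action" suffix restricts the entry to that single action.
    if entry_action and entry_action != action:
        return False
    # "org/*" covers every bot owned by that organization or user.
    if pattern.endswith("/*"):
        return bot.startswith(pattern[:-1])
    # Assumption: identifiers are compared exactly, case included.
    return pattern == bot


def is_allowed(policy: Dict[str, Rule], bot: str, action: str) -> bool:
    """Decide whether `bot` may perform `action` under `policy`."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    allow = policy.get("allow")
    deny = policy.get("deny")
    if allow is None and deny is None:
        # Assumption: the examples never show an empty policy, so refuse to guess.
        raise ValueError("policy has neither allow nor deny")
    # Assumption: an explicit deny entry always wins over any allow.
    if isinstance(deny, list) and any(matches(e, bot, action) for e in deny):
        return False
    if allow is None or allow == "*":
        # A deny list alone implies "allow: *"; only a blanket "deny: *" blocks here.
        return deny != "*"
    # An allow list implies "deny: *", whether or not a deny list is also present.
    return any(matches(e, bot, action) for e in allow)
```

For example, with `policy = {"allow": ["jacobian/coolbot", "pypa/*@issues"]}`, a call such as `is_allowed(policy, "pypa/somebot", "pull_requests")` returns `False`, while the same bot asking for `"issues"` is allowed.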
Beyond access control, I think it'd be good to require some kind of identification from any bot that is going to take actions in a repo, including or linking to some notion of its maturity and/or purpose. Maybe also require that read-only bots publicly maintain a log of the repos they've scanned, though I know enforcement would be difficult.
My motivation is that I was experimented on by UChicago researchers who tested their source analysis tool by opening PRs against random repos without identification or consent. If the maintainer merged their PR, they claimed that as evidence of the tool's efficacy.