Musings

Abusing regex in GitHub code search

I recently discovered that the new(ish) GitHub Code Search supports regular expressions. Dorking on classic GitHub search has been documented to death by skids, but I haven't seen anyone mention the regex support yet. I'm sure someone is already using it, because it's powerful.

Case in point:

/"[a-z]{4}(?: [a-z]{4}){3}"/ language:Python SMTP

That regex looks a bit tricky, but it just matches a double-quoted string made of 4 space-separated groups of 4 lowercase letters. What good does that do, you might ask? The SMTP should be a hint: it's some kind of credential for email. More specifically, this is the format Google uses for app passwords.
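To make the pattern concrete, here's a minimal sketch that runs the same regex locally with Python's `re` module (minus the `/.../` delimiters GitHub wants). The sample strings are made up, and the name `APP_PASSWORD_LITERAL` is just mine for illustration.

```python
import re

# Same pattern as the search above, without GitHub's /.../ delimiters.
# Opening quote, 4 lowercase letters, then 3 more groups of
# "space + 4 lowercase letters", then a closing quote.
APP_PASSWORD_LITERAL = re.compile(r'"[a-z]{4}(?: [a-z]{4}){3}"')

# Made-up samples; none of these are real credentials.
samples = [
    'password = "abcd efgh ijkl mnop"',  # right shape: 4 groups of 4
    'password = "abcd efgh ijkl"',       # only 3 groups
    'password = "Abcd efgh ijkl mnop"',  # uppercase letter
    'password = "abcdefghijklmnop"',     # no spaces between groups
]

matches = [bool(APP_PASSWORD_LITERAL.search(s)) for s in samples]
print(matches)  # [True, False, False, False]
```

Only the first sample matches. Note that the pattern will also hit any quoted run of four 4-letter lowercase words, so not every match is a credential.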

This search has 5k hits.

I think Google will still block suspicious connections, so this isn't a huge pwn. But I guarantee at least one of these accounts has bad opsec, so you can determine their location. Combined with a bit of residential proxy work... you get the idea.

GitHub should be a lot more proactive about this stuff - blocking it from search, blocking it from public discovery, or even blocking the commit itself. Most people who are posting this stuff don't know how they could do it better.
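As a rough illustration of what "blocking the commit itself" could look like, here's a sketch of a check that flags lines containing a literal in that shape. It's not how GitHub does (or would do) anything. `find_suspect_lines` and the sample text are hypothetical, and a real hook would have to read the staged files itself.

```python
import re

# Same shape as the search query: a quoted string of 4 groups of 4 lowercase letters.
APP_PASSWORD_LITERAL = re.compile(r'"[a-z]{4}(?: [a-z]{4}){3}"')


def find_suspect_lines(text):
    """Return 1-based line numbers that contain a literal in the app-password shape."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if APP_PASSWORD_LITERAL.search(line)
    ]


# Made-up file contents; the password below is not a real credential.
staged = (
    'import smtplib\n'
    'PASSWORD = "abcd efgh ijkl mnop"\n'
    'server = smtplib.SMTP("smtp.gmail.com", 587)\n'
)

print(find_suspect_lines(staged))  # [2]
```

A check like this would throw false positives on any quoted four-by-four lowercase phrase, so it's better suited to a warning than a hard block.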