Only up for five hours, but that’s plenty of time for the wrong person to spot it
An AWS engineer reportedly published “personal identity documents and system credentials including passwords, AWS key pairs, and private keys” to a public GitHub repository by accident.
On 13 January, security company UpGuard’s data leak service discovered a 954MB repository containing AWS resource templates (used to create cloud services) and log files generated in the second half of 2019.
The logs included hostnames, which could be used to identify the customers involved, but the bigger concern was over customer credentials.
“Several documents contained access keys for various cloud services,” UpGuard reported. “There were multiple AWS key pairs including one named ‘rootkey.csv,’ suggesting it provided root access to the user’s AWS account. Other files contained collections of auth tokens and API keys for third party providers. One such file for an insurance company included keys for messaging and email providers.”
A couple of hours after the discovery, UpGuard notified AWS security and the repository was taken offline. It had been public for less than five hours. However, as UpGuard noted, citing a paper from North Carolina State University, there are ways to discover mishaps like this quickly via GitHub’s search features.
“One is able to discover 99 per cent of newly committed files containing secrets in real time,” it said. These researchers believe that “thousands of new, unique secrets are leaked every day”. What this means is that even five hours of exposure is plenty of time for confidential information to be picked up by criminals.
Why do so many secrets end up in GitHub repositories? A common reason is that developers trying out some new ideas hardcode credentials into applications and then publish the code without thinking through the implications.
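One common way to avoid the hardcoding habit is to read credentials from the environment at run time, so that nothing secret sits in the source tree. The sketch below is illustrative only: the function name load_credentials is hypothetical, and it assumes the two environment variable names that AWS's own command-line tools and SDKs conventionally read.

```python
import os


def load_credentials(env=os.environ):
    """Return (access_key_id, secret_access_key) read from the environment.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to exercise without touching real credentials.
    """
    try:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        # Fail loudly rather than falling back to a value baked into the code.
        raise RuntimeError(
            f"missing environment variable: {missing.args[0]}"
        ) from None
```

Code written this way can be published without carrying the keys along with it, although the environment itself still has to be kept out of logs and support bundles.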
The problem is so common that GitHub has a Token Scanning service that “scans public repositories for known token formats to prevent fraudulent use of credentials that were committed accidentally”.
GitHub also recommends “considering any tokens that GitHub sends you messages about as public and compromised”.
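To give a rough idea of what scanning for "known token formats" can look like, here is a minimal sketch. It is not GitHub's implementation, which is not described here. It assumes the widely used heuristic that an AWS access key ID is the prefix AKIA followed by 16 uppercase letters or digits; the function name find_access_key_ids is hypothetical, and the sample key is the placeholder value AWS uses in its documentation.

```python
import re

# Heuristic pattern (an assumption, not an official specification):
# "AKIA" followed by 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_key_ids(text):
    """Return (line_number, key_id) pairs for every candidate key in `text`."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


sample = 'region = "eu-west-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_access_key_ids(sample))
```

A check like this could run locally before a commit is pushed, but a pattern match only flags candidates; any key that has already reached a public repository should still be treated as compromised and rotated.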
In this case, however, the repository was “structured as general storage rather than application code, with many files in the top-level directory and no clear convention for the subdirectories”. Why was this in a GitHub repository at all? This is not known; it could be anything from an errant script to a misguided attempt to use GitHub like Dropbox, for exchanging large files.
How does UpGuard know that it was an AWS engineer? UpGuard said: “A LinkedIn profile matching the exact full name identified one person who listed AWS as their employer in a role that matched the kinds of data found in the repository.”
UpGuard added: “There is no evidence that the user acted maliciously or that any personal data for end users was affected, in part because it was detected by UpGuard and remediated by AWS so quickly.” It is an oddly complacent conclusion bearing in mind the statements that precede it, but AWS will be hoping it is correct; in the time between the breach and the publication of the news, it would be reasonable to assume that customers have been informed and credentials changed and invalidated.
Does GitHub make it too easy to search its repositories for passwords and access tokens? Should GitHub scan for tokens before rather than after they are in public repositories? Should such data be redacted from internal logs and support data just in case – as Microsoft appears to have done?
We have asked AWS for comment and will report back with any statements. ®