[Rd] typosquatting and trojan horses in packages

Ben Bolker bbolker at gmail.com
Fri Jun 10 23:20:10 CEST 2016

A friend passed along this interesting link:
http://incolumitas.com/2016/06/08/typosquatting-package-managers/ about
the strategy of using "typosquatting" (publishing packages with names very
similar to those of existing packages) to trick users into downloading and
installing packages with malicious code.  They made fake trojans (with empty
payloads) for Ruby, Python and NodeJS and experimented to see how widely
they would be distributed (we can discuss the ethics of this experiment
later ...).

  For those who don't want to read the whole thing, the author points
out that this attack vector is enabled by

1. The possibility of registering any package name and uploading code
without supervision.

   [CRAN is obviously supervised, but I'm not sure if there might be
ways to evade CRAN scrutiny and achieve this goal -- a lot of
obfuscation plus a test that avoided the malicious behavior if "running
on CRAN" was detected?  I know there has been discussion in the past
about how to have a package not do things like run long tests when on
CRAN ... I think CRAN maintainers would probably notice an attempt at
typosquatting on a common package (e.g. "ggplot" for "ggplot2", "nmle"
for "nlme"), but again I'm not sure ...]

2. The feasibility to achieve code execution upon package installation
on the host system.

  [I think this one is true, due to the possibility of including a
generic Makefile?]

I've pasted the author's comments about defense below. The whole page is
definitely worth reading.


  Ben Bolker

Defenses against typo squatting

In short, read the thesis. If you are too lazy, do the following:

Prevent Direct Code Execution on Installations

This one is easy. Make sure that the software that unpacks and installs
a third party package (pip or npm) does not allow the execution of code
that originates from the package itself. Only when the user explicitly
loads the package, the library code should be executed.
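
As a rough illustration of what "unpack without executing" could look
like, here is a minimal Python sketch. It is not how pip or npm work; the
function name unpack_only and the .tar.gz layout are assumptions made for
this example. The installer only copies regular files out of the archive
and never imports or runs anything it finds there.

```python
import io
import tarfile
from pathlib import Path


def unpack_only(archive_bytes: bytes, dest: Path) -> list[str]:
    """Copy regular files out of a .tar.gz archive without running any of them.

    Hypothetical helper for illustration only (requires Python 3.9+).
    """
    dest = dest.resolve()
    written = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, devices, ...
            target = (dest / member.name).resolve()
            # Refuse entries such as "../../etc/passwd" that escape dest.
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            # Plain byte copy: no setup script or hook from the package is run,
            # and the archive's permission bits are not applied.
            target.write_bytes(tar.extractfile(member).read())
            written.append(member.name)
    return written
```

In this sketch, package code would run only later, when the user
explicitly loads the installed library.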

Generate a List of Potential Typo Candidates

Generate Levenshtein distance candidates for the most downloaded N
packages of the repository and alarm administrators on registration of
such a candidate.
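
A minimal Python sketch of such a check, run when a new name is
registered, might look like the following. The function names and the
default threshold of 2 are assumptions for illustration, not anything the
article specifies. Note that plain Levenshtein distance counts a swap of
two adjacent letters ("nmle" for "nlme") as two edits, so a threshold of
1 would miss that case.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def typo_suspects(new_name, popular_names, max_distance=2):
    """Return the popular packages that new_name is suspiciously close to.

    Hypothetical helper: an exact (case-insensitive) match is not reported,
    since that is the popular package itself.
    """
    new = new_name.lower()
    return [p for p in popular_names
            if p.lower() != new and levenshtein(new, p.lower()) <= max_distance]
```

For example, typo_suspects("ggplot", ["ggplot2", "nlme", "Matrix"])
returns ["ggplot2"]. With very short package names a threshold of 2 will
also flag unrelated names, so a real repository would likely want a
human to review the alerts rather than reject registrations outright.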

Analyze 404 logfiles and prevent registration of often shadow installed
packages

Whenever a user makes a typo by installing a package and the package is
not registered yet, a 404 logfile entry on the repository server is
created (because the install HTTP request targets a non-existent
resource). Parse these failed installations and prevent all such names
that are shadow-installed more than a reasonable threshold per month.
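
The log analysis could be sketched in Python roughly as follows. The log
line format and the /simple/<name>/ URL layout are assumptions made for
this example; a real repository would need a pattern matching its own
server logs. The caller is expected to pass in one month's worth of lines.

```python
import re
from collections import Counter

# Assumed access-log fragment: "GET /simple/<name>/ HTTP/1.1" <status>
LINE_RE = re.compile(
    r'"GET /simple/(?P<name>[^/\s]+)/? HTTP/[\d.]+" (?P<status>\d{3})'
)


def names_to_block(log_lines, threshold):
    """Return names requested but not found more than `threshold` times."""
    misses = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and match.group("status") == "404":
            misses[match.group("name").lower()] += 1
    return {name for name, count in misses.items() if count > threshold}
```

The resulting set would then be added to a list of names that cannot be
registered without manual approval.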
