Learning Bits

Logging the progress.

Publishing Octopress over FTPES

My web host does not support SSH; the only secure transfer method it provides is FTPES. I therefore had to put together my own Octopress deployment method. Since rake generate also copies unchanged files to public/ (and thus updates their modification times), I combine the following three steps to upload only changed files and delete stale files on the server:

  1. Use checksum-based rsync locally to identify changed files

  2. Use lftp to synchronize the content with the web server

  3. Use openssl for secure transfer

None of this is difficult. Hopefully, reading this post saves you some time if you have to solve the same problem.
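
The idea behind step 1, detecting changes by content rather than by modification time, can be sketched in a few lines. This is a Python stand-in for illustration, not the rsync invocation from the post; the function names and the notion of a local `mirror` directory reflecting the server state are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file path (relative to root) to its content digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_trees(generated, mirror):
    """Compare a freshly generated tree against a local mirror.

    Returns (changed, stale): files that are new or differ in content,
    and files that exist only in the mirror. Modification times are
    ignored, so regenerated but identical files are not reported.
    """
    new = tree_digests(generated)
    old = tree_digests(mirror)
    changed = sorted(p for p, d in new.items() if old.get(p) != d)
    stale = sorted(p for p in old if p not in new)
    return changed, stale
```

Calling `diff_trees("public", "mirror")` (both directory names are hypothetical) would yield the list of files to upload and the list of files to delete on the server.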

Canonical Correlation Analysis under Constraints

“nscancor” is an R package for canonical correlation analysis (CCA) under constraints. As the name implies, the nscancor function has the same interface as cancor from the “stats” package, but supports enforcing constraints on the canonical vectors, such as non-negativity and sparsity.

The implemented algorithm is based on iterated regression (Sigg et al., 2007), and generalized deflation (Mackey, 2009) adapted from PCA to CCA. By using readily available constrained regression algorithms, it becomes straightforward to enforce the appropriate constraints for each data domain. And by using generalized deflation, each subsequent tuple of canonical variables maximizes the additional correlation not explained by previous ones.

I hope to do a proper writeup at a later date, but for now, here is an explanation of how to use the package and a demonstration of its benefits.

Preventing rMBP Thermal Meltdown

My Retina MacBook Pro (early 2013 model) has been too quiet lately.

I suspect that either the recent EFI or the SMC update modified the fan control curves, with the result that the fans stay at 2k RPM independent of thermal load. Running multi-threaded code, such as par2tbb which takes all the cores that it gets, quickly overheats the processor to the point of emergency shutdown.

At first I thought that there might be a hardware problem with the fans, but increasing the fan speed successfully using smcFanControl proved otherwise. An SMC reset had no effect [1], and the firmware installers refuse to re-run.

The solution comes in the form of the Fan Control preference pane and daemon, which lets me specify a linear curve between measured temperature and desired fan speed. Unfortunately, the SMC address polled for reading the temperature no longer exists on the Retina MBP and thus the reported temperature is stuck at 0 degrees. Fortunately, Fan Control is free software and MacRumors forum members compiled binaries with modified sensor addresses.
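
To make the notion of a linear curve concrete, here is a minimal sketch of such a mapping. It is not Fan Control's actual code; the function name, the temperature thresholds and the maximum speed are assumptions chosen for illustration (only the 2k RPM idle speed is taken from my observations above).

```python
def fan_speed(temp_c, t_low=40.0, t_high=80.0, rpm_min=2000.0, rpm_max=6000.0):
    """Map a temperature in degrees Celsius to a fan speed in RPM.

    The speed rises linearly from rpm_min at t_low to rpm_max at t_high
    and is clamped outside that range. All default values are hypothetical.
    """
    if t_high <= t_low:
        raise ValueError("t_high must be greater than t_low")
    frac = (temp_c - t_low) / (t_high - t_low)
    frac = min(1.0, max(0.0, frac))  # clamp to [0, 1]
    return rpm_min + frac * (rpm_max - rpm_min)
```

With such a curve, a sensor reading that is stuck at 0 degrees always maps to the minimum speed, which is why a working sensor address matters.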

I settled on the version which reads the TC0F address. I don’t know exactly which sensor this address corresponds to, but comparing with iStat Menus it is close to the “CPU Die – digital” sensor, although the change in reported temperature is substantially slower.

If I find the time I will compile Fan Control myself for further fine tuning, but I am glad that I can run heavy workloads again. My thanks go to Lobotomo Software and MacRumors forum members xqdong and maratus. No thanks to Apple for the botched update.

  1. At first I was unsure if the SMC reset even took place. Apparently the only feedback for a successful reset is the charging indicator switching from orange to green and back. So make sure that the battery is charging before pressing the magic key combination.

My Strategy for Personal Backups

A sane backup strategy is guided by the data recovery needs and the appropriate threat model. Executing it must be effortless, or else it won’t be followed through.

As with every technical system, the backup system must be validated by testing: a backup is worthless unless the lost data can be restored from it. Unfortunately, this is easier said than done. To minimize the residual risk, good rules of thumb are to be conservative, to follow a simple strategy that is well understood, and to use software that is in wide use and has a proven track record.

This post is an attempt to reflect my strategy for personal backups — to find bugs and opportunities for improvement. It is tailored to backing up an Apple laptop with an SSD running OS X 10.9, which sees daily use at home, at work and on the commute ride in between.

Non-Negative Sparse PCA Comparison

Version 0.4 of the nsprcomp package brings several improvements:

  • The various deflation methods have been replaced with generalized deflation (Mackey, 2009), which directly optimizes the additional variance explained by each component. Implementing generalized deflation required changes to the inner EM loop, and I was unsure at first whether they could be made efficient for high dimensional data. Fortunately, there is only a small constant increase in computational complexity.

  • nscumcomp includes a variational re-normalization step (i.e. recomputing the loadings given the support of the pseudo-rotation matrix), which improves the explained variance quite a bit.

  • Both nsprcomp and nscumcomp return the additional explained standard deviation of each component. This is identical to standard PCA for an orthogonal rotation matrix, but avoids double counting of explained variance for principal axes which are not pairwise orthogonal. See the asdev function documentation for details.

A comparison on the marty data from the EMA package illustrates the relative performance of sparse PCA methods with R implementations. This data matrix contains 23 expression profiles for 54613 genes, and thus explores the \( N \ll D \) case. The three methods considered are nsprcomp and nscumcomp, and arrayspc from the elasticnet package (version 1.1). PCAgrid from the pcaPP package has problems with long vectors in version 1.9-49 under R 3.0.1 and therefore could not be included in this comparison.

nsprcomp is on CRAN

When we published our ICML paper on sparse and non-negative PCA back in 2008, I thought it might be worthwhile to make Matlab code for the emPCA algorithm available as well. Since then, I’ve received several requests and questions about its usage. While the core functionality was there, the implementation lacked a friendly interface and some additional functionality such as easy random restarts.

After two recent inquiries about using constrained PCA for portfolio optimization and combustion modeling, I decided to fill the gaps in the implementation. Because I primarily use R now, this provided an opportunity to learn about package writing and documentation for a public audience, with the goal of submitting the result to CRAN.

Evoluent Upright Mouse 4

I sometimes get funny feelings in my wrists. Shopping for hardware recently, I found the Evoluent Upright Mouse 4. Its fundamental idea is intriguing: holding the mouse with the wrist in a vertical position provides a welcome change from resting the hand horizontally on the keyboard. Evoluent argues that the position is less tiresome because it avoids twisting the radius and ulna bones. I don’t know about the significance of this claim, but at least it sounds plausible.

I went ahead and bought the mouse. Unpacking and plugging it in revealed two issues:

  1. It’s not exactly pretty.
  2. There is a glaringly bright blue LED illuminating the company logo on the back of the mouse!