Learning
Bits

2014-02-21

Publishing Octopress over FTPES

My webhost does not support SSH; the only secure transfer method it provides is FTPES. I therefore had to put together my own Octopress deployment method. Because rake generate also copies unchanged files to public/ (and thus updates their modification times), I combine the following three steps to upload only changed files and delete stale files on the server:

  1. Use checksum-based rsync locally to identify changed files
  2. Use lftp to synchronize the content with the web server
  3. Use openssl for secure transfer
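
The idea behind step 1 can be sketched in a few lines of Python. This is only an illustration, not the method from the post, which uses rsync for this step: the function name `sync_by_checksum` and the notion of a local "mirror" directory are assumptions made for the example. Files are compared by content hash rather than by modification time, so files that rake generate merely re-copied keep their old timestamps in the mirror, and a later timestamp-based upload would skip them.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_by_checksum(src: Path, mirror: Path):
    """Make `mirror` match `src`, touching only files whose content differs.

    Returns (copied, deleted) as sorted lists of relative POSIX paths.
    Hypothetical helper for illustration; rsync does this job in practice.
    """
    copied, deleted = [], []
    src_files = {p.relative_to(src) for p in src.rglob("*") if p.is_file()}

    for rel in sorted(src_files):
        target = mirror / rel
        if target.is_file() and file_digest(target) == file_digest(src / rel):
            continue  # same content: leave the mirror copy and its mtime alone
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src / rel, target)  # new or changed: gets a fresh mtime
        copied.append(rel.as_posix())

    # Anything left in the mirror that no longer exists in src is stale.
    for p in sorted(mirror.rglob("*")):
        if p.is_file() and p.relative_to(mirror) not in src_files:
            p.unlink()
            deleted.append(p.relative_to(mirror).as_posix())

    return copied, deleted
```

Calling `sync_by_checksum(Path("public"), Path("mirror"))` after each site generation would leave a mirror whose timestamps only change when content changes; the directory names here are placeholders.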

None of this is difficult, but hopefully reading this post saves you some time if you have to solve the same problem.

Read on …


2014-01-20

Canonical Correlation Analysis under Constraints

"nscancor" is an R package for canonical correlation analysis (CCA) under constraints. As the name implies, the nscancor function has the same interface as cancor from the "stats" package, but supports enforcing constraints on the canonical vectors, such as non-negativity and sparsity.

The implemented algorithm is based on iterated regression (Sigg et al., 2007) and generalized deflation (Mackey, 2009), adapted from PCA to CCA. By using readily available constrained regression algorithms, it becomes straightforward to enforce the appropriate constraints for each data domain. And by using generalized deflation, each subsequent tuple of canonical variables maximizes the additional correlation not explained by previous ones.

I hope to do a proper writeup at a later date, but for now, here is an explanation of how to use the package and a demonstration of its benefits.

Read on …


2013-11-29

Preventing rMBP Thermal Meltdown

My Retina MacBook Pro (early 2013 model) has been too quiet lately.

I suspect that either the recent EFI or SMC updates modified the fan control curves, with the result that the fans stay at 2000 RPM regardless of thermal load. Running multi-threaded code, such as par2tbb, which uses every core it can get, quickly overheats the processor to the point of emergency shutdown.

At first I thought that there might be a hardware problem with the fans, but successfully increasing the fan speed using smcFanControl proved otherwise. An SMC reset had no effect, and the firmware installers refuse to re-run.

The solution comes in the form of the Fan Control preference pane and daemon, which lets me specify a linear curve between measured temperature and desired fan speed. Unfortunately, the SMC address polled for reading the temperature no longer exists on the Retina MBP, and thus the reported temperature is stuck at 0 degrees. Fortunately, Fan Control is free software and MacRumors forum members compiled binaries with modified sensor addresses.
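
A linear curve of this kind can be sketched as follows. This is not Fan Control's actual code, and the function name, temperature thresholds and maximum speed are hypothetical values chosen for illustration; only the 2000 RPM floor comes from the behaviour described above.

```python
def fan_speed_rpm(temp_c, low_temp_c=40.0, high_temp_c=80.0,
                  min_rpm=2000.0, max_rpm=6000.0):
    """Map a temperature to a fan speed along a clamped linear curve.

    All default values are assumptions for this example.
    """
    if temp_c <= low_temp_c:
        return min_rpm  # at or below the lower threshold: slowest speed
    if temp_c >= high_temp_c:
        return max_rpm  # at or above the upper threshold: fastest speed
    # In between, interpolate linearly between the two end points.
    fraction = (temp_c - low_temp_c) / (high_temp_c - low_temp_c)
    return min_rpm + fraction * (max_rpm - min_rpm)
```

With such a curve, a sensor reading stuck at 0 degrees always maps to the minimum speed, which is why the temperature has to be read from an address that actually exists.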

I settled on the version which reads the TC0F address. I don't know exactly which sensor this address corresponds to, but comparing with iStat Menus it is close to the "CPU Die - digital" sensor, although the change in reported temperature is substantially slower.

If I find the time I will compile Fan Control myself for further fine tuning, but I am glad that I can run heavy workloads again. My thanks go to Lobotomo Software and MacRumors forum members xqdong and maratus. No thanks to Apple for the botched update.


2013-11-22

My Strategy for Personal Backups

A sane backup strategy is guided by the data recovery needs and the appropriate threat model. Executing it must be effortless, or else it won't be followed through.

As with every technical system, the backup system must be validated by testing - a backup is worthless unless the lost data can be restored from it. Unfortunately, this is easier said than done. To minimize the residual risk, good rules of thumb are to be conservative, to follow a simple strategy that is well understood, and to use software that is in wide use and has a proven track record.

This post is an attempt to reflect on my strategy for personal backups - to find bugs and opportunities for improvement. It is tailored to backing up an Apple laptop with an SSD running OS X 10.9, which sees daily use at home, at work and on the commute in between.

Read on …


2013-09-15

Non-Negative Sparse PCA Comparison

Version 0.4 of the nsprcomp package brings several improvements:

  • The various deflation methods have been replaced with generalized deflation (Mackey, 2009), which directly optimizes the additional variance explained by each component. Implementing generalized deflation required changes to the inner EM loop, and I was unsure at first whether they could be made efficient for high dimensional data. Fortunately, there is only a small constant increase in computational complexity.
  • nscumcomp includes a variational re-normalization step (i.e. recomputing the loadings given the support of the pseudo-rotation matrix), which improves the explained variance quite a bit.
  • Both nsprcomp and nscumcomp return the additional explained standard deviation of each component. This is identical to standard PCA for an orthogonal rotation matrix, but avoids double counting of explained variance for principal axes which are not pairwise orthogonal. See the asdev function documentation for details.
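
One way to write down the quantity in the last item, as a sketch of my reading of the description above rather than the package's exact definition: let $X$ be the centered $N \times D$ data matrix and $w_1, \dots, w_K$ the principal axes (these symbols are introduced here for illustration). Each axis is first projected onto the orthogonal complement of the previous axes and normalized, and the standard deviation is measured along the result:

$$
q_k = \frac{(I - P_{k-1})\, w_k}{\lVert (I - P_{k-1})\, w_k \rVert}, \qquad
\mathrm{asdev}_k = \sqrt{\tfrac{1}{N-1}\, \lVert X q_k \rVert^2},
$$

where $P_{k-1}$ is the orthogonal projector onto $\mathrm{span}\{w_1, \dots, w_{k-1}\}$ and $P_0 = 0$. If the unit-length axes are pairwise orthogonal, then $q_k = w_k$ and the expression reduces to the usual standard deviation of the $k$-th score vector. The $1/(N-1)$ normalization is an assumption; see the asdev documentation for the convention actually used.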

A comparison on the marty data from the EMA package illustrates the relative performance of sparse PCA methods with R implementations. This data matrix contains $N = 23$ expression profiles for $D = 54613$ genes, and thus explores the $N \ll D$ case. The three methods considered are nsprcomp, nscumcomp, and arrayspc (the latter from the elasticnet package, version 1.1). PCAgrid from the pcaPP package has problems with long vectors in version 1.9-49 under R 3.0.1 and therefore could not be included in this comparison.

Read on …