Publishing Octopress over FTPES
My webhost does not support ssh; the only secure transfer method it provides is FTPES. I therefore had to put together my own Octopress deployment method. Since rake generate also copies unchanged files to public/ (and thus modifies their mtime stamps), I combine the following three steps to upload only changed files and delete stale files on the server:
- Use checksum-based rsync locally to identify changed files
- Use lftp to synchronize the content with the web server
- Use openssl for secure transfer
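Under hypothetical paths and credentials, the three steps above can be sketched as a small deployment script. The host name, login variables and the remote directory are placeholders, and the demo tree stands in for Octopress's generated public/ output:

```shell
#!/bin/sh
# Sketch only: FTP host, credentials and remote path are placeholders.
set -e

SRC="${TMPDIR:-/tmp}/octopress_public"      # what `rake generate` writes
STAGE="${TMPDIR:-/tmp}/octopress_staging"   # mtime-stable staging mirror
mkdir -p "$SRC" "$STAGE"
printf '<html>hello</html>\n' > "$SRC/index.html"

# Exit early (successfully) if rsync is unavailable in this environment.
command -v rsync >/dev/null 2>&1 || exit 0

# Step 1: checksum-based local sync. With -c, rsync compares file contents,
# so a file rewritten with identical content is skipped and keeps its old
# mtime in the staging mirror; --delete drops files removed from the source.
rsync -rc --delete "$SRC/" "$STAGE/"

# Steps 2+3: mirror the staging tree to the server over FTPES; lftp performs
# the TLS handshake through OpenSSL. Only runs if FTP_HOST is configured.
if [ -n "${FTP_HOST:-}" ]; then
  lftp -u "$FTP_USER","$FTP_PASS" "$FTP_HOST" <<EOF
set ftp:ssl-force true
set ftp:ssl-protect-data true
mirror -R --only-newer --delete "$STAGE" /htdocs
quit
EOF
fi
```

Because unchanged files keep their old mtime in the staging mirror, lftp's --only-newer check uploads only the files whose content actually changed.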
None of this is difficult. But hopefully, reading this post saves you some time if you have to solve the same problem.
Canonical Correlation Analysis under Constraints
"nscancor" is an R package for
canonical correlation analysis (CCA) under constraints. As the name implies, the nscancor
function has the same interface as cancor
from the "stats" package, but supports
enforcing constraints on the canonical vectors, such as non-negativity and sparsity.
The implemented algorithm is based on iterated regression (Sigg et al., 2007), and generalized deflation (Mackey, 2009) adapted from PCA to CCA. By using readily available constrained regression algorithms, it becomes straightforward to enforce the appropriate constraints for each data domain. And by using generalized deflation, each subsequent tuple of canonical variables maximizes the additional correlation not explained by previous ones.
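As a sketch of what a constrained analysis might look like in R: the package delegates the constrained regression to user-supplied predictor functions (xpredict/ypredict in the package documentation). The glmnet-based predictors and the random data below are purely illustrative, so consult the package documentation for the exact interface:

```r
library(nscancor)
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- matrix(rnorm(100 * 3), 100, 3)

# Constrained regression hooks: predict a canonical variable from the other
# data domain, with an elastic-net penalty for sparsity and non-negative
# coefficients (lower.limits = 0) for non-negativity.
xpredict <- function(y, xc, cc) {
  en <- glmnet(y, xc, alpha = 0.5, intercept = FALSE, lower.limits = 0)
  W <- coef(en)
  W[2:nrow(W), ncol(W)]  # drop intercept, take least regularized solution
}
ypredict <- function(x, yc, cc) {
  en <- glmnet(x, yc, alpha = 0.5, intercept = FALSE, lower.limits = 0)
  W <- coef(en)
  W[2:nrow(W), ncol(W)]
}

cc <- nscancor(x, y, xpredict = xpredict, ypredict = ypredict)
cc$cor  # canonical correlations
```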
I hope to do a proper writeup at a later date, but for now, here is an explanation of how to use the package and a demonstration of its benefits.
Preventing rMBP Thermal Meltdown
My Retina MacBook Pro (early 2013 model) has been too quiet lately.
I suspect that either the recent EFI or SMC updates modified the fan control curves, with the result that the fans stay at 2000 RPM independent of thermal load. Running multi-threaded code such as par2tbb, which takes all the cores it gets, quickly overheats the processor to the point of emergency shutdown.
At first I thought that there might be a hardware problem with the fans, but successfully increasing the fan speed using smcFanControl proved otherwise. An SMC reset had no effect, and the firmware installers refuse to re-run.
The solution comes in the form of the Fan Control preference pane and daemon, which lets me specify a linear curve between measured temperature and desired fan speed. Unfortunately, the SMC address polled for reading the temperature no longer exists on the Retina MBP, and thus the reported temperature is stuck at 0 degrees. Fortunately, Fan Control is free software, and MacRumors forum members have compiled binaries with modified sensor addresses.
I settled on the version which reads the TC0F address. I don't know exactly which sensor this address corresponds to, but comparing with iStat Menus, it is close to the "CPU Die - digital" sensor, although the change in reported temperature is substantially slower.
If I find the time, I will compile Fan Control myself for further fine-tuning, but I am glad that I can run heavy workloads again. My thanks go to Lobotomo Software and MacRumors forum members xqdong and maratus. No thanks to Apple for the botched update.
My Strategy for Personal Backups
A sane backup strategy is guided by the data recovery needs and the appropriate threat model. Executing it must be effortless, or else it won't be followed through.
As with every technical system, the backup system must be validated by testing: a backup is worthless unless the lost data can be restored from it. Unfortunately, this is easier said than done. To minimize the residual risk, good rules of thumb are to be conservative, to follow a simple strategy that is well understood, and to use software that is in wide use and has a proven track record.
This post is an attempt to reflect on my strategy for personal backups, in order to find bugs and opportunities for improvement. It is tailored to backing up an Apple laptop with an SSD running OS X 10.9, which sees daily use at home, at work and on the commute in between.
Non-Negative Sparse PCA Comparison
Version 0.4 of the nsprcomp package brings several improvements:
- The various deflation methods have been replaced with generalized deflation (Mackey, 2009), which directly optimizes the additional variance explained by each component. Implementing generalized deflation required changes to the inner EM loop, and I was unsure at first whether they could be made efficient for high dimensional data. Fortunately, there is only a small constant increase in computational complexity.
- nscumcomp includes a variational re-normalization step (i.e. recomputing the loadings given the support of the pseudo-rotation matrix), which improves the explained variance quite a bit.
- Both nsprcomp and nscumcomp return the additional explained standard deviation of each component. This is identical to standard PCA for an orthogonal rotation matrix, but avoids double counting of explained variance for principal axes which are not pairwise orthogonal. See the asdev function documentation for details.
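For illustration, a minimal call on synthetic data (a sketch only, not output from the comparison below; check the argument names against the package documentation):

```r
library(nsprcomp)

set.seed(1)
X <- matrix(rnorm(50 * 10), 50, 10)

# At most 4 non-zero loadings per principal axis (k), all non-negative (nneg).
nspc <- nsprcomp(X, ncomp = 3, k = 4, nneg = TRUE)
nspc$sdev      # additional explained standard deviation per component
nspc$rotation  # sparse, non-negative pseudo-rotation matrix
```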
A comparison on the marty data from the EMA package illustrates the relative performance of sparse PCA methods with R implementations. This data matrix contains gene expression profiles, and thus explores the case where the number of variables greatly exceeds the number of observations. The three methods considered are nsprcomp, nscumcomp, and arrayspc from the elasticnet package (version 1.1). PCAgrid from the pcaPP package has problems with long vectors in version 1.9-49 under R 3.0.1 and therefore could not be included in this comparison.