nsprcomp is on CRAN

2013-05-27

When we published our 2008 ICML paper on sparse and non-negative PCA, I thought it might be worthwhile to provide a Matlab implementation of our algorithm as well. Since then, I’ve received several requests and questions about the code. While the core functionality is there, the implementation lacks a friendly interface and some conveniences, such as easy random restarts.

After two recent inquiries about using constrained PCA for portfolio optimization and combustion modeling, I decided to fill the gaps in the implementation. Because R has become my primary programming language, this provided an opportunity to learn about package writing and documentation for a public audience, with the goal of submitting the result to CRAN.

The nsprcomp package provides two algorithms. nsprcomp implements our emPCA algorithm as discussed in the paper, but with several deflation options for computing multiple components (motivated by Mackey, 2009). nscumcomp is a novel algorithm based on the same form of expectation-maximization, but it jointly computes all components such that the cumulative variance is maximized. One benefit of nscumcomp over nsprcomp is that the number of features can be specified as a total, instead of having to specify the cardinality of each principal axis individually. Setting the total is more natural in an exploratory data analysis, where the number of features that make up a component is not known in advance. A drawback of the joint optimization is that the M-step no longer has a closed-form solution (that I know of), so it has to be solved numerically with L-BFGS iterations, which increases the computational load.
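To make the idea behind emPCA with deflation more concrete, here is a minimal base-R sketch of an EM-style iteration for a single sparse, non-negative principal axis, followed by projection deflation (one of the schemes compared by Mackey, 2009) to obtain further components. It is a simplification, not the package's code: the constrained M-step is approximated by clamping negative loadings to zero and keeping the k largest entries, and the helpers sparse_em_axis and nn_sparse_pca are hypothetical. The data matrix is assumed to hold observations (days) in rows and features (stocks) in columns.

```r
# Minimal sketch, NOT the nsprcomp implementation: an EM-style iteration for
# one sparse (and optionally non-negative) principal axis. X is n x d with
# observations in rows and features in columns, and is assumed centered.
# The constrained M-step is approximated by clamping and hard truncation.
sparse_em_axis <- function(X, k, nneg = TRUE, maxiter = 200, tol = 1e-10) {
  w <- rnorm(ncol(X))
  if (nneg) w <- abs(w)                        # start in the feasible region
  w <- w / sqrt(sum(w^2))
  for (iter in seq_len(maxiter)) {
    y <- drop(X %*% w)                         # E-step: latent coordinates (w has unit norm)
    w_new <- drop(crossprod(X, y)) / sum(y^2)  # unconstrained M-step
    if (nneg) w_new <- pmax(w_new, 0)          # non-negativity: zero out negative loadings
    drop_idx <- order(abs(w_new), decreasing = TRUE)[-seq_len(k)]
    w_new[drop_idx] <- 0                       # sparsity: keep only the k largest loadings
    w_new <- w_new / sqrt(sum(w_new^2))        # rescale to unit norm
    converged <- sum((w_new - w)^2) < tol
    w <- w_new
    if (converged) break
  }
  w
}

# Several components via projection deflation: remove the direction w from
# the data, X <- X (I - w w^T), before computing the next axis.
nn_sparse_pca <- function(X, ncomp, k, nneg = TRUE) {
  X <- scale(X, center = TRUE, scale = FALSE)
  W <- matrix(0, ncol(X), ncomp)
  for (j in seq_len(ncomp)) {
    w <- sparse_em_axis(X, k, nneg)
    W[, j] <- w
    X <- X - (X %*% w) %*% t(w)
  }
  W
}

# Toy usage on random data: 3 non-negative axes with at most 5 features each.
set.seed(1)
X <- matrix(rnorm(100 * 20), 100, 20)
W <- nn_sparse_pca(X, ncomp = 3, k = 5)
colSums(W != 0)
```

Because the truncation makes the iteration depend on its starting point, a practical version would restart from several random initializations and keep the axis with the largest variance.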

A small example from the domain of portfolio optimization can demonstrate the usefulness of both algorithms, but keep in mind that my knowledge about the subject essentially consists of reading Markowitz (1952) and a handful of related papers.

Assume that the variance in asset returns can be explained (approximately) using a linear combination of a small number of hidden factors, which of course suggests a principal component analysis. The drawback of classical PCA is that each factor is a linear combination of all assets, with mixed signs. Enforcing non-negativity and sparsity of the loadings can result in a more meaningful analysis, as non-negative loadings correspond to long positions and sparsity limits the number of positions in the portfolio.
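As a small illustration of this reading, the sketch below turns a non-negative, sparse loading vector into a long-only portfolio: stocks with zero loading are not held, and every held stock is a long position. Rescaling the loadings so that the position sizes sum to one is an assumption made for the example, and the helper loadings_to_portfolio and the ticker symbols are made up.

```r
# Hypothetical helper: read a non-negative, sparse loading vector as a
# long-only portfolio. Zero loadings mean "no position"; the remaining
# loadings are rescaled so that position sizes sum to one (an assumption).
loadings_to_portfolio <- function(w, symbols) {
  stopifnot(length(w) == length(symbols), all(w >= 0), any(w > 0))
  held <- w > 0
  data.frame(symbol = symbols[held],
             fraction = w[held] / sum(w[held]),
             stringsAsFactors = FALSE)
}

# Made-up loadings over four made-up tickers: two positions, both long.
p <- loadings_to_portfolio(c(0.6, 0, 0.8, 0), c("AAA", "BBB", "CCC", "DDD"))
print(p)
```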

We analyze NYSE daily returns for the year 2005, from a data set which used to be available at infochimps.com. After some pre-processing, the data matrix X contains 260 daily returns for 2365 stocks.

nsprcomp

Calling nsprcomp(X, ncomp=4, k=10, nneg=TRUE) returns four non-negative components with the top ten stocks each:

[[1]]
   weight symbol                                name            sector                                              industry
0.4434287    AKS        AK Steel Holding Corporation  Basic Industries                                        Steel/Iron Ore
0.3080191    ATI Allegheny Technologies Incorporated  Basic Industries                                        Steel/Iron Ore
0.3304431    CLF       Cliffs Natural Resources Inc.  Basic Industries                                       Precious Metals
0.3228908    CMC           Commercial Metals Company  Basic Industries                                        Steel/Iron Ore
0.2646535    CRS    Carpenter Technology Corporation  Basic Industries                                        Steel/Iron Ore
0.3076497    GPK   Graphic Packaging Holding Company Consumer Durables                                  Containers/Packaging
0.2925628    GTI          GrafTech International Ltd            Energy                       Industrial Machinery/Components
0.2947678     MT                       ArcelorMittal  Basic Industries                                        Steel/Iron Ore
0.2643825    NUE                   Nucor Corporation  Basic Industries                                        Steel/Iron Ore
0.2966061    USU                           USEC Inc.  Basic Industries Mining & Quarrying of Nonmetallic Minerals (No Fuels)

[[2]]
   weight symbol                                  name                sector                                              industry
0.2697512    ACO       Amcol International Corporation      Basic Industries Mining & Quarrying of Nonmetallic Minerals (No Fuels)
0.3291779    AIT Applied Industrial Technologies, Inc.     Consumer Durables                                Industrial Specialties
0.2878995    BKI            Buckeye Technologies, Inc.      Basic Industries                                                 Paper
0.3173853    CCC             Calgon Carbon Corporation      Basic Industries                                       Major Chemicals
0.3397139    CIA                        Citizens, Inc.               Finance                                        Life Insurance
0.3100144    CNC                   Centene Corporation           Health Care                                  Medical Specialities
0.2972055    ENS                               Enersys Consumer Non-Durables                          Telecommunications Equipment
0.3109916     NL                   NL Industries, Inc.      Basic Industries                                       Major Chemicals
0.4116398    PRM                                    NA                    NA                                                    NA
0.2631501    RKT                     Rock-Tenn Company     Consumer Durables                                  Containers/Packaging

[[3]]
   weight symbol                      name           sector     industry
0.3182184    BZH    Beazer Homes USA, Inc.    Capital Goods Homebuilding
0.3324353    DHI         D.R. Horton, Inc.    Capital Goods Homebuilding
0.3177888    HOV Hovnanian Enterprises Inc    Capital Goods Homebuilding
0.2986497    KBH                   KB Home    Capital Goods Homebuilding
0.2702869    LEN        Lennar Corporation Basic Industries Homebuilding
0.3632209    MTH      Meritage Corporation    Capital Goods Homebuilding
0.3124587    PHM          PulteGroup, Inc.    Capital Goods Homebuilding
0.2837270    RYL  Ryland Group, Inc. (The)    Capital Goods Homebuilding
0.3117444    SPF     Standard Pacific Corp    Capital Goods Homebuilding
0.3431356    TOL        Toll Brothers Inc.    Capital Goods Homebuilding

[[4]]
   weight symbol                             name sector                    industry
0.3072652    ARD                               NA     NA                          NA
0.2883218    CHK    Chesapeake Energy Corporation Energy        Oil & Gas Production
0.2663459    EAC                               NA     NA                          NA
0.3019455    FTO                               NA     NA                          NA
0.3332282   GMXR                               NA     NA                          NA
0.4095810    NGS Natural Gas Services Group, Inc. Energy Oilfield Services/Equipment
0.3135460     PQ            Petroquest Energy Inc Energy        Oil & Gas Production
0.3120241    SWN      Southwestern Energy Company Energy        Oil & Gas Production
0.3038500    TSO               Tesoro Corporation Energy    Integrated oil Companies
0.3058673    UPL            Ultra Petroleum Corp. Energy        Oil & Gas Production

The first component mostly consists of mining and steel companies. The second component is all over the place, but the third and fourth components are again very homogeneous (the NAs in the second and fourth components are due to unresolved stock symbols). The weights all have a similar magnitude and none of them is close to zero, which suggests that the per-component cardinality k could be increased to include more stocks in each factor (the NYSE energy sector, for example, contains over two hundred stocks). However, it would take a number of trial runs to tune the cardinality of each component such that all relevant and few spurious stocks are included in the analysis.
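The observation that no weight is close to zero can be put into numbers. The sketch below copies the loadings of the first nsprcomp component from the table above and compares the smallest to the largest weight. For reference, a unit-norm vector with ten equal entries would have every entry equal to 1/sqrt(10) ≈ 0.316; the copied loadings do have unit norm, all lie between 0.26 and 0.45, and the smallest is about 0.6 times the largest. The helper weight_spread is hypothetical.

```r
# Loadings of the first nsprcomp component, copied from the table above.
w1 <- c(AKS = 0.4434287, ATI = 0.3080191, CLF = 0.3304431, CMC = 0.3228908,
        CRS = 0.2646535, GPK = 0.3076497, GTI = 0.2925628, MT  = 0.2947678,
        NUE = 0.2643825, USU = 0.2966061)

# Hypothetical diagnostic: number of non-zero weights, ratio of the smallest
# to the largest non-zero weight, the equal-weight reference value, and the
# Euclidean norm of the loading vector.
weight_spread <- function(w) {
  nz <- w[w != 0]
  c(cardinality  = length(nz),
    min_over_max = min(nz) / max(nz),
    equal_weight = 1 / sqrt(length(nz)),
    norm         = sqrt(sum(w^2)))
}

round(weight_spread(w1), 3)
```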

nscumcomp

Calling nscumcomp(X, ncomp=4, k=150, nneg=TRUE, gamma=1) instead lets the algorithm determine the cardinality of each component, subject to a total of 150 non-zero loadings. The result is not sensitive to the magnitude of the orthogonality penalty gamma in this example, and gamma=1 results in essentially orthogonal components:

[[1]]
      weight symbol                                    name            sector                                              industry
0.0532875561    ACI                         Arch Coal, Inc.            Energy                                           Coal Mining
0.0694850581    ALY                                      NA                NA                                                    NA
0.0284222815    ANR           Alpha Natural Resources, inc.            Energy                                           Coal Mining
0.0836213427    APA                      Apache Corporation            Energy                                  Oil & Gas Production
0.0481126384    APC          Anadarko Petroleum Corporation            Energy                                  Oil & Gas Production
0.1178876782    ARD                                      NA                NA                                                    NA
0.1393779752    ATW                   Atwood Oceanics, Inc.            Energy                                  Oil & Gas Production
0.0190883361    BJS                                      NA                NA                                                    NA
0.0454705431    BPT            BP Prudhoe Bay Royalty Trust            Energy                              Integrated oil Companies
0.1322468235    BRY                 Berry Petroleum Company            Energy                                  Oil & Gas Production
0.1083120714    BTU              Peabody Energy Corporation            Energy                                           Coal Mining
0.1932201309    CHK           Chesapeake Energy Corporation            Energy                                  Oil & Gas Production
0.1346712775    CNQ      Canadian Natural Resources Limited            Energy                                  Oil & Gas Production
0.0708790208    CNX                      CONSOL Energy Inc.            Energy                                           Coal Mining
0.1469680348    COG             Cabot Oil & Gas Corporation            Energy                                  Oil & Gas Production
0.0398795193    COP                          ConocoPhillips            Energy                              Integrated oil Companies
0.0473333886    CPE                Callon Petroleum Company            Energy                                  Oil & Gas Production
0.0977452639    CRK                Comstock Resources, Inc.            Energy                                  Oil & Gas Production
0.0154340470    CRR                    Carbo Ceramics, Inc.     Capital Goods                       Industrial Machinery/Components
0.1081947055    DNR                  Denbury Resources Inc.            Energy                                  Oil & Gas Production
0.1181309078     DO         Diamond Offshore Drilling, Inc.            Energy                                  Oil & Gas Production
0.1192592682    DRQ                         Dril-Quip, Inc.            Energy                                    Metal Fabrications
0.1043657129    DVN                Devon Energy Corporation            Energy                                  Oil & Gas Production
0.1597735987    EAC                                      NA                NA                                                    NA
0.1396755779    ECA                      Encana Corporation            Energy                                  Oil & Gas Production
0.1329824122    EOG                     EOG Resources, Inc.            Energy                                  Oil & Gas Production
0.0938604480    ESV                               ENSCO plc            Energy                                  Oil & Gas Production
0.0249781513    FRO                          Frontline Ltd.    Transportation                                 Marine Transportation
0.0757529529    FST                  Forest Oil Corporation            Energy                                  Oil & Gas Production
0.1836804006    FTO                                      NA                NA                                                    NA
0.1837984763    GDP          Goodrich Petroleum Corporation            Energy                                  Oil & Gas Production
0.0671665209    GLF                 GulfMark Offshore, Inc.            Energy                                    Metal Fabrications
0.1445269400   GMXR                                      NA                NA                                                    NA
0.0837276244    HAL                     Halliburton Company            Energy                           Oilfield Services/Equipment
0.0627795330    HES                        Hess Corporation            Energy                              Integrated oil Companies
0.0036551657    HGT                   Hugoton Royalty Trust            Energy                                  Oil & Gas Production
0.1085047020    HLX      Helix Energy Solutions Group, Inc.            Energy                           Oilfield Services/Equipment
0.1106579862    HOC                                      NA                NA                                                    NA
0.0991632214    HOS              Hornbeck Offshore Services Consumer Services                                 Marine Transportation
0.0607473586     HP                 Helmerich & Payne, Inc.            Energy                                  Oil & Gas Production
0.0116382620    INT         World Fuel Services Corporation            Energy                                Oil Refining/Marketing
0.0645058955    IOC                    InterOil Corporation            Energy                                  Oil & Gas Production
0.0132567145    IYE                                      NA                NA                                                    NA
0.1853276399    KWK              Quicksilver Resources Inc.            Energy                                  Oil & Gas Production
0.0013747206    MDR           McDermott International, Inc.     Capital Goods                                    Metal Fabrications
0.0264661046    MEE                                      NA                NA                                                    NA
0.0729227215    MRO                Marathon Oil Corporation            Energy                                  Oil & Gas Production
0.0023811506    MUR                  Murphy Oil Corporation            Energy                              Integrated oil Companies
0.0412584481    NBL                       Noble Energy Inc.            Energy                                  Oil & Gas Production
0.0313167250    NBR                  Nabors Industries Ltd.            Energy                                  Oil & Gas Production
0.0563314301     NE                       Noble Corporation            Energy                                  Oil & Gas Production
0.1034268654    NFX            Newfield Exploration Company            Energy                                  Oil & Gas Production
0.1926936035    NGS        Natural Gas Services Group, Inc.            Energy                           Oilfield Services/Equipment
0.1008675352    NOV            National Oilwell Varco, Inc.            Energy                                    Metal Fabrications
0.0991873145    NXY                                      NA                NA                                                    NA
0.0508974751    OII         Oceaneering International, Inc.            Energy                           Oilfield Services/Equipment
0.0993800241    OIS          Oil States International, Inc.            Energy                                    Metal Fabrications
0.0405026646    OXY        Occidental Petroleum Corporation            Energy                                  Oil & Gas Production
0.0495609266    PDE                                      NA                NA                                                    NA
0.1062543807    PKD                 Parker Drilling Company            Energy                                  Oil & Gas Production
0.1877221709     PQ                   Petroquest Energy Inc            Energy                                  Oil & Gas Production
0.0670664696    PVA               Penn Virginia Corporation            Energy                                  Oil & Gas Production
0.0527486302    PXD       Pioneer Natural Resources Company            Energy                                  Oil & Gas Production
0.1055507210    PXP Plains Exploration & Production Company            Energy                                  Oil & Gas Production
0.1053442278    RDC                     Rowan Companies plc            Energy                                  Oil & Gas Production
0.0690536207    RES                               RPC, Inc.            Energy                           Oilfield Services/Equipment
0.1143500050    RIG                         Transocean Ltd.            Energy                                  Oil & Gas Production
0.1532407298    RRC             Range Resources Corporation            Energy                                  Oil & Gas Production
0.1101877769    SFY                    Swift Energy Company            Energy                                  Oil & Gas Production
0.0715891522    SGY                Stone Energy Corporation            Energy                                  Oil & Gas Production
0.0223880237    SII                                      NA                NA                                                    NA
0.0534550114    SJT            San Juan Basin Royalty Trust            Energy                                  Oil & Gas Production
0.1063493615     SM                       SM Energy Company            Energy                                  Oil & Gas Production
0.1062401621    SPN          Superior Energy Services, Inc.            Energy                           Oilfield Services/Equipment
0.1152037963     SU                     Suncor Energy  Inc.            Energy                              Integrated oil Companies
0.1026210927    SUN                                      NA                NA                                                    NA
0.2121555712    SWN             Southwestern Energy Company            Energy                                  Oil & Gas Production
0.0702312918    TDW                          Tidewater Inc. Consumer Services                                 Marine Transportation
0.1125170113    TLM                    Talisman Energy Inc.            Energy                                  Oil & Gas Production
0.0005959932    TMR                                      NA                NA                                                    NA
0.0131588866     TS                            Tenaris S.A.  Basic Industries                                        Steel/Iron Ore
0.2152177841    TSO                      Tesoro Corporation            Energy                              Integrated oil Companies
0.0679062447    TTI                Tetra Technologies, Inc.            Energy                                  Oil & Gas Production
0.1174076026    UNT                        Unit Corporation            Energy                                  Oil & Gas Production
0.2088031326    UPL                   Ultra Petroleum Corp.            Energy                                  Oil & Gas Production
0.0273802381    USU                               USEC Inc.  Basic Industries Mining & Quarrying of Nonmetallic Minerals (No Fuels)
0.1610340003    VLO               Valero Energy Corporation            Energy                              Integrated oil Companies
0.0068010930    WFT          Weatherford International, Ltd            Energy                                  Oil & Gas Production
0.0952688259    WLL           Whiting Petroleum Corporation            Energy                                  Oil & Gas Production
0.0668893737    WMB          Williams Companies, Inc. (The)  Public Utilities                              Natural Gas Distribution
0.0138315166    XEC                       Cimarex Energy Co            Energy                                  Oil & Gas Production
0.1427017005    XTO                                      NA                NA                                                    NA

[[2]]
    weight symbol                                 name           sector                                              industry
0.12797710    ABX             Barrick Gold Corporation Basic Industries                                       Precious Metals
0.17644436    AEM           Agnico Eagle Mines Limited Basic Industries                                       Precious Metals
0.11593957    ASA ASA Gold and Precious Metals Limited              n/a                                                   n/a
0.15675978     AU            AngloGold Ashanti Limited Basic Industries                                       Precious Metals
0.24193015    AUY                     Yamana Gold Inc. Basic Industries                                       Precious Metals
0.17403800    BVN     Buenaventura Mining Company Inc. Basic Industries                                       Precious Metals
0.01358500    CCJ                   Cameco Corporation Basic Industries                                       Precious Metals
0.40112842    CDE                   Coeur Mining, Inc. Basic Industries                                       Precious Metals
0.32590838    EGO            Eldorado Gold Corporation Basic Industries                                       Precious Metals
0.07069435    FCX Freeport-McMoran Copper & Gold, Inc. Basic Industries                                       Precious Metals
0.16373700    FDG                                   NA               NA                                                    NA
0.20941091    GFI                  Gold Fields Limited Basic Industries                                       Precious Metals
0.20107216     GG                        Goldcorp Inc. Basic Industries                                       Precious Metals
0.10346601    GRS                                   NA               NA                                                    NA
0.39352265     HL                 Hecla Mining Company Basic Industries Mining & Quarrying of Nonmetallic Minerals (No Fuels)
0.25090552    HMY  Harmony Gold Mining Company Limited Basic Industries                                       Precious Metals
0.16994222    IAG                  Iamgold Corporation Basic Industries                                       Precious Metals
0.01370509    IVN                                   NA               NA                                                    NA
0.25820797    KGC             Kinross Gold Corporation Basic Industries                                       Precious Metals
0.15124020    NEM           Newmont Mining Corporation Basic Industries                                       Precious Metals
0.08798759     RZ                                   NA               NA                                                    NA
0.03131700    SLW                  Silver Wheaton Corp Basic Industries                                       Precious Metals
0.29394621    SWC            Stillwater Mining Company Basic Industries                                       Precious Metals

[[3]]
     weight symbol                                                           name                sector                            industry
0.096875336    ABV                      Companhia de Bebidas das Americas - AmBev Consumer Non-Durables Beverages (Production/Distribution)
0.326311357    BAK                                                   Braskem S.A.      Basic Industries                     Major Chemicals
0.250725988    BBD                                              Banco Bradesco Sa               Finance                         Major Banks
0.166407433   BRFS                                                       BRF S.A. Consumer Non-Durables                   Meat/Poultry/Fish
0.307049892    BTM                                                             NA                    NA                                  NA
0.089442235    CBD                           Companhia Brasileira de Distribuicao     Consumer Services                         Food Chains
0.321156820    CIG                                        Comp En De Mn Cemig ADS      Public Utilities         Electric Utilities: Central
0.116259858    CPL                                              CPFL Energia S.A.      Public Utilities         Electric Utilities: Central
0.144458384    CYD                             China Yuchai International Limited                Energy     Industrial Machinery/Components
0.309964150    ELP                        Companhia Paranaense de Energia (COPEL)      Public Utilities         Electric Utilities: Central
0.001868534    ERJ                      Embraer-Empresa Brasileira de Aeronautica         Capital Goods                           Aerospace
0.181906131    EWZ                                                             NA                    NA                                  NA
0.087424791    FBR                                           Fibria Celulose S.A.      Basic Industries                               Paper
0.274446999    GGB                                                    Gerdau S.A.         Capital Goods                      Steel/Iron Ore
0.050549732    GOL                            Gol Linhas Aereas Inteligentes S.A.        Transportation       Air Freight/Delivery Services
0.031581881    GPK                              Graphic Packaging Holding Company     Consumer Durables                Containers/Packaging
0.002041877    GTI                                     GrafTech International Ltd                Energy     Industrial Machinery/Components
0.035948636    ILF                                                             NA                    NA                                  NA
0.230162376   ITUB                                 Itau Unibanco Banco Holding SA               Finance                         Major Banks
0.166407433    PDA                                                             NA                    NA                                  NA
0.177097305    SBS Companhia de saneamento Basico Do Estado De Sao Paulo - Sabesp      Public Utilities                        Water Supply
0.026647018    TAR                                                             NA                    NA                                  NA
0.199002737    TBH                                                             NA                    NA                                  NA
0.211402025    TNE                                                             NA                    NA                                  NA
0.104866320    TSP                                                             NA                    NA                                  NA
0.239683225    TSU                                         TIM Participacoes S.A.      Public Utilities        Telecommunications Equipment
0.094630001   VALE                                                      VALE S.A.      Basic Industries                     Precious Metals
0.249716544    VIV                                         Telefonica Brasil S.A.      Public Utilities        Telecommunications Equipment

[[4]]
     weight symbol                                  name                sector               industry
0.001306744    AIT Applied Industrial Technologies, Inc.     Consumer Durables Industrial Specialties
0.003829347    ALY                                    NA                    NA                     NA
0.023222920    ELN                 Elan Corporation, plc     Consumer Durables  Major Pharmaceuticals
0.151106602    KKD          Krispy Kreme Doughnuts, Inc. Consumer Non-Durables            Food Chains
0.000329151    MAG                                    NA                    NA                     NA
0.988199944    SGU               Star Gas Partners, L.P.     Consumer Services Other Specialty Stores
0.008478258    TGI                   Triumph Group, Inc.         Capital Goods              Aerospace
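The claim that gamma=1 yields essentially orthogonal components can be checked by looking at the cosines between distinct loading vectors. For non-negative loadings this has a simple interpretation: two axes are exactly orthogonal only if they share no stocks, because every shared stock adds a positive term to their inner product. ALY, for instance, appears in both the first and the fourth component above, so those two axes are close to, but not exactly, orthogonal. The helper max_cross_cosine and the toy matrices are hypothetical, and the exact form of the gamma penalty in nscumcomp is not reproduced here.

```r
# Hypothetical diagnostic: largest absolute cosine between distinct columns
# of a loadings matrix W (one principal axis per column). Zero means the
# axes are exactly orthogonal.
max_cross_cosine <- function(W) {
  Wn <- sweep(W, 2, sqrt(colSums(W^2)), "/")  # scale each column to unit norm
  G <- crossprod(Wn)                           # G[i, j] = cosine between axes i and j
  diag(G) <- 0                                 # ignore each axis' cosine with itself
  max(abs(G))
}

# Non-negative axes with disjoint supports are exactly orthogonal; sharing a
# single feature with a small weight makes them slightly non-orthogonal.
W_disjoint <- cbind(c(0.8, 0.6, 0, 0), c(0, 0, 0.6, 0.8))
W_shared   <- cbind(c(0.8, 0.6, 0, 0), c(0, 0.1, 0.6, 0.8))
max_cross_cosine(W_disjoint)
max_cross_cosine(W_shared)
```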

The cardinality of each component varies greatly in this analysis: the first component contains 92 stocks, mostly oil and gas companies and a few coal companies. The 23 stocks of the second component are about gold and other precious metals. The 28 stocks of the third component come from various sectors, but the common factor seems to be that most of them are Brazilian. No structure is apparent in the fourth component (7 stocks), which essentially consists of Star Gas Partners and Krispy Kreme – maybe the Star Gas employees like doughnuts?
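Per-component cardinality is simply the number of non-zero loadings in each column of the loadings matrix. Counting the rows of the four nscumcomp tables above gives 92 + 23 + 28 + 7 = 150, exactly the total k = 150 that was requested. The helper cardinality and the toy matrix below are made up for illustration.

```r
# Per-component cardinality of a loadings matrix W (features x components):
# the number of non-zero loadings in each column.
cardinality <- function(W, tol = 0) colSums(abs(W) > tol)

# Toy loadings matrix with 6 features and 2 components (made-up values).
W <- cbind(c(0.5, 0.5, 0.5, 0.5, 0, 0), c(0, 0, 0, 0, 0.6, 0.8))
cardinality(W)        # per-component counts
sum(cardinality(W))   # total number of non-zero loadings across components

# Counts read off the four nscumcomp tables above, compared with k = 150.
sum(c(92, 23, 28, 7)) == 150
```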

Different random seeds reveal different sets of components with similar cumulative variances (including the mining and homebuilding components from the first analysis), but I stop here and leave further analysis to people who know what they are doing.
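Comparing runs from different seeds needs a measure of cumulative variance that does not double count when the components are not exactly orthogonal; summing per-axis variances can overstate it. A sketch, under the assumption that the quantity of interest is the variance captured by the subspace spanned by the loadings (not necessarily the measure reported by nsprcomp or nscumcomp): project the centered data onto an orthonormal basis of that subspace. Unconstrained PCA with the same number of axes gives an upper bound, which is a useful reference point. The helper explained_fraction, the toy data and the stand-in loading matrices are hypothetical.

```r
# Assumed measure (not taken from the package): fraction of the total variance
# of X captured by span(W), the subspace spanned by the loading vectors. Using
# an orthonormal basis avoids double counting when the loadings overlap.
explained_fraction <- function(X, W) {
  X <- scale(X, center = TRUE, scale = FALSE)  # center each column
  Q <- qr.Q(qr(W))                             # orthonormal basis of span(W)
  sum((X %*% Q)^2) / sum(X^2)
}

# Toy data whose columns have decreasing spread; W_run1 and W_run2 stand in
# for sparse non-negative results of two runs started from different seeds.
set.seed(42)
X <- matrix(rnorm(200 * 10), 200, 10) %*% diag(10:1)
W_run1 <- diag(10)[, c(1, 2, 3)]   # axes on features 1, 2, 3
W_run2 <- diag(10)[, c(1, 2, 4)]   # axes on features 1, 2, 4
W_pca  <- svd(scale(X, center = TRUE, scale = FALSE))$v[, 1:3]  # unconstrained PCA

round(c(run1 = explained_fraction(X, W_run1),
        run2 = explained_fraction(X, W_run2),
        pca  = explained_fraction(X, W_pca)), 3)
```

No constrained solution can exceed the PCA value, so the gap to it shows how much variance the sparsity and non-negativity constraints cost in each run.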

References

L. W. Mackey (2009). Deflation Methods for Sparse PCA. Advances in Neural Information Processing Systems, Vol. 21.

H. Markowitz (1952). Portfolio Selection. The Journal of Finance, Vol. 7, No. 1, pp. 77-91.