Fan SDR Hacking for the Intel S1200V3RP Motherboard [updated]

2015-07-19

[After a discussion with Andrew Su, this post has been updated to include information on how to select the temperature sensor that controls the fan speed.]

I run a home server built around the Intel S1200V3RP motherboard. To keep noise down, it is housed in a Fractal Design Define Mini case and cooled by three Scythe Slip Stream 120 PWM case fans and a Noctua NH-U12S CPU fan.

Unlike typical desktop motherboards, the BIOS of this server board doesn’t let you set a temperature vs. fan speed curve. Intel expects its customers to house the board inside an approved chassis, and provides IPMI sensor data records (SDRs) with pre-defined curves for those chassis. For “other” chassis, Intel also provides three generic SDRs called slow, medium and fast ramp. This post describes how to choose which temperature sensor controls the fans, and how to modify the fan speed and temperature points of the “slow ramp” curve.

Download the latest update package for EFI (version 03.02.0003 at the time of writing) and unpack the contents onto a USB stick. Open the file S1200RP.sdr in a text editor.

Understanding this file takes some effort. Although the format follows the IPMI specification, the relevant entries are undocumented OEM records. Fortunately, there are comments in the file for many records, which greatly simplifies reverse engineering.

Modifying the Curve

The changes consist of two parts: modifying the fan-temperature curve, and lowering the critical thresholds so that the desired fan speeds are considered normal. The latter can be necessary because, as soon as a critical threshold is triggered for a sensor, the baseboard management controller (BMC) spins the fans up to full speed as a safety measure. So if your changes don’t have the desired effect and the fans run at full speed instead, check the IPMI system event log (SEL) for critical events (see below). For example, one mistake I made was to include a case fan in the SDR update that was not physically present. The BMC then treated that fan as present but stuck at 0 RPM.

The curve record for the slow ramp (which has ID 5B) is defined on lines 7078 – 7084:

// Global Stepwise Curve Record
5B             // Stepwise Curve ID
02             // Domain max and Count [7]-Domain Max (0=no) [6:0]-Count 
1E             // 30C
19             // 25%
3C             // 60C
32             // 50%

I changed the 25 % PWM entry to 40 % (28 in hex) which results in 490 RPM for the Scythe case fans. At lower PWM values the fans stalled, which prompted the BMC to activate emergency mode and run the fans at full speed. At 490 RPM the fans are barely audible, yet they handle the thermal load generated by keeping all four cores of the Xeon E3-1241V3 CPU busy:

$ stress -c 8 &
[1] 29290
# stress: info: [29290] dispatching hogs: 8 cpu, 0 io, 0 vm, 0 hdd

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:         +61.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:         +59.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:         +55.0°C  (high = +80.0°C, crit = +100.0°C)
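
The byte arithmetic behind the curve edit can be sketched in a few lines of Python. This is only an illustration, assuming the layout suggested by the comments in the record: a curve ID, a flags byte (bit 7 for "domain max", bits 6:0 for the number of points), then one temperature byte and one PWM percentage byte per point. The function names are mine and are not part of any Intel tool.

```python
def decode_curve(record):
    """Split a stepwise curve record into ID, domain-max flag and points.

    Assumed layout (taken from the comments in the SDR file):
    [curve ID, flags, temp0, pwm0, temp1, pwm1, ...]
    """
    curve_id, flags = record[0], record[1]
    domain_max = bool(flags & 0x80)  # bit 7
    count = flags & 0x7F             # bits 6:0: number of (temp, PWM) pairs
    body = record[2:2 + 2 * count]
    if len(body) != 2 * count:
        raise ValueError("record is shorter than its point count implies")
    points = [(body[i], body[i + 1]) for i in range(0, len(body), 2)]
    return curve_id, domain_max, points


def encode_curve(curve_id, points, domain_max=False):
    """Return the record as space-separated hex bytes, ready to paste."""
    flags = (0x80 if domain_max else 0) | len(points)
    out = [curve_id, flags]
    for temp_c, pwm_percent in points:
        if not 0 <= pwm_percent <= 100:
            raise ValueError("PWM must be a percentage between 0 and 100")
        out += [temp_c, pwm_percent]
    return " ".join(f"{b:02X}" for b in out)


original = [0x5B, 0x02, 0x1E, 0x19, 0x3C, 0x32]
curve_id, domain_max, points = decode_curve(original)
print(points)                            # [(30, 25), (60, 50)]

modified = [(30, 40), (60, 50)]          # raise the first point to 40 % PWM
print(encode_curve(curve_id, modified))  # 5B 02 1E 28 3C 32
```

The second line of output shows the 40 % entry as 28, matching the hex value used above.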

Depending on the desired fan speed, adjust the critical thresholds for the PROC1_FAN and OTHER_FAN_1 to OTHER_FAN_4 entries, which all share the same structure. For example, the OTHER_FAN_1 tachometer record is defined on lines 1371 – 1429:

_SDR_TYPE   01
_SDR_TAG    'OTHER_FAN_1'
_REC_LEN    003C

   // Sensor Record Header
   001F               // Record ID
   51                 // SDR Version
   01                 // Record Type
   37                 // Record Length

   (...)

   82                 // Normal Reading (82h == 12750 RPM)
   FF                 // Normal Maximum (FFh == 25000 RPM)
   05                 // Normal Minimum (05h == 500 RPM)
   //------------------------------------------------------------------//
   FF                 // Sensor Maximum Reading (FFh == 25000 RPM)
   00                 // Sensor Minimum Reading (00h == 0 RPM)
   //------------------------------------------------------------------//
   00                 // Upper non-recoverable (not specified)
   00                 // Upper critical (not specified)
   00                 // Upper non-critical (not specified)
   //------------------------------------------------------------------//
   00                 // Lower non-recoverable (not specified)
   01                 // Lower critical (01h == 175 RPM)
   04                 // Lower non-critical (04h == 400 RPM)
   //------------------------------------------------------------------//

   (...)

   'System Fan 1'     // String Bytes

If necessary, reduce the “Normal Minimum” and “Lower non-critical” thresholds so that your target fan speed still counts as normal.
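
To pick a threshold byte for a given RPM value, a small helper like the following may be useful. It is a sketch that assumes a linear scale of 98 RPM per raw count. That factor is not documented here. I inferred it from the ipmitool output in this post, where a raw threshold of 01 shows up as 98 RPM and the fan readings are multiples of 98. Rounding down with this factor also reproduces the byte values next to the RPM figures in the file's comments, but you should verify the result on your own board.

```python
RPM_PER_COUNT = 98  # assumption: inferred from ipmitool readings, not from Intel docs


def rpm_to_raw(rpm, rpm_per_count=RPM_PER_COUNT):
    """Largest raw byte whose RPM value does not exceed `rpm` (clamped to 00..FF)."""
    raw = int(rpm // rpm_per_count)
    return max(0, min(raw, 0xFF))


def raw_to_rpm(raw, rpm_per_count=RPM_PER_COUNT):
    """RPM value that a raw threshold byte represents under the assumed scale."""
    return raw * rpm_per_count


for rpm in (175, 400, 500):
    raw = rpm_to_raw(rpm)
    print(f"{rpm} RPM -> {raw:02X}h (reported as {raw_to_rpm(raw)} RPM)")
```

For a fan that idles at 490 RPM, any lower threshold byte below 05 leaves some headroom under this assumption.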

Choosing the Controlling Temperature Sensor

Different temperature sensors react differently to varying system loads. Sensors close to the CPU typically react faster than sensors that are further away. The default sensor chosen by Intel is the BB EDGE sensor, which measures ambient temperature inside the chassis and has a slow reaction time. If you prefer a more dynamic response of the fans, you can choose a different sensor as the input to the curve.

The sensor mapping is defined in the stepwise header record on lines 5742 – 5774:

//====================================================================//
_SDR_TYPE   C0
_SDR_TAG    'R1000'
_SDR_TAG    'UPS_4HDD'
_SDR_TAG    'UPS_8HDD'
_SDR_TAG    'OTHER'
_REC_LEN    0017

   // Sensor Record Header
   00C7           // Record ID
   51             // SDR Version
   C0             // Record Type
   12             // Record Length
   
   (...)
   
   // Global Clamp Header Record
   // Clamp Type Temperature Sensor sub-record
   20             // Clamp Header ID
   20             // Temperature Sensor Number -- (BB EDGE Temp)
   00             // Entity Presence Sensor Number -- (NA)
   64             // Sensor Disabled Control Value 
   64             // Sensor Fail Control Value 
   64             // Sensor Unavailable Control Value
   00             // Sleep Control Value
   28             // Sleep State and Hysteresis [7:6]-Supported in S1 (1=yes); [5:3]-Negative Hysteresis = 5; [2:0]-Positive Hysteresis = 0
   80             // Clamp Control Coefficient LSB
   02             // Clamp Control Coefficient MSB
   28             // Temperature (28h = 40 degree C)
   00             // Clamp Flags [7:4] - CPU Number = 0; [3:1] - Reserved=0; [0] - Temp Source = 0 (use Fixed Temp)

To choose a different sensor, modify the “Temperature Sensor Number” on line 5764. Good candidates for a more dynamic response are PCH Temp (sensor number 22) or BB CPU VR Temp (sensor number 24).
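
Several fields in this record are packed into single bytes, as the comments describe. If you want to double-check such a value before editing the record, the bit layout can be unpacked as below. This is a sketch based only on the comments in the listing. The function names are hypothetical, and the meaning of the clamp control coefficient is not documented here, so the code merely combines its two bytes.

```python
def decode_sleep_hysteresis(byte):
    """Unpack the "Sleep State and Hysteresis" byte per the file's comment:
    [7:6] supported in S1, [5:3] negative hysteresis, [2:0] positive hysteresis.
    """
    return {
        "s1_bits": (byte >> 6) & 0b11,
        "negative_hysteresis": (byte >> 3) & 0b111,
        "positive_hysteresis": byte & 0b111,
    }


def combine_lsb_msb(lsb, msb):
    """Join a little-endian byte pair (LSB first, as in the record) into one integer."""
    return lsb | (msb << 8)


print(decode_sleep_hysteresis(0x28))
# {'s1_bits': 0, 'negative_hysteresis': 5, 'positive_hysteresis': 0}

print(combine_lsb_msb(0x80, 0x02))  # 640 (0x0280)
```

The decoded values agree with the comment on the 28 byte: negative hysteresis 5, positive hysteresis 0.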

Writing the SDRs

To apply the changes, plug the USB stick into the server, boot into EFI (by holding F6 at the boot splash screen) and run updFRUSDR.nsh. The update script will detect that you operate the motherboard in an unknown chassis, will let you choose to install the (modified) slow ramp curve and then will ask you a series of questions to determine which fans are present in your system (remember to answer no if a fan is not present). Then reboot the system and remove the USB stick.

To verify that your changes have been applied successfully, run ipmitool sensor list from the command line (assuming a Linux system) and check the entries for System Fan 1 to System Fan 4 and Processor Fan:

System Fan 1     | 490.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na        
System Fan 2     | 490.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na        
System Fan 4     | 490.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na        
Processor Fan    | 588.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na

(System Fan 3 is not present in my system)
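
If you would rather check this from a script, the pipe-separated output is easy to parse. The sketch below works on the sample lines from this post. The column names are my own labels and assume the usual order of ipmitool sensor list: name, reading, unit, status, then the lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical and upper non-recoverable thresholds.

```python
SAMPLE = """\
System Fan 1     | 490.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na
System Fan 2     | 490.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na
System Fan 4     | 490.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na
Processor Fan    | 588.000    | RPM        | ok    | na        | 98.000    | 98.000    | na        | na        | na
"""

# Assumed column order of `ipmitool sensor list`.
COLUMNS = ("name", "reading", "unit", "status",
           "lnr", "lcr", "lnc", "unc", "ucr", "unr")


def parse_sensor_list(text):
    """Turn each well-formed line into a dict keyed by COLUMNS."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != len(COLUMNS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def fans_not_ok(rows):
    """Names of RPM sensors whose status is anything other than 'ok'."""
    return [r["name"] for r in rows
            if r["unit"] == "RPM" and r["status"] != "ok"]


rows = parse_sensor_list(SAMPLE)
for r in rows:
    print(f'{r["name"]}: {r["reading"]} RPM, lower critical {r["lcr"]}')
print("Fans needing attention:", fans_not_ok(rows))
```

On a live system, the same functions could be fed the output of the ipmitool command instead of the sample text.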

If the fans don’t do what you want, inspect the SEL using ipmitool sel list and watch for entries like

 606 | 07/19/2015 | 08:44:08 | Fan #0x30 | Lower Critical going low  | Asserted

Such an entry indicates that System Fan 1 (which has sensor number 0x30) had its lower critical threshold triggered. You would need to either change the sensor threshold or the curve to avoid this event in the future.
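
Scanning a long SEL for this kind of event can also be scripted. The following sketch parses lines in the format shown above. The sensor name table contains only the one mapping stated in this post (0x30 is System Fan 1). Other sensor numbers would have to be looked up in the SDR file.

```python
import re

# Only the mapping mentioned in the post; extend it from your own SDR file.
SENSOR_NAMES = {0x30: "System Fan 1"}

SEL_LINE = " 606 | 07/19/2015 | 08:44:08 | Fan #0x30 | Lower Critical going low  | Asserted"


def parse_sel_line(line):
    """Parse one `ipmitool sel list` line; return None if it does not match."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        return None
    match = re.fullmatch(r"(.+) #0x([0-9a-fA-F]+)", fields[3])
    if not match:
        return None
    return {
        "id": fields[0],
        "date": fields[1],
        "time": fields[2],
        "sensor_type": match.group(1),
        "sensor_number": int(match.group(2), 16),
        "event": fields[4],
        "state": fields[5],
    }


def is_fan_lower_critical(entry):
    """True for an asserted lower-critical event on a fan sensor."""
    return (entry["sensor_type"] == "Fan"
            and entry["event"].startswith("Lower Critical")
            and entry["state"] == "Asserted")


entry = parse_sel_line(SEL_LINE)
if entry and is_fan_lower_critical(entry):
    number = entry["sensor_number"]
    name = SENSOR_NAMES.get(number, f"sensor 0x{number:02X}")
    print(f'{entry["date"]} {entry["time"]}: {name} fell below its lower critical threshold')
```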

Enjoy a quiet home server that is based on a proper server motherboard.