Backdoor Support for Control-Transfer Breakpoint Features in Windows x64

Feryno, 2007-10-31, Revision: 1.0

It is well known that both AMD64 and Intel EM64T CPUs support Control-Transfer Breakpoint Features. While trying to implement this feature in FDBG, I discovered a hidden backdoor that makes the implementation very easy.

Background

More than two years ago I discovered these five fields at the end of the CONTEXT structure (ntddk.h):

typedef struct DECLSPEC_ALIGN(16) _CONTEXT {
    ...
    ULONG64 DebugControl;
    ULONG64 LastBranchToRip;
    ULONG64 LastBranchFromRip;
    ULONG64 LastExceptionToRip;
    ULONG64 LastExceptionFromRip;
} CONTEXT, *PCONTEXT;

My first thought was that Microsoft used these fields for some purpose. But I didn't find any way to make the OS fill them with useful data; they were always all zeros. So my second thought was that they were merely reserved in the CONTEXT structure for future use. I didn't know which of the two guesses was true. If the first was true, it would be only a question of time until somebody found out how to use them. If the second was true, it would again be only a question of time until Microsoft implemented them.

Using a Driver

So my first approach was to write a driver that reads and writes MSRs, and it worked. I also realized that these four MSRs: LastBranchToRip, LastBranchFromRip, LastExceptionToRip, LastExceptionFromRip are read-only and thus can't be written back from the CONTEXT structure into the CPU as the OS switches tasks. I didn't know whether the OS saves them at all. I supposed that the best moment to save them is just when the thread being debugged generates an exception. But success was delayed a bit until I realized that the OS often generates int 0E (page fault exception) when manipulating pages (loading them from the swap device).

It was also clear that the best moment to save the four MSRs is on entry to the exception handler (as early as possible), because the registers change at any branching instruction (so I had to avoid, for example, a call instruction before saving them).

In the end I had a working driver which hooked exceptions (interrupts 00-1F) and, on every exception, saved the four MSRs into an internal buffer in the driver.

Driver Issues

The first problem was that sometimes a page fault occurred, as the OS loaded pages from swap, between saving the MSRs and reading the saved values from the driver. The second problem was to find the thread which caused the exception, which means finding the owner of the saved registers. For a thread being debugged it didn't matter, as it often generated exceptions (e.g., breakpoint, single-step exception, …). But on rare occasions, loading a page from swap overwrote the saved values with new ones between the moment of the exception in the program being debugged and the moment when the debugger read them from the driver's buffer. The third problem was Patchguard, which checked kernel integrity at random intervals of 5-10 minutes and often rebooted my test PC with:

BugCheck 109, {a3a03a387918c925, b3b746becb988153, fffff8000010b070, 2}

The bugcheck analysis looked like this:

CRITICAL_STRUCTURE_CORRUPTION (109)

This bugcheck is generated when the kernel detects that critical kernel code or data have been corrupted. There are generally three causes for a corruption:

  1. A driver has inadvertently or deliberately modified critical kernel code or data. See Patching Policy for x64-Based Systems
  2. A developer attempted to set a normal kernel breakpoint using a kernel debugger that was not attached when the system was booted. Normal breakpoints (bp in WinDbg), can only be set if the debugger is attached at boot time. Hardware breakpoints (ba in WinDbg) can be set at any time.
  3. A hardware corruption occurred, for example failing RAM holding kernel code or data.

Arguments:

  1. a3a03a387918c925, Reserved
  2. b3b746becb988153, Reserved
  3. fffff8000010b070, Failure type dependent information
  4. 0000000000000002, Type of corrupted region, can be:
    • 0: A generic data region
    • 1: Modification of a function or .pdata
    • 2: A processor IDT
    • 3: A processor GDT
    • 4: Type 1 process list corruption
    • 5: Type 2 process list corruption
    • 6: Debug routine modification
    • 7: Critical MSR modification

At least one of the reported MSRs always held an address in kernel-mode space (the start address of the corresponding exception handler), so I started to play with local kernel debugging (local only, as I don't have two PCs close enough to connect them for remote debugging). Fortunately, I discovered this fragment of kernel code:

fffff80001041628 mov rax, dr6
fffff8000104162b mov rdx, dr7
fffff8000104162e mov [rcx+0x40], rax   ; save DR6
fffff80001041632 mov [rcx+0x48], rdx   ; save DR7
fffff80001041636 xor eax, eax
fffff80001041638 mov dr7, rax   ; zero DR7
fffff8000104163b cmp byte ptr gs:[000007bd], 0x1
fffff80001041644 jnz fffff800010416b0
fffff80001041646 test dx, 0x300   ; test DR7.LE, DR7.GE
fffff8000104164b jz fffff800010416b0   ; skip saving branches registers if none of above DR7 bits is set
fffff8000104164d mov r8, rcx   ; save pointer to data into r8, because ecx is needed for the MSR number
fffff80001041650 mov ecx, 0x1db   ; LastBranchFromIP
fffff80001041655 rdmsr
fffff80001041657 mov [r8+0x88], eax
fffff8000104165e mov [r8+0x8c], edx
fffff80001041665 mov ecx,0x1dc   ; LastBranchToIP
fffff8000104166a rdmsr
fffff8000104166c mov [r8+0x80], eax
fffff80001041673 mov [r8+0x84], edx
fffff8000104167a mov ecx,0x1dd   ; LastExceptionFromIP
fffff8000104167f rdmsr
fffff80001041681 mov [r8+0x98], eax
fffff80001041688 mov [r8+0x9c], edx
fffff8000104168f mov ecx,0x1de   ; LastExceptionToIP
fffff80001041694 rdmsr
fffff80001041696 mov [r8+0x90], eax
fffff8000104169d mov [r8+0x94], edx
fffff800010416a4 mov ecx,0x1d9   ; DebugCtlMSR
fffff800010416a9 rdmsr
fffff800010416ab and eax, 0xfffffffc   ; disable DebugCtlMSR.LBR, DebugCtlMSR.BTF
fffff800010416ae wrmsr
fffff800010416b0 ret

This code fragment gave me hope that the OS saves the MSRs somewhere, to be transferred later into the thread's CONTEXT structure.
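The decisions made by the fragment above can be modeled in plain C. This is only a sketch for illustration, not the kernel's source: the function and macro names are hypothetical, and the `gs:[000007bd]` check is left out because its meaning is not documented here.

```c
#include <assert.h>
#include <stdint.h>

/* DR7 bits tested by "test dx, 0x300" (names are illustrative). */
#define DR7_LE (1u << 8)        /* local exact breakpoint enable  */
#define DR7_GE (1u << 9)        /* global exact breakpoint enable */

/* DebugCtlMSR bits cleared by "and eax, 0xfffffffc". */
#define DEBUGCTL_LBR (1u << 0)  /* last-branch recording          */
#define DEBUGCTL_BTF (1u << 1)  /* single-step on branches        */

/* The branch MSRs are read only if DR7.LE or DR7.GE is set. */
int branch_msrs_are_saved(uint64_t dr7)
{
    return (dr7 & (DR7_LE | DR7_GE)) != 0;
}

/* rdmsr returns the MSR in EDX:EAX; the fragment stores EAX at an
 * offset and EDX four bytes higher, which forms one 64-bit field. */
uint64_t msr_value(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Low dword of DebugCtlMSR as written back at the end of the fragment. */
uint32_t debugctl_low_after_save(uint32_t eax)
{
    return eax & ~(uint32_t)(DEBUGCTL_LBR | DEBUGCTL_BTF);
}

/* Small self-check of the model; call it from main() when experimenting. */
void self_check(void)
{
    assert(branch_msrs_are_saved(DR7_LE));
    assert(!branch_msrs_are_saved(0));
    assert(debugctl_low_after_save(3u) == 0u);
}
```

The model makes one point easy to see: with both DR7.LE and DR7.GE clear, the routine never touches the branch MSRs, which would explain why the CONTEXT fields stay zero.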

I was also fighting another problem: the debug exception (int 1). This exception clears DebugCtlMSR.LBR and DebugCtlMSR.BTF (just as the CPU clears RFLAGS.TF and DR7.GD when switching from the thread generating the debug exception to the debug exception handler). So my driver re-enabled one or both bits (depending on the request) in DebugCtlMSR at the end of the new hooked routine for int 1. This had the disadvantage that the bits were enabled for whichever (and thus unknown) thread the CPU executed next. The other problem was multi-CPU systems, where I had to set both bits on all CPUs in the system.

In conclusion, I had a proof of concept which wasn't perfect but worked correctly more than 99% of the time. To be safe, I had to reboot the OS, hook exceptions, and debug only until 5 minutes had elapsed (the safe interval to avoid a reboot by Patchguard). Very rarely (less than 1% of the time), all four branch MSRs were overwritten with useless addresses when the OS loaded a page from swap between saving the registers into the driver's buffer (in the exception handler) and transferring them from the driver to the debugger (reading the saved data from the driver).

Backdoor Found

Fortunately, I then discovered this code fragment in the kernel:

fffff80001041581 mov rdx, [rcx+0x48]   ; get value to be written into DR7
fffff80001041585 xor eax, eax
fffff80001041587 mov dr6, rax
fffff8000104158a mov dr7, rdx
fffff8000104158d cmp byte ptr gs:[000007bd], 0x1
fffff80001041596 jnz fffff800010415c2
fffff80001041598 test dx, 0x200   ; test DR7.GE (bit 9)
fffff8000104159d jz fffff800010415a2
fffff8000104159f or eax, 0x2   ; bit 1 of eax = DR7.GE
fffff800010415a2 test dx, 0x100   ; test DR7.LE (bit 8)
fffff800010415a7 jz fffff800010415ac
fffff800010415a9 or eax, 0x1   ; bit 0 of eax = DR7.LE
fffff800010415ac test eax, eax
fffff800010415ae jz fffff800010415c2
fffff800010415b0 mov r8d, eax   ; save eax into r8d
fffff800010415b3 mov ecx, 0x1d9   ; DebugCtlMSR
fffff800010415b8 rdmsr
fffff800010415ba and eax, 0xfffffffc   ; mask off DebugCtlMSR.LBR, DebugCtlMSR.BTF (bits 0 and 1)
fffff800010415bd or eax, r8d   ; set DebugCtlMSR.LBR, DebugCtlMSR.BTF according to DR7.LE, DR7.GE
fffff800010415c0 wrmsr   ; write the value back to DebugCtlMSR
fffff800010415c2 ret

What does the code do? It sets the debug registers (surely for a thread, just before switching to it). Then it checks two bits in DR7 and sets two bits in DebugCtlMSR according to them. Strange at first sight. But I very soon realized how clever the programmer who implemented this revolutionary idea was! The programmer surely had thoughts something like this:

Bits DebugCtlMSR.LBR, DebugCtlMSR.BTF, DR7.GD, and RFLAGS.TF are cleared on entry to the debug exception handler (int 1). RFLAGS.TF can be easily restored, because the image of RFLAGS from just before entering the handler is pushed on the stack when the exception is generated. Restoring DR7.GD isn't a problem either: its setting before the debug exception is known, because if the exception was generated as a general detect (an access to a debug register), DR6.BD is set on entry to the handler to reflect that DR7.GD was set.

Bits DR7.LE and DR7.GE (bits 8 and 9, Local/Global Exact Breakpoint Enabled) are both ignored by implementations of the x64 architecture, as stated in the manual (AMD64 Architecture Programmer's Manual Volume 2: System Programming, chapter 13.1.1 Debug-Control Register DR7), because all breakpoint conditions, except certain string operations preceded by a repeat prefix, are exact. These bits aren't cleared on entry to the debug exception handler.

Bits DebugCtlMSR.LBR and DebugCtlMSR.BTF are destroyed on entry to the debug exception handler. Bits DR7.LE and DR7.GE aren't. DR7.LE and DR7.GE no longer do anything meaningful: they were used years ago in older CPUs, they are still implemented, and they are free now. So the programmer's revolutionary idea was certainly: let's use DR7.LE and DR7.GE as shadows (aliases) for DebugCtlMSR.LBR and DebugCtlMSR.BTF. And so the programmer did.
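The aliasing done by the second kernel fragment can be written as a small C model. Again this is a hedged sketch with hypothetical names, not kernel source. It models only the low dword of DebugCtlMSR (EAX after rdmsr) and ignores the `gs:[000007bd]` check.

```c
#include <assert.h>
#include <stdint.h>

#define DR7_LE (1u << 8)        /* shadow for DebugCtlMSR.LBR */
#define DR7_GE (1u << 9)        /* shadow for DebugCtlMSR.BTF */
#define DEBUGCTL_LBR (1u << 0)
#define DEBUGCTL_BTF (1u << 1)

/* Returns the low dword of DebugCtlMSR after the fragment has run,
 * given its previous value and the DR7 value just loaded for the thread. */
uint32_t debugctl_low_from_dr7(uint32_t debugctl_low, uint64_t dr7)
{
    uint32_t request = 0;

    if (dr7 & DR7_GE)
        request |= DEBUGCTL_BTF;   /* "or eax, 0x2" */
    if (dr7 & DR7_LE)
        request |= DEBUGCTL_LBR;   /* "or eax, 0x1" */

    /* "test eax, eax / jz": with neither bit set the MSR is left alone. */
    if (request == 0)
        return debugctl_low;

    /* "and eax, 0xfffffffc / or eax, r8d" */
    return (debugctl_low & ~(uint32_t)(DEBUGCTL_LBR | DEBUGCTL_BTF)) | request;
}

/* Small self-check of the model; call it from main() when experimenting. */
void self_check(void)
{
    assert(debugctl_low_from_dr7(0u, DR7_LE) == DEBUGCTL_LBR);
    assert(debugctl_low_from_dr7(0u, DR7_GE) == DEBUGCTL_BTF);
}
```

Note that in this fragment a DR7 with both bits clear does not clear LBR and BTF; it skips the write entirely. Clearing is done by the other fragment, which masks both bits off after saving the branch MSRs.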

The other benefit of this 'hack' is that, as debug registers are specific to a thread (every thread has its own debug registers, which are reloaded when switching to the thread), we can re-enable branch recording / stepping on branches only for specific thread(s), so other threads don't interfere, and it doesn't matter on which CPU the thread executes in multi-CPU systems. My old driver set DebugCtlMSR for all threads in the system on all CPUs, which is not very desirable. Doing it only for the thread(s) being debugged is the best choice.
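From the debugger's side, using the trick comes down to editing two bits in the DR7 value of the thread being debugged. The helper below is a minimal sketch with a hypothetical name; it is not taken from FDBG. It only computes the new DR7 value and leaves all breakpoint bits untouched.

```c
#include <assert.h>
#include <stdint.h>

#define DR7_LE (1u << 8)   /* requests DebugCtlMSR.LBR for this thread */
#define DR7_GE (1u << 9)   /* requests DebugCtlMSR.BTF for this thread */

/* Returns dr7 with LE/GE set or cleared according to the request;
 * every other bit is preserved. */
uint64_t dr7_with_branch_features(uint64_t dr7, int want_lbr, int want_btf)
{
    dr7 &= ~(uint64_t)(DR7_LE | DR7_GE);
    if (want_lbr)
        dr7 |= DR7_LE;
    if (want_btf)
        dr7 |= DR7_GE;
    return dr7;
}

/* Small self-check of the helper; call it from main() when experimenting. */
void self_check(void)
{
    assert(dr7_with_branch_features(0, 1, 1) == (DR7_LE | DR7_GE));
    assert(dr7_with_branch_features(DR7_LE | DR7_GE, 0, 0) == 0);
}
```

On Windows a debugger would presumably apply the result to the Dr7 field of the thread's CONTEXT, using GetThreadContext and SetThreadContext with CONTEXT_DEBUG_REGISTERS, and then look at the LastBranch and LastException fields after the next exception. That flow is an assumption based on the fragments above, not something stated by the OS documentation. BTF changes single-stepping only while RFLAGS.TF is also set.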

Conclusion

I really don't know why Microsoft didn't make this information publicly available. The information can't be abused by malware. The question is whether 32-bit Windows does the same as the x64 version. I hope the clever trick won't disappear in newer versions of Windows. Currently, Windows XP x64, Windows 2003 x64, and Windows Vista x64 support it perfectly.

This hidden backdoor was just waiting to be discovered. So let's enjoy the new knowledge.

Resources

AMD64 Architecture Programmer's Manual Volume 2: System Programming

Patching Policy for x64-Based Systems



Revisions

2007-10-31   1.0   First public version   Feryno

(date format corresponds to ISO 8601)