Feryno, 2007-10-31 | Revision: 1.0 |
It is well known that both AMD64 and Intel EM64T CPUs support Control-Transfer Breakpoint Features. While I was trying to implement this feature in FDBG, I discovered a hidden backdoor that makes the implementation very easy.
More than two years ago, I discovered these five fields at the end of the CONTEXT structure (ntddk.h):
typedef struct DECLSPEC_ALIGN(16) _CONTEXT {
    ...
    ULONG64 DebugControl;
    ULONG64 LastBranchToRip;
    ULONG64 LastBranchFromRip;
    ULONG64 LastExceptionToRip;
    ULONG64 LastExceptionFromRip;
} CONTEXT, *PCONTEXT;
My first thought was that Microsoft used these variables for some purpose. But I found no way to make the OS fill them with useful data; they were always all zeros. So my second thought was that they would be used in the future and were, for now, only reserved in the CONTEXT structure. I didn't know which of these two guesses was true. If the first one was true, it would only be a question of time before somebody found a way to use them. If the second one was true, it would again only be a question of time before Microsoft implemented them.
My first approach, which succeeded, was to write a driver that reads and writes MSRs. I also realized that these four MSRs, LastBranchFromRip, LastBranchToRip, LastExceptionFromRip and LastExceptionToRip, are read-only and thus can't be written back from the CONTEXT structure into the CPU when the OS switches tasks. I didn't know whether the OS saves them at all. I assumed that the best moment to save them is just when the thread being debugged generates an exception. But success was delayed a bit until I realized that the OS often generates int 0E (page fault exception) when manipulating pages (loading them from the swap device).
It was also clear that the best moment to save the four MSRs is on entry to the exception handler, as early as possible, because the registers change on every branch instruction (so, for example, I had to avoid executing a call instruction before saving them).
In the end, I had a working driver that hooked the exceptions (interrupts 00-1F) and, on every exception, saved the four MSRs into an internal buffer in the driver.
The first problem was that a page fault sometimes occurred, as the OS loaded pages from swap, between saving the MSRs and reading the saved values from the driver. The second problem was finding the thread that caused the exception, i.e. the owner of the saved registers. For a thread being debugged this didn't matter much, as it generated exceptions often (e.g. breakpoint, single-step exception, …). But loading a page from swap rarely overwrote the saved values with new ones between the exception in the program being debugged and the moment the debugger read them from the driver's buffer. The third problem was Patchguard, which checked kernel integrity at random intervals of about 5-10 minutes and often rebooted my test PC with:
BugCheck 109, {a3a03a387918c925, b3b746becb988153, fffff8000010b070, 2}
The bugcheck analysis looked like this:
CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or data have been corrupted. There are generally three causes for a corruption:
Arguments:
Arg1: a3a03a387918c925, Reserved
Arg2: b3b746becb988153, Reserved
Arg3: fffff8000010b070, Failure type dependent information
Arg4: 0000000000000002, Type of corrupted region, can be:
At least one of the reported MSRs always held an address in kernel-mode space (the start address of the corresponding exception handler), so I started to experiment with local kernel debugging (only local, as I don't have two PCs close enough to each other to connect them for remote debugging). Fortunately, I discovered this fragment of kernel code:
fffff80001041628  mov rax, dr6
fffff8000104162b  mov rdx, dr7
fffff8000104162e  mov [rcx+0x40], rax        ; save DR6
fffff80001041632  mov [rcx+0x48], rdx        ; save DR7
fffff80001041636  xor eax, eax
fffff80001041638  mov dr7, rax               ; zero DR7
fffff8000104163b  cmp byte ptr gs:[000007bd], 0x1
fffff80001041644  jnz fffff800010416b0
fffff80001041646  test dx, 0x300             ; test DR7.LE, DR7.GE
fffff8000104164b  jz fffff800010416b0        ; skip saving the branch registers if neither DR7 bit above is set
fffff8000104164d  mov r8, rcx                ; keep the data pointer in r8, because ecx is needed for the MSR index
fffff80001041650  mov ecx, 0x1db             ; LastBranchFromIP
fffff80001041655  rdmsr
fffff80001041657  mov [r8+0x88], eax
fffff8000104165e  mov [r8+0x8c], edx
fffff80001041665  mov ecx, 0x1dc             ; LastBranchToIP
fffff8000104166a  rdmsr
fffff8000104166c  mov [r8+0x80], eax
fffff80001041673  mov [r8+0x84], edx
fffff8000104167a  mov ecx, 0x1dd             ; LastExceptionFromIP
fffff8000104167f  rdmsr
fffff80001041681  mov [r8+0x98], eax
fffff80001041688  mov [r8+0x9c], edx
fffff8000104168f  mov ecx, 0x1de             ; LastExceptionToIP
fffff80001041694  rdmsr
fffff80001041696  mov [r8+0x90], eax
fffff8000104169d  mov [r8+0x94], edx
fffff800010416a4  mov ecx, 0x1d9             ; DebugCtlMSR
fffff800010416a9  rdmsr
fffff800010416ab  and eax, 0xfffffffc        ; disable DebugCtlMSR.LBR, DebugCtlMSR.BTF
fffff800010416ae  wrmsr
fffff800010416b0  ret
This code fragment gave me hope that the OS saves the MSRs somewhere and later transfers them into the thread's CONTEXT structure.
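The save fragment shown above can be modelled in C to make its control flow easier to follow. This is a sketch under stated assumptions, not Windows code: the struct and function names are hypothetical, the real MSRs are replaced by a simulated `msr_bank`, the save area keeps only the fields whose offsets appear in the listing, and because the meaning of the `gs:[7bd]` byte is unknown it is modelled as a plain `feature_flag` parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DR7_LE       (1ull << 8)   /* DR7 bit 8 */
#define DR7_GE       (1ull << 9)   /* DR7 bit 9 */
#define DEBUGCTL_LBR (1ull << 0)   /* DebugCtlMSR bit 0 */
#define DEBUGCTL_BTF (1ull << 1)   /* DebugCtlMSR bit 1 */

/* Hypothetical stand-in for the real MSRs 0x1d9 and 0x1db..0x1de. */
typedef struct {
    uint64_t debugctl;        /* 0x1d9 */
    uint64_t branch_from;     /* 0x1db */
    uint64_t branch_to;       /* 0x1dc */
    uint64_t exception_from;  /* 0x1dd */
    uint64_t exception_to;    /* 0x1de */
} msr_bank;

/* Hypothetical save area; the comments give the offsets used in the listing. */
typedef struct {
    uint64_t dr6;                  /* +0x40 */
    uint64_t dr7;                  /* +0x48 */
    uint64_t last_branch_to;       /* +0x80 */
    uint64_t last_branch_from;     /* +0x88 */
    uint64_t last_exception_to;    /* +0x90 */
    uint64_t last_exception_from;  /* +0x98 */
} save_area;

/* Models the save fragment. Returns 1 if the branch MSRs were saved.
   (The hardware effect "zero DR7" has no counterpart in this model.) */
int save_debug_state(save_area *s, msr_bank *m, uint64_t dr6, uint64_t dr7,
                     int feature_flag)
{
    assert(s != NULL && m != NULL);
    s->dr6 = dr6;
    s->dr7 = dr7;
    if (feature_flag != 1)                  /* cmp byte ptr gs:[7bd], 1 / jnz */
        return 0;
    if ((dr7 & (DR7_LE | DR7_GE)) == 0)     /* test dx, 0x300 / jz */
        return 0;
    s->last_branch_from    = m->branch_from;
    s->last_branch_to      = m->branch_to;
    s->last_exception_from = m->exception_from;
    s->last_exception_to   = m->exception_to;
    /* and eax, 0xfffffffc / wrmsr: clear LBR and BTF, keep all other bits */
    m->debugctl &= ~(DEBUGCTL_LBR | DEBUGCTL_BTF);
    return 1;
}
```

The model makes one detail visible: DebugCtlMSR is rewritten only on the path where the branch MSRs are saved; on every other path the `jz`/`jnz` jumps straight to `ret` and the MSR is left untouched.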
I was also fighting another problem: the debug exception (int 1). This exception clears DebugCtlMSR.LBR and DebugCtlMSR.BTF (just as the CPU clears RFLAGS.TF and DR7.GD when switching from the thread that generated the debug exception to the debug exception handler). So my driver re-enabled one or both bits in DebugCtlMSR (depending on the request) at the end of the new, hooked int 1 routine. The disadvantage was that the bits were enabled for whatever (and thus unknown) thread the CPU executed next. Multi-CPU systems were another problem, as I had to set both bits on every CPU in the system.
In conclusion, I had a fairly well-working proof of concept that wasn't perfect, but usually worked (correctly in more than 99% of cases). To keep it safe, I had to reboot the OS, hook the exceptions and debug only until 5 minutes had passed (a safe interval to avoid a reboot by Patchguard). Very rarely (in less than 1% of cases), all four branch MSRs were overwritten with useless addresses when the OS loaded a page from swap between saving the registers into the driver's buffer (in the exception handler) and transferring them from the driver to the debugger.
Fortunately, I then discovered this code fragment in the kernel:
fffff80001041581  mov rdx, [rcx+0x48]        ; get value to be written into DR7
fffff80001041585  xor eax, eax
fffff80001041587  mov dr6, rax
fffff8000104158a  mov dr7, rdx
fffff8000104158d  cmp byte ptr gs:[000007bd], 0x1
fffff80001041596  jnz fffff800010415c2
fffff80001041598  test dx, 0x200             ; test DR7.GE (bit 9)
fffff8000104159d  jz fffff800010415a2
fffff8000104159f  or eax, 0x2                ; bit 1 of eax = DR7.GE
fffff800010415a2  test dx, 0x100             ; test DR7.LE (bit 8)
fffff800010415a7  jz fffff800010415ac
fffff800010415a9  or eax, 0x1                ; bit 0 of eax = DR7.LE
fffff800010415ac  test eax, eax
fffff800010415ae  jz fffff800010415c2
fffff800010415b0  mov r8d, eax               ; save eax into r8d
fffff800010415b3  mov ecx, 0x1d9             ; DebugCtlMSR
fffff800010415b8  rdmsr
fffff800010415ba  and eax, 0xfffffffc        ; mask off DebugCtlMSR.LBR, DebugCtlMSR.BTF (bits 0 and 1)
fffff800010415bd  or eax, r8d                ; set DebugCtlMSR.LBR, DebugCtlMSR.BTF according to DR7.LE, DR7.GE
fffff800010415c0  wrmsr                      ; write the value back to DebugCtlMSR
fffff800010415c2  ret
What does the code do? It loads the debug registers (evidently for the thread that is about to run, just before switching to it). Then it checks two bits in DR7 and sets the corresponding two bits in DebugCtlMSR: DR7.LE controls DebugCtlMSR.LBR and DR7.GE controls DebugCtlMSR.BTF. Strange at first sight. But I soon realized how clever the programmer who implemented this revolutionary idea was! The programmer surely had thoughts like these:
Bits DebugCtlMSR.LBR, DebugCtlMSR.BTF, DR7.GD and RFLAGS.TF are cleared when entering the debug exception handler (int 1). RFLAGS.TF can be restored easily, because the image of RFLAGS from just before entering the debug exception handler is pushed on the stack when the exception is generated. Restoring DR7.GD isn't a problem either, because its setting before the debug exception is known: if it was set, the debug exception was generated as a general detect (an access to a debug register), and on entry to the debug exception handler the DR6.BD bit is set to reflect the DR7.GD setting before the exception.
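The two recovery rules in this reasoning can be written down directly. In the hedged sketch below (the function names are hypothetical), the pushed RFLAGS image tells whether TF was set, and DR6.BD (bit 13) tells whether DR7.GD (also bit 13) has to be put back. Nothing comparable is left behind for DebugCtlMSR.LBR and DebugCtlMSR.BTF.

```c
#include <assert.h>
#include <stdint.h>

#define RFLAGS_TF (1ull << 8)    /* trap flag */
#define DR6_BD    (1ull << 13)   /* debug-register access detected */
#define DR7_GD    (1ull << 13)   /* general detect enable */

/* TF: read it back from the RFLAGS image pushed on the handler's stack. */
int tf_was_set(uint64_t pushed_rflags)
{
    return (pushed_rflags & RFLAGS_TF) != 0;
}

/* GD: the CPU cleared it on entry, but DR6.BD records that a general-detect
   exception happened, so GD must have been set before the exception. */
uint64_t dr7_with_gd_restored(uint64_t dr7_in_handler, uint64_t dr6)
{
    return (dr6 & DR6_BD) ? (dr7_in_handler | DR7_GD) : dr7_in_handler;
}
```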
Bits DR7.LE and DR7.GE (bits 8 and 9, Local/Global Exact Breakpoint Enable) are both ignored by implementations of the x64 architecture, as stated in the AMD64 Architecture Programmer's Manual Volume 2: System Programming (section 13.1.1, Debug-Control Register DR7), because all breakpoint conditions, except certain string operations preceded by a repeat prefix, are exact. These bits aren't cleared when entering the debug exception handler.
Bits DebugCtlMSR.LBR and DebugCtlMSR.BTF are destroyed when entering the debug exception handler. Bits DR7.LE and DR7.GE aren't. DR7.LE and DR7.GE no longer do anything meaningful: they were used years ago by older CPUs, but they are still implemented, so they are free now. So the programmer's revolutionary idea was certainly: let's use DR7.LE and DR7.GE as shadows (aliases) for DebugCtlMSR.LBR and DebugCtlMSR.BTF. And so the programmer did.
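The aliasing implemented by the restore fragment boils down to a small pure function. The sketch below is a hedged C model of that fragment (the function name is hypothetical, and the `feature_flag` parameter stands in for the `gs:[7bd]` byte, whose meaning is unknown): DR7.LE (bit 8) selects DebugCtlMSR.LBR (bit 0), DR7.GE (bit 9) selects DebugCtlMSR.BTF (bit 1), and DebugCtlMSR is written only if at least one of the two DR7 bits is set.

```c
#include <assert.h>
#include <stdint.h>

#define DR7_LE       (1ull << 8)
#define DR7_GE       (1ull << 9)
#define DEBUGCTL_LBR (1ull << 0)
#define DEBUGCTL_BTF (1ull << 1)

/* Returns the DebugCtlMSR value after the restore fragment has run, given its
   value before (debugctl) and the DR7 value loaded for the incoming thread. */
uint64_t debugctl_after_restore(uint64_t debugctl, uint64_t dr7, int feature_flag)
{
    uint64_t bits = 0;

    if (feature_flag != 1)          /* cmp byte ptr gs:[7bd], 1 / jnz */
        return debugctl;
    if (dr7 & DR7_GE)               /* test dx, 0x200 -> or eax, 2 */
        bits |= DEBUGCTL_BTF;
    if (dr7 & DR7_LE)               /* test dx, 0x100 -> or eax, 1 */
        bits |= DEBUGCTL_LBR;
    if (bits == 0)                  /* test eax, eax / jz: no wrmsr at all */
        return debugctl;
    /* and eax, 0xfffffffc / or eax, r8d / wrmsr */
    return (debugctl & ~(DEBUGCTL_LBR | DEBUGCTL_BTF)) | bits;
}
```

The `bits == 0` case leaves DebugCtlMSR as it is; presumably the LBR and BTF bits are already clear at that point, because the save fragment strips them when a thread that had the alias bits set is switched out.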
Another benefit of this 'hack': as debug registers are thread-specific (every thread has its own debug registers, which are reloaded when switching to that thread), we can enable branch recording / single-stepping on branches only for specific thread(s), so other threads don't interfere, and it doesn't matter on which CPU the thread runs in multi-CPU systems. My old driver set DebugCtlMSR for all threads in the system on all CPUs, which is not very desirable. Enabling it only for the thread(s) being debugged is the best choice.
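From the debugger's side, this means branch recording can be switched on per thread through the ordinary debug-register context. The sketch below assumes, as described above, that x64 Windows maps DR7.LE to DebugCtlMSR.LBR and DR7.GE to DebugCtlMSR.BTF, and that the recorded addresses come back in the CONTEXT fields when debug registers are requested; the helper names are hypothetical. `GetThreadContext`, `SetThreadContext` and the `CONTEXT` fields are the real Win32 ones, and the Windows part is compiled only for x64 Windows, so the portable DR7 helper can be tried anywhere. Because DebugCtlMSR.BTF only changes how the trap flag behaves, stepping on branches also sets RFLAGS.TF (the `EFlags` field).

```c
#include <assert.h>
#include <stdint.h>
#if defined(_WIN64) && (defined(_M_X64) || defined(__x86_64__))
#include <windows.h>
#define HAVE_X64_WINDOWS 1
#endif

#define DR7_LE    (1ull << 8)   /* alias for DebugCtlMSR.LBR (record branches) */
#define DR7_GE    (1ull << 9)   /* alias for DebugCtlMSR.BTF (step on branches) */
#define RFLAGS_TF 0x100u        /* trap flag, needed for BTF to have an effect */

/* Returns DR7 with the two alias bits set as requested; other bits are kept. */
uint64_t dr7_with_branch_bits(uint64_t dr7, int record_branches, int step_on_branches)
{
    dr7 &= ~(DR7_LE | DR7_GE);
    if (record_branches)
        dr7 |= DR7_LE;
    if (step_on_branches)
        dr7 |= DR7_GE;
    return dr7;
}

#ifdef HAVE_X64_WINDOWS
/* Call while the target thread is stopped (e.g. while handling a debug event). */
BOOL enable_branch_tracing(HANDLE thread, int record, int step)
{
    CONTEXT ctx;
    ZeroMemory(&ctx, sizeof ctx);
    ctx.ContextFlags = CONTEXT_CONTROL | CONTEXT_DEBUG_REGISTERS;
    if (!GetThreadContext(thread, &ctx))
        return FALSE;
    ctx.Dr7 = dr7_with_branch_bits(ctx.Dr7, record, step);
    if (step)
        ctx.EFlags |= RFLAGS_TF;
    return SetThreadContext(thread, &ctx);
}

/* After the next debug event: read the thread's last recorded branch. */
BOOL read_last_branch(HANDLE thread, DWORD64 *from, DWORD64 *to)
{
    CONTEXT ctx;
    ZeroMemory(&ctx, sizeof ctx);
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    if (!GetThreadContext(thread, &ctx))
        return FALSE;
    *from = ctx.LastBranchFromRip;
    *to = ctx.LastBranchToRip;
    return TRUE;
}
#endif
```

Only the debugged thread's DR7 is touched, so, unlike a driver that sets DebugCtlMSR globally, other threads and other CPUs are unaffected.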
I really don't know why Microsoft didn't make this information publicly available; it can't be abused by malware. It is an open question whether 32-bit Windows does the same as the x64 versions. I hope this clever trick won't disappear in newer versions of Windows. Currently, Windows XP x64, Windows 2003 x64 and Windows Vista x64 support it perfectly.
This hidden backdoor was just waiting to be discovered. So let's enjoy the new knowledge.
AMD64 Architecture Programmer's Manual Volume 2: System Programming
Patching Policy for x64-Based Systems
2007-10-31 | 1.0 | First public version | Feryno |
(dates are in ISO 8601 format)