Black Hat - 2017-08-31
A processor is not a trusted black box for running code; on the contrary, modern x86 chips are packed full of secret instructions and hardware bugs. In this talk, we'll demonstrate how page fault analysis and some creative processor fuzzing can be used to exhaustively search the x86 instruction set and uncover the secrets buried in your chipset. Full Abstract & Presentation Materials: https://www.blackhat.com/us-17/briefings.html#breaking-the-x86-instruction-set
I just keep thinking how poor Terry Davis has wasted all his time.
He needs to make his own silicon as well as his own compiler and OS.
RIP
May his soul find rest and find the peace he deserves.. RIP dear Terry! ☹
And all this in his own universe :)
@Vahag Bejanyan no not Universe.. the correct word would be Reality
He is happy with what he does; maybe he is not interested in business or in building his own empire.
"jk i'm malicious af"
wrr
Oh man, that page fault analysis is genius.
Guy Smith Interesting, it's the first time I'm hearing of it, though. I just thought it was such a creative way to test for instructions.
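For anyone who hasn't seen the trick before, here is a rough simulation of the idea as I understood it from the talk: put the candidate bytes at the very end of an executable page so that only the first few bytes can be fetched, then expose one more byte at a time and watch whether the fetch still faults. Everything below is a toy. `hidden_length` stands in for the real decoder, the two outcome markers are made up, and nothing is actually executed.

```python
# Toy simulation of measuring instruction length with page faults.
# Assumption: hidden_length() plays the role of the processor's decoder,
# which the searcher is not allowed to call directly.

PAGE_FAULT_ON_FETCH = "fetch-fault"   # hypothetical marker: fault while fetching
EXECUTED = "executed"                 # fully decoded (any other outcome)


def hidden_length(code: bytes) -> int:
    """Stand-in 'silicon': a tiny made-up ISA, not real x86 decoding."""
    lengths = {0x90: 1, 0xB8: 5, 0x0F: 3}
    return lengths.get(code[0], 2)


def run_at_page_edge(code: bytes, visible: int) -> str:
    """Pretend only the first `visible` bytes sit in the executable page.

    The following page is not executable, so a decoder that needs more
    bytes than `visible` faults during instruction fetch.
    """
    if hidden_length(code) > visible:
        return PAGE_FAULT_ON_FETCH
    return EXECUTED


def measure_length(code: bytes, max_len: int = 15) -> int:
    """Expose one more byte at a time until the fetch fault disappears."""
    for visible in range(1, max_len + 1):
        if run_at_page_edge(code, visible) != PAGE_FAULT_ON_FETCH:
            return visible
    raise ValueError("no length up to max_len decoded without a fetch fault")


if __name__ == "__main__":
    for first_byte in (0x90, 0xB8, 0x0F):
        candidate = bytes([first_byte]) + bytes(14)
        print(hex(first_byte), measure_length(candidate))
```

The point of the design is that the searcher never needs a disassembler: the length falls out of where the fault stops happening.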
I am jealous. Not only is this guy much more capable of mentally processing this complex information than I am, he's also incredibly good at presenting it!
thats why he is speaking!! dum dum dum dum duummmmmmmmmmmmmmmmmmmmmmmmm
no such thing as jealoux or capabx or good or not, cepu, do, say, think any nmw anda ny be perfect
He's a Genius.
They should crowdsource the undocumented instructions the same way that PassMark gets benchmarks. Everyone uploads their results and a website shows a nice breakdown of which instructions run on which processors.
I feel like running sandsifter already, just to be one of the first to upload to that site when it inevitably comes into existence.
+Josh Beach damn cool, i'd like to run this shit on the brazilian voting machines using cyrix..
wiipronhi but if they do stumble across a secret instruction there’s a good chance it will mess up the OS operation. And an emulator wouldn’t have the instructions programmed into it anyway.
https://github.com/xoreaxeaxeax/sandsifter
thanks for providing the link!
this guy obviously never pirates games, they literally have text files saying "this crack is not a virus, disable your antivirus"
I trust them blindly, hasn't failed me yet XD
also I'm unplugging my pc when i go to bed from now on, don't trust that bastard of a cpu plotting all kind of nefarious shit while I sleep XD
I once read a book (totally forgot its name), where a computer meant to do homework, gained intelligence to run without the power plug.
honestly that's the magic of P2P, if it's too good to be true it probably is and somebody has mentioned it in the comments.
over a decade of pirating and only got caught off guard when I was being careless downloading extremely niche software... and that was mostly just bloatware/adware, and in hindsight it was obviously fake, but someone has to be the first to write "fake [insert reason why]".
In Torrents we trust, Somalian pirates are we :)
you know he's really good at what he's doing when he speaks at 200 k/h
It takes one to know one.
Maybe a stupid question, but how does the cursor keep blinking if the CPU is locked up?
The graphics card does the blinking. The cursor blinking was literally done by special circuitry back in the days of text-only computers decades ago, when the CPU couldn't possibly spare cycles to blink a cursor. And since there's been a continuous evolution of compatible machines ever since, the graphics hardware has never been relieved of that responsibility. Of course, modern GPUs can handle that in their sleep. (If the bug had been demonstrated in a graphical UI, it probably would have frozen.)
no such thing as stupix qor not
That's actually a fantastic question, one I didn't even think to ask.
32:56 This man is claiming that it would be possible to write a malicious program that is completely benign and undetectable on any x86_64 computer that is not using a physical Intel CPU, but when executed on a computer with an Intel CPU it could arbitrarily execute whatever otherwise deliberately avoided lines of ASM the programmer desired, and also vice versa by inverting the principle. That's the killing blow of this video and, if his demonstration is true, means that nobody should be using x86 CPUs for cryptography of any meaningful level of security.
His claim applies to all processors. Intel just happens to be the most popular, thus relevant one.
actually if you click on the timestamp at the beginning of my comment he is literally saying that a difference between intel and amd (+via) is what causes the vulnerability to exist. it could be said that intel is at fault because their noncompliance with their own documentation is what causes the problem. however all cpu manufacturers are at fault for everything he finds with his method, as he goes on to reveal a pentium f00f-like CPU hardlock he purportedly discovered in an AMD CPU.
The behavior of 0x66 with jumps has been well documented for quite some time. This is just tools not being correct vis-à-vis the spec, which to be honest is pretty common for a complex spec. The fact that AMD processors behave differently compounds the problem, however. QEMU is not an instruction set reference by a long shot; it is a best-effort project, and there are likely many other similar bugs to be found. When using VT-x to run, you would get the right results.
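To make the disagreement concrete, here is a small sketch of two decoder models for a `66 e9 ...` byte stream in 64-bit mode. One model ignores 0x66 on the near jump and reads a 32-bit displacement; the other honors it and reads a 16-bit displacement. As I understood the talk, Intel parts behave like the first model and AMD parts (and some tools) like the second, but treat that mapping as my reading rather than a reference. The function name and the `honor_66` flag are invented for illustration.

```python
# Two hypothetical decoder models for `66 e9 ...` in 64-bit mode.
import struct


def decode_jmp(code: bytes, honor_66: bool):
    """Return (length, displacement) for an e9 near jump with optional 0x66."""
    pos = 0
    has_66 = code[pos] == 0x66
    if has_66:
        pos += 1
    if code[pos] != 0xE9:
        raise ValueError("not a near jmp (e9)")
    pos += 1
    if has_66 and honor_66:
        (disp,) = struct.unpack_from("<h", code, pos)  # 16-bit signed displacement
        pos += 2
    else:
        (disp,) = struct.unpack_from("<i", code, pos)  # 32-bit signed displacement
        pos += 4
    return pos, disp


if __name__ == "__main__":
    stream = bytes.fromhex("66e910000000")
    print("prefix ignored:", decode_jmp(stream, honor_66=False))
    print("prefix honored:", decode_jmp(stream, honor_66=True))
```

With the prefix honored, the last two bytes of the stream are left over and would be decoded as the start of the next instruction, which is how the two views of the same bytes drift apart.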
I think the issue is that it’s the same language for slightly different architecture. You won’t be able to go to arm or ibm architecture with these specific issues. Now maybe different arm chips to each other, and ibm chips to each other
Oh boy. And now it's been revealed that Intel chips created in the past decade have a kernel memory leak "bug"/backdoor. Well, at least the Intel CEO sold as many shares as he could during Q4...
I’m betting the CPU manufacturers are already using knock instructions to enable hidden instructions, based on a checksum of a piece of resident code. So it would be impossible to find that way, unless you are insanely lucky.
a magic number is way more probable.
exactly how could anyone guess magic numbers in all registers plus an invalid instruction.
+mPky1 "guess" i don't know, but if they try to use this exploit you might rest assured it won't be in any kind of small attack, trying to break a hacker's computer giving access to the backdoor to the hacker doesn't seem plausible... rev.eng the software would make a hell of a backdoor...
@Felipe Siqueira Magic number in a protected register is very probable.
@=NolePtr of course it is, because they can deny that the magic number is an exploit
imagine making a compiler that uses alternate undocumented instructions wherever it can.
Undocumented means unsupported. You can assume those don't do shit on other processors though they're both X86.
It would be worse than the current situation, which isn't particularly stellar anyways.
CapnTates That is what I had in mind. Going a step beyond platform dependent: unique machine dependent code. Obviously it's not a great idea, but it's interesting nonetheless.
@TacticalMelonFarmer Maybe we could write unique machine dependent assemblers for that unique machine dependent code
"You can fool all the people some of the time, and some of the people all the time, but you cannot fool all the people all the time."
(Abraham Lincoln)
"Risen"
That's not a pun, that's kid's toilet humour.
How is it not a pun if pi -> pie is?
ikr he must hate amd
McDucky
All pi and pee have in common are the first letter. It's not pronounced the same, nor written the same. Not a pun.
CapnTates
We were talking about how it's pronounced that way in almost every language (including Greek itself of course) except for English. So yes, it is a pun.
Neat talk!
Oh hey! It's Tom! Love your learnfun and playfun videos and programs, it's really fun editing the code and seeing if I can optimize the behavior, and it's also just plain hilarious to run them on hacked roms, especially speedhacked ones. Have you considered doing more work on them?
Those manufacturers are coordinating their hidden instructions, huh... Maybe they are organized by some government agency...
+judgeomega Actually, on most Intel boards, there is: Intel Active Management Technology. Judge for yourself how secret it is...
I'll bet you some of these aren't government, but Hollywood DRM organized. Because if you want to watch their movies - they want to own your hardware.
@Rauer Hutger Its a reference to the original one I think, which is about 18 years old now.
*coisraelough* *comossadugh* sorry I'm allergic to snakes.
It's good to see Black Hat uploading videos unlike the other popular tech conference.
Mp57navy it's the name of the conference...
CCC uploads all of their videos too, as does Defcon, and FOSDEM; HOPE only keeps official audio recordings; which conference are you referring to?
The secret one?
Torpcoms Maybe he meant That-Conference-That-Do-Not-Upload-The-Video-Immediately
Ok thanks for this talk. Just confirmed an old acquaintance must work for an intelligence agency. He told me this in 98 but was drunk and denied saying it after.. Lol..
That 66 byte malfunction is really interesting. I know this prefix is used in 16 bit mode when you want to use 32 bit registers. Now you can totally hide your code from disassemblers on x64 CPUs. And it will work differently in a VM. Perfect for writing malicious code.
It is interesting, though now anti-viruses and malware detection tools can make extended tests when they see 66 e9 opcodes..
They should. I think the reason nobody noticed it before is that compilers do not use this prefix on jumps. I think it's just a leftover feature from 16 bit real mode and they just reversed what it does. In real mode everything you do is 16 bit by default, obviously, but with the 66 prefix you can do 32 bit operations. In long mode (64 bit) this prefix seems to do the opposite: it will make the operation 16 bit. But if you want to move a 16 bit value to a 32/64 bit register you can just use the 16 bit counterpart, instead of (66) mov eax, 0xFFFF do mov ax, 0xFFFF, and I'm sure that's what compilers do.
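A quick sketch of the encoding may help here. For `mov reg, imm` (opcode B8 plus the register number), the 0x66 byte is emitted whenever the requested operand size differs from the default of the current mode, so `mov ax, 0xFFFF` in 32 or 64 bit code is itself the 66-prefixed form. The helper name and parameters below are my own, and the sketch only covers 16 and 32 bit immediates.

```python
# Sketch: when does `mov reg, imm` (opcode B8+r) carry the 0x66 byte?
# Assumption: only 16- and 32-bit operand sizes, registers numbered 0..7.

def encode_mov_imm(reg: int, imm: int, operand_bits: int, default_bits: int) -> bytes:
    """Encode `mov reg, imm` for a code segment with the given default size."""
    if operand_bits not in (16, 32) or default_bits not in (16, 32):
        raise ValueError("only 16- and 32-bit sizes are modelled")
    if not 0 <= reg <= 7:
        raise ValueError("register number must be 0..7")
    # The 0x66 byte flips the default operand size for this one instruction.
    prefix = b"\x66" if operand_bits != default_bits else b""
    opcode = bytes([0xB8 + reg])
    return prefix + opcode + imm.to_bytes(operand_bits // 8, "little")


if __name__ == "__main__":
    print("mov eax, 0xFFFF (32-bit default):", encode_mov_imm(0, 0xFFFF, 32, 32).hex())
    print("mov ax,  0xFFFF (32-bit default):", encode_mov_imm(0, 0xFFFF, 16, 32).hex())
    print("mov eax, 0xFFFF (16-bit default):", encode_mov_imm(0, 0xFFFF, 32, 16).hex())
```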
Intel:
Instruction prefixes can be used to override the default operand size and address size of a code segment. These prefixes can be used in real-address mode as well as in protected mode and virtual-8086 mode. An operand-size or address-size prefix only changes the size for the duration of the instruction.
The following two instruction prefixes allow mixing of 32-bit and 16-bit operations within one segment:
•The operand-size prefix (66H)
•The address-size prefix (67H)
These prefixes reverse the default size selected by the D flag in the code-segment descriptor. For example, the processor can interpret the (MOV mem, reg) instruction in any of four ways:
•In a 32-bit code segment:
—Moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
—If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
—If preceded by an address-size prefix, moves 32 bits from a 32-bit register to memory using a 16-bit effective address.
—If preceded by both an address-size prefix and an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 16-bit effective address.
•In a 16-bit code segment:
—Moves 16 bits from a 16-bit register to memory using a 16-bit effective address.
—If preceded by an operand-size prefix, moves 32 bits from a 32-bit register to memory using a 16-bit effective address.
—If preceded by an address-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
—If preceded by both an address-size prefix and an operand-size prefix, moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
The previous examples show that any instruction can generate any combination of operand size and address size regardless of whether the instruction is in a 16- or 32-bit segment. The choice of the 16- or 32-bit default for a code segment is normally based on the following criteria:
•Performance — Always use 32-bit code segments when possible. They run much faster than 16-bit code segments on P6 family processors, and somewhat faster on earlier IA-32 processors.
•The operating system the code segment will be running on — If the operating system is a 16-bit operating system, it may not support 32-bit program modules.
•Mode of operation — If the code segment is being designed to run in real-address mode, virtual-8086 mode, or SMM, it must be a 16-bit code segment.
•Backward compatibility to earlier IA-32 processors — If a code segment must be able to run on an Intel 8086 or Intel 286 processor, it must be a 16-bit code segment.
The D flag in a code-segment descriptor determines the default operand-size and address-size for the instructions of a code segment. (In real-address mode and virtual-8086 mode, which do not use segment descriptors, the default is 16 bits.) A code segment with its D flag set is a 32-bit segment; a code segment with its D flag clear is a 16-bit segment.
Executable code segment. The flag is called the D flag and it indicates the default length for effective addresses and operands referenced by instructions in the segment. If the flag is set, 32-bit addresses and 32-bit or 8-bit operands are assumed; if it is clear, 16-bit addresses and 16-bit or 8-bit operands are assumed.
The instruction prefix 66H can be used to select an operand size other than the default, and the prefix 67H can be used to select an address size other than the default.
The 32-bit operand prefix can be used in real-address mode programs to execute the 32-bit forms of instructions. This prefix also allows real-address mode programs to use the processor’s 32-bit general-purpose registers.
The 32-bit address prefix can be used in real-address mode programs, allowing 32-bit offsets.
The IA-32 processors beginning with the Intel386 processor can generate 32-bit offsets using an address override prefix; however, in real-address mode, the value of a 32-bit offset may not exceed FFFFH without causing an exception.
Assembler Usage:
If a code segment that is going to run in real-address mode is defined, it must be set to a USE 16 attribute. If a 32-bit operand is used in an instruction in this code segment (for example, MOV EAX, EBX), the assembler automatically generates an operand prefix for the instruction that forces the processor to execute a 32-bit operation, even though its default code-segment attribute is 16-bit.
The 32-bit operand prefix allows a real-address mode program to use the 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI).
When moving data in 32-bit mode between a segment register and a 32-bit general-purpose
register, the Pentium Pro processor does not require the use of a 16-bit operand size prefix;
however, some assemblers do require this prefix. The processor assumes that the 16 least-significant
bits of the general-purpose register are the destination or source operand. When moving a
value from a segment selector to a 32-bit register, the processor fills the two high-order bytes of
the register with zeros.
--------------------------------------------------
AMD:
3.3.2. 32-Bit vs. 16-Bit Address and Operand Sizes
The processor can be configured for 32-bit or 16-bit address and operand sizes. With 32-bit
address and operand sizes, the maximum linear address or segment offset is FFFFFFFFH
(2^32-1), and operand sizes are typically 8 bits or 32 bits. With 16-bit address and operand sizes,
the maximum linear address or segment offset is FFFFH (2^16-1), and operand sizes are typically
8 bits or 16 bits.
When using 32-bit addressing, a logical address (or far pointer) consists of a 16-bit segment
selector and a 32-bit offset; when using 16-bit addressing, it consists of a 16-bit segment selector
and a 16-bit offset.
Instruction prefixes allow temporary overrides of the default address and/or operand sizes from
within a program.
When operating in protected mode, the segment descriptor for the currently executing code
segment defines the default address and operand size. A segment descriptor is a system data
structure not normally visible to application code. Assembler directives allow the default
addressing and operand size to be chosen for a program. The assembler and other tools then set
up the segment descriptor for the code segment appropriately.
When operating in real-address mode, the default addressing and operand size is 16 bits. An
address-size override can be used in real-address mode to enable 32-bit addressing; however, the
maximum allowable 32-bit linear address is still 000FFFFFH (2^20-1).
3.6. OPERAND-SIZE AND ADDRESS-SIZE ATTRIBUTES
When the processor is executing in protected mode, every code segment has a default operand-size
attribute and address-size attribute. These attributes are selected with the D (default size)
flag in the segment descriptor for the code segment (see Chapter 3, Protected-Mode Memory
Management, in the Intel Architecture Software Developer’s Manual, Volume 3). When the D
flag is set, the 32-bit operand-size and address-size attributes are selected; when the flag is clear,
the 16-bit size attributes are selected. When the processor is executing in real-address mode,
virtual-8086 mode, or SMM, the default operand-size and address-size attributes are always 16
bits.
The operand-size attribute selects the sizes of operands that instructions operate on. When the
16-bit operand-size attribute is in force, operands can generally be either 8 bits or 16 bits, and
when the 32-bit operand-size attribute is in force, operands can generally be 8 bits or 32 bits.
The address-size attribute selects the sizes of addresses used to address memory: 16 bits or 32
bits. When the 16-bit address-size attribute is in force, segment offsets and displacements are 16
bits. This restriction limits the size of a segment that can be addressed to 64 KBytes. When the
32-bit address-size attribute is in force, segment offsets and displacements are 32 bits, allowing
segments of up to 4 GBytes to be addressed.
The default operand-size attribute and/or address-size attribute can be overridden for a particular
instruction by adding an operand-size and/or address-size prefix to an instruction (see
“Instruction Prefixes” in Chapter 2 of the Intel Architecture Software Developer’s Manual,
Volume 3). The effect of this prefix applies only to the instruction it is attached to.
Table 3-1 shows effective operand size and address size (when executing in protected mode)
depending on the settings of the D flag and the operand-size and address-size prefixes.
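Table 3-1 itself did not make it into the paste. As a rough stand-in, the rule quoted earlier (each prefix reverses the default size selected by the D flag) can be sketched like this. The function name is my own and not taken from either manual, and the sketch covers protected mode with 16/32-bit segments only.

```python
# Sketch of the rule behind the missing table: each prefix flips the default
# size chosen by the D flag. Covers 16/32-bit protected-mode segments only.
from itertools import product


def effective_sizes(d_flag: bool, operand_prefix: bool, address_prefix: bool):
    """Return (operand_size, address_size) in bits."""
    default = 32 if d_flag else 16
    other = 16 if d_flag else 32
    operand = other if operand_prefix else default   # 66H present?
    address = other if address_prefix else default   # 67H present?
    return operand, address


if __name__ == "__main__":
    print("D  66H  67H  operand  address")
    for d_flag, op_pfx, addr_pfx in product((False, True), repeat=3):
        operand, address = effective_sizes(d_flag, op_pfx, addr_pfx)
        print(f"{int(d_flag)}  {'Y' if op_pfx else 'N'}    "
              f"{'Y' if addr_pfx else 'N'}    {operand:<7}  {address}")
```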
+BOOZE & METAL there's nothing wrong with a good payload...
+martin I doubt they do. Or, more interestingly, could you detect this code in order to invade other machines with the CIA's (or whoever's) code, like they did with the Spectre and Meltdown payloads?
Goddamn, did Intel send assassins after this guy or what?
Intel ME, AMD PSP...
Rutkowska*
Art Vandelay Intel ME can be disabled by setting one bit to 1 that is documented as "reserved"
That happens with older ME versions as well, anything past Penryn has a 30 minute reset timer. https://mail.coreboot.org/pipermail/coreboot/2016-September/082021.html
Yes; also nothing except ME11 is even using x86. AMD's ASP/PSP is an ARM core, and ME versions 10 and below were ARC cores.
I don't know about auditable fabrication, but OpenPOWER machines can be pretty close to open hardware. Or if you are just wanting a micro-controller, RISC-V sounds promising.
Has the hardware bug been disclosed yet?
Not to me, at least.
Bump.
Dunno yet, but today some people hinted at a whole new family of Spectre-class vulnerability.
@Y H This aged quite well haha
Most of those undefined opcodes could be what LAX was on 6502: opcodes without explicit microcode.
With simple processors like the 6502, you could enable parts of more than one instruction because the opcodes weren't discretely decoded. On a modern CPU, a working instruction is most likely intentional.
bryede the thing is: we don't know the exact internals of x86 like we do with 6502 because it's closed hardware and it's constantly changing with each new microarchitecture iteration.
Right, but the x86 is fully microcoded and it can even be updated in the field. This means instructions point to specific internal sequences to be executed. When you find a sequence that works without throwing an exception, it probably means there's something in the table. It doesn't mean it's top secret functionality, but it might be. On the 6502, the undocumented opcodes came from enabling more than one instruction because the sequencing ROM was optimized for size and is built out of fragments that can overlap with illegal bit combinations.
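A tiny illustration of the 6502 side of this, using the commonly listed opcodes for LDA, LDX and the undocumented LAX in zero-page and absolute addressing. The `execute` function is only a toy model of "two instructions enabled at once" (bit 0 loads A, bit 1 loads X within this opcode family); it is not how the real sequencing ROM is organised.

```python
# Toy model: LAX as "LDA and LDX enabled at the same time".
# Assumption: the bit-0 / bit-1 rule below is a simplification for this
# opcode family only, not a description of the real decode logic.

LDA = {"zp": 0xA5, "abs": 0xAD}
LDX = {"zp": 0xA6, "abs": 0xAE}
LAX = {"zp": 0xA7, "abs": 0xAF}   # undocumented

KNOWN = set(LDA.values()) | set(LDX.values()) | set(LAX.values())


def execute(opcode: int, operand: int, regs: dict) -> dict:
    """Load `operand` into A and/or X depending on the opcode's low bits."""
    if opcode not in KNOWN:
        raise ValueError("opcode outside the modelled family")
    out = dict(regs)
    if opcode & 0x01:
        out["A"] = operand
    if opcode & 0x02:
        out["X"] = operand
    return out


if __name__ == "__main__":
    for mode in ("zp", "abs"):
        print(mode, hex(LDA[mode] | LDX[mode]), "==", hex(LAX[mode]))
    print(execute(LAX["zp"], 0x42, {"A": 0, "X": 0}))
```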
bryede several millions of undocumented instructions would require a fairly big microcode ROM, so we can speculate that there's overlap between some of the documented and some of the undocumented.
Right. Because of how the instructions are prefixed and expanded in a byte-wise fashion, you don't need a table supporting all bit combinations, but rather you move from one 256-entry table to another. I'm just saying all unused byte entries are supposed to throw an exception. Just modifying instructions with prefix bytes won't.
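A minimal sketch of that table-walking idea, assuming a toy decoder that ignores operands entirely: prefix bytes are consumed first, then each opcode byte indexes a 256-entry table whose entry is either an instruction, a pointer to another table, or empty (which raises). The table contents here are a handful of well-known opcodes chosen for illustration, and the class and function names are mine.

```python
# Toy byte-wise decoder: chained 256-entry tables, unused entries raise.
# Assumption: operands (ModRM, immediates) are ignored to keep the sketch small.

class UndefinedOpcode(Exception):
    """Raised when a table entry is unused."""


PREFIXES = {0x66, 0x67, 0xF2, 0xF3}

TWO_BYTE = [None] * 256
TWO_BYTE[0xA2] = ("insn", "cpuid")
TWO_BYTE[0x05] = ("insn", "syscall")

ROOT = [None] * 256
ROOT[0x90] = ("insn", "nop")
ROOT[0xC3] = ("insn", "ret")
ROOT[0x0F] = ("table", TWO_BYTE)   # escape byte: continue in the next table


def decode(code: bytes):
    """Return (name, length, prefixes) or raise UndefinedOpcode."""
    pos = 0
    prefixes = []
    while pos < len(code) and code[pos] in PREFIXES:
        prefixes.append(code[pos])   # a prefix alone never faults here
        pos += 1
    table = ROOT
    while True:
        if pos >= len(code):
            raise ValueError("truncated byte stream")
        entry = table[code[pos]]
        pos += 1
        if entry is None:
            raise UndefinedOpcode(code[:pos].hex())
        kind, value = entry
        if kind == "insn":
            return value, pos, prefixes
        table = value                # move from one 256-entry table to another


if __name__ == "__main__":
    for sample in ("90", "0fa2", "6690", "0f04"):
        try:
            print(sample, decode(bytes.fromhex(sample)))
        except UndefinedOpcode as exc:
            print(sample, "undefined opcode:", exc)
```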
This is why open source is so important. I would love to see a viable open CPU alternative emerge, on the scale of Linux in the software world. It's not impossible, but it would be a much different and more challenging problem to solve.
just discovered RISC-V recently myself also
one teeny weeny problem with this.... how are you going to be sure the open spec is what is on silicon..... slightly modified chips could be swapped for companies that the govt wants to target
faacts
meh757 when gov has this kind of concern, gov uses legal shell companies. You could even become a “free for all” RISC V shell. As long as you weren’t getting compromised. Easier if you’re in a neutral non extraditing country. Easier if you have tamper evident traceable packaging (ie, this REALLY IS the package from chips ‘r’ us in Switzerland, and, it has not been opened)
Open-source for security only works when people can verify that the object code (the silicon in the case of a processor) matches the source code. It's theater to show some source code and claim that the build matches it. Suuuure it does.
What an amazing talk, and an amazing guy.
lmao i wish i blindly trusted my hardware. i've had 2 HaCFs while playing games on my current pc so far
Thanks for the talk! You used such fascinating methodology to carry out this project. Definitely learned a lot, not just about hidden instructions but also software searching techniques.
15:40 How is that true? I thought we don't rely on what the documentation says, and it sounds like a good way to hide backdoor/hidden instructions would be to have them always throw exceptions.
Well how am I gonna trust that my Hardware is failing if I can't even trust this video to be telling me the truth?
This guy is a genius
Oh my. That's why I love my RISC CPU. Those x86 CPUs are so complex!
I have good news for you: x86 has been a RISC CPU for decades. The bad news: it's running a proprietary real-time OS known as "microcode" that emulates the CISC ISA and does all manner of shenanigans behind your back.
Waiting and waiting and waiting for some company to push out some RISC-V ISA based CPU for consumers.
If you completely disable the microcode, could you run code on "bare metal"?
If you disable the microcode (if you manage to do that) you could not do anything with your processor anymore.
The initial microcode is hardcoded into the CPU, but via software you can update it (on every boot)... So I guess it's really impossible as you said.
Epic.
Great talk, nice ideas!
38:00 Translation: "I have a Transmeta processor. Neenerz."
it's kind of scary how good some of these guys are at figuring these things out
1:00
Hey mister, are you a nazi, or something?
good stuff
Everytime he says "instruction" take a shot
That's so meta
i am so fucking drunk.
Challenge accepted!
20 min in and i concede defeat
8:26 either that instruction's length, or its exception behaviorrrrrr :(
Wow, this guy! <3
Wait, so you're telling me VMs simulate all that? I wonder if they replicate the bugs
You are my favorite! Love your stuff.
How the hell can the sifter be written in Python??
Update: nvm, the real work is done in C. I'm satisfied.
Wow, that is amazing! Just curious though, any updates on that Ring3 DoS instruction that locked up his CPU?
This guy is amazing. What an incredible work! Great presentation too! But please... breathe and drink some water dude! :P
I totally understood all of this:) :) :)
This is actually something amazing
Do we know now which processor had the lock bug?
Edsel Valle - 2017-09-05
This guy is impressively good at what he does.
Marienkarpfen - 2019-01-29
Christopher Domas btw.
Zes Jerome - 2019-06-08
no such thing as imprex or not
I can't get up! - 2020-01-02
@Reth Tard Been seeing you in CS vids
Reth Tard - 2020-01-03
@I can't get up!
I did watch a lot of Youtube videos in the recent years... lol
Just curious, what do you mean by CS?
I can only think of counter strike, but I'm pretty sure you didn't mean that.
Proxy - 2020-03-19
except he can't pronounce "Ryzen" correctly. it's "Rai-Zen" not whatever this is: 3:24