Bitcoin Transaction Malleability

clarkmoody · on July 20, 2017

> These txids are immaterial to how the Bitcoin blockchain works: their primary use is as a convenience for humans when referring to transactions.

This is incorrect. Each Bitcoin transaction input references a previous transaction output as the txid+output index. Transactions spending unconfirmed outputs are orphaned when the parent is malleated and confirmed.
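A minimal Python sketch of the dependency described above. It assumes the usual convention that a txid is the double SHA-256 of the serialized transaction, displayed byte-reversed in hex. The byte strings are hypothetical stand-ins for real binary serializations, and `OutPoint` is just an illustrative name for the txid+index pair an input carries:

```python
import hashlib
from typing import NamedTuple


def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def txid(raw_tx: bytes) -> str:
    """Txid as conventionally displayed: double SHA-256, byte-reversed, hex."""
    return sha256d(raw_tx)[::-1].hex()


class OutPoint(NamedTuple):
    """What a transaction input points at: a previous txid plus an output index."""
    txid: str
    vout: int


# Hypothetical stand-ins for serialized transactions (real ones are binary).
parent = b"inputs:[sig=3045...]|outputs:[50 BTC to Alice]"
malleated = b"inputs:[sig=3046...]|outputs:[50 BTC to Alice]"  # same outputs, re-encoded sig

# The child was signed while the parent was still unconfirmed.
child_spends = OutPoint(txid(parent), 0)

# The malleated copy is what gets mined, so only its outputs exist on chain.
confirmed_utxos = {OutPoint(txid(malleated), 0)}

print(child_spends in confirmed_utxos)  # False: the child spends an output that never appears
```

Nothing about the child is invalid in itself; the output it names simply never gets created.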

Also, as a data hash with no checksum, txids are not convenient for humans at all.

> Transaction malleability is already more or less fixed in Bitcoin

A couple months ago, there was a significant malleability attack on the Testnet, in which nearly every transaction was malleated as it was included in a block.

nullc · on July 20, 2017

It's also confused about the sources of malleability, listing the DSA sign and DER encoding (which you note it calls ASN.1) as the only sources; unfortunately there are a dozen of them... and as we came up with workarounds for some, we'd find more. This is why a complete fix was needed rather than a series of hacks.
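Reading "DSA sign" as the sign of the ECDSA `s` value: if (r, s) is a valid signature, so is (r, N - s), because negating s negates the point recovered during verification, which keeps its x-coordinate. Below is a self-contained Python sketch, assuming textbook ECDSA over secp256k1 with a toy nonce derivation (not RFC 6979 and not secure). The key and message are hypothetical; the last line shows the low-S normalization that BIP 62 lists as one mitigation:

```python
import hashlib

# secp256k1 domain parameters (public constants from SEC 2).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Affine point addition on y^2 = x^3 + 7; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def point_mul(k, pt):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result


def sign(priv, z):
    # Toy deterministic nonce for a reproducible demo; NOT RFC 6979, never reuse.
    seed = priv.to_bytes(32, "big") + z.to_bytes(32, "big")
    k = int.from_bytes(hashlib.sha256(seed).digest(), "big") % N  # k == 0 is negligible
    r = point_mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * priv) % N
    return r, s


def verify(pub, z, r, s):
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    pt = point_add(point_mul(z * w % N, G), point_mul(r * w % N, pub))
    return pt is not None and pt[0] % N == r


priv = 0xC0FFEE  # hypothetical demo key; never use a key like this
pub = point_mul(priv, G)
z = int.from_bytes(hashlib.sha256(b"hypothetical sighash preimage").digest(), "big")

r, s = sign(priv, z)
s_flipped = N - s  # computable by anyone who sees (r, s); no private key needed

print(verify(pub, z, r, s), verify(pub, z, r, s_flipped))  # True True
low_s = min(s, N - s)  # low-S normalization picks one canonical form
```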

abrkn · on July 20, 2017

For the uninformed, nullc knows a thing or two about Bitcoin[1]

[1] https://github.com/bitcoin/bitcoin/commits/master?author=gma...

f9beb4d9 · on July 20, 2017

> However, OpenSSL did not do strict validation of the ASN.1 data by default

The more interesting problem was that this was non-deterministic across platforms: you could encode fields with 64-bit integers and they would bomb out on 32-bit systems. ASN.1 is also mind-bogglingly complex: you can encode completely nonsensical things to arbitrary depths, like negative numbers, strings, and containers of multiple elements, and none of the implementations manage to decode blocks the same way or adhere to the same limits.

nullc · on July 20, 2017

We've identified ~some~ of OpenSSL's strange behaviors and documented them for the purpose of making a bug compatible implementation (https://github.com/bitcoin-core/secp256k1/tree/master/contri...), which required that:

- All numbers are parsed as nonnegative integers, even though X.690-0207 section 8.3.3 specifies that integers are always encoded as two's complement.

- Integers can have length 0, even though section 8.3.1 says they can't.

- Integers with overly long padding are accepted, violating section 8.3.2.

- 127-byte long length descriptors are accepted, even though section 8.1.3.5.c says that they are not.

- Trailing garbage data inside or after the signature is ignored.

- The length descriptor of the sequence is ignored.
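For contrast with the lax behaviours listed above, here is a Python port of the strict DER check from BIP 66, modeled on the `IsValidSignatureEncoding` function given in that BIP, which expects the trailing sighash byte. Treat it as a sketch: it hasn't been run against Bitcoin Core's test vectors, and the sample signatures are hand-built byte strings rather than real ones:

```python
def is_valid_signature_encoding(sig: bytes) -> bool:
    """Strict DER check. Format: 0x30 [len] 0x02 [R-len] [R] 0x02 [S-len] [S] [sighash]."""
    if not 9 <= len(sig) <= 73:
        return False
    if sig[0] != 0x30:                     # must be a SEQUENCE
        return False
    if sig[1] != len(sig) - 3:             # sequence length must be exact (short form only)
        return False
    len_r = sig[3]
    if 5 + len_r >= len(sig):              # S header must still be inside the signature
        return False
    len_s = sig[5 + len_r]
    if len_r + len_s + 7 != len(sig):      # no trailing garbage
        return False
    if sig[2] != 0x02 or len_r == 0:       # R is a non-empty INTEGER
        return False
    if sig[4] & 0x80:                      # R must not be negative
        return False
    if len_r > 1 and sig[4] == 0x00 and not sig[5] & 0x80:
        return False                       # no excess 0x00 padding on R
    if sig[len_r + 4] != 0x02 or len_s == 0:
        return False
    if sig[len_r + 6] & 0x80:              # S must not be negative
        return False
    if len_s > 1 and sig[len_r + 6] == 0x00 and not sig[len_r + 7] & 0x80:
        return False                       # no excess 0x00 padding on S
    return True


valid = bytes.fromhex("300602010102010101")       # R = 1, S = 1, sighash 0x01
padded_r = bytes.fromhex("30070202000102010101")  # R = 1 with a needless 0x00 pad
negative_r = bytes.fromhex("300602018102010101")  # R has its high bit set
trailing = bytes.fromhex("30060201010201010001")  # extra byte before the sighash

print([is_valid_signature_encoding(s) for s in (valid, padded_r, negative_r, trailing)])
# [True, False, False, False]
```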

But some things were just too awful to implement, e.g.

- Using overly long tag descriptors for the sequence or integers inside, violating section 8.1.2.2.

- Encoding primitive integers as constructed values, violating section 8.3.1.

This last one is especially fun: in OpenSSL you can create a constructed value (like a struct) of constructed values of constructed values of strings... and it will just concatenate all the bytes in the last-level primitive elements and treat the result as a number... but only if it's not more than 7 (IIRC) levels deep.

richardwhiuk · on July 20, 2017

What's the bar here for 'too awful to implement', out of curiosity?

kens · on July 20, 2017

There's a list of nine different types of malleability here: https://github.com/bitcoin/bips/blob/master/bip-0062.mediawi...

And if you want to see what a malleability attack looks like at the byte level, I analyzed one three years ago: http://www.righto.com/2014/02/bitcoin-transaction-malleabili...

Uptrenda · on July 20, 2017

This is a very complicated way to explain TX malleability. I'd say the problem is that signatures only sign a portion of the transaction, while the TXID used in the blockchain is computed by hashing the entire transaction.

So the signature can be mutated as the author suggests, but the signature doesn't cover the entire transaction anyway: it can't cover the section where data is provided to the redeemScript to satisfy its conditions. That section, called the scriptSig, includes the signature itself, which cannot sign itself.

So anyone is free to add whatever new data they like to the scriptSig, which gets added to the input stack: as long as you leave the stack the same way you found it, you can insert any arbitrary junk, and it will change the resulting TXID as seen in the blockchain without invalidating the transaction.
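A toy Python sketch of that stack-neutral junk, assuming a tiny subset of Script (direct pushes of 1-75 bytes, `OP_1` = 0x51, `OP_NOP` = 0x61, `OP_DROP` = 0x75). The signature and key bytes are hypothetical placeholders, and hashing just the scriptSig stands in for the txid, which covers the scriptSig along with everything else. BIP 62 lists non-push operations in scriptSig as one of its malleability sources:

```python
import hashlib

OP_1, OP_NOP, OP_DROP = 0x51, 0x61, 0x75


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def run(script: bytes) -> list:
    """Evaluate a tiny subset of Script: direct pushes of 1-75 bytes, OP_1, OP_NOP, OP_DROP."""
    stack, i = [], 0
    while i < len(script):
        op = script[i]
        i += 1
        if 1 <= op <= 75:          # the opcode itself is the number of bytes to push
            stack.append(script[i:i + op])
            i += op
        elif op == OP_1:
            stack.append(b"\x01")
        elif op == OP_NOP:
            pass
        elif op == OP_DROP:
            stack.pop()
        else:
            raise ValueError(f"opcode {op:#x} is outside this sketch's subset")
    return stack


def push(data: bytes) -> bytes:
    return bytes([len(data)]) + data


# Hypothetical placeholders for a 71-byte signature and a 33-byte public key.
honest = push(bytes(71)) + push(bytes(33))
padded = bytes([OP_1, OP_DROP, OP_NOP]) + honest  # junk anyone relaying it could prepend

print(run(honest) == run(padded))          # True: same stack handed to the output script
print(sha256d(honest) == sha256d(padded))  # False: different bytes, different hash
```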

This is a bad thing for "smart contracts" on Bitcoin, because many contracts depend on making chains of unconfirmed, future transactions that reference their parents by TXID, which is computed by hashing the entire transaction (as Clarkmoody suggests). An example of this is a cross-chain contract where you might want to send funds to a partially shared address between a stranger and yourself, and you need a way to set up a time-locked refund in case the protocol doesn't succeed (no longer necessary due to OP_CHECKLOCKTIMEVERIFY, but it's an example).

To do refunds in this way you would need to be able to sign chains of unconfirmed transactions without previous transaction IDs being changed from transaction malleability. Bitcoin does include a fix for this called "segregated witness" but the fix has been controversial. I don't keep up to date with the "scaling progress" now but I doubt it has been merged yet.
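Segregated witness (as specified in BIP 141) gets its fix by moving signature data into a separate witness structure that the txid does not hash; a second identifier, the wtxid, covers everything. A minimal Python sketch of the two hashes, assuming placeholder byte strings for the serialized fields (`SegwitTx` is a hypothetical name, not a real library type):

```python
import hashlib
from dataclasses import dataclass, replace


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


@dataclass(frozen=True)
class SegwitTx:
    """Hypothetical, simplified model of a segwit transaction's serialized parts."""
    version: bytes
    inputs: bytes    # outpoints, (empty) scriptSigs and sequences
    outputs: bytes
    witness: bytes   # signatures and other satisfying data live here
    locktime: bytes

    def txid(self) -> str:
        # Legacy serialization: marker, flag and witness are left out.
        return sha256d(self.version + self.inputs + self.outputs + self.locktime)[::-1].hex()

    def wtxid(self) -> str:
        # Full serialization: marker 0x00, flag 0x01 and the witness are included.
        raw = (self.version + b"\x00\x01" + self.inputs + self.outputs
               + self.witness + self.locktime)
        return sha256d(raw)[::-1].hex()


tx = SegwitTx(b"\x02\x00\x00\x00", b"<inputs>", b"<outputs>", b"<sig A>", bytes(4))
mutated = replace(tx, witness=b"<sig A, re-encoded>")

print(tx.txid() == mutated.txid(), tx.wtxid() == mutated.wtxid())  # True False
```

A child that references the parent's txid is unaffected by witness mutation, which is what the refund chains above need.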

uncoder0 · on July 20, 2017

Segregated Witness is almost locked in through BIP91.

https://www.xbt.eu/

rothbardrand · on July 20, 2017

I hope you are right. There is reason to believe that some of the signaling for BIP91 is false (e.g.: Bitmain has announced intentions to do a fork http://bitcoincash.org / http://bitcoinabc.org )

I expect they are serious and we will have a fork into BCC (Bitcoin Cash) and BTC (Bitcoin).

Certainly on the BTC chain segwit will be locked in. Whether it claims the mantle of "bitcoin" or not is to be seen.

fpgaminer · on July 20, 2017

Bitmain has stated that they will throw their hashrate at BitcoinABC _only_ if BIP91 fails. It's their "big red button".

dfox · on July 20, 2017

One thing that strikes me as weird is the reference to ASN.1. I always thought that Bitcoin only uses DER encoding for the signatures themselves (because that is what is usual for ECDSA, even though it is suboptimal for multiple reasons), and that the rest of the protocol, including the transaction format, is specified in terms of bytes and varints. Have I missed something?
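To make that concrete, the only ASN.1 in a transaction is the signature blob: a DER SEQUENCE of two INTEGERs (r and s), followed by a one-byte sighash type that is Bitcoin-specific and not part of the DER. A minimal Python encoder, assuming short-form lengths (always enough for 256-bit values) and `SIGHASH_ALL` = 0x01 as the default:

```python
def der_int(x: int) -> bytes:
    """Minimal DER INTEGER for a non-negative x (tag 0x02, short-form length)."""
    body = x.to_bytes(max(1, (x.bit_length() + 7) // 8), "big")
    if body[0] & 0x80:        # prepend 0x00 so the value isn't read as negative
        body = b"\x00" + body
    return b"\x02" + bytes([len(body)]) + body


def der_signature(r: int, s: int, sighash: int = 0x01) -> bytes:
    """DER SEQUENCE of two INTEGERs, then Bitcoin's one-byte sighash type."""
    seq = der_int(r) + der_int(s)
    return b"\x30" + bytes([len(seq)]) + seq + bytes([sighash])


print(der_signature(1, 1).hex())     # 300602010102010101
print(der_signature(0x80, 1).hex())  # 30070202008002010101
```

Everything around this blob (versions, amounts, script lengths) is plain little-endian integers and varints.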

saurik · on July 20, 2017

I thought that was the entire point (though it is possible that I misunderstood myself): that the transaction identifier is formed by taking a hash of the entire transaction and the signature (which, of course, could not have been signed); if anything in the data being signed were modified then this would be a very different issue, so the only options for malleability are the signature and any structure connecting the signature to the data.

f9beb4d9 · on July 20, 2017

The script language itself is malleable because it is executed. `NOP NOP NOP PUSHDATA` has the same result as `PUSHDATA`, despite having different bytes and a different resulting hash. The `PUSHDATA` opcodes are themselves malleable too: you can use `PUSHDATA2` (the next 2 bytes give the length of the data to push) or `PUSHDATA4` (the next 4 bytes give the length) and get exactly the same stack. These can largely be fixed with policy, but since a lot of cases add this behaviour back in, Segregated Witness simply doesn't include this data in the TXID hash (though it is hashed in a commitment in the block to avoid other attacks).
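A small Python sketch of the push-encoding malleability, assuming the standard opcode values: 0x01-0x4b push that many bytes directly, while `OP_PUSHDATA1`/`2`/`4` (0x4c/0x4d/0x4e) are followed by a 1-, 2- or 4-byte little-endian length and then the data. The same 72 bytes can be pushed four different ways; BIP 62's minimal-push rule is the policy answer, requiring the shortest form. The signature bytes are a hypothetical placeholder:

```python
import struct

OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E


def encode_push(data: bytes, form: str) -> bytes:
    """Encode a single data push using the requested opcode form."""
    n = len(data)
    if form == "direct":      # opcodes 0x01-0x4b are themselves the length
        if not 1 <= n <= 75:
            raise ValueError("direct pushes carry 1-75 bytes")
        return bytes([n]) + data
    if form == "pushdata1":   # 1-byte length follows the opcode
        return bytes([OP_PUSHDATA1]) + struct.pack("<B", n) + data
    if form == "pushdata2":   # 2-byte little-endian length
        return bytes([OP_PUSHDATA2]) + struct.pack("<H", n) + data
    if form == "pushdata4":   # 4-byte little-endian length
        return bytes([OP_PUSHDATA4]) + struct.pack("<I", n) + data
    raise ValueError(f"unknown form: {form}")


def decode_push(script: bytes) -> bytes:
    """Decode a script consisting of exactly one push and return the pushed data."""
    op = script[0]
    if 1 <= op <= 75:
        n, start = op, 1
    elif op == OP_PUSHDATA1:
        n, start = script[1], 2
    elif op == OP_PUSHDATA2:
        n, start = struct.unpack_from("<H", script, 1)[0], 3
    elif op == OP_PUSHDATA4:
        n, start = struct.unpack_from("<I", script, 1)[0], 5
    else:
        raise ValueError(f"not a push opcode: {op:#x}")
    if len(script) != start + n:
        raise ValueError("length mismatch")
    return script[start:start + n]


sig = bytes(72)  # hypothetical placeholder for a 72-byte signature
forms = ("direct", "pushdata1", "pushdata2", "pushdata4")
encodings = {form: encode_push(sig, form) for form in forms}

for form, raw in encodings.items():
    print(f"{form:10} {raw[:5].hex():12} same data: {decode_push(raw) == sig}")
```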

saurik · on July 20, 2017

For purposes of my intentionally super-zoomed-out view of this (as I think that is what is most valuable from a cryptography perspective), the script is part of the "structure connecting the signature to the data".

dfox · on July 20, 2017

What I meant is that there is no ASN.1 involved in the format of the transaction itself; the only thing serialized in DER is the argument to OP_PUSH_DATA as part of the scriptSig.

saurik · on July 20, 2017

That script is how the signature is attached to the data and is how the transaction is verified, though; it is no different at a conceptual level than "someone decided to attach the signature to the data using JSON and didn't realize that the order of the two fields mattered". Instead of using JSON, they are using a virtual machine, but that just provides even more opportunity for shenanigans ;P.

TD-Linux · on July 20, 2017

I think the reason the design is this way is because it was really convenient to get the ASN.1 format out of OpenSSL for the first implementation. If the protocol were designed today, there would be no ASN.1 involved at all. (See other comments for the problems associated with ASN.1 parsing.)