Why does DER enforce minimal encoding rules? How did non-strict DER encoding historically cause problems in Bitcoin?
Quick answer: DER does not allow multiple encodings for the same data
To understand DER, we need to know about BER. And to understand BER, we first need to talk about ASN.1. Let's Encrypt has written a good article about all three, and this document is based on it: https://letsencrypt.org/docs/a-warm-welcome-to-asn1-and-der/
ASN.1 stands for Abstract Syntax Notation One. It's a notation for specifying data structures. Here's an example:
Point ::= SEQUENCE {
    x INTEGER,
    y INTEGER,
    label UTF8String
}
The definition only describes the shape of the data: a Point consists of two integers and a string. ASN.1 is not an encoding and is not tied to a programming language. That's why it's called "Abstract": it's generic. It's used in a lot of places where you need to standardize a data structure and want to stay language-agnostic.
BER stands for "Basic Encoding Rules" and is a set of rules that tells you how to take an instance of an ASN.1 specification and turn it into bytes. In other words: serialization.
BER is a Type-Length-Value (TLV) encoding. Each field of an ASN.1 SEQUENCE is encoded as one TLV, and so is the SEQUENCE itself. A TLV has three parts:
- Tag (the "Type" in TLV): a byte, or series of bytes, that tells you the type of the field
- Length: a byte, or series of bytes, that tells you how long the value is
- Value: the actual contents of the field
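As a rough illustration of the TLV layout, here is a minimal Python sketch that reads one TLV from a byte string. The function name `read_tlv` is made up for this note, and the sketch makes simplifying assumptions: tags are a single byte and lengths are definite. It relies on the BER length rule that a first length byte below 0x80 is the length itself (short form), while otherwise its low 7 bits say how many length bytes follow (long form).

```python
def read_tlv(data: bytes, offset: int = 0):
    """Read one TLV starting at `offset`.

    Returns (tag, value, next_offset).
    Assumptions of this sketch: single-byte tags, definite lengths only.
    """
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        # Short form: the byte itself is the length (0..127).
        length = first
        value_start = offset + 2
    elif first == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    else:
        # Long form: the low 7 bits give the number of length bytes that follow.
        num_len_bytes = first & 0x7F
        len_end = offset + 2 + num_len_bytes
        length = int.from_bytes(data[offset + 2:len_end], "big")
        value_start = len_end
    value_end = value_start + length
    if value_end > len(data):
        raise ValueError("value runs past the end of the input")
    return tag, data[value_start:value_end], value_end
```

For example, `read_tlv(bytes([0x02, 0x01, 0x32]))` returns the tag `0x02` (INTEGER), the one-byte value `b"\x32"`, and the offset `3` where the next TLV would start.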
BER specifies how the Value section of a TLV looks for each type. Here are some examples of how the value gets encoded:
INTEGER 50 (ASN.1) -> 0x32 (one byte BER Value section)
INTEGER -549755813887 (ASN.1) -> 0x80 0x00 0x00 0x00 0x01 (5 bytes BER Value section, two's complement)
UTF8String "hi" (ASN.1) -> 0x68 0x69 (2 bytes BER Value section)
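These Value sections can be reproduced with a few lines of Python. This is only a sketch: `encode_integer_value` is a hypothetical helper name, and it covers just the Value bytes, not the tag or length. An INTEGER value is the shortest big-endian two's-complement representation of the number, and a UTF8String value is simply the UTF-8 bytes of the text.

```python
def encode_integer_value(n: int) -> bytes:
    """Return the shortest big-endian two's-complement encoding of n."""
    length = 1
    while True:
        try:
            # to_bytes raises OverflowError when n does not fit in `length` bytes.
            return n.to_bytes(length, "big", signed=True)
        except OverflowError:
            length += 1


def encode_utf8string_value(text: str) -> bytes:
    """Return the Value bytes of a UTF8String: the UTF-8 encoding of the text."""
    return text.encode("utf-8")
```

With these helpers, `encode_integer_value(50)` gives `b"\x32"`, `encode_integer_value(-549755813887)` gives the five bytes `80 00 00 00 01`, and `encode_utf8string_value("hi")` gives `68 69`. Note that 128 needs two bytes (`00 80`), because a leading `80` alone would be read as -128.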
DER stands for "Distinguished Encoding Rules" and is a subset of BER. Any DER encoding is valid BER, but not every BER encoding is valid DER.
DER was introduced because BER allows multiple encodings of the same data. That is generally undesirable, for example when a signature or hash is computed over the encoded bytes, and in the worst case it can lead to software vulnerabilities.
So, the key takeaway about DER is: DER does not allow multiple encodings for the same data.
Here are some differences between DER and BER:
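- In DER, only definite lengths are allowed (BER also permits an indefinite length, where the value is closed by an end marker)
- In DER, the Length field must use the shortest possible form (a length of 5 is 0x05, never 0x81 0x05)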
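To make the length rule concrete, here is a small Python sketch. The helper names `der_encode_length` and `is_der_length` are invented for this note. The first produces the single length encoding that DER accepts. The second checks whether a given Length field is in that form, which rejects both indefinite lengths and padded long-form lengths that BER would tolerate.

```python
def der_encode_length(length: int) -> bytes:
    """Encode a length the only way DER allows: definite and minimal."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length < 0x80:
        # Short form: one byte holding the length itself.
        return bytes([length])
    # Long form: 0x80 | number of length bytes, then the length big-endian
    # with no leading zero bytes.
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def is_der_length(encoded: bytes) -> bool:
    """Return True if `encoded` is a complete Length field in DER form."""
    if not encoded:
        return False
    first = encoded[0]
    if first < 0x80:
        return len(encoded) == 1
    if first == 0x80:
        return False  # indefinite length: valid BER, forbidden in DER
    if len(encoded) != 1 + (first & 0x7F):
        return False
    # Re-encode the decoded length and compare: any padding shows up here.
    return der_encode_length(int.from_bytes(encoded[1:], "big")) == encoded
```

So `is_der_length(b"\x05")` is true, while `is_der_length(b"\x81\x05")` and `is_der_length(b"\x80")` are both false, even though a BER decoder could accept all three.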
Here's the ASN.1 definition of an ECDSA signature:
ECDSASignature ::= SEQUENCE {
    r INTEGER,
    s INTEGER
}
DER encoding of the ECDSA signature (0x30 is the SEQUENCE tag, 0x02 is the INTEGER tag):
0x30 <len>
0x02 <len_r> <r_bytes>
0x02 <len_s> <s_bytes>
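The layout above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, `r` and `s` are assumed to be non-negative, and the total body is assumed to stay under 128 bytes so that a one-byte short-form length is enough (true for 256-bit curves, where each integer takes at most 33 value bytes).

```python
def der_integer(n: int) -> bytes:
    """Encode a non-negative integer as a DER INTEGER TLV (tag 0x02)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # bit_length() // 8 + 1 leaves room for the sign bit, so a value whose
    # top bit would be set gets a leading 0x00 and stays positive.
    body = n.to_bytes(n.bit_length() // 8 + 1, "big")
    if len(body) >= 0x80:
        raise ValueError("integer too large for a short-form length")
    return b"\x02" + bytes([len(body)]) + body


def der_encode_signature(r: int, s: int) -> bytes:
    """Encode (r, s) as SEQUENCE { r INTEGER, s INTEGER } in DER."""
    body = der_integer(r) + der_integer(s)
    if len(body) >= 0x80:
        raise ValueError("sequence too long for a short-form length")
    return b"\x30" + bytes([len(body)]) + body
```

For instance, `der_encode_signature(1, 2).hex()` is `"3006020101020102"`: the SEQUENCE tag `30` with length `06`, then `02 01 01` for r and `02 01 02` for s. With `r = 0x80` the r field becomes `02 02 00 80`, because the leading `00` is needed to keep the number positive.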