Cryptography Asked on January 3, 2021
I am writing a toy1 transport protocol and I am trying to implement a feature whereby:
My initial thought was to use a well known algorithm such as MD5/SHA2 and simply take the first/last 16/48 bits. This raises a few questions:
What are the consequences of simply taking a “substring” of a checksum? I assume, perhaps naively, that it would just increase the likelihood of a collision.
Is there any reason to prefer taking the last x bits instead of the first x bits, or vice versa?
When truncating a checksum like this, does SHA actually perform any better than MD5? Keeping in mind I am concerned with errors, not attackers.
[1] That is to say, I’m creating it for the purpose of learning. So “just use UDP” or “just use TCP” does apply, but that’s why I’m not.
[2] Maybe use MD5 if it’s 16-bit, SHA if it’s 48-bit.
You can use a 16-bit truncation of MD5 or SHA-1 or SHA-256 or SHAKE128 as a checksum, but it's not a good checksum.
Why not? Let's model them by a uniform random function of $t$ bits, which is a reasonable model in this scenario. In this case, $Pr[H(x) = h] = 1/2^t$ for any message $x$ and hash $h$, and every $H(x)$ is independent for each distinct $x$. Thus, the probability of failing to detect an error $e$ in a message $x$ is $Pr[H(x + e) = H(x)] = 1/2^t$ for any message $x$ and any error $e$. This is the best error detection you can hope for from a hash function designed like a random oracle (or close to it, length extensions notwithstanding).
If you choose a good 16-bit CRC (preprint, paywall-free; web site), then you can be guaranteed:
Cryptographic hash functions designed for random oracle use like MD5, SHA-1, SHA-256, SHAKE128, etc., guarantee none of these. They're also often much more expensive to compute than error-detecting codes like checksums, because you're paying an extremely high cost for collision resistance which is of no concern for error detection.
- What are the consequences of simply taking a "substring" of a checksum? I assume, perhaps naively, that it would just increase the likelihood of a collision.
Collisions are not relevant here. The only thing that is relevant is the probability of error detection. Any substring of MD5 or SHA-1 or SHA-256 or SHAKE128 is as good as any other.
- Is there any reason to prefer taking the last x bits instead of the first x bits, or vice versa?
No. If this led to an appreciably different probability of error detection, then that would be a noteworthy result about the hash function.
- When truncating a checksum like this, does SHA actually perform any better than MD5? Keeping in mind I am concerned with errors, not attackers.
No. They're both bad choices for detecting independent random bit errors.
Correct answer by Squeamish Ossifrage on January 3, 2021
From what I have found, there is not a way to answer this question in a cryptographic context that will make cryptographers happy. If you have a cryptographic hash with 16-bits, or even 48-bits, you have limited protection on using the checksum to detect someone forging the data depending on how much data you send.
I would just use Error Correcting Codes as a checksum in the stream and forgo the idea of using the checksum for anything but stream verification and not data integrity.
It's also worth mentioning that hash takes a lot more cycles and complexity than you probably want if you have a simple device.
Answered by b degnan on January 3, 2021
What are the consequences of simply taking a "substring" of a checksum? I assume, perhaps naively, that it would just increase the likelihood of a collision.
That's correct for cryptographic checksums.
For non-cryptographic checksums it may also destroy properties with regards to finding e.g. single or double bit errors or regional errors (that the cryptographic hashes do not possess).
Is there any reason to prefer taking the last x bits instead of the first x bits, or vice versa?
No, not when using a hash that operates on bits / bytes / words, where each bit should be as probable depending on the input. This is true for almost all hashes in use; you may want to check hashes that are based on number theory - but in general you would not want to use those in the first place.
It's a non-formal standard to take the leftmost bits / bytes when truncating hashes.
When truncating a checksum like this, does SHA actually perform any better than MD5? Keeping in mind I am concerned with errors, not attackers.
No, both outputs could be thought of to be pseudo random.
As Eugene and poncho indicated in the comments, it is a much better idea to go for a suitable cyclic redundancy check with well specified error detecting capabilities.
Answered by Maarten Bodewes on January 3, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP