Spelling, in principle, should reflect pronunciation, but I’ve also read that the opposite can happen, and that the pronunciation of a word already in circulation can be changed by altering/standardising its spelling. I’m afraid I forget the publication – it was several years ago – but a striking example that was given was the word clothes, which, as it’s usually pronounced now, involves a cluster of consonants which is physically awkward for an English speaker to utter. According to this book, in centuries past the word was pronounced more like close (to rhyme with rose) which is a lot easier to say even today, and the change occurred because of the standardisation of spelling.

As anyone who’s handled old books knows, the genitive/possessive apostrophe did not become common practice until well into the eighteenth century, and was not fully standardised until the Victorian era. Before then, it was quite correct to write, as in the Articles of Union of 1707, "Her Majesties Great Seal". At that time, did people write "wifes" meaning wife’s and "wives" meaning wives’, or did they write "wives" for both, and did pronunciation reflect this? The construction wife’s is slightly awkward to say, and so I wouldn’t be surprised if it was an eighteenth-century concoction.

The genitive singular of "wife" in Old English is wīfes. In Old English, the fricatives /s, θ, f/ were allophonically voiced between vowels, so that form was pronounced [wiːvəs]. As explained in that handout on the fricative voicing rule, the rule was much more general in OE and became very restricted over time.
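To make the voicing rule concrete, here is a minimal Python sketch of it as stated above: /s, θ, f/ become [z, ð, v] when flanked by vowels. The function name, the vowel inventory, and the segment-list representation are all assumptions made for illustration, and the rule is deliberately simplified to the "between vowels" wording used here.

```python
# Simplified vowel inventory (an assumption for this sketch, not a full OE inventory).
VOWELS = set("aeiouyæəɪʊ")

# Each voiceless fricative and its voiced allophone.
VOICED = {"s": "z", "θ": "ð", "f": "v"}


def is_vowel(segment):
    """Treat a segment as a vowel, ignoring a trailing length mark (ː)."""
    return segment.rstrip("ː") in VOWELS


def voice_intervocalic(segments):
    """Return a copy of `segments` with s, θ, f voiced between two vowels."""
    out = list(segments)
    # Only positions with a neighbour on both sides can be intervocalic.
    for i in range(1, len(segments) - 1):
        seg = segments[i]
        if seg in VOICED and is_vowel(segments[i - 1]) and is_vowel(segments[i + 1]):
            out[i] = VOICED[seg]
    return out


# Genitive singular: the f sits between vowels and is voiced; the final s is not.
print(voice_intervocalic(["w", "iː", "f", "ə", "s"]))
# Nominative singular: the f is word-final, so nothing changes.
print(voice_intervocalic(["w", "iː", "f"]))
```

In this sketch the word-final fricative stays voiceless because it has no following vowel, which is why the genitive comes out with [v] but keeps its final [s].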

The reason OE had intervocalic voicing is in part, simply, "because they can". There was no contrast between voiced and voiceless fricatives, but there was a contrast in stops. Intervocalic voicing is a reasonably common phenomenon in the world's languages, especially as an allophonic process (where there is no voicing contrast). Neutralization of /s,z/ is certainly possible between vowels, but it is less likely than non-neutralizing processes (the reason for that, I believe, has to do with the fact that "patterns of articulation" and "rules of phonological grammar" are not the same thing).

The "naturalness" intuition may be partially the product of frequency. Clusters like [fs, vz] in the coda are themselves special, in that they involve a root plus a suffix. I suspect that [vz] is what you encounter most often, compared to [fs].

