TransWikia.com

Replace nbsp with none string in utf-8 encoding in vim

Vi and Vim Asked on August 31, 2021

The non-breaking space (nbsp) is encoded in UTF-8 as the two bytes c2 a0.
To create sample.txt, type it is a test in insert mode, then press Ctrl+V, u, a0 to insert the nbsp. That gives the sample test file.

xxd sample.txt
00000000: 6974 2069 7320 6120 7465 7374 c2a0 0a  it is a test....    

Running :set encoding shows encoding=utf-8. I want to replace the nbsp with an empty string, and this is what confuses me:

:%s/%uc2%ua0//
" the command above does not remove the nbsp under utf-8 encoding
:%s/%ua0//
" :%s/%ua0// does remove it

With utf-8 encoding, why does :%s/%ua0// replace the nbsp with an empty string while :%s/%uc2%ua0// does not?

One Answer

You almost never need to deal with the binary UTF-8 encoding in Vim (or indeed, almost anywhere, including programming languages); for the most part you can just forget it exists, as Vim takes care of all of that for you: you just need to deal with the Unicode codepoints.

You can think of Unicode codepoints as the "human interface" and UTF-8 as one of several technical implementations. It's roughly similar to files on your disk: you use the same tools to create, edit, copy, and move files, and you don't deal with the filesystem format (ext4, NTFS, FAT, etc.) directly. UTF-8 is like the filesystem format.

At any rate, the codepoint for the non-breaking space is U+00A0, so you can just use that, as you already discovered:

%s/%ua0//

You can leave out the two leading zeros, although you can also include them if you want.
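As a rough illustration of why the two-part pattern fails, here is a Python sketch (not Vim itself). It assumes the same behaviour as Vim's %u, which matches one character by its codepoint, so %uc2%ua0 looks for U+00C2 ("Â") followed by U+00A0 rather than for the bytes c2 a0. The variable names are only illustrative.

```python
import re

# Same content as sample.txt, as decoded text (what you edit in Vim).
text = "it is a test\u00a0\n"

# Analogue of :%s/%ua0// : match the single codepoint U+00A0.
by_codepoint = re.sub("\u00a0", "", text)

# Analogue of :%s/%uc2%ua0// : match U+00C2 followed by U+00A0.
# The text contains no U+00C2, so nothing is replaced.
by_two_codepoints = re.sub("\u00c2\u00a0", "", text)

# The byte pair c2 a0 only exists once the text is encoded as UTF-8.
by_bytes = re.sub(b"\xc2\xa0", b"", text.encode("utf-8"))

print(repr(by_codepoint))       # 'it is a test\n'
print(repr(by_two_codepoints))  # 'it is a test\xa0\n'
print(by_bytes)                 # b'it is a test\n'
```

Matching on bytes only works on the encoded form of the file; on decoded text, the codepoint is the unit you search for.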

In this case it may look confusing since U+00A0 and 0xc2 0xa0 look kinda similar, but this is just an artefact of how UTF-8 encodes things (it's actually quite clever in many ways; the Wikipedia page can probably explain that better than I can).
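To make the resemblance less mysterious, here is a small Python sketch of the two-byte UTF-8 layout (110xxxxx 10xxxxxx) applied to U+00A0. The helper name encode_two_byte is made up for this example, and it deliberately handles only the U+0080..U+07FF range.

```python
def encode_two_byte(codepoint: int) -> bytes:
    """Encode a codepoint in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= codepoint <= 0x7FF:
        raise ValueError("this sketch only handles the two-byte range")
    lead = 0b11000000 | (codepoint >> 6)          # 110xxxxx: upper 5 bits
    cont = 0b10000000 | (codepoint & 0b00111111)  # 10xxxxxx: lower 6 bits
    return bytes([lead, cont])

nbsp = 0x00A0
manual = encode_two_byte(nbsp)
builtin = chr(nbsp).encode("utf-8")  # Python's own encoder, for comparison

print(manual.hex())   # c2a0
print(builtin.hex())  # c2a0
```

For codepoints from U+0080 to U+00BF the second byte happens to have the same value as the codepoint, which is why a0 shows up in both U+00A0 and c2 a0.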


The related question "See the Unicode code point of the current character" may also be useful. If you want a CLI tool that shows which codepoints are in a file, I wrote a little tool for this; here's a web demo for your specific example.
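If you just want a quick look without installing anything, something along these lines may do. This is a minimal Python sketch, not the tool mentioned above; the function name describe_non_ascii is hypothetical, and it assumes the input is valid UTF-8.

```python
import unicodedata

def describe_non_ascii(data: bytes, encoding: str = "utf-8"):
    """Return (codepoint, name) pairs for every non-ASCII character in data."""
    text = data.decode(encoding)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]

# The bytes of sample.txt as shown by xxd in the question.
sample = b"it is a test\xc2\xa0\n"
found = describe_non_ascii(sample)
print(found)  # [('U+00A0', 'NO-BREAK SPACE')]
```

To run it on a real file, read the bytes first, for example with open("sample.txt", "rb").read().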

Correct answer by Martin Tournoij on August 31, 2021
