Wednesday, 25 November 2009

Inserting the BOM into a file

I have been working with Windows UTF-8 files a lot today, and I found that just saving a file as UTF-8 isn't enough for ASP.NET (well, 1.1 at least). Even though Emacs says it will save the buffer with the BOM (byte order mark), it doesn't seem to do so unless the file already starts with one. So I wrote myself a little helper function that adds the BOM to the start of the file. It works by going to the start of the current buffer and inserting the character U+FEFF. When Emacs saves the file as UTF-8, it encodes that character as the bytes EF BB BF, which is the UTF-8 form of the BOM (thanks to pnkfelix for pointing that out).
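;; Insert the BOM (U+FEFF) at the start of the buffer, for UTF-8 files
(defun insert-BOM ()
  "Insert the byte order mark U+FEFF at the start of the current buffer."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; Emacs encodes U+FEFF as EF BB BF when the buffer is saved as UTF-8.
    (insert #xFEFF)))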
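
To see for yourself which bytes end up in the file, here is a small sketch that encodes U+FEFF with a given coding system and prints the result as hex. The helper name my-bom-bytes is made up for this example and is not part of Emacs. It assumes the utf-8, utf-16be and utf-16le coding systems, which encode without adding a signature of their own.

```elisp
;; Sketch only: `my-bom-bytes' is a hypothetical helper, not an Emacs built-in.
(defun my-bom-bytes (coding-system)
  "Return the bytes of U+FEFF encoded with CODING-SYSTEM, as a hex string."
  (mapconcat (lambda (byte) (format "%02X" byte))
             ;; `encode-coding-string' returns a unibyte string of raw bytes.
             (encode-coding-string (string #xFEFF) coding-system)
             " "))

(my-bom-bytes 'utf-8)     ; => "EF BB BF"
(my-bom-bytes 'utf-16be)  ; => "FE FF"
(my-bom-bytes 'utf-16le)  ; => "FF FE"
```

Evaluating the last three forms in the *scratch* buffer (C-x C-e after each one) shows that the same character, U+FEFF, gives different byte sequences depending on the encoding, which is where the EF BB BF versus FE FF confusion comes from.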

2 comments:

pnkfelix said...

I think you have mixed up the UTF-8 and UTF-16 BOMs.

Wikipedia says:
The UTF-8 representation of the BOM is the byte sequence 0xEF,0xBB,0xBF.

versus
UTF-16, a BOM (U+FEFF)

http://en.wikipedia.org/wiki/Byte_order_mark

Andrew Cox said...

Thanks for that. My understanding was based on what a review tool at my work was checking for. I will update the post to reflect this.