TransWikia.com

Typesetting phonetic symbols: Unicode or tipa?

TeX - LaTeX Asked by Jason Zentz on November 9, 2021

On TeX.SX there are a lot of questions of the form “How do I typeset [some phonetic symbol]?”

One of the issues that often comes up in answers to these questions is whether International Phonetic Alphabet (IPA) symbols should be typeset using the tipa package or using Unicode input with a Unicode font (under XeLaTeX or LuaLaTeX).

What should a user consider when deciding which of these approaches to use?

This question is related to Is \aa or å preferred?, but I’m focused on the tipa package specifically here, not the broader question of LaTeX-based macros vs. Unicode. This is also related to questions about pdfLaTeX vs. XeLaTeX vs. LuaLaTeX, but again I’m focused on tipa, which can be used with any of those.

4 Answers

If it helps, and to add to @yannis' answer, the old xunicode package (back when xelatex was the only UTF8-aware engine) redefines tipa commands to Unicode.

So you can use fontspec to choose whichever fonts are suitable.

xunicode can run under lualatex with one additional code line (see MWE).

Some examples:

[Figures: sample output in CMU Serif, Linguistics Pro, Noto Serif, and Gentium Plus]

Not all fonts have full coverage of the glyphs.

Items in red indicate that the macro definition might need to be revised.

Some slight typing corrections to yannis's list have been made silently.

Conclusion

IPA symbols are a script in their own right.

For large volumes, direct input (using a dedicated keyboard overlay) would be the most efficient input method.

For using a smaller set of glyphs, named macros would keep the source code readable and easier to maintain compared to using codepoints (e.g., ^^^^0259 = ə), at the expense of typing in the macro names and knowing what the names actually are and mean. Perhaps shortcuts for the more commonly-used ones would help.

Tipa-as-unicode would fall at this smaller end of the spectrum, in terms of usage and convenience.
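As a rough illustration of the three input styles compared above, the following sketch typesets the same schwa by direct input, by codepoint, and by a named macro. It assumes XeLaTeX or LuaLaTeX and that Gentium Plus is installed; \schwa is a hypothetical shortcut name chosen for this example, not a command defined by tipa or xunicode.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Gentium Plus}% assumption: any installed font containing U+0259 would do
\newcommand\schwa{ə}% hypothetical shortcut name
\begin{document}
ə        % direct Unicode input
^^^^0259 % the same character written as a codepoint
\schwa   % named macro
\end{document}
```

All three lines should print the same glyph; the difference is only in how readable the source is and how much typing it takes.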

MWE

\documentclass{article}
\usepackage{xcolor}
\usepackage{fontspec}
\setmainfont{Noto Serif}
\def\XeTeXpicfile{}%so can compile with LuaLatex
\usepackage{xunicode}
%\usepackage{tipa}

\newcommand\fnamea{Noto Serif}
\newcommand\fnameb{Junicode}
\newcommand\fnamec{Linguistics Pro}
\newcommand\fnamed{DejaVu Serif}
\newcommand\fnamee{Gentium Plus}
\newcommand\fnamef{Liberation Serif}
\newcommand\fnameg{CMU Serif}

\newfontfamily\ffonta{\fnamea}
\newfontfamily\ffontb{\fnameb}
\newfontfamily\ffontc{\fnamec}
\newfontfamily\ffontd{\fnamed}
\newfontfamily\ffonte{\fnamee}
\newfontfamily\ffontf{\fnamef}
\newfontfamily\ffontg{\fnameg}


\newcommand\dolist[2]{%1=font command,2=fontname
#1 Tipa Unicode commands using \fbox{{\large #2}} font.

First item is the Tipa macro, second item is the Unicode character directly.

\AA{Å}
\AE{Æ}
\DH{Ð}
\O{Ø}
\Thorn{Þ}
\TH{Þ}
\ss{ß}
\aa{å}
\ae{æ}
\dh{ð}
\o{ø}
\textthorn{þ}
\textthornvari{þ}
\textthornvarii{þ}
\textthornvariii{þ}
\textthornvariv{þ}
\th{þ}
\DJ{Đ}
\dj{đ}
\textcrd{đ}
\textHbar{Ħ}
\textcrh{ħ}
\texthbar{ħ}
\i{ı}
\j{ȷ}
\IJ{Ĳ}
\ij{ĳ}
\textkra{ĸ}
\L{Ł}
\textbarl{ł}
\l{ł}
\NG{Ŋ}
\ng{ŋ}
\OE{Œ}
\oe{œ}
\textTbar{Ŧ}
\textTstroke{Ŧ}
\texttbar{ŧ}
\texttstroke{ŧ}
\textcrb{ƀ}
\textBhook{Ɓ}
\textOopen{Ɔ}
\textChook{Ƈ}
\textchook{ƈ}
\texthtc{ƈ}
\textDafrican{Ɖ}
\textDhook{Ɗ}
\textEreversed{Ǝ}
\textEopen{Ɛ}
\textFhook{Ƒ}
\textflorin{ƒ}
\textGammaafrican{Ɣ}
\texthvlig{ƕ}
\hv{ƕ}
\textIotaafrican{Ɩ}
\textKhook{Ƙ}
\textkhook{ƙ}
\texthtk{ƙ}
\textcrlambda{ƛ}
\textNhookleft{Ɲ}
\Ohorn{Ơ}
\ohorn{ơ}
\textPhook{Ƥ}
\textphook{ƥ}
\texthtp{ƥ}
\textEsh{Ʃ}
\ESH{Ʃ}
\textlooptoprevesh{ƪ}
\textcolor{red}{\textpalhookbelow{t}}{ƫ} %command takes argument
\textThook{Ƭ}
\textthook{ƭ}
\texthtt{ƭ}
\textTretroflexhook{Ʈ}
\Uhorn{Ư}
\uhorn{ư}
\textVhook{Ʋ}
\textYhook{Ƴ}
\textyhook{ƴ}
\textcolor{red}{\textEzh}{Ʒ} %Ǯǯ: misaligned char?
\texteturned{ǝ}
\textturna{ɐ}
\textscripta{ɑ}
\textturnscripta{ɒ}
\textbhook{ɓ}
\texthtb{ɓ}
\textoopen{ɔ}
\textopeno{ɔ}
\textctc{ɕ}
\textdtail{ɖ}
\textrtaild{ɖ}
\textdhook{ɗ}
\texthtd{ɗ}
\textreve{ɘ}
\textschwa{ə}
\textrhookschwa{ɚ}
\texteopen{ɛ}
\textepsilon{ɛ}
\textrevepsilon{ɜ}
\textrhookrevepsilon{ɝ}
\textcloserevepsilon{ɞ}
\textbardotlessj{ɟ}
\texthtg{ɠ}
\textscriptg{ɡ}
\textscg{ɢ}
\textgammalatinsmall{ɣ}
\textcolor{red}{\textgamma}{ɣ} %
\textramshorns{ɤ}
\textturnh{ɥ}
\texthth{ɦ}
\texththeng{ɧ}
\textbari{ɨ}
\textiotalatin{ɩ}
\textiota{ɩ}
\textsci{ɪ}
\textltilde{ɫ}
\textbeltl{ɬ}
\textrtaill{ɭ}
\textlyoghlig{ɮ}
\textturnm{ɯ}
\textturnmrleg{ɰ}
\textltailm{ɱ}
\textltailn{ɲ}
\textnhookleft{ɲ}
\textrtailn{ɳ}
\textscn{ɴ}
\textbaro{ɵ}
\textscoelig{ɶ}
\textcloseomega{ɷ}
\textphi{ɸ}
\textturnr{ɹ}
\textturnlonglegr{ɺ}
\textturnrrtail{ɻ}
\textlonglegr{ɼ}
\textrtailr{ɽ}
\textfishhookr{ɾ}
\textlhti{ɿ}
\textscr{ʀ}
\textinvscr{ʁ}
\textrtails{ʂ}
\textesh{ʃ}
\texthtbardotlessj{ʄ}
\textcolor{red}{\textraisevibyi}{ʅ} %ʅ
\textctesh{ʆ}
\textturnt{ʇ}
\textrtailt{ʈ}
\texttretroflexhook{ʈ}
\textbaru{ʉ}
\textupsilon{ʊ}
\textscriptv{ʋ}
\textvhook{ʋ}
\textturnv{ʌ}
\textturnw{ʍ}
\textturny{ʎ}
\textscy{ʏ}
\textrtailz{ʐ}
\textctz{ʑ}
\textezh{ʒ}
\textyogh{ʒ}
\textctyogh{ʓ}
\textglotstop{ʔ}
\textrevglotstop{ʕ}
\textinvglotstop{ʖ}
\textstretchc{ʗ}
\textbullseye{ʘ}
\textscb{ʙ}
\textcloseepsilon{ʚ}
\texthtscg{ʛ}
\textsch{ʜ}
\textctj{ʝ}
\textturnk{ʞ}
\textscl{ʟ}
\texthtq{ʠ}
\textbarglotstop{ʡ}
\textbarrevglotstop{ʢ}
\textdzlig{ʣ}
\textdyoghlig{ʤ}
\textdctzlig{ʥ}
\texttslig{ʦ}
\textteshlig{ʧ}
\texttesh{ʧ}
\texttctclig{ʨ}
\textprimstress{ˈ}
\textlengthmark{ː}


\textsc{Shortcuts}: \textipa{["pI\*Di]}
\textipa{[\!b] [\:r] [\;B]}

\textsc{Input Methods}: 

[\textsecstress\textepsilon kspl\textschwa\textprimstress ne\textsci\textesh\textschwa n]
:
\textipa{[""Ekspl@"neIS@n]}

v\textturnv v w\textsca w y\textturny y [\textesh]
:
\textipa{v2v w\textsca w yLy [S]}

[\textipa{S}]
:
\textipa{[S]}
\par
\vspace{3ex}
\hrule
\vspace{4ex}
}



\begin{document}

\dolist{\ffonta}{\fnamea}
\dolist{\ffontb}{\fnameb}
\dolist{\ffontc}{\fnamec}
\dolist{\ffontd}{\fnamed}
\dolist{\ffonte}{\fnamee}
\dolist{\ffontf}{\fnamef}
\dolist{\ffontg}{\fnameg}

\end{document}

Answered by Cicada on November 9, 2021

Unicode, definitely. The sole exception is if your publisher doesn’t support it. PDFTeX, for example, cannot handle combining Unicode characters, only precomposed ones.

The tipa package was last updated in 2004. The only fonts it supports are Computer Modern Roman/Sans-Serif/Typewriter, Times, and Helvetica. It loads an 8-bit font encoding, making it difficult to use in the same document as non-European scripts.

You can use Unicode input with tipa (other than combining accents in PDFTeX) by setting the Unicode character active with \DeclareUnicodeCharacter or \newunicodechar. If you want to use tipa-like commands, a modern package would probably declare them to check \iffontchar, use the Unicode symbol in the current font if it has it, and fall back, perhaps to an 8-bit font, otherwise. You can write that yourself, but tipa doesn’t do it, nor does inputenc support T3.
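Two minimal sketches of what this could look like follow. The first assumes pdfLaTeX and maps two precomposed Unicode characters to tipa's own commands with \newunicodechar; which characters you map is up to you.

```latex
% Compile with pdfLaTeX.
\documentclass{article}
\usepackage[utf8]{inputenc}% already the default in recent LaTeX kernels
\usepackage{tipa}
\usepackage{newunicodechar}
% Map precomposed Unicode characters to tipa's commands.
\newunicodechar{ə}{\textschwa}
\newunicodechar{ʃ}{\textesh}
\begin{document}
[ʃə]
\end{document}
```

The second assumes XeLaTeX or LuaLaTeX and shows the \iffontchar idea. Here \ipachar, \ipafallback, and \myschwa are hypothetical names invented for this example, the fallback is another Unicode font rather than an 8-bit one, and Gentium Plus is assumed to be installed.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\newfontfamily\ipafallback{Gentium Plus}% assumption: installed fallback font
% \ipachar is a hypothetical helper: use the character from the current
% font when it has a glyph for it, otherwise switch to the fallback font.
\newcommand\ipachar[1]{%
  \iffontchar\font`#1 % the space ends the character code
    #1%
  \else
    {\ipafallback #1}%
  \fi}
\newcommand\myschwa{\ipachar{ə}}% hypothetical tipa-like command
\begin{document}
[\myschwa]
\end{document}
```

The test is made against whatever font is current when the command is used, so the same command can resolve differently in the body text and in a heading set in another font.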

Answered by Davislor on November 9, 2021

If for some reason you have TIPA commands such as \textturnscripta in your LaTeX document but you are using a Unicode-compliant TeX engine and your current font contains all necessary glyphs (e.g., the FreeSerif font family), then here are some redefinitions of TIPA (and other similar LaTeX) commands that produce the corresponding Unicode characters:

\def\AA{Å}
\def\AE{Æ}
\def\DH{Ð}
\def\O{Ø}
\def\Thorn{Þ}
\def\TH{Þ}
\def\ss{ß}
\def\aa{å}
\def\ae{æ}
\def\dh{ð}
\def\o{ø}
\def\textthorn{þ}
\def\textthornvari{þ}
\def\textthornvarii{þ}
\def\textthornvariii{þ}
\def\textthornvariv{þ}
\def\th{þ}
\def\DJ{Đ}
\def\dj{đ}
\def\textcrd{đ}
\def\textHbar{Ħ}
\def\textcrh{ħ}
\def\texthbar{ħ}
\def\i{ı}
\def\j{ȷ}
\def\IJ{Ĳ}
\def\ij{ĳ}
\def\textkra{ĸ}
\def\L{Ł}
\def\textbarl{ł}
\def\l{ł}
\def\NG{Ŋ}
\def\ng{ŋ}
\def\OE{Œ}
\def\oe{œ}
\def\textTbar{Ŧ}
\def\textTstroke{Ŧ}
\def\texttbar{ŧ}
\def\texttstroke{ŧ}
\def\textcrb{ƀ}
\def\textBhook{Ɓ}
\def\textOopen{Ɔ}
\def\textChook{Ƈ}
\def\textchook{ƈ}
\def\texthtc{ƈ}
\def\textDafrican{Ɖ}
\def\textDhook{Ɗ}
\def\textEreversed{Ǝ}
\def\textEopen{Ɛ}
\def\textFhook{Ƒ}
\def\textflorin{ƒ}
\def\textGammaafrican{Ɣ}
\def\texthvlig{ƕ}
\def\hv{ƕ}
\def\textIotaafrican{Ɩ}
\def\textKhook{Ƙ}
\def\textkhook{ƙ}
\def\texthtk{ƙ}
\def\textcrlambda{ƛ}
\def\textNhookleft{Ɲ}
\def\OHORN{Ơ}
\def\ohorn{ơ}
\def\textPhook{Ƥ}
\def\textphook{ƥ}
\def\texthtp{ƥ}
\def\textEsh{Ʃ}
\def\ESH{Ʃ}
\def\textlooptoprevesh{ƪ}
\def\textpalhookbelow{ƫ}
\def\textThook{Ƭ}
\def\textthook{ƭ}
\def\texthtt{ƭ}
\def\textTretroflexhook{Ʈ}
\def\UHORN{Ư}
\def\uhorn{ư}
\def\textVhook{Ʋ}
\def\textYhook{Ƴ}
\def\textyhook{ƴ}
\def\textEzh{Ʒ}
\def\texteturned{ǝ}
\def\textturna{ɐ}
\def\textscripta{ɑ}
\def\textturnscripta{ɒ}
\def\textbhook{ɓ}
\def\texthtb{ɓ}
\def\textoopen{ɔ}
\def\textopeno{ɔ}
\def\textctc{ɕ}
\def\textdtail{ɖ}
\def\textrtaild{ɖ}
\def\textdhook{ɗ}
\def\texthtd{ɗ}
\def\textreve{ɘ}
\def\textschwa{ə}
\def\textrhookschwa{ɚ}
\def\texteopen{ɛ}
\def\textepsilon{ɛ}
\def\textrevepsilon{ɜ}
\def\textrhookrevepsilon{ɝ}
\def\textcloserevepsilon{ɞ}
\def\textbardotlessj{ɟ}
\def\texthtg{ɠ}
\def\textscriptg{ɡ}
\def\textscg{ɢ}
\def\textgammalatinsmall{ɣ}
\def\textgamma{ɣ}
\def\textramshorns{ɤ}
\def\textturnh{ɥ}
\def\texthth{ɦ}
\def\texththeng{ɧ}
\def\textbari{ɨ}
\def\textiotalatin{ɩ}
\def\textiota{ɩ}
\def\textsci{ɪ}
\def\textltilde{ɫ}
\def\textbeltl{ɬ}
\def\textrtaill{ɭ}
\def\textlyoghlig{ɮ}
\def\textturnm{ɯ}
\def\textturnmrleg{ɰ}
\def\textltailm{ɱ}
\def\textltailn{ɲ}
\def\textnhookleft{ɲ}
\def\textrtailn{ɳ}
\def\textscn{ɴ}
\def\textbaro{ɵ}
\def\textscoelig{ɶ}
\def\textcloseomega{ɷ}
\def\textphi{ɸ}
\def\textturnr{ɹ}
\def\textturnlonglegr{ɺ}
\def\textturnrrtail{ɻ}
\def\textlonglegr{ɼ}
\def\textrtailr{ɽ}
\def\textfishhookr{ɾ}
\def\textlhti{ɿ}
\def\textscr{ʀ}
\def\textinvscr{ʁ}
\def\textrtails{ʂ}
\def\textesh{ʃ}
\def\texthtbardotlessj{ʄ}
\def\textraisevibyi{ʅ}
\def\textctesh{ʆ}
\def\textturnt{ʇ}
\def\textrtailt{ʈ}
\def\texttretroflexhook{ʈ}
\def\textbaru{ʉ}
\def\textupsilon{ʊ}
\def\textscriptv{ʋ}
\def\textvhook{ʋ}
\def\textturnv{ʌ}
\def\textturnw{ʍ}
\def\textturny{ʎ}
\def\textscy{ʏ}
\def\textrtailz{ʐ}
\def\textctz{ʑ}
\def\textezh{ʒ}
\def\textyogh{ʒ}
\def\textctyogh{ʓ}
\def\textglotstop{ʔ}
\def\textrevglotstop{ʕ}
\def\textinvglotstop{ʖ}
\def\textstretchc{ʗ}
\def\textbullseye{ʘ}
\def\textscb{ʙ}
\def\textcloseepsilon{ʚ}
\def\texthtscg{ʛ}
\def\textsch{ʜ}
\def\textctj{ʝ}
\def\textturnk{ʞ}
\def\textscl{ʟ}
\def\texthtq{ʠ}
\def\textbarglotstop{ʡ}
\def\textbarrevglotstop{ʢ}
\def\textdzlig{ʣ}
\def\textdyoghlig{ʤ}
\def\textdctzlig{ʥ}
\def\texttslig{ʦ}
\def\textteshlig{ʧ}
\def\texttesh{ʧ}
\def\texttctclig{ʨ}
\def\textprimstress{ˈ}
\def\textlengthmark{ː}
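A minimal sketch of how such redefinitions might be used in a document, assuming XeLaTeX or LuaLaTeX and that FreeSerif is installed. Only two of the redefinitions from the list are shown, and they are placed in the preamble after the font setup:

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif}% assumption: the font is installed
% Two of the redefinitions from the list.
\def\textturnscripta{ɒ}
\def\textschwa{ə}
\begin{document}
[h\textturnscripta t] [\textschwa]
\end{document}
```

Because \def overwrites silently, this also replaces any earlier definition of the same command names, which is the intended effect here but worth keeping in mind if tipa or xunicode is loaded as well.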

Answered by yannis on November 9, 2021

Unicode's advantages

As I see it, there are many advantages to using a Unicode font with XeLaTeX/LuaLaTeX, some of which are mentioned in answers to the above questions and in other places, notably Alan Munn's answers to How to use phonetic IPA characters in LaTeX and Preparing a text for conversion to LaTeX: How to convert "ejective stops" in TIPA?:

  • Code readability. There's no doubt that [ˌɛkspləˈneɪʃən] is easier to (proof)read than \textipa{[""Ekspl@"neIS@n]}.
  • No special command/environment. There's no need for \textipa{...}, {\tipaencoding ...}, or \begin{IPA} ... \end{IPA}.
  • Cross-application compatibility
    • Copy and paste. If you are bringing your data into a .tex file from another application (e.g., Excel, Toolbox, ELAN, FLEx, etc.), Unicode input allows you to simply copy and paste without any conversion to tipa (or other LaTeX) macros. And if you want to take an example from your .tex file and put it in a Word document, email, or webpage, copy and paste works on the way out too.
    • Keyboard shortcuts. If you already use a Unicode IPA keyboard layout (or a keyboard layout specific to a language you work on), you can use the same shortcuts you would use in any other Unicode application. See my answer to Accessing IPA characters when using Charis SIL for more about using a keyboard layout for Unicode IPA input.
  • PDF usability. When you use a Unicode font, people who only have access to the resulting PDF can search for IPA symbols, and they can copy and paste them out of the document, too. This is only occasionally true for PDFs that use tipa, as discussed at How to use the real letters in a pdf?.
  • Flexibility in font selection. You can use any Unicode font that has the characters you need. tipa's options give you symbols designed to match Computer Modern, Times, or Helvetica, and that's it.
  • Font consistency. You can choose a font whose IPA symbols were created by the designer of the rest of the font, so they will match the body text. tipa matches Computer Modern quite well, but it merely approximates Times and Helvetica.
  • OpenType features. There are several Unicode fonts with full IPA coverage that also have OpenType features that you can make use of using fontspec in XeLaTeX or LuaLaTeX. For example, Charis SIL has alternate glyphs for literacy applications (<ɑɡ> instead of <ag>, etc.) and for localization (e.g., variant glyphs for <Ŋ> and <ʋ>).
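As a small illustration of the Unicode approach described in this list, here is a sketch that assumes XeLaTeX or LuaLaTeX and that Charis SIL is installed; any other Unicode font with IPA coverage could be substituted.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Charis SIL}% assumption: the font is installed
\begin{document}
The word \emph{explanation} is transcribed [ˌɛkspləˈneɪʃən].
\end{document}
```

The transcription is typed (or pasted) directly into the source, with no IPA-specific command or environment around it.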

tipa's advantages

There are at least two advantages to tipa, possibly a third:

  • Backwards compatibility. It's a lot of work to convert tipa code into Unicode, and it might not be worth it if most of the data you are working with is already coded for tipa.
  • Control over microtypography. tipa has commands (section 4 of the manual) that allow you to place diacritics manually and make some other fine adjustments to kerning, etc. Some Unicode fonts allow diacritic stacking and correct placement of modifier letters, but this varies widely across fonts.
  • Ease/speed of input. This is frequently mentioned in favor of tipa, but personally I've never found using tipa shortcuts to be any faster than using a Unicode IPA keyboard layout with mnemonic, semantic key assignment.
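For comparison, a sketch of tipa input under pdfLaTeX, showing the shortcut notation together with a few of tipa's standard diacritic commands. These are ordinary tipa commands chosen for illustration, not necessarily the fine-tuning commands from section 4 of the manual.

```latex
% Compile with pdfLaTeX.
\documentclass{article}
\usepackage{tipa}
\begin{document}
\textipa{[""Ekspl@"neIS@n]}      % shortcut input inside \textipa
\textipa{[b2t\textsyllabic{n}]}  % syllabic mark placed under n
\textipa{[\textsubring{d}]}      % ring placed under d
\textipa{[\texttoptiebar{ts}]}   % tie bar set over t and s
\end{document}
```

Each diacritic is positioned by the macro rather than by the font, which is what makes the placement predictable with tipa's own fonts.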

Conclusion

The package tipa should be considered a legacy method for using IPA characters in LaTeX, just as other non-Unicode fonts have been phased out (e.g., IPAPhon and the non-Unicode versions of the SIL and LaserIPA fonts). It may be necessary to use tipa in some circumstances for compatibility reasons (using already tipa-coded data, following a publisher's style guide, etc.), but in general users should strongly consider using Unicode with XeLaTeX or LuaLaTeX.

Answered by Jason Zentz on November 9, 2021
