Are assembly languages untyped?

Question

I'm writing my BSc thesis about the type systems of various languages, and I want to include a short section on assembly languages. Initially I planned to bring up assembly as a counterexample to languages with advanced type systems. My goal was to explore the reasons why assembly omits most (all?) type system features, but then I found papers about typed assembly languages and got confused.
So here are my questions:

Can we say that assembly is generally untyped? If so, what is the reason we don't have assemblers with advanced types and type checkers?
Arguments for having a typed assembly language?

Here is a list of papers about typed assembly languages from the end of the last century: http://www.cs.cornell.edu/talc/papers.html.
Links to other papers are highly appreciated!

gnasher729 · Answer

JVM bytecode can be considered an assembly language, and it is most definitely typed. Classes, interfaces and exceptions are actual parts of that assembly language. Any system that executes JVM bytecode includes a bytecode verifier that checks that all the instructions are properly typed.
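As a rough illustration of what such a verifier does, here is a minimal sketch in Python. The opcode names mirror real JVM instructions, but the rule table and the `verify` function are simplified assumptions made for this example, not the actual JVM verification algorithm (which also handles branches, locals and object types).

```python
# Toy stack-machine verifier, loosely modeled on a bytecode verifier.
# RULES and verify() are illustrative assumptions, not the JVM specification.

# Each opcode maps to (types popped, listed from the top of the stack down;
# types pushed afterwards).
RULES = {
    "iconst_1": ([], ["int"]),
    "fconst_1": ([], ["float"]),
    "iadd": (["int", "int"], ["int"]),
    "fadd": (["float", "float"], ["float"]),
    "i2f": (["int"], ["float"]),
}


def verify(code):
    """Track operand types abstractly; raise TypeError on a mismatch."""
    stack = []
    for pc, op in enumerate(code):
        pops, pushes = RULES[op]
        for expected in pops:
            if not stack:
                raise TypeError(f"{pc}: {op} on empty stack")
            actual = stack.pop()
            if actual != expected:
                raise TypeError(f"{pc}: {op} expected {expected}, got {actual}")
        stack.extend(pushes)
    return stack
```

No instruction is ever executed here: the check only follows the types of the operands, which is why it can run before the program does.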

Answered by gnasher729 on December 20, 2020

Mozibur Ullah · Answer

Machine language is untyped. It is simply numbers that the main and ancillary processors interpret as actions.
The most basic kinds of assembly language follow this. But there is no reason that types can't be used. For example, you could declare a certain set of data as a string. Then the assembler can emit an error when you attempt to do arithmetic on this data.
Undoubtedly, types were invented in the higher-level languages and have migrated down the language stack as their utility began to be recognised.
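The string example can be sketched as a small checking pass. This is a hypothetical mini-assembler written in Python for illustration: the `data` directive, the tuple format and the `check` function are all assumptions, not the syntax of any real assembler.

```python
# Hypothetical assembler pass: data labels carry a declared type, and
# arithmetic instructions reject operands that were declared as strings.
# The directive and mnemonic names are made up for this sketch.

ARITHMETIC = {"add", "sub", "mul"}


def check(program):
    """program is a list of 3-tuples: ("data", label, type) or
    (mnemonic, dst, src). Returns a list of error messages."""
    types = {}
    errors = []
    for lineno, (op, a, b) in enumerate(program, start=1):
        if op == "data":
            types[a] = b  # remember the declared type of this label
        elif op in ARITHMETIC:
            for operand in (a, b):
                if types.get(operand) == "string":
                    errors.append(f"line {lineno}: {op} on string '{operand}'")
    return errors
```

For example, `check([("data", "msg", "string"), ("data", "n", "int"), ("add", "n", "msg")])` reports an error for line 3, while the same program with `msg` declared as `int` passes.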

Ta Thanh Dinh · Answer

Can we say that assembly is generally untyped?

If by "assembly" you mean, e.g., x86 assembly language, then I think yes, to some degree. Types are constraints that can be statically checked or proved, and there is very little (though not nothing) that we can check statically in an x86 assembly program, e.g.

add rcx, [@addr]
jmp rcx

So it's possible to infer that [@addr] is a 64-bit memory value starting at @addr, but it's very hard to statically check whether the following jmp rcx is safe, because we would need to know whether rcx holds a valid address that contains executable code.

Another example is

mov rax, @pointer0
mov rcx, @pointer1
add rax, rcx

Suppose that @pointer0 and @pointer1 are valid pointers. Then this program should not type-check, because adding two pointers seems meaningless. But x86 assembly allows it.
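A minimal sketch of the check that is missing here, written in Python. The two kinds (`"ptr"` and `"int"`), the tuple encoding of instructions and the `check` function are assumptions made for illustration. They are not part of x86 or of any existing tool.

```python
# Sketch of a "kind" checker for mov/add over registers.
# Assumed rule: int + int -> int, ptr + int -> ptr, ptr + ptr -> error.

def check_add(dst_kind, src_kind):
    if dst_kind == "ptr" and src_kind == "ptr":
        raise TypeError("cannot add two pointers")
    return "ptr" if "ptr" in (dst_kind, src_kind) else "int"


def check(program, symbols):
    """program: list of (op, dst, src); symbols: label -> kind.
    Operands that are neither a tracked register nor a known label
    are treated as plain integers."""
    regs = {}
    for op, dst, src in program:
        src_kind = regs.get(src, symbols.get(src, "int"))
        if op == "mov":
            regs[dst] = src_kind
        elif op == "add":
            regs[dst] = check_add(regs.get(dst, "int"), src_kind)
    return regs
```

With `@pointer0` and `@pointer1` both declared as `"ptr"`, the three instructions above are rejected, whereas adding an integer offset to a pointer is accepted and yields a pointer.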

If we add an "advanced" type system into x86 assembly language, then... it wouldn't be this language anymore (the type system isn't independent from the language).

Arguments for having a typed assembly language?

Typed assembly language (TAL) refers to intermediate languages (it does not mean adding a type system to the existing x86 assembly language). Its goal is (verbatim from the paper):

...to provide a fully automatic way to verify that programs will not violate the primitive abstractions of the language.

That means: given a program in a high-level language that has been type-checked to satisfy some properties, when it is compiled into another program (in TAL), we can still check (in TAL) that these properties are satisfied.

Artelius · Answer

Many assembly languages do have certain features that could be considered static typing. Most often this is for making programming easier, rather than type checking.

In many assembly languages you can define the equivalent of C's structs and unions. Many assembly languages also allow the usage of arrays, where the type (in the sense of byte-count) of the elements is determined at assemble time.

N.B. "x86 assembly" is a very vague term; every assembler has its own dialect. I'll refer to MASM, as it is one of the most featureful assemblers.

It says a lot that MASM has the TYPEDEF keyword. It also has the ASSUME keyword, which marks the value of a register as a pointer to a specific type; this affects later uses of the register. You can also apply LENGTHOF, SIZEOF, and TYPE to arrays; these return information that is available statically, at assemble time.
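To make the relationship between the three array operators concrete, here is a small Python model. It assumes the usual MASM meaning (TYPE is the size of one element in bytes, LENGTHOF is the number of elements, SIZEOF is their product); the `ArrayDecl` class and its method names are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass

# Sizes in bytes of a few MASM data types.
ELEMENT_SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8}


@dataclass(frozen=True)
class ArrayDecl:
    """Models a declaration such as: arr DWORD 10 DUP(0)"""
    element: str
    count: int

    def type_of(self):
        # Analogue of TYPE: bytes per element
        return ELEMENT_SIZES[self.element]

    def length_of(self):
        # Analogue of LENGTHOF: number of elements
        return self.count

    def size_of(self):
        # Analogue of SIZEOF: total bytes
        return self.type_of() * self.length_of()
```

For an array of ten DWORDs this gives 4, 10 and 40 respectively. All three values are known from the declaration alone, so nothing has to be computed at run time.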

In MASM, there is a distinction between near and far addresses and variables can be declared with certain types. Structs can be defined with a particular alignment which influences how the fields are accessed. Structs, unions and arrays can be nested. Bitfield types also exist.

ARM assembly is intended to be a compiler target rather than convenient for humans (as far as assemblies go) so it doesn't have any of these features.

Regarding machine code itself: theoretically, a computer architecture can attach extra bits to each word to indicate the type of that word. Some old computer architectures did use an extra bit to distinguish data from pointers.
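The tag-bit idea can be sketched as follows. This Python model assumes a 32-bit payload with one extra tag bit above it; the layout and the helper names are invented for illustration and do not describe any specific historical machine.

```python
# Sketch of a tagged word: one extra bit marks the word as a pointer.
# Assumed layout: bit 32 is the tag, bits 0..31 are the payload.

TAG_BIT = 1 << 32
PAYLOAD_MASK = TAG_BIT - 1


def make_int(value):
    return value & PAYLOAD_MASK


def make_ptr(address):
    return TAG_BIT | (address & PAYLOAD_MASK)


def is_ptr(word):
    return bool(word & TAG_BIT)


def load(memory, word):
    """Dereference a word, refusing anything not tagged as a pointer."""
    if not is_ptr(word):
        raise TypeError("dereferencing a non-pointer word")
    return memory[word & PAYLOAD_MASK]
```

Here the check happens when the word is used, so it is closer to dynamic typing enforced by hardware than to the static checks discussed in the other answers.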

D.W. · Answer

Assembly language is normally untyped, in the sense that there is no type-checking. Adding type-checking is a non-trivial research challenge (hence the papers you see). Papers on typed assembly language should explain the motivation. One application is that they can be used to support proof-carrying code, which can be used to securely execute untrusted code. Another potential application is to support formal verification. But I suggest reading some of the key papers to see what they say about the applications and motivation for typed assembly language.

You can find other papers yourself by doing a literature search -- see https://crypto.stackexchange.com/q/8316/351.
