Retrocomputing Asked on January 3, 2022
On the Intel 8080, all conditional instructions have varying execution time depending on whether the condition is true or not, except conditional jumps. Why is this?
The Intel 8085 changes this; there, all conditional instructions are faster in cycles (but not necessarily in states), and conditional jumps now have varying execution time too, with the 8080 cycles/states being the worst case (i.e. if the condition is true).
The Intel 8085 Programming Manual partly answers this:
Execution of conditional instructions on the 8085 differs from the 8080. The 8080 fetches all three instruction bytes whether or not the condition is satisfied. The 8085 evaluates the condition while it fetches the second instruction byte. If the specified condition is not satisfied, the 8085 skips over the third instruction byte and immediately fetches the next instruction. Skipping the unnecessary byte allows for faster execution.
So on the 8080, all conditionals fetch the third byte regardless of whether the condition is satisfied, but this doesn’t really answer why all conditionals on the 8080 vary in execution time except jumps. Obviously jumps are much quicker to execute than calls and returns; if a conditional call or return is true, stack manipulation is needed. But how can setting the PC for a jump be done in "zero time"? What is the technical reason for that?
I noted that later on the manual says the following:
Notice that two sets of cycle/state specifications are given for 8085 conditional call and jump instructions. This is because the 8085 fetches the third instruction byte only if it is actually needed; i.e., the specified condition is satisfied.
I’m not sure why it mentions conditional call here, since that also had varying execution time on the 8080; conditional returns also have two sets of cycle/state specifications on the 8085 for obvious reasons.
I'm not really clear what the underlying problem/question is, as you already cite all the relevant information needed to answer what seems to be the question. Thus it's hard to give a straight answer, meaning I have to 'swing along'.
To start with, let's look at the cycle data:
                       8080                  8085
Branching              Cycles/States         Cycles/States
Instruction   Bytes    Taken   Not-Taken     Taken   Not-Taken
JMP           3        3/10    ---           3/10    ---
Jcond         3        3/10    3/10          3/10    2/7
CALL          3        5/17    ---           5/18    ---
Ccond         3        5/17    3/11          5/18    2/9
RET           1        3/10    ---           3/10    ---
Rcond         1        3/11    1/5           3/12    1/6
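To make the differences easier to see, the table can be put into a small lookup. This is only an illustrative sketch in Python; the names `STATES` and `not_taken_saving` are made up for this example, and the numbers are simply the T-state counts from the table above.

```python
# (taken, not_taken) T-state counts copied from the table above.
# None marks unconditional instructions, which have no not-taken case.
STATES = {
    "8080": {"JMP": (10, None), "Jcond": (10, 10), "CALL": (17, None),
             "Ccond": (17, 11), "RET": (10, None), "Rcond": (11, 5)},
    "8085": {"JMP": (10, None), "Jcond": (10, 7), "CALL": (18, None),
             "Ccond": (18, 9), "RET": (10, None), "Rcond": (12, 6)},
}


def not_taken_saving(cpu, instr):
    """T-states saved when a conditional instruction is not taken."""
    taken, not_taken = STATES[cpu][instr]
    if not_taken is None:
        raise ValueError(f"{instr} is unconditional")
    return taken - not_taken


if __name__ == "__main__":
    for cpu in STATES:
        for instr in ("Jcond", "Ccond", "Rcond"):
            print(cpu, instr, not_taken_saving(cpu, instr))
```

On the 8080 the saving for Jcond comes out as zero, which is exactly the oddity the question asks about.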
It is notable that Intel summarises all PC-changing instructions (these plus RST and PCHL) under Branching Instructions.
To understand how an 80-style (*1) CPU works, it's important to keep in mind that it's organized as
For this we can ignore the Phi1/2 clocks - they are only relevant if one wants to dissect how registers are loaded (*4).
Within an instruction the machine cycles are simply numbered in order as M1..M5. There are 10 types (*5). For branching only 5 are relevant:
Further it's helpful to have a look at the internal structure:
(Taken from p.4-2 of the 1979 MCS-80/85 Family User's Manual)
Please note the yellow-marked registers, especially W and Z, but also TMP and the buffers.
The first M-cycle is always instruction fetch and execution and takes 4 or 5 clock cycles: 3 to fetch the opcode, one or two to perform the operation.
- In T1 the address (PC) and the Status Word are output
- In T2 PC is incremented
Interesting for timing considerations are T4/T5. They cover 3 different cases:
- Only T4, no operation. Usually with instructions that load data later on.
- Only T4, for an internal operation. For example, all ALU operations load A into the ALU latch; the ones using a register as second operand load it into TMP.
- T4 and T5, for an internal operation. For example, move instructions between registers move the source register content into TMP in T4 and into the destination register in T5.
A detail often overlooked is that the 8080 does do pipelining. All single-byte ALU (*6) operations (i.e. with no operand or a register operand) are only finished in T2 of the following M1 (fetch) cycle, as the ALU result is only then copied into A.
Depending on the instruction, this cycle will read/write memory and optionally increment the PC.
The address is output from:
- PC, when reading a second instruction byte
- HL for all instructions using memory addressing (ADD M)
- SP in case of stack instructions (PUSH/POP/CALL/RET)
- BC or DE in case of STAX/LDAX
The register that supplied the address is then updated:
- PC when PC addressing was used
- SP when SP addressing was used
(To simplify, from here on only what's relevant for branches.)
The byte read is then stored:
- Instructions fetching an address (JMP/Jxx/CALL/Cxx) move the data into Z
- For returns (RET/Rxx) the value is loaded into the low part of the destination register - PCL in case of RET.
Much like M2, now for the second byte:
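The address is output from:
- PC, when reading a third instruction byte
- SP in case of stack instructions (RET)
The register that supplied the address is then updated:
- PC when PC addressing was used
- SP when SP addressing was used
The byte read is then stored:
- Instructions fetching an address move the data into W
- For returns (RET) the value is loaded into the high part of the destination register - PCH in case of RET.
For RET/Rxx the instruction ends here, as PC is now prepared to fetch the next instruction (usually the one after CALL/Cxx).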
Finishing a JMP/Jxx instruction is again pipelined into the following instruction. The address collected in WZ is output in T1 of M1 of the following instruction (instead of PC) and is sent via the incrementer into PC in T2 (WZ+1->PC).
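The trick of fetching through WZ can be sketched as a toy model. This is not a real 8080 emulator, just an illustration of the address path as described above; the class `BranchModel` and its method names are hypothetical and only exist for this example.

```python
class BranchModel:
    """Toy model of the address path only; not a full 8080 emulator."""

    def __init__(self, pc=0):
        self.pc = pc
        self.wz = 0
        self.fetch_from_wz = False  # set by a jump whose target sits in WZ

    def jump(self, target):
        # M2/M3 collected the target in WZ; PC itself is not written here.
        self.wz = target & 0xFFFF
        self.fetch_from_wz = True

    def fetch_m1(self):
        # T1: put an address on the bus, from WZ after a jump, else from PC.
        address = self.wz if self.fetch_from_wz else self.pc
        self.fetch_from_wz = False
        # T2: the incrementer output goes to PC (WZ+1 -> PC or PC+1 -> PC).
        self.pc = (address + 1) & 0xFFFF
        return address


if __name__ == "__main__":
    cpu = BranchModel(pc=0x0103)
    cpu.jump(0x2000)
    print(hex(cpu.fetch_m1()), hex(cpu.pc))
```

In this model a jump never spends a step on writing PC: the next fetch simply takes its address from WZ, and PC picks up the incremented value as part of the normal fetch.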
For calls (CALL/Cxx): as the PC now holds the address of the next instruction after CALL/Cxx, it only needs to be pushed:
M4: SP supplies the address, SP is updated, and PCH is written.
Repeated for M5 with PCL.
Like jumps before (M3 1/2), calls pipeline into the following instruction. The address output in T1 of M1 of the following instruction is provided by WZ instead of PC, and in T2 WZ is sent via the incrementer into PC.
It's easy to see that conditional calls and returns that are not taken do not (and should not) perform any stack access. Thus M4/M5 for Cxx and M2/M3 for Rxx will not be executed - saving 3 T-states each (6 total).
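The arithmetic behind this can be written down as a short sketch. The per-cycle split below is an assumption chosen to be consistent with the 8080 totals in the timing table; the names `M_CYCLES_8080`, `CYCLES_WHEN_NOT_TAKEN` and `states` are made up for this example.

```python
# T-states per machine cycle on the 8080, M1 first. The split is an
# assumption that is consistent with the totals in the timing table.
M_CYCLES_8080 = {
    "Jcond": [4, 3, 3],        # fetch, address low byte, address high byte
    "Ccond": [5, 3, 3, 3, 3],  # as above, plus M4/M5 pushing PCH and PCL
    "Rcond": [5, 3, 3],        # fetch, then M2/M3 popping PCL and PCH
}

# Number of machine cycles that still run when the condition is false.
CYCLES_WHEN_NOT_TAKEN = {"Jcond": 3, "Ccond": 3, "Rcond": 1}


def states(instr, taken):
    """Total T-states for a conditional branch on the 8080."""
    cycles = M_CYCLES_8080[instr]
    if not taken:
        cycles = cycles[:CYCLES_WHEN_NOT_TAKEN[instr]]
    return sum(cycles)


if __name__ == "__main__":
    for instr in M_CYCLES_8080:
        print(instr, states(instr, True), states(instr, False))
```

Only the stack cycles are dropped when the condition is false, which is why Jcond, having none, costs the same either way on the 8080.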
While the 8085 in general does everything like the 8080, branch as well as stack instructions have changed in timing. Three changes are relevant here:
- Conditional jumps (Jxx) now skip fetching the second address byte when not taken
- Cxx likewise skips fetching the second address byte when not taken
- CALL/Cxx/Rxx/RST now have an M1 cycle of 6 T-states
Skipping the second address byte of a conditional jump/call is possible due to a modification enabling the incrementer to increment by two as well.
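As a rough sketch of that skip, assuming the incrementer simply adds 2 instead of 1 after the first address byte has been read: the function names `pc_after_m2` and `jxx_states_8085` are hypothetical, and the per-cycle T-state split is an assumption matching the 10/7 figures in the table.

```python
def pc_after_m2(pc_at_byte2, condition_true):
    """PC after byte 2 of a 3-byte Jxx/Cxx has been fetched (8085 model)."""
    # +1 moves on to the third byte; +2 steps over it to the next instruction.
    step = 1 if condition_true else 2
    return (pc_at_byte2 + step) & 0xFFFF


def jxx_states_8085(condition_true):
    """T-states of a conditional jump on the 8085."""
    m1, m2, m3 = 4, 3, 3  # assumed split, matching 10 taken / 7 not taken
    return m1 + m2 + (m3 if condition_true else 0)


if __name__ == "__main__":
    # Jxx at 0x0100: opcode at 0x0100, address bytes at 0x0101 and 0x0102.
    print(hex(pc_after_m2(0x0101, False)), jxx_states_8085(False))
```

With the instruction at 0x0100, a false condition leaves PC at 0x0103, the next instruction, without M3 ever running.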
Now for the question
On the Intel 8080, all conditional instructions have varying execution time depending on whether the condition is true or not, except conditional jumps. Why is this?
The savings are only for stack-related branching instructions (Cxx/Rxx), as it wouldn't be a good idea to manipulate the stack if they are not taken (see "Conclusion" above).
The Intel 8085 changes this; there, all conditional instructions are faster in cycles (but not necessarily in states)
No, they are not. Conditional calls and returns are in fact one T-state slower (M1 takes 6 instead of 5 T-states); only when not taken do Jxx and Cxx save the 3 T-states of M3.
So on the 8080, all conditionals fetch the third byte regardless of whether the condition is satisfied, but this doesn't really answer why all conditionals on the 8080 vary in execution time except jumps.
Because of no stack operations (see above)
But how can setting the PC for a jump be done in "zero time"? What is the technical reason for that?
By using WZ to address the target instruction and loading PC only afterwards (for free after incrementing WZ). See the pipelining part above.
I'm not sure why it mentions conditional call here, since that also had varying execution time on the 8080
Because there's a second set only valid for the 8085?
*1 - 8080/8085/Z80 and most derivatives
*2 - There are exceptions
*3 - If this sounds similar to a 6502, then it's because two non-overlapping clocks deliver 4 edges, the basic building block of many CPUs. The main difference is that the 8080 needs an external clock generator (8224), while the 6502 and 8085 have it built in.
*4 - As so often, Ken Shirriff provides a great in depth analysis of the 8085 register set and its detailed workings (basically applicable for the 8080 as well). Including an easy example down to the separate clocks. Go there for all fancy details - I'll rather stay at the surface for this.
*5 - Cycles are marked by the status word, outputted on the data line during T1 of M1 (marked by Sync).
*6 - There are others as well, like XCHG, EI/DI or HALT.
Answered by Raffzahn on January 3, 2022