Boot Loader for Bit-Slice Processor

Exploring the high-level hardware block diagrams of some microcontrollers (hereafter, MCUs), we find that they include an MPU, system clocks and peripherals, ROM, FLASH, RAM and a Boot Loader. The MPU, in turn, usually contains a Controller of Command Sequence (hereafter, Sequencer). The Sequencer feeds instructions to the MPU and handles jumps, calls, returns and memory accesses. Depending on the MCU's complexity and purpose, the task of the Boot Loader may be to copy the software program from internal FLASH, ROM or some external resource to fast RAM. The loaded software can then be started from a prescribed address for regular operation or debugging. So, in general, the Boot Loader is a separate hardware resource inside the chip. In this article I'll briefly explain a Boot Loader that, instead of additional separate resources, actively uses the Sequencer: a sort of hardware bootstrap.
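For contrast with the sequencer-based approach, here is a minimal C sketch of what a conventional boot loader does: copy a program image from slow non-volatile memory into fast RAM and hand back the address at which execution should start. The names `boot_image`, `boot_copy` and `ram`, and the 4K-word RAM size, are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_WORDS 4096u /* hypothetical RAM size in 64-bit words */

/* A program image kept in slow memory (FLASH, ROM or an external device). */
typedef struct {
    const uint64_t *words;      /* image contents                       */
    size_t          count;      /* number of words in the image         */
    size_t          load_addr;  /* first RAM word to fill               */
    size_t          entry_addr; /* where execution starts after loading */
} boot_image;

static uint64_t ram[RAM_WORDS]; /* fast program RAM */

/* Copy the image into RAM and return its entry address,
   or -1 if the image or the entry point does not fit into RAM. */
long boot_copy(const boot_image *img)
{
    if (img->load_addr > RAM_WORDS ||
        img->count > RAM_WORDS - img->load_addr ||
        img->entry_addr >= RAM_WORDS)
        return -1;
    memcpy(&ram[img->load_addr], img->words, img->count * sizeof ram[0]);
    return (long)img->entry_addr;
}
```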

In 1989 I was given the task of developing a Graphics Accelerator (hereafter, GA) based on bit-slice processors. The AM29116, quite modern and powerful at that time and without a built-in sequencer, was chosen as the MPU. It had an accumulator, an ALU, a status register, a barrel shifter and 32 fast on-chip 16-bit registers, and provided 167 instructions, including bit manipulation and CRC generation. Such open-architecture devices had no software support at all. Therefore, my job also included defining a micro-assembly language and using it to create a library of basic Graphical Kernel System (GKS) primitives (polygons, ellipses, etc.) to be used by C programmers.

The first version of the GA had an 80-bit program word and did not fit on the standard PCB size adopted for the whole device of which the GA was a part. After deeper prototyping of the algorithms and functions in PDP-11 assembler (the best I have ever met, thanks to its strong, simple and clear organization), the word width was reduced to 64 bits. Funnily enough, the chips that could not be placed on the PCB were exactly those intended for a conventional Boot Loader built from counters and some other logic elements.

One of my favorite expressions at that time was 'a wife is the engine of progress'. Wise guys argued: 'a wife's wardrobe is the engine of progress'. Later, the most learned people declared love a disease and closed the case :). As for the GA schematic, I can say that it was created under the slogan 'PCB size limitation is the engine of progress'. The next figure presents the first two pages of the schematic, which relate to the subject of this article. The GA was not initially intended for medical equipment, but as far as I know it was also used in a few tomographs made in my country right before Gorby's 'perestroika'.

I'm presenting this poor-quality photo just to prove that the idea was implemented, because I had a very hard and nervous time trying to explain how it would work and did not get my chief's understanding until the assembled board came back from the factory. Probably nobody nowadays needs a detailed schematic of a bit-slice based MCU. On the other hand, the idea might be interesting, for example, to chip designers. You know, in 1995 I was invited to a Blood Center in the USA to show the very first version of my own 'Blood Center Manager' software. After the demonstration it was approved by the facility owner, but the day before installation I found a newspaper article titled something like 'FDA scrutiny of software which can be flexibly tuned to user needs'. My software did not have any adjustments. The next day, in spite of my host's insistence that I set up the software, I packed my suitcase and went back home. But at the end of 1996 I brought my first flexible software. So, who knows what kind of idea you may come up with after reading this article.

[Figure: GPU, first two pages of the GA schematic]

Brief description of the blocks shown in the schematic:

  • 1, 5: bus drivers
  • 2: two digital comparators
  • 3: system clock generator
  • 4: connector to the host
  • 6, 11, 15: registers
  • 7: sequencer (AM2910)
  • 8: interrupt controller
  • 9: 64-bit RAM
  • 10: 64-bit registers (RAM sync in parallel mode, or RAM load/read in serial mode)
  • 12, 13: multiplexers
  • 14: MPU (AM29116)

Explanation of Idea

As the first diagram shows, instead of ROM the GA uses a fast 4K x 64-bit RAM to store the program image. The RAM's address bus is controlled by the Sequencer, while the image itself is loaded via the Serial/Parallel Register (hereafter, SPR). Loading the RAM is a simple procedure: the host shifts each 64-bit program word into the SPR through the 'Serial Data In' line, clocking it with 'Serial Data Shift', and stores it to the RAM using the CE and WR/RD lines. Controlling the RAM address, however, is a little tricky and needs a short explanation of the functions the Sequencer supports with its 16 instructions:

  • jump to 0-address (instruction code 0x0)
  • auto-increment of the address by 1
  • repetition of the same address
  • conditional and unconditional jump to an address stored in an internal register or presented on its input bus
  • writing to the internal stack and the internal loop counter
  • looping while the internal counter is not empty
  • etc.
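The functions listed above are essentially rules for computing the next program address. The C sketch below models a few of them as a next-address function over a 12-bit (4K-word) address space. It is a simplified, hypothetical model: apart from code 0x0 for 'jump to 0-address', the operation names and codes are invented here and do not reproduce the real AM2910 instruction encoding, and the internal stack is reduced to a single register.

```c
#include <stdint.h>

#define ADDR_MASK 0x0FFFu /* 12-bit address: 4K words of program RAM */

/* Hypothetical subset of sequencer functions; only SEQ_JUMP_ZERO = 0x0
   comes from the text, the other codes are made up for this sketch. */
typedef enum {
    SEQ_JUMP_ZERO = 0x0, /* next address = 0                            */
    SEQ_CONTINUE,        /* next address = current + 1                  */
    SEQ_REPEAT,          /* next address = current (same word again)    */
    SEQ_JUMP,            /* next address = d if cond, else current + 1  */
    SEQ_LOAD_COUNTER,    /* counter = d, then continue                  */
    SEQ_PUSH,            /* stack top = d, then continue                */
    SEQ_LOOP_COUNTER     /* counter != 0: counter--, continue;
                            counter == 0: jump to stack top             */
} seq_op;

typedef struct {
    uint16_t pc;        /* current microprogram address */
    uint16_t counter;   /* internal loop counter        */
    uint16_t stack_top; /* top of the internal stack    */
} sequencer;

/* Advance the sequencer by one microinstruction cycle. */
void seq_step(sequencer *s, seq_op op, uint16_t d, int cond)
{
    uint16_t next = (uint16_t)((s->pc + 1u) & ADDR_MASK);

    switch (op) {
    case SEQ_JUMP_ZERO:    s->pc = 0;                                       break;
    case SEQ_CONTINUE:     s->pc = next;                                    break;
    case SEQ_REPEAT:       /* pc unchanged */                               break;
    case SEQ_JUMP:         s->pc = cond ? (uint16_t)(d & ADDR_MASK) : next; break;
    case SEQ_LOAD_COUNTER: s->counter = d; s->pc = next;                    break;
    case SEQ_PUSH:         s->stack_top = (uint16_t)(d & ADDR_MASK);
                           s->pc = next;                                    break;
    case SEQ_LOOP_COUNTER:
        if (s->counter != 0) {
            s->counter--;
            s->pc = next;
        } else {
            s->pc = s->stack_top;
        }
        break;
    }
}
```

Feeding SEQ_LOOP_COUNTER repeatedly after loading the counter and pushing an address walks the address forward once per cycle and then jumps to the saved address, which is roughly the behaviour a counter-driven loading loop needs.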

In addition, to synchronize the different blocks, the 'System Clocks' generate a fairly complex timing pattern.

Therefore, by default the GA's program image can be loaded starting from address 0: the host holds the 'Run' line low, which resets the 'Register' and passes the instruction 'jump to 0-address' to the Sequencer. Next, the host sets the 'Run' line high and 'pumps' the SPR with a short initial pattern of instructions, which the 'System Clocks' write to the Sequencer via the appropriate 'Registers'. During this phase, RAM writing is disabled by the 'CE' line. As a result, depending on the task, the Sequencer may be loaded with the following initial control data:

  • the loop counter (program image size)
  • the address from which image loading must start
  • the address from which the program will start after the image is loaded (unless the 'System Clocks' set a HALT condition)
  • the instruction 'auto-increment the address and repeat while the internal counter is not empty' (this must be the very last one, because actual loading of the RAM begins from that moment)

Note that you may load just part of the whole image, which is useful for debugging. That's all there is to the loader.
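The host-side procedure described above can be simulated in a few lines of C. This is a sketch under stated assumptions, not the GA's actual logic: each 64-bit word is shifted into the SPR MSB first, one bit per 'Serial Data Shift' clock; the 'CE'/'WR' strobe is reduced to a single array store; and the counter, load address and start address are held in a hypothetical `loader_state` structure. The names `spr_shift`, `boot_load` and `ram` are also hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_WORDS 4096u            /* 4K x 64-bit program RAM    */
#define ADDR_MASK (RAM_WORDS - 1u) /* 12-bit address wrap-around */

static uint64_t ram[RAM_WORDS];

/* Hypothetical control data placed in the sequencer by the initial pattern. */
typedef struct {
    uint16_t counter;    /* words still to load (program image size) */
    uint16_t load_addr;  /* current RAM write address                */
    uint16_t start_addr; /* where the program starts after loading   */
} loader_state;

/* One 'Serial Data Shift' clock: the SPR shifts left by one bit and
   takes the next bit from 'Serial Data In' (MSB-first is assumed). */
static uint64_t spr_shift(uint64_t spr, unsigned serial_in)
{
    return (spr << 1) | (uint64_t)(serial_in & 1u);
}

/* Load st->counter words of 'image' into RAM, one per auto-increment
   step, and return the address from which the program would start. */
uint16_t boot_load(loader_state *st, const uint64_t *image)
{
    for (size_t i = 0; st->counter != 0; ++i) {
        uint64_t spr = 0;
        for (int b = 63; b >= 0; --b)                   /* 64 shift clocks     */
            spr = spr_shift(spr, (unsigned)(image[i] >> b) & 1u);
        ram[st->load_addr & ADDR_MASK] = spr;           /* CE + WR: store word */
        st->load_addr = (uint16_t)((st->load_addr + 1u) & ADDR_MASK);
        st->counter--;                                  /* repeat until empty  */
    }
    return st->start_addr;
}
```

For example, a state of {1, 0x200, 0x100} rewrites only the word at 0x200 and still returns 0x100 as the start address, which corresponds to the partial load useful for debugging.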

Probably not many modern developers know or remember bit-slice processors and their rather short but interesting history. So, using the GA as an example, I will add a few words about the software development process for such devices.

Microprogramming

First of all, you have to define an assembly language, which, in simple words, is an easy-to-use textual representation of each 64-bit command the MCU can execute. For example, as I mentioned above, the AM29116 has a set of 167 instructions, which is a huge number compared to common RISC MCUs. Its instruction bus is 16 bits wide, plus some bits support access to the internal registers and some special ports. If you try to describe all possible variants of the 64-bit word with the different instructions of the MPU, the Sequencer and the peripherals, you'll get stuck very soon. So a good idea is to start by describing logical groups of bits according to their purpose. But even this approach is not enough to create an easy-to-use assembly language. For example, some commands must include conditional branch fields, while others do not need them but must support simultaneous control of some peripheral blocks.

Therefore, you'll arrive at the idea of different language profiles. The GA's assembly language had almost 20 profiles. Here is a brief excerpt from a few of them, just to give a taste of the process. For example, compare the length of the FUNC field in PROFILE #1 (8 bits) and PROFILE #9 (21 bits). The symbol X means that the programmer may skip defining the field's value in the program and use that symbol instead, as the default.

PROFILE #1
WORD LEN 64
FIELDS NUMBER 6

FIELD #1 FUNC
LENGTH 8 TYPE SYM
BITS 20,19,18,13,12,11,10,34

SET/R   11111011
....  
TST/R   11111111

 

FIELD #2 CLC
LENGTH 5 TYPE DIG
BITS 17,16,15,14,31

X   11110
0   00001
1   00011
....  
> 0   00001

 

FIELD #3 OPERAND
LENGTH 28 TYPE SYM
BITS 30,9,8,7,...24

R0   1111111111000110011000111001
....  
@RB2   1111111101000110011001111001

 

FIELD #4 TEST
LENGTH 6 TYPE SYM
BITS 47,39,38,37,40,36

X   111111
NE   000011
....  
SYNCS/SS   11111011

 

FIELD #5 SQNS
LENGTH 5 TYPE SYM
BITS 64,63,62,61,48

X   11101
JZ   00001
JNZ   00000
....  
LOOP   11011

 

FIELD #6 JMP
LENGTH 12 TYPE SYM DIG
BITS 49 #12

X   111111111111
0   010001000000
1   010111110000
....  
DX-EY+   100111110000

 

….......................................................

PROFILE #9
WORD LEN 64
FIELDS NUMBER 5

FIELD #1 FUNC
LENGTH 21 TYPE SYM
BITS 20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,34

SETONCZ   011101110100001100111
SETL   011101110100010101010
....  
INC/A   1111110010000000100010
COM/#A   111110101110000100010

 

FIELD #2 MANPER
LENGTH 20 TYPE CONST
BITS 31,30,35,33,29,27,28,25,26,32,44,41,21,45,43,46,42,24

X 10010101111011111100

FIELD #3 TEST
REFER: same as TEST in PROFILE #1

FIELD #4 SQNS
REFER: same as SQNS in PROFILE #1

FIELD #5 JMP
REFER: same as JMP in PROFILE #1
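A profile is essentially a table telling the assembler where each field's bits go in the 64-bit word. The sketch below shows one way to scatter a field's symbol pattern onto its BITS positions, using the CLC field of PROFILE #1 as data. Two conventions are assumptions of this sketch, not necessarily those of the original tool: bits are numbered 1 to 64, with bit n meaning the value 1 << (n-1), and the leftmost character of a pattern goes to the first position in the BITS list. The names `field_symbol`, `field_def` and `pack_field` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of a field's symbol table, e.g. "X" -> "11110". */
typedef struct {
    const char *name;
    const char *pattern; /* bit pattern as written in the profile */
} field_symbol;

/* One field of a profile: LENGTH, BITS and its symbol table. */
typedef struct {
    const char         *name;
    size_t              length;    /* LENGTH                         */
    const int          *positions; /* BITS, numbered 1..64 (assumed) */
    const field_symbol *symbols;
    size_t              nsymbols;
} field_def;

/* CLC field of PROFILE #1 (only the symbols shown in the excerpt). */
static const int clc_positions[] = {17, 16, 15, 14, 31};
static const field_symbol clc_symbols[] = {
    {"X", "11110"}, {"0", "00001"}, {"1", "00011"},
};
static const field_def clc_field = {"CLC", 5, clc_positions, clc_symbols, 3};

/* Write the pattern of 'symbol' into 'word' at the field's bit positions.
   Returns 0 on success, -1 for an unknown symbol or a malformed entry. */
int pack_field(uint64_t *word, const field_def *f, const char *symbol)
{
    for (size_t i = 0; i < f->nsymbols; ++i) {
        const field_symbol *s = &f->symbols[i];
        if (strcmp(s->name, symbol) != 0)
            continue;
        if (strlen(s->pattern) != f->length)
            return -1;                          /* pattern/LENGTH mismatch */
        for (size_t k = 0; k < f->length; ++k)  /* validate before writing */
            if (f->positions[k] < 1 || f->positions[k] > 64)
                return -1;
        for (size_t k = 0; k < f->length; ++k) {
            uint64_t mask = (uint64_t)1 << (f->positions[k] - 1);
            if (s->pattern[k] == '1')
                *word |= mask;
            else
                *word &= ~mask;
        }
        return 0;
    }
    return -1;                                  /* symbol not in the table */
}
```

A length check like this is cheap and worthwhile: in the excerpt above, for instance, the SYNCS/SS pattern has 8 characters while the TEST field is 6 bits long.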

Once the language is defined, you are ready to program your own great graphics library for programmers working in a higher-level language such as C. What about 'draw_polygon(vector, fill_pattern, …)':

LIPT: TST/R 3 R30 EQ CJP LIPDT;
  MOV/#R R16 X X X;
  # 5777 X CJP LDTYPE;
  LIPDT : MOV/#R R16 X X X;
….  
  TF2S CJP LF;
  SHR0/RR R2 R2 X X X;
  TST/R 0 R28 EQ CJP SCND1;
  SET/R 15 R2 X X X;
SCND1: SHRL/RR R28 R28 X CJP LDHAT;
LF: SHR0/RR R28 R28 X X X;
  TST/R 15 R2 EQ CJP SCND2;
....  
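Before any profile can be applied, each source statement has to be split into a label and field values. The following is a minimal, hypothetical tokenizer for statements like those above. It assumes that a label is whatever precedes the first ':' (blanks before the colon are tolerated), that fields are separated by whitespace, and that ';' ends a statement; mapping the tokens onto a particular profile's fields is left out. The names `statement`, `parse_statement` and the size limits are invented for this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8  /* hypothetical limits */
#define MAX_TOKEN  32

typedef struct {
    char   label[MAX_TOKEN];             /* "" when there is no label */
    char   field[MAX_FIELDS][MAX_TOKEN]; /* field values, source order */
    size_t nfields;
} statement;

/* Copy n characters into a fixed buffer; -1 if they do not fit. */
static int copy_token(char *dst, const char *src, size_t n)
{
    if (n >= MAX_TOKEN)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split e.g. "LIPT: TST/R 3 R30 EQ CJP LIPDT;" into label and fields.
   Returns 0 on success, -1 on a missing ';' or buffer overflow. */
int parse_statement(const char *line, statement *st)
{
    memset(st, 0, sizeof *st);
    const char *end = strchr(line, ';');
    if (end == NULL)
        return -1;

    const char *p = line;
    const char *colon = memchr(line, ':', (size_t)(end - line));
    if (colon != NULL) {
        const char *q = colon;
        while (p < q && isspace((unsigned char)*p))     /* skip indentation  */
            p++;
        while (q > p && isspace((unsigned char)q[-1]))  /* blanks before ':' */
            q--;
        if (copy_token(st->label, p, (size_t)(q - p)) != 0)
            return -1;
        p = colon + 1;
    }

    while (p < end) {
        while (p < end && isspace((unsigned char)*p))
            p++;
        if (p == end)
            break;
        const char *start = p;
        while (p < end && !isspace((unsigned char)*p))
            p++;
        if (st->nfields == MAX_FIELDS ||
            copy_token(st->field[st->nfields], start, (size_t)(p - start)) != 0)
            return -1;
        st->nfields++;
    }
    return 0;
}
```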

 

As you see, even 25 years ago it was quite a simple task :). Nowadays you can easily place the whole GA into a single FPGA: using IP cores for ARM, RAM and logic elements, describe it in C, VHDL or Verilog and test it beforehand, without hardware prototyping, with the help of, for example, the Quartus and ModelSim design tools. What wonderful times!

P.S. Since similar devices are used as parts of larger computing systems, supporting the boot process at the system level is the next level of the designer's concern. In the old days even the Intel Boot Initiative did not exist, so such support was not standardized. When developing modern devices, you certainly need to pay serious attention to the implementation of the Unified Extensible Firmware Interface (UEFI) and Platform Initialization (PI).