Friday, January 18, 2008

irecho

IR Echo (infra red remote control stuff)

irecho-20080118a.tar.gz

I have been working on this project for right around 10 years now. No, not full time, not part time, some years I may think about it once and not act on it. I finally got around to working on it and actually finishing it.

I have a mid-1980s RCA TV, which at the time was a special-order, top-of-the-line thing. I wouldn't be surprised if it was one of the early RCA TVs to use an infrared remote. The long and short of it is that this TV has outlasted any TV purchased since. A couple of buttons on the TV, power in particular, have worn out, but the remote, despite several moves and 20+ years, is still working. If I lose the remote, though, the TV might become a brick. You cannot find this TV on Google searches, much less the remote. Either due to the rareness of the TV or how early its infrared remote is or isn't, it is not supported by any of the pre-programmed remote codes in universal remotes. Yes, some remotes will "record" IR and then retransmit it, but that has limited success.

So my thought was, why not program the remote for a TV it does know about, receive the codes for that TV and then blast the codes for the real TV (that the remote does not know about). Take that one step further and you could just as easily do the same thing to support a non xyz brand device using an xyz branded remote.

The problem I had in this specific case was figuring out the IR protocol/codes for the old tv. I didnt do anything different this time around, yet, this time around I was able to figure out the codes without too much trouble. Generating those codes has been interesting, and most of the problem there was I was trying to do it in C with gcc whose code generation and optimization varied widely.

IR remote control is really not complicated at all.
http://www.sbprojects.com/knowledge/ir/ir.htm
Explains a number of protocols quite well. I used to use the documentation for a program you could run on an HP48 calculator to turn it into a remote control. Basically, an IR LED is used and you blink it at some carrier frequency, 40KHz for example. Blink it on and off for so many microseconds or milliseconds, then leave it off for so many microseconds or milliseconds, and repeat this, varying the length of the blinking and off periods. The receiver modules I have used do the analog work for you and the output is a square wave, with the blinking (on/off) period represented by one output state ("high") and the non-blinking period by the other ("low"). The job for the microcontroller is to time the high and low periods and decode them based on a protocol. There are some standard protocols as described in the link above.

So for a number of reasons, after trying a few, I chose what the link above calls the Sony SIRC protocol. The IR receiver module I am using is the same as or similar to one you can get at Radio Shack (see below). The SB-Projects page says the Sony protocol is 40KHz and the receiver I am using is 40KHz, which is probably why I had better luck with that protocol than with the RCA protocol (a modern one, not the old one). When I used the RCA protocol it only worked between x and y feet from the receiver and you really had to aim right at the receiver. With the Sony protocol you can be anywhere in the room and don't really have to aim at the receiver; many walls and other surfaces will reflect the IR and it still works.

The Sony protocol starts with a 2.4ms pulse. I poll the IR input pin and wait for a state change. When there is a state change I grab the counter from a timer, poll for the next state change, grab the counter again, and subtract the two counter values. If the length of the pulse is right, plus or minus some acceptable slop (to allow for aiming at the walls or being really close to or really far from the receiver), call it good. From there I go on to decoding the code: the blank periods should all be 600us, plus or minus some slop, each followed by an on period that is either 1.2ms for a one or 600us for a zero. The code has 12 bits (24 state changes after the start pulse). If you manage to collect 12 bits without any of the periods between state changes being out of range, call it a good code and process it. Simple.
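
The decode steps above can be sketched in C. This is a minimal illustration that runs on a desktop, not the code from the tarball: it assumes the measured durations have already been collected into an array (start pulse first, then alternating blank and on periods), that the timer runs at 6MHz, and that a slop of 200us either way is acceptable. The names `sirc_decode` and `within_slop`, the slop value, and the choice to store the first received bit as bit 0 are all assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names; all timings are in timer ticks at an assumed 6 MHz clock. */
#define TICKS_PER_US 6u
#define START_TICKS  (2400u * TICKS_PER_US) /* 2.4 ms start pulse        */
#define ONE_TICKS    (1200u * TICKS_PER_US) /* 1.2 ms on period = one    */
#define SHORT_TICKS  (600u * TICKS_PER_US)  /* 600 us blank, or a zero   */
#define SLOP_TICKS   (200u * TICKS_PER_US)  /* assumed tolerance         */

/* True when measured is within SLOP_TICKS of expected. */
static int within_slop(uint32_t measured, uint32_t expected)
{
    return (measured + SLOP_TICKS >= expected) &&
           (measured <= expected + SLOP_TICKS);
}

/* durations[0] is the start pulse, followed by 12 (blank, on) pairs.
   Returns the 12-bit code, or -1 if any period is out of range. */
int sirc_decode(const uint32_t *durations, size_t count)
{
    uint32_t code = 0;
    unsigned int bit;

    if (count < 25) return -1;
    if (!within_slop(durations[0], START_TICKS)) return -1;

    for (bit = 0; bit < 12; bit++) {
        uint32_t blank = durations[1 + 2 * bit];
        uint32_t on    = durations[2 + 2 * bit];

        if (!within_slop(blank, SHORT_TICKS)) return -1;
        if (within_slop(on, ONE_TICKS)) {
            code |= 1u << bit;   /* assumption: first bit received is bit 0 */
        } else if (!within_slop(on, SHORT_TICKS)) {
            return -1;           /* neither a one nor a zero */
        }
    }
    return (int)code;
}
```

On the real board the array would be filled by the polling loop described above; here it is just an input so the range checks can be tried out on a PC.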

The LM3S811 eval board runs at 6MHz by default so the 2.4ms start pulse is 0.0024 * 6000000 or 14400 timer ticks. The 1.2ms pulse is half that at 7200 ticks and the 600us periods should be around 3600 ticks. It is really easy to tweak and or rearrange the example to receive other protocols. Likewise change the math above and you can do this on the LM3S6965 or LM3S1968 (or others). If I remember right the LM3S1968 did not have 3.3v and ground and an available input pin within reach of each other to easily solder in a receiver, so I primarily worked with the LM3S6965 and the LM3S811 for this project. I didnt want to dedicate the 6965 to this job so I bought another '811 to dedicate to this job. Actually I bought some msp430 ez430 modules to dedicate to this job but, my guess is, because of the instability of the internal clock I couldnt get it to work. I also struggled getting the uart to work to print out the codes as I was figuring them out and eventually gave up and bought the LM3S811 board.
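
The tick arithmetic at the start of the paragraph above generalizes to any clock rate. A minimal sketch, where `us_to_ticks` is a name made up for illustration:

```c
#include <stdint.h>

/* Convert a duration in microseconds to timer ticks for a given clock rate.
   Hypothetical helper; 64-bit intermediate avoids overflow in the multiply. */
uint32_t us_to_ticks(uint32_t clock_hz, uint32_t us)
{
    return (uint32_t)(((uint64_t)clock_hz * us) / 1000000u);
}
```

With a 6MHz clock, `us_to_ticks(6000000, 2400)` gives 14400, matching the number above. Assuming an 8MHz clock on one of the other boards, the same start pulse would be 19200 ticks.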

Decoding is half the fun. I used the same decoding method to finally figure out the protocol/codes for the older RCA remote. To blast the IR, I bought some IR LEDs from Radio Shack (part number?). You have to know or figure out the frequency at which to blink the LED and then bit blast that at a close enough rate to make it work. I figured the old remote was somewhere around 32KHz or some harmonic. I only really cared about it working at point blank range so I could be a little sloppy. The problem is that 6MHz/32000 is 187.5, which is not a whole number of timer ticks. Even if it were, I was struggling to get the C compiler to produce clean enough code to get the state changes reasonably close. I eventually gave up and wrote some hand-tuned assembler, which worked quite well but was, of course, tuned for my 6MHz LM3S811.


/* hand tuned ir blinker for the '811 */
.thumb_func
.global blinker
blinker:
;@ r0 times, r1 address
2:
mov r2,#0xFF
str r2,[r1]
mov r2,#29
1: sub r2,r2,#1
bne 1b
mov r2,#0x00
str r2,[r1]
mov r2,#29
1: sub r2,r2,#1
bne 1b
sub r0,r0,#1
bne 2b
bx lr



The idea was to pre-calculate the number of on/off blinks I would need, then feed this function the number of blinks and the address to write to for the IR LED. The code turns on the LED, runs a tight little loop that counts down from 29 (two instructions per loop, not counting the setup), then turns off the LED, counts down from 29 again, and repeats all of that r0 times. I didn't bother fussing with nops to make the whole loop a perfect square wave; the r0 counter is going to make it just a little off anyway. From C I timed it using the system timer: I had it do 150 or so blinks, divided the elapsed ticks by that number, and it came out to around 181.1 ticks per loop. I used that number to pre-calculate the number of loops I would need for each "on" period. For the blank periods of the IR output I know from decoding how long the on plus off period needs to be, so I grab the timer tick before blasting the IR and, after blasting the IR, wait until the timer has burned both the on and off periods of time before moving on to the next on pulse.



void blastir ( const unsigned short *data, unsigned int len )
{
unsigned int ra,rb,rc;

SetLedx();
ResetTimer();
for(rb=0;rb<4;rb++)
{
rc=GET32(TIMER2_BASE+GPTMTAR);
for(ra=0;ra < len;ra+=2)
{
blinker(data[ra+0],IRLED_BASE+GPIO_DATA+((IRLED_PIN) << 2));
rc-=data[ra+1];
while(GET32(TIMER2_BASE+GPTMTAR) > rc);
}
}
ResetLedx();
}
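
The pre-calculation described above (how many blinker loops fill a given "on" period) can be sketched like this. It is an illustration only: the function names are made up, and the only number taken from the post is the measured 181.1 ticks per loop, held here as tenths of a tick so the math stays in integers.

```c
#include <stdint.h>

/* 181.1 ticks per blinker loop, as measured above, stored in tenths of a tick. */
#define TICKS_PER_BLINK_X10 1811u

/* Number of blinker loops that best fills an "on" period of on_ticks
   (rounded to nearest). Hypothetical helper. */
uint32_t blinks_for_on_period(uint32_t on_ticks)
{
    return (on_ticks * 10u + TICKS_PER_BLINK_X10 / 2u) / TICKS_PER_BLINK_X10;
}

/* Effective carrier frequency in Hz that the loop produces at clock_hz. */
uint32_t carrier_hz(uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)clock_hz * 10u) / TICKS_PER_BLINK_X10);
}
```

At 6MHz the loop works out to a carrier of about 33.1KHz, in the neighborhood of the 32KHz guess. Each pair in the table handed to blastir would then be the blink count for the on period followed by the total on plus off time in ticks.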



Again, its actually pretty simple stuff.

One thing to note if you ever try this and perhaps try this with two modern or similar devices. If they use the same code or a code with time periods that are similar you may confuse the target device. The remote transmits some code using protocol A, both your microcontroller and your target device (TV, VCR, whatever) see the code from the remote, your microcontroller turns around and transmits another code using protocol A. The human may still be pressing the button on the remote, now the target device is getting hit with IR from the remote and the IR from your microcontroller and it gets confused and doesnt react to either.

IR Receiver Module. Digikey part number 425-1904-ND
Looks like it might be similar to the module available at Radio Shack, although I found out the hard way that the pinout of the Radio Shack part is different: you will warm up the module and remove some fingerprints when you touch it to find out it is clearly wired wrong. Fortunately my Luminary Micro LM3S811 evaluation board survived the torture. (I bricked the first one I bought when I accidentally programmed a JTAG pin the wrong way.)

Monday, January 7, 2008

blinker3

Blinking the led, part 3.

blinker3-20080106a.tar.gz

This post continues to improve on or at least change a simple example program that blinks the led on a Luminary Micro Stellaris evaluation board. Specifically supported are the LM3S811, LM3S1968, and the LM3S6965 evaluation boards.

part 1, blinker1, was very simple assembler. part 2, blinker 2, was C with some support functions in assembler. Both part 1 and part 2 simply counted for a while between blinks. Part 3 uses the on chip timer to control the rate of the blinks with some precision.

This family of chips has three on board timers, for what I care about here we are going to take one of the timers and make it a 32 bit count down timer that rolls over. Basically you can grab the timer value when you start something, grab the timer value again and subtract it from the start time to get the duration. In this case we continue sampling the timer and subtracting from the start time until we reach a pre-determined timeout. Note, being 32 bit timers and using 32 bit math you dont have to worry about the counter roll over because you are allowed to borrow off the top of the register. Take a 3 or 4 bit counter and work it out on paper...
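
The "work it out on paper" exercise can also be done in a few lines of C on a PC. A minimal sketch, with made-up function names, of why the subtraction still gives the right answer when a down counter rolls over:

```c
#include <stdint.h>

/* Elapsed ticks on a DOWN-counting 32 bit timer. Unsigned subtraction wraps
   modulo 2^32, so a rollover between the two samples does not matter. */
uint32_t elapsed_down32(uint32_t start, uint32_t now)
{
    return start - now;
}

/* The same idea on a 4 bit counter (values 0..15), for checking by hand. */
uint32_t elapsed_down4(uint32_t start, uint32_t now)
{
    return (start - now) & 0xFu;
}
```

For the 4 bit case, starting at 2 and counting down through 1, 0, 15, 14 to 13 is five ticks, and (2 - 13) masked to 4 bits is also 5. This only holds as long as the interval being measured is shorter than one full trip around the counter.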

Because I am using the timer I needed to make sure the system clock was predictable. At one point I was having problems with the system clock between loading with lmiflash and reset/powerup. So I made it predictable by reading the register, preserving the reserved bits and oring on the bits I wanted which were mostly the default bits. For the 811 I did not have to touch the clock, it just worked.

So, the 811 eval board uses a 6MHz clock, so if we count 6,000,000 ticks and then toggle the LED, that should be once a second. And it is. The 1968 and 6965 are 8MHz based, so when compiling for those the timeout is 8,000,000 clock ticks. All of the problems we had with the for loop in blinker2 go away, as gcc is not willing to optimize out the GET32 timer reads.

Tuesday, January 1, 2008

blinker2

Blinking an led. Part 2.

Full source for a working, although slow blinking, example is located here:
blinker2-20080106a.tar.gz
blinker2.tar.gz


In this post I am going to look at blinking the LED using C with gcc and binutils instead of assembler. This example is targeted at the Luminary Micro Stellaris LM3S811, LM3S1968, and LM3S6965 evaluation boards.

Just like Linux, high level language development is an exercise in self abuse. I am pretty sure, no, I am certain that most developers are not aware of what the compiler is doing behind the scenes. Whats even more scary is how many embedded developers fall into this category. I will take a tangent here in a bit to discuss the problem.

Repeating blinker1 using C is actually not that big of a leap. I am using the C compiler that I created in a prior post, which has no C library to get in the way. The compiler switches I am using are hopefully keeping the C library out of the way anyway.

With blinker1, on reset, we branched directly to the main program. When you add C with or without a C library you will want to branch to some pre-main code that does things like zero the .bss memory (I find it best to initialize variables at run time and not make assumptions). Then prep the argc and argv arguments and call main. So the first change I made was to at least put a pre main step. Its not a bad idea to call main and then fall into an infinite loop. The alternative is to branch to main and remember that you cannot return from the function.
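
For reference, the .bss zeroing step mentioned above usually boils down to a loop like the one below. This is a generic sketch, not something the startup code in this post does: the function name is made up, and in a real startup file the two pointers would come from symbols defined in the linker script (the memmap files here define no such symbols).

```c
#include <stdint.h>

/* Zero every word in [start, end). Hypothetical pre-main helper; start and
   end would normally be linker-provided .bss boundary symbols. */
void zero_words(uint32_t *start, uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}
```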

So the startup code using my bootloader now looks like this (there is a link to the full source below which includes the normal-not-bootloader startup code).

/* novectors.s */
.cpu cortex-m3
.thumb
.global _start
_start:
bl main
b .
.end

I have tried all of the tricks over the years and the compilers simply cannot understand when they should not optimize writes to hardware registers. The only way to ensure it is done right is to write a few core functions in assembler. I call them PUT32 and GET32. And since the compiler doesn't optimize that well (it optimizes when you don't want it to but not so much when you do), I have assembler functions for the read-modify-write operations as well, PUTGETSET and PUTGETRESET.

/* putget.s */
;@-----------------------
.cpu cortex-m3
.thumb
;@-----------------------
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
;@-----------------------
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
;@-----------------------
.thumb_func
.globl PUTGETRESET
PUTGETRESET:
ldr r2,[r0]
bic r2,r1
str r2,[r0]
bx lr
;@-----------------------
.thumb_func
.globl PUTGETSET
PUTGETSET:
ldr r2,[r0]
orr r2,r1
str r2,[r0]
bx lr
;@-----------------------
.end
;@-----------------------

Okay so Reset goes to _start, then _start calls main. Just like blinker1 main is going to setup the leds then go into an infinite loop with counters for delays between the setting and resetting of the led output pin. I have created defines for the hardware addresses and bits.

//------------------------------------------------------------------------
//------------------------------------------------------------------------
/* blinker2.c */

#include "lmistuff.h"

#define CHIP6965

#ifdef CHIP1968
#define LED_RCGC2 LED_1968_RCGC2
#define LED_BASE LED_1968_BASE
#define LED_PIN LED_1968_PIN
#endif

#ifdef CHIP6965
#define LED_RCGC2 LED_6965_RCGC2
#define LED_BASE LED_6965_BASE
#define LED_PIN LED_6965_PIN
#endif
//------------------------------------------------------------------------
unsigned int ra;
//------------------------------------------------------------------------
void PUT32 ( unsigned long, unsigned long);
unsigned long GET32 ( unsigned long );
void PUTGETSET ( unsigned long addr, unsigned long bit );
void PUTGETRESET ( unsigned long addr, unsigned long bit );
//------------------------------------------------------------------------

void SetLed ( void )
{
PUT32(LED_BASE+GPIO_DATA+((LED_PIN) << 2),LED_PIN);
}
//------------------------------------------------------------------------
void ResetLed ( void )
{
PUT32(LED_BASE+GPIO_DATA+((LED_PIN) << 2),0);
}
//------------------------------------------------------------------------
void SetupLeds ( void )
{
PUTGETSET (SYSCON_BASE+SYSCON_RCGC2,LED_RCGC2);
PUTGETRESET(LED_BASE+GPIO_AFSEL,LED_PIN);
PUTGETSET (LED_BASE+GPIO_DIR ,LED_PIN);
PUTGETRESET(LED_BASE+GPIO_ODR ,LED_PIN);
PUTGETSET (LED_BASE+GPIO_DEN ,LED_PIN);
}
//------------------------------------------------------------------------
int main ( void )
{
SetupLeds();
while(1)
{
for(ra=0;ra < 0x3E0000;ra++) ; SetLed();
for(ra=0;ra < 0x3E0000;ra++) ; ResetLed();
}
}
//------------------------------------------------------------------------
//------------------------------------------------------------------------

Lastly the Makefile


COPS = -Wall -O2 -mthumb -nostdlib -nostartfiles -ffreestanding

all : blinker2.bin blinker2.norm.bin

novectors.o : novectors.s
arm-thumb-elf-as novectors.s -o novectors.o

vectors.o : vectors.s
arm-thumb-elf-as vectors.s -o vectors.o

putget.o : putget.s
arm-thumb-elf-as putget.s -o putget.o

blinker2.o : blinker2.c lmistuff.h
arm-thumb-elf-gcc $(COPS) -c blinker2.c -o blinker2.o

blinker2.elf : novectors.o putget.o blinker2.o blmemmap
arm-thumb-elf-gcc $(COPS) novectors.o putget.o blinker2.o -T blmemmap -o blinker2.elf
arm-thumb-elf-objdump -D blinker2.elf > blinker2.list

blinker2.bin : blinker2.elf
arm-thumb-elf-objcopy blinker2.elf blinker2.bin -O binary

blinker2.norm.elf : vectors.o putget.o blinker2.o memmap
arm-thumb-elf-gcc -Wall $(COPS) vectors.o putget.o blinker2.o -T memmap -o blinker2.norm.elf
arm-thumb-elf-objdump -D blinker2.norm.elf > blinker2.norm.list

blinker2.norm.bin : blinker2.norm.elf
arm-thumb-elf-objcopy blinker2.norm.elf blinker2.norm.bin -O binary

clean:
rm *.bin
rm *.o
rm *.elf
rm *.list

And that should do it right? Well, no actually. It doesnt work. The led comes on and stays on. Why is that? Doesnt a computer do what we tell it to do, not what we want it to do? Actually that is probably still true. Here is the problem. I optimized, if you remove the -O2 or change it to -O0 then it will work. Another way to make it work is to put the word volatile in front of the unsigned int ra; definition. Why?

The optimizer looks at the code and realizes that the variable ra is counting, sure, but nothing is using ra as an input. That counting is just a waste of time, so get rid of it. So the compiler has reduced our main loop to:

while(1)
{
ra=0x3E0000; SetLed();
ra=0x3E0000; ResetLed();
}

Using the -S option to compile to assembler (you can also see this if you compile and disassemble):

ldr r5, .L17
.L14:
mov r4, #248
lsl r4, r4, #14
str r4, [r5]
bl SetLed
str r4, [r5]
bl ResetLed
b .L14
.L18:
.align 2
.L17:
.word ra

248 is 0xF8, when shifted 14 times it becomes 0x3E0000, as expected. At least the compiler is not completely cruel and has the heart to set ra to 0x3E0000 to replace the for loop. As humans we can see that even that could have been optimized out. Or they could have set ra one time outside the while loop. But ra is a global and this compiler apparently does not penetrate into the SetLed and ResetLed code to see that ra is not needed nor touched there. (I have seen other compilers do this which usually means you have to separate functions into separately compiled files so that the optimizer cannot penetrate into your code). The least painful solution is to mark ra as volatile:

volatile unsigned int ra;

This tells the compiler that whatever you do, any time ra changes you must write the change to the memory location used to hold ra. You will find that some compilers will optimize variables into registers and do the things you told them to do, but without feeling the need to waste cycles writing intermediate values to memory. This is actually quite desirable in general, except in cases where you are writing to a hardware register and rely on all of the writes happening. Compiling for debug, depending on the compiler, might mean that the compiler assumes all variables are volatile (so that the debugger can watch memory locations). Here again, you would be surprised how many developers are unaware of what is going on behind the curtain and are surprised when code that has been working for months or years when compiled for debug stops working when it is compiled for production.

Before showing the output of adding the volatile, a quick note. I am using gcc 4.2.2 built as a cross compiler as described in a previous post. I also created a gcc 3.3.6 cross compiler and a gcc 3.4.6 cross compiler. Both of the 3.x.x compilers are not willing and/or able to optimize out the delay counter, and for those you would get a blinking led. It would be a long while between blinks but it would work. I cut a lot out but you can basically see it counting from zero to some number before calling SetLed.

mov r3, #0
.L16:
add r3, r3, #1
cmp r3, r4
bls .L16
str r3, [r1]
bl SetLed

Since the count of 0x3E0000 was determined for the blinker1 by assuming two instructions in the delay loop and trying to count to 8 million, having three instructions means it will take 50% longer than its blinker1 counterpart. Ahh, this also demonstrates that during this loop ra was not stored back to memory each time it changed, only at the end after the loop is the value ra saved to memory.

The CodeSourcery G++ Lite compiler, 2007-q3, is 4.2.2 based and likewise optimizes out the for loop altogether.

Okay, so we tell the compiler that ra is volatile

.L20:
ldr r3, [r4]
add r3, r3, #1
str r3, [r4]
ldr r3, .L25+4
ldr r2, [r4]
cmp r2, r3
bls .L20

Well, the computer did what we told it to do: every time it needs to read or write ra it reads it from memory, changes it, and writes it back. And perhaps because of the limited number of registers, or who knows why, the max count value is not kept in a register; it has to be fetched every time (ldr r3,.L25+4). So our timeout count, which was determined based on two instructions per loop, now drives a loop with 7 instructions. Assuming all instructions take one clock cycle, that means it should take 3.5 times longer to blink than blinker1.

Just for my own entertainment if we follow blinker1 literally and use this as our delay loop

for(ra=0x3E0000;ra;ra--) ; ResetLed();

We get

.L21:
ldr r3, [r4]
sub r3, r3, #1
str r3, [r4]
ldr r3, [r4]
cmp r3, #0
bne .L21

Since it now compares with zero instead of a terminal count, we save one instruction: the load of the terminal count, which, being a memory access, generally takes longer than one cycle unless you have zero-wait-state memory.

Again, just for fun, if I remove the volatile and use gcc 3.4.6, it gets very close to the optimal solution:

.L20:
sub r3, r3, #1
cmp r3, #0
bne .L20

The problem I have is that other compilers know that a sub is really a subs which means set the flags after you do the subtract. And branch if not equal means branch if not equal to zero, subs already set the z or zero flag so there is no need for the additional compare with zero, it has already been done.

So this trivial program has created lots of problems, and for this specific task you would have been closer to success on your first try with the older gcc 3.x.x compiler over the current 4.2.2.

If you ever have the opportunity to examine what ARM's own compiler does, it can be absolutely amazing. Now that Keil is ARM and I think it uses the rvct as a compiler the Keil eval may very well do amazing things. I will have to look at that some day.

Oh, the reason I wrote the PUTGETSET and PUTGETRESET functions in assembler is because the output of gcc had a few extra unnecessary operations.

Hmmm, very interesting note. blinker2 had a bug. For those who don't know, a semi-documented feature of the ARM cores (even the popular ARM7) is that when you are executing in thumb mode the lsbit of your PC (r15) is set, and in ARM mode the bit is clear. Likewise, compilers have to produce the proper addresses with the bit set or clear. I had the vectors.s table wrong, and I wasn't using it because I was running with the bootloader. The bootloader works because the reset vector pointed to the C function main, which was compiled as thumb, thus telling the linker to generate an address in the vector table with the lsbit set. When I changed the code so that the reset vector jumped over the vector table to a _start label, and from there I bl main, I had not declared _start as a .thumb_func, so despite producing thumb instructions it created an ARM address in the vector table. And the 811 board I just got would not execute. What disturbs me greatly about all of this is that this is supposed to be a thumb or thumb2 ONLY core from ARM. If you know it's thumb only, shouldn't you be ignoring anything to do with ARM mode? Wouldn't that be a safe assumption? I guess not.

20080106a, fixed the vectors.s bug and added LM3S811 support.

Monday, December 31, 2007

blinker1

Blinking an led. Part 1.

The Luminary Micro Stellaris (LMI) eval boards I have so far all come with a CDROM full of goodies. The most important of course is the documentation. And the two most important documents are the users manual for the eval board and the datasheet for the chip.

The schematic for the LM3S6965 evaluation board shows that the LED is connected to the PF0/PWM0 I/O pin. For strictly "blinking" the led this does not matter but for completeness the next page of schematic shows that the led is between the I/O pin and ground so to turn on the led we must turn the output on. To turn off the led turn the output off.

Switching to the data sheet for the 6965. LMI uses the term GPIO, general purpose I/O. Section 9.2 makes life easy, for starters you must enable the clock to the appropriate GPIO in the RCGC2 register, then you need to use the GPIO registers to configure the pin to do what you want it to do.

So lets get RCGC2 out of the way. This is in the System Control section of the datasheet. Section 6.2 states that the System Control base address is 0x400FE000. Table 6-1 shows the register map and RCGC2 is at offset 0x108. So this means the address to the RCGC2 register is 0x400FE108. In this particular datasheet had we gone straight to the RCGC2 register definition, both the base address and offset are given.
Seems like every undefined bit has the warning from LMI that you should preserve the state of this bit. Arguments could be made either way, but since I easily bricked my LM3S811 by not paying attention to other bits I am not sold on the idea of looking at these configuration registers as bits and not whole registers.

So we are going to need to set GPIOF in RCGC2 to enable that peripheral. Since setting and clearing of bits in registers is going to happen often we should make some functions to reuse. Arguably a macro is better here, but I am not macro savvy, probably because that means you have to adopt a particular assembler or remember syntax for each of the assemblers you use.

So a function that is compatible with C that takes the address of the register as the first argument (r0) and the second argument (r1) has the bits set that we want to set in the register at that address. Basically read the memory location, do a bitwise or and write it back to the memory location. To clear bits we use the bic instruction which you send it the bits you want to clear instead of the bits you want to set.


.thumb_func
PUTGETSET:
ldr r2,[r0]
orr r2,r1
str r2,[r0]
bx lr

.thumb_func
PUTGETRESET:
ldr r2,[r0]
bic r2,r1
str r2,[r0]
bx lr
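
To check the logic of these two routines on a PC, they can be modeled in C against an ordinary variable. This is only a model of what the assembler does, with made-up names; it is not a suggestion to replace the assembler versions, which exist precisely so the compiler stays out of the hardware accesses.

```c
#include <stdint.h>

/* Model of PUTGETSET: read, OR in the requested bits, write back. */
void putgetset_model(uint32_t *reg, uint32_t bits)
{
    *reg |= bits;
}

/* Model of PUTGETRESET: read, clear the requested bits (what bic does),
   write back. */
void putgetreset_model(uint32_t *reg, uint32_t bits)
{
    *reg &= ~bits;
}
```

Starting from 0xF0, setting 0x01 gives 0xF1, and then clearing 0x30 gives 0xC1. In both cases the bits not named in the second argument are left alone, which is the point of doing a read-modify-write.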


Okay we can move forward, loading RCGC2:


ldr r0,=0x400FE108 ;@ SYS_CONTROL+RCGC2
mov r1,#0x00000020 ;@ GPIO_F
bl PUTGETSET


Now we need to configure the GPIO pin. We want this pin to be a plain and simple GPIO, LMI has provided table 9-1 to make our lives easier. For a digital output we can quickly see that we need AFSEL 0, DIR 1, ODR 0, and DEN 1. If this is your first time with this part it is a good idea to look up each register. Hmmm, its not actually spelled out but each of these configuration registers acts on the 8 bits in that register (PF7 down to PF0), so PF0 is the 0x00000001 bit in each of these registers.

Lets get the first four out of the way:


ldr r0,=0x400FE108 ;@ SYS_CONTROL+RCGC2
mov r1,#0x00000020 ;@ GPIO_F
bl PUTGETSET
mov r1,#0x00000001 ;@ pin zero
ldr r0,=0x40025420 ;@ GPIO_F_BASE+AFSEL
bl PUTGETRESET
ldr r0,=0x40025400 ;@ GPIO_F_BASE+DIR
bl PUTGETSET
ldr r0,=0x4002550C ;@ GPIO_F_BASE+ODR
bl PUTGETRESET
ldr r0,=0x4002551C ;@ GPIO_F_BASE+DEN
bl PUTGETSET


For sake of example and argument I am going to assume quite a few things. Coming out of reset we assume the datasheet is correct in describing the state of the registers. And the reset state of the PUR, PDR, DR2R, DR4R, DR8R and SLR registers is probably fine for what we are trying to do here. Likewise when looking at the AFSEL register it mentions the GPIO Lock and Commit, we also assume these are set as-desired right after reset. For general purpose code you should probably manage the PDR, PUR, and DRxR registers at a minimum. Note, I dont think LMI's example board support package touches GPIO Lock or Commit but they do modify AFSEL, so they assume those are set as-desired.

So...To blink the led we need to use the GPIODATA register. This is the only tricky register, bits 9 down to 2 of the ADDRESS are a mask that enables the output bits you want to access, this is pretty neat actually because you can be sloppy. Had they done this everywhere we wouldnt need to read-modify-write so much, sigh. So bit 9 of the address we write to is related to PF7 and bit 2 of the address relates to PF0. If I were to write to this in C I would do something like:


PUT32(GPIO_F_BASE+GPIO_DATA+(pin<<2),data);


Since this is assembler and its all hardcoded we can just figure it out. Base address is 0x40025000, GPIODATA offset is 0x000. And PF0 is bit 0x00000001, shifted left two is 0x00000004.
So this means to set the led:


SetLed:
ldr r0,=0x40025004 ;@ GPIO_F_BASE+GPIO_DATA+(PIN_0<<2)
mov r1,#0x00000001 ;@ pin GPIO_0
str r1,[r0]


And to turn off the led


ResetLed:
ldr r0,=0x40025004 ;@ GPIO_F_BASE+GPIO_DATA+(PIN_0<<2)
mov r1,#0x00000000
str r1,[r0]
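
The address-as-mask behavior of GPIODATA described above can be modeled in C to see why the writes can be "sloppy". A minimal sketch with made-up names, treating the port as a plain 8 bit value rather than real hardware:

```c
#include <stdint.h>

/* Address offset for a given pin mask, matching (pin << 2) above. */
uint32_t gpio_data_offset(uint8_t pin_mask)
{
    return (uint32_t)pin_mask << 2;
}

/* Model of a write to GPIODATA at the given offset: bits 9..2 of the offset
   select which of the 8 port bits the write is allowed to change.
   Returns the new port value. */
uint8_t gpio_data_write_model(uint8_t port, uint32_t offset, uint8_t value)
{
    uint8_t mask = (uint8_t)((offset >> 2) & 0xFFu);
    return (uint8_t)((port & (uint8_t)~mask) | (value & mask));
}
```

With offset 0x004 only PF0 can change, so writing 0xFF to a port holding 0xF0 gives 0xF1, and writing 0x00 puts it back to 0xF0. The other seven pins are untouched without any read-modify-write.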



So to put this all together, I am going to assume that the chip is running at 8MHz, and each instruction is one clock cycle. With that if we burn around 8 million clock cycles between toggling the led we should definitely see it change. 0x3E0000 is over 4 million and two instructions in a wait loop gives us over 8 million instructions, hopefully that blinks at a not too fast and not too slow rate.
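
The estimate above is easy to check with a couple of helper functions. A rough sketch with made-up names, under the same assumptions as the paragraph (8MHz clock, one clock cycle per instruction, which ignores any extra cycles a taken branch may cost):

```c
#include <stdint.h>

/* Cycles burned by a counted delay loop, assuming one cycle per instruction. */
uint32_t delay_cycles(uint32_t count, uint32_t instructions_per_loop)
{
    return count * instructions_per_loop;
}

/* Delay in whole milliseconds for a number of cycles at a given clock rate. */
uint32_t delay_ms(uint32_t cycles, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / clock_hz);
}
```

A count of 0x3E0000 with two instructions per loop is 8,126,464 cycles, or a little over one second at 8MHz, so the LED should change state roughly once a second under these assumptions.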

Here is the blinker1 code:


/* blinker1.s */
.cpu cortex-m3
.thumb

.thumb_func
PUT32:
str r1,[r0]
bx lr

.thumb_func
GET32:
ldr r0,[r0]
bx lr

.thumb_func
PUTGETRESET:
ldr r2,[r0]
bic r2,r1
str r2,[r0]
bx lr

.thumb_func
PUTGETSET:
ldr r2,[r0]
orr r2,r1
str r2,[r0]
bx lr

.thumb_func
dowait:
ldr r4,=0x3E0000
wait0:
sub r4, #1 ;@why wont it let me use subs, but then it creates a subs?
bne wait0
bx lr

.thumb_func
.globl _start
_start:
ldr r0,=0x400FE108 ;@ SYS_CONTROL+RCGC2
mov r1,#0x00000020 ;@ GPIO_F
bl PUTGETSET
mov r1,#0x00000001 ;@ pin zero
ldr r0,=0x40025420 ;@ GPIO_F_BASE+AFSEL
bl PUTGETRESET
ldr r0,=0x40025400 ;@ GPIO_F_BASE+DIR
bl PUTGETSET
ldr r0,=0x4002550C ;@ GPIO_F_BASE+ODR
bl PUTGETRESET
ldr r0,=0x4002551C ;@ GPIO_F_BASE+DEN
bl PUTGETSET

SetLed:
ldr r0,=0x40025004 ;@ GPIO_F_BASE+GPIO_DATA+(PIN_0<<2)
mov r1,#0x00000001 ;@ pin GPIO_0
str r1,[r0]

bl dowait

ResetLed:
ldr r0,=0x40025004 ;@ GPIO_F_BASE+GPIO_DATA+(PIN_0<<2)
mov r1,#0x00000000
str r1,[r0]

bl dowait
b SetLed

.end


The one thing that is missing is how to get from reset to calling _start.
I intentionally separated this from the main program because I have a bootloader I use as described in a previous blog.

To use the bootloader you want the first instruction in your binary to simply branch to _start:


/* novectors.s */
.cpu cortex-m3
.thumb
b _start
.end


For the rest of the world you need a proper interrupt vector table at the beginning of your binary. This was actually a bit difficult to find; there wasn't a big neon sign that said look here, dummy. It is in the ARMv7M Architectural Reference Manual, which is included in the materials with the eval board: Part B, System Level Architecture, Sections B1.5.2 and B1.5.3 describe the exception vector table. This is NOT what you are used to from the ARMv4 (ARM7TDMI) days. I think it's superior, but what do I know. Address 0x00000000 holds what they call SP_main; for this example what we care about is that it sets up the stack pointer. We are not using the stack in this example, but for a chip like this you probably want to just set it at the top of RAM anyway. The words that follow have a one-to-one relationship with the various exceptions. So a proper vector table would look like this:


/* vectors.s */
.thumb

.word 0x20010000 /* SP_Main */
.word _start /* 1 Reset */
.word hang /* 2 NMI */
.word hang /* 3 HardFault */
.word hang /* 4 MemManage */
.word hang /* 5 BusFault */
.word hang /* 6 UsageFault */
.word hang /* 7 RESERVED */
.word hang /* 8 RESERVED */
.word hang /* 9 RESERVED */
.word hang /* 10 RESERVED */
.word hang /* 11 SVCall */
.word hang /* 12 Debug Monitor */
.word hang /* 13 RESERVED */
.word hang /* 14 PendSV */
.word hang /* 15 SysTick */
.word hang /* 16 External Interrupt(0) */
.word hang /* 17 External Interrupt(1) */
.word hang /* 18 External Interrupt(2) */
.word hang /* 19 ... */

.thumb_func /* so that .word hang gets bit 0 (the Thumb bit) set */
hang: b .

.end



Unlike the ARMv4 cores the word is not a branch instruction (nor a ldr pc), it is simply the address of the handler. The one exception we care about here is Reset which we send to _start.
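
Since the table is just a list of 32-bit addresses, it is easy to inspect in a built image. A minimal host-side sketch in C, assuming the .bin is little-endian and starts with the vector table (as blinker1.norm.bin should); the helper names read_le32 and vector_entry are made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* Read one little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Entry n of the vector table: 0 is SP_main, 1 is Reset, 2 is NMI, and so on.
   Each entry is a plain address, not an instruction. */
static uint32_t vector_entry(const uint8_t *image, size_t n)
{
    return read_le32(image + 4 * n);
}
```

With the first bytes of the image loaded into a buffer, vector_entry(image, 0) should come back as 0x20010000 and vector_entry(image, 1) as the address of _start.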

Back to the 6965 datasheet. The vector table lives at address 0x00000000, which is in the on-chip flash on the 6965 (Section 3 Memory Map), which is good, that's where we want it. SRAM goes from 0x20000000 to 0x2000FFFF. The push instruction (see the ARMv7M ARM) decrements the SP first then writes to memory, so you can initialize your stack pointer one byte past the end, as we have here with 0x20010000.
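
To see why 0x20010000 is safe, here is a small host-side model in C of a single-register push. It is only a sketch: the sram array stands in for the 64 KB of on-chip SRAM, push_word is a hypothetical name, and little-endian byte order is assumed.

```c
#include <stdint.h>

#define SRAM_BASE 0x20000000u
#define SRAM_SIZE 0x00010000u      /* 64 KB, 0x20000000 to 0x2000FFFF */

static uint8_t sram[SRAM_SIZE];    /* host-side stand-in for the on-chip SRAM */

/* Model of pushing one register: decrement SP by 4 first, then store the
   word at the new SP. Returns the updated stack pointer. */
static uint32_t push_word(uint32_t sp, uint32_t value)
{
    sp -= 4;
    uint32_t off = sp - SRAM_BASE; /* offset of the store inside SRAM */
    sram[off + 0] = (uint8_t)(value);
    sram[off + 1] = (uint8_t)(value >> 8);
    sram[off + 2] = (uint8_t)(value >> 16);
    sram[off + 3] = (uint8_t)(value >> 24);
    return sp;
}
```

Starting from 0x20010000, the first push lands at 0x2000FFFC, the last word of SRAM, so nothing is ever written past the end.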

There are MANY ways to tell the gnu linker the memory layout for your system, each one more insanely complicated than the one before it. I prefer the KISS approach, and currently this is the linker script I use:


/* blmemmap */
MEMORY
{
rom(RX) : ORIGIN = 0x00002000, LENGTH = 0x3E000
ram(WAIL) : ORIGIN = 0x20000000, LENGTH = 64K
}

SECTIONS
{
.text : { *(.text*) } > rom
}


That is for use with my bootloader which currently wants you to start at address 0x2000. For the rest of the world:


/* memmap */
MEMORY
{
rom(RX) : ORIGIN = 0x00000000, LENGTH = 0x40000
ram(WAIL) : ORIGIN = 0x20000000, LENGTH = 64K
}

SECTIONS
{
.text : { *(.text*) } > rom
}


In a nutshell anything that is Read-only or eXecutable goes in the first memory region, the flash; everything else (W, A, I, L) is targeted for SRAM.
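
The numbers in the two scripts can be sanity checked with a few lines of C. This is just a sketch of the arithmetic, with a hypothetical struct region mirroring the ORIGIN and LENGTH values above:

```c
#include <stdint.h>
#include <stdbool.h>

/* One MEMORY entry from the linker script: ORIGIN and LENGTH. */
struct region { uint32_t origin; uint32_t length; };

static const struct region BL_ROM   = { 0x00002000u, 0x3E000u };      /* blmemmap */
static const struct region NORM_ROM = { 0x00000000u, 0x40000u };      /* memmap */
static const struct region RAM      = { 0x20000000u, 64u * 1024u };   /* both */

/* First address past the end of the region. */
static uint32_t region_end(struct region r)
{
    return r.origin + r.length;
}

/* True if addr falls inside the region. */
static bool in_region(struct region r, uint32_t addr)
{
    return addr >= r.origin && (addr - r.origin) < r.length;
}
```

Both rom regions end at 0x40000: the bootloader version simply gives up the first 0x2000 bytes of flash to the bootloader.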

Using the gnu binutils compiled using the steps described in a prior post, here is my Makefile for building the binaries.


all : blinker1.bl.bin blinker1.norm.bin

blinker1.bl.bin : blinker1.s novectors.s blmemmap
	arm-thumb-elf-as novectors.s -o novectors.o
	arm-thumb-elf-as blinker1.s -o blinker1.o
	arm-thumb-elf-ld -X -o blinker1.bl.elf novectors.o blinker1.o -T blmemmap
	arm-thumb-elf-objdump -D blinker1.bl.elf > blinker1.bl.list
	arm-thumb-elf-objcopy blinker1.bl.elf blinker1.bl.bin -O binary

blinker1.norm.bin : blinker1.s vectors.s memmap
	arm-thumb-elf-as vectors.s -o vectors.o
	arm-thumb-elf-as blinker1.s -o blinker1.o
	arm-thumb-elf-ld -X -o blinker1.norm.elf vectors.o blinker1.o -T memmap
	arm-thumb-elf-objdump -D blinker1.norm.elf > blinker1.norm.list
	arm-thumb-elf-objcopy blinker1.norm.elf blinker1.norm.bin -O binary

clean:
	rm -f *.bin
	rm -f *.o
	rm -f *.elf
	rm -f *.list


Using this memmap it is important that the (no)vectors.o object is first in the list of objects for the linker command.

The source for this example is located here: blinker1.tar.gz

Thursday, December 27, 2007

bootloader

Yes, building gcc is fun. Yes the painful self abuse of linux is fun. Trying to get some big debugger thing working that I will never use because I only care to load the flash, not fun.

I gave up trying to get the ftdi jtag working on Linux and quickly wrote a very simple bootloader. I currently use this on the Luminary Micro Stellaris LM3S6965 and LM3S1968 evaluation boards.

The bulk of what you need to know is in the bootload.txt file in the tarball.

bootload-20080106a.tar.gz
bootload.tar.gz

I used the serial port exposed by the on board ftdi chip. On ubuntu and perhaps elsewhere you might need to:

sudo rmmod ftdi_sio
sudo modprobe ftdi_sio vendor=0x0403 product=0xbcd9

(each time you plug in the board)

And that should provide you with /dev/ttyUSB0 and /dev/ttyUSB1, where /dev/ttyUSB1 is the port we are interested in. If you already have usb serial ports then the numbers will shift. On Windows, who knows how they are chosen; you get COMsomething. Unplug and replug the board, watch which port goes away and returns, and there you go.

The bootloader has simple commands like

e 0x2000

To erase the page of flash starting at address 0x2000

And

w 0x2000 0x12345678

To write the value 0x12345678 to (an erased) address 0x2000
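
If you want to generate these commands from a program, the strings are easy to build. A minimal sketch in C, assuming the command syntax is exactly what is shown above; the function names are hypothetical, and the line terminator and accepted number formats are not covered here (check bootload.txt for those):

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Build an erase command such as "e 0x2000". No line terminator is added. */
static int format_erase(char *buf, size_t len, uint32_t addr)
{
    return snprintf(buf, len, "e 0x%lX", (unsigned long)addr);
}

/* Build a write command such as "w 0x2000 0x12345678".
   The value is padded to 8 hex digits; whether the bootloader needs
   that padding is an assumption. */
static int format_write(char *buf, size_t len, uint32_t addr, uint32_t value)
{
    return snprintf(buf, len, "w 0x%lX 0x%08lX",
                    (unsigned long)addr, (unsigned long)value);
}
```

A loader would erase the page first, then walk the .bin one 32-bit word at a time, bumping the address by 4 for each write.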

If you hold the up button on the 1968 or 6965 board (probably just the select button on the 811 if/when I get it going) during power up or a reset it goes into the bootloader waiting for commands. If you do not have that button pressed it jumps to address 0x2000. I have no support for interrupts at this time.

By the way it's 115,200 8N1
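
On Linux those settings map onto termios. The following is a sketch only, not the code from the bload directory: set_115200_8n1 is a hypothetical helper that fills in a struct termios for raw 115200 8N1, which you would then apply with tcsetattr() on the opened /dev/ttyUSB1.

```c
#include <termios.h>

/* Configure a termios structure for raw 115200 baud, 8 data bits,
   no parity, 1 stop bit. Returns 0 on success, -1 on failure. */
static int set_115200_8n1(struct termios *t)
{
    /* Raw mode: no input translation, no output processing, no line editing. */
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON);
    t->c_oflag &= ~(tcflag_t)OPOST;
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);

    /* 8N1: clear size, parity and two-stop-bit flags, then select 8 bits. */
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);
    t->c_cflag |= CS8 | CLOCAL | CREAD;

    if (cfsetispeed(t, B115200) != 0) return -1;
    if (cfsetospeed(t, B115200) != 0) return -1;
    return 0;
}
```

Typical use is tcgetattr(fd, &t), then set_115200_8n1(&t), then tcsetattr(fd, TCSANOW, &t).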

There is an assembler and C LED blinker example (for now you have to go into the code and change probably one line to pick the board type before compiling/assembling) in the blinker directory. And on linux a simple loader program in the bload directory.

The linker script (memmap) for using my bootloader is

MEMORY
{
rom(RX) : ORIGIN = 0x00002000, LENGTH = 0x3E000
ram(WAIL) : ORIGIN = 0x20000000, LENGTH = 64K
}
SECTIONS
{
.text : { *(.text*) } > rom
}

Which puts you at address 0x2000 in the flash, not 0x0000

If you want to use pretty much any of my examples without the bootloader use this linker script instead:


MEMORY
{
rom(RX) : ORIGIN = 0x00000000, LENGTH = 0x40000
ram(WAIL) : ORIGIN = 0x20000000, LENGTH = 64K
}

SECTIONS
{
.text : { *(.text*) } > rom
}


Also since the bootloader handles the "vector table" which this chip and/or core uses (not the same as what you are used to from the ARM7) I simply branch to main when using the bootloader:


.cpu cortex-m3
.thumb
b main
.end

If you don't want to use the bootloader, use this instead:

.cpu cortex-m3
.thumb

.word 0x20010000 /* stack top address */
.word main /* reset routine location */
.word hang /* NMI ISR location */
.word hang /* Hard Fault ISR location */

.thumb_func /* so that .word hang gets bit 0 (the Thumb bit) set */
hang: b .


20080106a adds support for the LM3S811 evaluation board.

Roll your own GCC

Although the optimizations are pretty weak I prefer to use gcc, probably because I prefer to develop on Linux over Windows. With gcc, especially a homebrew build, I can switch back and forth without too much trouble. Yes, the same is true if I use Codesourcery G++ Lite.

The ARM Cortex-M3 core is a new (relative to the ARM7 we all know and love) core. To make it nice and confusing the Cortex cores can support one or more of ARM mode, (traditional) Thumb mode, the new Thumb2 additions to Thumb mode, and some DSP extensions to the ARM instructions. The Cortex-M3 on these chips only supports Thumb and Thumb2 instructions. Basically, ARM filled in the weaknesses in the Thumb instruction set with some ARM-like instructions. In theory, with a good compiler, you can come much closer to ARM ISA performance while staying close to Thumb ISA binary size. That's the theory.

The code I share here is not necessarily one size fits all. There will be some hardcoding from time to time per platform. And in no way, shape, or form do I think that embedded programming means making api/operating system calls. I don't use operating systems, and for the time being I don't even have a C library. And if you really know gcc you know to just forget about floating point.

Cygwin is good, has its place, does its job, but for a gcc cross compiler I prefer statically compiled MinGW binaries, no dll hell and a performance boost for those big jobs. Yes, talking about Windows. The build instructions are almost identical for Linux and Windows, MinGW and/or windows provides some extra problems so they are identified as extra steps. Currently using Ubuntu Gutsy and Windows XP. These build instructions, which are quite simple, have evolved from gcc 2.95 to the present from Windows NT4 and Slackware whatever to the present.

As of 4.2.2 gcc does not yet support the Thumb2 instructions. The next release should have it. Codesourcery has added/included Thumb2 support in their 4.2.2 gcc.

Lets begin:



#IF WINDOWS
Go to http://www.mingw.org, find and download

MinGW-5.1.3.exe
MSYS-1.0.11-2004.04.30-1.exe

Install both
Run the Msys shell/prompt to continue as if on a native Linux system.
#ENDIF

#IF LINUX
Make sure you have texinfo, bison, and flex installed, as well as gcc, make and other typical programs
On ubuntu this would be

sudo apt-get install build-essential bison flex texinfo

Note: makeinfo, the program whose absence causes the build error I am trying to avoid here, is part of the texinfo package
#ENDIF

Most of this section is simply a list of the commands to type.

#IF LINUX
You probably don't run as root all the time, so you may need to use a directory other than /arm, or
as root create /arm and chown it to your username so that you can finish the build as that user.
The toolchain wants to run from the path used to create it, so think about where you want it and who you want
to use it. If you do all of this as root and keep it in /arm (and make it readable by everyone) everyone
can use it.
#ENDIF

Download http://ftp.gnu.org/gnu/gcc/gcc-4.2.2/gcc-core-4.2.2.tar.bz2
Download http://ftp.gnu.org/gnu/binutils/binutils-2.18.tar.gz

ZZZ='--target=arm-thumb-elf --prefix=/arm'
mkdir /arm
cd /arm
tar xzvf /path/to/binutils-2.18.tar.gz
cd binutils-2.18
mkdir build
cd build
../configure $ZZZ

#IF WINDOWS
edit the Makefile in this directory
Find and change the line from

MAKEINFO = /arm/binutils-2.18/missing makeinfo

to

MAKEINFO = /bin/makeinfo

#ENDIF

make all install

(this will take a while)

/arm/bin/arm-thumb-elf-as --version

GNU assembler (GNU Binutils) 2.18
...

Now binutils (the assembler, linker and other utilities) is complete. If you only want to use assembly language this is all you need.

To clean up the binutils build files

cd /arm
rm -rf binutils-2.18

On to gcc

cd /arm
tar xjvf /path/to/gcc-core-4.2.2.tar.bz2
cd gcc-4.2.2
mkdir build
cd build
../configure $ZZZ --disable-libssp
make all install

#IF WINDOWS
This will fail at some point with errors that look like

In file included from ../../gcc/libgcc2.c:35:
./tm.h:6:28: error: config/dbxelf.h: No such file or directory
...

It doesn't like the -I by itself in the xgcc command line
The culprit is in gcc/Makefile, a line that starts with

INCLUDES = -I. -I$(@D) -I$(srcdir) ...

Just get rid of that second include

INCLUDES = -I. -I$(srcdir) ...

Then start the build again

make all install

and it will complete this time
#ENDIF

/arm/bin/arm-thumb-elf-gcc --version

arm-thumb-elf-gcc.exe (GCC) 4.2.2

To clean up the gcc build files

cd /arm
rm -rf gcc-4.2.2

#IF WINDOWS
You can exit msys and then copy the arm directory tree from
z:\msys\1.0\arm to z:\arm
(so that your binaries are in z:\arm\bin\)
z:\ is of course whatever drive you happen to be working with, C:\ for example
#ENDIF

Welcome to lmistuff

Welcome, I thought I would keep track of my Luminary Micro Stellaris microcontroller evaluation board experiments.

Currently I have an LM3S6965, an LM3S1968, and a dead LM3S811. Well, not really dead; I was too eager when it came in and programmed one or more of the jtag I/O pins the wrong way... I have another one on its way.