lmistuff: January 2008

Friday, January 18, 2008

irecho

IR Echo (infra red remote control stuff)

irecho-20080118a.tar.gz

I have been working on this project for right around 10 years now. No, not full time, not part time, some years I may think about it once and not act on it. I finally got around to working on it and actually finishing it.

I have a mid 1980s RCA TV, which at the time was a special order, top of the line thing. I wouldn't be surprised if it was one of the early RCA TVs to use an infra red remote. The long and short of it is that this TV has outlasted any TV purchased since. A couple of buttons on the TV, power in particular, have worn out, but the remote, despite several moves and 20+ years, is still working. If I lose the remote, though, the TV might become a brick. You cannot find this TV on Google searches, much less the remote. Either due to the rareness of the TV or how early its infra red remote is or isn't, it is not supported by any of the pre-programmed remote codes in universal remotes. Yes, some remotes will "record" IR and then retransmit it, but that has limited success.

So my thought was, why not program the remote for a TV it does know about, receive the codes for that TV and then blast the codes for the real TV (that the remote does not know about). Take that one step further and you could just as easily do the same thing to support a non xyz brand device using an xyz branded remote.

The problem I had in this specific case was figuring out the IR protocol/codes for the old TV. I didn't do anything different this time around, yet this time I was able to figure out the codes without too much trouble. Generating those codes has been interesting, and most of the problem there was that I was trying to do it in C with gcc, whose code generation and optimization varied widely.

IR remote control is really not complicated at all.
http://www.sbprojects.com/knowledge/ir/ir.htm
Explains a number of protocols quite well. I used to use the documentation for a program you could run on an hp48 calculator to turn it into a remote control. Basically, an IR led is used and you blink it at a frequency like 40KHz, for example. Blink it on and off for so many micro or milliseconds, then leave it off for so many micro or milliseconds, and repeat this, varying the length of the blinking and off periods. The receiver modules I have used do the analog work for you and the output is a square wave, with the blinking on/off period windowed by one output state ("high") and the non-blinking period by the other ("low"). The job for the microcontroller is to time the high and low periods and decode them based on a protocol. There are some standard protocols as described in the link above.

So for a number of reasons, after trying a few, I chose what the link above calls the Sony SIRC protocol. The IR receiver module I am using is the same as or similar to one you can get at Radio Shack (see below). The SB-Projects page says the Sony protocol is 40KHz and the receiver I am using is 40KHz, which is probably why I had better luck with that protocol than the RCA protocol (a modern one, not the old one). When I used the RCA protocol it only worked between x and y feet from the receiver and you really had to aim right at the receiver. With the Sony protocol you can be anywhere in the room and don't really have to aim at the receiver; many walls and other surfaces will reflect the IR and it still works.

The Sony protocol starts with a 2.4ms pulse. I poll the IR input pin and wait for a state change. When there is a state change I grab the counter from a timer, poll for the next state change, grab the counter again, and subtract the two. If the time of the pulse is the right length, plus or minus some acceptable slop (aiming at the walls, or really close to or really far from the receiver), call it good. From there I go into decoding the code. The blank periods should all be 600us, plus or minus some slop, each followed by an on period that is either 1.2ms for a one or 600us for a zero. The code has 12 bits (24 state changes after the start pulse). If you manage to collect 12 bits without any of the periods between state changes being out of range, call it a good code and process it. Simple.

The LM3S811 eval board runs at 6MHz by default, so the 2.4ms start pulse is 0.0024 * 6000000 or 14400 timer ticks. The 1.2ms pulse is half that at 7200 ticks and the 600us periods should be around 3600 ticks. It is really easy to tweak and/or rearrange the example to receive other protocols. Likewise, change the math above and you can do this on the LM3S6965 or LM3S1968 (or others). If I remember right, the LM3S1968 did not have 3.3V, ground, and an available input pin within reach of each other to easily solder in a receiver, so I primarily worked with the LM3S6965 and the LM3S811 for this project. I didn't want to dedicate the 6965 to this job so I bought another '811 for it. Actually, I first bought some msp430 ez430 modules to dedicate to this job but, my guess is, because of the instability of the internal clock I couldn't get it to work. I also struggled getting the uart to work to print out the codes as I was figuring them out, and eventually gave up and bought the LM3S811 board.
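As a rough sketch of the decode step described above, here is one way the range checks could look in C once the periods between state changes have been captured into an array. The names sirc_decode, close_to and SLOP are made up for this illustration, the tolerance is a guess, and the lsbit-first bit order is an assumption taken from the usual description of the Sony protocol rather than from my actual code.

```c
#include <stddef.h>

#define TICKS_PER_US  6u                      /* 6MHz LM3S811 clock, per the text */
#define T_START       (2400u * TICKS_PER_US)  /* 14400 ticks, start pulse */
#define T_LONG        (1200u * TICKS_PER_US)  /*  7200 ticks, a one */
#define T_SHORT       ( 600u * TICKS_PER_US)  /*  3600 ticks, a zero or a blank period */
#define SLOP          (T_SHORT / 4u)          /* assumed tolerance, tune to taste */

/* non-zero if t is within SLOP ticks of want */
static int close_to ( unsigned int t, unsigned int want )
{
    return (t >= want - SLOP) && (t <= want + SLOP);
}

/* d[0] is the start pulse, followed by 12 pairs of (blank, pulse) durations,
   all in timer ticks. Returns the 12 bit code, or -1 if anything is out of range. */
int sirc_decode ( const unsigned int *d, size_t n )
{
    unsigned int bit, code = 0;

    if (n < 25) return -1;
    if (!close_to(d[0], T_START)) return -1;
    for (bit = 0; bit < 12; bit++)
    {
        unsigned int blank = d[1 + bit * 2];
        unsigned int pulse = d[2 + bit * 2];

        if (!close_to(blank, T_SHORT)) return -1;
        if (close_to(pulse, T_LONG)) code |= 1u << bit;   /* assumed lsbit first */
        else if (!close_to(pulse, T_SHORT)) return -1;
    }
    return (int)code;
}
```

Changing TICKS_PER_US is the "change the math" step for a board running at a different clock.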

Decoding is half the fun. I used the same decoding method to finally figure out the protocol/codes for the older RCA remote. To blast the IR, I bought some IR leds from Radio Shack (part number?). You have to know or figure out the frequency to blink the led and then bit blast that at a close enough rate to make it work. I figured the old remote was somewhere around 32KHz or some harmonic. I only really cared about it working at point blank range so I could be a little sloppy. The problem is that 6MHz/32000 is not a whole number of timer ticks (it is 187.5). Even if it were, I was struggling trying to get the C compiler to produce clean enough code to get the state changes reasonably close. I eventually gave up and wrote some hand tuned assembler, which worked quite well. But it was, of course, tuned for my 6MHz LM3S811.


/* hand tuned ir blinker for the '811 */
.thumb_func
.global blinker
blinker:
;@ r0 times, r1 address
2:
    mov r2,#0xFF
    str r2,[r1]
    mov r2,#29
1:  sub r2,r2,#1
    bne 1b
    mov r2,#0x00
    str r2,[r1]
    mov r2,#29
1:  sub r2,r2,#1
    bne 1b
    sub r0,r0,#1
    bne 2b
    bx lr

The idea was to pre-calculate the number of on/off blinks I would need, then feed this function the number of blinks and the address to write to for the IR LED. The code turns on the led, runs a tight little loop that counts to 29 (two instructions per loop, not counting the setup), then turns off the led, counts to 29 again, and repeats all of that r0 times. I didn't bother getting anal with the nops to make the whole loop a perfect square wave; the r0 counter is going to make it just a little off. From C I timed it using the system timer: I had it do 150 or so blinks, divided by that number, and it came out around 181.1 ticks per loop. I used that number to pre-calculate the number of loops I would need for each "on" period. For the blank periods of the IR output, I know from decoding how long the on plus off period needs to be, so I grab the timer tick before blasting the IR and, after blasting the IR, wait until the timer has burned both the on and off periods of time before moving on to the next on pulse.



void blastir ( const unsigned short *data, unsigned int len )
{
    unsigned int ra,rb,rc;

    SetLedx();
    ResetTimer();
    for(rb=0;rb<4;rb++)
    {
        rc=GET32(TIMER2_BASE+GPTMTAR);
        for(ra=0;ra < len;ra+=2)
        {
            blinker(data[ra+0],IRLED_BASE+GPIO_DATA+((IRLED_PIN) << 2));
            rc-=data[ra+1];
            while(GET32(TIMER2_BASE+GPTMTAR) > rc);
        }
    }
    ResetLedx();
}
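The pre-calculation described above might look something like this sketch. blinks_for and make_pair are hypothetical helper names, not something from the tarball; the 181.1 ticks per loop is the measured number from the text, carried as 1811 tenths so the math stays in integers, and the pair layout (blink count, then on plus off ticks) is inferred from how blastir walks its data array.

```c
/* ticks per pass through the blinker loop, times ten (181.1 measured on the 6MHz '811) */
#define BLINK_TICKS_X10  1811u

/* number of blinker() loops that best fills on_ticks timer ticks (rounded to nearest) */
unsigned int blinks_for ( unsigned int on_ticks )
{
    return (on_ticks * 10u + BLINK_TICKS_X10 / 2u) / BLINK_TICKS_X10;
}

/* fill one (blink count, on+off ticks) pair in the layout blastir() walks;
   on_ticks + off_ticks has to fit in 16 bits */
void make_pair ( unsigned short *pair, unsigned int on_ticks, unsigned int off_ticks )
{
    pair[0] = (unsigned short)blinks_for(on_ticks);
    pair[1] = (unsigned short)(on_ticks + off_ticks);
}
```

As an illustration only (these are not the old RCA remote's timings), a 14400 tick on period followed by a 3600 tick blank period works out to 80 loops and 18000 ticks.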

Again, it's actually pretty simple stuff.

One thing to note if you ever try this, and perhaps try it with two modern or similar devices: if they use the same code, or a code with similar time periods, you may confuse the target device. The remote transmits some code using protocol A, both your microcontroller and your target device (TV, VCR, whatever) see the code from the remote, and your microcontroller turns around and transmits another code using protocol A. The human may still be pressing the button on the remote, so now the target device is getting hit with IR from the remote and IR from your microcontroller at the same time, and it gets confused and doesn't react to either.

IR Receiver Module. Digikey part number 425-1904-ND
Looks like it might be similar to the module available at radio shack although I found out the hard way that the pinout at Radio Shack is different and you will warm up the module and remove some fingerprints when you touch it to find out it is clearly wired wrong. My Luminary Micro LM3S811 evaluation board survived the torture fortunately. (I bricked the first one I bought when I accidentally programmed a JTAG pin the wrong way).

Monday, January 7, 2008

blinker3

Blinking the led, part 3.

blinker3-20080106a.tar.gz

This post continues to improve on or at least change a simple example program that blinks the led on a Luminary Micro Stellaris evaluation board. Specifically supported are the LM3S811, LM3S1968, and the LM3S6965 evaluation boards.

Part 1, blinker1, was very simple assembler. Part 2, blinker2, was C with some support functions in assembler. Both part 1 and part 2 simply counted for a while between blinks. Part 3 uses the on chip timer to control the rate of the blinks with some precision.

This family of chips has three on board timers. For what I care about here, we are going to take one of the timers and make it a 32 bit count down timer that rolls over. Basically you can grab the timer value when you start something, grab the timer value again later, and subtract it from the start time to get the duration. In this case we continue sampling the timer and subtracting from the start time until we reach a pre-determined timeout. Note, being 32 bit timers and using 32 bit math, you don't have to worry about the counter roll over because you are allowed to borrow off the top of the register. Take a 3 or 4 bit counter and work it out on paper...
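Here is the work-it-out-on-paper exercise as a small C sketch. elapsed32 and elapsed4 are names made up for this illustration; the 4 bit version just masks the result so you can check it by hand against a counter that counts down and wraps from 0 to 15.

```c
#include <stdint.h>

/* ticks elapsed on a 32 bit down counter; the unsigned subtract
   borrows off the top, so a roll over through zero comes out right */
uint32_t elapsed32 ( uint32_t start, uint32_t now )
{
    return start - now;
}

/* same idea on a 4 bit down counter, small enough to check on paper */
unsigned int elapsed4 ( unsigned int start, unsigned int now )
{
    return (start - now) & 0xFu;
}
```

Counting down from 2 goes 1, 0, 15, 14, which is four ticks, and (2 - 14) masked to 4 bits is also 4. The one catch is that the interval being measured has to be shorter than one full trip around the counter.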

Because I am using the timer I needed to make sure the system clock was predictable. At one point I was having problems with the system clock between loading with lmiflash and reset/powerup. So I made it predictable by reading the register, preserving the reserved bits and oring on the bits I wanted which were mostly the default bits. For the 811 I did not have to touch the clock, it just worked.

So, the 811 eval board uses a 6MHz clock, so if we count to 6,000,000 then toggle the led, that should be once a second. And it is. The 1968 and 6965 are 8MHz based, so when compiling for those the timeout is 8,000,000 clock ticks. All of the problems we had with the for loop in blinker2 go away, as gcc is not willing to optimize out the GET32 timer reads.

Tuesday, January 1, 2008

blinker2

Blinking an led. Part 2.

Full source for a working, although slow blinking, example is located here:
blinker2-20080106a.tar.gz
blinker2.tar.gz

In this post I am going to look at blinking the LED using C with gcc and binutils instead of assembler. This example is targeted at the Luminary Micro Stellaris LM3S811, LM3S1968, and LM3S6965 evaluation boards.

Just like Linux, high level language development is an exercise in self abuse. I am pretty sure, no, I am certain that most developers are not aware of what the compiler is doing behind the scenes. What's even more scary is how many embedded developers fall into this category. I will take a tangent here in a bit to discuss the problem.

Repeating blinker1 using C is actually not that big of a leap. I am using the C compiler that I created in a prior post, which has no C library to get in the way. The compiler switches I am using are hopefully keeping the C library out of the way anyway.

With blinker1, on reset, we branched directly to the main program. When you add C, with or without a C library, you will want to branch to some pre-main code that does things like zero the .bss memory (I find it best to initialize variables at run time and not make assumptions), then prep the argc and argv arguments and call main. So the first change I made was to at least put in a pre-main step. It's not a bad idea to call main and then fall into an infinite loop. The alternative is to branch to main and remember that you cannot return from the function.

So the startup code using my bootloader now looks like this (there is a link to the full source below which includes the normal-not-bootloader startup code).


/* novectors.s */
.cpu cortex-m3
.thumb
.global _start
_start:
bl main
b .
.end

I have tried all of the tricks over the years and the compilers simply cannot understand when they should not optimize writes to hardware registers. The only way to ensure it is done right is to write a few core functions in assembler. I call them PUT32 and GET32. And since the compiler doesn't optimize that well (it optimizes when you don't want it to but not so much when you do) I have assembler functions for the read-modify-write operations, PUTGETSET and PUTGETRESET.


/* putget.s */
;@-----------------------
.cpu cortex-m3
.thumb
;@-----------------------
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
;@-----------------------
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
;@-----------------------
.thumb_func
.globl PUTGETRESET
PUTGETRESET:
ldr r2,[r0]
bic r2,r1
str r2,[r0]
bx lr
;@-----------------------
.thumb_func
.globl PUTGETSET
PUTGETSET:
ldr r2,[r0]
orr r2,r1
str r2,[r0]
bx lr
;@-----------------------
.end
;@-----------------------
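For anyone who does not read ARM assembler, this is roughly what the two read-modify-write routines do, written as a C sketch. putgetset_c and putgetreset_c are hypothetical names for illustration only; the project itself uses the assembler versions above, precisely so that the compiler has no say in how the register access is done.

```c
#include <stdint.h>

/* hypothetical C equivalent of PUTGETSET: read, OR in the bits, write back */
void putgetset_c ( volatile uint32_t *addr, uint32_t bits )
{
    *addr = *addr | bits;      /* ldr, orr, str */
}

/* hypothetical C equivalent of PUTGETRESET: read, clear the bits, write back */
void putgetreset_c ( volatile uint32_t *addr, uint32_t bits )
{
    *addr = *addr & ~bits;     /* ldr, bic, str */
}
```

Starting from 0xF0, setting 0x05 gives 0xF5, and then resetting 0x30 gives 0xC5.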

Okay, so reset goes to _start, then _start calls main. Just like blinker1, main is going to set up the leds and then go into an infinite loop with counters for delays between the setting and resetting of the led output pin. I have created defines for the hardware addresses and bits.


//------------------------------------------------------------------------
//------------------------------------------------------------------------
/* blinker2.c */

#include "lmistuff.h"

#define CHIP6965

#ifdef CHIP1968
#define LED_RCGC2     LED_1968_RCGC2
#define LED_BASE      LED_1968_BASE
#define LED_PIN       LED_1968_PIN
#endif

#ifdef CHIP6965
#define LED_RCGC2     LED_6965_RCGC2
#define LED_BASE      LED_6965_BASE
#define LED_PIN       LED_6965_PIN
#endif
//------------------------------------------------------------------------
unsigned int ra;
//------------------------------------------------------------------------
void PUT32 ( unsigned long, unsigned long);
unsigned long GET32 ( unsigned long );
void PUTGETSET ( unsigned long addr, unsigned long bit ); 
void PUTGETRESET ( unsigned long addr, unsigned long bit ); 
//------------------------------------------------------------------------

void SetLed ( void )
{
    PUT32(LED_BASE+GPIO_DATA+((LED_PIN) << 2),LED_PIN);
}
//------------------------------------------------------------------------
void ResetLed ( void )
{
    PUT32(LED_BASE+GPIO_DATA+((LED_PIN) << 2),0);
}
//------------------------------------------------------------------------
void SetupLeds ( void )
{
    PUTGETSET  (SYSCON_BASE+SYSCON_RCGC2,LED_RCGC2);
    PUTGETRESET(LED_BASE+GPIO_AFSEL,LED_PIN);
    PUTGETSET  (LED_BASE+GPIO_DIR  ,LED_PIN);
    PUTGETRESET(LED_BASE+GPIO_ODR  ,LED_PIN);
    PUTGETSET  (LED_BASE+GPIO_DEN  ,LED_PIN);
}
//------------------------------------------------------------------------
int main ( void )
{
    SetupLeds();
    while(1)
    {
        for(ra=0;ra < 0x3E0000;ra++) ; SetLed();
        for(ra=0;ra < 0x3E0000;ra++) ; ResetLed();
    }
}
//------------------------------------------------------------------------
//------------------------------------------------------------------------

Lastly the Makefile



COPS = -Wall -O2 -mthumb -nostdlib -nostartfiles -ffreestanding 

all : blinker2.bin blinker2.norm.bin

novectors.o : novectors.s
	arm-thumb-elf-as novectors.s -o novectors.o

vectors.o : vectors.s
	arm-thumb-elf-as vectors.s -o vectors.o

putget.o : putget.s
	arm-thumb-elf-as putget.s -o putget.o

blinker2.o : blinker2.c lmistuff.h
	arm-thumb-elf-gcc $(COPS) -c blinker2.c -o blinker2.o

blinker2.elf : novectors.o putget.o blinker2.o blmemmap
	arm-thumb-elf-gcc $(COPS) novectors.o putget.o blinker2.o -T blmemmap -o blinker2.elf
	arm-thumb-elf-objdump -D blinker2.elf > blinker2.list

blinker2.bin : blinker2.elf
	arm-thumb-elf-objcopy blinker2.elf blinker2.bin -O binary

blinker2.norm.elf : vectors.o putget.o blinker2.o memmap
	arm-thumb-elf-gcc -Wall $(COPS) vectors.o putget.o blinker2.o -T memmap -o blinker2.norm.elf
	arm-thumb-elf-objdump -D blinker2.norm.elf > blinker2.norm.list

blinker2.norm.bin : blinker2.norm.elf
	arm-thumb-elf-objcopy blinker2.norm.elf blinker2.norm.bin -O binary

clean:
	rm *.bin
	rm *.o
	rm *.elf
	rm *.list

And that should do it, right? Well, no, actually. It doesn't work. The led comes on and stays on. Why is that? Doesn't a computer do what we tell it to do, not what we want it to do? Actually that is probably still true. Here is the problem: I optimized. If you remove the -O2 or change it to -O0 then it will work. Another way to make it work is to put the word volatile in front of the unsigned int ra; definition. Why?

The optimizer looks at the code and realizes that the variable ra is counting, sure, but nothing is using ra as an input. That counting is just a waste of time, so get rid of it. So the compiler has reduced our main loop to:


    while(1)
    {
        ra=0x3E0000; SetLed();
        ra=0x3E0000; ResetLed();
    }

Using the -S option to compile to assembler (you can also see this if you compile and disassemble):


 ldr r5, .L17
.L14:
 mov r4, #248
 lsl r4, r4, #14
 str r4, [r5]
 bl SetLed
 str r4, [r5]
 bl ResetLed
 b .L14
.L18:
 .align 2
.L17:
 .word ra

248 is 0xF8; shifted left by 14 bits it becomes 0x3E0000, as expected. At least the compiler is not completely cruel and has the heart to set ra to 0x3E0000 to replace the for loop. As humans we can see that even that could have been optimized out. Or they could have set ra one time outside the while loop. But ra is a global and this compiler apparently does not penetrate into the SetLed and ResetLed code to see that ra is neither needed nor touched there. (I have seen other compilers do this, which usually means you have to separate functions into separately compiled files so that the optimizer cannot penetrate into your code.) The least painful solution is to mark ra as volatile:


volatile unsigned int ra;

This tells the compiler that, whatever you do, any time ra changes you must write the change to the memory location used to hold ra. You will find that some compilers will optimize variables into registers and do the things you told them to do, but not feel the need to waste cycles writing intermediate values to memory. This is actually quite desirable in general, except in cases where you are writing to a hardware register and rely on all of the writes happening. Compiling for debug, depending on the compiler, might tell the compiler to assume all variables are volatile (so that the debugger can watch memory locations). Here again, you would be surprised how many developers are unaware of what is going on behind the curtain and are surprised when code that has been working for months/years when compiled for debug stops working when it is compiled for production.

Before showing the output of adding the volatile, a quick note. I am using gcc 4.2.2 built as a cross compiler as described in a previous post. I also created a gcc 3.3.6 cross compiler and a gcc 3.4.6 cross compiler. Both of the 3.x.x compilers are not willing and/or able to optimize out the delay counter, and for those you would get a blinking led. It would be a long while between blinks but it would work. I cut a lot out but you can basically see it counting from zero to some number before calling SetLed.


 mov r3, #0
.L16:
 add r3, r3, #1
 cmp r3, r4
 bls .L16
 str r3, [r1]
 bl SetLed

Since the count of 0x3E0000 was determined for the blinker1 by assuming two instructions in the delay loop and trying to count to 8 million, having three instructions means it will take 50% longer than its blinker1 counterpart. Ahh, this also demonstrates that during this loop ra was not stored back to memory each time it changed, only at the end after the loop is the value ra saved to memory.

The CodeSourcery Sourcery G++ Lite compiler, 2007-q3, is 4.2.2 based and likewise optimizes out the for loop altogether.

Okay, so we tell the compiler that ra is volatile


.L20:
 ldr r3, [r4]
 add r3, r3, #1
 str r3, [r4]
 ldr r3, .L25+4
 ldr r2, [r4]
 cmp r2, r3
 bls .L20

Well, the computer did what we told it to do, every time it needs to read and write ra it reads it from memory then changes it and writes it back. And perhaps because of the limited number of registers or who knows why, the max count value is not stored in a register it has to be fetched every time (ldr r3,.L25+4). So now our timeout count that was determined based on two instructions per loop, now has 7 instructions in the loop, assuming all instructions are one clock cycle that means it should take 3.5 times longer to blink than blinker1.
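The back of the envelope math for these delay loops can be written down as a small sketch. loop_cycles and slowdown_x10 are made-up names, and the whole thing leans on the same simplifying assumption as the text, one clock per instruction, which real memory accesses and taken branches will not honor.

```c
#define DELAY_COUNT 0x3E0000u   /* the count used in blinker2.c */

/* cycles spent in a delay loop, assuming one clock per instruction */
unsigned long loop_cycles ( unsigned long count, unsigned long instructions )
{
    return count * instructions;
}

/* slowdown versus the two instruction blinker1 loop, times ten (35 means 3.5x) */
unsigned long slowdown_x10 ( unsigned long instructions )
{
    return (instructions * 10u) / 2u;
}
```

With two instructions the count comes to 8,126,464 cycles, which is where the "about 8 million" came from; three instructions is 1.5 times that and seven is 3.5 times that.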

Just for my own entertainment if we follow blinker1 literally and use this as our delay loop


  for(ra=0x3E0000;ra;ra--) ; ResetLed();

We get


.L21:
 ldr r3, [r4]
 sub r3, r3, #1
 str r3, [r4]
 ldr r3, [r4]
 cmp r3, #0
 bne .L21

Since it is comparing with zero now instead of a terminal count we save one instruction which because it had a memory access can take longer in general unless you have zero-wait-state memory.

Again, just for fun, if I remove the volatile and use gcc 3.4.6, it gets very close to the optimal solution:


.L20:
 sub r3, r3, #1
 cmp r3, #0
 bne .L20

The problem I have is that other compilers know that a sub is really a subs which means set the flags after you do the subtract. And branch if not equal means branch if not equal to zero, subs already set the z or zero flag so there is no need for the additional compare with zero, it has already been done.

So this trivial program has created lots of problems, and for this specific task you would have been closer to success on your first try with the older gcc 3.x.x compiler over the current 4.2.2.

If you ever have the opportunity to examine what ARM's own compiler does, it can be absolutely amazing. Now that Keil is part of ARM, and I think it uses RVCT as its compiler, the Keil eval may very well do amazing things. I will have to look at that some day.

Oh, the reason I wrote the PUTGETSET and PUTGETRESET functions in assembler is because the output of gcc had a few extra unnecessary operations.

Hmmm, very interesting note. blinker2 had a bug. For those that don't know, a semi-documented feature of the ARM cores (even the popular ARM7) is that the lsbit of an address you branch to (with bx, for example) selects the mode: set means thumb mode, clear means ARM mode. Likewise the compiler and linker have to produce the proper addresses with the bit set or clear. I had the vectors.s table wrong, and I didn't notice because I wasn't using it; I was running with the bootloader. The bootloader version works because the reset vector pointed to the C function main, which was compiled as thumb, thus telling the linker to generate an address in the vector table with the lsbit set. When I changed the code so that the reset vector jumped over the vector table to a _start label, and from there did a bl main, I had not declared _start as a .thumb_func, so despite producing thumb instructions it created an ARM address in the vector table. And the 811 board I just got would not execute. What disturbs me greatly about all of this is that this is supposed to be a thumb or thumb2 ONLY core from ARM. If you know it's thumb only, shouldn't you be ignoring anything to do with ARM mode? Wouldn't that be a safe assumption? I guess not.
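A tiny sketch of the rule, with hypothetical helper names, in case you want to sanity check a vector table entry by hand or from a script. The addresses you would feed it come from your own disassembly listing; nothing here is specific to the tarball.

```c
#include <stdint.h>

/* non-zero if a branch target address has the thumb bit (lsbit) set */
int is_thumb_address ( uint32_t addr )
{
    return (int)(addr & 1u);
}

/* what a vector table entry for thumb code located at addr should hold */
uint32_t thumb_vector ( uint32_t addr )
{
    return addr | 1u;
}
```

So if the listing shows _start at an even address, the word in the vector table should be that address plus one.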

20080106a, fixed the vectors.s bug and added LM3S811 support.
