2010JulAug EP log: Difference between revisions
Revision as of 16:22, 1 August 2010
When running ./RTscheduler 60000 the system hangs at a random time after the exposure starts. No event is reported in the logs at the moment the machine hangs, but in /var/log/kern.log I find two messages:
Jul 25 09:37:24 mirkwood kernel: ( astropci_check_reply_flags ) status: 0x0
Printed by this function:
 /******************************************************************************
  FUNCTION: ASTROPCI_CHECK_REPLY_FLAGS
  PURPOSE:  Check the current PCI DSP status. Uses HSTR HTF bits 3,4,5.
  RETURNS:  Returns DON if HTF bits are a 1 and command successfully completed.
            Returns RDR if HTF bits are a 2 and a reply needs to be read.
            Returns ERR if HTF bits are a 3 and command failed.
            Returns SYR if HTF bits are a 4 and a system reset occurred.
  NOTES:    This function must be called after sending a command to the PCI
            board or controller.
 */
 static int astropci_check_reply_flags( int devnum )
 {
     uint32_t status = 0;
     int reply = TIMEOUT;

     do {
         astropci_printf( "( astropci_check_reply_flags ) status: 0x%X\n", status ); // sds - Oct 23, 2008
         status = astropci_wait_for_condition( devnum, CHECK_REPLY );
         if ( status == DONE_STATUS ) reply = DON;
         else if ( status == READ_REPLY_STATUS ) reply = RDR;
         else if ( status == ERROR_STATUS ) reply = ERR;
         else if ( status == SYSTEM_RESET_STATUS ) reply = SYR;
         else if ( status == READOUT_STATUS ) reply = READOUT;

         // Clear the status bits if not in READOUT
         if ( reply != READOUT )
             Write_HCVR( devnum, ( uint32_t )CLEAR_REPLY_FLAGS );
     } while ( status == BUSY_STATUS );

     return reply;
 }
The returned code 0x0 corresponds to TIMEOUT_STATUS:
enum { TIMEOUT_STATUS = 0, DONE_STATUS, READ_REPLY_STATUS, ERROR_STATUS, SYSTEM_RESET_STATUS, READOUT_STATUS, BUSY_STATUS };
I also get this message:
Jul 25 09:37:25 mirkwood kernel: (Write_HCVR): HCVR not ready. Count: 0 Value: 0x8073
in:
 /******************************************************************************
  FUNCTION: WRITE_HCVR
  PURPOSE:  Writes a 32-bit value to the HCVR. Checks that the HCVR register
            bit 1 is not set, otherwise a command is still in the register.
            Calls WriteRegister_32.
  RETURNS:  None
 */
 static int Write_HCVR( int devnum, unsigned int regVal )
 {
     unsigned int currentHcvrValue = 0;
     int i, status = -EIO;

     for ( i=0; i<100; i++ ) {
         currentHcvrValue = ReadRegister_32( devices[ devnum ].ioaddr + HCVR );
         if ( ( currentHcvrValue & ( unsigned int )0x1 ) == 0 ) {
             status = 0;
             break;
         }
         astropci_printf( "(Write_HCVR): HCVR not ready. Count: %d Value: 0x%X\n", i, currentHcvrValue );
     }

     if ( status == 0 )
         WriteRegister_32( regVal, devices[ devnum ].ioaddr + HCVR );

     return status;
 }
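To see what the retry loop in Write_HCVR does without touching hardware, here is a minimal user-space sketch of the same pattern: read the register up to a fixed number of times and write only when the busy bit (mask 0x1, as in the driver code) is clear. The struct fake_hcvr, fake_read and write_when_ready names are hypothetical and exist only for this simulation; they are not part of the astropci driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HCVR_BUSY_BIT 0x1u   /* same mask the driver tests before writing */

/* Hypothetical simulated register, for illustration only. */
struct fake_hcvr {
    uint32_t value;        /* current register contents */
    int busy_reads_left;   /* number of reads that still report "busy" */
    int writes;            /* how many writes were actually performed */
};

/* Simulated read: reports the busy bit until busy_reads_left runs out. */
static uint32_t fake_read(struct fake_hcvr *r)
{
    if (r->busy_reads_left > 0) {
        r->busy_reads_left--;
        return r->value | HCVR_BUSY_BIT;
    }
    return r->value & ~HCVR_BUSY_BIT;
}

/* Bounded retry: returns 0 after a successful write, -EIO if the
 * register never became ready within max_tries reads. */
static int write_when_ready(struct fake_hcvr *r, uint32_t val, int max_tries)
{
    int i;

    assert(max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        if ((fake_read(r) & HCVR_BUSY_BIT) == 0) {
            r->value = val;
            r->writes++;
            return 0;
        }
    }
    return -EIO;   /* caller must handle the skipped write */
}
```

Note that in this pattern a register that stays busy means the write is silently skipped unless the caller checks the return value; astropci_check_reply_flags as listed ignores the value returned by Write_HCVR.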
I found this (useful?) comment:
- 30-Aug-2005 sds 1.7 Added Read/Write register functions, which include
- delays before reading/writing the PCI DSP registers
- (HCTR, HSTR, etc). Also includes checking bit 1 of
- the HCVR, if it's set, do not write to the HCVR register.
- Also did general cleanup, including re-writing the
- astropci_wait_for_condition function. Updated for current
- kernel PCI API.
And this other:
 /******************************************************************************
  FUNCTION: ASTROPCI_IOCTL()
  PURPOSE:  Entry point. Control a character device.
  RETURNS:  Returns 0 for success, or the appropriate error number.
  NOTES:    The spinlocks have been removed because they shouldn't be used here
            since the functions used here can sleep. This will cause a processor
            to spin forever and deadlock when two PCI boards are active
            simultaneously. This is because the spin lock is global. A mutex
            (semaphore) can be used here, but it causes the load/unload process
            to result in WriteHCVR failure for some reason! Frankly, I don't
            think any locking is needed since each instance of the driver
            accesses different hardware and each instance is only opened by one
            program at a time.
 */
The log files of the old mirc software do not have any of these entries. The CHAMP log file has:
 Oct 15 20:11:38 champ [<f8901392>] astropci_check_reply_flags+0x2e/0x57 [astropci]
which is similar but not the same (the printk function in the kernel seems different, suggesting a kernel modification?).
27/07/10
Moved the driver to my local directory (kernel) to run tests on it. Removed astropci0 from /lib/udev/devices/ (to avoid loading the driver at boot time).
"( astropci_check_reply_flags ) status: 0x0" was not an error message. The value was always zero because of a mistake in the driver: in the listing above, the message is printed before status is assigned.
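In the listing of astropci_check_reply_flags the print statement sits before the assignment to status, which would explain a constant 0x0 in the log. The sketch below reproduces that ordering effect in user space. It is a simulation under assumptions, not driver code: wait_for_condition is a hypothetical stand-in for astropci_wait_for_condition that always reports DONE_STATUS, and the two functions return the value that would have been logged.

```c
#include <assert.h>

/* Status codes in the same order as the driver's enum. */
enum { TIMEOUT_STATUS = 0, DONE_STATUS, READ_REPLY_STATUS, ERROR_STATUS,
       SYSTEM_RESET_STATUS, READOUT_STATUS, BUSY_STATUS };

/* Hypothetical stand-in for astropci_wait_for_condition(). */
static unsigned int wait_for_condition(void)
{
    return DONE_STATUS;
}

/* Same ordering as the driver listing: log first, assign afterwards. */
static unsigned int logged_print_first(void)
{
    unsigned int status = 0;
    unsigned int logged = 0;

    do {
        logged = status;                 /* value the print would show */
        status = wait_for_condition();
    } while (status == BUSY_STATUS);
    return logged;
}

/* Reordered: assign first, then log the fresh value. */
static unsigned int logged_assign_first(void)
{
    unsigned int status = 0;
    unsigned int logged = 0;

    do {
        status = wait_for_condition();
        logged = status;                 /* value the print would show */
    } while (status == BUSY_STATUS);
    return logged;
}
```

With the print first, the logged value is the initial 0 regardless of what the board replied; only a BUSY_STATUS retry would ever show a different number.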
28/07/10
Commented out the ioctl in RTscheduler.c:
 //ioctl(pci_fd, ASTROPCI_GET_FRAMES_READ, &astropci_reply);
It seems as if this line is causing the "HCVR not ready" error.
Mirkwood has not crashed a single time since the ioctl was commented out. A possible cause of the problem is that ioctl is a call to the Linux kernel and switches the task to secondary mode (soft realtime). Conversely, rt_task_sleep() is a call to the realtime kernel (scheduler) and switches the task to primary mode (hard realtime). The rapid switching between the two modes may be causing the problem. See:
 while (astropci_reply == astropci_reply_prev){
     rt_task_sleep(2000); // .01 musec
     astropci_reply++;
     //ioctl(pci_fd, ASTROPCI_GET_FRAMES_READ, &astropci_reply);
 }
Experiment: put the ioctl back in and comment out the rt_task_sleep:
 while (astropci_reply == astropci_reply_prev){
     //rt_task_sleep(2000); // .01 musec
     ioctl(pci_fd, ASTROPCI_GET_FRAMES_READ, &astropci_reply);
 }
It has not crashed for about 15 minutes now!!! It did not crash, but eventually it hung, just as it did when using the interrupt in user space.
I also got this message: Message from syslogd@mirkwood at Jul 28 19:06:40 ...
kernel:Disabling IRQ #20
29/07/10
The hang was caused by the context switching between ioctl and rt_task_sleep. Now the program does not crash, but it seems to stop at the ioctl, so I will put in print statements to test this possibility.
Lost all of today's wiki edits during an auto-logout from the wiki page.
30/07/10
Test if the HCVR message is relevant to the crash:
add user space ISR handler.
 /***********************************************************************
  * Task to transfer data from astropci on interrupt
  ***********************************************************************/
 void interrupt_task(void *cookie) {
     int err;
     while(!exc.endProg){
         // blocking interrupt handler
         err = rt_intr_wait(&intr_desc, TM_INFINITE);
         if (0 >= err) {
             rt_printf("Timeout on data interrupt!!!\n");
             break;
         }
         //else rt_printf("Interrupt OK\n");
         rt_sem_v(&switch_sem);
     }
 }
Any IRQ20 sent from the astropci board will trigger a semaphore.
I put the semaphore just before the ioctl while loop:
 // blocking semaphore from interrupt handler
 err = rt_sem_p(&switch_sem, timeout);
 astropci_reply_prev=astropci_reply;
 // poll astropci (some latency here.. will need to average
 while (astropci_reply == astropci_reply_prev){
     ioctl(pci_fd, ASTROPCI_GET_FRAMES_READ, &astropci_reply);
 }
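The polling loop above spins until the frame counter changes, with no upper bound, so a board that stops answering keeps the task in the loop forever. As a possible diagnostic aid, here is a minimal sketch of a bounded version. It is plain C with no Xenomai calls: read_frames_fn is a hypothetical callback standing in for the ioctl(pci_fd, ASTROPCI_GET_FRAMES_READ, &astropci_reply) call, and struct fake_source is a simulated counter used only to exercise the loop.

```c
#include <assert.h>

/* Hypothetical callback standing in for
 * ioctl(pci_fd, ASTROPCI_GET_FRAMES_READ, &reply); returns 0 on success. */
typedef int (*read_frames_fn)(void *ctx, int *frames);

/* Poll until the frame counter differs from prev or max_polls is reached.
 * Returns 1 if a new frame count was seen (stored in *frames_out),
 * 0 if the poll budget ran out, -1 if the read itself failed. */
static int poll_frames(read_frames_fn read_frames, void *ctx,
                       int prev, int max_polls, int *frames_out)
{
    int frames = prev;
    int i;

    assert(max_polls > 0);
    for (i = 0; i < max_polls; i++) {
        if (read_frames(ctx, &frames) != 0)
            return -1;
        if (frames != prev) {
            *frames_out = frames;
            return 1;
        }
    }
    return 0;   /* give up instead of spinning forever */
}

/* Simulated source: the counter advances once advance_after reads occurred. */
struct fake_source {
    int reads;
    int advance_after;
    int frames;
};

static int fake_read_frames(void *ctx, int *frames)
{
    struct fake_source *s = ctx;

    s->reads++;
    *frames = (s->reads >= s->advance_after) ? s->frames + 1 : s->frames;
    return 0;
}
```

A return value of 0 marks the point where the real program could print a message and leave the loop, which would show whether it is stuck in the polling or inside the ioctl itself.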
The "HCVR not ready" error no longer shows up in the kernel messages. After a while I get a kernel crash, probably due to context switching between Xenomai and the Linux kernel.
31/07/10
The program seems to crash after no interrupt is received (the semaphore times out after 10 seconds). Does this have anything to do with the "kernel:Disabling IRQ #20" message?
I commented out the spin_lock in the driver as a test, since it may conflict with Xenomai. In practice the program crashes with either the interrupt method or the ioctl method.
01/08/10
Maybe converting the old driver to kernel 2.6.32 is actually easier than debugging the new driver. I try to compile the old driver. I copy the kernel dir in my home to kernel.old and copy the CHAMP kernel dir to my home dir.
Now copy new Makefile from kernel to kernel_old and compile driver:
 make -C /lib/modules/2.6.32.11-xenomai-2.5.3/build SUBDIRS=/home/ep41/kernel modules -lnative -lrtdm
 make[1]: Entering directory `/usr/src/linux-2.6.32.11'
 CC [M] /home/ep41/kernel/astropci.o
 In file included from /home/ep41/kernel/astropci.c:65:
 /usr/include/xenomai/native/task.h: In function ‘rt_task_spawn’:
 /usr/include/xenomai/native/task.h:317: warning: ‘rt_task_create’ is deprecated (declared at /usr/include/xenomai/native/task.h:250)
 /home/ep41/kernel/astropci.c: In function ‘astropci_init’:
 /home/ep41/kernel/astropci.c:193: error: implicit declaration of function ‘pci_module_init’
 /home/ep41/kernel/astropci.c: In function ‘astropci_exit’:
 /home/ep41/kernel/astropci.c:290: error: void value not ignored as it ought to be
 /home/ep41/kernel/astropci.c: In function ‘__astropci_isr’:
 /home/ep41/kernel/astropci.c:829: warning: initialization makes integer from pointer without a cast
 make[2]: *** [/home/ep41/kernel/astropci.o] Error 1
 make[1]: *** [_module_/home/ep41/kernel] Error 2
 make[1]: Leaving directory `/usr/src/linux-2.6.32.11'
 make: *** [default] Error 2
OK, pci_module_init has been replaced by pci_register_driver, so I just need to change the name. At line 290 I simply remove the variable int ret and all references to it, since unregister_chrdev now returns void.
 - ret = unregister_chrdev(major[i], board[i]);
 + unregister_chrdev(major[i], board[i]);