
May 14, 2011

Linux: Application crash analysis

Sometimes your application gets terminated without giving a clue. There can be many reasons: a bad memory reference, a lost pointer to a function, a buffer overflow, or a signal from the kernel. 
In most of these cases a fatal signal such as SIGSEGV or SIGKILL is sent to the application. 


What is SIGKILL? 
When sent to a program, SIGKILL causes it to terminate immediately. In contrast to SIGTERM and SIGINT, this signal cannot be caught or ignored, and the receiving process cannot perform any clean-up upon receiving it. 
There might not be enough information to find the exact cause of the termination. The logs may contain only limited debug prints, which point to the cause only in a broad sense. So it might take a lot of analysis time to root out the exact function in which the signal was received. 
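
If you can reproduce the kill from a shell, the exit status already tells you which signal ended the process. Here is a minimal sketch, assuming a POSIX shell; the `sleep 60` is only a stand-in for your application, and the variable names are mine:

```shell
# Start a throwaway background process that stands in for the application.
sleep 60 &
victim=$!

# SIGKILL (signal 9) cannot be caught, so the process dies immediately.
kill -KILL "$victim"

# The shell reports death-by-signal as 128 + signal number.
status=0
wait "$victim" || status=$?
echo "exit status: $status (signal $((status - 128)))"
```

An exit status of 137 therefore means SIGKILL (128 + 9), while 139 would mean SIGSEGV (128 + 11).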


Hopefully the steps below will help you find the exact reason. 
Steps to create the final file that can give the termination cause: 
1) Log in to the target where your application is running/getting killed. 
2) $ ps -ef | grep "app.xxx" 
Get the PID of your process. The PID is in the 2nd column of the output of the ps -ef command. 
3) Then open the following file 
$ vi /proc/"PID"/maps 
maps is a file that lists the memory address ranges mapped into your application. 
It looks as shown below: 
10000000-1014e000 r-xp 00000000 00:0c 3734 app.xxx 
1015e000-10161000 rwxp 0014e000 00:0c 3734 app.xxx 
Copy the contents to a text file on your PC, e.g. maps.txt 
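
The saved maps.txt is handy later for checking whether a crash address belongs to your application or to a shared library. A rough sketch of such a lookup, assuming a POSIX shell and addresses that fit in the shell's integer type (true for the 32-bit addresses shown here); `find_mapping` is a name I made up:

```shell
# find_mapping ADDRESS MAPS_FILE
# Print every line of MAPS_FILE whose start-end range contains ADDRESS.
# ADDRESS is hexadecimal, with or without a leading 0x.
find_mapping() {
    hex=${1#0x}
    addr=$(( 0x$hex ))
    while read -r range rest; do
        [ -n "$range" ] || continue            # skip blank lines
        start=$(( 0x${range%-*} ))             # text before the dash
        end=$(( 0x${range#*-} ))               # text after the dash
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            printf '%s %s\n' "$range" "$rest"
        fi
    done < "$2"
}
```

For example, `find_mapping 0x10000abc maps.txt` prints the first app.xxx line above, because the address falls between 10000000 and 1014e000.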
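4) When an application is terminated by a signal whose default action is to dump core (SIGSEGV, SIGABRT and so on), the kernel writes a core dump file, provided core dumps are enabled on the target. SIGKILL by itself does not produce one. The core file cannot be analyzed directly; it needs some treatment. 
It's better if you tweak your OS to generate the core file name with PID, timestamp and image name. 
The reference for searching this core file will be your process PID. 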
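
One way to do this tweak on Linux is through /proc/sys/kernel/core_pattern. The sketch below assumes root access on the target and a POSIX shell; the function names and the exact pattern are my own choice, not something your system requires:

```shell
# %e = executable name, %p = PID, %t = time of the dump (seconds since epoch)
CORE_PATTERN='core.%e.%p.%t'

# enable_named_cores: run as root, in the shell that will start the application.
enable_named_cores() {
    ulimit -c unlimited                                  # no core size limit
    echo "$CORE_PATTERN" > /proc/sys/kernel/core_pattern
}

# find_core DIR PID: list the core files in DIR that were written for PID.
find_core() {
    ls "$1"/core.*."$2".* 2>/dev/null
}
```

After a crash of PID 1234, `find_core /path/to/cores 1234` would list something like core.app.xxx.1234.1305331200 (the directory here is hypothetical).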
5) Use the GDB backtrace to find the last symbol accessed and store it in mortem.txt 
Once GDB has loaded your executable together with the core file, you can run the "bt" command to display the backtrace of the program stack. 
Example: 
(gdb) bt 
#0 0x40c8b6ec in s7_listen () from /usr/lib/libgcs7.so 
#1 0x40c8be4e in DlgcHost_GcSS7::s7_Listen () from /usr/lib/libgcs7.so 
#2 0x40278255 in gc_Listen () at eval.c:41 
#3 0x08049ac1 in route () at eval.c:41 
#4 0x0804986a in main () at eval.c:41 
#5 0x402c9507 in __libc_start_main (main=0x8049220 <main>, argc=2, ubp_av=0xbfffeae4, init=0x8048d64 <_init>, fini=0x804b4f0 <_fini>, rtld_fini=0x4000dc14 <_dl_fini>, stack_end=0xbfffeadc) at ../sysdeps/generic/libc-start.c:129 


Read the stack from the bottom up: main is the outermost call, and frame 0 is always the function that was executing last. That is where the application experienced the segmentation fault: 
#0 0x40c8b6ec in s7_listen () from /usr/lib/libgcs7.so 
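
If you prefer not to type into GDB by hand, the backtrace can be captured in one go and the frame addresses pulled out for the later search. This is only a sketch, assuming gdb is installed where the core file lives; the two function names are hypothetical:

```shell
# save_backtrace EXECUTABLE COREFILE
# Run "bt" non-interactively and store the result in mortem.txt.
save_backtrace() {
    gdb -batch -ex bt "$1" "$2" > mortem.txt 2>&1
}

# frame_addresses FILE
# Print the hex address of every "#N 0x... in function" frame line in FILE.
# Frames that gdb prints without an address are skipped.
frame_addresses() {
    sed -n 's/^#[0-9][0-9]* *\(0x[0-9a-fA-F][0-9a-fA-F]*\) in .*/\1/p' "$1"
}
```

Running `frame_addresses mortem.txt` on the example above would give one address per frame, starting with 0x40c8b6ec.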
6) Go to the code server where you built your image. 
Usually the build produces the following three files: 
simple image app.xxx 
Nostrip version app.xxx.nostrip 
Map version app.xxx.map 
app.xxx.nostrip - This is your application executable with its symbol information intact (not stripped). 
app.xxx.map - This contains the function locations. 
Copy these 2 files to the path where your application is compiled. 
7) Create 2 directories named “src” and “hdr” in the base application folder where your application is compiled. 
Copy all your source and header files to the src and hdr directories respectively. 
Now you have src, hdr, *.nostrip and *.map all in the same path. 
8) Now we need to create a dump file which will contain each function and its address location. This dump file is created using all your source & header files along with the *.nostrip & *.map files. 


We need to use the objdump tool to create the dump file. objdump is architecture specific; the bin directory of your compiler toolchain will usually have it. 
9) Then execute the following command: 
/bin/-objdump -D -S *.nostrip >> x.dmp 
10) Open this file. 
Now search the *.dmp file for the addresses from the call backtrace (mortem.txt). 
Usually each address is found inside the body of a function. That function can be the reason for the termination of the application. 
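
The search can also be scripted. The sketch below assumes the usual objdump layout, where a function starts with a line like `08049a90 <route>:` and each instruction line starts with its address followed by a colon; `enclosing_function` is a name I made up. Addresses that belong to shared libraries (such as 0x40c8b6ec above) will not be found in the application's dump:

```shell
# enclosing_function ADDRESS DUMPFILE
# Print the name of the function whose disassembly contains ADDRESS.
enclosing_function() {
    # objdump prints instruction addresses in lower case without 0x or leading zeros.
    want=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f' | sed 's/^0*//')
    awk -v want="$want" '
        /^[0-9a-f]+ <.*>:/ {            # function header: remember its name
            sym = $2
            sub(/^</, "", sym)
            sub(/>:$/, "", sym)
        }
        $1 == (want ":") { print sym; exit }   # instruction at ADDRESS
    ' "$2"
}
```

With the backtrace from step 5, `enclosing_function 0x08049ac1 x.dmp` should print route if the dump matches the binary that crashed.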


Note: Even if there are many addresses in the call backtrace, they may all ultimately lead to one function. The many addresses are due to functions called within functions. If the termination were due to one independent function, only that function's location would appear in the call backtrace. 


Further reading 
http://www.trilithium.com/johan/2005/08/linux-gate/ 
proc(5),mmap(2) 
Bye !!!!
