Interpreting segfault messages

Graham_Spratt1 · April 17, 2015, 11:08am

I’m trying to understand some segfault messages. Any ideas?

These messages are appearing in the messages log on CentOS after the web app crashes.

Apr 16 11:10:44 scb kernel: SCB[21433]: segfault at 90 ip b76e3836 sp bfac6ee0 error 4
Apr 16 13:45:17 scb kernel: SCB[7987]: segfault at 654c7375 ip 0132b902 sp 00a59460 error 4 in XojoConsoleFramework32.so[ff8000+755000]

Apr 17 11:33:56 scb kernel: SCB[3957]: segfault at 8308 ip 0103c1b0 sp 020cf370 error 4 in XojoConsoleFramework32.so[c72000+755000]
Apr 17 11:47:38 scb kernel: SCB[1908]: segfault at 90 ip b7687879 sp bf8c4920 error 4
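
For readers decoding lines like these: the value after "at" is the address that faulted, "ip" and "sp" are the instruction and stack pointers, "error" is the x86 page-fault error code printed in hex (bit 0 set means a protection violation rather than a page that is not present, bit 1 set means a write rather than a read, bit 2 set means the fault happened in user mode), and the bracket after the library name is its load address plus size. The Python sketch below pulls those fields apart. The function name `decode_segfault` and the dictionary keys are made up for illustration, and it assumes the message layout shown above.

```python
import re

# Matches kernel lines such as:
#   SCB[7987]: segfault at 654c7375 ip 0132b902 sp 00a59460 error 4 in lib.so[ff8000+755000]
SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing one kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    info = {
        "process": m["proc"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "instruction_pointer": int(m["ip"], 16),
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "library": m["lib"],  # None when the kernel did not name a mapping
        "library_offset": None,
    }
    if m["lib"] is not None:
        # Offset of the faulting instruction inside the shared library.
        info["library_offset"] = info["instruction_pointer"] - int(m["base"], 16)
    return info


line = ("Apr 16 13:45:17 scb kernel: SCB[7987]: segfault at 654c7375 ip 0132b902 "
        "sp 00a59460 error 4 in XojoConsoleFramework32.so[ff8000+755000]")
info = decode_segfault(line)
print(info["cause"], info["access"], info["mode"], hex(info["library_offset"]))
# prints: page not present read user 0x333902
```

Read this way, "error 4" in all of the lines above is a user-mode read of an unmapped address, and a fault address as small as 90 is consistent with reading a field through a null pointer. The library offset is only useful for symbol lookup if symbols for that library are available.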

Greg_O_Lone · April 17, 2015, 12:23pm

I usually see these when there’s a missing dependency on the machine. What version of CentOS and what version of the IDE are you using?

Graham_Spratt1 · April 17, 2015, 12:35pm

The web app starts okay and runs fine then it randomly crashes.

It’s CentOS release 6.6 (Final) and Xojo 2015 R2.

Greg_O_Lone · April 17, 2015, 12:39pm

I’d start by checking 3 things:

1. Make sure you have libicu.i386 installed. That’s a new requirement for R2.
2. Watch the server to see if you are running out of RAM.
3. Check to see if the server has a process killer running. Some servers have a service that assumes long-running processes have hung and need to be killed.

Graham_Spratt1 · April 17, 2015, 12:43pm

I had to install libicu.i386, otherwise it wouldn’t let me start the app, so that’s okay.
RAM is fine:

[root@scb ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         12052        953      11099          0        122        463
-/+ buffers/cache:        367      11685
Swap:         4063          0       4063

It’s a bog-standard CentOS install with just HAProxy and the web app on it.

I’m running six of these web apps on different ports. It’s not always the same one that crashes, and it happens rarely, 2 or 3 times a day.

Greg_O_Lone · April 17, 2015, 12:46pm

Do you have any declares in your code?

Graham_Spratt1 · April 17, 2015, 1:01pm

Nope. It’s a very simple app.

It takes data from an IPC Socket, which is simple tab-separated data, and then pushes that data out to the clients.

Very quick example of what I’m doing below. The app has no input controls; it purely displays information.

    
    ' Read everything currently buffered on the IPC socket
    Data = DefineEncoding(ReadAll, Encodings.ASCII)

    ' One record per CRLF-terminated line
    LineSplit = Data.Split(EndOfLine.Windows)

    For i As Integer = 0 To LineSplit.Ubound
      ' Fields within a line are tab separated
      TabSplit = LineSplit(i).Split(Chr(9))

      If i = 0 Then
        x1 = TabSplit(0)
        x2 = TabSplit(1)
        x3 = TabSplit(3)
      ElseIf i = 1 Then
        x4 = TabSplit(0)
        x5 = TabSplit(1)
        x6 = TabSplit(3)
      End If
    Next i

    ' Loop on Pages and push the values to each one
    For i As Integer = 0 To Pages.Ubound
      Pages(i).label1 = x1
      Pages(i).label2 = x2
      Pages(i).label3 = x3
      Pages(i).label4 = x4
      Pages(i).label5 = x5
      Pages(i).label6 = x6
    Next i

John_Joyce · April 17, 2015, 4:35pm

How many physical processors does the box have? If you are not sure type “nproc” at the shell.

John_Joyce · April 17, 2015, 4:37pm

Two more things: is there anything interesting in /var/log/messages when it crashes? And how are you launching the instances of the app?

Graham_Spratt1 · April 17, 2015, 4:38pm

It’s got 8 processors, and no, there is nothing else in any of the logs around the time of the crashes.

Graham_Spratt1 · April 17, 2015, 4:41pm

At the moment they are started as a service. I have tried running them in a screen session in the past and still had the same issue.

John_Joyce · April 17, 2015, 5:32pm

Hmmm, it is hard to say without being able to monitor the status of the server as things are happening. It does sound like a resource limit of some type is being reached…

At least, if you are using upstart to spawn these instances, you can configure it to automatically restart them if they fail. Are you using upstart? Are these processes running as root?

John_Joyce · April 17, 2015, 5:42pm

Also, what is the output of this command: “grep -i killed /var/log/messages*”?

Graham_Spratt1 · April 17, 2015, 5:51pm

Had another one a short while ago. You’ll see the only message before it was the one from earlier today.

Apr 17 11:47:38 scb kernel: SCB[1908]: segfault at 90 ip b7687879 sp bf8c4920 error 4
Apr 17 18:42:57 scb kernel: SCB[28483]: segfault at 90 ip b771748b sp bfee9350 error 4

Graham_Spratt1 · April 17, 2015, 5:55pm

No output from grep -i killed /var/log/messages* from today

Graham_Spratt1 · April 17, 2015, 5:56pm

Top

top - 18:55:36 up 2 days, 23:52,  3 users,  load average: 0.57, 0.85, 1.06
Tasks: 182 total,   1 running, 181 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.7%us,  1.0%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu1  :  3.7%us,  0.7%sy,  0.0%ni, 95.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.3%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
Cpu3  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.3%us,  0.3%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  1.7%us,  0.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12342148k total,  1289536k used, 11052612k free,   129944k buffers
Swap:  4161532k total,        0k used,  4161532k free,   573328k cached

Graham_Spratt1 · April 17, 2015, 6:03pm

It’s still in beta, so at present, no, I’m not using upstart, and yes, they are running as root.

John_Joyce · April 17, 2015, 7:08pm

Was it doing this with Xojo2015R1?

This seems to be a memory-related problem in the binary (a null pointer dereference). It is hard to say what the cause is; it could be anything. Some things you could try: compile with a different version of Xojo and see what happens. Consider updating the packages on your server (“yum update installed”), or at least the ones that you have installed for 32-bit support (although sometimes updating packages can cause voodoo). Erase and replace all the Xojo library files and other app resources with fresh copies. Perhaps, if you haven’t already, do a test with just one instance and no HAProxy, just to rule out that configuration as a problem.

Also - if you use upstart or systemd you can set the instances to restart automatically on crash - which is a workaround but might be helpful.
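
To make the restart-on-crash workaround concrete without depending on a particular init system, here is a minimal sketch in Python of what such a supervisor does. It is not how upstart or systemd are configured; the function name `run_with_restarts` and its parameters are invented for illustration, and a real deployment would more likely rely on the init system itself.

```python
import subprocess
import time


def run_with_restarts(command, max_restarts=5, delay_seconds=2.0):
    """Run command, restarting it after each abnormal exit.

    Returns the list of exit codes seen, one per run.
    """
    exit_codes = []
    while True:
        code = subprocess.call(command)
        exit_codes.append(code)
        # 0 is a clean shutdown. On Linux a negative code means the process
        # was killed by a signal, for example -11 for SIGSEGV.
        if code == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(delay_seconds)
```

Logging each exit code alongside a timestamp would also show whether every crash is a segfault or whether some exits have another cause.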

Graham_Spratt1 · April 17, 2015, 9:37pm

It might have done it under Xojo2015R1, but I can’t be sure.

All the packages are the latest ones via yum update, and the same goes for the Xojo library files: I always replace them when uploading a new version.

I’m dealing with a couple of issues here and I’m not sure whether they are related.

1. I can only manage about 20 to 25 sessions per node before they become sluggish at updating the clients via push. At peak, updates are pushed out to the clients every 2s, and according to Chrome’s network log each push then starts taking 2s to download.
2. I can leave all six nodes running with no one using them and they don’t appear to crash. Maybe this indicates it’s an issue under load?
3. The IPC Socket DataAvailable event sometimes errors out. I’ve tried to capture it with Try/Catch and log the reason and error number to a file, but all I get is error code 0 and no reason. The code below is not a 100% copy of the real thing; it’s just a quick example of what I’m doing. At peak the IPC Socket receives data every 2s for a few seconds, then it dies down again for 20s.
4. I’ve seen the server load reach about 4.00, and top sometimes reports the CPU for one of the web apps at, say, 130% for a few seconds before dropping back down.

Am I asking too much of Xojo?


    Try

      ' Read everything currently buffered on the IPC socket
      Data = DefineEncoding(ReadAll, Encodings.ASCII)

      ' One record per CRLF-terminated line
      LineSplit = Data.Split(EndOfLine.Windows)

      For i As Integer = 0 To LineSplit.Ubound
        ' Fields within a line are tab separated
        TabSplit = LineSplit(i).Split(Chr(9))

        If i = 0 Then
          x1 = TabSplit(0)
          x2 = TabSplit(1)
          x3 = TabSplit(3)
        ElseIf i = 1 Then
          x4 = TabSplit(0)
          x5 = TabSplit(1)
          x6 = TabSplit(3)
        End If
      Next i

      ' Loop on Pages and push the values to each one
      For i As Integer = 0 To Pages.Ubound
        Pages(i).label1 = x1
        Pages(i).label2 = x2
        Pages(i).label3 = x3
        Pages(i).label4 = x4
        Pages(i).label5 = x5
        Pages(i).label6 = x6
      Next i

    Catch err As RunTimeException

      ' Debug Message
      System.DebugLog "DataAvailable " + Err.Reason + " " + Err.ErrorNumber.ToText, -1

      ' Exit Sub
      Exit Sub

    End Try
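
One possible explanation for the "error code 0 and no reason" result, offered as an assumption rather than anything confirmed in this thread: a socket can deliver a record in pieces, so a read may return a line with fewer than four tab-separated fields, and indexing field 3 then raises an out-of-bounds style exception, which may well arrive with no message attached. The sketch below is in Python rather than Xojo and only illustrates the idea of buffering until a full line has arrived and checking the field count before indexing. The class name `RecordBuffer`, the method `feed`, and the minimum of 4 fields are hypothetical.

```python
class RecordBuffer:
    """Accumulates raw socket chunks and returns only complete, well-formed records."""

    def __init__(self, line_end="\r\n", field_sep="\t", min_fields=4):
        self._pending = ""  # data received but not yet terminated by line_end
        self._line_end = line_end
        self._field_sep = field_sep
        self._min_fields = min_fields

    def feed(self, chunk):
        """Add a chunk and return a list of field lists, one per complete record."""
        self._pending += chunk
        # Everything before the last terminator is complete; the tail is kept for later.
        *complete, self._pending = self._pending.split(self._line_end)
        records = []
        for line in complete:
            fields = line.split(self._field_sep)
            if len(fields) >= self._min_fields:
                records.append(fields)
            # Short or empty lines are skipped instead of being indexed blindly.
        return records


buf = RecordBuffer()
print(buf.feed("a\tb\tc"))          # incomplete line, nothing returned yet
print(buf.feed("\td\r\nx\ty"))      # first record completes, second is still pending
```

Logging the type of the exception as well as its reason would confirm or rule out this theory in the real app.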

John_Joyce · April 17, 2015, 11:13pm

Are all the instances of your webapp trying to use the same IPC socket? Or are different instances using different sockets?