Interpreting segfault messages

I’m trying to understand some segfault messages. Any ideas?

These messages are appearing in the messages log on CentOS after the web app crashes.

Apr 16 11:10:44 scb kernel: SCB[21433]: segfault at 90 ip b76e3836 sp bfac6ee0 error 4
Apr 16 13:45:17 scb kernel: SCB[7987]: segfault at 654c7375 ip 0132b902 sp 00a59460 error 4 in XojoConsoleFramework32.so[ff8000+755000]
Apr 17 11:33:56 scb kernel: SCB[3957]: segfault at 8308 ip 0103c1b0 sp 020cf370 error 4 in XojoConsoleFramework32.so[c72000+755000]
Apr 17 11:47:38 scb kernel: SCB[1908]: segfault at 90 ip b7687879 sp bf8c4920 error 4

I usually see these when there’s a missing dependency on the machine. What version of CentOS and what version of the IDE are you using?

The web app starts okay and runs fine, then it randomly crashes.

It’s CentOS release 6.6 (Final) and Xojo 2015 R2.

I’d start by checking 3 things:

  1. Make sure you have libicu.i386 installed. That’s a new requirement for R2.
  2. Watch the server to see if you are running out of RAM.
  3. Check to see if the server has a process killer running. Some servers have a service that assumes long-running processes have hung and need to be killed.

  1. I had to install libicu.i386, otherwise it wouldn’t let me start the app, so that’s okay.
  2. RAM is fine:
[root@scb ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         12052        953      11099          0        122        463
-/+ buffers/cache:        367      11685
Swap:         4063          0       4063
  3. It’s a bog-standard CentOS install with just HAProxy and the web app on it.

I’m running six of these web apps on different ports. It’s not always the same one that crashes, and it happens rarely, 2 or 3 times a day.

Do you have any declares in your code?

Nope. It’s a very simple app.

It takes data from an IPC Socket which is simple tab separated data and then pushes that data out to the clients.

Very quick example of what I’m doing below. The app has no input controls; it purely displays information.

    
    Data = DefineEncoding(ReadAll, Encodings.ASCII)

    ' One record per line, fields separated by tabs
    LineSplit = Data.Split(EndOfLine.Windows)

    For i As Integer = 0 To LineSplit.Ubound

      TabSplit = LineSplit(i).Split(Chr(9))

      If i = 0 Then
        x1 = TabSplit(0)
        x2 = TabSplit(1)
        x3 = TabSplit(3)
      ElseIf i = 1 Then
        x4 = TabSplit(0)
        x5 = TabSplit(1)
        x6 = TabSplit(3)
      End If

    Next i

    ' Loop on Pages
    For i As Integer = 0 To Pages.Ubound
      Pages(i).label1 = x1
      Pages(i).label2 = x2
      Pages(i).label3 = x3
      Pages(i).label4 = x4
      Pages(i).label5 = x5
      Pages(i).label6 = x6
    Next i

How many physical processors does the box have? If you are not sure type “nproc” at the shell.

Two more things: Is there anything interesting in /var/log/messages when it crashes? How are you launching the instances of the app?

It’s got 8 processors, and no, there’s nothing else in any of the logs around the time of the crashes.

At the moment they are started as a service. I have tried running them in a screen in the past and still had the same issue.

Hmmm - it is hard to say without being able to monitor the status of the server as things are happening. It does sound like a resource limit of some type is being reached…

At least, if you are using upstart to spawn these instances, you can configure it to automatically restart them if they fail. Are you using upstart? Are these processes running as root?

Also - what is the output of this command: “grep -i killed /var/log/messages*”?

Had another one a short while ago. You’ll see the only message before that one was the one from earlier today.

Apr 17 11:47:38 scb kernel: SCB[1908]: segfault at 90 ip b7687879 sp bf8c4920 error 4
Apr 17 18:42:57 scb kernel: SCB[28483]: segfault at 90 ip b771748b sp bfee9350 error 4

No output from grep -i killed /var/log/messages* from today

Top

top - 18:55:36 up 2 days, 23:52,  3 users,  load average: 0.57, 0.85, 1.06
Tasks: 182 total,   1 running, 181 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.7%us,  1.0%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu1  :  3.7%us,  0.7%sy,  0.0%ni, 95.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.3%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
Cpu3  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.3%us,  0.3%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  1.7%us,  0.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12342148k total,  1289536k used, 11052612k free,   129944k buffers
Swap:  4161532k total,        0k used,  4161532k free,   573328k cached

@John Joyce: It’s still in beta, so at present no, I’m not using upstart, and yes, they are running as root.

Was it doing this with Xojo2015R1?

This looks like a memory-related problem in the binary (possibly a null pointer dereference). It is hard to say what the cause is; it could be anything. Some things you could try:

  1. Compile with a different version of Xojo and see what happens.
  2. Consider updating the packages on your server with “yum update”, or at least the ones you have installed for 32-bit support (sometimes updating packages can cause voodoo).
  3. Erase and replace all the Xojo library files and other app resources with fresh copies.
  4. If you haven’t already, test with just one instance and no HAProxy, to rule out that configuration as a problem.

Also - if you use upstart or systemd you can set the instances to restart automatically on crash - which is a workaround but might be helpful.
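
For reference, the kernel segfault lines posted above can be decoded mechanically, which helps check the null-pointer reading. Below is a minimal Python sketch, not a definitive tool. It assumes the usual x86 meaning of the error code bits (bit 0 set: protection violation, clear: page not mapped; bit 1 set: write, clear: read; bit 2 set: user mode) and that the numbers in the log are hexadecimal. The name `decode_segfault`, the regular expression, and the "near null" threshold of one 4 KiB page are made up for illustration.

```python
import re

# Matches kernel lines such as:
#   SCB[7987]: segfault at 654c7375 ip 0132b902 sp 00a59460 error 4 in Lib.so[ff8000+755000]
# The trailing "in <library>[base+size]" part is optional.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing one segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    err = int(m.group("err"), 16)
    info = {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_address": hex(addr),
        "cause": "protection violation" if err & 1 else "page not mapped",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        # Heuristic only: an address inside the first page usually means
        # a field was read through a null (or nearly null) pointer.
        "near_null": addr < 0x1000,
    }
    if m.group("lib"):
        # Offset of the faulting instruction inside the shared library.
        info["library"] = m.group("lib")
        info["offset_in_library"] = hex(ip - int(m.group("base"), 16))
    return info


if __name__ == "__main__":
    sample = ("Apr 16 13:45:17 scb kernel: SCB[7987]: segfault at 654c7375 "
              "ip 0132b902 sp 00a59460 error 4 in XojoConsoleFramework32.so[ff8000+755000]")
    print(decode_segfault(sample))
```

Under those assumptions, "error 4" is a user-mode read of an unmapped page, and "segfault at 90" is consistent with a read through a null pointer. The faults at 654c7375 and 8308 are not near zero, so those may be a different kind of bad pointer. The library offset is only useful if symbols for XojoConsoleFramework32.so are available, which may not be the case.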

@John Joyce: It might have done it under Xojo 2015 R1, but I can’t be sure.

All the packages are the latest ones via yum update, and the same goes for the Xojo library files; I always replace them when uploading a new version.

I’m dealing with a couple of issues here and I’m not sure whether they are related.

  1. I can only manage about 20 to 25 sessions per node before they become sluggish at updating the clients via push. At peak, updates are pushed out to the clients every 2s. When it gets sluggish, Chrome’s network log shows each push taking about 2s to download.

  2. I can leave all six nodes running with no one using them and they don’t appear to crash. Maybe this indicates the issue only occurs under load?

  3. The IPC Socket DataAvailable event sometimes errors out. I’ve tried to capture it with try/catch and log the reason and error number to file, but all I get is error code 0 and no reason… The code below is not a 100% copy of the real code; I’m just giving a quick example of what I’m doing. At peak, the IPC Socket receives data every 2s for a few seconds, then it dies down again for 20s.

  4. I’ve seen the server load reach about 4.00, and top sometimes reports the CPU for one of the web apps at, say, 130% for a few seconds before dropping back down.

  5. Am I asking too much of Xojo?


    Try

      Data = DefineEncoding(ReadAll, Encodings.ASCII)

      ' One record per line, fields separated by tabs
      LineSplit = Data.Split(EndOfLine.Windows)

      For i As Integer = 0 To LineSplit.Ubound

        TabSplit = LineSplit(i).Split(Chr(9))

        If i = 0 Then
          x1 = TabSplit(0)
          x2 = TabSplit(1)
          x3 = TabSplit(3)
        ElseIf i = 1 Then
          x4 = TabSplit(0)
          x5 = TabSplit(1)
          x6 = TabSplit(3)
        End If

      Next i

      ' Loop on Pages
      For i As Integer = 0 To Pages.Ubound
        Pages(i).label1 = x1
        Pages(i).label2 = x2
        Pages(i).label3 = x3
        Pages(i).label4 = x4
        Pages(i).label5 = x5
        Pages(i).label6 = x6
      Next i

    Catch err As RuntimeException

      ' Debug Message
      System.DebugLog "DataAvailable " + err.Reason + " " + err.ErrorNumber.ToText

      ' Exit Sub
      Exit Sub

    End Try
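
One possible reading of the "error code 0 and no reason" symptom, offered as an assumption rather than a diagnosis: if a read delivers a partial line or a trailing empty line, an index like TabSplit(3) falls outside the array, and an out-of-bounds exception may carry no message text. The guard is easy to express; here it is as a minimal sketch in Python, used only as neutral illustration. The function name `parse_records` and the `min_fields` parameter are hypothetical and do not come from the app.

```python
def parse_records(data, min_fields=4):
    """Split CRLF-separated, tab-delimited text and keep fields 0, 1 and 3.

    Lines with fewer than min_fields fields are skipped instead of indexed.
    """
    records = []
    for line in data.split("\r\n"):
        fields = line.split("\t")
        if len(fields) < min_fields:
            # A partial read or a trailing empty line lands here rather than
            # failing at fields[3].
            continue
        records.append((fields[0], fields[1], fields[3]))
    return records


if __name__ == "__main__":
    print(parse_records("a\tb\tc\td\r\ne\tf\tg\th\r\n"))
```

The same check in the Xojo code would compare TabSplit.Ubound against the highest index used before reading any field.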

Are all the instances of your webapp trying to use the same IPC socket? Or are different instances using different sockets?