Are #Pragmas Run-Time, Compile-Time or Both?

I have a scientific application that is running very slowly. I have added a number of pragmas to speed it up, but I am not noticing any significant difference. Here is an example of a run-time pragma that I have used:

if Qfast then
#pragma backgroundtasks false
#pragma boundschecking false
#pragma nilobjectchecking false
#pragma stackoverflowchecking false
end if

Qfast can be turned on by the user in a dialogue box prior to pressing “Run”. The problem is, it seems to make no difference.

I am wondering if the above pragmas must be implemented at compile time. Based on the Language Reference, it appears that they can be turned on at run time. Is this wrong? Must I provide two methods, one with pragmas on during compile and one without, and select the fast method at run time instead?

Pragmas are compiler directives, not code, so they take effect at compile time wherever you place them. They do not care what code comes before or after them.

Your example code does this:

if Qfast then
// Apply pragmas to the rest of this method
// or until another, opposite pragma is reached

Notice I didn’t include the “end if”, because it doesn’t matter. The pragmas stay in effect anyway.

In fact, you should not use these during debugging, so you might want to do something like this:

#if Not DebugBuild
  #pragma BackgroundTasks False
  // etc
#endif

But that’s a different conversation. In any event, your Qfast flag isn’t actually doing anything, which is why you’re not seeing a difference.

Can you post your code so we can see if it can be sped up?

Another thing to be aware of: pragmas affect only the code in the method they are placed in. They do not “carry forward” into methods that code may call. If you’re calling a lot of subroutines, each of them must have its own set of pragmas.
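To illustrate the point about subroutines, here is a minimal sketch with two hypothetical methods, `SumOfSquares` and its helper `Square`. The pragmas at the top of `SumOfSquares` do not reach into `Square`, so the helper repeats the ones it needs:

```xojo
// Hypothetical helper: it needs its own pragmas, because the caller's
// pragmas do not carry forward into methods it calls.
Function Square(x As Double) As Double
  #If Not DebugBuild
    #Pragma BackgroundTasks False
    #Pragma StackOverflowChecking False
  #EndIf
  Return x * x
End Function

// Hypothetical caller: these pragmas cover only the code in this method.
Function SumOfSquares(values() As Double) As Double
  #If Not DebugBuild
    #Pragma BackgroundTasks False
    #Pragma BoundsChecking False
    #Pragma NilObjectChecking False
    #Pragma StackOverflowChecking False
  #EndIf
  Dim total As Double = 0.0
  For i As Integer = 0 To Ubound(values)
    total = total + Square(values(i))
  Next
  Return total
End Function
```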

Tim,
Thanks. I knew about the need for pragmas in each method. What I just learned from Kem is that they are purely compile-time directives. I should have known.

Kem,

Now this all makes sense to me. I will have to compile two versions of the methods, one with pragmas and one without, and select the fast one at run time. That is certainly doable. I do not want these pragmas active during debugging, however.
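A minimal sketch of that two-version approach, using hypothetical method names (`NormSquaredFast`, `NormSquaredChecked`, `NormSquared`) and a toy loop standing in for the real integral code. The fast copy wraps its pragmas in `#If Not DebugBuild`, so in a debug build it behaves exactly like the checked copy:

```xojo
// Hypothetical fast version: identical to the checked one except for the pragmas,
// which are compiled in only for non-debug builds.
Function NormSquaredFast(values() As Double) As Double
  #If Not DebugBuild
    #Pragma BackgroundTasks False
    #Pragma BoundsChecking False
    #Pragma NilObjectChecking False
    #Pragma StackOverflowChecking False
  #EndIf
  Dim total As Double = 0.0
  For i As Integer = 0 To Ubound(values)
    total = total + values(i) * values(i)
  Next
  Return total
End Function

// Hypothetical checked version: no pragmas, so all run-time checks stay on.
Function NormSquaredChecked(values() As Double) As Double
  Dim total As Double = 0.0
  For i As Integer = 0 To Ubound(values)
    total = total + values(i) * values(i)
  Next
  Return total
End Function

// The run-time flag (e.g. Qfast) only chooses which already-compiled method runs.
Function NormSquared(values() As Double, fast As Boolean) As Double
  If fast Then
    Return NormSquaredFast(values)
  End If
  Return NormSquaredChecked(values)
End Function
```

The cost of this design is duplicated code that must be kept in sync; the benefit is that the user’s switch in the dialog box actually changes which compiled code runs.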

There is too much code to post all of it, but below is a function that is called a million times or more in a given calculation. You can see it involves lots of math and access to arrays, which is why I wanted to turn off bounds checking, among other things. If you see anything that I can do to speed it up, please let me know. My program spends about 50% of its time in this function.

function two_electron_integral(i0 as integer, j0 as integer, k0 as integer, l0 as integer, w() as double) as double
  // w(1…n2) holds the electron repulsion terms in linear format
  dim lid as boolean
  dim spcg,wint,atemp,temp,temp1,t0 as double
  dim i,j,k,l,kk,itype,id,ii,ia,ib,ja,jb,jj,i1 as integer
  dim ix,iy,j1,k1 as integer
  dim is1,is0,iz,izn,iminus as integer
  static c1(1),c2(1),c3(1),c4(1) as double
  
  if Q_mndo_fast or psdci.Qfast then
    #pragma backgroundtasks false
    #pragma boundschecking false
    #pragma nilobjectchecking false
    #pragma stackoverflowchecking false
  end if
  
  if ubound(c1)<norbs then
    redim c1(norbs)
    redim c2(norbs)
    redim c3(norbs)
    redim c4(norbs)
  end if
  
  for i=1 to norbs
    c1(i)=v(i,i0)
    c2(i)=v(i,j0)
    c3(i)=v(i,k0)
    c4(i)=v(i,l0)
  next
  lid=true
  spcg = 0.0
  kk = 0
  for ii = 1 to numat // do ii = 1, numat
    ia = nfirst(ii)
    ib = nlast(ii)
    iminus = ii - 1
    for jj = 1 to iminus // do jj = 1, iminus
      ja = nfirst(jj)
      jb = nlast(jj)
      for i = ia to ib
        for j = ia to i
          for k = ja to jb
            for l = ja to k
              kk = kk + 1
              wint = w(kk) // repulsion integral for this (ij|kl) pair
              spcg = spcg + wint*(c1(i)*c2(j)*c3(k)*c4(l) + c1(k)*c2(l)*c3(i)*c4(j))
              if (i <> j) then spcg = spcg + wint*(c1(j)*c2(i)*c3(k)*c4(l) + c1(k)*c2(l)*c3(j)*c4(i))
              if (k <> l) then spcg = spcg + wint*(c1(i)*c2(j)*c3(l)*c4(k) + c1(l)*c2(k)*c3(i)*c4(j))
              if (i <> j) and (k <> l) then spcg = spcg + wint*(c1(j)*c2(i)*c3(l)*c4(k) + c1(l)*c2(k)*c3(j)*c4(i))
            next
          next
        next
      next
    next
  next
  atemp = spcg
  is1 = 0
  for i1 = 1 to numat // do i1 = 1, numat
    is1 = is1 + 1
    izn = nat(i1)
    spcg = spcg + c1(is1)*c2(is1)*c3(is1)*c4(is1)*gss(izn)
    if (izn >= 3) then
      is0 = is1
      is1 = is1 + 1
      ix = is1
      is1 = is1 + 1
      iy = is1
      is1 = is1 + 1
      iz = is1
      spcg = spcg + gpp(izn)*(c1(ix)*c2(ix)*c3(ix)*c4(ix) + c1(iy)*c2(iy)*c3(iy)*c4(iy) + c1(iz)*c2(iz)*c3(iz)*c4(iz))
      spcg = spcg + gsp(izn)*(c1(is0)*c2(is0)*c3(ix)*c4(ix) + c1(is0)*c2(is0)*c3(iy)*c4(iy) + c1(is0)*c2(is0)*c3(iz)*c4(iz) + c1(ix)*c2(ix)*c3(is0)*c4(is0) + c1(iy)*c2(iy)*c3(is0)*c4(is0) + c1(iz)*c2(iz)*c3(is0)*c4(is0))
      spcg = spcg + gp2(izn)*(c1(ix)*c2(ix)*c3(iy)*c4(iy) + c1(ix)*c2(ix)*c3(iz)*c4(iz) + c1(iy)*c2(iy)*c3(iz)*c4(iz) + c1(iy)*c2(iy)*c3(ix)*c4(ix) + c1(iz)*c2(iz)*c3(ix)*c4(ix) + c1(iz)*c2(iz)*c3(iy)*c4(iy))
      temp1 = hsp(izn)
      for j1 = ix to iz
        spcg = spcg + temp1*(c1(is0)*c2(j1)*c3(j1)*c4(is0) + c1(is0)*c2(j1)*c3(is0)*c4(j1) + c1(j1)*c2(is0)*c3(is0)*c4(j1) + c1(j1)*c2(is0)*c3(j1)*c4(is0))
      next
      temp1 = 0.5*(gpp(izn)-gp2(izn))
      for j1 = ix to iz
        for k1 = ix to iz
          if (j1 <> k1) then spcg = spcg + temp1*(c1(j1)*c2(k1)*c3(j1)*c4(k1) + c1(j1)*c2(k1)*c3(k1)*c4(j1))
        next
      next
    end if
  next
  return spcg
end function

Could:

is0 = is1
is1 = is1 + 1
ix = is1
is1 = is1 + 1
iy = is1
is1 = is1 + 1
iz = is1
be simplified to:

is0 = is1
ix = is0 + 1
iy = is0 + 2
iz = is0 + 3

Hi Scott,

No, that would not work, because we need to assign is0 and increment is1 as well. But I could do what you proposed and add is1 = is1 + 3 at the bottom, so your proposal is a good start. However, this was written more for transparency than efficiency, and it is not the integer arithmetic that is slowing things down but the array access and the double-precision multiplications.

Thanks for your help and suggestions,
Bob
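For reference, Scott’s offsets combined with Bob’s `is1 = is1 + 3` correction could look like this. It is wrapped in a hypothetical `AssignPOrbitalIndices` method only so it can be tried in isolation; `is1` enters pointing at the atom’s s orbital and leaves pointing at its last p orbital, just as the original three increments leave it:

```xojo
// Hypothetical standalone version of the index bookkeeping for an atom
// with s, px, py, pz orbitals (the izn >= 3 branch).
Sub AssignPOrbitalIndices(ByRef is1 As Integer, ByRef is0 As Integer, ByRef ix As Integer, ByRef iy As Integer, ByRef iz As Integer)
  is0 = is1        // s orbital
  ix = is0 + 1     // px
  iy = is0 + 2     // py
  iz = is0 + 3     // pz
  is1 = is1 + 3    // same final value as three "is1 = is1 + 1" steps
End Sub
```

As Bob notes, this mainly tidies the code; the time goes into the array reads and floating-point multiplications.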

if Q_mndo_fast or psdci.Qfast then
#pragma backgroundtasks false
#pragma boundschecking false
#pragma nilobjectchecking false
#pragma stackoverflowchecking false
end if
I may be mistaken, but I think pragmas are effective only within the “code block” they reside in, so your pragmas are only in effect within the if/then block that encloses them.

Ah yes… from the LR

I learned something new today.
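If pragmas really are scoped to the block they appear in, as the Language Reference passage cited above indicates, one possible single-method alternative is to put the hot loop itself inside the branch that carries the pragmas. A minimal sketch with a hypothetical `SumValues` method, assuming that block-scoping rule holds for the Xojo version in use:

```xojo
// Hypothetical method. Assumes pragmas apply only to the block they appear in,
// so the checks are relaxed only inside the "If fast" branch.
Function SumValues(values() As Double, fast As Boolean) As Double
  Dim total As Double = 0.0
  If fast Then
    #Pragma BackgroundTasks False
    #Pragma BoundsChecking False
    // The hot loop lives inside the block that carries the pragmas.
    For i As Integer = 0 To Ubound(values)
      total = total + values(i)
    Next
  Else
    // Same loop with the default run-time checks left on.
    For i As Integer = 0 To Ubound(values)
      total = total + values(i)
    Next
  End If
  Return total
End Function
```

The pragmas are still applied at compile time; the run-time flag only decides which of the two compiled loops executes.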

That’s not necessary, AFAIK. Just put this at the top of your method:

#If Not DebugBuild
  #Pragma BackgroundTasks False
  #Pragma BoundsChecking False
  #Pragma NilObjectChecking False
  #Pragma StackOverflowChecking False
#Endif
// your code here

Ptr access is faster than array access, especially with all the pragmas on. For doubles it’s about 20-30% faster, and for integers up to twice as fast, at least in my timing tests. These timings come from simple, specific code (“v = a(i)” vs “v = p.Double(i*8)”) and may behave differently in practice, so I usually write both versions and measure.
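A minimal sketch of that kind of measurement, comparing reads from a Double array with reads through a Ptr into a MemoryBlock. The method name `CompareArrayAndPtr` and the element count are hypothetical, and the numbers it logs will depend on platform and build settings:

```xojo
// Hypothetical benchmark: array reads vs Ptr reads of the same Double values.
Sub CompareArrayAndPtr()
  #If Not DebugBuild
    #Pragma BackgroundTasks False
    #Pragma BoundsChecking False
    #Pragma NilObjectChecking False
    #Pragma StackOverflowChecking False
  #EndIf
  
  Const kCount = 1000000
  
  // Fill an array and a MemoryBlock (8 bytes per Double) with the same values.
  // mb must stay in scope for as long as p is used.
  Dim a() As Double
  Redim a(kCount - 1)
  Dim mb As New MemoryBlock(kCount * 8)
  Dim p As Ptr = mb
  For i As Integer = 0 To kCount - 1
    a(i) = i
    p.Double(i * 8) = i
  Next
  
  // Time array reads.
  Dim sumA As Double
  Dim t0 As Double = Microseconds
  For i As Integer = 0 To kCount - 1
    sumA = sumA + a(i)
  Next
  Dim arrayTime As Double = Microseconds - t0
  
  // Time Ptr reads: the offset is in bytes, so element i starts at i * 8.
  Dim sumP As Double
  t0 = Microseconds
  For i As Integer = 0 To kCount - 1
    sumP = sumP + p.Double(i * 8)
  Next
  Dim ptrTime As Double = Microseconds - t0
  
  System.DebugLog("array: " + Str(arrayTime) + " us, ptr: " + Str(ptrTime) + " us, sums equal: " + Str(sumA = sumP))
End Sub
```

Run it in a built (non-debug) app, since the pragmas are compiled out of debug runs.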

Other possible speedups…

XojoScript

plugin

declare into library

Will’s right, I have sped up code dramatically by replacing arrays with MemoryBlock/Ptr. But it’s not for the faint of heart as it can be a bit tricky. However, if you don’t mind the challenge, you will see improvements in performance, especially if combined with the pragmas.

For such a complex computation, declaring into an optimized library will probably bring the greatest boost. I’ve seen computation speed increase by about 500% using the Accelerate framework on iOS for FFTs.
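As a small illustration of declaring into an optimized library, here is a macOS-only sketch that calls the CBLAS dot-product routine `cblas_ddot` from Apple’s Accelerate framework, with a plain Xojo fallback elsewhere. The method name `DotProduct`, the framework path, and the Int32 parameter mapping of the C signature `double cblas_ddot(int N, const double *X, int incX, const double *Y, int incY)` are assumptions to verify against your SDK:

```xojo
// Hypothetical wrapper. On macOS it assumes Accelerate's cblas_ddot is reachable
// at the framework path below; elsewhere it falls back to a Ptr loop.
// x and y each hold at least n packed Doubles (8 bytes apart).
Function DotProduct(x As MemoryBlock, y As MemoryBlock, n As Integer) As Double
  #If TargetMacOS Then
    Declare Function cblas_ddot Lib "/System/Library/Frameworks/Accelerate.framework/Accelerate" (count As Int32, xs As Ptr, incX As Int32, ys As Ptr, incY As Int32) As Double
    // Stride 1: consecutive Doubles with no gaps.
    Return cblas_ddot(n, x, 1, y, 1)
  #Else
    Dim total As Double = 0.0
    Dim px As Ptr = x
    Dim py As Ptr = y
    For i As Integer = 0 To n - 1
      total = total + px.Double(i * 8) * py.Double(i * 8)
    Next
    Return total
  #EndIf
End Function
```

The gain from a library like this grows with the amount of work handed over per call, so it pays off most when whole loops, not single multiplications, move into the library.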

There are even a few GPU-enhanced libraries (though that surely depends on the system you’re using, too):
https://en.wikipedia.org/wiki/List_of_software_for_molecular_mechanics_modeling