acer

32632 Reputation

29 Badges

20 years, 45 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are replies submitted by acer

@Markiyan Hirnyk The suggestion to use rhs(sol[1]) instead of eval(k[1],sol) is not widely reliable, because in older versions of Maple (or in recent Maple if launched with --setsort=0) rhs(sol[1]) may produce an incorrect answer.

The reason is that in those older Maple versions (or with --setsort=0) the elements of the set sol could be in a session-dependent order. The equation k[2]=... could be sol[1], so sol[1] is not a reliable means of extracting the value from the equation k[1]=...

E.g., in Maple 11.02 (or Maple 16.0 launched with --setsort=0) I see:

> restart:
> k[2]: k[2]=666.0: k[1]=5.0:
> sol:={k[1]=5.0,k[2]=666.0};
                       sol := {k[2] = 666.0, k[1] = 5.0}

> rhs(sol[1]);
                                     666.0

> restart:
> sol:={k[1]=5.0,k[2]=666.0};
                       sol := {k[1] = 5.0, k[2] = 666.0}

> rhs(sol[1]);
                                      5.0
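
In contrast, eval(k[1],sol) picks out the value paired with the name k[1], so it is immune to the set ordering. In either session above,

> eval(k[1],sol);
                                       5.0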

@dj_gssst You are using sol[1] and sol[2] inappropriately.

Look at what sol[1] and sol[2] are.

sol[1];

         k[1] = 0.001637935477698018732276851530308612938004131156280124580147547622858728899732152932156471141157968415

sol[2];

         k[2] = 0.8738880449255668655493761487129800277295707224229803653538821365150932040118518225052273427259289080

Those are both equations, not names or scalar values.

If you want to use the name k[1] then just use it, not sol[1]. If you want the value from that equation returned by fsolve then use eval(k[1],sol), not sol[1].
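
For example, continuing from the fsolve result above,

val1 := eval(k[1], sol);   # the scalar value paired with the name k[1], regardless of set order
val2 := eval(k[2], sol);
val1*val2;                 # plain scalars, fine for further arithmetic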

It's somewhat irritating to wait for this site to load all the images of 2D Math that you've inlined into your posts. It would be quicker and nicer (in future) if you could please just type your comments here, along with a link to your uploaded worksheets that are heavy with 2D Math.

@jimmyinhmb There's a lot to consider here. I'll have to leave parallelization of it until another day, hope that's ok.

Your elementwise approach is mostly getting bogged down by memory management, I think. Consider the statement,

        G1[Mid..Last] := G1[Mid..Last] + exp~( c +~ H[Mid..Last] ):

I count at least five overt Vector workspaces being created there: two to extract H[Mid..Last] and G1[Mid..Last] on the RHS, one more to do the elementwise addition of `c`, another to do the elementwise exponentiation, and then another to do the final addition (prior to recopying into G1).

The second method below cuts that down to just two overt Vector workspaces created per iteration, and gets some improvement. Further improvement comes from getting rid of all iterated temp workspace creation and acting purely inplace. (A re-usable workspace or two, created just once outside the loop, can really help performance.) It seems natural to conclude that the improvements are largely due to avoiding repeated Vector creation and collection as garbage, i.e. avoidable memory management. And, indeed, this aspect of the performance improvement seems to match the reduction in kernelopts(bytesused), which is one measure of the amount of memory management.

It might be worth noting that simply putting the whole job into a single procedure that acts inplace on G, and merely Compiling that, is almost the fastest. And yet it is one of the easiest to code.

The following output is from a run with Maple 16.01 on a Core2 Duo running Windows XP. I saw some variation with regard to the relative performance of methods 3-to-5 below when run on Windows 7 or Linux 64bit.

restart:

(N,maxiter):=10^5, 10^3:

(Mid, Last):=floor(N/2),N; c:=1.3e-7:

50000, 100000

G:=LinearAlgebra:-RandomVector[row](N,generator=0.0..0.1e-6,outputoptions=[datatype=float[8]]):
# Make a bunch of these, to test the various results, since the job acts inplace.
G1:=copy(G):G2:=copy(G):G3:=copy(G):G4:=copy(G):G5:=copy(G):
H:=LinearAlgebra:-RandomVector[row](N,generator=0.0..0.1e-6,outputoptions=[datatype=float[8]]):

# usage: DAXPY(num,c,x,offsetx,incx,y,offsety,incy)
DAXPY:=ExternalCalling:-DefineExternal('hw_f06ecf',
                                       ExternalCalling:-ExternalLibraryName("linalg")):

EXP:=proc(start::integer[4],finish::integer[4],R::Vector(datatype=float[8]))
   local i::integer[4]:
   for i from start to finish do R[i]:=exp(R[i]); end do:
   NULL;
end proc:
cEXP:=Compiler:-Compile(EXP):

ALL:=proc(start::integer[4],finish::integer[4],R1::Vector(datatype=float[8]),
                                               R2::Vector(datatype=float[8]),
                                               c::float[8])
   local i::integer[4]:
   for i from start to finish do R1[i]:=R1[i]+exp(c+R2[i]); end do:
   NULL;
end proc:
cALL:=Compiler:-Compile(ALL):

run:=[true,true,true,true,true,true]:

if run[1] then
st,str:=time(),time[real](): bu:=kernelopts(bytesused):
   for p from 1 to maxiter do
      # This creates at least five temporary Vectors, at each iteration.
      G1[Mid..Last] := G1[Mid..Last] + exp~( c +~ H[Mid..Last] ):
   end do:
print(time()-st,time[real]()-str); print(kernelopts(bytesused)-bu);
end if:

12.359, 12.406

2001770640

if run[2] then
st,str:=time(),time[real](): bu:=kernelopts(bytesused):
   for p from 1 to maxiter do
      # This creates two temporary Vectors, at each iteration.
      G2[Mid..Last] := LinearAlgebra:-VectorAdd( G2[Mid..Last],
                                                 map[evalhf,inplace](exp, H[Mid..Last] ),
                                                 1, evalhf(exp(c)), inplace ):
   end do:
print(time()-st,time[real]()-str); print(kernelopts(bytesused)-bu);
end if:

5.875, 5.906

806437012

if run[3] then
C3:=Vector[row](N,datatype=float[8]): # We'll re-use this, after Aliasing to right length.
st,str:=time(),time[real](): bu:=kernelopts(bytesused):
   for p from 1 to maxiter do
      # This creates zero temporary Vectors, at each iteration.
      C3midlast:=ArrayTools:-Alias(C3,0,[1..Last-Mid+1]):
      ArrayTools:-Copy(Last-Mid+1,H,Mid-1,1,C3midlast):
      map[evalhf,inplace](exp,C3midlast):
      DAXPY(Last-Mid+1,evalhf(exp(c)),C3,0,1,G3,Mid-1,1);
   end do:
print(time()-st,time[real]()-str); print(kernelopts(bytesused)-bu);
end if:

3.031, 3.031

365548

if run[4] then
C4:=Vector[row](N,datatype=float[8]): # We'll re-use this, and no need to Alias it.
st,str:=time(),time[real](): bu:=kernelopts(bytesused):
   for p from 1 to maxiter do
      # This creates zero temporary Vectors, at each iteration.
      ArrayTools:-Copy(Last-Mid+1,H,Mid-1,1,C4):
      cEXP(1,Last-Mid+1,C4):
      DAXPY(Last-Mid+1,evalhf(exp(c)),C4,0,1,G4,Mid-1,1);
   end do:
print(time()-st,time[real]()-str); print(kernelopts(bytesused)-bu);
end if:

3.484, 3.500

167212

if run[5] then
st,str:=time(),time[real](): bu:=kernelopts(bytesused):
   for p from 1 to maxiter do
      # This creates zero temporary Vectors, at each iteration.
      cALL(Mid,Last,G5,H,c);
   end do:
print(time()-st,time[real]()-str); print(kernelopts(bytesused)-bu);
end if:

3.485, 3.484

31212

# Discrepancies based on using exp(c+h)=exp(c)*exp(h) may or may not
# appear if you are doing something other than just repeatedly adding
# terms involving exp of H. Your computation may do something different,
# and you'd want to check for correctness at high iteration counts. My crude
# example is for timing, and understandably for constant c the accuracy
# is affected by maxiter, the number of iterations.

LinearAlgebra:-Norm(zip(`/`,(G1-G2),G1));
LinearAlgebra:-Norm(zip(`/`,(G1-G3),G1));
LinearAlgebra:-Norm(zip(`/`,(G1-G4),G1));
LinearAlgebra:-Norm(zip(`/`,(G1-G5),G1));

0.570707844023472596e-13

0.555928563343654250e-13

0.555928563343654250e-13

0.


Download maptry3.mw

@Alejandro Jakubi By "computing with identifiers" I meant using them only as names, not parsing them. Clearly, typeset names which look the same would also have to be the same (internally), or else mixing them arithmetically and otherwise would not be so useful.

Thanks, about the brackets.

By trying a few tricks I got some hints that Typesetting:-Typeset may not get called at all when one does right-click conversion to Atomic Identifier. That makes getting exactly the same structures problematic. It'd be easier to use the same basic mechanism than to fix up the results of something else.

@AliKhan Is this what you mean?

A:=LinearAlgebra:-RandomMatrix(3,outputoptions=[shape=diagonal]);

                                  [27   0    0]
                                  [           ]
                             A := [ 0  -4    0]
                                  [           ]
                                  [ 0   0  -74]

B:=Matrix(3,3):

for i from 1 to 3 do B[i,i]:=A; end do:

B;

                [[27   0    0]                              ]
                [[           ]                              ]
                [[ 0  -4    0]        0              0      ]
                [[           ]                              ]
                [[ 0   0  -74]                              ]
                [                                           ]
                [               [27   0    0]               ]
                [               [           ]               ]
                [      0        [ 0  -4    0]        0      ]
                [               [           ]               ]
                [               [ 0   0  -74]               ]
                [                                           ]
                [                              [27   0    0]]
                [                              [           ]]
                [      0              0        [ 0  -4    0]]
                [                              [           ]]
                [                              [ 0   0  -74]]

B[2,2];

                                [27   0    0]
                                [           ]
                                [ 0  -4    0]
                                [           ]
                                [ 0   0  -74]

Most of regular LinearAlgebra will not be able to handle B, although arithmetic and some simpler things might work. (You might be able to teach LinearAlgebra[Generic] to do more...)
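
For instance, here is a rough, untested sketch of how one might try that: the domain table MR, and the re-filled B with explicit zero-Matrix entries so that all entries are of a uniform kind, are my own illustration, not anything supplied by LinearAlgebra itself.

B := Matrix(3, 3, (i,j) -> `if`(i=j, A, Matrix(3,3))):  # uniform entries: zero Matrices off-diagonal
MR[`0`] := Matrix(3,3):                                  # additive identity for the entries
MR[`1`] := Matrix(3,3, shape=identity):                  # multiplicative identity
MR[`+`] := proc() local s, i;                            # n-ary entrywise addition
              s := args[1];
              for i from 2 to nargs do s := s + args[i]; end do;
              s;
           end proc:
MR[`-`] := a -> -a:
MR[`*`] := (a, b) -> a . b:
MR[`=`] := (a, b) -> LinearAlgebra:-Equal(a, b):
LinearAlgebra:-Generic:-MatrixMatrixMultiply[MR](B, B);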

Apparently I have trouble remembering my own advice.

A better double loop in the Compilable BlockAdd procedure above would walk entries by column (and not by row) in the innermost loop, because my example uses Fortran_order Matrices by default. I.e.,

   for j from sxj to exj do   
      for i from sxi to exi do
         x[i,j]:=x[i,j]+y[i+syi-sxi,j+syj-sxj];
      end do;
   end do;

With that change, and by reducing some of the indexing integer-arithmetic, the Compiled version gets about 30% faster (though this varies with platform). It is still not as fast as the block-copy & daxpy approach. But it might also be threaded, after splitting the action by column (which I have not tried). A sketch of the full revised procedure is below.
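
For reference, this follows the earlier BlockAdd; the name BlockAdd2 and the hoisted offset locals di and dj are my own.

BlockAdd2:=proc(x::Matrix(datatype=float[8]),
                sxi::integer[4],exi::integer[4],
                sxj::integer[4],exj::integer[4],
                y::Matrix(datatype=float[8]),
                syi::integer[4],syj::integer[4])
   local i::integer[4], j::integer[4], di::integer[4], dj::integer[4];
   di := syi - sxi;   # hoist the constant index offsets out of the loops
   dj := syj - sxj;
   for j from sxj to exj do      # columns outermost, to match Fortran_order storage
      for i from sxi to exi do   # walk down each column contiguously
         x[i,j] := x[i,j] + y[i+di,j+dj];
      end do;
   end do;
   NULL;
end proc:
cBlockAdd2:=Compiler:-Compile(BlockAdd2):
# called the same way as before, e.g. cBlockAdd2(M1,floor(m/2)+1,m,1,n,M2,floor(m/2)+1,1);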

@jimmyinhmb It looks as if the block-copying & daxpy approach is faster than the (serial) Compiled one.

First, though, I apologize for pointing you at some wrong lines of code above. In Maple 15, the call out to f06ecf/daxpy would be more like this line in the Library,

showstat((LinearAlgebra::LA_External)::MatrixAdd,52);
       ...
  52     HWCall[2](lengthA,evalf(c2),localB,0,1,localA,0,1)
       ...

Here's a comparison, done on 64bit Maple on Windows 7,

restart: #randomize():

(m,n):=10^5,10: maxiter:=10^2:

M1:=LinearAlgebra:-RandomMatrix(m,n,outputoptions=[datatype=float[8]]):
M2:=LinearAlgebra:-RandomMatrix(m,n,outputoptions=[datatype=float[8]]):

DAXPY:=proc(num,c,x,offsetx,incx,y,offsety,incy)
   # http://www.netlib.org/blas/daxpy.f
   # http://en.wikipedia.org/wiki/SAXPY
   local extlib, extdaxpy;
   extlib:=ExternalCalling:-ExternalLibraryName("linalg");
   extdaxpy:=ExternalCalling:-DefineExternal('hw_f06ecf',extlib);
   extdaxpy(num,evalf(c),x,offsetx,incx,y,offsety,incy);
   NULL;
end proc:

(tm,tn):=floor(m/2),n:
T1:=Matrix(tm,tn,datatype=float[8]):
T2:=Matrix(tm,tn,datatype=float[8]):

st,str:=time(),time[real]():
for iter from 1 to maxiter do
  ArrayTools:-BlockCopy(M1,m-tm,m,tm,tn,T1,m-tm);
  ArrayTools:-BlockCopy(M2,m-tm,m,tm,tn,T2,m-tm);
  DAXPY(tm*tn,1.0,T2,0,1,T1,0,1);
  ArrayTools:-BlockCopy(T1,0,1,1,tm*tn,M1,m-tm,m,tm,tn);
end do:
time()-st,time[real]()-str;

                          0.873, 0.874

restart: #randomize():

(m,n):=10^5,10: maxiter:=10^2:

M1:=LinearAlgebra:-RandomMatrix(m,n,outputoptions=[datatype=float[8]]):
M2:=LinearAlgebra:-RandomMatrix(m,n,outputoptions=[datatype=float[8]]):

BlockAdd:=proc(x::Matrix(datatype=float[8]),
               sxi::integer[4],exi::integer[4],
               sxj::integer[4],exj::integer[4],
               y::Matrix(datatype=float[8]),
               syi::integer[4],syj::integer[4])
   local i::integer[4], j::integer[4];
   for i from 0 to exi-sxi do
      for j from 0 to exj-sxj do
         x[i+sxi,j+sxj]:=x[i+sxi,j+sxj]+y[i+syi,j+syj];
      end do;
   end do;
   NULL;
end proc:

try
  cBlockAdd:=Compiler:-Compile(BlockAdd):
  compiled:=true:
catch:
end try:

if compiled then

st,str:=time(),time[real]():
for iter from 1 to maxiter do
   cBlockAdd(M1,floor(m/2)+1,m,1,n,M2,floor(m/2)+1,1);
end do:
print(time()-st,time[real]()-str);

end if:

                          2.746, 2.737

Interestingly, the MKL did not appear to use more than a single core when running daxpy on this 4-physical-core (8-hyperthreaded) i7. That seems borne out by the ratio of total-cpu time to real time, as well as by Usage not exceeding 12% in Windows Task Manager. But MKL does use more cores for other tasks, like Matrix-Matrix multiplication. Maybe Intel has not yet threaded all the level-1 BLAS. I don't know.
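
For contrast, here is a quick (and crude) way to gauge multi-core use for a level-3 operation on the same machine; the size 2000 is just an illustrative choice.

M := LinearAlgebra:-RandomMatrix(2000, outputoptions=[datatype=float[8]]):
st, str := time(), time[real]():
M . M:                             # Matrix-Matrix multiply (dgemm); typically threaded in MKL
time() - st, time[real]() - str;   # cpu time well above real time suggests several cores in use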

Hopefully, the block-copy approach will be fast enough.

acer

@Markiyan Hirnyk The matter of whether to extrapolate the surface or to introduce new cutting planes, so as to obtain a closed region, is relatively trivial compared to the task of constructing a procedure that can determine the interpolated surface.

The second question asked was, "how to define whether the arbitrary point (x,y,z) is inside this surface or not?" Another interpretation is that he wishes to determine whether a selected point lies approximately on the surface (as opposed to lying within some region bounded in part by the surface).

Only minor changes to the worksheet I uploaded are necessary, to address one of these interpretations versus the other. The key part is the procedure which interpolates the surface, between the data points.

He can test whether func(n,mx)=my holds approximately, to decide whether the point (mx,my,n) is on the surface. Or he can bound a region (by extrapolation or by introducing planes, or whatever) and test using an inequality. That's still up to him, and is relatively easy compared to procedurally generating the approximating surface.
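
A minimal sketch of such a membership test, where func is the interpolating procedure built in my uploaded worksheet and the tolerance tol is merely an illustrative choice:

OnSurface := proc(mx, my, n, tol := 1e-3)
   # true when the interpolated surface height func(n,mx) agrees with my to within tol
   evalb( abs( func(n, mx) - my ) <= tol );
end proc: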

For example, one could use extrapolation=undefined instead of extrapolation=true in the call to ArrayInterpolation within the definition of `S`, after which a plot of the points and surface (at grid=[100,100]) can look like this:

The procedure `func` may be more useful than the mere plot returned by `surfdata`, for answering his question 2).

I personally found it interesting because of the method of mapping to a regular grid, which I had not done before but can imagine using again. So I learned something.

acer
