acer

32363 Reputation

29 Badges

19 years, 332 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are replies submitted by acer

@Carl Love Yes, that top-level code isn't the same as calling `Convolve` without evalhf. Rather, it is an elementwise (vectorized) use of kernel builtin `*` on float[8] rtables which it knows how to treat well. So, yes, it's already parallelized in another way, and is fast despite the fact that it repeatedly creates `subimage` as a collectible garbage rtable.

Attached is a sheet with a few ideas on speedup. This does not include what I'd expect to be best: Compiled and multithreaded with the Task mode.

The top level code can be optimized at its second stage, by rolling the two nested `add` calls into just one call.

Yes, I believe that using order=C_order on `subimage` and `kerneld` can make it a bit faster -- as long as loops or the nested `add` calls walk along rows at their innermost layer (so as to walk adjacent entries in memory).

More saving can be had by constucting part of the `subimage` indicies, for re-use, at the start of each nested i and j loop. (See `ipart`, `jpart`, and `mindex`.)

And then one can notice that `subimage` is not really needed in such a proc.

restart:

kernelopts(version);

          Maple 18.02, X86 64 WINDOWS, Oct 20 2014, Build ID 991181

k := 3:

filelocation := cat(kernelopts(datadir),"/ship.jpg"):

zimage := ImageTools:-Read(filelocation):

zwidth := ImageTools:-Width(zimage):

zHeight := ImageTools:-Height(zimage):
zimage := ImageTools:-Scale(zimage, 1..2*zHeight, 1..2*zwidth):

kernell := 2*k+1:

kerneld := Matrix(kernell, kernell, fill=1/kernell^2, datatype=float[8], order=C_order):

st:= time():

new1zumage:= ImageTools:-Convolution(zimage,kerneld):

time()-st;

                                    0.203

#ImageTools:-View(new1zumage);

aa:= zimage(1..,1..,1):

bb:= zimage(1..,1..,2):

cc:= zimage(1..,1..,3):

subimage := Matrix(kernell, kernell, datatype=float[8], order=C_order):

newaa := copy(aa):
newaa1 := copy(aa):
newaa2 := copy(aa):
newaa3 := copy(aa):

R := LinearAlgebra:-RowDimension(aa);

C := LinearAlgebra:-ColumnDimension(aa);

                                  R := 800
                                  C := 1200

st := time[real]():

for i from k+1 to R-k-1 do

     for j from k+1 to C-k-1 do

          subimage := aa(i-k..i+k, j-k..j+k) *~ kerneld;
          newaa[i,j] := add(s, s=subimage);

     end do

end do;

T1s := time[real]()-st;

                                T1s := 40.814

st := time[real]():

for i from k+1 to R-k-1 do

     for j from k+1 to C-k-1 do

          subimage := aa(i-k..i+k,j-k..j+k) *~ kerneld;

          newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

     end do

end do;

T1:= time[real]()-st;

                                T1 := 54.961

Convolve:= proc(aa, newaa, kerneld, subimage, k, R, C)

local i, j, m, n, `k+1` := k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1;     

     for i from `k+1` to R-k-1 do

          for j from `k+1` to `C-k-1` do

               for m to kernell do

                    for n to kernell do

                         subimage[m,n] := aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

                    end do

               end do;

               newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

          end do

     end do

end proc:

st := time[real]():

evalhf(Convolve(aa, newaa1, kerneld, subimage, k, R, C)):

T2 := time[real]()-st;

                                T2 := 28.373

Convolve2:= proc(aa, newaa, kerneld, subimage, k, R, C)

local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1,
      ipart, jpart, mindex;     

     for i from `k+1` to R-k-1 do
          ipart := i-k-1;

          for j from `k+1` to `C-k-1` do
               jpart := j-k-1;

               for m to kernell do
                    mindex := ipart+m;

                    for n to kernell do

                         subimage[m,n] := aa[mindex, jpart+n] * kerneld[m,n]

                    end do

               end do;

               newaa[i,j] := add(add(subimage[m,n], n=1..kernell), m=1..kernell)

          end do

     end do

end proc:

st := time[real]():

evalhf(Convolve2(aa, newaa2, kerneld, subimage, k, R, C)):

T3 := time[real]()-st;

                                T3 := 24.766

Convolve3:= proc(aa, newaa, kerneld, k, R, C)

local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1,
      ipart, jpart;     

      for i from `k+1` to R-k-1 do
          ipart := i-k-1;

          for j from `k+1` to `C-k-1` do
               jpart := j-k-1;

               newaa[i,j] := add(add(aa[ipart+m, jpart+n] * kerneld[m,n],
                                     n=1..kernell), m=1..kernell)

          end do

     end do

end proc:

st := time[real]():

evalhf(Convolve3(aa, newaa3, kerneld, k, R, C)):

T4 := time[real]()-st;

                                T4 := 19.051

LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa1));
LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa2));
LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa3));

                                     0.
                                                -14
                          6.30606677987088915 10   
                                                -14
                          6.30606677987088915 10   

 


Download convolve_evalhf.mw

@rollermonkey Your claims about my suggestions appear to be entirely false, and appear to depend on your continued mistyping of key syntax, or misspelling (you now appear to have done with(Plots) not with(plots) as I suggested), or perhaps in part due to a faulty installation. Carl's Answer covers it in detail, I see.

@Carl Love Did you intend,

  subimage[m,n]:= aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

instead of that,

  subimage[m,n]:= aa[i-k+m, j-k+m] * kerneld[m,n]

within that innermost loop of `Convolve`?

You have several basic syntax mistakes which are not version specific.

To assign a plot to name F you need to use the syntax,

F := ...

and not,

F = ...

The plots package is loaded by issuing the command,

with(plots)

and not by (what you have),

withPlots

If you want to call display without loading the package then it would be done like either,

plots[display]( ... )

or,

plots:-display( ... )

and not (as you have it) as,

plots[display][ ... ]

acer

@Carl Love If one of the goals is to avoid potentially quadratic memory use (ie. O(k^2)) then it might be a good idea to explicitly unassign the names which get the indexed assignments, before the loop begins.

That would add safeguard against the user re-running the code without doing a restart, and incurring such a cost.

It could also help the user avoid a runtime error which trying to assign to a "long list", if re-run with restart, in the case that k>100.

It might also be more clear to use different names: say tableA for the indexed assignments and listA for the final conversion.

@kegj You have ignored my query: where are the statement terminators between the statements in the for-do loop?

More specifically, why is there no statement terminator at the end of this statement?

Y[j]:=Y[j-1]+dY

 

@maple2015 Did you try print(NCP) after first doing your plotsetup call? Try it.

@nm Your screenshots each show a single execution group containing the first three commands.

@nm I was talking about `add` in particular, because you mentioned it specifically. I have a suspicion that, apart from a general note, the only things that usually get in such Compatibility sections are mention of new or changed options.

FWIW, there are details on changes to `cat` and `rand` on the ?updates,Maple2015,Language help page.

Of course, this too does not cover your Question in full. I don't know of any uniform way to find details on all changes.

 

@nm Make sure that the three first commands are in separate execution groups or document blocks.

There is a long-standing problem in that the Std GUI implementation of the interface command may not take effect if it follows a restart in the same execution group.

You'll know it has worked because the printout will be in plaintext mode, not typeset (or italic).

@Preben Alsholm My ~/.mapleinit initialization file contains code to read a .mapleinit file from currentdir(), if found. Sometimes I use this mechanism to customize libname, etc, according to directory.

Basically, I often have multiple projects on the go, and instead of having to change working directory and pass the -b option to the TTY interface I can just set up the details once in a local directory-specific initialization file.

...and I have such local initialization files print something to the terminal, to remind me whether or not they are active in the current session. I don't want that output pasted in my responses here.

Sure, I could just use the -s option to prevent the initialization file from running. But I also use the TTY interface history feature quite a bit, and I usually prefer not to quit the session since I can lose the connection between that project and that command history.

There are quite a few words that I habitually misspell. Fortunately, restart doesn't seem to be one of them.

@Markiyan Hirnyk I think that it's more automatic to not have to supply v>=3, which is why I wrote that Kitonum's answer is better. I realize that English may not be not your first language, but please try and pay closer attention.

@Markiyan Hirnyk Your suggestion is not better, since it is not more automatic. It still depends on the user utilizing special knowledge of the structure; in this case the position of the value.

Kitonum's answer is more automatic, and it is better.

@richas I have some customized procedures for compacting trig expressions, which can sometimes find shorter equivalents that what simplify(...,size) gives. But they can be very computationally expensive, with quickly diminishing returns on time and memory resources invested.

One not so expensive idea is that combine can be restricted to act on just some unfrozen subset of all the trig calls in expressions like yours. The point being that simplify(...,size) doesn't attempt such mathematical conversions as combining trig calls. And combine tries to merge as much as it can, without regard to size-simplification. By using frontend to call combine with specific subsets of the trig calls unfrozen, a shorter equivalent may be found. This can be a brute force attack, leaving unfrozen all members of the full power set of the trig terms.

Another, very expensive idea is to notice that subsets of a SUM DAG (a `+` of subterms) can sometimes be simplified selectively to obtain a shorter equivalent than what simplify(...,size) gives. For example suppose the original expression T is a sum with 4 addends T[i], i=1..4. It might turn out that T[1]+T[3] and T[2]+T[4] can be size-simplified, and then the resulting pair added and size-simplied, and that this is short. Testing all ways to accomplish this kind of towered addition gets very expensive as the number or original addends increases. Even with memoization this is expensive. I would not be surprised if there were much combinatorially better ways to build up candidate equivalents.

These two ideas can be combined, as a very expensive brute force attack.

But as I mentioned above, the results from simplify(...,size) produced just as good a common-subexpression optimized procedure for computing all of [X,Y,Z] together. Well, as good as for any short [A,B,C] that I've found so far. It you have to do 35 such examples then I suggest just emitting procedures of Matlab functions is in the fast and mostly automated way in my Answer.

It might also be worth noting that it could be that neither length not MmaTranslator:-Mma:-LeafCount might be the most appropriate metric for estimating the computational cost of evaluating the resulting expression at purely floating-point values such as might occur during a simluation. How should the cost of a float evaluation of a trig call be compared with that of an addition or a multiplication? Perhaps the trig calls should be given more weight, so that reducing the total number of trig calls becomes more significant.

The following A,B,C are shorter representations for X,Y,Z respectively than what simplify(...,size) produces.

But they don't appear to produce a better computation sequence using common subexpressions (as codegen[optimize] is used for, above).

A := (-sin(q6)*L11+d1*cos(q6)+L10+((L16*sin(q10)+d2*cos(q10)+L13)*cos(q8)
 +((L16*cos(q10)-d2*sin(q10)+L15)*sin(q9)+L14*cos(q9))*sin(q8)+L12)*cos(q6+q7)
 -((-L16*cos(q10)+d2*sin(q10)-L15)*cos(q9)+L14*sin(q9))*sin(q6+q7))*cos(q5)
 -(((-L16*cos(q10)+d2*sin(q10)-L15)*sin(q9)-L14*cos(q9))*cos(q8)
 +sin(q8)*(L16*sin(q10)+d2*cos(q10)+L13))*sin(q5):

B := (-L11*sin(q6)+d1*cos(q6)+L10+((L16*sin(q10)+d2*cos(q10)+L13)*cos(q8)
 +((L16*cos(q10)-sin(q10)*d2+L15)*sin(q9)+L14*cos(q9))*sin(q8)+L12)*cos(q6+q7)
 -(L14*sin(q9)-cos(q9)*(L16*cos(q10)-sin(q10)*d2+L15))*sin(q6+q7))*sin(q5)
 +(((-L16*cos(q10)+sin(q10)*d2-L15)*sin(q9)-L14*cos(q9))*cos(q8)
 +sin(q8)*(L16*sin(q10)+d2*cos(q10)+L13))*cos(q5)-L9:

C := ((L16*cos(q10)-sin(q10)*d2+L15)*cos(q9)-L14*sin(q9))*cos(q6+q7)-L8
 +(((sin(q10)*d2-L16*cos(q10)-L15)*sin(q9)-L14*cos(q9))*sin(q8)
 +(-L16*sin(q10)-d2*cos(q10)-L13)*cos(q8)-L12)*sin(q6+q7)-d1*sin(q6)-L11*cos(q6):

length(B), length(B), length(C);
                                646, 646, 406

 

First 338 339 340 341 342 343 344 Last Page 340 of 592