acer

yes...

Answered: acer 32363

March 20 2015

0 0

@Carl Love Yes, that top-level code isn't the same as calling `Convolve` without evalhf. Rather, it is an elementwise (vectorized) use of kernel builtin `*` on float[8] rtables which it knows how to treat well. So, yes, it's already parallelized in another way, and is fast despite the fact that it repeatedly creates `subimage` as a collectible garbage rtable.

Attached is a sheet with a few ideas on speedup. This does not include what I'd expect to be best: Compiled and multithreaded with the Task mode.

The top level code can be optimized at its second stage, by rolling the two nested `add` calls into just one call.

Yes, I believe that using order=C_order on `subimage` and `kerneld` can make it a bit faster -- as long as loops or the nested `add` calls walk along rows at their innermost layer (so as to walk adjacent entries in memory).

More saving can be had by constucting part of the `subimage` indicies, for re-use, at the start of each nested i and j loop. (See `ipart`, `jpart`, and `mindex`.)

And then one can notice that `subimage` is not really needed in such a proc.

>

restart:

>	kernelopts(version);

Maple 18.02, X86 64 WINDOWS, Oct 20 2014, Build ID 991181

>

k := 3:

>	filelocation := cat(kernelopts(datadir),"/ship.jpg"):

>	zimage := ImageTools:-Read(filelocation):

>	zwidth := ImageTools:-Width(zimage):

>	zHeight := ImageTools:-Height(zimage): zimage := ImageTools:-Scale(zimage, 1..2zHeight, 1..2zwidth):

>	kernell := 2*k+1:

>	kerneld := Matrix(kernell, kernell, fill=1/kernell^2, datatype=float[8], order=C_order):

>	st:= time():

>	new1zumage:= ImageTools:-Convolution(zimage,kerneld):

>	time()-st;

0.203

>	#ImageTools:-View(new1zumage);

>	aa:= zimage(1..,1..,1):

>	bb:= zimage(1..,1..,2):

>	cc:= zimage(1..,1..,3):

>	subimage := Matrix(kernell, kernell, datatype=float[8], order=C_order):

>	newaa := copy(aa): newaa1 := copy(aa): newaa2 := copy(aa): newaa3 := copy(aa):

>	R := LinearAlgebra:-RowDimension(aa);

>	C := LinearAlgebra:-ColumnDimension(aa);

R := 800
C := 1200

>	st := time[real]():

>	for i from k+1 to R-k-1 do

>	for j from k+1 to C-k-1 do

>	subimage := aa(i-k..i+k, j-k..j+k) *~ kerneld; newaa[i,j] := add(s, s=subimage);

>

end do

>

end do;

>	T1s := time[real]()-st;

T1s := 40.814

>	st := time[real]():

>	for i from k+1 to R-k-1 do

>	for j from k+1 to C-k-1 do

>	subimage := aa(i-k..i+k,j-k..j+k) *~ kerneld;

>	newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

>

end do

>

end do;

>	T1:= time[real]()-st;

T1 := 54.961

>	Convolve:= proc(aa, newaa, kerneld, subimage, k, R, C)

>	local i, j, m, n, `k+1` := k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1;

>	for i from `k+1` to R-k-1 do

>	for j from `k+1` to `C-k-1` do

>	for m to kernell do

>	for n to kernell do

>	subimage[m,n] := aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

>

end do

>

end do;

>	newaa[i,j] := add(add(subimage[m,n], m=1..kernell), n=1..kernell)

>

end do

>

end do

>

end proc:

>	st := time[real]():

>	evalhf(Convolve(aa, newaa1, kerneld, subimage, k, R, C)):

>	T2 := time[real]()-st;

T2 := 28.373

>	Convolve2:= proc(aa, newaa, kerneld, subimage, k, R, C)

>	local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1, ipart, jpart, mindex;

>	for i from `k+1` to R-k-1 do ipart := i-k-1;

>	for j from `k+1` to `C-k-1` do jpart := j-k-1;

>	for m to kernell do mindex := ipart+m;

>	for n to kernell do

>	subimage[m,n] := aa[mindex, jpart+n] * kerneld[m,n]

>

end do

>

end do;

>	newaa[i,j] := add(add(subimage[m,n], n=1..kernell), m=1..kernell)

>

end do

>

end do

>

end proc:

>	st := time[real]():

>	evalhf(Convolve2(aa, newaa2, kerneld, subimage, k, R, C)):

>	T3 := time[real]()-st;

T3 := 24.766

>	Convolve3:= proc(aa, newaa, kerneld, k, R, C)

>	local i, j, m, n, `k+1`:= k+1, `C-k-1` := C-`k+1`, kernell := 2*k+1, ipart, jpart;

>	for i from `k+1` to R-k-1 do ipart := i-k-1;

>	for j from `k+1` to `C-k-1` do jpart := j-k-1;

>	newaa[i,j] := add(add(aa[ipart+m, jpart+n] * kerneld[m,n], n=1..kernell), m=1..kernell)

>

end do

>

end do

>

end proc:

>	st := time[real]():

>	evalhf(Convolve3(aa, newaa3, kerneld, k, R, C)):

>	T4 := time[real]()-st;

T4 := 19.051

>	LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa1)); LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa2)); LinearAlgebra:-Norm(Matrix(newaa)-Matrix(newaa3));

                                     0.
                                                -14
                          6.30606677987088915 10
                                                -14
                          6.30606677987088915 10

>

Download convolve_evalhf.mw

false...

Answered: acer 32363

March 20 2015

0 0

@rollermonkey Your claims about my suggestions appear to be entirely false, and appear to depend on your continued mistyping of key syntax, or misspelling (you now appear to have done with(Plots) not with(plots) as I suggested), or perhaps in part due to a faulty installation. Carl's Answer covers it in detail, I see.

subimage[m,n]...

Answered: acer 32363

March 20 2015

0 0

@Carl Love Did you intend,

  subimage[m,n]:= aa[i-k+m-1, j-k+n-1] * kerneld[m,n]

instead of that,

  subimage[m,n]:= aa[i-k+m, j-k+m] * kerneld[m,n]

within that innermost loop of `Convolve`?

various mistakes...

Answered: acer 32363

March 20 2015

0 0

You have several basic syntax mistakes which are not version specific.

To assign a plot to name F you need to use the syntax,

F := ...

and not,

F = ...

The plots package is loaded by issuing the command,

with(plots)

and not by (what you have),

withPlots

If you want to call display without loading the package then it would be done like either,

plots[display]( ... )

or,

plots:-display( ... )

and not (as you have it) as,

plots[display][ ... ]

acer

re-run...

Answered: acer 32363

March 15 2015

0 0

@Carl Love If one of the goals is to avoid potentially quadratic memory use (ie. O(k^2)) then it might be a good idea to explicitly unassign the names which get the indexed assignments, before the loop begins.

That would add safeguard against the user re-running the code without doing a restart, and incurring such a cost.

It could also help the user avoid a runtime error which trying to assign to a "long list", if re-run with restart, in the case that k>100.

It might also be more clear to use different names: say tableA for the indexed assignments and listA for the final conversion.

why ignore?...

Answered: acer 32363

March 15 2015

0 0

@kegj You have ignored my query: where are the statement terminators between the statements in the for-do loop?

More specifically, why is there no statement terminator at the end of this statement?

Y[j]:=Y[j-1]+dY

try it...

Answered: acer 32363

March 14 2015

0 0

@maple2015 Did you try print(NCP) after first doing your plotsetup call? Try it.

execution group...

Answered: acer 32363

March 12 2015

0 0

@nm Your screenshots each show a single execution group containing the first three commands.

add...

Answered: acer 32363

March 12 2015

0 0

@nm I was talking about `add` in particular, because you mentioned it specifically. I have a suspicion that, apart from a general note, the only things that usually get in such Compatibility sections are mention of new or changed options.

FWIW, there are details on changes to `cat` and `rand` on the ?updates,Maple2015,Language help page.

Of course, this too does not cover your Question in full. I don't know of any uniform way to find details on all changes.

separate execution groups...

Answered: acer 32363

March 11 2015

0 0

@nm Make sure that the three first commands are in separate execution groups or document blocks.

There is a long-standing problem in that the Std GUI implementation of the interface command may not take effect if it follows a restart in the same execution group.

You'll know it has worked because the printout will be in plaintext mode, not typeset (or italic).

.mapleinit...

Answered: acer 32363

March 11 2015

0 0

@Preben Alsholm My ~/.mapleinit initialization file contains code to read a .mapleinit file from currentdir(), if found. Sometimes I use this mechanism to customize libname, etc, according to directory.

Basically, I often have multiple projects on the go, and instead of having to change working directory and pass the -b option to the TTY interface I can just set up the details once in a local directory-specific initialization file.

...and I have such local initialization files print something to the terminal, to remind me whether or not they are active in the current session. I don't want that output pasted in my responses here.

Sure, I could just use the -s option to prevent the initialization file from running. But I also use the TTY interface history feature quite a bit, and I usually prefer not to quit the session since I can lose the connection between that project and that command history.

There are quite a few words that I habitually misspell. Fortunately, restart doesn't seem to be one of them.

read...

Answered: acer 32363

March 11 2015

0 0

@Markiyan Hirnyk I think that it's more automatic to not have to supply v>=3, which is why I wrote that Kitonum's answer is better. I realize that English may not be not your first language, but please try and pay closer attention.

not really...

Answered: acer 32363

March 11 2015

0 0

@Markiyan Hirnyk Your suggestion is not better, since it is not more automatic. It still depends on the user utilizing special knowledge of the structure; in this case the position of the value.

Kitonum's answer is more automatic, and it is better.

custom code...

Answered: acer 32363

March 11 2015

0 0

@richas I have some customized procedures for compacting trig expressions, which can sometimes find shorter equivalents that what simplify(...,size) gives. But they can be very computationally expensive, with quickly diminishing returns on time and memory resources invested.

One not so expensive idea is that combine can be restricted to act on just some unfrozen subset of all the trig calls in expressions like yours. The point being that simplify(...,size) doesn't attempt such mathematical conversions as combining trig calls. And combine tries to merge as much as it can, without regard to size-simplification. By using frontend to call combine with specific subsets of the trig calls unfrozen, a shorter equivalent may be found. This can be a brute force attack, leaving unfrozen all members of the full power set of the trig terms.

Another, very expensive idea is to notice that subsets of a SUM DAG (a `+` of subterms) can sometimes be simplified selectively to obtain a shorter equivalent than what simplify(...,size) gives. For example suppose the original expression T is a sum with 4 addends T[i], i=1..4. It might turn out that T[1]+T[3] and T[2]+T[4] can be size-simplified, and then the resulting pair added and size-simplied, and that this is short. Testing all ways to accomplish this kind of towered addition gets very expensive as the number or original addends increases. Even with memoization this is expensive. I would not be surprised if there were much combinatorially better ways to build up candidate equivalents.

These two ideas can be combined, as a very expensive brute force attack.

But as I mentioned above, the results from simplify(...,size) produced just as good a common-subexpression optimized procedure for computing all of [X,Y,Z] together. Well, as good as for any short [A,B,C] that I've found so far. It you have to do 35 such examples then I suggest just emitting procedures of Matlab functions is in the fast and mostly automated way in my Answer.

It might also be worth noting that it could be that neither length not MmaTranslator:-Mma:-LeafCount might be the most appropriate metric for estimating the computational cost of evaluating the resulting expression at purely floating-point values such as might occur during a simluation. How should the cost of a float evaluation of a trig call be compared with that of an addition or a multiplication? Perhaps the trig calls should be given more weight, so that reducing the total number of trig calls becomes more significant.

even shorter X,Y,Z...

Answered: acer 32363

March 10 2015

0 0

The following A,B,C are shorter representations for X,Y,Z respectively than what simplify(...,size) produces.

But they don't appear to produce a better computation sequence using common subexpressions (as codegen[optimize] is used for, above).

A := (-sin(q6)*L11+d1*cos(q6)+L10+((L16*sin(q10)+d2*cos(q10)+L13)*cos(q8)
 +((L16*cos(q10)-d2*sin(q10)+L15)*sin(q9)+L14*cos(q9))*sin(q8)+L12)*cos(q6+q7)
 -((-L16*cos(q10)+d2*sin(q10)-L15)*cos(q9)+L14*sin(q9))*sin(q6+q7))*cos(q5)
 -(((-L16*cos(q10)+d2*sin(q10)-L15)*sin(q9)-L14*cos(q9))*cos(q8)
 +sin(q8)*(L16*sin(q10)+d2*cos(q10)+L13))*sin(q5):

B := (-L11*sin(q6)+d1*cos(q6)+L10+((L16*sin(q10)+d2*cos(q10)+L13)*cos(q8)
 +((L16*cos(q10)-sin(q10)*d2+L15)*sin(q9)+L14*cos(q9))*sin(q8)+L12)*cos(q6+q7)
 -(L14*sin(q9)-cos(q9)*(L16*cos(q10)-sin(q10)*d2+L15))*sin(q6+q7))*sin(q5)
 +(((-L16*cos(q10)+sin(q10)*d2-L15)*sin(q9)-L14*cos(q9))*cos(q8)
 +sin(q8)*(L16*sin(q10)+d2*cos(q10)+L13))*cos(q5)-L9:

C := ((L16*cos(q10)-sin(q10)*d2+L15)*cos(q9)-L14*sin(q9))*cos(q6+q7)-L8
 +(((sin(q10)*d2-L16*cos(q10)-L15)*sin(q9)-L14*cos(q9))*sin(q8)
 +(-L16*sin(q10)-d2*cos(q10)-L13)*cos(q8)-L12)*sin(q6+q7)-d1*sin(q6)-L11*cos(q6):

length(B), length(B), length(C);
                                646, 646, 406

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

Ask a Question

Create a Post

32363 Reputation

29 Badges

Social Networks and Content at Maplesoft.com

MaplePrimes Activity

These are replies submitted by acer

yes...

false...

subimage[m,n]...

various mistakes...

re-run...

why ignore?...

try it...

execution group...

add...

separate execution groups...

.mapleinit...

read...

not really...

custom code...

even shorter X,Y,Z...

Save this setting as your default sorting preference?

Ask a Question

Create a Post

Generating PDF…

Save this setting as your default sorting preference?
Note: You can change your preference any time in your account settings
Don't show this again