Project

General

Profile

Slug #1228

SLUG: filling an array

Added by John Abbott over 5 years ago. Updated about 2 years ago.

Status:
In Progress
Priority:
High
Assignee:
Category:
bug
Target version:
Start date:
30 Sep 2018
Due date:
% Done:

30%

Estimated time:
Spent time:

Description

It seems that assigning to an array element is surprisingly slow.

Look at the timings in the following session:

>>> t0 := CpuTime();
>>> c := 0; for J := 1 to N/2 do if gcd(J,N) = 1 then incr(ref c); endif endfor;
>>> TimeFrom(t0);
8.734
>>> t0 := CpuTime();
>>> d := 0; for J := 1 to N/2 do if gcd(J,N) = 1 then d := d+mod(J^2,N); endif endfor;
>>> TimeFrom(t0);
7.442
>>> sieve := NewList(N);
>>> t0 := CpuTime();
>>> for J := 1 to N/(17*38) do if gcd(J,N) = 1 then sieve[mod(J^2,N)]:=1; endif endfor;
>>> TimeFrom(t0);
42.705

There are 3 loops. Why is the first loop SLOWER than the second one??
Why is the third loop FAR SLOWER than either the first or second ones?
(note that the third loop does only about 0.3% of the iterations of the other loops)


Related issues

Related to CoCoA-5 - Slug #1229: append is slowNew2018-10-05

Related to CoCoA-5 - Slug #96: sort is too slowNew2012-03-01

Related to CoCoA-5 - Slug #862: append has bad complexityNew2016-04-05

Related to CoCoA-5 - Slug #31: theValue makes copyIn Progress2011-11-15

History

#1 Updated by John Abbott over 5 years ago

  • Priority changed from Normal to High

I forgot to mention that N := 9699690 (product of all primes up to 20).

The third loop is APPALLINGLY SLOW, TRULY EMBARRASSINGLY SLOW.

I have a nasty fear that sieve[...] := 1; actually makes a copy of the entire list sieve (which contains nearly 10 million elements).

I do hope that this is easy to fix.

#2 Updated by John Abbott over 5 years ago

  • Description updated (diff)

#3 Updated by John Abbott over 5 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10

I have just tried the third loop again, but after making the list sieve twice as long, and the loop takes almost twice as long. So the assignment sieve[...] := 1 has cost proportional to the length of the list sieve.

Ouch!

#4 Updated by John Abbott over 5 years ago

  • % Done changed from 10 to 20

I have used profiling to confirm that the list is indeed copied (but why???)

The test code is:

LongList := NewList(1000000);

t0 := CpuTime();
for j := 1 to 100 do
  LongList[j] := j;
endfor;
println "Loop time: ", TimeFrom(t0);

Profiler reports:

                0.23    1.05     100/100         CoCoA::InterpreterNS::LeftValue::obtainOwnership() [6]
[7]     83.7    0.23    1.05     100         CoCoA::InterpreterNS::LIST::needsToBeCopiedBeforeChanges() const [7]
                0.23    0.07 100000100/100000100     bool __gnu_cxx::operator!=<boost::intrusive_ptr<CoCoA::InterpreterNS::RightValue> const*, .... > const&) [8]

In Interpreter.C around line 2355 the source says:

bool MAT::needsToBeCopiedBeforeChanges() const {
    return this->getRefCount()>1;
}

So the question is: why is the ref count greater than 1???

#5 Updated by John Abbott over 5 years ago

In Interpreter.C around line 2065 there is:

bool IndexedAccess::assignmentNeedsOwnership() const {
    return true;
}

In Interpreter.C around line 4210 there is:

void AssignmentStatement::implExecute(RuntimeEnvironment *runtimeEnv) {
    const intrusive_ptr<LeftValue> leftValue = this->leftExp->evalAs<LeftValue>(runtimeEnv);
    runtimeEnv->interpreter->checkForInterrupts(this->leftExp);
    intrusive_ptr<RightValue> rightValue = this->rightExp->evalAs<RightValue>(runtimeEnv);
    if (leftValue->assignmentNeedsOwnership())
        leftValue->obtainOwnership();
    leftValue->assign(rightValue, this->rightExp->getBegin(), this->rightExp->getEnd(), runtimeEnv);
}

#6 Updated by John Abbott over 5 years ago

  • Assignee set to John Abbott
  • % Done changed from 20 to 30

I have changed IndexedAccess::assignmentNeedsOwnership so that it gives false.
The loop is much faster (0.001s against 11s), and the CoCoA5 tests all pass.

So, is it safe to make that change???

The slow loop in the main description now takes 0.3s (with debugging and profiling on) against 42s (optimized).

I'll try asking Giovanni...

#7 Updated by John Abbott over 5 years ago

Now I fear there may be some "strange" situations in which nasty things may occur:

define HEAD(ref L) L := L[1]; return L; enddefine;
L := [1,2];
L[2] := HEAD(ref L); --> asking for trouble!

With my modified CoCoA-5 this produces:

--> ERROR: Expecting type LIST, MAT or MATRIXROW, but found type INT
--> L[2] := HEAD(L); --> asking for ...
--> ^

Not so helpful, but I had feared a crash or memory corruption.

Maybe the semantics of an assignment of the form L[i] := blah should be:
  1. evaluate blah
  2. check that L is indexable and that i is a valid index; if so, then assign.

Not sure what the actual semantics are...

#8 Updated by John Abbott over 5 years ago

A potential disadvantage of my proposed semantics is that if the RHS is expensive to evaluate, and the LHS is "obviously wrong" then the user must wait for the evaluation of RHS to finish before discovering the error.

What should happen with this input?

>>> J := 1;
>>> define MakeRecord(ref A) A := record[]; return 1; enddefine;
>>> J.xyz := MakeRecord(J);

Currently it complains:

--> ERROR: Expecting type RECORD, but found type INT
--> J.xyz := MakeRecord(J);
--> ^

I am inclined to think that a user who writes an assignment LHS := RHS; where evaluating RHS may change the structure of LHS should accept whatever CoCoA-5 chooses to do... but CoCoA-5 should not SEGV, or corrupt memory or do anything else "nasty": it should either do something "legal" or return a proper CoCoA error.

#9 Updated by Anna Maria Bigatti over 5 years ago

John Abbott wrote:

I am inclined to think that a user who writes an assignment LHS := RHS; where evaluating RHS may change the structure of LHS should accept whatever CoCoA-5 chooses to do...

Michael Abshoff used to say "you can't prevent the user to shoot himself in the foot".

PS with my old CoCoA I get the same errors you wrote.

#10 Updated by John Abbott over 5 years ago

Yes, we cannot (and probably should not) stop the user from "shooting himself in the foot".
But CoCoA-5 should never crash (SEGV, or other naughty memory access).

#11 Updated by John Abbott over 5 years ago

Here is an alternative to my comment about the semantics of assignment (in comment 7):
  1. check that LHS is writable (valid index, etc)
  2. evaluate RHS
  3. check again that LHS is writable, and then write the value.

I would hope that checking writability is cheap...

#12 Updated by Anna Maria Bigatti over 5 years ago

John Abbott wrote:

I would hope that checking writability is cheap...

We will make some checks.
I want to design some speed tests mechanism.
(Cannot really be based on the current test examples because they are for testing correctedness, and effectively expansible at any time)

#13 Updated by John Abbott over 5 years ago

My proposal in comment 11 is bad because, for instance, the computation of an index may be lengthy or may have side-effects. The user would be surprised to find the index computed twice.

Now I propose:
  1. check LHS, and cache the index values (if any)
  2. evaluate RHS
  3. check existence/writability of LHS using the cached indexes

It seems that the interpreter may already do something similar to this, because my tests gave errors (but the error messages were not very helpful...)

NOTE a naughty user could write something like L[expr1] := expr2 where expr1 has a side-effect changing L, and expr2 has a side-effect changing the way expr1 is evaluated. Hopefully Giovanni had already thought of this!

#14 Updated by John Abbott over 5 years ago

It does not really matter how we evaluate LHS := RHS.
Either side could be expensive to evaluate. Publicly we should say that the order of evaluation is not rigidly specified -- this lets us change it in the future. Anyway, it is rather risky writing code which relies on order of evaluation inside a single expression.

Anyway, my suggestion in comment 13 is reasonable (and seems to be what the interpreter actually does, based on a few tests).

#15 Updated by John Abbott over 5 years ago

#16 Updated by John Abbott almost 5 years ago

  • Related to Slug #96: sort is too slow added

#17 Updated by John Abbott almost 5 years ago

  • Related to Slug #862: append has bad complexity added

#18 Updated by John Abbott almost 5 years ago

I think I have found a situation where my simple idea in comment 6 causes trouble...

/**/ L := [2,3,1];
/**/ sorted(L);
[1,2,3];
/**/ L;
[1,2,3]; // OUCH!!??!!

Boohoo!!

Note: JAA has no idea how to fix this :-( But it is a terrible SLUG.

#19 Updated by John Abbott over 4 years ago

  • Target version changed from CoCoA-5.3.0 to CoCoA-5.4.0

#20 Updated by John Abbott almost 3 years ago

  • Related to Slug #31: theValue makes copy added

#21 Updated by John Abbott about 2 years ago

  • Target version changed from CoCoA-5.4.0 to CoCoA-5.4.2

Also available in: Atom PDF