Feature #622
New function: RandomSubset
Description
Sometimes "all subsets" are far too many (compute how many! ;-) to fit into memory.
So checking a property on a suitable number of random subsets can be interesting.
Related issues
History
#1 Updated by Anna Maria Bigatti over 9 years ago
- % Done changed from 0 to 20
- Estimated time set to 1.50 h
done. still missing: documentation, tests.
Would it be useful in CoCoALib? maybe just a random subset of 1..n?
#2 Updated by John Abbott almost 9 years ago
I wanted a RandomSubset
function which generated a random subset of 1..n
of cardinality r
. I was surprised when CoCoA reported that I could not define the fn as the name was already a package exported variable.
I have a slight preference for my fn over the existing one which generated a random subset of a set represented as a list. The fns are closely related:
RandomSubset(L,r) == [L[i] | i in RandomSubsetJAA(len(L),r)]; RandomSubsetJAA(n,r) == RandomSubset(1..n, r)
Why do I prefer my fn? It somehow "feels simpler", and might be slightly easier to implement efficiently in C++; the current defn requires copying some/all elements of the input list L
which could be expensive if the elements are large.
For completeness here is my defn (taken from Wikipedia "Reservoir sorting"?)
define RandomSubset(n,r) subset := 1..r; for k := r+1 to n do i := random(1,k); if i > r then continue; endif; subset[i] := k; endfor; return sorted(subset); enddefine; -- RandomSubset
#3 Updated by John Abbott almost 9 years ago
We could simply have both; that way the user can choose which is more convenient (or efficient).
If we do choose to have both, should they have the same name or different names?
Maybe my fn should be called RandomIndexSubset
? (long and a rather ugly)
#4 Updated by Anna Maria Bigatti almost 9 years ago
- % Done changed from 20 to 30
Let's start with just yours, for many reasons:
- it is simpler, as you said
- indeed that's also what I needed ;-)
- it can be implemented with the same philosophy in cocoalib (without fears for copies)
- you will write the documentation and tests (won't you? ;-)
#5 Updated by John Abbott almost 9 years ago
- Status changed from New to In Progress
Following CoCoALib conventions the return type should be vector<long>
.
What should the values in the vector be? Should it be a subset of 0..(n-1)
or a subset of 1..n
? If we regard the values as indices then the former makes more sense in C++, but it might be more convenient/natural in CoCoA-5 to adopt the latter convention.
It would even be possible to implement both: in CoCoALib the range is 0..(n-1)
while the "same function" in CoCoA-5 uses the range 1..n
. Is this a good or a bad idea?
#6 Updated by John Abbott almost 9 years ago
- Priority changed from Normal to Urgent
#7 Updated by Anna Maria Bigatti almost 9 years ago
I suggest having also RandomTuple(n,r)
which is indeed pretty simple: in CoCoA
define RandomTuple(L, r) return [ L[random(1,len(L))] | i in 1..r ]; enddefine; -- RandomSubset
Now I think in CoCoALib we should have
RandomSubsetIndices(n,r)
and RandomTupleIndices(n,r)
(and also
RandomSubsetIndices(vec, r)
and RandomTupleIndices(vec,r)
?)#8 Updated by Anna Maria Bigatti almost 9 years ago
I added to NotBuiltin.cpkg the functions
Export RandomSubset; Export RandomSubsetIndices; Export RandomTuple; Export RandomTupleIndices;
now I write the documentation...
#9 Updated by Anna Maria Bigatti almost 9 years ago
documentation done (and cvs'ed)
When we implement the "Indices" in CoCoALib we'll remove them from "NotBuiltin".
(Remember: shift indices by 1)
#10 Updated by Anna Maria Bigatti almost 9 years ago
- Status changed from In Progress to Feedback
- % Done changed from 30 to 90
- Estimated time changed from 1.50 h to 3.00 h
#11 Updated by John Abbott almost 9 years ago
I have added defns to CoCoALib of RandomSubsetIndices
and RandomTupleIndices
. These names are very long :-/
Not entirely convinced by the usefulness of RandomTupleIndices
.
Also added defns to BuiltinFunctions.C
Not yet: doc, examples, tests.
#12 Updated by John Abbott almost 9 years ago
Where should the new functions go?
Currently they are in a new file combinatorics.C
because I could not see any other good place tro put them. Is this OK?
#13 Updated by Anna Maria Bigatti almost 9 years ago
John Abbott wrote:
Where should the new functions go?
Currently they are in a new file
combinatorics.C
because I could not see any other good place tro put them. Is this OK?
yes
#14 Updated by John Abbott almost 9 years ago
Checked in.
You'll have to remove interrupt.C
from src/AlgebraicCore/Makefile
until I can check those files in... hot choc break now! :-)
#15 Updated by Anna Maria Bigatti almost 9 years ago
John Abbott wrote:
I have added defns to CoCoALib of
RandomSubsetIndices
andRandomTupleIndices
. These names are very long :-/Not entirely convinced by the usefulness of
RandomTupleIndices
.
Nor am I, for cocoalib. But I know someone who should make good use of RandomTuple
in cocoa-5 ;-)
Also added defns to
BuiltinFunctions.C
cvs?
Not yet: doc, examples, tests.
I can do that.
#16 Updated by John Abbott almost 9 years ago
I have checked in RandomSubsetIndices
and RandomTupleIndices
(in CoCoALib); also the interface fns in BuiltinFunctions.C
.
I have not implemented RandomSubset
in C++; nor RandomTuple
.
#17 Updated by Anna Maria Bigatti almost 9 years ago
- Status changed from Feedback to Closed
- % Done changed from 90 to 100
- Estimated time changed from 3.00 h to 6.01 h