Looking for some math help

I’m having trouble computing some probabilities. Here’s the basics:

The class ItemSet can have any number of entries that will be chosen at random. The algorithm will choose between a user-defined min and max. What I’m trying to do is compute the odds of an entry being selected. So for example, if there are 4 entries in a set and the min and max are both 2, all entries would have a 50% chance of being chosen.

This part I’ve got down actually. Here’s the monkeywrench I haven’t been able to deal with. Each entry has a weight value. These values are relative to each other, so 2 entries with the same weight will have the same chances. If entry A has a weight of 0.8 and entry B has a weight of 0.4, entry A should be twice as likely to be chosen.

Here’s the code I have so far, slightly modified for clarity.

[code] Dim EntryCount As UInteger = UBound(Self.mEntries) + 1
If EntryCount = 0 Then
Return 0
End If

Dim MinNumItems As UInteger = Max(Min(EntryCount, Self.MinNumItems, Self.MaxNumItems), 1)
Dim MaxNumItems As UInteger = Max(Min(Self.MaxNumItems, EntryCount), MinNumItems)
Dim TotalCombinations, TotalMatches As Double

For I As Integer = MaxNumItems DownTo MinNumItems
Dim Combinations As Double = Beacon.Combinations(EntryCount, I)
Dim Matches As Double = (I / EntryCount) * Combinations
TotalCombinations = TotalCombinations + Combinations
TotalMatches = TotalMatches + Matches
Next

If TotalCombinations = 1 Then
// Feels dirty, but this is true
Return 1
ElseIf TotalCombinations = 0 Then
Return 0
End If

Dim Chance As Double = Max(Min(TotalMatches / TotalCombinations, 1), 0.01)
Return Chance[/code]

This code can be found in the RelativeWeight method here: https://github.com/thommcgrath/Beacon/blob/master/Project/Modules/Beacon/ItemSet.xojo_code which is part of my Beacon project: https://github.com/thommcgrath/Beacon, so you could even download the project and run it yourself if you so desire.

Just to be clear, I’m not looking for help actually performing the selection. I’m looking for help computing the odds that an entry will be chosen.

Wouldn’t the probability of an item being chosen be the weight of that item divided by the sum of all weights?

Only if selecting exactly 1 entry. This is the code before yesterday:

Return Self.mEntries(Index).Weight / Self.TotalWeight()

My goal has been to make the calculation respect the min/max values, which has proven to be dramatically more complicated.

I’m really having a hard time wrapping my head around what you mean by min and max. I can see that they are the range of your For-Next loop, but beyond that, I don’t understand their significance. Could you give another example.

Let’a say I have a list of 10 entries, min 2, max 4. When it comes time to select entries, it can pick at least min and and most max entries. So it could pick 2 of 10, 3 of 10, or 4 of 10. That’s why the loop sums the number of combinations for each possibility.

Okay, I think I understand that part. So, when min=max=n then there will be exactly n items chosen, and when min<>max then any number of items in the range min…max can be chosen.

Unfortunately the devil is in the details. I had a look through your project, but it was far too complex to get an understanding of exactly what you’re trying to do.

I will point out one thing that is often overlooked in these things: Make sure you don’t count the same item twice. For example, if an item is chosen on iteration 1, then you can’t choose it again on iteration 2 or 3, etc. To account for this, on each iteration you need to multiply the probability of an item being chosen by the probability that it hasn’t been chosen on any of the previous iterations. This is fairly straightforward if all items have equal weights, but can become quite complicated if they have different weights, or other conditions apply.

It’s also useful to remember that the total of the probabilities of all possible events is one. So if you keep a running total of every possible event probability, and the value is one, there’s a very good change that you got it right.

Now is it correct that picking 2 or picking 3 or picking 4 is random? i.e. if the min is 2 and the max is 4 that picking 2 is equally likely as picking 3 which is equally likely as picking 4.

If that is the case, the multiple choices (2,3,4) are handled fairly easily. Chance of 2 occurring is ?; chance of 3 occurring is ? etc

( 1 / (max-min+1) )

So you calculate the likelihood of the entry being chosen if 2 are selected (Chance2) and divide it by three
So you calculate the likelihood of the entry being chosen if 3 are selected (Chance3) and divide it by three

So total chance is (Chance2 + Chance3 +Chance4) / 3

So that part is easy. This gets rid of all the min/max stuff. That is just cluttering the problem.

The real question:

The complicated part is calculating the chance of the entry being chosen if 2 things are chosen or if 3 things are chosen etc.

Calculating Chance2; Chance3; etc. is hard.

Now we have really reduced the problem to this. You have a bunch of “weighted” items and you are going to select N items. What is the likelihood that an entry will be chosen. As has been pointed out, if N equals 1 it is easy. (weight of entry / sum of the weights of the entries).

If the entries (say there are M of them) all have the same weight I think it is easy: 1 - [ (M-N) / (M) ]

But if N is not equal to 1 and the entries do not have equal weightings, it is complex(for me anyway)

Take a simple case: three entries A,B,C. Weights A 9; B 2; C 1 You are going to select 2 of the entries. What is the likelihood that entry C will be chosen?

Well,

it could be chosen first & chance of that is: (1 / 12)
it could be chosen second after A chance of that is: (9 / 12) * (1 / 3)
it could be chosen second after B chance of that is: (2 / 12 ) * (1 / 10)

So for this simple case: chance that C will be selected is (1 / 12 ) + (9 / 12) * (1 / 3) + (2 / 12 ) * (1 / 10)

But to my eyes the complexity of this simple case implies that more complex cases will be really complex as the numbers M and N increase.

The end of your post sums it up very well: it does become very complex. I’m pretty sure I need to adjust the total count inversely to the weight of the given item. Adding more items decreases the chance for selection, so the highest weight item should be unaltered, but the lowest weight item should compute as if there are more entries than there really is.

But for the record, your analysis of the chance in the first section of your post is not correct. Here’s an example. Given M = 5, Min =, Max = 5, the formula would be 1 / ((5 - 5) + 1) or 1/1 or 100%. Which is correct. But it doesn’t account for M. If we change M to 10, the result stays the same, but is no longer correct, as selecting 5 from a pool of 10 should be a 50% chance per item.

What I’ve concluded is that I need to calculate the sum of the number of possible combinations for each possible number of items chosen.

The problem is doing so requires computing a number of factorials. We’re talking very large numbers. Computing the combinations of any grouping from a pool of 38 (for example) requires larger numbers than can fit into a UInt64. So I’ve needed to use doubles.

So I’m left in a difficult position. Given two items with weight 5 and 10, I’d expect the first to be selected half as frequently as the second. So for the first, I’d compute combinations using twice the actual pool size. In a pool of 38, this means computing for 76. I fear blowing out even the capabilities of a double and being stuck. And this is just an example, there could be even more than 38.

For the record, I really feel like I’m making this more complicated than it needs to be.

we all do that from time to time.

The tough part is I know I should just leave this alone. It isn’t worth the time I’ve spent on this. But my brain can’t stop trying to solve it.

I do the same thing. I will spend hours… days… weeks on something that I should have walked away from after minutes… it is our nature.

This is exactly what happened even in simple scenarios.

Maybe if I write this out, something will come to mind, because I’m struggling to grasp a certain scenario.

Ignore the weight problem for a minute. Let’s say we have a pool of items A, B, C, and D. We are going to select between 1 and 4 of them without duplicates. The goal is to compute the odds that A will be chosen.

If 1 item is chosen, there are 4 possible combinations, A exists in 1 of those, so has a 25% chance of selection.
If 2 items are chosen, there are 6 possible combinations, A exists in 3 of those, so has a 50% chance of selection.
If 3 items are chosen, there are 4 possible combinations, A exists in 3 of those, so has a 75% chance of selection.
If 4 items are chosen, there is only 1 possible combination, so A has a 100% chance of selection.

There’s a very obvious pattern here. Finding each chance is as simple as iteration / 4, right? I don’t actually need to compute combinations to come up with this number. Then I could just multiply each chance by 1 / 4, add them all together, for 62.5%.

However… this isn’t how it actually works out. Because there are 15 total possible combinations, and 8 include A:

{A} {B} {C} {D} {A,B} {A,C} {A,D} {B,C} {B,D} {C,D} {A,B,C} {A,B,D} {A,C,D} {B,C,D} {A,B,C,D}

Which is 53.33% not 62.5% as previously computed. This is why I think I’m making it more complicated than it needs to be. Honestly, I feel like an idiot right now, like I must be making some fundamental mistake.

I think that you misinterpreted what I said. (That is to say I was not clear enough).
I was using M to refer to the number of entries. N to the number of entries being selected (So N<=M)

The equation in the first part of my post was ( 1 / (max-min+1) ). That was NOT an equation to calculate the likelihood of a given entry being chosen. It was the likelihood that N would be a particularly value in the range from min to max (inclusive)

[quote]If that is the case, the multiple choices (2,3,4) are handled fairly easily. Chance of 2 occurring is ?; chance of 3 occurring is ? etc

( 1 / (max-min+1) )[/quote]

The 1 / 3 comes from ( 1 / (max-min+1) ) : ( 1 / (4-2+1) ) --> 1 / 3

When you talk of min:max being 2:4 I interpret that as there being an equal likelihood that the actual N being 2 is one out of three ( 1 / (max-min+1) ). The change that N is 3 is also one out of three.

So if you are able to calculate the likelihood that your entry will be chosen under the circumstance that N is 2: we will call that Chance2. If we can calculate the likelihood that your entry will be chosen under the circumstance that N is 3: we will call that Chance3. If we can calculate the likelihood that your entry will be chosen under the circumstance that N is 4: we will call that Chance4.

Chance of 2 occurring is NOT the same thing as what I am calling Chance2. I think this is where I was unclear. Chance of 2 occurring (which is to say the chance that N will be 2) is easy to calculate. Once N has been chosen as 2, then the chance that the entry will be chosen under this circumstance is what I am calling Chance2. That can be hard to calculate.

The chance that the entry will be chosen is: (Chance2 + Chance3 +Chance4) / 3

The hard part is figuring out what Chance2 & Chance3 & Chance4 are. The special case where the weighting is the same is doable. But figuring out what Chance 2 or Chance 3 or Chance4 is when the weightings are different is, to be polite, complicated.

In the example you cite in the first part of your quote at the top of this post: M is 5 and N is 5.
1 - [ (5-5) / (5) ] --> 1 which is equal to 100% as you say.

In your example in the second part of your comment at the top of this post: M is 10 and N is 5.
1 - [ (10-5) / (10) ] --> 1 - .5 --> .5 which is equal to 50% as you say.

Are we on the same page?

Yeah.

[quote]Ignore the weight problem for a minute. Let’s say we have a pool of items A, B, C, and D. We are going to select between 1 and 4 of them without duplicates. The goal is to compute the odds that A will be chosen.

If 1 item is chosen, there are 4 possible combinations, A exists in 1 of those, so has a 25% chance of selection.
If 2 items are chosen, there are 6 possible combinations, A exists in 3 of those, so has a 50% chance of selection.
If 3 items are chosen, there are 4 possible combinations, A exists in 3 of those, so has a 75% chance of selection.
If 4 items are chosen, there is only 1 possible combination, so A has a 100% chance of selection.[/quote]

Chance that you will choosing 1 item is 1 / 4. Chance that you will choosing 2 items is 1 / 4 etc

¼ comes from ( 1 / (max-min+1) ) with min being 1 and max being 4.

Now look at each of those individual scenarios

the formula is now 1 - [ (M-N) / (M) ]

#1: 1 - [ (4-1) / (4) ] --> 1 - 3 / 4 --> 1 / 4 or 25 %
#2: 1 - [ (4-2) / (4) ] --> 1 - 2 / 4 --> 2 / 4 or 50 %
#3: 1 - [ (4-3) / (4) ] --> 1 - 1 / 4 --> 3 / 4 or 75 %
#4: 1 - [ (4-4) / (4) ] --> 1 - 0 / 4 --> 1 or 100 %

Each of the scenarios has an equal chance of existing ( 1 / 4)

(Chance2 + Chance3 +Chance4) / 3

Show the final chance is: [ (1 / 4) + (2 / 4) + (3 / 4) +( 1) ] / 4 --> (10 / 4) / 4 --> 10 / 16 = .625

I understand that, but that’s the problem. The math says the chance should be 0.625. But the actual results say 53.3 repeating. Can you explain the discrepancy?

[quote]However… this isn’t how it actually works out. Because there are 15 total possible combinations, and 8 include A:

{A} {B} {C} {D} {A,B} {A,C} {A,D} {B,C} {B,D} {C,D} {A,B,C} {A,B,D} {A,C,D} {B,C,D} {A,B,C,D}[/quote]

There may be 15 total possible combinations BUT they are not equally likely.

The chance of {A,B,C,D} is 1 / 4: that is what will happen when N is 4 which happens 1 / 4 of the time.

But the chance of {A} is 1 / 16: First N has to be 1 which happens 1 / 4 of the time and then A has to be actually selected which occurs 1 / 4 of the time for a result of (1 / 4) * (1 / 4) --> 1 / 16

[quote=299048:@Robert Livingston]There may be 15 total possible combinations BUT they are not equally likely.

The chance of {A,B,C,D} is 1 / 4: that is what will happen when N is 4 which happens 1 / 4 of the time.

But the chance of {A} is 1 / 16: First N has to be 1 which happens 1 / 4 of the time and then A has to be actually selected which occurs 1 / 4 of the time for a result of (1 / 4) * (1 / 4) --> 1 / 16[/quote]
Aha, got it. I knew I was overthinking it. I kept looking at the numbers telling myself “I KNOW it should work out this way” but couldn’t quite explain the difference.

With that in mind, I shouldn’t need to compute the combinations, which means I think I can handle the weight problem.