[Beneath the code is a snippet of console output, as follows:]
test no.99989: passed
test no.99990: passed
test no.99991: failed
test no.99992: passed
test no.99993: passed
test no.99994: passed
test no.99995: passed
test no.99996: passed
test no.99997: passed
test no.99998: passed
test no.99999: passed
95.121% tests passed
I am a human who transcribes posts to improve accessibility on Lemmy. Transcriptions help people who use screen readers or other assistive technology to use the site. For more information, see here.
We'll catch those in integration and E2E tests... let's just stick with even numbers such that 2 < n < 1002. That's 1000 cases, more than enough, and 100% test coverage.
You are joking, but this is exactly what happens if you optimize the accuracy of a classification algorithm when positive cases are very rare. The algorithm will simply label everything as negative, and accuracy will still be extremely high!
This is also why medical studies never use accuracy as a measure if the disorder being studied is in any way rare. Sensitivity and specificity, or positive/negative likelihood ratios, are more common.
This is actually a perfect example of why the difference between accuracy, precision, and recall matters. This algorithm has 0 recall and, because it never predicts a positive, a precision that is strictly undefined (0/0) and usually reported as 0. Its only advantage is 100% inverse recall, also called specificity: every actual negative is correctly classified as negative.
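To make the distinction concrete, here is a minimal Python sketch that scores an always-false primality test on the numbers 1 to 99,999. The helper names (`is_prime_reference`, `always_false`, `score`) are made up for illustration, and reporting precision as 0.0 when nothing is predicted positive is a convention chosen here, not something stated in the thread.

```python
def is_prime_reference(n: int) -> bool:
    """Ground truth via trial division; slow but fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def always_false(n: int) -> bool:
    """The 'classifier' from the post: never predicts prime."""
    return False


def score(predict, numbers):
    """Build a confusion matrix and derive the usual metrics from it."""
    tp = fp = tn = fn = 0
    for n in numbers:
        actual, predicted = is_prime_reference(n), predict(n)
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # 0/0 cases are reported as 0.0 by convention (an assumption of this sketch).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


metrics = score(always_false, range(1, 100_000))
print(metrics)
```

Accuracy comes out high while recall is zero, which is the whole joke: the score looks fine as long as you only look at accuracy.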
Prime numbers become less frequent as the numbers get larger, so if you want to implement a function that tests whether a number is prime, just always returning false will get more and more accurate as you count up.
The console output is just saying whether it was correct to say the number isn't prime, and the percent is the accuracy over the previous numbers
There are 9,592 prime numbers less than 100,000. Assuming the test suite only tests the numbers 1 to 99,999, the accuracy should actually be only 90.408%, not 95.121%.
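The count and the resulting accuracy are easy to check. A quick sketch with a Sieve of Eratosthenes, assuming (as the comment does) that the suite covers exactly the numbers 1 to 99,999; `count_primes_below` is a name invented for this example:

```python
def count_primes_below(limit: int) -> int:
    """Count primes p with p < limit using a Sieve of Eratosthenes."""
    if limit < 3:
        return 0
    sieve = bytearray([1]) * limit      # sieve[k] == 1 means "k may be prime"
    sieve[0] = sieve[1] = 0
    i = 2
    while i * i < limit:
        if sieve[i]:
            # Cross out every multiple of i, starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit, i)))
        i += 1
    return sum(sieve)


primes = count_primes_below(100_000)
tested = 99_999                          # numbers 1..99999
accuracy = 100 * (tested - primes) / tested
print(primes, f"{accuracy:.3f}%")        # 9592 90.408%
```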
The density of primes can be approximated using the Prime Number Theorem: 1/ln(x).
Solving 99.9995 = 100 - 100 / ln(x) for x gives ln(x) = 200,000, so x = e^200000, or about 7.88 × 10^86858. In other words, the universe will end before any current computer could check that many numbers.
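For anyone who wants to redo that calculation, here is a small sketch. It takes the 1/ln(x) approximation at face value, and `x_for_accuracy` is a hypothetical helper name. Since e^200000 is far too large for a float, it works with the base-10 logarithm and reports a mantissa and an exponent instead.

```python
import math
from fractions import Fraction


def x_for_accuracy(target_percent: str) -> tuple[float, int]:
    """Solve target = 100 - 100/ln(x) for x; return (mantissa, exponent) of x in base 10."""
    target = Fraction(target_percent)        # exact, avoids float cancellation in 100 - target
    ln_x = Fraction(100) / (100 - target)    # rearranged: ln(x) = 100 / (100 - target)
    log10_x = float(ln_x) / math.log(10)     # change of base: log10(x) = ln(x) / ln(10)
    exponent = math.floor(log10_x)
    mantissa = 10 ** (log10_x - exponent)
    return mantissa, exponent


mantissa, exponent = x_for_accuracy("99.9995")
print(f"x ~ {mantissa:.2f}e{exponent}")      # x ~ 7.88e86858
```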
I found a list of prime numbers, which states that the 50,000,000th prime number is 982,451,653, which means 5.08930896% of the numbers up to 982,451,653 are prime. That's unfortunate, as it means the accuracy is still lower than in the original post even when we go that much further: down from 95.121% to 94.911%. Bummer!
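Redoing the arithmetic from those two quoted figures (nothing else is assumed) gives an accuracy of about 94.911% for an always-false test over 1 to 982,451,653:

```python
# Figures quoted in the comment above.
nth = 50_000_000         # index of the prime in the list
prime = 982_451_653      # the 50,000,000th prime, per that list

prime_share = 100 * nth / prime    # percent of 1..prime that are prime
accuracy = 100 - prime_share       # percent an always-false test gets right

print(f"{prime_share:.8f}% prime, {accuracy:.3f}% accuracy")
```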
Out of curiosity, I then whipped up a quick program in Rust that starts from those numbers and crunches forward through the primes while prime_count as f32 / total_count as f32 > 0.05, using 16 CPU cores to divide-and-conquer the primality checks. There's probably a better way to do that, but meh. Such a check will essentially only get me back above 95% though, and based on the rate of change, I suspect getting to 99.5% would take exponentially longer than that.
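The shape of that loop can be sketched in a few lines of Python. This is not the commenter's Rust program: it is single-threaded, starts from 2 rather than from the 50,000,000th prime, and compares exact fractions instead of f32 values. The function name `crunch_until` is invented for the example.

```python
from fractions import Fraction


def crunch_until(threshold: Fraction) -> tuple[int, int]:
    """Return (n, prime_count) for the first n >= 2 where prime_count / n <= threshold."""
    primes: list[int] = []
    n = 1
    while True:
        n += 1
        # n is prime if no stored prime p with p*p <= n divides it.
        is_prime = True
        for p in primes:
            if p * p > n:
                break
            if n % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(n)
        # Exact rational comparison, so there is no float rounding near the threshold.
        if Fraction(len(primes), n) <= threshold:
            return n, len(primes)


# A deliberately easy threshold so this finishes instantly.
print(crunch_until(Fraction(1, 4)))
```

Lower thresholds push the stopping point out very quickly, which matches the commenter's suspicion about how long 99.5% would take.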
In the time it's taken me to write this, it's calculated just over 330,000 more primes, reaching ~0.050874 hit rate for primes.
This has led me down a small rabbit hole, in which it turns out there are plenty of folks who have approached the topic of "what percentage of numbers are prime?" - and the answer is essentially "it will eventually round to 0%". Because of you, I remain curious to know when that crosses the threshold of 99.5% though - and I'll at least leave it running for the next day or two to see how close it gets.
Unfortunately though, at the rate my PC can calculate, I don't think I'm personally gonna be hitting an answer to this any time soon. If I ever do manage to figure it out, I'll be sure to update... because hell, why not.
I've also considered trying to find bigger lists of primes, but meh. I've already spent an hour on this that I intended to spend playing D&D so ... meh. =]
We got nerd sniped at almost the exact same time, but approached this in very different ways. I applaud your practical approach, but based on what I calculated, you should stop now. It will never reach 99.999%