A Tale of Four Fuzzers | Svelte Hacker News

gavinhoward 31 minutes ago

The title of the blog post downplays the absolute masterclass that this post is. It should be called "A Tale of Four Fuzzers: Best Practices for Advanced Fuzzing."

And if you don't have time, just skip to the bullet-point list at the end; it has all of the best practices, and they are fantastic.

atn34 3 hours ago

> If you wrote a function that takes a PRNG and generates a random object, you already have a function capable of enumerating all objects.

Something often forgotten here: if your PRNG only takes, e.g., a 32-bit seed, you can generate at most 2^32 unique objects, which you might chew through in seconds of fuzzing.

Edit: this is addressed later in the article (or in one of its references), where they talk about using an exhaustive implementation of a PRNG interface. Neat!
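
To make atn34's point concrete, here is a minimal Python sketch (the generator and all names are hypothetical, not from the article). If each seed fully determines the PRNG stream, a generator can produce at most as many distinct objects as there are seeds, however large the object space is. A 12-bit seed stands in for the 32-bit case so the sketch runs instantly.

```python
import random

def gen_object(rng: random.Random) -> tuple:
    # Hypothetical "random object": eight 64-bit integers, i.e. 2^512 possible values.
    return tuple(rng.getrandbits(64) for _ in range(8))

SEED_BITS = 12  # stand-in for a 32-bit seed: 2^12 = 4096 seeds

# Each seed determines the whole PRNG stream, and therefore the generated object.
objects = {gen_object(random.Random(seed)) for seed in range(2 ** SEED_BITS)}

# No matter how big the object space is, we never see more than 2^SEED_BITS objects.
assert len(objects) <= 2 ** SEED_BITS
print(len(objects))
```

The same bound applies to a 32-bit seed: 2^32 objects at most, which a fast fuzzer can exhaust quickly.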

pfdietz 6 hours ago

> If you wrote a function that takes a PRNG and generates a random object, you already have a function capable of enumerating all objects.

More specifically: if you uniformly sample from a space of size N, then in O(N log N) tries you can expect to sample every point in the space. There's a logarithmic cost to this random sampling, but that's not too bad.
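The "N log N" here is the classic coupon-collector result: the expected number of uniform draws needed to see all N points is N·H_N, where H_N is the N-th harmonic number (about ln N). A small Python simulation (names are illustrative) compares the exact expectation, N ln N, and an empirical mean:

```python
import math
import random

def expected_draws(n: int) -> float:
    # Coupon collector: E[T] = n * H_n, where H_n = 1 + 1/2 + ... + 1/n ~ ln n.
    return n * sum(1 / k for k in range(1, n + 1))

def draws_until_all_seen(n: int, rng: random.Random) -> int:
    # Sample uniformly from {0, ..., n-1} until every point has been seen at least once.
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n = 500
rng = random.Random(1)
trials = [draws_until_all_seen(n, rng) for _ in range(200)]
print(f"n*H_n = {expected_draws(n):.0f}, n*ln(n) = {n * math.log(n):.0f}, "
      f"simulated mean = {sum(trials) / len(trials):.0f}")
```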

IngoBlechschmid 5 hours ago

Just a tiny addition: yes, N log N is the average time, but the distribution is heavily long-tailed and the variance is quite high, so in many instances it can take quite a while until every item has been visited (as opposed to merely most items).
The keyword to look up more details is "coupon collector's problem".
- pfdietz 4 hours ago
  
  You can also cover every one of the points "with high probability" in O(N log N) time (meaning: the chance you missed any point is at most 1/p(N) for a polynomial p, with the constant in the big-O depending on p.)
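Both the tail behaviour and the "with high probability" claim follow from a standard union-bound argument, written out here as an illustration. Let T be the number of uniform draws needed to see all N points:

```latex
\begin{aligned}
\Pr[\text{a fixed point is unseen after } t \text{ draws}] &= \left(1 - \tfrac{1}{N}\right)^{t} \le e^{-t/N} \\
\Pr[T > t] &\le N\, e^{-t/N} && \text{(union bound over all } N \text{ points)} \\
\Pr[T > N \ln N + cN] &\le N\, e^{-\ln N - c} = e^{-c} \\
\Pr[T > (k+1)\, N \ln N] &\le N \cdot N^{-(k+1)} = N^{-k}
\end{aligned}
```

So each extra N draws beyond N ln N cuts the bound on the chance of a miss by a factor of e, and running for (k+1)·N ln N draws pushes the failure probability below N^{-k}, which is the 1/p(N) form described above.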

matklad 6 hours ago

It is much better than this. You can _directly_ enumerate all the objects, without any probabilities involved. There's nothing about probabilities in the interface of a PRNG; it's just non-determinism!
You could _implement_ non-determinism via probabilistic sampling, but you could also implement the same interface as exhaustive search.
- pfdietz 6 hours ago
  
  Well, yes. But the point is that random sampling lets you do it without thinking. Even better, it can sample over multiple spaces at the same time, and over spaces we haven't even yet formalized. "Civilization advances by extending the number of important operations which we can perform without thinking of them." (Whitehead)
  An example is something like "pairwise testing" of a function's arguments. Just randomly generating values will hit every possible pair of argument values, again with a logarithmic penalty.
  - AlotOfReading 3 hours ago
    
    The point is that you can exhaustively explore the space without logarithmic overhead. There's no benefit to doing it with random sampling, and it doesn't even save thought.
    
    pfdietz 3 hours ago
    
    I already explained what the benefit is. What is it with this focus on offloading work from computers to people? Let people do things more easily, without thinking, even if it burns more (increasingly cheap) cycles.
    
    AlotOfReading 3 hours ago
    
    You haven't explained what the benefit is. There aren't "spaces we haven't formalized" because of the pigeonhole principle. There are M bits. You can generate every one of those 2^M values with any max-cycle permutation.
    What work is being offloaded from computers to people? It's exactly the same thing with more determinism and no logarithmic overhead.
    
    pfdietz 3 hours ago
    
    > There aren't any "spaces we haven't formalized"
    Suppose the space of N points is partitioned into M relevant subsets, which for now we assume are the same size. Then random sampling hits each of those subsets in O(M log M) time, even if we don't know what they are.
    This sort of partitioning has long been discussed in the testing literature, with the idea that you should do it manually.
    > what work is being offloaded
    The need to write that program for explicitly enumerating the space.
    
    matklad an hour ago
    
    Just to avoid potential confusion, the claim is that this is a function that generates a random permutation:
    pub fn shuffle(g: *Gen, T: type, slice: []T) void {
        if (slice.len <= 1) return;
        for (0..slice.len - 1) |i| {
            const j = g.range_inclusive(u64, i, slice.len - 1);
            std.mem.swap(T, &slice[i], &slice[j]);
        }
    }
    And this is a function that enumerates all permutations, in order, exactly once:
    pub fn shuffle(g: *Gen, T: type, slice: []T) void {
        if (slice.len <= 1) return;
        for (0..slice.len - 1) |i| {
            const j = g.range_inclusive(u64, i, slice.len - 1);
            std.mem.swap(T, &slice[i], &slice[j]);
        }
    }
    Yes, they are exactly the same function. What matters is Gen. If it looks like this
    https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...
    then you get a random permutation. If it rather looks like this
    https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...
    you enumerate all permutations.
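matklad's claim can be checked with a rough Python port. This is a sketch, not TigerBeetle's code: the linked files are elided above, so `RandomGen`, `ExhaustiveGen`, `done()`, and the odometer strategy are hypothetical names and one assumed way to implement the exhaustive interface. `shuffle` is a straight translation of the Zig function; only the generator passed in changes.

```python
import random

# Hypothetical generator names for illustration; not TigerBeetle's API.

class RandomGen:
    """Non-determinism backed by a seeded PRNG: each run is one random sample."""

    def __init__(self, seed: int):
        self._rng = random.Random(seed)

    def range_inclusive(self, lo: int, hi: int) -> int:
        return self._rng.randint(lo, hi)


class ExhaustiveGen:
    """Non-determinism backed by exhaustive search over every sequence of choices.

    Each range_inclusive call is one "digit" of an odometer. Call done() before
    every run: it advances to the next untried sequence of choices and returns
    True once all of them have been visited.
    """

    def __init__(self):
        self._choices = []  # [value, max_value] per choice point, in call order
        self._pos = 0
        self._started = False

    def done(self) -> bool:
        self._pos = 0
        if not self._started:
            self._started = True
            return False
        # Bump the last choice that still has room; drop exhausted choices after it.
        while self._choices:
            if self._choices[-1][0] < self._choices[-1][1]:
                self._choices[-1][0] += 1
                return False
            self._choices.pop()
        return True

    def range_inclusive(self, lo: int, hi: int) -> int:
        if self._pos == len(self._choices):
            self._choices.append([0, hi - lo])  # first visit: start at the lowest value
        value = self._choices[self._pos][0]
        self._pos += 1
        return lo + value


def shuffle(g, xs: list) -> None:
    # Same Fisher-Yates loop as the Zig version; only the generator differs.
    if len(xs) <= 1:
        return
    for i in range(len(xs) - 1):
        j = g.range_inclusive(i, len(xs) - 1)
        xs[i], xs[j] = xs[j], xs[i]


g = ExhaustiveGen()
perms = []
while not g.done():
    xs = [1, 2, 3]
    shuffle(g, xs)
    perms.append(tuple(xs))
print(perms)  # all 6 permutations of [1, 2, 3], each exactly once

ys = [1, 2, 3, 4]
shuffle(RandomGen(seed=42), ys)
print(ys)  # one random permutation of [1, 2, 3, 4]
```

The odometer relies on the code under test being deterministic given its choices. When a choice is bumped, every later choice is discarded and re-created on the next run, so ranges that depend on earlier choices are still handled correctly.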
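For AlotOfReading's alternative, one concrete "max cycle permutation" of M-bit values is a full-period linear congruential step x → (a·x + c) mod 2^M. By the Hull–Dobell theorem it visits all 2^M values exactly once per period when c is odd and a ≡ 1 (mod 4). This is an illustrative choice, not necessarily what the commenter had in mind, and the parameters a=5, c=1 are arbitrary.

```python
def full_period_lcg(m_bits: int, seed: int = 0, a: int = 5, c: int = 1):
    """Yield each of the 2^m_bits values exactly once (illustrative parameters)."""
    # Hull-Dobell for modulus 2^m: c odd and a % 4 == 1 guarantee a full period.
    assert c % 2 == 1 and a % 4 == 1
    mask = (1 << m_bits) - 1
    x = seed & mask
    for _ in range(1 << m_bits):
        yield x
        x = (a * x + c) & mask

values = list(full_period_lcg(10))
print(len(values), len(set(values)))
```

The statistical quality of an LCG is poor, but for exhaustive coverage only the full period matters: every value appears once, with no logarithmic overhead.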

efilife 4 hours ago

Is the CSS completely fucked or am I the only one?

philipwhiuk 3 hours ago

seems fine