Show HN: Rust Vector and Quaternion Lib

github.com

126 points by the__alchemist 3 days ago

I made this library for vector and quaternion math, and use it in many personal projects. I've open-sourced it in case anyone else can get use out of it.

I use this on various projects, including quadcopter firmware, a graphics engine, a cosmology simulation, and several molecular dynamics applications. No_std compatible.

gibibit 2 days ago

> For compatibility with no_std targets, e.g. embedded, use the no_std feature.

Note that this is not the recommended way to use Rust feature flags. Features are additive, so the correct way to make a `no_std`-compatible crate is to have a `std` feature flag that conditionally enables use of the `std` library.

Referring to Effective Rust:

> Note that there's a trap for the unwary here: don't have a no_std feature that disables functionality requiring std (or a no_alloc feature similarly). As explained in Item 26, features need to be additive, and there's no way to combine two users of the crate where one configures no_std and one doesn't—the former will trigger the removal of code that the latter relies on. - https://www.lurklurk.org/effective-rust/no-std.html
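
Concretely, the recommended layout is roughly this (a minimal sketch, not lin_alg's actual manifest):

    # Cargo.toml: "std" is on by default; no_std users opt out with
    # `default-features = false` instead of opting in to a "no_std" feature.
    [features]
    default = ["std"]
    std = []

and in the crate root:

    // src/lib.rs: the crate is no_std unless the "std" feature is enabled
    #![cfg_attr(not(feature = "std"), no_std)]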

  • leoedin 2 days ago

    I'm a bit confused by this. In practice you can't combine no_std and std within a single project anyway, because then the entire app relies on the std library and is no longer no_std. Surely if a user sets no_std as a feature, they expect it to apply to the entire application?

    • sapiogram 2 days ago

      > In practice you can't combine no_std and std within a single project anyway

      You can with Rust features. It allows a library to conditionally compile certain parts of the code, and users of the library can decide which features to enable.

      If a library only uses the stdlib in code gated by `#[cfg(feature = "std")]`, users can disable that feature and use the library in no_std contexts.
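
      A minimal sketch of what that gating looks like (the `pretty` helper is hypothetical, not from any particular crate):

          // src/lib.rs
          #![cfg_attr(not(feature = "std"), no_std)]

          pub struct Vec3 {
              pub x: f32,
              pub y: f32,
              pub z: f32,
          }

          // Only compiled when the "std" feature is enabled; the rest of
          // the crate stays usable in no_std builds.
          #[cfg(feature = "std")]
          pub fn pretty(v: &Vec3) -> String {
              format!("({}, {}, {})", v.x, v.y, v.z)
          }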

      • endofreach 2 days ago

        Interesting. I have no Rust experience yet, so I wonder: while that sounds very cool, what does it look like in the wild? To me it seems like this could quickly turn into a nightmare if the code is not very well organized/structured. In reality, shouldn't this rather be two separate libs?

        But maybe I should learn Rust first before asking.

        • sapiogram 2 days ago

          It's definitely a tool that must be used carefully, which is why informal rules like "features should be strictly additive" exist. I'd say they're very commonly used by large libraries; for example, here's the list of features for Axum, (arguably) Rust's most popular HTTP server: https://docs.rs/axum/latest/axum/#feature-flags

        • the__alchemist 2 days ago

          Anecdotally, the use cases I've had are all clearly in one category or the other: I'm either building firmware for an embedded device, or building a PC application. I'm curious about the overlap cases.

          • gibibit 2 days ago

            You're right that an application project (a `binary crate` in Rust parlance) for embedded firmware is very different from a PC application. But library crates are often usable both in an embedded or system-level context and in a full-fledged desktop GUI application.

            Consider these common libraries you might use in either a `std` project (PC application, web microservice) or `no_std` project (embedded microcontroller firmware, bootloader, Linux kernel module, blockchain smart contract):

            - data encoding (https://crates.io/crates/base64, for instance)

            - hashing (SHA-2: https://github.com/RustCrypto/hashes/tree/master/sha2)

            - data structures (https://github.com/Lokathor/tinyvec)

            - time/date manipulation (https://docs.rs/chrono/latest/chrono/)

          • mlsu 2 days ago

            I've used it in the past for creating a shared messaging crate that has packet definitions, handshake logic, and serialization/deserialization.

            I think with sloppy/complex code it could start to resemble #ifdef PLATFORM complexity if you do a lot inline, but cargo workspaces are a good way to reduce the blast radius.

    • junon 2 days ago

      std is built on top of core, and core is available in no_std; std is thus additive. The GP comment is correct; this is how the entire ecosystem structures no_std-compatible crates.

    • LoganDark 2 days ago

      > In practice you can't combine no_std and std within a single project anyway

      This is not in any way the problem.

      The reason to make std the feature, instead of no_std, is feature unification. If one dependent disables default features because it doesn't need the std feature, while another dependent does rely on it, Cargo enables std for both and everything still works.

      If no_std is the feature, the situation inverts: a dependent that enables no_std (because it doesn't need the std code) breaks every other dependent that does rely on that code. Features can only be added, never unset, so there is no way for the second dependent to remove the no_std feature the first one turned on.
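
      To spell that out with Cargo's feature unification (crate and version names hypothetical):

          # Crate A doesn't need std:
          #   lin_alg = { version = "1", default-features = false }
          # Crate B relies on std-gated code:
          #   lin_alg = "1"          # default features, so "std" is on
          # Cargo unifies to the union { "std" }: B gets what it needs,
          # and A still compiles, because "std" only adds code.
          #
          # With a "no_std" feature instead, A would enable it, the union
          # would include "no_std", and the std code B relies on would be
          # compiled out, with no way for B to undo it.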

  • the__alchemist 2 days ago

    This is great info! I tried using that approach, but had trouble enabling the `num_traits/libm` dependency on no_std only. Does anyone know if that's possible?
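
    One pattern that may help here (a sketch; a feature can't be enabled on the *absence* of another feature, so the usual trick is to enable `libm` unconditionally and layer `std` on top):

        # Cargo.toml
        [dependencies]
        num-traits = { version = "0.2", default-features = false, features = ["libm"] }

        [features]
        default = ["std"]
        std = ["num-traits/std"]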

Keyframe 2 days ago

I admit I haven't yet gone past the README, but I did a fair share of such programming in the past, and there's always the same use case presented, which is fine for clarity, but... ask yourself: when are you ever doing just one of each (transforms, normalizations, etc.)? The general use case is many (MANY) at once, and writing optimized code for the two is vastly different.

  • the__alchemist 2 days ago

    Concur on this. I'm usually reaching for an approximation algorithm (FMM, Barnes-Hut, etc.), and/or serializing and sending to a CUDA kernel, and generally using Rayon to parallelize if not on CUDA. I'm curious how to explore the space of CPU optimization (SIMD, SoA/AoS, etc.), but don't know anything about it.

    • Keyframe 2 days ago

      > I'm curious how to explore the space of CPU optimization (SIMD, SoA/AoS, etc.), but don't know anything about it.

      As with anything in that regard: profile, profile, profile. Run valgrind, check cache misses, and profile. Calculate the theoretical throughput of the CPU you're working on, like the actual bandwidth of reading/writing RAM with and without caching; that's your high-water mark, and the goal is to get as close as possible to those limits. If you want to start with that, you can do just that: simple reads/writes, profiled, then introduce your functions and structures instead and try to reclaim as much speed as possible. Graphs over profiling runs always help, even better graphs over commits or PRs so you can tell how you're progressing. But that's just, like, my opinion, man. There's no right/wrong way; profiling always tells the truth in the end.

      tl;dr: know your CPU's read ops; profile.

      • the__alchemist 19 hours ago

        I just added a 256-bit SoA SIMD computation. Going to follow your advice and benchmark this against both plain Rust and CUDA (f32).
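
        For the plain-Rust side, something like criterion makes the comparison easy (a sketch; `dot_scalar` stands in for the real routines):

            // benches/dot.rs, with `criterion = "0.5"` in [dev-dependencies]
            use criterion::{criterion_group, criterion_main, Criterion};
            use std::hint::black_box;

            fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
                a.iter().zip(b).map(|(x, y)| x * y).sum()
            }

            fn bench(c: &mut Criterion) {
                let a = vec![1.0_f32; 1 << 16];
                let b = vec![2.0_f32; 1 << 16];
                c.bench_function("dot_scalar", |bencher| {
                    bencher.iter(|| dot_scalar(black_box(&a), black_box(&b)))
                });
                // A second bench_function for the SoA SIMD path would go here.
            }

            criterion_group!(benches, bench);
            criterion_main!(benches);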

tux3 3 days ago

Sounds interesting, but it's a space with a lot of different options. Have you tried doing benchmarks against other vector libraries?

If I'm shopping for a vector library, this is one of the pieces of information that makes the decision easier.

  • creata 2 days ago

    I haven't benchmarked, so these opinions might be worthless, but here's how they're laid out internally.

    * nalgebra uses fixed-size arrays (so a Vec4 is like [[f32; 4]; 1])

    * this library seems to use fields (so a Vec4 is a struct with x,y,z,w fields)

    * glam uses SIMD types for some types (so a Vec4 is a __m128)

    I think glam might win for some operations, but if you want performance, people usually apply SIMD in the other direction when possible, like:

        struct Vec4 { x: __m128, y: __m128, z: __m128, w: __m128 }
    
    According to mathbench-rs[0] (which I looked at after typing this comment...) it looks like nalgebra and ultraviolet have such types. The benchmarks have "N/A" for many of the "wide" nalgebra entries though, which might indicate that nalgebra hasn't implemented many of those functions for "wide" types.

    [0]: https://github.com/bitshifter/mathbench-rs

    • Animats 2 days ago

      The glam guy wanted to go all aligned so that SIMD would work, but that would break so much code that he was talked out of it.

      Hint for language designers: when you design a new language, put this stuff, and multidimensional arrays, in the standard library. Multiple incompatible versions of such types are as bad for number-crunching as multiple incompatible string types would be for string manipulation. You want your standard numeric libraries to work on the standard types.

      This is part of why Matlab is so successful. You don't have to worry about this stuff.

      • zevets 2 days ago

        It's honestly surprising how many programming languages ignore the needs of floating-point users. Rust has integer types that can't be 0, but no std type for floats that can't be NaN? In some sense IEEE 754 floats are better than ints here, since NaN is essentially a hardware-supported, error-tagged enum variant.

        I think it's from a CS education that treats the "naturals" as fundamental, versus an engineering background where the "reals" are fundamental and matrix math is _essential_, and people live on one side of this fence.

        • Animats 2 days ago

          That was true in the past, for a few reasons.

          - Floating point operations used to be slow. On early PCs, you didn't even have a floating point unit. AutoCAD on DOS required an FPU, and this was controversial at the time.

          - Using the FPU inside system code was a no-no for a long time. Floating point usage inside the Linux kernel is still strongly discouraged.[1] System programmers tended not to think in terms of floating point.

          - Attempts to put multidimensional arrays in modern languages tend to result in bikeshedding. If a language has array slices, some people want multidimensional slices. That requires "stride" fields on slices, which slows down slice indexing. Now there are two factions arguing. Rust and Go both churned on this in the early days, and neither came out with a good language-level solution. It's embarrassing that FORTRAN has better multidimensional arrays.

          Now that the AI world, the GPU world, and the graphics world all run on floating point arrays, it's time to get past that.

          [1] https://www.kernel.org/doc/html/next/core-api/floating-point...

        • vlovich123 2 days ago

          > This enables some memory layout optimization. For example, Option<NonZero<u32>> is the same size as u32

          NaN doesn't have this optimization because the optimization isn't generic across all possible representations. Trying to make it generic gets quite complex, and floats might have many such representations (e.g. you want NaN to be the niche, someone else needs NaN values and thinks infinity works better, etc.). In other words:

          NonZero is primarily for size optimization of Option<number>. If you want sentinels, write your own wrapper; it's not hard.
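
          E.g., a NaN-rejecting wrapper in the spirit of `ordered_float::NotNan` (a sketch; note that without a declarable niche, `Option<NotNan>` is still bigger than `f32`):

              #[derive(Clone, Copy, PartialEq, PartialOrd)]
              pub struct NotNan(f32);

              impl NotNan {
                  // The constructor is the only place NaN can sneak in,
                  // so it's the only place we need to check.
                  pub fn new(v: f32) -> Option<Self> {
                      if v.is_nan() { None } else { Some(Self(v)) }
                  }

                  pub fn get(self) -> f32 {
                      self.0
                  }
              }

              fn main() {
                  assert!(NotNan::new(f32::NAN).is_none());
                  assert_eq!(NotNan::new(1.5).map(NotNan::get), Some(1.5));
              }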

    • grandempire 2 days ago

      The code example is absolutely the way to do SIMD. A SIMD type is not a geometric vector; it's a magic float that happens to do 4 float operations at a time.

      If your vector is generic (using C++ syntax here), vec<3, float>, then you can just put in vec<3, float4> and solve 4 vector math problems at a time.

      It helps tremendously if your interfaces already take N inputs at a time: then instead of iterating one at a time, you do 4 at a time.
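
      In Rust terms the same idea looks like this (a sketch on nightly `std::simd`): Vec3<f32> is one geometric vector, Vec3<f32x4> is four of them in lockstep.

          #![feature(portable_simd)]
          use std::ops::{Add, Mul};
          use std::simd::f32x4;

          #[derive(Clone, Copy)]
          struct Vec3<T> { x: T, y: T, z: T }

          impl<T: Add<Output = T> + Mul<Output = T> + Copy> Vec3<T> {
              // One dot product for scalars; four at once for f32x4 lanes.
              fn dot(self, o: Self) -> T {
                  self.x * o.x + self.y * o.y + self.z * o.z
              }
          }

          fn main() {
              let a = Vec3 { x: f32x4::splat(1.0), y: f32x4::splat(2.0), z: f32x4::splat(3.0) };
              let b = Vec3 { x: f32x4::splat(4.0), y: f32x4::splat(5.0), z: f32x4::splat(6.0) };
              println!("{:?}", a.dot(b)); // [32.0, 32.0, 32.0, 32.0]
          }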

      • creata 2 days ago

        Right. Glam (maybe because it's stuck with its data layout, maybe to present a cleaner interface) instead uses a SIMD type for a single Vec4, which tends to be a much less efficient way of using SIMD types.

        > If your vector is generic (using cpp syntax here): vec<3 float> then you can just put in vec<3, float4> and then solve 4 vector math problems at a time.

        Yeah, that's the idea, but for anyone reading, the main complication is when you need to branch. There are usually multiple ways to handle branching (e.g., sometimes it's worth adding a "fast path" for when all the branches are true, and sometimes it isn't; sometimes you should turn a branch into branchless code and sometimes you shouldn't) and AVX-512 adds even more ways to do it.
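
        E.g., a per-lane branch turned into a select (nightly `std::simd` again; module paths have moved around between nightlies):

            #![feature(portable_simd)]
            use std::simd::cmp::SimdPartialOrd;
            use std::simd::f32x4;

            fn main() {
                let v = f32x4::from_array([-1.0, 2.0, -3.0, 4.0]);
                // Scalar: if x > 0.0 { x } else { 0.0 }, done branchlessly per lane.
                let mask = v.simd_gt(f32x4::splat(0.0));
                let clamped = mask.select(v, f32x4::splat(0.0));
                println!("{:?}", clamped); // [0.0, 2.0, 0.0, 4.0]
            }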

    • the__alchemist a day ago

      I did some digging; I added rudimentary 256-bit (AVX) structure-of-arrays Vec3 (f32) support. It seems to work, and has constructor/unpack methods to convert between these types and [Vec3; 8].

    • camel-cdr 2 days ago

      I'm not a fan of such vector libraries; AFAIK they all just inhibit autovectorization. At most, you can take advantage of 128-bit SIMD with a bunch of shuffles and extracts, whenever you are also working with scalar variables.

      I did a small experiment comparing 6 possible implementations of the n-body [0] update loop: https://godbolt.org/z/sfehEfPGT

      The implementations are:

      * AOS: a simple scalar implementation with coordinates stored in an array of structs

      * SOA: a simple scalar implementation with coordinates stored as a struct of arrays

      * float3: uses a struct of three floats as a vector type

      * float4: uses a struct of four floats as a vector type, ignores the last element

      * vec4: like float4, but using a generic SIMD abstraction (so basically what glam does)

      * floats3: attempts to do SOA with nice syntax: the floats3 type has three arrays of floats, and there are operations to extract and store a float3 type at a given index

      Since these abstractions are often used in games, I'll start off by looking at what the compiler produces when targeting Zen5 with -O3 -ffast-math:

      * Zen5 O3 ffast-math:

          AOS:     gcc: 11119 ~SSE    clang:  3688 AVX512, but quite messy
          SOA:     gcc:  1283 AVX512  clang:  1202 AVX512
          float3:  gcc: 11050 ~SSE    clang: 10894 ~SSE
          float4:  gcc:  8646 ~SSE    clang: 10815 ~SSE
          vec4:    gcc:  7913 ~SSE    clang:  8196 ~SSE
          floats3: gcc:  1284 AVX512  clang: 13351 ~SSE
      
      The numbers next to the compilers are the cycle estimates from the llvm-mca model of Zen5 for processing 1024 elements. AVX512 indicates that the compiler was able to vectorize the loop with AVX512, and ~SSE means it achieved at most partial vectorization with SSE.

      Now let's also look at a different ISA, this time the RISC-V Vector extension:

      * P670 2xVLEN O3 ffast-math:

          AOS:     gcc: 17445         clang:  3357 RVV
          SOA:     gcc:  3355 RVV     clang:  3334 RVV
          float3:  gcc: 17445         clang: 17449
          float4:  gcc: 25668 RVV128  clang: 17470 RVV128
          vec4:    gcc: 45091 RVV128  clang: 23111 RVV128
          floats3: gcc:  3333 RVV     clang: 17446
      
      This time the llvm-mca model for the SiFive-P670 was used, but I pretended it has 256-bit vectors instead of 128-bit ones, as the vector length is transparent to the codegen and this amplifies the effect I'd like to show. RVV means it could be fully vectorized, while RVV128 is similar to ~SSE and means it could only partially take advantage of the lower 128-bit of the vector registers.

      So if you are using such vector types to do computations in loops, you are likely preventing your compiler from optimizing for modern hardware. In general, writing simple SOA scalar code seems to vectorize best, as long as you make sure the compiler isn't confused by aliasing. Even the plain old AOS scalar code can be vectorized by modern clang, but not by gcc, and sadly neither vectorizes the float3/float4 implementations, which should be very similar. Modern ISAs like NEON/SVE/RVV have more complex vector loads/stores that let you retrieve data efficiently even from a traditionally bad data layout like AOS.

      You can dress up the SOA code to make it a bit nicer; unfortunately my attempt with floats3 currently only works properly with gcc.

      Below are the results when compiling without -ffast-math:

      * Zen5 O3:

          AOS:     gcc: 11819 ~SSE    clang: 10788 ~SSE
          SOA:     gcc:  4146 AVX512  clang: 13734 AVX512
          float3:  gcc: 11826 ~SSE    clang: 11499 ~SSE
          float4:  gcc:  8662 ~SSE    clang: 11810 ~SSE
          vec4:    gcc:  8575 ~SSE    clang:  7451 ~SSE
          floats3: gcc:  4148 AVX512  clang: 14367 ~SSE
      
      * P670 2xVLEN O3:

          AOS:     gcc: 17464 RVV64   clang:  6122 RVV
          SOA:     gcc:  7140 RVV     clang:  6118 RVV
          float3:  gcc: 17445         clang: 17464 RVV64
          float4:  gcc: 25665 RVV128  clang: 19184 RVV128
          vec4:    gcc: 17463 RVV128  clang: 56868 RVV128
          floats3: gcc:  7140 RVV     clang: 17444
      
      Weirdly, clang seems to be struggling with the SOA here, and overall vec4 looks like the best performance tradeoff for x86. Still, with proper SOA (and I bet you could coax clang into generating it as well) you can get a 2x performance improvement. Additionally, vec4 performs horribly with current compilers for VLA SIMD ISAs.

      I'll try to experiment with some real-world code, if I can find some that is bottlenecked by such types.

      [0] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
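
      For reference, the SOA shape in Rust is just plain scalar loops over parallel arrays (a layout sketch, not the benchmarked code):

          // Struct-of-arrays: each coordinate is its own contiguous array.
          struct Bodies {
              x: Vec<f32>, y: Vec<f32>, z: Vec<f32>,
              vx: Vec<f32>, vy: Vec<f32>, vz: Vec<f32>,
          }

          impl Bodies {
              // Simple elementwise update over separate arrays; the compiler
              // can autovectorize this (bounds checks permitting), since
              // nothing forces it to gather/scatter interleaved fields.
              fn step(&mut self, dt: f32) {
                  for i in 0..self.x.len() {
                      self.x[i] += self.vx[i] * dt;
                      self.y[i] += self.vy[i] * dt;
                      self.z[i] += self.vz[i] * dt;
                  }
              }
          }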

      • creata a day ago

        > I'm not a fan of such vector libraries, AFAIK they all just inhibit auto vectorization.

        Yes, I don't think anyone using them is depending on autovectorization.

  • exDM69 2 days ago

    I'm using Rust (nightly) std::simd for my 3d math stuff. It doesn't come with matrix and quaternion functions but they were relatively simple to implement based on known good algorithms. But you do get basic arithmetic operators without having to implement std::ops by hand (or with macros).

    What was/is not pretty is adding some generics so the same code can work with f32 and f64. I did manage to get something that works but it's so ugly that I didn't want to release it. I'm sure it could be improved but what I got works well enough and I haven't needed to touch it in a few years.

    std::simd is quite pleasant to work with, and most importantly it allows a zero-cost fallback to CPU-specific intrinsics when needed.
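
    For comparison, the macro route (the one lin_alg's README alludes to for its f32/f64 modules) sidesteps the ugly generics entirely, at the cost of upsetting rustfmt. A minimal sketch:

        // Generate near-identical f32 and f64 modules from one definition.
        macro_rules! create_mod {
            ($name:ident, $f:ty) => {
                pub mod $name {
                    #[derive(Clone, Copy, Debug)]
                    pub struct Vec3 { pub x: $f, pub y: $f, pub z: $f }

                    impl Vec3 {
                        pub fn dot(self, o: Self) -> $f {
                            self.x * o.x + self.y * o.y + self.z * o.z
                        }
                    }
                }
            };
        }

        create_mod!(f32_m, f32);
        create_mod!(f64_m, f64);

        fn main() {
            let v = f32_m::Vec3 { x: 1.0, y: 2.0, z: 2.0 };
            println!("{}", v.dot(v)); // 9
        }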

  • mflaherty22 3 days ago

    I'd like to see what the author's motivation was for making this, but I'm not sure benchmarks would be the first thing I look for :)

    • the__alchemist 3 days ago

      I haven't done benchmarks, but this speculation nailed it. The exact motivation is lost, but I believe I became tilted by inconsistent mixes of JPL and Hamilton quaternion conventions in the popular libs at the time. Since then, I use it as a low-friction way to share geometry functions among multiple projects; i.e., it's easier to just edit the code base than go through a PR process.

      • zamalek 3 days ago

        no_std seems somewhat novel, which might have been another motivating factor?

kvark 3 days ago

The gamedev ecosystem appears to be split between nalgebra (for Rapier users) and glam. Where does lin-alg fit?

  • pttrn 3 days ago

    cgmath?

    • bladeee 3 days ago

      cgmath is nice, but it uses the old Rust 2015 edition and hasn't been updated since January 2021.

      • tialaramex 3 days ago

        I can't imagine any obvious reason I would miss Rust's 2018 edition, let alone the 2024 edition, when implementing linear algebra. People seemed happy enough in Fortran before I was old enough to go to school, and I don't sense it's an application where I'd want async, for example. A lot of other edition changes are nice when writing new things but not helpful for an existing codebase. So, like, sure, it's the 2015 edition, but that's fine?

      • foresterre 2 days ago

        There's actually an advantage to using older editions: it lowers the MSRV (minimum supported Rust version). This is especially nice for libraries, while binary projects can usually just use the latest edition.

      • duped 2 days ago

        It's updated, just not on crates.io.

moron4hire 2 days ago

  let mut d = a.dot(b);
  d.normalize();
I think I found a bug in your README. A dot product should return a scalar. I don't know Rust at all, but I've never met a language that had a normalize method on scalars.
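
For illustration, in a glam-like Rust API the types make the mismatch obvious (names here are hypothetical, not necessarily this crate's):

    let a = Vec3::new(1.0, 0.0, 0.0);
    let b = Vec3::new(0.0, 1.0, 0.0);
    let d: f32 = a.dot(b);          // a dot product is a scalar
    // d.normalize();               // f32 has no normalize method
    let n = a.cross(b).normalize(); // a cross product is a vector, which does
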
simojo 3 days ago

How does it compare to nalgebra?

aquarin 2 days ago

Very nice. Are there any tests covering the functionality?

  • m00dy 2 days ago

    Unfortunately, there are no tests in it.

ivanjermakov 2 days ago

> Do not run cargo fmt on this code base; the macro used to prevent duplication of code between f32 and f64 modules causes undesirable behavior.

Why not use #[rustfmt::skip]?

  • the__alchemist 2 days ago

    Good question. I tried marking several functions with it, but still ended up with inappropriate indents related to the macros. Fortunately, it seems a (relatively) recent `rustfmt` update resolved this: `fmt` still doesn't format the macro bodies properly, but it no longer breaks the indents either.

lawlessone 3 days ago

Quaternions, we were due our weekly quaternion post.

Nice work by the way!