Page MenuHome

BLI: Generally improve C++ data structures
ClosedPublic

Authored by Jacques Lucke (JacquesLucke) on Jun 4 2020, 9:50 PM.

Details

Summary

The main focus here was to improve the docs significantly. Furthermore, I reimplemented Set, Map and VectorSet. They are now (usually) faster, simpler and more customizable. I also rewrote Stack as well to make it more efficient.

Obviously, this is not the end of improvements to these data structures. My hope is that this update will make other developers more comfortable with the idea of having custom implementations for these basic structures.

What I expect from this review:

  • Feedback on some statements I made throughout the code like BLI::Vector should be your default choice for a vector data structure in Blender. There are similar statements for the other data structures and also ArrayRef and StringRef.
  • Feedback on public method/class names. I like the names I used, but if there is a majority who prefers other names like push_back instead of append, I can change that. The best time for these changes is now.

You are free to give feedback on implementation details as well of course, but those can be changed at any point later on. So they should not be the main focus of this review. It probably does not make too much sense to look at the diff, so I recommend you apply this patch and read through it in your editor.

Diff Detail

Repository
rB Blender
Branch
update-bli-cpp-structures (branched from master)
Build Status
Buildable 8394
Build 8394: arc lint + arc unit

Event Timeline

Jacques Lucke (JacquesLucke) requested review of this revision.Jun 4 2020, 9:50 PM
Jacques Lucke (JacquesLucke) created this revision.
Ray Molenkamp (LazyDodo) added inline comments.
source/blender/blenlib/BLI_array_ref.hh
196

This applies to all uses of failure but i picked this one since it's a nice concise example

Saying something would "fail" imho implies controlled failure where the caller should be logically be able to deal with said failure (by either a return flag or an exception), doing an out of bounds read may not actually fail in this instance, it may cause a catastrophic failure or it may just return garbage data depending a bit on 'luck' where on the heap we are allocated.

Unless we want to implement rigorous parameter validation (quite possibly at a cost of perf) and error handling in the calling code, i'd lean towards using the phrasing "undefined behavior" rather than "failure"

source/blender/blenlib/BLI_array_ref.hh
196

That makes sense. Thanks.

  • "fail" -> "undefined behavior"

Thanks for the great improvement to this documentation! I especially appreciate notes on when/why to use these data structures.
I wrote some minor nits, mostly on language in comments, which normally I wouldn't do but you have put so much care into wording throughout, maybe you care about these small details.

source/blender/blenlib/BLI_allocator.hh
26

Shouldn't this be std::vector, not std::Vector?

53

A future nice thing to have would be an arena allocator, which has a deallocator that does nothing because all memory is reclaimed when the arena is freed. Blenlib has such an allocator, and I like to it universally when working on a big complicated algorithm that allocates lots of things but doesn't really want to or need to free most of them until the algorithm as a whole is finished.

source/blender/blenlib/BLI_array.hh
33

intend -> intent

source/blender/blenlib/BLI_array_ref.hh
34

intend -> intent

47

not save -> not safe

338

Since you've been changing all "Return ,..." to "Returns ..." in descriptions of member actions, maybe you want to be consistent and change all "Get ..." to "Gets ...".

381

Thanks for this. I would love versions of this that:

  1. assume there's a standard stream output function available for the value so one doesn't have to define a wrapper or make a lambda to use this in that very common case.
  2. a version that prints as blank-separated values terminated by a newline, also assuming there is a standard stream output function available.

Also, versions for the other containers, though perhaps wrapping them in an ArrayRef for debug printing isn't too painful.
Of course, I realize that it is already a bit strange to put debugging-help functions in standard data structures so perfectly understand if you provide no more than print_as_lines.

source/blender/blenlib/BLI_map.hh
93

I would phrase "other than in a vector" as "unlike vector"

762

Might you also consider adding a debug print function that prints all the key->value mappings in a Map?

source/blender/blenlib/BLI_map_slots.hh
140

Your pattern of using Returns instead of Return in comments is not being followed in this file.

source/blender/blenlib/BLI_probing_strategies.hh
45

criterions -> criteria

source/blender/blenlib/BLI_set.hh
91

other than in a vector -> unlike in a vector, or unlike vector

408

Consider having a debugging function to print elements of the Set.

source/blender/blenlib/BLI_stack.hh
29

Maybe explicitly state the advantages of using Stack instead of Vector (which is easily used as a stack?)

source/blender/blenlib/BLI_vector.hh
101

Thanks for this! I use this all the time.

396

While I can see the argument for using push_back to mirror std::vector, I am with you in preferring append. I always found push_back a strange and non-obvious name. Python and many other languages use append.

752

Consider a debugging print for elements.

source/blender/blenlib/BLI_allocator.hh
53

I agree, that would be good to have. There is already LinearAllocator but we have to think about how to make this work with the containers in a nice way. There are quite a few specialized allocators that I'd like to use in different cases, but I'd say that's a topic for another day.

source/blender/blenlib/BLI_array_ref.hh
381
  1. If you scroll down just a tiny bit, you'll find that version.
  2. I agree, that is helpful as well.

In general I'm totally fine with adding these debug print functions. We/I just need to think about how to make them consistent between the different containers.

source/blender/blenlib/BLI_map.hh
762

Yes. I did have that at some point, but removed it for now. I'll think about how make these debug prints consistent between the containers.

  • incorporate feedback

This is really great! Appreciate you taking all the feedback from the other thread into account here.

General question: Why Ref and Mutable Ref instead of View and Span?
ArrayRef becomes a confusing name I think, along with using 'array' in the documentation extensively. It works with arrays AND vectors (of the std and BLI varieties). Would just having a BLI::View and BLI::Span be too common of names to use instead? Though, honestly, the view vs. span terminology is not as clear as Ref and Mutable Ref unless you're already familiar with modern c++... sigh, words are hard. RangeView, RangeSpan ... RangeRef, MutableRange... not sure 'range' would make it better either... It's probably ok to leave Ref and Mutable Ref, I can't think of a measurably better one.

source/blender/blenlib/BLI_map.hh
23

assoziative -> associative

source/blender/blenlib/BLI_set.hh
238

intend -> intent

source/blender/blenlib/BLI_vector_set.hh
241

intend -> intent

I borrowed the names ArrayRef and MutableArrayRef from the LLVM (https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ADT/ArrayRef.h).

Just View is not specific enough imo. Every kind of data structure can have a View. I've seen MapView and SetView in some other code base if I remember correctly. ArrayView instead of ArrayRef would be fine with me, but I'm not sure if that is better.
Just Span and MutableSpan seems better. The term seems to imply that it is a contiguous array. Also, this is closer to the standard library. I do not have a strong opinion on whether ArrayRef or Span is better. ArrayRef seems more clear to me, but having a shorter name is also nice.

I'm not familiar enough with C++20 but if it's close enough to their span and view i'd say we stick with those names, if they are fundamentally different things chose a different name to avoid confusion in the future.

I'm not familiar enough with C++20 but if it's close enough to their span and view i'd say we stick with those names, if they are fundamentally different things chose a different name to avoid confusion in the future.

I never used std::span, but from what I see it is basically the same as BLI::MutableArrayRef. Will see if someone else voices an opinion soon. If not I'll probably rename it to BLI::Span and BLI::MutableSpan.

  • minor fix for print_stats.

I only scanned through the code without reviewing every change in detail, but overall this looks good to go in.

This revision is now accepted and ready to land.Jun 8 2020, 2:57 PM

I only scanned through the code without reviewing every change in detail, but overall this looks good to go in.

Thanks. I'll probably commit this tomorrow then.

Do you have an opinion on the ArrayRef vs Span issue? I started to like the name Span more over time. The constness behavior will be a bit different to C++20 std::span, but I'll explain that in the documentation in more detail.

In any case, I'll commit this patch first, then introduce the blender namespace in blenlib, and then potentially rename ArrayRef to Span. I don't see much value in doing all of that in this patch.

Do you have an opinion on the ArrayRef vs Span issue? I started to like the name Span more over time. The constness behavior will be a bit different to C++20 std::span, but I'll explain that in the documentation in more detail.

Renaming to Span is fine with me.

In any case, I'll commit this patch first, then introduce the blender namespace in blenlib, and then potentially rename ArrayRef to Span. I don't see much value in doing all of that in this patch.

Right, no need to do it all in one patch.

I haven't gone through all of it, but in general it looks very nice & well documented. There are a few nags, but I'd accept this patch even without these being addressed.

source/blender/blenlib/BLI_array_ref.hh
32

Up to here it was not known that an ArrayRef can refer to a Vector. IMO this should be described in the first sentence.

47

I would even "not safe" → "unsafe", as in that case the negation is harder to miss.

95

Is it desired to have C-style casts in C++ code? I always thought it was better to use one of the explicit C++ casts.

  • update documentation according to feedback
source/blender/blenlib/BLI_array_ref.hh
95

I think that deserves its own separate discussion on devtalk. I'm fine with using C++ casts, although I've yet to see a case where these explicit casts improve readability.