Page MenuHome

ID: Foreach External File.
Needs RevisionPublic

Authored by Jeroen Bakker (jbakker) on Oct 11 2021, 9:46 AM.

Details

Summary

This patch adds a foreach external file callback to the id block.
This allows parts of the code to find external stored files.

Supported ID types:

  • Images (Movie, Single file, Sequences, Multi view, Tiled)
  • Movie clips (Movie, Sequences)
  • Brushes (icons)
  • Cached files (Alembic). USD isn't supported as they rely on other files as well that we aren't aware of.
  • Volumes (Single file, Sequences)
  • Library
  • NodeTree (IES, Script)
  • Material
  • World
  • Lamp
  • Sound
  • Text
  • VFont
  • Mesh (custom data external file)
  • Object (Particle system, modifiers (fluid, cloth, ocean, meshcache), soft body)
  • Scene (Sequencer)

This patch is part of localizing external files feature of the asset
browser. See D12423.

Diff Detail

Repository
rB Blender
Branch
T91252-id-foreach-external-file (branched from master)
Build Status
Buildable 17715
Build 17715: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
source/blender/blenkernel/intern/image.c
240

It would be important to explain why this intermediary pointer is used here (I assume this is some sort of security to avoid callbacks trying to reassign the pointer itself, since this is a fixed size array and not an allocated data)?

This revision is now accepted and ready to land.Oct 11 2021, 3:31 PM
Jeroen Bakker (jbakker) marked 3 inline comments as done.Oct 11 2021, 3:51 PM
Jeroen Bakker (jbakker) added inline comments.
source/blender/blenkernel/intern/image.c
240

&ima->filepath gives compilation warnings as it could not match the cast automatically.
Will add a comment.

Jeroen Bakker (jbakker) marked an inline comment as done.
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)
  • Updated comments based on code review.
Jeroen Bakker (jbakker) planned changes to this revision.Oct 12 2021, 4:00 PM

ID: Add foreach_external_files to Volumes.

This revision is now accepted and ready to land.Oct 12 2021, 4:00 PM
Jeroen Bakker (jbakker) requested review of this revision.Oct 12 2021, 4:01 PM
Jeroen Bakker (jbakker) planned changes to this revision.
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Oct 12 2021, 4:16 PM
  • Changed interface of foreach_external_file callback to support different memory allocation modes.
  • Updated comments based on code review.
  • Add foreach_external_file to images.
  • Add foreach_external_file to brushes.
  • Add foreach_external_file for movie clips.
  • Add foreach_external_file for cachefiles (Alembic only).
  • Add volume.
  • Added operator for debugging foreach_external_file.
Jeroen Bakker (jbakker) planned changes to this revision.Oct 13 2021, 10:31 AM
  • Added library, sound, text, vfont.
  • Added NodeTree, Material, Light, World, Mesh.
Jeroen Bakker (jbakker) planned changes to this revision.Oct 13 2021, 12:57 PM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Oct 13 2021, 2:04 PM
  • Added Object.
  • Scene/sequencer.
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Oct 13 2021, 2:50 PM
Sybren A. Stüvel (sybren) requested changes to this revision.Oct 14 2021, 2:54 PM

I've noted a bunch of things inline. Most of them are small changes to reduce code complexit, and most of all: requests for documentation. The API is largely undocumented, so it's hard to say whether the implementation is actually doing the right thing.

There should be unittests for this code. The BLI_path_frame_range_calc is located in a file that's already well covered by unittests, so adding another one should be trivial. The rest of the code also should be tested; it shouldn't be too hard to set up some blend file that covers the majority of cases.

@Jeroen Bakker (jbakker) are you planning on creating a Python interface for this as well? It could be used to basically implement parts of Blender Asset Tracer in Blender itself.

source/blender/blenkernel/BKE_idtype.h
104–106

It's unclear how the external_file_path and external_file_path_size parameters relate. Because of the double pointer the former seems to be some return parameter, or at least the callback itself is able to change the char *. However, external_file_path_size is not a pointer, so it cannot be altered by the callback. This means one parameter is one that can be used as input & output, whereas the other is strictly input-only. I don't see the use of using external_file_path_size=0 when external_file_path is dynamically allocated; the callback should know how much space it can write into.

106

There seem to be two indicators of "dynamically allocated": external_file_path_size == 0 and is_mem_alloc == true. It's probably best to just remove the is_mem_alloc parameter; boolean parameters are infamous for producing hard to understand calls.

107

const can be removed; in function declarations it has no meaning for passed-by-value parameters.

112

I'd vote to remove flags. If it's no in use, it's dead code and should not be committed. Things can always be expanded later.

If you have a concrete future API extension, and this will be added very shortly after landing this patch, document what that extension is going to be. If there are no such concrete, short-term plans, just remove the flags.

332

This should get documentation as to what it does. And since "external file" and "embedded file" are mutually exclusive, the function name may need some attention as well. I would suggest something better, but for that I would have to know what it does (which loops back to the missing documentation).

source/blender/blenkernel/intern/cachefile.c
112

Flip the condition & return early, thereby reducing the cognitive complexity of the remainder of the code.

124

This may report a file twice. This may or may not be a breach of contract; it's not documented whether reported files are unique or not. Document this, and potentially also fix the duplicated call.

source/blender/blenkernel/intern/idtype.c
556 ↗(On Diff #43251)

This needs documentation as to what it does. Is it "unwrapping" embedded files into external ones?

578 ↗(On Diff #43251)

Flip the condition & return early.

source/blender/blenkernel/intern/image.c
235

Why is this marked volatile?

248

This if and the next one look like they'd be clearer with a switch/case.

263–265

This code seems to compute how many image files are to be looped over. This is *not* the number of views, so max_views is not the right name. I think something like max_image_files would be better.

267–290

Nesting is too deep, refactor into smaller functions.

285

The above code has a continue that always executes when ima->source == IMA_SRC_SEQUENCE. This means that this comment isn't reached for sequences, hence the "sequences where ..." doesn't seem to apply here.

source/blender/blenkernel/intern/library.c
73–74

Why the volatile, only to cast it away again?

source/blender/blenkernel/intern/mesh.c
192

Flip the condition, return early, reduce the cognitive complexity of the remaining code.

source/blender/blenkernel/intern/movieclip.c
178

Flip the condition, return early, reduce cognitive complexity.

source/blender/blenkernel/intern/object.c
577

This creates tight coupling between the modifier implementation and this part of the code. It would be much better to have callbacks for this on the modifier type, though, so that each modifier can implement its own external file looper function.

If this really, really cannot be helped, it should be screamed about in the modifier code, and this if/else if chain should be rewritten as switch/case.

source/blender/blenkernel/intern/scene.c
831

Flip the condition, return early.

841

Flip the condition, return early.

844

This should be a switch/case.

source/blender/blenkernel/intern/volume.cc
582–594

This is only reporting the frames that are actually used; that's inconsistent with the handling of Alembic sequences, which reports the entire sequence. Such details should be made consistent and documented.

source/blender/blenlib/intern/path_util.c
857 ↗(On Diff #43251)

if → whether

If the comment is taken literally, your code will not return (i.e. hang Blender) when the range could not be determined.

This revision now requires changes to proceed.Oct 14 2021, 2:54 PM
Jeroen Bakker (jbakker) marked an inline comment as done.Oct 14 2021, 7:21 PM

Best to discuss together with @Bastien Montagne (mont29) he proposed the signature of the callbacks. including the unused flags parameter. We should get some consensus.

source/blender/blenkernel/intern/image.c
235

If filepath_ptr is optimized away by the compiler it gets into undefined behavior. Best to mark them as volatile to make sure undefined behavior is deferred.

263–265

max image files is also not accurate bad name as you can have more files per view. views_stored_in_separate_files_len but separate_files isn't correct.... so.... view_len ?

source/blender/blenkernel/intern/library.c
73–74

See comment in previous usage. removing undefined behavior when variable is optimized away.

Jeroen Bakker (jbakker) marked 17 inline comments as done.Oct 15 2021, 11:36 AM

I have no plans of creating a python api for this. The API should be done in 2 ways. One the add-on is interested in where files are stored (BAT) or want to modify (asset browser). In the other hands an add-on would also want to provide their own external files (fluid caches in flip fluids for example).
I do think it can be good for addons but would lead me further away towards the target we want to achieve. In external files for add-ons any operation on the path should be abstracted away what is a lot of work and not really my focus point. Perhaps when discussing Asset Libraries API we could decide otherwise.

I agree that test cases is very useful for this part as it is well hidden to actual users and the stability of the API depends on the internal design of the specific ID types. I doubt if it is easy to add as it requires a lot of data to work on. For example the BLI_path_frame_range_calc has many edge cases so we should design a few data sets where we can test multiple cases.

Test cases for some ID types can be added. But then we should discuss timing. In the meeting last week it was discussed that this was the an important thing to add before moving to bcon3. We are now a week further and we see that the reviewing of the mechanisms that are required still takes a lot of time.

Personally I don't mind this to move to blender 3.1 as the mechanisms aren't at all tested yet. But that should be agreed by the project and project owners.

source/blender/blenkernel/BKE_idtype.h
106

The signature of is_mem_alloc was requested by module owner.
I added comments about usage and promises.

332

Updated the signature and comments.
Note that they are embedded ids, not files. Perhaps it is confusing with external files and packed files.

source/blender/blenkernel/intern/cachefile.c
124

Yes this is a limitation of the API.

This could be solved by introducing a helper struct that would check if the file were already passed. Other option would be to add a parameter to the callback function to identify if it is a 'persistent' filepath, or a 'generated' filepath.

Lets discuss this with @Bastien Montagne (mont29)...

source/blender/blenkernel/intern/volume.cc
582–594

At this moment the implementation is specific per ID block. In some cases the user (ImageUser) describes how to use a sequence, what isn't part of the ID block. Other cases like volumes the ID block contains the sequence. Other areas the data is only available as runtime data (movie files).

How do you want to make this consistent, without refactoring a large part of blender :-)
If there is a general rule how sequences should be handled we can start migrating towards that.

I agree this should be documented. But a it is specific in the context of a general statement that should lead to an architectural decision. Best location for that is wiki.

Jeroen Bakker (jbakker) marked an inline comment as done.
  • Added documentation - Cleanup image, movie_clip by introducing functions - Added early exits where it made sense
  • Solved items from code review.
Julian Eisel (Severin) requested changes to this revision.Oct 15 2021, 2:40 PM

Requesting some more of that smaller polishing stuff.

source/blender/blenkernel/BKE_idtype.h
105

We usually use colons with the \param. E.g. here:

\param id: The ID that is passed ...

I also prefer indenting the wrapped lines, it increases readability a lot IMHO. E.g.:

\param something: This is some wrapped
                  description of the thing.
\param something_else: Another description

(Same for the cases below.)

109

What's a boxed value? :) Guess you mean contained value?

source/blender/blenkernel/intern/brush.c
228

Why is this volatile? Doesn't seem like it could be changed from externally while this runs. Plus, it's passed to the callback as non-volatile char ** anyway.
If there's a reason for it, it should be mentioned. It's not obvious either way.

source/blender/blenkernel/intern/cachefile.c
123

Again volatile without clear reason.

source/blender/blenkernel/intern/idtype.c
551 ↗(On Diff #43375)

References to functions should use the # prefix, e.g. #BKE_idtype_id_foreach_external_file_for_embedded_id(). Also for the other comments.
This is a rather new guideline https://wiki.blender.org/wiki/Style_Guide/C_Cpp#Comments

source/blender/editors/asset/intern/asset_ops.cc
563 ↗(On Diff #43375)

Redundant separator :)

605 ↗(On Diff #43375)

I wouldn't mind keepting this in as debug operator even. But then make sure to mark it as OPTYPE_INTERNAL

This revision now requires changes to proceed.Oct 15 2021, 2:40 PM
source/blender/blenkernel/intern/image.c
235

I still don't think there's a need to make this volatile, from my understanding this would only be needed for possible asynchronous changes (hardware event, multi-threading, interrupt).

This confirms it: https://stackoverflow.com/a/246148

Some clarifications @Jeroen Bakker (jbakker) and I just worked on:

1diff --git a/source/blender/blenkernel/BKE_idtype.h b/source/blender/blenkernel/BKE_idtype.h
2index 868846cf3e8..b034ba35947 100644
3--- a/source/blender/blenkernel/BKE_idtype.h
4+++ b/source/blender/blenkernel/BKE_idtype.h
5@@ -100,18 +100,20 @@ typedef void (*IDTypeForeachCacheFunction)(struct ID *id,
6 IDTypeForeachCacheFunctionCallback function_callback,
7 void *user_data);
8 /**
9- * Signature of the callback function for the foreach_external_file.
10+ * Signature of the callback function for #foreach_external_file.
11 *
12- * \param id id that was passed when calling `foreach_external_file`. For embedded ids the owner id
13+ * \see IDTypeForeachExternalFileFunction
14+ *
15+ * \param id id that was passed to `foreach_external_file`. For embedded ids the owner id
16 * will be passed.
17- * \param external_file_path A reference to a single detected external filepath.
18- * Filepath can be relative or absolute and checks on existing files might not be performed.
19- * The boxed value can also be contain a value that isn't stored in a blend file.
20- * \param external_file_path_size Size of the allocated memory to store the external_file_path
21- * boxed element. 0 is passed if the boxed value of external_file_path is dynamically allocated.
22- * \param is_mem_alloc Is set to true when the boxed value of external_file_path is dynamically
23- * allocated.
24- * \param user_data user data that was passed when calling `foreach_external_file`.
25+ * \param external_file_path A pointer to a single external (i.e. non-packed) filepath.
26+ * Filepath can be relative or absolute, and may not exist on disk.
27+ * \param external_file_path_size Size of the allocated memory to store `*external_file_path`.
28+ * 0 is passed if `*external_file_path` is dynamically allocated.
29+ * \param is_mem_alloc Is set to true when `*external_file_path` is dynamically allocated; this
30+ * makes it possible for this function to take ownership of the file path memory by assigning
31+ * `NULL` to `*external_file_path`.
32+ * \param user_data user data that was passed to `foreach_external_file`.
33 */
34 typedef void (*IDTypeForeachExternalFileFunctionCallback)(
35 struct ID *id,
36@@ -121,26 +123,35 @@ typedef void (*IDTypeForeachExternalFileFunctionCallback)(
37 bool is_mem_alloc, /* Set to true when dynamically allocated string. */
38 void *user_data);
39
40+/**
41+ * Type definition for filtering flags used by #IDTypeInfo::foreach_external_file.
42+ * \see IDTypeForeachExternalFileFunction
43+ */
44+typedef int IDTypeForeachExternalFileFilterFlag;
45+
46 /**
47 * Signature of foreach_external_file.
48 *
49- * \param id id to detect the external files from. External files are files that are stored on disk
50- * based on the internal state of the id. User should not trust that the files actually exist.
51+ * Given an ID, the function must call the given callback for each external (i.e. non-packed) file
52+ * referenced by that ID. Also files that are referenced but do not exist on disk must be reported.
53+ * File sequences must be reported by calling the callback for each file in the sequence.
54+ *
55+ * Files can be reported multiple times. For example, by reporting the filepath that is stored
56+ * inside the id, and when passing the sequence files.
57+ *
58+ * \param id ID to detect the external files from. External files are files that are stored on disk
59+ * based on the internal state of `id`.
60 * \param function_callback function that is invoked with the detected file path info. The callback
61- * function is allowed to alter the file path. In case for image sequences each detected frame of
62- * the sequence will be passed individually. The callback function can be invoked twice for the
63- * same file once containing the filepath that is stored inside the id, and then when passing the
64- * sequence files.
65- * Packed files of id's are skipped as these are not external files.
66- * \param flags flags added for future extension. Flags are already added to the signature doesn't
67- * need to be updated when migrating BKE_bpath functions. `BKE_bpath_traverse` will be modified to
68- * use foreach_external_file and uses flags to select of hide specific external files.
69- * \param user_data data to be passed to the user_data parameter of function_callback.
70+ * function is allowed to alter the file path.
71+ * \param flags flags added for future extension. Flags are already added to support an
72+ * already-planned migration of `BKE_bpath`; `BKE_bpath_traverse` will be modified to use
73+ * foreach_external_file, and uses flags to select of hide specific external files.
74+ * \param user_data data to be passed to the `user_data` parameter of `function_callback`.
75 */
76 typedef void (*IDTypeForeachExternalFileFunction)(
77 struct ID *id,
78 IDTypeForeachExternalFileFunctionCallback function_callback,
79- int flags, /* Currently unused, added for future API extension. */
80+ IDTypeForeachExternalFileFilterFlag flags,
81 void *user_data);
82
83 typedef struct ID *(*IDTypeEmbeddedOwnerGetFunction)(struct Main *bmain, struct ID *id);

source/blender/blenkernel/BKE_idtype.h
106

The signature of is_mem_alloc was requested by module owner.

@Bastien Montagne (mont29) could you maybe shed some light on this? What's the reason that this parameter is required, on top of the external_file_path_size=0 that communicates the same thing?

For me it's still unclear what the whole reason is to even communicate this to the callback. Why would this function care about whether some memory was dynamically allocated or not?

source/blender/blenkernel/BKE_idtype.h
106

We currently have two cases for the given external_file_path pointer:

  • It is a fixed array of chars:
    • callback needs to know its size to be able to safely modify it, and is not expected to modify that pointer of pointer.
    • callback should only consider external_file_path_size parameter, and fully ignore is_mem_alloc one.
  • It is an allocated buffer:
    • callback does not care about its size, if it needs to modify it it should re-allocate it and update accordingly the pointer it points to.
    • callback should only consider is_mem_alloc parameter, and fully ignore external_file_path_size one.

Could we avoid the is_mem_alloc parameter? Yes. Should we do it? Imho, no, since this would:

  • Add some extra 'implied' meaning to external_file_path_size, mixing two different info into the same parameter.
  • Reduce overall clarity, in a case where things are already not really simple due to the need to handle two very different cases in a same, abstract way.
  • Less important, but reduce general future-proofness, since we may actually want to pass in the size of an allocated string at some point, if that becomes needed/relevant.
NOTE: in theory we could even make is_mem_alloc a bitflag of a more generic enum flag parameter... In case we need to pass more 'status' info in the future. But not sure it's worth adding this extra verbosity/complexity for now at least.
120

Better put the comments one line above the parameter, rather than having them do some linewrap:

/* In case it is an array of chars, 0 if dynamically allocated. */
size_t external_file_path_size,

...However, I would expect such comment to be in the relevant \param doxygen section above? So no need to duplicate it here imho.

Same below.

I'll move the discussion thread from the inline comments to the main comments.

@Bastien Montagne (mont29) said:

We currently have two cases for the given external_file_path pointer:

  • It is a fixed array of chars:
    • callback needs to know its size to be able to safely modify it, and is not expected to modify that pointer of pointer.
    • callback should only consider external_file_path_size parameter, and fully ignore is_mem_alloc one.
  • It is an allocated buffer:
    • callback does not care about its size, if it needs to modify it it should re-allocate it and update accordingly the pointer it points to.
    • callback should only consider is_mem_alloc parameter, and fully ignore external_file_path_size one.

Aha, so there are certain freedoms that the callback has in certain situations, and not in others. This is not always the case when passing dynamically allocated memory, so I would propose to model the API so that it communicates what can be done with the pointer. The way the memory is allocated doesn't matter. Whether or not the callback is allowed to reallocate the memory and update the pointer, that is what (IMO) is important.

I also don't quite agree with the strict "modification of dynamically allocated memory requires reallocation"; if it can be done by just overwriting the existing data, why bother reallocating?

Could we avoid the is_mem_alloc parameter? Yes. Should we do it? Imho, no, since this would:

  • Add some extra 'implied' meaning to external_file_path_size, mixing two different info into the same parameter.
  • Reduce overall clarity, in a case where things are already not really simple due to the need to handle two very different cases in a same, abstract way.

This is what's currently happening here in the patch; I agree that we should either use one or the other.

  • Less important, but reduce general future-proofness, since we may actually want to pass in the size of an allocated string at some point, if that becomes needed/relevant.

I'd do this right now, instead of "lying" that a zero-size path was passed. Parameters should have one meaning, and any exceptions make it harder to work with.

My proposal:

  • Always pass the path size to external_file_path_size.
  • Rename is_mem_alloc to something like may_reallocate or may_move_pointer. This expresses what the callback can do with the information it gets, rather than communicating implementation details from the caller.

We currently have two cases for the given external_file_path pointer:

  • It is a fixed array of chars:
    • callback needs to know its size to be able to safely modify it, and is not expected to modify that pointer of pointer.
    • callback should only consider external_file_path_size parameter, and fully ignore is_mem_alloc one.
  • It is an allocated buffer:
    • callback does not care about its size, if it needs to modify it it should re-allocate it and update accordingly the pointer it points to.
    • callback should only consider is_mem_alloc parameter, and fully ignore external_file_path_size one.

Aha, so there are certain freedoms that the callback has in certain situations, and not in others. This is not always the case when passing dynamically allocated memory, so I would propose to model the API so that it communicates what can be done with the pointer. The way the memory is allocated doesn't matter. Whether or not the callback is allowed to reallocate the memory and update the pointer, that is what (IMO) is important.

I also don't quite agree with the strict "modification of dynamically allocated memory requires reallocation"; if it can be done by just overwriting the existing data, why bother reallocating?

I fully disagree with this analysis. There are no "certain freedoms that the callback has in certain situations". There are two fully different ways of handling storage of those paths, and the callbacks have to deal with both in they want to modify those paths.

Not saying this is an ideal case, but this is what we have to work with.

Could we avoid the is_mem_alloc parameter? Yes. Should we do it? Imho, no, since this would:

  • Add some extra 'implied' meaning to external_file_path_size, mixing two different info into the same parameter.
  • Reduce overall clarity, in a case where things are already not really simple due to the need to handle two very different cases in a same, abstract way.

This is what's currently happening here in the patch; I agree that we should either use one or the other.

No this is not. Current patch clearly convey which of the two cases we are dealing with. Passing 0to external_file_path_size in case of allocated buffer is simply a way to say 'ignore me', or 'unknown size' (we could add a #define UNKNOWN_BUFFER_SIZE 0 here if you want to be fully explicit about it).

  • Less important, but reduce general future-proofness, since we may actually want to pass in the size of an allocated string at some point, if that becomes needed/relevant.

I'd do this right now, instead of "lying" that a zero-size path was passed. Parameters should have one meaning, and any exceptions make it harder to work with.

Once again, external_file_path_size has a single meaning, the size of of passed buffer. 0 just means 'unknown size'. There is no lying here, just unknown info.

My proposal:

  • Always pass the path size to external_file_path_size.

This would add extra processing in allocated case, since we do not know that without querying MEM code.

  • Rename is_mem_alloc to something like may_reallocate or may_move_pointer. This expresses what the callback can do with the information it gets, rather than communicating implementation details from the caller.

No, when allocated, code is not allowed to reuse the same buffer. Just always reallocate a new string if you need to modify its content, for at least those reasons:

  • Current code always re-allocate.
  • If a path is allocated, it was most likely done to try and save memory usage. If a path is shortened in the callback, then some memory is lost now.
  • It would add additional complexity to the code (both generic looper, and callbacks):
    • Generic looper needs to find the currently allocated size.
    • Modifying callback needs to decide whether to re-allocate or modify in place (callbacks already have to deal with the allocated vs. fixed size buffer case, which is bad enough).
  • It avoids inconsistent behaviors that may lead to hard-to-trackdown bugs (if callback sometimes re-uses same allocated buffer with different content, and sometimes re-allocate a new buffer, then code using that path pointer may introduce bugs in cases it would store the value of this pointer locally while using modifying callbacks e.g.).

I fully disagree with this analysis. There are no "certain freedoms that the callback has in certain situations". There are two fully different ways of handling storage of those paths, and the callbacks have to deal with both in they want to modify those paths.

To me, communicating "this memory was dynamically allocated" is exposing what is basically an implementation detail of the caller. There are quite a few things implied here, like "you are allowed to reallocate and change the pointer", "you should not write to this memory", or "it is unknown how much memory is available". These are all different implications, and none of those strictly follow from "this is dynamically allocated". As a concrete example, just because malloc() was used does not always imply any function that gets that pointer is free to reallocate it.

No this is not. Current patch clearly convey which of the two cases we are dealing with. Passing 0to external_file_path_size in case of allocated buffer is simply a way to say 'ignore me', or 'unknown size' (we could add a #define UNKNOWN_BUFFER_SIZE 0 here if you want to be fully explicit about it).

I don't know how you can argue that this is clear in any way, when I get extra explanation and apparently still get it wrong. This is not clear.

The explanation you give here now is different from what is currently in the patch. The patch says "0 is passed if *external_file_path is dynamically allocated", you say "0 means unknown size". One does not automatically follow from the other, so to document one while actually meaning the other is confusing.

My entire point in this conversation is to cut down on implicit implications, to reduce strong coupling between different parts of the code, and to make things simpler. It is to avoid special values that are documented to mean one thing but actually mean another.
I don't understand why these points are dismissed so strongly. I'm not asking for big rewrites or redesigns. Just for some clarity in documentation & naming, and maybe the removal of one boolean parameter.

I do not see any implicit implication in current proposed code, and I do not understand how it is confusing to have two very clear cases, and the need to deal with both of them. To me it really feels like creating problems where there are none. Now if you think you can propose a better interface here, please feel free to do so, actual code might make things clearer than any endless confusing discussions and explanations.

But I really think we both have better ways to invest our time on real, actual issues, than passing by each other like that and being unable to understand each other. If this needs to be sorted out, this needs to be done in person at that point, since we can clearly not communicate our understanding of things through text on this topic.

Forgot to mention T92315: API Design for `IDType::foreach_external_file` in this task. It is a design task that would add a more descriptive API to the foreach_external_file that hides the complexity of how files are stored.