When BLI_task_parallel_mempool does not use threading, the
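userdata_chunk is allocated locally, simulating thread-local storage
(TLS). However, func_reduce is never called and the original chunk is
ignored, so whatever the callbacks accumulate in the local copy never
reaches the caller.
This patch resolves the issue by using the original userdata_chunk
directly.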
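A minimal sketch of the problem and the fix, assuming a simplified model of the task API. The names ItemFunc, SumChunk, sum_item, iterate_no_threads_buggy and iterate_no_threads_fixed are hypothetical and are not the real BLI_task.h declarations. The init/free callbacks are also left out. The buggy path iterates into a local copy and drops it. The fixed path passes the caller's chunk straight to the callback, since a single "thread" has nothing to merge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-item callback; NOT the real BLI_task.h signature. */
typedef void (*ItemFunc)(void *userdata, int item, void *chunk);

/* Hypothetical per-"thread" chunk: a running sum. */
typedef struct SumChunk {
  int sum;
} SumChunk;

static void sum_item(void *userdata, int item, void *chunk)
{
  (void)userdata;
  ((SumChunk *)chunk)->sum += item;
}

/* Buggy single-threaded path: works on a local copy of the chunk
 * (simulating TLS) but never reduces it back, so the caller's chunk
 * is left untouched. */
static void iterate_no_threads_buggy(const int *items, size_t count, void *userdata,
                                     void *userdata_chunk, size_t chunk_size,
                                     ItemFunc func)
{
  void *local = malloc(chunk_size);
  assert(local != NULL);
  memcpy(local, userdata_chunk, chunk_size);
  for (size_t i = 0; i < count; i++) {
    func(userdata, items[i], local);
  }
  /* No reduce step: everything accumulated in `local` is lost here. */
  free(local);
}

/* Fixed single-threaded path: with only one "thread" there is nothing
 * to merge, so the callback writes straight into the caller's chunk. */
static void iterate_no_threads_fixed(const int *items, size_t count, void *userdata,
                                     void *userdata_chunk, ItemFunc func)
{
  for (size_t i = 0; i < count; i++) {
    func(userdata, items[i], userdata_chunk);
  }
}
```

With items {1, 2, 3, 4} and a zero-initialized SumChunk, the buggy path leaves sum at 0 while the fixed path leaves it at 10.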
task_parallel_iterator_no_threads is another function that does not
call func_reduce, and it apparently also ignores userdata_chunk_local
in the main iterator.
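For contrast, here is a hedged sketch of the other way a single-threaded path can avoid losing results. It keeps a local copy, as each worker would in the threaded case, and then calls the reduce callback once at the end. All names here (ItemFunc, ReduceFunc, SumChunk, sum_item, sum_reduce, iterate_no_threads_reduce) are hypothetical and not Blender's API. The sketch also assumes the caller's chunk starts at the identity value (sum == 0), because the local copy is seeded from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback types; NOT the real BLI_task.h declarations. */
typedef void (*ItemFunc)(void *userdata, int item, void *chunk);
typedef void (*ReduceFunc)(void *userdata, void *chunk_join, void *chunk);

/* Hypothetical per-"thread" chunk: a running sum. */
typedef struct SumChunk {
  int sum;
} SumChunk;

static void sum_item(void *userdata, int item, void *chunk)
{
  (void)userdata;
  ((SumChunk *)chunk)->sum += item;
}

/* Merge one per-"thread" chunk into the joined (caller's) chunk. */
static void sum_reduce(void *userdata, void *chunk_join, void *chunk)
{
  (void)userdata;
  ((SumChunk *)chunk_join)->sum += ((SumChunk *)chunk)->sum;
}

/* Single-threaded path that keeps a local copy and then reduces it into
 * the caller's chunk. Assumes the caller's chunk holds the identity
 * value, since the local copy starts from it. */
static void iterate_no_threads_reduce(const int *items, size_t count, void *userdata,
                                      void *userdata_chunk, size_t chunk_size,
                                      ItemFunc func, ReduceFunc func_reduce)
{
  void *local = malloc(chunk_size);
  assert(local != NULL);
  memcpy(local, userdata_chunk, chunk_size);
  for (size_t i = 0; i < count; i++) {
    func(userdata, items[i], local);
  }
  /* Without this call the accumulated result would be discarded. */
  if (func_reduce != NULL) {
    func_reduce(userdata, userdata_chunk, local);
  }
  free(local);
}
```

Passing the original chunk directly avoids the copy and the extra callback. The copy-and-reduce variant keeps the single-threaded path behaving like one worker of the threaded path.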
This fixes T90131.