It might be true that tee-ing is bad for very large sites, as it
increases load a bit in those (I think) extremely rare cases where
clients concurrently access the very same error page. But it might
be a solution for those in-between cases. I think that an incrementally
generated page is a better progress indicator than just a "Generating..."
page.
Anyway, this proof of concept patch is to show how such a thing should
be implemented. I don't think that it makes things a lot more complex;
in this rewrite everything is quite well modularized, encapsulated, and
isolated.
But the main intent behind this patch was to avoid a bad interaction between
the 'progress info' indicator (in the process that is generating the page, see
below) and non-cached error pages.
On the contrary, with tee-ing (and a zero size sanity check) you would be
able to see pages even if there are errors saving the cache entry. Though
this wouldn't help very large sites which cannot function without caching,
it could be useful for smaller sites.
But see below.
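To make the tee-ing idea concrete, here is a minimal sketch in Python
(not gitweb's Perl, and not the code from the patch). TeeWriter and
serve_with_tee are hypothetical names used only for illustration. It
shows the client receiving the page even when saving the cache entry
fails, plus the zero size sanity check.

```python
import io
import os
import tempfile


class TeeWriter:
    """Write each chunk to the client at once and keep a copy for the cache."""

    def __init__(self, client):
        self.client = client
        self.captured = io.StringIO()

    def write(self, chunk):
        self.client.write(chunk)    # the client sees the page grow
        self.captured.write(chunk)  # the same data is kept for the cache
        return len(chunk)


def serve_with_tee(generate, client, cache_path):
    """Hypothetical helper: returns True if the cache entry was saved."""
    tee = TeeWriter(client)
    generate(tee)
    data = tee.captured.getvalue()
    if not data:
        return False  # zero size sanity check: never cache an empty page
    try:
        with open(cache_path, "w") as fh:
            fh.write(data)
    except OSError:
        return False  # saving failed, but the client already has the page
    return True
```

With pure capturing the same OSError would leave the client with
nothing to show; with tee-ing the page has already been printed.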
Err... could you explain what you mean by "client is switching from
reading the file to reading the tee"?
Hmmm... I thought that the code was clear. Generating data, whether it
is captured to be displayed later, or tee-ed (i.e. printed and captured
to cache), happens inside a critical section protected by an exclusive
lock. Only after the cache entry is written (in full) is the lock
released, and clients waiting for data can access it; they use a shared
(readers) lock for sync.
Note that in my rewrite (and I think also in _some_ cases in your version)
files are written atomically, by writing to a temporary file and then
renaming it to the final destination.
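The locking scheme described above can be sketched roughly as follows.
This is Python rather than Perl, it assumes a POSIX system with flock(),
and write_entry, read_entry and the ".lock" file naming are hypothetical
choices for the example, not what the patch actually does.

```python
import fcntl
import os
import tempfile


def write_entry(path, generate):
    """Generate data under an exclusive lock, then publish it atomically."""
    with open(path + ".lock", "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # critical section starts here
        data = generate()
        # Write to a temporary file in the same directory, then rename it,
        # so that readers never observe a partially written entry.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as fh:
            fh.write(data)
        os.rename(tmp, path)
    # Closing the lock file releases the exclusive lock.
    return data


def read_entry(path):
    """Wait for any writer via a shared lock, then read the entry."""
    with open(path + ".lock", "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_SH)  # blocks while a writer holds LOCK_EX
        try:
            with open(path) as fh:
                return fh.read()
        except FileNotFoundError:
            return None  # the writer never finished (e.g. it was killed)
```

Many readers can hold the shared lock at once, but none of them gets it
until the writer has released the exclusive one, so there is no point at
which a client "switches" from one source of data to another.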
It does make allowance. cache_output from GitwebCache::CacheOutput uses
capturing and not tee-ing if we are in a background process. When there
is stale data to serve, the cache entry is (re)generated in the background
in a detached process.
Moreover, by default cache_output has a safety in that error pages
generated by such a detached process are cached.
Note also that in my rewrite you can simply (by changing a single
configuration knob) configure gitweb to also cache error pages. This
might be the best and safest solution for very large sites with very large
disk space, but not so good for smaller sites.
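As a rough illustration of the decisions described above (a Python
sketch, not the actual cache_output code; produce, should_cache and the
cache_errors parameter are hypothetical names standing in for the real
function and configuration knob):

```python
import io


def produce(chunks, out, background):
    """Capture the page; additionally print it (tee) unless in background."""
    captured = []
    for chunk in chunks:
        captured.append(chunk)
        if not background:
            out.write(chunk)  # tee: a foreground client sees the output
    return "".join(captured)


def should_cache(is_error, background, cache_errors=False):
    """Decide whether the generated page is saved to the cache."""
    # Normal pages are always cached. Error pages are cached when they
    # come from a detached background process (nobody would see them
    # otherwise), or when the site has asked for errors to be cached.
    return (not is_error) or background or cache_errors
```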
The problem with the 'lightweight waiting message', as it is implemented in
your code, and as I stole it ;-), is that it doesn't provide any indication
of how much work is already done, and how much work might be left.
Well, at least for now.
With tee-ing, the client (well, at least the one that is generating data;
the others would get a "Generating...", or rather "Waiting..." page) can
estimate how long he/she will have to wait, and literally see progress,
not just some progress indicator.
P.S. In my rewrite clients would retry generating the page if it was not
generated while they were waiting for it, until they try their own hand
at generating it. This protects against the process generating data being
killed; see also the test suite for the caching interface.
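The retry behaviour could look roughly like this. It is a Python sketch
under my own assumptions: fetch, its callbacks and max_waits are all
hypothetical, and the real code decides when to stop waiting in its own
way.

```python
def fetch(read_entry, wait_for_writer, generate, max_waits=3):
    """Return the cached page, generating it ourselves as a last resort."""
    for _ in range(max_waits):
        data = read_entry()
        if data is not None:
            return data    # somebody else finished generating the page
        wait_for_writer()  # e.g. block on the shared lock, then re-check
    # The entry never appeared (the generating process may have been
    # killed), so this client tries its own hand at generating it.
    return generate()
```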
I'll try to send a much simplified (and easier to use in caching) error
handling using exceptions (die / eval used as throw / catch) today.
--
Jakub Narebski
Poland