(1) [Bug] In plain per-block mode, when Plumb schedules a YARN container to process a block
and that container takes > 3 minutes to start running (because no container is available yet), there is a danger
that the Plumb queue scanner will pick the same block more than once and schedule it again via
YARN.  That would be wasteful but, more importantly, it can create correctness problems
if both YARN containers run concurrently.
This bug also hinders reducing our current ~3 minute polling interval for all the queues.

[Temp Fix]: To partially avoid this issue, Plumb needs to know precisely the maximum number of YARN containers
that can run in parallel.  Plumb then allows only a small multiple of that number to be scheduled. But
clearly, that is not fool-proof.
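
A minimal sketch of the cap described above, in Python.  The class name, the method names and the
default multiple of 2 are assumptions made for illustration; they are not taken from the Plumb code.

```python
class SchedulingCap:
    """Limits outstanding (pending + running) containers to a small multiple of cluster capacity."""

    def __init__(self, max_parallel_containers, multiple=2):
        # 'multiple' is an assumed tuning knob; the notes only say "a small multiple".
        self.limit = max_parallel_containers * multiple
        self.outstanding = 0

    def try_schedule(self):
        """Return True if one more container may be submitted to YARN right now."""
        if self.outstanding >= self.limit:
            return False
        self.outstanding += 1
        return True

    def container_finished(self):
        """Call when a submitted container completes (successfully or not)."""
        if self.outstanding > 0:
            self.outstanding -= 1
```

The cap only bounds how long a submitted block can wait for a container; it does not itself stop the
scanner from picking the same block twice, which is why it is only a partial fix.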


[Proposed Solution]: Currently the YARN container (dedicated to a block) reserves the block, processes
it and releases it.  Instead, Plumb can reserve the block itself and then hand it to a YARN container.
[Caution] One rationale for the old decision to have a YARN container do its own reservation was that
our conservative max time-to-run starts counting AFTER the container has started.

Another solution could be to let a container do its own reservation, while Plumb keeps track of
pending / running YARN containers (and their associated block numbers).

Or maybe Plumb reserves the block, the failure detector doesn't toggle blocks that are still waiting to get a container
or are alive and running, and when the YARN container actually runs, it refreshes its reservation time.
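
The last variant can be sketched roughly as follows.  This is an in-memory illustration only: the
ReservationTable class, its method names and the PENDING / RUNNING states are hypothetical, and
"alive" is approximated here by the refreshed reservation time not exceeding the max time-to-run.

```python
import time

PENDING, RUNNING = "pending", "running"


class ReservationTable:
    """Hypothetical in-memory stand-in for the block reservation state."""

    def __init__(self, max_run_seconds, clock=time.monotonic):
        self.max_run_seconds = max_run_seconds
        self.clock = clock
        self.entries = {}  # block number -> (state, reserved_at)

    def reserve_for_container(self, block):
        """Plumb reserves the block before submitting to YARN; False if already reserved."""
        if block in self.entries:
            return False  # the scanner saw this block again; do not schedule it twice
        self.entries[block] = (PENDING, self.clock())
        return True

    def container_started(self, block):
        """The container refreshes the reservation so time-to-run counts from its actual start."""
        self.entries[block] = (RUNNING, self.clock())

    def release(self, block):
        """The container releases the block when it is done."""
        self.entries.pop(block, None)

    def expired_blocks(self):
        """Failure detector view: PENDING blocks are skipped, RUNNING ones expire after the limit."""
        now = self.clock()
        return [b for b, (state, t) in self.entries.items()
                if state == RUNNING and now - t > self.max_run_seconds]
```

In this sketch a block whose container never starts stays PENDING forever, so a lost submission
would still need separate handling.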
........................................................................................................................
(2) [Problem] The failure detector acts block-by-block.  This behavior can interfere with the window-finding code, especially
if the two run concurrently.

[Solution 1] Ask the failure detector not to deal with windows, and let the window-handling code take care of window
failures as well.

(+) The window code already has to deal with other failures such as missing blocks, late blocks, and permanently missing vs
failed inside the system.  So it makes sense for it to deal with "reserved for too long" issues too.

(-) Error-handling code would then live in two places (in the Plumb thread, and in the Queue manager in hle)


[Solution 2]  Keep all fault-handling code in one place (even for windows) and use mutual exclusion between this
error-handling code and the ready-to-run windows code.

(+) Simple solution.
(-) Potentially slow due to mutual exclusion (but there are only two threads.  So shouldn't be a big deal).
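
A rough sketch of Solution 2 with a single lock shared by the two threads.  The state dictionary,
the state names ("ready", "reserved") and both function names are assumptions for illustration;
the real state lives in the queue tables.

```python
import threading

state_lock = threading.Lock()
window_state = {}  # hypothetical: block number -> "ready" | "reserved"


def fault_handling_pass(timed_out_blocks):
    """Failure-detector thread: toggle blocks that were reserved for too long back to ready."""
    with state_lock:
        for block in timed_out_blocks:
            if window_state.get(block) == "reserved":
                window_state[block] = "ready"


def find_ready_window(blocks):
    """Window thread: reserve all blocks of a window atomically, or none of them."""
    with state_lock:
        if all(window_state.get(b) == "ready" for b in blocks):
            for b in blocks:
                window_state[b] = "reserved"
            return list(blocks)
        return None
```

Because both passes hold the same lock for their whole scan, the failure detector can never toggle
a block halfway through a window being assembled.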
........................................................................................................................
(3) [Problem] For windows and continuous code that have been running longer than our max allowed time, what should we do
with them?  If we don't do anything, our fault-handling code will toggle them and they will be rescheduled?



........................................................................................................................
(4) [Problem] For the window boundary cut-out, where we allow a boundary block to be present in two windows, how do we
implement it in hle?  Currently there is no reference counting of reservations on the same block.
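
One possible shape for such reference counting, as a sketch only.  The class and method names are
hypothetical and nothing here reflects the existing hle reservation code.

```python
from collections import Counter


class RefCountedReservations:
    """Lets a boundary block be reserved by more than one window at the same time."""

    def __init__(self):
        self.counts = Counter()  # block number -> number of windows currently holding it

    def reserve(self, block):
        """Add one holder for the block and return the new holder count."""
        self.counts[block] += 1
        return self.counts[block]

    def release(self, block):
        """Drop one holder; return True only when the last holder has released the block."""
        if self.counts[block] == 0:
            raise ValueError(f"block {block} is not reserved")
        self.counts[block] -= 1
        if self.counts[block] == 0:
            del self.counts[block]
            return True
        return False

    def is_reserved(self, block):
        return self.counts[block] > 0
```

With this shape a boundary block shared by two windows becomes free again only after both windows
have released it.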


........................................................................................................................
(5) [Problem] On each new incarnation of Plumb, should we delete the old windowstate tables and create new ones, or let the
old data stay in there and reuse the old table (if it exists)?

........................................................................................................................