Sunday, October 18, 2009

The deceiving case of the 'Exceptional Scenario'

Two weeks back I had to drop my dad at the railway station for a night train. Very few trains stop at this particular station, so its location is not familiar to many people. The first time I had to go there, I looked it up on Google Maps, which presented me with three alternate routes. Two of them ran through small and confusing roads and were shorter than the third, where I had to take the national highway and then turn onto local streets that were much clearer than those of the first two options. I picked the third option, despite it being longer, because there was an element of 'risk' involved in trying out small and confusing roads, especially when you have a train to catch!
I went to the station thrice before this incident of dropping my dad, and every time I took the national highway route. The route was perfect and hassle free except for a small stretch of road, about 200 meters long, which connected the main road to the station. That final stretch ran through a slum-like area. The road was just wide enough for a car and a bike to pass, and I was not sure whether it was a public road at all, as the residents seemed to treat it as an extension of their houses. The place had a unique co-existence of Hindu, Christian and Islamic places of worship next to each other, and every time I drove through that lane the people were having some celebration or the other (I appreciated the element of religious harmony, but not the prospect of having to drive through that narrow piece of road with people dancing around!). One day, on my way back after dropping my parents, I was forced to take another route (one of the first two options given by Google Maps). Even though I had to go through dingy roads, it was indeed much shorter than the highway route, as Google Maps had indicated, but since it was night time I could not exactly register the roads and the turns I would have to take to reach the station.
Now let's pan back to me dropping my dad! The train was at 9:30 PM and the drive takes around 45 minutes. We started from my home at 8:15 PM and I was weighing the two routes I could take - the highway, or the shorter new route which I had found the last time I came back after dropping my folks. The highway route was longer, and I had to go through that last 200 meter stretch where I ran the 'risk' of the entire road being blocked because of some celebration or the other. But on none of my earlier trips had the road been blocked so completely that I could not reach the station. Yes, there was an outside possibility that it could happen, but that 'scenario' had never occurred and was very 'rare'. The alternate route, on the other hand, was indeed shorter, but much more 'uncertain' and 'complex' than the highway route - there was a pretty high probability that I could take a wrong turn and lose my way, and eventually my dad could end up missing the train. But that road did not present me with any possibility of being blocked - which meant that if I was able to find the correct way and take the correct turns, I had an almost 100% probability of reaching the station.
Which option should I have taken and why?
Well, this is not a suspense thriller (though it came close to it, my dad did not have to turn superman and chase the train to board it!!), so no spoilers in here - I took the 'tried and tested' highway route, had a smooth ride on the highway and reached the last 200 meter stretch in almost 30 minutes. I turned into the small road, drove for around 50 meters and tragedy struck! The entire road ahead was blocked by an idol of the Virgin Mary, and people were dancing and singing around it!! The station was just 150 meters ahead, but I could not let my dad walk through that area in the night, so there was only one option left - take a U turn and find a new way!! Finding a new way at the eleventh hour was a risk I did not want to take, so I had to do the most deplorable act of asking my dad to take a rickshaw and go to the station. The story had a not-so-tragic ending, with my dad reaching the station before the train arrived, and I kissed him goodbye (I followed the rick to reach the station! Shame on me!!).
That incident made me reflect on some of the design decisions and risk evaluations that we make in our projects. This is a very typical scenario - two alternate solutions are proposed for a problem. Solution 'A' is pretty straightforward, uses proven technology and is easy to implement. The developers vouch that it will work for 99 out of 100 cases, and that the one case where it will fail is a very rare one. Solution 'B', on the other hand, is more complex and difficult (but not impossible), using technology and methods which are not commonly used, but it will work for all scenarios. I have seen that the majority of the time Solution 'A' will be chosen and signed off, even by the business team, who vouch that they would handle that one exceptional scenario manually. The usual justifications for this decision are the time saved by going for 'A' and the opinion that the incremental advantage gained by handling the 1% of exceptional cases cannot justify the increased effort and cost involved in developing solution 'B'.
I am convinced that the risk of the 'rare case' or 'exceptional condition' can be under-evaluated during the design phase and could strangle the entire operation if it does happen sometime (statistically it will!!). Even if it does not disrupt operations, it could turn out to be an administrative menace for business users over the long run. This is something I learnt during my support experience in TESCO: seemingly trivial issues ignored during the development phase become a headache once the system goes live, the organisation grows and the number of transactions increases. At that stage, extra effort has to be spent on finding a solution to the problem, which in turn increases the cost of ownership of the product. (The IT Support/Maintenance team, on the other hand, would be happy, because it represents an opportunity to 'improve the system' and they can flaunt the 'savings' accrued by the fix in their quarterly metrics! Fixing a defect downstream is really nothing to boast about. Though plugging holes in the system is a great help for the business, it is highly undesirable at that stage, and the defect should ideally have been caught during the development phase.)
The argument that the cost involved in solving 1% of the issues is not justified holds true only when cost is calculated over the development phase alone. But if one considers the 'Total Cost of Ownership (TCO)', which is the cost of developing as well as supporting the system over its entire lifetime, it is easy to see that the extra cost and effort incurred during the development phase to develop a fool-proof solution will lead to fewer maintenance issues and thus a lower TCO (considering that typical development phases last somewhere between 8 months and 2 years, while support phases run into years and decades).
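The TCO argument can be illustrated with a small calculation. The sketch below is only an illustration under simple assumptions: every figure in it is hypothetical (arbitrary cost units, not data from any real project), support cost is assumed to be constant per year, and the function names are made up for this example.

```python
def total_cost_of_ownership(dev_cost, annual_support_cost, years):
    """Development cost plus support cost accumulated over the system's life."""
    return dev_cost + annual_support_cost * years


def break_even_years(dev_a, support_a, dev_b, support_b):
    """Years of support after which solution B becomes cheaper than solution A.

    Returns None if B never catches up (its yearly support cost is not lower).
    """
    if support_b >= support_a:
        return None
    return (dev_b - dev_a) / (support_a - support_b)


# Hypothetical figures in arbitrary cost units (assumptions, not real data):
# A is the quick solution that needs manual handling of exceptions,
# B is the fool-proof solution that costs more to build.
DEV_A, SUPPORT_A = 100, 30
DEV_B, SUPPORT_B = 160, 10

years_to_break_even = break_even_years(DEV_A, SUPPORT_A, DEV_B, SUPPORT_B)
tco_a_10_years = total_cost_of_ownership(DEV_A, SUPPORT_A, 10)
tco_b_10_years = total_cost_of_ownership(DEV_B, SUPPORT_B, 10)

print(f"B becomes cheaper after {years_to_break_even} years of support")
print(f"TCO over 10 years: A = {tco_a_10_years}, B = {tco_b_10_years}")
```

With these made-up numbers, 'A' is cheaper only if the system is retired within the first three years of support. The outcome obviously depends on the figures plugged in, which is exactly why the support phase has to be part of the estimate.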
So in conclusion, I would like to state that when presented with alternative solutions, the choice should always go to the fool-proof one, even if it takes more effort and cost - because the money saved in the development phase by going for a quicker fix that works 99% of the time will be lost during the support phase, handling the 1% of exceptions over the life of the system.
P.S.: This issue and many others bring to the fore the need for 'Systems Thinking'. I think that it is one of the most powerful management ideas ever, but unfortunately it is more a way of looking at a problem than a 'tool set' that can be easily implemented, which should explain why this great idea finds very little practical application. The fact is that IT projects are run in such a manner that development and support are done by completely different teams, and in many cases by different organisations. This takes away the accountability of the developers, and of the organisation that carries out the development, for the issues that crop up during the support phase. Moreover, most developers have only done development and have never been in support, which cripples their ability to foresee possible issues and their impact. The way I see it, it is the customer that loses out in the end. The IT teams, be it development or support, come out clean in the whole act - once the product has been implemented the development team moves on to other engagements, while the support team gets more work fixing the issues left unattended during development - and it is the customer that gets caught in between. The world is neither perfect nor ideal, and from an ethical point of view these are (in most cases) not deliberate decisions that people make ('they signed off the design, so don't blame me!!', 'we had brought up that risk during the design phase, but even the business agreed to it', 'it is important that we meet the deadline, so considering the fact that the solution meets 99% of the cases, we should go with it' etc) - but can't we take small steps and decisions to improve the probability of perfection, not of the world, but at least of the IT systems that we design?

Friday, October 2, 2009

Can you implement T&L without Absence Management?

I found this very interesting question in ITToolBox yesterday and thought I would 'type in' (I would have scribbled or jotted a few years back!!) my opinion on this. Let me restate the exact question as it appeared in the forum:

"We are trying to do an analysis of using Absence Manager without Time and Labor. We already have a customized module that feeds online timecards to payroll in peoplesoft. First, is this possible? If so, what functionality would we be missing out on? Everything I have read talks about them together. Is AM just the workflow piece?"

Though the user did not mention their actual requirement, the obvious answer is yes - traditionally, over the years, T&L has been implemented as a separate module from Absence Management. Maybe an understanding of what these two modules do will help the user make an informed choice.
Absence Management is a module that is specifically designed to manage leaves - that's all it does. It is a very elegant and powerful module that can handle all types of leave accruals, carry forwards, prorations, workflows for approvals etc. It is interesting to note that before AM gained the popularity that it enjoys today, organisations used Benefits, Monitor Absence in Workforce Admin and even T&L to manage leaves. Absence Management is an independent module and can be implemented standalone with core HRMS.
Time and Labor, on the other hand, is a much more diverse module that can handle employee time reporting, track compensatory time off, enable task based time reporting, calculate overtime and shift premiums etc. If there are no complex rules, T&L can also double up as the module handling your organisation's leaves.

The apparent interdependence of the two modules seems to be a feature of version 9.0. With version 9.0, T&L and AM have been coupled very tightly, with almost the entire absence self service integrated into the timesheet. This could be the reason why it looks as if they cannot be implemented separately.
But the fact remains that both of them are independent modules and the choice of which module to use should completely depend on the depth and nature of your organisation's requirement.