Sunday, September 15, 2013

Cloud prototyping and Kolb's Learning Cycle


There is a lot of attention on Cloud platforms (IaaS, PaaS and SaaS), but few show how they will actually move to one. This blog is about prototyping core business on such a platform (e.g. Amazon Web Services' Elastic Beanstalk). It is also about utilizing Kolb's Learning Cycle in organizational development. That is an even harder problem, until you realize that your organization has to experience a common vision together.

I have earlier presented results from our Proof of Concept (Tax Norway's PoC Results). That prototype was about designing software for the "cloud" (in-memory, Big Data) to make it easier to maintain, cost less and scale linearly. (I gave a talk about this at QCon London 2013.)
This time it's about how we can simplify our business and still make sure we comply with legislation. It's about using Domain Driven Design to establish the ubiquitous language and aggregates. It's about engaging business and stakeholders.
Innovation emerges from the collaboration between disciplines.
This is what Enterprise Architecture is all about: improving business through IT.
What business?
The need to clean up a very complex set of forms representing mandatory reporting from the population to the Tax Administration. Our focus is simplifying for small businesses, by addressing 13 forms (the total is 50 forms, but larger businesses and other tax domains are not addressed now). The main strategy for this simplification is to collect timely facts through a responsive and “smart” wizard made available to the public. There was already a business project working on this, involving 20-30 people. They had a concept ready, but were unsure how to proceed. And I had blueprints of our future systems.
Is the Public Sector able to innovate?
Yes We Can!
Why Prototype?
Kolb's learning cycle
Let's experience something together! Ever noticed how people understand the same text or the same specification in different ways? That is one of the major challenges in Computer Science: expectations and requirements are most often not aligned. We used Kolb's cycle to achieve a collective understanding of our vision in this area, combining the individual skills of our organization. With this support we managed to combine deep knowledge to create something new and highly useful.

We prototyped to make this cycle come alive, bringing very different disciplines together over a running application: people from business (lawyers, economists and accountants), user interaction (UI designers), IT, planning and sponsors. We have run demos every 14 days (Active Experimentation and Concrete Experience) and discussed calculations, concepts and terms (which will be part of the new and improved ubiquitous business language). After each demo the participants used their Reflective Observation and Abstract Conceptualisation to write new requirements into the backlog. This is a “silver bullet” for a team fighting a complex domain. No Excel sheet, presentation or report would have made this possible.

In technical terms it is more than just a prototype. It is a full-blown information model, an implementation, and a full-fledged stack (although without a database). And we know it will perform at massive scale. We have now defined the Java classes and the XML for the document store (Continual Aggregate Store).
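As a minimal sketch of what one of these Java aggregate classes might look like (class and field names are hypothetical, and JAXB is assumed here purely to illustrate the mapping to XML for the document store):

```java
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.StringWriter;

// Hypothetical aggregate from the simplified domain: facts the user reports
// about one income source, persisted as an XML document in the aggregate store.
@XmlRootElement(name = "incomeFacts")
public class IncomeFacts {
    public String partyId;        // who the facts concern
    public int incomeYear;        // legitimate period, e.g. 2012
    public long salary;           // reported real-world facts
    public long deductibleCosts;

    // Derived value; the heavy business logic lives in the service layer,
    // a simple calculation is shown here only for illustration.
    public long taxableIncome() {
        return Math.max(0, salary - deductibleCosts);
    }

    public String toXml() throws Exception {
        StringWriter out = new StringWriter();
        Marshaller m = JAXBContext.newInstance(IncomeFacts.class).createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        m.marshal(this, out);
        return out.toString();
    }
}
```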
But for business it is a prototype, as we have not covered all areas or all details, but have shown how the majority of the complex issues must be tackled. The prototype will be used as a study within new areas. Also remember that for business, IT is just one piece. There are also timing, financing, customer scoping, migration, integration, processes and organisational issues.

The beautiful prototype
The domain has 13 forms with some 1500+ fields. The solution now consists of 6-7 aggregates and some 600+ fields altogether, but any user will only fill in a subset of these. (Screenshot: navigation at left, income facts at right.) We have utilised the latest in responsive design, to give as much as possible on a web platform. Everything is saved as you work, and "backend" business logic is run for calculations and validations. Logic is not duplicated; the user is working against the same backend as our own case handlers. (There are no forms anymore, as the user is working synchronously with the backend. Old legacy systems need an asynchronous front end, and get a costly duplication of the business logic.)
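A minimal sketch of that idea, reusing the IncomeFacts sketch above (JAX-RS and all names here are my assumptions for illustration; the post only says Java and REST were used):

```java
import java.util.ArrayList;
import java.util.List;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// One service holds the validation and calculation rules; nothing is copied to the front end.
class AssessmentService {
    ValidationResult validateAndCalculate(IncomeFacts facts) {
        ValidationResult result = new ValidationResult();
        if (facts.salary < 0) {
            result.errors.add("Salary cannot be negative");
        }
        result.taxableIncome = facts.taxableIncome();
        return result;
    }
}

class ValidationResult {
    public List<String> errors = new ArrayList<String>();
    public long taxableIncome;
}

// The public wizard calls the very same service over REST for every step,
// exactly as an internal case-handling screen would.
@Path("/wizard/income")
public class IncomeWizardResource {
    private final AssessmentService service = new AssessmentService();

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    public ValidationResult submit(IncomeFacts facts) {
        return service.validateAndCalculate(facts);
    }
}
```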

Previously there was a “back-and-forth” work process through 11 hard-to-comprehend steps; now the structure is sequential through 5 simple steps.
Previously the user had to do a lot of manual calculations and book-keeping; now the prototype collects facts and calculates for them.
Previously the user had to know which forms to use; now the prototype guides the user through it.
(Screenshot: the expert in your hands.)

Making the impossible possible
At the end there is an adjustment GUI with sliders that solves what was previously hard to comprehend; only experts with many years of experience could do this before. A goal for us is that ordinary people can report their own tax form in an optimal way. We have done user testing and got very positive feedback. It is much easier for a user to report on real-world things (assets, income, costs, debt) and let the application do the calculations.
Now that we see the result, it looks so simple, sublime and beautiful.

The key artefact is the new set of fully unit-testable Java classes (information, logic and aggregates) of the core domain, ready to be deployed on some PaaS. This core is now much simpler, which simplifies all aspects of systems development and integration. This is how we increase the ability to change and decrease maintenance cost.

The platform
We are using Amazon Web Services (AWS) and have deployed the prototype in Apache and PHP Elastic Beanstalk containers. This prototype continues where the last prototype stopped, and porting that code has proven to be simple. We are using plain Java and Hazelcast on the backend. The backend contains all business logic and information. There is very little duplication of code, and the backend is used all the time as the user works through the wizard.
The front end uses HTML5/CSS3, JavaScript, JSON and REST.
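A minimal sketch of how an in-memory backend of this kind could hold the working aggregates in Hazelcast while the user moves through the wizard (the map name and class are hypothetical; only the use of Hazelcast itself comes from the post):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

// Work-in-progress wizard state held in the cluster's memory; nothing is written
// to a database while the user types, only at defined business states.
public class WizardSessionStore {
    private final IMap<String, String> workInProgress;

    public WizardSessionStore() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        this.workInProgress = hz.getMap("wizard-work-in-progress"); // hypothetical map name
    }

    public void save(String partyId, String aggregateXml) {
        workInProgress.put(partyId, aggregateXml);   // "saved as you work"
    }

    public String load(String partyId) {
        return workInProgress.get(partyId);
    }
}
```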
AWS has been really simple (time-saving) and cheap. The bill is about $100 for the test environments after 5 months! :) We have also proven (again) that if you do your design right, you can quite easily move to another "cloud" platform (see "Target architecture looking good").
Testing is now a dream compared to before, mainly because the aggregates are great units to test, and because the business provides calculations as spreadsheets, which are subsequently used as tests.
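A minimal sketch of how spreadsheet rows from the business can drive tests (JUnit 4 parameterized tests are assumed; the rows are embedded as literals here, but in practice they would be exported from the business spreadsheet):

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;
import java.util.Arrays;
import java.util.Collection;

@RunWith(Parameterized.class)
public class TaxableIncomeTest {
    // Each row: salary, deductible costs, expected taxable income.
    // These numbers are placeholders; the real rows come from business spreadsheets.
    @Parameters
    public static Collection<Object[]> rows() {
        return Arrays.asList(new Object[][] {
            { 500000L,  80000L, 420000L },
            { 100000L, 120000L,      0L },   // never negative
        });
    }

    private final long salary, costs, expected;

    public TaxableIncomeTest(long salary, long costs, long expected) {
        this.salary = salary;
        this.costs = costs;
        this.expected = expected;
    }

    @Test
    public void calculatesTaxableIncomeAsInSpreadsheet() {
        assertEquals(expected, Math.max(0, salary - costs));
    }
}
```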
(We are deploying our private cloud, and the production stack will be different.)

The experience
We teamed up in late March 2013 with a team of 5 from Inmeta.no. They were experienced in Java, Infinispan, GUI design and web development. They had no experience with the Tax Administration or accounting. The business had a concept and a plan: start with simple cases and add complexity demo by demo. And I had a rough design for the aggregates replacing the forms. In late August we finished. By then we had covered a lot more than we anticipated and had also worked out new concepts (which would have been impossible to foresee without the prototype).
  • The theory and practices of Kolb's Learning Cycle are helpful in Computer Science.
  • Prototyping is a silver bullet in many aspects.
  • Use the prototype on all other parts of your organization too.
  • Our modernised business can run with many tens of thousands of concurrent users.
  • EA perspective and organizational development: business is engaged, drives change, and stakeholders are behind the modernisation.
  • Our business processes can be run in new ways, e.g. being much more efficient or providing transparency for the public.
  • A prototype will result in a mutual understanding of the information model and business logic.
  • Do not implement paper forms as XML schemas, but re-structure into aggregates.
  • Your legacy systems should be moved to a “cloud” platform by using Domain Driven Design, Java, and the systems design approach that I talked about at QCon London 2013.
  • If you understand HTML5/CSS3, JavaScript, JSON and REST, it is not that important which framework you use on the client side.
  • Java can be really verbose; you don't need a rule engine.
  • Aggregate design (from Domain Driven Design) really rocks.
  • PaaS really saves time and cost.
This shows the innovation power of small multi-discipline teams with the right competence and ambition.

Creative Commons License
Cloud prototyping by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Tuesday, June 18, 2013

BIG DATA advisory - Definite Content

I urge you to start taking data quality seriously. Aggregate design (as defined in Domain Driven Design) and the technology supporting BIG DATA and NoSQL give new possibilities, also for your core business. So be warned: pure, definite business data at your fingertips.

Central to our new architecture here at Tax Norway is the Continual Aggregate Hub. One important feature of the CAH is keeping legislated versions of business data unchanged and available for “eternity”. Business data is our most important asset, and its content must be definite. We must keep it as proof of procedure for more than 10 years, and its integrity must be protected from the wear and tear of both functional and technical upgrades in the software handling it.


Your current situation
I claim that the relational schema is too volatile to keep the integrity of the business data stored in it over time. The data is too fragmented, and functional enhancements to the schema will make the data deteriorate over time. “Will a join today give the same result as it did 5 years ago?” Another major threat against the integrity of the data is the relations and other constructs that are added to the schema (DDL) to support reporting or analytical concerns. This makes the definite (explicit) content hard to get to, because the real business data becomes vague in the relational schema.

The Continual Aggregate Hub
Here, business data is stored as XML documents, versioned for every legislated change, and categorized by metadata. See my talk at QCon 2013 in London, where I explain that we organize business data by a classification system not unlike what libraries use for books. Basically we use the header to describe the content, and the document itself contains an Aggregate. I also show that we can compose complex domains from these Aggregates, and that applications running these domains fit nicely in the deployment model of "the cloud” and in-memory architectures. (See the discussion on software design in the CAH.)

The Implementation
The excellent team that implemented the datastore of the CAH constructed it as two parts: one called BOX and the other IRIS. BOX's sole purpose is to store aggregates (as versioned documents), enforce a common header (metadata for information classification), provide information retrieval (lookup based on metadata), and provide feeds (ATOM) of these documents to consumers. BOX does not care what is in the document. IRIS' sole purpose is to provide search, reporting, insight and (basic) analytics based on all document content. IRIS utilizes a search engine for this. We use Java, REST, ATOM feeds, XML, and Elasticsearch. We still use the Oracle database, but will migrate to a document database in a couple of years. (See the blog posts on deployment models.)
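A minimal sketch of the BOX responsibilities as a Java interface (the method and type names are mine, not the actual implementation; the responsibilities are taken from the description above):

```java
import java.util.List;

// Common header every document must carry; BOX never looks inside the payload.
class DocumentHeader {
    String concerns;          // which party the content concerns
    String reportedBy;        // who reported the data
    String schemaType;        // classification, like a library catalogue entry
    String legitimatePeriod;  // e.g. "2012"
    int version;              // a new version for every legislated change
}

// BOX: storage, enforcement of the header, retrieval by metadata, and feeds.
interface Box {
    void store(DocumentHeader header, String aggregateXml);
    List<String> findByMetadata(String schemaType, String concerns, String legitimatePeriod);
    // Feed of stored documents (ATOM in the real system), so consumers such as IRIS can index the content.
    Iterable<String> feedSince(long sequenceNumber);
}
```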

Separation of Concern
This is now in production and we see the great effect of having separated the concerns of information storage and information usage. We can at any time exchange the search engine and use sexy new tools (for example Neo4j, SkyTree, and many others) without touching the schema of the business data or the technology supporting BOX. A true component-based approach to functionality. We can also change the schema of the business data over time without altering the old data, or altering the analytics and search capabilities. The original and definite business content is untouched. The lifetime requirements of our data have never been better served. Also, the performance of these searches is awesome. Expect IRIS to use about the same amount of space as is used in BOX.

Insight into our business data has never been better. BIG DATA and NoSQL tools are giving us a fantastic opportunity. You should consider it.

Creative Commons License
BIG DATA advisory - Definite Content by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Thursday, January 10, 2013

Target architecture looking good

It has now been 3 years since the target architecture, and more specifically the CAH (Continual Aggregate Hub), was established. The main goal is to have a processing architecture where ease of maintenance, robustness and scale are combined. We want it highly verbose, but at the same time flexible. In the meantime we have started our modernization program, and at present we have 50-70 people on projects, with many more supporting on different levels. We have gathered more detailed insight (we ran a successful PoC) and the IT landscape has matured. Right now we are live with our first in-memory application; it collects bank account information from all persons and businesses in Norway. I am quite content that our presumptions and design are also seeing real traction in the IT landscape. Domain Driven Design, event-driven federated systems, eventual consistency, CQRS, ODS, HTTP, REST, HTML/CSS/JS, Java (container standardization), and XML for long-term data are still good choices.
My talk at QCon London 2013 is about how this highly complex domain is designed.

It is time for a little retrospective.

The Continual Aggregate Hub contains a repository, or data storage, consisting of immutable documents containing aggregates. The repository is aggregate-agnostic. It does not impose any schema on these; it is up to the producer and consumers to understand the content (and for them it must be verbose). The only thing the repository mandates is a common header for all documents. The header contains the key(s), type, legitimate period and a protocol. Also part of the CAH is a processing layer. This is where the business logic is, and all components here reside in-memory and are transient. Valid state only exists in the repository, all processing is eventually consistent, and everything must be handled idempotently. Components are fed by queues of documents, the aggregates in the documents are composed into a business model (things are very verbose here), and new documents are produced (and put into the repository). Furthermore, all usage of information is retrieved from the repository.
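A minimal sketch of that contract, with names of my own choosing (only the header fields and the idempotent, queue-fed processing come from the text above):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The common header is the only thing the repository mandates.
final class Header {
    final String key;               // main key(s), e.g. party id
    final String type;              // document/schema type
    final String legitimatePeriod;  // e.g. "2012"
    final String protocol;          // e.g. "private" or "public"

    Header(String key, String type, String legitimatePeriod, String protocol) {
        this.key = key; this.type = type;
        this.legitimatePeriod = legitimatePeriod; this.protocol = protocol;
    }
}

final class Document {
    final Header header;
    final String aggregateXml;      // opaque to the repository
    Document(Header header, String aggregateXml) {
        this.header = Objects.requireNonNull(header);
        this.aggregateXml = aggregateXml;
    }
}

// A transient processing component: fed by a queue, idempotent, producing new documents.
class ProcessingComponent {
    private final BlockingQueue<Document> inbox = new LinkedBlockingQueue<Document>();
    private final Set<String> alreadyProcessed = new HashSet<String>();

    void offer(Document doc) { inbox.offer(doc); }

    Document processNext() throws InterruptedException {
        Document in = inbox.take();
        String idempotencyKey = in.header.key + "/" + in.header.type + "/" + in.header.legitimatePeriod;
        if (!alreadyProcessed.add(idempotencyKey)) {
            return null; // seen before: processing the same document twice changes nothing
        }
        // Compose aggregates into the business model and derive a new document here.
        Header out = new Header(in.header.key, "derived-type", in.header.legitimatePeriod, "public");
        return new Document(out, "<derived/>");
    }
}
```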

Realization
With this way of structuring our systems, we can utilize in-memory or BIG-data architectures. The key to utilizing these lies in understanding how your business domain may be implemented (Module and Aggregate design). The IT landscape within NoSQL is quickly expanding, in-memory products are pretty mature, PaaS looks more promising than ever, and BIG-data certainly has a few well-proven candidates. I will not go into detail on these, but will use some as examples of how we can utilize them.
This is in no way an exhaustive list. Products in this blog are used as examples of different implementations or deployments of the CAH. It is not a product recommendation, nor does it represent what Skatteetaten might acquire.
NoSQL: It’s all about storing data as they are best structured. Data is our most valuable asset. It brings out the best of algorithms and data structures (as you were taught in school). For us a document store is feasible, also because legislation defines formal information sets that should last for a long time. Example candidates in this domain are: CouchDB because of its document handling, Datomic because of immutability and its timeline, or maybe MarkLogic because of its XML support.

Scalable processing, where many candidates are possible. It depends on what is important.
In-memory: I would like to divide them into “Processing Grid” and “Data Grid”. Either you have data in the processing Java VM, or you have data outside the Java VM (but on the same machine).
PaaS: An example is Heroku, because I think the maturity of the container is important. The container is where you put your business logic (our second most valuable asset), and we want it to run for a long time (10+ years). Maybe development and test should run at Heroku, while we run the production environment at our own site. Developers could BYOD. Anyway, Heroku is important because it says a lot about how we should build systems that have the properties we are discussing here. And the CAH could be implemented there (I will talk about that at SW2013).
BIG-data: We will handle large amounts of data live from “society”. Our current data storage can’t cope with the amounts of data that will be pouring in. This may be solved with Hadoop and its “flock” of supporting systems.

Deployment models
The deployment models reflect our dilemma of balancing “Total Cost of Ownership”: ease of maintenance, robustness, ability to scale, and cost.
In-memory – Processing Grid (~GemFire)
  • Pro. Very low latency. Elastic (scale and re-balance)
  • Con. Cost (Open Source not stable enough). Heap limitation leads to many VMs. Business code and data are close, which leads to deployment issues.
In-memory – Data Grid (~Terracotta)
  • Pro. Elastic (scale and re-balance). Number of VMs determined solely by processing modules. Business code and data are separate, giving a better deployment situation. Low latency (serialisation, but on the same machine).
  • Con. Cost.
Distributed database – Big Data (~Hadoop)
  • Pro. Super simple VM (Jetty) that only handles local data. Cost (Open Source is stable). Business code and data are separate, giving a better deployment situation. Number of VMs determined solely by processing modules.
  • Con. Slow elasticity (scale and re-balance). Disk-to-disk. Latency (map-reduce).

Conclusion
Our application and systems overhaul seems to fit many scalable deployment models, and that is good. Lifetime requirements are strict, and we need flexible sourcing.
We are doing Processing Grid now (we are using Hazelcast), but will acquire some "in-memory" product during 2013 (either Processing or Data Grid). Oracle is the document database; it is extremely simple, just a table with the header as relational columns and the aggregate as a CLOB. The database is “aggregate agnostic”.
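A minimal sketch of what such an aggregate-agnostic table and insert could look like over JDBC (the table and column names are my guesses for illustration, not the actual schema):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class DocumentStore {
    // Assumed DDL, mirroring the description: header fields as columns, content as CLOB.
    // CREATE TABLE taxinfo_document (
    //   concerns VARCHAR2(32), schema_type VARCHAR2(64),
    //   legitimate_period VARCHAR2(16), version NUMBER, aggregate_xml CLOB)

    public void store(Connection con, String concerns, String schemaType,
                      String legitimatePeriod, int version, String aggregateXml) throws Exception {
        String sql = "INSERT INTO taxinfo_document "
                   + "(concerns, schema_type, legitimate_period, version, aggregate_xml) "
                   + "VALUES (?, ?, ?, ?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, concerns);
            ps.setString(2, schemaType);
            ps.setString(3, legitimatePeriod);
            ps.setInt(4, version);
            ps.setString(5, aggregateXml);   // the driver streams the string into the CLOB
            ps.executeUpdate();
        }
    }
}
```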
Somewhere around 2015, when the large volumes really show up, you will probably see us with something like Hadoop, maybe in addition to the ones mentioned above. Since sub-second latency is OK, and we will have a long tail of historic data, maybe just Hadoop? Who knows?

We are navigating into a system landscape where we may choose deployment models in a more tactical cost/benefit evaluation. We are not stuck with a database or a single machine anymore.
Creative Commons License
Target architecture looking good by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Friday, September 21, 2012

Business Intelligence and Predictive Analytics in the CAH

Use of Complex Event Processing (CEP) in the Continual Aggregate Hub (CAH)
With all these Aggregates being stored, it is necessary to understand which events (or combinations of them) should trigger some business activity. Some of these are already discussed in CAH, "The Restaurant" and in "Module and Aggregate design". It is the hard job of the waiter in "The Restaurant" to execute CEP. CEP is already in use for the primary processing job of the CAH: to process the correct tax. This blog is about processing secondary concerns, correlating events that should trigger actions to follow up on the primary task: Business Intelligence and Predictive Analysis. (This blog post is not about the analysis algorithms, but about the architecture and tool-box.)

Just to summarize some constraints of the CAH (definitions from Domain Driven Design):
  • All Aggregates have a commonly defined Root-object.
  • Aggregates that are "Private" may change at the will of the Module producing it.
  • Aggregates that are "Private" are not known by other Modules.
  • When an Aggregate goes "Public" that is a business event and the Aggregate contains the data. 
  • Aggregates that are "Public" will never change.
  • If the Module needs to change a "Public" Aggregate, a new version is published.
  • All Aggregates are two-sided; the creditor and the debtor (e.g. bank and account owner). (A minimal sketch of these constraints follows the list.)
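In Java, the Private/Public protocol and versioning rules could be expressed roughly like this (the types and method names are hypothetical; only the rules in the list above are taken from the text):

```java
// Protocol state from the common header: Private aggregates may change freely,
// Public aggregates never change again.
enum Protocol { PRIVATE, PUBLIC }

final class AggregateDocument {
    final String rootId;        // commonly defined Root-object identity
    final int version;
    final Protocol protocol;
    final String contentXml;

    AggregateDocument(String rootId, int version, Protocol protocol, String contentXml) {
        this.rootId = rootId; this.version = version;
        this.protocol = protocol; this.contentXml = contentXml;
    }

    // Publishing is the business event: from here on the document is immutable.
    AggregateDocument publish() {
        return new AggregateDocument(rootId, version, Protocol.PUBLIC, contentXml);
    }

    // "Changing" a Public aggregate really means publishing a new version.
    AggregateDocument newVersion(String newContentXml) {
        if (protocol != Protocol.PUBLIC) {
            throw new IllegalStateException("Private aggregates are simply replaced by their Module");
        }
        return new AggregateDocument(rootId, version + 1, Protocol.PUBLIC, newContentXml);
    }
}
```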
Maintenance and robustness
The design presented below provides excellent isolation from the business logic, linear scaling, and the other features of the CAH. The design gives these "Business Intelligence" concerns a dedicated place in the architecture. The idempotent feature of the CAH is important because we can calculate on historic events when we understand new patterns for fraud detection. (Any tax case up to 10 years old can be re-instantiated).

Chain of events
Catching the business event is a valuable thing. In database systems many events occur, but since the database has a very fragmented view of the data, the events that actually matter drown in the flood. It is in the business logic (mainly the Application Layer in DDD) that business events are known, and it is here that Aggregates are made Public.

The aggregates stored in the CAH represent a chain of events that can be reasoned upon. These events may be divided into the following groups, or lanes of events.
  • Everything. Typically used for a stream to the data warehouse or others that have a holistic approach
  • By document type (or schema). This is a lane for subscribers that are interested in a particular set of data.
  • By Party. This is a lane for subscribers that reason about data concerning a Party.
  • A combination of document type and Party

Identifying a complex event
This part of the processing is about having an inferred state of the events over a certain period of time. In the CAH this would be the "waiter's" job, handled by a specialized set of such waiters: Event Monitors. The unique keys for event-monitoring a Party could be: Event Monitor Type (e.g. Payroll vs VAT balance), Concerns (the party id) and Legitimate period (e.g. 2012) (see the definition of the super-document). These Event Monitors only access the Header of the document, for performance and a clear separation of concern.
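A minimal sketch of an Event Monitor that looks only at document headers (the names and the trigger condition are hypothetical; only the header-only access and the key fields come from the text):

```java
import java.util.HashMap;
import java.util.Map;

// Monitors a lane of header events for one monitor type,
// without ever opening the document content.
class PayrollVsVatMonitor {
    // key: concerns + legitimatePeriod -> how many relevant headers seen
    private final Map<String, Integer> observed = new HashMap<String, Integer>();

    /** Returns true when this monitor thinks an Event Reasoning Module should investigate. */
    boolean onHeader(String concerns, String schemaType, String legitimatePeriod) {
        if (!schemaType.equals("payroll") && !schemaType.equals("vat-report")) {
            return false; // not in this monitor's lane
        }
        String key = concerns + "/" + legitimatePeriod;
        Integer previous = observed.get(key);
        int count = (previous == null) ? 1 : previous + 1;
        observed.put(key, count);
        // Hypothetical rule: once both kinds of documents exist for the period, investigate.
        return count >= 2;
    }
}
```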

Reasoning about a complex event
If the Event Monitor finds a prospect, it triggers an Event Reasoning Module (ERM). This module is capable of retrieving a subset of data from the CAH - data not part of the event itself - and the Module runs some logic on these data. There will certainly be re-use of Services present in other Modules (which contain the business logic necessary to understand the content of an Aggregate). The ERMs will also use services on Party for segmentation parameters such as "Scoring". These parameters are often set by analytics in the data warehouse that focus on Party behavior.
There is a many-to-one relationship here; many Event Monitors may trigger the same Event Reasoning Module. The Module concludes with either a negative or a positive case. If there is a positive case, the Module produces an Aggregate and makes it Public in the CAH. These Aggregates are a special set of Aggregates containing data for the BI processing. They can in turn be used for trend analytics and form the basis for Predictive Analytics.
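A minimal sketch of an Event Reasoning Module, reusing the Box, DocumentHeader and AggregateDocument sketches from earlier on this page (the reasoning itself is a placeholder; only the flow of retrieving extra data, deciding, and publishing a secondary Aggregate comes from the text):

```java
import java.util.List;

// Triggered by Event Monitors, fetches extra data from the CAH,
// and on a positive case publishes a secondary (BI) Aggregate.
class EventReasoningModule {
    private final Box repository;   // the BOX interface sketched earlier

    EventReasoningModule(Box repository) {
        this.repository = repository;
    }

    /** Returns the published secondary Aggregate on a positive case, or null otherwise. */
    AggregateDocument investigate(String concerns, String legitimatePeriod) {
        // Retrieve data that was not part of the triggering event itself.
        List<String> payroll = repository.findByMetadata("payroll", concerns, legitimatePeriod);
        List<String> vat = repository.findByMetadata("vat-report", concerns, legitimatePeriod);

        boolean positiveCase = !payroll.isEmpty() && vat.isEmpty();   // placeholder reasoning
        if (!positiveCase) {
            return null;
        }

        AggregateDocument finding = new AggregateDocument(
                concerns + "/payroll-without-vat/" + legitimatePeriod, 1,
                Protocol.PUBLIC, "<finding type=\"payroll-without-vat\"/>");

        DocumentHeader header = new DocumentHeader();
        header.concerns = concerns;
        header.reportedBy = "event-reasoning";
        header.schemaType = "bi-finding";
        header.legitimatePeriod = legitimatePeriod;
        header.version = finding.version;
        repository.store(header, finding.contentXml);   // made Public in the CAH
        return finding;
    }
}
```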

Responding to a complex event
Responding to a complex event means listening to the Aggregates that come from ERMs. These Aggregates will be stored for a long period of time, so that trend analysis can be done on them. The Aggregates can be sent to any participating system in real time, or they are used by other Event Monitors to form chains of CEP. These Aggregates are secondary products of your pipeline of tasks. The primary product is about calculating tax correctly.


In the illustration the blue Aggregates (the primary ones) emit events (below the CAH) that form an event stream. EM modules monitor events and trigger ERMs for further investigation. ERM 1 has a negative case and does not produce an Aggregate. ERM 2 produces an Aggregate (a green one, representing secondary products in the CAH), which also triggers an event. The Party Registry is also in action, providing segment or scoring information for the ERMs.


Predictive analytics
Business Intelligence, complex validation, fraud detection and monitoring are done in the "live" environment and not in the data warehouse. (As I have discussed in other blog posts, the data warehouse has an important role, but not for these "real-time" tasks.) Fraud detection could be analyzing typical patterns and triggering action if they are out of bounds. (For example: a carpenter with certain characteristics, in Oslo, having revenue 25% less than the average for the third month in a row; then do "something".) Patterns could also be a balance between data sets, or chains of real-world events that are linked. Predictive analytics will contribute to tackling things up front, or enable the Party (taxpayer) to act in the right way. Also note that our Aggregates are two-sided, the debtor and the creditor; this helps us chain events.

Involving the Party as events occur in the real world
By catching events as they occur in the real world (marriage, death, bankruptcy, trades, liability, payroll, etc.) the system could also respond to the Party (the physical or legal entity) and have them acknowledge the event. The Party will then have more insight into its own tax case, it will validate the data we have, and we are treating our citizens in the correct way. Or the Party may understand that it is doing something wrong and act in the right way. It is better to tackle this up front.

Implementation
The Aggregates are stored as XML, and are Java object structures in-memory.
Modules are plain Java deployed in our linearly scalable processing architecture, and the Modules expose services over RMI, REST or WS.
Event streams are Atom feeds, or JMS in-memory.


Creative Commons License
Business Intelligence and Predictive Analytics in the CAH by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Saturday, May 19, 2012

Viking Culture - An Agile Approach

What?
What do Vikings have in common with lean and agile approaches to Software Engineering or Enterprise Architecture? How does your quest find its way in troubled and uncertain waters? I have heard talks about engineering that go back to the 18th century, but I would like to go back to the 7th century. As I was reading "The Hammer and the Cross" by Robert Ferguson, some similarities from a long-gone period struck me. Engineering with few resources and the ability to cope in a hostile environment, while still having cultural, mercantile and warfare success over a few hundred years. They discovered America 500 years before Columbus. It certainly has influenced the northern part of Europe (e.g. Great Britain and Normandy). (This is not a cultural ranking or justification, but is used to highlight some aspects of agile and lean practices. See footnote.)


Odin
Risky knowledge seeking
Gaining knowledge was very central in Viking culture and Northern heathendom. You were obliged to seek knowledge and experience, at any risk. Odin (the main god) sacrificed one eye to obtain knowledge. So at the very foundation of the culture is bold knowledge seeking. Knowledge seeking actually means that failure will happen. This, together with transparency (see below), provided a basis for trust and forgiveness when failing, rather than embarrassment.
This is most certainly a foundation of software engineering as well; you will get nowhere without seeking knowledge. Be inquisitive, find combinations, test new technology, and understand its application. And then be patient as you convince the stakeholders and project members of the new possibilities and how to apply them. Be bold and brave!


Don't give oath
Early in the book Ferguson recounts how a Viking party, when meeting with a King of England, was reluctant to give an oath. The agile view would be that only when you get into a situation do you know what to do. You just can't promise that you will act in a certain way; it depends on the situation.
This relates to up-front design, estimates and project plans. The nature of any project with a degree of uncertainty is that you will face elements you can't foresee. So just don't promise anything; you will regret it.


Admit and go free
This is also about transparency. Very central to the culture was the "Ting". This shows a strong natural inclination towards involving the whole community in common decisions and in courts of law. Living sparsely across the country, communities met both at local and central places to discuss law and to judge actions. The gatherings show a balance between consensus-driven decisions and local actions depending on the situation. They also show that transparency is important to reach consensus, and that individuals must understand the purpose and interpretation of the law. If you failed, you would be forgiven, provided that your actions were well intended.
Viking "Longhouse"
As an example: if you killed someone (in daylight or by fire at night...) and passed 3 settlements without telling anyone, you would be sentenced to death (or be classified as an "outlaw"). But if you admitted your action (it may have been an accident, or depending on the situation it was considered self-defense ;-) ), that provided transparency and you involved the community in your humble action.
In engineering, it is vital that errors are acted upon and that, by admitting them and taking responsibility, you take your part in the team and handle things up front, instead of pretending that nothing happened. Your actions will come to light some day, so it is better to face it.
Enterprise Architecture is about balancing business and IT: central steering (principles, target architecture and business value) with local actions (understanding purpose and executing projects).

You can't cheat nature
Surviving in a climate where the soil is frozen 6 months of the year takes planning, diligence, and risk-taking. In these conditions, everything depends on how well you plan, cooperate, gain craftsmanship and live your life. You can't survive by lying or cheating. You can't cheat nature.
In our discipline, the system you build had better work. You can't pretend that it works. It is not a report, not a picture, not some abstract stuff: it is working software. It is out there for every user to see. You can't hide.
On the other hand, if you have vast resources you can make it happen even without a plan or good craftsmanship. It will just take longer and cost far too much. So even without an agile approach, working software is possible; it is just not desirable to work that way.

History has its stories of success and failure. Success occurs now and then, but everything seems to fail in the end. Where are we heading?


(There are certainly other aspects of the Vikings that do not fit. The book is all about why the Viking age had such a turbulent end. And there are certainly aspects of other great cultures of ancient times that would serve as examples.)
Creative Commons License
Viking Culture - An Agile Approach by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Wednesday, February 29, 2012

Module and Aggregate design in the CAH

As a result of the successful PoC we ran, we detailed our design for Modules and Aggregates in the Continual Aggregate Hub. And maybe this is just our Kinder Surprise: 3 things in one:
  1. Maintainable - Modular, clear and clean functional code as close to business terms as possible. Domain Specific Language.
  2. Testable - Full test coverage. Exploit test-driven development and let the business write test cases as spreadsheets.
  3. Performance - Linearly scalable. Cost of hardware determines speed, not development time. Too often you start out with nice modules but end up cross-cutting the functionality and chopping up transactions into something that is far from the business terms. That makes the system hard to maintain. We will achieve high performance without rewriting.
This article explains the logical design of the Module that is responsible for assessment of income tax, the Aggregate that contains the assessment data, our DSL, deployment issues, and lifetime issues. They represent components of the CAH. "Assessment of income tax" is a type of "Validate, consolidate and Fix", and the assessment data is of type "Fixed values".
We will release the Java source for this, and I expect our contractors (Ergo/Bekk) to blog and talk about the implementation. They too are eager about this subject and will focus on this direction of software design.

Tax Assessment Module
This is a logical view of the Module 'Selvangivelse 2010', handling the tax assessment form and the case handling on it for the year 2010. There will exist one such module for each income year; it consumes supporting XML documents for the assessment of tax for that year. The Module is where all business logic is, and the Modules (of different types) together form the processing chain around the TaxInfo Aggregate Hub. It is layered as defined in DDD, and as you see the Aggregate Hub is only present at the bottom, through a Repository, which is also the boundary between pure Java logic and the "grid infrastructure / scalable cache".
Selvangivelse (Tax form) Module
The core business logic is in 'SelvangivelseService'. This is where the aggregates from 'SupportRepo' are extracted (through an Anti-Corruption Layer) and put into the business domain; all business logic is contained here, and it is where the real consistency checks reside (today there are 3000 assessment rules and 4000 consistency checks). 'SupportRepo' is read-only for this Module and may consist of as many as 47 different schemas. These are supplied by other Modules earlier in the "tax pipeline". 'SelvangivelseService' persists to and reads from 'SelvangivelseRepo', and makes sure that the latest version is consistent, or handles manual case handling on it. Tax assessment forms may also be delivered independently by the taxpayer (when no information about this taxpayer exists in 'SupportRepo'), so it is important to have a full audit of all changes. Tax Calculation is another Module; it is fed by 'SelvangivelseRepo', which for that component is read-only.
There is also a vertical line, illustrating that usage of the Selvangivelse (the assessment form) must be deployed separately from producing it. (This "read-only stack" has been discussed previously; it is there to provide better up-time and to make it possible to migrate assessment forms in from the mainframe without converting the logic.)
Most of the business logic in 'SelvangivelseService' is static (in the Java sense), so that it does not need any object allocation (it is faster). Object creation is mostly limited to the Entities present in the Aggregates. Limiting the amount of functionality present in the Aggregates that go into the cache makes the cache more robust against software upgrades. This may be in conflict with good object orientation, but our findings show a good balance. The tough business logic is actually not a concern of the cacheable objects, but a concern of the Service. In other words, a good match for both DDD and a high-performance system!
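A minimal sketch of the shape this gives the service layer (the class and method names are illustrative; only the layering - anti-corruption extraction of supporting aggregates, static rule logic, lean cacheable entities - comes from the description above):

```java
import java.util.Collections;
import java.util.List;

// Lean, cacheable entity: data only, no heavy behaviour.
class Wage {
    final String employerId;
    final long amount;
    Wage(String employerId, long amount) { this.employerId = employerId; this.amount = amount; }
}

// Anti-corruption layer: translates supporting documents into domain entities.
class SupportRepoAcl {
    List<Wage> wagesFor(String partyId, int year) {
        // In the real system this reads supporting XML documents from SupportRepo.
        return Collections.emptyList();
    }
}

// The service: static business logic, no per-rule object allocation.
final class SelvangivelseService {
    static long sumWages(List<Wage> wages) {
        long sum = 0;
        for (Wage w : wages) sum += w.amount;
        return sum;
    }

    static boolean isConsistent(long reportedWageIncome, List<Wage> supportingWages) {
        // One of the thousands of consistency checks, shown as a plain static method.
        return reportedWageIncome == sumWages(supportingWages);
    }
}
```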

Domain Specific Language
We see that maintenance is enhanced by having clear, functional code as close to business terms as possible. Good class and method names have been used, and we believe this is the right direction. We do not try to foresee changes to the business logic that may appear in the future; that only risks bloating the business code with generics we may not need and that hinder understanding. We acknowledge that things have to be rewritten now and then, and that we must re-factor in the future without affecting historic information or code (see deployment further down). Note that we have already separated functional areas into Modules in the design, meaning that one year's tax assessment handling may be completely different from another's.
Our DSL has been demonstrated for business people, and the terms relate to existing literature ("Tax-ABC" is a nasty beast of 1100 pages ;-) ). Business confirms that they can read and understand the code, but we do not expect them to program. We expect them to give us test cases. Close communication is vital, and anyone can define test cases as columns in a spreadsheet (there are harder test scenarios that cannot be represented as a table, but the tables are a good start and actually cover most of what we have seen in this domain so far).
(Illustration: example DSL for summarizing fields in the tax form.)

The DSL approach is more feasible for us than using a Rule Engine. Partly because there are not really that many rules, but rather data composition, validation and calculations, which a normal programming language is so good at. Also, by having a clear validation layer/component, the class names and the freedom to program in Java actually make the rule set more understandable; it does not get so fragmented. The information model is central; it too has to be maintained and stay flexible. And last but not least: because of lifetime requirements and other support such as source handling (e.g. GitHub), refactoring and code quality (e.g. Sonar), the Java world is so much more mature and well known that any Rule Engine vendor just can't compete. (Now, there certainly are domains where a Rule Engine is a good fit, but for our domain, we can wait.)
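Since the DSL illustration above is an image, here is a minimal sketch of what plain-Java code in this style could look like (the field names and rules are invented for illustration; the actual DSL is the team's own):

```java
import java.util.ArrayList;
import java.util.List;

// Fields of the tax form, named in the business vocabulary.
class Selvangivelse {
    long lonnsinntekt;          // wage income
    long naeringsinntekt;       // business income
    long renteinntekt;          // interest income
    long minstefradrag;         // standard deduction

    final List<String> anomalies = new ArrayList<String>();
}

// Readable, plain-Java "rules" that business people can follow line by line.
final class Inntektsregler {
    static long sumGrossIncome(Selvangivelse s) {
        return s.lonnsinntekt + s.naeringsinntekt + s.renteinntekt;
    }

    static long alminneligInntekt(Selvangivelse s) {
        return Math.max(0, sumGrossIncome(s) - s.minstefradrag);
    }

    static void validate(Selvangivelse s) {
        // Hypothetical consistency check, added only to show the shape of a validation.
        if (s.minstefradrag > s.lonnsinntekt) {
            s.anomalies.add("Minstefradrag cannot exceed wage income");
        }
    }
}
```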

Logical design of the xml super-document
Aggregate store: the XML-document
All aggregates are stored in a super-document structure consisting of sections. The content of these sections is generic, except for the head. (See Aggregate Store For the Enterprise Application.)
The aggregate has a head that is common to all document types in the TaxInfo database. It defines the main keys and the protocol for exchanging them. It pretty much resembles the header of a message, or the key object of the Aggregate in Domain Driven Design. The main aggregate boundary is defined by whom it concerns, who reported the data, the schema type and the legitimate period. These are the static, long-lasting properties of an aggregate in the domain of the CAH. We do not expect to be able to change them without rebuilding the whole CAH. 'State' is the protocol, and as long as the 'state' is 'private', no other module can use the document.
The other sections of the document are owned exclusively by the module that produces such content, and belong to the domain the module implements. 'Case' is the state and process information that the module mandates; in the example the document may be 'public' in any of the different phases that tax handling goes through. For example, for a typical taxpayer (identified by 'concerns'), when an income year is finished there will exist 4 documents of this type, each representing a phase in the tax process (prognosis, prefilled, delivered and assessed).
The Aggregate section is where the main business information is. All relevant content is stored here (also as copies, even though it may be present in other supporting documents). This makes the document valid on its own, so it need not be assembled at query time or for later archiving. Any field may either be registered uniquely in this aggregate, or it may be copied and reference some other document as its master. Such a reference is held in 'ref GUID' and is used by the business logic in 'SelvangivelseService' to sew objects in the supporting aggregates together and create the domain object model of the Module.
The Anomalies section contains validation errors and other defects in the Aggregate, and only concerns this occurrence. We may assess such a document even though it has anomalies, and the information here is relevant to the taxpayer to give more insight.
The Audit section contains all changes to the aggregate, including automatic handling, to provide insight into what the system has done during assessment. This log contains all changes from the first action in the first phase of assessment, not just those of the present document.
Both Anomalies and Audit can reference any field in the Aggregate.
Aggregates are not stored all of the time; only at specific steps in the business process are things stored as XML, mostly when legislation states that we must have an official record, for example when we send the pre-filled form to the taxpayer. The rest of the time the business logic runs (either automatically because of events, or as manual case handling) and updates the objects without any persistence at all. Sweet!
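A minimal sketch of the super-document as a Java structure (the section and field names follow the description above, but the types themselves are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// The common head: who it concerns, who reported, schema type, legitimate period, state.
class Head {
    String concerns;
    String reportedBy;
    String schemaType;
    String legitimatePeriod;
    String state;               // 'private' until published, then 'public'
}

class Anomaly {
    String refGuid;             // any field in the Aggregate section can be referenced
    String description;
}

class AuditEntry {
    String refGuid;
    String changedBy;           // a case handler or the system itself
    String change;
}

// The super-document: one common head plus module-owned sections.
class SuperDocument {
    Head head = new Head();
    String caseSection;                         // state/process info mandated by the module
    String aggregateXml;                        // the main business information
    List<Anomaly> anomalies = new ArrayList<Anomaly>();
    List<AuditEntry> audit = new ArrayList<AuditEntry>();
}
```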

Deployment
Logical deployment view
Every node is functionally equal and has the exact same business logic. To achieve scale the data is partitioned between the different nodes.
This illustration shows how the different types of aggregates are partitioned between the different nodes, where the distribution key co-locates all aggregates that belong to the same "tax family". This is transparent to the business logic, and the grid software makes sure that partitioning, jobs, indexes, failover, etc. are handled. By co-locating we know that all data access is local to the VM. This gives high performance. The business logic functions even if we use a different distribution key; it only takes more time to complete. Some jobs will span VMs and will use more time, but map-reduce makes sure that most of the work is handled within one VM.
We have learned that memory usage is pretty linear with time usage, and that the business logic is not the main driver. The difference between 10 and 100 rules matters much less than 2 or 3 kB of aggregate size. So fight for effective aggregates, and build clear Java business logic!
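As a minimal sketch of what such a co-locating distribution key might look like, Hazelcast's PartitionAware interface is used here because Hazelcast is mentioned elsewhere on this page; the tax-family key itself is hypothetical:

```java
import java.io.Serializable;
import com.hazelcast.core.PartitionAware;

// Key for an aggregate in the grid: the partition key is the tax-family id,
// so every aggregate for the same family lands on the same node.
public class AggregateKey implements Serializable, PartitionAware<String> {
    private final String taxFamilyId;
    private final String documentId;

    public AggregateKey(String taxFamilyId, String documentId) {
        this.taxFamilyId = taxFamilyId;
        this.documentId = documentId;
    }

    @Override
    public String getPartitionKey() {
        return taxFamilyId;   // co-locate the whole "tax family" in one partition
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof AggregateKey)) return false;
        AggregateKey other = (AggregateKey) o;
        return taxFamilyId.equals(other.taxFamilyId) && documentId.equals(other.documentId);
    }

    @Override
    public int hashCode() {
        return 31 * taxFamilyId.hashCode() + documentId.hashCode();
    }
}
```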

Lifetime deployment
Where do you draw the line?
There are deployment challenges as time goes by. Technology will change over time, and at some point we will have a new OS, JVM, or new grid software to run this. We have looked into different deployment models, and believe that we have the strategy and tool-set to manage this. At some point we must re-factor big-time and take in new infrastructure software, but we do not want to affect previous years' data or logic. We want to handle historic information side-by-side in some Module, and the XML representation of the aggregate is the least common denominator. It is stored in TaxInfo. Modules deployed on different infrastructure must be able to communicate via services. Modules running on an old platform will co-exist with Modules on a new platform. In these scenarios we do not need high performance, so they need not co-exist in the same grid, but can communicate via web services.
In the above-mentioned solution space, it is important to distinguish between source and deploy. We may very well have a common source, with forks all over the place if that is necessary. Even though we deploy separate Modules on separate platforms, we can still keep control with a common source. (I may get back with some better illustrations on this later.)

Conclusion and challenges
It is now shown that our domain should fit this new architecture and that the CAH concept holds. We also now think that there has never been a better time to rewrite the existing legacy. We have a platform that performs, and we can build high-level test cases (both constructed and regression tests) "brick by brick" until we have full coverage. Also, we understand the core of the domain much better now, and we should hurry because those with the core knowledge are soon retiring.
This type of design need not run on a grid platform only because of performance. It also benefits because the aggregates are stored asynchronously. The persistence logic is out of the business logic, and there is no need to "save as soon as the user has pressed the button". It allows you to think horizontally through the business logic layer, and not vertically, as Java architecture sadly has forced us to for so long.
As presented in the PoC, we now have a fantastic opportunity to achieve the Kinder Surprise, but we certainly need good steering to make it happen.

Kinder Surprise; simpler, cheaper, faster!

Creative Commons License
Module and Aggregate design in the CAH by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Saturday, January 21, 2012

Tax Norway's PoC results

Our results from the Proof of Concept have been presented at Software 2012 (Massivt skalerbar skatteberegning, "massively scalable tax calculation") and Ark 2012 (Kinderegget: enklere, billigere og mye raskere, "the Kinder egg: simpler, cheaper and much faster") here in Oslo. We have tested the essential complexity (the core of the problem, although a subset of the overall functionality) of our quite complex multi-phase tax assessment process (up to 47 sub-forms assembled onto a "tax-foundation" form of 800 fields, with 4000 rules and 3000 validations), and the subsequent tax calculation (also quite complex).
As background, there is a context for this, a concept, and a logical design.

Our findings show that a 12-server grid (Intel i7) with 500 GB of RAM will process everything in less than 5 minutes (for a population of 5.1 million), on a hardware platform costing 5% of today's expenses. We also see that full test coverage is highly achievable and will of course drastically reduce maintenance cost in the long run. Plain Java with good class and method names (a DSL, "Domain Specific Language") makes this rock.

This platform handles tax forms at over 50,000 forms per second.

The aggregates (as defined in Domain Driven Design) really make the difference, and in this domain it is a great fit.


BTW: using this type of in-memory architecture will also benefit applications that do not need scale. Storage is asynchronous from the usage scenarios: just store when the right business state is reached. The business logic and information model are nice and clean. No persistence tweaking anymore :-)


See you at SW2012!


Creative Commons License
Tax Norway's PoC results by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.