Big Data and NoSQL

At the various data modeling conferences I've visited over the last few years, I've noticed a common identity crisis amongst data modelers and data architects. In most cases it revolves around the following questions they ask themselves:

  • How do we stay relevant in this avalanche of new tech and developers?
  • How do we preserve the business knowledge that is in our data models?
  • How do we pass our knowledge and the business knowledge along in the organisation?

Instead of answering these questions directly, let's look at a little IT history and see if we can find a trend. Perhaps we can formulate some insight into those trends, and then find the answers to these questions. They might be slightly different than you'd expect. Let's find out.

Some history

Before computing became mainstream, it was about solving complex mathematical problems. For that purpose the algorithms were functional in nature. As computers progressed, their application grew into business usage, to handle repetitive, computationally intense activities. Software grew from being organized around functions and procedures into arrangements around the data. Methods for organizing or modeling data structures were invented. Being able to arrange data in relational databases instead of files pushed code towards more object-oriented methods. These allowed software to be reused, binding it to data structures and abstracting away from the database access code itself.

All this evolved from dedicated computers, to mainframes and mini-computers, towards personal computers. In the time of relational databases, servers and clients were developed, graphical user interfaces came to life, and finally the internet took over. The internet, as an extended network, created the need for more scaling, and the desire for more layers of abstraction in the database, middleware and client software.

As coding fragmented into a wide range of specialisms, the focus was drawn more to the technological aspects than to the organizational aspects of information. The mantra of immediate internet response and the overwhelming growth of ready-made, downloadable applications put more pressure on developers. Yet in many respects they were still dealing with immature technology. The number of available software frameworks flooded the IT work floor even more.

Deep inside the companies, databases were still being maintained and used on a daily basis. The data modelers who designed them became obsolete in the fast hit-and-run era of publicly available software and a growing list of new developers who had a limited sense of where their education came from, or where they were going with it. A history lesson is needed in any school and education, so that students learn from the past. Yet in IT education this past seemed less and less important. The focus lies forward, much like the rabbit staring into the headlights of the approaching car of the future.

New trends

With the coming of frameworks, the database and its required design phase became a hindrance to many developers. The software stack was simply too complex to align with a database that was not fully developed before they could start coding. Therefore frameworks attempted to abstract away from those database structures, allowing developers to start coding and to deliver semi-finished products as they went from start to finish.

To users the fast-paced development seemed great; the new trends really seemed to live up to their expectations. The software they thought they needed was being built as they talked to the IT developers, all through visual wireframes, stand-ups and continuous deliveries. Little did the developers know about the rich history and the long-term business needs: the need to integrate all source systems, to build smart systems and to report on a high level, integrating all data stored in all systems.

New data warehouse solutions, methods and technologies were invented, all to solve the integration issues between the various computer systems and products in the company. Historical data suddenly became very important. What was the data last year, but also: what did we think the data was last year? What do we think the data is right now, what do we think it will be next year, and what will it actually be? All these business questions became really important and, crucially, they were not met by the existing systems or software.
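The difference between "what was the data last year" and "what did we think the data was last year" is often handled by keeping two timelines per record: when a value was true in the real world, and when the system learned about it. The Python sketch below illustrates that idea in a deliberately simplified form. The names `Version` and `as_of`, the credit limit example and the rule that the latest start date wins are all assumptions made for illustration; they do not come from any particular data warehouse product or method.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    value: int          # the recorded value, e.g. a customer's credit limit (hypothetical)
    valid_from: date    # when the value became true in the real world
    recorded_on: date   # when the system learned about it


def as_of(history: list[Version], valid_on: date, known_on: date) -> Optional[int]:
    """Return the value valid on `valid_on`, using only what was recorded by `known_on`."""
    candidates = [
        v for v in history
        if v.valid_from <= valid_on and v.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # Simplifying assumption: the latest valid_from wins,
    # and among equal valid_from dates the latest recording wins.
    best = max(candidates, key=lambda v: (v.valid_from, v.recorded_on))
    return best.value


history = [
    Version(1000, valid_from=date(2017, 1, 1), recorded_on=date(2017, 1, 1)),
    # A correction entered in 2018: the limit had actually been 1500 since mid-2017.
    Version(1500, valid_from=date(2017, 6, 1), recorded_on=date(2018, 3, 1)),
]

# What did we think the data was at the end of 2017, at that moment?
what_we_thought = as_of(history, date(2017, 12, 31), known_on=date(2017, 12, 31))
# What do we now know the data was at the end of 2017?
what_we_know_now = as_of(history, date(2017, 12, 31), known_on=date(2018, 12, 31))
```

Nothing is ever overwritten in this sketch: corrections are added as new versions, which is what makes both questions answerable afterwards.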

And just as IT started to solve the integration issues, developers were already ahead on the next front. Many companies still struggle with immature solutions to integrate all the business data and make sense of it, and yet the next front already seems to be moving forward. Again the developers' push for big data, NoSQL and massively scalable solutions seems unstoppable.

Push back

And while the tsunami of promising new technologies and solutions seems unstoppable, there's already a serious push back happening. Data scientists are called in to find answers in mountains of data. Data warehouse automation finally gives analysts the breathing room to start thinking about what the data actually means. The business, finally getting the management answers it has longed for, develops a need for accurate and precise data because of its accountability. Governance and data lineage seem more important every day. Privacy and data leakage concerns push back on the forward motion of new immature technology and the people using it.

In an attempt to analyze what data even exists, Artificial Intelligence is brought in to learn what is unknown to us: learning about the past and predicting what will be, all based on statistical data and reporting. Meanwhile there is more API development to solve the integration issues for runtime IT systems, data lakes, data vaults, cloud solutions and Software as a Service.

The data modelers and data architects are becoming more relevant as the business comes under pressure from regulations, compliance, privacy and accountability. Managers risk jail time for IT failures. The new technology is being pushed back by the business itself.

There's a growing need in the business to know what's going on. It is a swing back to taking control, instead of being taken in by the developers as ringleaders. Not only does the business want to know what IT is doing, but the need for accurate, precise and well-integrated data is also getting a foothold again.


So, knowing the trends and the tidal waves of technological advancement, one can clearly see that data modeling is not obsolete. The integration of systems requires IT to know what the data means. API development requires knowledge, on a data level, of the systems being integrated. AI systems need to be trained on data that actually makes sense, so that their outcome can be validated. The business doesn't want to be responsible for reports it cannot validate itself.

Data modeling is not irrelevant at all. Big data and NoSQL products solve a technical scaling issue, not a management issue. They impress the business the way XML did when it was new, and as JSON has been doing for the past few years. OWL/RDF and machine learning promise to do all that and better. Yet, again and again, the real business knowledge is still in the heads of the employees and experts in the business itself.

We are currently going through the shreds being pushed out of the paper shredder, trying to figure out what the individual strips of paper mean, if anything. Data modelers are more relevant than ever. However, the traditional view of building a relational database no longer suffices. Logical models cover only a fraction of the entire IT need. Once again the tsunami of IT needs comes in many forms: hierarchical, network, relational, graph, semantic, process, rule and so on.

The real answer for the traditional data modeler is not to be relevant in conceptual, logical, dimensional, process or semantic modeling. The real answer is to become relevant in the domain of information modeling: rising above the old-school logical database and capturing the knowledge of the domain experts in the business, preferably in a way that withstands the next tidal wave of immature, promising technology.

A way to capture this knowledge is to take a step back from technology and start modeling using fact-oriented models. This is the way to stay relevant to both the business and technology. Fact Oriented Modeling is, as far as I know, the only approach that combines theory, method and tooling. It allows us to bootstrap all currently trendy technologies from a single consistent model.
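To give a feel for what "bootstrapping from a single model" could mean, here is a small Python sketch. It stores a few facts together with the sentence a domain expert could verify, and derives both a relational table definition and a JSON-Schema-like structure from them. Everything in it is an illustrative assumption: the `FactType` class, the `Customer` example and the two generator functions are hypothetical, and real fact-oriented methods and tools carry far richer semantics, such as constraints and relationships between objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    entity: str      # the object the fact is about, e.g. "Customer"
    attribute: str   # the role played by the value, e.g. "name"
    datatype: str    # a technology-neutral type name: "string" or "integer"
    reading: str     # the sentence a domain expert can read and verify


# Hypothetical facts, phrased the way a business expert would state them.
FACTS = [
    FactType("Customer", "name", "string", "Customer <id> has name <name>."),
    FactType("Customer", "birth_year", "integer", "Customer <id> was born in <birth_year>."),
]

SQL_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER"}
JSON_TYPES = {"string": "string", "integer": "integer"}


def to_sql(entity: str, facts: list[FactType]) -> str:
    """Derive a relational table definition for one entity."""
    table = entity.lower()
    columns = [f"  {table}_id INTEGER PRIMARY KEY"]
    columns += [
        f"  {f.attribute} {SQL_TYPES[f.datatype]}"
        for f in facts if f.entity == entity
    ]
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


def to_json_schema(entity: str, facts: list[FactType]) -> dict:
    """Derive a JSON-Schema-like document description for the same entity."""
    properties = {
        f.attribute: {"type": JSON_TYPES[f.datatype], "description": f.reading}
        for f in facts if f.entity == entity
    }
    return {"title": entity, "type": "object", "properties": properties}


ddl = to_sql("Customer", FACTS)
schema = to_json_schema("Customer", FACTS)
```

The point of the sketch is the direction of the dependency: the relational and document structures are both outputs, while the verified sentences remain the single source that survives a change of technology.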


Having sketched a brief history of IT and shown the true value of a data modeler, a single piece of advice remains. To everyone building new technology that is still growing into maturity: make sure you not only provide solutions to technical issues, but also incorporate the decades of knowledge on data modeling. I'd love to invite businesses to test whether their solution providers deliver an open environment for exchanging data and support data validation. Big Data and NoSQL products, for instance, provide storage for every developer, every piece of software and every version of it, provided they each take care of their own data. Yet as a business user I'd want the technology to support my requirements and to guarantee that the data is actually valid and stored in structures that make sense to all other users and products handling the same data.




© 2000-2018 Copyright - All rights reserved