4 reasons why Engineers need Data Science
Key points
With the world around us seemingly determined by analysis of data and an immense amount of data produced daily, we have seen the Data Scientist emerge as a new colleague. It is hard to see through the hype and determine a good strategy.
We believe that there are four ways that Data Science can add value for Consulting Engineers:
Coding for Consulting Engineers
Testing options for Consulting Engineers
Benchmarking and Checking
Asset Management Proactive Maintenance
Why do Consulting Engineers need Data Science?
We have become aware that data can make a difference in the outcome of elections, or in whether we are offered something we will watch on Netflix. With the world around us seemingly determined by analysis of data and an immense amount of data produced daily, we have seen the Data Scientist emerge as a new, or perhaps rebadged, specialism. How should the Consulting Engineer profession embrace this new team member? The hype is hard to see through, so it is hard to determine a good strategy.
This article is not simply about Data Science but about how Consulting Engineers can prepare for the benefits that Data Science can offer in improving outcomes for our clients.
Possible Strategies
Considering our own and our clients' experience of how industry tends to think and operate, we suggest that there are three possible strategies to follow:
Ignore
Determine the ‘touch points’ and move towards better use of Data Science
Diving deep, declare that Isaac Newton, Ohm and Euler were ‘behind the times’ and join the world where data rules
Ok, I am provoking, but not much.
To ‘ignore’ means that our industry will become less relevant to how our clients run their businesses. Data is probably important to them, and we need at least to relate to them. If we don't adapt, we’ll find it harder to attract tech-savvy young professionals; our cost base will then become too high and the industry will slowly die.
The second option is to understand the ‘touch points’ where Data Science and our work as engineers have real synergy, and then to make the connections closer. We suggest that this is the win-win option. The third, to dive in with an evangelical approach, will impress some, but only for a limited time.
The reality is that Consulting Engineers do something based on robust mathematics and physics. Our clients expect us to give a solution that is correct - meaning it meets the needs and is technically sound.
Will Data Science make us redundant?
For about 20 years we've been told that in the future we will “just press a button and out comes the design”. Some in the Data Science world do believe this, and challenge whether what we can call Domain Knowledge (meaning that we are expert engineers in whatever field of consulting we treasure) will matter. Who needs domain knowledge when predictive models can just work the answer out from scratch?
Peter Aldhous, a Science Reporter for BuzzFeed News based in San Francisco, interviewed Jeremy Howard for New Scientist magazine in 2012. Howard was then President and Chief Scientist at Kaggle, a renowned website that runs Data Science competitions and is now a subsidiary of Google.
Talking about what makes a good competition, he reflected that “it's controversial because we are telling [people] your decades of specialist knowledge are not only useless, they’re actually unhelpful; your sophisticated techniques are worse than generic methods. It's difficult for people who are used to that old type of science”. I think that he might mean us! Does he have a point? Well, we can reflect on the important part that definition of requirements plays in what we do as Consulting Engineers. Sometimes the technical solution is the easy part. The real skill is working with the stakeholders to decide what the outcome of that technical solution needs to be: that is the use of our vital domain knowledge.
We could of course just ‘press the button’ in the future and offer the client a range of options to choose from, or even show one option and recommend others that they may like because they liked something else. We will come back to this, but first some background.
‘Big Data’
Before we try to define Data Science, or the part of it that matters to us and that we want to explore, we first need to demystify what we understand as Big Data. Let's start with Steve Lohr, a data journalist at the New York Times, who said that “Big Data is a very vague term, used loosely, if often, these days. But put simply, the catch-all phrase means three things. First, it is a bundle of technologies. Secondly, it is a potential revolution in measurement. And third, it is a point of view, or philosophy, about how decisions will be - and perhaps should be - made in the future.”
It is interesting to consider how he emphasizes ‘measurement’ as the thing being revolutionized by Data Science, not the doing, designing, or creating.
At this point, any self-respecting Consulting Engineer will be asking: how much data do I need for Big Data? OK, just so that you can talk about Big Data with more credibility than many, let me offer a loose definition.
The data world sees it like this: data is big when it will not fit on the data handling infrastructure that you have. If your data is too big for your desktop, and the desktop is all you have, then you have a big data problem. Likewise, if your servers are not big enough, then you have a big data problem. Presumably, Google and others are actively managing capacity problems, which is why they are continually building new data centers that we can design for them. We think that a simple example will help to demystify and illustrate how you arrive at a big data problem. Then we can share some buzzwords that you can throw around to show your deep knowledge!
Why ‘Big Data’ is tricky - an example
Let's say that we have a list of names and we want to know the most common name for females. Let's say we have a small sample of people.
In this simplistic example we can see that Emma is the most common name, but let's think about the process as data analysis. We deal with each item, each person, one at a time and look at the others to see if there are any matches. For each match we add one to the running total for that name. Then we move to the next name in the list and do this again. When we have looked at all the names in the list, we sort the totals and determine which name is the most common: in this simple example, obviously Emma.
The point to notice here is that to observe whether one name has matches in the others you need to review each of the other names to find out. When a computer does this, it means that all names in the list must be held in memory (or read into memory) each time we cross-check a name, or data item.
This is easy for our simple example, but what if we scale up to 1,000 names, or 10,000 names, or a million names? Then we need to hold all items in memory, and this presents a big data problem!
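As a rough sketch of the counting process described above, the Python below follows it literally: each name is compared against every name in the list. The sample names are made up for illustration. The second function is not part of the process described in this article; it is a common alternative that keeps only a running tally per distinct name, shown here for comparison.

```python
from collections import Counter

# Hypothetical sample; the names are made up for illustration.
names = ["Emma", "Olivia", "Emma", "Ava", "Sophia", "Emma", "Olivia"]

def most_common_pairwise(items):
    """Follow the process in the text: compare each name with every name."""
    best_name, best_count = None, 0
    for candidate in items:
        # Count how many entries in the whole list match this candidate.
        count = sum(1 for other in items if other == candidate)
        if count > best_count:
            best_name, best_count = candidate, count
    return best_name, best_count

def most_common_tally(items):
    """Alternative: one pass, keeping a running tally per distinct name."""
    tally = Counter()
    for name in items:
        tally[name] += 1
    return tally.most_common(1)[0]

print(most_common_pairwise(names))  # ('Emma', 3)
print(most_common_tally(names))     # ('Emma', 3)
```

The pairwise version does a number of comparisons that grows with the square of the list length and needs the whole list to hand, which is where the trouble starts as the list grows.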
These problems are solved by working across many computers. This introduces a problem of errors: if you use many computers at once, one of them will probably fail. Solutions such as Hadoop deal with this problem in large systems.
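The idea of sharing the work can be sketched in a few lines of Python. This is only an illustration of the general split, count and merge pattern, run on one machine; it is not Hadoop's API, and it does not show how a failed machine is handled. The names and the three-way split are assumptions for the example.

```python
from collections import Counter

def count_chunk(chunk):
    """Work one machine would do: tally the names in its own share."""
    return Counter(chunk)

def merge_counts(partial_counts):
    """Combine the per-machine tallies into one overall tally."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds counts name by name
    return total

# Hypothetical data, split into shares as if sent to three machines.
names = ["Emma", "Olivia", "Emma", "Ava", "Sophia", "Emma", "Olivia"]
chunks = [names[0:3], names[3:5], names[5:7]]

partials = [count_chunk(c) for c in chunks]
total = merge_counts(partials)
print(total.most_common(1))  # [('Emma', 3)]
```

No single machine needs to hold the whole list; each holds its share, and only the small tallies are brought together at the end.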
‘The 4Vs’
Data Scientists coined the term ‘the 4Vs’ as a way to describe big data by its:
Volume - there is lots of it
Variety - it is diverse
Velocity - it arrives at pace and usually continuously
Value - we can use it to add value to our activity
We need to understand what big data really means to put Data Science in context with our own work as Consulting Engineers. Think about the data that we use or evaluate in a usual design task. It does have value, but we probably cannot say that it has volume, variety and velocity in the way that these are used in the big data definition. The definition isn't a good fit, and that's why we are exploring Data Science here.
What can Data Science do for us?
A definition for our Data Scientist colleagues is far from decided, other than that it is a person who is sexy, magical and the Ferrari parked outside is probably theirs!
Don't feel resentful; let them enjoy their moment. Isambard Kingdom Brunel was in their shoes once. As Consulting Engineers, we can get in on the action. We can walk in their backyard - of course we can and must.
For the purposes of this debate, let's agree that a Data Scientist is someone who can handle Big Data problems. All Consulting Engineers do analysis of data every day, but it is not usually on any Big Data scale and the answers we determine are usually clearly defined, not probabilities!
So, let us say that the Data Scientist is a new addition to our multidisciplinary team that offers the chance to expand our knowledge or insight by analysis of data at high-volume; the type with ‘the 4Vs’; the big data problems.
Culturally, Data Scientists work in open environments, value shared visualizations of data and also welcome multidisciplinary skills - we can welcome this. Today’s hotbeds of Data Science can be found in Facebook, Amazon, Apple, Netflix and of course Google, the mega enterprises known as the FAANG. These mega machines are continuously collecting and analyzing data and trying to make predictions of what will happen next, and, if they can, to encourage us to take the next step. This may be to watch the next movie on Netflix or buy on Amazon, or whatever. The general principle of how they do this is important to consider because it is quite unlike engineering mathematics. They are, in a very informed and sophisticated way, guessing. Engineers do not usually like to be guessing.
Consulting Engineering is about physics and mathematics and the ingenious application of proven theory. The actions of the elements we apply, whether structural, fluid or electrical, interrelate in a way that mathematicians call linear and continuous. These relationships can then be expressed in equations that will tell us what happens as any variable changes. Almost all that Consulting Engineers do can be defined like this.
Data Science is predominantly focused on probability, using some form of logistic regression. This means looking at the data, defining ‘features’ of that data, and using them to predict the probability of another action: whether you will want to buy a new car, watch a certain movie, etc. So, where is the place for such analysis in our industry? Let's think about that.
4 ways that Data Science can add value for Consulting Engineers
We suggest that there are four ways that Data Science touches our work as Consulting Engineers in a positive way. Consider these as ‘touch points’, as the places where we need to strategize, so that the way we work will eventually converge with that of our Data Science colleagues.
We will address these in the order of how the approaches move from linear regression (the mathematics that we are normally using in engineering) to those that draw more from the logistic regression (being the use of probability of outcomes rather than absolute outcomes).
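To make the distinction concrete, here is a minimal Python sketch of the two kinds of output. The coefficients are hypothetical, chosen only to show the shape of each result; nothing here is fitted to real data.

```python
import math

def linear_prediction(x, slope, intercept):
    """Linear model: returns a value directly (e.g. a load or a flow)."""
    return slope * x + intercept

def logistic_probability(x, weight, bias):
    """Logistic model: squashes a linear score into a probability (0 to 1)."""
    score = weight * x + bias
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients, chosen only to show the shape of each output.
print(linear_prediction(10.0, slope=2.0, intercept=1.0))   # 21.0
print(logistic_probability(0.0, weight=1.5, bias=0.0))     # 0.5
```

The linear model answers "how much?", while the logistic model answers "how likely?", which is the shift in thinking that the four touch points move through.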
1. Coding for Consulting Engineers
Writing code (programming) is the art of algorithm design and the craft of working until the algorithm does what you need.
Data Scientists bring coding skills, with Python and R being the most used languages. Both are open source; free to download and use. Most engineers will have had some coding experience, but perhaps this was in university some time ago.
Python is used across many disciplines, and it's fairly easy for a determined novice to get started. There are many resources on YouTube, other written material on the internet, and many sources of example code to learn from. Increasingly, Python is being used to link data packages and data sets, for example bringing ground investigation data into civil engineering models in Civil 3D or Revit. It is also useful for automating the production of reports and visualizations of data analysis from design models. With any coding, the most important skill is the ability to fail many times as you learn.
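As a small taste of this kind of automation, the sketch below reads a ground investigation export and prints a one-line summary per borehole. The data, the column names (borehole, depth_m, spt_n) and the summarise function are all assumptions made up for the example, and it does not touch the Civil 3D or Revit interfaces.

```python
import csv
import io
from statistics import mean

# Hypothetical export from a ground investigation; column names are assumed.
raw_csv = """borehole,depth_m,spt_n
BH1,1.5,8
BH1,3.0,14
BH2,1.5,11
BH2,3.0,19
"""

def summarise(csv_text):
    """Group rows by borehole and report the mean SPT N-value for each."""
    values = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        values.setdefault(row["borehole"], []).append(float(row["spt_n"]))
    return {bh: mean(vals) for bh, vals in values.items()}

summary = summarise(raw_csv)
for borehole, avg in summary.items():
    print(f"{borehole}: mean SPT N = {avg:.1f}")
```

In practice the text would come from a file rather than a string, but the reading, grouping and reporting steps would look much the same.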
Bringing someone into your team who has Data Science knowledge will also bring you someone who can write and execute code - so there is an instant added benefit to your team.
A good book to learn from is “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython” by Wes McKinney.
For online resources, it is about the style that suits you, but for beginners we can recommend CS Dojo on YouTube as a good starting point. YK Sugi is logical, clear and fun.
2. Testing options for Consulting Engineers
‘Options’ is a word that usually makes Consulting Engineers anxious. Often, thanks to our own experience-based instinct, we can spot which solution is the most appropriate for the application. But our clients do like to be given options before a decision is made, and often these options allow us to justify the choice that we support.
If you work as part of a team with an architect, the typical case, then you probably have two parties to convince which option is best: the architect and the client. Sometimes we may need to convince other engineers too, perhaps around arrangements of space allocation or business impact.
We suggest that bringing a Data Scientist into a team offers greater credibility in exploring, testing and supporting one option against another. This is most significant if we can test against external influences that have some random element, such as building occupancy or the use of a facility or function, and, in transport or public facilities, how people will move around.
It is unusual for the Consulting Engineer to have access to large data sets that can inform their work. Most typically, these would be about ingress, egress or circulation of users over time.
Data Science skills allow us to use datasets, if available, and to efficiently carry out the process of (as Data Scientists call it) data ‘munging’. This is the cleaning, organizing, scraping and formatting of data that is required to make a large dataset usable.
Most Consulting Engineers can think of examples where data has been available to us that may have informed our design, but we could not use it because of the time needed to clean it up at such high volume. A Data Scientist, with access to tools like Python and R, can do this task efficiently.
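A flavour of what ‘munging’ looks like in Python is sketched below. The door-counter export and its defects (stray spaces, thousands separators, missing values) are invented for the example, and clean_counts is a hypothetical helper, not a library function.

```python
def clean_counts(raw_rows):
    """Return (hour, count) pairs, dropping rows that cannot be parsed."""
    cleaned = []
    for hour_text, count_text in raw_rows:
        hour_text = hour_text.strip()
        count_text = count_text.strip().replace(",", "")
        if not hour_text or not count_text.isdigit():
            continue  # skip blanks and non-numeric entries such as "n/a"
        cleaned.append((hour_text, int(count_text)))
    return cleaned

# Hypothetical door-counter export with typical defects.
raw = [(" 08:00", "120"), ("09:00 ", "1,050"), ("10:00", "n/a"), ("", "75")]
print(clean_counts(raw))  # [('08:00', 120), ('09:00', 1050)]
```

The same few rules, once written, can be applied to four rows or four million, which is where the time saving comes from.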
Making your own data sets
Having Data Science skills that can manipulate data or code can enable us to produce simulated datasets, or complete data sets that have gaps or obvious errors. What is the point of making up the data? It is a fair question that we suggest has a valid answer.
In most cases, as Consulting Engineers, we are working towards a worst-case performance criterion. This may be vehicle loads, or building static loads, or, for building services, worst-case occupancy. These do not consider real-time situations, just a steady-state worst-case value.
Think for a moment about post COVID-19 people flows in buildings or other enclosed spaces. We will not only need to consider the maximum occupancy for a space, but also how occupancy may change dynamically as people move. Modelling of how people move from space to space is now important to consider the likelihood (the risk) of crowding and close contact within the space. Data Science skills can provide a set of tools to model and visualize these dynamic flows.
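A very simple simulated dataset of this kind can be produced with a few lines of Python. The model below is a deliberately crude assumption (at most one arrival per minute, and each person leaving independently), and the probabilities and the crowding limit are made-up values; it is a sketch of the idea rather than a validated people-flow model.

```python
import random

def simulate_occupancy(minutes, arrival_prob, leave_prob, seed=0):
    """Return the number of people in one space at the end of each minute."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    occupancy, history = 0, []
    for _minute in range(minutes):
        # At most one arrival per minute, with probability arrival_prob.
        if rng.random() < arrival_prob:
            occupancy += 1
        # Each person present leaves independently with probability leave_prob.
        leavers = sum(1 for _person in range(occupancy)
                      if rng.random() < leave_prob)
        occupancy -= leavers
        history.append(occupancy)
    return history

history = simulate_occupancy(minutes=480, arrival_prob=0.6, leave_prob=0.05)
limit = 15  # hypothetical crowding threshold for the space
minutes_over = sum(1 for n in history if n > limit)
print(f"peak occupancy: {max(history)}")
print(f"share of minutes above limit: {minutes_over / len(history):.1%}")
```

Instead of a single worst-case number, the output is a time series, so we can ask how often and for how long a space is likely to be crowded.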
3. Benchmarking and Checking
Can we use Data Science to check our design information?
We think that the answer is eventually yes, but we do need to collect enough data to train our algorithms. Consider that, if we can identify spam emails or the likelihood that you will like a movie using algorithms then we can also use the same approach to spot possible design errors. Whether this can be done depends on whether we can collect enough example data to teach the algorithm how to spot mistakes. This is why we say eventually, because this is tomorrow's tool, not today’s.
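Until such trained algorithms exist, a much simpler statistical check gives a feel for the benchmarking idea. The sketch below is not the learning approach described above; it just flags a design value that sits far from the values on previous projects. The cooling loads and the three-sigma limit are assumptions for illustration.

```python
from statistics import mean, stdev

def flag_outlier(new_value, past_values, max_sigma=3.0):
    """Flag a design value that sits far from the benchmark of past values."""
    centre = mean(past_values)
    spread = stdev(past_values)
    if spread == 0:
        return new_value != centre
    return abs(new_value - centre) / spread > max_sigma

# Hypothetical benchmark: cooling load in W/m2 from previous office designs.
past_loads = [95.0, 102.0, 98.0, 110.0, 105.0, 99.0]
print(flag_outlier(101.0, past_loads))  # False
print(flag_outlier(180.0, past_loads))  # True
```

A flag here does not mean the design is wrong, only that it is unusual enough to deserve a second look by an engineer.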
4. Asset Management Proactive Maintenance
Asset Management is the process of the effective and efficient management of any asset in operation. This is the lifecycle phase that usually begins when the project design and construction team has handed the operational, commissioned project, over to the owner or the user - the Asset Manager.
A major part of Asset Management is maintenance of the operational assets, being buildings, or utility infrastructure, such as water or power networks or road and highways etc. The approach to maintenance activity is classified into three types:
Reactive Maintenance - This was yesterday’s strategy. The most traditional, where if something fails, we fix it.
Preventative Maintenance - This is typically today’s strategy. An improvement is to try to take actions to spot potential failures and do something to prevent them. This is Preventative maintenance, and a typical example is where lighting lamps or air conditioning air filters are changed on a routine basis, every year etc.
Predictive Maintenance - This is the strategy for today and tomorrow and where your clients need to be heading. The progressive step is to move to Predictive maintenance, where we can use data to identify and predict where we need to spend our maintenance cash. This gives us the highest reliability, the least disruption (down time) and we use our data to predict what interventions are required.
You can find more information about maintenance approaches in the article about Digital Twins here.
Predictive maintenance is the nirvana for Asset Managers. It means that maintenance interventions are optimized, made only for those items where they will add value, and there is no reduction in the system's reliability. If we can predict when an item may fail, with a probability threshold that we can set and adjust, then we can start to target maintenance spend on the priority items, meaning those that need maintenance interventions now.
This approach is best illustrated by examples from Building Services. If we can monitor the failure conditions of enough items, say fans in an air conditioning system, then we can look for trends and patterns. Perhaps the load, the ambient temperature of the space or the air flow impacts performance. If we have enough data, then our Data Science colleagues can write algorithms that will better predict failures. We can target maintenance activity and start to shave cost from the maintenance margins of our client’s business.
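The fan example can be sketched as follows. This is only an illustration of scoring items and applying an adjustable probability threshold: the weights, the fan identifiers and the readings are all made up, whereas a real model would be fitted to recorded failure data.

```python
import math

def failure_probability(load_pct, ambient_c, weights):
    """Logistic score from two monitored features; weights are hypothetical."""
    score = (weights["bias"]
             + weights["load"] * load_pct
             + weights["ambient"] * ambient_c)
    return 1.0 / (1.0 + math.exp(-score))

def maintenance_list(fans, weights, threshold):
    """Return fan ids whose predicted failure probability exceeds threshold."""
    ranked = []
    for fan_id, load_pct, ambient_c in fans:
        p = failure_probability(load_pct, ambient_c, weights)
        if p > threshold:
            ranked.append((fan_id, p))
    ranked.sort(key=lambda item: item[1], reverse=True)  # most at risk first
    return [fan_id for fan_id, _ in ranked]

# Made-up weights and readings; a real model would be fitted to failure records.
weights = {"bias": -8.0, "load": 0.06, "ambient": 0.08}
fans = [("AHU1-F1", 60.0, 22.0), ("AHU2-F1", 95.0, 35.0), ("AHU3-F1", 85.0, 30.0)]
print(maintenance_list(fans, weights, threshold=0.5))
```

Lowering the threshold lengthens the list and raising it shortens the list, which is how the Asset Manager can trade maintenance spend against risk.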
Some building management systems do provide some predictive functions already, but at a very rudimentary level. As Consulting Engineers, we should be looking to the future and then anticipating our client’s expectations. If Netflix can predict (or be perceived as predicting) the movie that your client wants to watch next, then why can't their Asset Management Systems predict which element will fail first? It's a reasonable question, isn't it?
Be ready for significant improvement in the diagnostics available for Asset Management in the next few years and think about the data that can be collected from your systems to inform and enable that revolution. You can use Data Science as a tool to help you specify what your client needs.
Conclusion
Steve Lohr, a data journalist at the New York Times, said that “Big Data is a very vague term, used loosely, if often, these days. But put simply, the catch-all phrase means three things. First, it is a bundle of technologies. Secondly, it is a potential revolution in measurement. And third, it is a point of view, or philosophy, about how decisions will be - and perhaps should be - made in the future.” Data Science is most easily described as the use and management of ‘Big Data’ sets.
To understand the ‘touch points’ where Data Science and our work as engineers have real synergy, and then to make the connections closer, is an essential win-win for our industry.
There are four ways that Data Science can add value for Consulting Engineers:
Coding for Consulting Engineers
Testing options for Consulting Engineers
Benchmarking and Checking
Asset Management Proactive Maintenance
A good source for a wide ranging and non-technical introduction to the world of Data Science is “Doing Data Science: Straight Talk from the Frontline” by Cathy O'Neil and Rachel Schutt.
Authored by Paul Lengthorn
Chartered Engineer, MBA, BEng, member of the Institute of Asset Management (IAM) and independent practicing Consulting Engineer