Hydrocarbon Processing Copying and distributing are prohibited without permission of the publisher

Use crowdsourcing to boost process simulation

06.01.2014 | Brown, S., Schneider Electric, Houston, Texas

There are many ways to enhance the user experience of next-generation simulators by borrowing concepts from crowdsourcing, data mining and related fields.

Keywords: [crowdsourcing] [process simulation] [ergonomics] [corpora] [filter] [application]

The next generation of process simulators is under development. These tools must have a high degree of ergonomics and usability to be embraced by engineers who have grown accustomed to the intuitive interfaces of tablets, smartphones and modern business software.

A fundamental principle in software ergonomics is to minimize the number of clicks required to accomplish tasks, especially common ones. Some smartphones achieve this with keyboard apps that have an uncanny ability to predict the next word you are going to type. They know that certain words are statistically likely to appear together, like “Tuesday night” and “ice cream,” as shown in Fig. 1.

  Fig. 1.  SwiftKey predicts the next word.

A cursory examination of a few process simulations reveals a similar fact—that certain combinations of unit operations, for example, “compressor → heat exchanger,” occur frequently. Equally important, many combinations are almost never encountered. For example, an expander would rarely follow a compressor, because they do opposite things.

Simulators can use a combination of statistics, heuristics and domain knowledge to predict downstream objects (anticipating their users’ needs) and improve the simulation building experience.

Improved workflow

The current generation of simulators falls short of the “minimize clicks” ideal in the flowsheet building phase. Simply put, the user spends too much time searching model palettes for the desired object. This problem will worsen as future simulators provide ever-larger palettes to accommodate expanding markets.

If a simulator could reliably predict the next object in a simulation, there would be no need to search the large palette, and the flowsheet building workflow might look like Fig. 2. The goal is to display a small smart palette that always contains the object that the user wants. To achieve this, the palette will need to be context sensitive, showing only the most likely downstream objects for that particular object.

  Fig. 2.  Proposed workflow.

Crowd help

The smart palette could, of course, be populated by the simulation vendor, using its best guess for common object pairs, but this would reflect the vendor’s experiences and biases and might miss some important cases. A better strategy would be to survey the user community or, equivalently, to data-mine object pairs from the body of simulations its members have built. Using the community of simulation users indirectly to recommend the next object in a simulation is essentially the same technique used in text prediction. The fundamental premise of this implicit crowdsourcing approach is that if a lot of people are doing something (like connecting a compressor model to a heat exchanger), then it is probably a good idea.

Corpus linguistics

One popular technique for text prediction is corpus linguistics, which extracts statistical data on word combinations from large databases of real-world language samples (the corpora).1 Some readily available corpora include:

  • Every article published in the Wall Street Journal
  • Every English language book ever published (Google is compiling this corpus as part of its Google Books project)
  • The complete works of William Shakespeare.

Suppose the word “ice” occurred 2,000 times in the Wall Street Journal during the last five years. If we look at the following word for every case, we can establish that “ice cream” was more commonly used than “ice skating,” as shown in Table 1.
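The counting step described above can be sketched in a few lines of Python. This is only an illustration: the sample sentence is invented, not taken from any newspaper corpus, and the function name `next_word_counts` is a hypothetical helper.

```python
from collections import Counter

def next_word_counts(text, target):
    """Count the words that immediately follow `target` in `text`."""
    words = text.lower().split()
    return Counter(nxt for cur, nxt in zip(words, words[1:]) if cur == target)

# Toy text, invented for illustration only (not real newspaper data).
sample = ("ice cream sales rose as ice skating rinks closed "
          "and ice cream shops opened")

counts = next_word_counts(sample, "ice")
total = sum(counts.values())
for word, n in counts.most_common():
    # Relative frequency of each word pair that starts with "ice"
    print(f"ice {word}: {n} of {total}")
```

With a real corpus, the same tally over every occurrence of a word gives the ranking that a text-prediction app would display.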


Simulation corpora

To predict simulation objects, we could gather thousands of real-world process simulations, from which we could extract the probability of any object following any other. The more simulations in this simulation corpus, the better the predictive ability; this is, after all, a statistical technique, and it is governed by the law of large numbers.
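As a rough sketch of how such probabilities might be extracted, the Python below tallies upstream-to-downstream object pairs and returns the most likely successors. The corpus format (each simulation as a list of connection pairs), the object names and the helper names `build_pair_counts` and `smart_palette` are assumptions made for illustration, not the format of any actual simulator.

```python
from collections import Counter, defaultdict

def build_pair_counts(simulations):
    """Tally (upstream type -> downstream type) pairs over all simulations."""
    counts = defaultdict(Counter)
    for connections in simulations:
        for upstream, downstream in connections:
            counts[upstream][downstream] += 1
    return counts

def smart_palette(counts, upstream, size=4):
    """Return the `size` most likely downstream types with their probabilities."""
    following = counts.get(upstream, Counter())
    total = sum(following.values())
    return [(obj, n / total) for obj, n in following.most_common(size)]

# Hypothetical corpus: each simulation is a list of (upstream, downstream) pairs.
corpus = [
    [("compressor", "heat exchanger"), ("heat exchanger", "flash"),
     ("flash", "compressor")],
    [("compressor", "heat exchanger"), ("heat exchanger", "column"),
     ("compressor", "flash")],
]

pair_counts = build_pair_counts(corpus)
print(smart_palette(pair_counts, "compressor"))
```

An object type that never appears upstream in the corpus simply yields an empty palette, which is one reason a small corpus may need to be supplemented with heuristics.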

In practice, however, simulator vendors do not have access to many of their customers’ models, which often contain confidential process data. Perhaps a cloud-based simulator could collect general patterns from customer simulations, but, for now, any simulation corpus is likely to be small and a purely statistical prediction approach might be insufficient. Fortunately, this drawback will be more than offset by the considerable advantages that the process simulation community has over the text-prediction folks:

  • The vocabulary is much smaller: The English language contains tens of thousands of words, while even next-generation process simulators contain model types numbering only in the low hundreds
  • Domain knowledge of the process being simulated is available
  • The processes simulated are constrained by physics, which is more straightforward than the rules of grammar that constrain proper writing.

Segregated corpora

It is important that a corpus be representative of what it strives to predict. A phone app designed to assist texting teenagers should not use the (very narrow) Shakespearean corpus as the basis for its prediction. An overly general corpus can also be a problem. A reporter writing about the Winter Olympics would be better served by a narrow Sports Illustrated corpus, which would certainly rank ice skating and ice hockey above ice cream.

In process simulation, models created for operator training simulators (OTSs), design and real-time optimization (RTO) have different characteristics, and a corpus that predicts well for one type may not be as successful for another. OTS models focus on high-fidelity reproduction of the plant’s dynamics and include many pumps, valves, pipes and controllers. An RTO model of the same process typically omits these and any other objects that do not constrain or otherwise affect the steady-state plant economics.

Process knowledge

Text prediction in its simplest form predicts the next word based on the previous word, but this strategy does not work well for words like “and,” which can be followed by almost any word. One solution to this problem is to consider two or more previous words. This expanded context allows Android’s SwiftKey app to predict familiar triplets like these:

  • Bacon and eggs
  • Snow and ice.

Analogously, the model prediction algorithm can be improved by considering more than just the upstream object type.
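One way to picture the expanded context is a predictor that looks at the previous two objects and falls back to the previous one when that pair has not been seen. The sketch below assumes, for simplicity, that a flowsheet has been flattened into linear paths of object types; the paths, names and fallback rule are illustrative assumptions rather than a described implementation.

```python
from collections import Counter, defaultdict

def build_context_counts(paths):
    """Tally successors keyed by one previous object and by two previous objects."""
    one = defaultdict(Counter)
    two = defaultdict(Counter)
    for path in paths:
        for i in range(1, len(path)):
            one[path[i - 1]][path[i]] += 1
            if i >= 2:
                two[(path[i - 2], path[i - 1])][path[i]] += 1
    return one, two

def predict(one, two, history, size=4):
    """Prefer the two-object context; fall back to the last object alone.

    `history` is assumed to be a non-empty list of object types.
    """
    key = tuple(history[-2:])
    if len(history) >= 2 and key in two:
        table = two[key]
    else:
        table = one.get(history[-1], Counter())
    return [obj for obj, _ in table.most_common(size)]

# Hypothetical flow paths, invented for illustration.
paths = [
    ["compressor", "mixer", "heat exchanger"],
    ["compressor", "mixer", "heat exchanger"],
    ["pump", "mixer", "column"],
]

one, two = build_context_counts(paths)
print(predict(one, two, ["mixer"]))          # context: mixer only
print(predict(one, two, ["pump", "mixer"]))  # context: pump -> mixer
```

In this toy data a mixer on its own is ambiguous, while knowing that a pump feeds it narrows the suggestion to a single object.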

Filtering by fluid type

In process modeling, a source object can be followed by almost anything, and our simulation corpus might yield statistics like those in Fig. 3 (left). The same is true for many other objects including mixers and splitters. The long flat trend indicates that sources have diverse uses, and there are thus no obvious choices to show in the smart palette.

Fig. 3 (right) shows how the histogram may look when only considering air sources, containing O2 and N2. The downstream objects are dominated by only a few types and the prediction ability is greatly improved.

  Fig. 3.  Sharpening the prediction by considering fluid type.

There are many common scenarios where knowledge of the fluid type can improve the prediction capability, for example:

  • Air, when modeled as a single component, will almost always be used for cooling. So a heat exchanger or fan will almost always be the downstream unit
  • A full-boiling crude stream in an RTO model will probably only go to these units: heat exchanger, splitter, mixer, furnace, column, flash, pump, desalter and blender.
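The sharpening effect of fluid type can be illustrated by keying the statistics on the pair (object type, fluid type) rather than on object type alone. The records below are invented for illustration, and `top_share`, which measures how much of the distribution the top few entries cover, is a hypothetical helper.

```python
from collections import Counter, defaultdict

def build_fluid_counts(records):
    """Tally downstream objects by upstream type, and by (upstream type, fluid)."""
    by_type = defaultdict(Counter)
    by_type_and_fluid = defaultdict(Counter)
    for upstream, fluid, downstream in records:
        by_type[upstream][downstream] += 1
        by_type_and_fluid[(upstream, fluid)][downstream] += 1
    return by_type, by_type_and_fluid

def top_share(counter, size):
    """Fraction of observations covered by the `size` most common entries."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counter.most_common(size)) / total

# Hypothetical (upstream, fluid, downstream) records, invented for illustration.
records = [
    ("source", "air", "heat exchanger"),
    ("source", "air", "heat exchanger"),
    ("source", "air", "heat exchanger"),
    ("source", "air", "fan"),
    ("source", "crude", "pump"),
    ("source", "crude", "desalter"),
    ("source", "natural gas", "compressor"),
    ("source", "steam", "column"),
]

by_type, by_type_and_fluid = build_fluid_counts(records)
print(top_share(by_type["source"], 2))                     # all sources
print(top_share(by_type_and_fluid[("source", "air")], 2))  # air sources only
```

A higher share for the filtered case means a short palette is more likely to contain the object the user wants.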

Filter by stream properties

Simulators that solve the model as it is being built can further improve the smart-palette contents by considering the calculated stream properties. Knowledge of the phase is the most important:

  • A stream with liquid will never be fed to a compressor, expander, or any other object that requires vapor feeds
  • A stream with vapor will not feed a pump or tank
  • A mixed-phase stream is likely to be sent to an adiabatic flash or column for separation
  • A mixed-phase stream will rarely have a flowmeter on it.

To a lesser extent, the filter can also draw on a stream’s temperature, pressure and composition:

  • A very-low-pressure stream will probably not be sent to an expander
  • A refinery stream with a high sulfur content is a candidate for hydrodesulfurization
  • A very pure stream will probably not be sent to a column.
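The phase rules listed above lend themselves to a simple filter applied to the statistical candidates. The sketch below encodes only the two hard rules (no liquid to compressors or expanders, no vapor to pumps or tanks); the rule table, the representation of a stream as a set of phases and the function name are assumptions for illustration. The softer rules on temperature, pressure and composition would need thresholds that are not given here.

```python
# Hypothetical rule table: the phase that each object type must not receive.
REJECTED_PHASE = {
    "compressor": "liquid",
    "expander": "liquid",
    "pump": "vapor",
    "tank": "vapor",
}

def filter_palette(candidates, phases_present):
    """Drop candidates whose rejected phase is present in the stream.

    `phases_present` is a set such as {"vapor"}, {"liquid"} or
    {"vapor", "liquid"} for a mixed-phase stream. Object types without
    a rule are always kept.
    """
    return [c for c in candidates
            if REJECTED_PHASE.get(c) not in phases_present]

# Candidate list as it might come from the corpus statistics (invented).
candidates = ["heat exchanger", "compressor", "pump", "flash"]

print(filter_palette(candidates, {"vapor"}))
print(filter_palette(candidates, {"liquid"}))
print(filter_palette(candidates, {"vapor", "liquid"}))
```

Because the filter preserves the order of the incoming list, the statistical ranking of the surviving candidates is unchanged.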

Best practices

The smart palette can encourage best modeling practices by alerting users to options that they may not have otherwise considered. Stream S3 in Fig. 4 (left), for example, has 200 components. But, due to the upstream column, 75% of those components are trace. The user originally intended to connect it to another column, but sees that the smart palette highly recommends a component lumper (a utility that simplifies models by consolidating components). The user realizes that the lumper is indeed a good choice because the downstream column will solve faster and more reliably with a reduced component slate, as shown in Fig. 4 (right).

  Fig. 4.  Stream S3 (left) has 200 components, but, due to
  the upstream column, 75% of those components are trace.
  The user realizes that the lumper is indeed a good choice
  because the downstream column will solve faster and
  more reliably with a reduced component slate (right).

Proof of concept

In this example, only the basic ideas described earlier are tested by trying to predict the downstream units for existing RTO models without the phase- or property-based filtering. To avoid any systematic bias that might overstate the prediction ability, the models were provided by different operating companies and were built by different engineers. The corpus consists of only a single simulation—a large (1,795 objects) real-world ethylene plant RTO, containing mostly the separation equipment downstream of the cracking furnaces.

First, extract all unit pairs from the ethylene plant model and sort the results. Table 2 shows the probability of an object following the model’s 380 heat exchangers (HXs) and Table 3 shows the destinations of the flash vapor. The top four destinations, highlighted in red, will be shown in the mini-palette.



Next, extract the connectivity information from two other RTO models, a gas plant and a crude unit, and calculate how often the ethylene corpus would have predicted the downstream objects in these models. Table 4 summarizes the results. Observations:

  • The highlighted cells show encouraging success rates: The actual downstream unit appeared in the four-item smart palette 67% of the time for the crude unit and 72% for the gas plant
  • The last column shows that the prediction improves to over 80%, by showing a nine-item palette
  • The second and third columns show just how small the “model vocabulary” is in the simulations
  • Even though this particular simulator offers its users over 150 object types, the gas plant model, with over 1,400 objects, uses only 20 different types.
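The success rates in this kind of test can be computed as a hit rate: the fraction of connections in the test model whose actual downstream object appears in the palette built from the corpus. The sketch below uses small invented data, not the ethylene, gas plant or crude unit models, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_pair_counts(pairs):
    """Tally downstream object types for each upstream object type."""
    counts = defaultdict(Counter)
    for upstream, downstream in pairs:
        counts[upstream][downstream] += 1
    return counts

def hit_rate(counts, test_pairs, size):
    """Fraction of test connections whose downstream object is in the palette."""
    hits = 0
    for upstream, downstream in test_pairs:
        ranked = counts.get(upstream, Counter()).most_common(size)
        palette = [obj for obj, _ in ranked]
        hits += downstream in palette
    return hits / len(test_pairs)

# Invented training corpus and test model, for illustration only.
training_pairs = (
    [("HX", "flash")] * 3 + [("HX", "column")] * 2 + [("HX", "mixer")]
    + [("flash", "compressor")] * 2 + [("flash", "HX")]
)
test_pairs = [
    ("HX", "flash"),
    ("HX", "mixer"),
    ("flash", "compressor"),
    ("source", "HX"),  # "source" never appears upstream in the corpus
]

counts = build_pair_counts(training_pairs)
print(hit_rate(counts, test_pairs, size=2))
print(hit_rate(counts, test_pairs, size=3))
```

Enlarging the palette raises the hit rate in this toy case, but a pair that is absent from the corpus is missed at any palette size.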

Lessons learned

Most prediction failures were a consequence of using a very small (1,795 objects) and narrow (ethylene domain) corpus. The ethylene plant model had a few idiosyncrasies that limited its prediction ability, and there were no other models in the corpus to balance them out. For example, it had no “source → heat exchanger” connections, so the source’s smart palette did not show an HX. Fig. 5 (top) shows why: the builder of this model chose to use a single air source, split to many HXs. The gas plant model used a different, but equally valid, approach, as shown in Fig. 5 (bottom). This was the cause of many prediction misses; however, all of them would have been predicted if the “air is likely to go to an HX” heuristic discussed earlier had been enabled. They also would have been predicted with a larger corpus, since “source → HX” is indeed a common object pair.

Individual preference. The goal is not to judge which approach in Fig. 5 is better; rather, it is to have a corpus that represents diverse processes and modeling styles so that the crowd will decide which objects appear in the palette. But the smart palette should also dynamically adapt to an individual’s style and preferences. Text prediction apps do this, and it would probably be appreciated by the simulation community as well. One simple way to achieve dynamic palettes is to include a user’s past-built models in the corpus, with appropriate weighting.
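A minimal sketch of the weighting idea follows, assuming each community connection counts once and each connection from the user's own past models counts `user_weight` times. The weight value, the data and the function names are illustrative assumptions; the appropriate weighting is not specified here.

```python
from collections import defaultdict

def weighted_counts(community_pairs, user_pairs, user_weight=5.0):
    """Combine community and user statistics, boosting the user's own pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for upstream, downstream in community_pairs:
        counts[upstream][downstream] += 1.0
    for upstream, downstream in user_pairs:
        counts[upstream][downstream] += user_weight
    return counts

def palette(counts, upstream, size=4):
    """Return the `size` highest-weighted downstream object types."""
    following = counts.get(upstream, {})
    return sorted(following, key=following.get, reverse=True)[:size]

# Invented data: the community usually splits one air source through a mixer
# or splitter, while this user connects sources directly to heat exchangers.
community_pairs = [("source", "mixer")] * 3 + [("source", "heat exchanger")]
user_pairs = [("source", "heat exchanger")]

personal = weighted_counts(community_pairs, user_pairs, user_weight=5.0)
crowd_only = weighted_counts(community_pairs, user_pairs, user_weight=0.0)
print(palette(personal, "source"))
print(palette(crowd_only, "source"))
```

With the user's models weighted in, the palette order follows that user's habit; with a weight of zero it reverts to the crowd's ranking.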

  Fig. 5.  Different ways
  to model air-cooled heat 

To the future

The next generation of process simulators will do more than ever before. They will support design, dynamics and optimization activities in a single application. They will extend the benefits of process modeling beyond the traditional chemical, oil and gas industries. To prevent users from getting lost in this breadth of features, these simulators will need superb usability, including tools that not only reduce clicks, but that also support and encourage best practices.

Promising results have emerged from the idea discussed here: an implicit crowdsourcing application inspired by the field of text prediction. There are undoubtedly many ways in which the user experience of next-generation simulators can be enhanced by borrowing concepts from crowdsourcing, data mining and other such fields, which have matured greatly since the current generation of simulators was designed.


1 Hirschberg, J. (n.d.). N-Grams and Corpus Linguistics. Retrieved from www.cs.columbia.edu/~julia/courses/CS4705/ngrams.ppt.

The author
Scott Brown is a consulting engineer with Invensys, which is now part of Schneider Electric, concentrating on the company’s SimSci software products. During his 20 years with the company, Dr. Brown has been a developer of the company’s online optimization software and has authored training classes for its software products. His professional interests include applied mathematics, process simulation and ergonomics. He has a PhD degree in chemical engineering from Princeton University. 

© 2016 Hydrocarbon Processing. © 2016 Gulf Publishing Company.