Introduction to Project Haystack

So what’s with all the talk about data semantics and tagging?

Most likely, if you are reading this, you have either come to understand what data tagging is and why it’s useful, or you are concerned that the concept is an attempt to force another communication protocol on the industry, or to make you adhere to a rigid naming standard that will not fit your needs. If you are thinking either of the latter, give me a few minutes of your time to try to bridge the gap.

The reason data semantics (a big pretentious word, so let’s just say tagging going forward) is important is that smart devices and equipment are creating dramatic increases in the amount and type of data available from our facilities, AND a new generation of software applications promises to help us benefit from that data. In the buildings industry those benefits include improving the performance of our facilities and equipment systems by detecting faults and inefficiencies; reducing energy use and costs; streamlining maintenance operations; measuring and managing occupant comfort, satisfaction and productivity; and meeting regulatory and reporting requirements.

Data is Everywhere — But…

Device data seems to surround us, but it turns out that it’s one thing to have access to data; it’s another to make it useful and actionable. The primary barrier to using the data from diverse devices and systems is knowing what it means. This problem is not unique to our industry. Throughout our increasingly software-driven world, giving data meaning so that software applications can work with it more effectively is a core challenge being addressed in every segment of the software industry. Tagging is one of the primary approaches that has gained acceptance. Think about tagging your emails in Gmail, or tagging your photos in your favorite photo app so you can find the ones with a specific meaning.

The challenge in our domain (building systems) is that device data are stored in many different formats, communicated via numerous protocols, have inconsistent non-standard naming conventions, and have very limited descriptors to enable us to understand meaning without direct human knowledge of the device producing the data. Ideally we want data to be self-describing. Without that, a time consuming manual effort is required before data can be used effectively to generate value.

Describing the Meaning of Data — Data Semantics

In order to utilize equipment system data in external applications such as analytics, analysis, visualization and reporting tools, we need to know the meaning of our data. For example, if we obtain a data item from a device and it has a value of 77.6, we can’t do any effective analysis of that value until we understand whether the number represents 77.6 degrees F, or degrees C, or PSI, RPM, or kW, or some other unit of measure. “Units” therefore is one good example of a common but essential descriptor we need in order to understand and use data, but it is by no means the only one.

Continuing with our example, if all we know are the units (Deg F), we still don’t know much about the significance of the value 77.6. If it’s a zone temperature it might be a bit warm for occupants. If it’s a return air temperature, it’s right where we want it to be. So we want to know what it is.

Let’s say the sensor is named zn3-wwfl4. If I am intimately familiar with the building system and the naming conventions used when it was installed I may be able to determine that means Zone 3, West Wing, Floor 4. If I know the building really well I may also be able to tell that zn3-wwfl4:

Is a zone temperature

Is an exterior zone

Is south facing

Is supplied by a VAV box

Is served by AHU-1

Is operated on occupancy schedule #1, which is 7:30 AM - 6:30 PM

Has an occupied cooling setpoint of 74 degrees F

Armed with this additional information I could determine that a value of 77.6 is not proper for 9:00 AM on a weekday: it’s too hot and will lead to occupant complaints. What enabled me to make that determination, however, was a significant amount of information about the meaning of the specific sensor, information that I happened to have because of my personal knowledge of the building. I may take this information for granted if I know it, but it was not recorded in the control system, the sensor or any single location, and it is not available in any consistent “machine readable” format. Herein lies the challenge in using the wealth of data produced by today’s systems and devices: we need a method to represent, communicate and interpret the meaning of data. This “data about data” is often referred to as metadata.

Having appropriate descriptive data (metadata) about the sensor zn3-wwfl4 would enable another person (or software application) to understand the impact of the current value of 77.6 without relying on personal knowledge of the building. Without the necessary metadata, however, we can’t determine the impact of the current value and its relationship to proper system operation. So in order to provide effective use of sensor and equipment data we need to combine descriptive metadata with the sensor value.
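To make the zn3-wwfl4 example concrete, here is a minimal Python sketch of what "combining descriptive metadata with the sensor value" could look like. The dictionary keys, the `assess_reading` function, the one-degree tolerance and the weekday-only schedule are all assumptions made for illustration; they are not Project Haystack tag names or part of any Haystack toolkit.

```python
from datetime import datetime, time

# Hypothetical metadata record for the sensor in the running example.
# Keys are illustrative only, not taken from the Project Haystack tag library.
point = {
    "name": "zn3-wwfl4",
    "kind": "zone temperature",
    "unit": "degF",
    "exposure": "exterior, south facing",
    "terminal_unit": "VAV box",
    "served_by": "AHU-1",
    "occupied_start": time(7, 30),   # occupancy schedule #1 starts 7:30 AM
    "occupied_end": time(18, 30),    # and ends 6:30 PM
    "occupied_cooling_setpoint": 74.0,
}

def assess_reading(point, value, when, tolerance=1.0):
    """Classify a reading using only the metadata carried with the point.

    Assumes the schedule applies on weekdays only and allows `tolerance`
    degrees above the cooling setpoint before flagging the zone.
    """
    is_weekday = when.weekday() < 5
    in_schedule = point["occupied_start"] <= when.time() <= point["occupied_end"]
    if not (is_weekday and in_schedule):
        return "unoccupied"
    if value > point["occupied_cooling_setpoint"] + tolerance:
        return "too warm"
    return "ok"

# 9:00 AM on a weekday (2024-01-08 is a Monday).
status = assess_reading(point, 77.6, datetime(2024, 1, 8, 9, 0))
print(status)
```

With the metadata attached, the same judgment a building expert would make can be reached by software that has never seen the building.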

When done manually, this process is referred to as mapping or “data wrangling”. This step in the utilization of equipment data has historically been a time consuming manual process that adds significant cost to the implementation of software applications such as analytics, data visualization and reporting.

Despite all of the power they have gained over the last decade, and the adoption of standard communication protocols, most building automation systems and “smart” equipment systems provide little to no ability to represent and communicate semantic information about the data they contain beyond some of the simplest attributes like units. There has been no standardized approach to representing the meaning of the data they generate or contain. Systems provide a name (which is typically ad hoc and follows no universal standard), a value, and units, but little other information. The result is that a labor-intensive process is required to “map” the data before any effective use of the data can begin. Clearly, this creates a significant barrier to effective use of the growing amount of data available from smart devices.

Representing Metadata with Tagging

So how can we capture this descriptive information, associate it with the data items in our automation systems and smart devices, and share it with other applications and people? We cannot do it simply by trying to use standardized point names. Even in our simple example we have more metadata than can be effectively captured in a point name. Add to that the fact that we may want to add numerous other metadata items over time and it’s obvious we need another approach. An effective solution needs to have the following characteristics:

It should de-couple the point name from the associated metadata. The concept of tags to represent the metadata works well here. Tags represent “facts” about data items and can be associated with the point names to provide information that describes the point. They tell us about the meaning of the point, but they do not replace or change the point name in any way. This is essential for any solution to work with existing systems. The reality is that we have millions of points in thousands of systems and their point names cannot be changed. It’s simply not an option, and it isn’t necessary. What is needed is a standardized model for associating metadata with those existing data items, so that meaning can be attached to the existing point names.
It should utilize a standardized library of “tags” to provide consistency of metadata terminology. This will enable automated tools to interpret data meaning. The library needs to be able to be updated by industry experts as new applications are encountered. The metadata methodology therefore needs to be extensible. Tags should also make it possible for both humans and machines to interpret the meaning of the data.
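The two characteristics above can be sketched in a few lines of Python: point names are left exactly as they are, tags are kept in a separate mapping, and a shared vocabulary lets a tool check and query them. The tag names, the extra point names `AHU1_RAT` and `zn3-wwfl4-clg-sp`, and the helper functions are hypothetical and chosen only for illustration; the published tag library on the Project Haystack site is the authority on actual tag names.

```python
# Stand-in for a standardized tag library (illustrative tag names only).
TAG_LIBRARY = {"point", "sensor", "sp", "zone", "air", "temp", "return", "cooling"}

# Existing point names stay untouched; tags live in a separate mapping.
point_tags = {
    "zn3-wwfl4": {"point", "sensor", "zone", "air", "temp"},
    "AHU1_RAT": {"point", "sensor", "return", "air", "temp"},
    "zn3-wwfl4-clg-sp": {"point", "sp", "zone", "air", "temp", "cooling"},
}

def unknown_tags(tags):
    """Return the tags that are not part of the shared library, sorted."""
    return sorted(set(tags) - TAG_LIBRARY)

def find_points(*required):
    """Return the names of all points that carry every required tag."""
    wanted = set(required)
    return sorted(name for name, tags in point_tags.items() if wanted <= tags)

zone_temp_sensors = find_points("zone", "temp", "sensor")
print(zone_temp_sensors)
print(unknown_tags({"zone", "tmep"}))  # a misspelled tag is caught by the library check
```

Because the query is expressed in tags rather than names, it keeps working no matter which naming convention each installer used.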

The Role of Project Haystack in Addressing the Metadata Challenge

The role of Project Haystack is to bring together a community of industry constituents in a collaborative, open-source effort to develop a standardized approach to representing and using metadata across a wide range of applications. Project Haystack provides a flexible, extensible methodology for representing and conveying metadata, standardizes semantic data models for common equipment systems (tag sets to describe equipment systems) and defines standard web services to communicate metadata between applications.

The Project Haystack vision is to streamline the use of data from the Internet of Things including, but not limited to, building and energy systems, by creating a standardized approach to defining “data semantics” and related services and APIs to consume and share the data and its semantic descriptors. Project Haystack makes it easier to unlock value from the vast quantity of data being generated by smart devices by making data “self-describing”.

An Open-Source, Community-Driven Approach

Project Haystack is operated as an open source, community-driven initiative modeled on the open source efforts found in the software industry, which makes it easy for anyone to get involved. Anyone can take advantage of the work of Project Haystack and contribute to it. All collaboration is done on the discussion forum at http://project-haystack.org/. Anyone can contribute on the forum by signing up on the web site.

Domain experts in a given area such as chillers, data centers, or refrigeration can join or start a discussion. Equipment manufacturers who would like to see specific tag models for their products are also a great source of input. All of the work done by Project Haystack is easily available to the industry. It can be downloaded without even registering an account on the web site. There is no cost or obligation associated with using Project-Haystack techniques, tagging libraries and open source reference implementations.

Project Haystack is More Than Metadata Tagging

It’s important to note that Haystack is more than one thing.

First, it’s the data modeling methodology — the simple, flexible tagging approach that can be used in media from Excel spreadsheets and CSV text files, to data tables in embedded devices, XML representations, web services and others.

Second, it’s the consensus developed tagging libraries (taxonomies) published and made available for download and use (at no cost). You can find all of the tagging libraries developed by the Project Haystack community here: http://project-haystack.org/tag

Third, it’s the REST style communication services designed to exchange Haystack tags between applications.

Fourth, it’s the software reference implementations and complementary applications being developed by various community members and companies. As of the date of publication these included:

Haystack Java Toolkit: lightweight J2ME compliant client and server implementation
NHaystack: Niagara module to add Haystack tagging and the Haystack REST API
Haystack CPP: C++ Haystack client and server implementation
Haystack Dart: client library for Dart programming language
NodeHaystack: node.js client/server implementation
PyHaystack: Python Haystack toolkit

You can find links to download all of these software reference implementations here: http://project-haystack.org/download

Perhaps most important, however, is the community that has formed to address the challenges of data modeling for building systems and IoT devices. The Project Haystack community continues to grow and expand the equipment and device models and extend the range of applications served by Project Haystack.

A Flexible, Extensible Model that Can Be Used Beyond Consensus Approved Tags

It is also important to mention that the Project Haystack methodology can be used beyond just the consensus approved tags available at any point in time. Projects and products can add custom tags to represent information important to their needs without requiring submission or approval. Those custom tags can still be discovered and interpreted by other applications with minimal effort, by adding a reference index or look-up table. Applications that are “Haystack tag-aware” can be easily extended to interpret new community approved tags and custom tags as they are added. Numerous companies have started their adoption of Haystack by developing their own Haystack-compatible tagging libraries and then returned that work to the community over time.
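As a rough sketch of the "reference index or look-up table" idea, the Python below separates community-approved tags from project-specific ones and uses a small table to explain the custom tags to another application. The tag names (`southFacing`, `leaseId` and the rest), the descriptions and the `describe_tags` function are invented for this example and are not defined by Project Haystack.

```python
# Stand-in for the community-approved library (illustrative tag names only).
STANDARD_TAGS = {"point", "sensor", "zone", "air", "temp"}

# Hypothetical look-up table a project could publish so that other
# applications can interpret its custom tags.
CUSTOM_TAG_INDEX = {
    "southFacing": "Zone has a south-facing exterior wall",
    "leaseId": "Identifier of the tenant lease covering the zone",
}

def describe_tags(tags):
    """Label each tag as standard, custom (with its description), or unknown."""
    result = {}
    for tag in tags:
        if tag in STANDARD_TAGS:
            result[tag] = "standard"
        elif tag in CUSTOM_TAG_INDEX:
            result[tag] = "custom: " + CUSTOM_TAG_INDEX[tag]
        else:
            result[tag] = "unknown"
    return result

report = describe_tags(["zone", "temp", "southFacing", "mystery"])
for tag, meaning in report.items():
    print(tag, "->", meaning)
```

A tag-aware application only needs the published index to make sense of the custom tags; anything it still cannot interpret is reported rather than silently ignored.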

What Will the World Look Like When We Have Standard Tagging Models to Describe Building, Equipment and Device Data?

Today, using building system data beyond the system generating it requires significant time and effort because the contextual models that describe what the data means must be built by hand. This means that the opportunity to use data to reduce energy use and improve operational efficiency and performance is being stifled by the labor costs associated with manual mapping of system data and the inconsistency of those efforts. Project Haystack defines a common vocabulary so that we can build models of our buildings and systems, enabling us to more efficiently derive value from all the data our building automation systems and smart devices are producing. With Project Haystack models we can transition from a manual process to an automated process and unlock the value of our operational data.

With metadata “tagging”, software applications can automatically find and interpret the data they need to provide value to the user. Two specific examples of the benefits of metadata tagging being accomplished today include:

Software today can automatically generate equipment graphics and system views simply by interpreting tags on the control system data. Hours of manual graphics assembly are thereby eliminated, reducing project cost and increasing value creation.
With proper metadata, analytics applications can quickly consume data from equipment systems and interpret patterns in operational data to identify faults, deviations, and trends that can be addressed to improve efficiency and ensure proper operation of equipment systems.

Project Haystack enables a future where a push of a button can turn data into true intelligence, reducing the cost of intelligent building systems and enabling operational teams to better understand and improve the operation and performance of buildings.

More Information:

Complete information on Project Haystack can be found at www.project-haystack.org.
