SQL Develop

Stream and Complex Event Processing from a Relational Guy’s Eye (StreamInsight Series)

This is the first in (hopefully) a series of blog posts where I will be looking into Microsoft’s new technology for Complex Event Processing (CEP): StreamInsight (SI). This post is an overview of the problem domain that Microsoft is targeting with SI. As I am a relational database guy at heart, I look at it from a relational guy’s perspective.

Relational Database Systems

The relational database system (RDBMS) is the backbone of almost any enterprise application today, and the various RDBMSs are highly optimized to deliver the best available performance for their particular type of application. That type of application is (mostly) one where updates to the data don’t happen that frequently (i.e. not hundreds of thousands of updates per second) and where queries run against what can be described as a snapshot of the database.

Over the last couple of decades we have seen the emergence of types of applications that have somewhat different requirements and characteristics than a typical RDBMS-based application. Examples of these types of applications are OLAP and data mining, as well as the storage and querying of new data types such as XML, media and spatial. This has required RDBMSs to add new functionality as well as extend existing functionality.

Streaming Data

In the last few years yet another type of data-intensive application has arrived on the scene, and these applications have somewhat different requirements than “just” being able to query “static” data. These are applications where data can potentially arrive at very high frequency, where we may need to run queries against this data continuously, and / or where we derive new types of data from the arriving data (changing the schema of the original data) – which we may also want to run queries against. I am talking about Stream Data Processing (SDP) and Complex Event Processing (CEP) applications.

The main differences between a typical RDBMS application and an SDP/CEP application are:

  • The data in an SDP/CEP application can be never ending, i.e. the data arrives continuously.
  • When we query data in an RDBMS app, we do it against a static snapshot of the data at that particular time. The data is evaluated once – and output once.
  • Querying SDP/CEP data, however, is typically done in a continuous fashion. The data is continuously evaluated and output.
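
To make the difference concrete, here is a minimal sketch in Python (not StreamInsight code; the function names and the sample readings are made up for illustration). The first function evaluates once over a fixed snapshot, the second re-evaluates and emits a result every time a new value arrives:

```python
from typing import Iterable, Iterator, List


def snapshot_average(rows: List[float]) -> float:
    """RDBMS style: evaluate once over a fixed snapshot, output once."""
    return sum(rows) / len(rows)


def continuous_average(stream: Iterable[float]) -> Iterator[float]:
    """SDP/CEP style: re-evaluate and emit a result for every arriving value."""
    total = 0.0
    count = 0
    for value in stream:
        total += value
        count += 1
        yield total / count


# Hypothetical sample data; a real stream would never end.
readings = [10.0, 20.0, 30.0]

one_shot = snapshot_average(readings)
running = list(continuous_average(iter(readings)))

print(one_shot)  # a single answer for the whole snapshot
print(running)   # one answer per arriving value
```

The snapshot query only makes sense once the data set is complete, whereas the continuous version never needs to see "all" of the data before producing output.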

RDBMS vs. SDP/CEP

We can use RDBMS systems for SDP/CEP applications; we load the incoming data into the database and then we run queries continuously against the stored data. This will work OK, but we will run into some issues with it:

  • By storing the data before we query it, we are adding latency as per Figure 1 below.
  • We may have to write some convoluted queries in order to query the data in a continuous manner.
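
A rough sketch of this store-then-query pattern, using Python's built-in sqlite3 module as a stand-in for the RDBMS (the table name, the helper functions and the "last seen id" bookkeeping are assumptions made for illustration, not something prescribed by any product):

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")


def store(value: float) -> None:
    """Step 1: every incoming event is written to the table first."""
    conn.execute("INSERT INTO readings (value) VALUES (?)", (value,))
    conn.commit()


def poll(last_seen_id: int) -> Tuple[int, List[float]]:
    """Step 2: a poller asks only for rows it has not seen yet."""
    rows = conn.execute(
        "SELECT id, value FROM readings WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seen_id
    return new_last, [value for _, value in rows]


store(1.0)
store(2.0)
last_id, first_batch = poll(0)

store(3.0)
last_id, second_batch = poll(last_id)

# Polling again without new data returns nothing.
_, empty_batch = poll(last_id)
```

Both issues show up even in this toy version: an event is only visible after it has been stored and the next poll runs, and the query has to carry extra state (here the last seen id) just to imitate continuous evaluation.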

Figure 1: RDBMS Handling Stream Data

So, even if we can use RDBMSs for SDP/CEP type applications, it is fairly obvious that this may not be the best approach. Hence the rise of another type of management system for SDP/CEP applications: the Data Stream Management System (DSMS).

DSMS systems work under the premise that we have some sort of server (running in memory) which hosts the application(s) that handle the incoming data. The incoming data is fed to the application(s) through input adapters. In the application(s) there are continuous queries running over the data from the input adapters. The results of the queries are then fed to output adapters, which serve up the data to the applications that need it. Figure 2 tries to illustrate a DSMS system.

Figure 2: General Overview of DSMS
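
The flow from input adapter through continuous query to output adapter can be sketched with plain Python generators. This is only an illustration of the general shape; the names `input_adapter`, `continuous_query` and `output_adapter`, the `Reading` type and the "sensor, value" text format are all invented here and are not StreamInsight's API:

```python
from typing import Iterable, Iterator, List, NamedTuple


class Reading(NamedTuple):
    sensor: str
    value: float


def input_adapter(raw_lines: Iterable[str]) -> Iterator[Reading]:
    """Hypothetical input adapter: turns raw 'sensor, value' text into events."""
    for line in raw_lines:
        sensor, value = line.split(",")
        yield Reading(sensor.strip(), float(value))


def continuous_query(events: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Standing query: passes on each reading above the threshold as it arrives."""
    for event in events:
        if event.value > threshold:
            yield event


def output_adapter(results: Iterable[Reading], sink: List[str]) -> None:
    """Hypothetical output adapter: hands results to a consuming application."""
    for result in results:
        sink.append(f"{result.sensor}: {result.value}")


raw = ["a, 1.5", "b, 9.0", "a, 7.25"]
delivered: List[str] = []
output_adapter(continuous_query(input_adapter(raw), threshold=5.0), delivered)
print(delivered)
```

Because each stage is a generator, an event flows through the whole chain as soon as it arrives; nothing is stored and re-read along the way.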

Depending on the DSMS system, the query language may vary. Quite a few systems use languages that are fairly similar to SQL, whereas SI uses LINQ. As we can see from Figure 2, the main part of the DSMS runs in a low-latency environment, and it is only if we need some sort of look-up data loaded from an RDBMS that we may run into high-latency issues.

Complex Event Processing

So what is the difference between processing the streaming data and doing CEP? In CEP we look at the individual events, try to correlate them and look at the impact on a macro level. A typical example of this (quite a few DSMS systems use it as an example) is where we collect sensor signals from cars; let’s say each car sends out a signal every 30 seconds. This signal contains information about position, speed, road, lane in the road etc. When analyzing these event signals, we say that a car crash has happened if any given car reports the same position and 0 speed in 4 consecutive signals. We have analyzed the individual events and from them derived a new event: a Complex Event.
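
The car-crash rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how a DSMS would express it; the `Signal` and `Crash` types, the field names and the sample data are assumptions:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, NamedTuple, Tuple


class Signal(NamedTuple):
    car_id: str
    position: Tuple[float, float]
    speed: float


class Crash(NamedTuple):
    car_id: str
    position: Tuple[float, float]


def detect_crashes(signals: Iterable[Signal], needed: int = 4) -> Iterator[Crash]:
    """Derive a Crash event when a car reports the same position and
    zero speed in `needed` consecutive signals."""
    # Per car, remember only the most recent `needed` signals.
    recent: Dict[str, Deque[Signal]] = defaultdict(lambda: deque(maxlen=needed))
    for signal in signals:
        window = recent[signal.car_id]
        window.append(signal)
        stationary = (
            len(window) == needed
            and all(s.speed == 0 for s in window)
            and all(s.position == window[0].position for s in window)
        )
        if stationary:
            yield Crash(signal.car_id, signal.position)
            # Start a fresh window so the crash is not re-reported
            # on the very next signal.
            window.clear()


# Hypothetical sample: car A stands still, car B keeps moving.
signals = [
    Signal("A", (1.0, 2.0), 0.0),
    Signal("B", (5.0, 5.0), 60.0),
    Signal("A", (1.0, 2.0), 0.0),
    Signal("B", (5.5, 5.0), 60.0),
    Signal("A", (1.0, 2.0), 0.0),
    Signal("A", (1.0, 2.0), 0.0),
]
crashes = list(detect_crashes(signals))
print(crashes)
```

The individual signals say nothing about a crash on their own; the crash only exists as an event once several of them have been correlated per car.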

This was a very rudimentary explanation. To get a fuller (and much better and in-depth) explanation have a look at a series of blog posts by Tim Bass.

Finally

This was the first post in my series. In the next post, I will look into the architecture of StreamInsight.

StreamInsight Series

Over the next few weeks/months I am planning to do a series of blog posts about Microsoft’s entry into the Stream Data and Complex Event Processing domain: StreamInsight.

I will update this page with links as I publish the individual posts.

Resources for CEP and StreamInsight

Yesterday I finished a class down here in South Africa – you guys rocked!!!! Now I am sitting on the balcony, looking out over the Indian Ocean, with a glass of wine (hey, it’s after 17:00 somewhere in the world – right), waiting for the cab to come and take me to the airport and my flight to England :( .

Anyway, I thought I’d make a short post about various resources (mostly blogs), that cover Complex Event Processing and StreamInsight. So, without further ado, resources for CEP and StreamInsight:

Blogs:

Forums:

Have a look and see if you find them useful. If you have other sites you visit for this kind of information, please let me know and I will include them here.

StreamInsight and SQL Server

There have been some discussions recently about StreamInsight and how it fits into SQL Server, especially as it was said that StreamInsight is a SQL Server 2008 R2 technology.

Well, as it stands at the moment (the operative words here are “at the moment”), the CTP of StreamInsight has no dependencies whatsoever on SQL Server. Now – some of you who have installed StreamInsight may say: “Niels, what are you saying? The installation instructions say you should install SQL Server CE”. Well, yes – that is true; it says “you should”, but that is a bit like when your mum said you should stay away from loose men / women – you did not pay much attention, did you?!

The truth is, you are not dependent on SQL Server CE or SQL Server 2008 R2. You can live a happy and fulfilling life and run StreamInsight without any trace of SQL Server 2008 R2 or SQL Server CE whatsoever. At the moment, the only reason you would need SQL Server CE is if you wanted to persist StreamInsight metadata in order to (more or less) automatically re-load it later (this is for another blog post).

As I said in the beginning, this is as it stands at the moment – what will happen in the future, no one knows (at least no one outside of Microsoft). So there you have it: if you want to test StreamInsight, but are allergic to SQL Server for one reason or another, do not worry – go ahead, you can still use it.

SQL Server 2008 R2 August CTP

Yesterday I downloaded and installed the August CTP of SQL Server 2008 R2, and today I played around with it for a while. So, what are my impressions…

Well, from the perspective of a relational dev and internals guy, my immediate response is … “yawn – where is the beef”. I.e., there is not much there, and I doubt we will see much more in coming releases. However, if I were a BI / reporting guy I’d be over the moon, and definitely look forward to future CTPs! Even if I were a (wait for it …) DBA I would be fairly interested.

I will let you decide for yourself what is interesting for you, but one thing that is not in the CTP at the moment but is promised (and keeps me interested) is StreamInsight (based on Complex Event Processing). This will be part of SQL Server 2008 R2. Coming from the financial industry and dealing with message-based applications (that’s why I love SQL Server Service Broker), this is something I am really interested in. So, even if you are a T-SQL / internals guy, do not despair – there may be something for us as well.
