Stream and Complex Event Processing from a Relational Guy’s Eye (StreamInsight Series)
This is the first in (hopefully) a series of blog posts where I will be looking into Microsoft’s new technology for Complex Event Processing (CEP): StreamInsight (SI). This post is an overview of the problem domain Microsoft is targeting with SI. As I am a relational database guy at heart, I look at it from a relational guy’s perspective.
Relational Database Systems
The relational database management system (RDBMS) is the backbone of almost any enterprise application today, and the various RDBMSs are highly optimized to deliver the best possible performance for their particular type of application. The type of application an RDBMS is (mostly) optimized for is one where updates to the data do not happen that frequently (i.e. not hundreds of thousands of updates per second) and where queries run against what can be described as a snapshot of the database.
Over the last couple of decades we have seen the emergence of types of applications that have somewhat different requirements and characteristics than a typical RDBMS-based application. Examples are OLAP, Data Mining, and the storage and querying of new data types such as XML, media and spatial data. This has required the RDBMS to add new functionality as well as extend existing functionality.
Streaming Data
In the last few years yet another type of data-intensive application has arrived on the scene, and these applications have somewhat different requirements than “just” being able to query “static” data. In these applications data can arrive at a very high frequency, and we may need to run queries against this data continuously and/or derive new types of data from the arriving data (changing the schema of the original data), which we may also want to run queries against. I am talking about Stream Data Processing (SDP) and Complex Event Processing (CEP) applications.
The main differences between a typical RDBMS application and an SDP/CEP application are:
- The data in an SDP/CEP application can be never-ending, i.e. the data arrives continuously.
- When we query data in an RDBMS app, we do it against a static snapshot of the data at that particular time. The data is evaluated once and output once.
- Querying SDP/CEP data, however, is typically done in a continuous fashion. The data is continuously evaluated and output.
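To make the snapshot-versus-continuous difference concrete, here is a minimal C# sketch using plain LINQ and an iterator (not the StreamInsight API). The Reading type and the QueryStyles methods are hypothetical names made up for this illustration: the snapshot query runs once over data already collected and returns one answer, while the continuous query walks a potentially never-ending sequence and emits a new answer for every event that arrives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type, used only for this illustration.
public record Reading(string SensorId, double Value);

public static class QueryStyles
{
    // Snapshot style: evaluate once over the data as it exists right now
    // and produce a single result.
    public static double SnapshotAverage(IReadOnlyCollection<Reading> stored) =>
        stored.Average(r => r.Value);

    // Continuous style: consume a (potentially never-ending) sequence and
    // emit an updated running average after every arriving event.
    public static IEnumerable<double> ContinuousAverage(IEnumerable<Reading> stream)
    {
        double sum = 0;
        long count = 0;
        foreach (var reading in stream)
        {
            sum += reading.Value;
            count++;
            yield return sum / count;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var readings = new List<Reading>
        {
            new("s1", 10), new("s1", 20), new("s1", 30)
        };

        // One result, computed once: 20
        Console.WriteLine(QueryStyles.SnapshotAverage(readings));

        // One result per arriving event: 10, 15, 20
        foreach (var avg in QueryStyles.ContinuousAverage(readings))
            Console.WriteLine(avg);
    }
}
```

Because ContinuousAverage is an iterator, it never needs the whole stream up front; it would keep producing results for as long as events keep coming.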
RDBMS vs. SDP/CEP
We can use an RDBMS for SDP/CEP applications: we load the incoming data into the database and then run queries continuously against the stored data. This will work OK, but we will run into some issues:
- By storing the data before we query it, we are adding latency, as per Figure 1 below.
- We may have to write some convoluted queries in order to query the data in a continuous manner.

Figure 1: RDBMS Handling Stream Data
So, even if we can use an RDBMS for SDP/CEP type applications, it is fairly obvious that this may not be the best approach. Hence the rise of another type of management system for SDP/CEP applications: the Data Stream Management System (DSMS).
A DSMS works on the premise that we have some sort of server (running in memory) which hosts the application(s) that handle the incoming data. The incoming data is fed to the application(s) by input adapters. Inside the application(s), continuous queries run over the data from the input adapters. The results of the queries are then fed to output adapters, which deliver the data to the applications that need it. Figure 2 illustrates a DSMS.

Figure 2: General Overview of DSMS
Depending on the DSMS, the query language may vary. Quite a few systems use languages that are fairly similar to SQL, whereas SI uses LINQ. As we can see from Figure 2, the main part of the DSMS runs in a low-latency environment, and it is only if we need some sort of look-up data loaded from an RDBMS that we may run into high-latency issues.
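The data flow in Figure 2 can be mimicked in a few lines of C#. This is only a sketch under simplifying assumptions, not StreamInsight’s adapter model: the “input adapter” is a lazy iterator that turns raw text lines into events, the “continuous query” is an ordinary LINQ query over that iterator, and the “output adapter” pushes each result to a callback. SensorEvent, MiniDsms and its methods are hypothetical names, and the "id,speed" line format is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical event type for the illustration.
public record SensorEvent(string SensorId, double Speed);

public static class MiniDsms
{
    // "Input adapter": turns raw "id,speed" lines into events lazily,
    // so each event is pushed downstream as soon as it is produced.
    public static IEnumerable<SensorEvent> InputAdapter(IEnumerable<string> rawLines)
    {
        foreach (var line in rawLines)
        {
            var parts = line.Split(',');
            yield return new SensorEvent(parts[0], double.Parse(parts[1], CultureInfo.InvariantCulture));
        }
    }

    // "Continuous query": a LINQ query evaluated event by event,
    // not once over a stored snapshot.
    public static IEnumerable<string> SpeedingQuery(IEnumerable<SensorEvent> input, double limit) =>
        from e in input
        where e.Speed > limit
        select $"{e.SensorId} at {e.Speed}";

    // "Output adapter": delivers each result to a consumer as it is produced.
    public static void OutputAdapter(IEnumerable<string> results, Action<string> sink)
    {
        foreach (var r in results)
            sink(r);
    }
}

public static class Program
{
    public static void Main()
    {
        var raw = new[] { "car1,80", "car2,135", "car3,120", "car1,140" };
        var output = new List<string>();

        MiniDsms.OutputAdapter(
            MiniDsms.SpeedingQuery(MiniDsms.InputAdapter(raw), limit: 120),
            output.Add);

        output.ForEach(Console.WriteLine); // car2 at 135, then car1 at 140
    }
}
```

Because the iterator and the LINQ query are both lazy, each event travels through the whole pipeline as soon as it is read, rather than being stored first and queried later.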
Complex Event Processing
So what is the difference between processing streaming data and doing CEP? In CEP we look at the individual events, try to correlate them, and look at the impact on a macro level. A typical example (quite a few DSMS vendors use it) is collecting sensor signals from cars; let’s say each car sends out a signal every 30 seconds. The signal contains information about position, speed, road, lane etc. When analyzing these event signals, we say that a car crash has happened if any given car reports the same position and a speed of 0 in 4 consecutive signals. We have analyzed the individual events and from them derived a new event: a Complex Event.
This was a very rudimentary explanation. For a fuller (and much better, more in-depth) explanation, have a look at a series of blog posts by Tim Bass.
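The car-crash rule above can be sketched in C# as a small stateful detector. For each car it keeps the position and length of the current run of zero-speed signals, and it derives a CrashDetected event when that run reaches four. This illustrates the idea only; it is not how StreamInsight expresses it (in SI this would be a LINQ query over the event stream). The types, the exact-equality test on position, and the assumption that signals arrive in time order are all hypothetical choices made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical types: a raw car signal and the derived complex event.
public record Position(double X, double Y);
public record CarSignal(string CarId, DateTime Timestamp, Position Position, double Speed);
public record CrashDetected(string CarId, DateTime Timestamp, Position Position);

public static class CrashDetector
{
    // Yields a CrashDetected event when a car reports speed 0 at the same
    // position in `required` consecutive signals (4 in the example above).
    // Assumes signals arrive in time order.
    public static IEnumerable<CrashDetected> Detect(IEnumerable<CarSignal> signals, int required = 4)
    {
        // Per-car state: position of the current zero-speed run and its length.
        var runs = new Dictionary<string, (Position Pos, int Count)>();

        foreach (var s in signals)
        {
            if (s.Speed != 0)
            {
                runs.Remove(s.CarId); // the car is moving, so any run is broken
                continue;
            }

            // Extend the run if the car is still at the same position,
            // otherwise start a new run of length 1.
            int count = runs.TryGetValue(s.CarId, out var run) && run.Pos == s.Position
                ? run.Count + 1
                : 1;
            runs[s.CarId] = (s.Position, count);

            // Fire exactly once, when the run reaches the threshold.
            if (count == required)
                yield return new CrashDetected(s.CarId, s.Timestamp, s.Position);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var t0 = new DateTime(2009, 9, 18, 12, 0, 0);
        var here = new Position(10, 20);
        var signals = new List<CarSignal>
        {
            new("carA", t0, new Position(9, 20), 50),
            new("carA", t0.AddSeconds(30), here, 0),
            new("carA", t0.AddSeconds(60), here, 0),
            new("carA", t0.AddSeconds(90), here, 0),
            new("carA", t0.AddSeconds(120), here, 0),
        };

        foreach (var crash in CrashDetector.Detect(signals))
            Console.WriteLine("Crash: " + crash.CarId + " at " +
                crash.Timestamp.ToString("HH:mm:ss", CultureInfo.InvariantCulture));
        // prints "Crash: carA at 12:02:00"
    }
}
```

Note that no single signal says “crash”; the complex event only exists because the detector correlates several simple events from the same car.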
Finally
This was the first post in my series. In the next post, I will look into the architecture of StreamInsight.
StreamInsight Series
Over the next few weeks/months I am planning to do a series of blog posts about Microsoft’s entry into the Stream Data and Complex Event Processing domain: StreamInsight.
I will update this page with links as I publish the individual posts.
- Stream and Complex Event Processing from a Relational Guy’s Eye (published September 18, 2009)
Resources for CEP and StreamInsight
Yesterday I finished a class down here in South Africa (you guys rocked!!!!). Now I am sitting on the balcony, looking out over the Indian Ocean with a glass of wine (hey, it’s after 17:00 somewhere in the world, right?), waiting for the cab to come and take me to the airport and my flight to England.
Anyway, I thought I’d make a short post about various resources (mostly blogs) that cover Complex Event Processing and StreamInsight. So, without further ado, resources for CEP and StreamInsight:
Blogs:
- http://blogs.msdn.com/streaminsight/ – from the horse’s mouth, the blog by the StreamInsight team at MS.
- http://geekswithblogs.net/cyoung – MVP, talks about BizTalk and CEP
- http://tibcoblogs.com/cep/ – TIBCO blog about CEP in general and some stuff related to what TIBCO is doing
- http://www.edmblog.com/weblog/ – blog about CEP, Business Rules, Predictive Analytics etc.
- http://epthinking.blogspot.com/ – THE blog about CEP; if you were to read only one blog about CEP, this is it!
- http://magmasystems.blogspot.com/ – blog by the technical lead for CEP at a Wall Street bank.
- http://rulecore.com/CEPblog/ – useful information about CEP; the blog is not updated that often (hey, who am I to talk)
- http://weblogs.asp.net/sweinstein – blog by Scott, who works for a company in New York building neat financial applications (and other stuff)
- http://mdavey.wordpress.com/ – from a guy at an investment bank, doing development
Forums:
- http://social.msdn.microsoft.com/Forums/en-US/streaminsight – forum for MS StreamInsight
Have a look and see if you find them useful. If there are other sites you visit for this kind of information, please let me know and I will include them here.
Deployment of Assemblies to SQLCLR
Today, after arriving in Durban (South Africa) to do a gig here this coming week, I was going to start a series of blog posts about Complex Event Processing (CEP) and Microsoft StreamInsight.
However, as some of you may know, I am fairly interested in SQLCLR (the hosting of the CLR inside SQL Server), and from time to time I try to help out with questions that arise on the forums and newsgroups.
One of the questions that pops up from time to time is how to deploy assemblies that reference other assemblies when using the Visual Studio “SQL Server Project” project type.
So I decided to write a small post about it, so that in the future I can just point to that post instead of having to repeat it every time.
If you are interested, the post is here.
StreamInsight and SQL Server
There have been some discussions recently about StreamInsight and how it fits into SQL Server, especially as it was said that StreamInsight is a SQL Server 2008 R2 technology.
Well, as it stands at the moment (the operative words here being “at the moment”), the CTP of StreamInsight has no dependencies whatsoever on SQL Server. Now, some of you who have installed StreamInsight may say: “Niels, what are you saying? The installation instructions say you should install SQL Server CE.” Well, yes, that is true; it says “you should”, but that is a bit like when your mum said you should stay away from loose men / women: you did not pay much attention, did you?!
The truth is, you are not dependent on SQL Server CE or SQL Server 2008 R2. You can live a happy and fulfilling life and run StreamInsight without any trace of SQL Server 2008 R2 or SQL Server CE whatsoever. At the moment, the only reason you would need SQL Server CE is if you want to persist StreamInsight metadata in order to (more or less) automatically reload it later (this is for another blog post).
As I said in the beginning, this is how it stands at the moment; what will happen in the future, no one knows (at least no one outside of Microsoft). So there you have it: if you want to test StreamInsight but are allergic to SQL Server for one reason or another, do not worry. Go ahead, you can still use it.
StreamInsight and Notification Services
If you are an “old hand” (the emphasis is on old), you may remember how Microsoft introduced a service called Notification Services (NS) in around the same time frame as SQL Server 2000. I definitely remember, because Bob B, a cohort of mine at that time, and I wrote a class about NS. Now that StreamInsight (SI) has been introduced, some people are saying that SI is the replacement for NS.
Actually, even though NS and SI may have some similarities (both deal with events in one way or another), they are definitely not the same, and it would be totally wrong to treat them as equivalent. In NS, you store events in the database and then compare and match them to subscriptions that are also stored in the database.
SI is completely different: here you query and process the incoming data in real time, well before it hits the database. You definitely have the ability to use the database for storage, but you are not dependent upon it.
This, I think, is something that needs to be communicated to the industry. If it is not understood, we may end up in a scenario where StreamInsight is rejected out of hand due to the wrong impression that it is dependent upon a database, which is not true. So please: NS was (is) great, but do not think it is the same as SI!
Update 1: Jamie pointed out that in the last sentence I had said that NS is not the same as Notification Services; that should obviously have been that NS is not the same as StreamInsight. Thanks, Jamie!
StreamInsight Stuff
As you probably know, Microsoft’s implementation of Complex Event Processing (CEP), called StreamInsight, was released as a Community Technology Preview (CTP) a couple of days ago. The release was CTP2 (I have no idea when, or if, CTP1 was released). Anyway, I have a love for anything message based, and CEP is definitely message based, so I was obviously all over this like a rash.
The last few days I have been playing around with this and trying to come to grips with what it is all about. The release includes some samples and some help files in the shape of a chm file. It is definitely a lot to get your head around, and I will post more in the coming weeks. However, here is a heads-up if you cannot get some of the samples to work:
When I started with StreamInsight I opened up the samples and actually read the README.txt files, as well as part of the chm help document (I cannot be a real developer, reading the help …). I first tried the ObjectModel sample app. It compiled OK (cool, let’s ship), and when I ran it I actually got some output to the console. At first glance it looked OK, but looking at the output more closely I saw some exceptions there, looking something like this:
Query Exception: Microsoft.ComplexEventProcessing.Engine.OperatorExecutionException: The adapter 'sensorInput' of type 'CepSamples.InputAdapters.TextFileIntervalInput', query 'TrafficSensorQuery', failed to start. ---> System.FormatException: String was not recognized as a valid DateTime.
at System.DateTimeParse.Parse(String s, DateTimeFormatInfo dtfi, DateTimeStyles styles)
at System.Convert.ToDateTime(String value, IFormatProvider provider)
at CepSamples.InputAdapters.TextFileIntervalInput.CreateEventFromLine(String line) in C:\Program Files\Microsoft StreamInsight CTP2\docs_samples\Samples\InputAdapters\TextFileInputAdapter\TextFileIntervalInput.cs:line 247
I looked into the TextFileIntervalInput.cs file around the line number mentioned above, and saw some code looking like this:
// set Start time
evt.StartTime = Convert.ToDateTime(split[0], CultureInfo.CurrentCulture);
evt.EndTime = Convert.ToDateTime(split[1], CultureInfo.CurrentCulture);
I then looked at the input data file for this sample, TrafficSensor.csv, and saw that its dates were formatted according to the “en-US” culture. However, I am running under a UK English culture (“en-GB”). So when the Convert.ToDateTime code runs it fails, because the US-formatted dates cannot be parsed using my culture’s date format.
I fixed that by using DateTime.TryParse instead (I will leave it up to you, my two readers, to figure out the syntax), and that sorted out that problem. I then had the same issue with a second input adapter file, TextFileEdgeInput.cs, and fixed it in the same way. Note that you may get the exception in TextFileEdgeInput.cs before the one in TextFileIntervalInput.cs.
So, if you run under a non-US culture this is an issue you can run into, but it looks like it only affects the ObjectModel sample. The other samples seem to run OK.
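The syntax was left as an exercise above, so here is one possible sketch in C#. It goes slightly beyond a bare DateTime.TryParse: it parses with CultureInfo.InvariantCulture, whose month/day/year order matches US-formatted data, instead of CultureInfo.CurrentCulture, so the result does not depend on the machine’s regional settings. It also uses the TryParse overload that takes an IFormatProvider and DateTimeStyles, so a bad line returns false instead of throwing. The SampleDates class, the TryParseSampleDate helper and the sample line are hypothetical; they do not reproduce the actual contents of TrafficSensor.csv.

```csharp
using System;
using System.Globalization;

public static class SampleDates
{
    // The sample data uses US-style month/day/year dates. InvariantCulture uses
    // the same month/day/year order and, unlike CultureInfo.CurrentCulture,
    // does not change with the machine's regional settings.
    private static readonly CultureInfo DataCulture = CultureInfo.InvariantCulture;

    // Hypothetical helper: returns false instead of throwing when a field
    // cannot be parsed, so the caller can skip or log the bad line.
    public static bool TryParseSampleDate(string text, out DateTime value) =>
        DateTime.TryParse(text, DataCulture, DateTimeStyles.None, out value);
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical input line in the "start,end" shape the adapter splits on.
        string line = "6/25/2009 10:15:00 AM,6/25/2009 10:20:00 AM";
        string[] split = line.Split(',');

        if (SampleDates.TryParseSampleDate(split[0], out var start) &&
            SampleDates.TryParseSampleDate(split[1], out var end))
        {
            Console.WriteLine(start.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
            Console.WriteLine(end.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
        }
        else
        {
            Console.WriteLine("Skipping line with unparseable dates: " + line);
        }
    }
}
```

If the data file’s exact format is known, DateTime.TryParseExact with an explicit format string would be an even stricter alternative.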
I am really intrigued by StreamInsight and will drill down deeper into it, and also try to blog my findings along the way. If you are interested in this, please leave a comment.
Oh, and if you want it from the “horse’s mouth”, the StreamInsight team at Microsoft has a blog here. There is also a support forum here.
SQL Server 2008 R2 August CTP
Yesterday I downloaded and installed the August CTP of SQL Server 2008 R2, and today I played around with it for a while. So, what are my impressions…
Well, from the perspective of a relational dev and internals guy, my immediate response is … “yawn, where is the beef?”. I.e. there is not much there, and I doubt we will see much more in coming releases. However, if I were a BI / reporting guy I’d be over the moon, and would definitely look forward to future CTPs! Even if I were a (wait for it …) DBA I would be fairly interested.
I will let you decide for yourself what is interesting for you, but one thing that is not in the CTP at the moment, but is promised (and keeps me interested), is StreamInsight (based on Complex Event Processing). This will be part of SQL Server 2008 R2. Coming from the financial industry and dealing with message based applications (that’s why I love SQL Server Service Broker), this is something I am really interested in. So, even if you are a T-SQL / internals guy, do not despair: there may be something for us as well.
Hosting of Code Samples
The other day, when I posted the sample showing how to call a WCF service from a SQLCLR method, I mentioned that I had no place to host my demo code. I did not want to create a project on CodePlex (as I did with the SQLCLRProject), as demo code is not really a project per se.
I discussed this with David Reed from Microsoft, who is a PM on the SQL Server team and has a lot to do with overseeing the SQL projects on CodePlex. He mentioned MSDN’s Code Gallery, which is a place where you can create resource pages and upload code for download. So, earlier today I created a new resource page on the Code Gallery and uploaded the SQLCLR-to-WCF sample there. My intention is to upload other SQL Server related samples, as well as How-To articles, there too.
How-To Article about Calling a WCF Service from SQLCLR
I sometimes (not as much as I would like) hang out on some of the user forums where people talk about SQL Server generally and SQLCLR specifically. Every so often the question is asked how to call a WCF service from SQLCLR (.NET code running inside SQL Server).
The other day I got tired of giving exactly (or thereabouts) the same answer for the nth time, so I decided to craft some code and write a mini How-To about calling a WCF service from SQLCLR. So, without further ado, the How-To article can be found here. I do not have anywhere to host the code yet (my usual hosting place disappeared, don’t ask), so until I decide where to host it you can drop me an email if you want to see the code.
Update: I have now created a Resource Page on MSDN’s Code Gallery, where the sample code for the article can be found. So if you want the code, go here.
