原文链接:http://blogs.msdn.com/b/masimms/archive/2010/09/16/101-ish-linq-samples-for-streaminsight.aspx


(译者注:Mark Simms在原文中将其称为StreamInsight LINQ示例介绍的第1部分,即过滤和聚合,然而我发现并没有第2部分或者后续的内容,因此我在标题并没有加入第1部分的字样。)

LINQ示例101一样,我也写了一个针对StreamInsight的LINQ系列,我在考虑以后把它们放在StreamInsight开发者中心中以备查阅。

你可以使用这个应用程序框架(包含数据源)来运行下述查询。

这篇博文中我会重点介绍过滤以及窗口聚合,具体内容如下:


[Where] - 怎样过滤流来保留特定的事件?

var filterQuery = from e in inputStream
                    where e.Value > 20
                    select e;

[输入数据] 结果:

SimpleFilter,Point,12:00:02.000,1001,77
SimpleFilter,Point,12:00:03.000,1001,44
SimpleFilter,Point,12:00:04.000,1001,22
SimpleFilter,Point,12:00:05.000,1001,51
SimpleFilter,Point,12:00:06.000,1001,46
SimpleFilter,Point,12:00:07.000,1001,71
SimpleFilter,Point,12:00:08.000,1001,37
SimpleFilter,Point,12:00:09.000,1001,45

怎样根据多个条件进行过滤?

var filterQuery = from e in inputStream
                  where e.Value > 70 && e.SensorId == 1001
                  select e;

[输入数据] 结果:

MediumFilter,Point,12:00:01.001,1001,77
MediumFilter,Point,12:00:03.010,1001,72
MediumFilter,Point,12:00:12.082,1001,73
MediumFilter,Point,12:00:14.145,1001,75
MediumFilter,Point,12:00:14.154,1001,71
MediumFilter,Point,12:00:15.163,1001,74
MediumFilter,Point,12:00:15.172,1001,73

怎样计算聚合,如计算一系列事件的平均值?

所有的聚合操作,例如求平均值,最小值,最大值等等都针对的是事件集合。而StreamInsight中的事件集合均定义在一个时间窗口内。窗口操作或基于一段时间(跳跃窗口,翻转窗口和滑动窗口),或基于特定数量的事件(计数窗口)。

怎样计算过去5秒内的所有事件的平均值?

var query = from window in inputStream.TumblingWindow(
                TimeSpan.FromSeconds(5),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
            select new
            {
                Average = window.Avg(e => e.Value)
            };

[输入数据] 结果:

SimpleTumblingWindow,Point,12:00:00.000,32.2
SimpleTumblingWindow,Point,12:00:05.000,50

怎样每隔2秒计算过去5秒内的所有事件的平均值?

var query = from window in inputStream.HoppingWindow(
                TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(2),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
            select new
            {
                Average = window.Avg(e => e.Value)
            };

[输入数据] 结果:

SimpleHoppingWindow,Point,11:59:56.000,14
SimpleHoppingWindow,Point,11:59:58.000,31.66666
SimpleHoppingWindow,Point,12:00:00.000,32.2
SimpleHoppingWindow,Point,12:00:02.000,48
SimpleHoppingWindow,Point,12:00:04.000,45.4
SimpleHoppingWindow,Point,12:00:06.000,49.75
SimpleHoppingWindow,Point,12:00:08.000,41

怎样在每当一个新的事件进入时,计算过去5秒内的所有事件的平均值?

var query = from window in inputStream
                .AlterEventDuration(e => TimeSpan.FromSeconds(5))
                .SnapshotWindow(SnapshotWindowOutputPolicy.Clip)
            select new
            {
                Average = window.Avg(e => e.Value)
            };

[输入数据] 结果:

SimpleSlidingWindow,Point,12:00:00.000,14
SimpleSlidingWindow,Point,12:00:01.000,9
SimpleSlidingWindow,Point,12:00:02.000,31.66
SimpleSlidingWindow,Point,12:00:03.000,34.75
SimpleSlidingWindow,Point,12:00:04.000,32.2
SimpleSlidingWindow,Point,12:00:05.000,39.6
SimpleSlidingWindow,Point,12:00:06.000,48
SimpleSlidingWindow,Point,12:00:07.000,46.8
SimpleSlidingWindow,Point,12:00:08.000,45.4
SimpleSlidingWindow,Point,12:00:09.000,50
SimpleSlidingWindow,Point,12:00:10.000,49.75
SimpleSlidingWindow,Point,12:00:11.000,51
SimpleSlidingWindow,Point,12:00:12.000,41
SimpleSlidingWindow,Point,12:00:13.000,45

怎样计算过去的5个事件的平均值?

使用计数窗口对一定数量的事件进行聚合。StreamInsight版本1中的计数窗口不可以使用内置的聚合函数(如Count,Avg,Min和Max),取而代之的必须使用用户自定义聚合操作。下面的代码样例展示了在计数窗口中使用用户自定义聚合操作:

var query = from window in inputStream.CountByStartTimeWindow(
                5, CountWindowOutputPolicy.PointAlignToWindowEnd)
            select new
            {
                Average = window.UserDefinedAverage(e => e.Value)
            };


/// <summary>
/// 计数窗口使用的简单平均值聚合函数
/// </summary>
public class SimpleAggregate : CepAggregate<double, double>
{
    public override double GenerateOutput(IEnumerable<double> payloads)
    {
        return payloads.Average();
    }
}



/// <summary>
/// LINQ 扩展方法
/// </summary>
public static class MyExtensions
{
    [CepUserDefinedAggregate(typeof(SimpleAggregate))]
    public static double UserDefinedAverage<InputT>(this CepWindow<InputT> window,
        Expression<Func<InputT, double>> map)
    {
        throw CepUtility.DoNotCall();
    }

}

[输入数据] 结果:

SimpleCountWindow,Point,12:00:04.000,32.2
SimpleCountWindow,Point,12:00:05.000,39.6
SimpleCountWindow,Point,12:00:06.000,48
SimpleCountWindow,Point,12:00:07.000,46.8
SimpleCountWindow,Point,12:00:08.000,45.4
SimpleCountWindow,Point,12:00:09.000,50

怎样每隔2秒计算事件进入数目?

var query = from window in inputStream.TumblingWindow(
                TimeSpan.FromSeconds(2),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
            select new
            {
                Count = window.Count()
            };

[输入数据] 结果:

SimpleEventCount,Point,12:00:00.000,54
SimpleEventCount,Point,12:00:02.000,54
SimpleEventCount,Point,12:00:04.000,54
SimpleEventCount,Point,12:00:06.000,54
SimpleEventCount,Point,12:00:08.000,54
SimpleEventCount,Point,12:00:10.000,54
SimpleEventCount,Point,12:00:12.000,54
SimpleEventCount,Point,12:00:14.000,54
SimpleEventCount,Point,12:00:16.000,54

怎样计算过去10秒内每个事件组的平均值?

这个例子中,我们每隔10秒计算一次每个传感器的平均值。首先将事件按照传感器ID进行分组,接下去计算每一个组的平均值(结果包含了事件组键——这个例子中是传感器的ID)。

var query = from e in inputStream 
            group e by e.SensorId into sensorGroups
            from window in sensorGroups.TumblingWindow(
                TimeSpan.FromSeconds(10),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
            select new
            {
                SensorId = sensorGroups.Key,
                Average = window.Avg(e => e.Value)
            };

[输入输入数据] 结果:

GroupedTumblingWindow,Point,12:00:00.000,42.1,1001
GroupedTumblingWindow,Point,12:00:00.000,34.6,1002
GroupedTumblingWindow,Point,12:00:00.000,39.7,1003
GroupedTumblingWindow,Point,12:00:00.000,50.9,1004
GroupedTumblingWindow,Point,12:00:00.000,30.7,1005
GroupedTumblingWindow,Point,12:00:00.000,38.3,1006
GroupedTumblingWindow,Point,12:00:00.000,36.8,1007
GroupedTumblingWindow,Point,12:00:00.000,41.7,1008
GroupedTumblingWindow,Point,12:00:00.000,34.4,1009
GroupedTumblingWindow,Point,12:00:10.000,46.375,1001
GroupedTumblingWindow,Point,12:00:10.000,40.2,1002
GroupedTumblingWindow,Point,12:00:10.000,36.1,1003
GroupedTumblingWindow,Point,12:00:10.000,37.3,1004
GroupedTumblingWindow,Point,12:00:10.000,31.8,1005
GroupedTumblingWindow,Point,12:00:10.000,41.4,1006
GroupedTumblingWindow,Point,12:00:10.000,40.3,1007
GroupedTumblingWindow,Point,12:00:10.000,43.5,1008
GroupedTumblingWindow,Point,12:00:10.000,43.3,1009

怎样每隔10秒计算过去20秒内的每个事件组的平均值?

var query = from e in inputStream
            group e by e.SensorId into sensorGroups
            from window in sensorGroups.HoppingWindow(
                TimeSpan.FromSeconds(20), TimeSpan.FromSeconds(10),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
            select new
            {
                SensorId = sensorGroups.Key,
                Average = window.Avg(e => e.Value)
            };

[输入数据] 结果:

GroupedHoppingWindow,Point,11:59:50.000,42.1,1001
GroupedHoppingWindow,Point,11:59:50.000,34.6,1002
GroupedHoppingWindow,Point,11:59:50.000,39.7,1003
GroupedHoppingWindow,Point,11:59:50.000,50.9,1004
GroupedHoppingWindow,Point,11:59:50.000,30.7,1005
GroupedHoppingWindow,Point,11:59:50.000,38.3,1006
GroupedHoppingWindow,Point,11:59:50.000,36.8,1007
GroupedHoppingWindow,Point,11:59:50.000,41.7,1008
GroupedHoppingWindow,Point,11:59:50.000,34.4,1009
GroupedHoppingWindow,Point,12:00:00.000,44,1001
GroupedHoppingWindow,Point,12:00:00.000,37.0,1002
GroupedHoppingWindow,Point,12:00:00.000,38.1,1003
GroupedHoppingWindow,Point,12:00:00.000,44.8,1004
GroupedHoppingWindow,Point,12:00:00.000,31.2,1005
GroupedHoppingWindow,Point,12:00:00.000,39.7,1006
GroupedHoppingWindow,Point,12:00:00.000,38.4,1007
GroupedHoppingWindow,Point,12:00:00.000,42.5,1008
GroupedHoppingWindow,Point,12:00:00.000,38.4,1009
GroupedHoppingWindow,Point,12:00:10.000,46.3,1001
GroupedHoppingWindow,Point,12:00:10.000,40.2,1002
GroupedHoppingWindow,Point,12:00:10.000,36.1,1003
GroupedHoppingWindow,Point,12:00:10.000,37.3,1004
GroupedHoppingWindow,Point,12:00:10.000,31.8,1005
GroupedHoppingWindow,Point,12:00:10.000,41.4,1006
GroupedHoppingWindow,Point,12:00:10.000,40.3,1007
GroupedHoppingWindow,Point,12:00:10.000,43.5,1008
GroupedHoppingWindow,Point,12:00:10.000,43.3,1009

怎样在每当一个新的事件进入事件组时,计算过去10秒内的每个事件组的平均值?

var query = from e in inputStream
                .AlterEventDuration(e => TimeSpan.FromSeconds(10))
            group e by e.SensorId into sensorGroups
            from window in sensorGroups
                .SnapshotWindow(SnapshotWindowOutputPolicy.Clip)
            select new
            {
                SensorId = sensorGroups.Key,
                Average = window.Avg(e => e.Value)
            };

[输入数据] 结果:

<snip>
GroupedSlidingWindow,Point,12:00:17.017,39.3,1008
GroupedSlidingWindow,Point,12:00:17.018,39.4,1009
GroupedSlidingWindow,Point,12:00:17.019,43,1001
GroupedSlidingWindow,Point,12:00:17.020,43.1,1002
GroupedSlidingWindow,Point,12:00:17.021,37.8,1003
GroupedSlidingWindow,Point,12:00:17.022,41.8,1004
GroupedSlidingWindow,Point,12:00:17.023,33.7,1005
GroupedSlidingWindow,Point,12:00:17.024,39.2,1006
GroupedSlidingWindow,Point,12:00:17.025,40.3,1007
GroupedSlidingWindow,Point,12:00:17.026,38.9,1008
GroupedSlidingWindow,Point,12:00:17.027,40.8,1009
<snip>

怎样每隔10秒找出具有最大值的事件?

var query = (from window in inputStream.TumblingWindow(
                TimeSpan.FromSeconds(10),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
            from e in window
            orderby e.Value descending
            select e).Take(1);

[输入数据] 结果:

TopK_TumblingDescending,Point,12:00:00.000,1006,80
TopK_TumblingDescending,Point,12:00:00.000,1007,80
TopK_TumblingDescending,Point,12:00:00.000,1009,80
TopK_TumblingDescending,Point,12:00:10.000,1007,81

注意第一个事件组在相同的窗口时间内共有3个事件具有同样的最大值(80)。因此,所有具有最大值的事件都被输出了。

怎样每隔5秒的找出过去10秒内具有两个最小值的事件?

var query = (from window in inputStream.HoppingWindow(
                TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(5),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
             from e in window
             orderby e.Value ascending
             select e).Take(2);

[输入数据] 结果:

TopK_HoppingAscending,Point,11:59:55.000,1004,2
TopK_HoppingAscending,Point,11:59:55.000,1007,1
TopK_HoppingAscending,Point,11:59:55.000,1008,2
TopK_HoppingAscending,Point,12:00:00.000,1001,1
TopK_HoppingAscending,Point,12:00:00.000,1001,1
TopK_HoppingAscending,Point,12:00:00.000,1003,1
TopK_HoppingAscending,Point,12:00:00.000,1007,1
TopK_HoppingAscending,Point,12:00:05.000,1001,1
TopK_HoppingAscending,Point,12:00:05.000,1001,1
TopK_HoppingAscending,Point,12:00:05.000,1003,1
TopK_HoppingAscending,Point,12:00:10.000,1004,2
TopK_HoppingAscending,Point,12:00:10.000,1004,2
TopK_HoppingAscending,Point,12:00:10.000,1006,2
TopK_HoppingAscending,Point,12:00:10.000,1007,2
TopK_HoppingAscending,Point,12:00:15.000,1004,2
TopK_HoppingAscending,Point,12:00:15.000,1007,2

注意这里有同样的现象——窗口内包含多个具有相同最小值的事件。

怎样找出过去10秒内具有最高平均值的事件组?

这个查询分两个阶段执行。第一个查询中在过去的10秒内计算每一个传感器的平均值(使用group by,翻转窗口和Avg()函数)。接下去取出该聚合的结果并在其中执行快照(针对第一个查询中的事件集合),排序并取出最高值。

var avg = from e in inputStream
            group e by e.SensorId into sensorGroups
            from window in sensorGroups.TumblingWindow(
                TimeSpan.FromSeconds(10),
                HoppingWindowOutputPolicy.ClipToWindowEnd)
            select new
            {
                SensorId = sensorGroups.Key,
                Average = window.Avg(e => e.Value)
            };

var query = (from win in avg.SnapshotWindow(SnapshotWindowOutputPolicy.Clip)
             from e in win
             orderby e.Average descending
             select e).Take(1);

[输入数据] 结果:

TopK_GroupTumblingDescending,Point,12:00:00.000,50.9,1004
TopK_GroupTumblingDescending,Point,12:00:10.000,46.3,1001

怎样每隔5秒找出过去10秒内具有最高值的事件组?

var avg = from e in inputStream
          group e by e.SensorId into sensorGroups
          from window in sensorGroups.HoppingWindow(
              TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(5),
              HoppingWindowOutputPolicy.ClipToWindowEnd)
          select new
          {
              SensorId = sensorGroups.Key,
              Maximum = window.Max(e => e.Value),
              Minimum = window.Min(e => e.Value)
          };

var query = (from win in avg.SnapshotWindow(SnapshotWindowOutputPolicy.Clip)
             from e in win
             orderby e.Maximum descending
             select e).Take(1);

[输入数据] 结果:

TopK_GroupHoppingDescending,Point,11:59:55.000,80,6,1006
TopK_GroupHoppingDescending,Point,11:59:55.000,80,1,1007
TopK_GroupHoppingDescending,Point,11:59:55.000,80,4,1009
TopK_GroupHoppingDescending,Point,12:00:00.000,80,6,1006
TopK_GroupHoppingDescending,Point,12:00:00.000,80,6,1006
TopK_GroupHoppingDescending,Point,12:00:00.000,80,1,1007
TopK_GroupHoppingDescending,Point,12:00:00.000,80,1,1007
TopK_GroupHoppingDescending,Point,12:00:00.000,80,2,1009
TopK_GroupHoppingDescending,Point,12:00:00.000,80,4,1009
TopK_GroupHoppingDescending,Point,12:00:05.000,80,2,1006
TopK_GroupHoppingDescending,Point,12:00:05.000,80,6,1006
TopK_GroupHoppingDescending,Point,12:00:05.000,80,1,1007
TopK_GroupHoppingDescending,Point,12:00:05.000,80,2,1009
TopK_GroupHoppingDescending,Point,12:00:10.000,81,2,1007
TopK_GroupHoppingDescending,Point,12:00:15.000,81,2,1007
TopK_GroupHoppingDescending,Point,12:00:15.000,81,2,1007
TopK_GroupHoppingDescending,Point,12:00:20.000,81,2,1007

再一次注意StreamInsight引擎根据最高值(80或81)输出了对应结果。

怎样在一个特定的时间段内(例如5分钟的时间窗口)找出具有相同ID值的事件?

实现这个查询,我们使用了一个滑动窗口来检查5分钟窗口内到达的事件,并通过它们的ID值进行分组。接下望去我们找出该窗口内具有相同ID的事件数量,并过滤出具有两个事件的事件组。换句话说,我们仅希望得到那些包含超过1个事件的事件组(传感器),而删除那些仅包含1个事件的事件组(不会存在包含0个事件的事件组微笑)。

var eventCount = from e in inputStream.AlterEventDuration(e => TimeSpan.FromMinutes(5))
                 group e by e.SensorId into sensorGroups
                 from win in sensorGroups.SnapshotWindow(SnapshotWindowOutputPolicy.Clip)
                 select new
                 {
                     SensorId = sensorGroups.Key,
                     Count = win.Count()
                 };

var query = from e in eventCount
            where e.Count >= 2
            select e;

[输入数据] 结果:

DetectMultipleInWindow,Point,12:05:16.206,4,1008
DetectMultipleInWindow,Point,12:05:16.207,4,1009
DetectMultipleInWindow,Point,12:05:16.208,3,1001
DetectMultipleInWindow,Point,12:05:16.209,3,1002 
<snip>

作者: StreamInsight 发表于 2011-08-13 22:16 原文链接

推荐.NET配套的通用数据层ORM框架:CYQ.Data 通用数据层框架
新浪微博粉丝精灵,刷粉丝、刷评论、刷转发、企业商家微博营销必备工具"