Scripting API

IMPROVEKIT

IHDSL Commands

average
    fieldName | unaryMessage

Returns the average value of the given numeric field. To access the scalar value, use averageValue.

Examples

data average effort

buckets

buckets
    fieldName | unaryMessage

Returns the class intervals (frequencies) of the given numeric field.

Examples

| filtro proyecto |
    
    filtro := data omitExtremes insumido.
    proyecto := filtro distinct proyecto anyOne proyecto.
    (filtro where proyecto: proyecto) buckets insumido

centiles

centiles
    fieldName | unaryMessage

Returns ten percentile values for the given field

Examples

| centiles ds |
    
    ds := data omit insumido.
    centiles := ds centiles insumido.
    ds select
        defaults: [:row | row centil: #insumido on: centiles]
        as: 'centil'

chiSquare

chiSquare
    fieldName | unaryMessage

Returns the theoretical chi-squared frequencies of the given numeric field.

Examples

| filtro proyecto |
    
    filtro := data omitExtremes insumido.
    proyecto := filtro distinct proyecto anyOne proyecto.
    (filtro where proyecto: proyecto) chiSquare insumido

cumAverage

cumAverage
    fieldName | unaryMessage

Returns the cumulative average of the given numeric field

Examples

data cumAverage items

cumSum

cumSum
    fieldName | unaryMessage

Returns the cumulative sum of the given numeric field.

Examples

data cumSum items
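
As an illustration of what cumSum and cumAverage compute (not ImproveKit's implementation), the following Python sketch builds a running total and a running mean. The function names and the sample values are hypothetical.

```python
from itertools import accumulate

def cum_sum(values):
    """Running total: element i is the sum of values[0..i]."""
    return list(accumulate(values))

def cum_average(values):
    """Running mean: element i is the mean of values[0..i]."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]

items = [4, 2, 6, 8]          # hypothetical 'items' column
print(cum_sum(items))         # [4, 6, 12, 20]
print(cum_average(items))     # [4.0, 3.0, 4.0, 5.0]
```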

distinct

distinct
    fieldName | unaryMessage

Returns the data as a set of distinct rows according to the given field.

Examples

data distinct proyecto

fuzzy

fuzzy
    fieldName | unaryMessage

Returns the fuzzy values of the given field, assuming a real-valued domain between 0 and 1 and a default triangular membership function {0 0.5 1}.

Examples

(data kpi: 'RequirementsChangesRatio') 
            data fuzzy value where isHigh
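
A triangular membership function with parameters {0 0.5 1} can be sketched in Python as below. This is only an assumption about the shape of the function (zero at both ends, one at the peak); how ImproveKit derives predicates such as isHigh from it is not covered here.

```python
def triangular(x, a=0.0, b=0.5, c=1.0):
    """Triangular membership: 0 at a and c, 1 at the peak b, linear in between."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, triangular(x))
```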

gini

gini
    fieldName | unaryMessage

The Gini coefficient measures inequality among the values of a frequency distribution. A Gini coefficient of zero expresses perfect equality, where all values are the same. A Gini coefficient of one (or 100%) expresses maximal inequality among the values. It is especially useful for distributions whose skewness departs from that of a normal distribution.

 

Examples

data gini insumido
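
For reference, one common way to compute a Gini coefficient is the mean absolute difference between all pairs of values, divided by twice the mean. The Python sketch below uses that formulation. It is an assumption that ImproveKit uses an equivalent definition, and the function name is hypothetical.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference formulation."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    total_diff = sum(abs(x - y) for x in values for y in values)
    return total_diff / (2 * n * n * mean)

print(gini([5, 5, 5, 5]))    # 0.0: all values equal
print(gini([0, 0, 0, 10]))   # 0.75: one value holds everything
```

With this formulation a finite sample of n values reaches at most (n - 1) / n, so the value 1 is only approached for large samples.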

groupBy

groupBy
    
    fieldName | unaryMessage

    | defaults
    
    | field: fieldNameN [as: 'aliasN']..., N = 1..6
    | field: aBlockN [as: 'aliasN']..., N = 1..6
    

Returns the data grouped by a series of fields and expressions. Each row of the result represents one group.

There are several forms. The first groups by a single field.

The reserved word defaults refers to a specific set of predefined fields (project, nameRelease, dateRelease).

The last form (field: as:) groups by up to 6 different fields, with optional aliases. Optionally, select: as: adds a field computed from an expression, with an alias.

Examples

    data groupBy proyecto.
    
    data groupBy defaults.
    data groupBy field: 'proyecto' as: 'p'.
    data groupBy field: 'proyecto' as: 'p' field: 'nombreRelease' as: 'r'.
    data groupBy field: 'proyecto' as: 'p' field: 'nombreRelease' as: 'r' select:[:e|e count] as: 'items'.
    data groupBy defaults select defaults: [:group| group count] as: 'items'.
    
    data groupBy periodo.
    
    data groupBy 
        field: #tipoProyecto as: 'tipo'
        field: [:row| self aggregatedPeriodDateOf: row] as: 'fecha'.
    
    data groupBy 
        field:#proyecto as: 'proyecto'
        select: [:r| r count] as: 'items'.
        
        
    ((data where proyecto: 'NAME') 
        and tipo: 'TYPE') 
        groupBy defaults select defaults: [:g | g count] as: 'items'        
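
To make the grouping semantics concrete, the Python sketch below mimics grouping by two fields with aliases and adding a counted field, in the spirit of the field: as: ... select: as: form. The helper name and the sample rows are hypothetical, and this is not how ImproveKit implements groupBy.

```python
from collections import Counter

# Hypothetical rows with the field names used in the examples above.
rows = [
    {"proyecto": "A", "nombreRelease": "r1"},
    {"proyecto": "A", "nombreRelease": "r1"},
    {"proyecto": "A", "nombreRelease": "r2"},
    {"proyecto": "B", "nombreRelease": "r1"},
]

def group_count(rows, fields, aliases, count_alias):
    """One output row per distinct combination of fields, plus a row count."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [dict(zip(aliases, key), **{count_alias: n}) for key, n in counts.items()]

result = group_count(rows, ["proyecto", "nombreRelease"], ["p", "r"], "items")
for group in result:
    print(group)
```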
    

hurst

hurst
    fieldName | unaryMessage

The Hurst exponent is used as a measure of the long-term memory of a time series. It relates to the autocorrelations of the time series and the rate at which they decrease as the lag between pairs of values increases. Returns the Hurst exponent as the slope of the line fitted to the rescaled range series on a base-10 logarithmic scale.

Persistence (positive correlation) models phenomena that tend to cluster first on one side of the mean and then on the other, while antipersistence (negative correlation) models phenomena that fluctuate strongly around the mean. Persistence is associated with stable structures that have a high probability of fulfilling specific functions, while antipersistence relates to unstable structures that are still seeking functionality.

Examples

data hurst duracionEstimada

kurtosis

kurtosis
    fieldName | unaryMessage

Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case. The kurtosis of a standard normal distribution is three. The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.

Examples

| ds centiles |
    
    ds := data omit insumido.
    centiles := ds centiles insumido.
    (ds where
        each: [:row | (row centil: #insumido on: centiles)
                = 3]) kurtosis insumido
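
The moment-based definition behind the value "three for a normal distribution" can be sketched in Python as the fourth central moment divided by the squared variance. Whether ImproveKit applies a sample-size correction is not stated, so this uncorrected form is an assumption, and the function name is hypothetical.

```python
def kurtosis(values):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]                        # evenly spread values
peaked = [0, 0, 0, 0, 10, -10, 0, 0, 0, 0]    # sharp peak with two far outliers
print(kurtosis(flat))     # below 3
print(kurtosis(peaked))   # above 3
```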

limits

limits
    fieldName | unaryMessage

In statistical quality control, the individuals and moving range (X / moving range) chart is a type of control chart used to monitor variables data from a business or industrial process for which it is impractical to use rational subgroups.

The "chart" actually consists of a pair of charts: one shows the individual measured values; the other, the moving range chart, shows the difference from one point to the next. As with other control charts, these two charts allow the user to monitor a process for changes that alter the mean or variance.

Examples

|kpi project panelData shewhart |
kpi := data kpi: 'ReleaseEffortMeasurement'.
project := data project: 'LAS MAJAGUAS'.
panelData := (((kpi data where tipoProyecto: project type) groupBy
        field: #tipoProyecto
        as: 'proyecto'
        field: [:row | row weekly]
        as: 'periodo'
        field: [:row | row firstDateOfWeek]
        as: 'fecha') select
        defaults: [:group | kpi isRatio
                ifTrue: [group average value]
                ifFalse: [group sum value]]
        as: 'value')
        sortedBy: #fecha.
shewhart := panelData limits value.
shewhart parameters.

See also #signals (its example shows how to obtain the same data using the KPI protocol)
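
For orientation, the textbook limits of an individuals and moving range chart can be sketched in Python as follows. The constants 2.66 and 3.267 are the standard factors for moving ranges of two consecutive points. It is an assumption that the parameters returned by limits correspond to these quantities, and the function and key names are hypothetical.

```python
def xmr_limits(values):
    """Textbook individuals / moving range chart limits."""
    # Moving range: absolute difference between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "center": center,
        "lower": center - 2.66 * mr_bar,   # lower natural process limit
        "upper": center + 2.66 * mr_bar,   # upper natural process limit
        "mr_center": mr_bar,
        "mr_upper": 3.267 * mr_bar,        # upper limit of the moving range chart
    }

limits = xmr_limits([10, 12, 11, 13, 12])
print(limits)
```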

log

log
    fieldName | unaryMessage

Returns a log-lin series (x values are linear and y values are logarithmic).
There are two main reasons for using logarithmic scales in graphs. The first is to handle skewness towards large values, that is, cases where one or more points are much larger than most of the data. The second is to show percent change or multiplicative factors.

Examples

data log valor plot.
    
data log valor y limits plot

max

max
    fieldName | unaryMessage

Returns the maximum value of a field (magnitude)

Examples

data max duracion

median

median
    fieldName | unaryMessage

Returns the median value of the given numeric field. To access the scalar value, use medianValue.

Examples

data median effort

min

min
    fieldName | unaryMessage

Returns the minimum value of a field (magnitude)

Examples

data min duracion

mode

mode
    fieldName | unaryMessage

Returns the mode of the given numeric field. To access the scalar value, use modeValue.

Examples

data mode items

movingRange

movingRange
    fieldName | unaryMessage

Returns the moving range (the difference between two successive values of a series) of the given field.

Examples

| filtro proyecto |
    
    filtro := data omit insumido.
    proyecto := filtro distinct proyecto anyOne proyecto.
    (filtro where proyecto: proyecto) movingRange insumido

normalityChiSquareTest

normalityChiSquareTest
    fieldName | unaryMessage

Tests the null hypothesis that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events considered must be mutually exclusive and have total probability 1.

Pearson's chi-squared test is used to assess two types of comparison: tests of goodness of fit and tests of independence.

  • A goodness of fit test determines whether or not an observed frequency distribution differs from a theoretical distribution.

  • A test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other (e.g., voting responses of people of different nationalities, to see whether nationality is related to the response).

Examples

data omitExtremes normalityTest esfuerzo
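
The statistic behind a goodness of fit test is the sum, over all classes, of the squared difference between observed and expected frequencies divided by the expected frequency. A minimal Python sketch follows. The function name and the frequencies are hypothetical, and the comparison against a critical value (which depends on the degrees of freedom) is not shown.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [10, 20, 30]   # hypothetical observed class frequencies
expected = [20, 20, 20]   # hypothetical theoretical class frequencies
print(chi_square_statistic(observed, expected))   # 10.0
```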

omit

omit
    fieldName | unaryMessage

Returns a new set that excludes the rows whose value for the given field is null, zero, or empty.

Examples

data omit adjuntos

omitExtremes

omitExtremes
    fieldName | unaryMessage

Returns a new set that excludes the rows whose value for the given field is null, zero, empty, or equal to the minimum or maximum of that field.

Examples

| total extremes omit|
total := data count.
omit := data omit esfuerzo count.
extremes := data omitExtremes esfuerzo count.
{total. omit. extremes}

quartiles

quartiles
    fieldName | unaryMessage

Returns the three quartiles of the values of the given field.

Examples

| quartiles ds |
    
    ds := data omit insumido.
    quartiles := ds quartiles insumido.
    ds select
        defaults: [:row | row quartil: #insumido on: quartiles]
        as: 'quartil'
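
The idea of computing three cut points and then labelling each row with its quartile can be sketched in Python with the standard library. The interpolation method used by ImproveKit is not documented here, so the default method of statistics.quantiles is an assumption, as are the helper name and the convention that a value equal to a cut point belongs to the lower quartile.

```python
from bisect import bisect_left
from statistics import quantiles

values = [1, 2, 3, 4, 5, 6, 7]      # hypothetical 'insumido' values
cuts = quantiles(values, n=4)       # three cut points: Q1, Q2 (median), Q3

def quartile_of(value, cuts):
    """Return 1..4, the quartile a value falls into given the three cut points."""
    return bisect_left(cuts, value) + 1

print(cuts)                                      # [2.0, 4.0, 6.0]
print([quartile_of(v, cuts) for v in values])    # [1, 1, 2, 2, 3, 3, 4]
```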

range

range
    fieldName | unaryMessage

Returns the range (maximum value minus minimum value) of a given field (magnitude)

Examples

data range duracion

ranks

ranks
    fieldName | unaryMessage

Returns a collection with the ranks (in ascending order) of the given field (numeric)

Examples

| ranking ds |
    
    ds := data omit insumido.
    ranking := ds ranks insumido.
    ds select
        defaults: [:row | row rankedOn: ranking]
        as: 'ranks'

rescaledRange

rescaledRange
    fieldName | unaryMessage

The rescaled range is a statistical measure of the variability of a time series introduced by the British hydrologist Harold Edwin Hurst (1880-1978). Its purpose is to provide an assessment of how the apparent variability of a series changes with the length of the time period considered. The rescaled range is calculated by dividing the range of the values in a portion of the time series by the standard deviation of the values over the same portion. For example, consider a time series {2, 5, 3, 7, 8, 12, 4, 2}, which has a range, R, of 12 - 2 = 10. The standard deviation, s, is 3.46, so the rescaled range is R / s = 2.89.

Examples

data omitExtremes insumido rescaledRange insumido
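
The worked example in the description can be checked with a few lines of Python. The sketch follows the definition given in the text (range divided by standard deviation) and assumes the sample standard deviation, which matches the stated value of 3.46. The function name is hypothetical.

```python
from statistics import stdev

def rescaled_range(values):
    """Range of the values divided by their sample standard deviation."""
    return (max(values) - min(values)) / stdev(values)

series = [2, 5, 3, 7, 8, 12, 4, 2]
print(round(stdev(series), 2))            # 3.46
print(round(rescaled_range(series), 2))   # 2.89
```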

select

select
    fieldName | unaryMessage
    
    | fieldName: 'alias' ...
    
    | all
    
    | defaults
    
    | defaults: aBlock alias: 'alias'
    
    | field: #fieldNameN [as: 'aliasN'] ..., N = 1..6
    [field: aBlock as: 'alias']

    | rowExpression
    
    rowExpression :=: 
        each: oneArgBlock |
        dayOf: aspect |
        firstDateOfQuarter |
        firstDateOfWeek |
        monthOf: aspect |
        monthsFrom: aspect1 to: aspect2 |
        quarterOf: aspect |
        quartersFrom: aspect1 to: aspect2 |
        weekOf: aspect |
        weeksFrom: aspect1 to: aspect2 |
        yearOf: aspect         

 

Projects individual columns of the data source.

There are several options, including the ability to reference predefined sets of fields (all, defaults, defaults: alias:) and to use expressions (unaryMessage, select: as:, rowExpression).

Examples

data select defaults.

data select field: #proyecto as: 'pro'.

data select field:[:row| row duracion ] as: 'duration'

serie

serie
    fieldName | unaryMessage

Returns the set of values of the given field (numeric) as a series object

Examples

(data omitExtremes esfuerzo 
        sortedBy:'fecha') serie esfuerzo plot

sigmas

sigmas
    fieldName | unaryMessage

Returns a dictionary with the sigma variations (-3 to +3) around the average value of the given numeric field.

Examples

| filtro proyecto |
    
    filtro := data omitExtremes insumido.
    proyecto := filtro distinct proyecto anyOne proyecto.
    (filtro where proyecto: proyecto) sigmas insumido
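
A dictionary of sigma levels around the mean can be sketched in Python as below. The key layout (-3 to 3, with 0 for the mean) and the use of the sample standard deviation are assumptions for illustration only; the actual keys of the ImproveKit dictionary are not documented here.

```python
from statistics import mean, stdev

def sigmas(values):
    """Map each sigma level k in -3..3 to mean + k * standard deviation."""
    mu, sd = mean(values), stdev(values)
    return {k: mu + k * sd for k in range(-3, 4)}

levels = sigmas([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical field values
for k, value in levels.items():
    print(k, round(value, 2))
```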

signals

signals
    fieldName | unaryMessage

Returns a collection of signals of variation in the values of the given numeric column. Signals can be statistical or based on fuzzy logic. Each signal knows the points of the data series that compose it and is identified by a name. The meaning of each signal is as follows:

 

ascendantSignals

Possible change process that shifts the average somewhat downward or upward

 

dominantSignals

Possible process (or data error) that causes an overflow outside the natural statistical limits

 

moderateSignals

Possible change process that moves the average toward the lower or upper natural limit

 

nearAverageSignals

Sign of a possible process that keeps the performance close to the average

 

weakSignals

Possible change process that shifts the average somewhat downward or upward

 

ascendantFuzzySignals

Possible change process that shifts the fuzzy average value somewhat downward or upward

 

dominantFuzzySignals

Possible process (or erroneous data) that causes an overflow outside the fuzzy limits

 

moderateFuzzySignals

Possible change process that shifts the average value toward the lower or upper fuzzy limit

 

nearAverageFuzzySignals

Possible process that keeps the performance near the fuzzy average value

 

weakFuzzySignals

Possible process that causes a downward or upward shift of the fuzzy mean value

Examples

| kpi panelData |
    
    kpi := data kpi: 'ReleaseEffortMeasurement'.
    panelData := kpi
                baselines: (Projects named: 'LAS MAJAGUAS').
    panelData signals value

skewness

skewness
    fieldName | unaryMessage

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. The skewness of a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed to the left, and positive values indicate data that are skewed to the right.

Examples

| ds centiles |
    
    ds := data omit insumido.
    centiles := ds centiles insumido.
    (ds where
        each: [:row | (row centil: #insumido on: centiles)
                = 1]) skewness insumido
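
The sign convention described above can be checked with a moment-based sketch in Python: the third central moment divided by the variance raised to the power 1.5. Whether ImproveKit applies a sample-size correction is not stated, so the uncorrected form is an assumption, and the function name is hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))   # 0.0: symmetric data
print(skewness([1, 1, 1, 5]))      # positive: long tail to the right
print(skewness([1, 5, 5, 5]))      # negative: long tail to the left
```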

standardDeviation

standardDeviation
    fieldName | unaryMessage

Returns the standard deviation of the given numeric field. To access the scalar value, use standardDeviationValue.

Examples

data standardDeviation defects

sum

sum
    fieldName | unaryMessage

Returns the sum of the given numeric field. To access the scalar value, use sumValue.

Examples

data sum items

theil

theil
    fieldName | unaryMessage

Returns the Theil index, an index of inequality (or order) that can be seen as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness, and compressibility (1 indicates the highest order and the greatest inequality).

Examples

data theil insumido
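
One standard form of the Theil index is the average of (x / mean) * ln(x / mean) over all values. Dividing by ln(n) bounds it to the range 0..1, which is consistent with "1 highest order" above. That normalization is an assumption about ImproveKit, and the function name is hypothetical. The sketch expects at least two values with a positive mean.

```python
import math

def theil(values, normalized=True):
    """Theil T index; zero values contribute nothing (0 * ln 0 is taken as 0)."""
    n = len(values)
    mu = sum(values) / n
    t = sum((x / mu) * math.log(x / mu) for x in values if x > 0) / n
    return t / math.log(n) if normalized else t

print(theil([3, 3, 3]))       # 0.0: perfect equality
print(theil([0, 0, 0, 10]))   # 1.0: one value holds everything
```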

trend

trend
    fieldName | unaryMessage

Returns an object of class ExponentialTrendSeries or LinearTrendSeries (whichever fits best) for the values of the given numeric field.

Examples

| kpi panelData |
    
    kpi := data kpi: 'ReleaseEffortMeasurement'.
    panelData := kpi
                trend: (Projects named: 'LAS MAJAGUAS').
    panelData trend value parameters

valuesOf

valuesOf
    fieldName | unaryMessage

Returns a collection with the values of the given field

Examples

data omitExtremes insumido valuesOf insumido kurtosis

variance

variance
    fieldName | unaryMessage

Returns the variance (stddev ^ 2) of the given numeric field. To access the scalar value, use varianceValue.

Examples

data variance insumido.
    
data variance esfuerzo.

where

where
    fieldName: value,    value :=: number | string | stringWithWildcards
        
    | booleanExpression
    | and ...
    | having ...
    
    
    booleanExpression :=: 
    
        ResultSetRow boolean protocol, examples:
    
        field: fieldName before: aDate
        field: fieldName from: startingMagnitude to: endingMagnitude
        field: fieldName match: stringWithWildcards
        field: fieldName matchNot: stringWithWildcards
        field: fieldName in: aSet
        
        isNotNull: fieldName
        isNull: fieldName
        isNotZero: fieldName
        isZero: fieldName
        isToday: fieldName
        
        each: oneArgBlock
        

Selects one or more rows according to logical conditions.

As a shorthand, an equality condition can be written by placing ':' between a field name and a value (you can use the wildcards * and # to match any sequence of characters or any single character, respectively).

Logical expressions must be the result of a message sent to each row of data (see the predefined functions).

More complex expressions can be computed using each: oneArgBlock (a block of Smalltalk code that is evaluated for each row, receiving the row as its argument).

Examples

((medida resultRows where proyecto: project proyecto) 
    and field: 'periodo' matchNot: '*.*.*.') 
    and each: [:row | row duracion <= 30]
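
The wildcard rules described above (* for any sequence of characters, # for any single character) can be illustrated by translating a pattern into a regular expression. The Python sketch below is only an illustration of those two rules. The function names are hypothetical, and case-sensitive matching is an assumption.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * (any sequence) and # (any single character) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "#":
            parts.append(".")
        else:
            parts.append(re.escape(ch))   # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, text):
    """True when the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

print(matches("*.*.*.", "2014.01.05."))   # True
print(matches("*.*.*.", "2014.01"))       # False
print(matches("v#", "v1"))                # True
print(matches("v#", "v12"))               # False
```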

ResultSet class

 

Message protocols that can be sent to a ResultSet, a collection of rows that implements HOM (Higher Order Messaging) protocols.

Examples

    ResultSet fromFileName: 'data.txt'.
    
    ResultSet fromScriptNamed: 'script.st'.
    
    ResultSet fromBundle: 'bundle.st'.
    
    ResultSet datasources.
    
    tabDelimitedString asResultSet.
    
    tabDelimitedString asResultSetDefault.
    
    tabDelimitedString 
        asResultSetKinds: #(#String #String #Dateyyyymmdd #Number ) 
        required: #(true false false true)
    

Protocol *DevImprovekit-fuzzy set

fuzzy

Returns a collection with the fuzzy values (using a triangular membership function over the values 0 .. 1) of the given field (a number >= 0 and <= 1)

 

Protocol *DevImprovekit-statistic collections

chiSquare

Returns a collection with the chi squared frequencies of the given field (numeric)

Protocol *DevImprovekit-statistic values

gini

Returns the Gini coefficient for the given field

 

hurst

The Hurst exponent is used as a measure of the long-term memory of a time series. It relates to the autocorrelations of the time series and the rate at which they decrease as the lag between pairs of values increases. Returns the Hurst exponent as the slope of the line fitted to the rescaled range series on a base-10 logarithmic scale.

 

 

rescaledRange

Returns the rescaled range (range / stddev) of a numeric field. Its purpose is to provide an assessment of how the apparent variability of a series changes with the length of the time period considered. The rescaled range is calculated by dividing the range of the values in a portion of the time series by the standard deviation of the values over the same portion. For example, consider a time series {2, 5, 3, 7, 8, 12, 4, 2} with a range, R, of 12 - 2 = 10. Its standard deviation, s, is 3.46, so the rescaled range is R / s = 2.89.

 

theil

Returns the Theil coefficient for the given field

 

Protocol *DevImprovekit-testing

checkData

normalityChiSquareTest

Returns true if the given column values follow a normal distribution (based on chi-square method)

Protocol accessing

configuration

scalars

values

Returns a collection with the values of the first field

valuesOf

Returns a collection with the values of the given field

Protocol copying

transposedAt:average:

Returns a new result that adds columns by transposing the rows of the fieldNameOrSymbol field. Calculates the average of anotherField for each group of fieldNameOrSymbol.

 

transposedAt:in:average:

Returns a new result that adds columns by transposing the rows of the fieldNameOrSymbol field (filtered by aSet). Calculates the average of anotherField for each group of fieldNameOrSymbol.

 

transposedAt:in:sum:

Returns a new result by adding columns by transposing rows from the fieldNameOrSymbol field (filtered by aSet). Calculate the sum of anotherField for each group of fieldNameOrSymbol.

Examples

| activities rows ranks |
    
    activities := (ranks := ((data groupBy field: #proyecto field: #tipo) 
                    select
                        field: #proyecto
                        field: #tipo
                        field: #nombreRelease
                        field: [:group | group count] as: 'items')
                            where: #tipo isTop: 5 rankedBy: #items)
                                indexedOn: {#proyecto. [:row | row quarterly]. #tipo}.
    rows := (data groupBy
                field: #proyecto as: 'proyecto'
                field: [:row | row quarterly] as: 'periodo'
                field: #tipo as: 'tipo'
                select: [:group | group count] as: 'total')
                select
                    field: #proyecto as: 'proyecto'
                    field: #periodo as: 'periodo'
                    field: #tipo as: 'tipo'
                    field: [:group | (activities join: group on: #(#proyecto #periodo #tipo ))
                            ifNotNil: [:row | row items]] as: 'items'
                    field: #total as: 'total'.
    rows
        transposedAt: #tipo
        in: ranks distinct tipo
        sum: #total
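
The effect of transposing rows into columns with a sum can be sketched in Python as a small pivot. This is an interpretation of the description above, not ImproveKit's implementation: the helper name and sample rows are hypothetical, and it assumes that all fields other than the transposed field and the summed field identify the output row.

```python
def transposed_sum(rows, pivot_field, allowed, value_field):
    """Turn each allowed value of pivot_field into a column holding the sum of value_field."""
    key_fields = [f for f in rows[0] if f not in (pivot_field, value_field)]
    out = {}
    for row in rows:
        if row[pivot_field] not in allowed:
            continue   # filtered out, like the in: aSet argument
        key = tuple(row[f] for f in key_fields)
        record = out.setdefault(key, dict(zip(key_fields, key)))
        column = row[pivot_field]
        record[column] = record.get(column, 0) + row[value_field]
    return list(out.values())

rows = [
    {"proyecto": "A", "periodo": "2014Q1", "tipo": "bug", "total": 3},
    {"proyecto": "A", "periodo": "2014Q1", "tipo": "feature", "total": 2},
    {"proyecto": "A", "periodo": "2014Q1", "tipo": "bug", "total": 1},
    {"proyecto": "B", "periodo": "2014Q1", "tipo": "bug", "total": 5},
    {"proyecto": "B", "periodo": "2014Q1", "tipo": "doc", "total": 7},
]
pivoted = transposed_sum(rows, "tipo", {"bug", "feature"}, "total")
for record in pivoted:
    print(record)
```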

Contact improvekit@gmail.com