IMPROVEKIT
Scripting API
IHDSL Commands
average
fieldName | unaryMessage
Returns the average value of the given numeric field. To access the scalar value, use averageValue.
Examples
data average effort
buckets
buckets
fieldName | unaryMessage
Returns the class intervals (frequency buckets) of the given numeric field.
Examples
| filtro proyecto |
filtro := data omitExtremes insumido.
proyecto := filtro distinct proyecto anyOne proyecto.
(filtro where proyecto: proyecto) buckets insumido
centiles
centiles
fieldName | unaryMessage
Returns ten percentile values for the given field
Examples
| centiles ds |
ds := data omit insumido.
centiles := ds centiles insumido.
ds select
defaults: [:row | row centil: #insumido on: centiles]
as: 'centil'
chiSquare
chiSquare
fieldName | unaryMessage
Returns the theoretical chi-squared frequencies of the given numeric field.
Examples
| filtro proyecto |
filtro := data omitExtremes insumido.
proyecto := filtro distinct proyecto anyOne proyecto.
(filtro where proyecto: proyecto) chiSquare insumido
cumAverage
cumAverage
fieldName | unaryMessage
Returns the cumulative average of the given numeric field
Examples
data cumAverage items
cumSum
cumSum
fieldName | unaryMessage
Returns the cumulative sum of the given numeric field.
Examples
data cumSum items
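As an illustration of what cumAverage and cumSum compute, here is a minimal Python sketch. It is not ImproveKit code: the function names and the sample `items` values are hypothetical, and it only shows the running total and running mean of a plain list.

```python
from itertools import accumulate

def cum_sum(values):
    """Running total: element i is the sum of values[0..i]."""
    return list(accumulate(values))

def cum_average(values):
    """Running mean: element i is the mean of values[0..i]."""
    return [total / count
            for count, total in enumerate(accumulate(values), start=1)]

items = [4, 2, 6, 8]       # hypothetical 'items' column
print(cum_sum(items))      # [4, 6, 12, 20]
print(cum_average(items))  # [4.0, 3.0, 4.0, 5.0]
```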
distinct
distinct
fieldName | unaryMessage
Returns the data as a set (without duplicates) according to the given field.
Examples
data distinct proyecto
fuzzy
fuzzy
fieldName | unaryMessage
Returns fuzzy values, assuming a domain of real numbers between 0 and 1 and a default triangular membership function {0 0.5 1}.
Examples
(data kpi: 'RequirementsChangesRatio')
data fuzzy value where isHigh
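A triangular membership function can be sketched in Python as follows. This is only an illustration: it assumes that {0 0.5 1} means (left edge, peak, right edge), and the function name is hypothetical. How ImproveKit derives predicates such as isHigh from the membership values is not covered here.

```python
def triangular(x, left=0.0, peak=0.5, right=1.0):
    """Triangular membership: 0 at the edges, 1 at the peak, linear in between."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Membership of a few ratios in the default {0 0.5 1} triangle
for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(ratio, triangular(ratio))
```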
gini
gini
fieldName | unaryMessage
The Gini coefficient measures inequality among the values of a frequency distribution. A Gini coefficient of zero expresses perfect equality, where all values are the same. A Gini coefficient of one (or 100%) expresses maximal inequality among the values. It is especially useful for distributions with marked skewness.
Examples
data gini insumido
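For reference, one common way to compute a Gini coefficient from raw values is shown below as a Python sketch. It assumes non-negative values and uses the sorted-rank formula; the function name is hypothetical and ImproveKit's own implementation may differ (for example in small-sample corrections).

```python
def gini(values):
    """Gini coefficient of non-negative values using the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no data or all zeros: treat as perfect equality
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([5, 5, 5, 5]))  # all values equal
print(gini([0, 0, 0, 8]))  # one value holds everything
```

With this formula a finite sample of n values reaches at most (n - 1) / n, so the second call yields 0.75 rather than 1.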
groupBy
groupBy
fieldName | unaryMessage
| defaults
| field: fieldNameN [as: 'aliasN']..., N = 1..6
| field: aBlockN [as: 'aliasN']..., N = 1..6
Returns the data grouped by a series of fields and expressions. Each row of the result represents one group.
There are several forms. The first groups by a single field.
The reserved word defaults refers to a specific set of predefined fields (project, nameRelease, dateRelease).
The last form (field: as:) allows grouping by up to 6 different fields, with optional aliases. Optionally, select: as: adds a field computed from an expression, with an alias.
Examples
data groupBy proyecto.
data groupBy defaults.
data groupBy field: 'proyecto' as: 'p'.
data groupBy field: 'proyecto' as: 'p' field: 'nombreRelease' as: 'r'.
data groupBy field: 'proyecto' as: 'p' field: 'nombreRelease' as: 'r' select:[:e|e count] as: 'items'.
data groupBy defaults select defaults: [:group| group count] as: 'items'.
data groupBy periodo.
data groupBy
field: #tipoProyecto as: 'tipo'
field: [:row| self aggregatedPeriodDateOf: row] as: 'fecha'.
data groupBy
field:#proyecto as: 'proyecto'
select: [:r| r count] as: 'items'.
((data where proyecto: 'NAME')
and tipo: 'TYPE')
groupBy defaults select defaults: [:g | g count] as: 'items'
hurst
hurst
fieldName | unaryMessage
The Hurst exponent is used as a measure of the long-term memory of a time series. It relates to the autocorrelations of the series and the rate at which they decrease as the lag between pairs of values increases. Returns the Hurst exponent as the slope of the line fitted to the rescaled range series (base-10 logarithmic scale).
Persistence (positive correlation) models phenomena that tend to cluster on one side of the mean and then on the other, while antipersistence (negative correlation) models phenomena that fluctuate strongly around the mean. Persistence is associated with stable structures that have a high probability of fulfilling specific functions, while antipersistence relates to unstable structures that are still seeking functionality.
Examples
data hurst duracionEstimada
kurtosis
kurtosis
fieldName | unaryMessage
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case. The kurtosis of a standard normal distribution is three. The histogram is an effective graphical technique for showing both the skewness and the kurtosis of a data set.
Examples
| ds centiles |
ds := data omit insumido.
centiles := ds centiles insumido.
(ds where
each: [:row | (row centil: #insumido on: centiles)
= 3]) kurtosis insumido
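The moment-based definitions behind skewness and kurtosis can be sketched in Python as below. This is an illustrative sketch, not ImproveKit's implementation: it uses population moments with no sample-size correction and the non-excess kurtosis (so a normal distribution gives about three, as stated above). The function names are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Third standardized moment: 0 for symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth standardized moment (non-excess): about 3 for normal data."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

The flat, symmetric sample gives a skewness of zero and a kurtosis well below three, while the sample with one large value gives a positive skewness.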
limits
limits
fieldName | unaryMessage
In statistical quality control, the individuals / moving range (X / mR) chart is a type of control chart used to monitor variables data from a business or industrial process for which it is impractical to use rational subgroups.
The "chart" actually consists of a pair of charts: the first shows the individual measured values; the second, the moving range chart, shows the difference from one point to the next. As with other control charts, these two charts allow the user to monitor a process for changes that alter the mean or the variance.
Examples
|kpi project panelData shewhart |
kpi := data kpi: 'ReleaseEffortMeasurement'.
project := data project: 'LAS MAJAGUAS'.
panelData := (((kpi data where tipoProyecto: project type) groupBy
field: #tipoProyecto
as: 'proyecto'
field: [:row | row weekly]
as: 'periodo'
field: [:row | row firstDateOfWeek]
as: 'fecha') select
defaults: [:group | kpi isRatio
ifTrue: [group average value]
ifFalse: [group sum value]]
as: 'value')
sortedBy: #fecha.
shewhart := panelData limits value.
shewhart parameters.
See also #signals (its example shows how to obtain the same data using the KPI protocol)
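The natural process limits of an individuals / moving range chart are conventionally derived from the mean and the average moving range. The Python sketch below uses the standard textbook constants 2.66 and 3.267 for moving ranges of two consecutive points. The function name and the dictionary keys are hypothetical, and the parameters returned by ImproveKit's limits may be named or computed differently.

```python
def xmr_limits(values):
    """Natural process limits for an individuals / moving range chart."""
    # Moving range: absolute difference between consecutive points
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "center": mean,
        "upper": mean + 2.66 * mr_bar,
        "lower": mean - 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_upper": 3.267 * mr_bar,
    }

weekly_effort = [10, 12, 11, 13, 12]  # hypothetical series, needs 2+ points
for name, value in xmr_limits(weekly_effort).items():
    print(name, round(value, 3))
```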
log
log
fieldName | unaryMessage
Returns a logarithmic (log-lin) series: x values are linear and y values are logarithmic.
There are two main reasons for using logarithmic scales in graphs. The first is to manage asymmetry (skewness) towards large values, that is, cases where one or more points are much larger than most of the data. The second is to show percent changes or multiplicative factors.
Examples
data log valor plot.
data log valor y limits plot
max
max
fieldName | unaryMessage
Returns the maximum value of a field (magnitude)
Examples
data max duracion
median
median
fieldName | unaryMessage
Returns the median value of the given numeric field. To access the scalar value, use medianValue.
Examples
data median effort
min
min
fieldName | unaryMessage
Returns the minimum value of a field (magnitude)
Examples
data min duracion
mode
mode
fieldName | unaryMessage
Returns the mode of the given numeric field. To access the scalar value, use modeValue.
Examples
data mode items
movingRange
movingRange
fieldName | unaryMessage
Moving range (difference between two successive values of a series)
Examples
| filtro proyecto |
filtro := data omit insumido.
proyecto := filtro distinct proyecto anyOne proyecto.
(filtro where proyecto: proyecto) movingRange insumido
normalityChiSquareTest
normalityChiSquareTest
fieldName | unaryMessage
Tests the null hypothesis that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events considered must be mutually exclusive and have a total probability of 1.
Pearson's chi-squared test is used to assess two types of comparison: tests of goodness of fit and tests of independence.
A goodness of fit test determines whether or not an observed frequency distribution differs from a theoretical distribution.
A test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other (e.g., voting responses of people of different nationalities, to see whether nationality is related to the response).
Examples
data omitExtremes normalityTest esfuerzo
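The statistic at the core of a goodness of fit test can be sketched in Python as follows. This is a generic illustration with a hypothetical function name and made-up frequencies. It computes only Pearson's statistic; deciding the test still requires comparing it with a critical value of the chi-squared distribution for the appropriate degrees of freedom, which is not shown.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [10, 20, 30]  # hypothetical observed class frequencies
expected = [20, 20, 20]  # hypothetical theoretical frequencies
print(chi_square_statistic(observed, expected))  # 10.0
```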
omit
omit
fieldName | unaryMessage
Returns a new set, filtering out rows where the given field is null, zero or empty.
Examples
data omit adjuntos
omitExtremes
omitExtremes
fieldName | unaryMessage
Returns a new set, filtering out rows where the given field is null, zero or empty, as well as the rows holding its minimum and maximum values.
Examples
| total extremes omit|
total := data count.
omit := data omit esfuerzo count.
extremes := data omitExtremes esfuerzo count.
{total. omit. extremes}
quartiles
quartiles
fieldName | unaryMessage
Returns the three quartiles of the values of the given field.
Examples
| quartiles ds |
ds := data omit insumido.
quartiles := ds quartiles insumido.
ds select
defaults: [:row | row quartil: #insumido on: quartiles]
as: 'quartil'
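The idea in the example above (compute the three cut points, then label each row with its quartile) can be sketched in Python with the standard library. The interpolation method and the handling of values that fall exactly on a cut point are assumptions, so results may differ from ImproveKit's. The function names are hypothetical.

```python
import statistics
from bisect import bisect_left

def quartile_cuts(values):
    """The three cut points (Q1, Q2, Q3) that split the data in four parts."""
    return statistics.quantiles(values, n=4)

def quartile_of(value, cuts):
    """Quartile number (1..4) of a value; a value on a cut point
    is assigned to the lower quartile."""
    return bisect_left(cuts, value) + 1

effort = [1, 2, 3, 4, 5, 6, 7]  # hypothetical 'insumido' values
cuts = quartile_cuts(effort)
print(cuts)  # [2.0, 4.0, 6.0]
print([quartile_of(v, cuts) for v in effort])
```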
range
range
fieldName | unaryMessage
Returns the range (maximum value minus minimum value) of a given field (magnitude)
Examples
data range duracion
ranks
ranks
fieldName | unaryMessage
Returns a collection with the ranks (in ascending order) of the given field (numeric)
Examples
| ranking ds |
ds := data omit insumido.
ranking := ds ranks insumido.
ds select
defaults: [:row | row rankedOn: ranking]
as: 'ranks'
rescaledRange
rescaledRange
fieldName | unaryMessage
The rescaled range is a statistical measure of the variability of a time series introduced by the British hydrologist Harold Edwin Hurst (1880-1978). Its purpose is to provide an assessment of how the apparent variability of a series changes with the length of the time period considered. The rescaled range is calculated by dividing the range of a portion of the time series by the standard deviation of the values over the same portion. For example, consider a time series {2, 5, 3, 7, 8, 12, 4, 2}, which has a range, R, of 12 - 2 = 10. Its standard deviation, s, is 3.46, so the rescaled range is R / s = 2.89.
Examples
data omitExtremes insumido rescaledRange insumido
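The arithmetic of the worked example can be checked with a few lines of Python. The sketch follows the simplified definition given in this entry (range divided by the sample standard deviation), not the classical formulation based on cumulative deviations from the mean. The function name is hypothetical.

```python
import statistics

def rescaled_range(values):
    """Range divided by the sample standard deviation (needs 2+ values)."""
    return (max(values) - min(values)) / statistics.stdev(values)

series = [2, 5, 3, 7, 8, 12, 4, 2]
print(max(series) - min(series))            # range R = 10
print(round(statistics.stdev(series), 2))   # s = 3.46
print(round(rescaled_range(series), 2))     # R / s
```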
select
select
fieldName | unaryMessage
| fieldName: 'alias' ...
| all
| defaults
| defaults: aBlock alias: 'alias'
| field: #fieldNameN [as: 'aliasN'] ..., N = 1..6
[field: aBlock as: 'alias']
| rowExpression
rowExpression :=:
each: oneArgBlock |
dayOf: aspect |
firstDateOfQuarter |
firstDateOfWeek |
monthOf: aspect |
monthsFrom: aspect1 to: aspect2 |
quarterOf: aspect |
quartersFrom: aspect1 to: aspect2 |
weekOf: aspect |
weeksFrom: aspect1 to: aspect2 |
yearOf: aspect
Projects individual columns of the data source.
There are several options, including the ability to reference predefined sets of fields (all, defaults, defaults: alias:) and to use expressions (unaryMessage, select: as:, rowExpression).
Examples
data select defaults.
data select field: #proyecto as: 'pro'.
data select field:[:row| row duracion ] as: 'duration'
serie
serie
fieldName | unaryMessage
Returns the set of values of the given field (numeric) as a series object
Examples
(data omitExtremes esfuerzo
sortedBy:'fecha') serie esfuerzo plot
sigmas
sigmas
fieldName | unaryMessage
Returns a dictionary with the sigma variations (-3 to +3) from the average value of the given numeric field.
Examples
| filtro proyecto |
filtro := data omitExtremes insumido.
proyecto := filtro distinct proyecto anyOne proyecto.
(filtro where proyecto: proyecto) sigmas insumido
signals
signals
fieldName | unaryMessage
Returns a collection with the signals of variation in the values of the given numeric column. Signals can be statistical or based on fuzzy logic. Each signal knows the points of the data series that compose it and is identified by a name. The meaning of each signal is as follows:
ascendantSignals
Possible process of change that shifts the average somewhat downwards or upwards
dominantSignals
Possible process (or data error) that causes an overflow outside the natural statistical limits
moderateSignals
Possible process of change that moves the average close to the lower or upper natural limit
nearAverageSignals
Signal of a possible process that keeps the performance close to the average
weakSignals
Possible process of change that shifts the average somewhat downwards or upwards
ascendantFuzzySignals
Possible process of change that shifts the fuzzy average value somewhat downwards or upwards
dominantFuzzySignals
Possible process (or data error) that causes an overflow outside the fuzzy limits
moderateFuzzySignals
Possible process of change that moves the average value close to the lower or upper fuzzy limit
nearAverageFuzzySignals
Possible process that keeps the performance close to the fuzzy average value
weakFuzzySignals
Possible process that causes a downward or upward shift of the fuzzy mean value
Examples
| kpi panelData |
kpi := data kpi: 'ReleaseEffortMeasurement'.
panelData := kpi
baselines: (Projects named: 'LAS MAJAGUAS').
panelData signals value
skewness
skewness
fieldName | unaryMessage
Skewness is a measure of symmetry or, more accurately, of the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. The skewness of a normal distribution is zero, and symmetric data should have a skewness near zero. Negative values of the skewness indicate data that are skewed to the left; positive values indicate data that are skewed to the right.
Examples
| ds centiles |
ds := data omit insumido.
centiles := ds centiles insumido.
(ds where
each: [:row | (row centil: #insumido on: centiles)
= 1]) skewness insumido
standardDeviation
standardDeviation
fieldName | unaryMessage
Returns the standard deviation of the given numeric field. To access the scalar value, use standardDeviationValue.
Examples
data standardDeviation defects
sum
sum
fieldName | unaryMessage
Returns the sum of the given numeric field. To access the scalar value, use sumValue.
Examples
data sum items
theil
theil
fieldName | unaryMessage
An index of inequality or order that can be seen as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness and compressibility (1 means the highest order and the greatest inequality).
Examples
data theil insumido
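One way to obtain an index on a 0 to 1 scale is to divide the Theil T index by its maximum, ln(n). The Python sketch below does that under several assumptions: values are non-negative, zero values contribute nothing to the sum, and the normalization by ln(n) is what produces the upper bound of 1. Whether ImproveKit normalizes in this way is not stated in this reference, and the function name is hypothetical.

```python
import math

def theil_normalized(values):
    """Theil T index divided by ln(n), giving a value between 0 and 1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    # Terms with x == 0 are skipped (limit of r * ln(r) as r -> 0 is 0)
    t = sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n
    return t / math.log(n)

print(theil_normalized([3, 3, 3, 3]))  # all values equal
print(theil_normalized([0, 0, 0, 4]))  # one value holds everything
```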
trend
trend
fieldName | unaryMessage
Returns an object of class ExponentialTrendSeries or LinearTrendSeries (whichever fits best) for the values of the given numeric field.
Examples
| kpi panelData |
kpi := data kpi: 'ReleaseEffortMeasurement'.
panelData := kpi
trend: (Projects named: 'LAS MAJAGUAS').
panelData trend value parameters
valuesOf
valuesOf
fieldName | unaryMessage
Returns a collection with the values of the given field
Examples
data omitExtremes insumido valuesOf insumido kurtosis
variance
variance
fieldName | unaryMessage
Returns the variance (stddev ^ 2) of the given numeric field. To access the scalar value, use varianceValue.
Examples
data variance insumido.
data variance esfuerzo.
where
where
fieldName: value, value :=: number | string | stringWithWildcards
| booleanExpression
| and ...
| having ...
booleanExpression :=:
ResultSetRow boolean protocol, examples:
field: fieldName before: aDate
field: fieldName from: startingMagnitude to: endingMagnitude
field: fieldName match: stringWithWildcards
field: fieldName matchNot: stringWithWildcards
field: fieldName in: aSet
isNotNull: fieldName
isNull: fieldName
isNotZero: fieldName
isZero: fieldName
isToday: fieldName
each: oneArgBlock
Selects the rows that satisfy one or more logical conditions.
An equality test can be abbreviated by placing ':' between a field name and a value (you can use the wildcards * and # to match any sequence of characters or any single character, respectively).
Logical expressions must be the result of a message sent to each row of data (see predefined functions).
More complex expressions can be evaluated using each: oneArgBlock (a block of Smalltalk code that is evaluated for each row, receiving the row as its argument).
Examples
((medida resultRows where proyecto: project proyecto)
and field: 'periodo' matchNot: '*.*.*.')
and each: [:row | row duracion <= 30]
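The wildcard rules described above (* for any sequence of characters, # for any single character) can be illustrated with a small Python sketch that translates a pattern into a regular expression. The helper names are hypothetical and the matching is case-sensitive here, which is an assumption. ImproveKit's own matching may treat case differently.

```python
import re

def wildcard_to_regex(pattern):
    """Translate '*' (any sequence) and '#' (any single character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "#":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, text):
    """True when the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

print(matches("*.*.*.", "1.2.3."))  # True
print(matches("a#c", "abc"))        # True
print(matches("a#c", "abbc"))       # False
```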
ResultSet class
Protocols of messages that can be sent to a ResultSet
A collection of rows that implements HOM (Higher Order Messaging) protocols
Examples
ResultSet fromFileName: 'data.txt'.
ResultSet fromScriptNamed: 'script.st'.
ResultSet fromBundle: 'bundle.st'.
ResultSet datasources.
tabDelimitedString asResultSet.
tabDelimitedString asResultSetDefault.
tabDelimitedString
asResultSetKinds: #(#String #String #Dateyyyymmdd #Number)
required: #(true false false true)
Protocol *DevImprovekit-fuzzy set
fuzzy
Returns a collection with the fuzzy values of the given field (a number >= 0 and <= 1), using a triangular membership function for values 0 .. 1
Protocol *DevImprovekit-statistic collections
chiSquare
Returns a collection with the chi squared frequencies of the given field (numeric)
Protocol *DevImprovekit-statistic values
gini
Returns the Gini coefficient for the given field
hurst
The Hurst exponent is used as a measure of the long-term memory of a time series. It relates to the autocorrelations of the series and the rate at which they decrease as the lag between pairs of values increases. Returns the Hurst exponent as the slope of the line fitted to the rescaled range series (base-10 logarithmic scale)
rescaledRange
Returns the rescaled range (range / stddev) of a numeric field. Its purpose is to provide an assessment of how the apparent variability of a series changes with the length of the time period considered. The rescaled range is calculated by dividing the range of a portion of the time series by the standard deviation of the values over the same portion. For example, consider a time series {2, 5, 3, 7, 8, 12, 4, 2}, which has a range, R, of 12 - 2 = 10. Its standard deviation, s, is 3.46, so the rescaled range is R / s = 2.89.
theil
Returns the Theil coefficient for the given field
Protocol *DevImprovekit-testing
checkData
normalityChiSquareTest
Returns true if the given column values follow a normal distribution (based on chi-square method)
Protocol accessing
configuration
scalars
values
Returns a collection with the values of the first field
valuesOf
Returns a collection with the values of the given field
Protocol copying
transposedAt:average:
Returns a new result that adds columns by transposing rows from the fieldNameOrSymbol field. Calculates the average of anotherField for each group of fieldNameOrSymbol.
transposedAt:in:average:
Returns a new result that adds columns by transposing rows from the fieldNameOrSymbol field (filtered by aSet). Calculates the average of anotherField for each group of fieldNameOrSymbol.
transposedAt:in:sum:
Returns a new result by adding columns by transposing rows from the fieldNameOrSymbol field (filtered by aSet). Calculate the sum of anotherField for each group of fieldNameOrSymbol.
Examples
| activities rows ranks |
activities := (ranks := ((data groupBy field: #proyecto field: #tipo)
select
field: #proyecto
field: #tipo
field: #nombreRelease
field: [:group | group count] as: 'items')
where: #tipo isTop: 5 rankedBy: #items)
indexedOn: {#proyecto. [:row | row quarterly]. #tipo}.
rows := (data groupBy
field: #proyecto as: 'proyecto'
field: [:row | row quarterly] as: 'periodo'
field: #tipo as: 'tipo'
select: [:group | group count] as: 'total')
select
field: #proyecto as: 'proyecto'
field: #periodo as: 'periodo'
field: #tipo as: 'tipo'
field: [:group | (activities join: group on: #(#proyecto #periodo #tipo)) ifNotNil: [:row | row items]] as: 'items'
field: #total as: 'total'.
rows transposedAt: #tipo in: ranks distinct tipo sum: #total
Contact improvekit@gmail.com