Genetic Algorithm: Black box software testing for UI applications

Genetic Algorithm: Black box software testing for UI applications


Every QC person in software industry has that embarrassing moment when a defect is overlooked and finally caught in production after causing lot of sorrow to lot of stakeholders. 

In this blog, I am attempting a Genetic Algorithm (GA) that can uncover the plausible scenarios that might be a defect in the application. These are the scenarios that will not even cross the mind of the QC in most circumstances. 

We will see an application that is used to create and track events and using the GA we will try to find the scenarios where user may create an event when he/she should not be able to do so. We are trying to find the defective flows in the application.

Before going any further, let me try to explain what genetic algorithm is with an example. 

Genetic Algorithm: Genetic algorithms are heuristic method of optimization that operate on pieces of information like nature does on genes during the course of evolution. Without getting into too much theory, lets see one of the hand worked example of GA.

A hand worked example of Genetic Algorithm 
To Maximize the value of the function (x2 -2x) where x is the number represented in the binary format
1. Generate the initial parent population with 5 bits and get the Fitness value.
2. Do a cross over at any random position e.g. position 3  
Fitness Function after first generation:
Fitness value for the first and second generation will be compared. The ones with more values will be selected. If we compare the value of function x2 -2x for first and second generation we will find 15> 8, 153>120, 3>-1 and 323 > 255. So we will retain the string values for 1st and 3rd (as the fitness value is more) from the first generation and 2nd and 4th values will be selected from 2nd generation. So our final population after 2nd generation will look like:
We can repeat the above steps till we are satisfied with our fitness function.
Genetic Algorithm in black box testing: In any Genetic Algorithm the main issue remains on how to formulize the problem such that it can take the linear string like form, similar to DNA.
Application Under Test: Consider an application that is used as event logging system. The home page of the application looks like this:
To create an Event user needs to have the Create rights in the application.
Also, to navigate to create screen user can either click New --> Event or can directly navigate to http://localhost:3000/events/new


In the above image you can see the event create page. You can observe few of the fields are mandatory to create the event while others are optional.
Following fields are mandatory: Name, Summary, Type, Severity, Urgency, Responsible Party.
Following fields are optional: Description, Target Date
According to the above schematic a user that has create rights can either navigate through Event --> New or directly through /app/event/new url to land at Create event page. Then he/she has to provide the values for all the mandatory fields and that too only from the allowed values for few fields and then click on Save to successfully create an event.

Now,lets try to look each of the parameter that determines the success of create event.

1) User Rights (R): Possible values (Create, Update, Read, None). Only user with Create right can create the event.
2) Navigation Mechanism (NM): Through UI or direct URL (There may be scenarios where user may be able to create the event with lesser rights if he/she directly uses the URL)
3) Name (N): Any value of text is allowed, Numeric, alphanumeric, special characters everything should be allowed. Only check is that the field should not be blank.
4) Summary (S): Same rules as Name.
5) Type (T): Possible values (Defect, Enhancement, User education).No other value is permitted. Also field should not be blank.
6) Severity (Sv): Possible values (1, 2, 3, 4). No other value including blank is permitted.
7) Urgency (U): Same as Severity.
8) Description (D): Free text allowed. Not a mandatory field for event creation.
9) Target Date (TD): Only future date allowed. Not a mandatory field for event creation. If user does not give valid date or gives date prior to today's date, event creation will not be successful.
10) Responsible Party (RP): Look-up field. It should have a valid value in the database table. Validation happens after clicking Save only.
These 10 parameters will form the 10 character string for Genetic Algorithm each identified by the position it takes. 
 So a 10 character string may look like:
R NM N S T Sv U D TD RP
-    -    -   -  -   -   -  -    -    -
With first character always referring to the User Rights, Second to Navigation Mechanism and so forth.

Next step for us is to define the search space for each of the parameters so that we can achieve the desired results in reasonable time. For this we will provide the possible values for each of the parameter. We will store the values in the arrays so that we can refer the value through position.
R = ['Create', 'Update, 'Read', '']
NM = ['UI', 'URL']
N= ['', 'valid name', 'not #$% 1 so%^valid']
S= ['', 'valid summary', 'not #$% 1 so%^valid']
T= ['', 'Defect', 'Enhancement', 'User education']
Sv= [ , 1, 2, 3, 4]
U= [ , 1, 2, 3, 4]
D= ['', 'valid description', 'not #$% 1 so%^valid']
TD= ['', #{Time.now - 1}, #{Time.now + 1}   >> Time.now will give the present date
RP= ['', 'valid value'] >>  Since invalid value will not result in fetching db value

We will refer to each value using the position in a string. so a string:
1010212021 will correspond to [Update, UI, valid name, '', Enhancement, 1, 2, '', Tomorrow's date, valid value]

Now that we have the domain of values ready we can assign the weights for each value in individual arrays. The idea behind this is that a less likely a value to result in successful Save event the higher weight it is going to get (Since our aim to find the defective workflows that may result in successful Save). E.g. A user with Create access is more likely to successfully create the event so we will give less weight of 5 while user with Read right is less like to create the event so we will give Read entitlement weight of 30. 
The choice of weight is completely arbitrary and I am still trying to figure out better way to assign these. User comments are welcome.

Here is the weights associated with each of the value:
R     [ 0:5, 1:10, 2:30, 3:35]  >> Create has weight of 5, Update of 10, Read of 30 and none of 35
NM  [ 0:5, 1:10]
N     [0:50, 1:5, 2:10] >> Name is mandatory field, so if no value is given probability of creation decreases and hence weight increases
S     [0:50, 1:5, 2:10]
T     [0:50, 1:5, 2:5, 3:5 ] 
Sv   [0:50, 1:5: 2:5, 3:5, 4:5]
U     [0:50, 1:5, 2:5, 3:5, 4:5]
D     [0:5, 1:5, 2:5, 3:5]
TD   [0:5, 1:5, 2:50] >> Future date is not allowed so tomorrows date gets higher value
RP   [0:50, 1:5]

Now that we have the mapping of values with the weights we can define the Fitness function as:
F = R + NM + N + S + T + Sv + U + D +TD + RP +  K
K = 400 if save is successful 
K = 0 if save is unsuccessful
The rational behind the value of K is that we want to focus our attention only to the workflows that lead to successful creation of events. Some of which may be faulty. If we give value of K too less then we may end up with population that is not successful in event creation. Also if K is too much large then the value of other parameters will not matter, so we need to maintain the trade off.

Our next step would be to create a population of randomly created 10 strings e.g. 
1010212021
1001102111
0111112011
..................
This would be the first generation. Calculate the Fitness value of each string.
Now Using either mutations or crossover create the next round of offspring. 
e.g
101   0212021                   100   0212021
100   1102111                    101  1102111
Select the off springs that have Fitness value greater that of parents, if not retain the parents.
You can go upto nth number of generations before stopping the run. The last generation is likely to have higher fitness value. Now, all we have to do is analyze the 10 strings from the final population and verify that the save event (if happened) was a valid one or not.

Hope it makes sense.
In the next blog I will try to give code on how to achieve this.

Comments