Tuesday, December 1, 2015

Evaluating the proposed capabilities to be supported by an API

In our experience, designing an application with usability as a key goal leads to software that is more readily adopted by a scientific collaboration. Usability applies not only to GUIs but also to APIs and CLIs. There has been some work on evaluating the usability of already implemented APIs [1,2,3,4,5], and a few studies have focused on taking a user-centered approach to getting early feedback on an API before implementation [6,7].

We built on this body of work by developing a user experience study to test the proposed features of a future API. The study revealed which functions would have the greatest impact on the users' work, as well as which functionality was missing.

Methodology
We conducted the study with three domain scientists, with each session lasting approximately one hour. Each feature was first defined before we asked the study participants to perform two tasks.

How one user filled in the valence/arousal ratings for the feature list

The first task involved asking the participants how they felt about each feature. Each participant placed the feature number on a valence/arousal chart. Valence represents how positively or negatively the participant feels about a feature, while arousal indicates how strongly they feel about it. For example, feeling excited about a feature would register as high valence, high arousal, whereas feeling ambivalent about a feature would be neutral valence, low arousal. Once they completed this task, we asked them to explain their ratings.
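
As a concrete illustration of the chart (not the paper instrument the participants actually used), here is a minimal Python sketch that places hypothetical feature numbers on valence/arousal axes with matplotlib; the feature numbers and ratings are made up for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical ratings: feature number -> (valence, arousal), each on a -1..1 scale.
# The values below are illustrative only; participants marked a physical chart.
ratings = {
    1: (0.8, 0.9),    # excited: high valence, high arousal
    2: (0.0, -0.7),   # ambivalent: neutral valence, low arousal
    3: (-0.6, 0.5),   # frustrated: negative valence, fairly high arousal
}

fig, ax = plt.subplots()
for feature, (valence, arousal) in ratings.items():
    # Plot the feature number itself at its (valence, arousal) position.
    ax.text(valence, arousal, str(feature), ha="center", va="center")

ax.axhline(0, color="gray", linewidth=0.5)
ax.axvline(0, color="gray", linewidth=0.5)
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
ax.set_xlabel("valence (negative to positive)")
ax.set_ylabel("arousal (low to high)")
ax.set_title("Valence/arousal ratings for one participant")
plt.show()
```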

For the second task, we asked the participants to rank the features according to what they most wanted to see implemented. Additionally, we asked them to mark a cutoff point in the ranked list separating the functionality they considered essential from the features they could live without if the development team didn't get to them.
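
To show how this kind of ranking data can be summarized, here is a minimal sketch under assumed data: per-participant rankings (best first) plus the position of each participant's cutoff, aggregated by median rank. The participant labels and rankings are hypothetical, not our study data:

```python
from statistics import median

# Hypothetical per-participant rankings (best first) and cutoff positions.
# A cutoff of 3 means the first three ranked features were considered essential.
rankings = {
    "P1": ([4, 1, 2, 3, 5], 3),
    "P2": ([1, 4, 3, 2, 5], 2),
    "P3": ([4, 2, 1, 5, 3], 4),
}

# Collect each feature's rank position across participants.
positions = {}
for ranked, _cutoff in rankings.values():
    for pos, feature in enumerate(ranked, start=1):
        positions.setdefault(feature, []).append(pos)

# Order features by median rank (lower is more wanted) and count
# how many participants placed the feature above their cutoff.
for feature, pos_list in sorted(positions.items(), key=lambda kv: median(kv[1])):
    essential_votes = sum(
        1 for ranked, cutoff in rankings.values() if feature in ranked[:cutoff]
    )
    print("feature {}: median rank {}, essential for {} of {} participants".format(
        feature, median(pos_list), essential_votes, len(rankings)))
```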

Finally, we asked the users whether any features were missing and whether there were any mismatches between the concepts and how they thought about their work.

The importance heatmap

To analyze the data, we visualized the importance of each feature as a heat map. Categorizing the data into bins required some judgment calls, so we asked the participants to verify the binning results afterwards. We then further categorized the features into implementation priority recommendations based on the following criteria:

A - Very important (team should look at it immediately): two or more people considered it critical.
B - Important (team should look at it soon): two or more people rated it "should have" or higher.
C - Important (team should consider this in more detail): at least one person rated it "would be good to have" or higher.
D - Don't worry about it now.
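
These rules map naturally to a small decision function. The following is a hypothetical sketch, not our actual analysis script; the ordinal scale names are assumed from the rating labels quoted in the criteria above:

```python
# Assumed ordinal scale for per-participant ratings of a feature.
SCALE = {"not needed": 0, "would be good to have": 1, "should have": 2, "critical": 3}

def priority(ratings):
    """Map one feature's per-participant ratings to a priority bin (A-D)."""
    scores = [SCALE[r] for r in ratings]
    if sum(s >= SCALE["critical"] for s in scores) >= 2:
        return "A"   # two or more participants considered it critical
    if sum(s >= SCALE["should have"] for s in scores) >= 2:
        return "B"   # two or more rated it "should have" or higher
    if any(s >= SCALE["would be good to have"] for s in scores):
        return "C"   # at least one rated it "would be good to have" or higher
    return "D"       # everything else

# Example: three participants rate the same feature.
print(priority(["critical", "should have", "critical"]))                  # -> A
print(priority(["should have", "would be good to have", "not needed"]))   # -> C
```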

Methodological Takeaways
Combining the emotional-response feedback with the ranking feedback gave us the context we needed for discussing the features with the domain scientists. However, once we concluded the study and tried to analyze the data, we realized that we didn't have quite the right level of information to develop specific recommendations for which features should be implemented. We performed the binning with the information we had on hand but felt the need to verify our binning results with the users afterwards. In future studies, we plan to include a binning exercise, using the scale shown in the above heatmap, as one of the study tasks.

A future blog post will discuss our thoughts on evaluating a "paper" API, where you have most of the function definitions and signatures designed but not yet implemented.

References

[1] Grill, Thomas, Ondrej Polacek, and Manfred Tscheligi. "Methods towards API usability: A structural analysis of usability problem categories." Human-Centered Software Engineering. Springer Berlin Heidelberg, 2012. 164-180.

[2] Rama, Girish Maskeri, and Avinash Kak. "Some structural measures of API usability." Software: Practice and Experience 45.1 (2015): 75-110.

[3] Cataldo, Marcelo, et al. "The impact of interface complexity on failures: an empirical analysis and implications for tool design." School of Computer Science, Carnegie Mellon University, Tech. Rep (2010).

[4] Robillard, Martin P., and Robert Deline. "A field study of API learning obstacles." Empirical Software Engineering 16.6 (2011): 703-732.

[5] Stylos, Jeffrey, et al. "A case study of API redesign for improved usability." Visual Languages and Human-Centric Computing, 2008. VL/HCC 2008. IEEE Symposium on. IEEE, 2008.

[6] Clarke, Steven. "Measuring API usability." Dr. Dobb's Journal 29.5 (2004): S1-S5.

[7] Ramakrishnan, Lavanya, et al. "Experiences with user-centered design for the Tigres workflow API." e-Science (e-Science), 2014 IEEE 10th International Conference on. Vol. 1. IEEE, 2014.