Wednesday, September 29, 2010

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-10-graphical-input-through.html

Summary:
This is an old paper, written in 1976.  The HUNCH system was an early sketch recognizer that worked for some users and not for others: the programmer had tuned the system for a particular style of drawing, and some users matched it while others did not.  "Latching", the practice of joining endpoints together when certain conditions are met, failed in a number of cases and never worked perfectly.  The author also demonstrated a "room finder" that located rooms in floor plans by finding whitespace surrounded by lines; it worked well for simple plans but was less effective for oddly-shaped ones.
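
To illustrate what I mean by latching, here's my own rough sketch of the logic (hypothetical code, not HUNCH's actual implementation): endpoints of different strokes within a tolerance get snapped to their midpoint.

```python
import math

def latch(strokes, tol=10.0):
    """Snap together endpoints of different strokes that fall within
    `tol` pixels of each other (a guess at HUNCH-style latching)."""
    ends = []  # (stroke index, point index) of each stroke's endpoints
    for si, pts in enumerate(strokes):
        ends.append((si, 0))
        ends.append((si, len(pts) - 1))
    for i in range(len(ends)):
        for j in range(i + 1, len(ends)):
            (s1, p1), (s2, p2) = ends[i], ends[j]
            if s1 == s2:
                continue  # don't latch a stroke to itself
            a, b = strokes[s1][p1], strokes[s2][p2]
            if math.dist(a, b) <= tol:
                mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
                strokes[s1][p1] = mid
                strokes[s2][p2] = mid
    return strokes
```

The failure cases the paper mentions fall out naturally: a fixed tolerance latches points the user meant to keep separate, and misses gaps the user meant to close.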

The author implemented "speed" and "bentness" as means of measuring and interpreting the data; I suspect "bentness" is a precursor to the term "curvature".  The author observed that slower speeds and high bentness usually meant a corner.  In the end, the author was able to do some sketch-recognition clean-up that is rather commonplace today.
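
A minimal sketch of that speed-plus-bentness idea (my own illustration; the thresholds and point format are assumptions, not the paper's):

```python
import math

def find_corners(points, speed_thresh, bend_thresh):
    """Flag points where the pen moved slowly AND the direction changed
    sharply ("bentness"); points = [(x, y, t), ...]."""
    corners = []
    for i in range(1, len(points) - 1):
        (x0, y0, t0), (x1, y1, t1), (x2, y2, t2) = points[i - 1:i + 2]
        dt = (t2 - t0) or 1e-9
        speed = (math.hypot(x1 - x0, y1 - y0) + math.hypot(x2 - x1, y2 - y1)) / dt
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        # wrap the turn angle into [-pi, pi] and take its magnitude
        bend = abs(math.atan2(math.sin(a_out - a_in), math.cos(a_out - a_in)))
        if speed < speed_thresh and bend > bend_thresh:
            corners.append(i)
    return corners
```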

Discussion:
This paper was another get-back-to-basics read that showed how the author implemented techniques that are standard today.  I don't remember "latching" from any recent papers, though; I suspect the term changed, or not many people use it.

On a side note, where are the results & evaluation sections?

Tuesday, September 28, 2010

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-9-paleosketch.html

Summary:
This is the Paleo program being used for the 1st programming assignment.  It takes a sketch and classifies each stroke within the sketch into primitives.  The set of primitives used can be changed in the code.  Paleo does some nice things in pre-processing: it removes duplicate points (produced by systems with a high sampling rate), computes the normalized distance between direction extremes (NDDE) (which finds circles rather well), and computes the direction change ratio (DCR) (which finds corners very well).  Here's a summary of the testing procedures:

Line: least squares fit comparison (more complicated and additional steps, but that's the gist of it).
Polyline: locate corners and do line test for each segmented line.
Ellipse: get center (average of points), major axis, and minor axis; the NDDE value is high and the area of the shape is within a threshold
Circle: same as ellipse, except major and minor axis must be close
Etc.

Those are the important ones in the first programming assignment, so those are all I listed. 
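
For instance, here's a toy version of the circle idea (my own simplification, not Paleo's actual test): take the mean point as the center and require every radius to stay near the mean radius, which is roughly what "major and minor axis must be close" amounts to.

```python
import math

def looks_like_circle(points, tol=0.15):
    """Accept a stroke as a circle if all radii from the mean point stay
    within `tol` (relative) of the mean radius."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_r = sum(radii) / len(radii)
    return all(abs(r - mean_r) / mean_r <= tol for r in radii)
```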


Discussion:
This paper had everything (and a few other things) I discovered using Paleo within the first 3 hours or so.  As for the results, all I care about at the moment is whether they are good enough for the homework; since it seems like they are, I'm satisfied.  Now, if only there was more documentation on Paleo...

Tuesday, September 14, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-8-lightweight-multistroke.html

Summary:
The previous paper (and, I believe, most of the previous papers) focused on recognizing the shape of a single stroke.  This paper seeks to do sketch recognition for objects made of multiple strokes.  Named $N, the algorithm is a little more than twice the size of $1 in lines of code and claims to be accurate; it is also a template matcher.  $N claims to improve upon all of $1's weaknesses and achieved an impressive 96.7% recognition rate for a set of 15 templates. 

$N seeks to improve upon $1 by "(1) recognizing gestures comprising multiple strokes, (2) automatically generalizing from one multistroke template to all possible multistrokes with alternative stroke orderings and directions, (3) recognizing 1D gestures such as lines, and (4) providing bounded rotation invariance."

The paper can be reviewed for details on exactly how those 4 goals are accomplished.  $N still suffers from the shortcomings of any template-recognizer (not good with new shapes).  Still, it was able to recognize letters and some mathematical symbols.

Discussion:
I've noticed this in other papers, too: the authors tend to use small data sets, and the sketches are rather similar to their intended images (with some flaws, of course).  It's fine that the sketchers could draw well and that a small data set was used to prove the concept, but that does not demonstrate an algorithm's robustness or its ability to recognize (or clean up) ANY image the user draws (words, or an airplane, cat, or car, for instance).  Overall, the results have been somewhat biased; that seems to be true for a LOT of papers, but it's still rather discouraging to see it so frequently in a single discipline.

That said, I LOVE how the author included pseudocode for his algorithm at the end of the paper; that's very rare in any discipline of computer science.

On a side note, is it possible to include pressure exerted in the data of each point?
Point p = (x,y,time,pressure)

While implementing this is primarily a hardware (and some back-end software) issue, I believe the user would exert more pressure on more significant lines and curves; that could aid in deciphering the user's intention behind his sketch.

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-7-sketch-based-inferfaces.html

Summary:
This paper was written before the $1.  The $1 algorithm became a key basis algorithm the way Otsu's and Niblack's methods are for document image binarization; that impressed upon me how early in its development sketch recognition truly is.  That basic algorithm was created only a few years ago; it could easily take another 10-20 years before we see some really impressive stuff.  This paper focuses on providing a basis for other algorithms by recognizing basic geometric shapes within a single stroke.  Vertex detection is the core of the algorithm: it locates all candidate vertices, filters out the noise, and applies speed data to locate the actual vertices of the stroke. 

The algorithm uses Bézier curves to detect curved strokes.  The Bézier-curve procedure seems sound, but not perfect.  The combination of vertex detection and Bézier-curve detection constitutes the bulk of the algorithm.  This algorithm is definitely dealing with the basics, because the test data consisted of only 10 geometric shapes; the algorithm classified correctly 96% of the time, but only for those 10 shapes.  The author mentioned an extension of this algorithm but elaborated very little.

Discussion:
The vertex filter leaves room for improvement.  Simply using the mean of the data is quick and simple, but not very robust; a better method could involve smoothing the data and then choosing the highest peaks as the vertices.  Of course, that method assumes all vertices possess sharp corners; I do not know of an all-purpose answer (I don't think one even exists--yet).  The author combines speed data on the assumption that a user will always slow down when drawing corners, and he reported good results (though I cannot personally be sure).
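
The smoothing-then-peaks idea I'm suggesting would look something like this (a hypothetical sketch; `curvature` is a per-point curvature value from any estimator you like):

```python
def smooth(values, window=3):
    """Simple moving average over a sliding window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def peak_vertices(curvature, thresh):
    """Pick local maxima of the smoothed curvature above `thresh`
    as the vertex candidates."""
    c = smooth(curvature)
    return [i for i in range(1, len(c) - 1)
            if c[i] > thresh and c[i] >= c[i - 1] and c[i] > c[i + 1]]
```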

Overall, the algorithm seems a little simplistic (compared to some other algorithms I've seen) and the test data was criminally small.  It's good for proof-of-concept (which is important), but I would like to see something more substantial from this person (or someone who improves upon his research) in the future.

Thursday, September 9, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/09/reading-6-protractor.html

Summary:
This is the third recent paper we have covered so far; this one was written by a researcher at Google.  Protractor, the template-recognizing program in question, claims to be small and fast.  The author gives arguments for the superiority of templates over parameter-based programs: the templates could be customized for the user with excellent results, whereas no such procedure exists for parameter-based programs.  Protractor uses a nearest-neighbor approach. 
Protractor (assuming the user allows it) aligns the image and reduces noise to allow for faster matching (it does not rescale, unlike the $1 recognizer).  I had difficulty understanding the meat of the classification procedure and would appreciate it if someone could explain it to me in a simple manner.  Protractor did not demonstrate significantly greater results than $1, but Protractor performed much more quickly.
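
My best simple-terms reading of the matching step (a simplification of my own -- the real Protractor also solves for an optimal rotation in closed form, which I've left out): flatten each gesture into a vector, normalize it, and pick the template whose vector has the highest cosine similarity.

```python
import math

def normalize(points):
    """Translate the centroid to the origin and scale the flattened
    (x, y, x, y, ...) vector to unit length."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    flat = []
    for x, y in points:
        flat += [x - cx, y - cy]
    mag = math.sqrt(sum(c * c for c in flat)) or 1.0
    return [c / mag for c in flat]

def classify(candidate, templates):
    """Nearest neighbor by cosine similarity (dot product of unit
    vectors).  `templates` maps a name to a normalized vector."""
    vec = normalize(candidate)
    return max(templates,
               key=lambda name: sum(a * b for a, b in zip(vec, templates[name])))
```

This also suggests why Protractor can skip rescaling: normalizing the vector to unit length already removes scale.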


Discussion:
The author's arguments favoring templates failed to mention their flaws.  Protractor certainly brings some unique ideas to the table, but it is worrisome how the author stated on page 1 that Protractor would work if the templates were customized for the user.  This indicates the program is not very robust.  The data set is worrisome as well.  The author employed a large data set that concentrated on the $1's strengths (where it outperformed Rubine); Protractor did not demonstrate it could compensate for $1's shortcomings.

To summarize, Protractor did what the author wanted: it's a faster version of $1.  However, Protractor is not a significant improvement over $1 in terms of the success rate--just a slight improvement.

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/09/reading-5-1.html

Summary:
This is the second recent paper we have covered so far (the first was Hammond, Reading #1); it presents the $1 recognizer.  The code is simple and short, and it can perform more effectively than Rubine at the cost of increased computation time.  The algorithm realigns the image and rescales it.  As a result, the $1 recognizer cannot distinguish between horizontal and vertical lines. 
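
The rescaling step is where that limitation comes from.  Here's my own illustration of the idea (not the paper's code): the bounding box is stretched non-uniformly to a reference square, so a rectangle and a square end up identical.

```python
def scale_to_square(points, size=100.0):
    """Non-uniformly rescale a stroke's bounding box to a size x size
    square, as in the $1 recognizer's preprocessing."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w = (max(xs) - min(xs)) or 1.0  # guard against zero width/height
    h = (max(ys) - min(ys)) or 1.0
    return [((x - min(xs)) * size / w, (y - min(ys)) * size / h)
            for x, y in points]
```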


Discussion:
I really like the "100 lines of code" part about the algorithm; that is a small algorithm.  The author stated the small size meant the algorithm was imperfect and the algorithm was slower than Rubine.  This is an excellent algorithm, albeit a limited one.  If a sketch recognition algorithm cannot distinguish between a square and a rectangle, it needs to be improved.

Wednesday, September 8, 2010

Reading #4: Sketchpad: A Man-Machine Graphical Communication System (Sutherland)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-4-master-sutherland-vs-machine.html

Summary:
The author is Ivan Sutherland, the MIT professor who virtually started the field of sketch recognition; the paper was written in 1963.  Considering the date, I believe this was one of the first (if not the first) sketching program created.  While reading this, I couldn't help but think this program was similar to Microsoft Paint that everyone has today; I did notice a few features that were not in Paint, though.  The author explains his sketching program in this paper.  The user creates a shape and then manipulates that shape with a number of flexible commands. 

The drawing tool of choice is the "light pen".  The user must inform the computer of the pen's starting location to begin drawing; I'd summarize the light pen as an early version of the mouse.  The idea of a pseudo-pen was to allow for easier tracking and computation on the programmer's end. 


Discussion:
This paper was a starting point for drawing programs; I believe it is a predecessor to Microsoft Paint.  I wonder if Sutherland created any other revolutionary programs; considering his listed patents on Wikipedia, that seems likely.  The program could be improved through advances in technology, and the resulting drawing programs of today are proof.

Wednesday, September 1, 2010

“Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

Comment Location:
http://christalks624.blogspot.com/2010/09/youre-doing-it-wrong.html

Summary:
The author created Quill as his primary contribution.  Quill is very similar to Rubine's GRANDMA, save that Quill does not include Rubine's time-based features and adds some new elements.  Quill is a GUI that has the user/designer train the program to recognize a particular gesture type through repetition; the program provides "active feedback" to prevent a first-time user from getting lost.  The paper devoted more time to explaining the features of the program than to the programming that created it (such as the feature set and particulars about the linear classifier).

A large section of the paper was devoted towards the difficulties involved in optimizing the active feedback.  The author mentioned the key problems were the timing of advice, amount of advice, and advice content.  The problems centered around not offending the user and still giving advice the user deemed helpful.  The author also wrote on background analysis and explained the reasoning behind his choices.  Lastly, the author mentioned his prediction system was not perfect and sometimes incorrect advice was given.



Discussion:
Quill is an improvement upon Rubine's GRANDMA in terms of considering the user.  The author tried to consider multiple situations when adjusting the active feedback so the user would actually use the advice instead of ignoring the advice.  The author's choice of the background analysis method was to prevent user confusion.  I have found it somewhat rare for a program in a scientific paper to consider the user so heavily.

I found the background analysis section to be extraneous.  There should be a paper explaining how to optimize background analysis for a given situation; I believe that space in this paper could have been better spent on something else.  It was certainly worth mentioning, but the author should have either elaborated on the subject or been more brief.

Specifying Gestures by Example by Dean Rubine

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/09/reading-2-rubine.html#comments
 
Summary:
This is the paper that virtually started sketch recognition.  The author bundled up his research into a sketch recognition program called GRANDMA.  After preprocessing, the gesture is used to derive 13 features.  The author mentioned the feature set could not distinguish between all types of gestures and that the feature set should be expanded in future implementations.  The technique employs machine learning and a linear classifier.
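
To make the feature idea concrete, here are two of the 13 as I understand them (my paraphrase, not Rubine's code): the cosine of the initial angle, and the total signed turning accumulated along the stroke.

```python
import math

def sample_features(points):
    """Two Rubine-style features: f1, the cosine of the initial angle
    (taken between the first and third points), and the total signed
    turn angle along the stroke.  points = [(x, y), ...]."""
    dx0 = points[2][0] - points[0][0]
    dy0 = points[2][1] - points[0][1]
    f1 = dx0 / math.hypot(dx0, dy0)
    total_turn = 0.0
    for i in range(1, len(points) - 1):
        ax = points[i][0] - points[i - 1][0]
        ay = points[i][1] - points[i - 1][1]
        bx = points[i + 1][0] - points[i][0]
        by = points[i + 1][1] - points[i][1]
        total_turn += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return f1, total_turn
```

The linear classifier then scores each gesture class as a weighted sum of such features and picks the highest-scoring class.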

The author allowed for a classification to be unknown.  The thresholding measure utilized would sometimes reject known gestures; obviously, future improvements are needed in that area (it's probably already been done).



Discussion:
Since there are many more sophisticated sketch recognition programs in existence today, I did not consider the GRANDMA program to be of great significance outside reference material for building one's own sketch recognition program.  The preprocessing seemed rather arbitrary (only consider every 3rd pixel or so); I'm sure there are more effective preprocessing measures in existence today.  Concerning the unknown option for classification, I'm certain the author did that to allow for future improvements.

Gesture Recognition (Hammond)

Comment Locations:
http://martysimpossibletorememberurl.blogspot.com/2010/09/hammond-gesture-recognition-survey.html#comments

Summary:
This paper covered many basics and the beginning points for sketch recognition.  The author elaborately explained the purposes of each of the features in Rubine's and Long's feature set.  The author included discussion questions for the Rubine and Long sections (I didn't notice any questions for the Wobbrock section).  Rubine and Long used different types of feature-based linear classifiers while Wobbrock implemented a template matcher; I interpreted the template matcher to mean the program finds the closest comparison of the gesture to a pre-generated set of template gestures.

Rubine and Long were more detail oriented in their classifiers while Wobbrock generalized the data set before classification.



Discussion:
I would have preferred if the author had simply given the answers to the discussion questions in the paper.  Time is valuable to me and I would prefer to have quick and complete understanding of the concepts and move on.  I did appreciate the in-depth coverage of the techniques and comparisons between Rubine, Long, and Wobbrock.  Firmly establishing the basics is absolutely necessary when learning a new discipline.

Intuitively one would think the most effective program of the three authors mentioned would depend on the data set.

Questions:
Rubine:
1) If points at an identical location are points with the same (x,y) coordinates and a different time stamp, then deleting the 1st or 2nd point there can affect the overall accuracy of the data; this is especially true for the cumulative measures, like the sum of angles between each point in the stroke.

2) Removing either the 1st or the 2nd value could affect the summation features.  Ideally, the duplicate time stamps could be fixed by adding a time unit to every subsequent point (eliminating every duplicate time stamp case one at a time while preserving the integrity of the stroke as much as possible).
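
The fix from (2) might look like this (a hypothetical sketch, assuming points sorted by time): every time a duplicate timestamp appears, all subsequent points get pushed forward one time unit instead of being deleted.

```python
def fix_duplicate_timestamps(points):
    """Repair duplicate timestamps by shifting later points forward,
    preserving every (x, y) sample.  points = [(x, y, t), ...]."""
    fixed = list(points)
    offset = 0
    for i in range(1, len(fixed)):
        x, y, t = fixed[i]
        t += offset
        if t <= fixed[i - 1][2]:
            # bump this and all later points past the previous timestamp
            offset += fixed[i - 1][2] + 1 - t
            t = fixed[i - 1][2] + 1
        fixed[i] = (x, y, t)
    return fixed
```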

3)
shape 1: d
shape 2: g
shape 3: b
shape 4: e
shape 5: a
shape 6: h
shape 7: c
shape 8: f

Long:
 1) Some of the features are devoted towards finding the size of the stroke, average angle per point, and overall curviness.  I believe Long was trying to get an overall picture of the image (size of stroke and how curvy stroke was).