Thursday, December 2, 2010

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches (Kara)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-15-image-based-trainable-symbol.html

Summary:
The author proposes a trainable, hand-drawn symbol recognizer based on a multi-layer recognition scheme.  Binary templates are used to represent the symbols.  The author uses multiple classifiers to rank a symbol and thus increase the overall accuracy of the system.  The four classifiers are Hausdorff distance, modified Hausdorff distance, the Tanimoto coefficient, and the Yule coefficient.
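To get a feel for two of the four classifiers, here's a minimal sketch of the Hausdorff and modified Hausdorff distances.  The toy templates (treated simply as sets of black-pixel coordinates) are my own invention, not from the paper.

```python
import math

def directed_hausdorff(a, b):
    """Max over points in a of the distance to the nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def modified_hausdorff(a, b):
    """Uses the mean of nearest-neighbor distances instead of the max,
    which makes the measure less sensitive to a single outlier point."""
    avg = lambda s, t: sum(min(math.dist(p, q) for q in t) for p in s) / len(s)
    return max(avg(a, b), avg(b, a))

# Toy binary templates: an "L" shape and a slightly shifted copy.
tmpl = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}
sketch = {(0, 1), (1, 1), (2, 1), (2, 2), (2, 3)}

print(hausdorff(tmpl, sketch))           # worst-case point mismatch
print(modified_hausdorff(tmpl, sketch))  # average mismatch, <= Hausdorff
```

The modified version averages the nearest-neighbor distances instead of taking the worst case, which is why it tolerates a stray pixel better than the plain Hausdorff distance.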

The author discovered limitations among his shape set when he tried to compare sketches that had shapes (like arrows) differing mainly by direction, size, or some other small detail. 

Discussion:
The author realized there is currently no perfect algorithm in sketch recognition.  The idea to employ multiple recognizers is a step forward.  It also increases the amount of code, but then again, nothing's perfect.  Maybe if the author slapped on a few more classifiers and weighted their votes, the overall recognition accuracy would increase.

Saturday, October 16, 2010

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-14-using-entropy-to-distinguish.html

Summary:
This is another paper that distinguishes between text and shapes--good.  The algorithm relies on one feature: entropy--even better.  The author used zero-order entropy (meaning symbols are independent of each other), which simplifies the algorithm.  It creates an alphabet of angle differences between stroke points and uses that as the basis of entropy.  Text characters have significantly more variance in the angles than shapes.
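Here's my rough guess at how the entropy feature could work, assuming the alphabet is built from quantized turning angles at each interior stroke point.  The bin count and quantization scheme are made up for illustration, not the paper's actual alphabet.

```python
import math

def turning_angles(points):
    """Angle change (radians) at each interior point of a stroke."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        d = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        angles.append(d)
    return angles

def zero_order_entropy(points, bins=8):
    """Shannon entropy (bits) of the quantized turning-angle distribution."""
    symbols = [int((a + math.pi) / (2 * math.pi) * bins) % bins
               for a in turning_angles(points)]
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A straight "shape-like" stroke vs. a wiggly "text-like" stroke.
line = [(i, 0) for i in range(10)]
wiggle = [(i, (-1) ** i * (i % 3)) for i in range(10)]
print(zero_order_entropy(line))    # near 0: all angles fall in one bin
print(zero_order_entropy(wiggle))  # higher: varied angles
```

A straight stroke maps every turning angle to the same symbol (entropy 0), while a wiggly stroke spreads across the alphabet, which matches the paper's intuition about text.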

The evaluation metric appears sound.  A total of 756 strokes were used, and the algorithm attained a high accuracy for the strokes it identified.  The problem is that 25% of the strokes were not classified as text or shape with high confidence.  The algorithm identified the extreme cases but balked at the ambiguous ones.  It was more accurate with text strokes than with geometric strokes.

Discussion:
The algorithm uses one feature to classify between shape and text.  I like it.  This is even better than the last paper.  Assuming the accuracy is to be believed, the algorithm is of great use to our programming project.  An additional filter to handle the ambiguous strokes could prove very useful. 

I'm a little unclear on how the alphabet angles are grouped; I imagine it's the angle between two different stroke points, but I'm not sure.  If anyone knows for sure, please share.

The paper said it used the data from the COA domain.  Isn't that the data WE are using in our 2nd programming assignment?  If so, then we should definitely use an algorithm that already works for this data set.  If anybody has an existing implementation of this algorithm, please post it to the Google groups for the class.

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-13-ink-features-for-diagram.html

Summary:
The author attempted to distinguish between text and shapes using a linear split of the data.  First, the data is plotted onto a graph according to its "bounding box width"; then the best-fit vertical line is placed, and the strokes are classified depending on which side of the line they fall.  The initial results showed a large number of misclassifications, particularly in the shape department.  The text had a much lower misclassification rate.

I did not find a step-by-step algorithm on how the bounding box was calculated.  I noticed Figure 3 revealed some information about the bounding box.  It seems the bounding box is a summary of the features of a given stroke.  The inter-stroke gap (distance between strokes) was the biggest key feature of the feature set; it is smaller from text to text than from shape to shape.

Here are the significant features:

Interstroke gaps: Time till next stroke, Speed till next stroke, Distance from last stroke, Distance to next stroke
Size: Bounding box width, Perimeter to area, Amount of ink inside
Curvature: Total angle
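As a sanity check, here's a toy version of two of the listed features (bounding box width and total angle) plus the paper's single-feature vertical-line split.  The threshold value is invented for the example, not taken from the paper.

```python
import math

def bbox_width(stroke):
    """Width of the stroke's axis-aligned bounding box."""
    xs = [x for x, _ in stroke]
    return max(xs) - min(xs)

def total_angle(stroke):
    """Sum of absolute turning angles along the stroke (a curvature proxy)."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(stroke, stroke[1:], stroke[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        total += abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)
    return total

def classify(stroke, width_threshold=50):
    """Linear split on one feature: small bounding boxes -> text."""
    return "text" if bbox_width(stroke) < width_threshold else "shape"

letter = [(0, 0), (5, 10), (10, 0)]   # small stroke, like a character
rect_edge = [(0, 0), (100, 0)]        # long stroke, like a shape edge
print(classify(letter), classify(rect_edge))
print(total_angle(letter))
```

The vertical "best-fit line" in the paper is effectively just this kind of single-feature threshold, which is why shapes with small bounding boxes get misclassified as text.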

Discussion:
This algorithm has immediate potential for the 2nd programming project.  I plan on doing one of the sets involving characters, so this algorithm warrants close examination.  I am disappointed in the results, but there is potential.  The author managed to find text remarkably well.  If an additional filter is used to eliminate the misclassified shapes, then the results will improve significantly.

Monday, October 11, 2010

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/10/reading-12-constellation-models.html

Summary:
The paper takes a new twist on recognition by using constellation shapes as the basis for shape recognition.  Remarkably, it demonstrates some success when recognizing crude facial sketches.  It relies on the connectivity between the shapes in the facial sketch.  The recognition algorithm may be called a constellation model, but it's really a graph-type recognition algorithm.  Strokes/individual shapes form the nodes, and the connections between the nodes are used as part of the recognition algorithm.  I won't go into the specifics of the algorithm.

For evaluation, the author used 5 classes of objects with 7–15 labels each.  They have been tested on drawings having 3–200 strokes.  The author used 20–60 training examples for each class.  The algorithm's major weakness was its computation time.  It took over an hour to recognize a face.

Discussion:
This paper takes an...interesting...approach.  Truthfully, I would never have considered using constellation shapes as part of ANY recognition system; the thought never occurred to me.  I wonder if this will be of any help in the 2nd homework assignment...If anyone has any ideas regarding this, please share.  The author demonstrated (some) success in recognizing complex shapes.

I must admit, the numbers for evaluation (3-200 strokes in a drawing) were impressive.  The computation time is a problem, but the algorithm recognized complex shapes such as sailboats and faces.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/10/reading-11-ladder.html

Summary:
LADDER is a subset of Paleo, if memory serves.  The paper says, "LADDER allows interface designers to describe how shapes in a domain are drawn, displayed, and edited".  Translation: LADDER gathers some basic information about the stroke, which the programmer then plays with (like Paleo).  LADDER deals with primitive shapes only, so it can't recognize a drawing of a cat; the programmer would need to use the line information collected by LADDER to determine the sketch is a cat.

LADDER defines a shape "in terms of previously defined shapes and constraints between them".  The pre-defined primitive shapes are everything we saw in the first homework except the multiple-line types (like Polyline).  A sizable portion of the paper is devoted to explaining the specifics of the shape recognition, such as the definition of variables, variable values, the procedure of recognition, etc.
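LADDER's actual descriptions are written in its own language, but the shapes-plus-constraints idea is easy to sketch.  Here's a hypothetical Python rendering of two such constraints (coincident endpoints and perpendicular lines) checked against two candidate strokes; the tolerance values are my guesses, not LADDER's.

```python
import math

def coincident(p, q, tol=5.0):
    """True if two endpoints are within tol pixels of each other."""
    return math.dist(p, q) <= tol

def perpendicular(l1, l2, tol_deg=10.0):
    """True if two lines (each a pair of endpoints) meet near 90 degrees."""
    def angle(line):
        (x0, y0), (x1, y1) = line
        return math.atan2(y1 - y0, x1 - x0)
    diff = abs(angle(l1) - angle(l2)) % math.pi
    return abs(diff - math.pi / 2) <= math.radians(tol_deg)

# Two strokes that could form the corner of a rectangle.
top = ((0, 0), (100, 2))
side = ((100, 2), (98, 60))
is_corner = coincident(top[1], side[0]) and perpendicular(top, side)
print(is_corner)
```

A full rectangle definition would just be four Line components with four coincident and four perpendicular constraints, which is the flavor of what the paper describes.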

Discussion:
Since we've already used LADDER (and Paleo) to do the 1st homework assignment and (I think) it went well, I've got no complaints.  It organizes strokes with enough diverse information so the programmer can do further recognition on the shape.  LADDER claims to have some more complex shapes such as rectangle and diamond.  If there was a triangle shape, I definitely would have liked to have seen it for the first homework assignment.

Wednesday, September 29, 2010

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-10-graphical-input-through.html

Summary:
This is an old paper, written in 1976.  The HUNCH system was an early sketch recognizer that worked for some users and not for others; the programmer had tuned the system to a particular style of drawing, and some users matched it while others did not.  "Latching", the practice of joining together endpoints if certain conditions are met, failed in a number of cases and did not work perfectly in the end.  The author demonstrated a "room finder" which calculated the location of rooms in floor plans by finding whitespace surrounded by lines; it worked well for simple plans, but it was less effective for oddly-shaped floor plans.
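My guess at what "latching" looked like in code: if two free endpoints fall within some tolerance, snap them to a shared point.  The tolerance and the snap-to-midpoint choice are assumptions on my part, not details from the paper.

```python
import math

def latch(strokes, tol=8.0):
    """Join near-coincident endpoints by moving both to their midpoint."""
    strokes = [list(s) for s in strokes]
    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            for a in (0, -1):            # endpoints of stroke i
                for b in (0, -1):        # endpoints of stroke j
                    p, q = strokes[i][a], strokes[j][b]
                    if 0 < math.dist(p, q) <= tol:
                        mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
                        strokes[i][a] = strokes[j][b] = mid
    return strokes

# Two lines that almost meet at a corner.
s1 = [(0, 0), (50, 2)]
s2 = [(53, 5), (55, 60)]
latched = latch([s1, s2])
print(latched[0][-1], latched[1][0])  # both snapped to the same point
```

You can see why a fixed tolerance fails in practice: a sloppy drawer's intentional gap and a careful drawer's accidental gap can be the same size.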

The author implemented "speed" and "bentness" as means of measuring and interpreting the data; I suspect "bentness" is a precursor to the term "curvature".  The author observed that slower speeds and high bentness usually meant a corner.  In the end, the author was able to do some sketch recognition clean-up that is rather commonplace today.

Discussion:
This paper was another get-back-to-basics read that demonstrated how the author implemented techniques still in use today.  I don't remember the "latching" from any recent papers, though.  I suspect the term changed or there are not a lot of people who use it.

On a side note, where are the results & evaluation sections?

Tuesday, September 28, 2010

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-9-paleosketch.html

Summary:
This is the Paleo program being used for the 1st programming assignment.  It takes a sketch and classifies each line within the sketch into primitives.  The set of primitives used can be changed in the code.  Paleo does some nice things in the pre-processing: it removes duplicate points (produced by systems with a high sampling rate) and computes normalized distance between direction extremes (NDDE) (which finds circles rather well) and direction change ratio (DCR) (which finds corners very well).  Here's a summary of the testing procedures:

Line: least squares fit comparison (more complicated and additional steps, but that's the gist of it).
Polyline: locate corners and do line test for each segmented line.
Ellipse: get center (average of points), major axis, minor axis; NDDE value is high, area of shape is within a threshold
Circle: same as ellipse, except major and minor axis must be close
Etc.

Those are the important ones in the first programming assignment, so those are all I listed. 
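For the curious, here's a stripped-down version of the line test: fit a line through the stroke's endpoints (a simple stand-in for the full least-squares fit Paleo actually does) and accept if the average perpendicular error is small.  The threshold is a guess, not Paleo's actual value.

```python
import math

def line_fit_error(points):
    """Mean perpendicular distance from points to the line through the
    stroke's endpoints (a simplification of a full least-squares fit)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return float("inf")
    # Standard point-to-line distance for each sampled point.
    err = sum(abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / length
              for x, y in points)
    return err / len(points)

def is_line(points, threshold=2.0):
    return line_fit_error(points) <= threshold

straight = [(0, 0), (10, 1), (20, 0), (30, 1), (40, 0)]
arc = [(0, 0), (10, 15), (20, 20), (30, 15), (40, 0)]
print(is_line(straight), is_line(arc))
```

The polyline test then reduces to corner-finding plus running this same check on each segment, which matches the summary above.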


Discussion:
This paper had everything (and a few other things) I discovered using Paleo within the first 3 hours or so.  As for the results, all I care about at the moment is whether they are good enough for the homework.  Since it seems like it, I'm satisfied.  Now, if only there was more documentation on Paleo...