Thursday, December 2, 2010

Reading #30: Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams (Hammond)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=1751249688937523606&isPopup=true

Summary:
UML (Unified Modeling Language) is used to create diagrams of software designs.  There is already software for creating UML diagrams (it's a little "bulky" to use), but a combination of a sketch recognition system and a Powerpoint-esque system would be a welcome addition.  This is exactly what Tahuti provides: users can draw the necessary boxes and lines and type in the necessary characters.

The users were required to perform 4 tasks and rank the difficulty of accomplishing them.  The users performed the tasks in both Rational Rose, a UML diagram creation program, and Tahuti.  At the end of the study, the users were interviewed.  Users expressed higher satisfaction with Tahuti than with Rational Rose or with a paint program.  Some users complained that Rational Rose was non-intuitive and that it was difficult to perform the desired actions.

Discussion:
The author had a very good thing working in his favor.  With the exception of letters, nearly every shape in a UML diagram is composed of straight lines.  This makes pre-processing and identification of the sketch a much simpler matter than it would be otherwise.  The only possible complaint I can see here is wondering whether the user tasks were geared in Tahuti's favor rather than being a general set of tasks.

Reading #29: Scratch Input: Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces (Harrison)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=1742929146352552782&isPopup=true

Summary:
This paper attempts to recognize sketches by the sound created as the user draws.  A stethoscope/microphone combination is placed on the drawing surface and the user creates the sketch.  The amplitude of the sound wave is mapped out and analyzed to determine the shape drawn.  For example, a rectangle typically has 4 amplitude peaks and a triangle typically has 3.

The author professed a high recognition rate of 90%, but he used some very simple shapes, and the number of shapes used was very small.  From what I read, the author also assumed shapes were drawn in the same manner every time (a very incorrect assumption when sketching letters).
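The peak-counting idea is easy to sketch in code.  This is a toy illustration of the concept only; the threshold, the envelope values, and the count-to-shape table are all my own inventions, not Harrison's actual signal processing:

```python
# Toy illustration: count local maxima in a smoothed amplitude envelope
# and map the peak count to a shape.  Threshold and shape table are
# made-up values, not from the paper.

def count_peaks(envelope, threshold=0.5):
    """Count local maxima above `threshold` in an amplitude envelope."""
    peaks = 0
    for i in range(1, len(envelope) - 1):
        if envelope[i] > threshold and envelope[i - 1] < envelope[i] >= envelope[i + 1]:
            peaks += 1
    return peaks

def guess_shape(envelope):
    return {3: "triangle", 4: "rectangle"}.get(count_peaks(envelope), "unknown")

# A crude envelope with four bursts, roughly what a rectangle might produce:
rect = [0.1, 0.9, 0.1, 0.8, 0.1, 0.9, 0.1, 0.85, 0.1]
```

With `rect` above, `count_peaks` finds 4 peaks and `guess_shape` returns "rectangle"; a real system would first smooth the raw waveform into such an envelope.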

Discussion:
This paper introduced the idea of sound-based sketch recognition to me.  However, sound-based recognition should be used to create a portable sketch recognition system.  I want to see a "Magic Pen" that the user can use to sketch anywhere: on the bus, on the restroom wall, on the table or counter at Taco Bell.  The sound and positioning data are collected to recognize the sketch on a separate screen.

By itself, sound recognition of sketches is not very effective.  There are simply too many variations for a single shape and too few features to identify the shapes.  In addition, many of the variations of shapes overlap with each other, making distinction between shapes very difficult.  I am not the only one to say this.  There is at least one other paper on this topic that expresses similar sentiments.

Reading #28: iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-28-icandraw-using-sketch.html

Summary:
This paper presents the idea of helping a user draw a sketch correctly.  It also presents a general method for guiding a user's sketch and nine ideas for assistive sketch recognition in general.  Most users are not skilled at drawing on a computer, or at drawing at all.  iCanDraw helps such users create accurate sketches, and users' sketches were indeed more accurate with the assistance of the iCanDraw system.

Before the user starts sketching, the iCanDraw system analyzes the face image and extracts relevant data from it to use for the guidance interface.  The user can verify the accuracy of his or her sketch with the "Check my work" option.  It checks the accuracy of the user's lines against the "correct", computer-generated version.

Discussion:
If I remember correctly, Prof. Hammond presented this paper's contents in class.  I like the idea of getting the face correct, but it doesn't really allow for artistic creativity.  The system pretty much tells you what to draw, so why doesn't it just draw the face for you and save you the trouble?  One more thing: does this system work for someone who ISN'T bald?

Reading #27: K-Sketch: A 'Kinetic' Sketch Pad for Novice Animators (Davis)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=3549141404676567109&isPopup=true

Summary:
Here's a fun idea for anyone who's tried using Maya or any other modeling software for animation.  This paper introduces K-Sketch to make the creation of animated models very simple and intuitive.  The author conducted several interviews to ensure the design of the system was acceptable.

The author determined the uses the K-Sketch system would be employed for and strove to ensure those purposes could be accomplished.  For example, professional animators may use K-Sketch to do a presentation and amateurs may use K-Sketch to doodle or create an animation for entertainment purposes. 

User testing was conducted by comparing K-Sketch against Powerpoint's animation capabilities.  Users typically needed less help to accomplish tasks in K-Sketch, and they accomplished those tasks in less time than with Powerpoint.  User satisfaction was generally higher with K-Sketch than with Powerpoint.

Discussion:
Here's what I want to see: an evolved form of K-Sketch that allows the user to save the models in file formats accepted by game programming systems (like XNA).  Even better, I would like to scan in a few sketches of a person and use those as the basis for an animated model.  This would close the gap between hand-drawn pictures and animated models.  I also want to do this in 3D.

It's a good thing the author did the interviews prior to development.  It's a great way to make sure you do the job right, and sadly, not a lot of papers that present systems do that.

Reading #26: Picturephone: A Game for Sketch Data Capture (Johnson)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=2268287811024365988&isPopup=true

Summary:
Picturephone was introduced in reading #24, so if any background information on Picturephone is necessary, please refer to that paper.  The game works in 3 basic steps:

1) party 1 describes picture in text
2) party 2 creates sketch based on text
3) party 3 judges similarity between picture and text description

The paper did not contain a results section, so this leads to doubt about Picturephone's usability testing.  The author does state users will only play Picturephone if it is engaging.  The author mentioned tools and features used to attract and hold the user's attention, but was careful to state such tools and features should not destroy the original purpose of Picturephone.

Discussion:
I was a little surprised to learn this mini-game had its own paper.  I had assumed the author included it in reading #24 and dropped it afterward.  The paper does bring up the interesting point that people interpret the same description differently.  I once did a similar exercise in an English class in middle school.  We all drew a picture based on a textual description and we found both our descriptions and sketches lacking.  Give 2 users the same description and you will get 2 different sketches, guaranteed.  Coping with this human idiosyncrasy will become a very pertinent topic in future sketch recognition research.  I also noticed chunks of text in this paper were identical to chunks of text in reading #24.

Reading #25: A Descriptor for Large Scale Image Retrieval Based on Sketched Feature Lines (Eitz)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=3221243972096915542&isPopup=true

Summary:
This is another attempt to generalize sketch recognition.  The previous paper focused on incorporating variances between pictures of the same description, while this paper focuses on scale: searching for and retrieving images from a database of over a million images.  The uniqueness of the system stems from the fact that it is an image-based search system.  The user sketches an image, and that sketch is used as the query against the database.

An edge histogram and tensor descriptor are used to extract the necessary data for the search query.  Explaining the definitions and utilization of an edge histogram and tensor descriptor would be too lengthy for this summary, so it is left to the reader to investigate further.
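For a rough flavor of what an edge-based descriptor computes, here is a minimal edge-orientation histogram.  This is my own toy stand-in, assuming edges have already been extracted as point-pair segments; it is not the paper's actual edge histogram or tensor descriptor:

```python
import math

def edge_orientation_histogram(segments, bins=4):
    """Histogram of segment orientations folded into [0, pi).
    `segments` is a list of ((x1, y1), (x2, y2)) pairs.  A toy
    stand-in for Eitz's descriptors, not the paper's method."""
    hist = [0] * bins
    for (x1, y1), (x2, y2) in segments:
        angle = math.atan2(y2 - y1, x2 - x1) % math.pi  # direction-insensitive
        hist[min(int(angle / math.pi * bins), bins - 1)] += 1
    return hist

# Two horizontal segments and one vertical segment:
segs = [((0, 0), (1, 0)), ((0, 1), (1, 1)), ((0, 0), (0, 1))]
```

Comparing the histogram of a query sketch against the histograms of stored images (e.g. by Euclidean distance) is the basic retrieval idea; the paper's real descriptors are considerably richer.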

Discussion:
The author achieved promising results.  What I would like to see is the computer playing Pictionary with a good deal of accuracy: the user draws a sketch, and the computer's queries get more specific and return a smaller list of results.  All in all, I personally enjoy the idea of an image-based search system.

Reading #24: Games for Sketch Data Collection (Johnson)

Comment Location:
http://ayden-kim.blogspot.com/2010/12/reading-24-games-for-sketch-data.html

Summary:
The author attempts to incorporate the fact that there are multiple ways to draw the same "picture".  A person can draw the sun or the moon in a variety of different ways, and people will draw differently based on the same text description.  The author introduced 2 games to collect sketches based on certain information (such as a text description) to enable future researchers to obtain sketch data.  The 2 games are Picturephone and Stellasketch.  Picturephone gives the sketchers a description and allows them to draw it.  The sketch's similarity to the original text description is rated by a 3rd party of humans.  Stellasketch is a computer version of Pictionary.

Discussion:
The author brings up some good points.  There are many ways of drawing many shapes; a stick figure can be drawn in 720 different ways, and that's a relatively simple sketch.  The author tried to make these games "engaging", meaning "fun".  I don't see that happening.  What would you rather do: play a mentally-stimulating game of Stellasketch or pick up Halo and kill people online with explosives?  The answer is a no-brainer, literally.

Reading #23: InkSeine: In Situ Search for Active Note Taking (Hinckley)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=483186139061182635&isPopup=true

Summary:
This paper presents a method of improving the note-taking process.  That can be very useful for classes--if done correctly.  One thing to keep in mind: this is a design paper, not an implementation paper.  The author presents the design and the lo-fi prototype studies, but has not created the actual system.

InkSeine will fundamentally rely on text recognition.  Since text recognition has not been perfected, I do not foresee InkSeine reaching its full potential until it has been.

Discussion:
Doing this one correctly is a problem, because the optimal set of needs for note-taking differs from user to user.  There are also users who prefer pen-and-paper for note-taking (I'm one of them).  Now that I think of it, Professor Kerne has done something that is slightly similar: combinFormation.  From what I gather, this particular interface is not "fast" enough to support real-time note taking in class, especially at the rate some professors talk and write.  InkSeine also seems to be somewhat dependent on having the relevant materials on hand for searching (instead of searching on the Internet, for example).  InkSeine is one of the many applications that would benefit from perfect text recognition.

Reading #22: Plushie: An Interactive Design System for Plush Toys (Mori)

Comment Location:
http://ayden-kim.blogspot.com/2010/12/reading-22-plushie.html

Summary:
This is another system for creating a 3D image from 2D sketches.  Unlike Reading #21 where the 2D sketch was "inflated" into a 3D shape, Plushie sews together a bunch of 2D images into a final 3D form.  The interface has 2 windows: one for 3D editing and the other for 2D editing.  The user can edit the images in either of these windows to modify the current plushie representation shown in the 3D window.

A triangle mesh was used for the 3D modeling.  Children tested and used the system and were able to design and (with this part perhaps done for them) sew a plushie toy.  The designing phase took the children considerably longer than it took the author.

Discussion:
Here is yet another paper concerning the design of stuffed animals.  The interface of Plushie must indeed be simple and intuitive if children were able to use it successfully.  Personally, I wouldn't mind having a look at that code.  There are definitely some "rippable" snippets in there.

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (Igarashi)

Comment Location:
http://ayden-kim.blogspot.com/2010/12/reading-21-teddy.html

Summary:
Now this is interesting.  The user "draws" 3D objects by sketching in 2D and then letting the Teddy system work its algorithm.  Teddy doesn't recognize individual shapes, such as a square or triangle.  Instead, it takes an enclosed shape and performs a number of operations on it to transform the sketch into a 3D shape.  Operations include bending, painting, and extrusion; there are multiple variations of each operation, depending on the shape.  Only specialists in the author's general research areas tested the Teddy system, but they gave very positive reviews.

Discussion:
I noticed the light source differed between some of the sketches shown in figure 6.  This makes me wonder if the light source is decided by Teddy or if it can be customized by the user.  Here is the future: combine this with Maya, so I can draw something and convert it to a rendered 3D object.  The farther future: scan in sketches of a game character and create a 3D model based on the input.  This would reduce the workload of game developers when creating characters, enemies, and levels.

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches (LaViola)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-20-mathpad-2-system-for.html

Summary:
The author presents an algorithm for recognizing mathematical problems and solving them.  The user sketches an equation or mathematical situation, and the MathPad2 system solves that problem.  MathPad2 is currently limited to simple equations; it cannot solve complex problems involving multiple equations by itself.  MathPad2 is a user-driven sketch recognition system that uses menu options and gesture options to activate functionality.

Discussion:
I did not see a results section or a section devoted to user evaluation; I did notice the odd feedback mini-section here and there.  I cannot help but think this system was created with minimal user feedback.  It would be interesting to see what a few Math majors (Math PhD students in particular and some Math professors) think of MathPad2.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields (Qi)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-19-diagram-structure.html

Summary:
Bayes' theorem is a probability technique for estimating whether a piece of data belongs to a particular class based on training data.  This paper applies it to sketch recognition.  The algorithm also involves Markov properties, which I do not have a background in; due to this, my explanation of the algorithm will be rather scant.  The algorithm only attempts to identify components within a diagram sketch.

The results, like those of all learning classifiers (and most algorithms on the planet), were not perfect.  The algorithm failed to give correct identifications for all sketches.
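The Bayes step by itself is simple to illustrate.  Here is a toy posterior computation; the class names, priors, and likelihood functions are all made up, and it ignores neighboring strokes entirely, unlike the Bayesian conditional random field the paper actually builds:

```python
def bayes_posterior(priors, likelihoods, x):
    """P(class | x) via Bayes' rule from per-class priors and
    likelihood functions.  Zero-order toy: each stroke is classified
    independently, with none of the paper's CRF/Markov structure."""
    joint = {c: priors[c] * likelihoods[c](x) for c in priors}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}

priors = {"container": 0.5, "connector": 0.5}
# Hypothetical likelihoods of a stroke-length feature under each class:
likelihoods = {"container": lambda x: 0.9 if x > 1.0 else 0.1,
               "connector": lambda x: 0.1 if x > 1.0 else 0.9}
post = bayes_posterior(priors, likelihoods, 1.5)
```

With the made-up numbers above, a long stroke (x = 1.5) comes out strongly "container"; the paper's contribution is doing this jointly over all strokes with spatial dependencies.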

Discussion:
The paper used some techniques I do not have a background in, so I cannot offer much in the way of discussion.  I can say the algorithm takes an approach I have not seen before, and the results were not perfect.  Still, it's a nice, math-heavy idea for a field.  It seems most fields try that route at some point.

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-18-spatial-recognition-and.html

Summary:
This algorithm is somewhat similar to Reading #16.  The algorithm builds a proximity graph over the strokes.  The order of the strokes is not used in this algorithm; this potentially reduces error in the event the user drew the shape in an unusual manner.  The author builds upon the work of Viola-Jones, who "constructed a real-time face detection system using a boosted collection of simple and efficient features".
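A minimal version of the proximity-graph step might look like this.  Connecting strokes by bounding-box-center distance is my own simplification; the paper's neighborhood definition may well use richer spatial relations:

```python
import math

def proximity_graph(strokes, max_dist=50.0):
    """Connect pairs of strokes whose bounding-box centers lie within
    `max_dist` of each other.  A simplified stand-in for the
    neighborhood graph the paper builds over ink strokes."""
    def center(stroke):
        xs = [p[0] for p in stroke]
        ys = [p[1] for p in stroke]
        return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)

    centers = [center(s) for s in strokes]
    return [(i, j)
            for i in range(len(strokes))
            for j in range(i + 1, len(strokes))
            if math.dist(centers[i], centers[j]) <= max_dist]

# Two nearby strokes and one far-away stroke:
strokes = [[(0, 0), (10, 10)], [(20, 20), (30, 30)], [(200, 200), (210, 210)]]
```

Note the edges depend only on geometry, never on drawing order, which matches the order-independence the summary describes.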

Discussion:
The author achieved some interesting results.  The algorithm had improved results over some other algorithms when the number of recognizable shapes increased.  I'd say this algorithm is definitely worth looking into and it is beneficial to use parts of it.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-17-distinguishing-text-from.html

Summary:
This algorithm separates text from graphics.  Unlike the entropy algorithm, this algorithm employs a feature set.  The feature set includes characteristics of the strokes and relationships between strokes, such as the distance between strokes.  The time difference between strokes is calculated as well.

If I'm reading the results correctly, the algorithm produced a great many errors.  This is not a surprise, considering some shapes look like characters (a triangle looks like the letter "A").
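A few inter-stroke features of the kind described above could be computed along these lines; the feature names and exact definitions are my guesses, not Bishop's:

```python
def stroke_features(prev, cur):
    """A few inter-stroke features in the spirit of the paper's set.
    Each stroke is a list of (x, y, t) points; the particular features
    and definitions here are illustrative guesses."""
    def length(s):
        return sum(((s[i][0] - s[i - 1][0]) ** 2 +
                    (s[i][1] - s[i - 1][1]) ** 2) ** 0.5
                   for i in range(1, len(s)))
    gap = ((cur[0][0] - prev[-1][0]) ** 2 +
           (cur[0][1] - prev[-1][1]) ** 2) ** 0.5
    return {"gap": gap,                           # distance between strokes
            "time_gap": cur[0][2] - prev[-1][2],  # pen-up time between strokes
            "length": length(cur)}                # arc length of current stroke

a = [(0, 0, 0.0), (3, 4, 0.1)]
b = [(6, 8, 0.5), (9, 12, 0.6)]
feats = stroke_features(a, b)
```

Feature vectors like this would then feed a classifier; text strokes tend to have small gaps and time gaps to their neighbors, which is what makes these features discriminative.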

Discussion:
It's a shame I did not discover this paper before the due date of the second homework assignment.  Otherwise, I could have used some of the features to distinguish between text and non-text strokes.

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-16-efficient-graph-based-symbol.html

Summary:
The algorithm employs a relational graph as the basis for its recognition.  For homework 1, I used relations between some lines as part of my algorithm, but I did not base my entire algorithm on the relations between strokes.  This algorithm does.  Once the relational graph is created, the sketch is matched to the template with the closest relational graph.  There are several ways of doing this, and the paper employs 4 of them:

"Stochastic Matching, which is based on stochastic search; Error-driven Matching, which uses local matching errors to drive the solution to an optimal match; Greedy Matching, which uses greedy search; and Sort Matching, which relies on geometric information to accelerate the matching."

A grand total of 23 different symbols were used in the testing.  The only algorithm to perform with less than 90% accuracy was the sort type; the sort type also had the shortest computation time.  There was very little difference in the accuracy rates of the other 3 algorithms, though their computation times differed widely.  Stochastic matching took the longest to finish.

Discussion:
This algorithm is definitely viable for sketch recognition.  The grand future of sketch recognition no doubt involves an interface that recognizes and fixes up any sketch a user is making.  To encompass "any sketch", a large database is currently required and a significant amount of time to use that database is required.  The question remains, is there a way to get around that?  Is there a method of recognizing sketches that doesn't rely on a large amount of stored memory?  If not, then the cost of using that memory must be made much smaller than it is today and the computational abilities of computers must increase drastically (latter one's always happening).

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches (Kara)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-15-image-based-trainable-symbol.html

Summary:
The author proposes a trainable, hand-drawn symbol recognizer based on a multi-layer recognition scheme.  Binary templates are used to represent the symbols.  The author uses multiple classifiers to rank a symbol and thus increase the overall accuracy of the system.  The 4 classifiers are Hausdorff Distance, Modified Hausdorff Distance, Tanimoto Coefficient, and Yule Coefficient. 

The author discovered limitations among his shape set when he tried to compare sketches that had shapes (like arrows) differing mainly by direction, size, or some other small detail. 
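One of the four classifiers, the Hausdorff distance, is simple enough to sketch.  This version operates on raw point sets; the paper applies it to downsampled binary templates, and the Modified Hausdorff Distance (as I understand it) replaces the directed max with an average to gain robustness:

```python
def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets: the worst
    distance from any point in one set to its nearest neighbor in the
    other.  One of the four classifiers Kara combines."""
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    def directed(X, Y):
        return max(min(d(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

A = [(0, 0), (1, 0)]
B = [(0, 0), (1, 1)]
```

A symbol is then assigned to the template with the smallest distance; combining several such distance measures and ranking their votes is the paper's multi-classifier idea.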

Discussion:
The author realized there is currently no perfect algorithm in sketch recognition.  The idea to employ multiple recognizers is a step forward in progress.  It also increases the coding, but then again, nothing's perfect.  Maybe if the author slapped on a few more classifiers and weighted their input, the overall recognition of the symbol would increase.

Saturday, October 16, 2010

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-14-using-entropy-to-distinguish.html

Summary:
This is another paper that distinguishes between text and shapes--good.  The algorithm relies on one feature: entropy--even better.  The author used zero-order entropy (meaning symbols are independent of each other), which simplifies the algorithm.  It creates an alphabet of angle differences between stroke points and uses that as the basis of entropy.  Text characters have significantly more variance in the angles than shapes.

The evaluation metric appears sound.  A total of 756 strokes were used, and the algorithm attained a high accuracy for the strokes it identified.  The problem is that 25% of the strokes were not classified as text or shape with high confidence.  The algorithm identified the extreme cases, but balked at the ambiguous ones.  The algorithm was more accurate with text strokes than geometric strokes.
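The entropy computation itself is straightforward.  A sketch, assuming strokes have already been quantized into an alphabet of angle-change symbols (the toy alphabet below is mine, not the paper's):

```python
import math

def zero_order_entropy(symbols):
    """Shannon entropy (bits per symbol) of a symbol sequence,
    treating symbols as independent -- the zero-order model the
    paper uses."""
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Quantized angle-change symbols (a toy alphabet, not the paper's):
shape_like = ["0"] * 6                      # straight line: no variation
text_like = ["+", "-", "0", "+", "-", "+"]  # wiggly: high variation
```

The straight-line stroke scores zero entropy while the wiggly one scores well above 1 bit, which is exactly the separation the classifier thresholds on.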

Discussion:
The algorithm uses one feature to classify between shape and text.  I like it.  This is even better than the last paper.  Assuming the accuracy is to be believed, the algorithm is of great use to our programming project.  An additional filter to handle the ambiguous strokes could prove very useful. 

I'm a little unclear how the alphabet angles are grouped; I imagine it's the angle between 2 different stroke points, but I'm not sure.  If anyone knows for sure, please share. 

The paper said it used the data from the COA domain.  Isn't that the data WE are using in our 2nd programming assignment?  If so, then we should definitely use an algorithm that already works for this data set.  If anybody has an existing implementation of this algorithm, please post it to the Google groups for the class.

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-13-ink-features-for-diagram.html

Summary:
The author attempted to distinguish between text and shapes via a linear split of the data.  First, the data is plotted onto a graph according to its "bounding box width"; then the best-fit vertical line is placed and the strokes are classified depending on which side of the line they fall.  The initial results showed a large number of misclassifications, particularly in the shape department.  The text had a much lower misclassification rate.

I did not find a step-by-step algorithm for how the bounding box was calculated, though Figure 3 revealed some information about it.  It seems the bounding box is a summary of the features of a given stroke.  Inter-stroke gaps (distances between strokes) were the biggest key feature of the feature set; they are smaller from text-to-text than from shape-to-shape.

Here are the significant features:

Interstroke gaps: Time till next stroke, Speed till next stroke, Distance from last stroke, Distance to next stroke
Size: Bounding box width, Perimeter to area, Amount of ink inside
Curvature: Total angle
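The linear split on bounding-box width can be sketched as a simple threshold search.  The midpoint-candidate strategy and the text-below / shape-above convention are my assumptions, not necessarily how the paper fits its line:

```python
def best_split(widths, labels):
    """Search for the single bounding-box-width threshold that best
    separates text strokes from shape strokes -- a toy version of the
    paper's linear split.  Candidate cuts are midpoints between
    consecutive sorted widths."""
    pairs = sorted(zip(widths, labels))
    best_t, best_errors = None, len(pairs) + 1
    for i in range(len(pairs) - 1):
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        # classify: width <= t -> "text", else "shape"; count mistakes
        errors = sum((w <= t) != (label == "text") for w, label in pairs)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

widths = [5, 8, 10, 40, 60, 80]
labels = ["text", "text", "text", "shape", "shape", "shape"]
t, errs = best_split(widths, labels)
```

On this cleanly separable toy data the split is perfect; the paper's real data overlaps, which is exactly why its single-feature split misclassified so many shapes.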

Discussion:
This algorithm has immediate potential for the 2nd programming project.  I plan on doing one of the sets involving characters, so this algorithm warrants close examination.  I am disappointed in the results, but there is potential.  The author managed to find text remarkably well.  If an additional filter is used to eliminate the misclassified shapes, then the results will improve significantly.

Monday, October 11, 2010

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/10/reading-12-constellation-models.html

Summary:
The paper takes a new twist on recognition by using constellation models as the basis for shape recognition.  Remarkably, it demonstrates some success when recognizing crude facial sketches.  It relies on the connectivity between the shapes in the facial sketch.  The recognition algorithm may be called a constellation model, but it's really a graph-type recognition algorithm.  Strokes and individual shapes form nodes, and the connections between the nodes are used as part of the recognition algorithm.  I won't go into the specifics of the algorithm.

For evaluation, the author used 5 classes of objects with 7–15 labels each.  They have been tested on drawings having 3–200 strokes. The author used 20-60 training examples for each class.  The algorithm's major weakness was its computation time.  It took over an hour to recognize a face. 

Discussion:
This paper takes an...interesting...approach.  Truthfully, I would never have considered using constellation models as part of ANY recognition system; the thought never occurred to me.  I wonder if this will be of any help in the 2nd homework assignment...If anyone has any ideas regarding this, please share.  The author demonstrated (some) success in recognizing complex shapes.

I must admit, the numbers for evaluation (3-200 strokes in a drawing) were impressive.  The computation time is a problem, but the algorithm recognized complex shapes such as sailboats and faces.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/10/reading-11-ladder.html

Summary:
LADDER is a subset of Paleo, if memory serves.  The paper says, "LADDER allows interface designers to describe how shapes in a domain are drawn, displayed, and edited".  Translation: LADDER gathers some basic information about the stroke, which the programmer then plays with (like Paleo).  LADDER deals with primitive shapes only, so it can't recognize a drawing of a cat; the programmer would need to use the line information collected by LADDER to determine that the sketch is a cat.

LADDER defines a shape "in terms of previously defined shapes and constraints between them".  The pre-defined primitive shapes are everything we saw in the first homework except the multiple-line types (like Polyline).  A sizable portion of the paper is devoted to explaining the specifics of the shape recognition, such as the definition of variables, variable values, the procedure of recognition, etc.

Discussion:
Since we've already used LADDER (and Paleo) for the 1st homework assignment and (I think) it went well, I've got no complaints.  It organizes strokes with enough diverse information that the programmer can do further recognition on the shape.  LADDER claims to have some more complex shapes such as rectangle and diamond.  If there was a triangle shape, I definitely would have liked to see it for the first homework assignment.

Wednesday, September 29, 2010

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-10-graphical-input-through.html

Summary:
This is an old paper, written in 1976.  The HUNCH system was an early sketch recognizer that worked for some users and not for others; the programmer had styled his programming for a particular style of behavior, and some users matched it while others did not.  "Latching", the practice of joining together endpoints if certain conditions are met, failed in a number of cases and did not work perfectly in the end.  The author demonstrated a "room finder" which calculated the locations of rooms in floor plans by finding whitespace surrounded by lines; it worked well for simple plans, but it was less effective for oddly-shaped floor plans.

The author implemented "speed" and "bentness" as means of measuring and interpreting the data; I suspect "bentness" is a precursor to the term "curvature".  The author observed that slower speeds and high bentness usually meant a corner.  In the end, the author was able to do some sketch recognition clean-up that is rather commonplace today.
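The speed-plus-bentness corner heuristic might look like this in modern terms.  The thresholds and exact formulas here are made-up illustration values, not Herot's:

```python
import math

def find_corners(points, speed_thresh=0.5, angle_thresh=0.8):
    """Flag a point as a corner when the pen is moving slowly AND the
    direction change ("bentness") is large.  Points are (x, y, t)
    triples; both thresholds are illustrative, not from the paper."""
    corners = []
    for i in range(1, len(points) - 1):
        (x0, y0, t0), (x1, y1, t1), (x2, y2, t2) = points[i - 1:i + 2]
        speed = math.dist((x0, y0), (x2, y2)) / (t2 - t0)
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        # smallest absolute turn angle between the two headings
        bend = abs(math.atan2(math.sin(a_out - a_in), math.cos(a_out - a_in)))
        if speed < speed_thresh and bend > angle_thresh:
            corners.append(i)
    return corners

# An "L" stroke drawn slowly around the bend:
L = [(0, 0, 0.0), (1, 0, 0.5), (2, 0, 1.0), (2, 1, 4.0), (2, 2, 4.5)]
```

Requiring both signals to fire at once is what makes the heuristic robust: fast-but-curvy arcs and slow-but-straight pauses are both rejected.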

Discussion:
This paper was another get-back-to-the-basics that demonstrated how the author implemented techniques already in existence today.  I don't remember the "latching" from any recent papers, though.  I suspect the term changed or there are not a lot of people who use it.

On a side note, where are the results & evaluation sections?

Tuesday, September 28, 2010

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-9-paleosketch.html

Summary:
This is the Paleo program being used for the 1st programming assignment.  It takes a sketch and classifies each stroke within the sketch into primitives.  The set of primitives used can be changed in the code.  Paleo does some nice things in the pre-processing: it removes duplicate points (produced by systems with a high sampling rate) and computes the normalized distance between direction extremes (NDDE), which finds circles rather well, and the direction change ratio (DCR), which finds corners very well.  Here's a summary of the testing procedures:

Line: least squares fit comparison (more complicated and additional steps, but that's the gist of it).
Polyline: locate corners and do line test for each segmented line.
Ellipse: get center (average of points), major axis, minor axis; NDDE value is high, area of shape is within a threshold
Circle: same as ellipse, except the major and minor axes must be close
Etc.

Those are the important ones in the first programming assignment, so those are all I listed. 
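NDDE in particular is easy to compute: take the arc length between the points of maximum and minimum travel direction and divide by total arc length.  A sketch of that idea (the angle-unwrapping detail is my own handling; Paleo's exact implementation may differ):

```python
import math

def ndde(points):
    """Normalized Distance between Direction Extremes: arc length
    between the points of maximum and minimum travel direction,
    divided by total arc length.  Near 1.0 for circular strokes,
    low for straight lines and polylines."""
    dirs = [math.atan2(points[i + 1][1] - points[i][1],
                       points[i + 1][0] - points[i][0])
            for i in range(len(points) - 1)]
    for i in range(1, len(dirs)):  # unwrap across the +/-pi boundary
        while dirs[i] < dirs[i - 1] - math.pi:
            dirs[i] += 2 * math.pi
        while dirs[i] > dirs[i - 1] + math.pi:
            dirs[i] -= 2 * math.pi
    cum = [0.0]
    for i in range(len(points) - 1):
        cum.append(cum[-1] + math.dist(points[i], points[i + 1]))
    hi, lo = dirs.index(max(dirs)), dirs.index(min(dirs))
    return abs(cum[hi] - cum[lo]) / cum[-1]

circle = [(math.cos(2 * math.pi * i / 36), math.sin(2 * math.pi * i / 36))
          for i in range(36)]
line = [(float(i), 0.0) for i in range(10)]
```

On the sampled circle the direction extremes sit at opposite ends of the stroke, so NDDE approaches 1.0; on the straight line every segment has the same direction and NDDE is 0.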


Discussion:
This paper had everything (and a few other things) I discovered using Paleo within the first 3 hours or so.  As for the results, all I care about at the moment is if they are good enough for the homework.  Since it seems like it, I'm satisfied about that.  Now, if only there was more documentation on Paleo...

Tuesday, September 14, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-8-lightweight-multistroke.html

Summary:
The previous paper (and most of the previous papers, I believe) focused on recognizing the shape of a single stroke.  This paper seeks to do sketch recognition for objects made of multiple strokes.  Named $N, the algorithm is a little more than twice the size of $1 in lines of code and claims to be accurate; it's also a template matcher.  $N claims to improve upon all of $1's weaknesses and achieved an impressive 96.7% recognition rate for a set of 15 templates.

$N seeks to improve upon $1 by "(1) recognizing gestures comprising multiple strokes, (2) automatically generalizing from one multistroke template to all possible multistrokes with alternative stroke orderings and directions, (3) recognizing 1D gestures such as lines, and (4) providing bounded rotation invariance".

The paper can be reviewed for details on exactly how those 4 goals are accomplished.  $N still suffers from the shortcomings of any template-recognizer (not good with new shapes).  Still, it was able to recognize letters and some mathematical symbols.
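Goal (2), the generalization step, amounts to enumerating every stroke order and direction and stitching each combination into one unistroke.  A rough sketch of that enumeration, assuming each stroke is a list of (x, y) points (note the count grows as n! * 2^n, so this is only practical for gestures with few strokes):

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """All ways to play back a multistroke gesture as a single stroke:
    every stroke ordering, with each stroke drawn forward or reversed.
    This mirrors $N's template-generalization idea; it is my own sketch,
    not the paper's pseudocode."""
    results = []
    for order in permutations(strokes):
        for flips in product((False, True), repeat=len(order)):
            uni = []
            for stroke, rev in zip(order, flips):
                # reversed stroke = same shape drawn in the other direction
                uni.extend(reversed(stroke) if rev else stroke)
            results.append(uni)
    return results
```

Each resulting unistroke can then be matched with a $1-style comparison, which is how $N stays a template matcher underneath.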

Discussion:
I've noticed this in other papers, but the authors tend to use small data sets, and the sketches are rather similar to their intended images (with some flaws, of course).  It's fine that the sketchers could draw well and that a small data set was used to prove the concept, but that does not demonstrate an algorithm's robustness or its ability to recognize (or clean up) ANY image the user draws (like words, or an airplane, cat, or car, for instance).  Overall, the results have been somewhat biased; that seems true for a LOT of papers, but it's still rather discouraging to see it so frequently in a single discipline.

That said, I LOVE how the author included pseudocode for his algorithm at the end of the paper; that's very rare in any paper of any discipline of computer science.

On a side note, is it possible to include pressure exerted in the data of each point?
Point p = (x,y,time,pressure)

While implementing this is primarily a hardware (and some back-end software) issue, I believe the user would exert more pressure on more significant lines and curves; that could aid in deciphering the user's intention behind his sketch.
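For illustration, the extended point could look like the following; the `pressure` field and its [0, 1] range are my assumptions about what a tablet would report, not anything from the papers:

```python
from dataclasses import dataclass

@dataclass
class Point:
    """A sampled stylus point.  `pressure` is the hypothetical extra
    channel suggested above (many tablets report it normalized to [0, 1])."""
    x: float
    y: float
    time: float
    pressure: float

def mean_pressure(points):
    """Average pressure over a stroke -- a candidate 'significance' feature."""
    return sum(p.pressure for p in points) / len(points)
```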

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-7-sketch-based-inferfaces.html

Summary:
This paper was written back in 2001, before the $1.  The $1 algorithm became a key basis algorithm, like Otsu and Niblack are to document image binarization; that impressed upon me how young the field of sketch recognition truly is.  That basic algorithm was created only a few years ago; it could easily take another 10-20 years before we see some really impressive stuff.  This paper focuses on providing a basis for other algorithms by recognizing basic geometric shapes within a single stroke.  Vertex detection is the core of the algorithm: it locates candidate vertices, filters out the noise, and applies speed data to locate the actual vertices of the stroke. 

The algorithm uses Bézier curves to detect curved strokes.  The Bézier curve procedure seems sound, but not perfect.  The combination of vertex detection and Bézier curve detection constitutes the bulk of the algorithm.  This algorithm is definitely dealing with the basics: the test data consisted of only 10 geometric shapes, and the algorithm classified correctly 96% of the time, but only for those 10 shapes.  The author mentioned an extension of this algorithm but elaborated very little.
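To make the speed idea concrete, here is a rough sketch of how speed minima could be flagged as candidate vertices.  This is my own simplification of one half of the approach (the curvature half is omitted), and the 0.9 threshold and the (x, y, t) tuple format are assumptions, not values from the paper:

```python
import math

def speed_minima(points, threshold_ratio=0.9):
    """Candidate vertex indices where pen speed drops below a fraction of
    the mean speed.  `points` are (x, y, t) tuples."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = (t1 - t0) or 1e-9          # guard against duplicate time stamps
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    mean = sum(speeds) / len(speeds)
    # a point is a candidate if both adjacent segments are slow
    return [i for i in range(1, len(speeds))
            if speeds[i - 1] < threshold_ratio * mean
            and speeds[i] < threshold_ratio * mean]
```

The paper's assumption is exactly what this encodes: users slow down at corners, so low-speed points are likely vertices.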

Discussion:
The vertex filter leaves room for improvement.  Simply using the mean of the data is quick and simple, but not very robust; a better method could involve smoothing the data and then choosing the highest peaks as the vertices.  Of course, that method assumes all vertices possess sharp corners; I do not know of an all-purpose answer (I don't think one even exists--yet).  The author combines in speed data on the assumption that a user will always slow down when drawing corners, and he reported good results (though I cannot personally verify them).
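The smoothing-then-peaks idea could look something like this.  It's a hypothetical sketch operating on a per-point curvature signal; the window size and peak rule are arbitrary choices of mine, not anything from the paper:

```python
def smooth(values, window=3):
    """Simple moving-average smoothing of a 1D signal."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def top_peaks(values, k):
    """Indices of the k largest local maxima in a (smoothed) curvature
    signal -- the proposed alternative to thresholding on the mean."""
    peaks = [i for i in range(1, len(values) - 1)
             if values[i] > values[i - 1] and values[i] >= values[i + 1]]
    return sorted(sorted(peaks, key=lambda i: values[i], reverse=True)[:k])
```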

Overall, the algorithm seems a little simplistic (compared to some other algorithms I've seen) and the test data was criminally small.  It's good for proof-of-concept (which is important), but I would like to see something more substantial from this person (or someone who improves upon his research) in the future.

Thursday, September 9, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/09/reading-6-protractor.html

Summary:
This is the third recent paper we have covered so far; this one was written by a researcher at Google.  Protractor, the template-recognizing program in question, claims to be small in size and fast in speed.  The author argues for the superiority of templates over parameter-based programs: templates can be customized for the user with excellent results, whereas no such procedure exists for parameter-based programs.  Protractor uses a nearest neighbor approach. 
Protractor (assuming the user allows it) aligns the image and reduces noise to allow for faster matching (it does not rescale, unlike the $1 recognizer).  I had difficulty understanding the meat of the classification procedure and would appreciate it if someone could explain it to me in a simple manner.  Protractor did not demonstrate significantly better accuracy than $1, but it performed much more quickly.
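As best I can tell, the core of the classification is a nearest-neighbor search using the angle between gesture vectors.  Here's a stripped-down sketch of that idea, omitting Protractor's closed-form optimal-rotation step and assuming gestures have already been resampled to the same point count and translated to their centroid:

```python
import math

def cosine_score(v1, v2):
    """Angle (radians) between two gestures flattened into vectors of
    interleaved x, y values; 0 means identical shapes.  This is the
    similarity Protractor builds on, minus its rotation refinement."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def classify(gesture, templates):
    """Nearest neighbor over (label, vector) template pairs."""
    return min(templates, key=lambda t: cosine_score(gesture, t[1]))[0]
```

Because the score depends only on the angle between vectors, a uniformly scaled copy of a template gets a perfect score, which is part of why Protractor can skip $1's rescaling step.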


Discussion:
The author's arguments favoring templates failed to mention their flaws.  Protractor certainly brings some unique ideas to the table, but it is worrisome how the author stated on page 1 that Protractor would work if the templates were customized for the user.  This indicates the program is not very robust.  The data set is worrisome as well.  The author employed a large data set that concentrated on the $1's strengths (where it outperformed Rubine); Protractor did not demonstrate it could compensate for $1's shortcomings.

To summarize, Protractor did what the author wanted: it's a faster version of $1.  However, Protractor is not a significant improvement over $1 in terms of success rate--just a slight one.

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/09/reading-5-1.html

Summary:
This is the second recent paper we have covered so far (the first was Hammond, Reading #1); it presents the $1 recognizer.  The code is simple and short, and it can perform more effectively than Rubine at the cost of increased computation time.  The algorithm realigns the image and rescales it.  As a result, the $1 recognizer cannot distinguish between horizontal and vertical lines. 


Discussion:
I really like the "100 lines of code" part about the algorithm; that is a small algorithm.  The author stated the small size meant the algorithm was imperfect and that it was slower than Rubine.  This is an excellent algorithm, albeit a limited one.  If a sketch recognition algorithm cannot distinguish between a square and a rectangle, it needs to be improved.
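The square-vs-rectangle complaint follows directly from $1's non-uniform scaling step.  A tiny sketch (my own illustration, not $1's actual code) shows two different shapes becoming identical once their bounding boxes are stretched to a unit square:

```python
import math

def normalize_scale(points):
    """$1-style non-uniform scaling: stretch the bounding box to a unit
    square.  The aspect ratio is thrown away, which is exactly why $1
    cannot tell a square from a rectangle or a horizontal line from a
    vertical one."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w = (max(xs) - min(xs)) or 1e-9   # guard degenerate strokes
    h = (max(ys) - min(ys)) or 1e-9
    return [((x - min(xs)) / w, (y - min(ys)) / h) for x, y in points]

def path_distance(a, b):
    """Average point-to-point distance between two equal-length strokes,
    the score $1 computes after its resample/rotate/scale/translate steps."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

A 2x2 square and a 4x1 rectangle normalize to the same four points, so their path distance is zero and the recognizer sees them as the same shape.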

Wednesday, September 8, 2010

Reading #4: Sketchpad: A Man-Machine Graphical Communication System (Sutherland)

Comment Location:
http://christalks624.blogspot.com/2010/09/reading-4-master-sutherland-vs-machine.html

Summary:
The author is Ivan Sutherland, the MIT professor who virtually started the field of sketch recognition; the paper was written in 1963.  Considering the date, I believe this was one of the first (if not the first) sketching program created.  While reading this, I couldn't help but think this program was similar to Microsoft Paint that everyone has today; I did notice a few features that were not in Paint, though.  The author explains his sketching program in this paper.  The user creates a shape and then manipulates that shape with a number of flexible commands. 

The drawing tool of choice is the "light pen".  The user must inform the computer of the starting location of the pen to begin drawing; I would summarize the light pen as an earlier version of the mouse.  The idea of a pseudo-pen was to allow for easier tracking and computations on the programmer's end. 


Discussion:
This paper was a starting point for drawing programs; I believe it is a predecessor to Microsoft Paint.  I wonder if Sutherland created any other revolutionary programs; considering his listed patents on Wikipedia, that seems likely.  The program could be improved through advances in technology, and the resulting drawing programs of today are proof.

Wednesday, September 1, 2010

“Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

Comment Location:
http://christalks624.blogspot.com/2010/09/youre-doing-it-wrong.html

Summary:
The author created Quill as his primary contribution.  Quill is very similar to Rubine's GRANDMA save that Quill does not include the time-based features of Rubine and adds some new elements.  Quill is a GUI that has the user/designer train the program to recognize a particular gesture type through repetition; the program provides "active feedback" to prevent a first-time user from getting lost.  The paper devoted more time to explaining the features of the program rather than the programming that created it (such as the feature set and particulars about the linear classifier).

A large section of the paper was devoted towards the difficulties involved in optimizing the active feedback.  The author mentioned the key problems were the timing of advice, amount of advice, and advice content.  The problems centered around not offending the user and still giving advice the user deemed helpful.  The author also wrote on background analysis and explained the reasoning behind his choices.  Lastly, the author mentioned his prediction system was not perfect and sometimes incorrect advice was given.



Discussion:
Quill is an improvement upon Rubine's GRANDMA in terms of considering the user.  The author tried to consider multiple situations when adjusting the active feedback so the user would actually use the advice instead of ignoring the advice.  The author's choice of the background analysis method was to prevent user confusion.  I have found it somewhat rare for a program in a scientific paper to consider the user so heavily.

I found the background analysis section to be extraneous.  There should be a paper explaining how to optimize background analysis for a given situation.  I believe that space in the paper could have been better spent on something else; it was certainly worth mentioning, but I feel the author should have either elaborated on the subject or been more brief.

Specifying Gestures by Example by Dean Rubine

Comment Location:
http://martysimpossibletorememberurl.blogspot.com/2010/09/reading-2-rubine.html#comments
 
Summary:
This is the paper that virtually started sketch recognition.  The author bundled up his research into a sketch recognition program called GRANDMA.  After preprocessing, the gesture is used to derive 13 features.  The author mentioned the feature set could not distinguish between all types of gestures and that the feature set should be expanded in future implementations.  The technique employs machine learning and a linear classifier.

The author allowed for a classification to be unknown.  The thresholding measure utilized would sometimes reject known gestures; obviously, future improvements are needed in that area (it's probably already been done).
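A minimal sketch of a linear classifier with a rejection option in the spirit of Rubine's approach.  The probability-style rejection rule and the 0.95 cutoff are my simplifications, and the real weights would come from training on the 13 features:

```python
import math

def classify_with_reject(features, classes, min_prob=0.95):
    """Linear classifier with an 'unknown' outcome.  `classes` maps a
    class name to (weights, bias); each class score is bias + w . features.
    The top class's probability is estimated as
    1 / sum_j exp(score_j - best_score), and the gesture is rejected
    (returned as None) when that probability is too low."""
    scores = {name: b + sum(w * f for w, f in zip(ws, features))
              for name, (ws, b) in classes.items()}
    best = max(scores, key=scores.get)
    prob = 1.0 / sum(math.exp(s - scores[best]) for s in scores.values())
    return best if prob >= min_prob else None   # None = rejected as unknown
```

The downside the paper admits shows up here too: a genuine but sloppy gesture can land near a decision boundary, get a low probability, and be rejected even though it belongs to a known class.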



Discussion:
Since there are many more sophisticated sketch recognition programs in existence today, I did not consider the GRANDMA program to be of great significance outside reference material for building one's own sketch recognition program.  The preprocessing seemed rather arbitrary (only consider every 3rd pixel or so); I'm sure there are more effective preprocessing measures in existence today.  Concerning the unknown option for classification, I'm certain the author did that to allow for future improvements.

Gesture Recognition (Hammond)

Comment Locations:
http://martysimpossibletorememberurl.blogspot.com/2010/09/hammond-gesture-recognition-survey.html#comments

Summary:
This paper covered many basics and the beginning points for sketch recognition.  The author elaborately explained the purposes of each of the features in Rubine's and Long's feature set.  The author included discussion questions for the Rubine and Long sections (I didn't notice any questions for the Wobbrock section).  Rubine and Long used different types of feature-based linear classifiers while Wobbrock implemented a template matcher; I interpreted the template matcher to mean the program finds the closest comparison of the gesture to a pre-generated set of template gestures.

Rubine and Long were more detail oriented in their classifiers while Wobbrock generalized the data set before classification.



Discussion:
I would have preferred if the author had simply given the answers to the discussion questions in the paper.  Time is valuable to me and I would prefer to have quick and complete understanding of the concepts and move on.  I did appreciate the in-depth coverage of the techniques and comparisons between Rubine, Long, and Wobbrock.  Firmly establishing the basics is absolutely necessary when learning a new discipline.

Intuitively one would think the most effective program of the three authors mentioned would depend on the data set.

Questions:
Rubine:
1) If points at an identical location are points with the same (x,y) coordinates and a different time stamp, then deleting the 1st or 2nd point there can affect the overall accuracy of the data; this is very true for the cumulative measures, like the sum of angles between each point in the stroke.

2) Removing either the 1st or the 2nd value could affect the summation features.  Ideally, the duplicate time stamps could be fixed by adding a time unit to every subsequent point (eliminating every duplicate time stamp case one at a time while preserving the integrity of the stroke as much as possible).
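The fix proposed above could be sketched like this, assuming (x, y, t) tuples and an arbitrary one-unit shift:

```python
def fix_duplicate_timestamps(points):
    """Make time stamps strictly increasing by shifting each offending
    point (and, transitively, its successors) forward one time unit,
    instead of deleting duplicated points.  Preserves every (x, y)
    sample while keeping the stroke's ordering intact."""
    fixed = []
    last_t = None
    for x, y, t in points:
        if last_t is not None and t <= last_t:
            t = last_t + 1
        fixed.append((x, y, t))
        last_t = t
    return fixed
```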

3)
shape 1: d
shape 2: g
shape 3: b
shape 4: e
shape 5: a
shape 6: h
shape 7: c
shape 8: f

Long:
 1) Some of the features are devoted towards finding the size of the stroke, average angle per point, and overall curviness.  I believe Long was trying to get an overall picture of the image (size of stroke and how curvy stroke was).

Tuesday, August 31, 2010

Answers to Questionnaire

These are the answers for the questionnaire for my new blog for CSCE 624 Sketch Recognition.
  1. Photo of yourself.
    1. It's up.
  2. E-mail address (e.g., yourname at domain.com).
    1. grovemaster@hotmail.com
  3. Graduate standing (e.g., 3rd Year PhD, 2nd Year Masters, 1st Year PhD w/ Masters).
    1. 2nd semester Masters
  4. Why are you taking this class?
    1. Interesting subject; it was on my to-do list
  5. What experience do you bring to this class?
    1. Enthusiasm and a willingness to learn
  6. What do you expect to be doing in 10 years?
    1. Manager-level position or higher at a private company or government.
  7. What do you think will be the next biggest technological advancement in computer science?
    1. "Free" movement/allocation of physical memory (no more pulling from hard disk)
  8. What was your favorite course when you were an undergraduate (computer science or otherwise)?
    1. I don't pick favorites, but I did enjoy Prof. Leiss's Algorithms class.  I learned a lot and did some hands-on programming.  I enjoyed the class, even though Prof. Leiss has a "small" streak of "strictness".
  9. What is your favorite movie and why?
    1. I don't pick favorites, but all my top shows are animated.  2 of my favorite American films are "Balto" (the first one) and  "Cats Don't Dance".  Both movies were family-oriented and really uplifting.
  10. If you could travel back in time, who would you like to meet and why?
    1. My grandpa on my Dad's side.  He died when I was too young to really remember him.
  11. Give some interesting fact about yourself.
    1. I'm fond of anime and video games and am actively pursuing a closer relationship with Christ this semester.