Thursday, December 2, 2010

Reading #30: Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams (Hammond)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=1751249688937523606&isPopup=true

Summary:
UML is Unified Modeling Language and it is used to create flow charts of software design.  There is already software for creating UML diagrams (it's a little "bulky" to use), but a combination of sketch recognition and a PowerPoint-esque interface would be a welcome addition.  This is exactly what Tahuti does.  The users can draw the necessary boxes and lines and type in the necessary characters.

The users were required to perform 4 tasks and rank the difficulty of accomplishing them.  The users performed the tasks on Rational Rose, a UML diagramming tool, and Tahuti.  At the end of the study, the users were interviewed.  Users expressed a higher satisfaction with Tahuti than with other UML diagram creation software or a paint program.  Some users complained Rational Rose was non-intuitive and it was difficult to perform the desired actions.

Discussion:
The author had a very good thing working in her favor.  With the exception of letters, nearly every single shape in a UML diagram is composed of straight lines.  This makes pre-processing of the sketch and identification of the sketch a much simpler matter than it would be otherwise.  The only possible complaint I can see here is wondering if the user tasks were geared towards Tahuti's favor rather than a general set of tasks.

Reading #29: Scratch Input: Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces (Harrison)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=1742929146352552782&isPopup=true

Summary:
This paper attempts to recognize sketches by the sound created when the user creates the sketch.  A stethoscope/microphone combination is placed on the drawing surface and the user creates the sketch.  The amplitude of the sound wave is mapped out and analyzed to determine the shape drawn.  For example, a rectangle typically has 4 amplitude peaks and a triangle typically has 3 amplitude peaks.
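
A minimal sketch of the peak-counting idea, assuming the sound has already been reduced to a list of amplitude samples.  The function name, the threshold, and the toy signals are made up for illustration; this is not the paper's actual signal processing.

```python
def count_peaks(envelope, threshold=0.5):
    """Count bursts where the amplitude rises above `threshold`.

    A burst starts when a sample crosses above the threshold and
    ends when a sample drops back to or below it.
    """
    peaks = 0
    above = False
    for sample in envelope:
        if sample > threshold and not above:
            peaks += 1
            above = True
        elif sample <= threshold:
            above = False
    return peaks


# Hypothetical amplitude envelopes: one loud burst per stroke.
triangle = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.1]
rectangle = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.1, 0.7, 0.0]

print(count_peaks(triangle))   # 3
print(count_peaks(rectangle))  # 4
```

A count like this is a very thin feature, which is part of why two different shapes with the same number of strokes would be hard to tell apart.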

The author professed a high recognition rate of 90%, but he used some very simple shapes.  The number of shapes used was very small as well.  From what I read, the author assumed shapes were drawn in the same manner (very incorrect when sketching letters).

Discussion:
This paper introduced the idea of sound-based sketch recognition to me.  However, sound-based recognition should be used to create a portable sketch recognition system.  I want to see a "Magic Pen" that the user can use to sketch anywhere: on the bus, on the restroom wall, on the table or counter at Taco Bell.  The sound and positioning data are collected to recognize the sketch on a separate screen.

By itself, sound recognition of sketches is not very effective.  There are simply too many variations for a single shape and too few features to identify the shapes.  In addition, many of the variations of shapes overlap with each other, making distinction between shapes very difficult.  I am not the only one to say this.  There is at least one other paper on this topic that expresses similar sentiments.

Reading #28: iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-28-icandraw-using-sketch.html

Summary:
This paper presents the idea of helping a user draw a sketch correctly.  It also presents a general method for guiding a user's sketch and nine ideas for assistive sketch recognition in general.  Most users are not skilled at drawing on a computer, or drawing at all.  iCanDraw allows users to create accurate sketches.  The user's sketches were more accurate with the assistance of the iCanDraw system. 

Before the user starts sketching, the iCanDraw system analyzes the face image and extracts relevant data from it to use for the guidance interface.  The user can verify the accuracy of his or her sketch with the "Check my work" option.  It checks the accuracy of the user's lines against the "correct", computer-generated version.

Discussion:
If I remember correctly, Prof. Hammond presented this paper's contents in class.  I like the idea of getting the face correct, but it doesn't really allow for artistic creativity.  The system pretty much tells you what to draw, so why doesn't it just draw the face for you and save you the trouble?  One more thing: does this system work for someone who ISN'T bald?

Reading #27: K-Sketch: A 'Kinetic' Sketch Pad for Novice Animators (Davis)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=3549141404676567109&isPopup=true

Summary:
Here's a fun idea for anyone who's tried using Maya or any other modeling software for animation.  This paper introduces K-Sketch to make the creation of animated models very simple and intuitive.  The author conducted several interviews to ensure the design of the system was acceptable.

The author determined the uses the K-Sketch system would be employed for and strove to ensure those purposes could be accomplished.  For example, professional animators may use K-Sketch to do a presentation and amateurs may use K-Sketch to doodle or create an animation for entertainment purposes. 

User testing was conducted by comparing K-Sketch against PowerPoint's animation capabilities.  Users typically needed less help to accomplish tasks in K-Sketch and they accomplished those tasks in less time with K-Sketch than with PowerPoint.  The user satisfaction was generally greater towards K-Sketch than PowerPoint.

Discussion:
Here's what I want to see: an evolved form of K-Sketch that allows the user to save the models in file formats accepted by game programming systems (like XNA).  Even better, I would like to scan in a few sketches of a person and use those as the basis for an animated model.  This would close the gap between hand-drawn pictures and animated models.  I also want to do this in 3D.

It's a good thing the author did the interviews prior to development.  It's a great way to make sure you do the job right and sadly, not a lot of papers (that present systems) do that.

Reading #26: Picturephone: A Game for Sketch Data Capture (Johnson)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=2268287811024365988&isPopup=true

Summary:
Picturephone was introduced in reading #24, so if any background information on Picturephone is necessary, please refer to that paper.  The game works in 3 basic steps:

1) party 1 describes picture in text
2) party 2 creates sketch based on text
3) party 3 judges similarity between picture and text description

The paper did not contain a results section, so this leads to doubt about Picturephone's usability testing.  The author does state users will only play Picturephone if it is engaging.  The author mentioned tools and features used to attract and hold the user's attention, but was careful to state such tools and features should not destroy the original purpose of Picturephone.

Discussion:
I was a little surprised to learn this mini-game had its own paper.  I had assumed the author included it in reading #24 and dropped it afterward.  The paper does bring up the interesting point that people interpret the same description differently.  I once did a similar exercise in an English class in middle school.  We all drew a picture based on a textual description and we found both our descriptions and sketches lacking.  Give 2 users the same description and you will get 2 different sketches, guaranteed.  Coping with this human idiosyncrasy will become a very pertinent topic in future sketch recognition research.  I also noticed chunks of text in this paper were identical to chunks of text in reading #24.

Reading #25: A Descriptor for Large Scale Image Retrieval Based on Sketched Feature Lines (Eitz)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=3221243972096915542&isPopup=true

Summary:
This is another attempt to generalize sketch recognition.  The previous paper focused on incorporating variances between pictures of the same description, and this paper focuses on scale.  The author focuses on searching for and retrieving images from a database of over a million images.  The uniqueness of the system stems from the fact that it is an image-based search system.  The user sketches an image and that is used as the query in the database. 

An edge histogram and tensor descriptor are used to extract the necessary data for the search query.  Explaining the definitions and utilization of an edge histogram and tensor descriptor would be too lengthy for this summary, so it is left to the reader to investigate further.

Discussion:
The author achieved promising results.  What I would like to see is the computer playing Pictionary with a good deal of accuracy.  The user draws a sketch and the computer's queries get more specific and have a smaller list of results.  All in all, I personally enjoy the idea of an image-based search system.

Reading #24: Games for Sketch Data Collection (Johnson)

Comment Location:
http://ayden-kim.blogspot.com/2010/12/reading-24-games-for-sketch-data.html

Summary:
The author attempts to incorporate the fact that there are multiple ways to draw the same "picture".  A person can draw the sun or the moon in a variety of different ways.  People will draw differently based on a text description.  The author introduced 2 games to collect sketches based on certain information (such as a text description) to enable future researchers to obtain sketch data.  The 2 games are Picturephone and Stellasketch.  Picturephone gives the sketchers a description and allows them to draw it.  The sketch's similarity to the original text description is rated by a 3rd party of humans.  Stellasketch is a computer version of Pictionary.

Discussion:
The author brings up some good points.  There are many ways of drawing many shapes; a stick figure can be drawn in 720 different ways, and that's a relatively simple sketch.  The author tried to make these games "engaging", meaning "fun".  I don't see that happening.  What would you rather do: play a mentally-stimulating game of Stellasketch or pick up Halo and kill people online with explosives?  The answer is a no-brainer, literally.
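
One way to arrive at 720, assuming a six-stroke stick figure where only the order of the strokes varies (6! = 720).  The stroke breakdown below is an assumption for illustration, and counting stroke direction as well would make the number even larger.

```python
from itertools import permutations
from math import factorial

# Hypothetical six-stroke stick figure.
strokes = ["head", "body", "left arm", "right arm", "left leg", "right leg"]

# Every possible order in which the six strokes could be drawn.
orderings = list(permutations(strokes))

print(len(orderings))  # 720
print(factorial(6))    # 720
```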

Reading #23: InkSeine: In Situ Search for Active Note Taking (Hinckley)

Comment Location:
https://www.blogger.com/comment.g?blogID=19209095&postID=483186139061182635&isPopup=true

Summary:
This paper presents a method of improving the note-taking process.  That can be very useful for classes--if done correctly.  One thing to keep in mind: this is a design paper, not an implementation paper.  The author presents the design and the lo-fi prototype studies, but has not created the actual system.

InkSeine will fundamentally rely on text recognition.  Since that has not been perfected, I do not foresee InkSeine being perfected until text recognition is perfected.

Discussion:
Doing this one correctly is a problem, because the optimal set of needs for note-taking differs from user to user.  There are also users who prefer pen-and-paper for note-taking (I'm one of them).  Now that I think of it, Professor Kerne has done something that is slightly similar: combinFormation.  From what I gather, this particular interface is not "fast" enough to support real-time note taking in class, especially at the rate some professors talk and write.  InkSeine also seems to be somewhat dependent on having the relevant materials on hand for searching (instead of searching on the Internet, for example).  InkSeine is one of the many applications that would benefit from perfect text recognition.

Reading #22: Plushie: An Interactive Design System for Plush Toys (Mori)

Comment Location:
http://ayden-kim.blogspot.com/2010/12/reading-22-plushie.html

Summary:
This is another system for creating a 3D image from 2D sketches.  Unlike Reading #21 where the 2D sketch was "inflated" into a 3D shape, Plushie sews together a bunch of 2D images into a final 3D form.  The interface has 2 windows: one for 3D editing and the other for 2D editing.  The user can edit the images in either of these windows to modify the current plushie representation shown in the 3D window.

A triangle mesh was used for the 3D modeling.  Children tested and used the system and were able to design and (maybe have this part done for them) sew a plushie toy.  The children's design phase took considerably longer than the author needed to design a plushie.

Discussion:
Here is yet another paper concerning the design of stuffed animals.  The interface of Plushie must indeed be simple and intuitive if children were able to use it successfully.  Personally, I wouldn't mind having a look at that code.  There are definitely some "rippable" snippets in there.

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (Igarashi)

Comment Location:
http://ayden-kim.blogspot.com/2010/12/reading-21-teddy.html

Summary:
Now this is interesting.  The user "draws" 3D objects by sketching a 2D sketch and then letting the system Teddy work its algorithm.  Teddy doesn't recognize individual shapes, such as a square or triangle.  Instead, it takes an enclosed shape and does a number of operations to it to transform the sketch into a 3D shape.  Operations include bending, painting, and extrusion; there are multiple variations of each operation, depending on the shape.  Only specialists in the author's general research areas tested the Teddy system, but they gave very positive reviews.

Discussion:
I noticed the light source differed between some of the sketches shown in figure 6.  This makes me wonder if the light source is decided by Teddy or if it can be customized by the user.  Here is the future: combine this with Maya, so I can draw something and convert it to a rendered 3D object.  The farther future: scan a sketch(es) of a game character and create a 3D model based on the input.  This would reduce the workload of game developers when creating characters, enemies, and levels.

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches (LaViola)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-20-mathpad-2-system-for.html

Summary:
The author presents an algorithm for recognizing mathematical problems and solving them.  The user sketches an equation or mathematical situation, and the MathPad2 system solves that problem.  Currently, MathPad2 cannot solve complex problems involving multiple equations by itself.  MathPad2 is currently limited to solving simple equations.  MathPad2 is a user-driven sketch recognition system using menu options and gesture options to activate functionality.

Discussion:
I did not see a results section or a section devoted to user evaluation; I did notice the odd feedback mini-section here and there.  I cannot help but think this system was created with minimal user feedback.  It would be interesting to see what a few Math majors (Math PhD students in particular and some Math professors) think of MathPad2.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields (Qi)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-19-diagram-structure.html

Summary:
Bayes' theorem is a probability technique for guessing if a piece of data belongs to a particular class based on training data.  This has been applied to sketch recognition.  The algorithm also involves Markov properties, and I do not have a background in that; due to this, my explanation of the algorithm will be rather scant.  The algorithm only attempts to identify components within a diagram sketch.
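
For reference, here is Bayes' theorem on its own in a toy setting.  This is only a sketch of the basic rule, not the paper's Bayesian conditional random field; the class names and probabilities are invented for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(class | observation) for each class via Bayes' theorem.

    priors:      {class: P(class)}, e.g. estimated from training data
    likelihoods: {class: P(observation | class)}
    """
    # Numerator of Bayes' theorem for every class.
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    # Denominator: P(observation), summed over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers: a stroke is equally likely to belong to a box or
# an arrow, but its features are three times as likely under "arrow".
priors = {"box": 0.5, "arrow": 0.5}
likelihoods = {"box": 0.2, "arrow": 0.6}

result = posterior(priors, likelihoods)
best = max(result, key=result.get)
print(best, result)
```

With these made-up numbers the posterior comes out to roughly 0.25 for "box" and 0.75 for "arrow", so the stroke would be labeled as part of an arrow.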

The results, like all learning classifiers (and most algorithms on the planet) were not perfect.  The algorithm failed to give correct identification for all sketches.

Discussion:
The paper used some things I do not have a background in, so I cannot offer much in the way of discussion.  I can say the algorithm chose an approach I have not seen before and the results were not perfect.  Still, it's a nice, math-heavy idea for a field.  It seems most fields try that route at some point.

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-18-spatial-recognition-and.html

Summary:
This algorithm is somewhat similar to Reading #16.  The algorithm creates a proximity graph of each stroke.  The order of the strokes is not used in this algorithm; this potentially reduces error in the event the user drew the shape in an unusual manner.  The author improves upon the work of Viola-Jones, who "constructed a real-time face detection system using a boosted collection of simple and efficient features". 
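
A rough sketch of what a stroke proximity graph could look like, assuming each stroke is a list of (x, y) points and that two strokes are neighbors when their closest points are within some threshold.  The function names, the threshold, and the sample strokes are assumptions, not details from the paper.

```python
from itertools import combinations
from math import dist


def min_distance(stroke_a, stroke_b):
    """Smallest point-to-point distance between two strokes."""
    return min(dist(p, q) for p in stroke_a for q in stroke_b)


def proximity_graph(strokes, threshold):
    """Map each stroke index to the set of stroke indices near it."""
    graph = {i: set() for i in range(len(strokes))}
    for i, j in combinations(range(len(strokes)), 2):
        if min_distance(strokes[i], strokes[j]) <= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical strokes: two that nearly touch and one far away.
strokes = [
    [(0, 0), (10, 0)],
    [(10, 1), (10, 10)],
    [(50, 50), (60, 60)],
]

graph = proximity_graph(strokes, threshold=2.0)
print(graph)  # {0: {1}, 1: {0}, 2: set()}
```

Only positions are compared here, so drawing the same strokes in a different order gives the same neighbor relationships.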

Discussion:
The author achieved some interesting results.  The algorithm had improved results over some other algorithms when the number of recognizable shapes increased.  I'd say this algorithm is definitely worth looking into and it is beneficial to use parts of it.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-17-distinguishing-text-from.html

Summary:
This algorithm separates text from graphics.  Unlike the entropy algorithm, this algorithm employs a feature set.  The feature set includes characteristics of the strokes and the relationship between strokes, such as the distance between strokes.  Time difference between strokes is calculated as well.
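
A small sketch of the kind of features described above, assuming each stroke is a list of (x, y, t) samples with t in milliseconds.  The specific features, names, and sample data are illustrative assumptions rather than the paper's actual feature set.

```python
from math import dist


def stroke_length(stroke):
    """Total path length of one stroke (a per-stroke feature)."""
    return sum(dist(p[:2], q[:2]) for p, q in zip(stroke, stroke[1:]))


def gap_features(prev_stroke, next_stroke):
    """Features relating two consecutive strokes."""
    pen_up = prev_stroke[-1]    # last sample of the earlier stroke
    pen_down = next_stroke[0]   # first sample of the later stroke
    return {
        "gap_distance": dist(pen_up[:2], pen_down[:2]),
        "time_gap": pen_down[2] - pen_up[2],
    }


# Hypothetical strokes as (x, y, t_ms) samples.
first = [(0, 0, 0), (3, 4, 100)]
second = [(6, 8, 350), (6, 18, 500)]

print(stroke_length(first))         # 5.0
print(gap_features(first, second))  # {'gap_distance': 5.0, 'time_gap': 250}
```

Values like these would then be fed to whatever classifier decides between text and graphics.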

If I'm reading the results correctly, the algorithm made a large number of errors.  This is not a surprise, considering some shapes look like characters (a triangle looks like the letter "A").

Discussion:
It's a shame I did not discover this paper before the due date of the second homework assignment.  Otherwise, I could have used some of the features to distinguish between text and non-text strokes.

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Comment Location:
http://pacocomputer.blogspot.com/2010/12/reading-16-efficient-graph-based-symbol.html

Summary:
The algorithm employs a relational graph as the basis for its recognition.  For homework 1, I used relations between some lines as part of my algorithm, but I did not base my entire algorithm off the relations between strokes.  This algorithm does that.  Once the relational graph is created, the sketch is matched to the template that has the closest relational graph.  There are several ways of doing this, and the paper employs 4 of them:

"Stochastic Matching, which is based on stochastic search; Error-driven Matching, which uses local matching errors to drive the solution to an optimal match; Greedy Matching, which uses greedy search; and Sort Matching, which relies on geometric information to accelerate the matching."

A grand total of 23 different symbols were used in the testing.  The only algorithm to perform with less than a 90% accuracy was the sort type; the sort type also had the shortest computation time.  There was very little difference in the accuracy rates of the other 3 algorithms; their computation times differed widely though.  Stochastic took the longest to finish.

Discussion:
This algorithm is definitely viable for sketch recognition.  The grand future of sketch recognition no doubt involves an interface that recognizes and fixes up any sketch a user is making.  To encompass "any sketch", a large database is currently required and a significant amount of time to use that database is required.  The question remains, is there a way to get around that?  Is there a method of recognizing sketches that doesn't rely on a large amount of stored memory?  If not, then the cost of using that memory must be made much smaller than it is today and the computational abilities of computers must increase drastically (latter one's always happening).

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches (Kara)

Comment Location:
http://pacocomputer.blogspot.com/2010/11/reading-15-image-based-trainable-symbol.html

Summary:
The author proposes a trainable, hand-drawn symbol recognizer based on a multi-layer recognition scheme.  Binary templates are used to represent the symbols.  The author uses multiple classifiers to rank a symbol and thus increase the overall accuracy of the system.  The 4 classifiers are Hausdorff Distance, Modified Hausdorff Distance, Tanimoto Coefficient, and Yule Coefficient. 
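
For the curious, here is a sketch of the four measures in their common textbook forms, assuming two same-sized binary templates stored as lists of 0/1 rows.  The paper may define or threshold them differently, and the helper names and sample templates are made up.  The sketch also assumes both templates contain at least one "on" pixel and that the Yule denominator is not zero.

```python
from math import dist


def pixels(template):
    """Coordinates of the 'on' pixels in a binary template."""
    return [(r, c) for r, row in enumerate(template)
            for c, value in enumerate(row) if value]


def directed(a, b, reduce):
    """Reduce (max or mean) each point's distance to its nearest point in b."""
    return reduce([min(dist(p, q) for q in b) for p in a])


def mean(values):
    return sum(values) / len(values)


def hausdorff(t1, t2):
    a, b = pixels(t1), pixels(t2)
    return max(directed(a, b, max), directed(b, a, max))


def modified_hausdorff(t1, t2):
    a, b = pixels(t1), pixels(t2)
    return max(directed(a, b, mean), directed(b, a, mean))


def counts(t1, t2):
    """Count pixels on in both, only t1, only t2, and neither."""
    n11 = n10 = n01 = n00 = 0
    for row1, row2 in zip(t1, t2):
        for x, y in zip(row1, row2):
            if x and y:
                n11 += 1
            elif x:
                n10 += 1
            elif y:
                n01 += 1
            else:
                n00 += 1
    return n11, n10, n01, n00


def tanimoto(t1, t2):
    n11, n10, n01, _ = counts(t1, t2)
    return n11 / (n11 + n10 + n01)


def yule(t1, t2):
    n11, n10, n01, n00 = counts(t1, t2)
    return (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)


# Hypothetical 3x3 templates sharing one pixel.
a = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
b = [[1, 0, 0], [1, 0, 0], [0, 0, 0]]

print(hausdorff(a, b), modified_hausdorff(a, b))  # 1.0 0.5
print(tanimoto(a, b), yule(a, b))
```

The two Hausdorff measures are distances (lower means more alike), while Tanimoto and Yule are similarities (higher means more alike), so their outputs need to be put on a common scale before they can be combined into one ranking.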

The author discovered limitations among his shape set when he tried to compare sketches that had shapes (like arrows) differing mainly by direction, size, or some other small detail. 

Discussion:
The author realized there is currently no perfect algorithm in sketch recognition.  The idea to employ multiple recognizers is a step forward in progress.  It also increases the coding, but then again, nothing's perfect.  Maybe if the author slapped on a few more classifiers and weighted their input, the overall recognition of the symbol would increase.