+This repository contains the projects completed as a part of Udacity's [Artificial Intelligence Nanodegree](https://in.udacity.com/course/artificial-intelligence-nanodegree--nd889).
+
+## Contents
+
+### Term-1: Foundations of AI
+
+#### P1: Diagonal Sudoku Solver
+In this project, an extension of a Sudoku solving agent is developed. The project is capable of solving any Classic or Diagonal Sudoku puzzle using three ideas: Constraint Propagation, Search (DFS) and Naked-Twins Strategy.
+
+#### P2: Game Playing Agent (Isolation)
+This game-playing agent uses techniques such as Iterative Deepening, Minimax, and Alpha-Beta Pruning to compete in the game of Isolation (a two-player discrete competitive game with perfect information). The different heuristics used are then compared to find the best heuristic.
+
+#### P3: Implementing a Planning Search
+A planning agent was implemented to solve deterministic logistics-planning problems for an air cargo transport system. The underlying logic makes use of a planning graph and A* search with automatically generated heuristics. The results/performance are then compared against several uninformed non-heuristic search methods (BFS, DFS, etc.)
+
+#### P4: American Sign Language Recognizer
+HMMs (Hidden Markov Models) are used to recognize words communicated using the American Sign Language (ASL). The system is trained on a dataset of videos that have been pre-processed and annotated and then tested on novel sequences.
+> In this project, an extension of a Sudoku solving agent is developed. The project is capable of solving any Classic or Diagonal Sudoku puzzle using three ideas: Constraint Propagation, Search (DFS) and Naked-Twins Strategy.
+
+## About
+[Sudoku](https://en.wikipedia.org/wiki/Sudoku) is one of the world's most popular puzzles. It consists of a 9x9 grid, and the objective is to fill the grid with digits in such a way that each row, each column, and each of the 9 principal 3x3 sub-squares contains all of the digits from 1 to 9. The detailed rules can be found [here](http://www.conceptispuzzles.com/?uri=puzzle/sudoku/rules).
+
+This project solves classic/diagonal Sudoku puzzles using **[Constraint Propagation](https://en.wikipedia.org/wiki/Constraint_satisfaction)** and **[Search (DFS)](https://en.wikipedia.org/wiki/Search_algorithm)**. In addition to the mentioned algorithmic techniques, Sudoku-specific strategy '**[Naked-Twins](http://www.sudokudragon.com/tutorialnakedtwins.htm)**' has also been used.
+
+#### Q: How do we use constraint propagation to solve the naked-twins problem?
+- Constraint propagation works by reducing domains of variables, strengthening constraints, or creating new ones. This leads to a reduction of the search space, making it faster to use search algorithms to traverse for the solution.
+
+- The naked twins problem refers to the situation when two boxes within the same unit (row, column, square or diagonal) have the same two possible numbers that can be filled in them. When this happens, as no other number can go in those boxes, those two numbers can't go anywhere else either. This means they can be safely removed from the possibilities on any other box that belongs to the same unit.
+
+- In this process, an additional constraint can be added that allows further reduction of the possible digits that can fill the sudoku grid. This shortens the recursion towards the solution.
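The elimination step described above can be sketched in a few lines of plain Python (the `values`/`unit` names are illustrative, not necessarily the project's actual API):

```python
def eliminate_naked_twins(values, unit):
    """Remove the digits of any naked-twin pair from the other boxes in a unit.

    values: dict mapping box name -> string of candidate digits
    unit:   list of box names in one row/column/square/diagonal
    """
    # Find pairs of boxes in the unit that share the same two candidate digits.
    twos = [box for box in unit if len(values[box]) == 2]
    twins = [(a, b) for i, a in enumerate(twos) for b in twos[i + 1:]
             if values[a] == values[b]]
    # Those two digits cannot appear anywhere else in the unit.
    for a, b in twins:
        for box in unit:
            if box not in (a, b):
                for digit in values[a]:
                    values[box] = values[box].replace(digit, '')
    return values
```

Running this on a unit where two boxes both hold `'23'` strips `2` and `3` from every other box in that unit, shrinking the search space before DFS ever runs.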
+
+#### Q: How do we use constraint propagation to solve the diagonal sudoku problem?
+- To solve a diagonal Sudoku, an additional constraint is introduced by adding the two main diagonals to the unit list. Propagating this constraint, combined with Depth-First Search and the other reduction strategies, produces a feasible solution to the diagonal Sudoku.
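As an illustration, here is how the two diagonals can be appended to the unit list, using the conventional `'ABCDEFGHI'` row and `'123456789'` column labels (a sketch; names are illustrative):

```python
rows = 'ABCDEFGHI'
cols = '123456789'

def cross(a, b):
    """All box names formed by pairing a character from a with one from b."""
    return [s + t for s in a for t in b]

row_units = [cross(r, cols) for r in rows]
column_units = [cross(rows, c) for c in cols]
square_units = [cross(rs, cs) for rs in ('ABC', 'DEF', 'GHI')
                              for cs in ('123', '456', '789')]
# The two main diagonals become two extra units; every peer-based
# elimination then automatically enforces the diagonal constraint.
diagonal_units = [[r + c for r, c in zip(rows, cols)],
                  [r + c for r, c in zip(rows, reversed(cols))]]
unitlist = row_units + column_units + square_units + diagonal_units
```

With the diagonals included, the unit list grows from 27 to 29 units and no other solver code needs to change.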
+
+## Requirements
+This project requires **Python 3**. It is recommended to use [Anaconda](https://www.continuum.io/downloads), a pre-packaged Python distribution that contains all of the necessary libraries and software for this project. Try using the environment provided in this folder.
+
+To see a visualization of the Sudoku being solved, pygame must be installed. The installation instructions are available [here](http://www.pygame.org/download.shtml).
+
+## Files
+* `solutions.py` – Driver program. Solves the Sudoku.
+
+* `sudoku.py` – Consists of a class 'Sudoku' which contains all method definitions used for solving.
+
+* `solution_test.py` – To test the solution.
+
+* `PySudoku.py` – Code for visualizing the solution.
+
+* `visualize.py` – Code for visualizing the solution.
+
+* `Classic_Sudoku.py` – Solver for classic Sudoku.
+
+To run the code, edit the Sudoku string in `solutions.py` and run the script.
+
+
+
+## Future Improvements
+This was my favorite project of the nanodegree program. There is a lot that I would love to add to take this project and my learning experience further. Some of the ideas are:
+
+- Receiving input from a camera.
+- Extending the project to more Sudoku formats.
+- Incorporating more Sudoku solving techniques.
+
+If you want to contribute to the above ideas, feel free to reach out to me or use my code to build your own fork.
+> This game-playing agent uses techniques such as Iterative Deepening, Minimax, and Alpha-Beta Pruning to compete in the game of Isolation (a two-player discrete competitive game with perfect information). The different heuristics used are then compared to find the best heuristic.
+
+
+
+## About
+This project is an adversarial search agent to play the game 'Isolation.' Isolation is a deterministic, two-player board game of perfect information in which the players alternate turns moving a single piece from one cell to another. Whenever either player occupies a cell, that cell becomes blocked for the remainder of the game. The first player with no remaining legal moves loses, and the opponent is declared the winner.
+
+This project uses a version of Isolation where each agent is restricted to L-shaped movements (like a knight in chess) on a rectangular grid (like a chessboard). The agents can move to an open cell on the board that is 2-rows and 1-column or 2-columns and 1-row away from their current position on the board. Movements are blocked at the edges of the board (the board does not wrap around). However, the player can "jump" blocked or occupied spaces (just like a knight in chess).
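The move rule above can be sketched as a small helper (illustrative only; the project's actual `Board` class differs):

```python
def knight_moves(position, width=7, height=7, blocked=frozenset()):
    """Return the open cells reachable by an L-shaped move from `position`.

    Moves off the edge of the board are discarded; blocked/occupied cells
    can be jumped over but not landed on.
    """
    r, c = position
    offsets = [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
               (1, -2), (1, 2), (2, -1), (2, 1)]
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < height and 0 <= c + dc < width
            and (r + dr, c + dc) not in blocked]
```

From a corner only two of the eight L-shaped moves stay on the board, while a central square keeps all eight, which is exactly why mobility-based heuristics favor the center.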
+
+Additionally, agents will have a fixed time limit each turn to search for the best move and respond. If the time limit expires during a player's turn, that player forfeits the match, and the opponent wins.
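The per-move time budget is what motivates iterative deepening: search to depth 1, then depth 2, and so on, keeping the deepest completed result when the clock runs out. A sketch over an abstract game tree (illustrative names; the project's actual agent differs):

```python
import time

def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Depth-limited minimax with alpha-beta pruning over an abstract tree.

    children(node) -> list of successor nodes; value(node) -> heuristic score.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = float('-inf')
        for kid in kids:
            best = max(best, alphabeta(kid, depth - 1, alpha, beta, False, children, value))
            alpha = max(alpha, best)
            if beta <= alpha:   # prune: the minimizing player will never allow this line
                break
        return best
    best = float('inf')
    for kid in kids:
        best = min(best, alphabeta(kid, depth - 1, alpha, beta, True, children, value))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

def iterative_deepening(root, children, value, time_limit=0.1):
    """Search progressively deeper until the per-move time budget expires."""
    deadline = time.monotonic() + time_limit
    best, depth = value(root), 1
    while time.monotonic() < deadline and depth <= 25:
        best = alphabeta(root, depth, float('-inf'), float('inf'), True, children, value)
        depth += 1
    return best
```

The deeper iterations dominate total cost, so repeating the shallow searches wastes little time while guaranteeing there is always a complete answer ready when the timer fires.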
+
+## Requirements
+This project requires **Python 3**. It is recommended to use [Anaconda](https://www.continuum.io/downloads), a pre-packaged Python distribution that contains all of the necessary libraries and software for this project. Try using the environment provided in this folder.
+
+### Using the Board Visualization
+The `isoviz` folder contains a modified version of `chessboard.js` that can animate games played on a 7x7 board. In order to use the board, you must run a local web server by running `python -m http.server 8000` from your project directory (`python -m SimpleHTTPServer 8000` on Python 2; you can replace 8000 with another port number if that one is unavailable), then open your browser to `http://localhost:8000` and navigate to the `/isoviz/display.html` page. Enter the move history of an isolation match (i.e., the array returned by the `Board.play()` method) into the text area and run the match. Refresh the page to run a different game.
+
+## Files
+- `game_agent.py` – Contains the code for the game-playing agent (see CustomPlayer class).
+
+- `agent_test.py` - Provided by @udacity to unit test `game_agent.py` implementation.
+
+- `tournament.py` - Provided by the @udacity staff to evaluate the performance of the game-playing agent.
+
+- `heuristic_analysis.pdf` – Contains the analysis of the various heuristics implemented in the game_agent. The metrics are obtained from `tournament.py`.
+
+- `research_review.pdf` – Reviews the [seminal paper](https://pdfs.semanticscholar.org/ad2c/1efffcd7c3b7106e507396bdaa5fe00fa597.pdf) on IBM's Deep Blue. A detailed write-up is also available on my blog.
+
+## Output (`tournament.py`)
+```
+*************************
+ Evaluating: ID_Improved
+*************************
+
+Playing Matches:
+----------
+ Match 1: ID_Improved vs Random Result: 15 to 5
+ Match 2: ID_Improved vs MM_Null Result: 14 to 6
+ Match 3: ID_Improved vs MM_Open Result: 16 to 4
+ Match 4: ID_Improved vs MM_Improved Result: 12 to 8
+ Match 5: ID_Improved vs AB_Null Result: 14 to 6
+ Match 6: ID_Improved vs AB_Open Result: 12 to 8
+ Match 7: ID_Improved vs AB_Improved Result: 11 to 9
+```
+> A planning agent was implemented to solve deterministic logistics-planning problems for an air cargo transport system. The underlying logic makes use of a planning graph and A* search with automatically generated heuristics. The results/performance are then compared against several uninformed non-heuristic search methods (BFS, DFS, etc.)
+
+## About
+The template code is available at https://github.com/udacity/AIND-Planning.
+
+**Reading reference:** "Artificial Intelligence: A Modern Approach" 3rd edition chapter 10 or 2nd edition chapter 11 on 'Planning,' sections:
+- The Planning Problem
+- Planning with State-Space Search
+
+available on the [AIMA book site](http://aima.cs.berkeley.edu/2nd-ed/newchap11.pdf).
+
+The tasks are classical PDDL (Planning Domain Definition Language) problems, all in the Air Cargo domain. They have the same action schema defined, but different initial states and goals.
+
+Progression-planning problems can be solved with graph searches such as breadth-first, depth-first, and A*, where the nodes of the graph are "states" and edges are "actions." A "state" is the logical conjunction of all boolean ground "fluents," or state variables, that are possible for the problem using Propositional Logic.
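As a toy illustration (not the project's actual encoding), a state can be represented as a frozen set of ground fluents, and an action as preconditions plus add/delete effects:

```python
# A state is the set of ground fluents that are currently true.
state = frozenset({'At(C1, SFO)', 'At(P1, SFO)', 'At(C2, JFK)'})

def apply_action(state, precond, effect_add, effect_rem):
    """Apply an action if its preconditions hold in `state`, else return None."""
    if not precond <= state:      # every precondition fluent must be true
        return None
    return frozenset((state - effect_rem) | effect_add)

# Load(C1, P1, SFO): the cargo leaves the airport and is now in the plane.
loaded = apply_action(state,
                      precond={'At(C1, SFO)', 'At(P1, SFO)'},
                      effect_add={'In(C1, P1)'},
                      effect_rem={'At(C1, SFO)'})
```

Edges of the search graph are exactly these state-to-state transitions, so any generic graph search can traverse them.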
+
+- **Uninformed Search Strategies:** These strategies (a.k.a. blind search) have no additional information about states beyond that provided in the problem definition. All they can do is generate successors and distinguish a goal state from a non-goal state.
+
+- **Informed (Heuristic) Search Strategies:** Informed search strategies use problem-specific knowledge beyond the definition of the problem itself and can find solutions more efficiently than an uninformed strategy.
+
+Both uninformed and heuristic-based search were applied to solve the problems and were then compared in an analysis. The study documents the results obtained from each search type to find an optimal solution for each air cargo problem, that is, a search algorithm that finds the lowest-cost path among all possible paths from start to goal at a suitable computational cost.
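A compact sketch of the informed search at the heart of the comparison: A* orders the frontier by f(n) = g(n) + h(n), and degenerates to an uninformed uniform-cost search when h is identically zero (toy graph; all names are illustrative):

```python
import heapq

def astar(start, goal, neighbors, h):
    """A* graph search. neighbors(node) -> iterable of (successor, step_cost);
    h(node) -> heuristic estimate of remaining cost. With h == 0 this is
    uniform-cost (uninformed) search."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    explored = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        for nxt, cost in neighbors(node):
            if nxt not in explored:
                heapq.heappush(frontier, (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None
```

An admissible heuristic only reorders which nodes get expanded first, so A* still returns a lowest-cost plan, just after expanding fewer states than the blind searches.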
+
+- **Air Cargo Action Schema**
+ ```
+ Action(Load(c, p, a),
+ PRECOND: At(c, a) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
+      EFFECT: ¬ At(c, a) ∧ In(c, p))
+  ```
+
+  For Problem 3, the **optimal plan length is 12 actions**. Here's a sample plan that is optimal:
+
+ ```
+ Load(C1, P1, SFO)
+ Load(C2, P2, JFK)
+ Fly(P1, SFO, ATL)
+ Load(C3, P1, ATL)
+ Fly(P2, JFK, ORD)
+ Load(C4, P2, ORD)
+ Fly(P1, ATL, JFK)
+ Fly(P2, ORD, SFO)
+ Unload(C4, P2, SFO)
+ Unload(C3, P1, JFK)
+ Unload(C2, P2, SFO)
+ Unload(C1, P1, JFK)
+ ```
+
+## Requirements
+This project requires **Python 3**. It is recommended to use [Anaconda](https://www.continuum.io/downloads), a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.
+
+## Files
+- `my_air_cargo_problems.py` – Air Cargo Transport code.
+- `research_review.pdf` – This one-page report highlights selected important historical developments in the field of AI planning and search and highlights the relationships between the developments and their impact on the field of AI as a whole.
+
+## Testing
+- The tests directory includes unittest test cases provided by @udacity to evaluate the implementations. All tests were passed before the project was submitted for review.
+  - Run all the test cases with additional context via `python -m unittest -v`
+
+- The `run_search.py` script is for gathering metrics for various search methods on any of the problems.
+
+## Improving Execution Time
+The exercises in this project can take a long time to run (from several seconds to several hours) depending on the heuristics and search algorithms, as well as the efficiency of the code. One option to improve execution time is to try installing and using `pypy3`, a Python JIT compiler, which can accelerate execution substantially. This is, however, untested.
+
+## Acknowledgments
+I'm grateful to @philferriere for posting his work online. His analysis of the same project helped me a lot in writing this README in its current form.
+> HMMs (Hidden Markov Models) are used to recognize words communicated using the American Sign Language (ASL). The system is trained on a dataset of videos that have been pre-processed and annotated and then tested on novel sequences.
+
+## About
+The template code is available at https://github.com/udacity/AIND-Recognizer.
+
+The overall goal of this project is to build a word recognizer for American Sign Language video sequences, demonstrating the power of probabilistic models. In particular, this project employs Hidden Markov Models (HMM's) to analyze a series of measurements taken from videos of American Sign Language (ASL) collected for research ([RWTH-BOSTON-104 Database](http://www-i6.informatik.rwth-aachen.de/~dreuw/database-rwth-boston-104.php)). In these videos, the right-hand 'x' and 'y' locations are tracked as the speaker signs a sentence. The raw data, train, and test sets are pre-defined.
+
+In the first part of the project, a variety of feature sets is derived. In Part 2, three different model selection criteria are implemented. The objective of model selection is to tune the number of states for each word HMM before testing on unseen data. Three methods are explored: log-likelihood using cross-validation folds (CV), the Bayesian Information Criterion (BIC), and the Discriminative Information Criterion (DIC). Finally, the recognizer is implemented, and the effects of the different combinations of feature sets and model selection criteria are compared.
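The log-likelihoods that drive these selection criteria come from scoring an observation sequence under a trained model. The underlying quantity can be sketched with the forward algorithm for a discrete-emission HMM (a toy illustration; the project uses `hmmlearn`'s Gaussian-emission models):

```python
import math

def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Log P(observations | model) for a discrete-emission HMM via the
    forward algorithm. start_p[s], trans_p[prev][s], emit_p[s][symbol]
    are the model's probability tables."""
    states = list(start_p)
    # alpha[s] = P(obs[:t+1] and being in state s at time t)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return math.log(sum(alpha.values()))
```

Model selection then compares these log-likelihoods across candidate state counts, with BIC/DIC adding penalty terms on top.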
+
+### Data Description
+The data in the `/data/` directory was derived from the RWTH-BOSTON-104 Database. The hand positions (`hand_condensed.csv`) are pulled directly from the database file `boston104.handpositions.rybach-forster-dreuw-2009-09-25.full.xml`.
+
+The three markers are:
+- `0`: speaker's left hand
+- `1`: speaker's right hand
+- `2`: speaker's nose
+
+`X` and `Y` values of the video frame increase left-to-right and top-to-bottom.
+
+For purposes of this project, the sentences have been pre-segmented into words based on slow motion examination of the files. These segments are provided in the `train_words.csv` and `test_words.csv` files in the form of start and end frames (inclusive).
+
+The videos in the corpus include recordings from three different ASL speakers. The mappings for the three speakers to video are included in the `speaker.csv` file.
+
+## Requirements
+This project requires **Python 3** with `NumPy`, `Pandas`, `matplotlib`, `SciPy`, `scikit-learn`, `jupyter` and `hmmlearn`.
+
+It is recommended to use [Anaconda](https://www.continuum.io/downloads), a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.
+
+`hmmlearn` version 0.2.1 contains a bug fix related to the log function used in this project. This version can be installed directly from its repository.
+## Introduction
+The overall goal of this project is to build a word recognizer for American Sign Language video sequences, demonstrating the power of probabilistic models. In particular, this project employs [hidden Markov models (HMM's)](https://en.wikipedia.org/wiki/Hidden_Markov_model) to analyze a series of measurements taken from videos of American Sign Language (ASL) collected for research (see the [RWTH-BOSTON-104 Database](http://www-i6.informatik.rwth-aachen.de/~dreuw/database-rwth-boston-104.php)). In this video, the right-hand x and y locations are plotted as the speaker signs the sentence.
+
+The raw data, train, and test sets are pre-defined. You will derive a variety of feature sets (explored in Part 1), as well as implement three different model selection criteria to determine the optimal number of hidden states for each word model (explored in Part 2). Finally, in Part 3 you will implement the recognizer and compare the effects of the different combinations of feature sets and model selection criteria.
+
+At the end of each Part, complete the submission cells with implementations, answer all questions, and pass the unit tests. Then submit the completed notebook for review!
+## PART 1: Data
+
+### Features Tutorial
+
+##### Load the initial database
+A data handler designed for this database is provided in the student codebase as the `AslDb` class in the `asl_data` module. This handler creates the initial [pandas](http://pandas.pydata.org/pandas-docs/stable/) dataframe from the corpus of data included in the `data` directory, as well as dictionaries suitable for extracting data in a format friendly to the [hmmlearn](https://hmmlearn.readthedocs.io/en/latest/) library. We'll use those to create models in Part 2.
+
+To start, let's set up the initial database and select an example set of features for the training set. At the end of Part 1, you will create additional feature sets for experimentation.
+```python
+asl.df.head()  # displays the first five rows of the asl database, indexed by video and frame
+asl.df.ix[98, 1]  # look at the data available for an individual frame
+```
+##### Feature selection for training the model
+The objective of feature selection when training a model is to choose the most relevant variables while keeping the model as simple as possible, thus reducing training time. We can use the raw features already provided or derive our own and add columns to the pandas dataframe `asl.df` for selection. As an example, in the next cell a feature named `'grnd-ry'` is added. This feature is the difference between the right-hand y value and the nose y value, which serves as the "ground" right y value.
+
+```python
+asl.df.head()  # the new feature 'grnd-ry' is now in the frames dictionary
+# TODO add df columns for 'grnd-rx', 'grnd-ly', 'grnd-lx' representing differences between hand and nose locations
+```
+##### Build the training set
+Now that we have a feature list defined, we can pass that list to the `build_training` method to collect the features for all the words in the training set. Each word in the training set has multiple examples from various videos. Below we can see the unique words that have been loaded into the training set.
+
+The training data in `training` is an object of class `WordsData` defined in the `asl_data` module. In addition to the `words` list, data can be accessed with the `get_all_sequences`, `get_all_Xlengths`, `get_word_sequences`, and `get_word_Xlengths` methods. We need the `get_word_Xlengths` method to train multiple sequences with the `hmmlearn` library. In the following example, notice that there are two lists; the first is a concatenation of all the sequences (the X portion) and the second is a list of the sequence lengths (the Lengths portion).
+
+###### More feature sets
+So far we have a simple feature set that is enough to get started modeling. However, we might get better results if we manipulate the raw values a bit more, so we will go ahead and set up some other options now for experimentation later. For example, we could normalize each speaker's range of motion with grouped statistics using [Pandas stats](http://pandas.pydata.org/pandas-docs/stable/api.html#api-dataframe-stats) functions and [pandas groupby](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html). Below is an example for finding the means of all speaker subgroups.
+
+To select a mean that matches by speaker, use the pandas [map](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html) method.
+### Features Implementation Submission
+Implement four feature sets and answer the question that follows.
+
+- **normalized Cartesian coordinates**
+  - use *mean* and *standard deviation* statistics and the [standard score](https://en.wikipedia.org/wiki/Standard_score) equation to account for speakers with different heights and arm lengths
+- **polar coordinates**
+  - calculate polar coordinates with [Cartesian to polar equations](https://en.wikipedia.org/wiki/Polar_coordinate_system#Converting_between_polar_and_Cartesian_coordinates)
+  - use the [np.arctan2](https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.arctan2.html) function and *swap the x and y axes* to move the $0$ to $2\pi$ discontinuity to 12 o'clock instead of 3 o'clock; in other words, the normal break in radians value from $0$ to $2\pi$ occurs directly to the left of the speaker's nose, which may be in the signing area and interfere with results. By swapping the x and y axes, that discontinuity moves to directly above the speaker's head, an area not generally used in signing.
+- **delta difference**
+  - as described in Thad's lecture, use the difference in values between one frame and the next frames as features
+  - the pandas [diff method](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.diff.html) and [fillna method](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html) will be helpful for this one
+- **custom features**
+  - These are your own design; combine techniques used above or come up with something else entirely. We look forward to seeing what you come up with! Some ideas to get you started:
+    - normalize using a [feature scaling equation](https://en.wikipedia.org/wiki/Feature_scaling)
+```python
+# TODO add features for normalized by speaker values of left, right, x, y
+# Name these 'norm-rx', 'norm-ry', 'norm-lx', and 'norm-ly'
+# using Z-score scaling (X-Xmean)/Xstd
+
+# TODO add features for polar coordinate values where the nose is the origin
+# Name these 'polar-rr', 'polar-rtheta', 'polar-lr', and 'polar-ltheta'
+# Note that 'polar-rr' and 'polar-rtheta' refer to the radius and angle
+
+# TODO add features for left, right, x, y differences by one time step, i.e. the "delta" values discussed in the lecture
+# Name these 'delta-rx', 'delta-ry', 'delta-lx', and 'delta-ly'
+
+# TODO add features of your own design, which may be a combination of the above or something else
+# Name these whatever you would like
+
+# TODO define a list named 'features_custom' for building the training set
+
+# Need to calculate mean and std again to include 'polar-rr' and 'polar-lr'
+```
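The transforms behind these feature sets can be sketched in plain Python (illustrative helpers; the project implements them as pandas dataframe columns):

```python
import math

def zscore(value, mean, std):
    """Standard score: normalizes for each speaker's size and arm length."""
    return (value - mean) / std

def polar(x, y):
    """Polar coordinates of a hand position relative to the nose.

    Note the swapped arguments to atan2: this moves the angle
    discontinuity from 3 o'clock to 12 o'clock, above the signing area.
    """
    return math.hypot(x, y), math.atan2(x, y)

def delta(series):
    """Frame-to-frame differences, with 0 filling the first frame."""
    return [0] + [b - a for a, b in zip(series, series[1:])]
```

In the notebook, `zscore` corresponds to the grouped mean/std normalization, `polar` to the `polar-*` columns, and `delta` to the `diff`/`fillna` pair.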
+**Question 1:** What custom features did you choose for the features_custom set and why?
+
+**Answer 1:** Added new features *norm-polar-lr* and *norm-polar-rr*, which normalize the polar radius to account for differences in arm length.
+
+Also, as noted from the lectures, the delta between values in consecutive time steps gives us an indication of speed. Hence, the delta of normalized data might be a good metric.
+
+### Features Unit Testing
+Run the following unit tests as a sanity check on the defined "ground", "norm", "polar", and "delta" feature sets. The test simply looks for some valid values but is not exhaustive. However, the project should not be submitted if these tests don't pass.
+## PART 2: Model Selection
+
+### Model Selection Tutorial
+The objective of Model Selection is to tune the number of states for each word HMM prior to testing on unseen data. In this section you will explore three methods:
+
+- Log likelihood using cross-validation folds (CV)
+- Bayesian Information Criterion (BIC)
+- Discriminative Information Criterion (DIC)
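BIC, for instance, trades goodness of fit against model complexity. A minimal sketch of the criterion, where `log_l` is the model's log-likelihood on the training data, `p` is the number of free parameters, and `n` is the number of data points (lower scores are preferred):

```python
import math

def bic_score(log_l, p, n):
    """Bayesian Information Criterion: -2 log L + p log N (lower is better)."""
    return -2 * log_l + p * math.log(n)
```

The `p * log(n)` term penalizes HMMs with many hidden states, so a larger model is selected only when its likelihood gain outweighs the added parameters.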
+<h5 id="Train-a-single-word">Train a single word<a class="anchor-link" href="#Train-a-single-word">¶</a></h5><p>Now that we have built a training set with sequence data, we can "train" models for each word. As a simple starting example, we train a single word using Gaussian hidden Markov models (HMM). By using the <code>fit</code> method during training, the <a href="https://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm">Baum-Welch Expectation-Maximization</a> (EM) algorithm is invoked iteratively to find the best estimate for the model <em>for the number of hidden states specified</em> from a group of sample seequences. For this example, we <em>assume</em> the correct number of hidden states is 3, but that is just a guess. How do we know what the "best" number of states for training is? We will need to find some model selection technique to choose the best parameter.</p>
+<span class="nb">print</span><span class="p">(</span><span class="s2">"Number of states trained in model for </span><span class="si">{}</span><span class="s2"> is </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">demoword</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">n_components</span><span class="p">))</span>
+<p>The HMM model has been trained and information can be pulled from the model, including means and variances for each feature and hidden state. The <a href="http://math.stackexchange.com/questions/892832/why-we-consider-log-likelihood-instead-of-likelihood-in-gaussian-distribution">log likelihood</a> for any individual sample or group of samples can also be calculated with the <code>score</code> method.</p>
+ <span class="nb">print</span><span class="p">(</span><span class="s2">"Number of states trained in model for </span><span class="si">{}</span><span class="s2"> is </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">n_components</span><span class="p">))</span>
+<h5 id="Try-it!">Try it!<a class="anchor-link" href="#Try-it!">¶</a></h5><p>Experiment by changing the feature set, word, and/or num_hidden_states values in the next cell to see changes in values.</p>
+<h5 id="Visualize-the-hidden-states">Visualize the hidden states<a class="anchor-link" href="#Visualize-the-hidden-states">¶</a></h5><p>We can plot the means and variances for each state and feature. Try varying the number of states trained for the HMM model and examine the variances. Are there some models that are "better" than others? How can you tell? We would like to hear what you think in the classroom online.</p>
+<h5 id="ModelSelector-class">ModelSelector class<a class="anchor-link" href="#ModelSelector-class">¶</a></h5><p>Review the <code>ModelSelector</code> class from the codebase found in the <code>my_model_selectors.py</code> module. It is designed as a strategy pattern for choosing different model selectors. For the project submission in this section, subclass <code>ModelSelector</code> to implement the following model selectors. In other words, you will write your own classes/functions in the <code>my_model_selectors.py</code> module and run them from this notebook:</p>
+<ul>
+<li><code>SelectorCV</code>: Log likelihood with CV</li>
+<li><code>SelectorBIC</code>: BIC </li>
+<li><code>SelectorDIC</code>: DIC</li>
+</ul>
+<p>You will train each word in the training set with a range of values for the number of hidden states, and then score these alternatives with the model selector, choosing the "best" according to each strategy. The simple case of training with a constant value for <code>n_components</code> can be called using the provided <code>SelectorConstant</code> subclass as follows:</p>
+<span class="n">training</span> <span class="o">=</span> <span class="n">asl</span><span class="o">.</span><span class="n">build_training</span><span class="p">(</span><span class="n">features_ground</span><span class="p">)</span> <span class="c1"># Experiment here with different feature sets defined in part 1</span>
+<span class="n">word</span> <span class="o">=</span> <span class="s1">'VEGETABLE'</span> <span class="c1"># Experiment here with different words</span>
+<span class="nb">print</span><span class="p">(</span><span class="s2">"Number of states trained in model for </span><span class="si">{}</span><span class="s2"> is </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">n_components</span><span class="p">))</span>
+<h5 id="Cross-validation-folds">Cross-validation folds<a class="anchor-link" href="#Cross-validation-folds">¶</a></h5><p>If we simply score the model with the log likelihood calculated from the feature sequences it has been trained on, we should expect that more complex models will have higher likelihoods. However, that doesn't tell us which would have a better likelihood score on unseen data. The model will likely be overfit as complexity is added. To estimate which model topology is better using only the training data, we can compare scores using cross-validation. One technique for cross-validation is to break the training set into "folds" and rotate which fold is left out of training. The "left out" fold is then scored. This gives us a proxy method of finding the best model to use on "unseen data". In the following example, a set of word sequences is broken into three folds using the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html">scikit-learn KFold</a> class object. When you implement <code>SelectorCV</code>, you will use this technique.</p>
+<span class="n">training</span> <span class="o">=</span> <span class="n">asl</span><span class="o">.</span><span class="n">build_training</span><span class="p">(</span><span class="n">features_ground</span><span class="p">)</span> <span class="c1"># Experiment here with different feature sets</span>
+<span class="n">word</span> <span class="o">=</span> <span class="s1">'VEGETABLE'</span> <span class="c1"># Experiment here with different words</span>
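<p>A small sketch of the folding technique, using stand-in sequences in place of the ones returned for the chosen word; the inline recombination mirrors what <code>asl_utils.combine_sequences</code> does for you.</p>

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in for one word's list of variable-length feature sequences.
rng = np.random.default_rng(1)
word_sequences = [rng.normal(size=(n, 2)) for n in (12, 15, 9, 20, 11, 14)]

split_method = KFold(n_splits=3)
for cv_train_idx, cv_test_idx in split_method.split(word_sequences):
    # Recombine the selected sequences into the (X, lengths) form that
    # hmmlearn's fit/score methods expect.
    train_X = np.concatenate([word_sequences[i] for i in cv_train_idx])
    train_lengths = [len(word_sequences[i]) for i in cv_train_idx]
    print("Train fold indices: {}  Test fold indices: {}".format(
        cv_train_idx, cv_test_idx))
```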
+<p><strong>Tip:</strong> In order to run <code>hmmlearn</code> training using the X,lengths tuples on the new folds, subsets must be combined based on the indices given for the folds. A helper utility has been provided in the <code>asl_utils</code> module named <code>combine_sequences</code> for this purpose.</p>
+<h5 id="Scoring-models-with-other-criterion">Scoring models with other criterion<a class="anchor-link" href="#Scoring-models-with-other-criterion">¶</a></h5><p>Scoring model topologies with <strong>BIC</strong> balances fit and complexity within the training set for each word. In the BIC equation, a penalty term penalizes complexity to avoid overfitting, so that it is not necessary to also use cross-validation in the selection process. There are a number of references on the internet for this criterion. These <a href="http://www2.imm.dtu.dk/courses/02433/doc/ch6_slides.pdf">slides</a> include a formula you may find helpful for your implementation.</p>
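<p>The BIC computation can be sketched as below. The parameter count used here is one common convention for a diagonal-covariance Gaussian HMM (transition probabilities, initial probabilities, means, and variances); conventions vary, so treat it as an assumption rather than the required formula.</p>

```python
import math

def bic_score(log_likelihood, n_states, n_features, n_data_points):
    """BIC = -2 * logL + p * log(N); lower is better.

    p counts the free parameters of a diagonal-covariance Gaussian HMM:
    transitions n*(n-1), initial probs n-1, means n*d, variances n*d.
    """
    p = n_states * (n_states - 1) + (n_states - 1) + 2 * n_states * n_features
    return -2.0 * log_likelihood + p * math.log(n_data_points)

# The penalty term grows with the number of states, so a more complex
# model must earn a substantially higher likelihood to win.
print(bic_score(-500.0, 3, 2, 100))
print(bic_score(-480.0, 5, 2, 100))
```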
+<p>The advantages of scoring model topologies with <strong>DIC</strong> over BIC are presented by Alain Biem in this <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.58.6208&rep=rep1&type=pdf">reference</a> (also found <a href="https://pdfs.semanticscholar.org/ed3d/7c4a5f607201f3848d4c02dd9ba17c791fc2.pdf">here</a>). DIC scores the discriminant ability of a training set for one word against competing words. Instead of a penalty term for complexity, it provides a penalty if model likelihoods for non-matching words are too similar to model likelihoods for the correct word in the word set.</p>
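<p>The discriminative idea behind DIC can be sketched as follows: score a candidate model by its own word's log likelihood minus the average log likelihood it assigns to the competing words. The dictionary interface here is a hypothetical stand-in, not the project's required signature.</p>

```python
def dic_score(log_likelihoods, this_word):
    """DIC = logL(this word) - average logL of the competing words.

    log_likelihoods maps each word to the log likelihood this word's
    candidate model assigns to that word's data.
    """
    others = [ll for w, ll in log_likelihoods.items() if w != this_word]
    return log_likelihoods[this_word] - sum(others) / len(others)

# A model that fits its own word well but the competitors poorly
# scores higher than one that fits everything about equally.
scores = {'BOOK': -50.0, 'VEGETABLE': -120.0, 'CHOCOLATE': -110.0}
print(dic_score(scores, 'BOOK'))   # -50 - (-115) = 65.0
```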
+<h3 id="Model-Selection-Implementation-Submission">Model Selection Implementation Submission<a class="anchor-link" href="#Model-Selection-Implementation-Submission">¶</a></h3><p>Implement <code>SelectorCV</code>, <code>SelectorBIC</code>, and <code>SelectorDIC</code> classes in the <code>my_model_selectors.py</code> module. Run the selectors on the following five words. Then answer the questions about your results.</p>
+<p><strong>Tip:</strong> The <code>hmmlearn</code> library may not be able to train or score all models. Implement try/except constructs as necessary to eliminate non-viable models from consideration.</p>
+<span class="n">training</span> <span class="o">=</span> <span class="n">asl</span><span class="o">.</span><span class="n">build_training</span><span class="p">(</span><span class="n">features_ground</span><span class="p">)</span> <span class="c1"># Experiment here with different feature sets defined in part 1</span>
+<span class="n">training</span> <span class="o">=</span> <span class="n">asl</span><span class="o">.</span><span class="n">build_training</span><span class="p">(</span><span class="n">features_ground</span><span class="p">)</span> <span class="c1"># Experiment here with different feature sets defined in part 1</span>
+<span class="n">training</span> <span class="o">=</span> <span class="n">asl</span><span class="o">.</span><span class="n">build_training</span><span class="p">(</span><span class="n">features_ground</span><span class="p">)</span> <span class="c1"># Experiment here with different feature sets defined in part 1</span>
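<p>The try/except tip above can be sketched as a guarded search over state counts; the scorer here is a stand-in callable, since the real one would fit an HMM and may raise a <code>ValueError</code> for topologies it cannot fit.</p>

```python
def select_best(n_components_range, train_and_score):
    """Pick the state count with the highest score, skipping failures.

    train_and_score is any callable that fits a model with the given
    number of states and returns its score; here it is a stand-in.
    """
    best_n, best_score = None, float('-inf')
    for n in n_components_range:
        try:
            score = train_and_score(n)
        except ValueError:     # hmmlearn raises ValueError on bad fits
            continue           # non-viable model: drop it from consideration
        if score > best_score:
            best_n, best_score = n, score
    return best_n

# Stand-in scorer that "fails" for large state counts and peaks at n = 3.
def fake_scorer(n):
    if n > 4:
        raise ValueError("too many states for the data")
    return -abs(n - 3)

print(select_best(range(2, 8), fake_scorer))   # -> 3
```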
+<p><strong>Question 2:</strong> Compare and contrast the possible advantages and disadvantages of the various model selectors implemented.</p>
+<p><strong>Answer 2:</strong> Selection using cross-validation is the easiest to implement compared to BIC &amp; DIC. The idea in BIC of penalizing model complexity is interesting, as it may also have the side effect of faster running times. Yet I also think it might be a double-edged sword, as in some cases it will sacrifice model fit for simplicity. <strong>DIC seems like the best selector</strong> because it chooses the model that gives the best distinction between the word of interest and all the others. This should result in the best-performing model, but also the slowest, as it needs to calculate the likelihoods of every competing word.</p>
+<p>A more formal listing of advantages & disadvantages is depicted below.</p>
+<h3 id="Cross-Validation:">Cross Validation:<a class="anchor-link" href="#Cross-Validation:">¶</a></h3><p><strong>Advantages:</strong> Does not need a lot of training data, since the training data is folded to simulate how the model will behave on test data.</p>
+<p><strong>Disadvantages:</strong> The sequences must be re-split and the model re-trained on each fold every time a new number of states is evaluated, which adds computational overhead.</p>
+<h3 id="BIC:">BIC:<a class="anchor-link" href="#BIC:">¶</a></h3><p><strong>Advantages:</strong> It penalizes model's complexity (of parameters).</p>
+<p><strong>Disadvantages:</strong> Not as accurate as cross-validation – possibly because it requires more training data (as there are no folds to simulate test data).</p>
+<h3 id="DIC:">DIC:<a class="anchor-link" href="#DIC:">¶</a></h3><p><strong>Advantages:</strong> Better performance than BIC.</p>
+<p><strong>Disadvantages:</strong> Model complexity is not penalized which might lead to a large number of parameters.</p>
+<h3 id="Model-Selector-Unit-Testing">Model Selector Unit Testing<a class="anchor-link" href="#Model-Selector-Unit-Testing">¶</a></h3><p>Run the following unit tests as a sanity check on the implemented model selectors. The test simply looks for valid interfaces but is not exhaustive. However, the project should not be submitted if these tests don't pass.</p>
+<h2 id="PART-3:-Recognizer">PART 3: Recognizer<a class="anchor-link" href="#PART-3:-Recognizer">¶</a></h2><p>The objective of this section is to "put it all together". Using the four feature sets created and the three model selectors, you will experiment with the models and present your results. Instead of training only five specific words as in the previous section, train the entire set with a feature set and model selector strategy.</p>
+<h3 id="Recognizer-Tutorial">Recognizer Tutorial<a class="anchor-link" href="#Recognizer-Tutorial">¶</a></h3><h5 id="Train-the-full-training-set">Train the full training set<a class="anchor-link" href="#Train-the-full-training-set">¶</a></h5><p>The following example trains the entire set with the example feature set <code>features_ground</code> and the <code>SelectorConstant</code> model selector. Use this pattern for your experimentation and final submission cells.</p>
+<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># autoreload for automatically reloading changes made in my_model_selectors and my_recognizer</span>
+ <span class="n">training</span> <span class="o">=</span> <span class="n">asl</span><span class="o">.</span><span class="n">build_training</span><span class="p">(</span><span class="n">features</span><span class="p">)</span> <span class="c1"># Experiment here with different feature sets defined in part 1</span>
+<h5 id="Load-the-test-set">Load the test set<a class="anchor-link" href="#Load-the-test-set">¶</a></h5><p>The <code>build_test</code> method in <code>ASLdb</code> is similar to the <code>build_training</code> method already presented, but there are a few differences:</p>
+<ul>
+<li>the object is type <code>SinglesData</code> </li>
+<li>the internal dictionary keys are the index of the test word rather than the word itself</li>
+<li>the getter methods are <code>get_all_sequences</code>, <code>get_all_Xlengths</code>, <code>get_item_sequences</code> and <code>get_item_Xlengths</code></li>
+</ul>
+<h3 id="Recognizer-Implementation-Submission">Recognizer Implementation Submission<a class="anchor-link" href="#Recognizer-Implementation-Submission">¶</a></h3><p>For the final project submission, students must implement a recognizer following guidance in the <code>my_recognizer.py</code> module. Experiment with the four feature sets and the three model selection methods (that's 12 possible combinations). You can add and remove cells for experimentation or run the recognizers locally in some other way during your experiments, but retain the results for your discussion. For submission, you will provide code cells of <strong>only three</strong> interesting combinations for your discussion (see questions below). At least one of these should produce a word error rate of less than 60%, i.e. WER < 0.60.</p>
+<p><strong>Tip:</strong> The hmmlearn library may not be able to train or score all models. Implement try/except constructs as necessary to eliminate non-viable models from consideration.</p>
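<p>The core recognizer loop can be sketched as: score each test item against every word model and guess the word with the highest log likelihood, treating models that fail to score as never-chosen. The stand-in callables below replace real HMMs' <code>score(X, lengths)</code>, and the exact interface of <code>my_recognizer.py</code> may differ.</p>

```python
def recognize(models, test_items):
    """Return per-item log likelihood dicts and best-guess words.

    models maps word -> a scoring callable (stand-in for an HMM's
    score method); test_items is a list of feature items.
    """
    probabilities, guesses = [], []
    for item in test_items:
        logLs = {}
        for word, score_fn in models.items():
            try:
                logLs[word] = score_fn(item)
            except ValueError:
                logLs[word] = float('-inf')   # non-viable: never chosen
        probabilities.append(logLs)
        guesses.append(max(logLs, key=logLs.get))
    return probabilities, guesses

# Toy models: each "scores" an item by closeness to a prototype value.
models = {w: (lambda proto: lambda x: -abs(x - proto))(p)
          for w, p in [('JOHN', 1.0), ('BOOK', 5.0)]}
probs, guesses = recognize(models, [0.9, 5.2])
print(guesses)   # -> ['JOHN', 'BOOK']
```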
+<p><strong>Question 3:</strong> Summarize the error results from three combinations of features and model selectors. What was the "best" combination and why? What additional information might we use to improve our WER? For more insight on improving WER, take a look at the introduction to Part 4.</p>
+<p><strong>Answer 3:</strong></p>
+<table>
+<thead><tr>
+<th><strong>Model</strong></th>
+<th><strong>Features</strong></th>
+<th><strong>WER</strong></th>
+<th><strong>Correct</strong></th>
+<th><strong>Incorrect</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>SelectorCV</td>
+<td>features_polar</td>
+<td>0.5281</td>
+<td>84</td>
+<td>94</td>
+</tr>
+<tr>
+<td>SelectorDIC</td>
+<td>features_polar</td>
+<td>0.5337</td>
+<td>83</td>
+<td>95</td>
+</tr>
+<tr>
+<td>SelectorCV</td>
+<td>features_norm</td>
+<td>0.6067</td>
+<td>70</td>
+<td>108</td>
+</tr>
+</tbody>
+</table>
+<p>The best results were obtained using the <em>SelectorCV</em> model selector, which makes efficient use of the training data by folding it to simulate how the model behaves on test data. However, my preferred selector, DIC, doesn't lag far behind. <em>features_polar</em> produced the lowest WER (with both the <em>SelectorCV</em> and <em>SelectorDIC</em> selectors). The WER could be improved further by modeling the probability of one word following another using statistical language models (or other NLP techniques).</p>
+<h3 id="Recognizer-Unit-Tests">Recognizer Unit Tests<a class="anchor-link" href="#Recognizer-Unit-Tests">¶</a></h3><p>Run the following unit tests as a sanity check on the defined recognizer. The test simply looks for some valid values but is not exhaustive. However, the project should not be submitted if these tests don't pass.</p>
+<h2 id="PART-4:-(OPTIONAL)--Improve-the-WER-with-Language-Models">PART 4: (OPTIONAL) Improve the WER with Language Models<a class="anchor-link" href="#PART-4:-(OPTIONAL)--Improve-the-WER-with-Language-Models">¶</a></h2><p>We've squeezed just about as much as we can out of the model and still only get about 50% of the words right! Surely we can do better than that. Probability to the rescue again in the form of <a href="https://en.wikipedia.org/wiki/Language_model">statistical language models (SLM)</a>. The basic idea is that each word has some probability of occurrence within the set, and some probability that it is adjacent to specific other words. We can use that additional information to make better choices.</p>
+<h5 id="Additional-reading-and-resources">Additional reading and resources<a class="anchor-link" href="#Additional-reading-and-resources">¶</a></h5><ul>
+<li><a href="https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf">Introduction to N-grams (Stanford Jurafsky slides)</a></li>
+<li><a href="https://www-i6.informatik.rwth-aachen.de/publications/download/154/Dreuw--2007.pdf">Speech Recognition Techniques for a Sign Language Recognition System, Philippe Dreuw et al</a> – see the improved results of applying an LM to <em>this</em> data!</li>
+<li><a href="ftp://wasserstoff.informatik.rwth-aachen.de/pub/rwth-boston-104/lm/">SLM data for <em>this</em> ASL dataset</a></li>
+</ul>
+<h5 id="Optional-challenge">Optional challenge<a class="anchor-link" href="#Optional-challenge">¶</a></h5><p>The recognizer you implemented in Part 3 is equivalent to a "0-gram" SLM. Improve the WER with the SLM data provided with the data set in the link above using "1-gram", "2-gram", and/or "3-gram" statistics. The <code>probabilities</code> data you've already calculated will be useful and can be turned into a pandas DataFrame if desired (see next cell).<br>
+Good luck! Share your results with the class!</p>
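<p>One way the optional challenge can be approached is to rescore each candidate word by combining its HMM log likelihood with a bigram log probability conditioned on the previous guess. The sketch below is a hypothetical illustration: the weighting factor, the fallback probability for unseen pairs, and all names are assumptions, not part of the provided SLM data format.</p>

```python
import math

def rescore_with_bigram(hmm_logLs, prev_word, bigram_logp, alpha=1.0):
    """Pick argmax over words of logL(word) + alpha * log P(word | prev).

    bigram_logp maps (prev, word) -> log probability; alpha weights the
    language model relative to the acoustic (HMM) score.
    """
    def combined(word):
        lm = bigram_logp.get((prev_word, word), math.log(1e-6))  # unseen pair
        return hmm_logLs[word] + alpha * lm
    return max(hmm_logLs, key=combined)

# The HMM alone slightly prefers 'BUY', but after 'JOHN' the language
# model makes 'WRITE' the better overall choice.
hmm_logLs = {'WRITE': -52.0, 'BUY': -51.0}
bigram_logp = {('JOHN', 'WRITE'): math.log(0.20),
               ('JOHN', 'BUY'): math.log(0.01)}
print(rescore_with_bigram(hmm_logLs, 'JOHN', bigram_logp, alpha=1.0))
```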