Sign language synthesis is a useful tool for addressing many of the issues faced by deaf communities. Sign languages are as different from spoken languages as spoken languages are from one another; hence deaf people raised with sign language are not automatically proficient in written language. Existing methods of generating signing avatars are clunky and often unintuitive, so the ability to classify gestures common to sign language using only video data would simplify the process dramatically.

Methods of gesture classification require a way to compare time series and often (in particular, for k-means clustering) require a notion of "average" or "mean". However, computing the average of a collection of time series is difficult. Time series derive no meaning from the index of a particular frame; only the order of features, not their time index, confers meaning. Dynamic time warping (DTW) was developed as a similarity measure between time series, but it does not in itself provide a method of averaging. Recently, an averaging method consistent with dynamic time warping, called DTW Barycenter Averaging (DBA), was developed. DBA is based on minimizing the within-group sum of squares (WGSS) of the data and produces results suitable for classification and clustering of time series.
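As a rough illustration of how DTW and DBA fit together, here is a minimal pure-Python sketch for one-dimensional sequences, assuming a squared-difference local cost and the usual three-move (diagonal, vertical, horizontal) recursion. The names `dtw`, `wgss`, and `dba` are hypothetical, not taken from the thesis or any library. Each DBA iteration aligns every series to the current average, then replaces each frame of the average with the mean of all frames aligned to it.

```python
def dtw(a, b):
    """Return (cost, path) for the DTW alignment of sequences a and b.

    Local cost is the squared difference; path is a list of (i, j) index
    pairs from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cheapest cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end, preferring the diagonal move on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, move = min((D[i - 1][j - 1], 0), (D[i - 1][j], 1), (D[i][j - 1], 2))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n][m], path


def wgss(avg, series):
    """Within-group sum of squares: total DTW cost from avg to every series."""
    return sum(dtw(avg, s)[0] for s in series)


def dba(series, init, n_iter=10):
    """DBA sketch: refine init so it better averages series under DTW."""
    avg = [float(x) for x in init]
    for _ in range(n_iter):
        # Collect, for each frame of avg, every frame aligned to it.
        buckets = [[] for _ in avg]
        for s in series:
            _, path = dtw(avg, s)
            for i, j in path:
                buckets[i].append(s[j])
        # A DTW path visits every index of avg, so no bucket is empty.
        avg = [sum(b) / len(b) for b in buckets]
    return avg
```

In this sketch an iteration cannot increase `wgss`: for fixed alignments the mean minimizes the squared error, and re-aligning with DTW can only lower the cost further. The length of the average is fixed by `init`, which is one source of the initial-guess sensitivity mentioned below.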

Because dynamic time warping is time-scale invariant, the average is not unique: other warpings of an average may also be averages. We propose a modification to DBA that allows more flexibility in choosing the time scale of the resulting average. Time-penalized DBA (TBA) adds a cooling regularization term to the WGSS, making the problem well-posed. The regularization term penalizes the total amount of warping between the average and each time series in the collection; hence features in the average appear closer to the average time at which they occur across the collection. We cool (gradually decrease) the regularization term to prevent it from altering the solution in undesirable ways.
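One way to picture the time penalty is to add an index-dependent term to the DTW local cost. The exact penalty and cooling schedule used by TBA are not stated here, so this self-contained sketch assumes a hypothetical penalty `lam * (i/(n-1) - j/(m-1))**2` on the gap between normalized positions and a geometric schedule `lam *= cooling`; `penalized_dtw`, `tba`, `lam0`, and `cooling` are illustrative names only. Because the penalty depends only on indices, the averaging step for a fixed alignment is still the plain DBA mean.

```python
def penalized_dtw(avg, s, lam):
    """DTW whose local cost adds a hypothetical time penalty.

    Assumed penalty: lam * (i/(n-1) - j/(m-1))**2, the squared gap between
    normalized positions, so off-diagonal warping costs more as lam grows.
    """
    n, m = len(avg), len(s)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t_gap = (i - 1) / max(n - 1, 1) - (j - 1) / max(m - 1, 1)
            cost = (avg[i - 1] - s[j - 1]) ** 2 + lam * t_gap ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end, preferring the diagonal move on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, move = min((D[i - 1][j - 1], 0), (D[i - 1][j], 1), (D[i][j - 1], 2))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n][m], path


def tba(series, init, lam0=1.0, cooling=0.5, n_iter=10):
    """Time-penalized averaging sketch with a geometrically cooled penalty."""
    avg = [float(x) for x in init]
    lam = lam0
    for _ in range(n_iter):
        buckets = [[] for _ in avg]
        for s in series:
            _, path = penalized_dtw(avg, s, lam)
            for i, j in path:
                buckets[i].append(s[j])
        # Same mean update as DBA; only the alignment step is penalized.
        avg = [sum(b) / len(b) for b in buckets]
        lam *= cooling  # assumed geometric cooling schedule
    return avg
```

With `lam0 = 0` the sketch reduces to plain DBA. A large `lam0` keeps early alignments near the diagonal, and cooling lets later iterations relax toward an unpenalized DTW fit; `lam0` and `cooling` are the kind of extra parameters that complicate tuning.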

Time-penalized DBA is an effective method for averaging a collection both spatially and temporally, and it also reduces the algorithm's sensitivity to the initial guess. Unfortunately, the extra parameters it requires make it more complicated to use. We show that, for a selection of parameters, TBA compares favorably with classical DBA on both artificial signals and data captured from videos of American Sign Language signs.

Degree Name

Applied and Computational Mathematics (MS)

Department, Program, or Center

School of Mathematical Sciences (COS)


Nathan Cahill

Advisor/Committee Member

Matthew Coppenbarger

Advisor/Committee Member

Raluca Felea


RIT – Main Campus