|
Post by Will R (admin) on May 14, 2014 16:49:12 GMT
Consumer-grade camcorders - as far as my research shows - do not get above 60 frames per second. This is not enough to differentiate some phonetic characteristics, like labial-velars. My questions:
- What frame rate would be useful for diagnosing distinctions between phoneme sequences, co-articulations, voicing onset, etc.?
- What articulations could benefit from such video analysis? I'd like to develop a fairly robust list.
|
|
|
Post by y-cult on May 14, 2014 19:33:25 GMT
You have not defined consumer-grade...
|
|
|
Post by z-cult on May 14, 2014 21:59:51 GMT
|
|
|
Post by Will R (admin) on May 14, 2014 23:55:18 GMT
z-cult Guest, my research has been looking at a lot of Canon, Panasonic, and Sony spec sheets and web-pages, and calling a couple tech reps of the same. I haven't written much down, let alone written anything up, because the answers are nearly always the same: 24, 30, 60 fps. The companies themselves split their product lines into consumer and professional. But these facts are off-track for the question at hand. My question is, how many milliseconds measure the time between, for example, the labial and the velar articulations in a labial-velar phoneme? I've decided to stop asking manufacturers about their fps until I know better what kinds of timing a phoneticist is interested in.
|
|
|
Post by Mike on May 15, 2014 16:55:27 GMT
1) Maddieson's electromagnetic articulography study in 1993 seemed to show that the non-overlapping part of a labial-velar was about 10 msec (that is, the velar closure came about 10 msec before the labial one, and the labial closure persisted about 10 msec after the velar was released). Approximately, eyeballing his graph.
2) My own attempts to record the difference between Nkp and Nmkp foundered because I only had a 30 fps camera. It appeared that again, what I was trying to measure was a speech event of maybe 10-20 msec. A 30 fps camera has frames every 33 msec or so.
So, you may want to estimate otherwise on the basis of the above, but I'd need a speed that could capture the beginning, fairly accurate duration, and end of a 10 msec event. 10 msec between frames would be 100 fps, but double that may be required for more confidence.
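The arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function names are made up for this example, and the "two intervals per event" case is simply one reading of "double that".

```python
def frame_interval_ms(fps):
    """Time between the starts of consecutive frames, in milliseconds."""
    return 1000.0 / fps


def min_fps_for_event(event_ms, intervals_per_event=1):
    """Frame rate at which `intervals_per_event` frame intervals fit inside
    an event lasting `event_ms` milliseconds (hypothetical helper)."""
    return 1000.0 * intervals_per_event / event_ms


if __name__ == "__main__":
    # Typical camcorder rates mentioned in this thread.
    for fps in (24, 30, 60):
        print(f"{fps} fps: one frame every {frame_interval_ms(fps):.1f} ms")

    # A 10 ms event with one frame interval per event, then with two.
    print(min_fps_for_event(10))
    print(min_fps_for_event(10, intervals_per_event=2))
```

At 60 fps the interval is still about 16.7 ms, which is longer than the 10 ms event in question.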
|
|
|
Post by Will R (admin) on May 19, 2014 15:28:41 GMT
Mike, I remember when you were attempting the Nkp/Nmkp business, and your dissatisfaction with standard 'consumer grade' equipment. That experience is a direct motivator for my query. So if I understand what you've given me:
- the velar part of a labial-velar articulation closes at .000 sec
- the labial part of a labial-velar articulation closes at .010 sec
- the velar part of a labial-velar articulation releases at x sec
- the labial part of a labial-velar articulation releases at x + .010 sec
where x + .010 sec is the duration of the articulation.
Let's say Frame 1 captures the start of the articulation at .0001 seconds. To capture the start of the labial portion in a separate frame, we need the next frame capture to start at .0100 seconds. Then to capture the releases, we need separate frames at x seconds and x + .010 seconds.
So if it were just about distinguishing the closures, yes, 100 fps would be the minimum speed necessary. And even so if we are distinguishing the releases. But to look at the whole package - closures and releases, we need to know x, the duration of the articulation, especially if it is less than .010 seconds. Can you find a figure for x?
|
|
|
Post by Mike on May 20, 2014 19:17:41 GMT
X would be almost the total duration of the closure, which would be about 70-80 ms for a stop (kp or gb) between vowels. However, the velar release would not be visible, since the lips are still closed at that point. For a nasal, you should theoretically be able to tell from the spectrogram when the velar releases. (though I'm not very good at that...)
Having Nkp or Nmkp adds to the complexity.
|
|
|
Post by Will R (admin) on May 21, 2014 15:35:07 GMT
Thanks, Mike, very helpful. So to put all that in place:
- the velar part of a labial-velar articulation closes at .000 sec
- the labial part of a labial-velar articulation closes at .010 sec
- the velar part of a labial-velar articulation releases at .070 sec (at Mike's fastest time)
- the labial part of a labial-velar articulation releases at .080 sec (at Mike's fastest time)
Let's say Frame 1 captures the start of the articulation at .0001 seconds. To capture the start of the labial portion in a separate frame, we need the next frame capture to start at .0100 seconds. Then to capture the releases, we need separate frames starting at .0700 seconds and .0800 seconds.
As said earlier, 100 fps would be the minimum speed necessary for distinguishing closures and releases. If that rate carries through, then frame 1 captures the start of the articulation (velar closure), frame 2 captures the labial closure, frame 8 would capture any video evidence of the velar release (a question on that later), and frame 9 should capture the labial release.
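The frame bookkeeping can be checked with a small Python sketch. It assumes an idealized camera whose first frame starts exactly at the velar closure (t = 0) and whose frames follow back to back; the event times are the ones listed in this post, and the function names are invented for the illustration.

```python
import math

# Event times in seconds, taken from the list earlier in this post.
EVENTS_S = {
    "velar closure": 0.000,
    "labial closure": 0.010,
    "velar release": 0.070,
    "labial release": 0.080,
}


def frame_number(t_s, fps):
    """1-based number of the frame whose window contains time t_s,
    assuming frame 1 starts at t = 0 (an idealization)."""
    # Rounding guards against floating-point noise such as 0.07 * 100.
    return math.floor(round(t_s * fps, 9)) + 1


def frames_for(fps):
    """Map each event name to the frame it would land in at this rate."""
    return {name: frame_number(t, fps) for name, t in EVENTS_S.items()}


if __name__ == "__main__":
    for fps in (30, 60, 100):
        print(fps, "fps:", frames_for(fps))
```

Under these assumptions, 100 fps puts the four events in frames 1, 2, 8 and 9, while at 30 or 60 fps both closures share one frame and both releases share another.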
Next question: are there laryngeal/pharyngeal/uvular clues to the velar release that would show up on video, either from a front view or side view? Next query after that: since these are the theoretical values, part of what might be sought would be experimental proof of their validity. With that in mind, I would agree that 100 fps really is insufficient. We need to be able to benchmark timings at a faster rate. I suppose one could calculate mathematically how fine-grained the data must be for statistical validity. Any takers?
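As a rough first pass at that calculation, here is a Python sketch of the quantization error alone. It assumes an event is timed by the first frame in which it shows up, so each endpoint can be late by up to one frame interval and a measured duration can be off by up to one interval in either direction. It is not a full statistical treatment (no speaker variation, no repeated tokens), and the function names are made up.

```python
def duration_error_bound_ms(fps):
    """Worst-case error (ms) of a duration read off as the difference
    between two frame timestamps: up to one frame interval either way."""
    return 1000.0 / fps


def relative_error(interval_ms, fps):
    """Worst-case error as a fraction of the interval being measured."""
    return duration_error_bound_ms(fps) / interval_ms


def fps_for_error_bound(max_error_ms):
    """Frame rate needed to keep the worst-case error within max_error_ms."""
    return 1000.0 / max_error_ms


if __name__ == "__main__":
    # How well could each rate pin down a 10 ms closure offset?
    for fps in (100, 200, 1000):
        bound = duration_error_bound_ms(fps)
        rel = relative_error(10, fps)
        print(f"{fps} fps: +/- {bound:.1f} ms ({rel:.0%} of a 10 ms interval)")
```

On this reckoning, 100 fps could misjudge a 10 ms offset by as much as the offset itself, which fits the feeling that 100 fps is not enough for benchmarking.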
|
|
|
Post by z-cult on May 21, 2014 15:56:38 GMT
To call these cues, I think, implies that people are paying attention to them with some communicative impact. However, even if there is no communicative impact, there might be physiological responses to the articulatory gesture.
|
|
|
Post by Y-Cult on May 21, 2014 16:16:46 GMT
If 1/1000 sec (1 millisecond) is fast enough, then you might consider a Casio EX-FC100 ($170). Wikipedia [http://en.wikipedia.org/wiki/Casio_Exilim#High-speed_photography] says: Some cameras allow high-speed photography. The EX-FC100 and EX-FS10 allow taking short bursts of 30 pictures per second and shooting video up to 1000 frames per second, the EX-FH20 offers bursts of 40 pictures per second and 1000 frame/s video, and the EX-F1 offers bursts of 60 pictures per second and video of 1200 frame/s. However, the resolution of the video decreases drastically with increasing speed; in case of EX-F1, 300 frame/s are at 512x384 pixels, 600 frame/s at 432x192, and 1200 frame/s at 336x96. The burst shots are at full resolution.[3] The EX-FC100 records 480x360 at 210 frame/s, 224x168 at 420 frame/s, and 224x64 at 1,000 frame/s.[4] The Casio EX-FH25 is able to shoot at up to 1,000 frame/s at 224x64. The drop in resolution seems to be a common feature of how the CCD sensors are designed. That is not to say there aren't better sensors out there; see the discussion here: en.wikipedia.org/wiki/High-speed_photography.
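To weigh frame rate against resolution, the figures quoted above can be put in a small Python table and filtered by a minimum frame rate. The numbers are only the ones in the quote (not checked against Casio's own spec sheets), and ranking modes by pixel count is just one possible criterion chosen for this sketch.

```python
# (camera, frames per second, width, height), as quoted above.
MODES = [
    ("EX-F1", 300, 512, 384),
    ("EX-F1", 600, 432, 192),
    ("EX-F1", 1200, 336, 96),
    ("EX-FC100", 210, 480, 360),
    ("EX-FC100", 420, 224, 168),
    ("EX-FC100", 1000, 224, 64),
]


def best_mode(min_fps):
    """Return the quoted mode with the most pixels per frame among those
    running at min_fps or faster, or None if no quoted mode is fast enough."""
    candidates = [m for m in MODES if m[1] >= min_fps]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[2] * m[3])


if __name__ == "__main__":
    for need in (100, 200, 1000):
        print(f"at least {need} fps:", best_mode(need))
```

For the 100 to 200 fps range discussed earlier in the thread, the slowest high-speed modes already qualify, so the steep resolution loss at 1000 fps may not matter.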
|
|
|
Post by Coleen on May 21, 2014 18:55:46 GMT
Your question concerning laryngeal/pharyngeal/uvular clues to a *velar* release is interesting. I think you need to separate out uvular from laryngeal/pharyngeal as uvular is still a sound associated with the oral cavity, though. It would have similar "issues" as a velar one, namely whether it is possible to ascertain from visual evidence that a [+back] sound has been articulated. Though I don't know of any studies of the sort, perhaps it is possible to perceive muscular movement in the neck from a front or side view. I just checked in the mirror and I can clearly see movement of muscles of the underside of the tongue in the underpart of the chin for [k] that are not visible for because the tongue is in position for the vowel that follows. So, theoretically, if the camera were sensitive enough and the person's muscle movements visible, you'd be able to perceive closure and potentially release of a velar sound. But this is really not the pharynx or the larynx that is involved in that movement, per se. Mike, who knows a lot more than I do, about the articulation and acoustics of labial-velars might be able to add more about just how visible the articulation of egressive ones are. I'm thinking that ones that are more ingressive may involve a downward movement of the larynx that might be visible. What do you think, Mike? However, you'd have to be careful with laryngeal movements because you might be seeing the release into the vowel for languages with +ATR vowel harmony where the movement of the aryepiglottal folds (which produces a sphinctering effect at the larynx for -ATR vowel, retracting the tongue root and often raising the larynx and the inverse often being true for +ATR, a lowering of the larynx) This movement is independent of vowel height and certainly of any consonant. The movement of the larynx is visible especially in men with pronounced Adam's apples.
|
|
|
Post by Coleen on May 21, 2014 19:03:06 GMT
Gee, all kinds of errors in my post. 1. no "though" at the end of the first sentence. 2. "that are not visible for because..." for the third sentence. Second to last sentence should read "However, you'd have to be careful with laryngeal movements because you might be seeing the release into the vowel for languages with +ATR vowel harmony where the movement of the aryepiglottal folds produces a sphinctering effect at the larynx for -ATR vowels, retracting the tongue root and often raising the larynx. The inverse is often true for +ATR vowels: they lower the larynx."
|
|