Recently I was reading the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
and noticed that it mentions Stochastic "Hard" Attention vs. Deterministic "Soft" Attention.
The paper roughly explains what these mean,
but I still looked it up online.
Soft means differentiable.
For example, a sigmoid function is soft because it is differentiable.
A softmax function is named "soft" because it is differentiable, and it is what soft attention actually uses to compute attention probabilities over input stimuli such as an image or audio.
Soft also means that the function varies somewhat smoothly over most of its domain.
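Here is a minimal sketch of soft attention in PyTorch. The shapes, the single linear projection W_a, and the variable names are just illustrative placeholders, not the paper's exact attention MLP f_att:

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: L image regions, D-dim features, H-dim decoder state.
L, D, H = 196, 512, 256
features = torch.randn(L, D)   # annotation vectors from a CNN (placeholder)
h = torch.randn(H)             # current hidden state of the caption decoder

# Score each region against the hidden state with a small learned projection
# (a single matrix here stands in for a learned attention network).
W_a = torch.randn(D, H)
scores = features @ W_a @ h            # one scalar score per region, shape (L,)

# Softmax turns the scores into attention probabilities -- this is the "soft",
# differentiable part: gradients flow through every region's weight.
alpha = F.softmax(scores, dim=0)       # shape (L,), sums to 1

# The context vector is a weighted sum of all region features under alpha.
context = alpha @ features             # shape (D,)
print(context.shape)                   # torch.Size([512])
```

Because every region contributes through a smooth weighted sum, the whole thing can be trained end-to-end with ordinary backpropagation.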
Hard means non-differentiable.
For example, the Heaviside step function is hard because it only swings between on and off states: it has zero gradient everywhere except at the origin, where the derivative is essentially an impulse function.
Hard attention uses this switch-like mechanism to decide whether or not to attend to a region of the stimulus, such as an image or audio.
Hard also means the function changes abruptly over its domain, like the way the Heaviside step function has a jump discontinuity at the origin.
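And a rough sketch of the hard, switch-like version, again with placeholder names and sizes: instead of averaging over all regions, one region is sampled and used directly. The sampling step is the non-differentiable part, which is why the paper trains it with a REINFORCE-style estimator rather than plain backpropagation:

```python
import torch
import torch.nn.functional as F

L, D = 196, 512
features = torch.randn(L, D)                 # placeholder region features
scores = torch.randn(L)                      # stand-in for the attention scores

alpha = F.softmax(scores, dim=0)             # attention probabilities
idx = torch.multinomial(alpha, num_samples=1)  # stochastic, switch-like choice
context = features[idx.item()]               # attend to exactly one region, shape (D,)

# Gradients cannot flow through the discrete sampling of idx, so this choice
# has to be trained with a gradient estimator (e.g. REINFORCE), not autograd.
print(idx.item(), context.shape)
```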
So in short: soft means differentiable, hard means non-differentiable.
That's about it.