
I was recently reading Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

and noticed that the paper contrasts Stochastic "Hard" Attention vs. Deterministic "Soft" Attention.

The paper does roughly explain what the terms mean,

but I still looked them up online.

Soft means differentiable.

For example, a sigmoid function is soft because it is differentiable.
A softmax function is called soft because it is differentiable, and it is actually used in soft attention to compute attention probabilities over the input stimuli, such as an image or audio.
Soft also means that the function varies somewhat smoothly over most of its domain.
Hard means non-differentiable.

For example, the Heaviside step function is hard because it only swings between on and off states and has zero gradient everywhere except at the origin, where the derivative is essentially an impulse function.
Hard attention uses this switch-like mechanism to decide whether or not to attend to a region of the stimuli, such as an image or audio.
Hard also means the function has abrupt changes over its domain, like the way the Heaviside step function has a jump discontinuity at the origin.

https://www.quora.com/What-is-the-difference-between-soft-attention-and-hard-attention-in-neural-networks
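To make this concrete, here is a minimal PyTorch sketch I put together (a toy example of mine, not the paper's actual model): soft attention averages all regions with softmax weights and stays differentiable end to end, while hard attention picks a single region with a switch-like decision (argmax here for simplicity; the paper actually samples from the distribution and trains with REINFORCE), which breaks the gradient.

```python
import torch
import torch.nn.functional as F

# Toy setup: 4 image regions, each a 3-dim feature vector,
# plus a query vector that the attention scores depend on.
features = torch.randn(4, 3)                 # one annotation vector per region
query = torch.randn(3, requires_grad=True)

scores = features @ query                    # unnormalized attention scores, shape (4,)

# Soft attention: softmax turns the scores into a probability distribution
# over regions, and the context vector is the weighted average of all regions.
# Every step is differentiable, so gradients flow back into `query`.
alpha = F.softmax(scores, dim=0)
soft_context = alpha @ features
soft_context.sum().backward()
print("gradient through soft attention:", query.grad)

# Hard attention: pick exactly one region with a switch-like decision.
# argmax is not differentiable, so no gradient reaches `query` this way.
idx = torch.argmax(scores)
hard_context = features[idx]
print("hard attention picked region", idx.item())
```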

So Soft means differentiable and Hard means non-differentiable.

That's roughly it.

