How to Compute the Derivative of a Sigmoid Function (fully worked example)

This is a sigmoid function:

\boldsymbol{ s(x) = \frac{1}{1 + e^{-x}} }  

The sigmoid function looks like this (made with a bit of MATLAB code):

x=-10:0.1:10;  % Test values.
s = 1./(1+exp(-x));  % Sigmoid.
figure; plot(x,s); title('sigmoid');  % Plot the sigmoid curve.

[Figure: plot of the sigmoid function]

Alright, now let’s put on our calculus hats…

Here’s how you compute the derivative of a sigmoid function

Rewrite the original equation with a negative exponent so we can use the power rule together with the chain rule (I always forget the quotient rule).

\boldsymbol{s(x) = \frac{1}{1+e^{-x}} = (1)(1+e^{-x})^{-1} = (1+e^{-x})^{-1}}  

Now we take the derivative:

\frac{d}{dx}s(x) = \frac{d}{dx}((1+e^{-x})^{-1})   

\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-1-1)}) \frac{d}{dx}(1+ e^{-x})   

\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (\frac{d}{dx}(1) + \frac{d}{dx}(e^{-x}))  

\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (0 + e^{-x}(\frac{d}{dx}(-x)))  

\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (e^{-x})(-1)  
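For reference, the steps above follow one general pattern: the power rule applied to an inner function, multiplied by the derivative of that inner function (the chain rule). The symbols u(x) and n are introduced here only for illustration and are not part of the original notation:

```latex
\frac{d}{dx}\,u(x)^{n} = n\,u(x)^{n-1}\,\frac{d}{dx}u(x),
\qquad u(x) = 1 + e^{-x}, \quad n = -1, \quad \frac{d}{dx}u(x) = -e^{-x}
```

Plugging these three pieces into the pattern reproduces the last line of the derivation term by term.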

Nice! We computed the derivative of a sigmoid! Okay, let’s simplify a bit.

\frac{d}{dx}s(x) = ((1+e^{-x})^{(-2)}) (e^{-x})  

\frac{d}{dx}s(x) = \frac{1}{(1+e^{-x})^{2}} (e^{-x})  

\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}   

Okay! That looks pretty good to me. Let’s quickly plot it and see if it looks reasonable. Again here’s some MATLAB code to check:

x=-10:0.1:10;  % Test values.
s = 1./(1+exp(-x));  % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
figure; plot(x,s,'b*'); hold on; plot(x,ds,'r+'); legend('sigmoid', 'derivative-sigmoid','location','best')

[Figure: sigmoid and its derivative plotted together]
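A plot is only a visual check. A slightly stronger one, sketched here as an optional extra, compares the formula against a central finite difference of the sigmoid itself. The names sig, h, ds_num and max_err are made up for this sketch, and the step size h = 1e-5 is just a reasonable assumption, not a tuned value.

```matlab
% Numerical check of the analytic derivative using a central difference.
% sig, h, ds_num and max_err are illustrative names, not from the post.
x = -10:0.1:10;                          % Test values.
h = 1e-5;                                % Assumed small step for the difference.
sig = @(t) 1./(1+exp(-t));               % Sigmoid as an anonymous function.
ds = exp(-x)./((1+exp(-x)).^2);          % Analytic derivative from the derivation.
ds_num = (sig(x+h) - sig(x-h))./(2*h);   % Central-difference approximation.
max_err = max(abs(ds - ds_num));         % Largest disagreement over the grid.
fprintf('max abs difference: %g\n', max_err);
```

If the derivation is right, the printed difference should be tiny compared with the derivative's peak value of 0.25 at x = 0.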

Looks like a derivative. Good! But wait… there’s more!

If you’ve been reading some of the neural net literature, you’ve probably come across text that says the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).

[note that \frac{d}{dx}s(x) and s'(x) are the same thing, just different notation.]

[also note that Andrew Ng writes f'(z) = f(z)(1 - f(z)), where f(z) is the sigmoid function, which is exactly what we are doing here.]

So your next question should be: is the derivative we calculated earlier equivalent to s'(x) = s(x)(1-s(x))?

So, using Andrew Ng’s notation…

How does the derivative of a sigmoid f(z) equal f(z)(1-f(z))?

Swapping with our notation, we can ask the equivalent question:

How does the derivative of a sigmoid s(x) equal s(x)(1-s(x))?

Okay, we left off with…

\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}   

This part is not intuitive… but let’s add and subtract 1 in the numerator (the two cancel, so the value of the fraction does not change).

\frac{d}{dx}s(x) = \frac{(e^{-x} + 1 -1)}{(1+e^{-x})^{2}}   

\frac{d}{dx}s(x) = \frac{(1 + e^{-x} -1)}{(1+e^{-x})^{2}}   

\frac{d}{dx}s(x) = \frac{(1 + e^{-x})}{(1+e^{-x})^{2}} - \frac{1}{(1+e^{-x})^{2}}   

= \frac{1}{(1+e^{-x})} - \frac{1}{(1+e^{-x})^{2}}  

= \frac{1}{(1+e^{-x})} - (\frac{1}{(1+e^{-x})}) (\frac{1}{(1+e^{-x})})  

Now factor out a \frac{1}{(1+e^{-x})}:

= \frac{1}{(1+e^{-x})} (1 - \frac{1}{(1+e^{-x})})  

Hmmm… look at that! There are actually two sigmoid functions in there… Recall that the sigmoid function is s(x) = \frac{1}{1 + e^{-x}}  . Let’s replace them with s(x).

s'(x) = \frac{d}{dx}s(x) = s(x) (1 - s(x))  

Just like Prof Ng said… 🙂
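If the add-and-subtract trick feels like it came out of nowhere, the same identity can also be checked in the other direction. Starting from s(x)(1-s(x)) and expanding, using only the definition of s(x), gives:

```latex
1 - s(x) = 1 - \frac{1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}}

s(x)\,(1 - s(x)) = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \frac{e^{-x}}{(1+e^{-x})^{2}}
```

The right-hand side is the derivative we computed directly, so the two forms agree.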

And for a sanity check, do they both show the same function?

x=-10:0.1:10;  % Test values.
s = 1./(1+exp(-x));  % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
ds1 = s.*(1-s); % Another simpler way to compute the derivative of a sigmoid.
figure; plot(x,ds,'r+'); hold on; plot(x,ds1, 'go'); legend('(e^{-x})/((1+e^{-x})^2)','(s(x))(1-s(x))','location','best'); title('derivative of sigmoid')

[Figure: derivative of sigmoid computed two ways, overlaid]

Yes! They perfectly match!
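To put a number on "match", here is a small sketch that measures the largest gap between the two formulas instead of relying on the plot. It also tries one extreme input, x = -1000, chosen arbitrarily for illustration. The names max_diff, x_big, ds_big, s_big and ds1_big are hypothetical.

```matlab
% Compare the two derivative formulas numerically.
% max_diff, x_big, ds_big, s_big and ds1_big are illustrative names.
x = -10:0.1:10;                       % Test values.
s = 1./(1+exp(-x));                   % Sigmoid.
ds = exp(-x)./((1+exp(-x)).^2);       % Direct formula.
ds1 = s.*(1-s);                       % s(x)(1-s(x)) form.
max_diff = max(abs(ds - ds1));        % Largest gap over the grid.
fprintf('max abs difference: %g\n', max_diff);

% One extreme input (an arbitrary choice) where exp(-x) overflows to Inf.
x_big = -1000;
ds_big = exp(-x_big)./((1+exp(-x_big)).^2);  % Inf/Inf evaluates to NaN.
s_big = 1./(1+exp(-x_big));                  % 1/Inf evaluates to 0.
ds1_big = s_big.*(1-s_big);                  % 0*(1-0) evaluates to 0.
```

On the plotted range the gap should be at the level of floating-point rounding. At the extreme input, the direct formula breaks down in double precision while the s(x)(1-s(x)) form still returns 0, which is one practical reason to prefer the simpler form in code.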

So there you go. Hopefully this satisfies your mathematical curiosity about why the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).
