How to Compute the Derivative of a Sigmoid Function (fully worked example)

This is a sigmoid function:

\boldsymbol{s(x) = \frac{1}{1 + e^{-x}}}  
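A few values that follow directly from the definition and help when reading the sigmoid plot:

```latex
s(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}, \qquad
\lim_{x \to \infty} s(x) = 1, \qquad
\lim_{x \to -\infty} s(x) = 0
```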

The sigmoid function looks like this (made with a bit of MATLAB code):

x=-10:0.1:10;          % Test values.
s = 1./(1+exp(-x));    % Sigmoid.
figure; plot(x,s); title('sigmoid');

[Figure: plot titled "sigmoid", showing s(x) for x from -10 to 10]

Alright, now let’s put on our calculus hats…

Here’s how you compute the derivative of a sigmoid function

First, let’s rewrite the original equation to make it easier to work with.

\boldsymbol{s(x) = \frac{1}{1+e^{-x}} = (1)(1+e^{-x})^{-1} = (1+e^{-x})^{-1}}  

Now we take the derivative:

\boldsymbol{\frac{d}{dx}s(x) = \frac{d}{dx}((1+e^{-x})^{-1})}   

\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-1-1)}) \frac{d}{dx}(1+ e^{-x})}   
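The step above applies the power rule to the outer function and the chain rule to the inner one. Writing u for the inner expression (u is a name introduced here only for illustration), the pattern is:

```latex
\frac{d}{dx}\left(u^{-1}\right) = -1 \cdot u^{-2} \cdot \frac{du}{dx},
\qquad u = 1 + e^{-x}
```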

\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (\frac{d}{dx}(1) + \frac{d}{dx}(e^{-x}))}   

\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (0 + e^{-x}(\frac{d}{dx}(-x)))}   

\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (e^{-x})(-1)}   

Nice! We computed the derivative of a sigmoid! Okay, let’s simplify a bit.

\frac{d}{dx}s(x) = ((1+e^{-x})^{(-2)}) (e^{-x})  

\frac{d}{dx}s(x) = \frac{1}{(1+e^{-x})^{2}} (e^{-x})  

\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}   
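Tools such as Wolfram Alpha may report this derivative with positive exponents instead. The two forms describe the same function: multiplying the numerator and denominator by e^{2x} gives

```latex
\frac{e^{-x}}{(1+e^{-x})^{2}} \cdot \frac{e^{2x}}{e^{2x}}
= \frac{e^{x}}{\left(e^{x}(1+e^{-x})\right)^{2}}
= \frac{e^{x}}{(e^{x}+1)^{2}}
```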

Okay! That looks pretty good to me. Let’s quickly plot it and see if it looks reasonable. Again, here’s some MATLAB code to check:

x=-10:0.1:10;  % Test values.
s = 1./(1+exp(-x));  % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
figure; plot(x,s,'b*'); hold on; plot(x,ds,'r+'); legend('sigmoid', 'derivative-sigmoid','location','best')

[Figure: the sigmoid (blue *) and its derivative (red +) for x from -10 to 10]

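That looks like the right derivative: the red curve peaks at x = 0, where the sigmoid is steepest (the peak value is e^{0}/(1+e^{0})^{2} = 1/4), and falls toward zero in both tails, where the sigmoid flattens out. Good! But wait… there’s more!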
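Eyeballing the plot is a good first step; a slightly stronger check is to compare the formula with a central finite difference of the sigmoid. This is a minimal sketch: the names sig, h, ds_fd and max_err are introduced here for illustration, and the step size h = 1e-5 is an assumed value that works well in double precision, not something from the original derivation.

```matlab
x = -10:0.1:10;                      % Test values.
sig = @(t) 1./(1+exp(-t));           % Sigmoid as an anonymous function (hypothetical name).
ds = exp(-x)./((1+exp(-x)).^2);      % Analytic derivative derived above.
h = 1e-5;                            % Finite-difference step size (assumed).
ds_fd = (sig(x+h) - sig(x-h))./(2*h);% Central-difference approximation of s'(x).
max_err = max(abs(ds - ds_fd));      % Largest disagreement over the test grid.
fprintf('max |analytic - finite difference| = %g\n', max_err);
```

The central difference has an error proportional to h^2 plus a rounding term, so the printed gap should be tiny compared with the peak derivative value of 1/4.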

If you’ve been reading some of the neural net literature, you’ve probably come across text that says the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).

[note that \frac{d}{dx}s(x) and s'(x) are the same thing, just different notation.]

[also note that Andrew Ng writes f'(z) = f(z)(1 - f(z)), where f(z) is the sigmoid function, which is exactly what we are doing here.]

So the natural next question is: is the derivative we calculated earlier equivalent to s'(x) = s(x)(1-s(x))?

So, using Andrew Ng’s notation…

How does the derivative of a sigmoid f(z) equal f(z)(1-f(z))?

Swapping with our notation, we can ask the equivalent question:

How does the derivative of a sigmoid s(x) equal s(x)(1-s(x))?

Okay, we left off with…

\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}   

This part is not intuitive… but let’s add and subtract 1 in the numerator (this does not change the value of the expression, since we are adding zero).
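If the trick feels unmotivated, it may help to work backward from the expression we are hoping to reach. Expanding s(x)(1 - s(x)) and putting it over a common denominator gives

```latex
s(x)\bigl(1 - s(x)\bigr)
= \frac{1}{1+e^{-x}} - \frac{1}{(1+e^{-x})^{2}}
= \frac{(1+e^{-x}) - 1}{(1+e^{-x})^{2}}
```

so the numerator we want is (1 + e^{-x}) - 1, which is exactly e^{-x} with a 1 added and subtracted.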

\frac{d}{dx}s(x) = \frac{(e^{-x} + 1 -1)}{(1+e^{-x})^{2}}   

\frac{d}{dx}s(x) = \frac{(1 + e^{-x} -1)}{(1+e^{-x})^{2}}   

\frac{d}{dx}s(x) = \frac{(1 + e^{-x})}{(1+e^{-x})^{2}} - \frac{1}{(1+e^{-x})^{2}}   

= \frac{1}{(1+e^{-x})} - \frac{1}{(1+e^{-x})^{2}}  

= \frac{1}{(1+e^{-x})} - (\frac{1}{(1+e^{-x})}) (\frac{1}{(1+e^{-x})})   // write the square as a product so we can factor out a \frac{1}{(1+e^{-x})}  

= \frac{1}{(1+e^{-x})} (1 - \frac{1}{(1+e^{-x})})  

Hmmm… look at that! There are actually two sigmoid functions there… Recall that the sigmoid function is s(x) = \frac{1}{1 + e^{-x}}. Let’s replace both of them with s(x).

s'(x) = \frac{d}{dx}s(x) = s(x) (1 - s(x))  
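A slightly shorter route to the same result, starting again from the derivative we computed, is to split the fraction into two factors and notice that the second factor equals 1 - s(x):

```latex
\frac{e^{-x}}{(1+e^{-x})^{2}}
= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}},
\qquad
\frac{e^{-x}}{1+e^{-x}} = \frac{(1+e^{-x}) - 1}{1+e^{-x}} = 1 - s(x)
```

Multiplying the two factors gives s'(x) = s(x)(1 - s(x)) once more.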

Just like Prof Ng said… 🙂

And as a sanity check, do both expressions produce the same curve?

x=-10:0.1:10;  % Test values.
s = 1./(1+exp(-x));  % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
ds1 = s.*(1-s); % Another simpler way to compute the derivative of a sigmoid.
figure; plot(x,ds,'r+'); hold on; plot(x,ds1, 'go'); legend('(e^{-x})/((1+e^{-x})^2)','(s(x))(1-s(x))','location','best'); title('derivative of sigmoid')

[Figure: derivative of sigmoid, computed as e^{-x}/(1+e^{-x})^2 (red +) and as s(x)(1-s(x)) (green o)]

Yes! The two curves lie right on top of each other.
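The plot only shows agreement to the eye. Here is a minimal sketch that measures the gap between the two forms, and also probes one large negative input. The value x = -800 is an arbitrary choice made here because exp(800) overflows to Inf in double precision; the names gap, xb, sb, ds_big and ds1_big are hypothetical.

```matlab
x = -10:0.1:10;                        % Same test values as above.
s = 1./(1+exp(-x));                    % Sigmoid.
ds = exp(-x)./((1+exp(-x)).^2);        % Derivative, first form.
ds1 = s.*(1-s);                        % Derivative, s(x)(1 - s(x)) form.
gap = max(abs(ds - ds1));              % Differences are at rounding-error level.
fprintf('max |ds - ds1| = %g\n', gap);

xb = -800;                             % exp(-xb) = exp(800) overflows to Inf.
sb = 1./(1+exp(-xb));                  % 1/(1+Inf) evaluates to 0.
ds_big = exp(-xb)./((1+exp(-xb)).^2);  % Inf/Inf evaluates to NaN.
ds1_big = sb.*(1-sb);                  % 0*(1-0) evaluates to 0.
fprintf('x = -800: first form %g, s(1-s) form %g\n', ds_big, ds1_big);
```

In exact arithmetic the two forms are identical. In floating point, the s(x)(1 - s(x)) form avoids the Inf/Inf at large negative x, and it is convenient in practice because it can reuse an already computed s(x).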

So there you go. Hopefully this satisfies your mathematical curiosity about why the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).

