How to Compute the Derivative of a Sigmoid Function (fully worked example)

Last updated on October 2nd, 2017

This is a sigmoid function:

$\boldsymbol{s(x) = \frac{1}{1 + e^{-x}}}$

The sigmoid function looks like this (made with a bit of MATLAB code):

x=-10:0.1:10; s = 1./(1+exp(-x)); figure; plot(x,s); title('sigmoid');

Alright, now let’s put on our calculus hats…

Here’s how you compute the derivative of a sigmoid function

First, let’s rewrite the original equation to make it easier to work with.

$\boldsymbol{s(x) = \frac{1}{1+e^{-x}} = (1)(1+e^{-x})^{-1} = (1+e^{-x})^{-1}}$

Now we take the derivative:

$\boldsymbol{\frac{d}{dx}s(x) = \frac{d}{dx}((1+e^{-x})^{-1})}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-1-1)}) \frac{d}{dx}(1+ e^{-x})}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (\frac{d}{dx}(1) + \frac{d}{dx}(e^{-x}))}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (0 + e^{-x}(\frac{d}{dx}(-x)))}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (e^{-x})(-1)}$


Nice! We computed the derivative of a sigmoid! Okay, let’s simplify a bit.

$\frac{d}{dx}s(x) = ((1+e^{-x})^{(-2)}) (e^{-x})$

$\frac{d}{dx}s(x) = \frac{1}{(1+e^{-x})^{2}} (e^{-x})$

$\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}$
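A reader in the comments asks how tools such as Wolfram Alpha get a form with positive exponents. The two forms are equal: multiply the numerator and the denominator by $e^{2x}$, written as $(e^{x})^{2}$ in the denominator.

$\frac{(e^{-x})}{(1+e^{-x})^{2}} = \frac{(e^{-x})(e^{2x})}{(1+e^{-x})^{2}(e^{x})^{2}}$

$= \frac{e^{x}}{(e^{x}(1+e^{-x}))^{2}}$

$= \frac{e^{x}}{(e^{x}+1)^{2}}$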


Okay! That looks pretty good to me. Let’s quickly plot it and see if it looks reasonable. Again, here’s some MATLAB code to check:

x=-10:0.1:10; % Test values.
s = 1./(1+exp(-x)); % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
figure;
plot(x,s,'b*');
hold on;
plot(x,ds,'r+');
legend('sigmoid', 'derivative-sigmoid','location','best')
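A plot only shows that the curve has a plausible shape. For a slightly stronger check, here is a minimal sketch that compares the formula against a central finite difference of the sigmoid. The step size `h` and the names `sig`, `ds_num` and `max_err` are my own choices for illustration, not part of the derivation.

```matlab
x = -10:0.1:10;                         % Test values.
sig = @(t) 1./(1+exp(-t));              % Sigmoid as an anonymous function.
ds = exp(-x)./((1+exp(-x)).^2);         % Derivative from the formula we derived.
h = 1e-5;                               % Finite-difference step (assumed value).
ds_num = (sig(x+h) - sig(x-h))./(2*h);  % Central-difference estimate of the slope.
max_err = max(abs(ds - ds_num));        % Largest gap between the two.
fprintf('largest gap: %g\n', max_err);
```

If the derivation is right, the printed gap should be tiny compared with the peak value of the derivative, which is 0.25 at x = 0.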

Looks like a derivative. Good! But wait… there’s more!

If you’ve been reading some of the neural net literature, you’ve probably come across text that says the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).

[note that $\frac{d}{dx}s(x)$ and s'(x) are the same thing, just different notation.]

[also note that Andrew Ng writes, f'(z) = f(z)(1 – f(z)), where f(z) is the sigmoid function, which is the exact same thing that we are doing here.]

So your next question should be: is the derivative we calculated earlier equivalent to s'(x) = s(x)(1-s(x))?

So, using Andrew Ng’s notation…

How does the derivative of a sigmoid f(z) equal f(z)(1-f(z))?

Swapping with our notation, we can ask the equivalent question:

How does the derivative of a sigmoid s(x) equal s(x)(1-s(x))?

Okay we left off with…

$\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}$


This part is not intuitive… but let’s add and subtract a 1 in the numerator (this does not change the value of the expression).

$\frac{d}{dx}s(x) = \frac{(e^{-x} + 1 -1)}{(1+e^{-x})^{2}}$

$\frac{d}{dx}s(x) = \frac{(1 + e^{-x} -1)}{(1+e^{-x})^{2}}$

$\frac{d}{dx}s(x) = \frac{(1 + e^{-x})}{(1+e^{-x})^{2}} - \frac{1}{(1+e^{-x})^{2}}$

$= \frac{1}{(1+e^{-x})} - \frac{1}{(1+e^{-x})^{2}}$

$= \frac{1}{(1+e^{-x})} - (\frac{1}{(1+e^{-x})}) (\frac{1}{(1+e^{-x})})$ // split the square into a product so we can factor out a $\frac{1}{(1+e^{-x})}$

$= \frac{1}{(1+e^{-x})} (1 - \frac{1}{(1+e^{-x})})$


Hmmm…. look at that! There are actually two sigmoid functions there… Recall that the sigmoid function is $s(x) = \frac{1}{1 + e^{-x}}$. Let’s replace them with s(x).

$s'(x) = \frac{d}{dx}s(x) = s(x) (1 - s(x))$


Just like Prof Ng said… 🙂
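The identity can also be checked in the other direction, without the add-and-subtract trick. Start from $1 - s(x)$ and put it over a common denominator:

$1 - s(x) = \frac{(1+e^{-x})}{(1+e^{-x})} - \frac{1}{(1+e^{-x})} = \frac{(e^{-x})}{(1+e^{-x})}$

Multiplying by s(x) then gives back the derivative we computed earlier:

$s(x)(1 - s(x)) = \frac{1}{(1+e^{-x})} \cdot \frac{(e^{-x})}{(1+e^{-x})} = \frac{(e^{-x})}{(1+e^{-x})^{2}}$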

And for a sanity check, do they both show the same function?

x=-10:0.1:10; % Test values.
s = 1./(1+exp(-x)); % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
ds1 = s.*(1-s); % Another simpler way to compute the derivative of a sigmoid.
figure;
plot(x,ds,'r+');
hold on;
plot(x,ds1, 'go');
legend('(e^{-x})/((1+e^{-x})^2)','(s(x))(1-s(x))','location','best');
title('derivative of sigmoid')

Yes! They perfectly match!
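To put a number on "match", here is a small sketch that prints the largest absolute difference between the two expressions on the same test values. The variable name `max_diff` is mine. Since the two forms are algebraically identical, any difference should come only from floating-point rounding.

```matlab
x = -10:0.1:10;                     % Test values.
s = 1./(1+exp(-x));                 % Sigmoid.
ds = exp(-x)./((1+exp(-x)).^2);     % Derivative, first form.
ds1 = s.*(1-s);                     % Derivative, s(x)(1-s(x)) form.
max_diff = max(abs(ds - ds1));      % Largest disagreement over all test values.
fprintf('largest absolute difference: %g\n', max_diff);
```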

So there you go. Hopefully this satisfies your mathematical curiosity about why the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).
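One practical reason this form is handy: if you have already computed the sigmoid output, the derivative costs just one subtraction and one multiplication, with no extra call to exp. Below is a minimal sketch for a single sigmoid unit. The setup (`z = w*x + b`) and all the numbers are made up for illustration and are not taken from any particular course or library.

```matlab
% Hypothetical single sigmoid unit: a = s(z), with z = w*x + b.
x = 0.5; w = 2.0; b = -1.0;   % Made-up input, weight and bias.
z = w*x + b;                  % Pre-activation, 0 for these values.
a = 1/(1+exp(-z));            % Sigmoid output, 0.5 for these values.
da_dz = a*(1-a);              % Derivative of the sigmoid, reusing a.
da_dw = da_dz*x;              % Chain rule: da/dw = s'(z) * dz/dw, and dz/dw = x.
fprintf('da/dz = %g, da/dw = %g\n', da_dz, da_dw);
```

With these values the sketch gives da/dz = 0.25 and da/dw = 0.125.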

23 thoughts on “How to Compute the Derivative of a Sigmoid Function (fully worked example)”

1. Jeremy says:

I think if you 1) rewrite my equation so the e^-x in the numerator goes to e^x in the denominator, 2) multiply my equation by e^x/e^x, and 3) expand the denominator in both the wolfram and my equation, they should be equal.

1. Sefrin says:

How come websites like wolfram alpha simplify the -x in the exponents to positive x? How do we get there? I don’t see it.. :-/

1. Jeremy says:

Hi Sefrin, could you include an example/link to explain what you mean?

1. Jeremy says:

Thanks Vinay! You created a nice visual summary of different activation functions.

2. Scott Favorite says:

Actually you do use the product rule but it is part of the chain rule. Hope this is clear.

1. Jeremy says:

Hi Scott, thanks for your comment! I agree this is confusing/misleading. I re-wrote to remove the reference to the product rule.

3. Sid says:

Excellent walkthrough. For a guy just getting into activation fn’s, this really helps! Thanks so much!

1. Jeremy says:

You’re welcome Sid!

4. EASILY, the best blog post on finding the derivative of a sigmoid function. You didn’t leave any details out. Took me forever to wrap my head around this. The +1 – 1 thing is definitely not intuitive. Thanks for writing this.

1. Jeremy says:

happy to hear it helped!

5. Thanks! really helped with Prof. Hinton’s NNML Coursera lecture I was struggling to understand.

1. Jeremy says:

Glad it helped! It wasn’t obvious to me either 🙂

6. Balamurugan Balakrishnan says:

Superb!

7. Shubham Juneja says:

Very detailed. Thank you !!

1. Jeremy says:

You’re welcome!

1. Jeremy says:

Glad it helped clear things up!

8. Kevin Wang says:

excellent. Thanks!