Website headaches



Blessed
08-15-2014, 07:53 PM
So the server host for JBF Corporate (a franchise I work with on an individual-franchisee basis) went bankrupt and literally walked out on all their clients. As a result everything went down - email, online tagging system, websites... and we've been dealing with it ALL DAY LONG. It actually started yesterday afternoon at 3pm.

The biggest event I work with starts on Monday - this has not been fun!

My question though - after the tech guys finished their work on the problem and everything should be working this is what the CEO said:
"Here is the most recent update: It is my understanding that now that the DNS server is back up and running that it takes time to cycle through all the website providers and browsers. Providers do not provide an exact timeline for us to share with everyone, unfortunately. While you may not be seeing our website, your next-door neighbor on a different service may be seeing the website just fine. We have made efforts to find a way to expedite this update process for all providers, but found there is no way to control the timing of these updates. As you can see in other posts, more and more users are seeing their websites from their own computers. All the websites should be UP, it is just that your internet provider may not have updated their DNS process just yet. Not sure if this info is helpful, but might explain some of the inconsistencies in our recovery."

That was hours ago and I still have nothing - no website, no email, no nothing. I've restarted my modem/router and my computer, and cleared my internet history, cache and cookies... Is there anything else I can do, or am I stuck until my internet provider does whatever little magic thing the "DNS Process" is?

Blessed
08-15-2014, 08:24 PM
OK... everything finally seems to be back up and running! I was out of commission for 28 hours.

What we did to mitigate the damage during the crisis.
Created a Gmail email address our clients could email while our email was down
Posted on Instagram, Facebook and Twitter and kept up the conversation on Facebook, our biggest source of social media contact with our customers at this point.

What we will do as soon as we get final clean-up info from corporate - is send an email out telling our clients what happened and why - within reason... I'm sure corporate will give us a canned statement that will cover the basics and let people know that we are working towards making sure it doesn't happen again!

I'd still like to understand what the magic DNS Process is though...

Harold Mansfield
08-15-2014, 08:41 PM
My question though - after the tech guys finished their work on the problem and everything should be working this is what the CEO said:
"Here is the most recent update: It is my understanding that now that the DNS server is back up and running that it takes time to cycle through all the website providers and browsers. Providers do not provide an exact timeline for us to share with everyone, unfortunately. While you may not be seeing our website, your next-door neighbor on a different service may be seeing the website just fine. We have made efforts to find a way to expedite this update process for all providers, but found there is no way to control the timing of these updates. As you can see in other posts, more and more users are seeing their websites from their own computers. All the websites should be UP, it is just that your internet provider may not have updated their DNS process just yet. Not sure if this info is helpful, but might explain some of the inconsistencies in our recovery."

that was hours ago and I still have nothing - no website, no email, no nothing. I've restarted my modem/router, my computer, cleared my internet history, cache and cookies... anything else I can do or am I stuck until my internet provider does whatever little magic thing the "DNS Process" is?


I'd still like to understand what the magic DNS Process is though...

Sounds like everything is back up, but FYI...yeah, DNS propagation has nothing to do with you or your computer. Think of it like dominoes making connections and falling across all of the DNS servers all over the world. While it's doing its thing, someone in MA may be able to pull up the site in question, while someone in NY still can't pull it up. Or I may see it through my ISP, but my neighbor can't through his.

You just have to wait for it to make its rounds and update everywhere.

It takes time. It used to be 24-48 hours. Now it's like 2-12 hours.

There are at least 5 people on here who deal with servers all of the time and can probably get far more detailed about it than I can.
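For anyone who wants to see the dominoes in action, here is a rough Python sketch of why two neighbors on different ISPs can see different things for a few hours. Everything in it is made up for illustration: the `CachingResolver` class, the two ISP names, the addresses (reserved documentation ranges) and the times when each ISP last cached the old record. Real resolvers are more involved than this.

```python
from dataclasses import dataclass


@dataclass
class CachingResolver:
    """A toy ISP resolver that keeps an answer until its TTL runs out."""
    name: str
    cached_ip: str
    expires_at: int  # seconds on a simulated clock

    def lookup(self, now: int, authoritative_ip: str, ttl: int) -> str:
        # Only go back to the authoritative server once the cached copy expired.
        if now >= self.expires_at:
            self.cached_ip = authoritative_ip
            self.expires_at = now + ttl
        return self.cached_ip


OLD_IP, NEW_IP = "192.0.2.10", "198.51.100.20"  # documentation-only addresses
TTL = 4 * 3600  # the 4 hour TTL mentioned in this thread, in seconds

# Hypothetical starting point: both ISPs cached the old address at
# different moments before the site moved, so they expire at different times.
isp_a = CachingResolver("ISP A", OLD_IP, expires_at=1 * 3600)
isp_b = CachingResolver("ISP B", OLD_IP, expires_at=3 * 3600)

for hour in (0, 2, 4):
    now = hour * 3600
    seen = {r.name: r.lookup(now, NEW_IP, TTL) for r in (isp_a, isp_b)}
    print(f"hour {hour}: {seen}")
```

In this toy run, the customer on ISP A reaches the new server a couple of hours before the customer on ISP B, even though nothing changed on either customer's computer.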

Blessed
08-15-2014, 11:52 PM
Ah... Dominoes! I can understand that :D Thanks Harold!

We're mostly up but our email is really sporadic. It's up, then it's down, then 10 emails come in, then you get kicked out again... I'm guessing there will be some emails that have been lost forever!

This is the most recent update from the tech guy - it makes absolutely no sense to me... but at least they are telling us something!
"Around 11am on Thursday we, along with other colocation customers sharing the same /16 of address space were the victim of a massive DDoS attack. This attack cause a peering partner to pull their plug on a connection to our facility. This was the only remaining connection to our facility, which was unknown to me.

The result was a complete isolation of our data center where email and authoritative DNS services were hosted. Due to the timing of the pull, we were unable to get any bandwidth provider to get service provisioned that we could take advantage of. The fastest we could get was within 24 hours for new connectivity to be present.

At this point, I activated the DR servers to grab DNS tables and stand up DNS services in AWS. These servers were up with a fresh serial number about 2am CDT. Throughout the course of the day, we revised the serial numbers multiple times and reloaded the zone files to cause a re-propagation to occur. Unfortunately, not all providers honored the 4 hour time to live and expiration directives that are a part of DNS records. This is resulting in some providers still being unable to serve correct information for the jbfsale.com site and subsites. If you are still having this issue, try changing the DNS servers your computer uses to the Google DNS servers"

What is a DDoS attack?
What are DR servers, DNS tables and AWS?
Is this all stuff I should learn a little about? :D

Ah... now off to do some more damage control/customer service/public relations... I know how to do that! Just need something I can understand to tell the people!

billbenson
08-16-2014, 12:36 AM
While there are a lot of people here who don't like GoDaddy, I have noticed that their domain propagation is in hours, not days. Usually instantly. I have no idea why, or if it's consistent. Just my experience.

Brian Altenhofel
08-16-2014, 01:23 AM
28 hours.

That's practically an eternity, especially if your online presence is important to your business.



Created a Gmail email address our clients could email while our email was down

If your online presence is important (and especially if it's mission-critical), email should be hosted separately. That also goes for DNS.


Posted on Instagram, Facebook and Twitter and kept up the conversation on Facebook, our biggest source of social media contact with our customers at this point.

Corporate should have set up something static and re-pointed the DNS to it if it was taking time to execute their disaster recovery (DR) plan. Of course, that'd be if they had properly hosted their DNS separate from their website.


What we will do as soon as we get final clean-up info from corporate - is send an email out telling our clients what happened and why - within reason... I'm sure corporate will give us a canned statement that will cover the basics and let people know that we are working towards making sure it doesn't happen again!

Hopefully their IT folks will actually do a post-mortem.


This was the only remaining connection to our facility, which was unknown to me.

Something wasn't properly monitored and/or documented.


Unfortunately, not all providers honored the 4 hour time to live and expiration directives that are a part of DNS records.

DNS TTL was too high for the situation. When a domain has a 4 hour TTL, that means when a computer makes a request, it will take what it gets and cache it for four hours. This has a cascading effect.

It sounds like they moved their DNS to Amazon Route 53. Most ISPs these days will query the Start of Authority directly. In such a situation, they should have set the TTL there to be 60 seconds (or at least to the minimum that Amazon allows), as well as the TTL for all DNS entries to 60 seconds (or whatever minimum).

The cascading effect is that when your computer asks for the DNS entry, it first checks its own cache to see if it is still within the 4 hour window. If the 4 hours have expired, it then asks whatever DNS servers your computer is pointed to for their record. You get back whatever is in their cache, so you could get the old DNS record. Worst case, if you had 3h59m58s left on your end and the server your computer was pointed at had 3h59m59s left, you're going to be waiting almost the full 8 hours. If the DNS you were pointed at was pointed at another upstream resource between them and Amazon and therefore wasn't querying the assigned nameservers directly, add another 3h59m59s. If they had a high SOA TTL (not uncommon to be set to 1 day), then there's a possibility that contributed to the delay because DNS servers won't look for a domain's new nameservers until the SOA is refreshed with the new nameservers (in most cases; some are kind enough to ignore that particular TTL).
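The worst-case arithmetic above can be written out as a tiny Python sketch. The function name `worst_case_wait` is made up for this example, and it follows the simplified model in this post where every cache layer holds the stale answer for a full TTL of its own. A resolver that passes along only the remaining TTL to whoever asks it would not stack up this way, so treat the result as an upper bound for badly behaved chains, not a prediction.

```python
from datetime import timedelta


def worst_case_wait(ttl: timedelta, cache_layers: int) -> timedelta:
    """Upper bound on the wait if every cache layer restarts a full TTL.

    Assumption (simplified model from the post): each layer caches the
    stale answer for a whole TTL of its own.
    """
    if cache_layers < 1:
        raise ValueError("need at least one cache layer")
    return ttl * cache_layers


ttl = timedelta(hours=4)
print(worst_case_wait(ttl, 2))  # your computer + the DNS server it points at
print(worst_case_wait(ttl, 3))  # plus one more upstream cache
```

With a 60 second TTL the same three layers would add up to about three minutes, which is the point of keeping TTLs low ahead of a planned move.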

The tech is right that not all ISPs respect TTL entries, but the vast majority do. Out of the hundreds of zero-downtime migrations I've done, I've typically had 95% of traffic hitting the new server within 5 minutes (and 99% within a half-hour) using 60s TTLs across the board set at least 48 hours in advance. Of course, a 60s TTL is not ideal because your site would load slowly if your users were having to do a DNS lookup every 60s, so it becomes a balance between "affordable worst case downtime" and "user experience". (Using any of the big anycast DNS providers like DNSMadeEasy, Rackspace, or Amazon practically mitigates that "worst case scenario"). In this situation, if their old TTL was 4 hours, then most likely 99% of traffic would have been redirected within 4 hours if the new TTL was 60s.
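To show why the low TTL has to be set ahead of time, here is a small Python sketch. The function name and the example date are hypothetical. The idea is only that a cache which fetched the record just before you lowered the TTL can keep the old TTL for its full length, so the new 60s value is not in effect everywhere until the old TTL has run out.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl: timedelta) -> datetime:
    """Earliest time by which well-behaved caches have picked up the lower TTL.

    Assumes caches respect the TTL they were given; any that do not are
    the reason for leaving extra margin.
    """
    return ttl_lowered_at + old_ttl


# Hypothetical example: TTL lowered from 4 hours to 60 seconds at 11:00.
lowered = datetime(2014, 8, 14, 11, 0)
print(earliest_safe_cutover(lowered, timedelta(hours=4)))
```

Lowering the TTL 48 hours in advance, as described above, leaves a wide margin beyond this minimum for caches that hold records longer than they should.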

Of course, if the DR plan (if it existed) was tested on a regular basis, they probably would have seen that coming. That's why I enjoy Failure Fridays (inspired by PagerDuty (https://blog.pagerduty.com/2013/11/failure-friday-at-pagerduty/)).

billbenson
08-16-2014, 04:15 AM
Brian had a lot of good points. I'll just home in on this one, as it's saved my butt more than a few times:



If your online presence is important (and especially if it's mission-critical), email should be hosted separately. That also goes for DNS.

Have your email separate on one or more hosts. At least email is never down.

Blessed
08-16-2014, 04:53 PM
Brian had a lot of good points. I'll just home in on this one, as it's saved my butt more than a few times:

Have your email separate on one or more hosts. At least email is never down.

Brian had lots of good points! And thank you for this one too. I'm compiling this info to give to the Franchisee I actually work for - who runs one of the larger JBF Sales and two other smaller JBF sales, so that she can pass it on to Corporate. I know we haven't heard the last of this yet.

Email is still really sporadic, and emails sent from my JBF email accounts are taking a long time to reach my personal email account, etc. So we are keeping the Gmail account I set up active and are pointing people to email us there since our event starts on Monday.

The last 3-4 days before our event opens it is CRUCIAL that we have reliable website function. We'll see how devastating this is going to be to us. Fortunately we've worked really hard this year to get our Facebook fanbase up - so we were able to reach a lot of people we wouldn't have reached in the past.

Our email list has 12,000ish contacts on it. We're up to 5500 Facebook contacts - we have approximately 500 families who sell with us and usually see around 7500 shoppers. We've increased our Facebook "fan base" (since I don't know the proper term!) by 1500 contacts this year - so I felt pretty comfortable that we were reaching the majority of the people on our email list who actually do participate in the sale. The largest JBF Sale in the nation started on Thursday/Friday in the middle of all of this mess and they have 23,000 contacts on their email list with only 6000 Facebook contacts - they were in a panic, because they had no way to reach most of their people.

OK... back to answer more emails and do more damage control! Thank you for the input so far!

Harold Mansfield
08-17-2014, 11:09 AM
Ah... Dominoes! I can understand that :D Thanks Harold!

We're mostly up but our email is really sporadic. It's up, then it's down, then 10 emails come in, then you get kicked out again... I'm guessing there will be some emails that have been lost forever!

This is the most recent update from the tech guy - it makes absolutely no sense to me... but at least they are telling us something!
"Around 11am on Thursday we, along with other colocation customers sharing the same /16 of address space were the victim of a massive DDoS attack. This attack cause a peering partner to pull their plug on a connection to our facility. This was the only remaining connection to our facility, which was unknown to me.

The result was a complete isolation of our data center where email and authoritative DNS services were hosted. Due to the timing of the pull, we were unable to get any bandwidth provider to get service provisioned that we could take advantage of. The fastest we could get was within 24 hours for new connectivity to be present.

At this point, I activated the DR servers to grab DNS tables and stand up DNS services in AWS. These servers were up with a fresh serial number about 2am CDT. Throughout the course of the day, we revised the serial numbers multiple times and reloaded the zone files to cause a re-propagation to occur. Unfortunately, not all providers honored the 4 hour time to live and expiration directives that are a part of DNS records. This is resulting in some providers still being unable to serve correct information for the jbfsale.com site and subsites. If you are still having this issue, try changing the DNS servers your computer uses to the Google DNS servers"

What is a DDoS attack?
What are DR servers, DNS tables and AWS?
Is this all stuff I should learn a little about? :D

Ah... now off to do some more damage control/customer service/public relations... I know how to do that! Just need something I can understand to tell the people!
This is a completely separate issue that has nothing to do with DNS propagation.

A DDoS attack is a distributed denial-of-service attack. It's a concentrated, malicious attack to overwhelm a server or network and max out its resources, making them unavailable for normal operation. Basically they're saying they've been hacked.

Them telling you to change the DNS servers that your computer uses is confusing. I've had to do this to test a new server, but never just to reach certain live websites across the internet. Seems to me if there were settings they are saying you need to change they would tell you what they are.

Some of the other guys can probably speak more about the server stuff.

Brian Altenhofel
08-17-2014, 08:50 PM
A DDoS attack is a distributed denial-of-service attack. It's a concentrated, malicious attack to overwhelm a server or network and max out its resources, making them unavailable for normal operation. Basically they're saying they've been hacked.

A victim of a DDoS isn't usually hacked. Even legitimate users can cause a DDoS.

Now, if what they meant to say is the /16 subnet was being used for a DDoS, that's a separate matter. That's a massive compromise.


Them telling you to change the DNS servers that your computer uses is confusing.

Basically, they are just requesting that the user use a first tier DNS provider that queries nameservers directly.

It's actually a common practice in tech support when someone calls with a problem that could possibly be caused by inaccurate DNS records to request that they change their DNS settings to use 8.8.8.8 and 8.8.4.4.

(It's also a way to get around registering a new device with some cable ISPs.)
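If you would rather check what Google's DNS says without changing any settings on your computer, something like the Python sketch below can ask 8.8.8.8 directly. It is a bare-bones illustration, not a proper DNS client: the function names are made up, it builds a plain A-record query by hand in the standard DNS wire format (RFC 1035), and it only reads the reply header to see the response code and how many answers came back. It needs a working internet connection when you actually call `ask_resolver`.

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


def ask_resolver(hostname: str, resolver: str = "8.8.8.8", timeout: float = 3.0) -> bytes:
    """Send the query to one specific resolver over UDP and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (resolver, 53))
        reply, _ = sock.recvfrom(512)
    return reply


def summarize(reply: bytes) -> dict:
    """Read the response code and answer count from the 12-byte reply header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return {"rcode": flags & 0x000F, "answers": ancount}
```

Calling `summarize(ask_resolver("jbfsale.com"))` and then repeating it with your ISP's resolver address in place of 8.8.8.8 would show whether the two disagree. A response code of 0 with at least one answer means that resolver found the record.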

Harold Mansfield
08-17-2014, 10:19 PM
Basically, they are just requesting that the user use a first tier DNS provider that queries nameservers directly.

It's actually a common practice in tech support when someone calls with a problem that could possibly be caused by inaccurate DNS records to request that they change their DNS settings to use 8.8.8.8 and 8.8.4.4.

(It's also a way to get around registering a new device with some cable ISPs.)

Yes, but no tech department would just tell a consumer to do something like that without clear instructions. You can't just throw that in people's lap as if everyone is an expert. That's why I don't understand why they would just throw that out there to her without instructions.

Brian Altenhofel
08-18-2014, 01:19 AM
Yes, but no tech department would just tell a consumer to do something like that without clear instructions. You can't just throw that in people's lap as if everyone is an expert. That's why I don't understand why they would just throw that out there to her without instructions.

It's not that uncommon these days for the person in the IT cubicle to make assumptions, especially during a time when one is likely under pressure.

Blessed
08-18-2014, 09:19 AM
I just saw more info posted on a "secret" Facebook group of about 12 of the owners with big sale events, like ours, and it concurs with what I'm gathering from the info here - this was completely preventable.

"...bottom line is this outage was preventable and negligent. With that said, Mr. XXX did everything in his power to fix the issue and I kinda feel bad for him--it was basically a perfect storm situation and he didn't have an adequate back up plan. As owners, I now realize that we need to have an understanding of the who, what, where and how of our website and email hosting. We need to have some input into the inevitable re-architecture of the system to avoid a catastrophic attack in the future. Sooo....one of our first requests at conference should be a session on our website infrastructure...."

OK - THANK YOU for your input here. I do value it and will take time to try to read through it and understand it better later this week. We set up for the event today, consignors start bringing their items in tomorrow, sales start Wednesday with a Blogger/Social Media Influencers party/presale and there will be some serious selling going on Thursday-Sunday, then Monday is break-down, clean-up and checks! So - I will be absent, mostly, for the rest of this week... but will check back!

If you are wondering about what this business actually is:
Here's our Sale Facebook Page - https://www.facebook.com/JBFLeesSummit?ref=hl
In summary - 500 +/- consignors (families who sell the things their kids have outgrown/individuals who purchase items to resell); 24,000 square feet of merchandise; 6000-7500 Shoppers over the course of 5 days (more shoppers if we get news coverage - we often do, but it's never guaranteed)
Consignors have to prep their items according to guidelines posted on our website and price them using our online "tagging system" - this creates barcoded tags that go on their items. Everything is then put out and the end result is a shoppable event similar to a typical retail store. Having the website issues we had in the few days prior to our event will definitely have affected us - with consignors not able to prep their items before the weekend. Fortunately it was fixed by Friday evening so many of them may have taken the time Friday evening and over the weekend to go ahead and prep their things, but without a doubt there will be consignors who don't participate this time because of those last minute frustrations. Consignors earn 65-75% on their sold items less a consignor fee of $12.50. In the spring our average consignor check was over $400. It's all infant-pre-teen and maternity clothing, toys, equipment, books, games, etc...
This Facebook Album is from our Spring/Summer event: https://www.facebook.com/media/set/?set=a.10151964337263443.1073741833.94633323442&type=3 We will put up a similar album for this event Thursday morning.

cathyc100
09-14-2014, 03:51 PM
What an awful time for you. I am glad everything is back up, and yes, you had to let it run its course. However, there are some things you can do in times of distress. As you seem to have done, use email and social media to make contact. That is fast and easy. Another route, if you haven't explored it, is to get a mobile app. Being mobile is a necessity. A mobile app is downloaded to the device and you have everything integrated: website, social media, and if your site goes down the app is still running. You can do push notifications and make the app interactive so people will spend time on the app and share it with others. Any questions, give me a shout.

Blessed
09-15-2014, 12:14 PM
I'm not sure who cathyc100 is... but that post about a mobile app on this thread bumped it and reminded me that I wanted to post an update

At the end of the sale we were down 10% from our previous event. We had enough people comment about difficulties contacting us via email or finding information on our website that we're sure at least part of that drop was due to the outage. In fact even now, weeks later - I'm still receiving a random email from that time frame. I haven't heard what the safeguards against this happening again are - but conference is in January and I know it's on the agenda.

Other things that possibly contribute to the drop in sales are:
1) our event happened on the first week of school, we actually opened on the first day of school - and we know that was an issue for some people.
2) We dropped our direct mail piece this time because we had hired a social media specialist to market the sale via that route and to create relationships with local bloggers etc... she dropped the ball and is compensating for that by providing this service for us during our Holiday event in November at no additional charge.
3) And finally we have a consignor who resells baby equipment - car seats, strollers, pack-n-plays, etc... he also has a retail store and sells store returns and slightly damaged goods - he wasn't at the sale this time, which means we didn't have the selection of strollers, cribs and other big equipment that shoppers are looking for a deal on.

Those three things alone would have affected the bottom line; the website and communication issues we also had didn't help any! At the same time, a 10% drop is not an earth-shattering, business-destroying event. We can recover from that and expect to do so between the Holiday sale in November and the Spring/Summer event in February.

We did also learn some things about Social Media marketing and are beginning to be efficient at using that marketing tool - so that's a good thing as well!

MyITGuy
09-15-2014, 03:28 PM
Surprised I missed this thread...sounds like your host was BurstNet, or was hosted with them as the timing seems to be too much of a coincidence.

Blessed
09-15-2014, 03:35 PM
You know Jeff, I never heard who the host was or who it is now... hadn't thought about that fact until you mentioned it - but then I've always been a front-end gal who knows nothing about how all this stuff works - just how to use it and make it work for me :)

However - this experience has also taught me that I need to quit being too busy with other things - and take the time to learn more about websites and hosting and the internet and all that good stuff. Print isn't what it used to be and I need to enlarge my skill set.

MyITGuy
09-15-2014, 06:25 PM
You know Jeff, I never heard who the host was or who it is now...
Going on a hunch, but I'd say your new host is web-host.net and I've never heard of them.


However - this experience has also taught me that I need to quit being too busy with other things - and take the time to learn more about websites and hosting and the internet and all that good stuff. Print isn't what it used to be and I need to enlarge my skill set.
If your business has any form of web presence, this is an absolute must

Blessed
09-16-2014, 09:45 AM
If your business has any form of web presence, this is an absolute must

At present my business is 100% past clients and word of mouth. I have no website... well, technically I have a domain but I have never done anything with it. My intention for the past 6 or 7 years has been to spend half a year taking some web design classes and developing that skill set. I'm good at print design. I know all the technicalities and deliver print-ready, trouble-free files to my clients, but I've lost some clients because they needed web design and I couldn't do it, so I referred them elsewhere. Usually the other designer and I work together for a while, but then it is easier for the client to just use one source for all their needs. No ugly break-up stories - just business!

I've had a handful of clients go from needing a contract graphic designer to having to hire someone full-time in house as their business grew, so that's good - it makes me feel good about the work I've done.

But my business growth is stagnant now, I have my steady handful of clients and my part-time income has been stable for the past 3-4 years. I've replaced any clients I lost with new ones who were referred to me by existing or former clients. But my kids are older, my youngest started kindergarten this year and I'm ready to see Crazy Dog become more than what it is now - somewhere between an actual business and a hobby that makes really good money. I became a freelance designer when my daughter was born 7 years ago, had a son 5 years ago and am done having kids! Now I can baby this business and see it grow. On the plus side - I have a much clearer vision of how it should grow now than I did 7 years ago!

Now - the client this outage affected has a huge internet presence and I'm her Virtual Assistant as well as helping with marketing and sales for her individual franchises within this franchise system - so I need to have a better understanding of things in that capacity as well.

MyITGuy
09-16-2014, 03:35 PM
Now - the client this outage affected has a huge internet presence and I'm her Virtual Assistant as well as helping with marketing and sales for her individual franchises within this franchise system - so I need to have a better understanding of things in that capacity as well.

If there is anything we can do to help, even if it's just to understand a few things, feel free to post them here or PM me. I'm sure Brian, myself or someone else with some knowledge on the subject would be willing to jump in.

Hope things start looking up for you on your future business ideas/growth!