Conversation with #gubug at 2005-01-12 17:57:58 on macnewbold@irc.freenode.net (irc)
(17:57:58) : The topic for #gubug is: roBust, Solid, Dependable -- http://www.anthonychavez.org/images/haxorpc.jpg
(18:37:14) pfctl: jlp: In order to detect a downed router, you could use a heartbeat protocol such as VRRP (or preferably, CARP) over a secure tunnel between the two routers.
(18:38:44) pfctl: s/the two/two/
(18:38:58) pfctl: s/two routers/each pair of routers/ ;-)
(18:39:59) pfctl: s/each pair of routers/ASnnn and each of its routers/
(18:40:08) pfctl: sorry, long day. :-)
(18:46:53) pfctl: Another possibility would be to use OSPFv3, eliminate the tunnel, and create a virtual IP, effectively load-balancing the servers, which would probably be a good thing to do in the first place.
(18:59:25) jnbek left the room (quit: Read error: 104 (Connection reset by peer)).
(19:07:01) jnbek [~jnbek@c-67-166-97-223.client.comcast.net] entered the room.
(20:25:54) jlp: pfctl: that assumes you own or otherwise have control of the router(s)
(20:26:38) jlp: running with OSPF means you're depending on all the routers between you and your customers to always work correctly... they don't
(20:27:29) jlp: google guys told me there was a day last month where comcast got into a routing loop on their internal network. something like packets bound for google from LA headed off to denver instead. once in denver, they promptly were turned around and pointed back at LA
(20:28:07) jlp: this loop lasted for about two hours before comcast network wankers figured out what was going on and fixed it
(20:28:32) jlp: but just because comcast's network is in a flap, doesn't mean google (e.g.) wants to not be available for all those users in LA
(20:29:25) jlp: so I got a call from the HR gal at google this afternoon. she said things are looking good. hiring committee meets tomorrow and she'll call me tomorrow afternoon/evening
(20:42:33) pfctl: jlp: I must have missed that in your discussion with Mac.
So you're asking how to ensure that traffic from the server to the *customer* is guaranteed...?
(20:42:57) pfctl: jlp: That's great news. We're all pulling for ya. ;-)
(20:53:20) jlp: well, it's basically how do you tell that service to the customer has been interrupted, so you can do something about it (like direct those customers to a different instance of your site that they can actually reach)
(20:53:28) jlp: ever seen F5's 3DNS box?
(20:55:53) pfctl: I haven't, no.
(20:56:48) pfctl: Fascinating problem, BTW. What did you suggest as the solution?
(20:57:16) pfctl: I've often wondered about such a scenario.
(20:57:18) jlp: well, you have to have control over your dns servers (i.e., not use "bind" :-)
(20:57:44) jlp: so let's think about what happens when a user tries to hit www.foo.com
(20:58:25) jlp: first, their browser tries to resolve www.foo.com, say it doesn't already know anything about it, so it asks a local name server (hopefully local :-)
(20:58:40) jlp: local name server either knows or doesn't. assume for the moment it doesn't
(20:58:42) ***pfctl gets the gist, skip a bit. :-)
(20:59:08) jlp: ok, so it asks for an A record for www, right? it picks a random dns server from all of the ones that know foo.com
(20:59:19) ***pfctl nods
(20:59:45) jlp: that name server gets the req and immediately answers with an A record for one of the www instances (doesn't really matter which at this point)... with a TTL of 1 second
(21:00:12) pfctl: ok
(21:00:17) jlp: meanwhile, the dns server adds the requesting IP (IP of the user's dns server) to a database
(21:00:48) pfctl: hmmm ok
(21:00:52) jlp: now it kicks off some tests on all of the dns servers at all of the company sites... the dns servers ping the original requesting dns server, they maybe even traceroute to it
(21:01:09) jlp: (if requests come in again, they just keep handing back TTL 1 responses)
(21:01:21) pfctl: ahh...
I see
(21:01:28) pfctl: cheeky ;-)
(21:01:29) jlp: at some point, the dns servers have a pretty good idea of where the user(s dns server) is
(21:01:47) jlp: now they can start handing back the "right" A record, with a little higher TTL
(21:02:04) jlp: this works great until something breaks, right?
(21:02:43) jlp: now keep in mind, we own the web servers, too, so we can kind of keep track of where requests are coming from (keeping statistics in other words)
(21:03:23) jlp: so when we notice that, while we normally would have had, oh, 5000 requests/second from a particular netblock (on a particular web server), we suddenly notice that we're getting much less than that
(21:03:53) jlp: red flags go up, and we tell the dns servers that for whatever reason we can't talk to that netblock from that server any more. we don't know why, we don't really care
(21:04:33) jlp: dns servers throw away that known IP/block/AS number (whatever) and start over again, shooting out TTL 1 responses again until they have a good idea of the new "best" server
(21:04:38) pfctl: I see... Is there a name for this approach?
(21:05:01) jlp: I don't know if it has a name, but it's implemented by F5's 3DNS product (and presumably by custom python code at google :-)
(21:05:05) pfctl: Or is it purely a hack at the moment? heh
(21:05:13) pfctl: gotcha
(21:05:30) jlp: so then, we'd like to know when the network repairs itself, right (or gets repaired)
(21:05:48) pfctl: so keep tabs on what pings fail, yeah?
(21:05:55) pfctl: and keep trying?
(21:06:28) jlp: the dns server at the lost site will start receiving requests again, and we'll know that things are better. also, just because we figured out what the best server for a given request address is once doesn't mean it can't change, so we periodically run our probes again every once in a while
(21:06:50) jlp: ultimately, things work fairly well, and google's customers see an average search response time under two seconds
(21:06:58) pfctl: similar...
I like your idea better. :-)
(21:07:01) jlp: which pretty much rocks the world
(21:07:26) pfctl: and here I thought it was PigeonRank. :-D
(21:07:29) jlp: pretty cool stuff, though, I have to admit
(21:07:41) jlp: btw, the F5 3DNS boxes run freebsd :-)
(21:07:46) jlp: embedded
(21:07:53) jlp: (they're appliances)
(21:07:55) pfctl: of course ;-)
(21:08:01) jlp: I only knew all this because we had a couple at flipdog
(21:08:18) jlp: valuable knowledge when confronted with the question in a google interview :-)
(21:08:37) pfctl: You mentioned that you wouldn't want to run BIND. Do you know if they run a /hacked/ version of BIND or is it a completely different software system?
(21:08:41) jlp: I was glad I'd done my homework on how the dang things worked when we had them
(21:08:49) jlp: 3DNS runs a hacked "named"
(21:09:11) jlp: zone files look exactly the same, but you specify a TTL of 1 for the names you want to be specially "managed"
(21:09:22) jlp: I don't know how google does it
(21:09:47) jlp: even though I'd signed an NDA, they didn't disclose much
(21:10:40) jlp: and I didn't see much other than the little conference room we were using, the snack center in that building, lobby, and the cafeteria
(21:10:44) pfctl: Very cool.
(21:11:28) jlp: of course, on the way to the cafeteria, we took the "skyway" (enclosed bridge between buildings) and I could see the new swimming machines they were in the process of putting in
(21:11:34) jlp: picture a river in a tub
(21:11:52) jlp: no room for a full sized pool, I guess
(21:12:41) jlp: and there was a volleyball game going on, too (this was lunch time)
(21:13:33) jlp: and four segways in the lobby
(21:13:43) pfctl: haha
(21:13:43) jlp: couple of massage chairs, baby grand piano
(21:14:17) pfctl: sounds like a good place to work
(21:14:25) jlp: so I was talking to the HR gal today...
about places to live in the bay area that don't cost $1M for a house the size of my bedroom
(21:15:00) jlp: she said google runs shuttles to various parts of the area... so you don't have to drive. shuttles equipped with broadband internet connections
(21:15:17) pfctl: niiiiiiice
(21:15:37) jlp: she said something like "why arrive at work burnt out from a commute?"
(21:16:19) jlp: it would be an incredible place to work
(21:16:56) jlp: have you ever felt like you were the "go to guy" for your company? I've always been that way (well, since 1988, anyway)
(21:17:10) jlp: imagine a whole company where everyone you see was the "go to guy"
(21:19:11) pfctl: Well, I wish you the best of luck. Hopefully, I'll find something equally sweet in Vancouver. ;-)
(21:19:23) jlp: you're headed for vancouver?
(21:20:02) pfctl: That's the plan, yeah. Probably won't happen for a few years, but BC will be what we'll be calling home soon.
(21:20:57) jlp: you guys just like the area? family there?
(21:21:23) pfctl: We considered Silicon Valley, Portland OR, Seattle, etc. all the way up the coast. We went to BC for our honeymoon and fell in love.
(21:21:42) jlp: ahh, I see. yeah, it's quite pretty up there
(21:22:23) pfctl: No family. But I've been itching to get the Hell out of Dodge for quite some time now. hehe
(21:22:34) pfctl: Yeah, it's gorgeous.
(21:22:42) pfctl: And very mild as far as climate goes.
(21:22:54) pfctl: Good beer, too. ;-)
(21:23:06) jlp: yeah, the family issue is kind of against me moving to california
(21:23:13) jlp: my wife's entire family lives here
(21:23:28) jlp: parents, sister, brother, all in murray (less than two miles from us)
(21:24:32) jlp: and my brother lives in kearns
(21:25:05) pfctl: My family and I have never been very close. My parents and I get along great, but we siblings have too many problems (not necessarily with each other) that keep us apart.
(21:25:38) pfctl: One disaster after another. We're not the luckiest bunch.
heh
(21:25:58) jlp: my brother and I aren't close, but we do see each other on the occasional family member birthday :-)
(21:26:45) pfctl: Yeah, of course. Don't get me wrong, we're not alienated. Just somewhat in dire straits.
(21:27:55) jlp: I definitely spend more time with my in-laws
(21:28:50) pfctl: Yeah, ditto. Traci's got cool parents.
(21:30:27) pfctl: She also doesn't have family tying her down, so I think we could go pretty much any time. I would like to build up a bit more experience first though.
(21:30:57) jlp: so where are you these days?
(21:31:57) pfctl: Actually, I'm between jobs ATM. After we get back from our trip (we leave Saturday), however, I may have a job waiting for me with a startup in SLC.
(21:32:19) jlp: cool... startups can be fun
(21:32:34) pfctl: Yeah, and they can be disastrous, like the last one I was involved with. hehe
(21:32:38) jlp: heh
(21:32:41) jlp: me too
(21:33:11) jlp: this startup wouldn't involve a guy named robert volker, would it?
(21:33:18) pfctl: Fortunately, although I held a pretty high position, I didn't get dragged into being a shareholder.
(21:34:03) pfctl: No, this involved a guy from Logan. ;-)
(21:34:22) pfctl: Not familiar with Robert Volker.
(21:34:58) jlp: ok... I was talking to robert (a former co-worker) last month about some stuff he was trying to put together... looking for someone with freebsd experience for an appliance they were building
(21:35:25) jlp: they were hoping to get funding this month, but I haven't heard anything from them for several weeks
(21:35:40) pfctl: oh, you mean the job I've got waiting. No Robert Volkers there either.
(21:35:43) pfctl: AFAIK
(21:36:34) jlp: robert was an interesting fellow... he used to be at USR/3com doing firmware programming stuff
(21:36:34) pfctl: hmm... Actually, I think one of the guys was named Robert or Richard or somesuch. I'll be sure to let you know.
(21:36:57) pfctl: "Interesting." Now *that*'s an interesting word.
;-)
(21:37:00) jlp: I worked with him at partnet
(21:37:15) jlp: gregg phipps is one of the other principles
(21:37:20) jlp: principals?
(21:37:36) pfctl: principles. principal == money
(21:38:01) pfctl: anyhow, I need to fly. good luck again and I'll ttyl
(21:38:05) jlp: ok, see ya
(22:20:35) jlp: http://www.gpf-comics.com/d/19990201.html
(08:43:44) pfctl: jlp: haha, that's great!
(08:46:14) pfctl: jlp: BTW, it's principal. My bad. ;-)
(09:13:54) fungus [~fungus@firebat.aros.net] entered the room.
(17:51:56) jlp: just got off the phone with google. the SRE team hiring committee said "hire"
(17:51:56) jlp: assuming the exec committee concurs, I should have an offer by tuesday
(17:51:56) pfctl: Rock on!!!
(17:51:56) jlp: now comes the painful decision making process :-)
(17:51:56) jlp: I have to see what kind of offer they come back with
(17:51:56) ***pfctl nods.
(17:51:59) jnbek left the room (quit: Read error: 104 (Connection reset by peer)).
(17:51:59) jnbek_ [~jnbek@c-67-166-97-223.client.comcast.net] entered the room.
(18:02:59) fungus left the room (quit: ).
(04:14:34) ChanServ left the room (quit: tolkien.freenode.net irc.freenode.net).
(04:14:34) jnbek_ left the room (quit: tolkien.freenode.net irc.freenode.net).
(04:14:34) jlp left the room (quit: tolkien.freenode.net irc.freenode.net).
(04:14:34) pfctl left the room (quit: tolkien.freenode.net irc.freenode.net).
(04:20:20) ChanServ [ChanServ@services.] entered the room.
(04:20:20) jnbek_ [~jnbek@c-67-166-97-223.client.comcast.net] entered the room.
(04:20:20) jlp [natted@c-24-2-96-137.client.comcast.net] entered the room.
(04:20:20) pfctl [~pfctl@anthonychavez.org] entered the room.
(04:20:20) #gubug: mode (+ooo ChanServ jlp pfctl ) by irc.freenode.net
(09:11:53) fungus [~fungus@firebat.aros.net] entered the room.
(14:13:56) fungus left the room (quit: Read error: 113 (No route to host)).
(14:17:42) fungus [~fungus@firebat.aros.net] entered the room.
(15:07:28) fungus left the room (quit: Read error: 104 (Connection reset by peer)).
(15:11:32) fungus [Snak@fungus.air.aros.net] entered the room.
(18:11:28) fungus left the room (quit: ).
(18:49:33) pfctl left the room (quit: Remote closed the connection).
(18:57:53) jenny_ahibdda [~jenny_ahi@225-143-89-200.fibertel.com.ar] entered the room.
(18:58:16) jenny_ahibdda left the room.
(23:32:18) ChanServ left the room (quit: tolkien.freenode.net irc.freenode.net).
(23:32:18) jlp left the room (quit: tolkien.freenode.net irc.freenode.net).
(23:32:18) jnbek_ left the room (quit: tolkien.freenode.net irc.freenode.net).
(23:32:42) ChanServ [ChanServ@services.] entered the room.
(23:32:42) jnbek_ [~jnbek@c-67-166-97-223.client.comcast.net] entered the room.
(23:32:42) jlp [natted@c-24-2-96-137.client.comcast.net] entered the room.
(23:32:42) #gubug: mode (+oo ChanServ jlp ) by irc.freenode.net
(02:44:43) LuteM [~chatzilla@63-229-187-130.sxcy.qwest.net] entered the room.
(02:45:40) LuteM: Anyone up?
(02:47:47) LuteM left the room.
(20:28:34) citab [~chatzilla@c-67-166-64-166.client.comcast.net] entered the room.
(20:59:30) citab: anyone around?
(21:17:37) citab left the room (quit: "Chatzilla 0.9.66 [Mozilla rv:1.7.5/20041107]").
(12:19:03) ChanServ left the room (quit: ACK! SIGSEGV!).
(12:20:40) ChanServ [ChanServ@services.] entered the room.
(12:20:40) #gubug: mode (+o ChanServ ) by irc.freenode.net
(13:41:33) Lute [~lute@63-229-187-130.sxcy.qwest.net] entered the room.
(13:46:30) Lute: Anyone here use rioutil?
(16:09:14) Lute left the room (quit: "Client Exiting").
(20:31:31) jlp: let's see... nope, uh uh, and doesn't look like it. there, that takes care of all the recent questions on the channel. too bad they're not still here to get such useful responses to their questions :-)
(10:42:28) fungus [~fungus@firebat.aros.net] entered the room.
(16:20:14) jnbek_ left the room (quit: Read error: 104 (Connection reset by peer)).
(16:26:06) jnbek_ [~jnbek@c-67-166-97-223.client.comcast.net] entered the room.
(18:33:17) fungus left the room (quit: ).
(19:52:24) macnewbol1 [~mac@newbold.dsl.xmission.com] entered the room.
(21:50:38) macnewbol1 left the room (quit: Read error: 110 (Connection timed out)).
(21:52:12) macnewbol1 [~mac@newbold.dsl.xmission.com] entered the room.
(23:39:29) macnewbol1 left the room.
(10:00:50) fungus [~fungus@firebat.aros.net] entered the room.
(17:25:21) fungus left the room.
(10:05:04) fungus [~fungus@firebat.aros.net] entered the room.
(18:32:02) fungus left the room (quit: ).
(10:04:20) fungus [~fungus@firebat.aros.net] entered the room.
(13:20:36) macnewbold: Pretty quiet in here lately...
(18:38:35) fungus left the room (quit: ).
(23:37:49) jlp: you around, mac?
(03:25:30) ChanServ left the room (quit: tolkien.freenode.net irc.freenode.net).
(03:25:30) jlp left the room (quit: tolkien.freenode.net irc.freenode.net).
(03:25:30) jnbek_ left the room (quit: tolkien.freenode.net irc.freenode.net).
(03:25:47) ChanServ [ChanServ@services.] entered the room.
(03:25:47) jnbek_ [~jnbek@c-67-166-97-223.client.comcast.net] entered the room.
(03:25:47) jlp [natted@c-24-2-96-137.client.comcast.net] entered the room.
(03:25:47) #gubug: mode (+oo ChanServ jlp ) by irc.freenode.net
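
The DNS steering scheme jlp walks through in this log (answer unknown resolvers with a 1-second TTL, probe them from every site, then hand back the "right" A record with a higher TTL, and start over when traffic from a netblock collapses) can be sketched roughly as follows. This is only an illustration of the idea as described in the chat, not how F5's 3DNS or anyone else actually implements it. The class name `SteeringDNS`, the `probe` callback, the 300-second steered TTL, the 50% drop threshold, and keying everything by resolver IP rather than by netblock or AS number are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

PROBE_TTL = 1      # TTL used while a resolver's location is unknown (from the log)
STEERED_TTL = 300  # assumed value for "a little higher TTL"; the log gives no number


def traffic_dropped(baseline_rps: float, current_rps: float,
                    threshold: float = 0.5) -> bool:
    """Hypothetical web-tier check: has traffic fallen well below normal?"""
    return current_rps < baseline_rps * threshold


@dataclass
class SteeringDNS:
    sites: Dict[str, str]                          # site name -> A record address
    probe: Callable[[str, str], Optional[float]]   # (site, resolver) -> RTT ms or None
    best: Dict[str, str] = field(default_factory=dict)   # resolver -> chosen site
    pending: Set[str] = field(default_factory=set)       # resolvers awaiting probes

    def answer(self, resolver_ip: str) -> Tuple[str, int]:
        """Return (address, ttl) for a query arriving from resolver_ip."""
        site = self.best.get(resolver_ip)
        if site is not None:
            return self.sites[site], STEERED_TTL
        # Unknown resolver: record it, answer with any site and a short TTL
        # so it has to come back and ask again soon.
        self.pending.add(resolver_ip)
        any_site = next(iter(self.sites))
        return self.sites[any_site], PROBE_TTL

    def run_probes(self) -> None:
        """Measure each pending resolver from every site; keep the fastest."""
        for resolver_ip in list(self.pending):
            timings = {s: self.probe(s, resolver_ip) for s in self.sites}
            reachable = {s: t for s, t in timings.items() if t is not None}
            if reachable:
                self.best[resolver_ip] = min(reachable, key=reachable.get)
                self.pending.discard(resolver_ip)

    def report_traffic_drop(self, resolver_ip: str) -> None:
        """Forget the mapping so the next query restarts the TTL-1 phase."""
        self.best.pop(resolver_ip, None)
```

In use, the web servers would call something like `traffic_dropped(5000, current)` per source and, on a hit, `report_traffic_drop(...)`; a periodic job would call `run_probes()`, which also covers the "run our probes again every once in a while" step if known resolvers are re-queued from time to time.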