Stubby + Pi-Hole + Quad9 + LXD

7 minutes, 12 seconds

Intro

For some time now, really since last November, I’ve wanted to do two things: Encrypt all my DNS traffic leaving my house LAN and run an instance of Pi-Hole to reduce ads spamming my browser (and running cryptocurrency mining software ;). As well, even if you’re connecting to a web server over HTTPS, your DNS lookups are still in the clear free to monitor, monetize or possible mangle. I thought I wanted to do this on a single, small single board computer. I even bought a few of these sweet Orange Pi Zero boards to run it on!

However, in the end I realized I already had a dual core/quad thread i3 NUC running my ownCloud instance, and I could leverage that to run some of my favorite software: LXD! See my previous write up on setting up LXD so that your containers get an IP accessible on your LAN.

LXD

Once you have LXD set up, then we can create two containers with two static IPs on your LAN, one to run the awesome ad-blocking Pi-Hole and anther to run Stubby which we’ll use to run an encrypted DNS tunnel to Quad9. Quad9 is the semi-new DNS service that does not log, is very fast and is anycasted to over 100 locations world wide. Full disclosure, I work for PCH which sponsors Quad9.

To make provisioning Ubuntu instances easier in LXD, I have this bash script which I’ve cobbled together from various sources:

#!/bin/bash
# SEE https://gist.github.com/ser/1d214423c7387d9c7528a8c1df71c6f3
# Create LXD container
#
if [ $# -eq 0 ]
then
echo "create.new.lxd.container.sh \$NAME \$IP \$GATEWAY \$PROFILE"
echo "create.new.lxd.container.sh myvm 10.0.40.100 10.0.40.1 member"
echo "    or    "
echo "create.new.lxd.container.sh myvm 10.0.40.100 10.0.40.1 default"
exit
fi
###### VARS
NAME=$1
IP_4=$2
GATEWAY=$3
PROFILE=$4
DNS=9.9.9.9
###### NETWORK CONFIG
NC="#cloud-config
version: 1
config:
- type: physical
name: eth0
subnets:
- type: static
ipv4: true
address: $IP_4
netmask: 255.255.255.0
gateway: $GATEWAY
control: auto
- type: nameserver
"
lxc launch ubuntu: $NAME -c user.network-config="$NC" -c user.user-data="$UC" -p $PROFILE

My LAN has a subnet of 192.168.68.0/24, so I used these two calls to make my two LXD instances:

create.new.lxd.container.sh pihole 192.168.68.20 192.168.68.1 default
create.new.lxd.container.sh stubby 192.168.68.21 192.168.68.1 default

Now my (only 2 (for now ;)) LXD containers are running and ready with static IPs:

# lxc list -c n,s,4
+--------+---------+----------------------+
|  NAME  |  STATE  |         IPV4         |
+--------+---------+----------------------+
| pihole | RUNNING | 192.168.68.20 (eth0) |
+--------+---------+----------------------+
| stubby | RUNNING | 192.168.68.21 (eth0) |
+--------+---------+----------------------+

Pi-Hole

The Pi-Hole folks have made the install awesomely easy. You just run this one liner after connecting to your Pi-Hole container:

curl -sSL https://install.pi-hole.net | bash

Of course, you shouldn’t just trust random commands you copy and paste from the Internet, so you can git clone from their repo as well. During the install, you can choose what ever DNS provider you want as we’re going to change it. Otherwise, accept all the defaults. Do notice the admin password at the end – you’ll need this!

After the installer is done, you should be able to to your Pi-Hole IP to log into the admin web GUI. For me this is at https://192.168.68.20/admin. After you’re logged in, go to the DNS settings page. For me this is http://192.168.68.20/admin/settings.php?tab=dns.

Once here, change “Interface listening behavior” to “Listen on all interfaces, permit all origins”. As well, change “Upstream DNS Servers” to be only the IP of your stubby container. For me this is 192.168.68.21:

Now let’s go set up stubby at 192.168.68.21!

Stubby

I’ve found a number of guides that show you how to install both Pi-Hole and Stubby on the same box, often a Raspberry Pi. While convenient to not have more hardware on your network, they also have to hack Pi-Hole to see Stubby. The reason is that Pi-Hole can’t talk to anything but port 53 on which ever IP your specify. The hack is then to get dnsmasq that Pi-Hole uses to talk directly to Stubby. The Pi-Hole GUI cannot see this change and it may get overwritten on Pi-Hole upgrades. The fact that I could have 10 or 20 containers and I’d still run the same amount of hardware coupled with the fact that the config (see Pi-Hole section above) is all done cleanly and upgrade safely in the GUI, made me use two discrete containers.

This Stubby install guide below is 99.99% not mine, so please thank redditor SphericalRedundancy and his comment about setting up Stubby on Ubuntu. Thanks SphericalRedundancy! The changes I made is to move another comment about running ldconfig as a key component of the install and remove Pi-Hole specific changes. Oh, the other change is that I want to run Stubby on port 53 which is a privileged port, so I run Stubby as root instead of creating a new user (stubby).

Install build & run time dependencies

sudo apt install -y build-essential libssl-dev libtool m4 autoconf
sudo apt install -y libev4 libyaml-dev libidn11 libuv1 libevent-core-2.0.5 

Build Stubby

git clone https://github.com/getdnsapi/getdns.git
cd getdns
git checkout develop
git submodule update --init
libtoolize -ci
autoreconf -fi
mkdir build
cd build
../configure --prefix=/usr/local --without-libidn --without-libidn2 --enable-stub-only --with-ssl --with-stubby
make
sudo make install 

Configure stubby.yml

cd ../stubby
cp stubby.yml.example stubby.yml
sed -i.bak '/  - 127.0.0.1/,/  -  0::1/{/  -  0::1/ s/.*/  -  127.0.2.2@2053\
  -  0::2@2053/; t; d}' stubby.yml
sudo /usr/bin/install -Dm644 stubby.yml /etc/stubby.yml 

Configure stubby.service

cd systemd
echo ' ' > ./stubby.service
sed -i '$i [Unit]' ./stubby.service
sed -i '$i Description=stubby DNS resolver' ./stubby.service
sed -i '$i Wants=network-online.target' ./stubby.service
sed -i '$i After=network-online.target' ./stubby.service
sed -i '$i [Service]' ./stubby.service
sed -i '$i ExecStart=/usr/local/bin/stubby -C /etc/stubby.yml' ./stubby.service
sed -i '$i Restart=on-abort' ./stubby.service
sed -i '$i User=root' ./stubby.service
sed -i '$i [Install]' ./stubby.service
sed -i '$i WantedBy=multi-user.target' ./stubby.service 

Install stubby service

sudo /usr/bin/install -Dm644 stubby.conf /usr/lib/tmpfiles.d/stubby.conf
sudo /usr/bin/install -Dm644 stubby.service /lib/systemd/system/stubby.service 

Edit host file

sudo sed -i '/127.0.2.2/d' /etc/hosts
sudo sed -i '/0::2/d' /etc/hosts
sudo sed -i '$i 127.0.2.2     Stubby' /etc/hosts
sudo sed -i '$i 0::2          Stubby-v6' /etc/hosts 

Add path to libgetdns library & running ldconfig

sudo sed -i '$i LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib' /etc/environment
sudo /sbin/ldconfig -v

Enable and run stubby service

sudo systemctl enable stubby
sudo systemctl start stubby 

Clean up

cd ../../../
rm -rf ./getdns/ 

Quad9

Before we configure Stubby to use Quad9’s DNS over TLS service, let’s test everything to make sure it’s working. From the Stubby container, let’s make sure Stubby is working by running sudo systemctl status stubby. My output looks like this:

● stubby.service - stubby DNS resolver
   Loaded: loaded (/lib/systemd/system/stubby.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-04-20 17:32:27 UTC; 2 days ago
 Main PID: 14731 (stubby)
    Tasks: 1
   Memory: 2.8M
      CPU: 13.951s
   CGroup: /system.slice/stubby.service
           └─14731 /usr/local/bin/stubby -C /etc/stubby.yml

Apr 20 17:32:27 stubby systemd[1]: Started stubby DNS resolver.
Apr 20 17:32:27 stubby stubby[14731]: [17:32:27.969574] STUBBY: Read config from file /etc/stubby.yml
Apr 20 17:32:27 stubby stubby[14731]: [17:32:27.970478] STUBBY: DNSSEC Validation is OFF
Apr 20 17:32:27 stubby stubby[14731]: [17:32:27.970498] STUBBY: Transport list is:
Apr 20 17:32:27 stubby stubby[14731]: [17:32:27.970506] STUBBY:   - TLS
Apr 20 17:32:27 stubby stubby[14731]: [17:32:27.970515] STUBBY: Privacy Usage Profile is Strict (Authentication required)
Apr 20 17:32:27 stubby stubby[14731]: [17:32:27.970522] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY 
Apr 20 17:32:27 stubby stubby[14731]: [17:32:27.970529] STUBBY: Starting DAEMON....

All good! Now let’s ensure DNS lookups are working using the default DNS servers with a dig call, again while logged into your Stubby instance:

dig @127.0.2.2 -p 2053 quad9.net +short
216.21.3.77

Sweet! So if we use port 2053 on the localhost IP of 127.0.2.2, Stubby is listening and relaying the DNS lookups. All good.

Now edit /etc/stubby.yml so this section:

listen_addresses:
  -  127.0.2.2@2053
  -  0::2@2053

Looks like this:

listen_addresses:
  -  192.168.68.21@53
  -  127.0.2.2@2053
  -  0::2@2053

Again, swap out the 192 IP with what ever IP you’re using for your Stubby instance. Now find *all* the sections after upstream_recursive_servers: in that same file and ensure that only Quad9 is there:

upstream_recursive_servers:
  - address_data: 9.9.9.9
    tls_auth_name: "dns.quad9.net"

Removing all comments and empty lines, my complete stubby.yml file looks like this (thanks to this handy grep call: egrep -v '#' /etc/stubby.yml |grep -v -e '^$'):

resolution_type: GETDNS_RESOLUTION_STUB
dns_transport_list:
  - GETDNS_TRANSPORT_TLS
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 128
edns_client_subnet_private : 1
idle_timeout: 10000
listen_addresses:
  -  192.168.68.21@53
  -  127.0.2.2@2053
  -  0::2@2053
round_robin_upstreams: 1
upstream_recursive_servers:
  - address_data: 9.9.9.9
    tls_auth_name: "dns.quad9.net"

Now let’s restart Stubby and make sure it still works with another dig command (again, use your Stubby IP where the 192 IP is), but this time we don’t specify a port so the default 53 is used:

sudo systemctl restart stubby 
dig @192.168.68.21  plip.com +short 
192.81.135.175

All good! Finally, we already configured the Pi-Hole instance to use .21 (but you use your IP!) so we can test the full round trip by running a dig call against it which will, int turn, use the Stubby proxy:

sudo systemctl restart stubby 
dig @192.168.68.20  plip.com +short 
192.81.135.175

The final step is to configure your router to hand out 192.168.68.20 as the DNS server. Each router is different, but for me, I run pfSense, so I changed it there under “ServicesDHCP -> ServerLAN” and then on the page under “DNS servers” I entered just my Pi-Hole IP (you enter yours though, not mine!) of 192.168.68.20

Now you can enjoy ad blocked, unlogged, encrypted DNS on your home LAN – w00t!

APRICOT 2018

2 minutes, 41 seconds

Be sure to read my follow up post about trekking in Nepal after the conference!

Intro

This is post closely mirrors my trip 2 years ago to attend SANOG 27: a conference followed by some awesome tourism. Like before I was luck to have my work send me to APRICOT 2018 and was again humbled to have my talk selected for presenting.

APRICOT is the “Asia Pacific Regional Internet Conference on Operational Technologies”. It’s more or less a larger SANOG meeting encompassing a greater regional area so more folks come. However, a lot of the usual suspects were in attendance (if I can get away saying that having only been to two conferences in Kathmandu ;).

Talks and Workshops

Note: All links below have full YouTube streams to watch and decks to download. Be sure to click through if you’re interested in knowing more about any of them!

I attended a slew of talks, but some of highlights for me were:

  • Daniel Griggs of NZRS gave a fun talk, “Accidental Data Analytics” that was eerily similar to mine and presented immediately before me.
  • Yasunari Momoi and Kota Kanbe’s talk “Vuls & VulsRepo” was…just pure awesome. Especially Kota’s half of the talk. He’s inspired me to do a pull request for how to deploy Vuls on Ubuntu server.
  • I missed the first half, but I very much enjoyed Dima Bekerman’s workshop “Brace Yourselves: DDoS is Coming“.
  • My own co-worker Gaël Hernandez’s talk was of course great to see, “Building and operating a global DNS content delivery anycast network“. It’s always a fun twist to see a presentation intended for outsiders on something that you’re so deeply inside of.
  • While it was plagued with some unfortunate overwhelming of the Hypervisor used for the workshop, Nick Hilliard’s, “IXP Manager Tutorial – Part 2” was a good introduction to introduction to IXP Manager, which I only knew about in concept, not practice.
  • Artyom Gavrichenkov’s talk, “DNS Survival Guide” was a fun history of DNS including up to date comparison of which resolvers are fastest (BIND isn’t even on the list ;). As well, his lightening talk on Feb 28th titled “Memcached amplification DDoS: a 2018 threat” was ridiculously prescient: literally the very next day the largest recorded DDoS hit the ‘net. This was a 1.1Tbps (yes, capital “T” for “Terabit”) attack hit GitHub on March 1st using the Mecached amplification.

Presentation

As I mentioned earlier, I was lucky enough to be selected to speak at APRICOT this year. Unlike my security based talk last trip, I was speaking about DNS and Stats this time. Specifically, my talk was title, “Visualizing a global DNS network with open source tools“. While the final, customer facing product I’m talking about (ccTLD DNS Stats) isn’t live yet (it’s in internal Beta), all the software I talk about is ready for prime time. Well…DNSAuth could use some love, but the other three are prime time ready, for sure ;) Those are:

  • DNSAuth – PCH Go App for cooking pcaps to TSDB faster
  • MapTable – PCH JS app for rendering SVG maps and tables
  • C3.js – Client side, real time charts
  • InfluxDB – TSDB to archive data

Networking and co-workers

It was a real treat to go to Nepal to attend APRICOT. Like all good conferences, you meet a lot of people that you can network with. I had a lot of fun conversations! Best thing of all about going to APRICOT was that I got to chat with all my co-workers, including two I’d never met before who came all the way from Dublin and Johannesburg. Awesome!

Poon Hill Trek

3 minutes, 53 seconds

Be sure to read my prior post about attending the APRICOT 2018 conference in Nepal before the trek!

Intro & Highlights

This was my first time trekking and my first time trekking in Nepal. While we had considered a more rigorous route of the Annapurna Base Camp (ABC), the max elevation of 4130m (13,400ft) had me concerned. I opted for a much more tame Poon Hill route, which tops out at 3210m (10,000ft) which was more my speed. In the end the elevation ended up being a non-issue. In fact, just about all my concerns were non issues ;)

We started on day 1 with celebrating Holi. At first we were reticent, but then joined the festivities by having the local kids do us up. It was lovely.

My favorite day for views was day 3. Poon Hill was clear for our pre-dawn hike and the sunrise and Annapurna mountains were breath taking. Bring money to pay for the entrance fee and hot drinks at the top of the hike! The rest of the day the mountain views kept us company and they were epic. Thanks Nepal!!

The rest of the trip was amazing. While you can feel modern amenities creeping in every town, the both benefit the towns folk and pull the “rural-ness” away from what you can tell was once a remote village. You know, the old man in me wants to have no tech when I hike/trek. I’m there to experience the place and nature, not check me email ;)

Farm animals where everywhere and it was a delight to see them. All the locals were nice, though some drove a harder bargain for their tea house prices. Check around a bit and don’t settle for the first one.

Click any of the photos in this post to see the gallery! Note, you’ll see pictures of my work conference too ;)

Finally, there was a really…oddly comforting I guess, moment when the cabi on the way to Pokhora put on “Like a Virgin“. I loved it (it’s the second clip in this video). Sorry I was too lazy to edit this video down, it’s 3 min long:

Route

Based on this write up, among others, we ended up doing this route (all prices in US Dollars and are per person):

  • Day 1: Kathmandu -> Pokhora – 30 Minute Flight $250
  • Day 1: Pokhora -> Nayapul – 1.5 hour Taxi $30
  • Day 1: Nayapul -> Hile – 1 hour Jeep $30
  • Day 1: Hile -> Uleri – trek
  • Day 2: Uleri -> Ghorepani -> Poon Hill -> Ghorepani (day hike to Poon Hill after dropping packs ;)- trek
  • Day 3: Ghorepani -> Poon Hill -> Ghorepani -> Tadapani – trek
  • Day 4: Tadapani -> Taulung -> Chomrong -> Taulung -> Chinu/Jhinudanda – trek
  • Day 5: Chinu/Jhinudanda -> Tolka – trek
  • Day 5: Tolka -> Pokhora – 1 hour Jeep $15
  • Day 5: Pokhora -> Kathmandu – 30 Minute Flight paid for on first item

I strongly encourage you pick up a map in Kathmandu or Pokhora. They’re cheap and really handy to have. We got this one which was readily available in map stores in Thamel.

Packing list

My pack was 9.5kg. I think if I didn’t carry spare running sneakers, USB battery, as fully stocked first aid kit, and micro 4 3rds camera, I’d have been around 7.5 or 8. I have no regrets though! My list is taken (and culled) from wanderingsasquatch.com. The “two shirts, one for hiking, one for not” was really great. During the warmer months you could totally skip the sleeping bag to save some real weight. The tea houses all had blankets. While I took photos with my nice camera I mentioned above, I actually was fine with cell phone’s pictures, given it’s a newer phone. Walking poles are optional, but I loved them. My gaiters and rain pants went unused the entire trip.

Hiking Clothes

  • Hiking Pants
  • 2 short sleeve shirt (preferably a “technical” shirt with synthetic fibers to help keep you dry)
  • 2 pairs of socks (in case one pair gets wet during the day, you can change socks)
  • upper and lower base layer
  • Down jacket
  • Sun hat
  • warm hat
  • Gloves (look for ones with Gore-Tex)
  • Raincoat
  • Rainpants
  • gaiters
  • Hiking Boots Water proof
  • Sunglasses
  • running shoes/flip flops

Gear

  • Backpack
  • Phone & Cord (doubles as camera)
  • Map
  • Sleeping bag
  • Water purification tabs
  • Water bladder (2 liter size)
  • Headlamp
  • Pack rain cover
  • Passport & Permits (you’ll be asked for these at every checkpoint)
  • Cash –
  • Ziplock bags for organizing/waterproof

Toiletries

Toothpaste, soap and TP can be bought along the way if you want.

  • Toilet Paper
  • Soap
  • Toothbrush
  • Toothpaste
  • Sunscreen
  • Lip balm with SPF
  • nail trimmer
  • ear plugs (tea houses can be LOUD)
  • First aid kit with at least: band aids, moleskin, ibuprofen, antidiarrheal, tweezers

Optional

  • Power Plug adapter (just about every tea house had a universal outlet that fit every wall charger)
  • Pillowcase
  • Trekking poles
  • Real Camera & Extra battery

You’ll want a small pack – no bigger than 40L likely – to fit this in. That big-ass “I fit it all on my back” pack that you carry your tent, stove, pots and pans and food for a week is too big! Here’s my pack (foot for scale) and a day pack I checked at my hotel in Kathmandu with a change of clothes and other stuffs I didn’t want to bring with:

Laser cut Spirograph and gift box

0 minutes, 35 seconds

For E’s birthday two months ago I made him a laser cut spirograph. It works pretty well and I really enjoyed making it! I tried downloading some existing plans for them, but they didn’t mesh well. I ended up using Inkscape to draw the gears. That tutorial makes them look pretty for an illustration, but they’re real gears and the mesh well! Not perfect, but well enough. For the case I started with a basic box-joint box from the ever awesome MakerCase, and then used Inkscape to add the florishes.

If you want to make one yourself, here’s the .svg of the two inner and outer gears. If you want to make the case, here’s the .svg for that too.

Otherwise, enjoy the fabrication and final product pictures:

Poor man’s debugging high load on an Apache server

1 minute, 58 seconds

Hopefully you have a fancy solution to monitor your logs, but if you don’t, then I have a little treat for you. At work today I was trouble shooting why our web server would spike up to a load average of over 200 (I’d never actually seen a quad core server go much over 40 – neat!).  At first I stumbled around looking for non-obvious causes and then I just looked at the traffic.

I was pretty sure a single IP was slamming our server at midnight UTC.  To verify this I wrote a quick bash script to log the load averages, the count of the top process and it’s name, and finally, the count and IP of the top visitor over the prior minute.  It’s 13 lines of actual code and more than double that with comments.

By executing this in a crontab entry every minute, we get nice easy to read logs (and easy for a computer to parse):

* * * * * /root/apachetop.load.ps.sh >> /var/log/httpd/apachetop.load.ps_log

The resulting log entry has the following handy fields:

  • LOAD_1 – load average for past 1 minute\
  • LOAD_5 – load average for past 5 minutes
  • LOAD_15 – load average for past 15 minutes
  • APACHE_CNT – Count of top IP in the last minute
  • APACHE_IP – Top IP in the last minute
  • APACHE_DIR – Top Directory being hit by IP
  • PROCESS_CNT – Count of top process
  • PROCESS – Process name for count

A sample log entry looks like this:

30/Nov/2017:07:24 LOAD_1="1.60" LOAD_5="1.50" LOAD_15="1.35" APACHE_CNT="5" APACHE_IP="127.0.0.1" APACHE_DIR="/resources" PROCESS_CNT="25" PROCESS="/usr/sbin/httpd"

Otherwise, here’s the code from my gist, but please check it out on GitHub!

#!/usr/bin/env bash
date=`date --date='1 minutes ago' +%d/%b/%Y:%R`
apache=`grep $date /var/log/httpd/ssl_access_log |grep -v '::1'|cut -d' ' -f1,6,7|cut -d'/' -f 1,2|sort|uniq -c|sort -nr|head -n 1|tr '"' ' '|xargs`
apache_count=`echo $apache|cut -d' ' -f 1`
apache_IP=`echo $apache|cut -d' ' -f 2`
apache_dir=`echo $apache|cut -d' ' -f 4`
load1=`uptime|cut -d' ' -f14|tr ',' ' '|xargs`
load5=`uptime|cut -d' ' -f15|tr ',' ' '|xargs`
load15=`uptime|cut -d' ' -f16|tr ',' ' '|xargs`
process=`ps xa|cut -d':' -f 2|cut -d' ' -f 2|sort|uniq -c|sort -rn|head -n 1|tr '\n' ' '|xargs`
process_count=`echo $process|cut -d' ' -f 1`
process_name=`echo $process|cut -d' ' -f 2`
echo $date LOAD_1=\"$load1\" LOAD_5=\"$load5\" LOAD_15=\"$load15\" APACHE_CNT=\"$apache_count\" APACHE_IP=\"$apache_IP\" APACHE_DIR=\"$apache_dir\" PROCESS_CNT=\"$process_count\" PROCESS=\"$process_name\"

From zero to LXD: Installing a private compute cloud on a Cisco C220 M4SFF

17 minutes, 42 seconds

Introduction

Having some nice server grade hardware on hand, I knew I needed to put it to good use.  I’m talking about the Cisco C220 M4SFF:

This lil’ guy comes with some nice specs. The one I have is configured thusly:

  • 1U
  • 64GB RAM
  • 8 x 240GB SSD
  • 2 x 10 core 2.4Ghz E5-2640 v4 Xeon  (40 threads)
  • 2 x PSU
  • Cisco 12G SAS RAID controller (more on this later)
  • 2 x 1Gbit on-board NIC

While having only 2 of his 24 RAM slots full, we can make do with this – besides, filling it up to 1.5TB of RAM will cost $14,400 – ouch! Specifically, I think we can make this into a nice host of lots and lots of virtual machines (VMs) via a modern day hypervisor.

Just 4 steps!!

Looking about at what might be a good solution for this, I cast my eyes upon LXD. I’m already a huge fan of Ubuntu, the OS LXD works best with. As well, with 2.0, Canonical is really pumping up LXD as a viable, production worthy solution for a running your own cloud compute. Given that 2.0 now ships with Ubuntu 16.04, I figured I’d give it for a spin.

I’m guessing a techie at Canonical got a hold of a marketing person at Canonical; they realized that with LXD baked into Ubuntu, anyone comfy on the command could line run 4 steps to get an instant container.  This meant that while there’s a lot more complexities to getting a LXD running in a production environment (LAN bridge, ZFS Pools, ZFS Caches, JBOD on your RAID card, static IPs dynamically assigned, all covered below), it was indeed intensely satisfying to have these 4 steps work exactly as advertised.  Sign me up!

But what about the other options?  Did I look at more holistic approaches like OpenStack?  Or even the old, free as in beer vSphere? What about the up and coming SmartOS (not to be confused with the unrelated TrueOS or PureOS)? Given the timing of the demise of SmartOS’s roots in Solaris, maybe it’s time to pay my respects? Finally, there is of course KVM.

While I will still will take up my friend’s offer to run through an OpenStack install at a later date, I think I’m good for now.

This write up covers my learning how to provision my particular bare metal, discovering the nuances of ZFS and finally deploying VMs inside LXD.  Be sure to see my conclusion for next steps after getting LXD set up.

Terms and prerequisites

Since this post will cover a lot of ground, let’s get some of the terms out of the way. For experts, maybe skip this part:

  • Ubuntu – the host OS we’ll install directly onto the C220, specifically 16.04 LTS
  • LXD – say “Lex-dee” (and not this one) – the daemon  that controls LXC. LXD  ships with 16.04 LTS. All commands are lxc
  • LXC –  Linux Containers – the actual hypervisor under LXD. Yes, it’s confusing – even the man says so ;)
  • container – interchangeably used with “VM”, but it’s a secure place to run a whole instance of an OS
  • ZFS – the file system that allows for snapshots, disk limits, self healing arrays, speed and more.
  • JBOD/RAID – Two different disk configurations we’ll be using.
  • CIMC – Cisco Integrated Management Interface which is a out of band management that runs on Cisco server hardware. Allows remote console over SSL/HTML.
  • Stéphane Graber – Project leader of LXC, LXD and LXCFS at Canonical as well as an Ubuntu core developer and technical board member. I call him out because he’s prolific in his documentation and you should note if you’re reading his work or someone else’s.
  • KVM – “Keyboard/Video/Mouse” – The remote console you can access via the CIMC via a browser. Great for doing whole OS installs or for when you bungle your network config and can no longer SSH in.
  • KVM – The other KVM ;)  This is the Kernel Virtual Machine hypervisor, a peer of LXC. I only mention it here for clarification.  Hence forth I’ll always be referencing the Keyboard/Video/Mouse KVM, not the hypervisor.

Now that we have the terms out of the way, let’s give you some home work.  If you had any questions, specifically about ZFS and LXD, you should spend a couple hours on these two sites before going further:

  • Stéphane Graber’s The LXD 2.0: Blog post series – These are the docs that I dream of when I find a new technology: concise, from the horses mouth, easy to follow and, most of all, written for the n00b but helpful for even the most expert
  • Aaron Toponce’s ZFS on Debian GNU/Linux – while originally authored back on 2012, this series of posts on ZFS is canon and totally stands the test of time.  Every other blog post or write up I found on ZFS that was worth it’s salt, referenced Aaron’s posts as evidence that they knew they were right.  Be right too, read his stuff.

Beyond that, here’s a bunch of other links that I collected while doing my research. I will call out that Jason Bayton’s LXD, ZFS and bridged networking on Ubuntu 16.04 LTS+ gave me an initial burst of inspiration that I was on the right track:

LXD Info:

ZFS Info

Prep your hardware

Before starting out with this project, I’d heard a lot of bad things about getting Cisco hardware up and running.  While there may indeed be hard-to-use proprietary jank in there somewhere, I actually found the C220M4 quite easy to use.  The worst part was finding the BIOS and CIMC updates, as you need a Cisco account to download them.  Fortunately I know people who have accounts.

After you download the .iso, scrounge up a CD-R drive, some blank media and  burn the .iso. Then, plug in a keyboard, mouse and CD drive to your server, boot from it with the freshly burned disk, and upgrade your C220. Reboot.

Then you should plug the first NIC into a LAN that your workstation is on that has DHCP.  What will happen is that the CIMC will grab an address and show it to you on the very first boot screen.

You see that 10.0.40.51 IP address in there (click for large image)?  That means you can now control the BIOS via a KVM over the network.  This is super handy. Once you know this IP, you can point your browser to the c220 and never have to use a monitor, keyboard or mouse directly connected to it.  Handy times!  To log into the CIMC the first time, default password is admin/password (of course ;).

The upgraded CIMC looks like this:

I’ve highlighted two items in red: the first, in the upper left, is the secret, harder-to-find-than-it-should-be menu of all the items in the CIMC.  The second, is how to to get to the KVM.

For now, I keep the Ubuntu 16.04 install USB drive plugged into the C220.  Coupled with the remote access to the CIMC and KVM, this allows to me to easily re-install on the bare metal, should anything go really bad.  So handy!

While you’re in here, you should change the password and set the CIMC IP to be static so it doesn’t change under DHCP.

Prep your Disks

Now it’s time to set up the RAID card to have two disks be RAID1 for my Ubuntu boot drive, and the rest show up as JBOD for ZFS use.  When accessing the boot process, wait until you see the RAID card prompt you and hit ctrl+v.  Then configure 2 of your 6 drives as a RAID1 boot drive:

And then expose the rest of your disks as JBOD.  The final result should look like this:

Really, this is too bad that this server has a RAID card.  ZFS really wants to talk to the devices directly and manage them as closely as possible.  Having the RAID card expose them as JBOD isn’t ideal.  The ideal set up would be to have a host bus adapter (HBA) instead of the raid adapter. That’d be a Cisco 9300-8i 12G SAS HBA for my particular hardware. So, while I can get it to work OK, and you often see folks set up their RAID cards as JBOD just like I did, it’s sub-optimal.

Install Ubuntu 16.04 LTS + Software

F6 to select boot drive

As there’s plenty of guides on how to install Ubuntu server at the command line, I won’t cover this in too much detail.  You’ll need to:

  1. Download 16.04 and write it to a USB drive
  2. At boot of the C220, select F6 to select your USB drive as your boot device (see right)
  3. When prompted in the install processtt, select the RAID1 partition you created above.  For me this was /dev/sdg, but will likely be different for you. I used LVM with otherwise default partitions.
  4. Set your system to “Install security updates automatically” when prompted with “Configuring tasksel
  5. For ease of setup, I suggest selecting “OpenSSH Server” when prompted for “Software select”. This way we won’t have to install and enable it later.
  6. Finally, when you system has rebooted, login as your new user and you can install the core software we’ll need for the next steps:
sudo apt-get install lxd zfsutils-linux bridge-utils
sudo apt install -t xenial-backports lxd lxd-client

ZFS

ZFS is not for the faint of heart.  While the LXD can indeed be run on just about any Ubuntu 16.04 box and the default settings will just work, and I do now run it on my laptop regularly, getting a tuned ZFS install was the hardest part for me.  This may be because I have more of a WebDev background and less of a SysAdmin background.  However, if you’ve read up on Aaron Toponce’s ZFS on Debian GNU/Linux’s posts, and you read my guide here, you’ll be fine ;)

After reading and hemming and hawing about which config was best, I decided that getting the most space possible was most important.  This means I’ll allocate the remaining 6 disks to the ZFS pool and not have a hot spare.  Why no hot spare?  Three reasons:

  1. This isn’t a true production system and I can get to it easily (as opposed to an arduous trip to a colo)
  2. The chance of more than 2 disks failing at the same time seems very low (though best practice says I should have different manufacturer’s batches of drives – which I haven’t checked)
  3. If I do indeed have a failure where I’m nervous about either the RAID1 or RIADZ pool, I can always yank a drive from one pool to another.

Now that I have my 6 disks chosen (and 2 already running Ubuntu), I reviewed the RAID levels ZFS offers and choose RAIDZ-2 which is, according to Aaron, “similar to RAID-6 in that there is a dual parity bit distributed across all the disks in the array. The stripe width is variable, and could cover the exact width of disks in the array, fewer disks, or more disks, as evident in the image above. This still allows for two disk failures to maintain data. Three disk failures would result in data loss. A minimum of 4 disks should be used in a RAIDZ-2. The capacity of your storage will be the number of disks in your array times the storage of the smallest disk, minus two disks for parity storage.”

In order to have a log and cache drive (which can be the same physical disk), I’ll split it so 5 drives store data and 1 drive with two partitions store my log and cache. Read up those log and cache links to see how these greatly improve ZFS performance, especially if your data drives are slow and you have a fast cache drive (SSD or better).

To find the devices which we’ll use in our pool, let’s first see a list of all devices from dmesg:

sudo dmesg|grep \\[sd|grep disk
[ 17.490156] sd 0:0:0:0: [sdh] Attached SCSI removable disk
[ 17.497528] sd 1:0:9:0: [sdb] Attached SCSI disk
[ 17.497645] sd 1:0:8:0: [sda] Attached SCSI disk
[ 17.499144] sd 1:0:10:0: [sdc] Attached SCSI disk
[ 17.499421] sd 1:0:12:0: [sde] Attached SCSI disk
[ 17.499487] sd 1:0:11:0: [sdd] Attached SCSI disk
[ 17.499575] sd 1:0:13:0: [sdf] Attached SCSI disk
[ 17.503667] sd 1:2:0:0: [sdg] Attached SCSI disk

This shows us that sdh is our USB drive (“removable”) and that the rest are the physical drives.  But which is our boot drive comprised of a RAID1 volume?  It’s the one mounted with an ext4 file system:

sudo dmesg|grep mount|grep ext4
[ 21.450146] EXT4-fs (sdg1): mounting ext2 file system using the ext4 subsystem

So now we know that sdb, sda, sdc, sde, sdd and sdf are up for grabs. The hard part with ZFS is just understanding the full scope and impact of how to set up your disks.  Once you decide that, things become pleasantly simple.  Thanks ZFS!  To set up our RAIDZ-2 pool of 5 drives and enable compression to it is just these two command:

sudo zpool create -f -o ashift=12 lxd-data raidz1 sda sdb sdc sdd sde
sudo zfs set compression=lz4 lxd-data

Note that there’s no need to edit the fstab file, ZFS does all this for you. Now we need to create our log and cache partitions on the remaining sdf disk.  First find out how many blocks there are on your drive, 468862128 in my case, using parted:

parted /dev/sdf 'unit s print'|grep Disk|grep sdf
 Disk /dev/sdf: 468862128s

Aaron’s guide suggests we can make both partitions in one go with this command, which will make a 4GB log partition and use the remainder to make a ~230GB cache partition:

parted /dev/sdf unit s mklabel gpt mkpart log zfs 2048 4G mkpart cache zfs 4G 468862120

However, this didn’t work for me. I had to go through the parted using the interactive mode. Run this twice starting both times with the sdf device:

parted /dev/sdf unit s

I forgot to record this session when I formatted my partitions, but you should end up with this when you’re done:

sudo parted /dev/sdf print
Model: ATA SAMSUNG MZ7LM240 (scsi)
Disk /dev/sdf: 240GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start  End    Size   File system Name Flags
 2     1049kB 3999MB 3998MB zfs         log
 1     4000MB 240GB  236GB  zfs         cache

OK – almost there!  Now, because linux can mount these partitions with different device letters (eg sda vs sdb), we need to use IDs instead in ZFS.  First, however, we need to find the label to device map:

sudo ls -l /dev/disk/by-partlabel/
 total 0
 lrwxrwxrwx 1 root root 10 Sep 15 11:20 cache -> ../../sdf2
 lrwxrwxrwx 1 root root 10 Sep 15 11:20 log -> ../../sdf1

Ok, so we’ll use sdf1 as log and sdf2 as cache.  Now what are they’re corresponding IDs?

sudo ls -l /dev/disk/by-id/|grep wwn|grep sdf|grep part
 lrwxrwxrwx 1 root root 10 Sep 15 11:20 wwn-0x5002538c404ff808-part1 -> ../../sdf1
 lrwxrwxrwx 1 root root 10 Sep 15 11:20 wwn-0x5002538c404ff808-part2 -> ../../sdf2

Great, now we know that wwn-0x5002538c404ff808-part1 will be our log and wwn-0x5002538c404ff808-part2 will be our cache.  Again, ZFS’s commands are simple now that we know what we’re calling.  Here’s how we add our cache and log:

zpool add lxd-data log /dev/disk/by-id/wwn-0x5002538c404ff808-part2
 zpool add lxd-data cache /dev/disk/by-id/wwn-0x5002538c404ff808-part1
 wwn-0x5002538c404ff808-part2 log
 wwn-0x5002538c404ff808-part1 cache

Now we can confirm our full ZFS status:

sudo zpool iostat -v
                                   capacity     operations    bandwidth
pool                            alloc   free   read  write   read  write
------------------------------  -----  -----  -----  -----  -----  -----
lxd-data                        1.37G  1.08T      0     16  3.26K  56.8K
  raidz1                        1.37G  1.08T      0     16  3.25K  56.5K
    sda                             -      -      0      4  1.04K  36.5K
    sdb                             -      -      0      4  1.03K  36.5K
    sdc                             -      -      0      4  1.01K  36.5K
    sdd                             -      -      0      4  1.02K  36.5K
    sde                             -      -      0      4  1.03K  36.5K
logs                                -      -      -      -      -      -
  wwn-0x5002538c404ff808-part2   128K  3.72G      0      0      4    313
cache                               -      -      -      -      -      -
  wwn-0x5002538c404ff808-part1   353M   219G      0      0      0  20.4K
------------------------------  -----  -----  -----  -----  -----  -----

Looking good!  Finally, we need to keep our ZFS pool healthy by scrubbing regularly.  This is the part where ZFS self heals and avoids bit rot. Let’s do this with a once per week job in cron:

0 2 * * 0 /sbin/zpool scrub lxd-data

Networking

Now, much thanks to Jason Bayton’s excellent guide, we know how to have our VMs get IPs in on LAN instead of being NATed.  Right now my NIC is enp1s0f0 and getting an IP via DHCP.  Looking in  /etc/network/interfaces I see:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto enp1s0f0
iface enp1s0f0 inet dhcp

But to use a bridge (br0) and give my host the static IP of 10.0.40.50, we’ll make that file look like this:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto br0 
iface br0 inet static
       address 10.0.40.50
       network 10.0.40.0
       netmask 255.255.255.0
       gateway 10.0.40.1
       broadcast 10.0.40.255
       dns-nameservers 8.8.8.8 8.8.4.4
       bridge_ports enp1s0f0
iface enp1s0f0 inet manual

This will allow the VMs to use br0 to natively bridge up to enp1s0f0 and either get a DHCP IP from that LAN or be assigned a static IP. In order for this change to take effect, reboot the host machine.  Jason’s guide suggests running ifdown and ifup, but I found I just lost connectivity and only a reboot would work.

When you next login, be sure you use the new, static IP.

 

Set file limits

While you can use your system as is to run containers, you’ll really want to updated per the LXD recommendations.  Specifically, you’ll want to allow for a lot more headroom when it comes to file handling.  Edit sysct.conf:

sudo vim /etc/sysctl.conf

Now add the following lines, as recommended by the LXD project:

fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576

As well, edit limits.conf

sudo vim /etc/security/limits.conf

Now add the following lines. 100K should be enough:

* soft nofile 100000
* hard nofile 100000

LXD

Now that we have our bare metal provisioned, we have our storage configured, file limits increased and our network bridge in place, we can actually get to the virtual machine part, the LXD part, of this post. Heavily leveraging again Jason Bayton’s still excellent guide, we’ll initialize LXD by running lxd init. This go through the first time LXD guide and ask a number of questions about how you want to run LXD.  This is specific to 2.0 ( >2.1 has different questions).  You’ll see below, but the gist of it is that we want to use our ZFS pool and our network bridge:

lxd init
 -----
 Name of the storage backend to use (dir or zfs) [default=zfs]:
 Create a new ZFS pool (yes/no) [default=yes]? no
 Name of the existing ZFS pool or dataset: lxd-data
 Would you like LXD to be available over the network (yes/no) [default=no]? yes
 Address to bind LXD to (not including port) [default=all]:
 Port to bind LXD to [default=8443]:
 Trust password for new clients:
 Again:
 Do you want to configure the LXD bridge (yes/no) [default=yes]?

Note that we don’t accept the ZFS default and specify our own ZFS pool lxd-data. When you say “yes”, you want to configure the bridge, you’ll then be prompted with two questions.  Specify br0 when prompted. Again, refer to Jason Bayton’s guide for thorough screen shots:

Would you like to setup a network bridge for LXD containers now? NO
Do you want to use an existing bridge? YES
Warning: Stopping lxd.service, but it can still be activated by:
 lxd.socket
 LXD has been successfully configured

Success!  LXD is all set up now.  Let’s prep the host OS for a bunch of open files now.

Hello world container

Whew!  Now that all the hard parts are done, we can finally create our first container. In this example, we’ll be creating a container called  nexus and put some basic limitations on it. Remember, do not create these as root! One of the massive strengths of LXD is that it’s explicitly designed to be secure and running containers as root would remove a lot of this security. We’ll call lxc init and then pass some raw commands to set the IP to be .52 in our existing /24 subnet on the bridge. If you run lxc list our new container shows up. That all looks like this:

lxc init ubuntu: nexus
echo -e "lxc.network.0.ipv4 = 10.0.40.52/24\nlxc.network.0.ipv4.gateway = 10.0.40.1\n" | lxc config set nexus raw.lxc -
lxc start nexus

lxc list
+-------+---------+-------------------+------+------------+-----------+
| NAME  | STATE   | IPV4              | IPV6 | TYPE       | SNAPSHOTS |
+-------+---------+-------------------+------+------------+-----------+
| nexus | RUNNING | 10.0.40.52 (eth0) |      | PERSISTENT | 0         |
+-------+---------+-------------------+------+------------+-----------+

Finally, we want to limit this container to have 4 CPUs, have 4GB of RAM and 20GB of disk.  Like before, these commands are not run as root. They are imposed on the container in real time and do not require a restart.  Go LXD, go!

lxc config set nexus limits.cpu 4
lxc config set nexus limits.memory 4GB
lxc config device set nexus root size 20GB

You’re all done!  From here you can trivially create tons more containers.  You can let them have an ephemeral IP via DCHP on the bridge or easily set a static IP. If you have a resource intensive need, you don’t set the limits above, the container will have access to the full host resources.  That’d be 40 cores, 64GB RAM and 1TB of disk. If you created more VMs with out resource limiting, LXC would do the balancing of resources so that each container gets it’s fair share.  There’s a LOT more configuration options available.  As well, even the way I declared a static IP can likely be done via dnsmasq, (see this cyberciti.biz article), but I had trouble getting that to work on my setup, hence the raw calls.

Next steps

Now that you you’ve bootstrapped your bare metal, dialed in your storage back end, specifically deployed your LAN, you should be all set, right?  Not so fast! To make this more of a production deployment, you really need to know how (and practice!) your backup and restore procedures.  This likely will involve the snapshot command (see post 3 of 12) and then backing those snapshots off of the ZFS pool. Speaking of the ZFS pool, our set up as is doesn’t have any alerting if a disk goes bad.  There’s many solutions out there for this – I’m looking at this bash/cron combo.

Having a more automated system to provision containers integrated with a more DevOps-y set up makes sense here. For me this might mean using Ansible, but for you it might be something else!  I’ve heard good things about cloud-init and Juju charms. As well, you’d need a system to monitor and alert on resource consumption.  Finally, a complete production deployment would need at least a pair, if not more, of servers so that you can run LXD in a more highly available setup.

Given this is my first venture into running LXD, I’d love any feedback from you!  Corrections and input are welcome, and praise too of course if I’ve earned it ;) Thanks for reading this far on what may be my longest post ever!

Punk Rock Band Name: Desiccated Ceiling Rabbit

0 minutes, 24 seconds

I was over at my friends house  and they have kids.  One of their kids’ favorite games is to take their sticky, stretchy toy (like these, but rabbit shaped) and throw it up on the ceiling.  Their son has a patented lick-and-stuff-under-your-armpit-for-5-seconds technique which imbues just the right amount of moisture. It’s amazing.

However, before i saw his patented technique, and before even knowing they had the toys, I noticed something was stuck up on the ceiling and asked what it was.

Desiccated Ceiling Rabbit

Was the answer I got back!

Dog Poop Reminder

2 minutes, 18 seconds

This is a truly free idea idea, like my other one, but not like the category at large which has a lot of open source stuff (also free!).

Often when I’m out for a trail run or for a hike with the fam, I see bags of dog poop. Like, right there on the side of the trail, some one saw their dog poop, they pulled a bag out of their pocket, and then they put the stinky poop in the bag. Super nice of them! Then, for some reason, they put the bag of poop on the side of the trail. (Now that I’ve mentioned this to you, I’m sure you’ll see a lot of these.)

But why leave it trailside?!? Why not take it to a trash can and throw it away? Presumably, this is because they don’t want to hike for 2 hours holding a bag of poop. So they think, “I’ll just leave it here and then remember to pick it up on the way back”. Then they forget.

So, what they need is a reminder! This would start out as super simple app for you phone. You launch the app and it immediately opens your camera. You take a pic of yer dog’s poop, and off you go. When you took that picture, the app also noted your GPS coordinates. Then, when you get with in say 100ft of the poop an alarm goes of, “PICK UP YOUR DOG’S POOP!” and shows you the picture you took to help you remember where/which poop it is.

Subsequent versions could implement:

  • Time out reminder – This will to go off N minutes after you leave the poop. Say 4 hours, which assumes that you didn’t go back and pick it up. This alert could also be tied to the fact that you didn’t go back to the GPS poop spot.
  • Multi-dog poop awareness – When the app launches it shows you a picture of all your dogs. You pick the dog which just pooped and then take a pic. This way you can have multiple poops in one trip and know which dog did what.
  • Poop Analysis – The app could alert you if your dog hasn’t pooped on your walks or something. And, you know, if you wanna be like all the cool kids you could do real time poop analysis using Tensor Flow. As well, if you do the same walk every day, you could heat map where your dog is most likely to go.
  • Poop Pickup Up-sell – Now that you have GPS coordinates of your pet’s poop, just go pay some one to pick it up for you with a click!
  • Poop Points – I’m not sure what they’d be worth, but you could social network this bad boy and people could get poop points for picking up other people’s dog’s poop. Certainly the gamification might draw people, but otherwise, I dunno what you’d redeem these points for

Go forth and make this app! The idea is on me, for free.

Otherwise, if you’re a dog owner, please, please pick up that poop.

Punk Rock Band Name: bad pig

0 minutes, 18 seconds

Just after getting cleaned in the bath, my son was going on and on about a book he read at school.  Likely, “The Three Little Wolves and the Big Bad Pig“, but I could be wrong.

Anyway, at one point he said you should write “bad pig” all over your body. To which I mentally added, “and go on stage half naked to perform in your band titled,”:

bad pig