DevOps Bag-of-Tricks

Table of Contents

Every blog post I've tried to write in the last year has started from a simple idea and gradually expanded in scope until I'm left with a sprawling epic that I know no one will ever read and I will never bring myself to publish. So here's something different. This isn't so much a blog post as a wiki entry. I'm allowing myself to edit it at any time, which alleviates some of my publishing-related anxiety.

Here are my notes to myself on how to play sysadmin in a cloud environment.

1 Inspecting a machine

Sometimes you'll SSH in to an unfamiliar environment and need to get the lay of the land quickly in order to start debugging. We'll start with a 1000-foot view before zooming in.

1.1 Who else is here?

Whenever I get paged because of an anomalous resource usage pattern on a box (e.g., the disk is nearly full, CPU or memory usage are spiking, etc.), the first thing I do is use w to check for humans doing silly things.

w
 07:05:33 up 1 day, 5 min,  3 users,  load average: 0.16, 0.28, 0.52
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
astahlma tty7     :0               Mon07   24:04m  2:15   0.28s /usr/bin/lxsession -s LXDE -e LXDE
astahlma pts/2    tmux(3208).%0    Mon07   22:43m  1:13   2.80s -zsh
astahlma pts/5    tmux(3208).%1    Mon08   22.00s  0.72s  0.72s -zsh

Thankfully I'm all by my lonesome here on my laptop.

1.2 Which are the most expensive processes?

top is the most popular tool for getting a high-level overview of system resource consumption for each process. But if you have sysadmin access on the machine then you should install htop, which is like top's attractive and more capable cousin.

Here's top (running on my laptop):

Screencast of top
Boring ol' top

I know that I can select which resource to sort by, only because I bothered to read the man page. But that's about as fancy as top gets.

In contrast, here's htop. I have never consulted the man page for htop, because I've never felt the need to. The UI is intuitive and holds your hand every step of the way by giving you a menu of available keys. It even has a built-in help section.

Screencast of htop
htop!

In case that's hard to follow, here's a quick breakdown of everything happening in that .gif:

  1. The initial view is a flat view of processes sorted by CPU usage, which is the default. Notice that the memory usage, swap usage, and the load on each CPU is charted in the top-left corner.
  2. I press <F5>, which switches the display from a flat list of processes sorted by resource usage to a tree view, which displays the full process tree rooted at PID 1.
  3. I press <F4> to filter, then start typing "firefox" to incrementally filter the process tree. I'm left with only firefox and all of its descendants.
  4. I press the "s" key to begin tracing system calls for the firefox process with strace (more on that below).
  5. I escape the strace session and press the "l" key to view all of firefox's open file handles with lsof (more on that later, too).
  6. I press <F4> to filter the list of file handles owned by the firefox process, then start typing "log" to match only files whose name contains that string.

Amazing, huh?

I get it, breaking up is hard to do. But don't cry over the good times you and top had together - smile because they happened, and then move on with htop.

2 Inspecting individual processes

Now you've narrowed your focus to a specific process. Here are a few techniques to see how it interacts with its environment.

2.1 Where are the logs for this process?

I've got Spotify running on my laptop. Let's see if it does any logging…

ps -f --ppid 1 | grep "Spotify.app"
501    684      1  0  1:30PM  ??  1:22.77  /Applications/Spotify.app/Contents/MacOS/Spotify       -psn_0_73746                                   |                                                  |                                         |                                                                 |                        |                                                                 |                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                 |                                          |                                      |                              |                                                                  |                                       |                                                                  |                        |

export PID=684

If the process is writing to a log file then it must have a file handle open for writing.

lsof -p $PID | grep -i log
Spotify 684 andrewstahlman 88w REG 1 5 187 8598375040 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/Local Storage/leveldb/LOG
Spotify 684 andrewstahlman 91w REG 1 5 56441 8598347694 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/Local Storage/leveldb/000539.log
Spotify 684 andrewstahlman 94w REG 1 5 297 8598375070 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/LOG
Spotify 684 andrewstahlman 99w REG 1 5 0 8594628670 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/000003.log

Let's examine only regular files opened in write-mode.

lsof -p $PID | perl -lane '$F[3] =~ m/[0-9]+w/ && print'
Spotify 684 andrewstahlman 88w REG 1 5 187 8598375040 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/Local Storage/leveldb/LOG
Spotify 684 andrewstahlman 90w REG 1 5 44698 8590792313 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/Local Storage/leveldb/MANIFEST-000001
Spotify 684 andrewstahlman 91w REG 1 5 56441 8598347694 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/Local Storage/leveldb/000539.log
Spotify 684 andrewstahlman 94w REG 1 5 297 8598375070 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/LOG
Spotify 684 andrewstahlman 98w REG 1 5 41 8594628668 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/MANIFEST-000001
Spotify 684 andrewstahlman 99w REG 1 5 0 8594628670 /Users/andrewstahlman/Library/Caches/com.spotify.client/Browser/000003.log
Spotify 684 andrewstahlman 123w REG 1 5 1472 8598375664 /Users/andrewstahlman/Library/Saved Application State/com.spotify.client.savedState/windows.plist

Yep, those are log files. Let's check out those leveldb logs. Side note: leveldb is a library implementation of a kv-store. It looks like Spotify is using it to store some data locally on disk. Here's some ad-targeting configuration it's keeping on me:

cat "~/Library/Caches/com.spotify.client/Browser/Local Storage/leveldb/000544.log" | \
LC_ALL=C sed 's/\\n/\
/g' | perl -lne 'print if /var ad/ .. /}/'
var adMetadata = {
        ...
        targetingParams: {
          'country': 'us',
          'historicgenre': 'classical',
          'gender': 'male',
          ...
          'abtest': 'ads-preroll-mvto-ss_Control,ad-sponsored-playlist-dw_Control,premium_18q3_audio_assertive_android_cta_test_learn_more_companion,2018q3_premium_quicksilver_student_dualcta_us_Small-CTA,premium_18q1_evergreen_showcase_frequency_month_long_Control,premium_18q1_banner_creative_refresh_test_banner_control,premium_18q3_dynamicupsell_webcopytest_landing-page,ad_exp_5tile_3,adgen_employee_testing_Control,2018q3_premium_personalization_quicksilver_test,premium_18q3_quicksilver_survey_holdout_treatment,2018q3_premium_quicksilver_hulumm_IOprice_us_IO-CO,premium_17q3_ads_creative_refresh_Control,ads_programmatic_banner_exposed,premium_18q3_quicksilver_falcon3_experiment1_Control,2018q3_premium_latam_winback_treatment,iam_marquee_holdout_1percent_Marquee,premium_18q1_audio_creative_quantity_test_audio_holdout,Holiday_2017_Treatment,2018q1_premiumbusiness_dual_offer_US_dual_subject_dual_body,ad-sponsored-playlist-dw2_test,ad-logic-faux-real_Control,ad-logic-skipton-model-test_modified-window,premium_18q2_summer_holdouts_email_upsell_showcase_quicksilver_filler_guaranteed_perf,premium_18q3_quicksilver_asiaprepaid_holdout_Control,ads-video-events-container_Enabled,premium_18q2_evergreen_showcase_creative_variation_Control,premium_18q4_quicksilver_jp_upsell_holdout_treatment,ads_p_video_exposed,premium_18q2_showcase_artist_marketing_holdout_test_Artist-imagery,ads_adserver_alpha_test_Control,ad_mvt_prog_all,premium_18q1_audio_creative_refresh_test_2_New_Character,adserver-first_test,dummy_ss_test_Exposed,ad-betamax-video_On',
          'streamtimebetweenadbreaks': '810',
          'upsellproduct': 'premium',
          'lang': 'en',
          'client_version': 'desktop_1.0.90',
          'age_pr': '26',
          'product': 'premium',
          ...
        }

Looks like I'm in the control group for the "premium_18q1_evergreen_showcase_frequency_month_long" experiment, whatever that is.

2.2 What are its environment variables?

It's sometimes useful to inspect a process' environment variables to verify that it has been launched with the correct $PATH, secret keys, log configuration, etc. Environment variables are exposed via the /proc filesystem on Linux, so you can read the (null-separated) contents of /proc/$pid/environ like any other text files.

For example, we can inspect the environment variables used by firefox.

cat /proc/$(pidof -s firefox)/environ | tr '\0' '\n' | egrep -v "GDM|GTK|LD_" | head -15
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
DEFAULTS_PATH=/usr/share/gconf/LXDE.default.path
DESKTOP_SESSION=LXDE
DISPLAY=:0.0
HOME=/home/astahlman
IM_CONFIG_PHASE=1
LANG=en_US.UTF-8
LANGUAGE=en_US
LOGNAME=astahlman
MANDATORY_PATH=/usr/share/gconf/LXDE.mandatory.path
MESA_GLSL_CACHE_DIR=/tmp/Temp-f946a23f-b9ea-4f12-849e-44da804f4e58
MOZ_ASSUME_USER_NS=1
MOZ_CRASHREPORTER_DATA_DIRECTORY=/home/astahlman/.mozilla/firefox/Crash Reports
MOZ_CRASHREPORTER_EVENTS_DIRECTORY=/home/astahlman/.mozilla/firefox/uibwrxsw.dev-edition-default/crashes/events
MOZ_CRASHREPORTER_PING_DIRECTORY=/home/astahlman/.mozilla/firefox/Pending Pings

Let's filter that down to variables prefixed with "MOZ_"

cat /proc/$(pidof -s firefox)/environ | tr '\0' '\n' | grep "MOZ_"
MOZ_ASSUME_USER_NS=1
MOZ_CRASHREPORTER_DATA_DIRECTORY=/home/astahlman/.mozilla/firefox/Crash Reports
MOZ_CRASHREPORTER_EVENTS_DIRECTORY=/home/astahlman/.mozilla/firefox/uibwrxsw.dev-edition-default/crashes/events
MOZ_CRASHREPORTER_PING_DIRECTORY=/home/astahlman/.mozilla/firefox/Pending Pings
MOZ_CRASHREPORTER_RESTART_ARG_0=/home/astahlman/tools/firefox/firefox
MOZ_CRASHREPORTER_RESTART_ARG_1=
MOZ_CRASHREPORTER_STRINGS_OVERRIDE=/home/astahlman/tools/firefox/browser/crashreporter-override.ini
MOZ_LAUNCHED_CHILD=
MOZ_PROFILER_STARTUP=
MOZ_PROFILER_STARTUP_ENTRIES=
MOZ_PROFILER_STARTUP_FEATURES_BITFIELD=
MOZ_PROFILER_STARTUP_FILTERS=
MOZ_PROFILER_STARTUP_INTERVAL=
MOZ_SANDBOXED=1
MOZ_SANDBOX_USE_CHROOT=1

2.3 Where are its config files located?

If you're able to launch the process, you can put it under a microscope with strace (or on OSX, dtruss) and trace all of its system calls.

Let's say you can't remember from where Firefox loads user settings. You could fire up firefox under strace to record all of its system calls.

strace firefox 2>&1 | tee /tmp/firefox-syscalls.txt

Give the process a few seconds to initialize, then check the syscalls:

head -n 25 /tmp/firefox-syscalls.txt
execve("/usr/bin/firefox", ["firefox"], [/* 62 vars */]) = 0
brk(NULL)                               = 0x157b000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=168914, ...}) = 0
mmap(NULL, 168914, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6dcdbd7000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
\0\1\0\0\0\360a\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=144776, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6dcdbd5000
mmap(NULL, 2221160, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6dcd7bb000
mprotect(0x7f6dcd7d5000, 2093056, PROT_NONE) = 0
mmap(0x7f6dcd9d4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0x7f6dcd9d4000
mmap(0x7f6dcd9d6000, 13416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6dcd9d6000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
\0\1\0\0\0\220\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14632, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6dcd5b7000
mprotect(0x7f6dcd5ba000, 2093056, PROT_NONE) = 0
mmap(0x7f6dcd7b9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f6dcd7b9000

Let's look at just the calls to open().

perl -lne 'm/openat\(\w+, "([^\"]+)\"/ && print $1' /tmp/firefox-syscalls.txt | head
/etc/ld.so.cache
/lib/x86_64-linux-gnu/libpthread.so.0
/lib/x86_64-linux-gnu/libdl.so.2
/lib/x86_64-linux-gnu/librt.so.1
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
/lib/x86_64-linux-gnu/libm.so.6
/lib/x86_64-linux-gnu/libgcc_s.so.1
/lib/x86_64-linux-gnu/libc.so.6
/home/astahlman/tools/firefox/dependentlibs.list
/home/astahlman/tools/firefox/libnspr4.so

Looks like a lot of shared object files. Let's also print the file type.

perl -lne 'm/openat\(\w+, "([^\"]+)\"/ && print $1' /tmp/firefox-syscalls.txt | xargs -I % file "%" | head
/etc/ld.so.cache: data
/lib/x86_64-linux-gnu/libpthread.so.0: symbolic link to libpthread-2.26.so
/lib/x86_64-linux-gnu/libdl.so.2: symbolic link to libdl-2.26.so
/lib/x86_64-linux-gnu/librt.so.1: symbolic link to librt-2.26.so
/usr/lib/x86_64-linux-gnu/libstdc++.so.6: symbolic link to libstdc++.so.6.0.24
/lib/x86_64-linux-gnu/libm.so.6: symbolic link to libm-2.26.so
/lib/x86_64-linux-gnu/libgcc_s.so.1: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=69c6e15d63392ac94eed3af9166a3e66384c52a7, stripped
/lib/x86_64-linux-gnu/libc.so.6: symbolic link to libc-2.26.so
/home/astahlman/tools/firefox/dependentlibs.list: ASCII text
/home/astahlman/tools/firefox/libnspr4.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=f8bf41d87291d74413d28f3f60be2da46300afab, stripped
xargs: file: terminated by signal 13

Let's exclude all of those shared object files…

perl -lne 'm/openat\(\w+, "([^\"]+)\"/ && print $1' /tmp/firefox-syscalls.txt | egrep -v "\.so(\.[0-9])?" | xargs -I % file "%" | head
/home/astahlman/tools/firefox/dependentlibs.list: ASCII text
/proc/filesystems: empty
/home/astahlman/.mozilla/firefox/Crash Reports/InstallTime20181001155545: ASCII text, with no line terminators
/home/astahlman/.mozilla/firefox/Crash Reports/LastCrash: ASCII text, with no line terminators
/home/astahlman/.Xauthority: data
/usr/share/X11/locale/locale.alias: UTF-8 Unicode text
/usr/share/X11/locale/locale.alias: UTF-8 Unicode text
/usr/share/X11/locale/locale.dir: ASCII text
/usr/share/X11/locale/en_US.UTF-8/XLC_LOCALE: ASCII text
/home/astahlman/.Xdefaults-astahlman-ThinkPad-T420: cannot open `/home/astahlman/.Xdefaults-astahlman-ThinkPad-T420' (No such file or directory)
xargs: file: terminated by signal 13

And excluding X11 configuration…

perl -lne 'm/openat\(\w+, "([^\"]+)\"/ && print $1' /tmp/firefox-syscalls.txt | egrep -v "\.so(\.[0-9])?|/X11/" | xargs -I % file "%" | head
/home/astahlman/tools/firefox/dependentlibs.list: ASCII text
/proc/filesystems: empty
/home/astahlman/.mozilla/firefox/Crash Reports/InstallTime20181001155545: ASCII text, with no line terminators
/home/astahlman/.mozilla/firefox/Crash Reports/LastCrash: ASCII text, with no line terminators
/home/astahlman/.Xauthority: data
/home/astahlman/.Xdefaults-astahlman-ThinkPad-T420: cannot open `/home/astahlman/.Xdefaults-astahlman-ThinkPad-T420' (No such file or directory)
/tmp/firefox_astahlman/.parentlock: cannot open `/tmp/firefox_astahlman/.parentlock' (No such file or directory)
/home/astahlman/.Xauthority: data
/home/astahlman/tools/firefox/updates/0/update.version: cannot open `/home/astahlman/tools/firefox/updates/0/update.version' (No such file or directory)
/home/astahlman/.mozilla/firefox/profiles.ini: ASCII text
xargs: file: terminated by signal 13

Hey, profiles.ini sounds promising! And sure enough, that is the entry point to all of my user-specfic configuration.

cat /home/astahlman/.mozilla/firefox/profiles.ini
[General]
StartWithLastProfile=1

[Profile0]
Name=default
IsRelative=1
Path=XXXXXX.default

[Profile1]
Name=dev-edition-default
IsRelative=1
Path=XXXXXX.dev-edition-default
Default=1

2.4 Who is it talking to?

sudo lsof -i | hgrep firefox
COMMAND     PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
firefox   17815       astahlman   57u  IPv4 154278      0t0  TCP astahlman-ThinkPad-T420:53762->151.101.1.69:https (ESTABLISHED)
firefox   17815       astahlman   59u  IPv4 154285      0t0  TCP astahlman-ThinkPad-T420:42880->sea30s02-in-f10.1e100.net:https (ESTABLISHED)
firefox   17815       astahlman   65u  IPv4 153854      0t0  TCP astahlman-ThinkPad-T420:36226->a23-32-46-65.deploy.static.akamaitechnologies.com:http (ESTABLISHED)
firefox   17815       astahlman   98u  IPv4 155339      0t0  TCP astahlman-ThinkPad-T420:52250->104.16.31.34:https (ESTABLISHED)
firefox   17815       astahlman  105u  IPv4 156457      0t0  TCP astahlman-ThinkPad-T420:44490->sea30s01-in-f10.1e100.net:https (ESTABLISHED)
firefox   17815       astahlman  106u  IPv4 157758      0t0  TCP astahlman-ThinkPad-T420:44612->do-2.lastpass.com:https (ESTABLISHED)
firefox   17815       astahlman  107u  IPv4 153897      0t0  TCP astahlman-ThinkPad-T420:43546->server-52-84-51-200.sea32.r.cloudfront.net:https (ESTABLISHED)
firefox   17815       astahlman  109u  IPv4 155678      0t0  TCP astahlman-ThinkPad-T420:52068->a96-7-85-90.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
firefox   17815       astahlman  119u  IPv4 154828      0t0  TCP astahlman-ThinkPad-T420:54044->ec2-50-112-164-16.us-west-2.compute.amazonaws.com:https (ESTABLISHED)
firefox   17815       astahlman  120u  IPv4 156458      0t0  TCP astahlman-ThinkPad-T420:46576->sea15s12-in-f206.1e100.net:http (ESTABLISHED)

Some of those are recognizable. For instance, 1e100.net is Google. Get it? (It's scientific notation). Lastpass I recognize - don't know why it needs to keep a connection open to home, but good to know that it's using HTTPS, I guess.

What about that first IP address? It's an HTTPS connection, so it should be curlable…

curl https://151.101.1.69
curl: (51) SSL: certificate subject name (*.stackexchange.com) does not match target host name '151.101.1.69'

Ah right, I have a StackOverflow tab open (of course I do). What about that random EC2 instance?

curl https://ec2-50-112-164-16.us-west-2.compute.amazonaws.com
curl: (51) SSL: certificate subject name (push.services.mozilla.com) does not match target host name 'ec2-50-112-164-16.us-west-2.compute.amazonaws.com'

Looks like something that's sending me push notifications from Mozilla. Interesting…

3 Networking

All of the techniques described below were honed by a process consisting of banging my head against the keyboard, sifting through StackExchange answers, and re-reading the AWS docs on EC2 security groups for what felt like the hundredth time.

I am not a networking expert and there are probably smarter ways to do half of this stuff.

Disclaimer done, let's get to it.

3.1 What's my IP address?

List all network interfaces:

ifconfig -l
lo0 gif0 stf0 XHC20 en0 p2p0 awdl0 en1 en2 bridge0 utun0

Find the IP address for a given interface:

ifconfig en0 | grep inet
inet6 fe80::407:3265:b168:386d%en0 prefixlen 64 secured scopeid 0x5
inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255

3.2 What's the IP address for this hostname?

dig google.com

> DiG 9.10.6 <<>> google.com
;; global options: +cmd
;; Got answer:
>HEADER<<- opcode: QUERY, status: NOERROR, id: 64121
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.            IN    A

;; ANSWER SECTION:
google.com.        286    IN    A    172.217.3.206

;; Query time: 39 msec
53(192.168.1.1)
;; WHEN: Sun Oct 07 09:44:40 PDT 2018
;; MSG SIZE  rcvd: 55

So "google.com" resolves to 172.217.3.206 - or at least this time it did. Curiously, google.com does not resolve to a consistent IP address. Let's repeat the lookup and see what we get back:

for i in {1..25}; do dig +short google.com sleep 1; done | sort | uniq -c
9 172.217.14.206
2 172.217.3.174
13 172.217.3.206
1 216.58.193.78

Interesting - it looks like Google uses round-robin DNS.

3.3 What's the hostname for this IP address?

Using a randomly selected IP address for google.com (see above), let's use dig with the -x flag to do a reverse DNS lookup.

dig -x 172.217.3.206 +short
sea15s12-in-f14.1e100.net.
sea15s12-in-f206.1e100.net.
sea15s12-in-f14.1e100.net.
sea15s12-in-f206.1e100.net.

Huh. That's interesting. We got back 4 hostnames for that IP address, two of which appear to be duplicates for some reason. As a sanity check, let's do a forward-lookup on those hostnames…

dig +short sea15s12-in-f206.1e100.net
172.217.3.206

dig +short sea15s12-in-f14.1e100.net
172.217.3.206

Yep, same IP, two different hostnames. I'm not sure why Google is doing this, but creating multiple A records that point to the same IP is a totally valid thing to do.

3.4 Can I reach someone at this IP address?

The ping utility comes in handy when you need to debug why one machine can't connect to another, usually in a cloud-based environment with some sort of firewall or virtual network. Any time I have two EC2 instances in a VPC that should be talking to each other but aren't, I use ping to test whether the server is routable from the client.

ping 192.168.1.4
PING 192.168.1.4 (192.168.1.4): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10
Request timeout for icmp_seq 11
Request timeout for icmp_seq 12
Request timeout for icmp_seq 13
Request timeout for icmp_seq 14
Request timeout for icmp_seq 15
Request timeout for icmp_seq 16
Request timeout for icmp_seq 17
Request timeout for icmp_seq 18
Request timeout for icmp_seq 19
Request timeout for icmp_seq 20
Request timeout for icmp_seq 21
  C-c C-c
--- 192.168.1.4 ping statistics ---
23 packets transmitted, 0 packets received, 100.0% packet loss

Nope, no one there - or at least, no one there who responds to ping requests.

How about our old friend Google?

ping 172.217.3.206
PING 172.217.3.206 (172.217.3.206): 56 data bytes
64 bytes from 172.217.3.206: icmp_seq=0 ttl=54 time=11.238 ms
64 bytes from 172.217.3.206: icmp_seq=1 ttl=54 time=10.262 ms
64 bytes from 172.217.3.206: icmp_seq=2 ttl=54 time=8.791 ms
64 bytes from 172.217.3.206: icmp_seq=3 ttl=54 time=12.061 ms
64 bytes from 172.217.3.206: icmp_seq=4 ttl=54 time=11.419 ms
64 bytes from 172.217.3.206: icmp_seq=5 ttl=54 time=11.619 ms
64 bytes from 172.217.3.206: icmp_seq=6 ttl=54 time=9.720 ms
64 bytes from 172.217.3.206: icmp_seq=7 ttl=54 time=10.119 ms
64 bytes from 172.217.3.206: icmp_seq=8 ttl=54 time=10.627 ms
64 bytes from 172.217.3.206: icmp_seq=9 ttl=54 time=10.045 ms
64 bytes from 172.217.3.206: icmp_seq=10 ttl=54 time=13.929 ms
64 bytes from 172.217.3.206: icmp_seq=11 ttl=54 time=11.993 ms
64 bytes from 172.217.3.206: icmp_seq=12 ttl=54 time=10.509 ms
64 bytes from 172.217.3.206: icmp_seq=13 ttl=54 time=12.870 ms
64 bytes from 172.217.3.206: icmp_seq=14 ttl=54 time=19.877 ms
64 bytes from 172.217.3.206: icmp_seq=15 ttl=54 time=13.560 ms
64 bytes from 172.217.3.206: icmp_seq=16 ttl=54 time=18.439 ms
  C-c C-c
--- 172.217.3.206 ping statistics ---
17 packets transmitted, 17 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.791/12.181/19.877/2.879 ms

3.5 Is anyone listening on this port?

ping will tell you whether you have a route to a given machine, but that doesn't always mean that you can connect to process listening on a specific port. Once I've used ping to confirm that the server I'm trying to connect to is routable, I use netcat to check whether the given port is exposed to the client. If it isn't, you usually need to go update some firewall configuration (in EC2, the problem is usually your security group settings).

nc -z -w 1 172.217.3.206 80
Connection to 172.217.3.206 port 80 [tcp/http] succeeded!

nc -z -w 1 172.217.3.206 443
Connection to 172.217.3.206 port 443 [tcp/https] succeeded!

nc -z -w 1 172.217.3.206 22 || echo "Failed"
Failed

3.6 SSH tunneling

Editorial note: I was considering trying to illustrate the various configurations for SSH tunnels until I came across How does reverse SSH tunneling work? on unix.stackexchange. The answer from user erik has the best pictorial explanation I can imagine, so I'm just going to link to his answer here and include his diagram inline.

SSH tunnels are an invaluable tool to circumvent firewalls. That sounds devious, but there are plenty of legitimate reasons to do this that won't get you fired or trigger an FBI investigation.

3.6.1 Forward tunneling

For example, let's say you've got a Java process running in a staging environment that keeps crashing, and you want to connect your IDE's debugger to the remote JVM's debug port in order to troubleshoot.

Obviously you don't want to expose the debug port to the entire world, so leaving a hole in the firewall for this port is a bad idea. Instead, you can set up an SSH tunnel to securely forward traffic from your laptop to the debug port on the staging machine. The effect is that your Java debugger running on your laptop will connect to an address like localhost:8000, and the SSH daemon will forward the traffic to the debug port on the staging machine.

Another common use-case for (forward) SSH tunnels is connecting to an administrative web UI which is bound to localhost on a remote server. With an SSH tunnel, you can setup an SSH tunnel that forwards localhost:8000 to the appropriate port on localhost of the remote server, fire up your browser, and view the web page from your laptop at localhost:8000.

Forward SSH tunnels look like this:

Diagram of SSH tunnel
Image credit: erik - How does reverse SSH tunneling work?

Notice that there are 3 hosts involved in that second diagram. This type of tunneling is common in environments where SSH access is mediated through a "gateway." I've also heard this referred to as a "bastion" or "jump node." The idea is that only the gateway machine is routable from the outside world, and to log in to a machine on the private network you have to SSH via the gateway.

3.6.2 Reverse tunnel

The diagram above describes regular ol' SSH tunnels, in which a local port forwards to a remote port. But you can also create tunnels that work in the opposite direction, in which a remote port forwards traffic to a local port. I use remote tunnels less frequently, but there are times when they are useful.

Here's a hypothetical (and somewhat contrived) scenario. Let's say you want to intercept traffic that's intended for some other machine - maybe you want to see how clients are calling some server. You fire up netcat on your laptop with nc -l 8000. This just binds a listener to localhost port 8000 and prints anything it receives on stdout.

Next, you log into the remote machine and shut down the real server process (to free up its port). Now you can establish a reverse SSH tunnel from your laptop to the server such that requests to the server are forwarded "back" to your laptop, like this: ssh -R 8000:localhost:<remote-port> <remote-ip>

Diagram of reverse SSH tunnel
Image credit: erik - How does reverse SSH tunneling work?

4 Appendix: Unix utilities

4.1 hgrep

Like grep, but print the first line unconditionally. This is useful when you are filtering the output of a tabular command and what to preserve the column [h]eaders.

N=0

while read -r line; do
    if [ $N -eq 0 ]; then
        echo "$line"
    else
        echo "$line" | grep $*
    fi
    N=$((N + 1))
done