Thursday, May 2, 2024

Annoying Latex (Xetex) Overleaf thing

Why is it saying "font cannot be found"? Turns out the fix was just this:

For every user of Overleaf - this is the way to go if you have a local fonts directory:

\setmainfont[Path=fonts/]{myfont.ttf}

Since it took me way too much time to find this issue: be aware that the / at the end of the path is necessary (but not at the beginning)!

Very annoying.
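
For the record, here is a minimal working preamble sketch (the font filenames are placeholders for whatever is in your fonts/ directory; BoldFont/ItalicFont are only needed if you actually have those variant files):

% compile with XeLaTeX; filenames below are placeholders
\documentclass{article}
\usepackage{fontspec}
\setmainfont[
  Path=fonts/,              % trailing slash required, no leading slash
  BoldFont=myfont-bold.ttf,
  ItalicFont=myfont-italic.ttf
]{myfont.ttf}
\begin{document}
Hello, custom font!
\end{document}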

Saturday, March 30, 2024

Firefox has lag issues when 6 windows are open with over 12,000 tabs open

Currently I have 6 Firefox windows open:

  1. Window 1 has 12,142 tabs open
  2. Window 2 has 129 tabs open
  3. Window 3 has 1 tab open
  4. Window 4 has 42 tabs open
  5. Window 5 has 64 tabs open
  6. Window 6 has 92 tabs open

The majority of the tabs are in the "unloaded" state so they should not be taking up any CPU time.  

RAM usage is less than 50% and CPU usage is on average less than 50% across all cores. 

I am running both Firefox and Chrome at the same time and Chrome is significantly more responsive.

Even something as simple as opening a new tab in Firefox takes a few seconds, whereas opening a new tab in Chrome is instant. To be fair, I only have 300 tabs open in Chrome right now, but this shows that the performance issues are not due to my hardware.

When I create a new tab in Firefox, enter a URL, and hit enter, the tab takes a while to even begin loading the page.

Even while writing this blog post, every few seconds while typing, the text just freezes for a couple of seconds before my keystrokes make their way onto the screen. I notice this when typing on monkeytype and typeracer too - sometimes the text just freezes for a few seconds. This happens almost every other sentence and it only happens on Firefox and not on Chrome.

When watching a YouTube video, every few seconds (say 5-20 seconds), the video just freezes for a couple of seconds while the audio continues to play. Again, this ONLY happens on Firefox and does not happen on Chrome.

I think this issue started a while ago, but recently the lag got bad enough that it is actually impacting my monkeytype performance, so I decided that I had to close some tabs.

But before closing my precious tabs, I have to save them somehow. Here is where I ran into problems. 

It turns out that Tab Session Manager no longer works when you have 12,000 tabs open.

Tab Stash also doesn't work when you have so many tabs.

Firefox's built-in "save all open tabs as bookmarks" works, but it does not save tab state and history, which I want to keep.

So I'm going to try to find a way to just save my Firefox "profile" somewhere so that I can restore it later along with all of the tab state and history.
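
In the worst case, something like this should capture everything (a rough sketch - it assumes the default Linux profile location under ~/.mozilla/firefox, and Firefox should be closed first so the session files are in a consistent state):

#!/usr/bin/env python3
# Rough sketch: archive the entire Firefox profile directory.
# Assumes the default Linux location; close Firefox before running.
import pathlib
import shutil
import time

src = pathlib.Path.home() / ".mozilla" / "firefox"
dest = "firefox-profile-backup-" + time.strftime("%Y%m%d")
shutil.make_archive(dest, "gztar", src)  # writes dest + ".tar.gz"
print("saved", dest + ".tar.gz")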

Tuesday, March 26, 2024

How to give your Scaleway Stardust VPS a custom* IPv6 address

(*by "custom" I mean any address you want within the /64 block that Scaleway gives you.)

So I got myself a €0.43/month Scaleway Stardust IPv6-only instance, and I wanted to attach it to a permanent IPv6 address.

Scaleway generously gives you 40 free flexible IPv6 addresses. Each of these is a /64 block! So you can actually take any IP in the /64 block and attach it to your VM. And you can have multiple IPv6 addresses from multiple of those blocks attached to your VPS at the same time, so you can access your VPS from all of those IPs simultaneously! (I tried this, it works - pretty cool!)

Anyway, so my Stardust instance is running Debian 12, and I initially thought that to add my own custom IPv6 address I just had to edit /etc/network/interfaces, because that is what the Debian manual says: https://wiki.debian.org/NetworkConfiguration

# systemctl status networking
# systemctl restart networking

However, when I ran the status command, I got this result:

# systemctl status networking
Unit networking.service could not be found.

So then I listed all of the running systemd services, and from the list it looked like my VPS was using systemd-networkd for network configuration.

Doing systemctl status systemd-networkd gave this kind of message:

if1: Configuring with /run/systemd/network/if1.network.

So I thought I just needed to edit that file. I went ahead and edited it, but the changes did not persist across reboots.

It turns out the /run/systemd/network files are volatile files as explained in the Arch wiki:

The global configuration file in /etc/systemd/networkd.conf may be used to override some defaults only. The main configuration is performed per network device. Configuration files are located in /usr/lib/systemd/network/, the volatile runtime network directory /run/systemd/network/ and the local administration network directory /etc/systemd/network/. Files in /etc/systemd/network/ have the highest priority.

So then I created a file with the same name in /etc/systemd/network/, and now the IP address is restored on reboot.
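
For reference, the persistent file ended up looking roughly like this (a sketch - the interface name matches the if1 from the log above, but the address and gateway are placeholders; copy the real values from the volatile file and add an address from your own /64 block):

# /etc/systemd/network/if1.network
[Match]
Name=if1

[Network]
# placeholder - use any address from the /64 block Scaleway gave you
Address=2001:db8:aaaa:bbbb::1234/64
Gateway=fe80::1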


A list of S3-compatible providers

In the previous post I compared B2 and R2. Then I realized that there are a whole bunch of other S3-compatible providers so here is a list - I evaluated them based on my own use case, yours may vary:

NOTE: I haven't used any of the services listed below, so I cannot comment as to their quality or reliability.

  • Backblaze B2 - PUT requests are free, no minimum spend.
  • Oracle Cloud - Data storage: $0.0255/GB/month. $0.34/million requests. 10TB free egress per month.
  • Microsoft Azure - $7.7/million write operations
  • IBM Cloud - $5.2/million class A (write) operations 
  • AWS S3 - $5/million PUT calls
  • Google Cloud - $5/million class A (write) operations
  • Fuga - €5/million PUT calls
  • Cloudflare R2 - $4.50/million class A (write) operations
  • Clever cloud - €0.09 / GB egress 
  • Terrahost - minimum spend is $11.5 per month
  • Wasabi - minimum spend is $6.99 / month
  • Vultr - minimum spend is $6 / month 
  • Upcloud - minimum spend is €5 / month
  • Digital Ocean spaces - minimum spend is $5 / month
  • Linode - minimum spend is $5 / month
  • iDrive e2 - minimum spend is $4 / month
  • Contabo - minimum spend is $3 / month
  • Bunny - minimum spend is $1 / month (which gives you 25GB with replication, or 50GB with no replication), no API fees, no API egress fees, S3 coming soon (TM)
  • Synology C2 - €11.99/ year for 100GB, no API fees, no egress fees (???), no upload fees, no deletion fees
  • Serverius - Data storage: €0.009/GB/month. Every month, your first million HTTP requests are free. Each GET and PUT request type has its own limit of 1 million free requests. For example, if you’ve had 0.8 million GET and 0.7 million PUT requests, you’re still within your free limit. In case you exceed 1 million requests, the extra requests will be charged at only 0.0003 Euro per 1000 (0.3/million) HTTP requests. The first 200GB of data egress per month is free.
  • Scaleway - Data storage: €0.012/GB/month for single-zone, €0.0146 for multi-zone. Ingress is free. Requests are free. Egress - 75GB free per month, after that charged at €0.01/GB.
  • OVH - Data storage: 0.012/GB/month. Ingress is free. API requests are free. Egress is charged at 0.012/GB.
  • tebi.io - Data storage: PAYG plan includes a Free Tier which gives you 25GB of free storage replicated in two locations. Additional storage is charged at $0.02/GB/month. API calls are FREE. Unlimited uploads (free, I guess). 250GB of free egress per month, additional egress is charged at $0.01/GB.
  • Dreamhost - Data storage: $0.025/GB/month. Ingress is FREE. API calls are FREE. Egress is charged at $0.05/GB.
  • Exoscale - Data storage: €0.02/GB/month. Egress is charged at $0.02/GB. There is no other charge - ingress is free.
  • Ionos S3 - Data storage: €0.015/GB/month. Ingress is FREE. API requests are FREE. Outgoing data traffic: €0.03/GB.
  • Storj - Data storage: $0.004/GB/Month. Segments are billed at $0.00000001222 per Segment Hour. Every file smaller than 64MB takes up 1 segment (unless you split them). Egress is charged at $0.007/GB.
  • Telnyx - Data storage: $0.006/GB/month. State-change operations: $0.5 per million. State-read operations: $0.04 per million. Egress is free (???). But see the LET thread for more details: https://lowendtalk.com/discussion/187546/telnyx-s3-compatible-object-storage-4-tb-mo-and-free-egress

Please note that I do not know which of the above listed services have hidden charges or minimum spend limits or some crazy terms/conditions like "once you upload a file you must not delete it for at least 6 months otherwise we will suspend your account" etc.

Caveat emptor, I guess.


Btw, the cheapest Scaleway instance - IPv6-only Stardust - only costs around $0.5 per month if you use the 10GB local storage. Pretty cheap! And you get 1GB RAM and "unlimited" bandwidth too. You need to disable IPv4 in order to get that price, though. So pester your ISP until they give you IPv6!!

Monday, March 25, 2024

Backblaze B2 vs Cloudflare R2 pricing

NOTE: I did not include Wasabi because their minimum price is $6.99 / month. I did not include Digital Ocean spaces because their minimum price is $5 / month. In contrast, it seems Backblaze does not have a minimum price (https://www.reddit.com/r/backblaze/comments/yv55eu/backblaze_b2_is_there_a_minimum_monthly_amount/) so if you store only a few GB then you only pay a few cents per month, which is perfect for my use case.

So I noticed some interesting differences between B2 and R2 pricing:

Backblaze B2

  • Ingress is free, egress is free up to 3x your monthly average storage, with any additional egress priced at $0.01/GB. You also get 1GB free egress per day.
  • Class A operations (PutObject, DeleteObject) are FREE
  • Class B operations (GetObject) - you get 2,500 free operations per day (≈75k/month), then $0.004 per 10,000 ($0.40/million)

Cloudflare R2

  • Ingress and egress are both free.
  • Class A Operations (PutObject) - you get 1 million free requests / month, then $4.50 / million 
  • Class B Operations (GetObject) - you get 10 million free requests / month, then $0.36 / million requests
  • DeleteObject is free.

Summary

  • DeleteObject is FREE on both B2 and R2
  • GetObject is cheaper on R2: R2 gives you a 10 million/month allowance and then charges you $0.36/million thereafter, whereas B2 gives you a 2.5k/day allowance and then charges you $0.40/million thereafter.
  • PutObject is cheaper on B2: FREE on B2, whereas R2 gives you 1 million allowance and then charges you $4.50/million thereafter.

Based on the pricing info alone, it looks like if you are going to be doing millions of calls to PutObject per month and less than 2500 calls to GetObject per day, then B2 will be a lot cheaper for you. But if you are going to be doing millions of GetObject calls and less than 1 million PutObject calls per month, then R2 will be cheaper.

Of course we have to take the B2 egress costs into account too. If you are egressing less than 3x your storage, then egress is free, otherwise it costs $10 per TB, so I don't think B2 is suitable for file sharing - the B2 pricing structure makes it only really suitable for file backups. 

Having said that, apparently B2 egress is free through Cloudflare. Though I'm not sure exactly how to take advantage of it. Something to investigate if I end up actually using more than the free B2 egress, I guess.

If I do 4 million PutObject calls (e.g. 1-2 calls per second) that is going to cost me $13.50 per month on R2, whereas it would be free on B2. So I think, if I use R2, I would have to carefully think about how to reduce the number of PutObject calls.
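
A quick sanity check of that number (the only input is R2's published class A pricing quoted above):

# R2 class A: first 1M requests/month free, then $4.50/million
def r2_class_a_cost(puts_per_month):
    billable = max(0, puts_per_month - 1_000_000)
    return billable / 1_000_000 * 4.50

print(r2_class_a_cost(4_000_000))  # 13.5 -> $13.50/month (free on B2)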




No longer able to reproduce Cloudflare DNS flapping

UPDATE: I tried this with some of the $0.99/year 1.111B class .xyz domains that I registered using a different registrar (you can't register .xyz domains on Cloudflare for some reason). I simply set the nameservers for my 1.111B domain to Cloudflare (add it to Cloudflare first, of course) and it works just as well! The change takes effect instantaneously. As soon as the HTTP PUT request returns, if you run the host command again, you will immediately see the new, updated IP address for that domain. Very cool!!!!
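
For reference, the update in question is just the standard Cloudflare "update DNS record" endpoint - a minimal sketch (the zone ID, record ID, token, domain, and IP below are all placeholders):

#!/usr/bin/env python3
import json
import urllib.request

ZONE_ID = "your-zone-id"      # placeholder
RECORD_ID = "your-record-id"  # placeholder
TOKEN = "your-api-token"      # placeholder

url = (f"https://api.cloudflare.com/client/v4/zones/"
       f"{ZONE_ID}/dns_records/{RECORD_ID}")
body = json.dumps({"type": "A", "name": "mydomain.com",
                   "content": "192.168.0.42", "ttl": 60}).encode()
req = urllib.request.Request(url, data=body, method="PUT", headers={
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
})
print(urllib.request.urlopen(req).read().decode())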

I wonder why more people don't use those $0.99/year 1.111B domains. They're so cheap.

Last post I mentioned that I saw DNS flapping with Cloudflare. 

I wondered if it was because the TTLs on some nameservers had not yet expired. Since the updates presumably take a while to propagate across all nameservers, maybe the TTLs on some nameservers start counting down before others. So maybe the issue was that I was updating the DNS too quickly - if I waited a few minutes between updates, then maybe the updates would become instantaneous and reliable with no flapping.

So I tried what I did in the last post again, this time waiting a few minutes before updating the DNS to a new value, and this time I saw some more interesting behavior.

First, I set the IP to 1.0.0.1 at 10:14:52: Instantaneous and no flapping.

Then I set the IP to 8.8.8.8 at 10:21:02:

Request issued at 10:21:02

First change seen: 10:23:15

Wow! This time it took over 2 minutes to update and there was flapping too!

Then I changed it to 192.168.0.1 and the change was instantaneous once again, and no flapping.

This makes me wonder if either the 1.0.0.1 or the 8.8.8.8 IP address is special - maybe Cloudflare doesn't want to change from 1.0.0.1 or maybe it doesn't want to change to 8.8.8.8. I'll try some more tests to distinguish between the two hypotheses.

Or maybe there is another DNS cache timeout somewhere that is longer than 1 minute?

Then I waited a few minutes and updated the IP to 192.168.0.123, and this time again, the change was instantaneous and there was no flapping.

Then I waited a few minutes and updated the IP to 192.168.0.42, and this time again, the change was instantaneous and there was no flapping.

So it would seem that at least for the IP range 192.168.0.x, as long as you wait a few minutes between each change, the update is instantaneous and reliable with no flapping.

Then I waited a few minutes and updated the IP to 8.8.8.8, and this time again, the change was instantaneous and there was no flapping.

Then I waited a few minutes and updated the IP to 1.0.0.1, and this time again, the change was instantaneous and there was no flapping.

Then I waited 2 minutes and updated the IP to 192.168.0.1, and this time again, the change was instantaneous and there was no flapping.

So it seems that most of the time, if you wait a few minutes before changing the IP, the change is indeed instantaneous with no flapping.

This makes me feel more confident using Cloudflare for instantaneous DDNS updates.

Cloudflare DNS flapping

I saw something interesting with DNS today.

I updated my DNS record, then immediately queried Cloudflare DNS (1.1.1.1) and it would switch between the old and new IPs for a while before settling on the new IP. 

[linux 2024-Mar-25 09:56:43]$ host -v mydomain.com
Trying "mydomain.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56089
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mydomain.com.                    IN      A

;; ANSWER SECTION:
mydomain.com.             60      IN      A       127.0.0.1

Received 44 bytes from 1.1.1.1#53 in 5 ms

[linux 2024-Mar-25 09:56:46]$ host -v mydomain.com
Trying "mydomain.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63287
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mydomain.com.                    IN      A

;; ANSWER SECTION:
mydomain.com.             60      IN      A       10.0.0.1

Received 44 bytes from 1.1.1.1#53 in 5 ms

Pretty interesting behavior.

My best guess is that different DNS servers are answering my query each time. Some nameservers get updated faster than others, and each query may be answered by a different nameserver. Some of the nameservers have the old IP while others have the new IP, hence the flapping behavior you see here.

I don't know where exactly the flapping is taking place. Maybe Cloudflare internally uses some kind of load balancing mechanism that distributes DNS queries to different machines each time (or randomly)? Don't know.

In any case, this dashes my dreams of using 1.1.1.1 for instantaneous reliable DDNS, because it seems that sometimes the DNS change is not instantaneously reflected in the host/dig output and sometimes it flaps between the old and the new IP. Sadge.


Sunday, March 24, 2024

Hmmm...DNS cache expiry patterns ...

So I need a really fast DDNS because ... reasons ... so I tried out Cloudflare DDNS.

Basically what I did is I sent the update query and then I kept running the host command over and over again.

So I sent the update query, and it took around 44 seconds for the new value to show up in host.

I tried Dynu, which says it has a 30-second TTL (WOW!), and saw that the DNS update took around 48 seconds.

But then I tried Dynu again, and this time the DNS update took only 4 seconds.

I investigated further and saw this pattern:

21:24:36 Request sent → 21:25:24 DNS updated (48s)
21:26:21 Request sent → 21:26:25 DNS updated (4s)
23:45:02 Request sent → 23:46:01 DNS updated (59s)
23:46:22 Request sent → 23:47:03 DNS updated (41s)
23:47:19 Request sent → 23:48:03 DNS updated (44s)
23:48:57 Request sent → 23:49:03 DNS updated (6s)

I find it interesting that in the last 4 cases, the DNS update happened near the start of the minute, but in the first two cases, the DNS update happened near second 24-25. 

It could just be a coincidence, or this could indicate that DNS cache timeouts are happening roughly in 1 minute intervals, but with some drift.

I tried again with Cloudflare:

23:58:11 Request sent → 23:58:38 DNS updated (27s)
00:00:08 Request sent → 00:00:52 DNS updated (44s)
00:01:17 Request sent → 00:01:52 DNS updated (35s)
00:02:44 Request sent → 00:02:53 DNS updated (9s)


Here again we see the familiar pattern of the DNS updating around the same second for multiple minutes consecutively, yet from 23:58 to 00:00 it changed from second 38 to second 52-53. 

It seems to me that there is some kind of pattern that occurs regardless of which DDNS service you use.

DNS updates happen via cache expiry, and it seems that the cache can expire around the same time every minute?

It also seems that the expiry second itself drifts over time?

Not really sure what's going on.


In any case, my takeaway from all this is that you cannot count on a TTL of less than 60 seconds. Dynu's 30-second TTL does not seem to guarantee that you will see the DNS updated within 30 seconds of changing it - sometimes it takes more than 30 seconds, sometimes less. It does seem to be under 60 seconds, though.

I suppose if you want really fast DDNS, you could host your own special "DNS" server and send a packet there every second so that it will know immediately when your IP changes...
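
A minimal sketch of that idea (the port is arbitrary and there is no authentication, so don't use this as-is):

import socket
import time

def run_client(server_ip, port=9999):
    # send a heartbeat every second; the server sees our public IP
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(b"heartbeat", (server_ip, port))
        time.sleep(1)

def run_server(port=9999):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    last_ip = None
    while True:
        _, addr = sock.recvfrom(64)
        if addr[0] != last_ip:  # source IP changed - react immediately
            last_ip = addr[0]
            print("client is now at", last_ip)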

GOD DAMMIT my ISP doesn't support IPv6 😡😡😡

UPDATE: I contacted my ISP and managed to get them to give me IPv6. I can only pray that the IPv6 will continue to work in the future. It is actually kind of crazy that I had to contact them and go through the silly dance of "restart your router" "okay I did that, I still don't have IPv6" in order for them to actually fix their network so that IPv6 works for me. I think every ISP should provide working IPv6 out of the box.

I can't ping ipv6.google.com and my score on https://test-ipv6.com/ is 0/10

All the VPS vendors support IPv6 now...why doesn't my ISP support IPv6...it's unacceptable...There really ought to be some kind of government mandate that requires all ISPs to provide full IPv6 support.

I want to use one of those cheap IPv6-only VPSes, dammit! 😡😡😡

EDIT: Okay, I guess I'll use an HE (Hurricane Electric) tunnel

EDIT: My router blocks ping so I can't even create a tunnel, god dammit. I can't use my VPS as the endpoint either, it says "This network is restricted"

EDIT: Okay, so I set up the HE tunnel on one of my VPSes, and now I can finally ping ipv6.google.com from that VPS, but I get ~2% packet loss when doing so. Note that I get ~0% packet loss when I ping -4 google.com from that VPS, so I'm pretty sure the loss is caused by the HE tunnel...😭😭😭😭😭😭

Ping stats:

--- ipv6.google.com ping statistics ---
5600 packets transmitted, 5506 received, 1.67857% packet loss, time 5608718ms
rtt min/avg/max/mdev = 216.176/256.840/278.688/11.551 ms

---  ping statistics ---
5613 packets transmitted, 5611 received, 0.0356316% packet loss, time 5623708ms
rtt min/avg/max/mdev = 10.223/10.337/25.901/0.226 ms

Maybe I am using the wrong HE tunnel server.

What would be a really nice way to charge for bandwidth?

According to the Cloudflare blog post AWS's Egregious Egress, it costs AWS around $1.20 per TB of traffic transferred: https://blog.cloudflare.com/aws-egregious-egress

Basically, Cloudflare says: if you run a 3 Mbps link at 100% utilization for a month, you'll have transferred around 1TB of data. So if it costs you $1.20 to use the 3 Mbps link for a month, then you're effectively paying $1.20 per TB.
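
The arithmetic checks out:

# 3 Mbps at 100% utilization for 30 days
mbps = 3
gb = mbps / 8 * 30 * 24 * 3600 / 1000  # MB/s * seconds / 1000 -> GB
print(gb)  # 972.0 GB, i.e. roughly 1 TB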

I think this would be a really nice way to charge for bandwidth. 

Instead of charging a monthly price for bandwidth, it would be nicer if customers could simply buy a certain amount of bandwidth that never expires.

It would be nice if I could just pay $50 up front for 50TB of bandwidth, and that bandwidth would never expire, so that I can use it whenever I want.

I think that would be a really nice pricing model. I wonder why VPS providers don't use it.

Actually this is how a lot of prepaid SIM cards work - you can get IoT SIM cards with data that won't expire for 10 years. Pretty neat concept.

How AWS Lightsail bandwidth pricing works

So on the AWS Lightsail page there is a part which says only outbound transfer in excess of allowance is charged.

But in another part of the Lightsail pricing page it says that both inbound AS WELL AS outbound use up your transfer allowance.

Putting the two pieces of information together, it means that both inbound as well as outbound will use up your allowance. But once you have used up your allowance, you will only get charged for the outbound traffic.

So for example, if you pay $3.50 per month then you'll get the 1TB allowance.

So if you do 1TB of ingress followed by 1TB of egress, then the first 1TB of ingress will use up all of your bandwidth allowance, and then the 1TB of egress will be charged at the standard AWS rate of $0.09 per GB which comes out to $90.
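
In code form, my reading of the rules is this (just my interpretation of the pricing page, nothing official):

# Both directions consume the allowance; only egress beyond it is billed.
def lightsail_overage(ingress_gb, egress_gb, allowance_gb=1000,
                      rate_per_gb=0.09):
    over = max(0, ingress_gb + egress_gb - allowance_gb)
    return min(egress_gb, over) * rate_per_gb

print(lightsail_overage(1000, 1000))  # 90.0 -> the $90 example above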

Just thought I'd explain this for anyone else who was confused about the Lightsail pricing like I was.

By contrast, many big VPS providers such as Digital Ocean, Linode, Vultr, Contabo, and so on do not charge for ingress at all. Digital Ocean explicitly says "Any inbound transfers don't count against your bandwidth usage."

HOWEVER, I signed up for Vultr wanting to buy their $3.50 plan and later found out that it's only available in one location in the US. I thought their $3.50 plan was available in all locations. Keep this in mind because when you sign up for Vultr you have to buy some credit. I would suggest not putting in any credit until you're sure that the plan you want to buy is actually available.

AWS IPv4 pricing changes everything

UPDATE: AWS has now updated their pricing for Lightsail to reflect the IPv4 charges. Now the cheapest IPv4 Lightsail plan costs $5.

So my previous calculations regarding monthly AWS costs are now incorrect. 

Originally IP addresses were free as long as they were attached to your EC2 instance.

Now you will get charged around $3.60 per month for just the IPv4 address alone! And then you have to pay for the compute. 

Given that the compute itself only costs around $1.40 per month on a 3-year reserved plan, this means the IP address costs more than double the price of the instance itself.

This is especially ridiculous given that a $3.50 Lightsail instance gives you both the compute as well as a static IP address.

So if you use EC2, you are paying $3.60 just for the IP address, not including the compute - when you could be paying $3.50 for Lightsail, which includes the compute plus a static IP.

IMO AWS IPv4 addresses are overpriced compared to some other places - e.g. Hetzner charges only $0.65 per month for an IPv4 address.

Saturday, March 23, 2024

More proof that I'm an idiot

Context: I wanted to block all non-Cloudflare IPs from accessing my server since I don't want people to be able to query my server and figure out what domains it hosts (yes, this is quite easy to do - a simple curl -k https://aaa.bbb.ccc.ddd:port -v will tell you).

So I wrote a bunch of rules into /etc/nftables.conf thinking that that's where nftables looks for the config file.

Nope, it turns out that (at least on my distro) the real config is in /etc/sysconfig/nftables.conf

So I googled and even asked GPT4 and Gemini where to find the real config, and couldn't find the answer. GPT4 and Gemini were totally useless.

In the end, I had to think for myself, so I thought "well, nftables is a service, so systemd will tell me what command it was started with and maybe that command will contain the location of the config file" and lo and behold:

$ systemctl status nftables
nftables.service - Netfilter Tables
    Loaded: loaded (/usr/lib/systemd/system/nftables.service; enabled; preset: disabled)
    Active: active (exited) since Sat 2024-03-23 14:29:13 PDT; 10min ago
      Docs: man:nft(8)
   Process: 3934471 ExecStart=/sbin/nft -f /etc/sysconfig/nftables.conf (code=exited, status=0/SUCCESS)
  Main PID: 3934471 (code=exited, status=0/SUCCESS)
       CPU: 21ms

Anyways, this just goes to prove what an idiot I am - I had to Google for something so obvious and still couldn't find it. I guess this was such trivial, obvious common sense that nobody bothered writing it down.

GPT4 and Gemini were completely useless in this case.

Also, did you know that you can use named sets in nftables? Pretty useful feature: https://wiki.nftables.org/wiki-nftables/index.php/Sets
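
For example, something along these lines (a sketch - the two elements shown are just the first entries from Cloudflare's published IP list, which you should fetch in full, and 443 is a stand-in for whatever port your server actually listens on):

table inet filter {
    set cloudflare_v4 {
        type ipv4_addr
        flags interval
        elements = { 173.245.48.0/20, 103.21.244.0/22 }  # partial list!
    }
    chain input {
        type filter hook input priority 0; policy accept;
        tcp dport 443 ip saddr != @cloudflare_v4 drop
    }
}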


Golang gripes: net/http doesn't log certain errors

I just spent like 40 minutes trying to fix an issue where Cloudflare TLS proxying was working for all TLS ports (e.g. port 2087, 2083 and so on) EXCEPT for port 443. That was driving me nuts.

Context: So I had this Origin Rule which says that when request hostname is a certain value, change destination port to 12345.

Of course, since my server serves TLS on that port, normal HTTP traffic to that port won't work. So if you try visiting that site over plain HTTP, you get "Client sent an HTTP request to an HTTPS server." - which makes sense and is fine.

But here's the problem: If you tried accessing https://mywebsite.com:2083 from a web browser, it would work just fine, but if you tried visiting https://mywebsite.com:443 from a web browser, then you would see error 400.

So, port 443 was special, somehow. But where was the special-case handling for port 443? Was it in Cloudflare or was it in my server? I had a separate process running on my server that received traffic on port 443, but in theory it shouldn't have mattered because the Origin Rule should have been rewriting the destination port to 12345, so none of the traffic would ever even hit port 443 on my server. 

Anyway, I killed that process (the one listening on port 443) and it made no difference.

I also killed my process that was listening on port 12345, and that DID make a difference - instead of returning 400, Cloudflare began returning the "server is down" error as soon as I killed the process listening on port 12345. Thus, I know the Origin Rule is working and that all traffic - including traffic to port 443 - was being redirected to port 12345.

So then I thought: Okay, maybe there was some kind of TLS handshake error on my server that only shows up when users connect to the Cloudflare proxy via port 443.

But I was not seeing any TLS handshake errors from my server process. Yet if I killed my server process, Cloudflare would return the "server is down" error, which means that Cloudflare MUST HAVE BEEN GETTING SOME KIND OF RESPONSE FROM my server process, which resulted in a 400. Later on, when I restarted the server, the error message changed to some bad SSL encryption error - the fact that I couldn't get a useful or even consistent error message drove me crazy. I began Googling for this: I searched for "Cloudflare origin rule fails error 400 but only on port 443" - no useful results.

But then for some reason, I thought of using curl instead of my web browser. And hey, whaddayaknow? Instead of returning error 400, curl actually returned a useful error message: "Client sent an HTTP request to an HTTPS server."

This error message shows up when I try to connect to https://mywebsite.com:443 but NOT when I try to connect to https://mywebsite.com:2083

This immediately gave me the hint that Cloudflare was decrypting the traffic. When TLS traffic goes to a Cloudflare proxy on port 443, Cloudflare decrypts it and forwards it to my server IN PLAINTEXT HTTP, BUT ONLY WHEN THE CLIENT SENT IT TO PORT 443 ON THE PROXY.

Anyway, so I simply switched my TLS setting on Cloudflare from Flexible to Full. And that made the error go away - now port 443 works just the same as port 2083.

Thinking about it, it kinda makes sense. Cloudflare does explicitly say that they decrypt TLS traffic and send it to your server via plain HTTP on the Flexible setting. But the fact that this DOESN'T happen for port 2083 is what threw me - Cloudflare didn't explicitly say that their TLS decryption ONLY happens for port 443 and not for the other TLS ports.

Anyway, I'm not sure what I learned from this, but I guess I understand how the Cloudflare Flexible vs Full encryption works a little bit better now.


EDIT: It now strikes me that the REAL problem was the lack of debugging error messages from the ListenAndServeTLS function.

It seems that by default, it only prints some TLS handshake errors. 

Not sure why it doesn't print anything when it responds with that "Client sent an HTTP request to an HTTPS server." error. 

I need to figure out how to make it log those errors.

I added logging in the handler function but the handler clearly wasn't getting called.

EDIT: It seems that there is no way to intercept those errors at present: https://stackoverflow.com/questions/45802492/how-can-i-customize-http-400-responses-for-parse-errors/45802962#45802962

See:

https://github.com/golang/go/blob/c2c4a32f9e57ac9f7102deeba8273bcd2b205d3c/src/net/http/server.go#L1927

I'm surprised that it still isn't possible to log such errors, even though issues have been raised about this going as far back as 2016:

https://github.com/golang/go/issues/12745 

I guess this is one of my gripes about Go's net/http - that it doesn't log some 400 errors and there is no way for the user to add logging for those errors.

EDIT: Actually, fuck it. I'll just make a PR for this and see what they say.

How to have multiple TLS certificates on the same IP?

UPDATE: It turns out that Cloudflare actually allows you 10 Origin Rules which allow you to rewrite the destination port to whatever you want! So you can host a service on your web server on port (say) 8081. Now, if you tried to connect to Cloudflare proxy on port 8081, your traffic would just get dropped. But, if you created a custom rule that said that all traffic destined for a certain hostname should have the destination port redirected to port 8081, then you can connect to the Cloudflare proxy on any proxied port and it will rewrite the destination port to whatever you set it to! Pretty cool, right?

UPDATE: Apparently having x (repeated 3 times) dot com in your blog post automatically marks it as an adult blog post by blogger. Pretty interesting. I didn't know that. Changed it to aaa.com, now seems fine.

[This blog post is written for myself only]

So here is my problem:

  1. I want to host multiple domains (e.g. aaa.com and bbb.com)
  2. I want to host them on the same IP address. (IP addresses are very limited, so it's really really important for servers to be able to serve multiple domains from one IP address)
  3. I want to serve them over TLS.
  4. I want to use one TLS certificate for some domains, and another TLS certificate for other domains (yes, I do have one TLS certificate that is valid for some of my domains, but I want to use another TLS certificate for some of my other domains).
  5. I want to proxy my traffic through Cloudflare.

Anyway, as far as I know there are only 2 solutions to this problem:

  1. Use SNI
  2. Use different ports

If you're proxying your traffic through Cloudflare (the cloud icon on the DNS page in Cloudflare), then ALL traffic will first go through Cloudflare's proxy servers before ending up at your server.

This means that if you're hosting a service on a non-proxied port, like port 8081, and then try to access that port through your domain, your traffic will simply get dropped by Cloudflare - the packets won't even arrive at your server!

Unfortunately, the number of ports proxied by Cloudflare is quite small -- only a dozen or so -- and only like 2 or 3 are actually cached - port 80 and port 443 and I think 8080 (haven't tried).

So if you want Cloudflare proxying, you can only choose one out of a dozen or so ports. And if you want Cloudflare caching then your options are basically limited to port 80 or 443.

But let's take a step back. Why are we limited to these 2 options? Why can't we just build a reverse proxy like we can with plain old HTTP traffic?

The reason you can't reverse proxy TLS traffic the same way you reverse proxy plain old HTTP traffic is that, without SNI, the server has to send over its certificate during the initial TLS handshake, before the client has indicated which domain it's trying to connect to. When the server has multiple certificates, it doesn't know which certificate to send. If it sends the wrong certificate, the handshake simply fails.

But now there is this cool TLS extension called SNI - Server Name Indication (it's badly named - it should really be called DNI - Domain Name Indication, because the domain name is what is being indicated).

Without SNI, you couldn't have a TLS reverse proxy. Why? Because you want your TLS reverse proxy to direct packets to the service based on the domain name. But the initial TLS handshake packets don't contain the domain name, so you don't know which service to direct the packets to. All you can see is just the IP and port, which are the same regardless of which domain the client is requesting.

So without SNI, it would be impossible to do even something as simple as hosting multiple domains on the same IP over TLS on the same port - something that is trivial to do with HTTP, because HTTP is not encrypted so the reverse proxy can see which domain the client is requesting and just direct the traffic to the appropriate service. You can't do that with TLS. If SNI didn't exist, this blog post would be titled "Why TLS Is Annoying". 



Anyway, using different ports to serve different websites is clearly not a very scalable solution (since Cloudflare only proxies a dozen or so ports); it also lacks caching, and just generally feels pretty hacky.

So I think SNI is the right way to go here.
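
To illustrate the idea (this is not my actual setup - the domain names and cert paths below are placeholders): Python's ssl module, for instance, lets you swap the certificate mid-handshake based on the SNI value the client sent.

import socket
import ssl

ctx_aaa = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx_aaa.load_cert_chain("aaa.com.crt", "aaa.com.key")
ctx_bbb = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx_bbb.load_cert_chain("bbb.com.crt", "bbb.com.key")

def pick_cert(ssl_obj, server_name, ctx):
    # called during the handshake, right after the client sends SNI
    if server_name == "bbb.com":
        ssl_obj.context = ctx_bbb  # swap in bbb.com's certificate

ctx_aaa.sni_callback = pick_cert  # ctx_aaa doubles as the default

# needs root for port 443; use e.g. 8443 for testing
with socket.create_server(("", 443)) as listener:
    while True:
        conn, addr = listener.accept()
        try:
            tls = ctx_aaa.wrap_socket(conn, server_side=True)
            print("handshake OK for", addr)
            tls.close()
        except ssl.SSLError as err:
            print("handshake failed:", err)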

EDIT: Found this link about writing a reverse proxy that does SNI in Go: https://www.agwa.name/blog/post/writing_an_sni_proxy_in_go

See also: https://www.gilesthomas.com/2013/07/sni-based-reverse-proxying-with-golang



I guess a further question to ask is whether or not the reverse proxy should decrypt the TLS traffic.

I think it should not, because it would be simpler to have each separate service managing its own TLS certificates.



Friday, March 22, 2024

How to do 2-way bidirectional communication between Raspberry Pi and Pico over USB serial

Original Source: https://forums.raspberrypi.com/viewtopic.php?t=300474

Spent some time looking for this really basic trivial thing that I thought would be easy to find online. 

I want my Pico to constantly send sensor readings to my Pi, and my Pi to react in real time to changes in those readings. So I need a Python program running in the background on my Pi that constantly receives data from the Pico and reacts to it.

Anyway, here is my fully tested and fully working code (yes I tested it, yes it works):

Code that runs on the Pico:

import select
import sys
import time
from machine import Pin

led = Pin(25, Pin.OUT)  # onboard LED on the Pico

count = 0
while True:
    count += 1
    time.sleep(0.5)
    led.toggle()
    # poll stdin (the USB serial link) without blocking
    if select.select([sys.stdin], [], [], 0)[0]:
        line = sys.stdin.readline()
        print("You said:", line, count)
    else:
        print("..", count)

The LED toggle is there to tell you that the program is running - if the LED is blinking, then it means the program is running.

Code that runs on the Raspberry Pi:

#!/usr/bin/env python3
import os
import time

import serial  # pyserial

if os.path.exists('/dev/ttyACM0'):
    # timeout=0 for nonblocking read, timeout=None for blocking read
    ser = serial.Serial('/dev/ttyACM0', 115200, timeout=None)
    time.sleep(1)
else:
    print("ttyACM0 not detected")
    exit()

last_time = time.time()
while True:
    # VERY IMPORTANT: input sent to the Pico MUST be newline-terminated!!!!!
    if time.time() - last_time > 1:
        last_time = time.time()
        ser.write("hello\n".encode('ascii'))
    print("Waiting for readline to return...")
    pico_data = ser.readline()
    print(pico_data.decode("utf-8", "ignore"))



Thursday, March 21, 2024

Why can't you hardcode an NTP IP???

UPDATE (21 March 2024): Some relevant text from RFC8633:

https://www.rfc-editor.org/rfc/rfc8633.html#section-7

   Note well that using a single anycast address for NTP presents its
   own potential issues.  It means each client will likely use a single
   time server source.  A key element of a robust NTP deployment is each
   client using multiple sources of time.  With multiple time sources, a
   client will analyze the various time sources, select good ones, and
   disregard poor ones.  If a single anycast address is used, this
   analysis will not happen.  This can be mitigated by creating
   multiple, separate anycast pools so clients can have multiple sources
   of time while still gaining the configuration benefits of the anycast
   pools.

   If clients are connected to an NTP server via anycast, the client
   does not know which particular server they are connected to.  As
   anycast servers enter and leave the network or the network topology
   changes, the server to which a particular client is connected may
   change.  This may cause a small shift in time from the perspective of
   the client when the server to which it is connected changes.  Extreme
   cases where the network topology changes rapidly could cause the
   server seen by a client to rapidly change as well, which can lead to
   larger time inaccuracies.  It is RECOMMENDED that network operators
   only deploy anycast NTP in environments where operators know these
   small shifts can be tolerated by the applications running on the
   clients being synchronized in this manner.

UPDATE (21 March 2024): Some hacky workarounds: You can probably hardcode these IPs, though there is absolutely no guarantee that they will continue to work:

miyuru on Dec 30, 2022 | prev | next [–]

> It would be great to see Google or Cloudflare use their infrastructure to provide anycasted NTP IP addresses.

Google, Cloudflare and Facebook has vanity IPv6 address, pretty sure they are all static anycast IPs.

time.google.com - 2001:4860:4806::

time.cloudflare.com - 2606:4700:f1::123

time.facebook.com - 2a03:2880:ff0c::123 


jedisct1 on Dec 30, 2022 | parent | prev | next [–]

As for IPv4, time.google.com has been 216.239.35.0 since 2016, so it's unlikely to change anytime soon either.


I can confirm that time.google.com still resolves to that IP address. I also ran these commands today (21 March 2024) for recordkeeping purposes:

$ host time.facebook.com
time.facebook.com has address 129.134.29.123


$ host time.cloudflare.com
time.cloudflare.com has address 162.159.200.123
time.cloudflare.com has address 162.159.200.1


UPDATE (21 March 2024): Still no viable solutions, see below.

UPDATE: I see that there are already-existing solutions for the problem I described:

  • tlsdate - https://github.com/ioerror/tlsdate (but see below)
  • roughtime proposal - https://datatracker.ietf.org/doc/html/draft-ietf-ntp-roughtime

UPDATE: Here's a relevant blog post by Hanno Böck: https://blog.hboeck.de/plugin/tag/tlsdate

tlsdate is a hack abusing the timestamp of the TLS protocol. The TLS timestamp of a server can be used to set the system time. This doesn't provide high accuracy, as the timestamp is only given in seconds, but it's good enough.

I've used and advocated tlsdate for a while, but it has some problems. The timestamp in the TLS handshake doesn't really have any meaning within the protocol, so several implementers decided to replace it with a random value. Unfortunately that is also true for the default server hardcoded into tlsdate.

Some Linux distributions still ship a package with a default server that will send random timestamps. The result is that your system time is set to a random value. I reported this to Ubuntu a while ago. It never got fixed, however the latest Ubuntu version Zesty Zapis (17.04) doesn't ship tlsdate any more.

Given that Google has shipped tlsdate in ChromeOS for some time, it seems unlikely that Google will send randomized timestamps any time soon. Thus if you use tlsdate with www.google.com it should work for now. But it's no future-proof solution.

TLS 1.3 removes the TLS timestamp, so this whole concept isn't future-proof. Alternatively it supports using an HTTPS timestamp. The development of tlsdate has stalled, it hasn't seen any updates lately. It doesn't build with the latest version of OpenSSL (1.1), so it will likely become unusable soon.

Roughtime

Roughtime is a Google project. It fetches the time from multiple servers and uses some fancy cryptography to make sure that malicious servers get detected. If a roughtime server sends a bad time then the client gets a cryptographic proof of the malicious behavior, making it possible to blame and shame rogue servers. Roughtime doesn't provide the high accuracy that NTP provides.

From a security perspective it's the nicest of all solutions. However it fails the availability test. Google provides two reference implementations in C++ and in Go, but it's not packaged for any major Linux distribution. Google has an unfortunate tendency to use unusual dependencies and arcane build systems nobody else uses, so packaging it comes with some challenges.

But wait, it looks like roughtime also requires DNS? At least I haven't been able to find any roughtime IPs that I can hardcode. 

Original post:

People online say that you shouldn't hardcode NTP IPs, but I don't see why this has to be the case. 

You can hardcode 1.1.1.1 for DNS, so why can't you hardcode an IP for NTP? 

People online say that the NTP server might go down, but that shouldn't be an issue because IP anycast will automatically route the traffic to the nearest available server.

People online say that you might overload the server, but you can do load balancing internally within your datacenter in any number of ways, so that shouldn't be an issue either.

You can argue that IP anycast won't work because the packets might get redirected to another server, but this happens so rarely in practice that it shouldn't be a problem, and you can just try again if it fails.

I don't see what's so special about NTP that you can't have an anycast IP for it like 1.1.1.1

I am writing this blog post because TLS won't work if your clock is wrong. If you force your machine to only use DNS-over-HTTPS, then you can't resolve any domains if your clock is wrong.

So this leads to a catch-22 situation: Your DNS doesn't work because your clock is wrong, and you can't fix your clock because you can't resolve NTP domain names to IP addresses because your DNS doesn't work.

This problem would be solved if we could hardcode an IP address for NTP just like we can do with DNS (1.1.1.1)

EDIT: I see that someone has already made a blog post on this: https://news.ycombinator.com/item?id=34177331


> Alternatively it would be good to use an anycast IP for NTP. This is normally a bad idea because it makes calculating skew hard/unreliable, but that really should just mean a poorly sync'ed clock. So set the Anycast clock to be an intentionally high/poor Stratum score, list this along with a DNS based address so it's used until the encrypted DNS can be resolved with a better Stratum score. -- Bob H

Yes, so I suppose anycast might cause poor skew, though that isn't a problem for this use case because TLS will work even if your clock is a few minutes wrong. 

But I suppose we could create a simpler version of NTP whose purpose is to just set your clock to some good-enough-for-TLS time, and then switch to actual NTP once your DNS works.
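
The crude version of that is basically tlsdate over HTTPS with a hardcoded IP - a sketch (certificate verification is deliberately skipped because it can't succeed while the clock is badly wrong, which of course also makes this unauthenticated):

import http.client
import ssl
from email.utils import parsedate_to_datetime

ctx = ssl._create_unverified_context()  # bootstrap only - NOT secure
conn = http.client.HTTPSConnection("1.1.1.1", 443, context=ctx, timeout=5)
conn.request("HEAD", "/")
date_header = conn.getresponse().getheader("Date")
print("server time:", parsedate_to_datetime(date_header))
# a real tool would set the system clock here, then hand off to NTP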


Sunday, March 10, 2024

Protip: Write your email in a separate text editor then copy it into Gmail

Today I fucked up by writing an email and accidentally pressing ctrl+enter (I meant to type shift+enter). I looked around for the Undo Send button and couldn't find it, so I clicked on my Sent box and right-clicked on my email there, and couldn't see an Undo Send option either. In the end I could not undo the accidental send.

So, 3 lessons learned:

  1. Write your emails in a separate text editor, then copy-paste them into your browser email editor once you're done.
  2. Disable the Gmail keyboard shortcuts in the Gmail settings.
  3. Remember that the Undo Send button is in the tiny little popup on the bottom left hand side of the screen. If you click on anything in Gmail then the popup goes away and you can't undo your send anymore.

Tbh I think the Undo Send should be in the right-click menu in the Sent box. It's really bugging me - I think this is a serious usability issue. Also the Undo Send time period should be customizable up to 1 minute so that I have time to go to my Sent box and manually Unsend the email.

But anyway, writing your email in an external text editor is foolproof: it works regardless of your email provider and completely mitigates all of the above-mentioned problems. As long as you do that, you don't need to worry about any of what I just said.

Saturday, March 9, 2024

What is the cheapest VPS? AWS vs GCP vs Azure

UPDATE: I got the AWS pricing wrong. You actually need to pay an additional ~$3.60 per month for the IPv4 address, even if the IP is always attached to your instance. This is a recent pricing update and completely changes the cost calculations. This means that the minimum possible AWS EC2 instance cost is now something like $5.60 a month if you include the IPv4.

UPDATE: I got the Azure pricing wrong. If you select any Linux OS image Azure will force you to get a 30GB OS disk which costs $2.40 if you're using standard SSD (more if you're using premium). This brings the Azure pricing to be more than double the AWS price for the t4g.nano ($1.90 per month for t4g.nano including the mandatory 8GB EBS, compared to $3.82 per month for b1ls including the mandatory OS disk). See below for original blog post.

UPDATE: I tested the AWS t4g.nano disk performance and measured 131MB/s write speed for my 9GB disk which uses gp3 storage (which is the default). See below for more details.

So I wanted a very small VPS that I can run a lightweight Linux instance on. I will only be using it for personal uptime monitoring, so there will be very little egress (AWS, GCP, and Azure all give 100GB of free monthly egress, which should be more than enough), which means I don't have to worry about bandwidth costs. One of the great attractions of these big cloud vendors is that they offer unlimited free ingress traffic, which few VPS vendors provide.

UPDATE: If you really just want uptime monitoring, fly.io gives you 3x free 256MB "VMs" with 160GB monthly egress and free ingress, which is probably enough - but note that fly.io is not a VPS, unlike the other services mentioned in this post.

So I looked at AWS, GCP and Azure and found that the cheapest instances are as follows:

  • AWS: t4g.nano (ARM64) - 0.5G RAM - 2x ARM vCPU - both CPU and disk are burstable
  • Azure: b1ls - 0.5G RAM - 1x x64 vCPU - both CPU and disk are burstable
  • GCP: e2-micro - 1G RAM

The t4g.nano and b1ls come out to around the same price for similar configurations. AWS requires you to add a certain amount of EBS to match the snapshot image. Azure only gives you 4GB ephemeral disk for free, so if you want persistence you need to pay more. UPDATE: When trying to create a Linux b1ls instance Azure will automatically add a 30GB OS disk which costs $2.40 if you're using standard SSD.

3 year reserved b1ls: $1.42 per month
E4 SSD 32 GiB: $2.40 per month
Total cost for cheapest possible b1ls: $3.82 per month

With 8GB persistent storage you are looking at around $1.90 per month for the t4g.nano in the US (and only slightly more outside the US), vs $3.82 per month (originally $2.02, before counting the mandatory OS disk) for the b1ls in West US and $4.13 (originally $2.30) in Central US.

GCP's e2-micro comes out more expensive at $2.75 including the smallest possible boot disk, even with the 3-year committed use discount - and that's only in the cheapest US regions. In other regions it is much more expensive, e.g. in Los Angeles (us-west2) it is $3.79 / month, and outside of the US it is even more expensive. The f1-micro would have been cheaper than the e2-micro except for the fact that the f1-micro is not eligible for the committed use discount, only for the sustained use discount, which is only around 30%.

I originally noted here that Azure offers price matching with AWS for equivalent services, and speculated that this explained why the AWS and Azure prices were so similar for the instances I looked at. It's not even close lol, Azure is WAY more expensive than AWS: AWS only costs $1.90 per month while Azure costs $3.82 per month - more than twice the cost, and even more than GCP, even in the US.

Of course, this says nothing about how the CPU/disk performance compares for the t4g.nano vs the b1ls vs the e2.micro.

Tbh I can see why the e2-micro is more expensive than the t4g.nano, since the e2-micro has 1G of RAM compared to the half gig in the t4g.nano... but I can't see how the Azure price is even remotely justifiable. Azure says they price match AWS, but with the 30GB OS disk I don't see how they could, unless they make the OS disk free (or just fucking downgrade it to 8GB - why the fuck does a Linux image require 30GB??????? AWS only requires an 8GB boot disk for Debian and GCP only requires 10GB, so it really is outrageous that Azure requires a 30GB OS disk).


EDIT:

A few years ago (in 2019) Rasmus Lerdorf wrote this blog post comparing different cheap VPS providers: https://toys.lerdorf.com/low-cost-vps-testing

He obtained the following numbers for AWS Lightsail disk performance:

Disk IO 65 MB/s write, 65 MB/s read

However, that was back in 2019. In 2020 AWS introduced gp3 disks which are newer and more performant than the old SSDs:

In December 2020, AWS announced general availability of a new Amazon EBS General Purpose SSD volume type, gp3. AWS designed gp3 to provide predictable 3,000 IOPS baseline performance and 125 MiB/s, regardless of volume size. With gp3 volumes, you can provision IOPS and throughput independently, without increasing storage size, at costs up to 20% lower per GB compared to gp2 volumes. 

Unlike gp2, where performance is tied to disk size, with gp3 you always get the same performance regardless of disk size, which is really good if you want a really small disk with decent performance (which is exactly what I want). And gp3 is also 20% cheaper than gp2.

To test this, I spun up a t4g.nano with 9GB of gp3 and ran fio and got these results:

Run status group 0 (all jobs):
 WRITE: bw=131MiB/s (137MB/s), 131MiB/s-131MiB/s (137MB/s-137MB/s), io=7996MiB (8384MB), run=61240-61240msec

Disk stats (read/write):
 nvme0n1: ios=2347/32762, merge=28/179, ticks=11093/3568868, in_queue=3579960, util=99.43%

When I saw this I was shocked. I had misread the 125MiB/s as MEGABITs per second, but actually it's MEBIBYTEs per second, which is over 8 times larger! So 125 megabits per second is only around 15.6 megaBYTES per second (which is pretty slow, even for spinning rust) but actually AWS gp3 gives 125 MEBIBYTES per second which is around 131 megaBYTES per second, which is pretty good!
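
For reference, a sequential-write fio job along these lines reproduces that kind of test (the exact flags I used aren't recorded in this post, so treat this as a sketch; all of these are standard fio options - adjust the filename and size for your setup):

fio --name=writetest --filename=/tmp/fiotest --rw=write --bs=1M \
    --size=8g --ioengine=libaio --iodepth=16 --direct=1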


There is also Oracle Cloud which gives you 200GB of Always Free storage. If you select the highest performance disk then you can get around 100MB/s throughput at around 60GB of disk storage, which I think is within the Always Free tier usage limit but I'm not sure, will have to wait and see if Oracle charges me for it.


Anyway, I didn't measure Azure disk performance.