aboutsummaryrefslogtreecommitdiff
path: root/doc/todo/design_for_cross-linking_between_content_and_CGI.mdwn
blob: d8040bf3e31e18ba7e4961e9d0b0838cfbeb8981 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
We're accumulating a significant number of bugs related to cross-linking
between the content and the CGI not being as relative as we would like.
This is an attempt to design a solution for them all in a unified way,
rather than solving one bug at the cost of exacerbating another.
--[[smcv]]

# Terminology

* Absolute: starts with a scheme, like
  `http://example.com/ikiwiki.cgi`, `https://www.example.org/`

* Protocol-relative: starts with `//` like `//example.com/ikiwiki.cgi`

* Host-relative: starts with `/` like `/ikiwiki.cgi`

* Relative: starts with neither `/` nor a scheme, like `../ikiwiki.cgi`

# What we need

* Static content must be able to link to other static content

* Static content must be able to link to the CGI

* CGI-generated content must be able to link to arbitrary
  static content (it is sufficient for it to be able to link
  to the "root" of the `destdir`)

* CGI-generated content must be able to link to the CGI

# Constraints

* URIs in RSS feeds must be absolute, because feed readers do not have
  any consistent semantics for the base of relative links

* If we have a `<base href>` then HTML 4.01 says it must be
  absolute, although HTML 5 does relax this by defining semantics
  for a relative `<base href>` - it is interpreted relative to the
  "fallback base URL" which is the URL of the page being viewed
  ([[bugs/trouble_with_base_in_search]],
  [[bugs/preview_base_url_should_be_absolute]])

* It is currently possible for the static content and the CGI
  to be on different domains, e.g. `www.example.com`
  vs. `cgi.example.com`; this should be preserved

* It is currently possible to serve static content "mostly over
  HTTP" (i.e. advertise a http URI to readers, and use a http
  URI in RSS feeds etc.) but use HTTPS for the CGI

* If the static content is served over HTTPS, it must refer
  to other static content and the CGI via HTTPS (to avoid
  mixed content, which is a vulnerability); this may be
  either absolute, protocol-relative, host-relative or relative

* If the CGI is served over HTTPS, it must refer to static
  content and the CGI via HTTPS; again, this may be either
  either absolute, protocol-relative, host-relative or relative
  ([[todo/Protocol_relative_urls_for_stylesheet_linking]])

* Because reverse proxies and `w3mmode` exist, it must be
  possible to configure ikiwiki to not believe the `HTTPS`, etc.,
  CGI variables, and force a particular scheme or host
  ([[bugs/W3MMode_still_uses_http:__47____47__localhost__63__]],
  [[forum/Using_reverse_proxy__59___base_URL_is_http_instead_of_https]],
  [[forum/Dot_CGI_pointing_to_localhost._What_happened__63__]])

* For relative links in page-previews to work correctly without
  having to have global state or thread state through every use of
  `htmllink` etc., `cgitemplate` needs to make links in the page body
  work as if we were on the page being previewed.

# "Would be nice"

* In general, the more relative the better

* [[schmonz]] wants to direct all CGI pageviews to https
  even if the visitor comes from http (but this can be done
  at the webserver level by making http://example.com/ikiwiki.cgi
  a redirect to https://example.com/ikiwiki.cgi, so is not
  necessarily mandatory)

* [[smcv]] has some sites that have non-CA-cartel-approved
  certificates, with a limited number of editors who can be taught
  to add SSL policy exceptions and log in via https;
  anonymous/read-only actions like `do=goto` should
  not go via HTTPS, since random readers would get scary SSL
  warnings
  ([[todo/want_to_avoid_ikiwiki_using_http_or_https_in_urls_to_allow_serving_both]],
  [[forum/CGI_script_and_HTTPS]])

* It would be nice if the CGI did not need to use a `<base>` so that
  we could use host-relative URI references (`/sandbox/`) or scheme-relative
  URI references (`//static.example.com/sandbox/`)
  (see [[bugs/trouble_with_base_in_search]])

As a consequence of the "no mixed content" constraint, I think we can
make some assumptions:

* if the `cgiurl` is http but the CGI discovers at runtime that it has
  been reached via https, we can assume that the https equivalent,
  or a host- or protocol-relative URI reference to itself, would work;

* if the `url` is http but the CGI discovers at runtime that it has been
  reached via https, we can assume that the https equivalent of the `url`
  would work

In other words, best-practice would be to list your `url` and `cgiurl`
in the setup file as http if you intend that they will most commonly
be accessed via http (e.g. the "my cert is not CA-cartel approved"
use-case), or as https if you intend to force accesses into
being via https (the "my wiki is secret" use-case).

# Regression test

I've added a regression test in `t/relativity.t`. We might want to
consider dropping some of it or skipping it unless a special environment
variable is set once this is all working, since it's a bit slow.
--[[smcv]]

# Remaining bugs

## Arguable

* Configure the url and cgiurl to both be https, then access the
  CGI via a non-https address. The stylesheet is loaded from the http
  version of the static site, but maybe it should be forced to https?

* Configure url = "http://static.example.com/",
  cgiurl = "http://cgi.example.com/ikiwiki.cgi" and access the
  CGI via staging.example.net. Self-referential links to the
  CGI point to cgi.example.com, but maybe they should point to
  staging.example.net?

* *(possibly incomplete, look for TODO and ??? in relativity.t)*